Introduction
HitzalMed takes a file in a text-readable format (e.g., .txt, .pdf) and helps you find the items in it that may need to be anonymised before the document can be shared safely. At the moment, HitzalMed only processes text in Spanish.
HitzalMed offers four ways to approach the anonymisation task:
Identify potentially sensitive items.
Identify and classify potentially sensitive items according to the following set of classes:
Age | Healthcare center | Patient name or surname |
Assistance contact ID | Hospital | Patient ID |
Country | Institution | Person (in general) |
Date | Insurance ID | Profession |
Doctor name or surname | Kinship | Sex |
Doctor ID | Location | Street |
Other | | |
Fax number | Phone number | |
Replace potentially sensitive items with their specific predicted category.
Replace potentially sensitive items with random substitutes in order to eliminate the items while keeping the text natural.
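The four modes above differ only in how a detected item is rendered in the output. The following is a minimal sketch of that difference, not HitzalMed's implementation: the regex rules, the substitute pools, the bracket notation, and the names `detect` and `render` are all hypothetical stand-ins for the real tagger and interface.

```python
import random
import re

# Hypothetical toy rules standing in for the real tagger (assumption).
TOY_RULES = {
    "Date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "Phone number": re.compile(r"\b\d{9}\b"),
}

# Hypothetical substitute pools for the random-replacement mode (assumption).
TOY_SUBSTITUTES = {
    "Date": ["03/11/2015", "21/06/2018"],
    "Phone number": ["600111222", "611222333"],
}


def detect(text):
    """Return (start, end, category) spans sorted by position."""
    spans = []
    for category, pattern in TOY_RULES.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), category))
    return sorted(spans)


def render(text, spans, mode, rng=None):
    """Rebuild the text, rendering each detected span according to the mode."""
    rng = rng or random.Random()
    pieces, cursor = [], 0
    for start, end, category in spans:
        pieces.append(text[cursor:start])  # untouched text before the item
        item = text[start:end]
        if mode == "identify":
            pieces.append(f"[{item}]")
        elif mode == "classify":
            pieces.append(f"[{item}|{category}]")
        elif mode == "replace_with_category":
            pieces.append(f"[{category}]")
        elif mode == "replace_with_substitute":
            pieces.append(rng.choice(TOY_SUBSTITUTES[category]))
        else:
            raise ValueError(f"unknown mode: {mode}")
        cursor = end
    pieces.append(text[cursor:])  # untouched text after the last item
    return "".join(pieces)


sample = "Ingresa el 12/05/2019. Teléfono de contacto: 612345678."
spans = detect(sample)
for mode in ("identify", "classify", "replace_with_category"):
    print(render(sample, spans, mode))
```

Calling `render` again in `replace_with_substitute` mode with a different random generator gives a different version of the text, which is roughly what rerolling a document amounts to in this toy setting.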
Usage
1. Load your text
Choose a file from your filesystem or use one of the examples provided. To process a file of your own, drag it onto the dropbox, or click on the dropbox to open a file browser and select the file. To use one of the examples provided, simply click on the example link.
2. Configure the task
Choose the task from the menu in the sidebar on the left.
3. Modify the annotations
You may apply the following modifications to the automatically produced annotations:
- Add or remove a sensitive item annotation
- Change the category of a sensitive item
- Change the replacement produced for a sensitive item
To make these modifications, simply click on the token of interest to open the annotation modification modal.


4. Download the results
You can download the current state of a document by clicking on the "Download" button in the sidebar. You can choose which files to download: plain text files (original and/or replaced), or pickle files containing tokenwise lists of the different taggers' annotations.
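The downloaded pickle files can be opened with Python's standard `pickle` module. The internal layout of the files is not described here, so the sketch below assumes, purely for illustration, that a file holds a list of `(token, tag)` pairs and that untagged tokens carry the tag `"O"`; the function names and the file name are hypothetical as well.

```python
import pickle
from collections import Counter


def load_annotations(path):
    """Load a tokenwise annotation list from a pickle file.

    Only unpickle files you trust: loading a pickle can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


def count_tags(annotations, untagged="O"):
    """Count how many tokens carry each tag.

    Assumes (token, tag) pairs and an "O" tag for untagged tokens;
    both are assumptions about the file layout.
    """
    return Counter(tag for _token, tag in annotations if tag != untagged)
```

A typical use would be `count_tags(load_annotations("annotations.pkl"))`, where the file name stands for whichever pickle file you downloaded. If the actual layout differs, inspect the loaded object first with `type(...)` and adapt `count_tags` accordingly.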
5. Finish
Click on the "Restart" button to clear the workspace and load a new file.
Frequently Asked Questions
What do you use to detect and classify sensitive information?
A sequence labelling model based on Multilingual BERT and trained on the NUBes-PHI corpus. You can read about it here.
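A sequence labelling model assigns one label to each token, and consecutive labelled tokens are then grouped into items. The sketch below shows that grouping step under the assumption of a BIO labelling scheme (`B-` opens an item, `I-` continues it, `O` is untagged). Neither the scheme nor the label names are confirmed for HitzalMed; they are used here only to illustrate the idea.

```python
def group_bio(tokens, labels):
    """Group token-level BIO labels into (category, text) items."""
    items = []
    current = None  # (category, [tokens]) of the item being built

    def flush():
        if current is not None:
            items.append((current[0], " ".join(current[1])))

    for token, label in zip(tokens, labels):
        category = label[2:] if label[:2] in ("B-", "I-") else None
        continues = (
            label.startswith("I-")
            and current is not None
            and current[0] == category
        )
        if continues:
            current[1].append(token)
        else:
            # Close any open item; open a new one unless the token is untagged.
            flush()
            current = (category, [token]) if category else None
    flush()
    return items


tokens = ["Paciente", "de", "45", "años", "atendido", "por", "Ana", "Ruiz"]
labels = ["O", "O", "B-AGE", "I-AGE", "O", "O", "B-DOCTOR", "I-DOCTOR"]
print(group_bio(tokens, labels))
```

An `I-` label that does not continue an open item of the same category is treated as the start of a new item, which is a common lenient reading of BIO output.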
One of the items has been incorrectly tagged and/or replaced. What can I do?
Double-clicking on any word in the document, tagged or untagged, lets you choose a tag manually or remove the current one. In replacement mode, you can also edit the text.
In replacement mode, what does the "Reroll" button do?
Rerolling a document generates new replacements for all sensitive items, producing a new version of the text.
In replacement mode, some of the ages did not change. Why?
In the medical field, children's ages might be more relevant than adults'. For that reason, we decided not to replace any ages below 14. If you still wish to change them, you can replace them manually by double-clicking on them.
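The rule can be pictured as a simple threshold check. In the sketch below only the threshold of 14 comes from the answer above; the way a substitute age is drawn (a random value within a few years of the original, never below the threshold) is an assumption made for illustration, as are the names `replace_age` and `spread`.

```python
import random

MIN_REPLACED_AGE = 14  # ages below this value are left unchanged


def replace_age(age, rng=None, spread=5):
    """Return a substitute age, or the original age if it is below 14.

    The sampling strategy (uniform within `spread` years) is hypothetical.
    """
    rng = rng or random.Random()
    if age < MIN_REPLACED_AGE:
        return age  # children's ages are kept as they are
    low = max(MIN_REPLACED_AGE, age - spread)
    return rng.randint(low, age + spread)
```

For instance, `replace_age(7)` returns 7, while `replace_age(40)` returns some value between 35 and 45.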
In replacement mode, what does the symbol "*" mean?
It means that the corresponding original token has been replaced by an empty string. This is likely to happen with middle names or long addresses.
Can I delete words from a document using the editor in replacement mode?
Unfortunately, Hitzal does not support deleting content. However, if you absolutely need to do it using the website, you can simply replace any word with the symbol "*". This way, it will be skipped when generating the "Replaced" text file.
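The effect of the "*" symbol on the "Replaced" text can be sketched as follows. This assumes, for illustration only, that the replaced text is built by joining the tokenwise replacements with single spaces; the name `build_replaced_text` is hypothetical.

```python
EMPTY_MARKER = "*"  # stands for a token replaced by an empty string


def build_replaced_text(replacements):
    """Join tokenwise replacements, skipping tokens marked as empty."""
    return " ".join(token for token in replacements if token != EMPTY_MARKER)


# A four-token name whose two middle tokens were replaced by empty strings.
print(build_replaced_text(["Lucía", "*", "*", "Ortega"]))  # prints "Lucía Ortega"
```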