Introduction

HitzalMed takes a file in a text-readable format (e.g., .txt, .pdf) and helps find the items in it that may need to be anonymised before the file can be shared safely. At the moment, HitzalMed only processes text in Spanish.

HitzalMed offers four ways to approach the anonymisation task:

Identification

Identify potentially sensitive items.

Categorisation

Identify and classify potentially sensitive items according to the following set of classes:

  • Age
  • Assistance contact ID
  • Country
  • Date
  • Doctor name or surname
  • Doctor ID
  • E-mail
  • Fax number
  • Healthcare center
  • Hospital
  • Institution
  • Insurance ID
  • Kinship
  • Location
  • Other
  • Phone number
  • Patient name or surname
  • Patient ID
  • Person (in general)
  • Profession
  • Sex
  • Street

Masking

Replace potentially sensitive items with their specific predicted category.
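
As a minimal sketch of what masking amounts to, assume the tagger yields one category label per token, with `None` for tokens that are not sensitive. The function name `mask_tokens`, the bracketed placeholder format, the label names and the sample sentence are all illustrative assumptions, not HitzalMed's actual output format.

```python
from typing import List, Optional


def mask_tokens(tokens: List[str], labels: List[Optional[str]]) -> str:
    """Replace each labelled token with its category placeholder.

    Adjacent tokens that share a label are treated as one item and
    collapsed into a single placeholder (an assumption of this sketch).
    """
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    out: List[str] = []
    previous: Optional[str] = None
    for token, label in zip(tokens, labels):
        if label is None:
            out.append(token)          # non-sensitive token: keep as is
        elif label != previous:
            out.append(f"[{label}]")   # first token of a sensitive item
        # otherwise: continuation of the same item, emit nothing
        previous = label
    return " ".join(out)


tokens = ["Paciente", "María", "García", "de", "45", "años"]
labels = [None, "PATIENT_NAME", "PATIENT_NAME", None, "AGE", None]
masked = mask_tokens(tokens, labels)
print(masked)  # Paciente [PATIENT_NAME] de [AGE] años
```

Collapsing adjacent tokens by label is a simplification: two distinct items of the same category that happen to be adjacent would be merged into one placeholder.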

Replacement

Replace potentially sensitive items with random substitutes in order to eliminate the items while keeping the text natural.
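
The idea of random substitution can be sketched as follows, again assuming one category label per token. The substitute pools, the label names and the function `replace_tokens` are hypothetical; how HitzalMed actually generates its substitutes is not described here.

```python
import random
from typing import Dict, List, Optional

# Hypothetical substitute pools; a real tool would need much larger ones.
SUBSTITUTES: Dict[str, List[str]] = {
    "PATIENT_NAME": ["Lucía", "Carlos", "Elena"],
    "HOSPITAL": ["Hospital Central", "Hospital del Norte"],
}


def replace_tokens(
    tokens: List[str], labels: List[Optional[str]], rng: random.Random
) -> List[str]:
    """Swap each labelled token for a random substitute of the same category."""
    out: List[str] = []
    for token, label in zip(tokens, labels):
        pool = SUBSTITUTES.get(label) if label is not None else None
        # Tokens without a label, or with a category that has no pool in
        # this sketch, are left unchanged.
        out.append(rng.choice(pool) if pool else token)
    return out


rng = random.Random(0)  # seeded only to make the sketch reproducible
result = replace_tokens(
    ["Paciente", "María", "ingresa"], [None, "PATIENT_NAME", None], rng
)
print(" ".join(result))
```

Calling the function again with a fresh random state draws new substitutes for the same items, which yields a different version of the text.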

Usage

1. Load your text

Choose a file from your file system or use one of the examples provided. To process a file of your own, drag it onto the drop box, or click on the drop box to open a file browser and select the file. To use one of the examples provided, simply click on the example link.

2. Configure the task

Choose the task from the menu in the sidebar on the left.

3. Modify the annotations

You may apply the following modifications to the automatically produced annotations:

  • Add or remove a sensitive item annotation
  • Change the category of a sensitive item
  • Change the replacement produced for a sensitive item

To make these modifications, simply click on the token of interest to open the annotation modification modal.

4. Download the results

You can download the current state of a document by clicking on the "Download" button in the sidebar. You can choose which files to download: plain text files (original and/or replaced), or pickle files containing token-wise lists of the different taggers' annotations.
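
A downloaded pickle file can be inspected with Python's standard `pickle` module. The sketch below is only a starting point: the exact layout of HitzalMed's pickle files is not documented here, so the dictionary of parallel token and tag lists, the key names and the file name `annotations.pkl` are assumptions used to keep the example self-contained.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a downloaded file; the real layout may differ.
sample = {
    "tokens": ["Paciente", "María", "de", "45", "años"],
    "categorisation": ["O", "PATIENT_NAME", "O", "AGE", "O"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "annotations.pkl"
    path.write_bytes(pickle.dumps(sample))

    # Only unpickle files you trust: loading a pickle can run arbitrary code.
    with path.open("rb") as handle:
        annotations = pickle.load(handle)

# Inspect the top-level type first, then walk the token-wise lists.
print(type(annotations).__name__)
for token, tag in zip(annotations["tokens"], annotations["categorisation"]):
    print(f"{token}\t{tag}")
```

With a real download, checking `type(annotations)` and the length of each list before iterating is a sensible first step, since the structure shown here is assumed.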

5. Finish

Click on the "Restart" button to clear the workspace and load a new file.

Frequently Asked Questions

What do you use to detect and classify sensitive information?

A sequence labelling model based on Multilingual BERT and trained on the NUBes-PHI corpus. You can read about it here.

One of the items has been incorrectly tagged and/or replaced. What can I do?

Double-clicking on any word of the document, tagged or untagged, allows you to choose a tag manually or remove the current one. In replacement mode, you can also edit the text.

In replacement mode, what does the "Reroll" button do?

Rerolling a document replaces all sensitive items again, outputting a new text version.

In replacement mode, some of the ages did not change. Why?

In the medical field, children's ages might be more relevant than adults'. For that reason, we decided not to replace any ages below 14. If you still wish to change them, you can replace them manually by double-clicking on them.
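
The age rule can be illustrated with a short sketch. Only the threshold of 14 comes from the answer above; the function name `replace_age`, the handling of non-numeric text and the range from which substitutes are drawn are assumptions.

```python
import random

AGE_THRESHOLD = 14  # ages below this value are kept


def replace_age(age_text: str, rng: random.Random) -> str:
    """Return a random substitute age, keeping ages below the threshold."""
    try:
        age = int(age_text)
    except ValueError:
        return age_text  # not a plain number: left untouched in this sketch
    if age < AGE_THRESHOLD:
        return age_text  # children's ages are preserved
    # Assumed substitute range; the real tool may choose differently.
    return str(rng.randint(AGE_THRESHOLD, 99))


rng = random.Random(0)  # seeded only to make the sketch reproducible
kept = replace_age("7", rng)
changed = replace_age("45", rng)
print(kept, changed)
```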

In replacement mode, what does the symbol "*" mean?

It means that the corresponding original token has been replaced by an empty string. This is likely to happen with middle names or long addresses.

Can I delete words from a document using the editor in replacement mode?

Unfortunately, Hitzal does not support deleting content. However, if you absolutely need to do it using the website, you can simply replace any word with the symbol "*". This way, it will be skipped when generating the "Replaced" text file.
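
The effect of the "*" symbol on the generated text can be sketched as a simple filter over the replaced tokens. The function name `render_replaced` and the whitespace joining are assumptions; the sketch only illustrates that tokens marked "*" are skipped.

```python
from typing import List

EMPTY_MARK = "*"  # marks a token that was replaced by an empty string


def render_replaced(tokens: List[str]) -> str:
    """Join the replaced tokens, skipping those marked as empty."""
    return " ".join(token for token in tokens if token != EMPTY_MARK)


replaced = ["Paciente", "Lucía", "*", "Pérez", "vive", "en", "Calle", "Mayor", "*", "*"]
text = render_replaced(replaced)
print(text)  # Paciente Lucía Pérez vive en Calle Mayor
```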