AssistMED – Research Assistant for Automatic Analysis of Medical Data in Cardiology
Data science, NLP
We developed a platform to support scientific research using NLP algorithms. The platform extracts reliable information about diseases, medications (and dosages), and echocardiographic parameters from medical texts.
CLIENT
Medical University of Warsaw, a research and academic institution with over 200 years of history, employs 2,500 academic staff and educates 10,000 students annually.
CHALLENGE
The primary challenge was to extract valuable information from unstructured medical documentation for analysis and research. Although most medical records are now electronic, roughly 80% of that data is unstructured, which complicates automated processing.
Key issues included:
- Diverse writing styles, abbreviations, and terminologies among doctors.
- Typos, mixed Polish and English usage, and non-standardized numerical data.
- Complex terminology and acronyms that change meaning with subtle variations.
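To make the issues above concrete, here is a minimal sketch of how one echocardiographic value (ejection fraction) might be pulled from notes that mix Polish and English and use inconsistent number formats. It illustrates the problem and is not the project's actual algorithm: the pattern, the function name `extract_ef_percent`, and the rule that treats bare values at or below 1 as fractions are all assumptions made for this example.

```python
import re
from typing import Optional

# Hypothetical spellings of "ejection fraction" in mixed Polish/English notes.
EF_PATTERN = re.compile(
    r"\b(?:LVEF|EF|frakcja\s+wyrzutowa)\b"  # parameter name or abbreviation
    r"\s*[:=]?\s*"                          # optional separator
    r"(\d{1,3}(?:[.,]\d+)?)"                # number with dot or comma decimal
    r"\s*(%)?",                             # optional percent sign
    re.IGNORECASE,
)


def extract_ef_percent(text: str) -> Optional[float]:
    """Return the first ejection fraction found in `text` as a percentage."""
    match = EF_PATTERN.search(text)
    if match is None:
        return None
    # Polish notes often use a decimal comma; normalize it to a dot.
    value = float(match.group(1).replace(",", "."))
    # Assumption: a value <= 1 written without "%" is a fraction (0,55 -> 55%).
    if match.group(2) is None and value <= 1.0:
        value = round(value * 100.0, 2)
    return value


if __name__ == "__main__":
    for note in ["LVEF 55%", "EF: 0,55", "frakcja wyrzutowa = 47,5 %"]:
        print(note, "->", extract_ef_percent(note))
```

A single regular expression like this breaks down quickly on real records (ranges, negations, typos), which is why the project relied on dedicated NLP tooling and manual validation rather than patterns alone.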
The project was undertaken as part of the “Innovation Incubator 4.0” and won a grant in 2021.
Video (in Polish): Cezary Maciejewski from the 1st Department and Clinic of Cardiology talks about the AssistMED project.
SOLUTION
The implementation of AssistMED was a multi-stage, iterative process:
- Understanding the Domain: Collaborated with Dr. Cezary Maciejewski, a physician and data scientist, to understand the needs and challenges.
- Tool Selection: Evaluated and implemented appropriate NLP tools for text annotation and processing.
- Algorithm Design: Developed three algorithms for extracting:
  - Diseases.
  - Medications and dosages.
  - Echocardiographic parameters.
- Annotation Tools: Built a custom annotation tool for efficient data labeling, replacing third-party tools such as Prodigy because of their limitations.
- Validation: Employed two medical students to annotate and validate extracted data.
- Statistical Reporting: Generated metrics for classification accuracy and scientific analysis.
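The validation and reporting steps can be sketched as a comparison between what an algorithm extracted and what human annotators marked as correct. The snippet below is a minimal illustration, assuming each finding is represented as a `(document_id, label)` pair. The function name `precision_recall` and the sample data are hypothetical and are not taken from the project.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    extracted: Iterable[Hashable], annotated: Iterable[Hashable]
) -> Tuple[float, float]:
    """Compare extracted findings against the annotators' reference set."""
    extracted_set, annotated_set = set(extracted), set(annotated)
    true_positives = len(extracted_set & annotated_set)
    # Precision: share of extracted findings that the annotators confirmed.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: share of annotated findings that the algorithm found.
    recall = true_positives / len(annotated_set) if annotated_set else 0.0
    return precision, recall


# Illustrative data only: (document_id, disease label) pairs.
extracted = [
    ("doc1", "atrial fibrillation"),
    ("doc1", "hypertension"),
    ("doc2", "heart failure"),
]
annotated = [
    ("doc1", "atrial fibrillation"),
    ("doc2", "heart failure"),
    ("doc2", "diabetes"),
]

precision, recall = precision_recall(extracted, annotated)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting recall alongside precision shows how many annotated findings an algorithm missed, which a precision figure alone does not reveal.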
RESULTS
An Innovative System in the Polish Market
Accuracy: Achieved a precision rate of 90–100% for Polish medical texts.
Flexibility: Designed for expansion to other medical fields and SaaS deployment.
The project was successfully completed.
The software enables practical applications in medicine, scientific research, and clinical drug trials. The solution’s architecture is designed for scalability (handling higher computational loads) and allows for modular components that can integrate with other systems.
Applications
- Large-scale data analysis in cardiology.
- Clinical research and observational studies.
- Automated risk scoring systems.
End Users
- Clinical cardiology departments.
- Physicians.
- Researchers.
- CROs (Clinical Research Organizations) for faster patient recruitment.
- IT companies managing medical documentation.
- Public health organizations for epidemiological data analysis.
Technologies
- Python
- spaCy (NLP)
- Med7, Polish language model
- Prodigy
- Django
- Google Translate
- Celery
- RabbitMQ
- Docker
- PostgreSQL
- Linux
Duration
5 months
Team
3 persons