CTAKES AND NLP
- side: X12 Parser - Java based (Hadrian)
- Maven, build system … Java based
- Takes free-form notes, and then uses NLMS to map terms and temporal relationships (Boston Children’s)
- Dr Bowen Data, Dr Hoyt use in Education
- Interface with Carehub to EHR
- Atherian - Smart contract (block chain)
- Could cTAKES be used to normalize reporting? EHR / NLP service integration
GRANT This project proposes integrating NLP functionality into existing open source EHRs. We will implement this as a service that leverages the Apache cTAKES NLP engine and uses a secure, asynchronous messaging middleware layer built on Apache ActiveMQ. The service transforms free-text data into semi-structured data that can be further analyzed to improve processes and quality of care.
LibreHealth EHR will be used as the “proof of concept” for the service integration. The NLP service will provide an API and could be integrated with other EHRs, such as OpenEMR, VistA, etc. All code will be open source, made available under the Apache License v2 (and/or Mozilla).
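To make the proposed flow concrete, here is a minimal sketch of the asynchronous pattern described above: the EHR side puts a note on a request queue, a worker runs NLP on it, and semi-structured JSON comes back on a response queue. Everything here is an assumption for illustration only. Python's in-process `queue.Queue` stands in for the ActiveMQ broker, the `TOY_CONCEPTS` dictionary and `annotate` function stand in for the cTAKES engine, and the message fields (`note_id`, `text`, `mentions`) are hypothetical, not an agreed schema.

```python
import json
import queue
import threading

# Hypothetical term list standing in for the NLP engine's concept lookup.
TOY_CONCEPTS = {
    "chest pain": "symptom",
    "hypertension": "disorder",
    "aspirin": "medication",
}


def annotate(note_text):
    """Toy stand-in for the NLP engine: find known terms in free text."""
    lowered = note_text.lower()
    found = []
    for term, category in TOY_CONCEPTS.items():
        start = lowered.find(term)
        if start != -1:
            found.append({"term": term, "category": category, "offset": start})
    # Report mentions in the order they appear in the note.
    return sorted(found, key=lambda mention: mention["offset"])


def nlp_worker(requests, responses):
    """Consume JSON note messages until a None sentinel arrives."""
    while True:
        message = requests.get()
        if message is None:
            break
        note = json.loads(message)
        result = {"note_id": note["note_id"], "mentions": annotate(note["text"])}
        responses.put(json.dumps(result))


def process_notes(notes):
    """EHR side: enqueue notes, then collect the semi-structured results."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=nlp_worker, args=(requests, responses))
    worker.start()
    for note in notes:
        requests.put(json.dumps(note))
    requests.put(None)  # tell the worker there is nothing more to process
    worker.join()
    results = []
    while not responses.empty():
        results.append(json.loads(responses.get()))
    return results
```

Calling `process_notes([{"note_id": 1, "text": "..."}])` returns one result dictionary per note. In a real deployment the two queues would be broker destinations and the worker would be a separate service, so the EHR would not block while notes are processed.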
Clearly, this type of project would have broad interest in the informatics community. Doesn’t someone still have to set up the pipeline for NLP? (which requires considerable expertise, I believe). How difficult is this? How many hours of software development and at what rate/hour?
Ironically, JAMIA published one of the early (2010) summary articles of cTAKES in the literature. The software originated at the Mayo Clinic.
In the process of researching NLP and cTAKES in more detail, I did run across the i2b2 project and the fact that they release de-identified research patient datasets for NLP training. Looks like they have about 1500 patients in the released datasets. Has anyone explored this before? The requesting university needs to register and sign a data use agreement. Would this be simpler than de-identifying Sam’s progress notes? Just asking.
I have looked at the i2b2 dataset before, but it didn’t seem useful in an EHR unless we make it fit the SOAP note structure or another specific type of clinical note and build those forms in the EHR. The NLP in cTAKES works well, but deep learning methods for NLP have been on the rise over the last 2-3 years.
I was hoping for more discussion regarding the issue of NLP. It is my understanding that cTAKES was designed specifically for free text in EHRs, which is a plus. On the other hand, there might be better NLP software that is less labor intensive. I think it would be appealing to have NLP integrated with LibreHealth, but I frankly don’t know how many universities would take advantage of it. Building and testing the pipelines is complex.
I’m going to start working on more details on this next week… Thanks for the extras!
Have you seen this - https://github.com/GoTeamEpsilon/ctakes-rest-service ? Know anything about this group? @rhoyt I found that these people are building a RESTful service and a web client UI for cTAKES, and they also use OpenEMR.
I was unaware of the cTAKES RESTful API project. Sounds like Tony should review this to be sure he is not trying to re-invent the wheel.
I will definitely take a look
If the format of the i2b2 dataset means more work, I would be happy to take some (10-20) of Sam’s encounter notes and de-identify them for student exercises and possibly NLP down the road. Can we make that happen?
It has been extremely difficult to find a grant for LibreHealth EHR enhancement, but easier to find one for a biomedical data science platform when the EHR is coupled with the sister project on Data World. In other words, if our initiative can promote biomedical data science education and training by providing NLP training, a FHIR sandbox, descriptive analytics, and possibly predictive analytics, this might be attractive for future grants. Thoughts?
Give the cTAKES NLP guys (@hzbarcea) a bit to see what they can do first, but after that we can provide the notes privately for editing if needed (or better, find a chunk that doesn’t need editing).
I am in agreement. I think we can create a very cool, closed-loop teaching tool using NLP, NHANES, FHIR and other resources.
Another thought would be to somehow connect EHR data output (csv) to WEKA, the open-source, Java-based machine learning program that accepts csv input. Truthfully, most programs are not ready for this, but it would be an interesting proposal to create the EHR-machine learning loop for predictive analytics.
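As a rough illustration of the EHR-to-WEKA step, here is a minimal sketch that converts a CSV export into ARFF, WEKA's native text format. WEKA can also open CSV files directly, so this is only one possible route. The function name `csv_to_arff`, the default relation name, and the sample columns are hypothetical, and the sketch assumes column names and values are simple tokens without spaces, commas, or quotes (real exports would need quoting).

```python
import csv
import io


def _is_number(value):
    """Return True if the text parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation="ehr_export"):
    """Convert CSV text (header row first) into ARFF text.

    Columns whose non-empty values are all numeric become numeric
    attributes; every other column becomes a nominal attribute listing
    its distinct values. Empty cells are written as '?', the ARFF
    marker for a missing value.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for index, name in enumerate(header):
        values = [row[index] for row in data if row[index] != ""]
        if values and all(_is_number(v) for v in values):
            kind = "numeric"
        else:
            kind = "{" + ",".join(sorted(set(values))) + "}"
        lines.append("@attribute " + name + " " + kind)
    lines += ["", "@data"]
    for row in data:
        lines.append(",".join(v if v != "" else "?" for v in row))
    return "\n".join(lines) + "\n"
```

For example, `csv_to_arff("age,sex,sbp\n54,F,130\n61,M,\n")` produces a header with two numeric attributes and one nominal attribute, followed by two data rows, the second with a missing `sbp`. The resulting text could be saved with a `.arff` extension and opened in the WEKA Explorer.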
Over the course of the past year, 17 universities have contacted me with interest in the EHR and/or data science platform. Many of the recent ones followed the two talks I gave at AMIA. Clearly, most of these schools are academic non-medical centers and tend to be small to medium programs. They are thirsty for tools to help teach EHR competency and basic data science.
I have opted to treat the LibreHealth EHR and Data World projects as sister projects as both would benefit informatics programs. It is my hope and vision that we could get an educational grant for this new platform to promote biomedical data science. We would hope that the API sandbox, FHIR, cTAKES, etc. can be part of that platform.
The fall semester is right around the corner, so perhaps we can get 3-5 universities to use the platform, but gear up for more by the next semester. So that I don’t have to type an email epiphany each time to explain our status, I put together a little “white paper” on where we are that I can send to universities. Feel free to comment and correct. Happy Fourth.
Thanks for sharing the doc. It’s an important project and WEKA integration will be very useful. We also need a way to import data in CSV format without needing custom tools each time. A configurable CSV importer will be very useful. I do think that it’s not a small project. It will take nearly half a year of development, and more for testing, implementation, feedback and rework. I think this is very suitable for the SCH grants - https://www.nsf.gov/pubs/2018/nsf18541/nsf18541.htm#pgm_desc_txt . In particular, the health information infrastructure or connecting data area seems appropriate.
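To show what "configurable" could mean here, below is a minimal sketch of a CSV importer driven by a mapping rather than custom code per file. It is an assumption about one possible design, not a description of anything that exists in LibreHealth EHR: the `import_csv` function, the `CONVERTERS` table, and the column and field names in the example are all hypothetical.

```python
import csv
import io
from datetime import date

# Type names a mapping may use, and how each one parses a cell.
CONVERTERS = {
    "str": str,
    "int": int,
    "float": float,
    "date": date.fromisoformat,  # expects YYYY-MM-DD
}


def import_csv(csv_text, mapping):
    """Read CSV text into a list of records using a column mapping.

    mapping: {source_column: (target_field, type_name)}
    Columns not named in the mapping are ignored, and empty cells
    become None instead of being converted.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    available = reader.fieldnames or []
    missing = [column for column in mapping if column not in available]
    if missing:
        raise ValueError("CSV is missing mapped columns: " + ", ".join(missing))
    records = []
    for row in reader:
        record = {}
        for source, (target, type_name) in mapping.items():
            raw = (row[source] or "").strip()
            record[target] = CONVERTERS[type_name](raw) if raw else None
        records.append(record)
    return records
```

A mapping such as `{"Patient ID": ("patient_id", "int"), "DOB": ("birth_date", "date")}` could live in a small config file per data source, so that supporting a new CSV layout means writing a new mapping rather than a new tool.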
I’m glad you see the merit in this proposed platform. Thanks for sharing the link to the NSF grant. I was not familiar with it. As a non-computer scientist, I see the merit in using WEKA to teach Introduction to Machine learning with no coding or math. I also put together a concept/mind map that ties all of this together. Connecting WEKA directly with LibreHealth EHR would be very significant.
Do we need other academic partners, particularly programs who have successfully been funded by the NIH or NSF?
There is already a Java web connector to connect Data World to WEKA. The issue is that it opens up a SQL query window rather than loading the csv file, which would be better in my opinion.
I believe it would be unique to integrate a machine learning program like WEKA with an electronic health record. The same could be said for integrating NLP software and a few other ideas we have come up with.