I am Sam Habiel, Pharm.D., Technical Fellow at the Open Source Electronic Health Record Alliance (OSEHRA). I met Tony McCormick at the OSEHRA Summit this year.
Tony told me about the project to create synthetic patients using NHANES data. I am interested because it would help us produce educational and demo instances of VistA. Lack of good demo data is a big issue for our community. I saw that there is a big thread on the topic here. If somebody can summarize where you are right now and how outsiders can contribute, I would appreciate it.
I have another project that I would like collaboration on: I and others in an organization called WorldVistA are working on creating a public-domain, high-quality drug interaction set. We have made several releases. The latest can be found here: https://github.com/glilly/osdi. You may find this data useful for your EMR. If you would like to use it and support us in maintaining the data, we will be happy to collaborate.
In addition, Dr. Hoyt has been manually creating some more detailed patient histories (10 or so at the moment).
I have scripts which download and parse the 2011-2012 NHANES datasets, then create patient records with demographics, problem lists, medications, a few labs and other things like smoking, alcohol, race, income.
It should be possible to use the same data to create records in VistA; the key issue to solve would be the interface for creating the records.
Because LibreEHR is web-based, my scripts simulate browser activity, creating records by mimicking the actions a human would take to enter the data manually.
One big question is what we need to accomplish to secure additional funding/resources to sustain this effort.
I actually will be writing M code to add the data directly into VistA. Can you share the scripts with me? We tried using runnable scripts for other projects and they just take too long for a Continuous Integration pipeline, which is what we are aiming for.
Web crawler which downloads the data files from the CDC and a conversion step from SAS to .csv (resulting in the .zip file contents)
Scripts which load the .csv files into MySQL/MariaDB. (1 to 1 mapping of the .csv files to database tables/columns)
Code which translates the database representations into JavaScript objects.
This maps the "numeric values" from the CDC data into things like problem lists.
For example, based on yes/no (1/2) answers to parts of the questionnaire for each respondent, we build problem lists. https://github.com/yehster/NHANESImport/blob/master/NHANESData.js#L180
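To give a rough idea of that last step without reading the whole file, here is a minimal sketch of turning 1/2 questionnaire answers into a problem list. It is not the code from NHANESData.js: the lookup table, the three NHANES variable names chosen (DIQ010, BPQ020, MCQ010), the problem labels, and the function name buildProblemList are all assumptions made for illustration.

```javascript
// Illustrative only: the variables and labels below are assumptions for this
// sketch, not the mapping actually used in NHANESData.js.
const PROBLEM_QUESTIONS = {
  DIQ010: "Diabetes mellitus", // "doctor told you have diabetes"
  BPQ020: "Hypertension",      // "ever told you had high blood pressure"
  MCQ010: "Asthma",            // "ever been told you have asthma"
};

const YES = 1; // questionnaire coding described above: 1 = yes, 2 = no

// Build a problem list from one respondent's questionnaire answers.
// Any answer other than 1 (no, refused, don't know, missing) adds nothing.
function buildProblemList(respondent) {
  return Object.entries(PROBLEM_QUESTIONS)
    .filter(([variable]) => Number(respondent[variable]) === YES)
    .map(([, problem]) => problem);
}

// Hypothetical respondent row, e.g. as read back from the database.
// Values may arrive as numbers or strings, hence the Number() coercion.
const respondent = { SEQN: 1, DIQ010: 2, BPQ020: 1, MCQ010: "1" };
console.log(buildProblemList(respondent));
```

Keeping the question-to-problem mapping in a plain lookup table means the same table could, in principle, drive either the LibreEHR browser scripts or a direct VistA loader.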