From VistA EMR Community: Would like to collaborate on synthetic patient data

Hello there,

I am Sam Habiel, Pharm.D.; Technical Fellow at the Open Source Electronic Health Record Alliance (OSEHRA). I met Tony McCormick at the OSEHRA Summit this year.

Tony told me about the project to create synthetic patients using NHANES data. I am interested, as that will help us produce educational and demo instances of VistA. Lack of good demo data is a big issue for our community. I saw that there is a big thread on the topic here. If somebody can summarize where you are right now and how outsiders can contribute, I would appreciate it.

I have another project that I would like collaboration on: I and others in an organization called WorldVistA are working on creating a public-domain, high-quality drug interaction data set. We have made several releases. The latest can be found here: You may find this data useful for your EMR. If you would like to use it and support us in maintaining the data, we will be happy to collaborate.

–Sam Habiel, Pharm.D. Technical Fellow OSEHRA


Hi Sam, we are using the NHANES data set to create simulated patients. There are 9000+ records created on username/password admin/password if you want to check it out.

In addition, Dr. Hoyt has been manually creating some more detailed patient histories (10 or so at the moment).

I have scripts which download and parse the 2011-2012 NHANES datasets, then create patient records with demographics, problem lists, medications, a few labs, and other things like smoking, alcohol, race, and income.

It should be possible to use the same data to create records in VistA; the key issue to solve would be the interface used to create the records.

Because LibreEHR is web based, my scripts simulate browser activity, creating records by mimicking the same actions a human would take to enter the data manually.
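To make the idea of simulated browser activity concrete, here is a minimal sketch of how a script might assemble the same request a browser sends when a form is submitted. It is not taken from the actual scripts: the function `buildPatientFormRequest`, the path `/patients/new`, the base URL, and the field names `fname`, `lname`, `sex`, and `dob` are all hypothetical placeholders, not LibreEHR's real form.

```javascript
// Assemble a form-encoded POST like the one a browser sends on form submit.
// ASSUMPTION: the path and field names are hypothetical, not LibreEHR's own.
function buildPatientFormRequest(baseUrl, patient) {
  // URLSearchParams produces application/x-www-form-urlencoded output.
  const body = new URLSearchParams({
    fname: patient.firstName,
    lname: patient.lastName,
    sex: patient.sex,
    dob: patient.dob,
  });

  return {
    url: new URL('/patients/new', baseUrl).toString(),
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: body.toString(),
    },
  };
}

// Example with made-up patient data and a made-up local server address.
const request = buildPatientFormRequest('http://localhost:8080', {
  firstName: 'Test',
  lastName: 'Patient',
  sex: 'Female',
  dob: '1970-01-01',
});

console.log(request.url);
console.log(request.options.body);
```

On Node.js 18 or later, the result could be sent with `fetch(request.url, request.options)`. A real run would presumably also need to log in first and carry the session cookie along, which this sketch leaves out.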

One big question is what we need to accomplish to secure additional funding/resources to sustain this effort.

Thank you Kevin.

I actually will be writing M code to add the data directly into VistA. Can you share the scripts with me? We tried using runnable scripts for other projects and they just take too long for a Continuous Integration pipeline, which is what we are aiming for.


It’s Node.js code, not well documented yet, but shared on GitHub.

I can also share database dumps/csv files of the NHANES data.

Thank you.

Where’s the code that reads the NHANES files?


I am using R to load the NHANES data and convert it to .csv.

This part of my scripts uses the command line to load the files and make .csv files.

I can send you the .csv files as a .zip if you want. Probably simpler.

Let’s do that. .zip will be good.

The “pieces” of this process are as follows.

  1. Web crawler which downloads the data files from the CDC and a conversion step from SAS to .csv (resulting in the .zip file contents)

  2. Scripts which load the .csv files into MySQL/MariaDB. (1-to-1 mapping of the .csv files to database tables/columns)

  3. Code which translates the database representations into JavaScript objects. This maps the “numeric values” from the CDC data into things like problem lists. For example, based on yes/no (1/2) answers to parts of the questionnaire for each respondent, we build problem lists.

  4. Code which then uses these JavaScript objects and “applies them” to the EHR. An example here is a form post to create the patients.
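Step 3 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual mapping code: the functions `buildProblemList` and `toPatient`, the `PROBLEM_MAP` table, and the problem labels are hypothetical. The column names (`SEQN`, `BPQ020`, `DIQ010`, `MCQ010`) follow NHANES naming, but which questions feed which problems in the real scripts is not stated here.

```javascript
// ASSUMPTION: an illustrative mapping from questionnaire columns to
// problem-list labels; the real scripts may use different columns and labels.
const PROBLEM_MAP = {
  BPQ020: 'Hypertension',
  DIQ010: 'Diabetes mellitus',
  MCQ010: 'Asthma',
};

// In the questionnaire data, 1 means yes and 2 means no.
const YES = 1;

// Keep the label of every mapped question the respondent answered "yes" to.
function buildProblemList(row) {
  return Object.entries(PROBLEM_MAP)
    // Number() also accepts values that arrive as strings from a .csv or DB row.
    .filter(([column]) => Number(row[column]) === YES)
    .map(([, label]) => label);
}

// Turn one database row into the JavaScript object later applied to the EHR.
function toPatient(row) {
  return {
    id: row.SEQN,
    problems: buildProblemList(row),
  };
}

// Example row with made-up values: yes to BPQ020 and MCQ010, no to DIQ010.
const patient = toPatient({ SEQN: 62161, BPQ020: 1, DIQ010: 2, MCQ010: '1' });
console.log(patient);
```

Any answer other than 1, including a missing value, is treated as "no problem recorded" in this sketch, which keeps codes such as refused or unknown from producing a diagnosis.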


You are a good elucidator! Thank you.