Health Informatics Challenges in Technology Education


The Rensselaer Health Informatics Challenges in Technology Education (INCITE) Pipeline recruits and prepares students at Rensselaer and worldwide to be data scientists in healthcare using early data analytics courses and experiential research projects centered on real-world health challenges.

With the advent of electronic healthcare records (EHR) and precision medicine, healthcare increasingly relies on health informatics (HI), the philosophy and tools of data science (DS) and their application in healthcare. Rensselaer Health INCITE is an innovative, replicable program that directly expands the health informatics workforce pipeline at the early undergraduate level for students at RPI and worldwide. Health INCITE addresses key challenges in attracting and training top talent:

  • The shortage of data scientists
  • The lack of awareness among students of HI careers
  • The difficulty of incorporating reality-driven healthcare projects into curricula due to EHR privacy concerns

Rensselaer Polytechnic Institute's Data INCITE pipeline for undergraduate data science education consists of an early data analytics course followed by applied data science research experiences on real-world problems. Data INCITE results in data science skills and prompts students to pursue further coursework and careers in data science. Health INCITE builds on Data INCITE, providing a similar pipeline to recruit and train data scientists for health informatics careers.

The Rensselaer Health INCITE Pipeline:

  1. Produces students skilled in health informatics.
  2. Creates novel, low-barrier pathways into health informatics for students from a wide array of majors, including pre-med, biology, biomedical engineering, computer science, and mathematics.
  3. Enables health informatics education at many institutions by creating shared health informatics instructional project resources, including publicly available, web-based, open source data analytics applications.
  4. Recruits students to pursue health informatics careers.

The Rensselaer Health INCITE Pipeline has been generously funded by the United Health Foundation.

Check out the Project Gallery to see research products (interactive apps, papers, and presentations) created by Health INCITE students in the Data INCITE Lab. 

Parent Projects

Project Gallery


Revealing the regional disparities in outcomes, determinants, and mediations of the COVID-19 pandemic
COVIDMINDER reveals the regional disparities in outcomes, determinants, and mediations of the COVID-19 pandemic. Outcomes are the direct effects of COVID-19. Social and Economic Determinants are pre-existing risk factors that impact COVID-19 outcomes. Mediations are resources and programs used to combat the pandemic. COVIDMINDER analysis and visualizations are by students and staff of The Rensselaer Institute for Data Exploration and Applications at Rensselaer Polytechnic Institute with generous support from the United Health Foundation. COVIDMINDER is an open source project implemented on the R Shiny platform.


Enabling healthcare researchers, providers, payers, and policy makers to gain actionable insights into how, where, and why midlife mortality rates are rising in the United States
MortalityMinder (MM) is a web-based visualization tool that enables interactive exploration of social, economic, and geographic factors associated with premature mortality among midlife adults ages 25-64 across the United States (US). Built on authoritative data from the CDC and other sources, MM is freely available, publicly accessible, open source, and easily maintained. Its goal is to enable healthcare researchers, providers, payers, and policy makers to gain actionable insights into how, where, and why midlife mortality rates are rising in the US. It is designed to help healthcare payers, providers, and policy makers at the national, state, county, and community levels identify and address unmet healthcare needs, healthcare costs, and healthcare utilization.

COVID Back-to-School

A tool for generating actionable information on how to reopen schools, universities, and workplaces
To control the spread of COVID, we must implement social distancing measures. Rather than implementing measures against COVID spread arbitrarily, we have built a tool that gives you a quantitative approach to controlling spread. COVID Back-to-School is a tool for generating actionable information on how to reopen schools (elementary, secondary, boarding), universities, workplaces, etc. For different settings of the social distancing "knobs," you can find out how the infection will spread in your school, university, or organization. You can tune the knobs until the spread is at a level you consider tolerable. The settings for these knobs will then tell you what social distancing protocols you need in place to keep spread at that level.
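The knob-tuning idea can be illustrated with a minimal sketch. It is not the COVID Back-to-School model itself: it assumes a textbook discrete-time SIR model with a single hypothetical knob, contact_reduction, and the function names, population size, and rate parameters are all illustrative assumptions.

```python
def simulate_peak_infected(contact_reduction, population=1000, initial_infected=5,
                           beta=0.3, gamma=0.1, days=200):
    """Peak number of simultaneously infected people in a simple SIR model.

    contact_reduction is a hypothetical knob in [0, 1]: 0 means no social
    distancing, 1 means all transmission-relevant contact is removed.
    beta and gamma are assumed daily transmission and recovery rates.
    """
    effective_beta = beta * (1.0 - contact_reduction)
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    peak = infected
    for _ in range(days):
        new_infections = effective_beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        peak = max(peak, infected)
    return peak


def smallest_sufficient_knob(tolerable_peak, step=0.05):
    """Scan the knob upward from 0 and return the first setting whose
    simulated peak does not exceed tolerable_peak, or None if none does."""
    n_steps = int(round(1.0 / step))
    for k in range(n_steps + 1):
        knob = k * step
        if simulate_peak_infected(knob) <= tolerable_peak:
            return knob
    return None


if __name__ == "__main__":
    knob = smallest_sufficient_knob(tolerable_peak=50)
    print("Least restrictive setting meeting the target:", knob)
```

Scanning from the least restrictive setting upward mirrors the workflow described above: pick a tolerable level of spread first, then read off the mildest protocol that achieves it.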


Natural language processing for topical sentiment analysis of COVID-19 Twitter discourse
In this exploratory study, we scrutinize a database of over one million tweets collected from March to July 2020 to illustrate public attitudes towards mask usage during the COVID-19 pandemic. We employ natural language processing, clustering and sentiment analysis techniques to organize tweets relating to mask-wearing into high-level themes, then relay narratives for each theme using automatic text summarization. In recent months, a body of literature has highlighted the robustness of trends in online activity as proxies for the sociological impact of COVID-19.
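As a rough illustration of theme-level sentiment analysis, the sketch below scores short texts against a tiny word lexicon and averages the scores per theme. It is a deliberately simplified stand-in, not the study's pipeline: themes are assigned by keyword matching rather than clustering, and the lexicons, theme keywords, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, tiny lexicons for illustration only.
POSITIVE = {"safe", "protect", "thanks", "support", "good"}
NEGATIVE = {"hate", "refuse", "useless", "annoying", "bad"}

# Hypothetical themes, each defined by a few keywords (a stand-in for clustering).
THEMES = {
    "mandates": {"mandate", "required", "law"},
    "protection": {"protect", "safe", "spread"},
}


def tokenize(text):
    """Lowercase the text and strip common punctuation from each word."""
    return [word.strip(".,!?#@\"'").lower() for word in text.split()]


def sentiment_score(tokens):
    """Count of positive words minus count of negative words."""
    positives = sum(token in POSITIVE for token in tokens)
    negatives = sum(token in NEGATIVE for token in tokens)
    return positives - negatives


def mean_sentiment_by_theme(texts):
    """Average sentiment score of the texts that mention each theme."""
    scores = defaultdict(list)
    for text in texts:
        tokens = tokenize(text)
        token_set = set(tokens)
        for theme, keywords in THEMES.items():
            if token_set & keywords:
                scores[theme].append(sentiment_score(tokens))
    return {theme: sum(values) / len(values) for theme, values in scores.items()}


if __name__ == "__main__":
    # Made-up example texts, not drawn from the study's data.
    examples = ["Masks protect people, thanks!", "I hate the mask mandate"]
    print(mean_sentiment_by_theme(examples))
```

A real analysis would replace the keyword themes with learned clusters and the lexicon with a validated sentiment model.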


Aiding the development of institutional re-opening strategies based on location and selected Social Distancing models
COVID WarRoom has been designed to aid in the development of re-opening strategies as we begin the re-opening process. Presently, COVID WarRoom allows the user to select a location for analysis, and then define the parameters by using one of our four predefined Social Distancing models: Linear Auto-SD, Linear Default-SD, Quadratic Auto-SD, and Quadratic Default-SD.

Privacy-Preserving Synthetic Health Data for Research and Education

The inability to share private health data can severely stifle research and innovation in health informatics. Studies based on unpublished electronic medical record (EMR) data cannot be reproduced, so future researchers cannot use them to develop and compare new methods. This contributes to the reproducibility crisis in biomedical research. Making open data available can spur innovation and research. The public Medical Information Mart for Intensive Care datasets, MIMIC-II and MIMIC-III, are widely used, with over 2000 citations reported in Google Scholar in March 2020. But since MIMIC-II and MIMIC-III focus on Intensive Care Unit patients in Boston hospitals, the resulting research may be biased and have limited generalizability. The cost and time required, along with concerns about re-identification risk, make de-identification only a partial solution to this problem.