High-Risk Patient Identification: A Review of Current Methodologies | Biomedgrid llc

Identifying High Cost High Need (HCHN) patients has been increasingly recognized as important by healthcare stakeholders since the 1997 Balanced Budget Act prompted the Health Care Financing Administration (HCFA) to require risk-adjusted payment methodologies. High-risk patient identification was originally derived primarily from claims data. It has since evolved to draw on multiple data sources, including clinical data from electronic health records and self-reported health measures.

Importantly, the case definition for HCHN patients is evolving and needs further research. Payers and providers use various high-risk identification methods for purposes ranging from targeted outreach for disease management and reducing readmissions to strategies for greater care coordination through patient-centered medical homes and Accountable Care Organizations (ACOs). However, the great diversity of case definitions and stakeholders, together with uneven availability of source data and access to technology and analytical manpower, complicates the refinement and use of high-risk patient identification.

With the ongoing momentum to leverage big data technologies, several solutions for HCHN patient identification are being developed. It is therefore imperative to understand the current practices and evidence from research prior to adopting a flexible and proactive predictive analytics approach to HCHN patients. The critical need for further research and funding is outlined.

Keywords: High risk; High Cost High Need; Predictive modelling; Risk adjustment; Research priorities

Abbreviations: HCHN: High Cost High Need; VBP: Value Based Payment; ACO: Accountable Care Organization; EHR: Electronic Health Records

Introduction


Healthcare has been undergoing a transition to meet the Triple Aim goals of improved patient care experience, improved population health and reduced cost through value-based payment (VBP) incentives. Payment models that risk-adjust for patient case mix are increasingly used to reimburse healthcare providers. Incentives from value-based contracting encourage collaborative care models among providers in various care settings to assume responsibility for vulnerable patient populations [1]. To reduce costs and improve health outcomes, these efforts create care pathways for high-risk patients that can require active monitoring and care planning.

Such initiatives place the burden of high-risk member identification on providers, while payers still employ rudimentary risk-adjusted payment models. Collaborative care models sometimes provide services that payers do not reimburse, even though these services may yield long-term benefits [2,3]. In response, many organizations are developing care models for chronic disease states that risk-stratify their patient population and take a targeted approach to care coordination to contain costs [3,4]. Because traditional CMS risk adjustment models over-rely on historical data with significant limitations, the industry is responding by incorporating big data and sophisticated machine learning algorithms, with mixed results [3].

Methods


Articles were identified using the following search terms: High risk patient identification, predicting high cost patients, high cost high need patient identification, risk adjustment analysis, predictive analytics risk assessment, risk adjustment for healthcare payment. The search was limited to peer-reviewed articles published since the year 2000, to maintain relevance to current practice. Results were then screened on title and abstract, with a focus primarily on predictive modeling methodology and secondarily on the background of risk adjustment policy and practice. The resulting papers were then reviewed in detail, with full-text assessment to evaluate their potential impact. Here we summarize the key findings from the highest-impact articles.

Results and Discussion

Defining High risk patients

According to an expert panel from the Commonwealth Fund, high-cost, high-need (HCHN) patients are typically those with complex clinical conditions that limit their ability to care for themselves, and they should be the focus of federal initiatives to reduce costs [5,6]. The costliest 5% of patients account for approximately 50% of healthcare costs, as shown consistently in the literature. High-cost and high-need patients include patients with three or more chronic diseases and functional limitations that impair their self-care and routine activities of daily living. Most definitions include a behavioral or psychological component as one of the chronic diseases, as well as social determinants of health [7].
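The cost-concentration figure quoted above can be checked on any cost dataset with a few lines of code. The following is a minimal sketch, not a method taken from the reviewed articles: the function name `top_share` and the cost values are hypothetical and chosen only for illustration.

```python
def top_share(costs, fraction=0.05):
    """Share of total cost incurred by the costliest `fraction` of patients."""
    if not costs:
        raise ValueError("costs must be non-empty")
    ranked = sorted(costs, reverse=True)
    # Always include at least one patient in the top group.
    k = max(1, round(fraction * len(ranked)))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical annual costs for 20 patients (illustrative values, not real data):
# one very costly patient and nineteen low-cost patients.
costs = [100_000] + [5_000] * 19
print(f"Top 5% share of total cost: {top_share(costs):.2f}")
```

In this made-up example a single patient out of twenty accounts for roughly half of total spending, which mirrors the kind of concentration the literature describes.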

Main aspects of predictive analytics

Some investigators consider adverse health events to be random occurrences and question the validity of using previous utilization and cost data to predict future outcomes. It is theorized that patients who have a serious medical event and receive active intervention are typically not the same patients who contribute to costs in the near future [8]. However, most would agree that individuals with multiple comorbid medical and behavioral chronic conditions and unmet social needs are at increased risk for adverse health events such as emergency department visits, hospitalizations and surgery. Models that rely on identifying patients in the upper quartiles of cost or utilization do not account for regression to the mean, i.e. most high-cost patients will improve even without intervention [8]. In contrast, predictive analytics that address the composite impact of the various patient, organizational and system factors that predict adverse events and utilization can have superior results.

Claims vs. clinical data: A significant limitation of healthcare data is that it is largely incomplete or inaccurate. Claims data is primarily generated for administrative and reimbursement purposes, whereas clinical documentation is an inexact source of data with significant user variability [8]. The literature shows evident limitations to relying on claims data in high-risk prediction, since administrative data is not representative of clinical risk [9]. In contrast, although EHR data offers significant advantages, a review of the literature shows that it is underutilized [10].

Despite the limited availability of clinical data, there are incentives for providers to leverage EHR technology to make this data available. However, claims data is readily available and far more standardized than clinical data, and most contemporary risk models depend on it. A comparison of risk assessment tools that used (a) patient demographics, (b) self-reported outcomes and (c) claims-based methods to predict costs found the claims-based methods to be more effective than the alternatives [11]. EHR data, even when available, is frequently unstandardized, user dependent, and often incomplete or inaccurate [8].
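Regression to the mean can be illustrated with a small simulation. The sketch below is an assumption-laden toy model, not an analysis from the cited work: each synthetic patient has a stable baseline cost plus independent yearly noise, and no intervention is applied. The function name `simulate` and all parameter values are hypothetical.

```python
import random
import statistics


def simulate(n=2000, seed=0):
    """Return mean year-1 and year-2 cost for patients flagged in year 1."""
    rng = random.Random(seed)
    # Each synthetic patient has a stable baseline plus independent yearly noise.
    baseline = [rng.gauss(10_000, 2_000) for _ in range(n)]
    year1 = [b + rng.gauss(0, 4_000) for b in baseline]
    year2 = [b + rng.gauss(0, 4_000) for b in baseline]
    # Flag the upper quartile by year-1 cost only, with no intervention applied.
    cutoff = sorted(year1)[int(0.75 * n)]
    flagged = [i for i in range(n) if year1[i] >= cutoff]
    mean_y1 = statistics.mean(year1[i] for i in flagged)
    mean_y2 = statistics.mean(year2[i] for i in flagged)
    return mean_y1, mean_y2


flagged_y1, flagged_y2 = simulate()
print(f"Flagged group, year 1 mean cost: {flagged_y1:,.0f}")
print(f"Flagged group, year 2 mean cost: {flagged_y2:,.0f}")
```

Under these assumptions the flagged group's average cost falls in the second year simply because part of its year-1 cost was noise. A program evaluated only on this group's before-and-after costs could therefore appear effective when it had no effect.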

Big data methodologies: Although machine learning algorithms are increasingly used in healthcare, barriers to wider application include (1) difficulties with feature selection, especially with respect to determining temporal correlation, and (2) inadequate explanation of prediction results, which limits actionable clinical applications. The slow adoption of machine learning is also attributed to the limited expertise of analysts in applying and interpreting these complex methodologies [12]. Kan et al. found that making incremental modifications to standard linear regression models could yield better results when knowledge and resources are constrained. While machine learning models may be ineffective for drawing causal inferences and unbiased estimates, payers and providers can still leverage them to create robust prediction models that identify high-risk patients.

Key limitation: Generalizability: While various predictive models have been developed to identify the HCHN patient, the generalizability of the methods is limited by the type of care setting and the healthcare specialty area that delivered the patient care studied. Especially when administrative claims data is the main data source, and given the variations in the socioeconomic, clinical, lab and medication data collected in EHRs, the results of most predictive methodologies are not applicable across care settings and patient populations. The main limitation of several existing models is that they are skewed toward their target populations and have marginal generalizability.

Conclusion


Further research is needed on a wide range of issues, including refining common definitions of the HCHN patient and identifying generalizable strategies to improve their care. A scalable predictive model for a high-yield set of chronic conditions and social determinants of health would be of significant value to policy makers and providers. Much of the current research on predictive modeling offers limited information on the machine learning methodology and is challenging to evaluate [13,14]. Frameworks that offer proofs of concept or standard approaches with explicit open source code and best practices need to be made available. Investments in advancing the techniques for predictive modeling are clearly critical. Further research that addresses the constantly evolving needs of the industry is imperative.


We thank Dr. William Johnson for his help with formalizing the research topic.


The authors note no conflicts of interest.

Article by Shalini Sivanandam


