I'm an HCC Coding QA Manager. Here's What AI Actually Can and Can't Do in Risk Adjustment Right Now.
By Daniel Plasencia — Certified Risk Adjustment Coder (CRC), Certified Professional Coder (CPC)

The Conversation Nobody in Risk Adjustment Is Having Honestly
Every week I see another headline: "AI Processes Thousands of Charts Per Day." "AI-Driven Coding Accuracy Hits 95%." "The Future of Medical Coding Is Autonomous."
And every week, my QA team pulls charts where the AI-suggested codes would have failed a RADV audit.
I manage quality assurance for an HCC coding operation. My job is to catch errors before they become compliance problems. I have watched AI coding tools go from "interesting experiment" to "thing my coders are being told to use" in less than 18 months. Some of what these tools do is genuinely impressive. Some of it is dangerous. And the conversation happening in our industry right now is not distinguishing between the two.
This is not an anti-AI article. I use AI tools in my own workflow. But I am going to be specific about what works, what does not, and what the $556 million Kaiser settlement and the active DOJ investigation into UnitedHealth should be teaching every risk adjustment coder about where this is all heading.
What AI Actually Gets Right in HCC Coding
Let me give credit where it is earned. There are things AI coding tools do well, and pretending otherwise would be dishonest.
Speed on straightforward lookups. If a provider documents "Type 2 diabetes mellitus with diabetic chronic kidney disease, stage 3," an AI tool can map that to E11.22 and show you it hits HCC 37 (Diabetes with Chronic Complications) under V28 almost instantly. For clean, well-documented, single-condition encounters, AI is faster than any human.
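Under the hood, that lookup is little more than a dictionary keyed by ICD-10 code. Here is a minimal sketch in Python; the two entries are hand-typed for illustration, not pulled from the published CMS V28 mapping files:

```python
# Minimal sketch of an ICD-10-to-HCC lookup. The entries below are
# illustrative placeholders, not the published CMS V28 mapping tables.
V28_MAP = {
    "E11.22": {"hcc": 37, "label": "Diabetes with Chronic Complications"},
    "E11.9":  {"hcc": 38, "label": "Diabetes with Glycemic, Unspecified, or No Complications"},
}

def lookup(icd10_code):
    """Return the V28 HCC entry a code maps to, or None if it does not risk-adjust."""
    return V28_MAP.get(icd10_code.upper())

print(lookup("E11.22"))  # {'hcc': 37, 'label': 'Diabetes with Chronic Complications'}
```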
Pattern recognition across large chart volumes. AI is good at flagging charts where a medication list suggests a diagnosis that was not captured. Patient is on insulin, metformin, and lisinopril but the encounter only documents hypertension? The tool flags the missing diabetes. This is real value — it catches the low-hanging fruit that human coders sometimes miss at hour six of a ten-hour shift.
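Conceptually, that flag is just a set comparison between what the medication list implies and what the encounter documents. A rough sketch, using a made-up drug-to-condition reference rather than any real formulary data:

```python
# Rough sketch of a medication-to-diagnosis cross-check.
# The drug-to-condition reference here is made up for illustration.
DRUG_SUGGESTS = {
    "insulin": "diabetes",
    "metformin": "diabetes",
    "lisinopril": "hypertension",
}

def flag_missing_conditions(medications, documented_conditions):
    """Return conditions implied by the med list but absent from the encounter."""
    implied = {DRUG_SUGGESTS[m] for m in medications if m in DRUG_SUGGESTS}
    return implied - set(documented_conditions)

# Patient on insulin, metformin, and lisinopril; encounter documents only hypertension.
print(flag_missing_conditions(
    ["insulin", "metformin", "lisinopril"],
    ["hypertension"],
))  # {'diabetes'}
```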
V28 mapping updates. When CMS transitioned fully to V28 in 2026 — dropping 2,294 diagnosis codes and expanding HCC categories from 86 to 115 — AI tools updated their mapping tables immediately. No waiting for a printed reference guide. No manual crosswalk spreadsheets. The data refresh is instantaneous.
Volume processing for initial screening. For large retrospective chart reviews, AI can do the first pass — identifying charts that likely contain capturable HCCs — and send only the flagged charts to human coders. This triage function is legitimate. It lets your team focus their clinical judgment on charts that need it rather than spending time on charts with nothing to capture.
These are not trivial benefits. For coding operations processing thousands of charts, the productivity gain on routine work is real.
Where AI Falls Apart — and Where It Gets Dangerous
Here is where I stop being generous, because this is where the real risk lives.
MEAT evidence validation is beyond current AI. Every HCC diagnosis submitted for risk adjustment must be supported by MEAT criteria — Monitoring, Evaluation, Assessment, and Treatment. The diagnosis has to be more than present in the chart. The provider has to have actively addressed it during that encounter.
AI tools can find a diagnosis mentioned in a chart. What they cannot reliably do is determine whether the provider actually monitored, evaluated, assessed, or treated that condition during that specific visit. Did the provider review the latest A1C and adjust the diabetes management plan? Or did the problem list just carry forward "Type 2 diabetes" from the last visit with no clinical engagement?
This distinction is everything in risk adjustment. It is the difference between a defensible code and the kind of unsupported diagnosis that cost Kaiser Permanente $556 million.
From 2009 to 2018, Kaiser used data-mining tools to identify diagnoses that had not been submitted to CMS, then systematically pushed physicians to add them through retroactive chart addenda — sometimes more than a year after the patient encounter, according to the Department of Justice settlement announcement. That generated roughly $1 billion in improper payments and the largest False Claims Act settlement in Medicare Advantage history.
The mechanism that made Kaiser's scheme possible — using software to surface diagnoses for submission without verifying clinical support — is conceptually not that different from what some AI coding tools do today. The tool finds a diagnosis. The coder submits it. Nobody verified the MEAT.
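The fix is procedural rather than technological: every flagged diagnosis gets a human MEAT check before it goes anywhere near a submission file. On my team that check is effectively a per-diagnosis checklist, something like the sketch below. The field names are my own shorthand, not a CMS-defined structure, and the point is that a human fills it in from the note:

```python
# Sketch of a per-diagnosis MEAT checklist a human reviewer completes.
# Field names are my own shorthand, not a CMS-defined structure.
from dataclasses import dataclass

@dataclass
class MeatCheck:
    diagnosis: str
    monitoring: bool = False   # e.g., latest A1C reviewed at this visit
    evaluation: bool = False   # e.g., exam findings or test results discussed
    assessment: bool = False   # e.g., condition status addressed in the assessment
    treatment: bool = False    # e.g., medication adjusted, therapy ordered, referral made

    def defensible(self) -> bool:
        """At least one MEAT element must be documented for this encounter."""
        return any([self.monitoring, self.evaluation, self.assessment, self.treatment])

check = MeatCheck("Type 2 diabetes", monitoring=True, treatment=True)
print(check.defensible())  # True
```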
Multi-condition hierarchy logic breaks AI. V28 has 115 HCC categories arranged in hierarchical groups. When a patient has multiple related conditions, the hierarchy determines which code trumps which. If a patient has both HCC 221 (Skin Ulcer) and HCC 383 (Pressure Ulcer of Skin with Necrosis Through to Muscle, Tendon, or Bone), the more severe condition trumps the less severe one. But the clinical documentation has to support the higher-severity code specifically.
AI tools regularly suggest the higher-paying code in a hierarchy without evaluating whether the documentation actually supports that level of severity. I have seen AI suggest Stage 4 pressure ulcer codes when the clinical note describes a Stage 2. The tool read "pressure ulcer" and picked the code with the highest RAF weight. That is not coding. That is upcoding.
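The hierarchy step itself is mechanical. Within a hierarchy group, the most severe captured HCC suppresses the less severe ones, and only the surviving codes count toward the RAF. A sketch of that logic follows; the single relationship shown mirrors this article's ulcer example, while the authoritative relationships live in the CMS V28 hierarchy tables:

```python
# Sketch of HCC hierarchy logic: within a hierarchy group, a more severe
# captured HCC suppresses the less severe ones. The single relationship
# below mirrors the ulcer example in this article and is illustrative;
# the authoritative relationships come from the CMS V28 hierarchy tables.
HIERARCHIES = {
    383: [221],  # pressure ulcer with necrosis through to muscle trumps generic skin ulcer
}

def apply_hierarchies(captured_hccs):
    """Drop any captured HCC that a more severe captured HCC suppresses."""
    suppressed = set()
    for hcc in captured_hccs:
        suppressed.update(HIERARCHIES.get(hcc, []))
    return set(captured_hccs) - suppressed

print(apply_hierarchies({221, 383}))  # {383}: only the more severe code counts toward RAF
```

What no script can do is confirm that the documentation supports the severity of the code that survives, which is exactly where the Stage 2 versus Stage 4 problem comes from.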
Specificity gaps create audit exposure. V28 demands more clinical specificity than V24 ever did. "Heart failure" is not enough — you need systolic versus diastolic, the ejection fraction, the NYHA class. "CKD" is not enough — you need the exact stage. AI tools often default to the most general code when documentation is ambiguous, or worse, infer specificity that is not explicitly documented.
In a RADV audit, the question is simple: does the medical record support this exact code? If the AI suggested E11.65 (Type 2 diabetes with hyperglycemia) but the note only says "diabetes, sugars running high" without a documented glucose reading or A1C showing hyperglycemia, that code gets rejected. Your RAF score gets clawed back. At scale, those clawbacks add up to millions.
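The clawback math is plain addition and subtraction. A simplified example with made-up coefficients (real V28 coefficients come from the annual CMS rate announcement, and real scores also include demographic and interaction terms):

```python
# Simplified clawback arithmetic with made-up coefficients. Real V28
# coefficients come from the annual CMS rate announcement, and real
# scores also include demographic factors and interaction terms.
demographic_factor = 0.40
hcc_coefficients = {
    "diabetes_with_chronic_complications": 0.30,
    "heart_failure": 0.33,
}

raf_submitted = demographic_factor + sum(hcc_coefficients.values())

# The auditor rejects the diabetes code because the note does not support it:
raf_validated = raf_submitted - hcc_coefficients["diabetes_with_chronic_complications"]

print(round(raf_submitted, 2), round(raf_validated, 2))  # 1.03 0.73
```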
Context and clinical judgment cannot be automated. Here is an example from last month. A chart documented "history of breast cancer, currently on tamoxifen, no evidence of recurrent disease." An AI tool flagged this for HCC capture as an active malignancy. But "history of" with "no evidence of recurrent disease" is not an active cancer — it is a personal history code (Z85.3), which does not map to an HCC. A coder with clinical judgment reads that documentation and immediately knows the difference. The AI read "breast cancer" and "tamoxifen" and drew the wrong conclusion.
This is not a rare edge case. Cancer history versus active cancer. Resolved versus chronic conditions. Acute exacerbation versus stable chronic disease. These distinctions drive HCC capture decisions dozens of times per day, and AI gets them wrong often enough that you cannot trust it without human review.
The Enforcement Landscape Has Changed — and AI Is Not Ready for It
Let me lay out what is happening on the compliance side, because this is the context that makes the AI accuracy problem urgent rather than academic.
CMS scaled RADV audit staff from 40 to approximately 2,000 certified medical coders. That is not a typo. A 50x increase in audit capacity. They are now launching new RADV audits on a quarterly cadence.
Five overlapping RADV audit cycles are running simultaneously — Payment Years 2020 through 2024. If your organization submitted codes in any of those years, you are in the audit window right now.
The DOJ is actively investigating UnitedHealth Group — the nation's largest private health insurer — for adding high-value diagnoses that inflate risk scores without physician confirmation. Former employees have told DOJ attorneys they were pressured to add diagnoses without supporting lab tests. The investigation covers both civil and criminal liability.
The Kaiser settlement set the precedent. $556 million. The whistleblower who triggered the case was a medical coder who saw diagnoses being added without documentation support. The DOJ's message is clear: if your codes are not defensible, the consequences are real and they are escalating.
In this environment, "the AI suggested it" is not a defense. CMS does not care whether a human coder or an AI tool selected the diagnosis code. The question is whether the medical record supports it. Period.
The industry is shifting from what Fierce Healthcare called "coding intensity" (capture as many codes as possible) to "defensible accuracy" (can you prove every single code you submitted?). AI tools that were designed for the coding intensity era are liabilities in the defensible accuracy era.
What Smart Coders Are Actually Doing With AI
The coders on my team who are using AI effectively are not letting it code for them. They are using it as a pre-screening layer and then applying their own clinical judgment to every single code before it goes out.
Here is what that workflow looks like in practice:
AI does the first pass. The tool scans the chart and flags potential HCC-capturable diagnoses. This saves the coder from reading every line of a 40-page chart looking for conditions that may or may not be there.
The coder validates every flag. For each AI-suggested code, the coder reads the actual clinical documentation and asks: Is this condition actively addressed in this encounter? Does the documentation support the level of specificity this code requires? Is there MEAT evidence? Is this a current condition or a historical one?
The coder catches what AI misses. AI tools miss diagnoses that are documented narratively rather than in a structured problem list. A provider who writes "patient's cognitive decline has worsened, now requiring 24-hour supervision, family considering memory care placement" has documented dementia progression — but the AI might not flag it if the word "dementia" does not appear.
The coder rejects what AI gets wrong. Every AI suggestion that does not pass the coder's clinical judgment gets rejected. No exceptions. The coder is the last line of defense before that code hits CMS.
This is the workflow that survives a RADV audit. The AI makes the coder faster. The coder makes the AI accurate. Remove either one and you have a problem.
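If you were to wire that principle into a tool, the key property is that nothing reaches the submission file without an explicit coder decision attached. A toy sketch of that gate follows; the field names are my own, not any vendor's or CMS's schema:

```python
# Toy sketch of a submission gate: an AI suggestion never reaches the
# export file without an explicit coder decision. Field names are my
# own invention, not any vendor's or CMS's schema.
def ready_for_submission(suggestion):
    return (
        suggestion.get("coder_reviewed") is True            # a human looked at it
        and suggestion.get("coder_decision") == "accept"    # and explicitly accepted it
        and suggestion.get("meat_documented") is True       # with MEAT evidence found in the note
    )

queue = [
    {"code": "E11.22",  "coder_reviewed": True,  "coder_decision": "accept", "meat_documented": True},
    {"code": "C50.911", "coder_reviewed": True,  "coder_decision": "reject", "meat_documented": False},
    {"code": "I50.22",  "coder_reviewed": False},  # AI flagged it, not yet reviewed
]
print([s["code"] for s in queue if ready_for_submission(s)])  # ['E11.22']
```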
The Real Question Is Not "Will AI Replace Me?" — It Is "Am I Using the Right Tools?"
I understand the anxiety. When you read that AI processes thousands of charts per day and you process 50, the math feels threatening. But here is what those headlines leave out.
Those AI systems need human coders reviewing every output. CMS requires that any AI-assisted coding be reviewed and attested by a licensed professional before claim submission. The "thousands of charts per day" number is the AI's throughput before human review — it is not the finished product.
The coders who are genuinely at risk are the ones using outdated tools and outdated workflows — not because AI will take their jobs, but because they cannot keep up with the accuracy demands of V28, the speed of quarterly RADV audits, and the documentation specificity that defensible accuracy requires.
If you are still coding from printed reference guides, manually cross-referencing V24 to V28 mappings in a spreadsheet, or switching between six browser tabs to look up codes, check HCC mapping, calculate RAF scores, and verify hierarchies — you are spending your time on things a tool should handle so you can spend your judgment on things only a human can.
The coders who will thrive in 2026 and beyond are the ones who pair their clinical judgment with tools purpose-built for risk adjustment — tools that give them instant V28 mapping, RAF calculations, drug-to-diagnosis cross-references, and hierarchy checks so they can focus their expertise on the MEAT validation, specificity decisions, and clinical context interpretation that no AI can do reliably.
That is why I built HCC Buddy. Not to replace coders — that would be irresponsible given everything I have just described — but to make coders faster at the mechanical parts of the job so they have more bandwidth for the parts that actually require their brain. The Chrome extension puts HCC lookups, RAF calculations, and drug references directly inside the EHR so coders never leave their chart. The V28 mapping updates the same day CMS publishes changes. The tool handles the speed. The coder handles the accuracy.
What You Should Do Right Now
Whether you use HCC Buddy or not, here is what the current landscape demands from every risk adjustment coder.
Audit your own work against MEAT criteria. Pull five charts you coded last week. For every HCC you captured, find the MEAT evidence in the documentation. If you cannot point to specific monitoring, evaluation, assessment, or treatment for a diagnosis you submitted, that is a code that would fail a RADV audit. Fix your process before an auditor fixes it for you.
Learn V28 cold. The full V28 model is live. The 2,294 codes that were removed are not coming back. The hierarchy changes matter. If you are still coding from V24 instincts, your accuracy is slipping and you may not know it yet.
Understand what your AI tools are actually doing. If your organization is pushing AI-assisted coding, ask specific questions. What is the false positive rate? How does it validate MEAT? Can it distinguish active conditions from historical ones? If nobody can answer those questions, the tool is a compliance risk.
Do not let anyone — human or AI — pressure you to submit codes you cannot defend. The whistleblower in the Kaiser case was a coder. She saw diagnoses being added without support and she reported it. That is not just ethical: it is vindication worth $95 million. Your professional judgment is your most valuable asset. Do not let a productivity metric or an AI suggestion override it.
The era of coding intensity is over. The era of defensible accuracy is here. The coders who understand that distinction — and have the tools to operate within it — are the ones who will build long, successful careers in risk adjustment.
Try HCC Buddy free — 10 lookups per day, no credit card required.
Related Tools
ICD-10 Encoder
Every code shows V24 and V28 HCC mapping, RAF coefficients, and hierarchy information instantly.
RAF Calculator
Calculate risk adjustment factor scores in real time while you review charts.
Chrome Extension
HCC lookups, RAF calculations, and drug references directly inside your EHR browser.
Daniel Plasencia
Founder & Developer
Risk adjustment coding professional and software engineer who built the tool he wished existed, at a price coders can actually afford.

