
Why your LMS ledger should never be model-guessed

LLMs predict; they do not calculate. Loan ledger math must be deterministic and reproducible — here is what that means for your LMS and your next RBI audit.

A large language model predicts; it does not calculate. It is built to produce the most likely next token, not the arithmetically correct one — and on a loan ledger, “likely” is not good enough. The interest, the balance, the repayment schedule have to be exactly right, and they have to be right the same way every time you re-run them. That is a job for deterministic code, not for a model that guesses.

We say this as people who have spent years building the rails the lending industry runs on. The lesson is consistent: the ledger is the one place where you do not get to be approximately correct.

Key takeaways
  • An LLM predicts; it does not calculate. A probabilistic model cannot guarantee the same number twice, and ledger math has to be exactly right on every replay.
  • The ledger is the one place you cannot be approximately correct. Interest, balances, and schedules must be deterministic code, not model output.
  • Determinism is a structural property, not a feature. Double-entry and idempotency make the book reproducible by construction.
  • Governed AI belongs around the ledger, never inside it. AI advises, routes, and recommends; every ledger state-change is human-approved or policy-bounded, and logged.
  • An audit is a replay. A deterministic ledger lets a regulator re-run any figure and get an exact match — the basis of an audit-ready trail.

What a deterministic ledger is (and why it matters)

A deterministic ledger is one where the same inputs always produce the same output, derived by code rather than by a statistical model. Run the calculation today, run it again in three years during an audit, and the figures match to the paisa.
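To make "same inputs, same output" concrete, here is a minimal Python sketch of a repayment schedule computed in decimal arithmetic. The reducing-balance EMI formula, the half-even rounding mode, and the name `build_schedule` are assumptions for illustration; the article does not say which schedule convention a given LMS uses.

```python
from decimal import Decimal, ROUND_HALF_EVEN

PAISA = Decimal("0.01")


def build_schedule(principal, annual_rate_pct, months):
    """Hypothetical reducing-balance schedule, rounded to the paisa.

    Assumes a non-zero interest rate and equal monthly instalments.
    """
    monthly_rate = annual_rate_pct / Decimal("1200")
    growth = (1 + monthly_rate) ** months
    emi = (principal * monthly_rate * growth / (growth - 1)).quantize(
        PAISA, rounding=ROUND_HALF_EVEN
    )
    rows = []
    balance = principal
    for n in range(1, months + 1):
        interest = (balance * monthly_rate).quantize(PAISA, rounding=ROUND_HALF_EVEN)
        # The final instalment clears whatever rounding residue remains.
        principal_part = balance if n == months else emi - interest
        balance -= principal_part
        rows.append((n, interest, principal_part, balance))
    return rows


# Two independent runs over the same inputs.
first_run = build_schedule(Decimal("120000.00"), Decimal("12"), 12)
second_run = build_schedule(Decimal("120000.00"), Decimal("12"), 12)
```

Both runs yield identical rows, and the closing balance is exactly zero. Nothing in the function depends on the clock, on randomness, or on a model, which is the whole point.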

Two ideas sit underneath that guarantee. Double-entry is the accounting discipline where every transaction is recorded as at least one debit and one credit of equal total value, so the books always balance and nothing moves without a trace. Idempotency is the property that performing the same operation more than once has the same effect as performing it once, so a retried disbursal or a replayed event does not silently book a second entry. Together they mean the ledger is reproducible by construction, not by luck.
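Both properties are easy to state in code. The sketch below is a toy in-memory ledger, not a description of any real product: the `Ledger` and `Entry` classes, the account names, and the idempotency-key format are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Entry:
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")


class Ledger:
    """Hypothetical in-memory ledger; a real LMS would persist this."""

    def __init__(self):
        self.journal = []        # append-only list of (key, entries)
        self._seen_keys = set()  # idempotency keys already posted

    def post(self, idempotency_key, entries):
        # Idempotency: a retried request with the same key is a no-op.
        if idempotency_key in self._seen_keys:
            return False
        # Double-entry: total debits must equal total credits.
        debits = sum((e.debit for e in entries), Decimal("0"))
        credits = sum((e.credit for e in entries), Decimal("0"))
        if debits != credits:
            raise ValueError(f"unbalanced transaction: {debits} != {credits}")
        self.journal.append((idempotency_key, tuple(entries)))
        self._seen_keys.add(idempotency_key)
        return True

    def balance(self, account):
        # Debit-positive balance for one account, derived from the journal.
        total = Decimal("0")
        for _, entries in self.journal:
            for e in entries:
                if e.account == account:
                    total += e.debit - e.credit
        return total


ledger = Ledger()
disbursal = [
    Entry("loan_receivable", debit=Decimal("100000.00")),
    Entry("bank", credit=Decimal("100000.00")),
]
first = ledger.post("disb-001", disbursal)
retry = ledger.post("disb-001", disbursal)  # replayed event, same key
```

The first call posts the disbursal; the retry returns `False` and leaves the journal at one transaction. An unbalanced set of entries is rejected before anything is written.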

Why does this matter? Because a loan management system, or LMS — the system of record that tracks every loan from disbursal through repayment and closure — is the money-of-record. Everything downstream trusts it: the borrower’s statement, the collections workflow, the regulatory return, the auditor’s sample. If the ledger can drift, everything that reads from it inherits the drift. A deterministic ledger removes that whole class of risk before it starts.

There is a related point about how the arithmetic is done at all. Binary floating-point numbers, the default way most programming languages represent fractional values, are approximate by nature. As Goldberg’s canonical reference puts it, “this rounding error is the characteristic feature of floating-point computation,” and his example of daily compounding on $100 daily deposits at 6% ends the year off by $1.40 in single-precision binary float. That is why serious financial systems use decimal or integer arithmetic, never raw floats. It is a known, bounded problem with a known fix. We mention it because it sets the bar: if we are this careful about a rounding mode, we are certainly not going to hand the arithmetic to a probabilistic model.
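The difference is visible in a few lines of Python. This is a small illustration using the standard-library `decimal` module; the `daily_interest` helper, the 365-day year, and the half-even rounding mode are assumptions made for the example, not a statement of any lender's convention.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Binary float cannot represent most decimal fractions exactly.
float_sum = 0.1 + 0.2
decimal_sum = Decimal("0.1") + Decimal("0.2")

print(float_sum == 0.3)               # False
print(decimal_sum == Decimal("0.3"))  # True


def daily_interest(principal, annual_rate_pct, days_in_year=365):
    """One day's simple interest, rounded to the paisa (hypothetical helper)."""
    raw = principal * annual_rate_pct / Decimal("100") / Decimal(days_in_year)
    # The rounding step is explicit, so it can be documented and audited.
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)


interest = daily_interest(Decimal("100000.00"), Decimal("12"))
print(interest)  # 32.88
```

The rounding mode is a named, visible decision in the code rather than a side effect of the number representation.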

How LLMs fail at financial arithmetic — structurally, not randomly

It is tempting to think of an LLM getting a sum wrong as an occasional slip, the kind a human makes when tired. It is not. The failure is structural.

An LLM is a probabilistic next-token predictor: given the text so far, it emits the token it judges most likely to come next. Research into how these models actually do arithmetic finds that they rely on a “bag of heuristics” rather than a genuine algorithm. These are pattern fragments that happen to land on the right answer for familiar cases and fail unpredictably off-distribution, that is, when the inputs look different from the examples the model was trained on. In financial settings specifically, researchers have located “specific, reproducible hallucinations when performing arithmetic.” The model is not reasoning about carries and remainders; it is matching shapes.

To be precise about the claim: we are not saying LLMs are always wrong at finance. We are saying they are probabilistic predictors that cannot guarantee exact, reproducible arithmetic, which is exactly what money-of-record requires. And the reproducibility problem is sharper than most people expect. Even at temperature zero — the setting meant to make a model deterministic — one study found accuracy varied by up to 15% between identical runs, with no model producing identical outputs consistently. For an audit, that alone is disqualifying. You cannot certify a balance a regulator could re-derive differently tomorrow.

This is a distinct failure from floating-point rounding. Floating-point error is small and bounded; you can prove its limits and engineer around it. LLM arithmetic error has no such bound. The two should never be conflated — but they point the same direction: keep the ledger arithmetic in code you can reason about.

The two-layer model: deterministic core, governed AI

The resolution is not to ban AI from lending. It is to give AI the right job and keep it away from the wrong one.

We build on a two-layer model: a deterministic core and a layer of governed AI — AI that operates inside explicit bounds, with its actions logged and its authority capped. The ledger calculates. The AI advises, routes, drafts, and recommends. It can suggest a price, flag an anomaly, summarise a file, or propose a collections action. What it cannot do is post a ledger entry on its own authority. Every ledger state-change is triggered by a human-approved decision or a policy-bounded rule, never by a raw model inference — and every one of those state-changes is written to an audit trail.

That separation is what we mean by audit-by-design: the system produces its evidence as a byproduct of operating, rather than reconstructing it after the fact. AI-generated actions are logged in their own track, distinct from the ledger-of-record entries, so you can always tell what the model proposed apart from what the book actually did.
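One way to picture the gate and the two separate tracks is the sketch below. Every name in it (`Proposal`, `authorise`, `apply_to_ledger`, the waiver limit, the log structures) is hypothetical and chosen for illustration; it shows the shape of the rule, not an actual implementation.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    """An AI suggestion. It carries no authority to post."""
    proposal_id: str
    action: str
    amount: Decimal
    model_name: str


ai_action_log = []     # what the model proposed (advisory track)
ledger_audit_log = []  # what the book actually did (ledger-of-record track)

WAIVER_POLICY_LIMIT = Decimal("500.00")  # hypothetical policy bound


def record_proposal(p):
    ai_action_log.append(p)


def authorise(p, approved_by: Optional[str] = None):
    """Return the authority under which a proposal may post, or None."""
    if approved_by is not None:
        return f"human:{approved_by}"
    if p.action == "waive_late_fee" and p.amount <= WAIVER_POLICY_LIMIT:
        return "policy:waiver_limit"
    return None


def apply_to_ledger(p, approved_by=None):
    authority = authorise(p, approved_by)
    if authority is None:
        return False  # stays a proposal; nothing is posted
    ledger_audit_log.append(
        {"proposal_id": p.proposal_id, "action": p.action,
         "amount": p.amount, "authority": authority}
    )
    return True


small = Proposal("p-1", "waive_late_fee", Decimal("200.00"), "advisor-v1")
large = Proposal("p-2", "waive_late_fee", Decimal("5000.00"), "advisor-v1")
record_proposal(small)
record_proposal(large)

small_posted = apply_to_ledger(small)                       # within policy bound
large_posted = apply_to_ledger(large)                       # blocked
large_approved = apply_to_ledger(large, "credit_officer_17")  # human-approved
```

The model's output only ever lands in `ai_action_log`. A ledger record exists only when a policy rule or a named human supplied the authority, and that authority is stored with the record.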

We did not invent the deterministic-core, governed-AI pattern — it exists in the literature, and we are glad it does. Our claim is narrower and, we think, more useful: applying it rigorously to the LMS and loan origination domain for regulated lenders. That is where our history sits. The team behind Lokta built Apache Fineract, the open-source lending core a large part of the industry runs on, and Finflux, which served 60+ lenders across 15 countries and 12M+ borrowers before its acquisition by M2P in 2022. We have built ledger rails at scale before. Lokta is what that experience looks like rebuilt for an AI-native world — without surrendering the part that has to stay exact.

That discipline also matters under load. As more decisioning becomes agent-mediated, the system has to absorb agentic load on a lending API without ever letting throughput pressure leak into ledger correctness. The two-layer split is what lets the advisory layer scale while the core stays still and certain.

What this means for an RBI audit

For a CRO or compliance lead, the abstraction becomes very concrete at audit time.

An audit is, at heart, a replay. A regulator or auditor asks: show me how you arrived at this balance, this interest figure, this schedule, and prove it. With a deterministic ledger, that request is answerable directly. Double-entry means every figure traces to a balanced set of entries; idempotency means no retried or replayed event was booked twice; determinism means re-running the calculation produces the same number rather than a near-miss. The auditor can re-derive any line and get an exact match. That is audit replay, and it is only possible if the arithmetic was deterministic in the first place.
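In code, an audit replay can be as plain as re-deriving a figure from the recorded events and comparing it with what was reported. The event stream, event names, and amounts below are invented for the example.

```python
from decimal import Decimal

# Hypothetical event stream for one loan: (event type, amount in rupees).
events = [
    ("disbursal", Decimal("50000.00")),
    ("interest_accrual", Decimal("493.15")),
    ("repayment", Decimal("4500.00")),
    ("interest_accrual", Decimal("453.63")),
    ("repayment", Decimal("4500.00")),
]


def replay_outstanding(events):
    """Re-derive the outstanding balance from the events alone."""
    balance = Decimal("0.00")
    for kind, amount in events:
        if kind in ("disbursal", "interest_accrual"):
            balance += amount
        elif kind == "repayment":
            balance -= amount
        else:
            raise ValueError(f"unknown event type: {kind}")
    return balance


reported_balance = Decimal("41946.78")  # figure shown on the statement
replayed = replay_outstanding(events)
replayed_again = replay_outstanding(events)
print(replayed == reported_balance)  # True
```

The check is an exact equality, not a tolerance. If the reported figure had come from a process that could vary between runs, this comparison could not be relied on.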

This is not a new expectation in spirit. India’s Bankers’ Books Evidence Act, 1891 has long treated certified copies of a banker’s books as admissible evidence, with electronic records requiring certification of their integrity — the law has always assumed the book is a reliable, reproducible record. The modern engineering equivalent is the immutable, append-only ledger: one widely cited payments engineering team keeps journal and book entries “append-only and immutable once stored,” correcting mistakes by a new entry rather than an edit. You never overwrite history; you add to it. That is exactly how a ledger stays auditable.
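The correct-by-new-entry pattern can be sketched as follows. For brevity this toy journal uses single-sided signed amounts; a real double-entry ledger would reverse both legs of the transaction. The function names and account name are hypothetical.

```python
from decimal import Decimal

journal = []  # append-only: entries are added, never edited or deleted


def append(entry_id, account, amount, reverses=None):
    journal.append(
        {"id": entry_id, "account": account, "amount": amount, "reverses": reverses}
    )


def reverse(original_id, new_id):
    """Cancel an earlier entry by appending its negation, not by editing it."""
    original = next(e for e in journal if e["id"] == original_id)
    append(new_id, original["account"], -original["amount"], reverses=original_id)


def balance(account):
    return sum(
        (e["amount"] for e in journal if e["account"] == account), Decimal("0")
    )


append("e1", "fees_receivable", Decimal("250.00"))  # booked in error
reverse("e1", "e2")                                 # cancels e1, keeps it visible
append("e3", "fees_receivable", Decimal("205.00"))  # the correct figure
```

After the correction the journal holds three entries and the balance is 205.00. The mistaken entry is still there, linked to the entry that reversed it, so the history an auditor sees is the history that actually happened.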

The direction of regulation reinforces the principle. The RBI’s August 2024 draft circular on model risk in credit holds that model outcomes should be “consistent, unbiased, explainable and verifiable.” That is a draft, not binding law, so we treat it as where the regulator is heading rather than a present obligation. But “consistent” and “verifiable” are precisely the properties a probabilistic model cannot promise for arithmetic and a deterministic ledger guarantees by construction. Building to that standard now is the conservative bet.

Why “AI-native” is not the same as “trustworthy by design”

“AI-native” has become shorthand for a lot of things, and most of them are good. But it is worth being exact, because the label and the guarantee are not the same.

When most products say AI-native, they mean AI in origination and decisioning — pulling alternative data, scoring applicants, automating parts of the workflow. That is genuinely valuable, and we do it too. A deterministic core is a different kind of claim. It is not a feature in the origination flow; it is a structural property of the ledger layer that holds regardless of how clever the AI on top becomes.

The two are not mutually exclusive. We run both, deliberately. The point is the ordering: the deterministic core is what makes the AI layer trustworthy rather than merely capable. A model that recommends is useful. A model that can quietly alter the money-of-record is a liability dressed as innovation. Trustworthy-by-design means deciding, up front, which layer is allowed to touch the book. Only one of them should.

If you are evaluating any lending stack, that is the question worth asking past the demo: not whether there is AI in it, but where the ledger arithmetic happens and whether a regulator could re-run it. It is also the kind of question that shows up when you study what loan officers actually need from an LMS — they need numbers they can stand behind, not numbers a model produced once.

The takeaway

Put AI everywhere it earns its place (underwriting, pricing, collections, document parsing) and keep it out of the one place it cannot be trusted: the arithmetic of record. The ledger calculates; governed AI advises around it, inside policy bounds and with every action logged. That ordering is what makes the intelligence on top safe to use.

We are pre-customer and founder-led, building Lokta with design partners rather than selling past them. If this is the architecture conversation you want to have, start a conversation.

Frequently asked questions

Why can’t an LLM calculate loan interest or repayment schedules?

An LLM predicts the likely next token, not the arithmetically correct result. A rounding drift or sign error compounds across every later balance and produces a record that cannot be replicated. Ledger arithmetic must be executed by deterministic code, not generated by a model that guesses.

What does a deterministic ledger mean in an LMS?

It means the same inputs always produce the same output, derived by code rather than a statistical model. Any calculation can be re-run and will match exactly, to the paisa. That reproducibility is what makes the book auditable, and it is the property a probabilistic model cannot promise.

How does governed AI work alongside a deterministic ledger?

AI recommends, routes, and drafts in an advisory and automation layer; the ledger does the calculating. Every ledger state-change is triggered by a human-approved decision or a policy-bounded rule, never by a model inference — and each one is written to an audit trail.

What is the risk of an LLM touching financial calculations?

Arithmetic hallucinations in these models are structural, not random. Even a small error rate on balances breaks audit replay, can trigger non-compliance, and cannot be reconciled without a full recalculation. The cost is paid downstream, where it is hardest to find.

What makes an LMS audit-ready under RBI guidelines?

An LMS is audit-ready when its audit trails are complete and replayable: every ledger entry is deterministic and reproducible, so a regulator can re-run any calculation and arrive at the same figure. AI-generated actions are logged separately from the ledger-of-record entries, so what a model proposed is always distinguishable from what the book did.

What is the difference between an AI-native LMS and one with a deterministic core?

AI-native usually describes AI in origination and decisioning. A deterministic core is a structural guarantee about the ledger layer. The two are not mutually exclusive — we run both — but the deterministic core is what makes the AI layer trustworthy rather than merely capable.


Ashok Auty is the co-founder of Lokta and co-creator of Apache Fineract. He has spent two decades building loan ledgers — which is why he will happily let a model draft the memo, but never post the entry.

Ashok Auty

Co-founder of Lokta. Co-creator of Apache Fineract. 15+ years building lending infrastructure that's powered 25M+ borrowers across 15 countries.

