When Production Logs Become Your Best QA Asset

Most people who use banking applications never think about what happens behind the scenes when a transaction takes place. They click a button, money moves, and that’s it. But for the engineers responsible for making sure the transaction works reliably, the reality is more complicated, especially when bugs present themselves only under very specific conditions that the testing environments were never set up to reproduce.
Tanvi Mittal, a software quality engineering specialist with 15 years of experience in corporate financial systems, knows this problem intimately. He spent most of his career building and leading automated testing frameworks for large banking applications, and during that time he noticed a pattern that kept repeating. Bugs that passed through every layer of testing (development, QA, and staging) would appear in production, often in ways that were difficult to trace and expensive to fix.
One incident in particular shaped his thinking. A transaction bug went undetected through the entire testing cycle and was ultimately caught not by an automated alert or monitoring tool, but by a bank teller during an actual customer interaction. The first two transactions in a row succeeded. The third failed. It took days to diagnose. The bug surfaced only under that particular sequence of events, at that volume, and no lower environment had ever come close to replicating it.
“The data kept showing the same pattern,” Mittal said. “Bugs were shipping to production that we could not find in the lower environments. Not because the team wasn’t doing its job, but because the lower environments do not behave like production.”
That experience, and others like it, led him to think differently about where test coverage comes from. Requirements documents and handwritten test plans reflect what developers expect users to do. Production logs show what users actually do: every edge case, every unusual sequence, every failure mode no one thought to test. The question Mittal kept coming back to was why those logs couldn’t be used to drive test generation.
That question eventually became LogMiner-QA.
Creating Something That Wasn’t There
LogMiner-QA ingests raw application logs and uses AI and machine learning to automatically generate Gherkin test cases, a structured, human-readable format used by test frameworks such as Cucumber and Pytest-BDD, which can be integrated directly into CI/CD pipelines. The idea is to take the behavioral intelligence already embedded in production logs and put it to work for QA teams before the next release ships, rather than after something breaks.
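The core transformation, from an observed event sequence to a Gherkin scenario, can be sketched roughly as follows. The field names, step template, and sample events are illustrative assumptions, not LogMiner-QA's actual internals:

```python
# Illustrative sketch: rendering an observed log session as a Gherkin
# scenario. The event schema and step wording are assumptions.

def events_to_gherkin(session_id, events):
    """Render an observed session as a Given/When/Then scenario."""
    lines = [f"Scenario: Replay observed session {session_id}"]
    first, *rest = events
    lines.append(f'  Given a user performs "{first["action"]}"')
    for ev in rest[:-1]:
        lines.append(f'  When the user performs "{ev["action"]}"')
    last = rest[-1]
    expected = "succeeds" if last["status"] == "ok" else "fails"
    lines.append(f'  Then the "{last["action"]}" step {expected}')
    return "\n".join(lines)

# The three-transaction pattern from the teller incident, as a session:
events = [
    {"action": "transfer_funds", "status": "ok"},
    {"action": "transfer_funds", "status": "ok"},
    {"action": "transfer_funds", "status": "error"},
]
print(events_to_gherkin("S-1042", events))
```

A scenario generated this way can be dropped into a `.feature` file and bound to step definitions in Cucumber or Pytest-BDD like any hand-written one.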
Getting there took longer than Mittal expected, and the hard parts were less glamorous than the idea. The main difficulty was that production logs are not standardized. Every organization structures them differently. Field names vary; one system calls it “message,” another calls it “msg.” Timestamp formats vary. Some systems log at the event level, others at the session level. Building a tool that could reliably interpret logs across that kind of variability meant testing against a wide range of real log samples and iterating constantly.
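The normalization problem described above can be sketched with a small alias table plus tolerant timestamp parsing. The alias table and the accepted formats here are illustrative assumptions, not the tool's own configuration:

```python
# Sketch of dynamic field mapping: normalizing records from systems
# that disagree on field names and timestamp formats.
from datetime import datetime, timezone

# Map each source field name onto a canonical one (assumed aliases).
FIELD_ALIASES = {
    "msg": "message", "message": "message",
    "ts": "timestamp", "time": "timestamp", "@timestamp": "timestamp",
}

def parse_ts(value):
    """Accept epoch seconds or a couple of common string formats."""
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    for fmt in ("%Y-%m-%dT%H:%M:%S%z", "%d/%m/%Y %H:%M:%S"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError(f"unrecognized timestamp: {value!r}")

def normalize(record):
    """Rename fields to canonical names and parse timestamp values."""
    out = {}
    for key, value in record.items():
        canon = FIELD_ALIASES.get(key, key)
        out[canon] = parse_ts(value) if canon == "timestamp" else value
    return out

print(normalize({"msg": "login ok", "ts": "2024-03-01T10:00:00+0000"}))
```

In practice the alias table would be configurable per source, which is what makes the mapping "dynamic" rather than hard-coded.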
“Every time I tested a new log format, something broke,” he said. “That was the unglamorous part of building this: not the AI, but the raw, inconsistent reality of what logs look like in the wild.”
The tool handles this with dynamic field mapping and configurable inputs, supporting local JSON and CSV files as well as Elasticsearch and Datadog connectors. Under the hood, it uses NLP enrichment with transformer embeddings, clustering, and an Isolation Forest anomaly scoring engine to identify unusual behavior patterns. An LSTM-based journey analysis component reconstructs actual customer flows over time, revealing sequences, like that three-transaction failure, that manual test design consistently misses.
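The journey-analysis idea, stripped of the LSTM, can be illustrated with a plain-Python sketch: group events by session, order them by time, and surface the sequences that end in failure. Everything here, including the event schema, is a simplified assumption for illustration:

```python
# Simplified sketch of journey reconstruction: the real tool is
# described as using an LSTM; this uses a plain frequency count.
from collections import defaultdict, Counter

def failing_journeys(events, min_count=1):
    """Return event sequences (per session) whose final step failed."""
    sessions = defaultdict(list)
    for ev in events:
        sessions[ev["session"]].append(ev)
    patterns = Counter()
    for evs in sessions.values():
        evs.sort(key=lambda e: e["ts"])  # reconstruct temporal order
        trail = tuple((e["action"], e["status"]) for e in evs)
        if trail and trail[-1][1] != "ok":
            patterns[trail] += 1
    return [p for p, n in patterns.items() if n >= min_count]

events = [
    {"session": "A", "ts": 1, "action": "transfer", "status": "ok"},
    {"session": "A", "ts": 2, "action": "transfer", "status": "ok"},
    {"session": "A", "ts": 3, "action": "transfer", "status": "error"},
    {"session": "B", "ts": 1, "action": "login", "status": "ok"},
]
print(failing_journeys(events))
```

Even this naive version surfaces the "two successes, then a failure" shape of the teller incident; a sequence model earns its keep when the journeys are longer and noisier.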
The Privacy Problem Nobody Wanted to Talk About
When Mittal started talking to people about the tool, he got a reaction he expected but still had to prepare for carefully. The moment he mentioned production logs, people grew cautious. In a banking context, production logs contain real customer data: account numbers, transaction IDs, IBANs, behavioral patterns that can be linked to individuals. The idea of feeding those logs into any external tool raised immediate compliance concerns.
“Convincing people that it was safe to use production logs in the tool was as much a cultural challenge as a technical one,” he said.
His answer was to make privacy a core part of the architecture rather than an afterthought. LogMiner-QA scrubs logs before any analysis takes place, using pattern matching and spaCy-based named entity recognition to detect PII, redact sensitive fields, and replace them with stable tokens that preserve referential integrity without exposing the underlying data. A separate privacy layer adds statistical noise to aggregate metrics, making it computationally difficult to reconstruct individual customer behavior from the anonymized output. The tool runs on-premises, including in air-gapped environments, which means the logs never leave the organization’s infrastructure.
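The stable-token idea can be sketched with deterministic hashing: the same account number always maps to the same pseudonym, so cross-references within the logs survive scrubbing. The regex patterns and token scheme below are illustrative assumptions, not the tool's actual rules:

```python
# Sketch of PII scrubbing with stable tokens: sensitive values are
# replaced by deterministic pseudonyms so the same account always
# yields the same token. Patterns are simplified for illustration.
import hashlib
import re

IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{10,30}\b")
ACCOUNT_RE = re.compile(r"\b\d{10,16}\b")

def stable_token(value, prefix):
    """Deterministic pseudonym: same input, same token, every run."""
    digest = hashlib.sha256(value.encode()).hexdigest()[:10]
    return f"{prefix}_{digest}"

def scrub(line):
    line = IBAN_RE.sub(lambda m: stable_token(m.group(), "IBAN"), line)
    line = ACCOUNT_RE.sub(lambda m: stable_token(m.group(), "ACCT"), line)
    return line

a = scrub("debit 1234567890123 to DE44500105175407324931")
b = scrub("refund 1234567890123")
print(a)
print(b)
```

A production implementation would layer NER on top of the regexes to catch names and free-text identifiers that patterns alone miss, which is where the article's spaCy component comes in.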
For compliance teams in regulated industries, that last point usually ends the conversation quickly, and on a positive note.
Closing the Coverage Blind Spot
Mittal initially aimed LogMiner-QA at banking, the domain he knew best and where the stakes of a production failure are highest. But as the tool matured, he began to see the same basic problem across regulated industries: health care, insurance, financial services generally. The gap between what test suites cover and what happens in production is not unique to banking. It is structural, and it exists wherever test design is driven primarily by requirements documentation rather than by observation of user behavior.
The tool reflects that broader scope. Its compliance module generates test cases aligned with PCI DSS and GDPR. Its fraud detection module specifically targets velocity anomalies, high-value transaction flows, and failed-login sequences, behaviors that are nearly impossible to replicate in lower environments without real production data as a reference point. A CI mode outputs compact JSON summaries for pipeline gates, letting teams automatically fail builds when high-severity findings or anomaly thresholds are exceeded.
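A pipeline gate over that kind of compact JSON summary might look like the sketch below. The schema (severity labels, threshold names) is an assumption for illustration, not LogMiner-QA's actual output format:

```python
# Sketch of a CI gate: parse a findings summary and decide whether
# the build should fail. The JSON schema here is assumed.
import json

def gate(summary_json, max_high=0, max_anomaly_score=0.8):
    """Return 0 to pass the build, 1 to fail it."""
    summary = json.loads(summary_json)
    high = sum(1 for f in summary.get("findings", [])
               if f.get("severity") == "high")
    if high > max_high or summary.get("anomaly_score", 0.0) > max_anomaly_score:
        return 1
    return 0

demo = json.dumps({"findings": [{"severity": "high", "id": "F-1"}],
                   "anomaly_score": 0.3})
print(gate(demo))  # a pipeline wrapper would pass this code to sys.exit()
```

Returning a plain exit code keeps the gate usable from any CI system, since every runner already understands nonzero-means-fail.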
LogMiner-QA is open source under the MIT license and is available at github.com/77QAlab/LogMiner-QA. Mittal is looking for early adopters from banking and corporate QA teams who are willing to test it against real-world log variability, the same variability that made it so difficult to build. Planned additions include Splunk and CloudWatch connectors, a risk visualization dashboard, and more advanced fraud detection models.
For Mittal, the motivation behind it all remains the same as it was when a bank teller caught an error that an entire round of testing had missed. Production already knows what your test suite is missing. The question is whether you are paying attention.


