Writing

All posts, newest first.

No staging, so every deploy was an incident
A high-throughput factory-floor pipeline had no staging environment that matched production, so every deploy was validated only on local and dev and reached production effectively untested. The differences those two environments hid surfaced as incidents.
Jun 2, 2026·1 min read
- industry-retro
- deployment
A SELECT That Stopped a Factory
I ran an unverified SELECT against production. It took a library cache lock and stopped equipment at an overseas site, and I needed 15 to 20 minutes to realize my own session was holding the lock.
Jun 2, 2026·1 min read
- industry-retro
- database
Locking down a fleet of handheld devices with MDM
We used an MDM tool to remotely lock a fleet of factory-floor handhelds so only approved apps would run. It was a setup task more than an incident — the only correction is that the lockdown should have been part of the standard provisioning step, and at the time it wasn't.
Jun 2, 2026·1 min read
- industry-retro
- operations
A save button wired to a fixed column index
A save button in a data grid read from a hard-coded column index. When the column layout was reconfigured, it saved the wrong column.
Jun 2, 2026·1 min read
- industry-retro
- frontend
The TLS Certificate Nobody Was Watching Expired
A TLS certificate on the link between two systems expired without being renewed, and the integration went down in production. Nothing had been watching the expiry date.
Jun 2, 2026·1 min read
- industry-retro
- operations
When the table said done and the work hadn't happened
A transaction interface to an ERP system marked rows as success while the downstream operation had actually failed, and its logs were sometimes missing, sometimes duplicated. A short note on what broke and the one line that wasn't in the code.
Jun 2, 2026·1 min read
- industry-retro
- transactions
A protocol change on one side, an integration server that never heard about it
A piece of factory equipment switched its communication mode from push to request-response. The integration server in the middle was never changed to match, so the link broke in production.
Jun 2, 2026·1 min read
- industry-retro
- integration
Putting a handheld device on a locked factory network
Getting one handheld scanner onto the plant network meant clearing several access-control layers one at a time. None of it was written down anywhere.
Jun 2, 2026·1 min read
- industry-retro
- networking
Building an Android app outside the network it had to run inside
An Android app for an air-gapped factory network was built on a machine outside that network, then carried in by hand. When it failed inside, there was no way to see why from where the build happened.
Jun 2, 2026·1 min read
- industry-retro
- build
You Can't Enforce What You Can't Observe
I gave an AI coding assistant a rule: always record why you made each important decision. A few days later I checked: the rule wasn't being enforced, and it was the kind of rule that couldn't be. Written for readers with no AI-engineering background, with each idea bridged to backend primitives (DB constraints, middleware, IAM) from start to finish.
Jun 1, 2026·8 min read
Building a project log system for AI-pair-programmed work
A live build of the logging system that records every substantive decision Claude Code makes on my behalf. It covers the trigger layer that catches author-judgment slips, the cross-repo aggregation that pulls multiple satellite projects into one timeline, the tiered decision templates drawn from established frameworks, and the human-annotation surface that keeps me in the loop without forcing me to write every entry.
May 31, 2026·9 min read
Install a project log in any Claude Code session in 10 minutes
Copy-paste setup so Claude automatically records non-trivial fixes, decisions, and retros in your repo — without hallucinating.
May 27, 2026·16 min read