Typing was never the hard part

DevOpsDays Austin 2026

Ian Littman / @ian@phpc.social / @ian.im / @iansltx

Slides at ian.im/doda26

Warning: we're gonna talk about LLMs (#AI)

  1. What happened in the last ~year (not just at Anthropic)
  2. What this means for software dev, and how to provide value
  3. What this means for infra folks, and how to provide value

 

Slides are mine. LLMs didn't touch them.

We're talking about interacting with code.

LLMs can do other things, but they're out of scope for this talk.

If you have to use the tools,
they might as well be useful.

Other folks use LLMs more aggressively than me

I'm still ahead on productivity vs. manual, with a safety level I'm comfortable with.

LLMs are good for tasks with...

  1. Easy-to-describe acceptance criteria
  2. Toilsome implementation
  3. An existing pattern to follow (from context or training data)
  4. Straightforward (preferably automated) verification
  5. Small units of work (though this is no longer a hard constraint)
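
As a sketch of points 1 and 4 together: a made-up task (`slugify`, invented here for illustration) whose acceptance criteria are explicit enough to hand to an agent and cheap to verify automatically:

```python
import re

def slugify(title: str) -> str:
    """Lowercase the title, drop punctuation, join words with hyphens."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

# The acceptance criteria, written as an automated check the agent can run:
def test_slugify():
    assert slugify("Typing Was Never the Hard Part!") == "typing-was-never-the-hard-part"
    assert slugify("  DevOpsDays  Austin 2026 ") == "devopsdays-austin-2026"
```

The test doubles as the spec: if the agent can run it, it can verify its own work.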

 

Capabilities and costs subject to change without notice

History: Better models && better harnesses

Late May 2025

  • Claude Sonnet 4 released by Anthropic
  • Claude Code went GA
  • I started using Sonnet ~1 month later, then more heavily in August
  • Opus existed but was $$$
  • Sonnet was a situational toil-reduction pick
  • Already easy to find a model that was better at jq/bash/regex than I was

September 2025

  • Sonnet 4.5
  • Qwen3-Next
  • GLM 4.6

Late November 2025

  • Opus 4.5
    • Step change in capabilities
    • Significantly less expensive
    • I started throwing larger tasks at it around year-end
  • GLM 4.7 (in December)

February 2026

  • Sonnet 4.6 + Opus 4.6
  • GPT 5.3 Codex
  • Qwen3.5 (step change on local model ability)
  • GLM 5
  • Kimi K2.5 in January (I used it a bit)

March 2026

  • Claude Code instability
  • Claude plan rate limit revisions
  • Claude code review is more widely available
  • GPT-5.4

April 2026

  • Releases (non-exhaustive list)
    • Qwen3.6
    • Gemma 4
    • GLM 5.1
    • Kimi K2.6
    • DeepSeek V4
    • GPT-5.5
    • Opus 4.7
  • Rugpulls
    • GitHub Copilot
    • Claude Enterprise

A digression on subsidies

 

vs. paying APIs per-token

May 2026

The trend

  • Capabilities are improving quickly across frontier/open-weights/local
  • Tokens are increasingly likely to cost real money, utility-billed
  • Business models based on indefinite subsidies are time bombs
  • Doing less (token count) with less (cheaper/simpler models) matters
    • Cheaper closed models (same or different vendors)
    • Open-weights on not-your-machine
    • Open-weights on your machine
  • Deterministic code is way cheaper to run than inference

The hard part ≠ the bottleneck

In Software Dev

  • Writing tests
  • All-else-equal refactors
  • Dependency upgrades, including major version bumps
  • Language ports

In Software Dev

  • Attempting bugfixes (sometimes succeeding, sometimes not)
  • Solving the clean-slate problem
  • Justifying nuking temporary code
  • Spelunking in a complex codebase
  • Reviewing new code as another set of eyes

Software dev caveats

  • Won't use new/best practices unless told to
  • Will follow bad patterns if your code has them
  • Will churn code absent guard rails
  • Test quantity/quality can vary
    • "I can't test this so I'm going to write a parallel implementation and test that"
  • If something doesn't add up, there's a decent chance the model is hallucinating
    • This is less of a problem than it used to be

Software dev caveat remediation

  • You still have to (know how to) review the code (and the tests!)
  • Don't turn your architecture brain off
  • Don't be afraid of backing out diffs
  • Take responsibility for the work (because an LLM can't)

How do devs provide value?

  • Determining guard rails (e.g. static analysis/linting) for agents to use
  • Upskilling into product
    • Deciding what needs to be done
    • Making requirements explicit
    • Figuring out how to validate acceptance criteria
  • Reviewing output
  • Prioritizing useful change
  • Minimizing noisy churn
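
One way to wire up the guard-rails bullet above, sketched in Python. The specific commands (ruff, pytest) are stand-ins for whatever static analysis and test runner your project uses:

```python
import subprocess

# Stand-in commands; substitute your project's linter and test runner.
CHECKS = [
    ["ruff", "check", "."],
    ["pytest", "-q"],
]

def run_gate(checks=CHECKS) -> bool:
    """Run each check in order; stop and report on the first failure."""
    for cmd in checks:
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print(f"SKIP: {cmd[0]} not installed")
            continue
        if result.returncode != 0:
            print(f"FAIL: {' '.join(cmd)}\n{result.stdout}{result.stderr}")
            return False
    return True
```

The point is a single pass/fail command the agent can run after every change, so churn gets caught before a human ever reviews it.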

You're (sort of) a manager now

But the computer won't be offended if you do X% of the work yourself

Infra Disclaimers

  • "Good at" and "Caveats" are secondhand for me
  • There are tons of primary sources here

In Infra

  • Net-new IaC
  • Runbooks -> bespoke case-specific troubleshooting steps
  • CI (e.g. GitHub Actions) glue code
  • Import scripts
  • Refactors

In Infra

  • Troubleshooting (>= rubber duck)
  • Sniff-testing terraform plan output
  • State validation

Infra caveats

  • Over-abstraction rather than KISS
    • Particularly on updates
  • Stuff needs testing
    • Preferably automated testing
  • Tricky IAM -> suboptimal training data -> suspect output
  • Running bash is a lot more dangerous when it can hit prod
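
For that last caveat, a crude illustration of limiting what an agent can run. The prefixes are invented for this example, and string matching is not a real security boundary — sandboxes and scoped read-only credentials are the serious version:

```python
import shlex

# Illustrative read-only command prefixes; swap in your own, and don't
# treat prefix matching as an actual security boundary.
ALLOWED_PREFIXES = [
    ["terraform", "validate"],
    ["terraform", "plan"],
    ["kubectl", "get"],
]

def is_allowed(command: str) -> bool:
    """True only if the command starts with an allowlisted prefix."""
    tokens = shlex.split(command)
    return any(tokens[: len(prefix)] == prefix for prefix in ALLOWED_PREFIXES)
```

Here `terraform plan -out=tf.plan` passes while `terraform apply` and `kubectl delete` don't.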

Infra caveat remediation

  • IaC all the things
  • Steer the model when it's overcomplicating/getting lost
  • For local models, be prescriptive
  • Give the model a way to check its work
    • terraform validate
    • terraform plan
  • If an LLM can't access it, it can't break it
  • Ensure skills match docs
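
A sketch of the "check its work" loop, leaning on terraform's documented `-detailed-exitcode` contract (0 = no changes, 1 = error, 2 = changes pending); the wrapper itself is hypothetical:

```python
import subprocess

def classify_plan(returncode: int) -> str:
    """Interpret a `terraform plan -detailed-exitcode` exit status."""
    return {0: "clean", 1: "error", 2: "changes-pending"}.get(returncode, "unknown")

def check_work(workdir: str = ".") -> str:
    """Validate the config, then report whether a plan would change anything."""
    subprocess.run(["terraform", "validate"], cwd=workdir, check=True)
    plan = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return classify_plan(plan.returncode)
```

A "changes-pending" result after the agent claims it's done is exactly the kind of machine-readable signal it can act on without a human in the loop.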

How can Infra folks provide value?

  • Setting patterns
    • Abstraction tradeoffs
    • Infra/tool choices (not just for you)
    • Why rather than just what, described in a way your audience cares about
    • Separating useful change from noisy churn
  • Validating changes before they're made
  • Setting up deterministic artifacts

At the end of the day, a human has to...

  • Steer the ship
  • Wield the tools
  • Take responsibility

Questions? Find me here / @ian@phpc.social / @ian.im / @iansltx

Slides: https://ian.im/doda26

Thanks!