arocom uses AI productively in three areas of its own agency work: software development (AI agents for implementation, tests and reviews with human sign-off), content work (research and drafts, final editing by the author) and SEO/GEO analysis (automated audits via its own tool pipelines). The biggest measurable effect is in development; the bottleneck moves from implementing to describing precisely and reviewing. Since dropped: fully automatic translations without review, AI-generated stock images for expert content, and an internal knowledge chatbot.
Note: This post describes a state of affairs that may have evolved since publication. For a current assessment, get in touch.

Inside arocom: How We Work With AI Ourselves

Anyone who advises companies on AI should be able to show their own workshop. This post is deliberately concrete: which tools we use for what, what they measurably deliver, what went wrong. As of January 2026. The half-life of such reports is short; that is exactly why we write them. An update follows when something substantial changes.

Three areas carry our daily work: development, content work and SEO analysis. For each, we show a concrete example from recent weeks, plus the list of things we buried again after an honest reckoning.

Development: agents implement, humans decide

Our biggest lever is in development. AI agents take on bounded implementation tasks: building components, writing tests, carrying out refactorings, preparing code reviews. Our own website (440+ static pages, Astro plus a Drupal legacy) is mostly developed this way. The sequence is always the same: we describe the task, the agent implements it on a feature branch, quality gates run automatically (tests, linting, visual regression), and a human reviews the pull request.

What that looks like on a normal morning, using a real feature as an example (the times are rough values from our own observation, not a benchmark):

  • 9:00 am, describing the task, about 20 minutes. A short specification: what the new filter feature should do, which pages are affected, that DE and EN must move in lockstep.
  • 9:20 am, the agent implements, about 45 minutes. Feature branch, component, tests, translations. We work on other topics in the meantime.
  • 10:15 am, quality gates, about 10 minutes. Tests, linting, accessibility check and visual regression run automatically.
  • 10:30 am, review, about 30 minutes. A developer reads the pull request and finds an unhandled edge case; the agent corrects it.
  • Around 11:30 am: merge. Before the agents, we would have budgeted one to two person-days for the same feature. In the afternoon, the next ticket runs through the same pattern.
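
The rule behind this morning, that automated gates and human review must both say yes before a merge, can be sketched in a few lines of Python. This is a minimal illustration, not our actual tooling: the names `run_gates` and `ready_to_merge` are hypothetical, and the lambdas stand in for real test, linting, accessibility and visual regression runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    name: str
    passed: bool


def run_gates(gates: dict[str, Callable[[], bool]]) -> list[GateResult]:
    """Run every gate and record its outcome; a failing gate does not skip the rest."""
    return [GateResult(name, bool(check())) for name, check in gates.items()]


def ready_to_merge(results: list[GateResult], human_approved: bool) -> bool:
    """Merge only if all automated gates passed AND a human signed off."""
    return human_approved and all(result.passed for result in results)


# Hypothetical stand-ins: in a real setup each callable would start a tool run.
gates = {
    "tests": lambda: True,
    "linting": lambda: True,
    "accessibility": lambda: True,
    "visual_regression": lambda: True,
}

results = run_gates(gates)
print(ready_to_merge(results, human_approved=False))  # green gates alone are not enough
print(ready_to_merge(results, human_approved=True))
```

The point of the sketch is the `and`: green gates never replace the review, and an approving reviewer never overrides a red gate.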

The honest verdict: for clearly delineated tasks, the agents are faster and more thorough than we are. They forget no test cases and no translation. For fuzzy tasks, they produce confident nonsense. The bottleneck moves from writing the code to describing the problem precisely and to the review.

Calculated over the year, we estimate the effect at roughly twice the number of completed tasks per week. That is an order of magnitude from our own records, not a benchmark, and it spreads unevenly: routine work accelerates strongly, conceptual work barely. We deliberately do not hand agents architecture decisions, security-critical changes, or anything we cannot yet describe clearly ourselves. A fuzzy task does not get sharper through an agent, only implemented incorrectly faster.

Content and SEO: the draft is AI, the sender is human

In content work, our guiding principle applies literally: research, structural proposals and first drafts come from the AI. The final version is the responsibility of the author; for this text, that is me. Metadata, alt texts and raw translation drafts run through the same workflows we set up for clients in Drupal.

The most concrete example is this blog itself. The workflow for every post: research agents collect sources and counter-positions, an agent builds a structural proposal and a first draft following our editorial guideline. Then the human takes over: cutting, checking numbers, adding our own project experience, straightening the tone. Rarely does more than the structure survive from the first draft unchanged. This text came about the same way, and I approved it.

The research agents do not deliver finished truth but material: a source list with counter-positions, open questions, a first outline. Our rule on this is strict: no post goes out whose numbers and claims the named author has not checked personally. That costs one to two hours per text and is the part we will not automate.

For SEO and GEO analysis, we built our own pipelines: automated site audits, Search Console trends, competitor monitoring in AI answers. What used to be a consulting day is now a report at the push of a button. The consulting day now sits in the interpretation and the measures. One example: the weekly visibility report across our client projects used to take half a day of manual work. Today it runs automatically and ends in half an hour of human interpretation. The building blocks are the same agents as in development, just with different data sources: Search Console, web analytics and the AI search systems themselves.
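
The split between an automated report and human interpretation can be sketched roughly like this. It is a simplified illustration under stated assumptions, not our pipeline: the function `flag_changes`, the 20 % threshold and the click numbers are invented for the example, and a real report would read its figures from Search Console or web analytics exports.

```python
def flag_changes(
    previous: dict[str, int],
    current: dict[str, int],
    threshold: float = 0.2,
) -> list[tuple[str, float]]:
    """Return (project, relative change) for projects whose clicks moved by more than threshold."""
    flagged = []
    for project, before in previous.items():
        if before == 0:
            continue  # no baseline to compare against
        after = current.get(project, 0)
        change = (after - before) / before
        if abs(change) > threshold:
            flagged.append((project, round(change, 2)))
    return flagged


# Made-up sample numbers, purely illustrative.
last_week = {"client-a": 1000, "client-b": 400, "client-c": 250}
this_week = {"client-a": 1050, "client-b": 280, "client-c": 330}

for project, change in flag_changes(last_week, this_week):
    print(f"{project}: {change:+.0%} -> needs human interpretation")
```

The script only decides where a human should look. Why a project moved, and what to do about it, stays in the half hour of interpretation.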

What we dropped again

Transparency also means naming the failures. The list is short, but every item cost us time or money before we cut it:

  • Fully automatic translations without review. The raw quality is impressive, but technical terminology and tone reliably tip into the generic without human editing. Today: raw draft automatic, sign-off human. What we learned: check quality where it tips, at technical terms and tone, not at grammar.
  • AI-generated images for expert content. They looked like AI and contributed nothing. We returned to curated stock photos and our own graphics. The lesson: an image only contributes when it shows something real, so we now curate photos as carefully as texts.
  • The "automate everything" reflex. An internal chatbot for project knowledge delivered less than the simple discipline of keeping knowledge in structured Markdown documents that humans and our AI tools read alike. The bottleneck was never querying knowledge but maintaining it. Sometimes the best AI measure is a tidy file system.
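
If maintenance is the bottleneck, a small check for stale notes can help more than another query interface. The following is a minimal sketch under assumptions: the `last_reviewed:` line is a hypothetical convention, not a description of how our documents are actually structured, and the 90-day limit is arbitrary.

```python
from datetime import date
from typing import Optional


def last_reviewed(markdown_text: str) -> Optional[date]:
    """Find a line like 'last_reviewed: 2025-09-01' and parse its ISO date."""
    for line in markdown_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "last_reviewed":
            return date.fromisoformat(value.strip())
    return None


def is_stale(markdown_text: str, today: date, max_age_days: int = 90) -> bool:
    """A note counts as stale if it has no review date or the date is too old."""
    reviewed = last_reviewed(markdown_text)
    return reviewed is None or (today - reviewed).days > max_age_days


# Hypothetical example note.
note = """# Deployment checklist
last_reviewed: 2025-09-01

Steps for releasing the website.
"""

print(is_stale(note, today=date(2026, 1, 15)))
```

Because the note stays plain Markdown, the same file remains readable for humans and for AI tools; the check only adds a reminder that someone has to look at it again.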

What you can take away from our workshop

Behind all three areas sits the same pattern: AI moves the bottleneck from executing to deciding. Implementing, translating and analysing have become fast; describing precisely, checking and taking responsibility remain human work, and that is exactly where we invest. This shift is also why this report carries an expiry date: the tools change faster than the rules of the game, so we document above all the rules.

If you want to find out where AI holds up in your own daily work: start with a bounded, well-describable process, measure before and after, and keep sign-off with a human. How that becomes a strategy instead of tinkering is described in our post on AI strategy for mid-sized companies.

Want to know what these topics mean for your company? The Future Check shows you the biggest levers within 2–4 weeks.
