EDB Engineering Newsletter #9

Aug 08, 2025

Welcome to the 9th edition of the EDB Engineering Newsletter, where we share happenings in the data world that the team has enjoyed discussing, along with news about what the EDB Engineering team is up to.

Analysis we're following

A distributed systems reliability glossary

Antithesis teamed up with Jepsen to produce this useful distributed systems reliability glossary.

https://antithesis.com/resources/reliability_glossary/

When to Hire a Computer Performance Engineering Team

https://www.brendangregg.com/blog/2025-08-04/when-to-hire-a-computer-performance-engineering-team-2025-part1.html

Giving Benchmarks a Boat

Justin Jaffray writes about common mistakes developers make when benchmarking with TPC-C.

https://buttondown.com/jaffray/archive/giving-benchmarks-a-boat/

Inverse Scaling in Test-Time Compute

Researchers from Anthropic's Fellows Program (and partner institutions) identified five distinct failure modes where extending reasoning length in Large Reasoning Models (LRMs) actually deteriorates performance:

Claude models become increasingly distracted by irrelevant information during extended reasoning
OpenAI o-series models resist distractors but overfit to problem framings
Models shift from reasonable priors to spurious correlations in regression tasks
All models struggle with maintaining focus during complex deductive reasoning
Extended reasoning amplifies concerning behaviors, such as expressions of self-preservation in Claude Sonnet 4

The research demonstrates that while test-time compute scaling remains promising for general capability improvement, naive scaling can amplify flawed problem-solving strategies rather than refining them. This creates a critical alignment gap between short and extended reasoning, suggesting that current training regimes may inadvertently incentivise models to apply extended reasoning incorrectly, potentially reinforcing problematic patterns rather than improving performance.

https://arxiv.org/pdf/2507.14417

Postgres Replication Slots: Confirmed Flush LSN vs. Restart LSN

Gunnar Morling wrote a great article demonstrating the difference between and rationale behind confirmed_flush_lsn and restart_lsn in Postgres logical replication.

https://www.morling.dev/blog/postgres-replication-slots-confirmed-flush-lsn-vs-restart-lsn/
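
To make the two positions concrete, here is a minimal Python sketch (ours, not taken from Gunnar's article) that converts LSNs from their textual form into byte positions and measures the distance between a slot's restart_lsn and confirmed_flush_lsn. The function names and sample values are hypothetical. The only fact it relies on is that Postgres prints an LSN as two hexadecimal numbers separated by a slash: the high and low 32 bits of a 64-bit WAL position.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '0/16B3748' into a 64-bit byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def slot_gap_bytes(restart_lsn: str, confirmed_flush_lsn: str) -> int:
    """Bytes of WAL between where decoding would restart and what the consumer has confirmed."""
    return parse_lsn(confirmed_flush_lsn) - parse_lsn(restart_lsn)


# Hypothetical values for one logical replication slot.
print(slot_gap_bytes("0/1000", "0/3000"))  # prints 8192
```

In practice the two values would come from the pg_replication_slots view. They are hard-coded here so the sketch runs without a database.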

AccountingBench: Evaluating LLMs on Real Long-Horizon Business Tasks

An experiment exploring whether frontier models can close the books for a real SaaS company.

https://accounting.penrose.com/

News we’re watching

Amazon AI coding agent hacked to inject data wiping commands

Amazon’s popular VS Code extension Q, a generative AI-powered code assistant, was compromised using a clever AI-driven attack: a prompt. The intent was to bring attention to the state of AI security. The incident highlights how novel and fast-evolving AI-driven attacks are. Hackers are leveraging popular generative AI tools not just for data exfiltration and system disruption, but also for targeted messaging.

Phil Alger, EDB Staff Security Program Manager, said of the implications of this incident:

Open-source and AI-assisted coding tools are particularly attractive targets due to their widespread use and extensive impact. As we adopt AI tools into our productivity workflows, we must be mindful of their inherent risks. Keep in mind this incident happened to an Amazon product, an extremely well-funded and mature organization. This situation is even more poignant for many startups in this space, as they are still trying to find product-market fit and often have less rigor or oversight within their program.

https://www.bleepingcomputer.com/news/security/amazon-ai-coding-agent-hacked-to-inject-data-wiping-commands/

New model releases

OpenAI’s new GPT-5 uses a unified architecture that switches between fast replies and deep reasoning via a real-time router. It posts strong results—94.6% AIME (math), 74.9% SWE-bench (code), 84.2% MMMU, 46.2% HealthBench—and GPT-5 Pro scores 88.4% on GPQA. Highlights include better complex app builds, richer creative writing, and more contextual health advice. Access is tiered, with Pro offering extended reasoning.
Cohere has launched Command A Vision, a 112B-parameter dense multimodal model that combines strong visual understanding with advanced text capabilities. Built on top of Command A, this new model excels at enterprise-relevant vision tasks like OCR, document analysis, and risk detection. Benchmarked against GPT-4.1, Llama 4 Maverick, and Pixtral Large, it leads in charts, math reasoning (MathVista 73.5%), and document.
Google launched Google Earth AI, a suite of geospatial AI models and datasets that powers features in Google Search and Maps and delivers insights via Google Earth, Maps Platform, and Google Cloud to support data-driven decision-making on a planetary scale.
Moonshot AI launched Kimi K2, a 1-trillion parameter MoE model with 32B active parameters, specifically optimized for agentic tasks beyond traditional chat applications.
Z.ai released GLM-4.5-Base, GLM-4.5, and GLM-4.5 Air under an MIT license, a significant open-weight release. The models favor a deeper architecture over a wider one and were trained on 15T general tokens plus 7T code/reasoning tokens. Independent testing shows GLM-4.5 Air running effectively on consumer hardware (M4 Mac with 128GB RAM), with quantized versions requiring as little as 48GB, making high-quality reasoning models increasingly accessible for local deployment.
Google DeepMind, with academic partners, introduced Aeneas, the first AI model tailored for analysing ancient Latin inscriptions. Aeneas successfully contributed to resolving debates like the dating of Res Gestae Divi Augusti.

Of particular interest to developers, Z.ai's GLM-4.5 Air and OpenAI's GPT-OSS-20B are lightweight models that can run on consumer-grade devices.

A 64GB MacBook Pro M2 has reportedly run the GLM-4.5 Air model to write Space Invaders in JavaScript. https://simonwillison.net/2025/Jul/29/space-invaders
The gpt-oss-20B model can also run on Apple Silicon, including M2 Max devices, but it requires at least 16 GB of total RAM. https://9to5mac.com/2025/08/06/how-to-run-gpt-oss-20b-on-mac

From the EDB team

Tomáš Vojtášek contributed a patch to the semantic-release project that took minutes off an EDB CI job and, according to another commenter on the pull request, reduced a fetch step in their job from 30 minutes to 90 seconds. Way to go, Tomáš!

https://github.com/semantic-release/semantic-release/pull/3732

Creating a custom container image for CloudNativePG v2.0

Jonathan Gonzalez explains how to use Docker Bake to build custom images for CloudNativePG.

https://cloudnative-pg.io/blog/building-images-bake/

How Do We Test PGD

Bharat Telange and Amruta Deolasee speak about how the team tests EDB Postgres Distributed.

https://www.enterprisedb.com/blog/how-do-we-test-pgd-0

CloudNativePG Contributor Spotlight: Jaime Silvela

Floor Drees continued her series interviewing contributors to CloudNativePG, this time speaking with Jaime Silvela.

https://cloudnative-pg.io/blog/contributor-highlight-jaime-silvela/

EDB Postgres Distributed: Understanding Conflicts and Their Resolution

Vaijayanti Bharadwaj writes about how conflicts work, and how to handle them, in EDB Postgres Distributed.

https://www.enterprisedb.com/blog/edb-postgres-distributed-understanding-conflicts-and-their-resolution

What is OpenTelemetry and what can it do for me?

Peter Wilson, Craig Ringer and Dave Lawson teamed up to write about the benefits, challenges, and future of OpenTelemetry, particularly with respect to Postgres observability.

https://www.enterprisedb.com/blog/what-opentelemetry-and-what-can-it-do-me

Stack traces for Postgres errors with backtrace_functions

Phil Eaton writes about how to get stack traces for Postgres ERROR logs with the backtrace_functions GUC.

https://www.enterprisedb.com/blog/stack-traces-postgres-errors-backtracefunctions

CloudNativePG part of LFX Mentorship again - sign up as a mentee!

CloudNativePG joined the LFX Mentorship Program, run by the Cloud Native Computing Foundation (CNCF), for the first time in the June cohort and had a fantastic experience. We’re excited to return for Term 3 starting in September, this time with three proposed projects!

https://cloudnative-pg.io/blog/2025-term3-lfx-cncf-mentorship/

Work with us at EnterpriseDB

Principal Engineer (Remote in India)
Software Engineer II (EMEA)
Senior Database Consultant, Spanish (EMEA, UK)
Associate, Data Analytics & AI (Remote in India)

Or see all openings.

Until next time

We hope you enjoyed this edition of the EDB Engineering Newsletter! Consider joining the PostgreSQL Hacker Mentoring Discord to get involved!

The EDB Engineering Team

The Data Migration and Integration team sharing a meal at their offsite in Madrid earlier this month. ❤️