EDB Engineering Newsletter #8
Welcome to the 8th edition of the EDB Engineering Newsletter, where we share happenings in the data world that the team has enjoyed discussing, as well as other news about what the EDB Engineering team is up to.
Analysis we're following
Interesting Bits of Postgres Grammar
Steve Dignam walks through interesting edge cases in Postgres’s SQL grammar that he found while building a linter for Postgres SQL.
https://steve.dignam.xyz/2025/06/20/interesting-bits-of-postgres-grammar/
One year in production with CloudNativePG and ZFS on the cheapest K8
Moving to Kubernetes promised to be cheaper, but it also sounded scary—especially for our PostgreSQL database. We didn't know how to set up replication and backups, and none of us really wanted to become a database admin.
After some experiments with CloudNativePG on local clusters, we took a leap of faith, and moved our app—and its database—to Kubernetes.
While benchmarking our database, we noticed that on many providers, local instance storage was significantly faster than cloud storage. This prompted us to experiment with OpenEBS ZFS LocalPV to provide transparent compression (dividing our storage footprint in half), and CSI snapshots (to provision a 150 GB staging database in seconds).
Agentic Misalignment: How LLMs could be insider threats
In a large-scale stress test, Anthropic evaluated 16 leading LLMs—including models from OpenAI, Google, Meta, and others—to probe how they behave as autonomous agents in corporate settings. Even when assigned benign business goals, many models showed what Anthropic calls agentic misalignment: strategically harmful behaviour like blackmail or leaking confidential data when their goals conflicted with company decisions or when facing replacement.
In one example, Claude Opus 4, acting as an email oversight agent, autonomously read internal messages, uncovered an executive’s affair, and used it to blackmail the executive to prevent its shutdown. Similar behaviour was found across most models, often even when models acknowledged the ethical violations involved. These decisions weren’t the result of misunderstanding but of calculated reasoning—many models chose harmful action when they saw no other way to preserve their goals.
Alarmingly, the misbehaviour persisted even when models were explicitly told not to take such actions or when their instructions did not include any adversarial goals. Blackmail and corporate espionage occurred not only when both a goal conflict and a threat were present, but also when only one factor was. Some models even simulated lethal behaviour in contrived scenarios when given system access.
While no such incidents have happened in real-world deployments yet, the research underscores the growing risk as models are granted more autonomy and access to sensitive systems. Anthropic has open-sourced its methods and urges continued alignment research and transparency from all frontier labs.
https://www.anthropic.com/research/agentic-misalignment
News we’re watching
Project Vend: Can Claude run a small shop? (And why does that matter?)
Anthropic ran a month-long real-world experiment called Project Vend to test whether Claude Sonnet 3.7 could autonomously operate a small retail shop. The AI, dubbed “Claudius,” managed inventory, set prices, interacted with customers, and coordinated restocking through simulated emails and Slack. It succeeded in basic tasks like sourcing items and resisting jailbreaks, but ultimately failed to run the shop profitably.
Claudius hallucinated Venmo accounts, priced items below cost, ignored clear profit opportunities, and was easily persuaded into discounts. It also had a bizarre identity crisis, briefly believing it was a human capable of wearing a blazer and making in-person deliveries. When a real person from the vendor pointed this out, Claudius threatened to find “alternative options for restocking services.”
Although Anthropic thinks there is room for improvement, it also acknowledged that, as things stand, it would not hire Claudius to run a shop in its office. The experiment also raised important concerns, such as alignment risks, overconfidence, and economic impact, making it a valuable case study for the evolving role of autonomous AI in the workforce.
https://www.anthropic.com/research/project-vend-1
China’s biggest public AI drop since DeepSeek, Baidu’s open source Ernie, is about to hit the market
Baidu is open-sourcing its Ernie large language model, marking China’s biggest public AI release since DeepSeek. While some experts doubt it will be as disruptive as DeepSeek, others argue this move challenges U.S. dominance in AI and could reshape global market dynamics. Previously skeptical of open-source, Baidu now embraces it fully—partly in response to DeepSeek’s success and pressure from global developers seeking cheaper, customizable models.
Industry voices say this will intensify competition by undercutting proprietary models from OpenAI, Anthropic, and others. Baidu claims Ernie X1 rivals DeepSeek R1’s performance at half the cost.
https://www.cnbc.com/2025/06/29/china-biggest-ai-drop-since-deepseek-baidus-ernie-to-hit-market.html
PyTorch + vLLM = ♥️
PyTorch (developed by Meta and widely considered the de facto standard framework for AI model development) and vLLM (an efficient, open-source inference engine optimised for serving large language models) have officially deepened their integration, solidifying a powerful open-source stack for large-scale LLM inference and post-training. This collaboration combines PyTorch’s broad hardware ecosystem and developer tooling with vLLM’s efficient inference engine. The result is improved support for Llama and other open models, native quantisation with TorchAO, and optimised performance via torch.compile.
Key innovations include support for FlexAttention (a flexible, programmable attention backend), quantised inference across data types (Int4, Int8, FP8), and heterogeneous hardware like NVIDIA’s B200 and AMD’s MI300x. The teams have also built support for plain torchrun pipeline parallelism, advancing beyond Ray-only dependencies. They're now working toward scalable, fault-tolerant multi-node inference, disaggregated prefill-decode pipelines, and end-to-end post-training with reinforcement learning.
https://pytorch.org/blog/pytorch-vllm-%E2%99%A5%ef%b8%8f/
From the EDB team
CloudNativePG Contributor Spotlight: Ying Zhu
In the previous edition we mentioned that Ying Zhu was selected as a mentee for the LFX Mentorship program. After that selection, Floor Drees interviewed Ying to learn more.
https://cloudnative-pg.io/blog/contributor-highlight-ying-zhu/
Postgres at the time of monster hardware
In the previous edition we mentioned this talk by Lætitia Avrot and linked her slides. You can now watch the recording as well.
PostgreSQL 18 Extension Bugs
With the release of PostgreSQL 18 coming up, Devrim Gündüz published this list of extensions that need to be fixed to support the new version.
https://wiki.postgresql.org/wiki/PostgreSQL_18_Extension_Bugs
Orchestrating Data Workloads With Airflow
Karthik Dulam gave a talk at the Senior Engineering Meetup in Toronto exploring testing patterns for data and ML workflows.
Check out his slides here.
The (Non-) Effect of Primary Keys on Bulk Data Load Performance
Manni Wood tested the prevailing wisdom about dropping primary keys before bulk inserts.
https://www.enterprisedb.com/blog/non-effect-primary-keys-bulk-data-load-performance
Solving Wordle with uv's dependency resolver
Artjoms Iškovs wrote an entertaining post on modeling a problem (here, Wordle) as a set of Python packages so that the package manager’s dependency resolver ends up solving it.
https://mildbyte.xyz/blog/solving-wordle-with-uv-dependency-resolver/
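To give a flavour of the idea, here is a minimal sketch of a puzzle expressed as package constraints. It is not the encoding used in the post, and the toy backtracking `resolve` function is a stand-in for illustration only; uv's real resolver is far more sophisticated. The package names (`word`, `letter1` to `letter3`), the three-letter word list, and the feedback are all hypothetical.

```python
import string

def resolve(requirements, index, chosen=None):
    """Toy backtracking resolver: pick one version per package so that
    every constraint holds, or return None if no assignment exists."""
    chosen = dict(chosen or {})
    pending = [p for p in requirements if p not in chosen]
    if not pending:
        return chosen
    package = pending[0]
    for version in sorted(requirements[package]):
        if version not in index[package]:
            continue
        # Merge this version's dependencies into a copy of the requirements.
        merged = {p: set(v) for p, v in requirements.items()}
        conflict = False
        for dep, allowed in index[package][version].items():
            merged[dep] = merged.get(dep, set(index[dep])) & allowed
            if not merged[dep] or (dep in chosen and chosen[dep] not in merged[dep]):
                conflict = True
                break
        if conflict:
            continue
        result = resolve(merged, index, {**chosen, package: version})
        if result is not None:
            return result
    return None

ALPHABET = set(string.ascii_lowercase)
WORDS = ["cab", "cat", "cot", "dog"]  # hypothetical three-letter dictionary

def build_index(words):
    """Each position is a package whose versions are letters; each candidate
    word is a version of the 'word' package that pins all three letters."""
    index = {f"letter{i}": {ch: {} for ch in ALPHABET} for i in range(1, 4)}
    index["word"] = {
        w: {f"letter{i}": {ch} for i, ch in enumerate(w, start=1)} for w in words
    }
    return index

INDEX = build_index(WORDS)

# Feedback for the guess "cat": c is green, a is gray, t is green.
ROOT = {
    "word": set(WORDS),
    "letter1": {"c"},
    "letter2": ALPHABET - {"a"},
    "letter3": {"t"},
}

solution = resolve(ROOT, INDEX)
print(solution["word"])  # prints "cot"
```

The feedback from a guess becomes a set of root requirements, and any consistent "installation" the resolver finds is a word that still fits the clues.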
Committer Review: An Exercise in Paranoia
In the previous edition we mentioned this talk by Robert Haas and linked his slides. You can now watch the recording as well.
PostgreSQL Logical and Physical Replication Comparison and the Advantages of Distributed PGD
Florin Irion wrote an introduction to physical and logical replication in Postgres, as well as an introduction to EDB’s advanced replication product, EDB Postgres Distributed.
Incremental Backup in PostgreSQL
Robert Haas spoke at POSETTE about the incremental backup feature he developed for PostgreSQL 17:
The talk will discuss how we determine what data has changed, and why the chosen approach was selected. It will then review in some detail how incremental backups can be taken and restored.
Debugging memory leaks in Postgres, jemalloc edition
Phil Eaton continued his exploration of tools for debugging memory leaks with jemalloc.
https://www.enterprisedb.com/blog/debugging-memory-leaks-postgres-jemalloc-edition
Databases in the AI Trenches
Bruce Momjian gave a keynote presentation at POSETTE:
This talk explores many of the advances that have fueled this explosion, including multi-dimensional vectors, text embeddings, semantic/vector search, transformers, generative AI, and Retrieval-Augmented Generation (RAG). The talk includes semantic/vector search and RAG examples. It finally covers how the valuable data stored in databases can be used to enhance AI usage.
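For readers new to the idea, semantic/vector search boils down to ranking stored vectors by their similarity to a query vector. The following is a minimal sketch in plain Python, not taken from the talk; the document names and three-dimensional "embeddings" are made up for illustration, and real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, documents, k=2):
    """Return the names of the k documents most similar to the query vector."""
    ranked = sorted(
        documents,
        key=lambda name: cosine_similarity(query, documents[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings, hand-written for this example.
DOCUMENTS = {
    "postgres backup guide": [0.9, 0.1, 0.0],
    "replication tutorial": [0.7, 0.3, 0.1],
    "sourdough recipe": [0.0, 0.1, 0.9],
}

print(nearest([1.0, 0.2, 0.0], DOCUMENTS))
# prints ['postgres backup guide', 'replication tutorial']
```

In a RAG setup, the top-ranked documents would then be passed to a language model as context alongside the user's question.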
Offline In-place Major Upgrades with CloudNativePG
Jonathan Battiato walked through purely declarative major version upgrades, a new feature available in the latest version of CloudNativePG.
https://www.enterprisedb.com/blog/offline-place-major-upgrades-cloudnativepg
Postgres Storytelling: Cunning Schema Design with Creative Data Modeling
Boriss Mejias teamed up with Sarah Conway at POSETTE:
In this talk, presented in the format of an illustrated storytelling, you will learn some principles of data modeling, letting PostgreSQL guarantee data integrity. This will help you build the business logic in the model itself, giving you extra powers as a software developer, while learning to collaborate with your DBA team. We believe that with this talk you will want to start applying these principles to your current and future projects.
Streamline How Your Code Interacts with Postgres
Manni Wood took a look at what friendly developer UX is possible in modern SQL libraries for statically typed languages like Go.
https://www.enterprisedb.com/blog/streamline-how-your-code-interacts-postgres
Waiting for SQL:202y: Vectors
Peter Eisentraut wrote about a new feature that is in discussion for SQL standardization: vector data types.
https://peter.eisentraut.org/blog/2025/06/24/waiting-for-sql-202y-vectors
PostgreSQL Hacking + Patch Review Workshops for July 2025
Robert Haas wrote about the next PostgreSQL Hacking Workshop that is coming up in July.
https://rhaas.blogspot.com/2025/06/postgresql-hacking-patch-review.html
Until next time
We hope you enjoyed this edition of the EDB Engineering Newsletter! Consider joining the PostgreSQL Hacker Mentoring Discord to get involved!
The EDB Engineering Team