EDB Engineering Newsletter #1
Welcome to the first edition of the EDB Engineering Newsletter, where we share interesting links from the data world that the EDB Engineering team has enjoyed, along with other news about what the team is up to!
What we're reading and watching
Bufstream 0.1.0 from Jepsen
Jepsen's latest report covers Bufstream, a Kafka-compatible streaming database. Over the course of testing, Jepsen found and reported a number of bugs in Kafka's transaction protocols while using the Kafka Java client.
This is also interesting in light of Jepsen's Redpanda report from a few years ago, which discussed some of the misconceptions database users might have about transactions in the Kafka world.
https://jepsen.io/analyses/bufstream-0.1.0
From RAG to Knowledge Assistants by LlamaIndex
Agentic RAG is quickly gaining traction. LlamaIndex has released LlamaCloud along with other modules such as LlamaParse and LlamaDeploy. They claim their platform offers sophisticated parsing, chunking, indexing, and agentic workflows for building enterprise-ready advanced RAG in a few clicks. The video focuses on two capabilities that they present as their main differentiators:
1. Parsing: parsing combined with hierarchical indexing and supported by recursive retrieval, which they say reduces hallucinations.
2. Agentic flow: "Act according to the user query!" is the key message. Depending on the query, the system might generate embeddings and fall back to naive RAG, query a search engine, or build a chain of thought to reason through the question.
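To make "hierarchical indexing with recursive retrieval" concrete, here is a toy sketch. It assumes a tree where upper nodes hold section summaries and leaves hold raw text chunks, and it uses word overlap as a stand-in for embedding similarity. `Node`, `score`, and `recursive_retrieve` are hypothetical names for illustration only, not LlamaIndex APIs, and greedy descent is a deliberate simplification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a toy hierarchical index: a summary with children, or a leaf text chunk."""
    text: str
    children: list[Node] = field(default_factory=list)


def score(query: str, text: str) -> int:
    """Toy relevance: number of shared lowercase words (a stand-in for embedding similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def recursive_retrieve(query: str, node: Node) -> str:
    """Greedily descend from the root into the most relevant child until reaching a leaf chunk."""
    while node.children:
        node = max(node.children, key=lambda child: score(query, child.text))
    return node.text


doc = Node("annual report", [
    Node("finance revenue summary", [
        Node("revenue grew 10 percent in 2023"),
        Node("operating costs fell in 2023"),
    ]),
    Node("engineering roadmap summary", [
        Node("postgres upgrade planned for 2024"),
    ]),
])

print(recursive_retrieve("how much did revenue grow", doc))  # revenue grew 10 percent in 2023
```

The retriever only compares the query against a handful of summaries at each level instead of every chunk, which is the intuition behind narrowing the context handed to the LLM.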
On writing by Pat Helland
Pat Helland reflects on writing, quoting Jim Gray: "You are not writing enough."
SQL's FILTER clause by Modern SQL
Markus Winand encourages database vendors to support the FILTER clause for aggregate functions in SELECT. Postgres has supported it for a decade (since 9.4), and SQLite added support more recently, but few other databases support this useful clause.
https://x.com/ModernSQL/status/1854824811417027020
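A quick way to try FILTER is Python's built-in sqlite3 module, assuming the linked SQLite library is 3.30.0 or newer. The sketch below uses a hypothetical `orders` table and compares FILTER with the portable CASE workaround that databases lacking FILTER require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("paid", 100), ("paid", 50), ("refunded", 30), ("pending", 20)],
)

# FILTER restricts which rows each individual aggregate sees.
with_filter = conn.execute("""
    SELECT count(*) AS total,
           count(*) FILTER (WHERE status = 'paid') AS paid_orders,
           sum(amount) FILTER (WHERE status = 'paid') AS paid_amount
    FROM orders
""").fetchone()

# Portable equivalent: CASE yields NULL for non-matching rows, and aggregates skip NULLs.
with_case = conn.execute("""
    SELECT count(*) AS total,
           count(CASE WHEN status = 'paid' THEN 1 END) AS paid_orders,
           sum(CASE WHEN status = 'paid' THEN amount END) AS paid_amount
    FROM orders
""").fetchone()

print(with_filter)  # (4, 2, 150)
print(with_case)    # (4, 2, 150)
```

Both queries return the same result; FILTER simply states the intent more directly.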
CIOs are (still) closer than ever to their dream data lakehouse by Mike Feibus
Mike Feibus highlights how the entire analytics market has shifted over the past two to three years toward open data and open protocols, an industry that draws its power and utility from an explicitly multi-vendor, multi-technology-stack philosophy. Support for Iceberg has become table stakes, a necessary entry ticket for participating in this industry. The new battleground for differentiation and added value is now the metadata and governance layer.
Userland Disk I/O by Alex Miller
Alex Miller writes about how to achieve durability on disk through different APIs on Linux, macOS, and Windows.
https://transactional.blog/how-to-learn/disk-io
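As a rough illustration of the kind of sequence the post discusses, here is a POSIX-only Python sketch (Linux and macOS) of the common "write a temp file, fsync, rename, fsync the directory" pattern. `durable_write` is a hypothetical helper, not code from the post. The macOS branch assumes `fcntl.F_FULLFSYNC` is exposed by the platform's Python build, and Windows, which uses APIs such as FlushFileBuffers, is not covered.

```python
import fcntl
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data` and ask the OS to persist both the file and the rename."""
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may perform a short write, so loop until everything is written
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata to the device
        if hasattr(fcntl, "F_FULLFSYNC"):
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)  # macOS: also ask the drive to flush its write cache
    finally:
        os.close(fd)
    os.replace(tmp_path, path)  # atomic rename within a single POSIX filesystem
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry changed by the rename
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "state.bin")
    durable_write(target, b"checkpoint 1")
    with open(target, "rb") as f:
        print(f.read())  # b'checkpoint 1'
```

The directory fsync is the step most often forgotten: without it, the file contents may be durable while the rename that makes them visible under the final name is not.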
Artificial Intelligence, Scientific Discovery, and Product Innovation by Aidan Toner-Rodgers
For AI skeptics out there: this is an interesting, pragmatic look at how AI tools may affect the productivity of real, day-to-day work. It’s very long, but even skimming the introduction is worthwhile.
https://aidantr.github.io/files/AI_innovation.pdf
EDB at PGConf.EU
Videos are now out from our many speakers at PGConf.EU. Most editions of this newsletter will not include such a large section of videos, but PostgreSQL conferences are a big deal for EDB Engineering!
Postgres Over the Horizon: Anchor Points for the Future We Should Be Building Now by Peter Eisentraut
Predicting the future often requires a look into the past. In this keynote, we will revisit the 2013 PGConf.EU keynote, “PostgreSQL in 5 years - Expectations from the marketplace,” to see how far we’ve come and use those insights to project what the future holds for Postgres. As we look ahead to the next five and ten years, the focus will shift beyond transactional capabilities to encompass expanded roles for AI and advanced analytics workloads. The future of Postgres lies in its ability to meet the engineering demands of an AI-driven world. Join us as we outline the key anchor points that will shape the future of Postgres, helping developers and enterprises alike build resilient systems designed for tomorrow’s data landscape.
Sparta’s Dual Kingship and PostgreSQL Active-Active by Boriss Mejias
In this talk we will learn about the possibilities and challenges of a PostgreSQL cluster in which multiple writable nodes collaborate toward a common goal, and how we can apply lessons from classical Greek systems of government.
Incremental Backup by Robert Haas
In this talk, I'll discuss the incremental backup feature which I developed for PostgreSQL 17. The talk will discuss how we determine what data has changed, and why the chosen approach was selected. It will then review in some detail how incremental backups can be taken and restored, and why things work as they do. It will briefly touch on use cases for the feature and possible future work in this area.
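For readers new to the idea, here is a toy Python model of block-level incremental backup and reconstruction. In PostgreSQL 17 the real workflow relies on WAL summaries (enabled with `summarize_wal`), `pg_basebackup --incremental`, and `pg_combinebackup`. The dictionaries and the functions `take_incremental` and `combine` below are hypothetical stand-ins that only illustrate the concept, not how those tools are implemented.

```python
Blocks = dict[int, bytes]  # block number -> block contents for a single relation


def take_incremental(current: Blocks, modified: set[int]) -> Blocks:
    """Copy only the blocks that a WAL-summary-like set reports as modified since the prior backup."""
    return {blkno: current[blkno] for blkno in sorted(modified) if blkno in current}


def combine(full: Blocks, incremental: Blocks, relation_size: int) -> Blocks:
    """Rebuild the relation: changed blocks from the incremental, unchanged ones from the full backup."""
    return {
        blkno: incremental[blkno] if blkno in incremental else full[blkno]
        for blkno in range(relation_size)
    }


full_backup = {0: b"A0", 1: b"B0", 2: b"C0"}
current = {0: b"A0", 1: b"B1", 2: b"C0", 3: b"D1"}  # block 1 changed, block 3 was added
modified = {1, 3}

incremental = take_incremental(current, modified)
restored = combine(full_backup, incremental, relation_size=len(current))
print(incremental)          # {1: b'B1', 3: b'D1'}
print(restored == current)  # True
```

Recording the relation size alongside the changed blocks is what lets reconstruction handle relations that grew or were truncated between backups.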
Untangling the Web of PostgreSQL Permissions by Lætitia Avrot
Users, roles, and permissions in PostgreSQL - it sounds like a snoozefest, right? Wrong. This dull topic is a minefield of disasters waiting to happen. One wrong GRANT and suddenly your intern has DROP privileges on your production database. Oops.
In this talk, we'll navigate the treacherous waters of PostgreSQL's security model. We'll start with the basics - what's the difference between a user and a role anyway? (Spoiler: nothing, but don't tell anyone I told you that.) Then we'll dive into the nitty-gritty of permissions, from the obvious (SELECT, INSERT) to the obscure (TRUNCATE, anyone?).
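To illustrate the "users are just roles" point and how privilege inheritance flows through role membership, here is a toy Python model. It mirrors the joe/admin/wheel example from the PostgreSQL documentation, using the role-level INHERIT attribute; PostgreSQL 16 and later can also control inheritance per grant (`GRANT ... WITH INHERIT`), which this sketch ignores. `Role` and `effective_privileges` are hypothetical names, and privileges are plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Role:
    """Toy PostgreSQL role. A "user" is simply a role with the LOGIN attribute."""
    name: str
    login: bool = False
    inherit: bool = True  # mirrors the role-level INHERIT attribute (the default)
    privileges: set[str] = field(default_factory=set)
    member_of: list[Role] = field(default_factory=list)


def effective_privileges(role: Role) -> set[str]:
    """Privileges usable without SET ROLE: direct grants plus those inherited via membership.

    Inheritance stops at any role whose `inherit` is False. Postgres forbids circular
    memberships, so this recursion assumes the membership graph is acyclic.
    """
    result = set(role.privileges)
    if role.inherit:
        for parent in role.member_of:
            result |= effective_privileges(parent)
    return result


wheel = Role("wheel", inherit=False, privileges={"DELETE ON orders"})
admin = Role("admin", inherit=False, privileges={"TRUNCATE ON orders"}, member_of=[wheel])
readonly = Role("readonly", privileges={"SELECT ON orders"})
joe = Role("joe", login=True, member_of=[admin, readonly])

print(sorted(effective_privileges(joe)))  # ['SELECT ON orders', 'TRUNCATE ON orders']
```

joe inherits from admin and readonly, but not from wheel, because the path to wheel runs through admin, which is NOINHERIT. joe could still reach wheel's privileges explicitly with SET ROLE.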
Postgres: From Cloud to Hybrid and On Prem Again by Gabriele Bartolini
The rise of operators like CloudNativePG is transforming Postgres deployments in Kubernetes, providing flexibility for running databases across VMs or bare metal, with dedicated storage for high-performance needs. This open source stack—combining Kubernetes, Postgres, and CloudNativePG—also aligns with the European Union's Data Act, which requires service providers to guarantee data portability and facilitate switching between vendors.
Demystifying Kubernetes for Postgres DBAs: A Guide to Operators by Adam Wright
This technical presentation, featuring live demos, aims to demystify Kubernetes and provide Postgres professionals with essential knowledge about Kubernetes Operators. Attendees will learn how day-to-day life might change for a DBA when shifting from self-managed Postgres to using a Postgres operator. The focus will be on general Kubernetes and Postgres concepts, with examples from multiple operators and demos from the open source CloudNativePG operator, but the talk avoids favoring any specific operator.
Integrating AI with Postgres: Opportunities, Challenges, and Future Possibilities by Bilge Ince
In this session, EDB Machine Learning Engineer Bilge Ince will explore the seamless integration of AI within PostgreSQL. Discover how AI-driven enhancements like the aidb extension and pgvector transform Postgres from a traditional database into an intelligent system. This talk will provide practical insights for leveraging AI within your data infrastructure using Postgres. Plus, you’ll gain a deeper understanding of the role of LLMs and their interaction with vector data in Postgres. As AI continues to unlock new possibilities with Postgres, this talk offers a preview of future opportunities for integrating AI with the world’s most powerful database technology.
Introduction to Fair-Use TPC Benchmarking Kits by Mark Wong
Let us go over a handful of freely available benchmarking kits that can be used with PostgreSQL. They are designed to characterize system performance and give you an idea of how well your system performs. They can also be used for evaluating the performance of patches!
Column encryption solutions and ideas by Peter Eisentraut
Many users are looking for data encryption solutions, for security and compliance reasons. In many cases, targeted solutions for column-level encryption can be appropriate and in some cases offer even better security and compliance than full-disk encryption or TDE.
In this talk I will introduce solutions for column-level encryption, including application-side solutions and solutions using plugins like pgcrypto and pgsodium. I'll also cover some cryptographic details and typical security and regulatory requirements for encryption in databases and how different encryption solutions can address them.
Debugging active queries with mid-flight instrumented explain plans by Rafael Thofehrn Castro
In this talk I will present an extended/experimental version of that patch where active queries with an enabled flag print the instrumented execution plan to a catalog table in a regular interval and demonstrate how this can help troubleshoot queries that never finish.
From the EDB team
Discovering ABI breakage in the Postgres 17.1 release
Pavan Deolasee was the first to notice and report an unintentional ABI break in Postgres 17.1: a newly added struct field. The change also affected the upcoming minor releases all the way back to Postgres 13. Developers at Timescale and Crunchy Data later noticed and wrote about the issue as well. The Postgres team quickly patched the struct so that the new field fits into existing padding, and decided to re-release the affected minor releases, including Postgres 17.1. Kudos to Pavan for catching this!
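Why does fitting a new field into padding preserve the ABI? The Python `ctypes` sketch below uses a hypothetical struct (not the actual Postgres struct) and assumes the usual 4-byte alignment for `int`. Appending a field grows the struct, which breaks extensions compiled against the old size that allocate, copy, or embed it, while placing a one-byte field in the existing padding keeps both the size and every existing field offset unchanged.

```python
import ctypes


class Before(ctypes.Structure):
    """Hypothetical struct as shipped: the compiler inserts padding after `flag` to align `count`."""
    _fields_ = [("flag", ctypes.c_bool), ("count", ctypes.c_int)]


class AppendedField(ctypes.Structure):
    """Appending a field grows the struct, so code built against the old layout misjudges its size."""
    _fields_ = [("flag", ctypes.c_bool), ("count", ctypes.c_int), ("new_flag", ctypes.c_bool)]


class FieldInPadding(ctypes.Structure):
    """Putting the new field into existing padding keeps the size and all old offsets intact."""
    _fields_ = [("flag", ctypes.c_bool), ("new_flag", ctypes.c_bool), ("count", ctypes.c_int)]


for cls in (Before, AppendedField, FieldInPadding):
    print(cls.__name__, "size:", ctypes.sizeof(cls), "count offset:", cls.count.offset)
```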
Postgres Person of the Week
Rushabh Lathia was Postgres Person of the Week! Congrats, Rushabh!
https://postgresql.life/post/rushabh_lathia/
And in the subsequent week, Bilge Ince was Postgres Person of the Week. Congrats, Bilge!
https://postgresql.life/post/ayse_bilge_ince/
Why pg_dump Is Amazing by Robert Haas
To sum up, I find pg_dump to be an excellent tool for dealing with almost any sort of complicated backup scenario. For routine backup and restore, other options are generally better, but as soon as things are non-routine, pg_dump becomes, at least in my experience, an absolutely indispensable tool.
https://rhaas.blogspot.com/2024/11/why-pgdump-is-amazing.html
Exploring Postgres's arena allocator by writing an HTTP server from scratch by Phil Eaton
Phil Eaton wrote a post demonstrating Postgres's MemoryContexts by building a little HTTP server, and then using the bcc memleak tool to hunt for memory leaks.
https://www.enterprisedb.com/blog/exploring-postgress-arena-allocator-writing-http-server-scratch
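The core idea behind memory contexts is that allocations are tied to a lifetime (a query, a tuple, or in Phil's case an HTTP request) and released all at once instead of being freed individually. Postgres does this in C with calls such as `palloc` and `MemoryContextReset`. The Python class below is only a hypothetical toy model of that idea, since Python manages memory with its own garbage collector. Like `MemoryContextReset`, its `reset` releases the context's allocations and deletes its child contexts.

```python
from __future__ import annotations


class MemoryContext:
    """Toy model of a Postgres-style memory context (illustrative only; Python has its own GC)."""

    def __init__(self, name: str, parent: MemoryContext | None = None) -> None:
        self.name = name
        self.children: list[MemoryContext] = []
        self.chunks: list[bytearray] = []  # stand-in for memory carved out of the context's blocks
        if parent is not None:
            parent.children.append(self)

    def alloc(self, size: int) -> bytearray:
        """Allocate `size` bytes owned by this context (loosely analogous to palloc)."""
        chunk = bytearray(size)
        self.chunks.append(chunk)
        return chunk

    def reset(self) -> None:
        """Release everything allocated here and delete all child contexts in one step."""
        for child in self.children:
            child.reset()
        self.children.clear()
        self.chunks.clear()

    def total_allocated(self) -> int:
        """Bytes held by this context and all of its descendants."""
        own = sum(len(chunk) for chunk in self.chunks)
        return own + sum(child.total_allocated() for child in self.children)


server_ctx = MemoryContext("ServerContext")
request_ctx = MemoryContext("RequestContext", parent=server_ctx)
for _ in range(3):  # pretend to serve three requests
    request_ctx.alloc(1024)  # e.g. a buffer for the parsed request
    request_ctx.alloc(256)   # e.g. response headers
    print(server_ctx.total_allocated())  # 1280
    request_ctx.reset()  # no individual frees: one reset releases the whole request's memory
print(server_ctx.total_allocated())  # 0
```

Resetting a long-lived per-request context at the end of each request is what keeps memory from growing across requests, and a leak shows up as a context whose total keeps climbing.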
PostgreSQL Hacking Workshop - December 2024
Next month, I'll be hosting a discussion of Melanie Plageman's talk, Intro to Postgres Planner, given at PGCon 2019. You can sign up using this form. To be clear, the talk is not an introduction to how the planner works from a user perspective, but rather how to hack on it and try to make it better and perhaps get your improvements committed to PostgreSQL. If you're interested, please join us. I anticipate that both Melanie and I will be present for the discussions.
https://rhaas.blogspot.com/2024/11/postgresql-hacking-workshop-december.html
Until next time
We hope you enjoyed this edition of the EDB Engineering Newsletter! Consider joining the PostgreSQL Hacker Mentoring Discord to get involved!
The EDB Engineering Team