Future of Databases (2025)

The rise of AI leaves opportunity for databases

Before digging into the depths of my predictions, it is worth noting that this is a personal opinion piece. Don’t be shocked. I’m sure you were on the hunt for a definitive guide to what the future holds. Rest assured knowing that I did try to order a crystal ball to verify my thoughts below before spreading them onwards to you. Sadly, however, I had to return it because I couldn’t figure out how it worked. Looks like you’re stuck with my semi-educated series of hunches.

Recently Satya Nadella, the CEO of Microsoft, made a profoundly bold prediction that the era of SaaS applications is “dead”. He went on to explain SaaS (software as a service) with the following quote, a precisely succinct description of what a SaaS business truly is:

“They are essentially CRUD databases with business logic.” – Satya Nadella

Ouch. For many businesses this is a cold hard truth they don’t want to hear. It’s not scary simply because it is a fact of life. The entire sentiment is scary because of the implications it has with the rise of artificial intelligence and how easy it will become for AI to essentially replicate these company use cases many times over. Imagine a world where instead of going to Company A for their SaaS offering that thousands of other companies use, you instead use AI to create a custom tailored solution specific to your application’s specialized needs. We all know that nobody uses 100% of the features of any substantial SaaS application as it is today. Building a solution with only the tools in the toolbox that we need, without the extra junk to sift through, is ultimately going to become the preferred method, and I’d be a fool to think otherwise.

Databases, on the other hand, will continue to be a requirement for building. In a world where AI agents are creating microservice after microservice, we will still need a place to store the data and have it be queried. Databases will always be the source of truth.

What will databases need to be?

Let’s face it. Our foreseeable future will involve AI producing code in some capacity. My gut tells me it will be in a larger capacity than we expect it to be. Humans (or maybe we should still call ourselves developers to maintain some coveted status) will increasingly become bystanders and reviewers of produced code. But just as we need databases to create applications today, AI is going to need databases to produce applications tomorrow. That’s a fact I refuse to say will change.

When ChatGPT or Claude or {insert name of your favorite tool here} is coding your project instead of you, we are going to want to set it up for success. Success, I’ll argue, will come in the form of so much standardization around an adopted protocol that hallucinations become hard to produce, because the protocol is the de facto standard. As I’ll cover shortly, I believe MCP (Model Context Protocol) offers a good solution to that problem when it comes to handling transactions with a data store. We will also need some form of standardization around how written code can be attached seamlessly to the data store, acting as the middleware for the CRUD interactions that SaaS applications once owned the turf for.

That only begins to scratch the surface in terms of tooling we will need to have in place. We want to give AI access to our data, but we do not want to overexpose sensitive information. It’s very likely that having access to more than a single data source will prove helpful, but that access should operate through a single proxy. Systems need to be built to live both ephemerally and long-lived, and in either case they need to be fast to stand up. Ultimately we need a fast, safe, embeddable option at a bare minimum. Today both SQLite and PGlite seem to be well positioned database products for the future.

Attachable AI Agent Code

All over the internet we see projects being kickstarted from AI prompts, whether we look in the direction of tools such as v0 by Vercel, Replit, Magic Patterns, Lovable, or a plethora of others as the space becomes more crowded by the day. Each of the aforementioned tools, as well as every other tool in the space, has its own preferences on the languages, frameworks, and libraries it relies on to make its code executable for the end user. Many of them today struggle with putting a full-stack application together with all the necessities of a production-grade, deployment-ready setup, but if you have been paying attention to the space, we are quickly inching closer to that reality.

With AI producing code we know it is going to inevitably need access to a data source to both store and read data from. Today most applications utilize a traditional backend API setup where we write some code in a backend runtime such as Node, deploy it to an AWS EC2 instance, and allow our frontend projects to send requests through it for data fetching. A bold assumption of mine is that complex architectures will start to decline with the rise of AI. It’s not that I don’t think we will see products and tools such as EC2 become AI-powered… I just believe companies who have a simpler and more cost effective solution will be the winners in the space. Bullish on companies such as Cloudflare here to win out.

Rather than spinning up complex architecture, we need a platform where it is quick, easy, and straightforward to append new application logic to a database. You need a database you can think of as clicking checkboxes to turn on “features,” whether hand-written by developers or produced by AI agents. Do you want row-level security? Data masking? User authentication? Analytics? RAG pipeline & vector search? AI has become extremely good at writing these types of microservices where there is a specific set of feature requirements for it to meet and refine against. We should continue to foster that focus so it produces an output that does very, very well at its intended goal. With that, we need to ensure these microservices have the ability to work together.

If you have not had a chance to check out StarbaseDB yet, you should. We have created it with a Plugin Architecture in mind where you can simply add a “plugin” (read: microservice) to the database layer. It appends new API routes right to the proxy instance of your database and allows you to tap into pre-query and post-query hooks. A pre-query hook gives you access to a SQL statement and its parameters before it is executed against the data source so you can perform custom logic: respond quickly with a cached version of the query results, alter the SQL statement, or anything else you might imagine doing. Similarly, a post-query hook lets you tap into the workflow after a database transaction has been executed, read the result object, and even go as far as manipulating the data before it is returned to the user. Use cases for post-query hooks are plentiful, but to offer one example: plenty of columns in your database likely hold PII or other sensitive information you would never want to actually leave your database environment and reach your users, so ripping those columns out of the response would be ideal. This is an especially important factor when connecting your database directly to AI agents!
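
To make the idea concrete, here is a rough sketch of what a plugin with pre-query and post-query hooks could look like. The interface names and hook signatures are illustrative assumptions for this post, not StarbaseDB’s actual plugin API, so treat it as a shape rather than a recipe.

```typescript
// Illustrative shapes only -- these interfaces are assumptions for the sake
// of the example, not StarbaseDB's actual plugin API.
interface QueryContext {
  sql: string;
  params: unknown[];
}

interface QueryResult {
  rows: Record<string, unknown>[];
}

interface DatabasePlugin {
  // Runs before the SQL statement hits the data source.
  preQuery?(ctx: QueryContext): QueryContext | { cachedResult: QueryResult };
  // Runs after execution, before the result is returned to the caller.
  postQuery?(ctx: QueryContext, result: QueryResult): QueryResult;
}

// A hypothetical caching plugin: answer repeated SELECTs from memory.
const queryCache = new Map<string, QueryResult>();

const cachingPlugin: DatabasePlugin = {
  preQuery(ctx) {
    const key = ctx.sql + JSON.stringify(ctx.params);
    const hit = queryCache.get(key);
    // Short-circuit with a cached result instead of touching the database.
    if (hit && ctx.sql.trim().toUpperCase().startsWith("SELECT")) {
      return { cachedResult: hit };
    }
    return ctx;
  },
  postQuery(ctx, result) {
    // Naively cache every result keyed by statement + parameters.
    queryCache.set(ctx.sql + JSON.stringify(ctx.params), result);
    return result;
  },
};
```

The same two hooks could just as easily implement query rewriting, auditing, or the data masking discussed later in this post.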

Common language with MCP

Iterative programming will require agents to be capable of performing database introspection to learn the structure and relationships of your database so they can write code that adheres to the user-level agreements currently in place. In November, Anthropic announced they had open-sourced their Model Context Protocol (MCP), which is intended to be a new standard for connecting AI assistants to the systems where data lives. Creating a solution at the right place and the right time, Anthropic hit a bullseye here.

What MCP enables is a standardized way for chat assistants and AI agents to connect to and communicate with any supported data store on the internet. So long as your data tool supports and adheres to the requirements of the Model Context Protocol, these agents will be able to make SQL queries on your behalf to answer your data questions, analyze, refine, and much more.
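
Conceptually, an MCP server advertises named tools with typed inputs that any compliant agent can discover and call. The sketch below is a simplified illustration of that idea rather than the actual MCP SDK; the `runReadOnlyQuery` helper and the tool shape are assumptions made for the example.

```typescript
// A simplified illustration of the MCP idea: a server advertises tools an
// agent can discover and call. This is not the real MCP SDK, just the shape.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the input
  handler: (input: { sql: string }) => Promise<{ rows: unknown[] }>;
}

// Hypothetical helper: wire this up to your real database driver.
async function runReadOnlyQuery(sql: string): Promise<{ rows: unknown[] }> {
  void sql; // placeholder so the sketch compiles without a real driver
  return { rows: [] };
}

export const queryTool: ToolDefinition = {
  name: "query",
  description: "Run a read-only SQL query against the application database.",
  inputSchema: {
    type: "object",
    properties: { sql: { type: "string" } },
    required: ["sql"],
  },
  async handler({ sql }) {
    // Guard rail: only allow SELECT statements through this tool.
    if (!sql.trim().toUpperCase().startsWith("SELECT")) {
      throw new Error("Only read-only queries are permitted");
    }
    return runReadOnlyQuery(sql);
  },
};
```

The point is the standardization: every agent learns the same tool names, input schemas, and calling conventions, so the database side only has to be described once.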

Of all the sections in this post this should be the section that surprises people the least. We need a standard way for AI to access databases regardless of which AI assistant or agent you are using. MCP might very well be our best answer.

Prevent access to sensitive data

When it comes to tiers of importance, this section sits atop them all. Who cares how you write your code or how you implement it? If you can’t write code that protects the data of your users, then nothing else even matters.

How do databases prevent access to sensitive data today? Most databases (excluding SQLite) offer the ability to create user roles, which in essence let you grant various types of access rules governing how the database can be interacted with. Perhaps one role has full admin rights to perform all CRUD operations on the database, while a more limited role only has access to perform SELECT statements against a subset of your tables. While user roles are great for many tasks, they can be cumbersome to create and manage via SQL statements, particularly as your schema grows over time and you have to come back and continuously revisit those rules.
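
As a rough illustration of the kind of role setup described above, here is what granting a narrowly scoped, read-only role might look like in Postgres, driven from the `pg` client in TypeScript. The role name, password, and table names are hypothetical.

```typescript
import { Client } from "pg";

// Hypothetical example: create a read-only role limited to a few tables.
async function createReadOnlyRole(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  try {
    // A role that can log in but only SELECT from a subset of tables.
    await client.query(`CREATE ROLE agent_readonly LOGIN PASSWORD 'change-me'`);
    await client.query(`GRANT SELECT ON todos, projects TO agent_readonly`);
    // No INSERT/UPDATE/DELETE grants: the admin role keeps full CRUD rights.
  } finally {
    await client.end();
  }
}
```

Multiply that by dozens of tables and a schema that keeps changing, and you can see where the maintenance burden comes from.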

Postgres, Microsoft SQL Server, Redshift, and a small handful of other databases support Row Level Security, with some even supporting column-level or cell-level access control. What RLS as a feature enables is the ability to grant a specific user (e.g. a user making a request with userID 123) access to read only the rows owned by that exact user. When a table of todo items has entries from all users, but our user with ID 123 makes a request to get all of their todo items, we feel confident that the database will only return items owned by user 123. It is another database security feature worth enabling and utilizing as a defense-in-depth layer, ensuring users only have access to the data they should have access to and nothing more.
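
Here is a minimal sketch of that todo example using Postgres row-level security, again via the `pg` client. The `todos` table, its `owner_id` column, and the `app.user_id` session setting are assumptions made for illustration.

```typescript
import { Client } from "pg";

// One-time setup (hypothetical schema): only let a session read its own rows.
const setupSql = `
  ALTER TABLE todos ENABLE ROW LEVEL SECURITY;
  CREATE POLICY todos_owner ON todos
    USING (owner_id = current_setting('app.user_id')::int);
`;

// Per-request: bind the caller's user id to the transaction, then query.
async function getTodosForUser(client: Client, userId: number) {
  await client.query("BEGIN");
  // is_local = true scopes the setting to this transaction only.
  await client.query("SELECT set_config('app.user_id', $1, true)", [String(userId)]);
  const { rows } = await client.query("SELECT * FROM todos"); // RLS filters to owner_id = userId
  await client.query("COMMIT");
  return rows;
}
```

One caveat worth remembering: by default, RLS policies are not applied to the table owner or superusers, so the application should connect with a less-privileged role for the policy to actually bite.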

Sometimes we do want AI agents to be able to introspect these columns and even fetch data values from them, but it needs to happen in a secure manner. For example, for the AI agent to write code that matches what our system already supports, it may want to make a request to our database and fetch the column values for our ssn column that contains users’ social security numbers. Typically we would want nobody to have access to this type of information, but it’s important that the AI agent knows how we are formatting the data in our column so it can produce code that includes spaces, includes hyphens, or trims the string based on how we’re storing that information today. The first two database features we covered, User Roles and Row Level Security, do not allow AI agents to accomplish this task. Systems such as StarbaseDB offer post-query hook features such as Dynamic Data Masking, where you can enforce rules so that when our ssn column is queried, the data is replaced by a randomly generated social security number with the same formatting we store in our database. AHA! Now we can give AI access to this column, it can know exactly how we’re handling data today, and it can continue to support and enforce those previous decisions without creating potential system issues or data inconsistencies.
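
To ground the idea, here is a minimal sketch of what the masking step inside a post-query hook could do: swap every digit of an ssn value for a random one while preserving the original formatting. This is not StarbaseDB’s actual Dynamic Data Masking implementation; the column name and masking rule are assumptions for the example.

```typescript
// Illustrative only: replace SSN digits with random digits while preserving
// the original formatting (hyphens, spaces, or a bare 9-digit string).
function maskSsn(value: string): string {
  return value.replace(/\d/g, () => String(Math.floor(Math.random() * 10)));
}

// A hypothetical post-query step that masks the ssn column before the result
// ever leaves the database environment.
function maskSensitiveColumns(rows: Record<string, unknown>[]): Record<string, unknown>[] {
  return rows.map((row) =>
    "ssn" in row && typeof row.ssn === "string"
      ? { ...row, ssn: maskSsn(row.ssn) }
      : row
  );
}
```

So “123-45-6789” might come back as “804-19-3327”, keeping the hyphens, while a value stored with spaces keeps its spaces: the agent learns the format without ever seeing a real number.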

Ephemeral, Embeddable, Easy

Ephemeral. Some operations need to be long-lived. Think about how your traditional SaaS product, for example, always needs a running backend service and database somewhere for it to be continuously accessible. But we also need the ability to have ephemeral services spun up and then spun down after the task has been completed. Think about AI tasks that may string together a series of agents to accomplish a goal. Maybe you want a data analysis agent to come in, take a look at your data or code, analyze it, and make recommendations. Do you really want that sensitive data or your project’s code on a server with everyone else who has used this agent to perform a similar task? No. At least I hope you answered no. We would much prefer the AI agent to be able to spin up an ephemeral database to store the information it needs and have confidence it’s on a clean slate, with our information never spilling over (even accidentally) to the next user of the agent. Databases need to be capable of being spun up quickly and living ephemerally, and in my opinion SQLite excels at this, almost as if it was made for this exact use case.
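
As a small illustration of just how cheap an ephemeral database can be, here is an in-memory SQLite database spun up with the `better-sqlite3` Node package, used for a throwaway analysis step, and then discarded. The scratch schema is made up for the example.

```typescript
import Database from "better-sqlite3";

// Spin up a throwaway database that lives only in this process's memory.
const db = new Database(":memory:");

// Hypothetical scratch table for an analysis agent's intermediate results.
db.exec(`CREATE TABLE findings (id INTEGER PRIMARY KEY, note TEXT NOT NULL)`);

const insert = db.prepare("INSERT INTO findings (note) VALUES (?)");
insert.run("Unindexed foreign key on orders.customer_id");
insert.run("Duplicate rows detected in events table");

const findings = db.prepare("SELECT note FROM findings").all();
console.log(findings);

// When the task is done, close it; nothing persists, nothing leaks to the next run.
db.close();
```

No provisioning, no network, no shared server: the database exists exactly as long as the task does.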

Embeddable. The benefits of an embedded database are numerous. Network latency is removed from the equation. Privacy and security are the default. Imagine an AI agent or service having a local-first embedded database to perform its operations against, reducing the need to submit external network requests to perform SQL statements. The ability to aggregate data in a data store and analyze it all locally would be a tremendous benefit. Many AI implementations today use and overly abuse the context window, where they attempt to send in all of the available data in hopes that AI will decipher it successfully. For larger operations I believe storing information in a database, having AI know when it should access what data (going back to MCP), and allowing it to exclusively pluck data out as needed while it accomplishes a sequence of steps can help maintain objective focus. There is much more to be said about various other use cases, such as AI creating a database, populating it, and then shipping it out to other users as local-first replicas. Or perhaps services that require local data syncing with an embedded database. Either way, there’s been a growing movement in this direction for a number of years with no sign of slowing.

Easy. At the end of the day, when you have been converted from programmer to code reviewer thanks to the generative-code future we will be living in, it will still be important that you (yes, you!) can read and understand exactly what is happening. Never will we want to blindly trust that what has been produced on our behalf is ready to ship directly to production. Just as we do today, we need to understand all of the code that is written in our projects. When I say “easy” I mean that all the infrastructure needs to be easy enough to trace and understand, the code needs to work with other code in a sensible manner, and it needs to be easy to run locally and deploy to the cloud. You will become an extension of AI, which starkly contrasts with how we view it today, with AI being an extension of you. Standardization on how code is written, how it is attached to our database, the workflow in which operations run, and a predictable outcome with predictable tools is paramount.

Conclusion

It’s impossible to predict all the use cases that will need to be covered by the many AI workloads our future holds. One thing is for certain though… no matter the database of choice, it needs to have an incredible amount of flexibility. Whoever emerges as the clear winner in the era of AI will be the one that can flexibly handle both relational and analytical workloads. An option that excels at vector searches and can handle RAG pipelines with ease. It needs to be available fast. Embeddable locally and also hosted in the cloud, serverless. Data secured so that AI can understand the shape of your data without having direct access to PII (personally identifiable information). Lastly, a system that is set up to easily allow code to be plugged into it, giving that code the access it needs without it having to know much about the implementation details.

Databases will always be a requirement of development. Always. I do not believe the features we see in databases today will necessarily be sufficient for the needs of tomorrow, and there is a bit of a greenfield opportunity to close that gap. Who will close it and win in one of the most competitive technological landscapes in infrastructure? As Wayne and Garth once famously said… GAME ON!