Neon is now SOC 2 Type 2 compliant 🚀Read more
Postgres

Recreating S3 in Postgres using PostgREST

Serve arbitrary binary files directly from your database

Storing files directly in a database is generally discouraged in favour of dedicated object storage like S3 or Azure Blob, which is the more scalable and cost-effective approach. However, in practice, you might sometimes find yourself putting binary data in a relational database anyway. Here, we’ll explore PostgREST by building a simple file server directly within Postgres as it provides a practical context for implementing core features like functions as RPC, RLS, and database roles.

What is PostgREST?

PostgREST turns Postgres into a webserver, exposing API endpoints for CRUD operations based on constraints and permissions in the database. The PostgREST philosophy focuses on making the database the single source of truth which can help avoid common pitfalls in API development like duplicating or ignoring database structures, leaky ORM abstractions, and managing permissions in API controllers instead of directly within the database.

We’ll be accessing PostgREST using the new Neon Data API, available for members of the Early Access Program. If you’re interested in trying features right as they come out, join here!

Setting up the Database

First, start by provisioning a new Postgres instance by creating a fresh project in the Neon console. The Neon Data API is an opt-in feature at the branch level, so head over to the branches tab to activate it. When you do, you’ll be given a URL through which you can access the PostgREST generated API endpoints.

Post image

Next, head to the SQL editor tab and add the pgcrypto extension for hashing files, and the blobs table we’ll be using to store all our binary data and its metadata.

For the sake of this blog, we’ll only support a tiny subset of common MIME types and classify them based on the file extension. 

Since we’ll be querying by bucket_path and file_name, let’s add an index too.

Finally, we’ll populate the database with some mock data to play around with using a simple Python script. Here I added a PNG, JPG, and PDF.

Serving files

Now, to actually serve the file we can use PostgREST’s functions as RPC. We’ll support 2 operations, get_file() and get_metadata().

First, we need to create a special Postgres domain which tells PostgREST that the function we’re creating can return any kind of media, or a wildcard MIME type.

If we knew, for example, that we would only ever be dealing with PNG’s, we could create the following domain instead and use that as the return type of the function.

Next, we’ll create the get_file() function, which takes the bucket_path and file_name as parameters and returns data using the newly created generic media domain. Note, the parameters are prefixed with p_ to avoid name collisions with the blobs table columns. Inside the function, we retrieve the file record matching the bucket path and file name from the blobs table. If the file is found, the function creates the HTTP header with the MIME type, file disposition, and cache control settings, and returns. If the file isn’t found, an explicit exception is raised instead.

Since get_file() doesn’t modify the database, it can run under the GET method, meaning we can now see these files directly from our browser. For example, opening the following URL shows the beautiful Neon logo served directly from the database! Note that if the response tells you that it’s missing something in the schema, you might need to force a reload using NOTIFY pgrst, 'reload schema;  in the SQL editor.

Post image

PDFs also work as you’d expect right out of the box : 

Post image

Another important part of an object store is metadata retrieval, so let’s implement the get_metadata() function which fetches metadata such as file size, type, hash, creation date, and any custom metadata.

Now, if we head back to the browser and change the get_file() call to a get_metadata() call, we get the following.

Permissions and RLS

Right now, all the files in our database are publicly accessible, which might not always be desirable. So, let’s add some auth using Neon Auth, an is_public column, and some row-level security policies based on the new fields and roles for more granular access control. First, let’s add the new fields and enable RLS.

With RLS activated, let’s now define who can see what.

If a user is anonymous, or unauthenticated, then they can only see the file if its is_public flag is set to true.

Authenticated users can see blobs if they are public, or if they are the owners.

The rest of the CRUD operations are only available to authenticated users if the blob belongs to them. 

Since the new column defaulted all is_public values to true, nothing has changed and we can still access all the files as before. However, if you now update any file in your Neon console to set is_public to false, unauthenticated users will no longer see that file. Attempting to access it as an anonymous user, like in a browser bar, will result in a response like this:

To view files marked as non-public, we need to provide a valid JWT containing the key-value pairs “role”: "authenticated" and "sub": "<user-id>", where the user id matches that attached to the blob. Under normal circumstances, this JWT would be managed in your frontend by Neon / Stack Auth, or another identity provider. In this case, however, I extracted the JWT manually from the browser cookies of a fake user I created in the Neon console auth tab for demonstration purposes. Making a request to the API as a properly authenticated user, we can now see the previously inaccessible file and metadata.

Post image

By default, all the tables you create will have SELECT permissions granted for anonymous users, so properly setting up RLS policies is very important when using the Neon Data API. If you want to see which policies you’ve set up, you can query your database for them, and filter by table name.

Conclusion

Using Neon and PostgREST, you can quickly set up a simple object store to serve files and metadata without additional services. While you probably won’t ever use Postgres and PostgREST as a mock S3, the ideas discussed in this blog can naturally extend to many API scenarios you will encounter.


Neon is a serverless Postgres platform with instant provisioning, branching, and autoscaling. Get started on our free plan and spin up a fully configured Postgres instance in seconds.