sensible.ioget in trackets now

PostgreSQL Sequences and Array Column Types Aug 24, 2013

This article is a sequel to our PostgreSQL series, which is aimed to teach you how to get the most out of your database. You might have been led by Active Record (or Rails in general) that it is a good idea to completely abstract away the database, but that it's hardly ever possible.

If you plan on switching databases on any larger-than-small application, you're going to have to do some manual work anyway. Every database is special and you should use it as such, instead of just settling for the lowest common denominator.

There are many features in PostgreSQL that can help you develop web applications, such as the array & json column types, hstore, PostGIS and much more. In this article we're going to take a look at arrays.

Every one of us remembers the moment when you first learned about databases and you were told that if you want to store multiple values in one column you have to split that in a 1:N relationship, because each column can only hold one value. Well, that's not true in the world of PostgreSQL. You are free to create array columns with arbitrary length. You can even perform array-like queries on them.

But first we need to start by creating a table. We'll do this with the most obvious example - posts with multiple tags. Each post has a title, content and an arbitrary number of tags. We also want to be able to select all posts with a specific tag.

Let's first create a new database so that we can play around and drop it at the end of out session, keeping our machine clean.

$ psql -U darth postgres
postgres=# CREATE DATABASE test TEMPLATE template0;
CREATE DATABASE
postgres=# \c test;

Now we can create our posts table.

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  tags VARCHAR(255)[] DEFAULT '{}'
);
test=# \d posts
                                  Table "public.posts"
 Column  |           Type           |                     Modifiers
---------+--------------------------+----------------------------------------------------
 id      | integer                  | not null default nextval('posts_id_seq'::regclass)
 title   | character varying(255)   | not null
 content | text                     | not null
 tags    | character varying(255)[] | default '{}'::character varying[]
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)

But let's stop here for a moment and take a look at our CREATE TABLE query in more depth. First we're defining our id column as SERIAL PRIMARY KEY. The PRIMARY KEY is pretty self explanatory, but what about the SERIAL attribute? If you come from the world of MySQL you probably know this as AUTO INCREMENT, but there's more to it.

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  tags VARCHAR(255)[] DEFAULT '{}'
);

PostgreSQL will actually create a SEQUENCE for the auto increment key and use it to generate the default value. The SERIAL keyword here is merely an alias of the following.

test=# DROP TABLE posts;
DROP TABLE

test=# CREATE SEQUENCE posts_id_seq;
CREATE SEQUENCE

test=# CREATE TABLE posts (
  id INT PRIMARY KEY DEFAULT nextval('posts_id_seq'),
  title VARCHAR(255) NOT NULL,
  content TEXT NOT NULL,
  tags VARCHAR(255)[] DEFAULT '{}'
);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "posts_pkey" for table "posts"
CREATE TABLE

test=# ALTER SEQUENCE posts_id_seq OWNED BY posts.id;
ALTER SEQUENCE

Now you might be asking why in the world would you want to do it the more complicated way, instead of using just a simple SERIAL attribute. What if you wanted the id to increment by 2? Or start at a different index than 0. Well that's easy to do just by creating a custom SEQUENCE.

test=# CREATE SEQUENCE derp START 1024 INCREMENT 16;
CREATE SEQUENCE

test=# SELECT nextval('derp');
 nextval
---------
    1024
(1 row)

test=# SELECT nextval('derp');
 nextval
---------
    1040
(1 row)

There are many more options you can pass in here, which is why I'd recommend you to take a look at the official documentation for PostgreSQL SEQUENCEs if this sounds interesting. But now back to array columns.

Defining array column types

The way we tell PostgreSQL that a column type is an array is by simply appending [] at the end of the type, such as the following:

CREATE TABLE things (
  id SERIAL PRIMARY KEY,
  names VARCHAR(255)[],
  numbers INT[],
  descriptions TEXT[]
);

In our example code we're also defining a default value, in this case an empty array, which is defined as '{}' (the single quotes are important here). If we wanted a different default value, we can simply put it in the curly brackets '{thing}'. Note that these arrays columns can also be indexed and thus lighting fast:

CREATE INDEX idx_test on things USING GIN(names);

Now that we know the syntax, we can actually start inserting some data.

test=# INSERT INTO posts (title, content, tags) VALUES ('sample post', 'lorem ipsum', '{apple,orange}');
INSERT 0 1

test=# INSERT INTO posts (title, content, tags) VALUES ('another post', 'fruits are bad', '{apple,hamburger}');
INSERT 0 1

test=# SELECT * FROM posts;
 id |    title     |    content     |       tags
----+--------------+----------------+-------------------
  1 | sample post  | lorem ipsum    | {apple,orange}
  2 | another post | fruits are bad | {apple,hamburger}
(2 rows)

The advantage of doing this over just serializing the tags as a string (for example “apple,hamburger”) is that PostgreSQL knows it's an array and it can perform operations on it. We can easily select posts with a single tag.

test=# SELECT title FROM posts WHERE 'apple' = ANY(tags);
    title
--------------
 sample post
 another post
(2 rows)

We can also update a specific element of an array (notice that the indexing starts at 1 by default).

test=# UPDATE posts SET tags[1] = 'potato' WHERE id = 1;
UPDATE 1

test=# SELECT * FROM posts;
 id |    title     |    content     |       tags
----+--------------+----------------+-------------------
  2 | another post | fruits are bad | {apple,hamburger}
  1 | sample post  | lorem ipsum    | {potato,orange}
(2 rows)

There's a lot operations that are supported right of the box, you can see the full list here.

This all about covers the basics of using array columns. There are more things which you can do, for example multi-dimensional arrays, but I would encourage you to take a look at the official PostgreSQL documentation, as it explains everything in details.

If you're using Rails 4, you can now make use of the array columns without the need for gems like postgres_ext:

class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :username
      t.string :roles, array: true

      t.timestamps
    end
  end
end

One closing thought, it is important to think before using an array column. This is not a replacement for normalization and you should really think about the way you plan on using the data. If, for example, you wanted to create a dedicated page where you manage all of your tags, assign them to posts and so on, using an array column is probably not the best choice.

The following articles will cover things like JSON column types, hstore, migrating data from one type to another and much more.

Written by Jakub Arnold of sensible.io.

Are you having trouble with JavaScript errors in production?

We are building a tool to solve the problem once and for all. It's called Trackets and it's going to be really awesome.