Postgres array_to_string() and array_agg() to decouple your interface

Let’s say you’ve got a collection of job data for a set of users that looks like this, and you want to create a nice summary of it to be displayed, with a count of how many jobs are in each category for each user, all on one line.

 │id │ user_id │ job │ status    │
 ├───┼─────────┼─────┼───────────┤
 │ 1 │ 12      │ 1   │ Completed │
 │ 2 │ 12      │ 2   │ Cancelled │
 │ 3 │ 14      │ 3   │ Ready     │
 │ 5 │ 14      │ 4   │ Completed │
 │ 6 │ 14      │ 4   │ Completed │
 │ 7 │ 14      │ 4   │ Cancelled │
 ...

Here’s  the report of summarized statuses of jobs for each user that you want.

│ user_id │ summary                           │
├─────────┼───────────────────────────────────┤
│ 12      | 1 Cancelled, 1 Completed          │
│ 14      | 1 Cancelled, 2 Completed, 1 Ready │

I’ll show you how it’s possible to provide this solely with a Postgres SELECT.

Now we can easily get the counts of jobs in each category for each user:

SELECT Count(*) number_of, 
       status, 
       user_id 
FROM   jobs 
GROUP  BY status, 
          user_id 
ORDER  BY status, 
          user_id; 

 │number_of │ status    │ user_id │
 ├──────────┼───────────┼─────────┤
 │ 1        │ Cancelled │ 12      │
 │ 1        │ Completed │ 12      │
 │ 1        │ Cancelled │ 14      │
 │ 2        │ Completed │ 14      │
 │ 1        │ Ready     │ 14      │

We’d now expect that we’d need to use an external program to process these rows and format the summary line, but with a little bit of help from Postgres’s array functions, we can do all the formatting right inside the query!

We’ll start off with the results of the query that gave us the counts by status. First we’ll try combining the number_of column and the status, using the Postgres string concatenation function, and hope we can combine those to get our final output. A näive approach would be to try just concatenating the number_of column and the status:

SELECT user_id, 
       number_of 
       || ' ' 
       || status job_status 
FROM   (SELECT Count(*) number_of, 
               status, 
               user_id 
        FROM   jobs 
        GROUP  BY status, 
                  user_id 
        ORDER  BY user_id) AS foo 
GROUP  BY status, 
          user_id 
ORDER  BY user_id;

But this fails:

ERROR: column "foo.number_of" must appear in the GROUP BY clause or 
be used in an aggregate function
LINE 1: select user_id, number_of|| ' ' ||status job_status 
from (S...

This means we have to put that in an aggregate function, like COUNT(), but we don’t want to alter the data, just force it to be an aggregate type. How do we do that?

Postgres has a function, array_agg(), that takes the arguments passed to it and turns them into a Postgres array – this operation is considered to have created an aggregate type, even if only one item is passed to it. That’ll fix the requirement to have an aggregated value, but a Postgres array won’t look pretty in our output, looking something like {'1 Completed'}.

We can use the Postgres array_to_string() function to convert our (one-element) array back to a string. Normally it would join the array elements together with the string supplied as the second argument, but we’ve only got one, so it just drops the braces and quotes to give us the pretty output we want. Let’s try that:

SELECT Array_to_string(Array_agg(number_of 
                                 || ' ' 
                                 || status), ', ') summary_status, 
       Max(user_id)                                USER 
FROM   (SELECT Count(*) number_of, 
               status, 
               user_id 
        FROM   jobs 
        GROUP  BY status, 
                  user_id 
        ORDER  BY user_id) AS foo 
GROUP  BY user_id;

And our output looks pretty good.

│ user_id │ job_status  │
├─────────┼─────────────┤
│ 12      │ 1 Cancelled │
│ 12      │ 1 Completed │
│ 14      │ 1 Cancelled │
│ 14      │ 2 Completed │
│ 14      │ 1 Ready     │

Can we go one more level, and get all the statuses combined on a single line for each user? Yep, we can! This time we’re going to need to group by user_id, so we’ll cheat and use max() as a no-op aggregate function (since all of the user IDs we’ll have in a group will be equal). The final query looks like this:

SELECT Array_to_string(Array_agg(number_of 
                                 || ' ' 
                                 || status), ',  ') summary_status, 
       Max(user_id) USER 
FROM   (SELECT Count(*) number_of, 
               status, 
               user_id 
        FROM   jobs 
        GROUP  BY status, 
                  user_id 
        ORDER  BY user_id) AS foo 
GROUP  BY user_id; 

Now we’re using our original query, and grouping the results by user ID. This means we’ll have one or more sets of count and status for each user ID, and we then use the array_to_string(array_agg()) trick again. The array_agg() constructs the "N <status>" strings, and the array_to_string() joins them together, separated by a comma and space. Our final result looks like this:

│ summary_status                    │ user │
├───────────────────────────────────┼──────┤
│ 1 Cancelled, 1 Completed          │ 12   │
│ 1 Cancelled, 2 Completed, 1 Ready │ 14   │

At this point you’re probably thinking, “well, this was cute, getting Postgres to format the string, but why bother?”

The reason is that because Postgres is creating this formatted string solely by set operations; it doesn’t know or care how many categories there are, and it doesn’t need to keep track of whether categories were renamed or new ones were added or old ones were removed; it simply finds the data, puts it together in a predefined format, and returns it. Consumers of the data can simply display it (hand-waving visual considerations, like “is this not too wide to fit on the page”). This decouples any API from the data, as the API can simply say, “I have found the data you want and formatted it for you”. Any UI can simply display it. Now neither API nor UI needs to be concerned about data content changes, as they know that no matter how the content morphs, the formatting will be consistent.

Comments

Leave a Reply