ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

elephant side view close black and white feeding

If you’re new to PostgreSQL, you may have come across this error when trying to perform a simple GROUP BY. Which solution you choose depends on your situation, but let’s look at how using an aggregate function can help us.

Imagine we have the following table to store user comments:

 id | user_id |     created_at      | text 
----+---------+---------------------+------
  1 |       1 | 2020-07-21 00:00:00 | foo
  2 |       1 | 2020-07-21 12:00:00 | bar
  3 |       2 | 2020-07-21 18:00:00 | baz

We have three comments from 21 July, two from user #1 and one from user #2. If we want to select the latest comment per user per day, we may try the following

SELECT * FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

which results in the familiar error

ERROR:  column "comments.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT * FROM comments GROUP BY date_trunc('day', created_at...
               ^

What this error is telling us is that our query is unspecific.

Consider user #1. They have made two comments on 21 July: comment #1 and comment #2. If we’re grouping by day and by user, comment #1 and comment #2 are in the same group. We need to tell PostgreSQL, explicitly, how to choose between these two comments. As the error message tells us, we can do this either by including the comment ID (the primary key of the table) in the GROUP BY clause or by using an aggregate function.

Since we want to include only the latest comment, but also select all table columns, we’ll use a sub query and the aggregate function MAX on the comment ID, like this

SELECT * FROM comments 
INNER JOIN (
  SELECT MAX(comments.id) as latest_id 
  FROM comments 
  GROUP BY date_trunc('day', created_at), user_id
) subquery ON subquery.latest_id = comments.id;

which returns the expected result

 id | user_id |     created_at      | text | latest_id 
----+---------+---------------------+------+-----------
  2 |       1 | 2020-07-21 12:00:00 | bar  |         2
  3 |       2 | 2020-07-21 18:00:00 | baz  |         3

Note: if we did not need to SELECT * we could aggregate without the sub query:

SELECT MAX(id) FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s