If you’re new to PostgreSQL, you may have come across this error when trying to perform a simple GROUP BY
. Which solution you choose depends on your situation, but let’s look at how using an aggregate function can help us.
Imagine we have the following table to store user comments:
id | user_id | created_at | text ----+---------+---------------------+------ 1 | 1 | 2020-07-21 00:00:00 | foo 2 | 1 | 2020-07-21 12:00:00 | bar 3 | 2 | 2020-07-21 18:00:00 | baz
We have three comments from 21 July, two from user #1 and one from user #2. If we want to select the latest comment per user per day, we may try the following
SELECT * FROM comments GROUP BY date_trunc('day', created_at), user_id;
which results in the familiar error
ERROR: column "comments.id" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: SELECT * FROM comments GROUP BY date_trunc('day', created_at... ^
What this error is telling us is that our query is unspecific.
Consider user #1. They have made two comments on 21 July: comment #1 and comment #2. If we’re grouping by day and by user, comment #1 and comment #2 are in the same group. We need to tell PostgreSQL, explicitly, how to choose between these two comments. As the error message tells us, we can do this either by including the comment ID (the primary key of the table) in the GROUP BY clause or by using an aggregate function.
Since we want to include only the latest comment, but also select all table columns, we’ll use a sub query and the aggregate function MAX
on the comment ID, like this
SELECT * FROM comments INNER JOIN ( SELECT MAX(comments.id) as latest_id FROM comments GROUP BY date_trunc('day', created_at), user_id ) subquery ON subquery.latest_id = comments.id;
which returns the expected result
id | user_id | created_at | text | latest_id ----+---------+---------------------+------+----------- 2 | 1 | 2020-07-21 12:00:00 | bar | 2 3 | 2 | 2020-07-21 18:00:00 | baz | 3
Note: if we did not need to SELECT * we could aggregate without the sub query:
SELECT MAX(id) FROM comments GROUP BY date_trunc('day', created_at), user_id;