ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

elephant side view close black and white feeding

If you’re new to PostgreSQL, you may have come across this error when trying to perform a simple GROUP BY. Which solution you choose depends on your situation, but let’s look at how using an aggregate function can help us.

Imagine we have the following table to store user comments:

 id | user_id |     created_at      | text 
----+---------+---------------------+------
  1 |       1 | 2020-07-21 00:00:00 | foo
  2 |       1 | 2020-07-21 12:00:00 | bar
  3 |       2 | 2020-07-21 18:00:00 | baz

We have three comments from 21 July, two from user #1 and one from user #2. If we want to select the latest comment per user per day, we may try the following

SELECT * FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

which results in the familiar error

ERROR:  column "comments.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT * FROM comments GROUP BY date_trunc('day', created_at...
               ^

What this error is telling us is that our query is unspecific.

Consider user #1. They have made two comments on 21 July: comment #1 and comment #2. If we’re grouping by day and by user, comment #1 and comment #2 are in the same group. We need to tell PostgreSQL, explicitly, how to choose between these two comments. As the error message tells us, we can do this either by including the comment ID (the primary key of the table) in the GROUP BY clause or by using an aggregate function.

Since we want to include only the latest comment, but also select all table columns, we’ll use a sub query and the aggregate function MAX on the comment ID, like this

SELECT * FROM comments 
INNER JOIN (
  SELECT MAX(comments.id) as latest_id 
  FROM comments 
  GROUP BY date_trunc('day', created_at), user_id
) subquery ON subquery.latest_id = comments.id;

which returns the expected result

 id | user_id |     created_at      | text | latest_id 
----+---------+---------------------+------+-----------
  2 |       1 | 2020-07-21 12:00:00 | bar  |         2
  3 |       2 | 2020-07-21 18:00:00 | baz  |         3

Note: if we did not need to SELECT * we could aggregate without the sub query:

SELECT MAX(id) FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

Learning a new paradigm is refreshing

Wednesday evening is my new happy place. Why? Because it’s Clojure study group time! After Berlin’s first ClojureBridge, two off-shoot project groups were formed: an art group, focusing on libraries like Quil and Overtone, and a web development group. I am part of the latter.

While we do have an over-arching web project in mind, we have for the last two weeks been working through simple programming exercises and katas in Clojure.

Usually when faced with programming tasks like these, I have some idea how or where to start. It might not be the right or best solution, but it is something. And in my experience, having worked with or at some point studied Python, Java, C++, PHP, and Ruby, this something can carry across the different languages in some form or another: a for-loop here, inheritance there, whatever.

Of course, with Clojure, it’s different. If I’m thinking about objects or for-loops, I’m not thinking in a useful way. Rather I have to think about things like data structures and — surprise — functions. This conscious shift that I make in my thinking is similar to the shift my mind makes when I switch from speaking one natural language to another. If I try to speak French while thinking in English, I may use French words, but my expression remains awkwardly English.

Importantly, though, persevering so that we become able to make this shift — whether in relation to programming or natural language — will give us new ways of thinking about things. When faced with a programming task, we will be able to think about a functional solution in addition to a procedural or Object-Oriented one.

But I’d say that the best part about learning a new programming language, particularly one from an unfamiliar paradigm, is the novelty of it all. It’s like starting over, learning to code for the first time. When I wrote my first few lines of Python, it was like magic. Everything was new and exciting. A for-loop, wow! How does it work? Nested for-loops?! Amazing! It’s the same now with Clojure: (map) what now?! Wizardry!