ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

elephant side view close black and white feeding

If you’re new to PostgreSQL, you may have come across this error when trying to perform a simple GROUP BY. Which solution you choose depends on your situation, but let’s look at how using an aggregate function can help us.

Imagine we have the following table to store user comments:

 id | user_id |     created_at      | text 
----+---------+---------------------+------
  1 |       1 | 2020-07-21 00:00:00 | foo
  2 |       1 | 2020-07-21 12:00:00 | bar
  3 |       2 | 2020-07-21 18:00:00 | baz

We have three comments from 21 July, two from user #1 and one from user #2. If we want to select the latest comment per user per day, we may try the following

SELECT * FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

which results in the familiar error

ERROR:  column "comments.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT * FROM comments GROUP BY date_trunc('day', created_at...
               ^

What this error is telling us is that our query is unspecific.

Consider user #1. They have made two comments on 21 July: comment #1 and comment #2. If we’re grouping by day and by user, comment #1 and comment #2 are in the same group. We need to tell PostgreSQL, explicitly, how to choose between these two comments. As the error message tells us, we can do this either by including the comment ID (the primary key of the table) in the GROUP BY clause or by using an aggregate function.

Since we want to include only the latest comment, but also select all table columns, we’ll use a sub query and the aggregate function MAX on the comment ID, like this

SELECT * FROM comments 
INNER JOIN (
  SELECT MAX(comments.id) as latest_id 
  FROM comments 
  GROUP BY date_trunc('day', created_at), user_id
) subquery ON subquery.latest_id = comments.id;

which returns the expected result

 id | user_id |     created_at      | text | latest_id 
----+---------+---------------------+------+-----------
  2 |       1 | 2020-07-21 12:00:00 | bar  |         2
  3 |       2 | 2020-07-21 18:00:00 | baz  |         3

Note: if we did not need to SELECT * we could aggregate without the sub query:

SELECT MAX(id) FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

Partially Update JSON data in PostgreSQL

The Situation: We’d like to update JSON data across multiple table rows

The TL;DR.: For top-level JSON values, we can use SET. For nested JSON values, we can use SET and jsonb_set

Top-level JSON: using SET

Let’s say I have the following table, with a JSON field to represent metadata associated with each book

TABLE books;
title    | author         | metadata
---------+----------------+------------------------------
VALIS    | Philip K. Dick | { "pp": 271, "date_read": nil }
Solaris  | Stanisław Lem  | { "pp": 214, "date_read": nil }
Binti    | Nnedi Okorafor | { "pp": 96, "date_read": nil }
(3 rows)

After a month of industrious reading, I’d like to update the “date_read” metadata from nil to 2020-07. I’d like, of course, for the page count and any other information in the JSON data to remain unchanged. This I can do with the following statement:

UPDATE books 
SET metadata = metadata::jsonb || '{ "date_read": "2020-07" }';
TABLE books;
title    | author         | metadata
---------+----------------+------------------------------
VALIS    | Philip K. Dick | { "pp": 271, "date_read": "2020-07" }
Solaris  | Stanisław Lem  | { "pp": 214, "date_read": "2020-07" }
Binti    | Nnedi Okorafor | { "pp": 96, "date_read": "2020-07" }
(3 rows)

Nested JSON: using jsonb_set

Let’s say I have the following table, this time with the JSON structured in a nested way

TABLE books;
  title  |     author     |                             metadata                             
---------+----------------+------------------------------------------------------------------
 VALIS   | Philip K. Dick | { "pp": 271, "record": { "date_read": nil, "format": "PAPERBACK" } }
 Solaris | Stanisław Lem  | { "pp": 214, "record": { "date_read": nil, "format": "PAPERBACK" } }
 Binti   | Nnedi Okorafor | { "pp": 96, "record": { "date_read": nil, "format": "EBOOK" } }
(3 rows)

I can update the “date_read” metadata from nil to 2020-07 with the following:

UPDATE books 
SET metadata = jsonb_set(
  metadata::jsonb, 
  '{reading_record, date_read}'::text[], 
  '2020-07'
);
TABLE books;
  title  |     author     |                             metadata                             
---------+----------------+------------------------------------------------------------------
 VALIS   | Philip K. Dick | { "pp": 271, "record": { "date_read": "2020-07", "format": "PAPERBACK" } }
 Solaris | Stanisław Lem  | { "pp": 214, "reading_record": { "date_read": "2020-07", "format": "PAPERBACK" } }
 Binti   | Nnedi Okorafor | { "pp": 96, "reading_record": { "date_read": "2020-07", "format": "EBOOK" } }
(3 rows)

References:

https://www.postgresql.org/docs/9.5/functions-json.html