ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

elephant side view close black and white feeding

If you’re new to PostgreSQL, you may have come across this error when trying to perform a simple GROUP BY. Which solution you choose depends on your situation, but let’s look at how using an aggregate function can help us.

Imagine we have the following table to store user comments:

 id | user_id |     created_at      | text 
----+---------+---------------------+------
  1 |       1 | 2020-07-21 00:00:00 | foo
  2 |       1 | 2020-07-21 12:00:00 | bar
  3 |       2 | 2020-07-21 18:00:00 | baz

We have three comments from 21 July, two from user #1 and one from user #2. If we want to select the latest comment per user per day, we may try the following

SELECT * FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

which results in the familiar error

ERROR:  column "comments.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT * FROM comments GROUP BY date_trunc('day', created_at...
               ^

What this error is telling us is that our query is unspecific.

Consider user #1. They have made two comments on 21 July: comment #1 and comment #2. If we’re grouping by day and by user, comment #1 and comment #2 are in the same group. We need to tell PostgreSQL, explicitly, how to choose between these two comments. As the error message tells us, we can do this either by including the comment ID (the primary key of the table) in the GROUP BY clause or by using an aggregate function.

Since we want to include only the latest comment, but also select all table columns, we’ll use a sub query and the aggregate function MAX on the comment ID, like this

SELECT * FROM comments 
INNER JOIN (
  SELECT MAX(comments.id) as latest_id 
  FROM comments 
  GROUP BY date_trunc('day', created_at), user_id
) subquery ON subquery.latest_id = comments.id;

which returns the expected result

 id | user_id |     created_at      | text | latest_id 
----+---------+---------------------+------+-----------
  2 |       1 | 2020-07-21 12:00:00 | bar  |         2
  3 |       2 | 2020-07-21 18:00:00 | baz  |         3

Note: if we did not need to SELECT * we could aggregate without the sub query:

SELECT MAX(id) FROM comments 
GROUP BY date_trunc('day', created_at), user_id;

Partially Update JSON data in PostgreSQL

The Situation: We’d like to update JSON data across multiple table rows

The TL;DR.: For top-level JSON values, we can use SET. For nested JSON values, we can use SET and jsonb_set

Top-level JSON: using SET

Let’s say I have the following table, with a JSON field to represent metadata associated with each book

TABLE books;
title    | author         | metadata
---------+----------------+------------------------------
VALIS    | Philip K. Dick | { "pp": 271, "date_read": nil }
Solaris  | Stanisław Lem  | { "pp": 214, "date_read": nil }
Binti    | Nnedi Okorafor | { "pp": 96, "date_read": nil }
(3 rows)

After a month of industrious reading, I’d like to update the “date_read” metadata from nil to 2020-07. I’d like, of course, for the page count and any other information in the JSON data to remain unchanged. This I can do with the following statement:

UPDATE books 
SET metadata = metadata::jsonb || '{ "date_read": "2020-07" }';
TABLE books;
title    | author         | metadata
---------+----------------+------------------------------
VALIS    | Philip K. Dick | { "pp": 271, "date_read": "2020-07" }
Solaris  | Stanisław Lem  | { "pp": 214, "date_read": "2020-07" }
Binti    | Nnedi Okorafor | { "pp": 96, "date_read": "2020-07" }
(3 rows)

Nested JSON: using jsonb_set

Let’s say I have the following table, this time with the JSON structured in a nested way

TABLE books;
  title  |     author     |                             metadata                             
---------+----------------+------------------------------------------------------------------
 VALIS   | Philip K. Dick | { "pp": 271, "record": { "date_read": nil, "format": "PAPERBACK" } }
 Solaris | Stanisław Lem  | { "pp": 214, "record": { "date_read": nil, "format": "PAPERBACK" } }
 Binti   | Nnedi Okorafor | { "pp": 96, "record": { "date_read": nil, "format": "EBOOK" } }
(3 rows)

I can update the “date_read” metadata from nil to 2020-07 with the following:

UPDATE books 
SET metadata = jsonb_set(
  metadata::jsonb, 
  '{reading_record, date_read}'::text[], 
  '2020-07'
);
TABLE books;
  title  |     author     |                             metadata                             
---------+----------------+------------------------------------------------------------------
 VALIS   | Philip K. Dick | { "pp": 271, "record": { "date_read": "2020-07", "format": "PAPERBACK" } }
 Solaris | Stanisław Lem  | { "pp": 214, "reading_record": { "date_read": "2020-07", "format": "PAPERBACK" } }
 Binti   | Nnedi Okorafor | { "pp": 96, "reading_record": { "date_read": "2020-07", "format": "EBOOK" } }
(3 rows)

References:

https://www.postgresql.org/docs/9.5/functions-json.html

Inherited Resources and Manipulating the Parameter Hash

The Inherited Resources gem is very useful at de-cluttering our controller code. And the documentation is pretty great at telling us how we can manipulate this clutter within the conventions of the gem itself. For example, if we want to, we can easily change the redirect of a controller action.

Recently I wanted to manipulate the model parameters of the params hash in a controller that uses Inherited Resources. Changing the params hash is probably (that is, almost certainly) a bad idea, but what the heck. I couldn’t think of a better work around at the time. Here’s how I did it.

The issue: We want to manipulate the params hash of a controller that uses Inherited Resources

The solution: We overwrite the build_resource_params method. We first permit the parameter keys that we expect. Then we manipulate the model parameters to our whim, before returning it.

Despite being fairly familiar with Inherited Resources and its various overwrite options, I was stuck on this for quite a while. I kept getting an ActiveModel::ForbiddenAttributesError which means (surprise) that the attributes in our params hash have not been permitted.

Let’s say we have an Animal model and a corresponding AnimalsController. We want to modify the Animal parameters in the params hash. We’d need something like this as our build_resource_params:

class AnimalsController < ApplicationController
inherit_resources
..
protected
def build_resource_params
animal_params = params.fetch(:animal, {}).permit(:name, :age, :family)
[prepare_animal_params(animal_params)]
end
def prepare_animal_params animal_params
# here we manipulate the model parameters however we want
# e.g. animal_params[:age] = 1 if animal_params[:age] < 0
...
# importantly this method must return the model params
animal_params
end

And that’s it!

Note: if we had overwritten the permitted_params to allow our Animal parameters, this overwrite wouldn’t be needed any more as the permission step now happens in the build_resource_params.

Learning a new paradigm is refreshing

Wednesday evening is my new happy place. Why? Because it’s Clojure study group time! After Berlin’s first ClojureBridge, two off-shoot project groups were formed: an art group, focusing on libraries like Quil and Overtone, and a web development group. I am part of the latter.

While we do have an over-arching web project in mind, we have for the last two weeks been working through simple programming exercises and katas in Clojure.

Usually when faced with programming tasks like these, I have some idea how or where to start. It might not be the right or best solution, but it is something. And in my experience, having worked with or at some point studied Python, Java, C++, PHP, and Ruby, this something can carry across the different languages in some form or another: a for-loop here, inheritance there, whatever.

Of course, with Clojure, it’s different. If I’m thinking about objects or for-loops, I’m not thinking in a useful way. Rather I have to think about things like data structures and — surprise — functions. This conscious shift that I make in my thinking is similar to the shift my mind makes when I switch from speaking one natural language to another. If I try to speak French while thinking in English, I may use French words, but my expression remains awkwardly English.

Importantly, though, persevering so that we become able to make this shift — whether in relation to programming or natural language — will give us new ways of thinking about things. When faced with a programming task, we will be able to think about a functional solution in addition to a procedural or Object-Oriented one.

But I’d say that the best part about learning a new programming language, particularly one from an unfamiliar paradigm, is the novelty of it all. It’s like starting over, learning to code for the first time. When I wrote my first few lines of Python, it was like magic. Everything was new and exciting. A for-loop, wow! How does it work? Nested for-loops?! Amazing! It’s the same now with Clojure: (map) what now?! Wizardry!

Conditional validation with Rails

Let’s say we have a User model that has a unique username, something like this

class User < ActiveRecord::Base
validates :username, uniqueness: true
end
view raw gistfile1.rb hosted with ❤ by GitHub

And let’s say that users can be active or inactive, and we scope our model accordingly.

class User < ActiveRecord::Base
default_scope { where(active: true) }
validates :username, uniqueness: true
end
view raw gistfile1.rb hosted with ❤ by GitHub

Now, imagine that we want usernames to be unique, but only among active users.* In other words, active users can have the same usernames of inactive users. If we have a user with the username “Alice”, and this user is inactive, then our validations should not stop us from creating a new, active user with the username “Alice”.

How do we do this? Simply, we make our uniqueness validation conditional.

validates :username, uniqueness: true, if: lambda { |user| user.active == true }
view raw gistfile1.rb hosted with ❤ by GitHub

* We can think of other real-world scenarios where we might want to do this e.g. expired e-mail addresses, archived articles

formtastic and has_one relationship: workaround

I’ve implemented many belongs_to and has_many relationships via ActiveAdmin and formtastic, but never a has_one relationship. I found a quick fix via this git issue.

Let’s take a simple example: we have users and each user “has_one” pet. The ActiveAdmin setup is similar to other relationships.

# in models/pet.rb
class Pet < ActiveRecord::Base
end
# in models/user.rb
class User < ActiveRecord::Base
has_one :pet
accepts_nested_attributes_for :pet
end
# in admin/user.rb
ActiveAdmin.register User do
permit_params :name, :age, pet_attributes: [:id, :name] # NB the id parameter..
end
view raw gistfile1.rb hosted with ❤ by GitHub

And this is the setup for our formtastic nested form

<%= semantic_nested_form_for [:admin, @user], :html => {:multipart => true} do |f| %>
<%= f.inputs 'User Information' do %>
<%= f.input :name %>
<%= f.input :age %>
<% end %>
<%= f.inputs 'User's Pet' for: [:pet, f.object.pet || Pet.new] do |pet| %>
<%= pet.input :name %>
<% end %>
<% end %>
view raw has_one hosted with ❤ by GitHub

The hack in particular is f.object.pet || Pet.new. (With many thanks to simonc!)

Using jQuery Stickem with containers of variable height

I’ve been using jQuery Stickem on a current project to “stick” the sidebar to the top of the page on scroll (see Stickem’s own demo here).

To set up Stickem, we have an outer container, an inner container, and inside this, the element that we want to be sticky. In the Stickem usage example, the outer container is the “.container” div, the inner container is the “.stickem-container” div, and the sticky element is the “.stickem” div.

<div class="container">
<div class="row stickem-container">
<div class="content">
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>
<div class="aside stickem">
I'm gonna be sticky!
</div>
</div>
</div>
view raw jqueryStickem hosted with ❤ by GitHub

As the usage example shows, we then use CSS to position the divs, and jQuery to initialise Stickem on the outer container.

The trouble comes when we have a container of variable height. For example, if we have slow loading images on the page, then stickem is initialised before these images are loaded. This means that the container height that stickem calculates will not include the height of the images and the sticky element will “unstick” sooner that we’d like.

The issue #1: Calculating the correct container height for stickem with slow loading elements

The solution #1: Initialise stickem on window.load rather than on DOM ready. This way, the height of the slow loading elements will be included in stickem’s height calculation

$(window).load(function(){
$('.container').stickem();
});
view raw window.load hosted with ❤ by GitHub

This works for slow loading elements such as images, but what about containers with tabs of variable height? Stickem calculates the height of the initially open tab, but if we change to a tab of a larger height, again we have the trouble of the sticky element “unsticking” sooner than desirable.

The issue #2: (Re-)Calculating the correct container height for stickem when using Bootstrap tabs of variable height.

The solution #2: In this case, we want to “destroy” the original stickem instance, and re-initialise stickem once the content of the newly opened tab is shown — that is, when Bootstrap’s ‘shown.bs.tab’ event fires.

$('a[data-toggle="tab"]').on('shown.bs.tab', function (e) {
var stickyContainer = $('.container');
stickyContainer.stickem().destroy();
stickyContainer.stickem();
});
view raw gistfile1.js hosted with ❤ by GitHub

A similar approach is recommended for Ajax-loaded content.

Draper CollectionDecorator and pagination with kaminari

I’m working with kaminari today, and I spent a bit of time getting Draper to co-operate with the pagination. This is a fairly well documented issue, but I figured I’d post about this for my own future reference.

The issue: getting kaminari methods (such as current_page, total_pages, etc) to work on a decorated collection.

The solution: We need to delegate the kaminari methods, so that Draper knows to ignore them.

How to do this?

  1. Create a separate PaginatingDecorator that inherits from CollectionDraper, and delegate the kaminari methods.
class PaginatingDecorator < Draper::CollectionDecorator
delegate :current_page, :total_pages, :limit_value
end
  1. In your ApplicationDecorator class, make sure that CollectionDecorator knows to look at PaginatingDecorator.
class ApplicationDecorator < Draper::Decorator
def self.collection_decorator_class
PaginatingDecorator
end
end
  1. If it’s not already the case, make sure your decorator inherits from ApplicationDecorator
class AnimalDecorator < ApplicationDecorator
decorates :animal
delegate_all
...
end

typeahead.js as a search function: multiple datasets

typeahead.js is a js library from Twitterland for building “robust typeaheads”.

What I found a bit confusing is how to get typeahead to act as a search function, so that once we’ve selected our chosen search term from the list of auto-completes, we are taken to a certain URL.

Let’s say we’ve set up typeahead (with the suggestion engine Bloodhound) for two datasets: poets and philosophers..

<div id="multiple-datasets">
<input class="typeahead" type="text" placeholder="Poets and Philosophers">
</div>
view raw gistfile1.html hosted with ❤ by GitHub
// In our DOM ready
var poets = new Bloodhound({
datumTokenizer: Bloodhound.tokenizers.obj.whitespace('name'),
queryTokenizer: Bloodhound.tokenizers.whitespace,
limit:10,
prefetch: {
// url points to a json file that contains an array of poet names
// e.g. ['Baudelaire', 'Beckett']
url: './poets.json',
filter: function(list){
return $.map(list, function(poet) { return { name: poet }; });
}
}
});
var philosophers = new Bloodhound({
datumTokenizer: Bloodhound.tokenizers.obj.whitespace('name'),
queryTokenizer: Bloodhound.tokenizers.whitespace,
limit:10,
prefetch: {
// url points to a json file that contains an array of philosopher names
// e.g. ['Descartes', 'Derrida']
url: './philosophers.json',
filter: function(list){
return $.map(list, function(philosopher) { return { name: philosopher }; });
}
}
});
poets.initialize();
philosophers.initialize();
$('#multiple-datasets .typeahead').typeahead({
highlight: true
},
{
name: 'poets',
displayKey: 'name',
source: poets.ttAdapter(),
templates: {
header: '<h3 class="league-name">Poets</h3>'
}
},
{
name: 'philosophers',
displayKey: 'name',
source: philosophers.ttAdapter(),
templates: {
header: '<h3 class="league-name">Philosophers</h3>'
}
});
view raw gistfile1.js hosted with ❤ by GitHub

When we types ‘De’ into the input field, both ‘Descartes’ and ‘Derrida’ will auto-complete. Now, we want to go to ‘[root]/philosophers/descartes’ when we select ‘Descartes’ from this auto-completed list. To do this, we must add

$('#multiple-datasets .typeahead').on('typeahead:selected',function(event, item, name){
// item is { name: 'Descartes' }
// name is 'philosophers'
var path = name+'/'+item.name;
url = 'http://localhost:3000/&#39;+path;
window.location = url;
});
view raw gistfile1.js hosted with ❤ by GitHub

after our existing typeahead code.