Chapter 15 - The Index

📝 As an app user, I want to filter, order, and limit users, so that I can find users more quickly.

Our application was stable and working well, but our company has recently seen massive growth and thus the number of users in our production database has increased massively. This is great news, but now we’re seeing some new issues at scale. To simulate this scale, we’ll add some users to our development database. If you’ve been testing manually, this will cause the users root-level field to take a very long time to respond or not respond at all. The first thing we’ll want to do is limit the number of records that can be retrieved at once. For the sake of brevity, arguments, pagination logic, and tests are left as an excercise for the reader. We’ll just hardcode a typical Active Record query chain for now.

Add a filter, order, and limit to our field.

app/graphql/types/query_type.rb (lines omitted)

module Types
  class QueryType < Types::BaseObject
    # ...

    def users
      User.unarchived.order(:name).limit(100)
    end

    # ...
  end
end

Start the rails console.

Terminal

bin/rails c

Add a million users with random data.

Rails Console

> def random_attributes
  {
    name: SecureRandom.alphanumeric,
    email: "#{SecureRandom.alphanumeric}@example.com",
    archived: (rand > 0.1),
  }
end
1000.times do
  User.insert_all(1000.times.map { random_attributes })
end
ActiveRecord::Base.connection.exec_query("VACUUM ANALYZE")

With this you’ll have at least 1 million users in your database with about a 10% chance of being unarchived. With such a big table, we need to think about what new indexes, if any, will help us. We already know what we’re trying to optimize, User.unarchived.order(:name).limit(100). To judge the effectiveness, we’ll be using EXPLAIN to check query plans. We’ll also check some internal data provided by PostgreSQL.

Check the baseline query plan.

Rails Console

> User.unarchived.order(:name).limit(100).explain
  User Load (221.9ms)  SELECT "users".* FROM "users" WHERE "users"."archived" = $1 ORDER BY "users"."name" ASC LIMIT $2  [["archived", false], ["LIMIT", 100]]
=>
EXPLAIN for: SELECT "users".* FROM "users" WHERE "users"."archived" = $1 ORDER BY "users"."name" ASC LIMIT $2 [["archived", false], ["LIMIT", 100]]
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Limit  (cost=20075.13..20086.80 rows=100 width=71)
   ->  Gather Merge  (cost=20075.13..29687.97 rows=82390 width=71)
         Workers Planned: 2
         ->  Sort  (cost=19075.11..19178.10 rows=41195 width=71)
               Sort Key: name
               ->  Parallel Seq Scan on users  (cost=0.00..17500.67 rows=41195 width=71)
                     Filter: (NOT archived)

Check the baseline index size.

Rails Console

> ActiveRecord::Base.connection.exec_query("SELECT pg_size_pretty (pg_indexes_size('users'))").first
  SQL (2.1ms)  SELECT pg_size_pretty (pg_indexes_size('users'))
=> {"pg_size_pretty"=>"84 MB"}

We’ve gotten some pretty good information here. First, in the query plan we can see a breakdown of steps with descriptions that should at least sound familiar. We’re mostly interested in the rightmost cost of the topmost line. We can see that we’re at about 20,000 right now. Your machine may be slightly different. To improve things, let’s try some indexes. First, we’ll add a pretty basic multicolumn index on archived and name.

Generate a migration.

Terminal

bin/rails g migration AddIndexToUsersOnArchivedName

Fill in the new migration.

db/migrate/[timestamp]_add_index_to_users_on_archived_name.rb

class AddIndexToUsersOnArchivedName < ActiveRecord::Migration[7.0]
  def change
    add_index :users, [:archived, :name]
  end
end

Run the migration.

Terminal

bin/rails db:migrate

See how the new index affects the query plan.

Rails Console

> User.unarchived.order(:name).limit(100).explain
  User Load (18.8ms)  SELECT "users".* FROM "users" WHERE "users"."archived" = $1 ORDER BY "users"."name" ASC LIMIT $2  [["archived", false], ["LIMIT", 100]]
=>
EXPLAIN for: SELECT "users".* FROM "users" WHERE "users"."archived" = $1 ORDER BY "users"."name" ASC LIMIT $2 [["archived", false], ["LIMIT", 100]]
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..37.89 rows=100 width=71)
   ->  Index Scan using index_users_on_archived_and_name on users  (cost=0.42..37037.45 rows=98867 width=71)
         Index Cond: (archived = false)
(3 rows)

Check the new index size.

Rails Console

> ActiveRecord::Base.connection.exec_query("SELECT pg_size_pretty (pg_indexes_size('users'))").first
  SQL (0.6ms)  SELECT pg_size_pretty (pg_indexes_size('users'))
=> {"pg_size_pretty"=>"123 MB"}

Saying this performs better would be quite the understatement. The cost, at around 38, is a fraction of our previous cost. All the database needs to do is walk the index and stop at 100 rows, hence the term “index scan.” It does come at a price. Every index you add will take up space, but at 39 MB this is more than worth it. Next, we’ll try a partial index. The benefit of partial indexes is that not all records are indexed. This is great when you’re usually only interested in a subset of your table. For us, that subset will be unarchived users.

Roll back the migration.

Terminal

bin/rails db:rollback

Update the new migration with a partial index.

db/migrate/[timestamp]_add_index_to_users_on_archived_name.rb

class AddIndexToUsersOnArchivedName < ActiveRecord::Migration[7.0]
  def change
    add_index :users, :name, where: "NOT archived",
      name: "index_users_on_name_where_not_archived"
  end
end

Run the migration.

Terminal

bin/rails db:migrate

See how this index affects the query plan.

Rails Console

> User.unarchived.order(:name).limit(100).explain
  User Load (4.2ms)  SELECT "users".* FROM "users" WHERE "users"."archived" = $1 ORDER BY "users"."name" ASC LIMIT $2  [["archived", false], ["LIMIT", 100]]
=>
EXPLAIN for: SELECT "users".* FROM "users" WHERE "users"."archived" = $1 ORDER BY "users"."name" ASC LIMIT $2 [["archived", false], ["LIMIT", 100]]
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..57.86 rows=100 width=71)
   ->  Index Scan using index_users_on_name_where_not_archived on users  (cost=0.42..56794.51 rows=98867 width=71)
(2 rows)

Check the new index size.

Rails Console

> ActiveRecord::Base.connection.exec_query("SELECT pg_size_pretty (pg_indexes_size('users'))").first
  SQL (0.6ms)  SELECT pg_size_pretty (pg_indexes_size('users'))
=> {"pg_size_pretty"=>"88 MB"}

It’s no surprise that, at 58, we see huge cost savings with this query plan as well. As expected, we’ll simply perform another index scan. Looking at the total size of our indexes, we see that this index has a much smaller footprint, adding just 4 MB. Partial indexes really shine when you’re only concerned about small subsets of the table, possibly from many different queries. In this particular case, it would take 10 partial indexes of similar size to match the size of our previous index. No two situations are alike, so when in doubt check the numbers.

Update the test for this field.

test/integration/types/query_type/users_test.rb

require "test_helper"

class Types::QueryType::UsersTest < ActionDispatch::IntegrationTest
  test "users" do
    query = <<~GRAPHQL
      {
        users {
          id
          name
          email
          createdAt
          updatedAt
          archived
        }
      }
    GRAPHQL

    post graphql_path, params: { query: query }

    users_in_email_order = [
      users(:adrian),
      users(:chantel),
      users(:daniel),
      users(:sarah),
    ]

    assert_equal(
      {
        "data" => {
          "users" => users_in_email_order.map { |user|
            {
              "id" => user.id.to_s,
              "name" => user.name,
              "email" => user.email,
              "createdAt" => user.created_at.iso8601,
              "updatedAt" => user.updated_at.iso8601,
              "archived" => user.archived,
            }
          },
        },
      },
      @response.parsed_body
    )
  end
end

Run the tests.

Terminal

bin/rails t test/integration/types/query_type/users_test.rb

Check for regressions by running all tests.

Terminal

bin/rails t

Success!

✅ Make a commit

✅ As an app user, I want to filter, order, and limit users, so that I can find users more quickly.

Next: Chapter 16 - The Backfill