Optimize Rails Queries
The argument Rails doesn’t scale well is commonly used to downplay the importance of the language and the framework. Yet, many organization use it as the core of their operations, from small start-ups to platforms with millions of users. A good question to ask ourselves is whether Rails is actually unable to scale, or whether the problem is buried deeper somewhere.
A slow algorithm in C++ is as slow as in Ruby. In Haskell, poor architecture design cripples performance just as badly as in Ruby. Rails is simple to understand, however, to attain mastery we need to spend as much time as we would invest in any other framework or language. Just because we moved to a faster language, we can’t expect problems to magically vanish.
Rails application performance depends on several variables, and one amongst those variables is the number of queries performed to complete an operation. The lower the number of calls to the database, the lower the memory allocation therefore shorter the length of the operation.
The N + 1 query problem is one such problem.
What do N + 1 Queries mean?
N + 1 query is an inefficient way to query a database. Let’s say we have a one-to-many relationship between YoutubeVideo and YoutubeVideoVotes.
class YoutubeVideo < ActiveRecord::Base has_many :youtube_video_votesendclass YoutubeVideoVote < ActiveRecord::Base belongs_to :youtube_videoend
To load the latest youtube videos and print votes on the video following code will work:
youtube_videos = YoutubeVideo.limit(5)youtube_videos.each do |video| video.youtube_video_votes.each do |video_vote| puts "voted: #{video_vote.vote}" endend
Here total 6 database queries will be fired. One query to fetch youtube videos and N queries to fetch youtube video votes for each video, hence the name is N+1 queries.
YoutubeVideo Load (0.5ms) SELECT "youtube_videos".* FROM "youtube_videos" LIMIT $1 [["LIMIT", 5]]YoutubeVideoVote Load (0.5ms) SELECT "youtube_video_votes".* FROM "youtube_video_votes" WHERE "youtube_video_votes"."youtube_video_id" = $1 [["youtube_video_id", 1]]YoutubeVideoVote Load (0.2ms) SELECT "youtube_video_votes".* FROM "youtube_video_votes" WHERE "youtube_video_votes"."youtube_video_id" = $1 [["youtube_video_id", 3]]YoutubeVideoVote Load (0.2ms) SELECT "youtube_video_votes".* FROM "youtube_video_votes" WHERE "youtube_video_votes"."youtube_video_id" = $1 [["youtube_video_id", 4]]YoutubeVideoVote Load (0.2ms) SELECT "youtube_video_votes".* FROM "youtube_video_votes" WHERE "youtube_video_votes"."youtube_video_id" = $1 [["youtube_video_id", 5]]YoutubeVideoVote Load (0.2ms) SELECT "youtube_video_votes".* FROM "youtube_video_votes" WHERE "youtube_video_votes"."youtube_video_id" = $1 [["youtube_video_id", 6]]
This loading approach is inefficient. If we’ve got a thousand videos, we’ll have to connect with the database 1001 times. Remote database connections are not free, and a strong performance penalty is introduced for each connection.
How to Prevent the N+1 Query Problem?
To make sure we don’t end up running unnecessary queries while looping through an object, we need to use eager loading. It will solve N + 1 query problems. We can use rails preload, eager_load, or includes.
Preload
It loads association data by executing two database queries.
YoutubeVideo.preload(:youtube_video_votes).to_a#=>
YoutubeVideo Load (1.5ms) SELECT "youtube_videos".* FROM "youtube_videos" LIMIT $1 [["LIMIT", 5]]YoutubeVideoVote Load (0.6ms) SELECT "youtube_video_votes".* FROM "youtube_video_votes" WHERE "youtube_video_votes"."youtube_video_id" IN ($1, $2, $3, $4, $5) [["youtube_video_id", 1], ["youtube_video_id", 3], ["youtube_video_id", 4], ["youtube_video_id", 5], ["youtube_video_id", 6]]
Our application would only execute only two database queries even with 1000 youtube video records.
For comparison, the time taken to load and display 1000 youtube videos without preload loading is 987.6 milliseconds, and with preloading, just 9.4 milliseconds. This is approximately 100 times quicker. A huge gap.
The following query will raise an error, as preload execute separate queries to load association data:
YoutubeVideo.preload(:youtube_video_votes).where(youtube_video_votes: { vote: 'down' }).to_a#=>ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: missing FROM-clause entry for table "youtube_video_votes")LINE 1: ...CT "youtube_videos".* FROM "youtube_videos" WHERE "youtube_v...^: SELECT "youtube_videos".* FROM "youtube_videos" WHERE "youtube_video_votes"."vote" = $1
Includes
Like preload, includes executes separate queries to load association records. But Includes switches to a single LEFT OUTER JOIN query to fetch association data if the where condition is applied.
YoutubeVideo.includes(:youtube_video_votes).where(youtube_video_votes: { vote: 'down' }).to_a#=>
SELECT "youtube_videos"."id" AS t0_r0, "youtube_videos"."user_id" AS t0_r1, "youtube_videos"."url" AS t0_r2, "youtube_videos"."title" AS t0_r3, "youtube_videos"."description" AS t0_r4, "youtube_videos"."created_at" AS t0_r5, "youtube_videos"."updated_at" AS t0_r6, "youtube_video_votes"."id" AS t1_r0, "youtube_video_votes"."youtube_video_id" AS t1_r1, "youtube_video_votes"."user_id" AS t1_r2, "youtube_video_votes"."vote" AS t1_r3, "youtube_video_votes"."created_at" AS t1_r4, "youtube_video_votes"."updated_at" AS t1_r5 FROM "youtube_videos" LEFT OUTER JOIN "youtube_video_votes" ON "youtube_video_votes"."youtube_video_id" = "youtube_videos"."id" WHERE "youtube_video_votes"."vote" = $1 [["vote", 2]]
To make includes to use a single query for queries without any where condition we can use references.
YoutubeVideo.includes(:youtube_video_votes).references(:youtube_video_votes).to_a#=>SELECT "youtube_videos"."id" AS t0_r0, "youtube_videos"."user_id" AS t0_r1, "youtube_videos"."url" AS t0_r2, "youtube_videos"."title" AS t0_r3, "youtube_videos"."description" AS t0_r4, "youtube_videos"."created_at" AS t0_r5, "youtube_videos"."updated_at" AS t0_r6, "youtube_video_votes"."id" AS t1_r0, "youtube_video_votes"."youtube_video_id" AS t1_r1, "youtube_video_votes"."user_id" AS t1_r2, "youtube_video_votes"."vote" AS t1_r3, "youtube_video_votes"."created_at" AS t1_r4, "youtube_video_votes"."updated_at" AS t1_r5 FROM "youtube_videos" LEFT OUTER JOIN "youtube_video_votes" ON "youtube_video_votes"."youtube_video_id" = "youtube_videos"."id"
Eager load
Eager loading makes LEFT OUTER JOIN query to load association records.
YoutubeVideo.eager_load(:youtube_video_votes).to_a#=>SELECT "youtube_videos"."id" AS t0_r0, "youtube_videos"."user_id" AS t0_r1, "youtube_videos"."url" AS t0_r2, "youtube_videos"."title" AS t0_r3, "youtube_videos"."description" AS t0_r4, "youtube_videos"."created_at" AS t0_r5, "youtube_videos"."updated_at" AS t0_r6, "youtube_video_votes"."id" AS t1_r0, "youtube_video_votes"."youtube_video_id" AS t1_r1, "youtube_video_votes"."user_id" AS t1_r2, "youtube_video_votes"."vote" AS t1_r3, "youtube_video_votes"."created_at" AS t1_r4, "youtube_video_votes"."updated_at" AS t1_r5 FROM "youtube_videos" LEFT OUTER JOIN "youtube_video_votes" ON "youtube_video_votes"."youtube_video_id" = "youtube_videos"."id"
This is same as includes with where condition or references.
We can use any of the above options to eliminate N + 1 query problems.
Use Bullet Gem
Initially, we have to ensure that our developers understand the problem and while developing new queries they should be able to identify and remove N + 1 queries by using eager loading.
Pull Request review is one of the ways to transfer knowledge and identify N + 1 problems, but sometimes even after paying attention errors slip to production. So we should use some tool to identify these queries on CI. Bullet gem is one such tool.
The Bullet gem is designed to help you improve the efficiency of your application by lowering the number of queries it makes. When you build your application, it will monitor your queries and inform you when you should add eager loading (N+1 queries), when you are using eager loading that is not needed and when you should use counter cache.
Happy building! :)