Jump to content

The ultimate community for Ruby on Rails developers.


Photo

Eager loading calculated values for many objects


  • Please log in to reply
6 replies to this topic

#1 wildbug

wildbug

    Passenger

  • Members
  • 5 posts

Posted 28 March 2014 - 05:40 PM

Here's a simple example using the standard user/post/forum idea. (Actual application is more complex.)
 
class User < ActiveRecord::Base

  has_many :posts

  def has_posted_recently?
    Post.recent.where(:user_id => self).exists?
  end

  def post_count
    posts.count
  end

  def involvement_score
    post_count * (has_posted_recently? ? 1.0 : 0.25)
  end
end
I'd like to calculate a score based on some factors that are not held by columns, e.g., like if some id simply exists in another table or getting a count without actually loading those values. These extra queries are not so bad for a single object, but for a batch of them, I could suddenly have N * M queries.
 
User.all.each do |user|
  puts user.involvement_score # 2 queries per user
end
Is there a way to do something like what includes() does for associations or some other way to "eager load" these kinds of data and have them associated with an object in a batch-like fashion?

I guess this might be two questions. How to store/access read-only, calculated values on an AR object? And how to calculate and populate these values on a batch of objects with a single query?

#2 wildbug

wildbug

    Passenger

  • Members
  • 5 posts

Posted 10 April 2014 - 02:22 PM   Best Answer

Just to follow up, I solved this using the "inline views" described in the article "Eager Loading Calculations with Database Views in Rails."

 

I wrote each calculation query individually and LEFT JOINed them together, wrapping the whole thing in yet another outer query to allow using includes().  Subclassing ActiveRecord::Base lets me use the derived table as an association on other models.

 

 



#3 uberllama

uberllama

    Passenger

  • Members
  • 2 posts

Posted 16 October 2014 - 02:24 PM

Would love to see your solution. Cheers.



#4 wildbug

wildbug

    Passenger

  • Members
  • 5 posts

Posted 17 October 2014 - 01:58 AM

Here's a solution based upon the user/posts example in my original post:

# app/model/user.rb
# Attributes:  username
class User < ActiveRecord::Base
  has_many :posts
  has_one :user_involvement
end

# app/model/post.rb
# Attributes: title, body
class Post < ActiveRecord::Base
  belongs_to :user

  scope :recent, -> { where arel_table[:created_at].gt( 1.month.ago ) }
end

# app/model/user_involvement.rb
# (No table exists in the database for this class)
class UserInvolvement < ActiveRecord::Base

  belongs_to :user

  default_scope {

    posts = Post.arel_table

    post_count_query = Post
      .group(posts[:user_id])
      .select([posts[:user_id], 'COUNT(*) AS post_count'])

    post_recent_query = Post.recent
      .select(['DISTINCT posts.user_id', "#{ActiveRecord::Base.connection.quoted_true} AS recently_posted"])

    query = %Q{
      SELECT
        users.id AS user_id,
        posts.post_count,
        recent_posts.recently_posted
      FROM
        users
        LEFT OUTER JOIN (#{post_count_query.to_sql}) AS posts ON (users.id = posts.user_id)
        LEFT OUTER JOIN (#{post_recent_query.to_sql}) AS recent_posts ON (users.id = recent_posts.user_id)
    }

    select(arel_table[Arel.star])
      .from("(#{query}) AS #{table_name}")
  }

  def self.columns
    @columns ||= [
      ActiveRecord::ConnectionAdapters::Column.new(:user_id, nil, :integer),
      ActiveRecord::ConnectionAdapters::Column.new(:post_count, nil, :integer),
      ActiveRecord::ConnectionAdapters::Column.new(:recently_posted, nil, :boolean)
    ]
  end

  def readonly?
    true
  end

  def post_count
    super.presence || 0
  end

  def recently_posted
    super.presence || false
  end

  def score
    post_count * (recently_posted ? 1.0 : 0.25)
  end

end

As you can see, UserInvolvement is a subclass of ActiveRecord::Base, which can be defined on "real," table-backed classes as an association.  This means that you can use includes() on it.

irb(main):065:0> User.includes(:user_involvement).collect(&:user_involvement).collect(&:score)
  User Load (0.2ms)  SELECT "users".* FROM "users" 
  UserInvolvement Load (0.3ms)  SELECT "user_involvements".* FROM (
 SELECT
 users.id AS user_id,
 posts.post_count,
 recent_posts.recently_posted
 FROM
 users
 LEFT OUTER JOIN (SELECT "posts"."user_id", COUNT(*) AS post_count FROM "posts" GROUP BY "posts"."user_id") AS posts ON (users.id = posts.user_id)
 LEFT OUTER JOIN (SELECT DISTINCT posts.user_id, 't' AS recently_posted FROM "posts" WHERE ("posts"."created_at" > '2014-09-17 01:43:37.004420')) AS recent_posts ON (users.id = recent_posts.user_id)
 ) AS user_involvements WHERE "user_involvements"."user_id" IN ('1', '2')

=> [1.0, 0.0]

This is pretty cool.  We've pulled in two "informational" queries in one shot and stuffed it in an association.

 

There is a caveat, however.  This will break if includes() decides to use the LEFT JOIN strategy, which it falls back on when an included column is referenced in an ORDER BY clause, for instance:

irb(main):071:0> User.order('user_involvement.post_count').includes(:user_involvement)
  SQL (0.3ms)  SELECT "users"."id" AS t0_r0, "users"."username" AS t0_r1, "users"."created_at" AS t0_r2, 
  "users"."updated_at" AS t0_r3, "user_involvements"."id" AS t1_r0, "user_involvements"."user_id" AS t1_r1, 
  "user_involvements"."post_count" AS t1_r2, "user_involvements"."recently_posted" AS t1_r3 FROM "users" 
  LEFT OUTER JOIN "user_involvements" ON "user_involvements"."user_id" = "users"."id"
  ORDER BY user_involvement.post_count

SQLite3::SQLException: no such table: user_involvements: ...
ActiveRecord::StatementInvalid: SQLite3::SQLException: no such table: user_involvements

The default_scope gets "lost."



#5 wildbug

wildbug

    Passenger

  • Members
  • 5 posts

Posted 17 October 2014 - 02:37 AM

I've actually had to approach this problem again quite recently, and this time I tried a different strategy.

 

The "inline view" method demonstrated above has some drawbacks.  It generates invalid SQL under certain conditions.  It adds an association to every User object, despite the "involvement score" perhaps only being needed under specific circumstances.  And it returns subclass instances of ActiveRecord::Base, which is a fairly "heavy" object.

 

ActiveRecord::Base.connection has a select_all method that returns an array of plain hashes, keyed by result column name.

 

I wrote a plain old Ruby class that uses a class method to execute the query and build and populate an array of instances.

 

One problem is that PostgreSQL returns strings for the values instead of the SQL column type, even when they are just integers, so those values need to be cast to a Fixnum or whatever type they should be.

class Involvement
  attr_reader :post_count, :recently_posted

  def self.by_user(users = User.scoped)
    # returns Hash keyed by user.id
    Hash[results(users).map do |row|
      [row['user_id'].to_i, new(row['post_count'].presence || 0, !!row['recently_posted'])]
    end]
  end

  def self.query(users)
    posts = Post.arel_table

    post_count_query = Post
      .group(posts[:user_id])
      .select([posts[:user_id], 'COUNT(*) AS post_count'])

    post_recent_query = Post.recent
      .select(['DISTINCT posts.user_id', "#{ActiveRecord::Base.connection.quoted_true} AS recently_posted"])

    %Q{
      SELECT
        users.id AS user_id,
        posts.post_count,
        recent_posts.recently_posted
      FROM
        users
        LEFT OUTER JOIN (#{post_count_query.to_sql}) AS posts ON (users.id = posts.user_id)
        LEFT OUTER JOIN (#{post_recent_query.to_sql}) AS recent_posts ON (users.id = recent_posts.user_id)
    }
  end

  def self.results(users)
    ActiveRecord::Base.connection.select_all(query users)
  end

  private_class_method :new

  def initialize(post_count, recently_posted)
    @post_count, @recently_posted = post_count, recently_posted
  end

  def score
    post_count * (recently_posted ? 1.0 : 0.25)
  end

end
irb(main):079:0> Involvement.by_user.collect { |_, v| v.score }
   (0.7ms)  
 SELECT
 users.id AS user_id,
 posts.post_count,
 recent_posts.recently_posted
 FROM
 users
 LEFT OUTER JOIN (SELECT "posts"."user_id", COUNT(*) AS post_count FROM "posts" GROUP BY "posts"."user_id") AS posts ON (users.id = posts.user_id)
 LEFT OUTER JOIN (SELECT DISTINCT posts.user_id, 't' AS recently_posted FROM "posts" WHERE ("posts"."created_at" > '2014-09-17 02:31:34.322552')) AS recent_posts ON (users.id = recent_posts.user_id)
 
=> [1.0, 0.0]

This approach to aggregate/summary queries keeps the query logic encapsulated in its own class.  If these are pretty specific queries that don't warrant a full-blown association on a model object, this is a good idea.  It also uses a simple class, which is going to be less expensive than instantiating an ActiveRecord::Base object.  It accepts an ActiveRecord::Relation, so performance-wise it's similar to using includes(). Returning a Hash keyed by user.id lets you correlate Users with their Involvement.

 

One drawback is having to cast the return values to their expected Ruby types.  In my real solution, I wrote a simple base class that makes use of the ActiveRecord::ConnectionAdapters::Column* classes, which have a type_cast method backed by more sophisticated logic for dates, booleans, etc.



#6 uberllama

uberllama

    Passenger

  • Members
  • 2 posts

Posted 17 October 2014 - 12:14 PM

Are you using Rails 3? Rails 4 should cast the values correctly.



#7 wildbug

wildbug

    Passenger

  • Members
  • 5 posts

Posted 18 October 2014 - 03:05 PM

Yes, I am using Rails 3.  FWIW, the SQLite3 adapter does cast the values correctly; the PostgreSQL adapter does not.  (Also my "casting" line in the second example is not quite perfect, particularly for the boolean, but apparently I can't edit my posts after some amount of time, so ....)






1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users