Kamelopard Release

After completing no small amount of refactoring, I'm pleased to announce a new release of Kamelopard, a Ruby gem for generating KML. KML, like most XML variants, requires an awful lot of typing to write by hand; Kamelopard makes it all much easier by mechanically generating the repetitive XML bits and letting the developer focus on content. An example appears below, but first, here's what has changed most recently:

  • All KML output comes via Ruby's REXML library, rather than simply as string data that happens to contain XML. This not only makes it much harder for Kamelopard developers to mess up basic syntax, it also allows examination and modification of the KML data using XML standards such as XPath (see the sketch after this list).
  • Kamelopard classes now live within a module, preventing namespace collisions. This is important for any large-ish library, and probably should have been done all along. Previously, some classes had awfully strange names designed to prevent namespace collisions; these have been given simpler, more intuitive names now that collisions aren't a problem.
  • Perhaps the biggest change is the incorporation of a large and (hopefully) comprehensive test suite. I'm a fan of test-driven development, but didn't start off on the right foot with Kamelopard. It originally shipped with a Ruby script that tried a few examples and hoped it didn't crash; that has been replaced with a full RSpec-based test suite, including tests for each class and, in particular, extensive tests of the KML output to ensure it meets the KML specification. Run these tests from the Kamelopard source with the command
    rspec spec/*
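
On that first point: since the KML is now a genuine REXML document, it can be inspected and modified with XPath before serialization. Here is a minimal sketch, assuming get_kml_document returns a REXML::Document (which the REXML change suggests, though it isn't shown above):

require 'kamelopard'
require 'rexml/document'

doc = Kamelopard::Document.instance.get_kml_document
# Print the name of every Placemark in the document. If the document
# declares a default KML namespace, the XPath may need namespace handling.
doc.elements.each('//Placemark/name') { |name| puts name.text }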

Now for some code. We recently got a data set containing several thousand locations, describing the movement of an aircraft on final approach and landing, with the request that we turn it into a Google Earth tour, where the viewer would follow the aircraft's path, flight simulator style. The actual KML result is over 56,000 lines, but the Kamelopard code is fairly simple:

require 'rubygems'
require 'kamelopard'
require 'csv'

CSV.foreach(ARGV[0]) do |row|
    time = row[0]
    lon = row[1].to_f
    lat = row[2].to_f
    alt = row[3].to_f

    # get_heading, get_tilt, get_roll, and pause are helpers defined
    # elsewhere in the full script (see the note on the math below)
    p = Kamelopard::Point.new lon, lat, alt, :absolute
    c = Kamelopard::Camera.new(p, get_heading, get_tilt, get_roll, :absolute)
    f = Kamelopard::FlyTo.new c, nil, pause, :smooth
end

puts Kamelopard::Document.instance.get_kml_document.to_s
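
Note that the loop never collects the FlyTo objects: Kamelopard's Document singleton evidently tracks everything created, which is why the final line can dump the whole document. As for the helpers, here is a purely illustrative sketch of the kind of math involved (this is not the actual implementation, which also handles tilt and roll): the heading can be taken as the great-circle bearing between consecutive points.

include Math

# Illustrative only: initial great-circle bearing, in degrees, from
# (lat1, lon1) to (lat2, lon2); all arguments in degrees.
def bearing(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180.0
  d_lon = (lon2 - lon1) * rad
  y = sin(d_lon) * cos(lat2 * rad)
  x = cos(lat1 * rad) * sin(lat2 * rad) -
      sin(lat1 * rad) * cos(lat2 * rad) * cos(d_lon)
  (atan2(y, x) / rad) % 360
end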

Along with some trigonometry and linear algebra to calculate the heading, tilt, and roll, and a CSV file of data points, the script above is all it took; the KML result runs correctly in Google Earth without further modification. Kamelopard has been published to RubyGems.org, so installation is simply

gem install kamelopard

Give it a try!

Book Recommendation: Ghost in the Wires

I recently listened to Ghost in the Wires by Kevin Mitnick as an audiobook during my long Thanksgiving vacation drives. This non-fiction book is a first-person account of Kevin Mitnick's phone and computer break-in (or what he claims was ethical hacking) adventures in the late eighties and early nineties, and it touches on the legal proceedings that followed, from 1995 on. A couple of interesting things stood out to me:

  • Kevin's tactics revolve around social engineering: techniques that capitalize on flaws in "human hardware" to gain information. The book was an eye opener in terms of how easily Kevin gained access to systems; there are countless examples of his ability to establish credibility, pretext, introduce diversions, and so on.
  • Another highlight of the book for me was learning how bug reports were exploited to gain sensitive information. Kevin gained access to bug reports on proprietary software, then used them to exploit the software and get into the systems running it. I don't usually think of my own clients' bug reports as a valuable source of information for exploiting vulnerabilities to gain user information, but there have been a few instances in the past where bugs could have been used maliciously.

Follow-up Comments

One thing that strikes me is how the internet and technology have changed since Kevin's infringements, specifically in the development of open source software. End Point works with open source operating systems, packages, monolithic ecommerce applications, and modular open source elements (e.g. Rails gems, CPAN modules). Bug reports on open source applications are easily accessible. For example, here is an article on the security vulnerabilities in recent versions of Rails.

The responsibility of keeping up with security updates shifts to the website owner leveraging these open source solutions (or the hosting provider and/or developer in some cases). I spoke with a few developers a couple of years ago about how public WordPress security vulnerabilities enable unethical hackers to easily gain access to sites running WordPress without the latest security updates. With the increased popularity of open source and the visibility of security vulnerabilities, it's important to keep up with security updates, especially those for vulnerabilities that might expose sensitive user information.

With the advancement of technology, security processes should become a normal part of development. For example, End Point has standard security processes in place, such as the use of ssh keys, firewalls for server access, and PGP encryption. Our clients also follow PCI compliance regulations, storing credit card numbers and card security codes in encrypted form only, or in some cases not at all when a third party payment processor is used. It's nice to use a third party service for storing credit card data, since the responsibility of storing sensitive cardholder data shifts to the third party (however, the interaction between your site and the third party must still be protected).

This is an interesting read (or listen) that I recommend to anyone working with sensitive information in the tech field. Learning about the social engineering techniques was fascinating in itself, and technical bits are scattered throughout the book, making it suitable for tech-savvy and non-tech-savvy readers alike.

Global Variables in Interchange Jobs

Those familiar with writing global code in Interchange know that certain global variables are duplicated across several namespaces. For example, the Values reference is found in both the main namespace ($::Values) and in Vend::Interpolate ($Values, usually from within usertags). One can also access the Values reference through the Session reference, which itself can be found in the main ($::Session), Vend ($Vend::Session), and Vend::Interpolate ($Session, usually from within usertags) namespaces, with, e.g., $::Session->{values}. Most times, as long as context allows, any of those access points are interchangeable, and you see a good mix of developers using all of them.

In recent work for a client, I had developed an actionmap that incorporated access to the session for some of its coding, certainly not an uncommon occurrence. When I work in global space, I tend to use the main namespace references, since they are available in all contexts within Interchange (or so I thought). The actionmap was constructed, tested, and put into production, where it worked as expected.

After a short period of operation, the client came to us and noted that in their actual operating procedure, the actionmap must process many more data points than we had tested it on, causing it to take much more time. For their usual workload, the process was timing out and Interchange housekeeping was reaping it.

After a brief discussion, we decided the expedient course of action was to convert the work from a browser-initiated actionmap into an Interchange job. The code was easily exposed as a usertag as well, so in very short order we had the same functionality available as a job, where the job was now triggered by the browser access previously running the actionmap.

The change resolved the immediate problem, so now all work was completing, but the client brought a new issue to our attention. The reporting from the job was not as it was supposed to be. None of the code had been modified in the changeover, and the code when run as an actionmap produced the proper reporting.

The problem eventually tracked down to that session access. When the code ran in the context of the job, the Session reference was not copied into the main (or, as it turns out, Vend::Interpolate) namespace. Without the assumed session values in place, the report produced invalid output.

To demonstrate, I constructed a simple usertag to dump the reference addresses of the five global variables mentioned above:

UserTag  ic-globals  Routine <<EOR
sub {
    return <<EOP;
      \$Session: $Session
\$Vend::Session: $Vend::Session
    \$::Session: $::Session

       \$Values: $Values
     \$::Values: $::Values
EOP
}
EOR

I then created both a test page and an IC job that only called [ic-globals]. Running them both demonstrates the problem quite clearly.

From test page:

      $Session: HASH(0xb0e1898)
$Vend::Session: HASH(0xb0e1898)
    $::Session: HASH(0xb0e1898)

       $Values: HASH(0xb0e1dd8)
     $::Values: HASH(0xb0e1dd8)

Output from job:

      $Session:
$Vend::Session: HASH(0xb221fa0)
    $::Session:

       $Values: HASH(0x926ddd8)
     $::Values: HASH(0x926ddd8)

Interchange jobs provide yet another context in which you must consider your global variable usage. In particular, if code executed in the context of a job produces inconsistencies with the same code in other contexts, review your global variable usage and confirm those variables are what you assume they are.
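
One defensive pattern, sketched here rather than taken from the actual fix: in global code that may run as a job, fall back to the Vend namespace reference, which the job output above shows is still populated.

sub get_session {
    # $::Session is not populated when running as an Interchange job,
    # but $Vend::Session is populated in both contexts (per the dumps above)
    return $::Session || $Vend::Session;
}

my $values = get_session()->{values} || {};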

Appending one PDF to another using PDF Toolkit

Ever need to manipulate PDFs? Prefer the command line? Us too. Imagine you have a contract in PDF format. When people print, sign, and re-scan the contract, that's good documentation of the signature, but the clarity of the original machine-readable text is lost and the file's size is unnecessarily large. One solution is to append the scanned signature page to the original contract document.

There are many PDF editors out there which address this need. One command line solution that works well is PDF Labs's PDF Toolkit. Let's look at how we would use PDF Toolkit to append one document to another.

pdftk contract.pdf scanned_contract.pdf cat output original_and_signed_contract.pdf

With this command we now have both contracts in their entirety. What we really want is to just take the signature page and append it. Let's revise our command a bit to only take the signature page using what PDF Toolkit calls handles.

pdftk A=contract.pdf B=scanned_contract.pdf cat A B5 output contract_with_signature_attached.pdf

We've assigned each document to a handle (A and B), which allows us to define the order of the output as well as the pages we want to select for the output. With the argument B5 PDF Toolkit knows we only want the fifth page of the scanned_contract.pdf. Ranges are also supported, so we could write something like B4-5 too.
Unfortunately, the contract was scanned upside down, so let's rotate it 180 degrees by appending -endS to the page range (S, for "south", rotates pages 180 degrees; 5-end selects page 5 through the last page).

pdftk A=contract.pdf B=scanned_contract.pdf cat A B5-endS output contract_with_signature_attached.pdf

One notable gotcha I encountered while rotating individual pages: a range like B1-endS rotates and appends the entire "B" document rather than just the first page, which makes sense in hindsight, since 1-end is itself a range spanning every page. One other gotcha to remember: escape spaces and special characters when providing the names of documents. For example, if our document was named "scanned contract.pdf" we would need to do this:

pdftk contract.pdf scanned\ contract.pdf cat output signed_contract.pdf

The PDF Toolkit is licensed under GNU General Public License (GPL) Version 2. PDF Labs's website provides a host of other examples including how to encrypt, password-protect, and repair PDFs.
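
As one more taste, here's password protection (the passwords below are placeholders; pdftk applies 128-bit encryption by default):

pdftk contract.pdf output protected_contract.pdf owner_pw foopass user_pw baz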

Performing Bulk Edits in Rails: Part 1

This will be the first article in a series, outlining how to implement a bulk edit in Rails 3.1.1 (although most any version of Rails will do). Today we'll be focusing on a simple user interface to allow the user to make a selection of records. But first, let's look at our user story.

The user story

  • User makes a selection of records and clicks "Bulk Edit" button
  • User works with the same form they would use for a regular edit, plus
    • check boxes are added next to each attribute to allow the user to indicate that the attribute should be affected by the bulk edit
    • only attributes which are the same among selected records should be populated in the form

(Image: an example UI from Google's AdWords interface for selecting multiple records for an action.)

Sounds straightforward, right? Well, there are a couple of gotchas to be worked out along the way.

Capturing the user's selection

We'd like to offer the user a form with check boxes so that, when submitted, our controller gets an array of IDs we can pass to our ActiveRecord finder. This is best implemented using check_box_tag, which isn't auto-magically wired to an ActiveRecord object; that suits us, because we don't want this form manipulating records directly. We simply want to send the user's selection of records along to a new page. Let's see what this looks like.

# app/views/search/_results.html.erb

<% @foos.each do |foo| %>
  <%= check_box_tag "foo_ids[]", foo.id  %>
<% end %>

# when posted looks like
# "foo_ids"=>["4", "3", "2"]

Now that we have an array of selected IDs, it's easy to work with the user's selection.

# app/controller/bulk_edit_controller.rb

def new
  if params[:foo_ids].is_a?(Array) && params[:foo_ids].length > 1  #let's make sure we got what we expected
    @foos = Foo.find(params[:foo_ids])
  else
    redirect_to search_path
  end
end

Refining the UI with Javascript and CSS

It's not enough just to have these check boxes. We need our "Bulk Edit" button to appear only when the user has made an appropriate selection. Let's update our view code to give our tags some class.

# app/views/search/_results.html.erb

<%= form_tag new_bulk_edit_path, :method => "GET", :id => "bulk-edit-form" do %>
  <%= submit_tag "Bulk Edit", :id => "bulk-edit-submit" %>
<% end %>

<div class="search_results">
  <% @foos.each do |foo| %>
    <%= check_box_tag "foo_ids[]", foo.id, false, :class => "downloadable"  %>
  <% end %>
</div>

# app/assets/stylesheets/search.css

#bulk-edit-submit { display: none; }

We've added the downloadable class to our check boxes, and we've added a simple form that sends data to new_bulk_edit_path. This path corresponds to the new action, which you don't typically post forms to (which is why we needed to be explicit about setting the GET method); in this case, though, we need this information before we can proceed with a new bulk edit. We've also hidden the submit button by default. We'll need some Javascript to show and hide it.

# app/assets/javascripts/search.js

$('.downloadable').click(function() {     //when an element of class downloadable is clicked
  var check_count = $('.downloadable:checked').size();  //count the number of checked elements
  if( check_count > 1 ) {
    $("#bulk-edit-submit").show();
  } else {
    $("#bulk-edit-submit").hide();
  }
});

At this point, you might have noticed that we're submitting a form with no fields in it! We could simply wrap our form_tag around our search results, but we may not always want that. For example, what if we need multiple forms to send our selection to different controllers in our application? Right now we're working on a bulk edit, but you know the client is expecting a bulk download as well, and we can't wrap the same search results partial in multiple forms. Let's see how we would populate our form using more Javascript.

# app/assets/javascripts/search.js

$('#bulk-edit-form').submit(function() {  //When the bulk edit form is submitted
  $('#bulk-edit-form input:checked').remove();  //clear all checked elements from form
  var selected_items = $('.downloadable:checked').clone();
  $('#bulk-edit-form').append(selected_items);
  return true;  //VERY IMPORTANT, needed to actually submit the form
});

This is a simple, unobtrusive way to give your forms a little more flexibility. It's also a good example of how to use :checked as a modifier on our jQuery selector.

Namespacing and Refactoring our Javascript

Knowing we'll need to implement a bulk-download form later in this same style, let's refactor out this cloning functionality.

# app/assets/javascripts/search.js

$('#bulk-edit-form').submit(function() {
  MyAppName.clone_downloadable_checkboxes_to($(this));  //You MUST wrap "this" inside $()
  return true;
});

if(!window.MyAppName) {
  MyAppName = {};  //Initialize namespace for javascript functions
}

MyAppName.clone_downloadable_checkboxes_to = function(destination) {
  destination.children("input:checked").remove();
  var selected_items = $('.downloadable:checked').clone();
  destination.append(selected_items);
};

One of the big highlights here is namespacing our Javascript function. While the chances are low that someone out there also has clone_downloadable_checkboxes_to in the global namespace, it's always best to use proper namespaces.
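
As a quick illustration of the payoff, the bulk-download form we know is coming could reuse the same helper (the bulk-download form id here is hypothetical):

# app/assets/javascripts/search.js

$('#bulk-download-form').submit(function() {
  MyAppName.clone_downloadable_checkboxes_to($(this));
  return true;
});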

Well, we've made it through the first part of our user story. The user can now check their boxes and submit a form to the appropriate Rails resource. Stay tuned to see how we implement the second half of the user story.

Advanced Rights and Roles Management in Rails

I've been working with Phunk, Brian, and Evan on a large Rails 3.1 project that has included several unique challenges. One of these challenges is a complex rights, roles, and accessibility system, which I'll discuss here.

Before I wrote any code, I researched existing authorization systems and came across this article, which lists a few of the popular authorization gems in Rails. After reading through the documentation on several of the more advanced authorization gems, I found that none offered the level of complexity we needed, where rights are layered on top of roles and can be mapped out to specific actions. Because the client and my team were most familiar with acl9, we chose to work with it and layer rights on top of the existing access control subsystem. Here's a look at the data model we were looking for:

The data model shows a has_and_belongs_to_many (or many-to-many) relationship between users and roles, and roles and rights. Things are an example model, which belong_to users. Rights map out to methods in the controller that can be performed on thing instances.
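
As a minimal sketch (class names and join tables assumed from the diagram, not taken from the actual application), the core of that data model is two has_and_belongs_to_many pairs, with the user's rights derived from the user's roles:

class Role < ActiveRecord::Base
  has_and_belongs_to_many :users   # join table: roles_users
  has_and_belongs_to_many :rights  # join table: rights_roles
end

class Right < ActiveRecord::Base
  has_and_belongs_to_many :roles
end

class User < ActiveRecord::Base
  has_and_belongs_to_many :roles

  # one way to expose the rights collection used below
  def rights
    roles.map(&:rights).flatten.uniq
  end
end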

Implementation

Starting from the admin interface, a set of rights can be assigned to a role, a standard has_and_belongs_to_many relationship.

The admin interface also includes the ability to assign roles to users, another has_and_belongs_to_many relationship.

And the user model has an instance method to determine if the user's rights include the current method or right:

class User < ActiveRecord::Base
  ...
  def can_do_method?(method)
    # detect returns the matching Right (truthy) or nil (falsy)
    self.rights.detect { |r| r.name == method }
  end
  ...
end

At the controller level without abstraction, we use the access control system to determine if the user has the ability to do that particular action, by including a conditional on the rule. Note that in these examples, the user also must be logged in, which is connected to the application's authentication system (devise).

class ThingsController < ApplicationController
  ...
  access_control do
    allow logged_in, :to => :example_right1, :if => :allow_example_right1?
    allow logged_in, :to => :example_right2, :if => :allow_example_right2?
    allow logged_in, :to => :example_right3, :if => :allow_example_right3?
  end

  def allow_example_right1?
    current_user.can_do_method?("example_right1")
  end
  def example_right1
    # actual method on Thing instance
  end
  def allow_example_right2?
    current_user.can_do_method?("example_right2")
  end
  def example_right2
    # actual method on Thing instance
  end
  def allow_example_right3?
    current_user.can_do_method?("example_right3")
  end
  def example_right3
    # actual method on Thing instance
  end
  ...
end

The controller is simplified with the following abstraction. The access control statements do not need to be modified for each new potential method/right, but the method itself must be defined.

class ThingsController < ApplicationController
  ...
  access_control do
    allow logged_in, :to => :generic_method, :if => :allow_generic_method?
  end

  def allow_generic_method?
    current_user.can_do_method?(params[:method])
  end
  def generic_method
    # only reachable after allow_generic_method? has approved params[:method]
    self.send(params[:method])
  end

  def example_right1
    # actual method on Thing instance
  end
  def example_right2
    # actual method on Thing instance
  end
  def example_right3
    # actual method on Thing instance
  end
  ...
end
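
For this abstraction to work, the requested method name has to arrive in params[:method]. One hypothetical way to route that (not from the actual application):

# config/routes.rb (hypothetical)
match 'things/:id/:method' => 'things#generic_method'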

And don't forget the handler for Acl9::AccessDenied exceptions, inside the ApplicationController, which handles both JSON and HTML responses:

class ApplicationController < ActionController::Base
  ...
  # Rescuing from any Access denied messages, generic JSON response or redirect and flash message
  rescue_from Acl9::AccessDenied do |exception|
    respond_to do |format|
      format.json do  
        render :json => { :success => false, :message => "You do not have access to do this action." }
      end 
      format.html do
        flash[:error] = 'You do not have access to view this page.'
        redirect_to root_url
      end 
    end 
  end 
end

Conclusion

Note that in actuality, our application has additional complexities, such as:

  • The relationship between rights and $subject is polymorphic, where $subject is a user or a role. This slightly complicates the has_and_belongs_to_many relationship between rights and users or roles. The can_do_method? predicate is updated to consider both user assigned rights and role assigned rights.
  • Performance is a consideration in this application, so Rails low-level caching may be leveraged to minimize accessibility lookup.
  • There is a notion of a global right and an ownership-level right, which means that a user with an ownership-level right may have the ability to do a certain method only if they own the thing. A user with a global right has the ability to do the method regardless of ownership. This complicates our can_do_method? predicate further, to determine whether the user has the global right or the ownership-level right for that method on that thing.
  • A few methods have more complex business logic determining whether or not a user can perform them. In those cases, an additional access_control allow rule is created, and a distinct conditional predicate is used to determine if the user can do that method (i.e. allow_generic_method? is not used for these actions).

These complexities aside, leveraging acl9's access control subsystem makes for a clean rights and roles management solution. Stay tuned for a follow-up article on leveraging this data model in combination with Rails' attr_accessible functionality to create elegant server-side validation.

Finding PostgreSQL temporary_file problems with tail_n_mail


(Image by Flickr user dirkjanranzijn)

PostgreSQL does as much work as it can in RAM, but sometimes it needs to (or thinks that it needs to) write things temporarily to disk. Typically, this happens on large or complex queries in which the required memory is greater than the work_mem setting.

This is usually an unwanted event: not only is going to disk much slower than keeping things in memory, but it can cause I/O contention. For very large, not-run-very-often queries, writing to disk can be warranted, but in most cases you will want to adjust the work_mem setting. Keep in mind that this is a very flexible setting: it can be adjusted globally (via the postgresql.conf file), per-user (via the ALTER USER command), and dynamically within a session (via the SET command). A good rule of thumb is to set it to something reasonable in your postgresql.conf (e.g. 8MB), and set it higher for specific users that are known to run complex queries. When you discover that a particular query run by a normal user requires a lot of memory, adjust the work_mem for that particular query or set of queries.
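
All three levels look like this (the role name and sizes are illustrative):

-- postgresql.conf: a sane global default
--   work_mem = 8MB

-- per-user, for a role known to run complex queries:
ALTER USER reporting SET work_mem = '64MB';

-- per-session, just before a known-heavy query:
SET work_mem = '256MB';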

How do you tell when your work_mem needs adjusting, or more to the point, when Postgres is writing files to disk? The key is the setting in postgresql.conf called log_temp_files. By default it is set to -1, which does no logging at all. Not very useful. A better setting is 0, which is my preferred setting: it logs all temporary files that are created. Setting log_temp_files to a positive number will only log entries whose on-disk size is greater than that number of kilobytes. Entries about temporary files used by Postgres will appear like this in your log file:

2011-01-12 16:33:34.175 EST LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp16501.0", size 130220032

The only important part is the size, in bytes. In the example above, the size is 124 MB, which is not that small a file, especially as it may be created many, many times. So the question becomes: how can we quickly parse the files and get a sense of which queries are causing excess writes to disk? Enter the tail_n_mail program, which I recently tweaked to add a "tempfile" mode for just this purpose.

To enter this mode, just name your config file with "tempfile" in its name, and have it find the lines containing the temporary file information. It's also recommended that you make use of the tempfile_limit parameter, which limits the results to the "top X" entries, as the report can get very verbose otherwise. An example config file and an example invocation via cron:

$ cat tail_n_mail.tempfile.myserver.txt

## Config file for the tail_n_mail program
## This file is automatically updated
## Last updated: Thu Nov 10 01:23:45 2011
MAILSUBJECT: Myserver tempfile sizes
EMAIL: greg@endpoint.com
FROM: postgres@myserver.com
INCLUDE: temporary file
TEMPFILE_LIMIT: 5

FILE: /var/log/pg_log/postgres-%Y-%m-%d.log

$ crontab -l | grep tempfile

## Mail a report each morning about tempfile usage:
0 5 * * * bin/tail_n_mail tnm/tail_n_mail.tempfile.myserver.txt --quiet

For the client I wrote this for, we run this once a day and it mails us a nice report giving the worst tempfile offenders. The queries are broken down in three ways:

  • Largest overall temporary file size
  • Largest arithmetic mean (average) size
  • Largest total size across all instances of the same query

Here is a slightly edited version of an actual tempfile report email:

Date: Mon Nov  7 06:39:57 2011 EST
Host: myserver.example.com
Total matches: 1342
Matches from [A] /var/log/pg_log/2011-11-08.log: 1241
Matches from [B] /var/log/pg_log/2011-11-09.log:  101
Not showing all lines: tempfile limit is 5

  Top items by arithmetic mean    |   Top items by total size
----------------------------------+-------------------------------
    860 MB (item 5, count is 1)   |   17 GB (item 4, count is 447)
    779 MB (item 1, count is 2)   |    8 GB (item 2, count is 71)
    597 MB (item 7, count is 1)   |    6 GB (item 334, count is 378)
    597 MB (item 8, count is 1)   |    6 GB (item 46, count is 104)
    596 MB (item 9, count is 1)   |    5 GB (item 3, count is 63)

[1] From file B Count: 2
Arithmetic mean is 779.38 MB, total size is 1.52 GB
Smallest temp file size: 534.75 MB (2011-11-08 12:33:14.312 EST)
Largest temp file size: 1024.00 MB (2011-11-08 16:33:14.121 EST)
First: 2011-11-08 05:30:12.541 EST
Last:  2011-11-09 03:12:22.162 EST
SELECT o.order_number, TO_CHAR(o.creation_date, 'YYYY-MM-DD HH24:MI:SS') AS order_date
FROM orders o
JOIN order_summary os ON (os.order_id = o.id)
JOIN customer c ON (o.customer = c.id)
ORDER BY creation_date DESC

[2] From file A Count: 71
Arithmetic mean is 8.31 MB, total size is 654 MB
Smallest temp file size: 12.12 MB (2011-11-08 06:12:15.012 EST)
Largest temp file size: 24.23 MB (2011-11-08 19:32:45.004 EST)
First: 2011-11-08 06:12:15.012 EST
Last:  2011-11-09 04:12:14.042 EST
CREATE TEMPORARY TABLE tmp_sales_by_month AS SELECT * FROM sales_by_month_view;

While it still needs a little polishing (such as showing which file each smallest/largest came from), it has already been an indispensable tool for finding queries that cause I/O problems via frequent and/or large temporary files.

Double habtm Relationship Between Models

Oh, man! It's been a month since my last blog article. End Pointers Brian, Evan, Phunk and I have been working on a sizable Ruby on Rails project for a client. We've been excited to work with Rails 3.1 and work on a project that presents many unique and interesting web application challenges.

Today I wanted to write about the fairly simple task of defining two has and belongs to many (or many to many) associations between the same two models, something I haven't often seen in Rails applications.

Data Model

First, let's look at the data model and discuss the business case for it. The data model excerpt contains four tables: users, groups, and the two join tables described below. Users is the standard users table; the application uses devise for user authentication. Groups are intended to be groups of users that will be allowed to do some combination of controller#action in our application. In our case, groups have many members (or users), but they also have many owners, who are allowed to manage the group. And obviously, on the user side, users can exist as a member or an owner in many groups.

The Code

The groups_users relationship is a standard has and belongs to many relationship. The User class defines its relationship to groups:

class User < ActiveRecord::Base
  ....
  has_and_belongs_to_many :groups
  ...
end

And the Group class defines its relationship to users:

class Group < ActiveRecord::Base
  ...
  has_and_belongs_to_many :users
  ...
end

Rails makes it fairly easy to define a has_and_belongs_to_many association that overrides the join table, class name, and foreign key, which is exactly what the groups_owners relationship needs. Here, the User class defines its relationship to owned_groups, specifying the join table, class name, and foreign key:

class User < ActiveRecord::Base
  ....
  has_and_belongs_to_many :owned_groups, :class_name => "Group", :join_table => "groups_owners", :foreign_key => "owner_id"
  ...
end

And the Group model has similar overrides (except in this case, we override the association foreign key):

class Group < ActiveRecord::Base
  ..
  has_and_belongs_to_many :owners, :association_foreign_key => "owner_id", :join_table => "groups_owners", :class_name => "User"
  ..
end
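
For reference, here is a sketch of a migration for the groups_owners join table (inferred from the association options above, not taken from the actual project):

class CreateGroupsOwners < ActiveRecord::Migration
  def change
    create_table :groups_owners, :id => false do |t|
      t.integer :group_id
      t.integer :owner_id  # points at users.id, per the :foreign_key above
    end
  end
end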

And that's how to define two has and belongs to many relationships between the same two models! In our case, we can easily call and modify these associations via some_user.groups, some_user.owned_groups, some_group.owners, and some_group.users.

Extras

Here I've also created a couple of instance methods on the Group and User models to make it easy to pull the aggregate of owners and users (Group) and owned_groups and groups (User):

class User < ActiveRecord::Base
  ...
  def all_groups
    (self.groups + self.owned_groups).uniq
  end
  ...
end

And:

class Group < ActiveRecord::Base
  ...
  def all_members
    (self.owners + self.users).uniq
  end
  ...
end

Performance techniques such as dropping to raw SQL or using Rails low-level caching can potentially be applied to these methods, since I would not expect them to perform well as written above. Examples of raw SQL and Rails low-level caching are described here!
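
To sketch the low-level caching idea (the cache key scheme here is assumed, and invalidating the cache when membership changes is left out):

class Group < ActiveRecord::Base
  ...
  def all_members
    # memoize the aggregate in the Rails cache, keyed per group
    Rails.cache.fetch("group/#{id}/all_members") do
      (self.owners + self.users).uniq
    end
  end
  ...
end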