SimpleBlog Part 3 - many-to-many relationships

Welcome to part 3 of the SimpleBlog series, where we shall be looking at SQLAlchemy’s many-to-many relationships.

Since the end of part 2 I have done some work on the “admin” side of things with a way to create posts and comments via a web interface rather than tg-admin shell. While I’m sure that asking your visitors to log in via SSH and use the shell would have gone down well, I think that this way is a little easier. :)

You can download the code from here.

Lets get tagging!

So, we want to tag our posts so it makes it easier to find them at a later date, and also gives services such as Technorati the opportunity to index your posts by those tags. Lets create our tags table:

tags_table = Table("tags", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", Unicode(100), unique=True)
)

Now, our Tag object is going to be a little more complex. The unique constraint on the name column will prevent any tag with the same name being created, but we want to make sure that the name for a tag is all lower case and has no leading or trailing white space, and any spaces (if it’s a multi word tag) are to be converted to a +.

For that we’re going to use a little mapper trick:

class Tag(object):
    def __init__(self, name):
        self.name = name

    @classmethod
    def normalise_tag_name(cls, tag_name):
        return tag_name.lower().strip().replace(' ', '+')

    def _set_name(self, name):
        self._name = Tag.normalise_tag_name(name)

    def _get_name(self):
        return self._name

    name = property(_get_name, _set_name)

assign_mapper(session.context, Tag, tag_table,
    properties={
        "_name": tag_table.c.name
    }
)

What we’ve done with assign_mapper is to set the name column in the table definition to _name in the object. That means we can now use name as a custom property in our Tag object. This means, for any table column, you can in fact override the default behaviour of setting a column’s data to whatever you data you give it.

We will be using this heavily later in this article to make it ever so simple to update your post’s tags.

Now we need our table that will store the relationships between posts and tags:

tag_post_table = Table("tag_post", metadata,
    Column("post_id", Integer, ForeignKey("posts.id")),
    Column("tag_id", Integer, ForeignKey("tags.id"))
)

For this table we don’t need an object. Unless you add some custom attributes that you need to access you should never have to touch this table in your code as SQLAlchemy will keep it updated using the mappers.

The mapper for posts should be changed to add the tags property and, using backref, a posts property on our Tag object:

assign_mapper(session.context, Post, post_table,
    properties={
        "tags": relation(Tag, secondary=tag_post_table, backref="posts", lazy=False)
    })

Notice the secondary keyword in our tags property. That tells SQLAlchemy to use the tag_post_table to manage the relations between Tag and Post. We’ve also introduced the lazy keyword here.

Telling the mapper not to be lazy means that it will retrieve all of the tags that are related to the post in one query when you access the tags property. This saves queries and should make your app faster by default. It’s not mandatory however - it’s up to you to choose whether this will be best for your particular scenario.

Lets go to the terminal and add a few tags to our first post.

Note: From here on out I’m going to omit the SQLAlchemy logging details from all terminal output unless it helps explain anything.

>>> p = Post.get(1)
>>> p.title
u'First post on simpleblog!'
>>> t1 = Tag("simpleblog")
>>> t1.flush()
>>> t2 = Tag("test tag")
>>> t2.name
'test+tag'
>>> t2.flush()
>>> p.tags.append(t1)
>>> p.tags.append(t2)
>>> p.flush()
>>> p.tags
[<simpleblog.model.Tag object at 0x2212c50>, <simpleblog.model.Tag object at 0x245e790>]
>>>

Well, would you look at that: SQLAlchemy’s many-to-many relations act just like python’s standard lists! We can now also manipulate the related posts via the Tag objects in much the same way:

>>> t1.posts
[<simpleblog.model.Post object at 0x22262b0>]
>>> t1.posts.remove(p)
>>> t1.flush()
>>> p.tags
[<simpleblog.model.Tag object at 0x245e790>]
>>>

So adding and removing relationships between objects is as simple as using .append() and .remove(). It just can’t get any more simple than that. :)

A little bit of property fairy dust

We’re going to do a little bit of object manipulation now that we’re done with an example of a standard many-to-many relationship. We want to make setting tags via a post really easy.

I know that the code for splitting a list of tags, creating tag objects if they don’t exist already and appending them to the post could be done in the controller, but adding it to the model seems to fit the MVC development process a lot better.

To get and set tags from a Post object we’re going to use a little bit of clever property magic. I’m going to give you the code and allow the copious comments to speak for themselves :)

class Post(object):

    def __init__(self, *args, **kw):
        for k,v in kw:
            if hasattr(self, k):
                attr = getattr(self, k)
                attr = v

    def _get_html_content(self):
        return publish_parts(self.content,
                    writer_name="html")["html_body"]
    html_content = property(_get_html_content)

    def _get_comment_count(self):
        return select([func.count(comment_table.c.id)],
                    comment_table.c.post_id == self.id).scalar()
    comment_count = property(_get_comment_count)

    def _get_tags(self):
        return self._tags

    def _set_tags(self, data):
        """Updates the Post's tags, adds missing Tag objects and
        keeps the relationships in sync.

        @param data Either a comma separated string or a list of strings"""
        if isinstance(data, basestring):
            # Turn the data into a list
            data = data.split(",")

        # Create a normalised list of the new tag names
        new_tags = [Tag.normalise_tag_name(tag) for tag in data]

        # Create a list of the current tag names
        old_tags = [t.name for t in self._tags]

        if not old_tags == new_tags:
            # Create a transaction for this little section
            trans = session.create_transaction()

            # Clear our old tags
            self._tags = []

            for tag_name in new_tags:
                tag_obj = Tag.get_by(_name=tag_name)
                if not tag_obj:
                    # We need to create this tag in the DB
                    tag_obj = Tag(tag_name)
                self._tags.append(tag_obj)

            # We're done.  Commit the transaction
            trans.commit()

    tags = property(_get_tags, _set_tags)

Whew, big chunk of code! But we’re not done yet, we also need to update our mappers:

assign_mapper(session.context, Post, post_table,
    properties={
        "_tags": relation(Tag, secondary=tag_post_table, backref="posts", lazy=False)
    }
)
assign_mapper(session.context, Comment, comment_table,
    properties={
        "post": relation(Post, backref="comments")
    }
)    
assign_mapper(session.context, Tag, tag_table,
    properties={
        "_name": tag_table.c.name
    }
)

As you can see, I’ve changed the tags property to _tags. This means we can still use SQLAlchemy’s clever eager loading but create our own tags property.

Now, to set a post’s tags we can do one of two things.

  1. Set the tags using a string: p.tags = "simpleblog, tags, technorati"
  2. Set the tags using a list: p.tags = ["simpleblog", "tags", "technorati"]

Both of the above give the same result. The methods above will automagically determine if the tags need to be updated and which Tag objects need to be created first.

Well, that’s given you an introduction to many-to-many relationships, plus a little more object tweaking. Part 4 will consist of self-referential relationships and a little bit of advanced data mapping in the form of searching.

I’ll update the controller and templates so that adding tags is part of the admin panel ready for the beginning of the next part in the series.

There will also be a little more polish to the templates and admin functionality next time round. This week has been very busy for me so some of the templates were a little rushed. Once we’re done tweaking the model, part of this series will cover Widgets and Validators to make user input nice and safe :)