Welcome to part 3 of the SimpleBlog series, where we shall be looking at SQLAlchemy’s many-to-many relationships.
Since the end of part 2 I have done some work on the “admin” side of things with a way to create posts and comments via a web interface rather than tg-admin shell. While I’m sure that asking your visitors to log in via SSH and use the shell would have gone down well, I think that this way is a little easier. :)
You can download the code from here.
Lets get tagging!
So, we want to tag our posts so it makes it easier to find them at a later date, and also gives services such as Technorati the opportunity to index your posts by those tags. Lets create our tags table:
tags_table = Table("tags", metadata,
Column("id", Integer, primary_key=True),
Column("name", Unicode(100), unique=True)
)
Now, our Tag object is going to be a little more complex. The unique constraint on the name column will prevent any tag with the same name being created, but we want to make sure that the name for a tag is all lower case and has no leading or trailing white space, and any spaces (if it’s a multi word tag) are to be converted to a +.
For that we’re going to use a little mapper trick:
class Tag(object):
def __init__(self, name):
self.name = name
@classmethod
def normalise_tag_name(cls, tag_name):
return tag_name.lower().strip().replace(' ', '+')
def _set_name(self, name):
self._name = Tag.normalise_tag_name(name)
def _get_name(self):
return self._name
name = property(_get_name, _set_name)
assign_mapper(session.context, Tag, tag_table,
properties={
"_name": tag_table.c.name
}
)
What we’ve done with assign_mapper is to set the name column in the table definition to _name in the object. That means we can now use name as a custom property in our Tag object. This means, for any table column, you can in fact override the default behaviour of setting a column’s data to whatever you data you give it.
We will be using this heavily later in this article to make it ever so simple to update your post’s tags.
Now we need our table that will store the relationships between posts and tags:
tag_post_table = Table("tag_post", metadata,
Column("post_id", Integer, ForeignKey("posts.id")),
Column("tag_id", Integer, ForeignKey("tags.id"))
)
For this table we don’t need an object. Unless you add some custom attributes that you need to access you should never have to touch this table in your code as SQLAlchemy will keep it updated using the mappers.
The mapper for posts should be changed to add the tags property and, using backref, a posts property on our Tag object:
assign_mapper(session.context, Post, post_table,
properties={
"tags": relation(Tag, secondary=tag_post_table, backref="posts", lazy=False)
})
Notice the secondary keyword in our tags property. That tells SQLAlchemy to use the tag_post_table to manage the relations between Tag and Post. We’ve also introduced the lazy keyword here.
Telling the mapper not to be lazy means that it will retrieve all of the tags that are related to the post in one query when you access the tags property. This saves queries and should make your app faster by default. It’s not mandatory however - it’s up to you to choose whether this will be best for your particular scenario.
Lets go to the terminal and add a few tags to our first post.
Note: From here on out I’m going to omit the SQLAlchemy logging details from all terminal output unless it helps explain anything.
>>> p = Post.get(1)
>>> p.title
u'First post on simpleblog!'
>>> t1 = Tag("simpleblog")
>>> t1.flush()
>>> t2 = Tag("test tag")
>>> t2.name
'test+tag'
>>> t2.flush()
>>> p.tags.append(t1)
>>> p.tags.append(t2)
>>> p.flush()
>>> p.tags
[<simpleblog.model.Tag object at 0x2212c50>, <simpleblog.model.Tag object at 0x245e790>]
>>>
Well, would you look at that: SQLAlchemy’s many-to-many relations act just like python’s standard lists! We can now also manipulate the related posts via the Tag objects in much the same way:
>>> t1.posts
[<simpleblog.model.Post object at 0x22262b0>]
>>> t1.posts.remove(p)
>>> t1.flush()
>>> p.tags
[<simpleblog.model.Tag object at 0x245e790>]
>>>
So adding and removing relationships between objects is as simple as using .append() and .remove(). It just can’t get any more simple than that. :)
A little bit of property fairy dust
We’re going to do a little bit of object manipulation now that we’re done with an example of a standard many-to-many relationship. We want to make setting tags via a post really easy.
I know that the code for splitting a list of tags, creating tag objects if they don’t exist already and appending them to the post could be done in the controller, but adding it to the model seems to fit the MVC development process a lot better.
To get and set tags from a Post object we’re going to use a little bit of clever property magic. I’m going to give you the code and allow the copious comments to speak for themselves :)
class Post(object):
def __init__(self, *args, **kw):
for k,v in kw:
if hasattr(self, k):
attr = getattr(self, k)
attr = v
def _get_html_content(self):
return publish_parts(self.content,
writer_name="html")["html_body"]
html_content = property(_get_html_content)
def _get_comment_count(self):
return select([func.count(comment_table.c.id)],
comment_table.c.post_id == self.id).scalar()
comment_count = property(_get_comment_count)
def _get_tags(self):
return self._tags
def _set_tags(self, data):
"""Updates the Post's tags, adds missing Tag objects and
keeps the relationships in sync.
@param data Either a comma separated string or a list of strings"""
if isinstance(data, basestring):
# Turn the data into a list
data = data.split(",")
# Create a normalised list of the new tag names
new_tags = [Tag.normalise_tag_name(tag) for tag in data]
# Create a list of the current tag names
old_tags = [t.name for t in self._tags]
if not old_tags == new_tags:
# Create a transaction for this little section
trans = session.create_transaction()
# Clear our old tags
self._tags = []
for tag_name in new_tags:
tag_obj = Tag.get_by(_name=tag_name)
if not tag_obj:
# We need to create this tag in the DB
tag_obj = Tag(tag_name)
self._tags.append(tag_obj)
# We're done. Commit the transaction
trans.commit()
tags = property(_get_tags, _set_tags)
Whew, big chunk of code! But we’re not done yet, we also need to update our mappers:
assign_mapper(session.context, Post, post_table,
properties={
"_tags": relation(Tag, secondary=tag_post_table, backref="posts", lazy=False)
}
)
assign_mapper(session.context, Comment, comment_table,
properties={
"post": relation(Post, backref="comments")
}
)
assign_mapper(session.context, Tag, tag_table,
properties={
"_name": tag_table.c.name
}
)
As you can see, I’ve changed the tags property to _tags. This means we can still use SQLAlchemy’s clever eager loading but create our own tags property.
Now, to set a post’s tags we can do one of two things.
- Set the tags using a string:
p.tags = "simpleblog, tags, technorati" - Set the tags using a list:
p.tags = ["simpleblog", "tags", "technorati"]
Both of the above give the same result. The methods above will automagically determine if the tags need to be updated and which Tag objects need to be created first.
Well, that’s given you an introduction to many-to-many relationships, plus a little more object tweaking. Part 4 will consist of self-referential relationships and a little bit of advanced data mapping in the form of searching.
I’ll update the controller and templates so that adding tags is part of the admin panel ready for the beginning of the next part in the series.
There will also be a little more polish to the templates and admin functionality next time round. This week has been very busy for me so some of the templates were a little rushed. Once we’re done tweaking the model, part of this series will cover Widgets and Validators to make user input nice and safe :)
anthony baxter | 28-Oct-06 at 5:20 am | Permalink
Spelling mistake: contect should be context.
elvelind grandin | 28-Oct-06 at 10:13 am | Permalink
There seem to be something wrong with you css.
“A little bit of property fairy dust” is invisible for me.
Otherwise, As nice as the first two parts.
elvelind grandin | 28-Oct-06 at 10:15 am | Permalink
To follow up. It looks like your header levels are mixed around.
“SimpleBlog Part 3..” Should be H2 not H3
and “A little bit of property fairy dust” should be H3 and not H1.
Splee | 28-Oct-06 at 1:19 pm | Permalink
Bah, that’s what I get for posting things late at night. Thanks to the both of you, I’ve fixed both those issues now.
Adam Jones | 29-Oct-06 at 9:24 am | Permalink
I just discovered this series, it is looking pretty good. I’ve been using sqlobject for pretty much everything I do in TG. Now that I see (instead of just knowing) what sqlalchemy can do I am going to seriously consider it.
links for 2006-10-29 at Blue Sky On Mars | 29-Oct-06 at 10:23 am | Permalink
[...] SimpleBlog Part 3 - many-to-many relationships at splee.blog (tags: turbogears programming python sqlalchemy) [...]
anthony baxter | 29-Oct-06 at 1:12 pm | Permalink
Also - types.StringType? Wow, welcome to 1999. :-)
isinstance(data, basestring).
Aside from the age of this construct, and the ugliness (you should almost never use the types module), if the current code gets unicode, it does a very bad thing. for instance,
u’foo,bar’ will result in tags ['f', 'o', ',', 'b', 'a', 'r']
Splee | 30-Oct-06 at 1:42 pm | Permalink
@Adam - I’m glad that the series has made a difference for you. This was the reason I started writing it; people can be told “oh yes, SQLAlchemy is better because⦔, but until you actually see it in action it’s difficult to quantify.
@Anthony - Fixed. Thanks for the pointer.
Cliff Wells | 14-Nov-06 at 7:26 am | Permalink
Lee,
As always, you rock. Thanks for another great set of articles.
Nadav Samet | 19-Nov-06 at 8:41 am | Permalink
Hi,
I’ve wrote a few days ago a tutorial on implementing tags, with a focus on doing search queries on them.
dax | 25-Nov-06 at 6:00 pm | Permalink
Tried downloading the file. Seems the file simpleblog-part3a.zip uses ActiveMapper and not the one discussed in this post.
Splee | 25-Nov-06 at 6:07 pm | Permalink
@dax: The application code uses ‘plain’ SA. Since (at the moment) the identity definitions use ActiveMapper when you quickstart a project, I’ve left them intact to prevent any confusion.
If you want to substitute the ActiveMapper identity definitions, you can use this code.
Matthew Webber | 27-Nov-06 at 4:02 pm | Permalink
Thanks for all this useful information. A couple of minor edits :
(1) tagstable = Table( …
I prefer tagtable (singular not plural) for consistency with the naming convention used elsewhere.
(2)@param data Either a comma separated string …
When I first saw this my eye tried to interpret it as some kind of decorator, due to the @ sign and the fact that the line above it was blank, so I didn’t immediately realize it was a continued comment. Is this format a docstring convention?
Matthew Webber | 27-Nov-06 at 4:27 pm | Permalink
One other problem : The code provided for downloaded fails when you add a comment. The generated SQL does not have quote marks around the value of author-email. On the other hand, author-name works ok (underscore replace by dash to avoid WordPress formatting).
I’m not quite sure what is causing this. Maybe related to author-email being String rather than unicode?
Also, underscore in a comment in obviously interpreted as a formatting directive. Is there a list of such directives (and how to escape them). A quick look at the WordPress site suprsiingly failed to turn up such a list.
Matthew Webber | 27-Nov-06 at 5:48 pm | Permalink
Since you’re written some admin code, it might be worth adding a note explaining how to get to it. Here’s what I did -
Firstly, add a user via the shell
/usr/bin/tg-admin shell
u = User(username=’example’, emailaddress=’example@example.com’, displayname=’Mr Example’, password=’secret’)
g = Group(groupname=’admin’, display_name=’Administrators’)
g.users.append(u)
Then python start-simpleblog.py and point your browser to localhost:8080/admin
Matthew Webber | 28-Nov-06 at 4:27 pm | Permalink
I previously said
>> The code provided for downloaded fails when you add a comment. The generated SQL does not have quote marks around the value of author-email. On the other hand, author-name works ok (underscore replace by dash to avoid WordPress formatting).
Matthew Webber | 28-Nov-06 at 4:31 pm | Permalink
To continue my previous entry. This appears to be a problem with SQLalchemy and PostgreSQL. As a workaround, in controllers.py, change this statement in the comment function (undescore replaced by dash)
c = model.Comment(post=p, author-name=author-name,
author-email=author-email,
content=content)
so that you have
authoremail=str(authoremail),
It’s a workaround, but it might help someone.
Matthew Webber | 28-Nov-06 at 4:47 pm | Permalink
Silly me, the above comment applies to part 4.
Splee | 28-Nov-06 at 10:28 pm | Permalink
No problem. FYI, you can either escape underscores with a backslash or escape each bit of code with a backtick.
Thanks for all the feedback.
Cory | 22-Feb-07 at 6:50 am | Permalink
Instead of placing tag parsing into the model (and mixing the domain logic with the binding logic), try placing into into a validator. In Turbogears validators serve two purposes: validate and bind. From the docs, it is very easy to overlook the binding aspect of it.
Example:
class TagConvertor(String):
def topython(self, value, state):
#do string to tag(s) conversion
def frompython(self, value):
#do tag object to string conversion
You would still have to deal with the persistence aspect in either the model or a DAO layer, but you’d have slightly easier time tracking the domain logic.
Then again, this could be like take a flamethrower to an anthill.
(Let’s see it if the formatting work out, eh?)
Sam D. | 28-Jul-07 at 1:37 pm | Permalink
The Tag class could be improved as follows:
’split()’ with no arguments will remove runs of one or more whitespace characters.
mackieboy | 01-Aug-07 at 3:56 am | Permalink
help…really am a starter here.
I’m having problems when calling flush-
>>>t1.flush
there is this raise exceptions.SQLError…how can i fix it?
Im using python 2.4.
thanks