?

Log in

No account? Create an account

Previous Entry | Next Entry

Well, this IS a creativity blog, and I'm recording the creative process for the most part. So even work is creative. And some of it IS fun.

After all, problem-solving is a creative act. Even when it doesn't come in colour. Or is even visable at all. Right? Of course, right!

Tackling one of the fundamental problems for web site CMS development: How to deal with arbitrary sort order for items that may be in a different arbitrary sort order in a different presentation.

Let's say you have five articles in a list. You have a widget that obtains those articles and presents them. We'll give the articles a name by Alpha and a sort order by number:

A - 1
B - 2
C - 3
D - 4
E - 5

This is easily stored in a table in a DBA call it "sorts":

item_id   | sort_order
{id of A} | 1
{id of B} | 2
{id of C} | 3
{id of D} | 4
{id of E} | 5

And joined in an SQL statement:

select {various properties from item table}, sort_order
from item, sorts
were item.id=sort_id
order by sort_order

If you want article D to appear on the list before article C, you'd change the sort order association.

item_id   | sort_order
{id of A} | 1
{id of B} | 2
{id of D} | 3
{id of C} | 4
{id of E} | 5

Then you just add sort order integers based on the max(sort_order) of the table, using auto-increment:

item_id   | sort_order
{id of A} | 1
{id of B} | 2
{id of D} | 3
{id of C} | 4
{id of E} | 5
{id of F} | 6
{id of G} | 7
...

This does have the disadvantage that all new items are always "at the bottom". Although, one could also add all new items with the sort order default of 0, which may be better because they draw attention to themselves—either way, that sounds like an optional toggle for the sort-order item to work out. The other problem, though, is that if an item isn't inserted into the sort order, then its sort order is null, so it will not show up in any of the queries above. We could write all queries that use sort order as an outer join, though, and return 0 arbitrarily for null:

select {various properties from item table},
  if(sort_order is null, 0, sort_order) as sort_order
from item outer join sorts
on item.id=sort_id
order by sort_order

But that does make debugging a little difficult if you're not aware of the sort_order table when you first begin to develop for the CMS. Still, I don't see a way around it if ANY sort of arbitrary sort order is allowed, so I think this is one of those compromises you just need to make sometimes.

Now, does it scale? Technically, you could have these sort orders in one huge table, so these sort orders would persist for all articles.

item_id   | sort_order
..other articles..
{id of A} | 135
..other articles..
{id of B} | 273
..other articles..
{id of C} | 310
..other articles..
{id of D} | 448
..other articles..
{id of E} | 529
..and so on..
So in a CMS, when you manipulate the sort order, you want to keep the sort order integer, and swap the precise value of the one associated with article C for the one associated with article D in the sort_order table when you update this sort order.
item_id   | sort_order
{id of A} | 135
{id of B} | 273
{id of D} | 310
{id of C} | 448
{id of E} | 529

That seems easy enough to maintain and the items are retrieved using an index very quickly and early in the query. Heh, heh—and this approach has the advantage of being able to use the way-cool jquery sortable interface to establish the new sort order, and the data-insertion will always only be the sum of the items whose orders were swapped, so quick to update, too. The index will be capped by the number of articles, so that's OK, too. Yep. It scales.

Now... if you want to make your arbitrary sort orders more granualar for different presentations, you need to tie each presentation instance to a unique identifier. For example, a good unique identifier would be the URL+the ID of the widget on the page (call this new table granular_sorts):

item_id   | unique_id                 | sort_order
{id of A} | {/page1.html##widgetname} | 1
{id of B} | {/page1.html##widgetname} | 2
{id of D} | {/page1.html##widgetname} | 3
{id of C} | {/page1.html##widgetname} | 4
{id of E} | {/page1.html##widgetname} | 5

NB: This does assume that the widget does not have multiple sort orders, otherwise, the criteria for these other sort orders need to be included in a unique way in this ID.

But that doesn't index very quickly, since the URL+ID is a bit of arbitrary text. Sure, you can index arbitrary text, but integers are just so much faster. So, once we add an easily indexed table for an additional join like:

select {various properties from item table},
  (
  if granular_sort_id is not null, granular_sort_order, 
    if(sort_order is null, 0, sort_order)
  ) as sort_order
from item
 outer join sorts
   on item.id = sorts.item_id
 outer join granular_sorts
   on granular_sorts.unique_id = {current unique identifier}
      and item.id=granular_sorts.item_id
order by sort_order

Now, that solution has a disadvantage of needing an additional outer join, but since the granular table is liable to be rather small since not many unique granular instances are liable to exist on a web site, that would have minimal impact on scaling. But we can't guarantee that, since you may get a user so persnickety as to have a unique presentation for hundreds of widgets. And what if they start automatically generating great masses of unique widgets?

So perhaps keeping the unique ID and the article IDs in the same table:

item_id   | unique_id                 | sort_order
{id of A} | NULL                      | 135
{id of B} | NULL                      | 273
{id of D} | NULL                      | 310
{id of C} | NULL                      | 448
{id of E} | NULL                      | 529
{id of A} | {/page1.html##widgetname} | 530
{id of B} | {/page1.html##widgetname} | 531
{id of D} | {/page1.html##widgetname} | 532
{id of C} | {/page1.html##widgetname} | 533
{id of E} | {/page1.html##widgetname} | 534

And then moving the data into the CMS programming language to choose which one is better, like (perl):

map {$_->{sort_order} = $query{item_id}->{unique_id} || $query{''}->{unique_id} } @objects;
sort ($a->{sort_order} <=> $b->{sort_order}) @objects;

Then again, this means all debugging needs to move back and forth between the programming and the database engine. Not that it doesn't, usually, but this guarantees it.

So... I think I'll go for the outer join option, for an additional reason: becuase I'd like to keep the code as re-usable as possible between different languages, since I need both perl and PHP versions. I think I'll implement it as a sort-order object as well. That way I can create switches between how new inserts are done, and/or implement outer-join versus straight-join subclasses, the better to tweak them later, my precisouss...

This is going to be fun to implement. Not whizz-bang, but lots of good problem-solving chaff to grind through here.