Friday, September 5, 2008

Reading Data

My Current Challenge
I'm working on the generic data object to data table mapping problem right now. Last time I worked on this problem I focused on the write functionality. I got the code that goes through the object using reflection working and then I spent some time working on the SQL script generation. Now I've got the basic read functionality working using similar reflection code. However, I'm running into a bit of a snag.

For most data tables, it is just easier to create an auto-incrementing Id field and then use that Id as a foreign key from other tables whenever a record -- a thus an object -- needs to be referenced. However, when looking up records, there are usually other columns that are more useful.


For example, on the Shoebox project, I have a User class that contains a name, a location and an email-address. The name will probably be split into first and last names. The location will be a reference to a City class. The email-address, however, needs to be unique, but I don't want to use the email-address a the primary key as this would be cumbersome to use as a foreign key in other tables. Additionally, the email-address might be updated, but the User would still be the same user. Therefore, it makes sense to use an auto-incrementing Id as the primary key.


In the table, the email-address column is indexed and marked as requiring unique values. This will allow for quick lookup and prevent duplicates. The problem is to come up with the right names to use in the generic code for reading records.


Read By Id
To read using the Id (or Identity), I can use a generic method like this:
public static T Read<T, P>(P id) { ... }


And then use it like this:
User user = DB.Read<User, long>(1);


This method will use the stored procedure that reads using the Id.


Read By Indexed Values
To read by an indexed value such as the email-address, I've got two problems:
  1. I need to be able to specify the column name(s) and type(s).
  2. I could have more than a single column.
If it were only a single column, I could use the DataAttribute I created to specify that single column and then have a generic method that looks like this:
public static T ReadByIndex<T, P>(P value) { ... }

And then use it like this:
User user = DB.ReadByIndex<User, string>("foo.bar@gmail.com");

Handling the Multiple Columns Problem
However, when there are multiple columns, what do I do?

The Alphabetical Option
Can I just add in another type parameter to my generic ReadByIndex method and assume that the columns are in a predetermined order -- like alphabetical?

Like this:
public static T ReadByIndex<T, P1, P2>(P1 a1, P2, a2) { ... }

With this:
Population population = DB.ReadByIndex<Population, string, string>("Denver", "Colorado");

This example assumes a class called Population with properties City and State that combined would allow a lookup. Alphabetically, City comes before State, so the first parameter is the city and the second is the state.

If more that 2 columns were necessary, another generic ReadByIndex method could be created providing addition parameters.

The Name-Value Option
Or, should I call out the name of the column along with its value?

Like this:
public static T Read<T>(params NameValue[] args) { ... }

With this:
Population population = DB.Read<Population>(new NameValue("City", "Denver"), new NameValue("State", "Colorado"));

This example assumes the same class called Population as well as a class called NameValue with a construtor like this:
public NameValue(string name, object value) { ... }

Which one do I like better?
First of all, I really like the first "read by Id" method -- Read<T, P>(P id). For all classes I will have the auto-incrementing Id, so this will make reading any record simple. I also like that this method has a type for the parameter, so that a long or an int can be used as appropriate for a given class.

Now, about the Pro's and Con's of the other methods... hmmm...

The "alphabetical option" is intriguing as its usage is just a matter of setting the Data attribute on the right properties in the class. This option also overcomes the lack-of-type-support-for-the-value drawback that the "name-value option" has. However, there are a couple of big drawbacks:
  1. At the point of usage, there is nothing to indicate what alphabetical parameters are being used.
  2. This only allows for a single set of indexes. Something would need to be modified in order to allow a second set of indexes.
The "name-value option" over comes the drawbacks of the "alphabetical option" -- 1) since the parameters are named, they can go in any order, and 2) to use a different combination of indexes, it is a simple matter to just use the desired columns. However, it has its own drawbacks:
  1. There is no compile-time support to make sure that the right names are used.
  2. The values are passed as the type object so there is not compile-time support for the value's type.
So, there are the trade-offs: lack of type support versus inflexible obscurity.

The Manually Coding of it All - A Non-Option
Another option would be to abandon the generic nature of this idea and just code up each piece as necessary ... but this is a "non-option" as it really just defeats the purpose of the idea.

My Choice
Well, I think I'm going to go with the "Name-Value Option". The purpose of this idea is to create a generic and flexible way to read and write objects to and from tables. The inflexible, alphabetical option would just get in the way due to its inflexibility. With the name-value option, a specific wrapper method can be created for indexed lookup and tests can be written to ensure that the column/property names work at runtime.

To illustrate this point about a wrapper method, using the Population class example, I would add a method to the Population class like this:
public static Population Read(string city, string state)
{
return DB.Read<Population>(new NameValue("City", city),
new NameValue("State", state));
}

Such that it could be used like this:
Population population = Population.Read("Denver", "Colorado");

Meanwhile, the "read by id" method could be used like this:
Population population = DB.Read<Population, long>(21);

This method, of course, could be put into a wrapper method in the Population class as well:
public static Read(long id) { ... }

Such that it could be used like this:
Population population = Population.Read(21);

That's enough for now.

No comments: