Friday, August 15, 2008

Parameterized Database Methods

Can't Use an Interface
The methods that deal with the database are Read, Write, Exists & Delete.  Read, Exists & Delete are static methods.  Write is an instance method.  Because of the static methods, I can not use an interface.

Generic Method & Reflection
Is there a way that I can use a typed method and reflection to interact with the database? Using this idea of attributes on the class and passing the type as a parameter to one of these database interaction methods, can reflection be used to identify each of the fields for reading, writing, existence and deletion?

An Example
I'd like to outline a few examples.

Suppose I have a class Foobar.

I might be able to read a Foobar record using code that looks like this:
  Foobar foobar = Database.Read<Foobar>(foobarId);

I might be able to see if Foobar record exists using code like this:
  if ( Database.Exists<Foobar>(foobarId) ) { ... }

I might be able to write a Foobar record list this:
  Database.Write<Foobar>(foobar);

Or, this (as the type can be determined from the parameter):
  Database.Write(foobar);

And I could delete like this:
  Database.Delete<Foobar>(foobarId);

Questions
Would this give me compile-time information for the foobarId parameter?  If a key for a particular table was comprised of multiple values, would a params parameter be adequate? Or, would a multiple-type generic be better (e.g., Database.Delete<T, P1, P2>(P1 a1, P2 a2))?

Script Generation
Even with the generic type doing all the work, the database creation and conversion scripts would still need be be generated from the type ... but the compiled type could be used to generate this information rather than some other file format.

Database Identifier
I would still need to pass in a database identifier as well.

Generating the Code

Essential Information
The following information is necessary to generate the rest of the code:
  • database identifier
  • class/table name
  • fields - name, type & purpose
Conversion Script
In order to generate the conversion script, the previous version of this information will also be necessary.  If multiple previous version are currently in use, a conversion script can be generated for each of the previous versions in use.

Database Identifier
The database identifier tells which which connection string to pull from the configuration file.

  connectionString = ConfigurationManager
    .ConnectionStrings[
      ConfigurationManager
        .AppSettings[databaseIdentifier]
    ].ConnectionString;

Class/Table Name
The class/table name tells the name to use for the class in the CS file as well as the table in the database.

Fields
The fields identify the fields of the class and the columns of the table.  The order of these fields determines the order of the columns.

Field Name
The field name is the name of the field and the column.

Field Type
The field type is one of the following: 
  • primitive type - The primitive type is a C# type.  In the case of a string, a length for the database is also included.  
  • enumeration - The names of the enumerated values are also included.
  • other class - The other class needs to exist in order to include the key(s) to that class's table.
Field Purpose
The field purpose is one of the following:
  • passed-in key - the value for the key will be passed into the object constructor
  • code-generated key - the value for the key will be generated in the object constructor using the code provided
  • database-generated key - the value for the key will be generated in the database (this is for auto-incrementing keys)
  • required non-key - the value will be passed into the object constructor - a null value is not allowed
  • optional non-key - the value can be set via a property and will remain null if not provided
Generated Code
The tool can use this information to generate the constructor, private data members, properties and the essential methods (Read, Write, Delete, Exists).  This code will go into a file using the class/table name following this pattern: {classname}.generated.cs. Additional Developer code can go in the corresponding {classname}.supplemental.cs file.  This will allow the {classname}.cs file to be used just for the information essential for generating the rest of the class.  (The source file can, of course, be an XML file or any other file as long as it contains the essential information and the tool can process it.)

Generated SQL Scripts
The tool also generates the .SQL file which contains the database creation script including all of the stored procedures.  This file is called {classname}.sql and can then be included in a master script file for generating the database.

Previous Versions
The tool needs to have the ability to create conversion scripts from previous versions.

Where is the version information recorded?  Does it need to be recorded?

There are several ways to handle the version problem.  An existing database could be compared.  Another input file could be compared.  The generated CS and/or SQL file could be compared.  The input file could have multiple sections with one section for each of the various versions.  I will deal with the version problem at another time.

Dealing with the Database

Here is my current idea on how to deal with the database.

Mark the Class and Fields with Attributes
The Developer creates the class and adds a few attributes.

  [Database("SampleDatabase")]
  public partial class User
  {
    [Key]
  private string name;

    [Required]
  private string password;
  }

The attributes indicate how to connect the object to a database and how to handle the fields. The class is marked as partial so that generated code can be added to it.

Run a Tool
A tool is used on this class to generate the constructor, the public accessor properties, the Read, Write, Exists and Delete methods, the stored procedures used by these methods, and the create table script.

Questions About the Tool
What is this tool?  Is it run as part of the build script?  Is it a stand-alone tool?  Is it a plug-in to Visual Studio?  Does it use the text of the class as input? Or, does it work off a compiled class?  Can the other parts of the class be generated at run time?

When does the tool run again? Is it automatically part of the build process? Or, does it need to be run manually?

A Similar Tool
I worked on a similar tool, called CSFromXSD, that took an XSD file as input and generated classes that would serialize and de-serialize objects to and from XML files that conformed to the schema described in the XSD.  The XSD was part of the CSProj, and the dependencies were setup such that if the XSD was modified, the associated, generated CS file was regenerated.

This idea of starting with a CS file is a little different.  This would be generating CS-from-CS; therefore, the parsing is not an XML parsing process.  It would be handy to have the original CS file compiled so that I could use reflection on it.

I thought of this CSFromXSD when I came up with this CSFromCS idea. However, I may want to take a different approach on this tool.

Developer View
In the long run, what I am shooting for is that the Developer creates a class (or class description) identifying the name of the class and the fields that need to be persisted and the necessary database scripts and the code to interact with the database are generated.  If the input class (or class description) is modified, the scripts and code are automatically regenerated including some sort of process to update the database.

The Developer then interacts with the class a very natural way.

  if (User.Exists(username)) 
  {
    throw new Exception("User already exists."); 
  }  

  User user = new User(username, password);
  user.Write();

  string name = user.Name;
  string password = user.Password;

  user.Password = "P@55w0rd";
  user.Write();

  User user2 = User.Read(username);

  User.Delete(username);

When another field is added to the User class, the creation and update scripts are generated, the class is regenerated, and the Developer runs the scripts on the database.  Then the Developer can use the new fields in a natural way.

  User user = new User(username, password)
  {
    emailAddress = emailAddress
  }

  user.Write();

A Changing Database

From the code side of things (as opposed to the database side of things), the developer just wants to write the object and read the object.  Actually, the developer just wants to use the object and have it read and written as needed automatically.  The developer implements a given scenario adding fields to an object and/or creating new objects.  In the underlying database, the developer doesn't care what tables, columns and stored procedures are created or modified.  He just wants the objects to be stored and retrieved as close to automatic as possible.  However, as the database schema changes, the developer would like to be able to move from one schema to the other without worry.

Here's a simple example.

The Developer needs to implement a scenario called "Create User Account".  Initially, the developer identifies that a User will have a name and a password.

  public class User
  {
  string name;
  string password;
  }

This class is mapped in the database as the User table with a name and a password column.  The name is unique and used as the identifier.  A script is created to create the table with the two columns.

Later, the Developer discovers that an email address should be associated with the User as well giving this kind of a class:

  public class User
  {
  string name;
  string password;
    string emailAddress;
  }

The User table now needs to be modified to include the additional emailAddress column.  The scripts to create the table need to be modified as well.  Additionally, an upgrade procedure needs to be create that converts from the old format to the new if the database has been deployed.

The Developer should not need to be concerned with these changes to the underlying database. All the Developer should need to be concerned with is adding the emailAddress field to the User class.

Obviously, this is a simple example.  Whole additional objects might be added or fields removed or renamed.  When multiple changes have been made, these need to be cumulated.

Is it possible to remove this detail from the Developer and generate the creation scripts and upgrade procedure automatically?

Object to Database Mapping Problem

I'd like to talk about the technology to map objects to database for a minute.  I recently created the beginnings of a system to make this mapping.  I've even used systems created by others.  I'd like to investigate the LINQ technology.

Before I go off on one of these ideas, I'd like to describe the problem of mapping objects to the database as I see it.

Databases contain tables with columns.  These are 2 dimensional structures.  Objects contain fields which can be other objects.  These can be 2, 3 or even N-dimensional structures.  Going from N-dimensions to 2-dimensions requires some sort of mapping.  Additionally, in order to represent the N-dimensions in the 2-dimensional structure, it requires multiple tables that are joined together.

As features are added to a product or system, new fields and objects are added which result in additional columns or entirely new tables.

At the "leaf-level" of the problem, the data types in the database do not always line up with the data types in the objects.  For example, should a "string" in the object space be mapped to an "nchar" or "varchar" or some other type of "char" in the database ... and then of what size?  Likewise, is the granularity of the "DateTime" value in the object space the same as that in the database?  If not (and in the cases I've seen, it is not), what can be done to over come that problem?  Or is the lower granularity good enough?

Should the table have an auto-incrementing id? Or should some field or combination of fields be used as the unique identifier for a table?

Should inherited objects be stored in a single table? Or should the base class information be stored in one table while the derived object's information is stored in other?

Should an entire set of data be read in? Or should just the top-level object be read in on the initial read with the other levels read in on an as-needed basis?  Should this be a configurable option?  Should this be an option that changes depending on the size of the data set?

Can the object be serialized and de-serialized instead of being mapped to a table?

What's Next?

Where do I go next?  I could flesh out the scenarios.  I could layout a database schema.  I could create the skeleton infrastructure of the project -- web site, back end, database.  I could play with the layout of the web pages.  I could mess with the technology to map objects to database.

(I could even go look for existing software that meets the current requirements -- but that would not be any fun, would it?!)

Additional Features

I met briefly with three of the homeschool mothers -- my wife and two others.  I was right -- they came up with several other ideas for the web site.

The first thing they did was list of a bunch of categories.  It became apparent to me that allowing them to manage their list of categories is a good idea.  They then added several additional pieces of functionality: calendar, field trip schedules, links to other sites, picture history of activities with the ability to create a scrapbook from them.  I read the list of scenarios to them and got a bunch of heads nodding in agreement.  I then described the information that needs to be tracked for each resource -- both existing and desired.  Here they came up with additional information to track such as the price and where a desired resource can be found.  I suggested that the location of an existing resource might be independent of the owner.

It was only about 10 minutes of time and it was a small group, but it was a valuable 10 minutes to glean the ideas that they presented.

Oh ya! There was one other request: Make it cute!  This may be the most abstract request, but it may also be the most important.  I'll probably need to spend some time thinking about this request.

Shoebox Scenarios

I will now attempt to list out a few of the scenarios for Shoebox.  This is an initial or preliminary list -- that is, I expect that I will need to review, revise and prioritize this list over time.

Rather that put the details here now, I will just list out the activities that the Users of this system will need to be able to do.
  1. Create a User Account
  2. Create a Homeschool Group
  3. Add a User to Group (or a Group to a User)
  4. Add an Existing Resource
  5. Remove an Existing Resource
  6. Borrow a Resource
  7. Return a Resource
  8. Add a Desired Resource
  9. Cast a Vote for a Desired Resource
  10. Remove a Vote from a Desired Resource
  11. Converted a Desired Resource to an Existing Resource
  12. Remove a Desired Resource
  13. Remove a Homeschool Group
  14. Remove a User Account from a Group (or Group from an Account)
  15. Remove a User Account
  16. Add a Category
  17. Remove a Category
It seems like the system will need to have the ability to make comments about resources.  I imagine that additional functionality will be desired over time.

Shoebox Purpose & Information

Shoebox is the code name for this software.

The purpose of this software is to allow the homeschool mothers group to share and acquire resources.

This is the (initial) information about an existing resource:
  • resource id
  • name or title
  • category
  • description
  • owner
This is the information about a loaned resource:
  • resource id
  • borrower
  • date loaned
  • date returned
This is the information about a desired resource:
  • resource id
  • name or title
  • category
  • description
This is the information about a vote for a desired resource:
  • resource id
  • voter

A Code Name for a Project

Let me see if I can capture what would go into this software that my wife needs.

First of all, I need a name for this software -- a code name.  I could use what I might think the final name of the software will be, but experience tells me that that name will change before the project is completed.  That name usually changes multiple times even near the end of the project right before it releases.  For example, today, I might call this software "Resource Tracker", but my client (i.e., my wife and her group of homeschooler mothers) might want a different name before I'm done with this software.  So, what do I call it instead?

I could just use a number calling it "Project 1" or "One".  Or I could use the date and call it "August 15th" or "2008-08-15".  Or I could pick some word or combination of words like "shoebox" or "ice pick".  Often a code name comes from a historic or mythological figure, a fictional character, or the name of city or some other place.  Usually, a development group comes up with a "reason" for picking the code name.

Well, I think I'll call this project "Shoebox" as the idea of the project is to track information about resources which might just as well be written on cards and stuffed in a shoebox.

To Write or Not To Write?

My wife meets with a group of other mothers who also homeschool their kids.  They exchange ideas and plan and carry out group activities.  Not that my opinion in this area carries any weight, but I think that this is a great!  

This morning, my wife shared with me a need that this group of mothers has with regard to tracking and sharing resources and the acquisition of resources.  It sounded to me like a bit of software could meet their need.

Should I put together this software?  Where should it be hosted?  Can I figure out where to host it?  Can I figure out how to put this software together?

The answer to the "can-I" questions, is "yes" -- after all, that is what I do -- I write software.  But the answer to the "should-I" question is a bit less clear.  I'm between jobs right now -- that is, my last contract ended last Friday as I completed the project I was contracted to do, and I have not started another contract or another job yet -- I am currently looking.  Now, do I keep looking "full-time"? Or, do I spend some time working on this project for my wife?  Here's a bigger set of questions: Does my wife want me to write this software?  Does she need me to write this software?  Or am I just trying to solve problems because I think that I should solve problems? Do I just want to write this software because I would rather write software than look for software to write?  Do I want to write this software because I think I know what needs to be done?  Do I really know what needs to be done?  Do I want to write this software because I think that I will be in control of the software?

A Bit About Homeshool

My wife home-schools my three youngest kids -- they are all boys.  At different times, each of the older kids were home-schooled as well.  I think it is fair to say that I'm involved with this home-schooling as well.  My wife does almost all of the work.  How am I involved?  Short answer: mostly in a support role.  However, I have been involved in the teaching to a very small degree.

When my oldest two were in 3rd & 4th grade (or 2nd & 3rd -- I can't remember exactly any more), we participated in a home-school co-op with about 30 kids.  On my lunch hours I taught Spanish to the older part of this group twice a week.  My older brother taught twice a week as well.

This past Spring, I taught the three youngest about drawing.  I think the course lasted for about 4 or 5 weeks.  We'd meet for half an hour before I started work everyday.  Maybe I'll post something about that drawing experience some time.

Well, that's my teaching part of the home-schooling.  How does my support role work?  

Besides providing money for computers, software, books and other materials, the biggest part of my support role is to listen to my wife's ideas and then give my encouragement.  Occasionally, I share an idea with her as well -- but I really try to let her run her own show.  

Another big part I play is the role of "principal" which is the process of talking to the kids when they become less than cooperative.  I like to think that in this role I take an active part rather than a reactive part, but lately it seems to be much more reactive than active.  I need to think of and implement ways of being more active in this role.

I also keep the computers and software running.  I write code (that is, software) most of my waking hours (and sometimes in my sleep, too) which means I spend a lot of time at the computer.  I really don't like messing with computers once I'm done writing software.  That is, I need a break.  This is why I haven't kept a blog up-to-date -- this is why I drag my feet when it comes to configuring new software for the kids -- this is why I don't write software for myself after I am done writing software for someone else -- this is also why I got Macs for the house instead of Windows PCs.  (Okay, this idea of Macs instead of Windows is really because I got sick of fighting viruses literally every day.)

Why Write?

There are many reasons to write. I probably can't even come close to making an exhaustive list of the reasons. I can, however, list a few reason why I write.
  1. To clear my mind. I have a lot on my mind. I think that most people do. I have a wife and seven children. Providing for them and trying to keep up with all their needs and activities is always on my mind. My primary means for providing for them is to write software which is always on my mind, too. Sometimes I just need to do a "brain dump" and put a bunch of the stuff on my mind into writing. This allows me to think clearer once I am done.
  2. To communicate an idea. As I just stated, I write software. This is rarely done in isolation. Even if I am writing my own software, I need to communicate ideas to my "future self" -- that is, I write something today to remind me of the idea tomorrow and beyond. But, as most of the software I write is for or with someone else, I need to communicate my ideas about the software to them.
  3. To have fun. Sometimes I write to create fiction. I'd like to write a novel or even a few novels. I've started a few ideas. I wonder if a blog would be a good place to put together a novel.
  4. To record an event. I make journal entries from time to time. I really don't do this on a regular basis. Sometimes after an event I feel the need to record it, so I do. I make these journal entries in a number of different places ... which is really not the best practice. I have some in electronic format -- that is, I record these on my computer. The problem with these entries is that I change computers every couple of years. I have several entries, but I don't have a single place that I keep these and I have no good backup of them either. I also make journal entries in long-hand -- that is, by writing with a pen on paper. Again, I don't have a single place that I keep these and I have not good back up of these either. Wow, I could really use some improvement in this area!
  5. To say "Hi". I use instant message and email to talk with family and friends.
Well, there is my short list of the reasons I write.

Other Blogs

I've created just a couple of other blogs before this one.  I created one for thoughts about code -- the other for thoughts about family history.  My wife even has a couple of blog spots.  I might put a link to them some day.

Getting Started

I thought it would be handy to have a place to write.  At first, I wanted to call this "a place to write", but that name was taken (surprise!), so I changed this to "a place for thoughts".  The name is really about the same.  Am I writing? Or am I recording thoughts?