Thursday, September 11, 2008

Three Questions about Object Oriented Basics

The basic concepts that object-oriented programming deals with are encapsulation, inheritance and polymorphism. I don't intend to fully explain these concepts here. Rather, I intend to answer a specific question about each that I recently received.

Encapsulation
Do you have an example of interacting with something public but not private?

Encapsulation is, of course, the idea of hiding implementation details behind a public interface.

As far as an example goes, here is a contrived one. Suppose that we have a class describing a person.
public class Person { ... }
Suppose also that we need to be able to get the person's name in three different ways -- full name, first name and last name. We create a public set of methods to retrieve these values.
public class Person
{
public String GetFullName() { ... }
public String GetFirstName() { ... }
public String GetLastName() { ... }
}
The class encapsulates the data behind these methods. From the perspective of using this class, we don't care how that data is stored.
Person person = new Person( ... );

... person.GetFullName() ...
... person.GetFirstName() ...
... person.GetLastName() ...
From the perspective of implementing this class, we have several choices for how to store this information.

We could store the full name in a single variable and then return that value for the GetFullName method while parsing out the first and last name for the other two methods.

Assuming that we have a Split method in the String class that returns a string array by dividing up the string using a delimiter:
public class String
{
public String[] Split(Char delimiter) { ... }
}
We can write the Person implementation like this:
public class Person
{
private String fullName;

public String GetFullName() { return fullName; }
public String GetFirstName() { return fullName.Split(' ')[0]; }
public String GetLastName() { return fullName.Split(' ')[1]; }
}
As a different alternative, we could store the first and last names as separate variables and return the individual values for the GetFirstName and GetLastName methods while returning a concatenated value for the GetFullName method.
public class Person
{
private String firstName;
private String lastName;

public String GetFullName() { return firstName + " " + lastName; }
public String GetFirstName() { return firstName; }
public String GetLastName() { return lastName; }
}
We could store the first and last names in a dictionary and retrieve the parts as needed.
public class Person
{
private Dictionary<String, String> names;

public String GetFullName() { return names["first"] + " " + names["last"]; }
public String GetFirstName() { return names["first"]; }
public String GetLastName() { return names["last"]; }
}
There are several other alternatives, but just for the sake of the example, here's one more.

We could store the names in a string array where the first name is in the first position of the array and the last name is in the last position of the array.

Assuming that we have a static Join method in the String class that takes all the members of a string array and joins them together with a specified delimiter:
public class String
{
public static String Join(String[] array, String delimiter) { ... }
}
We can write the implementation like this:
public class Person
{
private String[] names;

public String GetFullName() { return String.Join(names, " "); }
public String GetFirstName() { return names[0]; }
public String GetLastName() { return names[names.Length - 1]; }
}
This final example has the advantage that any number of middle names can be included.

Of course, in all of these examples, there would need to be a constructor or setter method that initializes the private data members, and appropriate error handling would also need to be added.
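To make the point concrete, here is a minimal sketch in Java rather than the C#-style code above. The interface name PersonNames and the two class names are my own, purely for illustration. Two of the storage choices sit behind the same three methods, so calling code cannot tell them apart. I take the last name from the last word rather than from index 1 so that a middle name doesn't break it, and I've left out the error handling mentioned above.

```java
// Illustrative interface: the public surface of a Person.
interface PersonNames {
    String getFullName();
    String getFirstName();
    String getLastName();
}

// Stores the full name in one field and parses out the parts on demand.
class FullNamePerson implements PersonNames {
    private final String fullName;

    FullNamePerson(String fullName) {
        this.fullName = fullName;
    }

    public String getFullName() { return fullName; }
    public String getFirstName() { return fullName.split(" ")[0]; }

    public String getLastName() {
        String[] parts = fullName.split(" ");
        return parts[parts.length - 1];  // last word, so middle names are skipped
    }
}

// Stores the name parts in an array; any number of middle names fit.
class NameArrayPerson implements PersonNames {
    private final String[] names;

    NameArrayPerson(String... names) {
        this.names = names.clone();  // defensive copy keeps the data private
    }

    public String getFullName() { return String.join(" ", names); }
    public String getFirstName() { return names[0]; }
    public String getLastName() { return names[names.length - 1]; }
}
```

Either object can be handed to code that only knows about PersonNames, and that code gets the same answers, which is the whole point of hiding the private fields.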

Inheritance
How do you find out the superclasses and subclasses of a given class?

The easy answer is to use an object browser. Depending on which language you use, different tools are probably available. Another answer is to read the documentation (a.k.a. help files) that comes with the framework or library that you are using. And of course, you can search the source code.

It would probably be helpful to flesh out this answer on a language-by-language basis.
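As a start on the Java case: the superclass side can be found in code, because java.lang.Class has a getSuperclass() method that returns null once it passes java.lang.Object. The minimal sketch below just walks that chain; SuperclassChain and ancestry are illustrative names of my own. Subclasses are a different story. A class does not know what extends it, so standard reflection generally can't list them, and an IDE's type hierarchy view or a source search is the practical route.

```java
import java.util.ArrayList;
import java.util.List;

class SuperclassChain {
    // Returns the class itself followed by each superclass, ending at java.lang.Object.
    // Interfaces are not included; Class.getInterfaces() reports those separately.
    static List<Class<?>> ancestry(Class<?> type) {
        List<Class<?>> chain = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            chain.add(c);
        }
        return chain;
    }

    public static void main(String[] args) {
        for (Class<?> c : ancestry(Integer.class)) {
            System.out.println(c.getName());
        }
        // java.lang.Integer
        // java.lang.Number
        // java.lang.Object
    }
}
```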

Polymorphism
What is polymorphism?

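The form of polymorphism I'll describe here is to use the same name for a method but with different parameters and/or parameter types. This form is usually called method overloading; the other common form is overriding, where a subclass supplies its own version of a method it inherits.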

The place that I've seen this used the most is in data structure or other low-level classes such as collections and input/output related functionality. A collection class will have a method to add another object to the collection. The method might be called Add. There will then be several methods called Add, but a different parameter type will be used for each method.
public class Collection
{
public void Add(int value) { ... }
public void Add(long value) { ... }
public void Add(string value) { ... }
...
}
For output, there is the idea of displaying a data value in human-readable format.
public class Out
{
public void Write(int value) { ... }
public void Write(long value) { ... }
public void Write(string value) { ... }
...
}
Other low-level classes might deal with things like communication protocols or database connections. In some cases, the setup is easy, so there are only one or two parameters. For other cases, the setup is more complex, so there is a method with the same name that takes more parameters.
public class Database
{
public void Connect(string databaseName) { ... }
public void Connect(string databaseName,
string userName, string password) { ... }
public void Connect(string databaseName,
ISecurityManager securityManager) { ... }
...
}
With this kind of polymorphism (overloading), the compiler looks at the argument types to determine which method to call.
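Here is a small Java sketch of the Write example. It returns strings instead of printing so the choice is easy to see, and OverloadDemo is just an illustrative name. The last call shows that the decision is made at compile time from the argument's type: a short has no exact match, so the compiler widens it to the closest one, int.

```java
class OverloadDemo {
    // Three methods share the name "write"; the compiler picks one
    // from the static types of the arguments.
    static String write(int value)    { return "int:" + value; }
    static String write(long value)   { return "long:" + value; }
    static String write(String value) { return "string:" + value; }

    public static void main(String[] args) {
        short small = 7;
        System.out.println(write(42));     // int:42
        System.out.println(write(42L));    // long:42
        System.out.println(write("42"));   // string:42
        System.out.println(write(small));  // int:7 (short widens to int, the closest match)
    }
}
```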

Friday, September 5, 2008

Reading Data

My Current Challenge
I'm working on the generic data-object-to-data-table mapping problem right now. Last time I worked on this problem, I focused on the write functionality. I got the reflection code that walks through the object working, and then I spent some time on the SQL script generation. Now I've got the basic read functionality working using similar reflection code. However, I'm running into a bit of a snag.

For most data tables, it is just easier to create an auto-incrementing Id field and then use that Id as a foreign key from other tables whenever a record -- and thus an object -- needs to be referenced. However, when looking up records, other columns are usually more useful.


For example, on the Shoebox project, I have a User class that contains a name, a location and an email-address. The name will probably be split into first and last names. The location will be a reference to a City class. The email-address, however, needs to be unique, but I don't want to use the email-address as the primary key, as this would be cumbersome to use as a foreign key in other tables. Additionally, the email-address might be updated, but the User would still be the same user. Therefore, it makes sense to use an auto-incrementing Id as the primary key.


In the table, the email-address column is indexed and marked as requiring unique values. This will allow for quick lookup and prevent duplicates. The problem is to come up with the right names to use in the generic code for reading records.


Read By Id
To read using the Id (or Identity), I can use a generic method like this:
public static T Read<T, P>(P id) { ... }


And then use it like this:
User user = DB.Read<User, long>(1);


This method will use the stored procedure that reads using the Id.
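In Java, the same "read by Id" idea needs one change: generic type parameters are erased at run time, so the method can't discover T by itself, and the caller passes a Class&lt;T&gt; token instead. The sketch below is hypothetical throughout; DB, write and the in-memory map merely stand in for the real stored-procedure call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the database layer; a real version
// would call the "read by Id" stored procedure instead of a map lookup.
class DB {
    private static final Map<Class<?>, Map<Object, Object>> tables = new HashMap<>();

    // Stores a record under its Id so that read() has something to find.
    static <T, P> void write(Class<T> type, P id, T record) {
        tables.computeIfAbsent(type, k -> new HashMap<>()).put(id, record);
    }

    // T is erased at run time, so the caller passes Class<T>;
    // P lets each class choose its own Id type (Long, Integer, ...).
    static <T, P> T read(Class<T> type, P id) {
        Map<Object, Object> table = tables.getOrDefault(type, Map.of());
        return type.cast(table.get(id));  // null when no record has this Id
    }
}

class User {
    final String email;

    User(String email) {
        this.email = email;
    }
}
```

One thing to watch with a type parameter for the Id: a record stored under 1L (a Long) won't be found by DB.read(User.class, 1), because the boxed Integer 1 is not equal to the Long 1.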


Read By Indexed Values
To read by an indexed value such as the email-address, I've got two problems:
  1. I need to be able to specify the column name(s) and type(s).
  2. I could have more than a single column.
If it were only a single column, I could use the DataAttribute I created to specify that single column and then have a generic method that looks like this:
public static T ReadByIndex<T, P>(P value) { ... }

And then use it like this:
User user = DB.ReadByIndex<User, string>("foo.bar@gmail.com");
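A Java analogue of the single-column case could mark the column with a custom annotation and find it through reflection. This is a sketch under assumptions: the @Indexed annotation and the IndexLookup class are hypothetical stand-ins for the DataAttribute, not its actual code.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical marker for the one field whose column is used for lookups.
// RUNTIME retention is required so reflection can see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Indexed {}

class User {
    long id;
    String name;
    @Indexed String email;
}

class IndexLookup {
    // Returns the name of the single @Indexed field, to be used as the
    // column name in the lookup; fails loudly if there isn't exactly one.
    static String indexedColumn(Class<?> type) {
        String found = null;
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Indexed.class)) {
                if (found != null) {
                    throw new IllegalStateException("More than one @Indexed field on " + type.getName());
                }
                found = field.getName();
            }
        }
        if (found == null) {
            throw new IllegalStateException("No @Indexed field on " + type.getName());
        }
        return found;
    }
}
```

With this, IndexLookup.indexedColumn(User.class) returns "email", which a generic ReadByIndex could then use to build its query.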

Handling the Multiple Columns Problem
However, when there are multiple columns, what do I do?

The Alphabetical Option
Can I just add in another type parameter to my generic ReadByIndex method and assume that the columns are in a predetermined order -- like alphabetical?

Like this:
public static T ReadByIndex<T, P1, P2>(P1 a1, P2 a2) { ... }

With this:
Population population = DB.ReadByIndex<Population, string, string>("Denver", "Colorado");

This example assumes a class called Population with properties City and State that combined would allow a lookup. Alphabetically, City comes before State, so the first parameter is the city and the second is the state.

If more than 2 columns were necessary, another generic ReadByIndex method could be created providing additional parameters.
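Here is a rough sketch of how the alphabetical option could bind positional arguments, again in Java with a hypothetical @IndexColumn annotation standing in for the Data attribute. Sorting is not optional here: Class.getDeclaredFields() makes no promise about field order, so the names have to be sorted before they can be matched to arguments.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker for the columns that make up the lookup index.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface IndexColumn {}

class Population {
    @IndexColumn String state;  // declared before city on purpose
    @IndexColumn String city;
    long count;
}

class AlphabeticalIndex {
    // Matches positional values to @IndexColumn fields in alphabetical order.
    static Map<String, Object> bind(Class<?> type, Object... values) {
        List<String> columns = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(IndexColumn.class)) {
                columns.add(field.getName());
            }
        }
        Collections.sort(columns);  // getDeclaredFields() has no guaranteed order
        if (columns.size() != values.length) {
            throw new IllegalArgumentException("Expected " + columns.size() + " values, got " + values.length);
        }
        Map<String, Object> bound = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            bound.put(columns.get(i), values[i]);
        }
        return bound;
    }
}
```

AlphabeticalIndex.bind(Population.class, "Denver", "Colorado") yields {city=Denver, state=Colorado}. Note that the call itself still says nothing about which value is the city.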

The Name-Value Option
Or, should I call out the name of the column along with its value?

Like this:
public static T Read<T>(params NameValue[] args) { ... }

With this:
Population population = DB.Read<Population>(new NameValue("City", "Denver"), new NameValue("State", "Colorado"));

This example assumes the same class called Population as well as a class called NameValue with a constructor like this:
public NameValue(string name, object value) { ... }
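For the name-value option, the generic code mostly needs to turn the pairs into a WHERE clause. Below is a minimal Java sketch that assumes a plain SELECT with JDBC-style ? placeholders (rather than a stored procedure) and a table named after the class. NameValue mirrors the constructor above; Query.select is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Mirrors the NameValue constructor above.
final class NameValue {
    final String name;
    final Object value;

    NameValue(String name, Object value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical helper: SQL text plus the values to bind to its ? placeholders.
final class Query {
    final String sql;
    final List<Object> parameters;

    private Query(String sql, List<Object> parameters) {
        this.sql = sql;
        this.parameters = Collections.unmodifiableList(parameters);
    }

    static Query select(String table, NameValue... args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("At least one column is required");
        }
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table).append(" WHERE ");
        List<Object> parameters = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (i > 0) {
                sql.append(" AND ");
            }
            sql.append(args[i].name).append(" = ?");  // column names go into the SQL text...
            parameters.add(args[i].value);            // ...values are bound separately
        }
        return new Query(sql.toString(), parameters);
    }
}
```

The values travel as parameters, but the column names are pasted into the SQL text, so they should only ever come from code (or be checked against the class's properties). That is one more reason to have runtime tests for the names.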

Which one do I like better?
First of all, I really like the first "read by Id" method -- Read<T, P>(P id). For all classes I will have the auto-incrementing Id, so this will make reading any record simple. I also like that this method has a type for the parameter, so that a long or an int can be used as appropriate for a given class.

Now, about the pros and cons of the other methods... hmmm...

The "alphabetical option" is intriguing as its usage is just a matter of setting the Data attribute on the right properties in the class. This option also overcomes the lack-of-type-support-for-the-value drawback that the "name-value option" has. However, there are a couple of big drawbacks:
  1. At the point of usage, there is nothing to indicate which columns the positional parameters correspond to.
  2. This only allows for a single set of indexes. Something would need to be modified in order to allow a second set of indexes.
The "name-value option" overcomes the drawbacks of the "alphabetical option" -- 1) since the parameters are named, they can go in any order, and 2) to use a different combination of indexes, it is a simple matter to just use the desired columns. However, it has its own drawbacks:
  1. There is no compile-time support to make sure that the right names are used.
  2. The values are passed as the type object, so there is no compile-time support for the value's type.
So, there are the trade-offs: lack of type support versus inflexible obscurity.

Manually Coding It All - A Non-Option
Another option would be to abandon the generic nature of this idea and just code up each piece as necessary ... but this is a "non-option" as it really just defeats the purpose of the idea.

My Choice
Well, I think I'm going to go with the "Name-Value Option". The purpose of this idea is to create a generic and flexible way to read and write objects to and from tables. The alphabetical option would just get in the way due to its inflexibility. With the name-value option, a specific wrapper method can be created for indexed lookup, and tests can be written to ensure that the column/property names work at runtime.

To illustrate this point about a wrapper method, using the Population class example, I would add a method to the Population class like this:
public static Population Read(string city, string state)
{
return DB.Read<Population>(new NameValue("City", city),
new NameValue("State", state));
}

Such that it could be used like this:
Population population = Population.Read("Denver", "Colorado");

Meanwhile, the "read by id" method could be used like this:
Population population = DB.Read<Population, long>(21);

This method, of course, could be put into a wrapper method in the Population class as well:
public static Population Read(long id) { ... }

Such that it could be used like this:
Population population = Population.Read(21);

That's enough for now.

Create Back-end Service

What Kind of Service
For this task, the first thing to do is to decide what kind of service to use.  That is, there is an architectural question that needs to be solved before the rest of the task can be completed.  So, what kind of service should I use?  I think I'll use a WCF service.

Location Lookup
I'd like to have a location (i.e., city) for each user and each group.  I figured that there ought to be some free web service that could be used.  It looks like I found one at webserviceX.net.  I think I'll give that a try.

Data Reading
I worked on the generic data reading problem.  It is covered in another post.

Several Additional Project Files
I added several additional project files for the database, data and engine layers.  These will, of course, be part of the service.

Thursday, September 4, 2008

Tasks 4 Through 9 Done

I completed tasks 4 through 9.  It took 20 minutes, but I did answer an email in the middle of that, so I'm going to call it 17 minutes.

Next Tasks

Now I am going to work on tasks 4 through 9.  I predict that this will take 15 minutes.

Create a Start Page

What I Did
Okay, I took the first 15 minutes and created a new WebApp project; added a label, input box and button to the default page; and added a NewGroup page.  I then wandered off to cart kids around.  When I got back, I added the Response.Redirect call to the onclick handler so that the NewGroup page would be launched.  Finally, I ran the app and checked in the additions.

Delays
These are the delays I had to deal with:
  • My VM took a while to start.
  • I haven't messed with ASPX pages for some time so I had to look up the Response.Redirect call.
  • My IE browser was not configured right to work with intranet files.
  • I took a break to run kids around to their activities, and then there was a lot of typical evening chaos: feeding the boys after football, cleaning out the vacuum cleaner, answering emails to my sister, redirecting kids as they picked up stuff, getting the kids ready for bed, etc.
Some of those delays need to be included, and some of them do not.  I'm going to include the slow VM start up and the IE browser configuration.  I'll also count the short research I had to do.  But, the break to take care of the kids will not be counted.  (I'm also not counting the blogging time.)  Therefore, my initial estimate of 15 minutes is compared to an actual time of 25 minutes.

Status
Task 1 is now done.  Tasks 2 & 3 are also now done.

Pick a Scenario, Then What?

Now that I have enough of a scenario, what do I do next?  I could create a screen (or page).  I could create classes for some of the domain objects.  I could create the back-end service that will talk to the DB and provide the information for the page.

Here's perhaps a better question: How do I tell when the scenario is complete?

Well, perhaps the easiest way to proceed is to create a list of tasks.  When the tasks are done, the scenario is complete.  

Task List
So, with this in mind, here is the list of tasks:
  1. Create a "Start Page" to launch the "Create a New Group" page. DONE
  2. Simulate user information on this start page. DONE
  3. Create the "New Group Page". DONE
  4. Get user from context. DONE
  5. Add name input. DONE
  6. Add location input. DONE
  7. Add description input. DONE
  8. Add public/private flag input. DONE
  9. Add create button. DONE
  10. Create back-end service
  11. Add CreateGroup method to back-end service
  12. Add CheckGroupName to back-end service
  13. Add CheckCity to back-end service
  14. Create Group class
  15. Create User class
  16. Create Invitee class
Other Tasks
I am sure that I'll think of other tasks as I go through these tasks.  Those will just need to be added when they come up.

Estimates
Part of the process of developing software is to give an estimate of how long something is going to take.  This is generally tricky business; it is hard to get right.  Why is this the case?  Partly because the nature of a task is not completely understood.  Partly because other things come up.  Partly because the task changes.  Partly because the things that the task depends on change.

The Backlog
So, what can be done about this tricky stuff?  One thing that I like is the idea that a task is in one of 3 states -- done, in process, or backlogged.  Things that get done count.  Really only one thing is in process at a time.  Everything else is in the backlog.

If something in process is blocked, it goes back into the backlog until it can be unblocked.  It gets unblocked when whatever it depends on is done.  If it is blocked by something that was unknown, a new task is created and added to the backlog, moved to in process when it becomes the one thing in process, and then moved to done when it is done.
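The three states and the rules above amount to a small state machine. Here is a minimal Java sketch of it. TaskState and Board are illustrative names of my own, estimates and timing are left out, and it doesn't track which task blocked which.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// The three states a task can be in.
enum TaskState { BACKLOG, IN_PROCESS, DONE }

class Board {
    private final Map<String, TaskState> tasks = new LinkedHashMap<>();

    // New tasks start in the backlog; an existing task is left as it is.
    void add(String task) {
        tasks.putIfAbsent(task, TaskState.BACKLOG);
    }

    // Only one task may be in process at a time.
    void start(String task) {
        if (tasks.containsValue(TaskState.IN_PROCESS)) {
            throw new IllegalStateException("Another task is already in process");
        }
        require(task, TaskState.BACKLOG);
        tasks.put(task, TaskState.IN_PROCESS);
    }

    // A blocked task goes back to the backlog; an unknown dependency becomes a new task.
    void block(String task, String newDependency) {
        require(task, TaskState.IN_PROCESS);
        tasks.put(task, TaskState.BACKLOG);
        if (newDependency != null) {
            add(newDependency);
        }
    }

    void finish(String task) {
        require(task, TaskState.IN_PROCESS);
        tasks.put(task, TaskState.DONE);
    }

    TaskState stateOf(String task) {
        return tasks.get(task);
    }

    private void require(String task, TaskState expected) {
        if (tasks.get(task) != expected) {
            throw new IllegalStateException(task + " is not in state " + expected);
        }
    }
}
```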

But none of this fixes the estimate challenge.  Each task still needs to be estimated and then measured, and then the actual time is compared to the estimate.  This shows how close the original estimates were and allows for better predictions when subsequent tasks are estimated.

The First Estimate on This Project
Well, I'm going to tackle the "create a start page" task.  I happen to have 15 minutes, so I'm going to give that as my estimate.

Create a Homeschool Group

I talked with my primary customer -- that would be my wife.  Her first comment was, "Why do we need multiple groups?"  My answer was that I would like to make this so that other groups in addition to her group could use the same system.  She said that that was a good idea.  Then she gave me some good feedback on the group idea.

Feedback
  1. With multiple groups, a user will need to be able to search for groups in their area.
  2. A group owner will need to be able to determine whether they are publicly visible or not.
  3. In order to search for groups, the group will need to have a location.  This location should be a city.
  4. When the group is created, it would be useful to send email invitations to people.
  5. If the group is publicly visible, notification of the new group could be sent to users in the area.
Items on the Create Group Screen
With this feedback, the create group screen will need this kind of information on it:
  • Input the name
  • Select a city for the location
  • Input a description
  • Flag it as public or private
  • Create an email address list for invitations
Group Information
Likewise, the group will contain or be related to all this information.
  • name
  • location
  • description
  • public or private flag
  • owner list
  • member list
  • invitation list
Other Scenarios
This "create a group" scenario has triggered other scenarios:
  1. An initial invitation is sent when a group is created
  2. An invited person joins the group to which they were invited
  3. An initial invitation is sent sometime after a group is created
  4. A secondary invitation is sent sometime after the initial invitation
  5. Add another owner
  6. Remove an owner
  7. Update the description
  8. Change the group location
  9. Change the group name
  10. Change the public/private flag

First Scenario

In order to work on Shoebox, I need to pick a scenario.  Let's start with "create a homeschool group".

What is a homeschool group?
A homeschool group is a set of people that meet together to discuss homeschool ideas and to share resources.  The people in the system are represented by a user account.  A group will have 1 or more associated user accounts.  1 or more of these user accounts will be co-owners of the group.  A user can belong to any number of groups.

So, the group needs to have a name and one or more owners and one or more members.

The group also has a list of existing resources and desired resources.

Creating a Homeschool Group
It looks like I need to have a User setup before a Group can be created.

To create a group, the user will use the following steps:
  1. User selects "Create a New Group"
  2. System displays "New Group Screen" prompting for a group name.
  3. User enters a name for the group.
  4. System tells the user when the name is valid or invalid -- the "Do It" button is disabled until the name is valid.
  5. User clicks the "Do It" button -- which is labeled something like "Create" or "Apply" or "Create Group" or "Done" ... I need to ask the ladies about this name.
  6. System creates the group using the given name with the User as the owner and a member of the group.
  7. System sends an email to the User indicating that the new group has been created.
  8. System displays "Group Created Screen"
Questions
  • Does the group need a description?
  • Is there other information that needs to be added to the group?
  • Should the creator of the group be able to select other users as members of the group at creation time, or should this be a separate step?
  • Should the system notify other users that a new group has been created?  Should this be an option that the creator can select if desired?
Next Step
I need to run this past the ladies.

Back to the Shoebox Project

It has been a couple of weeks since I've blogged anything about the Shoebox project.  I did spend some time working on the dynamic data project ... but as with most side projects, I've had to spend time on other things lately.

Creating the Source Code Repository and Solution
Now that I am ready to start on the source code, the first thing that I did was to create a source code repository.  For personal use, I like the idea of a source code repository for 2 main reasons -- 1) so that I can easily back up the source code, and 2) so that I can see when I have made changes with the option to revert if necessary.  When I work with a group, the source repository is obviously indispensable for sharing the source, which would be a 3rd main reason.

When I work in a group, the repository is (almost always, if not always) on a server that gets backed up.  On my own home system, I don't have a main server machine that gets backed up.  So, what do I do?  I do have an external disk attached to my network.  I also have a USB flash disk.  I put my source code repository on the flash disk and then copy the flash disk to the network drive periodically.

As I recently completed a contract, I have several little utility or investigation projects that I had created during the course of the contract that I now need to back up and track on my own.  So, as soon as I created the source code repository (I called the folder SourceRepository) on my flash disk, I put a couple of the utility or investigation projects into the source repository.  I'll need to move the other projects of this nature into this source repository.

I then created the solution file.  As this project's code name is Shoebox, I named the solution using today's date and this code name: 2008-09-04-Shoebox.  As soon as I created this empty solution, I added it to the source repository.

Now I am ready to start adding projects to this solution.

Wednesday, September 3, 2008

About Booleans

Boolean values are either True or False.  They are used in logical calculations and expressions--for example, in an "if" or "while" statement.  You might set a variable called "isStored" to either True or False to indicate whether something has been written to disk.  Later, when the User tries to close the program, you would test that variable in an "if" statement to determine whether to prompt the User to save the data.

// somewhere in the code ...
// ... other code ...
boolean isStored = false;
// ... other code ...


// somewhere else in the code...
void SomeMethod()
{
  if (isStored)
  {
    // ... handle the stored case ...
  }
  else
  {
    // ... handle the not-stored case ... 
    // ... for example, prompt the user to decide whether to save or not ...
  }
}

There are other uses for boolean as well.

Integers, Binary and Hexadecimal

What is an integer?
An integer is a number without a decimal point -- for example: -312, 0, 18, 415, etc. Languages usually have both signed and unsigned integer types. Unsigned integers start at zero and go up: 0, 1, 2, 3, ...  Signed integers include zero and both positive and negative numbers: ..., -3, -2, -1, 0, 1, 2, 3, ...

Sizes
How far up or down the numbers go depends on the size of the integer type.  Sizes are measured in bytes.  On modern machines, a byte is usually defined as 8 bits.  Typical integer sizes include 1 byte, 2 bytes, 4 bytes and now 8 bytes.

Single Byte Values
1 byte gives 8 bits.  For signed integers of this size the values range from -128 to 127 -- that is negative 128 to positive 127.  For unsigned integers of this size the values range from 0 to 255. For each of these ranges, there are 256 distinct values.  This is because 2 raised to the 8th power is 256.

Binary Values
Written in binary, a single byte integer goes from 00000000 to 11111111.  Here is the sequence of the first 16 values:
  • 00000000
  • 00000001
  • 00000010
  • 00000011
  • 00000100
  • 00000101
  • 00000110
  • 00000111
  • 00001000
  • 00001001
  • 00001010
  • 00001011
  • 00001100
  • 00001101
  • 00001110
  • 00001111
If you continue this sequence, you will have 256 distinct values when you reach 11111111.  

Here are the last 16 values of this sequence:
  • 11110000
  • 11110001
  • 11110010
  • 11110011
  • 11110100
  • 11110101
  • 11110110
  • 11110111
  • 11111000
  • 11111001
  • 11111010
  • 11111011
  • 11111100
  • 11111101
  • 11111110
  • 11111111
The 256 values corresponding to unsigned integers start with 00000000 as 0, 00000001 as 1, etc. until 11111111 as 255 is reached.

The 256 values corresponding to signed integers start with 00000000 as 0, 00000001 as 1, etc. until 01111111 as 127 is reached followed by 10000000 as -128, 10000001 as -127, etc. until 11111111 as -1 is reached.
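The signed layout just described is called two's complement. Java's byte type happens to be signed, which makes it handy for seeing both readings of the same 8 bits. In this sketch, ByteValues and describe are illustrative names of my own; Byte.toUnsignedInt (Java 8 and later) gives the unsigned reading.

```java
class ByteValues {
    // Reads the low 8 bits of `bits` both ways and formats them side by side.
    static String describe(int bits) {
        byte signed = (byte) bits;                  // Java's byte is signed (two's complement)
        int unsigned = Byte.toUnsignedInt(signed);  // the same 8 bits read as unsigned
        String binary = String.format("%8s", Integer.toBinaryString(unsigned)).replace(' ', '0');
        return binary + " unsigned=" + unsigned + " signed=" + signed;
    }

    public static void main(String[] args) {
        int[] samples = {0b00000000, 0b00000001, 0b01111111, 0b10000000, 0b10000001, 0b11111111};
        for (int bits : samples) {
            System.out.println(describe(bits));
        }
        // 00000000 unsigned=0 signed=0
        // 00000001 unsigned=1 signed=1
        // 01111111 unsigned=127 signed=127
        // 10000000 unsigned=128 signed=-128
        // 10000001 unsigned=129 signed=-127
        // 11111111 unsigned=255 signed=-1
    }
}
```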

Hexadecimal Values
As there are 8 places in a single byte value, this can be cumbersome to write.  Hexadecimal is a way to represent the same values with a tighter notation using 16 symbols.  Note that "hex" means "six" and "decimal" means "ten" so that "hex-a-decimal" means "six and ten" or "sixteen".

The sixteen symbols are 0 through 9 followed by A through F: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

These correspond to the binary values like this:
  • 0000 = 0
  • 0001 = 1
  • 0010 = 2
  • 0011 = 3
  • 0100 = 4
  • 0101 = 5
  • 0110 = 6
  • 0111 = 7
  • 1000 = 8
  • 1001 = 9
  • 1010 = A
  • 1011 = B
  • 1100 = C
  • 1101 = D
  • 1110 = E
  • 1111 = F
Putting 2 of these symbols together allows us to represent a whole byte.  Thus zero can be written as 00000000 in binary, 0 in decimal and 0x00 in hexadecimal.

The "0x" prefix is used to clarify to the compilers or interpreters that the value is hexadecimal and not decimal.  For example, we write the decimal value 32 in hexadecimal as 0x20, but we write the hexadecimal value 0x32 as 50 in decimal.  In some languages some other prefix or suffix might be used.
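Here is a small Java sketch of the same conversions, including the 0x20 versus 0x32 example. byteToHex and nibbles are illustrative helpers of my own; String.format, Integer.toBinaryString and Integer.parseInt are standard library methods.

```java
class HexDemo {
    // Formats one byte (0..255) as "0x" plus two uppercase hex digits.
    static String byteToHex(int value) {
        if (value < 0 || value > 255) {
            throw new IllegalArgumentException("Not a byte value: " + value);
        }
        return String.format("0x%02X", value);
    }

    // Each hex digit stands for exactly 4 bits, so a byte splits into two groups of 4.
    static String nibbles(int value) {
        String high = String.format("%4s", Integer.toBinaryString(value >> 4)).replace(' ', '0');
        String low = String.format("%4s", Integer.toBinaryString(value & 0xF)).replace(' ', '0');
        return high + " " + low;
    }

    public static void main(String[] args) {
        System.out.println(byteToHex(32) + " = " + nibbles(32));  // 0x20 = 0010 0000
        System.out.println(0x32);                                 // 50
        System.out.println(Integer.parseInt("32", 16));           // 50
    }
}
```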

Double Byte Values
The 2 byte values use 16 bits.  2 to the 16th power (i.e., 2^16) gives 65536 distinct values.  For unsigned integers, this ranges from 0 to 65535 (or 0x0000 to 0xFFFF in hexadecimal).  For signed integers, this ranges from -32768 to 32767 -- that is negative 32768 to positive 32767.

Notice that if we were to write these values in binary, it would take a lot of room:
  • 0 decimal = 0000 0000 0000 0000 binary = 0x0000 hexadecimal
  • 65535 decimal = 1111 1111 1111 1111 binary = 0xFFFF hexadecimal
4 Byte Values
4 byte values use 32 bits, giving 4,294,967,296 distinct values.  Unsigned integers range from 0 to 4294967295.  Signed integers range from -2147483648 to 2147483647.

8 Byte Values
8 byte values use 64 bits giving 18,446,744,073,709,551,616 distinct values with unsigned integers ranging from 0 to 18446744073709551615 and signed integers ranging from -9223372036854775808 to 9223372036854775807.

Integer Names
Depending on the language, there may or may not be a way to declare all of these integer sizes. Here are some examples.

C#
  • sbyte - signed 8 bit
  • byte - unsigned 8 bit
  • short - signed 16 bit
  • ushort - unsigned 16 bit
  • int - signed 32 bit
  • uint - unsigned 32 bit
  • long - signed 64 bit
  • ulong - unsigned 64 bit
Java
Note that Java does not have unsigned integer types, apart from char, an unsigned 16-bit type used for characters.
  • byte - signed 8 bit
  • short - signed 16 bit
  • int - signed 32 bit
  • long - signed 64 bit
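The ranges listed earlier follow a simple rule: with n bits, unsigned values run from 0 to 2^n - 1, and signed values run from -2^(n-1) to 2^(n-1) - 1. This Java sketch computes them with BigInteger (so 64 bits doesn't overflow) and prints Java's own constants for comparison. The IntegerRanges class is just for illustration; Integer.toUnsignedString and Long.toUnsignedString (Java 8 and later) show how Java reads signed bits as unsigned.

```java
import java.math.BigInteger;

class IntegerRanges {
    // Smallest signed value with the given number of bits: -2^(bits-1).
    static BigInteger signedMin(int bits) {
        return BigInteger.ONE.shiftLeft(bits - 1).negate();
    }

    // Largest signed value: 2^(bits-1) - 1.
    static BigInteger signedMax(int bits) {
        return BigInteger.ONE.shiftLeft(bits - 1).subtract(BigInteger.ONE);
    }

    // Largest unsigned value: 2^bits - 1.
    static BigInteger unsignedMax(int bits) {
        return BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        System.out.println("byte  " + Byte.SIZE + " bits: " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short " + Short.SIZE + " bits: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int   " + Integer.SIZE + " bits: " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long  " + Long.SIZE + " bits: " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        // No unsigned types, but the same bits can be printed as unsigned.
        System.out.println("int -1 as unsigned:  " + Integer.toUnsignedString(-1));
        System.out.println("long -1 as unsigned: " + Long.toUnsignedString(-1L));
        System.out.println("computed 64-bit unsigned max: " + unsignedMax(64));
    }
}
```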
Other Languages
In other languages, such as C, the size can vary depending on the machine and the compiler implementation.  Some of these languages use multiple words for the data type.  For example, "long" might be used for a signed type while "unsigned long" would be the name of the unsigned type of the same size.

Aliases
As C# is just one of several languages that can be used to create programs for the .Net Framework, the names for the data types in C# are aliases for .Net Framework data types.  For example, the name byte is an alias for the type System.Byte and the name ulong is an alias for the type System.UInt64.

In C, macros are often used to create an alias for a type.  For example, on a machine where the long data type is 32 bits, a macro might be defined for the unsigned long type called U32.

Native Data Types

Most (if not all) of the popular programming languages have similar native or built-in data types.  The list includes the following:
  • integer
  • floating point
  • boolean
  • character
  • string
  • array
For any given language, there might be additional built-in types.  The keywords and syntax used to declare these types will also vary from language to language.  Any given language may also have various sizes of each of these types.

Let's explore each of these in a separate posting.

Tuesday, September 2, 2008

Mentoring a 12-Year-Old

Introducing "The Kid"
I started mentoring a 12-year-old kid.  Since he is 12, I'll not put his name here.  I'll call him "the kid".

Well, the kid is a smart kid.  He reads a lot and he remembers most of what he reads.  He spent the past couple of months messing around with HTML and has put together a few pages.  Now the kid is ready to start programming.  He wants to learn Java.  His folks approached me and asked me if I would tutor or mentor him.  I said, "Sure," because it is always good to share something with someone who wants to learn and is old enough and ready to learn.

The kid has a Mac.  I have a Mac.  (I actually have several Macs.)  I've been developing on Windows for about 15 years, but I've dabbled in Mac programming a bit over the years.  Currently, I'm trying to do more development on the Mac.

The kid's family and my family along with another family had a BBQ at the kid's house.  One of my sons is a good friend of the kid, and we all go to church together -- this is probably one of the reasons the kid's parents asked me to be his mentor.  While we were at the BBQ, the kid shared with me where he is and what he is doing.

The Simple Stuff ... with a Problem
As I said, he has been reading a lot, but he's reached the point where he needs help sorting out what he has read.  He's been reading a book about Java programming.  It turns out that the book is 8 years old ... which is really old as far as a typical computer book goes.  The problem is that he has been having trouble getting the simple little programming examples to run on his Mac.

The first simple example is the classic "Hello World" program.

public class Foo {
  public static void main(String args[]) {
    System.out.println("Hello, World!");
  }
}

The problem on the Mac is that it has a GUI so the output to standard out does not show up.  Now, of course, there are ways and environments that can be used on the Mac to see the text sent to standard out.  But the kid does not know what standard out is nor why it does not show up.

The Not-so-simple Stuff
The kid was running Xcode and discovered that all sorts of other stuff is added to the program when a Java application project is created. He wanted to see the simple little example work without all the other stuff. I figured that as soon as he saw that running, he would want to create an actual application. I showed him how to get the simple little example working, and he then wanted to create an actual application.

Making the Hello World Program Work
Here is how I had him run the example.
  1. Start Terminal -- Today, the default for this is to run the Unix Bash shell.
  2. Create a folder for the example -- We created a folder called "javadev" using the command "mkdir javadev".
  3. Change the current directory to the javadev folder -- We used the command "cd javadev".
  4. Create a file called "Foo.java" containing the Hello World program -- We used TextEdit to create this file, saving it as plain text (TextEdit saves rich text by default, which javac cannot compile). I used vi on my machine just to show that any editor will do, as long as it saves plain text.
  5. Compile Foo.java into a .class file using the "javac" command -- We typed in "javac Foo.java" which created the Foo.class file.
  6. Run Foo by using the "java" command -- We typed "java Foo" and the words "Hello, World!" showed up in the Terminal window.
Other Unix Commands
Along the way I had him use the "mkdir", "cd", "pwd", "cat" and "ls" commands. The kid, of course, asked where to find a list of these commands, what they do, how to use them, and so on. And, as I stated, as soon as the Hello World program ran, he asked how to make an application.

Other Stuff
I showed the kid that Apple's developer site has a lot of information about Java programming. From there, links lead to a getting-started guide, which links to the Java tutorials on Sun's site, which in turn link to information about running the Hello World program in the NetBeans IDE.

We briefly talked about classes, constructors, static & non-static methods, and public & private scope of methods.
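For readers following along, here is a small Java sketch of the ideas we touched on: a class with a constructor, a private field, a public instance method and a public static method. The class `Greeter` and its methods are made up for illustration. This is not code from our session.

```java
// Hypothetical example class illustrating the concepts we discussed.
class Greeter {
    // private: only code inside Greeter can read or change this field
    private final String name;

    // constructor: runs when "new Greeter(...)" creates an object
    Greeter(String name) {
        this.name = name;
    }

    // public, non-static (instance) method: needs an object to call it
    public String greet() {
        return "Hello, " + name + "!";
    }

    // public static method: belongs to the class itself, no object needed
    public static String greetWorld() {
        return "Hello, World!";
    }

    public static void main(String[] args) {
        System.out.println(Greeter.greetWorld());       // static call
        System.out.println(new Greeter("Kid").greet()); // instance call
    }
}
```

Saved as Greeter.java, it compiles and runs with the same javac and java commands as the Hello World example.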

What's Next
I wanted to show him how the Xcode project is put together and what he can do with that ... but we ran out of time. I believe that the kid is going to continue reading and trying stuff. In particular, he is planning on messing with variables, computations and some of the other stuff that is basic to all programming.

We'll just have to see what this next week brings.

Friday, August 15, 2008

Parameterized Database Methods

Can't Use an Interface
The methods that deal with the database are Read, Write, Exists & Delete. Read, Exists & Delete are static methods. Write is an instance method. Because of the static methods, I cannot use an interface.

Generic Method & Reflection
Is there a way to use a generic method and reflection to interact with the database? With attributes on the class, and the type passed as a type parameter to one of these database methods, can reflection identify each of the fields needed for reading, writing, checking existence and deleting?

An Example
I'd like to outline a few examples.

Suppose I have a class Foobar.

I might be able to read a Foobar record using code that looks like this:
  Foobar foobar = Database.Read<Foobar>(foobarId);

I might be able to see if a Foobar record exists using code like this:
  if ( Database.Exists<Foobar>(foobarId) ) { ... }

I might be able to write a Foobar record like this:
  Database.Write<Foobar>(foobar);

Or like this, since the type parameter can be inferred from the argument:
  Database.Write(foobar);

And I could delete like this:
  Database.Delete<Foobar>(foobarId);
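C# can write Database.Read<Foobar>(foobarId) because its generic type arguments are available at run time. Java erases them, so a minimal Java sketch of the same idea passes the type as a `Class<T>` token instead. Everything here is hypothetical: the `@Key` annotation, the `Database` class, and the in-memory map that stands in for a real database. The sketch does show reflection finding the key field of whatever type it is given. That is the piece that generic Read, Write, Exists and Delete methods would need.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class DatabaseDemo {
    public static void main(String[] args) {
        Database.write(new Foobar(42, "first"));
        System.out.println(Database.exists(Foobar.class, 42)); // true
        Foobar foobar = Database.read(Foobar.class, 42);
        System.out.println(foobar.label());                    // first
        Database.delete(Foobar.class, 42);
        System.out.println(Database.exists(Foobar.class, 42)); // false
    }
}

// Hypothetical marker for the field that holds a record's key.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Key {}

// Hypothetical in-memory stand-in for the database. A real version would build
// SQL (or call stored procedures) from the same reflected field information.
class Database {
    private static final Map<Class<?>, Map<Object, Object>> tables = new HashMap<>();

    // Uses reflection to find the field marked with @Key.
    private static Field keyField(Class<?> type) {
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(Key.class)) {
                field.setAccessible(true);
                return field;
            }
        }
        throw new IllegalArgumentException(type.getName() + " has no @Key field");
    }

    private static Map<Object, Object> table(Class<?> type) {
        return tables.computeIfAbsent(type, t -> new HashMap<>());
    }

    // The type comes from the argument itself, like Database.Write(foobar).
    static void write(Object item) {
        try {
            Object key = keyField(item.getClass()).get(item);
            table(item.getClass()).put(key, item);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Java erases generic types at run time, so the type is passed as Class<T>.
    static <T> T read(Class<T> type, Object key) {
        return type.cast(table(type).get(key));
    }

    static boolean exists(Class<?> type, Object key) {
        return table(type).containsKey(key);
    }

    static void delete(Class<?> type, Object key) {
        table(type).remove(key);
    }
}

class Foobar {
    @Key private int foobarId;
    private String label;

    Foobar(int foobarId, String label) {
        this.foobarId = foobarId;
        this.label = label;
    }

    String label() { return label; }
}
```

Because these are shared generic methods rather than per-class static methods, no interface is needed. The trade-off is that passing the wrong kind of key is only caught at run time.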

Questions
Would this give me compile-time checking of the foobarId parameter? If the key for a particular table consisted of multiple values, would a params parameter be adequate? Or would a generic method with multiple type parameters be better (e.g., Database.Delete<T, P1, P2>(P1 a1, P2 a2))?
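Here is one hedged answer to the compile-time question. In a signature like Delete<T, P1, P2>(P1 a1, P2 a2), the type parameters P1 and P2 are inferred from whatever arguments are passed. On their own they do not tie the key to T. The Java sketch below uses hypothetical names throughout. It contrasts a varargs method, which is Java's counterpart of a params parameter, with a key type that carries its record type. With the second approach, a mismatched key fails to compile.

```java
import java.util.Arrays;

class KeyDemo {
    public static void main(String[] args) {
        System.out.println(Database.deleteByParts(Membership.class, 1, "kid")); // delete Membership [1, kid]
        System.out.println(Database.deleteByKey(Foobar.class, new FoobarKey(42)));
        // Database.deleteByKey(Foobar.class, new MembershipKey(1, "kid")); // does not compile
    }
}

// Hypothetical marker: a key type that belongs to records of type T.
interface RecordKey<T> {}

record Foobar(int foobarId, String label) {}
record Membership(int groupId, String userName) {}

// A single-column key and a composite (two-column) key, each tied to its record type.
record FoobarKey(int foobarId) implements RecordKey<Foobar> {}
record MembershipKey(int groupId, String userName) implements RecordKey<Membership> {}

// Hypothetical delete methods that only describe what they would delete.
class Database {
    // Like a C# params parameter: any number and type of values compiles,
    // so a wrong key is only caught at run time.
    static String deleteByParts(Class<?> type, Object... keyParts) {
        return "delete " + type.getSimpleName() + " " + Arrays.toString(keyParts);
    }

    // The compiler checks that the key type matches the record type T.
    static <T> String deleteByKey(Class<T> type, RecordKey<T> key) {
        return "delete " + type.getSimpleName() + " " + key;
    }
}
```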

Script Generation
Even with the generic type doing all the work, the database creation and conversion scripts would still need to be generated from the type ... but the compiled type could be used to generate this information rather than some other file format.

Database Identifier
I would still need to pass in a database identifier as well.

Generating the Code

Essential Information
The following information is necessary to generate the rest of the code:
  • database identifier
  • class/table name
  • fields - name, type & purpose
Conversion Script
In order to generate the conversion script, the previous version of this information will also be necessary. If multiple previous versions are currently in use, a conversion script can be generated for each of the previous versions in use.

Database Identifier
The database identifier determines which connection string to pull from the configuration file: it names an app setting whose value is the name of the connection string.

  connectionString = ConfigurationManager
    .ConnectionStrings[
      ConfigurationManager
        .AppSettings[databaseIdentifier]
    ].ConnectionString;
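Here is the same two-step lookup sketched in Java. Plain maps stand in for the appSettings and connectionStrings sections. The class, method and sample values are hypothetical. When a name is missing, the sketch reports which step of the lookup failed.

```java
import java.util.Map;

class ConnectionDemo {
    public static void main(String[] args) {
        ConnectionConfig config = new ConnectionConfig(
            Map.of("SampleDatabase", "SampleDev"),
            Map.of("SampleDev", "Server=localhost;Database=Sample"));
        System.out.println(config.connectionStringFor("SampleDatabase")); // Server=localhost;Database=Sample
    }
}

// Hypothetical stand-in for the two configuration sections:
// appSettings maps a database identifier to a connection-string name, and
// connectionStrings maps that name to the connection string itself.
class ConnectionConfig {
    private final Map<String, String> appSettings;
    private final Map<String, String> connectionStrings;

    ConnectionConfig(Map<String, String> appSettings, Map<String, String> connectionStrings) {
        this.appSettings = appSettings;
        this.connectionStrings = connectionStrings;
    }

    String connectionStringFor(String databaseIdentifier) {
        String name = appSettings.get(databaseIdentifier);
        if (name == null) {
            throw new IllegalArgumentException("No app setting named " + databaseIdentifier);
        }
        String connectionString = connectionStrings.get(name);
        if (connectionString == null) {
            throw new IllegalArgumentException("No connection string named " + name);
        }
        return connectionString;
    }
}
```

The indirection means that pointing an identifier at a different database only requires changing one app setting.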

Class/Table Name
The class/table name is the name used for the class in the CS file and for the table in the database.

Fields
The fields identify the fields of the class and the columns of the table.  The order of these fields determines the order of the columns.

Field Name
The field name is the name of the field and the column.

Field Type
The field type is one of the following: 
  • primitive type - The primitive type is a C# type.  In the case of a string, a length for the database is also included.  
  • enumeration - The names of the enumerated values are also included.
  • other class - The other class needs to exist in order to include the key(s) to that class's table.
Field Purpose
The field purpose is one of the following:
  • passed-in key - the value for the key will be passed into the object constructor
  • code-generated key - the value for the key will be generated in the object constructor using the code provided
  • database-generated key - the value for the key will be generated in the database (this is for auto-incrementing keys)
  • required non-key - the value will be passed into the object constructor - a null value is not allowed
  • optional non-key - the value can be set via a property and will remain null if not provided
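The essential information above maps naturally onto a small data model. Here is a hedged Java sketch of it. The type and member names are my own. `keyColumns` is just one example of what a generator could derive from the model.

```java
import java.util.List;

class EssentialInfoDemo {
    public static void main(String[] args) {
        ClassDescription user = new ClassDescription("SampleDatabase", "User", List.of(
            new FieldDescription("name", FieldKind.PRIMITIVE, "string", 50, List.of(), FieldPurpose.PASSED_IN_KEY),
            new FieldDescription("password", FieldKind.PRIMITIVE, "string", 50, List.of(), FieldPurpose.REQUIRED_NON_KEY)));
        System.out.println(user.keyColumns()); // [name]
    }
}

enum FieldKind { PRIMITIVE, ENUMERATION, OTHER_CLASS }

enum FieldPurpose {
    PASSED_IN_KEY, CODE_GENERATED_KEY, DATABASE_GENERATED_KEY, REQUIRED_NON_KEY, OPTIONAL_NON_KEY;

    boolean isKey() {
        return this == PASSED_IN_KEY || this == CODE_GENERATED_KEY || this == DATABASE_GENERATED_KEY;
    }
}

// typeName is the C# type, enumeration name or other class name, depending on kind.
// length is used only for strings (null otherwise); enumValues only for enumerations.
record FieldDescription(String name, FieldKind kind, String typeName, Integer length,
                        List<String> enumValues, FieldPurpose purpose) {}

// The order of the fields list is the column order.
record ClassDescription(String databaseIdentifier, String className, List<FieldDescription> fields) {
    List<String> keyColumns() {
        return fields.stream()
                     .filter(field -> field.purpose().isKey())
                     .map(FieldDescription::name)
                     .toList();
    }
}
```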
Generated Code
The tool can use this information to generate the constructor, private data members, properties and the essential methods (Read, Write, Delete, Exists).  This code will go into a file using the class/table name following this pattern: {classname}.generated.cs. Additional Developer code can go in the corresponding {classname}.supplemental.cs file.  This will allow the {classname}.cs file to be used just for the information essential for generating the rest of the class.  (The source file can, of course, be an XML file or any other file as long as it contains the essential information and the tool can process it.)
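The field purposes already determine much of the generated constructor. Here is a minimal Java sketch, using the hypothetical names below. It derives the generated file names and the constructor's parameter list from the purposes.

```java
import java.util.ArrayList;
import java.util.List;

class GeneratedCodeDemo {
    public static void main(String[] args) {
        List<FieldSpec> fields = List.of(
            new FieldSpec("name", Purpose.PASSED_IN_KEY),
            new FieldSpec("password", Purpose.REQUIRED_NON_KEY),
            new FieldSpec("emailAddress", Purpose.OPTIONAL_NON_KEY));
        System.out.println(CodePlan.generatedFileName("User"));           // User.generated.cs
        System.out.println(CodePlan.constructorSignature("User", fields)); // User(name, password)
    }
}

enum Purpose { PASSED_IN_KEY, CODE_GENERATED_KEY, DATABASE_GENERATED_KEY, REQUIRED_NON_KEY, OPTIONAL_NON_KEY }

record FieldSpec(String name, Purpose purpose) {}

class CodePlan {
    static String generatedFileName(String className) { return className + ".generated.cs"; }

    static String supplementalFileName(String className) { return className + ".supplemental.cs"; }

    // Passed-in keys and required non-keys become constructor parameters.
    // Code-generated keys are computed inside the constructor. Database-generated
    // keys and optional non-keys are left out (the database or a property supplies them).
    static List<String> constructorParameters(List<FieldSpec> fields) {
        List<String> parameters = new ArrayList<>();
        for (FieldSpec field : fields) {
            if (field.purpose() == Purpose.PASSED_IN_KEY || field.purpose() == Purpose.REQUIRED_NON_KEY) {
                parameters.add(field.name());
            }
        }
        return parameters;
    }

    static String constructorSignature(String className, List<FieldSpec> fields) {
        return className + "(" + String.join(", ", constructorParameters(fields)) + ")";
    }
}
```

For the User example this yields User(name, password), which lines up with the new User(username, password) call in the Developer View section.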

Generated SQL Scripts
The tool also generates the .SQL file which contains the database creation script including all of the stored procedures.  This file is called {classname}.sql and can then be included in a master script file for generating the database.
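Here is a rough Java sketch of the table part of such a script. It covers only CREATE TABLE, not the stored procedures. The C#-to-SQL Server type mapping is an assumption that a real tool would need to make configurable. The `ColumnSpec` record and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CreateTableDemo {
    public static void main(String[] args) {
        System.out.println(CreateTableScript.createTable("User", List.of(
            new ColumnSpec("name", "string", 50, true, false, false),
            new ColumnSpec("password", "string", 50, false, false, false))));
    }
}

// Hypothetical column description derived from a field: C# type name, string
// length (ignored for other types), key membership, identity, and nullability.
record ColumnSpec(String name, String csType, int length, boolean key,
                  boolean databaseGenerated, boolean nullable) {}

class CreateTableScript {
    // Assumed C#-to-SQL Server type mapping; deliberately incomplete.
    static String sqlType(ColumnSpec column) {
        return switch (column.csType()) {
            case "string" -> "NVARCHAR(" + column.length() + ")";
            case "int" -> "INT";
            case "long" -> "BIGINT";
            case "bool" -> "BIT";
            case "DateTime" -> "DATETIME";
            default -> throw new IllegalArgumentException("No mapping for " + column.csType());
        };
    }

    static String createTable(String table, List<ColumnSpec> columns) {
        // Square brackets quote names such as User, a reserved word in SQL Server.
        StringBuilder sql = new StringBuilder("CREATE TABLE [" + table + "] (\n");
        List<String> keys = new ArrayList<>();
        for (ColumnSpec column : columns) {
            sql.append("  [").append(column.name()).append("] ").append(sqlType(column));
            if (column.databaseGenerated()) {
                sql.append(" IDENTITY(1,1)"); // auto-incrementing key
            }
            sql.append(column.nullable() ? " NULL" : " NOT NULL").append(",\n");
            if (column.key()) {
                keys.add("[" + column.name() + "]");
            }
        }
        if (keys.isEmpty()) {
            throw new IllegalArgumentException(table + " has no key column");
        }
        sql.append("  PRIMARY KEY (").append(String.join(", ", keys)).append(")\n);");
        return sql.toString();
    }
}
```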

Previous Versions
The tool needs to have the ability to create conversion scripts from previous versions.

Where is the version information recorded?  Does it need to be recorded?

There are several ways to handle the version problem.  An existing database could be compared.  Another input file could be compared.  The generated CS and/or SQL file could be compared.  The input file could have multiple sections with one section for each of the various versions.  I will deal with the version problem at another time.

Dealing with the Database

Here is my current idea on how to deal with the database.

Mark the Class and Fields with Attributes
The Developer creates the class and adds a few attributes.

  [Database("SampleDatabase")]
  public partial class User
  {
    [Key]
    private string name;

    [Required]
    private string password;
  }

The attributes indicate how to connect the object to a database and how to handle the fields. The class is marked as partial so that generated code can be added to it.
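Java has annotations where C# has attributes, but Java has no partial classes. Generated code would therefore have to live in a separate class. This hypothetical Java sketch declares counterparts of [Database], [Key] and [Required]. It then reads them back with reflection, the way a generator tool might.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

class AnnotationDemo {
    public static void main(String[] args) {
        System.out.println(Markings.databaseOf(User.class)); // SampleDatabase
        System.out.println(Markings.fieldRoles(User.class)); // {name=key, password=required}
    }
}

// Hypothetical Java counterparts of the [Database], [Key] and [Required] attributes.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
@interface Database { String value(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Key {}

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.FIELD)
@interface Required {}

@Database("SampleDatabase")
class User {
    @Key private String name;
    @Required private String password;
}

// Reads the markings back through reflection, as a generator tool might.
class Markings {
    static String databaseOf(Class<?> type) {
        Database database = type.getAnnotation(Database.class);
        if (database == null) {
            throw new IllegalArgumentException(type.getName() + " is not marked with @Database");
        }
        return database.value();
    }

    // getDeclaredFields() does not promise declaration order, so results are
    // keyed by name here rather than relying on the order fields come back in.
    static Map<String, String> fieldRoles(Class<?> type) {
        Map<String, String> roles = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            String role = field.isAnnotationPresent(Key.class) ? "key"
                        : field.isAnnotationPresent(Required.class) ? "required"
                        : "optional";
            roles.put(field.getName(), role);
        }
        return roles;
    }
}
```

One caveat surfaces here. Reflection does not guarantee that fields come back in declaration order (the .NET documentation gives a similar warning for Type.GetFields). A tool that uses field order for column order would need another source for that ordering.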

Run a Tool
A tool is used on this class to generate the constructor, the public accessor properties, the Read, Write, Exists and Delete methods, the stored procedures used by these methods, and the create table script.

Questions About the Tool
What is this tool?  Is it run as part of the build script?  Is it a stand-alone tool?  Is it a plug-in to Visual Studio?  Does it use the text of the class as input? Or, does it work off a compiled class?  Can the other parts of the class be generated at run time?

When does the tool run again? Is it automatically part of the build process? Or, does it need to be run manually?

A Similar Tool
I worked on a similar tool, called CSFromXSD, that took an XSD file as input and generated classes that would serialize and de-serialize objects to and from XML files that conformed to the schema described in the XSD. The XSD was part of the CSProj, and the dependencies were set up so that if the XSD was modified, the associated generated CS file was regenerated.

This idea of starting with a CS file is a little different.  This would be generating CS-from-CS; therefore, the parsing is not an XML parsing process.  It would be handy to have the original CS file compiled so that I could use reflection on it.

I thought of CSFromXSD when I came up with this CSFromCS idea. However, I may want to take a different approach with this tool.

Developer View
In the long run, what I am shooting for is this: the Developer creates a class (or class description) that identifies the name of the class and the fields that need to be persisted, and the necessary database scripts and the code to interact with the database are generated from it. If the input class (or class description) is modified, the scripts and code are automatically regenerated, including some sort of process to update the database.

The Developer then interacts with the class in a very natural way.

  if (User.Exists(username)) 
  {
    throw new Exception("User already exists."); 
  }  

  User user = new User(username, password);
  user.Write();

  string currentName = user.Name;
  string currentPassword = user.Password;

  user.Password = "P@55w0rd";
  user.Write();

  User user2 = User.Read(username);

  User.Delete(username);

When another field is added to the User class, the creation and update scripts are generated, the class is regenerated, and the Developer runs the scripts on the database. Then the Developer can use the new field in a natural way.

  User user = new User(username, password)
  {
    EmailAddress = emailAddress
  };

  user.Write();

A Changing Database

From the code side of things (as opposed to the database side of things), the developer just wants to write the object and read the object.  Actually, the developer just wants to use the object and have it read and written as needed automatically.  The developer implements a given scenario adding fields to an object and/or creating new objects.  In the underlying database, the developer doesn't care what tables, columns and stored procedures are created or modified.  He just wants the objects to be stored and retrieved as close to automatic as possible.  However, as the database schema changes, the developer would like to be able to move from one schema to the other without worry.

Here's a simple example.

The Developer needs to implement a scenario called "Create User Account".  Initially, the developer identifies that a User will have a name and a password.

  public class User
  {
    string name;
    string password;
  }

This class is mapped in the database as the User table with a name and a password column.  The name is unique and used as the identifier.  A script is created to create the table with the two columns.

Later, the Developer discovers that an email address should be associated with the User as well, giving this class:

  public class User
  {
    string name;
    string password;
    string emailAddress;
  }

The User table now needs to be modified to include the additional emailAddress column. The scripts to create the table need to be modified as well. Additionally, an upgrade procedure needs to be created that converts the old format to the new one if the database has already been deployed.
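Generating the upgrade procedure amounts to comparing the old and new versions of the table. Here is a minimal Java sketch. It assumes each version is available as a map from column name to SQL type; where those versions come from is the open question raised under Previous Versions. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UpgradeDemo {
    public static void main(String[] args) {
        Map<String, String> before = new LinkedHashMap<>();
        before.put("name", "NVARCHAR(50)");
        before.put("password", "NVARCHAR(50)");
        Map<String, String> after = new LinkedHashMap<>(before);
        after.put("emailAddress", "NVARCHAR(100)");
        // Prints: ALTER TABLE [User] ADD [emailAddress] NVARCHAR(100) NULL;
        UpgradeScript.diff("User", before, after).forEach(System.out::println);
    }
}

// Compares two versions of a table, each given as column name -> SQL type,
// and lists the statements that move a deployed table from old to new.
class UpgradeScript {
    static List<String> diff(String table, Map<String, String> oldColumns, Map<String, String> newColumns) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String> column : newColumns.entrySet()) {
            String oldType = oldColumns.get(column.getKey());
            if (oldType == null) {
                // Added as NULL so rows that already exist remain valid.
                statements.add("ALTER TABLE [" + table + "] ADD [" + column.getKey() + "] " + column.getValue() + " NULL;");
            } else if (!oldType.equals(column.getValue())) {
                statements.add("ALTER TABLE [" + table + "] ALTER COLUMN [" + column.getKey() + "] " + column.getValue() + ";");
            }
        }
        for (String name : oldColumns.keySet()) {
            if (!newColumns.containsKey(name)) {
                // A rename also shows up as a drop plus an add, which would lose
                // data; detecting renames needs information the two versions lack.
                statements.add("ALTER TABLE [" + table + "] DROP COLUMN [" + name + "];");
            }
        }
        return statements;
    }
}
```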

The Developer should not need to be concerned with these changes to the underlying database. All the Developer should need to be concerned with is adding the emailAddress field to the User class.

Obviously, this is a simple example. Whole new objects might be added, or fields removed or renamed. When multiple changes have been made, they need to be accumulated and applied in order.
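One common way to accumulate changes is to keep an ordered list of upgrade steps and to record which version a deployed database is at. This Java sketch assumes that bookkeeping already exists. The class and the sample scripts, including the phone column, are hypothetical.

```java
import java.util.List;

class MigrationDemo {
    public static void main(String[] args) {
        Migrations migrations = new Migrations(List.of(
            "ALTER TABLE [User] ADD [emailAddress] NVARCHAR(100) NULL;", // version 1 -> 2
            "ALTER TABLE [User] ADD [phone] NVARCHAR(20) NULL;"));       // version 2 -> 3
        // A database still at version 1 needs both steps, in order.
        migrations.scriptsFrom(1).forEach(System.out::println);
    }
}

// Hypothetical ordered list of upgrade steps. The step at index i upgrades a
// database from version i + 1 to version i + 2, so versions start at 1.
class Migrations {
    private final List<String> steps;

    Migrations(List<String> steps) {
        this.steps = List.copyOf(steps);
    }

    int latestVersion() {
        return steps.size() + 1;
    }

    // Every step after the deployed version, applied in order, brings the
    // database up to date however many changes have piled up.
    List<String> scriptsFrom(int deployedVersion) {
        if (deployedVersion < 1 || deployedVersion > latestVersion()) {
            throw new IllegalArgumentException("Unknown version " + deployedVersion);
        }
        return steps.subList(deployedVersion - 1, steps.size());
    }
}
```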

Is it possible to remove this detail from the Developer and generate the creation scripts and upgrade procedure automatically?

Object to Database Mapping Problem

I'd like to talk about the technology for mapping objects to a database for a minute. I recently created the beginnings of a system to do this mapping. I've even used systems created by others. I'd like to investigate the LINQ technology.

Before I go off on one of these ideas, I'd like to describe the problem of mapping objects to the database as I see it.

Databases contain tables with columns. These are 2-dimensional structures. Objects contain fields, which can themselves be other objects. These can be 2-, 3- or even N-dimensional structures. Going from N dimensions to 2 dimensions requires some sort of mapping, and representing the N dimensions in 2-dimensional structures requires multiple tables that are joined together.

As features are added to a product or system, new fields and objects are added which result in additional columns or entirely new tables.

At the "leaf level" of the problem, the data types in the database do not always line up with the data types in the objects. For example, should a "string" in the object space be mapped to an "nchar" or a "varchar" or some other kind of "char" in the database ... and of what size? Likewise, is the granularity of a "DateTime" value in the object space the same as in the database? If not (and in the cases I've seen, it is not), what can be done to overcome that problem? Or is the lower granularity good enough?
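Here is a small Java sketch of the kind of check a mapping layer could make before writing. The column rules are assumptions: a maximum string length, and a timestamp column that keeps only milliseconds. The sketch only detects the problem. Whether to truncate, reject, or widen the column is still a design decision, and some databases round rather than truncate.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

class ColumnFitDemo {
    public static void main(String[] args) {
        Instant written = Instant.parse("2008-08-15T12:00:00.123456789Z");
        System.out.println(ColumnFit.roundTrip(written));         // 2008-08-15T12:00:00.123Z
        System.out.println(ColumnFit.survivesRoundTrip(written)); // false
        System.out.println(ColumnFit.fits("P@55w0rd", 50));       // true
    }
}

// Checks a value against an assumed column definition before it is written.
class ColumnFit {
    // Assumes the column length is counted in characters.
    static boolean fits(String value, int columnLength) {
        return value.length() <= columnLength;
    }

    // Simulates a column that stores timestamps to the millisecond (an assumed
    // precision; real column types differ), dropping anything finer.
    static Instant roundTrip(Instant value) {
        return value.truncatedTo(ChronoUnit.MILLIS);
    }

    // True when reading the value back would return exactly what was written.
    static boolean survivesRoundTrip(Instant value) {
        return roundTrip(value).equals(value);
    }
}
```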

Should the table have an auto-incrementing id? Or should some field or combination of fields be used as the unique identifier for a table?

Should inherited objects be stored in a single table? Or should the base class information be stored in one table while the derived object's information is stored in another?

Should an entire set of data be read in? Or should just the top-level object be read in on the initial read with the other levels read in on an as-needed basis?  Should this be a configurable option?  Should this be an option that changes depending on the size of the data set?
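The as-needed option is usually called lazy loading. Here is a hedged Java sketch of it. A `Lazy<T>` holder runs its loader on first access and caches the result. A hypothetical `HomeschoolGroup` uses it for its member list. A real version would run a database query in the loader.

```java
import java.util.List;
import java.util.function.Supplier;

class LazyDemo {
    public static void main(String[] args) {
        int[] queries = {0};
        HomeschoolGroup group = new HomeschoolGroup("Tuesday Group", () -> {
            queries[0]++; // stands in for a database query
            return List.of("mom1", "mom2");
        });
        System.out.println(group.members.isLoaded()); // false
        group.members.get();
        group.members.get();
        System.out.println(queries[0]);               // 1
    }
}

// Holds a value that is loaded the first time it is asked for, then cached.
// Not thread-safe; an object shared between threads would need synchronization.
class Lazy<T> {
    private Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null; // no longer needed
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }
}

// Hypothetical top-level object: the name is read up front, the member list on demand.
class HomeschoolGroup {
    final String name;
    final Lazy<List<String>> members;

    HomeschoolGroup(String name, Supplier<List<String>> memberLoader) {
        this.name = name;
        this.members = new Lazy<>(memberLoader);
    }
}
```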

Can the object be serialized and de-serialized instead of being mapped to a table?

What's Next?

Where do I go next? I could flesh out the scenarios. I could lay out a database schema. I could create the skeleton infrastructure of the project -- web site, back end, database. I could play with the layout of the web pages. I could mess with the technology to map objects to a database.

(I could even go look for existing software that meets the current requirements -- but that would not be any fun, would it?!)

Additional Features

I met briefly with three of the homeschool mothers -- my wife and two others.  I was right -- they came up with several other ideas for the web site.

The first thing they did was list a bunch of categories. It became apparent to me that allowing them to manage their list of categories is a good idea. They then added several additional pieces of functionality: calendar, field trip schedules, links to other sites, picture history of activities with the ability to create a scrapbook from them. I read the list of scenarios to them and got a bunch of heads nodding in agreement. I then described the information that needs to be tracked for each resource -- both existing and desired. Here they came up with additional information to track, such as the price and where a desired resource can be found. I suggested that the location of an existing resource might be independent of the owner.

It was only about 10 minutes of time and it was a small group, but it was a valuable 10 minutes to glean the ideas that they presented.

Oh ya! There was one other request: Make it cute!  This may be the most abstract request, but it may also be the most important.  I'll probably need to spend some time thinking about this request.

Shoebox Scenarios

I will now attempt to list out a few of the scenarios for Shoebox.  This is an initial or preliminary list -- that is, I expect that I will need to review, revise and prioritize this list over time.

Rather than put the details here now, I will just list the activities that the Users of this system will need to be able to do.
  1. Create a User Account
  2. Create a Homeschool Group
  3. Add a User to Group (or a Group to a User)
  4. Add an Existing Resource
  5. Remove an Existing Resource
  6. Borrow a Resource
  7. Return a Resource
  8. Add a Desired Resource
  9. Cast a Vote for a Desired Resource
  10. Remove a Vote from a Desired Resource
  11. Convert a Desired Resource to an Existing Resource
  12. Remove a Desired Resource
  13. Remove a Homeschool Group
  14. Remove a User Account from a Group (or Group from an Account)
  15. Remove a User Account
  16. Add a Category
  17. Remove a Category
It seems like the system will need to have the ability to make comments about resources.  I imagine that additional functionality will be desired over time.

Shoebox Purpose & Information

Shoebox is the code name for this software.

The purpose of this software is to allow the homeschool mothers group to share and acquire resources.

This is the (initial) information about an existing resource:
  • resource id
  • name or title
  • category
  • description
  • owner
This is the information about a loaned resource:
  • resource id
  • borrower
  • date loaned
  • date returned
This is the information about a desired resource:
  • resource id
  • name or title
  • category
  • description
This is the information about a vote for a desired resource:
  • resource id
  • voter
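As a starting point for the code side, here is one possible Java model of this information. The record names and field types are guesses, not decisions. `voteCounts` is a hypothetical helper for the voting scenarios.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Shoebox {
    public static void main(String[] args) {
        List<Vote> votes = List.of(new Vote(7, "mom1"), new Vote(7, "mom2"), new Vote(9, "mom1"));
        System.out.println(voteCounts(votes)); // {7=2, 9=1}
    }

    // Tallies votes per desired resource, e.g. to see which one to acquire next.
    static Map<Integer, Long> voteCounts(List<Vote> votes) {
        Map<Integer, Long> counts = new TreeMap<>();
        for (Vote vote : votes) {
            counts.merge(vote.resourceId(), 1L, Long::sum);
        }
        return counts;
    }
}

// Hypothetical model of the information listed above; the field types are guesses.
record ExistingResource(int resourceId, String title, String category, String description, String owner) {}

record Loan(int resourceId, String borrower, LocalDate dateLoaned, LocalDate dateReturned) {
    // dateReturned stays null while the resource is still loaned out.
    boolean isOut() { return dateReturned == null; }
}

record DesiredResource(int resourceId, String title, String category, String description) {}

// One vote per voter per resource suggests (resourceId, voter) as the key.
record Vote(int resourceId, String voter) {}
```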