Thursday, September 11, 2008

Three Questions about Object Oriented Basics

The basic concepts that object oriented programming deals with are encapsulation, inheritance and polymorphism. I don't intend to fully explain these concepts here. Rather, I intend to answer a specific question that I recently received about each.

Encapsulation
Do you have an example of interacting with something public but not private?

Encapsulation is, of course, the idea of hiding implementation.

As far as an example goes, here is a contrived one. Suppose that we have a class describing a person.
public class Person { ... }
Suppose also that we need to be able to get the person's name in three different ways -- full name, first name and last name. We create a public set of methods to retrieve these values.
public class Person
{
public String GetFullName() { ... }
public String GetFirstName() { ... }
public String GetLastName() { ... }
}
The class encapsulates the data behind these methods. From the perspective of using this class, we don't care how that data is stored.
Person person = new Person( ... );

... person.GetFullName() ...
... person.GetFirstName() ...
... person.GetLastName() ...
From the perspective of implementing this class, we have several choices for how to store this information.

We could store the full name in a single variable and then return that value for the GetFullName method while parsing out the first and last name for the other two methods.

Assuming that we have a Split method in the String class that returns a string array by dividing up the string using a delimiter:
public class String
{
public String[] Split(Char delimiter) { ... }
}
We can write the Person implementation like this:
public class Person
{
private String fullName;

public String GetFullName() { return fullName; }
public String GetFirstName() { return fullName.Split(' ')[0]; }
public String GetLastName() { return fullName.Split(' ')[1]; }
}
Alternatively, we could store the first and last names as separate variables and return the individual values for the GetFirstName and GetLastName methods while returning a concatenated value for the GetFullName method.
public class Person
{
private String firstName;
private String lastName;

public String GetFullName() { return firstName + ' ' + lastName; }
public String GetFirstName() { return firstName; }
public String GetLastName() { return lastName; }
}
We could store the first and last names in a dictionary and retrieve the parts as needed.
public class Person
{
private Dictionary<String, String> names;

public String GetFullName() { return names["first"] + ' ' + names["last"]; }
public String GetFirstName() { return names["first"]; }
public String GetLastName() { return names["last"]; }
}
There are several other alternatives, but just for sake of the example, here's one more.

We could store the names in a string array where the first name is in the first position of the array and the last name is in the last position of the array.

Assuming that we have a static Join method in the String class that takes all the members of a string array and joins them together with a specified delimiter:
public class String
{
public static String Join(String[] array, String delimiter) { ... }
}
We can write the implementation like this:
public class Person
{
private String[] names;

public String GetFullName() { return String.Join(names, " "); }
public String GetFirstName() { return names[0]; }
public String GetLastName() { return names[names.Length - 1]; }
}
This final example has the advantage that any number of middle names can be included.

Of course, in all of these examples, there would need to be a constructor or setter method that initializes the private data members and appropriate error handling would also need to be added.
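
As a rough sketch of the constructor and error handling mentioned above, the string-array version might be filled in like this. The constructor signature and the validation rule are assumptions made for illustration. Note that this sketch uses the real .NET string.Split and string.Join (where the separator comes first) rather than the assumed signatures shown earlier.

```csharp
using System;

public class Person
{
    // Private storage: callers never see this array.
    private readonly string[] names;

    // Hypothetical constructor: accepts a full name and splits it on spaces.
    public Person(string fullName)
    {
        if (string.IsNullOrWhiteSpace(fullName))
            throw new ArgumentException("A name is required.", nameof(fullName));

        names = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public string GetFullName() { return string.Join(" ", names); }
    public string GetFirstName() { return names[0]; }
    public string GetLastName() { return names[names.Length - 1]; }
}
```

Code that calls GetFullName, GetFirstName and GetLastName would not need to change if the array were later swapped for one of the other storage choices.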

Inheritance
How do you find out the superclasses and subclasses of a given class?

The easy answer is to use an object browser. Depending on which language you use, there are probably different tools that can be used. The additional answer is to read the documentation (a.k.a. help files) that comes with the framework or library that you are using. And of course, you can search the source code.

It would probably be helpful to flesh out this answer on a language-by-language basis.
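
For C#, one more option is to ask the runtime itself through reflection. Here is a minimal sketch, assuming three made-up classes (Animal, Dog and Puppy) and a hypothetical helper class called TypeExplorer. Note that the subclass search only looks inside the assembly where the given type is defined.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up classes for illustration only.
public class Animal { }
public class Dog : Animal { }
public class Puppy : Dog { }

public static class TypeExplorer
{
    // Walks the BaseType links from the given type up to System.Object.
    public static List<Type> Superclasses(Type type)
    {
        var result = new List<Type>();
        for (Type current = type.BaseType; current != null; current = current.BaseType)
            result.Add(current);
        return result;
    }

    // Finds direct and indirect subclasses, but only within the type's own assembly.
    public static List<Type> Subclasses(Type type)
    {
        return type.Assembly.GetTypes()
            .Where(t => t.IsSubclassOf(type))
            .ToList();
    }
}
```

With these classes, TypeExplorer.Superclasses(typeof(Puppy)) lists Dog, Animal and Object, and TypeExplorer.Subclasses(typeof(Animal)) finds Dog and Puppy.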

Polymorphism
What is polymorphism?

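As I use the term here, polymorphism means using the same name for a method but with different parameters and/or parameter types. (Strictly speaking, this is method overloading, which is sometimes called compile-time or ad hoc polymorphism. The term is also commonly used for a subclass overriding a method of its base class, which I am not covering here.)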

The place that I've seen this used the most is in data structure or other low-level classes such as collections and input/output related functionality. A collection class will have a method to add another object to the collection. The method might be called Add. There will then be several methods called Add, but a different parameter type will be used for each method.
public class Collection
{
public void Add(int value) { ... }
public void Add(long value) { ... }
public void Add(string value) { ... }
...
}
For output, there is the idea of displaying a data value in human-readable format.
public class Out
{
public void Write(int value) { ... }
public void Write(long value) { ... }
public void Write(string value) { ... }
...
}
Other low-level classes might deal with things like communication protocols or database connections. In some cases, the setup is easy, so there are only one or two parameters. For other cases, the setup is more complex, so there is a method by the same name that takes more parameters.
public class Database
{
public void Connect(string databaseName) { ... }
public void Connect(string databaseName,
string userName, string password) { ... }
public void Connect(string databaseName,
ISecurityManager securityManager) { ... }
...
}
With overloading like this, the compiler looks at the types of the arguments at each call to determine which method to use.
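
Here is a small sketch that makes the compiler's choice visible. The Describer class and its Describe methods are made up for illustration. Each overload simply returns the name of the parameter type that was selected.

```csharp
public static class Describer
{
    // Three methods with the same name; only the parameter type differs.
    public static string Describe(int value) { return "int"; }
    public static string Describe(long value) { return "long"; }
    public static string Describe(string value) { return "string"; }
}
```

Calling Describer.Describe(1) picks the int version, Describer.Describe(1L) picks the long version, and Describer.Describe("1") picks the string version. The decision is made at compile time from the declared type of the argument.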

Friday, September 5, 2008

Reading Data

My Current Challenge
I'm working on the generic data object to data table mapping problem right now. Last time I worked on this problem I focused on the write functionality. I got the code that goes through the object using reflection working and then I spent some time working on the SQL script generation. Now I've got the basic read functionality working using similar reflection code. However, I'm running into a bit of a snag.

For most data tables, it is just easier to create an auto-incrementing Id field and then use that Id as a foreign key from other tables whenever a record -- and thus an object -- needs to be referenced. However, when looking up records, there are usually other columns that are more useful.


For example, on the Shoebox project, I have a User class that contains a name, a location and an email-address. The name will probably be split into first and last names. The location will be a reference to a City class. The email-address needs to be unique, but I don't want to use the email-address as the primary key as this would be cumbersome to use as a foreign key in other tables. Additionally, the email-address might be updated, but the User would still be the same user. Therefore, it makes sense to use an auto-incrementing Id as the primary key.


In the table, the email-address column is indexed and marked as requiring unique values. This will allow for quick lookup and prevent duplicates. The problem is to come up with the right names to use in the generic code for reading records.


Read By Id
To read using the Id (or Identity), I can use a generic method like this:
public static T Read<T, P>(P id) { ... }


And then use it like this:
User user = DB.Read<User, long>(1);


This method will use the stored procedure that reads using the Id.


Read By Indexed Values
To read by an indexed value such as the email-address, I've got two problems:
  1. I need to be able to specify the column name(s) and type(s).
  2. I could have more than a single column.
If it were only a single column, I could use the DataAttribute I created to specify that single column and then have a generic method that looks like this:
public static T ReadByIndex<T, P>(P value) { ... }

And then use it like this:
User user = DB.ReadByIndex<User, string>("foo.bar@gmail.com");

Handling the Multiple Columns Problem
However, when there are multiple columns, what do I do?

The Alphabetical Option
Can I just add in another type parameter to my generic ReadByIndex method and assume that the columns are in a predetermined order -- like alphabetical?

Like this:
public static T ReadByIndex<T, P1, P2>(P1 a1, P2 a2) { ... }

With this:
Population population = DB.ReadByIndex<Population, string, string>("Denver", "Colorado");

This example assumes a class called Population with properties City and State that combined would allow a lookup. Alphabetically, City comes before State, so the first parameter is the city and the second is the state.

If more than two columns were necessary, another generic ReadByIndex method could be created providing additional parameters.

The Name-Value Option
Or, should I call out the name of the column along with its value?

Like this:
public static T Read<T>(params NameValue[] args) { ... }

With this:
Population population = DB.Read<Population>(new NameValue("City", "Denver"), new NameValue("State", "Colorado"));

This example assumes the same class called Population as well as a class called NameValue with a constructor like this:
public NameValue(string name, object value) { ... }
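
To make the name-value idea a little more concrete, here is a minimal sketch of a NameValue class and of one way the pairs could be turned into a parameterized query. This is only an illustration. The BuildSelect method, the QuerySketch class and the SQL shape are assumptions, not the actual Read<T> implementation.

```csharp
using System;
using System.Linq;

public class NameValue
{
    public string Name { get; private set; }
    public object Value { get; private set; }

    public NameValue(string name, object value)
    {
        Name = name;
        Value = value;
    }
}

// Hypothetical helper, for illustration only.
public static class QuerySketch
{
    // Builds "SELECT * FROM [Table] WHERE [A] = @A AND [B] = @B".
    // Values are left for the caller to bind as parameters; they are never
    // concatenated into the SQL. Column names are assumed to be trusted,
    // for example checked against the properties of the class beforehand.
    public static string BuildSelect(string table, params NameValue[] args)
    {
        if (args == null || args.Length == 0)
            throw new ArgumentException("At least one name-value pair is required.", nameof(args));

        var conditions = args.Select(a => "[" + a.Name + "] = @" + a.Name);
        return "SELECT * FROM [" + table + "] WHERE " + string.Join(" AND ", conditions);
    }
}
```

Because the pairs are named, the caller can pass them in any order and can use any combination of columns.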

Which one do I like better?
First of all, I really like the first "read by Id" method -- Read<T, P>(P id). For all classes I will have the auto-incrementing Id, so this will make reading any record simple. I also like that this method has a type for the parameter, so that a long or an int can be used as appropriate for a given class.

Now, about the pros and cons of the other methods... hmmm...

The "alphabetical option" is intriguing as its usage is just a matter of setting the Data attribute on the right properties in the class. This option also overcomes the lack-of-type-support-for-the-value drawback that the "name-value option" has. However, there are a couple of big drawbacks:
  1. At the point of usage, there is nothing to indicate what alphabetical parameters are being used.
  2. This only allows for a single set of indexes. Something would need to be modified in order to allow a second set of indexes.
The "name-value option" overcomes the drawbacks of the "alphabetical option" -- 1) since the parameters are named, they can go in any order, and 2) to use a different combination of indexes, it is a simple matter to just use the desired columns. However, it has its own drawbacks:
  1. There is no compile-time support to make sure that the right names are used.
  2. The values are passed as the type object so there is no compile-time support for the value's type.
So, there are the trade-offs: lack of type support versus inflexible obscurity.

Manually Coding It All - A Non-Option
Another option would be to abandon the generic nature of this idea and just code up each piece as necessary ... but this is a "non-option" as it really just defeats the purpose of the idea.

My Choice
Well, I think I'm going to go with the "Name-Value Option". The purpose of this idea is to create a generic and flexible way to read and write objects to and from tables, and the fixed ordering of the alphabetical option would just get in the way. With the name-value option, a specific wrapper method can be created for indexed lookup and tests can be written to ensure that the column/property names work at runtime.

To illustrate this point about a wrapper method, using the Population class example, I would add a method to the Population class like this:
public static Population Read(string city, string state)
{
return DB.Read<Population>(new NameValue("City", city),
new NameValue("State", state));
}

Such that it could be used like this:
Population population = Population.Read("Denver", "Colorado");

Meanwhile, the "read by id" method could be used like this:
Population population = DB.Read<Population, long>(21);

This method, of course, could be put into a wrapper method in the Population class as well:
public static Population Read(long id) { ... }

Such that it could be used like this:
Population population = Population.Read(21);
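
As for the tests that check the column/property names at runtime, here is one possible sketch using reflection. The NameCheck class and its UnknownNames method are made up for illustration, and the Count property on Population is an assumption as well.

```csharp
using System;
using System.Linq;

// Stand-in for the Population class; the Count property is assumed.
public class Population
{
    public string City { get; set; }
    public string State { get; set; }
    public long Count { get; set; }
}

// Hypothetical helper, for illustration only.
public static class NameCheck
{
    // Returns the names that do not match any public property of T.
    public static string[] UnknownNames<T>(params string[] names)
    {
        var properties = typeof(T).GetProperties().Select(p => p.Name).ToList();
        return names.Where(n => !properties.Contains(n)).ToArray();
    }
}
```

A unit test could then assert that NameCheck.UnknownNames<Population>("City", "State") comes back empty, which would catch a misspelled name such as "Stat" before it reaches the database.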

That's enough for now.

Create Back-end Service

What Kind of Service
For this task, the first thing to do is to decide what kind of service to use.  That is, there is an architectural question that needs to be solved before the rest of the task can be completed.  So, what kind of service should I use?  I think I'll use a WCF service.

Location Lookup
I'd like to have a location (i.e., city) for each user and each group.  I figured that there ought to be some free web service that could be used.  It looks like I found one at webserviceX.net.  I think I'll give that a try.

Data Reading
I worked on the generic data reading problem.  It is highlighted in another post.

Several Additional Project Files
I added several additional project files for the database, data and engine layers.  These will, of course, be part of the service.

Thursday, September 4, 2008

Tasks 4 Through 9 Done

I completed tasks 4 through 9.  It took 20 minutes, but I did answer an email in the middle of that, so I'm going to call it 17 minutes.

Next Tasks

Now I am going to work on tasks 4 through 9.  I predict that this will take 15 minutes.

Create a Start Page

What I Did
Okay, I took the first 15 minutes and created a new WebApp project; added a label, input box and button to the default page; and added a NewGroup page.  I then wandered off to cart kids around.  When I got back, I added the Response.Redirect call to the onclick handler so that the NewGroup page would be launched.  Finally, I ran the app and then I checked in the additions.

Delays
These are the delays I had to deal with:
  • My VM took a while to start.
  • I haven't messed with ASPX pages for some time so I had to look up the Response.Redirect call.
  • My IE browser was not configured right to work with intranet files.
  • I took a break to run kids around to their activities, and then we had a lot of typical evening chaos: feeding the boys after football, cleaning out the vacuum cleaner, answering emails to my sister, redirecting kids as they picked up stuff, getting the kids ready for bed, etc.
Some of those delays need to be included, and some of them do not.  I'm going to include the slow VM start up and the IE browser configuration.  I'll also count the short research I had to do.  But, the break to take care of the kids will not be counted.  (I'm also not counting the blogging time.)  Therefore, my initial estimate of 15 minutes is compared to an actual time of 25 minutes.

Status
Task 1 is now done.  Tasks 2 & 3 are also now done.

Pick a Scenario, Then What?

Now that I have enough of a scenario, what do I do next?  I could create a screen (or page).  I could create classes for some of the domain objects.  I could create the back-end service that will talk to the DB and provide the information for the page.  

Here's perhaps a better question: How do I tell when the scenario is complete?

Well, perhaps the easiest way to proceed is to create a list of tasks.  When the tasks are done, the scenario is complete.  

Task List
So, with this in mind, here is the list of tasks:
  1. Create a "Start Page" to launch the "Create a New Group" page. DONE
  2. Simulate user information on this start page. DONE
  3. Create the "New Group Page". DONE
  4. Get user from context. DONE
  5. Add name input. DONE
  6. Add location input. DONE
  7. Add description input. DONE
  8. Add public/private flag input. DONE
  9. Add create button. DONE
  10. Create back-end service
  11. Add CreateGroup method to back-end service
  12. Add CheckGroupName to back-end service
  13. Add CheckCity to back-end service
  14. Create Group class
  15. Create User class
  16. Create Invitee class
Other Tasks
I am sure that I'll think of other tasks as I go through these tasks.  Those will just need to be added when they come up.

Estimates
Part of the process of developing software is to give an estimate of how long something is going to take.  This is generally a tricky business; that is, it is hard to get it right.  Why is this the case?  Partly because the nature of a task is not completely understood.  Partly because other things come up.  Partly because the task changes.  Partly because things that the task depends on change.  

The Backlog
So, what can be done about this tricky stuff?  One thing that I like is the idea that a task is in one of 3 states -- done, in process, or backlogged.  Things that get done count.  Really only one thing is in process at a time.  Everything else is in the backlog.  

If something in process is blocked, it goes back into the backlog until it can be unblocked.  It gets unblocked when whatever it depends on is done.  If it is blocked by something that was unknown, a new task is created and added to the backlog, moved to in process when its turn comes, and then on to the done stuff when it is done.

But none of this fixes the estimate challenge.  Each task still needs to be estimated and then measured and then the actual is compared to the estimate.  This shows how close the original estimates were and allows for better predictions when subsequent tasks are estimated.
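
One simple way to do that comparison is to track the ratio of actual time to estimated time and use the average ratio to scale the next estimate. Here is a rough sketch. The EstimateCheck class and its method names are made up for illustration, and averaging the ratios is just one possible approach.

```csharp
using System;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class EstimateCheck
{
    // Ratio of actual to estimated time: 1.0 is a perfect estimate,
    // and anything above 1.0 is an overrun.
    public static double OverrunRatio(double estimatedMinutes, double actualMinutes)
    {
        if (estimatedMinutes <= 0)
            throw new ArgumentOutOfRangeException(nameof(estimatedMinutes));
        return actualMinutes / estimatedMinutes;
    }

    // Scales a new raw estimate by the average of the ratios seen so far.
    // With no history, the raw estimate is returned unchanged.
    public static double Adjust(double rawEstimateMinutes, params double[] pastRatios)
    {
        if (pastRatios == null || pastRatios.Length == 0)
            return rawEstimateMinutes;
        return rawEstimateMinutes * pastRatios.Average();
    }
}
```

For example, a task estimated at 15 minutes that actually takes 25 minutes has a ratio of about 1.67, so a later raw estimate of 15 minutes would be adjusted to roughly 25 minutes.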

The First Estimate on This Project
Well, I'm going to tackle the "create a start page" task.  I happen to have 15 minutes, so I'm going to give that as my estimate.