13 September 2011

Fun with enum

If you’ve done any vaguely serious programming with a pre-4 version of the .NET Framework then chances are you’ve had to write an Enum.TryParse() method. You probably wrote something like this:

public static bool TryParse<TEnum>(string value, out TEnum enumValue)
{
 Type enumType = typeof(TEnum);
 if (!enumType.IsEnum) throw new ArgumentException("Type is not an enum.");
 
 enumValue = default(TEnum);
 
 if (Enum.IsDefined(enumType, value))
 {
  enumValue = (TEnum)Enum.Parse(enumType, value);
  return true;
 }
 
 return false;
}

Everything went fine until someone decided to pass in a string representing a value of the underlying type, such as “0”, at which point Enum.IsDefined() said no even though your enum looked like this:

public enum MyEnum
{
 Zero = 0, One, Two, Three
}

Enum.Parse() will accept “0” just fine, but IsDefined() requires the value to be of the correct underlying type, so in this case you’d need 0 as an integer for it to return true. Doesn’t that mean I now need to work out the underlying type and then call the appropriate Parse() method using reflection? Oh dear, looks like our nice generic solution may get rather complicated!
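To see the mismatch concretely, given the MyEnum above:

// IsDefined matches a string against the constant names, not the values
Enum.IsDefined(typeof(MyEnum), "Zero"); // true - matches by name
Enum.IsDefined(typeof(MyEnum), "0");    // false - "0" isn't a name
Enum.IsDefined(typeof(MyEnum), 0);      // true - matches by underlying value
var e = (MyEnum)Enum.Parse(typeof(MyEnum), "0"); // MyEnum.Zero - no complaints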

Fear not. Because we know our input type is a string and there are a very limited number of underlying types we can have, there’s a handy framework method we can use to sort this out – Convert.ChangeType().

public static bool IsUnderlyingDefined(Type enumType, string value)
{
 if (!enumType.IsEnum) throw new ArgumentException("Type is not an enum.");
 
 Type underlying = Enum.GetUnderlyingType(enumType);
 
 try
 {
  var val = Convert.ChangeType(value, underlying, CultureInfo.InvariantCulture);
  return Enum.IsDefined(enumType, val);
 }
 catch (FormatException)
 {
  // not a value of the underlying type at all, e.g. "banana"
  return false;
 }
 catch (OverflowException)
 {
  // numeric, but outside the range of the underlying type
  return false;
 }
}

ChangeType() is effectively selecting the correct Parse method for us and calling it, passing in our string and returning a nice strongly typed underlying value which we can pass into Enum.IsDefined(). So our TryParse now looks like this:

public static bool TryParse<TEnum>(string value, out TEnum enumValue)
{
 Type enumType = typeof(TEnum);
 if (!enumType.IsEnum) throw new ArgumentException("Type is not an enum.");
 
 enumValue = default(TEnum);
 
 if (Enum.IsDefined(enumType, value) || IsUnderlyingDefined(enumType, value))
 {
  enumValue = (TEnum)Enum.Parse(enumType, value);
  return true;
 }
 
 return false;
}

This exercise is somewhat contrived, especially now that Enum.TryParse is part of .NET 4.0, but the synergy of ChangeType and IsDefined is quite nice and a technique worth pointing out nonetheless.
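For comparison, here’s the built-in .NET 4.0 method in action – note that, unlike the version above, it also happily parses numeric strings that aren’t defined on the enum:

MyEnum result;
Enum.TryParse("Zero", out result); // true, result == MyEnum.Zero
Enum.TryParse("0", out result);    // true, result == MyEnum.Zero
Enum.TryParse("7", out result);    // also true, result == (MyEnum)7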

13 May 2011

Bulk upsert to SQL Server from .NET

or, “How inserting multiple records using an ORM should probably work”

Anyone familiar with .NET ORMs will know that one area where they’re lacking is updating or inserting multiple objects at the same time. You end up with many individual UPDATE and INSERT statements being executed against the database, which can be very inefficient, and this often results in developers having to extend the ORM or break out of it completely in order to perform particular operations. An added complication is that, where identities are being used in tables, each INSERT command the ORM performs must immediately be followed by a SELECT SCOPE_IDENTITY() call to retrieve the identity value for the newly inserted row so that the CLR object can be updated.
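As a rough sketch of that pattern (the table, column, and property names here are hypothetical), every row costs at least one round trip:

using (var command = connection.CreateCommand())
{
 command.CommandText =
  "INSERT INTO TargetTable (name) VALUES (@name); SELECT SCOPE_IDENTITY();";
 var nameParam = command.Parameters.Add("@name", SqlDbType.NVarChar, 50);
 foreach (var item in items)
 {
  nameParam.Value = item.Name;
  // one round trip per row, plus reading back the new identity
  item.Ident = Convert.ToInt32(command.ExecuteScalar());
 }
}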

It’s possible to drastically improve on this by making use of a few features already supported in the .NET Framework and SQL Server, and I’m hoping that a similar solution will feature in future releases of the major ORMs.

  • The .NET Framework’s SqlBulkCopy class, which lets you take advantage of the bulk-load operations supported by SQL Server.
  • SQL Server temporary tables.
  • SQL Server 2008’s MERGE command, which allows upsert operations to be performed on a table and, in particular, its ability to return identities for inserted rows via the OUTPUT clause.

The process

The main steps of the process are as follows:

  1. Using ADO.NET create a temporary table in SQL Server whose schema mirrors your source data and whose column types match the types in the target table.
  2. Using SqlBulkCopy populate the temporary table with the source data.
  3. Execute a MERGE command via ADO.NET on the SQL Server which upserts data from the temporary table into the target table, outputting identities.
  4. Read the row set of inserted identities.
  5. Drop the temporary table.

So instead of n INSERT statements to insert n records, that’s four SQL commands in all to insert or update n records.

There’s already a blog post by Kelias on this technique which goes into more detail and which you can read here. The only part missing from Kelias’ post is the piece utilising the OUTPUT clause to retrieve the inserted identities from the MERGE command. This is simply an additional line in the MERGE command e.g.

OUTPUT $action, INSERTED.$IDENTITY

and the small matter of reading those returned identities out of a SqlDataReader.

This is the crucial piece, however, as it is this which allows us to tie the inserted row back to the original CLR “entity” item that formed part of our source data. Updating our CLR object with this identity will allow us to save subsequent changes away as an UPDATE to the now existing database row.
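To make the above concrete, here’s a minimal sketch of the whole process, assuming a hypothetical target table TargetTable with an int identity column ident and a name column, source rows already loaded into a DataTable, and the usual System.Data.SqlClient usings:

public static void BulkUpsert(SqlConnection connection, DataTable sourceRows)
{
 // 1. Create a temp table whose schema mirrors the source data
 using (var create = connection.CreateCommand())
 {
  create.CommandText = "CREATE TABLE #Source (ident int NULL, name nvarchar(50) NOT NULL);";
  create.ExecuteNonQuery();
 }

 // 2. Bulk copy the source rows into the temp table
 using (var bulkCopy = new SqlBulkCopy(connection))
 {
  bulkCopy.DestinationTableName = "#Source";
  bulkCopy.WriteToServer(sourceRows);
 }

 // 3. MERGE the temp table into the target, outputting inserted identities
 using (var merge = connection.CreateCommand())
 {
  merge.CommandText = @"
MERGE INTO TargetTable AS target
USING #Source AS source ON target.ident = source.ident
WHEN MATCHED THEN
 UPDATE SET target.name = source.name
WHEN NOT MATCHED THEN
 INSERT (name) VALUES (source.name)
OUTPUT $action, INSERTED.$IDENTITY;";

  // 4. Read the row set of inserted identities
  using (var reader = merge.ExecuteReader())
  {
   while (reader.Read())
   {
    if (reader.GetString(0) == "INSERT")
    {
     int identity = Convert.ToInt32(reader[1]);
     // tie the identity back to the original CLR object here
    }
   }
  }
 }

 // 5. Drop the temp table (it would also disappear when the connection closes)
 using (var drop = connection.CreateCommand())
 {
  drop.CommandText = "DROP TABLE #Source;";
  drop.ExecuteNonQuery();
 }
}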

Performance

I did some brief testing to get rough timings of this technique versus individual INSERT calls using a parameterised ADO.NET command. Across row counts from 100 to 10,000 and row sizes from roughly 1k to 10k, the upsert technique nearly always executed in less than half the time of the individual INSERT statements. For example, 1,000 rows of about 1k each took an average of just over 500ms with individual INSERTs versus the bulk upsert’s 150ms on my quite old desktop with not very much RAM.

That’s pretty cool considering the upsert could be performing either an INSERT or an UPDATE in the same number of calls, whereas factoring that into the individual-statement method would mean a lot of extra commands to try an UPDATE, check whether any rows had been affected, and so on.

Github project

I decided to have a go at wrapping the upsert technique up in a library which would automatically generate the SQL necessary for creating the temporary table and running the MERGE. I pushed an initial version of this SqlBulkUpsert project to GitHub, which can be found here:
https://github.com/dezfowler/SqlBulkUpsert

Usage would be something like this:

using (var connection = DatabaseHelper.CreateAndOpenConnection())
{
 var targetSchema = SqlTableSchema.LoadFromDatabase(connection, "TestUpsert", "ident");

 var columnMappings = new Dictionary<string, Func<TestDto, object>>
       {
        {"ident", d => d.Ident},
        {"key_part_1", d => d.KeyPart1},
        {"key_part_2", d => d.KeyPart2},
        {"nullable_text", d => d.Text},
        {"nullable_number", d => d.Number},
        {"nullable_datetimeoffset", d => d.Date},
       };

 Action<TestDto, int> identUpdater = (d, i) => d.Ident = i;

 var upserter = new TypedUpserter<TestDto>(targetSchema, columnMappings, identUpdater);

 var items = new List<TestDto>();

 // Populate items with TestDto instances
 
 upserter.Upsert(connection, items);

 // Ident property of TestDto instances updated
}

with TestDto just being a simple class like this:

public class TestDto
{
 public int? Ident { get; set; }
 public string KeyPart1 { get; set; }
 public short KeyPart2 { get; set; }
 public string Text { get; set; }
 public int Number { get; set; }
 public DateTimeOffset Date { get; set; }
}

In this TypedUpserter example we:

  1. define the schema of the target table either in code or by loading it from the database (shown in the example)
  2. define mappings from column names of the target to a lambda retrieving the appropriate property value from the TestDto class
  3. define an action to be called to allow setting the new identity on a property of the DTO
  4. instantiate the Upserter and call Upsert() with a list of items and a database connection
  5. after the call, the identity properties of the TestDto instances will have been updated using the defined action, so the CLR objects are now consistent with the database rows.

Next step

The object model could probably do with some refinement, and it needs lots more tests adding, but it’s in pretty good shape. Next I’m going to look at integrating it into Mark Rendle’s Simple.Data project which should mean that, to my knowledge, it will be the only .NET ORM doing proper bulk loading of multiple records.

26 January 2011

Adding collections to a custom ConfigurationSection

The attributed model for creating custom ConfigurationSection types for use in your app.config or web.config file is quite verbose and examples are hard to come by. Collections in particular are a pain point: there is very little documentation around them, and the examples all tend to follow the default add/remove/clear model, i.e. that used in <appSettings/>.

Three particular scenarios with collections which caused me problems while doing the same piece of work were:

  • When the items of a collection have a custom name e.g. "item" instead of add/remove/clear
  • When the items of a collection can have different element names representing different actions or subclasses e.g. the <allow/> and <deny/> elements used with <authorization/>
  • When the items of a collection don’t have an attribute which represents a unique key e.g. not having anything like the key attribute of an <add/> or <remove/> element

The first and last are relatively trivial to fix; the second less so, and it took me a bit of digging around in Reflector to work out how to set up something that worked.

Collection items with a custom element name

This scenario can be accomplished as follows.


public class MySpecialConfigurationSection : ConfigurationSection
{
 [ConfigurationProperty("", IsRequired = false, IsKey = false, IsDefaultCollection = true)]
 public ItemCollection Items
 {
  get { return ((ItemCollection) (base[""])); }
  set { base[""] = value; }
 }
}

[ConfigurationCollection(typeof(Item), CollectionType = ConfigurationElementCollectionType.BasicMapAlternate)]
public class ItemCollection : ConfigurationElementCollection
{
 internal const string ItemPropertyName = "item";

 public override ConfigurationElementCollectionType CollectionType
 {
  get { return ConfigurationElementCollectionType.BasicMapAlternate; }
 }

 protected override string ElementName
 {
  get { return ItemPropertyName; }
 }

 protected override bool IsElementName(string elementName)
 {
  return (elementName == ItemPropertyName);
 }

 protected override object GetElementKey(ConfigurationElement element)
 {
  return ((Item)element).Value;
 }

 protected override ConfigurationElement CreateNewElement()
 {
  return new Item();
 }

 public override bool IsReadOnly()
 {
  return false;
 }

}

public class Item : ConfigurationElement
{
 [ConfigurationProperty("value")]
 public string Value 
 {
  get { return (string)base["value"]; }
  set { base["value"] = value; }
 }
}

Which will allow us to specify our section like so:


<configSections>
  <section name="mySpecialSection" type="MyNamespace.MySpecialConfigurationSection, MyAssembly"/> 
</configSections>

...

<mySpecialSection>
 <item value="one"/>
 <item value="two"/>
 <item value="three"/>
</mySpecialSection>

First off we have a property representing our collection on our ConfigurationSection or ConfigurationElement, whose type derives from ConfigurationElementCollection. This property is decorated with a ConfigurationProperty attribute. If the collection should be contained directly within the parent element then set IsDefaultCollection to true and leave the element name as an empty string (note the base indexer is keyed on that same empty string). If the collection should be contained within a container element, specify an element name.

Next, the ConfigurationElementCollection derived type of the property should have a ConfigurationCollection attribute specifying the element type and collection type. The collection type specifies the inheritance behaviour when the section appears in web.config files nested deeper in the folder structure, for example.

For the collection type itself we do this:

  • Override ElementName to return the collection item element name
  • Override IsElementName to return true when encountering that element name
  • Override CreateNewElement() to new up an instance of your item type
  • Override GetElementKey(element) to return an object which uniquely identifies the item. This could be a property value, a combination of values as some hash, or the element itself

Collection items with varying element name

This scenario, mirroring the <allow/> and <deny/> style mentioned earlier, takes a little more setting up:


public class MySpecialConfigurationSection : ConfigurationSection
{
 [ConfigurationProperty("items", IsRequired = false, IsKey = false, IsDefaultCollection = false)]
 public ItemCollection Items
 {
  get { return ((ItemCollection) (base["items"])); }
  set { base["items"] = value; }
 }    
}
    
[ConfigurationCollection(typeof(Item), AddItemName = "apple,orange", CollectionType = ConfigurationElementCollectionType.BasicMapAlternate)]
public class ItemCollection : ConfigurationElementCollection
{
 public override ConfigurationElementCollectionType CollectionType
 {
  get { return ConfigurationElementCollectionType.BasicMapAlternate; }
 }

 protected override string ElementName
 {
  get { return string.Empty; }
 }

 protected override bool IsElementName(string elementName)
 {
  return (elementName == "apple" || elementName == "orange");
 }

 protected override object GetElementKey(ConfigurationElement element)
 {
  return element;
 }

 protected override ConfigurationElement CreateNewElement()
 {
  return new Item();
 }

 protected override ConfigurationElement CreateNewElement(string elementName)
 {
  var item = new Item();
  if (elementName == "apple")
  {
   item.Type = ItemType.Apple;
  }
  else if(elementName == "orange")
  {
   item.Type = ItemType.Orange;
  }
  return item;
 }
 
 public override bool IsReadOnly()
 {
  return false;
 }
}

public enum ItemType
{
 Apple,
 Orange
}

public class Item : ConfigurationElement
{
 public ItemType Type { get; set; }

 [ConfigurationProperty("value")]
 public string Value 
 {
  get { return (string)base["value"]; }
  set { base["value"] = value; }
 }
}

Which will allow us to specify our section like so:


<configSections>
  <section name="mySpecialSection" type="MyNamespace.MySpecialConfigurationSection, MyAssembly"/> 
</configSections>

...

<mySpecialSection>
 <items>
  <apple value="one"/>
  <apple value="two"/>
  <orange value="one"/>
 </items>
</mySpecialSection>

Notice that here we've specified two collection items with the value "one" which would have resulted in one overwriting the other in the previous example. To get around this, instead of returning the Value property we're returning the element itself as the unique key.

This time our ConfigurationElementCollection derived type’s ConfigurationCollection attribute also specifies a comma-delimited AddItemName – here "apple,orange", mirroring the "allow,deny" of the built-in authorization section. We override the methods of the base as follows:

  • Override ElementName to return an empty string
  • Override IsElementName to return true when encountering a correct element name
  • Override CreateNewElement() to new up an instance of your item type
  • Override CreateNewElement(elementName) to new up an instance of the correct item type for the particular element name, setting any relevant properties
  • Override GetElementKey(element) to return an object which uniquely identifies the item. This could be a property value, a combination of values as some hash, or the element itself

Caveat

While our varying element names will be readable, the object model is read-only. I haven’t covered support for writing changes back to the config file here as it involves taking charge of the serialization of the objects, so it really requires its own blog post.
