Compact Framework

Improving Data Access Performance with Data Caching

Chris Tacke
OpenNETCF Consulting
September 2007

Download the Source Code
Download the PDF

Introduction

Performance of  the data access layer of a mobile or embedded application is often viewed by developers in very broad strokes.  When we begin designing a solution we consider the performance versus benefits of using DataReaders,  DataAdapters, DataSets and the like.  Most often we just make an early determination based on what our expected use and the generally accepted "best practice" is and then move on without much further thought. 

Traditionally if we only wanted to ever read the data in a non-scrollable cursor, we'd use a DataReader.  If we want to alter the data we'd use a DataSet.  Version 2.0 of the Compact Framework brought the wonderful "blended" use DataAdapter, and many developers migrated to it and considered the variable of data access performance to be largely out of their hands beyond that point.

A recent plant floor automation system project that we worked on was a stark reminder that such assumptions are not true, and that performance considerations need to be looked at throughout the development of any solution.

Data Access as a Bottleneck

Like many projects, we didn't really start to discover show-stopping performance problems in our recent endeavor until we were getting close to delivery of a full working system.  Early tests during development usually miss the full impact of all of the interactions of a full running system, and it's really hard, if not impossible, to guess what the actual execution performance of any solution will be until all of the major pieces ar in place and cooperating.  Sure, you might get hints at problem areas early on, but those are usually gross, easy-to-find-and-remedy problems usually contained within a single susbsystem.  It's not until you have enough infrastructure in place for the system as a whole to start working as a unit that you start to see the larger-scale problems, and coincidentally it's usually about the same time that you start delivering drops to your customer.

Our system collected event data from Programmable Logic Computers (PLCs) on the factory floor, aggregated that data into relational SQL CE tables, and then generated meaningful reports from that data through a light-weight ASP.NET Web Server running on Windows CE  that we built (I'll discuss more on that specific technology in a future article).  The basic unit of measure that we used to judge system performance was the number of messages coming from PLCs that would could handle while still serving up reports.

The customer's initial  requirement included the ability to handle around 5 events per second, and early tests and proof-of-concept demonstrations had us running as many as 60 events per second so we were confident that we were not just going to meet their requirements, but dazzle them with an order-of-magnitude-better performance.  Then as we approached the time for final delivery of the system and we started tying all of the multitude of pieces together we suddenly found the system struggling to even handle 2 or 3 events per second, and new calculations from the customer we indicating a desire to handle not just 5 but 10 events per second.

My initial instinct based on the application's behavior and experience was that Garbage Collection was the problem, but as I outlined in a previous article that turned out not to be the case.  After heavily instrumenting the code I determined that a large amount of our cycle execution time was being spent in data access (a cycle being the receipt of an event as an array of bytes, parsing that event  and then storing the results in a table) .

Part of the parsing and storage involved looking up foreign key integers from lookup tables (the events come in with "denormalized" text data, so we have to look up the ID based on text) and after instrumenting the code to even finer granularity I found that even these simple reads were taking a lot longer than one would expect.

Profiling Lookup Table Access Speed

Since I knew that simple lookups were taking a long time (tens to hundreds of milliseconds) I knew immediately that we needed to try to decrease this, but how?  We needed the data from the lookup table to do subsequent inserts and/or updates.  The answer was to generate our own data caching mechanism at the table level.

Most data entities in the solution have been abstracted to single class definitions and a search on the table was done through methods like GetCustomerByID or GetIDFromCustomerName.  We were doing a lot of the latter type - getting an ID based on other fields, but the caching technique we're going to cover in this article works well for any of them.

It's a lot easier to talk about performance gains if we can quantify the performance before the "fix" and again after the fix, so before we move on to the solution, let's take a look at some post-mortem investigation data I gathered. 

I created a simple application that creates a SQL CE database with a single table (our simulated lookup table) and then puts just 2 rows into the table.  With only 2 rows we remove the possible effects of any table indexing, since a lookup is going to require two fetches at most.

I then created very basic code that would get a row from the table based on the row identifier (so SELECT * FROM Table WHERE RowID = N).  The architecture of our large solution was such that most data transactions created their own connection to the database, so I added code to my test application  that used the same behavior.  To smooth out the results, the application would create the connection, open it, run the SQL and then close the connection.   It did this 100 times and output the average time it took to perform the operation.  The results from running the test on three separate test devices are below (the actual production device being the first in the list):

Device

Processor

Mean Execution (ms)*

OLDI SAM-L8

800MHz Geode x86

7.5 to 9.5

iCOP eBox 2300

200MHz Vortex86

47 to 52

Dell Axim x51

416MUz Intel PXA270

38 to 62

*Ranges are given because we ran several iterations of the 100-run tests

Intuition and experience (and other developers) told me that the first step in improving these numbers would be to reuse the connection instead of creating and opening it every time.  So I modified the test application to create a global connection and use it without closing for all of the queries.  The results for those tests are below.  Note that these numbers exclude the first run which creates and opens the connection, so these numbers reflect purely the time to get one row of data from a table containing only two rows of data.

Device

Processor

Mean Execution (ms)

OLDI SAM-L8

800MHz Geode x86

7.1 to 14.8

iCOP eBox 2300

200MHz Vortex86

51 to 52

Dell Axim x51

416MUz Intel PXA270

43 to 61

Now if you go back and look at the two result tables you'll probably be as surprised as I was (and yes, I double- and triple-checked these numbers). 

Lesson 1: There's no performance benefit whatsoever to keeping the connection created and opened through the run of your application.

Ok, so things aren't quite what we expected, but I created a meaningful test that provided baseline numbers that I needed to improve.  The next step was to implement caching and then quantify the results to see if the improvement in performance warrants the extra time it takes to implement the caching itself.

Implementing a Data Cache

So what exactly is a data cache and how do we go about creating one? The concept (and really the implementation) is quite simple.  You start by creating a data type (could be a struct or a class) that describes a row of data. That entity can be a subset of the columns in a table if your lookup query typically only looks at a couple fields.  You then create a searchable collection to hold a series of these entities.  In my test application I used a Hashtable because it makes lookups by a key value very easy.  This collection of entities is the cache.

Using the cache is simple.  When a table lookup is needed, instead of going right to the database you first look in the cache to see if the row is there.  If it is, you return the data from the cache.  If it's not, you get it from the database, store it in the cache and then return the data.  In this way the cache grows as data is queried to spread out the load time over many operations.  Another mechanism would be to populate the cache with all of the lookup table rows on creation.   This method would mean that all lookups, including the first will be as fast as possible, but you have to pay up front to fill it.

Figure 1

I added a cache to the test application and re-ran the tests.  The results follow.  Again these numbers omit the first run which was skewed high because we had 1 table lookup and 99 cached instead of 100 cache reads.

Device

Processor

Mean Execution (ms)

OLDI SAM-L8

800MHz Geode x86

0.0193 to 0.0201

iCOP eBox 2300

200MHz Vortex86

0.1014 to 0.1164

Dell Axim x51

416MUz Intel PXA270

0.0975 to 0.1437

 

Yes, again those numbers are correct - the decimal points are not in the wrong place.

Lesson 2: Cache reads are two orders of magnitude faster than a database read.

Of course I expected the cache reads to be faster, but I wasn't expecting this much of a gap.  In fact I have no explanation for why the difference is so large.  The database that the test application created was in RAM, not persistent storage, so both the database read and the cache read are coming from RAM.

Conclusion

So what did we learn from this test?  First, that the "general knowledge" that reusing a connection is faster appears to be completely false and second that caching data has some seriously large performance advantages.  Of course you still have to weigh those advantages against the advantages of getting the data from the database. 

Your cached data is not going to get updated when a table is updated, so it probably not a good idea for tables that have data that changes a lot.  The cache maintenance code gets more complex if you have code that deletes or updates the lookup table.

You can't join or query your cached data with SQL.  If the data is not a typical lookup table and is often used in joins, then it's probably not a good candidate for caching.

The data cache takes up memory.  If you have a really large lookup table (say a lookup of every product a large retail store has on its shelves for example) then a cache might not be a good idea, though you could modify the cache maintenance code to hold maybe the last 100 or so rows read, so frequently queried rows would often be in the cache and fast to read.

So is caching the answer to every database performance woe that might plague you in a project?  Obviously not, but it is an important tool that you can use to improve some areas of your application. It is simply another tool that you now have and it's going to be up to you to recognize if and when it's applicable to a problem you're working on, but at least now you know it exists. And, as I was told many times by a cartoon as a kid, knowing is half the battle.  Go Joe!

Comments

 

Richard Jones - Mobile LOB Blog said:

September 21, 2007 3:15 PM
 

dancordi said:

Link for download the source code doesn't work!

September 24, 2007 12:20 PM
 

chris.bono said:

Interesting that your tests indicated that no performance benefit was realized from a global connection object.

My recent experience was just the opposite. I unfortunately don't have hard numbers to apply to the lookup operations. But the early versions of our mobile application (which relies heavily on SQL CE) was dog slow when it came to performing several queries in a row (say, navigating to and initializing data on a new screen). I mean, you could literally count the seconds go by waiting for a new screen to draw -- sometimes up to 10-15 seconds, depending on how much data was being requested.

The Data Access Layer was built at the time to open/close a connection after each read operation. I modified it to utilize a global connection object and the navigation speeds dropped to less than a second. Keep in mind this was the only change I made to DAL, so there was no "accidental" optimization.

Perhaps our DAL implementations are different? Not sure, but I definitely wouldn't discourage anyone from exploring the use of a global connection object in an effort to boost performance.

September 25, 2007 4:07 PM
 

SteveLas said:

Hi Chris,

For the connection performance, I'm going to guess you have a different connection already open. The way SQLce works is the first connection loads the database into shared memory. Think of it as starting the SQL Server Service for our big brother SKUs. Subsequent connections simply access the already loaded database. Sort of like using a connection from the connection pool. (but, no SQLce doesn’t have connection pooling).  Since we have the best practice guidance for opening/closing a connection for our server skus, you can follow this practice with SQLce.  However, you’ll want to open at least a phantom connection to keep the engine loaded.  You don't ever have to use this connection object, just think of it as the "SvcHost" <g>

This best practice for sharing a global connection was the best practice back in the SQLce 1.0 & 2.0 days when we only supported a single connection. With SQLce 3.0 (formerly SQL Server Mobile 3.0) we added multi-connection support, locking etc., so it is possible to read conflicting “best practices” if the version isn’t attributed to when the best practice was a recommendation.

This also accounts for the differences you and Chris may be seeing.

I’ll also add that sharing a SqlCeCommand object can help or hurt you. In SQLce 3.x, we cache the queryplan associated with the command. So, while you may want to re-use objects to help .NET CF manage garbage collection, constantly changing the CommandText of an existing SqlCeCommand can hurt you if you’re going to re-execute the same command multiple times as we'll toss the cached plan.

If you’re going to do something like the following, your best to open a single connection, and create two different commands.  Ohhh, and yes, it’s best to use the parameters object

For each blahh in some result

 Insert into Orders

 Insert into OrderDetails

Next

Hope that helps take some mystery out of some of our performance thingys

Steve

September 26, 2007 8:54 PM
 

ctacke said:

I just verified that the source link works in FireFox and IE7

September 27, 2007 1:26 PM
 

chris.bono said:

Hi Steve,

Thanks for the suggestions -- I appreciate that I'm getting SQL CE advice from someone who develops the product.

So... if I understand correctly, I should open a SqlCeConnection object to "prime" SQL CE, leave it alone and then go back to the older approach of opening/closing new connections whenever I perform a query?

We don't reuse SqlCeCommand objects -- they're created and destroyed on a per-query basis. I had thought about keeping some of them open in order to benefit from the cached plans, though that will require a large amount of refactoring. We're not currently using parameters either, which I believe is necessary for the caching as well.

At risk of relying on the Lazyweb approach, can you point me to some samples of best practices for SQL CE, including the use of a "phantom" connection object?

Best Regards,

Chris B.

September 27, 2007 5:40 PM
 

ErikEJ said:

Hi Chris,

I have looked a the code in your test project, and have made a fourth approach, which decreases you non-cached test with about 90 %

using proper SQL CE data access, no cache (thus avoiding any cahce refresh issues).

I am using direct table access and bypassing the query processor. What a difference!

I have made the following changes:

//Added to CreateDatabase

SqlCeCommand indexCmd = new SqlCeCommand("CREATE UNIQUE INDEX IX_Beers ON Beers (ID) ", m_cachedConnection);

indexCmd.ExecuteNonQuery();

indexCmd.Dispose();

// new button click event handler:

       private void button1_Click(object sender, EventArgs e)

       {

           long sum = 0;

           for (int i = 0; i < 100; i++)

           {

               txtResult.Text = string.Format("Run {0}", i);

               sum += GetNameByIDSeek(m_cachedCommand);

           }

           txtResult.Text = string.Format("Mean: {0} ticks", sum / 100);

       }

private long GetNameByIDSeek(SqlCeCommand command)

       {

           Beer beer;

           return GetNameByIDSeek(command, out beer);

       }

       private long GetNameByIDSeek(SqlCeCommand command, out Beer beer)

       {

           long et;

           long end;

           QueryPerformanceCounter(out et);

           if (command == null)

           {

               command = new SqlCeCommand();

               command.CommandType = CommandType.TableDirect;

               command.CommandText = "Beers";

               command.IndexName = "IX_Beers";

               command.Connection = m_cachedConnection;

           }

           SqlCeDataReader reader = command.ExecuteReader();

           bool onRow = reader.Seek(DbSeekOptions.FirstEqual, new object[] { int.Parse(txtID.Text) });

           QueryPerformanceCounter(out end);

           et = end - et;

           if (onRow)

           {

               reader.Read();

               beer = new Beer(reader.GetInt32(0), reader.GetString(2), reader.GetString(1), reader.GetBoolean(3));

           }

           else

           {

               beer = new Beer(-1, null, null, false);

               txtResult.Text = "No row found for provided ID";

               et = -1;

           }

           reader.Close();

           reader.Dispose();

           return et;

       }

Timings of Windows Mobile 5 (iPAQ rw 6815):

Button 1 and 2: 250.000

My approach: 20.000 to 30000

Best regards, Erik Ejlskov Jensen

October 9, 2007 1:21 PM