The Shoemaker and the Asynchronous Process Elves
By Doug Boude
Sunday, March 12th, 2006: I found myself near the end of a blustery late winter's day,
the kind that can only be found in Pooh's Hundred Acre Wood or Minneapolis,
Minnesota; in my case (this time), it happened to be Minneapolis.
I was waiting for the last talk of a two-day conference whose speakers had been exploring, en masse,
new ways of thinking about and approaching application development using
ColdFusion. Thus far I had found the talks to be, for the most part,
inspiring and enlightening, and had no reason to believe this one, titled
"Asynchronous Logging" and presented by Michael Dinowitz, would be any different. As usual, I was right on the money, for as the speaker began to artistically weave
the situation that would serve as the catalyst for the audience's understanding,
lo and behold, my mental floodlights shone brightly and an analogy began
to form in my mind that was to become my everlasting comprehension of
the topic at hand. It was an old story, told to me as a child, that
I found to be the perfect equivalent for "Asynchronous Logging":
The Shoemaker and the Elves.
The author will make no assumptions as to whether or not his readers had the childhood privilege of being privy to the story of the hard-working Shoemaker; suffice it to say that the Shoemaker worked steadily on, night and day, working his way down a very long list of customers' shoe orders one at a time while the list grew ever longer. Oh, given enough time and processing power, the Shoemaker would eventually complete his list of orders (at least that's what Vegas odds say); but at what cost in time to the customers waiting patiently for their new shoes to arrive? How long would they wait before timing out, giving up, and finding another site to make them a pair of shoes?
The Shoemaker never had to answer those questions, because late one night help arrived: a very anxious and innumerable company of Process Elves who begged the Shoemaker to let them help. There they stood, tool belts on, hands idle, looking up at the obviously worn out Shoemaker, ready to go to work as soon as the word was given. And give the word he did, for he immediately began to hand each Elf a shoe order. As soon as he gave an Elf direction, that Elf began to pound away and complete his individual piece of work. Soon the Shoemaker had handed out every single order to an Elf.
The sound of the hammers was deafening, but in only a very brief amount of time, every single order had been automagically fulfilled. It was nothing short of a miracle for the Shoemaker, who vowed he would employ those hard-working Elves
at every given opportunity from now on. Never would he do all of the
work himself in serial when performing it in parallel was an option.
Needless to say, he ran every other Shoemaker in town out of business
in short order, and, of course, lived happily ever after.
Armed with this inspiration
and having firmly etched it into my mind through Combinatory Play (and
a few brews in the airport pub), I arrived back at my office on Monday
ready to execute an experiment to see just how much faster it is to
make shoes with the help of Elves. I told my fellow programmers about
the epiphany that was given me, spawning some debate over whether or
not the work would actually occur in less time or if it would just appear
to be less time because of the fact that once a process is handed off
to an Elf it is completely out of sight. So I came up with an experiment
that I felt would answer that very question, and executed it.
The Experiment
The experiment consisted of
inserting the contents of a text file into a database table, first letting
the Shoemaker do all of the work himself one record at a time, and then
letting the Shoemaker's only job be to hand off each record to an Elf
for insertion. After the insertions were complete, I subtracted the
time that the first insert occurred (noted by a datetime field in the
target table) from the time that the last insert occurred, to see which
method required the lesser amount of time to complete.
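The elapsed-time measurement described above comes down to a single dateDiff call. Here is a minimal sketch of it, using two made-up timestamps (not values from my test table) in place of the earliest and latest inserttime:

```cfml
<!--- Hypothetical timestamps standing in for min(inserttime) and max(inserttime) --->
<cfset firstInsert = createDateTime(2006, 3, 13, 9, 0, 0)>
<cfset lastInsert = createDateTime(2006, 3, 13, 9, 0, 21)>
<!--- dateDiff("s", ...) returns the whole seconds from the first date to the second --->
<cfset elapsed = dateDiff("s", firstInsert, lastInsert)>
<cfoutput>total time: #elapsed# seconds</cfoutput>
```

Because dateDiff counts whole seconds, a single run is only accurate to the second, which is one reason to average several runs rather than trust any one of them.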
My incoming file consisted
of 256 records, each with 3 varchar fields. Two additional fields were
also inserted: a datetime stamp set within the insert statement, and
an integer indicating which method had performed the insertion (1 for
synchronous, 2 for asynchronous).
Here is the code I used to read in the data:
Listing 1: asynctest.cfm: Code to Read in Data
<!--- read in our data file --->
<cffile ACTION="read" FILE="#expandPath('./testfile.txt')#" VARIABLE="incoming">
<!--- treating the file like a list...use the following linefeed/carriage return chars as delimiters --->
<cfset mydelim = chr(10) & chr(13)>
<!--- grab number of lines in file and display for informational purposes --->
<cfset filelen = listlen(incoming, mydelim)>
<cfoutput>lines: #filelen#</cfoutput>
<!--- put the data file into an array --->
<cfset thisdata = ListToArray(incoming, mydelim)>
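To see what the list functions in Listing 1 (and the field-splitting calls used later) actually do with a file, here is a small self-contained sketch. The two sample records are hypothetical; the only thing assumed about the real test file is that each line holds three comma-separated fields, which is what the default list delimiter implies.

```cfml
<!--- Hypothetical two-record file with Windows line endings and a blank line between records --->
<cfset crlf = chr(13) & chr(10)>
<cfset sample = "G1,C1,First description" & crlf & crlf & "G2,C2,Second description">
<!--- Each character in the delimiter string is its own delimiter, and empty elements are skipped --->
<cfset records = listToArray(sample, chr(10) & chr(13))>
<cfoutput>
records: #arrayLen(records)#<br>
group: #listFirst(records[1])#<br>
code: #listGetAt(records[1], 2)#<br>
description: #listLast(records[1])#
</cfoutput>
```

The blank line does not produce an empty record, because ColdFusion list functions ignore empty elements. The same behavior is worth keeping in mind for the fields themselves: a record with an empty middle field, or a description containing a comma, would not split into the three values the insert expects.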
The template used (asynctest.cfm)
contained both the synchronous processing and the asynchronous process
handoff, separated by a cfswitch whose branch I selected with the test
parameter in the URL.
Listing 2: asynctest.cfm: CFSwitch statement
<!--- data put into array. Start the test (a testtype value of 1 indicates linear processing, 2 indicates async) --->
<cfswitch expression="#url.test#">
<cfcase value="batch"><!--- perform inserts the old fashioned way --->
<!--- first purge out existing records --->
<cfquery name="qryDelete" datasource="#dsn#">
delete from TestTable where testtype = 1
</cfquery>
<!--- loop over the data array and perform an insert for each item --->
<cfloop from="1" to="#arraylen(thisdata)#" index="j">
<cfquery name="qryInsertRec" datasource="#dsn#">
insert into TestTable (txt_groupid,txt_code,txt_description,inserttime,testtype)
VALUES (
<cfqueryparam value="#listfirst(thisdata[j])#" CFSQLTYPE="CF_SQL_VARCHAR">,
<cfqueryparam value="#listgetat(thisdata[j],2)#" CFSQLTYPE="CF_SQL_VARCHAR">,
<cfqueryparam value="#listlast(thisdata[j])#" CFSQLTYPE="CF_SQL_VARCHAR">,
<cfqueryparam value="#now()#" CFSQLTYPE="CF_SQL_TIMESTAMP">,
<cfqueryparam value="1" CFSQLTYPE="CF_SQL_TINYINT">
)
</cfquery>
</cfloop>
<!--- retrieve the max and min times so we can calculate total insertion time --->
<cfquery name="qryGetMaxMin" datasource="#dsn#">
select max(inserttime) as maxtime,min(inserttime) as mintime from TestTable where testtype=1
</cfquery>
<cfoutput>total time for linear insertions: #datediff("s",qryGetMaxMin.mintime,qryGetMaxMin.maxtime)# seconds</cfoutput>
</cfcase>
<cfcase value="async"><!--- perform inserts via gateways --->
<!--- purge out existing data --->
<cfquery name="qryDelete" datasource="#dsn#">
delete from TestTable where testtype = 2
</cfquery>
<!--- loop over data array and call the gateway for each item, passing in the data --->
<!--- set up the structure we'll be handing over to the gateway --->
<cfset stData = structnew()>
<cfset stData.dsn = dsn>
<cfloop from="1" to="#arraylen(thisdata)#" index="j">
<cfscript>
stData.theData=thisdata[j];
sendGatewayMessage(gateway, stData);
</cfscript>
</cfloop>
<!--- To give the gateways time to complete their work, call this page
again in a separate request with test=checkAsyncTime to see the total
insertion time for the gateways --->
</cfcase>
<cfcase value="checkAsyncTime"><!--- go back and see how long it took to perform all inserts via gateway... --->
<cfquery name="qryGetMaxMin" datasource="#dsn#">
select max(inserttime) as maxtime,min(inserttime) as mintime from TestTable where testtype=2
</cfquery>
<cfoutput>total time for asynchronous insertions: #datediff("s",qryGetMaxMin.mintime,qryGetMaxMin.maxtime)# seconds</cfoutput>
</cfcase>
</cfswitch>
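Listing 2 relies on two variables, dsn and gateway, that are set before the cfswitch and are not shown in the listing. A minimal sketch of that setup might look like the following; both values are hypothetical placeholders, and the cfparam default is my own suggestion rather than part of the original template.

```cfml
<!--- Hypothetical setup; neither value appears in the original listings --->
<!--- Name of the ColdFusion datasource that points at the test database --->
<cfset dsn = "asyncTestDSN">
<!--- Gateway ID of the CFML gateway instance registered in the CF Administrator --->
<cfset gateway = "asyncTestGateway">
<!--- Default to the read-only branch so a request without ?test= neither errors nor deletes data --->
<cfparam name="url.test" default="checkAsyncTime">
```

The gateway ID must match the instance name given in the Administrator exactly, since sendGatewayMessage uses it to find the instance that will receive the message.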
The actual Elf itself (or asynchronous
process code) lived in a CFC that had been associated with a ColdFusion
Gateway that I set up within the CF Administrator. Both methods were
given the responsibility for splitting the incoming record into individual
field values before performing their insertions, so that work was also
part of their total insertion time. The backend database was Sybase,
being accessed by CF via a datasource configured using a system ODBC
connection instead of a native Sybase ColdFusion driver.
Listing 3: aTest.cfc:
<CFCOMPONENT>
<CFFUNCTION ACCESS="public" NAME="onIncomingMessage" OUTPUT="false">
<CFARGUMENT NAME="CFEvent" TYPE="struct" REQUIRED="yes">
<cftry>
<cfquery name="qryInsertRec" datasource="#CFEvent.Data.dsn#" DBTYPE="ODBC">
insert into TestTable (txt_groupid,txt_code,txt_description,inserttime,testtype)
VALUES (
<cfqueryparam value="#listfirst(CFEvent.Data.theData)#" CFSQLTYPE="CF_SQL_VARCHAR">,
<cfqueryparam value="#listgetat(CFEvent.Data.theData,2)#" CFSQLTYPE="CF_SQL_VARCHAR">,
<cfqueryparam value="#listlast(CFEvent.Data.theData)#" CFSQLTYPE="CF_SQL_VARCHAR">,
<cfqueryparam value="#now()#" CFSQLTYPE="CF_SQL_TIMESTAMP">,
<cfqueryparam value="2" CFSQLTYPE="CF_SQL_TINYINT">
)
</cfquery>
<cfcatch>
<cfmail TO="dougboude@gmail.com" FROM="gateway test" SUBJECT="Candygram for Mongo">
#cfcatch.type#<br>#cfcatch.message#<br>#cfcatch.detail#
</cfmail>
</cfcatch>
</cftry>
</CFFUNCTION>
</CFCOMPONENT>
I performed three runs of each
method, and found the results to be much more in favor of the asynchronous
method than I had expected. What took an average of 20.7 seconds
to complete using the standard method took only 1.7 seconds when handing
each insert to the gateway. That's roughly twelve times faster (20.7 / 1.7
is about 12.2), so needless to say, I'm a fan of utilizing the Elves now
whenever and wherever I can find a creative place to do so.
(To see the experiment in action,
download my files from
http://www.fusionauthority.com/alert/161/async.zip
and save them in the same directory. Execute the tests at http://yourserver/asynctest.cfm?test=VALUE, where VALUE is 'batch' (linear processing), 'async' (gateway
processing), or 'checkAsyncTime' (to retrieve the time results for the
last async processing test).)
Employing Those Elves
There's always a competing shoemaker in town, so you are probably thinking of at least two places
in your application where an Elf or two or three would be just the ticket.
Before you start delegating functionality out to ColdFusion gateways,
take a few moments to consider a couple of "gotchas."
Oh yes, there can be gotchas when working with Elves (besides the fact
that they're union).
The first and most important
prerequisite for using asynchronous gateways is that your ColdFusion
server must be MX7 Enterprise; MX7 Standard, as well as previous versions,
do not have built-in support for gateways. The second point that
must be considered before deciding what and how much work to delegate
out to a gateway is the 'beefiness' of your server. While we can sometimes
mistake our servers for the latest CRAY prototypes, they do have their
limits, and one can quickly find out what those limits are if one isn't
careful. For a very insightful article to help add balance to your zeal,
read Sean Corfield's blog post,
http://www.corfield.org/blog/index.cfm/do/blog.entry/entry/Asynchronous_Development__How_Much_Parallelism.
The Moral of the Story
Although every speaker at the
cf.Objective() conference had different topics, different styles, and
different approaches, I was able to glean one thing from them all: to
be creative in my coding. Think outside the box, make the time to experiment
mentally and physically with ColdFusion, and let my creativity manifest
itself in what I produce. By their examples, I saw that I should always
strive for elegance and efficiency in my code. It was inspiring, to
say the least. I leave you with a bone to chew on: How could one possibly
re-create or simulate asynchronous processing using a version of ColdFusion
other than MX7? Let that question be fodder for your creativity and
experimentation next time you have a few minutes to "play".
Further Reading on Asynchronous Gateways:
"The Asynchronous CFML Gateway" by Matt Woodward
Adobe Asynchronous Gateway LiveDocs article
"Asynchronous Development - Things to Consider" by Sean Corfield
"Understanding Asynchronous Processing" by Ben Forta
Having spent four years disarming bombs for the Air Force, Doug Boude is now a
Senior Web Application Architect for Fiserv Health in San Antonio, TX. He has been developing with ColdFusion since version 4.0.