.Net ramblings
# Thursday, 29 December 2005
Sending files in chunks with MTOM Web Services and .NET 2.0
Just posting my CodeProject article on HTTP chunking web services with MTOM here for reference. By the way, it has been updated on CodeProject...

[Screenshot: the Windows Forms client uploading a file]

Introduction

In trying to keep up to speed with .NET 2.0, I decided to do a .NET 2.0 version of my CodeProject article "DIME Buffered Upload", which used the DIME standard to transfer binary data over web services. The DIME approach was reasonably efficient, but the code was quite complex, and I was keen to explore what .NET 2.0 has to offer. In this article, I use version 3.0 of WSE (Web Services Enhancements), available for .NET 2.0 as an add-in, to provide a simpler and faster method of sending binary data in small chunks over HTTP web services.

Background

Just a recap on why you might need to send data in small chunks at all. If you have a large file to send across a web service, you need to understand how IIS, .NET, and the web service call fit together. The file is sent as an array of bytes, as a parameter to a web service call, and the whole thing goes to the IIS web server as a single request. This is bad if the file is larger than the configured maxRequestLength of your application, or if the request causes an IIS timeout. It is also bad from the point of view of giving the user interface feedback on the transfer, because you have no indication of how the transfer is going until it has either completed or failed. The solution outlined here is to send the chunks of the file one by one and append them to the file on the server.

An MD5 hash of the file is calculated on both the client and the server, to verify that the file received is identical to the file sent.

Both the upload and the download code are included in this article.

Adventures with MTOM

MTOM stands for SOAP "Message Transmission Optimization Mechanism", and it is a W3C standard. To use it (and to run this application), you must download and install WSE 3.0, which includes MTOM support for the first time. If you look in the app.config and web.config files in the source code, you will see sections referring to the WSE 3.0 assembly, and an mtom clientMode or serverMode setting inside the messaging element. These are necessary to run MTOM in the application.
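For reference, the relevant config sections look roughly like the sketch below. This is written from memory rather than copied from the project, so treat the section declaration and attribute values as assumptions and check the configs in the source download for the exact text:

<configuration>
  <configSections>
    <section name="microsoft.web.services3" type="Microsoft.Web.Services3.Configuration.WebServicesConfiguration, Microsoft.Web.Services3, Version=3.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" />
  </configSections>
  <microsoft.web.services3>
    <messaging>
      <!-- clientMode="On" goes in the client's app.config;
           serverMode="optional" (or "always") goes in the server's web.config -->
      <mtom clientMode="On" serverMode="optional" />
    </messaging>
  </microsoft.web.services3>
</configuration>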

The problem with DIME was that the binary content of the message was sent outside the SOAP envelope of the XML message. This meant that although your message might be secured, the DIME attachments were not. MTOM fully complies with the other WS-* specifications (like WS-Security), so the entire message is secure.

It took me a while to realise that when MTOM is turned on for the client and the server, WSE automatically handles the binary encoding of the data in the web service message. With DIME and WSE 2.0, you had to configure your app for DIME and then use DimeAttachments in your code. This is no longer necessary: you just send your byte[] as a parameter or return value, and WSE makes sure it is sent as binary, rather than bloated by base64 encoding in the XML serialization, as it would be in the absence of DIME or MTOM.

How it works

The web service has two main methods: AppendChunk is for uploading a file to the server, and DownloadChunk is for downloading from the server. These methods take parameters for the file name, the offset of the chunk, and the size of the buffer being sent/received.
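To make that concrete, here is a rough sketch of what the two methods look like on the server. The class name, the UploadFolder constant, and the parameter names are my reconstruction from the client call shown further down, not the project's exact code:

using System;
using System.IO;
using System.Web.Services;

public class FileTransferService : WebService
{
    // hypothetical upload folder; the real project would read this from config
    private const string UploadFolder = @"C:\Uploads";

    [WebMethod]
    public void AppendChunk(string FileName, byte[] Buffer, long Offset, int BufferSize)
    {
        // Path.GetFileName guards against path tricks in the file name
        string path = Path.Combine(UploadFolder, Path.GetFileName(FileName));
        // OpenOrCreate: the first chunk creates the file, later chunks extend it
        using (FileStream fs = new FileStream(path, FileMode.OpenOrCreate, FileAccess.Write))
        {
            fs.Seek(Offset, SeekOrigin.Begin);
            fs.Write(Buffer, 0, BufferSize);
        }
    }

    [WebMethod]
    public byte[] DownloadChunk(string FileName, long Offset, int BufferSize)
    {
        string path = Path.Combine(UploadFolder, Path.GetFileName(FileName));
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Seek(Offset, SeekOrigin.Begin);
            byte[] chunk = new byte[BufferSize];
            int read = fs.Read(chunk, 0, BufferSize);
            if (read < BufferSize)
            {
                // last chunk: trim the array so MTOM only sends the real bytes
                byte[] trimmed = new byte[read];
                Array.Copy(chunk, trimmed, read);
                return trimmed;
            }
            return chunk;
        }
    }
}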

The Windows Forms client application uploads a file by sending all the chunks one after the other using AppendChunk, until the file has been completely sent. It can then do an MD5 hash on the local file and compare it with the hash of the file on the server, to make sure the contents of the two files are identical. The download code is very similar; the main difference is that the client must find out from the server how big the file is, so that it knows when to stop requesting chunks.

A simplified version of the upload code is shown below (from the WinForms client). Have a look at the code in Form1.cs for the inline comments and a fuller explanation. Essentially, a file stream is opened on the client for the duration of the transfer. The first chunk is read into the Buffer byte array, and the while loop keeps running until FileStream.Read() returns 0, i.e. the end of the file has been reached. On each iteration, the buffer is sent directly to the web service as a byte[]. The SentBytes variable is used to report progress to the form.

using (FileStream fs = new FileStream(LocalFilePath, FileMode.Open, FileAccess.Read))
{
    byte[] Buffer = new byte[ChunkSize];
    int BytesRead = fs.Read(Buffer, 0, ChunkSize);
    while (BytesRead > 0 && !worker.CancellationPending)
    {
        // send this chunk, telling the server where it goes in the file
        ws.AppendChunk(FileName, Buffer, SentBytes, BytesRead);
        SentBytes += BytesRead;
        // read the next chunk; Read() returns 0 at the end of the file
        BytesRead = fs.Read(Buffer, 0, ChunkSize);
    }
}
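The download loop in the client mirrors this. Below is a rough sketch of its shape; GetFileSize is a hypothetical name standing in for however the client asks the server for the file length, so don't take it as the project's actual method:

// find out how big the remote file is (hypothetical helper; see above)
long FileSize = ws.GetFileSize(FileName);
long ReceivedBytes = 0;
using (FileStream fs = new FileStream(LocalFilePath, FileMode.Create, FileAccess.Write))
{
    while (ReceivedBytes < FileSize && !worker.CancellationPending)
    {
        // ask the server for the next chunk and append it to the local file
        byte[] Buffer = ws.DownloadChunk(FileName, ReceivedBytes, ChunkSize);
        if (Buffer.Length == 0)
            break; // safety net: the server had nothing more to send
        fs.Write(Buffer, 0, Buffer.Length);
        ReceivedBytes += Buffer.Length;
    }
}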

Example of the BackgroundWorker class in .NET 2.0

.NET 2.0 has a great new class called BackgroundWorker to simplify running tasks asynchronously. Although this application sends the file in small chunks, even those small chunks would block the WinForms application and make it appear to have hung during the transfer, so the web service calls still need to be made asynchronously. The BackgroundWorker class works using an event model, where you have code sections to run for DoWork (when you start), ProgressChanged (to update your progress bar / status bar), and RunWorkerCompleted (which fires whether the work completed or failed). You can pass parameters to the DoWork method, which you could not do with the Thread class in .NET 1.1 (I know you could with delegates, but delegates aren't great for thread control). You can also access the return value of DoWork in the RunWorkerCompleted event handler. So for once, MS has thought of everything and made a very clean threading model. Exceptions are handled internally, and you can access them in the RunWorkerCompleted handler via the RunWorkerCompletedEventArgs.Error property.

The code shown below is an example of the ProgressChanged event handler:

private void workerUpload_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
    // update the progress bar and status bar text;
    // the summary text is sent in the UserState parameter
    this.toolStripProgressBar1.Value = e.ProgressPercentage;
    this.statusText.Text = e.UserState.ToString();
}
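To show where that handler fits, here is a minimal sketch of how the upload worker might be wired up and used. The field names mirror the handler above, but the button handler, the "Upload complete" message, and the exact wiring are illustrative assumptions rather than the project's code:

using System;
using System.ComponentModel;
using System.Windows.Forms;

public partial class Form1 : Form
{
    private BackgroundWorker workerUpload = new BackgroundWorker();

    public Form1()
    {
        InitializeComponent();
        workerUpload.WorkerReportsProgress = true;       // enables ProgressChanged
        workerUpload.WorkerSupportsCancellation = true;  // enables CancellationPending
        workerUpload.DoWork += new DoWorkEventHandler(workerUpload_DoWork);
        workerUpload.ProgressChanged += new ProgressChangedEventHandler(workerUpload_ProgressChanged);
        workerUpload.RunWorkerCompleted += new RunWorkerCompletedEventHandler(workerUpload_RunWorkerCompleted);
    }

    private void btnUpload_Click(object sender, EventArgs e)
    {
        // the argument to RunWorkerAsync is how parameters get into DoWork
        workerUpload.RunWorkerAsync(this.LocalFilePath);
    }

    private void workerUpload_DoWork(object sender, DoWorkEventArgs e)
    {
        string filePath = (string)e.Argument;
        // ... the chunked upload loop goes here, calling
        // workerUpload.ReportProgress(percent, statusText) as it goes ...
        e.Result = "Upload complete"; // picked up in RunWorkerCompleted
    }

    private void workerUpload_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
    {
        if (e.Error != null)
            this.statusText.Text = e.Error.Message; // worker exceptions surface here
        else
            this.statusText.Text = e.Result.ToString();
    }
}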

I have used four BackgroundWorker objects in the application:

  • one to manage the upload process,
  • one to manage the download process,
  • another to calculate the local MD5 file hash in parallel while waiting for the server result,
  • and another to download the list of files in the Upload server folder to allow the user to select a file to download.

I use a separate BackgroundWorker object for each task because the code for each task is tied to the events of that particular object.

A good example of Thread.Join()

When the upload or download is complete, the client asks for an MD5 hash of the file on the server, so it can compare it with the hash of the local file and make sure the two are identical. I originally did these two steps in sequence, but it can take a few seconds to calculate the hash for a large file (anything over a few hundred MB), so the application was waiting five seconds for the server to calculate its hash, and then five more seconds for the client to calculate its own. This made no sense, so I implemented a multi-threaded approach to let them run in parallel: while the client is waiting on the server, it calculates its own file hash. This is done with the Thread class and its Join() method, which blocks the calling thread until the other thread has finished.

The code below shows how this is accomplished:

// start calculating the local hash (stored in a class variable)
this.hashThread = new Thread(new ThreadStart(this.CheckFileHash));
this.hashThread.Start();

// request the server hash
string ServerFileHash = ws.CheckFileHash(FileName);

// wait for the local hash to complete
this.hashThread.Join();

if (this.LocalFileHash == ServerFileHash)
    e.Result = "Hashes match exactly";
else
    e.Result = "Hashes do not match";

There is a good chance that the two operations will finish at approximately the same time, so very little waiting around will actually happen.
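For completeness, here is a sketch of what the client's CheckFileHash method might look like. The real method in Form1.cs may differ in details such as how the hash is formatted as a string, so treat this as an illustration:

// requires: using System.IO; using System.Security.Cryptography;
private void CheckFileHash()
{
    using (FileStream fs = new FileStream(LocalFilePath, FileMode.Open, FileAccess.Read))
    using (MD5 md5 = MD5.Create())
    {
        // hash the whole file and store the result in the class variable
        // that gets compared against the server's hash
        byte[] hash = md5.ComputeHash(fs);
        this.LocalFileHash = BitConverter.ToString(hash).Replace("-", "");
    }
}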

Performance compared with DIME

I found that MTOM was about 10% faster than DIME in my limited testing. This is probably because each chunk no longer needs to be packaged up into a DIME attachment. I was able to upload files several gigabytes in size without problems.

Obviously, there is an overhead to all this business of reading file chunks and appending them, so the larger the chunk size, the more efficient your application will be. It should be customised based on the network and the expected size of files. For very small files, there is no harm in using small chunk sizes (e.g. 32 KB), because this gives accurate and regular feedback to the user interface. For very large files on a fast network, consider using 4000 KB to make good use of the bandwidth and reduce the file I/O overhead. If you want to send chunks larger than 4 MB, you must increase the ASP.NET maximum request size limit in your web.config, as shown below.
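For example, something along these lines in web.config raises the limit (the value is in KB; the default is 4096, i.e. 4 MB — check the attribute against your own config):

<system.web>
  <!-- maxRequestLength is specified in KB: 8192 allows chunks up to ~8 MB -->
  <httpRuntime maxRequestLength="8192" />
</system.web>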

Conclusions

Feel free to use this code and modify it as you please. Please post a comment for any bugs, suggestions, or improvements. Enjoy!
