.Net ramblings
# Sunday, 05 December 2004
An intelligent 404 page, suggest pages with a similar name to the requested page

 

Introduction

Have you ever noticed that on big sites like microsoft.com, if you reach a page that doesn't exist, they don't just say "Sorry, 404"? They give you a list of pages that are similar to the one you requested. This is obviously very nice for your users to have, and it's easy enough to integrate into your site. This article provides source code and explains the algorithm behind this feature. Note: the real benefit of the approach outlined here is the semi-intelligent string comparison.

Background

The need for this grew out of a client of mine who was changing content management systems. Every url in the site changed, so all the search engine results led to 404 pages. This was obviously a big inconvenience, so I put this together to help users find their way through the new site when arriving from a search engine.

See it in action

Go to the following page http://www.iserc.ie/December15-ISERCWorkshoponTesting.html (which doesn't exist), and the 404 page should give you a list of pages that have quite similar names.

Requirements

  • Your web site must be set up so that 404 errors get redirected to a .net aspx page.
  • You must have some way of getting an array of all the page urls in your site that you want to compare 404 requests against. If you have a content management system, there is probably a structure of all the pages stored in xml or a javascript array (for DHTML menus or something), or you could write your own query to get the pages from a database. If you don't use a content management system, you could hard-code a string array variable in the 404 page code-behind containing the page names, or think up some way of dynamically reading all the .aspx or .html pages from the file system.
  • When the 404 page is accessed, you need to know which page was requested. Using web.config, you can set up 404 error codes to go to /404.aspx, and the requested page gets tagged on to the querystring. The source code here assumes this approach, but you can obviously adapt it to your own needs by changing the GetRequestedUrl() function.
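For the "read the pages from the file system" option in the second requirement, a rough sketch could look like the following. The class and method names are made up for illustration, and it assumes .NET 2.0 or later (SearchOption is not available in 1.1).

```csharp
using System;
using System.Collections;
using System.IO;

public static class SitePages
{
    // Returns the lower-cased page names (no folder, no extension) of every
    // .aspx and .html file found under siteRoot, including subfolders.
    public static ArrayList ReadPageNames(string siteRoot)
    {
        ArrayList names = new ArrayList();
        foreach (string pattern in new string[] { "*.aspx", "*.html" })
        {
            foreach (string file in Directory.GetFiles(siteRoot, pattern, SearchOption.AllDirectories))
            {
                names.Add(Path.GetFileNameWithoutExtension(file).ToLower());
            }
        }
        return names;
    }
}
```

In the 404 page you could then populate the list with something like SitePages.ReadPageNames(Server.MapPath("~/")). Note that the folder part is dropped, so two pages with the same name in different folders end up as identical entries.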

Why Regular Expressions are not enough

To compare strings, you can use System.String.IndexOf or regular expressions to match similarities, but these methods are very unforgiving of slight discrepancies in the string. In the example url above, the page name is December15-ISERCWorkshoponTesting.html, but under the new content management system the url is December 15 - ISERC Workshop - Software Testing.html, which is different enough to make traditional string comparison techniques fall down.

So I looked around for a fuzzy string compare routine, and came across an algorithm devised by Vladimir Levenshtein. His algorithm figures out how different 2 strings are, based on how many character insertions, deletions and substitutions are necessary to change one string into the other. This is called the 'edit distance', i.e. how far you have to go to make 2 strings match. This is very useful because it takes into account slight differences in spacing, punctuation and spelling. I found this algorithm on http://www.merriampark.com/ld.htm where Lasse Johansen kindly ported it to C#. The algorithm is explained at that site, and it is well worth a read to see how it is done.
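To give a feel for how the edit distance is computed, here is a minimal sketch of the textbook dynamic-programming version. It is not Lasse Johansen's port, and the class and method names are made up for illustration.

```csharp
using System;

public static class EditDistanceSketch
{
    // Number of single-character insertions, deletions and substitutions
    // needed to turn a into b (classic dynamic-programming table).
    public static int Compute(string a, string b)
    {
        int[,] d = new int[a.Length + 1, b.Length + 1];

        // Turning a prefix into the empty string (or back) costs its length.
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                int substitution = d[i - 1, j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                d[i, j] = Math.Min(substitution, Math.Min(deletion, insertion));
            }
        }
        return d[a.Length, b.Length];
    }
}
```

The table uses (a.Length + 1) x (b.Length + 1) integers, which is fine for page names but worth keeping in mind if you compare very long strings.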

Normalising the Scores

I originally had a problem with the algorithm because it gave surprising results in certain situations. If the 404 page request was for 'hello', and there is a valid page called 'hello_new_version' and another valid page called 'abcde', then the 'abcde' page gets a better score, because fewer changes are needed to make it the same as 'hello': just change the 5 characters in 'abcde' into 'hello'. That is only 5 changes, against 12 insertions for 'hello_new_version', even though 'hello_new_version' is semantically a better match. Fortunately, a kind newsgroup participant named Patrice suggested that I divide the score by the length of the comparison string, to normalise the results. This worked perfectly, and I found that a score between 0 (perfect match) and 0.6 (a good match) is worth including as a suggested page. You can change this value in the ComputeResults() method if you want to make it more or less flexible.
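To make the normalisation concrete, here is a tiny sketch using the 'hello' example. The class and method names are hypothetical, and the edit distances (5 and 12) are worked out by hand rather than taken from the article's source.

```csharp
using System;

public static class ScoreSketch
{
    // Mean edit distance per letter of the candidate page name:
    // 0 is a perfect match, larger values are worse matches.
    // The candidate must not be empty (the page code skips empty names).
    public static double Normalise(int editDistance, string candidate)
    {
        return (double)editDistance / candidate.Length;
    }
}
```

Normalise(5, "abcde") gives 5 / 5 = 1.0, while Normalise(12, "hello_new_version") gives 12 / 17, roughly 0.71, so 'hello_new_version' now ranks ahead of 'abcde'. Note that 0.71 is still above the 0.6 cut-off, so for a case this extreme you would have to loosen the threshold to actually list it.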

Code Summary

private void Page_Load(object sender, System.EventArgs e)
{
   GetRequestedUrl();
   SetUpSiteUrls();
   ComputeResults();
   BindList();
} 

The above code shows the 4 key tasks that make up this solution. Each method is explained below.

Using the code

  1. GetRequestedUrl() simply figures out which page was requested. This example assumes that your web.config sends 404 errors to 404.aspx, so the querystring on the 404.aspx page contains the requested url.
  2. SetUpSiteUrls() is where you load in all the pages in your site. In my content management system, I have an xml file with all the names, so I do an XPath query and add the names one by one to the arraylist.
    private void SetUpSiteUrls()
    {
      this.validUrls = new ArrayList(); 
      /*
      * Insert code here to add the pages in your site to this arraylist
      */ 
    }
  3. ComputeResults() iterates through the urls you set up in SetUpSiteUrls() and attaches a score of how close each one is to the requested url. It also sorts the results and discards any that are not a close match.
    private void ComputeResults()
    {
        // results is a class-level ArrayList so that BindList() can read it
        this.results = new ArrayList();
        // build up an arraylist of the positive results
        foreach(string s in validUrls)
        {
            // don't waste time calculating the edit distance of nothing with something
            if(s == "") continue;
            double distance = Levenshtein.CalcEditDistance(s, this.requestedUrl); // both in lower case
            // the algorithm always returns a value >= 0; anything between 0.0 and 0.6 is a good match
            double meanDistancePerLetter = (distance / s.Length);
            if(meanDistancePerLetter <= 0.60D)
            {
                // use dictionary entries because we want to store the score and the hyperlink.
                // can't use sortedlist because it doesn't allow duplicate keys and 2 hyperlinks
                // can have the same edit distance.
                results.Add(new DictionaryEntry(meanDistancePerLetter, "<a href='" + s + ".html'>" + s + "</a>"));
            }
        }
        results.Sort(new ArrayListKeyAscend());
    }
    IMPORTANT NOTE: One thing to definitely look out for is the inner-most line of the above code, results.Add(new DictionaryEntry(...)). I am adding in an html hyperlink, with the name of the page + ".html". This may not be a correct link in your web site, because you may have removed the folder part of the url while populating the validUrls arraylist. You may need to expand the data structures used in this code to include the full url for each page.

  4. BindList() simply binds the arraylist of results to the DataList, which is configured to display them in a bulleted list.
    private void BindList()
    {
        // the requested url comes from the querystring, so encode it before displaying it
        string safeUrl = Server.HtmlEncode(this.requestedUrl);
        if(results.Count > 0)
        {
            this.lblHeader.Text = "The following pages have similar names to " + safeUrl;
            this.DataList1.DataSource = results;
            this.DataList1.DataBind();
        }
        else
        {
            this.lblHeader.Text = "Unable to find any pages in this site that have similar names to " + safeUrl;
        }
    }
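The body of GetRequestedUrl() is not shown in the steps above, so here is a rough sketch of what step 1 could look like. It assumes the ASP.NET customErrors redirect, which appends the original path as an aspxerrorpath querystring value (for example /404.aspx?aspxerrorpath=/folder/page.aspx). The class and method names are hypothetical, and if your server passes the url differently the parsing will need to change.

```csharp
using System;

public static class RequestedPageSketch
{
    // Pulls the page name out of a querystring such as
    // "?aspxerrorpath=/news/December15-ISERCWorkshoponTesting.html".
    // Returns the lower-cased name without folder or extension,
    // or an empty string if the key is not present.
    public static string FromQueryString(string query)
    {
        const string key = "aspxerrorpath=";
        foreach (string pair in query.TrimStart('?').Split('&'))
        {
            if (pair.StartsWith(key, StringComparison.OrdinalIgnoreCase))
            {
                string path = Uri.UnescapeDataString(pair.Substring(key.Length));
                string file = path.Substring(path.LastIndexOf('/') + 1);
                int dot = file.LastIndexOf('.');
                if (dot > 0) file = file.Substring(0, dot);
                return file.ToLower();
            }
        }
        return "";
    }
}
```

Inside the page this could be called as this.requestedUrl = RequestedPageSketch.FromQueryString(Request.Url.Query); the lower-casing matches the "both in lower case" comment in ComputeResults().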

The 'magic' in the code is all done with the Levenshtein.CalcEditDistance method which returns the distance between 2 strings. It is included in the source.

Winforms Test Application

If you're interested in testing out the Levenshtein algorithm, I've written a windows forms application that lets you enter a string (e.g. a page url) and a list of strings to compare it against (e.g. all the page urls in your site), and it gives you the 'edit distance' scores. Download here (7k)

Comments

I think this is a great feature because it adds significant value to the user experience of a web site. Please feel free to comment below if you have any questions, find any bugs, have improvements to suggest, can't get it working, or use it in a novel way.

Enjoy!


Sunday, 05 December 2004 15:15:28 (GMT Standard Time, UTC+00:00)  #    Comments [0]  Asp.Net

# Monday, 22 November 2004
How to disable new rows in Windows Forms DataGrid

I found a few workarounds to prevent a datagrid from displaying the * new row. One of them involved using a dataview as the datasource, with the AllowNew property set to false. However, someone called Sameers from theAngrycodeR pointed out that a datatable has a DefaultView property which also has this AllowNew property. So you can use the following code (if your datasource is a dataset):

this.dataSet1.Tables[0].DefaultView.AllowNew = false;

and you get to keep the dataset or datatable as the direct datasource.


Monday, 22 November 2004 17:05:24 (GMT Standard Time, UTC+00:00)  #    Comments [0]  .Net Windows Forms

Using custom datatypes in a .Net Dataset

Background

In my content management system, I allow the user to define their own 'objects' (e.g. Staff Member) and then I provide templated data entry forms to let them populate instances of these objects. It's aimed at non-techies so I have my own datatypes: 'Text' maps to System.String, 'Number' maps to System.Double etc. I also have a few custom data types called 'File' and 'Image' to allow the user to add files or images to an instance of the object.

Problem

This business of doing column-mapping was ok as long as my data types had obvious .Net equivalents, but 'Image' doesn't in my case. I'm only storing a reference to the image, but in my application it's not to be treated as just a System.String. When the user is creating a new object with an 'Image' field in it, I want to display a file upload instead of a textbox, and when I go to display the object on the site, I want to display an html IMG tag with the SRC set to the value of the image field.

Solution

The dataset is serialised into an xml file with the schema embedded. I needed to find some way of encoding my own custom data type information into the dataset that would persist into the xml file. I looked through the VS intellisense and found the 'ExtendedProperties' property on the data column. This property lets you attach any number of key/value pairs of information to each column. This was exactly what I needed, so I added a pair with something like "MyDataType=Image" for each column. This persisted nicely into the xml file as follows:

<xs:complexType>
  <xs:sequence>
     <xs:element name="Photo" msprop:MyDataType="Image" type="xs:string" minOccurs="0" />

Note that the official type of the field is "xs:string", because it contains a path to the image, but now it also has the custom data type tagged on to the column definition. In this respect, I'm glad to see that MS have provided a very elegant and flexible framework.
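For anyone who wants to try this, here is a minimal sketch of setting the extended property and round-tripping it through xml. The dataset, table and class names are made up for illustration; only the "Photo" column and the "MyDataType" key come from the schema fragment above.

```csharp
using System;
using System.Data;
using System.IO;

public static class CustomTypeSketch
{
    // Builds a one-column dataset whose 'Photo' column is a string
    // carrying a custom "MyDataType" extended property.
    public static DataSet Build()
    {
        DataSet ds = new DataSet("Objects");
        DataTable table = ds.Tables.Add("StaffMember");
        DataColumn photo = table.Columns.Add("Photo", typeof(string));
        photo.ExtendedProperties.Add("MyDataType", "Image");
        return ds;
    }

    // Serialises the dataset with its schema embedded.
    public static string ToXml(DataSet ds)
    {
        StringWriter writer = new StringWriter();
        ds.WriteXml(writer, XmlWriteMode.WriteSchema);
        return writer.ToString();
    }

    // Reads the custom type back after a round trip through xml.
    public static string ReadCustomType(string xml, string table, string column)
    {
        DataSet ds = new DataSet();
        ds.ReadXml(new StringReader(xml), XmlReadMode.ReadSchema);
        return (string)ds.Tables[table].Columns[column].ExtendedProperties["MyDataType"];
    }
}
```

The display code can then branch on the value returned by ReadCustomType (or read ExtendedProperties directly) to decide between a textbox, a file upload or an IMG tag.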


Monday, 22 November 2004 16:53:35 (GMT Standard Time, UTC+00:00)  #    Comments [0]  .Net General

# Thursday, 18 November 2004
System.Net.WebException: The request failed with HTTP status 401: Unauthorized.

I have a WSE2 web service that was working fine until one day I got the above error. I realised I had changed the permissions on the folder (for the web application), so that only System, Administrators and ASPNET had permissions on it. Previously the 'Users' group had permissions. By a process of elimination, I found out that IUSR_xx needed to have read/execute permissions as well as ASPNET, even though the process is running with the identity of ASPNET.

It's strange, but I thought I'd post my solution here in case anyone else comes across this.


Thursday, 18 November 2004 12:45:53 (GMT Standard Time, UTC+00:00)  #    Comments [5]  Asp.Net

# Monday, 08 November 2004
Asp.Net DataGrid PageIndexChanged not working

I have an asp.net custom server control (deriving from a datagrid) with built-in paging, sorting and databinding. I ran into a weird problem, and thanks to Rick Strahl's post on http://west-wind.com/weblog/posts/211.aspx I was able to get it working. Read that page first, because there are many solutions posted for this weird error.

I'm using regular paging, and the next button works fine (all the time, for multiple pages), but if I click previous then a dud postback happens and the PageIndexChanged event doesn't fire. I'm 100% sure that the event handler is hooked up; it just doesn't fire. The solution in my case was to bind the datagrid in the control's PreRender method instead of in the OnInit method. It worked fine after doing this.


Monday, 08 November 2004 18:33:01 (GMT Standard Time, UTC+00:00)  #    Comments [0]  Asp.Net

# Wednesday, 03 November 2004
Using the same WSE2 web service with 2 different policies

Did you know you can configure multiple policies for the same web service? It's possible because endpoint uris are case-sensitive, so you can have WebService1.asmx and WEBSERVICE1.asmx, which are treated as separate web services in the policyCache.config file. See the sample below:

<endpoint uri="http://localhost/winDB.asmx">
 <defaultOperation>
  <request policy="#username-token-signed" />
  <response policy="" />
  <fault policy="" />
 </defaultOperation>
</endpoint>

<endpoint uri="http://localhost/WINDB.asmx">
 <defaultOperation>
  <request policy="" />
  <response policy="" />
  <fault policy="" />
 </defaultOperation>
</endpoint>

The first one uses a username-token-signed policy for authentication. Clients who wish to use this policy must have a reference to the web service matching the case of the endpoint uri exactly.

The second endpoint has no policy enforcement, which means even a non-WSE request can use the web service.

Some WSE implementations (especially those with custom username tokens) will have a method like "checkAuth()" that every web method calls at the start to verify programmatically that the message obeys the rules. This method throws soap faults for any missing WSE elements in the message header. In my case, I want to allow requests originating from the web server itself (.aspx pages using the web methods) to bypass the authentication checks, so I put the following lines of code at the top of my "checkAuth()" method to allow requests made on the same server to go through:

// allow local ws requests to bypass security
if(HttpContext.Current.Request.ServerVariables["REMOTE_ADDR"].ToString() == "127.0.0.1")
  return;  // skip further checks

I could also invoke the web methods using the web service class directly (not going through a web service proxy) because it's within the same assembly, but I'm sure there are circumstances where this approach may prove useful. If you find any, post them here as a comment; I'd be interested to hear.
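One caveat with comparing REMOTE_ADDR to the literal "127.0.0.1" is that it misses the IPv6 loopback address (::1). A slightly more forgiving check could be sketched as below; the class and method names are hypothetical and it assumes .NET 2.0 or later for IPAddress.TryParse.

```csharp
using System;
using System.Net;

public static class LocalRequestSketch
{
    // True when remoteAddr parses as a loopback address
    // (127.0.0.1, anything else in 127.0.0.0/8, or IPv6 ::1).
    public static bool IsLoopback(string remoteAddr)
    {
        IPAddress address;
        return remoteAddr != null
            && IPAddress.TryParse(remoteAddr, out address)
            && IPAddress.IsLoopback(address);
    }
}
```

In checkAuth() this would be called with the REMOTE_ADDR server variable. Neither version catches a request the server makes to itself through its own network address, so the proxy url still has to point at localhost.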


Wednesday, 03 November 2004 17:43:16 (GMT Standard Time, UTC+00:00)  #    Comments [0]  Asp.Net

# Tuesday, 26 October 2004
TextBox.Focus() doesn't work within Form_Load

I have a windows form with a textbox inside a panel, and the following code doesn't work as expected:

private void WizardLoad(object sender, System.EventArgs e)
{
  this.txtUsername.Focus();
}

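The likely reason is that Focus() only works on a control that is already visible, and the form hasn't been shown yet when the Load event fires. Thanks to someone's post, I now know that the best way to do it is as follows: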

this.ActiveControl = this.txtUsername;

It works!


Tuesday, 26 October 2004 12:06:57 (GMT Daylight Time, UTC+01:00)  #    Comments [1]  .Net Windows Forms

# Sunday, 10 October 2004
SQL Server - Access denied messages using Server=(local) with named instances

I'm developing a .net application that uses a sql server database. I develop it on a desktop and a laptop, and I want to configure the app to use the local instance of sql server rather than the named instances LAPTOP\LAPTOP and DESKTOP\DESKTOP, because if the laptop is not on the network then it can't connect to any other db. I tried connecting with the following connection string:

Server=(local); initial catalog=MyDB; UID=sa; PWD=whatever; 

but it didn't work: "SQL Server does not exist or access denied". I found out after a lot of searching that this is because I had installed a 'named instance' of sql server rather than the 'default instance'. This is a tick box during the sql server install. I also found out that it's easy to change it back to a default instance. All you do is run the sql server install again and choose the default instance. You then have 2 sql server installations (they don't conflict). If you look in add/remove programs, there are 2 entries, one 'microsoft sql server' and another 'microsoft sql server (INSTANCE NAME)'. If you need to copy the database from the named instance to the default one, do that via a backup from the instance and a restore to the default instance. Then just uninstall the named instance from control panel > add/remove programs.

I also wasn't able to log in with "Server=localhost". I had to use "Server=(local)".
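An alternative that avoids the reinstall, which this post did not try: a named instance on the local machine can normally be addressed as (local)\InstanceName. The sketch below builds either form of connection string; the class and method names are made up, and you would still need the same instance name on both machines to share one setting.

```csharp
using System;

public static class LocalSqlSketch
{
    // Builds a connection string for SQL Server on the local machine.
    // Pass null or "" for the default instance, or the instance name
    // (e.g. "LAPTOP") for a named instance.
    public static string Build(string instanceName, string database, string user, string password)
    {
        string server = string.IsNullOrEmpty(instanceName)
            ? "(local)"
            : "(local)\\" + instanceName;
        return "Server=" + server + "; initial catalog=" + database
             + "; UID=" + user + "; PWD=" + password + ";";
    }
}
```

With no instance name this produces the same string as the one shown earlier in this post; with "LAPTOP" it produces a Server value of (local)\LAPTOP.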

 


Sunday, 10 October 2004 11:24:36 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  Database

# Sunday, 03 October 2004
Complications with textbox and Newline and Carriage Return characters

My windows application uses a multi-line textbox to allow the user to update text, which gets sent to and from a web service, and then into and out of a sql database.  Somewhere along the way, the \r\n characters turn into \n characters.  So when i re-load the string from the web service into the textbox, it gets displayed as block characters instead of proper line breaks.  I wrote the following simple function to work around this issue. 

It first removes all the \r characters, which has the effect of changing the \r\n sequences to \n. Then it replaces the \n characters with \r\n.

Note: The reason you can't just replace the \n chars with \r\n is because you would end up with \r\r\n chars if there were \r\n's in the string before you did the replace.

/// <summary>
/// This method formats a string for correct display in a multiline text box.
/// It normalises every line break (\n or \r\n) to \r\n.
/// </summary>
public static string FormatForMultiLineTextBox(string s)
{
    return s.Replace("\r","").Replace("\n","\r\n");
}
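To see why the two-step replace matters, here is a small side-by-side sketch. The class and method names are made up; TwoStepFix is the same expression as the function above.

```csharp
using System;

public static class NewlineSketch
{
    // Naive approach: turns an existing \r\n into \r\r\n.
    public static string NaiveFix(string s)
    {
        return s.Replace("\n", "\r\n");
    }

    // Two-step approach from the post: strip \r first, then expand \n.
    public static string TwoStepFix(string s)
    {
        return s.Replace("\r", "").Replace("\n", "\r\n");
    }
}
```

For the mixed input "a\r\nb\nc", NaiveFix returns "a\r\r\nb\r\nc" while TwoStepFix returns "a\r\nb\r\nc". The two-step version is also safe to run more than once on the same string.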

I will also take this opportunity to share a method I use to prevent errors caused by sending certain binary characters across a web service (at least in WSE 2). The method removes all binary characters that aren't useful (useful, that is, in terms of my app).

/// <summary>
/// This method removes all binary (control) characters below 0x20 except:
/// Horizontal Tab 0x09
/// New Line 0x0A
/// Carriage Return 0x0D
/// </summary>
public static string StripBinaryChars(string s)
{
    // requires: using System.Text.RegularExpressions;
    return Regex.Replace(s, @"[\x00-\x08\x0B-\x0C\x0E-\x1F]","");
}

See www.asciitable.com for more information on the hex codes for these binary characters.


Sunday, 03 October 2004 18:40:37 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  .Net Windows Forms

# Sunday, 19 September 2004
Problems using Visual Studio.Net and Visual Source Safe with Web Projects

If you get annoyed at Visual Source Safe's complications with asp.net web projects and their integration with Visual Studio.Net, an increasingly popular approach is to use 'class library' projects instead of web projects. (Of course, this problem will disappear with Whidbey, but I'm still using VS 2003.) A web project compiles to a dll / class library anyway, and with a class library VS doesn't have to interact with IIS to load the project. It isn't hard to set up, but the instructions need to be followed carefully. A guy called Fritz Onion prepared the content below:

This is copied/pasted here in case Fritz's link changes or disappears. The original content is available from:
http://www.pluralsight.com/fritz/Samples/aspdotnet_without_web_projects.htm

Reference prepared by Fritz Onion

The Web Project wizard in Visual Studio .NET is convenient for creating quick ASP.NET applications on your local machine, but in an effort to simplify your life, it also makes many decisions for you that are difficult to change if you need more flexibility. My biggest pet peeve with Web Projects is that you cannot even open a .sln file if the virtual directory mapping in IIS is not set up correctly. I also dislike the way it places .sln and .csproj | .vbproj files in a separate location from the actual source files (I understand that this is necessary to allow application creation directly on a server, but I never deploy that way).

As a result, most of my web projects are created as standard class library projects. Unfortunately this means that you don't get the nice Web component wizards (like WebForms and UserControls). However, with a little tweaking, you can have it all.  I have prepared this document describing how to enable these wizards in class library projects (thanks to Dan Sullivan for pointing out how to do this), as well as how to convert existing Web Projects to class library projects and still keep the nice integrated debugging.

To enable Web wizards in a class library project:

In a directory called

C:\Program Files\Microsoft Visual Studio .NET 2003\VC#\CSharpProjectItems\LocalProjectItems

is a file called localprojectitems.vsdir.

Likewise in a directory

C:\Program Files\Microsoft Visual Studio .NET 2003\VC#\CSharpProjectItems\WebProjectItems

is a file called webprojectitems.vsdir.

If you open the second file with notepad you can figure out the lines to copy to the first file to be able to add the usual files you need to create an aspx page or web service to a class lib project.

Once you have copied these things, open VS, open a class lib and go to add new item and you will see these additional file types available.

To set the output of a class library project to go to a /bin directory of your choosing:

1. Right-click on the project and select properties
2. Set Configuration to 'All Configurations' to affect both debug and release builds
3. Under configuration properties/build set the OutputPath to the /bin directory
 

To convert an existing web project into a class library project:

1. Open the .sln file in a text editor, and change the reference to the project from an http://... reference to a simple reference to the .csproj (or .vbproj) filename. For example:
change:
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "WebApplication133", "http://localhost/WebApplication133/WebApplication133.csproj", "{39CB37A5-F735-4684-B5DA-DD355B683090}"

to:
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "WebApplication133", "WebApplication133.csproj", "{39CB37A5-F735-4684-B5DA-DD355B683090}"

2. If there is one, delete the .webinfo file
3. Open the .csproj (or .vbproj) file and change the ProjectType attribute from "Web" to "Local"
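If you have several solutions to convert, step 1 can be scripted. The following is only a sketch under the assumption that the project lines look like the example above; the class and method names are made up, steps 2 and 3 are still done by hand, and you should back up the .sln file before overwriting it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlnSketch
{
    // Replaces an http://... project reference in .sln text with just the
    // project file name, leaving everything else untouched.
    public static string UseLocalProjectPaths(string slnText)
    {
        return Regex.Replace(
            slnText,
            "\"http://[^\"]*/([^\"/]+\\.(?:cs|vb)proj)\"",
            "\"$1\"");
    }
}
```

You would read the .sln file into a string, pass it through UseLocalProjectPaths and write the result back out.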

To set up a class library project to run a browser when you debug it:

1. Right-click on the project in the solution explorer and select properties
2. Under Configuration Properties/Debugging, change Debug Mode from 'Project' to 'URL'
3. Hit Apply
4. In the Start URL field, enter the complete url to the page you want to hit to debug, like:
http://localhost/testproj/webform1.aspx


Sunday, 19 September 2004 23:32:44 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  .Net General