.Net ramblings
# Friday, 02 December 2005
ADO.NET timeouts when filling a DataAdapter in the middle of a transaction

I had a transaction comprised of about 5 commands, and in the middle of it, i needed to do a SELECT on a table to know what values to insert for one of the commands.  I encountered a timeout just after i called Fill on the DataAdapter.  it took me a while to figure out that it was because the transaction had already taken out a lock on the table i was selecting from.  makes sense now, but i spent ages on it!  just posting it here in case anyone else runs into the same problem.


Friday, 02 December 2005 12:26:22 (GMT Standard Time, UTC+00:00)  #    Comments [0]  .Net General | .Net Windows Forms | Asp.Net | Database

# Wednesday, 23 November 2005
Clean Word HTML using Regular Expressions

Introduction

I've spent a long time trying many different approaches at getting rid of MS Word HTML, when importing or pasting text into my content management system, with very mixed success.  Previous efforts involved using the MSHTML Element Dom but this was slow and difficult to implement.  i think i've finally found a satisfactory and fast solution using only regular expressions.  Please feel free to use it in your applications, and post any improvements you may find.

The Code

/// <summary>
/// Removes all FONT and SPAN tags, and all Class and Style attributes.
/// Designed to get rid of non-standard Microsoft Word HTML tags.
/// </summary>
private string CleanHtml(string html)
{
// start by completely removing all unwanted tags
html = Regex.Replace(html, @"<[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>", "", RegexOptions.IgnoreCase);
// then run another pass over the html (twice), removing unwanted attributes
html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase);
html = Regex.Replace(html, @"<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>","<$1$2>", RegexOptions.IgnoreCase);
return html;
}

Samples of non-standard Microsoft Word HTML

<SPAN lang=EN-IE style="mso-ansi-language: EN-IE">
<p class="MSO Normal">
<UL style="MARGIN-TOP: 0cm" type=circle>
<o:p>&nbsp;</o:p>
<li class=MsoNormal style='mso-list:l3 level1 lfo3;tab-stops:list 36.0pt'>

Explanation of Regular Expressions

I've spent a good deal of time examining the problematic tags that MS Word inserts in its HTML, some examples are shown above.  The above code is based on a few requirements for my CMS:

  • remove all FONT and SPAN tags, because all the content in my CMS is done through style-sheets.
  • remove all CLASS and STYLE tags because they mean nothing outside of the original word document
  • remove all namespace tags and attributes like <o:p> and < ... v:shape ... >

The first regular expression removes unwanted tags, and is broken down as follows:

<[/]?(font|span|xml|del|ins|[ovwxp]:\w+)[^>]*?>
  • match an open tag character <
  • and optionally match a close tag sequence </  (because we also want to remove the closing tags)
  • match any of the list of unwanted tags: font,span,xml,del,ins
  • a pattern is given to match any of the namespace tags, anything beginning with o,v,w,x,p, followed by a : followed by another word
  • match any attributes as far as the closing tag character >
  • the replace string for this regex is "", which will completely remove the instances of any matching tags.
  • note that we are not removing anything between the tags, just the tags themselves

The second regular expression removes unwanted attributes, and is broken down as follows:

<([^>]*)(?:class|lang|style|size|face|[ovwxp]:\w+)=(?:'[^']*'|""[^""]*""|[^\s>]+)([^>]*)>
  • match an open tag character <
  • capture any text before the unwanted attribute (This is $1 in the replace expression)
  • match (but don't capture) any of the unwanted attributes: class, lang, style, size, face, o:p, v:shape etc.
  • there should always be an = character after the attribute name
  • match the value of the attribute by identifying the delimiters. these can be single quotes, or double quotes, or no quotes at all.
  • for single quotes, the pattern is: ' followed by anything but a ' followed by a '
  • similarly for double quotes. 
  • for a non-delimited attribute value, i specify the pattern as anything except the closing tag character >
  • lastly, capture whatever comes after the unwanted attribute in ([^>]*)
  • the replacement string <$1$2> reconstructs the tag without the unwanted attribute found in the middle.
  • note: this only removes one occurence of an unwanted attribute, this is why i run the same regex twice.  For example, take the html fragment: <p class="MSO Normal" style="Margin-TOP:3em"> 
    the regex will only remove one of these attributes.  Running the regex twice will remove the second one.  I can't think of any reasonable cases where it would need to be run more than that.

Suggestions!

If you have any suggestions or improvments, please post them here as comments.
Thanks :)

p.s. thanks to BinBin for the fix to preserve attributes like 'align=center'.
Wednesday, 23 November 2005 15:40:36 (GMT Standard Time, UTC+00:00)  #    Comments [27]  .Net General

# Friday, 11 November 2005
Asp.Net generally useful datagrid code

i'm just posting this here for reference.  it is a datagrid that supports standard paging and sorting, and displays the current set of record indices e.g. "1-15 of 1000 records"

private void bindGrid()
{
    DataSet ds = new DB.Audits().SelectAllPendingAudits();

    // the sorting is always retrieved from viewstate, if it exists or not.
    string sort = String.Concat(ViewState["Sort"], "");
    if(sort != "")
        this.lblSort.Text = "Sorted by " + sort + " in ascending order";

    int numRows = ds.Tables[0].Rows.Count;
    if(numRows > 0)
    {
        int start = this.DataGrid1.CurrentPageIndex * this.DataGrid1.PageSize + 1;
        int end = Math.Min(numRows, start + this.DataGrid1.PageSize - 1);
        this.lblHeading.Text = String.Format("Displaying {0}-{1} of {2} records", start, end, numRows);
        this.DataGrid1.AllowPaging = (numRows > this.DataGrid1.PageSize);    // don't show pager unless relevant
        this.DataGrid1.Visible = true;
        DataView dv = new DataView(ds.Tables[0]);
        dv.Sort = sort;
        this.DataGrid1.DataSource = dv;
        this.DataGrid1.DataBind();
    }
    else
    {
        this.lblHeading.Text = "No records";            
        this.DataGrid1.Visible = false;
    }
}

private void DataGrid1_ItemCommand(object source, System.Web.UI.WebControls.DataGridCommandEventArgs e)
{
    if(e.CommandName == "Sort")
    {
        ViewState["Sort"] = e.CommandArgument.ToString();    // is picked up in bindGrid() function
        this.DataGrid1.CurrentPageIndex = 0;
        this.bindGrid();                
    }            
}

private void DataGrid1_PageIndexChanged(object source, System.Web.UI.WebControls.DataGridPageChangedEventArgs e)
{
    this.DataGrid1.CurrentPageIndex = e.NewPageIndex;
    bindGrid();
}

Friday, 11 November 2005 11:58:32 (GMT Standard Time, UTC+00:00)  #    Comments [0]  Asp.Net

# Thursday, 03 November 2005
HowTo: present a radiobuttonlist with images

i looked on the newsgroups to see if anyone had posted anything about this, and i found a few dead-end posts which seemed to conclude that it couldn't be done. 
i used a very simple approach that works well, and am posting it here for anyone looking to see how to do it.   the requirements are to present a radio-button-list with images instead of just text.

string imageBankFolder = "/ImageBankFolder/Thumbnails/";
DataSet ds = new DB.ImageBank().Select(); // get your dataset from wherever
foreach(DataRow dr in ds.Tables[0].Rows)
   this.RadioButtonList1.Items.Add(new ListItem(String.Format("<img src='{0}'>", imageBankFolder + dr["ImageFile"].ToString()), dr["ImageID"].ToString()));

this displays the images only.  note: firefox works fine with this, you can click on the image to select it, but IE6 requires you to actually click on the round radio button icon.  to work around this, i included some text above the image, which sits beside the button, and it is more intuitive for the user to click the text or the radio icon then.  to include some text above the image, try the following:

this.RadioButtonList1.Items.Add(new ListItem(String.Format("{1}<BR><img src='{0}'>", imageBankFolder + dr["ImageFile"].ToString(), dr["Text"].ToString()), dr["ImageID"].ToString()));

hope this helps someone out there.


Thursday, 03 November 2005 18:52:23 (GMT Standard Time, UTC+00:00)  #    Comments [14]  Asp.Net

# Wednesday, 05 October 2005
Send Ctrl-Alt-Delete via Remote Desktop

i wanted to change the admin password on my web server through remote desktop, but Ctrl-Alt-Delete always goes to the local computer. 

i found out you can also use Ctrl-Alt-End to achieve the same thing, which works in remote desktop.


Wednesday, 05 October 2005 12:35:21 (GMT Daylight Time, UTC+01:00)  #    Comments [45]  Windows Server

# Friday, 30 September 2005
HowTo: Backup RRAS configuration to text files

Thanks to Dusty Harper for his post on the server.networking MS newsgroup, to backup RRAS settings using Netsh:

Netsh Routing Dump > Routing.txt
Netsh RAS Dump > RAS.txt 

Then you can use Netsh Exec to playback the file.

Note: if you do a ntbackup of system state, the RRAS settings are also included in that.  i just like having the text file versions just in case.


Friday, 30 September 2005 13:49:56 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  Windows Server

# Wednesday, 28 September 2005
Removing rows from a dataset (i.e. achieving TOP functionality when you can't use SQL)

This might sound really obvious, but i couldn't find a better way.  Normally i would use TOP in the SQL query to limit the number of records i want to retrieve, but in my case, this value is parameterised and Access won't allow me to parameterise that value.  I tried using a DataView but TOP isn't one of it's supported functions.  So i just loop through the dataset and keep removing rows until the right number of records is reached.

int maxItems = 5;
while(ds.Tables[0].Rows.Count > MaxItems)
    ds.Tables[0].Rows.RemoveAt(MaxItems);

Wednesday, 28 September 2005 11:01:17 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  .Net General | Database

# Saturday, 10 September 2005
Mapping .html pages to Asp.Net

I was doing an upgrade on a web site recently, and all the pages were .html pages.  I wanted to add some .Net functionality, but didn't want to change all the urls, for bookmarks, search engines etc.  As well as scaring off the client with the strange ".aspx" file extensions.  yes- many irish companies are still technophobic. 

Add an IIS mapping for .html

i remember how to change mappings for a file extension in IIS (web site properties > home directory > configuration), so i did this for .html pages by adding a mapping for .html to aspnet_isapi.dll (copy the full path from the mapping for .aspx). 

Add a HttpHandler to the application web.config file

when i did the above, my .net code was ignored and rendered as plain text.  i found out this was because the web application (at the .net level) wasn't configured to handle .html files as .aspx files. this is what i added to my web.config to get it working:

<configuration> <system.web> <httpHandlers> <add verb="*" path="*.html" type="System.Web.UI.PageHandlerFactory" /> </httpHandlers> </system.web> </configuration>

now the whole application works with full .net functionality, overcoming all those migration problems usually associated with .net upgrades.


Saturday, 10 September 2005 14:38:39 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  Asp.Net

# Friday, 09 September 2005
An asp.net button that disables itself automatically after clicking.

Some users of a web application i wrote insist on clicking buttons more than once, probably out of impatience. this often causes duplicate key exceptions with the database, because the first time they clicked the button the record was created, and the second time they clicked it, an exception is thrown, so they get the error screen and don't know what they did wrong. 

i wanted to write a button control that would disable itself automatically and re-enable itself once it was finished.  i couldn't find any good samples out there.  javascript is obviously the answer, and the solution i came up with is quite simple.  here's the code: currently only works with .Net 1.1:

	/// <summary>
/// A button control that disables itself when clicked, and changes the text to "Please wait..."
/// This is to prevent duplicate clicks by impatient or novice users.
/// It requires the button to be placed in a server form.
/// </summary>
[DefaultProperty("Text"), ToolboxData("<{0}:SmartButton runat=server></{0}:SmartButton>")]
public class SmartButton : Button
{

/// <summary>
/// Add an 'onClick' attribute to disable the button when it is clicked, and submit the form,
/// invoking the postback.
///
/// The onClick code handles the case where __EVENTTARGET is registered on the page, in which case
/// this variable is set to the button ID, and the form is submitted.
/// The other case is where __EVENTTARGET does not exist on the page, i found this sometimes
/// occurred on pages with only one button. In this case, the form is simply submitted, and the
/// button_click event will be raised by virtue of the default submit button in the form.
/// </summary>
protected override void Render(HtmlTextWriter output)
{
string onClick = "if(this.form != null && this.form.__EVENTTARGET != null){ this.form.__EVENTTARGET.value='" + this.UniqueID + "'; this.disabled = true; this.value = 'Please wait...'; this.form.submit(); } else this.form.submit(); ";
if(this.Attributes["onclick"] != null) // prepend the existing onClick attributes
onClick = this.Attributes["onclick"].ToString() + onClick;
this.Attributes.Add("onclick", onClick);
base.Render(output);
}

protected override void OnClick(EventArgs e)
{
// do the OnClick code first
base.OnClick (e);

// then reset the enabled + text values to their original state
int insertAt = Math.Max(this.Page.Controls.Count-1, 0); // never insert at -1 if there are no controls on the page
this.Page.Controls.AddAt(insertAt, new LiteralControl(String.Format(@"
<script>
if(document.getElementById('{0}') != null)
{{
document.getElementById('{0}').disabled = false;
document.getElementById('{0}').value = '{1}';
}}
</script>
", this.UniqueID, this.Text)));
}
}

Friday, 09 September 2005 15:41:29 (GMT Daylight Time, UTC+01:00)  #    Comments [2]  Asp.Net

# Saturday, 27 August 2005
A 'Progress-Task-List' control

I decided to write this control when i realised there are some cases where a progress bar is not enough, even with a label that gets updated for each new stage of a list of operations.

Sometimes you want the user to see what's coming next, what has already been done, and the ProgressTaskList control does just that.  it's nothing new of course, often used in installers and the like, but there didn't appear to be any .Net control like this.

I've submitted the control to code-project (URL to follow), which is where any updates will be posted.  If you have comments or suggestions, it's best to put them on the code-project site please.

I'm quite pleased with the way it turned out.  The code is very simple, and i couldn't find any bugs having done lots of testing.

As you can see in the screenshot to the left, it handles scrolling quite well, and automatically jumps to the current task when it is starting a new one.

To use it you just specify TaskItems as a string[] and call Start() to set it off. Then call NextTask() every time a task is finished to advance it to the next task. 

You can download the source here (30Kb) if you like, but check on code-project for updates.


Saturday, 27 August 2005 01:28:29 (GMT Daylight Time, UTC+01:00)  #    Comments [0]  .Net Windows Forms