I'm currently working on a personal-use utility application that needs to scrape data from web pages (HTML) so the application can process it locally. Getting the raw HTML was straightforward enough using the HttpWebRequest and HttpWebResponse classes in the System.Net namespace.
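As a rough sketch of that first step, the download might look something like the code below. The DownloadHtml helper name and the example.com URL are placeholders of mine rather than anything from the actual application, and the sketch assumes the page is UTF-8 encoded (the StreamReader default). On newer .NET versions HttpClient is the recommended API, but the classes mentioned above still work.

```csharp
using System;
using System.IO;
using System.Net;

class FetchExample
{
    // Hypothetical helper: downloads the response body of a URL as a string.
    static string DownloadHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        // Dispose the response, stream and reader as soon as the body is read.
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream)) // assumes UTF-8 content
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Placeholder URL for illustration only.
        string html = DownloadHtml("http://example.com/");
        Console.WriteLine("Downloaded {0} characters", html.Length);
    }
}
```

The returned string is the raw HTML that then needs to be parsed.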

I then reached the point where I had some raw HTML in a string that I needed to parse. After a quick search, I found CsQuery (available as a NuGet package), an open-source jQuery port for .NET. I was able to easily extract the data I required from the HTML using the familiar jQuery-like selectors. The example snippet below shows just how easy CsQuery is to use.

using System;
using System.Text;

// Build a small HTML document to query
var html = new StringBuilder();
html.Append("<html><body>");
html.Append("<h1>Hello, world!</h1>");
html.Append("<p class='intro'>This program is using CsQuery.</p>");
html.Append("<p id='author'>CsQuery is a library written by James Treworgy.</p>");
html.Append("</body></html>");

// Parse the HTML string into a CsQuery DOM
var dom = CsQuery.CQ.Create(html.ToString());

// Get the inner text of an element by element name selector
Console.WriteLine(dom["h1"].Text());

// Get the inner text of an element by class name selector
Console.WriteLine(dom[".intro"].Text());

// Get the inner text of an element by id selector
Console.WriteLine(dom["#author"].Text());

// Add a class to an element
dom["h1"].AddClass("title");

// Update the title text using new class in selector
dom[".title"].Text("CSQuery – a JQuery port for .NET");

// Now retrieve the new title by a class selector
Console.WriteLine(dom[".title"].Text());

// Pause the console so the output stays visible
Console.ReadLine();