Saturday, August 18, 2007

Lucian's weblog : Retrieve data from Wikipedia using C#

This could be useful, since Wikipedia is now the source of all information in the world, along with Google & Facebook! 

To get a particular page, you can simply use a direct link. The link is composed (as you may have already noticed) of http://en.wikipedia.org/wiki/Special:Export/ followed by the title of the page you want to retrieve, with spaces replaced by underscores. So, if you want data about William Shakespeare, the direct link to the XML file will be http://en.wikipedia.org/wiki/Special:Export/William_Shakespeare.
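The link-building rule above can be sketched as a small helper. This is only an illustration: the class and method names (ExportUrlSketch, BuildExportUrl) are made up for this example and are not part of any Wikipedia or .NET API, and it assumes that replacing spaces with underscores and percent-escaping the rest is enough for ordinary titles.

```csharp
using System;

static class ExportUrlSketch
{
    // Prefix taken from the post.
    const string ExportPrefix = "http://en.wikipedia.org/wiki/Special:Export/";

    // Hypothetical helper: turns a page title into a Special:Export link.
    public static string BuildExportUrl(string pageTitle)
    {
        // Wikipedia titles use underscores where the display name has spaces.
        string withUnderscores = pageTitle.Trim().Replace(' ', '_');

        // Percent-escape anything that is not safe inside a URL path segment.
        return ExportPrefix + Uri.EscapeDataString(withUnderscores);
    }

    static void Main()
    {
        Console.WriteLine(BuildExportUrl("William Shakespeare"));
    }
}
```

For "William Shakespeare" this yields the same link as the one given above.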

Knowing all this, it’s now simple to write a program that works with the data Wikipedia provides. The program looks something like this (I’ll only give you the important part of the code; I’m sure you know where to put it):

    // Needed at the top of the form's source file:
    using System;
    using System.IO;
    using System.Net;
    using System.Text;
    using System.Xml;
    using System.Xml.XPath;

    private void button1_Click(object sender, EventArgs e)
    {
        HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://en.wikipedia.org/wiki/Special:Export/William_Shakespeare");
        webRequest.Credentials = CredentialCache.DefaultCredentials;
        webRequest.Accept = "text/xml";

        try
        {
            // Must match the xmlns declared on the <mediawiki> root element of the export.
            string NS = "http://www.mediawiki.org/xml/export-0.3/";
            XPathDocument doc;

            // The using blocks close the response, stream and reader even if loading fails.
            using (HttpWebResponse webResponse = (HttpWebResponse)webRequest.GetResponse())
            using (Stream responseStream = webResponse.GetResponseStream())
            using (XmlReader reader = new XmlTextReader(responseStream))
            {
                doc = new XPathDocument(reader);
            }

            XPathNavigator myXPathNavigator = doc.CreateNavigator();
            XPathNodeIterator nodesText = myXPathNavigator.SelectDescendants("text", NS, false);

            // Collect every <text> element instead of overwriting the text box on each pass.
            StringBuilder text = new StringBuilder();
            while (nodesText.MoveNext())
                text.Append(nodesText.Current.InnerXml).Append(" ");
            textBox1.Text = text.ToString();
        }
        catch (Exception ex)
        {
            textBox1.Text = ex.ToString();
        }
    }
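The namespace-qualified lookup is the part most likely to go wrong, so it can help to try it offline first. The sketch below runs the same SelectDescendants call against a tiny hand-written document instead of a live download. The sample XML is an assumption made up for this example, trimmed to the few elements the lookup touches, and the names ExportParsingSketch and ExtractText are hypothetical. It reads Current.Value rather than InnerXml so that the wikitext comes back unescaped.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.XPath;

static class ExportParsingSketch
{
    // Namespace copied from the post; it must match the xmlns on the root element.
    public const string NS = "http://www.mediawiki.org/xml/export-0.3/";

    // Hand-written stand-in for an export file (not real Wikipedia output).
    public const string SampleXml =
        "<mediawiki xmlns=\"" + NS + "\">" +
        "<page><title>Sample</title>" +
        "<revision><text>'''Sample''' wikitext.</text></revision>" +
        "</page></mediawiki>";

    // Returns the content of every <text> element in the given namespace.
    public static string ExtractText(string xml)
    {
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        XPathNavigator navigator = doc.CreateNavigator();
        XPathNodeIterator nodes = navigator.SelectDescendants("text", NS, false);

        StringBuilder result = new StringBuilder();
        while (nodes.MoveNext())
            result.Append(nodes.Current.Value);
        return result.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ExtractText(SampleXml));
    }
}
```

If the namespace string does not match the one declared in the document, the iterator simply comes back empty and no exception is thrown. That is worth checking first when the text box stays blank, since the export format may declare a different version number than the one shown here.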
