Reading "Forbidden" Web Pages with C#
In my session at Techdays last week I was showing how to make an asynchronous method that reads a web page. I was bemoaning the way that my method worked for www.bbc.co.uk but not for the wonderful site that is www.robmiles.com. When I try to read "my" page the request fails with a "403 Forbidden" error.
Well, many thanks to Erik van Telgen, René Vermijs and @HenroOnline who all came back with the answer. Turns out that, unlike the liberal BBC, my blog host insists that any read requests come from browsers, not programs. Fortunately it is easy to modify the web request so that it appears to come from a browser, and so get the HTML back.
public async Task<string> GetPageAsStringAsync(string url)
{
    HttpClient x = new HttpClient();
    // Present a browser-style user-agent so the host accepts the request
    x.DefaultRequestHeaders.Add("user-agent", "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)");
    HttpResponseMessage response = await x.GetAsync(url);
    // Read the body of the response as text
    string content = await response.Content.ReadAsStringAsync();
    return content;
}
This version of "GetPageAsStringAsync" returns the contents of a page and impersonates a browser when it does it.
string site = await GetPageAsStringAsync(@"http://www.robmiles.com");
It's very easy to use, as you can see above. And this works perfectly. Many thanks for your help, folks.
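One thing the method above doesn't do is tell you when the server still says no: it hands back whatever body came with the response, error page or not. Here is a minimal sketch of a variant that checks the status code before reading the content. The class name PageReader and the helper CreateBrowserRequest are made up for illustration, and sharing one HttpClient across calls is an assumption on my part rather than something from the session.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageReader
{
    // One shared client for all requests (an assumed design choice, not from the talk)
    private static readonly HttpClient Client = new HttpClient();

    // Same user-agent string as the method above
    private const string BrowserUserAgent =
        "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)";

    // Hypothetical helper: builds a GET request carrying the browser-style header
    public static HttpRequestMessage CreateBrowserRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.TryAddWithoutValidation("User-Agent", BrowserUserAgent);
        return request;
    }

    public static async Task<string> GetPageAsStringAsync(string url)
    {
        using (HttpRequestMessage request = CreateBrowserRequest(url))
        using (HttpResponseMessage response = await Client.SendAsync(request))
        {
            // Report a refusal explicitly instead of returning the error page
            if (response.StatusCode == HttpStatusCode.Forbidden)
            {
                throw new HttpRequestException("403 Forbidden: " + url);
            }

            // Throws HttpRequestException for any other non-success status
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

You would call it the same way as before, just through the class: string site = await PageReader.GetPageAsStringAsync(@"http://www.robmiles.com"); and wrap the call in a try/catch for HttpRequestException if you want to handle a refusal gracefully.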
I really hope that I get invited back to Techdays next year. I'm sure by then I'll have another bunch of technical problems I can get help with...