Channel: htmlagilitypack Forum Rss Feed

New Post: Difficulty with XPath to get a single result

I'm not yet very familiar with XPath, but what I'm trying to do shouldn't be this difficult. Basically, I'm periodically checking Google Finance to see the current stock price of a few companies. It's nothing fancy, but I've hit a wall.

When I read the HTML, I can see the price is kept in a node with the id 'price-panel'. It should be easy, but I get an error when I try to select the node.
string url = "http://www.google.ca/finance?q="+Symbol;
var getHtmlWeb = new HtmlWeb();
var document = getHtmlWeb.Load(url);

var node = document.DocumentNode.SelectNodes("//[@id='price-panel']");
At this point, the following exception gets thrown:
System.Xml.XPath.XPathException: Expression must evaluate to a node-set.
Here is what the HTML looks like right now:
<div id=price-panel class="id-price-panel goog-inline-block">
<div>
<span class="pr">
<span id="ref_16234934_l">2.70</span>
</span>
<div class="id-price-change nwp">
<span class="ch bld"><span class="chr" id="ref_16234934_c">-0.05</span>
<span class="chr" id="ref_16234934_cp">(-1.82%)</span>
</span>
</div>
</div>
<div>
<span class=nwp>
Aug 17 - Close
</span>
<div class=mdata-dis>
<span class=dis-large><nobr>CVE
data delayed by 15 mins -
<a href="//www.google.ca/help/stock_disclaimer.html#realtime"  class=dis-large>Disclaimer</a>
</nobr></span>
<div>Currency in CAD</div>
</div>
</div>
</div>
It's definitely in there. What am I missing?
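The expression `//[@id='price-panel']` has a predicate but no node test after `//`, and XPath requires one: `*` for any element, or a tag name such as `div`. A minimal sketch, assuming the page still serves the markup shown above (Google's layout may well have changed since); `PricePanel.ReadPrice` is a hypothetical helper:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using HtmlAgilityPack;

public static class PricePanel
{
    // Returns the price text inside the element with id="price-panel", or null if it is absent.
    public static string ReadPrice(HtmlDocument document)
    {
        // "//[@id='...']" has no node test, which is why the parser rejects it.
        // "//div[...]" (or "//*[...]" for any element) supplies one.
        HtmlNode panel = document.DocumentNode.SelectSingleNode("//div[@id='price-panel']");
        if (panel == null)
            return null;

        // The leading "." keeps the search inside the panel instead of restarting at the root.
        HtmlNode price = panel.SelectSingleNode(".//span[@class='pr']/span");
        return price?.InnerText.Trim();
    }
}
```

With the question's code, this would be `var price = PricePanel.ReadPrice(document);`. HAP reads the unquoted `id=price-panel` attribute without complaint.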

New Post: Strong Name for PCL Binary

I just downloaded the NuGet package for 1.4.9, and neither of the Portable Class Library versions of HtmlAgilityPack.dll has a strong name. Was this an oversight?
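To check a given DLL, you can read its public key token through `System.Reflection`: a strong-named assembly carries a public key, so its token is non-empty. A small sketch; `StrongNameCheck` is a hypothetical name:

```csharp
using System;
using System.Reflection;

public static class StrongNameCheck
{
    // A strong-named assembly carries a public key, so its public key token is non-empty.
    public static bool IsStrongNamed(AssemblyName name)
    {
        byte[] token = name.GetPublicKeyToken();
        return token != null && token.Length > 0;
    }

    // Reads the manifest from disk without loading the assembly into the current app domain.
    public static bool IsStrongNamed(string assemblyPath) =>
        IsStrongNamed(AssemblyName.GetAssemblyName(assemblyPath));

    // Usage: pass one or more DLL paths on the command line.
    public static void Main(string[] args)
    {
        foreach (string path in args)
        {
            Console.WriteLine($"{path}: {(IsStrongNamed(path) ? "strong-named" : "not strong-named")}");
        }
    }
}
```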

New Post: problems with DocumentNode.SelectNodes() parameters

Hi,
I'm trying to extract the values 07:34, 16:22 and 08:47 from the table at the bottom of the following HTML document. How should I parse the HTML? I can't find the right XPath string:
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sb);
HtmlNodeCollection title =doc.DocumentNode.SelectNodes(???????????????????????????)
Thanks.
<div class="col-sm-10 col-md-8">
                <table class="table table-striped table-hover well">
                    <thead>
                        <tr>
                            <th>Date</th>
                            <th>Sunrise</th>
                            <th>Sunset</th>
                            <th>Day length</th>
                        </tr>
                        <tr>
                            <th colspan="4">
                                <script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
                                <!-- sunResTimesMonthCenter -->
                                <ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-9603966150990290" data-ad-slot="6680811280" data-ad-format="auto"></ins>
                                <script>(adsbygoogle=window.adsbygoogle||[]).push({});</script>
                            </th>
                        </tr>
                    </thead>
                    <tbody>
                                                                <tr>
                                                        <td><a href="/en/sun/canada/dorval/2015/january/1" title="Sunrise and sunset times Dorval, January 1, 2015">1 January 2015<span class="hidden-xs hidden-sm underlined">, Thursday</span></a></td>
                            <td>07:34</td>
                            <td>16:22</td>
                            <td>08:47</td>
                        </tr>
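One approach is to select the body rows of the table and skip the first cell (the date link) in each row. A sketch, assuming the markup shown above with its explicit `<tbody>`; `SunTable.ReadRows` is a hypothetical helper, and if `sb` is a `StringBuilder` you would pass `sb.ToString()`:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class SunTable
{
    // Returns the Sunrise, Sunset and Day length cells of each body row.
    public static List<string[]> ReadRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<string[]>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection trs = doc.DocumentNode.SelectNodes("//table//tbody/tr");
        if (trs == null)
            return rows;

        foreach (HtmlNode tr in trs)
        {
            // td[position() > 1] skips the first cell, which holds the date.
            HtmlNodeCollection cells = tr.SelectNodes("./td[position() > 1]");
            if (cells != null)
                rows.Add(cells.Select(td => td.InnerText.Trim()).ToArray());
        }
        return rows;
    }
}
```

The last element of the returned list corresponds to the last row of the table.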

New Post: Ajax Request

You can intercept the page's Ajax requests, or make those requests yourself.

New Post: problems with DocumentNode.SelectNodes() parameters

I'm not sure what you are trying to parse out of that HTML, but if, for example, you need the href of every anchor tag, you could do:
// SelectNodes returns null (not an empty collection) when nothing matches
var links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (var link in links)
    {
        Console.WriteLine(link.GetAttributeValue("href", ""));
    }
}
You can change the parameter of SelectNodes to "//td" or "//div" to select table cells or divs instead.

Unfortunately, I haven't found complete documentation either. It would be nice if someone could point out where to look.

New Post: StackOverflowException in HtmlNodeCollection.GetEnumerator()

Hi!

I'm writing a web crawler, and I get a StackOverflowException in HtmlNodeCollection.GetEnumerator(), but on the call stack there are a lot of HtmlNode.CloseNode() calls:

HtmlAgilityPack.dll!HtmlAgilityPack.HtmlNode.CloseNode(HtmlAgilityPack.HtmlNode endnode) Line 1679 C#
HtmlAgilityPack.dll!HtmlAgilityPack.HtmlNode.CloseNode(HtmlAgilityPack.HtmlNode endnode) Line 1679 C#
HtmlAgilityPack.dll!HtmlAgilityPack.HtmlNode.CloseNode(HtmlAgilityPack.HtmlNode endnode) Line 1679 C#
...

Line 1679 is: foreach (HtmlNode child in _childnodes)


I don't know which page triggers it. But even if I knew, what could I do about it? OptionFixNestedTags doesn't help, and since a StackOverflowException can't be caught, I'm not sure what else to try.


Any ideas?
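A StackOverflowException can't be caught, but it can sometimes be avoided. The repeated `CloseNode` frames suggest the recursion depth follows how deeply the page's elements end up nested, so one workaround (plain .NET, not an HAP feature) is to parse on a dedicated thread with a larger stack. The sketch assumes 64 MB is enough, which is an arbitrary guess; `BigStackParser` is a hypothetical name. Code that later walks the parsed tree recursively may need the same treatment.

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using System.Runtime.ExceptionServices;
using System.Threading;
using HtmlAgilityPack;

public static class BigStackParser
{
    // Parses html on a dedicated thread whose stack is larger than the default
    // (often 1 MB), giving HAP's recursive calls more room on deeply nested markup.
    public static HtmlDocument LoadOnLargeStack(string html, int maxStackSize = 64 * 1024 * 1024)
    {
        HtmlDocument result = null;
        Exception error = null;

        var worker = new Thread(() =>
        {
            try
            {
                var doc = new HtmlDocument();
                doc.LoadHtml(html);
                result = doc;
            }
            catch (Exception ex)
            {
                // Ordinary exceptions are captured here; a StackOverflowException still is not.
                error = ex;
            }
        }, maxStackSize);

        worker.Start();
        worker.Join();

        // Rethrow on the calling thread, preserving the original stack trace.
        if (error != null)
            ExceptionDispatchInfo.Capture(error).Throw();
        return result;
    }
}
```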

New Post: error on get Attributes[].Value

<a title="one-punch-man-ep-01-viet-sub" data-episode-tap="01" data-episode-id="127345" data-type="watch" class="" href="http://anime47.com/xem-phim-one-punch-man-ep-01/127345.html"><b>01</b></a>
<a title="one-punch-man-ep-02-viet-sub" data-episode-tap="02" data-episode-id="128858" data-type="watch" class="active" href="http://anime47.com/xem-phim-one-punch-man-ep-02/128858.html"><b>02</b></a>
C#

var eplist = doc.DocumentNode.SelectNodes("//li").ToList();
foreach (var item_ep in eplist)
{
    doc.LoadHtml(item_ep.InnerHtml);
    var number = doc.DocumentNode.SelectNodes("//b")[0].InnerText; // works
    var episodeId = doc.DocumentNode.SelectNodes("//a")[0].Attributes["data-episode-id"].Value; // works
    var href = doc.DocumentNode.SelectNodes("//a")[0].Attributes["href"].Value; // error, reported at doc.LoadHtml(item_ep.InnerHtml)
}

How can I fix it?
Thanks, everyone.
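One likely cause is that the loop reuses `doc`. As far as I know, an unmodified HAP node reads its `InnerHtml` back out of its owner document's source text by position, so once `doc.LoadHtml(...)` replaces that text, the remaining `<li>` nodes in `eplist` point into text that no longer exists. A sketch that avoids reloading and uses relative XPath instead; `EpisodeLinks.Read` is a hypothetical helper:

```csharp
// Requires the HtmlAgilityPack NuGet package; C# 7+ for the tuple syntax.
using System.Collections.Generic;
using HtmlAgilityPack;

public static class EpisodeLinks
{
    // Reads (number, episode id, href) from the first <a> inside each <li>,
    // without reloading or modifying the original document.
    public static List<(string Number, string EpisodeId, string Href)> Read(HtmlDocument doc)
    {
        var episodes = new List<(string Number, string EpisodeId, string Href)>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//li");
        if (items == null)
            return episodes;

        foreach (HtmlNode li in items)
        {
            // The leading "." makes the XPath relative to this <li>;
            // "//a" would search the whole document again.
            HtmlNode a = li.SelectSingleNode(".//a");
            if (a == null)
                continue;

            // GetAttributeValue returns the fallback instead of throwing when the attribute is missing.
            episodes.Add((a.InnerText.Trim(),
                          a.GetAttributeValue("data-episode-id", ""),
                          a.GetAttributeValue("href", "")));
        }
        return episodes;
    }
}
```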

New Post: Malformed HTML parsing problem - unclosed li element within a form

Hi there,

I know this discussion is quite old, but I just ran into the same problem with the unclosed <li> tag. I searched for hours because I assumed the problem was my own failure to understand the complex form structure, not the parser.

I'm using version 1.4.9 of the Html Agility Pack.

It would be great if the Html Agility Pack were tolerant enough to parse such malformed HTML documents, since documents on the web are quite often malformed...


Greetings
Mexallon
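If the symptom is that the form's contents show up outside the `<form>` element, the cause may be how HAP treats `form`: in 1.4.x it is registered in the static `HtmlNode.ElementsFlags` dictionary with flags that let it overlap other elements and treat it as empty, so its contents become siblings of the form rather than children. A widely used workaround is to remove that entry before parsing. A sketch; `FormParsing` is a hypothetical name, and the change is process-wide:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using HtmlAgilityPack;

public static class FormParsing
{
    public static HtmlDocument LoadKeepingFormChildren(string html)
    {
        // Process-wide change: every later parse in this process is affected.
        // Remove simply returns false if the entry is not present.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```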

New Post: HtmlAgilityPack parsing html using templates

Hi,

A friend told me that it is possible to parse an HTML document with HtmlAgilityPack using a predefined template. Is it possible to define a template and parse HTML documents with it?

For example, the HTML document looks like this:

<html>
<body>
<div id="id1"><div id="id2">some content </div><div id="id3">some other content</div></div></body>

</html>

and the template would look like this:

<template id="id1">
<template id="id3"></template>
</template>
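As far as I know, HAP has no built-in template feature, but the idea can be approximated: parse the template with HAP, collect the ids it names, and look each one up in the target document. A sketch that treats the template as a flat list of ids (nesting is ignored); `TemplateExtract` and its methods are hypothetical:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TemplateExtract
{
    // Collects the id attribute of every <template> element in the template markup.
    public static List<string> TemplateIds(string templateMarkup)
    {
        var template = new HtmlDocument();
        template.LoadHtml(templateMarkup);
        return template.DocumentNode.Descendants("template")
            .Select(node => node.GetAttributeValue("id", ""))
            .Where(id => id.Length > 0)
            .ToList();
    }

    // Looks up each id in the target document and returns its trimmed inner text.
    public static Dictionary<string, string> Extract(string html, IEnumerable<string> ids)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var values = new Dictionary<string, string>();
        foreach (string id in ids)
        {
            // GetElementbyId (lowercase "b") uses the id index HAP builds while parsing.
            HtmlNode node = doc.GetElementbyId(id);
            if (node != null)
                values[id] = node.InnerText.Trim();
        }
        return values;
    }
}
```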

New Post: Nuget Install is prompting for C# files.

Hello,

Can anyone tell me how to stop Html Agility Pack from prompting for C# files?

I installed Html Agility Pack via Nuget.
https://www.nuget.org/packages/HtmlAgilityPack

But when I run or debug my project, I get prompted to locate HtmlDocument.cs and other C# source files of the Html Agility Pack project.

Any suggestions for how to fix this would be most welcomed.

thanks!

New Post: HTMLAgility in UWP App => List all url of an website

Hello,
I want to list the URLs of a website. This works with Regex and the Agility Pack in a Windows Forms app.

Now I want to do the same in a UWP app. Unfortunately, System.Web.UI is not available there.

So I tried this code (from AgilityUWP):
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(InputTextBox.Text);

HtmlNode docNodes = htmlDoc.DocumentNode;

// Here I want to get all nodes with a certain string in them; I don't know if this works
HtmlNode fldSerie = docNodes.Element("//a Serie/");
// This does not work
foreach (HtmlNode node in docNodes)
{
    UrlList.Items.Add(node.OuterHtml);
}
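`HtmlNode` is not a collection, so `foreach` over `docNodes` can't compile, and `Element(...)` takes a tag name, not an XPath expression. The portable/UWP builds of the 1.4.x package are commonly reported to lack the XPath methods, so this sketch uses LINQ over `Descendants("a")` instead; `LinkLister.ListHrefs` is a hypothetical helper:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class LinkLister
{
    // Returns the href of every <a> whose href contains the given text (case-insensitive).
    // Uses LINQ over Descendants rather than XPath.
    public static List<string> ListHrefs(string html, string mustContain)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.Descendants("a")
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.Length > 0 &&
                           href.IndexOf(mustContain, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}
```

With the names from the question: `foreach (var href in LinkLister.ListHrefs(InputTextBox.Text, "Serie")) UrlList.Items.Add(href);`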

New Post: HtmlAgilityPack parsing html using templates

Hi everyone;
Could you please help me? I have a big problem.
I have this web page: http://www.brh.net/tableaux/tauxdujour.htm
and I want to extract the data from the table cells <td class="x187"> ...data... </td>.

I have tried several times to extract the data, but it doesn't work.
Please help me.
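Assuming the cells really carry `class="x187"` in the HTML the server returns (not only after scripts run), an XPath class test is enough; the markup can be fetched with `new HtmlWeb().Load(url)` and its `DocumentNode.OuterHtml` passed in. The `concat`/`normalize-space` form also matches cells with several classes. `CellsByClass.Read` is a hypothetical helper:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class CellsByClass
{
    // Returns the decoded, trimmed text of every <td> that has the given class.
    public static List<string> Read(string html, string cssClass)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Matches class="x187" as well as class="other x187".
        string xpath = $"//td[contains(concat(' ', normalize-space(@class), ' '), ' {cssClass} ')]";
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null when nothing matches.
        return cells == null
            ? new List<string>()
            : cells.Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim()).ToList();
    }
}
```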

New Post: HtmlAgilityPack parsing html using templates

Can you show your code, or make a sample that demonstrates the problem? Or put together a made-up example for testing that doesn't depend on the real source?

New Post: how to load an encoded string into htmlDocument object

Hi

I know the following code loads an HTML string:

string astr = "<p> This is first Program </p>";
var htmlDoc = new HtmlAgilityPack.HtmlDocument();
// load the html string to htmldocument object
htmlDoc.LoadHtml(astr);

But I would like to load an encoded string, for example:
&quot;&lt;p&gt; This is first Program &lt;/p&gt;
The reason is that I have strings such as this one:
For patients with heart failure or ejection fraction < 40%, ACE-inhibitors (or ARB’s if history of ACE-inhibitors intolerance) should be considered.6. Other more routine options are listed below.

When I load that string, this is the result:
For patients with heart failure or ejection fraction < 40%,="" ace-inhibitors="" (or="" arb’s="" if="" history="" of="" ace-inhibitors="" intolerance)="" should="" be="" considered.="" 6.="" other="" more="" routine="" options="" are="" listed="">

How can I solve this problem?

Thanks in advance,
Raj.
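The output above suggests HAP read the bare `<` in "< 40%" as the start of a tag. Two cases can be separated: if the input is HTML that was itself entity-encoded, decode it before loading; if it is plain text, encode it so the `<` arrives as `&lt;`. If a string mixes real tags with stray `<` characters, only the stray ones should be escaped, which this sketch does not attempt. `EncodedText` and its methods are hypothetical names:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System.Net;
using HtmlAgilityPack;

public static class EncodedText
{
    // Input is entity-encoded HTML such as "&lt;p&gt;...&lt;/p&gt;": decode to markup, then parse.
    public static HtmlDocument LoadEncodedHtml(string encodedHtml)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(WebUtility.HtmlDecode(encodedHtml));
        return doc;
    }

    // Input is plain text that may contain a bare '<': encode it so the parser
    // sees "&lt;" and keeps it as text instead of starting a tag.
    public static HtmlDocument LoadPlainText(string text)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(WebUtility.HtmlEncode(text));
        return doc;
    }

    // Returns the document's text with entities such as "&lt;" turned back into characters.
    public static string ReadText(HtmlDocument doc) =>
        HtmlEntity.DeEntitize(doc.DocumentNode.InnerText);
}
```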

New Post: Method not found: 'Int32 HtmlAgilityPack.HtmlDocument.TextIndexOf(Char, Int32)'

Hi,

I recently started working on an old project that uses version 1.3.0.0 of this library.
I restored the same package version and ran the project. One part of the code tries to find the index of a character in an HtmlDocument, which results in the following error:

Method not found: 'Int32 HtmlAgilityPack.HtmlDocument.TextIndexOf(Char, Int32)'

So I think the method is deprecated. Any idea about this?


Thanks!
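A "Method not found" error generally means the code was compiled against one build of a library while a different build is loaded at run time. A small diagnostic sketch that reports which HtmlAgilityPack assembly is actually loaded and whether it has a `TextIndexOf(char, int)` instance method (the signature taken from the error message); `HapVersionCheck` is a hypothetical name:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System;
using System.Reflection;
using HtmlAgilityPack;

public static class HapVersionCheck
{
    public static Version LoadedVersion() => typeof(HtmlDocument).Assembly.GetName().Version;

    public static string LoadedLocation() => typeof(HtmlDocument).Assembly.Location;

    // Looks for an instance method TextIndexOf(char, int), public or not.
    public static bool HasTextIndexOf() =>
        typeof(HtmlDocument).GetMethod(
            "TextIndexOf",
            BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance,
            null,
            new[] { typeof(char), typeof(int) },
            null) != null;

    public static void Main()
    {
        Console.WriteLine($"Loaded HtmlAgilityPack {LoadedVersion()} from {LoadedLocation()}");
        Console.WriteLine($"TextIndexOf(char, int) present: {HasTextIndexOf()}");
    }
}
```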

New Post: Expanding HtmlConvert.cs to handle List Items

This works; however, every ordered list is numbered with plain numbers. If you can figure out how to change the style for nested ordered lists (decimal, then alphabetic, then roman...), let me know.

Add the following to: internal static void ConvertTo(HtmlNode node, TextWriter outText, PreceedingDomTextInfo textInfo)
case "li":
    if (textInfo.ListIndex > 0)
    {
        outText.Write("\r\n\t{0}.", textInfo.ListIndex++);
    }
    else
    {
        outText.Write("\r\n\t•"); // tab followed by the bullet U+2022 (output must support UTF-8); use whatever you like, e.g. "\t->"
    }
    isInline = false;
    break;
case "ol":
    listIndex = 1;
    goto case "ul";
case "ul": // nested lists are not handled any differently at this stage; that gets close to rendering problems
    endElementString = "\r\n";
    isInline = false;
    break;
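For the decimal, then alphabetic, then roman question, one option is a small formatter keyed on nesting depth. The sketch assumes you track the current ordered-list depth yourself (for example, a hypothetical `depth` value kept alongside `ListIndex`); `ListMarkers.FormatListMarker` is a hypothetical helper, not part of HAP:

```csharp
using System;
using System.Text;

public static class ListMarkers
{
    // depth 0 -> 1, 2, 3; depth 1 -> a, b, c; depth 2 -> i, ii, iii; then the cycle repeats.
    public static string FormatListMarker(int index, int depth)
    {
        if (index < 1) throw new ArgumentOutOfRangeException(nameof(index));
        if (depth < 0) throw new ArgumentOutOfRangeException(nameof(depth));

        switch (depth % 3)
        {
            case 0: return index.ToString();
            case 1: return ToAlpha(index);
            default: return ToRoman(index);
        }
    }

    // Bijective base 26: 1 -> a, 26 -> z, 27 -> aa.
    private static string ToAlpha(int n)
    {
        var sb = new StringBuilder();
        while (n > 0)
        {
            n--;
            sb.Insert(0, (char)('a' + n % 26));
            n /= 26;
        }
        return sb.ToString();
    }

    private static readonly int[] Values = { 1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1 };
    private static readonly string[] Numerals = { "m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i" };

    // Greedy conversion to lowercase roman numerals.
    private static string ToRoman(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < Values.Length; i++)
        {
            while (n >= Values[i])
            {
                sb.Append(Numerals[i]);
                n -= Values[i];
            }
        }
        return sb.ToString();
    }
}
```

In the `li` case this would become `outText.Write("\r\n\t{0}.", ListMarkers.FormatListMarker(textInfo.ListIndex++, depth));`.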

New Post: Can't seem to remove nodes

This is an embarrassingly trivial problem, but I've only ever used HAP for scraping websites and have never tried to manipulate the HTML.

I'm trying to remove an img node and neither of the approaches listed below seem to work.

(A)
HtmlAgilityPack.HtmlNode tipGif =
    doc.DocumentNode.SelectSingleNode("//img[@src = 'http://www.stuff.com/ico/annoying-gif.gif']");

if (tipGif != null)
{
    var nodesToRemove = doc.DocumentNode
        .SelectNodes("//img[@src = 'http://www.stuff.com/ico/annoying-gif.gif']")
        .ToList();

    foreach (var node in nodesToRemove)
        node.Remove();
}
(B)
var RL = doc.DocumentNode.SelectSingleNode("//img[@src = 'http://www.stuff.com/ico/annoying-gif.gif']");
RL.Remove();
I think I'm missing something fundamental about how this method works. I can post an example of the html but it's very vanilla.

Thanks in advance.
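Both snippets remove the node from the in-memory `HtmlDocument` only; nothing changes in the original file or page until the document is written back out (for example via `doc.DocumentNode.OuterHtml` or `doc.Save(...)`). Another common reason a lookup finds nothing is an exact `@src` comparison against a value that differs slightly in the markup (a relative URL, a query string); `contains()` is more forgiving. A sketch; `NodeRemoval.RemoveImages` is a hypothetical helper:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using HtmlAgilityPack;

public static class NodeRemoval
{
    // Removes every <img> whose src contains srcFragment and returns the updated markup.
    // srcFragment is inserted into the XPath as-is, so it must not contain a single quote.
    public static string RemoveImages(string html, string srcFragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection images =
            doc.DocumentNode.SelectNodes($"//img[contains(@src, '{srcFragment}')]");
        if (images != null) // null means nothing matched
        {
            foreach (HtmlNode img in images)
                img.Remove(); // detaches the node from its parent in this in-memory document
        }

        // The edit exists only in doc; serialize it (or call doc.Save) to see or keep the result.
        return doc.DocumentNode.OuterHtml;
    }
}
```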

New Post: gzip encoding

Has this been incorporated into the NuGet package? Still seeing this happening today.
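Whether or not the package handles this, one workaround that doesn't depend on `HtmlWeb` is to let `HttpClient` decompress the response before HAP sees it, or to decompress an already-downloaded gzip payload yourself. A sketch; `GzipLoading` and its methods are hypothetical names:

```csharp
// Requires the HtmlAgilityPack NuGet package.
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class GzipLoading
{
    // HttpClientHandler undoes gzip/deflate content encoding before the text reaches HAP.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        var handler = new HttpClientHandler
        {
            AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
        };
        using (var client = new HttpClient(handler)) // the client also disposes the handler
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(await client.GetStringAsync(url));
            return doc;
        }
    }

    // For a gzip-compressed payload that is already available as a stream.
    public static HtmlDocument LoadGzipped(Stream compressed)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress))
        {
            var doc = new HtmlDocument();
            doc.Load(gzip);
            return doc;
        }
    }
}
```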