
New Post: StackOverflowException workaround

I got a StackOverflowException while scanning sites with a complicated DOM.
Example HTML documents were uploaded by Dzonny here (https://www.codeplex.com/Download?ProjectName=htmlagilitypack&DownloadId=636619).
I found an easy workaround: run HtmlDocument.Load in a separate Thread with an increased stack size.

Adjust the stackSize variable (in bytes) until it is big enough for your documents.
using System;
using System.Threading;
using HtmlAgilityPack;

namespace StackOverflowTest
{
    class Program
    {
        static void Main(string[] args)
        {
            MyThread mt = new MyThread();

            // Maximum stack size for the worker thread, in bytes (about 10 MB).
            int stackSize = 10000000;
            Thread thread = new Thread(mt.Run, stackSize);
            thread.Start();

            // Block until the worker finishes instead of polling mt.Running with sleeps.
            thread.Join();
            Console.WriteLine("done");
            
            //do your stuff
            var hrefs = mt.page.DocumentNode.SelectNodes("//a[@href]");
        }
    }

    public class MyThread
    {
        public bool Running { get; private set; }
        public HtmlDocument page { get; private set; }

        public MyThread()
        {
            Running = false;
        }

        public void Run()
        {
            Running = true;
            page = new HtmlDocument();
            page.Load("HtmlAgilityPackStackOverflow1.html");
            Running = false;
        }
    }
}
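
If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch under my own assumptions: the BigStack class and its Run method are hypothetical names, not part of HtmlAgilityPack. It only relies on the Thread(ThreadStart, int) constructor and Thread.Join from System.Threading.

```csharp
using System;
using System.Threading;

// Hypothetical helper (not part of HtmlAgilityPack).
static class BigStack
{
    // Runs func on a dedicated thread whose maximum stack size is
    // stackSize bytes, waits for it to finish, and returns its result.
    public static T Run<T>(Func<T> func, int stackSize)
    {
        T result = default(T);
        Exception error = null;

        var thread = new Thread(() =>
        {
            // Ordinary exceptions are captured and rethrown on the caller.
            // A StackOverflowException cannot be caught and still ends the
            // process, so stackSize has to be large enough up front.
            try { result = func(); }
            catch (Exception ex) { error = ex; }
        }, stackSize);

        thread.Start();
        thread.Join(); // Join also makes the worker's writes visible here.

        if (error != null)
            throw new InvalidOperationException("Worker thread failed.", error);
        return result;
    }
}
```

With it, the loading code from the post would go inside the lambda: create the HtmlDocument, call Load, return the document, and pass 10000000 as stackSize.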
