EO.Pdf Performance Options
Carl Meyertons
Posted: Thursday, December 10, 2020 8:28:24 AM
Rank: Newbie
Groups: Member

Joined: 12/3/2020
Posts: 3
Hello, we have identified some significant performance issues with EO.Pdf.

Context:

We are a large company that runs EO.Pdf's HtmlToPdf.ConvertHtml. We typically run 16 conversions in parallel, and a single batch can generate up to 1,000-2,000 PDFs in total.

Problem Summary:

1. Adding an HTML header/footer requires its own task, effectively doubling the required HtmlToPdf.MaxConcurrentTaskCount (as far as I could find, the documentation does not mention this). It also reduces throughput considerably (roughly 50%).

2. Parallel execution does not scale linearly with processor count. Each conversion is approximately 50% slower when run in parallel than when run on a single thread, whereas we would expect a ratio closer to 1:1 (the measurements account for cold starts, etc.).

3. EO background threads, specifically EO.Base.ThreadRunnerBase, consume large amounts of CPU. We suspect some form of synchronous spin-locking is the culprit. The effect is amplified by problem 1, which doubles the number of background threads.

I am not sure how to post a profiler screenshot from the benchmark code below, but here is a summary of the time spent in EO for one of our medium-sized batches:

5.3M ms of thread time (36% of the total) is spent in EO. This batch also includes a lot of other business logic to gather and process data, so spending 36% of the overall time inside EO is a huge chunk.

Breakdown of the 5.3M ms:
- 4.4M: inside EO.Base.ThreadRunnerBase (is this where the thread waiting happens?)
- 0.384M: HtmlPdfSession calls (some method and RenderAsPdf); I assume this is actual work?
- 0.33M: WebView calls; more actual work?
- 0.23M: inside WaitableTask; more spin-locking?
- 0.86M: in 4 background threads hitting EO.Internal (not sure what they do, the code is obfuscated)

Benchmark results:
- Single, no header/footer: 526.3ms
- Parallel, no header/footer: 844.6ms
- Single, with header/footer: 844.6ms
- Parallel, with header/footer: 1245.6ms
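To put the "expected 1:1" comparison in numbers, here is a small sketch. It assumes, as the benchmark code below suggests, that each Parallel figure is the wall-clock time for n conversions running concurrently (n = Environment.ProcessorCount in the benchmark) and each Single figure is the time for one conversion. The class and method names are made up for illustration.

```csharp
using System;

// Sketch: derive scaling figures from the benchmark means above.
// Assumption: "Parallel" = wall-clock ms for n concurrent conversions,
// "Single" = wall-clock ms for one conversion.
public static class ScalingMath
{
    // 1.0 would be perfect 1:1 scaling; lower values mean each
    // concurrent conversion finishes more slowly than a lone one.
    public static double ParallelEfficiency(double singleMs, double parallelMs)
    {
        if (singleMs <= 0) throw new ArgumentOutOfRangeException(nameof(singleMs));
        if (parallelMs <= 0) throw new ArgumentOutOfRangeException(nameof(parallelMs));
        return singleMs / parallelMs;
    }

    // Speedup over running the same n conversions one after another.
    public static double Speedup(int n, double singleMs, double parallelMs)
        => n * ParallelEfficiency(singleMs, parallelMs);

    // Conversions completed per second for a batch of n taking batchMs.
    public static double Throughput(int n, double batchMs)
    {
        if (batchMs <= 0) throw new ArgumentOutOfRangeException(nameof(batchMs));
        return n * 1000.0 / batchMs;
    }
}
```

With the figures above, ParallelEfficiency(526.3, 844.6) is about 0.62 without a header/footer and ParallelEfficiency(844.6, 1245.6) about 0.68 with one. For 16 concurrent conversions, Speedup(16, 526.3, 844.6) comes out at roughly 10x rather than the ideal 16x.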

Benchmarking code
Code: C#
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Diagnosers;
using BenchmarkDotNet.Engines;
using BenchmarkDotNet.Exporters;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Validators;
using EO.Pdf;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace PwC.US.AM.TRACK.CLI.Forms.Test
{
    public static class EOTest
    {
        public static readonly IConfig ColdStartConfig = ManualConfig.Create(DefaultConfig.Instance)
            .AddJob(Job.InProcess.WithStrategy(RunStrategy.ColdStart).WithIterationCount(1))
            .AddValidator(JitOptimizationsValidator.DontFailOnError)
            .AddDiagnoser(MemoryDiagnoser.Default)
            .AddExporter(MarkdownExporter.GitHub)
            .WithOptions(ConfigOptions.DisableLogFile | ConfigOptions.DisableOptimizationsValidator)
        ;

        public static readonly IConfig ThroughputConfig = ManualConfig.Create(DefaultConfig.Instance)
            .AddJob(Job.InProcess.WithIterationCount(10).WithWarmupCount(1))
            .AddValidator(JitOptimizationsValidator.DontFailOnError)
            .AddDiagnoser(MemoryDiagnoser.Default)
            .AddExporter(MarkdownExporter.GitHub)
            .WithOptions(ConfigOptions.DisableLogFile | ConfigOptions.DisableOptimizationsValidator)
        ;

        public static void Execute()
        {
            // Header/footer rendering appears to need its own task, so allow twice the processor count.
            HtmlToPdf.MaxConcurrentTaskCount = Environment.ProcessorCount * 2;

            // Measure cold-start time.
            BenchmarkRunner.Run<Benchmark>(ColdStartConfig);

            // Measure throughput.
            BenchmarkRunner.Run<Benchmark>(ThroughputConfig);
        }

        public class Benchmark
        {
            public static readonly string Html = HtmlResource.test;

            private static readonly int[] _parallel = Enumerable.Range(0, Environment.ProcessorCount).ToArray();

            [Params(false, true)]
            public bool HeaderFooter { get; set; }


            [Benchmark(Baseline = true)]
            public void Single() => ConvertHtml();

            [Benchmark]
            public void Parallel()
            {
                _parallel.AsParallel().WithDegreeOfParallelism(Environment.ProcessorCount)
                    .ForAll(x => ConvertHtml())
                ;
            }

            private void ConvertHtml()
            {
                HtmlToPdf.Options.PageSize = new SizeF(8.5f, 11f);
                float x = 0.5f, y = 0.8f, width = 7.5f, height = 9.6f;
                HtmlToPdf.Options.OutputArea = new RectangleF(x, y, width, height);

                // Header/footer rendering only happens when these formats are set.
                if (this.HeaderFooter)
                {
                    HtmlToPdf.Options.HeaderHtmlFormat = "<div></div>";
                    HtmlToPdf.Options.FooterHtmlFormat = "<div></div>";
                }

                var sw = Stopwatch.StartNew();
                var eoPdfDocument = new PdfDocument();
                HtmlToPdf.ConvertHtml(Html, eoPdfDocument);
                HtmlToPdf.ClearResult();

                sw.Stop();

                Console.WriteLine($"Html converted in {sw.Elapsed}");
            }
        }
    }
}
eo_support
Posted: Thursday, December 10, 2020 1:50:43 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 23,072
Thanks for the detailed analysis. You may not want to set the degree of parallelism directly to your CPU count. We will try to fine-tune this on our end to see if we can make it scale better. In the meantime, you can try dividing the load across multiple AppDomains and see if that works better.
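As a rough sketch of capping parallelism below the CPU count, the PLINQ call in the benchmark could be replaced with Parallel.ForEach and an explicit MaxDegreeOfParallelism. The `convert` delegate is a hypothetical stand-in for the per-document work, and using half the logical processors is only an assumed starting point to benchmark against, not a value EO recommends.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sketch: run conversions with a concurrency cap below the CPU count.
// `convert` is a hypothetical placeholder for the real per-document work.
public static class ThrottledRunner
{
    // Assumed default: half the logical processors, but at least one.
    public static int DefaultDegree() => Math.Max(1, Environment.ProcessorCount / 2);

    public static void Run<T>(IEnumerable<T> items, Action<T> convert, int maxDegree)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (convert == null) throw new ArgumentNullException(nameof(convert));
        if (maxDegree < 1) throw new ArgumentOutOfRangeException(nameof(maxDegree));

        // Parallel.ForEach never runs more than maxDegree items at once.
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.ForEach(items, options, convert);
    }
}
```

For example, `ThrottledRunner.Run(htmlDocs, html => HtmlToPdf.ConvertHtml(html, new PdfDocument()), ThrottledRunner.DefaultDegree());`, where `htmlDocs` is a hypothetical collection of HTML strings. Based on the observation in the original post, HtmlToPdf.MaxConcurrentTaskCount would presumably need to stay at least twice the chosen degree when headers/footers are used. Trying a few degrees and comparing total batch time is the safest way to pick one.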

The root of the issue is that a single HTML to PDF conversion task involves many threads. The threads you see are only the "driver" threads that coordinate all the other worker threads (those run inside child processes, not in your process). The key internal objects involved are:

1. Browser engine. The browser engine manages many different tasks such as networking, caching, cookies, etc. Internally, even a single browser engine runs dozens of threads to perform all these tasks concurrently.
2. Renderer. A renderer is a separate process with a dedicated thread that drives the UI of a web page; for example, JavaScript code in the page runs on this thread. This thread is also responsible for collecting render results from the native browser engine and sending them to the .NET side. It is the thread associated with the ThreadRunner's thread, which is why you see that thread taking up significant CPU time.
3. GPU. By default the renderer uses the GPU to expedite certain rendering tasks. However, because the GPU is primarily designed for screen-related buffers, it may not scale well with a large number of concurrent tasks. In such environments the GPU can become a bottleneck and slow things down. We will see if we can add an option that allows users to disable the GPU.

Internally we maintain a pool of browser engines and renderers and automatically allocate and free them as needed. The parameters used by this pool need to be revisited and fine-tuned as computers grow more powerful; our current parameters are based on tests conducted a few years ago. Hopefully a future build will scale better.

