Output Paging

Auto Paging

By default, the HTML to PDF converter pages the contents automatically. This is sufficient in most cases.

Manual Paging with CSS

You can explicitly insert a page break or avoid automatic page breaks in your HTML page with CSS. The following code places a page break before the DIV block:

HTML
<div style="page-break-before: always">
A page break will be inserted BEFORE this div because
"page-break-before" is set to "always".
</div>

The following code places a page break after the DIV block:

HTML
<div style="page-break-after: always">
A page break will be inserted AFTER this div because
"page-break-after" is set to "always".
</div>

The following code prevents the converter from breaking the DIV into multiple pages:

HTML
<div style="page-break-inside: avoid">
No matter how much content you place here, the converter
will NOT break this DIV into multiple pages because 
"page-break-inside" is set to "avoid".
</div>

A common requirement is to keep images from being split across pages. Add the following CSS rule to your page to achieve this:

CSS
img
{
    page-break-inside: avoid;
}

If the contents of an element with "page-break-inside: avoid" do not fit on the current page, the excess portion is clipped.

Note: Custom paging does not support repeating table headers and footers.

Custom Paging

EO.Pdf also allows you to implement your own custom paging logic. To perform custom paging, you must first use HtmlToPdfSession.CreatePaginator to create a Paginator object:

using (HtmlToPdfSession session = HtmlToPdfSession.Create(options))
{
    //Load the page to be converted
    session.LoadUrl(url);
    
    //Create a Paginator object
    Paginator paginator = session.CreatePaginator();
    
    //Perform custom paging
    .....
    
    //Use the custom paging result to render the PDF
    HtmlToPdfResult result = session.RenderAsPDF(paginator);

    //Save the result
    result.PdfDocument.Save(pdf_file_name);
}

A Paginator object always contains both the current paging settings and the current paging results. The paging settings consist of paging input information on each HTML node, with the HTML body element as the root node. The following code retrieves the root node:

//Retrieve the root HTML node
HtmlElement body = paginator.Document.Body;

The HtmlElement class exposes a ChildNodes collection through which you can traverse all child nodes recursively. The built-in paging algorithm uses the PageBreakMode, PageBreakRange and PageBreakLineRanges properties of each node.

The current paging result is available through the Paginator's Pages collection.

Follow these steps to implement your own custom paging logic:

  1. Examine the current paging settings (PageBreakMode, PageBreakRange and PageBreakLineRanges) and the paging result (the Paginator's Pages collection);
  2. Modify the above paging settings if needed;
  3. Call PageInfo.PageAgain to run the built-in paging algorithm again. This method re-pages the current page and all the pages after it. Once this method returns, the Paginator's Pages collection contains the new result;
  4. Repeat the above steps as needed.

The following code disables all CSS page-break instructions on all DIVs of the second page and limits the page to a maximum of 500 pixels:

//Get all DIVs in the document
HtmlElement[] divs = paginator.Document.GetElementsByTagName("DIV");
foreach (HtmlElement div in divs)
{
    //Disable CSS page-break instructions if the DIV is on the second page
    if (div.Location.PageIndex == 1)
        div.PageBreakMode = PageBreakMode.None;
}

//Run built-in paging algorithm again with the new page break mode settings
paginator.Pages[1].PageAgain(500);

Since the above code calls PageAgain on the second page, it only changes the page break position of the second page and all pages after the second page. It does not affect the first page's page break position.

Troubleshooting Paging Problems

This section explains the paging process and common problems that you may encounter during paging. The converter performs paging as follows:

  1. The converter scans the whole HTML to determine "unbreakable ranges". For example, if there are two lines of text with their Y positions at 0 to 20 and 20 to 40 respectively, then 0 to 20 is one unbreakable range and 20 to 40 is another. This means a page break cannot occur inside either range. For example, a page break at 30 would split the second line of text across two pages;
  2. Text lines are automatically recognized as unbreakable ranges. You can use "page-break-inside:avoid" to define additional unbreakable ranges. For example, consider the following HTML:
    HTML
    <div style="page-break-inside: avoid;position:absolute;top:0px;height:100px;width:200px;">
    some contents
    </div>
    The above HTML defines an additional unbreakable range from 0 to 100;
  3. Multiple overlapping unbreakable ranges can form a single larger unbreakable range. Consider the following HTML:
    HTML
    <div style="page-break-inside: avoid;position:absolute;top:0px;height:100px;width:200px;">
    some contents
    </div>
    <div style="page-break-inside: avoid;position:absolute;top:50px;height:100px;width:200px;">
    some contents
    </div>
    This creates a single unbreakable range from 0 to 150, since the two ranges (0 to 100 and 50 to 150) overlap. Thus a page break cannot occur anywhere between 0 and 150.
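
The merging rule described in the steps above can be illustrated with a small standalone sketch. It is not EO.Pdf code and does not reflect the converter's actual implementation. YRange, Merge and CanBreakAt are hypothetical names introduced only for this illustration, and the sketch assumes that ranges which merely touch (such as 0 to 20 and 20 to 40) remain separate, as in the text-line example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only; not part of EO.Pdf.
public readonly struct YRange
{
    public YRange(double top, double bottom) { Top = top; Bottom = bottom; }
    public double Top { get; }
    public double Bottom { get; }
}

public static class UnbreakableRangeSketch
{
    // Combines strictly overlapping ranges into larger ones.
    // Ranges that only share a boundary are kept separate (assumption),
    // so a page break is still allowed exactly on that boundary.
    public static List<YRange> Merge(IEnumerable<YRange> ranges)
    {
        var merged = new List<YRange>();
        foreach (var r in ranges.OrderBy(x => x.Top))
        {
            int last = merged.Count - 1;
            if (last >= 0 && r.Top < merged[last].Bottom)
            {
                // Overlap: extend the previous range instead of adding a new one.
                merged[last] = new YRange(
                    merged[last].Top,
                    Math.Max(merged[last].Bottom, r.Bottom));
            }
            else
            {
                merged.Add(r);
            }
        }
        return merged;
    }

    // A break at y is allowed unless y lies strictly inside a merged range.
    public static bool CanBreakAt(IEnumerable<YRange> merged, double y)
    {
        return !merged.Any(r => y > r.Top && y < r.Bottom);
    }
}
```

With the two DIVs from step 3 (0 to 100 and 50 to 150), Merge returns a single range from 0 to 150. With the two text lines from step 1, it returns two ranges, and CanBreakAt reports that a break at 20 is allowed while a break at 30 is not.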

One common paging problem is that certain styles in the HTML unintentionally create large unbreakable ranges. Consider the following paragraph of text:

HTML
<p style="font-size:20px;line-height:15px;">
A long paragraph that contains many lines....
</p>

Normally each line of a paragraph forms its own unbreakable range, so the converter fits as many lines as possible on the current page and, when no room is left, continues placing lines on the next page. However, because the line-height of the above text is smaller than its font size, the area occupied by each text line overlaps with adjacent lines. This causes their unbreakable ranges to combine into a single range that covers the whole paragraph. When this happens, the paragraph is not divided across pages.
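
The arithmetic behind this can be sketched under a deliberately simplified model: assume each text line occupies a box as tall as the font size, and that consecutive lines start one line-height apart. Both the model and the LargestRangeHeight helper are assumptions made for illustration; the converter's real text measurement is likely more involved.

```csharp
using System;

// Hypothetical helper for illustration only; not part of EO.Pdf.
public static class LineHeightSketch
{
    // Height of the largest unbreakable range produced by a paragraph,
    // under the simplified model: line box height = font size,
    // distance between line tops = line-height.
    public static double LargestRangeHeight(int lineCount, double fontSizePx, double lineHeightPx)
    {
        if (lineCount <= 0) return 0;

        // Line boxes overlap only when lines are spaced closer than they are tall.
        bool overlap = lineHeightPx < fontSizePx;

        // Overlapping lines fuse into one range that runs from the top of the
        // first line to the bottom of the last; otherwise each line stands alone.
        return overlap
            ? (lineCount - 1) * lineHeightPx + fontSizePx
            : fontSizePx;
    }
}
```

For the paragraph above (font-size 20px, line-height 15px), 60 lines would yield a single range of 59 * 15 + 20 = 905 pixels under this model, whereas a line-height of 24px would leave 60 separate ranges of 20 pixels each.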

When the converter encounters a large unbreakable range, it always tries to fit as much of it as possible. If the current page does not have enough space available, the converter tries to start a new page in the hope that the new page will have more. This can produce undesired results. For example, a document commonly reserves some extra white space before the first paragraph on the first page. If there is a large unbreakable range at the beginning of the document, the converter may move the whole block to the second page and leave the first page completely blank, because the second page does not have the extra white space and therefore has more available space.

In all cases, when the converter encounters an unbreakable range that is larger than the page height, the contents overflow into the footer area of the page until they are cut off at the page boundary. In this case, contents that overflow into the footer area may overlap with the footer, and contents beyond the page boundary are not visible. To avoid such issues, examine both explicit page-break-inside:avoid CSS declarations and implicit situations such as overlapping text lines.
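
The placement behavior described in the last two paragraphs can be summarized as a small decision sketch. It is an interpretation of the prose above, not the converter's actual algorithm; the Placement type, the Place function and its parameters are hypothetical.

```csharp
using System;

// Hypothetical result type for illustration only; not part of EO.Pdf.
public readonly struct Placement
{
    public Placement(bool movesToNextPage, bool overflows)
    {
        MovesToNextPage = movesToNextPage;
        Overflows = overflows;
    }
    public bool MovesToNextPage { get; }
    public bool Overflows { get; }
}

public static class PlacementSketch
{
    // rangeHeight:        height of the unbreakable range
    // remainingOnCurrent: space still free on the current page
    // availableOnNext:    space a fresh page would offer
    public static Placement Place(double rangeHeight, double remainingOnCurrent, double availableOnNext)
    {
        // The range fits where it is: no page break is needed.
        if (rangeHeight <= remainingOnCurrent)
            return new Placement(false, false);

        // A fresh page offers more room, so the whole range moves there.
        // It still overflows if it is taller than that page.
        if (availableOnNext > remainingOnCurrent)
            return new Placement(true, rangeHeight > availableOnNext);

        // A new page would not help: the range stays and overflows.
        return new Placement(false, true);
    }
}
```

As a made-up example, suppose the first page has 700 pixels left after its reserved white space and a fresh page offers 1000 pixels. A 900-pixel range at the start of the document then moves to the second page without overflowing, which leaves the first page blank. A 1200-pixel range also moves, but still overflows toward the footer area.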