Wrong Y coordinates with page breaks
Guenter
Posted: Wednesday, June 24, 2015 12:05:44 PM
Rank: Member
Groups: Member

Joined: 6/11/2015
Posts: 10
Hi,

I use EO.PDF (version 15.1.70.4) and need to get table row positions. A table without a separate head (<thead>) and body (<tbody>) works, but if I need a table header that is repeated on every page (display: table-header-group), the Y coordinates are not correct.

A simple test case is the HTML below (I've marked the Y coordinates in the resulting PDF file with lines in different colors). It consists of two tables nested in an outer table, and the rows on the left side have a different height from the rows on the right side. If you feed this file to the C# code below, a PDF is rendered in which the lines are at the correct positions on the first page, but not on the second page.
The reason is the wrong Y position for the row on the second page (output.log: "Side: Right Row: 6 Page: 1 PageHeight: 595 Y: 72 Height: 56,51386"). If you remove all thead and tbody elements from the source HTML and send it to the C# class, the strokes are at the right position (output.log: "Side: Right Row: 7 Page: 1 PageHeight: 595 Y: 72 Height: 56,51386"). The log reports the same Y position for both runs, but that is impossible, because the first run renders an additional row (the repeated <thead>) on the second page.

Procedure in brief:
1) Save the HTML file to disk
2) Save the C# file to disk
3) Adjust the parameters and run the C# code
4) Remove the <thead> and <tbody> elements from the HTML and save it again
5) Run the C# code again
6) Compare the log entries for both runs



Do I have to include the height of the header row in calculations for all pages after the first page or is this a bug in EO?
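
To make the question concrete, here is a minimal sketch of the arithmetic involved. It assumes, as the test class below does, that row positions are reported in inches measured from the top of the page, while PDF drawing coordinates are in points measured from the bottom. The class and method names are hypothetical and not part of EO.Pdf, and the header compensation is only what a caller-side workaround might look like if the library really left the repeated header out of Y. Whether such a workaround is needed at all is exactly the open question.

```csharp
// Hypothetical helper, not part of EO.Pdf.
public static class RowCoordinates
{
    public const float PointsPerInch = 72f;

    // Converts a top-down Y position (inches from the page top) into a
    // bottom-up PDF Y position in points.
    public static float ToPdfY(float pageHeightInches, float yInches)
    {
        return (pageHeightInches - yInches) * PointsPerInch;
    }

    // Assumed workaround: on every page after the first, shift the reported Y
    // down by the height of the repeated header row.
    public static float CompensatedY(float yInches, int pageIndex, float headerHeightInches)
    {
        return pageIndex > 0 ? yInches + headerHeightInches : yInches;
    }
}
```

For example, RowCoordinates.ToPdfY(8.25f, 1f) gives 522 points, and RowCoordinates.CompensatedY(1f, 1, 0.5f) gives 1.5 inches for a row on the second page.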



A simple test HTML (should be saved to disk):
Code:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <style type="text/css">
            table.parent            { table-layout: fixed; border-width: 0; border-style: none; }
            table.child                { width: 100%; table-layout: fixed; border-width: 0; border-style: none; font-size: 2em;  }
            td.parent                { vertical-align: top; padding-left: 10px; padding-top: 10px; padding-right: 10px; }
            tr.avoidinside            { page-break-inside: avoid; }     
            tr.avoidinsidebefore    { page-break-inside: avoid; page-break-before: avoid; }
            /* remove the next two lines if the header should not be repeated on the second page */
            thead                      { display: table-header-group; font-weight: bold; font-style: italic; }
            tbody                      { display: table-row-group }
            /* also remove all <thead>, </thead>, <tbody> and </tbody> elements if the header should not be repeated on the second page */
        </style>
    </head>
    <body>
        <table class="parent">
            <tr>
                <td class="parent">
                    <table id="childleft" class="child">                        
                        <thead>
                        <tr>
                            <td colspan="2">Header Left</td>                            
                        </tr>    
                        </thead>
                        <tbody>
                        <tr class="avoidinside">
                            <td>Left 1</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 2</td>
                            <td>no page-break</td>
                        </tr>
                        </tbody>
                    </table>
                </td>
                <td class="parent" style="border-left-width: 1.1pt; border-left-style: solid; border-left-color: #000000;">
                    <table id="childright" class="child">
                    <thead>
                        <tr>
                            <td colspan="2">Header Right</td>                            
                        </tr>    
                        </thead>
                        <tbody>
                        <tr class="avoidinside">
                            <td>Right 1</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 2</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 3</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 4</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 5</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 6</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 7</td>
                            <td>no page-break inside</td>
                        </tr>
                        </tbody>
                    </table>
                </td>
            </tr>
        </table>        
    </body>
</html>


A simple C# test class (the input constant should point to the HTML file above):
Code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Drawing;
using System.IO;
using System.Linq;
using EO.Pdf;
using EO.Pdf.Contents;
using EO.Pdf.Drawing;

namespace Test
{
    class TestEo
    {
        private const int DPI = 72;
        private const String input = @"C:\input.html";
        private const String output = @"C:\output.pdf";
        private const String log = @"C:\output.log";
        private static TextWriter logWriter;

        static void Main(string[] args)
        {
            using (logWriter = File.AppendText(log))
            {
                HtmlToPdf.Options.PageSize = new SizeF(PdfPageSizes.A4.Height, PdfPageSizes.A4.Width);
                var pageResult = HtmlToPdf.ConvertUrl(input, output);

                //left table with two rows on one page
                var childLeft = pageResult.HtmlDocument.GetElementById("childleft");
                var childLeftBody = childLeft.ChildElements.First(pa => pa.TagName.Equals("TBODY"));
                var childLeftRows = childLeftBody.ChildElements.Where(bo => bo.TagName.Equals("TR")).OrderBy(ro => ro.Location.Page.Index).ThenBy(ro => ro.Location.Y).ToList();
                PrintRows(childLeftRows, true);

                //right table with seven rows on two pages
                var childRight = pageResult.HtmlDocument.GetElementById("childright");
                var childRightBody = childRight.ChildElements.First(pa => pa.TagName.Equals("TBODY"));
                var childRightRows = childRightBody.ChildElements.Where(bo => bo.TagName.Equals("TR")).OrderBy(ro => ro.Location.Page.Index).ThenBy(ro => ro.Location.Y).ToList();
                PrintRows(childRightRows, false);

                //draw blue line before first row and green line after last row on first and second page
                MarkLines(pageResult, childLeftRows, true);
                MarkLines(pageResult, childRightRows, false);

                pageResult.PdfDocument.Save(output);

                Process.Start(output);

            }
        }

        private static void PrintRows(List<HtmlElement> rows, bool isLeft)
        {
            for (int i = 0; i < rows.Count; i++)
            {
                var row = rows[i];
                logWriter.WriteLine("Side: {0} Row: {1} Page: {2} PageHeight: {3} Y: {4} Height: {5}", isLeft ? "Left":"Right", i, row.Location.Page.Index, row.Location.Page.Size.Height * DPI, row.Location.Y * DPI, row.Size.Height * DPI);   
            }           
        }

        private static void MarkLines(HtmlToPdfResult result, List<HtmlElement> rows, bool isLeft)
        {
            for (int i = 0; i < 2; i++)
            {
                var firstRow = rows.FirstOrDefault(ro => ro.Location.Page.Index == i);
                var lastRow = rows.LastOrDefault(ro => ro.Location.Page.Index == i);
                if (firstRow != null && lastRow != null)
                {
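                    // Sizes and locations are multiplied by DPI below to convert them to PDF points.
                    float pageHeight = firstRow.Location.Page.Size.Height;
                    float pageWidth = firstRow.Location.Page.Size.Width;
                    float startX = 0;
                    // PDF Y coordinates run bottom-up, so the top-down row position is flipped.
                    float startY = (pageHeight - firstRow.Location.Y)*DPI;
                    float endY = (pageHeight - lastRow.Location.Y - lastRow.Size.Height)*DPI;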
                    if (!isLeft)
                    {
                        startX = pageWidth*DPI - 100;                       
                    }
                    DrawLine(result.PdfDocument.Pages[i], startX, startY, startX+100,
                        startY, Color.Blue);
                    DrawLine(result.PdfDocument.Pages[i], startX, endY, startX+100,
                        endY, Color.Green);
                }
            }
        }

        private static void DrawLine(PdfPage page, float startX, float startY, float endX, float endY, Color color)
        {
            if (page != null)
            {
                var lineContainer = new PdfPathContent();
                var subPath = new PdfSubPath();
                subPath.From = new PdfPoint(startX, startY);
                var line =
                    new PdfPathLineSegment(new PdfPoint(endX, endY));
                subPath.Segments.Add(line);
                lineContainer.Path.SubPaths.Add(subPath);
                lineContainer.LineWidth = 0.7f;
                lineContainer.StrokingColor = new EO.Pdf.Drawing.PdfColor(color);
                lineContainer.Action = EO.Pdf.Contents.PdfPathPaintAction.Stroke;
                page.Contents.Add(lineContainer);
            }
        }
    }
}


Thanks
Guenter
eo_support
Posted: Thursday, June 25, 2015 10:02:30 PM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,083
Thank you very much for the detailed test code. We have been able to reproduce and fix the problem. We will try to post a new build as soon as possible and reply again when the new build is posted.
Guenter
Posted: Thursday, July 2, 2015 5:13:25 AM
Thanks for the reply.

Do you have a timeframe for the new test build? I need this functionality for a customer project where everything is finished except for the PDF creation.
eo_support
Posted: Thursday, July 2, 2015 9:43:03 AM
Hi,

Sorry that we didn't update you earlier. Please download the new build from our download page (2015.1.81.8). This build contains the fix for this issue.

Thanks!
Guenter
Posted: Thursday, July 2, 2015 11:06:46 AM
Thanks for your prompt response.

I can confirm that the new version corrects the problem in the simple test scenario. I will post back if I find another problem with more complex HTML files.
eo_support
Posted: Thursday, July 2, 2015 7:41:19 PM
Great. Thanks for confirming the fix!
Guenter
Posted: Wednesday, August 5, 2015 4:00:43 AM
Hi,

After a few weeks of testing, I've found a few cases where the revised code introduces a new problem.
The repetition of the header is not as reliable as it was in previous years, and in certain circumstances the position of the page break seems more or less random. If you feed the following simple HTML file to the test class from post #1, a PDF with two pages is rendered:

Code:

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <style type="text/css">
            table.parent            { table-layout: fixed; border-style: none; }
            table.child                { width: 100%; table-layout: fixed; border-style: none; font-size: 2em;  }
            td.parent                { vertical-align: top; padding-left: 10px; padding-top: 10px; padding-right: 10px; }
            tr.avoidinside            { page-break-inside: avoid; }     
            THEAD                    { display: table-header-group }
            TBODY                   { display: table-row-group }

        </style>
    </head>
    <body>
        <table class="parent">
            <tr>
                <td class="parent">
                    <table id="childleft" class="child">
                        <thead>
                            <tr>
                                <td colspan="2">Header Left</td>                            
                            </tr>    
                        </thead>
                        
                        <tr class="avoidinside">
                            <td>Left 1</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 2</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 3</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 4</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 5</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 6</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 7</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 8</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 9</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 10</td>
                            <td>no page-break</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Left 11</td>
                            <td>no page-break</td>
                        </tr>
                        <!-- without this line everything works as expected    
                        <tr class="avoidinside">
                            <td>Left 12</td>
                            <td>no page-break</td>
                        </tr>
                        -->
                    </table>
                </td>
                <td class="parent" style="border-left-width: 1.1pt; border-left-style: solid; border-left-color: #000000;">
                    <table id="childright" class="child">
                        <thead>
                        <tr>
                            <td colspan="2">Header Right</td>                            
                        </tr>    
                        </thead>
                        
                        <tr class="avoidinside">
                            <td>Right 1</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 2</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 3</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 4</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 5</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 6</td>
                            <td>no page-break inside</td>
                        </tr>
                        <tr class="avoidinside">
                            <td>Right 7</td>
                            <td>no page-break inside</td>
                        </tr>
                        
                    </table>
                </td>
            </tr>
        </table>
        
    </body>
</html>


If you uncomment the last row (<tr>) on the left side ("Left 12") and feed the file to the test class again, things get strange. The first page shows only a small remnant of the border and no header. The second page shows all the content that was split across two pages in the first run. Oddly enough, the right side is taller than the left side, so an additional row on the left side should have no consequences.

I've found cases where the header is not rendered on the first page and cases where the header is missing on the second page, but the cause seems to be the same. The missing header appears only after adding a few rows to the HTML.



Thanks
Guenter
eo_support
Posted: Wednesday, August 5, 2015 8:51:29 PM
Hi,

We have looked into your test file. This is the expected behavior. The complicating factor is the interaction between page breaks and line breaks. When EO.Pdf tries to insert a page break, it considers the following factors:

1. Your explicit page break instructions, for example page-break-inside: avoid;
2. Text lines. It will try not to break in the middle of a text line.

This seems simple but gets complicated when you have two columns with different text. Suppose the Y positions of the text lines in the left column are:

line 0: 0 to 10
line 1: 10 to 20
line 2: 20 to 30

And the Y positions of the text lines in the right column are:

line 0: 0 to 8
line 1: 8 to 16
line 2: 16 to 24

The correct positions to insert a page break according to the left column would be 0, 10 and 20. However, 10 and 20 both split text lines in the right column, which leaves no usable break position inside the block. To avoid this situation, the converter combines all three lines into a single unbreakable block. When this block is bigger than one page, the converter pushes it to the next page in the hope that it will fit there. If it still does not fit, the converter has to cut the text. This is what happened with your HTML.
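
The example above can be checked with a short sketch. It is only an illustration of the idea, not EO.Pdf's actual implementation, and the BreakPoints class and its methods are hypothetical names. It stacks line (or row) heights from Y = 0 for each column and keeps only the positions where both columns have a boundary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of EO.Pdf.
public static class BreakPoints
{
    // Returns the boundaries of consecutive blocks stacked from Y = 0,
    // e.g. heights 10, 10, 10 give 0, 10, 20, 30.
    public static List<double> Boundaries(IEnumerable<double> heights)
    {
        var result = new List<double> { 0 };
        double y = 0;
        foreach (var h in heights)
        {
            y += h;
            result.Add(y);
        }
        return result;
    }

    // Returns the positions at which both columns have a boundary, i.e. the
    // only places where a break would not cut a block in either column.
    public static List<double> Common(List<double> left, List<double> right, double eps = 1e-6)
    {
        return left.Where(l => right.Any(r => Math.Abs(l - r) < eps)).ToList();
    }
}
```

With line heights of 10 on the left and 8 on the right, the only shared boundary is 0, the top of the block, so there is no break position inside it. With three blocks of height 2 on the left and two blocks of height 3 on the right, the shared boundaries are 0 and 6.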

In your case, you do have the same line height, but the left side and the right side wrap differently (some cells wrap to two lines, some to three). This wrapping, together with the default top/bottom padding of td, caused the shifting and mismatch between the left and right lines. As a test, you can add the following CSS to your page:

Code: CSS
td
{
    margin: 0px;
    padding: 0px;
}


You will see that it breaks after Left 3 and Right 2. This is because that is the only common break point between the left and right columns (the left column has 3 rows with 6 lines of text, whereas the right column has 2 rows, also with 6 lines of text). If you further remove page-break-inside: avoid from your tr elements and add vertical-align: top to the td rule (both would cause the text lines in the left column and the text lines in the right column to mismatch), you will see a perfectly "normal" page break after line 11, as you originally expected.

Hope this makes sense to you. Please feel free to let us know if you still have any questions.

Thanks!
Guenter
Posted: Thursday, August 6, 2015 11:15:19 AM
Hi,

But isn't that the wrong way of interpreting CSS?

Inheritance in CSS works from top to bottom. There are no page-break attributes on the outer table, so this table could be ignored as far as page breaks are concerned. Inside this table are two child tables, and each of them should be interpreted separately. If you draw a "virtual" line from left to right across both tables and try to find a single position where a page break is allowed on both sides, it is more than likely that a lot of space is wasted (as in my example, where the page break comes after the first or second row and the next ten rows are rendered on the second page).

Of course the current logic is easier to implement and executes faster, but I think the CSS way would be to render the left table until it needs a page break, then render the right table up to its own page break, and finally continue on the second page from the two different starting points.

Do you use the CSS page-break handling from Chrome, or is this your own extension? I've tested a few sample HTML files with wkhtmltopdf, and my impression is that at least this library interprets page breaks in a similar way to what I describe.

Thanks
Guenter
eo_support
Posted: Thursday, August 6, 2015 11:52:46 AM
Hi Guenter,

I don't think you understood the key point in my previous reply. The page break that causes the problem does not come from your CSS (even though your CSS does complicate the matter). The root of the problem is the implied "page-break-inside: avoid" that we automatically apply to text lines. In other words, we automatically apply page-break-inside: avoid to every single line of text. This prevents text lines from being sliced between pages and allows us to insert page breaks only between text lines. There is no equivalent in standard CSS, since the smallest scope you can apply CSS to is an element, whereas we apply it to a single text line.

This works remarkably well with "regular" pages, but it runs into problems if such a "clear cut" line does not exist. In your case, due to the different alignment/wrapping in the left and right tables, no matter where we insert the page break, it is going to slice some text lines, and this causes the page break to end up in some strange places. If we didn't try to keep single text lines whole, we would get a result similar to other products.
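
As a rough illustration of this rule (a sketch under stated assumptions, not the converter's real code), the following hypothetical LineSafeBreak helper treats every text line as an unbreakable range and looks for the lowest line edge within the page limit that does not cut any line in any column.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of EO.Pdf.
public static class LineSafeBreak
{
    // A position is safe when it does not fall strictly inside any line.
    public static bool IsSafe(double y, IEnumerable<(double Top, double Bottom)> lines)
    {
        return lines.All(l => y <= l.Top || y >= l.Bottom);
    }

    // Returns the largest line bottom within (0, limit] that is safe,
    // or null when every candidate position would cut some line.
    public static double? FindBreak(IReadOnlyList<(double Top, double Bottom)> lines, double limit)
    {
        var candidates = lines
            .Select(l => l.Bottom)
            .Where(b => b > 0 && b <= limit)
            .Distinct()
            .OrderByDescending(b => b);

        foreach (var candidate in candidates)
        {
            if (IsSafe(candidate, lines))
            {
                return candidate;
            }
        }
        return null;
    }
}
```

For a single column with lines at 0 to 10, 10 to 20 and 20 to 30 and a limit of 25, FindBreak returns 20. If a second column with lines at 0 to 8, 8 to 16 and 16 to 24 is added to the same list, FindBreak returns null, which corresponds to the case where no "clear cut" line exists.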

Thanks!
Guenter
Posted: Thursday, November 3, 2016 9:50:54 AM
Hi,

I have to reopen this bug report, because I've switched from EO2015 to EO2016 (16.2.44, 16.2.50) and the wrong behaviour from post #1 has reappeared.
You only have to add at least one additional <tr> (e.g. "Right 8") to the sample HTML, because EO2016 renders the page a bit differently and 7 rows are no longer enough to force a page break.
A test with 15.3.78.0 shows the start point for positioning (blue line) on page 2 after the header row, whereas EO2016 ignores the repeated header and shows the start point before the header row.

Thanks
eo_support
Posted: Thursday, November 3, 2016 1:33:04 PM
Hi,

This is just to let you know that we have found the root cause of the problem. We will post a new build with the fix as soon as possible.

Thanks!
eo_support
Posted: Saturday, November 5, 2016 6:15:16 PM
Hi,

This is just to let you know that we have posted a new build that should fix this problem. You can download the new build from our download page. Please take a look and let us know how it goes.

Thanks!
Guenter
Posted: Monday, November 7, 2016 4:21:32 AM
Thanks for the new version.

I have tested a few samples and can confirm that the new version corrects the problem. I will post back if the problem recurs with more complex HTML files.
eo_support
Posted: Monday, November 7, 2016 8:31:18 AM
Great. Thanks for the update!

