
PdfDocument.Merge has inconsistent behavior regarding PDF/A options
Jarle
Posted: Tuesday, October 9, 2018 10:13:28 AM
Rank: Member
Groups: Member

Joined: 8/30/2018
Posts: 15
The method PdfDocument.Merge handles PDF/A inconsistently.

Sometimes the merged file is marked as PDF/A-1a, sometimes not. When it is marked as PDF/A-1a, it sometimes passes Adobe Preflight validation and sometimes fails.
Does PdfDocument.Merge officially support PDF/A?
Or am I using the method incorrectly?
Is there a bug here?
Or is it only because I am testing EO.Pdf without a valid license?

It seems that the first document that is read/merged determines which PDF standard the finished file claims to follow.

See all my C# examples:
docAC_2 and docCA_3 give very inconsistent results.
docAB_1 and docAB_2 are also very strange.


Code: C#
public static void Poc_TestPdfA()
        {
            // Testing with Adobe Preflight, 11.0.20
            // EO.Pdf from NuGet version 18.2.74

            var docA = new PdfDocument();
            docA.Standard = PdfStandard.PDF_A;
            HtmlToPdf.ConvertHtml("Page A text text text text text text", docA);
            docA.Save(@"C:\Temp\EO-docA.pdf");
            // docA Claims to be PDF/A-1a 
            // Adobe Preflight: "Verify compliance with PDF/A-1a": No problems found
            
            var docB = new PdfDocument();
            docB.Standard = PdfStandard.PDF_A;
            HtmlToPdf.ConvertHtml("Page B text text text text text text", docB);
            docB.Save(@"C:\Temp\EO-docB.pdf");
            // docB Claims to be PDF/A-1a 
            // Adobe Preflight: "Verify compliance with PDF/A-1a": No problems found

            var docAB_1 = PdfDocument.Merge(docA, docB);
            docAB_1.Save(@"C:\Temp\EO-docAB_1.pdf");
            // Claims to be PDF/A-1a
            // Adobe Preflight: "Verify compliance with PDF/A-1a": VERIFICATION FAILED
            //      * Transparency used (page is a transparency group)
            //      * CIDset in subset font missing    ( This seems to be regarding the text "Created with EO.Pdf for .NET trial version. http://www.essentialobjects.com.")

            var docAB_2 = PdfDocument.Merge(docA, docB);
            docAB_2.Standard = PdfStandard.PDF_A;
            docAB_2.Save(@"C:\Temp\EO-docAB_2.pdf");
            // Claims to be PDF/A-1a 
            // Adobe Preflight: "Verify compliance with PDF/A-1a": No problems found


            // ** Test Mix of PDF and PDF/A **

            var docC = new PdfDocument();
            docC.Standard = PdfStandard.None;
            HtmlToPdf.ConvertHtml("Page C text text text text text text", docC);
            docC.Save(@"C:\Temp\EO-docC.pdf");
            // docC Claims to be PDF
            

            var docAC_1 = PdfDocument.Merge(docA, docC);
            docAC_1.Save(@"C:\Temp\EO-docAC_1.pdf");
            // docAC_1 Claims to be PDF/A-1a
            // Adobe Preflight: "Verify compliance with PDF/A-1a": VERIFICATION FAILED
            //      * Transparency used (page is a transparency group)
            //      * CIDset in subset font missing    ( This seems to be regarding the text "Created with EO.Pdf for .NET trial version. http://www.essentialobjects.com.")

            var docAC_2 = PdfDocument.Merge(docA, docC);
            docAC_2.Standard = PdfStandard.None;
            docAC_2.Save(@"C:\Temp\EO-docAC_2.pdf");
            // docAC_2 Claims to be PDF/A-1a
            // Adobe Preflight: "Verify compliance with PDF/A-1a": VERIFICATION FAILED
            //      * Transparency used (page is a transparency group)
            //      * CIDset in subset font missing    ( This seems to be regarding the text "Created with EO.Pdf for .NET trial version. http://www.essentialobjects.com.")

            var docAC_3 = PdfDocument.Merge(docA, docC);
            docAC_3.Standard = PdfStandard.PDF_A;
            docAC_3.Save(@"C:\Temp\EO-docAC_3.pdf");
            // docAC_3 Claims to be PDF/A-1a 
            // Adobe Preflight: "Verify compliance with PDF/A-1a": No problems found


            var docCA_1 = PdfDocument.Merge(docC, docA);
            docCA_1.Save(@"C:\Temp\EO-docCA_1.pdf");
            // docCA_1 Claims to be PDF

            var docCA_2 = PdfDocument.Merge(docC, docA);
            docCA_2.Standard = PdfStandard.None;
            docCA_2.Save(@"C:\Temp\EO-docCA_2.pdf");
            // docCA_2 Claims to be PDF

            var docCA_3 = PdfDocument.Merge(docC, docA);
            docCA_3.Standard = PdfStandard.PDF_A;
            docCA_3.Save(@"C:\Temp\EO-docCA_3.pdf");
            // docCA_3 Claims to be PDF
        }
eo_support
Posted: Tuesday, October 9, 2018 10:41:04 AM
Rank: Administration
Groups: Administration

Joined: 5/27/2007
Posts: 24,071
Hi,

PdfDocument.Merge does not care whether a file is PDF/A or not. It basically just mixes the two files together as is. So if you merge a PDF/A compliant file with a non-compliant file, the resulting PdfDocument will contain non-compliant elements. Note that at this stage the PdfDocument exists entirely in memory.

Whether the file ends up PDF/A compliant is determined when you call the Save method. The logic behind this is that an in-memory PdfDocument can contain any information, whether PDF/A compliant or not. However, when you actually save it to disk, the value of the "Standard" property kicks in, and the saving process filters out non-compliant elements if Standard is set to PDF_A. In other words, in order to produce a PDF/A compliant file, you must always explicitly set the Standard property to PDF_A and then save it.

If this is not consistent with what you observed, please send us a test app and we will be happy to investigate further. See here for more details on how to send a test app to us:

https://www.essentialobjects.com/forum/test_project.aspx

Thanks!
Jarle
Posted: Tuesday, October 9, 2018 12:56:46 PM
I have now sent you a mail containing my C# example above.
eo_support
Posted: Tuesday, October 9, 2018 1:31:44 PM
We must have missed something here. We do not see anything in your example code that conflicts with our explanation above. Can you elaborate on which test you think is wrong, and why?
Jarle
Posted: Tuesday, October 9, 2018 2:23:47 PM
OK, I will try to explain in more detail:

Let's take EO-docAB_1.pdf
Using this code:
Code: C#
var docAB_1 = PdfDocument.Merge(docA, docB);
docAB_1.Save(@"C:\Temp\EO-docAB_1.pdf");


When you open this pdf-file in Adobe Acrobat, the file claims to be PDF/A.

BUT when I validate the file using Adobe Preflight, it fails with these errors:
// Adobe Preflight: "Verify compliance with PDF/A-1a": VERIFICATION FAILED
// * Transparency used (page is a transparency group)
// * CIDset in subset font missing


The last error seems to be regarding the text "Created with EO.Pdf for .NET trial version. http://www.essentialobjects.com."


My opinion:
The PDF file is incorrectly marked as PDF/A. It is not a true PDF/A-1a file, because validation with Adobe Preflight fails.

Do you agree?
Jarle
Posted: Tuesday, October 9, 2018 2:45:54 PM
Next example:
Code: C#
var docCA_3 = PdfDocument.Merge(docC, docA);
docCA_3.Standard = PdfStandard.PDF_A;
docCA_3.Save(@"C:\Temp\EO-docCA_3.pdf");

In this example I merge docC (non-PDF/A) with docA (PDF/A).
I set Standard = PdfStandard.PDF_A.
The merged file is then saved as EO-docCA_3.pdf.

When I open this file in Adobe Reader, the PDF file claims to be a standard PDF.

My opinion:
This file should have been PDF/A-1a, but it is only a standard PDF.

What happens here is the opposite of what happened in my previous example.
To me, it appears that the first PDF file I merge always decides whether the resulting PDF file is marked as PDF/A or not.

Even if I set Standard = PdfStandard.None, the resulting PDF file is marked as PDF/A if the first PDF file I merge is a PDF/A file.
See EO-docAC_2.pdf for an example.

Do you understand what I mean now?
eo_support
Posted: Tuesday, October 9, 2018 3:34:10 PM
I see. We have found the root of the problem and this will be fixed in our next build. The correct behavior should be:

1. Regardless of the input files, as long as Standard for the merged file is not set, the result should be a standard PDF.
2. Regardless of the input files, as long as Standard for the merged file is set to PDF_A, the result should be PDF/A.
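
A minimal sketch of these two rules, assuming a build that contains the fix. It uses only the EO.Pdf calls that already appear in this thread. The class name, the helpers MakeDoc and MergeAndSave, and the output paths are hypothetical and exist only for illustration. Whether each output is really compliant still has to be checked with a validator such as Adobe Preflight.

```csharp
using EO.Pdf;

public static class MergeStandardSketch
{
    // Hypothetical helper: builds a one-page document from an HTML snippet.
    static PdfDocument MakeDoc(string html, PdfStandard standard)
    {
        var doc = new PdfDocument();
        doc.Standard = standard;
        HtmlToPdf.ConvertHtml(html, doc);
        return doc;
    }

    // Hypothetical helper: merges two documents and states the target
    // standard explicitly instead of relying on what the inputs claim.
    static void MergeAndSave(PdfDocument first, PdfDocument second,
                             PdfStandard standard, string path)
    {
        var merged = PdfDocument.Merge(first, second);
        merged.Standard = standard;
        merged.Save(path);
    }

    public static void Run()
    {
        // Fresh inputs are created for every merge so that one save
        // cannot influence the next one.

        // Rule 2: Standard set to PDF_A, PDF/A input first.
        MergeAndSave(MakeDoc("Page A", PdfStandard.PDF_A),
                     MakeDoc("Page C", PdfStandard.None),
                     PdfStandard.PDF_A, @"C:\Temp\EO-sketch-AC-pdfa.pdf");

        // Rule 2 again: same target standard, plain PDF input first.
        MergeAndSave(MakeDoc("Page C", PdfStandard.None),
                     MakeDoc("Page A", PdfStandard.PDF_A),
                     PdfStandard.PDF_A, @"C:\Temp\EO-sketch-CA-pdfa.pdf");

        // Rule 1: Standard is None (passed explicitly here, as docAC_2 did),
        // even though the first input is PDF/A.
        MergeAndSave(MakeDoc("Page A", PdfStandard.PDF_A),
                     MakeDoc("Page C", PdfStandard.None),
                     PdfStandard.None, @"C:\Temp\EO-sketch-AC-plain.pdf");
    }
}
```

Under the rules above, the first two files should be marked as PDF/A and the third as a standard PDF, independent of the order of the inputs.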

We will reply here again when the new build is posted.

Thanks!
eo_support
Posted: Thursday, October 11, 2018 3:22:31 PM
Hi,

This is just to let you know that we have posted a new build that should fix this issue. You can download the new build from our download page. Please take a look and let us know how it goes.

Thanks!
Jarle
Posted: Thursday, October 11, 2018 9:11:05 PM
Thank you,
I have now tested EO.Pdf version 18.3.23 using the NuGet package.
All the errors that I reported in this thread regarding Merge are now fixed, and it now works as expected.

But unfortunately you have also introduced a new bug.

Code: C#
HtmlToPdf.Options.JpegQualityLevel = 70;
var pdf1 = new PdfDocument();
HtmlToPdf.ConvertUrl("https://www.essentialobjects.com/AboutUs.aspx", pdf1);
pdf1.Standard = PdfStandard.PDF_A;
pdf1.Save(@"C:\temp\EO-AboutUs.pdf");


Error stack trace:

Unhandled Exception: System.NotImplementedException: The method or operation is not implemented.
at EO.Internal.apc.b(Byte[] A_0, Stream A_1, a7w A_2, Int32 A_3)
at EO.Internal.vw.b(Byte[] A_0, String A_1, a7w A_2, Int32 A_3)
at EO.Internal.vw.d()
at EO.Pdf.Drawing.PdfImage..ctor(vw A_0)
at EO.Pdf.Drawing.PdfImage.a(vw A_0, Int32 A_1)
at EO.Internal.gd.b(a46 A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.kk.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.kk.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.kk.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.oi.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.kk.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.a7w.a(awh A_0)
at EO.Internal.aj.c(a46 A_0)
at EO.Internal.aj.a(a46 A_0)
at EO.Internal.gd.a(Boolean A_0, Boolean A_1)
at EO.Pdf.PdfDocument.a()
at EO.Pdf.PdfDocument.Save(String fileName)
at EssentialsObjectsPdf.Poc2()
Jarle
Posted: Friday, October 12, 2018 5:29:30 AM
Merge is not the only place where PDF files are marked with the wrong PDF standard.

See this code example:
Code: C#
public static void Poc()
{
    // Assumption: url was not defined in this snippet; same page as in the earlier example.
    const string url = "https://www.essentialobjects.com/AboutUs.aspx";

    var pdf2 = new PdfDocument();
    HtmlToPdf.ConvertUrl(url, pdf2);
    pdf2.Save(@"C:\temp\EO-AboutUs.1.Default.pdf");  // Save for print purposes.
    pdf2.Standard = PdfStandard.PDF_A;
    pdf2.Save(@"C:\temp\EO-AboutUs.1.PDF_A.pdf");    // Save for archiving purposes.
    /* The file EO-AboutUs.1.PDF_A.pdf did not become a PDF/A.
     * The file is not even marked as PDF/A when it is opened in Adobe Acrobat.
     */

    // Then I try to save the files in reverse order: PDF_A first, then None.
    pdf2 = new PdfDocument();
    HtmlToPdf.ConvertUrl(url, pdf2);
    pdf2.Standard = PdfStandard.PDF_A;
    pdf2.Save(@"C:\temp\EO-AboutUs.2.PDF_A.pdf");    // Save for archiving purposes. This file becomes a valid PDF/A.
    pdf2.Standard = PdfStandard.None;
    pdf2.Save(@"C:\temp\EO-AboutUs.2.Default.pdf");  // Save for print purposes. This file becomes an invalid PDF/A.
    /* EO-AboutUs.2.Default.pdf is marked as PDF/A when it is opened in Adobe Acrobat,
     * but it fails validation in Adobe Preflight.
     */
}
Jarle
Posted: Friday, October 12, 2018 6:02:30 AM
There also appears to be a problem with Clone.
After calling Clone, the cloned PdfDocument cannot be saved as PDF/A.


Code: C#
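public static void Poc3()
{
    // Assumption: url was not defined in this snippet; same page as in the earlier example.
    const string url = "https://www.essentialobjects.com/AboutUs.aspx";

    var pdf3 = new PdfDocument();
    HtmlToPdf.ConvertUrl(url, pdf3);
    var copy = pdf3.Clone();
    copy.Standard = PdfStandard.PDF_A;
    copy.Save(@"C:\temp\EO-AboutUs-Clone.pdf");
    // The file does not become a PDF/A. It is not marked as PDF/A when it is opened in Adobe Acrobat.
}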


eo_support
Posted: Friday, October 12, 2018 7:51:44 AM
Thanks for letting us know. We are looking into this and will get back to you as soon as we have an update.
eo_support
Posted: Friday, October 12, 2018 11:16:02 AM
Hi,

This is just to let you know that we have posted a new build that should resolve these issues. Please see your private message for the download location.

Thanks!
Jarle
Posted: Monday, October 15, 2018 8:41:08 AM
Using EO.Pdf 18.3.25.
I have now retested these issues.
The files are now created with the correct PDF standard, and they pass validation in Adobe Preflight.

Thank you.
eo_support
Posted: Monday, October 15, 2018 8:42:24 AM
Glad to hear that and thank you very much for confirming!

Please feel free to let us know if you run into anything else.
Jarle
Posted: Monday, October 15, 2018 9:22:52 AM
Using EO.Pdf 18.3.25.

There is still one issue, or maybe this is "Working as Designed".

If you first save a PDF as PDF_A, the PdfDocument object is changed, and you cannot save it as a standard PDF (PdfStandard.None) later, because transparent pictures have been modified. The standard PDF will then look like the PDF_A one.
The workaround is to clone before saving.

If you run my C# code, you will see that EO-penguin.4b.Default.pdf and EO-penguin.4c.Default.pdf are displayed differently.
The penguin picture in EO-penguin.4b.Default.pdf is displayed as if it were a PDF_A.
EO-penguin.4a.Default.pdf and EO-penguin.4c.Default.pdf are displayed correctly.
See my example:

Code: C#
public static void Poc4()
{
            const string html = "<!DOCTYPE html><body style=\"background-color: red\"><img src=\"https://www199.lunapic.com/editor/premade/transparent.gif\" /></body></html>";
            var pdf4 = new PdfDocument();
            HtmlToPdf.ConvertHtml(html, pdf4);
            pdf4.Save(@"C:\temp\EO-penguin.4a.Default.pdf");
            pdf4.Standard = PdfStandard.PDF_A;
            pdf4.Save(@"C:\temp\EO-penguin.4a.PDF_A.pdf");

            pdf4 = new PdfDocument();
            HtmlToPdf.ConvertHtml(html, pdf4);
            pdf4.Standard = PdfStandard.PDF_A;
            pdf4.Save(@"C:\temp\EO-penguin.4b.PDF_A.pdf");
            pdf4.Standard = PdfStandard.None;
            pdf4.Save(@"C:\temp\EO-penguin.4b.Default.pdf");

            
            pdf4 = new PdfDocument();
            HtmlToPdf.ConvertHtml(html, pdf4);
            var pdf4Clone = pdf4.Clone();
            pdf4Clone.Standard = PdfStandard.PDF_A;
            pdf4Clone.Save(@"C:\temp\EO-penguin.4c.PDF_A.pdf");
            pdf4Clone = pdf4.Clone();
            pdf4Clone.Standard = PdfStandard.None;
            pdf4Clone.Save(@"C:\temp\EO-penguin.4c.Default.pdf");
}


BTW, all PDF_A files pass validation in Adobe Preflight.
eo_support
Posted: Monday, October 15, 2018 9:33:16 AM
Hi,

Yes. This behavior is by design. Once you save a PdfDocument as PDF/A, the contents of the document are permanently modified. A number of things are changed to make it PDF/A compliant, the most noticeable being that transparent backgrounds are replaced with a solid white background. Theoretically we could clone the document internally before saving and thus leave the original intact, but there would be a performance penalty, and for most users this is unnecessary: once they have saved the file as PDF/A, they do not need to save the original copy again. So we would rather leave the decision of whether to clone to the user.
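
For readers who do need both outputs, a minimal sketch of the clone-before-save pattern described above. The class name, the helper SaveAsPdfA, the sample HTML and the output paths are hypothetical. Clone, Standard and Save are used the same way as in the examples earlier in this thread.

```csharp
using EO.Pdf;

public static class PdfASaveSketch
{
    // Hypothetical helper: writes a PDF/A copy while leaving 'source' untouched.
    // The PDF/A conversion is applied to the clone, not to the original.
    public static void SaveAsPdfA(PdfDocument source, string path)
    {
        var copy = source.Clone();
        copy.Standard = PdfStandard.PDF_A;
        copy.Save(path);
    }

    public static void Run()
    {
        var doc = new PdfDocument();
        HtmlToPdf.ConvertHtml("<body style=\"background-color: red\">Sample</body>", doc);

        // Archive copy first; only the clone is modified for PDF/A.
        SaveAsPdfA(doc, @"C:\temp\EO-sketch.PDF_A.pdf");

        // The original was never saved as PDF/A, so it can still be
        // written as a standard PDF afterwards.
        doc.Save(@"C:\temp\EO-sketch.Default.pdf");
    }
}
```

The cost is one extra Clone per PDF/A save, which is the performance penalty mentioned above. Callers that only ever need the PDF/A output can set Standard on the original and skip the clone.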

Thanks!
Jarle
Posted: Monday, October 15, 2018 9:44:55 AM
Very good,
Thank you Angel
eo_support
Posted: Monday, October 15, 2018 1:27:42 PM
You are very welcome. Thank you very much for the detailed analysis though. :)

