[PDF] Use jpeg compression (DCT) for SkPDFImage [40031195]

Fixed

Bug

Description

va...@chromium.org

created issue #1

Mar 29, 2011 11:49PM

PDF supports "DCTDecode" compression (See section "3.3 Filters" of the PDF spec). If an SkBitmap is backed by a jpeg (it knows in its guts), we may be able to apply a simple transform to get to a "DCTDecode" compression format instead of having to walk the bitmap.

Comments

ep...@google.com <ep...@google.com> #2May 17, 2011 07:09PM

va...@chromium.org <va...@chromium.org> #3Jun 26, 2012 05:38PM

SkPDFDocument should have a flag to use DCT compression for images over a certain size.

Chrome now considers this a bug:

http://crbug.com/133519

re...@google.com <re...@google.com> #4Aug 29, 2012 09:02PM

1. There is (was/should be) a mechanism to query a bitmap to see if it has an associated compressed form (e.g. the data it was decoded from). We plan to beef that up for Picture serialization, so if it's not there now, it will be.

2. Also for Picture serialization, we are considering a flag/default to compress large bitmaps on the fly. We had envisioned using PNG since it's lossless, but the exact codec isn't too critical.

3. If we are considering doing compression for PDF, we should create a more formal way to query (at runtime) if the JPEG encoder is available. The SkImageEncoder API is not required for a build of Skia, but I'm sure we can come up with some runtime query to get at it as needed.

[Deleted User] <[Deleted User]> #5Dec 5, 2012 04:42PM

We'd like to move http://crbug.com/133519 forward soon - what's the status of this issue?

re...@google.com <re...@google.com> #6Dec 5, 2012 04:55PM

edison and leon, let's chat about this.

va...@google.com <va...@google.com> #7Jan 2, 2013 04:31AM

I looked into what "DCTDecode" means exactly. It seems that it simply wants a JPEG file as the content of the stream; no transform of the data is necessary.

However, I didn't see (in src/images/SkImageDecoder_libjpeg.cpp ?) any place where a backing compressed file is associated with an SkBitmap.
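To make the "no transform is necessary" point concrete, here is a minimal Python sketch (not Skia code) of wrapping an existing JPEG file, byte for byte, in an image XObject that uses /Filter /DCTDecode. The function name `jpeg_image_xobject` is hypothetical, and the width, height and color space are assumed to be known by the caller.

```python
def jpeg_image_xobject(jpeg_bytes: bytes, width: int, height: int,
                       color_space: str = "DeviceRGB") -> bytes:
    """Build the dictionary + stream of a PDF image XObject around a JPEG file.

    The JPEG bytes are copied unchanged into the stream; only the
    dictionary in front of them is generated here.
    """
    header = (
        "<< /Type /XObject\n"
        "/Subtype /Image\n"
        f"/Width {width}\n"
        f"/Height {height}\n"
        f"/ColorSpace /{color_space}\n"
        "/BitsPerComponent 8\n"
        "/Filter /DCTDecode\n"
        f"/Length {len(jpeg_bytes)}\n"
        ">>\n"
        "stream\n"
    ).encode("ascii")
    # /Length counts only the stream data, not the end-of-line before "endstream".
    return header + jpeg_bytes + b"\nendstream"
```

A real writer would also have to emit the surrounding `N 0 obj` / `endobj` lines and the cross-reference table; those are left out of this sketch.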

re...@google.com <re...@google.com> #8Jan 2, 2013 02:29PM

We recently added SkPixelRef::refEncodedData(), which returns NULL or the pixels already compressed.

Note that this doesn't let the caller specify what sort of compressed form is desired (i.e. restrict it to jpeg for the PDF caller), so we will need to either add a filter/flag/enum to the call to only get back certain type(s), or the caller could sniff the data, and reject it if it isn't supported...

Somewhat related, I would like us to consider adding a runtime flag/hook to allow the PDF backend to compress images into JPEG on the fly if that is what the caller wants. Picture serialization already supports this, I believe... Leon can chime in on whether the existing mechanism is what we want to propagate going forward.
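The "caller sniffs the data and rejects it" option could look roughly like the Python sketch below. It is only an illustration under assumptions: `sniff_format` and `encoded_data_for_pdf` are hypothetical names, and the `encoded` argument stands in for whatever SkPixelRef::refEncodedData() returned (None standing in for NULL).

```python
from typing import Optional

JPEG_MAGIC = b"\xff\xd8\xff"          # SOI marker followed by the next marker prefix
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"      # 8-byte PNG signature


def sniff_format(data: bytes) -> Optional[str]:
    """Guess the container format from the leading magic bytes."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(PNG_MAGIC):
        return "png"
    return None


def encoded_data_for_pdf(encoded: Optional[bytes]) -> Optional[bytes]:
    """Return the encoded bytes only if they look like JPEG, else None.

    None tells the (hypothetical) PDF caller to fall back to walking
    the decoded pixels instead of embedding the encoded data.
    """
    if encoded is not None and sniff_format(encoded) == "jpeg":
        return encoded
    return None
```

The alternative mentioned above, passing a filter/flag/enum into the call, would move this check to the producer side of the API instead of the caller.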

[Deleted User] <[Deleted User]> #9Jan 2, 2013 07:08PM

PDF can include the encoded bitmap as JPEG (and other types also), but PDF can also include the URL of the image instead.

IMHO this should be extremely useful for cloud PDF printing, which, if I am correct, is the problem we are really trying to solve.

If we have JPEG images in a web page, we can get the URL of the JPEG for print, instead of including the stream, compressed or not.

The issues to be looked into are: what if the image to be printed requires login, and what other types of images (PNG?) are common and can be used as a URL in PDF.

e.g. see http://partners.adobe.com/public/developer/en/pdf/PDFReference.pdf, example 4.25

16 0 obj % Alternate image
<< /Type /XObject
/Subtype /Image
/Width 1000
/Height 2000
/ColorSpace /DeviceRGB
/BitsPerComponent 8
/Length 0 % This is an external stream
/F << /FS /URL
/F (http://www.myserver.mycorp.com/images/exttest.jpg)
>>
/FFilter /DCTDecode
>>
stream
endstream
endobj

va...@chromium.org <va...@chromium.org> #10Jan 2, 2013 07:28PM

I don't think we want to make the PDFs require network access. Even worse than a URL requiring some credentials would be URLs on internal networks.

re...@google.com <re...@google.com> #11Jan 2, 2013 07:32PM

Sure, let's look at API control for:

1. force compression (into jpeg)
2. allow URL references instead of pixels

Seems like we can always allow embedded JPEG (i.e. no API control needed) if the data is already available from the pixelref.

sc...@google.com <sc...@google.com> #12Jan 3, 2013 05:09PM

Re: compress into JPEG if that's what the caller wants:

SkPicture::serialize allows passing in a function for compressing however the caller chooses. At the moment, we give priority to using the refEncodedData, regardless of whether it is in the desired format, but we can modify it to only take the desired format. We'll need to add a way to specify/test the desired format.
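As a rough illustration of the change described here, the Python sketch below prefers already-encoded data only when it passes a format test, and otherwise calls the caller-supplied compression function. All names (`bytes_to_serialize`, `encode`, `accepts`) are hypothetical and do not correspond to the actual SkPicture::serialize signature.

```python
from typing import Callable, Optional


def is_jpeg(data: bytes) -> bool:
    """Format test based on the JPEG magic bytes only."""
    return data.startswith(b"\xff\xd8\xff")


def bytes_to_serialize(raw_pixels: bytes,
                       existing_encoded: Optional[bytes],
                       encode: Callable[[bytes], bytes],
                       accepts: Callable[[bytes], bool] = is_jpeg) -> bytes:
    """Pick the bytes to write out for one bitmap.

    existing_encoded plays the role of the refEncodedData() result.
    It is reused only if it is in the desired format; otherwise the
    caller-supplied encode function compresses the raw pixels.
    """
    if existing_encoded is not None and accepts(existing_encoded):
        return existing_encoded
    return encode(raw_pixels)
```

Passing the format test as a function keeps the serialization code independent of any particular decoder, which is the same reason a function is passed in for compression.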

va...@google.com <va...@google.com> #13Jan 4, 2013 06:00AM

It looks like SkPixelRef::refEncodedData() only has an empty implementation?

re...@google.com <re...@google.com> #14Jan 4, 2013 01:17PM

In the base-class yes. On the chrome/webkit side, they will override that when they land their lazy-decoding-pixelrefs, which will contain a ref to the encoded data.

[Deleted User] <[Deleted User]> #15Jan 4, 2013 08:27PM

skps converted to pdf stats:
1) uncompressed images: 105MB
2) compressed images, 100 quality: 75MB
3) compressed images, 85 quality: 75MB
4) no images, use urls: 5MB

Of course, not including the images just shifts the cost of downloading onto the printer, and it introduces the further complications discussed above.

As far as I understand from the code (SkPDFStream::populate), we already do lossless compression, which is why the saving from including JPEG is only 25%: we basically switch from /Filter /FlateDecode to /Filter /DCTDecode.

[Deleted User] <[Deleted User]> #16Jan 4, 2013 08:30PM

correction
skps converted to pdf stats:
1) CURRENT: FlateDecode compressed images: 105MB
2) JPEG (DCTDecode) compressed images, 100 quality: 75MB
3) JPEG (DCTDecode) compressed images, 85 quality: 75MB
4) no images, use urls: 5MB

[Deleted User] <[Deleted User]> #17Jan 7, 2013 07:37PM

I was comparing the JPEG size with the uncompressed image, but if we use the best of JPEG (DCTDecode) or ZIP (FlateDecode) then we get an additional saving of 20%:

5) JPEG or ZIP (best of DCTDecode, FlateDecode and nothing) compressed images: 50MB
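The "best of" selection can be sketched in Python as below. This is an assumption-laden illustration, not the measured implementation: `pick_smallest_stream` is a hypothetical name, the JPEG bytes are assumed to come from elsewhere (an encoder or existing encoded data), and size is the only criterion even though JPEG is lossy and the other two options are not.

```python
import zlib
from typing import Optional, Tuple


def pick_smallest_stream(raw_pixels: bytes,
                         jpeg_bytes: Optional[bytes] = None
                         ) -> Tuple[Optional[str], bytes]:
    """Return (filter_name, stream_bytes) for the smallest candidate.

    filter_name is None when storing the raw pixels is smallest,
    "FlateDecode" for zlib-compressed pixels, and "DCTDecode" when
    the supplied JPEG bytes win.
    """
    candidates = [
        (None, raw_pixels),
        ("FlateDecode", zlib.compress(raw_pixels)),
    ]
    if jpeg_bytes is not None:
        candidates.append(("DCTDecode", jpeg_bytes))
    # min() keeps the earliest candidate on ties, so simpler filters win ties.
    return min(candidates, key=lambda c: len(c[1]))
```

Flat-color images tend to favor FlateDecode and photographic images tend to favor DCTDecode, which would be consistent with the extra saving reported above.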

re...@google.com <re...@google.com> #18Jan 16, 2013 03:10PM

Assigned to ed...@google.com.

ed...@google.com <ed...@google.com> #19Jan 17, 2013 03:11PM

Accepted by ed...@google.com.

ed...@google.com <ed...@google.com> #20May 6, 2013 05:36PM

Assigned to ed...@google.com.

We need to enable it in Chrome now.

[Deleted User] <[Deleted User]> #21Aug 22, 2013 12:02AM

I'm looking into this, and it seems there is no way for a SkPixelRef to tell what format the encoded data is in (is there?). If this is the case, how do we know that we actually have a JPEG? Is relying on the JPEG magic numbers a good enough guess? Are there additional possible checks?

sc...@google.com <sc...@google.com> #22Aug 22, 2013 12:07AM

Generally we use a decoder to figure out which format the data is in (and yes, we use the magic numbers at the beginning). We try to keep the rest of our code from depending on our decoders, so we use a function pointer in SkPicture serialize/CreateFromStream.
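On the question of additional checks beyond the magic numbers: one option that does not require a full decoder is to walk the JPEG marker segments until a start-of-frame (SOF) marker is found and read the dimensions from it. The Python sketch below is an illustration under that assumption; `jpeg_info` is a hypothetical helper, not part of Skia.

```python
import struct
from typing import Optional, Tuple

# Start-of-frame markers; 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) are not SOF.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_info(data: bytes) -> Optional[Tuple[int, int, int]]:
    """Return (width, height, components) from the first SOF segment.

    Returns None if the data does not start with SOI, a segment is
    malformed or truncated, or no SOF appears before the scan data.
    """
    if data[:2] != b"\xff\xd8":
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte before a marker
            pos += 1
            continue
        if marker in (0xD9, 0xDA):         # EOI or SOS reached without a SOF
            return None
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            pos += 2                       # standalone marker, no length field
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return None
        if marker in SOF_MARKERS:
            if length < 8 or pos + 2 + length > len(data):
                return None
            _precision, height, width, comps = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return width, height, comps
        pos += 2 + length
    return None
```

The component count could also pick the PDF color space (1 for DeviceGray, 3 for DeviceRGB, 4 for DeviceCMYK), and the width and height are needed for the image dictionary anyway.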

ha...@google.com <ha...@google.com> #23Nov 17, 2015 10:38PM

Marked as fixed.
