We are experiencing problems with PDFs downloaded from the internet. Web Gateway frequently blocks these files as corrupted because of the rule "Block Corrupted MediaTypes", which uses the check "Body.IsCorruptedObject". This does not happen every time; some documents download without a problem.
Because there are so many PDF generators (often inline, web-based ones), documents frequently contain data that does not match the standards the PDF handler expects.
One thing I have noticed recently is embedded content that is detected as application/octet-stream, but only through MediaType.NotEnsured. To test for this, create a rule with the Continue action that checks MediaType.NotEnsured, then run a rule trace to verify.
If the content is application/octet-stream and was detected only by the not-ensured detection, you can create an exception in your rule logic to bypass it.
Other types of corruption in PDFs often turn out to be unusual spacing of data, seemingly erroneous data or characters, and so on. If you open one of the PDFs in a hex viewer, you may see something unusual. Many of my clients have started to bypass the corrupted archive check for PDFs because they encounter so many of these poorly generated documents. You could create rule logic that bypasses the corrupt check only if the file came from a site GTI rates as minimal risk, and/or use GTI file reputation checking to validate the hash as filereputationgood. NOTE: The MWG must be configured to use GTI file reputations before the file reputation property can be used.
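If you would rather not eyeball every file in a hex viewer, a small script can do a first pass. The following is only a rough triage sketch in Python. It does not reproduce whatever Web Gateway's PDF handler actually checks, and the function names `inspect_pdf_bytes` and `hex_preview` are made up for this example. It simply looks for a few markers a well-formed PDF normally has (the `%PDF-` header at offset 0, a `startxref` keyword, and a final `%%EOF`), so you can compare a blocked file with one that downloads fine.

```python
import sys


def inspect_pdf_bytes(data: bytes) -> dict:
    """Report a few structural markers that a well-formed PDF normally has."""
    header_offset = data.find(b"%PDF-")
    eof_offset = data.rfind(b"%%EOF")
    if eof_offset == -1:
        trailing = None
    else:
        # Count non-whitespace bytes that follow the last %%EOF marker.
        tail = data[eof_offset + len(b"%%EOF"):]
        trailing = len(tail.strip(b"\r\n \t"))
    return {
        "header_offset": header_offset,  # 0 is expected; -1 means not found
        "has_startxref": b"startxref" in data,
        "has_eof_marker": eof_offset != -1,
        "trailing_bytes_after_eof": trailing,
    }


def hex_preview(data: bytes, length: int = 32) -> str:
    """Return the first bytes as space-separated hex, like a hex viewer row."""
    return " ".join(f"{b:02x}" for b in data[:length])


if __name__ == "__main__":
    # Usage: python script.py blocked.pdf working.pdf
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            content = fh.read()
        print(path, inspect_pdf_bytes(content))
        print("  first bytes:", hex_preview(content))
```

A non-zero header offset or leftover bytes after `%%EOF` would be the kind of "unusual" data worth noting before you decide whether a bypass is justified for that source.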
Here's a sample set of logic that was written for bypassing encrypted files from trusted sources, but you could use it to make exceptions or a bypass for corrupted PDFs as well. Just replace the appropriate properties and adjust the logic depending on whether you're setting a variable, making an exception, or creating a stop rule set above: