The Times USA

Hidden Text – What Lies Beneath – PDF Edition

Posted on June 20, 2019 by admin

By Elizabeth Thede, Special for TTU

This article delves into the world of hidden text in PDFs. It supplements a previous article (https://www.thetimesusa.com/2019/04/29/what-lies-beneath/) that looked at hidden text in Microsoft Office files, emails and email attachments.

Both this article and the previous article examine hidden text from the perspective of a search engine, specifically dtSearch. Large organizations like government agencies and 4 out of 5 of the Fortune 500’s largest Aerospace and Defense companies rely on dtSearch enterprise and developer products to instantly search terabytes of Office files, emails, databases and web data. But even if you just want to search across your own PC, you can download a fully-functional 30-day evaluation version of dtSearch Desktop anytime at dtSearch.com.

PDF is actually a printer format. When you look at a PDF document inside a viewer like Adobe Reader, you are typically looking at the document as it would print. However, a search engine like dtSearch would read a PDF file not in an associated application like Adobe Reader, but in its raw binary format. That binary format view can look quite different from the associated application view. In fact, if you were looking at a PDF in binary format, it would be hard to visually distinguish any words at all in the text.
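One reason the raw bytes are so hard to read is that PDF writers commonly store page content in compressed streams. The following is a minimal Python sketch of that effect, not a description of how dtSearch or any particular product works. The sample content stream and the function names are made up for illustration, and the sketch assumes the common FlateDecode (zlib) filter.

```python
import zlib

# A tiny, hypothetical PDF-style content stream. BT/ET bracket a text
# object and the Tj operator paints the string in parentheses.
content = (
    b"BT /F1 12 Tf 72 720 Td (Confidential merger terms) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Confidential pricing schedule) Tj ET\n"
)

# Assumption: the stream is stored with the FlateDecode filter (zlib).
raw_stream = zlib.compress(content)

def visible_in_raw(needle: bytes, raw: bytes) -> bool:
    """True if the needle appears verbatim in the stored (compressed) bytes."""
    return needle in raw

def visible_after_decode(needle: bytes, raw: bytes) -> bool:
    """True if the needle appears once the stream has been inflated."""
    return needle in zlib.decompress(raw)

print(visible_in_raw(b"Confidential", raw_stream))
print(visible_after_decode(b"Confidential", raw_stream))
```

A person scanning the stored bytes would not find the word, while a parser that inflates the stream first finds it immediately.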

By the same token, once a search engine parses a PDF in binary format, it can also see text that might escape scrutiny in a “normal” file view. Since PDF is a printer format, there could be text outside of the page boundary that might be hidden in a “normal” associated application view. That extra text, however, would be readily apparent to a search engine like dtSearch.
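To make the page boundary idea concrete, here is a rough Python sketch that flags text positioned outside a page's MediaBox (the rectangle, in points, that defines the page). It assumes the text positions have already been extracted by some parser; the `TextRun` type, the `off_page_runs` function and the sample values are hypothetical. A real check would also need to account for the CropBox, rotation and the full text matrix.

```python
from typing import List, NamedTuple, Tuple

class TextRun(NamedTuple):
    """A piece of text and the page coordinates where it starts (hypothetical)."""
    text: str
    x: float
    y: float

def off_page_runs(media_box: Tuple[float, float, float, float],
                  runs: List[TextRun]) -> List[TextRun]:
    """Return the runs whose start position lies outside the MediaBox."""
    llx, lly, urx, ury = media_box
    return [r for r in runs if not (llx <= r.x <= urx and lly <= r.y <= ury)]

# US Letter page: 612 x 792 points.
letter = (0.0, 0.0, 612.0, 792.0)
runs = [
    TextRun("Quarterly report", 72.0, 720.0),     # on the page
    TextRun("draft: remove before filing", 72.0, -40.0),  # below the page
    TextRun("reviewer initials", 700.0, 400.0),   # right of the page
]

for run in off_page_runs(letter, runs):
    print(run.text)
```

A viewer would never draw the last two runs, yet both are ordinary text in the file.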

PDFs can also include metadata that would not be immediately visible in an associated application view but that would be readily available to a search engine. And PDFs can have embedded objects such as embedded MS Office files that could be easy to overlook in an associated application view but that would be “plain as day” to a search engine like dtSearch.
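As a simplified illustration of metadata and embedded files sitting in the raw bytes, the Python sketch below scans a small sample for standard document information keys and for the `/EmbeddedFiles` name. The sample bytes and function names are invented for this example. It assumes uncompressed, unencrypted objects with simple literal strings; a real parser would also have to handle object streams, XMP metadata, hex strings and escaped parentheses.

```python
import re
from typing import Dict

# Hypothetical fragment of a PDF file, written out uncompressed.
SAMPLE = (
    b"%PDF-1.7\n"
    b"1 0 obj\n"
    b"<< /Title (Q3 Forecast) /Author (J. Smith) /Producer (ExampleWriter) >>\n"
    b"endobj\n"
    b"2 0 obj\n"
    b"<< /Type /Catalog /Names << /EmbeddedFiles 3 0 R >> >>\n"
    b"endobj\n"
)

# Standard keys of the PDF document information dictionary.
INFO_KEY = re.compile(
    rb"/(Title|Author|Subject|Keywords|Creator|Producer)\s*\(([^)]*)\)"
)

def scan_metadata(raw: bytes) -> Dict[str, str]:
    """Collect simple literal-string metadata values found in the raw bytes."""
    return {k.decode("ascii"): v.decode("latin-1") for k, v in INFO_KEY.findall(raw)}

def has_embedded_files(raw: bytes) -> bool:
    """True if the file declares an embedded-files name tree."""
    return b"/EmbeddedFiles" in raw

print(scan_metadata(SAMPLE))
print(has_embedded_files(SAMPLE))
```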

And finally, PDFs can have “white on white” or “black on black” text. Such text would not be readily apparent in an associated application view, but would be completely apparent in binary format. In a recent high-profile criminal proceeding, certain PDF text was visually redacted with black rectangles prior to public release. But while visually blacked out, the underlying text itself (unbeknownst to those who publicly released it) was still fully accessible.
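The Python sketch below illustrates why color does not hide text from a parser. In a PDF content stream, the `rg` operator sets the fill color and `Tj` paints a string, so white text is still ordinary text in the stream. The sample stream and the function name are hypothetical. The sketch handles only `rg` and simple `Tj` strings; real streams also use other color operators, `TJ` arrays, hex strings and escaped parentheses. The same reasoning applies to a black rectangle drawn over text: the rectangle is a separate drawing instruction and the text remains in the stream.

```python
import re
from typing import List, Tuple

# Hypothetical content stream: one line in black, one in white.
STREAM = (
    b"BT /F1 12 Tf 0 0 0 rg 72 720 Td (Public summary) Tj ET\n"
    b"BT /F1 12 Tf 1 1 1 rg 72 700 Td (Internal note: do not release) Tj ET\n"
)

# Either "r g b rg" (set fill color) or "(string) Tj" (paint text).
TOKEN = re.compile(
    rb"([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+rg|\(([^)]*)\)\s*Tj"
)

Color = Tuple[float, float, float]

def text_with_fill_color(stream: bytes) -> List[Tuple[str, Color]]:
    """Return each painted string with the fill color in effect at that point."""
    fill: Color = (0.0, 0.0, 0.0)  # the default fill color is black
    found = []
    for m in TOKEN.finditer(stream):
        if m.group(4) is None:
            fill = (float(m.group(1)), float(m.group(2)), float(m.group(3)))
        else:
            found.append((m.group(4).decode("latin-1"), fill))
    return found

for text, color in text_with_fill_color(STREAM):
    flag = "white fill" if color == (1.0, 1.0, 1.0) else "visible"
    print(f"{flag}: {text}")
```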

There is also a question of how a search engine recognizes files. What if someone gives a PDF file a Microsoft Office extension, like .docx instead of .pdf? While that might be confusing if you were looking at the file in a directory, a search engine like dtSearch would look at the binary file header to determine the file type, not the filename extension. That way, even if a PDF has a .docx extension, dtSearch will still handle that PDF correctly.
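Header-based detection can be sketched in a few lines of Python. This is a generic illustration of checking well-known file signatures, not dtSearch's actual logic; the function name and return labels are made up for this example. PDF files begin with `%PDF-`, modern Office files (.docx, .xlsx, .pptx) are ZIP containers beginning with `PK\x03\x04`, and legacy Office files use the OLE2 compound file signature.

```python
def sniff_type(data: bytes) -> str:
    """Guess a file's type from its leading bytes, ignoring the filename."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "zip container (e.g. docx, xlsx, pptx)"
    if data.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2 container (e.g. legacy doc, xls, ppt)"
    return "unknown"

# A PDF that someone has renamed to "report.docx" still starts with %PDF-.
renamed_pdf = b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n1 0 obj\n"
print(sniff_type(renamed_pdf))
```

To check a file on disk, read only its first few bytes, for example `open(path, "rb").read(8)`, and pass them to `sniff_type`.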

Finally, coming 25 years after Adobe created the original PDF document format, Version 2.0 is a major new release of the PDF file type, and these files are just starting to get out there. dtSearch has also released a new version to make sure that PDF 2.0 files are separately recognized and treated accordingly.
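A PDF file states its version in its first line, such as `%PDF-1.7` or `%PDF-2.0`. The short Python sketch below reads that header; the function name is hypothetical. Note that the document catalog can also carry a `/Version` entry that overrides the header, which this simplified example ignores.

```python
import re
from typing import Optional, Tuple

HEADER = re.compile(rb"%PDF-(\d)\.(\d)")

def header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from the %PDF- header, or None if it is absent."""
    m = HEADER.match(data)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))

print(header_version(b"%PDF-2.0\n"))
print(header_version(b"%PDF-1.4\n"))
```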

From dtSearch.com, you can immediately download and try a fully-functional 30-day evaluation version to instantly search terabytes of your own data. And when you do try the software, check out the forensics-oriented section of the Features Map at dtSearch.com for more “deep dive” search tips on PDFs and other data.
