The Times USA

Misconceptions About Enterprise Search

Posted on August 23, 2022 by admin

By Elizabeth Thede, Special for The Times USA

Misconceptions about enterprise search abound. This article addresses some of the most common ones and should get you on your way to instantly searching terabytes.

The first misconception is that unindexed search is as good as indexed search. dtSearch®, for example, offers both, but indexed search is far and away the gold standard. Because an indexed search looks terms up in an index built ahead of time instead of reading through every file at query time, it is instant even across terabytes of content, and even in a multithreaded, concurrent search environment such as a network installation, a local web server, or the cloud.
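
To make the difference concrete, here is a minimal Python sketch of the general technique behind indexed search, an inverted index. It is an illustration only, not dtSearch's implementation; the sample documents and function names are hypothetical. The unindexed function rereads every document on every query, while the indexed version pays the reading cost once at build time and then answers each query with a single dictionary lookup.

```python
import re
from collections import defaultdict

# Hypothetical sample corpus: document name -> text.
DOCS = {
    "memo.txt": "Quarterly results for the Mississippi office",
    "notes.txt": "Travel plans for the Missouri office",
    "email.txt": "Budget review scheduled for Monday",
}


def tokenize(text: str) -> list:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unindexed_search(docs: dict, word: str) -> set:
    """Unindexed search: rescan the full text of every document for each query."""
    word = word.lower()
    return {name for name, text in docs.items() if word in tokenize(text)}


def build_index(docs: dict) -> dict:
    """Indexed search, step 1 (done once): map each unique word to the documents containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def indexed_search(index: dict, word: str) -> set:
    """Indexed search, step 2 (per query): one dictionary lookup, no document scanning."""
    return set(index.get(word.lower(), set()))


index = build_index(DOCS)
print(sorted(indexed_search(index, "office")))  # ['memo.txt', 'notes.txt']
```

With three short strings the two approaches feel the same, but the unindexed cost grows with the total size of the data on every query, while the lookup cost stays essentially flat.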

Beyond speed, indexing also enables more search options. Most of dtSearch's 25+ search options work with both indexed and unindexed search, but indexed search adds a few extras, such as the ability to flag credit card numbers that appear in indexed data. The indexer can run a sequence of digits that might represent a credit card number through a validation algorithm to determine whether it actually is one.
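
The article does not say which validation algorithm the indexer uses. Payment card numbers carry a Luhn (mod 10) check digit, so the sketch below assumes a Luhn check; the candidate pattern (13 to 19 digits, optionally separated by spaces or hyphens) and the function names are illustrative assumptions, not dtSearch behavior.

```python
import re


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn (mod 10) checksum."""
    total = 0
    # Walk the digits right to left and double every second one;
    # a doubled value above 9 has 9 subtracted (same as summing its two digits).
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


# Assumed candidate pattern: 13 to 19 digits, each optionally preceded by one space or hyphen.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def find_card_numbers(text: str) -> list:
    """Flag digit runs that look like card numbers AND pass the Luhn check."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum weeds out most digit runs that merely look like card numbers: in the usage below, the second number differs from a valid one only in its last digit and is rejected.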

The next misconception is that building an index is hard. In reality, it couldn't be easier. You simply point the search engine at the folders, email archives, and other sources to index, and it does everything else by examining each file in its binary format. From the binary format, the search engine determines the file type, and it then applies that file type's format specification to recognize all of the full text and metadata.
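
As a rough illustration of the "point it at a folder" workflow, the sketch below walks a directory tree and builds a word-to-file index. It is a simplification under stated assumptions: the hypothetical `extract_text` helper accepts only UTF-8 text and skips everything else, standing in for the format-aware parsing (PDF, Word, email, and so on) that a real search engine performs after identifying each file's type.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Optional, Union


def extract_text(data: bytes) -> Optional[str]:
    """Hypothetical stand-in for format-aware parsing: accept only UTF-8 text.

    A real engine would identify the file type from its bytes and apply that
    format's specification to pull out full text and metadata.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def index_folder(root: Union[str, Path]) -> dict:
    """Walk every file under root and map each word to the files that contain it."""
    index = defaultdict(set)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        text = extract_text(path.read_bytes())
        if text is None:
            continue  # unsupported format in this sketch
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(str(path.relative_to(root)))
    return index
```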

Beyond storing each unique word and number in the data, the index also records where each word and number occurs, which is what makes phrase and proximity searches possible. A single index can hold up to a terabyte of text, there is no limit on the number of terabyte-sized indexes the search engine can create, and end users can search them instantly and concurrently.
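
The sketch below, a hypothetical positional index rather than dtSearch's on-disk format, records each word's positions in every document and uses them to answer a simple proximity query ("word A within N words of word B").

```python
import re
from collections import defaultdict


def build_positional_index(docs: dict) -> dict:
    """Map word -> {doc name: [word positions]}, recording where each word occurs."""
    index = defaultdict(lambda: defaultdict(list))
    for name, text in docs.items():
        for pos, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][name].append(pos)
    return index


def within(index: dict, a: str, b: str, distance: int) -> set:
    """Proximity search: documents where words a and b occur within `distance` words of each other."""
    a_postings = index.get(a.lower(), {})
    b_postings = index.get(b.lower(), {})
    hits = set()
    # Only documents containing both words can match; compare their stored positions.
    for name in a_postings.keys() & b_postings.keys():
        if any(abs(pa - pb) <= distance
               for pa in a_postings[name] for pb in b_postings[name]):
            hits.add(name)
    return hits
```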

For changing datasets, the search engine can use the Windows Task Scheduler to update indexes as often as you like. To update an index, the search engine needs to re-index only the files that have been added, deleted, or modified since the previous index build. Updating an index does not block individual or concurrent searching, so all searches continue unaffected during the update.
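
The Task Scheduler only decides when an update runs; deciding what to re-index is a change-detection problem. The sketch below assumes a simple heuristic, comparing each file's modification time and size between two snapshots, which may differ from how dtSearch tracks changes. The function names are hypothetical.

```python
from pathlib import Path


def take_snapshot(root) -> dict:
    """Record (modification time in ns, size in bytes) for every file under root."""
    snap = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            snap[str(path)] = (st.st_mtime_ns, st.st_size)
    return snap


def diff_snapshots(old: dict, new: dict):
    """Return (added, deleted, modified) sets of paths between two snapshots."""
    added = set(new.keys() - old.keys())
    deleted = set(old.keys() - new.keys())
    # A file present in both snapshots counts as modified if its mtime or size changed.
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return added, deleted, modified
```

Only the files in the three returned sets need attention at update time; everything else already in the index can be left as is.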

The next misconception is that a search engine will mishandle files with a mismatched file extension, such as a PDF saved with a .DOCX extension. It is true that a search engine must correctly identify the type of every file to know which parsing specification to apply. But it can determine the file type from the binary content of the file itself, without reference to the extension at all. In fact, the file extension plays no part in this process.
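
To show that the extension never enters into it, here is a sketch that identifies a file purely from its leading bytes ("magic numbers"). The signatures listed are standard (PDF files begin with `%PDF-`, ZIP-based Office Open XML files begin with `PK\x03\x04`, legacy OLE2 Office files begin with `D0 CF 11 E0 A1 B1 1A E1`), but the table is deliberately short and the function name is hypothetical; a production engine recognizes far more formats.

```python
import io
import zipfile

# Leading-byte signatures checked in order (a deliberately short list).
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", "ole2"),  # legacy .doc/.xls/.ppt/.msg container
    (b"Rar!\x1a\x07", "rar"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
]


def sniff_type(data: bytes) -> str:
    """Identify a file type from its bytes alone; no file name is ever consulted."""
    if data.startswith(b"PK\x03\x04"):
        # Office Open XML files are ZIP packages; look inside to tell them apart.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            return "zip"
        if "word/document.xml" in names:
            return "docx"
        if "xl/workbook.xml" in names:
            return "xlsx"
        if "ppt/presentation.xml" in names:
            return "pptx"
        return "zip"
    for magic, kind in SIGNATURES:
        if data.startswith(magic):
            return kind
    return "unknown"
```

Because `sniff_type` receives only bytes, a PDF renamed to report.DOCX is still reported as `pdf`.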

The next misconception is that a typo will thwart a search engine. Say you mistype Mississippi in an email, perhaps adding an extra S, dropping one, or typing a Q in place of a P. Fuzzy searching, which is adjustable from 1 to 10, accommodates such deviations, and even a low level of fuzziness would pick up any of these Mississippi typos. Fuzzy searching works alongside other search types, such as Boolean (and/or/not) searching and proximity searching, so it is easy to leave fuzzy searching on at a low level while you search.
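
The article does not define how dtSearch's 1 to 10 fuzziness scale maps to character differences. As a stand-in, the sketch below uses plain Levenshtein edit distance (single-character insertions, deletions, and substitutions) with a hypothetical `max_edits` threshold, and shows that each of the Mississippi typos described above is a single edit away.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute, free when the characters match
            ))
        prev = curr
    return prev[-1]


def fuzzy_match(term: str, vocabulary, max_edits: int) -> list:
    """Return indexed words within max_edits edits of the search term (case-insensitive)."""
    term = term.lower()
    return [w for w in vocabulary if levenshtein(term, w.lower()) <= max_edits]


# Hypothetical words found in indexed email: dropped S, extra S, Q for P, and two other words.
WORDS = ["Misissippi", "Mississsippi", "Mississiqpi", "Missouri", "Memphis"]
print(fuzzy_match("Mississippi", WORDS, max_edits=1))
# ['Misissippi', 'Mississsippi', 'Mississiqpi']
```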

Why not leave fuzzy searching on at a high level? A higher level of fuzziness picks up the largest number of typographical and OCR deviations, but it also produces false hits. At a high enough level of fuzziness, a search for Mississippi will also pick up Missouri, so it's a trade-off.
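
The trade-off shows up with the same stand-in measure. Under plain Levenshtein distance, Missouri is 6 edits away from Mississippi, so a loose enough threshold admits it alongside the genuine typos. The helper and word list below are hypothetical, and the thresholds do not correspond to dtSearch's fuzziness levels.

```python
def levenshtein(a: str, b: str) -> int:
    """Single-character edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def matches_at(term: str, words, max_edits: int) -> list:
    """Words within max_edits edits of term; raising max_edits can only add words."""
    return [w for w in words if levenshtein(term, w) <= max_edits]


# Hypothetical indexed words: two genuine typos plus an unrelated state name.
WORDS = ["Misissippi", "Mississiqpi", "Missouri"]

for max_edits in (1, 3, 6):
    print(max_edits, matches_at("Mississippi", WORDS, max_edits))
# 1 ['Misissippi', 'Mississiqpi']
# 3 ['Misissippi', 'Mississiqpi']
# 6 ['Misissippi', 'Mississiqpi', 'Missouri']
```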

The next misconception is that text that is hard to see when a file is displayed in its associated application will be equally hidden from a search engine. If you open a standard file (PDF, Word, Excel, Access, PowerPoint, OneNote, etc.) in its native or associated application, white text on a white background, black text on a black background, and the like can be very hard to spot. But in the binary format, black on black or white on white is just as apparent as ordinary black-on-white text.
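
As an illustration for Word files, the sketch below pulls text out of a .docx package, which is a ZIP archive whose main text lives in `word/document.xml`. Characters are stored in `<w:t>` elements while color and other formatting sit in separate run-property (`<w:rPr>`) elements, so an extractor that reads only `<w:t>` sees white-on-white text like any other text. The function name and sample document are hypothetical, and a real extractor would also cover headers, footnotes, comments, and other parts of the package.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(data: bytes) -> str:
    """Return the text of every paragraph in word/document.xml, ignoring all formatting."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # Concatenate the <w:t> text of each run; <w:color> and similar
        # formatting live in <w:rPr> and are never consulted here.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```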

Likewise, certain metadata is easy to miss in an associated application, because it can take a lot of clicking around before you even realize it is there. But all metadata is equally apparent in the binary format of a file. Similarly, a file can contain an embedded document of which only a few lines are visible by default, yet the whole embedded file is readily accessible in the binary format. A search engine can also handle multilevel nested structures, such as an email with a ZIP or RAR attachment containing a Word document with an Excel spreadsheet embedded inside.
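
The recursion over nested containers can be sketched for the ZIP case with Python's standard `zipfile` module. This is a hypothetical helper, not dtSearch code; RAR archives and email formats would need other parsers and are not handled here. Because .docx and .xlsx files are themselves ZIP packages, the same recursion also descends into them and into embedded workbooks, which Word typically stores under `word/embeddings/`. The `max_depth` limit is an added safeguard against pathological nesting.

```python
import io
import zipfile


def walk_nested(name: str, data: bytes, depth: int = 0, max_depth: int = 10):
    """Yield (path, bytes) for every leaf file, descending into ZIP archives at any depth."""
    if depth < max_depth and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # Recurse: a member may itself be an archive (or a ZIP-based Office file).
                yield from walk_nested(f"{name}/{info.filename}", zf.read(info),
                                       depth + 1, max_depth)
    else:
        yield name, data
```

Each yielded leaf can then be handed to the appropriate text extractor, so content three or four levels deep is indexed just like a top-level file.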

About dtSearch. dtSearch has enterprise and developer products that run “on premises” or on cloud platforms to instantly search terabytes of “Office” files, PDFs, emails with nested attachments, databases, and online data. Because dtSearch can instantly search terabytes with over 25 different search features, many dtSearch customers are Fortune 100 companies and government agencies. But anyone with lots of data to search can download a fully functional 30-day evaluation copy from dtSearch.com.

RELATED: Kevin Price of the Price of Business show discusses the topic with Thede in a recent interview.
