Relevance of Topics
During the 1990s, CEOs realized that their companies' intellectual assets represented
the majority of corporate wealth. The discipline of knowledge management was
born to leverage these resources. Intellectual assets are primarily contained
in hardcopy document collections. Many of these collections were scanned, indexed,
OCRed, and placed on corporate intranets so that they could be used for competitive
advantage.
Businesses, particularly those in highly regulated sectors such as the pharmaceutical,
environmental, and transportation industries, generate hardcopy records that must be retrievable
to demonstrate compliance. Accurate document retrieval requires sufficient indexing.
Unfortunately, sufficient indexing requires a priori knowledge of future, unknown
requirements.
Many government applications need to retrieve and process hardcopy documents
on an ongoing basis. Law enforcement and national defense
organizations have a critical need to process hardcopy document content. In
many cases, documents must be exploited in near-real time for their
content to be actionable. Further, documents of interest to the Government tend
to be very noisy and often contain multiple handwritten annotations or other
marks.
Currently, the only viable solution is to retrieve and process the
content of OCRed documents. In virtually all of these examples, the cost, in either
time or capital, of correcting OCR output is prohibitive. Therefore, either OCR
accuracy must be improved, the ability to process noisy OCR must be improved,
or new, innovative techniques must be developed to process text in the image
domain. The ability to process hardcopy documents is a challenge of international
importance and an appropriate workshop topic for this CIKM Conference.