Hardcopy Document Processing Workshop
Conference on Information and Knowledge Management
CIKM 2004

November 12, 2004 - Hyatt Arlington Hotel - Washington D.C.

Relevance of Topics

During the 1990s, CEOs realized that their companies' intellectual assets represented the majority of corporate wealth. The discipline of knowledge management was born to leverage these resources. Intellectual assets are primarily contained in hardcopy document collections. Many of these collections were scanned, indexed, OCRed, and placed on corporate intranets to be used for competitive advantage.

Businesses, particularly those in highly regulated industries such as the pharmaceutical, environmental, and transportation sectors, generate hardcopy records that must be retrievable to demonstrate compliance. Accurate document retrieval requires sufficient indexing. Unfortunately, sufficient indexing requires a priori knowledge of future, unknown requirements.

Many government applications need to retrieve and process hardcopy documents on an ongoing basis. Law enforcement and national defense organizations have a critical need to process hardcopy document content. In many cases, documents must be exploited in near-real time for their content to be actionable. Further, documents of interest to the Government tend to be very noisy and often contain multiple handwritten annotations or other marks.

Currently, the only viable solution is to retrieve and process the content of OCRed documents. In virtually all of these examples, the cost of correcting OCR, in either time or capital, is prohibitive. Therefore, either OCR accuracy must be improved, the ability to process noisy OCR must be improved, or new, innovative techniques must be developed to process text in the image domain. The ability to process hardcopy documents is a challenge of international importance and an appropriate workshop topic for this CIKM Conference.