Microfilm Today


5. Indexing and retrieval

The indexing system adopted for a microfilm system will dictate how much time must be spent preparing documents for input. Documents in electronic formats usually contain sufficient machine-readable information for filing and retrieval, but paper may have to be sorted into batches and classified prior to input.

Creating an index from scanned film

Scanning a microfilm image produces a digital image which is really just a pattern of dots, so the words it contains cannot be searched until index data has been attached to it. An important advantage of this process is that the digital image created is difficult to edit, which makes it secure and suitable for use in evidence.

Indexing is a complex, time-consuming and expensive necessity which is too often underestimated when designing and scheduling systems and estimating costs. Too little indexing renders potentially valuable information useless, because it cannot be accessed as and when required, but over-elaborate indexing is an inexcusable waste of time and money.

One of the advantages of thorough preparation for the introduction of any microfilm system is that all potential users, and the ways in which they need access, will have been identified for each document type. Although there are many levels of sophistication available, most indexing systems fall into one of three broad categories.

Typical index types

Simple (flat-file) indexes rely on the card index concept. Each image or group of related images is allocated a unique index record and the terms by which it will be retrieved are listed on that record. An additional document with similar characteristics will also have a unique record. This can be effective for small applications but, because information common to many records is entered again and again, it is bulky and unsuitable for large systems.
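The card index idea can be sketched in a few lines of Python. This is only an illustration: the field names (roll, frame, customer, doc_type, date) and the sample values are our own assumptions, not part of any particular product. Note how the customer name has to be typed into every record.

```python
# Hypothetical flat-file index: one self-contained record per image,
# much like one card per document in a card index.
flat_index = [
    {"roll": "R001", "frame": 12, "customer": "Acme Engineering Ltd",
     "doc_type": "invoice", "date": "2017-03-01"},
    {"roll": "R001", "frame": 13, "customer": "Acme Engineering Ltd",
     "doc_type": "invoice", "date": "2017-03-08"},
    {"roll": "R002", "frame": 4, "customer": "Acme Engineering Ltd",
     "doc_type": "credit note", "date": "2017-04-02"},
]


def find(index, **terms):
    """Return every record whose fields match all of the given terms."""
    return [rec for rec in index
            if all(rec.get(key) == value for key, value in terms.items())]


def count_repeats(index, field):
    """Count how many times each value has been entered in one field."""
    counts = {}
    for rec in index:
        counts[rec[field]] = counts.get(rec[field], 0) + 1
    return counts
```

Calling find(flat_index, doc_type="invoice") returns the two invoice records, while count_repeats(flat_index, "customer") shows that the same name has been entered three times, which is the duplication that makes this approach bulky as the system grows.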

Relational databases were developed to overcome this unnecessary duplication of identical data. Michael Halvorson and Michael Young of Microsoft express the benefits very clearly. "Structuring your data in a relational format has a number of advantages. You'll save considerable time by not having to enter the same data again and again across many records. Your database will be smaller, often a fraction of the size of a flat-file database, saving space on your system and making the database more portable if you want to share it with others. Data entry errors will be greatly reduced - how many times can you type Thermodynamics Theory into a Class Name field without error? If the repeated data is stored in a related table, you need to enter the correct information just once; then, in the original table, you enter only the identifier of the information - usually a short numeric or alphanumeric code - each time the repeated data occurs. What you do need to understand is that a 'field' is a category of information, an 'entry' is the information that goes into a field for a single record, a 'record' consists of the related entries for an individual item in the database (and fills up a row within a table), and you can set up relationships between separate tables so that you need enter repeated information only once." We think that this is an excellent description.
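The same idea can be shown with a small sketch using the sqlite3 module from the Python standard library. The table and column names (customer, image_index, roll, frame) and the sample data are assumptions made for illustration only. The customer name is stored once, and each index record carries only a short numeric identifier.

```python
import sqlite3

# In-memory database, used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE image_index (
        roll        TEXT    NOT NULL,
        frame       INTEGER NOT NULL,
        doc_type    TEXT    NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        PRIMARY KEY (roll, frame)
    );
""")

# The repeated information is entered just once ...
conn.execute(
    "INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Engineering Ltd')"
)
# ... and each index record holds only its identifier.
conn.executemany(
    "INSERT INTO image_index (roll, frame, doc_type, customer_id) "
    "VALUES (?, ?, ?, ?)",
    [("R001", 12, "invoice", 1),
     ("R001", 13, "invoice", 1),
     ("R002", 4, "credit note", 1)],
)


def frames_for_customer(connection, name):
    """Return (roll, frame) pairs for one customer by joining the two tables."""
    rows = connection.execute(
        """
        SELECT i.roll, i.frame
        FROM image_index AS i
        JOIN customer AS c ON c.customer_id = i.customer_id
        WHERE c.name = ?
        ORDER BY i.roll, i.frame
        """,
        (name,),
    )
    return rows.fetchall()
```

A call such as frames_for_customer(conn, "Acme Engineering Ltd") brings back all three film locations even though the name appears in the database only once.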

The third broad type of indexing option is hypertext, which is familiar to anyone using the World Wide Web. It is best suited to research and knowledge management applications where the requirement is not so much for a specific record as for any or all records containing relevant information. It allows users to browse through hundreds or even millions of documents, hopping from one to another by clicking an underlined hyperlink. Its effectiveness depends on the quality of the editorial effort employed in setting up the links; for many applications that has proved difficult to automate.

Metadata: Every record in any system needs two types of data linked to it: data used to manage and control the document within the system, and data employed by users for search and retrieval purposes. Control may involve automatic distribution of new documents to those known to need them, the introduction of codes to limit access to authorised users only, methods of logging each reference to the document, maintenance of audit trails, data linking each stage in the development or amendment of a document, automatic movement from one storage medium to another at pre-arranged points in the document life cycle and possible subsequent automatic destruction. This control data is normally entered into fixed-length fields and held in a relational database.
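A minimal sketch of what control data might look like for one document is given below. The class name ControlRecord, its fields and the access codes are hypothetical; a real system would hold this information in database tables and would cover many more of the controls listed above. Only three are shown here: access codes, logging of each reference, and a destruction date.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ControlRecord:
    """Hypothetical control metadata held against a single document."""
    doc_id: str
    authorised_codes: set      # access codes allowed to view the document
    destroy_after: date        # pre-arranged end of the document life cycle
    audit_trail: list = field(default_factory=list)

    def request_access(self, user, code, on):
        """Log every reference to the document and report whether it is allowed."""
        allowed = code in self.authorised_codes
        outcome = "granted" if allowed else "refused"
        self.audit_trail.append((on.isoformat(), user, outcome))
        return allowed

    def due_for_destruction(self, today):
        """True once the pre-arranged destruction date has been reached."""
        return today >= self.destroy_after
```

Every call to request_access adds an entry to the audit trail whether or not access is granted, so the trail records refused attempts as well as successful ones.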

Data employed for search and retrieval can also be held in the same database if the terms that users will employ for retrieval can be identified. Invoices, for example, can normally be indexed by a limited number of fields such as date, number, customer name or ID, and total amount. Research and knowledge-type documents are not so easy to classify because it is difficult to predict which part of the content will be of future value and how the information will be requested; in such cases free text searching techniques can be employed. The concept relies on searching keywords, an abstract of the document, or its complete text.
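Free text searching is commonly built on an inverted index, which lists against every word the documents that contain it. The sketch below is a simplified illustration of that technique rather than a description of any particular product; the document identifiers and texts are invented, and the text could equally be a list of keywords, an abstract, or the complete text.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical abstracts keyed by film location (roll/frame).
documents = {
    "R010/0001": "Report on corrosion in steel bridge bearings",
    "R010/0002": "Survey of bridge lighting and drainage",
    "R011/0007": "Corrosion testing of aluminium cladding",
}
index = build_inverted_index(documents)
```

Searching for "bridge" finds both bridge documents, while "corrosion bridge" narrows the result to the one document containing both words. No fields had to be chosen in advance, which is the attraction of this approach for research material.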

The information above is greatly simplified and expert guidance is essential to ensure that the most appropriate method of control and indexing is adopted. The important point to note is that indexing requirements will greatly influence the choice of suitable microfilm reading equipment.


Webmaster: Gerald Baker     Last update 14/1/2018     G G Baker & Associates 2018