Indexing and retrieval
The indexing system adopted for a microfilm system dictates how much time must be spent preparing
documents for input. Documents in electronic formats usually contain
sufficient machine-readable information for filing and retrieval, but paper
may have to be sorted into batches and classified prior to input.
Scanning a microfilm image produces a
digital image, which is simply a pattern of dots. An
important advantage of this process is that it creates a secure digital image
which is difficult to edit and suitable for use in evidence.
Indexing is a complex, time-consuming and
expensive necessity which is too often underestimated when designing and
scheduling systems and estimating costs. Too little indexing renders
potentially valuable information useless, because it cannot be accessed as
and when required, but over-elaborate indexing is an inexcusable waste of
time and money.
One of the advantages of thorough preparation for
the introduction of any microfilm system is that all potential users, and the way
in which they need access, will have been identified for each document type.
Although there are many levels of sophistication available, most indexing
systems will fall into one of three broad categories.
Simple (flat-file) indexes rely on the card index concept.
Each image or group of related images is allocated a unique index record and the terms by which it
will be retrieved are listed on that record. An additional document with
similar characteristics will also have a unique record. This can be
effective for small applications but, because information common to many
records is entered again and again, it is bulky and unsuitable for large ones.
Relational databases have been developed to overcome this
unnecessary duplication of identical data. Michael Halvorson and Michael
Young of Microsoft express the benefits very clearly: "Structuring your data
in a relational format has a number of advantages. You'll save considerable
time by not having to enter the same data again and again across many
records. Your database will be smaller, often a fraction of the size of a
flat-file database, saving space on your system and making the database more
portable if you want to share it with others. Data entry errors will be
greatly reduced - how many times can you type Thermodynamics Theory into a
Class Name field without error? If the repeated data is stored in a
related table, you need to enter the correct information just once; then, in
the original table, you enter only the identifier of the information -
usually a short numeric or alphanumeric code - each time the repeated data
occurs. What you do need to understand is that a "field" is a category of
information, an "entry" is the information that goes into a field for a
single record, a "record" consists of the related entries for an individual
item in the database (and fills up a row within a table), and you can set up
relationships between separate tables so that you need enter repeated
information only once." We think that this is an excellent description.
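
The relationship described in the quotation can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the table names (class_name, index_record), the frame_no column and the sample values are hypothetical and are not taken from any particular microfilm product. The repeated text is stored once, and each index record carries only a short numeric identifier.

```python
import sqlite3

# Hypothetical schema: each class name is stored once in a lookup table and
# every index record refers to it by a short numeric identifier.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE class_name (
        class_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE
    );
    CREATE TABLE index_record (
        record_id INTEGER PRIMARY KEY,
        frame_no  INTEGER NOT NULL,   -- assumed: position of the image on the film
        class_id  INTEGER NOT NULL REFERENCES class_name(class_id)
    );
""")

# The repeated text is typed exactly once...
conn.execute(
    "INSERT INTO class_name (class_id, name) VALUES (1, 'Thermodynamics Theory')"
)
# ...and each index record stores only the identifier.
conn.executemany(
    "INSERT INTO index_record (frame_no, class_id) VALUES (?, ?)",
    [(101, 1), (102, 1), (103, 1)],
)


def frames_for_class(name):
    """Return the frame numbers indexed under the given class name."""
    rows = conn.execute(
        """SELECT r.frame_no
           FROM index_record AS r
           JOIN class_name AS c ON c.class_id = r.class_id
           WHERE c.name = ?
           ORDER BY r.frame_no""",
        (name,),
    ).fetchall()
    return [frame for (frame,) in rows]
```

A flat-file index would instead repeat the full class name on every one of the three records, which is the duplication the relational layout avoids.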
The third broad type of indexing option is hypertext,
which is familiar to anyone using the World Wide Web. It is best suited to
research and knowledge management applications when the requirement is not
so much for a specific record but for any or all records containing relevant
information. It allows users to browse through hundreds or even millions of
documents, hopping from one to another by clicking an underlined hyperlink.
Its effectiveness depends on the quality of the editorial effort employed in
setting up the links; for many applications that has proved difficult.
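
Browsing by hyperlink can be pictured as following a table of links from one document to the next. The Python sketch below assumes a hypothetical link table (the document IDs and links are invented for illustration) and reports which documents a user can reach from a starting point within a given number of clicks.

```python
from collections import deque

# Hypothetical link table: each document ID maps to the documents it links to.
links = {
    "doc-a": ["doc-b", "doc-c"],
    "doc-b": ["doc-d"],
    "doc-c": ["doc-d"],
    "doc-d": [],
    "doc-e": ["doc-a"],
}


def reachable(start, max_hops):
    """Return {document: clicks needed} for documents within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        doc = queue.popleft()
        if seen[doc] == max_hops:
            continue  # do not follow links beyond the click limit
        for target in links.get(doc, []):
            if target not in seen:
                seen[target] = seen[doc] + 1
                queue.append(target)
    return seen
```

In this sample nothing links to doc-e, so a user starting at doc-a can never browse to it however many links are followed. That is the practical sense in which the value of a hypertext index depends on how well the links were set up.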
Metadata: every record in
any system needs two types of data linked to it: data used to manage and
control the document within the system, and data employed by users for search
and retrieval purposes. Control may involve automatic distribution of new
documents to those known to need them, the introduction of codes to limit
access to authorised users only, methods of logging each reference to
the document, maintenance of audit trails, data linking each stage in the
development or amendment of a document, automatic movement from one storage
medium to another at pre-arranged points in the document life cycle and
possible subsequent automatic destruction. This data is normally entered
into fixed-length fields and held in a relational database.
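
A few of the control functions listed above (access limited to authorised users, logging of each reference, and a pre-arranged destruction date) can be sketched as a single control record. The class and field names below are hypothetical, and a real system would hold this data in database tables with fixed-length fields rather than in Python objects.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone


@dataclass
class ControlRecord:
    """Hypothetical control metadata held alongside one document."""
    doc_id: str
    authorised_users: set
    destroy_after: date
    audit_trail: list = field(default_factory=list)

    def request_access(self, user):
        """Check authorisation and log the reference whether or not it succeeds."""
        allowed = user in self.authorised_users
        self.audit_trail.append((datetime.now(timezone.utc), user, allowed))
        return allowed

    def due_for_destruction(self, today):
        """True once the pre-arranged point in the life cycle has been reached."""
        return today >= self.destroy_after
```

For example, a record created as ControlRecord("INV-0001", {"alice"}, date(2030, 1, 1)) would grant access to "alice", refuse "bob", and leave two entries in its audit trail.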
Data employed for search and retrieval can also be held in
the same database if the terms that users will employ for retrieval can be
identified. Invoices, for example, can normally be indexed by a limited
number of fields such as date, number, customer name or ID, and total
amount. Research and knowledge type documents are not so easy to classify
because it is difficult to predict which part of the content will be of
future value and how the information will be requested; in such cases free
text searching techniques can be employed. The concept relies on searching
keywords, an abstract of the document, or its complete text.
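
One common way to implement free text searching is an inverted index, which maps each word to the documents that contain it. The sketch below is a minimal illustration only: the sample texts are invented, and the tokenising rule (lower-case runs of letters and digits) is an assumption rather than a feature of any particular product. The same approach applies whether the text indexed is a keyword list, an abstract or the complete document.

```python
import re
from collections import defaultdict

# Hypothetical sample texts; these could equally be keywords or abstracts.
documents = {
    1: "Report on archival film storage conditions",
    2: "Storage costs for invoice archives",
    3: "Film reader maintenance schedule",
}


def tokenise(text):
    """Split text into lower-case words (assumed rule: letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document IDs in which it occurs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenise(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = tokenise(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_index(documents)
```

Note that this simple version matches whole words only, so a search for "archives" would not find a document that mentions "archival"; practical systems usually add stemming or similar refinements.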
The information above is greatly simplified, and expert
guidance is essential to ensure that the most appropriate method of control
and indexing is adopted. The important point to note is that indexing
requirements will greatly influence the choice of suitable microfilm reading