Frequently Asked Questions


1. What is CLARITweb and CLARIT?
2. What is currently available to search and browse?
3. How can I navigate through the digital collection?
4. Why are many of the documents skewed and/or appear with poor contrast?
5. How are documents queued and scanned?
6. How do the images become "searchable"?
7. When viewing CLARITweb: Results, what is the format of the displayed data?
8. What are Document Types?
9. What is a bundle?
10. Can I read the descriptions (i.e., Scope and Content Notes) for each subgroup, series, subseries, etc.?
11. Can I limit a search to primary resource material directly associated with Senator John Heinz?
12. How can I determine the physical location of the documents?
13. How can I view the corresponding ASCII text, metadata, quality assessments, archivist's notes and any transcriptions for a given image?


1. What is CLARITweb and CLARIT?

CLARITweb is an advanced search system from CLARITECH Corporation. CLARITweb provides concept-based, interactive retrieval for document collections based on their full-text content.

A CLARIT query is a prose description specifying the information you want to find. CLARIT uses natural language processing to identify the concepts in your query (words, phrases, document excerpts, whole documents) and return a ranked list of relevant documents containing those concepts. CLARIT can also use natural language processing to enrich your query with related concepts found in the documents most relevant to your query.
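As a rough illustration of what "return a ranked list of relevant documents" means, the sketch below scores a few sample documents against a prose query using simple term weighting. It is not CLARIT's actual method, which relies on natural language processing of phrases and concepts; the function names `tokenize` and `rank`, the sample document identifiers, and the sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank(query, documents):
    """Return (doc_id, score) pairs sorted from most to least relevant.

    Assumption: relevance is approximated by TF-IDF term overlap,
    a generic stand-in and not the CLARIT algorithm.
    """
    tokenized = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    n_docs = len(tokenized)

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for counts in tokenized.values():
        doc_freq.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, counts in tokenized.items():
        score = 0.0
        for term in query_terms:
            if term in counts:
                # Rarer terms weigh more; the +1 keeps every weight positive.
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += counts[term] * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents, for illustration only.
documents = {
    "memo-1": "Memo to Senator Heinz on steel trade policy",
    "speech-2": "Speech on health care for older Americans",
    "memo-3": "Memo on trade legislation and steel imports",
}
results = rank("steel imports", documents)
for doc_id, score in results:
    print(doc_id, round(score, 2))
```

Documents that share no terms with the query are left out of the list, and a document matching a rarer term ("imports") ranks above one matching only a common term ("steel").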


2. What is currently available to search and browse?

HELIOS provides electronic access to nearly 800,000 pages of material from the Heinz Congressional Papers. HELIOS was updated and completed on November 10, 1999. The bulk of the collection is available online.


3. How can I navigate through the digital collection?

Once a search is performed and a document is displayed, you have the ability to:


4. Why are many of the documents skewed and/or appear with poor contrast?

That's what the originals really look like! Remember, the papers were consulted frequently, distributed to many staff members, and crammed into briefcases, folders and notebooks. Many of the documents are poorly made photocopies; others, like JH Memos and JH Decision/Action Requests, were printed by the Heinz office staff on pink paper, which does not contrast very well.


5. How are documents queued and scanned?

Since we are attempting to digitize the record of Senator Heinz' congressional life and activities, we are scanning approximately one million pages which represent the majority of the collection. Scanning occurs once an entire series or subseries is processed, described and inventoried. Based on perceived usefulness and importance to researchers, the Heinz Archivist designates which series and subseries to scan.

HELIOS provides item by item access to the complete contents of material in folders. However, there are specific types of documents that are not scanned entirely; rather, only the first page or cover is scanned, and therefore, the physical folder in the H. John Heinz III Archives must be consulted to view the entire document. These documents often consist of bound reports and publications that present difficulty in scanning and constitute background material collected by the staff. Most government publications were removed during processing since these materials are available at government repositories. We replaced the original document with a photocopy of the cover or indicated that the document was removed.

If you view the metadata associated with the image, the <..SEE> tag indicates if the complete document can be found in the folder, or if you must consult a local repository to view the item.


6. How do the images become "searchable"?

ASCII text is extracted from the images using optical character recognition (OCR). The Heinz Archives' staff is transcribing selected portions of primary source documents that OCR could not read, including significant handwritten notes. The transcriptions are later indexed by CLARIT.


7. When viewing CLARITweb: Results, what is the format of the displayed data?

CLARITweb: Results displays descriptive information extracted from several database fields. This "title" (TI) incorporates the following information:

If you retrieve a document that indicates "[TOC] Name of subgroup, series, etc.", this is a hit on the description of a specific subgroup, series, subseries, folder, or document.


8. What are Document Types?

Since it is not common for archival material to have a defined or designated title (like books and journals do), we decided to classify the documents into types. This serves two purposes:


9. What is a bundle?

The [bundle] designation in the "title" refers to a distinct group of documents within a folder that were originally fastened together by paperclips, staples, or rubber bands, and often reflect inherent meaning. For example, memos to Senator Heinz often included attachments such as previous memos on the subject, speeches, correspondence, etc. This feature allows you to recognize this original context.


10. Can I read the descriptions (i.e., Scope and Content Notes) for each subgroup, series, subseries, etc.?

When a Browse Collection or Where am I? command is invoked, the hierarchy of the collection is displayed. Each (About...) link takes you to the written description of the unit of information (e.g., the series description).


11. Can I limit a search to material directly associated with Senator John Heinz?

You can use the CLARITweb: Limit Search page to limit your search to Principal Persona. Selecting "Yes" constrains the search to the set of documents believed by the Heinz Archives staff to be associated with Senator Heinz. These documents include memos and correspondence to/from him, speeches delivered by him, press releases issued by his office, handwritten notes by him, etc.


12. How can I determine the physical location or context of a particular document?

When viewing the image or text display, select the Where am I? button. This command displays the precise location of a particular document in the collection finding aid. Two views of the document are shown: 1) the location of the document in relation to its Subgroup, Series, Subseries, Box number, Subject, Folder Title, Folder Date, Folder Number, and Bundle; and 2) the location of the document in relation to the folder contents.
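To make the first view concrete, the sketch below models a document's place in the hierarchy and renders it as a single path. It only illustrates the idea and is not how HELIOS stores or displays locations; the `Location` class, the `location_path` function, and every sample value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Hypothetical record of where one document sits in the finding aid."""
    subgroup: str
    series: str
    box_number: int
    folder_number: int
    folder_title: str
    subseries: Optional[str] = None  # not every series has subseries
    bundle: Optional[int] = None     # only set when the document was in a bundle


def location_path(loc):
    """Join the hierarchy levels, outermost first, skipping missing ones."""
    parts = [loc.subgroup, loc.series]
    if loc.subseries:
        parts.append(loc.subseries)
    parts.append(f"Box {loc.box_number}")
    parts.append(f"Folder {loc.folder_number}: {loc.folder_title}")
    if loc.bundle is not None:
        parts.append(f"Bundle {loc.bundle}")
    return " > ".join(parts)


# Sample values are invented for illustration.
example = Location(
    subgroup="Sample Subgroup",
    series="Sample Series",
    subseries="Sample Subseries",
    box_number=12,
    folder_number=3,
    folder_title="Sample Folder",
    bundle=2,
)
path = location_path(example)
print(path)
```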


13. How can I view the corresponding ASCII text, metadata, quality assessments, archivist's notes and any transcriptions for a given image?

When viewing CLARITweb: Results or the image display, select View Text, then Re-Display. This view of the document displays contextual information captured during scanning: Title, Document Type, Document Date, Bundle Number, Folder Number, Subject (if applicable), Folder Title, Folder Date, Box Number, Subgroup name, Series name, Subseries name, etc.

For help or to post your comments, please contact helios-help@andrew.cmu.edu.


Updated August 23, 2000 -- http://diva.library.cmu.edu/HELIOS/FAQ.html
Gabrielle V. Michalek, Head of Archives/Digital Library Initiatives

Welcome to HELIOS

H. John Heinz III Archives

Carnegie Mellon University Libraries