Selecting content of interest is generally an iterative process of narrowing a set of candidate documents into a result set containing only documents of interest. All documents from the archive are selected in the initial (or default) result set. The total number of documents currently selected is shown in a count at the top of the left sidebar. The result set is refined by entering search terms and/or selecting facet values. Note that if the final result set is larger than your account threshold (typically set to 1,000), the downloadable dataset will be constructed from a random sample of the result set.
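The sampling rule can be pictured with a short Python sketch. This is an illustration only, not the site's implementation: the build_dataset helper is hypothetical, and the sketch assumes the sample size equals the account threshold, which the text above does not state explicitly.

```python
import random

def build_dataset(result_set, threshold=1000, seed=None):
    """Return the full result set if it fits the threshold, else a random sample.

    Assumption: the sample contains exactly `threshold` documents.
    """
    items = list(result_set)
    if len(items) <= threshold:
        return items
    rng = random.Random(seed)  # seed only makes the sketch repeatable
    return rng.sample(items, threshold)

# A result set of 5,000 hypothetical document ids is cut down to 1,000.
dataset = build_dataset(range(5000), seed=1)
```

Narrowing the result set below the threshold before submitting a request avoids sampling altogether.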
The selection process is carried out with the controls in the left sidebar. As selection criteria are defined using the search form or facets, they are added to a box at the top of the left sidebar, providing continuous feedback on the selection process. The content pane (the right two-thirds of the page) provides different views of the contents of the result set. A Summary tab provides top-level summary charts depicting the composition of the result set. The Results List tab presents the result set as a list with links to detailed document-level views of all provided data formats. The Key Terms tab aggregates the auto-extracted keywords from all documents in the result set into a tag cloud style presentation. The keywords in this tag cloud are selectable and can be used to narrow the result set. The Submit Data Request tab contains a form for submitting a dataset request for processing.
Full text and fielded searching is supported via the search input form. By default any search terms entered into the search form are applied to an entire document. Below the search term input box is a pull-down selector enabling the search term to be applied to specific fields, such as the document title or author names. Although the site provides a feature-rich search capability (which is described below in more detail), it is entirely possible to define a dataset using only point-and-click facet selections.
Below the search box is a series of controls for selecting document attributes, or facets. Selecting a facet value narrows the result set to only those documents containing the selected value. Selected facet values are highlighted in the facet list and also appear in the constraints box (labeled "Selection Criteria") located at the top of the left sidebar. Clicking a selected value in either place deselects it.
Immediately to the left of each facet value is a small gray box with an 'X' in it. This is a link which excludes the selection of documents with the associated attribute.
As search terms and facet values are specified they will appear in a new box entitled "Selection Criteria" located at the top of the left sidebar. The items in the selection criteria box are organized by facet (or field). Clicking on the item will remove the constraint from the selection criteria.
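The effect of selecting and excluding facet values can be sketched in Python. This is a simplified model, not how the site is implemented: the document records, the facet names (discipline, year) and the apply_facets helper are all hypothetical.

```python
def apply_facets(documents, include=(), exclude=()):
    """Keep documents that carry every included facet value and none of the excluded ones.

    `include` and `exclude` are sequences of (facet, value) pairs.
    """
    selected = []
    for doc in documents:
        has_all = all(value in doc.get(facet, []) for facet, value in include)
        has_none = not any(value in doc.get(facet, []) for facet, value in exclude)
        if has_all and has_none:
            selected.append(doc)
    return selected

# Hypothetical records: each facet maps to a list of values.
documents = [
    {"id": 1, "discipline": ["History"], "year": ["1990"]},
    {"id": 2, "discipline": ["History", "Law"], "year": ["2001"]},
    {"id": 3, "discipline": ["Biology"], "year": ["2001"]},
]

# Clicking a facet value corresponds to an include pair.
history = apply_facets(documents, include=[("discipline", "History")])

# Clicking the gray 'X' box corresponds to an exclude pair.
history_not_law = apply_facets(
    documents,
    include=[("discipline", "History")],
    exclude=[("discipline", "Law")],
)
```

Each additional pair can only shrink the selection, which is why the process is described as narrowing.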
Search query syntax
While the faceted search interface provides great utility for defining a result set using simple point-and-click interactions, occasionally more control over the selection process is needed. For these cases the search form supports a query syntax for full text and fielded searching. Below are some details on that syntax.
- A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases.
- Query terms are case insensitive and should only contain alphanumeric characters.
- A Single Term is a single word such as thomas or jefferson.
- A Phrase is a group of words surrounded by double quotes such as "thomas jefferson".
- Multiple terms can be combined together with Boolean operators to form a more complex query.
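As a rough illustration of how a query divides into single terms, phrases and operators, the following Python sketch tokenizes a query string. It is an assumption-laden toy, not the DFR parser: the tokenize function and its token labels are hypothetical, and symbol operators such as "+" and "-" are not handled.

```python
import re

# A double-quoted phrase, or a run of non-space characters.
TOKEN = re.compile(r'"([^"]*)"|(\S+)')

# Word operators are recognized only in upper case.
OPERATORS = {"AND", "OR", "NOT"}

def tokenize(query):
    """Split a query into (kind, text) pairs: phrase, operator, or term."""
    tokens = []
    for match in TOKEN.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            tokens.append(("phrase", phrase.lower()))
        elif word in OPERATORS:
            tokens.append(("operator", word))
        else:
            # Terms are case insensitive, so normalize them.
            tokens.append(("term", word.lower()))
    return tokens

tokens = tokenize('"thomas jefferson" AND Adams')
```

In this sketch a lower-case "and" is treated as an ordinary term, since only upper-case words are recognized as operators.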
The DFR index contains fielded data. When performing a search you may optionally specify a field by either
entering it in the search box or selecting it from the pull-down selector. The pull-down selector contains
a few commonly used fields. You can search any field by typing the field
name followed by a colon ":" and then the term you are looking for. As an example, to search for the single
term jefferson in the field title you would enter the text title:jefferson in
the search box. Similarly, to search for documents containing the phrase "thomas jefferson" in the title
you would enter the text title:"thomas jefferson".
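The field-colon-term pattern is simple enough to assemble programmatically. The Python helper below is a minimal sketch; fielded_query is a hypothetical name, and the only field used is title, which is the one named in this guide.

```python
def fielded_query(field, term):
    """Build a field:term query, quoting the term when it is a phrase."""
    term = term.strip()
    if " " in term:
        # Several words form a phrase, so wrap them in double quotes.
        term = f'"{term}"'
    return f"{field}:{term}"

single = fielded_query("title", "jefferson")
phrase = fielded_query("title", "thomas jefferson")
```

Here single holds title:jefferson and phrase holds title:"thomas jefferson", the two forms a user would type into the search box.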
The DFR search engine also supports single and multiple character wildcard searches. To perform a single
character wildcard search use the "?" symbol. To perform a multiple character wildcard search use the "*"
symbol. The single character wildcard search matches terms in which the "?" is replaced by any single
character. For example, to search for text or test you can use the search
te?t. Multiple
character wildcard searches match 0 or more characters. For example, to search for test, tests
or tester, you can use the search
test*. You can also use the wildcard searches in
the middle of a term, for instance
te*t. Note: You cannot use a * or ? symbol as the first
character of a search.
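The wildcard rules can be modelled by translating a pattern into a regular expression. The Python sketch below only illustrates the matching semantics described here; it is not how the search engine evaluates wildcards, and the wildcard_to_regex and matches helpers are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard term into a compiled, case-insensitive regex."""
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character of a search")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None

# te?t matches text and test; test* matches test, tests and tester.
found = [t for t in ("text", "test", "tests", "tester") if matches("test*", t)]
```

Matching against the whole term matters: te?t should not match a longer word that merely contains such a sequence.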
Boolean operators allow terms to be combined through logic operators. The DFR search engine supports AND, "+", OR, NOT and "-" as Boolean operators (Note: Boolean operators must be ALL CAPS).
- The OR operator is the default conjunction operator. This means that if there is no Boolean operator
between two terms, the OR operator is implied. The OR operator
matches documents where either term exists in the text of a single document. This is equivalent
to a union using sets. The symbol || can also be used in place of the word OR. To search for documents
that contain either thomas or jefferson use the query
thomas OR jefferson, or
thomas || jefferson.
- The AND operator links two terms and selects matching
documents only if both of the terms exist in a document. This is equivalent to an intersection
using sets. The symbol && can be used in place of the word AND. To search for documents that
contain both thomas and jefferson use the query
thomas AND jefferson or
thomas && jefferson.
- The "+" or required operator requires that the term after the "+" symbol exist somewhere in the
field of a single document. To search for documents that must contain thomas and may contain
jefferson use the query
+thomas jefferson.
- The NOT operator excludes documents that contain the term after NOT. This is equivalent to a
difference using sets. The symbol ! can be used in place of the word NOT. To search for documents
that contain jefferson but not thomas use the query
jefferson NOT thomas. Note: The NOT operator cannot be used with just one term. For example, the search
NOT thomas will not return any results.
- The "-" or prohibit operator excludes documents that contain the term after the "-" symbol. For
instance, to search for documents that contain "thomas jefferson" but not
title:correction use the query
"thomas jefferson" -title:correction.
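The set analogies used for OR, AND and NOT can be made concrete with Python sets. This sketch assumes a tiny, made-up inverted index that maps each term to the ids of the documents containing it; the index contents and the helper names are hypothetical, and fields and phrases are ignored.

```python
# Hypothetical inverted index: term -> ids of documents containing it.
INDEX = {
    "thomas": {1, 2, 4},
    "jefferson": {2, 3, 4},
}

def docs(term):
    """Return the ids of documents containing the term (case insensitive)."""
    return INDEX.get(term.lower(), set())

def op_or(a, b):
    return docs(a) | docs(b)   # union: either term

def op_and(a, b):
    return docs(a) & docs(b)   # intersection: both terms

def op_not(a, b):
    return docs(a) - docs(b)   # difference: a but not b

either = op_or("thomas", "jefferson")             # thomas OR jefferson
both = op_and("thomas", "jefferson")              # thomas AND jefferson
jefferson_only = op_not("jefferson", "thomas")    # jefferson NOT thomas
```

With this index, either is {1, 2, 3, 4}, both is {2, 4} and jefferson_only is {3}. In the same model the "-" operator would act as the same set difference as NOT.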
JSTOR is part of ITHAKA, a not-for-profit organization helping the academic community use digital technologies to
preserve the scholarly record and to advance research and teaching in sustainable ways.
©2000-2010 ITHAKA. All Rights Reserved. JSTOR®, the JSTOR logo, and ITHAKA® are registered trademarks of ITHAKA.