Library-Laboratory Collaboration for Research Data
Conceptual Model
Note: The conceptual model has been revised through work in a supplement to the original grant. A revised diagram and description are forthcoming.
The model that was developed through work with the Cornell Language Acquisition Laboratory has as its primary goal the creation of a single point of high-level discovery for widely differing data from a variety of disciplines, institutions and laboratories. This requires acknowledging that disciplines do not organize or understand their data in mutually compatible frameworks, and that this is unlikely to change. The innovative nature of research means that any given project may create new concepts or ontologies to describe its work, and this is a vital aspect of invention even though the highly variable result may be difficult to incorporate into a unified system. The hierarchized ontology model attempts to work within these restrictions by creating vertical bridges between data sets. The image of this model uses interlocking triangles to show how each node will incorporate only the "upper," most general information contained in the nodes below.
Level one of the model represents the cross-discipline discovery tool that is the end result of all of these bridges. It uses a general, non-discipline-specific ontology to provide information to the general searcher about types of research data, their languages, formats and availability. Interested searchers may follow links to more detailed information in the second level - the discipline node.
Level two represents a node for each incorporated discipline. Each discipline node will use a more specific ontology to give a view of research data that is more detailed and framed in a discipline-specific world-view. In addition to information about research projects, it may incorporate information about the discipline's research standards, etc. The data model of each discipline may be determined independently, and a bridge built to deliver the general information relevant to level one to the top-level node. The discipline node is likely to be the starting point for many searchers. Users interested in materials in this node should follow links to the third level - the institution node.
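As a rough illustration of such a bridge, the following Python sketch maps a discipline-level record onto a small general vocabulary. The field names, the `FIELD_MAP` table and the `bridge_to_level_one` function are all hypothetical, chosen only for illustration; the model itself does not prescribe them.

```python
# Hypothetical mapping from a discipline-level vocabulary to the general,
# level one vocabulary. All field names here are illustrative assumptions.
FIELD_MAP = {
    "subject_language": "language",
    "recording_format": "format",
    "access_policy": "availability",
    "study_type": "data_type",
}


def bridge_to_level_one(discipline_record: dict) -> dict:
    """Return only the general fields, renamed to the level one vocabulary.

    Discipline-specific detail (anything not listed in FIELD_MAP) stays
    in the discipline node and is not passed upward.
    """
    general = {
        general_name: discipline_record[local_name]
        for local_name, general_name in FIELD_MAP.items()
        if local_name in discipline_record
    }
    # Keep a link back so a searcher can follow it down to level two.
    general["detail_link"] = discipline_record.get("node_url")
    return general
```

Because the mapping is a plain table, each discipline could maintain its own version without changing the level one vocabulary, which is one way to keep the data models independent.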
Level three represents a node for each institution's research department. The level three node may use a more detailed or more institution-specific view of research activities. It should introduce information about departmental relationships, labs, staff, standards, etc. A bridge will be necessary to send the information relevant to the discipline as a whole to level two.
Level four represents a node for each research lab or project. This will be the most variable node, often including data that is publicly available, data available on request to interested parties, and/or private data. Whenever possible, documentation of procedures and standards followed should be available. At this level, any standardization of formats that can be accomplished should be done, as truly unique data formats are far less likely to remain comprehensible over time. Still, this level can be expected to house a wide variety of formats. As this level represents the bulk of the research data, it is most important to house this node in a secure preservation environment. A bridge must be built, not to deliver the research data itself, but to deliver metadata about the research to the level three node.
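The metadata-only bridge from level four might be sketched as follows. This is a minimal illustration under stated assumptions: the `LabDataset` fields, the `Access` categories (taken from the three kinds of availability named above) and the `bridge_to_level_three` function are hypothetical, and the sketch assumes that descriptive metadata is passed upward even for private data, which the model does not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    PUBLIC = "public"
    ON_REQUEST = "on request"
    PRIVATE = "private"


@dataclass
class LabDataset:
    """A hypothetical level four record; field names are assumptions."""
    title: str
    data_format: str
    access: Access
    storage_path: str                     # location of the data itself
    procedures_doc: Optional[str] = None  # documentation, when available


def bridge_to_level_three(dataset: LabDataset) -> dict:
    """Build the metadata record sent upward to the level three node.

    The storage path and the data it points to are deliberately omitted:
    only a description of the research leaves the lab node.
    """
    return {
        "title": dataset.title,
        "format": dataset.data_format,
        "availability": dataset.access.value,
        "has_procedures_doc": dataset.procedures_doc is not None,
    }
```

Keeping the storage location out of the outgoing record means the preservation environment for level four can change without affecting the nodes above it.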

