I. Introduction
Although there are many free data sources on the
Internet, some of the free compilations lack the kind of rigorous
quality control found in data sets available in the commercial
sector. Various types of scientific and technical data available
on the Internet were surveyed and demonstrated during the talk
at NIST. Without standards, questions of accuracy and reliability
of Internet data invariably arise.
In a comment made on CHMINF-L on October 30, 1996,
David Lide said, "All in all, the chemical data now available
on the web is in a different class from the data found in refereed journals, critical reviews and books from reputable
publishers." The comment was not intended to flatter the
compilers of Web data sources.
A survey was undertaken in October 1996 to determine
whether steps are needed to improve the quality of
data on the Web. Questions sent to CHMINF-L and CHEMWEB were
intended to:
The accuracy of the data on the Web was roundly criticized
by some. They pointed out that units are frequently omitted and
transcription errors are often encountered. Respondents noted
that very few sources on the Web have quality assurance statements,
few give the source of the data, and if they do, they often indicate
that the data are copied from outdated sources. Therefore, a
need was expressed for a minimal level of auxiliary information
(metadata) providing at least such information as authorship,
units, conditions of measurement, and references to primary and
secondary sources of data. Furthermore, standard symbols and
terminology ought to be used in the compilations, some of which
suffer from a lack of guidelines on how to handle special characters.
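The minimal metadata that respondents asked for (authorship, units, conditions of measurement, and references) can be pictured as a record that is checked before it is published. The following is only a sketch under stated assumptions: the class name, field names, and the placeholder author and reference are hypothetical and do not come from the survey or from any existing standard.

```python
from dataclasses import dataclass, field
from typing import List

# Metadata fields named by the survey respondents (the field names are our own).
REQUIRED_FIELDS = ("authorship", "units", "conditions", "references")


@dataclass
class DataRecord:
    """One data value plus the auxiliary information that should accompany it."""
    property_name: str
    value: float
    authorship: str = ""
    units: str = ""
    conditions: str = ""
    references: List[str] = field(default_factory=list)


def missing_metadata(record: DataRecord) -> List[str]:
    """Return the names of required metadata fields that are empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


# A bare number, as often found on the Web: every metadata field is missing.
bare = DataRecord("normal boiling point of water", 100.0)

# The same value with the minimal metadata filled in.
# The author and reference strings are placeholders, not real citations.
documented = DataRecord(
    "normal boiling point of water",
    100.0,
    authorship="A. Compiler (hypothetical)",
    units="degC",
    conditions="pressure 101.325 kPa",
    references=["placeholder: primary source citation goes here"],
)

print(missing_metadata(bare))
print(missing_metadata(documented))
```

A check of this kind says nothing about whether the value itself is correct; it only flags records that omit the information a reader would need to judge it.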
In light of the above criticism, the following steps
were suggested to improve data on the Web:
At a minimum, compilers ought to provide descriptions
of physical theories on which data are based, full references
to literature, and descriptions of the format of the database
and its search capabilities.
There are some standardization efforts underway on
the net, particularly on the publishing side, where the CLIC project,
Chemical MIME, and CML were noted as hopeful signs. Although
one person questioned whether standardization was worthwhile,
in general, there was seen to be a role for IUPAC, CODATA, or
other bodies in the area of data certification.
III. Finding Data on the Web
A quote from a recent computer journal points out
one of the fundamental problems of using the Web to obtain data:
"While some might argue that the Internet is designed to
make information in a single location accessible to users around
the world, the large number of mirrored sites already in existence
points out the Net's inadequacy." (Byte, December
1996, p. 116)
One respondent to the survey noted the relevance
of Lebedev's study of Internet search engines (http://www.chem.msu.su/eng/comparison.html).
Lebedev ran keyword searches for data on 11 Web search engines.
He concluded that Excite retrieves a comparable number of documents
to Altavista and that Metacrawler is the most powerful search
engine for scientific and technical information. He compared
his Internet searches to INSPEC results for the same information
covering 1994 and 1995 and found that only 5-10% of the relevant
information is on the net. However, Lebedev considers the Web
to be good for supplemental information on authors, on their work
and research projects, and on foundations supporting them.
It is possible to find data on the Internet by following some generally accepted procedures, such as:
Lists of Sources (Guides)
Known Sources
http://www.shef.ac.uk/~chem/chemputer/
http://dragon.labmed.umn.edu/~lynda/index.html
Comprehensive Chemistry Guides
http://chemfinder.camsoft.com
http://schiele.organik.uni-erlangen.de/services/webmol.html
http://www.tripos.com/spacecrunch/
Other Examples
http://www.lib.utexas.edu/Libs/Chem/info/thermodex/
http://funnelweb.utcc.utk.edu/~athas/databank/intro.html
Internet Demos
http://www.indiana.edu/~cheminfo/ca_accc.html
Go to the Analytical Chemistry page, then to MS Links at SIS, then Dave's Math Tables
http://www.sisweb.com/math/tables.htm
http://micro.ifas.ufl.edu/
Plays "Happy Birthday to You" on an NMR Spectrometer!
http://xray.uu.se/hypertext/corexdb.html
SEARCH naphthalene
http://alfred.niehs.nih.gov/LMB/stdb
ENTER THE DATABASE doesn't work, but HIPPO does
http://emrs.chm.bris.ac.uk/
Beautiful background!
In "About the Database" in the Introduction, Spectra examples,
Show the example Cu(II) (nothing else works!)
http://www.cica.indiana.edu/~recip/
http://www.indiana.edu/ReciprocalNet.html
http://molbio.info.nih.gov/cgi-bin/pdb
Search dehalogenase (E.C.3.8.1.5)
http://webbook.nist.gov/chemistry
Look for 91-56-5
http://ozone.sph.unc.edu
Has "Environmental Data", but it's "under construction"
http://www.lib.utexas.edu/Libs/Chem/info/thermodex/
Search Gibbs Free Energy and organic
http://chemfinder.camsoft.com
Search MEK
http://schiele.organik.uni-erlangen.de/services/webmol.html
Search MEK, then 2-butanone
http://www.tripos.com/spacecrunch/
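One of the demo searches above uses the identifier 91-56-5, which has the form of a CAS Registry Number. Such numbers end in a check digit, so a simple test can catch many of the transcription errors that survey respondents complained about. The sketch below assumes the standard CAS check-digit rule (each digit before the check digit is weighted by its position counted from the right, and the sum modulo 10 must equal the check digit); the function name is our own.

```python
import re


def cas_check_digit_ok(cas_rn: str) -> bool:
    """Return True if cas_rn is well formed and its check digit is consistent."""
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas_rn)
    if match is None:
        return False
    # All digits except the final check digit, read right to left,
    # are weighted 1, 2, 3, ...
    digits = match.group(1) + match.group(2)
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


print(cas_check_digit_ok("91-56-5"))  # the identifier used in the demo
print(cas_check_digit_ok("91-65-5"))  # same digits with two transposed
```

The check only shows that an identifier is internally consistent. It cannot confirm that the number belongs to the intended compound or that the data attached to it are accurate.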