Internet Resource Discovery Tools and Services for Chemistry

G. Wiggins

Indiana University

Keywords: Internet directories guides chemistry resources

Abstract: The number of Internet resources of interest to chemists has multiplied dramatically in recent years. In order to take full advantage of the Internet, chemists must first discover what resources are available and how to use them. An overview of the major Internet resource discovery tools, both general tools that include chemistry and tools developed especially for chemistry, is given in this article. Some chemistry Internet resources are described, and predictions are made about further developments on the Internet.

1. Introduction

Use of the Internet has spread rapidly throughout the world. With a multitude of resources rivaling in some respects the world's most renowned traditional libraries, the Internet has been touted as everyman's reference library. It has been estimated that in May 1994 alone, "800 gigabytes--the equivalent to 2300 Encyclopedia Britannicas--of information traveled over the Web." (Ref 1). Which among the vast numbers of Internet resources are likely to be viewed by chemists as important? A few authors of journal articles have attempted to provide guidance to chemists seeking to mine the Internet's riches, and there is at least one book in production which is aimed at chemists (Ref 2).

Varveri, in a paper entitled "Information Retrieval in Chemistry," concentrates on free resources available via BITNET or Internet (Ref 3). Written before the development of tools such as Mosaic or Cello, the article covers electronic mail lists, FTP (File Transfer Protocol), Telnet, and search tools (e.g., Archie and HYTELNET). Varveri's work serves as a good brief tutorial for gaining access to the Internet, providing information on important discussion lists and giving the most commonly used commands for LISTSERV resources, FTP, and Telnet.

The basic uses of the Internet from the standpoint of a chemist are also surveyed by Wolman (Ref 4). He mentions some chemistry resources that arrived on the scene after the Varveri paper was published. Wolman poses the question, "How can one find out about resources on Internet in general, and about the chemistry [resources] on Internet in particular?" After suggesting some Internet resources to help in this regard (Archie, Gopher, WAIS, and WWW), the author refers the reader to three popular general printed books on the Internet for further assistance.

The focus of Heller's "Analytical Chemistry Resources on the Internet" is narrower than that of the previous authors (Ref 5). Nevertheless, he notes that "...trying to write about the [Internet] resources available to the analytical chemistry community is like mud wrestling--a real mess." Heller concludes that the best way to find chemistry resources on the Internet is from lists of resources collected by someone and put on the network.

In order to make the most of Internet resources, a chemist needs to be able to survey the available resources and quickly sift out those of potential interest. However, as pointed out by Parker at the June 1994 National Chemical Information Symposium, there is no all-encompassing Books in Print for the Internet (Ref 6). In spite of the lack of a truly comprehensive directory, there are certain tools to help unearth the real Internet chemistry jewels. Among those are general directories of Internet resources, including guides that list resources by type (e.g., gopher, WWW, or WAIS), and a few directories developed especially for chemistry and related disciplines.

2. General Directories with Chemistry Resources

One of the more interesting facets of the Internet is that some of the resource guides that started out (and in most cases, remain) as free resources on the Internet now have printed commercial counterparts. These are aimed at both individuals and libraries and serve the same purpose as any other guide or directory--to put people in touch with a needed information resource. General Internet resource directories allow the user to find many different types of Internet tools (e-mail lists, databases, WWW or gopher servers, etc.). The disadvantage in using them is that the selection criteria used by the compilers may not be obvious. Let us look first at general Internet directories that are also available as books.

An up-to-date printed tool is the Directory of Electronic Journals, Newsletters, and Academic Discussion Lists (Ref 7). The fourth edition of the Directory is available commercially, either in paper or on diskette, with nearly 1800 scholarly discussion lists and some 440 electronic journals, newsletters, and related titles. It is based in part on Michael Strangelove's "Directory of Electronic Journals and Newsletters," but the bulk of the entries in the Directory are taken from Diane Kovacs's "Directory of Scholarly Electronic Conferences."

The Internet version of Strangelove's "Directory of Electronic Journals and Newsletters" may be retrieved via FTP from the CONTENTS Project FTP archive at the node panda1.uottawa.ca as file electronic-serials-directory.txt in the /pub/religion directory. It is also available via LISTSERV from listserv@uottawa or listserv@acadvm1.uottawa.ca as the files EJOURNL1 DIRECTRY and EJOURNL2 DIRECTRY. (See the following entry for details on using the LISTSERV GET command.) Strangelove's "Directory" encompasses all electronic journals and newsletters, including e-serials that are being produced over the commercial networks. It even includes "Hypercard Stacks, Digest-Newsletters, and Others," containing significant sources of information that are similar in nature to journals and newsletters.
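The anonymous FTP retrieval described above can be sketched with Python's standard ftplib module. This is an illustration only: the host, directory, and filename are the ones given in the text and are unlikely to be reachable today, and the function names (retr_command, fetch_anonymous) are invented for the sketch.

```python
from ftplib import FTP

# Values quoted from the text; the 1994 host may no longer answer.
HOST = "panda1.uottawa.ca"
DIRECTORY = "/pub/religion"
FILENAME = "electronic-serials-directory.txt"


def retr_command(filename):
    """Return the FTP command that requests a single file."""
    return f"RETR {filename}"


def fetch_anonymous(host, directory, filename, dest_path, timeout=30):
    """Log in anonymously, change directory, and save one file locally."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()            # no arguments requests an anonymous login
        ftp.cwd(directory)     # same effect as typing "cd" in an FTP client
        with open(dest_path, "wb") as out:
            ftp.retrbinary(retr_command(filename), out.write)
    return dest_path
```

Calling fetch_anonymous(HOST, DIRECTORY, FILENAME, "directory.txt") would attempt the transfer; nothing is contacted until that call is made.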

The Internet version of Kovacs's "Directory of Scholarly Electronic Conferences," also known as ACADLIST, may be retrieved via anonymous FTP to KSUVXA.KENT.EDU; once connected, use the command "cd library" to reach the files. Filenames in the library directory may be viewed with the command "ls," which will show separate files for the Physical Sciences, the Biological Sciences, etc. All of these files have filenames like ACADLIST.FILE#, where the number distinguishes the subject matter of the files. These are explained in the file ACADLIST.README. The files may also be retrieved by sending a message to LISTSERV@KENTVM or LISTSERV@KENTVM.KENT.EDU. Leave blank the subject and other information lines of your e-mail header, and put in the body of the message:
GET Filename Filetype f=mail
For example, the message:
get acadlist file6 f=mail
will retrieve the Physical Sciences list. Amey Park (apark@avs.kentvm.kent.edu) and Jolene Miller (jmiller@kentvm.kent.edu) are the compilers of this section, which includes "Chemistry, Chemical Engineering, and Materials Research."
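The LISTSERV GET request described above is a single line of text, so it is easy to generate. The following minimal sketch builds that line; the function name listserv_get is hypothetical, and only the "GET Filename Filetype f=mail" pattern comes from the text.

```python
def listserv_get(filename, filetype, delivery="mail"):
    """Build the one-line message body for a LISTSERV file request."""
    for part in (filename, filetype, delivery):
        if not part or " " in part:
            raise ValueError("each part must be a single non-empty word")
    return f"GET {filename} {filetype} f={delivery}"


# The example from the text: the Physical Sciences list.
body = listserv_get("acadlist", "file6")
print(body)
```

The resulting line goes in the body of a message addressed to the LISTSERV address, with the subject left blank as the text instructs.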

The Kovacs "Directory" is available on many gophers and WWW servers. One place to find it is at the University of Melbourne's Austin Hospital:
URL: http://www.austin.unimelb.edu.au
Select "resources.html," then "Directory of Scholarly E-conferences." This site features a keyword-searchable index of the directory. In August 1994, searching "chemistry" retrieved 45 entries. It also has the latest Yanoff list (see below) as inet.services.html and as yanoff.html, but not the Strangelove "Directory."

The Internet Compendium: Guides to Resources by Subject is another soon-to-be-published work built on a highly successful Internet resource (Ref 8). In this case, the guides assembled as the "Clearinghouse for Subject-Oriented Internet Resource Guides" at the University of Michigan form the basis for the book. Louis Rosenfeld (lou@umich.edu) is the Director of the Clearinghouse. Clearinghouse directories are accessible via Gopher .link file at URL: gopher://una.hh.lib.umich.edu:70/10/inetdirs. Via anonymous FTP, the files can be found in the path inetdirsstacks at host una.hh.lib.umich.edu. A key to the names of the files is in the file .README-FOR-FTP. WWW access to the Clearinghouse is via
URL: http://www.lib.umich.edu/chhome.html or
URL: http://http2.sils.umich.edu/~lou/chhome.html.
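Gopher URLs such as the Clearinghouse address above pack several pieces of information into one string. The sketch below splits one apart, assuming the layout later standardized in RFC 1738, where the first character after the port is a one-character item type and the rest is the selector sent to the server. The function name and the small type table are illustrative, not part of any of the tools described here.

```python
from urllib.parse import urlsplit

# A few commonly used Gopher item types (illustrative subset).
GOPHER_TYPES = {"0": "text file", "1": "directory", "7": "search", "h": "HTML document"}


def parse_gopher_url(url):
    """Split a gopher URL into host, port, item type, and selector."""
    parts = urlsplit(url)
    if parts.scheme != "gopher":
        raise ValueError("not a gopher URL: " + url)
    path = parts.path[1:]                  # drop the slash after host:port
    item_type = path[0] if path else "1"   # an empty path means a directory
    return {
        "host": parts.hostname,
        "port": parts.port or 70,          # 70 is the default Gopher port
        "type": item_type,
        "selector": path[1:],
    }


info = parse_gopher_url("gopher://una.hh.lib.umich.edu:70/10/inetdirs")
print(info["host"], info["port"], GOPHER_TYPES.get(info["type"]), info["selector"])
```

Under this reading, the Clearinghouse URL names a directory (type "1") with the selector "0/inetdirs".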

Chemistry guides in the Clearinghouse for Subject-Oriented Internet Resource Guides include the Park and Miller "Chemistry, Chemical Engineering, Materials Research" list mentioned above, as well as the guides by Anderson-Colon and Wiggins that are discussed later in this paper.

ON INTERNET 94 is subtitled "An International guide to electronic journals, newsletters, texts, discussion lists, and other resources on the Internet" (Ref 9). With approximately 6,000 entries, it dwarfs other printed guides to Internet resources. An interesting section of the book is the small chapter on "Commercial Services," where information about everything from online search vendors to book and journal publishers can be found. The preface contains a fascinating admission that the book was to have been published in mid-1993 with the title ON INTERNET 93. However, the enormity of the task prevented that and gave the editor "...a sober admiration for all who attempt to catalog and quantify the ocean of information that is the Internet." Apparently he was undeterred by the initial experience since a request for updating of information in preparation for ON INTERNET 95 was issued in August 1994. There is no Internet counterpart to this work and apparently no plans by the publisher to produce an Internet version.

Scott Yanoff's list of "Special Internet Connections," one of the most popular compilations of this type, has a relatively small number of science/math/statistics listings. It is available only on the Internet via:

The "WWW Virtual Library: Chemistry" (http://www.cern.ch/) is part of the World Wide Web developed at CERN by Tim Berners-Lee (Ref 10). It points to resources arranged according to subject area, with lists for chemistry, biosciences, and other areas of interest to chemists. On August 11, 1994, there were 38 WWW servers, 21 Gopher servers, and 9 FTP servers listed for chemistry. Many more were available for the biosciences and chemical engineering areas.

One way to access the "WWW Virtual Library" and many other interesting Internet resources is through the Global Network Navigator (GNN). GNN is the Internet equivalent of a trade journal that one obtains free by filling out a card and returning it to the publisher. It is first necessary to register for GNN at http://nearnet.gnn.com and learn the URL of the closest GNN server. GNN is one of the best organized WWW access points on the Internet. Figure 1 shows the GNN screens for the Chemistry section and the WWW Virtual Library: Chemistry.

Another very well organized source is BUBL. Originally the "Bulletin Board for Librarians," BUBL is no longer limited to those in the library and information professions. Nevertheless, the librarian's touch is very evident in the well considered arrangement of BUBL. It provides access to and information and guidance on a wide range of services and resources available on academic networks. The BUBL Subject Tree features both Gopher and WWW resources and presents them both by subject terms and by broad UDC classification numbers. In August 1994, selection "54 - Chemistry" on the BUBL Composite Gopher and WWW Subject Tree led to 41 resources. BUBL can be accessed by Telnet at BUBL.BATH.AC.UK or 138.38.32.45 login: bubl, by gophering to BUBL.BATH.AC.UK (138.38.32.45), or via WWW at http://www.bubl.bath.ac.uk/BUBL/home.html. A sample of the BUBL entries is found in Figure 2.

3. Panning for Gold in the Internet Ocean: Current Awareness Tools

There are a number of tools on the Internet to help learn of new resources shortly after they become available. These have the advantage of letting the user decide which items may be of interest at or near the time they are placed on the Internet. The disadvantage is that a large number of irrelevant items may have to be scanned in order to find the truly useful resources. Some of these awareness tools automatically mail announcements of new resources to subscribers, whereas others require more direct action by the user to discover appropriate resources.

In the United States, the National Science Foundation has contracted with three companies to provide and/or coordinate services for the NSFNET community. Those companies provide information services, directory and database services, and registration services for the NSFNET Internet Network Information Center (InterNIC). Gleason Sackman (sackman@plains.nodak.edu) is the InterNIC "net-happenings" moderator. Since the list covers all types of Internet resources in all subject areas, expect a large number of items to come your way each day if you subscribe to net-happenings. To subscribe, send the request SUBSCRIBE NET-HAPPENINGS Firstname Lastname to: listserv@is.internic.net.

Washington and Lee University has established a reputation as the best single source of new listings by type of resource (WWW, Gopher, Telnet, WAIS). By pointing your gopher at liberty.uc.wlu.edu and following the path Explore Internet Resources > New Internet Sites, you will find hundreds of resources arranged by type of site. For gopher users to be able to utilize the "New WWW sites" option, the gopher client must be able to deal with gopher type "h" which causes a shell off to a WWW browser, thus enabling the link to the home-pages (Ref 11). This might help prevent a chemist from becoming overly excited about finding a source like Organic Online, which turns out to be a business having nothing to do with chemistry.

"Gopher-Announce" is a low-volume, moderated mailing list. Mark P. McCahill (mpm@boombox.micro.umn.edu) is currently in charge of gopher-announce. "Low volume" in this case means that the an- nouncements appear once a week and may contain from a few to several dozen new gophers. This is not a LISTSERV resource, so requests to subscribe should be sent as an e-mail message (with the word "subscribe" somewhere in the body of the text) to: gopher-announce-request@boombox.micro.umn.edu.

New telnet remote login connections on the Internet can be gleaned from HYTEL-L. It distributes updates and new additions to Peter Scott's (aa375@freenet.carleton.ca) HYTELNET system (Ref 12). Send your subscription request, SUBSCRIBE HYTEL-L Firstname Lastname, to: listserv@kentvm.bitnet. A UNIX/VMS version of HYTELNET is available by telnetting to access.usask.ca login: hytelnet [requires lower case]. Intended initially as a resource to identify online library catalogs, HYTELNET now has an "Other Telnet-accessible resources" branch that includes "Databases and bibliographies," "Fee-Based Services," and "NASA Databases." HYTELNET is available for IBM PC, UNIX, VMS, and Macintosh systems.

"New-list" distributes announcements of public discussion lists directly to thousands of individuals and to various "list of lists" compilers. Marty Hoag (nu021172@vm1.nodak.edu) is the new-list moderator. To subscribe, send the message: SUBSCRIBE NEW-LIST Firstname Lastname to: listserv@ndsuvm1.bitnet or list- serv@vm1.nodak.edu.

4. Using Chemistry Gophers and WWW Home-Pages

In a sense, every gopher or WWW home page that is devoted to chemistry and has links to Internet resources is a "directory." Each chemistry gopher owner or WWW administrator has presumably evaluated the Internet chemistry resources and selected those that are most appropriate for the local clientele. However, the particular subject orientation of a given site will often not be obvious from the title. Rzepa, in a summary of the Chemistry Workshop at the WWW94 conference, noted this problem as well as the cyclical and recursive nature of many existing chemical servers (Ref 13). He also reported that a chemistry server will usually give no indication whether it looks out toward other Internet chemistry resources or provides a gateway for the world to look in at original material at the home institution. Thus, surfing the Internet waves by hopping from one to another of the chemistry sites can be both rewarding and frustrating. It is instructive to examine a few of the chemistry servers to see how they might help locate resources on the Internet.

Texas A&M University's Gopher has a typical list of chemistry resources that can be found by gophering to gopher.tamu.edu. Follow the path Browse Information by Subject > Chemistry, and you will find links to everything from the "Periodic Table of the Elements" to a searchable "Buckyball Database" for fullerene bibliographic information. Another entry is "Listservs in Chemistry, Chemical Engineering, and Materials Research" (the Park and Miller directory discussed above).

The Library of Congress Gopher's chemistry section can be accessed by gophering to marvel.loc.gov and following the path Global Electronic Library (by Subject) > Natural Science > Chemistry.

One encounters some of the same resources found on the Texas A&M gopher, as well as "Guides to Chemistry Resources on the Internet." Choosing that path leads to entries for the Park and Miller guide, plus the Anderson-Colon and Wiggins guides discussed below.

Northern Illinois University's gopher is found at hackberry.chem.niu.edu. It can also be accessed via WWW at http://hackberry.chem.niu.edu:70/0/webpage.html. The administrator, Steven Bachrach (admin@hackberry.chem.niu.edu), has compiled a list of chemistry WWW and gopher sites. The NIU site is a hybrid of Rzepa's typology. In addition to Internet resources at other locations, it includes such locally produced resources as the "Quantum Chemistry Acronyms Database," guidelines for submission of articles to chemistry journals, and chemistry stock prices.

Another example of a hybrid gopher is the Indiana University Chemistry Library Gopher (gopher lib-gopher.lib.indiana.edu and follow the path Subject Approach to IU Libraries and Internet Resources > Chemistry Library Gopher). A link to the American Chemical Society Gopher is found at the IU site, as are such locally mounted resources as the Clearinghouse for Chemical Information Instructional Materials and guides to various printed and computer-based chemistry tools at Indiana University.

5. Guides and Directories Aimed at Chemistry

In this section, we shall look at lists and other tools that were developed specifically to lead chemists to all types of Internet resources. See the entry points above for The Internet Compendium (Clearinghouse for Subject-Oriented Internet Resource Guides) for information on finding some of the resources in this section.

Lorraine Anderson-Colon's "Chemistry Internet Pathfinder" takes the approach of a typical library pathfinder intended to lead one from a basic definition of the field to resources that will broaden the knowledge of the user. It is somewhat limited in scope of coverage and excludes chemical engineering, biochemistry, and other cross-disciplinary areas.

"Some Chemistry Resources on the Internet" is compiled by Gary Wiggins. The list has grown from 60 entries when first produced in October 1993 to over 140 in the 7th version released in June 1994. The latest version of "Some Chemistry Resources on the Internet" can always be found on the American Chemical Society Gopher (acsinfo.acs.org or 134.243.230.66), among the Clearing- house for Subject-Oriented Internet Resource Guides, or on the Indiana University Chemistry Library Gopher, although various versions are found in many other locations on the Internet. The popularity of this resource can be judged from the fact that in July 1994 alone, 514 people accessed it at the Clearinghouse site.

Up to now, no attempt has been made to classify the resources chosen for the directory, either by type or method of access; all entries in "Some Chemistry Resources on the Internet" are currently arranged alphabetically by title. However, as the list grew, some conventions were introduced to help the user. The identification of resources added since the last revision is facilitated by the inclusion of four plus signs, ++++, on the line before each new entry. SEE and SEE ALSO cross-references were also added to assist in finding entries. A degree of uniformity in the presentation of the titles and other information has also evolved. (See Figure 3.)
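The "++++" convention lends itself to simple automated scanning. The sketch below picks out the titles of newly added entries, assuming that the title is the first non-blank line after the marker. That assumption, the function name, and the sample entries are all hypothetical, since the actual layout of the list is shown only in Figure 3.

```python
def new_entry_titles(lines):
    """Return the first non-blank line following each '++++' marker."""
    titles = []
    expect_title = False
    for raw in lines:
        line = raw.strip()
        if line == "++++":
            expect_title = True
        elif expect_title and line:
            titles.append(line)
            expect_title = False
    return titles


# Invented sample text; it does not reproduce real entries from the list.
SAMPLE = """\
Alpha Chemistry List
  example description

++++
Beta Spectra Archive
  example description
""".splitlines()

print(new_entry_titles(SAMPLE))
```

Here only the second sample entry is reported, because it alone is preceded by the marker line.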

Biotechnology and other bioscience Internet resources are given relatively little attention in "Some Chemistry Resources on the Internet." In part, that is because the author is a member of a team to produce the "Internet Directory of Biotechnology Resources." The Biotechnology Directory is part of a larger project funded by the U.S. Office of Education that involves scientists, librarians, and computer people at four midwestern universities (Indiana, Iowa State, Minnesota, and Wisconsin). The goal of the project is to assist in bringing the wealth of biotechnology resources on the Internet into the classroom so they can be used in a problem-solving environment. The "Internet Directory of Biotechnology Resources" can be found at: URL: http://biotech.chem.indiana.edu/ (See Figure 4.)

The Biotechnology Directory provides the user with several access points to relevant resources and includes information on how to use many of the biotechnology databases. A feature that sets off this directory from many others on the Internet is the capability to enter a keyword search for a topic or known resource. (See Figure 5.) Both individual databases, such as the Genome Data Bank (GDB), and biotechnology servers, such as the World-Wide Web Virtual Library: Biosciences, are included in the database and may be retrieved in a search. It is also possible to limit the search to just databases or just servers. A unique feature of the Biotechnology Directory is the inclusion of commercially available products, both online and CD-ROM, in the database. Because the directory is a WWW resource, the user can follow hypertext links to connect to any of the items retrieved from the database (with the exception of the commercial products, of course).

6. Conclusion

Finding out about relevant Internet resources is no easy task for the chemist at present. That is not to say that the various tools discussed above have not made it much simpler now than it was even two years ago. However, the Internet resources are still too unsettled to allow a search of this nature to be totally rewarding. In time, standards will be developed for cataloging and classifying Internet materials, just as they were for traditional printed books and journals. Work is progressing on the development of a Universal Resource Name (URN) to help alleviate the problem that arises when a resource moves from one server to another. The URN would also help in situations where a promising resource discovered on a new chemistry server turns out to be something encountered many times in other places under a different name. There is no concept of name authority lists (not to mention subject authority lists) on the Internet, and designers of gopher or WWW servers feel no compunction about changing the name of a resource to suit their own view of the world of chemistry.

There is much work to be done in the area of searching the Internet. Commercial online database producers and vendors have developed very sophisticated products and search techniques, and the developers of Internet resources could learn much from their experience. The use of a uniform truncation symbol, development of uniform Boolean search operators, front-end software to assist in searching, statistical reports of results prior to presenting an answer set, and the elimination of duplicates from answer sets in cross-database searching are all concepts well defined and valued in the world of commercial online searching. Surely they have a place in the future of the Internet as well. Until such developments, the tools discussed in this paper serve the useful purpose of identifying the chemistry gems on the Internet.
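Three of the search concepts named above (a truncation symbol, a Boolean AND operator, and duplicate elimination across databases) can be illustrated in a few lines. This is a toy sketch, not a description of any vendor's system: the "*" truncation symbol, the function names, and the two sample "databases" are assumptions made for the example.

```python
import re


def _words(text):
    """Lower-case a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term, text, truncation="*"):
    """Match a whole word, or a word stem if the term ends in the symbol."""
    term = term.lower()
    if term.endswith(truncation):
        stem = term[: -len(truncation)]
        return any(word.startswith(stem) for word in _words(text))
    return term in _words(text)


def search(records, terms):
    """Boolean AND: keep records whose title matches every term."""
    return [r for r in records if all(term_matches(t, r["title"]) for t in terms)]


def merge_answer_sets(*answer_sets):
    """Combine answer sets, dropping records whose normalized titles repeat."""
    seen, merged = set(), []
    for answers in answer_sets:
        for rec in answers:
            key = " ".join(_words(rec["title"]))
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


# Hypothetical records held by two different servers.
DB_A = [{"title": "Periodic Table of the Elements", "source": "A"},
        {"title": "Chemical Engineering Digest", "source": "A"}]
DB_B = [{"title": "periodic table of the elements", "source": "B"},
        {"title": "Chemistry Stock Prices", "source": "B"}]

truncated = search(DB_A + DB_B, ["chem*"])
merged = merge_answer_sets(search(DB_A, ["periodic"]), search(DB_B, ["periodic"]))
print(len(truncated), len(merged))
```

In the sample data, the truncated term retrieves both the "Chemical" and the "Chemistry" titles, while the cross-database search for "periodic" collapses two differently capitalized copies of the same title into one answer.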

Gary Wiggins
Chemistry Library
Indiana University
Bloomington, IN 47405 USA
Voice: 812-855-9452
FAX: 812-855-6611
E-mail: wiggins@indiana.edu (Internet)
wiggins@indiana (BITNET)

References

1. Kleiner, K. (1994) What a Tangled Web They Wove..., New Scientist 143 1936, 35-39 (p. 36)

2. Bachrach, S., Ed. (1995?) Chemistry and the Internet, Washington, DC, American Chemical Society (in preparation)

3. Varveri, F.S. (1993) Information Retrieval in Chemistry, Journal of Chemical Education 70 3, 204-208

4. Wolman, Y. (1994) Chemistry on the Internet, Chemistry International, 16 2, 54-56

5. Heller, S.R. (1994) Analytical Chemistry Resources on the Internet, Trends in Analytical Chemistry, 13 1, 7-12

6. Parker, K. (1994) Computing, Library, and Department: Intersections at Work in the Yale Chemistry Gopher, unpublished paper delivered at the National Chemical Information Symposium, Burlington, VT, June 22, 1994

7. Okerson, A., Ed; King, L.A., Kovacs, D. et al., Comps. (1994) Directory of Electronic Journals, Newsletters, and Academic Discussion Lists, 4th ed., Washington, DC, Association of Research Libraries

8. Rosenfeld, L.B. (1995) The Internet Compendium: Guides to Resources by Subject, Neal-Schuman (in press)

9. Abbott, T., Ed. (1994) ON INTERNET 94, Westport, CT, Mecklermedia

10. Kleiner, K. Op. Cit.

11. Coopersmith, A. (August 1994), Using Gopher with WWW, available as a text file via gopher at: gopher://gopher.oct.berkeley.edu:70/00/gopher/gopher-www or gopher://gopher.oct.berkeley.edu:70/hh/gopher/gopher-www (HTML form). The author's e-mail address is alanc@oct.berkeley.edu.

12. Scott, P. (1992) HYTELNET as Software for Accessing the Internet: A Personal Perspective on the Development of HYTELNET, Electronic Networking: Research, Applications, and Policy, 2 1, 38-44.

13. Rzepa, H. (1994) Report on the First International Conference on World-Wide Web, May 1994, available at:
http://www.ic.ac.uk/talks/www94_talk.html
The author's e-mail address is rzepa@ic.ac.uk.

Figures

(Fig 1 Global Network Navigator (GNN) Chemistry Sections, Including the WWW Virtual Library: Chemistry)

(Fig 2 Selected BUBL Chemistry Entries)

(Fig 3 Sample Entries from "Some Chemistry Resources on the Internet")

(Fig 4 Home Page of the "Internet Directory of Biotechnology Resources")

(Fig 5 Sample Keyword Search of the "Internet Directory of Biotechnology Resources")