| Communities Resolving Our Problems: the basic idea | ||
| [SUP: Sharing Problems] | [THINK: Guidance] | [LEAP: Solving Problems] |
The "story of the pyramid" is a strategic way to think about the Look stage of problem solving, the key stage for finding information to deal with problems and questions. The best hunting strategy ought to make your beginning point the potentially richest and most useful place available and then lead you to resources of lesser value. The pyramid is a model which helps the teaching and recall of the sequence for using this strategy. This Look stage is part of the LEAP problem solving process, which in turn is part of the larger CROP model for problem processing, all of which are explained in the basic idea section above. Other web pages organize the search systems along the lines of these three divisions and provide direct access and unique ways to use the search systems. Clicking the relevant pyramids below will take you directly to those tools. This page explains the overall concept.
To think further about this situation, imagine a three layer pyramid
which represents the three major ways in which our culture currently stores
information, an information pyramid. It is possible to stack these resources
in the order of how much human intelligence is applied in the storage of
the information and in the interface to the information. In this model, the best view
and the best quality resources are at the top of the pyramid. At the top
you hunt for the best person for your needs. In the middle you hunt for
the best publications or resources stored on shelves. At the base you seek the best things which are
stored electronically on various drives such as hard drives and other storage
media.
At the bottom or base of the pyramid is the greatest quantity of information but
too often also the lowest quality. Your strategy is to Look in a certain top-down order, an order which could
be called person, place and thing. This priority could also be thought of as
brains, shelves and drives. Within each layer, the resources can
be further stacked in additional sub-pyramids. The best strategy is to start
at the top of the pyramid and work down through its tools.
The Top of the Pyramid (the Person layer) represents the minds of those with the most expertise in the question or problem that has sent you looking and hunting. It also represents the minds of librarians and other information specialists who specialize in information retrieval. A key advantage of communicating with a person is that they can interact with you. The Internet and email give researchers unprecedented direct access to experts around the world. Human intelligence can do what no other form of information storage can do for you: interact intelligently. A person can tell you that your question does not make sense, and he or she can help you re-work the question. If the person cannot answer you in a sentence or two by voice or email, they can still put you on a useful path. They can cut short the reading of dozens or thousands of books by naming the small number of books or resources that you need to study to understand how to proceed. For example, if you were interested in the theory of relativity you could begin to look for this phrase and find it referenced in thousands of articles and books. The expert, however, could tell you that one book nicely summarizes all the basic information on the topic and that two articles sum up the most current thinking since the book was written.
Experts, however, are not a perfect solution to your information needs. They might not be conveniently located in a nearby building or community. They might not use electronic mail. They may not be able to respond to your question or at least not in any time frame that is useful to your immediate needs. They may charge for their services as consultants and employees of information businesses. They of course can also tell you that you are wasting their time with a trivial question to which you can find the answer in a course textbook, dictionary or encyclopedia. They could dislike the direction of your work or for other reasons try to prevent you from proceeding. They might not be the experts that they believe themselves to be. Further, experts can disagree and the newer the idea and the more controversial the topic, the more experts you must consult to cover the range of ideas being considered for that issue. However, whatever the disadvantages of experts, on average starting with them will provide far greater mileage than the lower two layers of the pyramid.
The Middle of the Pyramid (the Place layer) is generally the next best layer to visit. Here you find the places where books, articles, movies, videos and other physical forms of information are stored. These physical storage formats include paper and plastic (film, microfiche, audiotape, videotape, CDs, etc.). That means that at some point in their storage all of these formats sit physically on shelves. A powerful and growing set of databases is available to search within this layer. With one web address, www.loc.gov, an information seeker can be combing through the Library of Congress and its over 25 million books, which is just part of its collection. These works have the advantage of having been refereed or reviewed by others, often by other experts on the topic. Several layers of intelligence were therefore involved in the preparation and sharing of this information. That is, at this time in human history, information that is stored on shelves is superior in quality to mere web pages stored in the interconnected hard drives of computer networks such as the Internet. Much of this material can generally be faxed or shipped within minutes or days from libraries and bookstores. These shelved publications are also a source of names of experts. If they are still among the living, you can move to the person layer of the pyramid and seek their contact information.
This form of information storage has its own imperfections as well. This data is housed in buildings around the world. These information systems can only tell you where in the world the resource is located. You then must use various transportation technologies to move that work or publication to your current location. This certainly takes time, sometimes weeks. It often involves expenses. These transportation systems might include working with InterLibrary Loan, the Post Office, UPS, FedEx or other overnight delivery services, fax machines, copy machines and more. Library and information scientists have observed that finding a citation to a work of interest in this layer is only 1/7 of the cost in time and labor of getting the information into the hands of the person who needs it.
However, this information isn't staying on the shelves. Many businesses have found it profitable to convert the full text of this shelf material to digital information and make it available online on the web. This will be discussed further in the next section.
The Base of the Pyramid (the Thing layer) represents information stored on computer hard drives and other computer based mass storage systems. There is an interconnection among these three layers that should not be overlooked. The base and the middle layers are connected to the top layer. That is, web pages are more than just a resource for their content or ideas. These files stored on hard drives are also a source of the names of experts. The names found in the lower two layers of the pyramid are the targets of opportunity in the top or person layer. Once we get further inside this layer, like all good revolutionary systems, the web turns our pyramid model upside down. There are two distinctive layers to the web, the "deep" web and the "shallow" web. On the web, the area of greatest quantity, the deep web, turns out to also be the area of highest quality.
No matter which layer or zone stores the information, its digital format brings major advantages. If the information is stored in a computer which is accessible through the Internet, access can be extremely fast. You might find the real thing, that is the complete thing (book, article, image), not some citation that points to the real thing located on the other side of the world. The information could also be in other electronic resources you already have paid for, such as an electronic encyclopedia on a compact disc or information created personally in a folder on your desktop hard drive. Further the information arrives in a format that makes it easy to edit, copy and fit into other publications. You can quickly tailor this incoming information for a variety of needs. Computer based information is often among the most current data available, because it costs so little to update or make changes to it once it is stored electronically.
The phrase "deep web" began as a way to describe information in networked databases that could not be indexed by standard public search engines of the web. The company BrightPlanet (BrightPlanet, 2001) claims to have identified some 350,000 specialty databases that are not indexed by standard web search systems and therefore could be described as hidden. Once information is in databases, it takes a query or search strategy to point to it, not a static web address. But there is other information that is not being indexed by the public or free search systems, notably any set of documents kept behind passwords and access fees. The deep web is often the electronic copy of the most commercially valuable portions of the middle layer of the information pyramid. Fortunately, this perception of the information being "hidden" is a misrepresentation of reality. Our library system and its reference librarians provide an easy and well-lighted walkway into the database and commercial areas, though some libraries have budgets which pay the access fees to more of the private or commercial layer than others. Several web sites also provide various approaches to searching the databases of the deep web.
How deep and big is the deep web? The most recent estimate is that the deep web is over 550 billion documents or web files (BrightPlanet, 2001), in contrast to the shallow web of some 2.1 billion pages. This simple count of files greatly obscures the real size comparison. The deep web files contain the full text of lengthy articles and books. The shallow web is made up of significantly shorter web pages, often just a printed page or two in length. The deep web is at least hundreds of times larger and could easily be thousands of times larger than its cousin, the shallow or surface web.
The shallow web or surface web represents the public portion of the Internet's hard drives. It is similar to the deep web in its rapid growth rate. It is also qualitatively different in important ways from the deep web in its interconnectedness.
The growth rate of the shallow web is significant. Estimates in the fall of 1999 put this part of the Internet at over 800 million web pages (Kiernan, 1999). In July of 2000, the number was reported to be near 2 billion. Though the shallow web consists of a mere 2 billion web pages, current estimates have it growing at the rate of 7.3 million new web pages a day (Murray, 2000). If the daily growth rate held steady, the public web would reach a staggering annual growth of over 2.6 billion pages. The web in the fall of 2001 did not meet the estimate of some four billion pages, which called into question either the estimators' daily net growth number or their capacity to keep up. As of 2003, the major search engines Google and All the Web finally indicated they had indexed between 3 and 4 billion pages. However, growth of the web with new pages is just one part of the turbulence of the web. Researchers should also take into consideration the additional change that comes from web pages being updated and obsolete ones being deleted.
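The arithmetic behind these growth figures is short enough to check directly. The following is a minimal sketch, assuming the 7.3 million new pages a day reported by Murray (2000) stays constant and taking the roughly 2 billion pages of July 2000 as the starting point. The function name is illustrative only, and the straight-line projection is an assumption, not a claim about how the web actually grew.

```python
# Figures taken from the text; the constant-rate assumption is ours.
PAGES_PER_DAY = 7_300_000          # new pages per day (Murray, 2000)
PAGES_JULY_2000 = 2_000_000_000    # approximate size reported for July 2000

def projected_pages(start_pages, days, pages_per_day=PAGES_PER_DAY):
    """Project the page count after `days` of constant daily growth."""
    return start_pages + pages_per_day * days

annual_growth = PAGES_PER_DAY * 365
one_year_later = projected_pages(PAGES_JULY_2000, 365)

print(f"Annual growth: {annual_growth / 1e9:.2f} billion pages")          # 2.66
print(f"Projected size after one year: {one_year_later / 1e9:.2f} billion")  # 4.66
```

A straight-line projection like this is only as good as the daily figure that feeds it, which is why a shortfall against the projection casts doubt on the growth number itself.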
What is most special about the shallow web is the openly revealed nature of the linking among its web pages. Search engine technology enables researchers to "see" into how human beings interconnect their ideas. This has always been possible on very small scales such as minute portions of the neuronal connections in the brain, groups of people or an interrelated set of journal articles. In contrast, search engines and the public or shallow web enable studies to be done on the relationship of billions of pieces (web pages) of information. The patterns that are emerging here may be typical not only of information found in the other layers of the pyramid, but of many types of complex networks.
What do we know about the shape of information? On average, any web page is but a few clicks from any other page. Research has estimated that the average web page contains between five (Murray, 2000) and seven (Kiernan, 1999) external links. The number of links between any two pages on the web can then represent the diameter of the shallow web. This diameter was first estimated at around 19 clicks (Kiernan, 1999). In this view the web could be seen as a well-stirred pot of spaghetti noodles, each noodle representing a path of many connections between web pages. Later work showed that the distribution is not so evenly stirred. The web has neighborhoods that are highly connected and those that are not. Broder and others developed a model that reminds one of a bow tie with a few loose threads hanging from it (Broder et al., 2000). The knot of the bow tie represents a highly connected core or neighborhood of some 30% of the web. The left wing of the bow tie stands for the 24% of the web which has connected to the core but which lacks connections back from the core. These could be thought of as new web pages not yet discovered by authors with web pages in the core. The right wing of the bow tie is the 24% of web pages which are connected to from the core but lack connections back to the core. The loose threads and tubes which hang from the bow tie represent the 22% of information that is made up of disconnected web pages. Among the pages in the highly connected core, Broder's team showed there is an average of 7 degrees of separation; otherwise the diameter of the paths is 16. However, roughly 70% of the web, everything outside the core, is either not connected or poorly connected to other bits of information.
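The bow tie description can be made concrete on a toy link graph. The following is a minimal sketch, not Broder's method or data: the page names and the `bow_tie` helper are hypothetical, and the graph is small enough that repeated breadth-first searches suffice. The core is taken to be the largest group of pages that can all reach one another, the left wing is whatever can reach the core, the right wing is whatever the core can reach, and everything else is a loose thread or a disconnected page.

```python
from collections import deque

def reachable(graph, start):
    """Return every page reachable from `start` by following links (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse(graph):
    """Return the same graph with every link pointing the other way."""
    rev = {page: set() for page in graph}
    for page, targets in graph.items():
        for target in targets:
            rev.setdefault(target, set()).add(page)
    return rev

def bow_tie(graph):
    """Split pages into core, in (left wing), out (right wing) and other."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    rev = reverse(graph)
    core, assigned = set(), set()
    for page in sorted(pages):
        if page in assigned:
            continue
        # Pages that both reach and are reached by `page` form one component.
        component = reachable(graph, page) & reachable(rev, page)
        assigned |= component
        if len(component) > len(core):
            core = component
    seed = next(iter(core))
    in_wing = reachable(rev, seed) - core     # link into the core, no link back
    out_wing = reachable(graph, seed) - core  # linked from the core, no link back
    other = pages - core - in_wing - out_wing
    return {"core": core, "in": in_wing, "out": out_wing, "other": other}

# Hypothetical toy web: a, b and c link in a cycle and form the core.
toy_web = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "old1"],      # the core links out to old1
    "new1": ["a", "thread"], # new1 links into the core and to a loose thread
    "island": ["island2"],   # two pages with no connection to the core
}

parts = bow_tie(toy_web)
for name in ("core", "in", "out", "other"):
    print(name, sorted(parts[name]))
```

On this toy graph the core is a, b and c, the left wing is new1, the right wing is old1, and the remaining three pages fall outside the bow tie proper.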
The patterns of information in the other layers of the info pyramid are much less well known. This is not to say that the deep web lacks linkage, but the linkage is done through bibliographies and footnotes, making movement through these connections much more slow and cumbersome. Because these links are not computerized and therefore not available for rapid computerized measurement, the actual pattern in the deep web cannot be seen directly but can only be inferred from the patterns appearing in the shallow web. The deep web can be likened to the slower pace of life in the colder deep ocean, with a seemingly sluggish rate of change in comparison with the rapid growth and decay in the hot shallow waters close to continental shores. Both have important value. It is an even further reach to suppose that social networks and paper citation trails follow the same patterns. Yet, Barabási's work (2001) does show that the scale-free patterns of information in the shallow web do match with many other forms of complex systems. In playing with the perspective of these different patterns, our minds gain a deeper understanding of the wide range of possibilities of the Internet and the web (ManicLink).
An examination of the ecology of the shallow web and how information life grows, gains value and dies there is needed to understand its life cycles. Life first grows in the shallow web because anyone can grow something there by uploading a file in seconds. Because of this, it is true that the shallow web has much seemingly irrelevant information about hot cars, birthday pictures and email flame wars. That observation misses how the information that is valuable becomes known and how it grows into stronger and more useful information. Just because a web page exists does not mean that many people pay attention to it or value it. People make links between web pages when they perceive that another web document does an effective job of contributing to their own web page. It may be a contribution of complement or a contribution of contrast and dissonance. Web authors do not want too many links on their web pages, so there is an evolutionary or Darwinian struggle in which new, more effective web links come along and displace less effective ones. As the authors of a web page make these links, they learn things from the pages they link to and modify their own web page, making it more valuable. This contrasts with the deep web, in which the original article cannot be modified, requiring the author to write and publish a new article, a much more time-consuming and arduous process. Search engines have learned to take advantage of this ecology. For example, the google.com search engine makes the priority of its search listings dependent on the number of links made to a given web page. The pages at the top of its search listings have been prioritized by the number of other web pages that have links to them. As those seeking further links for their own pages use the top items from search engine hits, this further contributes to the evolutionary rise of certain web pages and sites. The rich get richer. The popular get more popular.
This is not the top-down order of a cataloging system but the bottom-up emergent order of a social system.
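The idea of ordering pages by how many other pages link to them can be illustrated in a few lines. This is a deliberately simplified sketch of link counting as the text describes it. It is not the ranking method of Google or any other real search engine, which weigh many more signals. The page names and the `rank_by_inbound_links` helper are hypothetical.

```python
from collections import Counter

# Hypothetical link map: each page lists the pages it links to.
links = {
    "page_a": ["page_c", "page_d"],
    "page_b": ["page_c"],
    "page_c": ["page_d"],
    "page_d": ["page_c"],
    "page_e": [],
}

def rank_by_inbound_links(link_map):
    """Order pages by the number of distinct other pages linking to them."""
    counts = Counter({page: 0 for page in link_map})
    for source, targets in link_map.items():
        for target in set(targets):   # a page "votes" at most once per target
            if target != source:      # a link to oneself is not a vote
                counts[target] += 1
    # Most-linked first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_inbound_links(links)
for page, votes in ranking:
    print(f"{page}: {votes} inbound links")
```

Here page_c rises to the top with three inbound links and page_d follows with two, while the unlinked pages sink to the bottom regardless of what they contain, which is the "rich get richer" effect in miniature.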
Thinkers that discover new problems and compose new solutions often must make a choice between publishing their work in refereed publications, which removes their work from the evolutionary struggle for value in the web linked economy, or placing it in the shallow but public web where the best information evolves to have the most links to other relevant works. The deep web keeps its holdings separate from the direct links of the shallow web. The deep web appears designed for the interaction of a small group of specialists. The surface or shallow web appears designed for a wide open public "vote" on the merits of a given web page, the vote being a web author's choice to link to a given page. This poses an intriguing and deep problem. Can information scientists and scholars bring higher levels of interactive life to the deep web and connect this dialog to the higher levels of interaction of the surface web without undercutting the very economic system that has built the deep web or detracting from the democratic openness that has spawned the phenomenon of the World Wide Web?
Many problems exist within the information at this shallow layer. The information that becomes published in seconds may never have been qualified by any other human as to the data's accuracy, currency, truthfulness or a host of related concerns. Much greater responsibility is placed on the hunter of this layer to not bring home tainted or rotten fish. Anyone with the capacity to put the information on the Internet can remove it in seconds too. You may tell others where to look, but it may have moved, been removed, or been updated or changed in some way that no longer suits your purposes. This is in contrast to the deep web, in which much of the information has a permanent and stable home for decades.
In all fairness, there are many Internet or otherwise computer-networked sites with carefully qualified information. This information can be extremely relevant and useful. But you must look over Internet sites carefully to make sure that they are legitimate, not a careful fake from someone who might benefit from altered data or just someone enjoying a spoof of others' work. Cross checking this information against that found in other layers of the pyramid is the only way to validate or trust what you have found. One can also envision a time in which the person, place and thing layers will be collapsed into one layer, with all information universally available electronically. But it is emphatically not here now and there are economic forces at work that can keep this from ever happening.
Do remember to keep track of where the information came from in this electronic thing layer. That is, not only copy the information you need, but copy its references. The format of such references may seem quite different from citations in the middle place layer. References for a web page should include the URL or web address, the author of the web page if he or she can be found and the institution that hosts or provides the web pages.
Carefully follow the top-down path of person, place and thing, capturing the best of what is available at each level. Summarize the information where you can. Copy and move only what you must to files that you save to your diskette. Compensate for the weaknesses of information stored in one layer with the strengths of ideas and concepts found in the others. The strength of our global culture's information pyramid comes from the differences in each layer. Utilizing the full scale of the pyramid adds strength and depth to your work in addressing the problems and questions of your lifetime. In turn, readers become thinkers that add strength back to the pyramid. This is done by mixing information that comes from your own experience with the information found in new situations and seeking its performance and publication.
The information pyramid strategy that has been explained helps with two important features of information literacy: knowing how knowledge is organized and how to strategically go hunting among its different layers. For further background, also explore Fowler and Simpson's tutorial on information literacy skills (2003). There is a third feature of information literacy that also needs to be much better known, human logic. This logic consists of certain mathematical tools that can be applied to computer stored knowledge and also consists of evaluative logic, otherwise known as critical thinking.
As an extension of this pyramid strategy, searchers must also understand and use the terms of Boolean logic (e.g., AND, OR and NOT) to narrow and expand their searches as needed. Failure to know and use Boolean thinking leads to a significant number of failures in information retrieval (Weise, 2001). Sometimes these terms must be entered manually; sometimes the search pages are set up with data entry boxes that carry out these functions without manually entering the Boolean logic terms. Sometimes the search engine assumes OR and sometimes AND relationships in the search strategy. Knowing more about Boolean logic and about how different search engines handle it increases the searcher's effectiveness (University of Albany, 2001).
Searching for "CATS OR DOGS" retrieves more information than searching for "CATS AND DOGS" because OR means either condition is acceptable while AND means both conditions must be true. The more conditions that must be true before a citation or article is retrieved, the less information will be found. The terms AND and NOT are used to narrow a search when too much information or too much irrelevant information is retrieved. The term OR is used to broaden a search by connecting a set of synonyms for a concept. A thesaurus is a handy tool to meet this need. Through the use of parentheses, complex nested sets of logic can be arranged: (cats OR tigers OR lions OR felines) AND (dogs OR wolves) NOT pets. More attention needs to be given to those resources that teach Boolean skills to both school age children and adults as well as resources that show how to apply those skills to searching databases and the web (Houghton & Houghton, 1999). Simple techniques can have significant impact. For example, BrightPlanet estimates that by using six to eight appropriate terms per search, the amount of irrelevant information retrieved can be reduced by over 99% (BrightPlanet, 2000).
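The narrowing and broadening effect of these operators can be seen by treating each search term as the set of documents that contain it. The following is a minimal sketch over five made-up one-line documents. The `matching` and `any_of` helpers are hypothetical, and the nested query is read as (felines group AND canines group) with "pets" then excluded, which is an assumption about how a given search system would parse it.

```python
# Hypothetical mini-collection: document id -> text.
docs = {
    1: "cats and dogs as household pets",
    2: "lions and wolves compete for prey",
    3: "tigers in the wild",
    4: "dogs and wolves share ancestry",
    5: "felines and dogs in folklore",
}

def matching(term):
    """Return the ids of documents containing `term` as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

def any_of(*terms):
    """OR: a document qualifies if it contains at least one of the terms."""
    found = set()
    for term in terms:
        found |= matching(term)
    return found

# OR is set union (broadens); AND is set intersection (narrows).
cats_or_dogs = matching("cats") | matching("dogs")
cats_and_dogs = matching("cats") & matching("dogs")

# (cats OR tigers OR lions OR felines) AND (dogs OR wolves) NOT pets
felines = any_of("cats", "tigers", "lions", "felines")
canines = any_of("dogs", "wolves")
nested = (felines & canines) - matching("pets")   # NOT is set difference

print("cats OR dogs: ", sorted(cats_or_dogs))    # [1, 4, 5]
print("cats AND dogs:", sorted(cats_and_dogs))   # [1]
print("nested query: ", sorted(nested))          # [2, 5]
```

The OR search returns three documents, the AND search only one, and the nested query drops document 1 because it mentions pets.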
As beneficial as it is to know basic Boolean logic, effective net searching requires more. Certainly knowing basic guidelines on a number of techniques is important (Whitley, 2001). Examining the help pages of web search engines is a quick way to learn many valuable search commands and techniques. If instruction at this phase of research is really going to be enhanced by teachers and the team of educators that help them, some new elements must be added to the basic patterns of our compositions. The bibliography for every composition should include more than the list of authors and sources. The bibliography should also include a listing of the search systems that were used and the search strategies employed at each one. This information is most important during the formative stages of writing, and should be one of the first elements checked by teachers and writing centers that assist with rough drafts. This enhancement would be equally beneficial for the bibliographies of professional publications.
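One way to keep such a listing is to record each search as it is made and format the log when the bibliography is assembled. The following is only an illustrative sketch: the `SearchRecord` type, its two fields and the output layout are all hypothetical, not a format prescribed by any style guide.

```python
from dataclasses import dataclass

@dataclass
class SearchRecord:
    system: str    # the search system used (catalog, database, web search engine)
    strategy: str  # the exact search string entered there

def format_search_log(records):
    """Render the search log as a numbered plain-text bibliography section."""
    lines = ["Search systems and strategies used:"]
    for number, record in enumerate(records, start=1):
        lines.append(f"{number}. {record.system}: {record.strategy}")
    return "\n".join(lines)

# Example entries; the systems and queries are drawn from this page.
log = [
    SearchRecord("Library of Congress catalog (www.loc.gov)",
                 "(cats OR felines) AND (dogs OR wolves) NOT pets"),
    SearchRecord("Google", "link:www.whitehouse.gov"),
]

print(format_search_log(log))
```

Recording the exact search string, rather than a paraphrase of it, is what lets a teacher or writing center spot a strategy that was too broad or too narrow.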
It is also valuable to be able to find the community of people interested in a particular topic by searching for which pages have linked to a given page. Some of the search engines have a command called link: that makes this easy. For these examples use Google, Alta Vista or All the Web. Try link:www.whitehouse.gov or link:www.wcu.edu/library/ which return the web pages which link to the searched for page.
These Boolean terms help us control the quantity and specificity of the information that is sought. These terms cannot directly address the quality of what is found. There is no substitute for thinking.
See the following links for more about common problems with search procedures and suggestions for dealing with them.
Being effective at finding information is not enough. Whatever information is found, an intelligent weighing of the evidence is always needed. There are certain basic aspects of qualifying information that apply to the use of all three of the layers of the information pyramid. In every case you must consider basic rules of evidence. This type of reflective attitude is also known as critical thinking. The information literate person must be able to evaluate the accuracy, currency and relevance of the data they encounter. Evaluating relevance also means carefully weighing the reputation of the source of the information at any level of the information pyramid.
Being effective at evaluating information is not enough. To effectively use the information that has been found acceptable means to be able to store, manipulate (e.g., copy and paste) and compose in a way that has a positive impact on our culture, on those within it. At the mechanical level, using information means having the digital skills to paraphrase and copy information into different applications such as outline processors, spreadsheets, graphic information systems and databases and then use these tools for further and deeper analysis. In this age of multimedia, it also means having the ability to go beyond text, to capture, cut, copy and paste with audio, video, still image, three-dimensional images and more. It also means having the ability to merge these multiple forms of expression in the same composition.
Of even greater importance is the ability to use this information to be able to make a point and sustain it with examples and supporting facts. It means to be able to choose appropriate forms of sharing for a particular audience, from the creation of essays and newsletters to video, from making a speech to a stage performance. The walkway to many a college and community writing center is as well-paved and well lit as the one to the library reference desk.
The process of gathering information really begins with a sense of need
and the construction of a question. There are certain basic signs which
indicate that more information and knowledge is needed. Does the individual
or the participants in the research thoroughly understand the concept,
idea or situation that is being considered? Does everyone agree with the
basic conclusion being reached? Has there been a real effort to dig for
new and related questions? This activity should also be seen as equal parts
of opportunity recognition and problem recognition.
To summarize, the empowered learner and citizen of this age needs to know a series of interlocking basic information skills, an empowerment that is significantly increased by fluent reading ability. The need for the knowledge summarized in the list below (the big six) has long been heavily promoted by the professional library associations. The more effective learner and problem solver will:
Abreu, Elinor (September 11, 2000). Diving Into the Deep Web. The Industry Standard Magazine.
Barabási, Albert-László (July, 2001). The Physics of the Web. PhysicsWeb.org. Retrieved on January 5, 2004 from http://physicsweb.org/article/world/14/7/09
Broder, Andrei; Kumar, Ravi; Maghoul, Farzin; Raghavan, Prabhakar; Rajagopalan, Sridhar; Stata, Raymie; Tomkins, Andrew; & Wiener, Janet (2000). Graph structure in the web. Retrieved on January 6, 2004 from http://www.almaden.ibm.com/cs/k53/www9.final/
BrightPlanet (2001). Online: http://www.brightplanet.com/deepcontent/index.asp
BrightPlanet (2000). Search Tutorial: Deep Content. Online: http://www.brightplanet.com/deepcontent/tutorials/search/part1.asp
Fowler, Clara & Simpson, Brent (2003). Texas Information Literacy Tutorial (TILT). The University of Texas System Digital Library. Retrieved on January 11, 2004 from http://tilt.lib.utsystem.edu/
Guernsey, Lisa (January 25, 2001). Mining the 'Deep Web' With Specialized Drills. New York Times.
Houghton, Janaye; & Houghton, Robert (1999). Decision Points: Boolean Logic for Computer Users and Beginning Online Searchers. Teacher Unlimited.
Kiernan, Vincent (September 9, 1999). As Goes Kevin Bacon, So Goes the Web, Researchers Report. Chronicle of Higher Education.
Murray, Brian H. (2000). Sizing the Internet (pdf file). Corporate Report: Cyveillance.
Weise, Elizabeth (2001). One click starts the avalanche: Buried in information? Smarter searching comes to the rescue. USA Today.
University of Albany Libraries (2001). Boolean Searching on the Internet: A Primer in Boolean Logic.
Whitley, Betsy (2001). Help Students "Search Smart" on the Internet. Faculty Forum, Vol. 14, No. 2.
Information Pyramid Table - Updated January
12, 2004 [Pageauthor Houghton]