SID: The Event Analysis
The event analysis involves the systematic identification of news reports on events that bear on societal infrastructures and welfare, as well as the careful extraction of information from those reports. To produce valid and reliable event data a number of methodological challenges had to be addressed. Five are particularly noteworthy.
First, we had to secure access to sources of news reports that would capture relevant events for virtually every country in the world for a period beginning on January 1, 1946 and extending into the indefinite future. Second, we had to identify categories of relevant events from which theoretically relevant information could be extracted. Third, we had to devise efficient and reliable procedures to categorize millions of news reports into the classification scheme we constructed. Fourth, we had to create efficient and reliable procedures to extract information from news reports within each of these categories. Fifth, we had to use the data extracted to generate well-grounded inferences about theoretical constructs for each nation during specified periods of time.
As the event analysis is an on-going research project that began in 2005 and is not expected to be completed until 2010, approaches to some of these challenges have not yet been formally developed. The following sections present summaries of our approaches to the first three challenges; a more detailed discussion is available upon request.
Sources of Event Reports: An Overview
The most fundamental challenge involved in conducting the event analysis for the SID was identifying sources of news reports that were capable of capturing the universe of relevant events. In order to construct a valid comparative profile of our 177 nations on a particular dimension (respect for human rights, societal stability, supremacy of law, etc.), we do not need to capture every relevant event in every applicable time period. We do, however, have to have access to sources that 1) capture a broad and comparable array of reports over an extended period of time, 2) do not systematically ignore certain types of events or nations or time frames, and 3) are responsible sources committed to “fact-oriented” accounts of events. While the second criterion is self-explanatory a few comments are in order concerning the first and third points.
There are a number of problems in attempting to capture “a broad and comparable array of reports” on events relevant to the SID. They are related, in one way or another, to the existence of events that transpire at a particular place and time but are never reported by a news agency. There are several reasons why this may occur. One is cultural bias. It may be that the bias of certain news organizations is such that certain happenings (human rights violations, for example) are viewed as routine in some countries and hence not newsworthy. Had the same event occurred elsewhere in the world it may well have been reported. News suppression by a totalitarian government may also be a source of uneven coverage of events, and this may vary across time. Another problem stems from the size and importance of a country. Major news organizations may not have much of a presence in tiny, “unimportant” nations. This may lead to systematic underreporting of events.
It is always difficult to refute the notion that a research design fails to capture happenings whose existence cannot be confirmed independently. That is certainly the case here. At the same time, the types of events we are covering and the diversity of our news sources will minimize the methodological threats noted above. This is particularly true for the contemporary era. Modern information technology and the explosion of global news organizations make it unlikely that many relevant events will be missed by the report gathering scheme we have developed. Also, other types of potential threats can be checked. We can, for example, compare the relative incidence of certain types of events (political expression events, human rights violations, equality before the law, etc.) across different types of governments and the size and global significance of the nation. We can also check the incidence of events for these categories of nations across different time periods covered by different sources of information. If different patterns of events across nation types emerge during the contemporary era, it will provide us with some empirically based insights into the nature of this threat. While this will not “fix” the problem, it will give us a sense of its magnitude.
The third point relates to bias in the reporting of events that are covered by news agencies. To generate the type of information base that will be of use to the SID we do not have to expect that news reports of relevant events will be totally free of ideological bias or cultural taint. Indeed, we have been driven to search for a variety of different sources simply because of the threat of some type of “spin,” be it ideological, cultural, nationalistic, or something else. As long as the “who, what, when, where, etc.” are reported accurately, the use of multiple sources and well-designed information extraction procedures can neutralize whatever spin is put on the event.
No single source of digitized news reports is available for the entire time frame that provides the breadth of geographic coverage needed in this project. We have, however, been able to assemble a series of sources that provide exceptionally broad geographic coverage for the time frame of this study, except for the 2004-06 period, which is the subject of an on-going search effort. Table 1 reports the sources of news reports for each year. As is evident from Table 1, the sources break into three periods. One set of sources covers the period from January 1, 1946 to the end of 2003; the second covers the period from the beginning of 2004 to February 2006. The final period begins in February 2006 and extends into the indefinite future. We discuss each in turn.
Period before January 1, 2004
We have particularly good coverage of events for the period from January 1, 1946 to December 31, 2003; it is particularly rich before 1990. Our two principal sources of news reports from mainstream print organizations for this period are the New York Times (1946-2003) and the Wall Street Journal (1946-1989). While the New York Times and the Wall Street Journal are both Western news organizations with an American audience, they are also two of the most respected and accomplished newspapers in the history of journalism. The New York Times is one of the most authoritative sources in the world on political and social happenings, while the Wall Street Journal stands out as an authoritative source on business and economics. Both cover legal happenings around the globe as well. Through a generous accommodation from ProQuest, we have access to over 9,000,000 digitized articles from these two newspapers.
As respected and wide-ranging as the New York Times and Wall Street Journal are, preliminary analyses of the articles contained in them revealed that their coverage was far too U.S.-oriented to be adequate for the needs of the SID. The only other digitized news archive of comparable scope was the Times of London. But it suffers from some of the same problems as the New York Times and Wall Street Journal: Western in orientation and limited in coverage. Thus, we decided to pursue a unique and immensely rich news aggregator service that nicely complements the New York Times and Wall Street Journal archives: the Foreign Broadcast Information Service (FBIS). FBIS is a collection, translation, and dissemination service for open source information collected by the U.S. intelligence community from tens of thousands of non-U.S. news outlets. The U.S. government has spent billions of dollars assembling and translating these news reports since 1941. We estimate that there are approximately 20,000,000 reports contained in the FBIS archive from 1946-2003, about twice the number contained in the combined New York Times and Wall Street Journal archives.
The fact that the information assembled by FBIS is collected and screened by members of the U.S. intelligence community, a Western entity with a niche market for its reports, makes the archive less than optimal for the purposes of this research. It also raises some key questions that must be answered before the FBIS archive can be comfortably employed in this initiative. For example, is the FBIS archive focused only on formidable military rivals to the U.S.? If so, then it would lead to skewed results that would undermine the validity of the inferences that could be drawn from a review of its reports. Does the FBIS archive deal mostly with military/strategic articles? This would markedly compromise its utility. Does FBIS draw from a narrow set of sources? This would make information contained in these reports highly suspect and limit its usefulness.
To answer these questions we conducted a focused, but cursory, review of the 1996-2004 FBIS Daily Report collection, which is available in digital form. Examinations of selected reports suggest that field officers involved in the collection of these reports interpreted their mission very broadly. It appears that the FBIS field offices captured a wide range of articles from a large number of countries on topics that are central to the SID. For example, while there is some evidence that the interests of field officers were skewed toward events in adversarial nations, the highest number of reports focused on the U.S. as well as its major allies. Even nations such as Nepal, Bangladesh, and Latvia had more than 10,000 reports during the 1996-2004 period. Moreover, military/strategic events constitute only 10% of the digitized archive. Nearly 40% of the reports were categorized as a report about some type of political event. The two categories dealing with “domestic events” comprise nearly 25% of all reports; economic events constitute 12%.
Two other pieces of information culled from the digitized FBIS archive underscore the uniqueness of this resource. The first is that the FBIS reports contained in this archive came from over 25,000 different sources; over 2,500 sources had at least 100 articles taken from them. Second, data on the original language show that translations were made from scores of languages. The 75 most frequently translated languages generated over 1.7 million reports; English language sources accounted for only 1.6 million reports.
The reach and inclusiveness of the reports contained in the digitized FBIS collection suggest that this historical archive is a unique and invaluable source of information on relevant historical events across a wide range of nations. These preliminary inquiries were sufficiently promising to lead us to undertake a massive digitization project involving 50,000 microfiche cards, over 900 rolls of microfilm, and the purchase of specialized scanners.
Period from January, 2004 to January, 2006
This time frame presents the most challenging period for which to collect reports on events, largely because of copyright concerns and difficulties with vendors who own and distribute news content. While this period constitutes only about 3% of our study period (as of July, 2007), we continue to search for ways to fill this gap. In addition, we look for options to update our Wall Street Journal archives for the post-1989 period.
Period from February, 2006
In contrast to the 1/1/2004 to 2/1/2006 time frame, the period beginning on February 1, 2006 is the era with perhaps the richest archive of information on events. This archive is being created by a website monitoring system created and operated by the Cline Center. This system regularly “crawls” hundreds of news websites throughout the world. As of June 2007 we were monitoring 576 websites, but this number grows incrementally as new websites come on line. This monitoring system has archived 4,000,000 articles as of July 1, 2007; it currently downloads approximately 15,000 articles each day.
It should be noted that each of the news websites scanned has multiple news feeds (about eight, on average, for all websites); these news feeds contain specialized news content, such as business news, late breaking events, science and technology news, etc. Individual feeds are automatically moved among “crawling tiers” based on their real-time “update rate.” For example, on a slow news day such as a Sunday, a site may only be checked once every 6 or 12 hours, while on a brisk day, such as a Monday morning, it may be checked every hour on the hour. We actually find that most high-volume sites bounce between the 1 hour and 6 hour crawling tiers, even going to 12 hours, based on the time of day, day of the week, high-profile events occurring in the world, etc.
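The tier movement described above can be sketched roughly as follows. This is only an illustration, not the Cline Center's monitoring code: the 1, 6, and 12 hour intervals come from the text, while the function name `next_tier` and the two rate thresholds are assumptions chosen for the example.

```python
# Crawl intervals (in hours) named in the text.
TIERS_HOURS = [1, 6, 12]

def next_tier(current_tier: int, new_items: int, hours_since_last_check: float) -> int:
    """Return the index into TIERS_HOURS that a feed should move to.

    new_items is the number of articles that appeared since the previous check.
    The update rate is measured in new items per hour; both cut-offs below are
    hypothetical values, not figures from the project.
    """
    rate = new_items / max(hours_since_last_check, 1e-9)
    if rate >= 1.0:    # assumed: one or more new items per hour -> check more often
        return max(current_tier - 1, 0)
    if rate < 0.25:    # assumed: fewer than one item per four hours -> check less often
        return min(current_tier + 1, len(TIERS_HOURS) - 1)
    return current_tier  # otherwise the feed stays in its current tier
```

Under these assumed thresholds a feed in the 6 hour tier that produced 12 new items since its last check would move to the 1 hour tier, and a quiet feed would drift toward the 12 hour tier one step at a time.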
There are a huge number of websites around the world that provide information about news-worthy events. Unfortunately, including all of them was not possible, nor would it have been beneficial. Given the general criteria laid out above, we limited our selection of websites to those operated by visible and responsible entities, those that were accountable to a higher authority that exercised some type of editorial review. This was as close as we could get to ensuring that our sources approximated “fact-based reporting.” What this criterion meant was the wholesale rejection of personal blogs. While many blogs may assemble reliable and useful content, they do not have the accountability or editorial review that is needed in an enterprise such as this.
Eliminating blogs left us with news websites operated by what our preliminary review suggested to be five different types of organizations. These were: news aggregators, newspapers, news providers, news providers/aggregators, and radio and television news organizations. News aggregators compile news from a variety of different sources and organize them in a user friendly manner. News providers, unlike newspaper and radio and television outlets, present their reports only on their website; there is no other dissemination outlet. Some websites are operated by hybrid organizations: news providers/aggregators.
Websites operated by each of these different types of organizations were included in the Cline Center’s monitoring system. But in selecting among the various websites that were available, we tried to select websites that met additional criteria. First, we looked for English language sites with RSS (Really Simple Syndication) feeds that had no subscription fee. In order to enhance our geographic coverage, however, we ultimately included some non-English language websites as well as some without RSS feeds. Second, while we included many websites with a global orientation, we also actively searched for news websites that focused on specific geographic regions. Indeed, one of our goals was to include at least one news feed for each of the countries in our study. While we failed to achieve this goal we have very inclusive regional coverage: we have news feeds based in 128 countries with 95% of the world’s population. Third, in addition to seeking news websites with a general coverage, we also searched for websites with specialized content that was relevant to the project (business and economics, environment, health, human rights, terrorism, etc.). We uncovered dozens of specialized news feeds with global coverage.
The scheme we used to construct our website monitoring system provides for an exceptionally extensive coverage of contemporary news events. It also provides a high degree of redundancy. This redundancy, while causing some logistical problems, means that it is unlikely that many relevant and consequential events in the period beginning with February 2006 will evade our monitoring system. Moreover, it is very likely that many events will be covered by a number of websites, some providing identical coverage, others providing very different coverage.
The Event Analysis Classification Scheme
At the heart of the event analysis is the classification scheme used to organize reports about events. This scheme is essential to collecting the information that can provide insights into the developmental roles of institutions. The huge archive of news reports assembled here would be overwhelming without a clear sense of the type of events that are relevant and useful to the SID project. The failure to develop a well-conceived classification scheme would also entail significant opportunity costs. If relevant reports are missed in the first “pass” through the news archives, they can only be retrieved through subsequent passes, an extremely expensive and time consuming process when dealing with tens of millions of reports. If sloppy classification procedures generate large numbers of irrelevant reports, valuable resources will be consumed in manually screening them.
Because of considerations such as these, the design of the event analysis’ classification scheme was exceptionally important to achieving the goals of the SID project. The encompassing nature of the SID made the design of this classification scheme a challenging task that required a multi-staged, iterative effort. We began with an exploratory phase that was informed by the SID’s conceptual framework. The insights from that exploratory effort were used to design a three-phase pretest that generated a more refined and informed classification scheme. That latter scheme was used to design the automatic text categorization procedures that filtered and sorted reports about events that were contained in the news archives described above.
The next section outlines the exploratory effort. The following section describes the three-wave pretest.
The SID Conceptual Framework and the Event Classification Scheme: The Exploratory Phase
The necessary beginning point for designing the classification scheme for the event analysis is the conceptual framework that underlies the SID. Four of the components in this framework relate directly to societal infrastructures (political, economic and legal institutions; national setting). Two others (societal stability, exogenous shocks) deal with factors that affect the well-being of societies quite independently of societal infrastructures; it is important to statistically control for the effects of these factors in order to isolate institutional effects. The final component is societal welfare. With this conceptual framework in mind, a team of faculty and graduate students conducted a preliminary assessment of the type of events that could be useful in operationalizing it. This assessment was conducted in the summer of 2005; to conduct it the team used news reports in the current editions of the New York Times as well as a set of electronic news sources. This exploratory work suggested that data on events could be used to operationalize key components of this framework. The following subsections discuss key components of it.
The Institutional Components. Exploratory efforts to identify families of events that can speak to the orientation and operation of key societal institutions began with a focus on liberal institutions. Because the initial impetus of the SID was the liberal paradigm, the Cline Center had organized a set of conceptual frameworks that outlined the key components of the institutions that are at the heart of that paradigm: democracy, free enterprise economies, and the rule of law. These frameworks were accompanied by matrices that detailed the sources of data that could potentially be used to operationalize each component. These matrices included entries where event data could be relevant, and these preliminary designations provided guidance for the exploratory phase of the event analysis.
While the structure of liberal institutions provided the insights that initially guided the event analysis, it did not unduly handicap it. While some of the institutional subcategories speak to distinctly liberal notions (integrity of electoral processes, institutional constraints on government, vitality of expression rights, sanctity of property rights, due process of law, judicial independence, etc.), most are much more general. These include sovereignty of formal governmental institutions, role of private groups in the political realm, government involvement in the economy, commitment to free trade, fidelity to legal role dictates, etc. It should also be stressed that successive iterations of the event classification scheme became more general and inclusive. This provides us with the capacity to use the event data to operationalize a wide array of societal institutions, while still preserving our ability to construct innovative gauges of liberal institutions.
The National Setting. Casual perusals of daily news reports reveal a plethora of information about a wide variety of events that are relevant to gauging the setting within which institutions operate. Systematically compiling information on these events over a long time period and across a wide range of countries can produce important insights into the relationship between institutions and societal well-being. The exploratory phase of the event analysis suggested that several broad categories of events were important to capture. These include societal stability (riots, assassinations, coups, etc.), intra-national group strife (incidents generated by religious, racial or ethnic conflict), exogenous shocks (tsunamis, tornadoes, floods, droughts, etc.), diffuse regime support (reports of conflicted loyalties), national economic conditions, national development programs, and extra-national development programs.
Societal Welfare. Most of the welfare indicators to be used in the SID come from standard archival sources. But there are several key components of societal welfare for which no easily quantifiable measure exists: human rights, social stability and terrorist attacks. Our exploratory work revealed a large number of events that we classified within two human rights categories: regard for human life and respect for human rights. Regard for human life includes a wide range of events that reflect on a nation’s concern for the physical well-being and integrity of humans. Respect for human rights contains an encompassing set of events that pertain to rights included in the United Nations’ Universal Declaration of Human Rights. An equally significant number of news reports dealt with social stability events (assassinations, protests, resignations) and terrorist attacks.
Miscellaneous. In addition to the event categories that emerged from a consideration of the SID’s conceptual framework, we included a number of categories of miscellaneous events that were plausibly relevant to future analyses. These included a wide range of labor market events, economic news, political party positions, election results, and a number of international interactions.
Refining the Classification Scheme: The Three-wave Pretest
The classification scheme developed in the exploratory phase was used as the starting point to conduct a three-wave pretest. The pretest was designed to: 1) develop a better sense of the types of relevant event data that news reports could produce; and 2) produce a more refined and comprehensive classification scheme with which to organize those events. These design objectives were essential to the ultimate goal of the pretest: the creation of a refined and powerful automatic text categorization program that could provide a computerized solution to the filtering and sorting of the tens of millions of news reports in our archive. To achieve this goal we needed to: 1) work with a representative set of events; and 2) develop a more rigorous process for devising, testing and refining the classification scheme. A representative set of events was generated by creating a random sample of historical news articles; a more rigorous process for developing the classification scheme was created through the systematic analysis of those articles in a three-wave iteration.
The random sample of historical articles was derived from the historical archives of the New York Times and the Wall Street Journal made available to us by ProQuest. We used historical materials because the contemporary (i.e., post-2006) reports would not provide us with a sufficiently representative set of events to achieve the goals of the pretest. We used the New York Times and the Wall Street Journal archives because the other historical materials were in the process of being digitized. A three-step process was required to create the sample of news reports. First, a random sample of 3,000 days was selected from the universe of days between January 1, 1946 and December 31, 2003, about one day for each week during that period. We then downloaded every article that appeared in ProQuest’s archive of New York Times and Wall Street Journal editions on those days. This sample of days generated about 1,000,000 articles. From those articles we generated a second random sample of 100,000 articles. Finally, these were randomly divided into three roughly equal groups of articles. These three groups of articles formed the basis for the three-wave pretest.
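The three sampling steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the pretest: the function names, the fixed seed, and the use of integer IDs as stand-ins for downloaded articles are all hypothetical.

```python
import random
from datetime import date, timedelta

def sample_days(start: date, end: date, k: int, rng: random.Random) -> list[date]:
    """Draw k distinct days uniformly at random from the inclusive range [start, end]."""
    n_days = (end - start).days + 1
    offsets = rng.sample(range(n_days), k)
    return sorted(start + timedelta(days=o) for o in offsets)

def split_into_waves(article_ids: list, sample_size: int, n_waves: int,
                     rng: random.Random) -> list[list]:
    """Randomly sample articles, then deal them into n_waves roughly equal groups."""
    chosen = rng.sample(article_ids, sample_size)  # random order, no repeats
    return [chosen[i::n_waves] for i in range(n_waves)]

rng = random.Random(0)  # fixed seed is an assumption, used only for reproducibility

# Step 1: 3,000 random days between January 1, 1946 and December 31, 2003.
days = sample_days(date(1946, 1, 1), date(2003, 12, 31), 3000, rng)

# Steps 2 and 3: integer IDs stand in for the roughly 1,000,000 articles
# published on the sampled days; 100,000 are drawn and split into three waves.
waves = split_into_waves(list(range(1_000_000)), 100_000, 3, rng)
```

The period spans 21,184 days, or about 3,026 weeks, which is why 3,000 sampled days works out to roughly one day per week.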
For each wave of the pretest the articles were made available to student classifiers through an on-line program, the Eventalyzer, which displayed both article text and a classification scheme. The events in an article could be categorized by clicking on a classification scheme box. Because some events could belong to more than one category, and because some articles reported multiple events, classifiers could click more than one box. Other tools were available in the display and categorization program, but they varied with each wave of the pretest, as did the classification scheme. The next sections describe the structure and contributions of each wave.
Wave-one. This wave was the most unstructured and cumbersome; it required more than a year to complete. Student classifiers were provided with the rough scheme developed in the exploratory phase. The students were given verbal instructions about the general types of events that would fall within each category; they were required to read every article (over 35,000). For this wave they were only required to classify articles within the first level of the classification scheme: economic, legal, political, irrelevant, etc. Student classifiers had access to an electronic discussion board that allowed them to communicate with other classifiers and a general supervisor about problems and issues they encountered. They also had the capacity to make notes in a dialog box that was embedded in the screen that displayed the text of the news article. Classifiers were instructed to be cautious in placing articles in the irrelevant category, where they would never again be reviewed, and to classify articles in every conceivable category. The result was a massive over-classification of articles as “Relevant” and a high frequency of multiple classifications. More than 10,000 of the 35,000 articles reviewed were placed in some relevant category.
Once all of the articles in the first-wave sample were categorized, a set of students generated short summaries of those that were considered relevant. These 10,000 summaries were then reviewed by the project director. He screened out the irrelevant articles (nearly 7,500) and generated a more refined preliminary classification of those articles containing reports of relevant events. A second pass through the relevant articles produced: 1) a four-tier classification scheme; and 2) a set of companion Excel files that contain the one-sentence summaries of the events that fell within each of the categories.
Components of that classification scheme (government involvement in the economy, political expression, societal stability, etc.), along with the companion Excel files, were sent to various University of Illinois faculty to evaluate and critique. For all but the most rudimentary categories of events (i.e., cataclysmic events), at least two faculty reviewed each component of the classification scheme. They were requested to examine the classification scheme for redundancy, clarity, internal consistency, comprehensiveness, etc. They were also requested to review the appropriateness of the event summaries that were classified within the four-tier classification scheme.
These reviews were used to produce another version of the four-tier classification scheme. At that point a number of Ph.D. students were hired as research assistants and assigned to different subsets of the classification scheme (i.e., they were deployed as subject matter specialists). They were asked to scrutinize the revised classification scheme and use it in conjunction with some articles contained in the wave-two sample of articles. This led to further refinements in the classification scheme for wave two of the pretest. Once those refinements were complete they were asked to assist the project director in the preparation of a set of written guidelines to aid student classifiers in the conduct of the wave-two classifications.
Wave-two. The second wave of the pretest differed fundamentally from the first wave. It was more automated, more structured, and more sophisticated. The following paragraphs develop these points more fully.
Automation. In the second wave of the pretest student classifiers worked with articles that had already been preliminarily “binned” by an automatic text classification (ATC) program. The ATC program we employed was a “next generation” version of technology developed at the National Center for Supercomputing Applications (NCSA) at UIUC. The data used to develop the ATC program were derived from the refined and reviewed manual classifications done in wave one. Though the bins differed somewhat between wave one and wave two, there was enough similarity that automatically classifying the events contained in the wave-two sample of news reports was economical. One of the main virtues of using the ATC procedure at this stage is that it filtered out nearly ten thousand articles (sports, entertainment, food, etc.), which students had to filter manually in wave one. Wave-two classifiers did not even have to access these files, much less review them. It is estimated that this saved as much as 800 person-hours.
To develop our ATC procedure for wave two, the binned collection of events from wave one was fed into a supervised learner. We used the “bag-of-words” model, with individual words as the feature tokens, weighted by raw term frequency with no normalization. For feature selection we used the 20% most frequent words criterion, which we found to be the most accurate at this point in the project. These categorized feature vectors were then passed to a Naïve Bayesian Learner for model construction and execution. We found that the Naïve Bayesian Learner outperformed both Support Vector Machine and Decision Tree Learners on both accuracy and performance. Moreover, we found that the “bag-of-words” feature selection yielded significantly enhanced accuracy over “noun phrase” feature selection.
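A rough sketch of this kind of pipeline, using only the Python standard library, is shown below. It is not the NCSA-derived program used in the project. The tokenizer, the reading of “20% most frequent words” as the top fifth of distinct words by corpus count, the multinomial form of Naive Bayes, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic words (an assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def select_vocabulary(docs: list[str], keep_fraction: float = 0.20) -> set[str]:
    """Keep the most frequent fraction of distinct words across the corpus."""
    totals = Counter(w for d in docs for w in tokenize(d))
    n_keep = max(1, int(len(totals) * keep_fraction))
    return {w for w, _ in totals.most_common(n_keep)}

def term_frequencies(text: str, vocab: set[str]) -> Counter:
    """Raw term counts restricted to the vocabulary, with no normalization."""
    return Counter(w for w in tokenize(text) if w in vocab)

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (the smoothing is an assumption)."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(term_frequencies(doc, vocab))
        return self

    def predict(self, text):
        tf = term_frequencies(text, self.vocab)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Smoothed denominator: all word counts in the class plus vocabulary size.
            total = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(n_c / n_docs)  # log prior
            for w, count in tf.items():
                score += count * math.log((self.word_counts[c][w] + 1) / total)
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

Given a handful of labeled toy headlines, `NaiveBayes().fit(docs, labels, select_vocabulary(docs))` returns a model whose `predict` method assigns a new text to the label with the highest log score.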
The correlation among some of our categories, coupled with the wide scope of others, led us to use independent models for each category, rather than a single model that encompasses all of them. To train the models, the system iterates over each category, selecting all N documents in that category, along with N*9 documents that are not in the category to use as inclusive/exclusive training examples. We found (experimentally) that the ratio of 1:9 inclusive/exclusive documents offers the most accurate classification results. Each resulting model offers a yes/no classification of whether a document belongs in that category. At runtime, each incoming document is subjected to all models in sequence, with an accumulator storing the list of matches. If a document matches no categories, it is placed into the irrelevant category. If a document is relevant it is tagged with the information from the accumulator.
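The training and runtime steps described above might look roughly like this. The sketch assumes that the exclusive examples are drawn at random from documents outside the category and that the draw is capped when fewer than N*9 are available; neither detail is stated in the text. The function names and the `corpus` structure are hypothetical, and each per-category model is represented simply as a yes/no function.

```python
import random
from typing import Callable

def build_training_set(corpus: dict[str, set[str]], category: str,
                       rng: random.Random, negatives_per_positive: int = 9):
    """Select all N documents in a category plus up to N*9 documents outside it.

    corpus maps a document ID to the set of categories assigned to it.
    Returns (inclusive_ids, exclusive_ids).
    """
    positives = [d for d, cats in corpus.items() if category in cats]
    others = [d for d, cats in corpus.items() if category not in cats]
    n_neg = min(len(others), negatives_per_positive * len(positives))
    negatives = rng.sample(others, n_neg)  # random choice of negatives is an assumption
    return positives, negatives

def classify(document: str, models: dict[str, Callable[[str], bool]]) -> list[str]:
    """Run every per-category yes/no model in sequence and accumulate the matches."""
    matches = [category for category, is_member in models.items() if is_member(document)]
    # A document that matches no category is placed in the irrelevant bin.
    return matches or ["irrelevant"]
```

Because every category has its own binary model, a single article can be tagged with several categories at once, which suits a scheme in which categories are correlated and events can belong to more than one bin.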
A second way in which the second wave of the pretest was more automated than the first is that it produced computer-generated summaries of each article. These summaries were a maximum of fifteen sentences each. Using these summaries both eased the tedium involved in the classification process and made it more efficient. Where the summaries were unclear, student classifiers could access the full text of the article.
Structure. The second wave of the pretest was more structured than the first because it used a more refined classification scheme, had written guidelines, and employed an on-going monitoring and feedback mechanism. The guidelines were used to train and direct student classifiers in classifying the events contained in news articles within the wave-two classification scheme. Student classifiers were trained broadly in the classification scheme and worked in a common physical space that facilitated lateral communication among them. But they were deployed as topic specialists in that they were assigned to only one Tier 2 bin at a time (regard for human life, respect for human rights, sanctity of property rights, government involvement in the economy, electoral integrity, political expression, etc.). When they completed binning all of the events in a Tier 3 category (and filtering or redirecting reports that were mistakenly included in a category by the ATC procedure), they would be assigned to a new category, usually within the same Tier 1 class (economic, legal, political, etc.).
In addition to the existence of a more mature classification scheme, written guidelines, and classifier specialization, the wave-two pretest benefited from an on-going monitoring and feedback mechanism. It had two components, one human and one electronic. The human component was a set of research assistants (RAs) assigned to monitor, on an on-going basis, the accuracy and pace of the binning process. One RA was assigned to each of the principal Tier 1 categories and reviewed the articles that were binned in that category. This was done on a daily basis and provided near real-time feedback on the binning process. The RAs were also required to classify the binned reports into a Tier 4 category. This provided a check on both the classification procedures and the classification scheme. Reports that could not be easily binned led to: 1) better guidelines for the student classifiers, and 2) refinements in the most detailed categories of the classification scheme. The electronic component was an Eventalyzer bulletin board. Student classifiers could post queries on the bulletin board and receive responses from both other classifiers and the RA monitors.
Sophistication. The second wave of the pretest was more sophisticated than the first wave because of both electronic features and methodological checks. An annotation capacity was added to the Eventalyzer 2.0 family of programs. This capacity provides classifiers with the full range of tools normally available in a text editor (highlighting, cutting and pasting, comment insertion, color coding, etc.). The most immediate use for these editing tools is to enhance the precision and the efficiency of the event classification process. Precision is important here because the central purpose of the pretest is to develop a body of information for the ATC procedure to use in creating models to classify the millions of articles in our archive efficiently and accurately.
As can be seen in the guidelines, student classifiers are instructed to use a color scheme to highlight the text that is central to their classification of an event. This color scheme is keyed to the classification system used in wave two. In some news reports there are multiple events; in others, events belong to more than one of the categories in our event classification scheme. Without knowing what drove a particular categorization, the RA monitors would have to reread the entire text and guess what led to it. Reading the highlighted text makes this review process more efficient. Requiring classifiers to pinpoint the relevant textual portions of a report also makes them focus on the elements of the news report that are crucial to the classification task.
The more mature and refined classification scheme, in conjunction with the more structured nature of the second wave of the pretest, made it possible to conduct a number of reliability checks on the classification procedure. Reliability checks for a series of decision points are presently being conducted. We are testing: the accuracy of the ATC procedure in filtering irrelevant news reports, the accuracy of classifiers in classifying as irrelevant reports that were deemed relevant by the ATC procedure, the extent to which classifiers detect all of the events in a random sample of news reports, the relative accuracy of classifying articles using computer-generated summaries as opposed to the full text of the reports, and the accuracy of graduate student classifications of events into Tier 3 and Tier 4 categories.
The results of these reliability checks will be used to refine the classification scheme and ATC procedure, as well as to improve the guidelines used to aid the classification process. In the third and final wave of the pretest, the materials developed for these pretests will be used to train student classifiers. They will not be allowed to check the computer-generated classifications until they perform at a high level of proficiency.