
A Comparison of Techniques for Judgmental Forecasting by Groups with Common Information

JANET A. SNIEZEK

University of Illinois at Urbana-Champaign

Forty-four groups made judgmental forecasts for five problems. All group members received the same task-relevant information: historical data for each variable in the form of a graph and a numerical listing of the 36 previous monthly values. Each person first produced an individual forecast and was then assigned to one of four Group Technique conditions: Statistical, Delphi, Consensus, and Best Member. Results show: (a) low accuracy of group forecasts compared to Actual Best Member forecasts in difficult tasks, (b) underconfidence in unbiased easy tasks and overconfidence in biased difficult tasks, (c) some unequal weighting of individual forecasts to form Consensus group forecasts, and (d) an inability of groups to identify their best members.

Judgmental forecasting is the most popular forecasting approach in organizations (Fildes and Fitzgerald, 1983; Dalrymple, 1987), and is very frequently performed by groups of persons (Armstrong, 1985). Although research on group judgment has a long history (see Einhorn, Hogarth, & Klempner, 1977; Sniezek & Henry, 1989), research on group forecasting is limited (for recent reviews, see Armstrong, 1985; Ferrell, 1985; Lock, 1987).

The findings from judgmental forecasting studies with individuals may not generalize to groups. It can be seen from the group judgment literature that group judgment processes often differ substantially from individual processes (see McGrath, 1984; Sniezek & Henry, 1989, in press). For example, individual forecasting and planning is susceptible to a number of information processing limitations which are potential causes of forecast error (Hogarth & Makridakis, 1981). Unlike a single person, a group interacts and can experience disagreement, communication error, or social pressures. The effect of any group process might be either to exaggerate or to diminish individual biases in information processing.

The nature and extent of differences between group and individual judgments often depend on the technique used to obtain the group forecast (see McGrath, 1984), though not all comparisons of group techniques show differences (Fischer, 1981). A group technique is defined here as a specified set of procedures for the formation of a single judgment from a set of individuals. For example, communication among the individuals may be unlimited, restricted, or prohibited with various techniques. One goal of applied group research is to determine how to choose a group technique to constrain the group process in order to improve performance. To do this, we must first learn how group techniques affect judgment under various conditions.

One important condition that has received little attention concerns the extent of unique versus common information held by each forecaster. For example, Brown, Foster, and Noreen (1985) have raised questions about the extent to which high correlations among security analyst forecasts result from the use of a common information set, or from communication among analysts. Laboratory studies of ad hoc groups have emphasized the role of pooling information uniquely held by various group members in enhancing group performance (see Kaplan & Miller, 1983; Hill, 1982). However, multiple judges working together within the same organization may have more important differences about how to interpret available information than differences in information. The increased availability of databases is likely to create conditions in which all group members have access to the same information. This is particularly likely to be true about historical data for time series estimation problems. Yet, as noted by Hastie (1986), little is known about group process when all members have the same information.

The major purpose of this article is to report an empirical study of group judgmental forecasting when all group members have access to the same data. Four techniques for obtaining group forecasts are compared across five forecasting problems that are expected to vary in difficulty. The four group techniques – Statistical, Delphi, Consensus, and Best Member – have different constraints on the process of forming a group forecast. These group techniques are compared to each other, and to relevant baseline models based on individual forecasts, in terms of their effects on two dimensions of group performance: (a) forecast accuracy and (b) confidence in the forecasts.

MODELS OF GROUP JUDGMENT

Understanding the process by which a set of individual judgments is transformed into a single group judgment is of great theoretical and practical importance. Various models have been used to describe how groups actually form judgments (Sniezek & Henry, 1989), and to evaluate the quality of these groups’ judgments (Einhorn et al., 1977; Sniezek & Henry, 1989). These models fall into one of two classes: equal and unequal weighting of individual judgments. Unequal weighting, discussed in more detail below, involves differential weighting of the judgments so that not all group members have the same impact on the group’s judgment. In contrast, with equal weighting each of k group members has the same weight (1/k) on group output; the group judgment is simply the average or mean of the individual judgments.
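The two classes of models can be written as one weighted combination. The notation below (Y_i for member i's judgment and w_i for that member's weight) is introduced here only for illustration, alongside the symbol Y_G used for the group forecast; it is a sketch of the distinction, not a formula taken from the cited models.

```latex
\begin{align}
Y_G &= \sum_{i=1}^{k} w_i \, Y_i, \qquad \sum_{i=1}^{k} w_i = 1 \\
\text{equal weighting:}\quad w_i &= \frac{1}{k} \;\Rightarrow\; Y_G = \frac{1}{k}\sum_{i=1}^{k} Y_i \\
\text{unequal weighting:}\quad w_i &\neq w_j \ \text{for at least one pair } i \neq j
\end{align}
```

Relying entirely on one member's judgment is the extreme case of unequal weighting, with a weight of 1 for that member and 0 for all others.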

In practice, the mean of the judgments of k individuals who never interact is often more accurate than most individual judgments, but primarily because it reduces random error. This suggests that the use of multiple judges is advantageous, but that it is not necessary to apply group techniques involving group interaction to improve judgment accuracy over the level achieved by most individuals. Indeed, practitioners might be advised to simply average multiple individual judgments (von Winterfeldt & Edwards, 1986). Averaging also has the advantage of being inexpensive and efficient, relative to common group techniques. For these reasons, the mean model provides one important baseline against which to evaluate the actual performance of groups. But, although averaging reduces random error, it will always be of limited usefulness whenever the individual judgments are systematically biased (Einhorn et al., 1977). Individual judgments are biased if the mean individual judgment is above or below the actual value of the criterion variable being forecast. There are many reasons to suspect bias in individual forecasts (Hogarth & Makridakis, 1981). In order to increase accuracy when individual forecasts are biased, a group technique leading to unequal weighting is required (Sniezek & Henry, 1989). The set of tasks selected for this study was chosen to include a variety of time series with varying amounts of bias in individual judgments.
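A standard decomposition makes this argument concrete. Suppose, as a simplifying assumption that the text does not state explicitly, that each individual judgment is Y_i = Y + b + e_i, where Y is the actual value, b is a bias shared by all judges, and the e_i are independent errors with mean zero and variance σ². These symbols are introduced here for illustration only.

```latex
\begin{align}
\bar{Y} &= \frac{1}{k}\sum_{i=1}^{k} Y_i = Y + b + \frac{1}{k}\sum_{i=1}^{k} e_i \\
\mathbb{E}\!\left[(Y_i - Y)^2\right] &= b^2 + \sigma^2 \\
\mathbb{E}\!\left[(\bar{Y} - Y)^2\right] &= b^2 + \frac{\sigma^2}{k}
\end{align}
```

Under these assumptions, averaging shrinks the random component of expected squared error by a factor of k but leaves the b² term untouched, which is why a shared bias limits what the mean model can achieve.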

But unequal weighting of multiple forecasts is not necessarily advantageous (Ashton & Ashton, 1985). In practice, group members’ individual contributions might be weighted in proportion to, or without regard for, individual judgment accuracy. Thus, in interacting groups, unequal weighting can either help or hurt performance compared to equal weighting. Research has found the judgments of interacting groups to be more accurate than average individual judgments (Sniezek & Henry, 1989, in press; Sniezek, 1989). From these studies, it can be inferred that group interaction cannot be adequately described by an equal weighting model.

A group process using unequal weights can take many forms. First, consider the case in which one member’s individual judgment is used as the group judgment. At one extreme, groups can maximize performance by assigning a weight of 1 to the Actual Best judgment (i.e., the judgment that is closest to the actual value) and weights of 0 to the remaining judgments. The problem, of course, is that in forecasting problems the Actual Best judgment is not likely to be identified until after the criterion is known. For this reason, the group’s total reliance on the judgment of one “chosen best” member may not lead to the level of accuracy of the Actual Best baseline model. However, it must be noted that the Actual Best baseline for evaluating group judgment accuracy is particularly stringent in that the Actual Best capitalizes in part on chance. Thus, in practice we would not generally expect consistent forecasting performance at the level of the Actual Best. An interesting exception is discussed by Sniezek & Henry (1989), who discovered frequent occurrences of group judgments that were outside the range of members’ individual judgments and more accurate than the Actual Best. A second meaningful baseline model with all-or-none weighting concerns the random selection of one member’s judgment to use as the group judgment. The present research evaluates group judgment accuracy by comparing it to the levels that would have been achieved had the groups used each of three baseline models: Mean, Actual Best, and Random Member.
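The three baseline models can be sketched for a single group and a single forecasting problem as follows. The function name, the dictionary keys, and the example numbers are hypothetical, chosen only to illustrate the definitions given above; this is not the scoring procedure used in the study.

```python
import random
from statistics import mean


def baseline_forecasts(individual_forecasts, actual, rng=None):
    """Return the Mean, Actual Best, and Random Member baselines for one group.

    individual_forecasts: the members' individual point forecasts.
    actual: the criterion outcome, which is known only after the fact.
    rng: optional random.Random instance, so the random pick is reproducible.
    """
    if not individual_forecasts:
        raise ValueError("at least one individual forecast is required")
    rng = rng or random.Random()
    return {
        # Equal weighting: every member has weight 1/k.
        "mean": mean(individual_forecasts),
        # Weight 1 on the judgment closest to the actual value, 0 elsewhere.
        "actual_best": min(individual_forecasts, key=lambda y: abs(y - actual)),
        # Weight 1 on a member chosen at random, 0 elsewhere.
        "random_member": rng.choice(individual_forecasts),
    }


# Hypothetical five-member group; the numbers are invented for illustration.
forecasts = [98.0, 102.0, 105.0, 110.0, 120.0]
baselines = baseline_forecasts(forecasts, actual=104.0, rng=random.Random(0))
```

Because the Actual Best baseline is selected using the outcome itself, it capitalizes on chance in exactly the way the text describes, whereas the other two baselines use no outcome information.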

GROUP TECHNIQUES

Four group techniques of considerable interest in group judgment research and practice are compared in this study: Statistical, Consensus, Delphi, and Best Member. These group techniques differ greatly in their constraints on communication in the group judgment process and can therefore be expected to have differential effects on group judgmental forecasting accuracy and confidence, if communication aids in the interpretation, and not just the sharing, of data. Face-to-face interaction is permitted with all but the Delphi and Statistical techniques. Some form of communication is permitted with all but the Statistical technique.

The Consensus group technique requires only that the group use face-to-face discussion to produce a single final judgment to which all members agree. The process is otherwise discretionary. For more accurate group prediction than can be obtained with averaging, the more accurate individuals must have greater impact on the group output. If data interpretation – and not just sharing – is important, the Consensus technique will lead to greater forecast accuracy and confidence than the Statistical technique, as in past judgment studies (Sniezek & Henry, 1989; Sniezek, 1989).

A group using the Best Member technique engages in face-to-face discussion for the purpose of selecting one of the group members as “best,” so that this person’s judgment will be the final group judgment. Little empirical research has been done on the ability of group members to assess their own or each other’s performance quality. Einhorn et al. (1977) show that the effectiveness of the Best Member technique increases with bias in the individual judgments and the likelihood of selecting the Actual Best member. Sniezek (1989) found that ongoing groups with both shared and unique information selected best members with less bias than the mean model. The emphasis on selection of a group member instead of formation of a forecast is expected to hurt forecast accuracy relative to the Consensus technique, regardless of the relative importance of information pooling or interpretation.

The Delphi procedure (Dalkey & Helmer, 1963) presumably reduces what Steiner (1972) termed “process loss,” for example, the inappropriate influence of variables such as status or confidence over ability. Members do not meet face-to-face, and opportunities for both data sharing and interpretation are indirect at best. Communication among them is limited to feedback about the others’ judgments (e.g., the median judgment). The process of making judgments and receiving feedback is repeated until consensus is achieved (or until the median judgment stabilizes). This final median is then the group judgment. There is some evidence that the Delphi procedure leads to more accurate predictive judgment than statistical averaging of group members’ judgments (e.g., Jolson & Rosnow, 1971; Sniezek, 1989). However, the absence of any opportunity to resolve differences of interpretation is likely to minimize the benefits to forecast accuracy. Thus, it is predicted that the Delphi technique will produce forecasts more accurate than the Statistical technique, but less accurate than the Consensus technique.

The fourth group technique relevant to this study, the Statistical group technique, prohibits interaction and communication among group members. The mean of the individual judgments is called the “group” judgment. Because of the lack of interaction, this technique is expected to produce the least accurate group forecasts.

One way of evaluating the quality of Delphi, Best Member, and Consensus group techniques is to examine the relationship of individual judgment accuracy to influence on the group judgment. Best Member and Consensus judgments should be superior to individual and Statistical judgments whenever two conditions hold: (a) member judgment accuracy variance is high, and (b) influence on group judgment is determined by accuracy level. A high variance in group members’ judgments has been found to be positively related to the extent of improvement in group over individual judgment accuracy (see Sniezek & Henry, 1989). In these studies, the variance could be partly attributable to unique information held by individual group members. To the extent that group member forecast variance is reduced by common information in the present study, differences among the group techniques should be diminished.

If influence and accuracy are unrelated, then neither the Best Member nor the Consensus technique will lead to more accurate judgment than the Statistical technique. The relationship between individual input and influence on the group may well be mediated by individual confidence in that input (Hastie, 1986; Sniezek & Henry, in press). An individual’s confidence in his or her own judgments compared to his or her confidence in the group average presumably explains his or her own “influence” in the Delphi technique (Sniezek, 1989). Thus it is of interest to investigate the appropriateness of confidence as well as influence.

JUDGMENTAL FORECASTS

Intuitive time series prediction tasks were used in this study. Such tasks have been used by previous researchers (e.g., Carbone & Gorr, 1985; Eggleton, 1982) and have several features worth noting. First, all group members have the same information, i.e., the time series data. Second, with time series data it is possible to make distinctions between judgmental forecasting policies. If group forecasts are found to vary with group technique, the differences can be described in terms of relative dependence on recent values, global trends, seasonality, etc.

In evaluating judgmental forecasts, two dimensions of performance are of interest: forecast accuracy and confidence. Whereas accuracy reflects the actual quality of the forecast, confidence reflects perceived quality. The success of judgmental forecasting in an organization depends on both accurate forecasts and appropriate confidence. While the importance of accurate forecasts is obvious, the importance of appropriate confidence deserves some discussion. As Sniezek and Henry (1989) point out, the way in which a forecast is used will depend on how much confidence is placed in it. If confidence is unrealistic (i.e., either too high or too low relative to the accuracy of the forecast), then organizational decision making will be suboptimal.

A large body of empirical research on individuals’ confidence assessments has led to the general conclusion that people are overconfident, though exceptions have been observed for easy tasks or some types of experts (see Lichtenstein, Fischhoff, & Phillips, 1982). Since difficult tasks are more likely to involve judgmental forecasts from multiple persons, the implications of overconfidence for group forecasting in organizations are potentially serious. But research on confidence in group judgment has been very limited. Sniezek (1989) and Sniezek and Henry (1989) found that confidence increased following group discussion. In the Sniezek and Henry (1989) study it was possible to evaluate realism, showing that, although groups were more confident about their judgments than were their individual members, they were actually less overconfident due to the improvement in accuracy through grouping.

Significant differences among group techniques in terms of the realism of confidence in judgments would have a bearing on their value in organizational decision making. Although sample sizes were too small to draw conclusions, data from the Sniezek (1989) study suggest such differences. The present study will allow the evaluation of the realism of confidence assessments across group techniques and across tasks of varying difficulty.

In summary, the major questions addressed in this study are: In judgmental forecasting tasks in which group members have shared information, do the group techniques lead to different levels of forecast accuracy and confidence? How do the group forecasts produced with each technique compare to those of baseline models based on individual forecasts? If the differences among group techniques are attributable to the varying opportunities to share information, no differences among group techniques are expected in this study. If, however, the group techniques permit different chances to interpret data, the rank ordering from most to least accurate is expected to be: Consensus, Best Member, Delphi, and Statistical.

RESEARCH METHODOLOGY

First, all participants (n = 220 undergraduate students) independently made individual forecasts of the future (three months hence) values of five variables. They were then randomly assigned to groups of five. The forty-four groups were then randomly assigned to one of four group technique conditions: Statistical, Delphi, Consensus, and Best Member. Each group yielded one point forecast for each variable. Each individual and group forecast was accompanied by an associated 50% confidence interval. This subjective confidence interval is considered to be a measure of doubt or uncertainty about the forecast. Bonuses (free books and magazines) were offered for accurate predictions.

The forecasting task was constructed so that the time series data were realistic, yet group members’ information concerning the problems could be controlled. To accomplish this, actual time series data were used from five financial variables: consumer installment credit outstanding, total retail sales, 3-month certificate of deposit interest rate, exports, and federal reserve bank reserves. The actual outcome for these variables was obtained three months after the collection of forecasts.

Task materials contained a graph of time series data from the three previous years for each of five variables. The attached form instructed the individual participant to make a forecast for each variable. Group forecast forms were similar, but contained procedural directions specific to each experimental condition. On the final page, following an explanation of the concept of a 50% confidence interval, were blanks for the lower and upper limits of a 50% confidence interval. To avoid order effects, a forward and a reverse order of the five variables were each used on half of the forms.

Delphi members were seated apart from one another so that verbal and non-verbal communication with other group members was not possible. The experimenter obtained the initial individual predictions, then supplied members with the median individual prediction for the first variable. Again, members made individual predictions. The cycle of feedback of the median individual prediction and making new predictions continued until either the same median resulted for three trials in a row, or three of the five members gave identical predictions. One of the conditions was always met in fewer than six trials. The final median served as the group forecast. After all five variables had been forecast, members produced limits for the 50% confidence intervals.
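The Delphi stopping rule described above can be sketched as a small loop over rounds of predictions. The function names and the example data are hypothetical, and two readings are assumed rather than stated in the text: the initial individual predictions count as the first trial, and "three of the five members" is treated as three or more.

```python
from collections import Counter
from statistics import median


def delphi_should_stop(median_history, current_predictions,
                       stable_rounds=3, agreeing_members=3):
    """Check the two stopping conditions after a round.

    median_history: medians of every round so far, including the current one.
    current_predictions: the members' predictions in the current round.
    """
    # Condition 1: the same median in each of the last `stable_rounds` rounds.
    recent = median_history[-stable_rounds:]
    median_stable = len(recent) == stable_rounds and len(set(recent)) == 1
    # Condition 2: enough members gave an identical prediction this round.
    most_common_count = Counter(current_predictions).most_common(1)[0][1]
    return median_stable or most_common_count >= agreeing_members


def run_delphi(rounds):
    """Replay recorded rounds; return (group forecast, rounds used).

    rounds: non-empty list of per-round prediction lists (illustrative data).
    """
    if not rounds:
        raise ValueError("at least one round of predictions is required")
    history = []
    for round_number, predictions in enumerate(rounds, start=1):
        history.append(median(predictions))
        if delphi_should_stop(history, predictions):
            return history[-1], round_number
    # Neither condition was met; report the last median observed.
    return history[-1], len(rounds)


# Invented example: five members converge by the third round.
example_rounds = [
    [10, 12, 15, 18, 20],
    [12, 13, 15, 16, 18],
    [14, 15, 15, 15, 16],
]
group_forecast, rounds_used = run_delphi(example_rounds)
```

In this example the third round satisfies both conditions at once, so the final median of 15 becomes the group forecast.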

Consensus groups were instructed to discuss each problem until all members agreed to a single point forecast. Then, as a group, they defined the confidence intervals for the five variables. Best Member groups were directed to discuss each problem, then choose the one group member who was most likely to have the most accurate forecasts. That member’s initial forecasts became the group forecast. As a group, they set limits to the confidence intervals. Members of Statistical groups produced 50% confidence intervals around their own individual forecasts.

RESULTS AND DISCUSSION

To differentiate among the five tasks with respect to difficulty, the mean percent individual forecast error was calculated across all 220 individuals for each variable. The five subtasks (and their mean percent errors) are ordered in terms of increasing difficulty: exports: 5.3%, total retail sales: 8%, consumer installment credit outstanding: 12.7%, federal reserve bank reserves: 25.75%, and 3-month certificate of deposit interest rate: 37.16%.

To assess group forecast accuracy, two performance measures were used, squared error and bias, each computed from the group forecast YG and the actual criterion outcome Y. Over n forecasts, squared error is a measure of variation about the point of zero error, while bias is a measure of mean error. Squared error is used under the assumption that the negative consequences increase exponentially as forecast error increases. A measure of subjective uncertainty was obtained by computing the difference between the upper and lower limits of the subjective 50% confidence intervals.
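The three dependent measures can be sketched in a few lines. The sign convention for error (forecast minus outcome) is an assumption, since the description fixes only that bias is a mean error and that squared error is measured about zero error; the function names and example numbers are hypothetical.

```python
from statistics import mean


def forecast_error(group_forecast, actual):
    """Signed error, assuming a forecast-minus-outcome convention."""
    return group_forecast - actual


def squared_error(group_forecast, actual):
    """Squared distance of one forecast from the point of zero error."""
    return forecast_error(group_forecast, actual) ** 2


def bias(group_forecasts, actual):
    """Mean signed error over n forecasts of the same criterion."""
    return mean(forecast_error(y, actual) for y in group_forecasts)


def mean_squared_error(group_forecasts, actual):
    """Mean squared error over n forecasts of the same criterion."""
    return mean(squared_error(y, actual) for y in group_forecasts)


def subjective_uncertainty(lower_limit, upper_limit):
    """Width of a subjective 50% confidence interval."""
    return upper_limit - lower_limit


# Invented forecasts from three groups for one variable with outcome 10.0.
example_forecasts = [9.0, 10.0, 14.0]
example_bias = bias(example_forecasts, actual=10.0)
example_mse = mean_squared_error(example_forecasts, actual=10.0)
example_width = subjective_uncertainty(8.0, 12.5)
```

The example shows why the two accuracy measures are reported separately: errors of opposite sign partly cancel in bias but always add to squared error.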

To assess overall effects of group technique, a MANOVA was applied to the dependent variables squared error, bias, and subjective uncertainty for each forecasting task. Significant effects were obtained only for the three most difficult tasks: federal reserve bank reserves (Hotelling’s F = 3.73, df = 15/173, p < .001), 3-month certificate of deposit interest rate (Hotelling’s F = 3.00, df = 15/173, p < .001), and consumer installment credit outstanding (Hotelling’s F = 4.70, df = 15/173, p < .001), with effect sizes of .19, .20, and .28, respectively. Condition means for each variable showing significance in subsequent univariate tests are given in Table 1.

First, it must be noted that the exports and total retail sales tasks, for which group technique did not affect performance, were the relatively easy forecasting problems. Regardless of the procedure for obtaining group forecasts, they were not inferior to the Actual Best values. The consumer installment credit outstanding, federal reserve bank reserves, and 3-month certificate of deposit interest rate subtasks were all more difficult, but could be differentiated according to the bias parameter. Individual and group forecasts for both consumer installment credit outstanding and 3-month certificate of deposit interest rate showed significant (p < .001) non-zero bias, while individual and group forecasts for federal reserve bank reserves were unbiased. On the basis of differences in difficulty and bias among the five tasks, group process effects on performance can be expected to vary across the five subtasks.

TABLE 1

[Condition means for the variables showing significant univariate effects; table body not recoverable from the source.]

The pattern of pairwise significant differences is similar for consumer installment credit outstanding and 3-month certificate of deposit interest rate, but is unique for federal reserve bank reserves. For both consumer installment credit outstanding and 3-month certificate of deposit interest rate, all experimental conditions resulted in forecasts less accurate than Actual Best forecasts, and no better than the Statistical forecasts on both the squared error and bias indices. In short, all these group techniques failed to produce forecasts as good as those of the best group member under conditions of bias. In contrast to the Sniezek and Henry (1989, in press) studies, not one group judgment fell outside the range of the individual members’ forecasts and thereby exceeded the accuracy level of the Actual Best. Unlike those studies, the judges in the present study were focused on common data for each task.

The federal reserve bank reserves results reveal four subsets of homogeneous squared error means. Here, Statistical does not differ from Actual Best, and both Statistical and Actual Best are superior to all other conditions. This is not surprising given that averaging individual judgments reduces random error and that the federal reserve bank reserves individual judgments were generally unbiased. In contrast, Best Member’s forecasts were significantly less accurate than all other forecasts, even Random Member’s. Clearly, the groups did not choose a member with above average ability to predict federal reserve bank reserves. They could not be expected to, given that individual differences in judgment accuracy reflected only random error. But they also did not adhere closely to an averaging rule by picking the member closest to the group average. Delphi, Consensus, and Random Member all yielded similarly inaccurate forecasts. The fact that groups are significantly poorer than the Actual Best and Statistical forecasts implies that (a) Consensus was not achieved by averaging individual federal reserve bank reserves forecasts, and (b) judgments in the Delphi groups were not weighted equally (i.e., the final median judgment was not the mean of individual judgments). Thus, with unbiased judgments, all of the group techniques in this study led to some non-averaging processes, and therefore, to greater error than statistical averaging.

Also important, in addition to forecast accuracy, is the apparent quality of forecasts at the time that they are made. Subjective confidence intervals are intended to measure the confidence placed in the point estimate: as interval size increases, confidence decreases. Significant Confidence differences between conditions occur only for the federal reserve bank reserves task. The Consensus groups clearly construct the widest intervals. The Confidence values in Table 1 do not correspond well to the squared error values, supporting Sniezek and Henry’s (1989) finding that judgment accuracy and confidence are not highly related. Indeed, Pearson correlations among Confidence, Squared Error, and Bias (for both individuals and groups) were weak at best, and not necessarily in the right direction. Regardless of task difficulty, the size of confidence intervals was not strongly related to forecast accuracy.

A related research question concerns the absolute quality of the confidence intervals. Table 2 lists the percent of group and individual confidence intervals containing the actual criterion outcome. Comparing the columns reveals consistent differences which can be attributed to task difficulty and bias.

TABLE 2

% of Confidence Intervals Containing Criterion Outcomes

NOTE. Since “50%” confidence intervals were requested, the most appropriate table entry is 50%.

When judgments are unbiased (as in total retail sales and federal reserve bank reserves), confidence intervals generally range from appropriate (near 50%), in the moderately difficult federal reserve bank reserves task, to wide, in the easy total retail sales task. In the biased tasks (exports, consumer installment credit outstanding, and 3-month certificate of deposit interest rate), the intervals range from appropriate, in the easy exports task, to narrow, in the more difficult consumer installment credit outstanding and 3-month certificate of deposit interest rate tasks. In summary, both groups and individuals appear to be underconfident about their forecasts in the easy and unbiased task, total retail sales, and overconfident in the biased and difficult tasks, consumer installment credit outstanding and 3-month certificate of deposit interest rate. This supports previous research on the relationship of task difficulty to confidence (cf. Lichtenstein et al., 1982).
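The realism check behind these comparisons can be sketched as a hit-rate calculation: the percent of subjective intervals that contain the outcome, compared with the nominal 50%. The function names, the tolerance band, and the example intervals are hypothetical; the study reports percentages but does not specify a cutoff for calling intervals too wide or too narrow.

```python
def interval_hit_rate(intervals, outcomes):
    """Percent of (lower, upper) intervals that contain the matching outcome."""
    if not intervals or len(intervals) != len(outcomes):
        raise ValueError("need equally many intervals and outcomes (at least one)")
    hits = sum(1 for (lower, upper), y in zip(intervals, outcomes)
               if lower <= y <= upper)
    return 100.0 * hits / len(intervals)


def describe_calibration(hit_rate, nominal=50.0, tolerance=10.0):
    """Label a hit rate relative to the nominal level.

    `tolerance` is an illustrative band, not a value taken from the study.
    """
    if hit_rate > nominal + tolerance:
        return "underconfident"  # intervals are wider than they need to be
    if hit_rate < nominal - tolerance:
        return "overconfident"   # intervals are too narrow
    return "appropriate"


# Invented 50% intervals from four judges for a variable whose outcome was 100.
example_intervals = [(90, 110), (101, 105), (80, 120), (102, 103)]
example_outcomes = [100, 100, 100, 100]
example_rate = interval_hit_rate(example_intervals, example_outcomes)
example_label = describe_calibration(example_rate)
```

A hit rate well above 50% corresponds to the wide intervals observed in the easy task, and a rate well below 50% to the narrow intervals observed in the biased, difficult tasks.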

CONCLUSIONS

The results of this study indicate that, when task information is shared, group techniques have little differential impact on the quality of group judgmental forecasts. In easy tasks without much bias, any process results in an accuracy level as high as that of the Actual Best group member. In fact, there is little reason to use groups in these tasks, since individuals tend to perform as well as groups. In contrast, none of the group techniques studied was clearly preferable in the more difficult tasks, since they all yield judgments inferior to the Actual Best member’s. These data suggest that the differences among group techniques occur due to the pooling, and not the interpretation, of data. If relevant information is shared, there is simply less to be gained with the use of a group. The choice of group technique appears to be less important to forecasting performance when all members have access to the same information. Thus, there is no evidence to suggest the use of one technique over another.

The more basic question of whether to use any group technique in practice depends on whether one can identify task difficulty or bias before forecasting takes place. The finding that confidence intervals did not show sensitivity to task difficulty suggests that subjective judgments about task difficulty would not be useful. The alternative is to rely on statistical analyses of past data to determine task predictability or difficulty. In tasks that are difficult, the use of multiple judges is advised. Further, the formation of independent individual forecasts prior to group interaction is a good practice to follow. If all information is held in common by group members, this practice will allow for “error checking,” or will reveal inter-judge agreement. If there is some relevant information uniquely held, heterogeneity in group members’ judgments is a likely result. Heterogeneity can improve group judgment performance (Sniezek and Henry, 1989) and increase group members’ commitment to the consensus judgment (Sniezek and Henry, in press).

The study additionally reveals that individual and group confidence in forecasts (as measured by 50% confidence intervals) is unrelated to forecast accuracy, and does not vary appropriately with task difficulty. In addition, groups were not able to determine the relative quality of members’ forecasts. It must be cautioned that these results, from judgmental forecasting with shared information, do not necessarily have any implication for tasks with both unique and shared information. Further, the results of this study may not generalize to groups with large status differences among their members, or groups that have ongoing interactions. While the group techniques did not differentially affect the manner in which groups use data, they may well affect behaviors important to other group forecasting situations. Future research should compare the aggregation and integration of information with various group techniques.

REFERENCES

Armstrong, J. S. (1985). Long-range forecasting: From crystal ball to computer (2nd ed.). New York: John Wiley and Sons.

Ashton, A. H. & Ashton, R. H. (1985). Aggregating subjective forecasts: Some empirical results. Management Science, 31(12), 1499-1508.

Brown, P., Foster, G., & Noreen, E. (1985). Security analyst multi-year earnings forecasts and the capital market. Studies in Accounting Research, 21, entire volume.

Carbone, R. & Gorr, W. L. (1985). Accuracy of judgmental forecasting of time series. Decision Sciences, 16, 153-160.

Dalrymple, D. J. (1987). Sales forecasting practices. International Journal of Forecasting, 3, 1-13.

Dalkey, N. C. & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. RM-727-PR, Santa Monica, CA: RAND Corp.

Eggleton, I.R.C. (1982). Intuitive time-series extrapolation. Journal of Accounting Research, 20(1), 68-102.

Einhorn, H. J., Hogarth, R. M., & Klempner, E. (1977). Quality of group judgment. Psychological Bulletin, 84, 158-172.

Ferrell, W. R. (1985). Combining individual judgments. In G. Wright (Ed.), Behavioral decision making. New York: Plenum.

Fildes, R. & Fitzgerald, M. D. (1983). The use of information in balance of payments forecasting. Economica, 50, 249-258.

Fischer, G. W. (1981). When oracles fail – a comparison of four procedures for aggregating subjective probability forecasts. Organizational Behavior and Human Performance, 28, 96-110.

Hastie, R. (1986). Experimental evidence on group accuracy. In B. Grofman & G. Owens (Eds.), Decision research, Vol. 2. Greenwich, CT: JAI.

Hill, G. W. (1982). Group versus individual performance: Are n + 1 heads better than one? Psychological Bulletin, 91, 517-539.

Hogarth, R. M. & Makridakis, S. (1981). Forecasting and planning: An evaluation. Management Science, 27(2), 115-138.

Jolson, M. & Rosnow, G. (1971). The Delphi process in marketing decision making. Journal of Marketing Research, 8, 443-448.

Kaplan, M. F. & Miller, C. E. (1983). Group discussion and judgment. In P. B. Paulus (Ed.), Basic group processes. New York: Springer-Verlag.

Lichtenstein, S., Fischhoff, B., & Phillips, L. D. (1982). Calibration of probabilities: The state of the art to 1980. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

Lock, A. (1987). Integrating group judgments in subjective forecasting. In G. Wright & P. Ayton (Eds.), Judgmental forecasting. New York: John Wiley and Sons.

McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice Hall.

Sniezek, J. A. (1989). An examination of group process in judgmental forecasting. International Journal of Forecasting, 5, 171-178.

Sniezek, J. A. & Henry, R. A. (1989). Accuracy and confidence in group judgment. Organizational Behavior and Human Decision Processes, 43(1), 1-28.

Sniezek, J. A. & Henry, R. A. (in press). Revision, weighting, and commitment in consensus group judgment. Organizational Behavior and Human Decision Processes.

Steiner, I. D. (1972). Group process and productivity. New York: Academic Press.

Von Winterfeldt, D. & Edwards, W. (1986). Decision analysis and behavioral research. London: Cambridge University Press.

Janet A. Sniezek is Assistant Professor of Psychology at the University of Illinois at Urbana-Champaign. She received her doctorate in psychology from Purdue University. Professor Sniezek’s research program includes the development of models of judgment and decision making at the individual and group levels, as well as the empirical study of judgmental forecasting in organizations. She has published in various leading journals in psychology, business, and organizational behavior.