Fireflies in the Shadow of the Sun   


By the Numbers: Anatomy of a Statistical Man

EHRENFELS: “Professors use a statistical technique – or should I say they have their research assistants use a statistical technique – known as ‘item analysis.’ This is only one of dozens of statistical formulas employed in research, but I would like to make an example of this one in particular because it is used in grading as well as in research, and I feel it captures the essence of their M.O. A questionnaire or an exam contains multiple questions – what are known as ‘items.’ Many questionnaires are even subdivided into scales, groups of items that are purported to tap different aspects of your personality or knowledge. For example, the Graduate Record Examination and the Scholastic Aptitude Test each have a group of items constructed to measure your Verbal skills and another group for your Mathematical skills. Item analysis yields a coefficient that measures the extent to which one item correlates with all the other items in the scale collectively – or, if there are no subscales, with all the other items in the questionnaire.”

MOYER: “So – in other words – and let’s use the example you used in our earlier conversation – let’s say we have a Shyness scale – your answer to the question ‘on a scale from 1 to 7, rate the extent to which you prefer privacy to social gatherings’ is correlated with your combined answer to all the other questions in the scale to determine the extent to which this item is associated with the other items.”

EHRENFELS: “Yes. Naturally, this is not done on a person-by-person basis; the analysis is performed once you have a number of people who have taken the questionnaire. We MAY find that although this particular question seems to be like the others in the Shyness scale – seems to measure the same thing, Shyness – people who rate themselves higher on this particular item end up scoring lower on the rest of the scale. So in effect, by keeping this particular question in the scale, we deflate or underestimate the true score on the scale. So we discard the item.”

MOYER: “Is there a cut-off you use? You have a coefficient for each item in the scale, right?”

EHRENFELS: “Correct.”

MOYER: “So then how do you know when the item is good, or not good? What does it take to keep it? Or to throw it out?”

EHRENFELS: “There are no hard and fast rules, except that once you adopt a cut-off, you should use the same cut-off for all the items. Some professors will look at another coefficient – known as alpha – which is a measure of the internal consistency of the scale. It is based on the average correlation of each item with all the other items, and on how many items there are. This is the extent to which the scale correlates with itself – the extent to which it can be said to measure the same thing. Researchers want this number to be high because supposedly a scale cannot correlate with something else to a greater extent than it correlates with itself – and researchers WANT to devise a scale that correlates with other measures, by which I mean a scale that is predictive or diagnostic of other things. So they recalculate alpha for the scale WITH and WITHOUT each item, and if they notice that alpha would increase without an item, they discard that item. Ultimately, they want alpha to be at least .80. Anything below .70 is considered dubious, and some strive for an alpha greater than .90, because such an alpha is said to be admissible in court.”
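
The "alpha" in this passage is conventionally Cronbach's alpha. The sketch below computes it with the usual formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), and then recomputes it with each item left out, which is the WITH-and-WITHOUT comparison described above. The function names and the Shyness data are hypothetical; statistical packages typically report the same quantity under a label such as "alpha if item deleted."

```python
from statistics import variance

# Hypothetical 1-to-7 ratings on a four-item Shyness scale; one row per respondent.
SHYNESS = [
    [7, 6, 7, 1],
    [6, 6, 5, 2],
    [4, 5, 4, 4],
    [3, 2, 3, 6],
    [2, 2, 1, 7],
    [1, 2, 2, 5],
]


def cronbach_alpha(responses):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def alpha_if_item_deleted(responses):
    """Recompute alpha with each item dropped in turn."""
    k = len(responses[0])
    return [
        cronbach_alpha([row[:i] + row[i + 1:] for row in responses])
        for i in range(k)
    ]


full_alpha = cronbach_alpha(SHYNESS)
without = alpha_if_item_deleted(SHYNESS)
# The rule described in the interview: drop any item whose removal would raise alpha.
items_to_drop = [i for i, a in enumerate(without) if a > full_alpha]
```

With this invented data the four-item alpha is slightly below zero, and dropping the fourth item (index 3) lifts it to about .97, because the three remaining items are close variations of one another.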

MOYER: “So what is the problem with item analysis?”

EHRENFELS: “I have seen it abused. By that I mean that when researchers are expected to apply it universally, it has a price. I had a scale which I claimed measured x. Now I defined x in such a way that the items themselves were synonymous with the definition of x. So in my opinion, these items measure x as I define it, regardless of what the alpha coefficient or individual item analyses tell me. Now imagine that one of the items significantly reduces the alpha.”

MOYER: “Above or below .70?”

EHRENFELS: “Doesn’t matter. Should I throw out the item? Well, no one will publish any research that involves a scale with a lower alpha. So I am advised to drop the item. But if I do, I change the meaning of the scale. The scale no longer measures x but some variation of x. This is all well and good – except I WANT to measure x.”

MOYER: “But according to alpha, you are not measuring x, right?”

EHRENFELS: “Not true. I will contend that x may not be as TIGHT or internally consistent a construct as the ones we are used to dealing with, but x is still x. You see, in our field, we are used to scales with .80 and even .90 coefficients. Have you ever SEEN one of these scales that meet these criteria? The questions all look alike. It is a foregone conclusion that people will respond to them similarly because they are all variations of the same question. And THAT is how they are usually created. Someone thinks of a question and then thinks of how to reword it several ways. BORING!”

MOYER: “Not to mention artificial, right? This has been your complaint against the field.”

EHRENFELS: “I have also complained that the field demands consensus from its members. Well, it would seem it also demands consensus from its subject matter.”

MOYER: “How do you think they would respond to your criticism?”

EHRENFELS: “They would tell me any scientist who does not revise a theory to fit the data would be irresponsible.”

MOYER: “And how would you – ”

EHRENFELS: “Item analysis IS NOT data. If your hypothesis is that people who score high on scale x behave in y way, or make z kind of decisions, or experience w type of dreams, you have to TEST the hypothesis before you throw it out by changing or discarding scale x. The real data is in y, z, and w, not in x alone. Now I know that x can only correlate with y, z, and w to the extent that it correlates with itself, but we demand a lot in the way of self-correlation. You don’t need a .80 or .90 – but this is what you are told you need in a scale if you want your research to compete for publication. This may explain in part why our literature is so lifeless and repetitive. It excludes too much – and what it does include correlates with itself, so to speak. We are also too quick in this field to throw out or revise theories to conform to the first signs of data. Let me tell you something. If we understood our data, we wouldn’t need theories. There is a little piece of every theory that is supposed to transcend the data – that is reserved to help us make sense of something as fickle and variable and contradictory as DATA. So I think we should give our theories the benefit of the doubt and stop rushing to develop theories that are duplicates – or analog maps – of the data. We are not reassembling engines here. Hell – we use GRE scores as criteria in the admission of graduate students. You want to know how well the GRE scores predict performance in graduate school? The Verbal scale – the most predictive scale – accounts for only 16 percent of the variation in academic performance – 16 percent! And yet we continue to look at it.”

MOYER: “And why do you suppose that is?”

EHRENFELS: “Probably because it tells us a little about what kind of person the applicant is. The applicant may have good comprehension and writing skills, but we all know that doesn’t make the difference between a successful and an unsuccessful graduate student. Now if we invented a conformity inventory – with subscales for compliance, sycophancy, acquiescence, obsequiousness, and cadence-and-imitation – then we would really have something. But my point is that sometimes we want to devise scales that measure circumstances we expect to vary or fluctuate. I may not want to measure something stable and internally consistent. I may want a barometer of sorts – and sometimes not even that. I may want a scale on which most people do not score HIGH or LOW most of the time – but when they do, it tells me something about the state they are in. I may want a series of scales that profile a state such that I expect most scale scores to be neither high nor low most of the time. But I expect the topography of the profile – the variation among the scale scores, and which scores are RELATIVELY higher or lower – to tell me something. But these kinds of statistics are just not standard. No one has devised special rules or formats for them, so they would be overlooked. The field as a whole has fallen into such routines – and routines are in and of themselves conducive to biases and prejudices. We preclude a range of possibilities concerning what may be learned and how it may be presented. Both beauty and truth suffer – as well as freedom. But as I mentioned earlier, what we do with items that do not fit into the scale is no different from what we do with researchers who do not live ‘in the fold.’”
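
Two pieces of arithmetic sit behind the GRE exchange above. The "16 percent" figure is a squared correlation: a predictor that correlates r with a criterion accounts for r² of the criterion's variance, so 16 percent corresponds to r = .40. And the claim that x "can only correlate with y, z, and w to the extent that it correlates with itself" matches the classical test theory bound that an observed correlation cannot exceed the square root of the product of the two measures' reliabilities. The helper names below are hypothetical, and the sketch treats alpha as the reliability estimate; the ceilings it prints are illustrative, not figures for the GRE or any real scale.

```python
import math


def variance_explained(r):
    """Share of criterion variance accounted for by a predictor correlating r with it."""
    return r ** 2


def correlation_from_variance(share):
    """Inverse of variance_explained, taking the positive root."""
    return math.sqrt(share)


def max_observed_correlation(reliability_x, reliability_y=1.0):
    """Classical test theory ceiling: |r_xy| <= sqrt(r_xx * r_yy)."""
    return math.sqrt(reliability_x * reliability_y)


# 16 percent of the variance corresponds to a correlation of about .40.
r_verbal = correlation_from_variance(0.16)

# Ceilings for scales at the alpha cut-offs mentioned in the interview,
# assuming (hypothetically) a perfectly reliable criterion.
ceilings = {alpha: max_observed_correlation(alpha) for alpha in (0.70, 0.80, 0.90)}
```

Under these assumptions, a scale with alpha = .70 could still correlate up to about .84 with a criterion, and one with alpha = .90 up to about .95.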

MOYER: “You mentioned that item analysis is also used in grading exams.”

EHRENFELS: “Professors use item analysis to hunt for multiple-choice questions that students who score poorly on the exam as a whole answer correctly as often as, or more often than, students who score well. Items that poor scorers get right more often than good scorers are said to be ‘negatively discriminating’ – in effect, they discriminate against good students – and they are discarded. Now I agree that item analysis has some limited benefits here. I would like to use item analysis to make sure I didn’t accidentally key in the wrong answer for a multiple-choice item. And I may even double-check the item to see if it was confusing or ambiguous. But if I check the item – and it seems fine to me on the surface – and the answer is keyed in correctly, I will not discard it regardless of the item coefficient. Poor students or not, they are STILL students. And I will give them what they earned. And let’s face it – sometimes good students – especially these hyper-memorizing, over-achieving types – ”

MOYER: “Careerists?”

EHRENFELS: “Perhaps. Sometimes they study in ways that are conducive not to learning but to performing well on multiple-choice tests. Sometimes a question comes along that discriminates against the bullshit memorization, artificial achievement, and pseudo-understanding. I will not punish the rest of the students for this. But the professors DO this because the technique itself is scientific and exacting, and because over time it gives them a collection of the best test items. Some of the professors archive this data, thinking they are creating the perfect test or test bank. Some of them do this with an eye to publishing their test one day – that is, if they are not already using a test bank developed for the textbook by the publisher or the author’s graduate assistants. Some of these items can be bad too, despite the research. But hell – it’s easy.”
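
The exam version of item analysis is often taught as an upper-lower discrimination index: rank students by total score, take a top group and a bottom group (27 percent of the class each is a common convention), and subtract the proportion of the bottom group answering the item correctly from the proportion of the top group. A value near zero means the item does not separate good from poor scorers; a negative value means the poor scorers did better on it. The sketch below assumes that convention and uses invented scores; some item-analysis reports use a point-biserial correlation instead, which serves the same purpose.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """Upper-lower discrimination index D = p(upper) - p(lower).

    total_scores: each student's total exam score.
    item_correct: 1 if that student answered this item correctly, else 0.
    fraction: share of the class placed in each of the upper and lower groups.
    Ties in total score are broken by list order, which is a simplification.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))  # size of each group
    ranked = sorted(range(n), key=lambda s: total_scores[s])  # lowest first
    lower, upper = ranked[:k], ranked[-k:]
    p_upper = sum(item_correct[s] for s in upper) / k
    p_lower = sum(item_correct[s] for s in lower) / k
    return p_upper - p_lower


# Hypothetical totals for ten students and their answers to three items.
totals = [95, 90, 88, 80, 75, 70, 65, 60, 55, 50]
item_a = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]  # mostly the strong students get it right
item_b = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]  # weaker students do better: negative D
item_c = [1, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # top and bottom groups do equally well

d_values = {name: discrimination_index(totals, item)
            for name, item in [("a", item_a), ("b", item_b), ("c", item_c)]}
```

Here item a scores 1.0, item b about -0.67, and item c exactly 0. Under the practice Ehrenfels criticizes, item b would be discarded outright; he would first check its answer key and wording.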

MOYER: “So some professors don’t even make up their own questions.”

EHRENFELS: “Most don’t. And why should they? They don’t design their own lectures. Those are designed by the textbook and supplemental teacher manuals – and some of the lectures may be delivered by graduate teaching assistants. So why not use the test bank that accompanies these materials?”

MOYER: “I bet you’ll tell me why.”

EHRENFELS: “Well, I’m getting a little off subject here, but I would say that it is lazy and anti-intellectual. I would say it leads to this ONE monolithic view of the field. I bet I could convince some professors of this – but I don’t think they would care. Professors don’t value their General Psychology courses because they are ‘General’ or ‘Intro’ Psychology. Professors want to teach courses in the material in which they specialize, and they want to teach these courses to ADVANCED students – not Psych 101 – with which professors are unfamiliar – to a bunch of college freshmen, most of whom may not even be Psychology majors. But I will deal with this more in our interview on teaching. What I am talking about NOW is what professors are willing to do to bring teaching, grading, and test-taking under the rubric of science, professionalism, and research. Some of the items they would discard are not even badly discriminating. In other words, they do not just throw out items with extremely negative coefficients, but also items with mildly negative coefficients, and some even discard items with mildly POSITIVE coefficients in search of that perfect exam and that perfect bell curve. Sometimes I think they were conditioned to see beauty in that normal bell-curve shape. Now this practice in and of itself is not that consequential. That is why it has escaped everyone’s notice. But it is symptomatic of some of their more consequential choices – and of a consequential PATTERN of behaviors which, taken collectively, introduces a credentialism that favors careerists and, at more advanced levels of education, discriminates against the true scholars. They really are creating a race of super-scientists and administrators. And what you really end up doing is narrowing the range of skills tapped – or narrowing the range of tapped skills that are reported – such that you end up with this yardstick that measures JUST ONE THING.
And if you are not in the top x percent of this ONE skill or quality – your odds of making it are very small. We really don’t pay much attention to the fact – and I imagine this flaw dogs every field to a certain extent – that there are people who aspire to be members of the field who are bright and creative – probably brighter and more creative than most in the field – who never really get the chance to make it in. They are weeded out at some point without a fair hearing. I think the half of the public that DOES see this ACCEPTS it as the work of that chance component that is part of life. I am here to put quite a different face on that chance element – to tell you that it really ISN’T chance at all – but the work of something very systematic you may not see.”