di: Hattie's comparison of effect sizes

Bonnie Grossen bgrossen at uoregon.edu
Wed Oct 23 17:59:03 PDT 2019


I did ask someone who is an expert in statistics. She said that what Hattie did was “quite disturbing” and “got her blood pressure up.”
Thank you, Bill, for sending along Hattie’s paragraph from Visible Learning that supports DI. I know Hattie has been supportive of DI, but his list of “influences” certainly doesn’t help the DI advocates, especially in Australia. I wonder how he reached that conclusion on page 258 when the data he reports in his rankings seem to undermine DI. Even if he describes some caveats, the list of influences is the easy thing people are grabbing onto; they are not reading the details on page 258.
I don’t have the book. I could only read the first two chapters online, which described the methodology. I was amazed to see the list of high to low effect sizes, and where DI ranked in the list—not very high. It is very misleading to develop methods for calculating d from a correlation and from a pre- to posttest difference, and then throw those results in with the traditional effect size that compares outcomes across groups.
So I have written to Hattie, saying that he needs to develop three lists, one for each measure of effect size: treatment comparison, correlation, and pre- to posttest. Then at least we’re not comparing apples to elephants. If people find an effect size for a correlation interesting, then they can have at it. 

Sent from Mail for Windows 10

From: Bill Sower
Sent: Wednesday, October 23, 2019 5:48 AM
To: Bonnie Grossen
Cc: shiraz1 at iprimus.com.au; DI at lists.uoregon.edu
Subject: Re: di: Hattie's comparison of effect sizes

This is very interesting. I look forward to any discussion, but I leave the statistical analysis to those who are much smarter than I am. I just want to quote Hattie from the end of Visible Learning (page 258):

“Perhaps the most famous example of policy makers not using or being convinced by evidence was Project Follow Through, which started in the late 1960s. It was conducted over 10 years, involved over 72,000 students, and had more than 22 sponsors who worked in more than 180 sites to find the most effective education innovations to break the cycle of poverty through enhancing student learning. The innovations included Direct Instruction, whole language, open education, and developmentally appropriate practices. The students in these programs were compared to control students. All but one program had close to zero effects (some had negative effects). Only Direct Instruction had positive effects on basic skills, on deeper comprehension measures, on social measures, and on affective measures. Meyer (1984) followed these students through to the end of their schooling, and those in the Direct Instruction compared to peers not in this program were twice as likely to graduate from high school, had higher scores on reading (d=0.43) and mathematics (d=0.28)—significant long-term differences in the Direct Instruction program effects. The outcome of this study, however, was not to support more implementation of Direct Instruction but to spend more resources on the methods that did not work but were preferred by educators. As Carnine (2000) commented, the romantic view of students discovering learning was more powerful than a method invented by a teacher that actually made a difference; a method that required an attention to detail, to deliberately changing behavior, and to teaching specific skills. The rejection of Direct Instruction in favor of Rousseian inspired methods “is a classic case of an immature profession, one that lacks a solid scientific base and has less respect for evidence than for opinion and ideology” (p. 12).”

Bill Sower
Sent from my iPhone

On Oct 21, 2019, at 7:54 PM, Bonnie Grossen <bgrossen at uoregon.edu> wrote:
 
Kerry,
I’m late on this, but my attention just came to Hattie’s Visible Learning where he compares 252 “influences” in education and ranks them by Effect Size. Here’s the link:
https://visible-learning.org/hattie-ranking-influences-effect-sizes-learning-achievement/
 
My DI friends in Australia are chagrined that DI came out relatively low in the stack. I did a little searching and researching and came up with this analysis. I would like to know what you think of it, and if you have anything to add, or see any problems with my thinking.
 
I was able to read the first two chapters of Hattie's book, "Visible Learning: A Synthesis of over 800 Meta-Analyses Relating to Achievement" (2008 or 2011 or 2017, not sure). He makes it clear that he is not using Effect Size to mean the size of the difference between a treatment and a control group. In fact, very few comparison studies are included in his analyses: “The wars as to what counts as evidence for causation are raging as never before. Some have argued that the only legitimate support for causal claims can come from randomized control trials (RCTs, trials in which subjects are allocated to an experimental or a control group according to a strictly random procedure). There are few such studies among the many outlined in this book.”
I would agree that causal conclusions (e.g., that DI causes higher achievement) can only be made from studies comparing the effects of two treatments: the one being studied and a reasonable alternative instructional model.
Hattie describes three types of effect sizes that he has calculated in this massive meta-analysis. He then compares the ESs without regard for type. I find that very misleading rather than informative. Here are the three:
The first two:
“Statistically, an effect size can be calculated in two major ways:
            Effect size = [Mean treatment – Mean control] / SD
Or 
            Effect size = [Mean end of treatment – Mean beginning of treatment] / SD”
So he uses a traditional ES calculation, comparing a treatment group with a control group, AND he calculates an effect size from the difference between pretest and posttest scores, with no comparison group.
An effect size for growth without a comparison group is an entirely different metric. Effect sizes for differences between comparison groups are likely to be much smaller than an ES calculated on growth from pretest to posttest: both groups could grow enormously and still differ only a little from each other. In addition, did he control for time? Instruction over a full year is certainly going to show more growth than a two-week intervention. (Many DI studies are only two weeks long because that is all it took to get a significant difference.) Using an ES for growth to claim that something contributed to that growth is completely unacceptable in the scientific method; growth happens with time. You can't measure the added value of a teaching strategy if all you have is a pretest and a posttest. It’s basic logic.
And you certainly can’t mix effect sizes for group comparisons with effect sizes for simple growth over time (from pre to post) if you want to show the relative power of one teaching model over another.
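To make the difference concrete, here is a quick sketch with invented numbers (they are not from any study Hattie cites); it simply shows how far apart the two formulas can land on the very same classroom:

    import statistics

    # Invented scores, for illustration only.
    treatment_pre  = [20, 22, 25, 27, 30]
    treatment_post = [45, 48, 50, 52, 55]
    control_post   = [42, 44, 47, 49, 53]

    def effect_size(a, b):
        # Standardized mean difference: (mean(a) - mean(b)) / pooled SD.
        va, vb = statistics.variance(a), statistics.variance(b)
        pooled_sd = (((len(a) - 1) * va + (len(b) - 1) * vb) / (len(a) + len(b) - 2)) ** 0.5
        return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

    # "Growth" effect size, pre to post, no comparison group: about 6.5 here.
    print(effect_size(treatment_post, treatment_pre))
    # Treatment-versus-control effect size at posttest: about 0.7 here.
    print(effect_size(treatment_post, control_post))

Both numbers describe the same students, yet on a single ranked list the first looks like a miracle and the second merely respectable.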
Here’s the real digression from scientific thinking. Hattie includes a third way to calculate ES, from correlation studies:
[Image attachment did not come through: Hattie’s example of converting a correlation into an effect size, using the height difference between women and men.]
I don't see the math involved in calculating the ES from a correlation. The example of the different heights of women and men is comparable to the common design of studies of metacognition: the better readers reported using metacognitive strategies more than the poor readers did. This is ridiculous.
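For what it’s worth, the standard textbook conversion from a correlation to d (which I assume is what he is using, since the actual formula didn’t come through in the image) is:

            d = 2r / sqrt(1 – r²)

So a moderate correlation of r = .5 between, say, reported strategy use and achievement becomes d ≈ 1.15, which would sit near the top of his list without any treatment ever having been delivered.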
Even though Hattie agrees that causal claims cannot be made from correlation studies, he still includes mostly correlational studies in his meta-analysis:
“Throughout this book, many correlates will be presented, as most meta-analyses seek such correlates of enhanced student achievement. A major aim is to weave a story from these data that has some convincing power and some coherence, although there is no claim to make these ‘beyond reasonable doubt’.”
If Hattie wants to use three different types of effect sizes, he at least has to put them into three different categories, three different lists. It is completely inappropriate and very unscientific to compare all of these ESs with each other while calling them all “Cohen’s d.”
Hattie received a lot of recognition for his earlier analyses of effect size, comparing the results of many comparison studies. I suspect he came under pressure from folks like Allington, Marzano, and Goodman, the people who are heavily invested in educational nonsense. I know that when I managed to write something that people read and respected, I got a lot of hate mail. Hattie has to be a big boy and not succumb to their constant antagonism. He has to do what’s right.
So, what do you think, Kerry? I only got the first two chapters of his book online. Chapter 3 might provide something important that I am missing. But I don’t see three buckets of ES analysis to match his three definitions of effect size.
Did I miss something? If you agree with me, I’m going to write to him and tell him what I think. I need confirmation because I can’t believe he could be so stupid. Up until now, I thought he was such a clear thinker and admired him. 
Thanks for all the informative postings you do for us on this website.
Warm regards,
Bonnie
_______________________________________________
di mailing list
di at lists.uoregon.edu
https://lists-prod.uoregon.edu/mailman/listinfo/di




