Cling to the main vine, not the loose one.
Kei hopu tōu ringa ki te aka tāepa, engari kia mau ki te aka matua
Thoughts on Teaching and Learning of Mathematics
Lesson #6 • Revised 11/1/20
Probability and Statistics
- to connect probability with all parts of mathematics
- to know how statistics makes sense of very difficult problems
- to understand "random" as a concept
Ahh... chance, perchance! What a wonderfully misleading, often abused but valuable concept. Einstein said "God does not play with dice!", challenging claims that "quantum chance" rules the world, which it turns out it does. Currently accepted quantum theories abound with probabilistic ideas and deductions, suggesting God might well have "played with dice". Experimental observation supports the randomness proposed within the theory.
Here is the integral, over all space and at any time, of the probability density of a particle in one dimension (the squared magnitude of its wavefunction):
\[ \int_{-\infty}^{\infty} |\psi(x, t)|^2 \, dx = 1 \]
I admire the simplicity and complexity in this statement: how something so obvious as "it has to be somewhere" is captured by something so complex, and all to do with probability. Simple in concept, complex in sense and form.
The Deep Understanding of Probability
Being enabled to measure and use random events to make sense of the world around us and to solve problems.
Probability is a natural part of our language and the concepts are components of all human endeavour. We use expressions like "on the balance of probability", "beyond all reasonable doubt", "possible", "probable", "almost certain", "always", "never", "in all likelihood", "hardly ever" and "sometimes", to name just a few. The list is large, and when people are asked what numeric probability they would assign to each of these expressions the variation is wide. I asked a court judge what probability he might assign to "beyond all reasonable doubt". His answer was anything over 50%. That frightened me. I had my mind set around 80% or more.
Task 1 for Years 2 to Year 100
Make a list of all the probability terms you can think of, encounter or remember, and then order them from least to most likely. Try to assign a number to describe each probability. Compare. Warning! There will be disagreement. I have a task for Years 7 to 10 prepared and ready for use on the nzmaths.co.nz website at https://nzmaths.co.nz/resource/number-probability-1
For young people, playing with dice, gaming, and learning the language of probability is vital. Roll the dice! Play cards. Play Monopoly. Normalise and use "Paper, Scissors, Rock" to sort issues, and understand the unpredictability of randomness. Destroy the misconception that if I throw everything but a 6 on 20 or so throws, the next must be a six. With randomness, we always underestimate the long run.
Task 2
Marion Steele showed me this experiment with great glee: "Imagine you are a coin and write down the result of throwing yourself 100 times, ten rows of ten outcomes. Now do the experiment and record your outcomes of tossing a coin 100 times. Use two identical pieces of paper and do not mark which is which. I will come around and tell you which is your guess and which is your coin toss result." It is very easy, and fast, to tell the difference between experiment and our perception of what happens. I challenge you to repeat this task, see how obvious the difference is, and be a "mindreader" in your math class. This task connects to the development of "relationship" and "students not realising they are learning". It also builds your mathematics mystique and makes students curious.
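One commonly cited way to tell the two sheets apart is the length of the longest streak: imagined sequences tend to switch too often. The sketch below assumes that is the feature being used (the task above does not say) and simulates many sheets of 100 fair tosses to see how long the longest streak typically is. The function names are illustrative only.

```python
import random

def longest_run(tosses):
    """Return the length of the longest streak of identical outcomes."""
    if not tosses:
        return 0
    best = current = 1
    for previous, this in zip(tosses, tosses[1:]):
        current = current + 1 if this == previous else 1
        best = max(best, current)
    return best

def simulate_longest_runs(n_tosses=100, n_trials=2000, seed=1):
    """Simulate n_trials sheets of n_tosses fair tosses; return each sheet's longest run."""
    rng = random.Random(seed)
    return [
        longest_run([rng.choice("HT") for _ in range(n_tosses)])
        for _ in range(n_trials)
    ]

runs = simulate_longest_runs()
average_longest_run = sum(runs) / len(runs)
share_with_run_of_5 = sum(r >= 5 for r in runs) / len(runs)
print(f"Average longest run in 100 tosses: {average_longest_run:.1f}")
print(f"Share of sheets with a run of 5 or more: {share_with_run_of_5:.0%}")
```

Students can run `longest_run` on both of their own sheets and compare the two numbers with the simulated ones.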
We need to learn to use probability to select a "random sample" and know why this is a good way to sample. We must learn to use probabilistic language. We need to know what number might be assigned to "no chance" and "will always happen". We need to understand and sense what is a "good chance" and what is "nah mate!"
Task 3
To establish meaning for 1/6 and 1/10, and how different these two probabilities are, use a six-sided die (plural dice) and a 10-sided 0-9 die. The task for students is to guess a number and then try to toss that number. It happens quite often on a six-sided die but is decidedly more difficult on the 10-sided one. I made up a sheet for my local GP to use with heart-risk patients who have little understanding of the improvement in outcomes when moving from a 1/6 (17%) risk to a 1/10 (10%) risk. It does not sound like much, but I know which group I want to be in. This is a valuable exercise; try it. Add a coin for 1/2.
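The guess-then-toss task can also be simulated once students have done it by hand. This is a minimal sketch, assuming fair dice and a guess chosen at random each round; `hit_rate` is a made-up helper name.

```python
import random

def hit_rate(sides, n_rounds=10_000, seed=2):
    """Guess a face, roll a fair die with `sides` faces, and return the share of correct guesses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rounds):
        guess = rng.randrange(sides)
        roll = rng.randrange(sides)
        hits += guess == roll
    return hits / n_rounds

for name, sides in [("coin", 2), ("six-sided die", 6), ("ten-sided die", 10)]:
    print(f"{name}: correct in {hit_rate(sides):.1%} of rounds (theory {1 / sides:.1%})")
```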
Interwoven in the ideas underpinning probability are proportional thinking and fractions. These two are our most difficult early concepts and it is wise to stick to language development and playing games until fraction knowledge is secure. We do use fractions to represent probability and we operate on these fractions in the normal ways, but every probability lies between 0 and 1, unlike fractions in general. Throwing a die for a two is one chance out of 6 and is represented by 1/6. Is the 1/6 a fraction? If I throw the die again, does that change my chances of getting a two to 2/6 or something else? If I throw the die 6 times, does that mean I am guaranteed a two, or 6/6? Does 7/6 have a meaning in probability?
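The questions above can be checked with exact fractions. The sketch assumes a fair die and independent throws, and works through the complement: the chance of at least one two is 1 minus the chance of no twos at all.

```python
from fractions import Fraction

def p_at_least_one(p_single, n_throws):
    """Probability of at least one success in n independent throws."""
    return 1 - (1 - p_single) ** n_throws

p_two = Fraction(1, 6)
for n in (1, 2, 6, 7):
    p = p_at_least_one(p_two, n)
    print(f"{n} throws: P(at least one two) = {p} = {float(p):.3f}")
```

Two throws give 11/36 rather than 2/6, and six throws give about 0.665 rather than a guarantee, so the probability never reaches 1, let alone 7/6.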
Task 4
The PPDAC Statistical Inquiry Cycle, or "be a Data Detective", applies to probability, of course and always. Do not delve without using this cycle.
P - Ask a Question - "I wonder if I can test a coin to see if it is fair?"
P - Make a Plan - I will toss the coin 200 times and make a bar graph of the results. If it is fair it will be even, or symmetrical, with the two bars about the same size.
D - Design and do the experiment - I toss the coin 200 times recording all outcomes.
A - Analyse the outcomes - I draw a bar graph of the results. I also put the results into iNZight and get significance bars. That is interesting: the size of each bar varies with the size of the sample.
C - Make sense and a conclusion - This experiment suggests the coin appears to be fair. The two bars are almost the same size (Heads 95, Tails 105) and the iNZight error bars overlap, so no difference can be supported. The coin appears to be fair...(answer the question!).
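The same conclusion can be sketched numerically. This example does not reproduce how iNZight draws its bars; it swaps in the standard normal-approximation 95% interval for a proportion, which is an assumption, and applies it to the 95 heads in 200 tosses reported above.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation, an assumed method)."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

heads, tosses = 95, 200
low, high = proportion_interval(heads, tosses)
print(f"Proportion of heads: {heads / tosses:.3f}")
print(f"Approximate 95% interval: [{low:.3f}, {high:.3f}]")
if low <= 0.5 <= high:
    print("0.5 is inside the interval: the data are consistent with a fair coin.")
else:
    print("0.5 is outside the interval: the data suggest the coin is not fair.")
```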
Task 5
Try figuring out the probability of randomly breaking a stick in two places and being able to make a triangle with the three pieces. The answer is not obvious, and it is a great problem for statistics and randomness to answer. One of the best!
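A simulation is one way to attack the stick problem before reasoning about it. The sketch below assumes one particular breaking rule, two break points chosen independently and uniformly along a stick of length 1; other rules (such as breaking once and then breaking the longer piece) give different answers.

```python
import random

def can_make_triangle(a, b, c):
    """Three lengths form a triangle when the longest is shorter than the other two combined."""
    longest = max(a, b, c)
    return longest < (a + b + c) - longest

def estimate_triangle_probability(n_trials=100_000, seed=3):
    """Estimate the chance that three pieces of a randomly broken unit stick form a triangle."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        cut1, cut2 = sorted((rng.random(), rng.random()))
        pieces = (cut1, cut2 - cut1, 1 - cut2)
        successes += can_make_triangle(*pieces)
    return successes / n_trials

estimate = estimate_triangle_probability()
print(f"Estimated probability of a triangle: {estimate:.3f}")
```

Students can compare the estimate with their own reasoning, and rerun it with different seeds to see how much it varies.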
What is the likelihood of Kane Williamson being bowled out for a duck in the next cricket match? What is the likelihood of President Trump seeing out his first presidential term? Well, he made it to 2019 despite losing a few decisions and is now in a rut demanding a wall with Mexico. I wait! I continue to wait! It appears he will make it to the end of the first term, impeachment or not, but will he get a second term?
Playing games such as Yahtzee (or my variation, Numahtzee), Monopoly, Battleships, dice games, Blackjack and Paper-Scissors-Rock is essential "hands-on" learning. The folly of gambling can be learned at this level. More advanced students can simulate Lotto draws and make up sample tickets using Excel to learn how ridiculous the chance of winning anything in Lotto can be. The experience of games is very useful.
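The Lotto simulation mentioned above could equally be done in a short script instead of a spreadsheet. This sketch assumes a draw of 6 numbers from 40 with no bonus ball or extra divisions; check the rules of the actual game before using it in class.

```python
import math
import random

POOL_SIZE = 40       # assumption: numbers 1 to 40 in the draw
NUMBERS_PICKED = 6   # assumption: 6 numbers per ticket and per draw

def jackpot_odds(pool=POOL_SIZE, picked=NUMBERS_PICKED):
    """Number of equally likely tickets, so the jackpot chance is 1 in this many."""
    return math.comb(pool, picked)

def match_counts(n_tickets, pool=POOL_SIZE, picked=NUMBERS_PICKED, seed=4):
    """Buy n_tickets random tickets for one draw; count tickets by numbers matched."""
    rng = random.Random(seed)
    winning = set(rng.sample(range(1, pool + 1), picked))
    counts = [0] * (picked + 1)
    for _ in range(n_tickets):
        ticket = set(rng.sample(range(1, pool + 1), picked))
        counts[len(ticket & winning)] += 1
    return counts

print(f"Jackpot chance: 1 in {jackpot_odds():,}")
for matched, count in enumerate(match_counts(10_000)):
    print(f"{matched} numbers matched: {count} of 10,000 tickets")
```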
The obvious is not always apparent in probability, and a wide experience of problems is important to get a feel for this bizarre and highly variable idea. In the "Breaking a Stick" problem mentioned above the geometrical solution is easiest to comprehend. The Birthday Problem, "How many people do you need in a group to have a 50% or better chance that two of the group share a common birthday?", is easier to answer when you ask the opposite question. How likely I am to win the 2019 Match Play Golf Competition with the 9 Hole Men's Group is an even more obscure problem, despite me winning it in 2018. [I was beaten in the semi final! But I did win the Stroke Play Comp.] Probability infests all problems!
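Asking the opposite question in the Birthday Problem means working out the chance that everyone has a different birthday and subtracting it from 1. A minimal sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
def p_shared_birthday(group_size, days=365):
    """Chance that at least two people in the group share a birthday."""
    p_all_different = 1.0
    for k in range(group_size):
        p_all_different *= (days - k) / days
    return 1 - p_all_different

def smallest_group(target=0.5, days=365):
    """Smallest group size whose chance of a shared birthday reaches the target."""
    size = 1
    while p_shared_birthday(size, days) < target:
        size += 1
    return size

for size in (10, 20, 23, 30, 50):
    print(f"{size} people: {p_shared_birthday(size):.1%}")
print(f"Smallest group for a 50% or better chance: {smallest_group()}")
```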
The Deep Understanding of Statistics
Being enabled to collect and use data to make sense of the world around us and to solve problems.
Statistics has undergone major change in NZ and I acknowledge the NZ Statistics Association for their collective wisdom, guidance and hard work over quite a long time. It took many meetings and emails by many people to settle on ways of re-inventing statistics for the modern world during the 1998 to 2008 period when the new NZC was being written. John Tukey influenced the future with his work and created the dot plot and the box and whisker diagram now so common in NZ schools. It is curious that the one area of mathematics where NZQA Moderators of assessment decisions and material still report misunderstanding is statistics, usually with questioning, variation, interpretation and sample size, and with repeating and contradicting oneself or 'rabbiting' off on a tangent of nonsense.
The computer has brought about a revolution in the management and analysis of data and it is up to us to be good interpreters and make valid conclusions. Chris Wild's team at Auckland University created the Data Explorer on the Census@Schools website and the easy-to-use iNZight statistics software. The computer, data storage, data collection and Google have changed the way we perceive the world. Even the Russians appear to have influenced the Trump election and again, I wait to see what happens.
The Statistical Inquiry Cycle, now called PPDAC, guides or should guide all investigations in statistics and probability. The past fascination with mean, median and mode has become "measures of middle" and is gone! The mean is a curious mathematical concept. An average includes mean, mode and median. Now "the eyes have it" and the "shape, middle, spread and oddities" of dot plots and box and whisker graphs rule the reasoning. Add "WWW" for each of these four features, as in "What am I talking about", "Where is it" and "What does it mean", and we have a worthy guide to becoming a Data Detective.
Statistical Questions. Pip Arnold completed a PhD on the topic of statistical questions. Get the question right and you will be much more likely to be able to answer it. One of the three types of stats questions is "comparative". I use the frame of "Variable"; "2 Groups"; "> or <"; "Population" in explaining and guiding students to a good question. Pip was on the button with her research. Without a clear, defined and answerable question anyone is pretty much in the dark in Statistics.
Task
In the question "I wonder if in the C@S database NZ 2019 Year 9 boys carry heavier bags than Year 9 girls?" identify all parts using my frame.
PPDAC. This is actually a version of the inquiry cycle commonly used in science. P = Problem, P = Plan, D = Data, A = Analysis and C = Conclusion.
A statistical inquiry begins with a query or Problem. "I wonder if in the C@S database NZ 2019 Year 9 boys carry heavier bags than Year 9 girls?" I might wonder this because I have a heavy bag usually as a Year 9 student and I see that most boys are the same. There might be a case here for getting some storage at school for some of my gear!
My Plan is to go to the last Census@Schools website survey and get random sample Data of 30 boys and 30 girls and the bag weights they reported. Then I will make a dot plot of these two groups and a box and whisker graph using the Data Explorer. Thirty (30) worked when I used this in my class work. My results are:-
The Analysis
These comments apply to different Year levels and show the difference in thinking at these levels.
Year 6/7/8 - The boys' box is more to the right (or higher) than the girls' so this suggests that typically boys do carry heavier bags.
Year 9/10 - Boys' bags are typically heavier since half of the boys' bags are heavier than 3/4 of the girls' bags. The boys' median is 5 kg and the weights spread upwards to over 9 kg. The girls' upper quartile is about 4.5 kg and the weights spread lower, or lighter, to about 1 kg.
Year 11 - The gap between the medians is large compared to the overall spread of the two boxes. The difference in the medians is 5 - 3.2 = 1.8 kg and the overall middle spread is 5.5 - 2.2 = 3.3 kg. This means the higher boys' median is likely to be reflected in the whole population so we can say "Yes; boys' bags are likely to be heavier than girls' bags".
Year 12 becomes more technical again as use of the informal interval median ± 1.5 × IQR/√n is established. The results here support the claim that boys' bags are heavier.
Year 13 is about "bootstrapping" and testing for difference. Again this analysis supports the claim.
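The Year 11 and Year 12 comparisons can be sketched as two small calculations. The medians (5 and 3.2 kg), the girls' upper quartile (4.5 kg) and the overall spread limits (2.2 and 5.5 kg) come from the comments above; reading 2.2 kg as the girls' lower quartile and 5.5 kg as the boys' upper quartile is an assumption, and the boys' lower quartile of 3.8 kg is purely hypothetical because it is not reported. The 1/3 threshold for samples of about 30 is a commonly taught guideline and is also treated as an assumption here.

```python
import math

def median_gap_ratio(median_a, median_b, lowest_lq, highest_uq):
    """Year 11 idea: gap between medians as a share of the overall visible spread."""
    return abs(median_a - median_b) / (highest_uq - lowest_lq)

def informal_interval(median, lq, uq, n):
    """Year 12 idea: median plus or minus 1.5 * IQR / sqrt(n)."""
    margin = 1.5 * (uq - lq) / math.sqrt(n)
    return median - margin, median + margin

ratio = median_gap_ratio(5.0, 3.2, lowest_lq=2.2, highest_uq=5.5)
print(f"Gap / overall visible spread = {ratio:.2f} (guideline for n of about 30: more than 1/3)")

girls = informal_interval(median=3.2, lq=2.2, uq=4.5, n=30)
boys = informal_interval(median=5.0, lq=3.8, uq=5.5, n=30)  # lq=3.8 is hypothetical
print(f"Girls: [{girls[0]:.2f}, {girls[1]:.2f}] kg")
print(f"Boys:  [{boys[0]:.2f}, {boys[1]:.2f}] kg")
print("Intervals overlap" if girls[1] >= boys[0] else "Intervals do not overlap")
```

With these inputs the two intervals do not overlap, which is the pattern the Year 12 comment describes; real sample quartiles should replace the assumed ones.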
In statistics "The Eyes have it!"
http://new.censusatschool.org.nz/wp-content/uploads/2009/10/Informal-statistical-inference-revisited-slides.pdf
Means, medians and modes as measures of the middle have been replaced by the concept of "typical". The Census@Schools website is the go-to site for all information. The "dot plot" and the "box and whisker plot" of sampled data are everything for Y7 to 11. The concepts of survey and sampling are prioritised.
iNZight software is found by asking the internet for "iNZight software". Other online software such as NZGrapher works on iPads and cellphones. iNZight Lite is a smart 21st-century software solution.
WARNING! Do not let younger students (Year 6, 7, 8 or even 9 and 10) make use of software to make dot plots and box and whisker graphs before they can tell you how to make these things by hand and explain all features. Show them too soon and they will never draw one again and will miss out on that "learn by doing" experience. "Learning travels up your arms" is a view I have shared with many students. Learn by doing!
Students should not be introduced to the Box and Whisker until they are multiplicative thinkers and appreciate that 1/4 of the data populates each part, despite the whiskers and the two boxes of the middle 50% being different sizes. The Box and Whisker is a brilliant tool for displaying information and comparing information. It is loaded with deep understandings and connections. In the diagram are LQ, Middle 50%, median, UQ, IQR, Range, Min and Max, and with the Dot Plot we add Shape, Spread, Middle and Oddities. These "simple" graphs contain everything except the context!
Here is a diagnostic test to help know if students are ready to move on.
TASK
Which section A, B, C or D of this box and whisker has the most data?
Everyone should study statistics for as long as they can, and all pathways for all students from school should include Year 12 Statistics knowledge and skills. Today's world has a data infestation and "making sense of data" is now more vital than ever. Computers can deal with millions of data points, and every time you log on to the internet or use a credit card you are being measured and monitored. Know what is happening. Go to the GeoNet website or the EBOP buoy website to see how much data is being collected for scientific purposes. How much data is NASA collecting from space probes? The answer is a vast amount every day. Our task is to "make sense of the data."
Students who choose the largest, D, or even A or even B, are of course all incorrect, but they do so because larger is bigger and more. This is the world of the additive thinker: "What happened yesterday will happen tomorrow", "More is better", "Bigger is better", "One thing dominates thinking and connections do not exist past simple obvious links".
Students need to experience and make meaning for themselves to get to the correct answer of "All sections have the same number of data! Reason - In this case there are 28 dots so each section has 7 and this can be seen. Section A is Minimum to LQ or 25%, Section B is LQ to Median or 50%, Section C is Median to UQ or 75% and Section D is UQ to Maximum. (Clear connections showing Multiplicative and Proportional thinking.) The IQR is the width of the Box and is a measure of spread. This is a trick question! (Critical thinking.)" You would be very happy for a student to present this explanation!
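The point of the diagnostic question can be shown with a small data set. The 28 bag weights below are hypothetical, invented only so that the sections have very different widths; the sketch sorts them, splits them into four equal groups, and reports how many values and how much width each section covers.

```python
# Hypothetical bag weights in kg (28 values, made up for illustration).
bag_weights = [
    1.0, 1.2, 1.5, 1.6, 1.8, 2.0, 2.1,
    2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9,
    3.0, 3.2, 3.3, 3.5, 3.6, 3.8, 4.0,
    4.3, 4.8, 5.5, 6.0, 7.2, 8.1, 9.4,
]

def split_into_quarters(values):
    """Sort the data and cut it into four equal groups (length must divide by 4)."""
    ordered = sorted(values)
    size = len(ordered) // 4
    return [ordered[i * size:(i + 1) * size] for i in range(4)]

quarters = split_into_quarters(bag_weights)
for label, section in zip("ABCD", quarters):
    width = section[-1] - section[0]
    print(f"Section {label}: {len(section)} values, from {section[0]} to {section[-1]} kg (width {width:.1f} kg)")
```

Every section holds 7 values even though section D is many times wider than section B.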
A good grasp of the language of statistics is vital. I describe statistical language as "floppy" as it includes phrases like "tends to" and "the data suggests" rather than the exactness of mathematics like "x = 3". It is better to give an interval such as "the length of the pipe is 3.4 ± 0.1 cm". We cannot say that the mean weight of a trout in Lake Taupo is 2.34 kg because that is a population value that we can never know exactly. We can only use a sample to get an interval in which the mean or median of the population is likely to lie. For trout in Lake Taupo this might be [2.31, 2.36] kg, meaning there is a pretty good chance that this interval contains the actual value.
Statistics is the place for literacy development in mathematics using words. In Mathematics we also have literacy but it is better described as subject specific literacy. Setting out a proof in a logical way so others can read and comprehend the reasoning is clever subject specific literacy. Solving an equation in a problem in a logical way likewise. Constructing a pentagon using a compass and ruler is also literacy in mathematics. Literacy in mathematics looks most like literacy in another subject when we have word lists and essays on math topics.
Statistics, however, requires a strong writing and reading ability. It is a great place for STEM students to develop the skills of writing and reading, and speaking and listening. It is a bit meaningless to say the mean weight of trout in Lake Taupo is 1.24 kg. This says nothing of the population except that one statistic, the mean. It is better to use the word typical and add an interval to get "The typical trout in Lake Taupo is between 1.1 kg and 1.75 kg. This is the spread of the middle 50% or IQR, the Inter-quartile Range. There are smaller and larger fish."
Random
Statistics and probability walk hand in hand with the concept of randomness. Why is a random sample better? Why do we use a random sample? Are there other ways?
Random sampling causes all those questions we did not ask of the data to be equally likely to turn up in any sample, and so minimises bias and false inference. This is hardly ever explained. Take the example of asking a group of male and female students how many text messages they sent the previous day, to try and answer the question "Do female students send more texts than male students?" Notice we did not ask a few other questions, such as:
- is this your first phone?
- is this a new phone?
- have you got a new partner?
- is your Mum ill?
- has your friend just won Lotto?
- and about a million or so other questions that I have thought of and the million or so other questions that I did not even think of!
By randomly selecting students we also spread all these questions and answers randomly across all the data, so that when we select the two groups of Male and Female the question that we did ask is not biased. The comparative inference we then make has more validity.
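The idea that random selection spreads the unasked questions evenly can be sketched with an invented population. Everything here is hypothetical: 2000 students, a hidden "new phone" attribute that about 20% of them have, and a random sample of 200. The sketch checks whether the hidden attribute turns up at a similar rate in the sample and in each gender group.

```python
import random

def build_population(size=2000, seed=5):
    """Hypothetical students with a gender and a hidden attribute nobody asked about."""
    rng = random.Random(seed)
    return [
        {"gender": rng.choice(["female", "male"]), "new_phone": rng.random() < 0.2}
        for _ in range(size)
    ]

def share_with_new_phone(students):
    """Share of the given students who have the hidden attribute."""
    return sum(s["new_phone"] for s in students) / len(students)

population = build_population()
sample = random.Random(6).sample(population, 200)

print(f"Population: {share_with_new_phone(population):.1%} have a new phone")
print(f"Random sample: {share_with_new_phone(sample):.1%}")
for gender in ("female", "male"):
    group = [s for s in sample if s["gender"] == gender]
    print(f"  {gender} students in sample ({len(group)}): {share_with_new_phone(group):.1%}")
```

The shares will not match exactly, but no group is systematically loaded with new-phone owners, which is what a hand-picked sample cannot promise.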
Sample Size
Here is another issue, and usually teachers just tell students "Select a sample of size 30". Good heavens and good gracious and OMG - WHY? Telling destroys learning! It prevents students from making their own meaning. Yes, it might be faster, and teachers will always cry "I do not have time to explore this idea and wait for students to make sense of it all". I respond... "You do not have time not to allow the students to explore and make sense of these deep and meaningful understandings. This is the real learning!" Like all things, there is a time to tell, but it ain't here.
During 2016 I ran a study group of several senior Statistics teachers and coached them to make inquiries into the Teaching and Learning of Statistics. We were learning how to "Inquire", among other things. Not long after the group formed I became aware that no one could actually explain "How big should a sample be?" and provide a cool learning experience and reasons. The "Sample Size" PowerPoint (Item 2) is what I developed as a result, and I have used it many times since. It is on the C@S website now. I used the iNZight software to select samples of different sizes and measured the computer screen to get a spread estimate. This was graphed and modelled. The task is a great student learning experience; students should all take the time to do it and then try to answer the question of how big a sample should be.
There are a couple of new learnings that came out of this for me as well. I was reminded of the old adage by statisticians that "quadrupling the sample size halves the spread" (the spread is proportional to 1/√n). I also noticed that computers can handle large amounts of data easily, now making taking a sample a bit irrelevant.
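The adage about quadrupling the sample size can be explored with a simulation instead of a ruler on the screen. This sketch is not the PowerPoint task itself: it assumes a hypothetical population of 10,000 bag weights, draws many random samples of each size, and uses the standard deviation of the sample medians as the measure of spread.

```python
import random
import statistics

def spread_of_sample_medians(population, sample_size, n_samples=1000, seed=7):
    """Draw many random samples and return how much their medians vary (standard deviation)."""
    rng = random.Random(seed)
    medians = [
        statistics.median(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(medians)

# Hypothetical population: 10,000 bag weights centred on 3.5 kg.
population_rng = random.Random(8)
bag_population = [population_rng.gauss(3.5, 1.5) for _ in range(10_000)]

spreads = {n: spread_of_sample_medians(bag_population, n) for n in (30, 120, 480)}
for n, spread in spreads.items():
    print(f"Sample size {n:>3}: spread of sample medians = {spread:.3f} kg")
print(f"Ratio 30 to 120: {spreads[30] / spreads[120]:.2f}")
print(f"Ratio 120 to 480: {spreads[120] / spreads[480]:.2f}")
```

Each ratio should come out close to 2, and students can then argue about the point at which a bigger sample stops being worth the effort.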
Is Statistics Mathematics?
Yes. Statistics deals with the floppy questions our world presents. It determines standard error and ±limits and why. It allows Google to make better software and retailers to target products to consumers. "Making sense of data" is the primary mantra of Statistics. The reason I said "yes" to this question is that mathematics is inside statistics doing the median, LQ, IQR calculations, running the software and drawing the graphs. It runs the internet and data gathering systems, the telemetry systems, the sensors and satellites. Mathematics runs the atomic clock that can now measure the daily variation in the Earth's rotation rate! It is why our cellphones work! Mathematics is deep within Statistics, just like all the STEM subjects.
Hence Lesson #6
Making sense of data is statistics. The PPDAC cycle is vital, with hands-on sampling to learn about variation and what you can say about a population. This is an excellent subject to help students learn to write to convey meaning. Making sense of probability and using random, chance and outcomes is just as vital. Both areas of knowledge, Statistics and Probability, applied to problems become the learning.
Teacher TASK
What key competency opportunities present themselves for learning probability and statistics?
How are these part of your main vine of learning?
List all the probability words you can and order these from never to always.
Investigate and master one computer based stats programme, such as iNZight or NZGrapher. There are others.
Look up Taupo Trout Resources on my website and use these in junior programmes. Find Kiwi Kapers on Census at Schools.
What are the three types of statistical questions?