The Practice of Statistics

From this page, students may get class "news" updates, download various Minitab data files for the assignments and labs, and get overheads from the class presentations, weekly assignments, my Minitab manual, and the labs themselves.



News:

We will hold a class on Monday, May 24, and then repeat it the next morning.

The final synthesis assignment is now available! It is due at 9 AM on the last day of classes, Wednesday, June 2.

The math lab (Dickinson 239) is reserved for our use at the following times: Mondays 12:30-2 PM, Wednesdays 10:00-11:30 AM, and Fridays 12:30-2 PM. Come and see me across the hall if the room is locked.

You can buy or rent Minitab Releases 13 or 14 here.


We have a Mac version of Minitab, running on the three Macs in the front "lobby" area of the NMC. Unfortunately, it isn't able to load any data files. If you want to use a Mac, you'll have to print out the data from another computer and type it into Minitab on these Macs. Not ideal, but we won't be working with monstrously large data sets either.


Handouts (in PDF format)

Course syllabus
First mathematics preparation assignment


Lab Manual Online:

My Minitab Mini-Mini Manual is available in two formats: in HTML format, and in Microsoft Word format. The HTML version is a direct, unedited conversion from Word, so formatting may occasionally be awry, page numbers are meaningless, and at the moment the graphic images are missing. On the other hand, it loads faster than the Word version.

Lab Assignments




Data sets from the classes

Old Faithful geyser eruption times

Ruth-Maris-McGwire-Sosa home run data

Car prices, 1991 model year

An apparently normal data set, that isn't

Heights of husbands and wives

Manatee deaths on Florida coast

Edwin Hubble's galaxy expansion measurements

Chesapeake Bay dissolved oxygen readings

Pepto-Cola filling of cans

NutraSweet: Does it lose its sweetness over time?

Is human body temperature really 98.6 degrees?

Revised radon detector readings
(for the originals, click here or see below, Q6.48)

Do hot dogs made of chicken have different sodium levels than those made of beef?

Do directed reading activities improve comprehension?

Does a space-age material help decrease wear on the soles of kids' shoes?

Decreasing stress by the company of a friend or pet

Do music or computer lessons improve spatio-temporal reasoning skills?



Assignments

Questions in bold face are those where the use of Minitab is permitted (usually, encouraged). For the others, please use nothing more powerful than a calculator. Data sets in Minitab format are often provided even for non-Minitab questions, for your benefit in checking your work.

When handing in a solution involving Minitab, please make sure to include printouts of the relevant Minitab output, and clearly label the output that you refer to in your answers, so that I can easily follow your work.

For future assignments, you may work ahead if you like. But I do reserve the right to alter assignments with as-yet-unspecified due dates.


Assignment 1 (due TUESDAY, March 9): 1.18, 1.24, 1.27, 1.47, 1.49, 1.54, 1.74, Glen Q below

Response times (measured in milliseconds) for "hits" at the Web site www.StatisticsDoesntSuck.com are given in this Minitab file. Generate a stemplot, histogram, and dotplot of this data, and explain why none of the three adequately displays the distribution of the data. Then, come up with your own method to represent the data, and give some possible explanations for this curious distribution.

1.24 (calcium in diet)

1.27 (Cavendish's earth density measurements)

1.49 (SSHA test scores)

1.54 (Presidential support)


Assignment 2 (due WEDNESDAY, March 16): 1.82, 1.85, 1.86, 1.93, 1.100, 1.110, Lab 2 (either bio or psych; include normal quantile plots), Glen Q below

Consider the data set from the GlenQ of the previous homework assignment. Determine both measures of center and spread, and generate a boxplot. What does the normal quantile plot look like? What do you conclude about the appropriateness of z-scores? Go ahead and use z-scores to determine the proportion of waiting times above 8 seconds; then compare with the proportion of the data above 8 seconds. Does this support the "appropriateness" conclusion you just made?

1.85 (IQ scores of seventh-graders)


Assignment 3 (due Tuesday, March 30): 2.10, 2.12, 2.19 (show your calculations!), 2.23, 2.26, 2.33, Glen Q below

We're aware of the risk that outliers pose to the correlation coefficient. One alternative, from the world of non-parametric statistics, works as follows: take the first column of data, and rank the values from 1 for the smallest value, 2 for the next smallest, and so on. Do the same for the second column. Now replace the original values in both columns with their rank numbers, and compute the correlation coefficient of the ranks.

Find five friends, and measure their head circumference and length of forearm (from elbow to fingertip), in cm. Compute the correlation coefficient; then add one fake data point that is an outlier and recompute the correlation. Next, compute the revised robust correlation described above (both with and without the outlier), and compare the effect of the outlier on the two methods of computing the correlation. Explain, from the calculations, why the revised correlation is so much more robust against the outlier. Finally, can you think of any advantage of the original correlation over our more robust counterpart?
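The ranking procedure described above can be sketched in a few lines of Python. This is only an illustration, not part of the assignment: the measurements are made up, the function names are hypothetical, and the handling of ties (averaging the tied ranks) is an assumption, since the description does not say what to do with equal values.

```python
from math import sqrt

def pearson(xs, ys):
    """Ordinary correlation coefficient r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank 1 for the smallest value, 2 for the next smallest, and so on.
    Tied values share the average of the ranks they cover (an assumption)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def rank_correlation(xs, ys):
    """Correlation coefficient computed on the ranks instead of the data."""
    return pearson(ranks(xs), ranks(ys))

# Made-up numbers in cm, NOT real measurements
head = [54.0, 55.5, 56.0, 57.5, 58.0]
arm = [43.0, 44.5, 44.0, 46.0, 47.5]
print(pearson(head, arm), rank_correlation(head, arm))
```

Calling ranks on each column separately shows exactly which numbers go into the second correlation, which is the place to look when explaining how the two methods differ.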

2.10/2.23 (Gas consumption)

2.12 (How fast do icicles grow?)

2.19 (Archaeopteryx bone lengths)


Assignment 4 (due Tuesday, April 6): 2.40, 2.54, 2.60, 2.62, 2.69, Glen Q below

Recall, on last week's Glen Q, we saw a method of measuring the correlation between two data sets that suffered less from the problem of outliers. This week, I would like you to consider an alternative to the standard regression procedure that works in the same way: by replacing the data itself with the ranks of the data, and analyzing that instead. Use this idea to perform a regression analysis using the data set you generated last week (include the outlier; use head circumference as x). Then do a traditional regression analysis. Finally, use your traditional regression analysis to predict the arm length of someone with a head circumference of 53 cm; then try to use the outlier-resistant method to do the same prediction. What goes wrong?

2.40 (Nitrates versus absorbance)

2.54 (Farm population)

2.60 (How much sodium is there in your hot dog?)

2.62 (Gas chromatography)


Assignment 5 (due Tuesday, April 13): Midterm synthesis assignment

1STYEAR.MTW

OLDFOGY.MTW

SPRUCE.MTW


Assignment 6 (due Tuesday, April 27): 3.71, 5.12, 5.24, Glen Q below

A Bennington College news release sent to parents claims that "only 20% of students smoke regularly". We're not sure whether this is true, and decide to test this hypothesis.
(a) Describe how you would perform a survey, asking a sample of students whether or not they smoke regularly. Include relevant concepts from survey design.
(b) Suppose that 16 of 50 students you asked answered "Yes": they do smoke regularly. Do a statistical test to decide whether you have sufficient evidence to discredit the College's press release. Include all relevant terms from hypothesis testing.
(c) Can you think of any criticisms of your own study? (i.e., how might the College respond?)

(no Minitab files are relevant to this assignment, although you may use Minitab for three of the questions)
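For readers who want to see the shape of a test on a single proportion, here is a minimal Python sketch. It deliberately uses made-up counts rather than the numbers in part (b). The one-sided alternative (the true proportion is higher than claimed) and the large-sample normal approximation are both assumptions of this sketch, and the function names are hypothetical.

```python
from math import sqrt, erf

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def one_proportion_z_test(successes, n, p0):
    """z statistic and one-sided P-value for H0: p = p0 versus Ha: p > p0."""
    p_hat = successes / n
    # the standard error uses the hypothesized p0, not the sample proportion
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 1 - normal_cdf(z)
    return z, p_value

# Made-up survey: 36 "Yes" answers out of 100, tested against a claimed 30%
z, p = one_proportion_z_test(successes=36, n=100, p0=0.30)
print(round(z, 3), round(p, 4))
```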


Assignment 7 (due Tuesday, May 4): 6.3, 6.13, 6.29, 6.44, 6.49, 6.70, Glen Q below

An April 5 poll at CBS News has national support for John Kerry at 48%, and support for George Dubya Bush at 43%. The fine print at the bottom of the page says that the poll is accurate to within three percentage points; it should also say that it is this accurate 19 times out of every 20 polls.
(a) Express the above in confidence interval terminology.
(b) Do we have 95% confidence that Kerry is ahead of Bush?
(c) Suppose we dropped the confidence level to 90%. Do you think we would have 90% confidence of Kerry being ahead? What about 99% confidence? Explain in both cases.
(d) Suppose you have just seen four independently conducted surveys, and all four have Kerry ahead (by varying margins). Do a significance test to determine whether you have sufficient evidence to reject the hypothesis that they are tied. Would five surveys have been enough? Six? (Warning: consider carefully which statistical procedure is appropriate here.)

6.13 and 6.49 (Degree of Reading Power scores for third-graders) (...bonus point for identifying which of the scores is my daughter's...)


Assignment 8 (due Tuesday, May 11): 7.20, 7.21, 7.29, 7.30, 7.40, Glen Q below

In our NutraSweet sweetness loss analysis in class, we did not check to see whether the data were normally distributed.
(a) Check it now. Notice that there is some question about this assumption.
To do the test properly, we must use the sign test (another test from the world of non-parametric statistics), which avoids the normality assumption. It works as follows. Our null hypothesis is that there is no sweetness loss, with the alternative that the sweetness loss is positive. For each data point, if it is positive, replace it with a "+" sign. If the data point is negative, replace it with a "-" sign. Count the number of "+" signs.
(b) Using a technique from earlier in the semester, determine whether the total number of "+" signs you got is more than can be plausibly attributed to random chance. Are you still able to conclude that sweetness is lost?
(c) In general, is the sign test stronger (more likely to come up with a significant result), or weaker (less likely to get a significant result), than the t-test we saw in class? Explain why you think so.
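The counting step of the sign test can be sketched in Python as follows. The differences are made up and are not the class data. Two assumptions are built in: a difference of exactly zero is dropped (the description above does not cover that case), and the count of "+" signs is compared against a binomial distribution with probability 1/2, which is one reading of "a technique from earlier in the semester".

```python
from math import comb

def sign_test(diffs):
    """One-sided sign test: H0 no loss, Ha the loss is positive.
    Returns (number of + signs, number of signs used, P-value)."""
    signs = [d for d in diffs if d != 0]  # zeros dropped: an assumption
    n = len(signs)
    plus = sum(1 for d in signs if d > 0)
    # P(X >= plus) when X ~ Binomial(n, 1/2), i.e. each sign is a fair coin flip
    p_value = sum(comb(n, k) for k in range(plus, n + 1)) / 2 ** n
    return plus, n, p_value

# Made-up sweetness losses, NOT the class data
example = [1.2, -0.4, 0.8, 2.0, 0.3, -0.1, 1.5, 0.9]
plus, n, p = sign_test(example)
print(plus, n, p)
```

Notice that the function never looks at how large each difference is, only at its sign. That is worth keeping in mind for part (c).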

7.20/7.21 (How much do users pay for Internet access?)

7.29/7.30 (Do piano lessons improve spatio-temporal reasoning?)

7.40 (Are right-handed people more efficient turning right-handed threads?)


Assignment 9 (due Tuesday, May 18): 7.42, 7.58, 7.59, 7.65, 7.69, Glen Q below

In American mathematics education research, it has been demonstrated that the mean performance of boys on standardized math tests is slightly better than the mean performance of girls; the cause is generally thought to be social discouragement of mathematical success for girls. Suppose that the population mean for boys on this test is 68, while the population mean for girls is 66, and the population standard deviation for both groups is 15. See if you can determine (or estimate) the sample size needed before a statistical test on those samples would detect a discernible difference at the 5% significance level. Assume the same number of boys and girls. (HINT: Minitab allows you to simulate selecting data from normal (and other) distributions. Go to Calc/Random Data/Normal; from there it's obvious what to do.)
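The same kind of simulation the hint suggests can also be sketched outside Minitab. The Python version below draws repeated samples for a given sample size and reports how often a two-sided test at the 5% level finds a difference. It is an illustration under stated assumptions: the cutoff of 1.96 comes from the normal approximation rather than an exact t critical value, and the function names, number of trials, and sample sizes tried are arbitrary choices, not part of the assignment.

```python
import random
from math import sqrt

def sample_variance(xs):
    """Sample variance with the usual n - 1 divisor."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def rejection_rate(n, trials=300, seed=0):
    """Fraction of simulated studies (n boys, n girls) in which the
    two-sample statistic exceeds the two-sided 5% cutoff."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        boys = [rng.gauss(68, 15) for _ in range(n)]   # population mean 68, SD 15
        girls = [rng.gauss(66, 15) for _ in range(n)]  # population mean 66, SD 15
        se = sqrt(sample_variance(boys) / n + sample_variance(girls) / n)
        z = (sum(boys) / n - sum(girls) / n) / se
        if abs(z) > 1.96:  # normal-approximation cutoff (an assumption)
            rejections += 1
    return rejections / trials

# Arbitrary sample sizes, chosen only to show the trend
for n in (50, 200, 800):
    print(n, rejection_rate(n))
```

Deciding how high the rejection rate must be before a sample size counts as "enough" is part of the question, so the sketch leaves that choice open.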

7.42 (MLA Spanish listening test scores)

7.58/7.59 (Apartment rental costs)

7.65 (More on piano lessons and spatio-temporal reasoning skills)

7.69 (College professors, ego, and fitness levels)


Assignment 10 (due Tuesday, May 25): 12.16, 12.23, 12.30, 12.64, Lab 10

12.16 (Supermarket price reductions)

12.23 (Those crazy jumping rats)

12.30 (What color do cereal leaf beetles prefer?)


Final Synthesis Assignment 11 (due Wednesday, June 2 at 9:00 AM): Available here. The news magazine article is not included with this file.

PULSE.MTW (The diving reflex)

KIDS.MTW (Do natural parents raise children who learn better?)

SORPTION.MTW (hazardous organic solvents)



Data sets for the labs

Lab 1 (psych/bio): Depression test scores (DEPRESS.MTW)

Lab 2 (psych): Intelligence scale scores (NORTEST.MTW)
Lab 2 (bio): Holstein butterfat production (BUTTRFAT.MTW)

Lab 3 (psych): Alcohol and pregnancies (ALCOHOL.MTW)
Lab 3 (bio): Factors influencing photosynthesis (PHOTOSYN.MTW)

Lab 4 (psych/bio): TV and aggression (KIDFIGHT.MTW); pornography and attitudes (PORNO.MTW)

Lab 7 (psych): Verbal ability test scores (TVASCORE.MTW)
Lab 7 (bio): Brain enzyme activity reductions in birds (ACHE-ACT.MTW)

Lab 8/9 (psych): Verbal ability test scores, by gender (VERBAL.MTW)
Lab 8/9 (bio): Phosphate levels in blood of kidney dialysis patients (PHOSPHAT.MTW);
weights of juvenile ring-necked pheasants (PHEASANT.MTW)

Lab 11 (psych/bio): Motives for performance on tests (MOTIVE.MTW)


Microsoft Word files for overheads used in classes:

Calculating the Standard Deviation

Creating a Normal Quantile Plot

Correlation Coefficient Calculation

Best Fit Line

ANOVA Sum of Squares of Errors (SSE)

ANOVA Sum of Squares of Groups (SSG)



Last modified: May 21, 2004 / Glen Van Brummelen

Office: Dickinson 213. Phone numbers: (w) 440-4467; (h) 440-8142.