Monday, November 03, 2008

SIS 600: Quantitative Analysis



Statistics, essentially.  And sometimes, I feel, the bane of my existence this semester.  And if you talk to anyone else in my program, most of us agree on that point.  We are peace-loving, theory-loving, practical conflict resolution-loving people, not numbers-loving people.  We take this class because we have to, and most of us hate it the entire way through.  I admit I see the value in it, but I will not be sad when I can say I'm done with this class!!

Tonight is the midterm (actually kind of 2/3-term) and this past weekend my life has basically been devoted to studying for it.  On Saturday, I went to a cafe with free wireless internet and spent 8 hours working.  5 of those hours were spent on ONE, count them, ONE Quant homework assignment, which I, incidentally, did not even finish.  Sunday I studied more, this morning I continued, and I think I can say I am as prepared as I'm going to be.  

I know it's unlikely that this will be interesting to any of you.  But since the purpose of my blog is to give you a small window into my life, I thought I would subject you all to a little of the pain I've been feeling the last few days.  Thus, a portion of my statistics homework :-)  Enjoy!!

In your own words, define and explain each of the following terms and concepts:

I-R Measure of association/correlation: the r statistic. Application, interpretation, and limitations of the r statistic

Pearson’s r measures the strength of the association between two interval-ratio variables, based on the computations from the least-squares regression line.  It is measured on a scale of 0 to plus/minus 1.  A Pearson’s r value of 0-.3 shows a weak association, .3-.6 a moderate association, and .6-1 a strong association.  If the value is positive, there is a positive relationship (the variables vary in the same direction); if the value is negative, there is a negative relationship (the variables vary in opposite directions).  Because the interpretation of Pearson’s r itself is fairly arbitrary, the coefficient of determination, which is r squared, is often reported instead.  This is a PRE statistic, meaning it tells us the percentage by which our error in predicting the dependent variable is reduced when we take the independent variable into account.  Pearson’s r is a very helpful and sophisticated statistic, but it should only be used with interval-ratio variables (it is sometimes used with ordinal variables, and occasionally adapted for nominal variables, but the results are less accurate).  Pearson’s r also assumes a linear relationship between the variables.
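
For anyone curious what any of this actually looks like, here's a quick sketch in Python (using the scipy library, with numbers I made up for illustration, not from my actual homework):

```python
# A minimal sketch of Pearson's r and the coefficient of determination
# (r squared). The data are invented for illustration.
from scipy.stats import pearsonr

years_education = [8, 10, 12, 12, 14, 16, 16, 18, 20]    # X, interval-ratio
income_thousands = [22, 28, 33, 30, 41, 52, 48, 60, 71]  # Y, interval-ratio

r, p_value = pearsonr(years_education, income_thousands)
r_squared = r ** 2  # PRE statistic: reduction in prediction error

print(f"Pearson's r = {r:.3f}")          # strength and direction, 0 to +/-1
print(f"r squared   = {r_squared:.3f}")  # share of variance in Y explained by X
```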

I-R Measure of association/correlation: R and R squared statistics. Application, interpretation, and limitations of the R and R squared statistics.

R and R squared statistics are used to show the combined effects of all independent variables on a dependent variable (measured at the interval-ratio level).  Because the independent variables are themselves interrelated, we cannot simply add the individual r squared statistics together to determine the combined effect on the dependent variable.  R squared tells us the percentage of the total variance in the dependent variable that can be explained by the independent variables combined.  R and R squared statistics can only be accurately applied to interval-ratio variables.  These statistics are very powerful, but they require high-quality data and assume that each independent variable has a linear relationship with the dependent variable.  They also assume no interaction among the variables in the equation and that the independent variables are uncorrelated with each other.  As these assumptions are violated, the statistics become less trustworthy.
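
Here's a rough Python sketch of the same idea with two independent variables: fitting an ordinary least-squares model with numpy and computing R and R squared (data and variable names invented):

```python
# A rough sketch of R and R squared for two independent variables,
# fit by ordinary least squares with numpy. Data are invented.
import numpy as np

x1 = np.array([8, 10, 12, 12, 14, 16, 16, 18, 20], dtype=float)  # education
x2 = np.array([1, 3, 2, 6, 5, 8, 10, 12, 15], dtype=float)       # seniority
y = np.array([22, 28, 33, 30, 41, 52, 48, 60, 71], dtype=float)  # income

# Design matrix: a column of ones for the intercept, then X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ coeffs
ss_residual = np.sum((y - y_hat) ** 2)
ss_total = np.sum((y - y.mean()) ** 2)

r_squared = 1 - ss_residual / ss_total  # variance in Y explained by X1 and X2 combined
R = np.sqrt(r_squared)                  # multiple correlation coefficient

print(f"R         = {R:.3f}")
print(f"R squared = {r_squared:.3f}")
```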

Similarities and differences between R and R squared statistics and r statistics.

R statistics and r statistics are similar in that they both measure the association between variables at the interval-ratio level, and can provide us with the percentage of variance in the dependent variable that is explained by the independent variable(s).  The difference is that r statistics measure the effect of ONE independent variable on a dependent variable, and R statistics measure the effect of MULTIPLE independent variables on a dependent variable.

Ordinal measure Gamma. Application, interpretation, and limitations of the Gamma statistic

Gamma is used to measure the strength and direction of the association between two ordinal-level variables.  Gamma is measured on a scale of 0 to plus/minus 1, with values from 0-.3 showing a weak relationship, .3-.6 a moderate one, and .6-1 a strong relationship.  If the value is positive, there is a positive relationship (the variables vary in the same direction); if the value is negative, there is a negative relationship (the variables vary in opposite directions).  When evaluating the direction of the relationship, it is important to pay attention to the way the categories are coded.  Since these are ordinal-level variables, the categories can often be scored in two different ways, both of which are equally valid.
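
Gamma itself comes from comparing pairs of cases. Here's a small hand-rolled Python sketch of the idea, with invented scores (pairs tied on either variable are excluded, as gamma's formula requires):

```python
# A hand-rolled sketch of gamma: count concordant and discordant pairs
# of cases across two ordinal variables. Scores are invented.
from itertools import combinations

job_satisfaction = [1, 1, 2, 2, 2, 3, 3, 3, 1, 2]  # 1=low, 2=moderate, 3=high
pay_level        = [1, 2, 1, 2, 3, 2, 3, 3, 1, 2]

concordant = discordant = 0
for (xa, ya), (xb, yb) in combinations(zip(job_satisfaction, pay_level), 2):
    product = (xa - xb) * (ya - yb)
    if product > 0:
        concordant += 1   # the pair is ranked the same way on both variables
    elif product < 0:
        discordant += 1   # the pair is ranked in opposite ways

gamma = (concordant - discordant) / (concordant + discordant)
print(f"gamma = {gamma:.3f}")  # 0 to +/-1; the sign gives the direction
```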

Ordinal measure Spearman’s Rho. Application, interpretation, and limitations of Spearman’s Rho

Spearman’s rho is used as a measure of association between ordinal-level variables when there is a broad range of scores and the researcher does not want to collapse them into categories that could be used to compute gamma from a bivariate table.  Spearman’s rho permits the retention of detail that can be lost when collapsing scores into categories such as “high” and “low.”  Instead of putting scores into categories, the variables are ranked in order from highest to lowest, and then the ranks for each case on each variable are compared with each other.  Spearman’s rho is an index of the strength of association between the two variables on a scale of 0 to plus/minus 1, with values from 0-.3 showing a weak relationship, .3-.6 a moderate one, and .6-1 a strong relationship.  If the value is positive, there is a positive relationship (the variables vary in the same direction); if the value is negative, there is a negative relationship (the variables vary in opposite directions).  If the value of Spearman’s rho is squared, it provides us with a PRE statistic.  Spearman’s rho can only be used with ordinal-level variables that can be ranked from highest to lowest for each case.
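
Here's a quick Python sketch of Spearman's rho using scipy, which does the ranking internally (data invented):

```python
# A minimal sketch of Spearman's rho: each variable is ranked, then the
# ranks are correlated. Data are invented for illustration.
from scipy.stats import spearmanr

class_rank = [1, 2, 3, 4, 5, 6, 7, 8]             # ordinal scores
hours_of_study = [30, 28, 25, 26, 20, 18, 15, 10]

rho, p_value = spearmanr(class_rank, hours_of_study)
print(f"Spearman's rho = {rho:.3f}")
print(f"rho squared    = {rho ** 2:.3f}")  # squared, it can be read as a PRE statistic
```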

Slope, Intercept, Least-Squares Regression Line. Interpretation of the regression line

The slope (b) of a regression line tells us the change in Y (the dependent variable) for a one-unit change in X (the independent variable).  Once the slope has been calculated, the Y-intercept (a) can be found using the formula Y = a + bX; it tells us the point at which the regression line crosses the Y-axis.  This is the formula for the least-squares regression line, the line that comes as close as possible to touching all the conditional means of Y.  The regression line tells us the strength and the direction of the relationship between X and Y.  When all the cases are plotted on a graph (a scattergram), we can see how closely they cluster around the regression line: the closer they are to the line, the stronger the relationship between X and Y.  If the regression line rises from left to right, the relationship is positive; if it falls from left to right, the relationship is negative.  The regression formula can also be used to predict the value of Y for a value of X that was not included in the data.  The regression line can only be used with interval-ratio variables.
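
Here's a short Python sketch of finding the slope and intercept with scipy and using the line to predict a new value of Y (data invented):

```python
# A small sketch of the least-squares regression line Y = a + bX,
# using scipy's linregress. Data are invented for illustration.
from scipy.stats import linregress

x = [1, 2, 3, 4, 5, 6, 7, 8]     # independent variable
y = [3, 5, 4, 8, 9, 11, 12, 15]  # dependent variable

result = linregress(x, y)
a, b = result.intercept, result.slope

print(f"Y = {a:.2f} + {b:.2f}X")                     # the regression line
print(f"Predicted Y when X = 10: {a + b * 10:.2f}")  # predicting beyond the data
```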

Multiple Least-Squares Regression Line. Partial slope.

The least-squares multiple regression line is a modified least-squares regression line that includes more than one independent variable.  Partial slopes show the amount of change in Y for a one-unit change in one independent variable while controlling for the effects of the other independent variables; they represent the direct effect of the associated independent variable on Y.  This regression line can also be used to predict the scores of the dependent variable based on the scores of two or more independent variables.
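
One last Python sketch: the partial slopes from a least-squares multiple regression with two independent variables, again fit with numpy on invented data:

```python
# A sketch of partial slopes: each coefficient is the change in Y per
# one-unit change in that X, holding the other X constant. Data invented.
import numpy as np

x1 = np.array([2, 4, 5, 7, 8, 10, 11, 13], dtype=float)
x2 = np.array([1, 1, 2, 3, 3, 4, 5, 6], dtype=float)
y = np.array([5, 9, 12, 16, 18, 23, 25, 30], dtype=float)

X = np.column_stack([np.ones_like(x1), x1, x2])   # intercept column, X1, X2
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]  # a, then the partial slopes

print(f"Y = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")
print(f"Predicted Y when X1 = 9, X2 = 2: {a + b1 * 9 + b2 * 2:.2f}")
```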

 
