    > t2=with(Cars93,table(Origin,Type));t2
             Type
    Origin    Compact Large Midsize Small Sporty Van
      USA           7    11      10     7      8   5
      non-USA       9     0      12    14      6   4
    > t3=xtabs(~Origin+Type,Cars93);t3
             Type
    Origin    Compact Large Midsize Small Sporty Van
      USA           7    11      10     7      8   5
      non-USA       9     0      12    14      6   4
    > prop.table(t2)
             Type
    Origin    Compact  Large Midsize  Small Sporty    Van
      USA      0.0753 0.1183  0.1075 0.0753 0.0860 0.0538
      non-USA  0.0968 0.0000  0.1290 0.1505 0.0645 0.0430
    > margin.table(t3,1) # row totals
    Origin
        USA non-USA
         48      45
    > margin.table(t3,2) # column totals
    Type
    Compact   Large Midsize   Small  Sporty     Van
         16      11      22      21      14       9
    > prop.table(t3)
             Type
    Origin    Compact  Large Midsize  Small Sporty    Van
      USA      0.0753 0.1183  0.1075 0.0753 0.0860 0.0538
      non-USA  0.0968 0.0000  0.1290 0.1505 0.0645 0.0430
    > addmargins(t3)
             Type
    Origin    Compact Large Midsize Small Sporty Van Sum
      USA           7    11      10     7      8   5  48
      non-USA       9     0      12    14      6   4  45
      Sum          16    11      22    21     14   9  93
    > addmargins(prop.table(t3))
             Type
    Origin    Compact  Large Midsize  Small Sporty    Van    Sum
      USA      0.0753 0.1183  0.1075 0.0753 0.0860 0.0538 0.5161
      non-USA  0.0968 0.0000  0.1290 0.1505 0.0645 0.0430 0.4839
      Sum      0.1720 0.1183  0.2366 0.2258 0.1505 0.0968 1.0000
       Cell Contents
    |-------------------------|
    |                       N |
    | Chi-square contribution |
    |           N / Row Total |
    |           N / Col Total |
    |         N / Table Total |
    |-------------------------|

    Total Observations in Table:  93

                 | Type
          Origin |   Compact |     Large |   Midsize |     Small |    Sporty |       Van | Row Total |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
             USA |         7 |        11 |        10 |         7 |         8 |         5 |        48 |
                 |     0.192 |     4.990 |     0.162 |     1.360 |     0.083 |     0.027 |           |
                 |     0.146 |     0.229 |     0.208 |     0.146 |     0.167 |     0.104 |     0.516 |
                 |     0.438 |     1.000 |     0.455 |     0.333 |     0.571 |     0.556 |           |
                 |     0.075 |     0.118 |     0.108 |     0.075 |     0.086 |     0.054 |           |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
         non-USA |         9 |         0 |        12 |        14 |         6 |         4 |        45 |
                 |     0.204 |     5.323 |     0.172 |     1.450 |     0.088 |     0.029 |           |
                 |     0.200 |     0.000 |     0.267 |     0.311 |     0.133 |     0.089 |     0.484 |
                 |     0.562 |     0.000 |     0.545 |     0.667 |     0.429 |     0.444 |           |
                 |     0.097 |     0.000 |     0.129 |     0.151 |     0.065 |     0.043 |           |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
    Column Total |        16 |        11 |        22 |        21 |        14 |         9 |        93 |
                 |     0.172 |     0.118 |     0.237 |     0.226 |     0.151 |     0.097 |           |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|

    > CrossTable(Origin,Type,expected=T,chisq=T)
       Cell Contents
    |-------------------------|
    |                       N |
    |              Expected N |
    | Chi-square contribution |
    |           N / Row Total |
    |           N / Col Total |
    |         N / Table Total |
    |-------------------------|

    Total Observations in Table:  93

                 | Type
          Origin |   Compact |     Large |   Midsize |     Small |    Sporty |       Van | Row Total |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
             USA |         7 |        11 |        10 |         7 |         8 |         5 |        48 |
                 |     8.258 |     5.677 |    11.355 |    10.839 |     7.226 |     4.645 |           |
                 |     0.192 |     4.990 |     0.162 |     1.360 |     0.083 |     0.027 |           |
                 |     0.146 |     0.229 |     0.208 |     0.146 |     0.167 |     0.104 |     0.516 |
                 |     0.438 |     1.000 |     0.455 |     0.333 |     0.571 |     0.556 |           |
                 |     0.075 |     0.118 |     0.108 |     0.075 |     0.086 |     0.054 |           |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
         non-USA |         9 |         0 |        12 |        14 |         6 |         4 |        45 |
                 |     7.742 |     5.323 |    10.645 |    10.161 |     6.774 |     4.355 |           |
                 |     0.204 |     5.323 |     0.172 |     1.450 |     0.088 |     0.029 |           |
                 |     0.200 |     0.000 |     0.267 |     0.311 |     0.133 |     0.089 |     0.484 |
                 |     0.562 |     0.000 |     0.545 |     0.667 |     0.429 |     0.444 |           |
                 |     0.097 |     0.000 |     0.129 |     0.151 |     0.065 |     0.043 |           |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
    Column Total |        16 |        11 |        22 |        21 |        14 |         9 |        93 |
                 |     0.172 |     0.118 |     0.237 |     0.226 |     0.151 |     0.097 |           |
    -------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|

    Statistics for All Table Factors

    Pearson's Chi-squared test
    ------------------------------------------------------------
    Chi^2 =  14.1     d.f. =  5     p =  0.0151

    Warning message:
    In chisq.test(t, correct = FALSE, ...) :
      Chi-squared approximation may be incorrect
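Each cell reports N, the expected N, the cell's chi-square contribution, and the cell's share of its row total, its column total, and the table total.
expected=T: print the expected frequencies
chisq=T: run the chi-square test (the chi-square distribution is that of a squared standard normal variable)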
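Expected N for the (USA, Compact) cell: $\frac{16 \times 48}{93} = 8.258$
Chi-square contribution of the same cell: $\frac{(7-8.258)^2}{8.258} = 0.192$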
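The two formulas above can be checked for every cell at once. The following is a minimal sketch, assuming the counts are typed in by hand from the table printed earlier (so it runs without the Cars93 data); the names `counts`, `expected`, and `contrib` are illustrative and not from the original session.

```r
# Observed counts copied from the Origin x Type table above.
counts <- matrix(c(7, 11, 10,  7, 8, 5,
                   9,  0, 12, 14, 6, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Origin = c("USA", "non-USA"),
                                 Type = c("Compact", "Large", "Midsize",
                                          "Small", "Sporty", "Van")))

# Expected N = row total * column total / table total, for every cell.
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-square contribution of each cell = (N - expected)^2 / expected.
contrib <- (counts - expected)^2 / expected

round(expected["USA", "Compact"], 3)  # 8.258, as in the CrossTable output
round(contrib["USA", "Compact"], 3)   # 0.192, as in the CrossTable output

# The test statistic is the sum of all contributions.
chi2 <- sum(contrib)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)
round(c(chi2 = chi2, df = df, p = p_value), 4)
```

Summing the cell contributions should give the Chi^2 value that CrossTable reports, with (2-1)(6-1) = 5 degrees of freedom.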
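Significance level 0.05 $\geq$ p-value 0.015, so the null hypothesis is rejected.
Type differs by Origin.
Interpret the result using the row percentages, and describe the key, largest differences (e.g., Large).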
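As a rough illustration of reading the row percentages, the sketch below recomputes them with `prop.table(..., margin = 1)` and picks the Type with the largest gap between the two origins. The counts are again typed in by hand from the table above, and `row_pct` and `gap` are illustrative names.

```r
# Observed counts copied from the Origin x Type table above.
counts <- matrix(c(7, 11, 10,  7, 8, 5,
                   9,  0, 12, 14, 6, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Origin = c("USA", "non-USA"),
                                 Type = c("Compact", "Large", "Midsize",
                                          "Small", "Sporty", "Van")))

# Row percentages: each cell divided by its row total.
row_pct <- prop.table(counts, margin = 1)
round(row_pct, 3)

# Absolute difference between the two origins for each Type.
gap <- abs(row_pct["USA", ] - row_pct["non-USA", ])
names(which.max(gap))  # "Large": 22.9% of USA cars vs 0% of non-USA cars
```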
Example: are the two variables independent?
$H_0$: the two variables are independent (Type does not differ by Origin)
$H_1$: the two variables are not independent
| | O | X | Sum |
|---|---|---|---|
| Male | 100 | 100 | 200 |
| Female | 50 | 50 | 100 |
| Sum | 150 | 150 | 300 |
Independence of two events: $$p(O \mid \text{Male})=\frac{p(O \cap \text{Male})}{p(\text{Male})}$$
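Plugging the counts from the 2×2 table into this definition gives a short worked check (Male is the first row of the table; the arithmetic uses only the counts shown there):

$$p(O \mid \text{Male})=\frac{100/300}{200/300}=0.5=\frac{150}{300}=p(O)$$

Equivalently, $p(O \cap \text{Male})=\frac{100}{300}=\frac{150}{300}\times\frac{200}{300}=p(O)\,p(\text{Male})$, so in this table the two events are independent.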
    data:  OBP
    t = -0.168458, df = 437, p-value = 0.8663
    alternative hypothesis: true mean is not equal to 0.33
    95 percent confidence interval:
     0.32571002 0.33361264
    sample estimates:
     mean of x
    0.32966133
    > summary(am)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     0.0000  0.0000  0.0000  0.4062  1.0000  1.0000
    > summary(mpg)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      10.40   15.43   19.20   20.09   22.80   33.90

Null hypothesis: the mean of mpg does not differ by am.
    > var.test(mpg~am,mtcars)

        F test to compare two variances

    data:  mpg by am
    F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.1243721 1.0703429
    sample estimates:
    ratio of variances
             0.3865615
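var.test(): tests equality of variances (formula: numeric ~ categorical)
$p=0.06691>0.05$: the null hypothesis is not rejected, so equal variances are assumed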
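The two-step procedure in these notes (variance test first, then the matching t-test) can be sketched as follows. It uses the built-in `mtcars` data; the 0.05 cutoff follows the notes, and `equal_var` is an illustrative name.

```r
# Step 1: F test for equal variances of mpg between the two am groups.
vt <- var.test(mpg ~ am, data = mtcars)

# Step 2: treat the variances as equal when the F test does not reject at 0.05.
equal_var <- vt$p.value > 0.05

# Step 3: pooled t-test if variances look equal, Welch t-test otherwise.
tt <- t.test(mpg ~ am, data = mtcars, var.equal = equal_var)

round(vt$p.value, 5)  # 0.06691, so equal_var is TRUE here
tt
```

With `var.equal = TRUE` the degrees of freedom are n1 + n2 - 2 = 30; with `var.equal = FALSE`, R applies the Welch correction and reports fractional degrees of freedom instead.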
    > t.test(mpg~am,mtcars,var.equal=T)

        Two Sample t-test

    data:  mpg by am
    t = -4.1061, df = 30, p-value = 0.000285
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -10.84837  -3.64151
    sample estimates:
    mean in group 0 mean in group 1
           17.14737        24.39231

    data:  mpg by am
    t = -3.7671, df = 18.332, p-value = 0.001374
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -11.280194  -3.209684
    sample estimates:
    mean in group 0 mean in group 1
           17.14737        24.39231
    > var.test(mpg~vs) # equal variances here as well

        F test to compare two variances

    data:  mpg by vs
    F = 0.51515, num df = 17, denom df = 13, p-value = 0.1997
    alternative hypothesis: true ratio of variances is not equal to 1
    95 percent confidence interval:
     0.1714935 1.4353527
    sample estimates:
    ratio of variances
             0.5151485
Do grades differ by gender? Start with the equal-variance test.
Paired-samples t-test (two numeric measurements, before and after)
Test statistic
$$T=\frac{\bar D - \mu_D}{S_D/\sqrt{n}} \sim t(n-1)$$
    data:  A and B
    t = -3.3489, df = 9, p-value = 0.008539
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.6869539 -0.1330461
    sample estimates:
    mean of the differences
                      -0.41
    > D=A-B;D
     [1] -0.8 -0.6 -0.3  0.1 -1.1  0.2 -0.3 -0.5 -0.5 -0.3
    > t.test(D,mu=0) # a one-sample t-test on the differences gives the same result

        One Sample t-test

    data:  D
    t = -3.3489, df = 9, p-value = 0.008539
    alternative hypothesis: true mean is not equal to 0
    95 percent confidence interval:
     -0.6869539 -0.1330461
    sample estimates:
    mean of x
        -0.41
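The paired test statistic $T=\frac{\bar D - \mu_D}{S_D/\sqrt{n}}$ can also be computed by hand from the ten differences printed above. This is a minimal sketch; `D` is typed in from that output, and `t_stat`, `p_value`, and `ci` are illustrative names.

```r
# Differences A - B, copied from the printed vector D.
D <- c(-0.8, -0.6, -0.3, 0.1, -1.1, 0.2, -0.3, -0.5, -0.5, -0.3)
n <- length(D)
mu0 <- 0  # hypothesized mean difference under H0

# T = (mean(D) - mu0) / (sd(D) / sqrt(n)), with n - 1 degrees of freedom.
se <- sd(D) / sqrt(n)
t_stat <- (mean(D) - mu0) / se

# Two-sided p-value and 95% confidence interval from the t(n - 1) distribution.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
ci <- mean(D) + c(-1, 1) * qt(0.975, df = n - 1) * se

round(t_stat, 4)   # -3.3489, matching t.test()
round(p_value, 6)  # 0.008539
round(ci, 7)
```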
    Terms:
                     AirBags Residuals
    Sum of Squares  218.6799 1095.1481
    Deg. of Freedom        2        90

    Residual standard error: 3.488311
    Estimated effects may be unbalanced
    > TukeyHSD(a2)
      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = Width ~ AirBags, data = Cars93)

    $AirBags
                                        diff       lwr        upr     p adj
    Driver only-Driver & Passenger -2.014535 -4.448921  0.4198512 0.1250460
    None-Driver & Passenger        -4.286765 -6.807012 -1.7665170 0.0003127
    None-Driver only               -2.272230 -4.180014 -0.3644452 0.0153291
Multiple comparisons (required when the null hypothesis is rejected)
    > aggregate(Width,by=list(AirBags),mean)
                 Group.1        x
    1 Driver & Passenger 71.87500
    2        Driver only 69.86047
    3               None 67.58824
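The ANOVA, the Tukey comparisons, and the group means can be reproduced end to end with a sketch like the one below. It assumes the `MASS` package (which provides `Cars93`) is installed; the `aov` formula and the object name `a2` follow the output shown above, while the formula interface to `aggregate` is a substitution for the `by=list(...)` call in the notes.

```r
library(MASS)  # provides the Cars93 data set (assumed to be installed)

# One-way ANOVA of Width on AirBags, as in the "Fit:" line of the output.
a2 <- aov(Width ~ AirBags, data = Cars93)
summary(a2)

# Tukey's HSD: all pairwise comparisons with a 95% family-wise level.
tk <- TukeyHSD(a2)
tk

# Pairs whose adjusted p-value is below 0.05.
rownames(tk$AirBags)[tk$AirBags[, "p adj"] < 0.05]

# Group means, to see the direction of each difference.
aggregate(Width ~ AirBags, data = Cars93, FUN = mean)
```

In the output shown earlier, the two comparisons involving None have adjusted p-values below 0.05, while Driver only versus Driver & Passenger does not.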
    Residual standard error: 8.391 on 91 degrees of freedom
    Multiple R-squared:  0.2536,	Adjusted R-squared:  0.2454
    F-statistic: 30.93 on 1 and 91 DF,  p-value: 2.663e-07
    > 2177/(6406+2177)
    [1] 0.2536409
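The last calculation shows that R-squared is the regression sum of squares divided by the total sum of squares. The sketch below extends the same arithmetic to the other summary lines, on the assumption that 2177 is the regression sum of squares and 6406 the residual sum of squares (both rounded, so the results agree only approximately); the variable names are illustrative.

```r
ss_reg <- 2177  # regression sum of squares (assumed, rounded)
ss_res <- 6406  # residual sum of squares (assumed, rounded)
df_reg <- 1     # degrees of freedom from the F-statistic line
df_res <- 91

# R-squared = SS_reg / (SS_reg + SS_res)
r2 <- ss_reg / (ss_reg + ss_res)

# F = (SS_reg / df_reg) / (SS_res / df_res)
f_stat <- (ss_reg / df_reg) / (ss_res / df_res)

# Residual standard error = sqrt(SS_res / df_res)
rse <- sqrt(ss_res / df_res)

round(c(r2 = r2, f = f_stat, rse = rse), 4)
```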