R (5)

Data 선택

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
> library(MASS)
> str(Cars93)
'data.frame': 93 obs. of 27 variables:
$ Manufacturer : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
$ Model : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
$ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
$ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
$ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
$ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
$ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ...
$ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ...
$ AirBags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
$ DriveTrain : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
$ Cylinders : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
$ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
$ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ...
$ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
$ Rev.per.mile : int 2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
$ Man.trans.avail : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
$ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
$ Passengers : int 5 5 5 6 4 6 6 6 5 6 ...
$ Length : int 177 195 180 193 186 189 200 216 198 206 ...
$ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ...
$ Width : int 68 71 67 70 69 69 74 78 73 73 ...
$ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ...
$ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
$ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ...
$ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
$ Origin : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
$ Make : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
> attach(Cars93)
> a=Cars93[which(MPG.city>30),'Model'];a #MPG.city>30인 Model변수만 가져옴
[1] Festiva Metro Civic LeMans Justy Swift Tercel
93 Levels: 100 190E 240 300E 323 535i 626 850 90 900 Accord Achieva Aerostar Altima ... Vision
> b=Cars93[which(MPG.city>30),c('Model','Origin')];b
Model Origin
31 Festiva USA
39 Metro non-USA
42 Civic non-USA
73 LeMans USA
80 Justy non-USA
83 Swift non-USA
84 Tercel non-USA
> c=Cars93[which(MPG.city>30),c('Model','Origin','MPG.city')];c
Model Origin MPG.city
31 Festiva USA 31
39 Metro non-USA 46
42 Civic non-USA 42
73 LeMans USA 31
80 Justy non-USA 33
83 Swift non-USA 39
84 Tercel non-USA 32
> d=Cars93[which(MPG.city>30),];d #안치면 다나옴
Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway AirBags DriveTrain
31 Ford Festiva Small 6.9 7.4 7.9 31 33 None Front
39 Geo Metro Small 6.7 8.4 10.0 46 50 None Front
42 Honda Civic Small 8.4 12.1 15.8 42 46 Driver only Front
73 Pontiac LeMans Small 8.2 9.0 9.9 31 41 None Front
80 Subaru Justy Small 7.3 8.4 9.5 33 37 None 4WD
83 Suzuki Swift Small 7.3 8.6 10.0 39 43 None Front
84 Toyota Tercel Small 7.8 9.8 11.8 32 37 Driver only Front
Cylinders EngineSize Horsepower RPM Rev.per.mile Man.trans.avail Fuel.tank.capacity Passengers
31 4 1.3 63 5000 3150 Yes 10.0 4
39 3 1.0 55 5700 3755 Yes 10.6 4
42 4 1.5 102 5900 2650 Yes 11.9 4
73 4 1.6 74 5600 3130 Yes 13.2 4
80 3 1.2 73 5600 2875 Yes 9.2 4
83 3 1.3 70 6000 3360 Yes 10.6 4
84 4 1.5 82 5200 3505 Yes 11.9 5
Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make
31 141 90 63 33 26.0 12 1845 USA Ford Festiva
39 151 93 63 34 27.5 10 1695 non-USA Geo Metro
42 173 103 67 36 28.0 12 2350 non-USA Honda Civic
73 177 99 66 35 25.5 17 2350 USA Pontiac LeMans
80 146 90 60 32 23.5 10 2045 non-USA Subaru Justy
83 161 93 63 34 27.5 10 1965 non-USA Suzuki Swift
84 162 94 65 36 24.0 11 2055 non-USA Toyota Tercel
> e=Cars93[which(Cylinders==4&Manufacturer=='Hyundai'),c('Model','Min.Price','Max.Price')];e
Model Min.Price Max.Price
44 Excel 6.8 9.2
45 Elantra 9.0 11.0
46 Scoupe 9.1 11.0
47 Sonata 12.4 15.3
  • Data 중 원하는 Data만 선택
  • Indexing 중요

subset()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
> subset(Cars93,select=Model,subset=(MPG.city>30))
Model
31 Festiva
39 Metro
42 Civic
73 LeMans
80 Justy
83 Swift
84 Tercel
> subset(Cars93,select=c(Model,Min.Price,Max.Price))
Model Min.Price Max.Price
1 Integra 12.9 18.8
2 Legend 29.2 38.7
3 90 25.9 32.3
4 100 30.8 44.6
5 535i 23.7 36.2
6 Century 14.2 17.3
7 LeSabre 19.9 21.7
8 Roadmaster 22.6 24.9
9 Riviera 26.3 26.3
10 DeVille 33.0 36.3
11 Seville 37.5 42.7
...
> subset(Cars93,select=c(Model,Min.Price,Max.Price),subset=(Cylinders==4&Manufacturer=='Hyundai'))
Model Min.Price Max.Price
44 Excel 6.8 9.2
45 Elantra 9.0 11.0
46 Scoupe 9.1 11.0
47 Sonata 12.4 15.3
> subset(Cars93,c(Model,Min.Price,Max.Price),subset=(Cylinders==4&Manufacturer=='Hyundai')) #select= 없어도 됨 , subset= 필수
Model Min.Price Max.Price
44 Excel 6.8 9.2
45 Elantra 9.0 11.0
46 Scoupe 9.1 11.0
47 Sonata 12.4 15.3
> y=subset(Cars93,select=-Model,subset=(Cylinders==4&Manufacturer=='Hyundai'));y #Model 빼고 다 가져옴
Manufacturer Type Min.Price Price Max.Price MPG.city MPG.highway AirBags DriveTrain Cylinders
44 Hyundai Small 6.8 8.0 9.2 29 33 None Front 4
45 Hyundai Small 9.0 10.0 11.0 22 29 None Front 4
46 Hyundai Sporty 9.1 10.0 11.0 26 34 None Front 4
47 Hyundai Midsize 12.4 13.9 15.3 20 27 None Front 4
EngineSize Horsepower RPM Rev.per.mile Man.trans.avail Fuel.tank.capacity Passengers Length
44 1.5 81 5500 2710 Yes 11.9 5 168
45 1.8 124 6000 2745 Yes 13.7 5 172
46 1.5 92 5550 2540 Yes 11.9 4 166
47 2.0 128 6000 2335 Yes 17.2 5 184
Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make
44 94 63 35 26.0 11 2345 non-USA Hyundai Excel
45 98 66 36 28.0 12 2620 non-USA Hyundai Elantra
46 94 64 34 23.5 9 2285 non-USA Hyundai Scoupe
47 104 69 41 31.0 14 2885 non-USA Hyundai Sonata
> subset(Cars93,select=c(Model,Max.Price)) #조건을 안주면 select만으로 가능
Model Max.Price
1 Integra 18.8
2 Legend 38.7
3 90 32.3
4 100 44.6
5 535i 36.2
6 Century 17.3
7 LeSabre 21.7
8 Roadmaster 24.9
9 Riviera 26.3
10 DeVille 36.3
...
> subset(Cars93,c(Model,Max.Price),subset=T) #조건을 안주면 select만으로 가능
Model Max.Price
1 Integra 18.8
2 Legend 38.7
3 90 32.3
4 100 44.6
5 535i 36.2
6 Century 17.3
7 LeSabre 21.7
8 Roadmaster 24.9
9 Riviera 26.3
10 DeVille 36.3
...
~~~~
***
# na.omit()

~~~R
> str(air)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
> air=na.omit(airquality)
> str(air)
'data.frame': 111 obs. of 6 variables:
$ Ozone : int 41 36 12 18 23 19 8 16 11 14 ...
$ Solar.R: int 190 118 149 313 299 99 19 256 290 274 ...
$ Wind : num 7.4 8 12.6 11.5 8.6 13.8 20.1 9.7 9.2 10.9 ...
$ Temp : int 67 72 74 62 65 59 61 69 66 68 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 7 8 9 12 13 14 ...
- attr(*, "na.action")= 'omit' Named int 5 6 10 11 25 26 27 32 33 34 ...
..- attr(*, "names")= chr "5" "6" "10" "11" ...

sample()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
> air[sample(1:nrow(airquality),10),]
Ozone Solar.R Wind Temp Month Day
52 NA 150 6.3 77 6 21
39 NA 273 6.9 87 6 8
91 64 253 7.4 83 7 30
77 48 260 6.9 81 7 16
15 18 65 13.2 58 5 15
146 36 139 10.3 81 9 23
60 NA 31 14.9 77 6 29
10 NA 194 8.6 69 5 10
114 9 36 14.3 72 8 22
94 9 24 13.8 81 8 2
> air[sample(1:nrow(airquality),2),]
Ozone Solar.R Wind Temp Month Day
72 NA 139 8.6 82 7 11
135 21 259 15.5 76 9 12
> lotto=1:45
> r=lotto[sample(1:45,6)] #Vector이므로 ,없음
> r
[1] 38 3 28 36 18 32
  • nrow()는 행의 개수
  • 랜덤하게 뽑음(하지만 복원추출)

for문

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
> fac.x=1
> for(i in 1:5){
+ fac.x=fac.x*i
+ cat(i,'!=',fac.x,'\n',sep='')
+ }
1!=1
2!=2
3!=6
4!=24
5!=120
> x=5
> for(i in 1:9){
+ y=x*i
+ cat(x,'*',i,'=',y,'\n',sep='')
+ }
5*1=5
5*2=10
5*3=15
5*4=20
5*5=25
5*6=30
5*7=35
5*8=40
5*9=45
> for(i in 1:9){
+ for(j in 1:9){
+ a=i*j
+ cat(i,'*',j,'=',a,'\n',sep='')
+ }
+ }
1*1=1
1*2=2
1*3=3
...
9*7=63
9*8=72
9*9=81

for


while문

1
2
3
4
5
6
7
8
9
10
11
> fac.x=1;i=1
> while(i<=5){
+ fac.x=fac.x*i
+ cat(i,'!=',fac.x,'\n',sep='')
+ i=i+1
+ }
1!=1
2!=2
3!=6
4!=24
5!=120
  • F면 루프 나감

if문

1
2
3
4
5
6
7
8
9
10
11
12
> a2=1;a1=4;a0=3
> D=a1^2-4*a2*a0
>
> if(D>0){
+ roots=c((-a1+sqrt(D))/(2*a2),(-a1-sqrt(D))/(2*a2))
+ }else if(D==0){
+ roots=-a1/(2*a2)
+ }else{
+ roots=c('No Root')
+ }
> roots
[1] -1 -3
  • if(){} 뒤에 바로 else if(){} or else{}

function

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
> rt=function(a2,a1,a0){
+ D=a1^2-4*a2*a0
+ if(D>0){
+ roots=c((-a1+sqrt(D))/(2*a2),(-a1-sqrt(D))/(2*a2))
+ }else if(D==0){
+ roots=-a1/(2*a2)
+ }else{
+ roots=c('No Root')
+ }
+ return(roots)
+ }
> rt(1,2,3)
[1] "No Root"
> fac=function(val){
+ out=1
+ if(val>0){
+ for(i in 1:val){
+ out=out*i
+ }
+ }else if(val==0){
+ out=1
+ }else{
+ out='error'
+ }
+ return(out)
+ }
> fac(-1)
[1] "error"
> fac(0)
[1] 1
> fac(5)
[1] 120

왜도(비대칭도)

확률 이론 및 통계학에서, 비대칭도(非對稱度, skewness) 또는 왜도(歪度)는 실수 값 확률 변수의 확률 분포 비대칭성을 나타내는 지표이다. 왜도의 값은 양수나 음수가 될 수 있으며 정의되지 않을 수도 있다. 왜도가 음수일 경우에는 확률밀도함수의 왼쪽 부분에 긴 꼬리를 가지며 중앙값을 포함한 자료가 오른쪽에 더 많이 분포해 있다. 왜도가 양수일 때는 확률밀도함수의 오른쪽 부분에 긴 꼬리를 가지며 자료가 왼쪽에 더 많이 분포해 있다는 것을 나타낸다. 평균과 중앙값이 같으면 왜도는 0이 된다.

skewness-1
skewness-2