R (3)

시계열 Data

1
2
3
4
5
6
7
> uspop
Time Series:
Start = 1790
End = 1970
Frequency = 0.1
[1] 3.93 5.31 7.24 9.64 12.90 17.10 23.20 31.40 39.80 50.20 62.90 76.00 92.00 105.70 122.80 131.70 151.30 179.30
[19] 203.20
  • 기댓값 - 가능한 값마다 확률을 곱해서 모두 더한 값(확률변수의 평균)
  • 평균값 - 데이터를 모두 더한 후 데이터의 갯수로 나눈 값
  • 확률변수 - 대문자(소문자는 상수)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
> airquality
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
...
148 14 20 16.6 63 9 25
149 30 193 6.9 70 9 26
150 NA 145 13.2 77 9 27
151 14 191 14.3 75 9 28
152 18 131 8.0 76 9 29
153 20 223 11.5 68 9 30
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...
> dim(airquality) #dimension
[1] 153 6
> length(airquality)
[1] 6
> names(airquality)
[1] "Ozone" "Solar.R" "Wind" "Temp" "Month" "Day"
> summary(airquality)
Ozone Solar.R Wind Temp Month Day
Min. : 1.00 Min. : 7.0 Min. : 1.700 Min. :56.00 Min. :5.000 Min. : 1.0
1st Qu.: 18.00 1st Qu.:115.8 1st Qu.: 7.400 1st Qu.:72.00 1st Qu.:6.000 1st Qu.: 8.0
Median : 31.50 Median :205.0 Median : 9.700 Median :79.00 Median :7.000 Median :16.0
Mean : 42.13 Mean :185.9 Mean : 9.958 Mean :77.88 Mean :6.993 Mean :15.8
3rd Qu.: 63.25 3rd Qu.:258.8 3rd Qu.:11.500 3rd Qu.:85.00 3rd Qu.:8.000 3rd Qu.:23.0
Max. :168.00 Max. :334.0 Max. :20.700 Max. :97.00 Max. :9.000 Max. :31.0
NA's :37 NA's :7
> summary(airquality$Ozone)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.00 18.00 31.50 42.13 63.25 168.00 37
> mean(airquality$Ozone)
[1] NA
  • NA - Not Available(결측값)
  • 문자의 결측값 - <NA>

표준화

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
> attach(airquality)
> z.Temp=(Temp-mean(Temp))/sd(Temp);z.Temp #표준화
[1] -1.14971398 -0.62146702 -0.41016823 -1.67796094 -2.31185730 -1.25536337 -1.36101276 -1.99490912 -1.78361034 -0.93841519
[11] -0.41016823 -0.93841519 -1.25536337 -1.04406459 -2.10055851 -1.46666216 -1.25536337 -2.20620791 -1.04406459 -1.67796094
[21] -1.99490912 -0.51581762 -1.78361034 -1.78361034 -2.20620791 -2.10055851 -2.20620791 -1.14971398 0.32937752 0.11807873
[31] -0.19886945 0.01242934 -0.41016823 -1.14971398 0.64632570 0.75197509 0.11807873 0.43502691 0.96327387 1.28022205
[41] 0.96327387 1.59717023 1.49152084 0.43502691 0.22372813 0.11807873 -0.09322005 -0.62146702 -1.36101276 -0.51581762
[51] -0.19886945 -0.09322005 -0.19886945 -0.19886945 -0.19886945 -0.30451884 0.01242934 -0.51581762 0.22372813 -0.09322005
[61] 0.54067630 0.64632570 0.75197509 0.32937752 0.64632570 0.54067630 0.54067630 1.06892327 1.49152084 1.49152084
[71] 1.17457266 0.43502691 -0.51581762 0.32937752 1.38587145 0.22372813 0.32937752 0.43502691 0.64632570 0.96327387
[81] 0.75197509 -0.41016823 0.32937752 0.43502691 0.85762448 0.75197509 0.43502691 0.85762448 1.06892327 0.85762448
[91] 0.54067630 0.32937752 0.32937752 0.32937752 0.43502691 0.85762448 0.75197509 0.96327387 1.17457266 1.28022205
[101] 1.28022205 1.49152084 0.85762448 0.85762448 0.43502691 0.22372813 0.11807873 -0.09322005 0.11807873 -0.19886945
[111] 0.01242934 0.01242934 -0.09322005 -0.62146702 -0.30451884 0.11807873 0.32937752 0.85762448 1.06892327 2.01976780
[121] 1.70281962 1.91411841 1.70281962 1.38587145 1.49152084 1.59717023 1.59717023 0.96327387 0.64632570 0.22372813
[131] 0.01242934 -0.30451884 -0.51581762 0.32937752 -0.19886945 -0.09322005 -0.72711641 -0.72711641 0.01242934 -1.14971398
[141] -0.19886945 -1.04406459 0.43502691 -1.46666216 -0.72711641 0.32937752 -0.93841519 -1.57231155 -0.83276580 -0.09322005
[151] -0.30451884 -0.19886945 -1.04406459
> str(z.Temp)
num [1:153] -1.15 -0.621 -0.41 -1.678 -2.312 ...
> airquality2=cbind(airquality,z.Temp);airquality2
Ozone Solar.R Wind Temp Month Day z.Temp
1 41 190 7.4 67 5 1 -1.14971398
2 36 118 8.0 72 5 2 -0.62146702
3 12 149 12.6 74 5 3 -0.41016823
4 18 313 11.5 62 5 4 -1.67796094
...
141 13 27 10.3 76 9 18 -0.19886945
142 24 238 10.3 68 9 19 -1.04406459
[ reached 'max' / getOption("max.print") -- omitted 11 rows ]
> airquality3=data.frame(airquality,z.Temp);airquality3 #cbind()와 same, 이걸 더 많이 씀
Ozone Solar.R Wind Temp Month Day z.Temp
1 41 190 7.4 67 5 1 -1.14971398
2 36 118 8.0 72 5 2 -0.62146702
3 12 149 12.6 74 5 3 -0.41016823
4 18 313 11.5 62 5 4 -1.67796094
...
141 13 27 10.3 76 9 18 -0.19886945
142 24 238 10.3 68 9 19 -1.04406459
[ reached 'max' / getOption("max.print") -- omitted 11 rows ]
> airquality삼=data.frame(airquality,z.Temp);airquality삼 #한글도 가능
Ozone Solar.R Wind Temp Month Day z.Temp
1 41 190 7.4 67 5 1 -1.14971398
2 36 118 8.0 72 5 2 -0.62146702
3 12 149 12.6 74 5 3 -0.41016823
4 18 313 11.5 62 5 4 -1.67796094
...
141 13 27 10.3 76 9 18 -0.19886945
142 24 238 10.3 68 9 19 -1.04406459
[ reached 'max' / getOption("max.print") -- omitted 11 rows ]
  • 새로운 변수를 만들면 반드시 출력해서 확인

head & tail

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
> head(airquality)
Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
> tail(airquality)
Ozone Solar.R Wind Temp Month Day
148 14 20 16.6 63 9 25
149 30 193 6.9 70 9 26
150 NA 145 13.2 77 9 27
151 14 191 14.3 75 9 28
152 18 131 8.0 76 9 29
153 20 223 11.5 68 9 30
> head(airquality2)
Ozone Solar.R Wind Temp Month Day z.Temp
1 41 190 7.4 67 5 1 -1.1497140
2 36 118 8.0 72 5 2 -0.6214670
3 12 149 12.6 74 5 3 -0.4101682
4 18 313 11.5 62 5 4 -1.6779609
5 NA NA 14.3 56 5 5 -2.3118573
6 28 NA 14.9 66 5 6 -1.2553634
> tail(airquality2)
Ozone Solar.R Wind Temp Month Day z.Temp
148 14 20 16.6 63 9 25 -1.57231155
149 30 193 6.9 70 9 26 -0.83276580
150 NA 145 13.2 77 9 27 -0.09322005
151 14 191 14.3 75 9 28 -0.30451884
152 18 131 8.0 76 9 29 -0.19886945
153 20 223 11.5 68 9 30 -1.04406459
> head(airquality3)
Ozone Solar.R Wind Temp Month Day z.Temp
1 41 190 7.4 67 5 1 -1.1497140
2 36 118 8.0 72 5 2 -0.6214670
3 12 149 12.6 74 5 3 -0.4101682
4 18 313 11.5 62 5 4 -1.6779609
5 NA NA 14.3 56 5 5 -2.3118573
6 28 NA 14.9 66 5 6 -1.2553634
> tail(airquality3)
Ozone Solar.R Wind Temp Month Day z.Temp
148 14 20 16.6 63 9 25 -1.57231155
149 30 193 6.9 70 9 26 -0.83276580
150 NA 145 13.2 77 9 27 -0.09322005
151 14 191 14.3 75 9 28 -0.30451884
152 18 131 8.0 76 9 29 -0.19886945
153 20 223 11.5 68 9 30 -1.04406459
  • 간단히 확인

scan()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> x=scan()
1: 35
2: 40
3: 45
4:
Read 3 items
> x
[1] 35 40 45
> y=scan(what="character")
1: kim
2: oh
3: 2
4: ZeaMays
5:
Read 4 items
> y
[1] "kim" "oh" "2" "ZeaMays"

read.table(“.txt”), read.csv()

Data.txt

1
2
3
4
5
6
Name Score Gender
Kim 23 m
Oh 42 f
zd 21 f
Chat 31 m
Asd 23 m

Data2.txt

1
2
3
4
5
6
Name,Score,Gender
Kim,23,m
Oh,42,f
zd,21,f
Chat,31,m
Asd,23,m
  • setwd()directory.txt파일 저장
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
> setwd("/Users/zerohertz/Downloads")
> x1=read.table("Data.txt");x1
V1 V2 V3
1 Name Score Gender
2 Kim 23 m
3 Oh 42 f
4 zd 21 f
5 Chat 31 m
6 Asd 23 m
> x2=read.table("Data2.txt");x2
V1
1 Name,Score,Gender
2 Kim,23,m
3 Oh,42,f
4 zd,21,f
5 Chat,31,m
6 Asd,23,m
> x3=read.table("Data.txt",header=T);x3
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat 31 m
5 Asd 23 m
> x4=read.table("Data.txt",header=T,stringsAsFactors=F);x4
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat 31 m
5 Asd 23 m
> str(x1)
'data.frame': 6 obs. of 3 variables:
$ V1: Factor w/ 6 levels "Asd","Chat","Kim",..: 4 3 5 6 2 1
$ V2: Factor w/ 5 levels "21","23","31",..: 5 2 4 1 3 2
$ V3: Factor w/ 3 levels "f","Gender","m": 2 3 1 1 3 3
> str(x2)
'data.frame': 6 obs. of 1 variable:
$ V1: Factor w/ 6 levels "Asd,23,m","Chat,31,m",..: 4 3 5 6 2 1
> str(x3)
'data.frame': 5 obs. of 3 variables:
$ Name : Factor w/ 5 levels "Asd","Chat","Kim",..: 3 4 5 2 1
$ Score : int 23 42 21 31 23
$ Gender: Factor w/ 2 levels "f","m": 2 1 1 2 2
> str(x4)
'data.frame': 5 obs. of 3 variables:
$ Name : chr "Kim" "Oh" "zd" "Chat" ...
$ Score : int 23 42 21 31 23
$ Gender: chr "m" "f" "f" "m" ...
  • setwd()를 통해 directory 지정
  • read.table()으로 .txt import
  • header, stringsAsFactors와 같은 옵션으로 조정
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
> xyz=read.table("Data.txt",header=T);xyz
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat 31 m
5 Asd 23 .
> summary(xyz)
Name Score Gender
Asd :1 Min. :21 .:1
Chat:1 1st Qu.:23 f:2
Kim :1 Median :23 m:2
Oh :1 Mean :28
zd :1 3rd Qu.:31
Max. :42
> xyz=read.table("Data.txt",header=T);xyz
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat . m
5 Asd 23 .
> summary(xyz)
Name Score Gender
Asd :1 . :1 .:1
Chat:1 21:1 f:2
Kim :1 23:2 m:2
Oh :1 42:1
zd :1
> xyz=read.table("Data.txt",header=T,na.strings='.');xyz
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat NA m
5 Asd 23 <NA>
> summary(xyz)
Name Score Gender
Asd :1 Min. :21.00 f :2
Chat:1 1st Qu.:22.50 m :2
Kim :1 Median :23.00 NA's:1
Oh :1 Mean :27.25
zd :1 3rd Qu.:27.75
Max. :42.00
NA's :1
> xyz=read.table("Data2.txt",header=T,sep=",");xyz
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat . m
5 Asd 23 .
> xyz=read.csv("Data2.txt");xyz
Name Score Gender
1 Kim 23 m
2 Oh 42 f
3 zd 21 f
4 Chat . m
5 Asd 23 .
  • Data.txt, Data2.txt 파일에 몇가지 결측값 삽입
  • .은 문자로 인식
  • na.strings로 결측값 선언
  • sep or read.csv로 Separate
  • csv - Comma Separate Value

Excel의 Separate 형식


xlsx

1
> install.packages("xlsx")

기본 경로 설정

1
2
3
4
options("java.home"="/Library/Java/JavaVirtualMachines/jdk1.8.0_231.jdk/Contents/Home/lib")
Sys.setenv(JAVA_HOME = '/Library/Java/JavaVirtualMachines/jdk1.8.0_231.jdk/Contents/Home/jre')
dyn.load('/Library/Java/JavaVirtualMachines/jdk1.8.0_231.jdk/Contents/Home/jre/lib/server/libjvm.dylib')
Sys.setlocale("LC_ALL", "ko_KR.UTF-8")