R (2)

변수 지우기

1
2
3
> rm(list=ls())
> ls()
character(0)
  • ls()는 지금까지 생성한 변수들의 목록을 문자열들로 반환하는 함수(list의 약자)

기본 Data 제공

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
> summary(trees)
Girth Height Volume
Min. : 8.30 Min. :63 Min. :10.20
1st Qu.:11.05 1st Qu.:72 1st Qu.:19.40
Median :12.90 Median :76 Median :24.20
Mean :13.25 Mean :76 Mean :30.17
3rd Qu.:15.25 3rd Qu.:80 3rd Qu.:37.30
Max. :20.60 Max. :87 Max. :77.00
> summary(chickwts)
weight feed
Min. :108.0 casein :12
1st Qu.:204.5 horsebean:10
Median :258.0 linseed :12
Mean :261.3 meatmeal :11
3rd Qu.:323.5 soybean :14
Max. :423.0 sunflower:12
> summary(cars)
speed dist
Min. : 4.0 Min. : 2.00
1st Qu.:12.0 1st Qu.: 26.00
Median :15.0 Median : 36.00
Mean :15.4 Mean : 42.98
3rd Qu.:19.0 3rd Qu.: 56.00
Max. :25.0 Max. :120.00

자료 종류

  • 범주형 : 빈도 chickwts - feed
  • 수치형 : summary() 나머지

기말 - 자료의 종류에 따라서 통계분석 기법은 정해짐


Data package

1
2
3
library(MASS)
install.packages("UsingR")
library(UsingR)
  • Package - 다른사람이 제공하는 DataFunction을 저장해둔곳
  • library()C에서의 #include
  • /Library/Frameworks/R.framework/Versions/3.6/Resources/library
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
> summary(Cars93)
Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway
Chevrolet: 8 100 : 1 Compact:16 Min. : 6.70 Min. : 7.40 Min. : 7.9 Min. :15.00 Min. :20.00
Ford : 8 190E : 1 Large :11 1st Qu.:10.80 1st Qu.:12.20 1st Qu.:14.7 1st Qu.:18.00 1st Qu.:26.00
Dodge : 6 240 : 1 Midsize:22 Median :14.70 Median :17.70 Median :19.6 Median :21.00 Median :28.00
Mazda : 5 300E : 1 Small :21 Mean :17.13 Mean :19.51 Mean :21.9 Mean :22.37 Mean :29.09
Pontiac : 5 323 : 1 Sporty :14 3rd Qu.:20.30 3rd Qu.:23.30 3rd Qu.:25.3 3rd Qu.:25.00 3rd Qu.:31.00
Buick : 4 535i : 1 Van : 9 Max. :45.40 Max. :61.90 Max. :80.0 Max. :46.00 Max. :50.00
(Other) :57 (Other):87
AirBags DriveTrain Cylinders EngineSize Horsepower RPM Rev.per.mile Man.trans.avail
Driver & Passenger:16 4WD :10 3 : 3 Min. :1.000 Min. : 55.0 Min. :3800 Min. :1320 No :32
Driver only :43 Front:67 4 :49 1st Qu.:1.800 1st Qu.:103.0 1st Qu.:4800 1st Qu.:1985 Yes:61
None :34 Rear :16 5 : 2 Median :2.400 Median :140.0 Median :5200 Median :2340
6 :31 Mean :2.668 Mean :143.8 Mean :5281 Mean :2332
8 : 7 3rd Qu.:3.300 3rd Qu.:170.0 3rd Qu.:5750 3rd Qu.:2565
rotary: 1 Max. :5.700 Max. :300.0 Max. :6500 Max. :3755

Fuel.tank.capacity Passengers Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room
Min. : 9.20 Min. :2.000 Min. :141.0 Min. : 90.0 Min. :60.00 Min. :32.00 Min. :19.00 Min. : 6.00
1st Qu.:14.50 1st Qu.:4.000 1st Qu.:174.0 1st Qu.: 98.0 1st Qu.:67.00 1st Qu.:37.00 1st Qu.:26.00 1st Qu.:12.00
Median :16.40 Median :5.000 Median :183.0 Median :103.0 Median :69.00 Median :39.00 Median :27.50 Median :14.00
Mean :16.66 Mean :5.086 Mean :183.2 Mean :103.9 Mean :69.38 Mean :38.96 Mean :27.83 Mean :13.89
3rd Qu.:18.80 3rd Qu.:6.000 3rd Qu.:192.0 3rd Qu.:110.0 3rd Qu.:72.00 3rd Qu.:41.00 3rd Qu.:30.00 3rd Qu.:15.00
Max. :27.00 Max. :8.000 Max. :219.0 Max. :119.0 Max. :78.00 Max. :45.00 Max. :36.00 Max. :22.00
NA's :2 NA's :11
Weight Origin Make
Min. :1695 USA :48 Acura Integra: 1
1st Qu.:2620 non-USA:45 Acura Legend : 1
Median :3040 Audi 100 : 1
Mean :3073 Audi 90 : 1
3rd Qu.:3525 BMW 535i : 1
Max. :4105 Buick Century: 1
(Other) :87

Work directory

1
2
3
4
5
> getwd()
[1] "/Users/zerohertz/Documents/R"
> setwd("/Users/zerohertz/Downloads")
> getwd()
[1] "/Users/zerohertz/Downloads"
  • ₩, \대신 /사용

save.image()

1
2
3
a=1+1
save.image(file="/Users/zerohertz/Downloads/asd.RData")
load("/Users/zerohertz/Downloads/asd.RData")

no-variable
load

  • 숨김파일은 cmd + . + shift
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
> ls()
character(0)
> a="test"
> a
[1] "test"
> save.image()
> getwd()
[1] "/Users/zerohertz"
> setwd("/Users/zerohertz/Downloads")
> b="Another Work Directory"
> save.image()
> getwd()
[1] "/Users/zerohertz/Downloads"
> rm(list=ls())
> load("/Users/zerohertz/.RData")
> a
[1] "test"
> b
Error: object 'b' not found
> rm(list=ls())
> load("/Users/zerohertz/Downloads/.RData")
> a
[1] "test"
> b
[1] "Another Work Directory"

str()

1
2
3
4
> str(cars)
'data.frame': 50 obs. of 2 variables:
$ speed: num 4 4 7 7 8 9 10 10 10 11 ...
$ dist : num 2 10 4 22 16 10 18 26 34 17 ...
  • Structure의 약자

자료형

1
2
3
4
5
6
7
8
> mean(cars)
[1] NA
Warning message:
In mean.default(cars) : argument is not numeric or logical: returning NA
> typeof(cars)
[1] "list"
> typeof(speed)
[1] "double"

attach()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
> mean(cars$speed)
[1] 15.4
> sd(cars$dist)
[1] 25.76938
> mean(speed)
Error in mean(speed) : object 'speed' not found
> sd(dist)
Error in as.double(x) :
cannot coerce type 'closure' to vector of type 'double'
> attach(cars)
> mean(cars$speed)
[1] 15.4
> sd(cars$dist)
[1] 25.76938
> mean(speed)
[1] 15.4
> sd(dist)
[1] 25.76938
  • C++Using namespace와 유사

객체 Vector/Matrix/data.frame

1
2
3
x=c(1,3,5,4,3,3) #숫자형 벡터
y=c("one","two","three") #문자형 벡터
z=c(TRUE,TRUE,FALSE,F) #논리형 벡터
  • Vector : 숫자, 문자, 논리만의 Vector 가능
  • Matrix : 행과 열/각 열이 숫자, 문자 논리만 가지는 행렬 가능
  • data.frame : 각 열이 숫자, 문자, 논리 혼합 가능(행렬과 같은 2차원 구조 / 모든 데이터가 동일한 유형일 필요 X)
  • 행 = Case, 열 = Variable

Indexing

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
> x[1]
[1] 1
> x[2]
[1] 3
> x[3]
[1] 5
> x[4]
[1] 4
> x[5]
[1] 3
> x[1,3]
Error in x[1, 3] : incorrect number of dimensions
> x[c(1,3)]
[1] 1 5
> x[1:3]
[1] 1 3 5
> cars[1,2] #Matrix[행,열]
[1] 2
> y1=y[c(1,3)];y1
[1] "one" "three"
> a[2,3] #a is matrix
[1] 8
> a[1,]
[1] 1 4 7
> a[,2]
[1] 4 5 6
  • a[x,y] - Matrix or data.frame

factor()

1
2
3
4
5
6
7
8
9
10
11
12
> gender=c("m","f","m","f");gender
[1] "m" "f" "m" "f"
> str(gender)
chr [1:4] "m" "f" "m" "f"
> f.gender=factor(gender);f.gender
[1] m f m f
Levels: f m
> str(f.gender)
Factor w/ 2 levels "f","m": 2 1 2 1
> summary(f.gender)
f m
2 2
  • 범주형자료는 요인(factor)로 무조건 변경
  • 알파벳순
1
2
3
4
5
6
> f.gender2=factor(gender,order=TRUE,level=c("m",'f'))
> str(f.gender2)
Ord.factor w/ 2 levels "m"<"f": 1 2 1 2
> summary(f.gender2)
m f
2 2
  • order=TRUE - 오름차순
  • level - 직접정의, 올바르지 않은 data 찾기

Matrix

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
> a=matrix(1:9,nrow=3);a
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> b=matrix(1:9,nrow=3,byrow=TRUE);b
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> c=matrix(1:100,nrow=100);c
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
...
[98,] 98
[99,] 99
[100,] 100
  • small n - 표본
  • nrow - 행의 개수
  • Matrix,가 들어감
1
2
3
4
5
6
7
8
9
> cnames=c("c1","c2","c3");cnames
[1] "c1" "c2" "c3"
> rnames=c("r1","r2","r3");rnames
[1] "r1" "r2" "r3"
> z=matrix(1:9,nrow=3,dimnames=list(rnames,cnames));z
c1 c2 c3
r1 1 4 7
r2 2 5 8
r3 3 6 9
  • 행과 열의 이름 부여
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> a
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> a[2,3]
[1] 8
> a[1,]
[1] 1 4 7
> a[,2]
[1] 4 5 6
> mean(a[,2])
[1] 5
> mean(a)
[1] 5
> sum(a)
[1] 45

dim()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
> d=1:24
> dim(d)=c(4,3,2);d
, , 1

[,1] [,2] [,3]
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

, , 2

[,1] [,2] [,3]
[1,] 13 17 21
[2,] 14 18 22
[3,] 15 19 23
[4,] 16 20 24

data.frame()

1
2
3
4
5
6
7
8
9
10
11
12
13
14
> a1=c(1,2,3,4)
> a2=c('f','m','f','m')
> a12=data.frame(a1,a2);a12
a1 a2
1 1 f
2 2 m
3 3 f
4 4 m
> a123=data.frame(age=a1,gender=a2);a123
age gender
1 1 f
2 2 m
3 3 f
4 4 m
  • 옆으로 붙인다는 것을 잘 생각해야함
  • 데이터 문자는 항상 ""해야함

rbind()

1
2
3
4
5
6
7
8
9
10
> a_add=data.frame(age=30,gender='f');a_add
age gender
1 30 f
> a12t=rbind(a123,a_add);a12t #위 아래로 행을 붙임(Case 추가)
age gender
1 1 f
2 2 m
3 3 f
4 4 m
5 30 f
  • 위 아래로 행을 붙임(Case 추가)
  • cbind - 옆으로 열을 붙임(변수 추가)

length()/dim()

1
2
3
4
5
6
7
8
> length(a12t)
[1] 2
> dim(a12t
[1] 5 2
> str(a12t)
'data.frame': 5 obs. of 2 variables:
$ age : num 1 2 3 4 30
$ gender: Factor w/ 2 levels "f","m": 1 2 1 2 1
  • length() - 열의 수
  • dim() - 행과 열의 수
  • str() - danawa

정리

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
#전산실습 Day 1
rm(list=ls())

speed=cars$speed;speed
dist=cars$dist;dist
summary(speed)
summary(dist)
mean(speed)
mean(dist)

#전산실습 Day 2
summary(trees)
summary(chickwts)
summary(cars)

library(MASS)
summary(Cars93)

#install.packages("UsingR")
library(UsingR)

ls()

getwd()
setwd("/Users/zerohertz")
#setwd("/Users/zerohertz/Documents/R")
#setwd("/Users/zerohertz/Downloads")
getwd()

summary(cars)
mean(speed)
mean(cars$speed)

str(cars)
cars[1,2]
summary(cars)
attach(cars)
mean(cars$speed)
sd(cars$dist)
mean(speed)
sd(dist)
detach(cars)

save.image()

x=c(1,3,5,4,3,3) #숫자형 벡터
y=c("one","two","three") #문자형 벡터
z=c(TRUE,TRUE,FALSE,F) #논리형 벡터

y1=y[c(1,3)];y1

gender=c("m","f","m","f");gender
str(gender)
f.gender=factor(gender);f.gender
str(f.gender)
summary(f.gender)
f.gender2=factor(gender,order=TRUE,level=c("m",'f'))
str(f.gender2)
summary(f.gender2)

str(Cars93)

a=matrix(1:9,nrow=3);a
b=matrix(1:9,nrow=3,byrow=TRUE);b
c=matrix(1:100,nrow=100);c
cnames=c("c1","c2","c3");cnames
rnames=c("r1","r2","r3");rnames
z=matrix(1:9,nrow=3,dimnames=list(rnames,cnames));z

a[2,3]
a[1,]
a[,2]
mean(a[,2])
mean(a)
sum(a)

d=1:24
dim(d)=c(4,3,2);d

a1=c(1,2,3,4)
a2=c('f','m','f','m')
a12=data.frame(a1,a2);a12
a123=data.frame(age=a1,gender=a2);a123
a_add=data.frame(age=30,gender='f');a_add
a12t=rbind(a123,a_add);a12t #위 아래로 행을 붙임(Case 추가)

length(a12t)
dim(a12t)
str(a12t)

#q()

get data - str() - summary() - 문자면 factor() - level()