No missing values

read.table

setwd("/Users/zerohertz")
text=read.table('Data.txt',header=TRUE) #first row holds the column names
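As a minimal sketch of the call above, the snippet below writes a small whitespace-separated table to a temporary file and reads it back. The file contents and the column names `speed` and `dist` are made up for illustration; the real `Data.txt` is not reproduced here.

```r
# Hypothetical stand-in for Data.txt: a header row followed by three data rows.
path <- tempfile(fileext = ".txt")
writeLines(c("speed dist", "4 2", "7 4", "8 16"), path)

# header = TRUE treats the first row as column names rather than data.
text <- read.table(path, header = TRUE)

str(text)              # structure: 3 observations of 2 variables
any(is.na(text))       # FALSE, since the file has no missing values
```

Spelling out `TRUE` instead of `T` is a little safer, because `T` is an ordinary variable that can be reassigned.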

Picking out data

indexing - []


Normality test functions

shapiro.test(), qqnorm(), qqline()

> library(MASS)
> attach(Cars93)
> shapiro.test(Price)

Shapiro-Wilk normality test

data: Price
W = 0.88051, p-value = 4.235e-07

> str(Price)
num [1:93] 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
> qqnorm(Price)
> qqline(Price)
> qqnorm(log(Price))
> qqline(log(Price))
> shapiro.test(log(Price))

Shapiro-Wilk normality test

data: log(Price)
W = 0.9841, p-value = 0.32
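In the session above, `Price` is rejected as normal (p-value around 4e-07) while `log(Price)` is not (p-value 0.32). The same pattern can be sketched on simulated data, so the example does not depend on `MASS`. The seed, sample size and log-normal parameters below are arbitrary assumptions, not values taken from `Cars93`.

```r
set.seed(1)                                # arbitrary seed, for reproducibility only
x <- rlnorm(93, meanlog = 3, sdlog = 0.5)  # simulated right-skewed data, not Cars93

raw_test <- shapiro.test(x)        # H0: the data come from a normal distribution
log_test <- shapiro.test(log(x))   # the log of log-normal data is normal by construction

# A small p-value is evidence against normality; compare the two.
c(raw = raw_test$p.value, log = log_test$p.value)
```

`qqnorm()` and `qqline()` give the visual counterpart: points close to the line suggest approximate normality.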

Reference
Detection


Writing equations with $$

> $f(x)=x^2$

> $$
> f(x)=x^2
> $$

$f(x)=x^2$

$$
f(x)=x^2
$$


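Statistical inference

Hypothesis testing

The goal is to learn a characteristic of the population (the parameter $\mu$) from a sample.

  • Uppercase X: a random variable (rv)
  • Lowercase x: an observed sample value (a single number)

Random variable → statistic → sampling distribution (the probability distribution of the statistic)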
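The chain from random variable to statistic to sampling distribution can be illustrated by simulation. This is only a sketch: the normal population with mean 10 and standard deviation 2, the sample size and the number of replicates are all arbitrary assumptions.

```r
set.seed(42)                      # arbitrary seed
mu <- 10; sigma <- 2; n <- 25     # assumed population parameters and sample size

# Each replicate draws one sample (the lowercase x values) and reduces it
# to a statistic, here the sample mean.
xbar <- replicate(5000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(xbar)   # close to mu
sd(xbar)     # close to sigma / sqrt(n) = 0.4
hist(xbar)   # empirical sampling distribution of the mean
```

The spread of `xbar` is much smaller than `sigma`, which is what makes the sample mean useful for inference about $\mu$.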


ifelse

> x=1
> y=2
> if(x>y) x else y
[1] 2
> if(x<y) x else y
[1] 1
> ifelse(x>y,x,y) #condition, value if TRUE (x), value if FALSE (y)
[1] 2
> ifelse(x<y,x,y) #condition, value if TRUE (x), value if FALSE (y)
[1] 1
> grade=ifelse(airquality$Temp>=60,'상','하');grade #'상' = high, '하' = low
[1] "상" "상" "상" "상" "하" "상" "상" "하" "상" "상" "상" "상" "상" "상" "하" "상" "상" "하" "상"
...
[153] "상"
> air0=data.frame(airquality,grade);air0
   Ozone Solar.R Wind Temp Month Day grade
1     41     190  7.4   67     5   1    상
2     36     118  8.0   72     5   2    상
3     12     149 12.6   74     5   3    상
4     18     313 11.5   62     5   4    상
5     NA      NA 14.3   56     5   5    하
6     28      NA 14.9   66     5   6    상
7     23     299  8.6   65     5   7    상
8     19      99 13.8   59     5   8    하
9      8      19 20.1   61     5   9    상
10    NA     194  8.6   69     5  10    상
...
> airquality$grade=ifelse(airquality$Temp>=60,'상','하');airquality
   Ozone Solar.R Wind Temp Month Day grade
1     41     190  7.4   67     5   1    상
2     36     118  8.0   72     5   2    상
3     12     149 12.6   74     5   3    상
4     18     313 11.5   62     5   4    상
5     NA      NA 14.3   56     5   5    하
6     28      NA 14.9   66     5   6    상
7     23     299  8.6   65     5   7    상
8     19      99 13.8   59     5   8    하
9      8      19 20.1   61     5   9    상
10    NA     194  8.6   69     5  10    상
...
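The session mixes `if ... else`, which takes a single condition, with `ifelse()`, which works element by element. A short sketch of the difference, using the first five `Temp` values from the output above; the labels `"high"` and `"low"` are English stand-ins chosen here for '상' and '하'.

```r
temp <- c(67, 72, 74, 62, 56)   # first five Temp values from the output above

# ifelse() evaluates the condition for every element and returns a vector.
grade <- ifelse(temp >= 60, "high", "low")
grade

# if() expects a single TRUE/FALSE, so it is applied to one element at a time.
first <- if (temp[1] >= 60) "high" else "low"
first

# A missing value in the condition propagates to the result.
ifelse(c(61, NA) >= 60, "high", "low")
```

That vectorized behaviour is why `ifelse()` can fill a whole new column such as `airquality$grade` in one call.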

Selecting data

> library(MASS)
> str(Cars93)
'data.frame': 93 obs. of 27 variables:
$ Manufacturer : Factor w/ 32 levels "Acura","Audi",..: 1 1 2 2 3 4 4 4 4 5 ...
$ Model : Factor w/ 93 levels "100","190E","240",..: 49 56 9 1 6 24 54 74 73 35 ...
$ Type : Factor w/ 6 levels "Compact","Large",..: 4 3 1 3 3 3 2 2 3 2 ...
$ Min.Price : num 12.9 29.2 25.9 30.8 23.7 14.2 19.9 22.6 26.3 33 ...
$ Price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
$ Max.Price : num 18.8 38.7 32.3 44.6 36.2 17.3 21.7 24.9 26.3 36.3 ...
$ MPG.city : int 25 18 20 19 22 22 19 16 19 16 ...
$ MPG.highway : int 31 25 26 26 30 31 28 25 27 25 ...
$ AirBags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
$ DriveTrain : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
$ Cylinders : Factor w/ 6 levels "3","4","5","6",..: 2 4 4 4 2 2 4 4 4 5 ...
$ EngineSize : num 1.8 3.2 2.8 2.8 3.5 2.2 3.8 5.7 3.8 4.9 ...
$ Horsepower : int 140 200 172 172 208 110 170 180 170 200 ...
$ RPM : int 6300 5500 5500 5500 5700 5200 4800 4000 4800 4100 ...
$ Rev.per.mile : int 2890 2335 2280 2535 2545 2565 1570 1320 1690 1510 ...
$ Man.trans.avail : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
$ Fuel.tank.capacity: num 13.2 18 16.9 21.1 21.1 16.4 18 23 18.8 18 ...
$ Passengers : int 5 5 5 6 4 6 6 6 5 6 ...
$ Length : int 177 195 180 193 186 189 200 216 198 206 ...
$ Wheelbase : int 102 115 102 106 109 105 111 116 108 114 ...
$ Width : int 68 71 67 70 69 69 74 78 73 73 ...
$ Turn.circle : int 37 38 37 37 39 41 42 45 41 43 ...
$ Rear.seat.room : num 26.5 30 28 31 27 28 30.5 30.5 26.5 35 ...
$ Luggage.room : int 11 15 14 17 13 16 17 21 14 18 ...
$ Weight : int 2705 3560 3375 3405 3640 2880 3470 4105 3495 3620 ...
$ Origin : Factor w/ 2 levels "USA","non-USA": 2 2 2 2 2 1 1 1 1 1 ...
$ Make : Factor w/ 93 levels "Acura Integra",..: 1 2 4 3 5 6 7 9 8 10 ...
> attach(Cars93)
> a=Cars93[which(MPG.city>30),'Model'];a #only the Model variable for rows with MPG.city>30
[1] Festiva Metro Civic LeMans Justy Swift Tercel
93 Levels: 100 190E 240 300E 323 535i 626 850 90 900 Accord Achieva Aerostar Altima ... Vision
> b=Cars93[which(MPG.city>30),c('Model','Origin')];b
Model Origin
31 Festiva USA
39 Metro non-USA
42 Civic non-USA
73 LeMans USA
80 Justy non-USA
83 Swift non-USA
84 Tercel non-USA
> c=Cars93[which(MPG.city>30),c('Model','Origin','MPG.city')];c
Model Origin MPG.city
31 Festiva USA 31
39 Metro non-USA 46
42 Civic non-USA 42
73 LeMans USA 31
80 Justy non-USA 33
83 Swift non-USA 39
84 Tercel non-USA 32
> d=Cars93[which(MPG.city>30),];d #leaving the column index empty returns every column
Manufacturer Model Type Min.Price Price Max.Price MPG.city MPG.highway AirBags DriveTrain
31 Ford Festiva Small 6.9 7.4 7.9 31 33 None Front
39 Geo Metro Small 6.7 8.4 10.0 46 50 None Front
42 Honda Civic Small 8.4 12.1 15.8 42 46 Driver only Front
73 Pontiac LeMans Small 8.2 9.0 9.9 31 41 None Front
80 Subaru Justy Small 7.3 8.4 9.5 33 37 None 4WD
83 Suzuki Swift Small 7.3 8.6 10.0 39 43 None Front
84 Toyota Tercel Small 7.8 9.8 11.8 32 37 Driver only Front
Cylinders EngineSize Horsepower RPM Rev.per.mile Man.trans.avail Fuel.tank.capacity Passengers
31 4 1.3 63 5000 3150 Yes 10.0 4
39 3 1.0 55 5700 3755 Yes 10.6 4
42 4 1.5 102 5900 2650 Yes 11.9 4
73 4 1.6 74 5600 3130 Yes 13.2 4
80 3 1.2 73 5600 2875 Yes 9.2 4
83 3 1.3 70 6000 3360 Yes 10.6 4
84 4 1.5 82 5200 3505 Yes 11.9 5
Length Wheelbase Width Turn.circle Rear.seat.room Luggage.room Weight Origin Make
31 141 90 63 33 26.0 12 1845 USA Ford Festiva
39 151 93 63 34 27.5 10 1695 non-USA Geo Metro
42 173 103 67 36 28.0 12 2350 non-USA Honda Civic
73 177 99 66 35 25.5 17 2350 USA Pontiac LeMans
80 146 90 60 32 23.5 10 2045 non-USA Subaru Justy
83 161 93 63 34 27.5 10 1965 non-USA Suzuki Swift
84 162 94 65 36 24.0 11 2055 non-USA Toyota Tercel
> e=Cars93[which(Cylinders==4&Manufacturer=='Hyundai'),c('Model','Min.Price','Max.Price')];e
Model Min.Price Max.Price
44 Excel 6.8 9.2
45 Elantra 9.0 11.0
46 Scoupe 9.1 11.0
47 Sonata 12.4 15.3
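The selections above all wrap the condition in `which()`. One reason that matters shows up when the column contains missing values. The sketch below uses the built-in `airquality` data rather than `Cars93`, and the threshold of 100 is an arbitrary choice for illustration.

```r
air <- airquality                       # built-in data set with NAs in Ozone

# which() returns the positions where the condition is TRUE and drops NAs.
idx <- which(air$Ozone > 100)
high <- air[idx, c("Ozone", "Temp", "Month")]

# A bare logical index keeps NA conditions as rows filled with NA.
high_na <- air[air$Ozone > 100, c("Ozone", "Temp", "Month")]

nrow(high)      # rows that really satisfy the condition
nrow(high_na)   # the same rows plus one per missing Ozone value

# subset() also drops NA conditions and lets columns be named without quotes.
same <- subset(air, Ozone > 100, select = c(Ozone, Temp, Month))
```

Indexing through `air$Ozone` instead of relying on `attach()` also avoids surprises when two attached data frames share a column name.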

sink()

setwd("/Users/zerohertz/RData")
air=airquality
str(air)
attach(air)
sink('output.txt') #save console output to an external file
mean(Temp)
sd(Temp)
sink() #marks the end; output goes back to the console
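Every `sink(file)` needs a matching `sink()`, otherwise later output keeps going to the file. The sketch below repeats the idea with a temporary file in place of `output.txt` (an assumption made so the example leaves nothing behind) and reads the result back to check it.

```r
out <- tempfile(fileext = ".txt")   # temporary path used instead of output.txt

sink(out)                           # start diverting printed output to the file
print(mean(airquality$Temp))        # explicit print() also works inside functions and loops
print(sd(airquality$Temp))
sink()                              # stop diverting; output returns to the console

saved <- readLines(out)             # the two printed results, one per line
saved

# capture.output() returns the same text as a character vector without
# changing where later output goes.
captured <- capture.output(print(mean(airquality$Temp)))
```

For a single value or two, `capture.output()` or `writeLines()` is often simpler than opening and closing a sink.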

Venture companies (Day 2)

Venture?

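There is no clear, agreed academic definition of a venture company.

  • United States: a newly founded company that independently pursues a new technology or idea that carries high risk but promises high expected returns if it succeeds
  • Japan: a small or medium-sized company whose R&D investment is at least 3% of total sales; a company founded less than 5 years ago
  • OECD: a company with high R&D intensity, or one for which technological innovation or technological superiority is the main factor in its success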

R&D

Research & Development

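Types of venture companies

  • Venture-capital-backed companies
  • R&D companies
  • Technology-appraisal guarantee companies and technology-appraisal loan companies
  • Prospective venture companies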

Time series data

> uspop
Time Series:
Start = 1790
End = 1970
Frequency = 0.1
[1] 3.93 5.31 7.24 9.64 12.90 17.10 23.20 31.40 39.80 50.20 62.90 76.00 92.00 105.70 122.80 131.70 151.30 179.30
[19] 203.20
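The `Start`, `End` and `Frequency` lines in the printout are attributes of the `ts` object and can be queried directly. A brief sketch on the same built-in `uspop` series; the cut-off year 1900 is an arbitrary choice.

```r
start(uspop)       # first time point: 1790
end(uspop)         # last time point: 1970
frequency(uspop)   # 0.1 observations per year, i.e. one per decade
deltat(uspop)      # 10 years between observations

# time() returns the time stamp of every observation.
years <- as.numeric(time(uspop))

# window() extracts a sub-series by time rather than by position.
since_1900 <- window(uspop, start = 1900)
since_1900
```

Selecting with `window()` keeps the result a `ts` object, whereas plain `[` indexing returns an ordinary numeric vector.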

Removing variables

> rm(list=ls())
> ls()
character(0)
  • ls() returns the names of the variables created so far as a character vector (the name is short for "list")
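`rm(list=ls())` clears everything at once; `rm()` can also remove objects one at a time. The sketch below works in a scratch environment so that running it does not wipe the current workspace. The environment and the object names `a`, `b`, `c` are hypothetical.

```r
env <- new.env()                  # scratch environment; the global workspace is untouched
assign("a", 1, envir = env)
assign("b", 2, envir = env)
assign("c", 3, envir = env)

ls(env)                           # "a" "b" "c"

rm("a", envir = env)              # remove a single object by name
after_one <- ls(env)              # "b" "c"

rm(list = ls(env), envir = env)   # remove everything that is left
after_all <- ls(env)              # character(0)
```

At the console the `envir` argument can be dropped, in which case both functions act on the global workspace as in the session above.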

R

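#Computing lab, Day 1
speed=c(4,7,8,9,10,11,12,13,13,14);speed #c() combines several values; Ctrl+R runs only the current line
dist=c(2,4,16,10,18,17,24,34,26,26);dist #names are case-sensitive
mean(speed)
mean(dist) #if the data contain a missing value (NA), arithmetic on them gives NA
#mean() and sd() also return NA unless na.rm=TRUE is passed to drop the missing values
sd(speed) #standard deviation
#numeric data: central tendency (mean), dispersion (sd), asymmetry, i.e. skewness (skew)
#mean: location of the center
min(speed);max(speed) #helps spot data-entry errors
summary(speed)
plot(speed,dist)
cor(speed,dist) #correlation coefficient (check the scatter plot first): strength of the linear relationship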
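The comments above touch on missing values. In base R, `mean()` and `sd()` return `NA` when the data contain an `NA`, unless `na.rm = TRUE` is given. A small sketch with made-up vectors (the values are not from the lab data):

```r
dist_na <- c(2, 4, 16, NA, 18)       # made-up values with one missing entry

sum(dist_na) / length(dist_na)       # NA: arithmetic with NA yields NA
mean(dist_na)                        # NA as well, because na.rm defaults to FALSE
mean(dist_na, na.rm = TRUE)          # 10: the mean of 2, 4, 16, 18
sd(dist_na, na.rm = TRUE)            # standard deviation of the four observed values

speed_ok <- c(4, 7, 8, 9, 10)        # hypothetical matching x values
r <- cor(speed_ok, dist_na, use = "complete.obs")  # drops the pair containing NA
r
```

`summary()` differs from these: it reports the number of `NA`s separately and computes the remaining statistics from the observed values.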

Species classification by sepal

clear

%Load data (Fisher's iris; needs Statistics and Machine Learning Toolbox)
load fisheriris
x=meas(:,1:2); %columns 1-2: sepal length and sepal width
y=categorical(species);
labels=categories(y);
figure(1)
gscatter(x(:,1),x(:,2),species,'rgb','osd');
xlabel('Sepal length');
ylabel('Sepal width');

%Learning by data (fitc* functions replace the older Classification*.fit syntax)
classifier{1}=fitcdiscr(x,y);
classifier{2}=fitctree(x,y);
classifier{3}=fitcknn(x,y);
%classifier{4}=fitcnb(x,y); %originally NaiveBayes.fit, disabled for the 2019 version
classifier_name={'Discriminant Analysis','Classification Tree','Nearest Neighbor'}; %'Naive Bayes'

%Check the result: predict on a dense grid to show the decision regions
[xx1,xx2]=meshgrid(4:.01:8,2:.01:4.5);
figure(2)
for ii=1:numel(classifier)
    ypred=predict(classifier{ii},[xx1(:) xx2(:)]);
    h(ii)=subplot(2,2,ii);
    gscatter(xx1(:),xx2(:),ypred,'rgb');
    title(classifier_name{ii},'FontSize',15)
    legend off
    axis tight
end

%Confusion Matrix for the classification tree on the training data
figure(3)
predictResult=predict(classifier{2},x);
predictResult=categorical(predictResult);
plotconfusion(y,predictResult); %y is already categorical; needs Deep Learning Toolbox