[Kaggle Study] Kernel Density Estimation

1. sns.kdeplot

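import matplotlib.pyplot as plt
import seaborn as sns

# df_train: the training DataFrame with 'Survived' and 'Age' columns, assumed to be loaded already
f, ax = plt.subplots(1, 1, figsize=(9, 5))

# One density curve of Age per Survived class, drawn on the same axes
sns.kdeplot(df_train[df_train['Survived'] == 1]['Age'], ax=ax)
sns.kdeplot(df_train[df_train['Survived'] == 0]['Age'], ax=ax)

plt.legend(['Survived == 1', 'Survived == 0'])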

2. plot(kind = "kde")

plt.figure(figsize=(8, 6))
# One KDE curve of Age per passenger class (Pclass 1, 2, 3)
df_train['Age'][df_train['Pclass'] == 1].plot(kind='kde')
df_train['Age'][df_train['Pclass'] == 2].plot(kind='kde')
df_train['Age'][df_train['Pclass'] == 3].plot(kind='kde')

plt.xlabel('Age')
plt.title('Age Distribution within Pclass')
plt.legend(['1st', '2nd', '3rd'])

pandas.DataFrame.plot.kde

DataFrame.plot.kde(bw_method=None, ind=None, **kwargs)

Generate Kernel Density Estimate plot using Gaussian kernels.

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function (PDF) of a random variable. This function uses Gaussian kernels and includes automatic bandwidth determination.
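To make "Gaussian kernels" and "automatic bandwidth determination" concrete, here is a minimal sketch in plain Python. It assumes Scott's rule for the bandwidth, which the pandas documentation lists as the default when bw_method=None. The function names and the ages list are made up for illustration; this is not how pandas implements it internally.

```python
import math
import statistics


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scott_bandwidth(sample):
    """Scott's rule for 1-D data: sample std (ddof=1) times n ** (-1/5)."""
    n = len(sample)
    return statistics.stdev(sample) * n ** (-1.0 / 5.0)


def kde(x, sample, bandwidth=None):
    """Evaluate the Gaussian KDE of `sample` at the point `x`."""
    h = scott_bandwidth(sample) if bandwidth is None else bandwidth
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


# Hypothetical ages, made up for illustration (not taken from df_train).
ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0, 2.0, 27.0, 14.0, 4.0]

h = scott_bandwidth(ages)
grid = [g * 0.5 for g in range(-200, 401)]  # -100 .. 200 in steps of 0.5
density = [kde(g, ages) for g in grid]
area = sum(density) * 0.5  # Riemann sum over the grid; should be close to 1
```

Passing a larger bandwidth to kde() gives a smoother curve and a smaller one gives a wigglier curve, which is the knob that bw_method exposes.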

Kernel density estimation

From Wikipedia, the free encyclopedia

[Figure: Kernel density estimation of 100 normally distributed random numbers using different smoothing bandwidths.]

In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample. In some fields such as signal processing and econometrics it is also termed the Parzen–Rosenblatt window method, after Emanuel Parzen and Murray Rosenblatt, who are usually credited with independently creating it in its current form.[1][2]

Source: Kernel density estimation - Wikipedia (en.wikipedia.org)

To understand KDE, we first need to know what density estimation is.

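Density Estimation

Using the distribution of the collected data to estimate the characteristics of the variable we are interested in.

In other words, estimating that variable's probability density function.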

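Density estimation methods are divided into parametric and non-parametric.

1) Parametric: a model for the probability density function is fixed "in advance", and only the model's parameters are estimated from the data.

2) Non-parametric: the probability density function is estimated purely from the observed data, with no prior model. The simplest form is the histogram.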
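The two approaches above can be sketched side by side. This is only an illustration: the sample values are made up, the parametric side assumes a normal model (any other model could be chosen), and the helper names are hypothetical.

```python
import math
import statistics

# Hypothetical sample, made up for illustration.
sample = [1.2, 1.9, 2.1, 2.4, 2.5, 2.8, 3.0, 3.3, 3.9, 4.6]


def fit_normal(data):
    """Parametric: assume a normal model and estimate only (mu, sigma)."""
    return statistics.mean(data), statistics.stdev(data)


def normal_pdf(x, mu, sigma):
    """Density of the fitted normal model at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def histogram_density(data, start, width, n_bins):
    """Non-parametric: bin counts scaled so the bars integrate to 1."""
    counts = [0] * n_bins
    for x in data:
        idx = int((x - start) // width)
        if 0 <= idx < n_bins:
            counts[idx] += 1
    n = len(data)
    return [c / (n * width) for c in counts]


mu, sigma = fit_normal(sample)
hist = histogram_density(sample, start=1.0, width=1.0, n_bins=4)
```

The parametric estimate is fully described by two numbers (mu, sigma), while the histogram keeps one value per bin and makes no assumption about the shape.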

Now, back to KDE (Kernel Density Estimation).

KDE is one of the non-parametric density estimation methods. It uses a kernel function to address the shortcomings of the histogram.

A kernel function is defined as a non-negative function that is symmetric about the origin and integrates to 1 (the rest of the math is skipped here).
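Written out, the three kernel properties and the estimator they are used in look as follows. This is the standard textbook form rather than something from the original post, and the symbols are notation introduced here: K is the kernel, h > 0 is the bandwidth, and x_1, ..., x_n are the observed data points.

```latex
K(u) \ge 0, \qquad K(-u) = K(u), \qquad \int_{-\infty}^{\infty} K(u)\,du = 1

\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)
```

With a Gaussian kernel, K is simply the standard normal density, so the estimate is an average of n small bell curves, one centered on each data point.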

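-> So the conclusion: KDE is a purely data-driven way to look at the distribution of the data, and it improves on the histogram by giving a smooth curve that does not depend on where the bin edges fall.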
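One way to see the histogram's weak points is to evaluate both estimators on the same data. The sketch below uses made-up values and hypothetical helper names, with a Gaussian kernel and a hand-picked bandwidth of 0.5. It shows two things: the histogram value at a fixed point changes when only the bin origin is shifted, and the histogram jumps at a bin edge where the KDE changes smoothly.

```python
import math

# Hypothetical sample, made up for illustration.
data = [1.2, 1.9, 2.1, 2.4, 2.5, 2.8, 3.0, 3.3, 3.9, 4.6]


def histogram_estimate(x, data, origin, width):
    """Histogram density at x: share of points in x's bin, divided by width."""
    bin_index = math.floor((x - origin) / width)
    in_bin = sum(
        1 for xi in data if math.floor((xi - origin) / width) == bin_index
    )
    return in_bin / (len(data) * width)


def kde_estimate(x, data, h):
    """Gaussian KDE at x with bandwidth h (no bins, no origin)."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (len(data) * h * math.sqrt(2.0 * math.pi))


# Same point, same bin width, different bin origin: the estimate changes.
hist_origin_0 = histogram_estimate(2.0, data, origin=0.0, width=1.0)
hist_origin_half = histogram_estimate(2.0, data, origin=0.5, width=1.0)

# Two points just either side of the bin edge at 3.0.
hist_jump = abs(
    histogram_estimate(2.99, data, origin=0.0, width=1.0)
    - histogram_estimate(3.01, data, origin=0.0, width=1.0)
)
kde_change = abs(kde_estimate(2.99, data, h=0.5) - kde_estimate(3.01, data, h=0.5))
```

KDE still has a tuning choice of its own, the bandwidth h, so it trades the bin width and bin origin for a single smoothing parameter.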
