[Boostcamp]Data visualization Seaborn

Table of Contents
1. Lecture notes
1-1. Data Viz - Introduction to Seaborn
1-2. Data Viz - Seaborn Basics
1-3. Data Viz - Advanced Seaborn

📜 Lecture notes

* These notes summarize the lectures given by instructor 안수빈 for the Boostcamp Data visualization course.

[Data Viz] Introduction to Seaborn

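- Seaborn is a statistical visualization library built on Matplotlib
  - Statistical information: composition, distribution, relationships, etc.
  - Because it is built on Matplotlib, its plots can be customized with Matplotlib
  - Known for its simple syntax and clean default design
  - Installation: pip install seaborn==0.11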
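Because Seaborn draws on Matplotlib objects, an axes-level Seaborn function such as `countplot` draws onto (and returns) a Matplotlib `Axes`, which ordinary Matplotlib calls can then customize. A minimal sketch; the tiny DataFrame is made-up data standing in for the lecture's `student` dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data, standing in for the lecture's `student` DataFrame
df = pd.DataFrame({"race/ethnicity": ["group A", "group B", "group B", "group C"]})

fig, ax = plt.subplots(figsize=(6, 4))
returned = sns.countplot(x="race/ethnicity", data=df, ax=ax)  # Seaborn draws on the Matplotlib Axes

# Plain Matplotlib calls customize the Seaborn plot
returned.set_title("Count per group")
returned.set_ylabel("count")
returned.spines["top"].set_visible(False)
returned.spines["right"].set_visible(False)
```

The same pattern (create `fig, ax` with Matplotlib, pass `ax=ax` to Seaborn, then style `ax`) is what the lecture code below relies on.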

[Data Viz] Seaborn Basics

🌈 Let's look at Seaborn's structure and syntax

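Types of statistical visualization in the Seaborn API
- Categorical API - basic statistics of the data per category
- Distribution API - distribution plots for examining both categorical and continuous variables
- Relational API - relationships between variables
- Regression API - regression analysis
- Matrix API - heatmaps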
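As a quick reference, each family corresponds to a set of seaborn functions. The grouping follows the list above; the choice of representative functions is my own selection from the seaborn 0.11 API (not exhaustive), and the loop only checks that each name exists.

```python
import seaborn as sns

# Representative seaborn functions per API family (illustrative selection)
API_FAMILIES = {
    "Categorical": ["countplot", "boxplot", "violinplot", "boxenplot", "swarmplot", "stripplot"],
    "Distribution": ["histplot", "kdeplot", "ecdfplot", "rugplot"],
    "Relational": ["scatterplot", "lineplot"],
    "Regression": ["regplot", "lmplot"],
    "Matrix": ["heatmap", "clustermap"],
}

for family, names in API_FAMILIES.items():
    # Report any name that is not a callable attribute of the installed seaborn
    missing = [name for name in names if not callable(getattr(sns, name, None))]
    suffix = f" (missing: {missing})" if missing else ""
    print(f"{family}: {', '.join(names)}{suffix}")
```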

countplot
- A representative plot in seaborn's Categorical API: it counts the observations in each category and draws the counts as a bar chart.

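- sns.countplot(x='race/ethnicity', data=student)
  - Passing the column as y instead of x draws a horizontal bar chart.
  - Adding order=sorted(student['race/ethnicity'].unique()) sets the order of the bars (here, alphabetical).
  - Adding hue='gender' splits each group further by gender.
    - hue_order controls the order of the hue levels.
  - Colors can be set with options such as palette='Set2', color='red', or saturation=0.3 (a float between 0 and 1).
    - For related values, use color so the bars share one color family.
    - For unrelated values, use a palette with clearly different colors.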
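The options above can be combined in a single call. A minimal sketch with a hypothetical toy DataFrame (column names copied from the lecture's `student` data, values made up):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data using the lecture's column names
df = pd.DataFrame({
    "race/ethnicity": ["group C", "group A", "group B", "group C",
                       "group B", "group A", "group C", "group B"],
    "gender": ["female", "male", "female", "male",
               "male", "female", "female", "male"],
})

fig, ax = plt.subplots(figsize=(8, 4))
sns.countplot(
    y="race/ethnicity",                            # y instead of x: horizontal bars
    data=df,
    order=sorted(df["race/ethnicity"].unique()),   # groups in alphabetical order
    hue="gender",                                  # split each group by gender
    hue_order=["male", "female"],                  # fixed order of the hue levels
    palette="Set2",                                # qualitative palette: distinct colors
    saturation=0.75,                               # float in [0, 1]
    ax=ax,
)
```

Every group/gender combination occurs in the toy data, so the plot has 3 × 2 = 6 bars.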

Categorical API
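- boxplot
  - interquartile range (IQR): the range from the 25th to the 75th percentile (Q1 to Q3); the box spans this range.
  - whisker: the lines extending outside the box.
  - outlier: a value below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
  - min: the smallest value greater than or equal to Q1 - 1.5*IQR (end of the lower whisker).
  - max: the largest value less than or equal to Q3 + 1.5*IQR (end of the upper whisker).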
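  ```
  import matplotlib.pyplot as plt
  import seaborn as sns

  # `student` is the student-performance DataFrame used throughout the lecture
  fig, ax = plt.subplots(1, 1, figsize=(10, 5))
  sns.boxplot(x='race/ethnicity', y='math score', data=student,
              order=sorted(student['race/ethnicity'].unique()),
              ax=ax)
  plt.show()
  ```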
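- violinplot
  - Shows the shape of the distribution better than a boxplot.
  - With the default inner='box', the white dot marks the median (50%) and the thick black bar in the middle spans the IQR.
  - Because the shape comes from a kernel density estimate, the violin can show density where there is no data (e.g., beyond the observed min/max); cut controls this.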
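  ```
  import matplotlib.pyplot as plt
  import seaborn as sns

  fig, ax = plt.subplots(1, 1, figsize=(12, 5))
  sns.violinplot(x='math score', data=student, ax=ax,
                 bw=0.1,           # KDE bandwidth scale factor: smaller shows more detail (seaborn 0.11 parameter)
                 cut=0,            # don't extend the density past the observed data range
                 inner='quartile'  # lines at the 25th, 50th, 75th percentiles ('stick' also exists)
                 )
  plt.show()
  ```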
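  - scale: how the violins are scaled relative to each other
    - "area": every violin has the same area; "count": area is scaled by the number of observations; "width": every violin has the same maximum width.
  - split: when hue has exactly two levels, draws one level on each half of the violin so they can be compared side by side.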
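The boxplot quantities above can be computed directly. A sketch with NumPy on a made-up sample, cross-checked against `matplotlib.cbook.boxplot_stats` (Matplotlib's helper for box-and-whisker statistics; 1.5 is the default whisker factor):

```python
import numpy as np
from matplotlib.cbook import boxplot_stats

# Made-up sample with one obvious outlier
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

q1, q3 = np.percentile(data, [25, 75])  # 25th and 75th percentiles
iqr = q3 - q1                            # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_min = inside.min()  # smallest value >= Q1 - 1.5*IQR
whisker_max = inside.max()  # largest value <= Q3 + 1.5*IQR
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, q3, iqr)               # 3.25 7.75 4.5
print(whisker_min, whisker_max)  # 1 9
print(outliers)                  # [100]

# Matplotlib's helper should agree with the manual computation
stats = boxplot_stats(data, whis=1.5)[0]
assert np.isclose(stats["q1"], q1) and np.isclose(stats["q3"], q3)
assert stats["whislo"] == whisker_min and stats["whishi"] == whisker_max
```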
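A sketch of `split` for violinplot, assuming a hue column with exactly two levels. The data is randomly generated as a stand-in for the lecture's scores, and `scale` is left at its default so the call only uses parameters that exist in seaborn 0.11.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 200
# Synthetic scores: two groups, two genders (hypothetical stand-in data)
df = pd.DataFrame({
    "race/ethnicity": rng.choice(["group A", "group B"], size=n),
    "gender": rng.choice(["female", "male"], size=n),
    "math score": rng.normal(65, 12, size=n).clip(0, 100),
})

fig, ax = plt.subplots(figsize=(8, 5))
sns.violinplot(
    x="race/ethnicity", y="math score", hue="gender", data=df,
    order=["group A", "group B"],
    split=True,  # one gender on each half of every violin
    cut=0,       # clip the density at the observed min/max
    ax=ax,
)
```

With two categories and two hue levels, the axes hold four half-violins.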

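Others

boxenplot, swarmplot, stripplot
- boxenplot: like a boxplot, but draws more quantiles as nested boxes, which is more informative for large datasets
- swarmplot: draws every observation as a point, arranged so the points do not overlap
- stripplot: draws every observation as a point, with random jitter along the category axis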

```
import matplotlib.pyplot as plt
import seaborn as sns

order = sorted(student['race/ethnicity'].unique())  # alphabetical group order

fig, axes = plt.subplots(3, 1, figsize=(12, 21))
sns.boxenplot(x='race/ethnicity', y='math score', data=student, ax=axes[0], order=order)
sns.swarmplot(x='race/ethnicity', y='math score', data=student, ax=axes[1], order=order)
sns.stripplot(x='race/ethnicity', y='math score', data=student, ax=axes[2], order=order)
plt.show()
```

Distribution API

Univariate Distribution

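- histplot: histogram
- kdeplot: Kernel Density Estimate, a smoothed estimate of the density
- ecdfplot: empirical cumulative distribution function (ECDF)
- rugplot: draws a small tick on the axis for each observation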

```
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.flatten()  # 2x2 array of Axes -> flat array of 4
sns.histplot(x='math score', data=student, ax=axes[0])
sns.kdeplot(x='math score', data=student, ax=axes[1])
sns.ecdfplot(x='math score', data=student, ax=axes[2])
sns.rugplot(x='math score', data=student, ax=axes[3])
plt.show()
```
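What `ecdfplot` draws is the empirical cumulative distribution function: for each value t, the fraction of observations less than or equal to t. A minimal NumPy sketch of that definition; the helpers `ecdf` and `ecdf_at` are hypothetical names, not seaborn functions.

```python
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of observations <= each one.

    With ties, the ECDF at a value is the y of its last occurrence.
    """
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

def ecdf_at(values, t):
    """Fraction of observations less than or equal to t."""
    return np.mean(np.asarray(values) <= t)

scores = [70, 55, 90, 70, 85]
x, y = ecdf(scores)
print(x)                    # [55. 70. 70. 85. 90.]
print(y)                    # [0.2 0.4 0.6 0.8 1. ]
print(ecdf_at(scores, 70))  # 0.6
```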

Bivariate Distribution: the joint distribution of two variables

kdeplot and histplot

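```
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(12, 7))
for ax in axes:
    ax.set_aspect(1)  # same scale on both axes of each subplot

# axes[0].scatter(student['math score'], student['reading score'], alpha=0.2)

sns.kdeplot(x='math score', y='reading score',
            data=student, ax=axes[0],
            fill=True,        # fill the density contours
            # bw_method=0.1,
            )

sns.histplot(x='math score', y='reading score',
             data=student, ax=axes[1],
             # color='orange',
             cbar=False,      # no colorbar
             bins=(10, 20),   # 10 bins along x, 20 along y
             )

plt.show()
```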
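In the bivariate `histplot` call, `bins=(10, 20)` asks for 10 bins along x and 20 along y. The counting behind that grid can be sketched with `np.histogram2d`; this illustrates the binning, not seaborn's internal code, and the scores are randomly generated.

```python
import numpy as np

rng = np.random.default_rng(42)
math_score = rng.normal(66, 15, size=1000)                       # synthetic x values
reading_score = 0.8 * math_score + rng.normal(14, 8, size=1000)  # correlated synthetic y values

# counts[i, j] = number of points falling in x-bin i and y-bin j
counts, x_edges, y_edges = np.histogram2d(math_score, reading_score, bins=(10, 20))

print(counts.shape)                # (10, 20)
print(len(x_edges), len(y_edges))  # 11 21
print(int(counts.sum()))           # 1000
```

Each cell's count becomes the color intensity of one rectangle in the bivariate histogram.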

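[Data Viz] Advanced Seaborn

- jointplot: a bivariate plot with the marginal distribution of each variable drawn along its axis
- pairplot: plots the pairwise relationships between all numeric columns of a DataFrame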

seoftware
