AICE - Associate Summary

Load the data

import pandas as pd

odf = pd.read_csv('titanic.csv', encoding='cp949')  # cp949: Korean Windows encoding

odf.head(2)

Copy the data (keeps the original odf untouched)

df = odf.copy()

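1. Number of observations (also called data points, rows, or instances)

2. Number of columns

3. Missing values: if a column's non-null count is less than 891 (the total row count), that feature has missing values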

df.info()
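The three checks above can also be read off in code. A minimal sketch, assuming a small hypothetical stand-in for the Titanic data (4 rows instead of 891): `df.shape` gives rows and columns, and `df.count()` gives the same non-null counts that `df.info()` prints.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for titanic.csv (4 rows instead of 891)
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Embarked": ["S", "C", None, "S"],
})

n_rows, n_cols = df.shape   # number of observations, number of columns
non_null = df.count()       # non-null count per column, as shown by df.info()

# A column whose non-null count is below the row count has missing values
cols_with_missing = non_null[non_null < n_rows].index.tolist()
print(n_rows, n_cols, cols_with_missing)
```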

Quick survey of the data

df.describe()

describe() = summary statistics

By default it summarizes only the numeric columns
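To see the numeric-only default in action, here is a small sketch on hypothetical data: plain `describe()` skips the text column, while `include='all'` adds it with object-style statistics (`unique`, `top`, `freq`).

```python
import pandas as pd

# Hypothetical mixed-type data
df = pd.DataFrame({
    "Age": [22, 38, 26],
    "Sex": ["male", "female", "female"],
})

num_summary = df.describe()               # numeric columns only: count, mean, std, min, quartiles, max
all_summary = df.describe(include="all")  # also summarizes object columns

print(num_summary.columns.tolist())  # ['Age']
print(all_summary.columns.tolist())
```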

Find the columns that contain missing values

df.isnull().sum()

Contract churn prediction

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

df = pd.read_csv('path')  # replace 'path' with the CSV file's path

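Supervised learning

Label (the target column)

Binary classification data -> use binary classification algorithms and evaluation metrics

Classification evaluation metrics

Accuracy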
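A minimal sketch of computing accuracy with scikit-learn's `sklearn.metrics`, using hypothetical churn labels (1 = churn, 0 = stay). Precision, recall, and F1 are shown alongside as other common binary classification metrics from the same module.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical true and predicted labels for a binary churn problem
y_true = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]

acc = accuracy_score(y_true, y_pred)    # share of predictions that are correct
prec = precision_score(y_true, y_pred)  # of the predicted 1s, how many are truly 1
rec = recall_score(y_true, y_pred)      # of the true 1s, how many were predicted as 1
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall

print(acc, prec, rec, f1)
```

Here 4 of the 6 predictions match, so accuracy is 4/6.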

df.info()

Number of observations and number of features

df.head()

df.tail()

df.isnull().sum()

df.describe()

DataFrames can be added to each other (element-wise)
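A small sketch with hypothetical frames: `+` adds values that share the same index and column labels. Labels present in only one of the two frames produce NaN, which is worth remembering before adding frames with different shapes.

```python
import pandas as pd

# Hypothetical frames with identical labels
a = pd.DataFrame({"x": [1, 2], "y": [10, 20]})
b = pd.DataFrame({"x": [100, 200], "y": [1000, 2000]})
total = a + b  # element-wise, aligned on index and column labels

# c shares only index label 1 and column 'x' with a
c = pd.DataFrame({"x": [5, 5]}, index=[1, 2])
partial = a + c  # labels found in only one frame become NaN

print(total)
print(partial)
```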

df.rename(columns={'familysize': 'FamilySize'}, inplace=True)

df.drop('FamilySize', axis=1, inplace=True)

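axis=1 means column-wise (here, drop a column); axis=0 means row-wise

inplace=True applies the change to df itself instead of returning a new DataFrame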
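A minimal sketch of `axis` and `inplace` on a hypothetical 3×2 frame. For `drop`, axis picks whether the label names a column (1) or a row (0); for aggregations like `sum`, axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

no_b = df.drop("B", axis=1)        # axis=1: drop a column; df is unchanged
no_first_row = df.drop(0, axis=0)  # axis=0: drop a row by its index label
col_sums = df.sum(axis=0)          # one value per column
row_sums = df.sum(axis=1)          # one value per row

df.drop("B", axis=1, inplace=True)  # modifies df directly and returns None
print(df.columns.tolist())  # ['A']
```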

df['PassengerId'] = df['PassengerId'].replace('_', '-1')  # replace the placeholder value '_' so the column can be cast to int

df['PassengerId'] = df['PassengerId'].astype('int64')

df.info()

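Access a single column (returns a Series)

df['Age']

Select several columns (returns a DataFrame)

df[['Age', 'Survived']]

Select by row and column labels: loc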

df.loc[[0], ['Age', 'Survived']]  # row label 0; the column names are examples

Select by integer position: iloc

df.iloc[0:5, 0:2]  # example: first 5 rows, first 2 columns
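The difference between label-based `loc` and position-based `iloc` only shows up when the index is not 0, 1, 2, .... A sketch on hypothetical data with index labels 10 to 12; note that a `loc` slice includes its end label while an `iloc` slice excludes its end position.

```python
import pandas as pd

# Hypothetical data with non-default index labels
df = pd.DataFrame(
    {"Age": [22, 38, 26], "Survived": [0, 1, 1], "Fare": [7.25, 71.28, 7.92]},
    index=[10, 11, 12],
)

age = df["Age"]                          # one column -> Series
subset = df[["Age", "Survived"]]         # list of columns -> DataFrame
by_label = df.loc[[10], ["Age", "Fare"]] # rows and columns by label
label_slice = df.loc[10:11]              # label slice: 10 AND 11 are included
by_position = df.iloc[0:2, 0:2]          # positions 0-1 for rows and columns (end excluded)
```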

Rows where the passenger is female

cond = df['Sex'] == 'female'

df[cond]

Passengers younger than 10 or aged 60 and over

cond = (df['Age'] < 10) | (df['Age'] >= 60)

df.loc[cond]

df.loc[cond, :]
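A sketch of combining conditions on hypothetical data: `|` means "or", `&` means "and", and each comparison needs its own parentheses because these operators bind more tightly than `<` and `>=`.

```python
import pandas as pd

# Hypothetical passengers
df = pd.DataFrame({
    "Age": [5, 15, 40, 60, 72],
    "Sex": ["female", "male", "female", "male", "female"],
})

cond = (df["Age"] < 10) | (df["Age"] >= 60)  # under 10 OR 60 and over
young_or_old = df.loc[cond]

# & adds a second requirement: the passenger must also be female
female_young_or_old = df.loc[cond & (df["Sex"] == "female")]
```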

df_sex = df.loc[df['gender'] == 'Female']

df_sex.sort_values('MonthlyCharges', ascending=False)

df.loc[df['gender'] == 'Female', 'MonthlyCharges']

cond = df['gender'] == 'Female'

df[cond].sort_values('MonthlyCharges', ascending=False).iloc[0, 0]  # first column of the female customer with the highest MonthlyCharges

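df['Pclass'].value_counts()

Number of rows for each level (category) of a column

Use it for questions like "Which Pclass has the most rows?"

Group by port of embarkation: groupby()

Number of passengers per port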
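For the "which level has the most rows?" type of question, `value_counts()` sorts counts in descending order, and `idxmax()` returns the level with the highest count. A sketch on hypothetical classes:

```python
import pandas as pd

# Hypothetical passenger classes
df = pd.DataFrame({"Pclass": [3, 1, 3, 2, 3, 1]})

counts = df["Pclass"].value_counts()  # rows per class, largest count first
most_common = counts.idxmax()         # class with the most rows

print(counts)
print(most_common)  # 3
```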

df.groupby('Embarked').count()

Average age of passengers per port

df.groupby('Embarked')['Age'].mean()

Number of missing values

df.isnull().sum()
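One subtlety with `groupby(...).count()`: it counts non-null values per column, so columns with missing values give smaller numbers than the real passenger count. `size()` counts rows regardless. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical passengers; one Age value is missing
df = pd.DataFrame({
    "Embarked": ["S", "C", "S", "Q", "S", "C"],
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
})

passengers = df.groupby("Embarked").size()             # rows per port
non_null_age = df.groupby("Embarked")["Age"].count()   # non-null Age values per port
mean_age = df.groupby("Embarked")["Age"].mean()        # NaN is skipped in the mean
```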

df.loc[df['Age'].isnull(),:]

fdf = df.fillna({'Age': 0})

fdf.isnull().sum()

fdf['Age'].mean()

fdf = df.fillna({'Age': df['Age'].mean()})
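A sketch comparing the two fill strategies on hypothetical data. Passing a dict to `fillna` fills only the listed columns. Filling with 0 pulls the mean down, while filling with the mean (computed with NaN skipped) keeps it unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in two columns
df = pd.DataFrame({"Age": [20.0, np.nan, 40.0], "Cabin": [np.nan, "C85", np.nan]})

missing_rows = df.loc[df["Age"].isnull(), :]        # rows where Age is missing
zero_filled = df.fillna({"Age": 0})                 # only Age is filled; Cabin untouched
mean_filled = df.fillna({"Age": df["Age"].mean()})  # mean of 20 and 40 is 30

print(zero_filled["Age"].mean())  # 20.0
print(mean_filled["Age"].mean())  # 30.0
```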

Finding outliers

df.boxplot()

df[['Fare']].boxplot()
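The boxplot draws its whiskers at 1.5 × IQR by default (matplotlib's `whis=1.5`) and plots points beyond them as outliers. The same 1.5 × IQR rule can be applied numerically to get the outlier values themselves. A sketch with hypothetical fares:

```python
import pandas as pd

# Hypothetical fares; one value is far larger than the rest
fare = pd.Series([7.25, 8.05, 7.92, 13.0, 26.0, 512.33], name="Fare")

q1, q3 = fare.quantile(0.25), fare.quantile(0.75)
iqr = q3 - q1                                  # interquartile range
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # whisker limits
outliers = fare[(fare < lower) | (fare > upper)]

print(outliers.tolist())  # [512.33]
```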

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

scaled = scaler.fit_transform(df)  # df must contain only numeric columns

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

scaled = scaler.fit_transform(df)  # df must contain only numeric columns
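What the two scalers compute, checked against the formulas on hypothetical numeric data: `StandardScaler` gives (x - mean) / std using the population standard deviation (ddof=0), and `MinMaxScaler` gives (x - min) / (max - min), which maps each column to [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical numeric-only data
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0], "Fare": [10.0, 20.0, 60.0]})

standard = StandardScaler().fit_transform(df)  # returns a NumPy array
minmax = MinMaxScaler().fit_transform(df)

# The same results written out by hand
manual_standard = (df - df.mean()) / df.std(ddof=0)
manual_minmax = (df - df.min()) / (df.max() - df.min())

# Wrap the array to keep the column names
scaled_df = pd.DataFrame(standard, columns=df.columns)
```

In a modeling workflow, the scaler is usually fit on the training set only and then applied to the test set with `transform`.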

df = pd.get_dummies(df, columns=['Embarked'], drop_first=True)
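A sketch of one-hot encoding on hypothetical data. Categories are sorted (C, Q, S), and `drop_first=True` drops the first one, so a row with 0 in both dummy columns means "C". The result is a new DataFrame, so it has to be assigned.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"], "Fare": [7.25, 71.28, 8.05, 53.1]})

encoded = pd.get_dummies(df, columns=["Embarked"], drop_first=True)
print(encoded.columns.tolist())  # ['Fare', 'Embarked_Q', 'Embarked_S']
```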

df['Survived'] = df['Survived'].replace(['Dead', 'Survived'], [0, 1])

df.isnull().sum()

df.drop('DeviceProtection', axis=1, inplace=True)

axis=1 means column-wise

df['gender'].value_counts().plot(kind='bar')

Heatmap (always appears on the exam)

import seaborn as sns

sns.heatmap(df.corr(numeric_only=True), annot=True)

df.to_csv('data', index=False)

Multiclass classification: pass average='macro' to metrics such as precision_score, recall_score, and f1_score
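Why the `average` argument matters: scikit-learn's precision, recall, and F1 default to `average='binary'`, which raises an error on multiclass labels. With `average='macro'`, the score is computed per class and then averaged without weighting. A sketch with hypothetical 3-class labels:

```python
from sklearn.metrics import f1_score

# Hypothetical labels for a 3-class problem
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

per_class = f1_score(y_true, y_pred, average=None)    # one F1 per class: 0, 1, 2
macro_f1 = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per_class

print(per_class)
print(macro_f1)
```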


Dinoqos Development ML/AI Engineering
