[Python] Data Collection with Pandas

aram 2022. 9. 2. 00:35

Loading CSV & txt files into a DataFrame

import pandas as pd
df = pd.read_csv('path/filename', sep='delimiter', header=None, names=['new column name', ...])

sep= : column delimiter (default ',', comma)

names=[] : assigns column names (variable names) when loading a file that has none

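header=0 (default when names is not given) : the first row holds the column names
header=None : the file has no column-name row (columns are numbered 0, 1, 2, ...)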
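A minimal sketch of header and names. An in-memory io.StringIO string stands in for a real file, and the sample rows are made up for illustration:

```python
import io

import pandas as pd

# Made-up sample data with no header row (stands in for a real file)
raw = "1,Kim,90\n2,Lee,85\n"

# header=None: every line is data; pandas numbers the columns 0, 1, 2
df_default = pd.read_csv(io.StringIO(raw), header=None)

# names=[...]: supply our own column names for a header-less file
df_named = pd.read_csv(io.StringIO(raw), header=None, names=['id', 'name', 'score'])

print(df_default.columns.tolist())  # [0, 1, 2]
print(df_named)
```

Without header=None, pandas would treat the first data row ("1,Kim,90") as column names and silently lose it.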

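usecols=[0, 2, 5] or (0, 2, 5) or ['column name', ...] : load only the columns with these indices or names
Reference: https://useful-jang.tistory.com/55

index_col = use a specific column as the row index

nrows = n : read at most n rows

na_values=[] : extra values to be recognized as missing (NaN)

skiprows = n or [x, x] : skip the first n lines (int) or the listed line numbers (list, 0-based)
skipfooter = n : skip the last n lines (supported only by the Python engine, engine='python')
Reference: https://ponyozzang.tistory.com/620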
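The options above can be combined in one call. This is a sketch on made-up data (io.StringIO instead of a file). One caveat worth knowing: pandas refuses skipfooter together with nrows, so those two are shown in separate calls.

```python
import io

import pandas as pd

# Made-up CSV: a junk first line, a header, 4 data rows, and a 'total' footer line
raw = (
    "report header line\n"
    "id,name,score,memo\n"
    "1,Kim,90,ok\n"
    "2,Lee,-,late\n"
    "3,Park,77,ok\n"
    "4,Choi,88,ok\n"
    "total,,,"
)

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,                       # drop the first line (the junk line)
    skipfooter=1,                     # drop the last line (the 'total' row)
    engine='python',                  # skipfooter needs the Python parser
    usecols=['id', 'name', 'score'],  # load only these columns ('memo' is dropped)
    index_col='id',                   # use 'id' as the row index
    na_values=['-'],                  # treat '-' as NaN
)
print(df)

# nrows: read only the first 2 data rows
head2 = pd.read_csv(io.StringIO(raw), skiprows=1, nrows=2)

# skiprows as a list: skip file lines 0 (junk) and 3 (the 'Lee' row)
picked = pd.read_csv(io.StringIO(raw), skiprows=[0, 3], nrows=2)
print(picked)
```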

dtype = {'column name': 'data type'} : pass a dict to set each column's data type
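A small sketch of why dtype matters, using made-up codes with leading zeros (io.StringIO stands in for a file). Without dtype, pandas infers integers and the zeros disappear:

```python
import io

import pandas as pd

# Made-up data: codes with leading zeros would be lost if parsed as integers
raw = "code,count\n01234,5\n00501,7\n"

df_default = pd.read_csv(io.StringIO(raw))
df_typed = pd.read_csv(io.StringIO(raw), dtype={'code': str, 'count': 'int32'})

print(df_default['code'].tolist())  # [1234, 501]  (leading zeros dropped)
print(df_typed['code'].tolist())    # ['01234', '00501']
print(df_typed.dtypes)
```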

pd.set_option() : change pandas display (output) options
Reference: https://mindscale.kr/course/pandas-basic/options/
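A short sketch of the option functions. The option names below are standard pandas display options; the values chosen are only examples:

```python
import pandas as pd

# Show at most 10 rows and all columns when a DataFrame is printed
pd.set_option('display.max_rows', 10)
pd.set_option('display.max_columns', None)
print(pd.get_option('display.max_rows'))  # 10

# Restore the default value of a single option
pd.reset_option('display.max_rows')

# Apply an option only inside the with-block; it is restored automatically afterwards
with pd.option_context('display.precision', 2):
    print(pd.DataFrame({'x': [3.14159]}))
```

option_context is handy in notebooks because it avoids leaving a global setting changed by accident.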

pd.read_table('path/txt file') : read a txt file into a DataFrame (tab-separated by default, sep='\t')
Reference: https://atotw.tistory.com/m/484

DataFrame.to_csv('path/filename') : write a DataFrame to a csv/txt file (to_csv is a DataFrame method, not pd.to_csv())
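A minimal round trip between to_csv and read_table, assuming made-up data. Calling to_csv without a path returns the text instead of writing a file, which keeps the sketch self-contained:

```python
import io

import pandas as pd

df = pd.DataFrame({'name': ['Kim', 'Lee'], 'score': [90, None]})

# No path -> to_csv returns the text; sep='\t' writes tab-separated output,
# index=False drops the row index, na_rep='NA' is written in place of NaN
text = df.to_csv(sep='\t', index=False, na_rep='NA')
print(text)

# read_table's default separator is a tab, so it reads the same text back;
# 'NA' is in pandas' default list of missing-value markers
back = pd.read_table(io.StringIO(text))
print(back)
```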

Loading an Excel file into a DataFrame

> Requires an Excel engine library: xlrd for legacy .xls files, openpyxl for .xlsx files

import pandas as pd
pd.read_excel('path/filename', sheet_name='sheet name')
pd.read_excel('path/filename', sheet_name=0, header=0, dtype={'column name': 'data type'})  # sheet_name=0: first sheet

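sheet_name= : sheet name (str) or 0-based sheet position (int); default 0 (the first sheet), None loads every sheet as a dict of DataFrames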

thousands=',' : thousands separator

header, names, usecols, na_values, nrows, skiprows, skipfooter = work the same way as in read_csv (read_excel has no sep option)

na_rep= : (a writing option of to_excel/to_csv) the string written in place of NaN

Reading and writing JSON data

(JavaScript Object Notation)

## Using the json library

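import json
data = {'key': 'value'}  # placeholder: the Python object to write
with open('path/filename', 'w', encoding='utf-8') as json_file:  # mode must be lowercase 'w'
	json.dump(data, json_file, indent=4, sort_keys=True)  # example values for indent/sort_keys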

json.dumps() = serialize to a JSON-formatted str in memory (instead of a file)

indent = int : indentation level used when serializing a Python object to JSON

sort_keys=True : sort the output by key when serializing

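json.load() = read JSON from a file object (data on disk); a JSON object comes back as <class 'dict'>

json.loads() = parse JSON from a str/bytes already in memory; a JSON object comes back as <class 'dict'>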
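A sketch of all four functions on made-up data: dumps/loads stay in memory, dump/load go through a temporary file. ensure_ascii=False is an extra (assumed useful here) so non-ASCII text such as Korean is written as-is rather than as \uXXXX escapes.

```python
import json
import os
import tempfile

# Example Python object (made-up data)
data = {'name': 'Kim', 'scores': [90, 85], 'active': True}

# dumps: Python object -> JSON str in memory
text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False)
print(text)

# loads: JSON str in memory -> Python object
restored = json.loads(text)
print(type(restored))  # <class 'dict'>

# dump / load: the same round trip through a file on disk
path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with open(path, 'w', encoding='utf-8') as f:
    json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
with open(path, encoding='utf-8') as f:
    from_disk = json.load(f)
```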

## Using pd.read_json

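import pandas as pd

df2 = pd.read_json('./datas/test.json')
print(df2)

df2.to_json()                  # orient='columns' (default): {column -> {index -> value}}
df2.to_json(orient='records')  # [{column -> value}, ...], one dict per row
df2.to_json(orient='index')    # {index -> {column -> value}}; other orients: 'split', 'values', 'table'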

DataFrame.to_json() = serialize the DataFrame as JSON (returns a str, or writes to a file when a path is given)
Reference: https://jimmy-ai.tistory.com/194
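A self-contained sketch of the orient layouts on a made-up DataFrame. The JSON text is wrapped in io.StringIO before read_json, because passing literal JSON text directly is deprecated since pandas 2.1:

```python
import io
import json

import pandas as pd

df = pd.DataFrame({'name': ['Kim', 'Lee'], 'score': [90, 85]})

# The same DataFrame in three JSON layouts
print(df.to_json())                  # orient='columns' (default)
print(df.to_json(orient='records'))
print(df.to_json(orient='index'))

# read_json must be told which layout it is reading
records = df.to_json(orient='records')
back = pd.read_json(io.StringIO(records), orient='records')
print(back)
```

'records' is usually the most convenient layout for APIs, since each row becomes one self-describing object.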

Creating a DataFrame from JSON data served by a website

import json
import pandas as pd
from urllib.request import Request, urlopen  # Python 3.x

urlTicker = Request('url', headers={'': ''})  # 'url' and the header dict are placeholders
readTicker = urlopen(urlTicker).read()         # response body as bytes
jsonTicker = json.loads(readTicker)            # parse into a Python dict/list
df = pd.DataFrame(jsonTicker)

1. Send the request with a Request object (url) and urlopen()
2. read() the response data : bytes
3. Parse it in memory with json.loads() into a Python object (dict or list)
4. pd.DataFrame(parsed JSON object)
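The four steps can be sketched as two small functions. The names json_bytes_to_df and fetch_json_df are hypothetical, the User-Agent header value is an assumption (some sites reject urllib's default one), and the payload is made up so the parsing half runs offline:

```python
import json
from urllib.request import Request, urlopen

import pandas as pd


def json_bytes_to_df(raw: bytes) -> pd.DataFrame:
    """Steps 3-4: parse the response body and build a DataFrame."""
    parsed = json.loads(raw)  # json.loads accepts UTF-8 bytes as well as str
    return pd.DataFrame(parsed)


def fetch_json_df(url: str) -> pd.DataFrame:
    """Steps 1-2: request the URL and read the body, then parse it."""
    # Assumed header value; adjust to whatever the target site expects
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=10) as resp:
        raw = resp.read()  # bytes
    return json_bytes_to_df(raw)


# Offline usage with a made-up payload (a list of records)
sample = b'[{"symbol": "AAA", "price": 10.5}, {"symbol": "BBB", "price": 3.2}]'
df = json_bytes_to_df(sample)
print(df)
```

If an API returns a single object of scalar values rather than a list, pd.DataFrame(parsed) raises an error; wrapping it as pd.DataFrame([parsed]) gives a one-row frame instead.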

Creating a DataFrame from XML data served by a website

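import pandas as pd
import xml.etree.ElementTree as ET
from urllib.request import urlopen

url = 'url'                        # placeholder
response = urlopen(url).read()     # response body as bytes
xtree = ET.fromstring(response)    # root Element of the parsed XML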

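1. Send the request with a Request object (url) and urlopen()
2. read() the response data : bytes
3. Parse it into an xml.etree.ElementTree element (ET.fromstring())
4. To extract data, search by tag name and take its content: find('tag name').text
5. Append each record to a list as a dict, then build the DataFrame with pd.DataFrame()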
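Steps 3 to 5 can be sketched on a made-up payload. The tag names (items, item, name, price) and the helper names xml_bytes_to_df and fetch_xml_df are hypothetical; a real feed will have its own structure:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

import pandas as pd

# Made-up XML payload; the tag names are hypothetical
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<items>
  <item><name>apple</name><price>1200</price></item>
  <item><name>pear</name><price>2500</price></item>
</items>"""


def xml_bytes_to_df(raw: bytes) -> pd.DataFrame:
    root = ET.fromstring(raw)        # step 3: build the element tree
    rows = []
    for item in root.iter('item'):   # every <item> element
        rows.append({
            'name': item.find('name').text,          # step 4: find('tag').text
            'price': int(item.find('price').text),   # text is always str; convert
        })
    return pd.DataFrame(rows)        # step 5: list of dicts -> DataFrame


def fetch_xml_df(url: str) -> pd.DataFrame:
    with urlopen(url, timeout=10) as resp:  # steps 1-2: request and read bytes
        return xml_bytes_to_df(resp.read())


df = xml_bytes_to_df(sample)
print(df)
```

For flat XML like this, pandas 1.3+ also offers pd.read_xml, whose default parser requires lxml.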

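pd.read_table('path/filename', sep='delimiter') : read any delimited text file
pd.read_hdf() : read data stored in an HDF5 file (requires the PyTables package)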

Creating a DataFrame from a table in an Oracle ORDBMS

1. Import the DB library (module) for the ORDBMS you read from: import cx_Oracle (the library has since been renamed python-oracledb, imported as oracledb)
2. Connect to the ORDBMS: cx_Oracle.connect('user name', 'password', 'url(domain)/service')
3. Create a Cursor object: connection.cursor()
4. Run the SQL: cursor.execute()
5. Iterate over the SELECT result set row by row with a for loop
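The same five steps, sketched with the standard-library sqlite3 module as a stand-in for cx_Oracle so it runs without an Oracle server. This is a swapped-in database, not Oracle itself; both follow the Python DB-API (connect, cursor, execute, iterate). The emp table and its rows are made up:

```python
import sqlite3

import pandas as pd

# sqlite3 stands in for cx_Oracle here; with Oracle, step 2 would be
# cx_Oracle.connect('user name', 'password', 'url(domain)/service')
conn = sqlite3.connect(':memory:')   # step 2: connection
cur = conn.cursor()                  # step 3: cursor

# Made-up table so there is something to select
cur.execute('CREATE TABLE emp (empno INTEGER, ename TEXT, sal REAL)')
cur.executemany('INSERT INTO emp VALUES (?, ?, ?)',
                [(7369, 'SMITH', 800.0), (7499, 'ALLEN', 1600.0)])

cur.execute('SELECT empno, ename, sal FROM emp ORDER BY empno')  # step 4: run SQL
columns = [desc[0] for desc in cur.description]  # column names reported by the cursor
rows = [row for row in cur]                      # step 5: iterate the result set row by row
df = pd.DataFrame(rows, columns=columns)
print(df)

# pandas can also run the query and build the DataFrame in one call
df2 = pd.read_sql('SELECT empno, ename, sal FROM emp ORDER BY empno', conn)

cur.close()
conn.close()
```

pd.read_sql officially supports SQLAlchemy connectables and sqlite3 connections; passing another raw DB-API connection (such as an Oracle one) generally still works but pandas emits a UserWarning.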

Creating a DataFrame from table data in a website's HTML page

: pd.read_html()

import pandas as pd

# read_html returns a list of DataFrames, one per <table> found (needs lxml, or bs4 + html5lib)
table_MN = pd.read_html('url')
print(f'Total tables: {len(table_MN)}')
df = table_MN[0]
print(df.head())
df.info()  # info() prints its summary itself and returns None

# match= keeps only the tables whose text matches this string/regex
table_MN = pd.read_html('url', match='Election results from statewide races')
print(len(table_MN))
print(table_MN)

Reference: https://mizykk.tistory.com/40

* DBM-related modules: https://saelly.tistory.com/474

* Content references & sources: these notes summarize material from a class (태그) for review purposes.
