[AI Basics] Python Data Structures

aram 2022. 7. 30. 23:28

Data structure

> A way to store information that has particular characteristics in memory and retrieve it efficiently

- Stack

Last In First Out (LIFO)
A memory structure designed so that the most recently added data is returned first > items come out in reverse order
Can be implemented with a list
Input = Push > append() / Output = Pop > pop()

>>> a = [1,2,3,4,5]
>>> a.append(10)
>>> a.append(20)
>>> a.pop() #20
>>> a.pop() #10
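
The reverse-order behaviour described above can be sketched with a small example. The word being reversed and the variable names are only illustrative assumptions, not part of the original post.

```python
# Hypothetical example: reverse a word by using a list as a stack.
word = "python"

stack = []
for ch in word:
    stack.append(ch)          # push each character

reversed_chars = []
while stack:
    # pop() returns the most recently pushed item first (LIFO)
    reversed_chars.append(stack.pop())

reversed_word = "".join(reversed_chars)
print(reversed_word)  # nohtyp
```

Because the last character pushed is the first one popped, the characters come back out in reverse order.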

- Queue

First In First Out (FIFO)
A memory structure designed so that the data added first is returned first > the opposite of a stack
Can be implemented with a list
Input = Put > append() / Output = Get > pop(0)

>>> a = [1,2,3,4,5]
>>> a.pop(0) #1
>>> a.pop(0) #2
>>> a        #[3, 4, 5]
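
A minimal sketch of the put/get pairing described above, using only a plain list. The job names are made up for illustration. Note that pop(0) has to shift every remaining element, so it gets slower as the list grows; the deque covered later in this post avoids that cost.

```python
# Hypothetical example: a list used as a FIFO queue.
queue = []
queue.append("job-a")    # put
queue.append("job-b")    # put
queue.append("job-c")    # put

first = queue.pop(0)     # get: the item that was put first
second = queue.pop(0)    # get: the next oldest item

print(first, second, queue)  # job-a job-b ['job-c']
```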

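- Tuple : python data structure

A list whose values cannot be changed = read only > in exchange, it is somewhat faster
Declared with "( )" instead of "[ ]"
Supports the same operations, indexing, and slicing as a list
Stores data that must not change while the program runs, such as student IDs, names, and postal codes > can be used as a dict key
Prevents errors caused by user mistakes ahead of time
Single-element tuple : must have a trailing ","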

>>> t = (1)
>>> print(type(t))
<class 'int'>
>>> t = (1,)
>>> print(type(t))
<class 'tuple'>

# swap code
>>> a, b = 1, 2
>>> print(a, b)
1 2
>>> a, b = b, a
>>> print(a, b)
2 1
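
The two properties mentioned above (values cannot be changed, and a tuple can serve as a dict key) can be checked with a short sketch. The student record and the score are invented values for illustration only.

```python
# Hypothetical (student ID, name) record stored as a tuple.
student = ("20220730", "aram")

try:
    student[0] = "00000000"      # tuples do not allow item assignment
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# A tuple of hashable values can be used as a dict key; a list cannot.
scores = {student: 95}
print(scores[("20220730", "aram")])  # 95
```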

☞ More on tuples: https://dailystudy.tistory.com/32

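- Set : python data structure

A data type that stores values without order and does not allow duplicates == a dictionary with only the keys left
Create one with the set constructor or a literal > s = set() OR s = {1, 2, 3} (note that s = {} creates an empty dict, not a set)
Mutable: elements can be deleted [remove(), discard(), pop() = removes an arbitrary element, clear()] and added
Supports a range of set operations
: union .union() |
: intersection .intersection() &
: intersection, updating the set in place .intersection_update()
: difference .difference() -
: difference, updating the set in place .difference_update()
: symmetric difference .symmetric_difference() ^
: subset test .issubset()
: superset test .issuperset()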

>>> s = set([1,2,3,1,2,3]) #build a set from a list with set(); duplicates are dropped. s = {1,2,3} also works
>>> s
{1, 2, 3}
>>> s.remove(1) #remove the element 1 > raises KeyError if it is absent
>>> s
{2, 3}
>>> s.update([1,2,3,4,7]) #adds 1, 4, 7 / 2 and 3 are already present, so they are not added again
>>> s
{1, 2, 3, 4, 7}
>>> s.discard(3) #remove the element 3 > no error if it is absent
>>> s
{1, 2, 4, 7}
>>> s.clear()  #remove all elements
>>> s
set()
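
The set operations listed above have no example in the post, so here is a minimal sketch. The two sets a and b are arbitrary sample values; results are passed through sorted() only to make the printed order predictable, since sets themselves are unordered.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7}

print(sorted(a | b))   # union                 [1, 2, 3, 4, 5, 6, 7]
print(sorted(a & b))   # intersection          [3, 4, 5]
print(sorted(a - b))   # difference            [1, 2]
print(sorted(a ^ b))   # symmetric difference  [1, 2, 6, 7]

print({3, 4}.issubset(a))     # True
print(a.issuperset({3, 4}))   # True

# The *_update methods change the set in place instead of returning a new one.
c = set(a)
c.intersection_update(b)      # c now holds only the elements also in b
d = set(a)
d.difference_update(b)        # d now holds only the elements not in b
print(sorted(c), sorted(d))   # [3, 4, 5] [1, 2]
```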

- Dictionary (dict) : python data structure

When storing data, store a value that identifies it alongside > like student ID and name
The unique value that identifies the data : Identifier or Key > duplicates are not allowed
Values are managed through their keys
key - value mapping > look up a value by its key
Other languages call this a Hash Table
Written as {Key1:Value1, Key2:Value2, Key3:Value3 ...}

>>> country_code = {} #create a dict; country_code = dict() also works
>>> country_code = {"America": 1, "Korea": 82, "China": 86, "Japan": 81}
>>> country_code
{'America': 1, 'Korea': 82, 'China': 86, 'Japan': 81}
>>> country_code.items()        #view of the dict's (key, value) pairs
dict_items([('America', 1), ('Korea', 82), ('China', 86), ('Japan', 81)])
>>> for dict_items in country_code.items(): #a for-in loop prints every pair
...     print(dict_items)
...
('America', 1)
('Korea', 82)
('China', 86)
('Japan', 81)
>>> country_code.keys() #keys only
dict_keys(['America', 'Korea', 'China', 'Japan'])
>>> country_code["German"]= 49   #add an entry
>>> country_code
{'America': 1, 'Korea': 82, 'China': 86, 'Japan': 81, 'German': 49}
>>> country_code.values()        #values only
dict_values([1, 82, 86, 81, 49])

>>> for k,v in country_code.items(): #the most commonly used form
...     print("Key : ", k, "/ Value : ", v)
...
Key :  America / Value :  1
Key :  Korea / Value :  82
Key :  China / Value :  86
Key :  Japan / Value :  81
Key :  German / Value :  49
>>> "Korea" in country_code.keys() #check whether "Korea" is among the keys
True
>>> 82 in country_code.values() #check whether 82 is among the values
True
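
The "look up a value by its key" step itself is not shown above, so here is a small sketch reusing the same country_code data. The "France" key is a made-up missing key chosen to show what happens when a lookup fails.

```python
country_code = {"America": 1, "Korea": 82, "China": 86, "Japan": 81}

print(country_code["Korea"])           # 82: direct lookup by key

# Indexing with a missing key raises KeyError; get() returns a fallback instead.
print(country_code.get("France"))      # None
print(country_code.get("France", 0))   # 0

del country_code["China"]              # remove an entry by key
print("China" in country_code)         # False: `in` on a dict tests its keys
```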

☞ More on dictionaries : https://dailystudy.tistory.com/30

- collections

A Python built-in module of extended data structures for list, tuple, and dict
Offers users convenience and better runtime efficiency

1. from collections import deque

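A module that supports both Stack and Queue
Stores data more efficiently (= faster) than a list: appending and popping are O(1) at both ends, whereas list.pop(0) and list.insert(0, x) are O(n)
Supports linked-list-like operations such as rotate() and reversed() > the values stay as they are and only the positions shift
Supports most list-style methods
appendleft() : inserts a new value on the left > combined with pop(), the value that went in first comes out first
append(), pop(), extend(), extendleft()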

Source: https://www.codingninjas.com/codestudio/library/difference-between-queue-and-deque-in-c

The efficient memory structure improves processing speed

## deque ##
from collections import deque
import time
start_time = time.process_time()  # used in place of time.clock
deque_list = deque()
# Stack
for _ in range(10000):
    for i in range(10000):
        deque_list.append(i)
        deque_list.pop()
print(time.process_time() - start_time, "seconds")  # author's run: 9.15625 seconds

## general list ##
import time
start_time = time.process_time()
just_list = []
for _ in range(10000):
    for i in range(10000):
        just_list.append(i)
        just_list.pop()
print(time.process_time() - start_time, "seconds")  # author's run: 28.6875 seconds

▶ time.clock was removed in Python 3.8, so it cannot be used in Python 3.9 > use process_time() instead

Error in Python 3.9 > AttributeError: module 'time' has no attribute 'clock'
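
The deque methods listed earlier in this section (appendleft, rotate, extendleft, reversed) can be sketched as follows. The starting values are arbitrary, and the comments show the state after each call.

```python
from collections import deque

d = deque([1, 2, 3, 4, 5])

d.appendleft(0)          # deque([0, 1, 2, 3, 4, 5])
d.rotate(2)              # shift right by 2: deque([4, 5, 0, 1, 2, 3])
d.rotate(-2)             # shift back left:  deque([0, 1, 2, 3, 4, 5])

# extendleft() appends the items one by one on the left,
# so they end up in reverse order of the input.
d.extendleft([-1, -2])   # deque([-2, -1, 0, 1, 2, 3, 4, 5])

first = d.popleft()      # -2, queue-style removal from the left
last = d.pop()           # 5, stack-style removal from the right

rev = deque(reversed(d)) # new deque in reverse order; d itself is unchanged
print(d)    # deque([-1, 0, 1, 2, 3, 4])
print(rev)  # deque([4, 3, 2, 1, 0, -1])
```

Using append() with popleft() gives a FIFO queue without the element shifting that list.pop(0) requires.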

2. from collections import OrderedDict

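Unlike the older dict, it returns items in the order they were inserted
but, dict itself preserves insertion order since Python 3.6 (a language guarantee from 3.7) > so this is mostly "how it used to be"
Can be used when sorting a dict by value or by key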

from collections import OrderedDict
d = OrderedDict()
d['x'] = 100
d['y'] = 200
d['z'] = 300
d['l'] = 500
# sort by key
for k, v in OrderedDict(sorted(d.items(), key=lambda t: t[0])).items():
    print(k, v)
# l 500
# x 100
# y 200
# z 300

# sort by value
for k, v in OrderedDict(sorted(d.items(), key=lambda t: t[1])).items():
    print(k, v)
# x 100
# y 200
# z 300
# l 500
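
Since a plain dict now keeps insertion order, the same sorting can be sketched without OrderedDict, assuming Python 3.7 or later. The last two lines show one place where the two types still behave differently: OrderedDict equality takes order into account, while dict equality does not.

```python
from collections import OrderedDict

d = {"x": 100, "y": 200, "z": 300, "l": 500}

# A plain dict built from sorted pairs keeps the sorted order.
by_key = dict(sorted(d.items(), key=lambda t: t[0]))
by_value = dict(sorted(d.items(), key=lambda t: t[1]))
print(list(by_key))     # ['l', 'x', 'y', 'z']
print(list(by_value))   # ['x', 'y', 'z', 'l']

# Equality: order matters for OrderedDict, not for dict.
print({"a": 1, "b": 2} == {"b": 2, "a": 1})                  # True
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))        # False
```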

3. from collections import defaultdict

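Sets a default value for a dict, used whenever a new key is created
Mainly used when there are too many keys to check one by one and assigning a default value is more convenient
Text-mining approach - Vector Space Model
> How many times does each word appear in a passage?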

text = """A press release is the quickest 
and easiest way to get free publicity. 
If well written, 
a press release can result in 
multiple published articles about your firm and its products. 
And that can mean new prospects 
contacting you asking you to sell to them. ….""".lower().split()

from collections import defaultdict 
from collections import OrderedDict
word_count = defaultdict(lambda: 0) # default value of 0; the argument must be a callable, so a lambda is used here (defaultdict(int) is equivalent)
for word in text:
    word_count[word] += 1 
for i, v in OrderedDict(sorted(word_count.items(), key=lambda t: t[1], reverse=True)).items():
    print(i, v)
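
To make the "default value for a new key" behaviour concrete, here is a small sketch comparing a plain dict with defaultdict. The key names and the word list are illustrative assumptions.

```python
from collections import defaultdict

# A plain dict raises KeyError when a missing key is incremented.
plain = {}
try:
    plain["missing"] += 1
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# defaultdict calls its factory (here int, which returns 0) for a missing key.
counts = defaultdict(int)
counts["missing"] += 1
print(counts["missing"])  # 1

# Any zero-argument callable works as the factory, e.g. list for grouping.
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}
```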

4. from collections import Counter

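Returns the counts of the elements of a sequence type in dict form
Also accepts a dict, keyword parameters, and so on
Supports set operations
Makes a word counter easy to build
elements() : returns an iterator that repeats each element as many times as its count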

from collections import Counter

c = Counter('gallahad')
print(c) #Counter({'a': 3, 'l': 2, 'g': 1, 'h': 1, 'd': 1})

#word counter
text = """A press release is the quickest 
and easiest way to get free publicity. 
If well written, 
a press release can result in 
multiple published articles about your firm and its products. 
And that can mean new prospects 
contacting you asking you to sell to them. ….""".lower().split()

print(Counter(text)) 
print(Counter(text)["a"])
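
The other features listed for Counter (keyword and dict input, set-style operations, elements()) can be sketched like this. The counts used are arbitrary sample values.

```python
from collections import Counter

c = Counter(a=3, b=1)              # built from keyword arguments
d = Counter({"a": 1, "b": 2})      # built from a dict

print(sorted(c.elements()))        # ['a', 'a', 'a', 'b']

print(dict(c + d))   # add counts:             {'a': 4, 'b': 3}
print(dict(c - d))   # subtract, keep only >0: {'a': 2}
print(dict(c & d))   # minimum of each count:  {'a': 1, 'b': 1}
print(dict(c | d))   # maximum of each count:  {'a': 3, 'b': 2}

# most_common(n) returns the n highest counts as (element, count) pairs.
top = Counter("gallahad").most_common(1)
print(top)           # [('a', 3)]
```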

5. from collections import namedtuple

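A way to store a data structure in tuple form
The variables (fields) of the stored data are declared in advance
Useful when a class, the blueprint for creating objects, would only have attributes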

from collections import namedtuple 
Point = namedtuple('Point', ['x', 'y']) 
p = Point(11, y=22) 
print(p[0] + p[1]) #33
x, y = p 
print(x, y) #11 22
print(p.x + p.y) #33
print(Point(x=11, y=22)) #Point(x=11, y=22)
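
Because a namedtuple is still a tuple, its fields cannot be reassigned. The sketch below shows that, along with the _replace() and _asdict() helper methods that namedtuple provides; the coordinate values are arbitrary.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(11, 22)

try:
    p.x = 100                 # fields are read only, like tuple items
    immutable = False
except AttributeError:
    immutable = True
print(immutable)              # True

moved = p._replace(x=100)     # returns a new Point; p is unchanged
print(moved)                  # Point(x=100, y=22)
print(p)                      # Point(x=11, y=22)

as_dict = dict(p._asdict())   # field names mapped to values
print(as_dict)                # {'x': 11, 'y': 22}
```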
