[Pytorch] PyTorch Basics

밍츠 2022. 10. 2. 01:07

PyTorch

- Numpy + AutoGrad + Function

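PyTorch uses tensors to encode a model's inputs, outputs, and parameters.

A tensor

- is a data structure very similar to an array or a matrix,

- and closely resembles NumPy's ndarray.

- As a result, the data types a tensor can hold largely mirror NumPy's.

(Reference: https://pytorch.org/docs/stable/tensors.html)

- Unlike an ndarray, a tensor can run on a GPU or other hardware specialized for accelerating computation.

- If you are comfortable with NumPy, the tensor API will feel familiar.

+ Knowing NumPy is a must!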
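To make the dtype correspondence concrete, here is a small sketch (the loop and variable names are illustrative, not from the original post). It assumes only that `torch.from_numpy` keeps the dtype of the source array, which is its documented behavior.

```python
import numpy as np
import torch

# For each NumPy dtype, build a small array and convert it without copying.
pairs = []
for np_dtype in (np.int64, np.float32, np.float64):
    arr = np.zeros(3, dtype=np_dtype)
    t = torch.from_numpy(arr)          # shares memory with arr, keeps the dtype
    pairs.append((arr.dtype.name, t.dtype))
    print(arr.dtype.name, "->", t.dtype)

# Going back with .numpy() restores the matching NumPy dtype.
round_trip = torch.from_numpy(np.zeros(3, dtype=np.float32)).numpy()
print(round_trip.dtype)
```

By contrast, `torch.FloatTensor(...)` as used later in this post always yields `torch.float32`, whatever the input dtype was.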

Creating tensors

First, let's see how similar tensors are to NumPy arrays

import numpy as np
n_array = np.arange(10).reshape(2,5)
print(n_array)
print("ndim :", n_array.ndim, "shape :", n_array.shape)

Output:
[[0 1 2 3 4]
 [5 6 7 8 9]]
ndim : 2 shape : (2, 5)

numpy to tensor

import torch
t_array = torch.FloatTensor(n_array)
print(t_array)
print("ndim :", t_array.ndim, "shape :", t_array.shape)

Output:
tensor([[0., 1., 2., 3., 4.],
        [5., 6., 7., 8., 9.]])
ndim : 2 shape : torch.Size([2, 5])

# data to tensor
data = [[3, 5],[10, 5]]
x_data = torch.tensor(data)

#ndArray to Tensor
nd_array_ex = np.array(data)
tensor_array = torch.from_numpy(nd_array_ex)

Operations are also similar to NumPy's.

data = [[3, 5, 20],[10, 5, 50], [1, 5, 10]]
x_data = torch.tensor(data)

print(x_data[1:])
print(x_data[:2, 1:])
print(x_data.flatten())
print(torch.ones_like(x_data))
print(x_data.numpy())
print(x_data.shape)
print(x_data.dtype)
print(x_data.device)

Output:
tensor([[10,  5, 50],
        [ 1,  5, 10]])
tensor([[ 5, 20],
        [ 5, 50]])
tensor([ 3,  5, 20, 10,  5, 50,  1,  5, 10])
tensor([[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]])
[[ 3  5 20]
 [10  5 50]
 [ 1  5 10]]
torch.Size([3, 3])
torch.int64
cpu

+ A tensor can be moved to the GPU.

if torch.cuda.is_available():
    x_data_cuda = x_data.to('cuda')
    print(x_data_cuda.device)

x_data.device : shows whether the tensor lives on the GPU or the CPU.

torch.cuda.is_available() : checks whether a GPU can be used.
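A common way to combine these two calls is to pick a device once and reuse it. The following is a minimal sketch of that pattern; the names `device`, `x_on_device`, and `result` are illustrative.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x_data = torch.tensor([[3, 5], [10, 5]])
x_on_device = x_data.to(device)        # moves the data to the chosen device

print(x_on_device.device)

# Tensors must be on the same device before they can be combined.
result = x_on_device + torch.ones_like(x_on_device)

# Move back to the CPU before converting to NumPy.
print(result.cpu().numpy())
```

Written this way, the same script runs unchanged on machines with or without a GPU.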

Tensor handling

- view, squeeze, unsqueeze and similar methods let you adjust a tensor's shape.

view : changes the tensor's shape, like reshape.
squeeze : removes dimensions of size 1 (compression). If the tensor has no dimension of size 1, it is returned unchanged.
unsqueeze : adds a dimension of size 1. ex) unsqueeze(0) inserts it at index 0, (2,2) -> (1,2,2)

tensor_ex = torch.rand(size=(2, 3, 2))
print("tensor_ex", tensor_ex)
print("view:",tensor_ex.view([-1, 6]))
print("reshape:", tensor_ex.reshape([-1,6]))

Output:
tensor_ex tensor([[[0.8120, 0.9118],
         [0.2850, 0.6782],
         [0.3935, 0.3202]],

        [[0.6795, 0.3988],
         [0.0952, 0.7066],
         [0.9754, 0.7168]]])
view: tensor([[0.8120, 0.9118, 0.2850, 0.6782, 0.3935, 0.3202],
        [0.6795, 0.3988, 0.0952, 0.7066, 0.9754, 0.7168]])
reshape: tensor([[0.8120, 0.9118, 0.2850, 0.6782, 0.3935, 0.3202],
        [0.6795, 0.3988, 0.0952, 0.7066, 0.9754, 0.7168]])

tensor_ex = torch.rand(size=(2, 1, 2))
print(tensor_ex.squeeze())

tensor_ex = torch.rand(size=(2, 2))
print(tensor_ex.unsqueeze(0).shape)
print(tensor_ex.unsqueeze(1).shape)
print(tensor_ex.unsqueeze(2).shape)

Output:
tensor([[0.1868, 0.4348],
        [0.6302, 0.6386]])
torch.Size([1, 2, 2])
torch.Size([2, 1, 2])
torch.Size([2, 2, 1])
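A few edge cases of squeeze are easy to get wrong, so here is a short sketch of them. The tensor names are illustrative; only the shapes matter.

```python
import torch

t = torch.rand(2, 1, 2)
print(t.squeeze().shape)     # torch.Size([2, 2]): every size-1 dim is removed
print(t.squeeze(1).shape)    # torch.Size([2, 2]): only dim 1 is considered
print(t.squeeze(0).shape)    # torch.Size([2, 1, 2]): dim 0 has size 2, nothing changes

no_unit_dims = torch.rand(2, 3)
print(no_unit_dims.squeeze().shape)   # torch.Size([2, 3]): returned unchanged, no error

# unsqueeze followed by squeeze on the same dim restores the original shape.
round_trip = no_unit_dims.unsqueeze(1).squeeze(1)
print(round_trip.shape)               # torch.Size([2, 3])
```

Passing the dimension explicitly, as in `squeeze(1)`, avoids accidentally dropping a batch dimension that happens to have size 1.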

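Which one should you use here, view() or reshape()?

- Both are used to change a tensor's shape.

- The difference lies in how they handle contiguity.

view raises an error when the tensor's memory layout is not compatible with the requested shape, which typically happens with non-contiguous tensors.
reshape works in that case too, copying the data if necessary.
If you cannot tell for sure what state the tensor is in, reshape is the safer choice.

- For general tensor handling, view is recommended over reshape, because its result is guaranteed to share data with the original.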

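What is contiguity?

- It describes how a tensor's elements are laid out in memory.

- A tensor is contiguous when its elements are stored in memory in the same order as they are visited in row-major order (the last dimension changes fastest).

- You can check this with tensor.is_contiguous().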
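The following sketch shows contiguity in action, using a transpose as an assumed example of an operation that breaks it. `stride()` reports how many elements to skip in memory to advance one step along each dimension.

```python
import torch

a = torch.arange(6).reshape(2, 3)
print(a.is_contiguous(), a.stride())       # True (3, 1)

a_t = a.t()                                # transpose: same storage, swapped strides
print(a_t.is_contiguous(), a_t.stride())   # False (1, 3)

try:
    a_t.view(6)                            # view needs a compatible memory layout
except RuntimeError as err:
    print("view failed:", type(err).__name__)

flat_reshape = a_t.reshape(6)              # works: reshape copies when it has to
flat_view = a_t.contiguous().view(6)       # works: contiguous() makes a row-major copy first
print(flat_reshape)                        # tensor([0, 3, 1, 4, 2, 5])
print(flat_view)                           # tensor([0, 3, 1, 4, 2, 5])
```

Calling `.contiguous()` before `.view()` is the usual workaround when a view is required on a tensor that has been transposed or permuted.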

view

a = torch.zeros(3, 2)
b = a.view(2, 3)
a.fill_(1)
print(a)
print(b)

Output:
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([[1., 1., 1.],
        [1., 1., 1.]])

reshape

a = torch.zeros(3, 2)
b = a.t().reshape(6)
a.fill_(1)
print(a)
print(b)

Output:
tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])
tensor([0., 0., 0., 0., 0., 0.])

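view() returns a tensor with the new shape, and the returned tensor shares its data with the original tensor.

reshape() does not guarantee data sharing: it returns a view when the memory layout allows it and a copy otherwise.

-> Choose according to the shape of your data and your purpose.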
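One way to check whether a result shares data with the original is to compare `data_ptr()`, the memory address of a tensor's first element. This is a sketch with illustrative names and is not part of the original post.

```python
import torch

a = torch.zeros(3, 2)

viewed = a.view(2, 3)          # always shares storage with a
reshaped = a.reshape(2, 3)     # a is contiguous, so this is also a view
copied = a.t().reshape(6)      # the transposed layout cannot be viewed as (6,), so data is copied

print(viewed.data_ptr() == a.data_ptr())     # True
print(reshaped.data_ptr() == a.data_ptr())   # True
print(copied.data_ptr() == a.data_ptr())     # False

# In-place changes to a show up only in the tensors that share its storage.
a.fill_(1)
print(reshaped.sum().item(), copied.sum().item())   # 6.0 0.0
```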

Let's look at tensor operations

n1 = np.arange(10).reshape(2,5)
n2 = np.arange(10).reshape(5,2)
t1 = torch.FloatTensor(n1) #tensor([[0., 1., 2., 3., 4.],[5., 6., 7., 8., 9.]])
t2 = torch.FloatTensor(n2) #tensor([[0., 1.],[2., 3.],[4., 5.],[6., 7.],[8., 9.]])

print(t1 + t1)
print(t1 - t1)
print(t1 + 10)

Output:
tensor([[ 0.,  2.,  4.,  6.,  8.],
        [10., 12., 14., 16., 18.]])
tensor([[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])
tensor([[10., 11., 12., 13., 14.],
        [15., 16., 17., 18., 19.]])

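Element-wise operations on tensors whose shapes are not broadcast-compatible raise an error. ex) t1 + t2, t1 - t2

Matrix multiplication uses mm, not dot.

-> dot operates on vectors (1-D tensors), while mm operates on matrices (2-D tensors).

# vector operation : dot
a = torch.rand(10)
b = torch.rand(10)
print(a.dot(b))

# matrix operations : mm, matmul
n2 = np.arange(10).reshape(5,2)
t2 = torch.FloatTensor(n2)
print(t1.mm(t2))
print(t1.matmul(t2))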

Output:
tensor(3.0554)
tensor([[ 60.,  70.],
        [160., 195.]])
tensor([[ 60.,  70.],
        [160., 195.]])

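What is the difference between mm and matmul?

- Whether broadcasting is supported.

mm : no broadcasting
matmul : supports broadcasting

What is broadcasting?

- You can think of it as allowing operations between arrays of different shapes as long as certain conditions hold. (The term carries the sense of expanding or propagating.)

- During an operation, the shapes are compared dimension by dimension, starting from the rightmost (trailing) dimension and moving left. Two sizes are compatible when they are equal or one of them is 1.

- It makes operations convenient, but the results can be confusing.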
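The rule can be tried out directly on the (2, 5) and (5, 2) tensors from earlier. This sketch rebuilds them with `torch.arange` and assumes a PyTorch version that provides `torch.broadcast_shapes` (added in 1.8).

```python
import torch

t1 = torch.arange(10, dtype=torch.float32).reshape(2, 5)
t2 = torch.arange(10, dtype=torch.float32).reshape(5, 2)

# Shapes are aligned from the right; each pair of sizes must be equal or contain a 1.
row_added = t1 + torch.ones(5)        # (2, 5) + (5,)   -> (2, 5)
col_added = t1 + torch.ones(2, 1)     # (2, 5) + (2, 1) -> (2, 5)
print(row_added.shape, col_added.shape)

# broadcast_shapes computes the resulting shape without touching any data.
print(torch.broadcast_shapes((2, 5), (2, 1)))   # torch.Size([2, 5])

failed = False
try:
    t1 + t2                           # (2, 5) + (5, 2): 5 vs 2 and 2 vs 5 do not match
except RuntimeError as err:
    failed = True
    print("not broadcastable:", type(err).__name__)
```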

Reference on the broadcasting conditions:

https://velog.io/@rhqjatn2398/Numpy-%EB%B8%8C%EB%A1%9C%EB%93%9C%EC%BA%90%EC%8A%A4%ED%8C%85Broadcasting%EC%9D%B4%EB%9E%80

mm can only be used when the matrix sizes match exactly.

torch.mm(input, mat2, *, out=None) → Tensor

size of input: (n x m), size of mat2: (m x p)

size of output: (n x p)
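As a quick check of the size rule, the sketch below uses made-up sizes n=2, m=5, p=3 and shows both a valid product and a mismatched one.

```python
import torch

n, m, p = 2, 5, 3                  # illustrative sizes
lhs = torch.rand(n, m)
rhs = torch.rand(m, p)

out = torch.mm(lhs, rhs)
print(out.shape)                   # torch.Size([2, 3])

mismatch = False
try:
    torch.mm(lhs, lhs)             # (2 x 5) times (2 x 5): inner sizes 5 and 2 differ
except RuntimeError as err:
    mismatch = True
    print("shape mismatch:", type(err).__name__)
```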

mm vs matmul code comparison

a = torch.rand(5, 2, 3)
b = torch.rand(3)
# a.mm(b)  # RuntimeError: mm only accepts 2-D tensors

a = torch.rand(5,2, 3)
b = torch.rand(3)
print(a.matmul(b))
print(a.matmul(b).shape)

Output:
tensor([[0.5123, 0.5959],
        [0.4257, 0.4484],
        [0.5071, 0.6030],
        [0.7781, 0.5534],
        [0.4299, 0.7632]])
torch.Size([5, 2])

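mm raises an error because it only accepts 2-D matrices.
matmul : when a(5,2,3) is multiplied by b(3), the leading dimension of size 5 is treated as the batch dimension, so each of the five (2,3) matrices in a is multiplied by b(3,).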
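The batch interpretation can be verified by computing each matrix-vector product separately and stacking the results. This is a sketch; `manual` is an illustrative name.

```python
import torch

a = torch.rand(5, 2, 3)
b = torch.rand(3)

batched = a.matmul(b)              # shape (5, 2)

# Same computation spelled out: one matrix-vector product per batch element.
manual = torch.stack([a[i].mv(b) for i in range(a.shape[0])])

print(batched.shape)               # torch.Size([5, 2])
print(torch.allclose(batched, manual))
```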

Tensor operations for ML/DL formula

- Let's get a quick feel for it through code

import torch
import torch.nn.functional as F

tensor = torch.FloatTensor([0.5, 0.7, 0.1])
h_tensor = F.softmax(tensor, dim=0)
h_tensor #tensor([0.3458, 0.4224, 0.2318])

y = torch.randint(5, (10,5))
print(y)
y_label = y.argmax(dim=1)
print(y_label)
print(torch.nn.functional.one_hot(y_label))

Output:
tensor([[2, 2, 0, 1, 4],
        [2, 0, 3, 2, 1],
        [1, 0, 4, 2, 2],
        [3, 4, 2, 3, 0],
        [2, 2, 1, 0, 2],
        [0, 0, 1, 2, 3],
        [2, 1, 4, 4, 3],
        [2, 0, 4, 3, 4],
        [1, 1, 3, 1, 3],
        [3, 4, 3, 2, 0]])
tensor([4, 2, 2, 1, 0, 4, 2, 2, 2, 1])
tensor([[0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0],
        [1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 1, 0, 0, 0]])
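One caveat with the one-hot example above: when `num_classes` is omitted, `one_hot` infers the width from the largest label present, so a random batch that happens to lack the highest class produces a narrower result. A minimal sketch with made-up labels:

```python
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 2, 1])

# Without num_classes the width is inferred as max(label) + 1.
inferred = F.one_hot(labels)
print(inferred.shape)              # torch.Size([3, 3])

# Fixing num_classes keeps the width stable even if the largest class is absent.
fixed = F.one_hot(labels, num_classes=5)
print(fixed.shape)                 # torch.Size([3, 5])
```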

# input values inferred from the output shown below
a = [1, 2, 3]
b = [4, 5]
tensor_a = torch.tensor(a)
tensor_b = torch.tensor(b)
print(torch.cartesian_prod(tensor_a, tensor_b))
# tensor([[1, 4],
#         [1, 5],
#         [2, 4],
#         [2, 5],
#         [3, 4],
#         [3, 5]])

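Finally, let's look at Autograd (automatic differentiation), the core of PyTorch

What is Autograd?

- PyTorch's Autograd computes gradients using automatic differentiation.

How to use Autograd

- If a tensor needs to be learned, its gradient must be obtained through backpropagation (that is, it must be differentiated).

- To compute a tensor's gradient, the following conditions must hold.

The tensor must be created with requires_grad=True.
The output where backpropagation starts must be a scalar (otherwise a gradient argument must be passed to backward()).

- To compute the gradients, call .backward() on the tensor where backpropagation should start.

- To inspect a gradient, read .grad on a tensor created with requires_grad=True. (It is None when requires_grad=False.)

- When an operation involves a tensor with requires_grad=True, the result's grad_fn records which operation produced it.

-> This information is used during backpropagation.

- y.requires_grad_(True) turns on requires_grad for an existing tensor in place. (Operations performed before that call are not recorded in grad_fn.)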

w = torch.tensor(2.0, requires_grad=True)
y = w**2
z = 10*y + 50
z.backward()
print(w.grad)

Output:
tensor(40.)
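The attributes mentioned in the list above can be inspected on a variant of the same example. In this sketch the extra tensor `c` (created without requires_grad) is an assumption added for illustration.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # leaf tensor tracked by autograd
c = torch.tensor(3.0)                       # requires_grad defaults to False

y = w ** 2
z = 10 * y + 50 * c

print(w.grad_fn)                 # None: w was created by the user, not by an operation
print(y.grad_fn is not None)     # True: y remembers the operation that produced it
print(z.requires_grad)           # True: inherited from w

z.backward()
print(w.grad)                    # tensor(40.), since dz/dw = 20 * w
print(c.grad)                    # None: c does not require gradients
```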

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = (3*a**3 - b**2)
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

print(a.grad)
print(b.grad)

Output:
tensor([36., 81.])
tensor([-12.,  -8.])

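Because Q is a vector rather than a scalar, the gradient argument must be passed explicitly, here as gradient=external_grad. external_grad represents the gradient of Q with respect to itself, and the same result can be obtained with Q.sum().backward().

My understanding is that you pass in a vector of ones with the same size as Q. The key point is that backward is applied to a scalar!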
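The equivalence between passing a vector of ones and calling Q.sum().backward() can be checked directly. This is a sketch; the helper `grads` is a hypothetical function written for this comparison.

```python
import torch

def grads(use_sum):
    """Compute a.grad and b.grad for Q = 3a^3 - b^2 using one of the two approaches."""
    a = torch.tensor([2., 3.], requires_grad=True)
    b = torch.tensor([6., 4.], requires_grad=True)
    Q = 3 * a ** 3 - b ** 2
    if use_sum:
        Q.sum().backward()                        # reduce to a scalar first
    else:
        Q.backward(gradient=torch.ones_like(Q))   # pass a vector of ones
    return a.grad, b.grad

a_vec, b_vec = grads(use_sum=False)
a_sum, b_sum = grads(use_sum=True)

print(torch.equal(a_vec, a_sum), torch.equal(b_vec, b_sum))   # True True
print(a_vec, b_vec)   # tensor([36., 81.]) tensor([-12.,  -8.])
```

Both routes give 9a^2 for a and -2b for b, because summing Q weights every element of Q by 1, exactly as the vector of ones does.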

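I came away feeling that knowing NumPy well matters for handling tensors, since the operations are so similar,

and rather than memorizing every function, I will study so that I know roughly what functionality exists and can search for the details when needed.

I also realized that even among functions with similar functionality, it is important to pick the one that fits the situation,

and I should fill in the autograd material as I study further.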
