[PyTorch] Why contiguous is needed: Grad strides do not match bucket view strides



꾸준희 2023. 12. 4. 17:27


Problem

Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.

This message appeared and training did not proceed normally. (The DDP reducer emits it from reducer.cpp as a warning.)

Solution

If you run into this grad stride issue, first check whether you are training with DDP (DistributedDataParallel).

If so, make sure to append .contiguous() to your transpose() and permute() calls, like this:

transpose().contiguous()

permute().contiguous()
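To see what that suffix changes, here is a minimal sketch (the tensor and its shape are arbitrary choices for illustration) that prints strides and contiguity before and after appending .contiguous():

```python
import torch

# A 3-D tensor laid out contiguously: strides are (12, 4, 1).
x = torch.arange(24).reshape(2, 3, 4)

t = x.transpose(0, 1)   # swaps dims 0 and 1 -> shape (3, 2, 4), strides (4, 12, 1)
p = x.permute(2, 0, 1)  # reorders all dims  -> shape (4, 2, 3), strides (1, 12, 4)

print(x.is_contiguous(), t.is_contiguous(), p.is_contiguous())  # True False False

# Appending .contiguous() copies the data into a fresh row-major layout.
t_c = x.transpose(0, 1).contiguous()
p_c = x.permute(2, 0, 1).contiguous()
print(t_c.stride(), p_c.stride())                # (8, 4, 1) (6, 3, 1)
print(t_c.is_contiguous(), p_c.is_contiguous())  # True True
```

The values are unchanged; only the memory layout (and therefore the strides) differs.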

Why

Put simply, transpose and permute are operations that generally produce tensors that are no longer contiguous.

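When you change a tensor's shape with narrow, expand, view, or transpose, PyTorch does not create new data. It keeps the stored memory where it is and only changes the shape (and strides) used to read it.

In other words, the result shares memory with the original tensor while having a different shape.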
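A small sketch of that memory sharing, using a toy 2x3 tensor chosen only for illustration. Each of the four operations returns a view whose first element sits at the same address as the original, and writing through a view changes the original:

```python
import torch

x = torch.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]], strides (3, 1)

views = {
    "view": x.view(3, 2),            # same data, new shape
    "transpose": x.transpose(0, 1),  # same data, strides swapped
    "narrow": x.narrow(1, 0, 2),     # first two columns, no copy
    "expand": x[:1].expand(4, 3),    # first row repeated via stride 0
}
for name, v in views.items():
    # data_ptr() is the address of the first element; all these views start at offset 0
    print(f"{name:9s} shape={tuple(v.shape)} stride={v.stride()} "
          f"shares_memory={v.data_ptr() == x.data_ptr()}")

# Writing through a view changes the original, because nothing was copied.
t = x.transpose(0, 1)
t[0, 1] = 100          # t[0, 1] is the same element as x[1, 0]
print(x[1, 0].item())  # 100
```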

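For example, suppose one tensor is a transpose of x and the other is the original x.

Because of the transpose, the two share memory yet have different strides, so the gradients laid out to match them end up with different strides as well.

That stride mismatch is what produces the message above.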
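The "gradient layout contract" in the message refers, as far as I understand it, to the rule that a parameter's .grad is created with the same strides as the parameter itself. The sketch below uses two made-up parameters (w_view and w_contig are hypothetical names) with identical values but different layouts and shows that their gradients inherit those layouts. It does not reproduce the DDP message itself, which needs a multi-process setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
base = torch.randn(4, 3)

# Two hypothetical parameters with identical values but different memory layouts.
w_view = nn.Parameter(base.t())                 # (3, 4) view of base, strides (1, 3)
w_contig = nn.Parameter(base.t().contiguous())  # (3, 4) fresh copy, strides (4, 1)

x = torch.randn(4, 5)
loss = (w_view @ x).sum() + (w_contig @ x).sum()
loss.backward()

# Each gradient is laid out like its parameter, so the two grads end up with
# different strides even though their values are equal.
print(w_view.stride(), w_view.grad.stride())      # (1, 3) (1, 3)
print(w_contig.stride(), w_contig.grad.stride())  # (4, 1) (4, 1)
print(torch.allclose(w_view.grad, w_contig.grad))  # True
```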

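"Contiguous" literally means "adjacent."

The .contiguous() operation copies the data into a new memory block so that the elements sit next to each other in memory, in the order given by the tensor's current shape.

So calling .contiguous() on a non-contiguous tensor allocates memory at a new address, and the result no longer shares memory with the original. (If the tensor is already contiguous, .contiguous() returns it as-is without copying.)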
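A quick check of that behavior, again on an arbitrary toy tensor. The transposed view shares x's address, the .contiguous() copy does not, and an already-contiguous tensor is not copied:

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.transpose(0, 1)  # non-contiguous view of x
tc = t.contiguous()    # copies into new, row-major memory

print(t.data_ptr() == x.data_ptr())   # True: the view shares x's memory
print(tc.data_ptr() == x.data_ptr())  # False: the copy lives elsewhere
print(tc.stride())                    # (2, 1)

# Already-contiguous tensors are returned as-is, without a copy.
print(x.contiguous().data_ptr() == x.data_ptr())  # True

# Writes to the copy no longer reach the original.
tc[0, 1] = -1
print(x[1, 0].item())  # 3
```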

For reference

What is permute? A function that can reorder all of a tensor's dimensions at once. Because it rearranges dimensions, the result generally loses contiguity.

What is transpose? Unlike permute, transpose swaps exactly two dimensions.
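The difference between the two in a short sketch (shapes chosen for illustration): permute takes the new order of every dimension, while transpose names the two dimensions to swap, so a single transpose is a special case of permute:

```python
import torch

x = torch.randn(2, 3, 4)

# permute: give the new order of *all* dimensions
print(x.permute(2, 0, 1).shape)  # torch.Size([4, 2, 3])

# transpose: swap exactly two dimensions
print(x.transpose(0, 2).shape)   # torch.Size([4, 3, 2])

# A single transpose is a special case of permute
print(torch.equal(x.transpose(0, 2), x.permute(2, 1, 0)))  # True

# Both return non-contiguous views
print(x.permute(2, 0, 1).is_contiguous(), x.transpose(0, 2).is_contiguous())  # False False
```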

In short, whenever you use operations that rearrange a tensor's dimensions, such as permute or transpose, append .contiguous() so that the result is allocated in its own memory.
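Applied inside a model, the pattern might look like the following. ChannelLastLayerNorm is a hypothetical module made up for this sketch, and it is not wrapped in DistributedDataParallel here. It only shows where the .contiguous() calls go; whether this clears the DDP message in a particular model depends on where the mismatched strides actually come from.

```python
import torch
import torch.nn as nn


class ChannelLastLayerNorm(nn.Module):
    """Hypothetical module: LayerNorm over the channel dim of a (B, C, H, W) tensor."""

    def __init__(self, channels: int):
        super().__init__()
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, H, W, C); .contiguous() gives the result its own row-major memory
        x = x.permute(0, 2, 3, 1).contiguous()
        x = self.norm(x)
        # back to (B, C, H, W), again followed by .contiguous()
        return x.permute(0, 3, 1, 2).contiguous()


m = ChannelLastLayerNorm(8)
y = m(torch.randn(2, 8, 5, 5))
y.sum().backward()
print(y.shape, y.is_contiguous())  # torch.Size([2, 8, 5, 5]) True
```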

Reference 1: https://github.com/pytorch/pytorch/issues/47163 (Grad strides do not match bucket view strides. · Issue #47163 · pytorch/pytorch)

Reference 2: https://sanghyu.tistory.com/3 ([PyTorch] The differences between the view, reshape, transpose, and permute functions, and the meaning of contiguous)

Reference 3: https://f-future.tistory.com/entry/Pytorch-Contiguous ([Pytorch] Contiguous)


License: Attribution, NonCommercial