RuntimeError: CUDA error: invalid device ordinal

프로개발러 2023. 7. 21. 14:26

Reference (Stack Overflow): How to solve "RuntimeError: CUDA error: invalid device ordinal"?
https://stackoverflow.com/questions/64334033/how-to-solve-runtimeerror-cuda-error-invalid-device-ordinal

I ran into this problem while setting up Alpaca.

The answer is in the Stack Overflow thread linked above.

│ /usr/local/lib/python3.10/dist-packages/accelerate/state.py:197 in __init__                      │
│                                                                                                  │
│   194 │   │   │   │   self.local_process_index = int(os.environ.get("LOCAL_RANK", -1))           │
│   195 │   │   │   │   if self.device is None:                                                    │
│   196 │   │   │   │   │   self.device = torch.device("cuda", self.local_process_index)           │
│ ❱ 197 │   │   │   │   torch.cuda.set_device(self.device)                                         │
│   198 │   │   │   elif get_int_from_env(["PMI_SIZE", "OMPI_COMM_WORLD_SIZE", "MV2_COMM_WORLD_S   │
│   199 │   │   │   │   if not cpu and is_xpu_available():                                         │
│   200 │   │   │   │   │   self.distributed_type = DistributedType.MULTI_XPU                      │
│                                                                                                  │
│ /usr/local/lib/python3.10/dist-packages/torch/cuda/__init__.py:326 in set_device                 │
│                                                                                                  │
│   323 │   """                                                                                    │
│   324 │   device = _get_device_index(device)                                                     │
│   325 │   if device >= 0:                                                                        │
│ ❱ 326 │   │   torch._C._cuda_setDevice(device)                                                   │
│   327                                                                                            │
│   328                                                                                            │
│   329 def get_device_name(device: Optional[_device_t] = None) -> str:                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid device ordinal
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 357913) of binary: /usr/bin/python3
Traceback (most recent call last):
.

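This is what the invalid device ordinal error looks like.

The cause: the machine has only one GPU, so CUDA_VISIBLE_DEVICES has to be set explicitly. As the traceback shows, accelerate builds the device from LOCAL_RANK (torch.device("cuda", self.local_process_index)) and passes it to torch.cuda.set_device. If that index is not smaller than the number of GPUs the process can see, CUDA rejects it as an invalid device ordinal.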
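A minimal sketch of the check that fails, assuming the launcher sets LOCAL_RANK the way torchrun does. `check_local_rank` is a hypothetical helper, not part of accelerate or PyTorch; in a real script you would pass `torch.cuda.device_count()` as `num_visible_gpus`. It mirrors the traceback: the rank becomes the CUDA device index, and any index at or above the visible GPU count is the invalid ordinal. The same thing happens if more processes are launched than there are visible GPUs (for example `torchrun --nproc_per_node=4` on a single GPU), because some ranks then ask for a device that does not exist.

```python
import os
from typing import Mapping, Optional


def check_local_rank(num_visible_gpus: int,
                     env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: return the CUDA index this process would select.

    Mirrors the traceback: accelerate turns LOCAL_RANK into
    torch.device("cuda", LOCAL_RANK) and calls torch.cuda.set_device on it.
    Any rank >= num_visible_gpus is the "invalid device ordinal" case.
    A missing LOCAL_RANK is returned as -1 (this sketch's own convention).
    """
    env = os.environ if env is None else env
    local_rank = int(env.get("LOCAL_RANK", -1))
    if local_rank >= num_visible_gpus:
        raise RuntimeError(
            f"LOCAL_RANK={local_rank}, but only {num_visible_gpus} GPU(s) are "
            "visible; torch.cuda.set_device would fail with invalid device ordinal"
        )
    return local_rank


# With one visible GPU, rank 0 is fine and rank 1 reproduces the error.
print(check_local_rank(1, {"LOCAL_RANK": "0"}))  # 0
try:
    check_local_rank(1, {"LOCAL_RANK": "1"})
except RuntimeError as err:
    print(err)
```

Passing the environment as a plain mapping keeps the check testable without a GPU; by default it reads the real `os.environ`.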

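First check what the variable is currently set to:

echo $CUDA_VISIBLE_DEVICES

If it prints nothing, set it to the index of your GPU. GPU indices start at 0, so on a single-GPU machine the value is 0 (on a multi-GPU machine, use the index of the GPU you want, e.g. 1):

export CUDA_VISIBLE_DEVICES=0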
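To see why the value matters, here is a simplified Python model of how CUDA interprets CUDA_VISIBLE_DEVICES. `visible_gpus` is a hypothetical helper, not a PyTorch or CUDA API, and it assumes integer indices only (CUDA also accepts GPU UUIDs, which are not modelled). The rules it encodes follow the CUDA documentation: the listed GPUs are renumbered from 0 in the order given, and an invalid index hides itself and every entry after it.

```python
from typing import List, Optional


def visible_gpus(physical_count: int, cuda_visible_devices: Optional[str]) -> List[int]:
    """Simplified model of CUDA_VISIBLE_DEVICES (hypothetical helper).

    Returns the physical GPU indices a process can see, in order; position i
    in the result is what PyTorch calls cuda:i.
    """
    if cuda_visible_devices is None:
        return list(range(physical_count))  # variable unset: every GPU is visible
    visible: List[int] = []
    for token in cuda_visible_devices.split(","):
        try:
            idx = int(token)
        except ValueError:
            break  # empty or non-numeric entry (UUIDs are not modelled): stop here
        if not 0 <= idx < physical_count:
            break  # CUDA hides an invalid index and every entry after it
        visible.append(idx)
    return visible


# Single-GPU machine: the only physical index is 0.
print(visible_gpus(1, "0"))  # [0]
print(visible_gpus(1, "1"))  # [] -> no GPU left, so use 0 here
# Four-GPU machine: exposing only GPU 1 makes it cuda:0 inside the process.
print(visible_gpus(4, "1"))  # [1]
```

Note that the process never sees the physical numbers: after `export CUDA_VISIBLE_DEVICES=1` on a multi-GPU machine, code inside the process still addresses that GPU as `cuda:0`.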