If you want to run your training code with ‘accelerate’ in fp8, you need to install ‘transformer_engine’ or ‘MS-AMP’. Both packages are hard to install because they depend on specific CUDA/cuDNN versions. After an afternoon of effort, I gave up and switched to using the docker image ‘nvcr.io/nvidia/pytorch:24.04-py3’ directly.

docker run \
--gpus all \
-it \
--rm \
--shm-size="16g" \
--network host \
nvcr.io/nvidia/pytorch:24.04-py3

After entering the container with the command above, I still need to install ‘accele
...
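Since the whole difficulty here is getting an fp8 backend installed, a quick check of what the current environment can actually import may save some time. The following is only a minimal sketch, not part of the original setup: the helper name `available_fp8_backends` is hypothetical, and the import names `transformer_engine` and `msamp` are assumptions about how the two packages expose themselves in Python.

```python
import importlib.util

# Assumed import names for the two fp8 backends mentioned above.
# Adjust them if your installation exposes different module names.
FP8_BACKENDS = {
    "transformer_engine": "Transformer Engine",
    "msamp": "MS-AMP",
}


def available_fp8_backends(candidates=FP8_BACKENDS):
    """Return display names of the candidate packages that can be found.

    Uses importlib.util.find_spec, so nothing is actually imported and
    no GPU is touched; it only checks that the module is discoverable.
    """
    found = []
    for module_name, display_name in candidates.items():
        if importlib.util.find_spec(module_name) is not None:
            found.append(display_name)
    return found


if __name__ == "__main__":
    backends = available_fp8_backends()
    if backends:
        print("fp8 backends found:", ", ".join(backends))
    else:
        print("No fp8 backend found; install one or use a prebuilt image.")
```

Running this inside and outside the container should show whether the image already ships one of the backends. It only confirms that the package is discoverable, not that it matches the local CUDA/cuDNN versions.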