Installation Memo for PyTorch 1.6.0 on Huawei Kunpeng 920

After installing TensorFlow, I needed to install PyTorch to run HanLP, so this memo focuses on the PyTorch installation. I ran into a lot of difficulties, especially with LAPACK. (English version translated by GPT-3.5.)

Description

  1. The entire installation process will be done in Docker (CentOS 7) without using Conda. We will be using Python 3.8.
  2. GCC 10.2 will be used for the build. I intend to use the HanLP Python package together with PyTorch on my own machine, purely for learning and personal interest. I record the whole installation process here, including the errors I encountered, so please read the entire post before following along to avoid hitting the same issues.
  3. Docker’s CentOS 7 is the original image from Docker Hub.

Create Docker Container

Create the Docker Container

Use the following command to create a new official CentOS 7 Docker container and then enter the container to install the necessary dependencies.

docker run -d -p 9222:22 --name=pytorch-docker  -e "container=docker" --privileged=true --restart=always centos:7 /usr/sbin/init
docker exec -it 67bce55d5a71 bash
yum install wget curl telnet make net-tools initscripts sudo su openssh-server openssh-clients openssl-devel openssl zlib-devel gmp-devel mpfr-devel libmpc-devel gcc gcc-c++ zip unzip git libffi-devel -y

Console Output

[root@ecs-111 ~]# docker run -d -p 9222:22 --name=pytorch-docker  -e "container=docker" --privileged=true --restart=always centos:7 /usr/sbin/init
67bce55d5a714451b1a268642c10817b460e09765a4eff7446c8a16b8f9740d1
[root@ecs-111 ~]# docker exec -it 67bce55d5a71 bash
[root@67bce55d5a71 /]# yum install wget curl telnet make net-tools initscripts sudo su openssh-server openssh-clients openssl-devel openssl zlib-devel gmp-devel mpfr-devel libmpc-devel gcc gcc-c++ zip unzip git libffi-devel -y
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.bfsu.edu.cn
base | 3.6 kB 00:00:00
extras | 2.9 kB 00:00:00
updates | 2.9 kB 00:00:00
(1/4): base/7/aarch64/group_gz | 153 kB 00:00:00
.......
Updated:
curl.aarch64 0:7.29.0-59.el7_9.1

Dependency Updated:
glibc.aarch64 0:2.17-323.el7_9 glibc-common.aarch64 0:2.17-323.el7_9 libcurl.aarch64 0:7.29.0-59.el7_9.1 openssl-libs.aarch64 1:1.0.2k-21.el7_9
zlib.aarch64 0:1.2.7-19.el7_9

Complete!
[root@67bce55d5a71 /]#

Change the password and enable the SSH service.
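The commands, for copy-paste (passwd prompts interactively for the new password):

passwd root
systemctl start sshd && systemctl enable sshd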

[root@67bce55d5a71 /]# passwd root
Changing password for user root.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@67bce55d5a71 /]# systemctl start sshd && systemctl enable sshd
[root@67bce55d5a71 /]#

Compile GCC-10.2 (Takes about 32 minutes)

I haven’t tested using the default GCC 4.8.5, as I have faced many issues with GCC while compiling TensorFlow in the past. I also want to avoid encountering issues with GLIBCXX version incompatibility.

Download and Compile GCC-10.2

Find a mirror from GCC mirror sites and navigate to the releases/gcc-10.2.0/ directory. I chose a mirror in Japan. Download link: gcc-10.2.0.tar.gz

wget -c http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
tar -zxvf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./configure --prefix=/usr/local/gcc-10.2
make -j7
make install

Console Output

[root@67bce55d5a71 ~]# wget -c http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
--2021-03-11 07:02:16-- http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
Resolving ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)... 203.178.132.80, 2001:200:0:7c06::9393
Connecting to ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)|203.178.132.80|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 129184377 (123M), 129155731 (123M) remaining [application/x-gzip]
Saving to: 'gcc-10.2.0.tar.gz'

100%[======================================================>] 129,184,377 3.25MB/s in 32s

2021-03-11 07:34:16 (3.86 MB/s) - 'gcc-10.2.0.tar.gz' saved [129184377/129184377]

[root@67bce55d5a71 download]# tar -zxvf gcc-10.2.0.tar.gz
.....
gcc-10.2.0/.gitattributes
gcc-10.2.0/.dir-locals.el
[root@67bce55d5a71 download]# cd gcc-10.2.0
[root@67bce55d5a71 gcc-10.2.0]# ./configure --prefix=/usr/local/gcc-10.2
checking build system type... aarch64-unknown-linux-gnu
checking host system type... aarch64-unknown-linux-gnu
checking target system type... aarch64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
.....
checking whether to enable maintainer-specific portions of Makefiles... no
configure: creating ./config.status
config.status: creating Makefile
[root@67bce55d5a71 gcc-10.2.0]# make -j7 && make install
....
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[4]: Nothing to be done for `install-data-am'.
make[4]: Leaving directory `/root/download/gcc-10.2.0/aarch64-unknown-linux-gnu/libatomic'
make[3]: Leaving directory `/root/download/gcc-10.2.0/aarch64-unknown-linux-gnu/libatomic'
make[2]: Leaving directory `/root/download/gcc-10.2.0/aarch64-unknown-linux-gnu/libatomic'
make[1]: Leaving directory `/root/download/gcc-10.2.0'
[root@67bce55d5a71 gcc-10.2.0]#

Add Environment Variables

echo 'export PATH=/usr/local/gcc-10.2/bin:$PATH' >> /etc/profile
echo 'export LD_LIBRARY_PATH=/usr/local/gcc-10.2/lib64:/usr/local/gcc-10.2/lib' >> /etc/profile
source /etc/profile

Test the Installation

[root@67bce55d5a71 ~]# gcc --version
gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@67bce55d5a71 ~]#

Compile Python 3.8 (Takes about 12 minutes)

For Python 3.8, I am using Python 3.8.8. Download the installation package from Python Release Python 3.8.8 | Python.org. Download link: Python-3.8.8.tgz

Download the Installation Package

wget -c https://www.python.org/ftp/python/3.8.8/Python-3.8.8.tgz
tar -zxvf Python-3.8.8.tgz
cd Python-3.8.8

Execute the Compilation

The --enable-optimizations flag turns on profile-guided optimization: the build runs part of Python's test suite to collect profiling data, which lengthens the compile but produces a noticeably faster interpreter.

./configure --with-ssl-default-suites=openssl --enable-optimizations
make -j7
make install

Console output omitted as there were no errors.

Test Python Availability

[root@67bce55d5a71 Python-3.8.8]# python3
Python 3.8.8 (default, Mar 11 2021, 08:11:33)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

Replace pip Source with Aliyun Source

pip3 install -i https://mirrors.aliyun.com/pypi/simple/ pip -U
pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/

Compile CMake (Takes about 10 minutes)

CMake is required to build PyTorch. We will use the official source release, CMake Latest Release (3.19.6). Download link: cmake-3.19.6.tar.gz

Download the Installation Package

wget -c https://github.com/Kitware/CMake/releases/download/v3.19.6/cmake-3.19.6.tar.gz
tar -zxvf cmake-3.19.6.tar.gz
cd cmake-3.19.6

Execute the Compilation

The first step, ./configure --no-qt-gui, takes a while (about 7 minutes). It is recommended to create the libstdc++ symbolic link described below beforehand, so the bootstrap does not fail and force you to run configure again.

./configure --no-qt-gui
gmake -j7
make install

Console Error

/root/download/cmake-3.19.6/Bootstrap.cmk/cmake: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /root/download/cmake-3.19.6/Bootstrap.cmk/cmake)
/root/download/cmake-3.19.6/Bootstrap.cmk/cmake: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /root/download/cmake-3.19.6/Bootstrap.cmk/cmake)
---------------------------------------------
Error when bootstrapping CMake:
Problem while running initial CMake
---------------------------------------------

This error occurs because the bootstrapped cmake binary was built with GCC 10.2 (installed under /usr/local/gcc-10.2), but at runtime it resolves /lib64/libstdc++.so.6, which still belongs to the system GCC 4.8.5 and does not provide GLIBCXX_3.4.20 or GLIBCXX_3.4.21.
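You can confirm this by checking which GLIBCXX symbols the system library provides and which libstdc++ the bootstrapped binary actually loads (paths taken from the error above):

strings /lib64/libstdc++.so.6 | grep ^GLIBCXX          # system libstdc++ (GCC 4.8.5) typically stops at GLIBCXX_3.4.19
ldd /root/download/cmake-3.19.6/Bootstrap.cmk/cmake | grep libstdc++   # shows which libstdc++ the binary resolves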

Check the GLIBCXX versions supported by gcc-10.2

[root@67bce55d5a71 cmake-3.19.6]# strings /usr/local/gcc-10.2/lib64/libstdc++.so | grep ^GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
......
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBCXX_3.4.23
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_3.4.26
GLIBCXX_3.4.27
GLIBCXX_3.4.28
GLIBCXX_DEBUG_MESSAGE_LENGTH
GLIBCXX_3.4.21
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.16
GLIBCXX_3.4.1
GLIBCXX_3.4.28
GLIBCXX_3.4.25
....
GLIBCXX_3.4.26

gcc-10.2's libstdc++ clearly provides GLIBCXX_3.4.21 (and everything up to GLIBCXX_3.4.28). Therefore, pointing the symbolic link at it should resolve the issue.

unlink /lib64/libstdc++.so.6
ln -s /usr/local/gcc-10.2/lib64/libstdc++.so /lib64/libstdc++.so.6

Then run ./configure --no-qt-gui again.

Console Output

....
-- Checking for curses support
-- Checking for curses support - Failed
-- Looking for elf.h
-- Looking for elf.h - found
-- Looking for a Fortran compiler
-- Looking for a Fortran compiler - /usr/local/gcc-10.2/bin/gfortran
-- Performing Test run_pic_test
-- Performing Test run_pic_test - Success
-- Performing Test run_inlines_hidden_test
-- Performing Test run_inlines_hidden_test - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /root/download/cmake-3.19.6
---------------------------------------------
CMake has bootstrapped. Now run gmake.
[root@67bce55d5a71 cmake-3.19.6]# date

This time it was successful. Proceed to run gmake -j7 && make install.

Install PyTorch (Takes about 28 minutes)

Before installing PyTorch, clone its source code. This step is quite lengthy, so using a VPN is recommended for faster downloading. You can refer to the official GitHub repository for instructions: GitHub - pytorch/pytorch at v1.6.0

Clone PyTorch

This step recursively downloads many source packages: the main repository plus 33 submodules (34 repositories in total), about 1.03 GB.

git clone -b v1.6.0 --recursive https://github.com/pytorch/pytorch.git
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive

Install Python Dependencies

pip install wheel
pip install numpy ninja pyyaml setuptools cffi typing_extensions future six requests dataclasses Cython

A red error message appeared; on closer inspection it was a network timeout while pip was fetching the ninja source from GitHub. After connecting to a VPN, I re-ran the second pip command and it succeeded (a proxy also works; see the note after the output below).


-- Build files have been written to: /tmp/pip-install-p6yq4172/ninja_2beb9619a1d947568581b78204e58ab9/_skbuild/linux-aarch64-3.8/cmake-build
Scanning dependencies of target download_ninja_source
[ 10%] Creating directories for 'download_ninja_source'
[ 20%] Performing download step (download, verify and extract) for 'download_ninja_source'
-- Downloading...
dst='/tmp/pip-install-p6yq4172/ninja_2beb9619a1d947568581b78204e58ab9/_skbuild/linux-aarch64-3.8/cmake-build/v1.10.0.gfb670.kitware.jobserver-1.tar.gz'
timeout='none'
inactivity timeout='none'
-- Using src='https://github.com/kitware/ninja/archive/v1.10.0.gfb670.kitware.jobserver-1.tar.gz'
CMake Error at _skbuild/linux-aarch64-3.8/cmake-build/download_ninja_source-prefix/src/download_ninja_source-stamp/download-download_ninja_source.cmake:170 (message):
Each download failed!

error: downloading 'https://github.com/kitware/ninja/archive/v1.10.0.gfb670.kitware.jobserver-1.tar.gz' failed
status_code: 28
status_string: "Timeout was reached"
log:
--- LOG BEGIN ---
Trying 13.250.177.223:443...

connect to 13.250.177.223 port 443 failed: Connection timed out

Failed to connect to github.com port 443: Connection timed out

After re-running, it was successful.

Building wheels for collected packages: ninja
Building wheel for ninja (PEP 517) ... done
Created wheel for ninja: filename=ninja-1.10.0.post2-cp38-cp38-linux_aarch64.whl size=112136 sha256=b65f8597c88b6c58577c534e254700267877c4cb0896a12c9f80fc83d2041a50
Stored in directory: /root/.cache/pip/wheels/75/4e/92/8e0a2f0960c17371491b56a359066f9bfb43e69544a96f1881
Successfully built ninja
Installing collected packages: urllib3, pycparser, idna, chardet, certifi, typing-extensions, six, requests, pyyaml, numpy, ninja, future, dataclasses, Cython, cffi
Successfully installed Cython-0.29.22 certifi-2020.12.5 cffi-1.14.5 chardet-4.0.0 dataclasses-0.6 future-0.18.2 idna-2.10 ninja-1.10.0.post2 numpy-1.20.1 pycparser-2.20 pyyaml-5.4.1 requests-2.25.1 six-1.15.0 typing-extensions-3.7.4.3 urllib3-1.26.3
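
If a VPN is not available, exporting the standard proxy environment variables usually works as well, since pip, git and CMake's downloader all honor them. This is only a sketch; the proxy address below is a hypothetical placeholder.

# Hypothetical local proxy; replace with your own address and port
export http_proxy=http://127.0.0.1:1080
export https_proxy=http://127.0.0.1:1080
# Then re-run the failing command, e.g.:
pip install numpy ninja pyyaml setuptools cffi typing_extensions future six requests dataclasses Cython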

Build the Source Code

If you also need to run HanLP models, run yum install lapack64-devel lapack-devel first to install the LAPACK development packages, so that PyTorch is built with LAPACK support. Otherwise torch is compiled without LAPACK, and HanLP will later fail with the error LAPACK library not found in compilation.
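For reference, the install plus a quick sanity check that the development libraries are in place (package names as above):

# Install the LAPACK development packages before building PyTorch
yum install -y lapack-devel lapack64-devel
# Verify the LAPACK shared libraries are visible to the linker
ls -l /usr/lib64/liblapack*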

python3 setup.py install
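
The plain command above is what I actually ran. Optionally, PyTorch's standard build environment variables can make the CPU-only intent explicit and cap the number of parallel compile jobs; the configure output below shows CUDA, cuDNN and MKLDNN being disabled automatically anyway, so this is just an optional sketch.

# Optional: explicit CPU-only build with limited parallelism
export USE_CUDA=0 USE_CUDNN=0 USE_MKLDNN=0
export MAX_JOBS=7
python3 setup.py install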

During the configure stage, the following output indicates that LAPACK has been found.

.....
-- Performing Test CXX_HAS_AVX_3
-- Performing Test CXX_HAS_AVX_3 - Failed
-- Performing Test CXX_HAS_AVX2_1
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX2_2
-- Performing Test CXX_HAS_AVX2_2 - Failed
-- Performing Test CXX_HAS_AVX2_3
-- Performing Test CXX_HAS_AVX2_3 - Failed
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found a library with LAPACK API (generic) ----------------------- <-- this line.
disabling CUDA because NOT USE_CUDA is set
-- USE_CUDNN is set to 0. Compiling without cuDNN support
disabling ROCM because NOT USE_ROCM is set
-- MIOpen not found. Compiling without MIOpen support
disabling MKLDNN because USE_MKLDNN is not set
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for mmap
-- Looking for mmap - found
-- Looking for shm_open
-- Looking for shm_open - found
-- Looking for shm_unlink
.....

Of course, an error did appear partway through the build. The error message was:

FAILED: confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o 
/usr/local/gcc-10.2/bin/gcc -DCPUINFO_SUPPORTED_PLATFORM=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=1 -I../aten/src/ATen/native/quantized/cpu/qnnpack/include -I../aten/src/ATen/native/quantized/cpu/qnnpack/src -I../third_party/cpuinfo/deps/clog/include -I../third_party/cpuinfo/include -I../third_party/pthreadpool/include -I../third_party/FXdiv/include -I../third_party/psimd/include -I../third_party/FP16/include -isystem ../third_party/protobuf/src -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -O3 -DNDEBUG -fPIC -pthread -MD -MT confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o -MF confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o.d -o confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o -c ../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S: Assembler messages:
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Error: operand mismatch -- `mov V8.4s,V9.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: mov v8.8b,v9.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: mov v8.16b,v9.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Error: operand mismatch -- `mov v10.4s,v11.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: mov v10.8b,v11.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: mov v10.16b,v11.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Error: operand mismatch -- `mov v12.4s,V13.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: mov v12.8b,v13.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: mov v12.16b,v13.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Error: operand mismatch -- `mov V14.4s,V15.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: mov v14.8b,v15.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: mov v14.16b,v15.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Error: operand mismatch -- `mov V16.4s,V17.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: mov v16.8b,v17.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: mov v16.16b,v17.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Error: operand mismatch -- `mov V18.4s,V19.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: mov v18.8b,v19.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: mov v18.16b,v19.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Error: operand mismatch -- `mov V20.4s,V21.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: mov v20.8b,v21.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: mov v20.16b,v21.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Error: operand mismatch -- `mov V22.4s,V23.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: mov v22.8b,v23.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: mov v22.16b,v23.16b
[347/3718] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/xm-neon.c.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "setup.py", line 732, in <module>
build_deps()
File "setup.py", line 311, in build_deps
build_caffe2(version=version,
File "/root/download/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
cmake.build(my_env)
File "/root/download/pytorch/tools/setup_helpers/cmake.py", line 345, in build
self.run(build_args, my_env)
File "/root/download/pytorch/tools/setup_helpers/cmake.py", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/usr/local/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '8']' returned non-zero exit status 1.

For this, refer to the answer by Ed-Swarthout-NXP (Mar 21, 2020) in QNNPACK: GNU aarch64 assembler does not support 4s on neon mov · Issue #33124 · pytorch/pytorch · GitHub. The file aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S needs to be modified as follows (the .4s operands become .16b):

-   MOV V8.4s, V9.4s
-   MOV v10.4s, v11.4s
-   MOV v12.4s, V13.4s
-   MOV V14.4s, V15.4s
-   MOV V16.4s, V17.4s
-   MOV V18.4s, V19.4s
-   MOV V20.4s, V21.4s
-   MOV V22.4s, V23.4s
+   MOV V8.16b, V9.16b
+   MOV v10.16b, v11.16b
+   MOV v12.16b, V13.16b
+   MOV V14.16b, V15.16b
+   MOV V16.16b, V17.16b
+   MOV V18.16b, V19.16b
+   MOV V20.16b, V21.16b
+   MOV V22.16b, V23.16b

Then re-run the build command python3 setup.py install. The compilation takes about 28 minutes, and with the modifications above it completed without further errors.

Test

Run python3 and enter import torch. If there are no import errors, the installation is complete.

Console Output

[root@67bce55d5a71 download]# python3
Python 3.8.8 (default, Mar 11 2021, 08:11:33)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>
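
Beyond a bare import, a quick one-liner can confirm the installed version and that basic tensor operations work (torch.__version__ and torch.rand are standard PyTorch APIs); run it outside the source directory, as noted below.

python3 -c "import torch; print(torch.__version__); print(torch.rand(2, 3).sum())"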

Oh, by the way, remember to cd ../ out of the PyTorch source directory first. Otherwise, running import torch inside the source tree picks up the local torch/ package instead of the installed one and fails with the following error:

[root@67bce55d5a71 pytorch]# python3
Python 3.8.8 (default, Mar 11 2021, 08:11:33)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/download/pytorch/torch/__init__.py", line 335, in <module>
from .random import set_rng_state, get_rng_state, manual_seed, initial_seed, seed
File "/root/download/pytorch/torch/random.py", line 4, in <module>
from torch._C import default_generator
ImportError: cannot import name 'default_generator' from 'torch._C' (unknown location)
>>>

Export the whl File

To ensure smooth installation in the future, you can export a whl file. This requires the wheel package, which was installed earlier with pip install wheel.

python3 setup.py bdist_wheel

There were no errors, and the exported package is located at dist/torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl.

Console Output

[root@67bce55d5a71 pytorch]# python3 setup.py bdist_wheel
Building wheel torch-1.6.0a0+b31f58d
-- Building version 1.6.0a0+b31f58d
cmake --build . --target install --config Release -- -j 8
[0/1] Install the project...
-- Install configuration: "Release"
running bdist_wheel
running build
running build_py
copying torch/version.py -> build/lib.linux-aarch64-3.8/torch
....
copying caffe2/proto/metanet_pb2.py -> build/lib.linux-aarch64-3.8/caffe2/proto
running build_ext
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using CUDA
-- Not using MKLDNN
-- Not using NCCL
-- Building with distributed package

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch/lib/python3.8/site-packages/caffe2/python/caffe2_pybind11_state.cpython-38-aarch64-linux-gnu.so to /root/download/pytorch/build/lib.linux-aarch64-3.8/caffe2/python/caffe2_pybind11_state.cpython-38-aarch64-linux-gnu.so
installing to build/bdist.linux-aarch64/wheel
running install
running install_lib
creating build/bdist.linux-aarch64
creating build/bdist.linux-aarch64/wheel
.....
adding 'torch/utils/tensorboard/summary.py'
adding 'torch/utils/tensorboard/writer.py'
adding 'torch-1.6.0a0+b31f58d.dist-info/LICENSE'
adding 'torch-1.6.0a0+b31f58d.dist-info/METADATA'
adding 'torch-1.6.0a0+b31f58d.dist-info/NOTICE'
adding 'torch-1.6.0a0+b31f58d.dist-info/WHEEL'
adding 'torch-1.6.0a0+b31f58d.dist-info/entry_points.txt'
adding 'torch-1.6.0a0+b31f58d.dist-info/top_level.txt'
adding 'torch-1.6.0a0+b31f58d.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel
[root@67bce55d5a71 pytorch]# cd dist/
[root@67bce55d5a71 dist]# ls
torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl
[root@67bce55d5a71 dist]#
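
On another Kunpeng 920 machine with the same Python 3.8, the wheel can then be installed directly; a minimal sketch. Note that the wheel was built with GCC 10.2, so the target machine also needs GCC 10.2's libstdc++ on its library path (for example via the LD_LIBRARY_PATH setting used above).

# Install the exported wheel (Python 3.8, aarch64)
pip3 install torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl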

Done

Download the Compiled Installation Package

I will also provide the compiled installation package.

File: torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl

MD5: 128cf04ee699a1af0d01ce58c026aa84

Size: 73MB (76,326,517 B)

Download: Ruter Storage Service / China Telecom Cloud Drive - Access Code: yzv5