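Installing PyTorch 1.6.0 on Huawei Kunpeng 920: Complete Notes from Download to Installation

After installing TensorFlow earlier, I still needed PyTorch to run HanLP, so this time it's PyTorch's turn. The LAPACK part in particular tripped me up many times.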

(English version translated by GPT-3.5)

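Notes

  1. The whole installation is done inside Docker (CentOS 7), without conda, using a Python 3.8 environment.
  2. Everything is built with GCC 10.2. I need to run the HanLP Python package on this machine, and it depends on PyTorch. I did this out of interest and to learn, and I recorded the whole process, including every error along the way. If you use this as a tutorial, please read it through first so you don't fall into the same traps I did.
  3. The CentOS 7 image used in Docker is the stock CentOS 7 image from Docker Hub.

Creating the Docker Container

First, create the Docker container

Use the commands below to create a brand-new container from the official CentOS 7 image, enter it, and install the required dependencies.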

docker run -d -p 9222:22 --name=pytorch-docker -e "container=docker" --privileged=true --restart=always centos:7 /usr/sbin/init
docker exec -it pytorch-docker bash
yum install wget curl telnet make net-tools initscripts sudo su openssh-server openssh-clients openssl-devel openssl zlib-devel gmp-devel mpfr-devel libmpc-devel gcc gcc-c++ zip unzip git libffi-devel -y

Console output

[root@ecs-111 ~]# docker run -d -p 9222:22 --name=pytorch-docker  -e "container=docker" --privileged=true --restart=always centos:7 /usr/sbin/init
67bce55d5a714451b1a268642c10817b460e09765a4eff7446c8a16b8f9740d1
[root@ecs-111 ~]# docker exec -it 67bce55d5a71 bash
[root@67bce55d5a71 /]# yum install wget curl telnet make net-tools initscripts sudo su openssh-server openssh-clients openssl-devel openssl zlib-devel gmp-devel mpfr-devel libmpc-devel gcc gcc-c++ zip unzip git libffi-devel -y
Loaded plugins: fastestmirror, ovl
Determining fastest mirrors
* base: mirrors.bfsu.edu.cn
* extras: mirrors.bfsu.edu.cn
* updates: mirrors.bfsu.edu.cn
base | 3.6 kB 00:00:00
extras | 2.9 kB 00:00:00
updates | 2.9 kB 00:00:00
(1/4): base/7/aarch64/group_gz | 153 kB 00:00:00
.......
Updated:
curl.aarch64 0:7.29.0-59.el7_9.1

Dependency Updated:
glibc.aarch64 0:2.17-323.el7_9 glibc-common.aarch64 0:2.17-323.el7_9 libcurl.aarch64 0:7.29.0-59.el7_9.1 openssl-libs.aarch64 1:1.0.2k-21.el7_9
zlib.aarch64 0:1.2.7-19.el7_9

Complete!
[root@67bce55d5a71 /]#

While you're at it, change the root password and start the SSH service

[root@67bce55d5a71 /]# passwd root
Changing password for user root.
New password:
BAD PASSWORD: The password is shorter than 8 characters
Retype new password:
passwd: all authentication tokens updated successfully.
[root@67bce55d5a71 /]# systemctl start sshd && systemctl enable sshd
[root@67bce55d5a71 /]#

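Compiling GCC 10.2 (takes about 32 minutes)

I haven't tried whether the default GCC 4.8.5 would work. When I compiled TensorFlow earlier, GCC problems alone cost me a lot of time. A newer GCC also avoids the insufficient GLIBCXX version problem I ran into when installing TensorFlow.

Download and build GCC 10.2

Pick a mirror from the GCC mirror sites list and go to the releases/gcc-10.2.0/ directory. I chose a mirror in Japan; download link: gcc-10.2.0.tar.gz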

wget -c http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
tar -zxvf gcc-10.2.0.tar.gz
cd gcc-10.2.0
./configure --prefix=/usr/local/gcc-10.2
make -j7
make install

Console output

[root@67bce55d5a71 ~]# wget -c http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
--2021-03-11 07:02:16-- http://ftp.tsukuba.wide.ad.jp/software/gcc/releases/gcc-10.2.0/gcc-10.2.0.tar.gz
Resolving ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)... 203.178.132.80, 2001:200:0:7c06::9393
Connecting to ftp.tsukuba.wide.ad.jp (ftp.tsukuba.wide.ad.jp)|203.178.132.80|:80... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 129184377 (123M), 129155731 (123M) remaining [application/x-gzip]
Saving to: 'gcc-10.2.0.tar.gz'

100%[======================================================>] 129,184,377 3.25MB/s in 32s

2021-03-11 07:34:16 (3.86 MB/s) - 'gcc-10.2.0.tar.gz' saved [129184377/129184377]

[root@67bce55d5a71 download]# tar -zxvf gcc-10.2.0.tar.gz
.....
gcc-10.2.0/.gitattributes
gcc-10.2.0/.dir-locals.el
[root@67bce55d5a71 download]# cd gcc-10.2.0
[root@67bce55d5a71 gcc-10.2.0]# ./configure --prefix=/usr/local/gcc-10.2
checking build system type... aarch64-unknown-linux-gnu
checking host system type... aarch64-unknown-linux-gnu
checking target system type... aarch64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
.....
checking whether to enable maintainer-specific portions of Makefiles... no
configure: creating ./config.status
config.status: creating Makefile
[root@67bce55d5a71 gcc-10.2.0]# make -j7 && make install
....
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
----------------------------------------------------------------------
make[4]: Nothing to be done for `install-data-am'.
make[4]: Leaving directory `/root/download/gcc-10.2.0/aarch64-unknown-linux-gnu/libatomic'
make[3]: Leaving directory `/root/download/gcc-10.2.0/aarch64-unknown-linux-gnu/libatomic'
make[2]: Leaving directory `/root/download/gcc-10.2.0/aarch64-unknown-linux-gnu/libatomic'
make[1]: Leaving directory `/root/download/gcc-10.2.0'
[root@67bce55d5a71 gcc-10.2.0]#

Add the environment variables

echo 'export PATH=/usr/local/gcc-10.2/bin:$PATH' >> /etc/profile
echo 'export LD_LIBRARY_PATH=/usr/local/gcc-10.2/lib64:/usr/local/gcc-10.2/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}' >> /etc/profile
source /etc/profile

Verify the installation

[root@67bce55d5a71 ~]# gcc --version
gcc (GCC) 10.2.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[root@67bce55d5a71 ~]#
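If a shell does not read /etc/profile, gcc silently falls back to the system GCC 4.8.5. For example, an interactive non-login shell such as the one from docker exec ... bash may not read it. Below is a minimal sketch for checking which compiler is on PATH. The helpers gcc_major_version and check_gcc are hypothetical, not part of any tool. The code needs Python 3.7 or newer, so it is only usable once Python 3.8 from the next step is installed.

```python
import re
import subprocess

# The version number follows the "(GCC)" label on the first line of `gcc --version`,
# e.g. "gcc (GCC) 10.2.0" or "gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)"
_GCC_BANNER = re.compile(r"\)\s+(\d+)\.(\d+)")


def gcc_major_version(version_text: str) -> int:
    """Parse the major version out of `gcc --version` output."""
    lines = version_text.splitlines()
    match = _GCC_BANNER.search(lines[0]) if lines else None
    if match is None:
        raise ValueError("unrecognized gcc --version output")
    return int(match.group(1))


def check_gcc(min_major: int = 10, gcc: str = "gcc") -> bool:
    """Run the compiler found on PATH and report whether it is at least min_major."""
    result = subprocess.run([gcc, "--version"], capture_output=True, text=True, check=True)
    return gcc_major_version(result.stdout) >= min_major
```

In the container, check_gcc() returning False would mean the PATH change from /etc/profile is not active in the current shell.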

Compiling Python 3.8 (takes about 12 minutes)

For Python 3.8 I chose Python 3.8.8. Download the source tarball Python-3.8.8.tgz from the Python Release Python 3.8.8 | Python.org page.

Download the source

wget -c https://www.python.org/ftp/python/3.8.8/Python-3.8.8.tgz
tar -zxvf Python-3.8.8.tgz
cd Python-3.8.8

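Build

--enable-optimizations turns on profile-guided optimization. The build runs Python's test suite to collect a profile, which makes the resulting interpreter run Python code faster at the cost of a longer build.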

./configure --with-ssl-default-suites=openssl --enable-optimizations
make -j7
make install

Console output omitted; it builds without errors.

Test that Python works

[root@67bce55d5a71 Python-3.8.8]# python3
Python 3.8.8 (default, Mar 11 2021, 08:11:33)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
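The REPL banner shows that the interpreter starts, but not whether the optional extension modules were built. Below is a minimal sketch; build_report is a hypothetical helper. It reads the configure arguments recorded by sysconfig and checks for the _ssl, _ctypes and zlib extension modules. pip and many packages rely on these modules, and they depend on the openssl-devel, libffi-devel and zlib-devel packages installed at the start.

```python
import importlib.util
import sysconfig


def build_report() -> dict:
    """Summarize a few properties of the running interpreter's build."""
    # CONFIG_ARGS holds the ./configure arguments on POSIX builds (None elsewhere)
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return {
        # True if ./configure was run with --enable-optimizations (PGO build)
        "optimized": "--enable-optimizations" in config_args,
        # pip needs the _ssl extension to reach HTTPS indexes such as the Aliyun mirror
        "ssl": importlib.util.find_spec("_ssl") is not None,
        # _ctypes is only built when libffi headers were available
        "ctypes": importlib.util.find_spec("_ctypes") is not None,
        # zlib is needed to unpack wheels and tarballs
        "zlib": importlib.util.find_spec("zlib") is not None,
    }
```

Run it with the new python3, e.g. print(build_report()). On a build configured as above, one would expect all four values to be True. A False for ssl usually means openssl-devel was missing when ./configure ran.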

Switch the pip index to the Aliyun mirror

pip3 install -i https://mirrors.aliyun.com/pypi/simple/ pip -U
pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/
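pip3 config set stores this setting in a pip configuration file, and pip3 config list shows what pip actually reads. If you would rather drop the file in place yourself, for example when baking it into an image, here is a sketch that writes the equivalent [global] index-url entry with configparser. The helper write_pip_conf is hypothetical. The target path is your choice, e.g. ~/.config/pip/pip.conf for the current user on Linux.

```python
import configparser
from pathlib import Path

ALIYUN_INDEX = "https://mirrors.aliyun.com/pypi/simple/"


def write_pip_conf(path, index_url: str = ALIYUN_INDEX) -> Path:
    """Set [global] index-url in a pip config file, keeping any other settings."""
    conf_path = Path(path).expanduser()
    config = configparser.ConfigParser()
    if conf_path.exists():
        config.read(conf_path)  # preserve whatever is already configured
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "index-url", index_url)
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with conf_path.open("w") as fh:
        config.write(fh)
    return conf_path
```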

Compiling CMake (takes about 10 minutes)

PyTorch's build requires CMake. I used the then-latest official release, CMake 3.19.6 (cmake-3.19.6.tar.gz).

Download the source

wget -c https://github.com/Kitware/CMake/releases/download/v3.19.6/cmake-3.19.6.tar.gz
tar -zxvf cmake-3.19.6.tar.gz
cd cmake-3.19.6

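Build

The first step, ./configure --no-qt-gui, takes a long time (about 7 minutes). It is best to create the libstdc++ symlink described below beforehand; otherwise you will have to run it again.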

./configure --no-qt-gui
gmake -j7
make install

Console error

/root/download/cmake-3.19.6/Bootstrap.cmk/cmake: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /root/download/cmake-3.19.6/Bootstrap.cmk/cmake)
/root/download/cmake-3.19.6/Bootstrap.cmk/cmake: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /root/download/cmake-3.19.6/Bootstrap.cmk/cmake)
---------------------------------------------
Error when bootstrapping CMake:
Problem while running initial CMake
---------------------------------------------

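This error occurs because my GCC 10.2 is installed under /usr/local, but the bootstrapped cmake binary loads the system's /lib64/libstdc++.so.6, which does not provide these GLIBCXX versions.

Check which GLIBCXX versions GCC 10.2's libstdc++ supports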

[root@67bce55d5a71 cmake-3.19.6]# strings /usr/local/gcc-10.2/lib64/libstdc++.so | grep ^GLIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
......
GLIBCXX_3.4.20
GLIBCXX_3.4.21
GLIBCXX_3.4.22
GLIBCXX_3.4.23
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_3.4.26
GLIBCXX_3.4.27
GLIBCXX_3.4.28
GLIBCXX_DEBUG_MESSAGE_LENGTH
GLIBCXX_3.4.21
GLIBCXX_3.4.9
GLIBCXX_3.4.10
GLIBCXX_3.4.16
GLIBCXX_3.4.1
GLIBCXX_3.4.28
GLIBCXX_3.4.25
....
GLIBCXX_3.4.26

GCC 10.2's libstdc++ clearly provides GLIBCXX_3.4.21, so all that's needed is a symlink:

unlink /lib64/libstdc++.so.6
ln -s /usr/local/gcc-10.2/lib64/libstdc++.so /lib64/libstdc++.so.6
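strings | grep works. The same check can also be scripted, so it can be rerun against /lib64/libstdc++.so.6 before and after the symlink change. Below is a minimal sketch that scans the raw bytes of a libstdc++ file for GLIBCXX_x.y.z version tags. The helpers glibcxx_versions_from_bytes, glibcxx_versions and has_glibcxx are hypothetical. Symlinks are followed because the code simply opens the path.

```python
import re
from pathlib import Path

# Version tags look like GLIBCXX_3.4.21; names such as GLIBCXX_DEBUG_MESSAGE_LENGTH are skipped
_GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)*)")


def glibcxx_versions_from_bytes(data: bytes) -> set:
    """Collect the GLIBCXX version tags found in a binary blob as integer tuples."""
    return {
        tuple(int(part) for part in match.group(1).split(b"."))
        for match in _GLIBCXX_TAG.finditer(data)
    }


def glibcxx_versions(lib_path) -> set:
    """Read a libstdc++ file (symlinks are followed) and return its GLIBCXX tags."""
    return glibcxx_versions_from_bytes(Path(lib_path).read_bytes())


def has_glibcxx(lib_path, required: str) -> bool:
    """True if the library at lib_path provides e.g. required="3.4.21"."""
    wanted = tuple(int(part) for part in required.split("."))
    return wanted in glibcxx_versions(lib_path)
```

has_glibcxx("/lib64/libstdc++.so.6", "3.4.21") should return False with the original system library and True once the symlink points at GCC 10.2's copy. The False case matches the CMake bootstrap error above.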

Then run ./configure --no-qt-gui again

Console output

....
-- Checking for curses support
-- Checking for curses support - Failed
-- Looking for elf.h
-- Looking for elf.h - found
-- Looking for a Fortran compiler
-- Looking for a Fortran compiler - /usr/local/gcc-10.2/bin/gfortran
-- Performing Test run_pic_test
-- Performing Test run_pic_test - Success
-- Performing Test run_inlines_hidden_test
-- Performing Test run_inlines_hidden_test - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /root/download/cmake-3.19.6
---------------------------------------------
CMake has bootstrapped. Now run gmake.
[root@67bce55d5a71 cmake-3.19.6]# date

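This time it succeeds; then continue with gmake -j7 && make install.

Installing PyTorch (takes about 28 minutes)

Before installing PyTorch you need to clone its source code. This takes quite a while, so using a proxy to reach GitHub is recommended. From here on you can also follow the official repository: GitHub - pytorch/pytorch at v1.6.0

Clone PyTorch

This step recursively downloads a lot of source code: 34 repositories in total (the main repository plus 33 submodules), about 1.03 GB.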

git clone -b v1.6.0 --recursive https://github.com/pytorch/pytorch.git
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive
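With 33 submodules, a flaky connection can leave some of them empty without an obvious error. Below is a minimal sketch that runs git submodule status --recursive and lists the entries git marks with a leading "-" ("not initialized"). The helpers uninitialized_submodules and check_checkout are hypothetical. A leading "+" means the checked-out commit does not match the recorded one. If the returned list is not empty, rerun git submodule update --init --recursive.

```python
import subprocess


def uninitialized_submodules(status_output: str) -> list:
    """Paths that `git submodule status` marks with a leading '-' (not initialized)."""
    missing = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # line format: "-<sha1> <path>"
            missing.append(line[1:].split()[1])
    return missing


def check_checkout(repo_dir: str = ".") -> list:
    """Run git in repo_dir and return the submodules that still need fetching."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return uninitialized_submodules(result.stdout)
```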

Install the Python dependencies

pip install wheel
pip install numpy ninja pyyaml setuptools cffi typing_extensions future six requests dataclasses Cython

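This printed a big wall of red. Looking closely, the error was caused by a network problem while downloading from GitHub (the ninja source tarball). I turned on a proxy and re-ran the second line of the code above.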


-- Build files have been written to: /tmp/pip-install-p6yq4172/ninja_2beb9619a1d947568581b78204e58ab9/_skbuild/linux-aarch64-3.8/cmake-build
Scanning dependencies of target download_ninja_source
[ 10%] Creating directories for 'download_ninja_source'
[ 20%] Performing download step (download, verify and extract) for 'download_ninja_source'
-- Downloading...
dst='/tmp/pip-install-p6yq4172/ninja_2beb9619a1d947568581b78204e58ab9/_skbuild/linux-aarch64-3.8/cmake-build/v1.10.0.gfb670.kitware.jobserver-1.tar.gz'
timeout='none'
inactivity timeout='none'
-- Using src='https://github.com/kitware/ninja/archive/v1.10.0.gfb670.kitware.jobserver-1.tar.gz'
CMake Error at _skbuild/linux-aarch64-3.8/cmake-build/download_ninja_source-prefix/src/download_ninja_source-stamp/download-download_ninja_source.cmake:170 (message):
Each download failed!

error: downloading 'https://github.com/kitware/ninja/archive/v1.10.0.gfb670.kitware.jobserver-1.tar.gz' failed
status_code: 28
status_string: "Timeout was reached"
log:
--- LOG BEGIN ---
Trying 13.250.177.223:443...

connect to 13.250.177.223 port 443 failed: Connection timed out

Failed to connect to github.com port 443: Connection timed out

After re-running it, the install succeeded

Building wheels for collected packages: ninja
Building wheel for ninja (PEP 517) ... done
Created wheel for ninja: filename=ninja-1.10.0.post2-cp38-cp38-linux_aarch64.whl size=112136 sha256=b65f8597c88b6c58577c534e254700267877c4cb0896a12c9f80fc83d2041a50
Stored in directory: /root/.cache/pip/wheels/75/4e/92/8e0a2f0960c17371491b56a359066f9bfb43e69544a96f1881
Successfully built ninja
Installing collected packages: urllib3, pycparser, idna, chardet, certifi, typing-extensions, six, requests, pyyaml, numpy, ninja, future, dataclasses, Cython, cffi
Successfully installed Cython-0.29.22 certifi-2020.12.5 cffi-1.14.5 chardet-4.0.0 dataclasses-0.6 future-0.18.2 idna-2.10 ninja-1.10.0.post2 numpy-1.20.1 pycparser-2.20 pyyaml-5.4.1 requests-2.25.1 six-1.15.0 typing-extensions-3.7.4.3 urllib3-1.26.3
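Because the first attempt died partway through, it is worth confirming that every package actually ended up installed. Below is a minimal sketch; missing_modules is a hypothetical helper. The list holds import names, which differ from the pip names in one case here: pyyaml is imported as yaml. wheel comes from the first pip command.

```python
import importlib.util

# Import names for the packages installed above (pip name "pyyaml" -> module "yaml")
BUILD_DEPS = ["wheel", "numpy", "ninja", "yaml", "setuptools", "cffi",
              "typing_extensions", "future", "six", "requests", "dataclasses", "Cython"]


def missing_modules(names) -> list:
    """Return the module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

print(missing_modules(BUILD_DEPS)) should print an empty list once every install has gone through.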

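Build the source

If, like me, you need to run HanLP models, first run yum install lapack64-devel lapack-devel to install the LAPACK development packages. Otherwise, HanLP will fail with a "LAPACK library not found in compilation" error when building its modules.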

python3 setup.py install

During the configure stage, look for the following output to tell whether LAPACK has been found

.....
-- Performing Test CXX_HAS_AVX_3
-- Performing Test CXX_HAS_AVX_3 - Failed
-- Performing Test CXX_HAS_AVX2_1
-- Performing Test CXX_HAS_AVX2_1 - Failed
-- Performing Test CXX_HAS_AVX2_2
-- Performing Test CXX_HAS_AVX2_2 - Failed
-- Performing Test CXX_HAS_AVX2_3
-- Performing Test CXX_HAS_AVX2_3 - Failed
-- Looking for cheev_
-- Looking for cheev_ - found
-- Found a library with LAPACK API (generic) ----------------------- (this line)
disabling CUDA because NOT USE_CUDA is set
-- USE_CUDNN is set to 0. Compiling without cuDNN support
disabling ROCM because NOT USE_ROCM is set
-- MIOpen not found. Compiling without MIOpen support
disabling MKLDNN because USE_MKLDNN is not set
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - found
-- Looking for mmap
-- Looking for mmap - found
-- Looking for shm_open
-- Looking for shm_open - found
-- Looking for shm_unlink
.....
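The configure output is long. Rather than scrolling for that line, you can save the build log (for example python3 setup.py install 2>&1 | tee build.log) and scan it. Below is a minimal sketch; the helpers lapack_status and lapack_status_of_file are hypothetical. The "found" message is the one shown above. The wording of the failure message ("Cannot find a library with LAPACK API") is an assumption based on PyTorch's FindLAPACK.cmake and may differ between versions.

```python
from pathlib import Path

FOUND_MARKER = "Found a library with LAPACK API"
# Assumed wording of the failure message; check your own log if this never matches
MISSING_MARKER = "Cannot find a library with LAPACK API"


def lapack_status(cmake_output: str) -> str:
    """Classify PyTorch's CMake output as 'found', 'missing' or 'unknown' for LAPACK."""
    if FOUND_MARKER in cmake_output:
        return "found"
    if MISSING_MARKER in cmake_output:
        return "missing"
    return "unknown"


def lapack_status_of_file(log_path) -> str:
    """Same check on a saved build log file."""
    return lapack_status(Path(log_path).read_text(errors="replace"))
```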

Of course, the build failed halfway through with this error

FAILED: confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o 
/usr/local/gcc-10.2/bin/gcc -DCPUINFO_SUPPORTED_PLATFORM=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DPYTORCH_QNNPACK_RUNTIME_QUANTIZATION=1 -I../aten/src/ATen/native/quantized/cpu/qnnpack/include -I../aten/src/ATen/native/quantized/cpu/qnnpack/src -I../third_party/cpuinfo/deps/clog/include -I../third_party/cpuinfo/include -I../third_party/pthreadpool/include -I../third_party/FXdiv/include -I../third_party/psimd/include -I../third_party/FP16/include -isystem ../third_party/protobuf/src -isystem ../third_party/gemmlowp -isystem ../third_party/neon2sse -O3 -DNDEBUG -fPIC -pthread -MD -MT confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o -MF confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o.d -o confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/q8gemm/8x8-dq-aarch64-neon.S.o -c ../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S: Assembler messages:
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Error: operand mismatch -- `mov V8.4s,V9.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: mov v8.8b,v9.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:690: Info: mov v8.16b,v9.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Error: operand mismatch -- `mov v10.4s,v11.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: mov v10.8b,v11.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:691: Info: mov v10.16b,v11.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Error: operand mismatch -- `mov v12.4s,V13.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: mov v12.8b,v13.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:692: Info: mov v12.16b,v13.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Error: operand mismatch -- `mov V14.4s,V15.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: mov v14.8b,v15.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:693: Info: mov v14.16b,v15.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Error: operand mismatch -- `mov V16.4s,V17.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: mov v16.8b,v17.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:694: Info: mov v16.16b,v17.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Error: operand mismatch -- `mov V18.4s,V19.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: mov v18.8b,v19.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:695: Info: mov v18.16b,v19.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Error: operand mismatch -- `mov V20.4s,V21.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: mov v20.8b,v21.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:696: Info: mov v20.16b,v21.16b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Error: operand mismatch -- `mov V22.4s,V23.4s'
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: did you mean this?
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: mov v22.8b,v23.8b
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: other valid variant(s):
../aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S:697: Info: mov v22.16b,v23.16b
[347/3718] Building C object confu-deps/pytorch_qnnpack/CMakeFiles/pytorch_qnnpack.dir/src/x8zip/xm-neon.c.o
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "setup.py", line 732, in <module>
build_deps()
File "setup.py", line 311, in build_deps
build_caffe2(version=version,
File "/root/download/pytorch/tools/build_pytorch_libs.py", line 62, in build_caffe2
cmake.build(my_env)
File "/root/download/pytorch/tools/setup_helpers/cmake.py", line 345, in build
self.run(build_args, my_env)
File "/root/download/pytorch/tools/setup_helpers/cmake.py", line 141, in run
check_call(command, cwd=self.build_dir, env=env)
File "/usr/local/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', 'install', '--config', 'Release', '--', '-j', '8']' returned non-zero exit status 1.

The fix comes from the answer Ed-Swarthout-NXP posted on Mar 21, 2020 in the GitHub issue "QNNPACK: GNU aarch64 assembler does not support 4s on neon mov · Issue #33124 · pytorch/pytorch". The file aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S needs the following change

- MOV V8.4s, V9.4s
- MOV v10.4s, v11.4s
- MOV v12.4s, V13.4s
- MOV V14.4s, V15.4s
- MOV V16.4s, V17.4s
- MOV V18.4s, V19.4s
- MOV V20.4s, V21.4s
- MOV V22.4s, V23.4s
+ MOV V8.16b, V9.16b
+ MOV v10.16b, v11.16b
+ MOV v12.16b, V13.16b
+ MOV V14.16b, V15.16b
+ MOV V16.16b, V17.16b
+ MOV V18.16b, V19.16b
+ MOV V20.16b, V21.16b
+ MOV V22.16b, V23.16b
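The assembler rejects eight instructions (lines 690 to 697, V8 through V23), and editing them by hand is easy to get wrong. Below is a minimal sketch that applies the same rewrite with a regular expression. The helpers patch_mov_4s and patch_file are hypothetical, not part of PyTorch. The substitution is safe because vector MOV is an alias of a bitwise ORR. Copying the register as 16 bytes therefore moves exactly the same 128 bits as the intended 4 x 32-bit copy.

```python
import re
from pathlib import Path

# Register-to-register vector MOV written with the .4s arrangement, e.g. "MOV V8.4s, V9.4s"
_MOV_4S = re.compile(r"(\bmov\s+)(v\d+)\.4s(\s*,\s*)(v\d+)\.4s\b", re.IGNORECASE)


def patch_mov_4s(source: str):
    """Rewrite `MOV Vd.4s, Vn.4s` as `MOV Vd.16b, Vn.16b`; returns (new_text, count)."""
    return _MOV_4S.subn(r"\1\2.16b\3\4.16b", source)


def patch_file(path) -> int:
    """Patch an assembly file in place; returns how many instructions were changed."""
    asm_path = Path(path)
    text, count = patch_mov_4s(asm_path.read_text())
    if count:
        asm_path.write_text(text)
    return count
```

Run from the PyTorch checkout, patch_file("aten/src/ATen/native/quantized/cpu/qnnpack/src/q8gemm/8x8-dq-aarch64-neon.S") should report 8 the first time and 0 on any later run.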

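Then rerun the build with python3 setup.py install. It takes about 28 minutes, and with the change above there were no further errors.

Test

Run python3 and type import torch. If the import succeeds, the installation is complete.

Console output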

[root@67bce55d5a71 download]# python3
Python 3.8.8 (default, Mar 11 2021, 08:11:33)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>>

Oh, and note: cd ../ first to leave the PyTorch source directory. Otherwise, running import torch from inside the source directory produces the following error

[root@67bce55d5a71 pytorch]# python3
Python 3.8.8 (default, Mar 11 2021, 08:11:33)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/download/pytorch/torch/__init__.py", line 335, in <module>
from .random import set_rng_state, get_rng_state, manual_seed, initial_seed, seed
File "/root/download/pytorch/torch/random.py", line 4, in <module>
from torch._C import default_generator
ImportError: cannot import name 'default_generator' from 'torch._C' (unknown location)
>>>
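The traceback shows /root/download/pytorch/torch/__init__.py being imported. When python3 starts interactively in the checkout, the current directory comes first on sys.path, so the source tree's torch package shadows the installed one. Below is a minimal sketch that checks where torch would be imported from without actually importing it. The helpers torch_origin and imported_from_source_tree are hypothetical.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def torch_origin() -> Optional[str]:
    """Where `import torch` would load torch/__init__.py from, without importing it."""
    spec = importlib.util.find_spec("torch")
    return None if spec is None else spec.origin


def imported_from_source_tree(module_file: str, source_dir: str) -> bool:
    """True if module_file lies inside source_dir (symlinks resolved)."""
    module_path = Path(os.path.realpath(module_file))
    source_path = Path(os.path.realpath(source_dir))
    return module_path == source_path or source_path in module_path.parents
```

Started inside /root/download/pytorch, imported_from_source_tree(torch_origin(), "/root/download/pytorch") should be True, which explains the error. After cd ../ it should be False.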

Export a whl file

To make the next installation go smoothly, you can export a .whl file. It takes a single command, which requires the wheel package (pip install wheel).

python3 setup.py bdist_wheel

It runs without any errors, and the exported package is written to dist/torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl

Console output

[root@67bce55d5a71 pytorch]# python3 setup.py bdist_wheel
Building wheel torch-1.6.0a0+b31f58d
-- Building version 1.6.0a0+b31f58d
cmake --build . --target install --config Release -- -j 8
[0/1] Install the project...
-- Install configuration: "Release"
running bdist_wheel
running build
running build_py
copying torch/version.py -> build/lib.linux-aarch64-3.8/torch
....
copying caffe2/proto/metanet_pb2.py -> build/lib.linux-aarch64-3.8/caffe2/proto
running build_ext
-- Building with NumPy bindings
-- Not using cuDNN
-- Not using CUDA
-- Not using MKLDNN
-- Not using NCCL
-- Building with distributed package

Copying extension caffe2.python.caffe2_pybind11_state
Copying caffe2.python.caffe2_pybind11_state from torch/lib/python3.8/site-packages/caffe2/python/caffe2_pybind11_state.cpython-38-aarch64-linux-gnu.so to /root/download/pytorch/build/lib.linux-aarch64-3.8/caffe2/python/caffe2_pybind11_state.cpython-38-aarch64-linux-gnu.so
installing to build/bdist.linux-aarch64/wheel
running install
running install_lib
creating build/bdist.linux-aarch64
creating build/bdist.linux-aarch64/wheel
.....
adding 'torch/utils/tensorboard/summary.py'
adding 'torch/utils/tensorboard/writer.py'
adding 'torch-1.6.0a0+b31f58d.dist-info/LICENSE'
adding 'torch-1.6.0a0+b31f58d.dist-info/METADATA'
adding 'torch-1.6.0a0+b31f58d.dist-info/NOTICE'
adding 'torch-1.6.0a0+b31f58d.dist-info/WHEEL'
adding 'torch-1.6.0a0+b31f58d.dist-info/entry_points.txt'
adding 'torch-1.6.0a0+b31f58d.dist-info/top_level.txt'
adding 'torch-1.6.0a0+b31f58d.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel
[root@67bce55d5a71 pytorch]# cd dist/
[root@67bce55d5a71 dist]# ls
torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl
[root@67bce55d5a71 dist]#
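The file name follows the wheel naming convention from PEP 427: {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl. The tags decide where pip will agree to install the wheel. Below is a minimal sketch that splits the name; parse_wheel_name is a hypothetical helper, and the optional build tag is ignored.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python: str
    abi: str
    platform: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split a wheel filename into its PEP 427 fields (optional build tag ignored)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when a build tag is present
        raise ValueError(f"unexpected wheel file name: {filename}")
    python, abi, platform = parts[-3:]
    return WheelTags(parts[0], parts[1], python, abi, platform)
```

For this wheel, the tags are cp38, cp38 and linux_aarch64: CPython 3.8 only, on 64-bit ARM Linux. The platform tag is the plain linux_ form rather than a manylinux one, so the wheel makes no portability promise. Because it was built with GCC 10.2, the target machine presumably also needs a libstdc++ at least that new; this is the same GLIBCXX issue seen during the CMake build.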

Done

Download the compiled package

I'm also sharing the package I built

File: torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl

MD5: 128cf04ee699a1af0d01ce58c026aa84

Size: 73 MB (76,326,517 bytes)

Download: Ruter storage service / Tianyi Cloud Drive (access code: yzv5)
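Before installing a downloaded copy, check it against the size and MD5 listed above. Below is a minimal sketch; md5_of and verify_download are hypothetical helpers, and the expected values are copied from the list above. It does the same job as md5sum, reading the file in chunks so the 73 MB wheel is never held in memory at once.

```python
import hashlib
from pathlib import Path

EXPECTED_MD5 = "128cf04ee699a1af0d01ce58c026aa84"
EXPECTED_SIZE = 76_326_517  # bytes


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """True only if both the size and the MD5 match the published values."""
    p = Path(path)
    return p.stat().st_size == EXPECTED_SIZE and md5_of(p) == EXPECTED_MD5
```

If verify_download("torch-1.6.0a0+b31f58d-cp38-cp38-linux_aarch64.whl") returns True, the file can be installed with pip3 install on a matching Python 3.8 aarch64 machine.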