From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 96881] ViennaCL fails dense_blas-bench-opencl benchmark with doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Date: Sun, 10 Jul 2016 15:20:57 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=96881

Bug ID: 96881
Summary: ViennaCL fails dense_blas-bench-opencl benchmark with doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Product: Mesa
Version: 11.2
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: Drivers/Gallium/r600
Assignee: dri-devel@lists.freedesktop.org
Reporter: ubizjak@gmail.com
QA Contact: dri-devel@lists.freedesktop.org

The dense_blas-bench-opencl benchmark from the ViennaCL suite fails with doubles on
AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0):

$ ./dense_blas-bench-opencl

----------------------------------------------
               Device Info
----------------------------------------------

Name:                AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Vendor:              AMD
Type:                GPU
Available:           1
Max Compute Units:   10
Max Work Group Size: 256
Global Mem Size:     1073741824
Local Mem Size:      32768
Local Mem Type:      1
Host Unified Memory: 1


Benchmark : BLAS
----------------
sCOPY : 64.3 GB/s
sAXPY : 95.4 GB/s
sDOT : 85.3 GB/s
sGEMV-N : 20.8 GB/s
sGEMV-T : 44.3 GB/s
sGEMM-NN : 126 GFLOPs/s
sGEMM-NT : 87.6 GFLOPs/s
sGEMM-TN : 90.5 GFLOPs/s
sGEMM-TT : 72.3 GFLOPs/s
----
Build Status = -2 ( Err = -11 )
Log: unsupported call to function __subdf3 in av_cpu
Sources: #pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void av_cpu(
  __global double * vec1,
  uint4 size1,
...

It looks like DFmode (double) instructions are not enabled correctly in LLVM
for targets that report the cl_khr_fp64 extension.
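
For anyone decoding the log above: the two numbers in the "Build Status" line appear to match constants from the Khronos CL/cl.h header, where a cl_build_status of -2 is CL_BUILD_ERROR and an error code of -11 is CL_BUILD_PROGRAM_FAILURE. The __subdf3 symbol is the conventional name of the soft-float double-subtraction routine in libgcc and compiler-rt, which suggests the backend tried to lower a double subtraction to a library call. The sketch below is only a lookup aid. The helper names build_status_name and error_name are hypothetical, and the values are copied in by hand so that it compiles without the OpenCL headers.

```c
#include <stddef.h>

/* Hypothetical helper: symbolic name for a cl_build_status value.
 * The numeric values mirror the definitions in CL/cl.h. */
const char *build_status_name(int status)
{
    switch (status) {
    case 0:  return "CL_BUILD_SUCCESS";
    case -1: return "CL_BUILD_NONE";
    case -2: return "CL_BUILD_ERROR";
    case -3: return "CL_BUILD_IN_PROGRESS";
    default: return "unknown";
    }
}

/* Hypothetical helper: symbolic name for the OpenCL error codes that
 * matter for this report. Only a small subset is listed on purpose. */
const char *error_name(int err)
{
    switch (err) {
    case 0:   return "CL_SUCCESS";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    default:  return "unknown";
    }
}
```

With these, build_status_name(-2) and error_name(-11) give the two names quoted above, which is consistent with the program failing at compile time rather than at kernel launch.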

clinfo reports:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 11.2.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 MESA 11.2.2
  Driver Version                                  11.2.2
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               10
  Max clock frequency                             850MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 0 / 0        (n/a)
    float                                                4 / 4
    double                                               2 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    ...
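
Since the report hinges on the device advertising cl_khr_fp64, an application could check for it before building double-precision kernels. The following is a minimal sketch, assuming the space-separated extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS. The helper name has_extension is hypothetical. It matches whole tokens, so a longer extension name that merely contains the wanted one does not count. Note that on this driver the check would still pass, which is exactly the mismatch described in this bug.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` occurs as a complete
 * space-separated token in `ext_list` (e.g. a CL_DEVICE_EXTENSIONS string). */
bool has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    const char *p = ext_list;

    if (len == 0)
        return false;

    /* Walk over every substring match and accept only whole tokens. */
    while ((p = strstr(p, name)) != NULL) {
        bool starts_token = (p == ext_list) || (p[-1] == ' ');
        bool ends_token = (p[len] == ' ') || (p[len] == '\0');

        if (starts_token && ends_token)
            return true;
        p += 1; /* keep searching after this partial match */
    }
    return false;
}
```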


You are receiving this mail because:
  • You are the assignee for the bug.