From mboxrd@z Thu Jan  1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 96881] ViennaCL fails dense_blas-bench-opencl benchmark with
 doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Date: Sun, 10 Jul 2016 15:20:57 +0000
Message-ID: <bug-96881-502@http.bugs.freedesktop.org/>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0254840092=="
Return-path: <dri-devel-bounces@lists.freedesktop.org>
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
 [IPv6:2610:10:20:722:a800:ff:fe98:4b55])
 by gabe.freedesktop.org (Postfix) with ESMTP id 6B88F6E0C1
 for <dri-devel@lists.freedesktop.org>; Sun, 10 Jul 2016 15:20:57 +0000 (UTC)
List-Unsubscribe: <https://lists.freedesktop.org/mailman/options/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <https://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <https://lists.freedesktop.org/mailman/listinfo/dri-devel>,
 <mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org


--===============0254840092==
Content-Type: multipart/alternative; boundary="14681640570.29CA62fbF.4520";
 charset="UTF-8"


--14681640570.29CA62fbF.4520
Date: Sun, 10 Jul 2016 15:20:57 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

https://bugs.freedesktop.org/show_bug.cgi?id=96881

            Bug ID: 96881
           Summary: ViennaCL fails dense_blas-bench-opencl benchmark with
                    doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
           Product: Mesa
           Version: 11.2
          Hardware: x86-64 (AMD64)
                OS: Linux (All)
            Status: NEW
          Severity: normal
          Priority: medium
         Component: Drivers/Gallium/r600
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: ubizjak@gmail.com
        QA Contact: dri-devel@lists.freedesktop.org

The dense_blas-bench-opencl benchmark from the ViennaCL suite fails with
doubles on AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0):

$ ./dense_blas-bench-opencl

----------------------------------------------
               Device Info
----------------------------------------------

Name:                AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
Vendor:              AMD
Type:                GPU
Available:           1
Max Compute Units:   10
Max Work Group Size: 256
Global Mem Size:     1073741824
Local Mem Size:      32768
Local Mem Type:      1
Host Unified Memory: 1


Benchmark : BLAS
----------------
sCOPY : 64.3 GB/s
sAXPY : 95.4 GB/s
sDOT : 85.3 GB/s
sGEMV-N : 20.8 GB/s
sGEMV-T : 44.3 GB/s
sGEMM-NN : 126 GFLOPs/s
sGEMM-NT : 87.6 GFLOPs/s
sGEMM-TN : 90.5 GFLOPs/s
sGEMM-TT : 72.3 GFLOPs/s
----
Build Status = -2 ( Err = -11 )
Log: unsupported call to function __subdf3 in av_cpu
Sources: #pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void av_cpu(
  __global double * vec1,
  uint4 size1,
...

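The single-precision benchmarks complete, but the first double-precision
kernel fails to build. __subdf3 is the soft-float library routine for double
subtraction, so the backend is asked for a library call instead of using a
native instruction. It looks like DFmode (double) instructions are not enabled
correctly in LLVM for targets that report the cl_khr_fp64 extension.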
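
For anyone triaging similar output, here is a small Python sketch that
interprets the failure lines above. It is not part of ViennaCL or Mesa; the
function and table names are hypothetical. The numeric values are the ones
defined in the Khronos CL/cl.h header, and the libcall table only lists the
four basic soft-float double routines, so treat it as a starting point rather
than a complete decoder.

```python
import re

# Build status values as defined in CL/cl.h (cl_build_status).
BUILD_STATUS = {
    0: "CL_BUILD_SUCCESS",
    -1: "CL_BUILD_NONE",
    -2: "CL_BUILD_ERROR",
    -3: "CL_BUILD_IN_PROGRESS",
}

# Only the error code seen in this report; CL/cl.h defines many more.
ERRORS = {-11: "CL_BUILD_PROGRAM_FAILURE"}

# Soft-float helper routines for basic double arithmetic.
SOFT_FLOAT_DOUBLE = {
    "__adddf3": "add",
    "__subdf3": "subtract",
    "__muldf3": "multiply",
    "__divdf3": "divide",
}


def explain_build_failure(output):
    """Extract status, error, libcall and kernel name from benchmark output."""
    info = {}
    m = re.search(r"Build Status = (-?\d+) \( Err = (-?\d+) \)", output)
    if m:
        info["status"] = BUILD_STATUS.get(int(m.group(1)), "unknown")
        info["error"] = ERRORS.get(int(m.group(2)), "unknown")
    m = re.search(r"unsupported call to function (\w+) in (\w+)", output)
    if m:
        info["libcall"] = m.group(1)
        info["kernel"] = m.group(2)
        # None if the function is not one of the known double helpers.
        info["double_op"] = SOFT_FLOAT_DOUBLE.get(m.group(1))
    return info
```

Fed the two failure lines from this report, it names the status as
CL_BUILD_ERROR, the error as CL_BUILD_PROGRAM_FAILURE, and the failing
operation as a double subtraction in kernel av_cpu.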

clinfo reports:

Number of platforms                               1
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 MESA 11.2.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   Clover
Number of devices                                 1
  Device Name                                     AMD CYPRESS (DRM 2.43.0, LLVM 3.8.0)
  Device Vendor                                   AMD
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.1 MESA 11.2.2
  Driver Version                                  11.2.2
  Device OpenCL C Version                         OpenCL C 1.1
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               10
  Max clock frequency                             850MHz
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              64
  Preferred / native vector sizes
    char                                                16 / 16
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 0 / 0        (n/a)
    float                                                4 / 4
    double                                               2 / 2       (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
    ...
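
To check other devices for the same mismatch (fp64 advertised, but double
kernels fail to build), a minimal Python sketch that scans clinfo text for the
double-precision line. It assumes the line layout shown above, which may
differ between clinfo versions; the function name is hypothetical.

```python
import re


def advertises_fp64(clinfo_text):
    """Return True if the double-precision line lists cl_khr_fp64."""
    pattern = r"\s*Double-precision Floating-point support\s+\((.*)\)\s*$"
    for line in clinfo_text.splitlines():
        m = re.match(pattern, line)
        if m:
            # The parenthesised field holds the extension name, or "n/a".
            return "cl_khr_fp64" in m.group(1)
    return False
```

A device that returns True here but still fails to build double kernels would
be showing the behaviour described in this report.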


--14681640570.29CA62fbF.4520--

--===============0254840092==--