From: bugzilla-daemon@bugzilla.kernel.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 188271] New: IOMMU DMAR fault with NVIDIA CUDA peer to peer
Date: Mon, 21 Nov 2016 17:16:57 +0000
Message-ID: <bug-188271-2300@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=188271

            Bug ID: 188271
           Summary: IOMMU DMAR fault with NVIDIA CUDA peer to peer
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.8.6
          Hardware: x86-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: vadim@sourced.tech
        Regression: No

My motherboard is a Supermicro X10DRG-Q (details in the attached dmidecode
output). It has two Xeon E5-2620 v4 CPUs (details in the attached lscpu
output). Two Titan X 2016 GPUs are installed in PCIe slots (see the nvidia-smi
output). After peer-to-peer access is enabled between the two cards,
cudaMemcpyPeer() hangs and dmesg shows:

[16193.612535] DMAR: DRHD: handling fault status reg 602
[16193.617662] DMAR: [DMA Write] Request device [82:00.0] fault addr 387fc000c000 [fault reason 05] PTE Write access is not set
[16193.661857] DMAR: DRHD: handling fault status reg 702
[16193.666976] DMAR: [DMA Write] Request device [82:00.0] fault addr f8139000 [fault reason 05] PTE Write access is not set
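
For triage, the fault lines above can be pulled apart into the requesting
device, the faulting address, and the reason code. The following is a minimal
sketch, not part of the original report: the function name parse_dmar_faults
and the regular expression are illustrative assumptions based only on the line
format shown in this dmesg excerpt, and the pattern tolerates the line wrapping
introduced by email clients.

```python
import re

# Pattern inferred from the dmesg excerpt in this report (an assumption, not
# taken from kernel documentation). \s+ lets a fault entry span a wrapped line.
FAULT_RE = re.compile(
    r"DMAR: \[DMA (?P<op>Read|Write)\] Request device \[(?P<device>[0-9a-fA-F:.]+)\]\s+"
    r"fault addr\s+(?P<addr>[0-9a-fA-F]+)\s+"
    r"\[fault reason (?P<reason>\d+)\]\s+(?P<text>[^\n\[]+)"
)


def parse_dmar_faults(log):
    """Return one dict per DMAR fault entry found in the given dmesg text."""
    faults = []
    for m in FAULT_RE.finditer(log):
        faults.append({
            "op": m.group("op"),                # "Read" or "Write"
            "device": m.group("device"),        # PCI bus:device.function
            "addr": int(m.group("addr"), 16),   # faulting DMA address
            "reason": int(m.group("reason")),   # numeric fault reason
            "text": m.group("text").strip(),    # human-readable reason
        })
    return faults


if __name__ == "__main__":
    sample = (
        "[16193.617662] DMAR: [DMA Write] Request device [82:00.0] fault addr\n"
        "387fc000c000 [fault reason 05] PTE Write access is not set\n"
    )
    for fault in parse_dmar_faults(sample):
        print(fault["device"], hex(fault["addr"]), fault["reason"], fault["text"])
```

Grouping the parsed entries by device makes it easy to see that both faults
in this report come from the same requester, 82:00.0.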

I am using CoreOS, and everything runs inside a Docker container started with
--device /dev/nvidiactl --device /dev/nvidia0 --device /dev/nvidia1
--device /dev/nvidia-uvm --privileged --security-opt seccomp=unconfined

Adding intel_iommu=igfx_off to the kernel command line cures the problem,
and peer to peer then works perfectly.
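
To confirm which intel_iommu options a booted kernel actually received, the
command line can be inspected. The following is a small sketch under stated
assumptions: the helper name iommu_options is hypothetical, and the example
string is made up for illustration. On a running Linux system the real value
can be read from /proc/cmdline.

```python
def iommu_options(cmdline):
    """Return every value passed via intel_iommu=, split on commas."""
    values = []
    for token in cmdline.split():
        if token.startswith("intel_iommu="):
            # The parameter takes a comma-separated list, e.g. "on,igfx_off".
            values.extend(token.split("=", 1)[1].split(","))
    return values


if __name__ == "__main__":
    # Illustrative command line, not taken from the reporter's machine.
    example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=igfx_off"
    print(iommu_options(example))
```

On a live system, passing open("/proc/cmdline").read() to the helper shows
whether igfx_off is among the active options.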

