Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port

From: "Michael S. Tsirkin" <mst@redhat.com>
To: robin@streamhpc.com
Cc: qemu-devel@nongnu.org, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
Subject: Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port
Date: Fri, 21 Apr 2023 04:22:23 -0400
Message-ID: <20230421042013-mutt-send-email-mst@kernel.org>
In-Reply-To: <20230420153839.167418-1-robin@streamhpc.com>

On Thu, Apr 20, 2023 at 05:38:39PM +0200, robin@streamhpc.com wrote:
> From: Robin Voetter <robin@streamhpc.com>
> 
> The ROCm driver for Linux uses PCIe atomics to schedule work and
> generally communicate between the host and the device.  This does not
> currently work in QEMU with regular vfio-pci passthrough, because the
> pcie-root-port does not advertise the PCIe atomic completer
> capabilities.  When initializing the GPU from the Linux driver, it
> queries whether the PCIe connection from the CPU to GPU supports the
> required capabilities[1] in the pci_enable_atomic_ops_to_root
> function[2].  Currently the only part where this fails is checking the
> atomic completer capabilities (32 and 64 bits) on the root port[3].  In
> this case, the driver determines that PCIe atomics are not supported at
> all, and this causes ROCm programs to misbehave.  (While AMD advertises
> that there is some support for ROCm without PCIe atomics, I have never
> actually gotten that working...)
> 
> This patch allows ROCm to properly function by introducing an
> additional experimental property to the pcie-root-port,
> x-atomic-completion.

So what exactly makes it experimental? From this description, it looks
like it actually has to be enabled for things to work.
Also, please CC Alex on whether this is the correct way to do it.

>  Setting this option makes the port report
> support for the PCI_EXP_DEVCAP2_ATOMIC_COMP32 and COMP64
> capabilities.  This then makes the check from [3] pass, and
> everything seems to work appropriately after that.
> 
> To verify that the capabilities are reported correctly, one can use
> lspci to check the capabilities of the root port: lspci -vvv -s <root
> port id> should show 32bit+ and 64bit+ capabilities in DevCap2 when
> x-atomic-completion is enabled.  For example:
> 
>     -device pcie-root-port,x-atomic-completion=true,id=pcie.1
> 
> The output of lspci should include the following for the pcie root port:
> 
>     AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
> 
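
For readers following along, here is a minimal sketch of how the DevCap2
completer bits map onto that lspci line. The bit values are the ones in
Linux's include/uapi/linux/pci_regs.h; the helper names
format_atomic_caps() and has_completer_caps() are hypothetical and not
part of the patch, QEMU, or the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* AtomicOp completer bits in the PCIe Device Capabilities 2 register,
 * as defined in Linux's include/uapi/linux/pci_regs.h. */
#define PCI_EXP_DEVCAP2_ATOMIC_COMP32  0x00000080u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP64  0x00000100u
#define PCI_EXP_DEVCAP2_ATOMIC_COMP128 0x00000200u

/* Hypothetical helper: render the completer bits in the same style as
 * the lspci "AtomicOpsCap" line. Returns snprintf()'s result. */
static int format_atomic_caps(uint32_t devcap2, char *out, size_t len)
{
    assert(out != NULL);
    return snprintf(out, len, "AtomicOpsCap: 32bit%c 64bit%c 128bitCAS%c",
                    (devcap2 & PCI_EXP_DEVCAP2_ATOMIC_COMP32) ? '+' : '-',
                    (devcap2 & PCI_EXP_DEVCAP2_ATOMIC_COMP64) ? '+' : '-',
                    (devcap2 & PCI_EXP_DEVCAP2_ATOMIC_COMP128) ? '+' : '-');
}

/* Hypothetical helper: nonzero only if every bit in `required` is set,
 * which is the shape of the root port check described in the quoted
 * cover letter (32-bit and 64-bit completion both required). */
static int has_completer_caps(uint32_t devcap2, uint32_t required)
{
    return (devcap2 & required) == required;
}
```

With only COMP32 and COMP64 set, format_atomic_caps() produces the line
quoted above, and has_completer_caps() succeeds for a mask of
COMP32 | COMP64 but fails if either bit is missing.
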
> To verify that ROCm works, the following HIP program should be
> sufficient.  The work is scheduled to the GPU by signaling a semaphore
> using atomic operations from the CPU side, which is completed on the
> GPU, and the GPU-side printf works by signaling a semaphore from the GPU
> that is completed on the CPU.  It can be compiled using hipcc with
> 'hipcc -otest test.hip':
> 
>     #include <hip/hip_runtime.h>
>     __global__ void test() {
>         printf("hello, world\n");
>     }
>     int main() {
>         test<<<dim3(1), dim3(1)>>>();
>         hipDeviceSynchronize();
>     }
> 
> Previously, or when x-atomic-completion is set to false, this program
> would simply hang.  Additionally, a message along the lines of the
> following is printed to dmesg during boot if the GPU driver determines
> that atomics are not supported:
> 
>     amdgpu 0000:01:00.0: amdgpu: PCIE atomic ops is not supported
> 
> When atomics are properly supported, the above program works as
> intended, and the previous dmesg message is not printed.  For
> this I am using a simple machine setup with the following device
> options, with the GPU that I'm testing with on 03:00.0:
> 
>      -device pcie-root-port,x-atomic-completion=true,id=pcie.1
>      -device vfio-pci,host=03:00.0,bus=pcie.1
> 
> This patch does not include any automatic detection whether the root
> port of the host supports the atomic completer capabilities, nor if any
> of the physical PCIe bridges between the CPU and GPU support atomic
> routing.  The intention here is that the user should make sure that the
> host does support atomic completion on the root complex. See also some
> prior discussion[4].  I have run the full test suite of some ROCm
> libraries: rocPRIM, rocRAND, hipRAND, hipCUB and rocThrust.  All of the
> tests pass now, with some minor unrelated changes.
> 
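
Since the check on the host side is left to the user, here is a rough
sketch of one way to do it by hand: walk the capability list in a copy
of the host root port's configuration space and pull out DevCap2. It
assumes the bytes were read beforehand, for example from
/sys/bus/pci/devices/<root port id>/config (unprivileged reads of that
file normally return only the first 64 bytes, so this needs root). The
register offsets are the standard ones from pci_regs.h; read_devcap2()
is a hypothetical name, and none of this is part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration space offsets and IDs (see pci_regs.h). */
#define PCI_STATUS          0x06  /* 16-bit status register */
#define PCI_STATUS_CAP_LIST 0x10  /* capability list present */
#define PCI_CAPABILITY_LIST 0x34  /* offset of first capability */
#define PCI_CAP_ID_EXP      0x10  /* PCI Express capability ID */
#define PCI_EXP_DEVCAP2     0x24  /* DevCap2, relative to the PCIe cap */

/* Hypothetical helper: locate the PCI Express capability in a raw
 * config space buffer and return its DevCap2 register through
 * *devcap2. Returns 0 on success, -1 if the capability is missing or
 * the buffer is too short to contain it. */
static int read_devcap2(const uint8_t *cfg, size_t len, uint32_t *devcap2)
{
    assert(cfg != NULL && devcap2 != NULL);

    if (len < 64 || !(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return -1;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;

    /* Bound the walk so a malformed, looping list cannot hang us. */
    for (int ttl = 48; pos >= 0x40 && ttl > 0; ttl--) {
        if ((size_t)pos + 2 > len)
            return -1;
        if (cfg[pos] == PCI_CAP_ID_EXP) {
            size_t off = (size_t)pos + PCI_EXP_DEVCAP2;
            if (off + 4 > len)
                return -1;
            /* Config space is little-endian. */
            *devcap2 = (uint32_t)cfg[off] |
                       ((uint32_t)cfg[off + 1] << 8) |
                       ((uint32_t)cfg[off + 2] << 16) |
                       ((uint32_t)cfg[off + 3] << 24);
            return 0;
        }
        pos = cfg[pos + 1] & 0xfc;
    }
    return -1;
}
```

The returned value can then be tested against the 32-bit and 64-bit
completer bits (0x80 and 0x100). This only covers the root port; it says
nothing about AtomicOp routing on any bridges between the root port and
the GPU.
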
> Kind regards,
> 
> Robin Voetter, Stream HPC
> 
> [1] https://github.com/torvalds/linux/blob/v6.2/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L3716
> [2] https://github.com/torvalds/linux/blob/v6.2/drivers/pci/pci.c#L3781
> [3] https://github.com/torvalds/linux/blob/v6.2/drivers/pci/pci.c#L3829
> [4] https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg01815.html
> ---
> 
> Robin Voetter (1):
>   pcie: Allow generic PCIE root port to enable atomic completion
> 
>  hw/pci-bridge/gen_pcie_root_port.c | 2 ++
>  hw/pci/pcie.c                      | 6 ++++++
>  include/hw/pci/pcie_port.h         | 3 +++
>  3 files changed, 11 insertions(+)
> 
> --
> 2.39.2
