Re: [Qemu-devel] QEMU PCIe link "negotiation"


From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] QEMU PCIe link "negotiation"
Date: Tue, 16 Oct 2018 10:49:25 +0100
Message-ID: <20181016094924.GA2427@work-vm>
In-Reply-To: <20181015141841.09b945e5@w520.home>

* Alex Williamson (alex.williamson@redhat.com) wrote:
> Hi,
> 
> I'd like to start a discussion about virtual PCIe link width and speeds
> in QEMU to figure out how we progress past the 2.5GT/s, x1 width links
> we advertise today.  This matters for assigned devices as the endpoint
> driver may not enable full physical link utilization if the upstream
> port only advertises minimal capabilities.  One GPU assignment user
> has measured that they only see an average transfer rate of 3.2GB/s
> with current code, but hacking the downstream port to advertise an
> 8GT/s, x16 width link allows them to get 12GB/s.  Obviously not all
> devices and drivers will have this dependency and see these kinds of
> improvements, or perhaps any improvement at all.
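
For scale, here is a rough sketch of the theoretical per-direction
bandwidth behind the two link configurations quoted above.  It assumes
only the standard line encodings (8b/10b at 2.5 and 5GT/s, 128b/130b at
8GT/s and above) and ignores packet and protocol overhead, so real
throughput will be lower.  The function names are made up for
illustration and are not QEMU code.

```c
#include <assert.h>

/* Fraction of transferred bits that carry data for a given raw rate.
 * 2.5 and 5 GT/s links use 8b/10b encoding; 8 GT/s and faster links
 * use 128b/130b. */
static double pcie_encoding_efficiency(double gts)
{
    return gts < 8.0 ? 8.0 / 10.0 : 128.0 / 130.0;
}

/* Theoretical one-direction bandwidth in GB/s for a link running at
 * `gts` GT/s per lane with `lanes` lanes.  Protocol overhead (TLP
 * headers, DLLPs, flow control) is deliberately not modelled. */
static double pcie_link_gbytes_per_sec(double gts, int lanes)
{
    assert(lanes > 0);
    return gts * pcie_encoding_efficiency(gts) * lanes / 8.0;
}
```

Under these assumptions pcie_link_gbytes_per_sec(2.5, 1) comes to about
0.25GB/s and pcie_link_gbytes_per_sec(8.0, 16) to about 15.75GB/s, so
the 12GB/s measurement sits plausibly below the x16 ceiling.
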
> 
> The first problem seems to be how we expose these link parameters in a
> way that makes sense and supports backwards compatibility and
> migration.  I think we want the flexibility to allow the user to
> specify per PCIe device the link width and at least the maximum link
> speed, if not the actual discrete link speeds supported.  However,
> while I want to provide this flexibility, I don't necessarily think it
> makes sense to burden the user to always specify these to get
> reasonable defaults.  So I would propose that we a) add link parameters
> to the base PCIe device class and b) set defaults based on the machine
> type.  Additionally these machine type defaults would only apply to
> generic PCIe root ports and switch ports, anything based on real
> hardware would be fixed, ex. ioh3420 would stay at 2.5GT/s, x1 unless
> overridden by the user.  Existing machine types would also stay at this
> "legacy" rate, while pc-q35-3.2 might bring all generic devices up to
> PCIe 4.0 specs, x32 width and 16GT/s, where the per-endpoint
> negotiation would bring us back to negotiated widths and speeds
> matching the endpoint.  Reasonable?
> 
> Next I think we need to look at how and when we do virtual link
> negotiation.  We're mostly discussing a virtual link, so I think
> negotiation is simply filling in the negotiated link speed and width
> with the highest values common to endpoint and upstream port.  For assigned
> devices, this should match the endpoint's existing negotiated link
> parameters, however, devices can dynamically change their link speed
> (perhaps also width?), so I believe a current link speed of 2.5GT/s
> could upshift to 8GT/s without any sort of visible renegotiation.  Does
> this mean that we should have link parameter callbacks from downstream
> port to endpoint?  Or maybe the downstream port link status register
> should effectively be an alias for LNKSTA of devfn 00.0 of the
> downstream device when it exists.  We only need to report a consistent
> link status value when someone looks at it, so reading directly from
> the endpoint probably makes more sense than any sort of interface to
> keep the value current.
> 
> If we take the above approach with LNKSTA (probably also LNKSTA2), is
> any sort of "negotiation" required?  We're automatically negotiated if
> the capabilities of the upstream port are a superset of the endpoint's
> capabilities.  What do we do and what do we care about when the
> upstream port is a subset of the endpoint though?  For example, an
> 8GT/s, x16 endpoint is installed into a 2.5GT/s, x1 downstream port.
> On real hardware we obviously negotiate the endpoint down to the
> downstream port parameters.  We could do that with an emulated device,
> but this is the scenario we have today with assigned devices and we
> simply leave the inconsistency.  I don't think we actually want to
> (and there would be lots of complications to) force the physical device
> to negotiate down to match a virtual downstream port.  Do we simply
> trigger a warning that this may result in non-optimal performance and
> leave the inconsistency?
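
To make the two cases above concrete, here is a minimal sketch of the
"virtual negotiation" as the lower of each parameter, plus a check for
the subset case where a warning might be raised.  Everything here is
hypothetical (struct link_params, virtual_link_negotiate and
port_limits_endpoint are names invented for this example, not existing
QEMU interfaces), and it simplifies by assuming each side supports every
speed and width up to its maximum.

```c
#include <assert.h>
#include <stdbool.h>

/* Speed codes in the order used by the PCIe link registers:
 * 1 = 2.5GT/s, 2 = 5GT/s, 3 = 8GT/s, 4 = 16GT/s. */
enum link_speed {
    SPEED_2_5GT = 1,
    SPEED_5GT   = 2,
    SPEED_8GT   = 3,
    SPEED_16GT  = 4
};

/* Hypothetical container for the maximum speed and width of one side. */
struct link_params {
    enum link_speed speed;
    unsigned width;         /* number of lanes: 1, 2, 4, 8, 16, 32 */
};

/* Virtual negotiation: take the lower speed and the narrower width. */
static struct link_params virtual_link_negotiate(struct link_params port,
                                                 struct link_params ep)
{
    struct link_params result;

    assert(port.width > 0 && ep.width > 0);
    result.speed = port.speed < ep.speed ? port.speed : ep.speed;
    result.width = port.width < ep.width ? port.width : ep.width;
    return result;
}

/* True when the port cannot cover the endpoint's capabilities, i.e.
 * the case where an assigned device's physical link would disagree
 * with the virtual port and a warning could be emitted. */
static bool port_limits_endpoint(struct link_params port,
                                 struct link_params ep)
{
    return port.speed < ep.speed || port.width < ep.width;
}
```

With a 16GT/s, x32 port and an 8GT/s, x16 endpoint the result is simply
the endpoint's own parameters and port_limits_endpoint() is false; with
a 2.5GT/s, x1 port it returns true, which is today's situation for
assigned devices.
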
> 
> This email is already too long, but I also wonder whether we should
> consider additional vfio-pci interfaces to trigger a link retraining or
> allow virtualized access to the physical upstream port config space.
> Clearly we need to consider multi-function devices and whether there
> are useful configurations that could benefit from such access.  Thanks
> for reading, please discuss,

I'm not sure about the consequences of the actual link speeds, but I
worry we'll hit things looking for PCIe v3 features; in particular
AMD's ROCm code needs PCIe atomics:

https://github.com/RadeonOpenCompute/ROCm_Documentation/blob/master/Installation_Guide/More-about-how-ROCm-uses-PCIe-Atomics.rst

so it feels like getting that to work with passthrough would
need some negotiation of features.
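
As a rough illustration of the kind of check I mean, here is a sketch
that decides whether AtomicOps could work for an endpoint, given the
Device Capabilities 2 values of the root port and of any ports in
between.  The bit positions are the ones defined for DEVCAP2 in the PCIe
spec (and mirrored in Linux's pci_regs.h); the function name and the
idea of passing the path as a flat array are assumptions made for this
example, not how QEMU or vfio does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* AtomicOp bits in the PCIe Device Capabilities 2 register. */
#define DEVCAP2_ATOMIC_ROUTE   0x00000040u  /* AtomicOp routing supported */
#define DEVCAP2_ATOMIC_COMP32  0x00000080u  /* 32-bit AtomicOp completer  */
#define DEVCAP2_ATOMIC_COMP64  0x00000100u  /* 64-bit AtomicOp completer  */

/* Returns true if the root port advertises every completer capability
 * in `needed_comp` and every intermediate port on the path advertises
 * AtomicOp routing.  `path_devcap2` holds the DEVCAP2 values of the
 * intermediate ports and may be NULL when `n` is 0. */
static bool atomics_usable(uint32_t root_devcap2,
                           const uint32_t *path_devcap2, size_t n,
                           uint32_t needed_comp)
{
    size_t i;

    assert(n == 0 || path_devcap2 != NULL);

    if ((root_devcap2 & needed_comp) != needed_comp) {
        return false;
    }
    for (i = 0; i < n; i++) {
        if (!(path_devcap2[i] & DEVCAP2_ATOMIC_ROUTE)) {
            return false;
        }
    }
    return true;
}
```

For passthrough the virtual root port would presumably have to
advertise bits like these consistently with what the physical path can
really do, which is where the negotiation comes in.
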

Dave

> Alex
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

