From: Wei Liu <wei.liu@kernel.org>
To: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Dexuan Cui <decui@microsoft.com>,
quic_jhugo@quicinc.com, quic_carlv@quicinc.com,
wei.liu@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
sthemmin@microsoft.com, bhelgaas@google.com,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, mikelley@microsoft.com,
robh@kernel.org, kw@linux.com, helgaas@kernel.org,
alex.williamson@redhat.com, boqun.feng@gmail.com,
Boqun.Feng@microsoft.com
Subject: Re: [PATCH v3] PCI: hv: Only reuse existing IRTE allocation for Multi-MSI
Date: Fri, 11 Nov 2022 23:40:31 +0000 [thread overview]
Message-ID: <Y27db98OD9Kpjcoe@liuwe-devbox-debian-v2> (raw)
In-Reply-To: <Y24cTE9+bqXtHics@lpieralisi>
On Fri, Nov 11, 2022 at 10:56:28AM +0100, Lorenzo Pieralisi wrote:
> On Fri, Nov 04, 2022 at 03:29:53PM -0700, Dexuan Cui wrote:
> > Jeffrey added Multi-MSI support to the pci-hyperv driver by the 4 patches:
> > 08e61e861a0e ("PCI: hv: Fix multi-MSI to allow more than one MSI vector")
> > 455880dfe292 ("PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI")
> > b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
> > a2bad844a67b ("PCI: hv: Fix interrupt mapping for multi-MSI")
> >
> > It turns out that the third patch (b4b77778ecc5) causes a performance
> > regression because all the interrupts now happen on 1 physical CPU (or two
> > pCPUs, if one pCPU doesn't have enough vectors). When a guest has many PCI
> > devices, it may suffer from soft lockups if the workload is heavy, e.g.,
> > see https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/
> >
> > Commit b4b77778ecc5 itself is good. The real issue is that the hypercall in
> > hv_irq_unmask() -> hv_arch_irq_unmask() ->
> > hv_do_hypercall(HVCALL_RETARGET_INTERRUPT...) only changes the target
> > virtual CPU rather than physical CPU; with b4b77778ecc5, the pCPU is
> > determined only once in hv_compose_msi_msg() where only vCPU0 is specified;
> > consequently the hypervisor only uses 1 target pCPU for all the interrupts.
> >
> > Note: before b4b77778ecc5, the pCPU is determined twice, and when the pCPU
> > is determinted the second time, the vCPU in the effective affinity mask is
> > used (i.e., it isn't always vCPU0), so the hypervisor chooses different
> > pCPU for each interrupt.
> >
> > The hypercall will be fixed in future to update the pCPU as well, but
> > that will take quite a while, so let's restore the old behavior in
> > hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
> > single-MSI and MSI-X; for multi-MSI, we choose the vCPU in a round-robin
> > manner for each PCI device, so the interrupts of different devices can
> > happen on different pCPUs, though the interrupts of each device happen on
> > some single pCPU.
> >
> > The hypercall fix may not be backported to all old versions of Hyper-V, so
> > we want to have this guest side change for ever (or at least till we're sure
> > the old affected versions of Hyper-V are no longer supported).
> >
> > Fixes: b4b77778ecc5 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
> > Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> > Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> > Co-developed-by: Carl Vanderlip <quic_carlv@quicinc.com>
> > Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >
> > ---
> >
> > v1 is here:
> > https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/
> >
> > Changes in v2:
> > round-robin the vCPU for multi-MSI.
> > The commit message is re-worked.
> > Added Jeff and Carl's Co-developed-by and Signed-off-by.
> >
> > Changes in v3:
> > Michael Kelley kindly helped to make a great comment, and I added the
> > comment before hv_compose_msi_req_get_cpu(). Thank you, Michael!
> >
> > Rebased to Hyper-V tree's "hyperv-fixes" branch:
> > https://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux.git/log/?h=hyperv-fixes
> >
> > Bjorn, Lorenzo, it would be great to have your Ack. The patch needs to go
> > through the Hyper-V tree because it's rebased to another hv_pci patch (which
> > only exists in the Hyper-V tree for now):
> > e70af8d040d2 ("PCI: hv: Fix the definition of vector in hv_compose_msi_msg()")
> >
> > BTW, Michael has some other hv_pci patches, which would also need go through
> > the Hyper-V tree:
> > https://lwn.net/ml/linux-kernel/1666288635-72591-1-git-send-email-mikelley%40microsoft.com/
> >
> >
> > drivers/pci/controller/pci-hyperv.c | 90 ++++++++++++++++++++++++-----
> > 1 file changed, 75 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
> > index ba64284eaf9f..fa5a1ba35a82 100644
> > --- a/drivers/pci/controller/pci-hyperv.c
> > +++ b/drivers/pci/controller/pci-hyperv.c
> > @@ -1613,7 +1613,7 @@ static void hv_pci_compose_compl(void *context, struct pci_response *resp,
> > }
> >
> > static u32 hv_compose_msi_req_v1(
> > - struct pci_create_interrupt *int_pkt, const struct cpumask *affinity,
> > + struct pci_create_interrupt *int_pkt,
> > u32 slot, u8 vector, u16 vector_count)
> > {
> > int_pkt->message_type.type = PCI_CREATE_INTERRUPT_MESSAGE;
> > @@ -1631,6 +1631,35 @@ static u32 hv_compose_msi_req_v1(
> > return sizeof(*int_pkt);
> > }
> >
> > +/*
> > + * The vCPU selected by hv_compose_multi_msi_req_get_cpu() and
> > + * hv_compose_msi_req_get_cpu() is a "dummy" vCPU because the final vCPU to be
> > + * interrupted is specified later in hv_irq_unmask() and communicated to Hyper-V
> > + * via the HVCALL_RETARGET_INTERRUPT hypercall. But the choice of dummy vCPU is
> > + * not irrelevant because Hyper-V chooses the physical CPU to handle the
> > + * interrupts based on the vCPU specified in message sent to the vPCI VSP in
> > + * hv_compose_msi_msg(). Hyper-V's choice of pCPU is not visible to the guest,
> > + * but assigning too many vPCI device interrupts to the same pCPU can cause a
> > + * performance bottleneck. So we spread out the dummy vCPUs to influence Hyper-V
> > + * to spread out the pCPUs that it selects.
> > + *
> > + * For the single-MSI and MSI-X cases, it's OK for hv_compose_msi_req_get_cpu()
> > + * to always return the same dummy vCPU, because a second call to
> > + * hv_compose_msi_msg() contains the "real" vCPU, causing Hyper-V to choose a
> > + * new pCPU for the interrupt. But for the multi-MSI case, the second call to
> > + * hv_compose_msi_msg() exits without sending a message to the vPCI VSP, so the
>
> Why ? Can't you fix _that_ ? Why can't the initial call to
> hv_compose_msi_msg() determine the _real_ target vCPU ?
Dexuan, any comment on this?
>
> > + * original dummy vCPU is used. This dummy vCPU must be round-robin'ed so that
> > + * the pCPUs are spread out. All interrupts for a multi-MSI device end up using
> > + * the same pCPU, even though the vCPUs will be spread out by later calls
> > + * to hv_irq_unmask(), but that is the best we can do now.
> > + *
> > + * With current Hyper-V, the HVCALL_RETARGET_INTERRUPT hypercall does *not*
>
> "current" Hyper-V means nothing, remove it or provide versioning
> information. Imagine yourself reading this comment some time
> in the future.
>
And this?
Wei.
next prev parent reply other threads:[~2022-11-11 23:40 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-04 22:29 [PATCH v3] PCI: hv: Only reuse existing IRTE allocation for Multi-MSI Dexuan Cui
2022-11-04 23:38 ` Michael Kelley (LINUX)
2022-11-10 20:49 ` Dexuan Cui
2022-11-10 21:43 ` Bjorn Helgaas
2022-11-11 7:17 ` Dexuan Cui
2022-11-11 9:56 ` Lorenzo Pieralisi
2022-11-11 23:40 ` Wei Liu [this message]
2022-11-12 0:58 ` Dexuan Cui
2022-11-12 12:43 ` Wei Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y27db98OD9Kpjcoe@liuwe-devbox-debian-v2 \
--to=wei.liu@kernel.org \
--cc=Boqun.Feng@microsoft.com \
--cc=alex.williamson@redhat.com \
--cc=bhelgaas@google.com \
--cc=boqun.feng@gmail.com \
--cc=decui@microsoft.com \
--cc=haiyangz@microsoft.com \
--cc=helgaas@kernel.org \
--cc=kw@linux.com \
--cc=kys@microsoft.com \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lpieralisi@kernel.org \
--cc=mikelley@microsoft.com \
--cc=quic_carlv@quicinc.com \
--cc=quic_jhugo@quicinc.com \
--cc=robh@kernel.org \
--cc=sthemmin@microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.