From: David Woodhouse <dwmw2@infradead.org>
To: Paul Durrant <paul.durrant@citrix.com>,
xen-devel@lists.xenproject.org,
Roger Pau Monne <roger.pau@citrix.com>
Cc: Eslam Elnikety <elnikety@amazon.de>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Shan Haitao <haitao.shan@intel.com>,
Jan Beulich <jbeulich@suse.com>
Subject: Re: [Xen-devel] [PATCH v2] x86/hvm: re-work viridian APIC assist code
Date: Tue, 11 Aug 2020 14:25:16 +0100 [thread overview]
Message-ID: <7547be305e3ede9edb897e2be898fe54e0b4065c.camel@infradead.org> (raw)
In-Reply-To: <1535153880.24926.28.camel@infradead.org>
Resending this straw man patch at Roger's request, to restart discussion.
Redux: In order to cope with the relatively rare case of unmaskable
legacy MSIs, each vlapic EOI takes a domain-global spinlock just to
iterate over all IRQs and determine that there's actually nothing to
do.
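For readers without the tree to hand, the EOI path in question looks roughly
like this (paraphrased from xen/drivers/passthrough/io.c of this era, not
quoted verbatim, so details may differ slightly):

    void hvm_dpci_msi_eoi(struct domain *d, int vector)
    {
        if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
            return;

        /* Domain-global lock, taken on every vlapic EOI ... */
        spin_lock(&d->event_lock);
        /* ... only to walk every bound pirq looking for a matching vector. */
        pt_pirq_iterate(d, _hvm_dpci_msi_eoi, (void *)(long)vector);
        spin_unlock(&d->event_lock);
    }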
In my testing, I observe that this drops Windows performance on passed-
through NVMe from 2.2M IOPS down to about 1.0M IOPS.
I have a variant of this patch which just has a single per-domain "I
attached legacy unmaskable MSIs" flag, which is never cleared. The
patch below is per-vector (but Roger points out it should be per-vCPU
per-vector). I don't know that we really care enough to do more than
the single per-domain flag, which in real life would never get set
anyway unless you have crappy hardware, at which point you get back to
today's status quo.
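For comparison, the single-flag variant amounts to something like the sketch
below; "has_unmaskable_msi" is a made-up field name for illustration, and only
the bitmap version further down was actually written:

    /* In struct hvm_irq, instead of the 256-bit bitmap in the patch below: */
    bool has_unmaskable_msi;

    /* In pt_irq_create_bind(), where the patch below does set_bit(): */
    if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
        hvm_domain_irq(d)->has_unmaskable_msi = true;

    /* In hvm_dpci_msi_eoi(), the early-out becomes: */
    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
         !hvm_domain_irq(d)->has_unmaskable_msi )
        return;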
My main concern is that this code is fairly sparsely documented and I'm
only 99% sure that this code path really *is* only for unmaskable MSIs,
and doesn't have some other esoteric use. A second opinion on that
would be particularly welcome.
(NB: APIC assist was a red herring here, except that it had a bug which
stopped the gratuitous EOI work from ever happening at all — which
nobody ever cared about since it doesn't matter on sane hardware, but
then when Paul *fixed* it, we saw it as a performance regression.)
On Sat, 2018-08-25 at 00:38 +0100, David Woodhouse wrote:
> On Thu, 2018-01-18 at 10:10 -0500, Paul Durrant wrote:
> > Lastly the previous code did not properly emulate an EOI if a missed EOI
> > was discovered in vlapic_has_pending_irq(); it merely cleared the bit in
> > the ISR. The new code instead calls vlapic_EOI_set().
>
> Hm, this *halves* my observed performance running a 32-thread
> 'diskspd.exe' on a Windows box with attached NVMe devices, which makes
> me sad.
>
> It's the call to hvm_dpci_msi_eoi() that does it.
>
> Commenting out the call to pt_pirq_iterate() and leaving *just* the
> domain-global spinlock bouncing cache lines between all my CPUs, it's
> already down to 1.6M IOPS from 2.2M on my test box before it does
> *anything* at all.
>
> Calling an *inline* version of pt_pirq_iterate so no retpoline for the
> indirect calls, and I'm down to 1.1M even when I've nopped out the
> whole of the _hvm_dpci_msi_eoi function that it's calling. Put it all
> back, and I'm down to about 1.0M. So it's worse than halved.
>
> And what's all this for? The code here is making my eyes bleed but I
> believe it's for unmaskable MSIs, and these aren't unmaskable.
>
> Tempted to make it all go away by having a per-domain bitmap of vectors
> for which all this work is actually required, and bypassing the whole
> bloody lot in hvm_dpci_msi_eoi() if the corresponding bit in that
> bitmap isn't set.
>
> The hackish version of that (which seems to work, but would probably
> want testing with an actual unmaskable MSI in the system, and I have
> absolutely no confidence I understand what's going on here) looks
> something like this:
>
> diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
> index bab3aa3..24df008 100644
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -24,6 +24,7 @@
>  #include <asm/hvm/irq.h>
>  #include <asm/hvm/support.h>
>  #include <asm/io_apic.h>
> +#include <asm/msi.h>
>
>  static DEFINE_PER_CPU(struct list_head, dpci_list);
>
> @@ -282,6 +283,7 @@ int pt_irq_create_bind(
>      struct hvm_pirq_dpci *pirq_dpci;
>      struct pirq *info;
>      int rc, pirq = pt_irq_bind->machine_irq;
> +    irq_desc_t *desc;
>
>      if ( pirq < 0 || pirq >= d->nr_pirqs )
>          return -EINVAL;
> @@ -422,6 +425,13 @@ int pt_irq_create_bind(
>
>          dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
>          pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> +        BUG_ON(!local_irq_is_enabled());
> +        desc = pirq_spin_lock_irq_desc(info, NULL);
> +        if ( desc && desc->msi_desc && !msi_maskable_irq(desc->msi_desc) )
> +            set_bit(pirq_dpci->gmsi.gvec,
> +                    hvm_domain_irq(d)->unmaskable_msi_vecs);
> +        spin_unlock_irq(&desc->lock);
> +
>          spin_unlock(&d->event_lock);
>
>          pirq_dpci->gmsi.posted = false;
> @@ -869,7 +874,8 @@ static int _hvm_dpci_msi_eoi(struct domain *d,
>
>  void hvm_dpci_msi_eoi(struct domain *d, int vector)
>  {
> -    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci )
> +    if ( !iommu_enabled || !hvm_domain_irq(d)->dpci ||
> +         !test_bit(vector, hvm_domain_irq(d)->unmaskable_msi_vecs) )
>          return;
>
>      spin_lock(&d->event_lock);
> diff --git a/xen/include/asm-x86/hvm/irq.h b/xen/include/asm-x86/hvm/irq.h
> index 8a43cb9..d9d4652 100644
> --- a/xen/include/asm-x86/hvm/irq.h
> +++ b/xen/include/asm-x86/hvm/irq.h
> @@ -78,6 +78,7 @@ struct hvm_irq {
>      u8 round_robin_prev_vcpu;
>
>      struct hvm_irq_dpci *dpci;
> +    DECLARE_BITMAP(unmaskable_msi_vecs, 256);
>
>      /*
>       * Number of wires asserting each GSI.
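To make Roger's per-vCPU per-vector suggestion concrete, here is a standalone
sketch of the bookkeeping it implies; the names are invented for illustration
and deliberately don't match anything in the Xen tree:

    #include <limits.h>
    #include <stdbool.h>

    #define NR_VECTORS  256
    #define LONG_BITS   (sizeof(unsigned long) * CHAR_BIT)

    /* One bit per vector that has a non-maskable MSI bound to this vCPU. */
    struct vcpu_msi_state {
        unsigned long unmaskable_msi_vecs[(NR_VECTORS + LONG_BITS - 1) /
                                          LONG_BITS];
    };

    /* Set at bind time, on the destination vCPU only. */
    static void mark_unmaskable(struct vcpu_msi_state *v, unsigned int vec)
    {
        v->unmaskable_msi_vecs[vec / LONG_BITS] |= 1UL << (vec % LONG_BITS);
    }

    /* Checked first thing on the EOI path: if clear, skip the domain-global
     * lock and the pirq walk entirely. */
    static bool eoi_needs_pirq_walk(const struct vcpu_msi_state *v,
                                    unsigned int vec)
    {
        return v->unmaskable_msi_vecs[vec / LONG_BITS] &
               (1UL << (vec % LONG_BITS));
    }

The only difference from the per-domain bitmap above is that the EOI-ing vCPU
consults its own state, so a vector with an unmaskable MSI targeted at one
vCPU doesn't slow down EOIs of the same vector on every other vCPU.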