From: "Cédric Le Goater" <clg@kaod.org>
To: Nicholas Piggin <npiggin@gmail.com>, <linuxppc-dev@lists.ozlabs.org>
Cc: Thomas Gleixner <tglx@linutronix.de>, Greg Kurz <groug@kaod.org>
Subject: Re: [PATCH v3 8/9] powerpc/xive: Map one IPI interrupt per node
Date: Fri, 2 Apr 2021 13:31:53 +0200
Message-ID: <ae512e1b-91ee-1cc8-c6a8-36e014ed8788@kaod.org>
In-Reply-To: <1617279896.hl1bn8g3eg.astroid@bobo.none>
On 4/1/21 2:50 PM, Nicholas Piggin wrote:
> Excerpts from Cédric Le Goater's message of April 1, 2021 12:45 am:
>> ipistorm [*] can be used to benchmark the raw interrupt rate of an
>> interrupt controller by measuring the number of IPIs a system can
>> sustain. When applied to the XIVE interrupt controller of POWER9 and
>> POWER10 systems, a significant drop of the interrupt rate can be
>> observed when crossing the second node boundary.
>>
>> This is because a single IPI interrupt is used for all CPUs of the
>> system. The structure is shared, and the cache line updates greatly
>> impact the traffic between nodes and the overall IPI performance.
>>
>> As a workaround, the impact can be reduced by deactivating the IRQ
>> lockup detector ("noirqdebug") which does a lot of accounting in the
>> Linux IRQ descriptor structure and is responsible for most of the
>> performance penalty.
>>
>> As a fix, this proposal allocates an IPI interrupt per node, to be
>> shared by all CPUs of that node. This solves the scaling issue: the
>> IRQ lockup detector still has an impact, but the XIVE interrupt rate
>> scales linearly. It also improves the "noirqdebug" case, as shown in
>> the tables below.
>>
>>   * P9 DD2.2 - 2s * 64 threads
>>
>>                                               "noirqdebug"
>>                         Mint/s                   Mint/s
>>   chips  cpus      IPI/sys   IPI/chip      IPI/chip    IPI/sys
>>   ------------------------------------------------------------
>>   1      0-15     4.984023   4.875405      4.996536   5.048892
>>          0-31    10.879164  10.544040     10.757632  11.037859
>>          0-47    15.345301  14.688764     14.926520  15.310053
>>          0-63    17.064907  17.066812     17.613416  17.874511
>>   2      0-79    11.768764  21.650749     22.689120  22.566508
>>          0-95    10.616812  26.878789     28.434703  28.320324
>>          0-111   10.151693  31.397803     31.771773  32.388122
>>          0-127    9.948502  33.139336     34.875716  35.224548
>>
>>   * P10 DD1 - 4s (not homogeneous) 352 threads
>>
>>                                               "noirqdebug"
>>                         Mint/s                   Mint/s
>>   chips  cpus      IPI/sys   IPI/chip      IPI/chip    IPI/sys
>>   ------------------------------------------------------------
>>   1      0-15     2.409402   2.364108      2.383303   2.395091
>>          0-31     6.028325   6.046075      6.089999   6.073750
>>          0-47     8.655178   8.644531      8.712830   8.724702
>>          0-63    11.629652  11.735953     12.088203  12.055979
>>          0-79    14.392321  14.729959     14.986701  14.973073
>>          0-95    12.604158  13.004034     17.528748  17.568095
>>   2      0-111    9.767753  13.719831     19.968606  20.024218
>>          0-127    6.744566  16.418854     22.898066  22.995110
>>          0-143    6.005699  19.174421     25.425622  25.417541
>>          0-159    5.649719  21.938836     27.952662  28.059603
>>          0-175    5.441410  24.109484     31.133915  31.127996
>>   3      0-191    5.318341  24.405322     33.999221  33.775354
>>          0-207    5.191382  26.449769     36.050161  35.867307
>>          0-223    5.102790  29.356943     39.544135  39.508169
>>          0-239    5.035295  31.933051     42.135075  42.071975
>>          0-255    4.969209  34.477367     44.655395  44.757074
>>   4      0-271    4.907652  35.887016     47.080545  47.318537
>>          0-287    4.839581  38.076137     50.464307  50.636219
>>          0-303    4.786031  40.881319     53.478684  53.310759
>>          0-319    4.743750  43.448424     56.388102  55.973969
>>          0-335    4.709936  45.623532     59.400930  58.926857
>>          0-351    4.681413  45.646151     62.035804  61.830057
>>
>> [*] https://github.com/antonblanchard/ipistorm
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>
> Very nice result but the default-on irqdebug code is quite a slowdown
> even with your improvements.
>
> Is the main cacheline bouncing in the fast path coming from
> desc->irq_count++ of the percpu handler? Can we do something quick and
> dirty like the attached patch?
>
> All this stuff seems totally racy with percpu handler but maybe that
> doesn't matter too much (and anyway it would be a much bigger change)

I gave the patch below a try and we reach the same results, or even
better ones. The simplest solution is always the best. Nick, you
should send that single patch.

Thanks,

C.

> Thanks,
> Nick
>
> ---
> kernel/irq/spurious.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
> index f865e5f4d382..6b17b737ee6c 100644
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -378,7 +378,8 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
>  			 * then we merily delay the spurious detection
>  			 * by one hard interrupt. Not a real problem.
>  			 */
> -			desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
> +			if (desc->threads_handled_last & SPURIOUS_DEFERRED)
> +				desc->threads_handled_last &= ~SPURIOUS_DEFERRED;
>  		}
>  	}
>
> @@ -403,6 +404,10 @@ void note_interrupt(struct irq_desc *desc, irqreturn_t action_ret)
>  			desc->irqs_unhandled -= ok;
>  	}
>
> +	if (likely(!desc->irqs_unhandled))
> +		return;
> +
> +	/* Now getting into unhandled irq detection */
>  	desc->irq_count++;
>  	if (likely(desc->irq_count < 100000))
>  		return;
>
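
For readers who want to see why the two hunks above keep the fast path
free of stores, here is a rough user-space sketch. It is not the kernel
code: struct desc_model, its stores counter, note_interrupt_model() and
model_selfcheck() are hypothetical names made up for illustration, and
the control flow is heavily simplified compared to note_interrupt().
It only assumes what the patch shows: the flag is written back only
when it is set, and irq_count is only touched once irqs_unhandled is
non-zero.

```c
#include <assert.h>

/* Flag name taken from the patch; the value is used here for illustration. */
#define SPURIOUS_DEFERRED 0x80000000u

/* Hypothetical stand-in for the irq_desc fields the patch touches. */
struct desc_model {
	unsigned int irq_count;
	unsigned int irqs_unhandled;
	unsigned int threads_handled_last;
	unsigned long stores;	/* model-only: counts writes to shared fields */
};

/*
 * Simplified model of the patched accounting path.
 * 'handled' is non-zero when the handler claimed the interrupt.
 * Returns 1 when the unhandled-IRQ accounting ran, 0 otherwise.
 */
static int note_interrupt_model(struct desc_model *d, int handled)
{
	/* First hunk: test the flag before clearing it, so no store if clear. */
	if (d->threads_handled_last & SPURIOUS_DEFERRED) {
		d->threads_handled_last &= ~SPURIOUS_DEFERRED;
		d->stores++;
	}

	if (!handled) {
		d->irqs_unhandled++;
		d->stores++;
	}

	/* Second hunk: nothing unhandled so far, skip the irq_count update. */
	if (!d->irqs_unhandled)
		return 0;

	d->irq_count++;
	d->stores++;
	return 1;
}

/* Small self-check exercising both hunks of the model. */
static void model_selfcheck(void)
{
	struct desc_model d = { 0 };
	int i;

	/* Handled interrupts only: the shared fields are never written. */
	for (i = 0; i < 1000; i++)
		note_interrupt_model(&d, 1);
	assert(d.irq_count == 0 && d.stores == 0);

	/* A set flag is cleared once, then left alone. */
	d.threads_handled_last = SPURIOUS_DEFERRED | 5;
	note_interrupt_model(&d, 1);
	note_interrupt_model(&d, 1);
	assert(d.threads_handled_last == 5 && d.stores == 1);
}
```

In this model, a stream of handled interrupts leaves the shared fields
untouched, which is the property that would avoid the cache line
traffic between nodes. Once a single unhandled interrupt is seen, every
later call goes through the irq_count accounting again, at least until
something resets irqs_unhandled (the model does not cover that part).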