From: Dennis Zhou <dennis@kernel.org>
To: Mateusz Guzik <mjguzik@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>
Cc: tj@kernel.org, hughd@google.com, akpm@linux-foundation.org,
vbabka@suse.cz, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v4] percpu_counter: add a cmpxchg-based _add_batch variant
Date: Tue, 28 May 2024 13:56:58 -0700 [thread overview]
Message-ID: <ZlZFGmBiBE1VGQIt@snowbird> (raw)
In-Reply-To: <20240528204257.434817-1-mjguzik@gmail.com>
On Tue, May 28, 2024 at 10:42:57PM +0200, Mateusz Guzik wrote:
> Interrupt disable/enable trips are quite expensive on x86-64 compared to
> a mere cmpxchg (note: no lock prefix!) and percpu counters are used
> quite often.
>
> With this change I get a bump of 1% ops/s for negative path lookups,
> plugged into will-it-scale:
>
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
> while (1) {
> int fd = open("/tmp/nonexistent", O_RDONLY);
> assert(fd == -1);
>
> (*iterations)++;
> }
> }
>
> The win would be higher if it was not for other slowdowns, but one has
> to start somewhere.
>
> Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>
> v4:
> - fix a misplaced paren in unlikely(), reported by lkp:
> https://lore.kernel.org/oe-lkp/ZlZAbkjOylfZC5Os@snowbird/T/#t
>
> v3:
> - add a missing word to the new comment
>
> v2:
> - dodge preemption
> - use this_cpu_try_cmpxchg
> - keep the old variant depending on CONFIG_HAVE_CMPXCHG_LOCAL
>
>
> lib/percpu_counter.c | 44 +++++++++++++++++++++++++++++++++++++++-----
> 1 file changed, 39 insertions(+), 5 deletions(-)
>
> diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
> index 44dd133594d4..51bc5246986d 100644
> --- a/lib/percpu_counter.c
> +++ b/lib/percpu_counter.c
> @@ -73,17 +73,50 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount)
> EXPORT_SYMBOL(percpu_counter_set);
>
> /*
> - * local_irq_save() is needed to make the function irq safe:
> - * - The slow path would be ok as protected by an irq-safe spinlock.
> - * - this_cpu_add would be ok as it is irq-safe by definition.
> - * But:
> - * The decision slow path/fast path and the actual update must be atomic, too.
> + * Add to a counter while respecting batch size.
> + *
> + * There are 2 implementations, both dealing with the following problem:
> + *
> + * The decision slow path/fast path and the actual update must be atomic.
> * Otherwise a call in process context could check the current values and
> * decide that the fast path can be used. If now an interrupt occurs before
> * the this_cpu_add(), and the interrupt updates this_cpu(*fbc->counters),
> * then the this_cpu_add() that is executed after the interrupt has completed
> * can produce values larger than "batch" or even overflows.
> */
> +#ifdef CONFIG_HAVE_CMPXCHG_LOCAL
> +/*
> + * Safety against interrupts is achieved in 2 ways:
> + * 1. the fast path uses local cmpxchg (note: no lock prefix)
> + * 2. the slow path operates with interrupts disabled
> + */
> +void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> +{
> + s64 count;
> + unsigned long flags;
> +
> + count = this_cpu_read(*fbc->counters);
> + do {
> + if (unlikely(abs(count + amount) >= batch)) {
> + raw_spin_lock_irqsave(&fbc->lock, flags);
> + /*
> + * Note: by now we might have migrated to another CPU
> + * or the value might have changed.
> + */
> + count = __this_cpu_read(*fbc->counters);
> + fbc->count += count + amount;
> + __this_cpu_sub(*fbc->counters, count);
> + raw_spin_unlock_irqrestore(&fbc->lock, flags);
> + return;
> + }
> + } while (!this_cpu_try_cmpxchg(*fbc->counters, &count, count + amount));
> +}
> +#else
> +/*
> + * local_irq_save() is used to make the function irq safe:
> + * - The slow path would be ok as protected by an irq-safe spinlock.
> + * - this_cpu_add would be ok as it is irq-safe by definition.
> + */
> void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> {
> s64 count;
> @@ -101,6 +134,7 @@ void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch)
> }
> local_irq_restore(flags);
> }
> +#endif
> EXPORT_SYMBOL(percpu_counter_add_batch);
>
> /*
> --
> 2.39.2
>
Andrew you picked up the __this_cpu_try_cmpxchg() patches. At this point
you might as well pick up this too. The cpumask clean ups are likely
going to give me trouble later this week when I rebase so I'll probably
have to base my percpuh hotplug branch on your mm-unstable now.
Acked-by: Dennis Zhou <dennis@kernel.org>
Feel free to toss my ack on the __this_cpu_try_cmpxchg() too.
Thanks,
Dennis
next prev parent reply other threads:[~2024-05-28 20:57 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-28 20:42 [PATCH v4] percpu_counter: add a cmpxchg-based _add_batch variant Mateusz Guzik
2024-05-28 20:56 ` Dennis Zhou [this message]
2024-05-28 21:19 ` Andrew Morton
2024-05-28 23:24 ` Dennis Zhou
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZlZFGmBiBE1VGQIt@snowbird \
--to=dennis@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mjguzik@gmail.com \
--cc=tj@kernel.org \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.