From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Dylan Hatch <dylanbhatch@google.com>,
Kees Cook <keescook@chromium.org>,
Frederic Weisbecker <frederic@kernel.org>,
"Joel Fernandes (Google)" <joel@joelfernandes.org>,
Ard Biesheuvel <ardb@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Vincent Whitchurch <vincent.whitchurch@axis.com>,
Dmitry Vyukov <dvyukov@google.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Mike Christie <michael.christie@oracle.com>,
David Hildenbrand <david@redhat.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Stefan Roesch <shr@devkernel.io>, Joey Gouly <joey.gouly@arm.com>,
Josh Triplett <josh@joshtriplett.org>,
Helge Deller <deller@gmx.de>,
Ondrej Mosnacek <omosnace@redhat.com>,
Florent Revest <revest@chromium.org>,
Miguel Ojeda <ojeda@kernel.org>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] getrusage: use sig->stats_lock
Date: Sun, 21 Jan 2024 13:07:54 +0100
Message-ID: <20240121120754.GA2814@redhat.com>
In-Reply-To: <20240120204552.c0708fd10fc8e2442c447049@linux-foundation.org>
On 01/20, Andrew Morton wrote:
>
> On Fri, 19 Jan 2024 19:27:49 -0800 Dylan Hatch <dylanbhatch@google.com> wrote:
>
> >
> > I applied these to a 5.10 kernel, and my repro (calling getrusage(RUSAGE_SELF)
> > from 200K threads) is no longer triggering a hard lockup.
>
> Thanks, but...
>
> The changelogs don't actually describe any hard lockup. [1/2] does
> mention "the deadlock" but that's all the info we have.
Sorry for the confusion... [1/2] tries to explain that this change is not
strictly necessary for [2/2]: it is safe to call thread_group_cputime()
with sig->stats_lock held exclusively even though thread_group_cputime()
takes the same lock. While we hold it, no writer can bump the sequence
count, so the lockless pass in thread_group_cputime() always succeeds and
it can't enter the slow (locked) mode.
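To illustrate why that nesting is safe, here is a minimal userspace sketch of the seqlock "lockless pass first, then lock" read pattern, loosely modeled on the kernel's read_seqbegin_or_lock(). Everything in it (struct stats, stats_add(), stats_read(), read_with_lock_held(), the C11-atomics sequence counter) is a hypothetical analogue, not kernel code, and the reader is simplified to a single lockless attempt before falling back to the mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace analogue of a seqlock protecting group stats. */
struct stats {
	atomic_uint seq;		/* odd while a writer is updating */
	pthread_mutex_t lock;		/* serializes writers and locked readers */
	_Atomic uint64_t utime, stime;	/* atomic so lockless reads are not data races */
};

static void stats_init(struct stats *s)
{
	int err;

	atomic_init(&s->seq, 0);
	atomic_init(&s->utime, 0);
	atomic_init(&s->stime, 0);
	err = pthread_mutex_init(&s->lock, NULL);
	assert(err == 0);
	(void)err;
}

/* Writer: takes the lock and makes seq odd for the duration of the update. */
static void stats_add(struct stats *s, uint64_t u, uint64_t st)
{
	pthread_mutex_lock(&s->lock);
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->utime, atomic_load_explicit(&s->utime,
			      memory_order_relaxed) + u, memory_order_relaxed);
	atomic_store_explicit(&s->stime, atomic_load_explicit(&s->stime,
			      memory_order_relaxed) + st, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
	pthread_mutex_unlock(&s->lock);
}

/* Reader: returns true only if it had to take the lock (the slow mode). */
static bool stats_read(struct stats *s, uint64_t *u, uint64_t *st)
{
	unsigned int seq0 = atomic_load_explicit(&s->seq, memory_order_acquire);

	*u = atomic_load_explicit(&s->utime, memory_order_relaxed);
	*st = atomic_load_explicit(&s->stime, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	if (!(seq0 & 1) &&
	    seq0 == atomic_load_explicit(&s->seq, memory_order_relaxed))
		return false;	/* consistent snapshot, no lock taken */

	pthread_mutex_lock(&s->lock);	/* a writer interfered: exclude writers */
	*u = atomic_load_explicit(&s->utime, memory_order_relaxed);
	*st = atomic_load_explicit(&s->stime, memory_order_relaxed);
	pthread_mutex_unlock(&s->lock);
	return true;
}

/*
 * Caller that already holds the lock exclusively (without touching seq).
 * No writer can run, so seq stays even and unchanged: the lockless pass
 * in stats_read() always succeeds and it never tries to relock.
 */
static bool read_with_lock_held(struct stats *s, uint64_t *u, uint64_t *st)
{
	pthread_mutex_lock(&s->lock);
	bool slow = stats_read(s, u, st);
	pthread_mutex_unlock(&s->lock);
	return slow;
}
```

If read_with_lock_held() could ever reach the slow path, it would try to relock a non-recursive mutex it already holds; the sketch shows why that cannot happen as long as the outer holder does not bump the sequence count.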
> So could we please have a suitable description of the bug which these are
> addressing? And a Reported-by:, a Closes: and a Fixes would be great too.
Yes, sorry, I forgot to add Reported-by. I'll update the changelog
and add Reported-and-tested-by.
But the problem is known and old. I think do_io_accounting() had the same
problem until 1df4bd83cdfdbd0 ("do_io_accounting: use sig->stats_lock"),
and do_task_stat() ...
getrusage() takes siglock and does for_each_thread() twice. If NR_THREADS
threads call sys_getrusage() in an endless loop on NR_CPUS CPUs,
lock_task_sighand() can trigger a hard lockup: it spins with irqs disabled
waiting for the other NR_CPUS-1 CPUs, which need the same siglock, and each
of them holds it while walking all NR_THREADS threads. So the time it spins
with irqs disabled is O(NR_CPUS * NR_THREADS).
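A back-of-the-envelope sketch of that bound, counting per-thread visits rather than time (the per-visit cost is not known here). The helper worst_case_visits() and the 64-CPU figure are hypothetical; the 200K thread count comes from Dylan's report above. It assumes a roughly fair spinlock, so a waiter sees at most NR_CPUS-1 holders ahead of it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rough worst case of per-thread visits a CPU waits through in
 * lock_task_sighand(): each of the other (nr_cpus - 1) CPUs may hold
 * siglock first, and each holder walks all nr_threads threads twice
 * (the two for_each_thread() loops in getrusage()).
 */
static uint64_t worst_case_visits(uint64_t nr_cpus, uint64_t nr_threads)
{
	assert(nr_cpus > 0);
	return (nr_cpus - 1) * 2 * nr_threads;
}
```

With 64 CPUs and 200,000 threads this gives 63 * 2 * 200,000 = 25,200,000 thread visits while the waiting CPU spins with irqs disabled; the product grows with both the CPU count and the thread count.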
With this patch all the threads can run lockless in parallel in the
likely case.
Dylan, do you have a better description? Can you share your repro?
That said, I think that something simple like
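#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

#define NT 200000	/* BIG_NUMBER; the report above used 200K threads */

static pthread_barrier_t barr;

static void *thread(void *arg)
{
	struct rusage ru;

	(void)arg;
	pthread_barrier_wait(&barr);	/* start all threads at once */
	for (;;)
		getrusage(RUSAGE_SELF, &ru);
	return NULL;
}

int main(void)
{
	pthread_barrier_init(&barr, NULL, NT);
	for (int n = 0; n < NT - 1; ++n) {
		pthread_t pt;

		/* a failed create would leave the barrier waiting forever */
		if (pthread_create(&pt, NULL, thread, NULL)) {
			fprintf(stderr, "pthread_create failed at %d\n", n);
			exit(1);
		}
	}
	thread(NULL);	/* main is the NT-th thread */
	return 0;
}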
should work if you have a machine with a lot of memory and CPUs.
Oleg.
Thread overview: 11+ messages
2024-01-17 19:25 [RFC PATCH] getrusage: Use trylock when getting sighand lock Dylan Hatch
2024-01-17 20:44 ` Oleg Nesterov
2024-01-18 15:56 ` Oleg Nesterov
2024-01-19 14:15 ` [PATCH 1/2] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() Oleg Nesterov
2024-01-19 14:15 ` [PATCH 2/2] getrusage: use sig->stats_lock Oleg Nesterov
2024-01-20 3:27 ` Dylan Hatch
2024-01-21 4:45 ` Andrew Morton
2024-01-21 12:07 ` Oleg Nesterov [this message]
2024-01-23 2:53 ` Dylan Hatch
2024-01-21 22:32 ` Andrew Morton
2024-01-20 3:29 ` [PATCH 1/2] getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand() Dylan Hatch