From: Frank Mayhar <fmayhar@google.com>
To: Roland McGrath <roland@redhat.com>
Cc: parag.warudkar@gmail.com,
"Alejandro Riveira Fernández" <ariveira@gmail.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, "Ingo Molnar" <mingo@elte.hu>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Jakub Jelinek" <jakub@redhat.com>
Subject: Re: [Bugme-new] [Bug 9906] New: Weird hang with NPTL and SIGPROF.
Date: Tue, 04 Mar 2008 11:52:56 -0800
Message-ID: <1204660376.9768.1.camel@bobble.smo.corp.google.com>
In-Reply-To: <20080304070016.903E127010A@magilla.localdomain>
I put this on the patch, but I'm emailing it as well.
On Mon, 2008-03-03 at 23:00 -0800, Roland McGrath wrote:
> Thanks for the detailed explanation and for bringing this to my attention.
You're quite welcome.
> This is a problem we knew about when I first implemented posix-cpu-timers
> and process-wide SIGPROF/SIGVTALRM. I'm a little surprised it took this
> long to become a problem in practice. I originally expected to have to
> revisit it sooner than this, but I certainly haven't thought about it for
> quite some time. I'd guess that HZ=1000 becoming common is what did it.
Well, the iron is getting bigger, too, so it's beginning to be feasible
to run _lots_ of threads.
> The obvious implementation for the process-wide clocks is to have the
> tick interrupt increment shared utime/stime/sched_time fields in
> signal_struct as well as the private task_struct fields. The all-threads
> totals accumulate in the signal_struct fields, which would be atomic_t.
> It's then trivial for the timer expiry checks to compare against those
> totals.
>
> The concern I had about this was multiple CPUs competing for the
> signal_struct fields. (That is, several CPUs all running threads in the
> same process.) If the ticks on each CPU are even close to synchronized,
> then every single time all those CPUs will do an atomic_add on the same
> word. I'm not any kind of expert on SMP and cache effects, but I know
> this is bad. However bad it is, it's that bad all the time and however
> few threads (down to 2) it's that bad for that many CPUs.
>
> The implementation we have instead is obviously dismal for large numbers
> of threads. I always figured we'd replace that with something based on
> more sophisticated thinking about the CPU-clash issue.
>
> I don't entirely follow your description of your patch. It sounds like it
> should be two patches, though. The second of those patches (workqueue)
> sounds like it could be an appropriate generic cleanup, or like it could
> be a complication that might be unnecessary if we get a really good
> solution to the main issue.
>
> As for the first patch, I'm not sure I understand what you said.
> Can you elaborate? Or just post the unfinished patch as illustration,
> marking it as not for submission until you've finished.
My first patch did essentially what you outlined above, incrementing
shared utime/stime/sched_time fields, except that they were in the
task_struct of the group leader rather than in the signal_struct. It's
not clear to me exactly how the signal_struct is shared, whether it is
shared among all threads or if each has its own version.
So each timer routine had something like:

	/* If we're part of a thread group, add our time to the leader. */
	if (p->group_leader != NULL)
		p->group_leader->threads_sched_time += tmp;

and check_process_timers() had

	/* Times for the whole thread group are held by the group leader. */
	utime = cputime_add(utime, tsk->group_leader->threads_utime);
	stime = cputime_add(stime, tsk->group_leader->threads_stime);
	sched_time += tsk->group_leader->threads_sched_time;

Of course, this alone is insufficient. It speeds things up a tiny bit
but not nearly enough.
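
For illustration only, here is a rough userspace model of the shared-total idea, not the patch itself. Each tick adds to the thread's own counter and to one group-wide total, so the expiry check reads the total instead of walking every thread. The names group_times, thread_times, account_tick() and group_prof_expired() are hypothetical, and C11 atomics stand in for the kernel's atomic_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the totals shared by every thread in a group. */
struct group_times {
	atomic_uint_fast64_t utime;
	atomic_uint_fast64_t stime;
};

/* Hypothetical stand-in for one thread's private counters. */
struct thread_times {
	uint64_t utime;
	uint64_t stime;
	struct group_times *group;
};

/* Tick accounting: charge the thread, then the shared group total. */
static void account_tick(struct thread_times *t, bool user, uint64_t delta)
{
	assert(t != NULL && t->group != NULL);
	if (user) {
		t->utime += delta;
		atomic_fetch_add(&t->group->utime, delta);
	} else {
		t->stime += delta;
		atomic_fetch_add(&t->group->stime, delta);
	}
}

/* SIGPROF-style check: user plus system time against one expiry value. */
static bool group_prof_expired(struct group_times *g, uint64_t expires)
{
	uint64_t total = atomic_load(&g->utime) + atomic_load(&g->stime);

	return total >= expires;
}
```

The check costs two loads however many threads exist, but every tick on every CPU now writes the same two words, which is exactly the cache contention concern raised above.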
The other issue has to do with the rest of the processing in
run_posix_cpu_timers(), walking the timer lists and walking the whole
thread group (again) to rebalance expiry times. My second patch moved
all that work to a workqueue, but only if there were more than 100
threads in the process. This basically papered over the problem by
moving the processing out of interrupt and into a kernel thread. It's
still insufficient, though, because it takes just as long and will get
backed up just as badly on large numbers of threads. This was made
clear in a test I ran yesterday where I generated some 200,000 threads.
The work queue was unreasonably large, as you might expect.
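
To make the backlog argument concrete, here is a toy model, not a measurement. It assumes each tick queues one pass over all threads and that a worker can retire a fixed number of thread visits per tick. The function backlog_after() and its capacity parameter are hypothetical; only the 100-thread cutoff comes from the patch described above.

```c
#include <assert.h>
#include <stdint.h>

#define DEFER_THRESHOLD 100	/* the "more than 100 threads" cutoff */

/*
 * Toy model: returns the number of queued thread visits left after `ticks`
 * ticks. At or below the threshold the pass runs inline, so nothing queues.
 * Above it, each tick queues `nthreads` visits and the worker retires at
 * most `capacity` of them (an assumed, fixed rate).
 */
static uint64_t backlog_after(uint64_t nthreads, uint64_t capacity,
			      uint64_t ticks)
{
	uint64_t backlog = 0;

	/* A worker that retires nothing is outside this model. */
	assert(capacity > 0);

	if (nthreads <= DEFER_THRESHOLD)
		return 0;

	for (uint64_t i = 0; i < ticks; i++) {
		backlog += nthreads;
		backlog -= (backlog < capacity) ? backlog : capacity;
	}
	return backlog;
}
```

Whenever nthreads exceeds capacity, the backlog grows by the difference on every tick. Moving the work out of interrupt context changes where the cost is paid, not how much of it there is.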
I am looking for a way to do everything that needs to be done in fewer
operations, but unfortunately I'm not familiar enough with the
SIGPROF/SIGVTALRM semantics or with the details of the Linux
implementation to know where it is safe to consolidate things.
--
Frank Mayhar <fmayhar@google.com>
Google, Inc.