* [BUG] Possible locking issues in stop_machine code on 6k core machine
@ 2014-12-03 20:40 Alex Thorlton
2014-12-03 23:34 ` Thomas Gleixner
0 siblings, 1 reply; 3+ messages in thread
From: Alex Thorlton @ 2014-12-03 20:40 UTC (permalink / raw)
To: linux-kernel
Cc: Andrew Morton, Peter Zijlstra, Fabian Frederick, Ingo Molnar,
Alex Thorlton, Russ Anderson, linux-kernel
Hey guys,
While working to get our newly upgraded 6k core machine online, we've
discovered a few possible locking issues in the stop_machine code that
we're trying to get sorted out. We think the problems we're seeing
stem from an interaction between stop_cpus and stop_one_cpu. The
issue presents as a deadlock, and only shows itself intermittently.
After quite a bit of debugging, we think we've narrowed the issue down
to the fact that stop_one_cpu does not respect many of the locks that
are taken in the stop_cpus code path. For reference, the stop_cpus
code path takes the stop_cpus_mutex, then stop_cpus_lock, and then
each cpu's stopper->lock. stop_one_cpu seems to rely solely on the
stopper->lock.
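To make the lock ordering we're describing concrete, here's a rough,
hypothetical model of the two paths (names mirror kernel/stop_machine.c,
but the structures and `*_model` functions are simplified stand-ins we
made up for illustration, using pthread mutexes in place of the kernel
primitives):

```c
#include <pthread.h>
#include <stddef.h>

#define NR_CPUS 4   /* toy value; the real machine has ~6000 cores */

/* Simplified stand-ins for the kernel's structures. */
struct cpu_stop_work {
    struct cpu_stop_work *next;
};

struct cpu_stopper {
    pthread_mutex_t lock;             /* stopper->lock */
    struct cpu_stop_work *works;      /* per-cpu work list */
};

static pthread_mutex_t stop_cpus_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t stop_cpus_lock  = PTHREAD_MUTEX_INITIALIZER; /* lglock stand-in */

static struct cpu_stopper stopper[NR_CPUS] = {
    [0 ... NR_CPUS - 1] = { PTHREAD_MUTEX_INITIALIZER, NULL }
};

/* Common helper: only the target cpu's stopper->lock is taken here. */
static void cpu_stop_queue_work(int cpu, struct cpu_stop_work *work)
{
    pthread_mutex_lock(&stopper[cpu].lock);
    work->next = stopper[cpu].works;
    stopper[cpu].works = work;
    pthread_mutex_unlock(&stopper[cpu].lock);
}

/* stop_cpus path: stop_cpus_mutex -> stop_cpus_lock -> each stopper->lock */
void stop_cpus_model(struct cpu_stop_work *work, int nr_cpus)
{
    pthread_mutex_lock(&stop_cpus_mutex);
    pthread_mutex_lock(&stop_cpus_lock);
    for (int cpu = 0; cpu < nr_cpus; cpu++)
        cpu_stop_queue_work(cpu, &work[cpu]);
    pthread_mutex_unlock(&stop_cpus_lock);
    pthread_mutex_unlock(&stop_cpus_mutex);
}

/* stop_one_cpu path: takes only the target cpu's stopper->lock */
void stop_one_cpu_model(int cpu, struct cpu_stop_work *work)
{
    cpu_stop_queue_work(cpu, work);
}
```

The asymmetry is the point: the single-cpu path never touches the two
global locks that serialize the batch path.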
What appears to be happening to cause our deadlock is this: stop_cpus
works its way down to queue_stop_cpus_work, which tells each cpu's
stopper task to wake up, take its lock, and do its work. As the loop
that does this progresses, the lowest-numbered cpus complete their
work and are allowed to go on about their business. The problem
occurs when one of these lower-numbered cpus calls stop_one_cpu,
targeting one of the higher-numbered cpus, which the stop_cpus loop
has not yet reached. If this happens, that higher-numbered cpu's
completion variable gets stomped on, and the wait_for_completion in
the stop_cpus code path never returns.
A quick example: CPU 0 calls stop_cpus, which will hit all 6,000 cores.
CPU 50 completes its stopper work, and at some point in the near future
calls stop_one_cpu on CPU 5000. This clobbers CPU 5000's pointer to the
cpu_stop_done struct set up in queue_stop_cpus_work, meaning that, once
CPU 5000 completes its work, it won't be able to decrement the nr_todo
for the correct cpu_stop_done struct, and CPU 0's wait_for_completion
will never return.
Again, much of this is semi-educated guesswork, put together based on
information gathered from examining lots of debug output, in an
attempt to spot the problem. We're fairly certain that we've pinned
down our issue, but we'd like to ask those who are more knowledgeable
about these code paths to weigh in with their opinions here.
We'd really appreciate any help that anyone can offer. Thanks!
- Alex
* Re: [BUG] Possible locking issues in stop_machine code on 6k core machine
  2014-12-03 20:40 [BUG] Possible locking issues in stop_machine code on 6k core machine Alex Thorlton
@ 2014-12-03 23:34 ` Thomas Gleixner
  2014-12-04 15:58   ` Alex Thorlton
  0 siblings, 1 reply; 3+ messages in thread
From: Thomas Gleixner @ 2014-12-03 23:34 UTC (permalink / raw)
To: Alex Thorlton
Cc: linux-kernel, Andrew Morton, Peter Zijlstra, Fabian Frederick,
	Ingo Molnar, Russ Anderson

Alex,

first of all, could you please format your mail so it is actually
readable and can be replied to? This looks like copy and paste from an
office app, with about five gazillion spaces at the end of each line.
I have no idea how you tricked mutt into doing that.

On Wed, 3 Dec 2014, Alex Thorlton wrote:
> While working to get our newly upgraded 6k core machine online,
> we've discovered a few possible locking issues in the stop_machine
> code that we're trying to get sorted out. (We think) the problems
> we're seeing stem from possible interaction between stop_cpus and
> stop_one_cpu. The issue presents as a deadlock, and seems to only
> show itself intermittently.
>
> After quite a bit of debugging we think we've narrowed the issue
> down to the fact that stop_one_cpu does not respect many of the
> locks that are taken in the stop_cpus code path. For reference the
> stop_cpus code path takes the stop_cpus_mutex, then stop_cpus_lock,
> and then takes each cpu's stopper->lock. stop_one_cpu seems to rely
> solely on the stopper->lock.

And what's actually wrong with that?

stop_cpus()
	mutex_lock(&stop_cpus_mutex);
	__stop_cpus()
		cpu_stop_init_done();	<-- works on a stack variable
		queue_stop_cpus_work()
			for_each_cpu()
				per_cpu(stop_cpus_work, cpu)
				^^^^ is a static per cpu work struct
				     protected by stop_cpus_mutex

			/* Now we have to queue that work for real */
			lg_global_lock(&stop_cpus_lock);
			for_each_cpu(cpu, cpumask)
				cpu_stop_queue_work(cpu ...)
					lock(per_cpu(stopper_lock, cpu));
					list_add_tail(work, per_cpu(worklist, cpu));
					unlock(per_cpu(stopper_lock, cpu));
			lg_global_unlock(&stop_cpus_lock);

stop_one_cpu()
	cpu_stop_init_done(&done, 1);	 <-- works on a stack variable
	cpu_stop_queue_work(cpu, &work); <-- work is on stack
	wait_for_completion(&done.completion);

So stop_cpus() operates on statically allocated per cpu cpu_stop_work.
The work->done of each per cpu work points to a cpu_stop_done which is
on the stack of the caller. Nothing can stomp on the per cpu work as
long as stop_cpus_mutex is held. Nothing can stomp on the done struct,
which is on the stack of the caller and only referenced by the per cpu
work.

stop_one_cpu() has both the work and the done on the caller's stack,
so it needs neither the stop_cpus_mutex nor the stop_cpus_lock lglock.
All it requires is the per cpu stopper->lock to add the work to the
per cpu list of the target cpu.

> What appears to be happening to cause our deadlock is, stop_cpus
> works its way down to queue_stop_cpus_work, which tells each cpu's
> stopper task to wake up, take its lock, and do its work. As the
> loop that does this progresses, the lowest numbered cpus complete
> their work, and are allowed to go on about their business. The
> problem occurs when one of these lower numbered cpus calls
> stop_one_cpu, targeting one of the higher numbered cpus, which the
> stop_cpus loop has not yet reached. If this happens, that higher
> numbered cpu's completion variable will get stomped on, and the
> wait_for_completion in the stop_cpus code path will never return.

That's complete nonsense. It CANNOT stomp on the completion variable
of a higher numbered cpu, because there is no such variable. Let's
look at a single CPU: the per cpu struct cpu_stopper has work A (from
stop_cpus) queued.
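[Editor's aside: the done bookkeeping described above can be modeled
in a few lines. This is a hypothetical sketch, not the kernel code —
the real cpu_stop_signal_done also tracks an executed flag and uses a
proper struct completion rather than a bool.]

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct cpu_stop_done. */
struct cpu_stop_done {
    int nr_todo;          /* number of cpus still to run the work */
    bool completed;       /* stands in for struct completion */
};

struct cpu_stop_work {
    struct cpu_stop_done *done;  /* points into the caller's stack frame */
};

void cpu_stop_init_done(struct cpu_stop_done *done, unsigned int nr_todo)
{
    done->nr_todo = (int)nr_todo;
    done->completed = false;
}

/* Each stopper thread signals the caller when it finishes its work. */
void cpu_stop_signal_done(struct cpu_stop_done *done)
{
    if (--done->nr_todo == 0)
        done->completed = true;  /* kernel: complete(&done->completion) */
}
```

The waiter in stop_cpus() returns only after the last of the nr_todo
cpus has signalled the single stack-resident done struct.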
A looks like this:

	A->fn	= stop_cpus(fn);
	A->arg	= stop_cpus(arg);
	A->done = struct cpu_stop_done on the stack of the stop_cpus() caller

Now B gets queued. B looks like this:

	B->fn	= stop_one_cpu(fn);
	B->arg	= stop_one_cpu(arg);
	B->done = struct cpu_stop_done on the stack of the stop_one_cpu() caller

The only connection between A and B is that they are enqueued on the
same list, i.e. the per cpu struct cpu_stopper::works list. So now
explain how the enqueueing of B stomps on A->done. There is no way
that the list_add_tail(B) can modify A->done.

Again, these are the protection scopes:

stop_cpus_mutex
	protects stop_cpus_work. It is held across the stop_cpus()
	call until the wait_for_completion() returns. So nothing can
	stomp on stop_cpus_work and the done pointer of any cpu until
	stop_cpus_mutex is released.

stop_cpus_lock
	serializes multi_cpu_stop invocations. You have not provided
	any indication that multi_cpu_stop is involved, so that's
	irrelevant.

stopper->lock
	protects the per cpu work list

> Again, much of this is semi-educated guesswork, put together based

I'd say semi-educated guesswork is a euphemism.

> on information gathered from examining lots of debug output, in an
> attempt to spot the problem.

Of course you completely failed to provide the actual call chains and
callback functions involved. Instead you provide your interpretation
of the problem, which seems to be a bit off.

> We're fairly certain that we've pinned down our issue,

I'm very certain that your interpretation of lots of debug output is
pretty much in line with your creative description of the problem.

> but we'd like to ask those who are more knowledgeable of these code
> paths to weigh in their opinions here.

Done.

> We'd really appreciate any help that anyone can offer. Thanks!

If you would provide real data instead of half-baked assumptions, we
might actually be able to help you.

There might be some hidden issue in the stomp machine crap, but
surely not the one you described.
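[Editor's aside: the argument above — that linking B into the list
cannot alter A->done, because each work carries its own done pointer —
can be checked with a toy model. The names here are invented for
illustration and are not the kernel's:]

```c
#include <stddef.h>

struct done { int nr_todo; };

struct work {
    struct work *next;
    struct done *done;   /* each work carries its own done pointer */
};

/* One cpu's works list, as in struct cpu_stopper::works. */
static struct work *works_head;

/* list_add stand-in: links the work into the list, touches nothing else. */
void enqueue_work(struct work *w)
{
    w->next = works_head;
    works_head = w;
}
```

Queueing a second work only writes that work's own next pointer and
the list head; the first work's done pointer is never an operand.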
Thanks,

	tglx
* Re: [BUG] Possible locking issues in stop_machine code on 6k core machine
  2014-12-03 23:34 ` Thomas Gleixner
@ 2014-12-04 15:58   ` Alex Thorlton
  0 siblings, 0 replies; 3+ messages in thread
From: Alex Thorlton @ 2014-12-04 15:58 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Alex Thorlton, linux-kernel, Andrew Morton, Peter Zijlstra,
	Fabian Frederick, Ingo Molnar, Russ Anderson

On Thu, Dec 04, 2014 at 12:34:04AM +0100, Thomas Gleixner wrote:
> first of all, could you please format your mail so it is actually
> readable and can be replied to?

My fault there - stupid mistake.

> If you would provide real data instead of half-baked assumptions,
> we might actually be able to help you.
>
> There might be some hidden issue in the stomp machine crap, but
> surely not the one you described.

I'll return with some actual data so that others can get a better
picture of what we're seeing. I'll also see if I can come up with a
way to reproduce the issue on a smaller configuration; that might not
be possible, but attempting to do so might provide some more insight
into our problem.

Thanks for your detailed description of the code path! I definitely
understand that our assumption can't be the actual problem here. I'll
dig back into the issue and try to get some better information.

Thanks again!

- Alex