All of lore.kernel.org
 help / color / mirror / Atom feed
* sched: memory corruption on completing completions
@ 2015-02-04 23:24 Sasha Levin
  2015-02-04 23:46 ` Andrew Morton
  2015-02-05  0:16 ` Linus Torvalds
  0 siblings, 2 replies; 14+ messages in thread
From: Sasha Levin @ 2015-02-04 23:24 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar
  Cc: Andrew Morton, Linus Torvalds, Andrey Ryabinin, Dave Jones, LKML

Hi all,

I was fuzzing with trinity on a -next kernel with the KASan patchset, and
got what initially appeared to be a rather odd trace:

[  856.817966] BUG: AddressSanitizer: out of bounds on stack in do_raw_spin_unlock+0x417/0x4f0 at addr ffff8803875c7c42
[  856.817966] Read of size 2 by task migration/15/140
[  856.821565] page:ffffea000e1d71c0 count:0 mapcount:0 mapping:          (null) index:0x0
[  856.821565] flags: 0x2efffff80000000()
[  856.821565] page dumped because: kasan: bad access detected
[  856.821565] CPU: 15 PID: 140 Comm: migration/15 Not tainted 3.19.0-rc5-next-20150123-sasha-00061-g527ff0d-dirty #1814
[  856.841712]  0000000000000000 0000000000000000 ffff8804d0447aa0 ffff8804d04479e8
[  856.843478]  ffffffff92eb033b 1ffffd4001c3ae3f ffffea000e1d71c0 ffff8804d0447a88
[  856.843478]  ffffffff81b4c292 ffff8804d0447a58 ffffffff815ef1e1 0000000000000000
[  856.843478] Call Trace:
[  856.843478] dump_stack (lib/dump_stack.c:52)
[  856.843478] kasan_report_error (mm/kasan/report.c:136 mm/kasan/report.c:194)
[  856.843478] ? __lock_is_held (kernel/locking/lockdep.c:3518)
[  856.843478] ? sched_ttwu_pending (kernel/sched/core.c:4921)
[  856.843478] __asan_report_load2_noabort (mm/kasan/report.c:234)
[  856.843478] ? do_raw_spin_unlock (./arch/x86/include/asm/spinlock.h:157 kernel/locking/spinlock_debug.c:159)
[  856.843478] do_raw_spin_unlock (./arch/x86/include/asm/spinlock.h:157 kernel/locking/spinlock_debug.c:159)
[  856.843478] ? do_raw_spin_trylock (kernel/locking/spinlock_debug.c:157)
[  856.843478] _raw_spin_unlock_irqrestore (include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
[  856.843478] complete (kernel/sched/completion.c:37)
[  856.843478] cpu_stop_signal_done (kernel/stop_machine.c:69)
[  856.843478] cpu_stopper_thread (include/linux/spinlock.h:342 kernel/stop_machine.c:456)
[  856.843478] ? irq_cpu_stop_queue_work (kernel/stop_machine.c:449)
[  856.843478] ? do_raw_spin_trylock (kernel/locking/spinlock_debug.c:157)
[  856.843478] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/paravirt.h:809 include/linux/spinlock_api_smp.h:162 kernel/locking/spinlock.c:191)
[  856.843478] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2554 kernel/locking/lockdep.c:2601)
[  856.843478] ? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:163 kernel/locking/spinlock.c:191)
[  856.843478] smpboot_thread_fn (kernel/smpboot.c:161)
[  856.843478] ? smpboot_unpark_thread (kernel/smpboot.c:105)
[  856.843478] ? schedule (kernel/sched/core.c:2853)
[  856.843478] ? __kthread_parkme (./arch/x86/include/asm/current.h:14 kernel/kthread.c:164)
[  856.843478] ? smpboot_unpark_thread (kernel/smpboot.c:105)
[  856.843478] kthread (kernel/kthread.c:207)
[  856.843478] ? flush_kthread_work (kernel/kthread.c:176)
[  856.843478] ? schedule_tail (./arch/x86/include/asm/preempt.h:95 kernel/sched/core.c:2318)
[  856.843478] ? flush_kthread_work (kernel/kthread.c:176)
[  856.843478] ret_from_fork (arch/x86/kernel/entry_64.S:349)
[  856.843478] ? flush_kthread_work (kernel/kthread.c:176)
[  856.843478] Memory state around the buggy address:
[  856.843478]  ffff8803875c7b00: f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  856.843478]  ffff8803875c7b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  856.843478] >ffff8803875c7c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f1
[  856.843478]                                            ^
[  856.843478]  ffff8803875c7c80: f1 f1 f1 00 f4 f4 f4 f2 f2 f2 f2 00 00 f4 f4 00
[  856.843478]  ffff8803875c7d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

We see this trace only now because Andrey has recently made KASan able to detect issues
with memory on stack, rather than just kmalloc()ed memory, so the underlying bug has
probably been there for a while.

Initially we couldn't explain it. It was reproducing often, and always on the
spin_unlock_irqrestore() call at the end of the complete() code.

I now have a theory for why it happens:

Thread A				Thread B
----------------------------------------------------------

[Enter function]
DECLARE_COMPLETION_ONSTACK(x)
wait_for_completion(x)
					complete(x)
					[In complete(x):]
				        spin_lock_irqsave(&x->wait.lock, flags);
				        x->done++;
				        __wake_up_locked(&x->wait, TASK_NORMAL, 1);
[Done waiting, wakes up]
[Exit function]
				        spin_unlock_irqrestore(&x->wait.lock, flags);



So the spin_unlock_irqrestore() at the end of complete() would proceed to corruption
the stack of thread A.


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2015-02-06 14:59 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-04 23:24 sched: memory corruption on completing completions Sasha Levin
2015-02-04 23:46 ` Andrew Morton
2015-02-05  0:16 ` Linus Torvalds
2015-02-05  7:10   ` Ingo Molnar
2015-02-05  9:30   ` Peter Zijlstra
2015-02-05 20:44     ` Sasha Levin
2015-02-05 20:59   ` Davidlohr Bueso
2015-02-05 21:02     ` Sasha Levin
2015-02-05 21:34       ` Linus Torvalds
2015-02-05 22:37         ` Davidlohr Bueso
2015-02-05 22:57           ` Linus Torvalds
2015-02-06  6:48             ` Raghavendra K T
2015-02-06 15:00               ` Raghavendra K T
2015-02-06 12:29           ` Raghavendra K T

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.