public inbox for linux-kernel@vger.kernel.org
* regression since 4.8 and newer in select_idle_siblings()
@ 2016-10-18 13:40 Igor Mammedov
  2016-10-18 14:02 ` Mike Galbraith
From: Igor Mammedov @ 2016-10-18 13:40 UTC
  To: linux-kernel; +Cc: mingo, peterz, tglx, efault, torvalds, imammedo

The kernel crashes at runtime due to a NULL pointer dereference in
  select_idle_sibling()
     -> select_idle_cpu()
         ...
         u64 avg_cost = this_sd->avg_scan_cost;
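
A minimal userspace model of the failing pattern, assuming (as the
excerpt suggests) that this_sd is a per-CPU last-level-cache sched domain
pointer that can be NULL while the CPU topology is changing.
struct sched_domain_model, sd_llc_model and select_idle_cpu_model() are
hypothetical stand-ins, not kernel code, and the -1 return value is only
an illustrative "no CPU found" result; the real change is the tip commit
cited in the reply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct sched_domain; only the field named
 * in the report is modelled. */
struct sched_domain_model {
    uint64_t avg_scan_cost;
};

#define NR_CPUS_MODEL 4

/* Hypothetical per-CPU LLC domain pointers. A NULL entry models a CPU
 * whose domain is missing, e.g. during CPU hotplug. */
static struct sched_domain_model *sd_llc_model[NR_CPUS_MODEL];

/* Reads the scan cost only if the domain exists; otherwise returns -1
 * instead of dereferencing a NULL pointer. */
static int select_idle_cpu_model(int this_cpu, uint64_t *avg_cost)
{
    assert(this_cpu >= 0 && this_cpu < NR_CPUS_MODEL);

    struct sched_domain_model *this_sd = sd_llc_model[this_cpu];

    if (!this_sd)   /* the check the crashing path did not make */
        return -1;

    *avg_cost = this_sd->avg_scan_cost;
    return 0;
}
```

With CPU 0 given a domain and CPU 1 left NULL, the call for CPU 1 bails
out early and leaves *avg_cost untouched rather than faulting.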

The regression bisects to:
  commit 10e2f1acd0106c05229f94c70a344ce3a2c8008b
  Author: Peter Zijlstra <peterz@infradead.org>
  sched/core: Rewrite and improve select_idle_siblings()

To reproduce the crash at runtime, start a VM with:
 qemu-system-x86_64 [-enable-kvm] \
    -smp 4,sockets=2 \
    linux48_disk.img

and offline cpu1 in the guest:
 echo 0 > /sys/devices/system/cpu/cpu1/online
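
For scripted runs, the same offlining step can be done from C. This is
a sketch: write_cpu_online() is a hypothetical helper that performs the
same write as the echo above; it takes the attribute path as a parameter
so it can be pointed at any file, and it needs root in the guest for the
real sysfs path.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes "0\n" or "1\n" to a CPU's sysfs "online"
 * attribute, mirroring `echo 0 > .../cpu1/online`.
 * Returns 0 on success and -1 on failure. */
static int write_cpu_online(const char *path, int online)
{
    assert(path != NULL);

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    /* stdio buffers this output; the underlying write(2), and any error
     * sysfs returns for it, normally happens when fclose() flushes. */
    int ok = fprintf(f, "%d\n", online ? 1 : 0) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

In the guest this would be called as
write_cpu_online("/sys/devices/system/cpu/cpu1/online", 0).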

As a result, the guest panics either immediately or after a short delay,
from whichever code path next ends up in select_idle_sibling().


To reproduce the crash at boot, start a VM with a recent QEMU (2.7 or later):
 qemu-2.7/qemu-system-x86_64 \
    -smp 1,sockets=2,cores=2,threads=1,maxcpus=4 \
    -device qemu64-x86_64-cpu,socket-id=1,core-id=0,thread-id=0 \
    -device qemu64-x86_64-cpu,socket-id=1,core-id=1,thread-id=0 \
    -kernel bzImage_v48 [-enable-kvm]


=== one of the panics ===
[    0.688680] BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
[    0.688685] IP: [<ffffffff810de382>] select_idle_sibling+0x172/0x3b0
[    0.688686] PGD 0 
[    0.688687] Oops: 0000 [#1] SMP
[    0.688690] CPU: 0 PID: 109 Comm: kworker/u8:2 Not tainted 4.8.0-rc8+ #675
[    0.688690] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
[    0.688694] Workqueue: events_unbound async_run_entry_fn
[    0.688695] task: ffff88007c258000 task.stack: ffff88007c3b0000
[    0.688697] RIP: 0010:[<ffffffff810de382>]  [<ffffffff810de382>] select_idle_sibling+0x172/0x3b0
[    0.688697] RSP: 0000:ffff88007c3b3bb0  EFLAGS: 00010007
[    0.688698] RAX: 000000000000051b RBX: 0000000000000004 RCX: 0000000000000001
[    0.688699] RDX: 0000000000000040 RSI: 0000000000000004 RDI: ffff88007d00a008
[    0.688699] RBP: ffff88007c3b3c10 R08: 0000000000000000 R09: 0000000000000000
[    0.688700] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
[    0.688700] R13: ffff88007d00a008 R14: 0000000000000000 R15: 0000000000000004
[    0.688701] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[    0.688702] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.688703] CR2: 0000000000000078 CR3: 0000000001c06000 CR4: 00000000000006f0
[    0.688705] Stack:
[    0.688707]  0000000000000001 ffff88007c80e480 000000000000a118 ffff88007c282900
[    0.688708]  0000000100000000 0000000000000002 0000000200000000 ffff88007c80e600
[    0.688709]  ffff88007c282900 0000000000018ec0 0000000000000000 0000000000000000
[    0.688710] Call Trace:
[    0.688712]  [<ffffffff810decd7>] select_task_rq_fair+0x717/0x730
[    0.688713]  [<ffffffff810e1ba7>] ? update_curr+0xc7/0x150
[    0.688715]  [<ffffffff810dc33c>] ? __enqueue_entity+0x6c/0x70
[    0.688718]  [<ffffffff810d5224>] try_to_wake_up+0x104/0x390
[    0.688719]  [<ffffffff810d5c15>] wake_up_process+0x15/0x20
[    0.688724]  [<ffffffff8153cc03>] scsi_eh_wakeup+0x33/0xa0
[    0.688725]  [<ffffffff8153ccbc>] scsi_schedule_eh+0x4c/0x60
[    0.688728]  [<ffffffff8156d76f>] ata_std_sched_eh+0x3f/0x60
[    0.688729]  [<ffffffff8156d7c3>] ata_port_schedule_eh+0x13/0x20
[    0.688730]  [<ffffffff815618d4>] __ata_port_probe+0x44/0x60
[    0.688731]  [<ffffffff81565fe0>] ata_port_probe+0x20/0x40
[    0.688732]  [<ffffffff8156602e>] async_port_probe+0x2e/0x60
[    0.688734]  [<ffffffff810cccc9>] async_run_entry_fn+0x39/0x140
[    0.688736]  [<ffffffff810c34d2>] process_one_work+0x152/0x400
[    0.688738]  [<ffffffff810c38a5>] worker_thread+0x125/0x4b0
[    0.688739]  [<ffffffff810c3780>] ? process_one_work+0x400/0x400
[    0.688740]  [<ffffffff810c9cb8>] kthread+0xd8/0xf0
[    0.688744]  [<ffffffff816c4e3f>] ret_from_fork+0x1f/0x40
[    0.688745]  [<ffffffff810c9be0>] ? __kthread_parkme+0x70/0x70
[    0.688757] Code: c7 c0 20 dd 00 00 65 48 03 05 c3 bd f2 7e 4c 8b 30 48 c7 c0 c0 8e 01 00 65 48 03 05 b1 bd f2 7e 48 8b 80 c8 09 00 00 48 c1 e8 09 <49> 39 46 78 0f 87 29 02 00 00 65 8b 3d 9d bd f2 7e e8 b8 c8 ff 
[    0.688758] RIP  [<ffffffff810de382>] select_idle_sibling+0x172/0x3b0
[    0.688759]  RSP <ffff88007c3b3bb0>
[    0.688759] CR2: 0000000000000078
[    0.688762] ---[ end trace f10266de945b1779 ]---


* Re: regression since 4.8 and newer in select_idle_siblings()
  2016-10-18 13:40 regression since 4.8 and newer in select_idle_siblings() Igor Mammedov
@ 2016-10-18 14:02 ` Mike Galbraith
  2016-10-18 14:52   ` Igor Mammedov
From: Mike Galbraith @ 2016-10-18 14:02 UTC
  To: Igor Mammedov, linux-kernel; +Cc: mingo, peterz, tglx, torvalds

On Tue, 2016-10-18 at 15:40 +0200, Igor Mammedov wrote:
> kernel crashes at runtime  due null pointer dereference at
>   select_idle_sibling()
>      -> select_idle_cpu()
>          ...
>          u64 avg_cost = this_sd->avg_scan_cost;
> 
> regression bisects to:
>   commit 10e2f1acd0106c05229f94c70a344ce3a2c8008b
>   Author: Peter Zijlstra <peterz@infradead.org>
>   sched/core: Rewrite and improve select_idle_siblings()

http://git.kernel.org/tip/9cfb38a7ba5a9c27c1af8093fb1af4b699c0a441

Already fixed, and will land in Linus' tree soon.

	-Mike


* Re: regression since 4.8 and newer in select_idle_siblings()
  2016-10-18 14:02 ` Mike Galbraith
@ 2016-10-18 14:52   ` Igor Mammedov
From: Igor Mammedov @ 2016-10-18 14:52 UTC
  To: Mike Galbraith; +Cc: linux-kernel, mingo, peterz, tglx, torvalds

On Tue, 18 Oct 2016 16:02:07 +0200
Mike Galbraith <efault@gmx.de> wrote:

> On Tue, 2016-10-18 at 15:40 +0200, Igor Mammedov wrote:
> > kernel crashes at runtime  due null pointer dereference at
> >   select_idle_sibling()  
> >      -> select_idle_cpu()  
> >          ...
> >          u64 avg_cost = this_sd->avg_scan_cost;
> > 
> > regression bisects to:
> >   commit 10e2f1acd0106c05229f94c70a344ce3a2c8008b
> >   Author: Peter Zijlstra <peterz@infradead.org>
> >   sched/core: Rewrite and improve select_idle_siblings()  
> 
> http://git.kernel.org/tip/9cfb38a7ba5a9c27c1af8093fb1af4b699c0a441

Thanks, the above patch fixes the issue for me.

> Already fixed, and will land in Linus' tree soon.
> 
> 	-Mike
