* 3.5-rc6 futex_wait_requeue_pi oops.
From: Dave Jones @ 2012-07-13 18:08 UTC
To: Linux Kernel; +Cc: Paul E. McKenney, Thomas Gleixner, Rusty Russell
Looks like calling futex() with garbage makes things unhappy.
Dave
[ 673.054286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
[ 673.055292] IP: [<ffffffff810d665e>] __lock_acquire+0x5e/0x1ae0
[ 673.056225] PGD 1107c8067 PUD 11079c067 PMD 0
[ 673.057224] Oops: 0000 [#1] SMP
[ 673.058248] CPU 3
[ 673.058263] Modules linked in:<4>[ 673.069440] ebt_snat<4>[ 673.088955] xt_cluster<4>[ 673.095505] nls_cp874 nls_cp850 nls_cp869 nls_iso8859_1 nls_iso8859_6 romfs ufs nfs_layout_nfsv41_files blocklayoutdriver nfs ecryptfs cachefiles binfmt_misc udf sysv hfsplus msdos vfat fat cuse fuse cramfs 9p 9pnet ceph libceph hfs befs cifs fscache ncpfs coda affs btrfs squashfs minix hwpoison_inject encrypted_keys tgr192 lzo ansi_cprng rmd128 khazad authencesn ccm salsa20_generic serpent_generic anubis tea blowfish_generic cast6 rmd320 des_generic sha256_generic fcrypt crypto_user ghash_generic camellia_generic md4 twofish_generic crypto_null sha512_generic zlib vmac blowfish_common lrw wp512 gcm cts deflate twofish_common pcrypt rmd160 cast5 authenc xts gf128mul pcbc raid6test michael_mic rmd256 seed xcbc crc8 cpu_notifier_error_inject ts_fsm crc7 ts_bm cordic crc_itu_t ts_kmp lpc_sch mfd_core i2c_dev i2c_pca_platform i2c_diolan_u2c i2c_simtec i2c_isch i2c_scmi i2c_tiny_usb i2c_piix4 i2c_algo_pca i2c_smbus acpi_pad ec_sys sbs sbshc custom_method asus_atk0110 acpi_power_meter pmbus_core cpufreq_stats softdog ioatdma pch_dma usb_storage nosy bonding ixgb e100 ixgbe e1000 ixgbevf igb igbvf team_mode_activebackup team_mode_roundrobin team eql can_dev netconsole ppp_async crc_ccitt pppoe pptp gre ppp_synctty pppox ppp_deflate zlib_deflate arc4 ppp_mppe bsd_comp ppp_generic catc kaweth pegasus rtl8150 ipheth veth slhc dummy mii lxt vitesse mdio_bitbang davicom marvell cicada national ste10Xp broadcom icplus et1011c micrel realtek smsc qsemi mdio vhost_net tun macvtap macvlan cryptoloop brd rtc_max6900 rtc_em3027 rtc_bq32k rtc_ds1286 rtc_m48t59 rtc_ds1511 rtc_ds1672 rtc_rx8025 rtc_isl12022 rtc_ds1374 rtc_stk17ta8 rtc_x1205 rtc_v3020 rtc_rs5c372 rtc_ds3232 rtc_bq4802 rtc_pcf8563 rtc_rx8581 rtc_rv3029c2 rtc_ds1307 rtc_m48t35 rtc_ds1553 rtc_pcf8583 rtc_ds1742 rtc_isl1208 rtc_m41t80 rtc_fm3130 scsi_transport_fc scsi_transport_spi ch scsi_wait_scan raid_class scsi_tgt libsas 
scsi_transport_sas uio_aec uio_sercos3 uio_cif uio_pci_generic uio timeriomem_rng hangcheck_timer dca pps_ldisc pps_gpio dm_queue_length multipath dm_crypt dm_service_time faulty dm_round_robin dm_log_userspace linear dm_thin_pool dm_persistent_data libcrc32c dm_bufio dm_flakey dm_multipath raid0 dm_raid raid456 raid1 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid10 shpchp fakephp aer_inject ptp pps_core target_core_file target_core_iblock target_core_pscsi tcm_loop target_core_mod vga16fb sysimgblt fb_sys_fops syscopyarea vgastate output platform_lcd lcd sysfillrect n_r3964 n_gsm nozomi jsm serio_raw altera_ps2 input_polldev sparse_keymap uinput ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables kvm_intel kvm crc32c_intel ghash_clmulni_intel microcode usb_debug pcspkr i2c_i801 e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
[ 673.095668]
[ 673.095669] Pid: 22872, comm: trinity-child3 Not tainted 3.5.0-rc6+ #107
[ 673.095673] RIP: 0010:[<ffffffff810d665e>] [<ffffffff810d665e>] __lock_acquire+0x5e/0x1ae0
[ 673.095679] RSP: 0000:ffff8801107c7a48 EFLAGS: 00010046
[ 673.095679] RAX: 0000000000000082 RBX: 0000000000000000 RCX: 0000000000000000
[ 673.095680] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
[ 673.095681] RBP: ffff8801107c7b38 R08: 0000000000000002 R09: 0000000000000000
[ 673.095682] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
[ 673.095683] R13: ffff8800a9144d20 R14: 0000000000000002 R15: 0000000000000028
[ 673.095684] FS: 00007f4343491740(0000) GS:ffff880148200000(0000) knlGS:0000000000000000
[ 673.095685] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 673.095686] CR2: 0000000000000028 CR3: 000000012d9ba000 CR4: 00000000001407e0
[ 673.095687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 673.095688] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 673.095690] Process trinity-child3 (pid: 22872, threadinfo ffff8801107c6000, task ffff8800a9144d20)
[ 673.095690] Stack:
[ 673.095691] ffff8801107c7a58 ffff8800a91455e0 0000000000000002 ffff8800a9144d20
[ 673.095695] 000000000000029f ffffffff82959908 ffff8801107c7b78 0000000000000082
[ 673.095699] ffff8801107c7aa8 ffffffff816884f0 ffff8800a9144d20 ffff88013f748000
[ 673.095702] Call Trace:
[ 673.095703] [<ffffffff816884f0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 673.095708] [<ffffffff810d941d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
[ 673.095710] [<ffffffff810d94ad>] ? trace_hardirqs_on+0xd/0x10
[ 673.095713] [<ffffffff810d87ed>] lock_acquire+0xad/0x220
[ 673.095715] [<ffffffff810e0104>] ? rt_mutex_finish_proxy_lock+0x34/0xd0
[ 673.095717] [<ffffffff810d3958>] ? trace_hardirqs_off_caller+0x28/0xd0
[ 673.095720] [<ffffffff81687de6>] _raw_spin_lock+0x46/0x80
[ 673.095722] [<ffffffff810e0104>] ? rt_mutex_finish_proxy_lock+0x34/0xd0
[ 673.095725] [<ffffffff810e0104>] rt_mutex_finish_proxy_lock+0x34/0xd0
[ 673.095726] [<ffffffff810ddbd2>] futex_wait_requeue_pi.constprop.20+0x2d2/0x3d0
[ 673.095730] [<ffffffff81097ff0>] ? update_rmtp+0x70/0x70
[ 673.095733] [<ffffffff810993c4>] ? hrtimer_start_range_ns+0x14/0x20
[ 673.095736] [<ffffffff810de42a>] do_futex+0xea/0xa20
[ 673.095738] [<ffffffff810ad759>] ? local_clock+0x99/0xc0
[ 673.095741] [<ffffffff81189443>] ? might_fault+0x53/0xb0
[ 673.095746] [<ffffffff810dee67>] sys_futex+0x107/0x1a0
[ 673.095749] [<ffffffff810d9400>] ? trace_hardirqs_on_caller+0x140/0x1e0
[ 673.095751] [<ffffffff81691b6d>] system_call_fastpath+0x1a/0x1f
[ 673.095755] Code: d8 45 0f 45 e0 4c 89 75 f0 4c 89 7d f8 85 c0 0f 84 f8 00 00 00 8b 05 e2 af fa 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 <49> 8b 07 ba 01 00 00 00 48 3d 20 c4 0c 82 44 0f 44 e2 83 fb 01
[ 673.095789] RIP [<ffffffff810d665e>] __lock_acquire+0x5e/0x1ae0
[ 673.095791] RSP <ffff8801107c7a48>
[ 673.095792] CR2: 0000000000000028
[ 673.095793] ---[ end trace c26f1bd418342e06 ]---
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Thomas Gleixner @ 2012-07-13 18:47 UTC
To: Dave Jones
Cc: Linux Kernel, Paul E. McKenney, Rusty Russell, Darren Hart,
Peter Zijlstra
On Fri, 13 Jul 2012, Dave Jones wrote:
> Looks like calling futex() with garbage makes things unhappy.
Cc'ing Darren and Peter.
> [ 673.054286] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
> [ 673.055292] IP: [<ffffffff810d665e>] __lock_acquire+0x5e/0x1ae0
> [ 673.056225] PGD 1107c8067 PUD 11079c067 PMD 0
> [ 673.057224] Oops: 0000 [#1] SMP
> [ 673.058248] CPU 3
> [ 673.095668] <SNIP modules splat>
> [ 673.095669] Pid: 22872, comm: trinity-child3 Not tainted 3.5.0-rc6+ #107
> [ 673.095673] RIP: 0010:[<ffffffff810d665e>] [<ffffffff810d665e>] __lock_acquire+0x5e/0x1ae0
> [ 673.095679] RSP: 0000:ffff8801107c7a48 EFLAGS: 00010046
> [ 673.095679] RAX: 0000000000000082 RBX: 0000000000000000 RCX: 0000000000000000
> [ 673.095680] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000028
> [ 673.095681] RBP: ffff8801107c7b38 R08: 0000000000000002 R09: 0000000000000000
> [ 673.095682] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
> [ 673.095683] R13: ffff8800a9144d20 R14: 0000000000000002 R15: 0000000000000028
> [ 673.095684] FS: 00007f4343491740(0000) GS:ffff880148200000(0000) knlGS:0000000000000000
> [ 673.095685] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 673.095686] CR2: 0000000000000028 CR3: 000000012d9ba000 CR4: 00000000001407e0
> [ 673.095687] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 673.095688] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 673.095690] Process trinity-child3 (pid: 22872, threadinfo ffff8801107c6000, task ffff8800a9144d20)
> [ 673.095690] Stack:
> [ 673.095691] ffff8801107c7a58 ffff8800a91455e0 0000000000000002 ffff8800a9144d20
> [ 673.095695] 000000000000029f ffffffff82959908 ffff8801107c7b78 0000000000000082
> [ 673.095699] ffff8801107c7aa8 ffffffff816884f0 ffff8800a9144d20 ffff88013f748000
> [ 673.095702] Call Trace:
> [ 673.095703] [<ffffffff816884f0>] ? _raw_spin_unlock_irq+0x30/0x60
> [ 673.095708] [<ffffffff810d941d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
> [ 673.095710] [<ffffffff810d94ad>] ? trace_hardirqs_on+0xd/0x10
> [ 673.095713] [<ffffffff810d87ed>] lock_acquire+0xad/0x220
> [ 673.095715] [<ffffffff810e0104>] ? rt_mutex_finish_proxy_lock+0x34/0xd0
> [ 673.095717] [<ffffffff810d3958>] ? trace_hardirqs_off_caller+0x28/0xd0
> [ 673.095720] [<ffffffff81687de6>] _raw_spin_lock+0x46/0x80
> [ 673.095722] [<ffffffff810e0104>] ? rt_mutex_finish_proxy_lock+0x34/0xd0
> [ 673.095725] [<ffffffff810e0104>] rt_mutex_finish_proxy_lock+0x34/0xd0
> [ 673.095726] [<ffffffff810ddbd2>] futex_wait_requeue_pi.constprop.20+0x2d2/0x3d0
> [ 673.095730] [<ffffffff81097ff0>] ? update_rmtp+0x70/0x70
> [ 673.095733] [<ffffffff810993c4>] ? hrtimer_start_range_ns+0x14/0x20
> [ 673.095736] [<ffffffff810de42a>] do_futex+0xea/0xa20
> [ 673.095738] [<ffffffff810ad759>] ? local_clock+0x99/0xc0
> [ 673.095741] [<ffffffff81189443>] ? might_fault+0x53/0xb0
> [ 673.095746] [<ffffffff810dee67>] sys_futex+0x107/0x1a0
> [ 673.095749] [<ffffffff810d9400>] ? trace_hardirqs_on_caller+0x140/0x1e0
> [ 673.095751] [<ffffffff81691b6d>] system_call_fastpath+0x1a/0x1f
> [ 673.095755] Code: d8 45 0f 45 e0 4c 89 75 f0 4c 89 7d f8 85 c0 0f 84 f8 00 00 00 8b 05 e2 af fa 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 <49> 8b 07 ba 01 00 00 00 48 3d 20 c4 0c 82 44 0f 44 e2 83 fb 01
> [ 673.095789] RIP [<ffffffff810d665e>] __lock_acquire+0x5e/0x1ae0
> [ 673.095791] RSP <ffff8801107c7a48>
> [ 673.095792] CR2: 0000000000000028
> [ 673.095793] ---[ end trace c26f1bd418342e06 ]---
>
WARN_ON(!&q.pi_state);
pi_mutex = &q.pi_state->pi_mutex;
ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
debug_rt_mutex_free_waiter(&rt_waiter);
So there is some weird way which causes q.pi_state = NULL. Dave, did
you see the warning before the oops happened ?
That futex stuff should be sent to outer space.
Thanks,
tglx
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Dave Jones @ 2012-07-13 18:54 UTC
To: Thomas Gleixner
Cc: Linux Kernel, Paul E. McKenney, Rusty Russell, Darren Hart,
Peter Zijlstra
On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Dave Jones wrote:
>
> > Looks like calling futex() with garbage makes things unhappy.
>
> WARN_ON(!&q.pi_state);
> pi_mutex = &q.pi_state->pi_mutex;
> ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> debug_rt_mutex_free_waiter(&rt_waiter);
>
> So there is some weird way which causes q.pi_state = NULL. Dave, did
> you see the warning before the oops happened ?
No, that didn't seem to trigger.
Dave
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Thomas Gleixner @ 2012-07-13 19:11 UTC
To: Dave Jones
Cc: Linux Kernel, Paul E. McKenney, Rusty Russell, Darren Hart,
Peter Zijlstra
On Fri, 13 Jul 2012, Dave Jones wrote:
> On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
> > On Fri, 13 Jul 2012, Dave Jones wrote:
> >
> > > Looks like calling futex() with garbage makes things unhappy.
> >
> > WARN_ON(!&q.pi_state);
> > pi_mutex = &q.pi_state->pi_mutex;
> > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> > debug_rt_mutex_free_waiter(&rt_waiter);
> >
> > So there is some weird way which causes q.pi_state = NULL. Dave, did
> > you see the warning before the oops happened ?
>
> No, that didn't seem to trigger.
Yuck. The rt_mutex is embedded in pi_state and not a pointer and the
thing explodes in __lock_acquire of the raw lock protecting the
rtmutex internals.
Can you decode the exact code line ?
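A minimal sketch of why a NULL q.pi_state shows up as a fault at a small non-zero address such as 0x28 rather than at 0: because the rt_mutex is embedded, `&q.pi_state->pi_mutex` is only the NULL base plus member offsets, and nothing is dereferenced until the lock code reads a field. The structures below are hypothetical stand-ins, not the kernel's definitions, so the resulting number is illustrative and will not match the 0x28 seen in the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; field names and sizes are NOT the kernel's. */
struct mock_dep_map {
    void *key;                        /* first field the mock lock code reads */
    const char *name;
};

struct mock_raw_lock {
    uint32_t slock;
    uint32_t magic;
    struct mock_dep_map dep_map;      /* embedded */
};

struct mock_rt_mutex {
    struct mock_raw_lock wait_lock;   /* embedded */
    void *owner;
};

struct mock_pi_state {
    void *list_next;
    void *list_prev;
    struct mock_rt_mutex pi_mutex;    /* embedded, not a pointer */
};

/*
 * Address that a read of pi_state->pi_mutex.wait_lock.dep_map.key would
 * touch, computed with offsetof so no invalid pointer is dereferenced.
 */
uintptr_t mock_fault_address(uintptr_t pi_state_base)
{
    return pi_state_base
         + offsetof(struct mock_pi_state, pi_mutex)
         + offsetof(struct mock_rt_mutex, wait_lock)
         + offsetof(struct mock_raw_lock, dep_map)
         + offsetof(struct mock_dep_map, key);
}

/* With a NULL base the address is just the sum of the member offsets. */
uintptr_t mock_null_fault_address(void)
{
    uintptr_t addr = mock_fault_address(0);

    /* Small and non-zero: inside the unmapped first page, but not at 0. */
    assert(addr != 0 && addr < 4096);
    return addr;
}
```

Calling `mock_null_fault_address()` returns the offset at which such a read would fault for this mock layout. With the real kernel structures, the same arithmetic would be expected to account for the 0x28 in CR2.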
Thanks,
tglx
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Dave Jones @ 2012-07-13 19:56 UTC
To: Thomas Gleixner
Cc: Linux Kernel, Paul E. McKenney, Rusty Russell, Darren Hart,
Peter Zijlstra
On Fri, Jul 13, 2012 at 09:11:57PM +0200, Thomas Gleixner wrote:
> On Fri, 13 Jul 2012, Dave Jones wrote:
>
> > On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
> > > On Fri, 13 Jul 2012, Dave Jones wrote:
> > >
> > > > Looks like calling futex() with garbage makes things unhappy.
> > >
> > > WARN_ON(!&q.pi_state);
> > > pi_mutex = &q.pi_state->pi_mutex;
> > > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> > > debug_rt_mutex_free_waiter(&rt_waiter);
> > >
> > > So there is some weird way which causes q.pi_state = NULL. Dave, did
> > > you see the warning before the oops happened ?
> >
> > No, that didn't seem to trigger.
>
> Yuck. The rt_mutex is embedded in pi_state and not a pointer and the
> thing explodes in __lock_acquire of the raw lock protecting the
> rtmutex internals.
>
> Can you decode the exact code line ?
Hmm. I think I rebuilt the kernel, so things may be slightly different, though
what I see surprises me..
decoding the Code: line shows..
Code: d8 45 0f 45 e0 4c 89 75 f0 4c 89 7d f8 85 c0 0f 84 f8 00 00 00 8b 05 e2 af fa 00 49 89 ff 89 f3 41 89 d2 85 c0 0f 84 02 01 00 00 <49> 8b 07 ba 01 00 00 00 48 3d 20 c4 0c 82 44 0f 44 e2 83 fb 01
0000000000000000 <.text>:
0: d8 45 0f fadds 0xf(%rbp)
3: 45 e0 4c rex.RB loopne 0x52
6: 89 75 f0 mov %esi,-0x10(%rbp)
9: 4c 89 7d f8 mov %r15,-0x8(%rbp)
d: 85 c0 test %eax,%eax
f: 0f 84 f8 00 00 00 je 0x10d
15: 8b 05 e2 af fa 00 mov 0xfaafe2(%rip),%eax # 0xfaaffd
1b: 49 89 ff mov %rdi,%r15
1e: 89 f3 mov %esi,%ebx
20: 41 89 d2 mov %edx,%r10d
23: 85 c0 test %eax,%eax
25: 0f 84 02 01 00 00 je 0x12d
/home/davej/tmp/tmp.SI8vbYzuK6.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 49 8b 07 mov (%r15),%rax
3: ba 01 00 00 00 mov $0x1,%edx
8: 48 3d 20 c4 0c 82 cmp $0xffffffff820cc420,%rax
e: 44 0f 44 e2 cmove %edx,%r12d
12: 83 fb 01 cmp $0x1,%ebx
The only instance of 49 8b 07 followed by ba 01 in kernel/lockdep.o is this ..
/*
* Lockdep should run with IRQs disabled, otherwise we could
* get an interrupt which would want to take locks, which would
* end up in lockdep and have you got a head-ache already?
*/
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
3f88: 8b 05 00 00 00 00 mov 0x0(%rip),%eax # 3f8e <__lock_acquire+0x4e>
3f8e: 49 89 ff mov %rdi,%r15
3f91: 89 f3 mov %esi,%ebx
3f93: 41 89 d2 mov %edx,%r10d
3f96: 85 c0 test %eax,%eax
3f98: 0f 84 02 01 00 00 je 40a0 <__lock_acquire+0x160>
return 0;
if (lock->key == &__lockdep_no_validate__)
3f9e: 49 8b 07 mov (%r15),%rax <<<<<<<<<<<<<<<<<<
check = 1;
3fa1: ba 01 00 00 00 mov $0x1,%edx
Seems to add up. Though the bytes in the code: line following don't match what's in the object..
3fa6: 48 3d 00 00 00 00 cmp $0x0,%rax
3fac: 44 0f 44 e2 cmove %edx,%r12d
That line at 3fa6 got changed from an actual address to a NULL.
I guess that's the &__lockdep_no_validate__ comparison.
Though it seems odd that the kernel text would change.
Does lockdep do that when it gets disabled or something ?
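The decoding above depends on the `<..>` marker in the Code: line, which flags the byte RIP pointed at when the fault occurred. A small sketch of extracting that position follows. The function name `parse_code_line` is made up for illustration; the kernel tree's scripts/decodecode is the proper tool for this job.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse the hex bytes of an oops "Code:" line into out[] (at most max bytes).
 * The byte wrapped in <..> is where RIP pointed; its index is stored in
 * *fault_idx (-1 if no marker is present). Returns the number of bytes parsed.
 */
size_t parse_code_line(const char *s, unsigned char *out, size_t max,
                       long *fault_idx)
{
    size_t n = 0;

    assert(s != NULL && out != NULL && fault_idx != NULL);
    *fault_idx = -1;

    while (*s != '\0' && n < max) {
        if (*s == '<') {
            /* The next byte parsed is the faulting one. */
            *fault_idx = (long)n;
            s++;
        } else if (isxdigit((unsigned char)s[0]) &&
                   isxdigit((unsigned char)s[1])) {
            char pair[3] = { s[0], s[1], '\0' };

            out[n++] = (unsigned char)strtoul(pair, NULL, 16);
            s += 2;
        } else {
            s++;    /* skip spaces, '>' and anything else */
        }
    }
    return n;
}
```

Fed the Code: line from this oops, the marker lands at byte offset 0x2b, immediately after the 6-byte `je` at 0x25 in the first disassembly. The bytes from there on (`49 8b 07`) are the `mov (%r15),%rax` identified above.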
Dave
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Dave Jones @ 2012-07-13 20:54 UTC
To: Darren Hart
Cc: Peter Zijlstra, Paul E. McKenney, Thomas Gleixner, Rusty Russell,
Linux Kernel
On Fri, Jul 13, 2012 at 01:27:41PM -0700, Darren Hart wrote:
> I'm returning from a family vacation just now, I'll have a closer look on
> Monday. It seems to me we recently had some futex lockdep annotations go
> in, any chance those are somehow involved?
>
> So we have a real user of the futex requeue pi code? Is this via pthread
> condvars? Is this test available for me to run?
I wouldn't call it a "real user" per se.
details (including git checkout) at http://codemonkey.org.uk/projects/trinity/
Dave
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Darren Hart @ 2012-07-19 23:22 UTC
To: Dave Jones, Thomas Gleixner, Linux Kernel, Paul E. McKenney,
Rusty Russell, Darren Hart, Peter Zijlstra
On 07/13/2012 11:54 AM, Dave Jones wrote:
> On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
> > On Fri, 13 Jul 2012, Dave Jones wrote:
> >
> > > Looks like calling futex() with garbage makes things unhappy.
> >
> > WARN_ON(!&q.pi_state);
> > pi_mutex = &q.pi_state->pi_mutex;
> > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
> > debug_rt_mutex_free_waiter(&rt_waiter);
> >
> > So there is some weird way which causes q.pi_state = NULL. Dave, did
> > you see the warning before the oops happened ?
>
> No, that didn't seem to trigger.
Well I don't have a fix yet, but I can explain this not triggering.
q is on the stack, so the ADDRESS for q.pi_state is never going to be
NULL. However, properly instrumented, we do see this:
[ 23.621501] ---[ end trace 20bdfb44db182a17 ]---
[ 23.622425] q.pi_state @ (null)
[ 23.623272] &q.pi_state @ ffff880185e2dca8
[ 23.624119] ------------[ cut here ]------------
Duh.
I'll add a fix to that WARN_ON in my futex-fixes branch along with the
fix for the bug Dan found.
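The difference between the two checks can be reproduced in userspace. This is a minimal sketch using mock structures (hypothetical stand-ins, not the kernel's `struct futex_q`): testing the address of the member can never fire for an object that exists, while testing the pointer value does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct mock_pi_state { int dummy; };
struct mock_futex_q  { struct mock_pi_state *pi_state; };

/* Mirrors WARN_ON(!&q.pi_state): tests the ADDRESS of the member. */
bool broken_check_fires(const struct mock_futex_q *q)
{
    struct mock_pi_state *const *member_addr;

    assert(q != NULL);
    member_addr = &q->pi_state;
    return member_addr == NULL;     /* never true for a real object */
}

/* Mirrors WARN_ON(!q.pi_state): tests the pointer VALUE. */
bool fixed_check_fires(const struct mock_futex_q *q)
{
    assert(q != NULL);
    return q->pi_state == NULL;
}
```

With a stack-allocated `struct mock_futex_q` whose `pi_state` is NULL, `broken_check_fires()` returns false and `fixed_check_fires()` returns true, matching the instrumented output above.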
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Darren Hart @ 2012-07-20 0:37 UTC
To: Dave Jones, Thomas Gleixner, Linux Kernel, Paul E. McKenney,
Rusty Russell, Darren Hart, Peter Zijlstra
On 07/19/2012 04:22 PM, Darren Hart wrote:
>
>
> On 07/13/2012 11:54 AM, Dave Jones wrote:
>> On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
>> > On Fri, 13 Jul 2012, Dave Jones wrote:
>> >
>> > > Looks like calling futex() with garbage makes things unhappy.
>> >
>> > WARN_ON(!&q.pi_state);
>> > pi_mutex = &q.pi_state->pi_mutex;
>> > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
>> > debug_rt_mutex_free_waiter(&rt_waiter);
>> >
>> > So there is some weird way which causes q.pi_state = NULL. Dave, did
>> > you see the warning before the oops happened ?
>>
>> No, that didn't seem to trigger.
>
> Well I don't have a fix yet, but I can explain this not triggering.
>
> q is on the stack, so the ADDRESS for q.pi_state is never going to be
> NULL. However, properly instrumented, we do see this:
>
> [ 23.621501] ---[ end trace 20bdfb44db182a17 ]---
> [ 23.622425] q.pi_state @ (null)
> [ 23.623272] &q.pi_state @ ffff880185e2dca8
> [ 23.624119] ------------[ cut here ]------------
>
> Duh.
>
> I'll add a fix to that WARN_ON in my futex-fixes branch along with the
> fix for the bug Dan found.
>
I think I have root cause. futex_wait_requeue_pi() doesn't like having
uaddr == uaddr2. The handle_early_wakeup() doesn't detect a problem
because key2 IS the same as key1, I think. I've just discovered this and
quickly hacked in a "if (uaddr==uaddr2) return -EINVAL" fix and the test
continues to run (with just ops 0, 11, 12) for several minutes now
(typically fails in a few seconds). I'll let it run for a few hours and
contemplate the proper fix.
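For reference, the kind of call being rejected can be sketched from userspace as below. This is an assumption-laden illustration, not trinity's code: the helper name is invented, and `val` is deliberately chosen not to match the futex word, so a kernel without the uaddr == uaddr2 check should bail out with EAGAIN/EWOULDBLOCK before queueing rather than block. On a kernel that has such a check, the expected errno is EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Issue FUTEX_WAIT_REQUEUE_PI with uaddr == uaddr2 and return the errno of
 * the call (0 if it unexpectedly succeeds).
 */
int wait_requeue_pi_same_addr(void)
{
    static uint32_t word = 0;   /* used as both uaddr and uaddr2 */
    long rc;

    errno = 0;
    rc = syscall(SYS_futex, &word, FUTEX_WAIT_REQUEUE_PI,
                 1 /* val: deliberately != word */,
                 NULL /* no timeout */, &word, 0);

    /* The raw syscall wrapper reports failure as -1 with errno set. */
    assert(rc == -1 || rc == 0);
    return rc == -1 ? errno : 0;
}
```

A caller would compare the return value against EINVAL to tell whether the running kernel rejects the identical-address case up front. Sandboxed environments may return other errors such as ENOSYS or EPERM.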
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Darren Hart @ 2012-07-20 6:53 UTC
To: Dave Jones, Thomas Gleixner, Linux Kernel, Paul E. McKenney,
Rusty Russell, Darren Hart, Peter Zijlstra
On 07/19/2012 05:37 PM, Darren Hart wrote:
>
>
> On 07/19/2012 04:22 PM, Darren Hart wrote:
>>
>>
>> On 07/13/2012 11:54 AM, Dave Jones wrote:
>>> On Fri, Jul 13, 2012 at 08:47:38PM +0200, Thomas Gleixner wrote:
>>> > On Fri, 13 Jul 2012, Dave Jones wrote:
>>> >
>>> > > Looks like calling futex() with garbage makes things unhappy.
>>> >
>>> > WARN_ON(!&q.pi_state);
>>> > pi_mutex = &q.pi_state->pi_mutex;
>>> > ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
>>> > debug_rt_mutex_free_waiter(&rt_waiter);
>>> >
>>> > So there is some weird way which causes q.pi_state = NULL. Dave, did
>>> > you see the warning before the oops happened ?
>>>
>>> No, that didn't seem to trigger.
>>
>> Well I don't have a fix yet, but I can explain this not triggering.
>>
>> q is on the stack, so the ADDRESS for q.pi_state is never going to be
>> NULL. However, properly instrumented, we do see this:
>>
>> [ 23.621501] ---[ end trace 20bdfb44db182a17 ]---
>> [ 23.622425] q.pi_state @ (null)
>> [ 23.623272] &q.pi_state @ ffff880185e2dca8
>> [ 23.624119] ------------[ cut here ]------------
>>
>> Duh.
>>
>> I'll add a fix to that WARN_ON in my futex-fixes branch along with the
>> fix for the bug Dan found.
>>
>
> I think I have root cause. futex_wait_requeue_pi() doesn't like having
> uaddr == uaddr2. The handle_early_wakeup() doesn't detect a problem
> because key2 IS the same as key1, I think. I've just discovered this and
> quickly hacked in a "if (uaddr==uaddr2) return -EINVAL" fix and the test
> continues to run (with just ops 0, 11, 12) for several minutes now
> (typically fails in a few seconds). I'll let it run for a few hours and
> contemplate the proper fix.
Dave, mind giving this a spin? It seems to be doing the trick here,
at least for the *REQUEUE_PI futex op codes in trinity.
From d689b1598d67520dd87b30cc1ce7c6b76f566f43 Mon Sep 17 00:00:00 2001
Message-Id: <d689b1598d67520dd87b30cc1ce7c6b76f566f43.1342766842.git.dvhart@linux.intel.com>
From: Darren Hart <dvhart@linux.intel.com>
Date: Thu, 19 Jul 2012 23:40:15 -0700
Subject: [PATCH] futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call. If we attempt this, as the
trinity test suite manages to do, we miss early wakeups as q.key is
equal to key2 (because they are the same uaddr). We will then attempt to
dereference the pi_mutex (which would exist had the futex_q been
properly requeued to a pi futex) and trigger a NULL pointer dereference.
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
CC: Dave Jones <davej@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
---
kernel/futex.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index 5551ada..3717e7b 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2231,11 +2231,11 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
* @uaddr2: the pi futex we will take prior to returning to user-space
*
* The caller will wait on uaddr and will be requeued by futex_requeue() to
- * uaddr2 which must be PI aware. Normal wakeup will wake on uaddr2 and
- * complete the acquisition of the rt_mutex prior to returning to userspace.
- * This ensures the rt_mutex maintains an owner when it has waiters; without
- * one, the pi logic wouldn't know which task to boost/deboost, if there was a
- * need to.
+ * uaddr2 which must be PI aware and unique from uaddr. Normal wakeup will wake
+ * on uaddr2 and complete the acquisition of the rt_mutex prior to returning to
+ * userspace. This ensures the rt_mutex maintains an owner when it has waiters;
+ * without one, the pi logic would not know which task to boost/deboost, if
+ * there was a need to.
*
* We call schedule in futex_wait_queue_me() when we enqueue and return there
* via the following:
@@ -2272,6 +2272,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
struct futex_q q = futex_q_init;
int res, ret;
+ if (uaddr == uaddr2)
+ return -EINVAL;
+
if (!bitset)
return -EINVAL;
--
1.7.5.4
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Dave Jones @ 2012-07-20 13:35 UTC
To: Darren Hart
Cc: Thomas Gleixner, Linux Kernel, Paul E. McKenney, Rusty Russell,
Darren Hart, Peter Zijlstra
On Thu, Jul 19, 2012 at 11:53:45PM -0700, Darren Hart wrote:
> >> I'll add a fix to that WARN_ON in my futex-fixes branch along with the
> >> fix for the bug Dan found.
> >
> > I think I have root cause. futex_wait_requeue_pi() doesn't like having
> > uaddr == uaddr2. The handle_early_wakeup() doesn't detect a problem
> > because key2 IS the same as key1, I think. I've just discovered this and
> > quickly hacked in a "if (uaddr==uaddr2) return -EINVAL" fix and the test
> > continues to run (with just ops 0, 11, 12) for several minutes now
> > (typically fails in a few seconds). I'll let it run for a few hours and
> > contemplate the proper fix.
>
> Dave, mind giving this a spin? It seems to be doing the trick here,
> at least for the *REQUEUE_PI futex op codes in trinity.
Yeah, looks like that does the trick!
thanks,
Dave
* Re: 3.5-rc6 futex_wait_requeue_pi oops.
From: Darren Hart @ 2012-07-20 15:10 UTC
To: Dave Jones, Thomas Gleixner, Linux Kernel, Paul E. McKenney,
Rusty Russell, Darren Hart, Peter Zijlstra
On 07/20/2012 06:35 AM, Dave Jones wrote:
> On Thu, Jul 19, 2012 at 11:53:45PM -0700, Darren Hart wrote:
>
>
> > >> I'll add a fix to that WARN_ON in my futex-fixes branch along with the
> > >> fix for the bug Dan found.
> > >
> > > I think I have root cause. futex_wait_requeue_pi() doesn't like having
> > > uaddr == uaddr2. The handle_early_wakeup() doesn't detect a problem
> > > because key2 IS the same as key1, I think. I've just discovered this and
> > > quickly hacked in a "if (uaddr==uaddr2) return -EINVAL" fix and the test
> > > continues to run (with just ops 0, 11, 12) for several minutes now
> > > (typically fails in a few seconds). I'll let it run for a few hours and
> > > contemplate the proper fix.
> >
> > Dave, mind giving this a spin? It seems to be doing the trick here,
> > at least for the *REQUEUE_PI futex op codes in trinity.
>
> Yeah, looks like that does the trick!
It ran all night without an issue here too. I'll roll these up and send
them out shortly.
Dave, I love/hate trinity. ;-)
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel