[PATCH -next] locking/rwsem: don't spin in heavy contention

* [PATCH -next] locking/rwsem: don't spin in heavy contention
@ 2015-03-06 15:13 Ming Lei
  2015-03-06 21:54 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Ming Lei @ 2015-03-06 15:13 UTC
  To: linux-kernel
  Cc: Ming Lei, Davidlohr Bueso, Peter Zijlstra (Intel), Jason Low,
	Linus Torvalds, Michel Lespinasse, Paul E. McKenney, Tim Chen,
	Ingo Molnar, Theodore Ts'o

Before commit b3fd4f03ca0b995 ("locking/rwsem: Avoid deceiving lock
spinners"), rwsem_spin_on_owner() returned false if the owner changed.
That commit made it return true in this situation, after which a kernel
soft lockup can be triggered easily in xfstests.

So this patch restores the previous behaviour; it should be
reasonable to stop spinning under heavy contention.

The soft lockup can be reproduced easily in xfstests (generic/299)
over ext4:

[  236.417011] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/5:80:3288]
[  236.417011] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
[  236.417011] CPU: 5 PID: 3288 Comm: kworker/5:80 Not tainted 4.0.0-rc1-next-20150303+ #69
[  236.417011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[  236.417011] Workqueue: dio/sda dio_aio_complete_work
[  236.417011] task: ffff8800b87c0000 ti: ffff8800b703c000 task.ti: ffff8800b703c000
[  236.417011] RIP: 0010:[<ffffffff81083c20>]  [<ffffffff81083c20>] __rcu_read_unlock+0x47/0x55
[  236.417011] RSP: 0018:ffff8800b703fb98  EFLAGS: 00000246
[  236.417011] RAX: 0000000000000000 RBX: ffff8800b703fb48 RCX: 000000000003b080
[  236.417011] RDX: fffffffe00000001 RSI: ffff880231f03a20 RDI: ffff8800bb755568
[  236.417011] RBP: ffff8800b703fba8 R08: ffff880227908078 R09: ffff8800b87c0000
[  236.417011] R10: 0000000000000001 R11: 0000000000000020 R12: ffff8800b703c000
[  236.417011] R13: ffff8800b87c0000 R14: 000000000000000f R15: 0000000000000101
[  236.417011] FS:  0000000000000000(0000) GS:ffff88023eca0000(0000) knlGS:0000000000000000
[  236.417011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  236.417011] CR2: 00007f549369f948 CR3: 00000000ba891000 CR4: 00000000000007e0
[  236.417011] Stack:
[  236.417011]  fffffffe00000001 ffff8800bb755568 ffff8800b703fbc8 ffffffff81073917
[  236.417011]  ffff8800bb755568 ffff8800bb755584 ffff8800b703fc48 ffffffff814d1ba4
[  236.417011]  ffff8800b703fbe8 ffff8800b87c0000 ffff8800b703fc78 ffffffff811cf51b
[  236.417011] Call Trace:
[  236.417011]  [<ffffffff81073917>] rwsem_spin_on_owner+0x2b/0x79
[  236.417011]  [<ffffffff814d1ba4>] rwsem_down_write_failed+0xc0/0x2f1
[  236.417011]  [<ffffffff811cf51b>] ? start_this_handle+0x494/0x4bd
[  236.417011]  [<ffffffff810d1149>] ? trace_preempt_on+0x12/0x2f
[  236.417011]  [<ffffffff812ae8f3>] call_rwsem_down_write_failed+0x13/0x20
[  236.417011]  [<ffffffff814d1623>] ? down_write+0x24/0x33
[  236.417011]  [<ffffffff81199404>] ext4_map_blocks+0x236/0x3cb
[  236.417011]  [<ffffffff811bb407>] ? ext4_convert_unwritten_extents+0xd2/0x19c
[  236.417011]  [<ffffffff811bcae4>] ? __ext4_journal_start_sb+0x77/0xb8
[  236.417011]  [<ffffffff811bb42e>] ext4_convert_unwritten_extents+0xf9/0x19c
[  236.417011]  [<ffffffff8119e214>] ext4_put_io_end+0x3a/0x5d
[  236.417011]  [<ffffffff81197268>] ext4_end_io_dio+0x2a/0x2c
[  236.417011]  [<ffffffff8116418c>] dio_complete+0x97/0x12d
[  236.417011]  [<ffffffff81164333>] dio_aio_complete_work+0x21/0x23
[  236.417011]  [<ffffffff81053a4b>] process_one_work+0x160/0x299
[  236.417011]  [<ffffffff810542fa>] worker_thread+0x1c7/0x29e
[  236.417011]  [<ffffffff81054133>] ? cancel_delayed_work_sync+0x15/0x15
[  236.417011]  [<ffffffff81054133>] ? cancel_delayed_work_sync+0x15/0x15
[  236.417011]  [<ffffffff810583a5>] kthread+0xae/0xb6
[  236.417011]  [<ffffffff810582f7>] ? __kthread_parkme+0x61/0x61
[  236.417011]  [<ffffffff814d3593>] ret_from_fork+0x53/0x80
[  236.417011]  [<ffffffff810582f7>] ? __kthread_parkme+0x61/0x61
[  236.417011] Code: 01 74 0a ff c8 89 83 10 07 00 00 eb 28 c7 83 10 07 00 00 00 00 00 80 66 8b 83 14 07 00 00 66 85 c0 74 08 48 89 df e8 62 40 00 00 <c7> 83 10 07 00 00 00 00 00 00 58 5b 5d c3 0f 1f 44 00 00 55 48

Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michel Lespinasse <walken@google.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 kernel/locking/rwsem-xadd.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 06e2214..1e78c5b 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -358,8 +358,12 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 	}
 	rcu_read_unlock();
 
+	/*
+	 * A changed owner is a sign of heavy contention,
+	 * so stop spinning to avoid a kernel soft lockup.
+	 */
 	if (READ_ONCE(sem->owner))
-		return true; /* new owner, continue spinning */
+		return false;
 
 	/*
 	 * When the owner is not set, the lock could be free or
-- 
1.7.9.5



* Re: [PATCH -next] locking/rwsem: don't spin in heavy contention
  2015-03-06 15:13 [PATCH -next] locking/rwsem: don't spin in heavy contention Ming Lei
@ 2015-03-06 21:54 ` Dave Chinner
  2015-03-06 22:24   ` Davidlohr Bueso
  2015-03-07 11:47   ` Ming Lei
  0 siblings, 2 replies; 4+ messages in thread
From: Dave Chinner @ 2015-03-06 21:54 UTC
  To: Ming Lei
  Cc: linux-kernel, Davidlohr Bueso, Peter Zijlstra (Intel), Jason Low,
	Linus Torvalds, Michel Lespinasse, Paul E. McKenney, Tim Chen,
	Ingo Molnar, Theodore Ts'o

On Fri, Mar 06, 2015 at 11:13:10PM +0800, Ming Lei wrote:
> Before commit b3fd4f03ca0b995 ("locking/rwsem: Avoid deceiving lock
> spinners"), rwsem_spin_on_owner() returned false if the owner changed.
> That commit made it return true in this situation, after which a kernel
> soft lockup can be triggered easily in xfstests.
> 
> So this patch restores the previous behaviour; it should be
> reasonable to stop spinning under heavy contention.
> 
> The soft lockup can be reproduced easily in xfstests (generic/299)
> over ext4:
> 
> [  236.417011] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/5:80:3288]
> [  236.417011] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
> [  236.417011] CPU: 5 PID: 3288 Comm: kworker/5:80 Not tainted 4.0.0-rc1-next-20150303+ #69
> [  236.417011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [  236.417011] Workqueue: dio/sda dio_aio_complete_work
> [  236.417011] task: ffff8800b87c0000 ti: ffff8800b703c000 task.ti: ffff8800b703c000
> [  236.417011] RIP: 0010:[<ffffffff81083c20>]  [<ffffffff81083c20>] __rcu_read_unlock+0x47/0x55
> [  236.417011] RSP: 0018:ffff8800b703fb98  EFLAGS: 00000246
> [  236.417011] RAX: 0000000000000000 RBX: ffff8800b703fb48 RCX: 000000000003b080
> [  236.417011] RDX: fffffffe00000001 RSI: ffff880231f03a20 RDI: ffff8800bb755568
> [  236.417011] RBP: ffff8800b703fba8 R08: ffff880227908078 R09: ffff8800b87c0000
> [  236.417011] R10: 0000000000000001 R11: 0000000000000020 R12: ffff8800b703c000
> [  236.417011] R13: ffff8800b87c0000 R14: 000000000000000f R15: 0000000000000101
> [  236.417011] FS:  0000000000000000(0000) GS:ffff88023eca0000(0000) knlGS:0000000000000000
> [  236.417011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [  236.417011] CR2: 00007f549369f948 CR3: 00000000ba891000 CR4: 00000000000007e0
> [  236.417011] Stack:
> [  236.417011]  fffffffe00000001 ffff8800bb755568 ffff8800b703fbc8 ffffffff81073917
> [  236.417011]  ffff8800bb755568 ffff8800bb755584 ffff8800b703fc48 ffffffff814d1ba4
> [  236.417011]  ffff8800b703fbe8 ffff8800b87c0000 ffff8800b703fc78 ffffffff811cf51b
> [  236.417011] Call Trace:
> [  236.417011]  [<ffffffff81073917>] rwsem_spin_on_owner+0x2b/0x79
> [  236.417011]  [<ffffffff814d1ba4>] rwsem_down_write_failed+0xc0/0x2f1
> [  236.417011]  [<ffffffff811cf51b>] ? start_this_handle+0x494/0x4bd
> [  236.417011]  [<ffffffff810d1149>] ? trace_preempt_on+0x12/0x2f
> [  236.417011]  [<ffffffff812ae8f3>] call_rwsem_down_write_failed+0x13/0x20
> [  236.417011]  [<ffffffff814d1623>] ? down_write+0x24/0x33
> [  236.417011]  [<ffffffff81199404>] ext4_map_blocks+0x236/0x3cb
> [  236.417011]  [<ffffffff811bb407>] ? ext4_convert_unwritten_extents+0xd2/0x19c
> [  236.417011]  [<ffffffff811bcae4>] ? __ext4_journal_start_sb+0x77/0xb8
> [  236.417011]  [<ffffffff811bb42e>] ext4_convert_unwritten_extents+0xf9/0x19c
> [  236.417011]  [<ffffffff8119e214>] ext4_put_io_end+0x3a/0x5d
> [  236.417011]  [<ffffffff81197268>] ext4_end_io_dio+0x2a/0x2c
> [  236.417011]  [<ffffffff8116418c>] dio_complete+0x97/0x12d
> [  236.417011]  [<ffffffff81164333>] dio_aio_complete_work+0x21/0x23

If you're getting stuck there, I'd be looking for a bug in ext4, not
the rwsem code. There's no way there should be enough unwritten
extent conversion pending to lock up the system for that length of
time. Especially considering the test has concurrent truncates
running which should drain the entire IO queue every couple of
seconds at worst....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH -next] locking/rwsem: don't spin in heavy contention
  2015-03-06 21:54 ` Dave Chinner
@ 2015-03-06 22:24   ` Davidlohr Bueso
  2015-03-07 11:47   ` Ming Lei
  1 sibling, 0 replies; 4+ messages in thread
From: Davidlohr Bueso @ 2015-03-06 22:24 UTC
  To: Dave Chinner
  Cc: Ming Lei, linux-kernel, Peter Zijlstra (Intel), Jason Low,
	Linus Torvalds, Michel Lespinasse, Paul E. McKenney, Tim Chen,
	Ingo Molnar, Theodore Ts'o

On Sat, 2015-03-07 at 08:54 +1100, Dave Chinner wrote:
> On Fri, Mar 06, 2015 at 11:13:10PM +0800, Ming Lei wrote:
> > Before commit b3fd4f03ca0b995 ("locking/rwsem: Avoid deceiving lock
> > spinners"), rwsem_spin_on_owner() returned false if the owner changed.
> > That commit made it return true in this situation, after which a kernel
> > soft lockup can be triggered easily in xfstests.
> > 
> > So this patch restores the previous behaviour; it should be
> > reasonable to stop spinning under heavy contention.
> > 
> > The soft lockup can be reproduced easily in xfstests (generic/299)
> > over ext4:
> > 
> > [  236.417011] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [kworker/5:80:3288]
> > [  236.417011] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw
> > [  236.417011] CPU: 5 PID: 3288 Comm: kworker/5:80 Not tainted 4.0.0-rc1-next-20150303+ #69
> > [  236.417011] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> > [  236.417011] Workqueue: dio/sda dio_aio_complete_work
> > [  236.417011] task: ffff8800b87c0000 ti: ffff8800b703c000 task.ti: ffff8800b703c000
> > [  236.417011] RIP: 0010:[<ffffffff81083c20>]  [<ffffffff81083c20>] __rcu_read_unlock+0x47/0x55
> > [  236.417011] RSP: 0018:ffff8800b703fb98  EFLAGS: 00000246
> > [  236.417011] RAX: 0000000000000000 RBX: ffff8800b703fb48 RCX: 000000000003b080
> > [  236.417011] RDX: fffffffe00000001 RSI: ffff880231f03a20 RDI: ffff8800bb755568
> > [  236.417011] RBP: ffff8800b703fba8 R08: ffff880227908078 R09: ffff8800b87c0000
> > [  236.417011] R10: 0000000000000001 R11: 0000000000000020 R12: ffff8800b703c000
> > [  236.417011] R13: ffff8800b87c0000 R14: 000000000000000f R15: 0000000000000101
> > [  236.417011] FS:  0000000000000000(0000) GS:ffff88023eca0000(0000) knlGS:0000000000000000
> > [  236.417011] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [  236.417011] CR2: 00007f549369f948 CR3: 00000000ba891000 CR4: 00000000000007e0
> > [  236.417011] Stack:
> > [  236.417011]  fffffffe00000001 ffff8800bb755568 ffff8800b703fbc8 ffffffff81073917
> > [  236.417011]  ffff8800bb755568 ffff8800bb755584 ffff8800b703fc48 ffffffff814d1ba4
> > [  236.417011]  ffff8800b703fbe8 ffff8800b87c0000 ffff8800b703fc78 ffffffff811cf51b
> > [  236.417011] Call Trace:
> > [  236.417011]  [<ffffffff81073917>] rwsem_spin_on_owner+0x2b/0x79
> > [  236.417011]  [<ffffffff814d1ba4>] rwsem_down_write_failed+0xc0/0x2f1
> > [  236.417011]  [<ffffffff811cf51b>] ? start_this_handle+0x494/0x4bd
> > [  236.417011]  [<ffffffff810d1149>] ? trace_preempt_on+0x12/0x2f
> > [  236.417011]  [<ffffffff812ae8f3>] call_rwsem_down_write_failed+0x13/0x20
> > [  236.417011]  [<ffffffff814d1623>] ? down_write+0x24/0x33
> > [  236.417011]  [<ffffffff81199404>] ext4_map_blocks+0x236/0x3cb
> > [  236.417011]  [<ffffffff811bb407>] ? ext4_convert_unwritten_extents+0xd2/0x19c
> > [  236.417011]  [<ffffffff811bcae4>] ? __ext4_journal_start_sb+0x77/0xb8
> > [  236.417011]  [<ffffffff811bb42e>] ext4_convert_unwritten_extents+0xf9/0x19c
> > [  236.417011]  [<ffffffff8119e214>] ext4_put_io_end+0x3a/0x5d
> > [  236.417011]  [<ffffffff81197268>] ext4_end_io_dio+0x2a/0x2c
> > [  236.417011]  [<ffffffff8116418c>] dio_complete+0x97/0x12d
> > [  236.417011]  [<ffffffff81164333>] dio_aio_complete_work+0x21/0x23
> 
> If you're getting stuck there, I'd be looking for a bug in ext4, not
> the rwsem code. There's no way there should be enough unwritten
> extent conversion pending to lock up the system for that length of
> time. Especially considering the test has concurrent truncates
> running which should drain the entire IO queue every couple of
> seconds at worst....

FYI, this issue is being handled here:
https://lkml.org/lkml/2015/3/6/811

Thanks,
Davidlohr



* Re: [PATCH -next] locking/rwsem: don't spin in heavy contention
  2015-03-06 21:54 ` Dave Chinner
  2015-03-06 22:24   ` Davidlohr Bueso
@ 2015-03-07 11:47   ` Ming Lei
  1 sibling, 0 replies; 4+ messages in thread
From: Ming Lei @ 2015-03-07 11:47 UTC
  To: Dave Chinner
  Cc: Linux Kernel Mailing List, Davidlohr Bueso,
	Peter Zijlstra (Intel), Jason Low, Linus Torvalds,
	Michel Lespinasse, Paul E. McKenney, Tim Chen, Ingo Molnar,
	Theodore Ts'o

On Sat, Mar 7, 2015 at 5:54 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> If you're getting stuck there, I'd be looking for a bug in ext4, not
> the rwsem code. There's no way there should be enough unwritten
> extent conversion pending to lock up the system for that length of
> time. Especially considering the test has concurrent truncates

The time is spent spinning on the rwsem, so it really is an rwsem
problem, and Jason Low has posted a patch to fix it.

Thanks,
Ming Lei
