[PATCH RFC] Lost wakeups from lock_page

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH RFC] Lost wakeups from lock_page_killable()
@ 2009-01-14 20:23 Chris Mason
  2009-01-14 21:55 ` Chris Mason
  2009-01-14 22:30 ` Matthew Wilcox
  0 siblings, 2 replies; 3+ messages in thread
From: Chris Mason @ 2009-01-14 20:23 UTC (permalink / raw)
  To: linux-mm, Peter Zijlstra, Matthew Wilcox, chuck.lever

Chuck has been debugging a problem with NFS mounts where procs are
getting stuck waiting on the page lock.  He did a bunch of work around
tracking the last person to lock the page and then printing out the page
state when it found stuck procs.

The end result showed that procs were waiting to lock pages that were
not actually locked.  The workload involved a well known commercial
database that was being shutdown via sql shutdown abort

After trying to blame NFS for a really long time I think
lock_page_killable may be the cause.

lock_page and lock_page_killable both call __wait_on_bit_lock, and so
both end up using prepare_to_wait_exclusive().  This means that when
someone does finally unlock the page, only one process is going to get
woken up.

So, procA holding the page lock, procB and procC are waiting on the
lock.

procA: lock_page() // success
procB: lock_page_killable(), sync_page_killable(), io_schedule()
procC: lock_page_killable(), sync_page_killable(), io_schedule()

procA: unlock, wake_up_page(page, PG_locked)
procA: wake up procB

happy admin: kill procB

procB: wakes into sync_page_killable(), notices the signal and returns
-EINTR

procB: __wait_on_bit_lock sees the action() func returns < 0 and does
not take the page lock

procB: lock_page_killable() returns < 0 and exits happily.

procC: sleeping in io_schedule() forever unless someone else locks the
page.

The patch below is entirely untested but may do a better job of
explaining what I think the bug is.  I'm hoping I can trigger it locally
with a few dd commands mixed with a lot of kill commands.

-chris

diff --git a/mm/filemap.c b/mm/filemap.c
index ceba0bd..e1184fa 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -623,9 +623,20 @@ EXPORT_SYMBOL(__lock_page);
 int __lock_page_killable(struct page *page)
 {
 	DEFINE_WAIT_BIT(wait, &page->flags, PG_locked);
+	int ret;

-	return __wait_on_bit_lock(page_waitqueue(page), &wait,
+	ret = __wait_on_bit_lock(page_waitqueue(page), &wait,
 					sync_page_killable, TASK_KILLABLE);
+	/*
+	 * wait_on_bit_lock uses prepare_to_wait_exclusive, so if multiple
+	 * procs were waiting on this page, we were the only proc woken up.
+	 *
+	 * if ret != 0, we didn't actually get the lock.  We need to
+	 * make sure any other waiters don't sleep forever.
+	 */
+	if (ret)
+		wake_up_page(page, PG_locked);
+	return ret;
 }

 /**

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] Lost wakeups from lock_page_killable()
  2009-01-14 20:23 [PATCH RFC] Lost wakeups from lock_page_killable() Chris Mason
@ 2009-01-14 21:55 ` Chris Mason
  2009-01-14 22:30 ` Matthew Wilcox
  1 sibling, 0 replies; 3+ messages in thread
From: Chris Mason @ 2009-01-14 21:55 UTC (permalink / raw)
  To: linux-mm; +Cc: Peter Zijlstra, Matthew Wilcox, chuck.lever

On Wed, 2009-01-14 at 15:23 -0500, Chris Mason wrote:

> The patch below is entirely untested but may do a better job of
> explaining what I think the bug is.  I'm hoping I can trigger it locally
> with a few dd commands mixed with a lot of kill commands.
> 

I've tried many variations but haven't hit it locally yet.  Hopefully
testers at oracle can confirm the patch fixes things for them, but the
runs take about a day.

-chris


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] Lost wakeups from lock_page_killable()
  2009-01-14 20:23 [PATCH RFC] Lost wakeups from lock_page_killable() Chris Mason
  2009-01-14 21:55 ` Chris Mason
@ 2009-01-14 22:30 ` Matthew Wilcox
  1 sibling, 0 replies; 3+ messages in thread
From: Matthew Wilcox @ 2009-01-14 22:30 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-mm, Peter Zijlstra, chuck.lever

On Wed, Jan 14, 2009 at 03:23:52PM -0500, Chris Mason wrote:
> Chuck has been debugging a problem with NFS mounts where procs are
> getting stuck waiting on the page lock.  He did a bunch of work around
> tracking the last person to lock the page and then printing out the page
> state when it found stuck procs.

Yes, I can testify to Chuck's hard work on this.  I hadn't managed to
figure out this problem ... I was blaming lock_page_killable() but
couldn't see the bug.

> lock_page and lock_page_killable both call __wait_on_bit_lock, and so
> both end up using prepare_to_wait_exclusive().  This means that when
> someone does finally unlock the page, only one process is going to get
> woken up.

Yeah.  This is a Bad API, IMO.  It doesn't contain the word 'exclusive'
in it, so I had no idea that __wait_on_bit_lock was an exclusive wait.
Sure, if I'd drilled down further, I'd've noticed that, and maybe having
the word 'exlcusive' in the name of the function wouldn't've made me
spot the bug, but I think it's worth changing.

> So, procA holding the page lock, procB and procC are waiting on the
> lock.
> 
> procA: lock_page() // success
> procB: lock_page_killable(), sync_page_killable(), io_schedule()
> procC: lock_page_killable(), sync_page_killable(), io_schedule()
> 
> procA: unlock, wake_up_page(page, PG_locked)
> procA: wake up procB
> 
> happy admin: kill procB
> 
> procB: wakes into sync_page_killable(), notices the signal and returns
> -EINTR
> 
> procB: __wait_on_bit_lock sees the action() func returns < 0 and does
> not take the page lock
> 
> procB: lock_page_killable() returns < 0 and exits happily.
> 
> procC: sleeping in io_schedule() forever unless someone else locks the
> page.

Yeah.  That works.  Of course, if you're multithreaded and you have
threads B1 B2 B3 B4 B5 B6 ... waiting on the same page, killing procB is
going to give you a fairly high likelihood of procC getting stuck.

Since procC is also sleeping in a killable state, you can kill procC too.
Chuck asked "Why isn't anyone else seeing this?" and I think the answer
is that they are, but who's reporting it?  It just gets written off
as "^C didn't work.  Try harder." and eventually everything dies, so
nobody's going to file a bug against the kernel ... it's clearly an
application bug.

> The patch below is entirely untested but may do a better job of
> explaining what I think the bug is.  I'm hoping I can trigger it locally
> with a few dd commands mixed with a lot of kill commands.

The patch looks sane to me.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2009-01-14 22:30 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-01-14 20:23 [PATCH RFC] Lost wakeups from lock_page_killable() Chris Mason
2009-01-14 21:55 ` Chris Mason
2009-01-14 22:30 ` Matthew Wilcox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).