From: Zach Brown <zach.brown@oracle.com>
To: Andrew Morton <akpm@osdl.org>
Cc: linux-kernel@vger.kernel.org, hch@infradead.org,
	Mark Fasheh <mark.fasheh@oracle.com>
Subject: Re: [RFC] page lock ordering and OCFS2
Date: Tue, 18 Oct 2005 10:14:36 -0700
Message-ID: <43552D7C.3010003@oracle.com>
In-Reply-To: <20051017182407.1f2c591a.akpm@osdl.org>

Andrew Morton wrote:

>>user thread				dlm thread
>>
>>
>>					kthread
>>					...
>>					ocfs2_data_convert_worker
> 
> 
>                                         I assume there's an ocfs2_data_lock
>                                         hereabouts?

No.  The thread is working behind the scenes to bring the lock to a
state where others can get the lock again.  It doesn't explicitly hold
the lock like regular fs paths do, but no one else can get the lock
until it finishes its work.

A wild over-simplification of the logic goes something like this:

dlm_lock(lock) {
	wait_event(, !lock->blocked);
	atomic_inc(&lock->ref);
}

dlm_unlock(lock) {
	if (atomic_dec_and_test(&lock->ref)) {
		wake_worker_thread();
	}
}

incoming_drop_message() {
	lock->blocked = 1;
	wake_worker_thread();
}

worker_thread() {
	wait_event(, atomic_read(&lock->ref) == 0);
	truncate_inode_pages();
	lock->blocked = 0;
}

It's much worse than that, but that's the basic idea for our problem
here.  A network message comes in which forbids any further acquisition
of the lock until all the current holders have released it and the
worker thread has truncated the page cache.
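
For readers who want to step through that protocol, here is a minimal
single-threaded sketch in plain C.  It is not OCFS2 or DLM code: every
name in it (model_lock, model_try_lock, model_worker_step and so on) is
hypothetical, and each function returns false at the point where the
pseudocode above would sleep in wait_event().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the simplified lock above. */
struct model_lock {
	int ref;      /* current local holders */
	bool blocked; /* set by a drop message, cleared by the worker */
};

/* Models dlm_lock(): false means "would wait for !blocked". */
static bool model_try_lock(struct model_lock *l)
{
	if (l->blocked)
		return false;
	l->ref++;
	return true;
}

/* Models dlm_unlock(): the last holder leaving lets the worker run. */
static void model_unlock(struct model_lock *l)
{
	assert(l->ref > 0);
	l->ref--;
}

/* Models incoming_drop_message(): no new holders from here on. */
static void model_drop_message(struct model_lock *l)
{
	l->blocked = true;
}

/*
 * Models one attempt by worker_thread().  Returns true once it has
 * "truncated" and unblocked the lock; false means it would keep
 * waiting, either for a drop message or for ref to reach zero.
 */
static bool model_worker_step(struct model_lock *l, int *truncations)
{
	if (!l->blocked || l->ref != 0)
		return false;
	(*truncations)++; /* stands in for truncate_inode_pages() */
	l->blocked = false;
	return true;
}
```

Taking the lock, delivering a drop message, and then calling
model_worker_step() returns false until model_unlock() brings ref back
to zero; only after the worker step succeeds does model_try_lock()
succeed again.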

So in our inversion ocfs2_readpage() holds a page lock while waiting in
dlm_lock() for ->blocked to be clear, but worker_thread() can't clear
->blocked until it locks and truncates the page that ocfs2_readpage() holds.
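
The cycle can be written down as two wait conditions.  The sketch below
is only a model under stated assumptions: model_state and the three
helpers are hypothetical names, the page lock is reduced to one boolean,
and ref counts holders other than the reader stuck in readpage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the two locks involved in the inversion. */
struct model_state {
	bool page_locked; /* held by the modeled ocfs2_readpage() */
	bool blocked;     /* set by the drop message */
	int ref;          /* DLM lock holders other than the reader */
};

/* The reader already holds the page lock and needs !blocked. */
static bool reader_can_proceed(const struct model_state *s)
{
	return !s->blocked;
}

/* The worker needs no holders and must lock the page to truncate it. */
static bool worker_can_proceed(const struct model_state *s)
{
	return s->ref == 0 && !s->page_locked;
}

/* Each side waits on something only the other side can release. */
static bool is_deadlocked(const struct model_state *s)
{
	assert(s->ref >= 0);
	return s->page_locked && s->blocked &&
	       !reader_can_proceed(s) && !worker_can_proceed(s);
}
```

With page_locked and blocked both set, neither predicate holds and
is_deadlocked() returns true; clearing either one lets one side run.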

> Have you considered using invalidate_inode_pages() instead of
> truncate_inode_pages()?  If that leaves any pages behind, drop the read
> lock, sleep a bit, try again - something klunky like that might get you out
> of trouble, dunno.

Yeah, that doesn't work because the locked pages aren't making forward
progress.  In the read case it would be fine to just skip them, they're
about to be re-read once ocfs2_readpage() gets its DLM lock.  But we'd
have to strongly identify locked pages waiting in ocfs2_readpage().  And
in the write case the DLM thread is responsible for writing those pages back
before releasing the lock so that other nodes can read the pages in.

We think we could do this sort of thing with a specific truncation loop,
but that's the nasty code I wasn't sure would be any more acceptable
than this nasty core patch.  The truncation loop would need a way to
identify locked pages that it can just ignore or initiate writeback on
because it'd know that it's just another ocfs2 path that holds the page
lock while waiting for a dlm lock.  I wonder if we could just do it with
PG_fs_misc.  I can give it a try if that doesn't send shivers down
everyone's spine.
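
To make the idea concrete, here is a rough user-space sketch of what one
pass of such a loop might look like.  It is a guess at the shape, not a
patch: model_page, its waiting_on_dlm field (standing in for the role
PG_fs_misc might play), and model_truncate_pass are all hypothetical,
and no real page cache API is used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page in the inode's page cache. */
struct model_page {
	bool present;           /* still in the cache */
	bool locked;            /* page lock held by someone */
	bool waiting_on_dlm;    /* assumed marker: holder is blocked in dlm_lock() */
	bool dirty;
	bool writeback_started;
};

/*
 * One pass over the cache.  Marked locked pages are never waited on:
 * dirty ones get writeback started, clean ones are skipped.  Returns
 * the number of pages still present after the pass.
 */
static size_t model_truncate_pass(struct model_page *pages, size_t n)
{
	size_t left = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct model_page *p = &pages[i];

		if (!p->present)
			continue;
		if (p->locked && p->waiting_on_dlm) {
			if (p->dirty)
				p->writeback_started = true;
			left++; /* left for the waiting path to deal with */
			continue;
		}
		if (p->locked) {
			left++; /* real code would sleep on the page lock here */
			continue;
		}
		p->present = false; /* stands in for truncating the page */
	}
	return left;
}
```

The point of the marker is only that the loop can tell "locked by an
ocfs2 path waiting on the DLM" apart from any other locked page, which
it would still have to wait for.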

- z

