From: Badari Pulavarty <pbadari@gmail.com>
To: Zach Brown <zach.brown@oracle.com>
Cc: lkml <linux-kernel@vger.kernel.org>,
Andrew Morton <akpm@osdl.org>,
Christoph Hellwig <hch@infradead.org>
Subject: Re: [RFC] page lock ordering and OCFS2
Date: Mon, 17 Oct 2005 16:37:20 -0700
Message-ID: <1129592240.23632.51.camel@localhost.localdomain>
In-Reply-To: <20051017222051.GA26414@tetsuo.zabbo.net>

On Mon, 2005-10-17 at 15:20 -0700, Zach Brown wrote:
> I sent an earlier version of this patch to linux-fsdevel and was met with
> deafening silence. I'm resending the commentary from that first mail and am
> including a new version of the patch. This time it has much clearer naming
> and some kerneldoc blurbs. Here goes...
>
> --
>
> In recent weeks we've been reworking the locking in OCFS2 to simplify things
> and make it behave more like a "local" Linux file system. We've run into an
> ordering inversion between a page's PG_locked and OCFS2's DLM locks that
> protect page cache contents. I'm including a patch at the end of this mail
> that I think is a clean way to give the file system a chance to get the
> ordering right, but we're open to any and all suggestions. We want to do the
> cleanest thing.
>
> OCFS2 maintains page cache coherence between nodes by requiring that a node
> hold a valid lock while there are active pages in the page cache. The page
> cache is invalidated before a node releases a lock so that another node can
> acquire it. While this invalidation is happening new locks can not be acquired
> on that node. This is equivalent to a DLM processing thread acquiring
> PG_locked during truncation while holding a DLM lock. Normal file system user
> tasks come to the a_ops with PG_locked acquired by their callers before they
> have a chance to get DLM locks.
>
> We talked a little about modifying the invalidation path to avoid waiting for
> pages that are held by an OCFS2 path that is waiting for a DLM lock. It seems
> awfully hard to get right without some very nasty code. The truncation on lock
> removal has to write back dirty pages so that other nodes can read it -- it
> can't simply skip these pages if they happened to be locked.
>
> So we aimed right for the lock ordering inversion problem with the appended
> patch. It gives file systems a callback that is tried before page locks are
> acquired that are going to be passed in to the file system's a_ops methods.
>
> So, what do people think about this? Is it reasonable to patch the core to
> help OCFS2 with this ordering inversion?
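
For reference, the inversion described above can be modeled in userspace with two pthread mutexes. This is only a rough sketch under stated assumptions, not the patch itself (the patch is not included in this message): `cluster_lock` stands in for an OCFS2 DLM lock, `page_lock` stands in for PG_locked, and `struct fs_hooks`, `core_touch_page()` and `run_demo()` are hypothetical names invented for the illustration. The hook runs before the "core" takes the page lock, so the user path and the DLM thread both take the locks in the same order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not kernel objects. */
static pthread_mutex_t cluster_lock = PTHREAD_MUTEX_INITIALIZER; /* models a DLM lock */
static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;    /* models PG_locked */
static long ops_done; /* protected by page_lock */

/* Hypothetical callbacks a file system hands to the core. */
struct fs_hooks {
    void (*before_page_lock)(void);  /* runs before the core locks the page */
    void (*after_page_unlock)(void); /* runs after the core unlocks the page */
};

static void take_cluster_lock(void) { pthread_mutex_lock(&cluster_lock); }
static void drop_cluster_lock(void) { pthread_mutex_unlock(&cluster_lock); }

static const struct fs_hooks clustered_fs_hooks = {
    .before_page_lock = take_cluster_lock,
    .after_page_unlock = drop_cluster_lock,
};

/*
 * Models a core path that locks a page and then calls into the file system.
 * Without the hook, the file system could only take cluster_lock while
 * page_lock is already held: page -> cluster, the inverse of the DLM thread.
 */
static void core_touch_page(const struct fs_hooks *hooks)
{
    if (hooks->before_page_lock)
        hooks->before_page_lock();
    pthread_mutex_lock(&page_lock);
    ops_done++; /* the a_ops method would run here */
    pthread_mutex_unlock(&page_lock);
    if (hooks->after_page_unlock)
        hooks->after_page_unlock();
}

static void *user_task(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 0; i < rounds; i++)
        core_touch_page(&clustered_fs_hooks);
    return NULL;
}

/* Models the DLM thread: cluster lock first, then the page lock to invalidate. */
static void *dlm_thread(void *arg)
{
    int rounds = *(int *)arg;

    for (int i = 0; i < rounds; i++) {
        pthread_mutex_lock(&cluster_lock);
        pthread_mutex_lock(&page_lock);
        ops_done++; /* writeback and invalidation would run here */
        pthread_mutex_unlock(&page_lock);
        pthread_mutex_unlock(&cluster_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of page operations. */
long run_demo(int rounds)
{
    pthread_t user, dlm;
    int rc;

    ops_done = 0;
    rc = pthread_create(&user, NULL, user_task, &rounds);
    assert(rc == 0);
    rc = pthread_create(&dlm, NULL, dlm_thread, &rounds);
    assert(rc == 0);
    pthread_join(user, NULL);
    pthread_join(dlm, NULL);
    return ops_done;
}
```

Because both paths acquire cluster then page, the two threads cannot deadlock on each other in this model. Dropping the hook and taking `cluster_lock` inside the page-locked region instead would reintroduce the page -> cluster vs. cluster -> page inversion.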

Sorry for the "deafening silence" on fs-devel. Before replying, I was
trying to work out exactly what I need and how it could be combined with
what you are trying to do. Unfortunately, I don't understand your lock
inversion problem well enough :(

I was looking at the ext3 page lock vs. transaction ordering. I wanted
a way to reverse that ordering to support writepages() cleanly.
I guess what you are proposing could be used (in a weird way) to do
that, except that I need the support in different places (the callers
of writepage and writepages). BTW, I wasn't comfortable touching VFS
locking just for ext3's purposes :(

Thanks,
Badari