From: Christoph Hellwig <hch@lst.de>
To: Zach Brown <zach.brown@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>,
Joel Becker <Joel.Becker@oracle.com>,
Andrew Morton <akpm@osdl.org>,
mark.fasheh@oracle.com, linux-fsdevel@vger.kernel.org
Subject: Re: [Joel.Becker@oracle.com: Re: [Linux-cluster] Re: [PATCH 1/3] dlm: use configfs]
Date: Thu, 25 Aug 2005 22:23:01 +0200
Message-ID: <20050825202301.GA15195@lst.de>
In-Reply-To: <430E11BA.4030603@oracle.com>

On Thu, Aug 25, 2005 at 11:45:14AM -0700, Zach Brown wrote:
> Yeah, we aim to simplify this code. For the record, it wasn't buffered
> aio that was the problem. There were two naughty moving parts:
>
> First, trying not to block in the dlm when issuing aio ops and tracking
> state to restart after dlm ops returned eiocbqueued. This was just
> overly aggressive. This can behave like block mapping lookups in that
> it rarely blocks. Most aio that people care about (direct io writes to
> already allocated regions) will simply be acquiring and releasing
> shared-read locks around each op -- trivial local operations.
>
> Second, trying to hold dlm locks around the entirety of aio ops. This
> led to the mess of trying to tear down locks in the iocb dtor method.
> (which can then race with unmount, aio does __fput on the filp, dropping
> the vfsmount ref, before calling dtor.. bleh). We can get around this
> by unlocking after performing the block mapping lookups and issuing the
> io and introducing a cluster DLM lock which behaves like i_alloc_sem.
You might want to look at XFS as a model for this. While it's not
clustered, it has its own r/w semaphore to protect block allocations.
It doesn't use i_alloc_sem at all, but relies on some 'clever' behaviour:
it downgrades the lock once the block allocations are done.
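To make the downgrade idea concrete, here is a rough userspace sketch of the pattern: take the lock exclusively for allocation, then downgrade to shared before issuing the I/O. This is not XFS code. The struct and every function name below are made up for illustration, and the lock is built on pthreads rather than the kernel's rw_semaphore (where downgrade_write() plays this role).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-inode r/w semaphore; not a kernel type. */
struct rwsem_sketch {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int  readers;   /* number of shared holders */
    bool writer;    /* true while held exclusively */
};

static void rwsem_sketch_init(struct rwsem_sketch *s)
{
    pthread_mutex_init(&s->mu, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->readers = 0;
    s->writer = false;
}

static void down_write_sketch(struct rwsem_sketch *s)
{
    pthread_mutex_lock(&s->mu);
    while (s->writer || s->readers > 0)
        pthread_cond_wait(&s->cv, &s->mu);
    s->writer = true;
    pthread_mutex_unlock(&s->mu);
}

/* Turn the exclusive hold into a shared one without ever dropping the
 * lock, so no other writer can slip in between the two states. */
static void downgrade_write_sketch(struct rwsem_sketch *s)
{
    pthread_mutex_lock(&s->mu);
    assert(s->writer);
    s->writer = false;
    s->readers = 1;
    pthread_cond_broadcast(&s->cv);   /* waiting readers may now enter */
    pthread_mutex_unlock(&s->mu);
}

static void down_read_sketch(struct rwsem_sketch *s)
{
    pthread_mutex_lock(&s->mu);
    while (s->writer)
        pthread_cond_wait(&s->cv, &s->mu);
    s->readers++;
    pthread_mutex_unlock(&s->mu);
}

static void up_read_sketch(struct rwsem_sketch *s)
{
    pthread_mutex_lock(&s->mu);
    if (--s->readers == 0)
        pthread_cond_broadcast(&s->cv);   /* let a waiting writer in */
    pthread_mutex_unlock(&s->mu);
}

/* Allocating write: exclusive while allocating, shared while doing I/O. */
static void write_with_downgrade(struct rwsem_sketch *s, int *blocks)
{
    down_write_sketch(s);
    (*blocks)++;                 /* stand-in for block allocation */
    downgrade_write_sketch(s);
    /* stand-in for issuing the I/O; other readers can run here */
    up_read_sketch(s);
}

/* Write to already allocated blocks: shared lock only. */
static void write_preallocated(struct rwsem_sketch *s, int *ios)
{
    down_read_sketch(s);
    (*ios)++;                    /* stand-in for issuing the I/O */
    up_read_sketch(s);
}
```

The point of the downgrade is that the exclusive section covers only the allocation itself; the I/O then runs under the shared lock alongside other non-allocating writers.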
> So, how about a patch that lets the fs provide a callback to
> acquire/release i_alloc_sem at the current sites (dio, notify_change)
> that work with it? Most file systems wouldn't provide a callback and
> the code would just use the sem as usual, but clustered guys could use
> dlm locking.
If we're going down that route I'd say provide the callback only for
filesystems that actually need the locking, but there must be a better
way to do that.
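For reference, the optional-callback scheme under discussion could look roughly like the sketch below. It is only an illustration in userspace C: struct alloc_lock_ops, struct inode_sketch and all function names are hypothetical, and plain counters stand in for i_alloc_sem and for a cluster DLM lock.

```c
#include <assert.h>
#include <stddef.h>

struct inode_sketch;

/* Hypothetical hook table; not an existing kernel interface. */
struct alloc_lock_ops {
    void (*lock)(struct inode_sketch *inode);
    void (*unlock)(struct inode_sketch *inode);
};

struct inode_sketch {
    const struct alloc_lock_ops *ops;  /* NULL for most filesystems */
    int alloc_sem_held;                /* stand-in for i_alloc_sem */
    int cluster_lock_held;             /* stand-in for a DLM lock */
};

/* Called at the sites that take i_alloc_sem today (dio, notify_change). */
static void alloc_lock(struct inode_sketch *inode)
{
    if (inode->ops && inode->ops->lock)
        inode->ops->lock(inode);       /* filesystem-provided locking */
    else
        inode->alloc_sem_held++;       /* default: the local semaphore */
}

static void alloc_unlock(struct inode_sketch *inode)
{
    if (inode->ops && inode->ops->unlock)
        inode->ops->unlock(inode);
    else
        inode->alloc_sem_held--;
}

/* What a clustered filesystem might plug in. */
static void cluster_lock(struct inode_sketch *inode)
{
    inode->cluster_lock_held++;
}

static void cluster_unlock(struct inode_sketch *inode)
{
    inode->cluster_lock_held--;
}

static const struct alloc_lock_ops cluster_ops = {
    .lock   = cluster_lock,
    .unlock = cluster_unlock,
};
```

A filesystem that leaves ops as NULL keeps the current behaviour, while one that sets it to something like cluster_ops replaces the local semaphore entirely at those call sites.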
Note that in any case you're doing lots of work for the buffered path
as well in aio.c that shouldn't be necessary with a bit of refactoring.