From: Zach Brown <zab@zabbo.net>
To: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: David Teigland <teigland@redhat.com>,
	Pekka Enberg <penberg@gmail.com>,
	akpm@osdl.org, linux-kernel@vger.kernel.org,
	linux-cluster@redhat.com, mark.fasheh@oracle.com
Subject: Re: GFS
Date: Tue, 09 Aug 2005 10:17:10 -0700
Message-ID: <42F8E516.7020600@zabbo.net>
In-Reply-To: <1123598983.10790.1.camel@haji.ri.fi>

Pekka Enberg wrote:

> In addition, the vma walk will become an unmaintainable mess as soon
> as someone introduces another mmap() capable fs that needs similar
> locking.

Yup, I suspect that if the core kernel ends up caring about this problem
then the VFS will be involved in helping file systems sort the locks
they'll acquire around IO.

> I am not an expert so could someone please explain why this cannot be
> done with a_ops->prepare_write and friends?

I'll try, briefly.

Usually clustered file systems in Linux maintain data consistency for
normal POSIX IO by holding DLM locks for the duration of their
file->{read,write} methods.  A task on one node won't be able to read
until all tasks on other nodes have finished any conflicting writes
they might have been performing.  Nothing surprising here.

Now say we want to extend consistency guarantees to mmap().  This boils
down to protecting mappings with DLM locks.  When a page is mapped for
reading, the continued presence of that mapping is protected by holding
a DLM lock.  If another node goes to write to that page, the read lock
is revoked and the mapping is torn down.  These locks are acquired in
vma->vm_ops->nopage as the task faults and tries to bring up the
mapping.

And that's the problem. Because they're acquired in ->nopage they can
be acquired during a fault that is servicing the 'buf' argument to an
outer file->{read,write} operation which has grabbed a lock for the
target file. Acquiring multiple locks introduces the risk of ABBA
deadlocks. It's trivial to construct examples of mmap(), read(), and
write() on 2 nodes with 2 files that deadlock.
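
To make the ABBA case concrete, here is a minimal sketch in plain
userspace C.  It is not code from any of these file systems: the
struct lock_trace type and the abba_possible() helper are hypothetical
names, and it assumes both locks are taken in conflicting (exclusive)
modes.  Each trace records the lock a task takes in file->write for
the target file, followed by the lock it takes in ->nopage while
faulting in 'buf'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one write() call on one node.  Lock ids stand
 * in for DLM locks; both are assumed to be taken in exclusive mode.
 */
struct lock_trace {
    int outer;  /* taken in file->write for the target file */
    int inner;  /* taken in ->nopage for the file backing 'buf' */
};

/*
 * Two tasks can deadlock ABBA-style when each already holds the lock
 * that the other one needs next.
 */
static bool abba_possible(const struct lock_trace *x,
                          const struct lock_trace *y)
{
    assert(x != NULL && y != NULL);
    return x->outer != x->inner &&
           x->outer == y->inner &&
           x->inner == y->outer;
}
```

With node 1 doing write(A, buf mapped from B) and node 2 doing
write(B, buf mapped from A), the traces are {A, B} and {B, A} and
abba_possible() returns true.  Two tasks that take the locks in the
same order never trip it.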

So clustered file systems in Linux (GFS, Lustre, OCFS2, (GPFS?)) all
walk vmas in their file->{read,write} to discover mappings that belong
to their files so that they can preemptively sort and acquire the locks
that will be needed to cover the mappings that might be established in
->nopage. As you point out, this both relies on the mappings not
changing and gets very exciting when you mix files and mappings between
file systems that are each sorting and acquiring their own DLM locks.
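
The "sort and acquire" step can be sketched roughly as below, again in
userspace C and under assumptions: lock ids are plain unsigned longs,
sort_lock_ids() is a hypothetical helper, and the vma walk that
collects the ids and the DLM calls that take the locks are left out.
The point is only that every task ends up requesting its locks in the
same ascending order, which is what rules out the ABBA ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Order lock ids ascending without risking integer overflow. */
static int cmp_lock_id(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/*
 * Sort the ids gathered for one read/write call (the target file plus
 * any files found mapped under 'buf') and drop duplicates in place.
 * Returns the number of unique ids; the caller would then acquire
 * ids[0] through ids[n - 1] in that order.
 */
static size_t sort_lock_ids(unsigned long *ids, size_t n)
{
    size_t i, out = 0;

    assert(ids != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(ids, n, sizeof(*ids), cmp_lock_id);
    for (i = 1; i < n; i++)
        if (ids[i] != ids[out])
            ids[++out] = ids[i];

    return out + 1;
}
```

This only helps while a single file system owns all of the ids.  Once
two file systems each sort their own locks independently, there is no
shared order between them, which is the problem described above.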

I brought this up with some people at the kernel summit but no one,
including myself, considers it a high priority.  It wouldn't be too hard
to construct a patch if people want to take a look.

- z
