From: Ira Weiny <ira.weiny@intel.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Matthew Wilcox" <willy@infradead.org>, "Jan Kara" <jack@suse.cz>,
"Dan Williams" <dan.j.williams@intel.com>,
"Theodore Ts'o" <tytso@mit.edu>,
"Jeff Layton" <jlayton@kernel.org>,
linux-xfs@vger.kernel.org,
"Andrew Morton" <akpm@linux-foundation.org>,
"John Hubbard" <jhubbard@nvidia.com>,
"Jérôme Glisse" <jglisse@redhat.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-ext4@vger.kernel.org,
linux-mm@kvack.org, "Jason Gunthorpe" <jgg@ziepe.ca>,
linux-rdma@vger.kernel.org
Subject: Re: [PATCH RFC 00/10] RDMA/FS DAX truncate proposal
Date: Thu, 13 Jun 2019 13:34:05 -0700
Message-ID: <20190613203404.GA30404@iweiny-DESK2.sc.intel.com>
In-Reply-To: <20190613002555.GH14363@dread.disaster.area>
On Thu, Jun 13, 2019 at 10:25:55AM +1000, Dave Chinner wrote:
> On Wed, Jun 12, 2019 at 05:37:53AM -0700, Matthew Wilcox wrote:
> > On Sat, Jun 08, 2019 at 10:10:36AM +1000, Dave Chinner wrote:
> > > On Fri, Jun 07, 2019 at 11:25:35AM -0700, Ira Weiny wrote:
> > > > Are you suggesting that we have something like this from user space?
> > > >
> > > > fcntl(fd, F_SETLEASE, F_LAYOUT | F_UNBREAKABLE);
> > >
> > > Rather than "unbreakable", perhaps a clearer description of the
> > > policy it entails is "exclusive"?
> > >
> > > i.e. what we are talking about here is an exclusive lease that
> > > prevents other processes from changing the layout. i.e. the
> > > mechanism used to guarantee a lease is exclusive is that the layout
> > > becomes "unbreakable" at the filesystem level, but the policy we are
> > > actually presenting to users is "exclusive access"...
> >
> > That's rather different from the normal meaning of 'exclusive' in the
> > context of locks, which is "only one user can have access to this at
> > a time".
>
>
> Layout leases are not locks, they are a user access policy object.
> It is the process/fd which holds the lease and it's the process/fd
> that is granted exclusive access. This is exactly the same semantic
> as O_EXCL provides for granting exclusive access to a block device
> via open(), yes?
>
> > As I understand it, this is rather more like a 'shared' or
> > 'read' lock. The filesystem would be the one which wants an exclusive
> > lock, so it can modify the mapping of logical to physical blocks.
>
> ISTM that you're conflating internal filesystem implementation with
> application visible semantics. Yes, the filesystem uses internal
> locks to serialise the modification of the things the lease manages
> access to, but that has nothing to do with the access policy the
> lease provides to users.
>
> e.g. Process A has an exclusive layout lease on file F. It does an
> IO to file F. The filesystem IO path checks that Process A owns the
> lease on the file and so skips straight through layout breaking
> because it owns the lease and is allowed to modify the layout. It
> then takes the inode metadata locks to allocate new space and write
> new data.
>
> Process B now tries to write to file F. The FS checks whether
> Process B owns a layout lease on file F. It doesn't, so then it
> tries to break the layout lease so the IO can proceed. The layout
> breaking code sees that process A has an exclusive layout lease
> granted, and so returns -ETXTBSY to process B - it is not allowed to
> break the lease and so the IO fails with -ETXTBSY.
>
> i.e. the exclusive layout lease prevents other processes from
> performing operations that may need to modify the layout. It does
> not "lock" the file/inode in any way, it just changes how the layout
> lease breaking behaves.

Question: Do we expect Process A to get notified that Process B was attempting
to change the layout?

Such a notification would change the exclusivity semantics: while Process A has
an exclusive lease, it could release it when notified, to allow Process B
temporary exclusivity.

Question 2: Do we expect other processes (say Process C) to also be able to map
and pin the file? I believe users will need this, and for layout purposes it is
OK to do so. But this means that Process A does not have "exclusive" access to
the lease.

So suppose Process C has also placed a layout lease on the file, indicating
that it does not want the layout to change. Then both A's and C's leases need
to be "broken" by Process B to change the layout. If there is no Process B, A
and C can run just fine with a "locked" layout.

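To make the two questions concrete, here is a toy userspace model of the
behaviour being discussed. It is not kernel code and not taken from the patch
set; struct layout_leases, take_lease(), drop_lease() and break_layout() are
hypothetical names. It assumes leases are shared, and that a layout change
conflicts with any lease held by a different process, which is only one
possible answer to Question 2.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HOLDERS 8

/* Toy stand-in for a file's layout-lease state; not a kernel structure. */
struct layout_leases {
	int holders[MAX_HOLDERS];	/* ids of processes holding a lease */
	size_t nr;			/* number of valid entries */
};

static bool holds_lease(const struct layout_leases *l, int pid)
{
	for (size_t i = 0; i < l->nr; i++)
		if (l->holders[i] == pid)
			return true;
	return false;
}

/* Assumption: leases are shared, so any number of processes may hold one. */
static int take_lease(struct layout_leases *l, int pid)
{
	if (holds_lease(l, pid))
		return 0;
	if (l->nr == MAX_HOLDERS)
		return -ENOLCK;
	l->holders[l->nr++] = pid;
	return 0;
}

static void drop_lease(struct layout_leases *l, int pid)
{
	for (size_t i = 0; i < l->nr; i++) {
		if (l->holders[i] == pid) {
			/* Move the last entry into the freed slot. */
			l->holders[i] = l->holders[--l->nr];
			return;
		}
	}
}

/*
 * Checked before an operation that may change the layout (write,
 * truncate, fallocate, ...). Assumption: the operation fails with
 * -ETXTBSY while any *other* process still holds a lease.
 */
static int break_layout(const struct layout_leases *l, int pid)
{
	for (size_t i = 0; i < l->nr; i++)
		if (l->holders[i] != pid)
			return -ETXTBSY;
	return 0;
}
```

With A as the only holder, break_layout() lets A through and fails B with
-ETXTBSY, as in Dave's example. Once C also takes a lease, B stays blocked
until both A and C have dropped theirs, and in this model A is blocked by C's
lease as well.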
Ira
>
> Further, the "exclusiveness" of a layout lease is completely
> irrelevant to the filesystem that is indicating that an operation
> that may need to modify the layout is about to be performed. All the
> filesystem has to do is handle failures to break the lease
> appropriately. Yes, XFS serialises the layout lease validation
> against other IO to the same file via its IO locks, but that's an
> internal data IO coherency requirement, not anything to do with
> layout lease management.
>
> Note that I talk about /writes/ here. This is interchangeable with
> any other operation that may need to modify the extent layout of the
> file, be it truncate, fallocate, etc: the attempt to break the
> layout lease by a non-owner should fail if the lease is "exclusive"
> to the owner.
>
> > The complication being that by default the filesystem has an exclusive
> > lock on the mapping, and what we're trying to add is the ability for
> > readers to ask the filesystem to give up its exclusive lock.
>
> The filesystem doesn't even lock the "mapping" until after the
> layout lease has been validated or broken.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>