From: Matthew Wilcox <willy@infradead.org>
To: Kent Overstreet <kent.overstreet@gmail.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Dave Chinner <dchinner@redhat.com>,
darrick.wong@oracle.com, tytso@mit.edu,
linux-btrfs@vger.kernel.org, clm@fb.com, jbacik@fb.com,
viro@zeniv.linux.org.uk, peterz@infradead.org
Subject: Re: [PATCH 01/10] mm: pagecache add lock
Date: Fri, 18 May 2018 06:13:06 -0700 [thread overview]
Message-ID: <20180518131305.GA6361@bombadil.infradead.org> (raw)
In-Reply-To: <20180518074918.13816-3-kent.overstreet@gmail.com>
On Fri, May 18, 2018 at 03:49:00AM -0400, Kent Overstreet wrote:
> Add a per address space lock around adding pages to the pagecache - making it
> possible for fallocate INSERT_RANGE/COLLAPSE_RANGE to work correctly, and also
> hopefully making truncate and dio a bit saner.
(moving this section here from the overall description so I can reply
to it in one place)
> * pagecache add lock
>
> This is the only one that touches existing code in nontrivial ways.
> The problem it's solving is that there is no existing general mechanism
> for shooting down pages in the page and keeping them removed, which is a
> real problem if you're doing anything that modifies file data and isn't
> buffered writes.
>
> Historically, the only problematic case has been direct IO, and people
> have been willing to say "well, if you mix buffered and direct IO you
> get what you deserve", and that's probably not unreasonable. But now we
> have fallocate insert range and collapse range, and those are broken in
> ways I frankly don't want to think about if they can't ensure consistency
> with the page cache.
ext4 manages collapse-vs-pagefault with the ext4-specific i_mmap_sem.
You may get pushback on the grounds that this ought to be a
filesystem-specific lock rather than one embedded in the generic inode.
> Also, the mechanism truncate uses (i_size and sacrificing a goat) has
> historically been rather fragile, IMO it might be a good think if we
> switched it to a more general rigorous mechanism.
>
> I need this solved for bcachefs because without this mechanism, the page
> cache inconsistencies lead to various assertions popping (primarily when
> we didn't think we need to get a disk reservation going by page cache
> state, but then do the actual write and disk space accounting says oops,
> we did need one). And having to reason about what can happen without
> a locking mechanism for this is not something I care to spend brain
> cycles on.
>
> That said, my patch is kind of ugly, and it requires filesystem changes
> for other filesystems to take advantage of it. And unfortunately, since
> one of the code paths that needs locking is readahead, I don't see any
> realistic way of implementing the locking within just bcachefs code.
>
> So I'm hoping someone has an idea for something cleaner (I think I recall
> Matthew Wilcox saying he had an idea for how to use xarray to solve this),
> but if not I'll polish up my pagecache add lock patch and see what I can
> do to make it less ugly, and hopefully other people find it palatable
> or at least useful.
My idea with the XArray is that we have a number of reserved entries which
we can use as blocking entries. I was originally planning on making this
an XArray feature, but I now believe it's a page-cache-special feature.
We can always revisit that decision if it turns out to be useful to
another user.
API:
int filemap_block_range(struct address_space *mapping, loff_t start,
loff_t end);
void filemap_remove_block(struct address_space *mapping, loff_t start,
loff_t end);
- After removing a block, the pagecache is empty between [start, end].
- You have to treat the block as a single entity; don't unblock only
a subrange of the range you originally blocked.
- Lookups of a page within a blocked range return NULL.
- Attempts to add a page to a blocked range sleep on one of the
page_wait_table queues.
- Attempts to block a blocked range will also sleep on one of the
page_wait_table queues. Is this restriction acceptable for your use
case? It's clearly not a problem for fallocate insert/collapse. It
would only be a problem for Direct I/O if people are doing subpage
directio from within the same page. I think that's rare enough to
not be a problem (but please tell me if I'm wrong!)
next prev parent reply other threads:[~2018-05-18 13:13 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-05-18 7:48 [PATCH 00/10] RFC: assorted bcachefs patches Kent Overstreet
2018-05-18 7:49 ` [PATCH 01/10] mm: pagecache add lock Kent Overstreet
2018-05-18 13:13 ` Matthew Wilcox [this message]
2018-05-18 15:53 ` Christoph Hellwig
2018-05-18 17:45 ` Kent Overstreet
2018-05-23 23:55 ` Notes on locking for pagacache consistency (was: [PATCH 01/10] mm: pagecache add lock) Kent Overstreet
2018-05-20 22:45 ` [PATCH 01/10] mm: pagecache add lock Kent Overstreet
2018-05-23 15:22 ` Christoph Hellwig
2018-05-23 17:12 ` Kent Overstreet
2018-05-18 7:49 ` [PATCH 02/10] mm: export find_get_pages() Kent Overstreet
2018-05-18 16:00 ` Christoph Hellwig
2018-05-18 7:49 ` [PATCH 03/10] locking: bring back lglocks Kent Overstreet
2018-05-18 9:51 ` Peter Zijlstra
2018-05-18 10:13 ` Kent Overstreet
2018-05-18 11:03 ` Peter Zijlstra
2018-05-18 11:39 ` Kent Overstreet
2018-05-18 7:49 ` [PATCH 04/10] locking: export osq_lock()/osq_unlock() Kent Overstreet
2018-05-18 9:52 ` Peter Zijlstra
2018-05-18 10:18 ` Kent Overstreet
2018-05-18 11:08 ` Peter Zijlstra
2018-05-18 11:32 ` Kent Overstreet
2018-05-18 11:40 ` Peter Zijlstra
2018-05-18 12:40 ` Kent Overstreet
2018-05-18 7:49 ` [PATCH 05/10] don't use spin_lock_irqsave() unnecessarily Kent Overstreet
2018-05-18 16:01 ` Christoph Hellwig
2018-05-18 7:49 ` [PATCH 06/10] Generic radix trees Kent Overstreet
2018-05-18 16:02 ` Christoph Hellwig
2018-05-18 17:38 ` Kent Overstreet
2018-05-18 7:49 ` [PATCH 07/10] bcache: optimize continue_at_nobarrier() Kent Overstreet
2018-05-18 7:49 ` [PATCH 08/10] bcache: move closures to lib/ Kent Overstreet
2018-05-18 16:02 ` Christoph Hellwig
2018-05-18 7:49 ` [PATCH 09/10] closures: closure_wait_event() Kent Overstreet
2018-05-18 7:49 ` [PATCH 10/10] Dynamic fault injection Kent Overstreet
2018-05-18 16:02 ` Christoph Hellwig
2018-05-18 17:37 ` Kent Overstreet
2018-05-18 19:05 ` Andreas Dilger
2018-05-18 19:10 ` Kent Overstreet
2018-05-18 20:54 ` Andreas Dilger
2018-05-18 7:55 ` [PATCH 00/10] RFC: assorted bcachefs patches Kent Overstreet
2018-05-18 17:45 ` Josef Bacik
2018-05-18 17:49 ` Kent Overstreet
2018-05-18 18:03 ` Josef Bacik
2018-05-18 18:28 ` Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180518131305.GA6361@bombadil.infradead.org \
--to=willy@infradead.org \
--cc=akpm@linux-foundation.org \
--cc=clm@fb.com \
--cc=darrick.wong@oracle.com \
--cc=dchinner@redhat.com \
--cc=jbacik@fb.com \
--cc=kent.overstreet@gmail.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).