From: Jan Kara <jack@suse.cz>
To: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org,
Christoph Hellwig <hch@infradead.org>,
Amir Goldstein <amir73il@gmail.com>,
Dave Chinner <david@fromorbit.com>, Ted Tso <tytso@mit.edu>,
linux-mm@kvack.org
Subject: Re: [PATCH 10/12] shmem: Use invalidate_lock to protect fallocate
Date: Thu, 29 Apr 2021 11:20:52 +0200 [thread overview]
Message-ID: <20210429092052.GA11234@quack2.suse.cz> (raw)
In-Reply-To: <alpine.LSU.2.11.2104282004410.10848@eggly.anvils>
On Wed 28-04-21 20:24:59, Hugh Dickins wrote:
> On Fri, 23 Apr 2021, Jan Kara wrote:
>
> > We have to handle pages added by currently running shmem_fallocate()
> > specially in shmem_writepage(). For this we use serialization mechanism
> > using structure attached to inode->i_private field. If we protect
> > allocation of pages in shmem_fallocate() with invalidate_lock instead,
> > we are sure added pages cannot be dirtied until shmem_fallocate() is done
> > (invalidate_lock blocks faults, i_rwsem blocks writes) and thus we
> > cannot see those pages in shmem_writepage() and there's no need for the
> > serialization mechanism.
>
> Appealing diffstat, but NAK, this patch is based on a false premise.
>
> All those pages that are added by shmem_fallocate() are marked dirty:
> see the set_page_dirty(page) and comment above it in shmem_fallocate().
Aha, I missed that set_page_dirty(). Thanks for correcting me.
> It's intentional, that they should percolate through to shmem_writepage()
> when memory is tight, and give feedback to fallocate that it should stop
> (instead of getting into that horrid OOM-kill flurry that never even
> frees any tmpfs anyway).
I understand the reason, I just feel there should be a better way of
communicating memory pressure than ->writepage talking to ->fallocate
through structure attached to inode->i_private. But now I see it isn't as
simple as I thought ;) Anyway for now I'll just remove this patch. Thanks
for review!
Honza
> > CC: Hugh Dickins <hughd@google.com>
> > CC: <linux-mm@kvack.org>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> > mm/shmem.c | 61 ++++++------------------------------------------------
> > 1 file changed, 6 insertions(+), 55 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index f34162ac46de..7a2b0744031e 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -94,18 +94,6 @@ static struct vfsmount *shm_mnt;
> > /* Symlink up to this size is kmalloc'ed instead of using a swappable page */
> > #define SHORT_SYMLINK_LEN 128
> >
> > -/*
> > - * shmem_fallocate communicates with shmem_writepage via inode->i_private (with
> > - * i_rwsem making sure that it has only one user at a time): we would prefer
> > - * not to enlarge the shmem inode just for that.
> > - */
> > -struct shmem_falloc {
> > - pgoff_t start; /* start of range currently being fallocated */
> > - pgoff_t next; /* the next page offset to be fallocated */
> > - pgoff_t nr_falloced; /* how many new pages have been fallocated */
> > - pgoff_t nr_unswapped; /* how often writepage refused to swap out */
> > -};
> > -
> > struct shmem_options {
> > unsigned long long blocks;
> > unsigned long long inodes;
> > @@ -1364,28 +1352,11 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
> > * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC
> > * value into swapfile.c, the only way we can correctly account for a
> > * fallocated page arriving here is now to initialize it and write it.
> > - *
> > - * That's okay for a page already fallocated earlier, but if we have
> > - * not yet completed the fallocation, then (a) we want to keep track
> > - * of this page in case we have to undo it, and (b) it may not be a
> > - * good idea to continue anyway, once we're pushing into swap. So
> > - * reactivate the page, and let shmem_fallocate() quit when too many.
>
> (b) there commenting on communicating back to fallocate by nr_unswapped.
>
> > + * Since a page added by currently running fallocate call cannot be
> > + * dirtied and thus arrive here we know the fallocate has already
> > + * completed and we are fine writing it out.
> > */
> > if (!PageUptodate(page)) {
> > - if (inode->i_private) {
> > - struct shmem_falloc *shmem_falloc;
> > - spin_lock(&inode->i_lock);
> > - shmem_falloc = inode->i_private;
> > - if (shmem_falloc &&
> > - index >= shmem_falloc->start &&
> > - index < shmem_falloc->next)
> > - shmem_falloc->nr_unswapped++;
> > - else
> > - shmem_falloc = NULL;
> > - spin_unlock(&inode->i_lock);
> > - if (shmem_falloc)
> > - goto redirty;
> > - }
> > clear_highpage(page);
> > flush_dcache_page(page);
> > SetPageUptodate(page);
> > @@ -2629,9 +2600,9 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > loff_t len)
> > {
> > struct inode *inode = file_inode(file);
> > + struct address_space *mapping = file->f_mapping;
> > struct shmem_sb_info *sbinfo = SHMEM_SB(inode->i_sb);
> > struct shmem_inode_info *info = SHMEM_I(inode);
> > - struct shmem_falloc shmem_falloc;
> > pgoff_t start, index, end;
> > int error;
> >
> > @@ -2641,7 +2612,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > inode_lock(inode);
> >
> > if (mode & FALLOC_FL_PUNCH_HOLE) {
> > - struct address_space *mapping = file->f_mapping;
> > loff_t unmap_start = round_up(offset, PAGE_SIZE);
> > loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
> >
> > @@ -2680,14 +2650,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > goto out;
> > }
> >
> > - shmem_falloc.start = start;
> > - shmem_falloc.next = start;
> > - shmem_falloc.nr_falloced = 0;
> > - shmem_falloc.nr_unswapped = 0;
> > - spin_lock(&inode->i_lock);
> > - inode->i_private = &shmem_falloc;
> > - spin_unlock(&inode->i_lock);
> > -
> > + down_write(&mapping->invalidate_lock);
> > for (index = start; index < end; index++) {
> > struct page *page;
> >
> > @@ -2697,8 +2660,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > */
> > if (signal_pending(current))
> > error = -EINTR;
> > - else if (shmem_falloc.nr_unswapped > shmem_falloc.nr_falloced)
> > - error = -ENOMEM;
> > else
> > error = shmem_getpage(inode, index, &page, SGP_FALLOC);
> > if (error) {
> > @@ -2711,14 +2672,6 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > goto undone;
> > }
> >
> > - /*
> > - * Inform shmem_writepage() how far we have reached.
> > - * No need for lock or barrier: we have the page lock.
> > - */
> > - shmem_falloc.next++;
> > - if (!PageUptodate(page))
> > - shmem_falloc.nr_falloced++;
> > -
> > /*
> > * If !PageUptodate, leave it that way so that freeable pages
> > * can be recognized if we need to rollback on error later.
>
> Which goes on to say:
>
> * But set_page_dirty so that memory pressure will swap rather
> * than free the pages we are allocating (and SGP_CACHE pages
> * might still be clean: we now need to mark those dirty too).
> */
> set_page_dirty(page);
> unlock_page(page);
> put_page(page);
> cond_resched();
>
> > @@ -2736,9 +2689,7 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
> > i_size_write(inode, offset + len);
> > inode->i_ctime = current_time(inode);
> > undone:
> > - spin_lock(&inode->i_lock);
> > - inode->i_private = NULL;
> > - spin_unlock(&inode->i_lock);
> > + up_write(&mapping->invalidate_lock);
> > out:
> > inode_unlock(inode);
> > return error;
> > --
> > 2.26.2
--
Jan Kara <jack@suse.com>
SUSE Labs, CR
next prev parent reply other threads:[~2021-04-29 9:20 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-23 17:29 [PATCH 0/12 v4] fs: Hole punch vs page cache filling races Jan Kara
2021-04-23 17:29 ` [PATCH 01/12] mm: Fix comments mentioning i_mutex Jan Kara
2021-04-23 17:29 ` [PATCH 02/12] mm: Protect operations adding pages to page cache with invalidate_lock Jan Kara
2021-04-23 18:30 ` Matthew Wilcox
2021-04-23 23:04 ` Dave Chinner
2021-04-26 15:46 ` Jan Kara
2021-04-23 17:29 ` [PATCH 03/12] ext4: Convert to use mapping->invalidate_lock Jan Kara
2021-04-23 17:29 ` [PATCH 04/12] ext2: Convert to using invalidate_lock Jan Kara
2021-04-23 17:29 ` [PATCH 05/12] xfs: Convert to use invalidate_lock Jan Kara
2021-04-23 22:39 ` Dave Chinner
2021-04-23 17:29 ` [PATCH 06/12] zonefs: Convert to using invalidate_lock Jan Kara
2021-04-26 6:40 ` Damien Le Moal
2021-04-26 16:24 ` Jan Kara
2021-04-23 17:29 ` [PATCH 07/12] f2fs: " Jan Kara
2021-04-23 19:15 ` kernel test robot
2021-04-23 20:05 ` kernel test robot
2021-04-23 17:29 ` [PATCH 08/12] fuse: " Jan Kara
2021-04-23 17:29 ` [PATCH 09/12] shmem: " Jan Kara
2021-04-29 4:12 ` Hugh Dickins
2021-04-29 9:30 ` Jan Kara
2021-04-23 17:29 ` [PATCH 10/12] shmem: Use invalidate_lock to protect fallocate Jan Kara
2021-04-23 19:27 ` kernel test robot
2021-04-29 3:24 ` Hugh Dickins
2021-04-29 9:20 ` Jan Kara [this message]
2021-04-23 17:29 ` [PATCH 11/12] ceph: Fix race between hole punch and page fault Jan Kara
2021-04-23 17:29 ` [PATCH 12/12] cifs: " Jan Kara
2021-04-23 22:07 ` [PATCH 0/12 v4] fs: Hole punch vs page cache filling races Dave Chinner
2021-04-23 23:51 ` Matthew Wilcox
2021-04-24 6:11 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210429092052.GA11234@quack2.suse.cz \
--to=jack@suse.cz \
--cc=amir73il@gmail.com \
--cc=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=hughd@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).