Re: [PATCH] mm: fix cpu hangs on truncating last page of a 16t sparse file

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jeff Layton <jlayton@poochiereds.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Hugh Dickins <hughd@google.com>, angelo <angelo70@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Andi Kleen <andi@firstfloor.org>, Eryu Guan <eguan@redhat.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] mm: fix cpu hangs on truncating last page of a 16t sparse file
Date: Sun, 27 Sep 2015 19:56:55 -0400	[thread overview]
Message-ID: <20150927195655.18e20003@tlielax.poochiereds.net> (raw)
In-Reply-To: <20150927232645.GW3902@dastard>

On Mon, 28 Sep 2015 09:26:45 +1000
Dave Chinner <david@fromorbit.com> wrote:

> On Sun, Sep 27, 2015 at 10:59:33AM -0700, Hugh Dickins wrote:
> > On Sun, 27 Sep 2015, angelo wrote:
> > > On 27/09/2015 03:36, Hugh Dickins wrote:
> > > > Let's Cc linux-fsdevel, who will be more knowledgable.
> > > > 
> > > > On Sun, 27 Sep 2015, angelo wrote:
> > > > 
> > > > > Hi all,
> > > > > 
> > > > > running xfstests, generic 308 on whatever 32bit arch is possible
> > > > > to observe cpu to hang near 100% on unlink.
> > 
> > I have since tried to repeat your result, but generic/308 on 32-bit just
> > skipped the test for me.  I didn't investigate why: it's quite possible
> > that I had a leftover 64-bit executable in the path that it tried to use,
> > but didn't show the relevant error message.
> >
> > I did verify your result with a standalone test; and that proves that
> > nobody has actually been using such files in practice before you,
> > since unmounting the xfs filesystem would hang in the same way if
> > they didn't unlink them.
> 
> It used to work - this is a regression. Just because nobody has
> reported it recently simply means nobody has run xfstests on 32 bit
> storage recently. There are 32 bit systems out there that expect
> this to work, and we've broken it.
> 
> The regression was introduced in 3.11 by this commit:
> 
> commit 5a7203947a1d9b6f3a00a39fda08c2466489555f
> Author: Lukas Czerner <lczerner@redhat.com>
> Date:   Mon May 27 23:32:35 2013 -0400
> 
>     mm: teach truncate_inode_pages_range() to handle non page aligned ranges
>     
>     This commit changes truncate_inode_pages_range() so it can handle non
>     page aligned regions of the truncate. Currently we can hit BUG_ON when
>     the end of the range is not page aligned, but we can handle unaligned
>     start of the range.
>     
>     Being able to handle non page aligned regions of the page can help file
>     system punch_hole implementations and save some work, because once we're
>     holding the page we might as well deal with it right away.
>     
>     In previous commits we've changed ->invalidatepage() prototype to accept
>     'length' argument to be able to specify range to invalidate. No we can
>     use that new ability in truncate_inode_pages_range().
>     
>     Signed-off-by: Lukas Czerner <lczerner@redhat.com>
>     Cc: Andrew Morton <akpm@linux-foundation.org>
>     Cc: Hugh Dickins <hughd@google.com>
>     Signed-off-by: Theodore Ts'o <tytso@mit.edu>
> 
> 
> > > > > The test removes a sparse file of length 16tera where only the last
> > > > > 4096 bytes block is mapped.
> > > > > At line 265 of truncate.c there is a
> > > > > if (index >= end)
> > > > >      break;
> > > > > But if index is, as in this case, a 4294967295, it match -1 used as
> > > > > eof. Hence the cpu loops 100% just after.
> > > > > 
> > > > That's odd.  I've not checked your patch, because I think the problem
> > > > would go beyond truncate, and the root cause lie elsewhere.
> > > > 
> > > > My understanding is that the 32-bit
> > > > #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
> > > > makes a page->index of -1 (or any "negative") impossible to reach.
> 
> We've supported > 8TB files on 32 bit XFS file systems since
> since mid 2003:
> 
> http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=d13d78f6b83eefbd90a6cac5c9fbe42560c6511e
> 
> And it's been documented as such for a long time, too:
> 
> http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch02s04.html
> 
> (that was written, IIRC, back in 2007).
> 
> i.e. whatever the definition says about MAX_LFS_FILESIZE being an
> 8TB limit on 32 bit is stale and has been for a very long time.
> 
> > A surprise to me, and I expect to others, that 32-bit xfs is not
> > respecting MAX_LFS_FILESIZE: going its own way with 0xfff ffffffff
> > instead of 0x7ff ffffffff (on a PAGE_CACHE_SIZE 4096 system).
> > 
> > MAX_LFS_FILESIZE has been defined that way ever since v2.5.4:
> > this is probably just an oversight from when xfs was later added
> > into the Linux tree.
> 
> We supported >8 TB file offsets on 32 bit systems on 2.4 kernels
> with XFS, so it sounds like it was wrong even when it was first
> committed. Of course, XFS wasn't merged until 2.5.36, so I guess
> nobody realised... ;)
> 
> > > But if s_maxbytes doesn't have to be greater than MAX_LFS_FILESIZE,
> > > i agree the issue should be fixed in layers above.
> > 
> > There is a "filesystems should never set s_maxbytes larger than
> > MAX_LFS_FILESIZE" comment in fs/super.c, but unfortunately its
> > warning is written with just 64-bit in mind (testing for negative).
> 
> Yup, introduced here:
> 
> commit 42cb56ae2ab67390da34906b27bedc3f2ff1393b
> Author: Jeff Layton <jlayton@redhat.com>
> Date:   Fri Sep 18 13:05:53 2009 -0700
> 
>     vfs: change sb->s_maxbytes to a loff_t
>     
>     sb->s_maxbytes is supposed to indicate the maximum size of a file that can
>     exist on the filesystem.  It's declared as an unsigned long long.
> 
> And yes, that will never fire on a 32bit filesystem, because loff_t
> is a "long long" type....
> 

Hmm...should we change that to something like this instead?

    WARN(((unsigned long long)sb->s_maxbytes > (unsigned long long)MAX_LFS_FILESIZE,
	"%s set sb->s_maxbytes to too large a value (0x%llx)\n", type->name, sb->s_maxbytes);

-- 
Jeff Layton <jlayton@poochiereds.net>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2015-09-27 23:56 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <560723F8.3010909@gmail.com>
2015-09-27  1:36 ` [PATCH] mm: fix cpu hangs on truncating last page of a 16t sparse file Hugh Dickins
2015-09-27  2:14   ` angelo
2015-09-27  2:21   ` angelo
2015-09-27 17:59     ` Hugh Dickins
2015-09-27 23:26       ` Dave Chinner
2015-09-27 23:56         ` Jeff Layton [this message]
2015-09-28  1:06           ` Dave Chinner
2015-09-28 17:03       ` Andi Kleen
2015-09-28 18:08         ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150927195655.18e20003@tlielax.poochiereds.net \
    --to=jlayton@poochiereds.net \
    --cc=akpm@linux-foundation.org \
    --cc=andi@firstfloor.org \
    --cc=angelo70@gmail.com \
    --cc=david@fromorbit.com \
    --cc=eguan@redhat.com \
    --cc=hch@infradead.org \
    --cc=hughd@google.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).