From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-qk0-f197.google.com (mail-qk0-f197.google.com [209.85.220.197]) by kanga.kvack.org (Postfix) with ESMTP id 1A5B96B03A1 for ; Sun, 3 Sep 2017 22:20:39 -0400 (EDT) Received: by mail-qk0-f197.google.com with SMTP id p12so10834548qkl.0 for ; Sun, 03 Sep 2017 19:20:39 -0700 (PDT) Received: from aserp1040.oracle.com (aserp1040.oracle.com. [141.146.126.69]) by mx.google.com with ESMTPS id 49si5835986qtv.219.2017.09.03.19.20.37 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sun, 03 Sep 2017 19:20:38 -0700 (PDT) Date: Sun, 3 Sep 2017 19:20:02 -0700 From: "Darrick J. Wong" Subject: Re: kernel BUG at fs/xfs/xfs_aops.c:853! in kernel 4.13 rc6 Message-ID: <20170904022002.GD4671@magnolia> References: <20170903074306.GA8351@infradead.org> <20170904014353.GG10621@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20170904014353.GG10621@dastard> Sender: owner-linux-mm@kvack.org List-ID: To: Dave Chinner Cc: Christoph Hellwig , =?utf-8?B?0JzQuNGF0LDQuNC7INCT0LDQstGA0LjQu9C+0LI=?= , linux-xfs@vger.kernel.org, linux-mm@kvack.org, Jan Kara [add jan kara to cc] On Mon, Sep 04, 2017 at 11:43:53AM +1000, Dave Chinner wrote: > On Sun, Sep 03, 2017 at 12:43:06AM -0700, Christoph Hellwig wrote: > > On Sun, Sep 03, 2017 at 09:22:17AM +0500, D?D,N?D?D,D>> D?D?D2N?D,D>>D 3/4 D2 wrote: > > > [281502.961248] ------------[ cut here ]------------ > > > [281502.961257] kernel BUG at fs/xfs/xfs_aops.c:853! > > > > This is: > > > > bh = head = page_buffers(page); > > > > Which looks odd and like some sort of VM/writeback change might > > have triggered that we get a page without buffers, despite always > > creating buffers in iomap_begin/end and page_mkwrite. > > Pretty sure this can still happen when buffer_heads_over_limit comes > true. In that case, shrink_active_list() will attempt to strip > the bufferheads off the page even if it's a dirty page. i.e. this > code: > > if (unlikely(buffer_heads_over_limit)) { > if (page_has_private(page) && trylock_page(page)) { > if (page_has_private(page)) > try_to_release_page(page, 0); > unlock_page(page); > } > } > > > There was some discussion about this a while back, the consensus was > that it is a mm bug, but nobody wanted to add a PageDirty check > to try_to_release_page() and so nothing ended up being done about > it in the mm/ subsystem. Instead, filesystems needed to avoid it > if it was a problem for them. Indeed, we fixed it in the filesystem > in 4.8: > > 99579ccec4e2 xfs: skip dirty pages in ->releasepage() > > diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c > index 3ba0809e0be8..6135787500fc 100644 > --- a/fs/xfs/xfs_aops.c > +++ b/fs/xfs/xfs_aops.c > @@ -1040,6 +1040,20 @@ xfs_vm_releasepage( > > trace_xfs_releasepage(page->mapping->host, page, 0, 0); > > + /* > + * mm accommodates an old ext3 case where clean pages might not have had > + * the dirty bit cleared. Thus, it can send actual dirty pages to > + * ->releasepage() via shrink_active_list(). Conversely, > + * block_invalidatepage() can send pages that are still marked dirty > + * but otherwise have invalidated buffers. > + * > + * We've historically freed buffers on the latter. Instead, quietly > + * filter out all dirty pages to avoid spurious buffer state warnings. > + * This can likely be removed once shrink_active_list() is fixed. > + */ > + if (PageDirty(page)) > + return 0; > + > xfs_count_page_state(page, &delalloc, &unwritten); > > But looking at the current code, the comment is still mostly there > but the PageDirty() check isn't. > > > > In 4.10, this was done: > > commit 0a417b8dc1f10b03e8f558b8a831f07ec4c23795 > Author: Jan Kara > Date: Wed Jan 11 10:20:04 2017 -0800 > > xfs: Timely free truncated dirty pages > > Commit 99579ccec4e2 "xfs: skip dirty pages in ->releasepage()" started > to skip dirty pages in xfs_vm_releasepage() which also has the effect > that if a dirty page is truncated, it does not get freed by > block_invalidatepage() and is lingering in LRU list waiting for reclaim. > So a simple loop like: > > while true; do > dd if=/dev/zero of=file bs=1M count=100 > rm file > done > > will keep using more and more memory until we hit low watermarks and > start pagecache reclaim which will eventually reclaim also the truncate > pages. Keeping these truncated (and thus never usable) pages in memory > is just a waste of memory, is unnecessarily stressing page cache > reclaim, and reportedly also leads to anonymous mmap(2) returning ENOMEM > prematurely. > > So instead of just skipping dirty pages in xfs_vm_releasepage(), return > to old behavior of skipping them only if they have delalloc or unwritten > buffers and fix the spurious warnings by warning only if the page is > clean. > > CC: stable@vger.kernel.org > CC: Brian Foster > CC: Vlastimil Babka > Reported-by: Petr Ti? 1/2 ma > Fixes: 99579ccec4e271c3d4d4e7c946058766812afdab > Signed-off-by: Jan Kara > Reviewed-by: Brian Foster > Signed-off-by: Darrick J. Wong > > > So, yeah, we reverted the fix for a crash rather than trying to fix > the adverse behaviour caused by invalidation of a dirty page. > > e.g. why didn't we simply clear the PageDirty flag in > xfs_vm_invalidatepage()? The page is being invalidated - it's > contents will never get written back - so having delalloc or > unwritten extents over that page at the time it is invalidated is a > bug and the original fix would have triggered warnings about > this.... Seems like a reasonable revert/change, but given that ext3 was killed off long ago, is it even still the case that the mm can feed releasepage a dirty clean page? If that is the case, then isn't it time to fix the mm too? --D > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org