From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f43.google.com ([209.85.220.43]:35613 "EHLO mail-pa0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750864AbcC2GEP (ORCPT ); Tue, 29 Mar 2016 02:04:15 -0400 Date: Mon, 28 Mar 2016 22:04:10 -0800 From: Kent Overstreet To: Dave Chinner Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, Al Viro , Rik van Riel , Andrew Morton , Theodore Ts'o Subject: Re: fallocate INSERT_RANGE/COLLAPSE_RANGE is completely broken [PATCH] Message-ID: <20160329060410.GA2383@kmo-pixel> References: <20160329042546.GA10581@kmo-pixel> <20160329051558.GC30721@dastard> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20160329051558.GC30721@dastard> Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On Tue, Mar 29, 2016 at 04:15:58PM +1100, Dave Chinner wrote: > On Mon, Mar 28, 2016 at 08:25:46PM -0800, Kent Overstreet wrote: > > Bit of previous discussion: > > http://thread.gmane.org/gmane.linux.file-systems/101201/ > > > > The underlying issue is that we have no mechanism for invalidating a range of > > the pagecache and then _keeping it invalidated_ while we Do Stuff. > > > > The fallocate INSERT_RANGE/COLLAPSE_RANGE situation seems likely to be worse > > than I initially thought. I've been digging into this in the course of bcachefs > > testing - I was hitting assertions that meant state hanging off the page cache > > (in this case, allocation information, i.e. whether we needed to reserve space > > on write) was inconsistent with the btree in writepages(). > > > > Well, bcachefs isn't the only filesystem that hangs additional state off the > > pagecache, and the situation today is that an unpriviliged user can cause > > inconsistencies there by just doing buffered reads concurrently with > > INSERT_RANGE/COLLAPSE_RANGE. I highly highly doubt this is an issue of just > > "oops, you corrupted your file because you were doing stupid stuff" - who knows > > what internal invariants are getting broken here, and I don't particularly care > > to find out. > > I'd like to see a test case for this. Concurrent IO and/or page > faults should not run at the same as fallocate on XFS. Hence I'd > like to see the test cases that demonstrate buffered reads are > causing corruption during insert/collapse range operations. We use > the same locking strategy for fallocate as we use for truncate and > all the other internal extent manipulation operations, so if there's > something wrong, we need to fix it. It's entirely possible I'm wrong about XFS - your fault path locking looked correct, and I did see you had extra locking in your buffered read path but I thought it was a different lock. I'll recheck later, but for the moment I'm just going to assume I misspoke (and tbh always found xfs's locking to be quite rigorous). ext4 uses the generic code in all the places you're hooking into though - .fault, .read_iter, etc. The scheme I've got in this patch should perform quite a bit better than what you're doing - only locking in the slow cache miss path, vs. every time you touch the page cache.