From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 24F6DC43334
    for ; Wed, 29 Jun 2022 08:38:43 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S231656AbiF2Iim (ORCPT ); Wed, 29 Jun 2022 04:38:42 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53326 "EHLO
    lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S230019AbiF2Iil (ORCPT ); Wed, 29 Jun 2022 04:38:41 -0400
Received: from verein.lst.de (verein.lst.de [213.95.11.211])
    by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 76F161DA41;
    Wed, 29 Jun 2022 01:38:40 -0700 (PDT)
Received: by verein.lst.de (Postfix, from userid 2407)
    id D53C667373; Wed, 29 Jun 2022 10:38:36 +0200 (CEST)
Date: Wed, 29 Jun 2022 10:38:36 +0200
From: Christoph Hellwig
To: Chris Mason
Cc: Christoph Hellwig , Jan Kara , Qu Wenruo , josef@toxicpanda.com,
    dsterba@suse.com, linux-btrfs@vger.kernel.org,
    linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] btrfs: remove btrfs_writepage_cow_fixup
Message-ID: <20220629083836.GA25088@lst.de>
References: <20220624122334.80603-1-hch@lst.de>
    <7c30b6a4-e628-baea-be83-6557750f995a@gmx.com>
    <20220624125118.GA789@lst.de>
    <20220624130750.cu26nnm6hjrru4zd@quack3.lan>
    <20220625091143.GA23118@lst.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: Mutt/1.5.17 (2007-11-01)
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Tue, Jun 28, 2022 at 10:29:00AM -0400, Chris Mason wrote:
> As Sterba points out later in the thread, btrfs cares more because of
> stable page requirements to protect data during COW and to make sure the
> crcs we write to disk are correct.

I don't think this matters here.
What the other file systems do is to simply not ever write a page that
has the dirty bit set, but never had ->page_mkwrite called on it, which
is the case that is getting fixed up here.

I did a little research and this post from Jan describes the problem
best:

https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz/

So the problem is that while get_user_pages takes a write fault and
marks the page dirty, the page could have been cleaned just after that,
and then receive a set_page/folio_dirty later.  The canonical example
would be the direct I/O read completion calling into that.

> I'd love a proper fix for this on the *_user_pages() side where
> page_mkwrite() style notifications are used all the time. It's just a huge
> change, and my answer so far has always been that using btrfs mmap'd memory
> for this kind of thing isn't a great choice either way.

Everyone else has the same problem, but decided that you can't get full
data integrity out of this workload.

I think the sane answers are: simply don't writeback pages that are
held by a get_user_pages with writable pages, or try to dirty the pages
from set_page_dirty.  The set_page_dirty contexts are somewhat iffy, but
would probably be a better place to kick off the btrfs writepage fixup.
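
To make the ordering above concrete, here is a toy userspace model of
the race.  It is not kernel code: struct toy_page, the toy_* helpers
and the fs_notified flag are all hypothetical names made up for this
sketch, and the "skip" policy is only my reading of the "never write a
dirty page that did not see ->page_mkwrite" behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; every name here is hypothetical, not a kernel API. */
struct toy_page {
	bool dirty;       /* models the page dirty bit */
	bool fs_notified; /* models "->page_mkwrite ran since the last clean" */
};

enum toy_wb_result { TOY_WB_CLEAN, TOY_WB_WRITTEN, TOY_WB_SKIPPED };

/* Write fault, e.g. taken by get_user_pages: fs is notified, page dirtied. */
static void toy_write_fault(struct toy_page *p)
{
	assert(p != NULL);
	p->fs_notified = true;
	p->dirty = true;
}

/* Late dirtying without a fault, e.g. direct I/O read completion. */
static void toy_set_page_dirty(struct toy_page *p)
{
	assert(p != NULL);
	p->dirty = true;
}

/* Writeback that refuses pages dirtied without a notification. */
static enum toy_wb_result toy_writeback(struct toy_page *p)
{
	assert(p != NULL);
	if (!p->dirty)
		return TOY_WB_CLEAN;
	if (!p->fs_notified)
		return TOY_WB_SKIPPED; /* the case the btrfs fixup handles */
	/* Cleaning the page also write-protects it again in this model. */
	p->dirty = false;
	p->fs_notified = false;
	return TOY_WB_WRITTEN;
}

/* Replays: write fault, writeback cleans the page, late set_page_dirty. */
static enum toy_wb_result toy_replay_race(void)
{
	struct toy_page page = { false, false };
	enum toy_wb_result first;

	toy_write_fault(&page);
	first = toy_writeback(&page);
	assert(first == TOY_WB_WRITTEN);
	toy_set_page_dirty(&page);
	return toy_writeback(&page);
}
```

In this model the second writeback sees a dirty page that was never
announced to the file system and declines to write it.  The model says
nothing about what should happen to that page afterwards, which is the
open question in this thread.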