From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D7E30C4332F for ; Tue, 20 Dec 2022 04:02:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232859AbiLTECT (ORCPT ); Mon, 19 Dec 2022 23:02:19 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233208AbiLTEBU (ORCPT ); Mon, 19 Dec 2022 23:01:20 -0500 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9403A2DED for ; Mon, 19 Dec 2022 20:01:16 -0800 (PST) Received: by mail-pj1-x102b.google.com with SMTP id w4-20020a17090ac98400b002186f5d7a4cso15271425pjt.0 for ; Mon, 19 Dec 2022 20:01:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20210112.gappssmtp.com; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=3i7/epIg3s+Kg1aMbgN7iVzAn5rV9ORpLJbbqj4cTkw=; b=yDnv1q8deLFxHxBQNOIR/iufCtokHzMiD1MzZapj25LREJBioDi20wFFxr6jfLaKWo 7uVB/HDXY0kbxexXHnLj1I1cvVe8f8LjBQnhubnsOLFa90yep2eNTLKrYYrp2ag9WjyN fNVLpBcX+Do3+G6w1lldnJb8pHbiJZJBWwaaMcJ+mB9suLe7QAQs9/TrJ1NAcnYp4H65 R1OM6Ou+FyvXBu/gjOXebVI4gR+yAEc3G+mSfPTQzVfoQJYMkcK8uB5z1jLlEKxRjv+X MRoUuthRnZ9Ge1ZA6zkVnEEfJ302ExHTm2Jy/39Kfh/Oq72YCu9PD2oizMD/QOM/5ON9 8UAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=3i7/epIg3s+Kg1aMbgN7iVzAn5rV9ORpLJbbqj4cTkw=; b=YNexYE9XDvLwhTQjKQ6KF1h7mfo8Gn2kHf0LJorOw8z2tqyD1EB/yHTqtZf1c5VX6z YvIti2vilBeuuShygJMnIY3LnvN4TExjdpobCsgU31EVpS2Afk3MrodGdTEPX4dVD2k4 WSmwKiGroHgzXHKQErKbOpAmdkt1kRsAAFsZxDY0XPDKMJX32uwo7qAgNvx1BNiSJtXs E5RNB1EBxSKIMlKvQSHigFIsU5DKobDel7mPoLJMh51ZWlxBshezS1GDrGX3HqMZInGg aYjMiM8UhqkjAxps7h1J+AfjH5eaK/ouiSUPLLoyeE6bHKXuoknqlSTtlDjUhM3NuFf/ DBHA== X-Gm-Message-State: ANoB5pmype6bDjLXKDjkhJ+mTWaUvuNV4YuNCkY5h2hmMslf2/ZclVOO Hxu4/16DMLzomsmZrp36enJRPF4nSRk3h1ra X-Google-Smtp-Source: AA0mqf63Bp0La3N1p3vEbRqVZRTRw6skueolslB9+Rk3feMRmvq/+KAqiFRLkqu92YSXrsqnI5uzZw== X-Received: by 2002:a17:902:e193:b0:189:8002:1996 with SMTP id y19-20020a170902e19300b0018980021996mr38673719pla.35.1671508876105; Mon, 19 Dec 2022 20:01:16 -0800 (PST) Received: from dread.disaster.area (pa49-181-138-158.pa.nsw.optusnet.com.au. [49.181.138.158]) by smtp.gmail.com with ESMTPSA id d11-20020a170902cecb00b0018980f14ecfsm8070483plg.115.2022.12.19.20.01.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 19 Dec 2022 20:01:15 -0800 (PST) Received: from dave by dread.disaster.area with local (Exim 4.92.3) (envelope-from ) id 1p7ToW-00AZBS-Ui; Tue, 20 Dec 2022 15:01:12 +1100 Date: Tue, 20 Dec 2022 15:01:12 +1100 From: Dave Chinner To: Sasha Levin Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, Dave Chinner , Christoph Hellwig , "Darrick J . Wong" , hch@infradead.org, linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org Subject: Re: [PATCH AUTOSEL 6.1 13/16] iomap: write iomap validity checks Message-ID: <20221220040112.GG1971568@dread.disaster.area> References: <20221220012053.1222101-1-sashal@kernel.org> <20221220012053.1222101-13-sashal@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20221220012053.1222101-13-sashal@kernel.org> Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Dec 19, 2022 at 08:20:50PM -0500, Sasha Levin wrote: > From: Dave Chinner > > [ Upstream commit d7b64041164ca177170191d2ad775da074ab2926 ] > > A recent multithreaded write data corruption has been uncovered in > the iomap write code. The core of the problem is partial folio > writes can be flushed to disk while a new racing write can map it > and fill the rest of the page: > > writeback new write > > allocate blocks > blocks are unwritten > submit IO > ..... > map blocks > iomap indicates UNWRITTEN range > loop { > lock folio > copyin data > ..... > IO completes > runs unwritten extent conv > blocks are marked written > > get next folio > } > > Now add memory pressure such that memory reclaim evicts the > partially written folio that has already been written to disk. > > When the new write finally gets to the last partial page of the new > write, it does not find it in cache, so it instantiates a new page, > sees the iomap is unwritten, and zeros the part of the page that > it does not have data from. This overwrites the data on disk that > was originally written. > > The full description of the corruption mechanism can be found here: > > https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/ > > To solve this problem, we need to check whether the iomap is still > valid after we lock each folio during the write. We have to do it > after we lock the page so that we don't end up with state changes > occurring while we wait for the folio to be locked. > > Hence we need a mechanism to be able to check that the cached iomap > is still valid (similar to what we already do in buffered > writeback), and we need a way for ->begin_write to back out and > tell the high level iomap iterator that we need to remap the > remaining write range. > > The iomap needs to grow some storage for the validity cookie that > the filesystem provides to travel with the iomap. XFS, in > particular, also needs to know some more information about what the > iomap maps (attribute extents rather than file data extents) to for > the validity cookie to cover all the types of iomaps we might need > to validate. > > Signed-off-by: Dave Chinner > Reviewed-by: Christoph Hellwig > Reviewed-by: Darrick J. Wong > Signed-off-by: Sasha Levin This commit is not a standalone backport candidate. It is a pure infrastructure change that does nothing by itself except to add more code that won't get executed. There are another 7-8 patches that need to be backported along with this patch to fix the data corruption that is mentioned in this commit. I'd stronly suggest that you leave this whole series of commits to the XFS LTS maintainers to backport if they so choose to - randomly backporting commits from the middle of the series only makes their job more complex.... -Dave. -- Dave Chinner david@fromorbit.com