Date: Fri, 30 Jun 2023 08:16:57 -0700
From: "Darrick J. Wong"
To: Amir Goldstein
Cc: Ignat Korchagin, Matthew Wilcox, Daniel Dao, Dave Chinner, kernel-team,
	linux-fsdevel@vger.kernel.org, Chandan Babu R, Leah Rumancik,
	linux-xfs, "Luis R. Rodriguez"
Subject: Re: Backporting of series xfs/iomap: fix data corruption due to stale cached iomap
Message-ID: <20230630151657.GJ11441@frogsfrogsfrogs>
References: <20230629181408.GM11467@frogsfrogsfrogs>
List-ID: linux-xfs@vger.kernel.org

On Fri, Jun 30, 2023 at 04:05:36PM +0300, Amir Goldstein wrote:
> On Fri, Jun 30, 2023 at 3:30 PM Ignat Korchagin wrote:
> >
> > On Fri, Jun 30, 2023 at 11:39 AM Amir Goldstein wrote:
> > >
> > > On Thu, Jun 29, 2023 at 10:31 PM Ignat Korchagin wrote:
> > > >
> > > > On Thu, Jun 29, 2023 at 7:14 PM Darrick J. Wong wrote:
> > > > >
> > > > > [add the xfs lts maintainers]
> > > > >
> > > > > On Thu, Jun 29, 2023 at 05:34:00PM +0100, Matthew Wilcox wrote:
> > > > > > On Thu, Jun 29, 2023 at 05:09:41PM +0100, Daniel Dao wrote:
> > > > > > > Hi Dave and Darrick,
> > > > > > >
> > > > > > > We are tracking down some corruptions on xfs for our rocksdb
> > > > > > > workload, running on kernel 6.1.25. The corruptions were
> > > > > > > detected by rocksdb block checksum. The workload seems to
> > > > > > > share some similarities with the multi-threaded write
> > > > > > > workload described in
> > > > > > > https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
> > > > > > >
> > > > > > > Can we backport the patch series to stable, since it seemed
> > > > > > > to fix data corruptions?
> > > > > >
> > > > > > For clarity, are you asking for permission or advice about
> > > > > > doing this yourself, or are you asking somebody else to do the
> > > > > > backport for you?
> > > > >
> > > > > Nobody's officially committed to backporting and testing patches
> > > > > for 6.1; are you (Cloudflare) volunteering?
> > > >
> > > > Yes, we have applied them on top of 6.1.36, will be gradually
> > > > releasing to our servers, and will report back if we see the
> > > > issues go away.
> > > >
> > >
> > > Getting feedback from Cloudflare production servers is awesome, but
> > > it's not enough.
> > >
> > > The standard for getting xfs LTS backports approved is:
> > > 1. Test the backports against regressions with several rounds of
> > >    fstests check -g auto on selected xfs configurations [1]
> > > 2. Post the backport series to the xfs list and get an ACK from the
> > >    upstream xfs maintainers
> > >
> > > We have volunteers doing this work for 5.4.y, 5.10.y, and 5.15.y.
> > > We do not yet have a volunteer to do that work for 6.1.y.
> > >
> > > The question is whether you (or your team) are volunteering to do
> > > that work for 6.1.y xfs backports, to help share the load?
> >
> > We are not a big team, and apart from other internal project work our
> > efforts are focused on fixing this issue in production, because it
> > affects many teams and workloads. If we confirm that these patches
> > fix the issue in production, we will definitely consider dedicating
> > some work to ensuring they are officially backported. But if not, we
> > would need to search for a fix first before we can commit to any
> > work.
> >
> > So, IOW - can we come back to you a bit later on this, after we get
> > the feedback from production?
> >
>
> Of course.
> The volunteering question for 6.1.y is independent.
>
> When you decide that you have a series of backports that proves to fix
> a real bug in production, a way to test the series will be worked out.

/me notes that xfs/558 and xfs/559 (in fstests) are the functional tests
for these patches that you're backporting; it would be useful to have a
third party (i.e. not just the reporter and the author) confirm that the
two fstests pass when real workloads are fixed.

--D

> Thanks,
> Amir.
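[Editor's note, not part of the archived thread: the approval workflow Amir describes (an fstests `check -g auto` regression sweep per configuration, plus the two functional tests Darrick names) might be sketched as below. The device and mount paths in `local.config` are illustrative assumptions, and the `./check` invocations are only printed, since they require an fstests checkout and scratch block devices.]

```shell
# Sketch of the fstests validation flow for an xfs LTS backport series.
# TEST_DEV/SCRATCH_DEV values are assumptions, not taken from the thread.
cat > local.config <<'EOF'
export TEST_DEV=/dev/vdb        # long-lived test filesystem device (assumption)
export TEST_DIR=/mnt/test
export SCRATCH_DEV=/dev/vdc     # device fstests may reformat freely (assumption)
export SCRATCH_MNT=/mnt/scratch
EOF

# From an fstests checkout one would then run (printed here, not executed):
echo "./check -g auto"           # full regression sweep for each tested config
echo "./check xfs/558 xfs/559"   # functional tests for the stale-iomap fixes
```

Repeating the `check -g auto` run across the selected xfs configurations (and several rounds, per Amir's point 1) is what distinguishes an approved backport series from one that has only been exercised in production.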