Message-ID: <467e49af8348d085e21079e8969bedbe379b3145.camel@scientia.org>
Subject: Re: [PATCH 1/2] btrfs: fix space cache corruption and potential double allocations
From: Christoph Anton Mitterer
To: Omar Sandoval
Cc: Linux BTRFS
Date: Fri, 19 Aug 2022 02:16:28 +0200
References: <9ee45db86433bb8e4d7daff35502db241c69ad16.1660690698.git.osandov@fb.com>
X-Mailing-List: linux-btrfs@vger.kernel.org

Hey Omar.

I have some more questions, which I hope will help others and myself to
better assess the impact of this issue:

On Wed, 2022-08-17 at 17:30 -0700, Omar Sandoval wrote:
> but metadata deletions also count, so basically any modification
> results in a deletion. We haven't seen this in practice, but I couldn't
> find anything that would make it impossible.
>
> In place modifications of files also result in COW and deletion of the
> old data, so that also technically counts.

1) I thought the issue happens primarily with space-cache-v2, and there
only when extents are deleted, i.e. when deleting (or, because of CoW,
modifying) regular files.
I assume also when doing a balance or a scrub (when the scrub causes a
repair)?

But now you wrote "metadata deletions also count"? Wouldn't that mean
that any other tree (csum, etc.) of the fs could be affected?
So can corruptions happen when moving/renaming files, writing any
metadata (either btrfs-internal trees or things like file permission
bits, XATTRs, etc.) or creating snapshots or subvolumes (or moving files
therein)?

It may also happen with v1 (or no-space-cache), but only when the cache
needs to be regenerated (i.e. when one sees the checksum error messages
on the v1 space cache), and it is less likely (because it already
requires the v1 cache to get corrupted somehow)?

2) The silent data corruption itself happens when some range is still
used by some extent... AND is also added back to the free space cache.
So the next time that range is handed out from the free space cache and
written to, the precious data would be lost.

Or are there other ways that issue could strike?
Or is it just a corruption of the free space cache, and the actual data
in the extents is fine?

3) In your commit, you described a number of symptoms that were seen
when the issue occurred.
AFAIU, some of the cases were silent (data corruption), right?
Is the whole issue super rare for v2 (like back then, when the
corruption with compression and holes only happened under some very
awkward and rare situations)?

I just wonder a bit... because if you say it was introduced with 5.12,
and since 5.15 (IIRC) v2 is the default for btrfs-progs... why haven't
more people seen any corruptions?
Or is the silent corruption much less likely than the one giving
EEXIST - or vice versa?

4) The tool you're going to write - what will it be able to do?
[I guess it will do something like check whether any extent ranges are
allocated AND also in the free space cache (v1 or v2)?]

Will it just be able to tell whether one's current free space cache is
corrupted?
Or will it also be able to tell whether any actual data corruption has
already happened (i.e. still-used extents got reallocated and
overwritten)?

5) Are there ways for people to *definitely* rule out that their data
OR any other part of the fs was corrupted?

I personally have SHA512 hashsums attached as XATTRs on all files of
some filesystems that hold just data. So I can easily go over those and
verify.

Would a scrub (csum verification) also detect any data corruption?
But I guess that would only detect it for extents... not if any other
metadata in btrfs could be affected, too?

6) What should people do to get back to a safe and sound state?

I mean, if I verified my hashsum XATTRs now, then even if they were all
valid, my kernel would still be unpatched.

But once the kernel is patched:
Will it not be necessary to somehow check the free space cache then?
Otherwise I could do my verification (and get an "all ok") while the
free space tree is still bogus, and some time later my data gets
corrupted again - despite the fixed kernel.

Wouldn't it be better to simply re-create the cache then (that is:
after the kernel is patched BUT before verifying the data with e.g.
hash sums or scrub), using:
  btrfs check --clear-space-cache v1|v2 ?
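(To make the verification step from 5) and 6) concrete: a minimal sketch
of such a pass, not an actual tool. The attribute name "user.sha512" and
the hex encoding of the stored digest are assumptions made up for this
example, and os.getxattr() is only available on Linux.)

```python
import hashlib
import os
import sys

# Assumption: hypothetical attribute name holding a hex-encoded SHA-512 digest.
XATTR_NAME = "user.sha512"


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file's current contents."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path):
    """Return 'ok', 'MISMATCH' or 'no-xattr' for one regular file."""
    try:
        raw = os.getxattr(path, XATTR_NAME)
    except OSError:
        # Attribute missing, or xattrs not supported on this filesystem.
        return "no-xattr"
    stored = raw.decode("ascii", "replace").strip().lower()
    return "ok" if stored == sha512_of(path) else "MISMATCH"


def check_tree(root):
    """Walk root and yield (path, status) for every regular file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path, check_file(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    bad = 0
    for path, status in check_tree(sys.argv[1]):
        if status != "ok":
            bad += 1
            print(f"{status}: {path}")
    sys.exit(1 if bad else 0)
```

Such a pass only covers file contents. It says nothing about the state
of the free space cache or of other metadata, which is why the order
(patched kernel, then cleared cache, then verification) matters.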
If any other metadata could in principle have been affected by this
issue... wouldn't it - for 100% safety - be recommended to start over by
creating a new filesystem (once one has the fixed kernel) and restoring
from backups?
(Right now this might just be quite some work... in 5 years or so
people might simply no longer have their old backups. So I'd rather
re-create the fs now than be sorry later.)

7) What are people advised to do until they receive a fixed kernel?

You said with nodiscard the race window is smaller. So HDDs should
already have this, but SSDs not (and people may want to set it).
What about btrfs e.g. on top of dm-crypt, which by default blocks
discard - would that count (even if not specifically set for btrfs)?

Should one switch from v2 to v1?

Thanks,
Chris.
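
P.S. To illustrate what I mean in 2) and 4) by a range being "allocated
AND also in the free space cache", here is a toy model of that check.
It is only a sketch under my own assumptions: ranges are plain
(start, length) tuples, each list is free of overlaps within itself,
and the function name is made up. It does not reflect how btrfs stores
either structure on disk, nor what the actual tool will do.

```python
def overlaps(allocated, free):
    """Return every byte range that is both allocated and marked free.

    Both arguments are lists of (start, length) tuples. Each list is
    assumed to contain no overlapping ranges within itself.
    """
    a = sorted(allocated)
    f = sorted(free)
    found = []
    i = j = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(a) and j < len(f):
        a_start, a_end = a[i][0], a[i][0] + a[i][1]
        f_start, f_end = f[j][0], f[j][0] + f[j][1]
        lo, hi = max(a_start, f_start), min(a_end, f_end)
        if lo < hi:
            # This range is in use AND would be handed out again.
            found.append((lo, hi - lo))
        # Advance whichever range ends first.
        if a_end <= f_end:
            i += 1
        else:
            j += 1
    return found


# Hypothetical example: the second extent overlaps the second free range.
extents = [(0, 4096), (8192, 4096)]
free_space = [(4096, 4096), (10240, 4096)]
print(overlaps(extents, free_space))
```

An empty result would mean the two views agree. Any entry is a range
that the allocator could give out again while an extent still uses it.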