Subject: Re: 5.6-5.10 balance regression?
To: David Arendt, Qu Wenruo, Stéphane Lesimple, linux-btrfs@vger.kernel.org
From: Qu Wenruo
Date: Mon, 28 Dec 2020 15:48:47 +0800

On 2020/12/28 3:38 PM, David Arendt wrote:
> Hi,
>
> unfortunately the problem is no longer reproducible, probably due to
> writes happening in the meantime. If you still want a btrfs-image, I can
> create one (unfortunately only without data as there is confidential
> data in it), but as the problem is currently no longer reproducible, I
> think it probably won't help.

That's fine; at least your fs is back to normal.

I tried several small balances locally but could not reproduce it, so I
guess it may be related to a certain tree layout.

Anyway, I'll wait for another small enough and reproducible report.

Thanks,
Qu

>
> Thanks in advance,
> David Arendt
>
> On 12/28/20 1:06 AM, Qu Wenruo wrote:
>>
>>
>> On 2020/12/27 9:11 PM, David Arendt wrote:
>>> Hi,
>>>
>>> last week I had the same problem on a btrfs filesystem after updating to
>>> kernel 5.10.1. I have never had this problem before kernel 5.10.x.
>>> 5.9.x did not show any problem.
>>>
>>> Dec 14 22:30:59 xxx kernel: BTRFS info (device sda2): scrub: started on devid 1
>>> Dec 14 22:31:09 xxx kernel: BTRFS info (device sda2): scrub: finished on devid 1 with status: 0
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: start -dusage=10
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): relocating block group 71694286848 flags data
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): found 1058 extents, stage: move data extents
>>> Dec 14 22:33:16 xxx kernel: BTRFS info (device sda2): balance: ended with status: -2
>>>
>>> This is not a multidevice volume but a volume consisting of a single
>>> partition.
>>>
>>> xxx ~ # btrfs fi df /u00
>>> Data, single: total=10.01GiB, used=?.24GiB
>>> System, single: total=4.00MiB, used=16.00KiB
>>> Metadata, single: total=2.76GiB, used=?.10GiB
>>> GlobalReserve, single: total=47.17MiB, used=?.00B
>>>
>>> xxx ~ # btrfs device usage /u00
>>> /dev/sda2, ID: 1
>>>    Device size:            19.81GiB
>>>    Device slack:              0.00B
>>>    Data,single:            10.01GiB
>>>    Metadata,single:         2.76GiB
>>>    System,single:           4.00MiB
>>>    Unallocated:             7.04GiB
>>
>> This seems small enough, thus a btrfs-image dump would help.
>>
>> There is a limitation with btrfs-image dumps, though: since a dump only
>> contains metadata, trying to balance data to reproduce the bug would
>> easily cause data csum errors and exit the convert.
>>
>> If possible, would you please try to take a dump with this branch?
>> https://github.com/adam900710/btrfs-progs/tree/image_data_dump
>>
>> It provides a new option for btrfs-image, -d, which will also take the
>> data.
>>
>> Also, please keep in mind that a -d dump will contain the data of your
>> fs, so if it contains confidential info, please use regular btrfs-image.
>>
>> Thanks,
>> Qu
>>>
>>>
>>> On 12/27/20 1:11 PM, Stéphane Lesimple wrote:
>>>> Hello,
>>>>
>>>> As part of the maintenance routine of one of my raid1 FS, a few days
>>>> ago I was in the process of replacing a 10T drive with a 16T one.
>>>> So I first added the new 16T drive to the FS (btrfs dev add), then
>>>> started a btrfs dev del.
>>>>
>>>> After a few days of balancing the block groups out of the old 10T
>>>> drive, the balance aborted when around 500 GiB of data was still to be
>>>> moved out of the drive:
>>>>
>>>> Dec 21 14:18:40 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
>>>> Dec 21 14:18:54 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
>>>> Dec 21 14:19:16 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2
>>>>
>>>> Of course this also cancelled the device deletion, so after that the
>>>> device was still part of the FS. I then tried to do a balance manually,
>>>> in an attempt to reproduce the issue:
>>>>
>>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): balance: start -ddevid=[...],limit=[...]
>>>> Dec 21 14:28:16 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
>>>> Dec 21 14:28:29 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
>>>> Dec 21 14:28:46 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2
>>>>
>>>> There was of course still plenty of room on the FS, as I had added a new
>>>> 16T drive (a btrfs fi usage is further down this email), so it struck me
>>>> as odd.
>>>> So, I tried to lower the redundancy temporarily, expecting the balance
>>>> of this block group to complete immediately given that there was already
>>>> a copy of this data present on another drive:
>>>>
>>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): balance: start -dconvert=single,soft,devid=[...],limit=[...]
>>>> Dec 21 14:38:50 nas kernel: BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
>>>> Dec 21 14:39:00 nas kernel: BTRFS info (device dm-10): found 6264 extents, stage: move data extents
>>>> Dec 21 14:39:17 nas kernel: BTRFS info (device dm-10): balance: ended with status: -2
>>>>
>>>> That didn't work.
>>>> I also tried to mount the FS in degraded mode, with the drive I wanted
>>>> to remove missing, using btrfs dev del missing, but the balance still
>>>> failed with the same error on the same block group.
>>>>
>>>> So, as I had been running 5.10.1 for just a few days, I tried an older
>>>> kernel, 5.6.17, and retried the balance once again (still with the
>>>> drive deliberately missing):
>>>>
>>>> [ 413.188812] BTRFS info (device dm-10): allowing degraded mounts
>>>> [ 413.188814] BTRFS info (device dm-10): using free space tree
>>>> [ 413.188815] BTRFS info (device dm-10): has skinny extents
>>>> [ 413.189674] BTRFS warning (device dm-10): devid 5 uuid 068c6db3-3c30-4c97-b96b-5fe2d6c5d677 is missing
>>>> [ 424.159486] BTRFS info (device dm-10): balance: start -dconvert=single,soft,devid=[...],limit=[...]
>>>> [ 424.772640] BTRFS info (device dm-10): relocating block group 11115169841152 flags data|raid1
>>>> [ 434.749100] BTRFS info (device dm-10): found 6264 extents, stage: move data extents
>>>> [ 477.703111] BTRFS info (device dm-10): found 6264 extents, stage: update data pointers
>>>> [ 497.941482] BTRFS info (device dm-10): balance: ended with status: 0
>>>>
>>>> The problematic block group was balanced successfully this time.
>>>>
>>>> I balanced a few more successfully (without the -dconvert=single option),
>>>> then decided to reboot under 5.10 just to see if I would hit this
>>>> issue again.
>>>> I didn't: the btrfs dev del worked correctly after the last 500G or so
>>>> of data was moved out of the drive.
>>>>
>>>> This is the output of btrfs fi usage after I successfully balanced the
>>>> problematic block group under the 5.6.17 kernel. Notice the multiple
>>>> data profiles, which is expected as I used the -dconvert balance option,
>>>> and also the fact that apparently 3 chunks were allocated on new16T for
>>>> this, even if only 1 seems to be used. We can tell because this is the
>>>> first and only time the balance succeeded with the -dconvert option,
>>>> hence these chunks are all under "data,single":
>>>>
>>>> Overall:
>>>> Device size:         41.89TiB
>>>> Device allocated:    21.74TiB
>>>> Device unallocated:  20.14TiB
>>>> Device missing:       9.09TiB
>>>> Used:                21.71TiB
>>>> Free (estimated):    10.08TiB (min: 10.07TiB)
>>>> Data ratio:              2.00
>>>> Metadata ratio:          2.00
>>>> Global reserve:     512.00MiB (used: 0.00B)
>>>> Multiple profiles:        yes (data)
>>>>
>>>> Data,single: Size:3.00GiB, Used:1.00GiB (33.34%)
>>>> /dev/mapper/luks-new16T       3.00GiB
>>>>
>>>> Data,RAID1: Size:10.83TiB, Used:10.83TiB (99.99%)
>>>> /dev/mapper/luks-10Ta         7.14TiB
>>>> /dev/mapper/luks-10Tb         7.10TiB
>>>> missing                     482.00GiB
>>>> /dev/mapper/luks-new16T       6.95TiB
>>>>
>>>> Metadata,RAID1: Size:36.00GiB, Used:23.87GiB (66.31%)
>>>> /dev/mapper/luks-10Tb        36.00GiB
>>>> /dev/mapper/luks-ssd-mdata   36.00GiB
>>>>
>>>> System,RAID1: Size:32.00MiB, Used:1.77MiB (5.52%)
>>>> /dev/mapper/luks-10Ta        32.00MiB
>>>> /dev/mapper/luks-10Tb        32.00MiB
>>>>
>>>> Unallocated:
>>>> /dev/mapper/luks-10Ta         1.95TiB
>>>> /dev/mapper/luks-10Tb         1.96TiB
>>>> missing                       8.62TiB
>>>> /dev/mapper/luks-ssd-mdata   11.29GiB
>>>> /dev/mapper/luks-new16T       7.60TiB
>>>>
>>>> I wasn't going to send an email to this ML because I knew I had nothing
>>>> to reproduce the issue now that it was "fixed", but now I think I'm
>>>> bumping into the same issue on another FS, while rebalancing data after
>>>> adding a drive, which happens to be the old 10T drive of the FS above.
>>>>
>>>> The btrfs fi usage of this second FS is as follows:
>>>>
>>>> Overall:
>>>> Device size:         25.50TiB
>>>> Device allocated:    22.95TiB
>>>> Device unallocated:   2.55TiB
>>>> Device missing:         0.00B
>>>> Used:                22.36TiB
>>>> Free (estimated):     3.14TiB (min: 1.87TiB)
>>>> Data ratio:              1.00
>>>> Metadata ratio:          2.00
>>>> Global reserve:     512.00MiB (used: 0.00B)
>>>> Multiple profiles:         no
>>>>
>>>> Data,single: Size:22.89TiB, Used:22.29TiB (97.40%)
>>>> /dev/mapper/luks-12T         10.91TiB
>>>> /dev/mapper/luks-3Ta          2.73TiB
>>>> /dev/mapper/luks-3Tb          2.73TiB
>>>> /dev/mapper/luks-10T          6.52TiB
>>>>
>>>> Metadata,RAID1: Size:32.00GiB, Used:30.83GiB (96.34%)
>>>> /dev/mapper/luks-ssd-mdata2  32.00GiB
>>>> /dev/mapper/luks-10T         32.00GiB
>>>>
>>>> System,RAID1: Size:32.00MiB, Used:2.44MiB (7.62%)
>>>> /dev/mapper/luks-3Tb         32.00MiB
>>>> /dev/mapper/luks-10T         32.00MiB
>>>>
>>>> Unallocated:
>>>> /dev/mapper/luks-12T         45.00MiB
>>>> /dev/mapper/luks-ssd-mdata2   4.00GiB
>>>> /dev/mapper/luks-3Ta          1.02MiB
>>>> /dev/mapper/luks-3Tb          2.97GiB
>>>> /dev/mapper/luks-10T          2.54TiB
>>>>
>>>> I can reproduce the problem reliably:
>>>>
>>>> # btrfs bal start -dvrange=34625344765952..34625344765953 /tank
>>>> ERROR: error during balancing '/tank': No such file or directory
>>>> There may be more info in syslog - try dmesg | tail
>>>>
>>>> [145979.563045] BTRFS info (device dm-10): balance: start -dvrange=34625344765952..34625344765953
>>>> [145979.585572] BTRFS info (device dm-10): relocating block group 34625344765952 flags data|raid1
>>>> [145990.396585] BTRFS info (device dm-10): found 167 extents, stage: move data extents
>>>> [146002.236115] BTRFS info (device dm-10): balance: ended with status: -2
>>>>
>>>> If anybody is interested in looking into this, this time I can leave
>>>> the FS in this state. The issue is reproducible, and I can live without
>>>> completing the balance for the next weeks or even months, as I don't
>>>> think I'll need the currently unallocatable space soon.
>>>>
>>>> I also made a btrfs-image of the FS, using btrfs-image -c 9 -t 4 -s -w.
>>>> If it's of any use, I can drop it somewhere (51G).
>>>>
>>>> I could try to bisect manually to find which version between 5.6.x and
>>>> 5.10.1 started to behave like this, but on the first success, I won't
>>>> know how to reproduce the issue a second time, as I'm not 100% sure it
>>>> can be done solely with the btrfs-image.
>>>>
>>>> Note that another user seems to have encountered a similar issue in July
>>>> with 5.8:
>>>> https://www.spinics.net/lists/linux-btrfs/msg103188.html
>>>>
>>>> Regards,
>>>>
>>>> Stéphane Lesimple.
>>>
>>>
>
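
A side note on the error code seen throughout this thread: the "status: -2"
in the kernel log and the "No such file or directory" printed by btrfs
appear to be the same error, ENOENT (errno 2), reported as a negative value
on the kernel side. Below is a minimal Python sketch for picking that out of
saved dmesg or syslog output. It assumes log lines shaped like the ones
quoted above; the function name balance_status is hypothetical and not part
of any btrfs tool.

```python
import errno
import os
import re

# Matches the kernel message quoted in this thread, e.g.
# "BTRFS info (device dm-10): balance: ended with status: -2"
STATUS_RE = re.compile(r"balance: ended\s+with\s+status:\s*(-?\d+)")


def balance_status(line):
    """Return (code, errno_name, message) for a balance-ended line, else None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    if code == 0:
        return (0, None, "success")
    # The kernel reports errors as negative errno values, so the name and
    # message are looked up by magnitude.
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return (code, name, os.strerror(number))


if __name__ == "__main__":
    sample = "BTRFS info (device dm-10): balance: ended with status: -2"
    print(balance_status(sample))
```

Feeding it each line of a saved log and keeping the non-None results gives a
quick list of failed and successful balance runs, which may help when
comparing kernels as done above with 5.6.17 and 5.10.1.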