From: "Jason Flood" <3mu5555@gmail.com>
To: "'Phil Turmel'"
Subject: RE: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare
Date: Sat, 26 Jun 2021 21:09:29 +1000
Message-ID: <00f101d76a7b$bdb3fc50$391bf4f0$@gmail.com>
In-Reply-To: <6d412bf3-a7b9-1f08-2da9-96d34d8f112b@turmel.org>
X-Mailing-List: linux-raid@vger.kernel.org

Thanks for that, Phil - I
think I'm starting to piece it all together now. I was going from a
4-disk RAID5 to 4-disk RAID6, so from my reading the backup file was
recommended. The non-standard layout meant that the array had over 20TB
usable, but standardising the layout reduced that to 16TB. In that case
the reshape starts at the end so the critical section (and so the backup
file) may have been in progress at the 99% complete point when it
failed, hence the need to specify the backup file for the assemble
command.

I ran "sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=/root/raid5backup":

mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 3.
mdadm: Marking array /dev/md0 as 'clean'
mdadm: /dev/md0 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /root/raid5backup
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sde to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sdb to /dev/md0 as 0
mdadm: Need to backup 3072K of critical section..
mdadm: /dev/md0 has been started with 4 drives (out of 5).
=============================================================

sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Jul 13 01:11:22 2017
        Raid Level : raid6
        Array Size : 15627793408 (14903.83 GiB 16002.86 GB)
     Used Dev Size : 7813896704 (7451.91 GiB 8001.43 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jun 26 19:40:16 2021
             State : clean, reshaping
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric-6
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 99% complete
     Delta Devices : -1, (5->4)
        New Layout : left-symmetric

              Name : Universe:0
              UUID : 3eee8746:8a3bf425:afb9b538:daa61b29
            Events : 184255

    Number   Major   Minor   RaidDevice State
       6       8       16        0      active sync   /dev/sdb
       7       8       32        1      active sync   /dev/sdc
       5       8       48        2      active sync   /dev/sdd
       4       8       64        3      active sync   /dev/sde

=============================================================

cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb[6] sde[4] sdd[5] sdc[7]
      15627793408 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUUU]
      [===================>.]  reshape = 99.7% (7794393600/7813896704) finish=52211434.6min speed=0K/sec
      bitmap: 14/30 pages [56KB], 131072KB chunk

=============================================================

The drive mounts and the files are all intact, but still sitting on 99%
complete with 52 million minutes to finish and counting up.
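
As a rough sanity check on the figures in the --detail and /proc/mdstat output above, the arithmetic can be sketched in Python. This is only an illustration: the constant and function names are made up for the example, and it assumes both outputs count in 1 KiB blocks and that a standard-layout RAID6 offers (devices - 2) devices' worth of usable space.

```python
# Figures copied from the mdadm --detail and /proc/mdstat output above.
# Assumption: all sizes are in 1 KiB blocks.
USED_DEV_SIZE_KIB = 7813896704   # "Used Dev Size"
ARRAY_SIZE_KIB = 15627793408     # "Array Size"
RAID_DEVICES = 4                 # "Raid Devices"
RESHAPE_POS_KIB = 7794393600     # first number in "reshape = 99.7% (a/b)"
FINISH_MIN = 52211434.6          # "finish=...min"


def raid6_capacity_kib(devices, per_device_kib):
    """Usable space of a standard RAID6: two devices' worth goes to parity."""
    return (devices - 2) * per_device_kib


# Hypothetical helper values, named only for this example.
expected_size = raid6_capacity_kib(RAID_DEVICES, USED_DEV_SIZE_KIB)
progress_pct = 100.0 * RESHAPE_POS_KIB / USED_DEV_SIZE_KIB
remaining_kib = USED_DEV_SIZE_KIB - RESHAPE_POS_KIB
finish_years = FINISH_MIN / (60 * 24 * 365)

print("expected array size (KiB):", expected_size)
print("matches reported size:", expected_size == ARRAY_SIZE_KIB)
print("reshape progress (%):", progress_pct)
print("left to reshape per device (GiB):", remaining_kib / 1024 / 1024)
print("reported finish estimate (years):", finish_years)
```

Under those assumptions the reported array size is consistent with a 4-device RAID6, and the finish estimate is on the order of a century, which fits a reshape that has stalled at speed=0K/sec rather than one that is merely slow.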
The "No backup metadata" made me suspicious that it is stuck because it
can't write to /root/raid5backup (and looking at it now I should have
put it somewhere more sensible as I'm using sudo, but I used it in the
RAID5 to RAID6 process and it was happy). It does seem to have modified
the file, though:

stat raid5backup
  File: raid5backup
  Size: 3149824     Blocks: 6152       IO Block: 4096   regular file
Device: 802h/2050d  Inode: 1572897     Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-06-26 19:39:16.739983712 +1000
Modify: 2021-06-26 19:40:16.778498938 +1000
Change: 2021-06-26 19:40:16.778498938 +1000
 Birth: -

=============================================================

But I believe those times are from when I first ran the assemble command
- it's 20:30 now. I couldn't find a flag to conditionally treat the
backup file as garbage - just the --invalid-backup "I know it's garbage"
option. Given that the assemble isn't complaining about needing to
restore the critical section, is my next step something like:

sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=raidbackup --invalid-backup

Thanks again, Phil. I haven't been using Linux seriously for very long,
so this has been a steep learning curve for me.

Jason

========================================================================

-----Original Message-----
From: Phil Turmel
Sent: Saturday, 26 June 2021 00:00
To: Jason Flood <3mu5555@gmail.com>; linux-raid@vger.kernel.org
Subject: Re: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare

Good morning Jason,

Good report. Comments inline.

On 6/25/21 8:08 AM, Jason Flood wrote:
> I started with a 4x4TB disk RAID5 array and, over a few years changed
> all the drives to 8TB (WD Red - I hadn't seen the warnings before now,
> but it looks like these ones are OK). I then successfully migrated it
> to RAID6, but it then had a non-standard layout, so I ran:
> sudo mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/raid5backup --layout=normalize

Ugh. You don't have to use a backup file unless mdadm tells you to.
Now you are stuck with it.

> After a few days it reached 99% complete, but then the "hours remaining"
> counter started counting up. After a few days I had to power the
> system down before I could get a backup of the non-critical data
> (Couldn't get hold of enough storage quickly enough, but it wouldn't
> be catastrophic to lose it), and now the four drives are in standby,
> with the array thinking it is RAID0.
> Running:
> sudo mdadm --assemble /dev/md0 /dev/sd[bcde]
> responds with:
> mdadm: /dev/md0 assembled from 4 drives - not enough to start the
> array while not clean - consider --force.

You have to specify the backup file on assembly if a reshape using one
was interrupted.

> It appears to be similar to
> https://marc.info/?t=155492912100004&r=1&w=2,
> but before trying --force I was considering using overlay files as I'm
> not sure of the risk of damage. The set-up process that is documented
> in the "Recovering a damaged RAID" Wiki article is excellent, however
> the latter part of the process isn't clear to me. If successful, are the
> overlay files written to the disk like a virtual machine snapshot, or
> is the process stopped, the overlays removed and the process repeated,
> knowing that it now has a low risk of damage?

Using --force is very low risk on assembly. I would try it (without
overlays, and with backup file specified) before you do anything else.
Odds of success are high.

Also try the flags to treat the backup file as garbage if its contents
don't match what mdadm expects.

Report back here after the above.

> System details follow. Thanks for any help.

[details trimmed]

Your report of the details was excellent. Thanks for helping us help
you.

Phil