Date: Mon, 7 Feb 2022 22:28:57 +0000
From: Wol
To: Red Wil, linux-raid
Subject: Re: Replacing all disks in an array as a preventative measure before failing
In-Reply-To: <20220207152648.42dd311a@falcon.sitarc.ca>

On 07/02/2022 20:26, Red Wil wrote:
> Hello,
>
> It started as the subject said:
> - goal was to replace all 10 disks in an R6
> - context and perceived constraints
>   - soft raid (no imsm and/or ddf containers)
>   - multiple partitions per disk; partitions across the 10 disks formed the R6
>   - downtime not an issue
>   - minimize the number of commands
>   - minimize disk stress
>   - reduce the time spent with this process
>   - difficult to add 10 spares at once in the rig
>   - after a reshape/grow from 6 to 10 disks, the data offset of the raid
>     members was all over the place, from circa 10k sectors to 200k sectors
>
> Approaches/solutions and critique
> 1- add a 'spare' one by one and 'replace' each raid member
>    critique:
>    - seems to me a long and tedious process
>    - cannot/will not run in parallel

There's no problem running in parallel as far as mdraid is concerned. If you
can get the spare drives into the chassis (or on eSATA), you can --replace
several drives at once. And it pretty much just does a dd, just on the live
system, keeping you raid-safe.

> 2- add all the spares at once and perform 'replace' on members
>    critique
>    - just tedious - lots of cli commands which can be prone to mistakes

Pretty much the same as (1). Given that your sdX names are moving all over
the place, I would work with uuids; even though it's more typing, it's safer.

> next ones assume I have all the 'spares' in the rig
> 3- create new arrays on spares, fresh fs and copy data

Well, you could fail/replace all the old drives, but yes, just building a new
array from scratch (if you can afford the downtime) is probably better.

> 4- dd/ddrescue copy each drive to a new one. Advantage: can be done one by
>    one or in parallel. Fewer commands in the terminal.

Fewer commands? Dunno about that.
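If you do go the dd route, the copy itself is nothing fancier than something
like this - the array and device names below are only placeholders, so check
yours in /dev/disk/by-id/ before typing anything:

  # stop the array so nothing writes to the member while it's being copied
  mdadm --stop /dev/md0

  # byte-for-byte copy of the old member onto the new drive
  dd if=/dev/sdOLD of=/dev/sdNEW bs=64M conv=noerror,sync status=progress

  # or ddrescue, which logs to a map file and can retry/resume on read errors
  ddrescue /dev/sdOLD /dev/sdNEW /root/sdOLD.map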
It is much safer in many ways, though: remove the drive you're replacing,
copy it, put the new one back. Less chance for a physical error.

> In the end I decided I will use route (3):
> - flexibility on creation
> - copy only what I need
> - old array is a sort of backup
>
> Question:
> Just for my curiosity regarding (4), assuming the array is offline:
> Besides it being not recommended in the case of imsm/ddf containers, which
> (as far as I understood) keep some data on the hardware itself, in the case
> of pure soft raid is there anything technical or safety-related that
> prevents a 'dd' copy of a physical hard drive from acting exactly as the
> original?

Nope. You've copied the partition byte for byte; the raid won't know any
different.

One question, though. Why are you replacing the drives? Just a precaution?
How big are the drives?

What I'd do, if you're not replacing dying drives, is buy five or possibly
six drives of twice the capacity. Do a --replace onto those five drives. Now
take two of the drives you've removed, raid-0 them, and do a major re-org:
add your raid-0 as device 6, reduce your raid to a 6-device array, and remove
the last four old drives from the array.

Assuming you've only got 10 bays and you've been faffing about externally as
you replace drives, you can now use the last three drives in the chassis to
create another two-drive raid-0, add that as a spare into your raid-6, and
add your last drive as a spare into both your raid-0s.

So you end up with a 6-device-plus-spare raid-6, and device 6 and the raid-6's
spare (your two raid-0s) share a spare drive between them.

Cheers,
Wol
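PS: for the record, the per-drive dance in (1)/(2) is only a couple of
commands. Something along these lines - the array and drive names are just
examples, and I'd use the stable /dev/disk/by-id/ links (or uuids) rather
than sdX so the wandering device names can't bite you:

  # add the new drive (or partition) to the array as a spare
  mdadm /dev/md0 --add /dev/disk/by-id/ata-NEWDRIVE-part1

  # copy the old member onto it, live, without ever losing redundancy
  mdadm /dev/md0 --replace /dev/disk/by-id/ata-OLDDRIVE-part1 \
                 --with /dev/disk/by-id/ata-NEWDRIVE-part1

  # keep an eye on progress
  cat /proc/mdstat

Nothing stops you queueing up several --replace operations at once if all the
spares are already in the chassis.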