From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: request for help on IMSM-metadata RAID-5 array
From: Yu Kuai
To: Joel Parthemore, Roman Mamedov
Cc: linux-raid@vger.kernel.org, "yukuai (C)"
Date: Mon, 25 Sep 2023 09:43:39 +0800
Message-ID: <4d606b3d-ccec-e791-97ba-2cb5af0cc226@huaweicloud.com>
In-Reply-To: <72388663-3997-a410-76f0-066dcd7d2a63@parthemores.com>
References: <507b6ab0-fd8f-d770-ba82-28def5f53d25@parthemores.com>
 <20230923162449.3ea0d586@nvm>
 <4095b51a-1038-2fd0-6503-64c0daa913d8@parthemores.com>
 <20230923203512.581fcd7d@nvm>
 <72388663-3997-a410-76f0-066dcd7d2a63@parthemores.com>
X-Mailing-List: linux-raid@vger.kernel.org

Hi,

On 2023/09/24 2:49, Joel Parthemore wrote:
> So, dd finally sped up and finished. It appears that I have lost none
> of my data. I am a very happy man. A question: is there anything
> useful I am likely to discover from keeping the RAID array as it is a
> bit longer before I recreate it and copy the data back?

It would be very helpful for developers if you could collect the kernel
stacks of all the stuck threads (and it would be much better to resolve
any raw addresses with addr2line).
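For example, something like this should capture them (a rough sketch,
assuming you have root and a kernel with magic sysrq and stack tracing
enabled; the dd process name below is just an illustration):

  # dump the stacks of all tasks in uninterruptible (D) sleep to the
  # kernel log, then read them back
  echo w > /proc/sysrq-trigger
  dmesg | tail -n 200

  # or grab the stack of one specific stuck process, e.g. the dd:
  cat /proc/$(pgrep -x dd)/stack

If a trace shows only raw addresses, addr2line against a vmlinux with
debug info can map them back to source lines, e.g.
addr2line -e vmlinux <address> (path and address are placeholders).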
Thanks,
Kuai

>
> Joel
>
> -----------------------------------------------------------------------------
>
> I have been wondering about HDD issues all along, of course, though I
> didn't see any smoking gun.
>
> I ran iostat -x 2 /dev/sdX on all three drives. All show an idle rate
> of just under 90%. So I don't think that's the problem.
>
> Joel
>
> On 2023-09-23 at 17:35, Roman Mamedov wrote:
>> On Sat, 23 Sep 2023 17:18:00 +0200
>> Joel Parthemore wrote:
>>
>>> I didn't want to try that again until I had confirmation that the
>>> out-of-sync wouldn't (or shouldn't) be an issue. (I had tried it once
>>> before, but the system had somehow swapped /dev/md126 and /dev/md127
>>> so that /dev/md126 became the container and /dev/md127 the RAID-5
>>> array, which confused me. So I stopped experimenting further until I
>>> had a chance to write to the list.)
>>>
>>> The array is assembled read-only, and this time both /dev/md126 and
>>> /dev/md127 look like I expect them to. I started dd to make a backup
>>> image using dd if=/dev/md126 of=/dev/sdc bs=64K conv=noerror,sync.
>>> (The EXT4 file store on the 2TB RAID-5 array is about 900GB full.)
>>> At first, it was running most of the time and just occasionally in
>>> uninterruptible sleep, but the periods of uninterruptible sleep
>>> quickly started getting longer. Now it seems to be spending most but
>>> not quite all of its time in uninterruptible sleep. Is this some
>>> kind of race condition? Anyway, I'll leave it running overnight to
>>> see if it completes.
>>>
>>> Accessing the RAID array definitely isn't locking things up this
>>> time. I can go in and look at the partition table, for example, no
>>> problem. Access is awfully slow, but I assume that's because of
>>> whatever dd is or isn't doing.
>>>
>>> By the way, I'm using kernel 6.5.3, which isn't the latest (that
>>> would be 6.5.5) but is close.
>> Maybe it's an HDD issue; one of them did have some unreadable sectors
>> in the past, although the firmware has not decided to do anything
>> about that, such as reallocating them and recording it in SMART.
>>
>> Check whether one of the drives is holding things up, with a command
>> like
>>
>> iostat -x 2 /dev/sd?
>>
>> If you see 100% next to one of the drives, and much less for the
>> others, that one might be the culprit.