From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id F065AC77B7C for ; Fri, 26 May 2023 07:12:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242320AbjEZHMF (ORCPT ); Fri, 26 May 2023 03:12:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52610 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S242180AbjEZHME (ORCPT ); Fri, 26 May 2023 03:12:04 -0400 Received: from out-14.mta1.migadu.com (out-14.mta1.migadu.com [95.215.58.14]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 787659E for ; Fri, 26 May 2023 00:12:02 -0700 (PDT) Message-ID: <7e9fd8ba-aacd-3697-15fe-dc0b292bd177@linux.dev> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1685085120; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=678Dpb2U5Mzwp0mqtRaqlHqASOKouM8EvvtbA6jNAhA=; b=t4YjCrHUspJuutdOkjrIfceNjOz4UR5wMMnDChxuGss0eCUJHjNqfG34eAPZkmmpSWoB7h w+HNc0yOHoXrtJ3Q3n7m4YsaQG99gxFBs6DMXYraP9GEZW+UiLYJ0jpKPAuPw/c/urBPbV 0qq9r5FfwvFxfpnqoz0ll+9PddTWlnc= Date: Fri, 26 May 2023 15:12:01 +0800 MIME-Version: 1.0 Subject: Re: The read data is wrong from raid5 when recovery happens Content-Language: en-US To: Xiao Ni Cc: Song Liu , linux-raid , Heinz Mauelshagen , Nigel Croxon References: X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Guoqing Jiang In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT Precedence: bulk List-ID: X-Mailing-List: linux-raid@vger.kernel.org On 5/26/23 14:45, Xiao Ni wrote: > On Fri, May 26, 2023 at 11:09 AM Guoqing Jiang wrote: >> >> >> On 5/26/23 09:49, Xiao Ni wrote: >>> Hi all >>> >>> We found a problem recently. The read data is wrong when recovery happens. >>> Now we've found it's introduced by patch 10764815f (md: add io accounting >>> for raid0 and raid5). I can reproduce this 100%. This problem exists in >>> upstream. The test steps are like this: >>> >>> 1. mdadm -CR $devname -l5 -n4 /dev/sd[b-e] --force --assume-clean >>> 2. mkfs.ext4 -F $devname >>> 3. mount $devname $mount_point >>> 4. mdadm --incremental --fail sdd >>> 5. dd if=/dev/zero of=/tmp/pythontest/file1 bs=1M count=100000 status=progress I suppose /tmp is the mount point. >>> 6. mdadm /dev/md126 --add /dev/sdd >>> 7. create 31 processes that writes and reads. It compares the content with >>> md5sum. The test will go on until the recovery stops Could you share the test code/script for step 7? Will try it from my side. >>> 8. wait for about 10 minutes, we can see some processes report checksum is >>> wrong. But if it re-read the data again, the checksum will be good. So it is interim, I guess it appeared before recover was finished. >>> I tried to narrow this problem like this: >>> >>> - md_account_bio(mddev, &bi); >>> + if (rw == WRITE) >>> + md_account_bio(mddev, &bi); >>> If it only do account for write requests, the problem can disappear. >>> >>> - if (rw == READ && mddev->degraded == 0 && >>> - mddev->reshape_position == MaxSector) { >>> - bi = chunk_aligned_read(mddev, bi); >>> - if (!bi) >>> - return true; >>> - } >>> + //if (rw == READ && mddev->degraded == 0 && >>> + // mddev->reshape_position == MaxSector) { >>> + // bi = chunk_aligned_read(mddev, bi); >>> + // if (!bi) >>> + // return true; >>> + //} >>> >>> if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) { >>> make_discard_request(mddev, bi); >>> @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev, >>> struct bio * bi) >>> md_write_end(mddev); >>> return true; >>> } >>> - md_account_bio(mddev, &bi); >>> + if (rw == READ) >>> + md_account_bio(mddev, &bi); >>> >>> I comment the chunk_aligned_read out and only account for read requests, >>> this problem can be reproduced. >> After a quick look,raid5_read_one_chunk clones bio by itself, so no need to >> do it for the chunk aligned readcase. Could you pls try this? [...] >> Hi Guoqing >> >> When chunk_aligned_read runs successfully, it just returns. In this >> case, it does the account by itself. If it fails to execute, it still >> needs run md_account_bio in raid5_make_request. So now the logic >> should be right. Hmm, I was confused. Thanks, Guoqing