Re: The read data is wrong from raid5 when recovery happens

Linux RAID subsystem development
 help / color / mirror / Atom feed

From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Xiao Ni <xni@redhat.com>
Cc: "Tkaczyk, Mariusz" <mariusz.tkaczyk@intel.com>,
	Song Liu <song@kernel.org>,
	linux-raid <linux-raid@vger.kernel.org>,
	Heinz Mauelshagen <heinzm@redhat.com>,
	Nigel Croxon <ncroxon@redhat.com>
Subject: Re: The read data is wrong from raid5 when recovery happens
Date: Mon, 29 May 2023 16:33:52 +0800	[thread overview]
Message-ID: <71c45b69-770a-0c28-3bd2-a4bd1a18bc2d@linux.dev> (raw)
In-Reply-To: <CALTww29ww7sOwLFR=waX4b2bik=ZAiCW7mMEtg8jsoAHqxvHcQ@mail.gmail.com>



On 5/29/23 11:41, Xiao Ni wrote:
> On Mon, May 29, 2023 at 10:27 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>>
>>
>> On 5/26/23 15:23, Xiao Ni wrote:
>>>>>>> 6. mdadm /dev/md126 --add /dev/sdd
>>>>>>> 7. create 31 processes that writes and reads. It compares the content with
>>>>>>> md5sum. The test will go on until the recovery stops
>>>> Could you share the test code/script for step 7? Will try it from my side.
>>> The test scripts are written by people from intel.
>>> Hi, Mariusz. Can I share the test scripts here?
>>>
>>>>>>> 8. wait for about 10 minutes, we can see some processes report checksum is
>>>>>>> wrong. But if it re-read the data again, the checksum will be good.
>>>> So it is interim, I guess it appeared before recover was finished.
>>> Yes, it appears before recovery finishes. The test will finish once
>>> the recovery finishes.
>>>
>>>>>>> I tried to narrow this problem like this:
>>>>>>>
>>>>>>> -       md_account_bio(mddev, &bi);
>>>>>>> +       if (rw == WRITE)
>>>>>>> +               md_account_bio(mddev, &bi);
>>>>>>> If it only do account for write requests, the problem can disappear.
>>>>>>>
>>>>>>> -       if (rw == READ && mddev->degraded == 0 &&
>>>>>>> -           mddev->reshape_position == MaxSector) {
>>>>>>> -               bi = chunk_aligned_read(mddev, bi);
>>>>>>> -               if (!bi)
>>>>>>> -                       return true;
>>>>>>> -       }
>>>>>>> +       //if (rw == READ && mddev->degraded == 0 &&
>>>>>>> +       //    mddev->reshape_position == MaxSector) {
>>>>>>> +       //      bi = chunk_aligned_read(mddev, bi);
>>>>>>> +       //      if (!bi)
>>>>>>> +       //              return true;
>>>>>>> +       //}
>>>>>>>
>>>>>>>             if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
>>>>>>>                     make_discard_request(mddev, bi);
>>>>>>> @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev,
>>>>>>> struct bio * bi)
>>>>>>>                             md_write_end(mddev);
>>>>>>>                     return true;
>>>>>>>             }
>>>>>>> -       md_account_bio(mddev, &bi);
>>>>>>> +       if (rw == READ)
>>>>>>> +               md_account_bio(mddev, &bi);
>>>>>>>
>>>>>>> I comment the chunk_aligned_read out and only account for read requests,
>>>>>>> this problem can be reproduced.
>> Only write bio and non aligned chunk read bio call md_account_bio, and
>> only account write bio is fine per your test. It means the md5sum didn't match
>> because of non aligned chunk read bio, so it is not abnormal that data in another chunk could
>> be changed with the recovery is not finished, right?
> That's right, only non aligned read requests can cause this problem.
> Good catch. If I understand right, you mean the non aligned read
> request reads data from the chunk which hasn't been recovered, right?

Yes, I don't think compare md5sum for such scenario makes more sense given
the state is interim. And it also appeared in my test with disable io 
accounting.

>> BTW, I had run the test with bio accounting disabled by default, and
>> seems the result is
>> same.
>>
>>> git tag  --sort=taggerdate --contain 10764815f |head -1
>> v5.14-rc1
>>
>> localhost:~/readdata #uname -r
>> 5.15.0-rc4-59.24-default
>> localhost:~/readdata #cat /sys/block/md126/queue/iostats
>> 0
>>
>> And I can still see relevant log from the terminal which runs 01-test.sh
> Hmm, thanks for this. I'll have a try again. Which kind of disks do
> you use for testing?

Four SCSI disks (1G capacity) inside VM.

Thanks,
Guoqing

next prev parent reply	other threads:[~2023-05-29  8:34 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CALTww28aV5CGXQAu46Rkc=fG1jK=ARzCT8VGoVyje8kQdqEXMg@mail.gmail.com>
2023-05-26  2:08 ` Fwd: The read data is wrong from raid5 when recovery happens Xiao Ni
2023-05-26  2:17   ` Yu Kuai
2023-05-26  2:40     ` Xiao Ni
2023-05-26  2:47       ` Yu Kuai
2023-05-26  3:02         ` Xiao Ni
2023-05-26  3:56   ` d tbsky
2023-05-26  6:20     ` Xiao Ni
2024-02-14 15:15   ` Fwd: " Mateusz Kusiak
2024-02-14 17:12     ` Song Liu
     [not found]     ` <CALTww29s1WupaVRSrEX1GbD=1Bt7b5cxseDnBLARkH1uHUhtCA@mail.gmail.com>
2024-02-15 10:41       ` Mateusz Kusiak
2023-05-26  3:09 ` Guoqing Jiang
2023-05-26  6:45   ` Xiao Ni
2023-05-26  7:12     ` Guoqing Jiang
2023-05-26  7:23       ` Xiao Ni
2023-05-26  9:13         ` Mariusz Tkaczyk
2023-05-26 21:13           ` Song Liu
2023-05-27  0:56             ` Xiao Ni
2023-07-11  0:39               ` Xiao Ni
2023-07-14  1:30                 ` Yu Kuai
2023-05-29  2:25         ` Guoqing Jiang
2023-05-29  3:41           ` Xiao Ni
2023-05-29  8:33             ` Guoqing Jiang [this message]
2023-05-29  8:40               ` Xiao Ni
2023-05-30  1:36                 ` Guoqing Jiang
2023-05-30  2:02                   ` Yu Kuai
2023-05-30  2:11                     ` Xiao Ni
2023-05-30  2:23                       ` Guoqing Jiang
2023-05-30  2:30                         ` Xiao Ni
2023-05-30  2:43                           ` Guoqing Jiang
2023-06-14  8:27                           ` Kusiak, Mateusz
2023-06-14  8:46                             ` Xiao Ni
2023-05-29 13:51           ` Xiao Ni
2023-05-30  0:53             ` Guoqing Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=71c45b69-770a-0c28-3bd2-a4bd1a18bc2d@linux.dev \
    --to=guoqing.jiang@linux.dev \
    --cc=heinzm@redhat.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=mariusz.tkaczyk@intel.com \
    --cc=ncroxon@redhat.com \
    --cc=song@kernel.org \
    --cc=xni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox