From: Guoqing Jiang <guoqing.jiang@linux.dev>
To: Xiao Ni <xni@redhat.com>
Cc: "Tkaczyk, Mariusz" <mariusz.tkaczyk@intel.com>,
Song Liu <song@kernel.org>,
linux-raid <linux-raid@vger.kernel.org>,
Heinz Mauelshagen <heinzm@redhat.com>,
Nigel Croxon <ncroxon@redhat.com>
Subject: Re: The read data is wrong from raid5 when recovery happens
Date: Mon, 29 May 2023 16:33:52 +0800
Message-ID: <71c45b69-770a-0c28-3bd2-a4bd1a18bc2d@linux.dev>
In-Reply-To: <CALTww29ww7sOwLFR=waX4b2bik=ZAiCW7mMEtg8jsoAHqxvHcQ@mail.gmail.com>
On 5/29/23 11:41, Xiao Ni wrote:
> On Mon, May 29, 2023 at 10:27 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>>
>>
>> On 5/26/23 15:23, Xiao Ni wrote:
>>>>>>> 6. mdadm /dev/md126 --add /dev/sdd
>>>>>>> 7. create 31 processes that write and read. Each compares the content with
>>>>>>> md5sum. The test goes on until the recovery stops
>>>> Could you share the test code/script for step 7? Will try it from my side.
>>> The test scripts are written by people from intel.
>>> Hi, Mariusz. Can I share the test scripts here?
>>>
>>>>>>> 8. wait for about 10 minutes; some processes then report that the checksum
>>>>>>> is wrong. But if a process re-reads the data, the checksum is good.
>>>> So it is interim; I guess it appeared before the recovery was finished.
>>> Yes, it appears before recovery finishes. The test will finish once
>>> the recovery finishes.
>>>
>>>>>>> I tried to narrow this problem like this:
>>>>>>>
>>>>>>> - md_account_bio(mddev, &bi);
>>>>>>> + if (rw == WRITE)
>>>>>>> + md_account_bio(mddev, &bi);
>>>>>>> If it only accounts for write requests, the problem disappears.
>>>>>>>
>>>>>>> - if (rw == READ && mddev->degraded == 0 &&
>>>>>>> - mddev->reshape_position == MaxSector) {
>>>>>>> - bi = chunk_aligned_read(mddev, bi);
>>>>>>> - if (!bi)
>>>>>>> - return true;
>>>>>>> - }
>>>>>>> + //if (rw == READ && mddev->degraded == 0 &&
>>>>>>> + // mddev->reshape_position == MaxSector) {
>>>>>>> + // bi = chunk_aligned_read(mddev, bi);
>>>>>>> + // if (!bi)
>>>>>>> + // return true;
>>>>>>> + //}
>>>>>>>
>>>>>>> if (unlikely(bio_op(bi) == REQ_OP_DISCARD)) {
>>>>>>> make_discard_request(mddev, bi);
>>>>>>> @@ -6180,7 +6180,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
>>>>>>> md_write_end(mddev);
>>>>>>> return true;
>>>>>>> }
>>>>>>> - md_account_bio(mddev, &bi);
>>>>>>> + if (rw == READ)
>>>>>>> + md_account_bio(mddev, &bi);
>>>>>>>
>>>>>>> If I comment out chunk_aligned_read and only account for read requests,
>>>>>>> the problem can be reproduced.
>> Only write bios and reads that are not chunk-aligned call md_account_bio, and
>> accounting only write bios is fine per your test. That means the md5sum did not
>> match because of a non-chunk-aligned read bio, so it is not abnormal that data in
>> another chunk could change while the recovery is not finished, right?
> That's right, only non aligned read requests can cause this problem.
> Good catch. If I understand right, you mean the non aligned read
> request reads data from the chunk which hasn't been recovered, right?
Yes. I don't think comparing md5sums in such a scenario makes much sense, given
that the state is interim. The mismatch also appeared in my test with I/O
accounting disabled.
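
To make "non aligned read" concrete, here is a minimal userspace sketch of the
test that decides whether a read stays inside one chunk. It is only an
illustration under assumptions: read_fits_in_chunk() is a hypothetical name,
the chunk size is assumed to be a power of two, and the real check in raid5.c
operates on a struct bio rather than on plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch, not kernel code: report whether a read of nr_sectors
 * starting at start_sector stays inside a single chunk.
 * Assumption: chunk_sectors is a non-zero power of two, which the mask
 * arithmetic below relies on.
 */
bool read_fits_in_chunk(uint64_t start_sector, uint32_t nr_sectors,
			uint32_t chunk_sectors)
{
	assert(chunk_sectors != 0 &&
	       (chunk_sectors & (chunk_sectors - 1)) == 0);

	/* Offset of the first sector within its chunk. */
	uint64_t offset = start_sector & (uint64_t)(chunk_sectors - 1);

	/* The read fits if it ends at or before the chunk boundary. */
	return offset + nr_sectors <= chunk_sectors;
}
```

With 1024-sector chunks, an 8-sector read at sector 1016 fits in one chunk,
while the same read at sector 1020 crosses into the next chunk and is the kind
of request discussed above.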
>> BTW, I ran the test with bio accounting disabled by default, and the result
>> seems to be the same.
>>
>>> git tag --sort=taggerdate --contain 10764815f |head -1
>> v5.14-rc1
>>
>> localhost:~/readdata #uname -r
>> 5.15.0-rc4-59.24-default
>> localhost:~/readdata #cat /sys/block/md126/queue/iostats
>> 0
>>
>> And I can still see the relevant log in the terminal that runs 01-test.sh
> Hmm, thanks for this. I'll have a try again. Which kind of disks do
> you use for testing?
Four SCSI disks (1G capacity) inside a VM.
Thanks,
Guoqing