Re: [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block

qemu-devel.nongnu.org archive mirror

From: ChangLimin <changlm@chinatelecom.cn>
To: "Nir Soffer" <nsoffer@redhat.com>,  mreitz <mreitz@redhat.com>
Cc: kwolf <kwolf@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	qemu-block <qemu-block@nongnu.org>
Subject: Re: [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block
Date: Thu, 25 Mar 2021 14:06:48 +0800
Message-ID: <2021032514064808224635@chinatelecom.cn>
In-Reply-To: CAMRbyysT_s+AkskuAGvT7wXOQ+LaX3OkSYTo4UxtYKqE0cjBMg@mail.gmail.com

>On Wed, Mar 24, 2021 at 4:52 PM Max Reitz <mreitz@redhat.com> wrote:
>On 22.03.21 10:25, ChangLimin wrote:
>> For Linux 5.10/5.11, qemu write zeros to a multipath device using
>> ioctl(fd, BLKZEROOUT, range) with cache none or directsync return -EBUSY
>> permanently.
>
>So as far as I can track back the discussion, Kevin asked on v1 why we’d 
>set has_write_zeroes to false, i.e. whether the EBUSY might not go away 
>at some point, and if it did, whether we shouldn’t retry BLKZEROOUT then.
>You haven’t explicitly replied to that question (as far as I can see), 
>so it kind of still stands.
>
>Implicitly, there are two conflicting answers in this patch: On one 
>hand, the commit message says “permanently”, and this is what you told 
>Nir as a realistic case where this can occur. 

On Linux 5.10/5.11 the EBUSY is permanent; the steps to reproduce it are below.
On other Linux versions, EBUSY may be temporary.
Because Linux 5.10/5.11 is not widely used, the patch does not set has_write_zeroes to false.
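The retry semantics under discussion can be modeled in a few lines of C. This is a minimal sketch, not qemu code: `struct zero_state`, `zeroout_fn` and `write_zeroes_block()` are hypothetical names, the ioctl is injected as a function pointer so the policy can be exercised without a multipath device, and `translate_err()` is a simplified stand-in for the helper the patch calls. It assumes the V4 behavior: -ENOTSUP clears `has_write_zeroes` so BLKZEROOUT is never tried again, while -EBUSY only sends the current request to the fallback path and leaves BLKZEROOUT enabled for the next request.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of the driver state. */
struct zero_state {
    bool has_write_zeroes;
};

/* Hypothetical hook: returns 0, or -1 with errno set, like ioctl(). */
typedef int (*zeroout_fn)(void);

/* Simplified mapping of "not supported" errors to -ENOTSUP. */
static int translate_err(int err)
{
    if (err == -ENODEV || err == -ENOSYS || err == -EOPNOTSUPP ||
        err == -ENOTTY) {
        err = -ENOTSUP;
    }
    return err;
}

/*
 * Returns 0 on success, -ENOTSUP when the caller should fall back to
 * writing a zero-filled buffer, or another negative errno on failure.
 */
static int write_zeroes_block(struct zero_state *s, zeroout_fn zeroout)
{
    if (!s->has_write_zeroes) {
        return -ENOTSUP;            /* BLKZEROOUT disabled earlier */
    }

    int ret;
    do {
        ret = zeroout();
    } while (ret == -1 && errno == EINTR);
    if (ret == 0) {
        return 0;
    }

    ret = translate_err(-errno);
    switch (ret) {
    case -ENOTSUP:
        s->has_write_zeroes = false; /* never try BLKZEROOUT again */
        return -ENOTSUP;
    case -EBUSY:
        /* Keep has_write_zeroes: the next request retries BLKZEROOUT. */
        return -ENOTSUP;
    default:
        return ret;
    }
}
```

With a mock that fails with EBUSY, `write_zeroes_block()` returns -ENOTSUP and `has_write_zeroes` stays true; a mock failing with ENOTTY flips it to false. Giving each case its own `return`, as here, also avoids the unreachable `break` Max asked about.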

>I'm afraid ChangLimin did not answer my question. I'm looking for real
>world use case where qemu cannot write zeros to a multipath device, when
>nobody else is using the device.
>
>I tried to reproduce this on Fedora (kernel 5.10) with qemu-img convert,
>once with a multipath device, and once with logical volume on a vg created
>on the multipath device, and I could not reproduce this issue.

The following steps reproduce the issue on Fedora 34.

# uname -a
Linux fedora-34 5.11.3-300.fc34.x86_64 #1 SMP Thu Mar 4 19:03:18 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

# qemu-img -V
qemu-img version 5.2.0 (qemu-5.2.0-5.fc34.1)

1. Log in to an iSCSI LUN created with targetcli on Ubuntu 20.04
# iscsiadm -m discovery -t st -p 192.169.1.109
192.169.1.109:3260,1 iqn.2003-01.org.linux-iscsi:lio-lv100

# iscsiadm -m node -l -T iqn.2003-01.org.linux-iscsi:lio-lv100
# iscsiadm -m session
tcp: [1] 192.169.1.109:3260,1 iqn.2003-01.org.linux-iscsi:lio-lv100 (non-flash)

2. Start the multipathd service
# mpathconf --enable
# systemctl start multipathd

3. Add the multipath path
# multipath -a `/lib/udev/scsi_id -g /dev/sdb`   # sdb is the iSCSI LUN
wwid '36001405b76856e4816b48b99c6a77de3' added

# multipathd add path /dev/sdb
ok

# multipath -ll     # /dev/dm-1 is the multipath device based on /dev/sdb
mpatha (36001405bebfc3a0522541cda30220db9) dm-1 LIO-ORG,lv102
size=1.0G features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  `- 5:0:0:0  sdd  8:48   active ready running

4. qemu-img returns EBUSY for both dm-1 and sdb
# wget http://download.cirros-cloud.net/0.4.0/cirros-0.4.0-x86_64-disk.img
# qemu-img convert -O raw -t none cirros-0.4.0-x86_64-disk.img /dev/dm-1
qemu-img: error while writing at byte 0: Device or resource busy

# qemu-img convert -O raw -t none cirros-0.4.0-x86_64-disk.img /dev/sdb
qemu-img: error while writing at byte 0: Device or resource busy

5. blkdiscard also fails on both dm-1 and sdb
# blkdiscard -o 0 -l 4096 /dev/dm-1
blkdiscard: cannot open /dev/dm-1: Device or resource busy

# blkdiscard -o 0 -l 4096 /dev/sdb
blkdiscard: cannot open /dev/sdb: No such file or directory

6. Writing zeros with dd works, because dd issues ordinary writes instead of discard or BLKZEROOUT
# dd if=/dev/zero of=/dev/dm-1 bs=1M count=100 oflag=direct 
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 2.33623 s, 44.9 MB/s

7. The LUN needs to support discard; otherwise zeros are not written
with ioctl(fd, BLKZEROOUT, range).
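To check a device without going through qemu-img, the ioctl that qemu issues can be called directly. This is a minimal sketch, assuming Linux and the `BLKZEROOUT` definition from `<linux/fs.h>`; `probe_blkzeroout()` is a hypothetical helper name, and its `direct` flag opens the device with O_DIRECT to mirror the `-t none` cache mode used in step 4.

```c
#define _GNU_SOURCE             /* for O_DIRECT */
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>           /* BLKZEROOUT */

/*
 * Issue one BLKZEROOUT over [offset, offset + len) on path.
 * offset and len must be multiples of 512 bytes.
 * Returns 0 on success or a negative errno (for example -EBUSY).
 */
static int probe_blkzeroout(const char *path, uint64_t offset, uint64_t len,
                            int direct)
{
    int fd = open(path, O_WRONLY | (direct ? O_DIRECT : 0));
    if (fd < 0) {
        return -errno;
    }

    uint64_t range[2] = { offset, len };
    int ret;
    do {
        ret = ioctl(fd, BLKZEROOUT, range);
    } while (ret == -1 && errno == EINTR);
    ret = (ret == 0) ? 0 : -errno;  /* capture errno before close() */

    close(fd);
    return ret;
}
```

On the setup above, `probe_blkzeroout("/dev/dm-1", 0, 4096, 1)` would be expected to return -EBUSY on an affected 5.10/5.11 kernel, matching the qemu-img error; on a regular file the ioctl fails simply because the file is not a block device.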

>If I understand the kernel change correctly, this can happen when there is
>a mounted file system on top of the multipath device. I don't think we have 
>a use case when qemu accesses a multipath device when the device is used
>by a file system, but maybe I missed something.
> 
>So that to me implies 
>that we actually should not retry BLKZEROOUT, because the EBUSY will 
>remain, and that condition won’t change while the block device is in use 
>by qemu.
>
>On the other hand, in the code, you have decided not to reset 
>has_write_zeroes to false, so the implementation will retry.
>
>EBUSY is usually a temporary error, so retrying makes sense. The question
>is if we really can write zeroes manually in this case?
> 
>So I don’t quite understand.  Should we keep trying BLKZEROOUT or is 
>there no chance of it working after it has at one point failed with 
>EBUSY?  (Are there other cases besides what’s described in this commit 
>message where EBUSY might be returned and it is only temporary?)
>
>> Fallback to pwritev instead of exit for -EBUSY error.
>> 
>> The issue was introduced in Linux 5.10:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=384d87ef2c954fc58e6c5fd8253e4a1984f5fe02
>> 
>> Fixed in Linux 5.12:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=56887cffe946bb0a90c74429fa94d6110a73119d
>> 
>> Signed-off-by: ChangLimin <changlm@chinatelecom.cn>
>> ---
>>   block/file-posix.c | 8 ++++++--
>>   1 file changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/block/file-posix.c b/block/file-posix.c
>> index 20e14f8e96..d4054ac9cb 100644
>> --- a/block/file-posix.c
>> +++ b/block/file-posix.c
>> @@ -1624,8 +1624,12 @@ static ssize_t handle_aiocb_write_zeroes_block(RawPosixAIOData *aiocb)
>>           } while (errno == EINTR);
>> 
>>           ret = translate_err(-errno);
>> -        if (ret == -ENOTSUP) {
>> -            s->has_write_zeroes = false;
>> +        switch (ret) {
>> +        case -ENOTSUP:
>> +            s->has_write_zeroes = false; /* fall through */
>> +        case -EBUSY: /* Linux 5.10/5.11 may return -EBUSY for multipath devices */
>> +            return -ENOTSUP;
>> +            break;
>
>(Not sure why this break is here.)
>
>Max
>
>>           }
>>       }
>>   #endif
>> --
>> 2.27.0
>> 
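The fallback the patch relies on ("Fallback to pwritev instead of exit for -EBUSY error") amounts to writing zeros as ordinary data, which is also why dd succeeds in step 6. Below is a minimal sketch of such a manual path. It is not qemu's implementation: `write_zeroes_fallback()`, the 1 MiB chunk size and the 4096-byte buffer alignment are illustrative assumptions.

```c
#define _GNU_SOURCE             /* for pwritev() */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define ZERO_CHUNK (1024 * 1024)   /* hypothetical bounce-buffer size */
#define ZERO_ALIGN 4096            /* assumed alignment, safe for O_DIRECT */

/* Write len zero bytes at offset with pwritev(); returns 0 or -errno. */
static int write_zeroes_fallback(int fd, off_t offset, size_t len)
{
    void *buf;
    int err = posix_memalign(&buf, ZERO_ALIGN, ZERO_CHUNK);
    if (err) {
        return -err;               /* posix_memalign() does not set errno */
    }
    memset(buf, 0, ZERO_CHUNK);

    int ret = 0;
    while (len > 0) {
        struct iovec iov = {
            .iov_base = buf,
            .iov_len = len < ZERO_CHUNK ? len : ZERO_CHUNK,
        };
        ssize_t n = pwritev(fd, &iov, 1, offset);
        if (n < 0) {
            if (errno == EINTR) {
                continue;          /* interrupted: retry the same chunk */
            }
            ret = -errno;
            break;
        }
        if (n == 0) {
            ret = -EIO;            /* no progress: avoid looping forever */
            break;
        }
        offset += n;               /* handle short writes */
        len -= (size_t)n;
    }

    free(buf);
    return ret;
}
```

With O_DIRECT, `offset` and `len` must also be aligned to the device's logical block size; the aligned buffer only covers the memory side of that requirement.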



Thread overview: 8+ messages
2021-03-22  9:25 [PATCH V4] file-posix: allow -EBUSY error during ioctl(fd, BLKZEROOUT, range) on block ChangLimin
2021-03-22 17:50 ` John Snow
2021-03-24 14:50 ` Max Reitz
2021-03-24 15:49   ` Nir Soffer
2021-03-25  6:06     ` ChangLimin [this message]
2021-03-25 15:48       ` Nir Soffer
2021-03-26  0:20         ` ChangLimin
2021-03-26 19:39           ` Nir Soffer
