qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Huth <thuth@redhat.com>
To: Nir Soffer <nsoffer@redhat.com>,
	qemu-block <qemu-block@nongnu.org>, Kevin Wolf <kwolf@redhat.com>,
	Max Reitz <mreitz@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>,
	Viktor Mihajlovski <mihajlov@linux.ibm.com>,
	Andrey Shinkevich <andrey.shinkevich@virtuozzo.com>,
	Richard Jones <rjones@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>
Subject: Re: [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS
Date: Wed, 19 May 2021 12:41:34 +0200	[thread overview]
Message-ID: <b8942810-dbea-c5f6-0e78-5cfca68a2870@redhat.com> (raw)
In-Reply-To: <15a28d6c-8958-5feb-4418-134af27b5b6f@redhat.com>

On 19/05/2021 12.21, Thomas Huth wrote:
> On 19/04/2021 07.06, Thomas Huth wrote:
>> On 16/04/2021 22.34, Nir Soffer wrote:
>>> On Fri, Apr 16, 2021 at 8:23 AM Thomas Huth <thuth@redhat.com> wrote:
>>>>
>>>> A customer reported that running
>>>>
>>>>   qemu-img convert -t none -O qcow2 -f qcow2 input.qcow2 output.qcow2
>>>>
>>>> fails for them with the following error message when the images are
>>>> stored on a GPFS file system:
>>>>
>>>>   qemu-img: error while writing sector 0: Invalid argument
>>>>
>>>> After analyzing the strace output, it seems like the problem is in
>>>> handle_aiocb_write_zeroes(): The call to fallocate(FALLOC_FL_PUNCH_HOLE)
>>>> returns EINVAL, which can apparently happen if the file system has
>>>> a different idea of the granularity of the operation. It's arguably
>>>> a bug in GPFS, since the PUNCH_HOLE mode should not result in EINVAL
>>>> according to the man-page of fallocate(), but the file system is out
>>>> there in production and so we have to deal with it. In commit 294682cc3a
>>>> ("block: workaround for unaligned byte range in fallocate()") we also
>>>> already applied the a work-around for the same problem to the earlier
>>>> fallocate(FALLOC_FL_ZERO_RANGE) call, so do it now similar with the
>>>> PUNCH_HOLE call.
>>>>
>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>> ---
>>>>   block/file-posix.c | 7 +++++++
>>>>   1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>> index 20e14f8e96..7a40428d52 100644
>>>> --- a/block/file-posix.c
>>>> +++ b/block/file-posix.c
>>>> @@ -1675,6 +1675,13 @@ static int handle_aiocb_write_zeroes(void *opaque)
>>>>               }
>>>>               s->has_fallocate = false;
>>>>           } else if (ret != -ENOTSUP) {
>>>> +            if (ret == -EINVAL) {
>>>> +                /*
>>>> +                 * File systems like GPFS do not like unaligned byte 
>>>> ranges,
>>>> +                 * treat it like unsupported (so caller falls back to 
>>>> pwrite)
>>>> +                 */
>>>> +                return -ENOTSUP;
>>>
>>> This skips the next fallback, using plain fallocate(0) if we write
>>> after the end of the file. Is this intended?
>>>
>>> We can treat the buggy EINVAL return value as "filesystem is buggy,
>>> let's not try other options", or "let's try the next option". Since falling
>>> back to actually writing zeroes is so much slower, I think it is better to
>>> try the next option.
>>
>> I just did the same work-around as in commit 294682cc3a7 ... so if we 
>> agree to try the other options, too, we should change that spot, too...
>>
>> However, what is not clear to me, how would you handle s->has_write_zeroes 
>> and s->has_discard in such a case? Set them to "false"? ... but it could 
>> still work for some blocks with different alignment ... but if we keep 
>> them set to "true", the code tries again and again to call these ioctls, 
>> maybe wasting other precious cycles for this?
>>
>> Maybe we should do a different approach instead: In case we hit a EINVAL 
>> here, print an error a la:
>>
>> error_report_once("You are running on a buggy file system, please complain 
>> to the file system vendor");
>>
>> and return -ENOTSUP ... then it's hopefully clear to the users why they 
>> are getting a bad performance, and that they should complain to the file 
>> system vendor instead to get their problem fixed.
> 
> Ping!
> 
> Any recommendations how to proceed here?

Never mind, Kevin just told me that he already replied - but apparently that 
mail did not make it into my "qemu-devel" folder, so I did not see it :-( 
... Anyway, I'll try to rework my patch accordingly:

  https://lists.gnu.org/archive/html/qemu-block/2021-04/msg00488.html

  Thomas



      reply	other threads:[~2021-05-19 10:42 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-16  5:23 [PATCH] block/file-posix: Fix problem with fallocate(PUNCH_HOLE) on GPFS Thomas Huth
2021-04-16 20:34 ` Nir Soffer
2021-04-19  5:06   ` Thomas Huth
2021-04-19 13:13     ` Kevin Wolf
2021-05-19 10:21     ` Thomas Huth
2021-05-19 10:41       ` Thomas Huth [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8942810-dbea-c5f6-0e78-5cfca68a2870@redhat.com \
    --to=thuth@redhat.com \
    --cc=andrey.shinkevich@virtuozzo.com \
    --cc=borntraeger@de.ibm.com \
    --cc=kwolf@redhat.com \
    --cc=mihajlov@linux.ibm.com \
    --cc=mreitz@redhat.com \
    --cc=nsoffer@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).