From: Max Reitz <mreitz@redhat.com>
To: Anton Nefedov <anton.nefedov@virtuozzo.com>,
    Qemu-block <qemu-block@nongnu.org>
Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
    Alberto Garcia <berto@igalia.com>,
    "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: Problems with c8bb23cbdbe3 on ppc64le
Date: Mon, 21 Oct 2019 13:40:27 +0200
Message-ID: <4c61a0ba-3a75-fffb-a724-4f4700eaa111@redhat.com>
In-Reply-To: <cd53cd86-e93c-297a-c08e-3fc1ae2618ac@redhat.com>

On 11.10.19 09:49, Max Reitz wrote:
> On 10.10.19 18:15, Anton Nefedov wrote:
>> On 10/10/2019 6:17 PM, Max Reitz wrote:
>>> Hi everyone,
>>>
>>> (CCs just based on tags in the commit in question)
>>>
>>> I have two bug reports which claim problems of qcow2 on XFS on ppc64le
>>> machines since qemu 4.1.0. One of those is about bad performance
>>> (sorry, it isn’t public :-/), the other about data corruption
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934).
>>>
>>> It looks like in both cases reverting c8bb23cbdbe3 solves the problem
>>> (which optimized COW of unallocated areas).
>>>
>>> I think I’ve looked at every angle but can’t find what could be wrong
>>> with it. Do any of you have any idea? :-/
>>>
>>
>> hi,
>>
>> oh, that patch strikes again..
>>
>> I don't quite follow, was this bug confirmed to happen on x86? Comment 8
>> (https://bugzilla.redhat.com/show_bug.cgi?id=1751934#c8) mentioned that
>> (or was that mixed up with the old xfsctl bug?)
>
> I think that was mixed up with the xfsctl bug, yes.
>
>> Regardless of the platform, does it reproduce? That's comforting
>> already; worst case we can trace each and every request then (unless it
>> stops reproducing this way).
>
> I haven’t been able to reproduce it yet (wrestling with the test system
> and getting ppc64 machines provisioned), but as far as I know it
> reproduces reliably on ppc64, but only there.
>
>> Also, perhaps it's worth trying to replace fallocate with write(0)?
>> Either in qcow2 (in the patch, bdrv_co_pwrite_zeroes -> bdrv_co_pwritev)
>> or in the file driver. It might hint whether it's misbehaving fallocate
>> (in qemu or in kernel) or something else.
>
> Good idea, that should at least tell us something about the corruption.

OK, after a week of debugging I’m not really much wiser.

One thing I do know is that I can now see the issue on x86-64 as well,
though only on XFS, not on ext4.

Replacing the zero-write with an actual write of zeroes fixes it, but I
still don’t know whether that’s because of the kernel or because the
write is just slower or takes another code path...
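
For readers following along, here is a minimal sketch of the two zeroing
strategies compared above: a zero-range request handed to the filesystem
versus an ordinary write of zero bytes. It is not qemu code. It assumes
64-bit Linux, calls fallocate(2) with FALLOC_FL_ZERO_RANGE through ctypes,
and the helper names zero_with_fallocate() and zero_with_write() are made
up for illustration.

```python
import ctypes
import errno
import os

# Flag value from <linux/falloc.h>; Python's os module does not expose it.
FALLOC_FL_ZERO_RANGE = 0x10

# Assumption: 64-bit Linux, where off_t is a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def zero_with_fallocate(fd, offset, length):
    """Ask the filesystem to zero the range without a fallback.

    Returns True on success and False if the filesystem or kernel does not
    support FALLOC_FL_ZERO_RANGE. Other errors are raised as OSError.
    """
    if _libc.fallocate(fd, FALLOC_FL_ZERO_RANGE, offset, length) == 0:
        return True
    err = ctypes.get_errno()
    if err in (errno.EOPNOTSUPP, errno.ENOSYS):
        return False
    raise OSError(err, os.strerror(err))


def zero_with_write(fd, offset, length):
    """Zero the same range by writing explicit zero bytes."""
    zeros = bytes(min(length, 1 << 20))  # reuse a buffer of at most 1 MiB
    done = 0
    while done < length:
        chunk = min(len(zeros), length - done)
        # os.pwrite may write fewer bytes than requested, so loop.
        done += os.pwrite(fd, zeros[:chunk], offset + done)
```

Both helpers leave the same bytes in the file when they succeed. They
differ only in how the filesystem gets there, which is the difference the
experiment above could not attribute to one side or the other.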

The only thing I could narrow it down to is this:
The issue persists if handle_alloc_space() writes zeroes (with a
narrowed, aligned zero-write with NO_FALLBACK) only to the non-COW area,
and I keep skip_cow false.
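
To make the narrowed experiment concrete, the range arithmetic can be
sketched as below. This is an illustration only, not the actual
handle_alloc_space() change: the function and field names are
hypothetical, and the alignment value is an assumption standing in for
whatever request alignment the file driver needs.

```python
from typing import NamedTuple, Optional, Tuple

Range = Tuple[int, int]  # half-open byte range [start, end)


class Plan(NamedTuple):
    cow_head: Range               # allocated bytes before the guest data
    cow_tail: Range               # allocated bytes after the guest data
    zero_range: Optional[Range]   # aligned part of the non-COW area, if any


def align_down(value, alignment):
    return value - value % alignment


def align_up(value, alignment):
    return align_down(value + alignment - 1, alignment)


def plan_narrowed_zero_write(alloc_start, alloc_end,
                             data_start, data_end, alignment):
    """Split a new allocation into COW areas and a narrowed zero range.

    The zero range is the non-COW (guest data) area shrunk inwards to the
    given alignment. It is None when no aligned block fits inside it.
    """
    if not alloc_start <= data_start <= data_end <= alloc_end:
        raise ValueError("data range must lie inside the allocation")
    start = align_up(data_start, alignment)
    end = align_down(data_end, alignment)
    zero_range = (start, end) if start < end else None
    return Plan((alloc_start, data_start), (data_end, alloc_end), zero_range)
```

For a 64 KiB allocation with guest data at bytes 5000 to 20000 and an
assumed 4 KiB alignment, only the aligned range 8192 to 16384 would get the
zero-write, while the head and tail are still handled by regular COW.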

So there seems to be some kind of interaction between the zero-write and
the following write of data. I don’t know what kind of interaction that
is, though. I have tried to write a test case in qemu-img (basically
rewriting qemu-img bench), but have failed so far.

It certainly looks like a kernel issue, but without a simpler reproducer
I just cannot tell.

Max