From: Ilya Leoshkevich <iii@linux.ibm.com>
To: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
thuth@redhat.com,
"Christian Borntraeger" <borntraeger@linux.ibm.com>,
"Daniel P . Berrangé" <berrange@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
jinpu.wang@ionos.com, s.reiter@proxmox.com,
"Cornelia Huck" <cohuck@redhat.com>,
qemu-devel@nongnu.org, peterx@redhat.com, qemu-s390x@nongnu.org,
"Philippe Mathieu-Daudé" <philippe.mathieu.daude@gmail.com>,
hreitz@redhat.com, f.ebner@proxmox.com,
"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [PATCH] multifd: Copy pages before compressing them with zlib
Date: Mon, 04 Apr 2022 14:09:05 +0200
Message-ID: <76fd645d423ab0e835ef9de37aaeb9d857eae4e8.camel@linux.ibm.com>
In-Reply-To: <YkrUbt8Z+N5uenDT@work-vm>

On Mon, 2022-04-04 at 12:20 +0100, Dr. David Alan Gilbert wrote:
> * Ilya Leoshkevich (iii@linux.ibm.com) wrote:
> > zlib_send_prepare() compresses pages of a running VM. zlib does not
> > make any thread-safety guarantees with respect to changing deflate()
> > input concurrently with deflate() [1].
> >
> > One can observe problems due to this with the IBM zEnterprise Data
> > Compression accelerator capable zlib [2]. When the hardware
> > acceleration is enabled, migration/multifd/tcp/zlib test fails
> > intermittently [3] due to sliding window corruption.
> >
> > At the moment this problem occurs only with this accelerator, since
> > its architecture explicitly discourages concurrent accesses [4]:
> >
> > Page 26-57, "Other Conditions":
> >
> > As observed by this CPU, other CPUs, and channel
> > programs, references to the parameter block, first,
> > second, and third operands may be multiple-access
> > references, accesses to these storage locations are
> > not necessarily block-concurrent, and the sequence
> > of these accesses or references is undefined.
> >
> > Still, it might affect other platforms due to a future zlib update.
> > Therefore, copy the page being compressed into a private buffer before
> > passing it to zlib.
>
> While this might work around the problem; your explanation doesn't quite
> fit with the symptoms; or if they do, then you have a separate problem.
>
> The live migration code relies on the fact that the source is running
> and changing its memory as the data is transmitted; however it also
> relies on the fact that if this happens the 'dirty' flag is set _after_
> those changes causing another round of migration and retransmission of
> the (now stable) data.
>
> We don't expect the load of the data for the first page write to be
> correct, consistent etc - we just rely on the retransmission to be
> correct when the page is stable.
>
> If your compressor hardware is doing something undefined during the
> first case that's fine; as long as it works fine in the stable case
> where the data isn't changing.
>
> Adding the extra copy is going to slow everyone else down; and since
> there's plenty of pthread locking in those multifd I'm expecting them
> to get reasonably defined ordering and thus be safe from multi threading
> problems (please correct us if we've actually done something wrong in
> the locking there).
>
> IMHO your accelerator when called from a zlib call needs to behave
> the same as if it was the software implementation; i.e. if we've got
> pthread calls in there that are enforcing ordering then that should be
> fine; your accelerator implementation needs to add a barrier of some
> type or an internal copy, not penalise everyone else.
>
> Dave

The problem with the accelerator is that during the first case
(compressing a page while the guest is still modifying it) the
internal state might end up being corrupted. In particular, what goes
into the deflate stream may differ from what goes into the sliding
window. This may affect data integrity later on, in the second case,
when the now-stable page is retransmitted.

I've been trying to think what to do with that, and of course doing an
internal copy is one option (a barrier won't suffice). However, I
realized that the zlib API as documented doesn't guarantee that it's
safe to change input data concurrently with compression. On the other
hand, today's zlib is implemented in a way that tolerates this.

So the open question for me is whether we should honor the zlib
documentation (in which case, I would argue, QEMU needs to be changed)
or say that the behavior of today's zlib implementation is more
important (in which case the accelerator code needs to change). I went
with the former for now, but the latter is of course doable as well.
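
To make the "copy in the caller" option concrete, here is a minimal
sketch of the idea. It is written against Python's zlib bindings rather
than QEMU's C code, purely for brevity; compress_pages(), guest_ram,
PAGE_SIZE and the single sync flush at the end are assumptions made for
this illustration and are not taken from the patch.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size for this sketch


def compress_pages(guest_ram, offsets, page_size=PAGE_SIZE):
    """Compress the pages at `offsets` into one deflate stream.

    Each page is first copied into a private scratch buffer, so the
    compressor never reads memory that another thread may be writing.
    """
    zs = zlib.compressobj()
    scratch = bytearray(page_size)  # private buffer, reused per page
    view = memoryview(guest_ram)
    chunks = []
    for off in offsets:
        # Snapshot the page. Later writes to guest_ram cannot change
        # the bytes that the compressor is about to consume.
        scratch[:] = view[off:off + page_size]
        chunks.append(zs.compress(scratch))
    # Push out everything buffered so far without ending the stream.
    chunks.append(zs.flush(zlib.Z_SYNC_FLUSH))
    return b"".join(chunks)


# Usage: two pages of hypothetical guest memory round-trip intact.
ram = bytearray(b"\x5a" * (2 * PAGE_SIZE))
stream = compress_pages(ram, [0, PAGE_SIZE])
assert zlib.decompressobj().decompress(stream) == bytes(ram)
```

The cost is one extra page-sized copy per compressed page, which is the
overhead being debated above. The alternative is to perform the same
snapshot inside the accelerated zlib, so that only that implementation
pays for it.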