From: Peter Xu <peterx@redhat.com>
To: Yuan Liu <yuan1.liu@intel.com>
Cc: farosas@suse.de, leobras@redhat.com, qemu-devel@nongnu.org,
nanhai.zou@intel.com
Subject: Re: [PATCH v3 0/4] Live Migration Acceleration with IAA Compression
Date: Mon, 29 Jan 2024 18:42:44 +0800
Message-ID: <ZbeBJEDlqX51dlBN@x1n>
In-Reply-To: <20240103112851.908082-1-yuan1.liu@intel.com>
On Wed, Jan 03, 2024 at 07:28:47PM +0800, Yuan Liu wrote:
> Hi,
Hi, Yuan,
I have a few comments and questions. Many of them are pure questions, as
I don't know enough about these new technologies.
>
> I am writing to submit a code change aimed at enhancing live migration
> acceleration by leveraging the compression capability of the Intel
> In-Memory Analytics Accelerator (IAA).
>
> The implementation of the IAA (de)compression code is based on the
> Intel Query Processing Library (QPL), an open-source project that
> provides high-level software programming for the IAA:
> https://github.com/intel/qpl
>
> In the last version, there was some discussion about whether to
> introduce a new compression algorithm for IAA. Because the compression
> algorithm of the IAA hardware is based on deflate, and QPL already
> supports Zlib, in this version I implemented IAA as an accelerator for
> the Zlib compression method. However, for various reasons, QPL is
> currently not compatible with the existing Zlib method, in the sense
> that Zlib-compressed data cannot be decompressed by QPL, and vice versa.
>
> I have some concerns about the existing Zlib compression:
> 1. Would you consider having one channel support multi-stream
> compression? Of course, this may lead to a reduction in compression
> ratio, but it will allow the hardware to process each stream
> concurrently. We can have each stream process multiple pages,
> reducing the loss of compression ratio. For example, 128 pages are
> divided into 16 streams for independent compression. I will provide
> early performance data in the next version (v4).
I think Juan used to ask a similar question: how much can this help if
multifd can already achieve some form of concurrency over the pages?
Couldn't the user specify more multifd channels if they want to grant
more CPU resources for comp/decomp purposes?
IOW, how many concurrent channels can QPL provide? What is the suggested
number of concurrent channels there?
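Just so I'm sure I read the idea right, here is a rough sketch of what I
think "128 pages divided into 16 streams" means, using the libqpl job API
as I read its docs (compress_streams() and the output buffer layout are
made up for illustration; a real implementation would submit jobs with
qpl_submit_job() and reap them with qpl_wait_job() to get actual hardware
concurrency, rather than the synchronous qpl_execute_job() below):

    #include <qpl/qpl.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define PAGE_SIZE         4096
    #define PAGES             128
    #define STREAMS           16
    #define PAGES_PER_STREAM  (PAGES / STREAMS)

    static int compress_streams(uint8_t *in, uint8_t *out, uint32_t *out_len)
    {
        uint32_t job_size;

        if (qpl_get_job_size(qpl_path_auto, &job_size) != QPL_STS_OK) {
            return -1;
        }
        for (int s = 0; s < STREAMS; s++) {
            qpl_job *job = malloc(job_size);

            if (!job || qpl_init_job(qpl_path_auto, job) != QPL_STS_OK) {
                free(job);
                return -1;
            }
            /* Each group of 8 pages becomes one self-contained deflate
             * stream, so the 16 jobs are independent of each other. */
            job->op = qpl_op_compress;
            job->level = qpl_default_level;
            job->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST |
                         QPL_FLAG_DYNAMIC_HUFFMAN;
            job->next_in_ptr = in + s * PAGES_PER_STREAM * PAGE_SIZE;
            job->available_in = PAGES_PER_STREAM * PAGE_SIZE;
            job->next_out_ptr = out + s * PAGES_PER_STREAM * PAGE_SIZE;
            job->available_out = PAGES_PER_STREAM * PAGE_SIZE;

            if (qpl_execute_job(job) != QPL_STS_OK) {
                qpl_fini_job(job);
                free(job);
                return -1;
            }
            out_len[s] = job->total_out;
            qpl_fini_job(job);
            free(job);
        }
        return 0;
    }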
>
> 2. Would you consider using QPL/IAA as an independent compression
> algorithm instead of an accelerator? In this way, we can better
> utilize hardware performance and some features, such as IAA's
> canned mode, in which a Huffman table can be dynamically generated
> from statistics of the data to improve the compression ratio.
Maybe one more knob will work? If it's not compatible with the deflate
algo, maybe it should never be the default. IOW, the accelerators may be
extended into this (based on what you already proposed):
- auto ("qpl" first, "none" second; never "qpl-optimized")
- none (old zlib)
- qpl (qpl compatible)
- qpl-optimized (qpl incompatible)
Then "auto"/"none"/"qpl" will always be compatible; only the last one
isn't. The user can select it explicitly, but it must then be set on both
sides of the migration.
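For "auto", I'd imagine the resolution looking roughly like this (a
sketch only; the MultiFDCompressionAccel values and the
multifd_accel_is_available() probe are hypothetical names mirroring your
multifd-compression-accel parameter):

    static MultiFDCompressionAccel
    multifd_resolve_accel(MultiFDCompressionAccel accel)
    {
        if (accel == MULTIFD_COMPRESSION_ACCEL_AUTO) {
            /* "auto" tries "qpl" first and falls back to "none"; it must
             * never pick "qpl-optimized", whose stream format would not
             * be understood by a peer running plain zlib. */
            if (multifd_accel_is_available(MULTIFD_COMPRESSION_ACCEL_QPL)) {
                return MULTIFD_COMPRESSION_ACCEL_QPL;
            }
            return MULTIFD_COMPRESSION_ACCEL_NONE;
        }
        return accel;
    }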
>
> Test conditions:
> 1. Host CPUs are Sapphire Rapids, with the frequency locked to 3.4 GHz
> 2. VM: 16 vCPUs and 64 GB of memory
> 3. The Idle workload means no workload is running in the VM
> 4. The Redis workload means YCSB workloadb + a Redis server are running
> in the VM; about 20 GB or more of memory is used
> 5. Source side migration configuration commands
> a. migrate_set_capability multifd on
> b. migrate_set_parameter multifd-channels 2/4/8
> c. migrate_set_parameter downtime-limit 300
> d. migrate_set_parameter multifd-compression zlib
> e. migrate_set_parameter multifd-compression-accel none/qpl
> f. migrate_set_parameter max-bandwidth 100G
> 6. Destination side migration configuration commands
> a. migrate_set_capability multifd on
> b. migrate_set_parameter multifd-channels 2/4/8
> c. migrate_set_parameter multifd-compression zlib
> d. migrate_set_parameter multifd-compression-accel none/qpl
> e. migrate_set_parameter max-bandwidth 100G
How is zlib-level set up? Default (1)?
Btw, it seems neither the zlib nor the zstd level can actually be
configured right now.. probably overlooked in migrate_params_apply().
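If it's what I think, the fix may be as simple as adding the missing
assignments in migrate_params_apply(), something like the below
(untested, just mirroring how the neighbouring parameters are copied;
field names are from MigrationParameters):

    if (params->has_multifd_zlib_level) {
        s->parameters.multifd_zlib_level = params->multifd_zlib_level;
    }
    if (params->has_multifd_zstd_level) {
        s->parameters.multifd_zstd_level = params->multifd_zstd_level;
    }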
>
> Early migration results; each result is the average of three tests
> +--------+-------------+--------+--------+---------+----------+
> |        | The number  |total   |downtime|network  |pages per |
> |        | of channels |time(ms)|(ms)    |bandwidth|second    |
> |        | and mode    |        |        |(mbps)   |          |
> |        +-------------+--------+--------+---------+----------+
> |        | 2 chl, Zlib | 20647  | 22     | 195     | 137767   |
> |        +-------------+--------+--------+---------+----------+
> |  Idle  | 2 chl, IAA  | 17022  | 36     | 286     | 460289   |
> |workload+-------------+--------+--------+---------+----------+
> |        | 4 chl, Zlib | 18835  | 29     | 241     | 299028   |
> |        +-------------+--------+--------+---------+----------+
> |        | 4 chl, IAA  | 16280  | 32     | 298     | 652456   |
> |        +-------------+--------+--------+---------+----------+
> |        | 8 chl, Zlib | 17379  | 32     | 275     | 470591   |
> |        +-------------+--------+--------+---------+----------+
> |        | 8 chl, IAA  | 15551  | 46     | 313     | 1315784  |
> +--------+-------------+--------+--------+---------+----------+
The numbers are slightly confusing to me. If IAA can send 3x more pages
per second, shouldn't the total migration time be about 1/3 of zlib's
when the guest is idle? But the total times seem to be pretty close no
matter the number of channels. Maybe I missed something?
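To put numbers on it: with 2 channels, IAA reports 460289 / 137767 ≈ 3.3x
the pages per second of zlib, yet the total time only improves by
20647 / 17022 ≈ 1.2x; with 8 channels it's ≈2.8x on pages per second but
only ≈1.1x on total time.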
>
> +--------+-------------+--------+--------+---------+----------+
> |        | The number  |total   |downtime|network  |pages per |
> |        | of channels |time(ms)|(ms)    |bandwidth|second    |
> |        | and mode    |        |        |(mbps)   |          |
> |        +-------------+--------+--------+---------+----------+
> |        | 2 chl, Zlib | 100% failure, timeout is 120s        |
> |        +-------------+--------+--------+---------+----------+
> | Redis  | 2 chl, IAA  | 62737  | 115    | 4547    | 387911   |
> |workload+-------------+--------+--------+---------+----------+
> |        | 4 chl, Zlib | 30% failure, timeout is 120s         |
> |        +-------------+--------+--------+---------+----------+
> |        | 4 chl, IAA  | 54645  | 177    | 5382    | 656865   |
> |        +-------------+--------+--------+---------+----------+
> |        | 8 chl, Zlib | 93488  | 74     | 1264    | 129486   |
> |        +-------------+--------+--------+---------+----------+
> |        | 8 chl, IAA  | 24367  | 303    | 6901    | 964380   |
> +--------+-------------+--------+--------+---------+----------+
The Redis results favor IAA much more strongly than the idle tests do.
Does that mean IAA works less well with zero pages in general (assuming
those will be the majority in the idle test)?
From the manual, I see that IAA also supports encryption/decryption. Would
it be able to accelerate TLS?
How should one choose between IAA and QAT? What is the major difference?
I see that IAA requires IOMMU scalable mode; why? Is it because the IAA
hardware is attached to the PCIe bus (I assume QAT is the same)?
Thanks,
--
Peter Xu