- * [PATCH 1/5] configure: add qpl meson option
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
@ 2023-10-18 22:12 ` Yuan Liu
  2023-10-19 11:12   ` Juan Quintela
  2023-10-18 22:12 ` [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter Yuan Liu
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Yuan Liu @ 2023-10-18 22:12 UTC (permalink / raw)
  To: quintela, peterx, farosas, leobras; +Cc: qemu-devel, yuan1.liu, nanhai.zou
Intel Query Processing Library (QPL) is an open-source library that
supports features of the new Intel In-Memory Analytics Accelerator (IAA)
available on Intel Xeon Sapphire Rapids processors, including
high-throughput compression and decompression.
Add --enable-qpl and --disable-qpl options for data (de)compression
using IAA during the live migration process.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
---
 meson.build                   | 9 ++++++++-
 meson_options.txt             | 2 ++
 scripts/meson-buildoptions.sh | 3 +++
 3 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/meson.build b/meson.build
index 79aef19bdc..0a69bf68cf 100644
--- a/meson.build
+++ b/meson.build
@@ -1032,6 +1032,11 @@ if not get_option('zstd').auto() or have_block
                     required: get_option('zstd'),
                     method: 'pkg-config')
 endif
+qpl = not_found
+if not get_option('qpl').auto()
+    qpl = dependency('libqpl', required: get_option('qpl'),
+                     method: 'pkg-config')
+endif
 virgl = not_found
 
 have_vhost_user_gpu = have_tools and targetos == 'linux' and pixman.found()
@@ -2158,6 +2163,7 @@ config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
 config_host_data.set('CONFIG_STATX', has_statx)
 config_host_data.set('CONFIG_STATX_MNT_ID', has_statx_mnt_id)
 config_host_data.set('CONFIG_ZSTD', zstd.found())
+config_host_data.set('CONFIG_QPL', qpl.found())
 config_host_data.set('CONFIG_FUSE', fuse.found())
 config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found())
 config_host_data.set('CONFIG_SPICE_PROTOCOL', spice_protocol.found())
@@ -3616,7 +3622,7 @@ libmigration = static_library('migration', sources: migration_files + genh,
                               name_suffix: 'fa',
                               build_by_default: false)
 migration = declare_dependency(link_with: libmigration,
-                               dependencies: [zlib, qom, io])
+                               dependencies: [zlib, qom, io, qpl])
 system_ss.add(migration)
 
 block_ss = block_ss.apply(config_targetos, strict: false)
@@ -4281,6 +4287,7 @@ summary_info += {'blkio support':     blkio}
 summary_info += {'curl support':      curl}
 summary_info += {'Multipath support': mpathpersist}
 summary_info += {'Linux AIO support': libaio}
+summary_info += {'Query Processing Library support': qpl}
 summary_info += {'Linux io_uring support': linux_io_uring}
 summary_info += {'ATTR/XATTR support': libattr}
 summary_info += {'RDMA support':      rdma}
diff --git a/meson_options.txt b/meson_options.txt
index 6a17b90968..e8e7e37893 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -251,6 +251,8 @@ option('xkbcommon', type : 'feature', value : 'auto',
        description: 'xkbcommon support')
 option('zstd', type : 'feature', value : 'auto',
        description: 'zstd compression support')
+option('qpl', type : 'feature', value : 'auto',
+       description: 'Query Processing Library support')
 option('fuse', type: 'feature', value: 'auto',
        description: 'FUSE block device export')
 option('fuse_lseek', type : 'feature', value : 'auto',
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 2a74b0275b..e2adb13ce5 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -206,6 +206,7 @@ meson_options_help() {
   printf "%s\n" '                  Xen PCI passthrough support'
   printf "%s\n" '  xkbcommon       xkbcommon support'
   printf "%s\n" '  zstd            zstd compression support'
+  printf "%s\n" '  qpl             Query Processing Library support'
 }
 _meson_option_parse() {
   case $1 in
@@ -417,6 +418,8 @@ _meson_option_parse() {
     --disable-qga-vss) printf "%s" -Dqga_vss=disabled ;;
     --enable-qom-cast-debug) printf "%s" -Dqom_cast_debug=true ;;
     --disable-qom-cast-debug) printf "%s" -Dqom_cast_debug=false ;;
+    --enable-qpl) printf "%s" -Dqpl=enabled ;;
+    --disable-qpl) printf "%s" -Dqpl=disabled ;;
     --enable-rbd) printf "%s" -Drbd=enabled ;;
     --disable-rbd) printf "%s" -Drbd=disabled ;;
     --enable-rdma) printf "%s" -Drdma=enabled ;;
-- 
2.39.3
- * Re: [PATCH 1/5] configure: add qpl meson option
  2023-10-18 22:12 ` [PATCH 1/5] configure: add qpl meson option Yuan Liu
@ 2023-10-19 11:12   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:12 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> Intel Query Processing Library (QPL) is an open-source library that
> supports features of the new Intel In-Memory Analytics Accelerator (IAA)
> available on Intel Xeon Sapphire Rapids processors, including
> high-throughput compression and decompression.
>
> add --enable-qpl and --disable-qpl options for data (de)compression
> using IAA during the live migration process.
>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> @@ -2158,6 +2163,7 @@ config_host_data.set('CONFIG_MALLOC_TRIM', has_malloc_trim)
>  config_host_data.set('CONFIG_STATX', has_statx)
>  config_host_data.set('CONFIG_STATX_MNT_ID', has_statx_mnt_id)
>  config_host_data.set('CONFIG_ZSTD', zstd.found())
> +config_host_data.set('CONFIG_QPL', qpl.found())
>  config_host_data.set('CONFIG_FUSE', fuse.found())
>  config_host_data.set('CONFIG_FUSE_LSEEK', fuse_lseek.found())
>  config_host_data.set('CONFIG_SPICE_PROTOCOL', spice_protocol.found())
> @@ -3616,7 +3622,7 @@ libmigration = static_library('migration', sources: migration_files + genh,
>                                name_suffix: 'fa',
>                                build_by_default: false)
>  migration = declare_dependency(link_with: libmigration,
> -                               dependencies: [zlib, qom, io])
> +                               dependencies: [zlib, qom, io, qpl])
I think this is wrong.  Look at how zstd is done.
You need to add something like this:
system_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
to migration/meson.build
Or, asking it another way: does this work if qpl is not there?  Or if
you compile for anything that is not x86?
Later, Juan.
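
To make the question above concrete, here is a minimal C sketch of how
the migration code could still build and run on a host without libqpl.
It assumes only that CONFIG_QPL is set by the build system, as in the
patch. The helper names iaa_backend_available() and
pick_compress_backend() are hypothetical and do not appear in the series.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * In a real build CONFIG_QPL would come from the generated config header.
 * It is left undefined here to model a host (or architecture) without libqpl.
 */

enum compress_backend { BACKEND_CPU, BACKEND_IAA };

/* Hypothetical helper: true only when QPL support was compiled in. */
static bool iaa_backend_available(void)
{
#ifdef CONFIG_QPL
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: use the CPU path when IAA is requested but not built. */
static enum compress_backend pick_compress_backend(bool want_iaa)
{
    if (want_iaa && iaa_backend_available()) {
        return BACKEND_IAA;
    }
    return BACKEND_CPU;
}
```

With a guard of this kind, code that calls into QPL can be kept in a
source file that is only compiled when the dependency is found, which is
the pattern the zstd line above follows.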
 
- * [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
  2023-10-18 22:12 ` [PATCH 1/5] configure: add qpl meson option Yuan Liu
@ 2023-10-18 22:12 ` Yuan Liu
  2023-10-19 11:15   ` Juan Quintela
  2023-10-19 14:02   ` Peter Xu
  2023-10-18 22:12 ` [PATCH 3/5] ram compress: Refactor ram compression functions Yuan Liu
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Yuan Liu @ 2023-10-18 22:12 UTC (permalink / raw)
  To: quintela, peterx, farosas, leobras; +Cc: qemu-devel, yuan1.liu, nanhai.zou
Introduce the compress-with-iaa=on/off option to enable or disable live
migration data (de)compression with the In-Memory Analytics Accelerator
(IAA).
The data (de)compression with IAA feature is based on the migration
compression capability, which is enabled by setting
migrate_set_capability compress on. If the migration compression
capability is enabled and the IAA compression parameter is set, IAA will
be used instead of CPU for data (de)compression.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
---
 migration/migration-hmp-cmds.c |  8 ++++++++
 migration/options.c            | 20 ++++++++++++++++++++
 migration/options.h            |  1 +
 qapi/migration.json            |  4 +++-
 4 files changed, 32 insertions(+), 1 deletion(-)
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index c115ef2d23..38e441bb37 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -281,6 +281,10 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict)
         monitor_printf(mon, "%s: %u\n",
             MigrationParameter_str(MIGRATION_PARAMETER_COMPRESS_THREADS),
             params->compress_threads);
+        assert(params->has_compress_with_iaa);
+        monitor_printf(mon, "%s: %s\n",
+            MigrationParameter_str(MIGRATION_PARAMETER_COMPRESS_WITH_IAA),
+            params->compress_with_iaa ? "on" : "off");
         assert(params->has_compress_wait_thread);
         monitor_printf(mon, "%s: %s\n",
             MigrationParameter_str(MIGRATION_PARAMETER_COMPRESS_WAIT_THREAD),
@@ -517,6 +521,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
         p->has_compress_threads = true;
         visit_type_uint8(v, param, &p->compress_threads, &err);
         break;
+    case MIGRATION_PARAMETER_COMPRESS_WITH_IAA:
+        p->has_compress_with_iaa = true;
+        visit_type_bool(v, param, &p->compress_with_iaa, &err);
+        break;
     case MIGRATION_PARAMETER_COMPRESS_WAIT_THREAD:
         p->has_compress_wait_thread = true;
         visit_type_bool(v, param, &p->compress_wait_thread, &err);
diff --git a/migration/options.c b/migration/options.c
index 1d1e1321b0..06d4b36b77 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -107,6 +107,8 @@ Property migration_properties[] = {
     DEFINE_PROP_UINT8("x-compress-threads", MigrationState,
                       parameters.compress_threads,
                       DEFAULT_MIGRATE_COMPRESS_THREAD_COUNT),
+    DEFINE_PROP_BOOL("x-compress-with-iaa", MigrationState,
+                      parameters.compress_with_iaa, false),
     DEFINE_PROP_BOOL("x-compress-wait-thread", MigrationState,
                       parameters.compress_wait_thread, true),
     DEFINE_PROP_UINT8("x-decompress-threads", MigrationState,
@@ -724,6 +726,13 @@ int migrate_compress_threads(void)
     return s->parameters.compress_threads;
 }
 
+bool migrate_compress_with_iaa(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->parameters.compress_with_iaa;
+}
+
 int migrate_compress_wait_thread(void)
 {
     MigrationState *s = migrate_get_current();
@@ -899,6 +908,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->compress_level = s->parameters.compress_level;
     params->has_compress_threads = true;
     params->compress_threads = s->parameters.compress_threads;
+    params->has_compress_with_iaa = true;
+    params->compress_with_iaa = s->parameters.compress_with_iaa;
     params->has_compress_wait_thread = true;
     params->compress_wait_thread = s->parameters.compress_wait_thread;
     params->has_decompress_threads = true;
@@ -969,6 +980,7 @@ void migrate_params_init(MigrationParameters *params)
     /* Set has_* up only for parameter checks */
     params->has_compress_level = true;
     params->has_compress_threads = true;
+    params->has_compress_with_iaa = true;
     params->has_compress_wait_thread = true;
     params->has_decompress_threads = true;
     params->has_throttle_trigger_threshold = true;
@@ -1195,6 +1207,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
         dest->decompress_threads = params->decompress_threads;
     }
 
+    if (params->has_compress_with_iaa) {
+        dest->compress_with_iaa = params->compress_with_iaa;
+    }
+
     if (params->has_throttle_trigger_threshold) {
         dest->throttle_trigger_threshold = params->throttle_trigger_threshold;
     }
@@ -1300,6 +1316,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         s->parameters.decompress_threads = params->decompress_threads;
     }
 
+    if (params->has_compress_with_iaa) {
+        s->parameters.compress_with_iaa = params->compress_with_iaa;
+    }
+
     if (params->has_throttle_trigger_threshold) {
         s->parameters.throttle_trigger_threshold = params->throttle_trigger_threshold;
     }
diff --git a/migration/options.h b/migration/options.h
index 045e2a41a2..926d723d0e 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -77,6 +77,7 @@ uint8_t migrate_cpu_throttle_increment(void);
 uint8_t migrate_cpu_throttle_initial(void);
 bool migrate_cpu_throttle_tailslow(void);
 int migrate_decompress_threads(void);
+bool migrate_compress_with_iaa(void);
 uint64_t migrate_downtime_limit(void);
 uint8_t migrate_max_cpu_throttle(void);
 uint64_t migrate_max_bandwidth(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 8843e74b59..8edc622dd9 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -835,7 +835,7 @@
 { 'enum': 'MigrationParameter',
   'data': ['announce-initial', 'announce-max',
            'announce-rounds', 'announce-step',
-           'compress-level', 'compress-threads', 'decompress-threads',
+           'compress-level', 'compress-threads', 'compress-with-iaa', 'decompress-threads',
            'compress-wait-thread', 'throttle-trigger-threshold',
            'cpu-throttle-initial', 'cpu-throttle-increment',
            'cpu-throttle-tailslow',
@@ -1008,6 +1008,7 @@
             '*announce-step': 'size',
             '*compress-level': 'uint8',
             '*compress-threads': 'uint8',
+            '*compress-with-iaa': 'bool',
             '*compress-wait-thread': 'bool',
             '*decompress-threads': 'uint8',
             '*throttle-trigger-threshold': 'uint8',
@@ -1208,6 +1209,7 @@
             '*announce-step': 'size',
             '*compress-level': 'uint8',
             '*compress-threads': 'uint8',
+            '*compress-with-iaa': 'bool',
             '*compress-wait-thread': 'bool',
             '*decompress-threads': 'uint8',
             '*throttle-trigger-threshold': 'uint8',
-- 
2.39.3
- * Re: [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter
  2023-10-18 22:12 ` [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter Yuan Liu
@ 2023-10-19 11:15   ` Juan Quintela
  2023-10-19 14:02   ` Peter Xu
  1 sibling, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:15 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> Introduce the compress-with-iaa=on/off option to enable or disable live
> migration data (de)compression with the In-Memory Analytics Accelerator
> (IAA).
>
> The data (de)compression with IAA feature is based on the migration
> compression capability, which is enabled by setting
> migrate_set_capability compress on. If the migration compression
> capability is enabled and the IAA compression parameter is set, IAA will
> be used instead of CPU for data (de)compression.
>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> @@ -724,6 +726,13 @@ int migrate_compress_threads(void)
>      return s->parameters.compress_threads;
>  }
>  
> +bool migrate_compress_with_iaa(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +
> +    return s->parameters.compress_with_iaa;
> +}
> +
This should be in migration/options.c
> @@ -77,6 +77,7 @@ uint8_t migrate_cpu_throttle_increment(void);
>  uint8_t migrate_cpu_throttle_initial(void);
>  bool migrate_cpu_throttle_tailslow(void);
>  int migrate_decompress_threads(void);
> +bool migrate_compress_with_iaa(void);
>  uint64_t migrate_downtime_limit(void);
>  uint8_t migrate_max_cpu_throttle(void);
>  uint64_t migrate_max_bandwidth(void);
This list of functions is sorted; the same applies in migration/options.c.
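
As an illustration of the semantics described in the quoted commit
message, the following C sketch models how the compress capability and
the compress-with-iaa parameter would combine. It is a simplified
stand-in, not QEMU code: struct demo_migration_config and
demo_select_compressor() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the migration state; not QEMU's MigrationState. */
struct demo_migration_config {
    bool compress_capability;  /* models "migrate_set_capability compress on" */
    bool compress_with_iaa;    /* models the compress-with-iaa parameter */
};

enum demo_compressor { DEMO_NONE, DEMO_CPU, DEMO_IAA };

/* Hypothetical helper: decide which (de)compression path would be taken. */
static enum demo_compressor
demo_select_compressor(const struct demo_migration_config *cfg)
{
    assert(cfg != NULL);
    if (!cfg->compress_capability) {
        /* Without the capability the IAA parameter has no effect. */
        return DEMO_NONE;
    }
    return cfg->compress_with_iaa ? DEMO_IAA : DEMO_CPU;
}
```

The point of the sketch is that the parameter only selects between two
compressors; it does not enable compression by itself.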
- * Re: [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter
  2023-10-18 22:12 ` [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter Yuan Liu
  2023-10-19 11:15   ` Juan Quintela
@ 2023-10-19 14:02   ` Peter Xu
  1 sibling, 0 replies; 25+ messages in thread
From: Peter Xu @ 2023-10-19 14:02 UTC (permalink / raw)
  To: Yuan Liu; +Cc: quintela, farosas, leobras, qemu-devel, nanhai.zou
On Thu, Oct 19, 2023 at 06:12:21AM +0800, Yuan Liu wrote:
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 8843e74b59..8edc622dd9 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -835,7 +835,7 @@
>  { 'enum': 'MigrationParameter',
>    'data': ['announce-initial', 'announce-max',
>             'announce-rounds', 'announce-step',
> -           'compress-level', 'compress-threads', 'decompress-threads',
> +           'compress-level', 'compress-threads', 'compress-with-iaa', 'decompress-threads',
>             'compress-wait-thread', 'throttle-trigger-threshold',
>             'cpu-throttle-initial', 'cpu-throttle-increment',
>             'cpu-throttle-tailslow',
> @@ -1008,6 +1008,7 @@
>              '*announce-step': 'size',
>              '*compress-level': 'uint8',
>              '*compress-threads': 'uint8',
> +            '*compress-with-iaa': 'bool',
>              '*compress-wait-thread': 'bool',
>              '*decompress-threads': 'uint8',
>              '*throttle-trigger-threshold': 'uint8',
> @@ -1208,6 +1209,7 @@
>              '*announce-step': 'size',
>              '*compress-level': 'uint8',
>              '*compress-threads': 'uint8',
> +            '*compress-with-iaa': 'bool',
>              '*compress-wait-thread': 'bool',
>              '*decompress-threads': 'uint8',
>              '*throttle-trigger-threshold': 'uint8',
Please also add comments for the new fields in qapi/, thanks.
-- 
Peter Xu
 
- * [PATCH 3/5] ram compress: Refactor ram compression functions
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
  2023-10-18 22:12 ` [PATCH 1/5] configure: add qpl meson option Yuan Liu
  2023-10-18 22:12 ` [PATCH 2/5] qapi/migration: Introduce compress-with-iaa migration parameter Yuan Liu
@ 2023-10-18 22:12 ` Yuan Liu
  2023-10-19 11:19   ` Juan Quintela
  2023-10-18 22:12 ` [PATCH 4/5] migration iaa-compress: Add IAA initialization and deinitialization Yuan Liu
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Yuan Liu @ 2023-10-18 22:12 UTC (permalink / raw)
  To: quintela, peterx, farosas, leobras; +Cc: qemu-devel, yuan1.liu, nanhai.zou
Refactor legacy RAM compression functions to support both IAA
compression and CPU compression.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
---
 migration/migration.c    |  6 +--
 migration/ram-compress.c | 81 ++++++++++++++++++++++++++++++++--------
 migration/ram-compress.h | 10 ++---
 migration/ram.c          | 18 ++++++---
 4 files changed, 86 insertions(+), 29 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 585d3c8f55..08a9c313d0 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -237,7 +237,7 @@ void migration_incoming_state_destroy(void)
     struct MigrationIncomingState *mis = migration_incoming_get_current();
 
     multifd_load_cleanup();
-    compress_threads_load_cleanup();
+    ram_compress_load_cleanup();
 
     if (mis->to_src_file) {
         /* Tell source that we are done */
@@ -524,7 +524,7 @@ process_incoming_migration_co(void *opaque)
 
     assert(mis->from_src_file);
 
-    if (compress_threads_load_setup(mis->from_src_file)) {
+    if (ram_compress_load_setup(mis->from_src_file)) {
         error_report("Failed to setup decompress threads");
         goto fail;
     }
@@ -577,7 +577,7 @@ fail:
     qemu_fclose(mis->from_src_file);
 
     multifd_load_cleanup();
-    compress_threads_load_cleanup();
+    ram_compress_load_cleanup();
 
     exit(EXIT_FAILURE);
 }
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 06254d8c69..47357352f7 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -105,11 +105,11 @@ static void *do_data_compress(void *opaque)
     return NULL;
 }
 
-void compress_threads_save_cleanup(void)
+static void compress_threads_save_cleanup(void)
 {
     int i, thread_count;
 
-    if (!migrate_compress() || !comp_param) {
+    if (!comp_param) {
         return;
     }
 
@@ -144,13 +144,10 @@ void compress_threads_save_cleanup(void)
     comp_param = NULL;
 }
 
-int compress_threads_save_setup(void)
+static int compress_threads_save_setup(void)
 {
     int i, thread_count;
 
-    if (!migrate_compress()) {
-        return 0;
-    }
     thread_count = migrate_compress_threads();
     compress_threads = g_new0(QemuThread, thread_count);
     comp_param = g_new0(CompressParam, thread_count);
@@ -370,6 +367,11 @@ int wait_for_decompress_done(void)
         return 0;
     }
 
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+        return 0;
+    }
+
     thread_count = migrate_decompress_threads();
     qemu_mutex_lock(&decomp_done_lock);
     for (idx = 0; idx < thread_count; idx++) {
@@ -381,13 +383,10 @@ int wait_for_decompress_done(void)
     return qemu_file_get_error(decomp_file);
 }
 
-void compress_threads_load_cleanup(void)
+static void compress_threads_load_cleanup(void)
 {
     int i, thread_count;
 
-    if (!migrate_compress()) {
-        return;
-    }
     thread_count = migrate_decompress_threads();
     for (i = 0; i < thread_count; i++) {
         /*
@@ -422,14 +421,10 @@ void compress_threads_load_cleanup(void)
     decomp_file = NULL;
 }
 
-int compress_threads_load_setup(QEMUFile *f)
+static int compress_threads_load_setup(QEMUFile *f)
 {
     int i, thread_count;
 
-    if (!migrate_compress()) {
-        return 0;
-    }
-
     thread_count = migrate_decompress_threads();
     decompress_threads = g_new0(QemuThread, thread_count);
     decomp_param = g_new0(DecompressParam, thread_count);
@@ -457,7 +452,7 @@ exit:
     return -1;
 }
 
-void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len)
+static void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len)
 {
     int idx, thread_count;
 
@@ -483,3 +478,57 @@ void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len)
         }
     }
 }
+
+int ram_compress_save_setup(void)
+{
+    if (!migrate_compress()) {
+        return 0;
+    }
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+        return 0;
+    }
+    return compress_threads_save_setup();
+}
+
+void ram_compress_save_cleanup(void)
+{
+    if (!migrate_compress()) {
+        return;
+    }
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+        return;
+    }
+    compress_threads_save_cleanup();
+}
+
+void ram_decompress_data(QEMUFile *f, void *host, int len)
+{
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+    }
+    decompress_data_with_multi_threads(f, host, len);
+}
+
+int ram_compress_load_setup(QEMUFile *f)
+{
+    if (!migrate_compress()) {
+        return 0;
+    }
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+    }
+    return compress_threads_load_setup(f);
+}
+
+void ram_compress_load_cleanup(void)
+{
+    if (!migrate_compress()) {
+        return;
+    }
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+    }
+    compress_threads_load_cleanup();
+}
diff --git a/migration/ram-compress.h b/migration/ram-compress.h
index 6f7fe2f472..382083acf6 100644
--- a/migration/ram-compress.h
+++ b/migration/ram-compress.h
@@ -55,16 +55,16 @@ struct CompressParam {
 };
 typedef struct CompressParam CompressParam;
 
-void compress_threads_save_cleanup(void);
-int compress_threads_save_setup(void);
+void ram_compress_save_cleanup(void);
+int ram_compress_save_setup(void);
 
 void flush_compressed_data(int (send_queued_data(CompressParam *)));
 int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
                                 int (send_queued_data(CompressParam *)));
 
 int wait_for_decompress_done(void);
-void compress_threads_load_cleanup(void);
-int compress_threads_load_setup(QEMUFile *f);
-void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len);
+void ram_compress_load_cleanup(void);
+int ram_compress_load_setup(QEMUFile *f);
+void ram_decompress_data(QEMUFile *f, void *host, int len);
 
 #endif
diff --git a/migration/ram.c b/migration/ram.c
index e4bfd39f08..34ee1de332 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1347,6 +1347,10 @@ static void ram_flush_compressed_data(RAMState *rs)
     if (!save_page_use_compression(rs)) {
         return;
     }
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+        return;
+    }
 
     flush_compressed_data(send_queued_data);
 }
@@ -2099,6 +2103,10 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
         return false;
     }
 
+    if (migrate_compress_with_iaa()) {
+        /* Implement in next patch */
+        return true;
+    }
     if (compress_page_with_multi_thread(block, offset, send_queued_data) > 0) {
         return true;
     }
@@ -2498,7 +2506,7 @@ static void ram_save_cleanup(void *opaque)
     }
 
     xbzrle_cleanup();
-    compress_threads_save_cleanup();
+    ram_compress_save_cleanup();
     ram_state_cleanup(rsp);
     g_free(migration_ops);
     migration_ops = NULL;
@@ -3023,14 +3031,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
     RAMBlock *block;
     int ret;
 
-    if (compress_threads_save_setup()) {
+    if (ram_compress_save_setup()) {
         return -1;
     }
 
     /* migration has already setup the bitmap, reuse it. */
     if (!migration_in_colo_state()) {
         if (ram_init_all(rsp) != 0) {
-            compress_threads_save_cleanup();
+            ram_compress_save_cleanup();
             return -1;
         }
     }
@@ -3753,7 +3761,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
                 ret = -EINVAL;
                 break;
             }
-            decompress_data_with_multi_threads(f, page_buffer, len);
+            ram_decompress_data(f, page_buffer, len);
             break;
         case RAM_SAVE_FLAG_MULTIFD_FLUSH:
             multifd_recv_sync_main();
@@ -4022,7 +4030,7 @@ static int ram_load_precopy(QEMUFile *f)
                 ret = -EINVAL;
                 break;
             }
-            decompress_data_with_multi_threads(f, host, len);
+            ram_decompress_data(f, host, len);
             break;
 
         case RAM_SAVE_FLAG_XBZRLE:
-- 
2.39.3
- * Re: [PATCH 3/5] ram compress: Refactor ram compression functions
  2023-10-18 22:12 ` [PATCH 3/5] ram compress: Refactor ram compression functions Yuan Liu
@ 2023-10-19 11:19   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:19 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> Refactor legacy RAM compression functions to support both IAA
> compression and CPU compression.
>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
Compression code is declared obsolete (see patches on list).
I don't think it is a good idea to put new things there.
And here you are doing two things:
- changing the prefix of several functions from compress_threads to ram_compress
- adding several hooks where you can add the IAA acceleration
Please split this into two different patches:
- rename/create new functions
- put the migrate_compress_with_iaa() hooks
Later, Juan.
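
The requested split can be sketched in C as two steps on one wrapper.
This is a self-contained model with hypothetical demo_* names, not the
code from the series. Step 1 only introduces the renamed wrapper and
keeps the existing behaviour; step 2 adds the IAA hook on top of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for migrate_compress() and migrate_compress_with_iaa(). */
static bool demo_compress_enabled;
static bool demo_iaa_enabled;

/* Counts calls into the thread-based path, for illustration only. */
static int demo_thread_setup_calls;

static int demo_threads_load_setup(void)
{
    demo_thread_setup_calls++;
    return 0;
}

/* Step 1 (rename only): the wrapper behaves exactly like the old entry point. */
static int demo_ram_compress_load_setup_step1(void)
{
    if (!demo_compress_enabled) {
        return 0;
    }
    return demo_threads_load_setup();
}

/* Step 2 (hook): the IAA branch returns before the thread path is reached. */
static int demo_ram_compress_load_setup_step2(void)
{
    if (!demo_compress_enabled) {
        return 0;
    }
    if (demo_iaa_enabled) {
        /* IAA setup would go here. */
        return 0;
    }
    return demo_threads_load_setup();
}
```

Note that the sketch returns early from the IAA branch. In the posted
patch, the placeholder branches in ram_decompress_data(),
ram_compress_load_setup() and ram_compress_load_cleanup() have no return,
so with the parameter on they fall through to the thread path until a
later patch fills them in.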
> ---
>  migration/migration.c    |  6 +--
>  migration/ram-compress.c | 81 ++++++++++++++++++++++++++++++++--------
>  migration/ram-compress.h | 10 ++---
>  migration/ram.c          | 18 ++++++---
>  4 files changed, 86 insertions(+), 29 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 585d3c8f55..08a9c313d0 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -237,7 +237,7 @@ void migration_incoming_state_destroy(void)
>      struct MigrationIncomingState *mis = migration_incoming_get_current();
>  
>      multifd_load_cleanup();
> -    compress_threads_load_cleanup();
> +    ram_compress_load_cleanup();
>  
>      if (mis->to_src_file) {
>          /* Tell source that we are done */
> @@ -524,7 +524,7 @@ process_incoming_migration_co(void *opaque)
>  
>      assert(mis->from_src_file);
>  
> -    if (compress_threads_load_setup(mis->from_src_file)) {
> +    if (ram_compress_load_setup(mis->from_src_file)) {
>          error_report("Failed to setup decompress threads");
>          goto fail;
>      }
> @@ -577,7 +577,7 @@ fail:
>      qemu_fclose(mis->from_src_file);
>  
>      multifd_load_cleanup();
> -    compress_threads_load_cleanup();
> +    ram_compress_load_cleanup();
>  
>      exit(EXIT_FAILURE);
>  }
> diff --git a/migration/ram-compress.c b/migration/ram-compress.c
> index 06254d8c69..47357352f7 100644
> --- a/migration/ram-compress.c
> +++ b/migration/ram-compress.c
> @@ -105,11 +105,11 @@ static void *do_data_compress(void *opaque)
>      return NULL;
>  }
>  
> -void compress_threads_save_cleanup(void)
> +static void compress_threads_save_cleanup(void)
>  {
>      int i, thread_count;
>  
> -    if (!migrate_compress() || !comp_param) {
> +    if (!comp_param) {
>          return;
>      }
>  
> @@ -144,13 +144,10 @@ void compress_threads_save_cleanup(void)
>      comp_param = NULL;
>  }
>  
> -int compress_threads_save_setup(void)
> +static int compress_threads_save_setup(void)
>  {
>      int i, thread_count;
>  
> -    if (!migrate_compress()) {
> -        return 0;
> -    }
>      thread_count = migrate_compress_threads();
>      compress_threads = g_new0(QemuThread, thread_count);
>      comp_param = g_new0(CompressParam, thread_count);
> @@ -370,6 +367,11 @@ int wait_for_decompress_done(void)
>          return 0;
>      }
>  
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +        return 0;
> +    }
> +
>      thread_count = migrate_decompress_threads();
>      qemu_mutex_lock(&decomp_done_lock);
>      for (idx = 0; idx < thread_count; idx++) {
> @@ -381,13 +383,10 @@ int wait_for_decompress_done(void)
>      return qemu_file_get_error(decomp_file);
>  }
>  
> -void compress_threads_load_cleanup(void)
> +static void compress_threads_load_cleanup(void)
>  {
>      int i, thread_count;
>  
> -    if (!migrate_compress()) {
> -        return;
> -    }
>      thread_count = migrate_decompress_threads();
>      for (i = 0; i < thread_count; i++) {
>          /*
> @@ -422,14 +421,10 @@ void compress_threads_load_cleanup(void)
>      decomp_file = NULL;
>  }
>  
> -int compress_threads_load_setup(QEMUFile *f)
> +static int compress_threads_load_setup(QEMUFile *f)
>  {
>      int i, thread_count;
>  
> -    if (!migrate_compress()) {
> -        return 0;
> -    }
> -
>      thread_count = migrate_decompress_threads();
>      decompress_threads = g_new0(QemuThread, thread_count);
>      decomp_param = g_new0(DecompressParam, thread_count);
> @@ -457,7 +452,7 @@ exit:
>      return -1;
>  }
>  
> -void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len)
> +static void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len)
>  {
>      int idx, thread_count;
>  
> @@ -483,3 +478,57 @@ void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len)
>          }
>      }
>  }
> +
> +int ram_compress_save_setup(void)
> +{
> +    if (!migrate_compress()) {
> +        return 0;
> +    }
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +        return 0;
> +    }
> +    return compress_threads_save_setup();
> +}
> +
> +void ram_compress_save_cleanup(void)
> +{
> +    if (!migrate_compress()) {
> +        return;
> +    }
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +        return;
> +    }
> +    compress_threads_save_cleanup();
> +}
> +
> +void ram_decompress_data(QEMUFile *f, void *host, int len)
> +{
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +    }
> +    decompress_data_with_multi_threads(f, host, len);
> +}
> +
> +int ram_compress_load_setup(QEMUFile *f)
> +{
> +    if (!migrate_compress()) {
> +        return 0;
> +    }
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +    }
> +    return compress_threads_load_setup(f);
> +}
> +
> +void ram_compress_load_cleanup(void)
> +{
> +    if (!migrate_compress()) {
> +        return;
> +    }
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +    }
> +    compress_threads_load_cleanup();
> +}
> diff --git a/migration/ram-compress.h b/migration/ram-compress.h
> index 6f7fe2f472..382083acf6 100644
> --- a/migration/ram-compress.h
> +++ b/migration/ram-compress.h
> @@ -55,16 +55,16 @@ struct CompressParam {
>  };
>  typedef struct CompressParam CompressParam;
>  
> -void compress_threads_save_cleanup(void);
> -int compress_threads_save_setup(void);
> +void ram_compress_save_cleanup(void);
> +int ram_compress_save_setup(void);
>  
>  void flush_compressed_data(int (send_queued_data(CompressParam *)));
>  int compress_page_with_multi_thread(RAMBlock *block, ram_addr_t offset,
>                                  int (send_queued_data(CompressParam *)));
>  
>  int wait_for_decompress_done(void);
> -void compress_threads_load_cleanup(void);
> -int compress_threads_load_setup(QEMUFile *f);
> -void decompress_data_with_multi_threads(QEMUFile *f, void *host, int len);
> +void ram_compress_load_cleanup(void);
> +int ram_compress_load_setup(QEMUFile *f);
> +void ram_decompress_data(QEMUFile *f, void *host, int len);
>  
>  #endif
> diff --git a/migration/ram.c b/migration/ram.c
> index e4bfd39f08..34ee1de332 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1347,6 +1347,10 @@ static void ram_flush_compressed_data(RAMState *rs)
>      if (!save_page_use_compression(rs)) {
>          return;
>      }
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +        return;
> +    }
>  
>      flush_compressed_data(send_queued_data);
>  }
> @@ -2099,6 +2103,10 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
>          return false;
>      }
>  
> +    if (migrate_compress_with_iaa()) {
> +        /* Implement in next patch */
> +        return true;
> +    }
>      if (compress_page_with_multi_thread(block, offset, send_queued_data) > 0) {
>          return true;
>      }
> @@ -2498,7 +2506,7 @@ static void ram_save_cleanup(void *opaque)
>      }
>  
>      xbzrle_cleanup();
> -    compress_threads_save_cleanup();
> +    ram_compress_save_cleanup();
>      ram_state_cleanup(rsp);
>      g_free(migration_ops);
>      migration_ops = NULL;
> @@ -3023,14 +3031,14 @@ static int ram_save_setup(QEMUFile *f, void *opaque)
>      RAMBlock *block;
>      int ret;
>  
> -    if (compress_threads_save_setup()) {
> +    if (ram_compress_save_setup()) {
>          return -1;
>      }
>  
>      /* migration has already setup the bitmap, reuse it. */
>      if (!migration_in_colo_state()) {
>          if (ram_init_all(rsp) != 0) {
> -            compress_threads_save_cleanup();
> +            ram_compress_save_cleanup();
>              return -1;
>          }
>      }
> @@ -3753,7 +3761,7 @@ int ram_load_postcopy(QEMUFile *f, int channel)
>                  ret = -EINVAL;
>                  break;
>              }
> -            decompress_data_with_multi_threads(f, page_buffer, len);
> +            ram_decompress_data(f, page_buffer, len);
>              break;
>          case RAM_SAVE_FLAG_MULTIFD_FLUSH:
>              multifd_recv_sync_main();
> @@ -4022,7 +4030,7 @@ static int ram_load_precopy(QEMUFile *f)
>                  ret = -EINVAL;
>                  break;
>              }
> -            decompress_data_with_multi_threads(f, host, len);
> +            ram_decompress_data(f, host, len);
>              break;
>  
>          case RAM_SAVE_FLAG_XBZRLE:
 
- * [PATCH 4/5] migration iaa-compress: Add IAA initialization and deinitialization
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
                   ` (2 preceding siblings ...)
  2023-10-18 22:12 ` [PATCH 3/5] ram compress: Refactor ram compression functions Yuan Liu
@ 2023-10-18 22:12 ` Yuan Liu
  2023-10-19 11:27   ` Juan Quintela
  2023-10-18 22:12 ` [PATCH 5/5] migration iaa-compress: Implement IAA compression Yuan Liu
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Yuan Liu @ 2023-10-18 22:12 UTC (permalink / raw)
  To: quintela, peterx, farosas, leobras; +Cc: qemu-devel, yuan1.liu, nanhai.zou
This patch defines the structure for IAA jobs related to data
compression and decompression, as well as the initialization and
deinitialization processes for IAA.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
---
 migration/iaa-ram-compress.c | 152 +++++++++++++++++++++++++++++++++++
 migration/iaa-ram-compress.h |  20 +++++
 migration/meson.build        |   1 +
 migration/ram-compress.c     |  21 +++--
 4 files changed, 189 insertions(+), 5 deletions(-)
 create mode 100644 migration/iaa-ram-compress.c
 create mode 100644 migration/iaa-ram-compress.h
diff --git a/migration/iaa-ram-compress.c b/migration/iaa-ram-compress.c
new file mode 100644
index 0000000000..da45952594
--- /dev/null
+++ b/migration/iaa-ram-compress.c
@@ -0,0 +1,152 @@
+/*
+ * QEMU IAA compression support
+ *
+ * Copyright (c) 2023 Intel Corporation
+ *  Written by:
+ *  Yuan Liu<yuan1.liu@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "migration.h"
+#include "options.h"
+#include "io/channel-null.h"
+#include "exec/target_page.h"
+#include "exec/ramblock.h"
+#include "iaa-ram-compress.h"
+#include "qpl/qpl.h"
+
+/* The IAA work queue maximum depth */
+#define IAA_JOB_NUM (512)
+
+typedef struct {
+    CompressResult result;
+    ram_addr_t offset; /* The offset of the compressed page in the block */
+    RAMBlock *block; /* The block of the compressed page */
+} iaa_comp_param;
+
+typedef struct {
+    uint8_t *host; /* Target address for decompression page */
+} iaa_decomp_param;
+
+typedef struct IaaJob {
+    QSIMPLEQ_ENTRY(IaaJob) entry;
+    bool is_compression;
+    uint32_t in_len;
+    uint32_t out_len;
+    uint8_t *in_buf;
+    uint8_t *out_buf;
+    qpl_job *qpl; /* It is used to submit (de)compression work to IAA */
+    union {
+        iaa_comp_param comp;
+        iaa_decomp_param decomp;
+    } param;
+} IaaJob;
+
+typedef struct IaaJobPool {
+    uint32_t pos;
+    uint32_t cnt;
+    IaaJob *jobs[IAA_JOB_NUM];
+    uint8_t *job_in_buf; /* The IAA device input buffers for all IAA jobs */
+    uint8_t *job_out_buf; /* The IAA device output buffers for all IAA jobs */
+    size_t buf_size;
+} IaaJobPool;
+
+static IaaJobPool iaa_job_pool;
+/* This is used to record jobs that have been submitted but not yet completed */
+static QSIMPLEQ_HEAD(, IaaJob) polling_queue =
+                                   QSIMPLEQ_HEAD_INITIALIZER(polling_queue);
+
+void iaa_compress_deinit(void)
+{
+    for (int i = 0; i < IAA_JOB_NUM; i++) {
+        if (iaa_job_pool.jobs[i]) {
+            if (iaa_job_pool.jobs[i]->qpl) {
+                qpl_fini_job(iaa_job_pool.jobs[i]->qpl);
+                g_free(iaa_job_pool.jobs[i]->qpl);
+            }
+            g_free(iaa_job_pool.jobs[i]);
+        }
+    }
+    if (iaa_job_pool.job_in_buf) {
+        munmap(iaa_job_pool.job_in_buf, iaa_job_pool.buf_size);
+        iaa_job_pool.job_in_buf = NULL;
+    }
+    if (iaa_job_pool.job_out_buf) {
+        munmap(iaa_job_pool.job_out_buf, iaa_job_pool.buf_size);
+        iaa_job_pool.job_out_buf = NULL;
+    }
+}
+
+int iaa_compress_init(bool is_decompression)
+{
+    qpl_status status;
+    IaaJob *job = NULL;
+    uint32_t qpl_hw_size = 0;
+    int flags = MAP_PRIVATE | MAP_POPULATE | MAP_ANONYMOUS;
+    size_t buf_size = IAA_JOB_NUM * qemu_target_page_size();
+
+    QSIMPLEQ_INIT(&polling_queue);
+    memset(&iaa_job_pool, 0, sizeof(IaaJobPool));
+    iaa_job_pool.buf_size = buf_size;
+    iaa_job_pool.job_out_buf = mmap(NULL, buf_size, PROT_READ | PROT_WRITE,
+                                    flags, -1, 0);
+    if (iaa_job_pool.job_out_buf == MAP_FAILED) {
+        error_report("Failed to allocate iaa output buffer, error %s",
+                     strerror(errno));
+        return -1;
+    }
+    /*
+     * There is no need to allocate an input buffer for the compression
+     * function: the IAA hardware can directly access the virtual machine
+     * memory via the host address through Shared Virtual Memory (SVM).
+     */
+    if (is_decompression) {
+        iaa_job_pool.job_in_buf = mmap(NULL, buf_size, PROT_READ | PROT_WRITE,
+                                       flags, -1, 0);
+        if (iaa_job_pool.job_in_buf == MAP_FAILED) {
+            error_report("Failed to allocate iaa input buffer, error %s",
+                         strerror(errno));
+            goto init_err;
+        }
+    }
+    status = qpl_get_job_size(qpl_path_hardware, &qpl_hw_size);
+    if (status != QPL_STS_OK) {
+        error_report("Failed to initialize iaa hardware, error %d", status);
+        goto init_err;
+    }
+    for (int i = 0; i < IAA_JOB_NUM; i++) {
+        size_t buf_offset = qemu_target_page_size() * i;
+        job = g_try_malloc0(sizeof(IaaJob));
+        if (!job) {
+            error_report("Failed to allocate iaa job memory, error %s",
+                         strerror(errno));
+            goto init_err;
+        }
+        iaa_job_pool.jobs[i] = job;
+        job->qpl = g_try_malloc0(qpl_hw_size);
+        if (!job->qpl) {
+            error_report("Failed to allocate iaa qpl memory, error %s",
+                         strerror(errno));
+            goto init_err;
+        }
+        if (is_decompression) {
+            job->in_buf = iaa_job_pool.job_in_buf + buf_offset;
+        }
+        job->out_buf = iaa_job_pool.job_out_buf + buf_offset;
+        status = qpl_init_job(qpl_path_hardware, job->qpl);
+        if (status != QPL_STS_OK) {
+            error_report("Failed to initialize iaa qpl, error %d", status);
+            goto init_err;
+        }
+    }
+    return 0;
+init_err:
+    iaa_compress_deinit();
+    return -1;
+}
diff --git a/migration/iaa-ram-compress.h b/migration/iaa-ram-compress.h
new file mode 100644
index 0000000000..27998b255b
--- /dev/null
+++ b/migration/iaa-ram-compress.h
@@ -0,0 +1,20 @@
+/*
+ * QEMU IAA compression support
+ *
+ * Copyright (c) 2023 Intel Corporation
+ *  Written by:
+ *  Yuan Liu<yuan1.liu@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_MIGRATION_IAA_COMPRESS_H
+#define QEMU_MIGRATION_IAA_COMPRESS_H
+#include "qemu-file.h"
+#include "ram-compress.h"
+
+int iaa_compress_init(bool is_decompression);
+void iaa_compress_deinit(void);
+#endif
diff --git a/migration/meson.build b/migration/meson.build
index 92b1cc4297..9131815420 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -40,6 +40,7 @@ if get_option('live_block_migration').allowed()
   system_ss.add(files('block.c'))
 endif
 system_ss.add(when: zstd, if_true: files('multifd-zstd.c'))
+system_ss.add(when: qpl, if_true: files('iaa-ram-compress.c'))
 
 specific_ss.add(when: 'CONFIG_SYSTEM_ONLY',
                 if_true: files('ram.c',
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index 47357352f7..acc511ce57 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -30,6 +30,9 @@
 #include "qemu/cutils.h"
 
 #include "ram-compress.h"
+#ifdef CONFIG_QPL
+#include "iaa-ram-compress.h"
+#endif
 
 #include "qemu/error-report.h"
 #include "migration.h"
@@ -484,10 +487,11 @@ int ram_compress_save_setup(void)
     if (!migrate_compress()) {
         return 0;
     }
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
-        return 0;
+        return iaa_compress_init(false);
     }
+#endif
     return compress_threads_save_setup();
 }
 
@@ -496,10 +500,12 @@ void ram_compress_save_cleanup(void)
     if (!migrate_compress()) {
         return;
     }
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
+        iaa_compress_deinit();
         return;
     }
+#endif
     compress_threads_save_cleanup();
 }
 
@@ -516,9 +522,11 @@ int ram_compress_load_setup(QEMUFile *f)
     if (!migrate_compress()) {
         return 0;
     }
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
+        return iaa_compress_init(true);
     }
+#endif
     return compress_threads_load_setup(f);
 }
 
@@ -527,8 +535,11 @@ void ram_compress_load_cleanup(void)
     if (!migrate_compress()) {
         return;
     }
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
+        iaa_compress_deinit();
+        return;
     }
+#endif
     compress_threads_load_cleanup();
 }
-- 
2.39.3
- * Re: [PATCH 4/5] migration iaa-compress: Add IAA initialization and deinitialization
  2023-10-18 22:12 ` [PATCH 4/5] migration iaa-compress: Add IAA initialization and deinitialization Yuan Liu
@ 2023-10-19 11:27   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:27 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> This patch defines the structure for IAA jobs related to data
> compression and decompression, as well as the initialization and
> deinitialization processes for IAA.
>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
You should be using the orderfile:
$ less .git/config
...
[diff]
        orderFile = scripts/git.orderfile
That way, .h files and friends come first in patches.
> diff --git a/migration/ram-compress.c b/migration/ram-compress.c
> index 47357352f7..acc511ce57 100644
> --- a/migration/ram-compress.c
> +++ b/migration/ram-compress.c
> @@ -30,6 +30,9 @@
>  #include "qemu/cutils.h"
>  
>  #include "ram-compress.h"
> +#ifdef CONFIG_QPL
> +#include "iaa-ram-compress.h"
> +#endif
>  
>  #include "qemu/error-report.h"
>  #include "migration.h"
> @@ -484,10 +487,11 @@ int ram_compress_save_setup(void)
>      if (!migrate_compress()) {
>          return 0;
>      }
> +#ifdef CONFIG_QPL
>      if (migrate_compress_with_iaa()) {
> -        /* Implement in next patch */
> -        return 0;
> +        return iaa_compress_init(false);
>      }
> +#endif
>      return compress_threads_save_setup();
>  }
>  
> @@ -496,10 +500,12 @@ void ram_compress_save_cleanup(void)
>      if (!migrate_compress()) {
>          return;
>      }
> +#ifdef CONFIG_QPL
>      if (migrate_compress_with_iaa()) {
> -        /* Implement in next patch */
> +        iaa_compress_deinit();
>          return;
>      }
> +#endif
>      compress_threads_save_cleanup();
>  }
>  
> @@ -516,9 +522,11 @@ int ram_compress_load_setup(QEMUFile *f)
>      if (!migrate_compress()) {
>          return 0;
>      }
> +#ifdef CONFIG_QPL
>      if (migrate_compress_with_iaa()) {
> -        /* Implement in next patch */
> +        return iaa_compress_init(true);
>      }
> +#endif
>      return compress_threads_load_setup(f);
>  }
>  
> @@ -527,8 +535,11 @@ void ram_compress_load_cleanup(void)
>      if (!migrate_compress()) {
>          return;
>      }
> +#ifdef CONFIG_QPL
>      if (migrate_compress_with_iaa()) {
> -        /* Implement in next patch */
> +        iaa_compress_deinit();
> +        return;
>      }
> +#endif
>      compress_threads_load_cleanup();
>  }
I think it would be easier to understand and implement if you dropped
patch 3 and, wherever there is a call to compress_threads_load_cleanup(),
just added a call to iaa_load_cleanup() next to it.
The same goes for everything else.
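One possible reading of that suggestion, as a minimal standalone sketch.
The two bool flags stand in for migrate_compress() and
migrate_compress_with_iaa(), the counters only record which path ran, and
load_cleanup_sketch() is a hypothetical call site. The assumption that each
helper checks its own enable condition is mine, not something the series
does today:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for migrate_compress() and migrate_compress_with_iaa(). */
static bool compress_enabled;
static bool iaa_enabled;

/* Counters that only exist so the sketch can be observed. */
static int threads_cleaned;
static int iaa_cleaned;

/* Thread-based path: does nothing unless it is the active backend. */
static void compress_threads_load_cleanup(void)
{
    if (!compress_enabled || iaa_enabled) {
        return;
    }
    threads_cleaned++;
}

/* IAA path: does nothing unless it is the active backend. */
static void iaa_load_cleanup(void)
{
    if (!compress_enabled || !iaa_enabled) {
        return;
    }
    iaa_cleaned++;
}

/* Hypothetical call site: both helpers are called back to back. */
static void load_cleanup_sketch(void)
{
    compress_threads_load_cleanup();
    iaa_load_cleanup();
}
```

With this shape the call sites in ram.c stay flat, and no ram_compress_*()
wrapper layer is needed.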
Later, Juan.
 
- * [PATCH 5/5] migration iaa-compress: Implement IAA compression
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
                   ` (3 preceding siblings ...)
  2023-10-18 22:12 ` [PATCH 4/5] migration iaa-compress: Add IAA initialization and deinitialization Yuan Liu
@ 2023-10-18 22:12 ` Yuan Liu
  2023-10-19 11:36   ` Juan Quintela
  2023-10-19 11:13 ` [PATCH 0/5] Live Migration Acceleration with IAA Compression Juan Quintela
  2023-10-19 11:40 ` Juan Quintela
  6 siblings, 1 reply; 25+ messages in thread
From: Yuan Liu @ 2023-10-18 22:12 UTC (permalink / raw)
  To: quintela, peterx, farosas, leobras; +Cc: qemu-devel, yuan1.liu, nanhai.zou
Implement the functions of IAA for data compression and decompression.
The implementation uses non-blocking job submission and polling to check
the job completion status to reduce IAA's overhead in the live migration
process.
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
---
 migration/iaa-ram-compress.c | 167 +++++++++++++++++++++++++++++++++++
 migration/iaa-ram-compress.h |   7 ++
 migration/ram-compress.c     |  10 ++-
 migration/ram.c              |  56 ++++++++++--
 4 files changed, 232 insertions(+), 8 deletions(-)
diff --git a/migration/iaa-ram-compress.c b/migration/iaa-ram-compress.c
index da45952594..243aeb6d55 100644
--- a/migration/iaa-ram-compress.c
+++ b/migration/iaa-ram-compress.c
@@ -12,6 +12,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/cutils.h"
+
 #include "qemu/error-report.h"
 #include "migration.h"
 #include "options.h"
@@ -62,6 +63,31 @@ static IaaJobPool iaa_job_pool;
 static QSIMPLEQ_HEAD(, IaaJob) polling_queue =
                                    QSIMPLEQ_HEAD_INITIALIZER(polling_queue);
 
+static IaaJob *get_job(send_iaa_data send_page)
+{
+    IaaJob *job;
+
+retry:
+    /* Wait for a job to complete when there is no available job */
+    if (iaa_job_pool.cnt == IAA_JOB_NUM) {
+        flush_iaa_jobs(false, send_page);
+        goto retry;
+    }
+    job = iaa_job_pool.jobs[iaa_job_pool.pos];
+    iaa_job_pool.pos++;
+    iaa_job_pool.cnt++;
+    if (iaa_job_pool.pos == IAA_JOB_NUM) {
+        iaa_job_pool.pos = 0;
+    }
+    return job;
+}
+
+static void put_job(IaaJob *job)
+{
+    assert(iaa_job_pool.cnt > 0);
+    iaa_job_pool.cnt--;
+}
+
 void iaa_compress_deinit(void)
 {
     for (int i = 0; i < IAA_JOB_NUM; i++) {
@@ -150,3 +176,144 @@ init_err:
     iaa_compress_deinit();
     return -1;
 }
+
+static void process_completed_job(IaaJob *job, send_iaa_data send_page)
+{
+    if (job->is_compression) {
+        send_page(job->param.comp.block, job->param.comp.offset,
+                  job->out_buf, job->out_len, job->param.comp.result);
+    } else {
+        assert(job->out_len == qemu_target_page_size());
+        memcpy(job->param.decomp.host, job->out_buf, job->out_len);
+    }
+    put_job(job);
+}
+
+static qpl_status check_job_status(IaaJob *job, bool block)
+{
+    qpl_status status;
+    qpl_job *qpl = job->qpl;
+
+    status = block ? qpl_wait_job(qpl) : qpl_check_job(qpl);
+    if (status == QPL_STS_OK) {
+        job->out_len = qpl->total_out;
+        if (job->is_compression) {
+            job->param.comp.result = RES_COMPRESS;
+            /* if no compression benefit, send a normal page for migration */
+            if (job->out_len == qemu_target_page_size()) {
+                iaa_comp_param *param = &(job->param.comp);
+                memcpy(job->out_buf, (param->block->host + param->offset),
+                       job->out_len);
+                job->param.comp.result = RES_NONE;
+            }
+        }
+    } else if (status == QPL_STS_MORE_OUTPUT_NEEDED) {
+        if (job->is_compression) {
+            /*
+             * if the compressed data is larger than the original data, send a
+             * normal page for migration, in this case, IAA has copied the
+             * original data to job->out_buf automatically.
+             */
+            job->out_len = qemu_target_page_size();
+            job->param.comp.result = RES_NONE;
+            status = QPL_STS_OK;
+        }
+    }
+    return status;
+}
+
+static void check_polling_jobs(send_iaa_data send_page)
+{
+    IaaJob *job, *job_next;
+    qpl_status status;
+
+    QSIMPLEQ_FOREACH_SAFE(job, &polling_queue, entry, job_next) {
+        status = check_job_status(job, false);
+        if (status == QPL_STS_OK) { /* job has done */
+            process_completed_job(job, send_page);
+            QSIMPLEQ_REMOVE_HEAD(&polling_queue, entry);
+        } else if (status == QPL_STS_BEING_PROCESSED) { /* job is running */
+            break;
+        } else {
+            abort();
+        }
+    }
+}
+
+static int submit_new_job(IaaJob *job)
+{
+    qpl_status status;
+    qpl_job *qpl = job->qpl;
+
+    qpl->op = job->is_compression ? qpl_op_compress : qpl_op_decompress;
+    qpl->next_in_ptr = job->in_buf;
+    qpl->next_out_ptr = job->out_buf;
+    qpl->available_in = job->in_len;
+    qpl->available_out = qemu_target_page_size(); /* outbuf maximum size */
+    qpl->flags = QPL_FLAG_FIRST | QPL_FLAG_LAST | QPL_FLAG_OMIT_VERIFY;
+    qpl->level = 1; /* only level 1 compression is supported */
+
+    do {
+        status = qpl_submit_job(qpl);
+    } while (status == QPL_STS_QUEUES_ARE_BUSY_ERR);
+
+    if (status != QPL_STS_OK) {
+        error_report("Failed to submit iaa job, error %d", status);
+        return -1;
+    }
+    QSIMPLEQ_INSERT_TAIL(&polling_queue, job, entry);
+    return 0;
+}
+
+int flush_iaa_jobs(bool flush_all_jobs, send_iaa_data send_page)
+{
+    IaaJob *job, *job_next;
+
+    QSIMPLEQ_FOREACH_SAFE(job, &polling_queue, entry, job_next) {
+        if (check_job_status(job, true) != QPL_STS_OK) {
+            return -1;
+        }
+        process_completed_job(job, send_page);
+        QSIMPLEQ_REMOVE_HEAD(&polling_queue, entry);
+        if (!flush_all_jobs) {
+            break;
+        }
+    }
+    return 0;
+}
+
+int compress_page_with_iaa(RAMBlock *block, ram_addr_t offset,
+                           send_iaa_data send_page)
+{
+    IaaJob *job;
+
+    if (iaa_job_pool.cnt != 0) {
+        check_polling_jobs(send_page);
+    }
+    if (buffer_is_zero(block->host + offset, qemu_target_page_size())) {
+        send_page(block, offset, NULL, 0, RES_ZEROPAGE);
+        return 1;
+    }
+    job = get_job(send_page);
+    job->is_compression = true;
+    job->in_buf = block->host + offset;
+    job->in_len = qemu_target_page_size();
+    job->param.comp.offset = offset;
+    job->param.comp.block = block;
+    return (submit_new_job(job) == 0 ? 1 : 0);
+}
+
+int decompress_data_with_iaa(QEMUFile *f, void *host, int len)
+{
+    IaaJob *job;
+
+    if (iaa_job_pool.cnt != 0) {
+        check_polling_jobs(NULL);
+    }
+    job = get_job(NULL);
+    job->is_compression = false;
+    qemu_get_buffer(f, job->in_buf, len);
+    job->in_len = len;
+    job->param.decomp.host = host;
+    return submit_new_job(job);
+}
diff --git a/migration/iaa-ram-compress.h b/migration/iaa-ram-compress.h
index 27998b255b..5a555b3b8d 100644
--- a/migration/iaa-ram-compress.h
+++ b/migration/iaa-ram-compress.h
@@ -15,6 +15,13 @@
 #include "qemu-file.h"
 #include "ram-compress.h"
 
+typedef int (*send_iaa_data) (RAMBlock *block, ram_addr_t offset, uint8_t *data,
+                              uint32_t data_len, CompressResult result);
+
 int iaa_compress_init(bool is_decompression);
 void iaa_compress_deinit(void);
+int compress_page_with_iaa(RAMBlock *block, ram_addr_t offset,
+                           send_iaa_data send_page);
+int decompress_data_with_iaa(QEMUFile *f, void *host, int len);
+int flush_iaa_jobs(bool flush_all_jobs, send_iaa_data send_page);
 #endif
diff --git a/migration/ram-compress.c b/migration/ram-compress.c
index acc511ce57..0bddf8b9ea 100644
--- a/migration/ram-compress.c
+++ b/migration/ram-compress.c
@@ -370,10 +370,11 @@ int wait_for_decompress_done(void)
         return 0;
     }
 
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
-        return 0;
+        return flush_iaa_jobs(true, NULL);
     }
+#endif
 
     thread_count = migrate_decompress_threads();
     qemu_mutex_lock(&decomp_done_lock);
@@ -511,9 +512,12 @@ void ram_compress_save_cleanup(void)
 
 void ram_decompress_data(QEMUFile *f, void *host, int len)
 {
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
+        decompress_data_with_iaa(f, host, len);
+        return;
     }
+#endif
     decompress_data_with_multi_threads(f, host, len);
 }
 
diff --git a/migration/ram.c b/migration/ram.c
index 34ee1de332..5ef818112c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -69,6 +69,9 @@
 #include "qemu/userfaultfd.h"
 #endif /* defined(__linux__) */
 
+#ifdef CONFIG_QPL
+#include "iaa-ram-compress.h"
+#endif
 /***********************************************************/
 /* ram save/restore */
 
@@ -1342,16 +1345,59 @@ static int send_queued_data(CompressParam *param)
     return len;
 }
 
+#ifdef CONFIG_QPL
+static int send_iaa_compressed_page(RAMBlock *block, ram_addr_t offset,
+                                    uint8_t *data, uint32_t data_len,
+                                    CompressResult result)
+{
+    PageSearchStatus *pss = &ram_state->pss[RAM_CHANNEL_PRECOPY];
+    MigrationState *ms = migrate_get_current();
+    QEMUFile *file = ms->to_dst_file;
+    int len = 0;
+
+    assert(block == pss->last_sent_block);
+    if (result == RES_ZEROPAGE) {
+        len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_ZERO);
+        qemu_put_byte(file, 0);
+        len += 1;
+        ram_release_page(block->idstr, offset);
+        stat64_add(&mig_stats.zero_pages, 1);
+    } else if (result == RES_COMPRESS) {
+        assert(data != NULL);
+        assert((data_len > 0) && (data_len < qemu_target_page_size()));
+        len += save_page_header(pss, file, block,
+                                offset | RAM_SAVE_FLAG_COMPRESS_PAGE);
+        qemu_put_be32(file, data_len);
+        qemu_put_buffer(file, data, data_len);
+        len += data_len;
+        /* 8 means a header with RAM_SAVE_FLAG_CONTINUE. */
+        compression_counters.compressed_size += len - 8;
+        compression_counters.pages++;
+    } else if (result == RES_NONE) {
+        assert((data != NULL) && (data_len == TARGET_PAGE_SIZE));
+        len += save_page_header(pss, file, block, offset | RAM_SAVE_FLAG_PAGE);
+        qemu_put_buffer(file, data, data_len);
+        len += data_len;
+        stat64_add(&mig_stats.normal_pages, 1);
+    } else {
+        abort();
+    }
+    ram_transferred_add(len);
+    return len;
+}
+#endif
+
 static void ram_flush_compressed_data(RAMState *rs)
 {
     if (!save_page_use_compression(rs)) {
         return;
     }
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
+        flush_iaa_jobs(true, send_iaa_compressed_page);
         return;
     }
-
+#endif
     flush_compressed_data(send_queued_data);
 }
 
@@ -2102,11 +2148,11 @@ static bool save_compress_page(RAMState *rs, PageSearchStatus *pss,
         ram_flush_compressed_data(rs);
         return false;
     }
-
+#ifdef CONFIG_QPL
     if (migrate_compress_with_iaa()) {
-        /* Implement in next patch */
-        return true;
+        return compress_page_with_iaa(block, offset, send_iaa_compressed_page);
     }
+#endif
     if (compress_page_with_multi_thread(block, offset, send_queued_data) > 0) {
         return true;
     }
-- 
2.39.3
- * Re: [PATCH 5/5] migration iaa-compress: Implement IAA compression
  2023-10-18 22:12 ` [PATCH 5/5] migration iaa-compress: Implement IAA compression Yuan Liu
@ 2023-10-19 11:36   ` Juan Quintela
  0 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:36 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> Implement the functions of IAA for data compression and decompression.
> The implementation uses non-blocking job submission and polling to check
> the job completion status to reduce IAA's overhead in the live migration
> process.
>
> Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
> +static void process_completed_job(IaaJob *job, send_iaa_data send_page)
> +{
> +    if (job->is_compression) {
> +        send_page(job->param.comp.block, job->param.comp.offset,
> +                  job->out_buf, job->out_len, job->param.comp.result);
> +    } else {
> +        assert(job->out_len == qemu_target_page_size());
> +        memcpy(job->param.decomp.host, job->out_buf, job->out_len);
> +    }
> +    put_job(job);
> +}
Wouldn't it be easier to add a helper to the job struct and not have that
if here?  I.e. it becomes:
static void process_completed_job(IaaJob *job, send_iaa_data send_page)
{
    job->completed(job, send_page);
    put_job(job);
}
And do proper initializations.  You can even put the send_page callback
in the job struct.
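A minimal standalone sketch of that shape, assuming stand-in types rather
than the real QEMU/QPL ones. DemoJob, demo_send_fn, comp_completed(),
decomp_completed(), record_send() and DEMO_PAGE_SIZE are hypothetical names
used only for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 16u /* stand-in for qemu_target_page_size() */

typedef struct DemoJob DemoJob;
typedef int (*demo_send_fn)(const uint8_t *data, uint32_t len);

struct DemoJob {
    uint8_t out_buf[DEMO_PAGE_SIZE];
    uint32_t out_len;
    uint8_t *host;                    /* decompression target only */
    demo_send_fn send_page;           /* stored per job, as suggested */
    void (*completed)(DemoJob *job);  /* set once when the job is set up */
};

static int pool_cnt;           /* number of jobs currently in flight */
static uint32_t last_sent_len; /* written by the example sink below */

/* Example sink that only records how many bytes it was asked to send. */
static int record_send(const uint8_t *data, uint32_t len)
{
    (void)data;
    last_sent_len = len;
    return (int)len;
}

static void put_job(DemoJob *job)
{
    (void)job;
    assert(pool_cnt > 0);
    pool_cnt--;
}

/* Compression: hand the output buffer to the send callback. */
static void comp_completed(DemoJob *job)
{
    job->send_page(job->out_buf, job->out_len);
}

/* Decompression: copy the page to its final location. */
static void decomp_completed(DemoJob *job)
{
    memcpy(job->host, job->out_buf, job->out_len);
}

/* No is_compression branch left here. */
static void process_completed_job(DemoJob *job)
{
    job->completed(job);
    put_job(job);
}
```

The compression and decompression entry points would then set completed
(and send_page) when they fill in the job, so the completion path never has
to test is_compression.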
> +static qpl_status check_job_status(IaaJob *job, bool block)
> +{
> +    qpl_status status;
> +    qpl_job *qpl = job->qpl;
> +
> +    status = block ? qpl_wait_job(qpl) : qpl_check_job(qpl);
> +    if (status == QPL_STS_OK) {
> +        job->out_len = qpl->total_out;
> +        if (job->is_compression) {
> +            job->param.comp.result = RES_COMPRESS;
> +            /* if no compression benefit, send a normal page for migration */
> +            if (job->out_len == qemu_target_page_size()) {
> +                iaa_comp_param *param = &(job->param.comp);
> +                memcpy(job->out_buf, (param->block->host + param->offset),
> +                       job->out_len);
> +                job->param.comp.result = RES_NONE;
> +            }
> +        }
> +    } else if (status == QPL_STS_MORE_OUTPUT_NEEDED) {
> +        if (job->is_compression) {
> +            /*
> +             * if the compressed data is larger than the original data, send a
> +             * normal page for migration, in this case, IAA has copied the
> +             * original data to job->out_buf automatically.
> +             */
> +            job->out_len = qemu_target_page_size();
> +            job->param.comp.result = RES_NONE;
> +            status = QPL_STS_OK;
> +        }
> +    }
Again, for decompression this function shrinks to just:
    status = block ? qpl_wait_job(qpl) : qpl_check_job(qpl);
    if (status == QPL_STS_OK) {
        job->out_len = qpl->total_out;
    }
Why complicate it?
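A minimal standalone sketch of splitting the check per direction. The
demo_status and demo_result enums, DemoJob and query() are stand-ins I made
up for qpl_status, CompressResult, IaaJob and
qpl_wait_job()/qpl_check_job(). The memcpy of the original page in the
"no benefit" case is left out to keep the sketch short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u /* stand-in for qemu_target_page_size() */

typedef enum {
    STS_OK,
    STS_BEING_PROCESSED,
    STS_MORE_OUTPUT_NEEDED
} demo_status;

typedef enum { DEMO_RES_NONE, DEMO_RES_COMPRESS } demo_result;

typedef struct {
    demo_status hw_status; /* what the fake device will report */
    uint32_t total_out;    /* bytes the fake device produced */
    uint32_t out_len;
    demo_result result;
} DemoJob;

/* Stand-in for qpl_wait_job()/qpl_check_job(): reports the canned status. */
static demo_status query(const DemoJob *job, bool block)
{
    (void)block;
    return job->hw_status;
}

/* Decompression: nothing to decide beyond recording the output length. */
static demo_status check_decomp_status(DemoJob *job, bool block)
{
    demo_status status = query(job, block);

    if (status == STS_OK) {
        job->out_len = job->total_out;
    }
    return status;
}

/* Compression: also decide whether the page is sent compressed or raw. */
static demo_status check_comp_status(DemoJob *job, bool block)
{
    demo_status status = query(job, block);

    if (status == STS_OK) {
        job->out_len = job->total_out;
        job->result = job->out_len < DEMO_PAGE_SIZE ? DEMO_RES_COMPRESS
                                                    : DEMO_RES_NONE;
    } else if (status == STS_MORE_OUTPUT_NEEDED) {
        /* Output did not fit in one page: fall back to a normal page. */
        job->out_len = DEMO_PAGE_SIZE;
        job->result = DEMO_RES_NONE;
        status = STS_OK;
    }
    return status;
}
```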
> +static void check_polling_jobs(send_iaa_data send_page)
> +{
> +    IaaJob *job, *job_next;
> +    qpl_status status;
> +
> +    QSIMPLEQ_FOREACH_SAFE(job, &polling_queue, entry, job_next) {
> +        status = check_job_status(job, false);
> +        if (status == QPL_STS_OK) { /* job has done */
> +            process_completed_job(job, send_page);
> +            QSIMPLEQ_REMOVE_HEAD(&polling_queue, entry);
> +        } else if (status == QPL_STS_BEING_PROCESSED) { /* job is running */
> +            break;
> +        } else {
> +            abort();
Not even printing an error message?
The two callers of check_polling_jobs() can return an error, so no
reason to abort() here.
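As a rough sketch of what returning an error could look like, assuming a
simplified model in which the queue is just an array of statuses: the
job_status enum and poll_jobs() are hypothetical names, not part of the
patch or of QPL. In QEMU itself the message would more likely go through
error_report() than fprintf().

```c
#include <stdio.h>

/* Hypothetical stand-in for the QPL status codes used in the patch. */
enum job_status { JOB_OK, JOB_BEING_PROCESSED, JOB_FAILED };

/*
 * Poll jobs in submission order.  Returns the number of completed jobs
 * found at the head of the queue, or -1 if a job reports an unexpected
 * status.  The caller decides how to fail the migration.
 */
static int poll_jobs(const enum job_status *queue, int n)
{
    int done = 0;

    for (int i = 0; i < n; i++) {
        if (queue[i] == JOB_OK) {
            done++;                 /* job has finished, consume it */
        } else if (queue[i] == JOB_BEING_PROCESSED) {
            break;                  /* head is still running, stop polling */
        } else {
            fprintf(stderr, "job %d failed with status %d\n",
                    i, (int)queue[i]);
            return -1;              /* report instead of abort() */
        }
    }
    return done;
}
```

The caller can then propagate the -1 upwards in the same way as its other
failure paths.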
Later, Juan.
^ permalink raw reply	[flat|nested] 25+ messages in thread
 
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
                   ` (4 preceding siblings ...)
  2023-10-18 22:12 ` [PATCH 5/5] migration iaa-compress: Implement IAA compression Yuan Liu
@ 2023-10-19 11:13 ` Juan Quintela
  2023-10-19 11:40 ` Juan Quintela
  6 siblings, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:13 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> Hi,
>
> I am writing to submit a code change aimed at enhancing live migration
> acceleration by leveraging the compression capability of the Intel
> In-Memory Analytics Accelerator (IAA).
>
> Enabling compression functionality during the live migration process can
> enhance performance, thereby reducing downtime and network bandwidth
> requirements. However, this improvement comes at the cost of additional
> CPU resources, posing a challenge for cloud service providers in terms of
> resource allocation. To address this challenge, I have focused on offloading
> the compression overhead to the IAA hardware, resulting in performance gains.
Do you have any numbers that you can share?
Thanks, Juan.
> The implementation of the IAA (de)compression code is based on Intel Query
> Processing Library (QPL), an open-source software project designed for
> IAA high-level software programming.
>
> Best regards,
> Yuan Liu
>
> Yuan Liu (5):
>   configure: add qpl meson option
>   qapi/migration: Introduce compress-with-iaa migration parameter
>   ram compress: Refactor ram compression interfaces
>   migration iaa-compress: Add IAA initialization and deinitialization
>   migration iaa-compress: Implement IAA compression
>
>  meson.build                    |   9 +-
>  meson_options.txt              |   2 +
>  migration/iaa-ram-compress.c   | 319 +++++++++++++++++++++++++++++++++
>  migration/iaa-ram-compress.h   |  27 +++
>  migration/meson.build          |   1 +
>  migration/migration-hmp-cmds.c |   8 +
>  migration/migration.c          |   6 +-
>  migration/options.c            |  20 +++
>  migration/options.h            |   1 +
>  migration/ram-compress.c       |  96 ++++++++--
>  migration/ram-compress.h       |  10 +-
>  migration/ram.c                |  68 ++++++-
>  qapi/migration.json            |   4 +-
>  scripts/meson-buildoptions.sh  |   3 +
>  14 files changed, 541 insertions(+), 33 deletions(-)
>  create mode 100644 migration/iaa-ram-compress.c
>  create mode 100644 migration/iaa-ram-compress.h
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-18 22:12 [PATCH 0/5] Live Migration Acceleration with IAA Compression Yuan Liu
                   ` (5 preceding siblings ...)
  2023-10-19 11:13 ` [PATCH 0/5] Live Migration Acceleration with IAA Compression Juan Quintela
@ 2023-10-19 11:40 ` Juan Quintela
  2023-10-19 14:52   ` Daniel P. Berrangé
  6 siblings, 1 reply; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 11:40 UTC (permalink / raw)
  To: Yuan Liu; +Cc: peterx, farosas, leobras, qemu-devel, nanhai.zou
Yuan Liu <yuan1.liu@intel.com> wrote:
> Hi,
>
> I am writing to submit a code change aimed at enhancing live migration
> acceleration by leveraging the compression capability of the Intel
> In-Memory Analytics Accelerator (IAA).
>
> Enabling compression functionality during the live migration process can
> enhance performance, thereby reducing downtime and network bandwidth
> requirements. However, this improvement comes at the cost of additional
> CPU resources, posing a challenge for cloud service providers in terms of
> resource allocation. To address this challenge, I have focused on offloading
> the compression overhead to the IAA hardware, resulting in performance gains.
>
> The implementation of the IAA (de)compression code is based on Intel Query
> Processing Library (QPL), an open-source software project designed for
> IAA high-level software programming.
>
> Best regards,
> Yuan Liu
After reviewing the patches:
- why are you doing this on top of the old compression code, which is
  obsolete, deprecated and buggy?
- why are you not doing it on top of multifd?
You just need to add another compression method on top of multifd.
See how it was done for zstd:
commit 87dc6f5f665f581923536a1346220c7dcebe5105
Author: Juan Quintela <quintela@redhat.com>
Date:   Fri Dec 13 13:47:14 2019 +0100
    multifd: Add zstd compression multifd support
    
    Signed-off-by: Juan Quintela <quintela@redhat.com>
    Acked-by: Markus Armbruster <armbru@redhat.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
You will get 512KB buffers to compress, and it could be faster.
The way it is done today, every channel waits for its own compression to
finish.  But you could keep a list of pending requests and be asynchronous
there.
Later, Juan.
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-19 11:40 ` Juan Quintela
@ 2023-10-19 14:52   ` Daniel P. Berrangé
  2023-10-19 15:23     ` Peter Xu
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-10-19 14:52 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Yuan Liu, peterx, farosas, leobras, qemu-devel, nanhai.zou
On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> Yuan Liu <yuan1.liu@intel.com> wrote:
> > Hi,
> >
> > I am writing to submit a code change aimed at enhancing live migration
> > acceleration by leveraging the compression capability of the Intel
> > In-Memory Analytics Accelerator (IAA).
> >
> > Enabling compression functionality during the live migration process can
> > enhance performance, thereby reducing downtime and network bandwidth
> > requirements. However, this improvement comes at the cost of additional
> > CPU resources, posing a challenge for cloud service providers in terms of
> > resource allocation. To address this challenge, I have focused on offloading
> > the compression overhead to the IAA hardware, resulting in performance gains.
> >
> > The implementation of the IAA (de)compression code is based on Intel Query
> > Processing Library (QPL), an open-source software project designed for
> > IAA high-level software programming.
> >
> > Best regards,
> > Yuan Liu
> 
> After reviewing the patches:
> 
> - why are you doing this on top of old compression code, that is
>   obsolete, deprecated and buggy
> 
> - why are you not doing it on top of multifd.
> 
> You just need to add another compression method on top of multifd.
> See how it was done for zstd:
I'm not sure that is the ideal approach.  IIUC, the IAA/QPL library
is not defining a new compression format. Rather it is providing
a hardware accelerator for the 'deflate' format, which can be made
compatible with zlib:
  https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
With multifd we already have a 'zlib' compression format, and so
this IAA/QPL logic would effectively just be providing a second
implementation of zlib.
Given the use of a standard format, I would expect to be able
to use software zlib on the src, mixed with IAA/QPL zlib on
the target, or vice versa.
IOW, rather than defining a new compression format for this,
I think we could look at a new migration parameter for
"compression-accelerator": ["auto", "none", "qpl"]
with 'auto' the default, such that we can automatically enable
IAA/QPL when 'zlib' format is requested, if running on a suitable
host.
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-19 14:52   ` Daniel P. Berrangé
@ 2023-10-19 15:23     ` Peter Xu
  2023-10-19 15:31       ` Juan Quintela
  2023-10-19 15:32       ` Daniel P. Berrangé
  0 siblings, 2 replies; 25+ messages in thread
From: Peter Xu @ 2023-10-19 15:23 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Juan Quintela, Yuan Liu, farosas, leobras, qemu-devel, nanhai.zou
On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > Hi,
> > >
> > > I am writing to submit a code change aimed at enhancing live migration
> > > acceleration by leveraging the compression capability of the Intel
> > > In-Memory Analytics Accelerator (IAA).
> > >
> > > Enabling compression functionality during the live migration process can
> > > enhance performance, thereby reducing downtime and network bandwidth
> > > requirements. However, this improvement comes at the cost of additional
> > > CPU resources, posing a challenge for cloud service providers in terms of
> > > resource allocation. To address this challenge, I have focused on offloading
> > > the compression overhead to the IAA hardware, resulting in performance gains.
> > >
> > > The implementation of the IAA (de)compression code is based on Intel Query
> > > Processing Library (QPL), an open-source software project designed for
> > > IAA high-level software programming.
> > >
> > > Best regards,
> > > Yuan Liu
> > 
> > After reviewing the patches:
> > 
> > - why are you doing this on top of old compression code, that is
> >   obsolete, deprecated and buggy
> > 
> > - why are you not doing it on top of multifd.
> > 
> > You just need to add another compression method on top of multifd.
> > See how it was done for zstd:
> 
> I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> is not defining a new compression format. Rather it is providing
> a hardware accelerator for 'deflate' format, as can be made
> compatible with zlib:
> 
>   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> 
> With multifd we already have a 'zlib' compression format, and so
> this IAA/QPL logic would effectively just be a providing a second
> implementation of zlib.
> 
> Given the use of a standard format, I would expect to be able
> to use software zlib on the src, mixed with IAA/QPL zlib on
> the target, or vica-verca.
> 
> IOW, rather than defining a new compression format for this,
> I think we could look at a new migration parameter for
> 
> "compression-accelerator": ["auto", "none", "qpl"]
> 
> with 'auto' the default, such that we can automatically enable
> IAA/QPL when 'zlib' format is requested, if running on a suitable
> host.
I was also curious, when reading this, about how the compression format
compares to the software ones.
Would there be a use case where one would prefer software compression even
if a hardware accelerator exists, on either src or dst?
I'm wondering whether we can avoid that extra parameter and always use
hardware acceleration whenever possible.
Thanks,
-- 
Peter Xu
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-19 15:23     ` Peter Xu
@ 2023-10-19 15:31       ` Juan Quintela
  2023-10-19 15:32       ` Daniel P. Berrangé
  1 sibling, 0 replies; 25+ messages in thread
From: Juan Quintela @ 2023-10-19 15:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Daniel P. Berrangé, Yuan Liu, farosas, leobras, qemu-devel,
	nanhai.zou
Peter Xu <peterx@redhat.com> wrote:
> On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > Yuan Liu <yuan1.liu@intel.com> wrote:
>> > > Hi,
>> > >
>> > > I am writing to submit a code change aimed at enhancing live migration
>> > > acceleration by leveraging the compression capability of the Intel
>> > > In-Memory Analytics Accelerator (IAA).
>> > >
>> > > Enabling compression functionality during the live migration process can
>> > > enhance performance, thereby reducing downtime and network bandwidth
>> > > requirements. However, this improvement comes at the cost of additional
>> > > CPU resources, posing a challenge for cloud service providers in terms of
>> > > resource allocation. To address this challenge, I have focused on offloading
>> > > the compression overhead to the IAA hardware, resulting in performance gains.
>> > >
>> > > The implementation of the IAA (de)compression code is based on Intel Query
>> > > Processing Library (QPL), an open-source software project designed for
>> > > IAA high-level software programming.
>> > >
>> > > Best regards,
>> > > Yuan Liu
>> > 
>> > After reviewing the patches:
>> > 
>> > - why are you doing this on top of old compression code, that is
>> >   obsolete, deprecated and buggy
>> > 
>> > - why are you not doing it on top of multifd.
>> > 
>> > You just need to add another compression method on top of multifd.
>> > See how it was done for zstd:
>> 
>> I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
>> is not defining a new compression format. Rather it is providing
>> a hardware accelerator for 'deflate' format, as can be made
>> compatible with zlib:
>> 
>>   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
>> 
>> With multifd we already have a 'zlib' compression format, and so
>> this IAA/QPL logic would effectively just be a providing a second
>> implementation of zlib.
>> 
>> Given the use of a standard format, I would expect to be able
>> to use software zlib on the src, mixed with IAA/QPL zlib on
>> the target, or vica-verca.
>> 
>> IOW, rather than defining a new compression format for this,
>> I think we could look at a new migration parameter for
>> 
>> "compression-accelerator": ["auto", "none", "qpl"]
>> 
>> with 'auto' the default, such that we can automatically enable
>> IAA/QPL when 'zlib' format is requested, if running on a suitable
>> host.
>
> I was also curious about the format of compression comparing to software
> ones when reading.
>
> Would there be a use case that one would prefer soft compression even if
> hardware accelerator existed, no matter on src/dst?
>
> I'm wondering whether we can avoid that one more parameter but always use
> hardware accelerations as long as possible.
I asked for some benchmarks.
But they need to be against not using compression (i.e. plain precopy)
or against using multifd-zlib.
For a single page, I don't know if the added latency will be a winner in
general.
Later, Juan.
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-19 15:23     ` Peter Xu
  2023-10-19 15:31       ` Juan Quintela
@ 2023-10-19 15:32       ` Daniel P. Berrangé
  2023-10-23  8:33         ` Liu, Yuan1
  1 sibling, 1 reply; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-10-19 15:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: Juan Quintela, Yuan Liu, farosas, leobras, qemu-devel, nanhai.zou
On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > Hi,
> > > >
> > > > I am writing to submit a code change aimed at enhancing live migration
> > > > acceleration by leveraging the compression capability of the Intel
> > > > In-Memory Analytics Accelerator (IAA).
> > > >
> > > > Enabling compression functionality during the live migration process can
> > > > enhance performance, thereby reducing downtime and network bandwidth
> > > > requirements. However, this improvement comes at the cost of additional
> > > > CPU resources, posing a challenge for cloud service providers in terms of
> > > > resource allocation. To address this challenge, I have focused on offloading
> > > > the compression overhead to the IAA hardware, resulting in performance gains.
> > > >
> > > > The implementation of the IAA (de)compression code is based on Intel Query
> > > > Processing Library (QPL), an open-source software project designed for
> > > > IAA high-level software programming.
> > > >
> > > > Best regards,
> > > > Yuan Liu
> > > 
> > > After reviewing the patches:
> > > 
> > > - why are you doing this on top of old compression code, that is
> > >   obsolete, deprecated and buggy
> > > 
> > > - why are you not doing it on top of multifd.
> > > 
> > > You just need to add another compression method on top of multifd.
> > > See how it was done for zstd:
> > 
> > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> > is not defining a new compression format. Rather it is providing
> > a hardware accelerator for 'deflate' format, as can be made
> > compatible with zlib:
> > 
> >   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> > 
> > With multifd we already have a 'zlib' compression format, and so
> > this IAA/QPL logic would effectively just be a providing a second
> > implementation of zlib.
> > 
> > Given the use of a standard format, I would expect to be able
> > to use software zlib on the src, mixed with IAA/QPL zlib on
> > the target, or vica-verca.
> > 
> > IOW, rather than defining a new compression format for this,
> > I think we could look at a new migration parameter for
> > 
> > "compression-accelerator": ["auto", "none", "qpl"]
> > 
> > with 'auto' the default, such that we can automatically enable
> > IAA/QPL when 'zlib' format is requested, if running on a suitable
> > host.
> 
> I was also curious about the format of compression comparing to software
> ones when reading.
> 
> Would there be a use case that one would prefer soft compression even if
> hardware accelerator existed, no matter on src/dst?
> 
> I'm wondering whether we can avoid that one more parameter but always use
> hardware accelerations as long as possible.
Yeah, I did wonder about whether we could avoid a parameter, but then
I'm thinking it is good to have an escape hatch in case we find
any flaws in the QPL library's impl of deflate() that cause interop
problems.
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
- * RE: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-19 15:32       ` Daniel P. Berrangé
@ 2023-10-23  8:33         ` Liu, Yuan1
  2023-10-23 10:29           ` Daniel P. Berrangé
  2023-10-23 10:38           ` Juan Quintela
  0 siblings, 2 replies; 25+ messages in thread
From: Liu, Yuan1 @ 2023-10-23  8:33 UTC (permalink / raw)
  To: Daniel P. Berrangé, Peter Xu
  Cc: Juan Quintela, farosas@suse.de, leobras@redhat.com,
	qemu-devel@nongnu.org, Zou, Nanhai
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Thursday, October 19, 2023 11:32 PM
> To: Peter Xu <peterx@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > > Hi,
> > > > >
> > > > > I am writing to submit a code change aimed at enhancing live
> > > > > migration acceleration by leveraging the compression capability
> > > > > of the Intel In-Memory Analytics Accelerator (IAA).
> > > > >
> > > > > Enabling compression functionality during the live migration
> > > > > process can enhance performance, thereby reducing downtime and
> > > > > network bandwidth requirements. However, this improvement comes
> > > > > at the cost of additional CPU resources, posing a challenge for
> > > > > cloud service providers in terms of resource allocation. To
> > > > > > address this challenge, I have focused on offloading the compression
> > > > > > overhead to the IAA hardware, resulting in performance gains.
> > > > >
> > > > > The implementation of the IAA (de)compression code is based on
> > > > > Intel Query Processing Library (QPL), an open-source software
> > > > > project designed for IAA high-level software programming.
> > > > >
> > > > > Best regards,
> > > > > Yuan Liu
> > > >
> > > > After reviewing the patches:
> > > >
> > > > - why are you doing this on top of old compression code, that is
> > > >   obsolete, deprecated and buggy
Some users have not enabled the multifd feature yet, but they will decide
whether to enable the compression feature based on the load situation. So
I'm wondering if, without multifd, the compression functionality will no
longer be available?
> > > > - why are you not doing it on top of multifd.
I plan to submit the support for multifd independently because the
multifd compression and legacy compression code are separate.
I looked at the multifd compression code. Currently, it compresses
synchronously on the CPU. Since a hardware accelerator is best driven
asynchronously, I would like to get suggestions on the asynchronous
implementation. I see two options:
1. Pipeline dirty page scanning with compression: the main live migration
   thread submits compression tasks to the hardware, and the multifd
   threads only handle the transmission of compressed pages.
2. Pipeline data sending with compression: the multifd threads submit
   compression tasks to the hardware and then transmit the compressed
   data. (A multifd thread job may need to transmit compressed data
   multiple times.)
> > > > You just need to add another compression method on top of multifd.
> > > > See how it was done for zstd:
Yes, I will use the zstd implementation as a reference for multifd
compression with IAA.
> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
> > > not defining a new compression format. Rather it is providing a
> > > hardware accelerator for 'deflate' format, as can be made compatible
> > > with zlib:
> > >
> > >
> > > >   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> > >
> > > With multifd we already have a 'zlib' compression format, and so
> > > this IAA/QPL logic would effectively just be a providing a second
> > > implementation of zlib.
> > >
> > > Given the use of a standard format, I would expect to be able to use
> > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
> > > vica-verca.
> > >
> > > IOW, rather than defining a new compression format for this, I think
> > > we could look at a new migration parameter for
> > >
> > > "compression-accelerator": ["auto", "none", "qpl"]
> > >
> > > with 'auto' the default, such that we can automatically enable
> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
> > > host.
> >
> > I was also curious about the format of compression comparing to
> > software ones when reading.
> >
> > Would there be a use case that one would prefer soft compression even
> > if hardware accelerator existed, no matter on src/dst?
> >
> > I'm wondering whether we can avoid that one more parameter but always
> > use hardware accelerations as long as possible.
I want to add a new compression format (QPL or IAA-Deflate) here.
The reasons are as follows:
1. The QPL library already supports both software and hardware paths
   for compression. The software path uses a fast Deflate compression
   algorithm, while the hardware path uses IAA.
2. QPL's software and hardware paths are based on the Deflate algorithm,
   but there is a limitation: the history buffer only supports 4K. The
   default history buffer for zlib is 32K, which means that IAA cannot
   decompress zlib-compressed data. However, zlib can decompress
   IAA-compressed data.
3. For zlib and zstd, Intel QuickAssist Technology can accelerate
   both of them.
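The asymmetry in point 2 can be illustrated with a small sketch. It
assumes the zlib convention that the history window is 2^windowBits bytes
(so 12 gives 4K and 15 gives 32K); window_bytes() and
decoder_can_handle() are hypothetical helper names, not zlib or QPL calls.
The check is conservative: it asks whether the decoder's window covers
everything the encoder was allowed to reference, not what a particular
stream happened to reference.

```c
#include <stdbool.h>
#include <stddef.h>

/* History window in bytes for a zlib-style windowBits value. */
static size_t window_bytes(int window_bits)
{
    return (size_t)1 << window_bits;
}

/*
 * A decoder can safely handle any stream from an encoder whose window is
 * no larger than its own, because every back-reference then falls inside
 * the history the decoder keeps.
 */
static bool decoder_can_handle(size_t encoder_window, size_t decoder_window)
{
    return encoder_window <= decoder_window;
}
```

With these helpers, a 4K encoder paired with a 32K decoder passes the
check, while a 32K encoder paired with a 4K decoder does not, which matches
the direction described above.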
> Yeah, I did wonder about whether we could avoid a parameter, but then I'm
> thinking  it is good to have an escape hatch if we were to find any flaws in the
> QPL library's impl of deflate() that caused interop problems.
> 
> With regards,
> Daniel
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-23  8:33         ` Liu, Yuan1
@ 2023-10-23 10:29           ` Daniel P. Berrangé
  2023-10-23 10:47             ` Juan Quintela
  2023-10-23 14:36             ` Liu, Yuan1
  2023-10-23 10:38           ` Juan Quintela
  1 sibling, 2 replies; 25+ messages in thread
From: Daniel P. Berrangé @ 2023-10-23 10:29 UTC (permalink / raw)
  To: Liu, Yuan1
  Cc: Peter Xu, Juan Quintela, farosas@suse.de, leobras@redhat.com,
	qemu-devel@nongnu.org, Zou, Nanhai
On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
> > -----Original Message-----
> > From: Daniel P. Berrangé <berrange@redhat.com>
> > Sent: Thursday, October 19, 2023 11:32 PM
> > To: Peter Xu <peterx@redhat.com>
> > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> > 
> > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I am writing to submit a code change aimed at enhancing live
> > > > > > migration acceleration by leveraging the compression capability
> > > > > > of the Intel In-Memory Analytics Accelerator (IAA).
> > > > > >
> > > > > > Enabling compression functionality during the live migration
> > > > > > process can enhance performance, thereby reducing downtime and
> > > > > > network bandwidth requirements. However, this improvement comes
> > > > > > at the cost of additional CPU resources, posing a challenge for
> > > > > > cloud service providers in terms of resource allocation. To
> > > > > > > address this challenge, I have focused on offloading the compression
> > > > > > > overhead to the IAA hardware, resulting in performance gains.
> > > > > >
> > > > > > The implementation of the IAA (de)compression code is based on
> > > > > > Intel Query Processing Library (QPL), an open-source software
> > > > > > project designed for IAA high-level software programming.
> > > > >
> > > > > After reviewing the patches:
> > > > >
> > > > > - why are you doing this on top of old compression code, that is
> > > > >   obsolete, deprecated and buggy
> Some users have not enabled the multifd feature yet, but they will decide whether to enable the compression feature based on the load situation. So I'm wondering if, without multifd, the compression functionality will no longer be available?
> 
> > > > > - why are you not doing it on top of multifd.
> I plan to submit the support for multifd independently because the
> multifd compression and legacy compression code are separate.
So the core question here (for migration maintainers) is whether
contributors should be spending any time at all on non-multifd
code, or if new features should be exclusively for multifd?
It doesn't make a lot of sense over the long term to have people
spending time implementing the same features twice. IOW, should
we be directing contributors explicitly towards multifd only,
and even consider deprecating non-multifd code at some time?
> > > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
> > > > not defining a new compression format. Rather it is providing a
> > > > hardware accelerator for 'deflate' format, as can be made compatible
> > > > with zlib:
> > > >
> > > >
> > > > >   https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> > > >
> > > > With multifd we already have a 'zlib' compression format, and so
> > > > this IAA/QPL logic would effectively just be a providing a second
> > > > implementation of zlib.
> > > >
> > > > Given the use of a standard format, I would expect to be able to use
> > > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
> > > > vica-verca.
> > > >
> > > > IOW, rather than defining a new compression format for this, I think
> > > > we could look at a new migration parameter for
> > > >
> > > > "compression-accelerator": ["auto", "none", "qpl"]
> > > >
> > > > with 'auto' the default, such that we can automatically enable
> > > > IAA/QPL when 'zlib' format is requested, if running on a suitable
> > > > host.
> > >
> > > I was also curious about the format of compression comparing to
> > > software ones when reading.
> > >
> > > Would there be a use case that one would prefer soft compression even
> > > if hardware accelerator existed, no matter on src/dst?
> > >
> > > I'm wondering whether we can avoid that one more parameter but always
> > > use hardware accelerations as long as possible.
>
> I want to add a new compression format(QPL or IAA-Deflate) here.
> The reasons are as follows:
>
> 1. The QPL library already supports both software and hardware paths
>    for compression. The software path uses a fast Deflate compression
>    algorithm, while the hardware path uses IAA.
That's not a reason to describe this as a new format in QEMU. It is
still deflate, and so conceptually we can model this as 'zlib' and
potentially choose to use QPL automatically.
> 2. QPL's software and hardware paths are based on the Deflate algorithm,
>    but there is a limitation: the history buffer only supports 4K. The
>    default history buffer for zlib is 32K, which means that IAA cannot
>    decompress zlib-compressed data. However, zlib can decompress IAA-
>    compressed data.
That's again not a reason to call it a new compression format in
QEMU. It would mean, however, if compression-accelerator=auto, we
would not be able to safely enable QPL on the incoming QEMU, as we
can't be sure the src used a 4k window.  We could still automatically
enable QPL on outgoing side though.
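One way to picture the selection rule described here, as a sketch under
stated assumptions: the CompressionAccel enum and use_qpl() are
hypothetical names for the proposed "compression-accelerator" parameter
and its decision logic, and treating an explicit "qpl" request as valid on
the incoming side assumes the user knows the source used a 4K window.

```c
#include <stdbool.h>

/* Hypothetical values of the proposed "compression-accelerator" parameter. */
typedef enum {
    ACCEL_AUTO,
    ACCEL_NONE,
    ACCEL_QPL,
} CompressionAccel;

/* Decide whether QPL should handle 'zlib' format data on this side. */
static bool use_qpl(CompressionAccel accel, bool hw_available, bool incoming)
{
    switch (accel) {
    case ACCEL_NONE:
        return false;               /* escape hatch: always software zlib */
    case ACCEL_QPL:
        return hw_available;        /* explicit request, caller vouches for 4K */
    case ACCEL_AUTO:
    default:
        /*
         * Outgoing: QPL output is readable by software zlib, so it is safe.
         * Incoming: the source window size is unknown, so stay on software.
         */
        return hw_available && !incoming;
    }
}
```

A real implementation would probably report an error when "qpl" is
requested on a host without the hardware, rather than silently falling
back as this sketch does.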
> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate
>    both of them.
What's the difference between this and IAA/QPL?
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-23 10:29           ` Daniel P. Berrangé
@ 2023-10-23 10:47             ` Juan Quintela
  2023-10-23 14:54               ` Liu, Yuan1
  2023-10-23 14:36             ` Liu, Yuan1
  1 sibling, 1 reply; 25+ messages in thread
From: Juan Quintela @ 2023-10-23 10:47 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Liu, Yuan1, Peter Xu, farosas@suse.de, leobras@redhat.com,
	qemu-devel@nongnu.org, Zou, Nanhai
Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
>> > -----Original Message-----
>> > From: Daniel P. Berrangé <berrange@redhat.com>
>> > Sent: Thursday, October 19, 2023 11:32 PM
>> > To: Peter Xu <peterx@redhat.com>
>> > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
>> > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
>> > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
>> > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
>> > 
>> > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
>> > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am writing to submit a code change aimed at enhancing live
>> > > > > > migration acceleration by leveraging the compression capability
>> > > > > > of the Intel In-Memory Analytics Accelerator (IAA).
>> > > > > >
>> > > > > > Enabling compression functionality during the live migration
>> > > > > > process can enhance performance, thereby reducing downtime and
>> > > > > > network bandwidth requirements. However, this improvement comes
>> > > > > > at the cost of additional CPU resources, posing a challenge for
>> > > > > > cloud service providers in terms of resource allocation. To
>> > > > > > address this challenge, I have focused on offloading the compression
>> > overhead to the IAA hardware, resulting in performance gains.
>> > > > > >
>> > > > > > The implementation of the IAA (de)compression code is based on
>> > > > > > Intel Query Processing Library (QPL), an open-source software
>> > > > > > project designed for IAA high-level software programming.
>> > > > >
>> > > > > After reviewing the patches:
>> > > > >
>> > > > > - why are you doing this on top of old compression code, that is
>> > > > >   obsolete, deprecated and buggy
>> Some users have not enabled the multifd feature yet, but they will
>> decide whether to enable the compression feature based on the load
>> situation. So I'm wondering if, without multifd, the compression
>> functionality will no longer be available?
>> 
>> > > > > - why are you not doing it on top of multifd.
>> I plan to submit the support for multifd independently because the
>> multifd compression and legacy compression code are separate.
>
> So the core question here (for migration maintainers) is whether
> contributors should be spending any time at all on non-multifd
> code, or if new features should be exclusively for multifd ?
Only for multifd.
Comparison right now:
- compression (can be done better in multifd)
- plain precopy (we can saturate faster networks with multifd)
- xbzrle: right now only non-multifd (plan to add as another multifd
          compression method)
- exec: This is a hard one.  Fabiano is about to submit a file based
        multifd method.  Advantages over exec:
          * much less space used (it writes each page at the right
            position, no overhead and never the same page on the two
            streams)
          * We can give proper errors, exec is very bad when the exec'd
            process gives an error.
        Disadvantages:
          * libvirt (or any management app) needs to wait for
            compression to end, and launch the exec command by hand.
            I wanted to discuss this with libvirt, if it would be
            possible to remove the use of exec compression.
- rdma: This is a hard one
        Current implementation is a mess
        It is almost un-maintained
        There are two-three years old patches to move it on top of
        multifd
- postcopy: Not implemented.  This is the real reason that we can't
        deprecate precopy and put multifd as default.
- snapshots:  They are too coupled with qcow2.  It should be possible to
        do something more sensible with multifd + file, but we need to walk that
        path when multifd + file hits the tree.
> It doesn't make a lot of sense over the long term to have people
> spending time implementing the same features twice. IOW, should
> we be directing contributors explicitly towards multifd only,
> and even consider deprecating non-multifd code at some time ?
Intel submitted something similar to this on top of QAT several months
back.  I already advised them not to spend any time on top of the old
compression code and just do things on top of multifd.
While we are here, what are the differences between QPL and QAT?
Previous submission used qatzip-devel.
Later, Juan.
>> > > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
>> > > > not defining a new compression format. Rather it is providing a
>> > > > hardware accelerator for 'deflate' format, as can be made compatible
>> > > > with zlib:
>> > > >
>> > > >
>> > > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
>> > > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
>> > > > nce-link
>> > > >
>> > > > With multifd we already have a 'zlib' compression format, and so
>> > > > this IAA/QPL logic would effectively just be a providing a second
>> > > > implementation of zlib.
>> > > >
>> > > > Given the use of a standard format, I would expect to be able to use
>> > > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
>> > > > vica-verca.
>> > > >
>> > > > IOW, rather than defining a new compression format for this, I think
>> > > > we could look at a new migration parameter for
>> > > >
>> > > > "compression-accelerator": ["auto", "none", "qpl"]
>> > > >
>> > > > with 'auto' the default, such that we can automatically enable
>> > > > IAA/QPL when 'zlib' format is requested, if running on a suitable
>> > > > host.
>> > >
>> > > I was also curious about the format of compression comparing to
>> > > software ones when reading.
>> > >
>> > > Would there be a use case that one would prefer soft compression even
>> > > if hardware accelerator existed, no matter on src/dst?
>> > >
>> > > I'm wondering whether we can avoid that one more parameter but always
>> > > use hardware accelerations as long as possible.
>>
>> I want to add a new compression format(QPL or IAA-Deflate) here.
>> The reasons are as follows:
>>
>> 1. The QPL library already supports both software and hardware paths
>>    for compression. The software path uses a fast Deflate compression
>>    algorithm, while the hardware path uses IAA.
>
> That's not a reason to describe this as a new format in QEMU. It is
> still deflate, and so conceptually we can model this as 'zlib' and
> potentially choose to use QPL automatically.
>
>> 2. QPL's software and hardware paths are based on the Deflate algorithm,
>>    but there is a limitation: the history buffer only supports 4K. The
>>    default history buffer for zlib is 32K, which means that IAA cannot
>>    decompress zlib-compressed data. However, zlib can decompress IAA-
>>    compressed data.
>
> That's again not a reason to call it a new compression format in
> QEMU. It would mean, however, if compression-accelerator=auto, we
> would not be able to safely enable QPL on the incoming QEMU, as we
> can't be sure the src used a 4k window.  We could still automatically
> enable QPL on outgoing side though.
>
>> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate
>>    both of them.
>
> What's the difference between this, and the IAA/QPL ? 
>
> With regards,
> Daniel
- * RE: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-23 10:47             ` Juan Quintela
@ 2023-10-23 14:54               ` Liu, Yuan1
  0 siblings, 0 replies; 25+ messages in thread
From: Liu, Yuan1 @ 2023-10-23 14:54 UTC (permalink / raw)
  To: quintela@redhat.com, Daniel P.Berrangé
  Cc: Peter Xu, farosas@suse.de, leobras@redhat.com,
	qemu-devel@nongnu.org, Zou, Nanhai
> -----Original Message-----
> From: Juan Quintela <quintela@redhat.com>
> Sent: Monday, October 23, 2023 6:48 PM
> To: Daniel P.Berrangé <berrange@redhat.com>
> Cc: Liu, Yuan1 <yuan1.liu@intel.com>; Peter Xu <peterx@redhat.com>;
> farosas@suse.de; leobras@redhat.com; qemu-devel@nongnu.org; Zou,
> Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
> >> > -----Original Message-----
> >> > From: Daniel P. Berrangé <berrange@redhat.com>
> >> > Sent: Thursday, October 19, 2023 11:32 PM
> >> > To: Peter Xu <peterx@redhat.com>
> >> > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> >> > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> >> > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> >> > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA
> >> > Compression
> >> >
> >> > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> >> > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> >> > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> >> > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> >> > > > > > Hi,
> >> > > > > >
> >> > > > > > I am writing to submit a code change aimed at enhancing
> >> > > > > > live migration acceleration by leveraging the compression
> >> > > > > > capability of the Intel In-Memory Analytics Accelerator (IAA).
> >> > > > > >
> >> > > > > > Enabling compression functionality during the live
> >> > > > > > migration process can enhance performance, thereby reducing
> >> > > > > > downtime and network bandwidth requirements. However, this
> >> > > > > > improvement comes at the cost of additional CPU resources,
> >> > > > > > posing a challenge for cloud service providers in terms of
> >> > > > > > resource allocation. To address this challenge, I have
> >> > > > > > focused on offloading the compression
> >> > overhead to the IAA hardware, resulting in performance gains.
> >> > > > > >
> >> > > > > > The implementation of the IAA (de)compression code is based
> >> > > > > > on Intel Query Processing Library (QPL), an open-source
> >> > > > > > software project designed for IAA high-level software
> programming.
> >> > > > >
> >> > > > > After reviewing the patches:
> >> > > > >
> >> > > > > - why are you doing this on top of old compression code, that is
> >> > > > >   obsolete, deprecated and buggy
> >> Some users have not enabled the multifd feature yet, but they will
> >> decide whether to enable the compression feature based on the load
> >> situation. So I'm wondering if, without multifd, the compression
> >> functionality will no longer be available?
> >>
> >> > > > > - why are you not doing it on top of multifd.
> >> I plan to submit the support for multifd independently because the
> >> multifd compression and legacy compression code are separate.
> >
> > So the core question her (for migration maintainers) is whether
> > contributors should be spending any time at all on non-multifd code,
> > or if new features should be exclusively for multifd ?
> 
> Only for multifd.
> 
> Comparison right now:
> - compression (can be done better in multifd)
> - plain precopy (we can saturate faster networks with multifd)
> - xbzrle: right now only non-multifd (plan to add as another multifd
>           compression method)
> - exec: This is a hard one.  Fabiano is about to submit a file based
>         multifd method.  Advantages over exec:
>           * much less space used (it writes each page at the right
>             position, no overhead and never the same page on the two
>             streams)
>           * We can give proper errors, exec is very bad when the exec'd
>             process gives an error.
>         Disadvantages:
>           * libvirt (or any management app) needs to wait for
>             compression to end, and launch the exec command by hand.
>             I wanted to discuss this with libvirt, if it would be
>             possible to remove the use of exec compression.
> - rdma: This is a hard one
>         Current implementation is a mess
>         It is almost un-maintained
>         There are two-three years old patches to move it on top of
>         multifd
> - postcopy: Not implemented.  This is the real reason that we can't
>         deprecate precopy and put multifd as default.
> - snapshots:  They are too coupled with qcow2.  It should be possible to
>         do something more sensible with multifd + file, but we need to walk that
>         path when multifd + file hit the tree.
> 
> > I doesn't make a lot of sense over the long term to have people
> > spending time implementing the same features twice. IOW, should we be
> > directly contributors explicitly towards multifd only, and even
> > consider deprecating non-multifd code at some time ?
> 
> Intel submitted something similar to this on top of QAT several months back.
> I already advised them not to use any time on top of old compression code and
> just do things on top of multifd.
> 
> Once that we are here, what are the differences of QPL and QAT?
> Previous submission used qatzip-devel.
Thank you very much for the QAT suggestions. QPL is used for IAA and
qatzip-devel is used for QAT; both are compatible with zlib.
qatzip-devel supports only synchronous compression and does not support
batch operations, so for single-page compression the performance
improvement may not be significant. QPL supports both synchronous and
asynchronous compression.
 
- * RE: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-23 10:29           ` Daniel P. Berrangé
  2023-10-23 10:47             ` Juan Quintela
@ 2023-10-23 14:36             ` Liu, Yuan1
  1 sibling, 0 replies; 25+ messages in thread
From: Liu, Yuan1 @ 2023-10-23 14:36 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Peter Xu, Juan Quintela, farosas@suse.de, leobras@redhat.com,
	qemu-devel@nongnu.org, Zou, Nanhai
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Monday, October 23, 2023 6:30 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Peter Xu <peterx@redhat.com>; Juan Quintela <quintela@redhat.com>;
> farosas@suse.de; leobras@redhat.com; qemu-devel@nongnu.org; Zou,
> Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> On Mon, Oct 23, 2023 at 08:33:44AM +0000, Liu, Yuan1 wrote:
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > Sent: Thursday, October 19, 2023 11:32 PM
> > > To: Peter Xu <peterx@redhat.com>
> > > Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> > > <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> > > devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> > > Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA
> > > Compression
> > >
> > > On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> > > > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> > > > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am writing to submit a code change aimed at enhancing live
> > > > > > > migration acceleration by leveraging the compression
> > > > > > > capability of the Intel In-Memory Analytics Accelerator (IAA).
> > > > > > >
> > > > > > > Enabling compression functionality during the live migration
> > > > > > > process can enhance performance, thereby reducing downtime
> > > > > > > and network bandwidth requirements. However, this
> > > > > > > improvement comes at the cost of additional CPU resources,
> > > > > > > posing a challenge for cloud service providers in terms of
> > > > > > > resource allocation. To address this challenge, I have
> > > > > > > focused on offloading the compression
> > > overhead to the IAA hardware, resulting in performance gains.
> > > > > > >
> > > > > > > The implementation of the IAA (de)compression code is based
> > > > > > > on Intel Query Processing Library (QPL), an open-source
> > > > > > > software project designed for IAA high-level software programming.
> > > > > >
> > > > > > After reviewing the patches:
> > > > > >
> > > > > > - why are you doing this on top of old compression code, that is
> > > > > >   obsolete, deprecated and buggy
> > Some users have not enabled the multifd feature yet, but they will decide
> whether to enable the compression feature based on the load situation. So I'm
> wondering if, without multifd, the compression functionality will no longer be
> available?
> >
> > > > > > - why are you not doing it on top of multifd.
> > I plan to submit the support for multifd independently because the
> > multifd compression and legacy compression code are separate.
> 
> So the core question her (for migration maintainers) is whether contributors
> should be spending any time at all on non-multifd code, or if new features
> should be exclusively for multifd ?
> 
> I doesn't make a lot of sense over the long term to have people spending time
> implementing the same features twice. IOW, should we be directly contributors
> explicitly towards multifd only, and even consider deprecating non-multifd code
> at some time ?
> 
> > > > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> > > > > is not defining a new compression format. Rather it is providing
> > > > > a hardware accelerator for 'deflate' format, as can be made
> > > > > compatible with zlib:
> > > > >
> > > > >
> > > > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_c
> > > > > ases
> > > > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-re
> > > > > fere
> > > > > nce-link
> > > > >
> > > > > With multifd we already have a 'zlib' compression format, and so
> > > > > this IAA/QPL logic would effectively just be a providing a
> > > > > second implementation of zlib.
> > > > >
> > > > > Given the use of a standard format, I would expect to be able to
> > > > > use software zlib on the src, mixed with IAA/QPL zlib on the
> > > > > target, or vica-verca.
> > > > >
> > > > > IOW, rather than defining a new compression format for this, I
> > > > > think we could look at a new migration parameter for
> > > > >
> > > > > "compression-accelerator": ["auto", "none", "qpl"]
> > > > >
> > > > > with 'auto' the default, such that we can automatically enable
> > > > > IAA/QPL when 'zlib' format is requested, if running on a
> > > > > suitable host.
> > > >
> > > > I was also curious about the format of compression comparing to
> > > > software ones when reading.
> > > >
> > > > Would there be a use case that one would prefer soft compression
> > > > even if hardware accelerator existed, no matter on src/dst?
> > > >
> > > > I'm wondering whether we can avoid that one more parameter but
> > > > always use hardware accelerations as long as possible.
> >
> > I want to add a new compression format(QPL or IAA-Deflate) here.
> > The reasons are as follows:
> >
> > 1. The QPL library already supports both software and hardware paths
> >    for compression. The software path uses a fast Deflate compression
> >    algorithm, while the hardware path uses IAA.
> 
> That's not a reason to describe this as a new format in QEMU. It is still deflate,
> and so conceptually we can model this as 'zlib' and potentially choose to use
> QPL automatically.
> 
> > 2. QPL's software and hardware paths are based on the Deflate algorithm,
> >    but there is a limitation: the history buffer only supports 4K. The
> >    default history buffer for zlib is 32K, which means that IAA cannot
> >    decompress zlib-compressed data. However, zlib can decompress IAA-
> >    compressed data.
> 
> That's again not a reason to call it a new compression format in QEMU. It
> would mean, however, if compression-accelerator=auto, we would not be able
> to safely enable QPL on the incoming QEMU, as we can't be sure the src used a
> 4k window.  We could still automatically enable QPL on outgoing side though.
Yes, compression-accelerator=auto is always usable on the source side.
On the destination side a fallback mechanism is needed, which switches
from the QPL hardware path to zlib or to the QPL software path for
decompression when the history buffer is larger than 4K.
In the next version of the patch I will consider not adding a new
compression algorithm, and instead adding a compression-accelerator
parameter. The parameters would then be:
Compression algorithm[zlib]
Compression accelerator[None, auto, iaa]
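
A rough sketch of how such a selection with fallback could look. Everything here is hypothetical: the names Accel, pick_path and zlib_window_size are not from the patches, and it assumes the incoming data is a plain RFC 1950 zlib stream whose header byte declares the window size, which may not match how the migration stream is actually framed.

```python
import zlib
from enum import Enum

IAA_MAX_WINDOW = 4 * 1024  # history buffer limit mentioned in this thread

class Accel(Enum):
    NONE = "none"
    AUTO = "auto"
    IAA = "iaa"

def zlib_window_size(stream):
    """Return the window size declared in an RFC 1950 zlib header."""
    cmf = stream[0]
    if cmf & 0x0F != 8:            # CM field: 8 means deflate
        raise ValueError("not a deflate zlib stream")
    return 1 << ((cmf >> 4) + 8)   # CINFO field: log2(window size) - 8

def pick_path(accel, outgoing, iaa_present, stream=None):
    """Choose 'iaa' or 'software' for one migration direction."""
    if accel is Accel.NONE or not iaa_present:
        return "software"
    if outgoing:
        # The source controls the window, so hardware is always usable.
        return "iaa"
    # Destination: use hardware only if the source kept within the 4K window.
    if stream is not None and zlib_window_size(stream) <= IAA_MAX_WINDOW:
        return "iaa"
    return "software"

# Example streams produced with software zlib, for illustration only.
ctx = zlib.compressobj(wbits=12)
src_4k = ctx.compress(b"page" * 1024) + ctx.flush()
src_32k = zlib.compress(b"page" * 1024)
```

With this shape the destination never needs to trust a parameter from the source; it inspects what it received and falls back to software when the window is too large.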
> > 3. For zlib and zstd, Intel QuickAssist Technology can accelerate
> >    both of them.
> 
> What's the difference between this, and the IAA/QPL ?
Both IAA and QAT support compression.
IAA supports only the deflate algorithm, which is compatible with zlib (history buffer <= 4K). Its target workloads include compression and data analysis.
QAT supports the deflate/zstd/lz4 algorithms and is compatible with software zlib/zstd/lz4. Its target workloads include compression and encryption.
The QPL software path is a component of the Intel ISA-L library (https://github.com/intel/isa-l), a fast deflate compression library that is fully compatible with zlib.
ISA-L has the same high compression ratio as zlib, and its throughput is much better than zlib's.
QPL ensures that software can efficiently decompress IAA-compressed data when IAA is unavailable.
> --
> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
 
- * Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-23  8:33         ` Liu, Yuan1
  2023-10-23 10:29           ` Daniel P. Berrangé
@ 2023-10-23 10:38           ` Juan Quintela
  2023-10-23 16:32             ` Liu, Yuan1
  1 sibling, 1 reply; 25+ messages in thread
From: Juan Quintela @ 2023-10-23 10:38 UTC (permalink / raw)
  To: Liu, Yuan1
  Cc: Daniel P. Berrangé, Peter Xu, farosas@suse.de,
	leobras@redhat.com, qemu-devel@nongnu.org, Zou, Nanhai
"Liu, Yuan1" <yuan1.liu@intel.com> wrote:
>> -----Original Message-----
>> From: Daniel P. Berrangé <berrange@redhat.com>
>> Sent: Thursday, October 19, 2023 11:32 PM
>> To: Peter Xu <peterx@redhat.com>
>> Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
>> <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
>> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
>> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
>> 
>> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
>> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
>> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
>> > > > Yuan Liu <yuan1.liu@intel.com> wrote:
>> > > > > Hi,
>> > > > >
>> > > > > I am writing to submit a code change aimed at enhancing live
>> > > > > migration acceleration by leveraging the compression capability
>> > > > > of the Intel In-Memory Analytics Accelerator (IAA).
>> > > > >
>> > > > > Enabling compression functionality during the live migration
>> > > > > process can enhance performance, thereby reducing downtime and
>> > > > > network bandwidth requirements. However, this improvement comes
>> > > > > at the cost of additional CPU resources, posing a challenge for
>> > > > > cloud service providers in terms of resource allocation. To
>> > > > > address this challenge, I have focused on offloading the compression
>> overhead to the IAA hardware, resulting in performance gains.
>> > > > >
>> > > > > The implementation of the IAA (de)compression code is based on
>> > > > > Intel Query Processing Library (QPL), an open-source software
>> > > > > project designed for IAA high-level software programming.
>> > > > >
>> > > > > Best regards,
>> > > > > Yuan Liu
>> > > >
>> > > > After reviewing the patches:
>> > > >
>> > > > - why are you doing this on top of old compression code, that is
>> > > >   obsolete, deprecated and buggy
> Some users have not enabled the multifd feature yet, but they will
> decide whether to enable the compression feature based on the load
> situation. So I'm wondering if, without multifd, the compression
> functionality will no longer be available?
Next pull request will deprecate it.  So in two versions it is going to be gone.
>> > > > - why are you not doing it on top of multifd.
> I plan to submit the support for multifd independently because the
> multifd compression and legacy compression code are separate.
compression code is really buggy.  I think you should not even try to
work on top of it.
> I looked at the code of multifd about compression. Currently, it uses
> the CPU synchronous compression mode. Since it is best to use the
> asynchronous processing method of the hardware accelerator, I would
> like to get suggestions on the asynchronous implementation.
I did that on a previous comment.
Several questions:
- you are using zlib, right?  When I tested, the longer the streams,
  the better the compression you get, right?
  Is there a way to "continue" with the state of the previous job?
  The old compression code generates a new context for every packet.
  Multifd generates a new zlib context for each connection.
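
To make the context question concrete, here is a small sketch with software zlib comparing a fresh context per packet against one context kept for the whole connection. It is not QEMU code; the function names are invented, and the test data (the same incompressible 4K page repeated) is deliberately contrived so that only cross-packet history can help.

```python
import random
import zlib

rng = random.Random(0)
page = bytes(rng.getrandbits(8) for _ in range(4096))
pages = [page] * 16   # identical pages: only cross-packet history compresses them

def per_packet(pages):
    """New zlib context for every packet (no shared history)."""
    return [zlib.compress(p) for p in pages]

def per_connection(pages):
    """One zlib context for the connection, flushed at each packet boundary."""
    ctx = zlib.compressobj()
    return [ctx.compress(p) + ctx.flush(zlib.Z_SYNC_FLUSH) for p in pages]

fresh = per_packet(pages)
shared = per_connection(pages)
fresh_total = sum(len(chunk) for chunk in fresh)
shared_total = sum(len(chunk) for chunk in shared)

# The receiver keeps one matching context and decodes packet by packet.
rx = zlib.decompressobj()
restored = [rx.decompress(chunk) for chunk in shared]
```

The sync flush keeps each packet independently decodable by a receiver that has seen the earlier packets, while the sender still benefits from the earlier history.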
> 1. Dirty page scanning and compression pipeline processing, the main
> thread of live migration submits compression tasks to the hardware,
> and multifd threads only handle the transmission of compressed pages.
> 2. Data sending and compression pipeline processing, the Multifd
> threads submit compression tasks to the hardware and then transmit the
> compressed data. (A multifd thread job may need to transmit compressed
> data multiple times.)
>
>> > > > You just need to add another compression method on top of multifd.
>> > > > See how it was done for zstd:
> Yes, I will refer to zstd to implement multifd compression with IAA
Basically you can use two approaches here (simplifying a lot)
- for each channel
     submit job (512KB)
     wait for job
     send compressed stuff
  And you adjust the number of channels depending on how much
  concurrency you want.
- for each channel
     submit job
     while (number_of_jobs_submitted > some_threshold)
        wait_for_job
        send job
  Here you need to piggyback on MULTIFD_FLAG_SYNC to wait for the
  rest of the jobs.
Each one has its advantages/disadvantages.  The first is simpler
to do, because it is to all effects synchronous, and it is simpler to
"contain" the concurrency.
With the second approach you get much more concurrency, but you need to
be careful about how much stuff you have in flight.
Remember that you get queues for each multifd channel.
How many asynchronous jobs (around 512KB per packet) can current
hardware handle?  I mean, what is the optimal number: around 10, around
50, around 100?
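
The two approaches can be sketched roughly as follows. This is only a model: a thread pool with zlib.compress stands in for the hardware queue, the function names are hypothetical, and the second variant waits before submitting so that the number of jobs in flight never exceeds the threshold.

```python
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor

def channel_sync(packets, submit, send):
    """Approach 1: one job at a time; submit, wait, send."""
    for pkt in packets:
        send(submit(pkt).result())

def channel_pipelined(packets, submit, send, max_in_flight):
    """Approach 2: keep up to max_in_flight jobs queued on this channel."""
    in_flight = deque()
    peak = 0
    for pkt in packets:
        # Make room first: wait for the oldest job and send its output.
        while len(in_flight) >= max_in_flight:
            send(in_flight.popleft().result())
        in_flight.append(submit(pkt))
        peak = max(peak, len(in_flight))
    # Sync point: drain whatever is still queued before reporting completion.
    while in_flight:
        send(in_flight.popleft().result())
    return peak

packets = [bytes([i]) * 4096 for i in range(8)]
sent_sync, sent_async = [], []
with ThreadPoolExecutor(max_workers=4) as pool:
    def submit(pkt):
        return pool.submit(zlib.compress, pkt)
    channel_sync(packets, submit, sent_sync.append)
    peak = channel_pipelined(packets, submit, sent_async.append, max_in_flight=3)
```

Waiting on the oldest job first keeps packets in submission order on the wire, at the cost of some head-of-line blocking if one job is slow.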
>> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library is
>> > > not defining a new compression format. Rather it is providing a
>> > > hardware accelerator for 'deflate' format, as can be made compatible
>> > > with zlib:
>> > >
>> > >
>> > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases
>> > > /deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-refere
>> > > nce-link
>> > >
>> > > With multifd we already have a 'zlib' compression format, and so
>> > > this IAA/QPL logic would effectively just be a providing a second
>> > > implementation of zlib.
>> > >
>> > > Given the use of a standard format, I would expect to be able to use
>> > > software zlib on the src, mixed with IAA/QPL zlib on the target, or
>> > > vica-verca.
>> > >
>> > > IOW, rather than defining a new compression format for this, I think
>> > > we could look at a new migration parameter for
>> > >
>> > > "compression-accelerator": ["auto", "none", "qpl"]
>> > >
>> > > with 'auto' the default, such that we can automatically enable
>> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
>> > > host.
>> >
>> > I was also curious about the format of compression comparing to
>> > software ones when reading.
>> >
>> > Would there be a use case that one would prefer soft compression even
>> > if hardware accelerator existed, no matter on src/dst?
>> >
>> > I'm wondering whether we can avoid that one more parameter but always
>> > use hardware accelerations as long as possible.
> I want to add a new compression format(QPL or IAA-Deflate) here. The reasons are as follows:
> 1. The QPL library already supports both software and hardware paths
> for compression.
The question is whether IAA-Deflate is compatible with zlib-deflate.
What are the advantages of the QPL software implementation vs zlib?
- Is it faster?
- Does it use fewer resources?
> The software path uses a fast Deflate compression
> algorithm, while the hardware path uses IAA.
Is it faster than zlib?
And is doing all of this asynchronous job dance not going to be slower
than just calling the functions in a software implementation?
> 2. QPL's software and hardware paths are based on the Deflate
> algorithm, but there is a limitation: the history buffer only supports
> 4K. The default history buffer for zlib is 32K, which means that IAA
> cannot decompress zlib-compressed data. However, zlib can decompress
> IAA-compressed data.
Aha.  Thanks, that was what we wanted to know.
> 3. For zlib and zstd, Intel QuickAssist Technology can accelerate both of them.
Do we have any numbers that we could look at?
We are interested in three things:
- how much faster it is
- how much CPU is saved using IAA
- how much latency it adds
Thanks, Juan.
>> Yeah, I did wonder about whether we could avoid a parameter, but then I'm
>> thinking  it is good to have an escape hatch if we were to find any flaws in the
>> QPL library's impl of deflate() that caused interop problems.
>> 
>> With regards,
>> Daniel
>> --
>> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
>> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
>> |: https://entangle-photo.org    -o-
>> https://www.instagram.com/dberrange :|
- * RE: [PATCH 0/5] Live Migration Acceleration with IAA Compression
  2023-10-23 10:38           ` Juan Quintela
@ 2023-10-23 16:32             ` Liu, Yuan1
  0 siblings, 0 replies; 25+ messages in thread
From: Liu, Yuan1 @ 2023-10-23 16:32 UTC (permalink / raw)
  To: quintela@redhat.com
  Cc: Daniel P.Berrangé, Peter Xu, farosas@suse.de,
	leobras@redhat.com, qemu-devel@nongnu.org, Zou, Nanhai
> -----Original Message-----
> From: Juan Quintela <quintela@redhat.com>
> Sent: Monday, October 23, 2023 6:39 PM
> To: Liu, Yuan1 <yuan1.liu@intel.com>
> Cc: Daniel P.Berrangé <berrange@redhat.com>; Peter Xu
> <peterx@redhat.com>; farosas@suse.de; leobras@redhat.com; qemu-
> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA Compression
> 
> "Liu, Yuan1" <yuan1.liu@intel.com> wrote:
> >> -----Original Message-----
> >> From: Daniel P. Berrangé <berrange@redhat.com>
> >> Sent: Thursday, October 19, 2023 11:32 PM
> >> To: Peter Xu <peterx@redhat.com>
> >> Cc: Juan Quintela <quintela@redhat.com>; Liu, Yuan1
> >> <yuan1.liu@intel.com>; farosas@suse.de; leobras@redhat.com; qemu-
> >> devel@nongnu.org; Zou, Nanhai <nanhai.zou@intel.com>
> >> Subject: Re: [PATCH 0/5] Live Migration Acceleration with IAA
> >> Compression
> >>
> >> On Thu, Oct 19, 2023 at 11:23:31AM -0400, Peter Xu wrote:
> >> > On Thu, Oct 19, 2023 at 03:52:14PM +0100, Daniel P. Berrangé wrote:
> >> > > On Thu, Oct 19, 2023 at 01:40:23PM +0200, Juan Quintela wrote:
> >> > > > Yuan Liu <yuan1.liu@intel.com> wrote:
> >> > > > > Hi,
> >> > > > >
> >> > > > > I am writing to submit a code change aimed at enhancing live
> >> > > > > migration acceleration by leveraging the compression
> >> > > > > capability of the Intel In-Memory Analytics Accelerator (IAA).
> >> > > > >
> >> > > > > Enabling compression functionality during the live migration
> >> > > > > process can enhance performance, thereby reducing downtime
> >> > > > > and network bandwidth requirements. However, this improvement
> >> > > > > comes at the cost of additional CPU resources, posing a
> >> > > > > challenge for cloud service providers in terms of resource
> >> > > > > allocation. To address this challenge, I have focused on
> >> > > > > offloading the compression overhead to the IAA hardware,
> >> > > > > resulting in performance gains.
> >> > > > >
> >> > > > > The implementation of the IAA (de)compression code is based
> >> > > > > on Intel Query Processing Library (QPL), an open-source
> >> > > > > software project designed for IAA high-level software programming.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Yuan Liu
> >> > > >
> >> > > > After reviewing the patches:
> >> > > >
> >> > > > - why are you doing this on top of old compression code, that is
> >> > > >   obsolete, deprecated and buggy
> > Some users have not enabled the multifd feature yet, but they will
> > decide whether to enable the compression feature based on the load
> > situation. So I'm wondering if, without multifd, the compression
> > functionality will no longer be available?
> 
> Next pull request will deprecate it.  So in two versions it is going to be gone.
> 
> >> > > > - why are you not doing it on top of multifd.
> 
> > I plan to submit the support for multifd independently because the
> > multifd compression and legacy compression code are separate.
> 
> compression code is really buggy.  I think you should not even try to work on
> top of it.
Sure, I will focus on multifd compression in the future.
> > I looked at the code of multifd about compression. Currently, it uses
> > the CPU synchronous compression mode. Since it is best to use the
> > asynchronous processing method of the hardware accelerator, I would
> > like to get suggestions on the asynchronous implementation.
> 
> I did that on a previous comment.
> Several questions:
> 
> - you are using zlib, right?  When I tested, the longer streams you
>   have, the better compression you get. right?
>   Is there a way to "continue" with the state of the previous job?
> 
>   Old compression code, generates a new context for every packet.
>   Multifd generates a new zlib context for each connection.
Sorry, I'm not familiar with zlib development.
In most cases, the longer the input data, the higher the compression ratio; one reason is that longer data can be encoded more efficiently.
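
As a rough illustration of the per-packet versus per-connection context question, here is a minimal zlib sketch. The workload (16 identical 4 KiB pages) and all variable names are assumptions chosen to make the effect visible; it is not the QEMU multifd code.

```python
import random
import zlib

rng = random.Random(1)
page = bytes(rng.getrandbits(8) for _ in range(4096))
packets = [page] * 16   # hypothetical workload: 16 identical 4 KiB pages

# Fresh context per packet: no history is shared between packets.
per_packet = [zlib.compress(p) for p in packets]

# One context for the whole stream. Z_SYNC_FLUSH ends each packet on a
# byte boundary so it can be sent at once while the history is kept.
comp = zlib.compressobj()
streamed = [comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH) for p in packets]

# The receiver likewise keeps a single decompression context.
decomp = zlib.decompressobj()
restored = [decomp.decompress(s) for s in streamed]

per_packet_total = sum(len(c) for c in per_packet)
streamed_total = sum(len(c) for c in streamed)
```

With a shared context, later packets can reference data from earlier ones, so `streamed_total` comes out smaller than `per_packet_total` for this input.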
Deflate compression has two phases, LZ77 and Huffman coding. As far as I know, zlib can use a static Huffman table or a dynamic Huffman table; the former has higher throughput and the latter a higher compression ratio, but the user cannot supply a Huffman table.
IAA supports this: it has a mode (canned mode) in which compression uses a user-generated Huffman table to improve the compression ratio. This table can also be created by analyzing the input data with the QPL library.
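
For the zlib side of this, a small sketch: zlib does let the caller force static Huffman tables through the `Z_FIXED` strategy, but it has no interface for passing in a caller-built table, which is the part canned mode adds. The input data and the `deflate` helper below are hypothetical, and QPL canned mode itself is not shown.

```python
import random
import zlib

# Z_FIXED is 4 in zlib.h; fall back to that if the constant is not exported.
Z_FIXED = getattr(zlib, "Z_FIXED", 4)

# Skewed symbol distribution, where a data-specific Huffman table helps.
rng = random.Random(2)
data = bytes(rng.choices(b"aaaaaaaabbbbccde", k=20000))


def deflate(payload, strategy):
    """Compress with zlib defaults except for the Huffman strategy."""
    c = zlib.compressobj(6, zlib.DEFLATED, zlib.MAX_WBITS,
                         zlib.DEF_MEM_LEVEL, strategy)
    return c.compress(payload) + c.flush()


fixed = deflate(data, Z_FIXED)                    # static tables only
default = deflate(data, zlib.Z_DEFAULT_STRATEGY)  # zlib chooses per block
```

Both outputs are ordinary zlib streams; comparing `len(fixed)` with `len(default)` shows how much the table choice matters for a given input.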
> > 1. Dirty page scanning and compression pipeline processing, the main
> > thread of live migration submits compression tasks to the hardware,
> > and multifd threads only handle the transmission of compressed pages.
> > 2. Data sending and compression pipeline processing, the Multifd
> > threads submit compression tasks to the hardware and then transmit the
> > compressed data. (A multifd thread job may need to transmit compressed
> > data multiple times.)
> >
> >> > > > You just need to add another compression method on top of multifd.
> >> > > > See how it was done for zstd:
> > Yes, I will refer to zstd to implement multifd compression with IAA
> 
> Basically you can use two approaches here (simplifying a lot)
> - for each channel
>      submit job (512KB)
>      wait for job
>      send compressed stuff
>   And you adjust the number of channels depending on how much
>   concurrency you want.
> 
> 
> - for each channel
>      submit job
>      while (number_of_jobs_submitted > some_threshold)
>         wait_for_job
>         send job
>   Here you need to piggy back in the MULTIFD_FLAG_SYNC to wait for the
>   rest of jobs.
> 
> Each one has its advantages/disadvantages.  With the 1st, it is simpler to do,
> because it is for all effects synchronous, and simpler to "contain" the
> concurrency.
> 
> With the second approach you get much more concurrency, but you need to be
> careful about how much stuff you have in flight.
> 
> Remember that you get queues for each multifd channel.
> How many asynchronous jobs (around 512KB per packet) can current
> hardware handle?  I mean what is the optimal number, around 10, around 50,
> around 100?
Thank you very much for your detailed explanation; I will modify it accordingly.
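
The two per-channel approaches quoted above can be sketched as follows. This is a simplified model under stated assumptions: a thread pool stands in for the accelerator, `zlib.compress` stands in for a hardware job, and `send_synchronous`, `send_pipelined` and `max_in_flight` are hypothetical names, not QEMU or QPL interfaces.

```python
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def send_synchronous(chunks, engine, send):
    """Approach 1: submit one job, wait for it, send the result."""
    for chunk in chunks:
        job = engine.submit(zlib.compress, chunk)
        send(job.result())


def send_pipelined(chunks, engine, send, max_in_flight):
    """Approach 2: keep up to max_in_flight jobs queued on the engine."""
    pending = deque()
    for chunk in chunks:
        pending.append(engine.submit(zlib.compress, chunk))
        while len(pending) > max_in_flight:
            send(pending.popleft().result())
    # Sync point: drain whatever is still in flight (the role the mail
    # above gives to MULTIFD_FLAG_SYNC).
    while pending:
        send(pending.popleft().result())


chunks = [bytes([i]) * (512 * 1024) for i in range(8)]  # 512 KiB "packets"
sent_sync, sent_pipe = [], []
# The thread pool is only a stand-in for the hardware job queue.
with ThreadPoolExecutor(max_workers=4) as engine:
    send_synchronous(chunks, engine, sent_sync.append)
    send_pipelined(chunks, engine, sent_pipe.append, max_in_flight=3)
```

Both functions send packets in submission order; the difference is only how many jobs may be outstanding at once, which is the quantity the threshold question above is about.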
> >> > > I'm not sure that is ideal approach.  IIUC, the IAA/QPL library
> >> > > is not defining a new compression format. Rather it is providing
> >> > > a hardware accelerator for 'deflate' format, as can be made
> >> > > compatible with zlib:
> >> > >
> >> > >
> >> > > https://intel.github.io/qpl/documentation/dev_guide_docs/c_use_cases/deflate/c_deflate_zlib_gzip.html#zlib-and-gzip-compatibility-reference-link
> >> > >
> >> > > With multifd we already have a 'zlib' compression format, and so
> >> > > this IAA/QPL logic would effectively just be providing a second
> >> > > implementation of zlib.
> >> > >
> >> > > Given the use of a standard format, I would expect to be able to
> >> > > use software zlib on the src, mixed with IAA/QPL zlib on the
> >> > > target, or vice versa.
> >> > >
> >> > > IOW, rather than defining a new compression format for this, I
> >> > > think we could look at a new migration parameter for
> >> > >
> >> > > "compression-accelerator": ["auto", "none", "qpl"]
> >> > >
> >> > > with 'auto' the default, such that we can automatically enable
> >> > > IAA/QPL when 'zlib' format is requested, if running on a suitable
> >> > > host.
> >> >
> >> > I was also curious about the format of compression comparing to
> >> > software ones when reading.
> >> >
> >> > Would there be a use case that one would prefer soft compression
> >> > even if hardware accelerator existed, no matter on src/dst?
> >> >
> >> > I'm wondering whether we can avoid that one more parameter but
> >> > always use hardware accelerations as long as possible.
> > I want to add a new compression format (QPL or IAA-Deflate) here. The
> > reasons are as follows:
> > 1. The QPL library already supports both software and hardware paths
> > for compression.
> 
> The question is if IAA-Deflate is compatible with zlib-deflate.
> What are the advantages of QPL software implementation vs zlib?
> - Is it faster?
> - Does it use fewer resources?
Yes, the QPL software path is much faster than zlib. It is based on ISA-L (https://github.com/intel/isa-l), which is fully compatible with zlib and has several times the throughput of zlib.
 
> > The software path uses a fast Deflate compression algorithm, while the
> > hardware path uses IAA.
> 
> Is it faster than zlib?
> And doing all of this asynchronous job dance is not going to be slower than just
> calling the functions in a software implementation?
Yes, basically the asynchronous method will increase latency. I will run some tests based on the multifd solution and reply later.
> > 2. QPL's software and hardware paths are based on the Deflate
> > algorithm, but there is a limitation: the history buffer only supports
> > 4K. The default history buffer for zlib is 32K, which means that IAA
> > cannot decompress zlib-compressed data. However, zlib can decompress
> > IAA-compressed data.
> 
> Aha.  Thanks, that was what we wanted to know.
> 
> > 3. For zlib and zstd, Intel QuickAssist Technology can accelerate both of them.
> 
> Do we have any numbers that we could look at?
> We are interested in three things:
> - how much faster is it
> - how much cpu is saved using IAA
> - how much latency does it add
Sure, I will provide this data with the next version.
> >> Yeah, I did wonder about whether we could avoid a parameter, but then
> >> I'm thinking  it is good to have an escape hatch if we were to find
> >> any flaws in the QPL library's impl of deflate() that caused interop problems.
> >>
> >> With regards,
> >> Daniel
> >> --
> >> |: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
> >> |: https://libvirt.org         -o-            https://fstop138.berrange.com :|
> >> |: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|