* [PATCH 1/1] migration/multifd: solve zero page causing multiple page faults
2024-04-01 15:41 [PATCH 0/1] Solve zero page causing multiple page faults Yuan Liu
@ 2024-04-01 15:41 ` Yuan Liu
2024-04-02 12:57 ` Fabiano Rosas
2024-04-02 7:43 ` [PATCH 0/1] Solve " Liu, Yuan1
1 sibling, 1 reply; 5+ messages in thread
From: Yuan Liu @ 2024-04-01 15:41 UTC (permalink / raw)
To: peterx, farosas; +Cc: qemu-devel, hao.xiang, bryan.zhang, yuan1.liu, nanhai.zou
Track the pages received by multifd in the RAMBlock's receivedmap
(recvbitmap).

If a zero page arrives for an offset that is not yet set in the
recvbitmap, the page is neither checked nor written; only its bit is
set in the recvbitmap, since freshly allocated destination memory
already reads as zero.

If the offset is already set in the recvbitmap, there is no need to
check the page content: the data is set to 0 directly, because a page
that has been migrated more than once is unlikely to still be a zero
page on the destination.
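
In rough pseudo-code (a restatement of the multifd_recv_zero_page_process()
hunk below, not additional code), the per-offset decision becomes:

    for (int i = 0; i < p->zero_num; i++) {
        void *page = p->host + p->zero[i];
        if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
            /* Page was received before: its content is probably stale,
             * so clear it directly instead of paying a read fault for
             * buffer_is_zero(). */
            memset(page, 0, p->page_size);
        } else {
            /* First time this page is seen: anonymous destination memory
             * already reads as zero, so only mark it as received. */
            ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
        }
    }
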
Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
---
migration/multifd-zero-page.c | 4 +++-
migration/multifd-zlib.c | 1 +
migration/multifd-zstd.c | 1 +
migration/multifd.c | 1 +
migration/ram.c | 4 ++++
migration/ram.h | 1 +
6 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/migration/multifd-zero-page.c b/migration/multifd-zero-page.c
index 1ba38be636..e1b8370f88 100644
--- a/migration/multifd-zero-page.c
+++ b/migration/multifd-zero-page.c
@@ -80,8 +80,10 @@ void multifd_recv_zero_page_process(MultiFDRecvParams *p)
{
for (int i = 0; i < p->zero_num; i++) {
void *page = p->host + p->zero[i];
- if (!buffer_is_zero(page, p->page_size)) {
+ if (ramblock_recv_bitmap_test_byte_offset(p->block, p->zero[i])) {
memset(page, 0, p->page_size);
+ } else {
+ ramblock_recv_bitmap_set_offset(p->block, p->zero[i]);
}
}
}
diff --git a/migration/multifd-zlib.c b/migration/multifd-zlib.c
index 8095ef8e28..6246ecca2b 100644
--- a/migration/multifd-zlib.c
+++ b/migration/multifd-zlib.c
@@ -288,6 +288,7 @@ static int zlib_recv(MultiFDRecvParams *p, Error **errp)
int flush = Z_NO_FLUSH;
unsigned long start = zs->total_out;
+ ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
if (i == p->normal_num - 1) {
flush = Z_SYNC_FLUSH;
}
diff --git a/migration/multifd-zstd.c b/migration/multifd-zstd.c
index 9c9217794e..989333b572 100644
--- a/migration/multifd-zstd.c
+++ b/migration/multifd-zstd.c
@@ -282,6 +282,7 @@ static int zstd_recv(MultiFDRecvParams *p, Error **errp)
z->in.pos = 0;
for (i = 0; i < p->normal_num; i++) {
+ ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
z->out.dst = p->host + p->normal[i];
z->out.size = p->page_size;
z->out.pos = 0;
diff --git a/migration/multifd.c b/migration/multifd.c
index 72712fc31f..c9f544dba0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -277,6 +277,7 @@ static int nocomp_recv(MultiFDRecvParams *p, Error **errp)
for (int i = 0; i < p->normal_num; i++) {
p->iov[i].iov_base = p->host + p->normal[i];
p->iov[i].iov_len = p->page_size;
+ ramblock_recv_bitmap_set_offset(p->block, p->normal[i]);
}
return qio_channel_readv_all(p->c, p->iov, p->normal_num, errp);
}
diff --git a/migration/ram.c b/migration/ram.c
index 8deb84984f..3aa70794c1 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -275,6 +275,10 @@ void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr,
nr);
}
+void ramblock_recv_bitmap_set_offset(RAMBlock *rb, uint64_t byte_offset)
+{
+ set_bit_atomic(byte_offset >> TARGET_PAGE_BITS, rb->receivedmap);
+}
#define RAMBLOCK_RECV_BITMAP_ENDING (0x0123456789abcdefULL)
/*
diff --git a/migration/ram.h b/migration/ram.h
index 08feecaf51..bc0318b834 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -69,6 +69,7 @@ int ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr);
bool ramblock_recv_bitmap_test_byte_offset(RAMBlock *rb, uint64_t byte_offset);
void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr);
void ramblock_recv_bitmap_set_range(RAMBlock *rb, void *host_addr, size_t nr);
+void ramblock_recv_bitmap_set_offset(RAMBlock *rb, uint64_t byte_offset);
int64_t ramblock_recv_bitmap_send(QEMUFile *file,
const char *block_name);
bool ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *rb, Error **errp);
--
2.39.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* RE: [PATCH 0/1] Solve zero page causing multiple page faults
2024-04-01 15:41 [PATCH 0/1] Solve zero page causing multiple page faults Yuan Liu
2024-04-01 15:41 ` [PATCH 1/1] migration/multifd: solve " Yuan Liu
@ 2024-04-02 7:43 ` Liu, Yuan1
1 sibling, 0 replies; 5+ messages in thread
From: Liu, Yuan1 @ 2024-04-02 7:43 UTC (permalink / raw)
To: peterx@redhat.com, farosas@suse.de
Cc: qemu-devel@nongnu.org, hao.xiang@bytedance.com,
bryan.zhang@bytedance.com, Zou, Nanhai
> -----Original Message-----
> From: Liu, Yuan1 <yuan1.liu@intel.com>
> Sent: Monday, April 1, 2024 11:41 PM
> To: peterx@redhat.com; farosas@suse.de
> Cc: qemu-devel@nongnu.org; hao.xiang@bytedance.com;
> bryan.zhang@bytedance.com; Liu, Yuan1 <yuan1.liu@intel.com>; Zou, Nanhai
> <nanhai.zou@intel.com>
> Subject: [PATCH 0/1] Solve zero page causing multiple page faults
>
> 1. Description of multiple page faults for received zero pages
>    a. The -mem-prealloc feature and hugepage backend are not enabled on
>       the destination.
>    b. After receiving zero pages, the destination first determines
>       whether the current page content is 0 via buffer_is_zero(), which
>       may cause a read page fault.
>
> perf record -e page-faults information below
>   13.75%  13.75%  multifdrecv_0  qemu-system-x86_64  [.] buffer_zero_avx512
>   11.85%  11.85%  multifdrecv_1  qemu-system-x86_64  [.] buffer_zero_avx512
>             multifd_recv_thread
>             nocomp_recv
>             multifd_recv_zero_page_process
>             buffer_is_zero
>             select_accel_fn
>             buffer_zero_avx512
>
>    c. Other page faults mainly come from writing operations to normal
>       and zero pages.
>
> 2. Solution
>    a. During the multifd migration process, the received pages are
>       tracked through the RAMBlock's receivedmap.
>
>    b. If a received zero page is not yet set in the receivedmap, the
>       destination will not check whether the page content is 0, thus
>       avoiding a read fault.
>
>    c. If the zero page has already been set in the receivedmap, the
>       page is set to 0 directly.
>
>       There are two reasons for this:
>       1. A page is unlikely to still be a zero page once it has been
>          sent one or more times.
>       2. The first time the destination receives a zero page, the page
>          must already be zero, so there is no need to scan it in the
>          first round.
>
> 3. Test Result: 16 vCPUs and 64G memory VM, multifd number is 2, and
>    100G network bandwidth
>
> 3.1 Test case: 16 vCPUs are idle and only 2G memory is used
> +-----------+--------+--------+----------+
> |MultiFD    | total  |downtime| Page     |
> |Nocomp     | time   |        | Faults   |
> |           | (ms)   | (ms)   |          |
> +-----------+--------+--------+----------+
> |with       |        |        |          |
> |recvbitmap |    7335|     180|      2716|
> +-----------+--------+--------+----------+
> |without    |        |        |          |
> |recvbitmap |    7771|     153|    121357|
> +-----------+--------+--------+----------+
>
> +-----------+--------+--------+--------+-------+--------+-------------+
> |MultiFD    | total  |downtime| SVM    |SVM    | IOTLB  | IO PageFault|
> |QPL        | time   |        | IO TLB |IO Page| MaxTime| MaxTime     |
> |           | (ms)   | (ms)   | Flush  |Faults | (us)   | (us)        |
> +-----------+--------+--------+--------+-------+--------+-------------+
> |with       |        |        |        |       |        |             |
> |recvbitmap |   10224|     175|     410|  27429|       1|          447|
> +-----------+--------+--------+--------+-------+--------+-------------+
> |without    |        |        |        |       |        |             |
> |recvbitmap |   11253|     153|   80756|  38655|      25|        18349|
> +-----------+--------+--------+--------+-------+--------+-------------+
>
>
> 3.2 Test case: 16 vCPUs are idle and 56G memory (not zero) is used
> +-----------+--------+--------+----------+
> |MultiFD    | total  |downtime| Page     |
> |Nocomp     | time   |        | Faults   |
> |           | (ms)   | (ms)   |          |
> +-----------+--------+--------+----------+
> |with       |        |        |          |
> |recvbitmap |   16825|     165|     52967|
> +-----------+--------+--------+----------+
> |without    |        |        |          |
> |recvbitmap |   12987|     159|   2672677|
> +-----------+--------+--------+----------+
>
> +-----------+--------+--------+--------+-------+--------+-------------+
> |MultiFD    | total  |downtime| SVM    |SVM    | IOTLB  | IO PageFault|
> |QPL        | time   |        | IO TLB |IO Page| MaxTime| MaxTime     |
> |           | (ms)   | (ms)   | Flush  |Faults | (us)   | (us)        |
> +-----------+--------+--------+--------+-------+--------+-------------+
> |with       |        |        |        |       |        |             |
> |recvbitmap |  132315|      77|     890| 937105|      60|         9581|
> +-----------+--------+--------+--------+-------+--------+-------------+
> |without    |        |        |        |       |        |             |
> |recvbitmap | >138333|     N/A| 1647701| 981899|      43|        21018|
> +-----------+--------+--------+--------+-------+--------+-------------+
>
>
> From the test results, both page faults and IOTLB flush operations can
> be significantly reduced. The reason is that zero-page processing no
> longer triggers read faults, and a large number of zero pages do not
> even trigger write faults (Test 3.1), because the content of pages that
> have not been accessed since the destination was started is assumed to
> be 0.
>
> I have a concern here: the RAM memory is allocated by mmap with the
> anonymous flag, and if the first received zero page is not set to 0
> explicitly, does this guarantee that the received zero page's memory
> content is 0?
I got the answer here, from the mmap(2) man page:

MAP_ANONYMOUS
    The mapping is not backed by any file; its contents are initialized
    to zero. The fd argument is ignored; however, some implementations
    require fd to be -1 if MAP_ANONYMOUS (or MAP_ANON) is specified, and
    portable applications should ensure this. The offset argument should
    be zero. The use of MAP_ANONYMOUS in conjunction with MAP_SHARED is
    supported on Linux only since kernel 2.4.
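
As a quick standalone sanity check of that guarantee (my own sketch, not
part of the patch), a private anonymous mapping reads back as zero
without any explicit memset:

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 4096;
    /* Private anonymous mapping, like guest RAM without -mem-prealloc:
     * the kernel guarantees the contents are initialized to zero. */
    uint8_t *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    for (size_t i = 0; i < len; i++) {
        assert(p[i] == 0);  /* reads as zero, no write ever happened */
    }
    munmap(p, len);
    return 0;
}

Note that even this read maps in the shared zero page via a minor fault,
which is exactly the access the recvbitmap change avoids for first-time
zero pages.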
> In this case, the performance impact of live migration is not big
> because the destination is not the bottleneck.
>
> When using QPL (an SVM-capable device), even though the IOTLB behavior
> is improved, overall performance is still seriously degraded because a
> large number of IO page faults are still generated.
>
> Previous discussion link:
> 1. https://lore.kernel.org/all/CAAYibXib+TWnJpV22E=adncdBmwXJRqgRjJXK7X71J=bDfaxDg@mail.gmail.com/
> 2. https://lore.kernel.org/all/PH7PR11MB594123F7EEFEBFCE219AF100A33A2@PH7PR11MB5941.namprd11.prod.outlook.com/
>
> Yuan Liu (1):
> migration/multifd: solve zero page causing multiple page faults
>
> migration/multifd-zero-page.c | 4 +++-
> migration/multifd-zlib.c | 1 +
> migration/multifd-zstd.c | 1 +
> migration/multifd.c | 1 +
> migration/ram.c | 4 ++++
> migration/ram.h | 1 +
> 6 files changed, 11 insertions(+), 1 deletion(-)
>
> --
> 2.39.3
^ permalink raw reply [flat|nested] 5+ messages in thread