All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "Wang, Wei W" <wei.w.wang@intel.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>,
	Juan Quintela <quintela@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Leonardo Bras Soares Passos <lsoaresp@redhat.com>
Subject: Re: [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty()
Date: Tue, 6 Jul 2021 13:47:06 -0400	[thread overview]
Message-ID: <YOSXGpEB323VWepM@t490s> (raw)
In-Reply-To: <562b42cbd5674853af21be3297fbaada@intel.com>

On Sat, Jul 03, 2021 at 02:53:27AM +0000, Wang, Wei W wrote:
> On Friday, July 2, 2021 3:07 PM, David Hildenbrand wrote:
> > On 02.07.21 04:48, Wang, Wei W wrote:
> > > On Thursday, July 1, 2021 10:22 PM, David Hildenbrand wrote:
> > >> On 01.07.21 14:51, Peter Xu wrote:
> > 
> > I think that clearly shows the issue.
> > 
> > My theory I did not verify yet: Assume we have 1GB chunks in the clear bmap.
> > Assume the VM reports all pages within a 1GB chunk as free (easy with a fresh
> > VM). While processing hints, we will clear the bits from the dirty bmap in the
> > RAMBlock. As we will never migrate any page of that 1GB chunk, we will not
> > actually clear the dirty bitmap of the memory region. When re-syncing, we will
> > set all bits bits in the dirty bmap again from the dirty bitmap in the memory
> > region. Thus, many of our hints end up being mostly ignored. The smaller the
> > clear bmap chunk, the more extreme the issue.
> 
> OK, that looks possible. We need to clear the related bits from the memory region
> bitmap before skipping the free pages. Could you try with below patch:
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index ace8ad431c..a1f6df3e6c 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -811,6 +811,26 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
>      return next;
>  }
> 
> +
> +static void migration_clear_memory_region_dirty_bitmap(RAMState *rs,
> +                                                       RAMBlock *rb,
> +                                                      unsigned long page)
> +{
> +    uint8_t shift;
> +    hwaddr size, start;
> +
> +    if (!rb->clear_bmap || !clear_bmap_test_and_clear(rb, page))
> +        return;
> +
> +    shift = rb->clear_bmap_shift;
> +    assert(shift >= 6);
> +
> +    size = 1ULL << (TARGET_PAGE_BITS + shift);
> +    start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> +    trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> +    memory_region_clear_dirty_bitmap(rb->mr, start, size);
> +}
> +
>  static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>                                                  RAMBlock *rb,
>                                                  unsigned long page)
> @@ -827,26 +847,9 @@ static inline bool migration_bitmap_clear_dirty(RAMState *rs,
>       * the page in the chunk we clear the remote dirty bitmap for all.
>       * Clearing it earlier won't be a problem, but too late will.
>       */
> -    if (rb->clear_bmap && clear_bmap_test_and_clear(rb, page)) {
> -        uint8_t shift = rb->clear_bmap_shift;
> -        hwaddr size = 1ULL << (TARGET_PAGE_BITS + shift);
> -        hwaddr start = (((ram_addr_t)page) << TARGET_PAGE_BITS) & (-size);
> -
> -        /*
> -         * CLEAR_BITMAP_SHIFT_MIN should always guarantee this... this
> -         * can make things easier sometimes since then start address
> -         * of the small chunk will always be 64 pages aligned so the
> -         * bitmap will always be aligned to unsigned long.  We should
> -         * even be able to remove this restriction but I'm simply
> -         * keeping it.
> -         */
> -        assert(shift >= 6);
> -        trace_migration_bitmap_clear_dirty(rb->idstr, start, size, page);
> -        memory_region_clear_dirty_bitmap(rb->mr, start, size);
> -    }
> +    migration_clear_memory_region_dirty_bitmap(rs, rb, page);
> 
>      ret = test_and_clear_bit(page, rb->bmap);
> -
>      if (ret) {
>          rs->migration_dirty_pages--;
>      }
> @@ -2746,7 +2749,7 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>  {
>      RAMBlock *block;
>      ram_addr_t offset;
> -    size_t used_len, start, npages;
> +    size_t used_len, start, npages, page_to_clear, i = 0;
>      MigrationState *s = migrate_get_current();
> 
>      /* This function is currently expected to be used during live migration */
> @@ -2775,6 +2778,19 @@ void qemu_guest_free_page_hint(void *addr, size_t len)
>          start = offset >> TARGET_PAGE_BITS;
>          npages = used_len >> TARGET_PAGE_BITS;
> 
> +        /*
> +         * The skipped free pages are equavelent to be sent from clear_bmap's
> +        * perspective, so clear the bits from the memory region bitmap which
> +        * are initially set. Otherwise those skipped pages will be sent in
> +        * the next round after syncing from the memory region bitmap.
> +        */
> +        /*
> +         * The skipped free pages are equavelent to be sent from clear_bmap's
> +        * perspective, so clear the bits from the memory region bitmap which
> +        * are initially set. Otherwise those skipped pages will be sent in
> +        * the next round after syncing from the memory region bitmap.
> +        */
> +       do {
> +            page_to_clear = start + (i++ << block->clear_bmap_shift);

Why "i" needs to be shifted?

> +            migration_clear_memory_region_dirty_bitmap(ram_state,
> +                                                       block,
> +                                                       page_to_clear);
> +       } while (i <= npages >> block->clear_bmap_shift);

I agree with David that this should be better put into the mutex section, if so
we'd also touch up comment for bitmap_mutex.  Or is it a reason to explicitly
not do so?

> +
>          qemu_mutex_lock(&ram_state->bitmap_mutex);
>          ram_state->migration_dirty_pages -=
>                        bitmap_count_one_with_offset(block->bmap, start, npages);

After my patch (move mutex out of migration_bitmap_clear_dirty()), maybe we can
call migration_bitmap_clear_dirty() directly here rather than introducing a new
helper?  It'll done all the dirty/clear bmap ops including dirty page accounting.

Thanks,

-- 
Peter Xu



  parent reply	other threads:[~2021-07-06 17:48 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-30 20:08 [PATCH] migration: Move bitmap_mutex out of migration_bitmap_clear_dirty() Peter Xu
2021-07-01  4:42 ` Wang, Wei W
2021-07-01 12:51   ` Peter Xu
2021-07-01 14:21     ` David Hildenbrand
2021-07-02  2:48       ` Wang, Wei W
2021-07-02  7:06         ` David Hildenbrand
2021-07-03  2:53           ` Wang, Wei W
2021-07-05 13:41             ` David Hildenbrand
2021-07-06  9:41               ` Wang, Wei W
2021-07-06 10:05                 ` David Hildenbrand
2021-07-06 17:39                   ` Peter Xu
2021-07-07 12:45                     ` Wang, Wei W
2021-07-07 16:45                       ` Peter Xu
2021-07-07 23:25                         ` Wang, Wei W
2021-07-08  0:21                           ` Peter Xu
2021-07-06 17:47             ` Peter Xu [this message]
2021-07-07  8:34               ` Wang, Wei W
2021-07-07 16:54                 ` Peter Xu
2021-07-08  2:55                   ` Wang, Wei W
2021-07-08 18:10                     ` Peter Xu
2021-07-02  2:29     ` Wang, Wei W
2021-07-06 17:59       ` Peter Xu
2021-07-07  8:33         ` Wang, Wei W
2021-07-07 16:44           ` Peter Xu
2021-07-08  2:49             ` Wang, Wei W
2021-07-08 18:30               ` Peter Xu
2021-07-09  8:58                 ` Wang, Wei W
2021-07-09 14:48                   ` Peter Xu
2021-07-13  8:20                     ` Wang, Wei W
2021-07-03 16:31 ` Lukas Straub
2021-07-04 14:14   ` Lukas Straub
2021-07-06 18:37     ` Peter Xu
2021-07-13  8:40 ` Wang, Wei W
2021-07-13 10:22   ` David Hildenbrand
2021-07-14  5:03     ` Wang, Wei W
2021-07-13 15:59   ` Peter Xu
2021-07-14  5:04     ` Wang, Wei W

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YOSXGpEB323VWepM@t490s \
    --to=peterx@redhat.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=lsoaresp@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=wei.w.wang@intel.com \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.