qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Nina Schoetterl-Glausch <nsg@linux.ibm.com>
Cc: qemu-devel@nongnu.org, Prasad Pandit <ppandit@redhat.com>,
	Fabiano Rosas <farosas@suse.de>, Bandan Das <bdas@redhat.com>,
	Julia Suvorova <jusual@redhat.com>,
	Thomas Huth <thuth@redhat.com>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact()
Date: Wed, 20 Mar 2024 14:57:17 -0400	[thread overview]
Message-ID: <ZfsxjS4Phkf34f4J@x1n> (raw)
In-Reply-To: <bc88f4572649084cd38a0880d7e096a939f80b14.camel@linux.ibm.com>

On Wed, Mar 20, 2024 at 06:51:26PM +0100, Nina Schoetterl-Glausch wrote:
> On Wed, 2024-01-17 at 15:58 +0800, peterx@redhat.com wrote:
> > From: Peter Xu <peterx@redhat.com>
> > 
> > When the migration frameworks fetches the exact pending sizes, it means
> > this check:
> > 
> >   remaining_size < s->threshold_size
> > 
> > Must have been done already, actually at migration_iteration_run():
> > 
> >     if (must_precopy <= s->threshold_size) {
> >         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
> > 
> > That should be after one round of ram_state_pending_estimate().  It makes
> > the 2nd check meaningless and can be dropped.
> > 
> > To say it in another way, when reaching ->state_pending_exact(), we
> > unconditionally sync dirty bits for precopy.
> > 
> > Then we can drop migrate_get_current() there too.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> Hi Peter,

Hi, Nina,

> 
> could you have a look at this issue:
> https://gitlab.com/qemu-project/qemu/-/issues/1565
> 
> which I reopened. Previous thread here:
> 
> https://lore.kernel.org/qemu-devel/20230324184129.3119575-1-nsg@linux.ibm.com/
> 
> I'm seeing migration failures with s390x TCG again, which look the same to me
> as those a while back.

I'm still quite confused how that could be caused of this.

What you described in the previous bug report seems to imply some page was
leftover in migration so some page got corrupted after migrated.

However what this patch mostly does is it can sync more than before even if
I overlooked the condition check there (I still think the check is
redundant, there's one outlier when remaining_size == threshold_size, but I
don't think it should matter here as of now).  It'll make more sense if
this patch made the sync less, but that's not the case but vice versa.

> 
> > ---
> >  migration/ram.c | 9 ++++-----
> >  1 file changed, 4 insertions(+), 5 deletions(-)
> > 
> > diff --git a/migration/ram.c b/migration/ram.c
> > index c0cdcccb75..d5b7cd5ac2 100644
> > --- a/migration/ram.c
> > +++ b/migration/ram.c
> > @@ -3213,21 +3213,20 @@ static void ram_state_pending_estimate(void *opaque, uint64_t *must_precopy,
> >  static void ram_state_pending_exact(void *opaque, uint64_t *must_precopy,
> >                                      uint64_t *can_postcopy)
> >  {
> > -    MigrationState *s = migrate_get_current();
> >      RAMState **temp = opaque;
> >      RAMState *rs = *temp;
> > +    uint64_t remaining_size;
> >  
> > -    uint64_t remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> > -
> > -    if (!migration_in_postcopy() && remaining_size < s->threshold_size) {
> > +    if (!migration_in_postcopy()) {
> >          bql_lock();
> >          WITH_RCU_READ_LOCK_GUARD() {
> >              migration_bitmap_sync_precopy(rs, false);
> >          }
> >          bql_unlock();
> > -        remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> >      }
> >  
> > +    remaining_size = rs->migration_dirty_pages * TARGET_PAGE_SIZE;
> > +
> >      if (migrate_postcopy_ram()) {
> >          /* We can do postcopy, and all the data is postcopiable */
> >          *can_postcopy += remaining_size;
> 
> This basically reverts 28ef5339c3 ("migration: fix ram_state_pending_exact()"), which originally
> made the issue disappear.
> 
> Any thoughts on the matter appreciated.

In the previous discussion, you mentioned that you bisected to the commit
and also verified the fix.  Now you also mentioned in the bz that you can't
reporduce this bug manually.

Is it still possible to be reproduced with some scripts?  Do you also mean
that it's harder to reproduce comparing to before?  In all cases, some way
to reproduce it would definitely be helpful.

Even if we want to revert this change, we'll need to know whether this will
fix your case so we need something to verify it before a revert.  I'll
consider that the last though as I had a feeling this is papering over
something else.

Thanks,

-- 
Peter Xu



  parent reply	other threads:[~2024-03-20 18:58 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17  7:58 [PATCH 0/3] migration: some small cleanups peterx
2024-01-17  7:58 ` [PATCH 1/3] migration: Make threshold_size an uint64_t peterx
2024-01-17  8:09   ` Philippe Mathieu-Daudé
2024-01-19 13:26   ` Fabiano Rosas
2024-01-17  7:58 ` [PATCH 2/3] migration: Drop unnecessary check in ram's pending_exact() peterx
2024-01-19 13:25   ` Fabiano Rosas
2024-03-20 17:51   ` Nina Schoetterl-Glausch
2024-03-20 18:05     ` Nina Schoetterl-Glausch
2024-03-20 18:57     ` Peter Xu [this message]
2024-03-20 19:21       ` Nina Schoetterl-Glausch
2024-03-20 19:46         ` Peter Xu
2024-03-20 20:45           ` Peter Xu
2024-01-17  7:58 ` [PATCH 3/3] analyze-migration.py: Remove trick on parsing ramblocks peterx
2024-01-19 13:23   ` Fabiano Rosas
2024-01-20  2:36 ` [PATCH 0/3] migration: some small cleanups Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZfsxjS4Phkf34f4J@x1n \
    --to=peterx@redhat.com \
    --cc=bdas@redhat.com \
    --cc=farosas@suse.de \
    --cc=jusual@redhat.com \
    --cc=nsg@linux.ibm.com \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).