All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: "chenyuhui (A)" <chenyuhui5@huawei.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	qemu-devel@nongnu.org, xuyinghua3@huawei.com,
	liheng.liheng@huawei.com, renxuming@huawei.com,
	pengyi.pengyi@huawei.com, yubihong@huawei.com,
	zhengchuan@huawei.com, huhao33@huawei.com,
	caozhongwang1@huawei.com, caiqingmu@huawei.com,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup
Date: Tue, 25 Jul 2023 12:53:01 -0400	[thread overview]
Message-ID: <ZL/97XT8wRq1hsVN@x1n> (raw)
In-Reply-To: <dd678c96-6065-0c53-9543-312a647298e2@huawei.com>

On Tue, Jul 25, 2023 at 04:43:28PM +0800, chenyuhui (A) wrote:
> @Peter Xu @Fabiano Rosas
> Kindly ping on this.

Ah I see what's missing - please copy maintainer (Juan) for any migration
patches, especially multifd ones..  I'm doing that for this one, but I'd
suggest you repost with a whole patch and information put into commit msg.

Thanks.

> 
> On 2023/6/27 9:11, chenyuhui (A) wrote:
> > 
> > On 2023/6/26 21:16, chenyuhui (A) wrote:
> >>
> >> On 2023/6/21 22:22, Fabiano Rosas wrote:
> >>> Jianguo Zhang via <qemu-devel@nongnu.org> writes:
> >>>
> >>>> From: Yuhui Chen <chenyuhui5@huawei.com>
> >>>>
> >>>> There is a coredump while trying to destroy mutex when
> >>>> p->running is false but p->mutex is not unlock.
> >>>> Make sure all mutexes has been released before destroy them.
> >>>>
> >>>> Signed-off-by: Yuhui Chen <chenyuhui5@huawei.com>
> >>>> ---
> >>>>  migration/multifd.c | 6 ++----
> >>>>  1 file changed, 2 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/migration/multifd.c b/migration/multifd.c
> >>>> index b7ad7002e0..7dcdb2d3a0 100644
> >>>> --- a/migration/multifd.c
> >>>> +++ b/migration/multifd.c
> >>>> @@ -523,9 +523,7 @@ void multifd_save_cleanup(void)
> >>>>      for (i = 0; i < migrate_multifd_channels(); i++) {
> >>>>          MultiFDSendParams *p = &multifd_send_state->params[i];
> >>>>  
> >>>> -        if (p->running) {
> >>>
> >>> The need for this flag is dubious IMO. Commit 10351fbad1
> >>> ("migration/multifd: Join all multifd threads in order to avoid leaks")
> >>> already moved the other join outside of it. If we figure out another way
> >>> to deal with the sem_sync lockup we could probably remove this
> >>> altogether.
> >>
> >>
> >> I've seen this commit 10351fbad1, and it's seems to have the same
> >> problem in function multifd_save_cleanup.
> >>
> >> So that may my patch only need to modify multifd_save_cleanup.
> >>
> >> __________________________________________________________________
> >>
> >>
> >> On 2023/6/21 21:24, Peter Xu wrote:
> >>> On Wed, Jun 21, 2023 at 04:18:26PM +0800, Jianguo Zhang via wrote:
> >>>> From: Yuhui Chen<chenyuhui5@huawei.com>
> >>>>
> >>>> There is a coredump while trying to destroy mutex when
> >>>> p->running is false but p->mutex is not unlock.
> >>>> Make sure all mutexes has been released before destroy them.
> >>>
> >>> It'll be nice to add a backtrace of the coredump here, and also copy
> >>> maintainer (Juan Quintela, copied now).
> >>>
> >>
> >> The following is coredump, and my code is base on
> >> https://github.com/qemu/qemu.git tag v6.2.0.
> >>
> > (gdb) bt
> > #0  0x0000ffffabe3b2b8 in  () at /usr/lib64/libc.so.6
> > #1  0x0000ffffabdf6d7c in raise () at /usr/lib64/libc.so.6
> > #2  0x0000ffffabde4d2c in abort () at /usr/lib64/libc.so.6
> > #3  0x0000aaaac67fcc10 in error_exit (err=<optimized out>, msg=msg@entry=0xaaaac6dc52b8 <__func__.33> "qemu_mutex_destroy") at ../util/qemu-thread-posix.c:38
> > #4  0x0000aaaac67fce38 in qemu_mutex_destroy (mutex=mutex@entry=0xaaaafa1a4250) at ../util/qemu-thread-posix.c:71
> > #5  0x0000aaaac6055688 in multifd_save_cleanup () at ../migration/multifd.c:555
> > #6  0x0000aaaac6050198 in migrate_fd_cleanup (s=s@entry=0xaaaaf7518800) at ../migration/migration.c:1808
> > #7  0x0000aaaac6050384 in migrate_fd_cleanup_bh (opaque=0xaaaaf7518800) at ../migration/migration.c:1850
> > #8  0x0000aaaac680d790 in aio_bh_call (bh=0xffffa0004c40) at ../util/async.c:141
> > #9  aio_bh_poll (ctx=ctx@entry=0xaaaaf73285a0) at ../util/async.c:169
> > #10 0x0000aaaac67f9e18 in aio_dispatch (ctx=0xaaaaf73285a0) at ../util/aio-posix.c:381
> > #11 0x0000aaaac680d414 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:311
> > #12 0x0000ffffac44cf88 in g_main_context_dispatch () at /usr/lib64/libglib-2.0.so.0
> > #13 0x0000aaaac6819214 in glib_pollfds_poll () at ../util/main-loop.c:232
> > #14 os_host_main_loop_wait (timeout=735000000) at ../util/main-loop.c:255
> > #15 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:531
> > #16 0x0000aaaac65005cc in qemu_main_loop () at ../softmmu/runstate.c:726
> > #17 0x0000aaaac5fe2030 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50
> > (gdb) q
> > 
> >> How reproducible:
> >> 1、And sleep time to produce p->running is false but p->mutex is
> >>  not unlock.(apply following patch)
> >> 2、Do migration with --parallel-connections.
> >>>> From: Yuhui Chen <chenyuhui5@huawei.com>
> >> Date: Mon, 26 Jun 2023 14:24:35 +0800
> >> Subject: [DEBUG][PATCH] And sleep time to produce p->running is false but p->mutex is
> >>  not unlock.
> >>
> >> ---
> >>  migration/multifd.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/migration/multifd.c b/migration/multifd.c
> >> index 7c9deb1921..09a7b0748a 100644
> >> --- a/migration/multifd.c
> >> +++ b/migration/multifd.c
> >> @@ -538,6 +538,7 @@ void multifd_save_cleanup(void)
> >>      for (i = 0; i < migrate_multifd_channels(); i++) {
> >>          MultiFDSendParams *p = &multifd_send_state->params[i];
> >>
> >> +        sleep(2);
> >>          if (p->running) {
> >>              qemu_thread_join(&p->thread);
> >>          }
> >> @@ -719,6 +720,7 @@ out:
> >>
> >>      qemu_mutex_lock(&p->mutex);
> >>      p->running = false;
> >> +    sleep(20);
> >>      qemu_mutex_unlock(&p->mutex);
> >>
> >>      rcu_unregister_thread();
> 

-- 
Peter Xu



  reply	other threads:[~2023-07-25 16:54 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-21  8:18 [PATCH] migrate/multifd: fix coredump when the multifd thread cleanup z00619469 via
2023-06-21 13:24 ` Peter Xu
2023-06-21 14:22 ` Fabiano Rosas
2023-06-26 13:16   ` chenyuhui (A) via
2023-06-27  1:11     ` chenyuhui (A) via
2023-07-25  8:43       ` chenyuhui (A) via
2023-07-25 16:53         ` Peter Xu [this message]
2023-07-27  1:46           ` chenyuhui (A) via
2023-08-14 14:55             ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZL/97XT8wRq1hsVN@x1n \
    --to=peterx@redhat.com \
    --cc=caiqingmu@huawei.com \
    --cc=caozhongwang1@huawei.com \
    --cc=chenyuhui5@huawei.com \
    --cc=farosas@suse.de \
    --cc=huhao33@huawei.com \
    --cc=liheng.liheng@huawei.com \
    --cc=pengyi.pengyi@huawei.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=renxuming@huawei.com \
    --cc=xuyinghua3@huawei.com \
    --cc=yubihong@huawei.com \
    --cc=zhengchuan@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.