From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([208.118.235.92]:60899) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1gbK72-0000cW-NA for qemu-devel@nongnu.org; Mon, 24 Dec 2018 01:53:20 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1gbK6x-0004xb-OE for qemu-devel@nongnu.org; Mon, 24 Dec 2018 01:53:16 -0500 Received: from mx2.suse.de ([195.135.220.15]:38876 helo=mx1.suse.de) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1gbK6x-0004vr-ER for qemu-devel@nongnu.org; Mon, 24 Dec 2018 01:53:11 -0500 References: <20181211095057.14623-1-fli@suse.com> <20181211095057.14623-7-fli@suse.com> <87y38tc2fb.fsf@dusky.pond.sub.org> <3645fb54-3651-f63b-c416-b22634e1f992@suse.com> <87zht1keso.fsf@dusky.pond.sub.org> <1d626d01-e503-f554-3af8-0706830f245d@suse.com> <20181224033454.GB29835@xz-x1> From: Fei Li Message-ID: <038cd81d-d12a-9f86-fdb5-3bf10e0093ee@suse.com> Date: Mon, 24 Dec 2018 14:53:03 +0800 MIME-Version: 1.0 In-Reply-To: <20181224033454.GB29835@xz-x1> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Peter Xu Cc: Markus Armbruster , Juan Quintela , "Dr . David Alan Gilbert" , QEMU Developers On 12/24/2018 11:34 AM, Peter Xu wrote: > On Fri, Dec 21, 2018 at 05:36:57PM +0800, Fei Li wrote: >> On 12/19/2018 08:14 PM, Fei Li wrote: >>> On 12/19/2018 06:10 PM, Markus Armbruster wrote: >>>> Fei Li writes: >>>> >>>>> On 12/13/2018 03:26 PM, Markus Armbruster wrote: >>>>>> There's a question for David Gibson inline.=C2=A0 Please search fo= r /ppc/. >>>>>> >>>>>> Fei Li writes: >>>>>> >>>>>>> Make qemu_thread_create() return a Boolean to indicate if it succ= eeds >>>>>>> rather than failing with an error. And add an Error parameter to = hold >>>>>>> the error message and let the callers handle it. >>>>>> The "rather than failing with an error" is misleading. Before the >>>>>> patch, we report to stderr and abort().=C2=A0 What about: >>>>>> >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qemu-thread: Make qemu_thread_crea= te() handle errors properly >>>>>> >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qemu_thread_create() abort()s on e= rror.=C2=A0 Not nice. Give it a >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return value and an Error ** argum= ent, so it can >>>>>> return success / >>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 failure. >>>>> A nice commit-amend! Thanks! >>>>>> Still missing from the commit message then: how you update >>>>>> the callers. >>>>> Yes, agree. I think the-how should also be noted here, like >>>>> - propagating the err to callers whose call trace already have the >>>>> Error paramater; >>>>> - just add an &error_abort for qemu_thread_create() and make it a >>>>> "TODO: xxx"; >>>>>> Let's see below. >>>>>> >>>>>>> Cc: Markus Armbruster >>>>>>> Cc: Daniel P. Berrang=C3=A9 >>>>>>> Cc: Dr. David Alan Gilbert >>>>>>> Signed-off-by: Fei Li >>>>>>> --- >>>>>>> =C2=A0=C2=A0 cpus.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 | 45 >>>>>>> ++++++++++++++++++++++++------------- >>>>>>> =C2=A0=C2=A0 dump.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 |=C2=A0 6 +++-- >>>>>>> =C2=A0=C2=A0 hw/misc/edu.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 6 +++-- >>>>>>> =C2=A0=C2=A0 hw/ppc/spapr_hcall.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 | 10 +++++++-- >>>>>>> =C2=A0=C2=A0 hw/rdma/rdma_backend.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= |=C2=A0 4 +++- >>>>>>> =C2=A0=C2=A0 hw/usb/ccid-card-emulated.c | 16 ++++++++++---- >>>>>>> =C2=A0=C2=A0 include/qemu/thread.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 |=C2=A0 4 ++-- >>>>>>> =C2=A0=C2=A0 io/task.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0= 3 ++- >>>>>>> =C2=A0=C2=A0 iothread.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | 16 +++++++= ++----- >>>>>>> =C2=A0=C2=A0 migration/migration.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 | 54 >>>>>>> +++++++++++++++++++++++++++++---------------- >>>>>>> =C2=A0=C2=A0 migration/postcopy-ram.c=C2=A0=C2=A0=C2=A0 | 14 +++= +++++++-- >>>>>>> =C2=A0=C2=A0 migration/ram.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | 40 ++++++++++++++++++++++++-------= -- >>>>>>> =C2=A0=C2=A0 migration/savevm.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 | 11 ++++++--- >>>>>>> =C2=A0=C2=A0 tests/atomic_add-bench.c=C2=A0=C2=A0=C2=A0 |=C2=A0 = 3 ++- >>>>>>> =C2=A0=C2=A0 tests/iothread.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 2 +- >>>>>>> =C2=A0=C2=A0 tests/qht-bench.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 3 ++- >>>>>>> =C2=A0=C2=A0 tests/rcutorture.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0 3 ++- >>>>>>> =C2=A0=C2=A0 tests/test-aio.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 2 +- >>>>>>> =C2=A0=C2=A0 tests/test-rcu-list.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 |=C2=A0 3 ++- >>>>>>> =C2=A0=C2=A0 ui/vnc-jobs.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | 17 +++++++++----- >>>>>>> =C2=A0=C2=A0 ui/vnc-jobs.h=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 2 +- >>>>>>> =C2=A0=C2=A0 ui/vnc.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2= =A0 4 +++- >>>>>>> =C2=A0=C2=A0 util/compatfd.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | 12 ++++++++-- >>>>>>> =C2=A0=C2=A0 util/oslib-posix.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 | 17 ++++++++++---- >>>>>>> =C2=A0=C2=A0 util/qemu-thread-posix.c=C2=A0=C2=A0=C2=A0 | 24 +++= ++++++++++------- >>>>>>> =C2=A0=C2=A0 util/qemu-thread-win32.c=C2=A0=C2=A0=C2=A0 | 16 +++= +++++++---- >>>>>>> =C2=A0=C2=A0 util/rcu.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 3 ++= - >>>>>>> =C2=A0=C2=A0 util/thread-pool.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 |=C2=A0 4 +++- >>>>>>> =C2=A0=C2=A0 28 files changed, 243 insertions(+), 101 deletions(= -) >>>>>>> >> ...snip, and only leave the three uncertain small topics... >>>>>>> diff --git a/migration/ram.c b/migration/ram.c >>>>>>> index 658dfa88a3..6e0cccf066 100644 >>>>>>> --- a/migration/ram.c >>>>>>> +++ b/migration/ram.c >>>>>>> @@ -473,6 +473,7 @@ static void compress_threads_save_cleanup(voi= d) >>>>>>> =C2=A0=C2=A0 static int compress_threads_save_setup(void) >>>>>>> =C2=A0=C2=A0 { >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int i, thread_count; >>>>>>> +=C2=A0=C2=A0=C2=A0 Error *local_err =3D NULL; >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!migrate_us= e_compression()) { >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret= urn 0; >>>>>>> @@ -502,9 +503,12 @@ static int compress_threads_save_setup(void) >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 com= p_param[i].quit =3D false; >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qem= u_mutex_init(&comp_param[i].mutex); >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qem= u_cond_init(&comp_param[i].cond); >>>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qemu_thread_create(co= mpress_threads + i, "compress", >>>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 do_data_compress, comp_param + i, >>>>>>> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0 QEMU_THREAD_JOINABLE); >>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (!qemu_thread_crea= te(compress_threads + i, "compress", >>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 do_data_compress, comp_para= m + i, >>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 QEMU_THREAD_JOINABLE, &loca= l_err)) { >>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= error_reportf_err(local_err, "failed to >>>>>>> create do_data_compress: "); >>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= goto exit; > [1] > >>>>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>>>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return 0; >>>>>> Reviewing the migration changes is getting tiresome... >>>>> Yes, indeed, the migration involves a lot! Thanks so much for helpi= ng >>>>> to review! >>>>>> =C2=A0=C2=A0 Is reporting the >>>>>> error appropriate here, and why? >>>>> I think the qemu monitor should display the obvious and exact faili= ng >>>>> reason for administrators, esp considering that qemu_thread_create(= ) >>>>> itself does not print any message thus we have no idea which direct >>>>> function fails if gdb is not enabled. >>>>> IOW, I think David's answer to that ppc's error_reportf_err() also >>>>> apply here: >>>>> >>>>> "The error returns are for the guest, the reported errors are for t= he >>>>> guest administrator or management layers." >>>> There could well be an issue with the "management layers" part. Shou= ld >>>> this error be sent to the management layer via QMP somehow? Migratio= n >>>> maintainers should be able to assist with this question. >> Kindly ping migration maintainers. :) > I think both the maintainers are on holiday so possibly there won't be > any reply from them this week... :) > > Regarding to error reports of migration via QMP layer, please have a > look at d59ce6f344 ("migration: add reporting of errors for outgoing > migration", 2016-05-26). Though I see that even > qemu_savevm_state_setup() is not capturing error for the management > layer so if you want to pass this thread creation error upward you'll > possibly need to work on that as well. Thanks for the useful commit. :) I guess the "the client app" mentioned=20 is not qemu, but other upper thing, maybe something inside openstack? As I have to=20 say that I can see the error message (I mean the above error_reportf_err(...) ) be=20 printed to the screen when I use qemu command line via hmp to do the migration. For the qemu_savevm_state_setup(), I see it sets the f->last_error=20 (instead of s->error) to indicate whether to stop the migration or not when back to=20 migration_thread() in migration_detect_error(s). And no matter whether=20 qemu_savevm_state_setup() succeeds, the current code continues to set the migration state to be=20 ACTIVE. Emm, I am wondering whether this is on purpose.. > Though here note that when you "goto exit" at [1] you probably also > need to touch up the cleanup part since otherwise the join() could be > with an invalid thread ID, so you'll possibly need to check the thread > ID validity before do the join() of the compression thread. Thanks for pointing this out. I think my last patch is to fix this=20 problem, that is to add a check in qemu_thread_join(): +=C2=A0=C2=A0=C2=A0 if (!thread->thread) { +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return NULL; +=C2=A0=C2=A0=C2=A0 } Correct me if this is not the proper solution. :) Have a nice day, thanks :) Fei > > Regards, >