From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57303) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fnQXL-0007LJ-8Q for qemu-devel@nongnu.org; Wed, 08 Aug 2018 11:38:15 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fnQXK-0007qh-7l for qemu-devel@nongnu.org; Wed, 08 Aug 2018 11:38:11 -0400 References: <20180807043349.27196-1-jsnow@redhat.com> <20180807043349.27196-3-jsnow@redhat.com> <20180808040253.GE755222@localhost.localdomain> <20180808152301.GC15410@localhost.localdomain> From: John Snow Message-ID: Date: Wed, 8 Aug 2018 11:38:04 -0400 MIME-Version: 1.0 In-Reply-To: <20180808152301.GC15410@localhost.localdomain> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 02/21] jobs: add exit shim List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf , Jeff Cody Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, Max Reitz , Markus Armbruster , "Dr. David Alan Gilbert" , Eric Blake On 08/08/2018 11:23 AM, Kevin Wolf wrote: > Am 08.08.2018 um 06:02 hat Jeff Cody geschrieben: >> On Tue, Aug 07, 2018 at 12:33:30AM -0400, John Snow wrote: >>> Most jobs do the same thing when they leave their running loop: >>> - Store the return code in a structure >>> - wait to receive this structure in the main thread >>> - signal job completion via job_completed >>> >>> More seriously, when we utilize job_defer_to_main_loop_bh to call >>> a function that calls job_completed, job_finalize_single will run >>> in a context where it has recursively taken the aio_context lock, >>> which can cause hangs if it puts down a reference that causes a flush. >>> >>> The job infrastructure is perfectly capable of registering job >>> completion itself when we leave the job's entry point. In this >>> context, we can signal job completion from outside of the aio_context, >>> which should allow for job cleanup code to run with only one lock. >>> >>> Signed-off-by: John Snow >> >> I like the simplification, both in SLOC and in exit logic (as seen in >> patches 3-7). > > I agree, unifying this seems like a good idea. > > Like in the first patch, I'm not convinced of the details, though. > Essentially, this is my objection regarding job->err extended to > job->ret: You rely on jobs setting job->ret and job->err, but the > interfaces don't really show this. > >>> @@ -546,6 +559,12 @@ static void coroutine_fn job_co_entry(void *opaque) >>> assert(job && job->driver && job->driver->start); >>> job_pause_point(job); >>> job->driver->start(job); >> >> One nit-picky observation here, that is unrelated to this patch: reading >> through, it may not be so obvious that 'start' is really a 'run' or >> 'execute', (linguistically, to me 'start' implies a kick-off rather than >> ongoing execution). > > I had exactly the same thought. My proposal is to change the existing... > > CoroutineEntry *start; > > ...which is just short for... > > void coroutine_fn start(void *opaque); > > ...into this one: > > int coroutine_fn run(void *opaque, Error **errp); > > I see that at the end of the series, you actually introduced an int > return value already. I would have done that from the start, but as long > the final state makes sense, I won't insist. > > But can we have the Error **errp addition, too? Pretty please? > > Kevin > I'm actually glad you want that addition, I was considering very strongly adding it but I felt like I had made the series long enough already and didn't want to change too much all at once. The basic thought was just: "It'd sure be nice to have a generic function entry point that looks like it returns the same error information as our non-coroutine functions." I can absolutely work that in, and break this series into two parts: (1) Rework jobs infrastructure to use the new run signature, and (2) Rework jobs to use the finalization callbacks. Sound good? --js