From: Max Reitz <mreitz@redhat.com>
To: John Snow <jsnow@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org
Cc: kwolf@redhat.com, Jeff Cody <jcody@redhat.com>, jtc@redhat.com
Subject: Re: [Qemu-devel] [PATCH 3/7] jobs: add exit shim
Date: Wed, 29 Aug 2018 10:16:30 +0200
Message-ID: <050a91d3-8b64-fe69-4f96-a21e7ba89c68@redhat.com>
In-Reply-To: <2535fb2a-7079-f6cc-88d6-e25780691b7f@redhat.com>

On 2018-08-27 17:54, John Snow wrote:
> 
> 
> On 08/25/2018 09:05 AM, Max Reitz wrote:
>> On 2018-08-22 23:52, John Snow wrote:
>>>
>>>
>>> On 08/22/2018 07:43 AM, Max Reitz wrote:
>>>> On 2018-08-17 21:04, John Snow wrote:
>>>>> All jobs do the same thing when they leave their running loop:
>>>>> - Store the return code in a structure
>>>>> - wait to receive this structure in the main thread
>>>>> - signal job completion via job_completed
>>>>>
>>>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>>>> logic for a net reduction in SLOC.
>>>>>
>>>>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>>>>> a function that calls job_completed, job_finalize_single will run
>>>>> in a context where it has recursively taken the aio_context lock,
>>>>> which can cause hangs if it puts down a reference that causes a flush.
>>>>>
>>>>> You can observe this in practice by looking at mirror_exit's careful
>>>>> placement of job_completed and bdrv_unref calls.
>>>>>
>>>>> If we centralize job exiting, we can signal job completion from outside
>>>>> of the aio_context, which should allow for job cleanup code to run with
>>>>> only one lock, which makes cleanup callbacks less tricky to write.
>>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> ---
>>>>>  include/qemu/job.h |  7 +++++++
>>>>>  job.c              | 19 +++++++++++++++++++
>>>>>  2 files changed, 26 insertions(+)
>>>>
>>>> Currently all jobs do this, the question of course is why.  The answer
>>>> is because they are block jobs that need to do some graph manipulation
>>>> in the main thread, right?
>>>>
>>>
>>> Yep.
>>>
>>>> OK, that's reasonable enough, that sounds like even non-block jobs may
>>>> need this (i.e. modify some global qemu state that you can only do in
>>>> the main loop).  Interestingly, the create job only calls
>>>> job_completed() of which it says nowhere that it needs to be executed in
>>>> the main loop.
>>>>
>>>
>>> Yeah, not all jobs will have anything meaningful to do in the main loop
>>> context. This is one of them.
>>>
>>>> ...on second thought, do we really want to execute job_completed() in the
>>>> main loop?  First of all, all of the transactional functions will run in
>>>> the main loop.  Which makes sense, but it isn't noted anywhere.
>>>> Secondly, we may end up calling JobDriver.user_resume(), which is
>>>> probably not something we want to call in the main loop.
>>>>
>>>
>>> I think we need to execute job_completed in the main loop, or otherwise
>>> restructure the code that can run between job_completed and
>>> job_finalize_single so that .prepare/.commit/.abort/.clean run in the
>>> main thread, which is something we want to preserve.
>>
>> Sure.
>>
>>> It's simpler just to say that complete will run from the main thread,
>>> like it does presently.
>>
>> Yes, but we don't say that.
>>
>>> Why would we not want to call user_resume from the main loop? That's
>>> directly where it's called from, since it gets invoked directly from the
>>> qmp thread.
>>
>> Hmm!  True indeed.
>>
>> The reason why we might not want to do it is because the job may not run
>> in the main loop, so modifying the job (especially invoking a job
>> method) may be dangerous without taking precautions.
>>
>>>> OTOH, job_finish_sync() is something that has to be run in the main loop
>>>> because it polls the main loop (and as far as my FUSE experiments have
>>>> told me, polling a foreign AioContext doesn't work).
>>>>
>>>> So...  I suppose it would be nice if we had a real distinction which
>>>> functions are run in which AioContext.  It seems like we indeed want to
>>>> run job_completed() in the main loop, but what to do about the
>>>> user_resume() call in job_cancel_async()?
>>>>
>>>
>>> I don't think we need to do anything -- at least, these functions
>>> *already* run from the main loop.
>>
>> Yeah, but we don't mark that anywhere.  I really don't like that.  Jobs
>> need to know which of their functions are run in which AioContext.
>>
>>> mirror_exit et al get scheduled from job_defer_to_main_loop and call
>>> job_completed there, so it's already always done from the main loop; I'm
>>> just cutting out the part where the jobs have to manually schedule this.
>>
>> I'm not saying what you're doing is wrong, I'm just saying tracking
>> which things are running in which context is not easy because there are
>> no comments on how it's supposed to be run.  (Apart from your new
>> .exit() method which does say that it's run in the main loop.)
>>
>> No, I don't find it obvious which functions are run in which context
>> when I first have to work out in which context those functions are
>> used (e.g. user_resume is usually the result of a QMP command, so it's
>> run in the main loop; the transactional methods are part of completion,
>> which is done in the main loop, so they are also called in the main
>> loop; and so on).
>>
>> But that's not part of this series.  It just occurred to me when
>> tracking down which function belongs to which context when reviewing
>> this patch.
>>
>> Max
>>
> 
> Oh, I see. I can mark up the functions I/we expect to run in the main
> thread with comments above the function implementation, would that help?

Sure, that's exactly what I mean. :-)
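
As a rough sketch of what such per-function context comments could look like, here is a toy driver whose callbacks state the context they expect and check it with an assertion. ToyJob, ToyJobDriver, toy_job_execute and the in_main_loop flag are made-up names for illustration only, not the real job.h API; real code would have to consult the actual AioContext rather than a global flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for "which context is the caller in"; real code
 * would check the current AioContext instead of a global flag. */
static bool in_main_loop = true;

typedef struct ToyJob {
    int ret;
    bool completed;
} ToyJob;

typedef struct ToyJobDriver {
    /* Runs in the job's own context, NOT in the main loop. */
    int (*run)(ToyJob *job);

    /* Runs in the main loop, after run() has returned. */
    void (*exit)(ToyJob *job);
} ToyJobDriver;

/* Called from the job's own context. */
static int toy_run(ToyJob *job)
{
    assert(!in_main_loop);
    job->ret = 0;
    return job->ret;
}

/* Called from the main loop. */
static void toy_exit(ToyJob *job)
{
    assert(in_main_loop);
    job->completed = true;
}

static const ToyJobDriver toy_driver = {
    .run = toy_run,
    .exit = toy_exit,
};

/* Drive one job through both phases, switching the pretend context so
 * that each callback sees the context its comment promises. */
static int toy_job_execute(const ToyJobDriver *drv, ToyJob *job)
{
    int ret;

    in_main_loop = false;
    ret = drv->run(job);

    in_main_loop = true;
    drv->exit(job);
    return ret;
}
```

The comment above each callback carries the documentation; the assertions are only an optional way to keep those comments honest.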

> Probably a top-level document would also help... We're overdue for
> one after all the changes recently.

If you have the time, sure.

Max


