From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <51E56A1A.50502@redhat.com>
Date: Tue, 16 Jul 2013 17:43:22 +0200
From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH] [RFC] aio/async: Add timed bottom-halves
To: Alex Bligh
Cc: Kevin Wolf, Anthony Liguori, qemu-devel@nongnu.org, Stefan Hajnoczi, rth@twiddle.net

On 16/07/2013 17:29, Alex Bligh wrote:
> Paolo,
>
> --On 16 July 2013 09:34:20 +0200 Paolo Bonzini wrote:
>
>>>> You did.  But aio_wait() ignores the timeout.  It is only used by the
>>>> main loop.
>>>
>>> OK, well that seems worth fixing in any case, as even without timed BHs
>>> that means no BH can be executed for an indeterminate time.  I'll have
>>> a look at that.
>>
>> No, BHs work because they do aio_notify().  Idle BHs can be skipped for
>> an indeterminate time, but that's fine because idle BHs are a hack that
>> we should not need at all.
>
> OK, so a bit more code reading later, I think I now understand.
>
> 1. Non-idle BHs call aio_notify() at schedule time, which will cause
>    any poll to exit immediately because at least one FD will be ready.
>
> 2. Idle BHs do not do aio_notify(), because we don't care whether
>    they get stuck in aio_poll(), and they are a hack [actually I think
>    we could do better here].
>
> 3. aio_poll() calls aio_bh_poll().  If this returns true, this indicates
>    at least one non-idle BH exists, which causes aio_poll() not to
>    block.

No, this indicates that at least one scheduled non-idle BH exist*ed*,
which causes aio_poll() not to block (because some progress has been
made).  There could be non-idle BHs scheduled during aio_poll() but not
visited by aio_bh_poll().  These rely on aio_notify() (it could be an
idle BH that scheduled this non-idle BH, so aio_bh_poll() might return
0).

>    Question 1: it then calls aio_dispatch - if this itself
>    generates non-idle BHs, this would seem not to clear the blocking
>    flag.  Does this rely on aio_notify()?

Yes.

>    Question 2: if we're already telling aio_poll() not to block
>    by the presence of non-idle BHs as detected in aio_bh_poll(),
>    why do we need to use aio_notify() too?  I.e. why do we have both
>    the blocking= logic AND the aio_notify() logic?

See above (newly scheduled BHs are always handled with aio_notify();
the blocking=false logic is for previously scheduled BHs).

> 4. aio_poll() then calls g_poll() (POSIX) or WaitForMultipleObjects()
>    (Windows).  However, the timeout is either 0 or infinite.
>    Both functions take a millisecond (yuck) timeout, but that
>    is not used.

I agree with the yuck. :)  But Linux has the nanosecond-resolution
ppoll(), too.

> So, the first thing I don't understand is why aio_poll() needs the
> return value of aio_bh_poll() at all.
> Firstly, after sampling it,
> it then causes aio_dispatch, and that can presumably set its own
> bottom half callbacks; if this happens 'int blocking' won't be
> cleared, and it will still enter g_poll with an infinite timeout.
> Secondly, there seems to be an entirely separate mechanism
> (aio_notify) in any case.  If a non-idle BH has been scheduled,
> this will cause g_poll to exit immediately as a read will be
> ready.  I believe this is cleared by the BH being used.

I hope the above answers this.

> The second thing I don't understand is why we aren't using
> the timeout on g_poll / WaitForMultipleObjects.

Because so far it wasn't needed (insert rant about idle BHs being a
hack).  This is a good occasion to start using it.  But I wouldn't
introduce a new one-off concept (almost as much of a hack as idle BHs);
I would rather reuse as much code as possible from QEMUTimer/QEMUClock.
I must admit I don't have a clear idea of what the API would look like.

> It would
> seem to be reasonably easy to make aio_poll call aio_ctx_prepare
> or something that does the same calculation.  This would fix
> idle BHs to be more reliable (we know it's safe to call them
> within aio_poll anyway; it's just a question of whether
> we turn an infinite wait into a 10ms wait).

Idle BHs could be changed to timers as well, and then they would
disappear.

Paolo

> Perhaps these two are related.
>
> I /think/ fixing the second (and removing the aio_notify
> from qemu_bh_schedule_at) is sufficient, provided it checks
> for scheduled BHs immediately prior to the poll.  This assumes
> other threads cannot schedule BHs.  This would seem to be less
> intrusive than a TimedEventNotifier approach which (as far as I
> can see) requires another thread.