From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1JhNfM-0001kD-UU
	for qemu-devel@nongnu.org; Thu, 03 Apr 2008 07:31:37 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1JhNfJ-0001gC-Tl
	for qemu-devel@nongnu.org; Thu, 03 Apr 2008 07:31:35 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1JhNfJ-0001fu-JN
	for qemu-devel@nongnu.org; Thu, 03 Apr 2008 07:31:33 -0400
Received: from mail2.shareable.org ([80.68.89.115])
	by monty-python.gnu.org with esmtps (TLS-1.0:RSA_AES_256_CBC_SHA1:32)
	(Exim 4.60) (envelope-from <jamie@shareable.org>) id 1JhNfJ-000123-FF
	for qemu-devel@nongnu.org; Thu, 03 Apr 2008 07:31:33 -0400
Received: from jamie by mail2.shareable.org with local (Exim 4.63)
	(envelope-from <jamie@shareable.org>) id 1JhNfF-0004nP-5e
	for qemu-devel@nongnu.org; Thu, 03 Apr 2008 12:31:29 +0100
Date: Thu, 3 Apr 2008 12:31:29 +0100
From: Jamie Lokier <jamie@shareable.org>
Subject: Re: [Qemu-devel] QEMU/KVM SCSI lock up
Message-ID: <20080403113128.GA17900@shareable.org>
References: <87k5jfixcb.fsf@fftw.org> <47F49770.1010906@qumranet.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <47F49770.1010906@qumranet.com>
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

Avi Kivity wrote:
> >    bdrv_aio_read()
> >    r->sector += n;
> >    r->sector_count -= n;
> >
> >For reasons that I do not fully understand, bdrv_aio_read() does
> >not return immediately, but instead it calls scsi_read_data()
> >recursively.
> 
> What happens (I think) is that bdrv_aio_read() completes immediately, 
> calls the completion callback, which starts a read for the next batch of 
> sectors.
> 
> Long term we want to replace the recursion by queuing.

Ah yes.  Reminds me of a bug in a program of mine with asynchronous
sockets, which sometimes completes immediately.  My async connection
function took a callback argument.  The callback was called
immediately if the connect returned success immediately, otherwise was
queued.

Then I wrote some code like this:

     // some stuff
     my_obj->connector = async_connection (address, my_obj_connect_cb, my_obj);
     // some more stuff

     void my_obj_connect_cb (void * arg)
     {
        MyObj * my_obj = arg;
        do_stuff_with (my_obj->connector);
     }

That worked for several months flawlessly, the code was deployed on
500 units around the country, then one day an async connect happened
to complete fast enough that the callback was called recursively, and
a core dump followed.

I learned a few lessons about async callbacks:

    1. Queue them, even when the async operation completes immediately.
       It's really not worth the "optimisation" of immediate callback,
       especially if the code for queuing is already there.

       Even when immediate callback is documented, mistakes are too
       easy to make when calling it, because the async call pattern
       _looks_ safe, and may work most of the time or always with some
       configurations.

    2. If you really must have recursion, e.g. perhaps the synchronous
       path is common and a hot path, beware that this is prone to
       mistakes in usage.  Check that it's worth the optimisation.
       Then document with words like "warning" the need for special
       care.

    3. If there is immediate callback possible, sometimes it's useful
       API to have a flag argument to enable that, so that always
       queueing is an option, and so the programmer has to think
       about consequences if they set the flag.  E.g.:

          async_id = async_func (args, callback, cb_arg, CB_ALWAYS_QUEUED);
          async_id = async_func (args, callback, cb_arg, CB_MAYBE_IMMEDIATE);

    3. The code pattern "async_id = async_func (args, callback,
       callback_arg)" looks very natural, but it doesn't work when
       there is immediate callback.

       Callback is called before the id can be stored anywhere.  If
       the async_id is still meaningful at the point of callback, the
       API is fundamentally broken.

Just a few notes on async callback API pitfalls I've had...

-- Jamie