Message-ID: <4C977EC1.9010605@redhat.com>
Date: Mon, 20 Sep 2010 17:33:21 +0200
From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes
References: <1284991010-10951-1-git-send-email-kwolf@redhat.com> <4C977028.3050602@codemonkey.ws> <4C977626.4040806@codemonkey.ws>
In-Reply-To: <4C977626.4040806@codemonkey.ws>
To: Anthony Liguori
Cc: qemu-devel@nongnu.org

On 20.09.2010 16:56, Anthony Liguori wrote:
>>> +void blkqueue_flush(BlockQueue *bq)
>>> +{
>>> +    qemu_mutex_lock(&bq->flush_lock);
>>> +
>>> +    /* Process any left over requests */
>>> +    while (QTAILQ_FIRST(&bq->queue)) {
>>> +        blkqueue_process_request(bq);
>>> +    }
>>> +
>>> +    qemu_mutex_unlock(&bq->flush_lock);
>>> +}
>>> +
>>> +static void *blkqueue_thread(void *_bq)
>>> +{
>>> +    BlockQueue *bq = _bq;
>>> +#ifndef RUN_TESTS
>>> +    BlockQueueRequest *req;
>>> +#endif
>>> +
>>> +    qemu_mutex_lock(&bq->flush_lock);
>>> +    while (!bq->thread_done) {
>>> +        barrier();
>
> A barrier shouldn't be needed here.

It was needed when I started with an empty thread because gcc would
"optimize" while (!bq->thread_done) into an endless loop. I guess there
is enough code added now that gcc won't try to be clever any more, so I
can remove that.

>>> +#ifndef RUN_TESTS
>>> +        req = QTAILQ_FIRST(&bq->queue);
>>> +
>>> +        /* Don't process barriers, we only do that on flushes */
>>> +        if (req && (req->type != REQ_TYPE_BARRIER ||
>>> bq->queue_size > 42)) {
>>> +            blkqueue_process_request(bq);
>>> +        } else {
>>> +            qemu_cond_wait(&bq->cond, &bq->flush_lock);
>>> +        }
>
> The normal pattern for this is:
>
>     while (!condition) {
>         qemu_cond_wait(&cond, &lock);
>     }
>
>     process_request()
>
> It's generally best not to deviate from this pattern in terms of code
> readability.

Hm, yes, I think you're right. The code used to be a bit more involved
here initially and it seems that I missed the last obvious piece of
simplification.

> A less invasive way of doing this (assuming we're okay with it from a
> correctness perspective) is to make use of qemu_aio_wait() as a
> replacement for qemu_mutex_lock() and shift the pread/pwrite calls to
> bdrv_aio_write/bdrv_aio_read.
>
> IOW, blkqueue_pwrite stages a request via bdrv_aio_write().
> blkqueue_pread() either returns a cached read or it does a
> bdrv_pread(). The blkqueue_flush() call will then do qemu_aio_wait()
> to wait for all pending I/Os to complete.

I was actually considering that, but it would have been a bit more
coding to keep track of another queue of in-flight requests, juggling
with some more AIOCBs and implementing an emulation for the missing
bdrv_aio_pwrite. Nothing really dramatic, it just was easier to start
this way.
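To make the comparison concrete, here is a rough sketch of what that
staging could look like. Everything in it is illustrative only: the
struct, function and field names are invented for this sketch, the
bdrv_aio_writev()/qemu_aio_wait() calls are assumed from the block layer
API of the time, and it sticks to sector-aligned writes, so the
byte-granular case is exactly the missing bdrv_aio_pwrite emulation
mentioned above.

/*
 * Sketch only: BlockQueueAio, blkqueue_aio_stage_write() and
 * blkqueue_aio_flush() are hypothetical names, not part of the posted
 * patch.  Assumes the block layer declarations from block.h.
 */
#include <errno.h>
#include "qemu-common.h"
#include "block.h"

typedef struct BlockQueueAio {
    BlockDriverState *bs;   /* image the queued metadata writes go to */
    int in_flight;          /* staged AIO requests not yet completed */
} BlockQueueAio;

/* Completion callback: one staged write has finished. */
static void blkqueue_aio_cb(void *opaque, int ret)
{
    BlockQueueAio *bq = opaque;

    /* A real implementation would remember ret and report it on flush. */
    bq->in_flight--;
}

/* Stage a write by submitting it as AIO and counting it as in-flight. */
static int blkqueue_aio_stage_write(BlockQueueAio *bq, int64_t sector_num,
                                    QEMUIOVector *qiov, int nb_sectors)
{
    BlockDriverAIOCB *acb;

    acb = bdrv_aio_writev(bq->bs, sector_num, qiov, nb_sectors,
                          blkqueue_aio_cb, bq);
    if (acb == NULL) {
        return -EIO;
    }

    bq->in_flight++;
    return 0;
}

/* Flush by draining the AIO context until nothing is in flight. */
static void blkqueue_aio_flush(BlockQueueAio *bq)
{
    while (bq->in_flight > 0) {
        qemu_aio_wait();
    }
}

The attraction of this variant is that flushing reduces to draining the
AIO context with qemu_aio_wait() instead of coordinating with a separate
worker thread.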
If we come to the conclusion that bdrv_aio_write is the way to go and
it's worth the work, I'm fine with changing it.

Kevin