From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=51866 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P2n6w-0007og-Ph for qemu-devel@nongnu.org; Mon, 04 Oct 2010 11:37:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P2n6q-0000o9-D3 for qemu-devel@nongnu.org; Mon, 04 Oct 2010 11:37:54 -0400 Received: from mail-gy0-f173.google.com ([209.85.160.173]:57162) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P2n6q-0000nZ-AS for qemu-devel@nongnu.org; Mon, 04 Oct 2010 11:37:48 -0400 Received: by gya1 with SMTP id 1so2409553gya.4 for ; Mon, 04 Oct 2010 08:37:46 -0700 (PDT) Message-ID: <4CA9F4AA.7020007@codemonkey.ws> Date: Mon, 04 Oct 2010 10:37:14 -0500 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Qemu-devel] qemu_cond_signal() taking a long time to complete. References: <20101004104030.GA22130@linux.vnet.ibm.com> In-Reply-To: <20101004104030.GA22130@linux.vnet.ibm.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: arun@linux.vnet.ibm.com Cc: mohan@in.ibm.com, sripathik@in.ibm.com, jvrao@linux.vnet.ibm.com, qemu-devel@nongnu.org, aneesh.kumar@linux.vnet.ibm.com On 10/04/2010 05:40 AM, Arun R Bharadwaj wrote: > Hi, > > I am working on introducing threading model into Qemu. This introduces > the Threadlets infrastructure which allows subsystems to offload possibly > blocking work to a queue to be processed by a pool of threads asynchrnonously. > Threadlets are useful when there are operations that can be performed > outside the context of the VCPU/IO threads inorder to free these latter > to service any other guest requests. As of now we have converted a few > v9fs calls like v9fs_read, v9fs_write etc to this model to test the > working nature of this model. > > I observed that performance is degrading in the threading model for the > following reason: > > Taking the example of v9fs_read call: We submit the blocking work in > v9fs_read to a queue and do a qemu_cond_signal(). the work will be picked > up by a worker thread which is waiting on the condition variable to go > true. I measured the time taken to execute the v9fs_read call; in the > case without the threading model, it takes around 15microseconds to > complete, while in the threading model, it takes around 30microsends > to complete. Most of this extra time (around 22microsends) is spent in > completing the qemu_cond_signal() call. I suspect this is the reason why > I am seeing performance hit with the threading model, because this > time is much more than the time needed to complete the entire > v9fs_read call in the non threading model case. > > I need advice on how to proceed from this situation. Pasting relevant > code snippets below. > > thanks > arun. > --- > > /* Code to sumbit work to the queue */ > void submit_threadlet_to_queue(ThreadletQueue *queue, ThreadletWork > *work) > { > qemu_mutex_lock(&(queue->lock)); > if (queue->idle_threads == 0&& queue->cur_threads< > queue->max_threads) { > spawn_threadlet(queue); > } > QTAILQ_INSERT_TAIL(&(queue->request_list), work, node); > qemu_cond_signal(&(queue->cond)); > qemu_mutex_unlock(&(queue->lock)); > } > Try moving qemu_cond_signal past qemu_mutex_unlock(). Regards, Anthony Liguori > /* Worker thread code */ > static void *threadlet_worker(void *data) > { > ThreadletQueue *queue = data; > > while (1) { > ThreadletWork *work; > int ret = 0; > qemu_mutex_lock(&(queue->lock)); > > while (QTAILQ_EMPTY(&(queue->request_list))&& > (ret != ETIMEDOUT)) { > ret = qemu_cond_timedwait(&(queue->cond), > &(queue->lock), 10*100000); > } > > assert(queue->idle_threads != 0); > if (QTAILQ_EMPTY(&(queue->request_list))) { > if (queue->cur_threads> queue->min_threads) { > /* We retain the minimum number of threads */ > break; > } > } else { > work = QTAILQ_FIRST(&(queue->request_list)); > QTAILQ_REMOVE(&(queue->request_list), work, node); > > queue->idle_threads--; > qemu_mutex_unlock(&(queue->lock)); > > /* execute the work function */ > work->func(work); > > qemu_mutex_lock(&(queue->lock)); > queue->idle_threads++; > } > qemu_mutex_unlock(&(queue->lock)); > } > > queue->idle_threads--; > queue->cur_threads--; > > return NULL; > } > >