From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=56518 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P2iTJ-0001DW-9K for qemu-devel@nongnu.org; Mon, 04 Oct 2010 06:40:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P2iTI-0008Ag-0j for qemu-devel@nongnu.org; Mon, 04 Oct 2010 06:40:41 -0400 Received: from e23smtp01.au.ibm.com ([202.81.31.143]:60977) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P2iTH-0008AW-FF for qemu-devel@nongnu.org; Mon, 04 Oct 2010 06:40:39 -0400 Received: from d23relay04.au.ibm.com (d23relay04.au.ibm.com [202.81.31.246]) by e23smtp01.au.ibm.com (8.14.4/8.13.1) with ESMTP id o94AbYvu024208 for ; Mon, 4 Oct 2010 21:37:34 +1100 Received: from d23av01.au.ibm.com (d23av01.au.ibm.com [9.190.234.96]) by d23relay04.au.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o94AeaNo2211994 for ; Mon, 4 Oct 2010 21:40:36 +1100 Received: from d23av01.au.ibm.com (loopback [127.0.0.1]) by d23av01.au.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id o94AeZ3Q009011 for ; Mon, 4 Oct 2010 21:40:36 +1100 Date: Mon, 4 Oct 2010 16:10:30 +0530 From: Arun R Bharadwaj Message-ID: <20101004104030.GA22130@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Subject: [Qemu-devel] qemu_cond_signal() taking a long time to complete. Reply-To: arun@linux.vnet.ibm.com List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: qemu-devel@nongnu.org Cc: sripathik@in.ibm.com, arun , mohan@in.ibm.com, jvrao@linux.vnet.ibm.com, aneesh.kumar@linux.vnet.ibm.com Hi, I am working on introducing threading model into Qemu. This introduces the Threadlets infrastructure which allows subsystems to offload possibly blocking work to a queue to be processed by a pool of threads asynchrnonously. Threadlets are useful when there are operations that can be performed outside the context of the VCPU/IO threads inorder to free these latter to service any other guest requests. As of now we have converted a few v9fs calls like v9fs_read, v9fs_write etc to this model to test the working nature of this model. I observed that performance is degrading in the threading model for the following reason: Taking the example of v9fs_read call: We submit the blocking work in v9fs_read to a queue and do a qemu_cond_signal(). the work will be picked up by a worker thread which is waiting on the condition variable to go true. I measured the time taken to execute the v9fs_read call; in the case without the threading model, it takes around 15microseconds to complete, while in the threading model, it takes around 30microsends to complete. Most of this extra time (around 22microsends) is spent in completing the qemu_cond_signal() call. I suspect this is the reason why I am seeing performance hit with the threading model, because this time is much more than the time needed to complete the entire v9fs_read call in the non threading model case. I need advice on how to proceed from this situation. Pasting relevant code snippets below. thanks arun. --- /* Code to sumbit work to the queue */ void submit_threadlet_to_queue(ThreadletQueue *queue, ThreadletWork *work) { qemu_mutex_lock(&(queue->lock)); if (queue->idle_threads == 0 && queue->cur_threads < queue->max_threads) { spawn_threadlet(queue); } QTAILQ_INSERT_TAIL(&(queue->request_list), work, node); qemu_cond_signal(&(queue->cond)); qemu_mutex_unlock(&(queue->lock)); } /* Worker thread code */ static void *threadlet_worker(void *data) { ThreadletQueue *queue = data; while (1) { ThreadletWork *work; int ret = 0; qemu_mutex_lock(&(queue->lock)); while (QTAILQ_EMPTY(&(queue->request_list)) && (ret != ETIMEDOUT)) { ret = qemu_cond_timedwait(&(queue->cond), &(queue->lock), 10*100000); } assert(queue->idle_threads != 0); if (QTAILQ_EMPTY(&(queue->request_list))) { if (queue->cur_threads > queue->min_threads) { /* We retain the minimum number of threads */ break; } } else { work = QTAILQ_FIRST(&(queue->request_list)); QTAILQ_REMOVE(&(queue->request_list), work, node); queue->idle_threads--; qemu_mutex_unlock(&(queue->lock)); /* execute the work function */ work->func(work); qemu_mutex_lock(&(queue->lock)); queue->idle_threads++; } qemu_mutex_unlock(&(queue->lock)); } queue->idle_threads--; queue->cur_threads--; return NULL; }