From: "Venkateswararao Jujjuri (JV)" <jvrao@linux.vnet.ibm.com>
To: arun@linux.vnet.ibm.com
Cc: mohan@in.ibm.com, sripathik@in.ibm.com, qemu-devel@nongnu.org,
aneesh.kumar@linux.vnet.ibm.com
Subject: Re: [Qemu-devel] qemu_cond_signal() taking a long time to complete.
Date: Mon, 04 Oct 2010 21:26:40 -0700
Message-ID: <4CAAA900.8090107@linux.vnet.ibm.com>
In-Reply-To: <20101004104030.GA22130@linux.vnet.ibm.com>
On 10/4/2010 3:40 AM, Arun R Bharadwaj wrote:
> Hi,
>
> I am working on introducing a threading model into QEMU. This introduces
> the Threadlets infrastructure, which allows subsystems to offload possibly
> blocking work to a queue to be processed asynchronously by a pool of threads.
> Threadlets are useful when there are operations that can be performed
> outside the context of the VCPU/IO threads, in order to free those threads
> to service other guest requests. So far we have converted a few
> v9fs calls, such as v9fs_read and v9fs_write, to this model to test
> that it works.
My numbers are a little different.
I had a patch in which only read() goes through the threadlet infrastructure.
Everything else uses the old code, i.e. it runs in the vCPU thread.
With SMP=1,
Three threads of dd of=/dev/null if=<file> bs=4k count=100000
Read on threadlet code gave - 21.6 MB/s
Everything on vCPU - 24.2 MB/s
Not much difference.
With SMP=4
Read on threadlet code gave - 32 MB/s
Everything on vCPU - 28 MB/s
Here the threadlet code is better, but still not by much.
In either case, the threadlet code gives better system time:
the vCPU model takes around 36 seconds of system time, versus around 6 seconds
with read in the threadlet model.
Typical run with everything in the vCPU model:
real 0m44.885s
user 0m0.130s
sys 0m36.237s
Typical run with read in the threadlet model:
real 0m37.354s
user 0m0.082s
sys 0m6.006s
Thanks,
JV
>
> I observed that performance degrades in the threading model for the
> following reason:
>
> Take the v9fs_read call as an example: we submit the blocking work in
> v9fs_read to a queue and call qemu_cond_signal(). The work is picked
> up by a worker thread that is waiting on the condition variable to be
> signaled. I measured the time taken to execute the v9fs_read call:
> without the threading model it takes around 15 microseconds to
> complete, while with the threading model it takes around 30 microseconds
> to complete. Most of this extra time (around 22 microseconds) is spent
> completing the qemu_cond_signal() call. I suspect this is why
> I am seeing a performance hit with the threading model, because this
> time is much more than the time needed to complete the entire
> v9fs_read call in the non-threading case.
>
> I need advice on how to proceed from this situation. Pasting relevant
> code snippets below.
>
> thanks
> arun.
> ---
>
> /* Code to submit work to the queue */
> void submit_threadlet_to_queue(ThreadletQueue *queue, ThreadletWork *work)
> {
>     qemu_mutex_lock(&(queue->lock));
>     if (queue->idle_threads == 0 &&
>         queue->cur_threads < queue->max_threads) {
>         spawn_threadlet(queue);
>     }
>     QTAILQ_INSERT_TAIL(&(queue->request_list), work, node);
>     qemu_cond_signal(&(queue->cond));
>     qemu_mutex_unlock(&(queue->lock));
> }
>
> /* Worker thread code */
> static void *threadlet_worker(void *data)
> {
>     ThreadletQueue *queue = data;
>
>     while (1) {
>         ThreadletWork *work;
>         int ret = 0;
>         qemu_mutex_lock(&(queue->lock));
>
>         /* Wait until there is work or the wait times out */
>         while (QTAILQ_EMPTY(&(queue->request_list)) &&
>                (ret != ETIMEDOUT)) {
>             ret = qemu_cond_timedwait(&(queue->cond),
>                                       &(queue->lock), 10*100000);
>         }
>
>         assert(queue->idle_threads != 0);
>         if (QTAILQ_EMPTY(&(queue->request_list))) {
>             if (queue->cur_threads > queue->min_threads) {
>                 /* We retain the minimum number of threads */
>                 break;
>             }
>         } else {
>             work = QTAILQ_FIRST(&(queue->request_list));
>             QTAILQ_REMOVE(&(queue->request_list), work, node);
>
>             queue->idle_threads--;
>             qemu_mutex_unlock(&(queue->lock));
>
>             /* execute the work function */
>             work->func(work);
>
>             qemu_mutex_lock(&(queue->lock));
>             queue->idle_threads++;
>         }
>         qemu_mutex_unlock(&(queue->lock));
>     }
>
>     /* The break above leaves the loop with the lock still held */
>     queue->idle_threads--;
>     queue->cur_threads--;
>     qemu_mutex_unlock(&(queue->lock));
>
>     return NULL;
> }
>
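As a point of comparison for the submit path quoted above, here is a minimal sketch of a variant that signals only when a worker is idle and does so after dropping the lock. It is an assumption for experimentation, not something measured in this thread: it uses plain pthreads instead of the QEMU wrappers, and WorkQueue, queue_init(), queue_submit() and queue_pop() are hypothetical names. It also assumes idle_threads counts workers that are blocked in a wait and that every worker rechecks the queue under the lock before waiting.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Work {
    struct Work *next;
    void (*func)(struct Work *);
} Work;

typedef struct WorkQueue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    Work *head;
    Work *tail;
    int idle_threads;   /* workers currently blocked waiting for work */
} WorkQueue;

static void queue_init(WorkQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = NULL;
    q->tail = NULL;
    q->idle_threads = 0;
}

/* Append work; returns true if a waiting worker was signaled. */
static bool queue_submit(WorkQueue *q, Work *w)
{
    bool wake;

    pthread_mutex_lock(&q->lock);
    w->next = NULL;
    if (q->tail) {
        q->tail->next = w;
    } else {
        q->head = w;
    }
    q->tail = w;
    /* Decide under the lock whether anyone needs waking */
    wake = q->idle_threads > 0;
    pthread_mutex_unlock(&q->lock);

    /* Signal after unlocking so the woken worker can take the lock at once */
    if (wake) {
        pthread_cond_signal(&q->cond);
    }
    return wake;
}

/* Non-blocking removal of the oldest work item, or NULL if empty. */
static Work *queue_pop(WorkQueue *q)
{
    Work *w;

    pthread_mutex_lock(&q->lock);
    w = q->head;
    if (w) {
        q->head = w->next;
        if (!q->head) {
            q->tail = NULL;
        }
    }
    pthread_mutex_unlock(&q->lock);
    return w;
}
```

Whether this changes the numbers at all would need to be measured. The sketch only shows that the signal can be skipped when no worker is waiting and can be moved outside the critical section without losing wakeups, given the recheck-under-lock assumption.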