From: Christian Brunner <chb@muc.de>
To: Yehuda Sadeh Weinraub <yehudasa@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
Simone Gotti <simone.gotti@gmail.com>,
ceph-devel@vger.kernel.org, qemu-devel@nongnu.org,
kvm@vger.kernel.org
Subject: Re: [Qemu-devel] Re: [PATCH] ceph/rbd block driver for qemu-kvm (v3)
Date: Tue, 13 Jul 2010 21:23:38 +0200
Message-ID: <20100713192338.GA25126@sir.home>
In-Reply-To: <AANLkTilnN7SXDXIulNICT5kjdgYJuVJIWYhKr6RVP_7K@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1018 bytes --]
On Tue, Jul 13, 2010 at 11:27:03AM -0700, Yehuda Sadeh Weinraub wrote:
> >
> > There is another problem with very large I/O requests. I suspect that
> > this can be triggered only with qemu-io and not in kvm, but I'll try
> > to find a proper solution for it anyway.
> >
>
> Have you made any progress with this issue? Just note that there were
> a few changes we introduced recently (a format change that allows
> renaming of rbd images, and some snapshot support), so everything
> will need to be reposted once we figure out the aio issue.
Attached is a patch in which I try to solve the issue with pthreads
locking. It works well with qemu-io, but I'm not sure whether it
interferes with other threads in qemu/kvm (I haven't had time to
test this yet).

Another thing I'm not sure about is that these large I/O requests
only happen with qemu-io. I've never seen this happen inside a
virtual machine. So do we really have to fix this, given that it
only produces a warning message (laggy)?
Regards,
Christian
[-- Attachment #2: 0027-add-queueing-delay-based-on-queuesize.patch --]
[-- Type: text/plain, Size: 3024 bytes --]
From fcef3d897e0357b252a189ed59e43bfd5c24d229 Mon Sep 17 00:00:00 2001
From: Christian Brunner <chb@muc.de>
Date: Tue, 22 Jun 2010 21:51:09 +0200
Subject: [PATCH 27/27] add queueing delay based on queuesize
---
block/rbd.c | 31 ++++++++++++++++++++++++++++++-
1 files changed, 30 insertions(+), 1 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 10daf20..c6693d7 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -24,7 +24,7 @@
#include <rados/librados.h>
#include <signal.h>
-
+#include <pthread.h>
int eventfd(unsigned int initval, int flags);
@@ -50,6 +50,7 @@ int eventfd(unsigned int initval, int flags);
*/
#define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER)
+#define MAX_QUEUE_SIZE 33554432 // 32MB
typedef struct RBDAIOCB {
BlockDriverAIOCB common;
@@ -79,6 +80,9 @@ typedef struct BDRVRBDState {
uint64_t size;
uint64_t objsize;
int qemu_aio_count;
+ uint64_t queuesize;
+ pthread_mutex_t *queue_mutex;
+ pthread_cond_t *queue_threshold;
} BDRVRBDState;
typedef struct rbd_obj_header_ondisk RbdHeader1;
@@ -334,6 +338,12 @@ static int rbd_open(BlockDriverState *bs, const char *filename, int flags)
le64_to_cpus((uint64_t *) & header->image_size);
s->size = header->image_size;
s->objsize = 1 << header->options.order;
+ s->queuesize = 0;
+
+ s->queue_mutex = qemu_malloc(sizeof(pthread_mutex_t));
+ pthread_mutex_init(s->queue_mutex, NULL);
+ s->queue_threshold = qemu_malloc(sizeof(pthread_cond_t));
+ pthread_cond_init (s->queue_threshold, NULL);
s->efd = eventfd(0, 0);
if (s->efd < 0) {
@@ -356,6 +366,11 @@ static void rbd_close(BlockDriverState *bs)
{
BDRVRBDState *s = bs->opaque;
+ pthread_cond_destroy(s->queue_threshold);
+ qemu_free(s->queue_threshold);
+ pthread_mutex_destroy(s->queue_mutex);
+ qemu_free(s->queue_mutex);
+
rados_close_pool(s->pool);
rados_deinitialize();
}
@@ -443,6 +458,12 @@ static void rbd_finish_aiocb(rados_completion_t c, RADOSCB *rcb)
int i;
acb->aiocnt--;
+ acb->s->queuesize -= rcb->segsize;
+ if (acb->s->queuesize+rcb->segsize > MAX_QUEUE_SIZE && acb->s->queuesize <= MAX_QUEUE_SIZE) {
+ pthread_mutex_lock(acb->s->queue_mutex);
+ pthread_cond_signal(acb->s->queue_threshold);
+ pthread_mutex_unlock(acb->s->queue_mutex);
+ }
r = rados_aio_get_return_value(c);
rados_aio_release(c);
if (acb->write) {
@@ -560,6 +581,14 @@ static BlockDriverAIOCB *rbd_aio_rw_vector(BlockDriverState *bs,
rcb->segsize = segsize;
rcb->buf = buf;
+ while (s->queuesize > MAX_QUEUE_SIZE) {
+ pthread_mutex_lock(s->queue_mutex);
+ pthread_cond_wait(s->queue_threshold, s->queue_mutex);
+ pthread_mutex_unlock(s->queue_mutex);
+ }
+
+ s->queuesize += segsize;
+
if (write) {
rados_aio_create_completion(rcb, NULL,
(rados_callback_t) rbd_finish_aiocb,
--
1.7.0.4
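
For readers following the thread, here is a minimal standalone sketch of the byte-based queue throttle described above. It is not the patch itself and does not use any QEMU or librados calls: ByteThrottle, throttle_acquire, throttle_release and throttle_demo are hypothetical names made up for illustration. The limit and the "wait while the queue is above the limit" rule are taken from the patch. Unlike the patch, the sketch updates the counter and tests the wait condition while holding the mutex, which is the usual way to use a pthread condition variable without missing a wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_QUEUE_SIZE (32ULL * 1024 * 1024) /* 32MB, same value as the patch */

/* Hypothetical stand-in for the queue fields the patch adds to BDRVRBDState. */
typedef struct ByteThrottle {
    pthread_mutex_t lock;
    pthread_cond_t below_limit;
    uint64_t queued; /* bytes submitted but not yet completed */
    uint64_t limit;
} ByteThrottle;

static void throttle_init(ByteThrottle *t, uint64_t limit)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->below_limit, NULL);
    t->queued = 0;
    t->limit = limit;
}

static void throttle_destroy(ByteThrottle *t)
{
    pthread_cond_destroy(&t->below_limit);
    pthread_mutex_destroy(&t->lock);
}

/* Submission side: wait while the queue is above the limit, then account
 * for the new segment. The predicate is re-checked under the lock after
 * every wakeup, so spurious wakeups are harmless. */
static void throttle_acquire(ByteThrottle *t, uint64_t bytes)
{
    pthread_mutex_lock(&t->lock);
    while (t->queued > t->limit) {
        pthread_cond_wait(&t->below_limit, &t->lock);
    }
    t->queued += bytes;
    pthread_mutex_unlock(&t->lock);
}

/* Completion side: subtract the finished segment and wake any waiters
 * once the queue is back at or below the limit. */
static void throttle_release(ByteThrottle *t, uint64_t bytes)
{
    pthread_mutex_lock(&t->lock);
    assert(t->queued >= bytes);
    t->queued -= bytes;
    if (t->queued <= t->limit) {
        pthread_cond_broadcast(&t->below_limit);
    }
    pthread_mutex_unlock(&t->lock);
}

/* Demo helper: plays the role of the completion callback thread. */
static void *demo_completion(void *arg)
{
    throttle_release((ByteThrottle *)arg, 64);
    return NULL;
}

/* Demo: the third submission has to wait for one completion. */
static uint64_t throttle_demo(void)
{
    ByteThrottle t;
    pthread_t completer;
    uint64_t left;

    throttle_init(&t, 100);
    throttle_acquire(&t, 64); /* queued = 64 */
    throttle_acquire(&t, 64); /* queued = 128, now above the limit */
    pthread_create(&completer, NULL, demo_completion, &t);
    throttle_acquire(&t, 64); /* proceeds only after the release */
    pthread_join(completer, NULL);
    left = t.queued;
    throttle_destroy(&t);
    return left;
}
```

As in the patch, a single request may push the queue above the limit; only the following submissions wait. Whether blocking the submitting thread like this is acceptable inside qemu/kvm is exactly the open question raised in the mail, and the sketch does not answer it.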