* [Qemu-devel] [PATCH] rbd: add an asynchronous flush @ 2013-03-29 7:59 Josh Durgin 2013-03-29 20:03 ` [Qemu-devel] [PATCH v2] " Josh Durgin 0 siblings, 1 reply; 24+ messages in thread From: Josh Durgin @ 2013-03-29 7:59 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is synchronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> --- block/rbd.c | 34 ++++++++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index 1a8ea6d..568b279 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -63,7 +63,8 @@ typedef enum { RBD_AIO_READ, RBD_AIO_WRITE, - RBD_AIO_DISCARD + RBD_AIO_DISCARD, + RBD_AIO_FLUSH } RBDAIOCmd; typedef struct RBDAIOCB { @@ -659,6 +660,16 @@ static int rbd_aio_discard_wrapper(rbd_image_t image, #endif } +static int rbd_aio_flush_wrapper(rbd_image_t image, + rbd_completion_t comp) +{ +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH + return rbd_aio_flush(image, comp); +#else + return -ENOTSUP; +#endif +} + static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, @@ -679,7 +690,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, acb = qemu_aio_get(&rbd_aiocb_info, bs, cb, opaque); acb->cmd = cmd; acb->qiov = qiov; - if (cmd == RBD_AIO_DISCARD) { + if (cmd == RBD_AIO_DISCARD || cmd == 
RBD_AIO_FLUSH) { acb->bounce = NULL; } else { acb->bounce = qemu_blockalign(bs, qiov->size); @@ -723,6 +734,9 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, case RBD_AIO_DISCARD: r = rbd_aio_discard_wrapper(s->image, off, size, c); break; + case RBD_AIO_FLUSH: + r = rbd_aio_flush_wrapper(s->image, c); + break; default: r = -EINVAL; } @@ -762,6 +776,16 @@ static BlockDriverAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs, RBD_AIO_WRITE); } +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH +static BlockDriverAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs, + BlockDriverCompletionFunc *cb, + void *opaque) +{ + return rbd_start_aio(bs, 0, NULL, 0, cb, opaque, RBD_AIO_FLUSH); +} + +#else + static int qemu_rbd_co_flush(BlockDriverState *bs) { #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 1) @@ -772,6 +796,7 @@ static int qemu_rbd_co_flush(BlockDriverState *bs) return 0; #endif } +#endif static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi) { @@ -949,7 +974,12 @@ static BlockDriver bdrv_rbd = { .bdrv_aio_readv = qemu_rbd_aio_readv, .bdrv_aio_writev = qemu_rbd_aio_writev, + +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH + .bdrv_aio_flush = qemu_rbd_aio_flush, +#else .bdrv_co_flush_to_disk = qemu_rbd_co_flush, +#endif #ifdef LIBRBD_SUPPORTS_DISCARD .bdrv_aio_discard = qemu_rbd_aio_discard, -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush 2013-03-29 7:59 [Qemu-devel] [PATCH] rbd: add an asynchronous flush Josh Durgin @ 2013-03-29 20:03 ` Josh Durgin 2013-04-02 14:10 ` Kevin Wolf 0 siblings, 1 reply; 24+ messages in thread From: Josh Durgin @ 2013-03-29 20:03 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is synchronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> --- v2: * include hunk treating write, discard, and flush completions the same, since they have no result data block/rbd.c | 37 +++++++++++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index 1a8ea6d..141b488 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -63,7 +63,8 @@ typedef enum { RBD_AIO_READ, RBD_AIO_WRITE, - RBD_AIO_DISCARD + RBD_AIO_DISCARD, + RBD_AIO_FLUSH } RBDAIOCmd; typedef struct RBDAIOCB { @@ -379,8 +380,7 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) r = rcb->ret; - if (acb->cmd == RBD_AIO_WRITE || - acb->cmd == RBD_AIO_DISCARD) { + if (acb->cmd != RBD_AIO_READ) { if (r < 0) { acb->ret = r; acb->error = 1; @@ -659,6 +659,16 @@ static int rbd_aio_discard_wrapper(rbd_image_t image, #endif } +static int rbd_aio_flush_wrapper(rbd_image_t image, + rbd_completion_t comp) +{ +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH + return 
rbd_aio_flush(image, comp); +#else + return -ENOTSUP; +#endif +} + static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, @@ -679,7 +689,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, acb = qemu_aio_get(&rbd_aiocb_info, bs, cb, opaque); acb->cmd = cmd; acb->qiov = qiov; - if (cmd == RBD_AIO_DISCARD) { + if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) { acb->bounce = NULL; } else { acb->bounce = qemu_blockalign(bs, qiov->size); @@ -723,6 +733,9 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, case RBD_AIO_DISCARD: r = rbd_aio_discard_wrapper(s->image, off, size, c); break; + case RBD_AIO_FLUSH: + r = rbd_aio_flush_wrapper(s->image, c); + break; default: r = -EINVAL; } @@ -762,6 +775,16 @@ static BlockDriverAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs, RBD_AIO_WRITE); } +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH +static BlockDriverAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs, + BlockDriverCompletionFunc *cb, + void *opaque) +{ + return rbd_start_aio(bs, 0, NULL, 0, cb, opaque, RBD_AIO_FLUSH); +} + +#else + static int qemu_rbd_co_flush(BlockDriverState *bs) { #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 1) @@ -772,6 +795,7 @@ static int qemu_rbd_co_flush(BlockDriverState *bs) return 0; #endif } +#endif static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi) { @@ -949,7 +973,12 @@ static BlockDriver bdrv_rbd = { .bdrv_aio_readv = qemu_rbd_aio_readv, .bdrv_aio_writev = qemu_rbd_aio_writev, + +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH + .bdrv_aio_flush = qemu_rbd_aio_flush, +#else .bdrv_co_flush_to_disk = qemu_rbd_co_flush, +#endif #ifdef LIBRBD_SUPPORTS_DISCARD .bdrv_aio_discard = qemu_rbd_aio_discard, -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush 2013-03-29 20:03 ` [Qemu-devel] [PATCH v2] " Josh Durgin @ 2013-04-02 14:10 ` Kevin Wolf 2013-04-04 8:35 ` [Qemu-devel] [PATCH v3 1/2] " Josh Durgin ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Kevin Wolf @ 2013-04-02 14:10 UTC (permalink / raw) To: Josh Durgin; +Cc: qemu-devel, Stefan Hajnoczi Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben: > The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), > which is synchronous and causes the main qemu thread to block until it > is complete. This results in unresponsiveness and extra latency for > the guest. > > Fix this by using an asynchronous version of flush. This was added to > librbd with a special #define to indicate its presence, since it will > be backported to stable versions. Thus, there is no need to check the > version of librbd. librbd is linked dynamically and the version on the build host isn't necessarily the same as the version qemu is run with. So shouldn't this better be a runtime check? > Implement this as bdrv_aio_flush, since it matches other aio functions > in the rbd block driver, and leave out bdrv_co_flush_to_disk when the > asynchronous version is available. > > Reported-by: Oliver Francke <oliver@filoo.de> > Signed-off-by: Josh Durgin <josh.durgin@inktank.com> Looks good otherwise. Kevin ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH v3 1/2] rbd: add an asynchronous flush 2013-04-02 14:10 ` Kevin Wolf @ 2013-04-04 8:35 ` Josh Durgin 2013-04-04 8:35 ` [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime Josh Durgin 2013-04-10 14:03 ` [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush Josh Durgin 2 siblings, 0 replies; 24+ messages in thread From: Josh Durgin @ 2013-04-04 8:35 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi The existing bdrv_co_flush_to_disk implementation uses rbd_flush(), which is synchronous and causes the main qemu thread to block until it is complete. This results in unresponsiveness and extra latency for the guest. Fix this by using an asynchronous version of flush. This was added to librbd with a special #define to indicate its presence, since it will be backported to stable versions. Thus, there is no need to check the version of librbd. Implement this as bdrv_aio_flush, since it matches other aio functions in the rbd block driver, and leave out bdrv_co_flush_to_disk when the asynchronous version is available. 
Reported-by: Oliver Francke <oliver@filoo.de> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> --- Since v2: * remove #ifdefs around bdrv_aio_flush and qemu_rbd_aio_flush, since they are dealt with at runtime in the next patch block/rbd.c | 30 ++++++++++++++++++++++++++---- 1 file changed, 26 insertions(+), 4 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index 1a8ea6d..037d82b 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -63,7 +63,8 @@ typedef enum { RBD_AIO_READ, RBD_AIO_WRITE, - RBD_AIO_DISCARD + RBD_AIO_DISCARD, + RBD_AIO_FLUSH } RBDAIOCmd; typedef struct RBDAIOCB { @@ -379,8 +380,7 @@ static void qemu_rbd_complete_aio(RADOSCB *rcb) r = rcb->ret; - if (acb->cmd == RBD_AIO_WRITE || - acb->cmd == RBD_AIO_DISCARD) { + if (acb->cmd != RBD_AIO_READ) { if (r < 0) { acb->ret = r; acb->error = 1; @@ -659,6 +659,16 @@ static int rbd_aio_discard_wrapper(rbd_image_t image, #endif } +static int rbd_aio_flush_wrapper(rbd_image_t image, + rbd_completion_t comp) +{ +#ifdef LIBRBD_SUPPORTS_AIO_FLUSH + return rbd_aio_flush(image, comp); +#else + return -ENOTSUP; +#endif +} + static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, @@ -679,7 +689,7 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, acb = qemu_aio_get(&rbd_aiocb_info, bs, cb, opaque); acb->cmd = cmd; acb->qiov = qiov; - if (cmd == RBD_AIO_DISCARD) { + if (cmd == RBD_AIO_DISCARD || cmd == RBD_AIO_FLUSH) { acb->bounce = NULL; } else { acb->bounce = qemu_blockalign(bs, qiov->size); @@ -723,6 +733,9 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, case RBD_AIO_DISCARD: r = rbd_aio_discard_wrapper(s->image, off, size, c); break; + case RBD_AIO_FLUSH: + r = rbd_aio_flush_wrapper(s->image, c); + break; default: r = -EINVAL; } @@ -762,6 +775,13 @@ static BlockDriverAIOCB *qemu_rbd_aio_writev(BlockDriverState *bs, RBD_AIO_WRITE); } +static BlockDriverAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs, + BlockDriverCompletionFunc *cb, + void 
*opaque) +{ + return rbd_start_aio(bs, 0, NULL, 0, cb, opaque, RBD_AIO_FLUSH); +} + static int qemu_rbd_co_flush(BlockDriverState *bs) { #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 1) @@ -949,6 +969,8 @@ static BlockDriver bdrv_rbd = { .bdrv_aio_readv = qemu_rbd_aio_readv, .bdrv_aio_writev = qemu_rbd_aio_writev, + + .bdrv_aio_flush = qemu_rbd_aio_flush, .bdrv_co_flush_to_disk = qemu_rbd_co_flush, #ifdef LIBRBD_SUPPORTS_DISCARD -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime 2013-04-02 14:10 ` Kevin Wolf 2013-04-04 8:35 ` [Qemu-devel] [PATCH v3 1/2] " Josh Durgin @ 2013-04-04 8:35 ` Josh Durgin 2013-04-04 10:10 ` Kevin Wolf 2013-04-10 14:03 ` [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush Josh Durgin 2 siblings, 1 reply; 24+ messages in thread From: Josh Durgin @ 2013-04-04 8:35 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi QEMU may be compiled against a newer version of librbd, but run and dynamically linked with an older version that does not support these functions. Declare them as weak symbols so they can be checked for existence at runtime. Only rbd_aio_discard, rbd_aio_flush, and rbd_flush were added after the initial version of librbd, so only they need to be checked. Signed-off-by: Josh Durgin <josh.durgin@inktank.com> --- block/rbd.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/block/rbd.c b/block/rbd.c index 037d82b..69a339a 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -44,6 +44,15 @@ * leading "\". */ +/* + * Treat newer librbd functions as weak symbols so we can detect + * whether they're supported at runtime, and disable their use + * if they aren't available. 
+ */ +#pragma weak rbd_aio_discard +#pragma weak rbd_aio_flush +#pragma weak rbd_flush + /* rbd_aio_discard added in 0.1.2 */ #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2) #define LIBRBD_SUPPORTS_DISCARD @@ -970,6 +979,7 @@ static BlockDriver bdrv_rbd = { .bdrv_aio_readv = qemu_rbd_aio_readv, .bdrv_aio_writev = qemu_rbd_aio_writev, + /* select which of these to use at runtime in bdrv_rbd_init */ .bdrv_aio_flush = qemu_rbd_aio_flush, .bdrv_co_flush_to_disk = qemu_rbd_co_flush, @@ -985,6 +995,15 @@ static BlockDriver bdrv_rbd = { static void bdrv_rbd_init(void) { + if (!rbd_flush || rbd_aio_flush) { + bdrv_rbd.bdrv_co_flush_to_disk = NULL; + } + if (!rbd_aio_flush) { + bdrv_rbd.bdrv_aio_flush = NULL; + } + if (!rbd_aio_discard) { + bdrv_rbd.bdrv_aio_discard = NULL; + } bdrv_register(&bdrv_rbd); } -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime 2013-04-04 8:35 ` [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime Josh Durgin @ 2013-04-04 10:10 ` Kevin Wolf 2013-04-04 16:50 ` Josh Durgin 0 siblings, 1 reply; 24+ messages in thread From: Kevin Wolf @ 2013-04-04 10:10 UTC (permalink / raw) To: Josh Durgin; +Cc: qemu-devel, Stefan Hajnoczi Am 04.04.2013 um 10:35 hat Josh Durgin geschrieben: > QEMU may be compiled against a newer version of librbd, but run and > dynamically linked with an older version that does not support these > functions. Declare them as weak symbols so they can be checked for > existence at runtime. > > Only rbd_aio_discard, rbd_aio_flush, and rbd_flush were added after > the initial version of librbd, so only they need to be checked. > > Signed-off-by: Josh Durgin <josh.durgin@inktank.com> > --- > block/rbd.c | 19 +++++++++++++++++++ > 1 file changed, 19 insertions(+) > > diff --git a/block/rbd.c b/block/rbd.c > index 037d82b..69a339a 100644 > --- a/block/rbd.c > +++ b/block/rbd.c > @@ -44,6 +44,15 @@ > * leading "\". > */ > > +/* > + * Treat newer librbd functions as weak symbols so we can detect > + * whether they're supported at runtime, and disable their use > + * if they aren't available. 
> + */ > +#pragma weak rbd_aio_discard > +#pragma weak rbd_aio_flush > +#pragma weak rbd_flush > + > /* rbd_aio_discard added in 0.1.2 */ > #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2) > #define LIBRBD_SUPPORTS_DISCARD > @@ -970,6 +979,7 @@ static BlockDriver bdrv_rbd = { > .bdrv_aio_readv = qemu_rbd_aio_readv, > .bdrv_aio_writev = qemu_rbd_aio_writev, > > + /* select which of these to use at runtime in bdrv_rbd_init */ > .bdrv_aio_flush = qemu_rbd_aio_flush, > .bdrv_co_flush_to_disk = qemu_rbd_co_flush, > > @@ -985,6 +995,15 @@ static BlockDriver bdrv_rbd = { > > static void bdrv_rbd_init(void) > { > + if (!rbd_flush || rbd_aio_flush) { > + bdrv_rbd.bdrv_co_flush_to_disk = NULL; > + } > + if (!rbd_aio_flush) { > + bdrv_rbd.bdrv_aio_flush = NULL; > + } > + if (!rbd_aio_discard) { > + bdrv_rbd.bdrv_aio_discard = NULL; > + } > bdrv_register(&bdrv_rbd); > } After searching the net a bit and trying out some things myself, I'm afraid that this approach doesn't work. It does seem to do the right thing when build and runtime version are the same, but new build -> old runtime segfaults (because the symbol doesn't become NULL) and old build -> new runtime segfaults (because the symbol stays NULL). Unless I missed some build option that is different in qemu than in my test program. So it looks as if you had to use dlsym() instead. Kevin ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime 2013-04-04 10:10 ` Kevin Wolf @ 2013-04-04 16:50 ` Josh Durgin 2013-04-05 9:31 ` Kevin Wolf 0 siblings, 1 reply; 24+ messages in thread From: Josh Durgin @ 2013-04-04 16:50 UTC (permalink / raw) To: Kevin Wolf; +Cc: qemu-devel, Stefan Hajnoczi On 04/04/2013 03:10 AM, Kevin Wolf wrote: > Am 04.04.2013 um 10:35 hat Josh Durgin geschrieben: >> QEMU may be compiled against a newer version of librbd, but run and >> dynamically linked with an older version that does not support these >> functions. Declare them as weak symbols so they can be checked for >> existence at runtime. >> >> Only rbd_aio_discard, rbd_aio_flush, and rbd_flush were added after >> the initial version of librbd, so only they need to be checked. >> >> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> >> --- >> block/rbd.c | 19 +++++++++++++++++++ >> 1 file changed, 19 insertions(+) >> >> diff --git a/block/rbd.c b/block/rbd.c >> index 037d82b..69a339a 100644 >> --- a/block/rbd.c >> +++ b/block/rbd.c >> @@ -44,6 +44,15 @@ >> * leading "\". >> */ >> >> +/* >> + * Treat newer librbd functions as weak symbols so we can detect >> + * whether they're supported at runtime, and disable their use >> + * if they aren't available. 
>> + */ >> +#pragma weak rbd_aio_discard >> +#pragma weak rbd_aio_flush >> +#pragma weak rbd_flush >> + >> /* rbd_aio_discard added in 0.1.2 */ >> #if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2) >> #define LIBRBD_SUPPORTS_DISCARD >> @@ -970,6 +979,7 @@ static BlockDriver bdrv_rbd = { >> .bdrv_aio_readv = qemu_rbd_aio_readv, >> .bdrv_aio_writev = qemu_rbd_aio_writev, >> >> + /* select which of these to use at runtime in bdrv_rbd_init */ >> .bdrv_aio_flush = qemu_rbd_aio_flush, >> .bdrv_co_flush_to_disk = qemu_rbd_co_flush, >> >> @@ -985,6 +995,15 @@ static BlockDriver bdrv_rbd = { >> >> static void bdrv_rbd_init(void) >> { >> + if (!rbd_flush || rbd_aio_flush) { >> + bdrv_rbd.bdrv_co_flush_to_disk = NULL; >> + } >> + if (!rbd_aio_flush) { >> + bdrv_rbd.bdrv_aio_flush = NULL; >> + } >> + if (!rbd_aio_discard) { >> + bdrv_rbd.bdrv_aio_discard = NULL; >> + } >> bdrv_register(&bdrv_rbd); >> } > > After searching the net a bit and trying out some things myself, I'm > afraid that this approach doesn't work. It does seem to do the right > thing when build and runtime version are the same, but new build -> old > runtime segfaults (because the symbol doesn't become NULL) and old build > -> new runtime segfaults (because the symbol stays NULL). Unless I > missed some build option that is different in qemu than in my test > program. It worked when downgrading the runtime version of librbd for me, so maybe something about qemu's build was different. This patch wouldn't allow newer functions in a runtime version to be used though, since they would be disabled if qemu were built against an older version. > So it looks as if you had to use dlsym() instead. I tried this initially, using glib's portable wrappers, but found that it would require using the dlopen'd version only - not linking at build time against librbd at all. I thought this might be too big a change, but now it seems like the best way to go. 
Using this approach, upgrading from a version of librbd that doesn't support e.g. rbd_aio_flush to one that does would not require recompiling qemu to be able to use the new function. In general, librbd would not be needed at compile time for qemu to be able to use it, which would make the rbd block driver much easier to install on distros where rbd isn't enabled at build time. If you don't mind this approach, I'll post another version using only dlopen'd (via glib) librbd/librados. Thanks for the reviews, Josh ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime 2013-04-04 16:50 ` Josh Durgin @ 2013-04-05 9:31 ` Kevin Wolf 2013-04-10 0:05 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically Josh Durgin 0 siblings, 1 reply; 24+ messages in thread From: Kevin Wolf @ 2013-04-05 9:31 UTC (permalink / raw) To: Josh Durgin; +Cc: qemu-devel, Stefan Hajnoczi Am 04.04.2013 um 18:50 hat Josh Durgin geschrieben: > On 04/04/2013 03:10 AM, Kevin Wolf wrote: > >After searching the net a bit and trying out some things myself, I'm > >afraid that this approach doesn't work. It does seem to do the right > >thing when build and runtime version are the same, but new build -> old > >runtime segfaults (because the symbol doesn't become NULL) and old build > >-> new runtime segfaults (because the symbol stays NULL). Unless I > >missed some build option that is different in qemu than in my test > >program. > > It worked when downgrading the runtime version of librbd for me, so > maybe something about qemu's build was different. This patch wouldn't > allow newer functions in a runtime version to be used though, since > they would be disabled if qemu were built against an older version. Interesting that one way worked for you. At least it seems to be unreliable, unfortunately. > >So it looks as if you had to use dlsym() instead. > > I tried this initially, using glib's portable wrappers, but found that > it would require using the dlopen'd version only - not linking at > build time against librbd at all. I thought this might be too big > a change, but now it seems like the best way to go. Aw, I didn't expect that... You're using g_module_open/symbol then? I never used it, but from reading the docs maybe g_module_open with a NULL filename might do the right thing when linked at build time? 
On the other hand, not linking at build time at all and relying on dlopen() only gives us another nice feature: Distros can enable the rbd block driver without introducing a hard dependency of qemu on librbd. Maybe worth doing it for this reason alone. > Using this approach, upgrading from a version of librbd that doesn't > support e.g. rbd_aio_flush to one that does would not require > recompiling qemu to be able to use the new function. In general, librbd > would not be needed at compile time for qemu to be able to use it, > which would make the rbd block driver much easier to install on > distros where rbd isn't enabled at build time. > > If you don't mind this approach, I'll post another version using > only dlopen'd (via glib) librbd/librados. Sure, if you don't mind implementing it I think it's a good idea. I didn't mean to ask you for such a large change with my innocent comments... Kevin ^ permalink raw reply [flat|nested] 24+ messages in thread
* [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically 2013-04-05 9:31 ` Kevin Wolf @ 2013-04-10 0:05 ` Josh Durgin 2013-04-10 8:09 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Josh Durgin @ 2013-04-10 0:05 UTC (permalink / raw) To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi This allows the rbd block driver to detect symbols in the installed version of librbd, and enable or disable features appropriately. This obviates the #ifdefs regarding librbd versions. Loading librbd dynamically also makes the rbd block driver easier to install and package, since it removes the dependency on librbd at build time. Add structures containing the necessary function pointer signatures and types from librbd, and fill them in the first time the rbd module is used. Use glib's g_module interface so we don't preclude future portability, and don't have to deal with odd dlopen behavior directly. Internally, librbd and some libraries it depends on use C++ templates, which mean that they each contain a defined weak symbol for their definition. Due to the way the linker resolves duplicate symbols, the libraries loaded by librbd end up using the template definitions from librbd, creating a circular dependency. This means that once librbd is loaded, it won't be unloaded. Changing this behavior might work with a Sun ld, but there doesn't seem to be a portable (or even working with GNU ld) way to hide these C++ symbols correctly. Instead, never unload librbd, and explicitly make it resident. 
Signed-off-by: Josh Durgin <josh.durgin@inktank.com> --- block/Makefile.objs | 3 +- block/rbd.c | 292 ++++++++++++++++++++++++++++++++++++--------------- block/rbd_types.h | 95 +++++++++++++++++ configure | 41 +------- 4 files changed, 309 insertions(+), 122 deletions(-) create mode 100644 block/rbd_types.h diff --git a/block/Makefile.objs b/block/Makefile.objs index c067f38..2ad32f7 100644 --- a/block/Makefile.objs +++ b/block/Makefile.objs @@ -8,10 +8,9 @@ block-obj-$(CONFIG_POSIX) += raw-posix.o block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o ifeq ($(CONFIG_POSIX),y) -block-obj-y += nbd.o sheepdog.o +block-obj-y += nbd.o rbd.o sheepdog.o block-obj-$(CONFIG_LIBISCSI) += iscsi.o block-obj-$(CONFIG_CURL) += curl.o -block-obj-$(CONFIG_RBD) += rbd.o block-obj-$(CONFIG_GLUSTERFS) += gluster.o endif diff --git a/block/rbd.c b/block/rbd.c index 037d82b..0bf5c40 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -11,13 +11,13 @@ * GNU GPL, version 2 or (at your option) any later version. */ +#include <gmodule.h> #include <inttypes.h> #include "qemu-common.h" #include "qemu/error-report.h" #include "block/block_int.h" - -#include <rbd/librbd.h> +#include "rbd_types.h" /* * When specifying the image filename use: @@ -44,13 +44,6 @@ * leading "\". 
*/ -/* rbd_aio_discard added in 0.1.2 */ -#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 2) -#define LIBRBD_SUPPORTS_DISCARD -#else -#undef LIBRBD_SUPPORTS_DISCARD -#endif - #define OBJ_MAX_SIZE (1UL << OBJ_DEFAULT_OBJ_ORDER) #define RBD_MAX_CONF_NAME_SIZE 128 @@ -106,6 +99,12 @@ typedef struct BDRVRBDState { RADOSCB *event_rcb; } BDRVRBDState; +static LibradosFuncs librados; +static LibrbdFuncs librbd; +static bool librbd_loaded; +static GModule *librbd_handle; + +static int qemu_rbd_load_libs(void); static void rbd_aio_bh_cb(void *opaque); static int qemu_rbd_next_tok(char *dst, int dst_len, @@ -267,7 +266,7 @@ static int qemu_rbd_set_conf(rados_t cluster, const char *conf) qemu_rbd_unescape(value); if (strcmp(name, "conf") == 0) { - ret = rados_conf_read_file(cluster, value); + ret = (*librados.rados_conf_read_file)(cluster, value); if (ret < 0) { error_report("error reading conf file %s", value); break; @@ -275,7 +274,7 @@ static int qemu_rbd_set_conf(rados_t cluster, const char *conf) } else if (strcmp(name, "id") == 0) { /* ignore, this is parsed by qemu_rbd_parse_clientname() */ } else { - ret = rados_conf_set(cluster, name, value); + ret = (*librados.rados_conf_set)(cluster, name, value); if (ret < 0) { error_report("invalid conf option %s", name); ret = -EINVAL; @@ -310,6 +309,10 @@ static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options) return -EINVAL; } + if (qemu_rbd_load_libs() < 0) { + return -EIO; + } + /* Read out options */ while (options && options->name) { if (!strcmp(options->name, BLOCK_OPT_SIZE)) { @@ -332,38 +335,38 @@ static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options) } clientname = qemu_rbd_parse_clientname(conf, clientname_buf); - if (rados_create(&cluster, clientname) < 0) { + if ((*librados.rados_create)(&cluster, clientname) < 0) { error_report("error initializing"); return -EIO; } if (strstr(conf, "conf=") == NULL) { /* try default location, but ignore failure */ - 
rados_conf_read_file(cluster, NULL); + (*librados.rados_conf_read_file)(cluster, NULL); } if (conf[0] != '\0' && qemu_rbd_set_conf(cluster, conf) < 0) { error_report("error setting config options"); - rados_shutdown(cluster); + (*librados.rados_shutdown)(cluster); return -EIO; } - if (rados_connect(cluster) < 0) { + if ((*librados.rados_connect)(cluster) < 0) { error_report("error connecting"); - rados_shutdown(cluster); + (*librados.rados_shutdown)(cluster); return -EIO; } - if (rados_ioctx_create(cluster, pool, &io_ctx) < 0) { + if ((*librados.rados_ioctx_create)(cluster, pool, &io_ctx) < 0) { error_report("error opening pool %s", pool); - rados_shutdown(cluster); + (*librados.rados_shutdown)(cluster); return -EIO; } - ret = rbd_create(io_ctx, name, bytes, &obj_order); - rados_ioctx_destroy(io_ctx); - rados_shutdown(cluster); + ret = (*librbd.rbd_create)(io_ctx, name, bytes, &obj_order); + (*librados.rados_ioctx_destroy)(io_ctx); + (*librados.rados_shutdown)(cluster); return ret; } @@ -459,8 +462,12 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, return -EINVAL; } + if (qemu_rbd_load_libs() < 0) { + return -EIO; + } + clientname = qemu_rbd_parse_clientname(conf, clientname_buf); - r = rados_create(&s->cluster, clientname); + r = (*librados.rados_create)(&s->cluster, clientname); if (r < 0) { error_report("error initializing"); return r; @@ -479,14 +486,14 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, * be set up, fall back to no caching. 
*/ if (flags & BDRV_O_NOCACHE) { - rados_conf_set(s->cluster, "rbd_cache", "false"); + (*librados.rados_conf_set)(s->cluster, "rbd_cache", "false"); } else { - rados_conf_set(s->cluster, "rbd_cache", "true"); + (*librados.rados_conf_set)(s->cluster, "rbd_cache", "true"); } if (strstr(conf, "conf=") == NULL) { /* try default location, but ignore failure */ - rados_conf_read_file(s->cluster, NULL); + (*librados.rados_conf_read_file)(s->cluster, NULL); } if (conf[0] != '\0') { @@ -497,19 +504,19 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, } } - r = rados_connect(s->cluster); + r = (*librados.rados_connect)(s->cluster); if (r < 0) { error_report("error connecting"); goto failed_shutdown; } - r = rados_ioctx_create(s->cluster, pool, &s->io_ctx); + r = (*librados.rados_ioctx_create)(s->cluster, pool, &s->io_ctx); if (r < 0) { error_report("error opening pool %s", pool); goto failed_shutdown; } - r = rbd_open(s->io_ctx, s->name, &s->image, s->snap); + r = (*librbd.rbd_open)(s->io_ctx, s->name, &s->image, s->snap); if (r < 0) { error_report("error reading header from %s", s->name); goto failed_open; @@ -532,11 +539,11 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, return 0; failed: - rbd_close(s->image); + (*librbd.rbd_close)(s->image); failed_open: - rados_ioctx_destroy(s->io_ctx); + (*librados.rados_ioctx_destroy)(s->io_ctx); failed_shutdown: - rados_shutdown(s->cluster); + (*librados.rados_shutdown)(s->cluster); g_free(s->snap); return r; } @@ -549,10 +556,10 @@ static void qemu_rbd_close(BlockDriverState *bs) close(s->fds[1]); qemu_aio_set_fd_handler(s->fds[RBD_FD_READ], NULL, NULL, NULL, NULL); - rbd_close(s->image); - rados_ioctx_destroy(s->io_ctx); + (*librbd.rbd_close)(s->image); + (*librados.rados_ioctx_destroy)(s->io_ctx); g_free(s->snap); - rados_shutdown(s->cluster); + (*librados.rados_shutdown)(s->cluster); } /* @@ -618,8 +625,8 @@ static int qemu_rbd_send_pipe(BDRVRBDState *s, RADOSCB *rcb) static void 
rbd_finish_aiocb(rbd_completion_t c, RADOSCB *rcb) { int ret; - rcb->ret = rbd_aio_get_return_value(c); - rbd_aio_release(c); + rcb->ret = (*librbd.rbd_aio_get_return_value)(c); + (*librbd.rbd_aio_release)(c); ret = qemu_rbd_send_pipe(rcb->s, rcb); if (ret < 0) { error_report("failed writing to acb->s->fds"); @@ -647,28 +654,6 @@ static void rbd_aio_bh_cb(void *opaque) } } -static int rbd_aio_discard_wrapper(rbd_image_t image, - uint64_t off, - uint64_t len, - rbd_completion_t comp) -{ -#ifdef LIBRBD_SUPPORTS_DISCARD - return rbd_aio_discard(image, off, len, comp); -#else - return -ENOTSUP; -#endif -} - -static int rbd_aio_flush_wrapper(rbd_image_t image, - rbd_completion_t comp) -{ -#ifdef LIBRBD_SUPPORTS_AIO_FLUSH - return rbd_aio_flush(image, comp); -#else - return -ENOTSUP; -#endif -} - static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov, @@ -718,23 +703,25 @@ static BlockDriverAIOCB *rbd_start_aio(BlockDriverState *bs, rcb->buf = buf; rcb->s = acb->s; rcb->size = size; - r = rbd_aio_create_completion(rcb, (rbd_callback_t) rbd_finish_aiocb, &c); + r = (*librbd.rbd_aio_create_completion)(rcb, + (rbd_callback_t) rbd_finish_aiocb, + &c); if (r < 0) { goto failed; } switch (cmd) { case RBD_AIO_WRITE: - r = rbd_aio_write(s->image, off, size, buf, c); + r = (*librbd.rbd_aio_write)(s->image, off, size, buf, c); break; case RBD_AIO_READ: - r = rbd_aio_read(s->image, off, size, buf, c); + r = (*librbd.rbd_aio_read)(s->image, off, size, buf, c); break; case RBD_AIO_DISCARD: - r = rbd_aio_discard_wrapper(s->image, off, size, c); + r = (*librbd.rbd_aio_discard)(s->image, off, size, c); break; case RBD_AIO_FLUSH: - r = rbd_aio_flush_wrapper(s->image, c); + r = (*librbd.rbd_aio_flush)(s->image, c); break; default: r = -EINVAL; @@ -784,13 +771,11 @@ static BlockDriverAIOCB *qemu_rbd_aio_flush(BlockDriverState *bs, static int qemu_rbd_co_flush(BlockDriverState *bs) { -#if LIBRBD_VERSION_CODE >= LIBRBD_VERSION(0, 1, 1) - /* 
rbd_flush added in 0.1.1 */ BDRVRBDState *s = bs->opaque; - return rbd_flush(s->image); -#else + if (librbd.rbd_flush) { + return (*librbd.rbd_flush)(s->image); + } return 0; -#endif } static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi) @@ -799,7 +784,7 @@ static int qemu_rbd_getinfo(BlockDriverState *bs, BlockDriverInfo *bdi) rbd_image_info_t info; int r; - r = rbd_stat(s->image, &info, sizeof(info)); + r = (*librbd.rbd_stat)(s->image, &info, sizeof(info)); if (r < 0) { return r; } @@ -814,7 +799,7 @@ static int64_t qemu_rbd_getlength(BlockDriverState *bs) rbd_image_info_t info; int r; - r = rbd_stat(s->image, &info, sizeof(info)); + r = (*librbd.rbd_stat)(s->image, &info, sizeof(info)); if (r < 0) { return r; } @@ -827,7 +812,7 @@ static int qemu_rbd_truncate(BlockDriverState *bs, int64_t offset) BDRVRBDState *s = bs->opaque; int r; - r = rbd_resize(s->image, offset); + r = (*librbd.rbd_resize)(s->image, offset); if (r < 0) { return r; } @@ -858,7 +843,7 @@ static int qemu_rbd_snap_create(BlockDriverState *bs, return -ERANGE; } - r = rbd_snap_create(s->image, sn_info->name); + r = (*librbd.rbd_snap_create)(s->image, sn_info->name); if (r < 0) { error_report("failed to create snap: %s", strerror(-r)); return r; @@ -873,7 +858,7 @@ static int qemu_rbd_snap_remove(BlockDriverState *bs, BDRVRBDState *s = bs->opaque; int r; - r = rbd_snap_remove(s->image, snapshot_name); + r = (*librbd.rbd_snap_remove)(s->image, snapshot_name); return r; } @@ -883,7 +868,7 @@ static int qemu_rbd_snap_rollback(BlockDriverState *bs, BDRVRBDState *s = bs->opaque; int r; - r = rbd_snap_rollback(s->image, snapshot_name); + r = (*librbd.rbd_snap_rollback)(s->image, snapshot_name); return r; } @@ -898,7 +883,7 @@ static int qemu_rbd_snap_list(BlockDriverState *bs, do { snaps = g_malloc(sizeof(*snaps) * max_snaps); - snap_count = rbd_snap_list(s->image, snaps, &max_snaps); + snap_count = (*librbd.rbd_snap_list)(s->image, snaps, &max_snaps); if (snap_count < 0) { 
g_free(snaps); } @@ -922,14 +907,13 @@ static int qemu_rbd_snap_list(BlockDriverState *bs, sn_info->date_nsec = 0; sn_info->vm_clock_nsec = 0; } - rbd_snap_list_end(snaps); + (*librbd.rbd_snap_list_end)(snaps); done: *psn_tab = sn_tab; return snap_count; } -#ifdef LIBRBD_SUPPORTS_DISCARD static BlockDriverAIOCB* qemu_rbd_aio_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors, @@ -939,7 +923,6 @@ static BlockDriverAIOCB* qemu_rbd_aio_discard(BlockDriverState *bs, return rbd_start_aio(bs, sector_num, NULL, nb_sectors, cb, opaque, RBD_AIO_DISCARD); } -#endif static QEMUOptionParameter qemu_rbd_create_options[] = { { @@ -972,11 +955,7 @@ static BlockDriver bdrv_rbd = { .bdrv_aio_flush = qemu_rbd_aio_flush, .bdrv_co_flush_to_disk = qemu_rbd_co_flush, - -#ifdef LIBRBD_SUPPORTS_DISCARD .bdrv_aio_discard = qemu_rbd_aio_discard, -#endif - .bdrv_snapshot_create = qemu_rbd_snap_create, .bdrv_snapshot_delete = qemu_rbd_snap_remove, .bdrv_snapshot_list = qemu_rbd_snap_list, @@ -988,4 +967,153 @@ static void bdrv_rbd_init(void) bdrv_register(&bdrv_rbd); } +typedef struct LibSymbol { + const char *name; + gpointer *addr; +} LibSymbol; + +static int qemu_rbd_set_functions(GModule *lib, const LibSymbol *funcs) +{ + int i = 0; + while (funcs[i].name) { + const char *name = funcs[i].name; + if (!g_module_symbol(lib, name, funcs[i].addr)) { + error_report("%s could not be loaded from librbd or librados: %s", + name, g_module_error()); + return -1; + } + ++i; + } + return 0; +} + +/* + * Set function pointers for basic librados and librbd + * functions that have always been present in these libraries. 
+ */ +static int qemu_rbd_set_mandatory_functions(void) +{ + LibSymbol symbols[] = { + {"rados_create", + (gpointer *) &librados.rados_create}, + {"rados_connect", + (gpointer *) &librados.rados_connect}, + {"rados_shutdown", + (gpointer *) &librados.rados_shutdown}, + {"rados_conf_read_file", + (gpointer *) &librados.rados_conf_read_file}, + {"rados_conf_set", + (gpointer *) &librados.rados_conf_set}, + {"rados_ioctx_create", + (gpointer *) &librados.rados_ioctx_create}, + {"rados_ioctx_destroy", + (gpointer *) &librados.rados_ioctx_destroy}, + {"rbd_create", + (gpointer *) &librbd.rbd_create}, + {"rbd_open", + (gpointer *) &librbd.rbd_open}, + {"rbd_close", + (gpointer *) &librbd.rbd_close}, + {"rbd_resize", + (gpointer *) &librbd.rbd_resize}, + {"rbd_stat", + (gpointer *) &librbd.rbd_stat}, + {"rbd_snap_list", + (gpointer *) &librbd.rbd_snap_list}, + {"rbd_snap_list_end", + (gpointer *) &librbd.rbd_snap_list_end}, + {"rbd_snap_create", + (gpointer *) &librbd.rbd_snap_create}, + {"rbd_snap_remove", + (gpointer *) &librbd.rbd_snap_remove}, + {"rbd_snap_rollback", + (gpointer *) &librbd.rbd_snap_rollback}, + {"rbd_aio_write", + (gpointer *) &librbd.rbd_aio_write}, + {"rbd_aio_read", + (gpointer *) &librbd.rbd_aio_read}, + {"rbd_aio_create_completion", + (gpointer *) &librbd.rbd_aio_create_completion}, + {"rbd_aio_get_return_value", + (gpointer *) &librbd.rbd_aio_get_return_value}, + {"rbd_aio_release", + (gpointer *) &librbd.rbd_aio_release}, + {NULL} + }; + + if (qemu_rbd_set_functions(librbd_handle, symbols) < 0) { + return -1; + } + + return 0; +} + +/* + * Detect whether the installed version of librbd + * supports newer functionality, and enable or disable + * it appropriately in bdrv_rbd. 
+ */ +static void qemu_rbd_set_optional_functions(void) +{ + if (g_module_symbol(librbd_handle, "rbd_flush", + (gpointer *) &librbd.rbd_flush)) { + bdrv_rbd.bdrv_aio_flush = NULL; + bdrv_rbd.bdrv_co_flush_to_disk = qemu_rbd_co_flush; + } else { + librbd.rbd_flush = NULL; + bdrv_rbd.bdrv_co_flush_to_disk = NULL; + } + + if (g_module_symbol(librbd_handle, "rbd_aio_flush", + (gpointer *) &librbd.rbd_aio_flush)) { + bdrv_rbd.bdrv_co_flush_to_disk = NULL; + bdrv_rbd.bdrv_aio_flush = qemu_rbd_aio_flush; + } else { + librbd.rbd_aio_flush = NULL; + bdrv_rbd.bdrv_aio_flush = NULL; + } + + if (g_module_symbol(librbd_handle, "rbd_aio_discard", + (gpointer *) &librbd.rbd_aio_discard)) { + bdrv_rbd.bdrv_aio_discard = qemu_rbd_aio_discard; + } else { + librbd.rbd_aio_discard = NULL; + bdrv_rbd.bdrv_aio_discard = NULL; + } +} + +static int qemu_rbd_load_libs(void) +{ + if (librbd_loaded) { + return 0; + } + + if (!g_module_supported()) { + error_report("modules are not supported on this platform: %s", + g_module_error()); + return -1; + } + + librbd_handle = g_module_open("librbd.so.1", 0); + if (!librbd_handle) { + error_report("error loading librbd: %s", g_module_error()); + return -1; + } + + /* + * Due to c++ templates used in librbd/librados and their + * dependencies, and linker duplicate trimming rules, closing + * librbd would leave it mapped. Make this explicit. + */ + g_module_make_resident(librbd_handle); + + if (qemu_rbd_set_mandatory_functions() < 0) { + return -1; + } + qemu_rbd_set_optional_functions(); + librbd_loaded = true; + + return 0; +} + block_init(bdrv_rbd_init); diff --git a/block/rbd_types.h b/block/rbd_types.h new file mode 100644 index 0000000..54df8d6 --- /dev/null +++ b/block/rbd_types.h @@ -0,0 +1,95 @@ +/* + * Types and signatures for librados and librbd + * + * Copyright (C) 2013 Inktank Storage Inc. 
+ * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2 of the License, or (at your option) any later version. + * + * This library is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, see <http://www.gnu.org/licenses/>. + */ + +#ifndef QEMU_BLOCK_RBD_TYPES_H +#define QEMU_BLOCK_RBD_TYPES_H + +/* types from librados used by the rbd block driver */ + +typedef void *rados_t; +typedef void *rados_ioctx_t; + +typedef struct LibradosFuncs { + int (*rados_create)(rados_t *cluster, const char * const id); + int (*rados_connect)(rados_t cluster); + void (*rados_shutdown)(rados_t cluster); + int (*rados_conf_read_file)(rados_t cluster, const char *path); + int (*rados_conf_set)(rados_t cluster, const char *option, + const char *value); + int (*rados_ioctx_create)(rados_t cluster, const char *pool_name, + rados_ioctx_t *ioctx); + void (*rados_ioctx_destroy)(rados_ioctx_t io); +} LibradosFuncs; + +/* types from librbd used by the rbd block driver*/ + +typedef void *rbd_image_t; +typedef void *rbd_completion_t; +typedef void (*rbd_callback_t)(rbd_completion_t cb, void *arg); + +typedef struct { + uint64_t id; + uint64_t size; + const char *name; +} rbd_snap_info_t; + +#define RBD_MAX_IMAGE_NAME_SIZE 96 +#define RBD_MAX_BLOCK_NAME_SIZE 24 + +typedef struct { + uint64_t size; + uint64_t obj_size; + uint64_t num_objs; + int order; + char block_name_prefix[RBD_MAX_BLOCK_NAME_SIZE]; + int64_t parent_pool; + char parent_name[RBD_MAX_IMAGE_NAME_SIZE]; +} rbd_image_info_t; + +typedef struct LibrbdFuncs { + int 
(*rbd_create)(rados_ioctx_t io, const char *name, uint64_t size, + int *order); + int (*rbd_open)(rados_ioctx_t io, const char *name, rbd_image_t *image, + const char *snap_name); + int (*rbd_close)(rbd_image_t image); + int (*rbd_resize)(rbd_image_t image, uint64_t size); + int (*rbd_stat)(rbd_image_t image, rbd_image_info_t *info, + size_t infosize); + int (*rbd_snap_list)(rbd_image_t image, rbd_snap_info_t *snaps, + int *max_snaps); + void (*rbd_snap_list_end)(rbd_snap_info_t *snaps); + int (*rbd_snap_create)(rbd_image_t image, const char *snapname); + int (*rbd_snap_remove)(rbd_image_t image, const char *snapname); + int (*rbd_snap_rollback)(rbd_image_t image, const char *snapname); + int (*rbd_aio_write)(rbd_image_t image, uint64_t off, size_t len, + const char *buf, rbd_completion_t c); + int (*rbd_aio_read)(rbd_image_t image, uint64_t off, size_t len, + char *buf, rbd_completion_t c); + int (*rbd_aio_discard)(rbd_image_t image, uint64_t off, uint64_t len, + rbd_completion_t c); + int (*rbd_aio_create_completion)(void *cb_arg, + rbd_callback_t complete_cb, + rbd_completion_t *c); + ssize_t (*rbd_aio_get_return_value)(rbd_completion_t c); + void (*rbd_aio_release)(rbd_completion_t c); + int (*rbd_flush)(rbd_image_t image); + int (*rbd_aio_flush)(rbd_image_t image, rbd_completion_t c); +} LibrbdFuncs; + +#endif diff --git a/configure b/configure index f2af714..05e062a 100755 --- a/configure +++ b/configure @@ -214,7 +214,6 @@ zero_malloc="" trace_backend="nop" trace_file="trace" spice="" -rbd="" smartcard_nss="" usb_redir="" glx="" @@ -862,10 +861,6 @@ for opt do ;; --enable-glx) glx="yes" ;; - --disable-rbd) rbd="no" - ;; - --enable-rbd) rbd="yes" - ;; --disable-xfsctl) xfs="no" ;; --enable-xfsctl) xfs="yes" @@ -1148,7 +1143,6 @@ echo " --with-trace-file=NAME Full PATH,NAME of file to store traces" echo " Default:trace-<pid>" echo " --disable-spice disable spice" echo " --enable-spice enable spice" -echo " --enable-rbd enable building the rados block device 
(rbd)" echo " --disable-libiscsi disable iscsi support" echo " --enable-libiscsi enable iscsi support" echo " --disable-smartcard-nss disable smartcard nss support" @@ -2195,10 +2189,10 @@ if test "$mingw32" = yes; then else glib_req_ver=2.12 fi -if $pkg_config --atleast-version=$glib_req_ver gthread-2.0 > /dev/null 2>&1 +if $pkg_config --atleast-version=$glib_req_ver gthread-2.0 gmodule-2.0 > /dev/null 2>&1 then - glib_cflags=`$pkg_config --cflags gthread-2.0 2>/dev/null` - glib_libs=`$pkg_config --libs gthread-2.0 2>/dev/null` + glib_cflags=`$pkg_config --cflags gthread-2.0 gmodule-2.0 2>/dev/null` + glib_libs=`$pkg_config --libs gthread-2.0 gmodule-2.0 2>/dev/null` LIBS="$glib_libs $LIBS" libs_qga="$glib_libs $libs_qga" else @@ -2305,31 +2299,6 @@ if test "$mingw32" != yes -a "$pthread" = no; then fi ########################################## -# rbd probe -if test "$rbd" != "no" ; then - cat > $TMPC <<EOF -#include <stdio.h> -#include <rbd/librbd.h> -int main(void) { - rados_t cluster; - rados_create(&cluster, NULL); - return 0; -} -EOF - rbd_libs="-lrbd -lrados" - if compile_prog "" "$rbd_libs" ; then - rbd=yes - libs_tools="$rbd_libs $libs_tools" - libs_softmmu="$rbd_libs $libs_softmmu" - else - if test "$rbd" = "yes" ; then - feature_not_found "rados block device" - fi - rbd=no - fi -fi - -########################################## # linux-aio probe if test "$linux_aio" != "no" ; then @@ -3427,7 +3396,6 @@ echo "vhost-net support $vhost_net" echo "Trace backend $trace_backend" echo "Trace output file $trace_file-<pid>" echo "spice support $spice ($spice_protocol_version/$spice_server_version)" -echo "rbd support $rbd" echo "xfsctl support $xfs" echo "nss used $smartcard_nss" echo "usb net redir $usb_redir" @@ -3764,9 +3732,6 @@ echo "CONFIG_UNAME_RELEASE=\"$uname_release\"" >> $config_host_mak if test "$zero_malloc" = "yes" ; then echo "CONFIG_ZERO_MALLOC=y" >> $config_host_mak fi -if test "$rbd" = "yes" ; then - echo "CONFIG_RBD=y" >> $config_host_mak -fi if 
test "$coroutine_backend" = "ucontext" ; then echo "CONFIG_UCONTEXT_COROUTINE=y" >> $config_host_mak -- 1.7.9.5 ^ permalink raw reply related [flat|nested] 24+ messages in thread
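The patch above replaces direct librbd calls with a struct of function pointers filled in by name at runtime. That loader pattern can be modeled in isolation, without glib or librbd: the sketch below substitutes a static symbol table for g_module_symbol(), and every name in it (fake_rbd_flush, toy_module_symbol, load_funcs) is hypothetical, not part of the patch or of librbd.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for symbols that g_module_symbol() would resolve from
 * librbd.so.1; both functions and the table are hypothetical. */
static int fake_rbd_flush(void) { return 0; }
static int fake_rbd_resize(void) { return -1; }

typedef struct { const char *name; void *addr; } SymEntry;

static SymEntry symtab[] = {
    { "rbd_flush",  (void *) fake_rbd_flush },
    { "rbd_resize", (void *) fake_rbd_resize },
    { NULL, NULL }
};

/* Toy g_module_symbol(): look a name up and store its address.
 * Returns 1 on success, 0 if the symbol is absent. */
static int toy_module_symbol(const char *name, void **addr)
{
    SymEntry *e;
    for (e = symtab; e->name; e++) {
        if (strcmp(e->name, name) == 0) {
            *addr = e->addr;
            return 1;
        }
    }
    *addr = NULL;
    return 0;
}

/* Same shape as the patch's LibSymbol/qemu_rbd_set_functions() loop:
 * walk a NULL-terminated list, failing if a mandatory name is missing. */
typedef struct { const char *name; void **addr; } LibSymbol;

typedef struct {
    int (*rbd_flush)(void);
    int (*rbd_resize)(void);
} Funcs;

int load_funcs(Funcs *f)
{
    LibSymbol symbols[] = {
        { "rbd_flush",  (void **) &f->rbd_flush },
        { "rbd_resize", (void **) &f->rbd_resize },
        { NULL, NULL }
    };
    int i;
    for (i = 0; symbols[i].name; i++) {
        if (!toy_module_symbol(symbols[i].name, symbols[i].addr)) {
            return -1;   /* mandatory symbol missing */
        }
    }
    return 0;
}
```

For optional symbols the patch instead leaves the pointer NULL on a failed lookup and disables the corresponding BlockDriver callback, as qemu_rbd_set_optional_functions() does for rbd_flush, rbd_aio_flush, and rbd_aio_discard.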
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically 2013-04-10 0:05 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically Josh Durgin @ 2013-04-10 8:09 ` Stefan Hajnoczi 2013-04-10 14:52 ` [Qemu-devel] runtime Block driver modules (was Re: [PATCH v3 2/2] rbd: link and load librbd dynamically) Josh Durgin 2013-04-10 15:08 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically Anthony Liguori 0 siblings, 2 replies; 24+ messages in thread From: Stefan Hajnoczi @ 2013-04-10 8:09 UTC (permalink / raw) To: Josh Durgin; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: > This allows the rbd block driver to detect symbols in the installed > version of librbd, and enable or disable features appropriately. This > obviates the #ifdefs regarding librbd versions. > > Loading librbd dynamically also makes the rbd block driver easier to > install and package, since it removes the dependency on librbd at > build time. > > Add structures containing the necessary function pointer signatures > and types from librbd, and fill them in the first time the rbd module > is used. Use glib's g_module interface so we don't preclude future > portability, and don't have to deal with odd dlopen behavior directly. > > Internally, librbd and some libraries it depends on use C++ templates, > which mean that they each contain a defined weak symbol for their > definition. Due to the way the linker resolves duplicate symbols, the > libraries loaded by librbd end up using the template definitions from > librbd, creating a circular dependency. This means that once librbd is > loaded, it won't be unloaded. Changing this behavior might work with a > Sun ld, but there doesn't seem to be a portable (or even working with > GNU ld) way to hide these C++ symbols correctly. Instead, never unload > librbd, and explicitly make it resident. 
> > Signed-off-by: Josh Durgin <josh.durgin@inktank.com> > --- > block/Makefile.objs | 3 +- > block/rbd.c | 292 ++++++++++++++++++++++++++++++++++++--------------- > block/rbd_types.h | 95 +++++++++++++++++ > configure | 41 +------- > 4 files changed, 309 insertions(+), 122 deletions(-) > create mode 100644 block/rbd_types.h NACK I think we're solving the problem at the wrong level. Writing our own dynamic linker and adding boilerplate to juggle function pointers every time we use a library dependency is ugly. There are two related problems here: 1. Packagers do not want to enable niche dependencies since users will complain that the package is bloated and pulls in too much stuff. 2. QEMU linked against a newer library version fails to run on hosts that have an older library. Problem #1 has several solutions: 1. Let packagers take care of it. For example, vim is often shipped in several packages that have various amounts of dependencies (vim-tiny, vim-gtk, etc). Packagers create the specialized packages for specific groups of users to meet their demands without dragging in too many dependencies. 2. Make QEMU modular - host devices should be shared libraries that are loaded at runtime. There should be no stable API so that development stays flexible and we discourage binary-only modules. This lets packagers easily ship a qemu-rbd package, for example, that drops in a .so file that QEMU can load at runtime. Problem #2 is already solved: The dynamic linker will refuse to load the program if there are missing symbols. It's not possible to mix and match binaries across environments while downgrading their library dependencies. With effort, this could be doable but it's not an interesting use case that many users care about - they get their binaries from a distro or build them from source with correct dependencies. Maybe it's time to move block drivers and other components into modules? Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Qemu-devel] runtime Block driver modules (was Re: [PATCH v3 2/2] rbd: link and load librbd dynamically) 2013-04-10 8:09 ` Stefan Hajnoczi @ 2013-04-10 14:52 ` Josh Durgin 2013-04-10 15:08 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically Anthony Liguori 1 sibling, 0 replies; 24+ messages in thread From: Josh Durgin @ 2013-04-10 14:52 UTC (permalink / raw) To: Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi On 04/10/2013 01:09 AM, Stefan Hajnoczi wrote: > On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: >> This allows the rbd block driver to detect symbols in the installed >> version of librbd, and enable or disable features appropriately. This >> obviates the #ifdefs regarding librbd versions. >> >> Loading librbd dynamically also makes the rbd block driver easier to >> install and package, since it removes the dependency on librbd at >> build time. >> >> Add structures containing the necessary function pointer signatures >> and types from librbd, and fill them in the first time the rbd module >> is used. Use glib's g_module interface so we don't preclude future >> portability, and don't have to deal with odd dlopen behavior directly. >> >> Internally, librbd and some libraries it depends on use C++ templates, >> which mean that they each contain a defined weak symbol for their >> definition. Due to the way the linker resolves duplicate symbols, the >> libraries loaded by librbd end up using the template definitions from >> librbd, creating a circular dependency. This means that once librbd is >> loaded, it won't be unloaded. Changing this behavior might work with a >> Sun ld, but there doesn't seem to be a portable (or even working with >> GNU ld) way to hide these C++ symbols correctly. Instead, never unload >> librbd, and explicitly make it resident. 
>> >> Signed-off-by: Josh Durgin <josh.durgin@inktank.com> >> --- >> block/Makefile.objs | 3 +- >> block/rbd.c | 292 ++++++++++++++++++++++++++++++++++++--------------- >> block/rbd_types.h | 95 +++++++++++++++++ >> configure | 41 +------- >> 4 files changed, 309 insertions(+), 122 deletions(-) >> create mode 100644 block/rbd_types.h > > NACK > > I think we're solving the problem at the wrong level. Writing our own > dynamic linker and adding boilerplate to juggle function pointers > every time we use a library dependency is ugly. I wouldn't call this writing our own dynamic linker. To load a library at runtime requires some boilerplate and using function pointers, although there could be cleaner ways of setting it up. > There are two related problems here: > > 1. Packagers do not want to enable niche dependencies since users will > complain that the package is bloated and pulls in too much stuff. > > 2. QEMU linked against a newer library version fails to run on hosts > that have an older library. Related to 2, there's also a need to rebuild QEMU to be able to access new functionality in a newer version of a library than QEMU was linked against. This would be easier to do if the devices were modules themselves, since only the module that used the library would need to be rebuilt. > Problem #1 has several solutions: > > 1. Let packagers take care of it. For example, vim is often shipped > in several packages that have various amounts of dependencies > (vim-tiny, vim-gtk, etc). Packagers create the specialized packages > for specific groups of users to meet their demands without dragging in > too many dependencies. > > 2. Make QEMU modular - host devices should be shared libraries that > are loaded at runtime. There should be no stable API so that > development stays flexible and we discourage binary-only modules. > This lets packagers easily ship a qemu-rbd package, for example, that > drops in a .so file that QEMU can load at runtime. 
> > Problem #2 is already solved: > > The dynamic linker will refuse to load the program if there are > missing symbols. It's not possible to mix and match binaries across > environments while downgrading their library dependencies. With > effort, this could be doable but it's not an interesting use case that > many users care about - they get their binaries from a distro or build > them from source with correct dependencies. > > Maybe it's time to move block drivers and other components into modules? I think that would be great. I'm not too familiar with the other components, so I'll discuss block drivers as a first step. I could see this happening in a few different ways. These could be incremental steps or final implementations, increasing in complexity and effort: 1) It seems like it would be possible to achieve an initial version of runtime loading with minimal driver changes by loading modules instead of using block_init(), and using the existing driver-specific block_init() methods (renamed to be the same across modules) as the only function. This would avoid the function pointer ugliness since it would be hidden by the existing BlockDriver structure. 2) The existing BlockDriver methods could become the module interface, with a couple of extra functions for defining the current data members (format_name, protocol_name, instance_size, and create_options). The block layer could build up a BlockDriver for each module based on the functions it exports, detected by g_module_symbol(). 3) An entirely different module interface from what exists today in the block drivers. Thoughts? Josh
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically 2013-04-10 8:09 ` Stefan Hajnoczi 2013-04-10 14:52 ` [Qemu-devel] runtime Block driver modules (was Re: [PATCH v3 2/2] rbd: link and load librbd dynamically) Josh Durgin @ 2013-04-10 15:08 ` Anthony Liguori 2013-04-10 21:19 ` Paolo Bonzini ` (2 more replies) 1 sibling, 3 replies; 24+ messages in thread From: Anthony Liguori @ 2013-04-10 15:08 UTC (permalink / raw) To: Stefan Hajnoczi, Josh Durgin Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi Stefan Hajnoczi <stefanha@gmail.com> writes: > On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote: > NACK > > I think we're solving the problem at the wrong level. Writing our own > dynamic linker and adding boilerplate to juggle function pointers > every time we use a library dependency is ugly. > > There are two related problems here: > > 1. Packagers do not want to enable niche dependencies since users will > complain that the package is bloated and pulls in too much stuff. > > 2. QEMU linked against a newer library version fails to run on hosts > that have an older library. > > Problem #1 has several solutions: > > 1. Let packagers take care of it. For example, vim is often shipped > in several packages that have various amounts of dependencies > (vim-tiny, vim-gtk, etc). Packagers create the specialized packages > for specific groups of users to meet their demands without dragging in > too many dependencies. > > 2. Make QEMU modular - host devices should be shared libraries that > are loaded at runtime. There should be no stable API so that > development stays flexible and we discourage binary-only modules. > This lets packagers easily ship a qemu-rbd package, for example, that > drops in a .so file that QEMU can load at runtime. > > Problem #2 is already solved: > > The dynamic linker will refuse to load the program if there are > missing symbols. 
It's not possible to mix and match binaries across > environments while downgrading their library dependencies. With > effort, this could be doable but it's not an interesting use case that > many users care about - they get their binaries from a distro or build > them from source with correct dependencies. > > Maybe it's time to move block drivers and other components into > modules? This is really a build system issue more than anything else. There are no internal API changes needed. All that's needed is to something like (in module.h): /* This should not be used directly. Use block_init etc. instead. */ #ifdef CONFIG_MODULE #define module_init(function, type) \ const gchar *g_module_check_init(GModule *module) \ { \ register_module_init(function, type); \ return NULL; \ } #else #define module_init(function, type) \ static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ register_module_init(function, type); \ } #endif We then also need a way to load modules prior to calling init using the GModule interfaces. Easiest thing to do is just load all .so's in a single directory (/usr/lib/qemu/modules/*.so?) prior to calling any module init functions. What we need from the build system is the ability to build things either builtin or as modules. Paolo has a GSoC proposal to integrate kconfig. This would be a great approach to solving this problem. Doing it this way would let us build not only block drivers but also devices as modules. This would let us make QXL a module making it easier for distros to not have a hard dependence on libspice for the QEMU package. Regards, Anthony Liguori > > Stefan ^ permalink raw reply [flat|nested] 24+ messages in thread
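The non-CONFIG_MODULE branch of the macro above relies on __attribute__((constructor)) functions running registration code before main(). The mechanism can be demonstrated standalone; register_module_init() below is a simplified stand-in (the type parameter is dropped for brevity) and is not QEMU's real interface.

```c
#include <stddef.h>

/* Toy model of the constructor-based registration: a function marked
 * __attribute__((constructor)) runs before main() and appends the
 * driver's init function to a list.  Simplified stand-in, not QEMU. */
typedef void (*init_fn)(void);

#define MAX_INITS 16
init_fn init_list[MAX_INITS];
int init_count;

void register_module_init(init_fn fn)
{
    if (init_count < MAX_INITS) {
        init_list[init_count++] = fn;
    }
}

#define module_init(function) \
    static void __attribute__((constructor)) \
    do_qemu_init_##function(void) { register_module_init(function); }

int rbd_driver_registered;
static void bdrv_rbd_init(void) { rbd_driver_registered = 1; }
module_init(bdrv_rbd_init)

/* QEMU would call this once from main(), after every constructor
 * (whether linked in or loaded from a module) has registered itself. */
void run_module_inits(void)
{
    int i;
    for (i = 0; i < init_count; i++) {
        init_list[i]();
    }
}
```

The same registration call works whether the translation unit is linked into the binary or lives in a shared object: in the module case the constructor fires when the module is opened, which is Paolo's point in the following reply.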
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically 2013-04-10 15:08 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically Anthony Liguori @ 2013-04-10 21:19 ` Paolo Bonzini 2013-04-11 8:04 ` Stefan Hajnoczi 0 siblings, 1 reply; 24+ messages in thread From: Paolo Bonzini @ 2013-04-10 21:19 UTC (permalink / raw) To: Anthony Liguori Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Josh Durgin On 10/04/2013 17:08, Anthony Liguori wrote: > /* This should not be used directly. Use block_init etc. instead. */ > #ifdef CONFIG_MODULE > #define module_init(function, type) \ > const gchar *g_module_check_init(GModule *module) \ > { \ > register_module_init(function, type); \ > return NULL; \ > } > #else > #define module_init(function, type) \ > static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ > register_module_init(function, type); \ > } > #endif Not even that is needed. Modules can have constructor functions that use symbols in the main executable. Basically, modules would be opened with G_MODULE_BIND_LOCAL and communicate with QEMU via constructor functions only (registering driver modules or QOM types). It is really more of a build-system hacking project than anything else. Paolo > We then also need a way to load modules prior to calling init using the > GModule interfaces. Easiest thing to do is just load all .so's in a > single directory (/usr/lib/qemu/modules/*.so?) prior to calling any > module init functions.
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically 2013-04-10 21:19 ` Paolo Bonzini @ 2013-04-11 8:04 ` Stefan Hajnoczi 0 siblings, 0 replies; 24+ messages in thread From: Stefan Hajnoczi @ 2013-04-11 8:04 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Anthony Liguori, Josh Durgin On Wed, Apr 10, 2013 at 11:19:03PM +0200, Paolo Bonzini wrote: > On 10/04/2013 17:08, Anthony Liguori wrote: > > /* This should not be used directly. Use block_init etc. instead. */ > > #ifdef CONFIG_MODULE > > #define module_init(function, type) \ > > const gchar *g_module_check_init(GModule *module) \ > > { \ > > register_module_init(function, type); \ > > return NULL; \ > > } > > #else > > #define module_init(function, type) \ > > static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \ > > register_module_init(function, type); \ > > } > > #endif > > Not even that is needed. Modules can have constructor functions that > use symbols in the main executable. > > Basically, modules would be opened with G_MODULE_BIND_LOCAL and > communicate with QEMU via constructor functions only (registering driver > modules or QOM types). > > It is really more of a build-system hacking project than anything else. Great. So it seems like we just need a function early in QEMU startup that enumerates all shared libraries in QEMU_MODULE_PATH and opens them with G_MODULE_BIND_LOCAL. Perhaps in the future we'll load modules on-demand to reduce footprint, but for now we can just load all of them just like you get when you build a QEMU binary with everything included. Stefan
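The startup function Stefan describes reduces to a directory walk plus one g_module_open(..., G_MODULE_BIND_LOCAL) per hit. A POSIX sketch of just the enumeration half follows; the loader is passed in as a callback since g_module_open() itself needs glib, and all names here are illustrative rather than QEMU's.

```c
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/* Does the filename end in ".so"?  (A real implementation might also
 * accept versioned suffixes such as ".so.1".) */
int has_so_suffix(const char *name)
{
    size_t len = strlen(name);
    return len > 3 && strcmp(name + len - 3, ".so") == 0;
}

/* Visit every *.so entry in dir_path, handing each name to the loader
 * callback (g_module_open with G_MODULE_BIND_LOCAL in the proposal).
 * Returns the number of modules visited, or -1 if the directory cannot
 * be opened. */
int for_each_module(const char *dir_path,
                    void (*load)(const char *name, void *opaque),
                    void *opaque)
{
    DIR *dir = opendir(dir_path);
    struct dirent *ent;
    int count = 0;

    if (!dir) {
        return -1;
    }
    while ((ent = readdir(dir)) != NULL) {
        if (has_so_suffix(ent->d_name)) {
            load(ent->d_name, opaque);
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Because the modules communicate only through their constructors, the scan needs no knowledge of what each .so provides; a qemu-rbd package simply drops its file into the directory.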
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically

From: Stefan Hajnoczi @ 2013-04-11 7:59 UTC
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, Paolo Bonzini, qemu-devel, Josh Durgin

On Wed, Apr 10, 2013 at 10:08:20AM -0500, Anthony Liguori wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
> > On Wed, Apr 10, 2013 at 2:05 AM, Josh Durgin <josh.durgin@inktank.com> wrote:
> > NACK
> >
> > I think we're solving the problem at the wrong level. Writing our own
> > dynamic linker and adding boilerplate to juggle function pointers
> > every time we use a library dependency is ugly.
> >
> > There are two related problems here:
> >
> > 1. Packagers do not want to enable niche dependencies since users will
> >    complain that the package is bloated and pulls in too much stuff.
> >
> > 2. QEMU linked against a newer library version fails to run on hosts
> >    that have an older library.
> >
> > Problem #1 has several solutions:
> >
> > 1. Let packagers take care of it. For example, vim is often shipped
> >    in several packages that have various amounts of dependencies
> >    (vim-tiny, vim-gtk, etc). Packagers create the specialized packages
> >    for specific groups of users to meet their demands without dragging
> >    in too many dependencies.
> >
> > 2. Make QEMU modular - host devices should be shared libraries that
> >    are loaded at runtime. There should be no stable API so that
> >    development stays flexible and we discourage binary-only modules.
> >    This lets packagers easily ship a qemu-rbd package, for example,
> >    that drops in a .so file that QEMU can load at runtime.
> >
> > Problem #2 is already solved:
> >
> > The dynamic linker will refuse to load the program if there are
> > missing symbols. It's not possible to mix and match binaries across
> > environments while downgrading their library dependencies. With
> > effort, this could be doable but it's not an interesting use case
> > that many users care about - they get their binaries from a distro or
> > build them from source with correct dependencies.
> >
> > Maybe it's time to move block drivers and other components into
> > modules?
>
> This is really a build system issue more than anything else. There are
> no internal API changes needed.
>
> All that's needed is to do something like this (in module.h):
>
> /* This should not be used directly. Use block_init etc. instead. */
> #ifdef CONFIG_MODULE
> #define module_init(function, type) \
> const gchar *g_module_check_init(GModule *module) \
> { \
>     register_module_init(function, type); \
>     return NULL; \
> }
> #else
> #define module_init(function, type) \
> static void __attribute__((constructor)) do_qemu_init_ ## function(void) { \
>     register_module_init(function, type); \
> }
> #endif
>
> We then also need a way to load modules prior to calling init using the
> GModule interfaces. Easiest thing to do is just load all .so's in a
> single directory (/usr/lib/qemu/modules/*.so?) prior to calling any
> module init functions.
>
> What we need from the build system is the ability to build things
> either builtin or as modules. Paolo has a GSoC proposal to integrate
> kconfig. This would be a great approach to solving this problem.
>
> Doing it this way would let us build not only block drivers but also
> devices as modules. This would let us make QXL a module, making it
> easier for distros to not have a hard dependence on libspice for the
> QEMU package.

I'd love to do this for the net subsystem as well so distros can provide
the qemu-net-vde package without requiring a VDE dependency from the
core QEMU package.

Stefan
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically

From: Sage Weil @ 2013-06-21 18:42 UTC
To: Anthony Liguori
Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Josh Durgin, Paolo Bonzini, ceph-devel

Hi Anthony, Stefan, Paolo,

[Resurrecting an old thread, here!]

On Wed, 10 Apr 2013 Anthony Liguori wrote:
> Stefan Hajnoczi <stefanha@gmail.com> writes:
> > NACK
> >
> > [...]
> >
> > Maybe it's time to move block drivers and other components into
> > modules?
>
> This is really a build system issue more than anything else. There are
> no internal API changes needed.
>
> All that's needed is to do something like this (in module.h):
>
> [...]

Has any progress been made toward a generic dynamic linking solution for
block drivers? It is a conceptually simple change, but non-trivial for
anyone unfamiliar with the build system.

In the mean time, the inability to dynamically link librbd is continuing
to cause significant pain for distro users. Would you consider merging
the rbd-specific linking as an interim solution until something is
generically available to qemu?

sage
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically

From: Alex Bligh @ 2013-06-21 19:08 UTC
To: Sage Weil, Anthony Liguori
Cc: Kevin Wolf, Alex Bligh, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Josh Durgin, Paolo Bonzini, ceph-devel

--On 21 June 2013 11:42:34 -0700 Sage Weil <sage@inktank.com> wrote:

> Has any progress been made toward a generic dynamic linking solution
> for block drivers? It is a conceptually simple change, but non-trivial
> for anyone unfamiliar with the build system.
>
> In the mean time, the inability to dynamically link librbd is
> continuing to cause significant pain for distro users. Would you
> consider merging the rbd-specific linking as an interim solution until
> something is generically available to qemu?

FWIW I have this implemented here:

https://github.com/flexiant/qemu/commits/v1.0-rbd-add-async-flush

specifically this commit:

https://github.com/flexiant/qemu/commit/6fa2e9c95bdaca7c814881e27f04424fb6cc2960

This is a backport of qemu 1.5's librbd to qemu 1.0 (which is slightly
confusing) but the #pragma weak technique is basically the same. I had
to use a slightly different technique to Josh's original version to get
it to compile with old includes and still run with a new library, as
well as vice versa.

It would take me only a few minutes to bring this back as a patch for
1.5 if anyone is interested. I assumed they would not be.

--
Alex Bligh
* Re: [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically

From: Anthony Liguori @ 2013-06-21 19:13 UTC
To: Sage Weil
Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, Josh Durgin, Paolo Bonzini, ceph-devel

Sage Weil <sage@inktank.com> writes:

> Hi Anthony, Stefan, Paolo,
>
> [Resurrecting an old thread, here!]
>
> On Wed, 10 Apr 2013 Anthony Liguori wrote:
>
> Has any progress been made toward a generic dynamic linking solution
> for block drivers? It is a conceptually simple change, but non-trivial
> for anyone unfamiliar with the build system.

There are patches on the list, in fact.

> In the mean time, the inability to dynamically link librbd is
> continuing to cause significant pain for distro users. Would you
> consider merging the rbd-specific linking as an interim solution until
> something is generically available to qemu?

Regards,

Anthony Liguori
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush

From: Josh Durgin @ 2013-04-10 14:03 UTC
To: Kevin Wolf
Cc: qemu-devel, Stefan Hajnoczi

On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
>> The existing bdrv_co_flush_to_disk implementation uses rbd_flush(),
>> which is synchronous and causes the main qemu thread to block until
>> it is complete. This results in unresponsiveness and extra latency
>> for the guest.
>>
>> Fix this by using an asynchronous version of flush. This was added to
>> librbd with a special #define to indicate its presence, since it will
>> be backported to stable versions. Thus, there is no need to check the
>> version of librbd.
>
> librbd is linked dynamically and the version on the build host isn't
> necessarily the same as the version qemu is run with. So shouldn't
> this better be a runtime check?

While we discuss runtime loading separately, would you mind taking this
patch as-is for now?

Josh

>> Implement this as bdrv_aio_flush, since it matches other aio
>> functions in the rbd block driver, and leave out bdrv_co_flush_to_disk
>> when the asynchronous version is available.
>>
>> Reported-by: Oliver Francke <oliver@filoo.de>
>> Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
>
> Looks good otherwise.
>
> Kevin
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush

From: Stefan Hajnoczi @ 2013-04-11 8:02 UTC
To: Josh Durgin
Cc: Kevin Wolf, qemu-devel

On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
> On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> > Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> > >> [...]
> >
> > librbd is linked dynamically and the version on the build host isn't
> > necessarily the same as the version qemu is run with. So shouldn't
> > this better be a runtime check?
>
> While we discuss runtime loading separately, would you mind taking
> this patch as-is for now?

Hi Josh,
I'm happy with Patch v3 1/2. Does that work for you?

I don't want to take Patch v3 2/2 or the dlsym() function pointer patch.

Stefan
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush

From: Kevin Wolf @ 2013-04-11 8:48 UTC
To: Stefan Hajnoczi
Cc: Josh Durgin, qemu-devel

Am 11.04.2013 um 10:02 hat Stefan Hajnoczi geschrieben:
> On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
> > On 04/02/2013 07:10 AM, Kevin Wolf wrote:
> > > Am 29.03.2013 um 21:03 hat Josh Durgin geschrieben:
> > > > [...]
> > >
> > > librbd is linked dynamically and the version on the build host
> > > isn't necessarily the same as the version qemu is run with. So
> > > shouldn't this better be a runtime check?
> >
> > While we discuss runtime loading separately, would you mind taking
> > this patch as-is for now?
>
> Hi Josh,
> I'm happy with Patch v3 1/2. Does that work for you?

Only patch 1/2 would add dead code, as .bdrv_aio_flush would never be
called. I think we should rather take v1 of the series then, with the
#ifdefs at build time.

Kevin
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush

From: Josh Durgin @ 2013-04-11 17:19 UTC
To: Kevin Wolf
Cc: qemu-devel, Stefan Hajnoczi

On 04/11/2013 01:48 AM, Kevin Wolf wrote:
> Am 11.04.2013 um 10:02 hat Stefan Hajnoczi geschrieben:
>> On Wed, Apr 10, 2013 at 07:03:39AM -0700, Josh Durgin wrote:
>>> [...]
>>> While we discuss runtime loading separately, would you mind taking
>>> this patch as-is for now?
>>
>> Hi Josh,
>> I'm happy with Patch v3 1/2. Does that work for you?
>
> Only patch 1/2 would add dead code, as .bdrv_aio_flush would never be
> called. I think we should rather take v1 of the series then, with the
> #ifdefs at build time.

Yes, that would be a problem with only v3 1/2. v2 of the original series
is fine, but v1 was accidentally missing a hunk. To be clear, I think
just http://patchwork.ozlabs.org/patch/232489/ would be good.

Thanks,
Josh
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush

From: Kevin Wolf @ 2013-04-12 6:50 UTC
To: Josh Durgin
Cc: qemu-devel, Stefan Hajnoczi

Am 11.04.2013 um 19:19 hat Josh Durgin geschrieben:
> On 04/11/2013 01:48 AM, Kevin Wolf wrote:
> > [...]
> > Only patch 1/2 would add dead code, as .bdrv_aio_flush would never
> > be called. I think we should rather take v1 of the series then, with
> > the #ifdefs at build time.
>
> Yes, that would be a problem with only v3 1/2. v2 of the original
> series is fine, but v1 was accidentally missing a hunk. To be clear,
> I think just http://patchwork.ozlabs.org/patch/232489/ would be good.

Yes, sorry, I should have checked before posting this. v2 I meant.

Kevin
* Re: [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush

From: Stefan Hajnoczi @ 2013-04-12 7:42 UTC
To: Kevin Wolf
Cc: Josh Durgin, qemu-devel

On Fri, Apr 12, 2013 at 08:50:35AM +0200, Kevin Wolf wrote:
> Am 11.04.2013 um 19:19 hat Josh Durgin geschrieben:
> > On 04/11/2013 01:48 AM, Kevin Wolf wrote:
> > > [...]
> >
> > Yes, that would be a problem with only v3 1/2. v2 of the original
> > series is fine, but v1 was accidentally missing a hunk. To be clear,
> > I think just http://patchwork.ozlabs.org/patch/232489/ would be
> > good.
>
> Yes, sorry, I should have checked before posting this. v2 I meant.

Thanks Josh and Kevin. Will take v2.

Stefan
end of thread, other threads: [~2013-06-21 19:14 UTC | newest]

Thread overview: 24+ messages

2013-03-29  7:59 [Qemu-devel] [PATCH] rbd: add an asynchronous flush  Josh Durgin
2013-03-29 20:03 ` [Qemu-devel] [PATCH v2] "  Josh Durgin
2013-04-02 14:10 ` Kevin Wolf
2013-04-04  8:35 ` [Qemu-devel] [PATCH v3 1/2] "  Josh Durgin
2013-04-04  8:35 ` [Qemu-devel] [PATCH 2/2] rbd: disable unsupported librbd functions at runtime  Josh Durgin
2013-04-04 10:10 ` Kevin Wolf
2013-04-04 16:50 ` Josh Durgin
2013-04-05  9:31 ` Kevin Wolf
2013-04-10  0:05 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically  Josh Durgin
2013-04-10  8:09 ` Stefan Hajnoczi
2013-04-10 14:52 ` [Qemu-devel] runtime Block driver modules (was Re: [PATCH v3 2/2] rbd: link and load librbd dynamically)  Josh Durgin
2013-04-10 15:08 ` [Qemu-devel] [PATCH v3 2/2] rbd: link and load librbd dynamically  Anthony Liguori
2013-04-10 21:19 ` Paolo Bonzini
2013-04-11  8:04 ` Stefan Hajnoczi
2013-04-11  7:59 ` Stefan Hajnoczi
2013-06-21 18:42 ` Sage Weil
2013-06-21 19:08 ` Alex Bligh
2013-06-21 19:13 ` Anthony Liguori
2013-04-10 14:03 ` [Qemu-devel] [PATCH v2] rbd: add an asynchronous flush  Josh Durgin
2013-04-11  8:02 ` Stefan Hajnoczi
2013-04-11  8:48 ` Kevin Wolf
2013-04-11 17:19 ` Josh Durgin
2013-04-12  6:50 ` Kevin Wolf
2013-04-12  7:42 ` Stefan Hajnoczi