References: <6f9466b6098b5159aca9c789f9fce45f409e684f.1301354138.git.josh.durgin@dreamhost.com> <20110408084334.GA28360@stefanha-thinkpad.localdomain> <4DA39A6A.7020403@dreamhost.com>
Date: Tue, 12 Apr 2011 22:14:31 +0100
From: Stefan Hajnoczi
Subject: Re: [Qemu-devel] [PATCH v2 1/2] rbd: use the higher level librbd instead of just librados
To: Sage Weil
Cc: Yehuda Sadeh, ceph-devel@vger.kernel.org, Josh Durgin, qemu-devel@nongnu.org, kvm@vger.kernel.org

On Tue, Apr 12, 2011 at 4:38 PM, Sage Weil wrote:
> On Tue, 12 Apr 2011, Stefan Hajnoczi wrote:
>> On Tue, Apr 12, 2011 at 1:18 AM, Josh Durgin wrote:
>> > On 04/08/2011 01:43 AM, Stefan Hajnoczi wrote:
>> >>
>> >> On Mon, Mar 28, 2011 at 04:15:57PM -0700, Josh Durgin wrote:
>> >>>
>> >>> librbd stacks on top of librados to provide access
>> >>> to rbd images.
>> >>>
>> >>> Using librbd simplifies the qemu code, and allows
>> >>> qemu to use new versions of the rbd format
>> >>> with few (if any) changes.
>> >>>
>> >>> Signed-off-by: Josh Durgin
>> >>> Signed-off-by: Yehuda Sadeh
>> >>> ---
>> >>>  block/rbd.c       |  785
>> >>> +++++++++++++++--------------------------------------
>> >>>  block/rbd_types.h |   71 -----
>> >>>  configure         |   33 +--
>> >>>  3 files changed, 221 insertions(+), 668 deletions(-)
>> >>>  delete mode 100644 block/rbd_types.h
>> >>
>> >> Hi Josh,
>> >> I have applied your patches onto qemu.git/master and am running
>> >> ceph.git/master.
>> >>
>> >> Unfortunately qemu-iotests fails for me.
>> >>
>> >>
>> >> Test 016 seems to hang in qemu-io -g -c write -P 66 128M 512
>> >> rbd:rbd/t.raw.  I can reproduce this consistently.  Here is the
>> >> backtrace of the hung process (not consuming CPU, probably deadlocked):
>> >
>> > This hung because it wasn't checking the return value of rbd_aio_write.
>> > I've fixed this in the for-qemu branch of
>> > http://ceph.newdream.net/git/qemu-kvm.git. Also, the existing rbd
>> > implementation is not 'growable' - writing to a large offset will not expand
>> > the rbd image correctly. Should we implement bdrv_truncate to support this
>> > (librbd has a resize operation)? Is bdrv_truncate useful outside of qemu-img
>> > and qemu-io?
>>
>> If librbd has a resize operation then it would be nice to wire up
>> bdrv_truncate() for completeness.  Note that bdrv_truncate() can also
>> be called online using the block_resize monitor command.
>>
>> Since rbd devices are not growable we should fix qemu-iotests to skip
>> 016 for rbd.
>
> There is a resize operation, but it's expected that you'll use it for any
> bdev size change (grow or shrink).  Does qemu grow a device by writing to
> the (new) highest offset, or is there another operation that should be
> wired up?  We want to avoid a situation where RBD isn't aware of the qemu
> bdev resize and has to grow a bit each time we write to a larger offset,
> as resize is a somewhat expensive operation...

Good, it sounds like RBD and QEMU have similar concepts here.  The
bdrv_truncate() operation is a (rare) image resize operation.  It is
not the extend-beyond-EOF grow operation, which QEMU simply performs as
a write beyond bdrv_getlength() bytes.

Stefan
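
To make the distinction above concrete, here is a minimal sketch in C.
It is a toy model under stated assumptions, not QEMU or librbd code:
toy_image, toy_truncate() and toy_write() are hypothetical names, and
the "growable" flag only mirrors the term used earlier in this thread.
An explicit resize is the rare operation that changes the size in
either direction; a write past the end extends only a growable image
and is rejected on a fixed-size one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in QEMU or librbd. */
typedef struct {
    int64_t length;    /* current image size in bytes */
    bool growable;     /* may a write past EOF extend the image? */
    int resize_calls;  /* number of explicit resize operations so far */
} toy_image;

/* Explicit resize: the rare operation, used for both grow and shrink. */
static int toy_truncate(toy_image *img, int64_t new_length)
{
    if (new_length < 0) {
        return -1;
    }
    img->length = new_length;
    img->resize_calls++;
    return 0;
}

/*
 * Write of `bytes` bytes at `offset`. No data is stored; only the size
 * bookkeeping is modelled. A write ending past EOF implicitly extends a
 * growable image and fails on a fixed-size one.
 */
static int toy_write(toy_image *img, int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0) {
        return -1;
    }
    int64_t end = offset + bytes;
    if (end > img->length) {
        if (!img->growable) {
            return -1;  /* fixed size: caller must resize explicitly */
        }
        img->length = end;  /* extend-beyond-EOF grow, no resize call */
    }
    return 0;
}
```

In this model a fixed-size image needs exactly one toy_truncate() call
before a write at a larger offset succeeds, so the expensive resize
happens once rather than on every write to a new highest offset.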