From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38478) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YM7rq-0001WC-Fa for qemu-devel@nongnu.org; Thu, 12 Feb 2015 23:28:39 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YM7rm-0001FF-DV for qemu-devel@nongnu.org; Thu, 12 Feb 2015 23:28:38 -0500 Received: from tama4.tas.ntt.co.jp ([222.151.219.51]:33257) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YM7rl-0001Ei-Tv for qemu-devel@nongnu.org; Thu, 12 Feb 2015 23:28:34 -0500 Message-ID: <54DD7D5A.40702@lab.ntt.co.jp> Date: Fri, 13 Feb 2015 13:28:10 +0900 From: Teruaki Ishizaki MIME-Version: 1.0 References: <1422347727-13006-1-git-send-email-ishizaki.teruaki@lab.ntt.co.jp> <20150210111256.GG18727@ubuntu-trusty> <54DC071D.40503@lab.ntt.co.jp> <20150212021928.GI7801@ubuntu-trusty> <54DC10EC.3050502@lab.ntt.co.jp> <20150212025523.GJ7801@ubuntu-trusty> <54DD5450.1000304@lab.ntt.co.jp> <20150213020119.GS7801@ubuntu-trusty> In-Reply-To: <20150213020119.GS7801@ubuntu-trusty> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v4] sheepdog: selectable object size support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Liu Yuan Cc: kwolf@redhat.com, mitake.hitoshi@lab.ntt.co.jp, sheepdog@lists.wpkg.org, qemu-devel@nongnu.org, stefanha@redhat.com (2015/02/13 11:01), Liu Yuan wrote: > On Fri, Feb 13, 2015 at 10:33:04AM +0900, Teruaki Ishizaki wrote: >> (2015/02/12 11:55), Liu Yuan wrote: >>> On Thu, Feb 12, 2015 at 11:33:16AM +0900, Teruaki Ishizaki wrote: >>>> (2015/02/12 11:19), Liu Yuan wrote: >>>>> On Thu, Feb 12, 2015 at 10:51:25AM +0900, Teruaki Ishizaki wrote: >>>>>> (2015/02/10 20:12), Liu Yuan wrote: >>>>>>> On Tue, Jan 27, 2015 at 05:35:27PM +0900, Teruaki Ishizaki wrote: >>>>>>>> Previously, qemu block driver of sheepdog used hard-coded VDI object size. >>>>>>>> This patch enables users to handle "block_size_shift" value for >>>>>>>> calculating VDI object size. >>>>>>>> >>>>>>>> When you start qemu, you don't need to specify additional command option. >>>>>>>> >>>>>>>> But when you create the VDI which doesn't have default object size >>>>>>>> with qemu-img command, you specify block_size_shift option. >>>>>>>> >>>>>>>> If you want to create a VDI of 8MB(1 << 23) object size, >>>>>>>> you need to specify following command option. >>>>>>>> >>>>>>>> # qemu-img create -o block_size_shift=23 sheepdog:test1 100M >>>>>>>> >>>>>>>> In addition, when you don't specify qemu-img command option, >>>>>>>> a default value of sheepdog cluster is used for creating VDI. >>>>>>>> >>>>>>>> # qemu-img create sheepdog:test2 100M >>>>>>>> >>>>>>>> Signed-off-by: Teruaki Ishizaki >>>>>>>> --- >>>>>>>> V4: >>>>>>>> - Limit a read/write buffer size for creating a preallocated VDI. >>>>>>>> - Replace a parse function for the block_size_shift option. >>>>>>>> - Fix an error message. >>>>>>>> >>>>>>>> V3: >>>>>>>> - Delete the needless operation of buffer. >>>>>>>> - Delete the needless operations of request header. >>>>>>>> for SD_OP_GET_CLUSTER_DEFAULT. >>>>>>>> - Fix coding style problems. >>>>>>>> >>>>>>>> V2: >>>>>>>> - Fix coding style problem (white space). >>>>>>>> - Add members, store_policy and block_size_shift to struct SheepdogVdiReq. >>>>>>>> - Initialize request header to use block_size_shift specified by user. >>>>>>>> --- >>>>>>>> block/sheepdog.c | 138 ++++++++++++++++++++++++++++++++++++++------- >>>>>>>> include/block/block_int.h | 1 + >>>>>>>> 2 files changed, 119 insertions(+), 20 deletions(-) >>>>>>>> >>>>>>>> diff --git a/block/sheepdog.c b/block/sheepdog.c >>>>>>>> index be3176f..a43b947 100644 >>>>>>>> --- a/block/sheepdog.c >>>>>>>> +++ b/block/sheepdog.c >>>>>>>> @@ -37,6 +37,7 @@ >>>>>>>> #define SD_OP_READ_VDIS 0x15 >>>>>>>> #define SD_OP_FLUSH_VDI 0x16 >>>>>>>> #define SD_OP_DEL_VDI 0x17 >>>>>>>> +#define SD_OP_GET_CLUSTER_DEFAULT 0x18 >>>>>>> >>>>>>> This might not be necessary. For old qemu or the qemu-img without setting >>>>>>> option, the block_size_shift will be 0. >>>>>>> >>>>>>> If we make 0 to represent 4MB object, then we don't need to get the default >>>>>>> cluster object size. >>>>>>> >>>>>>> We migth even get rid of the idea of cluster default size. The downsize is that, >>>>>>> if we want to create a vdi with different size not the default 4MB, >>>>>>> we have to write it every time for qemu-img or dog. >>>>>>> >>>>>>> If we choose to keep the idea of cluster default size, I think we'd also try to >>>>>>> avoid call this request from QEMU to make backward compatibility easier. In this >>>>>>> scenario, 0 might be used to ask new sheep to decide to use cluster default size. >>>>>>> >>>>>>> Both old qemu and new QEMU will send 0 to sheep and both old and new sheep can >>>>>>> handle 0 though it has different meanings. >>>>>>> >>>>>>> Table for this bit as 0: >>>>>>> Qe: qemu >>>>>>> SD: Sheep daemon >>>>>>> CDS: Cluster Default Size >>>>>>> Ign: Ignored by the sheep daemon >>>>>>> >>>>>>> Qe/sd new old >>>>>>> new CDS Ign >>>>>>> old CDS NULL >>>>>> Does Ign mean that VDI is handled as 4MB object size? >>>>> >>>>> Yes, old sheep can only handle 4MB object and doesn't check this field at all. >>>>> >>>>>> >>>>>>> >>>>>>> I think this approach is acceptable. The difference to your patch is that >>>>>>> we don't send SD_OP_GET_CLUSTER_DEFAULT to sheep daemon and >>>>>>> SD_OP_GET_CLUSTER_DEFAULT can be removed. >>>>>> When users create a new VDI with qemu-img, qemu's Sheepdog backend >>>>>> driver calculates max limit VDI size. >>>>> >>>>>> But if block_size_shift option is not specified, qemu's Sheepdog backend >>>>>> driver can't calculate max limit VDI size. >>>>> >>>>> If block_size_shift not specified, this means >>>>> >>>>> 1 for old sheep, use 4MB size >>>>> 2 for new sheep, use cluster wide default value. >>>>> >>>>> And sheep then can calculate it on its own, no? >>>>> >>>> Dog command(client) calculate max size, so I think >>>> that qemu's Sheepdog backend driver should calculate it >>>> like dog command. >>>> >>>> Is that policy changeable? >>> >>> I checked the QEMU code and got your idea. In the past it was fixed size so very >>> easy to hardcode the check in the client, no communication with sheep needed. >>> >>> Yes, if it is reasonable, we can change it. >>> >>> I think we can push the size calculation logic into sheep, if not the right size >>> return INVALID_PARAMETER to clients. Clients just check this and report error >>> back to users. >>> >>> There is no backward compability for this approach, since 4MB is the smallest >>> size. >>> >>> OLD QEMU will limit the max_size as 4TB, which is no problem for new sheep. >> >> I have checked the Qemu and sheepdog code. >> When we resize VDI, sd_truncate() is called and >> resize value is handled by Qemu. >> (Sorry I haven't noticed this operation) >> >> Then, sd_truncate() writes Sheepdog inode object directly. >> So Sheepdog server can't handle maximum VDI size. >> >> As I thought, should we use SD_OP_GET_CLUSTER_DEFAULT? >> Should maxmimum VDI size be calculated on client program? > > Based on your description, yes, we have to use it. I'd suggest rename > SD_OP_GET_CLUSTER_DEFAULT as SD_OP_GET_DEFAULT_OBJECT_SIZE. If we use it, you > need to take care of old sheep that will return INVALID_PARAMETER and handle it. > Now, SD_OP_GET_CLUSTER_DEFAULT can get block_size_shift, copy_policy and copies informations. I think that SD_OP_GET_DEFAULT_OBJECT_SIZE doesn't fit in. I'll change to handle INVALID_PARAMETER. Thanks, Teruaki