Date: Tue, 28 Jul 2015 22:31:32 +0800
From: Liu Yuan
To: Jeff Cody
Cc: Kevin Wolf, sheepdog@lists.wpkg.org, Hitoshi Mitake, qemu-devel@nongnu.org, Vasiliy Tolstov, MORITA Kazutaka, Stefan Hajnoczi, Teruaki Ishizaki
Subject: Re: [Qemu-devel] [PATCH] sheepdog: serialize requests to overwrapping area
Message-ID: <20150728143132.GB8357@ubuntu-trusty>
References: <1437151464-5458-1-git-send-email-mitake.hitoshi@lab.ntt.co.jp> <20150727152302.GH21772@localhost.localdomain>
In-Reply-To: <20150727152302.GH21772@localhost.localdomain>

On Mon, Jul 27, 2015 at 11:23:02AM -0400, Jeff Cody wrote:
> On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:
> > Current sheepdog driver only serializes create requests in oid
> > unit. This mechanism isn't enough for handling requests to
> > overwrapping area spanning multiple oids, so it can result bugs like
> > below:
> > https://bugs.launchpad.net/sheepdog-project/+bug/1456421
> >
> > This patch adds a new serialization mechanism for the problem. The
> > difference from the old one is:
> > 1. serialize entire aiocb if their targetting areas overwrap
> > 2. serialize all requests (read, write, and discard), not only creates
> >
> > This patch also removes the old mechanism because the new one can be
> > an alternative.
> >

Okay, I figured out what the problem is myself, so allow me to try to
make it clear to non-sheepdog devs:

A sheepdog volume is thin-provisioned, so on the first write to an
object we create that object internally. This means a write has to be
handled in one of two cases:

1. Write to a non-allocated object: create it, then update the inode.
   In this case two requests are generated:
   create(oid), update_inode(oid_to_inode_idx)
2. Write to an allocated object: just write(oid).

The current sheepdog driver uses a range update_inode(min_idx, max_idx)
to batch the updates. But there is a subtle problem in how min_idx and
max_idx are determined:

For a single create request, min_idx == max_idx, so we update just one
bit, as expected.

Suppose we have 2 create requests, create(10) and create(20). Then
min == 10 and max == 20. Even though we only need to update index 10 and
index 20, update_inode(10,20) will actually update the whole range from
10 to 20. This would work if update_inode() requests never overlapped.
Unfortunately, that is not true in some corner cases, and the problem
arises as follows:

req 1: update_inode(10,20)
req 2: update_inode(15,22)

req 1 and req 2 might hold different values for the indexes in [15,20],
and whichever update lands last overwrites the other's values there.

Based on the above analysis, I think the real fix is to fix
update_inode(), not to serialize all the requests, which is overkill.
The fix would be easy: considering that most update_inode() calls update
only 1 index, we could make update_inode() a single-bit updater instead
of a range one. That way we don't affect performance as the above patch
does. (I actually suspect that the above patch might not solve the
problem, because update_inode() can overlap even with the patch.)

If everyone agrees with my analysis, I'll post the fix.

Thanks,
Yuan
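
P.S. To illustrate the overlap described above, here is a minimal sketch in Python. It is a toy model, not the QEMU sheepdog driver code: it assumes each request sends its own view of the inode index table for the range it updates, and names such as VDI_ID, range_update() and single_update() are hypothetical. Request 1 is extended to create(10) and create(20), request 2 to create(15) and create(22), matching the ranges (10,20) and (15,22) above.

```python
# Toy model of the inode index table, not the QEMU sheepdog driver.
# Assumption: each request writes back its own view of the table for the
# range it updates, so a stale entry in that view can clobber a newer one.

VDI_ID = 7  # hypothetical non-zero marker meaning "object allocated"


def range_update(server, view, min_idx, max_idx):
    """Overwrite server[min_idx..max_idx] with the request's own view."""
    for idx in range(min_idx, max_idx + 1):
        server[idx] = view.get(idx, 0)


def single_update(server, view, created):
    """Write back only the indexes this request actually created."""
    for idx in created:
        server[idx] = view[idx]


def run(update_is_range):
    """Apply req 1 then req 2 and return the indexes marked allocated."""
    server = {}  # index -> vdi id; missing or 0 means unallocated
    req1 = {10: VDI_ID, 20: VDI_ID}  # create(10), create(20)
    req2 = {15: VDI_ID, 22: VDI_ID}  # create(15), create(22)
    for view in (req1, req2):
        created = sorted(view)
        if update_is_range:
            range_update(server, view, created[0], created[-1])
        else:
            single_update(server, view, created)
    return sorted(idx for idx, vid in server.items() if vid)
```

In this model run(True) loses index 20, because req 2's range (15,22) writes back its stale, unallocated view of that index, while run(False) keeps all four indexes. Whether the real driver loses data in exactly this way depends on details the model leaves out.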