From: Liu Yuan <namei.unix@gmail.com>
To: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
sheepdog@lists.wpkg.org,
Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>,
qemu-devel@nongnu.org, Vasiliy Tolstov <v.tolstov@selfip.ru>,
MORITA Kazutaka <morita.kazutaka@gmail.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp>
Subject: Re: [Qemu-devel] [PATCH] sheepdog: serialize requests to overwrapping area
Date: Tue, 28 Jul 2015 22:47:37 +0800 [thread overview]
Message-ID: <20150728144737.GC8357@ubuntu-trusty> (raw)
In-Reply-To: <20150728143132.GB8357@ubuntu-trusty>
On Tue, Jul 28, 2015 at 10:31:32PM +0800, Liu Yuan wrote:
> On Mon, Jul 27, 2015 at 11:23:02AM -0400, Jeff Cody wrote:
> > On Sat, Jul 18, 2015 at 01:44:24AM +0900, Hitoshi Mitake wrote:
> > > Current sheepdog driver only serializes create requests in oid
> > > unit. This mechanism isn't enough for handling requests to
> > > overwrapping area spanning multiple oids, so it can result bugs like
> > > below:
> > > https://bugs.launchpad.net/sheepdog-project/+bug/1456421
> > >
> > > This patch adds a new serialization mechanism for the problem. The
> > > difference from the old one is:
> > > 1. serialize entire aiocb if their targetting areas overwrap
> > > 2. serialize all requests (read, write, and discard), not only creates
> > >
> > > This patch also removes the old mechanism because the new one can be
> > > an alternative.
> > >
>
> Okay, I figured out what the problem is myself and allow me to try to make it
> clear to non-sheepdog devs:
>
> sheepdog volume is thin-provision, so for the first write, we create the object
> internally, meaning that we need to handle write in two case:
>
> 1. write to non-allocated object, create it then update inode, so in this case
> two request will be generated: create(oid), update_inode(oid_to_inode_idx)
> 2. write the allocated object, just write(oid).
>
> Current sheepdog driver use a range update_inode(min_idx, max_idx) for batching
> the updates. But there is subtle problem by determining min_idx and max_idx:
>
> for a single create request, min_idx == max_idx, so actually we just update one
> one bit as expected.
>
> Suppose we have 2 create request, create(10) and create(20), then min == 10,
> max==20 even though we just need to update index 10 and index 20, update_inode(10,20)
> will actually update range from 10 to 20. This would work if all the update_inode()
> requests won't overlap. But unfortunately, this is not true for some corner case.
> So the problem arise as following:
>
> req 1: update_inode(10,20)
> req 2: update_inode(15,22)
I patched the qemu to print min and max, and confirmed my analysis:
diff --git a/block/sheepdog.c b/block/sheepdog.c
index bd7cbed..f3e40e8 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1976,6 +1976,7 @@ static void coroutine_fn sd_write_done(SheepdogAIOCB *acb)
mn = s->min_dirty_data_idx;
mx = s->max_dirty_data_idx;
+ printf("min %u, max %u\n", mn, mx);
if (mn <= mx) {
/* we need to update the vdi object. */
offset = sizeof(s->inode) - sizeof(s->inode.data_vdi_id) +
diff --git a/dtc b/dtc
index 65cc4d2..bc895d6 160000
...
min 4294967295, max 0
min 9221, max 9222
min 9224, max 9728
min 9223, max 9223
min 9729, max 9730
min 9731, max 9732
min 9733, max 9734
min 9736, max 10240
min 9735, max 10241
...
Every line represents a update_inode(min, max) last 2 lines show 2 requests
actually overlap while I ran mkfs.ext4 on a sheepdog volume. The overlapped
requests might be reordered via network and cause inode[idx] back to 0 after
being set by last request. Then a wrong remove request will be executed by sheep
internally and accidentally remove a victim object.
Thanks,
Yuan
> req 1 and req 2 might have different value between [15,20] and cause problems.
>
> Based on above analysis, I think the real fix is to fix update_inode(), not
> serialize all the requests in overkill way. The fix would be easy, considering
> most update_inode() update only 1 index, we could just make update_inode a
> single bit updater, not a range one, in which way we don't affect performance
> as the above patch did. (I actually suspect that the above patch might not solve
> the problem because update_inode() can overlap even with the patch).
>
> If everyone agrees with my analysis, I'll post the fix.
>
> Thanks,
> Yuan
next prev parent reply other threads:[~2015-07-28 14:47 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-07-17 16:44 [Qemu-devel] [PATCH] sheepdog: serialize requests to overwrapping area Hitoshi Mitake
2015-07-17 16:49 ` Vasiliy Tolstov
2015-07-20 15:46 ` Stefan Hajnoczi
2015-07-27 15:23 ` Jeff Cody
2015-07-27 15:36 ` Vasiliy Tolstov
2015-07-28 14:31 ` Liu Yuan
2015-07-28 14:47 ` Liu Yuan [this message]
2015-07-28 8:50 ` [Qemu-devel] [sheepdog] " Liu Yuan
2015-07-28 9:35 ` Liu Yuan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150728144737.GC8357@ubuntu-trusty \
--to=namei.unix@gmail.com \
--cc=ishizaki.teruaki@lab.ntt.co.jp \
--cc=jcody@redhat.com \
--cc=kwolf@redhat.com \
--cc=mitake.hitoshi@lab.ntt.co.jp \
--cc=morita.kazutaka@gmail.com \
--cc=qemu-devel@nongnu.org \
--cc=sheepdog@lists.wpkg.org \
--cc=stefanha@redhat.com \
--cc=v.tolstov@selfip.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).