From: Liu Yuan <namei.unix@gmail.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v5 2/2] block/quorum: add simple read pattern support
Date: Mon, 18 Aug 2014 13:59:28 +0800
Message-ID: <20140818055928.GA8722@ubuntu-trusty>
In-Reply-To: <20140815135903.GB595@irqsave.net>
On Fri, Aug 15, 2014 at 03:59:04PM +0200, Benoît Canet wrote:
> The Friday 15 Aug 2014 à 13:05:17 (+0800), Liu Yuan wrote :
> > This patch adds single read pattern to quorum driver and quorum vote is default
> > pattern.
> >
> > For now we do a quorum vote on all the reads, it is designed for unreliable
> > underlying storage such as non-redundant NFS to make sure data integrity at the
> > cost of the read performance.
> >
> > For some use cases as following:
> >
> >                 VM
> >           --------------
> >           |            |
> >           v            v
> >           A            B
> >
> > Both A and B has hardware raid storage to justify the data integrity on its own.
> > So it would help performance if we do a single read instead of on all the nodes.
> > Further, if we run VM on either of the storage node, we can make a local read
> > request for better performance.
> >
> > This patch generalize the above 2 nodes case in the N nodes. That is,
> >
> > vm -> write to all the N nodes, read just one of them. If single read fails, we
> > try to read next node in FIFO order specified by the startup command.
> >
> > The 2 nodes case is very similar to DRBD[1] though lack of auto-sync
> > functionality in the single device/node failure for now. But compared with DRBD
> > we still have some advantages over it:
> >
> > - Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed
> > storage. And if A crashes, we need to restart all the VMs on node B. But for
> > practice case, we can't because B might not have enough resources to setup 20 VMs
> > at once. So if we run our 20 VMs with quorum driver, and scatter the replicated
> > images over the data center, we can very likely restart 20 VMs without any
> > resource problem.
> >
> > After all, I think we can build a more powerful replicated image functionality
> > on quorum and block jobs(block mirror) to meet various High Availibility needs.
> >
> > E.g, Enable single read pattern on 2 children,
> >
> > -drive driver=quorum,children.0.file.filename=0.qcow2,\
> > children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1
> >
> > [1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device
> >
> > Cc: Benoit Canet <benoit@irqsave.net>
> > Cc: Eric Blake <eblake@redhat.com>
> > Cc: Kevin Wolf <kwolf@redhat.com>
> > Cc: Stefan Hajnoczi <stefanha@redhat.com>
> > Signed-off-by: Liu Yuan <namei.unix@gmail.com>
> > ---
> > block/quorum.c | 176 ++++++++++++++++++++++++++++++++++++++++++---------------
> > 1 file changed, 129 insertions(+), 47 deletions(-)
> >
> > diff --git a/block/quorum.c b/block/quorum.c
> > index d5ee9c0..1235d7c 100644
> > --- a/block/quorum.c
> > +++ b/block/quorum.c
> > @@ -24,6 +24,7 @@
> > #define QUORUM_OPT_VOTE_THRESHOLD "vote-threshold"
> > #define QUORUM_OPT_BLKVERIFY "blkverify"
> > #define QUORUM_OPT_REWRITE "rewrite-corrupted"
> > +#define QUORUM_OPT_READ_PATTERN "read-pattern"
> >
> > /* This union holds a vote hash value */
> > typedef union QuorumVoteValue {
> > @@ -74,6 +75,8 @@ typedef struct BDRVQuorumState {
> > bool rewrite_corrupted;/* true if the driver must rewrite-on-read corrupted
> > * block if Quorum is reached.
> > */
> > +
> > + QuorumReadPattern read_pattern;
> > } BDRVQuorumState;
> >
> > typedef struct QuorumAIOCB QuorumAIOCB;
> > @@ -117,6 +120,7 @@ struct QuorumAIOCB {
> >
> > bool is_read;
> > int vote_ret;
> > + int child_iter; /* which child to read in fifo pattern */
>
> I don't understand what "fifo pattern" could mean for a bunch of disk
> as they are not forming a queue.
The naming isn't 100% accurate, but as in Eric's comment (see below), FIFO and
round-robin can serve as names for two different patterns.
> Maybe round-robin is more suitable but your code does not implement
> round-robin since it will alway start from the first disk.
>
> Your code is scanning the disks set it's a scan pattern.
>
> That said is it a problem that the first disk will be accessed more often than the other ?
As my commit log documents, the purpose of the read pattern I added is to
speed up reads compared with quorum's original read pattern. The use case is
clear (I hope) and you can take DRBD as a good example of why we need it. Of
course we are far away from DRBD, which needs recovery logic after all kinds of
failures. My patch set can be taken as a preliminary step towards implementing
a DRBD-like service driver.
Eric previously commented on two read patterns that might be useful:
"Should we offer multiple modes in addition to 'quorum'? For example, I
could see a difference between 'fifo' (favor read from the first quorum
member always, unless it fails, good when the first member is local and
other member is remote) and 'round-robin' (evenly distribute reads; each
read goes to the next available quorum member, good when all members are
equally distant)."
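To make the difference between the two patterns concrete, here is a minimal
sketch. It is illustrative only and is not the code in this patch: Child,
read_fifo() and read_round_robin() are hypothetical names, and a "read" is
reduced to checking a health flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a quorum child; not QEMU's real type. */
typedef struct Child {
    bool healthy;   /* false simulates a failed read on this child */
} Child;

/*
 * FIFO: every read starts at child 0 and moves to the next child only
 * when the current one fails. Returns the index of the child that
 * served the read, or -1 if all children failed.
 */
int read_fifo(const Child *children, int n)
{
    assert(children != NULL && n > 0);
    for (int i = 0; i < n; i++) {
        if (children[i].healthy) {
            return i;
        }
    }
    return -1;
}

/*
 * Round-robin: each read starts at *next, skips failed children, and
 * advances *next past the child that served the read, so successive
 * reads are spread over the children. Returns the serving index or -1.
 */
int read_round_robin(const Child *children, int n, int *next)
{
    assert(children != NULL && n > 0 && next != NULL);
    for (int tried = 0; tried < n; tried++) {
        int i = (*next + tried) % n;
        if (children[i].healthy) {
            *next = (i + 1) % n;
            return i;
        }
    }
    return -1;
}
```

With all children healthy, read_fifo() keeps returning child 0, while
read_round_robin() cycles through 0, 1, 2, 0, and so on. Only the former
matches what the patch does today.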
> You will have to care to insert disks in different order on each QEMU to spread the load.
This is another use case that my patch set didn't try to solve.
> Shouldn't the code try to spread the load by circling on the disk like a real round robin pattern ?
Probably not in this patch set, but we can add yet another round-robin pattern
if anyone is interested.
Thanks
Yuan