From mboxrd@z Thu Jan  1 00:00:00 1970
From: Konrad Rzeszutek Wilk <konrad@kernel.org>
Subject: Re: Block ring protocol (segment expansion, multi-page,
 etc).
Date: Thu, 6 Sep 2012 06:47:41 -0400
Message-ID: <20120906104740.GA3744@phenom.dumpdata.com>
References: <20120905132920.GA5792@localhost.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
Content-Disposition: inline
In-Reply-To: <20120905132920.GA5792@localhost.localdomain>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>, Oliver.Chick@citrix.com
Cc: justing@spectralogic.com, ronghui.duan@intel.com, JBeulich@suse.com, donald.d.dugger@intel.com, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Wed, Sep 05, 2012 at 09:29:21AM -0400, Konrad Rzeszutek Wilk wrote:
> Please correct me if I got something wrong.

CC-ing a Citrix person who has also expressed interest in implementing
persistent grants in the block backend.
> 
> About two or three years ago Citrix (and Red Hat, I think?) posted a
> multi-page ring extension protocol (max-ring-page-order, max-ring-pages,
> ring-page-order and ring-pages), which never got upstream (I think it
> just needed to be rebased on the driver that went into the kernel?).
> 
> Then about a year ago SpectraLogic started enhancing the FreeBSD variant
> of blkback and realized, as Ronghui also did, that just doing a
> multi-page extension is not enough. The issue was that if one just
> expanded to a ring composed of two pages, 1/4 of the page was wasted
> because the number of segments per request is constrained to 11.
> 
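To put a number on the wastage described above, here is a rough sketch of
the ring arithmetic. It assumes a 64-byte shared-ring header, 112-byte
request slots (the 64-bit layout with 11 segments per request), and a slot
count rounded down to a power of two. Those constants and the function
names are my assumptions for illustration, not something taken from this
thread.

```c
#include <assert.h>

/* Assumed sizes, for illustration only (not read from the Xen headers). */
#define SRING_HEADER_BYTES 64u   /* producer/consumer indexes plus padding */
#define RING_ENTRY_BYTES   112u  /* one request slot with 11 segments */

/* Round down to the nearest power of two (x must be non-zero). */
unsigned int round_down_pow2(unsigned int x)
{
    unsigned int p = 1;

    assert(x != 0);
    while (p <= x / 2)
        p *= 2;
    return p;
}

/* Number of usable slots in a shared ring that is ring_bytes long. */
unsigned int ring_slots(unsigned int ring_bytes)
{
    assert(ring_bytes >= SRING_HEADER_BYTES + RING_ENTRY_BYTES);
    return round_down_pow2((ring_bytes - SRING_HEADER_BYTES) /
                           RING_ENTRY_BYTES);
}

/* Bytes of the ring that hold neither the header nor a slot. */
unsigned int ring_wasted_bytes(unsigned int ring_bytes)
{
    return ring_bytes - SRING_HEADER_BYTES -
           ring_slots(ring_bytes) * RING_ENTRY_BYTES;
}
```

Under these assumptions a 4096-byte ring holds 32 slots and an 8192-byte
ring holds 64, which leaves 960 bytes unused in the two-page case, a
little under a quarter of a page.
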
> Justin (SpectraLogic) came up with a protocol enhancement where the
> existing blkif protocol stays the same, but
> BLKIF_MAX_SEGMENTS_PER_REQUEST is negotiated via max-request-segments.
> Then there is max-request-size, which combines the segment size and the
> size of the ring to give you an idea of the biggest I/O you can fit on
> the ring in a single transaction. This solves the wastage problem and
> expands the ring.
> 
> Ronghui did something similar, but instead of re-using the existing
> blkif structure he split it in two. One ring is for
> blkif_request_header (which has the segments ripped out), and the other
> is just for blkif_request_segments. This solves the wastage and also
> allows the ring to be expanded.
> 
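For readers who have not seen the split-ring idea, a minimal sketch of
what the two entry types could look like follows. This is NOT Ronghui's
actual layout: the struct names, the first_seg field and the helper are
all hypothetical, and only the general shape (a fixed-size header plus
densely packed 8-byte segments) is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment entry: one granted page plus a sector range. */
struct sketch_segment {
    uint32_t gref;        /* grant reference of the data page */
    uint8_t  first_sect;  /* first sector used within the page */
    uint8_t  last_sect;   /* last sector used within the page */
};

/* Hypothetical request header with the segment array removed. */
struct sketch_request_header {
    uint8_t  operation;      /* read, write, ... */
    uint16_t nr_segments;    /* no longer capped by the header size */
    uint16_t handle;         /* virtual device handle */
    uint32_t first_seg;      /* hypothetical: index into the segment ring */
    uint64_t id;             /* echoed back in the response */
    uint64_t sector_number;  /* start sector on the virtual device */
};

/* Segment entries that fit in one page of a dedicated segment ring,
 * ignoring any ring header for simplicity. */
size_t sketch_segments_per_page(size_t page_bytes)
{
    return page_bytes / sizeof(struct sketch_segment);
}
```

Because the segment entries are small and uniform, a page of the segment
ring can be packed almost completely (512 entries per 4096-byte page in
this sketch), which is where the wastage saving would come from.
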
> The three major outstanding issues that exist with the current protocol
> that I know of are:
>  - We split up the I/O requests. This ends up eating a lot of CPU
>    cycles.
>  - We might have huge I/O requests. Justin mentioned 1MB single I/Os -
>    and to fit that on a ring it has to .. well, be able to fit 256
>    segments. Jan mentioned 256kB for SCSI - since the protocol
>    extensions here could very well be carried over.
>  - Concurrent usage. If we have more than 4 VBDs, blkback suffers when
>    it tries to get a page, as there is a "global" pool shared across all
>    guests instead of something 'per guest' or 'per VBD'.
> 
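The segment counts in the list above follow from simple arithmetic, which
the sketch below spells out. It assumes each segment covers at most one
4096-byte page and that the data is page-aligned; the constant and the
function names are made up for this example.

```c
#include <assert.h>

/* Assumed: one segment describes at most one 4 KiB page of data. */
#define SEGMENT_BYTES 4096u

/* Segments needed to describe io_bytes of page-aligned data. */
unsigned int segments_for_io(unsigned int io_bytes)
{
    return (io_bytes + SEGMENT_BYTES - 1) / SEGMENT_BYTES;
}

/* Ring requests needed when a request carries at most max_segs segments. */
unsigned int requests_for_io(unsigned int io_bytes, unsigned int max_segs)
{
    assert(max_segs > 0);
    return (segments_for_io(io_bytes) + max_segs - 1) / max_segs;
}
```

With these assumptions a 1MB I/O needs 256 segments and a 256kB I/O needs
64. At 11 segments per request, the 1MB I/O has to be split into 24 ring
requests, which is the splitting cost the first item refers to.
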
> So.. Ronghui - I am curious as to why you chose the path of making two
> separate rings? Was the mechanism that Justin came up with not really
> that good, or was this just easier to implement?
> 
> Thanks.