From: Konrad Rzeszutek Wilk
Subject: Re: Block ring protocol (segment expansion, multi-page, etc).
Date: Thu, 6 Sep 2012 06:47:41 -0400
Message-ID: <20120906104740.GA3744@phenom.dumpdata.com>
In-Reply-To: <20120905132920.GA5792@localhost.localdomain>
To: Konrad Rzeszutek Wilk, Oliver.Chick@citrix.com
Cc: justing@spectralogic.com, ronghui.duan@intel.com, JBeulich@suse.com, donald.d.dugger@intel.com, xen-devel@lists.xen.org

On Wed, Sep 05, 2012 at 09:29:21AM -0400, Konrad Rzeszutek Wilk wrote:
> Please correct me if I got something wrong.

CC-ing a Citrix person here who has expressed interest in also implementing
persistent grants in the block backend.

> About two or three years ago Citrix (and Red Hat, I think?) posted a
> multi-page extension protocol (max-ring-page-order, max-ring-pages,
> ring-page-order and ring-pages), which never got upstream (I think it
> just needed to be rebased on the driver that went into the kernel?).
>
> Then about a year ago SpectraLogic started enhancing the FreeBSD variant
> of blkback and realized what Ronghui also did: that just doing a
> multi-page extension is not enough. The issue was that if one just
> expanded to a ring composed of two pages, 1/4 of a page was wasted
> because the number of segments per request is constrained to 11.
>
> Justin (SpectraLogic) came up with a protocol enhancement where the
> existing blkif protocol stays the same, but BLKIF_MAX_SEGMENTS_PER_REQUEST
> is negotiated via max-request-segments.
> And then there is max-request-size, which rolls the segment size and the
> size of the ring together to give you an idea of the biggest I/O you can
> fit on a ring in a single transaction. This solves the wastage problem
> and expands the ring.
>
> Ronghui did something similar, but instead of re-using the existing
> blkif structure he split it in two. One ring is for
> blkif_request_header (which has the segments ripped out), and the other
> is just for blkif_request_segments. This solves the wastage and also
> allows the ring to be expanded.
>
> The three major outstanding issues that exist with the current protocol
> that I know of are:
>  - We split up the I/O requests. This ends up eating a lot of CPU
>    cycles.
>  - We might have huge I/O requests. Justin mentioned 1MB single I/Os,
>    and to fit that on a ring a request has to be able to hold, well,
>    256 segments. Jan mentioned 256kB for SCSI, since the protocol
>    extensions here could very well be carried over.
>  - Concurrent usage. If we have more than 4 VBDs, blkback suffers when
>    it tries to get a page, as there is a "global" pool shared across all
>    guests instead of something 'per guest' or 'per VBD'.
>
> So.. Ronghui - I am curious as to why you chose the path of making two
> separate rings? Was the mechanism that Justin came up with not really
> that good, or was this just easier to implement?
>
> Thanks.
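
For reference, here is a rough sketch of the arithmetic behind the wastage
and segment-count figures in the quoted text. The 64-byte shared-ring header
and the 112-byte request size are assumptions (typical of a 64-bit build),
not numbers taken from this thread, and the helper names are made up for
illustration. It also assumes 4096-byte pages, that the slot count is
rounded down to a power of two, and that each segment maps one full page.

```python
PAGE_SIZE = 4096      # bytes per ring page and per data segment (assumed x86)
SRING_HEADER = 64     # assumed: producer/event indexes plus padding
REQUEST_SIZE = 112    # assumed: 64-bit blkif_request carrying 11 segments
MAX_SEGMENTS = 11     # segment limit per request in the current protocol


def round_down_pow2(n):
    """Largest power of two that is <= n (n must be >= 1)."""
    return 1 << (n.bit_length() - 1)


def ring_slots(pages):
    """Request slots that fit in a ring of `pages` pages."""
    usable = pages * PAGE_SIZE - SRING_HEADER
    return round_down_pow2(usable // REQUEST_SIZE)


def wasted_bytes(pages):
    """Ring bytes left unused after rounding down to a power of two."""
    return pages * PAGE_SIZE - SRING_HEADER - ring_slots(pages) * REQUEST_SIZE


def segments_needed(io_bytes):
    """Segments needed for one I/O when every segment is a full page."""
    return -(-io_bytes // PAGE_SIZE)  # ceiling division


if __name__ == "__main__":
    for pages in (1, 2):
        print(pages, "page(s):", ring_slots(pages), "slots,",
              wasted_bytes(pages), "bytes unused")
    print("max I/O per request today:", MAX_SEGMENTS * PAGE_SIZE // 1024, "KiB")
    print("segments for 1MB:", segments_needed(1024 * 1024))
    print("segments for 256kB:", segments_needed(256 * 1024))
```

Under those assumptions a two-page ring ends up with 64 slots and 960 bytes
unused, which is close to the quarter page mentioned above, and a 1MB I/O
needs 256 segments against today's limit of 11 (44 KiB) per request.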