Block ring protocol (segment expansion, multi-page, etc).


From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: ronghui.duan@intel.com, justing@spectralogic.com,
	donald.d.dugger@intel.com, JBeulich@suse.com,
	xen-devel@lists.xen.org
Subject: Block ring protocol (segment expansion, multi-page, etc).
Date: Wed, 5 Sep 2012 09:29:21 -0400
Message-ID: <20120905132920.GA5792@localhost.localdomain>

Please correct me if I got something wrong.

About two or three years ago Citrix (and Red Hat, I think?) posted a
multi-page ring extension protocol (max-ring-page-order, max-ring-pages,
ring-page-order and ring-pages) which never got upstream (I think it
just needed to be rebased on the driver that went into the kernel?).

Then about a year ago SpectraLogic started enhancing the FreeBSD variant
of blkback and realized what Ronghui also did: that just doing a
multi-page extension is not enough. The issue was that if one just
expanded to a ring composed of two pages, 1/4 of the page was wasted
because the number of segments per request is constrained to 11.

Justin (SpectraLogic) came up with a protocol enhancement where the
existing blkif protocol stays the same, but BLKIF_MAX_SEGMENTS_PER_REQUEST
is negotiated via max-request-segments. And then there is
max-request-size, which rolls together the segment size and the size of
the ring to give you an idea of the biggest I/O you can fit on a ring in
a single transaction. This solves the wastage problem and expands the
ring.
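
To make the negotiation concrete, here is a minimal sketch. It is not
Justin's actual implementation: the function names are hypothetical, it
assumes 4 KiB pages with one page per segment, and it only bounds the
payload of a single request (it ignores the ring-size part of
max-request-size).

```python
PAGE_SIZE = 4096           # assumption: 4 KiB pages, one page per segment
LEGACY_MAX_SEGMENTS = 11   # BLKIF_MAX_SEGMENTS_PER_REQUEST in the existing protocol


def negotiate_segments(backend_max, frontend_max):
    """Pick the per-request segment count both ends can handle.

    backend_max stands in for the value the backend advertises via
    max-request-segments; None means the key is absent, so we fall
    back to the legacy limit. (Hypothetical helper, for illustration.)
    """
    if backend_max is None:
        return LEGACY_MAX_SEGMENTS
    return max(1, min(backend_max, frontend_max))


def max_request_bytes(segments, page_size=PAGE_SIZE):
    """Upper bound on the data one request can carry with that many segments."""
    return segments * page_size


if __name__ == "__main__":
    segs = negotiate_segments(backend_max=256, frontend_max=256)
    print(segs, max_request_bytes(segs))
```

With both ends supporting 256 segments a single request could carry up
to 1MB; an old backend that advertises nothing stays at 11 segments.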

Ronghui did something similar, but instead of re-using the existing
blkif structure he split it in two. One ring is for
blkif_request_header (which has the segments ripped out), and the other
is just for blkif_request_segments. This solves the wastage and also
allows expanding the ring.
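
A rough model of the two-ring idea, as I understand it from the
description above. This is only a sketch, not Ronghui's code: the
first_seg/nr_segs linkage between the two rings and the SplitRing class
are assumptions made for illustration, and there is no consumer side or
flow control. The segment fields (gref, first_sect, last_sect) follow
the existing blkif segment layout.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    gref: int        # grant reference of the data page
    first_sect: int  # first sector used within the page
    last_sect: int   # last sector used within the page


@dataclass
class RequestHeader:
    operation: int
    req_id: int
    sector_number: int
    first_seg: int   # assumed: index of this request's first slot in the segment ring
    nr_segs: int     # how many consecutive segment slots belong to this request


class SplitRing:
    """Toy producer side: headers and segments live in separate circular arrays."""

    def __init__(self, nr_headers, nr_segments):
        self.headers = [None] * nr_headers
        self.segments = [None] * nr_segments
        self.hdr_prod = 0
        self.seg_prod = 0

    def push(self, operation, req_id, sector_number, segs):
        first = self.seg_prod % len(self.segments)
        for seg in segs:
            self.segments[self.seg_prod % len(self.segments)] = seg
            self.seg_prod += 1
        hdr = RequestHeader(operation, req_id, sector_number, first, len(segs))
        self.headers[self.hdr_prod % len(self.headers)] = hdr
        self.hdr_prod += 1
        return hdr

    def segments_of(self, hdr):
        n = len(self.segments)
        return [self.segments[(hdr.first_seg + i) % n] for i in range(hdr.nr_segs)]
```

The point of the split is that a header slot has a fixed, small size
regardless of how many segments a request uses, so a request with 2
segments does not pay for 11.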

The three major outstanding issues that exist with the current protocol
that I know of are:
 - We split up the I/O requests. This ends up eating a lot of CPU
   cycles.
 - We might have huge I/O requests. Justin mentioned 1MB single I/Os -
   and to fit that in a single request it has to be able to carry 256
   segments. Jan mentioned 256kB for SCSI - since the protocol
   extensions here could very well be carried over.
 - Concurrent usage. If we have more than 4 VBDs blkback suffers when it
   tries to get a page, as there is a "global" pool shared across all
   guests instead of being something 'per guest' or 'per VBD'.
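
The segment numbers above can be checked with a few lines of arithmetic.
This sketch assumes 4 KiB pages, one page per segment and page-aligned
buffers (an unaligned buffer can need one more segment); the helper
names are made up for illustration.

```python
PAGE_SIZE = 4096          # assumption: 4 KiB pages, one page per segment
LEGACY_MAX_SEGMENTS = 11  # current BLKIF_MAX_SEGMENTS_PER_REQUEST


def segments_needed(io_bytes, page_size=PAGE_SIZE):
    """Segments required to describe a page-aligned I/O of io_bytes."""
    return -(-io_bytes // page_size)  # ceiling division


def requests_needed(io_bytes, max_segments=LEGACY_MAX_SEGMENTS):
    """Ring requests the I/O is split into when each carries max_segments."""
    return -(-segments_needed(io_bytes) // max_segments)


if __name__ == "__main__":
    for size in (1024 * 1024, 256 * 1024):
        print(size, segments_needed(size), requests_needed(size))
```

So a 1MB I/O is 256 segments, which today means 24 separate requests,
and a 256kB I/O is 64 segments split across 6 requests.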

So.. Ronghui - I am curious as to why you chose the path of making two
separate rings? Was the mechanism that Justin came up with not really
that good, or was this just easier to implement?

Thanks.
