public inbox for linux-kernel@vger.kernel.org
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: "james.harper@bendigoit.com.au" <james.harper@bendigoit.com.au>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH RFC 12/12] xen-block: implement indirect descriptors
Date: Thu, 21 Mar 2013 21:10:45 -0400
Message-ID: <20130322011045.GD28902@phenom.dumpdata.com>
In-Reply-To: <513A1ABC.1040906@citrix.com>

On Fri, Mar 08, 2013 at 06:07:08PM +0100, Roger Pau Monné wrote:
> On 05/03/13 22:46, Konrad Rzeszutek Wilk wrote:
> > On Tue, Mar 05, 2013 at 06:07:57PM +0100, Roger Pau Monné wrote:
> >> On 04/03/13 21:41, Konrad Rzeszutek Wilk wrote:
> >>> On Thu, Feb 28, 2013 at 11:28:55AM +0100, Roger Pau Monne wrote:
> >>>> Indirect descriptors introduce a new block operation
> >>>> (BLKIF_OP_INDIRECT) that passes grant references instead of segments
> >>>> in the request. These grant references are filled with arrays of
> >>>> blkif_request_segment_aligned, this way we can send more segments in a
> >>>> request.
> >>>>
> >>>> The proposed implementation sets the maximum number of indirect grefs
> >>>> (frames filled with blkif_request_segment_aligned) to 256 in the
> >>>> backend and 64 in the frontend. The value in the frontend has been
> >>>> chosen experimentally, and the backend value has been set to a sane
> >>>> value that allows expanding the maximum number of indirect descriptors
> >>>> in the frontend if needed.
> >>>
> >>> So we are still using a similar format of the form:
> >>>
> >>> <gref, first_sec, last_sect, pad>, etc.
> >>>
> >>> Why not utilize a layout that fits with the bio sg? That way
> >>> we might not even have to do the bio_alloc call and instead can
> >>> set up a bio (and bio-list) with the appropriate offsets/list?
> 
> I think we can already do this without changing the structure of the
> segments, we could just allocate a bio big enough to hold all the
> segments and queue them up (provided that the underlying storage device
> supports bios of this size).
> 
> bio = bio_alloc(GFP_KERNEL, nseg);
> if (unlikely(bio == NULL))
> 	goto fail_put_bio;
> biolist[nbio++] = bio;
> bio->bi_bdev    = preq.bdev;
> bio->bi_private = pending_req;
> bio->bi_end_io  = end_block_io_op;
> bio->bi_sector  = preq.sector_number;
> 
> for (i = 0; i < nseg; i++) {
> 	rc = bio_add_page(bio, pages[i], seg[i].nsec << 9,
> 		seg[i].buf & ~PAGE_MASK);
> 	if (rc == 0)
> 		goto fail_put_bio;
> }
> 
> This seems to work with Linux blkfront/blkback, and I guess biolist in
> blkback only has one bio all the time.

> 
> >>> Meaning that the format of the indirect descriptors is:
> >>>
> >>> <gref, offset, next_index, pad>
> 
> Don't we need a length parameter? Also, next_index will be current+1,
> because we already send the segments sorted (using for_each_sg) in blkfront.
> 
> >>>
> >>> We already know what the first_sec and last_sect are - they
> >>> are basically: sector_number +  nr_segments * (whatever the sector size is) + offset
> >>
> >> This will of course be suitable for Linux, but what about other OSes, I
> >> know they support the traditional first_sec, last_sect (because it's
> >> already implemented), but I don't know how much work will it be for them
> >> to adopt this. If we have to do such a change I will have to check first
> >> that other frontend/backend can handle this easily also, I wouldn't like
> >> to simplify this for Linux by making it more difficult to implement in
> >> other OSes...
> > 
> > I would think that most OSes use the same framework. The ones that
> > are of notable interest are Windows and BSD. Let's CC James here
> 
> Maybe I'm missing something here, but I don't see a really big benefit
> of using this new structure for segments instead of the current one.

DIF/DIX requires that the bio layout going into blkfront be the same as
the one that emerges on the other side in the SAS/SCSI/SATA drivers.

For example, take a bio-vec with five pages linked. Each of the first
four holds 512 bytes of data (say in the middle of the page, so bytes
2048 -> 2560 are occupied and the rest is not), for a total of 2048
bytes. The last page contains 32 bytes (four CRC checksums, each
8 bytes).
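
To make the example concrete, here is a rough user-space sketch of that
five-page layout. It is not blkif or block-layer code: struct seg_sketch,
dif_example and sketch_bytes() are made-up names for illustration, byte
offsets are used instead of sector numbers, and putting the checksums at
offset 0 of the last page is my assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one bio_vec entry. */
struct seg_sketch {
	unsigned int page;   /* index of the page, 0..4 in this example */
	unsigned int offset; /* byte offset of the payload inside the page */
	unsigned int len;    /* payload length in bytes */
};

/*
 * Four data pages, each with 512 bytes at bytes 2048..2559, followed by
 * one page carrying 4 * 8 bytes of checksums (offset 0 is an assumption).
 */
static const struct seg_sketch dif_example[5] = {
	{ 0, 2048, 512 },
	{ 1, 2048, 512 },
	{ 2, 2048, 512 },
	{ 3, 2048, 512 },
	{ 4, 0, 32 },
};

/* Sum the payload bytes of `count` entries starting at index `first`. */
static unsigned int sketch_bytes(const struct seg_sketch *v, size_t first,
				 size_t count)
{
	unsigned int total = 0;
	size_t i;

	assert(v != NULL);
	for (i = first; i < first + count; i++)
		total += v[i].len;
	return total;
}
```

sketch_bytes(dif_example, 0, 4) gives the 2048 data bytes and
sketch_bytes(dif_example, 4, 1) gives the 32 checksum bytes; the point is
that the per-page offset and length are part of what has to survive the
trip through the ring.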

If we coalesce any of the five pages into one, then the backend has to
reconstruct those five pages when it takes the request out of the ring.

My thought was that with fsect and lsect as they exist now, we will be
tempted to just coalesce the four sectors into one page and make
lsect = fsect + 4.

That, however, is _not_ what we are doing now, I think. We look to recreate
the layout exactly as the READ/WRITE requests are sent to xen-blkfront.
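
A rough user-space sketch of the difference, assuming 4096-byte pages.
None of this is blkfront/blkback code: struct ring_seg, sketch_coalesce()
and sketch_same_layout() are made-up names, and the packing policy is only
one way a frontend could be tempted to coalesce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* Hypothetical per-segment description: page index, byte offset, length. */
struct ring_seg {
	unsigned int page;
	unsigned int offset;
	unsigned int len;
};

/* Four 512-byte payloads, each in the middle of its own page. */
static const struct ring_seg data_pages[4] = {
	{ 0, 2048, 512 }, { 1, 2048, 512 }, { 2, 2048, 512 }, { 3, 2048, 512 },
};

/* True only if both lists have the same entries in the same order. */
static bool sketch_same_layout(const struct ring_seg *a, size_t na,
			       const struct ring_seg *b, size_t nb)
{
	size_t i;

	if (na != nb)
		return false;
	for (i = 0; i < na; i++)
		if (a[i].page != b[i].page || a[i].offset != b[i].offset ||
		    a[i].len != b[i].len)
			return false;
	return true;
}

/*
 * Pack the payloads back to back, starting a new output page only when
 * the current one is full. `out` must have room for `n` entries.
 * Returns the number of output segments.
 */
static size_t sketch_coalesce(const struct ring_seg *in, size_t n,
			      struct ring_seg *out)
{
	size_t i, used = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		assert(in[i].len <= SKETCH_PAGE_SIZE);
		if (used == 0 ||
		    out[used - 1].len + in[i].len > SKETCH_PAGE_SIZE) {
			out[used].page = (unsigned int)used;
			out[used].offset = 0;
			out[used].len = 0;
			used++;
		}
		out[used - 1].len += in[i].len;
	}
	return used;
}
```

Running sketch_coalesce() over data_pages collapses the four entries into
a single 2048-byte segment, and sketch_same_layout() then reports a
mismatch against the original list. That lost per-page information is
what the backend would have to reconstruct.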
