From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
Subject: Re: [PATCH v1 7/7] xen-block: implement indirect descriptors
Date: Wed, 17 Apr 2013 10:25:54 -0400 [thread overview]
Message-ID: <20130417142554.GG21378@phenom.dumpdata.com> (raw)
In-Reply-To: <516C3264.3050409@citrix.com>
> >> +struct blkif_x86_32_request_indirect {
> >> + uint8_t indirect_op;
> >> + uint16_t nr_segments;
> >
> > This needs to be documented. Is there are limit to what it can be? What if
> > the frontend sets it to 1231231?
>
> This is checked in dispatch_rw_block_io:
>
> if (unlikely(nseg == 0 && operation != WRITE_FLUSH) ||
> unlikely((req->operation != BLKIF_OP_INDIRECT) &&
> (nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) ||
> unlikely((req->operation == BLKIF_OP_INDIRECT) &&
> (nseg > MAX_INDIRECT_SEGMENTS))) {
> pr_debug(DRV_PFX "Bad number of segments in request (%d)\n",
> nseg);
I am wondering whether we should make that pr_err ..?
> /* Haven't submitted any bio's yet. */
> goto fail_response;
> }
>
> And the value of MAX_INDIRECT_SEGMENTS is exported to the frontend in
> xenstore, so the frontend does:
>
> max_indirect_segments = min(MAX_INDIRECT_SEGMENTS, front_max_segments);
>
> To know how many segments it can safely use in an indirect op.
<nods>
>
> >
> >> + uint64_t id;
> >> + blkif_sector_t sector_number;
> >> + blkif_vdev_t handle;
> >> + uint16_t _pad1;
> >> + grant_ref_t indirect_grefs[BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST];
> >
> > And how do we figure out whether we are going to use one indirect_grefs or many?
> > That should be described here. (if I understand it right it would be done
> > via the maximum-indirect-something' variable)
>
> I've added the following comment:
>
> /*
> * The maximum number of indirect segments (and pages) that will
> * be used is determined by MAX_INDIRECT_SEGMENTS, this value
> * is also exported to the guest (via xenstore max-indirect-segments
.. feature-max-indirect-segments..
> * entry), so the frontend knows how many indirect segments the
> * backend supports.
> */
>
Great!
> >
> >
> >> +} __attribute__((__packed__));
> >> +
> >
> > Per my calculation the size of this structure is 55 bytes. The 64-bit is 59 bytes.
> >
> > It is a bit odd, but then 1 byte is for the 'operation' in the 'blkif_x*_request',
> > so it comes out to 56 and 60 bytes.
> >
> > Is that still the right amount ? I thought we wanted to flesh it out to be a nice
> > 64 byte aligned so that the future patches which will make the requests be
> > more cache-aligned and won't have to play games?
>
> Yes, I've added a uint32_t for 64bits and a uint64_t for 32bits, that
> makes it 64bytes aligned in both cases.
Thanks
>
> >
> >> struct blkif_x86_32_request {
> >> uint8_t operation; /* BLKIF_OP_??? */
> >> union {
> >> struct blkif_x86_32_request_rw rw;
> >> struct blkif_x86_32_request_discard discard;
> >> struct blkif_x86_32_request_other other;
> >> + struct blkif_x86_32_request_indirect indirect;
> >> } u;
> >> } __attribute__((__packed__));
> >>
> >> @@ -127,12 +149,24 @@ struct blkif_x86_64_request_other {
> >> uint64_t id; /* private guest value, echoed in resp */
> >> } __attribute__((__packed__));
> >>
> >> +struct blkif_x86_64_request_indirect {
> >> + uint8_t indirect_op;
> >> + uint16_t nr_segments;
> >> + uint32_t _pad1; /* offsetof(blkif_..,u.indirect.id)==8 */
> >
> > The comment is a bit off compared to the rest.
> >
> >> + uint64_t id;
> >> + blkif_sector_t sector_number;
> >> + blkif_vdev_t handle;
> >> + uint16_t _pad2;
> >> + grant_ref_t indirect_grefs[BLKIF_MAX_INDIRECT_GREFS_PER_REQUEST];
> >> +} __attribute__((__packed__));
> >> +
> >> struct blkif_x86_64_request {
> >> uint8_t operation; /* BLKIF_OP_??? */
> >> union {
> >> struct blkif_x86_64_request_rw rw;
> >> struct blkif_x86_64_request_discard discard;
> >> struct blkif_x86_64_request_other other;
> >> + struct blkif_x86_64_request_indirect indirect;
> >> } u;
> >> } __attribute__((__packed__));
> >>
> >> @@ -257,6 +291,11 @@ struct xen_blkif {
> >> wait_queue_head_t waiting_to_free;
> >> };
> >>
> >> +struct seg_buf {
> >> + unsigned long offset;
> >> + unsigned int nsec;
> >> +};
> >> +
> >> /*
> >> * Each outstanding request that we've passed to the lower device layers has a
> >> * 'pending_req' allocated to it. Each buffer_head that completes decrements
> >> @@ -271,9 +310,16 @@ struct pending_req {
> >> unsigned short operation;
> >> int status;
> >> struct list_head free_list;
> >> - struct page *pages[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >> - struct persistent_gnt *persistent_gnts[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >> - grant_handle_t grant_handles[BLKIF_MAX_SEGMENTS_PER_REQUEST];
> >> + struct page *pages[MAX_INDIRECT_SEGMENTS];
> >> + struct persistent_gnt *persistent_gnts[MAX_INDIRECT_SEGMENTS];
> >> + grant_handle_t grant_handles[MAX_INDIRECT_SEGMENTS];
> >> + grant_ref_t grefs[MAX_INDIRECT_SEGMENTS];
> >> + /* Indirect descriptors */
> >> + struct persistent_gnt *indirect_persistent_gnts[MAX_INDIRECT_GREFS];
> >> + struct page *indirect_pages[MAX_INDIRECT_GREFS];
> >> + grant_handle_t indirect_handles[MAX_INDIRECT_GREFS];
> >> + struct seg_buf seg[MAX_INDIRECT_SEGMENTS];
> >> + struct bio *biolist[MAX_INDIRECT_SEGMENTS];
> >
> > Oh wow. That is a big structure. So 1536 for the BLKIF_MAX_SEGMENTS_PER_REQUEST
> > ones and 24 bytes for the ones that deail with MAX_INDIRECT_GREFS.
> >
> > This could be made look nicer. For example you could do:
> >
> > struct indirect {
> > struct page;
> > grant_handle_t handle;
> > struct persistent_gnt *gnt;
> > } desc[MAX_INDIRECT_GREFS];
> >
> > .. and the same treatment to the BLKIF_MAX_SEGMENTS_PER_REQUEST
> > one.
> >
> > Thought perhaps it makes more sense to do that with a later patch as a cleanup.
>
> This was similar to what Jan was suggesting, but I would prefer to leave
> that for a latter patch.
<nods>
.. snip..
> >> +/*
> >> + * Maximum number of segments in indirect requests, the actual value used by
> >> + * the frontend driver is the minimum of this value and the value provided
> >> + * by the backend driver.
> >> + */
> >> +
> >> +static int xen_blkif_max_segments = 32;
> >> +module_param_named(max_segments, xen_blkif_max_segments, int, 0);
> >> +MODULE_PARM_DESC(max_segments,
> >> +"Maximum number of segments in indirect requests");
> >
> > This means a new entry in Documentation/ABI/sysfs/...
>
> I think I should just not allow this to be exported, since we cannot
> change it at run-time, so if a user changes the 0 to something else, and
> changes the value... nothing will happen (because this is only used
> before the device is connected).
OK, lets then not make it a module_param_named.
>
> >
> > Perhaps the xen-blkfront part of the patch should be just split out to make
> > this easier?
> >
> > Perhaps what we really should have is just the 'max' value of megabytes
> > we want to handle on the ring.
> >
> > As right now 32 ring requests * 32 segments = 4MB. But if the user wants
> > to se the max: 32 * 4096 = so 512MB (right? each request would handle now 16MB
> > and since we have 32 of them = 512MB).
>
> I've just set that to something that brings a performance benefit
> without having to map an insane number of persistent grants in blkback.
>
> Yes, the values are correct, but the device request queue (rq) is only
> able to provide read requests with 64 segments or write requests with
> 128 segments. I haven't been able to get larger requests, even when
> setting this to 512 or higer.
What are you using to drive the requests? 'fio'?
.. snip..
> >> @@ -588,13 +666,14 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size)
> >> static void xlvbd_flush(struct blkfront_info *info)
> >> {
> >> blk_queue_flush(info->rq, info->feature_flush);
> >> - printk(KERN_INFO "blkfront: %s: %s: %s %s\n",
> >> + printk(KERN_INFO "blkfront: %s: %s: %s %s %s\n",
> >> info->gd->disk_name,
> >> info->flush_op == BLKIF_OP_WRITE_BARRIER ?
> >> "barrier" : (info->flush_op == BLKIF_OP_FLUSH_DISKCACHE ?
> >> "flush diskcache" : "barrier or flush"),
> >> info->feature_flush ? "enabled" : "disabled",
> >> - info->feature_persistent ? "using persistent grants" : "");
> >> + info->feature_persistent ? "using persistent grants" : "",
> >> + info->max_indirect_segments ? "using indirect descriptors" : "");
> >
> > This is a bit of mess now:
> >
> >
> > [ 0.968241] blkfront: xvda: barrier or flush: disabled using persistent grants using indirect descriptors
> >
> > Should it just be:
> >
> > 'persistent_grants: enabled; indirect descriptors: enabled'
> >
> > ?
>
> Agreed. If you don't mind I will change the wording for both persistent
> grants and indirect descriptors in this same patch.
That is OK.
.. snip..
> >> * Invoked when the backend is finally 'ready' (and has told produced
> >> * the details about the physical device - #sectors, size, etc).
> >> @@ -1414,8 +1735,9 @@ static void blkfront_connect(struct blkfront_info *info)
> >> set_capacity(info->gd, sectors);
> >> revalidate_disk(info->gd);
> >>
> >> - /* fall through */
> >> + return;
> >
> > How come?
>
> We cannot fall thought anymore, because now we call blkif_recover in
> order to recover after performing the suspension, in the past this was
> just a return so we could fall thought.
>
> We need to perform the recover at this point because we need to know if
> the backend supports indirect descriptors, and how many. In the past we
> used to perform the recover before the backend announced it's features,
> but now we need to know if the backend supports indirect segments or not.
OK. Can you put that nice explanation as a comment in there please?
next prev parent reply other threads:[~2013-04-17 14:26 UTC|newest]
Thread overview: 65+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-27 11:10 [PATCH v1 0/7] xen-block: indirect descriptors Roger Pau Monne
2013-03-27 11:10 ` [PATCH v1 1/7] xen-blkback: print stats about persistent grants Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-04-09 14:47 ` Konrad Rzeszutek Wilk
2013-04-09 14:47 ` Konrad Rzeszutek Wilk
2013-03-27 11:10 ` [PATCH v1 2/7] xen-blkback: use balloon pages for all mappings Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-04-09 14:47 ` Konrad Rzeszutek Wilk
2013-04-09 14:47 ` Konrad Rzeszutek Wilk
2013-04-15 8:05 ` Roger Pau Monné
2013-04-15 8:05 ` Roger Pau Monné
2013-04-15 8:12 ` Roger Pau Monné
2013-04-15 8:12 ` [Xen-devel] " Roger Pau Monné
2013-04-17 14:00 ` Konrad Rzeszutek Wilk
2013-04-17 14:00 ` Konrad Rzeszutek Wilk
2013-04-15 9:14 ` Roger Pau Monné
2013-04-17 14:05 ` Konrad Rzeszutek Wilk
2013-04-17 14:05 ` Konrad Rzeszutek Wilk
2013-04-15 9:14 ` Roger Pau Monné
2013-04-09 15:46 ` Konrad Rzeszutek Wilk
2013-04-09 15:46 ` Konrad Rzeszutek Wilk
2013-04-15 8:21 ` Roger Pau Monné
2013-04-15 8:21 ` Roger Pau Monné
2013-03-27 11:10 ` [PATCH v1 3/7] xen-blkback: implement LRU mechanism for persistent grants Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-04-09 15:42 ` Konrad Rzeszutek Wilk
2013-04-15 11:19 ` Roger Pau Monné
2013-04-17 14:15 ` Konrad Rzeszutek Wilk
2013-04-17 14:15 ` Konrad Rzeszutek Wilk
2013-04-15 11:19 ` Roger Pau Monné
2013-04-09 15:42 ` Konrad Rzeszutek Wilk
2013-03-27 11:10 ` [PATCH v1 4/7] xen-blkback: move pending handles list from blkbk to pending_req Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-03-27 11:10 ` [PATCH v1 5/7] xen-blkback: make the queue of free requests per backend Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-04-09 16:13 ` Konrad Rzeszutek Wilk
2013-04-15 13:50 ` Roger Pau Monné
2013-04-17 14:16 ` Konrad Rzeszutek Wilk
2013-04-17 14:16 ` Konrad Rzeszutek Wilk
2013-04-15 13:50 ` Roger Pau Monné
2013-04-09 16:13 ` Konrad Rzeszutek Wilk
2013-03-27 11:10 ` [PATCH v1 6/7] xen-blkback: expand map/unmap functions Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-03-27 11:10 ` [PATCH v1 7/7] xen-block: implement indirect descriptors Roger Pau Monne
2013-03-27 11:10 ` Roger Pau Monne
2013-04-09 18:49 ` Konrad Rzeszutek Wilk
2013-04-15 17:01 ` Roger Pau Monné
2013-04-17 14:25 ` Konrad Rzeszutek Wilk
2013-04-17 14:25 ` Konrad Rzeszutek Wilk [this message]
2013-04-17 17:04 ` Roger Pau Monné
2013-04-17 17:04 ` Roger Pau Monné
2013-04-17 17:27 ` Konrad Rzeszutek Wilk
2013-04-17 17:27 ` Konrad Rzeszutek Wilk
2013-04-18 12:43 ` Jens Axboe
2013-04-18 14:16 ` Roger Pau Monné
2013-04-18 14:16 ` Roger Pau Monné
2013-04-18 14:26 ` Jens Axboe
2013-04-18 14:26 ` Jens Axboe
2013-04-18 15:14 ` Roger Pau Monné
2013-04-18 15:14 ` Roger Pau Monné
2013-04-18 15:58 ` Jens Axboe
2013-04-18 15:58 ` Jens Axboe
2013-04-18 12:43 ` Jens Axboe
2013-04-15 17:01 ` Roger Pau Monné
2013-04-09 18:49 ` Konrad Rzeszutek Wilk
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130417142554.GG21378@phenom.dumpdata.com \
--to=konrad.wilk@oracle.com \
--cc=linux-kernel@vger.kernel.org \
--cc=roger.pau@citrix.com \
--cc=xen-devel@lists.xen.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.