From: Ian Campbell <ian.campbell@citrix.com>
To: Paul Durrant <paul.durrant@citrix.com>, xen-devel@lists.xenproject.org
Cc: Keir Fraser <keir@xen.org>,
Ian Jackson <ian.jackson@eu.citrix.com>,
Jan Beulich <jbeulich@suse.com>, Tim Deegan <tim@xen.org>
Subject: Re: [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing
Date: Fri, 8 Jan 2016 15:53:46 +0000 [thread overview]
Message-ID: <1452268426.26438.26.camel@citrix.com> (raw)
In-Reply-To: <1452171912-29857-3-git-send-email-paul.durrant@citrix.com>
On Thu, 2016-01-07 at 13:05 +0000, Paul Durrant wrote:
>
> diff --git a/xen/include/public/io/netif.h
> b/xen/include/public/io/netif.h
> index 1790ea0..06e0b61 100644
> --- a/xen/include/public/io/netif.h
> +++ b/xen/include/public/io/netif.h
> @@ -151,6 +151,270 @@
> */
>
> /*
> + * Control ring
> + * ============
> + *
> + * Some features, such as toeplitz hashing (detailed below), require a
> + * significant amount of out-of-band data to be passed from frontend to
> + * backend. Use of xenstore is not suitable for large quantities of data
> + * because of quota limitations and so a dedicated 'control ring' is used.
> + * The ability of the backend to use a control ring is advertised by
> + * setting:
> + *
> + * /local/domain/X/backend///feature-control-ring = "1"
> + *
> + * The frontend provides a control ring to the backend by setting:
> + *
> + * /local/domain//device/vif//ctrl-ring-ref =
> + * /local/domain//device/vif//event-channel-ctrl =
> + *
> + * where is the grant reference of the shared page used to
> + * implement the control ring and is an event channel to be used
> + * as a mailbox interrupt, before the frontend moves into the connected
> + * state.
I read this as saying that the mailbox interrupt can be used until the
frontend moves into the connected state, which I realise isn't what you
meant but having so much stuff between "by setting" and "before" makes it
easy to read wrongly.
How about "... as a mailbox interrupt. These keys must be written before
moving into the connected state." ?
> + * The control ring uses a fixed request/response message size and is
> + * balanced (i.e. one request to one response), so operationally it is much
> + * the same as a tramsmit or receive ring.
"transmit".
Is it intended to support out of order responses?
> + */
> +
> +/*
> + * Toeplitz hash types
> + * ===================
> + *
> + * For the purposes of the definitions below, 'Packet[]' is an array of
> + * octets containing an IP packet without options, 'Array[X..Y]' means a
> + * sub-array of 'Array' containing bytes X thru Y inclusive, and '+' is
> + * used to indicate concatenation of arrays.
> + */
> +
> +/*
> + * A hash calculated over an IP version 4 header as follows:
> + *
> + * Buffer[0..8] = Packet[12..15] + Packet[16..19]
Is there a reason to write it like this rather than:
+ * Buffer[0..8] = Packet[12..19]
?
Maybe something which is obvious if you know about Toeplitz in more than
just a passing fashion?
> +/*
> + * Control requests (netif_ctrl_request_t)
> + * =======================================
> + *
> + * All requests have the following format:
> + *
> + * 0 1 2 3 4 5 6 7 octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | id | type | data[0] |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | data[1] |
> + * +-----+-----+-----+-----+
These are packed in the ring, i.e. there are no implicit padding bytes,
right?
I know the Toeplitz stuff doesn't need it, but we should consider designing
in a scheme to handle control commands which need >8 octets of data, or
else we risk ending up with something as confusing as the data path...
> + *
> + * id: the request identifier, echoed in response.
> + * type: the type of request (see below)
> + * data[]: any data associated with the request (determined by type)
> + */
> +
> +struct netif_ctrl_request {
> + uint16_t id;
> + uint16_t type;
> +
> +#define NETIF_CTRL_TYPE_INVALID 0
> +#define NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS 1
> +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS 2
> +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY 3
> +#define NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING 4
> +
> + uint32_t data[2];
> +};
> +typedef struct netif_ctrl_request netif_ctrl_request_t;
> +
> +/*
> + * type = NETIF_CTRL_TYPE_GET_TOEPLITZ_FLAGS:
> + *
> + * This is sent by the frontend to query the types of toeplitz
> + * hash supported by the backend. No data is required and to the
"and to the" has an extra "to" in it I think?
> + * data[] field is set to 0.
"fields are" ?
> + *
> + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS:
> + *
> + * This is sent by the frontend to set the types of toeplitz hash that
> + * the backend should calculate. Note that the 'maximal' type of hash
> + * should always be chosen. For example, if the frontend sets both IPV4
> + * and IPV4_TCP hash types then the latter hash type should be calculated
> + * for any TCP packet and the former only calculated for non-TCP packets.
> + * The data[0] field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values
> + * defined above. The data[1] field is set to 0.
> + *
> + * NOTE: Setting data[0] to 0 disables toeplitz hashing and the backend
> + * is free to choose how it steers packets to queues (which is the
> + * default state).
> + *
> + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_KEY:
Can this be sent before FLAGS?
Is it permissible to send this for a hash scheme not included in flags?
> + *
> + * This is sent by the frontend to set the key of toeplitz hash that
> + * the backend should calculate. The toeplitz algorithm is illustrated
> + * by the following pseudo-code:
> + *
> + * (Buffer[] and Key[] are treated as shift-registers where the MSB of
> + * Buffer/Key[0] is considered 'left-most' and the LSB of Buffer/Key[N-
> 1]
> + * is the 'right-most').
> + *
> + * Value = 0
> + * For number of bits in Buffer[]
> + * If (left-most bit of Buffer[] is 1)
> + * Value ^= left-most 32 bits of Key[]
> + * Key[] << 1
> + * Buffer[] << 1
> + *
> + * The data[0] field is set to the size of key in octets. The data[1]
> + * field is set to a grant reference of a page containing the key.
I suppose it is implicit that the key starts at offset 0 in the page?
> The
> + * reference must remain valid until the corresponding
> + * netif_ctrl_response_t has been processed.
May the ref be r/o? If so, should that be encouraged?
> + *
> + * type = NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING:
> + *
> + * This is sent by the frontend to set the mapping of toeplitz hash to
> + * queue number to be applied by the backend.
> + *
> + * The data[0] field is set to the order of the mapping. The data[1] field
> + * is set to a grant reference of a page containing the mapping. The
> + * reference must remain valid until the corresponding
> + * netif_ctrl_response_t has been processed.
> + *
> + * The format of the mapping is:
> + *
> + * 0 1 2 3 4 5 6 7 octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | queue[0] |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | queue[1] |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | queue[2] |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | queue[3] |
> + * .
> + * .
> + * | queue[N-1] |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + *
> + * where each queue value is less than "multi-queue-num-queues" (see above)
> + * and N is 1 << data[0].
> + *
> + * NOTE: Before a specific mapping is set using this request, the backend
> + * should map all toeplitz hash values to queue 0 (which is the only
> + * queue guaranteed to exist in all cases).
i.e. between receiving a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS with a non-zero
set of flags and NETIF_CTRL_TYPE_SET_TOEPLITZ_MAPPING it should do this?
If it has not seen a NETIF_CTRL_TYPE_SET_TOEPLITZ_FLAGS, or it has seen one
with flags = 0 then as before the b/e is free to steer as it likes?
Is it valid to send this mapping command first before the flags?
> + */
> +
> +/*
> + * Control responses (netif_ctrl_response_t)
> + * =========================================
> + *
> + * All responses have the following format:
> + *
> + * 0 1 2 3 4 5 6 7 octet
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | id | pad | status |
> + * +-----+-----+-----+-----+-----+-----+-----+-----+
> + * | data |
> + * +-----+-----+-----+-----+
> + *
> + * id: the corresponding request identifier
> + * pad: set to 0
> + * status: the status of request processing
> + * data: any data associated with the response (determined by type and
> + * status)
type needs to be remembered via the id? Why not echo it in the pad field
for convenience?
> + */
> +
> +struct netif_ctrl_response {
> + uint16_t id;
> + uint16_t pad;
> + uint32_t status;
> +
> +#define NETIF_CTRL_STATUS_SUCCESS 0
> +#define NETIF_CTRL_STATUS_NOT_SUPPORTED 1
> +#define NETIF_CTRL_STATUS_INVALID_PARAMETER 2
> +#define NETIF_CTRL_STATUS_BUFFER_OVERFLOW 3
Is the intention that these will be the union of all possible failure
modes, rather than making provision for type-specific failures?
These are all pretty generic though, and I suppose we could always figure
that out when some command has a very obscure failure case.
> +
> + uint32_t data;
> +};
> +typedef struct netif_ctrl_response netif_ctrl_response_t;
> +
> +/*
> + * type =
> + *
> + * The default response for any unrecognised request has the status field
> + * set to NETIF_CTRL_STATUS_NOT_SUPPORTED and the data field set to 0.
> + *
> + * type = NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS:
> + *
> + * Since the request carries no data there is no reason for processing to
> + * fail, hence the status field is set to NETIF_CTRL_STATUS_SUCCESS and the
> + * data field is a bitwise OR of NETIF_CTRL_TOEPLITZ_FLAG_* values (defined
> + * above) indicating which hash types are supported by the backend.
> + * If no hashing is supported then the data field should be set to 0.
I wonder if it would be clearer to define the generic request and response
structures up front followed by the specific commands and their
request+response formats at the same time, rather than splitting the
specific command requests and responses into different places?
Like:
Command: NETIF_CTRL_MSG_GET_TOEPLITZ_FLAGS
Description: Blah blah
Request data format: Words words
Valid response status codes:
- One
- Another
Responds data format: Words
(i.e. more like API than protocol documentation in the form of independent
looking things going back and forth)
> + *
> + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_FLAGS:
> + *
> + * If the data[0] field in the request is invalid (i.e. contains unsupported
> + * hash types) then the status field is set to
> + * NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should
"request"
> succeed
> + * and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> + * The data field should be set to 0.
> + *
> + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_KEY:
> + *
> + * If the data[0] field in the request is an invalid key length (too big)
Can it not be too small as well? What is the behaviour if it is?
> + * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> + * data[1] field is an invalid grant reference then the status field is set
> + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the request should
> + * succeed and hence the status field is set to NETIF_CTRL_STATUS_SUCCESS.
> + * The data field should be set to 0.
> + *
> + * type = NETIF_CTRL_MSG_SET_TOEPLITZ_MAPPING:
> + *
> + * If the data[0] field in the request is an invalid mapping order (too big)
or too small?
> + * then the status field is set to NETIF_CTRL_STATUS_BUFFER_OVERFLOW, If the
> + * data[1] field is an invalid grant reference then the status field is set
> + * to NETIF_CTRL_STATUS_INVALID_PARAMETER. Otherwise the requset should
"request"
Ian.
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
next prev parent reply other threads:[~2016-01-08 15:54 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-07 13:05 [PATCH v4 0/3] public/io/netif.h: support for toeplitz hashing Paul Durrant
2016-01-07 13:05 ` [PATCH v4 1/3] public/io/netif.h: document transmit and receive wire formats separately Paul Durrant
2016-01-08 15:24 ` Ian Campbell
2016-01-08 15:56 ` Paul Durrant
2016-01-08 16:10 ` Ian Campbell
2016-01-07 13:05 ` [PATCH v4 2/3] public/io/netif.h: document control ring and toeplitz hashing Paul Durrant
2016-01-08 15:53 ` Ian Campbell [this message]
2016-01-08 16:19 ` Paul Durrant
2016-01-08 16:46 ` Ian Campbell
2016-01-08 17:07 ` Paul Durrant
2016-01-08 17:22 ` Ian Campbell
2016-01-08 17:35 ` Paul Durrant
2016-01-08 16:07 ` David Vrabel
2016-01-08 16:21 ` Paul Durrant
2016-01-08 16:22 ` David Vrabel
2016-01-07 13:05 ` [PATCH v4 3/3] public/io/netif.h: document new extra info for passing hash values Paul Durrant
2016-01-08 16:05 ` Ian Campbell
2016-01-08 16:26 ` Paul Durrant
2016-01-08 16:48 ` Ian Campbell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1452268426.26438.26.camel@citrix.com \
--to=ian.campbell@citrix.com \
--cc=ian.jackson@eu.citrix.com \
--cc=jbeulich@suse.com \
--cc=keir@xen.org \
--cc=paul.durrant@citrix.com \
--cc=tim@xen.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).