public inbox for linux-rdma@vger.kernel.org
From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: "Wan, Kaike" <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"Hefty,
	Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	"Weiny, Ira" <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Jason Gunthorpe
	<jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>,
	"Hal Rosenstock
	(hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org)"
	<hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
	Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [PATCH v2 RFC] IB/sa: Route SA pathrecord query through netlink
Date: Tue, 26 May 2015 10:57:18 -0400
Message-ID: <1432652238.28905.108.camel@redhat.com>
In-Reply-To: <3F128C9216C9B84BB6ED23EF16290AFB0CAB3806-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>

On Tue, 2015-05-26 at 14:03 +0000, Wan, Kaike wrote:
> I. Introduction
> 
> After posting our design to the mailing list, we received comments concerning various aspects of the
> design from Sean Hefty, Ira Weiny, Jason Gunthorpe, and Doug Ledford. Thank you all for the help.
> 
> The main issues are listed below:
> 1. Extensibility: the design should be flexible and readily extended to other applications;
> 2. Multiple data records: a query can return multiple data records (e.g., multiple pathrecords);
> 3. Existing code: the design should use existing code as much as possible;
> 4. Various query points in the kernel: what are the requirements (parameters, expected results) for
>    various queries that may exist in the kernel (IPoIB, RDMA CM, etc).
> 
> As our subject title indicates, we are trying to design for the kernel to query a local user-space
> service, more specifically, for the ib_sa module to send a pathrecord query to a local user-space SA cache.
> If anyone has information or requirements for other kernel query points, we will be happy to know.
> 
> In our previous design, we created a data header to contain various information about the query and
> response:
> 
> struct ib_nl_data_hdr {
> 	__u8	version;
> 	__u8	opcode;
> 	__u16	status;
> 	__u16	type;
> 	__u16	reserved;
> 	__u32	flags;
> 	__u32	length;
> };
> 
> This was modeled after the ibacm messages and the message layout is diagrammed below:
> 
>   +----------------+
>   | netlink header |
>   +----------------+
>   |  Data header   |
>   +----------------+
>   |      Data      |
>   +----------------+
> 
> The design was extensible, but it did not make full use of the netlink
> message header.
> 
> In this version of the design, we will make full use of the netlink header and the existing attribute
> interface, as detailed below.
> 
> II. Message layout
> 
> The general message layout is shown here:
> 
> 
>   +----------------+
>   | netlink header |
>   +----------------+
>   |  Attribute 1   |
>   +----------------+
>   |  Attribute 2   |
>   +----------------+
>   |       ...      |
>   +----------------+
>   |  Attribute N   |
>   +----------------+
> 
> The number of attributes present in the request/response varies. As shown, there is no new data
> header to describe either the request or the response. The netlink header and the various attributes
> are described in the sections below.
> 
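
As an illustration of the layout above, here is a minimal user-space
sketch that appends attributes after a netlink header and then walks
them. The ls_-prefixed names are hypothetical and not part of the
proposal; the structs are local mirrors of struct nlmsghdr and struct
nlattr so that the sketch builds without kernel headers. The only
netlink rule assumed is the usual 4-byte alignment of attributes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of struct nlmsghdr and struct nlattr (assumption: same layout). */
struct ls_nlmsghdr {
	uint32_t nlmsg_len;
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

struct ls_nlattr {
	uint16_t nla_len;	/* attribute header plus payload, without padding */
	uint16_t nla_type;
};

/* Netlink pads headers and attributes to 4-byte boundaries. */
static size_t ls_align(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Start a message that contains only the netlink header. */
static void ls_msg_init(void *buf, uint16_t type, uint32_t seq)
{
	struct ls_nlmsghdr nlh;

	assert(buf != NULL);
	memset(&nlh, 0, sizeof(nlh));
	nlh.nlmsg_len = sizeof(nlh);
	nlh.nlmsg_type = type;
	nlh.nlmsg_seq = seq;
	memcpy(buf, &nlh, sizeof(nlh));
}

/* Append one attribute; returns 0 on success, -1 if it does not fit. */
static int ls_put_attr(void *buf, size_t bufsz, uint16_t type,
		       const void *data, size_t len)
{
	struct ls_nlmsghdr nlh;
	struct ls_nlattr nla;
	size_t off, need;

	if (len > UINT16_MAX - sizeof(nla))
		return -1;
	need = sizeof(nla) + len;
	memcpy(&nlh, buf, sizeof(nlh));
	off = ls_align(nlh.nlmsg_len);
	if (off + ls_align(need) > bufsz)
		return -1;

	nla.nla_len = (uint16_t)need;
	nla.nla_type = type;
	memset((char *)buf + off, 0, ls_align(need));	/* zero the padding */
	memcpy((char *)buf + off, &nla, sizeof(nla));
	if (len)
		memcpy((char *)buf + off + sizeof(nla), data, len);

	nlh.nlmsg_len = (uint32_t)(off + ls_align(need));
	memcpy(buf, &nlh, sizeof(nlh));
	return 0;
}

/* Count the attributes after the header; returns -1 if one is malformed. */
static int ls_count_attrs(const void *buf)
{
	struct ls_nlmsghdr nlh;
	struct ls_nlattr nla;
	size_t off = sizeof(nlh);
	int n = 0;

	memcpy(&nlh, buf, sizeof(nlh));
	while (off + sizeof(nla) <= nlh.nlmsg_len) {
		memcpy(&nla, (const char *)buf + off, sizeof(nla));
		if (nla.nla_len < sizeof(nla) || off + nla.nla_len > nlh.nlmsg_len)
			return -1;
		off += ls_align(nla.nla_len);
		n++;
	}
	return n;
}
```

A caller would run ls_msg_init() once, call ls_put_attr() for each
request parameter or data record, and the receiver would iterate in
the same way as ls_count_attrs(). Since every attribute carries its
own length, no separate data header is needed.
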
> III. Netlink protocol, multicast group, and kernel client
> 
> This design is targeted to the NETLINK_RDMA protocol, and a new multicast group RDMA_NL_GROUP_LS is
> added for the local service:
> 
> enum {
> 	RDMA_NL_GROUP_CM = 1,
> 	RDMA_NL_GROUP_IWPM,
> 	RDMA_NL_GROUP_LS,
> 	RDMA_NL_NUM_GROUPS
> };
> 
> In addition, each kernel client should define a client index so that the common rdma code could
> route the response to the right client. For this purpose, we define the RDMA_NL_SA client for the
> ib_sa module:
> 
> enum {
> 	RDMA_NL_RDMA_CM = 1,
> 	RDMA_NL_NES,
> 	RDMA_NL_C4IW,
> 	RDMA_NL_SA,
> 	RDMA_NL_NUM_CLIENTS
> };
> 
> As mentioned previously, each query point in the kernel should have its own client index.
> 
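
To make the routing idea concrete, a small sketch of a per-client
dispatch table follows. It is not the kernel's implementation:
ls_register_client(), ls_dispatch() and ls_demo_sa_cb() are
hypothetical names, and the sketch assumes the client index sits in
bits 15-10 of nlmsg_type, as given by the RDMA_NL_GET_TYPE macro
quoted in section IV.

```c
#include <stddef.h>
#include <stdint.h>

enum {
	RDMA_NL_RDMA_CM = 1,
	RDMA_NL_NES,
	RDMA_NL_C4IW,
	RDMA_NL_SA,
	RDMA_NL_NUM_CLIENTS
};

/* Hypothetical per-client callback: receives the client-local opcode. */
typedef int (*ls_client_cb)(unsigned op);

static ls_client_cb ls_clients[RDMA_NL_NUM_CLIENTS];

/* Register a callback for one client index; -1 if invalid or already taken. */
static int ls_register_client(unsigned idx, ls_client_cb cb)
{
	if (idx == 0 || idx >= RDMA_NL_NUM_CLIENTS || ls_clients[idx])
		return -1;
	ls_clients[idx] = cb;
	return 0;
}

/* Route a message to its client based on nlmsg_type; -1 if no client. */
static int ls_dispatch(uint16_t nlmsg_type)
{
	unsigned idx = nlmsg_type >> 10;	/* bits 15-10: client index */
	unsigned op = nlmsg_type & 0x3ff;	/* bits 9-0: opcode */

	if (idx >= RDMA_NL_NUM_CLIENTS || !ls_clients[idx])
		return -1;
	return ls_clients[idx](op);
}

/* Stand-in for the ib_sa client: simply echoes the opcode it was given. */
static int ls_demo_sa_cb(unsigned op)
{
	return (int)op;
}
```

With this shape, each kernel query point only needs its own index in
the enum and one registration call.
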
> IV. Netlink message header
> 
> The netlink header is copied here:
> 
> struct nlmsghdr {
> 	__u32		nlmsg_len;	/* Length of message including header */
> 	__u16		nlmsg_type;	/* Message content */
> 	__u16		nlmsg_flags;	/* Additional flags */
> 	__u32		nlmsg_seq;	/* Sequence number */
> 	__u32		nlmsg_pid;	/* Sending process port ID */
> };
> 
> The message type for rdma clients is also copied below:
> 
> #define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
> 
> More clearly:
> 
>     Bits  	Description
>    --------------------------
>     15-10       Client index
>     09-00       Opcode
> 
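
For reference, the bit layout above can be sketched as a pair of
encode/decode helpers. The function names are hypothetical; only the
10-bit shift comes from the RDMA_NL_GET_TYPE macro quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define LS_OP_BITS	10u				/* bits 9-0: opcode */
#define LS_OP_MASK	((1u << LS_OP_BITS) - 1u)
#define LS_CLIENT_MASK	0x3fu				/* bits 15-10: client index */

/* Pack a client index and an opcode into a 16-bit nlmsg_type value. */
static uint16_t ls_get_type(unsigned client, unsigned op)
{
	assert(client <= LS_CLIENT_MASK && op <= LS_OP_MASK);
	return (uint16_t)((client << LS_OP_BITS) | op);
}

/* Extract the client index from nlmsg_type. */
static unsigned ls_get_client(uint16_t type)
{
	return (type >> LS_OP_BITS) & LS_CLIENT_MASK;
}

/* Extract the client-local opcode from nlmsg_type. */
static unsigned ls_get_op(uint16_t type)
{
	return type & LS_OP_MASK;
}
```

For example, client index 4 with opcode 1 packs to 0x1001, and
decoding 0x1001 yields the same pair.
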
> As described previously, a netlink message is routed by protocol (NETLINK_RDMA), multicast group
> (RDMA_NL_GROUP_LS), and client (encoded in the nlmsg_type field for rdma messages). Therefore, the
> opcode (encoded in nlmsg_type), the sequence number (nlmsg_seq), and the additional flags (nlmsg_flags)
> are all local to the client. This is important when we define these fields, as their values can overlap
> for different clients.
> 
> (1) Opcode
> 
> The opcode for local service SA client is defined below:
> 
> enum {
> 	RDMA_NL_LS_OP_RESOLVE = 0,
> 	RDMA_NL_LS_OP_SET_TIMEOUT,
> 	RDMA_NL_LS_NUM_OPS
> };
> 
> The RESOLVE opcode is used by ib_sa to send a pathrecord query to the user-space application,
> while the SET_TIMEOUT opcode can be used by the user-space application to set the netlink timeout
> value for the kernel client. Additional opcodes can be added if necessary.
> 
> It should be emphasized that the opcode is client specific, so opcode values can overlap between
> different clients. Therefore, the 10 bits should be large enough for various requests.
> 
> (2) nlmsg_flags
> 
> The flags field is again client specific. The lower byte (bits 7-0) is generally reserved
> (for generic netlink flags such as NLM_F_REQUEST), and the upper bits can be used to define
> request specific flags:
> 
> #define RDMA_NL_LS_F_OK		0x0100	/* Success response */
> #define RDMA_NL_LS_F_ERR	0x0200	/* Failed response */
> 
> These two bits can be used to indicate whether a message is a response. If the status is ERR, an
> error code can be contained in a status attribute, as described below.
> 
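
A receiver could classify a message from these two bits roughly as
follows. This is only a sketch: the enum and ls_classify() are
hypothetical names, and treating "both bits set" as invalid is an
assumption, since the proposal does not say how that case is handled.

```c
#include <stdint.h>

#define RDMA_NL_LS_F_OK		0x0100	/* Success response */
#define RDMA_NL_LS_F_ERR	0x0200	/* Failed response */

enum ls_msg_kind {
	LS_MSG_REQUEST,		/* neither bit set: not a response */
	LS_MSG_RESP_OK,
	LS_MSG_RESP_ERR,
	LS_MSG_INVALID		/* assumption: OK and ERR are mutually exclusive */
};

/* Decide what kind of message the nlmsg_flags field describes. */
static enum ls_msg_kind ls_classify(uint16_t nlmsg_flags)
{
	int ok = (nlmsg_flags & RDMA_NL_LS_F_OK) != 0;
	int err = (nlmsg_flags & RDMA_NL_LS_F_ERR) != 0;

	if (ok && err)
		return LS_MSG_INVALID;
	if (err)
		return LS_MSG_RESP_ERR;
	if (ok)
		return LS_MSG_RESP_OK;
	return LS_MSG_REQUEST;
}
```

Only the upper-byte bits are inspected, so the generic flags in the
lower byte do not affect the result.
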
> (3) Attribute type
> 
> Request parameters and response data records can be embedded in attributes.
> 
> The attribute header is copied here:
> 
> struct nlattr {
> 	__u16           nla_len;
> 	__u16           nla_type;
> };
> 
> Each attribute is preceded by the attribute header and followed by attribute specific data.
> 
> Note that the attribute type is request (opcode) specific and therefore could be
> overloaded for different requests if needed.
> 
> For ib_sa RESOLVE query, the following attribute types are defined:
> 
> enum {
> 	LS_NLA_TYPE_STATUS = 0,
> 	LS_NLA_TYPE_ADDRESS,
> 	LS_NLA_TYPE_PATH_RECORD,
> 	LS_NLA_TYPE_MAX
> };
> 
> (4) Status attribute
> 
> The status attribute is mostly used to carry an error code if the RDMA_NL_LS_F_ERR bit in the
> nlmsg_flags field of the netlink message header is set. If the response is a success, there is no
> need to include this attribute in the response data (although including it is not an error, either).
> 
> enum {
> 	LS_NLA_STATUS_SUCCESS = 0,
> 	LS_NLA_STATUS_INVAL,
> 	LS_NLA_STATUS_ENODATA,
> 	LS_NLA_STATUS_MAX
> };
> 
> struct rdma_nla_ls_status {
> 	__u32		status;
> };
> 
> (5) Address attribute
> 
> This attribute is normally included in the RESOLVE request.
> 
> enum {
> 	LS_NLA_ADDR_F_SRC		= 1,
> 	LS_NLA_ADDR_F_DST		= (1<<1),
> 	LS_NLA_ADDR_F_HOSTNAME		= (1<<2),
> 	LS_NLA_ADDR_F_IPV4		= (1<<3),
> 	LS_NLA_ADDR_F_IPV6		= (1<<4)
> };
> 
> struct rdma_nla_ls_addr {
> 	__u32		flags;
> 	__u32		addr[0];
> };
> 
> The address can be a hostname (string), an IPv4 address, or an IPv6 address, as indicated by the
> format flags. The source and destination flags indicate which end of the path the address refers to.
> 
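
A sketch of how the payload of an address attribute might be packed
and validated is shown below. The helper names are hypothetical, and
two points are assumptions rather than part of the proposal: exactly
one format flag must be set, and the flags word is stored in host
byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
	LS_NLA_ADDR_F_SRC	= 1,
	LS_NLA_ADDR_F_DST	= (1<<1),
	LS_NLA_ADDR_F_HOSTNAME	= (1<<2),
	LS_NLA_ADDR_F_IPV4	= (1<<3),
	LS_NLA_ADDR_F_IPV6	= (1<<4)
};

#define LS_ADDR_FMT_MASK \
	(LS_NLA_ADDR_F_HOSTNAME | LS_NLA_ADDR_F_IPV4 | LS_NLA_ADDR_F_IPV6)

/* Check that the address length matches the single format flag that is set. */
static int ls_addr_len_ok(uint32_t flags, size_t addrlen)
{
	switch (flags & LS_ADDR_FMT_MASK) {
	case LS_NLA_ADDR_F_IPV4:
		return addrlen == 4;
	case LS_NLA_ADDR_F_IPV6:
		return addrlen == 16;
	case LS_NLA_ADDR_F_HOSTNAME:
		return addrlen > 0;	/* string, including its NUL */
	default:
		return 0;		/* no format bit, or more than one */
	}
}

/* Write flags followed by the address bytes into out.
 * Returns the payload length, or 0 if the input is rejected. */
static size_t ls_addr_pack(void *out, size_t outsz, uint32_t flags,
			   const void *addr, size_t addrlen)
{
	assert(out != NULL && addr != NULL);
	if (!ls_addr_len_ok(flags, addrlen) || sizeof(flags) + addrlen > outsz)
		return 0;
	memcpy(out, &flags, sizeof(flags));
	memcpy((char *)out + sizeof(flags), addr, addrlen);
	return sizeof(flags) + addrlen;
}
```

The returned length, plus the size of struct nlattr, is what would go
into nla_len for this attribute.
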
> (6) Pathrecord attribute
> 
> This attribute can be included in both the RESOLVE request and response.
> 
> enum {
> 	LS_NLA_PATH_F_GMP		= 1,
> 	LS_NLA_PATH_F_PRIMARY		= (1<<1),
> 	LS_NLA_PATH_F_ALTERNATE		= (1<<2),
> 	LS_NLA_PATH_F_OUTBOUND		= (1<<3),
> 	LS_NLA_PATH_F_INBOUND		= (1<<4),
> 	LS_NLA_PATH_F_INBOUND_REVERSE 	= (1<<5),
> 	LS_NLA_PATH_F_BIDIRECTIONAL	= LS_NLA_PATH_F_OUTBOUND | LS_NLA_PATH_F_INBOUND_REVERSE,
> 	LS_NLA_PATH_F_USER		= (1<<6)
> };
> 
> struct rdma_nla_ls_path_rec {
> 	__u32	flags;
> 	__u32	path_rec[0];
> };
> 
> The format of the pathrecord can be indicated by the flags and the data is contained in path_rec[].
> For example, when LS_NLA_PATH_F_USER is set, the format is struct ib_user_path_rec.
> 
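
One detail worth sketching is how a receiver would test these flags.
The snippet below assumes that the bidirectional flag is meant to
combine the outbound and inbound-reverse bits of this same enum and
that the user-format flag is bit 6; the helper names are hypothetical.

```c
#include <stdint.h>

enum {
	LS_NLA_PATH_F_GMP		= 1,
	LS_NLA_PATH_F_PRIMARY		= (1<<1),
	LS_NLA_PATH_F_ALTERNATE		= (1<<2),
	LS_NLA_PATH_F_OUTBOUND		= (1<<3),
	LS_NLA_PATH_F_INBOUND		= (1<<4),
	LS_NLA_PATH_F_INBOUND_REVERSE	= (1<<5),
	/* Assumption: composed from the two direction bits above. */
	LS_NLA_PATH_F_BIDIRECTIONAL	= LS_NLA_PATH_F_OUTBOUND |
					  LS_NLA_PATH_F_INBOUND_REVERSE,
	LS_NLA_PATH_F_USER		= (1<<6)
};

/* A multi-bit flag must be compared against the full mask; a plain
 * nonzero test would also match a record that is only outbound. */
static int ls_path_is_bidirectional(uint32_t flags)
{
	return (flags & LS_NLA_PATH_F_BIDIRECTIONAL) ==
	       LS_NLA_PATH_F_BIDIRECTIONAL;
}

/* True when path_rec[] is expected to hold a struct ib_user_path_rec. */
static int ls_path_is_user_format(uint32_t flags)
{
	return (flags & LS_NLA_PATH_F_USER) != 0;
}
```

A response carrying several pathrecord attributes could then be
filtered per record, for instance by keeping only the entries for
which ls_path_is_bidirectional() returns nonzero.
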
> V. Summary
> 
> It's clear that this design is flexible, extensible, and can be easily enhanced to address various
> kernel query points. It uses the existing netlink message header and attribute interface, and can
> contain multiple attribute records.
> 
> 
> 
> Changes since v1:
> -- Completely revised the design to use netlink header and attribute interface.

On the face of it, this is a much improved design.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


Thread overview: 3+ messages
2015-05-26 14:03 [PATCH v2 RFC] IB/sa: Route SA pathrecord query through netlink Wan, Kaike
     [not found] ` <3F128C9216C9B84BB6ED23EF16290AFB0CAB3806-8k97q/ur5Z2krb+BlOpmy7fspsVTdybXVpNB7YpNyf8@public.gmane.org>
2015-05-26 14:57   ` Doug Ledford [this message]
     [not found]     ` <1432652238.28905.108.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-26 16:18       ` Jason Gunthorpe
