From: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: "Wan, Kaike" <kaike.wan-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: "linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org" <linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
    "Hefty, Sean" <sean.hefty-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
    "Weiny, Ira" <ira.weiny-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
    Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>,
    "Hal Rosenstock (hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org)" <hal-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>,
    Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Subject: Re: [RFC] IB/sa: Route SA pathrecord query through netlink
Date: Thu, 21 May 2015 13:35:23 -0400
Message-ID: <1432229723.28905.40.camel@redhat.com>
In-Reply-To: <1432228874.28905.35.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Thu, 2015-05-21 at 13:21 -0400, Doug Ledford wrote:
> On Thu, 2015-05-21 at 13:52 +0000, Wan, Kaike wrote:
> > In our previous posting to the mailing list, we proposed sending a MAD request from the kernel (more
> > specifically, from the ib_sa module) to a user space application (ibacm in this case) through netlink.
> > The user space application sends back the response. This simple scheme achieves the goal
> > of a local SA cache in user space.
> >
> > The format of the request and response is diagrammed below:
> >
> > ------------------
> > | netlink header |
> > ------------------
> > | MAD |
> > ------------------
> >
> > The kernel requests a pathrecord, and the user application finds it in its local cache and sends
> > it to the kernel. If the netlink request fails, the kernel sends the request to the SA through the
> > normal IB path (ib_mad -> hca driver -> wire).
> >
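The fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions: query_fn, resolve_path, and the two stub backends are hypothetical names, not functions from ib_sa or ibacm.

```c
#include <stddef.h>

/* Hypothetical backend signature: returns 0 on success, nonzero on failure. */
typedef int (*query_fn)(const void *req, void *resp, size_t resp_len);

/*
 * Try the user-space cache (reached through netlink) first. If that path is
 * unavailable or fails, fall back to querying the SA over the normal IB path.
 */
static int resolve_path(query_fn via_netlink, query_fn via_sa,
                        const void *req, void *resp, size_t resp_len)
{
    if (via_netlink && via_netlink(req, resp, resp_len) == 0)
        return 0;
    return via_sa(req, resp, resp_len);
}

/* Stub backends, for illustration only. */
static int stub_ok(const void *req, void *resp, size_t resp_len)
{
    (void)req; (void)resp; (void)resp_len;
    return 0;
}

static int stub_fail(const void *req, void *resp, size_t resp_len)
{
    (void)req; (void)resp; (void)resp_len;
    return -1;
}
```

With these stubs, resolve_path(stub_fail, stub_ok, ...) still succeeds because the SA path is tried second, and the call only fails when both paths fail.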
> > Jason pointed out that this message format was tied to a lower-stack format (MAD) and that its use
> > could not be readily extended to upper layer modules like rdma_cm. After lengthy discussions, we
> > came up with a modified scheme, as described below.
> >
> > The general format of the request and response will be the same:
> >
> > ------------------
> > | netlink header |
> > ------------------
> > | Data header |
> > ------------------
> > | Data |
> > ------------------
> >
> > The data header contains information about the type of request/response, the status (for response),
> > the type (format) of the data, the total length of the data header + data, and a flags field about
> > the request/response or data.
> >
> > Based on the type of the data, the data section may take different formats: a string containing the host
> > name to resolve, an IPv4/IPv6 address, a pathrecord, a user pathrecord (struct ib_user_path_rec),
> > or simply a MAD (like our posted patches), etc. Essentially it can be of any format based on the
> > data type. The key is to document the format so that the kernel and user space can communicate
> > correctly.
> >
> > The details are described below:
> >
> > #define IB_NL_VERSION 0x01
> >
> > #define IB_NL_OP_MASK 0x0F
> > #define IB_NL_OP_RESOLVE 0x01
> > #define IB_NL_OP_QUERY_PATH 0x02
> > #define IB_NL_OP_SET_TIMEOUT 0x03
> > #define IB_NL_OP_ACK 0x80
>
> If OP_ACK is one bit, why isn't the OP_MASK 0x7f?
>
> > #define IB_NL_STATUS_SUCCESS 0x0000
> > #define IB_NL_STATUS_ENODATA 0x0001
>
> Do we need 16 bits for a bool? In fact, couldn't this actually be
> switched so that the return of the message uses OP_SUCCESS instead of
> OP_ACK?
>
> In other words, instead of the two items here, couldn't the ACK bit be
> dropped entirely and replaced with SUCCESS? Then, when the user app
> returns the netlink packet, the query failed if the op on return equals
> the op on send, and it succeeded if the op on return is op | SUCCESS.
>
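A minimal sketch of the opcode folding suggested above, for illustration only. It assumes the success bit takes over the value 0x80 from IB_NL_OP_ACK and that the mask widens to 0x7F; the name IB_NL_OP_SUCCESS and both helper functions are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: the success bit reuses 0x80, the mask covers the rest. */
#define IB_NL_OP_MASK        0x7F
#define IB_NL_OP_SUCCESS     0x80

#define IB_NL_OP_RESOLVE     0x01
#define IB_NL_OP_QUERY_PATH  0x02
#define IB_NL_OP_SET_TIMEOUT 0x03

/* True if the reply carries the same operation as the request. */
static bool ib_nl_reply_matches(uint8_t sent_op, uint8_t reply_op)
{
    return (reply_op & IB_NL_OP_MASK) == (sent_op & IB_NL_OP_MASK);
}

/* True only if the reply is for sent_op and has the success bit set. */
static bool ib_nl_reply_succeeded(uint8_t sent_op, uint8_t reply_op)
{
    return ib_nl_reply_matches(sent_op, reply_op) &&
           (reply_op & IB_NL_OP_SUCCESS) != 0;
}
```

A reply that echoes the opcode unchanged reads as a failure, and one that sets the top bit reads as a success, so no separate status field is needed for the two-state case.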
> > #define IB_NL_DATA_TYPE_INVALID 0x0000
> > #define IB_NL_DATA_TYPE_NAME 0x0001
> > #define IB_NL_DATA_TYPE_ADDRESS_IP 0x0002
> > #define IB_NL_DATA_TYPE_ADDRESS_IP6 0x0003
> > #define IB_NL_DATA_TYPE_PATH_RECORD 0x0004
> > #define IB_NL_DATA_TYPE_USER_PATH_REC 0x0005
> > #define IB_NL_DATA_TYPE_MAD 0x0006
> >
> > #define IB_NL_FLAGS_PATH_GMP 1
> > #define IB_NL_FLAGS_PATH_PRIMARY (1<<1)
> > #define IB_NL_FLAGS_PATH_ALTERNATE (1<<2)
> > #define IB_NL_FLAGS_PATH_OUTBOUND (1<<3)
> > #define IB_NL_FLAGS_PATH_INBOUND (1<<4)
> > #define IB_NL_FLAGS_PATH_INBOUND_REVERSE (1<<5)
> > #define IB_NL_FLAGS_PATH_BIDIRECTIONAL (IB_NL_FLAGS_PATH_OUTBOUND | IB_NL_FLAGS_PATH_INBOUND_REVERSE)
> > #define IB_NL_FLAGS_QUERY_SA (1<<31)
> > #define IB_NL_FLAGS_NODELAY (1<<30)
>
> Please keep these in numerical order; don't put <<31 with <<30 below it.
>
> > struct ib_nl_data_hdr {
> >     __u8 version;
> >     __u8 opcode;
> >     __u16 status;
>
> Drop status because we fold it into opcode.
>
> >     __u16 type;
> >     __u16 reserved;
>
> Drop reserved because we don't need it for alignment any more.
>
> >     __u32 flags;
>
> Flags is the only thing using up bits fast, and we would want to make this
> header an even 128 bits in length, so add a __u32 reserved; here. That's
> more likely to be useful than the current layout, since we are likely to
> run out of flags before anything else.
>
> >     __u32 length;
> > };
> >
> > struct ib_nl_data {
> >     struct ib_nl_data_hdr hdr;
> >     __u8 data[0];
> > };
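Putting the review comments on the header together, the layout might look like the sketch below. It is one reading of the suggestions, not an agreed format: it uses <stdint.h> types in place of __u8/__u16/__u32 so that it compiles outside the kernel tree, and ib_nl_total_len is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the data header with the suggested changes applied. */
struct ib_nl_data_hdr {
    uint8_t  version;
    uint8_t  opcode;    /* operation, with the status folded in */
    uint16_t type;      /* IB_NL_DATA_TYPE_* */
    uint32_t flags;     /* IB_NL_FLAGS_* */
    uint32_t reserved;  /* spare room, e.g. for future flags */
    uint32_t length;    /* data header + data, in bytes */
};

struct ib_nl_data {
    struct ib_nl_data_hdr hdr;
    uint8_t data[];     /* C99 flexible array member */
};

/* The header should come out at exactly 128 bits, with no padding. */
_Static_assert(sizeof(struct ib_nl_data_hdr) == 16, "header must be 128 bits");
_Static_assert(offsetof(struct ib_nl_data, data) == 16, "data follows header");

/* Value to store in hdr.length for a payload of data_len bytes. */
static uint32_t ib_nl_total_len(uint32_t data_len)
{
    return (uint32_t)sizeof(struct ib_nl_data_hdr) + data_len;
}
```

The fields sit on their natural alignment (offsets 0, 1, 2, 4, 8, 12), which is why the 16-bit reserved member is no longer needed once status is gone.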
> >
> >
> > These defines and structures can be added to the file include/uapi/rdma/rdma_netlink.h (with the
> > prefix changed to RDMA_NL) or kept in a separate file (include/uapi/rdma/ib_netlink.h ???).
> >
> > Please share your thoughts.
>
> I think an extensible netlink framework here is the right way to go,
> certainly better than the one shot method you had first.

The one thing I left out of the above that might be worth changing is
the fact that you bury your sequence number down in your MAD header. If
there is a generic mechanism that multiple modules can use to send
customized data via netlink, then it might be worthwhile to have the
sequence number moved to the generic level.
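
For reference, the netlink message header (struct nlmsghdr) already carries a sequence field, nlmsg_seq, so one way to read this suggestion is to match replies on that field rather than on anything inside the MAD. A rough sketch under that assumption follows. It uses a stand-in struct so that it compiles without kernel headers, and pending_query and find_pending are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the fields of struct nlmsghdr from <linux/netlink.h>. */
struct demo_nlmsghdr {
    uint32_t nlmsg_len;
    uint16_t nlmsg_type;
    uint16_t nlmsg_flags;
    uint32_t nlmsg_seq;
    uint32_t nlmsg_pid;
};

/* Hypothetical record of a request that is waiting for its reply. */
struct pending_query {
    uint32_t seq;
    int in_use;
};

/*
 * Find the outstanding request a reply belongs to, keyed only on the
 * generic header's sequence number. Returns NULL if nothing matches.
 */
static struct pending_query *find_pending(struct pending_query *tbl, size_t n,
                                          const struct demo_nlmsghdr *reply)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].in_use && tbl[i].seq == reply->nlmsg_seq)
            return &tbl[i];
    return NULL;
}
```

Because the lookup never touches the payload, the same matching code would work whether the data section holds a MAD, a pathrecord, or a host name.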
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG KeyID: 0E572FDD