* [RFC] AF_RXRPC proposal
@ 2006-11-16 11:30 David Howells
From: David Howells @ 2006-11-16 11:30 UTC (permalink / raw)
To: netdev
Hi,
I've written a document to lay out what I'd like to do to make an AF_RXRPC
network protocol for Linux, with the intent of replacing the current net/rxrpc/ in
the kernel and also providing an RxRPC transport for userspace.
Can you look it over and see what you think?
Also, is it feasible to add a new socket type? I'm thinking of adding
SOCK_RPC to be used rather than SOCK_DGRAM, SOCK_STREAM or whatever, since the
usage model doesn't fit the ones that currently exist.
Thanks!
David
=========================
AF_RXRPC NETWORK PROTOCOL
=========================
RxRPC is in essence a two-part protocol. There is a session layer which
provides reliable virtual connections using UDP over IPv4 or IPv6 as the
transport layer, but implements a real network protocol, and there's the
presentation layer which renders structured data to binary blobs and back again
using XDR (as does SunRPC):
+-------------+
| Application |
+-------------+
|     XDR     | Presentation
+-------------+
|    RxRPC    | Session
+-------------+
|     UDP     | Transport
+-------------+
(Very OSI, I know, and probably wrong).
AF_RXRPC would provide:
(1) Part of an RxRPC facility for both kernel and userspace applications by
making the session part of it a Linux network protocol (AF_RXRPC).
(2) A two-phase protocol. The client transmits a blob and then receives a
blob, and the server receives a blob and then transmits a blob.
(3) Retention of the reusable bits of the transport system set up for one call
to speed up subsequent calls.
(4) A secure protocol, using the Linux kernel's key retention facility to
manage security on the client end. The server end must of necessity be
more active in security negotiations.
AF_RXRPC would not provide XDR marshalling facilities. That would be left to
the application.
Sockets of the AF_RXRPC family would be:
(1) created as type SOCK_RPC;
(2) provided with a protocol of the type of underlying transport they're going
to use - currently only PF_INET and PF_INET6 are supported.
The Andrew File System (AFS) is an example of an application that uses this and
that has both kernel (filesystem) and userspace (utility) components.
=====================
PROTOCOL DRIVER MODEL
=====================
An overview of the RxRPC protocol:
(*) RxRPC sits on top of another networking protocol (UDP is the only option
currently), and uses this to provide network transport. UDP ports, for
example, provide transport endpoints.
(*) RxRPC supports multiple virtual "connections" from any given transport
endpoint, thus allowing the endpoints to be shared, even to the same
remote endpoint.
(*) Each connection goes to a particular "service". A connection may not go
to multiple services. A service may be considered the RxRPC equivalent of
a port number.
(*) Client-originating packets are marked, thus a transport endpoint can be
shared between client and server connections (connections have a
direction).
(*) Up to about a billion connections may be supported concurrently between
one local transport endpoint and one service on one remote endpoint. An
RxRPC connection is described by seven numbers:
Local address   }
Local port      } Transport (UDP) address
Remote address  }
Remote port     }
Direction
Connection ID
Service ID
(*) Each RxRPC operation is a "call". A connection may make up to four
billion calls, but only up to four calls may be in progress on a
connection at any one time.
(*) Calls are two-phase and asymmetric: the client sends its request data,
which the service receives; then the service sends the reply data which
the client receives.
(*) The data are of indefinite size; the end of a phase is marked with a flag
in the packet.
(*) The first four bytes of the request data are the service operation ID.
(*) Security is handled on a per-connection basis. The connection is
initiated by the first data packet on it arriving. If security is
requested, the server then issues a "challenge" and then the client
replies with a "response". If the response is successful, the security is
set for the lifetime of that connection, and all subsequent calls made
upon it use that same security.
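The "seven numbers" above could be held in a lookup key along the following
lines. This is only a sketch: every name in it is hypothetical, it covers IPv4
only, and the split of a 32-bit connection ID into a connection number plus a
2-bit call slot is an assumption chosen to match the "about a billion
connections" and "four calls" figures; this document does not specify the wire
layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

/* Hypothetical names throughout; none of this is part of the proposed API. */
#define CONN_CHANNEL_BITS 2u	/* assumed: 2^2 = 4 call slots per connection */
#define CONN_CHANNEL_MASK ((1u << CONN_CHANNEL_BITS) - 1u)

enum conn_direction { CONN_CLIENT_INITIATED, CONN_SERVER_INITIATED };

/* The seven numbers that describe one RxRPC connection. */
struct conn_key {
	struct in_addr		local_addr;
	in_port_t		local_port;	/* network byte order */
	struct in_addr		remote_addr;
	in_port_t		remote_port;	/* network byte order */
	enum conn_direction	direction;
	uint32_t		conn_id;	/* call-slot bits cleared */
	uint16_t		service_id;
};

/* Strip the assumed call-slot bits, leaving ~2^30 distinct connection IDs. */
static uint32_t conn_id_of(uint32_t wire_id)
{
	return wire_id & ~CONN_CHANNEL_MASK;
}

/* Which of the four concurrent call slots a packet belongs to. */
static unsigned int call_slot_of(uint32_t wire_id)
{
	return wire_id & CONN_CHANNEL_MASK;
}

/* Compare field by field; memcmp() could trip over struct padding. */
static bool conn_key_equal(const struct conn_key *a, const struct conn_key *b)
{
	assert(a != NULL && b != NULL);
	return a->local_addr.s_addr  == b->local_addr.s_addr  &&
	       a->local_port         == b->local_port         &&
	       a->remote_addr.s_addr == b->remote_addr.s_addr &&
	       a->remote_port        == b->remote_port        &&
	       a->direction          == b->direction          &&
	       a->conn_id            == b->conn_id            &&
	       a->service_id         == b->service_id;
}
```

Two packets whose keys compare equal would belong to the same virtual
connection, whichever call slot they carry.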
About the AF_RXRPC driver:
(*) The AF_RXRPC protocol would transparently use internal sockets of the
transport protocol to represent transport endpoints.
(*) AF_RXRPC sockets map onto RxRPC calls, not RxRPC connections. RxRPC
connections would also be handled transparently.
(*) Additional parallel client connections would be initiated to support extra
concurrent calls, up to a limit [tunable].
(*) Each connection would be retained for a certain amount of time [tunable]
after the last call currently using it has completed, in case a new call
is made that could use it.
(*) Each internal UDP socket would be retained for a certain amount of time
[tunable] after the last connection using it is discarded, in case a
new connection is made that could use it.
(*) A client-side connection could only be shared between calls if they
have the same key struct describing their security (and assuming the calls
would otherwise share the connection). Non-secured calls would also be
able to share connections with each other.
(*) ACK'ing would be handled by the protocol driver automatically, including
ping replying.
(*) SO_KEEPALIVE would automatically ping the other side.
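The sharing and retention rules above might be expressed roughly as below.
This is a userspace sketch under stated assumptions, not driver code: struct
key here is a dummy stand-in for the kernel's key struct, the other names are
hypothetical, and the check assumes the caller has already matched the
transport endpoint and service.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Dummy stand-in for the kernel's key struct (hypothetical layout). */
struct key {
	int serial;
};

#define MAX_CALLS_PER_CONN 4u	/* at most four calls in progress at once */

/* Hypothetical cache entry for one client-side connection. */
struct cached_conn {
	const struct key *key;		/* NULL for a non-secured connection */
	unsigned int	  calls_in_progress;
	time_t		  idle_since;	/* meaningful only when no calls run */
};

/*
 * A new call may share the connection only if it uses the very same key
 * struct (or both are non-secured) and a call slot is free.
 */
static bool conn_can_take_call(const struct cached_conn *conn,
			       const struct key *key)
{
	assert(conn != NULL);
	return conn->key == key &&
	       conn->calls_in_progress < MAX_CALLS_PER_CONN;
}

/* An idle connection is discarded once the tunable timeout has elapsed. */
static bool conn_expired(const struct cached_conn *conn, time_t now,
			 time_t idle_timeout)
{
	assert(conn != NULL);
	return conn->calls_in_progress == 0 &&
	       now - conn->idle_since >= idle_timeout;
}
```

When conn_can_take_call() fails for every cached connection, the driver would
open an additional parallel connection, subject to the tunable limit.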
Interaction with the user of the RxRPC socket:
(*) In the client, sending a request would be achieved with one or more
sendmsgs, followed by the reply received with one or more recvmsgs.
(*) Once the client has received the last bit of the reply with recvmsg, the
socket would be again available to send a new call with sendmsg.
(*) In the server, receiving a request would be achieved with one or more
recvmsgs, followed by the reply transmitted with one or more sendmsgs.
(*) The server could invoke a final recvmsg to pick up the success or
failure of the reply reception.
(*) The server could ACK the receipt of the request phase by doing a
sendmsg() with a special control message if the request is going to
take a long time to process. Normally the first packet of the reply
suffices to ACK the entire request.
(*) Switching from sendmsg() to recvmsg() or vice versa would shift the state
of the RPC operation, giving a final ACK on that phase of the protocol.
(*) select() and poll() would show a socket as being writable if sendmsg() can
be used to send a request or a reply, and readable if recvmsg() can be
used to receive a request or a reply. It would not be both readable and
writable simultaneously.
(*) The control data part of the msghdr struct would be used for a number of
things:
(*) Sending or receiving errors (aborts).
(*) Sending ping requests and receiving ping replies.
(*) Sending debug requests and receiving debug replies.
(*) The server would have to assist in the setting up of security. The server
sends a challenge packet to the client and receives a response packet.
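The phase switching and the select()/poll() behaviour described above can be
pictured as a small state machine. The sketch below is illustrative only: the
state names and both helpers are hypothetical, and the security exchange and
abort handling are left out.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Hypothetical per-socket call states. */
enum call_state {
	CLIENT_SEND_REQUEST,
	CLIENT_RECV_REPLY,
	SERVER_RECV_REQUEST,
	SERVER_SEND_REPLY,
	SERVER_RECV_FINAL_ACK,	/* optional final recvmsg() on the server */
	CALL_COMPLETE,
};

/*
 * Events poll() would report.  'ready' means the current phase can make
 * progress (buffer space when sending, queued data when receiving).
 * The result is never POLLIN and POLLOUT together.
 */
static short call_poll_events(enum call_state state, bool ready)
{
	if (!ready)
		return 0;
	switch (state) {
	case CLIENT_SEND_REQUEST:
	case SERVER_SEND_REPLY:
		return POLLOUT;
	case CLIENT_RECV_REPLY:
	case SERVER_RECV_REQUEST:
	case SERVER_RECV_FINAL_ACK:
		return POLLIN;
	default:
		return 0;
	}
}

/* State after the current phase ends (e.g. on a sendmsg()/recvmsg() switch). */
static enum call_state call_next_phase(enum call_state state)
{
	switch (state) {
	case CLIENT_SEND_REQUEST:
		return CLIENT_RECV_REPLY;
	case CLIENT_RECV_REPLY:
		return CLIENT_SEND_REQUEST;	/* socket free for a new call */
	case SERVER_RECV_REQUEST:
		return SERVER_SEND_REPLY;
	case SERVER_SEND_REPLY:
		return SERVER_RECV_FINAL_ACK;
	default:
		return CALL_COMPLETE;
	}
}
```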
====================
EXAMPLE CLIENT USAGE
====================
A client would issue an operation by:
(1) An RxRPC socket would be set up by:
client = socket(AF_RXRPC, SOCK_RPC, PF_INET);
Where the third parameter indicates the address type of the transport
socket used - usually IPv4.
(2) A local address could optionally be bound:
struct sockaddr_rxrpc srx = {
	.srx_family = AF_RXRPC,
	.srx_service = 0, /* we're a client */
	.transport_type = SOCK_DGRAM, /* type of transport socket */
	.transport.sin_family = AF_INET,
	.transport.sin_port = htons(7000), /* AFS callback */
	.transport.sin_address = 0, /* all local interfaces */
};
bind(client, &srx, sizeof(srx));
This would specify the local UDP port to be used. If not given, a random
non-privileged port would be used. A UDP port may be shared between
several unrelated RxRPC sockets. Security is handled on a basis of
per-RxRPC virtual connection.
(3) The security would be set:
const char *key = "AFS:cambridge.redhat.com";
setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));
This would issue a request_key() to get the security context.
(4) The server would then be contacted:
struct sockaddr_rxrpc srx = {
	.srx_family = AF_RXRPC,
	.srx_service = VL_SERVICE_ID,
	.transport_type = SOCK_DGRAM, /* type of transport socket */
	.transport.sin_family = AF_INET,
	.transport.sin_port = htons(7005), /* AFS volume manager */
	.transport.sin_address = ...,
};
connect(client, &srx, sizeof(srx));
(5) The request would be sent:
sendmsg(client, msg, 0);
(6) And then the reply received:
recvmsg(client, msg, 0);
If an abort/error was returned by the server, this will be returned in the
control data buffer.
(7) Then the socket would be closed or used to make another call.
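For step (5), the msg passed to sendmsg() has to carry the request blob, whose
first four bytes are the service operation ID. A minimal sketch of preparing
that blob with standard calls follows. The helper names are hypothetical, and
sending the operation ID in network byte order is an assumption made here
because the rest of the blob is XDR; this document does not state it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Write the 4-byte operation ID (assumed big-endian) followed by arguments
 * that the application has already XDR-encoded.  Returns the total request
 * length, or 0 if the buffer is too small.
 */
static size_t build_request(uint8_t *buf, size_t buflen, uint32_t op_id,
			    const void *xdr_args, size_t xdr_len)
{
	uint32_t be_op = htonl(op_id);

	assert(buf != NULL);
	if (buflen < sizeof(be_op) || xdr_len > buflen - sizeof(be_op))
		return 0;
	memcpy(buf, &be_op, sizeof(be_op));
	if (xdr_len > 0)
		memcpy(buf + sizeof(be_op), xdr_args, xdr_len);
	return sizeof(be_op) + xdr_len;
}

/* Point a msghdr at a single data buffer; no control data is attached. */
static void init_data_msg(struct msghdr *msg, struct iovec *iov,
			  void *buf, size_t len)
{
	assert(msg != NULL && iov != NULL);
	memset(msg, 0, sizeof(*msg));
	iov->iov_base = buf;
	iov->iov_len = len;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
}
```

The resulting msghdr would then be handed to sendmsg(client, &msg, 0). A large
request could be split over several sendmsg() calls, as described earlier.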
====================
EXAMPLE SERVER USAGE
====================
A server would accept operations by:
(1) An RxRPC socket would be set up by:
server = socket(AF_RXRPC, SOCK_RPC, PF_INET);
Where the third parameter indicates the address type of the transport
socket used - usually IPv4.
(2) A local address would be bound:
struct sockaddr_rxrpc srx = {
	.srx_family = AF_RXRPC,
	.srx_service = VL_SERVICE_ID, /* RxRPC service ID */
	.transport_type = SOCK_DGRAM, /* type of transport socket */
	.transport.sin_family = AF_INET,
	.transport.sin_port = htons(7000), /* AFS callback */
	.transport.sin_address = 0, /* all local interfaces */
};
bind(server, &srx, sizeof(srx));
(3) The server would then listen out for incoming calls:
listen(server, 100);
(4) It would accept calls that were made:
struct sockaddr_rxrpc srx;
socklen_t slen = sizeof(srx);
call = accept(server, &srx, &slen);
(5) The first data packet would then be received:
recvmsg(call, msg, 0);
A connection is discovered on the server by reception of the first data
packet holding its connection ID. Only then can security be set up.
(6) The security context might need to be set up:
(a) The security index can be examined:
uint16_t sectype;
socklen_t len = sizeof(sectype);
getsockopt(call, SOL_RXRPC, RXRPC_GET_SECURITY_INDEX, &sectype, &len);
(b) A security challenge can be made:
sendmsg(call, msg, 0);
The control message will contain the challenge; there would be no
data.
(c) And the security response received:
recvmsg(call, msg, 0);
The control message will contain the response; there would be no data.
(d) The security context can then be set:
setsockopt(call, SOL_RXRPC, RXRPC_SET_SECURITY, buffer, buflen);
If the virtual RxRPC connection already has security set up, the
getsockopt will indicate this, and steps (b) to (d) can be skipped.
A security rejection would be achieved simply by closing the socket before
step (d).
(7) The data could then be received:
recvmsg(call, msg, 0);
(8) And then the reply transmitted:
sendmsg(call, msg, 0);
If an abort/error is to be served instead, that would be placed in the
control data, and no data would be attached.
(9) Then the socket would be closed.
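Once step (7) has delivered the request data, the server has to work out which
operation was requested from the first four bytes. One way this could look is
sketched below. The handler type, the table and the echo operation are all
hypothetical, and network byte order for the operation ID is again an
assumption.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handler signature: receives the XDR-encoded arguments. */
typedef int (*op_handler)(const uint8_t *args, size_t args_len);

struct op_entry {
	uint32_t   op_id;
	op_handler handler;
};

/* Extract the operation ID (assumed big-endian) from the request data. */
static bool parse_operation_id(const uint8_t *req, size_t len, uint32_t *op_id)
{
	uint32_t be_op;

	assert(req != NULL && op_id != NULL);
	if (len < sizeof(be_op))
		return false;	/* ID not fully received yet, or a bad call */
	memcpy(&be_op, req, sizeof(be_op));
	*op_id = ntohl(be_op);
	return true;
}

/* Linear search; returns NULL if the service has no such operation. */
static op_handler find_handler(const struct op_entry *table, size_t count,
			       uint32_t op_id)
{
	for (size_t i = 0; i < count; i++)
		if (table[i].op_id == op_id)
			return table[i].handler;
	return NULL;
}

/* Made-up operation 1: reports how many argument bytes it was given. */
static int handle_echo(const uint8_t *args, size_t args_len)
{
	(void)args;
	return (int)args_len;
}

static const struct op_entry example_ops[] = {
	{ 1, handle_echo },
};
```

An unknown operation ID would presumably be answered with an abort placed in
the control data, as in step (8).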