netdev.vger.kernel.org archive mirror
* Generalizing mmap'ed sockets
@ 2010-11-19 20:04 Tom Herbert
  2010-11-19 21:32 ` Rick Jones
  2010-11-19 22:10 ` Andrew Grover
From: Tom Herbert @ 2010-11-19 20:04 UTC (permalink / raw)
  To: Linux Netdev List

This is a project I'm contemplating.  If you have any comments or can
point me to prior work in this area that would be appreciated.

It seems like it should be fairly straightforward to extend the mmap
packet ring mechanisms to be used for arbitrary sockets (like TCP,
UDP, etc.). The idea is that we create a ring buffer for a socket
which is mmap'ed to share between user and kernel.  This can be done
for both transmit and receive side, and is basically modeled as a
consumer/producer queue.  There are semantic differences between
stream and datagram sockets that need to be considered, but I don't
think anything here is untenable.
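
To make the consumer/producer model concrete, here is a minimal sketch of the offset arithmetic such a ring would need. It assumes both pointers are byte offsets kept in [0, size) and that one byte is always left unused so that equal offsets mean "empty"; the helper names are hypothetical and not part of any existing interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers. Assumption: offsets stay in [0, size) and one
 * byte is never filled, so prod == cons always means "empty". */

/* Bytes the producer has written that the consumer has not yet taken. */
static uint32_t ring_used(uint32_t prod, uint32_t cons, uint32_t size)
{
    assert(size > 0 && prod < size && cons < size);
    return prod >= cons ? prod - cons : size - (cons - prod);
}

/* Bytes the producer may still write without overtaking the consumer. */
static uint32_t ring_free(uint32_t prod, uint32_t cons, uint32_t size)
{
    return size - 1 - ring_used(prod, cons, size);
}

/* Move an offset forward by n bytes, wrapping at the end of the ring. */
static uint32_t ring_advance(uint32_t off, uint32_t n, uint32_t size)
{
    assert(off < size && n < size);
    return (uint32_t)(((uint64_t)off + n) % size);
}
```

On TX the application would check ring_free() before filling and the kernel would use ring_used() to see what to send; on RX the roles are reversed.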

The expected benefits of this are:

TX:
 - Zero copy transmit (which is already supported by vmsplice(), but
this might be simpler)
 - One system call needed on transmit, which can cover multiple
datagrams or what would have been multiple writes (the call just
kicks the kernel to start sending)

RX:
 - Zero system calls needed to do receive (determining data ready is
accomplished by polling)
 - Immediate data placement in kernel available all the time,
including OOO placement
 - Potential for true zero copy on receive with device support (like
per-flow queues, UDP queues)
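
Since both sides would read and write the shared pointers without a system call in between, the updates need ordering guarantees. A rough sketch of how that could look with C11 atomics follows; the struct layout and function names are assumptions for illustration only (the header in the example below uses plain __u32 fields).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared header using C11 atomics. */
struct ring_hdr {
    _Atomic uint32_t prod_ptr;
    _Atomic uint32_t consumer_ptr;
};

/* Producer: make previously written data visible, then move prod_ptr. */
static void ring_publish(struct ring_hdr *h, uint32_t new_prod)
{
    assert(h);
    atomic_store_explicit(&h->prod_ptr, new_prod, memory_order_release);
}

/* Consumer: check for data without a system call. The acquire load
 * pairs with the release store in ring_publish(). */
static int ring_has_data(struct ring_hdr *h, uint32_t *prod_out)
{
    assert(h && prod_out);
    uint32_t prod = atomic_load_explicit(&h->prod_ptr, memory_order_acquire);
    uint32_t cons = atomic_load_explicit(&h->consumer_ptr, memory_order_relaxed);
    *prod_out = prod;
    return prod != cons;
}

/* Consumer: hand buffer space back to the producer. */
static void ring_release(struct ring_hdr *h, uint32_t new_cons)
{
    assert(h);
    atomic_store_explicit(&h->consumer_ptr, new_cons, memory_order_release);
}
```

The same pairing would apply in the other direction on TX, where the application is the producer and the kernel is the consumer.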

The userland use of this for TCP might look something like:

struct mmap_sock_hdr {
   __u32 prod_ptr;
   __u32 consumer_ptr;
};

int s;
struct mmap_sock_hdr *tx, *rx;
void *tx_base, *rx_base;

struct s_mmap_req {
   size_t size;
} mmap_req;

s = socket(AF_INET, SOCK_STREAM, 0);

/* Set up ring buffer on socket and mmap into user space for TX */
size_t size = (1 << 19) - sizeof(struct mmap_sock_hdr);
mmap_req.size = size;
setsockopt(s, SOL_SOCKET, TX_RING, (char *)&mmap_req,
           sizeof(mmap_req));
tx = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, s, 0);
tx_base = (void *)&tx[1];   /* data area starts right after the header */

/* Now do same thing for RX */
size = (1 << 19) - sizeof(struct mmap_sock_hdr);
mmap_req.size = size;
setsockopt(s, SOL_SOCKET, RX_RING, (char *)&mmap_req,
           sizeof(mmap_req));
rx = mmap(0, size, PROT_READ|PROT_WRITE, MAP_SHARED, s, 0);
rx_base = (void *)&rx[1];   /* data area starts right after the header */

bind(s, ...) /* Normal bind */
connect(s, ...) /* Normal connect */

/* Transmit */

/* Application fills some of the available buffer (up to consumer pointer) */
for (i = 0; i < 10000; i++)
   ((__u8 *)tx_base)[(tx->prod_ptr + i) % size] = i % 256;

/* Advance producer pointer, modulo size of buffer */
tx->prod_ptr = (tx->prod_ptr + 10000) % size;

send(s, NULL, 0, 0); /* Tells stack to send new data indicated by prod
                        pointer, just a trigger */

/* Polling for POLLOUT should work as expected */

/*********** Receive */

while (1) {
   poll(fds);
   if (s has POLLIN set) {
       /* Process data from rx_base[rx->consumer_ptr] to
          rx_base[rx->prod_ptr], modulo size of buffer of course */
       rx->consumer_ptr = rx->prod_ptr;    /* Gives back buffer space
                                              to the kernel */
   }
}
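
The "modulo size of buffer" step in the receive loop is where the wrap has to be handled. One possible way to do that without copying is to describe the readable region as at most two contiguous spans. This is only a sketch: struct ring_span and ring_readable() are hypothetical names, and the offsets are assumed to stay in [0, size).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one contiguous piece of readable data. */
struct ring_span {
    const unsigned char *ptr;
    uint32_t len;
};

/* Fill out[0..1] with the readable bytes between cons and prod.
 * out[1] is only non-empty when the data wraps past the end of the
 * ring. Returns the number of non-empty spans (0, 1 or 2). */
static int ring_readable(const unsigned char *base, uint32_t size,
                         uint32_t cons, uint32_t prod,
                         struct ring_span out[2])
{
    assert(size > 0 && cons < size && prod < size);
    out[0].ptr = base + cons;
    out[1].ptr = base;
    if (prod >= cons) {
        /* No wrap: one span from cons up to prod. */
        out[0].len = prod - cons;
        out[1].len = 0;
    } else {
        /* Wrapped: tail of the ring first, then the start up to prod. */
        out[0].len = size - cons;
        out[1].len = prod;
    }
    return (out[0].len != 0) + (out[1].len != 0);
}
```

The application would process out[0] and then out[1] in place, and only afterwards set the consumer pointer to the producer value it observed.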


Thread overview: 10+ messages
2010-11-19 20:04 Generalizing mmap'ed sockets Tom Herbert
2010-11-19 21:32 ` Rick Jones
2010-11-19 21:52   ` David Miller
2010-11-19 21:55     ` Tom Herbert
2010-11-19 21:58     ` Rick Jones
2010-11-19 22:08       ` David Miller
2010-11-19 22:47         ` Rick Jones
2010-11-19 22:49         ` Tom Herbert
2010-11-24 19:57           ` Michael S. Tsirkin
2010-11-19 22:10 ` Andrew Grover
