From: Arnaldo Carvalho de Melo <acme@redhat.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>,
netdev@vger.kernel.org, Chris Van Hoof <vanhoof@redhat.com>,
Clark Williams <williams@redhat.com>
Subject: Re: [RFC 1/2] net: Introduce recvmmsg socket syscall
Date: Wed, 20 May 2009 23:05:41 -0300
Message-ID: <20090521020541.GD5956@ghostprotocols.net>
In-Reply-To: <20090521004634.GB29869@localhost.localdomain>
Em Wed, May 20, 2009 at 08:46:34PM -0400, Neil Horman escreveu:
> On Wed, May 20, 2009 at 08:06:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > Meaning receive multiple messages, reducing the number of syscalls and
> > net stack entry/exit operations.
> >
> > Next patches will introduce mechanisms where protocols that want to
> > optimize this operation will provide an unlocked_recvmsg operation.
> >
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> It's a neat idea, I like the possibility of saving lots of syscalls for
> busy sockets, but I imagine the addition of a new syscall gives people pause. I
> wonder if simply augmenting the existing recvmsg syscall with a message flag to
> indicate that multiple messages can be received on that call would be enough.
>
> What I would propose looks something like:
>
> 1) define a new flag for msg_flags in the msghdr, MSG_COMPOUND. Setting
> this on the call lets the protocol know we can store multiple messages
>
> 2) if this flag is set the msg_control pointer should contain a cmsghdr with a
> new type MSG_COMPOUND_NEXT, in which the size is sizeof(void *) and the data
> contains a pointer to the next msghdr.
>
> 3) The kernel can iteratively fill out buffers passed in through the chain,
> setting the MSG_COMPOUND flag on each msghdr that contains valid data. The
> first msghdr to not have the MSG_COMPOUND flag set denotes the last buffer that
> the kernel put valid data in. This way the buffer chain pointer is kept
> unchanged, and userspace can follow it to free the data if need be.
>
> Thoughts?
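The chained layout proposed above can be sketched in userspace C. This is only an illustration of the data structure: MSG_COMPOUND and MSG_COMPOUND_NEXT are the names from the proposal, but their values here are placeholders, no kernel implements them, and struct chain_link, chain_link_init, chain_next and chain_count_filled are hypothetical helpers introduced for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Placeholder values: these flags exist only in the proposal quoted above
 * and are never passed to the kernel in this sketch. */
#define MSG_COMPOUND      0x1000000 /* hypothetical msg_flags bit */
#define MSG_COMPOUND_NEXT 0x7f      /* hypothetical cmsg_type */

/* One element of the chain: a msghdr plus the storage it points at. */
struct chain_link {
	struct msghdr hdr;
	struct iovec iov;
	char data[256];
	union { /* keeps the control buffer aligned for struct cmsghdr */
		char buf[CMSG_SPACE(sizeof(struct msghdr *))];
		struct cmsghdr align;
	} ctl;
};

/* Point hdr at its own buffers and store a pointer to the next msghdr
 * in a MSG_COMPOUND_NEXT control message (NULL ends the chain). */
static void chain_link_init(struct chain_link *l, struct chain_link *next)
{
	struct msghdr *np = next ? &next->hdr : NULL;
	struct cmsghdr *c;

	assert(l != NULL);
	memset(l, 0, sizeof(*l));
	l->iov.iov_base = l->data;
	l->iov.iov_len = sizeof(l->data);
	l->hdr.msg_iov = &l->iov;
	l->hdr.msg_iovlen = 1;
	l->hdr.msg_control = l->ctl.buf;
	l->hdr.msg_controllen = sizeof(l->ctl.buf);

	c = CMSG_FIRSTHDR(&l->hdr);
	c->cmsg_level = SOL_SOCKET;
	c->cmsg_type = MSG_COMPOUND_NEXT;
	c->cmsg_len = CMSG_LEN(sizeof(np)); /* payload is sizeof(void *) */
	memcpy(CMSG_DATA(c), &np, sizeof(np));
}

/* Follow the MSG_COMPOUND_NEXT control message, if any. */
static struct msghdr *chain_next(struct msghdr *h)
{
	struct cmsghdr *c;

	for (c = CMSG_FIRSTHDR(h); c; c = CMSG_NXTHDR(h, c)) {
		if (c->cmsg_level == SOL_SOCKET &&
		    c->cmsg_type == MSG_COMPOUND_NEXT) {
			struct msghdr *np;

			memcpy(&np, CMSG_DATA(c), sizeof(np));
			return np;
		}
	}
	return NULL;
}

/* Count leading msghdrs that the (imagined) kernel marked as filled. */
static size_t chain_count_filled(struct msghdr *head)
{
	size_t n = 0;
	struct msghdr *h;

	for (h = head; h && (h->msg_flags & MSG_COMPOUND); h = chain_next(h))
		n++;
	return n;
}
```

Because the chain pointers live in the control buffers and are never rewritten, userspace can still walk every link to release it, whether or not the link was filled.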
I didn't go into such detail when discussing this with Dave on IRC,
but I thought about something like using a setsockopt to tell the kernel
that the socket was in multiple message mode. Let me look at the
discussion to be faithful to it...
[18:22] <acme> I see, but the bastardization I was thinking was about just
putting a datagram per iovec instead of taking a datagram and go on
spilling it over the iovec entries, if some sockopt was set, as a first
try ;-)
[18:23] <davem> Oh I see
[18:23] <davem> that would work too
But I think that the interface I proposed, which was Dave's general idea,
should work as well for sendmmsg, to send multiple messages to
different destinations, using a marker such as a particular msg_iovlen
value to signal that the previous msg_iov/msg_iovlen should be reused
for a different destination.
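For comparison, here is a minimal sketch of sending one payload to several destinations with sendmmsg(2) as it later landed in mainline Linux (3.0). It does not use the msg_iovlen marker idea from this message; instead every header simply points at the same iovec. send_to_many and MAX_DESTS are hypothetical names introduced for the sketch.

```c
#define _GNU_SOURCE /* sendmmsg() and struct mmsghdr are Linux/GNU extensions */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define MAX_DESTS 16 /* hypothetical cap on the batch size */

/* Send the same payload to up to MAX_DESTS destinations in one syscall.
 * dests[i] may be NULL (with dest_lens[i] == 0) on a connected socket.
 * Returns the number of messages sent, or -1 with errno set. */
static int send_to_many(int fd, const void *payload, size_t len,
			struct sockaddr *const dests[],
			const socklen_t dest_lens[], unsigned int n)
{
	struct mmsghdr msgs[MAX_DESTS];
	/* One iovec shared by every message: the payload is not copied. */
	struct iovec iov = { .iov_base = (void *)payload, .iov_len = len };
	unsigned int i;

	assert(payload != NULL);
	if (n > MAX_DESTS)
		n = MAX_DESTS;

	memset(msgs, 0, sizeof(msgs));
	for (i = 0; i < n; i++) {
		msgs[i].msg_hdr.msg_name = dests[i];
		msgs[i].msg_hdr.msg_namelen = dest_lens[i];
		msgs[i].msg_hdr.msg_iov = &iov;
		msgs[i].msg_hdr.msg_iovlen = 1;
	}
	return sendmmsg(fd, msgs, n, 0);
}
```

On return, the msg_len field of each sent mmsghdr holds the byte count for that message; the sketch ignores it because every message carries the same payload.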
The reasoning behind the proposed interface was mostly to keep the
existing way of passing iovecs to the kernel, but this time around
passing multiple iovecs instead of just one.
Existing code would just have to make the iovecs, msg_name, etc be
arrays instead of rethinking how to talk to the kernel completely.
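The "turn everything into arrays" idea can be sketched as follows, using recvmmsg(2) as it later appeared in mainline Linux (2.6.33); the exact signature in this RFC patch may differ. BATCH, DGRAM, struct batch, batch_init and batch_recv are hypothetical names introduced for the sketch.

```c
#define _GNU_SOURCE /* recvmmsg() and struct mmsghdr are Linux/GNU extensions */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 8   /* hypothetical number of datagrams per call */
#define DGRAM 256 /* hypothetical per-datagram buffer size */

/* What used to be one msghdr, one iovec and one msg_name becomes arrays. */
struct batch {
	struct mmsghdr msgs[BATCH];
	struct iovec iovs[BATCH];
	struct sockaddr_storage names[BATCH];
	char bufs[BATCH][DGRAM];
};

/* Wire slot i of each array together, exactly as for a single recvmsg(). */
static void batch_init(struct batch *b)
{
	int i;

	assert(b != NULL);
	memset(b, 0, sizeof(*b));
	for (i = 0; i < BATCH; i++) {
		b->iovs[i].iov_base = b->bufs[i];
		b->iovs[i].iov_len = DGRAM;
		b->msgs[i].msg_hdr.msg_iov = &b->iovs[i];
		b->msgs[i].msg_hdr.msg_iovlen = 1;
		b->msgs[i].msg_hdr.msg_name = &b->names[i];
		b->msgs[i].msg_hdr.msg_namelen = sizeof(b->names[i]);
	}
}

/* Receive up to BATCH queued datagrams in one syscall without blocking.
 * Returns the number received, or -1 with errno set (EAGAIN if none). */
static int batch_recv(int fd, struct batch *b)
{
	batch_init(b);
	return recvmmsg(fd, b->msgs, BATCH, MSG_DONTWAIT, NULL);
}
```

After a successful call, msgs[i].msg_len holds the length of datagram i and names[i] its source address, so per-sender accounting needs no extra syscalls.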
So... let's hear more opinions :-)
Ah, I went to a local pub to relax and left three machines pounding
non-stop on a "chrt -f 1 ./rcvmmsg 5001 64" patched server, and it held up
for hours:
nr_datagrams received: 24
4352 bytes received from mica.ghostprotocols.net in 17 datagrams
1536 bytes received from doppio.ghostprotocols.net in 6 datagrams
256 bytes received from filo.ghostprotocols.net in 1 datagrams
nr_datagrams received: 18
256 bytes received from filo.ghostprotocols.net in 1 datagrams
3072 bytes received from doppio.ghostprotocols.net in 12 datagrams
256 bytes received from mica.ghostprotocols.net in 1 datagrams
256 bytes received from doppio.ghostprotocols.net in 1 datagrams
256 bytes received from mica.ghostprotocols.net in 1 datagrams
256 bytes received from doppio.ghostprotocols.net in 1 datagrams
256 bytes received from mica.ghostprotocols.net in 1 datagrams
nr_datagrams received: 26
5120 bytes received from mica.ghostprotocols.net in 20 datagrams
256 bytes received from filo.ghostprotocols.net in 1 datagrams
1280 bytes received from doppio.ghostprotocols.net in 5 datagrams
nr_datagrams received: 18
256 bytes received from filo.ghostprotocols.net in 1 datagrams
1792 bytes received from doppio.ghostprotocols.net in 7 datagrams
256 bytes received from filo.ghostprotocols.net in 1 datagrams
1792 bytes received from doppio.ghostprotocols.net in 7 datagrams
256 bytes received from mica.ghostprotocols.net in 1 datagrams
256 bytes received from do^C 256 bytes received from filo.ghostprotocols.net in 1 datagrams
:-)
- Arnaldo