From: Neil Horman <nhorman@tuxdriver.com>
To: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, Chris Van Hoof <vanhoof@redhat.com>,
	Clark Williams <williams@redhat.com>
Subject: Re: [RFC 1/2] net: Introduce recvmmsg socket syscall
Date: Wed, 20 May 2009 22:26:21 -0400
Message-ID: <20090521022621.GA2173@localhost.localdomain>
In-Reply-To: <20090521020541.GD5956@ghostprotocols.net>

On Wed, May 20, 2009 at 11:05:41PM -0300, Arnaldo Carvalho de Melo wrote:
> On Wed, May 20, 2009 at 08:46:34PM -0400, Neil Horman wrote:
> > On Wed, May 20, 2009 at 08:06:52PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Meaning receive multiple messages, reducing the number of syscalls and
> > > net stack entry/exit operations.
> > > 
> > > Next patches will introduce mechanisms where protocols that want to
> > > optimize this operation will provide an unlocked_recvmsg operation.
> > > 
> > > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > It's a neat idea. I like the possibility of saving lots of syscalls for
> > busy sockets, but I imagine the addition of a new syscall gives people pause.  I
> > wonder if we could simply augment the existing recvmsg syscall with a message
> > flag to indicate that multiple messages can be received on that call.
> > 
> > What I would propose looks something like:
> > 
> > 1) define a new flag for the msg_flags field of the msghdr, MSG_COMPOUND.  Setting
> > this on the call lets the protocol know we can store multiple messages
> > 
> > 2) if this flag is set the msg_control pointer should contain a cmsghdr with a
> > new type MSG_COMPOUND_NEXT, in which the size is sizeof(void *) and the data
> > contains a pointer to the next msghdr.
> > 
> > 3) The kernel can iteratively fill out buffers passed in through the chain,
> > setting the MSG_COMPOUND flag on each msghdr that contains valid data.  The
> > first msghdr to not have the MSG_COMPOUND flag set denotes the last buffer that
> > the kernel put valid data in.  This way the buffer chain pointer is kept
> > unchanged, and userspace can follow it to free the data if need be.
> > 
> > Thoughts?
> 
> I didn't go into such detail when discussing this with Dave on IRC,
> but I thought about something like using a setsockopt to tell the kernel
> that the socket was in multiple message mode, lemme look at the
> discussion to be faithful to it...
> 
> [18:22] <acme> I see, but the bastardization I was thinking was about just
> putting a datagram per iovec instead of taking a datagram and go on
> spilling it over the iovec entries, if some sockopt was set, as a first
> try ;-)
> [18:23] <davem> Oh I see
> [18:23] <davem> that would work too
> 
> But I think that the interface I proposed, that was Dave's general idea,
> should be ok as well for sendmmsg, to send multiple messages to
> different destinations using markings like one msg_iovlen to signal that
> the previous msg_iov/msg_iovlen should be used for a different
> destination.
> 
> The reasoning behind the proposed interface was to mostly keep the
> existing way of passing iovecs to the kernel, but this time around
> passing multiple iovecs instead of just one.
> 
> Existing code would just have to make the iovecs, msg_name, etc be
> arrays instead of rethinking how to talk to the kernel completely.
> 
> So... let's hear more opinions :-)
> 
I agree, your way of doing this definitely lets you layer on top of the existing
vetted implementation, which is nice. I just thought that avoiding the creation
of another syscall might be worth a little extra work in the kernel.  Instead of
arrays of msghdrs, we'd be looking at chains like this:
msghdr->(struct msghdr *)msg_control[i].data->msghdr->etc
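
For illustration only, here is a minimal userspace sketch of how such a chain
could be built and walked. MSG_COMPOUND and MSG_COMPOUND_NEXT do not exist in
any kernel; the values below are arbitrary placeholders, and struct
compound_link and the compound_* helpers are hypothetical names. It assumes one
reading of the proposal: the kernel marks each filled msghdr with MSG_COMPOUND,
so userspace counts the leading marked headers.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical placeholders: neither constant is defined by any kernel. */
#define MSG_COMPOUND      0x1000000
#define MSG_COMPOUND_NEXT 0x7f01

/* One link: a msghdr plus the control buffer that points at the next one. */
struct compound_link {
	struct msghdr hdr;
	union {
		struct cmsghdr align;	/* forces cmsghdr alignment */
		char buf[CMSG_SPACE(sizeof(struct msghdr *))];
	} ctl;
};

/* Store a pointer to 'next' in this link's control data. */
static void compound_set_next(struct compound_link *link, struct msghdr *next)
{
	struct cmsghdr *cmsg;

	memset(&link->ctl, 0, sizeof(link->ctl));
	link->hdr.msg_control = link->ctl.buf;
	link->hdr.msg_controllen = sizeof(link->ctl.buf);

	cmsg = CMSG_FIRSTHDR(&link->hdr);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = MSG_COMPOUND_NEXT;
	cmsg->cmsg_len = CMSG_LEN(sizeof(next));
	memcpy(CMSG_DATA(cmsg), &next, sizeof(next));
}

/* Return the next msghdr in the chain, or NULL at the end of the chain. */
static struct msghdr *compound_next(struct msghdr *hdr)
{
	struct msghdr *next = NULL;
	struct cmsghdr *cmsg;

	for (cmsg = CMSG_FIRSTHDR(hdr); cmsg; cmsg = CMSG_NXTHDR(hdr, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == MSG_COMPOUND_NEXT) {
			memcpy(&next, CMSG_DATA(cmsg), sizeof(next));
			break;
		}
	}
	return next;
}

/* Count the leading headers that carry the MSG_COMPOUND mark. */
static int compound_count_filled(struct msghdr *head)
{
	int n = 0;

	for (; head && (head->msg_flags & MSG_COMPOUND); head = compound_next(head))
		n++;
	return n;
}
```

The chain pointers live in the control buffers and are never rewritten while
walking, which matches the point that userspace can still follow the chain to
free its buffers afterwards.
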

Not too hard to parse, I don't think.  But I'll defer to brighter minds than
mine.  If the creation of another syscall isn't too difficult a barrier to
overcome (assuming this is going to occur for sendmsg, and various other i/o ops
as well), then your way here is probably the way to go.
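
For comparison, the array-based approach can be sketched with recvmmsg() as it
later appeared in mainline Linux and glibc, which may differ in detail from the
RFC patch in this thread. BATCH, DGRAM_MAX, struct batch and the batch_*
helpers are hypothetical names chosen for the example.

```c
#define _GNU_SOURCE		/* for recvmmsg() and struct mmsghdr */
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define BATCH     8	/* illustrative: max datagrams per call */
#define DGRAM_MAX 256	/* illustrative: buffer size per datagram */

/* Parallel arrays: one header, iovec, source address and buffer per slot. */
struct batch {
	struct mmsghdr msgs[BATCH];
	struct iovec iov[BATCH];
	struct sockaddr_storage from[BATCH];
	char data[BATCH][DGRAM_MAX];
};

/* Point every msghdr at its own iovec, buffer and source address slot. */
static void batch_init(struct batch *b)
{
	memset(b, 0, sizeof(*b));
	for (int i = 0; i < BATCH; i++) {
		b->iov[i].iov_base = b->data[i];
		b->iov[i].iov_len = DGRAM_MAX;
		b->msgs[i].msg_hdr.msg_iov = &b->iov[i];
		b->msgs[i].msg_hdr.msg_iovlen = 1;
		b->msgs[i].msg_hdr.msg_name = &b->from[i];
		b->msgs[i].msg_hdr.msg_namelen = sizeof(b->from[i]);
	}
}

/*
 * Receive up to BATCH datagrams in one syscall.  MSG_WAITFORONE blocks
 * until the first datagram arrives, then takes whatever else is queued.
 * Returns the number of datagrams received, or -1 on error; the length
 * of datagram i is in b->msgs[i].msg_len.
 */
static int batch_recv(int fd, struct batch *b)
{
	batch_init(b);
	return recvmmsg(fd, b->msgs, BATCH, MSG_WAITFORONE, NULL);
}
```

Existing recvmsg() callers keep their iovec and msg_name setup and only turn
each piece into an array, which is the layering benefit described above.
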
Neil

> Ah, I went to a local pub to relax and left three machines non-stop
> pounding a "chrt -f 1 ./rcvmmsg 5001 64" patched server and it held up
> for hours:
> 
> nr_datagrams received: 24
>     4352 bytes received from mica.ghostprotocols.net in 17 datagrams
>     1536 bytes received from doppio.ghostprotocols.net in 6 datagrams
>     256 bytes received from filo.ghostprotocols.net in 1 datagrams
> nr_datagrams received: 18
>     256 bytes received from filo.ghostprotocols.net in 1 datagrams
>     3072 bytes received from doppio.ghostprotocols.net in 12 datagrams
>     256 bytes received from mica.ghostprotocols.net in 1 datagrams
>     256 bytes received from doppio.ghostprotocols.net in 1 datagrams
>     256 bytes received from mica.ghostprotocols.net in 1 datagrams
>     256 bytes received from doppio.ghostprotocols.net in 1 datagrams
>     256 bytes received from mica.ghostprotocols.net in 1 datagrams
> nr_datagrams received: 26
>     5120 bytes received from mica.ghostprotocols.net in 20 datagrams
>     256 bytes received from filo.ghostprotocols.net in 1 datagrams
>     1280 bytes received from doppio.ghostprotocols.net in 5 datagrams
> nr_datagrams received: 18
>     256 bytes received from filo.ghostprotocols.net in 1 datagrams
>     1792 bytes received from doppio.ghostprotocols.net in 7 datagrams
>     256 bytes received from filo.ghostprotocols.net in 1 datagrams
>     1792 bytes received from doppio.ghostprotocols.net in 7 datagrams
>     256 bytes received from mica.ghostprotocols.net in 1 datagrams
>     256 bytes received from do^C    256 bytes received from filo.ghostprotocols.net in 1 datagrams
> 
> :-)
> 
> - Arnaldo
