* Sending big Netlink messages to userspace
@ 2008-06-24 16:38 Julius Volz
2008-06-24 17:00 ` Patrick McHardy
0 siblings, 1 reply; 9+ messages in thread
From: Julius Volz @ 2008-06-24 16:38 UTC (permalink / raw)
To: netdev; +Cc: Patrick McHardy, Vince Busam, Thomas Graf
Hi,
While adding a Netlink interface to IPVS, I've been wondering how to
properly send very big messages to userspace and found these posts:
http://lists.openwall.net/netdev/2007/03/06/214
http://lists.openwall.net/netdev/2007/03/07/2
In the second one, Herbert writes: "Dumps should be done using 4K
(NLMSG_GOODSIZE) skb's, where is the problem?" How is that meant to
work? Should one manually split a dump into several NLMSG_GOODSIZE
messages, or is there some mechanism for that?
I need to send arbitrarily long lists to userspace, and I'm already
choosing a big enough size for nlmsg_new(), so I get no put failures
while constructing the message. However, when receiving the data in
userspace (with libnl), the receive callback is never called. An
strace shows that MSG_TRUNC is set on the oversized message, so the
data is never fully received.
I just call nl_recvmsgs_default(sock) once (which does not return an
error). Am I handling libnl incorrectly or do I need to do this
differently on the kernel side?
Thanks,
Julius
--
Google Switzerland GmbH
* Re: Sending big Netlink messages to userspace
2008-06-24 16:38 Sending big Netlink messages to userspace Julius Volz
@ 2008-06-24 17:00 ` Patrick McHardy
2008-06-24 18:18 ` Julius Volz
2008-06-25 10:44 ` Thomas Graf
0 siblings, 2 replies; 9+ messages in thread
From: Patrick McHardy @ 2008-06-24 17:00 UTC (permalink / raw)
To: Julius Volz; +Cc: netdev, Vince Busam, Thomas Graf
Julius Volz wrote:
> Hi,
>
> While adding a Netlink interface to IPVS, I've been wondering how to
> properly send very big messages to userspace and found these posts:
>
> http://lists.openwall.net/netdev/2007/03/06/214
> http://lists.openwall.net/netdev/2007/03/07/2
>
> Herbert writes in the second one, "Dumps should be done using 4K
> (NLMSG_GOODSIZE) skb's, where is the problem?" How is that meant?
> Should one manually split up dumps into several NLMSG_GOODSIZE
> messages or is there some mechanism for that?
That's done automatically through netlink_dump_start().
You send one skb per dump callback invocation. The final
call returns a zero-sized skb to indicate the end of the
dump.
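To make that concrete, here is a rough sketch of such a dump callback for
a generic netlink .dumpit handler (generic netlink drives it through
netlink_dump_start() internally). All the IPVS-flavoured names
(ip_vs_dump_services, ip_vs_services, s_list, struct ip_vs_service,
fill_service) are made up for illustration; the relevant pattern is
resuming via cb->args[] and returning skb->len:

#include <net/genetlink.h>
#include <linux/list.h>

/* fill_service() is sketched further down in this thread; it emits one
 * netlink message (header + nested attributes) for one service, or
 * returns a negative value if the skb has no room left. */
static int fill_service(struct sk_buff *skb, struct ip_vs_service *svc);

static int ip_vs_dump_services(struct sk_buff *skb,
                               struct netlink_callback *cb)
{
        struct ip_vs_service *svc;
        int idx = 0, start = cb->args[0];

        list_for_each_entry(svc, &ip_vs_services, s_list) {
                if (idx++ < start)
                        continue;       /* already sent in a previous skb */
                if (fill_service(skb, svc) < 0) {
                        idx--;          /* no room left, resume here next time */
                        break;
                }
        }
        cb->args[0] = idx;

        /* Non-zero: send this skb and invoke us again.  Once an invocation
         * adds nothing, we return 0 and the core emits NLMSG_DONE. */
        return skb->len;
}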
> I need to send arbitrarily long lists to userspace and I'm already
> choosing a big enough size for nlmsg_new(), so I get no put failures
> while constructing the message. However, when receiving the data in
> userspace (with libnl), the receive callback is never called. An
> strace shows that MSG_TRUNC is set in the oversized message, so the
> data is never fully received.
You probably need to increase the receive buffer size in libnl.
> I just call nl_recvmsgs_default(sock) once (which does not return an
> error). Am I handling libnl incorrectly or do I need to do this
> differently on the kernel side?
It depends on what kind of attributes you're sending. In the case
of top-level attributes you should only dump objects until
you reach NLMSG_GOODSIZE and continue during the next dump
callback invocation. Sending arbitrary amounts of nested
data is more tricky, or might even be impossible currently.
* Re: Sending big Netlink messages to userspace
2008-06-24 17:00 ` Patrick McHardy
@ 2008-06-24 18:18 ` Julius Volz
2008-06-25 10:44 ` Thomas Graf
1 sibling, 0 replies; 9+ messages in thread
From: Julius Volz @ 2008-06-24 18:18 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netdev, Vince Busam, Thomas Graf
On Tue, Jun 24, 2008, Patrick McHardy wrote:
> That's done automatically through netlink_dump_start().
> You send one skb per dump callback invocation. The final
> call returns a zero-sized skb to indicate the end of the
> dump.
Thanks for the pointer, that seems like what I was looking for!
> You probably need to increase the receive buffer size in libnl.
libnl hides this from the user, but from reading the code it looks
like it now has a provision for increasing the receive buffer size
automatically on seeing MSG_TRUNC. However, it doesn't seem to help in
my case...
> It depends on what kind of attributes you're sending. In the case
> of top-level attributes you should only dump objects until
> you reach NLMSG_GOODSIZE and continue during the next dump
> callback invocation. Sending arbitrary amounts of nested
> data is more tricky, or might even be impossible currently.
Then I probably have a problem because the list items I'm sending are
nested and the whole list itself is contained in a nested attribute:
IPVS_ENTRY_ATTR_SERVICE_LIST
    IPVS_ENTRY_ATTR_SERVICE
        [service attributes]
        ...
    IPVS_ENTRY_ATTR_SERVICE
        [service attributes]
        ...
    ...
Each list entry has a limited size, however, so if I get rid of the
top-level wrapper (SERVICE_LIST), I could try to see if there is
enough space left in the skb for another whole nested service entry
and continue in the next callback invocation if space runs out.
Actually, nla_nest_cancel() seems to be what I'm looking for: start
dumping service entries into the skb, call nla_nest_cancel() on a put
failure and begin with the canceled element on the next invocation.
That should work, right?
Julius
--
Google Switzerland GmbH
* Re: Sending big Netlink messages to userspace
2008-06-24 17:00 ` Patrick McHardy
2008-06-24 18:18 ` Julius Volz
@ 2008-06-25 10:44 ` Thomas Graf
2008-06-25 18:56 ` Julius Volz
2008-06-25 22:51 ` David Miller
1 sibling, 2 replies; 9+ messages in thread
From: Thomas Graf @ 2008-06-25 10:44 UTC (permalink / raw)
To: Patrick McHardy, Julius Volz; +Cc: netdev, Vince Busam
* Julius Volz <juliusv@google.com> 2008-06-24 20:18
> libnl hides this from the user, but from reading the code it looks
> like it now has a provision for increasing the receive buffer size
> automatically on seeing MSG_TRUNC. However, it doesn't seem to help in
> my case...
libnl initializes the buffer size to the size of a page. The value
can be overridden by calling nl_set_buffer_size(socket, rx, tx).
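A minimal libnl 1.x usage sketch of that call, with arbitrary example
sizes and all error handling omitted; consult the libnl documentation
for the precise meaning of the rx and tx arguments:

#include <netlink/netlink.h>
#include <netlink/genl/genl.h>

int main(void)
{
        struct nl_handle *sock = nl_handle_alloc();

        genl_connect(sock);

        /* Example only: raise the receive buffer to 32K before receiving. */
        nl_set_buffer_size(sock, 32768, 32768);

        /* ... send the request, then nl_recvmsgs_default(sock) ... */
        return 0;
}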
* Patrick McHardy <kaber@trash.net> 2008-06-24 19:00
> > It depends on what kind of attributes you're sending. In the case
> > of top-level attributes you should only dump objects until
> > you reach NLMSG_GOODSIZE and continue during the next dump
> > callback invocation. Sending arbitrary amounts of nested
> > data is more tricky, or might even be impossible currently.
>
> Then I probably have a problem because the list items I'm sending are
> nested and the whole list itself is contained in a nested attribute:
>
> IPVS_ENTRY_ATTR_SERVICE_LIST
>     IPVS_ENTRY_ATTR_SERVICE
>         [service attributes]
>         ...
>     IPVS_ENTRY_ATTR_SERVICE
>         [service attributes]
>         ...
>     ...
>
> Each list entry has a limited size, however, so if I get rid of the
> top-level wrapper (SERVICE_LIST), I could try to see if there is
> enough space left in the skb for another whole nested service entry
> and continue in the next callback invocation if space runs out.
>
> Actually, nla_nest_cancel() seems to be what I'm looking for: start
> dumping service entries into the skb, call nla_nest_cancel() on a put
> failure and begin with the canceled element on the next invocation.
> That should work, right?
Yes, that's how it's intended to work. You can fill the skb until one
of the nla_put variants fails and then trim off what you've already
added using nla_nest_cancel() until you reach a point where you can
safely cut your stream of attributes. So far we have typically made
sure that this boundary is at the netlink message level, but that is
not a requirement. You may split a list across several messages; you
just have to make sure that you call nla_nest_end() and nlmsg_end()
properly before you send the message.
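As a sketch, the fill_service() routine referenced in the dump-callback
example earlier in the thread could look roughly like this, assuming one
netlink message per service. The family, command, attribute names and
struct members (ip_vs_genl_family, IPVS_CMD_GET_SERVICE,
IPVS_SVC_ATTR_*, svc->port, svc->fwmark) are hypothetical, and the 0/0
pid/seq arguments are simplified (a real dump would take them from the
netlink_callback):

#include <net/genetlink.h>

static int fill_service(struct sk_buff *skb, struct ip_vs_service *svc)
{
        void *hdr;
        struct nlattr *nest;

        hdr = genlmsg_put(skb, 0, 0, &ip_vs_genl_family, NLM_F_MULTI,
                          IPVS_CMD_GET_SERVICE);
        if (!hdr)
                return -EMSGSIZE;

        nest = nla_nest_start(skb, IPVS_ENTRY_ATTR_SERVICE);
        if (!nest)
                goto msg_cancel;

        if (nla_put_u16(skb, IPVS_SVC_ATTR_PORT, svc->port) ||
            nla_put_u32(skb, IPVS_SVC_ATTR_FWMARK, svc->fwmark))
                goto nest_cancel;

        nla_nest_end(skb, nest);
        genlmsg_end(skb, hdr);
        return 0;

nest_cancel:
        /* Trim the partially filled nest, ... */
        nla_nest_cancel(skb, nest);
msg_cancel:
        /* ... drop the message header, and let the dump callback retry
         * this object in the next skb. */
        genlmsg_cancel(skb, hdr);
        return -EMSGSIZE;
}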
* Re: Sending big Netlink messages to userspace
2008-06-25 10:44 ` Thomas Graf
@ 2008-06-25 18:56 ` Julius Volz
2008-06-26 10:01 ` Thomas Graf
2008-06-25 22:51 ` David Miller
1 sibling, 1 reply; 9+ messages in thread
From: Julius Volz @ 2008-06-25 18:56 UTC (permalink / raw)
To: Thomas Graf; +Cc: Patrick McHardy, netdev, Vince Busam
On Wed, Jun 25, 2008, Thomas Graf wrote:
> libnl initializes the buffer size to the size of a page. The value
> can be overridden by calling nl_set_buffer_size(socket, rx, tx).
Thanks, I missed that. Though now I won't need that anymore as I got
the dump working!
> Yes, that's how it's intended to work. You can fill the skb until one
> of the nla_put variants fails and then trim off what you've already
> added using nla_nest_cancel() until you reach a point where you can
> safely cut your stream of attributes. So far we have typically made
> sure that this boundary is at the netlink message level, but that is
> not a requirement. You may split a list across several messages; you
> just have to make sure that you call nla_nest_end() and nlmsg_end()
> properly before you send the message.
OK, I did exactly that! It took me a while to figure out all of the
necessary quirks (including the userland side), but now it works! :)
So userspace now gets one callback invocation per part of a multipart
message. I guess it's not possible to aggregate the data into just one
callback invocation?
Thanks again!
Julius
--
Google Switzerland GmbH
* Re: Sending big Netlink messages to userspace
2008-06-25 10:44 ` Thomas Graf
2008-06-25 18:56 ` Julius Volz
@ 2008-06-25 22:51 ` David Miller
2008-06-26 0:41 ` Patrick McHardy
1 sibling, 1 reply; 9+ messages in thread
From: David Miller @ 2008-06-25 22:51 UTC (permalink / raw)
To: tgraf; +Cc: kaber, juliusv, netdev, vbusam
From: Thomas Graf <tgraf@suug.ch>
Date: Wed, 25 Jun 2008 12:44:01 +0200
> * Julius Volz <juliusv@google.com> 2008-06-24 20:18
> > libnl hides this from the user, but from reading the code it looks
> > like it now has a provision for increasing the receive buffer size
> > automatically on seeing MSG_TRUNC. However, it doesn't seem to help in
> > my case...
>
> libnl initializes the buffer size to the size of a page. The value
> can be overridden by calling nl_set_buffer_size(socket, rx, tx).
And it doesn't need to be any larger than a page. Actually, the
hard upper bound is 8K.
The kernel will always chop the response up into chunks of that
size or smaller when generating replies to userspace.
Therefore userland need never have a buffer larger than 8K or
sysconf(_SC_PAGESIZE), whichever is smaller.
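Expressed as a sketch (the helper name is arbitrary), that sizing rule
for dump replies is simply:

#include <unistd.h>

/* Never more than min(8K, page size) is needed for dump replies; with
 * the usual 4K page size, libnl's page-sized default already suffices. */
static int dump_rx_bufsize(void)
{
        long page = sysconf(_SC_PAGESIZE);

        return page < 8192 ? (int)page : 8192;
}

/* e.g. nl_set_buffer_size(sock, dump_rx_bufsize(), dump_rx_bufsize()); */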
* Re: Sending big Netlink messages to userspace
2008-06-25 22:51 ` David Miller
@ 2008-06-26 0:41 ` Patrick McHardy
2008-06-26 15:39 ` Julius Volz
0 siblings, 1 reply; 9+ messages in thread
From: Patrick McHardy @ 2008-06-26 0:41 UTC (permalink / raw)
To: David Miller; +Cc: tgraf, juliusv, netdev, vbusam
David Miller wrote:
> From: Thomas Graf <tgraf@suug.ch>
> Date: Wed, 25 Jun 2008 12:44:01 +0200
>
>> libnl initializes the buffer size to the size of a page. The value
>> can be overridden by calling nl_set_buffer_size(socket, rx, tx).
>
> And it doesn't need to be any larger than a page. Actually, the
> hard upper bound is 8K.
>
> The kernel will always chop the response up into chunks of that
> size or smaller when generating replies to userspace.
>
> Therefore userland need never have a buffer larger than 8K or
> sysconf(_SC_PAGESIZE), whichever is smaller.
For dumps that size is fine; for large unicast messages it
might need to be increased, though (as with nfnetlink_queue).
* Re: Sending big Netlink messages to userspace
2008-06-25 18:56 ` Julius Volz
@ 2008-06-26 10:01 ` Thomas Graf
0 siblings, 0 replies; 9+ messages in thread
From: Thomas Graf @ 2008-06-26 10:01 UTC (permalink / raw)
To: Julius Volz; +Cc: Patrick McHardy, netdev, Vince Busam
* Julius Volz <juliusv@google.com> 2008-06-25 20:56
> So userspace now gets one callback invocation per part of a multipart
> message. I guess it's not possible to aggregate the data into just one
> callback invocation?
Typically the aggregation is done when multipart messages are fed into
a cache/collection which is then provided to the caller. This requires
the caller to implement object and cache operations and is based on
the assumption that a single netlink message represents a single object.
In your case I'd suggest doing the aggregation on your own.
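A rough sketch of doing that aggregation by hand with libnl's callback
hooks: a custom valid-message callback appends one entry per received
message to a caller-owned collection, and nl_recvmsgs() keeps going
until the dump's NLMSG_DONE arrives. The entry structure and the
parsing step are placeholders; only the callback plumbing is the point:

#include <netlink/netlink.h>
#include <netlink/handlers.h>
#include <netlink/msg.h>

struct svc_list {
        struct svc_entry *entries;      /* hypothetical per-service struct */
        int count;
};

static int collect_service(struct nl_msg *msg, void *arg)
{
        struct svc_list *list = arg;

        /* Parse nlmsg_hdr(msg) and its attributes into
         * list->entries[list->count] here, e.g. with genlmsg_parse()
         * and nla_get_*(). */
        list->count++;

        return NL_OK;           /* keep receiving the rest of the dump */
}

int collect_all(struct nl_handle *sock, struct svc_list *list)
{
        struct nl_cb *cb = nl_cb_alloc(NL_CB_DEFAULT);

        nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, collect_service, list);

        /* Returns once NLMSG_DONE has been seen, so after this call
         * 'list' holds the whole aggregated dump. */
        return nl_recvmsgs(sock, cb);
}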
* Re: Sending big Netlink messages to userspace
2008-06-26 0:41 ` Patrick McHardy
@ 2008-06-26 15:39 ` Julius Volz
0 siblings, 0 replies; 9+ messages in thread
From: Julius Volz @ 2008-06-26 15:39 UTC (permalink / raw)
To: Patrick McHardy; +Cc: David Miller, tgraf, netdev, vbusam
On Thu, Jun 26, 2008 at 2:41 AM, Patrick McHardy <kaber@trash.net> wrote:
>> Therefore userland need never have a buffer larger than 8K or
>> sysconf(_SC_PAGESIZE), whichever is smaller.
>
> For dumps the size is fine, for large unicast messages it
> might need to be increased though (like nfnetlink_queue).
OK, I've now put everything that is a variable-size list into dumps:
one object (list entry) per message of the multipart dump, with several
such messages packed into the same skb until it fills up.
All the other reply messages are of a fixed size and very small, so
fortunately I don't need to increase buffer sizes anywhere.
Julius
--
Google Switzerland GmbH