* rtnetlink and many VFs
@ 2011-04-21 14:36 Ben Hutchings
2011-04-21 17:02 ` Rose, Gregory V
0 siblings, 1 reply; 9+ messages in thread
From: Ben Hutchings @ 2011-04-21 14:36 UTC (permalink / raw)
To: David Miller; +Cc: netdev, sf-linux-drivers
My colleagues have been working on SR-IOV support for sfc. The hardware
supports up to 127 VFs per port.
If we configure all 127 VFs through the net device, an RTM_GETLINK dump
will need to include messages describing them, with a total size of:
    127 * (sizeof(struct ifla_vf_mac) + sizeof(struct ifla_vf_vlan) +
           sizeof(struct ifla_vf_tx_rate) + protocol overhead)
    > 7112
These messages are nested within the message describing the device as a
whole, so they cannot be split. The maximum size of an outgoing netlink
message, based on NLMSG_GOODSIZE, seems to be min(PAGE_SIZE, 8192). So
when PAGE_SIZE = 4096 it is simply impossible to dump information about
such a device!
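As a rough check of the arithmetic above, here is a minimal Python sketch. The three structure layouts are assumptions that mirror what linux/if_link.h is believed to have defined at the time (a 32-bit VF index plus the payload fields), and the limit function simply restates the min(PAGE_SIZE, 8192) reading given in the message. The names vf_payload_bytes and single_message_limit are hypothetical helpers, not kernel or iproute2 APIs.

```python
import ctypes


# Assumed layouts, modelled on the IFLA_VF_* structures in linux/if_link.h.
class IflaVfMac(ctypes.Structure):
    _fields_ = [("vf", ctypes.c_uint32), ("mac", ctypes.c_uint8 * 32)]


class IflaVfVlan(ctypes.Structure):
    _fields_ = [
        ("vf", ctypes.c_uint32),
        ("vlan", ctypes.c_uint32),
        ("qos", ctypes.c_uint32),
    ]


class IflaVfTxRate(ctypes.Structure):
    _fields_ = [("vf", ctypes.c_uint32), ("rate", ctypes.c_uint32)]


def vf_payload_bytes(num_vfs):
    """Bytes of bare structure payload for num_vfs VFs, with no netlink overhead."""
    per_vf = sum(ctypes.sizeof(s) for s in (IflaVfMac, IflaVfVlan, IflaVfTxRate))
    return num_vfs * per_vf


def single_message_limit(page_size):
    """Single-message limit as described in the message: min(PAGE_SIZE, 8192)."""
    return min(page_size, 8192)


if __name__ == "__main__":
    print(vf_payload_bytes(127), single_message_limit(4096))
```

The payload alone for 127 VFs already exceeds a 4096-byte limit, before any attribute headers or the rest of the link message are counted.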
I think it needs to be made possible to grow a netlink skb during
generation of the first message. Userspace may still be unable to
receive the large message but at least it has a chance.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: rtnetlink and many VFs
From: Rose, Gregory V @ 2011-04-21 17:02 UTC (permalink / raw)
To: Ben Hutchings, David Miller; +Cc: netdev, sf-linux-drivers
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Ben Hutchings
> Sent: Thursday, April 21, 2011 7:36 AM
> To: David Miller
> Cc: netdev; sf-linux-drivers
> Subject: rtnetlink and many VFs
>
> My colleagues have been working on SR-IOV support for sfc. The hardware
> supports up to 127 VFs per port.
>
> If we configure all 127 VFs through the net device, an RTM_GETLINK dump
> will need to include messages describing them, with a total size of:
>
> 127 * (sizeof(struct ifla_vf_mac) + sizeof(struct ifla_vf_vlan) +
> sizeof(struct ifla_vf_tx_rate) + protocol overhead)
> > 7112
>
> These messages are nested within the message describing the device as a
> whole, so they cannot be split. The maximum size of an outgoing netlink
> message, based on NLMSG_GOODSIZE, seems to be min(PAGE_SIZE, 8192). So
> when PAGE_SIZE = 4096 it is simply impossible to dump information about
> such a device!
>
> I think it needs to be made possible to grow a netlink skb during
> generation of the first message. Userspace may still be unable to
> receive the large message but at least it has a chance.
I've been looking at this one too. The limit seems to be about 40 or so in the most common case. My netlink fu is weak but I've been looking at the code in iproute2/ip and netlink to see what we can do about it.
As more VFs become possible it really needs a fix. I was thinking about something along the lines of this:
# ip link show eth(x) vf (n)
Where eth(x) is the physical function that owns the VFs and (n) is the specific VF you want information for. That way one could easily script something that loops through the VFs and gets the information for each. This really becomes necessary when we start adding additional MAC and VLAN filters for each VF that need to be displayed. In that case you can only show a few VFs before you run out of space.
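The scripting idea could look roughly like the Python sketch below. It only builds the command lines and does not run them, because the 'ip link show <dev> vf <n>' form is the syntax proposed in this message, not something iproute2 is confirmed to accept. The function name proposed_vf_show_commands and the device name eth0 are hypothetical.

```python
def proposed_vf_show_commands(pf, num_vfs):
    """Build one command per VF using the syntax proposed in this thread.

    The 'vf <n>' selector on 'ip link show' is an assumption taken from the
    proposal; the commands are returned as argv lists and never executed.
    """
    return [["ip", "link", "show", pf, "vf", str(vf)] for vf in range(num_vfs)]


if __name__ == "__main__":
    for argv in proposed_vf_show_commands("eth0", 3):
        print(" ".join(argv))
```

Each query would then return the data for a single VF, so the reply size would no longer grow with the number of VFs configured on the PF.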
In any case I've been working on an RFC patch for this and hope to have it soon. I consider this a pretty serious limitation and one could even view it as a bug.
- Greg
Greg Rose
LAD Division
Intel Corp.
* RE: rtnetlink and many VFs
From: Ben Hutchings @ 2011-04-21 17:40 UTC (permalink / raw)
To: Rose, Gregory V; +Cc: David Miller, netdev, sf-linux-drivers
On Thu, 2011-04-21 at 10:02 -0700, Rose, Gregory V wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > On Behalf Of Ben Hutchings
> > Sent: Thursday, April 21, 2011 7:36 AM
> > To: David Miller
> > Cc: netdev; sf-linux-drivers
> > Subject: rtnetlink and many VFs
> >
> > My colleagues have been working on SR-IOV support for sfc. The hardware
> > supports up to 127 VFs per port.
> >
> > If we configure all 127 VFs through the net device, an RTM_GETLINK dump
> > will need to include messages describing them, with a total size of:
> >
> > 127 * (sizeof(struct ifla_vf_mac) + sizeof(struct ifla_vf_vlan) +
> > sizeof(struct ifla_vf_tx_rate) + protocol overhead)
> > > 7112
> >
> > These messages are nested within the message describing the device as a
> > whole, so they cannot be split. The maximum size of an outgoing netlink
> > message, based on NLMSG_GOODSIZE, seems to be min(PAGE_SIZE, 8192). So
> > when PAGE_SIZE = 4096 it is simply impossible to dump information about
> > such a device!
> >
> > I think it needs to be made possible to grow a netlink skb during
> > generation of the first message. Userspace may still be unable to
> > receive the large message but at least it has a chance.
>
> I've been looking at this one too. The limit seems to be about 40 or
> so in the most common case.
Right. When Steve Hodgson investigated this here, he found that 46 VFs
would fit.
> My netlink fu is weak but I've been looking at the code in iproute2/ip
> and netlink to see what we can do about it.
>
> As more VFs become possible it really needs a fix. I was thinking
> about something along the lines of this:
>
> # ip link show eth(x) vf (n)
>
> Where eth(x) is the physical function that owns the VFs and (n) is the
> specific VF you want information for. That way one could easily
> script something that loops through the VFs and gets the information
> for each. This really becomes necessary when we start adding
> additional MAC and VLAN filters for each VF that need to be displayed.
> In that case you can only show a few VFs before you run out of space.
I think that what 'ip link show' is doing now seems to be perfectly
valid. It allocates a 16K buffer which would be enough if netlink
didn't apply this PAGE_SIZE limit to single messages.
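For illustration, a userspace RTM_GETLINK dump with a 16K receive buffer might be sketched in Python as follows. This is not iproute2's code: the request layout (an nlmsghdr followed by an ifinfomsg) and the numeric constants are taken from the Linux netlink headers as best understood here, and the function names are hypothetical. The socket part only works on Linux.

```python
import socket
import struct

RTM_GETLINK = 18
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300          # NLM_F_ROOT | NLM_F_MATCH
NLMSG_ERROR = 2
NLMSG_DONE = 3
RECV_BUF = 16 * 1024        # the 16K userspace buffer mentioned above

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change


def build_getlink_dump(seq):
    """Return the bytes of an RTM_GETLINK dump request."""
    body = IFINFOMSG.pack(socket.AF_UNSPEC, 0, 0, 0, 0)
    header = NLMSGHDR.pack(
        NLMSGHDR.size + len(body), RTM_GETLINK, NLM_F_REQUEST | NLM_F_DUMP, seq, 0
    )
    return header + body


def largest_link_message(seq=1):
    """Run the dump and return the size of the largest message seen (Linux only)."""
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE) as sock:
        sock.bind((0, 0))
        sock.send(build_getlink_dump(seq))
        largest = 0
        while True:
            data = sock.recv(RECV_BUF)
            if not data:
                return largest
            offset = 0
            while offset + NLMSGHDR.size <= len(data):
                length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
                if msg_type in (NLMSG_DONE, NLMSG_ERROR) or length < NLMSGHDR.size:
                    return largest
                largest = max(largest, length)
                offset += (length + 3) & ~3  # messages are 4-byte aligned
```

Calling largest_link_message() on a test machine gives a quick view of how close the per-device messages come to the kernel's single-message limit.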
> In any case I've been working on an RFC patch for this and hope to
> have it soon. I consider this a pretty serious limitation and one
> could even view it as a bug.
It is certainly a bug. rtnetlink is the currently favoured API for
querying and configuring network device settings, but there are now
valid device settings that cannot be queried.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: rtnetlink and many VFs
From: Rose, Gregory V @ 2011-04-21 17:50 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David Miller, netdev, sf-linux-drivers
> -----Original Message-----
> From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> Sent: Thursday, April 21, 2011 10:40 AM
> To: Rose, Gregory V
> Cc: David Miller; netdev; sf-linux-drivers
> Subject: RE: rtnetlink and many VFs
>
> On Thu, 2011-04-21 at 10:02 -0700, Rose, Gregory V wrote:
> > > -----Original Message-----
> > > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > > On Behalf Of Ben Hutchings
> > > Sent: Thursday, April 21, 2011 7:36 AM
> > > To: David Miller
> > > Cc: netdev; sf-linux-drivers
> > > Subject: rtnetlink and many VFs
> > >
> >
> > As more VFs become possible it really needs a fix. I was thinking
> > about something along the lines of this:
> >
> > # ip link show eth(x) vf (n)
> >
> > Where eth(x) is the physical function that owns the VFs and (n) is the
> > specific VF you want information for. That way one could easily
> > script something that loops through the VFs and gets the information
> > for each. This really becomes necessary when we start adding
> > additional MAC and VLAN filters for each VF that need to be displayed.
> > In that case you can only show a few VFs before you run out of space.
>
> I think that what 'ip link show' is doing now seems to be perfectly
> valid. It allocates a 16K buffer which would be enough if netlink
> didn't apply this PAGE_SIZE limit to single messages.
Ah, I hadn't seen that it was allocating 16K. That would then be enough for 128 VFs, but in the future it would not be enough for 40Gig (or higher speed) devices that might support two or four times that many VFs. I still feel like eventually the number of VFs will outgrow what a single message can handle, especially when VFs can have multiple MAC address and VLAN filters assigned to them. And it seems orthogonal to me to mirror the 'ip link set eth(x) vf (n)' syntax with an 'ip link show eth(x) vf (n)' syntax.
That's just me though.
>
> It is certainly a bug. rtnetlink is the currently favoured API for
> querying and configuring network device settings, but there are now
> valid device settings that cannot be queried.
I'll look into just fixing the bug for now and reserve (at least what I consider to be) future improvements for later.
- Greg
* RE: rtnetlink and many VFs
From: Ben Hutchings @ 2011-04-21 18:11 UTC (permalink / raw)
To: Rose, Gregory V; +Cc: David Miller, netdev, sf-linux-drivers
On Thu, 2011-04-21 at 10:50 -0700, Rose, Gregory V wrote:
> > -----Original Message-----
> > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > Sent: Thursday, April 21, 2011 10:40 AM
> > To: Rose, Gregory V
> > Cc: David Miller; netdev; sf-linux-drivers
> > Subject: RE: rtnetlink and many VFs
> >
> > On Thu, 2011-04-21 at 10:02 -0700, Rose, Gregory V wrote:
> > > > -----Original Message-----
> > > > From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> > > > On Behalf Of Ben Hutchings
> > > > Sent: Thursday, April 21, 2011 7:36 AM
> > > > To: David Miller
> > > > Cc: netdev; sf-linux-drivers
> > > > Subject: rtnetlink and many VFs
> > > >
> > >
> > > As more VFs become possible it really needs a fix. I was thinking
> > > about something along the lines of this:
> > >
> > > # ip link show eth(x) vf (n)
> > >
> > > Where eth(x) is the physical function that owns the VFs and (n) is the
> > > specific VF you want information for. That way one could easily
> > > script something that loops through the VFs and gets the information
> > > for each. This really becomes necessary when we start adding
> > > additional MAC and VLAN filters for each VF that need to be displayed.
> > > In that case you can only show a few VFs before you run out of space.
> >
> > I think that what 'ip link show' is doing now seems to be perfectly
> > valid. It allocates a 16K buffer which would be enough if netlink
> > didn't apply this PAGE_SIZE limit to single messages.
>
> Ah, I hadn't seen that it was allocating 16K, that would then be
> enough for 128 VFs but in the future would not be enough for 40Gig (or
> higher speed) devices that might support two or 4 times that many VFs.
There are only 256 available function addresses per PCIe link, so there
can be at most 255 VFs associated with a single PF.
(The function number could be extended further by effectively assigning
multiple bus numbers to a link, but that would be a significantly more
disruptive change than ARI.)
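The limit above can be turned into a rough size estimate, sketched here in Python. The assumptions are an 8-bit function number under ARI with one function taken by the PF, a 4-byte netlink attribute header, and one nested attribute per VF holding the three per-VF structures with payloads of 36, 12 and 8 bytes. The names max_vfs_per_pf and vfinfo_list_bytes are hypothetical, and the rest of the link message is not counted.

```python
NLA_HDRLEN = 4                  # assumed attribute header: u16 length + u16 type
VF_ATTR_PAYLOADS = (36, 12, 8)  # assumed sizes of the MAC, VLAN and TX rate structs


def max_vfs_per_pf(function_number_bits=8):
    """Functions addressable on one link, minus the PF itself."""
    return 2 ** function_number_bits - 1


def vfinfo_list_bytes(num_vfs):
    """Estimated size of the nested VF list for num_vfs VFs."""
    per_vf = NLA_HDRLEN + sum(NLA_HDRLEN + size for size in VF_ATTR_PAYLOADS)
    return NLA_HDRLEN + num_vfs * per_vf


if __name__ == "__main__":
    for n in (127, max_vfs_per_pf()):
        print(n, vfinfo_list_bytes(n))
```

Under these assumptions the estimate for 127 VFs stays below 16K, while the estimate for 255 VFs comes out above it, so the figures should be read as a rough guide only.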
> I still feel like eventually the number of VFs will outgrow the
> capability of a single message to handle, especially when VFs will
> have the capability of having multiple MAC address and VLAN filters
> assigned to them. And it seems orthogonal to me to mirror the 'ip
> link set eth(x) vf (n)' syntax with a 'ip link show eth(x) vf (n)'
> syntax.
>
> That's just me though.
[...]
I think it would be a useful extension, but we have to keep the current
API working as far as possible.
Ben.
--
Ben Hutchings, Senior Software Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
* RE: rtnetlink and many VFs
From: Rose, Gregory V @ 2011-04-21 18:28 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David Miller, netdev, sf-linux-drivers
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Ben Hutchings
> Sent: Thursday, April 21, 2011 11:12 AM
> To: Rose, Gregory V
> Cc: David Miller; netdev; sf-linux-drivers
> Subject: RE: rtnetlink and many VFs
>
> On Thu, 2011-04-21 at 10:50 -0700, Rose, Gregory V wrote:
> > > -----Original Message-----
> > > From: Ben Hutchings [mailto:bhutchings@solarflare.com]
> > > Sent: Thursday, April 21, 2011 10:40 AM
> > > To: Rose, Gregory V
> > > Cc: David Miller; netdev; sf-linux-drivers
> > > Subject: RE: rtnetlink and many VFs
> > >
>
> > I still feel like eventually the number of VFs will outgrow the
> > capability of a single message to handle, especially when VFs will
> > have the capability of having multiple MAC address and VLAN filters
> > assigned to them. And it seems orthogonal to me to mirror the 'ip
> > link set eth(x) vf (n)' syntax with a 'ip link show eth(x) vf (n)'
> > syntax.
> >
> > That's just me though.
> [...]
>
> I think it would be a useful extension, but we have to keep the current
> API working as far as possible.
Sure, sounds good. I'm working on something that will patch things up for the present use case right now.
- Greg
* RE: rtnetlink and many VFs
From: Rose, Gregory V @ 2011-04-22 22:29 UTC (permalink / raw)
To: Ben Hutchings; +Cc: David Miller, netdev, sf-linux-drivers
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-owner@vger.kernel.org]
> On Behalf Of Ben Hutchings
> Sent: Thursday, April 21, 2011 11:12 AM
> To: Rose, Gregory V
> Cc: David Miller; netdev; sf-linux-drivers
> Subject: RE: rtnetlink and many VFs
>
> > > I think that what 'ip link show' is doing now seems to be perfectly
> > > valid. It allocates a 16K buffer which would be enough if netlink
> > > didn't apply this PAGE_SIZE limit to single messages.
> >
If someone who knew (or knows) a bit more about netlink could point out where this limitation is set in the code I'd really appreciate it. I've been poking around for a few hours and still can't find it. I found the spot in the iproute2 code where the 16k is allocated and traced it down to the netlink socket call but from the kernel side I just can't find this PAGE_SIZE limit invocation.
- Greg
* Re: rtnetlink and many VFs
From: David Miller @ 2011-04-22 22:31 UTC (permalink / raw)
To: gregory.v.rose; +Cc: bhutchings, netdev, linux-net-drivers
From: "Rose, Gregory V" <gregory.v.rose@intel.com>
Date: Fri, 22 Apr 2011 15:29:41 -0700
> If someone who knew (or knows) a bit more about netlink could point
> out where this limitation is set in the code I'd really appreciate
> it. I've been poking around for a few hours and still can't find
> it.
NLMSG_GOODSIZE
* RE: rtnetlink and many VFs
From: Rose, Gregory V @ 2011-04-22 22:49 UTC (permalink / raw)
To: David Miller
Cc: bhutchings@solarflare.com, netdev@vger.kernel.org,
linux-net-drivers@solarflare.com
> -----Original Message-----
> From: David Miller [mailto:davem@davemloft.net]
> Sent: Friday, April 22, 2011 3:32 PM
> To: Rose, Gregory V
> Cc: bhutchings@solarflare.com; netdev@vger.kernel.org; linux-net-drivers@solarflare.com
> Subject: Re: rtnetlink and many VFs
>
> From: "Rose, Gregory V" <gregory.v.rose@intel.com>
> Date: Fri, 22 Apr 2011 15:29:41 -0700
>
> > If someone who knew (or knows) a bit more about netlink could point
> > out where this limitation is set in the code I'd really appreciate
> > it. I've been poking around for a few hours and still can't find
> > it.
>
> NLMSG_GOODSIZE
Thank you sir!
- Greg
Thread overview: 9+ messages
2011-04-21 14:36 rtnetlink and many VFs Ben Hutchings
2011-04-21 17:02 ` Rose, Gregory V
2011-04-21 17:40 ` Ben Hutchings
2011-04-21 17:50 ` Rose, Gregory V
2011-04-21 18:11 ` Ben Hutchings
2011-04-21 18:28 ` Rose, Gregory V
2011-04-22 22:29 ` Rose, Gregory V
2011-04-22 22:31 ` David Miller
2011-04-22 22:49 ` Rose, Gregory V