* Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() @ 2012-01-27 12:36 David Vrabel 2012-01-27 21:53 ` David Miller 0 siblings, 1 reply; 6+ messages in thread From: David Vrabel @ 2012-01-27 12:36 UTC (permalink / raw) To: netdev@vger.kernel.org; +Cc: Greg Rose, David S. Miller Changeset c7ac8679bec9397afe8918f788cbcef88c38da54 (rtnetlink: Compute and store minimum ifinfo dump size) applied to 3.1 increased the maximum size of the RTM_GETLINK message response. glibc's getifaddrs() function uses a page sized (4 KiB) buffer for the RTM_GETLINK response and returns a failure if the message is truncated. This buffer is not large enough if there is a network card with many virtual functions. What do you recommend to resolve this regression? David ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() 2012-01-27 12:36 Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() David Vrabel @ 2012-01-27 21:53 ` David Miller 2012-01-27 22:26 ` Štefan Gula 2012-01-30 16:50 ` Rose, Gregory V 0 siblings, 2 replies; 6+ messages in thread From: David Miller @ 2012-01-27 21:53 UTC (permalink / raw) To: david.vrabel; +Cc: netdev, gregory.v.rose, steweg From: David Vrabel <david.vrabel@citrix.com> Date: Fri, 27 Jan 2012 12:36:47 +0000 > Changeset c7ac8679bec9397afe8918f788cbcef88c38da54 (rtnetlink: Compute > and store minimum ifinfo dump size) applied to 3.1 increased the maximum > size of the RTM_GETLINK message response. > > glibc's getifaddrs() function uses a page sized (4 KiB) buffer for the > RTM_GETLINK response and returns a failure if the message is truncated. > This buffer is not large enough if there is a network card with many > virtual functions. > > What do you recommend to resolve this regression? Actually, glibc technically uses the CPP define value "PAGE_SIZE" if available, which is potentially different from the system page size. Using a statically defined PAGE_SIZE is wrong if the program is subsequently executed on a system with a different page size. On sparc, and powerpc I believe, this happens commonly. A 32-bit executable will see a PAGE_SIZE value of 4K, but when executed on a 64-bit system the page size is actually 8K or larger. That's why __getpagesize() should always be used. Anyways, if the page sizes are correct we're in a bit of a pickle. Do you have any idea what the computed value of min_ifinfo_dump_size is at the time of the failure? Greg, I think we're kind of screwed. The defined minimum appropriate buffer size for a recvmsg() call on a netlink socket is defined as: getpagesize() < 8192 ? getpagesize() : 8192 and glibc essentially abides by this by unconditionally using page size, and therefore if an interface with many virtual interfaces takes us over this limit, we break basically every properly written piece of netlink code out there. iproute2 is funny, it uses a static 16K buffer size for netlink message reception, which tends to paper over this problem :-/ Stefan, this plays into your work too, we're starting to create situations which are going to cause very serious problems. Imagine an interface with lots of these new macvlan source rules, it would potentially exceed the limit in the above equation as well. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() 2012-01-27 21:53 ` David Miller @ 2012-01-27 22:26 ` Štefan Gula 2012-01-30 16:50 ` Rose, Gregory V 1 sibling, 0 replies; 6+ messages in thread From: Štefan Gula @ 2012-01-27 22:26 UTC (permalink / raw) To: David Miller; +Cc: david.vrabel, netdev, gregory.v.rose 2012/1/27 David Miller <davem@davemloft.net>: > From: David Vrabel <david.vrabel@citrix.com> > Date: Fri, 27 Jan 2012 12:36:47 +0000 > >> Changeset c7ac8679bec9397afe8918f788cbcef88c38da54 (rtnetlink: Compute >> and store minimum ifinfo dump size) applied to 3.1 increased the maximum >> size of the RTM_GETLINK message response. >> >> glibc's getifaddrs() function uses a page sized (4 KiB) buffer for the >> RTM_GETLINK response and returns a failure if the message is truncated. >> This buffer is not large enough if there is a network card with many >> virtual functions. >> >> What do you recommend to resolve this regression? > > Actually, glibc technically uses the CPP define value "PAGE_SIZE" if > available, which is potentially different from the system page size. > > Using a statically defined PAGE_SIZE is wrong if the program is > subsequently executed on a system with a different page size. > > On sparc, and powerpc I believe, this happens commonly. A 32-bit > executable will see a PAGE_SIZE value of 4K, but when executed on a > 64-bit system the page size is actually 8K or larger. That's why > __getpagesize() should always be used. > > Anyways, if the page sizes are correct we're in a bit of a pickle. > > Do you have any idea what the computed value of min_ifinfo_dump_size > is at the time of the failure? > > Greg, I think we're kind of screwed. The defined minimum appropriate > buffer size for a recvmsg() call on a netlink socket is defined as: > > getpagesize() < 8192 ? getpagesize() : 8192 > > and glibc essentially abides by this by unconditionally using page > size, and therefore if an interface with many virtual interfaces takes > us over this limit, we break basically every properly written piece of > netlink code out there. > > iproute2 is funny, it uses a static 16K buffer size for netlink > message reception, which tends to paper over this problem :-/ > > Stefan, this plays into your work too, we're starting to create > situations which are going to cause very serious problems. Imagine > an interface with lots of these new macvlan source rules, it would > potentially exceed the limit in the above equation as well. > > Thanks. I can theoretically avoid this situation by using multiple small rtnetlink messages instead of one huge. I believe that ipset code is using that approach - just have to figure out proper changes to my macvlan code. But this doesn't solves the issue you are mentioning, it only prohibits to happen for macvlan code as one page size will be more than enough to allocate buffer.... ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() 2012-01-27 21:53 ` David Miller 2012-01-27 22:26 ` Štefan Gula @ 2012-01-30 16:50 ` Rose, Gregory V 2012-01-30 16:58 ` David Miller 1 sibling, 1 reply; 6+ messages in thread From: Rose, Gregory V @ 2012-01-30 16:50 UTC (permalink / raw) To: David Miller, david.vrabel@citrix.com Cc: netdev@vger.kernel.org, steweg@gmail.com > -----Original Message----- > From: David Miller [mailto:davem@davemloft.net] > Sent: Friday, January 27, 2012 1:54 PM > To: david.vrabel@citrix.com > Cc: netdev@vger.kernel.org; Rose, Gregory V; steweg@gmail.com > Subject: Re: Regression: "rtnetlink: Compute and store minimum ifinfo dump > size" breaks glibc's getifaddrs() > > From: David Vrabel <david.vrabel@citrix.com> > Date: Fri, 27 Jan 2012 12:36:47 +0000 > > > Changeset c7ac8679bec9397afe8918f788cbcef88c38da54 (rtnetlink: Compute > > and store minimum ifinfo dump size) applied to 3.1 increased the maximum > > size of the RTM_GETLINK message response. > > > > glibc's getifaddrs() function uses a page sized (4 KiB) buffer for the > > RTM_GETLINK response and returns a failure if the message is truncated. > > This buffer is not large enough if there is a network card with many > > virtual functions. > > > > What do you recommend to resolve this regression? > > Actually, glibc technically uses the CPP define value "PAGE_SIZE" if > available, which is potentially different from the system page size. > > Using a statically defined PAGE_SIZE is wrong if the program is > subsequently executed on a system with a different page size. > > On sparc, and powerpc I believe, this happens commonly. A 32-bit > executable will see a PAGE_SIZE value of 4K, but when executed on a > 64-bit system the page size is actually 8K or larger. That's why > __getpagesize() should always be used. > > Anyways, if the page sizes are correct we're in a bit of a pickle. > > Do you have any idea what the computed value of min_ifinfo_dump_size > is at the time of the failure? > > Greg, I think we're kind of screwed. The defined minimum appropriate > buffer size for a recvmsg() call on a netlink socket is defined as: > > getpagesize() < 8192 ? getpagesize() : 8192 > > and glibc essentially abides by this by unconditionally using page > size, and therefore if an interface with many virtual interfaces takes > us over this limit, we break basically every properly written piece of > netlink code out there. Yep, we're screwed. And as more features and capabilities get added to virtual functions that we want to control through netlink it gets even worse. Maybe we should have 'ip link show' just display the number of VFs and then have a new 'ip' tool syntax along the lines of 'ip link show eth(x) vf (n)' where eth(x) is the PF and n is the number of the VF. Then it could show all relevant information for just that VF. Scripts could parse the number of VFs from the first call to 'ip link show' and then loop to show the details of each VF. Just an idea... maybe there are other ones out there but it's just getting ridiculous how much data has to be transferred back and forth during the basic 'ip link show' command when the interface as subordinate VFs now that we're getting to devices with up to 256 VFs. - Greg ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() 2012-01-30 16:50 ` Rose, Gregory V @ 2012-01-30 16:58 ` David Miller 2012-01-30 17:02 ` Rose, Gregory V 0 siblings, 1 reply; 6+ messages in thread From: David Miller @ 2012-01-30 16:58 UTC (permalink / raw) To: gregory.v.rose; +Cc: david.vrabel, netdev, steweg From: "Rose, Gregory V" <gregory.v.rose@intel.com> Date: Mon, 30 Jan 2012 16:50:58 +0000 > Maybe we should have 'ip link show' just display the number of VFs > and then have a new 'ip' tool syntax along the lines of 'ip link > show eth(x) vf (n)' where eth(x) is the PF and n is the number of > the VF. Then it could show all relevant information for just that > VF. Scripts could parse the number of VFs from the first call to > 'ip link show' and then loop to show the details of each VF. > > Just an idea... maybe there are other ones out there but it's just > getting ridiculous how much data has to be transferred back and > forth during the basic 'ip link show' command when the interface as > subordinate VFs now that we're getting to devices with up to 256 > VFs. The fix is almost certainly that we need to require that the application turn on the publishing of features it is interested in. The inet_diag sockets do something like this, wherein you set bits in some flags of the request saying what attributes you want to see. It seems clear that we must, at this point, turn off VF attributes by default. Could you do some work on this? If you don't have the time I'll look into it, because we'll need to backport this to -stable too. Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
* RE: Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() 2012-01-30 16:58 ` David Miller @ 2012-01-30 17:02 ` Rose, Gregory V 0 siblings, 0 replies; 6+ messages in thread From: Rose, Gregory V @ 2012-01-30 17:02 UTC (permalink / raw) To: David Miller Cc: david.vrabel@citrix.com, netdev@vger.kernel.org, steweg@gmail.com > -----Original Message----- > From: David Miller [mailto:davem@davemloft.net] > Sent: Monday, January 30, 2012 8:58 AM > To: Rose, Gregory V > Cc: david.vrabel@citrix.com; netdev@vger.kernel.org; steweg@gmail.com > Subject: Re: Regression: "rtnetlink: Compute and store minimum ifinfo dump > size" breaks glibc's getifaddrs() > > From: "Rose, Gregory V" <gregory.v.rose@intel.com> > Date: Mon, 30 Jan 2012 16:50:58 +0000 > > > Maybe we should have 'ip link show' just display the number of VFs > > and then have a new 'ip' tool syntax along the lines of 'ip link > > show eth(x) vf (n)' where eth(x) is the PF and n is the number of > > the VF. Then it could show all relevant information for just that > > VF. Scripts could parse the number of VFs from the first call to > > 'ip link show' and then loop to show the details of each VF. > > > > Just an idea... maybe there are other ones out there but it's just > > getting ridiculous how much data has to be transferred back and > > forth during the basic 'ip link show' command when the interface as > > subordinate VFs now that we're getting to devices with up to 256 > > VFs. > > The fix is almost certainly that we need to require that the application > turn on the publishing of features it is interested in. > > The inet_diag sockets do something like this, wherein you set bits in > some flags of the request saying what attributes you want to see. > > It seems clear that we must, at this point, turn off VF attributes by > default. > > Could you do some work on this? If you don't have the time I'll look > into it, because we'll need to backport this to -stable too. Yeah, I can push out some other things I'm working on to get on this. - Greg > > Thanks. ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2012-01-30 17:03 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-01-27 12:36 Regression: "rtnetlink: Compute and store minimum ifinfo dump size" breaks glibc's getifaddrs() David Vrabel 2012-01-27 21:53 ` David Miller 2012-01-27 22:26 ` Štefan Gula 2012-01-30 16:50 ` Rose, Gregory V 2012-01-30 16:58 ` David Miller 2012-01-30 17:02 ` Rose, Gregory V
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.