* [PATCH RFC 0/2] veth, bridge, and GSO maximums
@ 2017-11-26 18:17 Stephen Hemminger
  2017-11-26 18:17 ` [PATCH RFC 1/2] br: add notifier for when bridge changes it " Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Stephen Hemminger @ 2017-11-26 18:17 UTC (permalink / raw)
  To: davem, stephen; +Cc: netdev, Stephen Hemminger

This pair of patches improves performance when running containers in an
environment where the underlying device has a lower GSO maximum (such as
Azure).

With containers, a veth pair is created and one end is attached to the
bridge device. The bridge device correctly computes GSO parameters that
are the minimum of the lower devices. The problem is that the other end
of the veth device (in the container) reports the full GSO size. These
patches propagate the upper (bridge device) parameters to the other end
of the veth pair.

Please consider this as an alternative to the sysfs GSO changes.

Stephen Hemminger (2):
  br: add notifier for when bridge changes it GSO maximums
  veth: propagate bridge GSO to peer

 drivers/net/veth.c        | 72 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/netdevice.h |  1 +
 net/bridge/br_if.c        |  7 +++++
 3 files changed, 80 insertions(+)

-- 
2.11.0

^ permalink raw reply	[flat|nested] 22+ messages in thread
* [PATCH RFC 1/2] br: add notifier for when bridge changes it GSO maximums
  2017-11-26 18:17 [PATCH RFC 0/2] veth, bridge, and GSO maximums Stephen Hemminger
@ 2017-11-26 18:17 ` Stephen Hemminger
  2017-11-26 18:17 ` [PATCH RFC 2/2] veth: propagate bridge GSO to peer Stephen Hemminger
  2017-11-30 15:47 ` [PATCH RFC 0/2] veth, bridge, and GSO maximums David Miller
  2 siblings, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2017-11-26 18:17 UTC (permalink / raw)
  To: davem, stephen; +Cc: netdev, Stephen Hemminger

Add a callback notifier for when the minimum GSO values calculated
across all the bridge ports change. This allows veth to adjust based
on the devices in the bridge.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 include/linux/netdevice.h | 1 +
 net/bridge/br_if.c        | 7 +++++++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef789e1d679e..0da966ffec70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2326,6 +2326,7 @@ struct netdev_lag_lower_state_info {
 #define NETDEV_UDP_TUNNEL_PUSH_INFO	0x001C
 #define NETDEV_UDP_TUNNEL_DROP_INFO	0x001D
 #define NETDEV_CHANGE_TX_QUEUE_LEN	0x001E
+#define NETDEV_CHANGE_GSO_MAX	0x001F
 
 int register_netdevice_notifier(struct notifier_block *nb);
 int unregister_netdevice_notifier(struct notifier_block *nb);
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 9ba4ed65c52b..ca4ccadd78d0 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -453,8 +453,15 @@ static void br_set_gso_limits(struct net_bridge *br)
 		gso_max_size = min(gso_max_size, p->dev->gso_max_size);
 		gso_max_segs = min(gso_max_segs, p->dev->gso_max_segs);
 	}
+
+	if (br->dev->gso_max_size == gso_max_size &&
+	    br->dev->gso_max_segs == gso_max_segs)
+		return;
+
 	br->dev->gso_max_size = gso_max_size;
 	br->dev->gso_max_segs = gso_max_segs;
+
+	call_netdevice_notifiers(NETDEV_CHANGE_GSO_MAX, br->dev);
 }
 
 /*
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* [PATCH RFC 2/2] veth: propagate bridge GSO to peer
  2017-11-26 18:17 [PATCH RFC 0/2] veth, bridge, and GSO maximums Stephen Hemminger
  2017-11-26 18:17 ` [PATCH RFC 1/2] br: add notifier for when bridge changes it " Stephen Hemminger
@ 2017-11-26 18:17 ` Stephen Hemminger
  2017-11-27  3:13   ` David Ahern
  2017-11-30 15:47 ` [PATCH RFC 0/2] veth, bridge, and GSO maximums David Miller
  2 siblings, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2017-11-26 18:17 UTC (permalink / raw)
  To: davem, stephen; +Cc: netdev, Stephen Hemminger

This allows the veth device in containers to see the GSO maximum
settings of the actual device being used for output.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/veth.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f5438d0978ca..0c9ce156943b 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -511,17 +511,89 @@ static struct rtnl_link_ops veth_link_ops = {
 	.get_link_net	= veth_get_link_net,
 };
 
+/* When veth device is added to a bridge or other master device
+ * then reflect the GSO max values from the upper device
+ * to the other end of veth pair.
+ */
+static void veth_change_upper(struct net_device *dev,
+			      const struct netdev_notifier_changeupper_info *info)
+{
+	struct net_device *upper = info->upper_dev;
+	struct net_device *peer;
+	struct veth_priv *priv;
+
+	if (dev->netdev_ops != &veth_netdev_ops)
+		return;
+
+	priv = netdev_priv(dev);
+	peer = rtnl_dereference(priv->peer);
+	if (!peer)
+		return;
+
+	if (upper) {
+		peer->gso_max_segs = upper->gso_max_segs;
+		peer->gso_max_size = upper->gso_max_size;
+	} else {
+		peer->gso_max_segs = GSO_MAX_SEGS;
+		peer->gso_max_size = GSO_MAX_SIZE;
+	}
+}
+
+static void veth_change_upper_gso(struct net_device *upper)
+{
+	struct net_device *peer, *dev;
+	struct veth_priv *priv;
+
+	for_each_netdev(dev_net(upper), dev) {
+		if (dev->netdev_ops != &veth_netdev_ops)
+			continue;
+		if (!netdev_has_upper_dev(dev, upper))
+			continue;
+
+		priv = netdev_priv(dev);
+		peer = rtnl_dereference(priv->peer);
+		if (!peer)
+			continue;
+		peer->gso_max_segs = upper->gso_max_segs;
+		peer->gso_max_size = upper->gso_max_size;
+	}
+}
+
+static int veth_netdev_event(struct notifier_block *this,
+			     unsigned long event, void *ptr)
+{
+	struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
+
+	/* Propagate the upper (bridge) device settings to peer */
+	switch (event) {
+	case NETDEV_CHANGEUPPER:
+		veth_change_upper(event_dev, ptr);
+		break;
+	case NETDEV_CHANGE_GSO_MAX:
+		veth_change_upper_gso(event_dev);
+		break;
+	}
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block veth_netdev_notifier = {
+	.notifier_call = veth_netdev_event,
+};
+
 /*
  * init/fini
  */
 
 static __init int veth_init(void)
 {
+	register_netdevice_notifier(&veth_netdev_notifier);
 	return rtnl_link_register(&veth_link_ops);
 }
 
 static __exit void veth_exit(void)
 {
+	unregister_netdevice_notifier(&veth_netdev_notifier);
 	rtnl_link_unregister(&veth_link_ops);
 }
 
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-26 18:17 ` [PATCH RFC 2/2] veth: propagate bridge GSO to peer Stephen Hemminger @ 2017-11-27 3:13 ` David Ahern 2017-11-27 7:07 ` Stephen Hemminger 0 siblings, 1 reply; 22+ messages in thread From: David Ahern @ 2017-11-27 3:13 UTC (permalink / raw) To: Stephen Hemminger, davem; +Cc: netdev, Stephen Hemminger On 11/26/17 11:17 AM, Stephen Hemminger wrote: > This allows veth device in containers to see the GSO maximum > settings of the actual device being used for output. > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > --- > drivers/net/veth.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 72 insertions(+) > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index f5438d0978ca..0c9ce156943b 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c > @@ -511,17 +511,89 @@ static struct rtnl_link_ops veth_link_ops = { > .get_link_net = veth_get_link_net, > }; > > +/* When veth device is added to a bridge or other master device > + * then reflect the GSO max values from the upper device > + * to the other end of veth pair. > + */ > +static void veth_change_upper(struct net_device *dev, > + const struct netdev_notifier_changeupper_info *info) > +{ > + struct net_device *upper = info->upper_dev; > + struct net_device *peer; > + struct veth_priv *priv; > + > + if (dev->netdev_ops != &veth_netdev_ops) > + return; > + > + priv = netdev_priv(dev); > + peer = rtnl_dereference(priv->peer); > + if (!peer) > + return; > + > + if (upper) { > + peer->gso_max_segs = upper->gso_max_segs; > + peer->gso_max_size = upper->gso_max_size; > + } else { > + peer->gso_max_segs = GSO_MAX_SEGS; > + peer->gso_max_size = GSO_MAX_SIZE; > + } veth devices can be added to a VRF instead of a bridge, and I do not believe the gso propagation works for L3 master devices. >From a quick grep, team devices do not appear to handle gso changes either. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-27 3:13 ` David Ahern @ 2017-11-27 7:07 ` Stephen Hemminger 2017-11-27 20:14 ` Solio Sarabia 0 siblings, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2017-11-27 7:07 UTC (permalink / raw) To: David Ahern; +Cc: davem, netdev, Stephen Hemminger On Sun, 26 Nov 2017 20:13:39 -0700 David Ahern <dsahern@gmail.com> wrote: > On 11/26/17 11:17 AM, Stephen Hemminger wrote: > > This allows veth device in containers to see the GSO maximum > > settings of the actual device being used for output. > > > > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> > > --- > > drivers/net/veth.c | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 72 insertions(+) > > > > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > > index f5438d0978ca..0c9ce156943b 100644 > > --- a/drivers/net/veth.c > > +++ b/drivers/net/veth.c > > @@ -511,17 +511,89 @@ static struct rtnl_link_ops veth_link_ops = { > > .get_link_net = veth_get_link_net, > > }; > > > > +/* When veth device is added to a bridge or other master device > > + * then reflect the GSO max values from the upper device > > + * to the other end of veth pair. > > + */ > > +static void veth_change_upper(struct net_device *dev, > > + const struct netdev_notifier_changeupper_info *info) > > +{ > > + struct net_device *upper = info->upper_dev; > > + struct net_device *peer; > > + struct veth_priv *priv; > > + > > + if (dev->netdev_ops != &veth_netdev_ops) > > + return; > > + > > + priv = netdev_priv(dev); > > + peer = rtnl_dereference(priv->peer); > > + if (!peer) > > + return; > > + > > + if (upper) { > > + peer->gso_max_segs = upper->gso_max_segs; > > + peer->gso_max_size = upper->gso_max_size; > > + } else { > > + peer->gso_max_segs = GSO_MAX_SEGS; > > + peer->gso_max_size = GSO_MAX_SIZE; > > + } > > veth devices can be added to a VRF instead of a bridge, and I do not > believe the gso propagation works for L3 master devices. > > From a quick grep, team devices do not appear to handle gso changes either. This code should still work correctly, but no optimization would happen. The gso_max_size of the VRF or team will still be GSO_MAX_SIZE so there would be no change. If VRF or Team ever got smart enough to handle GSO limits, then the algorithm would handle it. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-27 7:07 ` Stephen Hemminger @ 2017-11-27 20:14 ` Solio Sarabia 2017-11-27 21:15 ` Stephen Hemminger 0 siblings, 1 reply; 22+ messages in thread From: Solio Sarabia @ 2017-11-27 20:14 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dsahern, davem, netdev, sthemmin On Sun, Nov 26, 2017 at 11:07:25PM -0800, Stephen Hemminger wrote: > On Sun, 26 Nov 2017 20:13:39 -0700 > David Ahern <dsahern@gmail.com> wrote: > > > On 11/26/17 11:17 AM, Stephen Hemminger wrote: > > > This allows veth device in containers to see the GSO maximum > > > settings of the actual device being used for output. > > > > veth devices can be added to a VRF instead of a bridge, and I do not > > believe the gso propagation works for L3 master devices. > > > > From a quick grep, team devices do not appear to handle gso changes either. > > This code should still work correctly, but no optimization would happen. > The gso_max_size of the VRF or team will > still be GSO_MAX_SIZE so there would be no change. If VRF or Team ever got smart > enough to handle GSO limits, then the algorithm would handle it. This patch propagates gso value from bridge to its veth endpoints. However, since bridge is never aware of the GSO limit from underlying interfaces, bridge/veth still have larger GSO size. In the docker case, bridge is not linked directly to physical or synthetic interfaces; it relies on iptables to decide which interface to forward packets to. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-27 20:14 ` Solio Sarabia @ 2017-11-27 21:15 ` Stephen Hemminger 2017-11-28 1:42 ` Solio Sarabia 0 siblings, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2017-11-27 21:15 UTC (permalink / raw) To: Solio Sarabia; +Cc: dsahern, davem, netdev, sthemmin On Mon, 27 Nov 2017 12:14:19 -0800 Solio Sarabia <solio.sarabia@intel.com> wrote: > On Sun, Nov 26, 2017 at 11:07:25PM -0800, Stephen Hemminger wrote: > > On Sun, 26 Nov 2017 20:13:39 -0700 > > David Ahern <dsahern@gmail.com> wrote: > > > > > On 11/26/17 11:17 AM, Stephen Hemminger wrote: > > > > This allows veth device in containers to see the GSO maximum > > > > settings of the actual device being used for output. > > > > > > veth devices can be added to a VRF instead of a bridge, and I do not > > > believe the gso propagation works for L3 master devices. > > > > > > From a quick grep, team devices do not appear to handle gso changes either. > > > > This code should still work correctly, but no optimization would happen. > > The gso_max_size of the VRF or team will > > still be GSO_MAX_SIZE so there would be no change. If VRF or Team ever got smart > > enough to handle GSO limits, then the algorithm would handle it. > > This patch propagates gso value from bridge to its veth endpoints. > However, since bridge is never aware of the GSO limit from underlying > interfaces, bridge/veth still have larger GSO size. > > In the docker case, bridge is not linked directly to physical or > synthetic interfaces; it relies on iptables to decide which interface to > forward packets to. So for the docker case, then direct control of GSO values via netlink (ie ip link set) seems like the better solution. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-27 21:15 ` Stephen Hemminger @ 2017-11-28 1:42 ` Solio Sarabia 2017-11-28 2:02 ` David Ahern 0 siblings, 1 reply; 22+ messages in thread From: Solio Sarabia @ 2017-11-28 1:42 UTC (permalink / raw) To: Stephen Hemminger; +Cc: dsahern, davem, netdev, sthemmin On Mon, Nov 27, 2017 at 01:15:02PM -0800, Stephen Hemminger wrote: > On Mon, 27 Nov 2017 12:14:19 -0800 > Solio Sarabia <solio.sarabia@intel.com> wrote: > > > On Sun, Nov 26, 2017 at 11:07:25PM -0800, Stephen Hemminger wrote: > > > On Sun, 26 Nov 2017 20:13:39 -0700 > > > David Ahern <dsahern@gmail.com> wrote: > > > > > > > On 11/26/17 11:17 AM, Stephen Hemminger wrote: > > > > > This allows veth device in containers to see the GSO maximum > > > > > settings of the actual device being used for output. > > > > > > > > veth devices can be added to a VRF instead of a bridge, and I do not > > > > believe the gso propagation works for L3 master devices. > > > > > > > > From a quick grep, team devices do not appear to handle gso changes either. > > > > > > This code should still work correctly, but no optimization would happen. > > > The gso_max_size of the VRF or team will > > > still be GSO_MAX_SIZE so there would be no change. If VRF or Team ever got smart > > > enough to handle GSO limits, then the algorithm would handle it. > > > > This patch propagates gso value from bridge to its veth endpoints. > > However, since bridge is never aware of the GSO limit from underlying > > interfaces, bridge/veth still have larger GSO size. > > > > In the docker case, bridge is not linked directly to physical or > > synthetic interfaces; it relies on iptables to decide which interface to > > forward packets to. > > So for the docker case, then direct control of GSO values via netlink (ie ip link set) > seems like the better solution. Adding ioctl support for 'ip link set' would work. I'm still concerned how to enforce the upper limit to not exceed that of the lower devices. Consider a system with three NICs, each reporting values in the range [60,000 - 62,780]. Users could set virtual interfaces' gso to 65,536, exceeding the limit, and having the host do sw gso (vms settings must not affect host performance.) Looping through interfaces? With the difference that now it'd be trigger upon user's request, not every time a veth is created (like one previous patch discussed.) ^ permalink raw reply [flat|nested] 22+ messages in thread
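For illustration only, a minimal sketch of the kind of check Solio describes above -- looping over interfaces to derive the enforceable upper bound when a user requests a new GSO value. This is assumed code, not from the thread: the helper name is made up, it is assumed to run under RTNL, and the dev.parent test is only a rough stand-in for "backed by a physical or synthetic device".

/* Sketch: reject a requested gso_max_size larger than the minimum
 * advertised by the underlying (physical or synthetic) interfaces.
 * Assumes the caller holds RTNL.
 */
static bool gso_size_within_lower_limits(struct net *net, unsigned int requested)
{
	unsigned int limit = GSO_MAX_SIZE;
	struct net_device *lower;

	for_each_netdev(net, lower) {
		/* Assumed heuristic: only devices with a backing parent
		 * device constrain the limit.
		 */
		if (lower->dev.parent)
			limit = min(limit, lower->gso_max_size);
	}

	return requested <= limit;
}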
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-28 1:42 ` Solio Sarabia @ 2017-11-28 2:02 ` David Ahern 2017-11-30 0:35 ` Solio Sarabia 2017-12-01 20:30 ` Stephen Hemminger 0 siblings, 2 replies; 22+ messages in thread From: David Ahern @ 2017-11-28 2:02 UTC (permalink / raw) To: Solio Sarabia, Stephen Hemminger; +Cc: davem, netdev, sthemmin On 11/27/17 6:42 PM, Solio Sarabia wrote: > On Mon, Nov 27, 2017 at 01:15:02PM -0800, Stephen Hemminger wrote: >> On Mon, 27 Nov 2017 12:14:19 -0800 >> Solio Sarabia <solio.sarabia@intel.com> wrote: >> >>> On Sun, Nov 26, 2017 at 11:07:25PM -0800, Stephen Hemminger wrote: >>>> On Sun, 26 Nov 2017 20:13:39 -0700 >>>> David Ahern <dsahern@gmail.com> wrote: >>>> >>>>> On 11/26/17 11:17 AM, Stephen Hemminger wrote: >>>>>> This allows veth device in containers to see the GSO maximum >>>>>> settings of the actual device being used for output. >>>>> >>>>> veth devices can be added to a VRF instead of a bridge, and I do not >>>>> believe the gso propagation works for L3 master devices. >>>>> >>>>> From a quick grep, team devices do not appear to handle gso changes either. >>>> >>>> This code should still work correctly, but no optimization would happen. >>>> The gso_max_size of the VRF or team will >>>> still be GSO_MAX_SIZE so there would be no change. If VRF or Team ever got smart >>>> enough to handle GSO limits, then the algorithm would handle it. >>> >>> This patch propagates gso value from bridge to its veth endpoints. >>> However, since bridge is never aware of the GSO limit from underlying >>> interfaces, bridge/veth still have larger GSO size. >>> >>> In the docker case, bridge is not linked directly to physical or >>> synthetic interfaces; it relies on iptables to decide which interface to >>> forward packets to. >> >> So for the docker case, then direct control of GSO values via netlink (ie ip link set) >> seems like the better solution. > > Adding ioctl support for 'ip link set' would work. I'm still concerned > how to enforce the upper limit to not exceed that of the lower devices. > > Consider a system with three NICs, each reporting values in the range > [60,000 - 62,780]. Users could set virtual interfaces' gso to 65,536, > exceeding the limit, and having the host do sw gso (vms settings must > not affect host performance.) > > Looping through interfaces? With the difference that now it'd be > trigger upon user's request, not every time a veth is created (like one > previous patch discussed.) > You are concerned about the routed case right? One option is to have VRF devices propagate gso sizes to all devices (veth, vlan, etc) enslaved to it. VRF devices are Layer 3 master devices so an L3 parallel to a bridge. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-28 2:02 ` David Ahern @ 2017-11-30 0:35 ` Solio Sarabia 2017-11-30 17:10 ` Stephen Hemminger 2017-12-01 20:30 ` Stephen Hemminger 1 sibling, 1 reply; 22+ messages in thread From: Solio Sarabia @ 2017-11-30 0:35 UTC (permalink / raw) To: David Ahern, stephen; +Cc: davem, netdev, sthemmin, shiny.sebastian On Mon, Nov 27, 2017 at 07:02:01PM -0700, David Ahern wrote: > On 11/27/17 6:42 PM, Solio Sarabia wrote: > > Adding ioctl support for 'ip link set' would work. I'm still concerned > > how to enforce the upper limit to not exceed that of the lower devices. > > Actually, giving the user control to change gso doesn't solve the issue. In a VM, user could simple ignore setting the gso, still hurting host perf. We need to enforce the lower gso on the bridge/veth. Should this issue be fixed at hv_netvsc level? Why is the driver passing down gso buffer sizes greater than what synthetic interface allows. > > Consider a system with three NICs, each reporting values in the range > > [60,000 - 62,780]. Users could set virtual interfaces' gso to 65,536, > > exceeding the limit, and having the host do sw gso (vms settings must > > not affect host performance.) > > > > Looping through interfaces? With the difference that now it'd be > > trigger upon user's request, not every time a veth is created (like one > > previous patch discussed.) > > > > You are concerned about the routed case right? One option is to have VRF > devices propagate gso sizes to all devices (veth, vlan, etc) enslaved to > it. VRF devices are Layer 3 master devices so an L3 parallel to a bridge. Having the VRF device propagate the gso to its slaves is opposite of what we do now: get minimum of all ports and assign it to bridge (net/bridge/br_if.c, br_min_mtu, br_set_gso_limits.) Would it be right to change the logic flow? If so, this this could work: (1) bridge gets gso from lower devices upon init/setup (2) when new device is attached to bridge, bridge sets gso for this new slave (and its peer if it's veth.) (3) as the code is now, there's an optimization opportunity: for every new interface attached to bridge, bridge loops through all ports to set gso, mtu. It's not necessary as bridge already has the minimum from previous interfaces attached. Could be O(1) instead of O(n). ^ permalink raw reply [flat|nested] 22+ messages in thread
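As a rough sketch of steps (2) and (3) above -- assumed code, not part of the posted patches, with an illustrative function name -- the bridge's running minimum could be handed directly to a newly attached port in O(1) instead of rescanning every port:

/* Sketch: a newly attached slave inherits the bridge device's current
 * minimum, avoiding an O(n) walk of the port list.  br->dev already
 * carries the minimum of the previously attached ports.
 */
static void br_inherit_gso_limits(struct net_bridge *br, struct net_device *slave)
{
	struct net_device *brdev = br->dev;

	slave->gso_max_size = min(slave->gso_max_size, brdev->gso_max_size);
	slave->gso_max_segs = min(slave->gso_max_segs, brdev->gso_max_segs);
}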
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 0:35 ` Solio Sarabia @ 2017-11-30 17:10 ` Stephen Hemminger 2017-11-30 17:26 ` Eric Dumazet 0 siblings, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2017-11-30 17:10 UTC (permalink / raw) To: Solio Sarabia; +Cc: David Ahern, davem, netdev, sthemmin, shiny.sebastian On Wed, 29 Nov 2017 16:35:37 -0800 Solio Sarabia <solio.sarabia@intel.com> wrote: > On Mon, Nov 27, 2017 at 07:02:01PM -0700, David Ahern wrote: > > On 11/27/17 6:42 PM, Solio Sarabia wrote: > > > Adding ioctl support for 'ip link set' would work. I'm still concerned > > > how to enforce the upper limit to not exceed that of the lower devices. > > > > Actually, giving the user control to change gso doesn't solve the issue. > In a VM, user could simple ignore setting the gso, still hurting host > perf. We need to enforce the lower gso on the bridge/veth. > > Should this issue be fixed at hv_netvsc level? Why is the driver passing > down gso buffer sizes greater than what synthetic interface allows. > > > > Consider a system with three NICs, each reporting values in the range > > > [60,000 - 62,780]. Users could set virtual interfaces' gso to 65,536, > > > exceeding the limit, and having the host do sw gso (vms settings must > > > not affect host performance.) > > > > > > Looping through interfaces? With the difference that now it'd be > > > trigger upon user's request, not every time a veth is created (like one > > > previous patch discussed.) > > > > > > > You are concerned about the routed case right? One option is to have VRF > > devices propagate gso sizes to all devices (veth, vlan, etc) enslaved to > > it. VRF devices are Layer 3 master devices so an L3 parallel to a bridge. > Having the VRF device propagate the gso to its slaves is opposite of > what we do now: get minimum of all ports and assign it to bridge > (net/bridge/br_if.c, br_min_mtu, br_set_gso_limits.) > > Would it be right to change the logic flow? If so, this this could work: > > (1) bridge gets gso from lower devices upon init/setup > (2) when new device is attached to bridge, bridge sets gso for this new > slave (and its peer if it's veth.) > (3) as the code is now, there's an optimization opportunity: for every > new interface attached to bridge, bridge loops through all ports to > set gso, mtu. It's not necessary as bridge already has the minimum > from previous interfaces attached. Could be O(1) instead of O(n). The problem goes back into the core GSO networking code. Something like this is needed. static inline bool netif_needs_gso(struct sk_buff *skb, const struct net_device *dev, netdev_features_t features) { return skb_is_gso(skb) && (!skb_gso_ok(skb, features) || unlikely(skb_shinfo(skb)->gso_segs > dev->gso_max_segs) || << new unlikely(skb_shinfo(skb)->gso_size > dev->gso_max_size) || << new unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && (skb->ip_summed != CHECKSUM_UNNECESSARY))); } What that will do is split up the monster GSO packets if they ever bleed across from one device to another through the twisty mazes of packet processing paths. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 17:10 ` Stephen Hemminger @ 2017-11-30 17:26 ` Eric Dumazet 2017-11-30 17:36 ` Stephen Hemminger ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread From: Eric Dumazet @ 2017-11-30 17:26 UTC (permalink / raw) To: Stephen Hemminger, Solio Sarabia Cc: David Ahern, davem, netdev, sthemmin, shiny.sebastian On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: > > > The problem goes back into the core GSO networking code. > Something like this is needed. > > static inline bool netif_needs_gso(struct sk_buff *skb, > const struct net_device *dev, > netdev_features_t features) > { > return skb_is_gso(skb) && > (!skb_gso_ok(skb, features) || > unlikely(skb_shinfo(skb)->gso_segs > dev- > >gso_max_segs) || << new > unlikely(skb_shinfo(skb)->gso_size > dev- > >gso_max_size) || << new > unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && > (skb->ip_summed != CHECKSUM_UNNECESSARY))); > } > > What that will do is split up the monster GSO packets if they ever > bleed > across from one device to another through the twisty mazes of packet > processing paths. Since very few drivers have these gso_max_segs / gso_max_size, check could be done in their ndo_features_check() ^ permalink raw reply [flat|nested] 22+ messages in thread
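For illustration, a minimal sketch of Eric's suggestion as it might look in a single driver's .ndo_features_check. The function name is hypothetical, and it assumes the device's gso_max_size/gso_max_segs already reflect the real limits of the path below it:

static netdev_features_t xxx_features_check(struct sk_buff *skb,
					    struct net_device *dev,
					    netdev_features_t features)
{
	/* Fall back to software GSO when the aggregate exceeds what this
	 * device (or the host beneath it) can accept.
	 */
	if (skb_is_gso(skb) &&
	    (skb_shinfo(skb)->gso_segs > dev->gso_max_segs ||
	     skb_shinfo(skb)->gso_size > dev->gso_max_size))
		features &= ~NETIF_F_GSO_MASK;

	return features;
}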
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 17:26 ` Eric Dumazet @ 2017-11-30 17:36 ` Stephen Hemminger 2017-11-30 17:38 ` David Miller 2017-11-30 17:49 ` Stephen Hemminger 2 siblings, 0 replies; 22+ messages in thread From: Stephen Hemminger @ 2017-11-30 17:36 UTC (permalink / raw) To: Eric Dumazet Cc: Solio Sarabia, David Ahern, davem, netdev, sthemmin, shiny.sebastian On Thu, 30 Nov 2017 09:26:39 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: > > > > > > The problem goes back into the core GSO networking code. > > Something like this is needed. > > > > static inline bool netif_needs_gso(struct sk_buff *skb, > > const struct net_device *dev, > > netdev_features_t features) > > { > > return skb_is_gso(skb) && > > (!skb_gso_ok(skb, features) || > > unlikely(skb_shinfo(skb)->gso_segs > dev- > > >gso_max_segs) || << new > > unlikely(skb_shinfo(skb)->gso_size > dev- > > >gso_max_size) || << new > > unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && > > (skb->ip_summed != CHECKSUM_UNNECESSARY))); > > } > > > > What that will do is split up the monster GSO packets if they ever > > bleed > > across from one device to another through the twisty mazes of packet > > processing paths. > > > Since very few drivers have these gso_max_segs / gso_max_size, check > could be done in their ndo_features_check() Agreed, we could do it there. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 17:26 ` Eric Dumazet 2017-11-30 17:36 ` Stephen Hemminger @ 2017-11-30 17:38 ` David Miller 2017-11-30 17:49 ` Stephen Hemminger 2 siblings, 0 replies; 22+ messages in thread From: David Miller @ 2017-11-30 17:38 UTC (permalink / raw) To: eric.dumazet Cc: stephen, solio.sarabia, dsahern, netdev, sthemmin, shiny.sebastian From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 30 Nov 2017 09:26:39 -0800 > On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: >> >> >> The problem goes back into the core GSO networking code. >> Something like this is needed. >> >> static inline bool netif_needs_gso(struct sk_buff *skb, >> const struct net_device *dev, >> netdev_features_t features) >> { >> return skb_is_gso(skb) && >> (!skb_gso_ok(skb, features) || >> unlikely(skb_shinfo(skb)->gso_segs > dev- >> >gso_max_segs) || << new >> unlikely(skb_shinfo(skb)->gso_size > dev- >> >gso_max_size) || << new >> unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && >> (skb->ip_summed != CHECKSUM_UNNECESSARY))); >> } >> >> What that will do is split up the monster GSO packets if they ever >> bleed >> across from one device to another through the twisty mazes of packet >> processing paths. > > > Since very few drivers have these gso_max_segs / gso_max_size, check > could be done in their ndo_features_check() Agreed. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 17:26 ` Eric Dumazet 2017-11-30 17:36 ` Stephen Hemminger 2017-11-30 17:38 ` David Miller @ 2017-11-30 17:49 ` Stephen Hemminger 2017-11-30 17:59 ` Eric Dumazet 2 siblings, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2017-11-30 17:49 UTC (permalink / raw) To: Eric Dumazet Cc: Solio Sarabia, David Ahern, davem, netdev, sthemmin, shiny.sebastian On Thu, 30 Nov 2017 09:26:39 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: > > > > > > The problem goes back into the core GSO networking code. > > Something like this is needed. > > > > static inline bool netif_needs_gso(struct sk_buff *skb, > > const struct net_device *dev, > > netdev_features_t features) > > { > > return skb_is_gso(skb) && > > (!skb_gso_ok(skb, features) || > > unlikely(skb_shinfo(skb)->gso_segs > dev- > > >gso_max_segs) || << new > > unlikely(skb_shinfo(skb)->gso_size > dev- > > >gso_max_size) || << new > > unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && > > (skb->ip_summed != CHECKSUM_UNNECESSARY))); > > } > > > > What that will do is split up the monster GSO packets if they ever > > bleed > > across from one device to another through the twisty mazes of packet > > processing paths. > > > Since very few drivers have these gso_max_segs / gso_max_size, check > could be done in their ndo_features_check() Actually, we already check for max_segs, just missing check for size here: From 71a134f41c4aae8947241091300d21745aa237f2 Mon Sep 17 00:00:00 2001 From: Stephen Hemminger <sthemmin@microsoft.com> Date: Thu, 30 Nov 2017 09:45:11 -0800 Subject: [PATCH] net: do not GSO if frame is too large This adds an additional check to breakup skb's that exceed a devices GSO maximum size. The code was already checking for too many segments but did not check size. This has been observed to be a problem when using containers on Hyper-V/Azure where the allowed GSO maximum size is less than maximum and skb's have gone through multiple layers to arrive at the virtual device. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> --- net/core/dev.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 07ed21d64f92..0bb398f3bfa3 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2918,9 +2918,11 @@ static netdev_features_t gso_features_check(const struct sk_buff *skb, struct net_device *dev, netdev_features_t features) { + unsigned int gso_size = skb_shinfo(skb)->gso_size; u16 gso_segs = skb_shinfo(skb)->gso_segs; - if (gso_segs > dev->gso_max_segs) + if (gso_segs > dev->gso_max_segs || + gso_size > dev->gso_max_size) return features & ~NETIF_F_GSO_MASK; /* Support for GSO partial features requires software -- 2.11.0 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 17:49 ` Stephen Hemminger @ 2017-11-30 17:59 ` Eric Dumazet 2017-11-30 18:08 ` Stephen Hemminger 0 siblings, 1 reply; 22+ messages in thread From: Eric Dumazet @ 2017-11-30 17:59 UTC (permalink / raw) To: Stephen Hemminger Cc: Solio Sarabia, David Ahern, davem, netdev, sthemmin, shiny.sebastian On Thu, 2017-11-30 at 09:49 -0800, Stephen Hemminger wrote: > On Thu, 30 Nov 2017 09:26:39 -0800 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: > > > > > > > > > The problem goes back into the core GSO networking code. > > > Something like this is needed. > > > > > > static inline bool netif_needs_gso(struct sk_buff *skb, > > > const struct net_device *dev, > > > netdev_features_t features) > > > { > > > return skb_is_gso(skb) && > > > (!skb_gso_ok(skb, features) || > > > unlikely(skb_shinfo(skb)->gso_segs > dev- > > > > gso_max_segs) || << new > > > > > > unlikely(skb_shinfo(skb)->gso_size > dev- > > > > gso_max_size) || << new > > > > > > unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && > > > (skb->ip_summed != CHECKSUM_UNNECESSARY))); > > > } > > > > > > What that will do is split up the monster GSO packets if they > > > ever > > > bleed > > > across from one device to another through the twisty mazes of > > > packet > > > processing paths. > > > > > > Since very few drivers have these gso_max_segs / gso_max_size, > > check > > could be done in their ndo_features_check() > > Actually, we already check for max_segs, just missing check for size > here: > > From 71a134f41c4aae8947241091300d21745aa237f2 Mon Sep 17 00:00:00 > 2001 > From: Stephen Hemminger <sthemmin@microsoft.com> > Date: Thu, 30 Nov 2017 09:45:11 -0800 > Subject: [PATCH] net: do not GSO if frame is too large > > This adds an additional check to breakup skb's that exceed a devices > GSO maximum size. The code was already checking for too many segments > but did not check size. > > This has been observed to be a problem when using containers on > Hyper-V/Azure where the allowed GSO maximum size is less than > maximum and skb's have gone through multiple layers to arrive > at the virtual device. > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> > --- > net/core/dev.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 07ed21d64f92..0bb398f3bfa3 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -2918,9 +2918,11 @@ static netdev_features_t > gso_features_check(const struct sk_buff *skb, > struct net_device *dev, > netdev_features_t > features) > { > + unsigned int gso_size = skb_shinfo(skb)->gso_size; > u16 gso_segs = skb_shinfo(skb)->gso_segs; > > - if (gso_segs > dev->gso_max_segs) > + if (gso_segs > dev->gso_max_segs || > + gso_size > dev->gso_max_size) > return features & ~NETIF_F_GSO_MASK; > > /* Support for GSO partial features requires software Yes, but check commit 743b03a83297690f0bd38c452a3bbb47d2be300a ("net: remove netdevice gso_min_segs") Plan was to get rid of the existing check, not adding new ones :/ ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 17:59 ` Eric Dumazet @ 2017-11-30 18:08 ` Stephen Hemminger 2017-11-30 18:10 ` Eric Dumazet 0 siblings, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2017-11-30 18:08 UTC (permalink / raw) To: Eric Dumazet Cc: Solio Sarabia, David Ahern, davem, netdev, sthemmin, shiny.sebastian On Thu, 30 Nov 2017 09:59:23 -0800 Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Thu, 2017-11-30 at 09:49 -0800, Stephen Hemminger wrote: > > On Thu, 30 Nov 2017 09:26:39 -0800 > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > > On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: > > > > > > > > > > > > The problem goes back into the core GSO networking code. > > > > Something like this is needed. > > > > > > > > static inline bool netif_needs_gso(struct sk_buff *skb, > > > > const struct net_device *dev, > > > > netdev_features_t features) > > > > { > > > > return skb_is_gso(skb) && > > > > (!skb_gso_ok(skb, features) || > > > > unlikely(skb_shinfo(skb)->gso_segs > dev- > > > > > gso_max_segs) || << new > > > > > > > > unlikely(skb_shinfo(skb)->gso_size > dev- > > > > > gso_max_size) || << new > > > > > > > > unlikely((skb->ip_summed != CHECKSUM_PARTIAL) && > > > > (skb->ip_summed != CHECKSUM_UNNECESSARY))); > > > > } > > > > > > > > What that will do is split up the monster GSO packets if they > > > > ever > > > > bleed > > > > across from one device to another through the twisty mazes of > > > > packet > > > > processing paths. > > > > > > > > > Since very few drivers have these gso_max_segs / gso_max_size, > > > check > > > could be done in their ndo_features_check() > > > > Actually, we already check for max_segs, just missing check for size > > here: > > > > From 71a134f41c4aae8947241091300d21745aa237f2 Mon Sep 17 00:00:00 > > 2001 > > From: Stephen Hemminger <sthemmin@microsoft.com> > > Date: Thu, 30 Nov 2017 09:45:11 -0800 > > Subject: [PATCH] net: do not GSO if frame is too large > > > > This adds an additional check to breakup skb's that exceed a devices > > GSO maximum size. The code was already checking for too many segments > > but did not check size. > > > > This has been observed to be a problem when using containers on > > Hyper-V/Azure where the allowed GSO maximum size is less than > > maximum and skb's have gone through multiple layers to arrive > > at the virtual device. > > > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> > > --- > > net/core/dev.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 07ed21d64f92..0bb398f3bfa3 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -2918,9 +2918,11 @@ static netdev_features_t > > gso_features_check(const struct sk_buff *skb, > > struct net_device *dev, > > netdev_features_t > > features) > > { > > + unsigned int gso_size = skb_shinfo(skb)->gso_size; > > u16 gso_segs = skb_shinfo(skb)->gso_segs; > > > > - if (gso_segs > dev->gso_max_segs) > > + if (gso_segs > dev->gso_max_segs || > > + gso_size > dev->gso_max_size) > > return features & ~NETIF_F_GSO_MASK; > > > > /* Support for GSO partial features requires software > > > Yes, but check commit 743b03a83297690f0bd38c452a3bbb47d2be300a > ("net: remove netdevice gso_min_segs") > > Plan was to get rid of the existing check, not adding new ones :/ Sure can do it in the driver and that has other benefits like ability to backport to older distributions. 
Still need gso_max_size though since want to tell TCP to avoid generating mega-jumbo frames. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-30 18:08 ` Stephen Hemminger @ 2017-11-30 18:10 ` Eric Dumazet 0 siblings, 0 replies; 22+ messages in thread From: Eric Dumazet @ 2017-11-30 18:10 UTC (permalink / raw) To: Stephen Hemminger Cc: Solio Sarabia, David Ahern, davem, netdev, sthemmin, shiny.sebastian On Thu, 2017-11-30 at 10:08 -0800, Stephen Hemminger wrote: > On Thu, 30 Nov 2017 09:59:23 -0800 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > On Thu, 2017-11-30 at 09:49 -0800, Stephen Hemminger wrote: > > > On Thu, 30 Nov 2017 09:26:39 -0800 > > > Eric Dumazet <eric.dumazet@gmail.com> wrote: > > > > > > > On Thu, 2017-11-30 at 09:10 -0800, Stephen Hemminger wrote: > > > > > > > > > > > > > > > The problem goes back into the core GSO networking code. > > > > > Something like this is needed. > > > > > > > > > > static inline bool netif_needs_gso(struct sk_buff *skb, > > > > > const struct net_device > > > > > *dev, > > > > > netdev_features_t features) > > > > > { > > > > > return skb_is_gso(skb) && > > > > > (!skb_gso_ok(skb, features) || > > > > > unlikely(skb_shinfo(skb)->gso_segs > dev- > > > > > > gso_max_segs) || << new > > > > > > > > > > unlikely(skb_shinfo(skb)->gso_size > dev- > > > > > > gso_max_size) || << new > > > > > > > > > > unlikely((skb->ip_summed != CHECKSUM_PARTIAL) > > > > > && > > > > > (skb->ip_summed != > > > > > CHECKSUM_UNNECESSARY))); > > > > > } > > > > > > > > > > What that will do is split up the monster GSO packets if they > > > > > ever > > > > > bleed > > > > > across from one device to another through the twisty mazes of > > > > > packet > > > > > processing paths. > > > > > > > > > > > > Since very few drivers have these gso_max_segs / gso_max_size, > > > > check > > > > could be done in their ndo_features_check() > > > > > > Actually, we already check for max_segs, just missing check for > > > size > > > here: > > > > > > From 71a134f41c4aae8947241091300d21745aa237f2 Mon Sep 17 00:00:00 > > > 2001 > > > From: Stephen Hemminger <sthemmin@microsoft.com> > > > Date: Thu, 30 Nov 2017 09:45:11 -0800 > > > Subject: [PATCH] net: do not GSO if frame is too large > > > > > > This adds an additional check to breakup skb's that exceed a > > > devices > > > GSO maximum size. The code was already checking for too many > > > segments > > > but did not check size. > > > > > > This has been observed to be a problem when using containers on > > > Hyper-V/Azure where the allowed GSO maximum size is less than > > > maximum and skb's have gone through multiple layers to arrive > > > at the virtual device. 
> > > > > > Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> > > > --- > > > net/core/dev.c | 4 +++- > > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 07ed21d64f92..0bb398f3bfa3 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -2918,9 +2918,11 @@ static netdev_features_t > > > gso_features_check(const struct sk_buff *skb, > > > struct net_device > > > *dev, > > > netdev_features_t > > > features) > > > { > > > + unsigned int gso_size = skb_shinfo(skb)->gso_size; > > > u16 gso_segs = skb_shinfo(skb)->gso_segs; > > > > > > - if (gso_segs > dev->gso_max_segs) > > > + if (gso_segs > dev->gso_max_segs || > > > + gso_size > dev->gso_max_size) > > > return features & ~NETIF_F_GSO_MASK; > > > > > > /* Support for GSO partial features requires software > > > > > > Yes, but check commit 743b03a83297690f0bd38c452a3bbb47d2be300a > > ("net: remove netdevice gso_min_segs") > > > > Plan was to get rid of the existing check, not adding new ones :/ > > Sure can do it in the driver and that has other benefits like ability > to backport to older distributions. > > Still need gso_max_size though since want to tell TCP to avoid > generating mega-jumbo frames. > Sure, the netdev->gso_max_{size|segs} are staying. I was simply trying to not add another check in fast path :/ Thanks. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 2/2] veth: propagate bridge GSO to peer 2017-11-28 2:02 ` David Ahern 2017-11-30 0:35 ` Solio Sarabia @ 2017-12-01 20:30 ` Stephen Hemminger 1 sibling, 0 replies; 22+ messages in thread From: Stephen Hemminger @ 2017-12-01 20:30 UTC (permalink / raw) To: David Ahern; +Cc: Solio Sarabia, davem, netdev, sthemmin On Mon, 27 Nov 2017 19:02:01 -0700 David Ahern <dsahern@gmail.com> wrote: > On 11/27/17 6:42 PM, Solio Sarabia wrote: > > On Mon, Nov 27, 2017 at 01:15:02PM -0800, Stephen Hemminger wrote: > >> On Mon, 27 Nov 2017 12:14:19 -0800 > >> Solio Sarabia <solio.sarabia@intel.com> wrote: > >> > >>> On Sun, Nov 26, 2017 at 11:07:25PM -0800, Stephen Hemminger wrote: > >>>> On Sun, 26 Nov 2017 20:13:39 -0700 > >>>> David Ahern <dsahern@gmail.com> wrote: > >>>> > >>>>> On 11/26/17 11:17 AM, Stephen Hemminger wrote: > >>>>>> This allows veth device in containers to see the GSO maximum > >>>>>> settings of the actual device being used for output. > >>>>> > >>>>> veth devices can be added to a VRF instead of a bridge, and I do not > >>>>> believe the gso propagation works for L3 master devices. > >>>>> > >>>>> From a quick grep, team devices do not appear to handle gso changes either. > >>>> > >>>> This code should still work correctly, but no optimization would happen. > >>>> The gso_max_size of the VRF or team will > >>>> still be GSO_MAX_SIZE so there would be no change. If VRF or Team ever got smart > >>>> enough to handle GSO limits, then the algorithm would handle it. > >>> > >>> This patch propagates gso value from bridge to its veth endpoints. > >>> However, since bridge is never aware of the GSO limit from underlying > >>> interfaces, bridge/veth still have larger GSO size. > >>> > >>> In the docker case, bridge is not linked directly to physical or > >>> synthetic interfaces; it relies on iptables to decide which interface to > >>> forward packets to. > >> > >> So for the docker case, then direct control of GSO values via netlink (ie ip link set) > >> seems like the better solution. > > > > Adding ioctl support for 'ip link set' would work. I'm still concerned > > how to enforce the upper limit to not exceed that of the lower devices. > > > > Consider a system with three NICs, each reporting values in the range > > [60,000 - 62,780]. Users could set virtual interfaces' gso to 65,536, > > exceeding the limit, and having the host do sw gso (vms settings must > > not affect host performance.) > > > > Looping through interfaces? With the difference that now it'd be > > trigger upon user's request, not every time a veth is created (like one > > previous patch discussed.) > > > > You are concerned about the routed case right? One option is to have VRF > devices propagate gso sizes to all devices (veth, vlan, etc) enslaved to > it. VRF devices are Layer 3 master devices so an L3 parallel to a bridge. See the patch set I posted today which punts the problem to veth setup. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 0/2] veth, bridge, and GSO maximums 2017-11-26 18:17 [PATCH RFC 0/2] veth, bridge, and GSO maximums Stephen Hemminger 2017-11-26 18:17 ` [PATCH RFC 1/2] br: add notifier for when bridge changes it " Stephen Hemminger 2017-11-26 18:17 ` [PATCH RFC 2/2] veth: propagate bridge GSO to peer Stephen Hemminger @ 2017-11-30 15:47 ` David Miller 2017-11-30 17:11 ` Stephen Hemminger 2 siblings, 1 reply; 22+ messages in thread From: David Miller @ 2017-11-30 15:47 UTC (permalink / raw) To: stephen; +Cc: netdev, sthemmin From: Stephen Hemminger <stephen@networkplumber.org> Date: Sun, 26 Nov 2017 10:17:47 -0800 > This pair of patchesimproves the performance when running > containers in an environment where underlying device has lower > GSO maximum (such as Azure). > > With containers a veth pair is created and one end is attached > to the bridge device. The bridge device correctly reports > computes GSO parameters that are the minimum of the lower devices. > > The problem is that the other end of the veth device (in container) > reports the full GSO size. This patch propogates the upper > (bridge device) parameters to the other end of the veth device. > > Please consider it as alternative to the sysfs GSO changes. I like this approach a lot, please resubmit this formally. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 0/2] veth, bridge, and GSO maximums 2017-11-30 15:47 ` [PATCH RFC 0/2] veth, bridge, and GSO maximums David Miller @ 2017-11-30 17:11 ` Stephen Hemminger 2017-11-30 20:50 ` Alexander Duyck 0 siblings, 1 reply; 22+ messages in thread From: Stephen Hemminger @ 2017-11-30 17:11 UTC (permalink / raw) To: David Miller; +Cc: netdev, sthemmin On Thu, 30 Nov 2017 10:47:21 -0500 (EST) David Miller <davem@davemloft.net> wrote: > From: Stephen Hemminger <stephen@networkplumber.org> > Date: Sun, 26 Nov 2017 10:17:47 -0800 > > > This pair of patchesimproves the performance when running > > containers in an environment where underlying device has lower > > GSO maximum (such as Azure). > > > > With containers a veth pair is created and one end is attached > > to the bridge device. The bridge device correctly reports > > computes GSO parameters that are the minimum of the lower devices. > > > > The problem is that the other end of the veth device (in container) > > reports the full GSO size. This patch propogates the upper > > (bridge device) parameters to the other end of the veth device. > > > > Please consider it as alternative to the sysfs GSO changes. > > I like this approach a lot, please resubmit this formally. Will do and add netif_needs_gso check as well. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH RFC 0/2] veth, bridge, and GSO maximums 2017-11-30 17:11 ` Stephen Hemminger @ 2017-11-30 20:50 ` Alexander Duyck 0 siblings, 0 replies; 22+ messages in thread From: Alexander Duyck @ 2017-11-30 20:50 UTC (permalink / raw) To: Stephen Hemminger; +Cc: David Miller, Netdev, sthemmin On Thu, Nov 30, 2017 at 9:11 AM, Stephen Hemminger <stephen@networkplumber.org> wrote: > On Thu, 30 Nov 2017 10:47:21 -0500 (EST) > David Miller <davem@davemloft.net> wrote: > >> From: Stephen Hemminger <stephen@networkplumber.org> >> Date: Sun, 26 Nov 2017 10:17:47 -0800 >> >> > This pair of patchesimproves the performance when running >> > containers in an environment where underlying device has lower >> > GSO maximum (such as Azure). >> > >> > With containers a veth pair is created and one end is attached >> > to the bridge device. The bridge device correctly reports >> > computes GSO parameters that are the minimum of the lower devices. >> > >> > The problem is that the other end of the veth device (in container) >> > reports the full GSO size. This patch propogates the upper >> > (bridge device) parameters to the other end of the veth device. >> > >> > Please consider it as alternative to the sysfs GSO changes. >> >> I like this approach a lot, please resubmit this formally. > > Will do and add netif_needs_gso check as well. Would it make sense to look at possibly moving something like this over to being handled by something like GSO_PARTIAL? Essentially it is the same type of issue where we are needing to split a TSO frame into smaller TSO frames. We already did something like this for the GRO with skb->fraglist case, I wonder if it wouldn't make sense to just handle the oversized GSO the same way. It seems like you could just split add a check against the size and tweak the mss at the start of skb_segment and that should be about all you need to do to take care of it. If I am not mistaken we could probably just tweak the one line that was computing "partial_segs" in skb_segment so that we did something like "partial_segs = min(len, dev->gso_max_size) / mss". This way we take care of the issue for things like GRO as well which doesn't take the max size into account last I knew, and you would still get most of the benefits of TSO. - Alex ^ permalink raw reply [flat|nested] 22+ messages in thread
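As a sketch of Alexander's suggestion (not a tested change): skb_segment() currently receives only the skb and the feature flags, not the output device, so a size limit would have to be plumbed in -- for example as an extra argument -- before the partial-segment computation could become something like:

	/* Assumes a max_size limit has been passed down to skb_segment();
	 * today the function has no access to dev->gso_max_size.
	 */
	partial_segs = min(len, max_size) / mss;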
end of thread, other threads:[~2017-12-01 20:30 UTC | newest]

Thread overview: 22+ messages
2017-11-26 18:17 [PATCH RFC 0/2] veth, bridge, and GSO maximums Stephen Hemminger
2017-11-26 18:17 ` [PATCH RFC 1/2] br: add notifier for when bridge changes it " Stephen Hemminger
2017-11-26 18:17 ` [PATCH RFC 2/2] veth: propagate bridge GSO to peer Stephen Hemminger
2017-11-27  3:13   ` David Ahern
2017-11-27  7:07     ` Stephen Hemminger
2017-11-27 20:14       ` Solio Sarabia
2017-11-27 21:15         ` Stephen Hemminger
2017-11-28  1:42           ` Solio Sarabia
2017-11-28  2:02             ` David Ahern
2017-11-30  0:35               ` Solio Sarabia
2017-11-30 17:10                 ` Stephen Hemminger
2017-11-30 17:26                   ` Eric Dumazet
2017-11-30 17:36                     ` Stephen Hemminger
2017-11-30 17:38                     ` David Miller
2017-11-30 17:49                     ` Stephen Hemminger
2017-11-30 17:59                       ` Eric Dumazet
2017-11-30 18:08                         ` Stephen Hemminger
2017-11-30 18:10                           ` Eric Dumazet
2017-12-01 20:30             ` Stephen Hemminger
2017-11-30 15:47 ` [PATCH RFC 0/2] veth, bridge, and GSO maximums David Miller
2017-11-30 17:11   ` Stephen Hemminger
2017-11-30 20:50     ` Alexander Duyck