From: Jarod Wilson <jarod@redhat.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Michal Kubecek <mkubecek@suse.cz>,
	linux-kernel@vger.kernel.org,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jay Vosburgh <j.vosburgh@gmail.com>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <gospo@cumulusnetworks.com>,
	Jiri Pirko <jiri@resnulli.us>,
	Nikolay Aleksandrov <razor@blackwall.org>,
	netdev@vger.kernel.org
Subject: Re: [RFC PATCH net-next] net/core: initial support for stacked dev feature toggles
Date: Mon, 02 Nov 2015 12:37:57 -0500
Message-ID: <56379F75.2000606@redhat.com>
In-Reply-To: <5633CCED.7060602@gmail.com>

Alexander Duyck wrote:
> On 10/30/2015 09:25 AM, Jarod Wilson wrote:
...
>> Rather than outright dropping the second bit though, I was thinking
>> maybe just drop a note in dmesg along the lines of "hey, you shut off
>> LRO, it is still enabled on upper dev foo", to placate end-users.
>
> I would rather not see it. It would be mostly noise. It is perfectly
> valid to have LRO advertised on an upper device, but not supported on a
> lower one. It basically just means that the path will allow LRO frames
> through, it doesn't guarantee that we are going to provide them.

Okay, dropping this.

...
>>>> Same thing here. If a lower dev has it disabled then leave it
>>>> disabled. I believe your goal is to make it so that
>>>> dev_disable_lro() can shut down LRO when it is making packets in the
>>>> data-path unusable.
>>>
>>> This is already the case since commit fbe168ba91f7 ("net: generic
>>> dev_disable_lro() stacked device handling"). That commit makes sure
>>> dev_disable_lro() is propagated down the stack and also makes sure new
>>> slaves added to a bond/team with LRO disabled have it disabled too.
>>>
>>> What it does not do is propagate LRO disabling down if it is disabled
>>> in ways that do not call dev_disable_lro() (e.g. via ethtool). I'm not
>>> sure if this should be done or not, both options have their pros and
>>> cons.
>>
>> Making it work with ethtool was one of my primary goals with this
>> change, as it was users prodding things with ethtool that prompted the
>> "hey, this doesn't make sense" bug reports.
>
> I'd say make it work like dev_disable_lro already does. Disabling LRO
> propagates down, enabling LRO only enables it on the specific device.
>
> The way to think of it is as a warning flag. With LRO enabled this
> device may report frames larger than MTU to the stack and will mangle
> checksums. Without LRO all of the frames received should be restricted
> to MTU. That is why you have to force the disabling down to all lower
> devices, and why you cannot enable it if an upper device has it disabled.
>
>>> However, I believe enabling LRO shouldn't be propagated down.
>>
>> Hm. Devices that should never have LRO enabled still won't get it
>> enabled, so I'm not clear what harm it would cause. I tend to think you
>
> How do you define "devices that should never have LRO enabled"?

No NETIF_F_LRO flag set in hw_features is what I was thinking.

> The fact
> is LRO is very messy in terms of the way it functions. Different drivers
> handle it in different ways. Usually it results in the Rx checksum being
> mangled, it provides frames larger than MTU, and uses fraglist instead
> of frags on some drivers.
>
>> do want this sync'ing down the stack if set on an upper dev (i.e.,
>> ethtool -K bond0 lro on), for consistency's sake. You can always come
>> back through afterwards and disable things on lower devs individually if
>> they're really not wanted, since we're in agreement that we shouldn't
>> prevent disabling features on lower devices.
>
> Think of it this way. Let's say I have a NIC that I know is problematic
> when LRO is enabled, it might cause a kernel panic due to an skb
> overrun. So I have a bond with it and some other NIC which can run with
> LRO enabled without issues. How do I enable LRO on the other device
> without causing a kernel panic, and without tearing apart the existing
> bond? With the approach you have described I can't because I have to
> enable it at the bond and doing so will enable it on the NIC with the
> faulty implementation.

I'd argue that if enabling LRO on a device causes a panic, that device 
probably shouldn't be advertising LRO support, and the driver ought to 
be fixed, but that's somewhat tangential. I'm already sold on only 
disabling down the stack.
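
To make sure I'm describing the same rule you are, here is a rough
userspace sketch of it. This is not kernel code: the toy_dev struct,
TOY_F_LRO and both helpers are hypothetical names made up for
illustration, and the real thing would hang off netdev feature
handling rather than a hand-rolled tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag for this sketch, not the kernel's NETIF_F_LRO. */
#define TOY_F_LRO      (1u << 0)
#define TOY_MAX_LOWER  4

/* Toy stand-in for a net device in a bond/team style stack. */
struct toy_dev {
	uint32_t hw_features;              /* what the driver claims to support */
	uint32_t features;                 /* what is currently enabled */
	struct toy_dev *upper;             /* NULL at the top of the stack */
	struct toy_dev *lower[TOY_MAX_LOWER];
	size_t n_lower;
};

/* Disabling always succeeds and is pushed down to every lower device. */
static void toy_disable(struct toy_dev *dev, uint32_t feature)
{
	dev->features &= ~feature;
	for (size_t i = 0; i < dev->n_lower; i++)
		toy_disable(dev->lower[i], feature);
}

/*
 * Enabling touches only the device it was requested on. It is refused
 * if the device does not advertise the feature, or if any upper device
 * currently has it turned off.
 */
static bool toy_enable(struct toy_dev *dev, uint32_t feature)
{
	if ((dev->hw_features & feature) != feature)
		return false;
	for (struct toy_dev *up = dev->upper; up; up = up->upper)
		if ((up->features & feature) != feature)
			return false;
	dev->features |= feature;
	return true;
}
```

With a bond over two slaves, toy_disable() on the bond clears the flag
on both slaves, while toy_enable() on the bond leaves the slaves alone,
so a slave with a problematic LRO implementation can stay off.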

> This is why we cannot enable LRO unless all upper devices support it,
> and why we should propagate disabling LRO down to all lower devices.
> Trying to force it on for a lower device just because the upper device
> supports it is a bad idea because there are multiple LRO implementations
> and they all behave very differently.

That's a bit concerning, given that a bond defaults to LRO on, as do 
all of its slaves, regardless of which LRO implementation each device 
has (so long as the driver claims to support LRO, anyway).

But again, that's probably a separate issue. I've got a forthcoming 
patch that I'm still touching up, but I think it looks sane and lines 
up with what you've suggested.

> If nothing else you might start looking at working with a mask of
> bits that function like this.  You could probably start with GRO,
> LRO, and RXCSUM and work your way up from there.  If they aren't set
> on the upper devices you cannot enable them, and if they are cleared
> then they must be cleared on all lower devices.

For step one, I've added a feature mask and a new helper that iterates 
over it looking for set feature flags. On the bnx2x-equipped host I'm 
currently testing on, adding RXCSUM to the mask had an interesting and 
as-yet-unexplained side effect: it prevented LRO from being enabled on 
the bnx2x cards -- ethtool showed "off [requested on]".
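
Roughly, the mask-plus-helper idea looks like the userspace sketch
below. It is only an illustration under assumptions: the TOY_* bit
numbers, the TOY_UPPER_DISABLES mask and toy_sync_lower() are
hypothetical names, not what the patch or the kernel actually uses.

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
enum {
	TOY_GRO_BIT    = 0,
	TOY_LRO_BIT    = 1,
	TOY_RXCSUM_BIT = 2,
	TOY_TSO_BIT    = 3,	/* deliberately left out of the mask below */
};

#define TOY_BIT(n)  (UINT64_C(1) << (n))

/* Features that must be cleared on lower devs when an upper dev clears them. */
#define TOY_UPPER_DISABLES \
	(TOY_BIT(TOY_GRO_BIT) | TOY_BIT(TOY_LRO_BIT) | TOY_BIT(TOY_RXCSUM_BIT))

/*
 * Walk every flag set in the mask. For each one that the upper device
 * has off but the lower device has on, clear it in the lower device's
 * feature set. Flags outside the mask are never touched.
 */
static uint64_t toy_sync_lower(uint64_t upper_features, uint64_t lower_features)
{
	for (int bit = 0; bit < 64; bit++) {
		uint64_t feature = TOY_BIT(bit);

		if (!(TOY_UPPER_DISABLES & feature))
			continue;
		if (!(upper_features & feature) && (lower_features & feature))
			lower_features &= ~feature;
	}
	return lower_features;
}
```

Note that the helper only ever clears bits, so a lower device that
already has a feature off stays off no matter what the upper device
has set.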

-- 
Jarod Wilson
jarod@redhat.com
