* Re: [PATCHv1 net] xen-netback: support frontends without feature-rx-notify again
From: Wei Liu @ 2014-12-18 11:51 UTC (permalink / raw)
To: David Vrabel; +Cc: netdev, xen-devel, Ian Campbell, Wei Liu
In-Reply-To: <1418901186-14478-1-git-send-email-david.vrabel@citrix.com>
On Thu, Dec 18, 2014 at 11:13:06AM +0000, David Vrabel wrote:
> Commit bc96f648df1bbc2729abbb84513cf4f64273a1f1 (xen-netback: make
> feature-rx-notify mandatory) incorrectly assumed that there were no
> frontends in use that did not support this feature. But the frontend
> driver in MiniOS does not and since this is used by (qemu) stubdoms,
> these stopped working.
>
> Netback sort of works as-is in this mode except:
>
> - If there are no Rx requests and the internal Rx queue fills, only
> the drain timeout will wake the thread. The default drain timeout
> of 10 s would give unacceptable pauses.
>
> - If an Rx stall was detected and the internal Rx queue is drained,
> then the Rx thread would never wake.
>
> Handle these two cases (when feature-rx-notify is disabled) by:
>
> - Reducing the drain timeout to 30 ms.
>
> - Disabling Rx stall detection.
>
> Reported-by: John <jw@nuclearfallout.net>
> Tested-by: John <jw@nuclearfallout.net>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
^ permalink raw reply
* [PATCH net] be2net: Fix incorrect setting of tunnel offload flag in netdev features
From: Sriharsha Basavapatna @ 2014-12-19 4:30 UTC (permalink / raw)
To: netdev
An earlier commit to resolve an issue with encapsulation offloads missed
setting a bit in the outer netdev features flag. This results in loss of TSO
feature on a VxLAN interface.
Fixes: 630f4b70 ("Export tunnel offloads only when a VxLAN tunnel is created")
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
---
drivers/net/ethernet/emulex/benet/be_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 2aacd47..1960731 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3138,6 +3138,7 @@ static void be_disable_vxlan_offloads(struct be_adapter *adapter)
netdev->hw_enc_features = 0;
netdev->hw_features &= ~(NETIF_F_GSO_UDP_TUNNEL);
+ netdev->features &= ~(NETIF_F_GSO_UDP_TUNNEL);
}
#endif
@@ -4429,6 +4430,7 @@ static void be_add_vxlan_port(struct net_device *netdev, sa_family_t sa_family,
NETIF_F_TSO | NETIF_F_TSO6 |
NETIF_F_GSO_UDP_TUNNEL;
netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+ netdev->features |= NETIF_F_GSO_UDP_TUNNEL;
dev_info(dev, "Enabled VxLAN offloads for UDP port %d\n",
be16_to_cpu(port));
--
1.7.9.5
^ permalink raw reply related
* Re: [iproute2] tc: Show classes more hierarchically]
From: Marcelo Ricardo Leitner @ 2014-12-18 12:26 UTC (permalink / raw)
To: Vadim Kochan; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20141218031257.GA19527@angus-think.lan>
On 18-12-2014 01:12, Vadim Kochan wrote:
> On Wed, Dec 17, 2014 at 11:56:04PM -0200, Marcelo Ricardo Leitner wrote:
>> On 17-12-2014 21:56, Vadim Kochan wrote:
>>> On Wed, Dec 17, 2014 at 11:55:35AM -0800, Stephen Hemminger wrote:
>>>> On Tue, 16 Dec 2014 16:12:41 -0200
>>>> Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
>>>>
>>>>> On 15-12-2014 20:48, vadim4j@gmail.com wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I am playing with showing classes in more hierarchically format and I
>>>>>> have some code and example of output from my TC looks like:
>>>>>>
>>>>>> # tc/tc -t class show dev tap0
>>>>>>
>>>>>> \---1:2 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:40 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:50 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:60 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:1 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:10 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:11 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:111 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:20 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> \---1:30 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>>
>>>>>>
>>>>>> which in standart output mode it looks like:
>>>>>>
>>>>>> # tc/tc class show dev tap0
>>>>>>
>>>>>> class htb 1:11 parent 1:10 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:111 parent 1:11 prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:10 parent 1:1 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
>>>>>> class htb 1:1 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:20 parent 1:1 leaf 20: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:2 root rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:30 parent 1:1 leaf 30: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:40 parent 1:2 leaf 40: prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
>>>>>> class htb 1:50 parent 1:2 leaf 50: prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>> class htb 1:60 parent 1:2 leaf 60: prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>>>
>>>>>> So I'd like to ask if it might be useful for the TC users (may be
>>>>>> better format ?) to have this ?
>>>>>
>>>>> Good idea! It already looks good, but what about:
>>>>>
>>>>> |-- 1:2 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>> | |-- 1:40 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>> | |-- 1:50 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>> | '-- 1:60 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>> |-- 1:1 (htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>>> ...
>>>>>
>>>>> just another idea..
>>>>>
>>>>> Thanks.
>>>>> Marcelo
>>>>
>>>> There are several places that also print tree format, hopefully there would
>>>> be reusable code (lspci, tree, ps).
>>>>
>>>
>>> OK, currently I have the following output:
>>>
>>> +---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
>>> | +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>> |
>>> +---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
>>> | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | |
>>> | +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>> |
>>> +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>>
>>> How about this ?
>>
>> Looks very good to me, thanks!
>>
>>> Regards,
>>> Vadim Kochan
>>>
>>
>
> There is an exampe output of tree with stats:
>
> +---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> | | | * Send 100 pkts ...
> | | | * Rate 10mbit ...
> | +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | * Send 100 pkts ...
> | | | * Rate 10mbit ...
> | +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> |
> +---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | * Send 100 pkts ...
> | * Rate 10mbit ...
> +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | * Send 100 pkts ...
> | | | * Rate 10mbit ...
> | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | | * Send 100 pkts ...
> | | | | * Rate 10mbit ...
^ these are confusing IMHO, rest looks good to me
Marcelo
> | | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | * Send 100 pkts ...
> | | | * Rate 10mbit ...
> | |
> | +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> |
> +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | * Send 100 pkts ...
> | * Rate 10mbit ...
>
>
> Yeah, this is bigger one ...
>
> Regards,
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
^ permalink raw reply
* Re: [iproute2] tc: Show classes more hierarchically]
From: Vadim Kochan @ 2014-12-18 12:23 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: Vadim Kochan, Stephen Hemminger, netdev
In-Reply-To: <5492C7FD.8060603@gmail.com>
On Thu, Dec 18, 2014 at 10:26:37AM -0200, Marcelo Ricardo Leitner wrote:
> >+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > | * Send 100 pkts ...
> > | * Rate 10mbit ...
> > +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> > | | * Send 100 pkts ...
> > | | * Rate 10mbit ...
> > | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > | | | * Send 100 pkts ...
> > | | | * Rate 10mbit ...
> > | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > | | | | * Send 100 pkts ...
> > | | | | * Rate 10mbit ...
> ^ these are confusing IMHO, rest looks good to me
>
> Marcelo
>
Yes, just fixed it to :
+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | * Send 100 pkts ...
| | * Rate 10mbit ...
| +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| | * Send 100 pkts ...
| | * Rate 10mbit ...
| +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | * Send 100 pkts ...
| | * Rate 10mbit ...
| +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| * Send 100 pkts ...
| * Rate 10mbit ...
|
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| * Send 100 pkts ...
| * Rate 10mbit ...
+---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| | * Send 100 pkts ...
| | * Rate 10mbit ...
| +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | | * Send 100 pkts ...
| | | * Rate 10mbit ...
| | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| | | * Send 100 pkts ...
| | | * Rate 10mbit ...
| | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| | * Send 100 pkts ...
| | * Rate 10mbit ...
| |
| +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| * Send 100 pkts ...
| * Rate 10mbit ...
|
+---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| * Send 100 pkts ...
| * Rate 10mbit ...
+---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
* Send 100 pkts ...
* Rate 10mbit ...
I am not sure about the better format to print stats - use '*' or '>' as
prefix, '*' - seems better ?
Regards,
^ permalink raw reply
* Re: [iproute2] tc: Show classes more hierarchically]
From: Marcelo Ricardo Leitner @ 2014-12-18 12:46 UTC (permalink / raw)
To: Vadim Kochan; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20141218122329.GA31878@angus-think.lan>
On 18-12-2014 10:23, Vadim Kochan wrote:
> On Thu, Dec 18, 2014 at 10:26:37AM -0200, Marcelo Ricardo Leitner wrote:
>>> +---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | * Send 100 pkts ...
>>> | * Rate 10mbit ...
>>> +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
>>> | | * Send 100 pkts ...
>>> | | * Rate 10mbit ...
>>> | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | | | * Send 100 pkts ...
>>> | | | * Rate 10mbit ...
>>> | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
>>> | | | | * Send 100 pkts ...
>>> | | | | * Rate 10mbit ...
>> ^ these are confusing IMHO, rest looks good to me
>>
>> Marcelo
>>
>
>
> Yes, just fixed it to :
Cool
> +---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | * Send 100 pkts ...
> | * Rate 10mbit ...
> |
> +---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | * Send 100 pkts ...
> | * Rate 10mbit ...
> +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | * Send 100 pkts ...
> | | | * Rate 10mbit ...
> | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | * Send 100 pkts ...
> | | | * Rate 10mbit ...
> | | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | | * Send 100 pkts ...
> | | * Rate 10mbit ...
> | |
> | +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> | * Send 100 pkts ...
> | * Rate 10mbit ...
> |
> +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | * Send 100 pkts ...
> | * Rate 10mbit ...
> +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> * Send 100 pkts ...
> * Rate 10mbit ...
>
> I am not sure about the better format to print stats - use '*' or '>' as
> prefix, '*' - seems better ?
TBH I don't think we need one in there. It's already linked to the items
due to the tree on them. If you just align them to start right where the
the first word after the ')' begins, it would be good. Like:
| +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | | Send 100 pkts ...
| | | Rate 10mbit ...
| | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb
cburst 1599b
| | | Send 100 pkts ...
| | | Rate 10mbit ...
| | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb
cburst 1599b
But I don't know how large this will get..
Yet I don't mind having one marker in there. Let's see what others think
about it.
Regards,
Marcelo
^ permalink raw reply
* Re: [PATCH] brcmsmac: don't leak kernel memory via printk()
From: Arend van Spriel @ 2014-12-18 13:01 UTC (permalink / raw)
To: Brian Norris
Cc: John W. Linville, Kalle Valo, David S. Miller, Brett Rudley,
Franky (Zhenhui) Lin, Hante Meuleman, linux-wireless,
brcm80211-dev-list, netdev
In-Reply-To: <CAN8TOE--Vfa1L8dcoV=up1bA+QX0_=+DkdDa7fAv6DJBF8PHow@mail.gmail.com>
On 12/18/14 07:32, Brian Norris wrote:
> + others [1]
>
> On Wed, Dec 10, 2014 at 1:39 AM, Brian Norris
> <computersforpeace@gmail.com> wrote:
>> Debug code prints the fifo name via custom dev_warn() wrappers. The
>> fifo_names array is only non-zero when debugging is manually enabled,
>> which is all well and good. However, it's *not* good that this array
>> uses zero-length arrays in the non-debug case, and so it doesn't
>> actually have any memory allocated to it. This means that as far as we
>> know, fifo_names[i] actually points to garbage memory.
>>
>> I've seen this in my log:
>>
>> [ 4601.205511] brcmsmac bcma0:1: wl0: brcms_c_d11hdrs_mac80211: �GeL txop exceeded phylen 137/256 dur 1602/1504
>>
>> So let's give this array space enough to fill it with a NULL byte.
>>
>> Signed-off-by: Brian Norris<computersforpeace@gmail.com>
>> Cc: Brett Rudley<brudley@broadcom.com>
>> Cc: Arend van Spriel<arend@broadcom.com>
>> Cc: "Franky (Zhenhui) Lin"<frankyl@broadcom.com>
>> Cc: Hante Meuleman<meuleman@broadcom.com>
>> Cc: "John W. Linville"<linville@tuxdriver.com>
>> Cc: linux-wireless@vger.kernel.org
>> Cc: brcm80211-dev-list@broadcom.com
>> Cc: netdev@vger.kernel.org
>
> BTW, I guess this qualifies as a security hole, albeit a small one.
> Should this be CC: stable@vger.kernel.org?
I have no strong opinion on this, but I guess. Feel free to do so.
Regards,
Arend
>> ---
>> drivers/net/wireless/brcm80211/brcmsmac/main.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/brcm80211/brcmsmac/main.c b/drivers/net/wireless/brcm80211/brcmsmac/main.c
>> index 1b474828d5b8..aed0c948dce8 100644
>> --- a/drivers/net/wireless/brcm80211/brcmsmac/main.c
>> +++ b/drivers/net/wireless/brcm80211/brcmsmac/main.c
>> @@ -316,7 +316,7 @@ static const u16 xmtfifo_sz[][NFIFO] = {
>> static const char * const fifo_names[] = {
>> "AC_BK", "AC_BE", "AC_VI", "AC_VO", "BCMC", "ATIM" };
>> #else
>> -static const char fifo_names[6][0];
>> +static const char fifo_names[6][1];
>> #endif
>>
>> #ifdef DEBUG
>
> Brian
>
> [1] http://lwn.net/Articles/626689/
^ permalink raw reply
* Re: Bug: mv643xxx fails with highmem
From: Ezequiel Garcia @ 2014-12-18 13:13 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: David Miller, Nimrod Andy, Fabio Estevam, netdev, fugang.duan
In-Reply-To: <20141218000357.GX11285@n2100.arm.linux.org.uk>
On 12/17/2014 09:03 PM, Russell King - ARM Linux wrote:
> On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
>> On the other side, I haven't been able to reproduce this on my boards. I
>> did try to put a hack to hold most lowmem pages, but it didn't make a
>> difference. (In fact, I haven't been able to clearly see how the pages
>> for the skbuff are allocated from high memory.)
>
> To be honest, I don't know either. All that I can do is describe what
> happened...
>
> I've been running 3.17 since a week after it came out, and never saw a
> problem there.
>
> Then I moved forward to 3.18, and ended up with memory corruption, which
> seemed to be the GPU scribbling over kernel text (since the oops revealed
> pixel values in the Code: line.)
>
> I thought it was a GPU problem - which seemed a reasonable assumption as
> I know that the runtime PM I implemented for the GPU doesn't properly
> restore the hardware state yet. So, I rebooted back into 3.18, this
> time with all GPU users disabled, intending to download a kernel with
> GPU runtime PM disabled. I'd also added additional debug to my X DDX
> driver which logged the GPU command stream to a file on a NFS mount -
> this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
> time it submits a block of commands.
>
> However, while my scripts to download the built kernel to the Cubox
> were running, the kernel oopsed in the depths of dma_map_single() - the
> kernel was trying to access a struct page for phys address 0x40000000,
> which didn't exist. I decided to go back to 3.17 to get the updated
> kernel on it, hoping that would sort it out.
>
> With the updated 3.18 kernel (with GPU runtime PM disabled), I found
> that I'd still get oopses in from the network driver while X was starting
> up, again from dma_map_single(). So, with all GPU users again disabled,
> I set about debugging the this issue.
>
> I added a BUG_ON(!addr) after the page_address(), and that fired. I
> added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
> (Each time, I had to boot back to 3.17 in order to download the new
> kernel, because very time I tried with 3.18, I'd hit this bug.)
>
> It's then when I reported the issue and asked the questions...
>
> I've since done a simple change, taking advantage that on ARM (or any
> asm-generic/dma-mapping-common.h user), dma_unmap_single() and
> dma_unmap_page() are the same function:
>
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index d44560d1d268..c343ab03ab8b 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -879,10 +879,8 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
> skb_frag_t *this_frag;
> int tx_index;
> struct tx_desc *desc;
> - void *addr;
>
> this_frag = &skb_shinfo(skb)->frags[frag];
> - addr = page_address(this_frag->page.p) + this_frag->page_offset; tx_index = txq->tx_curr_desc++;
> if (txq->tx_curr_desc == txq->tx_ring_size)
> txq->tx_curr_desc = 0;
> @@ -902,8 +900,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
>
> desc->l4i_chk = 0;
> desc->byte_cnt = skb_frag_size(this_frag);
> - desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
> - desc->byte_cnt, DMA_TO_DEVICE);
> + desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
> + this_frag, 0,
> + desc->byte_cnt, DMA_TO_DEVICE); }
> }
>
>
> I've been running that for the last five days, and I've yet to see
> /any/ issues what so ever, and that includes running with the GPU
> logging all that time:
>
> -rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin
>
> During that time, I've been using the device over the network, running
> various git commands, running builds, running the occasional build
> via NFS, etc.
>
> So, for me it was trivially easy to reproduce (without my fix in place)
> and all problems have gone away when I've fixed the apparent problem.
>
Well that's interesting. You've fixed only the non-TSO egress path,
yet your original ethtool output showed tcp-segmentation-offload enabled.
This seems to imply the highmem pages are found only for the non-TSO path.
> However, exactly how it occurs, I don't know. My understanding from
> reading the various feature flags was that NETIF_F_HIGHDMA was required
> for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
> be seeing highmem fragments - which is why I asked the question in my
> original email.
>
> If you want me to revert my fix above, and reproduce again, I can
> certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
> in there, but I seem to remember that it wasn't particularly useful as
> the backtrace didn't show where the memory actually came from.
>
No, that's OK. Thanks a lot for all the details. I'll try to come up with a
fix soon.
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
^ permalink raw reply
* Re: [iproute2] tc: Show classes more hierarchically]
From: Vadim Kochan @ 2014-12-18 13:16 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: Vadim Kochan, Stephen Hemminger, netdev
In-Reply-To: <5492CC8E.6050108@gmail.com>
On Thu, Dec 18, 2014 at 10:46:06AM -0200, Marcelo Ricardo Leitner wrote:
> On 18-12-2014 10:23, Vadim Kochan wrote:
> >On Thu, Dec 18, 2014 at 10:26:37AM -0200, Marcelo Ricardo Leitner wrote:
> >>>+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> >>> | * Send 100 pkts ...
> >>> | * Rate 10mbit ...
> >>> +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> >>> | | * Send 100 pkts ...
> >>> | | * Rate 10mbit ...
> >>> | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> >>> | | | * Send 100 pkts ...
> >>> | | | * Rate 10mbit ...
> >>> | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >>> | | | | * Send 100 pkts ...
> >>> | | | | * Rate 10mbit ...
> >> ^ these are confusing IMHO, rest looks good to me
> >>
> >> Marcelo
> >>
> >
> >
> >Yes, just fixed it to :
>
> Cool
>
> >+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> >| | * Send 100 pkts ...
> >| | * Rate 10mbit ...
> >| +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> >| | * Send 100 pkts ...
> >| | * Rate 10mbit ...
> >| +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> >| | * Send 100 pkts ...
> >| | * Rate 10mbit ...
> >| +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> >| * Send 100 pkts ...
> >| * Rate 10mbit ...
> >|
> >+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > | * Send 100 pkts ...
> > | * Rate 10mbit ...
> > +---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
> > | | * Send 100 pkts ...
> > | | * Rate 10mbit ...
> > | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > | | | * Send 100 pkts ...
> > | | | * Rate 10mbit ...
> > | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > | | | * Send 100 pkts ...
> > | | | * Rate 10mbit ...
> > | | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > | | * Send 100 pkts ...
> > | | * Rate 10mbit ...
> > | |
> > | +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > | * Send 100 pkts ...
> > | * Rate 10mbit ...
> > |
> > +---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> > | * Send 100 pkts ...
> > | * Rate 10mbit ...
> > +---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
> > * Send 100 pkts ...
> > * Rate 10mbit ...
> >
> >I am not sure about the better format to print stats - use '*' or '>' as
> >prefix, '*' - seems better ?
>
> TBH I don't think we need one in there. It's already linked to the items due
> to the tree on them. If you just align them to start right where the the
> first word after the ')' begins, it would be good. Like:
>
> | +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
> | | | Send 100 pkts ...
> | | | Rate 10mbit ...
> | | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst
> 1599b
> | | | Send 100 pkts ...
> | | | Rate 10mbit ...
> | | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst
> 1599b
>
> But I don't know how large this will get..
>
> Yet I don't mind having one marker in there. Let's see what others think
> about it.
>
> Regards,
> Marcelo
>
I corrected regarding to your comments:
+---1:2(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | Send 100 pkts ...
| | Rate 10mbit ...
| |
| +---1:40(htb) prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| | Send 100 pkts ...
| | Rate 10mbit ...
| |
| +---1:50(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | Send 100 pkts ...
| | Rate 10mbit ...
| |
| +---1:60(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| Send 100 pkts ...
| Rate 10mbit ...
|
+---1:1(htb) rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
| Send 100 pkts ...
| Rate 10mbit ...
|
+---1:10(htb) rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
| | Send 100 pkts ...
| | Rate 10mbit ...
| |
| +---1:11(htb) rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| | | Send 100 pkts ...
| | | Rate 10mbit ...
| | |
| | +---1:111(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| | | Send 100 pkts ...
| | | Rate 10mbit ...
| | |
| | +---1:112(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| | Send 100 pkts ...
| | Rate 10mbit ...
| |
| +---1:12(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
| Send 100 pkts ...
| Rate 10mbit ...
|
+---1:20(htb) prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
| Send 100 pkts ...
| Rate 10mbit ...
|
+---1:30(htb) prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
Send 100 pkts ...
Rate 10mbit ...
The problem that this is huge now but looks better visually, I am
thinking also about to possibiliy to show only some part of tree by specified parent id ...
But how much classes can be used usually for traffic control per device ?
Regards,
Vadim Kochan
^ permalink raw reply
* Re: [iproute2] tc: Show classes more hierarchically]
From: Daniel Borkmann @ 2014-12-18 13:47 UTC (permalink / raw)
To: Vadim Kochan; +Cc: Marcelo Ricardo Leitner, Stephen Hemminger, netdev, jhs
In-Reply-To: <20141218131653.GA5980@angus-think.lan>
On 12/18/2014 02:16 PM, Vadim Kochan wrote:
...
> The problem that this is huge now but looks better visually, I am
> thinking also about to possibiliy to show only some part of tree by specified parent id ...
I definitely like this work!
Just thinking out loud, what about an output option for tc which is plain
DOT [1], thus it can be handed over for rendering in tools like graphviz?
It won't require any dependencies either as it's just plaintext.
[1] https://en.wikipedia.org/wiki/DOT_%28graph_description_language%29
^ permalink raw reply
* Re: [iproute2] tc: Show classes more hierarchically]
From: Vadim Kochan @ 2014-12-18 13:46 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Vadim Kochan, Marcelo Ricardo Leitner, Stephen Hemminger, netdev,
jhs
In-Reply-To: <5492DAFA.5070001@redhat.com>
On Thu, Dec 18, 2014 at 02:47:38PM +0100, Daniel Borkmann wrote:
> On 12/18/2014 02:16 PM, Vadim Kochan wrote:
> ...
> >The problem that this is huge now but looks better visually, I am
> >thinking also about to possibiliy to show only some part of tree by specified parent id ...
>
> I definitely like this work!
>
> Just thinking out loud, what about an output option for tc which is plain
> DOT [1], thus it can be handed over for rendering in tools like graphviz?
>
> It won't require any dependencies either as it's just plaintext.
>
> [1] https://en.wikipedia.org/wiki/DOT_%28graph_description_language%29
Yeah, good idea :-)
^ permalink raw reply
* Re: [PATCH net] enic: fix rx skb checksum
From: Sergei Shtylyov @ 2014-12-18 13:58 UTC (permalink / raw)
To: Govindarajulu Varadarajan, davem, netdev, ssujith, benve
Cc: Jiri Benc, Stefan Assmann
In-Reply-To: <1418898522-13588-1-git-send-email-_govind@gmx.com>
Hello.
On 12/18/2014 1:28 PM, Govindarajulu Varadarajan wrote:
> Hardware always provides compliment of IP pseudo checksum. Stack expects
> whole packet checksum without pseudo checksum if CHECKSUM_COMPLETE is set.
> This causes checksum error in nf & ovs.
> kernel: qg-19546f09-f2: hw csum failure
> kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF O-------------- 3.10.0-123.8.1.el7.x86_64 #1
> kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
> kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b
> kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0
> kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00
> kernel: Call Trace:
> kernel: <IRQ> [<ffffffff815e237b>] dump_stack+0x19/0x1b
> kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40
> kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70
> kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20
> kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100
> kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4]
> kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0
> kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch]
> kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
> kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
> kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
> kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0
> kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
> kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140
> kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
> kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380
> Hardware verifies IP & tcp/udp header checksum but does not provide payload
> checksum, use CHECKSUM_UNNECESSARY. Set it only if its valid IP tcp/udp packet.
> Cc: Jiri Benc <jbenc@redhat.com>
> Cc: Stefan Assmann <sassmann@redhat.com>
> Reported-by: Sunil Choudhary <schoudha@redhat.com>
> Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
> ---
> drivers/net/ethernet/cisco/enic/enic_main.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
> diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
> index 868d0f6..705f334 100644
> --- a/drivers/net/ethernet/cisco/enic/enic_main.c
> +++ b/drivers/net/ethernet/cisco/enic/enic_main.c
> @@ -1060,10 +1060,14 @@ static void enic_rq_indicate_buf(struct vnic_rq *rq,
> PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
> }
>
> - if ((netdev->features & NETIF_F_RXCSUM) && !csum_not_calc) {
> - skb->csum = htons(checksum);
> - skb->ip_summed = CHECKSUM_COMPLETE;
> - }
> + /* Hardware does not provide whole packet checksum. It only
> + * provides pseudo checksum. Since hw validates the packet
> + * checksum but not provide us the checksum value. use
s/not/doesn't/, s/value./value,/.
[...]
WBR, Sergei
^ permalink raw reply
* Re: [iproute2] tc: Show classes more hierarchically]
From: Thomas Graf @ 2014-12-18 13:58 UTC (permalink / raw)
To: Vadim Kochan
Cc: Daniel Borkmann, Marcelo Ricardo Leitner, Stephen Hemminger,
netdev, jhs
In-Reply-To: <20141218134621.GA9985@angus-think.lan>
On 12/18/14 at 03:46pm, Vadim Kochan wrote:
> On Thu, Dec 18, 2014 at 02:47:38PM +0100, Daniel Borkmann wrote:
> > On 12/18/2014 02:16 PM, Vadim Kochan wrote:
> > ...
> > >The problem that this is huge now but looks better visually, I am
> > >thinking also about to possibiliy to show only some part of tree by specified parent id ...
> >
> > I definitely like this work!
> >
> > Just thinking out loud, what about an output option for tc which is plain
> > DOT [1], thus it can be handed over for rendering in tools like graphviz?
> >
> > It won't require any dependencies either as it's just plaintext.
> >
> > [1] https://en.wikipedia.org/wiki/DOT_%28graph_description_language%29
>
> Yeah, good idea :-)
tcng had something like that ;-)
^ permalink raw reply
* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Roopa Prabhu @ 2014-12-18 14:44 UTC (permalink / raw)
To: Varlese, Marco
Cc: netdev@vger.kernel.org, Fastabend, John R, Thomas Graf,
Jiri Pirko, sfeldma@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B104AD7B82@IRSMSX108.ger.corp.intel.com>
On 12/18/14, 3:29 AM, Varlese, Marco wrote:
> From: Marco Varlese <marco.varlese@intel.com>
>
> Switch hardware offers a list of attributes that are configurable
> on a per port basis.
> This patch provides a mechanism to configure switch ports by adding
> an NDO for setting specific values to specific attributes.
> There will be a separate patch that adds the "get" functionality via
> another NDO and another patch that extends iproute2 to call the two
> new NDOs.
>
> Signed-off-by: Marco Varlese <marco.varlese@intel.com>
> ---
> include/linux/netdevice.h | 5 ++++
> include/uapi/linux/if_link.h | 15 ++++++++++++
> net/core/rtnetlink.c | 54 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 74 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index c31f74d..4881c7b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
> * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
> * Called to notify switch device port of bridge port STP
> * state change.
> + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> + * u32 attr, u64 value);
> + * Called to set specific switch ports attributes.
> */
> struct net_device_ops {
> int (*ndo_init)(struct net_device *dev);
> @@ -1185,6 +1188,8 @@ struct net_device_ops {
> struct netdev_phys_item_id *psid);
> int (*ndo_switch_port_stp_update)(struct net_device *dev,
> u8 state);
> + int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> + u32 attr, u64 value);
> #endif
> };
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..19cb51a 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -146,6 +146,7 @@ enum {
> IFLA_PHYS_PORT_ID,
> IFLA_CARRIER_CHANGES,
> IFLA_PHYS_SWITCH_ID,
> + IFLA_SWITCH_PORT_CFG,
> __IFLA_MAX
> };
>
> @@ -603,4 +604,18 @@ enum {
>
> #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
>
> +/* Switch Port Attributes section */
> +
> +enum {
> + IFLA_ATTR_UNSPEC,
> + IFLA_ATTR_LEARNING,
Any reason you want learning here ?. This is covered as part of the
bridge setlink attributes.
> + IFLA_ATTR_LOOPBACK,
> + IFLA_ATTR_BCAST_FLOODING,
> + IFLA_ATTR_UCAST_FLOODING,
> + IFLA_ATTR_MCAST_FLOODING,
> + __IFLA_ATTR_MAX
> +};
> +
> +#define IFLA_ATTR_MAX (__IFLA_ATTR_MAX - 1)
> +
> #endif /* _UAPI_LINUX_IF_LINK_H */
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index eaa057f..c0d1c45 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1389,6 +1389,46 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
> return 0;
> }
>
> +#ifdef CONFIG_NET_SWITCHDEV
> +static int do_setswcfg(struct net_device *dev, struct nlattr *attr)
> +{
> + int rem, err = -EINVAL;
> + struct nlattr *v;
> + const struct net_device_ops *ops = dev->netdev_ops;
> +
> + nla_for_each_nested(v, attr, rem) {
> + u32 op = nla_type(v);
> + u64 value = 0;
> +
> + switch (op) {
> + case IFLA_ATTR_LEARNING:
> + case IFLA_ATTR_LOOPBACK:
> + case IFLA_ATTR_BCAST_FLOODING:
> + case IFLA_ATTR_UCAST_FLOODING:
> + case IFLA_ATTR_MCAST_FLOODING: {
> + if (nla_len(v) < sizeof(value)) {
> + err = -EINVAL;
> + break;
> + }
> +
> + value = nla_get_u64(v);
> + err = ops->ndo_switch_port_set_cfg(dev,
> + op,
> + value);
> + break;
> + }
> + default:
> + err = -EINVAL;
> + break;
> + }
> + if (err)
> + break;
> + }
> + return err;
> +}
> +
> +#endif
> +
> static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> {
> int rem, err = -EINVAL;
> @@ -1740,6 +1780,20 @@ static int do_setlink(const struct sk_buff *skb,
> status |= DO_SETLINK_NOTIFY;
> }
> }
> +#ifdef CONFIG_NET_SWITCHDEV
> + if (tb[IFLA_SWITCH_PORT_CFG]) {
> + err = -EOPNOTSUPP;
> + if (!ops->ndo_switch_port_set_cfg)
> + goto errout;
> + if (!ops->ndo_switch_parent_id_get)
> + goto errout;
> + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
> + if (err < 0)
> + goto errout;
> +
> + status |= DO_SETLINK_NOTIFY;
> + }
> +#endif
> err = 0;
>
> errout:
^ permalink raw reply
* RE: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Varlese, Marco @ 2014-12-18 14:55 UTC (permalink / raw)
To: Roopa Prabhu
Cc: netdev@vger.kernel.org, Fastabend, John R, Thomas Graf,
Jiri Pirko, sfeldma@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <5492E85C.6010802@cumulusnetworks.com>
> -----Original Message-----
> From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
> Sent: Thursday, December 18, 2014 2:45 PM
> To: Varlese, Marco
> Cc: netdev@vger.kernel.org; Fastabend, John R; Thomas Graf; Jiri Pirko;
> sfeldma@gmail.com; linux-kernel@vger.kernel.org
> Subject: Re: [RFC PATCH net-next v2 1/1] net: Support for switch port
> configuration
>
> On 12/18/14, 3:29 AM, Varlese, Marco wrote:
> > From: Marco Varlese <marco.varlese@intel.com>
> >
> > Switch hardware offers a list of attributes that are configurable on a
> > per port basis.
> > This patch provides a mechanism to configure switch ports by adding an
> > NDO for setting specific values to specific attributes.
> > There will be a separate patch that adds the "get" functionality via
> > another NDO and another patch that extends iproute2 to call the two
> > new NDOs.
> >
> > Signed-off-by: Marco Varlese <marco.varlese@intel.com>
> > ---
> > include/linux/netdevice.h | 5 ++++
> > include/uapi/linux/if_link.h | 15 ++++++++++++
> > net/core/rtnetlink.c | 54
> ++++++++++++++++++++++++++++++++++++++++++++
> > 3 files changed, 74 insertions(+)
> >
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index c31f74d..4881c7b 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct
> net_device *dev,
> > * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
> > * Called to notify switch device port of bridge port STP
> > * state change.
> > + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> > + * u32 attr, u64 value);
> > + * Called to set specific switch ports attributes.
> > */
> > struct net_device_ops {
> > int (*ndo_init)(struct net_device *dev);
> > @@ -1185,6 +1188,8 @@ struct net_device_ops {
> > struct
> netdev_phys_item_id *psid);
> > int (*ndo_switch_port_stp_update)(struct
> net_device *dev,
> > u8 state);
> > + int (*ndo_switch_port_set_cfg)(struct
> net_device *dev,
> > + u32 attr, u64 value);
> > #endif
> > };
> >
> > diff --git a/include/uapi/linux/if_link.h
> > b/include/uapi/linux/if_link.h index f7d0d2d..19cb51a 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -146,6 +146,7 @@ enum {
> > IFLA_PHYS_PORT_ID,
> > IFLA_CARRIER_CHANGES,
> > IFLA_PHYS_SWITCH_ID,
> > + IFLA_SWITCH_PORT_CFG,
> > __IFLA_MAX
> > };
> >
> > @@ -603,4 +604,18 @@ enum {
> >
> > #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
> >
> > +/* Switch Port Attributes section */
> > +
> > +enum {
> > + IFLA_ATTR_UNSPEC,
> > + IFLA_ATTR_LEARNING,
> Any reason you want learning here ?. This is covered as part of the bridge
> setlink attributes.
>
Yes, because the user may _not_ want to go through a bridge interface necessarily.
> > + IFLA_ATTR_LOOPBACK,
> > + IFLA_ATTR_BCAST_FLOODING,
> > + IFLA_ATTR_UCAST_FLOODING,
> > + IFLA_ATTR_MCAST_FLOODING,
> > + __IFLA_ATTR_MAX
> > +};
> > +
> > +#define IFLA_ATTR_MAX (__IFLA_ATTR_MAX - 1)
> > +
> > #endif /* _UAPI_LINUX_IF_LINK_H */
> > diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index
> > eaa057f..c0d1c45 100644
> > --- a/net/core/rtnetlink.c
> > +++ b/net/core/rtnetlink.c
> > @@ -1389,6 +1389,46 @@ static int validate_linkmsg(struct net_device
> *dev, struct nlattr *tb[])
> > return 0;
> > }
> >
> > +#ifdef CONFIG_NET_SWITCHDEV
> > +static int do_setswcfg(struct net_device *dev, struct nlattr *attr) {
> > + int rem, err = -EINVAL;
> > + struct nlattr *v;
> > + const struct net_device_ops *ops = dev->netdev_ops;
> > +
> > + nla_for_each_nested(v, attr, rem) {
> > + u32 op = nla_type(v);
> > + u64 value = 0;
> > +
> > + switch (op) {
> > + case IFLA_ATTR_LEARNING:
> > + case IFLA_ATTR_LOOPBACK:
> > + case IFLA_ATTR_BCAST_FLOODING:
> > + case IFLA_ATTR_UCAST_FLOODING:
> > + case IFLA_ATTR_MCAST_FLOODING: {
> > + if (nla_len(v) < sizeof(value)) {
> > + err = -EINVAL;
> > + break;
> > + }
> > +
> > + value = nla_get_u64(v);
> > + err = ops->ndo_switch_port_set_cfg(dev,
> > + op,
> > + value);
> > + break;
> > + }
> > + default:
> > + err = -EINVAL;
> > + break;
> > + }
> > + if (err)
> > + break;
> > + }
> > + return err;
> > +}
> > +
> > +#endif
> > +
> > static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> > {
> > int rem, err = -EINVAL;
> > @@ -1740,6 +1780,20 @@ static int do_setlink(const struct sk_buff *skb,
> > status |= DO_SETLINK_NOTIFY;
> > }
> > }
> > +#ifdef CONFIG_NET_SWITCHDEV
> > + if (tb[IFLA_SWITCH_PORT_CFG]) {
> > + err = -EOPNOTSUPP;
> > + if (!ops->ndo_switch_port_set_cfg)
> > + goto errout;
> > + if (!ops->ndo_switch_parent_id_get)
> > + goto errout;
> > + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
> > + if (err < 0)
> > + goto errout;
> > +
> > + status |= DO_SETLINK_NOTIFY;
> > + }
> > +#endif
> > err = 0;
> >
> > errout:
^ permalink raw reply
* Re: [PATCH 01/10] core: Split out UFO6 support
From: Vlad Yasevich @ 2014-12-18 15:01 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: netdev, Vladislav Yasevich, virtualization, stefanha, ben,
David Miller
In-Reply-To: <20141218075409.GA31702@redhat.com>
On 12/18/2014 02:54 AM, Michael S. Tsirkin wrote:
> Cc Dave, pls remember to do this next time otherwise
> your patches won't get merged :)
>
> On Wed, Dec 17, 2014 at 06:31:50PM -0500, Vlad Yasevich wrote:
>> On 12/17/2014 05:45 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 17, 2014 at 01:20:46PM -0500, Vladislav Yasevich wrote:
>>>> Split IPv6 support for UFO into its own feature similiar to TSO.
>>>> This will later allow us to re-enable UFO support for virtio-net
>>>> devices.
>>>>
>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>> ---
>>>> include/linux/netdev_features.h | 7 +++++--
>>>> include/linux/netdevice.h | 1 +
>>>> include/linux/skbuff.h | 1 +
>>>> net/core/dev.c | 35 +++++++++++++++++++----------------
>>>> net/core/ethtool.c | 2 +-
>>>> 5 files changed, 27 insertions(+), 19 deletions(-)
>>>>
>>>> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
>>>> index dcfdecb..a078945 100644
>>>> --- a/include/linux/netdev_features.h
>>>> +++ b/include/linux/netdev_features.h
>>>> @@ -48,8 +48,9 @@ enum {
>>>> NETIF_F_GSO_UDP_TUNNEL_BIT, /* ... UDP TUNNEL with TSO */
>>>> NETIF_F_GSO_UDP_TUNNEL_CSUM_BIT,/* ... UDP TUNNEL with TSO & CSUM */
>>>> NETIF_F_GSO_MPLS_BIT, /* ... MPLS segmentation */
>>>> + NETIF_F_UFO6_BIT, /* ... UDPv6 fragmentation */
>>>> /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */
>>>> - NETIF_F_GSO_MPLS_BIT,
>>>> + NETIF_F_UFO6_BIT,
>>>>
>>>> NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */
>>>> NETIF_F_SCTP_CSUM_BIT, /* SCTP checksum offload */
>>>> @@ -109,6 +110,7 @@ enum {
>>>> #define NETIF_F_TSO_ECN __NETIF_F(TSO_ECN)
>>>> #define NETIF_F_TSO __NETIF_F(TSO)
>>>> #define NETIF_F_UFO __NETIF_F(UFO)
>>>> +#define NETIF_F_UFO6 __NETIF_F(UFO6)
>>>> #define NETIF_F_VLAN_CHALLENGED __NETIF_F(VLAN_CHALLENGED)
>>>> #define NETIF_F_RXFCS __NETIF_F(RXFCS)
>>>> #define NETIF_F_RXALL __NETIF_F(RXALL)
>>>> @@ -141,7 +143,7 @@ enum {
>>>>
>>>> /* List of features with software fallbacks. */
>>>> #define NETIF_F_GSO_SOFTWARE (NETIF_F_TSO | NETIF_F_TSO_ECN | \
>>>> - NETIF_F_TSO6 | NETIF_F_UFO)
>>>> + NETIF_F_TSO6 | NETIF_F_UFO | NETIF_F_UFO6)
>>>>
>>>> #define NETIF_F_GEN_CSUM NETIF_F_HW_CSUM
>>>> #define NETIF_F_V4_CSUM (NETIF_F_GEN_CSUM | NETIF_F_IP_CSUM)
>>>> @@ -149,6 +151,7 @@ enum {
>>>> #define NETIF_F_ALL_CSUM (NETIF_F_V4_CSUM | NETIF_F_V6_CSUM)
>>>>
>>>> #define NETIF_F_ALL_TSO (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
>>>> +#define NETIF_F_ALL_UFO (NETIF_F_UFO | NETIF_F_UFO6)
>>>>
>>>> #define NETIF_F_ALL_FCOE (NETIF_F_FCOE_CRC | NETIF_F_FCOE_MTU | \
>>>> NETIF_F_FSO)
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index 74fd5d3..86af10a 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -3559,6 +3559,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>>>> /* check flags correspondence */
>>>> BUILD_BUG_ON(SKB_GSO_TCPV4 != (NETIF_F_TSO >> NETIF_F_GSO_SHIFT));
>>>> BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_UFO >> NETIF_F_GSO_SHIFT));
>>>> + BUILD_BUG_ON(SKB_GSO_UDP6 != (NETIF_F_UFO6 >> NETIF_F_GSO_SHIFT));
>>>> BUILD_BUG_ON(SKB_GSO_DODGY != (NETIF_F_GSO_ROBUST >> NETIF_F_GSO_SHIFT));
>>>> BUILD_BUG_ON(SKB_GSO_TCP_ECN != (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>>>> BUILD_BUG_ON(SKB_GSO_TCPV6 != (NETIF_F_TSO6 >> NETIF_F_GSO_SHIFT));
>>>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>>>> index 6c8b6f6..8538b67 100644
>>>> --- a/include/linux/skbuff.h
>>>> +++ b/include/linux/skbuff.h
>>>> @@ -372,6 +372,7 @@ enum {
>>>>
>>>> SKB_GSO_MPLS = 1 << 12,
>>>>
>>>> + SKB_GSO_UDP6 = 1 << 13
>>>> };
>>>>
>>>> #if BITS_PER_LONG > 32
>>>
>>> So this implies anything getting GSO packets e.g.
>>> from userspace now needs to check IP version to
>>> set GSO type correctly.
>>>
>>> I think you missed some places that do this, e.g. af_packet
>>> sockets.
>>>
>>
>> I looked at af_packet sockets and they set this only in the event
>> vnet header has been used with a GSO type. In this case, the user
>> already knows the the type.
>
> Imagine you are receiving a packet:
>
> if (vnet_hdr.gso_type != VIRTIO_NET_HDR_GSO_NONE) {
> switch (vnet_hdr.gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
> case VIRTIO_NET_HDR_GSO_TCPV4:
> gso_type = SKB_GSO_TCPV4;
> break;
> case VIRTIO_NET_HDR_GSO_TCPV6:
> gso_type = SKB_GSO_TCPV6;
> break;
> case VIRTIO_NET_HDR_GSO_UDP:
> gso_type = SKB_GSO_UDP;
> break;
> default:
> goto out_unlock;
> }
>
> if (vnet_hdr.gso_type & VIRTIO_NET_HDR_GSO_ECN)
> gso_type |= SKB_GSO_TCP_ECN;
>
> if (vnet_hdr.gso_size == 0)
> goto out_unlock;
>
> }
>
> we used to report UFO6 as SKB_GSO_UDP, we probably
> should keep doing this, with your patch we select the
> goto out_unlock path.
>
>
No. The vnet_hdr.gso_type is still going to be VIRTIO_NET_HDR_GSO_UDP
since the UDP6 version isn't defined yet. So, it will be marked as
GSO_UDP.
This code most likely needs the same workaround as exists in tap and macvtap,
i.e select the proxy fragment id and decide what to do with gso_type.
>
>> It is true that with this series af_packets now can't do IPv6 UFO
>> since there is no VIRTIO_NET_HDR_GSO_UDPV6 yet.
>
> What do you mean by "do".
What I mean is that AF_PACKET sockets currently do not do IPv6 UFO
correctly, even after Ben's fixes to tap/macvtap. There is no
proxy fragment id selection in af_packet case and we have the
same problem Ben was trying address for tap/macvtap.
> Are we talking about sending or receiving packets?
I am talking about sending, see above.
> You seem to conflate the two.
>
> We always discarded ID on RX.
>
> For tun, this is xmit, so just by saying "this device can
> not do UFO" you will never get short packets.
>
You must mean long packets. This is actually an issue I've been thinking
about. With with your suggestion of switching the GSO type for legacy
applications we end up with fragments for IPv6 traffic. As a result,
legacy VMs will see a regression for large IPv6 datagrams.
>
>>
>> I suppose we could do something similar there as we do in tun code/macvtap code.
>> If that's the case, it currently broken as well.
>>
>> -vlad
>
>
> Broken is a big word.
>
> Let's stop conflating two directions.
I am not and was talking only about af_packet as that is what you asked about.
There is no tun/macvtap in play here. They are handled separately in their
respective drivers.
>
> Here's the way I look at it:
>
> 1. Userspace doesn't have a way to get fragment IDs
> from tun/macvtap/packet sockets.
> Presumably, not all users need these IDs.
> E.g. old guests clearly don't.
>
> We should either give them packets stripping the ID,
> like we always did, or make sure they never get these packets.
> Second option seems achievable for tun if we
> never tell linux it can do UFO, but I don't see
> how to do it for macvtap and packet socket.
>
macvtap is slightly problematic, but doable with some tricks.
packet socket is an interesting problem. The only way
an af_packet socket can receive an skb marked SKB_GSO_UDPV6
is if someone else on the host sent it (like another guest).
Since there is are no feature or vnet header length negotiations,
it is impossible to tell if an application using this af_packet
socket is capable of processing VIRTIO_NET_HDR_GSO_UDPV6
type (yet to be defined).
So, we can either use existing VIRTIO_NET_HDR_GSO_UDP on receive
path, add some kind of negotiation logic to packet socket (like
storing the application expected vnet header size), or perform
IPv6 fragmentation somehow.
Options 1 or 2 are doable.
>
> 2. Userspace doesn't have a way to set fragment IDs
> for tun/macvtap/packet sockets.
> Presumably, not all users have these IDs. E.g. old
> guests clearly don't.
>
> We should either generate our own ID,
> like we always did, or make sure we don't accept
> these packets.
> Second option is almost sure to break userspace,
> so it seems we must do the first one.
>
Right. This was missing from packet sockets. I can fix it.
-vlad
>
> 3.
> Exactly the same two things if you replace userspace
> with hypervisor and tun/macvtap/packet socket with
> virtio device.
>
>
> 4.
> As a next step, we should add a way for userspace
> to tell us that it has ids and can handle them.
>
>
>
>
>
>>>
>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>> index 945bbd0..fa4d2ee 100644
>>>> --- a/net/core/dev.c
>>>> +++ b/net/core/dev.c
>>>> @@ -5929,6 +5929,12 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>> features &= ~NETIF_F_ALL_TSO;
>>>> }
>>>>
>>>> + /* UFO requires that SG is present as well */
>>>> + if ((features & NETIF_F_ALL_UFO) && !(features & NETIF_F_SG)) {
>>>> + netdev_dbg(dev, "Dropping UFO features since no SG feature.\n");
>>>> + features &= ~NETIF_F_ALL_UFO;
>>>> + }
>>>> +
>>>> if ((features & NETIF_F_TSO) && !(features & NETIF_F_HW_CSUM) &&
>>>> !(features & NETIF_F_IP_CSUM)) {
>>>> netdev_dbg(dev, "Dropping TSO features since no CSUM feature.\n");
>>>> @@ -5952,24 +5958,21 @@ static netdev_features_t netdev_fix_features(struct net_device *dev,
>>>> features &= ~NETIF_F_GSO;
>>>> }
>>>>
>>>> - /* UFO needs SG and checksumming */
>>>> - if (features & NETIF_F_UFO) {
>>>> - /* maybe split UFO into V4 and V6? */
>>>> - if (!((features & NETIF_F_GEN_CSUM) ||
>>>> - (features & (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))
>>>> - == (NETIF_F_IP_CSUM|NETIF_F_IPV6_CSUM))) {
>>>> - netdev_dbg(dev,
>>>> - "Dropping NETIF_F_UFO since no checksum offload features.\n");
>>>> - features &= ~NETIF_F_UFO;
>>>> - }
>>>> -
>>>> - if (!(features & NETIF_F_SG)) {
>>>> - netdev_dbg(dev,
>>>> - "Dropping NETIF_F_UFO since no NETIF_F_SG feature.\n");
>>>> - features &= ~NETIF_F_UFO;
>>>> - }
>>>> + /* UFO also needs checksumming */
>>>> + if ((features & NETIF_F_UFO) && !(features & NETIF_F_GEN_CSUM) &&
>>>> + !(features & NETIF_F_IP_CSUM)) {
>>>> + netdev_dbg(dev,
>>>> + "Dropping NETIF_F_UFO since no checksum offload features.\n");
>>>> + features &= ~NETIF_F_UFO;
>>>> + }
>>>> + if ((features & NETIF_F_UFO6) && !(features & NETIF_F_GEN_CSUM) &&
>>>> + !(features & NETIF_F_IPV6_CSUM)) {
>>>> + netdev_dbg(dev,
>>>> + "Dropping NETIF_F_UFO6 since no checksum offload features.\n");
>>>> + features &= ~NETIF_F_UFO6;
>>>> }
>>>>
>>>> +
>>>> #ifdef CONFIG_NET_RX_BUSY_POLL
>>>> if (dev->netdev_ops->ndo_busy_poll)
>>>> features |= NETIF_F_BUSY_POLL;
>>>> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
>>>> index 06dfb29..93eff41 100644
>>>> --- a/net/core/ethtool.c
>>>> +++ b/net/core/ethtool.c
>>>> @@ -223,7 +223,7 @@ static netdev_features_t ethtool_get_feature_mask(u32 eth_cmd)
>>>> return NETIF_F_ALL_TSO;
>>>> case ETHTOOL_GUFO:
>>>> case ETHTOOL_SUFO:
>>>> - return NETIF_F_UFO;
>>>> + return NETIF_F_ALL_UFO;
>>>> case ETHTOOL_GGSO:
>>>> case ETHTOOL_SGSO:
>>>> return NETIF_F_GSO;
>>>> --
>>>> 1.9.3
^ permalink raw reply
* Re: [PATCH 08/10] tun: Re-uanble UFO support.
From: Vlad Yasevich @ 2014-12-18 15:12 UTC (permalink / raw)
To: Jason Wang, Vladislav Yasevich; +Cc: netdev, mst, ben, stefanha, virtualization
In-Reply-To: <878417685.243182.1418881887975.JavaMail.zimbra@redhat.com>
On 12/18/2014 12:51 AM, Jason Wang wrote:
>
>
> ----- Original Message -----
>> Now that UFO is split into v4 and v6 parts, we can bring
>> back v4 support without any trouble.
>>
>> Continue to handle legacy applications by selecting the
>> IPv6 fragment id but do not change the gso type. Thist
>> makes sure that two legacy VMs may still communicate.
>>
>> Based on original work from Ben Hutchings.
>>
>> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
>> CC: Ben Hutchings <ben@decadent.org.uk>
>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>> ---
>> drivers/net/tun.c | 26 ++++++++++++++------------
>> 1 file changed, 14 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 9dd3746..8c32fca 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -175,7 +175,7 @@ struct tun_struct {
>> struct net_device *dev;
>> netdev_features_t set_features;
>> #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
>> - NETIF_F_TSO6)
>> + NETIF_F_TSO6|NETIF_F_UFO)
>>
>> int vnet_hdr_sz;
>> int sndbuf;
>> @@ -1152,20 +1152,15 @@ static ssize_t tun_get_user(struct tun_struct *tun,
>> struct tun_file *tfile,
>> skb_shinfo(skb)->gso_type = SKB_GSO_TCPV6;
>> break;
>> case VIRTIO_NET_HDR_GSO_UDP:
>> - {
>> - static bool warned;
>> -
>> - if (!warned) {
>> - warned = true;
>> - netdev_warn(tun->dev,
>> - "%s: using disabled UFO feature; please fix this program\n",
>> - current->comm);
>> - }
>> skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
>> - if (skb->protocol == htons(ETH_P_IPV6))
>> + if (vlan_get_protocol(skb) == htons(ETH_P_IPV6)) {
>
> This probably means UDPv6 with vlan does not work well, looks like another
> independent fixe and also for stable.
Ok. I can split this out.
>> + /* This allows legacy application to work.
>> + * Do not change the gso_type as it may
>> + * not be upderstood by legacy applications.
>> + */
>
> Isn't this an issue that we use SKB_GSO_UDP for a UDPv6 packet? Especially
> consider we want to fix UFOv6 in the future? We probably can use
> SKB_GSO_UDP6 here and try to fix it in tun_put_user() if a legacy userspace
> is detected.
There is a slight problem with this. This will force fragmentation of IPv6
traffic between VMs since UFO6 wouldn't be enabled on a destination device.
So, suddenly older VMs will see a performance regression for large UDPv6 traffic.
With this code, a legacy VM will continue to receive large UDPv6 traffic.
New VMs will have an updated virtio-net which will reset the gso type on input.
>> ipv6_proxy_select_ident(skb);
>
> Question still for vlan, is network header correctly set here? Looks like
> ipv6_proxy_select_ident() depends on correct network header to work.
>> +
Yes, you are right... I wonder how that worked. I'll re-test.
Thanks
-vlad
}
>> break;
>> - }
>> default:
>> tun->dev->stats.rx_frame_errors++;
>> kfree_skb(skb);
>> @@ -1273,6 +1268,8 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>> gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>> else if (sinfo->gso_type & SKB_GSO_TCPV6)
>> gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
>> + else if (sinfo->gso_type & SKB_GSO_UDP)
>> + gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
>> else {
>> pr_err("unexpected GSO type: "
>> "0x%x, gso_size %d, hdr_len %d\n",
>> @@ -1780,6 +1777,11 @@ static int set_offload(struct tun_struct *tun,
>> unsigned long arg)
>> features |= NETIF_F_TSO6;
>> arg &= ~(TUN_F_TSO4|TUN_F_TSO6);
>> }
>> +
>> + if (arg & TUN_F_UFO) {
>> + features |= NETIF_F_UFO;
>> + arg &= ~TUN_F_UFO;
>> + }
>> }
>>
>> /* This gives the user a way to test for new features in future by
>> --
>> 1.9.3
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
^ permalink raw reply
* Re: [PATCH 09/10] macvtap: Re-enable UFO support
From: Vlad Yasevich @ 2014-12-18 15:15 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Vladislav Yasevich, netdev, virtualization, ben, stefanha
In-Reply-To: <20141218075549.GB31702@redhat.com>
On 12/18/2014 02:55 AM, Michael S. Tsirkin wrote:
> On Wed, Dec 17, 2014 at 09:43:51PM -0500, Vlad Yasevich wrote:
>> On 12/17/2014 05:41 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 17, 2014 at 01:20:54PM -0500, Vladislav Yasevich wrote:
>>>> Now that UFO is split into v4 and v6 parts, we can bring
>>>> back v4 support. Continue to handle legacy applications
>>>> by selecting the ipv6 fagment id but do not change the
>>>> gso type. This allows 2 legacy VMs to continue to communicate.
>>>>
>>>> Based on original work from Ben Hutchings.
>>>>
>>>> Fixes: 88e0e0e5aa7a ("drivers/net: Disable UFO through virtio")
>>>> CC: Ben Hutchings <ben@decadent.org.uk>
>>>> Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
>>>> ---
>>>> drivers/net/macvtap.c | 20 ++++++++++++++------
>>>> 1 file changed, 14 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>>>> index 880cc09..75febd4 100644
>>>> --- a/drivers/net/macvtap.c
>>>> +++ b/drivers/net/macvtap.c
>>>> @@ -66,7 +66,7 @@ static struct cdev macvtap_cdev;
>>>> static const struct proto_ops macvtap_socket_ops;
>>>>
>>>> #define TUN_OFFLOADS (NETIF_F_HW_CSUM | NETIF_F_TSO_ECN | NETIF_F_TSO | \
>>>> - NETIF_F_TSO6)
>>>> + NETIF_F_TSO6 | NETIF_F_UFO)
>>>> #define RX_OFFLOADS (NETIF_F_GRO | NETIF_F_LRO)
>>>> #define TAP_FEATURES (NETIF_F_GSO | NETIF_F_SG)
>>>>
>>>> @@ -570,11 +570,14 @@ static int macvtap_skb_from_vnet_hdr(struct sk_buff *skb,
>>>> gso_type = SKB_GSO_TCPV6;
>>>> break;
>>>> case VIRTIO_NET_HDR_GSO_UDP:
>>>> - pr_warn_once("macvtap: %s: using disabled UFO feature; please fix this program\n",
>>>> - current->comm);
>>>> gso_type = SKB_GSO_UDP;
>>>> - if (skb->protocol == htons(ETH_P_IPV6))
>>>> + if (vlan_get_protocol(skb) == htons(ETH_P_IPV6)) {
>>>> + /* This is to support legacy appliacations.
>>>> + * Do not change the gso_type as legacy apps
>>>> + * may not know about the new type.
>>>> + */
>>>> ipv6_proxy_select_ident(skb);
>>>> + }
>>>> break;
>>>> default:
>>>> return -EINVAL;
>>>> @@ -619,6 +622,8 @@ static void macvtap_skb_to_vnet_hdr(const struct sk_buff *skb,
>>>> vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
>>>> else if (sinfo->gso_type & SKB_GSO_TCPV6)
>>>> vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
>>>> + else if (sinfo->gso_type & SKB_GSO_UDP)
>>>> + vnet_hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
>>>> else
>>>> BUG();
>>>> if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>>>> @@ -955,6 +960,9 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
>>>> if (arg & TUN_F_TSO6)
>>>> feature_mask |= NETIF_F_TSO6;
>>>> }
>>>> +
>>>> + if (arg & TUN_F_UFO)
>>>> + feature_mask |= NETIF_F_UFO;
>>>> }
>>>>
>>>> /* tun/tap driver inverts the usage for TSO offloads, where
>>>> @@ -965,7 +973,7 @@ static int set_offload(struct macvtap_queue *q, unsigned long arg)
>>>> * When user space turns off TSO, we turn off GSO/LRO so that
>>>> * user-space will not receive TSO frames.
>>>> */
>>>> - if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6))
>>>> + if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO))
>>>> features |= RX_OFFLOADS;
>>>> else
>>>> features &= ~RX_OFFLOADS;
>>>
>>> By the way this logic is completely broken, even without your patch,
>>> isn't it?
>>>
>>> It says: enable LRO+GRO if any of NETIF_F_TSO | NETIF_F_TSO6 |
>>> NETIF_F_UFO set.
>>>
>>> So what happens if I enable TSO only?
>>> LRO gets enabled so I can still get TSO6 packets.
>>>
>>>
>>> This really should be:
>>>
>>> if (feature_mask & (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO) ==
>>> (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_UFO)
>>>
>>>
>>> fixing this probably should be a separate patch before your
>>> series, and Cc stable.
>>
>> Actually, all this LRO/GRO feature manipulation on the macvtap device is
>> rather pointless as those features aren't really checked at any point
>> on the macvtap receive path. GRO calls are done from napi context on
>> the lowest device, so by the time a packet hits macvtap, GRO/LRO has
>> already been done.
>>
>> So it doesn't really matter that we disable them incorrectly. I
>> can send a separate patch to clean this up.
>
> Hmm so if userspace says it can't handle TSO packets,
> it might get them anyway?
No. It will get individual segments because in macvtap_handle_frame
we use user specified features to decide if segmentation needs
to be performed or not.
-vlad
>
>
>>>
>>>
>>>> @@ -1066,7 +1074,7 @@ static long macvtap_ioctl(struct file *file, unsigned int cmd,
>>>> case TUNSETOFFLOAD:
>>>> /* let the user check for future flags */
>>>> if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 |
>>>> - TUN_F_TSO_ECN))
>>>> + TUN_F_TSO_ECN | TUN_F_UFO))
>>>> return -EINVAL;
>>>>
>>>> rtnl_lock();
>>>> --
>>>> 1.9.3
^ permalink raw reply
* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Roopa Prabhu @ 2014-12-18 15:16 UTC (permalink / raw)
To: Varlese, Marco
Cc: netdev@vger.kernel.org, Fastabend, John R, Thomas Graf,
Jiri Pirko, sfeldma@gmail.com, linux-kernel@vger.kernel.org
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B104AD9E53@IRSMSX108.ger.corp.intel.com>
On 12/18/14, 6:55 AM, Varlese, Marco wrote:
>> -----Original Message-----
>> From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
>> Sent: Thursday, December 18, 2014 2:45 PM
>> To: Varlese, Marco
>> Cc: netdev@vger.kernel.org; Fastabend, John R; Thomas Graf; Jiri Pirko;
>> sfeldma@gmail.com; linux-kernel@vger.kernel.org
>> Subject: Re: [RFC PATCH net-next v2 1/1] net: Support for switch port
>> configuration
>>
>> On 12/18/14, 3:29 AM, Varlese, Marco wrote:
>>> From: Marco Varlese <marco.varlese@intel.com>
>>>
>>> Switch hardware offers a list of attributes that are configurable on a
>>> per port basis.
>>> This patch provides a mechanism to configure switch ports by adding an
>>> NDO for setting specific values to specific attributes.
>>> There will be a separate patch that adds the "get" functionality via
>>> another NDO and another patch that extends iproute2 to call the two
>>> new NDOs.
>>>
>>> Signed-off-by: Marco Varlese <marco.varlese@intel.com>
>>> ---
>>> include/linux/netdevice.h | 5 ++++
>>> include/uapi/linux/if_link.h | 15 ++++++++++++
>>> net/core/rtnetlink.c | 54
>> ++++++++++++++++++++++++++++++++++++++++++++
>>> 3 files changed, 74 insertions(+)
>>>
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index c31f74d..4881c7b 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct
>> net_device *dev,
>>> * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>> * Called to notify switch device port of bridge port STP
>>> * state change.
>>> + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
>>> + * u32 attr, u64 value);
>>> + * Called to set specific switch ports attributes.
>>> */
>>> struct net_device_ops {
>>> int (*ndo_init)(struct net_device *dev);
>>> @@ -1185,6 +1188,8 @@ struct net_device_ops {
>>> struct
>> netdev_phys_item_id *psid);
>>> int (*ndo_switch_port_stp_update)(struct
>> net_device *dev,
>>> u8 state);
>>> + int (*ndo_switch_port_set_cfg)(struct
>> net_device *dev,
>>> + u32 attr, u64 value);
>>> #endif
>>> };
>>>
>>> diff --git a/include/uapi/linux/if_link.h
>>> b/include/uapi/linux/if_link.h index f7d0d2d..19cb51a 100644
>>> --- a/include/uapi/linux/if_link.h
>>> +++ b/include/uapi/linux/if_link.h
>>> @@ -146,6 +146,7 @@ enum {
>>> IFLA_PHYS_PORT_ID,
>>> IFLA_CARRIER_CHANGES,
>>> IFLA_PHYS_SWITCH_ID,
>>> + IFLA_SWITCH_PORT_CFG,
>>> __IFLA_MAX
>>> };
>>>
>>> @@ -603,4 +604,18 @@ enum {
>>>
>>> #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
>>>
>>> +/* Switch Port Attributes section */
>>> +
>>> +enum {
>>> + IFLA_ATTR_UNSPEC,
>>> + IFLA_ATTR_LEARNING,
>> Any reason you want learning here ?. This is covered as part of the bridge
>> setlink attributes.
>>
> Yes, because the user may _not_ want to go through a bridge interface necessarily.
But, the bridge setlink/getlink interface was changed to accommodate
'self' for exactly such cases.
I kind of understand your case for the other attributes (these are per
port settings that switch asics provide).
However, i don't understand the reason to pull in bridge attributes here.
>
>>> + IFLA_ATTR_LOOPBACK,
>>> + IFLA_ATTR_BCAST_FLOODING,
>>> + IFLA_ATTR_UCAST_FLOODING,
>>> + IFLA_ATTR_MCAST_FLOODING,
>>> + __IFLA_ATTR_MAX
>>> +};
>>> +
>>> +#define IFLA_ATTR_MAX (__IFLA_ATTR_MAX - 1)
>>> +
>>> #endif /* _UAPI_LINUX_IF_LINK_H */
>>> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index
>>> eaa057f..c0d1c45 100644
>>> --- a/net/core/rtnetlink.c
>>> +++ b/net/core/rtnetlink.c
>>> @@ -1389,6 +1389,46 @@ static int validate_linkmsg(struct net_device
>> *dev, struct nlattr *tb[])
>>> return 0;
>>> }
>>>
>>> +#ifdef CONFIG_NET_SWITCHDEV
>>> +static int do_setswcfg(struct net_device *dev, struct nlattr *attr) {
>>> + int rem, err = -EINVAL;
>>> + struct nlattr *v;
>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>> +
>>> + nla_for_each_nested(v, attr, rem) {
>>> + u32 op = nla_type(v);
>>> + u64 value = 0;
>>> +
>>> + switch (op) {
>>> + case IFLA_ATTR_LEARNING:
>>> + case IFLA_ATTR_LOOPBACK:
>>> + case IFLA_ATTR_BCAST_FLOODING:
>>> + case IFLA_ATTR_UCAST_FLOODING:
>>> + case IFLA_ATTR_MCAST_FLOODING: {
>>> + if (nla_len(v) < sizeof(value)) {
>>> + err = -EINVAL;
>>> + break;
>>> + }
>>> +
>>> + value = nla_get_u64(v);
>>> + err = ops->ndo_switch_port_set_cfg(dev,
>>> + op,
>>> + value);
>>> + break;
>>> + }
>>> + default:
>>> + err = -EINVAL;
>>> + break;
>>> + }
>>> + if (err)
>>> + break;
>>> + }
>>> + return err;
>>> +}
>>> +
>>> +#endif
>>> +
>>> static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
>>> {
>>> int rem, err = -EINVAL;
>>> @@ -1740,6 +1780,20 @@ static int do_setlink(const struct sk_buff *skb,
>>> status |= DO_SETLINK_NOTIFY;
>>> }
>>> }
>>> +#ifdef CONFIG_NET_SWITCHDEV
>>> + if (tb[IFLA_SWITCH_PORT_CFG]) {
>>> + err = -EOPNOTSUPP;
>>> + if (!ops->ndo_switch_port_set_cfg)
>>> + goto errout;
>>> + if (!ops->ndo_switch_parent_id_get)
>>> + goto errout;
>>> + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
>>> + if (err < 0)
>>> + goto errout;
>>> +
>>> + status |= DO_SETLINK_NOTIFY;
>>> + }
>>> +#endif
>>> err = 0;
>>>
>>> errout:
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* RE: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Varlese, Marco @ 2014-12-18 15:20 UTC (permalink / raw)
To: Thomas Graf
Cc: netdev@vger.kernel.org, Fastabend, John R, Jiri Pirko,
roopa@cumulusnetworks.com, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <20141218114145.GH28766@casper.infradead.org>
> -----Original Message-----
> From: Thomas Graf [mailto:tgr@infradead.org] On Behalf Of Thomas Graf
> Sent: Thursday, December 18, 2014 11:42 AM
> To: Varlese, Marco
> Cc: netdev@vger.kernel.org; Fastabend, John R; Jiri Pirko;
> roopa@cumulusnetworks.com; sfeldma@gmail.com; linux-
> kernel@vger.kernel.org
> Subject: Re: [RFC PATCH net-next v2 1/1] net: Support for switch port
> configuration
>
> On 12/18/14 at 11:29am, Varlese, Marco wrote:
> > diff --git a/include/uapi/linux/if_link.h
> > b/include/uapi/linux/if_link.h index f7d0d2d..19cb51a 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -146,6 +146,7 @@ enum {
> > IFLA_PHYS_PORT_ID,
> > IFLA_CARRIER_CHANGES,
> > IFLA_PHYS_SWITCH_ID,
> > + IFLA_SWITCH_PORT_CFG,
> > __IFLA_MAX
> > };
>
> Needs an entry in ifla_policy[]
>
> [IFLA_SWITCH_PORT_CFG] = { .type = NLA_NESTED },
>
> > @@ -603,4 +604,18 @@ enum {
> >
> > #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
> >
> > +/* Switch Port Attributes section */
> > +
> > +enum {
> > + IFLA_ATTR_UNSPEC,
> > + IFLA_ATTR_LEARNING,
> > + IFLA_ATTR_LOOPBACK,
> > + IFLA_ATTR_BCAST_FLOODING,
> > + IFLA_ATTR_UCAST_FLOODING,
> > + IFLA_ATTR_MCAST_FLOODING,
> > + __IFLA_ATTR_MAX
> > +};
> > +
> > +#define IFLA_ATTR_MAX (__IFLA_ATTR_MAX - 1)
>
> Change the prefix to IFLA_SW_* since it's switch specific?
>
> >
> > +#ifdef CONFIG_NET_SWITCHDEV
> > +static int do_setswcfg(struct net_device *dev, struct nlattr *attr) {
> > + int rem, err = -EINVAL;
> > + struct nlattr *v;
> > + const struct net_device_ops *ops = dev->netdev_ops;
> > +
> > + nla_for_each_nested(v, attr, rem) {
> > + u32 op = nla_type(v);
> > + u64 value = 0;
> > +
> > + switch (op) {
> > + case IFLA_ATTR_LEARNING:
> > + case IFLA_ATTR_LOOPBACK:
> > + case IFLA_ATTR_BCAST_FLOODING:
> > + case IFLA_ATTR_UCAST_FLOODING:
> > + case IFLA_ATTR_MCAST_FLOODING: {
> > + if (nla_len(v) < sizeof(value)) {
> > + err = -EINVAL;
> > + break;
> > + }
>
> You should validate the message before you start applying the changes.
> Otherwise if the 3rd attribute is too short you've already applied some
> changes and the user has not idea how much has been applied.
>
> nla_parse_nested() can help here.
>
>
> > +
> > + value = nla_get_u64(v);
> > + err = ops->ndo_switch_port_set_cfg(dev,
> > + op,
> > + value);
>
> This avoids having individual ndos but wastes a lot of space in the Netlink
> message. Not a problem when setting configuration but you likely want to
> dump these attributes as well and we need 12 bytes for each attribute even
> though some are merely flags which could fit in 4 bytes.
>
> > static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> > {
> > int rem, err = -EINVAL;
> > @@ -1740,6 +1780,20 @@ static int do_setlink(const struct sk_buff *skb,
> > status |= DO_SETLINK_NOTIFY;
> > }
> > }
> > +#ifdef CONFIG_NET_SWITCHDEV
> > + if (tb[IFLA_SWITCH_PORT_CFG]) {
> > + err = -EOPNOTSUPP;
> > + if (!ops->ndo_switch_port_set_cfg)
> > + goto errout;
> > + if (!ops->ndo_switch_parent_id_get)
> > + goto errout;
> > + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
> > + if (err < 0)
> > + goto errout;
> > +
> > + status |= DO_SETLINK_NOTIFY;
> > + }
> > +#endif
>
> Should return -EOPNOTSUPP if IFLA_SWITCH_PORT_CFG is provided but
> CONFIG_NET_SWITCHDEV is not set.
>
I think I've addressed your comments Thomas. Thanks for your suggestions. I'm going to resubmit the patch now.
Thanks,
Marco
^ permalink raw reply
* [RFC PATCH net-next v3 1/1] net: Support for switch port configuration
From: Varlese, Marco @ 2014-12-18 15:30 UTC (permalink / raw)
To: netdev@vger.kernel.org
Cc: Fastabend, John R, Thomas Graf, Jiri Pirko,
roopa@cumulusnetworks.com, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
From: Marco Varlese <marco.varlese@intel.com>
Switch hardware offers a list of attributes that are configurable on a per port
basis.
This patch provides a mechanism to configure switch ports by adding an NDO
for setting specific values to specific attributes.
There will be a separate patch that adds the "get" functionality via another
NDO and another patch that extends iproute2 to call the two new NDOs.
Signed-off-by: Marco Varlese <marco.varlese@intel.com>
---
include/linux/netdevice.h | 5 +++
include/uapi/linux/if_link.h | 15 +++++++++
net/core/rtnetlink.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 93 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index c31f74d..4881c7b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
* int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
* Called to notify switch device port of bridge port STP
* state change.
+ * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
+ * u32 attr, u64 value);
+ * Called to set specific switch ports attributes.
*/
struct net_device_ops {
int (*ndo_init)(struct net_device *dev);
@@ -1185,6 +1188,8 @@ struct net_device_ops {
struct netdev_phys_item_id *psid);
int (*ndo_switch_port_stp_update)(struct net_device *dev,
u8 state);
+ int (*ndo_switch_port_set_cfg)(struct net_device *dev,
+ u32 attr, u64 value);
#endif
};
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..6ad9b91 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -146,6 +146,7 @@ enum {
IFLA_PHYS_PORT_ID,
IFLA_CARRIER_CHANGES,
IFLA_PHYS_SWITCH_ID,
+ IFLA_SWITCH_PORT_CFG,
__IFLA_MAX
};
@@ -603,4 +604,18 @@ enum {
#define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
+/* Switch Port Attributes section */
+
+enum {
+ IFLA_SW_UNSPEC,
+ IFLA_SW_LEARNING,
+ IFLA_SW_LOOPBACK,
+ IFLA_SW_BCAST_FLOODING,
+ IFLA_SW_UCAST_FLOODING,
+ IFLA_SW_MCAST_FLOODING,
+ __IFLA_SW_ATTR_MAX
+};
+
+#define IFLA_SW_ATTR_MAX (__IFLA_SW_ATTR_MAX - 1)
+
#endif /* _UAPI_LINUX_IF_LINK_H */
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index eaa057f..d50ca71 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1223,6 +1223,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
[IFLA_PHYS_PORT_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
[IFLA_CARRIER_CHANGES] = { .type = NLA_U32 }, /* ignored */
[IFLA_PHYS_SWITCH_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
+ [IFLA_SWITCH_PORT_CFG] = { .type = NLA_NESTED },
};
static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -1265,6 +1266,14 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
[IFLA_PORT_RESPONSE] = { .type = NLA_U16, },
};
+static const struct nla_policy ifla_sw_attr_policy[IFLA_SW_ATTR_MAX+1] = {
+ [IFLA_SW_LEARNING] = { .type = NLA_U64 },
+ [IFLA_SW_LOOPBACK] = { .type = NLA_U64 },
+ [IFLA_SW_BCAST_FLOODING] = { .type = NLA_U64 },
+ [IFLA_SW_UCAST_FLOODING] = { .type = NLA_U64 },
+ [IFLA_SW_MCAST_FLOODING] = { .type = NLA_U64 },
+};
+
static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
{
struct net *net = sock_net(skb->sk);
@@ -1389,6 +1398,41 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
return 0;
}
+#ifdef CONFIG_NET_SWITCHDEV
+static int do_setswcfg(struct net_device *dev, struct nlattr *attr)
+{
+ int rem, err = -EINVAL;
+ struct nlattr *v;
+ const struct net_device_ops *ops = dev->netdev_ops;
+
+ nla_for_each_nested(v, attr, rem) {
+ u32 op = nla_type(v);
+ u64 value = 0;
+
+ switch (op) {
+ case IFLA_SW_LEARNING:
+ case IFLA_SW_LOOPBACK:
+ case IFLA_SW_BCAST_FLOODING:
+ case IFLA_SW_UCAST_FLOODING:
+ case IFLA_SW_MCAST_FLOODING: {
+ value = nla_get_u64(v);
+ err = ops->ndo_switch_port_set_cfg(dev,
+ op,
+ value);
+ break;
+ }
+ default:
+ err = -EINVAL;
+ break;
+ }
+ if (err)
+ break;
+ }
+ return err;
+}
+
+#endif
+
static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
{
int rem, err = -EINVAL;
@@ -1740,6 +1784,35 @@ static int do_setlink(const struct sk_buff *skb,
status |= DO_SETLINK_NOTIFY;
}
}
+#ifdef CONFIG_NET_SWITCHDEV
+ if (tb[IFLA_SWITCH_PORT_CFG]) {
+ struct nlattr *attrs[IFLA_SW_ATTR_MAX+1];
+
+ err = -EOPNOTSUPP;
+ if (!ops->ndo_switch_port_set_cfg)
+ goto errout;
+ if (!ops->ndo_switch_parent_id_get)
+ goto errout;
+
+ err = nla_parse_nested(attrs, IFLA_SW_ATTR_MAX,
+ tb[IFLA_SWITCH_PORT_CFG],
+ ifla_sw_attr_policy);
+ if (err < 0)
+ return err;
+
+ err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
+ if (err < 0)
+ goto errout;
+
+ status |= DO_SETLINK_NOTIFY;
+ }
+#else
+ if (tb[IFLA_SWITCH_PORT_CFG]) {
+ err = -EOPNOTSUPP;
+ goto errout;
+ }
+#endif
+
err = 0;
errout:
--
1.8.5.3
^ permalink raw reply related
* Re: [PATCH] Documentation: clarify phys_port_id
From: Dan Williams @ 2014-12-18 15:35 UTC (permalink / raw)
To: Sathya Perla
Cc: netdev@vger.kernel.org, Joshua Watt, jpirko@redhat.com,
Florian Fainelli
In-Reply-To: <CF9D1877D81D214CB0CA0669EFAE020C68D36E70@CMEXMB1.ad.emulex.com>
On Thu, 2014-12-18 at 05:57 +0000, Sathya Perla wrote:
> > -----Original Message-----
> > From: netdev-owner@vger.kernel.org [mailto:netdev-
> >
> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> > ---
> > Documentation/ABI/testing/sysfs-class-net | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-class-net
> > b/Documentation/ABI/testing/sysfs-class-net
> > index e1b2e78..7fe823a 100644
> > --- a/Documentation/ABI/testing/sysfs-class-net
> > +++ b/Documentation/ABI/testing/sysfs-class-net
> > @@ -186,7 +186,12 @@ KernelVersion: 3.12
> > Contact: netdev@vger.kernel.org
> > Description:
> > Indicates the interface unique physical port identifier within
> > - the NIC, as a string.
> > + the NIC, as a string. If two net_device objects share physical
> > + hardware or other resources, and/or do not operate
> > independently
> > + both net_device objects should be assigned the
> > + same phys_port_id. phys_port_id should be as globally
> > unique
> > + as possible to prevent conflicts between different drivers
> > and
> > + vendors, eg with MAC addresses or hardware GUIDs.
>
> Dan, two interfaces -- on the same card/chip -- may share some chip resources,
> but as long as they use *separate* physical ports, it would be OK to bond them.
> So, in this case, it would be valid to report a different phys_port_id for these netdevs.
>
> The text -- "share physical hardware or other resources" becomes too restrictive
> ^^^^^^^^^^^^^^^
> and will not even allow bonding of two physical ports on a NIC card.
Fair enough, I will reword and resubmit.
Dan
> thks,
> -Sathya
> >
> > What: /sys/class/net/<iface>/speed
> > Date: October 2009
> > --
> > 1.9.3
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* RE: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: Arad, Ronen @ 2014-12-18 15:47 UTC (permalink / raw)
To: Varlese, Marco, Roopa Prabhu, netdev@vger.kernel.org
Cc: Fastabend, John R, Thomas Graf, Jiri Pirko, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B104AD9E53@IRSMSX108.ger.corp.intel.com>
> -----Original Message-----
> From: netdev-owner@vger.kernel.org [mailto:netdev-
> owner@vger.kernel.org] On Behalf Of Varlese, Marco
> Sent: Thursday, December 18, 2014 4:56 PM
> To: Roopa Prabhu
> Cc: netdev@vger.kernel.org; Fastabend, John R; Thomas Graf; Jiri Pirko;
> sfeldma@gmail.com; linux-kernel@vger.kernel.org
> Subject: RE: [RFC PATCH net-next v2 1/1] net: Support for switch port
> configuration
>
> > -----Original Message-----
> > From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
> > Sent: Thursday, December 18, 2014 2:45 PM
> > To: Varlese, Marco
> > Cc: netdev@vger.kernel.org; Fastabend, John R; Thomas Graf; Jiri
> > Pirko; sfeldma@gmail.com; linux-kernel@vger.kernel.org
> > Subject: Re: [RFC PATCH net-next v2 1/1] net: Support for switch port
> > configuration
> >
> > On 12/18/14, 3:29 AM, Varlese, Marco wrote:
> > > From: Marco Varlese <marco.varlese@intel.com>
> > >
> > > Switch hardware offers a list of attributes that are configurable on
> > > a per port basis.
> > > This patch provides a mechanism to configure switch ports by adding
> > > an NDO for setting specific values to specific attributes.
> > > There will be a separate patch that adds the "get" functionality via
> > > another NDO and another patch that extends iproute2 to call the two
> > > new NDOs.
> > >
> > > Signed-off-by: Marco Varlese <marco.varlese@intel.com>
> > > ---
> > > include/linux/netdevice.h | 5 ++++
> > > include/uapi/linux/if_link.h | 15 ++++++++++++
> > > net/core/rtnetlink.c | 54
> > ++++++++++++++++++++++++++++++++++++++++++++
> > > 3 files changed, 74 insertions(+)
> > >
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index c31f74d..4881c7b 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct
> > net_device *dev,
> > > * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
> > > * Called to notify switch device port of bridge port STP
> > > * state change.
> > > + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> > > + * u32 attr, u64 value);
> > > + * Called to set specific switch ports attributes.
> > > */
> > > struct net_device_ops {
> > > int (*ndo_init)(struct net_device *dev);
> > > @@ -1185,6 +1188,8 @@ struct net_device_ops {
> > > struct
> > netdev_phys_item_id *psid);
> > > int (*ndo_switch_port_stp_update)(struct
> > net_device *dev,
> > > u8 state);
> > > + int (*ndo_switch_port_set_cfg)(struct
> > net_device *dev,
> > > + u32 attr, u64 value);
> > > #endif
> > > };
> > >
> > > diff --git a/include/uapi/linux/if_link.h
> > > b/include/uapi/linux/if_link.h index f7d0d2d..19cb51a 100644
> > > --- a/include/uapi/linux/if_link.h
> > > +++ b/include/uapi/linux/if_link.h
> > > @@ -146,6 +146,7 @@ enum {
> > > IFLA_PHYS_PORT_ID,
> > > IFLA_CARRIER_CHANGES,
> > > IFLA_PHYS_SWITCH_ID,
> > > + IFLA_SWITCH_PORT_CFG,
> > > __IFLA_MAX
> > > };
> > >
> > > @@ -603,4 +604,18 @@ enum {
> > >
> > > #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
> > >
> > > +/* Switch Port Attributes section */
> > > +
> > > +enum {
> > > + IFLA_ATTR_UNSPEC,
> > > + IFLA_ATTR_LEARNING,
> > Any reason you want learning here ?. This is covered as part of the
> > bridge setlink attributes.
> >
>
> Yes, because the user may _not_ want to go through a bridge interface
> necessarily.
Some bridge attributes or more accurately bridge port attributes overlap with port attributes.
IFLA_BRPORT_LEARNING from IFLA_PROTINFO and BRIDGE_VLAN_INFO_PVID flag of IFLA_BRIDGE_VLAN_INFO from IFLA_AF_SPEC are two examples where bridge port properties might have to be mapped to a common or driver specific port attribute.
Switch port driver that works without and explicit bridge device has to implement the bridge NDO's ndo_bridge_setlink/ndo_bridge_dellink.
It could take care of mapping such attributes from the bridge netlink message to either driver-specific port attribute of to an explicit one (assuming IFLA_ATTR_LEARNING would be accepted).
For example the driver could process the bridge learning information from the protinfo as such:
int sw_driver_br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
{
struct nlattr *protinfo;
protinfo = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg),
IFLA_PROTINFO);
if (protinfo) {
struct nlattr *attr;
attr = nla_find_nested(protinfo, IFLA_BRPORT_LEARNING);
if (attr) {
if (nla_len(attr) >= sizeof(u8)) {
if (nla_get_u8(attr))
brport_flags |= BR_LEARNING;
else
brport_flags &= ~BR_LEARNING;
}
}
/*
* Map bridge port attributes to driver-specific port attributes
*/
}
return 0;
}
>
> > > + IFLA_ATTR_LOOPBACK,
> > > + IFLA_ATTR_BCAST_FLOODING,
> > > + IFLA_ATTR_UCAST_FLOODING,
> > > + IFLA_ATTR_MCAST_FLOODING,
> > > + __IFLA_ATTR_MAX
> > > +};
> > > +
> > > +#define IFLA_ATTR_MAX (__IFLA_ATTR_MAX - 1)
> > > +
> > > #endif /* _UAPI_LINUX_IF_LINK_H */ diff --git
> > > a/net/core/rtnetlink.c b/net/core/rtnetlink.c index
> > > eaa057f..c0d1c45 100644
> > > --- a/net/core/rtnetlink.c
> > > +++ b/net/core/rtnetlink.c
> > > @@ -1389,6 +1389,46 @@ static int validate_linkmsg(struct net_device
> > *dev, struct nlattr *tb[])
> > > return 0;
> > > }
> > >
> > > +#ifdef CONFIG_NET_SWITCHDEV
> > > +static int do_setswcfg(struct net_device *dev, struct nlattr *attr) {
> > > + int rem, err = -EINVAL;
> > > + struct nlattr *v;
> > > + const struct net_device_ops *ops = dev->netdev_ops;
> > > +
> > > + nla_for_each_nested(v, attr, rem) {
> > > + u32 op = nla_type(v);
> > > + u64 value = 0;
> > > +
> > > + switch (op) {
> > > + case IFLA_ATTR_LEARNING:
> > > + case IFLA_ATTR_LOOPBACK:
> > > + case IFLA_ATTR_BCAST_FLOODING:
> > > + case IFLA_ATTR_UCAST_FLOODING:
> > > + case IFLA_ATTR_MCAST_FLOODING: {
> > > + if (nla_len(v) < sizeof(value)) {
> > > + err = -EINVAL;
> > > + break;
> > > + }
> > > +
> > > + value = nla_get_u64(v);
> > > + err = ops->ndo_switch_port_set_cfg(dev,
> > > + op,
> > > + value);
> > > + break;
> > > + }
> > > + default:
> > > + err = -EINVAL;
> > > + break;
> > > + }
> > > + if (err)
> > > + break;
> > > + }
> > > + return err;
> > > +}
> > > +
> > > +#endif
> > > +
> > > static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> > > {
> > > int rem, err = -EINVAL;
> > > @@ -1740,6 +1780,20 @@ static int do_setlink(const struct sk_buff *skb,
> > > status |= DO_SETLINK_NOTIFY;
> > > }
> > > }
> > > +#ifdef CONFIG_NET_SWITCHDEV
> > > + if (tb[IFLA_SWITCH_PORT_CFG]) {
> > > + err = -EOPNOTSUPP;
> > > + if (!ops->ndo_switch_port_set_cfg)
> > > + goto errout;
> > > + if (!ops->ndo_switch_parent_id_get)
> > > + goto errout;
> > > + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
> > > + if (err < 0)
> > > + goto errout;
> > > +
> > > + status |= DO_SETLINK_NOTIFY;
> > > + }
> > > +#endif
> > > err = 0;
> > >
> > > errout:
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
> of a message to majordomo@vger.kernel.org More majordomo info at
> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [RFC PATCH net-next v3 1/1] net: Support for switch port configuration
From: John Fastabend @ 2014-12-18 16:03 UTC (permalink / raw)
To: Varlese, Marco
Cc: netdev@vger.kernel.org, Fastabend, John R, Thomas Graf,
Jiri Pirko, roopa@cumulusnetworks.com, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B104AD9F56@IRSMSX108.ger.corp.intel.com>
On 12/18/2014 07:30 AM, Varlese, Marco wrote:
> From: Marco Varlese <marco.varlese@intel.com>
>
> Switch hardware offers a list of attributes that are configurable on a per port
> basis.
> This patch provides a mechanism to configure switch ports by adding an NDO
> for setting specific values to specific attributes.
> There will be a separate patch that adds the "get" functionality via another
> NDO and another patch that extends iproute2 to call the two new NDOs.
>
> Signed-off-by: Marco Varlese <marco.varlese@intel.com>
> ---
> include/linux/netdevice.h | 5 +++
> include/uapi/linux/if_link.h | 15 +++++++++
> net/core/rtnetlink.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 93 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index c31f74d..4881c7b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
> * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
> * Called to notify switch device port of bridge port STP
> * state change.
> + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> + * u32 attr, u64 value);
> + * Called to set specific switch ports attributes.
> */
> struct net_device_ops {
> int (*ndo_init)(struct net_device *dev);
> @@ -1185,6 +1188,8 @@ struct net_device_ops {
> struct netdev_phys_item_id *psid);
> int (*ndo_switch_port_stp_update)(struct net_device *dev,
> u8 state);
> + int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> + u32 attr, u64 value);
> #endif
> };
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..6ad9b91 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -146,6 +146,7 @@ enum {
> IFLA_PHYS_PORT_ID,
> IFLA_CARRIER_CHANGES,
> IFLA_PHYS_SWITCH_ID,
> + IFLA_SWITCH_PORT_CFG,
> __IFLA_MAX
> };
>
> @@ -603,4 +604,18 @@ enum {
>
> #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
>
> +/* Switch Port Attributes section */
Could you also document the attributes. I think they are mostly
clear but what is IFLA_SW_LOOPBACK. It will help later when we
try to read the code in 6months and implement drivers.
I am thinking something like
/* Switch Port Attributes section
* IFLA_SW_LEARNING - turns learning on in the bridge
* IFLA_SW_LOOPBACK - does something interesting
[...]
*/
> +
> +enum {
> + IFLA_SW_UNSPEC,
> + IFLA_SW_LEARNING,
Can you address Roopa's feedback. I'm also a bit confused by the
duplication.
> + IFLA_SW_LOOPBACK,
> + IFLA_SW_BCAST_FLOODING,
> + IFLA_SW_UCAST_FLOODING,
> + IFLA_SW_MCAST_FLOODING,
> + __IFLA_SW_ATTR_MAX
> +};
> +
> +#define IFLA_SW_ATTR_MAX (__IFLA_SW_ATTR_MAX - 1)
> +
> #endif /* _UAPI_LINUX_IF_LINK_H */
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index eaa057f..d50ca71 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1223,6 +1223,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> [IFLA_PHYS_PORT_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> [IFLA_CARRIER_CHANGES] = { .type = NLA_U32 }, /* ignored */
> [IFLA_PHYS_SWITCH_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> + [IFLA_SWITCH_PORT_CFG] = { .type = NLA_NESTED },
> };
>
> static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -1265,6 +1266,14 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
> [IFLA_PORT_RESPONSE] = { .type = NLA_U16, },
> };
>
> +static const struct nla_policy ifla_sw_attr_policy[IFLA_SW_ATTR_MAX+1] = {
> + [IFLA_SW_LEARNING] = { .type = NLA_U64 },
> + [IFLA_SW_LOOPBACK] = { .type = NLA_U64 },
> + [IFLA_SW_BCAST_FLOODING] = { .type = NLA_U64 },
> + [IFLA_SW_UCAST_FLOODING] = { .type = NLA_U64 },
> + [IFLA_SW_MCAST_FLOODING] = { .type = NLA_U64 },
> +};
Why U64 values? What would we pass in these? Are these just boolean
bits? Maybe the annotation above will help me understand this.
> +
> static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
> {
> struct net *net = sock_net(skb->sk);
> @@ -1389,6 +1398,41 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
> return 0;
> }
>
> +#ifdef CONFIG_NET_SWITCHDEV
> +static int do_setswcfg(struct net_device *dev, struct nlattr *attr)
> +{
> + int rem, err = -EINVAL;
> + struct nlattr *v;
> + const struct net_device_ops *ops = dev->netdev_ops;
> +
style nit again since its an RFC after all... Its preferred to arrange
arguments like this as a loosly followed convention,
const struct net_device_ops *ops = dev->netdev_ops;
int rem, err = -EINVAL;
struct nlattr *v;
> + nla_for_each_nested(v, attr, rem) {
> + u32 op = nla_type(v);
> + u64 value = 0;
> +
> + switch (op) {
> + case IFLA_SW_LEARNING:
> + case IFLA_SW_LOOPBACK:
> + case IFLA_SW_BCAST_FLOODING:
> + case IFLA_SW_UCAST_FLOODING:
> + case IFLA_SW_MCAST_FLOODING: {
> + value = nla_get_u64(v);
should we validate the get_u64 'value'? Are there valid ranges or
something?
> + err = ops->ndo_switch_port_set_cfg(dev,
> + op,
> + value);
> + break;
> + }
> + default:
> + err = -EINVAL;
> + break;
> + }
> + if (err)
> + break;
> + }
> + return err;
> +}
> +
> +#endif
> +
> static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> {
> int rem, err = -EINVAL;
> @@ -1740,6 +1784,35 @@ static int do_setlink(const struct sk_buff *skb,
> status |= DO_SETLINK_NOTIFY;
> }
> }
> +#ifdef CONFIG_NET_SWITCHDEV
> + if (tb[IFLA_SWITCH_PORT_CFG]) {
> + struct nlattr *attrs[IFLA_SW_ATTR_MAX+1];
> +
> + err = -EOPNOTSUPP;
> + if (!ops->ndo_switch_port_set_cfg)
> + goto errout;
> + if (!ops->ndo_switch_parent_id_get)
> + goto errout;
> +
style nit (take it for what its worth) but I would compress the above to
if (!ops->ndo_switch_port_set_cfg ||
!ops->ndo_switch_parent_id_get)
goto errout;
I'm also wondering if we really need to check for parent_id_get() here.
I'm not sure I see why a driver would implement port_set_cfg() and not
parent_id_get() though so its mostly harmless.
> + err = nla_parse_nested(attrs, IFLA_SW_ATTR_MAX,
> + tb[IFLA_SWITCH_PORT_CFG],
> + ifla_sw_attr_policy);
> + if (err < 0)
> + return err;
hmm everywhere else you use goto errout but not here?
> +
> + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
> + if (err < 0)
> + goto errout;
> +
> + status |= DO_SETLINK_NOTIFY;
hmm another question if you modify the hardware from do_setswcfg()
for an attribute but then fail on a subsequent attribute shouldn't
we have DO_SETLINK_MODIFY set? Say IFLA_SW_LEARNING is set but then
the device fails on IFLA_SW_LOOPBACK.
> + }
> +#else
> + if (tb[IFLA_SWITCH_PORT_CFG]) {
> + err = -EOPNOTSUPP;
> + goto errout;
> + }
> +#endif
> +
> err = 0;
>
> errout:
>
--
John Fastabend Intel Corporation
^ permalink raw reply
* Re: [RFC PATCH net-next v3 1/1] net: Support for switch port configuration
From: Samudrala, Sridhar @ 2014-12-18 16:09 UTC (permalink / raw)
To: Varlese, Marco, netdev@vger.kernel.org
Cc: Fastabend, John R, Thomas Graf, Jiri Pirko,
roopa@cumulusnetworks.com, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <C4896FB061E7DE4AAC93031BDCA044B104AD9F56@IRSMSX108.ger.corp.intel.com>
On 12/18/2014 7:30 AM, Varlese, Marco wrote:
> From: Marco Varlese <marco.varlese@intel.com>
>
> Switch hardware offers a list of attributes that are configurable on a per port
> basis.
> This patch provides a mechanism to configure switch ports by adding an NDO
> for setting specific values to specific attributes.
> There will be a separate patch that adds the "get" functionality via another
> NDO and another patch that extends iproute2 to call the two new NDOs.
>
> Signed-off-by: Marco Varlese <marco.varlese@intel.com>
> ---
> include/linux/netdevice.h | 5 +++
> include/uapi/linux/if_link.h | 15 +++++++++
> net/core/rtnetlink.c | 73 ++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 93 insertions(+)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index c31f74d..4881c7b 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct net_device *dev,
> * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
> * Called to notify switch device port of bridge port STP
> * state change.
> + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> + * u32 attr, u64 value);
> + * Called to set specific switch ports attributes.
> */
> struct net_device_ops {
> int (*ndo_init)(struct net_device *dev);
> @@ -1185,6 +1188,8 @@ struct net_device_ops {
> struct netdev_phys_item_id *psid);
> int (*ndo_switch_port_stp_update)(struct net_device *dev,
> u8 state);
> + int (*ndo_switch_port_set_cfg)(struct net_device *dev,
> + u32 attr, u64 value);
Is it OK to have a requirement that all the attributes are 64 bit values?
Isn't it more flexible to pass an explicit type of port attributes?
(*ndo_switch_port_set_cfg(struct net_device *dev, u32 attr, u8
attr_type, void *value)
> #endif
> };
>
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..6ad9b91 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -146,6 +146,7 @@ enum {
> IFLA_PHYS_PORT_ID,
> IFLA_CARRIER_CHANGES,
> IFLA_PHYS_SWITCH_ID,
> + IFLA_SWITCH_PORT_CFG,
> __IFLA_MAX
> };
>
> @@ -603,4 +604,18 @@ enum {
>
> #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
>
> +/* Switch Port Attributes section */
> +
> +enum {
> + IFLA_SW_UNSPEC,
> + IFLA_SW_LEARNING,
> + IFLA_SW_LOOPBACK,
> + IFLA_SW_BCAST_FLOODING,
> + IFLA_SW_UCAST_FLOODING,
> + IFLA_SW_MCAST_FLOODING,
> + __IFLA_SW_ATTR_MAX
> +};
> +
> +#define IFLA_SW_ATTR_MAX (__IFLA_SW_ATTR_MAX - 1)
> +
> #endif /* _UAPI_LINUX_IF_LINK_H */
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index eaa057f..d50ca71 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -1223,6 +1223,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
> [IFLA_PHYS_PORT_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> [IFLA_CARRIER_CHANGES] = { .type = NLA_U32 }, /* ignored */
> [IFLA_PHYS_SWITCH_ID] = { .type = NLA_BINARY, .len = MAX_PHYS_ITEM_ID_LEN },
> + [IFLA_SWITCH_PORT_CFG] = { .type = NLA_NESTED },
> };
>
> static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
> @@ -1265,6 +1266,14 @@ static const struct nla_policy ifla_port_policy[IFLA_PORT_MAX+1] = {
> [IFLA_PORT_RESPONSE] = { .type = NLA_U16, },
> };
>
> +static const struct nla_policy ifla_sw_attr_policy[IFLA_SW_ATTR_MAX+1] = {
> + [IFLA_SW_LEARNING] = { .type = NLA_U64 },
> + [IFLA_SW_LOOPBACK] = { .type = NLA_U64 },
> + [IFLA_SW_BCAST_FLOODING] = { .type = NLA_U64 },
> + [IFLA_SW_UCAST_FLOODING] = { .type = NLA_U64 },
> + [IFLA_SW_MCAST_FLOODING] = { .type = NLA_U64 },
> +};
> +
> static int rtnl_dump_ifinfo(struct sk_buff *skb, struct netlink_callback *cb)
> {
> struct net *net = sock_net(skb->sk);
> @@ -1389,6 +1398,41 @@ static int validate_linkmsg(struct net_device *dev, struct nlattr *tb[])
> return 0;
> }
>
> +#ifdef CONFIG_NET_SWITCHDEV
> +static int do_setswcfg(struct net_device *dev, struct nlattr *attr)
> +{
> + int rem, err = -EINVAL;
> + struct nlattr *v;
> + const struct net_device_ops *ops = dev->netdev_ops;
> +
> + nla_for_each_nested(v, attr, rem) {
> + u32 op = nla_type(v);
> + u64 value = 0;
> +
> + switch (op) {
> + case IFLA_SW_LEARNING:
> + case IFLA_SW_LOOPBACK:
> + case IFLA_SW_BCAST_FLOODING:
> + case IFLA_SW_UCAST_FLOODING:
> + case IFLA_SW_MCAST_FLOODING: {
> + value = nla_get_u64(v);
> + err = ops->ndo_switch_port_set_cfg(dev,
> + op,
> + value);
> + break;
> + }
> + default:
> + err = -EINVAL;
> + break;
> + }
> + if (err)
> + break;
> + }
> + return err;
> +}
> +
> +#endif
> +
> static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
> {
> int rem, err = -EINVAL;
> @@ -1740,6 +1784,35 @@ static int do_setlink(const struct sk_buff *skb,
> status |= DO_SETLINK_NOTIFY;
> }
> }
> +#ifdef CONFIG_NET_SWITCHDEV
> + if (tb[IFLA_SWITCH_PORT_CFG]) {
> + struct nlattr *attrs[IFLA_SW_ATTR_MAX+1];
> +
> + err = -EOPNOTSUPP;
> + if (!ops->ndo_switch_port_set_cfg)
> + goto errout;
> + if (!ops->ndo_switch_parent_id_get)
> + goto errout;
> +
> + err = nla_parse_nested(attrs, IFLA_SW_ATTR_MAX,
> + tb[IFLA_SWITCH_PORT_CFG],
> + ifla_sw_attr_policy);
> + if (err < 0)
> + return err;
> +
> + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
> + if (err < 0)
> + goto errout;
> +
> + status |= DO_SETLINK_NOTIFY;
> + }
> +#else
> + if (tb[IFLA_SWITCH_PORT_CFG]) {
> + err = -EOPNOTSUPP;
> + goto errout;
> + }
> +#endif
> +
> err = 0;
>
> errout:
^ permalink raw reply
* Re: [RFC PATCH net-next v2 1/1] net: Support for switch port configuration
From: John Fastabend @ 2014-12-18 16:14 UTC (permalink / raw)
To: Arad, Ronen, Varlese, Marco, Roopa Prabhu, netdev@vger.kernel.org
Cc: Thomas Graf, Jiri Pirko, sfeldma@gmail.com,
linux-kernel@vger.kernel.org
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505DBA430@ORSMSX101.amr.corp.intel.com>
On 12/18/2014 07:47 AM, Arad, Ronen wrote:
>
>
>> -----Original Message-----
>> From: netdev-owner@vger.kernel.org [mailto:netdev-
>> owner@vger.kernel.org] On Behalf Of Varlese, Marco
>> Sent: Thursday, December 18, 2014 4:56 PM
>> To: Roopa Prabhu
>> Cc: netdev@vger.kernel.org; Fastabend, John R; Thomas Graf; Jiri Pirko;
>> sfeldma@gmail.com; linux-kernel@vger.kernel.org
>> Subject: RE: [RFC PATCH net-next v2 1/1] net: Support for switch port
>> configuration
>>
>>> -----Original Message-----
>>> From: Roopa Prabhu [mailto:roopa@cumulusnetworks.com]
>>> Sent: Thursday, December 18, 2014 2:45 PM
>>> To: Varlese, Marco
>>> Cc: netdev@vger.kernel.org; Fastabend, John R; Thomas Graf; Jiri
>>> Pirko; sfeldma@gmail.com; linux-kernel@vger.kernel.org
>>> Subject: Re: [RFC PATCH net-next v2 1/1] net: Support for switch port
>>> configuration
>>>
>>> On 12/18/14, 3:29 AM, Varlese, Marco wrote:
>>>> From: Marco Varlese <marco.varlese@intel.com>
>>>>
>>>> Switch hardware offers a list of attributes that are configurable on
>>>> a per port basis.
>>>> This patch provides a mechanism to configure switch ports by adding
>>>> an NDO for setting specific values to specific attributes.
>>>> There will be a separate patch that adds the "get" functionality via
>>>> another NDO and another patch that extends iproute2 to call the two
>>>> new NDOs.
>>>>
>>>> Signed-off-by: Marco Varlese <marco.varlese@intel.com>
>>>> ---
>>>> include/linux/netdevice.h | 5 ++++
>>>> include/uapi/linux/if_link.h | 15 ++++++++++++
>>>> net/core/rtnetlink.c | 54
>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>> 3 files changed, 74 insertions(+)
>>>>
>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>> index c31f74d..4881c7b 100644
>>>> --- a/include/linux/netdevice.h
>>>> +++ b/include/linux/netdevice.h
>>>> @@ -1027,6 +1027,9 @@ typedef u16 (*select_queue_fallback_t)(struct
>>> net_device *dev,
>>>> * int (*ndo_switch_port_stp_update)(struct net_device *dev, u8 state);
>>>> * Called to notify switch device port of bridge port STP
>>>> * state change.
>>>> + * int (*ndo_switch_port_set_cfg)(struct net_device *dev,
>>>> + * u32 attr, u64 value);
>>>> + * Called to set specific switch ports attributes.
>>>> */
>>>> struct net_device_ops {
>>>> int (*ndo_init)(struct net_device *dev);
>>>> @@ -1185,6 +1188,8 @@ struct net_device_ops {
>>>> struct
>>> netdev_phys_item_id *psid);
>>>> int (*ndo_switch_port_stp_update)(struct
>>> net_device *dev,
>>>> u8 state);
>>>> + int (*ndo_switch_port_set_cfg)(struct
>>> net_device *dev,
>>>> + u32 attr, u64 value);
>>>> #endif
>>>> };
>>>>
>>>> diff --git a/include/uapi/linux/if_link.h
>>>> b/include/uapi/linux/if_link.h index f7d0d2d..19cb51a 100644
>>>> --- a/include/uapi/linux/if_link.h
>>>> +++ b/include/uapi/linux/if_link.h
>>>> @@ -146,6 +146,7 @@ enum {
>>>> IFLA_PHYS_PORT_ID,
>>>> IFLA_CARRIER_CHANGES,
>>>> IFLA_PHYS_SWITCH_ID,
>>>> + IFLA_SWITCH_PORT_CFG,
>>>> __IFLA_MAX
>>>> };
>>>>
>>>> @@ -603,4 +604,18 @@ enum {
>>>>
>>>> #define IFLA_HSR_MAX (__IFLA_HSR_MAX - 1)
>>>>
>>>> +/* Switch Port Attributes section */
>>>> +
>>>> +enum {
>>>> + IFLA_ATTR_UNSPEC,
>>>> + IFLA_ATTR_LEARNING,
>>> Any reason you want learning here ?. This is covered as part of the
>>> bridge setlink attributes.
>>>
>>
>> Yes, because the user may _not_ want to go through a bridge interface
>> necessarily.
>
> Some bridge attributes or more accurately bridge port attributes overlap with port attributes.
> IFLA_BRPORT_LEARNING from IFLA_PROTINFO and BRIDGE_VLAN_INFO_PVID
> flag of IFLA_BRIDGE_VLAN_INFO from IFLA_AF_SPEC are two examples
> where bridge port properties might have to be mapped to a common or
> driver specific port attribute.
You've lost me a bit.
> Switch port driver that works without and explicit bridge device has
> to implement the bridge NDO's ndo_bridge_setlink/ndo_bridge_dellink.
> It could take care of mapping such attributes from the bridge netlink
> message to either driver-specific port attribute of to an explicit
> one (assuming IFLA_ATTR_LEARNING would be accepted). For example the
> driver could process the bridge learning information from the
> protinfo as such:
still lost unfortunately.
So you want to enable learning on a port by port basis? Then we also
need to define how the two bits interact.
switch_learning + port_learning = port in learning mode
!switch_learning + port_learning = port not in learning mode?
etc.
Do bridges and users actually enable learning on a per port basis?
> int sw_driver_br_setlink(struct net_device *dev, struct nlmsghdr *nlh)
> {
> struct nlattr *protinfo;
> protinfo = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg),
> IFLA_PROTINFO);
> if (protinfo) {
> struct nlattr *attr;
> attr = nla_find_nested(protinfo, IFLA_BRPORT_LEARNING);
> if (attr) {
> if (nla_len(attr) >= sizeof(u8)) {
> if (nla_get_u8(attr))
> brport_flags |= BR_LEARNING;
> else
> brport_flags &= ~BR_LEARNING;
> }
> }
> /*
> * Map bridge port attributes to driver-specific port attributes
> */
> }
> return 0;
> }
>
>>
>>>> + IFLA_ATTR_LOOPBACK,
>>>> + IFLA_ATTR_BCAST_FLOODING,
>>>> + IFLA_ATTR_UCAST_FLOODING,
>>>> + IFLA_ATTR_MCAST_FLOODING,
>>>> + __IFLA_ATTR_MAX
>>>> +};
>>>> +
>>>> +#define IFLA_ATTR_MAX (__IFLA_ATTR_MAX - 1)
>>>> +
>>>> #endif /* _UAPI_LINUX_IF_LINK_H */ diff --git
>>>> a/net/core/rtnetlink.c b/net/core/rtnetlink.c index
>>>> eaa057f..c0d1c45 100644
>>>> --- a/net/core/rtnetlink.c
>>>> +++ b/net/core/rtnetlink.c
>>>> @@ -1389,6 +1389,46 @@ static int validate_linkmsg(struct net_device
>>> *dev, struct nlattr *tb[])
>>>> return 0;
>>>> }
>>>>
>>>> +#ifdef CONFIG_NET_SWITCHDEV
>>>> +static int do_setswcfg(struct net_device *dev, struct nlattr *attr) {
>>>> + int rem, err = -EINVAL;
>>>> + struct nlattr *v;
>>>> + const struct net_device_ops *ops = dev->netdev_ops;
>>>> +
>>>> + nla_for_each_nested(v, attr, rem) {
>>>> + u32 op = nla_type(v);
>>>> + u64 value = 0;
>>>> +
>>>> + switch (op) {
>>>> + case IFLA_ATTR_LEARNING:
>>>> + case IFLA_ATTR_LOOPBACK:
>>>> + case IFLA_ATTR_BCAST_FLOODING:
>>>> + case IFLA_ATTR_UCAST_FLOODING:
>>>> + case IFLA_ATTR_MCAST_FLOODING: {
>>>> + if (nla_len(v) < sizeof(value)) {
>>>> + err = -EINVAL;
>>>> + break;
>>>> + }
>>>> +
>>>> + value = nla_get_u64(v);
>>>> + err = ops->ndo_switch_port_set_cfg(dev,
>>>> + op,
>>>> + value);
>>>> + break;
>>>> + }
>>>> + default:
>>>> + err = -EINVAL;
>>>> + break;
>>>> + }
>>>> + if (err)
>>>> + break;
>>>> + }
>>>> + return err;
>>>> +}
>>>> +
>>>> +#endif
>>>> +
>>>> static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
>>>> {
>>>> int rem, err = -EINVAL;
>>>> @@ -1740,6 +1780,20 @@ static int do_setlink(const struct sk_buff *skb,
>>>> status |= DO_SETLINK_NOTIFY;
>>>> }
>>>> }
>>>> +#ifdef CONFIG_NET_SWITCHDEV
>>>> + if (tb[IFLA_SWITCH_PORT_CFG]) {
>>>> + err = -EOPNOTSUPP;
>>>> + if (!ops->ndo_switch_port_set_cfg)
>>>> + goto errout;
>>>> + if (!ops->ndo_switch_parent_id_get)
>>>> + goto errout;
>>>> + err = do_setswcfg(dev, tb[IFLA_SWITCH_PORT_CFG]);
>>>> + if (err < 0)
>>>> + goto errout;
>>>> +
>>>> + status |= DO_SETLINK_NOTIFY;
>>>> + }
>>>> +#endif
>>>> err = 0;
>>>>
>>>> errout:
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in the body
>> of a message to majordomo@vger.kernel.org More majordomo info at
>> http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox