* Re: [PATCH] sky2: avoid pci write posting after disabling irqs
From: David Miller @ 2014-12-06 5:34 UTC (permalink / raw)
To: LinoSanfilippo; +Cc: stephen, mlindner, netdev, linux-kernel
In-Reply-To: <1417348611-1752-1-git-send-email-LinoSanfilippo@gmx.de>
From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Sun, 30 Nov 2014 12:56:51 +0100
> In sky2_change_mtu setting B0_IMSK to 0 may be delayed due to PCI write posting
> which could result in irqs being still active when synchronize_irq is called.
> Since we are not prepared to handle any further irqs after synchronize_irq
> (our resources are freed after that) force the write by a consecutive read from
> the same register.
> Similar situation in sky2_all_down: Here we disabled irqs by a write to B0_IMSK
> but did not ensure that this write took place before synchronize_irq. Fix that
> too.
>
> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Applied.
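The read-back idiom described in the changelog can be illustrated with a small userspace model. This is only a sketch: the `toy_*` names and the single-entry write buffer are assumptions made for illustration, not the sky2 driver code or real PCI behaviour in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a device behind a bridge that may post (buffer) writes.
 * All names are hypothetical; this is not the sky2 code.
 */
struct toy_dev {
	uint32_t imsk;       /* interrupt mask as the device sees it */
	uint32_t posted_val; /* write still sitting in the bridge */
	bool     posted;     /* true while a write is buffered */
};

/* A posted write returns before the device has seen the value. */
static inline void toy_write32(struct toy_dev *d, uint32_t val)
{
	d->posted_val = val;
	d->posted = true;
}

/* A read cannot complete until earlier posted writes were delivered. */
static inline uint32_t toy_read32(struct toy_dev *d)
{
	if (d->posted) {
		d->imsk = d->posted_val;
		d->posted = false;
	}
	return d->imsk;
}

/* The device can still raise an interrupt while its own mask is non-zero. */
static inline bool toy_irq_possible(const struct toy_dev *d)
{
	return d->imsk != 0;
}

/* Mask interrupts and force the write out with a read from the same register. */
static inline void toy_mask_irqs_flushed(struct toy_dev *d)
{
	toy_write32(d, 0);
	(void)toy_read32(d);
}
```

In this model a bare `toy_write32(&d, 0)` leaves `toy_irq_possible()` true until something reads the register, which is the window the patch closes before synchronize_irq.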
^ permalink raw reply
* Re: [PATCH] skge: Unmask interrupts in case of spurious interrupts
From: David Miller @ 2014-12-06 5:34 UTC (permalink / raw)
To: LinoSanfilippo; +Cc: stephen, mlindner, netdev, linux-kernel
In-Reply-To: <1417348291-1302-1-git-send-email-LinoSanfilippo@gmx.de>
From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Sun, 30 Nov 2014 12:51:31 +0100
> In case of a spurious interrupt don't forget to reenable the interrupts that
> have been masked by reading the interrupt source register.
>
> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Applied.
^ permalink raw reply
* Re: [PATCH] pxa168: close race between napi and irq activation
From: David Miller @ 2014-12-06 5:34 UTC (permalink / raw)
To: LinoSanfilippo
Cc: arnd, paul.gortmaker, w-lkml, f.fainelli, netdev, linux-kernel
In-Reply-To: <1417344576-4940-1-git-send-email-LinoSanfilippo@gmx.de>
From: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Date: Sun, 30 Nov 2014 11:49:36 +0100
> In pxa168_eth_open() the irqs are enabled before napi. This opens a tiny time
> window in which the irq handler is processed, disables irqs but then is not able
> to schedule the not yet activated napi, leaving irqs disabled forever (since
> irqs are reenabled in napi poll function).
> Fix this race by activating napi before irqs are activated.
>
> Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Applied.
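The ordering problem from the changelog can be replayed in a tiny state machine. This is a sketch under stated assumptions: the `toy_*` names are hypothetical, and the model only captures "the handler masks irqs, napi poll unmasks them", not the pxa168_eth driver itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved in the race. */
struct toy_nic {
	bool napi_enabled;
	bool irq_enabled;
	bool napi_scheduled;
};

/* The irq handler masks irqs and tries to hand the work to napi.
 * Scheduling silently fails while napi is not enabled yet.
 */
static inline void toy_irq_handler(struct toy_nic *n)
{
	if (!n->irq_enabled)
		return;
	n->irq_enabled = false;
	if (n->napi_enabled)
		n->napi_scheduled = true;
}

/* The poll function is the only place that unmasks irqs again. */
static inline void toy_napi_poll(struct toy_nic *n)
{
	if (!n->napi_scheduled)
		return;
	n->napi_scheduled = false;
	n->irq_enabled = true;
}

/* Open the device with an interrupt arriving as early as it can.
 * Returns true if irqs end up enabled again afterwards.
 */
static inline bool toy_open_survives_early_irq(bool napi_first)
{
	struct toy_nic n = { false, false, false };

	if (napi_first) {
		n.napi_enabled = true;
		n.irq_enabled = true;
		toy_irq_handler(&n);
	} else {
		n.irq_enabled = true;
		toy_irq_handler(&n); /* fires inside the window */
		n.napi_enabled = true;
	}
	toy_napi_poll(&n);
	return n.irq_enabled;
}
```

With irqs enabled first, the early interrupt leaves irqs masked for good; with napi enabled first, the poll runs and unmasks them.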
^ permalink raw reply
* Re: [PATCH net] net/mlx4_en: correct the endianness of doorbell_qpn on big endian platform
From: David Miller @ 2014-12-06 5:31 UTC (permalink / raw)
To: weiyang; +Cc: netdev, gideonn, edumazet, amirv
In-Reply-To: <1417315431-16761-1-git-send-email-weiyang@linux.vnet.ibm.com>
From: Wei Yang <weiyang@linux.vnet.ibm.com>
Date: Sun, 30 Nov 2014 10:43:51 +0800
> In commit 6a4e812 (net/mlx4_en: Avoid calling bswap in tx fast path), we store
> doorbell_qpn in big endian to avoid bswap(). Then we try to write it directly
> by iowrite32() instead of iowrite32be().
>
> This works fine on little endian platforms, but causes a problem on big
> endian platforms. Here is the definition in general:
>
> #define iowrite32(v, addr) writel((v), (addr))
> #define writel(b,addr) __raw_writel(__cpu_to_le32(b),addr)
>
> On little endian platforms, the value is not swapped before the write, while
> on big endian platforms it is swapped. This is not expected to happen.
>
> This patch does the swap on big endian platforms before the value is written.
>
> Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Guys, let's figure out what we are doing with this patch.
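To make the byte-order argument in the changelog concrete, here is a host-independent sketch that computes the bytes a device would see. The `toy_*` helpers are hypothetical stand-ins, not the kernel's cpu_to_be32()/writel(), and it assumes the device expects the QPN in big-endian byte order.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t toy_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Integer value of a variable holding cpu_to_be32(v) on the given host. */
static inline uint32_t toy_cpu_to_be32(uint32_t v, int host_is_be)
{
	return host_is_be ? v : toy_bswap32(v);
}

/* Model of writel(): whatever the host, the device receives the
 * little-endian encoding of the integer passed in.
 */
static inline void toy_writel(uint32_t v, uint8_t bus[4])
{
	bus[0] = (uint8_t)v;
	bus[1] = (uint8_t)(v >> 8);
	bus[2] = (uint8_t)(v >> 16);
	bus[3] = (uint8_t)(v >> 24);
}

/* Ring the doorbell with a QPN that was stored pre-converted to big endian.
 * preswap_on_be models the extra swap the patch adds on big endian hosts.
 */
static inline void toy_ring_doorbell(uint32_t qpn, int host_is_be,
				     int preswap_on_be, uint8_t bus[4])
{
	uint32_t stored = toy_cpu_to_be32(qpn, host_is_be);

	if (host_is_be && preswap_on_be)
		stored = toy_bswap32(stored);
	toy_writel(stored, bus);
}
```

For qpn 0x11223344 the little endian host puts 11 22 33 44 on the bus, the unpatched big endian host puts 44 33 22 11, and the pre-swapped big endian path matches the little endian result again.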
^ permalink raw reply
* Re: [PATCH v2 net] bpf: x86: fix epilogue generation for eBPF programs
From: David Miller @ 2014-12-06 5:24 UTC (permalink / raw)
To: ast; +Cc: zlim.lnx, edumazet, dborkman, hpa, tglx, mingo, netdev,
linux-kernel
In-Reply-To: <1417301173-23691-1-git-send-email-ast@plumgrid.com>
From: Alexei Starovoitov <ast@plumgrid.com>
Date: Sat, 29 Nov 2014 14:46:13 -0800
> classic BPF has a restriction that last insn is always BPF_RET.
> eBPF doesn't have BPF_RET instruction and this restriction.
> It has BPF_EXIT insn which can appear anywhere in the program
> one or more times and it doesn't have to be last insn.
> Fix eBPF JIT to emit epilogue when first BPF_EXIT is seen
> and all other BPF_EXIT instructions will be emitted as jump.
>
> Since jump offset to epilogue is computed as:
> jmp_offset = ctx->cleanup_addr - addrs[i]
> we need to change type of cleanup_addr to signed to compute the offset as:
> (long long) ((int)20 - (int)30)
> instead of:
> (long long) ((unsigned int)20 - (int)30)
>
> Fixes: 622582786c9e ("net: filter: x86: internal BPF JIT")
> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Applied, thanks.
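The signedness issue in the changelog is plain C arithmetic and can be checked in userspace. A minimal sketch, with hypothetical helper names that are not part of the JIT:

```c
#include <assert.h>
#include <limits.h>

/* cleanup_addr kept as unsigned: the int operand is converted to
 * unsigned int, the subtraction wraps, and the widening cast then
 * preserves the wrapped value.
 */
static inline long long toy_jmp_offset_unsigned(unsigned int cleanup_addr,
						int addr)
{
	return (long long)(cleanup_addr - addr);
}

/* cleanup_addr kept as signed: the subtraction is done in int and a
 * backward jump yields a negative offset.
 */
static inline long long toy_jmp_offset_signed(int cleanup_addr, int addr)
{
	return (long long)(cleanup_addr - addr);
}
```

With the values from the changelog, `toy_jmp_offset_signed(20, 30)` is -10, while `toy_jmp_offset_unsigned(20, 30)` is a large positive number (4294967286 when unsigned int is 32 bits wide), which is useless as a backward jump offset.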
^ permalink raw reply
* Re: [PATCH 1/1] net-PA Semi: Deletion of unnecessary checks before the function call "pci_dev_put"
From: David Miller @ 2014-12-06 5:15 UTC (permalink / raw)
To: elfring; +Cc: olof, netdev, linux-kernel, kernel-janitors, julia.lawall
In-Reply-To: <547A09B1.9090102@users.sourceforge.net>
From: SF Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 29 Nov 2014 19:00:17 +0100
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Sat, 29 Nov 2014 18:55:40 +0100
>
> The pci_dev_put() function tests whether its argument is NULL
> and then returns immediately. Thus the test around the call
> is not needed.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Applied.
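The cleanup pattern behind this patch and the similar ones in this batch can be sketched as follows. The `toy_ref` type and its put function are hypothetical and only mirror the property the changelog relies on, namely that the release function already returns early on NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for a struct pci_dev. */
struct toy_ref {
	int refcount;
};

/* Release function that tolerates NULL itself. */
static inline void toy_ref_put(struct toy_ref *r)
{
	if (!r)
		return;
	r->refcount--;
}

/* Before: the caller repeats the NULL test. */
static inline void toy_cleanup_old(struct toy_ref *r)
{
	if (r)
		toy_ref_put(r);
}

/* After: the redundant test is dropped, behaviour is unchanged. */
static inline void toy_cleanup_new(struct toy_ref *r)
{
	toy_ref_put(r);
}
```

Both variants behave the same for NULL and non-NULL arguments, which is why the outer test can be removed.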
^ permalink raw reply
* Re: [PATCH 1/1] net: cassini: Deletion of an unnecessary check before the function call "vfree"
From: David Miller @ 2014-12-06 5:14 UTC (permalink / raw)
To: elfring; +Cc: netdev, linux-kernel, kernel-janitors, cocci
In-Reply-To: <5479D2AD.9060704@users.sourceforge.net>
From: SF Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 29 Nov 2014 15:05:33 +0100
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Sat, 29 Nov 2014 14:34:59 +0100
>
> The vfree() function also performs input parameter validation.
> Thus the test around the call is not needed.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Applied.
^ permalink raw reply
* Re: [PATCH 1/1] net-ipvlan: Deletion of an unnecessary check before the function call "free_percpu"
From: David Miller @ 2014-12-06 5:14 UTC (permalink / raw)
To: elfring; +Cc: netdev, linux-kernel, kernel-janitors, julia.lawall
In-Reply-To: <5479E693.3010200@users.sourceforge.net>
From: SF Markus Elfring <elfring@users.sourceforge.net>
Date: Sat, 29 Nov 2014 16:30:27 +0100
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Sat, 29 Nov 2014 16:23:20 +0100
>
> The free_percpu() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
>
> This issue was detected by using the Coccinelle software.
>
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Applied.
^ permalink raw reply
* Re: [PATCH] stmmac: pci: allocate memory resources dynamically
From: David Miller @ 2014-12-06 5:04 UTC (permalink / raw)
To: andriy.shevchenko
Cc: peppe.cavallaro, netdev, hock.leong.kweh, vbridgers2013, rayagond
In-Reply-To: <1417182056-17650-1-git-send-email-andriy.shevchenko@linux.intel.com>
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Date: Fri, 28 Nov 2014 15:40:56 +0200
> Instead of using global variables we are going to use dynamically allocated
> memory. This allows supporting more than one Ethernet adapter simultaneously,
> each of which might have different settings.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Applied, thanks.
^ permalink raw reply
* Re: [PATCH 00/12] Netfilter updates for net-next
From: David Miller @ 2014-12-06 4:58 UTC (permalink / raw)
To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1417611342-25257-1-git-send-email-pablo@netfilter.org>
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Wed, 3 Dec 2014 13:55:30 +0100
> The following batch contains netfilter updates for net-next. Basically,
> enhancements for xt_recent, skip zeroing of timer in conntrack, fix
> linking problem with recent redirect support for nf_tables, ipset
> updates and a couple of cleanups. More specifically, they are:
...
> You can pull these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
>
> Thanks!
Pulled, thanks a lot Pablo.
^ permalink raw reply
* Re: [net-next 04/14] ixgbe: remove CIAA/D register reads from bad VF check
From: Jeff Kirsher @ 2014-12-06 4:58 UTC (permalink / raw)
To: David Miller
Cc: emil.s.tantilov, netdev, nhorman, sassmann, jogreene,
alex.williamson
In-Reply-To: <20141205.204956.1468623374943582606.davem@davemloft.net>
On Fri, 2014-12-05 at 20:49 -0800, David Miller wrote:
> From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Date: Fri, 5 Dec 2014 09:52:43 -0800
>
> > From: Emil Tantilov <emil.s.tantilov@intel.com>
> >
> > Accessing the CIAA/D register can block access to the PCI config space.
> >
> > This patch removes the read/write operations to the CIAA/D registers
> > and makes use of standard kernel functions for accessing the PCI config
> > space.
> >
> > In addition it moves ixgbevf_check_for_bad_vf() into the watchdog subtask
> > which reduces the frequency of the checks.
> >
> > CC: Alex Williamson <alex.williamson@redhat.com>
> > Reported-by: Alex Williamson <alex.williamson@redhat.com>
> > Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> > Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> > Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>
> Alex Williamson stated that he'd like to see this for -stable, but I'm warning
> right now that a change not appropriate for 'net' is not appropriate for
> '-stable' either.
Agreed, the only reason I did not send this to net (along with the other
fixes by Emil) was that we are at -rc7 and do not consider these
"critical" enough to try and squeeze in before the release.
^ permalink raw reply
* Re: [net-next 00/14][pull request] Intel Wired LAN Driver Updates 2014-12-05
From: David Miller @ 2014-12-06 4:55 UTC (permalink / raw)
To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <1417801973-28793-1-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 5 Dec 2014 09:52:39 -0800
> This series contains updates to ixgbe and ixgbevf.
...
> The following are changes since commit d8febb77b52ebddb9bd03ccaa5b61005e3a45a85:
> tun: Fix GSO meta-data handling in tun_get_user
> and are available in the git repository at:
> git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next master
Pulled, thanks Jeff.
^ permalink raw reply
* Re: [net-next 04/14] ixgbe: remove CIAA/D register reads from bad VF check
From: David Miller @ 2014-12-06 4:49 UTC (permalink / raw)
To: jeffrey.t.kirsher
Cc: emil.s.tantilov, netdev, nhorman, sassmann, jogreene,
alex.williamson
In-Reply-To: <1417801973-28793-5-git-send-email-jeffrey.t.kirsher@intel.com>
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Fri, 5 Dec 2014 09:52:43 -0800
> From: Emil Tantilov <emil.s.tantilov@intel.com>
>
> Accessing the CIAA/D register can block access to the PCI config space.
>
> This patch removes the read/write operations to the CIAA/D registers
> and makes use of standard kernel functions for accessing the PCI config
> space.
>
> In addition it moves ixgbevf_check_for_bad_vf() into the watchdog subtask
> which reduces the frequency of the checks.
>
> CC: Alex Williamson <alex.williamson@redhat.com>
> Reported-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> Tested-by: Aaron Brown <aaron.f.brown@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alex Williamson stated that he'd like to see this for -stable, but I'm warning
right now that a change not appropriate for 'net' is not appropriate for
'-stable' either.
^ permalink raw reply
* Re: iproute2/nstat: Bug in displaying icmp stats
From: Vijay Subramanian @ 2014-12-06 3:45 UTC (permalink / raw)
To: Eric Dumazet, netdev
In-Reply-To: <1417831808.15618.23.camel@edumazet-glaptop2.roam.corp.google.com>
> Something like that maybe ?
>
> misc/nstat.c | 18 ++++++++++++++++--
> 1 file changed, 16 insertions(+), 2 deletions(-)
> diff --git a/misc/nstat.c b/misc/nstat.c
Thanks Eric!
This works on the current kernel and looks good to me for 2.4 too.
Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Vijay
^ permalink raw reply
* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: John Fastabend @ 2014-12-06 3:21 UTC (permalink / raw)
To: Arad, Ronen
Cc: Netdev, Roopa Prabhu, Scott Feldman, Jirí Pírko,
Jamal Hadi Salim, Benjamin LaHaise, Thomas Graf,
stephen@networkplumber.org, John Linville, nhorman@tuxdriver.com,
Nicolas Dichtel, vyasevic@redhat.com, Florian Fainelli,
buytenh@wantstofly.org, Aviad Raveh, David S. Miller,
shm@cumulusnetworks.com, Andy Gospodarek
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505D84494@ORSMSX101.amr.corp.intel.com>
On 12/05/2014 07:06 PM, Arad, Ronen wrote:
>
>
>> -----Original Message-----
>> From: John Fastabend [mailto:john.fastabend@gmail.com]
>> Sent: Friday, December 05, 2014 6:46 PM
>> To: Arad, Ronen
>> Cc: Roopa Prabhu; Scott Feldman; Netdev; Jirí Pírko; Jamal Hadi Salim;
>> Benjamin LaHaise; Thomas Graf; stephen@networkplumber.org; John
>> Linville; nhorman@tuxdriver.com; Nicolas Dichtel; vyasevic@redhat.com;
>> Florian Fainelli; buytenh@wantstofly.org; Aviad Raveh; David S. Miller;
>> shm@cumulusnetworks.com; Andy Gospodarek
>> Subject: Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic
>> if feature flag set
>>
>> On 12/05/2014 05:04 PM, Arad, Ronen wrote:
>>> I have another case of propagation which is not covered by the proposed
>> patch.
>>> A recent patch introduced default_pvid attribute for a bridge (so far
>> supported only via sysfs and not via netlink).
>>> When a port joins a bridge, it inherits a PVID from the default_pvid of the
>> bridge.
>>> The bridge driver propagates that to the newly created net_bridge_port.
>> This is done in br_vlan.c:
>>>
>>> int nbp_vlan_init(struct net_bridge_port *p) {
>>> int rc = 0;
>>>
>>> if (p->br->default_pvid) {
>>> rc = nbp_vlan_add(p, p->br->default_pvid,
>>> BRIDGE_VLAN_INFO_PVID |
>>> BRIDGE_VLAN_INFO_UNTAGGED);
>>> }
>>>
>>> return rc;
>>> }
>>>
>>> When L2 switching is offloaded to the HW, this PVID setting needs to be
>> propagated. However, it does not come via ndo_bridge_setlink. The
>> proposed propagation at br_setlink or an up level one at rtnetlink are not
>> capable of handling this case.
>>> One possible way for handling that is to replace the call to
>>> nbp_vlan_add with a call to a new function let's say int
>>> br_propagate_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags)
>> This function will compose a netlink message with VLAN filtering information
>> (i.e. AF_SPEC with VLAN_INFO) and call br_setlink - leveraging the offload
>> support proposed by Roopa.
>>>
>>
>> No, we shouldn't be crafting netlink messages in the kernel just to re-inject
>> them into an interface. Really the setlink/dellink interface should be cleaned
>> up so that it no longer consumes raw netlink messages.
>>
>> Then either (a) add another parameter to the setlink ops or (b) create a new
>> op for it.
>>
>> I think cleaning up the setlink/dellink hooks is on the TBD list already.
>>
> This would be a lot cleaner even though there could be a loss of
> flexibility. A fixed-argument interface will not be extensible.
> Will the non-Netlink based driver setlink/dellink hooks be TLV based
> or take a pointer to a single struct with some indication of what is
> actually populated there?
There shouldn't be any loss of flexibility, we can add new attributes
and new ops as we need them.
I had assumed it would be basic structures and additional ndo ops as
needed but I've not coded anything up.
>>> If this is an acceptable course of action, I could work on such patch.
>>>
>>>
>>
>> [...]
>>
>> Thanks,
>> John
>>
>> --
>> John Fastabend Intel Corporation
--
John Fastabend Intel Corporation
^ permalink raw reply
* RE: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: Arad, Ronen @ 2014-12-06 3:06 UTC (permalink / raw)
To: John Fastabend, Netdev
Cc: Roopa Prabhu, Scott Feldman, Jirí Pírko,
Jamal Hadi Salim, Benjamin LaHaise, Thomas Graf,
stephen@networkplumber.org, John Linville, nhorman@tuxdriver.com,
Nicolas Dichtel, vyasevic@redhat.com, Florian Fainelli,
buytenh@wantstofly.org, Aviad Raveh, David S. Miller,
shm@cumulusnetworks.com, Andy Gospodarek
In-Reply-To: <54826DFF.6090906@gmail.com>
> -----Original Message-----
> From: John Fastabend [mailto:john.fastabend@gmail.com]
> Sent: Friday, December 05, 2014 6:46 PM
> To: Arad, Ronen
> Cc: Roopa Prabhu; Scott Feldman; Netdev; Jirí Pírko; Jamal Hadi Salim;
> Benjamin LaHaise; Thomas Graf; stephen@networkplumber.org; John
> Linville; nhorman@tuxdriver.com; Nicolas Dichtel; vyasevic@redhat.com;
> Florian Fainelli; buytenh@wantstofly.org; Aviad Raveh; David S. Miller;
> shm@cumulusnetworks.com; Andy Gospodarek
> Subject: Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic
> if feature flag set
>
> On 12/05/2014 05:04 PM, Arad, Ronen wrote:
> > I have another case of propagation which is not covered by the proposed
> patch.
> > A recent patch introduced default_pvid attribute for a bridge (so far
> supported only via sysfs and not via netlink).
> > When a port joins a bridge, it inherits a PVID from the default_pvid of the
> bridge.
> > The bridge driver propagates that to the newly created net_bridge_port.
> This is done in br_vlan.c:
> >
> > int nbp_vlan_init(struct net_bridge_port *p) {
> > int rc = 0;
> >
> > if (p->br->default_pvid) {
> > rc = nbp_vlan_add(p, p->br->default_pvid,
> > BRIDGE_VLAN_INFO_PVID |
> > BRIDGE_VLAN_INFO_UNTAGGED);
> > }
> >
> > return rc;
> > }
> >
> > When L2 switching is offloaded to the HW, this PVID setting needs to be
> propagated. However, it does not come via ndo_bridge_setlink. The
> proposed propagation at br_setlink or an up level one at rtnetlink are not
> capable of handling this case.
> > One possible way for handling that is to replace the call to
> > nbp_vlan_add with a call to a new function let's say int
> > br_propagate_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags)
> This function will compose a netlink message with VLAN filtering information
> (i.e. AF_SPEC with VLAN_INFO) and call br_setlink - leveraging the offload
> support proposed by Roopa.
> >
>
> No, we shouldn't be crafting netlink messages in the kernel just to re-inject
> them into an interface. Really the setlink/dellink interface should be cleaned
> up so that it no longer consumes raw netlink messages.
>
> Then either (a) add another parameter to the setlink ops or (b) create a new
> op for it.
>
> I think cleaning up the setlink/dellink hooks is on the TBD list already.
>
This would be a lot cleaner even though there could be a loss of flexibility. A fixed-argument interface will not be extensible.
Will the non-Netlink based driver setlink/dellink hooks be TLV based or take a pointer to a single struct with some indication of what is actually populated there?
>
> > If this is an acceptable course of action, I could work on such patch.
> >
> >
>
> [...]
>
> Thanks,
> John
>
> --
> John Fastabend Intel Corporation
^ permalink raw reply
* Re: kernel panic receiving flooded VXLAN traffic with OVS
From: Jesse Gross @ 2014-12-06 2:51 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: netdev, discuss@openvswitch.org, Pravin Shelar
In-Reply-To: <8549.1417657547@famine>
On Wed, Dec 3, 2014 at 5:45 PM, Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
> Jay Vosburgh <jay.vosburgh@canonical.com> wrote:
>
>> I am able to reproduce a kernel panic on a system using
>>openvswitch when receiving VXLAN traffic under a very specific set of
>>circumstances. This occurs with a recent net-next as well as an Ubuntu
>>3.13 kernel. I'm not sure if the error lies in OVS, GRO, or elsewhere.
>>
>> In summary, when the system receives multiple VXLAN encapsulated
>>TCP segments for a different system (not intended for local reception)
>>that are from the middle of an active connection (received due to a switch
>>flood), and are tagged to a VLAN not configured on the local host, then
>>the system panics in skb_segment when OVS calls __skb_gso_segment on the
>>GRO skb prior to performing an upcall to user space.
>>
>> The panic occurs in skbuff.c:skb_segment(), at the BUG_ON around
>>line 3036:
>>
>>struct sk_buff *skb_segment(struct sk_buff *head_skb,
>> netdev_features_t features)
>>{
>>[...]
>> skb_shinfo(nskb)->tx_flags = skb_shinfo(head_skb)->tx_flags &
>> SKBTX_SHARED_FRAG;
>>
>> while (pos < offset + len) {
>> if (i >= nfrags) {
>> BUG_ON(skb_headlen(list_skb));
>>
>> i = 0;
>>
>>
>> The BUG_ON triggers because the skbs that have been GRO
>>accumulated are partially or entirely linear, depending upon the receiving
>>network device (sky2 is partial, enic is entire). The receive buffers end
>>up being linear evidently because the mtu is set to 9000, and
>>__netdev_alloc_skb calls __alloc_skb (and thus kmalloc) instead of
>>__netdev_alloc_frag followed by build_skb.
>>
>> The foreign-VLAN VXLAN TCP segments are not processed as normal
>>VXLAN traffic, as there is no listener on the VLAN in question, so once
>>GRO processes them, they are sent directly to ovs_vport_receive. The
>>panic stack appears as follows:
>
> I've worked out some more details on this with regards to the
> cause.
>
> There seems to be a mismatch between GRO and the packet receive
> processing. GRO only looks at the receiving port number in order to
> trigger VXLAN GRO accumulation (which will in turn perform TCP
> accumulation on the encapsulated segment). For the panicking case, the
> packet receive processing doesn't deliver the GRO skb to VXLAN because
> there is no VXLAN listener on the foreign VLAN.
>
> The GRO skb is not processed through iptunnel_pull_header by
> vxlan_udp_encap_recv, so the GRO skb is left with the skb header
> pointing to the UDP header, not the inner TCP header. Note that second
> and later skbs within the GRO skb have their headers pointing to the
> inner TCP header.
>
> Then, when ovs_dp_upcall later ends up in inet_gso_segment, it
> passes the GRO skb to udp4_ufo_fragment, not tcp_gso_segment.
>
> GRO and the skb_segment call from ovs_dp_upcall appear to work
> fine on TCP-in-VXLAN segments that do pass through the VXLAN receive
> processing.
>
> I'm not sure how best to resolve this; adding a check to the GRO
> processing that an skb destined for the VXLAN port would actually be
> received by VXLAN sounds like a possible solution, but that doesn't seem
> to be simple to implement (because the skb->dev at the time GRO runs may
> not match what it becomes later if the VXLAN runs on a VLAN).
I don't think there is anything inherently wrong with aggregating TCP
segments in VXLAN that are not destined for the local host. This is
conceptually the same as doing aggregation for TCP packets where we
only perform L2 bridging - in theory we shouldn't look at the upper
layers but it is fine as long as we faithfully reconstruct it on the
way out.
A VXLAN packet that has been properly GRO-ed should result in a call
to tcp_tso_segment() even without the header being pulled off, since
that's what would happen for locally generated VXLAN packets on
egress. That's what I thought I was fixing with my previous patch to
the VXLAN GRO code although perhaps there is another issue.
^ permalink raw reply
* Re: [PATCH 2/3] bridge: offload bridge port attributes to switch asic if feature flag set
From: John Fastabend @ 2014-12-06 2:46 UTC (permalink / raw)
To: Arad, Ronen
Cc: Roopa Prabhu, Scott Feldman, Netdev, Jirí Pírko,
Jamal Hadi Salim, Benjamin LaHaise, Thomas Graf,
stephen@networkplumber.org, John Linville, nhorman@tuxdriver.com,
Nicolas Dichtel, vyasevic@redhat.com, Florian Fainelli,
buytenh@wantstofly.org, Aviad Raveh, David S. Miller,
shm@cumulusnetworks.com, Andy Gospodarek
In-Reply-To: <E4CD12F19ABA0C4D8729E087A761DC3505D8436D@ORSMSX101.amr.corp.intel.com>
On 12/05/2014 05:04 PM, Arad, Ronen wrote:
> I have another case of propagation which is not covered by the proposed patch.
> A recent patch introduced default_pvid attribute for a bridge (so far supported only via sysfs and not via netlink).
> When a port joins a bridge, it inherits a PVID from the default_pvid of the bridge.
> The bridge driver propagates that to the newly created net_bridge_port. This is done in br_vlan.c:
>
> int nbp_vlan_init(struct net_bridge_port *p)
> {
> int rc = 0;
>
> if (p->br->default_pvid) {
> rc = nbp_vlan_add(p, p->br->default_pvid,
> BRIDGE_VLAN_INFO_PVID |
> BRIDGE_VLAN_INFO_UNTAGGED);
> }
>
> return rc;
> }
>
> When L2 switching is offloaded to the HW, this PVID setting needs to be propagated. However, it does not come via ndo_bridge_setlink. The proposed propagation at br_setlink or an up level one at rtnetlink are not capable of handling this case.
> One possible way for handling that is to replace the call to nbp_vlan_add with a call to a new function let's say
> int br_propagate_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags)
> This function will compose a netlink message with VLAN filtering information (i.e. AF_SPEC with VLAN_INFO) and call br_setlink - leveraging the offload support proposed by Roopa.
>
No, we shouldn't be crafting netlink messages in the kernel just to
re-inject them into an interface. Really the setlink/dellink interface
should be cleaned up so that it no longer consumes raw netlink messages.
Then either (a) add another parameter to the setlink ops or (b) create
a new op for it.
I think cleaning up the setlink/dellink hooks is on the TBD list
already.
> If this is an acceptable course of action, I could work on such patch.
>
>
[...]
Thanks,
John
--
John Fastabend Intel Corporation
^ permalink raw reply
* [PATCH v2 net-next] tcp: refine TSO autosizing
From: Eric Dumazet @ 2014-12-06 2:22 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Neal Cardwell, Yuchung Cheng, Nandita Dukkipati
In-Reply-To: <1417788937.4322.21.camel@edumazet-glaptop2.roam.corp.google.com>
From: Eric Dumazet <edumazet@google.com>
Commit 95bd09eb2750 ("tcp: TSO packets automatic sizing") tried to
control TSO size, but did this at the wrong place (sendmsg() time).
At sendmsg() time, we might have a pessimistic view of flow rate,
and we end up building very small skbs (with 2 MSS per skb).
This is bad because :
- It sends small TSO packets even in Slow Start where rate quickly
increases.
- It tends to make socket write queue very big, increasing tcp_ack()
processing time, but also increasing memory needs, not necessarily
accounted for, as fast clones overhead is currently ignored.
- Lower GRO efficiency and more ACK packets.
Servers with a lot of short-lived connections suffer from this.
Let's instead fill skbs as much as possible (64KB of payload), but split
them at xmit time, when we have a precise idea of the flow rate.
skb split is actually quite efficient.
Patch looks bigger than necessary, because TCP Small Queue decision now
has to take place after the eventual split.
As Neal suggested, get rid of tp->xmit_size_goal_segs, and
introduce a new tcp_tso_autosize() helper, so that
tcp_tso_should_defer() can be synchronized on same goal.
Tested:
40 ms rtt link
nstat >/dev/null
netperf -H remote -Cc -l -2000000 -- -s 1000000
nstat | egrep "IpInReceives|IpOutRequests|TcpOutSegs|IpExtOutOctets"
Before patch :
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  2000000 2000000  0.36     44.22       0.00     0.06     0.000   5.007
IpInReceives                    600                0.0
IpOutRequests                   599                0.0
TcpOutSegs                      1397               0.0
IpExtOutOctets                  2033249            0.0
After patch :
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
87380  2000000 2000000  0.36     44.09       0.00     0.00     0.000   0.000
IpInReceives                    257                0.0
IpOutRequests                   226                0.0
TcpOutSegs                      1399               0.0
IpExtOutOctets                  2013777            0.0
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
v2: added tcp_tso_autosize() helper and removed tp->xmit_size_goal_segs
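For readers who want to try the numbers, the sizing arithmetic of the new tcp_tso_autosize() helper can be mirrored in userspace. This is a sketch, not the kernel function: socket fields and the sysctl are passed as plain parameters, and the values used below (MSS 1448, header reserve 320, minimum of 2 segments, gso limits 65536/65535) are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t toy_min_u32(uint32_t a, uint32_t b)
{
	return a < b ? a : b;
}

static inline uint32_t toy_max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Userspace mirror of the helper's arithmetic.
 * pacing_rate is in bytes per second; shifting right by 10 divides by
 * 1024, a cheap approximation of the bytes that fit into one millisecond.
 */
static inline uint32_t toy_tso_autosize(uint32_t pacing_rate,
					uint32_t gso_max_size,
					uint32_t gso_max_segs,
					uint32_t max_tcp_header,
					uint32_t min_tso_segs,
					uint32_t mss_now)
{
	uint32_t bytes, segs;

	/* Budget for roughly 1 ms, capped by what one GSO packet can hold. */
	bytes = toy_min_u32(pacing_rate >> 10,
			    gso_max_size - 1 - max_tcp_header);

	/* Never go below the configured minimum number of segments. */
	segs = toy_max_u32(bytes / mss_now, min_tso_segs);

	return toy_min_u32(segs, gso_max_segs);
}
```

Under these assumptions a flow paced at 10,000,000 bytes/s gets 6 segments per TSO packet, a slow flow at 1,000,000 bytes/s falls back to the minimum of 2, and a flow at 1,000,000,000 bytes/s is capped at 45 segments by the GSO size limit.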
include/linux/tcp.h | 2 -
net/ipv4/tcp.c | 44 ++++++-----------------------
net/ipv4/tcp_output.c | 59 +++++++++++++++++++++++++++-------------
3 files changed, 51 insertions(+), 54 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f566b8567892ef0bb213de0540b37cfc6ac03ca0..a2944717f1b7d1d12b4e2397c20457931ad71bac 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -130,7 +130,7 @@ struct tcp_sock {
/* inet_connection_sock has to be the first member of tcp_sock */
struct inet_connection_sock inet_conn;
u16 tcp_header_len; /* Bytes of tcp header to send */
- u16 xmit_size_goal_segs; /* Goal for segmenting output packets */
+ /* 2 bytes hole */
/*
* Header prediction flags
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index dc13a3657e8e1b81ba0cb1fcd5386a9d0b106168..6e692f7ac62880415093ca7a43c782c7fd6a049b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -835,45 +835,19 @@ static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
int large_allowed)
{
struct tcp_sock *tp = tcp_sk(sk);
- u32 xmit_size_goal, old_size_goal;
+ u32 hlen, xmit_size_goal;
- xmit_size_goal = mss_now;
+ if (!large_allowed || !sk_can_gso(sk))
+ return mss_now;
- if (large_allowed && sk_can_gso(sk)) {
- u32 gso_size, hlen;
+ /* Maybe we should/could use sk->sk_prot->max_header here ? */
+ hlen = inet_csk(sk)->icsk_af_ops->net_header_len +
+ inet_csk(sk)->icsk_ext_hdr_len +
+ tp->tcp_header_len;
- /* Maybe we should/could use sk->sk_prot->max_header here ? */
- hlen = inet_csk(sk)->icsk_af_ops->net_header_len +
- inet_csk(sk)->icsk_ext_hdr_len +
- tp->tcp_header_len;
+ xmit_size_goal = sk->sk_gso_max_size - 1 - hlen;
- /* Goal is to send at least one packet per ms,
- * not one big TSO packet every 100 ms.
- * This preserves ACK clocking and is consistent
- * with tcp_tso_should_defer() heuristic.
- */
- gso_size = sk->sk_pacing_rate / (2 * MSEC_PER_SEC);
- gso_size = max_t(u32, gso_size,
- sysctl_tcp_min_tso_segs * mss_now);
-
- xmit_size_goal = min_t(u32, gso_size,
- sk->sk_gso_max_size - 1 - hlen);
-
- xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
-
- /* We try hard to avoid divides here */
- old_size_goal = tp->xmit_size_goal_segs * mss_now;
-
- if (likely(old_size_goal <= xmit_size_goal &&
- old_size_goal + mss_now > xmit_size_goal)) {
- xmit_size_goal = old_size_goal;
- } else {
- tp->xmit_size_goal_segs =
- min_t(u16, xmit_size_goal / mss_now,
- sk->sk_gso_max_segs);
- xmit_size_goal = tp->xmit_size_goal_segs * mss_now;
- }
- }
+ xmit_size_goal = tcp_bound_to_half_wnd(tp, xmit_size_goal);
return max(xmit_size_goal, mss_now);
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f5bd4bd3f7e669b3fd48a843d55e7313a30a3409..f37ecf53ee8a96827fc08bd203b0ca8857f8fc34 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1524,6 +1524,27 @@ static bool tcp_nagle_check(bool partial, const struct tcp_sock *tp,
((nonagle & TCP_NAGLE_CORK) ||
(!nonagle && tp->packets_out && tcp_minshall_check(tp)));
}
+
+/* Return how many segs we'd like on a TSO packet,
+ * to send one TSO packet per ms.
+ */
+static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now)
+{
+ u32 bytes, segs;
+
+ bytes = min(sk->sk_pacing_rate >> 10,
+ sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
+
+ /* Goal is to send at least one packet per ms,
+ * not one big TSO packet every 100 ms.
+ * This preserves ACK clocking and is consistent
+ * with tcp_tso_should_defer() heuristic.
+ */
+ segs = max_t(u32, bytes / mss_now, sysctl_tcp_min_tso_segs);
+
+ return min_t(u32, segs, sk->sk_gso_max_segs);
+}
+
/* Returns the portion of skb which can be sent right away */
static unsigned int tcp_mss_split_point(const struct sock *sk,
const struct sk_buff *skb,
@@ -1731,7 +1752,7 @@ static int tso_fragment(struct sock *sk, struct sk_buff *skb, unsigned int len,
* This algorithm is from John Heffner.
*/
static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
- bool *is_cwnd_limited)
+ bool *is_cwnd_limited, u32 max_segs)
{
struct tcp_sock *tp = tcp_sk(sk);
const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1761,8 +1782,7 @@ static bool tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb,
limit = min(send_win, cong_win);
/* If a full-sized TSO skb can be sent, do it. */
- if (limit >= min_t(unsigned int, sk->sk_gso_max_size,
- tp->xmit_size_goal_segs * tp->mss_cache))
+ if (limit >= max_segs * tp->mss_cache)
goto send_now;
/* Middle in queue won't get any more data, full sendable already? */
@@ -1959,6 +1979,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
int cwnd_quota;
int result;
bool is_cwnd_limited = false;
+ u32 max_segs;
sent_pkts = 0;
@@ -1972,6 +1993,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
}
}
+ max_segs = tcp_tso_autosize(sk, mss_now);
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
@@ -2004,10 +2026,23 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
break;
} else {
if (!push_one &&
- tcp_tso_should_defer(sk, skb, &is_cwnd_limited))
+ tcp_tso_should_defer(sk, skb, &is_cwnd_limited,
+ max_segs))
break;
}
+ limit = mss_now;
+ if (tso_segs > 1 && !tcp_urg_mode(tp))
+ limit = tcp_mss_split_point(sk, skb, mss_now,
+ min_t(unsigned int,
+ cwnd_quota,
+ max_segs),
+ nonagle);
+
+ if (skb->len > limit &&
+ unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
+ break;
+
/* TCP Small Queues :
* Control number of packets in qdisc/devices to two packets / or ~1 ms.
* This allows for :
@@ -2018,8 +2053,8 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
* of queued bytes to ensure line rate.
* One example is wifi aggregation (802.11 AMPDU)
*/
- limit = max_t(unsigned int, sysctl_tcp_limit_output_bytes,
- sk->sk_pacing_rate >> 10);
+ limit = max(2 * skb->truesize, sk->sk_pacing_rate >> 10);
+ limit = min_t(u32, limit, sysctl_tcp_limit_output_bytes);
if (atomic_read(&sk->sk_wmem_alloc) > limit) {
set_bit(TSQ_THROTTLED, &tp->tsq_flags);
@@ -2032,18 +2067,6 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
break;
}
- limit = mss_now;
- if (tso_segs > 1 && !tcp_urg_mode(tp))
- limit = tcp_mss_split_point(sk, skb, mss_now,
- min_t(unsigned int,
- cwnd_quota,
- sk->sk_gso_max_segs),
- nonagle);
-
- if (skb->len > limit &&
- unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
- break;
-
if (unlikely(tcp_transmit_skb(sk, skb, 1, gfp)))
break;
^ permalink raw reply related
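
For readers following the arithmetic in tcp_tso_autosize(), here is a minimal user-space sketch of the same computation. It is not the kernel code: struct tso_params and tso_autosize_segs() are hypothetical stand-ins for the socket fields and sysctl that the patch reads, and the MAX_TCP_HEADER value used in any call is an assumption, since the real value depends on kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fields read by tcp_tso_autosize(). */
struct tso_params {
	uint32_t pacing_rate;    /* bytes per second, like sk->sk_pacing_rate */
	uint32_t gso_max_size;   /* like sk->sk_gso_max_size */
	uint32_t gso_max_segs;   /* like sk->sk_gso_max_segs */
	uint32_t max_tcp_header; /* assumed value for MAX_TCP_HEADER */
	uint32_t min_tso_segs;   /* like sysctl_tcp_min_tso_segs */
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
static uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/* Return how many segments one TSO packet should carry so that
 * roughly one packet is sent per millisecond.
 */
uint32_t tso_autosize_segs(const struct tso_params *p, uint32_t mss_now)
{
	uint32_t bytes, segs;

	assert(mss_now > 0);
	assert(p->gso_max_size > p->max_tcp_header);

	/* rate >> 10 divides by 1024: a cheap approximation of the
	 * number of bytes sent per millisecond at the pacing rate.
	 */
	bytes = min_u32(p->pacing_rate >> 10,
			p->gso_max_size - 1 - p->max_tcp_header);

	/* Never go below the configured minimum number of segments. */
	segs = max_u32(bytes / mss_now, p->min_tso_segs);

	/* Never exceed what the device accepts in one GSO packet. */
	return min_u32(segs, p->gso_max_segs);
}
```

With these assumed inputs, a slow flow is clamped up to min_tso_segs, a fast flow is clamped by gso_max_size or gso_max_segs, and rates in between scale with the pacing rate.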
* [PATCH iproute2 2/2] lib names: Add helper func for parse id and name from file
From: Vadim Kochan @ 2014-12-06 2:05 UTC (permalink / raw)
To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1417831512-19452-1-git-send-email-vadim4j@gmail.com>
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
lib/rt_names.c | 68 +++++++++++++++++++++++++++++++++-------------------------
1 file changed, 39 insertions(+), 29 deletions(-)
diff --git a/lib/rt_names.c b/lib/rt_names.c
index e6a1e01..2f14723 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -27,43 +27,62 @@
#define CONFDIR "/etc/iproute2"
#endif
+#define NAME_MAX_LEN 512
+
struct rtnl_hash_entry {
struct rtnl_hash_entry *next;
const char * name;
unsigned int id;
};
+static int fread_id_name(FILE *fp, int *id, char *namebuf)
+{
+ char buf[NAME_MAX_LEN];
+ while (fgets(buf, sizeof(buf), fp)) {
+ char *p = buf;
+
+ while (*p == ' ' || *p == '\t')
+ p++;
+
+ if (*p == '#' || *p == '\n' || *p == 0)
+ continue;
+
+ if (sscanf(p, "0x%x %s\n", id, namebuf) != 2 &&
+ sscanf(p, "0x%x %s #", id, namebuf) != 2 &&
+ sscanf(p, "%d %s\n", id, namebuf) != 2 &&
+ sscanf(p, "%d %s #", id, namebuf) != 2) {
+ strcpy(namebuf, p);
+ return -1;
+ }
+ return 1;
+ }
+ return 0;
+}
+
static void
rtnl_hash_initialize(const char *file, struct rtnl_hash_entry **hash, int size)
{
struct rtnl_hash_entry *entry;
- char buf[512];
FILE *fp;
+ int id;
+ char namebuf[NAME_MAX_LEN] = {0};
+ int ret;
fp = fopen(file, "r");
if (!fp)
return;
- while (fgets(buf, sizeof(buf), fp)) {
- char *p = buf;
- int id;
- char namebuf[512];
- while (*p == ' ' || *p == '\t')
- p++;
- if (*p == '#' || *p == '\n' || *p == 0)
- continue;
- if (sscanf(p, "0x%x %s\n", &id, namebuf) != 2 &&
- sscanf(p, "0x%x %s #", &id, namebuf) != 2 &&
- sscanf(p, "%d %s\n", &id, namebuf) != 2 &&
- sscanf(p, "%d %s #", &id, namebuf) != 2) {
+ while ((ret = fread_id_name(fp, &id, &namebuf[0]))) {
+ if (ret == -1) {
fprintf(stderr, "Database %s is corrupted at %s\n",
- file, p);
+ file, namebuf);
fclose(fp);
return;
}
if (id<0)
continue;
+
entry = malloc(sizeof(*entry));
entry->id = id;
entry->name = strdup(namebuf);
@@ -75,31 +94,22 @@ rtnl_hash_initialize(const char *file, struct rtnl_hash_entry **hash, int size)
static void rtnl_tab_initialize(const char *file, char **tab, int size)
{
- char buf[512];
FILE *fp;
+ int id;
+ char namebuf[NAME_MAX_LEN] = {0};
+ int ret;
fp = fopen(file, "r");
if (!fp)
return;
- while (fgets(buf, sizeof(buf), fp)) {
- char *p = buf;
- int id;
- char namebuf[512];
- while (*p == ' ' || *p == '\t')
- p++;
- if (*p == '#' || *p == '\n' || *p == 0)
- continue;
- if (sscanf(p, "0x%x %s\n", &id, namebuf) != 2 &&
- sscanf(p, "0x%x %s #", &id, namebuf) != 2 &&
- sscanf(p, "%d %s\n", &id, namebuf) != 2 &&
- sscanf(p, "%d %s #", &id, namebuf) != 2) {
+ while ((ret = fread_id_name(fp, &id, &namebuf[0]))) {
+ if (ret == -1) {
fprintf(stderr, "Database %s is corrupted at %s\n",
- file, p);
+ file, namebuf);
fclose(fp);
return;
}
-
if (id<0 || id>size)
continue;
--
2.1.3
^ permalink raw reply related
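
The line parsing that fread_id_name() factors out can be sketched in isolation as below. This is an illustration only, not the code from the patch: parse_id_name() is a hypothetical name, it works on one line that has already been read, and it adds a %511s field width (an assumption on my side) so the name can never overflow a NAME_MAX_LEN buffer.

```c
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 512

/* Parse one line of an "id name" database file.
 * Returns 1 if id and name were parsed, 0 if the line is blank or a
 * comment, and -1 if the line does not match "id name".
 * namebuf must have room for NAME_MAX_LEN bytes.
 */
int parse_id_name(const char *line, int *id, char *namebuf)
{
	const char *p = line;
	unsigned int hex;

	/* Skip leading blanks, as the original loop does. */
	while (*p == ' ' || *p == '\t')
		p++;

	if (*p == '#' || *p == '\n' || *p == '\0')
		return 0;

	/* Try the hexadecimal form first, e.g. "0x10 lowdelay". */
	if (sscanf(p, "0x%x %511s", &hex, namebuf) == 2) {
		*id = (int)hex;
		return 1;
	}

	/* Then the decimal form, e.g. "254 main # comment". */
	if (sscanf(p, "%d %511s", id, namebuf) == 2)
		return 1;

	return -1;
}
```

A caller would loop over fgets() and treat 0 as "skip this line" and -1 as "database is corrupted", which is the split of responsibilities the patch introduces between the helper and its two users.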
* [PATCH iproute2 1/2] lib names: Use CONFDIR for specify 'group' file path
From: Vadim Kochan @ 2014-12-06 2:05 UTC (permalink / raw)
To: netdev; +Cc: Vadim Kochan
In-Reply-To: <1417831512-19452-1-git-send-email-vadim4j@gmail.com>
Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
lib/rt_names.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/rt_names.c b/lib/rt_names.c
index 369e0f4..e6a1e01 100644
--- a/lib/rt_names.c
+++ b/lib/rt_names.c
@@ -469,7 +469,7 @@ static int rtnl_group_init;
static void rtnl_group_initialize(void)
{
rtnl_group_init = 1;
- rtnl_hash_initialize("/etc/iproute2/group",
+ rtnl_hash_initialize(CONFDIR "/group",
rtnl_group_hash, 256);
}
--
2.1.3
^ permalink raw reply related
* [PATCH iproute2 0/2] lib names: Refactoring and cleanups
From: Vadim Kochan @ 2014-12-06 2:05 UTC (permalink / raw)
To: netdev; +Cc: Vadim Kochan
Some cleanups and refactoring in lib/rt_names.c:
#1 Replace the hard-coded /etc/iproute2 path with the CONFDIR define
when initializing the table of group names.
#2 Add a helper so that a single function parses ids and names from
the db files.
Vadim Kochan (2):
lib names: Use CONFDIR for specify 'group' file path
lib names: Add helper func for parse id and name from file
lib/rt_names.c | 70 +++++++++++++++++++++++++++++++++-------------------------
1 file changed, 40 insertions(+), 30 deletions(-)
--
2.1.3
^ permalink raw reply
* Re: iproute2/nstat: Bug in displaying icmp stats
From: Eric Dumazet @ 2014-12-06 2:10 UTC (permalink / raw)
To: Vijay Subramanian; +Cc: netdev
In-Reply-To: <1417828430.15618.20.camel@edumazet-glaptop2.roam.corp.google.com>
On Fri, 2014-12-05 at 17:13 -0800, Eric Dumazet wrote:
> I guess we could count number of spaces/fields in both lines,
> and disable the iproute2 trick if counts match.
Something like that maybe ?
misc/nstat.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/misc/nstat.c b/misc/nstat.c
index e54b3ae..c2cb056 100644
--- a/misc/nstat.c
+++ b/misc/nstat.c
@@ -156,6 +156,15 @@ static void load_good_table(FILE *fp)
}
}
+static int count_spaces(const char *line)
+{
+ int count = 0;
+ char c;
+
+ while ((c = *line++) != 0)
+ count += c == ' ' || c == '\n';
+ return count;
+}
static void load_ugly_table(FILE *fp)
{
@@ -167,10 +176,12 @@ static void load_ugly_table(FILE *fp)
char idbuf[sizeof(buf)];
int off;
char *p;
+ int count1, count2, skip = 0;
p = strchr(buf, ':');
if (!p)
abort();
+ count1 = count_spaces(buf);
*p = 0;
idbuf[0] = 0;
strncat(idbuf, buf, sizeof(idbuf) - 1);
@@ -199,6 +210,9 @@ static void load_ugly_table(FILE *fp)
n = db;
if (fgets(buf, sizeof(buf), fp) == NULL)
abort();
+ count2 = count_spaces(buf);
+ if (count2 > count1)
+ skip = count2 - count1;
do {
p = strrchr(buf, ' ');
if (!p)
@@ -207,8 +221,8 @@ static void load_ugly_table(FILE *fp)
if (sscanf(p+1, "%llu", &n->val) != 1)
abort();
/* Trick to skip "dummy" trailing ICMP MIB in 2.4 */
- if (strcmp(idbuf, "IcmpOutAddrMaskReps") == 0)
- idbuf[5] = 0;
+ if (skip)
+ skip--;
else
n = n->next;
} while (p > buf + off + 2);
^ permalink raw reply related
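
The idea of comparing field counts between the header line and the value line can be tried outside nstat with a small sketch like this one. count_spaces() mirrors the helper in the patch; extra_value_fields() is a hypothetical wrapper added here for illustration, and the sample lines in the note below are made up rather than taken from a real /proc/net/snmp.

```c
#include <stddef.h>

/* Count separators (spaces and the trailing newline) in a line.
 * Each one terminates a field, so the result is the number of fields.
 */
int count_spaces(const char *line)
{
	int count = 0;
	char c;

	while ((c = *line++) != 0)
		count += c == ' ' || c == '\n';
	return count;
}

/* Return how many more fields the value line has than the header line,
 * or 0 if it has the same number or fewer. The patch uses this number
 * as "skip": the count of trailing values that have no matching name.
 */
int extra_value_fields(const char *header, const char *values)
{
	int count1 = count_spaces(header);
	int count2 = count_spaces(values);

	return count2 > count1 ? count2 - count1 : 0;
}
```

For a header such as "Icmp: InMsgs InErrors" followed by "Icmp: 10 2 0" the result is 1, so one trailing value would be skipped; when both lines have the same number of fields the result is 0 and no value is dropped.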
* [PATCH net-next v2 2/2] rocker: remove swdev mode
From: roopa @ 2014-12-06 1:16 UTC (permalink / raw)
To: jiri, sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic
Cc: netdev, davem, shm, gospo, Roopa Prabhu
From: Roopa Prabhu <roopa@cumulusnetworks.com>
This resets rocker mode to zero (BRIDGE_MODE_VEB) during gets.
This is because the default getlink handler that rocker
uses today always takes a mode.
Will fix that in a subsequent patch.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
drivers/net/ethernet/rocker/rocker.c | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index fded127..391077c 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -3700,27 +3700,11 @@ static int rocker_port_bridge_setlink(struct net_device *dev,
{
struct rocker_port *rocker_port = netdev_priv(dev);
struct nlattr *protinfo;
- struct nlattr *afspec;
struct nlattr *attr;
- u16 mode;
int err;
protinfo = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg),
IFLA_PROTINFO);
- afspec = nlmsg_find_attr(nlh, sizeof(struct ifinfomsg), IFLA_AF_SPEC);
-
- if (afspec) {
- attr = nla_find_nested(afspec, IFLA_BRIDGE_MODE);
- if (attr) {
- if (nla_len(attr) < sizeof(mode))
- return -EINVAL;
-
- mode = nla_get_u16(attr);
- if (mode != BRIDGE_MODE_SWDEV)
- return -EINVAL;
- }
- }
-
if (protinfo) {
attr = nla_find_nested(protinfo, IFLA_BRPORT_LEARNING);
if (attr) {
@@ -3755,7 +3739,7 @@ static int rocker_port_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
u32 filter_mask)
{
struct rocker_port *rocker_port = netdev_priv(dev);
- u16 mode = BRIDGE_MODE_SWDEV;
+ u16 mode = 0;
u32 mask = BR_LEARNING | BR_LEARNING_SYNC;
return ndo_dflt_bridge_getlink(skb, pid, seq, dev, mode,
--
1.7.10.4
^ permalink raw reply related
* [PATCH net-next v2 1/2] bridge: remove mode 'swdev'
From: roopa @ 2014-12-06 1:16 UTC (permalink / raw)
To: jiri, sfeldma, jhs, bcrl, tgraf, john.fastabend, stephen,
linville, vyasevic
Cc: netdev, davem, shm, gospo, Roopa Prabhu
From: Roopa Prabhu <roopa@cumulusnetworks.com>
swdev mode was introduced to indicate switchdev offloads
for bridging from user space. But users can already
use BRIDGE_FLAGS_SELF to call directly into the
hw switch port driver, so swdev mode is no longer required.
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
---
include/uapi/linux/if_bridge.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 296a556..da17e45 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -105,7 +105,6 @@ struct __fdb_entry {
#define BRIDGE_MODE_VEB 0 /* Default loopback mode */
#define BRIDGE_MODE_VEPA 1 /* 802.1Qbg defined VEPA mode */
-#define BRIDGE_MODE_SWDEV 2 /* Full switch device offload */
/* Bridge management nested attributes
* [IFLA_AF_SPEC] = {
--
1.7.10.4
^ permalink raw reply related