Netdev List
* Re: [PATCH net] net: Reset secmark when scrubbing packet
From: David Miller @ 2014-12-24  5:22 UTC (permalink / raw)
  To: tgraf; +Cc: netdev
In-Reply-To: <efb09aca5173a9a18f15066c33a4998ed70bd34a.1419293504.git.tgraf@suug.ch>

From: Thomas Graf <tgraf@suug.ch>
Date: Tue, 23 Dec 2014 01:13:18 +0100

> skb_scrub_packet() is called when a packet switches between contexts,
> such as between underlay and overlay, between namespaces, or between
> L3 subnets.
> 
> While we already scrub the packet mark, connection tracking entry,
> and cached destination, the security mark/context is left intact.
> 
> It seems wrong to inherit the security context of a packet when going
> from overlay to underlay or across forwarding paths.
> 
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

Applied and queued up for -stable, thanks Thomas.

^ permalink raw reply

* Re: [PATCH net] net: Generalize ndo_gso_check to ndo_features_check
From: Jesse Gross @ 2014-12-24  5:11 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Tom Herbert, Joe Stringer, Eric Dumazet
In-Reply-To: <20141223.235238.1958837159787674842.davem@davemloft.net>

On Tue, Dec 23, 2014 at 11:52 PM, David Miller <davem@davemloft.net> wrote:
> From: Jesse Gross <jesse@nicira.com>
> Date: Mon, 22 Dec 2014 08:03:43 -0800
>
>> GSO isn't the only offload feature with restrictions that
>> potentially can't be expressed with the current features mechanism.
>> Checksum is another although it's a general issue that could in
>> theory apply to anything. Even if it may be possible to
>> implement these restrictions in other ways, it can result in
>> duplicate code or inefficient per-packet behavior.
>>
>> This generalizes ndo_gso_check so that drivers can remove any
>> features that don't make sense for a given packet, similar to
>> netif_skb_features(). It also converts existing driver
>> restrictions to the new format, completing the work that was
>> done to support tunnel protocols since the issues apply to
>> checksums as well.
>>
>> CC: Tom Herbert <therbert@google.com>
>> CC: Joe Stringer <joestringer@nicira.com>
>> CC: Eric Dumazet <edumazet@google.com>
>> Signed-off-by: Jesse Gross <jesse@nicira.com>
>> Fixes: 04ffcb255f22 ("net: Add ndo_gso_check")
>
> I don't think this fixes the case which was the main impetus for Eric
> Dumazet's patch.
>
> The r8152 USB networking driver supports TSO, but has a length
> restriction, and we weren't software segmenting when netif_needs_gso()
> returns true, exactly because we didn't clear TSO from the feature
> flags.

I believe that this should behave exactly the same as Eric's patch in
this case. The driver would implement the length validation and return
the set of features ANDed with ~NETIF_F_GSO_MASK. This is then
combined with the features computed by netif_skb_features() and used for
all future offload decisions, including skb_gso_segment().
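
As a rough illustration of the behaviour described above, here is a
userspace sketch of a per-packet hook that drops the GSO bits once a
packet exceeds a length limit. The EX_* flag values, EX_MAX_TSO_LEN and
both function names are hypothetical stand-ins; they are not the kernel's
NETIF_F_* definitions and not the r8152 driver's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits; the real NETIF_F_* values differ. */
#define EX_F_SG        (1u << 0)
#define EX_F_HW_CSUM   (1u << 1)
#define EX_F_TSO       (1u << 2)
#define EX_F_TSO6      (1u << 3)
#define EX_F_GSO_MASK  (EX_F_TSO | EX_F_TSO6)

/* Assumed hardware limit for one offloaded packet (illustration only). */
#define EX_MAX_TSO_LEN 16384u

/*
 * Model of a per-packet features hook: return the subset of "features"
 * that the device can honour for a packet of pkt_len bytes.
 */
static uint32_t example_features_check(size_t pkt_len, uint32_t features)
{
	if (pkt_len > EX_MAX_TSO_LEN)
		features &= ~EX_F_GSO_MASK;
	return features;
}

/* A GSO packet needs software segmentation when no GSO bit survives. */
static int example_needs_sw_gso(int is_gso, uint32_t features)
{
	return is_gso && !(features & EX_F_GSO_MASK);
}
```

With this shape, an oversized GSO packet loses only the GSO bits, so the
later segmentation decision sees the reduced set while the other offloads
stay available.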

^ permalink raw reply

* Re: [PATCH net] net: Fix stacked vlan offload features computation
From: David Miller @ 2014-12-24  5:09 UTC (permalink / raw)
  To: makita.toshiaki; +Cc: jesse, netdev
In-Reply-To: <1419242654-4824-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

From: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Date: Mon, 22 Dec 2014 19:04:14 +0900

> When vlan tags are stacked, it is very likely that the outer tag is stored
> in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
> Currently netif_skb_features() first looks at skb->protocol even if there
> is the outer tag in vlan_tci, thus it incorrectly retrieves the protocol
> encapsulated by the inner vlan instead of the inner vlan protocol.
> This allows GSO packets to be passed to HW and they end up being
> corrupted.
> 
> Fixes: 58e998c6d239 ("offloading: Force software GSO for multiple vlan tags.")
> Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net] net: Generalize ndo_gso_check to ndo_features_check
From: David Miller @ 2014-12-24  4:52 UTC (permalink / raw)
  To: jesse; +Cc: netdev, therbert, joestringer, edumazet
In-Reply-To: <1419264223-30004-1-git-send-email-jesse@nicira.com>

From: Jesse Gross <jesse@nicira.com>
Date: Mon, 22 Dec 2014 08:03:43 -0800

> GSO isn't the only offload feature with restrictions that
> potentially can't be expressed with the current features mechanism.
> Checksum is another although it's a general issue that could in
> theory apply to anything. Even if it may be possible to
> implement these restrictions in other ways, it can result in
> duplicate code or inefficient per-packet behavior.
> 
> This generalizes ndo_gso_check so that drivers can remove any
> features that don't make sense for a given packet, similar to
> netif_skb_features(). It also converts existing driver
> restrictions to the new format, completing the work that was
> done to support tunnel protocols since the issues apply to
> checksums as well.
> 
> CC: Tom Herbert <therbert@google.com>
> CC: Joe Stringer <joestringer@nicira.com>
> CC: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Jesse Gross <jesse@nicira.com>
> Fixes: 04ffcb255f22 ("net: Add ndo_gso_check")

I don't think this fixes the case which was the main impetus for Eric
Dumazet's patch.

The r8152 USB networking driver supports TSO, but has a length
restriction, and we weren't software segmenting when netif_needs_gso()
returns true, exactly because we didn't clear TSO from the feature
flags.

We really need to sort this out.

^ permalink raw reply

* Re: [PATCH 4/4] net: Rearrange loop in net_rx_action
From: David Miller @ 2014-12-24  4:20 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk,
	boris.ostrovsky, edumazet
In-Reply-To: <E1Y2QRt-0001qq-Lp@gondolin.me.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 21 Dec 2014 07:16:25 +1100

> This patch rearranges the loop in net_rx_action to reduce the
> amount of jumping back and forth when reading the code.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply

* Re: [PATCH 3/4] net: Always poll at least one device in net_rx_action
From: David Miller @ 2014-12-24  4:20 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk,
	boris.ostrovsky, edumazet
In-Reply-To: <E1Y2QRs-0001q5-6u@gondolin.me.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 21 Dec 2014 07:16:24 +1100

> We should only perform the softnet_break check after we have polled
> at least one device in net_rx_action.  Otherwise a zero or negative
> setting of netdev_budget can lock up the whole system.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.
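
The ordering described in the quoted commit message can be sketched in
userspace as follows. This does not model the real net_rx_action(); the
ex_napi structure and both functions are hypothetical, and the only point
is that the budget test runs after the first poll rather than before it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a NAPI instance. */
struct ex_napi {
	int pending;	/* packets waiting on this device */
	int weight;	/* max packets handled per poll */
};

/* Process up to "weight" packets and return the work done. */
static int ex_poll(struct ex_napi *n)
{
	int done = n->pending < n->weight ? n->pending : n->weight;

	n->pending -= done;
	return done;
}

/*
 * Poll devices in order. The budget test comes after the poll, so the
 * first device is always serviced, even when budget <= 0 on entry.
 * Returns the number of devices that were polled.
 */
static size_t ex_rx_action(struct ex_napi *devs, size_t ndevs, int budget)
{
	size_t polled = 0;

	for (size_t i = 0; i < ndevs; i++) {
		budget -= ex_poll(&devs[i]);
		polled++;
		if (budget <= 0)
			break;	/* analogue of the softnet_break exit */
	}
	return polled;
}
```

With a budget of zero, the loop still services the first device before it
gives up, so some forward progress is made on every call.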

^ permalink raw reply

* Re: [PATCH 2/4] net: Detect drivers that reschedule NAPI and exhaust budget
From: David Miller @ 2014-12-24  4:20 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk,
	boris.ostrovsky, edumazet
In-Reply-To: <E1Y2QRq-0001pW-Tn@gondolin.me.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 21 Dec 2014 07:16:22 +1100

> The commit d75b1ade567ffab085e8adbbdacf0092d10cd09c (net: less
> interrupt masking in NAPI) required drivers to leave poll_list
> empty if the entire budget is consumed.
> 
> We have already had two broken drivers so let's add a check for
> this.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply

* Re: [PATCH 1/4] net: Move napi polling code out of net_rx_action
From: David Miller @ 2014-12-24  4:20 UTC (permalink / raw)
  To: herbert
  Cc: eric.dumazet, david.vrabel, netdev, xen-devel, konrad.wilk,
	boris.ostrovsky, edumazet
In-Reply-To: <E1Y2QRp-0001pD-J7@gondolin.me.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 21 Dec 2014 07:16:21 +1100

> This patch creates a new function napi_poll and moves the napi
> polling code from net_rx_action into it.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply

* Re: [PATCH v3] 3c59x: Fix memory leaks in vortex_open
From: David Miller @ 2014-12-24  4:10 UTC (permalink / raw)
  To: nhorman
  Cc: baijiaju1990, ebiederm, dingtianhong, paul.gortmaker,
	justinvanwijngaarden, netdev
In-Reply-To: <20141224032728.GA20392@localhost.localdomain>

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 23 Dec 2014 22:27:28 -0500

> Sooo, fix it.  Add some checks to not delete the timer if it's not been
> initialized.  It's really preferable to have a single teardown path and a single
> bringup path if at all possible

Or simply have vortex_up() initialize the two timers before it does
anything else.
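
A minimal userspace sketch of that idea, under the assumption that the
only failing step is the device enable: set up both timers first, so the
single teardown path is safe whether or not bring-up succeeded. All ex_*
names are hypothetical, and the "ready" flag merely models whether a timer
has been initialized; this is not the 3c59x code.

```c
#include <stdbool.h>

/* Hypothetical timer: "ready" models whether it has been set up. */
struct ex_timer {
	bool ready;
	bool armed;
};

struct ex_dev {
	struct ex_timer timer;
	struct ex_timer rx_oom_timer;
	bool hw_enabled;
};

static void ex_timer_init(struct ex_timer *t)
{
	t->ready = true;
	t->armed = false;
}

/* Returns false if asked to stop a timer that was never set up. */
static bool ex_timer_del(struct ex_timer *t)
{
	if (!t->ready)
		return false;
	t->armed = false;
	return true;
}

/* Bring-up: timers are set up before the step that may fail. */
static int ex_up(struct ex_dev *d, bool enable_fails)
{
	ex_timer_init(&d->timer);
	ex_timer_init(&d->rx_oom_timer);

	if (enable_fails)
		return -1;	/* stands in for a failing device enable */

	d->hw_enabled = true;
	d->timer.armed = true;
	return 0;
}

/* Single teardown path, usable after a failed or a successful ex_up(). */
static bool ex_down(struct ex_dev *d)
{
	bool ok = ex_timer_del(&d->timer);

	ok = ex_timer_del(&d->rx_oom_timer) && ok;
	d->hw_enabled = false;
	return ok;
}
```

If the two ex_timer_init() calls were moved below the early return, the
teardown after a failed bring-up would hit a timer that was never set up,
which is the situation reported in this thread.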

^ permalink raw reply

* Re: [PATCH v3] 3c59x: Fix memory leaks in vortex_open
From: Neil Horman @ 2014-12-24  3:27 UTC (permalink / raw)
  To: Jia-Ju Bai
  Cc: davem, ebiederm, dingtianhong, paul.gortmaker,
	justinvanwijngaarden, netdev
In-Reply-To: <549A212A.60001@163.com>

On Wed, Dec 24, 2014 at 10:12:58AM +0800, Jia-Ju Bai wrote:
> On 12/23/2014 11:43 PM, Neil Horman wrote:
> >No, I don't think so.  vortex_close predicates each free with a NULL check, so
> >if it's not been allocated, it shouldn't be freed.  vortex_close also puts the
> >adapter back into a known state (undoing all the setup that vortex_open does).
> >I really think it's better to go with the proper close path than just unwinding
> >the allocation
> >
> >Neil
> >
> 
> Firstly, I ran my patch on real hardware (3com 3c905B 100Base
> PCI Ethernet Controller) and made vortex_up fail on purpose
> (by making "pci_enable_device" in vortex_up fail). At runtime, the driver
> works well and the memory leaks are fixed.
> 
> Secondly, I revised the code according to your suggestion:
> 
>         retval = vortex_up(dev);
>         if (!retval)
>             goto out;
> 
> +      vortex_close(dev);
> +      return -ENOMEM;
> 
> Then I repeated my experiment, but a system hang occurred!
> 
> After adding some "printk"s to the code and running the driver, I found
> the source of the problem:
> vortex_close calls vortex_down at runtime, and vortex_down calls
> "del_timer_sync(&vp->rx_oom_timer);" in the code. However, I made
> "pci_enable_device" fail in vortex_up so that vortex_up returns an
> error code directly, but "vp->rx_oom_timer" is only initialized by
> "init_timer" after "pci_enable_device". Thus when
> "del_timer_sync(&vp->rx_oom_timer);" is called in vortex_down,
> a null dereference may occur.
> Moreover, only "pci_enable_device" can make vortex_up fail.
> 
> 
Sooo, fix it.  Add some checks to not delete the timer if it's not been
initialized.  It's really preferable to have a single teardown path and a single
bringup path if at all possible
Neil

^ permalink raw reply

* Re: [PATCH net 5/6] openvswitch: Fix vport_send double free
From: Lino Sanfilippo @ 2014-12-24  3:16 UTC (permalink / raw)
  To: Pravin B Shelar, davem; +Cc: netdev
In-Reply-To: <1419380432-1665-1-git-send-email-pshelar@nicira.com>

Hi,

On 24.12.2014 01:20, Pravin B Shelar wrote:
>  	tunnel_hlen = ip_gre_calc_hlen(tun_key->tun_flags);
>  
> @@ -183,8 +185,9 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
>  
>  	/* Push Tunnel header. */
>  	skb = __build_header(skb, tunnel_hlen);
> -	if (unlikely(!skb)) {
> -		err = 0;
> +	if (IS_ERR(skb)) {
> +		err = PTR_ERR(rt);

Shouldn't it be

err = PTR_ERR(skb); ?

Regards,
Lino
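
For readers unfamiliar with the idiom, the following userspace imitation
of the error-pointer helpers shows why the error has to be decoded from
the pointer that was just tested. The ex_* helpers only mimic the shape of
ERR_PTR(), PTR_ERR() and IS_ERR(); ex_build_header() and ex_send() are
hypothetical and are not the openvswitch functions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace imitation of the kernel's error-pointer helpers. */
#define EX_MAX_ERRNO 4095

static inline void *ex_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ex_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int ex_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-EX_MAX_ERRNO;
}

struct ex_buf {
	int len;
};

/* Hypothetical header builder: returns the buffer or an encoded errno. */
static struct ex_buf *ex_build_header(struct ex_buf *b, int hlen)
{
	if (hlen < 0) {
		free(b);	/* callee consumes the buffer on failure */
		return ex_err_ptr(-EINVAL);
	}
	b->len += hlen;
	return b;
}

/* Caller: decode the error from the pointer that failed the check. */
static int ex_send(int hlen)
{
	struct ex_buf *b = calloc(1, sizeof(*b));
	int err = 0;

	if (!b)
		return -ENOMEM;

	b = ex_build_header(b, hlen);
	if (ex_is_err(b)) {
		err = (int)ex_ptr_err(b);	/* b itself, not another pointer */
		b = NULL;			/* already freed by the callee */
	}
	free(b);	/* free(NULL) is a no-op */
	return err;
}
```

Decoding a different, valid pointer at that point would turn an ordinary
address into a meaningless error code.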

^ permalink raw reply

* [PATCH net-next V2] virtio-net: don't do header check for dodgy gso packets
From: Jason Wang @ 2014-12-24  3:03 UTC (permalink / raw)
  To: rusty, mst, virtualization, netdev, linux-kernel

There's no need to do a header check for virtio-net since:

- Host sets dodgy for all gso packets from guest and checks the header.
- Host should be prepared for all kinds of evil packets from guest, since
  a malicious guest can send any kind of packet.

So this patch sets NETIF_F_GSO_ROBUST for virtio-net to skip the check.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
Changes from V1:
- typo fixes
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b8bd719..45c6ce2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1761,6 +1761,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
 			dev->hw_features |= NETIF_F_TSO_ECN;
 
+		dev->features |= NETIF_F_GSO_ROBUST;
+
 		if (gso)
 			dev->features |= dev->hw_features & NETIF_F_ALL_TSO;
 		/* (!csum && gso) case will be fixed by register_netdev() */
-- 
1.9.1

^ permalink raw reply related

* RE: [PATCH 2/3] net/fsl: remove irq assignment from xgmac_mdio
From: Shaohui Xie @ 2014-12-24  2:22 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: shh.xie@gmail.com, netdev@vger.kernel.org, davem@davemloft.net
In-Reply-To: <54997817.6010108@cogentembedded.com>

> -----Original Message-----
> From: Sergei Shtylyov [mailto:sergei.shtylyov@cogentembedded.com]
> Sent: Tuesday, December 23, 2014 10:12 PM
> To: shh.xie@gmail.com; netdev@vger.kernel.org; davem@davemloft.net
> Cc: Xie Shaohui-B21989
> Subject: Re: [PATCH 2/3] net/fsl: remove irq assignment from xgmac_mdio
> 
> Hello.
> 
> On 12/23/2014 12:46 PM, shh.xie@gmail.com wrote:
> 
> > From: Shaohui Xie <Shaohui.Xie@freescale.com>
> 
> > The irq assignment is wrong and not used, so no extra space is needed
> > from mdiobus_alloc_size(); change the parameter accordingly.
> 
> > Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
> > ---
> >   drivers/net/ethernet/freescale/xgmac_mdio.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> > diff --git a/drivers/net/ethernet/freescale/xgmac_mdio.c
> > b/drivers/net/ethernet/freescale/xgmac_mdio.c
> > index 90adba1..72e0b85 100644
> > --- a/drivers/net/ethernet/freescale/xgmac_mdio.c
> > +++ b/drivers/net/ethernet/freescale/xgmac_mdio.c
> > @@ -187,14 +187,13 @@ static int xgmac_mdio_probe(struct
> platform_device *pdev)
> >   		return ret;
> >   	}
> >
> > -	bus = mdiobus_alloc_size(PHY_MAX_ADDR * sizeof(int));
> > +	bus = mdiobus_alloc_size(0);
> 
>     It's now equivalent to a mere mdiobus_alloc().
[S.H] Yes, mdiobus_alloc() is defined as:

static inline struct mii_bus *mdiobus_alloc(void)
{       
        return mdiobus_alloc_size(0);
} 

Should I use mdiobus_alloc() instead?

Thanks!
Shaohui

^ permalink raw reply

* Re: [PATCH v3] 3c59x: Fix memory leaks in vortex_open
From: Jia-Ju Bai @ 2014-12-24  2:12 UTC (permalink / raw)
  To: Neil Horman
  Cc: davem, ebiederm, dingtianhong, paul.gortmaker,
	justinvanwijngaarden, netdev
In-Reply-To: <20141223154313.GE31876@hmsreliant.think-freely.org>

On 12/23/2014 11:43 PM, Neil Horman wrote:
> No, I don't think so.  vortex_close predicates each free with a NULL check, so
> if it's not been allocated, it shouldn't be freed.  vortex_close also puts the
> adapter back into a known state (undoing all the setup that vortex_open does).
> I really think it's better to go with the proper close path than just unwinding
> the allocation
>
> Neil
>

Firstly, I ran my patch on real hardware (3com 3c905B 100Base
PCI Ethernet Controller) and made vortex_up fail on purpose
(by making "pci_enable_device" in vortex_up fail). At runtime, the driver
works well and the memory leaks are fixed.

Secondly, I revised the code according to your suggestion:

         retval = vortex_up(dev);
         if (!retval)
             goto out;

+      vortex_close(dev);
+      return -ENOMEM;

Then I repeated my experiment, but a system hang occurred!

After adding some "printk"s to the code and running the driver, I found
the source of the problem:
vortex_close calls vortex_down at runtime, and vortex_down calls
"del_timer_sync(&vp->rx_oom_timer);" in the code. However, I made
"pci_enable_device" fail in vortex_up so that vortex_up returns an
error code directly, but "vp->rx_oom_timer" is only initialized by
"init_timer" after "pci_enable_device". Thus when
"del_timer_sync(&vp->rx_oom_timer);" is called in vortex_down,
a null dereference may occur.
Moreover, only "pci_enable_device" can make vortex_up fail.

^ permalink raw reply

* [PATCH iproute2 v3] tc: Show classes in tree view
From: Vadim Kochan @ 2014-12-24  0:46 UTC (permalink / raw)
  To: netdev; +Cc: Vadim Kochan

From: Vadim Kochan <vadim4j@gmail.com>

Add a new '-t[ree]' option which shows the class hierarchy
as a tree. For now, only generic stats info
is supported.

e.g.:

$ tc/tc -t class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b

$ tc/tc -t -s class show dev tap0
+---(1:2) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:40) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
|    |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |          rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:50) htb rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
|    |    |     Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |    |     rate 0bit 0pps backlog 0b 0p requeues 0
|    |    |
|    |    +---(1:51) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|    |               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|    |               rate 0bit 0pps backlog 0b 0p requeues 0
|    |
|    +---(1:60) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
|               Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
|               rate 0bit 0pps backlog 0b 0p requeues 0
|
+---(1:1) htb rate 6Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |    rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:10) htb prio 0 rate 5Mbit ceil 5Mbit burst 15Kb cburst 1600b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:20) htb prio 0 rate 3Mbit ceil 6Mbit burst 15Kb cburst 1599b
     |          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
     |          rate 0bit 0pps backlog 0b 0p requeues 0
     |
     +---(1:30) htb prio 0 rate 1Kbit ceil 6Mbit burst 15Kb cburst 1599b
                Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
                rate 0bit 0pps backlog 0b 0p requeues 0

Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
---
Changes v3:
    Fixed wrong brackets style

Changes v2:
    Removed "Date:" from commit message which was added by mistake.

Changes RFC -> PATCH:
    #1 get rid of INIT_HLIST_NODE
    #2 added sample output to commit message
    #3 use "show_tree=1" instead of "show_tree++"
    #4 no need update include/hlist.h (because of #1)
    #5 changed a little tree output: parentheses around class id instead of qdisc name

 tc/tc.c        |   5 +-
 tc/tc_class.c  | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 tc/tc_common.h |   2 +
 3 files changed, 165 insertions(+), 3 deletions(-)

diff --git a/tc/tc.c b/tc/tc.c
index 9b50e74..30950a6 100644
--- a/tc/tc.c
+++ b/tc/tc.c
@@ -34,8 +34,9 @@ int show_stats = 0;
 int show_details = 0;
 int show_raw = 0;
 int show_pretty = 0;
-int batch_mode = 0;
+int show_tree = 0;
 
+int batch_mode = 0;
 int resolve_hosts = 0;
 int use_iec = 0;
 int force = 0;
@@ -278,6 +279,8 @@ int main(int argc, char **argv)
 			++show_raw;
 		} else if (matches(argv[1], "-pretty") == 0) {
 			++show_pretty;
+		} else if (matches(argv[1], "-tree") == 0) {
+			show_tree = 1;
 		} else if (matches(argv[1], "-Version") == 0) {
 			printf("tc utility, iproute2-ss%s\n", SNAPSHOT);
 			return 0;
diff --git a/tc/tc_class.c b/tc/tc_class.c
index e56bf07..f155294 100644
--- a/tc/tc_class.c
+++ b/tc/tc_class.c
@@ -24,6 +24,22 @@
 #include "utils.h"
 #include "tc_util.h"
 #include "tc_common.h"
+#include "hlist.h"
+
+struct cls_node {
+	struct hlist_node hlist;
+	__u32 handle;
+	__u32 parent;
+	int level;
+	struct cls_node *cls_parent;
+	struct cls_node *cls_right;
+	struct rtattr *attr;
+	int attr_len;
+	int childs_count;
+};
+
+static struct hlist_head cls_list = {};
+static struct hlist_head root_cls_list = {};
 
 static void usage(void);
 
@@ -148,13 +164,145 @@ int filter_ifindex;
 __u32 filter_qdisc;
 __u32 filter_classid;
 
+static void tree_cls_add(__u32 parent, __u32 handle, struct rtattr *attr, int len)
+{
+	struct cls_node *cls = malloc(sizeof(struct cls_node));
+
+	memset(cls, 0, sizeof(*cls));
+	cls->handle    = handle;
+	cls->parent    = parent;
+	cls->attr      = malloc(len);
+	cls->attr_len  = len;
+
+	memcpy(cls->attr, attr, len);
+
+	if (parent == TC_H_ROOT)
+		hlist_add_head(&cls->hlist, &root_cls_list);
+	else
+		hlist_add_head(&cls->hlist, &cls_list);
+}
+
+static void tree_cls_indent(char *buf, struct cls_node *cls, int is_newline,
+		int add_spaces)
+{
+	char spaces[100] = {0};
+
+	while (cls && cls->cls_parent) {
+		cls->cls_parent->cls_right = cls;
+		cls = cls->cls_parent;
+	}
+	while (cls && cls->cls_right) {
+		if (cls->hlist.next)
+			strcat(buf, "|    ");
+		else
+			strcat(buf, "     ");
+
+		cls = cls->cls_right;
+	}
+
+	if (is_newline) {
+		if (cls->hlist.next && cls->childs_count)
+			strcat(buf, "|    |");
+		else if (cls->hlist.next)
+			strcat(buf, "|     ");
+		else if (cls->childs_count)
+			strcat(buf, "     |");
+		else if (!cls->hlist.next)
+			strcat(buf, "      ");
+	}
+	if (add_spaces > 0) {
+		sprintf(spaces, "%-*s", add_spaces, "");
+		strcat(buf, spaces);
+	}
+}
+
+static void tree_cls_show(FILE *fp, char *buf, struct hlist_head *root_list, int level)
+{
+	struct hlist_node *n, *tmp_cls;
+	char cls_id_str[256] = {};
+	struct rtattr * tb[TCA_MAX+1] = {};
+	struct qdisc_util *q;
+	char str[100] = {};
+
+	hlist_for_each_safe(n, tmp_cls, root_list) {
+		struct hlist_node *c, *tmp_chld;
+		struct hlist_head childs = {};
+		struct cls_node *cls = container_of(n, struct cls_node, hlist);
+
+		hlist_for_each_safe(c, tmp_chld, &cls_list) {
+			struct cls_node *child = container_of(c, struct cls_node, hlist);
+
+			if (cls->handle == child->parent) {
+				hlist_del(c);
+				hlist_add_head(c, &childs);
+				cls->childs_count++;
+				child->cls_parent = cls;
+			}
+		}
+
+		tree_cls_indent(buf, cls, 0, 0);
+
+		print_tc_classid(cls_id_str, sizeof(cls_id_str), cls->handle);
+		sprintf(str, "+---(%s)", cls_id_str);
+		strcat(buf, str);
+
+		parse_rtattr(tb, TCA_MAX, cls->attr, cls->attr_len);
+
+		if (tb[TCA_KIND] == NULL) {
+			strcat(buf, " [unknown qdisc kind] ");
+		} else {
+			const char *kind = rta_getattr_str(tb[TCA_KIND]);
+
+			sprintf(str, " %s ", kind);
+			strcat(buf, str);
+			fprintf(fp, "%s", buf);
+			buf[0] = '\0';
+
+			q = get_qdisc_kind(kind);
+			if (q && q->print_copt) {
+				q->print_copt(q, fp, tb[TCA_OPTIONS]);
+			}
+			if (q && show_stats) {
+				int cls_indent = strlen(q->id) - 2 +
+					strlen(cls_id_str);
+				struct rtattr *xstats = NULL;
+
+				tree_cls_indent(buf, cls, 1, cls_indent);
+
+				if (tb[TCA_STATS] || tb[TCA_STATS2]) {
+					fprintf(fp, "\n");
+					print_tcstats_attr(fp, tb, buf, &xstats);
+					buf[0] = '\0';
+				}
+				if (cls->hlist.next || cls->childs_count) {
+					strcat(buf, "\n");
+					tree_cls_indent(buf, cls, 1, 0);
+				}
+			}
+		}
+		free(cls->attr);
+		fprintf(fp, "%s\n", buf);
+		buf[0] = '\0';
+
+		tree_cls_show(fp, buf, &childs, level + 1);
+		if (!cls->hlist.next) {
+			tree_cls_indent(buf, cls, 0, 0);
+			strcat(buf, "\n");
+		}
+
+		fprintf(fp, "%s", buf);
+		buf[0] = '\0';
+		free(cls);
+	}
+}
+
 int print_class(const struct sockaddr_nl *who,
 		       struct nlmsghdr *n, void *arg)
 {
 	FILE *fp = (FILE*)arg;
 	struct tcmsg *t = NLMSG_DATA(n);
 	int len = n->nlmsg_len;
-	struct rtattr * tb[TCA_MAX+1];
+	struct rtattr * tb[TCA_MAX+1] = {};
 	struct qdisc_util *q;
 	char abuf[256];
 
@@ -167,13 +315,18 @@ int print_class(const struct sockaddr_nl *who,
 		fprintf(stderr, "Wrong len %d\n", len);
 		return -1;
 	}
+
+	if (show_tree) {
+		tree_cls_add(t->tcm_parent, t->tcm_handle, TCA_RTA(t), len);
+		return 0;
+	}
+
 	if (filter_qdisc && TC_H_MAJ(t->tcm_handle^filter_qdisc))
 		return 0;
 
 	if (filter_classid && t->tcm_handle != filter_classid)
 		return 0;
 
-	memset(tb, 0, sizeof(tb));
 	parse_rtattr(tb, TCA_MAX, TCA_RTA(t), len);
 
 	if (tb[TCA_KIND] == NULL) {
@@ -236,6 +389,7 @@ static int tc_class_list(int argc, char **argv)
 {
 	struct tcmsg t;
 	char d[16];
+	char buf[1024] = {0};
 
 	memset(&t, 0, sizeof(t));
 	t.tcm_family = AF_UNSPEC;
@@ -306,6 +460,9 @@ static int tc_class_list(int argc, char **argv)
 		return 1;
 	}
 
+	if (show_tree)
+		tree_cls_show(stdout, &buf[0], &root_cls_list, 0);
+
 	return 0;
 }
 
diff --git a/tc/tc_common.h b/tc/tc_common.h
index 4f88856..0ee009b 100644
--- a/tc/tc_common.h
+++ b/tc/tc_common.h
@@ -19,3 +19,5 @@ extern int parse_estimator(int *p_argc, char ***p_argv, struct tc_estimator *est
 struct tc_sizespec;
 extern int parse_size_table(int *p_argc, char ***p_argv, struct tc_sizespec *s);
 extern int check_size_table_opts(struct tc_sizespec *s);
+
+extern int show_tree;
-- 
2.1.3

^ permalink raw reply related

* [PATCH net 6/6] vxlan: Fix double free of skb.
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

In case of error, vxlan_xmit_one() can free an already freed skb.
This patch also fixes a memory leak of the dst entry.

Fixes: acbf74a7630 ("vxlan: Refactor vxlan driver to make use
of the common UDP tunnel functions").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 drivers/net/vxlan.c |   34 ++++++++++++++++++++++++----------
 1 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 49d9f22..7fbd89f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1579,8 +1579,10 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
 
 	skb = udp_tunnel_handle_offloads(skb, udp_sum);
-	if (IS_ERR(skb))
-		return -EINVAL;
+	if (IS_ERR(skb)) {
+		err = -EINVAL;
+		goto err;
+	}
 
 	skb_scrub_packet(skb, xnet);
 
@@ -1590,12 +1592,16 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 
 	/* Need space for new headers (invalidates iph ptr) */
 	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err))
-		return err;
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		goto err;
+	}
 
 	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
+	if (WARN_ON(!skb)) {
+		err = -ENOMEM;
+		goto err;
+	}
 
 	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
 	vxh->vx_flags = htonl(VXLAN_FLAGS);
@@ -1606,6 +1612,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port);
 	return 0;
+err:
+	dst_release(dst);
+	return err;
 }
 #endif
 
@@ -1621,7 +1630,7 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 
 	skb = udp_tunnel_handle_offloads(skb, udp_sum);
 	if (IS_ERR(skb))
-		return -EINVAL;
+		return PTR_ERR(skb);
 
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ VXLAN_HLEN + sizeof(struct iphdr)
@@ -1629,8 +1638,10 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 
 	/* Need space for new headers (invalidates iph ptr) */
 	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		kfree_skb(skb);
 		return err;
+	}
 
 	skb = vlan_hwaccel_push_inside(skb);
 	if (WARN_ON(!skb))
@@ -1776,9 +1787,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 				     tos, ttl, df, src_port, dst_port,
 				     htonl(vni << 8),
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
-
-		if (err < 0)
+		if (err < 0) {
+			/* skb is already freed. */
+			skb = NULL;
 			goto rt_tx_error;
+		}
+
 		iptunnel_xmit_stats(err, &dev->stats, dev->tstats);
 #if IS_ENABLED(CONFIG_IPV6)
 	} else {
-- 
1.7.1

^ permalink raw reply related

* [PATCH net 5/6] openvswitch: Fix vport_send double free
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

Today vport-send has complex error handling because it involves
freeing the skb and updating stats depending on the return value from
the vport send implementation.
This can be simplified by delegating the responsibility for freeing
the skb to the vport implementation in all cases, so that
vport-send only needs to update stats.

Fixes: 91b7514cdf ("openvswitch: Unify vport error stats
handling")
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/ipv4/geneve.c              |    6 +++++-
 net/openvswitch/vport-geneve.c |    3 +++
 net/openvswitch/vport-gre.c    |   18 +++++++++++-------
 net/openvswitch/vport-vxlan.c  |    2 ++
 net/openvswitch/vport.c        |    5 ++---
 5 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/net/ipv4/geneve.c b/net/ipv4/geneve.c
index 95e47c9..394a200 100644
--- a/net/ipv4/geneve.c
+++ b/net/ipv4/geneve.c
@@ -122,14 +122,18 @@ int geneve_xmit_skb(struct geneve_sock *gs, struct rtable *rt,
 	int err;
 
 	skb = udp_tunnel_handle_offloads(skb, !gs->sock->sk->sk_no_check_tx);
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
 
 	min_headroom = LL_RESERVED_SPACE(rt->dst.dev) + rt->dst.header_len
 			+ GENEVE_BASE_HLEN + opt_len + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
 	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err))
+	if (unlikely(err)) {
+		kfree_skb(skb);
 		return err;
+	}
 
 	skb = vlan_hwaccel_push_inside(skb);
 	if (unlikely(!skb))
diff --git a/net/openvswitch/vport-geneve.c b/net/openvswitch/vport-geneve.c
index 347fa23..484864d 100644
--- a/net/openvswitch/vport-geneve.c
+++ b/net/openvswitch/vport-geneve.c
@@ -219,7 +219,10 @@ static int geneve_tnl_send(struct vport *vport, struct sk_buff *skb)
 			      false);
 	if (err < 0)
 		ip_rt_put(rt);
+	return err;
+
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index 6b69df5..28f54e9 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -73,7 +73,7 @@ static struct sk_buff *__build_header(struct sk_buff *skb,
 
 	skb = gre_handle_offloads(skb, !!(tun_key->tun_flags & TUNNEL_CSUM));
 	if (IS_ERR(skb))
-		return NULL;
+		return skb;
 
 	tpi.flags = filter_tnl_flags(tun_key->tun_flags);
 	tpi.proto = htons(ETH_P_TEB);
@@ -144,7 +144,7 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	if (unlikely(!OVS_CB(skb)->egress_tun_info)) {
 		err = -EINVAL;
-		goto error;
+		goto err_free_skb;
 	}
 
 	tun_key = &OVS_CB(skb)->egress_tun_info->tunnel;
@@ -157,8 +157,10 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 	fl.flowi4_proto = IPPROTO_GRE;
 
 	rt = ip_route_output_key(net, &fl);
-	if (IS_ERR(rt))
-		return PTR_ERR(rt);
+	if (IS_ERR(rt)) {
+		err = PTR_ERR(rt);
+		goto err_free_skb;
+	}
 
 	tunnel_hlen = ip_gre_calc_hlen(tun_key->tun_flags);
 
@@ -183,8 +185,9 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 
 	/* Push Tunnel header. */
 	skb = __build_header(skb, tunnel_hlen);
-	if (unlikely(!skb)) {
-		err = 0;
+	if (IS_ERR(skb)) {
+		err = PTR_ERR(rt);
+		skb = NULL;
 		goto err_free_rt;
 	}
 
@@ -198,7 +201,8 @@ static int gre_tnl_send(struct vport *vport, struct sk_buff *skb)
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df, false);
 err_free_rt:
 	ip_rt_put(rt);
-error:
+err_free_skb:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index 38f95a5..d7c46b3 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -187,7 +187,9 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 			     false);
 	if (err < 0)
 		ip_rt_put(rt);
+	return err;
 error:
+	kfree_skb(skb);
 	return err;
 }
 
diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
index 9584526..53f3ebb 100644
--- a/net/openvswitch/vport.c
+++ b/net/openvswitch/vport.c
@@ -519,10 +519,9 @@ int ovs_vport_send(struct vport *vport, struct sk_buff *skb)
 		u64_stats_update_end(&stats->syncp);
 	} else if (sent < 0) {
 		ovs_vport_record_error(vport, VPORT_E_TX_ERROR);
-		kfree_skb(skb);
-	} else
+	} else {
 		ovs_vport_record_error(vport, VPORT_E_TX_DROPPED);
-
+	}
 	return sent;
 }
 
-- 
1.7.1
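
Note: the hunks above move the kfree_skb() out of ovs_vport_send() and
into each tunnel send function, and let __build_header() report failure
through an error pointer instead of NULL. Below is a minimal userspace
sketch of that ownership convention, not the kernel code. The ERR_PTR
helpers are simplified stand-ins for the kernel's, and struct demo_buf,
demo_build_header() and demo_tnl_send() are hypothetical names used only
for illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR helpers. */
#define DEMO_MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical stand-in for struct sk_buff. */
struct demo_buf {
	int len;
};

static struct demo_buf *demo_buf_new(void)
{
	return calloc(1, sizeof(struct demo_buf));
}

/* Hypothetical header builder: frees the buffer itself when it fails
 * and reports the reason through an error pointer.
 */
static struct demo_buf *demo_build_header(struct demo_buf *b, int fail)
{
	if (fail) {
		free(b);
		return ERR_PTR(-ENOMEM);
	}
	b->len += 8;
	return b;
}

/* Hypothetical send function: owns the buffer on every path, so the
 * caller must not free it after the call returns.
 */
static int demo_tnl_send(struct demo_buf *b, int fail)
{
	int err;

	b = demo_build_header(b, fail);
	if (IS_ERR(b)) {
		err = PTR_ERR(b);	/* code comes from the pointer that failed */
		b = NULL;		/* already freed by the builder */
		goto err_free;
	}

	err = b->len;	/* pretend transmit: report the bytes sent */
	free(b);	/* the lower layer consumes the buffer */
	return err;

err_free:
	free(b);	/* free(NULL) is a no-op, like kfree_skb(NULL) */
	return err;
}
```

With a single owner on every path, the caller only records the error and
never frees the buffer a second time.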

* [PATCH net 4/6] openvswitch: Fix GSO with multiple MPLS label.
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

MPLS GSO needs to know the innermost protocol to process GSO packets.

Fixes: 25cd9ba0abc ("openvswitch: Add basic MPLS support to
kernel").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/openvswitch/actions.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
index 764fdc3..770064c 100644
--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -147,7 +147,8 @@ static int push_mpls(struct sk_buff *skb, struct sw_flow_key *key,
 	hdr = eth_hdr(skb);
 	hdr->h_proto = mpls->mpls_ethertype;
 
-	skb_set_inner_protocol(skb, skb->protocol);
+	if (!skb->inner_protocol)
+		skb_set_inner_protocol(skb, skb->protocol);
 	skb->protocol = mpls->mpls_ethertype;
 
 	invalidate_flow_key(key);
-- 
1.7.1
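
Note: a minimal userspace sketch of the behaviour the hunk above aims
for, assuming a hypothetical struct demo_pkt that models only the two
sk_buff fields involved. Byte order handling is left out. Pushing a
second label must not overwrite the recorded inner protocol, otherwise
it would be recorded as MPLS rather than the payload protocol.

```c
#include <stdint.h>

#define DEMO_ETH_P_IP      0x0800
#define DEMO_ETH_P_MPLS_UC 0x8847

/* Hypothetical model of the two sk_buff fields touched by push_mpls(). */
struct demo_pkt {
	uint16_t protocol;
	uint16_t inner_protocol;
};

/* Record the inner protocol only on the first push, so that it keeps
 * naming the innermost payload however many labels are stacked.
 */
static void demo_push_mpls(struct demo_pkt *p, uint16_t mpls_ethertype)
{
	if (!p->inner_protocol)
		p->inner_protocol = p->protocol;
	p->protocol = mpls_ethertype;
}
```

After two pushes on an IPv4 packet, inner_protocol still holds the IPv4
ethertype while protocol holds the MPLS ethertype.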

* [PATCH net 2/6] mpls: Fix allowed protocols for mpls gso
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

MPLS and tunnel GSO do not work together.  Reject packets which
request such GSO.

Fixes: 0d89d2035f ("MPLS: Add limited GSO support").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/mpls/mpls_gso.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index ca27837..349295d 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -31,10 +31,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
 				  SKB_GSO_TCPV6 |
 				  SKB_GSO_UDP |
 				  SKB_GSO_DODGY |
-				  SKB_GSO_TCP_ECN |
-				  SKB_GSO_GRE |
-				  SKB_GSO_GRE_CSUM |
-				  SKB_GSO_IPIP)))
+				  SKB_GSO_TCP_ECN)))
 		goto out;
 
 	/* Setup inner SKB. */
-- 
1.7.1
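
Note: the check in mpls_gso_segment() is an allow-list mask test. A
minimal sketch follows, with illustrative bit values that are not the
kernel's SKB_GSO_* values. The presence of a TCPv4 bit in the list is an
assumption, since that line lies above the hunk context shown.

```c
#include <stdbool.h>

/* Illustrative bit values only; the kernel's SKB_GSO_* values differ. */
enum demo_gso {
	DEMO_GSO_TCPV4   = 1 << 0,	/* assumed; above the hunk context */
	DEMO_GSO_TCPV6   = 1 << 1,
	DEMO_GSO_UDP     = 1 << 2,
	DEMO_GSO_DODGY   = 1 << 3,
	DEMO_GSO_TCP_ECN = 1 << 4,
	DEMO_GSO_GRE     = 1 << 5,
	DEMO_GSO_IPIP    = 1 << 6,
};

/* Types still accepted once the tunnel types are dropped from the list. */
#define DEMO_MPLS_GSO_ALLOWED \
	(DEMO_GSO_TCPV4 | DEMO_GSO_TCPV6 | DEMO_GSO_UDP | \
	 DEMO_GSO_DODGY | DEMO_GSO_TCP_ECN)

/* True when no bit outside the allow-list is set. */
static bool demo_mpls_gso_ok(unsigned int gso_type)
{
	return (gso_type & ~(unsigned int)DEMO_MPLS_GSO_ALLOWED) == 0;
}
```

A packet requesting TCPv4 plus GRE segmentation is now rejected, while
plain TCPv4 with ECN is still accepted.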

* [PATCH net 3/6] openvswitch: Fix MPLS action validation.
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

The Linux stack does not implement GSO for packets with multiple
encapsulations.  Therefore there was a check in MPLS action
validation to detect such a case, but this check introduced a
bug which deleted one or more actions from the actions list.
The following patch removes this check to fix the validation.

Fixes: 25cd9ba0abc ("openvswitch: Add basic MPLS support to
kernel").

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Reported-by: Srinivas Neginhal <sneginha@vmware.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
---
 net/openvswitch/flow_netlink.c |   13 +------------
 1 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
index 9645a21..d1eecf7 100644
--- a/net/openvswitch/flow_netlink.c
+++ b/net/openvswitch/flow_netlink.c
@@ -1753,7 +1753,6 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 				  __be16 eth_type, __be16 vlan_tci, bool log)
 {
 	const struct nlattr *a;
-	bool out_tnl_port = false;
 	int rem, err;
 
 	if (depth >= SAMPLE_ACTION_DEPTH)
@@ -1796,8 +1795,6 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 		case OVS_ACTION_ATTR_OUTPUT:
 			if (nla_get_u32(a) >= DP_MAX_PORTS)
 				return -EINVAL;
-			out_tnl_port = false;
-
 			break;
 
 		case OVS_ACTION_ATTR_HASH: {
@@ -1832,12 +1829,6 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 		case OVS_ACTION_ATTR_PUSH_MPLS: {
 			const struct ovs_action_push_mpls *mpls = nla_data(a);
 
-			/* Networking stack do not allow simultaneous Tunnel
-			 * and MPLS GSO.
-			 */
-			if (out_tnl_port)
-				return -EINVAL;
-
 			if (!eth_p_mpls(mpls->mpls_ethertype))
 				return -EINVAL;
 			/* Prohibit push MPLS other than to a white list
@@ -1873,11 +1864,9 @@ static int __ovs_nla_copy_actions(const struct nlattr *attr,
 
 		case OVS_ACTION_ATTR_SET:
 			err = validate_set(a, key, sfa,
-					   &out_tnl_port, eth_type, log);
+					   &skip_copy, eth_type, log);
 			if (err)
 				return err;
-
-			skip_copy = out_tnl_port;
 			break;
 
 		case OVS_ACTION_ATTR_SAMPLE:
-- 
1.7.1

* [PATCH net 1/6] mpls: Fix config check for mpls.
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

Fixes MPLS GSO for the case when mpls is compiled as a kernel module.

Fixes: 0d89d2035f ("MPLS: Add limited GSO support").
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
---
 net/core/dev.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f411c28..fa621fd 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2522,7 +2522,7 @@ static int illegal_highdma(struct net_device *dev, struct sk_buff *skb)
 /* If MPLS offload request, verify we are testing hardware MPLS features
  * instead of standard features for the netdev.
  */
-#ifdef CONFIG_NET_MPLS_GSO
+#if IS_ENABLED(CONFIG_NET_MPLS_GSO)
 static netdev_features_t net_mpls_features(struct sk_buff *skb,
 					   netdev_features_t features,
 					   __be16 type)
-- 
1.7.1
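
Note: when a tristate option is built as a module, kconfig defines
CONFIG_<option>_MODULE rather than CONFIG_<option>, so a plain #ifdef
does not see it. The sketch below reproduces the idea behind
IS_ENABLED() in userspace. The DEMO_* macros are simplified stand-ins
modelled on include/linux/kconfig.h, and CONFIG_DEMO is a hypothetical
option assumed to be set to "m".

```c
/* Hypothetical option built as a module (=m): only the _MODULE symbol
 * is defined, as kconfig would do in the generated autoconf.h.
 */
#define CONFIG_DEMO_MODULE 1

/* Simplified stand-ins for the helpers in include/linux/kconfig.h. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_take_second_arg(ignored, val, ...) val
#define DEMO_is_defined(x)  DEMO_is_defined2(x)
#define DEMO_is_defined2(val) DEMO_is_defined3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_is_defined3(arg1_or_junk) \
	DEMO_take_second_arg(arg1_or_junk 1, 0, 0)

/* 1 when the option is built in (=y) or built as a module (=m). */
#define DEMO_IS_ENABLED(option) \
	(DEMO_is_defined(option) || DEMO_is_defined(option##_MODULE))

/* What "#ifdef CONFIG_DEMO" sees for an option set to "m". */
static int demo_seen_by_ifdef(void)
{
#ifdef CONFIG_DEMO
	return 1;
#else
	return 0;
#endif
}

/* What the IS_ENABLED()-style test sees for the same option. */
static int demo_seen_by_is_enabled(void)
{
#if DEMO_IS_ENABLED(CONFIG_DEMO)
	return 1;
#else
	return 0;
#endif
}
```

With the option set to "m", the #ifdef variant compiles the guarded code
out, while the IS_ENABLED()-style variant keeps it.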

* [PATCH net 0/6] openvswitch: datapath fixes
From: Pravin B Shelar @ 2014-12-24  0:20 UTC (permalink / raw)
  To: davem; +Cc: netdev, Pravin B Shelar

The following patch series mostly targets MPLS fixes.  The other
patches are related to datapath transmit path error handling.

Pravin B Shelar (6):
  mpls: Fix config check for mpls.
  mpls: Fix allowed protocols for mpls gso
  openvswitch: Fix MPLS action validation.
  openvswitch: Fix GSO with multiple MPLS label.
  openvswitch: Fix vport_send double free
  vxlan: Fix double free of skb.

 drivers/net/vxlan.c            |   34 ++++++++++++++++++++++++----------
 net/core/dev.c                 |    2 +-
 net/ipv4/geneve.c              |    6 +++++-
 net/mpls/mpls_gso.c            |    5 +----
 net/openvswitch/actions.c      |    3 ++-
 net/openvswitch/flow_netlink.c |   13 +------------
 net/openvswitch/vport-geneve.c |    3 +++
 net/openvswitch/vport-gre.c    |   18 +++++++++++-------
 net/openvswitch/vport-vxlan.c  |    2 ++
 net/openvswitch/vport.c        |    5 ++---
 10 files changed, 52 insertions(+), 39 deletions(-)

* Marvell Kirkwood - MV643XX: near 100% UDP RX packet loss
From: Bruno Prémont @ 2014-12-23 23:18 UTC (permalink / raw)
  To: Sebastian Hesselbarth; +Cc: netdev

Hi,

On a SheevaPlug, Marvell Kirkwood based, I get nearly 100% packet loss
when running iperf in UDP (rx) mode while I get ~480Mb/s in tx.
For TCP both rx and tx result in about 650Mb/s.

Running iperf server on the sheevaplug:
# iperf3 -s -p 3740

Running iperf client on an AMD APU:
# iperf3 -p 3740 -t 5 -b 0 -4 -c sheevaplug $extra

iperf output at the end.

Both systems are interconnected with a Gb/s switch [Netgear GS108E]
which successfully handles 950Mb/s between my client and an AMD Turion
system (both directions, UDP as well as TCP).


Am I mis-configuring the SheevaPlug or is there some bug causing all
the UDP packets to get lost? (slow rate UDP is working fine however)


Thanks,
Bruno


On the SheevaPlug:

# uname -a
Linux sheevaplug 3.15.0-sheeva+ #1 Mon Jun 9 21:58:19 CEST 2014 armv5tel Marvell Kirkwood (Flattened Device Tree) GNU/Linux

# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             2048             (this defaults to 128 on boot)
RX Mini:        0
RX Jumbo:       0
TX:             256

# ethtool -k eth0
Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off [fixed]
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]


# ethtool -S eth0
NIC statistics:
     rx_packets: 6188650
     tx_packets: 5407424
     rx_bytes: 2949700388
     tx_bytes: 1527361112
     rx_errors: 0
     tx_errors: 0
     rx_dropped: 0
     tx_dropped: 0
     good_octets_received: 9696302398
     bad_octets_received: 0
     internal_mac_transmit_err: 0
     good_frames_received: 7961482
     bad_frames_received: 0
     broadcast_frames_received: 45063
     multicast_frames_received: 35667
     frames_64_octets: 390741
     frames_65_to_127_octets: 1213008
     frames_128_to_255_octets: 1159241
     frames_256_to_511_octets: 10833
     frames_512_to_1023_octets: 1123895
     frames_1024_to_max_octets: 9471171
     good_octets_sent: 5851110022
     good_frames_sent: 5407407
     excessive_collision: 0
     multicast_frames_sent: 35326
     broadcast_frames_sent: 109099
     unrec_mac_control_received: 0
     fc_sent: 0
     good_fc_received: 0
     bad_fc_received: 0
     undersize_received: 0
     fragments_received: 0
     oversize_received: 0
     jabber_received: 0
     mac_receive_error: 0
     bad_crc_event: 0
     collision: 0
     late_collision: 0
     rx_discard: 1772832
     rx_overrun: 0


Why are so many packets being discarded?
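
Note: one way to read the counters above, offered as arithmetic only
and not as a diagnosis. good_frames_received minus rx_discard equals
rx_packets exactly (7961482 - 1772832 = 6188650), and the discards are
roughly 22% of the good frames counted since the counters were last
reset. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Frames counted as good by the MAC minus frames it then discarded. */
static uint64_t demo_frames_delivered(uint64_t good_frames,
				      uint64_t discarded)
{
	return good_frames - discarded;
}

/* Discards as a percentage of the good frames received. */
static double demo_discard_percent(uint64_t good_frames,
				   uint64_t discarded)
{
	if (good_frames == 0)
		return 0.0;
	return 100.0 * (double)discarded / (double)good_frames;
}
```

The counters are cumulative, so comparing snapshots taken before and
after a single iperf run would isolate the discards caused by that run.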


========= UDP ($extra = -u) ====================
Connecting to host sheevaplug, port 3740
[  4] local 192.168.0.139 port 46162 connected to 192.168.0.70 port 3740
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  4]   0.00-1.00   sec   114 MBytes   959 Mbits/sec  14640  
[  4]   1.00-2.00   sec   114 MBytes   958 Mbits/sec  14620  
[  4]   2.00-3.00   sec   114 MBytes   958 Mbits/sec  14620  
[  4]   3.00-4.00   sec   114 MBytes   958 Mbits/sec  14620  
[  4]   4.00-5.00   sec   114 MBytes   958 Mbits/sec  14630  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-5.00   sec   571 MBytes   958 Mbits/sec  10.405 ms  72415/72673 (1e+02%)  
[  4] Sent 72673 datagrams

iperf Done.

-----------------------------------------------------------
Server listening on 3740
-----------------------------------------------------------
Accepted connection from 192.168.0.139, port 45461
[  5] local 192.168.0.70 port 3740 connected to 192.168.0.139 port 46162
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-1.07   sec   368 KBytes  2.83 Mbits/sec  80.399 ms  5086/5132 (99%)  
[  5]   1.07-2.01   sec   280 KBytes  2.44 Mbits/sec  42.903 ms  16726/16761 (1e+02%)  
[  5]   2.01-3.02   sec   408 KBytes  3.29 Mbits/sec  20.992 ms  15643/15694 (1e+02%)  
[  5]   3.02-4.03   sec   392 KBytes  3.17 Mbits/sec  25.503 ms  15140/15189 (1e+02%)  
[  5]   4.03-5.02   sec   488 KBytes  4.07 Mbits/sec  11.793 ms  17642/17703 (1e+02%)  
[  5]   5.02-5.24   sec   128 KBytes  4.69 Mbits/sec  10.405 ms  2178/2194 (99%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-5.24   sec   571 MBytes   915 Mbits/sec  10.405 ms  72415/72673 (1e+02%)  


========= UDP reverse ($extra = -u -R) ============
Connecting to host sheevaplug, port 3740
Reverse mode, remote host sheevaplug is sending
[  4] local 192.168.0.139 port 48931 connected to 192.168.0.70 port 3740
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-1.00   sec  55.0 MBytes   462 Mbits/sec  0.083 ms  320/7366 (4.3%)  
[  4]   1.00-2.00   sec  57.5 MBytes   482 Mbits/sec  0.132 ms  0/7362 (0%)  
[  4]   2.00-3.00   sec  57.0 MBytes   478 Mbits/sec  0.079 ms  0/7295 (0%)  
[  4]   3.00-4.00   sec  58.1 MBytes   488 Mbits/sec  0.088 ms  0/7439 (0%)  
[  4]   4.00-5.00   sec  58.2 MBytes   488 Mbits/sec  0.073 ms  0/7446 (0%)  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  4]   0.00-5.00   sec   288 MBytes   484 Mbits/sec  0.079 ms  320/36920 (0.87%)  
[  4] Sent 36920 datagrams

iperf Done.

-----------------------------------------------------------
Server listening on 3740
-----------------------------------------------------------
Accepted connection from 192.168.0.139, port 45462
[  5] local 192.168.0.70 port 3740 connected to 192.168.0.139 port 48931
[ ID] Interval           Transfer     Bandwidth       Total Datagrams
[  5]   0.00-1.00   sec  55.7 MBytes   467 Mbits/sec  7130  
[  5]   1.00-2.00   sec  56.9 MBytes   477 Mbits/sec  7280  
[  5]   2.00-3.00   sec  57.0 MBytes   478 Mbits/sec  7290  
[  5]   3.00-4.00   sec  58.2 MBytes   488 Mbits/sec  7450  
[  5]   4.00-5.00   sec  58.1 MBytes   488 Mbits/sec  7440  
[  5]   5.00-5.05   sec  2.58 MBytes   483 Mbits/sec  330  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Jitter    Lost/Total Datagrams
[  5]   0.00-5.05   sec   288 MBytes   479 Mbits/sec  0.079 ms  320/36920 (0.87%)  


========= TCP ($extra = ) ====================
Connecting to host sheevaplug, port 3740
[  4] local 192.168.0.139 port 45464 connected to 192.168.0.70 port 3740
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  68.6 MBytes   575 Mbits/sec    0    361 KBytes       
[  4]   1.00-2.00   sec  79.6 MBytes   669 Mbits/sec    0    451 KBytes       
[  4]   2.00-3.00   sec  80.1 MBytes   672 Mbits/sec    0    451 KBytes       
[  4]   3.00-4.00   sec  80.8 MBytes   678 Mbits/sec    0    451 KBytes       
[  4]   4.00-5.00   sec  81.0 MBytes   678 Mbits/sec    0    451 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.00   sec   390 MBytes   654 Mbits/sec    0             sender
[  4]   0.00-5.00   sec   387 MBytes   649 Mbits/sec                  receiver

iperf Done.

-----------------------------------------------------------
Server listening on 3740
-----------------------------------------------------------
Accepted connection from 192.168.0.139, port 45463
[  5] local 192.168.0.70 port 3740 connected to 192.168.0.139 port 45464
[ ID] Interval           Transfer     Bandwidth
[  5]   0.00-1.00   sec  67.0 MBytes   561 Mbits/sec                  
[  5]   1.00-2.00   sec  77.0 MBytes   646 Mbits/sec                  
[  5]   2.00-3.00   sec  80.1 MBytes   671 Mbits/sec                  
[  5]   3.00-4.00   sec  80.8 MBytes   678 Mbits/sec                  
[  5]   4.00-5.00   sec  80.9 MBytes   679 Mbits/sec                  
[  5]   5.00-5.02   sec  1.00 MBytes   595 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-5.02   sec   390 MBytes   652 Mbits/sec    0             sender
[  5]   0.00-5.02   sec   387 MBytes   647 Mbits/sec                  receiver


========= TCP reverse ($extra = -R) ============
Connecting to host sheevaplug, port 3740
Reverse mode, remote host sheevaplug is sending
[  4] local 192.168.0.139 port 45466 connected to 192.168.0.70 port 3740
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  72.2 MBytes   605 Mbits/sec                  
[  4]   1.00-2.00   sec  72.0 MBytes   604 Mbits/sec                  
[  4]   2.00-3.00   sec  72.2 MBytes   606 Mbits/sec                  
[  4]   3.00-4.00   sec  73.8 MBytes   619 Mbits/sec                  
[  4]   4.00-5.00   sec  73.5 MBytes   616 Mbits/sec                  
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-5.00   sec   364 MBytes   611 Mbits/sec    0             sender
[  4]   0.00-5.00   sec   364 MBytes   611 Mbits/sec                  receiver

iperf Done.

-----------------------------------------------------------
Server listening on 3740
-----------------------------------------------------------
Accepted connection from 192.168.0.139, port 45465
[  5] local 192.168.0.70 port 3740 connected to 192.168.0.139 port 45466
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  5]   0.00-1.02   sec  73.0 MBytes   599 Mbits/sec    0    257 KBytes       
[  5]   1.02-2.03   sec  72.5 MBytes   604 Mbits/sec    0    257 KBytes       
[  5]   2.03-3.03   sec  72.5 MBytes   606 Mbits/sec    0    257 KBytes       
[  5]   3.03-4.03   sec  73.8 MBytes   619 Mbits/sec    0    257 KBytes       
[  5]   4.03-5.02   sec  72.5 MBytes   616 Mbits/sec    0    257 KBytes       
[  5]   5.02-5.02   sec  0.00 Bytes  0.00 bits/sec    0    257 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-5.02   sec   364 MBytes   609 Mbits/sec    0             sender
[  5]   0.00-5.02   sec   364 MBytes   609 Mbits/sec                  receiver

* Re: [PATCH v4] can: Convert to runtime_pm
From: Sören Brinkmann @ 2014-12-23 22:43 UTC (permalink / raw)
  To: Kedareswara rao Appana
  Cc: wg, mkl, michal.simek, grant.likely, linux-can, netdev,
	linux-kernel, Kedareswara rao Appana
In-Reply-To: <58f37b6fd9104ce185c413c473fe047b@BY2FFO11FD050.protection.gbl>

On Tue, 2014-12-23 at 05:55PM +0530, Kedareswara rao Appana wrote:
> Instead of enabling/disabling clocks at several locations in the driver,
> use the runtime_pm framework. This consolidates the actions for
> runtime PM in the appropriate callbacks and makes the driver more
> readable and maintainable.
> 
> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> Signed-off-by: Kedareswara rao Appana <appanad@xilinx.com>
> ---
> Changes for v4:
>  - Updated with the review comments.
> Changes for v3:
>   - Converted the driver to use runtime_pm.
> Changes for v2:
>   - Removed the struct platform_device* from suspend/resume
>     as suggested by Lothar.
> 
>  drivers/net/can/xilinx_can.c |  123 +++++++++++++++++++++++++-----------------
>  1 files changed, 74 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/net/can/xilinx_can.c b/drivers/net/can/xilinx_can.c
> index 6c67643..c71f683 100644
> --- a/drivers/net/can/xilinx_can.c
> +++ b/drivers/net/can/xilinx_can.c
> @@ -32,6 +32,7 @@
>  #include <linux/can/dev.h>
>  #include <linux/can/error.h>
>  #include <linux/can/led.h>
> +#include <linux/pm_runtime.h>
>  
>  #define DRIVER_NAME	"xilinx_can"
>  
> @@ -138,7 +139,7 @@ struct xcan_priv {
>  	u32 (*read_reg)(const struct xcan_priv *priv, enum xcan_reg reg);
>  	void (*write_reg)(const struct xcan_priv *priv, enum xcan_reg reg,
>  			u32 val);
> -	struct net_device *dev;
> +	struct device *dev;
>  	void __iomem *reg_base;
>  	unsigned long irq_flags;
>  	struct clk *bus_clk;
> @@ -842,6 +843,13 @@ static int xcan_open(struct net_device *ndev)
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	int ret;
>  
> +	ret = pm_runtime_get_sync(priv->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed\r(%d)\n\r",

Does this create the intended output? I haven't seen '\r' anywhere else.
Shouldn't this simply be:
	netdev_err(ndev, "%s: pm_runtime_get failed (%d)\n",

[...]
> @@ -934,27 +927,20 @@ static int xcan_get_berr_counter(const struct net_device *ndev,
>  	struct xcan_priv *priv = netdev_priv(ndev);
>  	int ret;
>  
> -	ret = clk_prepare_enable(priv->can_clk);
> -	if (ret)
> -		goto err;
> -
> -	ret = clk_prepare_enable(priv->bus_clk);
> -	if (ret)
> -		goto err_clk;
> +	ret = pm_runtime_get_sync(priv->dev);
> +	if (ret < 0) {
> +		netdev_err(ndev, "%s: pm_runtime_get failed\r(%d)\n\r",

ditto

	Sören
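
Note: a simplified userspace model of the usage counting that the
quoted patch relies on, not the kernel implementation. struct demo_dev
and the demo_* functions are hypothetical. The real pm_runtime_put()
may defer the suspend, which this sketch ignores. As in the kernel, the
"get" returns 1 when the device is already active, which is why the
patch tests for ret < 0 rather than for a nonzero return.

```c
#include <stdbool.h>

/* Hypothetical device state for the sketch. */
struct demo_dev {
	int usage_count;
	bool clocks_on;
	int resume_calls;
};

/* Stand-in for the driver's runtime resume callback: enable clocks. */
static int demo_runtime_resume(struct demo_dev *d)
{
	d->clocks_on = true;
	d->resume_calls++;
	return 0;
}

/* Stand-in for the driver's runtime suspend callback: disable clocks. */
static int demo_runtime_suspend(struct demo_dev *d)
{
	d->clocks_on = false;
	return 0;
}

/* Take a reference; resume only on the 0 -> 1 transition. */
static int demo_pm_get_sync(struct demo_dev *d)
{
	if (d->usage_count++ == 0)
		return demo_runtime_resume(d);
	return 1;	/* already active */
}

/* Drop a reference; suspend only on the 1 -> 0 transition. */
static void demo_pm_put(struct demo_dev *d)
{
	if (--d->usage_count == 0)
		demo_runtime_suspend(d);
}
```

Callers such as an open routine and a counter query can each pair a get
with a put without knowing whether the clocks are already running.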

* [PATCH] arm: sa1100: move irda header to linux/platform_data
From: Dmitry Eremin-Solenikov @ 2014-12-23 22:14 UTC (permalink / raw)
  To: Russell King, Samuel Ortiz; +Cc: linux-arm-kernel, netdev

In the end, the asm/mach/irda.h header is not used by anybody except
sa1100.  Move the header to the platform data include directory and
rename it to irda-sa11x0.h.

Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
---
 arch/arm/mach-sa1100/assabet.c                                          | 2 +-
 arch/arm/mach-sa1100/collie.c                                           | 2 +-
 arch/arm/mach-sa1100/h3100.c                                            | 2 +-
 arch/arm/mach-sa1100/h3600.c                                            | 2 +-
 drivers/net/irda/sa1100_ir.c                                            | 2 +-
 .../asm/mach/irda.h => include/linux/platform_data/irda-sa11x0.h        | 0
 6 files changed, 5 insertions(+), 5 deletions(-)
 rename arch/arm/include/asm/mach/irda.h => include/linux/platform_data/irda-sa11x0.h (100%)

diff --git a/arch/arm/mach-sa1100/assabet.c b/arch/arm/mach-sa1100/assabet.c
index 7dd894e..d28ecb9 100644
--- a/arch/arm/mach-sa1100/assabet.c
+++ b/arch/arm/mach-sa1100/assabet.c
@@ -37,7 +37,7 @@
 
 #include <asm/mach/arch.h>
 #include <asm/mach/flash.h>
-#include <asm/mach/irda.h>
+#include <linux/platform_data/irda-sa11x0.h>
 #include <asm/mach/map.h>
 #include <mach/assabet.h>
 #include <linux/platform_data/mfd-mcp-sa11x0.h>
diff --git a/arch/arm/mach-sa1100/collie.c b/arch/arm/mach-sa1100/collie.c
index 108939f..0a816e2 100644
--- a/arch/arm/mach-sa1100/collie.c
+++ b/arch/arm/mach-sa1100/collie.c
@@ -43,7 +43,7 @@
 #include <asm/mach/arch.h>
 #include <asm/mach/flash.h>
 #include <asm/mach/map.h>
-#include <asm/mach/irda.h>
+#include <linux/platform_data/irda-sa11x0.h>
 
 #include <asm/hardware/scoop.h>
 #include <asm/mach/sharpsl_param.h>
diff --git a/arch/arm/mach-sa1100/h3100.c b/arch/arm/mach-sa1100/h3100.c
index 3c43219..c6b4120 100644
--- a/arch/arm/mach-sa1100/h3100.c
+++ b/arch/arm/mach-sa1100/h3100.c
@@ -18,7 +18,7 @@
 
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
-#include <asm/mach/irda.h>
+#include <linux/platform_data/irda-sa11x0.h>
 
 #include <mach/h3xxx.h>
 #include <mach/irqs.h>
diff --git a/arch/arm/mach-sa1100/h3600.c b/arch/arm/mach-sa1100/h3600.c
index 5be54c2..118338e 100644
--- a/arch/arm/mach-sa1100/h3600.c
+++ b/arch/arm/mach-sa1100/h3600.c
@@ -18,7 +18,7 @@
 
 #include <asm/mach-types.h>
 #include <asm/mach/arch.h>
-#include <asm/mach/irda.h>
+#include <linux/platform_data/irda-sa11x0.h>
 
 #include <mach/h3xxx.h>
 #include <mach/irqs.h>
diff --git a/drivers/net/irda/sa1100_ir.c b/drivers/net/irda/sa1100_ir.c
index 42fde9e..dd14722 100644
--- a/drivers/net/irda/sa1100_ir.c
+++ b/drivers/net/irda/sa1100_ir.c
@@ -38,7 +38,7 @@
 #include <net/irda/irda_device.h>
 
 #include <mach/hardware.h>
-#include <asm/mach/irda.h>
+#include <linux/platform_data/irda-sa11x0.h>
 
 static int power_level = 3;
 static int tx_lpm;
diff --git a/arch/arm/include/asm/mach/irda.h b/include/linux/platform_data/irda-sa11x0.h
similarity index 100%
rename from arch/arm/include/asm/mach/irda.h
rename to include/linux/platform_data/irda-sa11x0.h
-- 
2.1.3
