Netdev List


* Re: [PATCH net-next] bonding: add bond_tx_drop() helper
From: Nikolay Aleksandrov @ 2014-10-31 19:25 UTC (permalink / raw)
  To: Eric Dumazet, David Miller
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	Mahesh Bandewar
In-Reply-To: <1414781274.27538.32.camel@edumazet-glaptop2.roam.corp.google.com>

On 10/31/2014 07:47 PM, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Because bonding stats are usually sum of slave stats, it was
> not easy to account for tx drops at bonding layer.
> 
> We can use dev->tx_dropped for this, as this counter is later
> added to the device stats (in dev_get_stats())
> 
> This extends the idea we had in commit ee6377147409a ("bonding: Simplify
> the xmit function for modes that use xmit_hash") for bond_3ad_xor_xmit()
> to other bonding modes.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Mahesh Bandewar <maheshb@google.com>
> ---
>  drivers/net/bonding/bond_alb.c  |    2 +-
>  drivers/net/bonding/bond_main.c |   15 +++++++--------
>  drivers/net/bonding/bonding.h   |    6 ++++++
>  3 files changed, 14 insertions(+), 9 deletions(-)
> 

Nice,

Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>

^ permalink raw reply

* Re: [PATCH net-next v4 0/4] netns: allow to identify peer netns
From: Eric W. Biederman @ 2014-10-31 19:14 UTC (permalink / raw)
  To: nicolas.dichtel-pdR9zngts4EAvxtiuMwx3w
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, luto-kltTT9wpgjJwATOyAt5JVQ,
	stephen-OTpzqLSitTUnbdJkjeBofR2eb7JE58TQ,
	cwang-xCSkyg8dI+0RB7SZvlqPiA, linux-api-u79uwXL29TY76Z2rM5mHXA,
	akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q
In-Reply-To: <54535B00.5090708-pdR9zngts4EAvxtiuMwx3w@public.gmane.org>

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> Le 30/10/2014 19:41, Eric W. Biederman a écrit :
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>>
>>> The goal of this series is to be able to multicast netlink messages with an
>>> attribute that identifies a peer netns.
>>> This is needed by the userland to interpret some information contained in
>>> netlink messages (like IFLA_LINK value, but also some other attributes in case
>>> of x-netns netdevice (see also
>>> http://thread.gmane.org/gmane.linux.network/315933/focus=316064 and
>>> http://thread.gmane.org/gmane.linux.kernel.containers/28301/focus=4239)).
>>>
>>> Ids of peer netns are set by userland via new genl messages. These ids are
>>> stored per netns and are local (ie only valid in the netns where they are set).
>>> To avoid allocating an int for each peer netns, I use idr_for_each() to retrieve
>>> the id of a peer netns. Note that it will be possible to add a table (struct net
>>> -> id) later to optimize this lookup if needed.
>>>
>>> Patch 1/4 introduces the netlink API mechanism to set and get these ids.
>>> Patch 2/4 and 3/4 implement an example of how to use these ids in rtnetlink
>>> messages. And patch 4/4 shows that the netlink messages can be symmetric between
>>> a GET and a SET.
>>>
>>> iproute2 patches are available, I can send them on demand.
>>
>> A quick reply.  I think this patchset is in the right general direction.
>> There are some oddball details that seem odd/awkward to me such as using
>> genetlink instead of rtnetlink to get and set the ids, and not having
>> ids if they are not set (that feels like a maintenance/usability challenge).
> No problem to use rtnetlink, in fact, I hesitated.
>
> For the second point, I'm not sure I follow you: how to have an id, which will
> not break migration, without asking the user to set it?

We have that situation with ifindex already.  Basically the thought is
to allow an id to be set, but also allow an id to be auto-generated if
we use a namespace without an id being set.

My gut says if we can figure that out we will have an interface with
much more utility.

> Note that if the user does not provide an id, you still have a magic value to
> say "it's a peer netns but we don't know which one".

That is certainly an improvement in clarity over where we are today.

>> I would like to give your patches a deep review, but I won't be able to
>> do that for a couple of weeks.  I am deep in the process of moving,
>> and will be mostly offline until about Nov 11th.
>
> No problem, I will wait.
> It would be great to get a final version for 3.19 ;-)

Eric
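
The "set an id, or auto-generate one on first use" idea can be sketched in user space as below. This is only an illustration under stated assumptions: `struct peer_ids`, `peer_id_set()` and `peer_id_get_or_alloc()` are hypothetical names, the fixed-size table stands in for whatever per-netns storage the kernel would use, and nothing here is taken from the patch series itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PEERS 8
#define ID_UNSET  (-1)

/* Hypothetical per-namespace table: the slot index is the id. */
struct peer_ids {
	const void *peer[MAX_PEERS];
};

/* Bind a peer to an id chosen by userland; ID_UNSET if invalid or taken. */
static int peer_id_set(struct peer_ids *t, const void *peer, int id)
{
	assert(t != NULL && peer != NULL);
	if (id < 0 || id >= MAX_PEERS || t->peer[id] != NULL)
		return ID_UNSET;
	t->peer[id] = peer;
	return id;
}

/* Return the peer's id, allocating the lowest free id if none is set yet. */
static int peer_id_get_or_alloc(struct peer_ids *t, const void *peer)
{
	int free_slot = ID_UNSET;

	assert(t != NULL && peer != NULL);
	for (int i = 0; i < MAX_PEERS; i++) {
		if (t->peer[i] == peer)
			return i;
		if (t->peer[i] == NULL && free_slot == ID_UNSET)
			free_slot = i;
	}
	if (free_slot != ID_UNSET)
		t->peer[free_slot] = peer;
	return free_slot;	/* ID_UNSET when the table is full */
}
```

An id set explicitly is returned unchanged by later lookups, which is the property migration would rely on; a peer that was never given an id gets one on first lookup instead of a "don't know" value.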

^ permalink raw reply

* [PATCH net-next] bonding: add bond_tx_drop() helper
From: Eric Dumazet @ 2014-10-31 18:47 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	Mahesh Bandewar

From: Eric Dumazet <edumazet@google.com>

Because bonding stats are usually sum of slave stats, it was
not easy to account for tx drops at bonding layer.

We can use dev->tx_dropped for this, as this counter is later
added to the device stats (in dev_get_stats())

This extends the idea we had in commit ee6377147409a ("bonding: Simplify
the xmit function for modes that use xmit_hash") for bond_3ad_xor_xmit()
to other bonding modes.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
---
 drivers/net/bonding/bond_alb.c  |    2 +-
 drivers/net/bonding/bond_main.c |   15 +++++++--------
 drivers/net/bonding/bonding.h   |    6 ++++++
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index d2eadab787c55d8fc0f361755409a880d64abfb3..baa58e79256a8b804e6ab683af6665f0730b86de 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1326,7 +1326,7 @@ static int bond_do_alb_xmit(struct sk_buff *skb, struct bonding *bond,
 	}
 
 	/* no suitable interface, frame not sent */
-	dev_kfree_skb_any(skb);
+	bond_tx_drop(bond->dev, skb);
 out:
 	return NETDEV_TX_OK;
 }
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index c9ac06cfe6b7b3a8f62568b70a6ad6d7ca9b44d0..c7520082fb0d7bfbc34ac00b4f4186a30197ad74 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3522,7 +3522,7 @@ static void bond_xmit_slave_id(struct bonding *bond, struct sk_buff *skb, int sl
 		}
 	}
 	/* no slave that can tx has been found */
-	dev_kfree_skb_any(skb);
+	bond_tx_drop(bond->dev, skb);
 }
 
 /**
@@ -3584,7 +3584,7 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
 			slave_id = bond_rr_gen_slave_id(bond);
 			bond_xmit_slave_id(bond, skb, slave_id % slave_cnt);
 		} else {
-			dev_kfree_skb_any(skb);
+			bond_tx_drop(bond_dev, skb);
 		}
 	}
 
@@ -3603,7 +3603,7 @@ static int bond_xmit_activebackup(struct sk_buff *skb, struct net_device *bond_d
 	if (slave)
 		bond_dev_queue_xmit(bond, skb, slave->dev);
 	else
-		dev_kfree_skb_any(skb);
+		bond_tx_drop(bond_dev, skb);
 
 	return NETDEV_TX_OK;
 }
@@ -3747,8 +3747,7 @@ int bond_3ad_xor_xmit(struct sk_buff *skb, struct net_device *dev)
 		slave = slaves->arr[bond_xmit_hash(bond, skb) % count];
 		bond_dev_queue_xmit(bond, skb, slave->dev);
 	} else {
-		dev_kfree_skb_any(skb);
-		atomic_long_inc(&dev->tx_dropped);
+		bond_tx_drop(dev, skb);
 	}
 
 	return NETDEV_TX_OK;
@@ -3778,7 +3777,7 @@ static int bond_xmit_broadcast(struct sk_buff *skb, struct net_device *bond_dev)
 	if (slave && bond_slave_is_up(slave) && slave->link == BOND_LINK_UP)
 		bond_dev_queue_xmit(bond, skb, slave->dev);
 	else
-		dev_kfree_skb_any(skb);
+		bond_tx_drop(bond_dev, skb);
 
 	return NETDEV_TX_OK;
 }
@@ -3858,7 +3857,7 @@ static netdev_tx_t __bond_start_xmit(struct sk_buff *skb, struct net_device *dev
 		/* Should never happen, mode already checked */
 		netdev_err(dev, "Unknown bonding mode %d\n", BOND_MODE(bond));
 		WARN_ON_ONCE(1);
-		dev_kfree_skb_any(skb);
+		bond_tx_drop(dev, skb);
 		return NETDEV_TX_OK;
 	}
 }
@@ -3878,7 +3877,7 @@ static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	if (bond_has_slaves(bond))
 		ret = __bond_start_xmit(skb, dev);
 	else
-		dev_kfree_skb_any(skb);
+		bond_tx_drop(dev, skb);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 10920f0686e2f0222203359751581d1b728c6617..bfb0b51c081a27fbbbe64deda058f70860aac49d 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -645,4 +645,10 @@ extern struct bond_parm_tbl ad_select_tbl[];
 /* exported from bond_netlink.c */
 extern struct rtnl_link_ops bond_link_ops;
 
+static inline void bond_tx_drop(struct net_device *dev, struct sk_buff *skb)
+{
+	atomic_long_inc(&dev->tx_dropped);
+	dev_kfree_skb_any(skb);
+}
+
 #endif /* _LINUX_BONDING_H */
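
The accounting idea in this patch can be modelled in user space. The sketch below is an illustration only, not kernel code: `struct model_dev`, `model_tx_drop()` and `model_get_stats()` are hypothetical stand-ins for `struct net_device`, `bond_tx_drop()` and `dev_get_stats()`, and C11 atomics stand in for `atomic_long_inc()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the parts of struct net_device used here. */
struct model_dev {
	atomic_long tx_dropped;		/* drops counted at the bonding layer */
	long slave_tx_dropped_sum;	/* drops already summed from the slaves */
};

/* Count the drop, then free the buffer, in the same order as the helper. */
static inline void model_tx_drop(struct model_dev *dev, void *skb)
{
	assert(dev != NULL);
	atomic_fetch_add(&dev->tx_dropped, 1);
	free(skb);
}

/* Reported drops = sum over slaves + the device-level counter. */
static inline long model_get_stats(struct model_dev *dev)
{
	assert(dev != NULL);
	return dev->slave_tx_dropped_sum + atomic_load(&dev->tx_dropped);
}
```

Keeping the counter on the bonding device itself means a drop that never reached any slave still shows up in the totals.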

^ permalink raw reply related

* [PATCH 2/2] staging: lustre: lnet: lnet: trailing statements should be on next line
From: Balavasu @ 2014-10-31 18:20 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: greg, andreas.dilger, oleg.drokin

This patch fixes the checkpatch.pl issue
Error: trailing statements should be on next line

Signed-off-by: Balavasu <kp.balavasu@gmail.com>
---
 drivers/staging/lustre/lnet/lnet/router.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index cdeb246..0f569a0 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -1670,13 +1670,16 @@ lnet_get_tunables (void)
 	char *s;
 
 	s = getenv("LNET_ROUTER_PING_TIMEOUT");
-	if (s != NULL) router_ping_timeout = atoi(s);
+	if (s != NULL)
+		router_ping_timeout = atoi(s);
 
 	s = getenv("LNET_LIVE_ROUTER_CHECK_INTERVAL");
-	if (s != NULL) live_router_check_interval = atoi(s);
+	if (s != NULL)
+		live_router_check_interval = atoi(s);
 
 	s = getenv("LNET_DEAD_ROUTER_CHECK_INTERVAL");
-	if (s != NULL) dead_router_check_interval = atoi(s);
+	if (s != NULL)
+		dead_router_check_interval = atoi(s);
 
 	/* This replaces old lnd_notify mechanism */
 	check_routers_before_use = 1;
-- 
1.9.1

^ permalink raw reply related

* [PATCH 1/2] staging: lustre: lnet: lnet: do not initialise statics to 0 or NULL
From: Balavasu @ 2014-10-31 18:18 UTC (permalink / raw)
  To: netdev, linux-kernel; +Cc: greg, andreas.dilger, oleg.drokin

This patch fixes the checkpatch.pl issue
Error: do not initialise statics to 0 or NULL for time

Signed-off-by: Balavasu <kp.balavasu@gmail.com>
---
 drivers/staging/lustre/lnet/lnet/router.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lnet/lnet/router.c b/drivers/staging/lustre/lnet/lnet/router.c
index b5b8fb5..cdeb246 100644
--- a/drivers/staging/lustre/lnet/lnet/router.c
+++ b/drivers/staging/lustre/lnet/lnet/router.c
@@ -46,7 +46,7 @@ MODULE_PARM_DESC(small_router_buffers, "# of small (1 page) messages to buffer i
 static int large_router_buffers;
 module_param(large_router_buffers, int, 0444);
 MODULE_PARM_DESC(large_router_buffers, "# of large messages to buffer in the router");
-static int peer_buffer_credits = 0;
+static int peer_buffer_credits;
 module_param(peer_buffer_credits, int, 0444);
 MODULE_PARM_DESC(peer_buffer_credits, "# router buffer credits per peer");
 
@@ -80,7 +80,7 @@ lnet_peer_buffer_credits(lnet_ni_t *ni)
 
 #endif
 
-static int check_routers_before_use = 0;
+static int check_routers_before_use;
 module_param(check_routers_before_use, int, 0444);
 MODULE_PARM_DESC(check_routers_before_use, "Assume routers are down and ping them before use");
 
@@ -245,7 +245,7 @@ lnet_find_net_locked (__u32 net)
 
 static void lnet_shuffle_seed(void)
 {
-	static int seeded = 0;
+	static int seeded;
 	int lnd_type, seed[2];
 	struct timeval tv;
 	lnet_ni_t *ni;
@@ -1584,8 +1584,8 @@ lnet_notify (lnet_ni_t *ni, lnet_nid_t nid, int alive, unsigned long when)
 void
 lnet_router_checker (void)
 {
-	static time_t last = 0;
-	static int    running = 0;
+	static time_t last;
+	static int    running;
 
 	time_t	    now = get_seconds();
 	int	       interval = now - last;
-- 
1.9.1

^ permalink raw reply related

* vxlan: error handling and error messages during vxlan_init
From: Marcelo Ricardo Leitner @ 2014-10-31 18:11 UTC (permalink / raw)
  To: stephen, pshelar; +Cc: netdev

Hi Stephen, Pravin,

Before Stephen's commit 1c51a9159ddefa5119724a4c7da3fd3ef44b68d5 (vxlan: fix
race caused by dropping rtnl_unlock), if we failed to create the UDP socket
for any reason, the user would be informed right away. After that
commit, the only option is to delete the vxlan and create it again, right?
Simply calling ip link set vxlanX up is not enough:

/* Setup stats when device is created */
static int vxlan_init(struct net_device *dev)
{
     struct vxlan_dev *vxlan = netdev_priv(dev);
     struct vxlan_net *vn = net_generic(vxlan->net, vxlan_net_id);
     struct vxlan_sock *vs;

     dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
     if (!dev->tstats)
         return -ENOMEM;

     spin_lock(&vn->sock_lock);
     vs = vxlan_find_sock(vxlan->net,
                          (vxlan->flags & VXLAN_F_IPV6) ? AF_INET6 : AF_INET,
                          vxlan->dst_port);
     if (vs) {
         /* If we have a socket with same port already, reuse it */
         atomic_inc(&vs->refcnt);
         vxlan_vs_add_dev(vs, vxlan);
     } else {
         /* otherwise make new socket outside of RTNL */
         dev_hold(dev);
         queue_work(vxlan_wq, &vxlan->sock_work);       <--
     }
     spin_unlock(&vn->sock_lock);

     return 0;
}

Error is just ignored:

/* Scheduled at device creation to bind to a socket */
static void vxlan_sock_work(struct work_struct *work)
{
     struct vxlan_dev *vxlan = container_of(work, struct vxlan_dev, sock_work);
     struct net *net = vxlan->net;
     struct vxlan_net *vn = net_generic(net, vxlan_net_id);
     __be16 port = vxlan->dst_port;
     struct vxlan_sock *nvs;

     nvs = vxlan_sock_add(net, port, vxlan_rcv, NULL, false, vxlan->flags);
     spin_lock(&vn->sock_lock);
     if (!IS_ERR(nvs))
         vxlan_vs_add_dev(nvs, vxlan);
     spin_unlock(&vn->sock_lock);

     dev_put(vxlan->dev);
}

And we can't bring it up:

/* Start ageing timer and join group when device is brought up */
static int vxlan_open(struct net_device *dev)
{
         struct vxlan_dev *vxlan = netdev_priv(dev);
         struct vxlan_sock *vs = vxlan->vn_sock;

         /* socket hasn't been created */
         if (!vs)
                 return -ENOTCONN;      <---

And making this retry the initialization doesn't seem like a good idea.
Can we improve that error handling, somehow? As far as I could track, VXLAN is 
the only one that currently defers some initialization code during ndo_init.

Together with that, that initial commit did put some good error messages on 
this part of the code, like:

+       nvs = vxlan_socket_create(net, port);
+       if (IS_ERR(nvs)) {
+               netdev_err(vxlan->dev, "Can not create UDP socket, %ld\n",
+                          PTR_ERR(nvs));
+               goto out;
+       }

But Pravin's 9c2e24e16fbccf6cc1102442acc4a629f79615a7 commit removed them all. 
The only error message that is currently available is:

[root@localhost ~]# ip link set vxlan7 up
RTNETLINK answers: Transport endpoint is not connected

Nothing else is logged, in dmesg or anywhere else. (The actual error in this case was
that it failed to create the UDP socket because the port was already in use.)

May we put them back? :) doesn't seem it would hurt OVS..

Thanks,
Marcelo
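
One possible shape for better reporting, sketched in user space: the deferred work saves the failure reason and the open path returns it instead of a generic -ENOTCONN. This is an assumption-laden illustration, not something proposed in the thread; `struct model_vxlan`, `model_sock_work()` and `model_open()` are hypothetical names, and the real driver has no `sock_err` field.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of the device state discussed above. */
struct model_vxlan {
	void *sock;	/* NULL until the deferred work binds a socket */
	int sock_err;	/* 0, or the negative errno of the failed attempt */
};

/* Deferred work: record the failure reason instead of discarding it. */
static void model_sock_work(struct model_vxlan *v, void *sock, int err)
{
	assert(v != NULL);
	if (err) {
		v->sock = NULL;
		v->sock_err = err;
		return;
	}
	v->sock = sock;
	v->sock_err = 0;
}

/* Open: report the saved cause when there is one, else -ENOTCONN. */
static int model_open(const struct model_vxlan *v)
{
	assert(v != NULL);
	if (v->sock)
		return 0;
	return v->sock_err ? v->sock_err : -ENOTCONN;
}
```

With this shape, the "port already in use" case would surface as -EADDRINUSE at ip link set up time rather than as "Transport endpoint is not connected".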

^ permalink raw reply

* Re: [PATCH net] r8152: stop submitting intr for -EPROTO
From: David Miller @ 2014-10-31 17:56 UTC (permalink / raw)
  To: hayeswang; +Cc: netdev, nic_swsd, linux-kernel, linux-usb
In-Reply-To: <1394712342-15778-74-Taiwan-albertk@realtek.com>

From: Hayes Wang <hayeswang@realtek.com>
Date: Fri, 31 Oct 2014 13:35:57 +0800

> For Renesas USB 3.0 host controller, when unplugging the usb hub which
> has the RTL8153 plugged, the driver would get -EPROTO for interrupt
> transfer. There is high probability to get the information of "HC died;
> cleaning up", if the driver continues to submit the interrupt transfer
> before the disconnect() is called.
 ...
> Signed-off-by: Hayes Wang <hayeswang@realtek.com>

Applied, thanks.

^ permalink raw reply

* Re: DMA-API warning from sunhme - unchecked dma_map_single error
From: David Miller @ 2014-10-31 17:43 UTC (permalink / raw)
  To: mroos; +Cc: netdev, sparclinux
In-Reply-To: <alpine.SOC.1.00.1311291036050.3807@math.ut.ee>

From: Meelis Roos <mroos@linux.ee>
Date: Fri, 29 Nov 2013 10:40:40 +0200 (EET)

> It seems to be correct warning - dma_map_single is used unchecked in 
> sunhme.c. I can try fixing it - the error handling will be the only 
> problem. Is it considered worthwhile?

Can you test this patch?

====================
sunhme: Add DMA mapping error checks.

Reported-by: Meelis Roos <mroos@linux.ee>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 drivers/net/ethernet/sun/sunhme.c | 62 +++++++++++++++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/sun/sunhme.c b/drivers/net/ethernet/sun/sunhme.c
index 72c8525..9c01480 100644
--- a/drivers/net/ethernet/sun/sunhme.c
+++ b/drivers/net/ethernet/sun/sunhme.c
@@ -1262,6 +1262,7 @@ static void happy_meal_init_rings(struct happy_meal *hp)
 	HMD(("init rxring, "));
 	for (i = 0; i < RX_RING_SIZE; i++) {
 		struct sk_buff *skb;
+		u32 mapping;
 
 		skb = happy_meal_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
 		if (!skb) {
@@ -1272,10 +1273,16 @@ static void happy_meal_init_rings(struct happy_meal *hp)
 
 		/* Because we reserve afterwards. */
 		skb_put(skb, (ETH_FRAME_LEN + RX_OFFSET + 4));
+		mapping = dma_map_single(hp->dma_dev, skb->data, RX_BUF_ALLOC_SIZE,
+					 DMA_FROM_DEVICE);
+		if (dma_mapping_error(hp->dma_dev, mapping)) {
+			dev_kfree_skb_any(skb);
+			hme_write_rxd(hp, &hb->happy_meal_rxd[i], 0, 0);
+			continue;
+		}
 		hme_write_rxd(hp, &hb->happy_meal_rxd[i],
 			      (RXFLAG_OWN | ((RX_BUF_ALLOC_SIZE - RX_OFFSET) << 16)),
-			      dma_map_single(hp->dma_dev, skb->data, RX_BUF_ALLOC_SIZE,
-					     DMA_FROM_DEVICE));
+			      mapping);
 		skb_reserve(skb, RX_OFFSET);
 	}
 
@@ -2020,6 +2027,7 @@ static void happy_meal_rx(struct happy_meal *hp, struct net_device *dev)
 		skb = hp->rx_skbs[elem];
 		if (len > RX_COPY_THRESHOLD) {
 			struct sk_buff *new_skb;
+			u32 mapping;
 
 			/* Now refill the entry, if we can. */
 			new_skb = happy_meal_alloc_skb(RX_BUF_ALLOC_SIZE, GFP_ATOMIC);
@@ -2027,13 +2035,21 @@ static void happy_meal_rx(struct happy_meal *hp, struct net_device *dev)
 				drops++;
 				goto drop_it;
 			}
+			skb_put(new_skb, (ETH_FRAME_LEN + RX_OFFSET + 4));
+			mapping = dma_map_single(hp->dma_dev, new_skb->data,
+						 RX_BUF_ALLOC_SIZE,
+						 DMA_FROM_DEVICE);
+			if (unlikely(dma_mapping_error(hp->dma_dev, mapping))) {
+				dev_kfree_skb_any(new_skb);
+				drops++;
+				goto drop_it;
+			}
+
 			dma_unmap_single(hp->dma_dev, dma_addr, RX_BUF_ALLOC_SIZE, DMA_FROM_DEVICE);
 			hp->rx_skbs[elem] = new_skb;
-			skb_put(new_skb, (ETH_FRAME_LEN + RX_OFFSET + 4));
 			hme_write_rxd(hp, this,
 				      (RXFLAG_OWN|((RX_BUF_ALLOC_SIZE-RX_OFFSET)<<16)),
-				      dma_map_single(hp->dma_dev, new_skb->data, RX_BUF_ALLOC_SIZE,
-						     DMA_FROM_DEVICE));
+				      mapping);
 			skb_reserve(new_skb, RX_OFFSET);
 
 			/* Trim the original skb for the netif. */
@@ -2248,6 +2264,25 @@ static void happy_meal_tx_timeout(struct net_device *dev)
 	netif_wake_queue(dev);
 }
 
+static void unmap_partial_tx_skb(struct happy_meal *hp, u32 first_mapping,
+				 u32 first_len, u32 first_entry, u32 entry)
+{
+	struct happy_meal_txd *txbase = &hp->happy_block->happy_meal_txd[0];
+
+	dma_unmap_single(hp->dma_dev, first_mapping, first_len, DMA_TO_DEVICE);
+
+	first_entry = NEXT_TX(first_entry);
+	while (first_entry != entry) {
+		struct happy_meal_txd *this = &txbase[first_entry];
+		u32 addr, len;
+
+		addr = hme_read_desc32(hp, &this->tx_addr);
+		len = hme_read_desc32(hp, &this->tx_flags);
+		len &= TXFLAG_SIZE;
+		dma_unmap_page(hp->dma_dev, addr, len, DMA_TO_DEVICE);
+	}
+}
+
 static netdev_tx_t happy_meal_start_xmit(struct sk_buff *skb,
 					 struct net_device *dev)
 {
@@ -2284,6 +2319,8 @@ static netdev_tx_t happy_meal_start_xmit(struct sk_buff *skb,
 
 		len = skb->len;
 		mapping = dma_map_single(hp->dma_dev, skb->data, len, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(hp->dma_dev, mapping)))
+			goto out_dma_error;
 		tx_flags |= (TXFLAG_SOP | TXFLAG_EOP);
 		hme_write_txd(hp, &hp->happy_block->happy_meal_txd[entry],
 			      (tx_flags | (len & TXFLAG_SIZE)),
@@ -2299,6 +2336,8 @@ static netdev_tx_t happy_meal_start_xmit(struct sk_buff *skb,
 		first_len = skb_headlen(skb);
 		first_mapping = dma_map_single(hp->dma_dev, skb->data, first_len,
 					       DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(hp->dma_dev, first_mapping)))
+			goto out_dma_error;
 		entry = NEXT_TX(entry);
 
 		for (frag = 0; frag < skb_shinfo(skb)->nr_frags; frag++) {
@@ -2308,6 +2347,11 @@ static netdev_tx_t happy_meal_start_xmit(struct sk_buff *skb,
 			len = skb_frag_size(this_frag);
 			mapping = skb_frag_dma_map(hp->dma_dev, this_frag,
 						   0, len, DMA_TO_DEVICE);
+			if (unlikely(dma_mapping_error(hp->dma_dev, mapping))) {
+				unmap_partial_tx_skb(hp, first_mapping, first_len,
+						     first_entry, entry);
+				goto out_dma_error;
+			}
 			this_txflags = tx_flags;
 			if (frag == skb_shinfo(skb)->nr_frags - 1)
 				this_txflags |= TXFLAG_EOP;
@@ -2333,6 +2377,14 @@ static netdev_tx_t happy_meal_start_xmit(struct sk_buff *skb,
 
 	tx_add_log(hp, TXLOG_ACTION_TXMIT, 0);
 	return NETDEV_TX_OK;
+
+out_dma_error:
+	hp->tx_skbs[hp->tx_new] = NULL;
+	spin_unlock_irq(&hp->happy_lock);
+
+	dev_kfree_skb_any(skb);
+	dev->stats.tx_dropped++;
+	return NETDEV_TX_OK;
 }
 
 static struct net_device_stats *happy_meal_get_stats(struct net_device *dev)
-- 
1.9.3
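
The partial-unmap helper has to visit every TX descriptor after first_entry up to, but not including, entry. A user-space sketch of such a ring walk is below. It assumes a power-of-two ring and a NEXT_TX() of the usual mask form; `walk_partial_tx()` and the visit callback are hypothetical. The index is advanced inside the loop, which the walk needs in order to terminate.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 32u	/* assumed power-of-two ring size */
#define NEXT_TX(n)   (((n) + 1u) & (TX_RING_SIZE - 1u))

/*
 * Visit every descriptor after first_entry up to, but not including, entry,
 * and return how many were visited. Wrap-around is handled by NEXT_TX().
 */
static unsigned int walk_partial_tx(unsigned int first_entry, unsigned int entry,
				    void (*visit)(unsigned int idx))
{
	unsigned int n = 0;

	assert(first_entry < TX_RING_SIZE && entry < TX_RING_SIZE);

	first_entry = NEXT_TX(first_entry);
	while (first_entry != entry) {
		if (visit)
			visit(first_entry);	/* e.g. unmap this descriptor */
		n++;
		first_entry = NEXT_TX(first_entry);
	}
	return n;
}
```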


^ permalink raw reply related

* Re: [PATCH] ipv4: avoid divide 0 error in tcp_incr_quickack
From: Eric Dumazet @ 2014-10-31 17:40 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Chen Weilong, netdev@vger.kernel.org, David S. Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, linux-kernel@vger.kernel.org
In-Reply-To: <CAADnVQLe+Xs22sbdy7=8KW2+LscFOiFWdKxSFFuPTWSPTHbSow@mail.gmail.com>

On Fri, 2014-10-31 at 09:24 -0700, Alexei Starovoitov wrote:
> cc-ing netdev
> 
> On Fri, Oct 31, 2014 at 7:50 AM, Chen Weilong <chenweilong@huawei.com> wrote:
> > From: Weilong Chen <chenweilong@huawei.com>
> >
> > We got a problem like this:
> >  [ffff8801c1a05570] machine_kexec at ffffffff81025039
> >  [ffff8801c1a055d0] crash_kexec at ffffffff8109b253
> >  [ffff8801c1a056a0] oops_end at ffffffff81442aed
> >  [ffff8801c1a056d0] die at ffffffff81005603
> >  [ffff8801c1a05700] do_trap at ffffffff81442448
> >  [ffff8801c1a05760] do_divide_error at ffffffff81002c10
> >  [ffff8801c1a05888] tcp_send_dupack at ffffffff81385e44
> >  [ffff8801c1a058c8] tcp_validate_incoming at ffffffff813886b5
> >  [ffff8801c1a05908] tcp_rcv_state_process at ffffffff8138d0b7
> >  [ffff8801c1a05958] tcp_child_process at ffffffff81397255
> >  [ffff8801c1a05988] tcp_v4_do_rcv at ffffffff81395a70
> >  [ffff8801c1a059d8] tcp_v4_rcv at ffffffff81396fc8
> >  [ffff8801c1a05a48] ip_local_deliver_finish at ffffffff813746e9
> >  [ffff8801c1a05a78] ip_local_deliver at ffffffff81374a20
> >  [ffff8801c1a05aa8] ip_rcv_finish at ffffffff81374389
> >  [ffff8801c1a05ad8] ip_rcv at ffffffff81374c78
> > There was a wrong ack packet coming during TCP handshake. The socket's state
> > was TCP_SYN_RECV, its rcv_mss was not initialized yet. So
> > tcp_send_dupack -> tcp_enter_quickack_mode got a divide 0 error.
> > This patch adds a state check before tcp_enter_quickack_mode.
> 
> ouch. Is it remote exploitable?

Seems to be SYN crossing. Quite hard, but possible.
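
For illustration, the failing computation can be modelled in user space. The sketch assumes the quick-ACK budget is roughly rcv_wnd / (2 * rcv_mss), clamped to a maximum of 16; both the formula and the constant are assumptions about tcp_incr_quickack(), and `model_incr_quickack()` is a hypothetical name. Note that it guards the divisor directly, which is a different technique from the state check the quoted patch adds.

```c
#include <assert.h>

#define TCP_MAX_QUICKACKS 16u	/* assumed clamp value */

/*
 * A zero rcv_mss (not yet initialised) would make the divisor zero, so the
 * sketch skips the computation and leaves the current budget unchanged.
 */
static unsigned int model_incr_quickack(unsigned int rcv_wnd,
					unsigned int rcv_mss,
					unsigned int cur_quick)
{
	unsigned int quickacks;

	assert(cur_quick <= TCP_MAX_QUICKACKS);
	if (rcv_mss == 0)
		return cur_quick;

	quickacks = rcv_wnd / (2u * rcv_mss);
	if (quickacks == 0)
		quickacks = 2;
	if (quickacks > cur_quick)
		cur_quick = quickacks < TCP_MAX_QUICKACKS ?
			    quickacks : TCP_MAX_QUICKACKS;
	return cur_quick;
}
```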

^ permalink raw reply

* drivers: net: cpsw: Support ALLMULTI and fix IFF_PROMISC in switch mode
From: Lennart Sorensen @ 2014-10-31 17:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Len Sorensen, Mugunthan V N, David S. Miller, netdev

The cpsw driver did not support the IFF_ALLMULTI flag which makes dynamic
multicast routing not work.  Related to this, when enabling IFF_PROMISC
in switch mode, all registered multicast addresses are flushed, resulting
in only broadcast and unicast traffic being received.

A new cpsw_ale_set_allmulti function now scans through the ALE entry
table and adds/removes the host port from the unregistered multicast
port mask of each vlan entry depending on the state of IFF_ALLMULTI.
In promiscuous mode, cpsw_ale_set_allmulti is used to force reception
of all multicast traffic in addition to the unicast and broadcast traffic.

With this change dynamic multicast and promiscuous mode both work in
switch mode.

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 952e1e4..96a61d1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -638,12 +638,16 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 	if (ndev->flags & IFF_PROMISC) {
 		/* Enable promiscuous mode */
 		cpsw_set_promiscious(ndev, true);
+		cpsw_ale_set_allmulti(priv->ale, IFF_ALLMULTI);
 		return;
 	} else {
 		/* Disable promiscuous mode */
 		cpsw_set_promiscious(ndev, false);
 	}
 
+	/* Restore allmulti on vlans if necessary */
+	cpsw_ale_set_allmulti(priv->ale, priv->ndev->flags & IFF_ALLMULTI);
+
 	/* Clear all mcast from ALE */
 	cpsw_ale_flush_multicast(priv->ale, ALE_ALL_PORTS << priv->host_port);
 
@@ -1149,6 +1153,7 @@ static inline void cpsw_add_default_vlan(struct cpsw_priv *priv)
 	const int port = priv->host_port;
 	u32 reg;
 	int i;
+	int unreg_mcast_mask;
 
 	reg = (priv->version == CPSW_VERSION_1) ? CPSW1_PORT_VLAN :
 	       CPSW2_PORT_VLAN;
@@ -1158,9 +1163,14 @@ static inline void cpsw_add_default_vlan(struct cpsw_priv *priv)
 	for (i = 0; i < priv->data.slaves; i++)
 		slave_write(priv->slaves + i, vlan, reg);
 
+	if (priv->ndev->flags & IFF_ALLMULTI)
+		unreg_mcast_mask = ALE_ALL_PORTS;
+	else
+		unreg_mcast_mask = ALE_PORT_1 | ALE_PORT_2;
+
 	cpsw_ale_add_vlan(priv->ale, vlan, ALE_ALL_PORTS << port,
 			  ALE_ALL_PORTS << port, ALE_ALL_PORTS << port,
-			  (ALE_PORT_1 | ALE_PORT_2) << port);
+			  unreg_mcast_mask << port);
 }
 
 static void cpsw_init_host_port(struct cpsw_priv *priv)
@@ -1620,11 +1630,17 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 				unsigned short vid)
 {
 	int ret;
+	int unreg_mcast_mask;
+
+	if (priv->ndev->flags & IFF_ALLMULTI)
+		unreg_mcast_mask = ALE_ALL_PORTS;
+	else
+		unreg_mcast_mask = ALE_PORT_1 | ALE_PORT_2;
 
 	ret = cpsw_ale_add_vlan(priv->ale, vid,
 				ALE_ALL_PORTS << priv->host_port,
 				0, ALE_ALL_PORTS << priv->host_port,
-				(ALE_PORT_1 | ALE_PORT_2) << priv->host_port);
+				unreg_mcast_mask << priv->host_port);
 	if (ret != 0)
 		return ret;
 
diff --git a/drivers/net/ethernet/ti/cpsw_ale.c b/drivers/net/ethernet/ti/cpsw_ale.c
index 0579b22..3ae8387 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.c
+++ b/drivers/net/ethernet/ti/cpsw_ale.c
@@ -443,6 +443,35 @@ int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port_mask)
 	return 0;
 }
 
+void cpsw_ale_set_allmulti(struct cpsw_ale *ale, int allmulti)
+{
+	u32 ale_entry[ALE_ENTRY_WORDS];
+	int type, idx;
+	int unreg_mcast = 0;
+
+	/* Only bother doing the work if the setting is actually changing */
+	if (ale->allmulti == allmulti)
+		return;
+
+	/* Remember the new setting to check against next time */
+	ale->allmulti = allmulti;
+
+	for (idx = 0; idx < ale->params.ale_entries; idx++) {
+		cpsw_ale_read(ale, idx, ale_entry);
+		type = cpsw_ale_get_entry_type(ale_entry);
+		if (type != ALE_TYPE_VLAN)
+			continue;
+
+		unreg_mcast = cpsw_ale_get_vlan_unreg_mcast(ale_entry);
+		if (allmulti)
+			unreg_mcast |= 1;
+		else
+			unreg_mcast &= ~1;
+		cpsw_ale_set_vlan_unreg_mcast(ale_entry, unreg_mcast);
+		cpsw_ale_write(ale, idx, ale_entry);
+	}
+}
+
 struct ale_control_info {
 	const char	*name;
 	int		offset, port_offset;
diff --git a/drivers/net/ethernet/ti/cpsw_ale.h b/drivers/net/ethernet/ti/cpsw_ale.h
index 31cf43c..c0d4127 100644
--- a/drivers/net/ethernet/ti/cpsw_ale.h
+++ b/drivers/net/ethernet/ti/cpsw_ale.h
@@ -27,6 +27,7 @@ struct cpsw_ale {
 	struct cpsw_ale_params	params;
 	struct timer_list	timer;
 	unsigned long		ageout;
+	int			allmulti;
 };
 
 enum cpsw_ale_control {
@@ -103,6 +104,7 @@ int cpsw_ale_del_mcast(struct cpsw_ale *ale, u8 *addr, int port_mask,
 int cpsw_ale_add_vlan(struct cpsw_ale *ale, u16 vid, int port, int untag,
 			int reg_mcast, int unreg_mcast);
 int cpsw_ale_del_vlan(struct cpsw_ale *ale, u16 vid, int port);
+void cpsw_ale_set_allmulti(struct cpsw_ale *ale, int allmulti);
 
 int cpsw_ale_control_get(struct cpsw_ale *ale, int port, int control);
 int cpsw_ale_control_set(struct cpsw_ale *ale, int port,

-- 
Len Sorensen
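
The table scan described in the commit message can be sketched in user space as follows. The entry layout is a simplification and all names (`struct model_ale_entry`, `model_set_allmulti()`) are hypothetical; the only detail carried over from the patch is that bit 0 of the unregistered multicast mask is the host port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_type { MODEL_TYPE_FREE, MODEL_TYPE_ADDR, MODEL_TYPE_VLAN };

struct model_ale_entry {
	enum model_type type;
	uint32_t unreg_mcast;	/* port mask; bit 0 is the host port */
};

/*
 * Set or clear the host-port bit in every VLAN entry and return the number
 * of VLAN entries visited. Non-VLAN entries are left untouched.
 */
static size_t model_set_allmulti(struct model_ale_entry *tbl, size_t n,
				 int allmulti)
{
	size_t visited = 0;

	assert(tbl != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].type != MODEL_TYPE_VLAN)
			continue;
		if (allmulti)
			tbl[i].unreg_mcast |= 1u;
		else
			tbl[i].unreg_mcast &= ~1u;
		visited++;
	}
	return visited;
}
```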

^ permalink raw reply related

* drivers: net: cpsw: Fix broken loop condition in switch mode
From: Lennart Sorensen @ 2014-10-31 17:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Heiko Schocher, Len Sorensen, Mugunthan V N, David S. Miller,
	netdev

0d961b3b52f566f823070ce2366511a7f64b928c (drivers: net: cpsw: fix buggy
loop condition) accidentally fixed a loop comparison in too many places
while fixing a real bug.

It was correct to fix the dual_emac mode section since there 'i' is used
as an index into priv->slaves which is a 0 based array.

However the other two changes (which are only used in switch mode)
are wrong since there 'i' is actually the ALE port number, and port 0
is the host port, while port 1 and up are the slave ports.

Putting the loop condition back in the switch mode section fixes it.

A comment has been added to point out the intent clearly to avoid future
confusion.  Also a comment is fixed that said the opposite of what was
actually happening.

Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Heiko Schocher <hs@denx.de>

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 952e1e4..4683196 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -591,8 +591,8 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 		if (enable) {
 			unsigned long timeout = jiffies + HZ;
 
-			/* Disable Learn for all ports */
-			for (i = 0; i < priv->data.slaves; i++) {
+			/* Disable Learn for all ports (host is port 0 and slaves are port 1 and up) */
+			for (i = 0; i <= priv->data.slaves; i++) {
 				cpsw_ale_control_set(ale, i,
 						     ALE_PORT_NOLEARN, 1);
 				cpsw_ale_control_set(ale, i,
@@ -616,11 +616,11 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
 			dev_dbg(&ndev->dev, "promiscuity enabled\n");
 		} else {
-			/* Flood All Unicast Packets to Host port */
+			/* Don't Flood All Unicast Packets to Host port */
 			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 0);
 
-			/* Enable Learn for all ports */
-			for (i = 0; i < priv->data.slaves; i++) {
+			/* Enable Learn for all ports (host is port 0 and slaves are port 1 and up) */
+			for (i = 0; i <= priv->data.slaves; i++) {
 				cpsw_ale_control_set(ale, i,
 						     ALE_PORT_NOLEARN, 0);
 				cpsw_ale_control_set(ale, i,

-- 
Len Sorensen
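
The difference between the two loop bounds can be shown with a small stand-alone sketch. Both function names are hypothetical; the point is only that a 0-based slave array runs 0 .. slaves-1 while ALE port numbers, with the host at port 0, run 0 .. slaves.

```c
#include <assert.h>
#include <stddef.h>

/* Slave array indices are 0-based: 0 .. slaves-1. Returns the count visited. */
static size_t visit_slave_indices(size_t slaves, int *seen)
{
	size_t n = 0;

	assert(seen != NULL);
	for (size_t i = 0; i < slaves; i++)
		seen[n++] = (int)i;
	return n;
}

/* ALE port numbers include the host port 0: 0 .. slaves. */
static size_t visit_ale_ports(size_t slaves, int *seen)
{
	size_t n = 0;

	assert(seen != NULL);
	for (size_t i = 0; i <= slaves; i++)
		seen[n++] = (int)i;
	return n;
}
```

With two slaves, the first loop touches indices 0 and 1, while the second touches ports 0, 1 and 2, so using "<" for ALE ports would skip the last slave port.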

^ permalink raw reply related

* Re: [PATCH] VNIC: Adding support for Cavium ThunderX network controller
From: Sunil Kovvuri @ 2014-10-31 17:14 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Robert Richter, David S. Miller, Sunil Goutham, Robert Richter,
	Stefan Assmann, LKML, LAKML, netdev
In-Reply-To: <20141030195458.2958d88a@urahara>

On Fri, Oct 31, 2014 at 8:24 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Thu, 30 Oct 2014 17:54:34 +0100
> Robert Richter <rric@kernel.org> wrote:
>
>> +#ifdef       VNIC_RSS_SUPPORT
>> +static int rss_config = RSS_IP_HASH_ENA | RSS_TCP_HASH_ENA | RSS_UDP_HASH_ENA;
>> +module_param(rss_config, int, S_IRUGO);
>> +MODULE_PARM_DESC(rss_config,
>> +              "RSS hash config [bits 8:0] (Bit0:L2 extended, 1:IP, 2:TCP, 3:TCP SYN, 4:UDP, 5:L4 extended, 6:ROCE 7:L3 bi-directional, 8:L4 bi-directional)");
>> +#endif
>
> This should be managed via ethtool ETHTOOL_GRXFH rather than a module parameter.
Thanks, I will add setting hash options via ETHTOOL_SRXFH as well.
The idea here is to offer a choice of hash at module load time (through
module params), and to use ethtool if it needs to be changed at
runtime.

Sunil.
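
For readers following the bit layout in the parameter description, here is a small user-space sketch of how the default mask decomposes. The `EX_RSS_*` names are hypothetical, and the bit positions are an assumption taken only from the MODULE_PARM_DESC text quoted above; the driver's real macro values may differ.

```c
/* Bit positions as listed in the quoted parameter description.
 * All names here are hypothetical, for illustration only. */
enum {
	EX_RSS_L2_EXT  = 1 << 0,
	EX_RSS_IP      = 1 << 1,
	EX_RSS_TCP     = 1 << 2,
	EX_RSS_TCP_SYN = 1 << 3,
	EX_RSS_UDP     = 1 << 4,
	EX_RSS_L4_EXT  = 1 << 5,
	EX_RSS_ROCE    = 1 << 6,
	EX_RSS_L3_BIDI = 1 << 7,
	EX_RSS_L4_BIDI = 1 << 8,
};

/* Only bits 8:0 are described, so anything above is rejected. */
#define EX_RSS_VALID_MASK 0x1ffu

/* Default corresponding to IP | TCP | UDP hashing. */
static unsigned int ex_rss_default(void)
{
	return EX_RSS_IP | EX_RSS_TCP | EX_RSS_UDP;
}

/* Return 1 if cfg uses only the documented bits, 0 otherwise. */
static int ex_rss_config_valid(unsigned int cfg)
{
	return (cfg & ~EX_RSS_VALID_MASK) == 0;
}
```

Under that assumption the default evaluates to 0x16. A check of this kind would apply equally whether the value arrives as a module parameter or through ethtool.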

^ permalink raw reply

* Re: Mistake in commit 0d961b3b52f566f823070ce2366511a7f64b928c breaks cpsw non dual_emac mode.
From: Lennart Sorensen @ 2014-10-31 17:10 UTC (permalink / raw)
  To: Heiko Schocher; +Cc: David Miller, linux-kernel, mugunthanvnm, netdev
In-Reply-To: <545330EA.60303@denx.de>

On Fri, Oct 31, 2014 at 07:49:14AM +0100, Heiko Schocher wrote:
> Seems I missed your original patch ... looked in it here:
> 
> https://lkml.org/lkml/2014/10/28/837
> 
> and I think you are correct, thanks for this fix. You can add my
> Acked-by: Heiko Schocher <hs@denx.de>
> if you post a corrected v2, as David suggested.

Will do.  I noticed a wrong word in one of the messages too, so it is a good
thing it gets fixed before getting committed.

-- 
Len Sorensen

^ permalink raw reply

* Re: [PATCH 0/6] netfilter/ipvs fixes for net
From: David Miller @ 2014-10-31 16:30 UTC (permalink / raw)
  To: pablo; +Cc: netfilter-devel, netdev
In-Reply-To: <1414757912-29150-1-git-send-email-pablo@netfilter.org>

From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Fri, 31 Oct 2014 13:18:26 +0100

> The following patchset contains fixes for netfilter/ipvs. This round of
> fixes is larger than usual at this stage, specifically because of the
> nf_tables bridge reject fixes that I would like to see in 3.18. The
> patches are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Pulled, thanks Pablo.

^ permalink raw reply

* [PATCH v2] stmmac: pci: set default of the filter bins
From: Andy Shevchenko @ 2014-10-31 16:28 UTC (permalink / raw)
  To: Giuseppe Cavallaro, netdev, Kweh Hock Leong, David S . Miller,
	Vince Bridgers
  Cc: Andy Shevchenko, stable

Commit 3b57de958e2a added support for a configurable number of filter
bins, but did not update the PCI driver accordingly. This patch sets the
default values when the device is enumerated via the PCI bus.

Fixes: 3b57de958e2a ("net: stmmac: Support devicetree configs for mcast and ucast filter entries")
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: stable@vger.kernel.org
---
Since v1:
- fix ugly style
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
index 655a23b..e17a970 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c
@@ -33,6 +33,7 @@ static struct stmmac_dma_cfg dma_cfg;
 static void stmmac_default_data(void)
 {
 	memset(&plat_dat, 0, sizeof(struct plat_stmmacenet_data));
+
 	plat_dat.bus_id = 1;
 	plat_dat.phy_addr = 0;
 	plat_dat.interface = PHY_INTERFACE_MODE_GMII;
@@ -47,6 +48,12 @@ static void stmmac_default_data(void)
 	dma_cfg.pbl = 32;
 	dma_cfg.burst_len = DMA_AXI_BLEN_256;
 	plat_dat.dma_cfg = &dma_cfg;
+
+	/* Set default value for multicast hash bins */
+	plat_dat.multicast_filter_bins = HASH_TABLE_SIZE;
+
+	/* Set default value for unicast filter entries */
+	plat_dat.unicast_filter_entries = 1;
 }
 
 /**
-- 
2.1.1

^ permalink raw reply related

* Re: [PATCH] ipv4: avoid divide 0 error in tcp_incr_quickack
From: Alexei Starovoitov @ 2014-10-31 16:24 UTC (permalink / raw)
  To: Chen Weilong, Eric Dumazet, netdev@vger.kernel.org
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, linux-kernel@vger.kernel.org
In-Reply-To: <1414767047-8972-1-git-send-email-chenweilong@huawei.com>

cc-ing netdev

On Fri, Oct 31, 2014 at 7:50 AM, Chen Weilong <chenweilong@huawei.com> wrote:
> From: Weilong Chen <chenweilong@huawei.com>
>
> We got a problem like this:
>  [ffff8801c1a05570] machine_kexec at ffffffff81025039
>  [ffff8801c1a055d0] crash_kexec at ffffffff8109b253
>  [ffff8801c1a056a0] oops_end at ffffffff81442aed
>  [ffff8801c1a056d0] die at ffffffff81005603
>  [ffff8801c1a05700] do_trap at ffffffff81442448
>  [ffff8801c1a05760] do_divide_error at ffffffff81002c10
>  [ffff8801c1a05888] tcp_send_dupack at ffffffff81385e44
>  [ffff8801c1a058c8] tcp_validate_incoming at ffffffff813886b5
>  [ffff8801c1a05908] tcp_rcv_state_process at ffffffff8138d0b7
>  [ffff8801c1a05958] tcp_child_process at ffffffff81397255
>  [ffff8801c1a05988] tcp_v4_do_rcv at ffffffff81395a70
>  [ffff8801c1a059d8] tcp_v4_rcv at ffffffff81396fc8
>  [ffff8801c1a05a48] ip_local_deliver_finish at ffffffff813746e9
>  [ffff8801c1a05a78] ip_local_deliver at ffffffff81374a20
>  [ffff8801c1a05aa8] ip_rcv_finish at ffffffff81374389
>  [ffff8801c1a05ad8] ip_rcv at ffffffff81374c78
> A wrong ack packet arrived during the TCP handshake. The socket's state
> was TCP_SYN_RECV and its rcv_mss was not initialized yet, so
> tcp_send_dupack -> tcp_enter_quickack_mode got a divide-by-zero error.
> This patch adds a state check before tcp_enter_quickack_mode.

Ouch. Is it remotely exploitable?

> Signed-off-by: Weilong Chen <chenweilong@huawei.com>
> ---
>  net/ipv4/tcp_input.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 4e4617e..9eb56dc 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3986,7 +3986,8 @@ static void tcp_send_dupack(struct sock *sk, const struct sk_buff *skb)
>         if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
>             before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
>                 NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
> -               tcp_enter_quickack_mode(sk);
> +               if (sk->sk_state != TCP_SYN_RECV)
> +                       tcp_enter_quickack_mode(sk);
>
>                 if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
>                         u32 end_seq = TCP_SKB_CB(skb)->end_seq;
> --
> 1.7.12
>
>

^ permalink raw reply
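
To make the failure mode concrete: the quick-ACK budget is derived from rcv_wnd / (2 * rcv_mss), so a zero rcv_mss traps. The sketch below is a user-space model, not the kernel function; `ex_quickack_budget()` and `EX_MAX_QUICKACKS` are hypothetical names. Note that it guards on the divisor itself, which is a different technique from the posted patch, which checks the socket state instead.

```c
/* Hypothetical cap on the number of quick ACKs. */
#define EX_MAX_QUICKACKS 16u

/*
 * Return how many quick ACKs to allow, or 0 when rcv_mss has not
 * been initialized yet (the caller would then skip quickack mode).
 */
static unsigned int ex_quickack_budget(unsigned int rcv_wnd,
				       unsigned int rcv_mss)
{
	unsigned int quickacks;

	if (rcv_mss == 0)
		return 0;	/* avoid dividing by zero */

	quickacks = rcv_wnd / (2 * rcv_mss);
	if (quickacks == 0)
		quickacks = 2;

	return quickacks < EX_MAX_QUICKACKS ? quickacks : EX_MAX_QUICKACKS;
}
```

Whether a state check or a divisor check is preferable depends on whether other callers can reach the division with an uninitialized rcv_mss, which this thread does not settle.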

* Re: [PATCH net-next] bridge: make proxy arp configurable
From: David Miller @ 2014-10-31 16:21 UTC (permalink / raw)
  To: shemming; +Cc: kyeyoonp, netdev
In-Reply-To: <20141030200942.5a531e34@urahara>

From: Stephen Hemminger <shemming@brocade.com>
Date: Thu, 30 Oct 2014 20:09:42 -0700

> @@ -60,3 +60,19 @@ config BRIDGE_VLAN_FILTERING
>  	  Say N to exclude this support and reduce the binary size.
>  
>  	  If unsure, say Y.
> +
> +config BRIDGE_ARP_PROXY
> +	bool "ARP proxying"
> +	depends on BRIDGE
> +	depends on INET
> +	default y
> +	---help---
> +	  If you say Y here, then the Ethernet bridge will keep track of
> +	  the hardware address to IP address mapping.
> +
> +	  It is most useful when used as a wireless AP.
> +
> +	  Say N to exclude this support and reduce the binary size.
> +
> +	  If unsure, say Y.
> +

Please do not ever add empty lines at the end of files, GIT warns
about this when I try to apply your patch.

^ permalink raw reply

* Re: [PATCH] VNIC: Adding support for Cavium ThunderX network controller
From: Robert Richter @ 2014-10-31 16:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Robert Richter, netdev, linux-kernel, Stefan Assmann,
	Sunil Goutham, David S. Miller, linux-arm-kernel
In-Reply-To: <20141030194513.089d27ec@urahara>

On 30.10.14 19:45:13, Stephen Hemminger wrote:
> On Thu, 30 Oct 2014 17:54:34 +0100
> Robert Richter <rric@kernel.org> wrote:
> 
> > diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> > index 1fa99a301817..80bd3336691e 100644
> > --- a/include/linux/pci_ids.h
> > +++ b/include/linux/pci_ids.h
> > @@ -2324,6 +2324,8 @@
> >  #define PCI_DEVICE_ID_ALTIMA_AC9100	0x03ea
> >  #define PCI_DEVICE_ID_ALTIMA_AC1003	0x03eb
> >  
> > +#define PCI_VENDOR_ID_CAVIUM		0x177d
> 
> I don't think PCI folks want this updated with every id anymore.

This is just the vendor id, the device id is part of the driver.

Since there will be multiple drivers I put the vendor id here.

-Robert

^ permalink raw reply

* Re: [PATCH -next v2 1/2] syncookies: remove ecn_ok validation when decoding option timestamp
From: Florian Westphal @ 2014-10-31 16:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev
In-Reply-To: <1414770460.27538.9.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2014-10-31 at 15:15 +0100, Florian Westphal wrote:
> 
> > So if you have a per route ecn setting, and syncookies are used,
> > and tcp_ecn sysctl is 0:
> 
> This part I do not understand.
> 
> Why should tcp_ecn be 0 here, and not 2 (default value) ?

Because admin might have changed it.
There is no problem if tcp_ecn sysctl is nonzero (1 or 2).

This problem will only manifest itself iff tcp_ecn sysctl was set to 0,
and the remote peer requests ecn and a route specific setting enabled
ecn for the source network and syncookies are used.

Current timestamp cookie validation will think "client is lying about
ecn in the timestamp as sysctl is off", since it does not consider a
per-route ecn knob.

^ permalink raw reply

* [PATCH net-next] ethernet: mvneta: Use PHY status standard message
From: Ezequiel Garcia @ 2014-10-31 15:57 UTC (permalink / raw)
  To: Thomas Petazzoni, Gregory Clement, David Miller
  Cc: Nadav Haklai, Tawfik Bayouk, Lior Amsalem, netdev,
	Ezequiel Garcia

Use phy_print_status() to report a change in the PHY status.
The current message is not verbose enough, so this commit improves
it by using the generic status message.

After this change, the kernel reports PHY status down and up events as:

mvneta f1070000.ethernet eth0: Link is Down
mvneta f1070000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
 drivers/net/ethernet/marvell/mvneta.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ade067d..ccc3ce2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2558,11 +2558,10 @@ static void mvneta_adjust_link(struct net_device *ndev)
 				MVNETA_GMAC_FORCE_LINK_DOWN);
 			mvreg_write(pp, MVNETA_GMAC_AUTONEG_CONFIG, val);
 			mvneta_port_up(pp);
-			netdev_info(pp->dev, "link up\n");
 		} else {
 			mvneta_port_down(pp);
-			netdev_info(pp->dev, "link down\n");
 		}
+		phy_print_status(phydev);
 	}
 }
 
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH -next v2 1/2] syncookies: remove ecn_ok validation when decoding option timestamp
From: Eric Dumazet @ 2014-10-31 15:47 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20141031141503.GL10069@breakpoint.cc>

On Fri, 2014-10-31 at 15:15 +0100, Florian Westphal wrote:

> So if you have a per route ecn setting, and syncookies are used,
> and tcp_ecn sysctl is 0:

This part I do not understand.

Why should tcp_ecn be 0 here, and not 2 (default value) ?

> 
> 1. we receive syn with ecn on and timestamps
> 2. we send cookie synack, with timestamp and ecn (route allowed it),
> the lower bits of the timestamp have a "magic" bit set that allows
> us to infer that ecn was negotiated successfully.
> 3. we drop the ack from the client, since timestamp decoding sees
> "ecn is on according to timestamp, but the tcp_ecn sysctl is off".
> 
> So to fix this, step 3 either has to check the dst setting
> in addition to the global sysctl, or to rely on the timestamp alone
> that ecn was requested by the original client and allowed by our host
> at the time synack timestamp was generated/sent.
> 
> I hope that explains the reason behind patch #1 up.

^ permalink raw reply

* Re: [PATCH net-next 5/8] net/mlx4_en: Remove redundant code from RX/GRO path
From: Eric Dumazet @ 2014-10-31 15:46 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Or Gerlitz, David S. Miller, Linux Netdev List, Matan Barak,
	Amir Vadai, Saeed Mahameed, Shani Michaeli, Ido Shamay
In-Reply-To: <CAJ3xEMiDnv9=nvvJ1m7_taoSncdmgv4GJVR8DiD5t5GCsFig1A@mail.gmail.com>

On Fri, 2014-10-31 at 16:00 +0200, Or Gerlitz wrote:
> On Fri, Oct 31, 2014 at 5:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Fri, 2014-10-31 at 01:25 +0200, Or Gerlitz wrote:
> >> On Thu, Oct 30, 2014 at 9:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> > On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
> >> >> Remove the code which goes through napi_gro_frags() on the RX path,
> >> >> use only napi_gro_receive().
> >>
> >> > Hmpff... napi_gro_frags() should be faster.
> >> > Have you benchmarked this ?
> >>
> >>
> >> yep we did, napi_gro_frags() was somehow better for single stream. Do
> >> you think we need to do it the other way around, e.g converge to use
> >> napi_gro_frags()?
> 
> > napi_gro_frags() is faster because the napi->skb is reused fast (not
> > going through kfree_skb()/alloc_skb() for every fragment)
> 
> I see. Is this a strong vote to convert the code to use napi_gro_frags
> on its usual track?

I don't know yet. In some cases, actually slowing down the rx path can
help by building bigger GRO packets. But instead of inserting delays,
we can simply force napi to be run another time, with a nanosec based
timer.

I've tested this kind of heuristic :

       /* If some packets are waiting in GRO engine and timeout is not expired,
        * reschedule a NAPI poll. We allow servicing other softirqs
        * before repoll, we do not rearm CQ.
        */
       if (rx_nsecs && napi->gro_list && !need_resched()) {
               u64 now = local_clock();
               unsigned long flags;

               /* If we got packets in this round, restart timeout */
               if (done)
                       cq->tstart = now;
               else if (now - cq->tstart >= (u64)rx_nsecs)
                       goto complete;

               /* Since we might need one skb very soon, build it now */
               napi_get_frags(napi);

               local_irq_save(flags);
               list_del(&napi->poll_list);
               __napi_schedule_irqoff(napi);
               local_irq_restore(flags);

       } else {
complete:
               napi_complete(napi);
               mlx4_en_arm_cq(priv, cq);
       }
       return done;
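
The decision logic of the heuristic above can be isolated from the driver and checked in user space. This is only a sketch of the control flow as posted: `ex_poll_decide()`, `struct ex_cq` and the enum are hypothetical names, and the NAPI and CQ calls are reduced to a returned action.

```c
#include <stdbool.h>
#include <stdint.h>

enum ex_poll_action { EX_COMPLETE, EX_REPOLL };

/* Hypothetical stand-in for the per-CQ state; only tstart matters here. */
struct ex_cq {
	uint64_t tstart;	/* time of the last round that saw packets */
};

/*
 * Repoll while packets sit in the GRO engine and the timeout has not
 * expired; otherwise complete NAPI (and, in the driver, rearm the CQ).
 */
static enum ex_poll_action ex_poll_decide(struct ex_cq *cq, uint64_t rx_nsecs,
					  bool gro_pending, bool resched_needed,
					  int done, uint64_t now)
{
	if (!rx_nsecs || !gro_pending || resched_needed)
		return EX_COMPLETE;

	if (done)
		cq->tstart = now;	/* got packets: restart timeout */
	else if (now - cq->tstart >= rx_nsecs)
		return EX_COMPLETE;	/* idle for too long */

	return EX_REPOLL;
}
```

Written this way the `goto complete` of the original becomes an early return, which makes it easier to see that the timeout only runs across rounds that received nothing.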

^ permalink raw reply

* Re: [PATCH -next v2 1/2] syncookies: remove ecn_ok validation when decoding option timestamp
From: Florian Westphal @ 2014-10-31 14:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Florian Westphal, netdev
In-Reply-To: <1414764287.27538.1.camel@edumazet-glaptop2.roam.corp.google.com>

Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2014-10-31 at 14:39 +0100, Florian Westphal wrote:
> 
> > It would only get enabled if the echoed timestamp (ie the timestamp we
> > sent in the synack) indicates that ecn was enabled, i.e. the client or
> > a middlebox would have to munge/modify it to set the 'ecn on' bit in the
> > timestamp.
> > 
> > If that is too fragile in your opinion I will respin the patch to include
> > the additional validation via dst.  We already need to fetch the dst
> > object anyway to fetch certain route attributes not in the timestamp or
> > cookie, so it's only a matter of reorganizing code first to avoid two lookups.
> 
> Well, your changelog is so confusing, I have no idea what your
> intent is.

Sorry :-/

So if you have a per route ecn setting, and syncookies are used,
and tcp_ecn sysctl is 0:

1. we receive syn with ecn on and timestamps
2. we send cookie synack, with timestamp and ecn (route allowed it),
the lower bits of the timestamp have a "magic" bit set that allows
us to infer that ecn was negotiated successfully.
3. we drop the ack from the client, since timestamp decoding sees
"ecn is on according to timestamp, but the tcp_ecn sysctl is off".

So to fix this, step 3 either has to check the dst setting
in addition to the global sysctl, or to rely on the timestamp alone
that ecn was requested by the original client and allowed by our host
at the time synack timestamp was generated/sent.

I hope that explains the reason behind patch #1 up.

> I do not really understand why you need to change something.

Yes, unfortunately you're not the first person saying that my
changelogs are not precise enough sometimes, I hope to do
a better job next time around.

> Maybe this is because I have not yet had my coffee ;)

Oh, well, that could also explain it 8-)
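
The three steps and the two possible fixes described earlier in this message can be sketched as two predicates. This is a user-space illustration under stated assumptions: the function names are hypothetical, and `EX_TS_ECN_BIT` is an assumed position for the "magic" ecn bit in the low bits of the echoed timestamp, not a value taken from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the "ecn negotiated" bit in the echoed timestamp. */
#define EX_TS_ECN_BIT (1u << 5)

/* Step 3 as it behaves today: only the global sysctl is consulted. */
static bool ex_ts_ecn_valid_sysctl_only(uint32_t tsecr, int sysctl_tcp_ecn)
{
	bool ecn = (tsecr & EX_TS_ECN_BIT) != 0;

	return !ecn || sysctl_tcp_ecn != 0;
}

/* Alternative fix: also honour a per-route ecn setting. */
static bool ex_ts_ecn_valid_with_route(uint32_t tsecr, int sysctl_tcp_ecn,
				       bool route_ecn)
{
	bool ecn = (tsecr & EX_TS_ECN_BIT) != 0;

	return !ecn || sysctl_tcp_ecn != 0 || route_ecn;
}
```

With the sysctl at 0 and ecn enabled on the route, the first predicate rejects the client's ack while the second accepts it. The other option discussed, trusting the timestamp alone, amounts to dropping the check entirely.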

^ permalink raw reply

* Re: [PATCH -next v2 1/2] syncookies: remove ecn_ok validation when decoding option timestamp
From: Eric Dumazet @ 2014-10-31 14:04 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20141031133948.GJ10069@breakpoint.cc>

On Fri, 2014-10-31 at 14:39 +0100, Florian Westphal wrote:

> It would only get enabled if the echoed timestamp (ie the timestamp we
> sent in the synack) indicates that ecn was enabled, i.e. the client or
> a middlebox would have to munge/modify it to set the 'ecn on' bit in the
> timestamp.
> 
> If that is too fragile in your opinion I will respin the patch to include
> the additional validation via dst.  We already need to fetch the dst
> object anyway to fetch certain route attributes not in the timestamp or
> cookie, so it's only a matter of reorganizing code first to avoid two lookups.

Well, your changelog is so confusing, I have no idea what your
intent is.

I do not really understand why you need to change something.

Maybe this is because I have not yet had my coffee ;)

Thanks

^ permalink raw reply

* Re: [PATCH net-next 5/8] net/mlx4_en: Remove redundant code from RX/GRO path
From: Or Gerlitz @ 2014-10-31 14:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Or Gerlitz, David S. Miller, Linux Netdev List, Matan Barak,
	Amir Vadai, Saeed Mahameed, Shani Michaeli, Ido Shamay
In-Reply-To: <1414725541.499.3.camel@edumazet-glaptop2.roam.corp.google.com>

On Fri, Oct 31, 2014 at 5:19 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Fri, 2014-10-31 at 01:25 +0200, Or Gerlitz wrote:
>> On Thu, Oct 30, 2014 at 9:00 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Thu, 2014-10-30 at 18:06 +0200, Or Gerlitz wrote:
>> >> Remove the code which goes through napi_gro_frags() on the RX path,
>> >> use only napi_gro_receive().
>>
>> > Hmpff... napi_gro_frags() should be faster.
>> > Have you benchmarked this ?
>>
>>
>> yep we did, napi_gro_frags() was somehow better for single stream. Do
>> you think we need to do it the other way around, e.g converge to use
>> napi_gro_frags()?

> napi_gro_frags() is faster because the napi->skb is reused fast (not
> going through kfree_skb()/alloc_skb() for every fragment)

I see. Is this a strong vote to convert the code to use napi_gro_frags
on its usual track?

Or.

^ permalink raw reply
