Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net 0/3][pull request] Intel Wired LAN Driver Updates 2015-01-06
From: Jeff Kirsher @ 2015-01-06  8:44 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains fixes to i40e only.

Jesse provides a fix for when the driver was polling with interrupts
disabled the hardware would occasionally not write back descriptors.
His fix causes the driver to detect this situation and force an interrupt
to fire which will flush the stuck descriptor.

Anjali provides a couple of fixes, the first corrects an issue where
the receive port checksum error counter was incrementing incorrectly with
UDP encapsulated tunneled traffic.  The second fix resolves an issue where
the driver was examining the outer protocol layer to set the inner protocol
layer checksum offload.  In the case of TCP over IPv6 over an IPv4 based
VXLAN, the inner checksum offloads would be set to look for IPv4/UDP
instead of IPv6/TCP, so fixed the issue so that the driver will look at
the proper layer for encapsulation offload settings.

The following are changes since commit 7ce67a38f799d1fb332f672b117efbadedaa5352:
  net: ethernet: cpsw: fix hangs with interrupts
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net master

Anjali Singhai (2):
  i40e: Fix Rx checksum error counter
  i40e: Fix bug with TCP over IPv6 over VXLAN

Jesse Brandeburg (1):
  i40e: fix un-necessary Tx hangs

 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 83 ++++++++++++++++++-----------
 1 file changed, 53 insertions(+), 30 deletions(-)

-- 
1.9.3

^ permalink raw reply

* [net 1/3] i40e: fix un-necessary Tx hangs
From: Jeff Kirsher @ 2015-01-06  8:44 UTC (permalink / raw)
  To: davem; +Cc: Jesse Brandeburg, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <1420533864-13125-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

When the driver was polling with interrupts disabled the hardware
will occasionally not write back descriptors.  This patch causes
the driver to detect this situation and force an interrupt to
fire which will flush the stuck descriptor.  Does not conflict
with NAPI because if we are already polling the napi_schedule is
ignored.  Additionally the extra interrupts are rate limited, so
don't cause a burden to the CPU.

Change-ID: Iba4616d2a71288672a5f08e4512e2704b97335e8
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 35 ++++++++++++++++++++++++-----
 1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 04b4414..9995598 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -658,6 +658,8 @@ static inline u32 i40e_get_head(struct i40e_ring *tx_ring)
 	return le32_to_cpu(*(volatile __le32 *)head);
 }
 
+#define WB_STRIDE 0x3
+
 /**
  * i40e_clean_tx_irq - Reclaim resources after transmit completes
  * @tx_ring:  tx ring to clean
@@ -759,6 +761,25 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 	tx_ring->q_vector->tx.total_bytes += total_bytes;
 	tx_ring->q_vector->tx.total_packets += total_packets;
 
+	/* check to see if there are any non-cache aligned descriptors
+	 * waiting to be written back, and kick the hardware to force
+	 * them to be written back in case of napi polling
+	 */
+	if (budget &&
+	    !((i & WB_STRIDE) == WB_STRIDE) &&
+	    !test_bit(__I40E_DOWN, &tx_ring->vsi->state) &&
+	    (I40E_DESC_UNUSED(tx_ring) != tx_ring->count)) {
+		u32 val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
+			  I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
+			  I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK;
+			  /* allow 00 to be written to the index */
+
+		wr32(&tx_ring->vsi->back->hw,
+		     I40E_PFINT_DYN_CTLN(tx_ring->q_vector->v_idx +
+					 tx_ring->vsi->base_vector - 1),
+		     val);
+	}
+
 	if (check_for_tx_hang(tx_ring) && i40e_check_tx_hang(tx_ring)) {
 		/* schedule immediate reset if we believe we hung */
 		dev_info(tx_ring->dev, "Detected Tx Unit Hang\n"
@@ -777,13 +798,16 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
 
 		dev_info(tx_ring->dev,
-			 "tx hang detected on queue %d, resetting adapter\n",
+			 "tx hang detected on queue %d, reset requested\n",
 			 tx_ring->queue_index);
 
-		tx_ring->netdev->netdev_ops->ndo_tx_timeout(tx_ring->netdev);
+		/* do not fire the reset immediately, wait for the stack to
+		 * decide we are truly stuck, also prevents every queue from
+		 * simultaneously requesting a reset
+		 */
 
-		/* the adapter is about to reset, no point in enabling stuff */
-		return true;
+		/* the adapter is about to reset, no point in enabling polling */
+		budget = 1;
 	}
 
 	netdev_tx_completed_queue(netdev_get_tx_queue(tx_ring->netdev,
@@ -806,7 +830,7 @@ static bool i40e_clean_tx_irq(struct i40e_ring *tx_ring, int budget)
 		}
 	}
 
-	return budget > 0;
+	return !!budget;
 }
 
 /**
@@ -2198,7 +2222,6 @@ static void i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	/* Place RS bit on last descriptor of any packet that spans across the
 	 * 4th descriptor (WB_STRIDE aka 0x3) in a 64B cacheline.
 	 */
-#define WB_STRIDE 0x3
 	if (((i & WB_STRIDE) != WB_STRIDE) &&
 	    (first <= &tx_ring->tx_bi[i]) &&
 	    (first >= &tx_ring->tx_bi[i & ~WB_STRIDE])) {
-- 
1.9.3

^ permalink raw reply related

* [net 3/3] i40e: Fix bug with TCP over IPv6 over VXLAN
From: Jeff Kirsher @ 2015-01-06  8:44 UTC (permalink / raw)
  To: davem
  Cc: Anjali Singhai, netdev, nhorman, sassmann, jogreene, Greg Rose,
	Jeff Kirsher
In-Reply-To: <1420533864-13125-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Anjali Singhai <anjali.singhai@intel.com>

The driver was examining the outer protocol layer to set the inner protocol
layer checksum offload.  In the case of TCP over IPV6 over an IPv4 based
VXLAN the inner checksum offloads would be set to look for IPv4/UDP instead
of IPv6/TCP.  This code fixes that so that the driver will look at the
proper layer for encapsulation offload settings.

Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++-------------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 2f19ea5..35696ab 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1866,17 +1866,16 @@ static int i40e_tso(struct i40e_ring *tx_ring, struct sk_buff *skb,
 	if (err < 0)
 		return err;
 
-	if (protocol == htons(ETH_P_IP)) {
-		iph = skb->encapsulation ? inner_ip_hdr(skb) : ip_hdr(skb);
+	iph = skb->encapsulation ? inner_ip_hdr(skb) : ip_hdr(skb);
+	ipv6h = skb->encapsulation ? inner_ipv6_hdr(skb) : ipv6_hdr(skb);
+
+	if (iph->version == 4) {
 		tcph = skb->encapsulation ? inner_tcp_hdr(skb) : tcp_hdr(skb);
 		iph->tot_len = 0;
 		iph->check = 0;
 		tcph->check = ~csum_tcpudp_magic(iph->saddr, iph->daddr,
 						 0, IPPROTO_TCP, 0);
-	} else if (skb_is_gso_v6(skb)) {
-
-		ipv6h = skb->encapsulation ? inner_ipv6_hdr(skb)
-					   : ipv6_hdr(skb);
+	} else if (ipv6h->version == 6) {
 		tcph = skb->encapsulation ? inner_tcp_hdr(skb) : tcp_hdr(skb);
 		ipv6h->payload_len = 0;
 		tcph->check = ~csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
@@ -1972,13 +1971,9 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 tx_flags,
 					 I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
 			}
 		} else if (tx_flags & I40E_TX_FLAGS_IPV6) {
-			if (tx_flags & I40E_TX_FLAGS_TSO) {
-				*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+			*cd_tunneling |= I40E_TX_CTX_EXT_IP_IPV6;
+			if (tx_flags & I40E_TX_FLAGS_TSO)
 				ip_hdr(skb)->check = 0;
-			} else {
-				*cd_tunneling |=
-					 I40E_TX_CTX_EXT_IP_IPV4_NO_CSUM;
-			}
 		}
 
 		/* Now set the ctx descriptor fields */
@@ -1988,7 +1983,10 @@ static void i40e_tx_enable_csum(struct sk_buff *skb, u32 tx_flags,
 				   ((skb_inner_network_offset(skb) -
 					skb_transport_offset(skb)) >> 1) <<
 				   I40E_TXD_CTX_QW0_NATLEN_SHIFT;
-
+		if (this_ip_hdr->version == 6) {
+			tx_flags &= ~I40E_TX_FLAGS_IPV4;
+			tx_flags |= I40E_TX_FLAGS_IPV6;
+		}
 	} else {
 		network_hdr_len = skb_network_header_len(skb);
 		this_ip_hdr = ip_hdr(skb);
-- 
1.9.3

^ permalink raw reply related

* [net 2/3] i40e: Fix Rx checksum error counter
From: Jeff Kirsher @ 2015-01-06  8:44 UTC (permalink / raw)
  To: davem
  Cc: Anjali Singhai, netdev, nhorman, sassmann, jogreene, Greg Rose,
	Jeff Kirsher
In-Reply-To: <1420533864-13125-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Anjali Singhai <anjali.singhai@intel.com>

The Rx port checksum error counter was incrementing incorrectly with
UDP encapsulated tunneled traffic.  This patch fixes the problem so that
the port_rx_csum counter will show accurate statistics.

Signed-off-by: Anjali Singhai <anjali.singhai@intel.com>
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 9995598..2f19ea5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1314,9 +1314,7 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 	 * so the total length of IPv4 header is IHL*4 bytes
 	 * The UDP_0 bit *may* bet set if the *inner* header is UDP
 	 */
-	if (ipv4_tunnel &&
-	    (decoded.inner_prot != I40E_RX_PTYPE_INNER_PROT_UDP) &&
-	    !(rx_status & (1 << I40E_RX_DESC_STATUS_UDP_0_SHIFT))) {
+	if (ipv4_tunnel) {
 		skb->transport_header = skb->mac_header +
 					sizeof(struct ethhdr) +
 					(ip_hdr(skb)->ihl * 4);
@@ -1326,15 +1324,19 @@ static inline void i40e_rx_checksum(struct i40e_vsi *vsi,
 					  skb->protocol == htons(ETH_P_8021AD))
 					  ? VLAN_HLEN : 0;
 
-		rx_udp_csum = udp_csum(skb);
-		iph = ip_hdr(skb);
-		csum = csum_tcpudp_magic(
-				iph->saddr, iph->daddr,
-				(skb->len - skb_transport_offset(skb)),
-				IPPROTO_UDP, rx_udp_csum);
+		if ((ip_hdr(skb)->protocol == IPPROTO_UDP) &&
+		    (udp_hdr(skb)->check != 0)) {
+			rx_udp_csum = udp_csum(skb);
+			iph = ip_hdr(skb);
+			csum = csum_tcpudp_magic(
+					iph->saddr, iph->daddr,
+					(skb->len - skb_transport_offset(skb)),
+					IPPROTO_UDP, rx_udp_csum);
 
-		if (udp_hdr(skb)->check != csum)
-			goto checksum_fail;
+			if (udp_hdr(skb)->check != csum)
+				goto checksum_fail;
+
+		} /* else its GRE and so no outer UDP header */
 	}
 
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
-- 
1.9.3

^ permalink raw reply related

* Re: [PATCH net-next 1/5] rhashtable: optimize rhashtable_lookup routine
From: Thomas Graf @ 2015-01-06  9:13 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, netdev, jon.maloy, Paul.Gortmaker, erik.hugne,
	tipc-discussion
In-Reply-To: <1420529003-22244-2-git-send-email-ying.xue@windriver.com>

On 01/06/15 at 03:23pm, Ying Xue wrote:
> Define an internal compare function and relevant compare argument,
> and then make use of rhashtable_lookup_compare() to lookup key in
> hash table, reducing duplicated code between rhashtable_lookup()
> and rhashtable_lookup_compare().
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* Re: [PATCH net-next 2/5] rhashtable: introduce rhashtable_wakeup_worker helper function
From: Thomas Graf @ 2015-01-06  9:29 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, netdev, jon.maloy, Paul.Gortmaker, erik.hugne,
	tipc-discussion
In-Reply-To: <1420529003-22244-3-git-send-email-ying.xue@windriver.com>

On 01/06/15 at 03:23pm, Ying Xue wrote:
> Introduce rhashtable_wakeup_worker() helper function to reduce
> duplicated code where to wake up worker.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>
> ---
>  lib/rhashtable.c |   20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index f2fdd7a..6eda22f 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -492,6 +492,17 @@ static void rht_deferred_worker(struct work_struct *work)
>  	mutex_unlock(&ht->mutex);
>  }
>  
> +static void rhashtable_wakeup_worker(struct rhashtable *ht)
> +{
> +	struct bucket_table *tbl = rht_dereference_rcu(ht->tbl, ht);
> +	struct bucket_table *new_tbl = rht_dereference_rcu(ht->future_tbl, ht);
> +
> +	if (ht->tbl != ht->future_tbl &&

I just noticed that this is a bug in the original code. It should be

	if (ht->tbl == ht->future_tbl &&

Do you want to fix this in this series?


> +	    ((ht->p.grow_decision && ht->p.grow_decision(ht, new_tbl->size)) ||
> +	    (ht->p.shrink_decision && ht->p.shrink_decision(ht, tbl->size))))

Indent to inner brakcet

> +		schedule_delayed_work(&ht->run_work, 0);
> +}
> +
>  /**
>   * rhashtable_insert - insert object into hash hash table
>   * @ht:		hash table
> @@ -533,9 +544,7 @@ void rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj)
>  	atomic_inc(&ht->nelems);
>  
>  	/* Only grow the table if no resizing is currently in progress. */

This comment should be moved to the function as well.

^ permalink raw reply

* Re: [PATCH net] gso: do GSO for local skb with size bigger than MTU
From: Fan Du @ 2015-01-06  9:34 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Du, Fan, Thomas Graf, davem@davemloft.net, Michael S. Tsirkin,
	Jason Wang, netdev@vger.kernel.org, fw@strlen.de,
	dev@openvswitch.org, pshelar@nicira.com
In-Reply-To: <CAEP_g=-R1FrcA0sTJJjQhypRtVCwoRZ+LieKaSJxqA-HACZqEw@mail.gmail.com>


On 2015/1/6 1:58, Jesse Gross wrote:
> On Mon, Jan 5, 2015 at 1:02 AM, Fan Du <fengyuleidian0615@gmail.com> wrote:
>> 于 2014年12月03日 10:31, Du, Fan 写道:
>>
>>>
>>>> -----Original Message-----
>>>> From: Thomas Graf [mailto:tgr@infradead.org] On Behalf Of Thomas Graf
>>>> Sent: Wednesday, December 3, 2014 1:42 AM
>>>> To: Michael S. Tsirkin
>>>> Cc: Du, Fan; 'Jason Wang'; netdev@vger.kernel.org; davem@davemloft.net;
>>>> fw@strlen.de; dev@openvswitch.org; jesse@nicira.com; pshelar@nicira.com
>>>> Subject: Re: [PATCH net] gso: do GSO for local skb with size bigger than
>>>> MTU
>>>>
>>>> On 12/02/14 at 07:34pm, Michael S. Tsirkin wrote:
>>>>> On Tue, Dec 02, 2014 at 05:09:27PM +0000, Thomas Graf wrote:
>>>>>> On 12/02/14 at 01:48pm, Flavio Leitner wrote:
>>>>>>> What about containers or any other virtualization environment that
>>>>>>> doesn't use Virtio?
>>>>>>
>>>>>> The host can dictate the MTU in that case for both veth or OVS
>>>>>> internal which would be primary container plumbing techniques.
>>>>>
>>>>> It typically can't do this easily for VMs with emulated devices:
>>>>> real ethernet uses a fixed MTU.
>>>>>
>>>>> IMHO it's confusing to suggest MTU as a fix for this bug, it's an
>>>>> unrelated optimization.
>>>>> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED is the right fix here.
>>>>
>>>> PMTU discovery only resolves the issue if an actual IP stack is running
>>>> inside the
>>>> VM. This may not be the case at all.
>>>    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>
>>> Some thoughts here:
>>>
>>> Think otherwise, this is indeed what host stack should forge a
>>> ICMP_DEST_UNREACH/ICMP_FRAG_NEEDED
>>> message with _inner_ skb network and transport header, do whatever type of
>>> encapsulation,
>>> and thereafter push such packet upward to Guest/Container, which make them
>>> feel, the intermediate node
>>> or the peer send such message. PMTU should be expected to work correct.
>>> And such behavior should be shared by all other encapsulation tech if they
>>> are also suffered.
>>
>> Hi David, Jesse and Thomas
>>
>> As discussed in here:
>> https://www.marc.info/?l=linux-netdev&m=141764712631150&w=4 and
>> quotes from Jesse:
>> My proposal would be something like this:
>>   * For L2, reduce the VM MTU to the lowest common denominator on the
>> segment.
>>   * For L3, use path MTU discovery or fragment inner packet (i.e.
>> normal routing behavior).
>>   * As a last resort (such as if using an old version of virtio in the
>> guest), fragment the tunnel packet.
>>
>>
>> For L2, it's a administrative action
>> For L3, PMTU approach looks better, because once the sender is alerted the
>> reduced MTU,
>> packet size after encapsulation will not exceed physical MTU, so no
>> additional fragments
>> efforts needed.
>> For "As a last resort... fragment the tunnel packet", the original patch:
>> https://www.marc.info/?l=linux-netdev&m=141715655024090&w=4 did the job, but
>> seems it's
>> not welcomed.
> This needs to be properly integrated into IP processing if it is to
> work correctly.
Do you mean the original patch in this thread? yes, it works correctly
in my cloud env. If you has any other concerns, please let me know. :)
> One of the reasons for only doing path MTU discovery
> for L3 is that it operates seamlessly as part of normal operation -
> there is no need to forge addresses or potentially generate ICMP when
> on an L2 network. However, this ignores the IP handling that is going
> on (note that in OVS it is possible for L3 to be implemented as a set
> of flows coming from a controller).
>
> It also should not be VXLAN specific or duplicate VXLAN encapsulation
> code. As this is happening before encapsulation, the generated ICMP
> does not need to be encapsulated either if it is created in the right
> location.
Yes, I agree. GRE share the same issue from the code flow.
Pushing back ICMP msg back without encapsulation without circulating down
to physical device is possible. The "right location" as far as I know
could only be in ovs_vport_send. In addition this probably requires wrapper
route looking up operation for GRE/VXLAN, after get the under layer 
device MTU
from the routing information, then calculate reduced MTU becomes feasible.

^ permalink raw reply

* Re: [PATCH net-next 3/5] rhashtable: use future table size to make expansion decision
From: Thomas Graf @ 2015-01-06  9:35 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, netdev, jon.maloy, Paul.Gortmaker, erik.hugne,
	tipc-discussion
In-Reply-To: <1420529003-22244-4-git-send-email-ying.xue@windriver.com>

On 01/06/15 at 03:23pm, Ying Xue wrote:
> Should use future table size instead of old table size to decide
> whether hash table is worth being expanded.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>
> ---
>  lib/rhashtable.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Apologies as you were probably mislead by the bug as commented
on in the previous patch.

I don't think we need this. future_tbl only points to a different
table until the old table entries are linked from the new table.
The condition in the resize check where meant to exclude this
phase so we would newer get to the deferred worker while relinking
is happening.

^ permalink raw reply

* Re: [PATCH net-next 4/5] rhashtable: involve rhashtable_lookup_insert routine
From: Thomas Graf @ 2015-01-06  9:41 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, netdev, jon.maloy, Paul.Gortmaker, erik.hugne,
	tipc-discussion
In-Reply-To: <1420529003-22244-5-git-send-email-ying.xue@windriver.com>

On 01/06/15 at 03:23pm, Ying Xue wrote:
> Involve a new function called rhashtable_lookup_insert() which makes
> lookup and insertion atomic under bucket lock protection, helping us
> avoid to introduce an extra lock when we search and insert an object
> into hash table.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>

LGTM now, thanks!
Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* [PATCH net] r8152: support ndo_features_check
From: Hayes Wang @ 2015-01-06  9:41 UTC (permalink / raw)
  To: netdev; +Cc: nic_swsd, linux-kernel, linux-usb, Hayes Wang

Support ndo_features_check to avoid:
 - the transport offset is more than the hw limitation when using hw checksum.
 - the skb->len of a GSO packet is more than the limitation.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
---
 drivers/net/usb/r8152.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 2d1c77e..57ec23e 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1897,6 +1897,22 @@ static void _rtl8152_set_rx_mode(struct net_device *netdev)
 	netif_wake_queue(netdev);
 }
 
+static netdev_features_t
+rtl8152_features_check(struct sk_buff *skb, struct net_device *dev,
+		       netdev_features_t features)
+{
+	u32 mss = skb_shinfo(skb)->gso_size;
+	int max_offset = mss ? GTTCPHO_MAX : TCPHO_MAX;
+	int offset = skb_transport_offset(skb);
+
+	if ((mss || skb->ip_summed == CHECKSUM_PARTIAL) && offset > max_offset)
+		features &= ~(NETIF_F_ALL_CSUM | NETIF_F_GSO_MASK);
+	else if ((skb->len + sizeof(struct tx_desc)) > agg_buf_sz)
+		features &= ~NETIF_F_GSO_MASK;
+
+	return features;
+}
+
 static netdev_tx_t rtl8152_start_xmit(struct sk_buff *skb,
 				      struct net_device *netdev)
 {
@@ -3706,6 +3722,7 @@ static const struct net_device_ops rtl8152_netdev_ops = {
 	.ndo_set_mac_address	= rtl8152_set_mac_address,
 	.ndo_change_mtu		= rtl8152_change_mtu,
 	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_features_check	= rtl8152_features_check,
 };
 
 static void r8152b_get_version(struct r8152 *tp)
-- 
2.1.0

^ permalink raw reply related

* Re: [PATCH net-next 5/5] tipc: convert tipc reference table to use generic rhashtable
From: Thomas Graf @ 2015-01-06  9:42 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, netdev, jon.maloy, Paul.Gortmaker, erik.hugne,
	tipc-discussion
In-Reply-To: <1420529003-22244-6-git-send-email-ying.xue@windriver.com>

On 01/06/15 at 03:23pm, Ying Xue wrote:
> As tipc reference table is statically allocated, its memory size
> requested on stack initialization stage is quite big even if the
> maximum port number is just restricted to 8191 currently, however,
> the number already becomes insufficient in practice. But if the
> maximum ports is allowed to its theory value - 2^32, its consumed
> memory size will reach a ridiculously unacceptable value. Apart from
> this, heavy tipc users spend a considerable amount of time in
> tipc_sk_get() due to the read-lock on ref_table_lock.
> 
> If tipc reference table is converted with generic rhashtable, above
> mentioned both disadvantages would be resolved respectively: making
> use of the new resizable hash table can avoid locking on the lookup;
> smaller memory size is required at initial stage, for example, 256
> hash bucket slots are requested at the beginning phase instead of
> allocating the entire 8191 slots in old mode. The hash table will
> grow if entries exceeds 75% of table size up to a total table size
> of 1M, and it will automatically shrink if usage falls below 30%,
> but the minimum table size is allowed down to 256.
> 
> Also converts ref_table_lock to a separate mutex to protect hash table
> mutations on write side. Lastly defers the release of the socket
> reference using call_rcu() to allow using an RCU read-side protected
> call to rhashtable_lookup().
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Acked-by: Jon Maloy <jon.maloy@ericsson.com>
> Acked-by: Erik Hugne <erik.hugne@ericsson.com>
> Cc: Thomas Graf <tgraf@suug.ch>

The rhashtable usage looks good to me. I assume the TIPC maintainers
have reviewed the other pieces.

Acked-by: Thomas Graf <tgraf@suug.ch>

^ permalink raw reply

* Re: [PATCH v4 net-next 2/2 tuntap: Increase the number of queues in tun.
From: Michael S. Tsirkin @ 2015-01-06  9:49 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, netdev, davem, jasowang, dgibson, vfalico, edumazet,
	vyasevic, hkchu, wuzhy, xemul, therbert, bhutchings, xii, stephen,
	jiri, sergei.shtylyov
In-Reply-To: <1420522756-15127-3-git-send-email-pagupta@redhat.com>

On Tue, Jan 06, 2015 at 11:09:16AM +0530, Pankaj Gupta wrote:
> Networking under kvm works best if we allocate a per-vCPU RX and TX
> queue in a virtual NIC. This requires a per-vCPU queue on the host side.
> 
> It is now safe to increase the maximum number of queues.
> Preceding patch: 'net: allow large number of rx queues'
> made sure this won't cause failures due to high order memory
> allocations. Increase it to 256: this is the max number of vCPUs
> KVM supports.
> 
> Size of tun_struct changes from 8512 to 10496 after this patch. This keeps 
> pages allocated for tun_struct before and after the patch to 3. 
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> Reviewed-by: David Gibson <dgibson@redhat.com>
> ---
>  drivers/net/tun.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index e3fa65a..a19dc5f8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -113,10 +113,11 @@ struct tap_filter {
>  	unsigned char	addr[FLT_EXACT_COUNT][ETH_ALEN];
>  };
>  
> -/* DEFAULT_MAX_NUM_RSS_QUEUES were chosen to let the rx/tx queues allocated for
> - * the netdevice to be fit in one page. So we can make sure the success of
> - * memory allocation. TODO: increase the limit. */
> -#define MAX_TAP_QUEUES DEFAULT_MAX_NUM_RSS_QUEUES
> +/* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
> + * to max number of vCPUS in guest.

VCPUs I think.

> Also, we are making sure here
> + * queue memory allocation do not fail.

What does this mean? How are we making sure?
I would drop this phrase really.

> + */
> +#define MAX_TAP_QUEUES 256
>  #define MAX_TAP_FLOWS  4096
>  
>  #define TUN_FLOW_EXPIRE (3 * HZ)
> -- 
> 1.8.3.1

^ permalink raw reply

* Re: [PATCH net-next 3/5] rhashtable: use future table size to make expansion decision
From: Ying Xue @ 2015-01-06  9:56 UTC (permalink / raw)
  To: Thomas Graf; +Cc: jon.maloy, netdev, Paul.Gortmaker, tipc-discussion, davem
In-Reply-To: <20150106093557.GC12468@casper.infradead.org>

On 01/06/2015 05:35 PM, Thomas Graf wrote:
> On 01/06/15 at 03:23pm, Ying Xue wrote:
>> Should use future table size instead of old table size to decide
>> whether hash table is worth being expanded.
>>
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
>> ---
>>  lib/rhashtable.c |    5 +++--
>>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Apologies as you were probably mislead by the bug as commented
> on in the previous patch.
> 
> I don't think we need this. future_tbl only points to a different
> table until the old table entries are linked from the new table.
> The condition in the resize check where meant to exclude this
> phase so we would newer get to the deferred worker while relinking
> is happening.
> 
> 

Thank you for above nice explanation. Regarding my understanding, as
rhashtable_expand() and rhashtable_shrink() are always under the
protection of "ht->mutex", the "future_tbl" and "tbl" absolutely point
to the same bucket array once rhashtable_expand() or rhashtable_shrink()
returns. Therefore, if rht_deferred_worker() takes the "ht->mutex" lock,
the both "future_tbl" and "tbl" should point to the same bucket array.
So the change made in the patch is useless for us, right?

But as you pointed in above patch, there is a bug in
rhashtable_wakeup_worker(). As long as ht->tbl == ht->future_tbl, we
should wake up the work. OK, I will drop the patch and fix the error in
patch #2.

In all, thank you for quickly reviewing the series, and I would deliver
the next version soon in which your all comments would be resolved.

Please wait a moment.

Regards,
Ying

------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net

^ permalink raw reply

* Re: [PATCH net-next 3/5] rhashtable: use future table size to make expansion decision
From: Thomas Graf @ 2015-01-06 10:06 UTC (permalink / raw)
  To: Ying Xue
  Cc: davem, netdev, jon.maloy, Paul.Gortmaker, erik.hugne,
	tipc-discussion
In-Reply-To: <54ABB147.6020904@windriver.com>

On 01/06/15 at 05:56pm, Ying Xue wrote:
> Thank you for above nice explanation. Regarding my understanding, as
> rhashtable_expand() and rhashtable_shrink() are always under the
> protection of "ht->mutex", the "future_tbl" and "tbl" absolutely point
> to the same bucket array once rhashtable_expand() or rhashtable_shrink()
> returns. Therefore, if rht_deferred_worker() takes the "ht->mutex" lock,
> the both "future_tbl" and "tbl" should point to the same bucket array.
> So the change made in the patch is useless for us, right?

Correct.

> But as you pointed in above patch, there is a bug in
> rhashtable_wakeup_worker(). As long as ht->tbl == ht->future_tbl, we
> should wake up the work. OK, I will drop the patch and fix the error in
> patch #2.

Awesome, thanks!

^ permalink raw reply

* Payment
From: Finance Department @ 2015-01-06  8:32 UTC (permalink / raw)


Dear Recipient,

You have been awarded the sum of  8,000,000.00 (Eight Million Pounds sterling) with reference number 77100146 by office of the ministry of finance UK.Send us your personal details to deliver your funds.

Gloria Peter

^ permalink raw reply

* IPsec workshop at netdev01?
From: Steffen Klassert @ 2015-01-06 10:19 UTC (permalink / raw)
  To: netdev; +Cc: Jamal Hadi Salim, Herbert Xu, David Miller

Is there any interest in doing an IPsec workshop at netdev01?

This mail is to probe if we can gather enough discussion topics to run
such a workshop. So if someone is interested to attend and/or has a
related discussion topic, please let me know.

The idea to do this workshop came yesterday, so I'm still collecting
topics I'm interested in. Some things that came immediately to my mind
are:

- Our IPsec policy/state lookups are still hashlist based on slowpath with
  a flowcache to do fast lookups for traffic flows we have already seen.
  This flowcache has similar issues like the ipv4 routing chache had.
  Is the flowcache an appropriate lookup method on the long run or should
  we at least think about an additional alternative lookup method?

- We still lack a 32/64 bit compatibiltiy layer for IPsec, this issue
  comes up from time to time. Some solutions were proposed in the past
  but all had problems. The current behaviour is broken if someone tries
  to configure IPsec with 32 bit tools on a 64 bit machine. Can we get
  this right somehow or is it better to just return an error in this case?

- Changing the system time can lead to unexpected SA lifetime changes. The
  discussion on the list did not lead to a conclusion on how to fix this.
  What is the best way to get this fixed?
  
- The IPsec policy enforcement default is to allow all flows that don't
  match a policy. On systems with a high security level it might be
  intersting to configurable invert the default from allow to block. With
  the default to block configured, we would need allow policies for all
  packet flows we accept. Some people would be even interested in a knob
  to enforce a certain default behaviour until the next reboot. Is this
  reasonable? How far can we get here?

- A more general thing: How complete is our IPsec implementation? Are there
  things that should be implemented but we don't have it?

^ permalink raw reply

* netdev01 conference hotel: Book your room
From: Jamal Hadi Salim @ 2015-01-06 10:38 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: Richard Guy Briggs

Folks,

I am told the hotel rooms are going faster than we expected
and the hotel is holding only until Jan 23. Please book
your room now - the price is guaranteed to go up after
the group rate. More details:

https://www.netdev01.org/news/20

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next v3 0/5]: ixgbevf: Allow querying VFs RSS indirection table and key
From: Vlad Zolotarov @ 2015-01-06 10:58 UTC (permalink / raw)
  To: Gleb Natapov, Greg Rose; +Cc: netdev, avi, jeffrey.t.kirsher
In-Reply-To: <20150106065535.GM29889@cloudius-systems.com>


On 01/06/15 08:55, Gleb Natapov wrote:
> On Mon, Jan 05, 2015 at 03:54:52PM -0800, Greg Rose wrote:
>> On Mon, Jan 5, 2015 at 6:15 AM, Vlad Zolotarov
>> <vladz@cloudius-systems.com> wrote:
>>> Add the ethtool ops to VF driver to allow querying the RSS indirection table
>>> and RSS Random Key.
>>>
>>>   - PF driver: Add new VF-PF channel commands.
>>>   - VF driver: Utilize these new commands and add the corresponding
>>>                ethtool callbacks.
>>>
>>> New in v3:
>>>     - Added a missing support for x550 devices.
>>>     - Mask the indirection table values according to PSRTYPE[n].RQPL.
>>>     - Minimized the number of added VF-PF commands.
>>>
>>> New in v2:
>>>     - Added a detailed description to patches 4 and 5.
>>>
>>> New in v1 (compared to RFC):
>>>     - Use "if-else" statement instead of a "switch-case" for a single option case.
>>>       More specifically: in cases where the newly added API version is the only one
>>>       allowed. We may consider using a "switch-case" back again when the list of
>>>       allowed API versions in these specific places grows up.
>>>
>>> Vlad Zolotarov (5):
>>>    ixgbe: Add a RETA query command to VF-PF channel API
>>>    ixgbevf: Add a RETA query code
>>>    ixgbe: Add GET_RSS_KEY command to VF-PF channel commands set
>>>    ixgbevf: Add RSS Key query code
>>>    ixgbevf: Add the appropriate ethtool ops to query RSS indirection
>>>      table and key
>>>
>>>   drivers/net/ethernet/intel/ixgbe/ixgbe_mbx.h      |  10 ++
>>>   drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c    |  91 +++++++++++++++
>>>   drivers/net/ethernet/intel/ixgbevf/ethtool.c      |  43 +++++++
>>>   drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   4 +-
>>>   drivers/net/ethernet/intel/ixgbevf/mbx.h          |  10 ++
>>>   drivers/net/ethernet/intel/ixgbevf/vf.c           | 132 ++++++++++++++++++++++
>>>   drivers/net/ethernet/intel/ixgbevf/vf.h           |   2 +
>>>   7 files changed, 291 insertions(+), 1 deletion(-)
>> I've given this code a review and I don't see a way to
>> set a policy in the PF driver as to whether this request should be
>> allowed or not.  We cannot enable this query by default - it is a
>> security risk. To make this acceptable you need to do a
>> couple of things.
>>
> Can you please elaborate on the security risk this information poses?
> Is toeplitz hash function cryptographically strong enough so that VF
> cannot reconstruct the hash key from hash result provided in packet
> descriptor? The abstract of this paper [1] claims it is not, but I do
> not have access to the full article unfortunately hence the question.
>
> [1] http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5503170&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5503170

I agree with Gleb here: when we started with just thinking about the 
idea of this patch the possible security issue was the first thing that 
came into our minds.
But eventually we couldn't come up with any security risk or attack 
example that is exclusively caused by the fact that VF knows the 
indirection table and/or RSS hash key of the PF.

So, Greg, if we have missed anything and your have such an example could 
you share it here, please?

Thanks,
vlad

>
> --
> 			Gleb.

^ permalink raw reply

* [PATCH 1/6] batman-adv: fix and simplify condition when bonding should be used
From: Antonio Quartulli @ 2015-01-06 11:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Simon Wunderlich, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420542605-28865-1-git-send-email-antonio@meshcoding.com>

From: Simon Wunderlich <sw@simonwunderlich.de>

The current condition actually does NOT consider bonding when the
interface the packet came in from is the soft interface, which is the
opposite of what it should do (and the comment describes). Fix that and
slightly simplify the condition.

Reported-by: Ray Gibson <booray@gmail.com>
Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/routing.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 35f76f2..6648f32 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -443,11 +443,13 @@ batadv_find_router(struct batadv_priv *bat_priv,
 
 	router = batadv_orig_router_get(orig_node, recv_if);
 
+	if (!router)
+		return router;
+
 	/* only consider bonding for recv_if == BATADV_IF_DEFAULT (first hop)
 	 * and if activated.
 	 */
-	if (recv_if == BATADV_IF_DEFAULT || !atomic_read(&bat_priv->bonding) ||
-	    !router)
+	if (!(recv_if == BATADV_IF_DEFAULT && atomic_read(&bat_priv->bonding)))
 		return router;
 
 	/* bonding: loop through the list of possible routers found
-- 
2.2.1

^ permalink raw reply related

* pull request [net]: batman-adv 20150106
From: Antonio Quartulli @ 2015-01-06 11:09 UTC (permalink / raw)
  To: davem; +Cc: netdev, b.a.t.m.a.n

Hello David,

here you have some small fixes for your 'net' tree.

Patch 1 fixes a regression in the "bonding" code introduced while
implementing the multi-interface optimization feature, by Simon
Wunderlich.

Patch 2 ensures that the "last-seen" timestamp for a newly created
originator object is properly initialised in order to avoid a non-critical
race condition, by Linus Lüssing.

Patch 3 avoids false positive splats when lockdep is enabled by assigning
the proper lock class to locks used by the network coding feature, by
Martin Hundebøll.

Patches 4 and 5 fix the code counting the amount of multicast-disabled
nodes in the network (used to avoid to enable the multicast optimisation
when not possible), by Linus Lüssing.

Patch 6 fixes a memory leak in the Translation Table code that can be
triggered by doubling the current originator interval, by Linus Lüssing.

Please pull or let me know of any problem!

Thanks a lot,
	Antonio

The following changes since commit 7ce67a38f799d1fb332f672b117efbadedaa5352:

  net: ethernet: cpsw: fix hangs with interrupts (2015-01-04 22:18:34 -0500)

are available in the git repository at:

  git://git.open-mesh.org/linux-merge.git tags/batman-adv-fix-for-davem

for you to fetch changes up to 9d31b3ce81683ce3c9fd10afa70892e373b21067:

  batman-adv: fix potential TT client + orig-node memory leak (2015-01-06 11:07:01 +0100)

----------------------------------------------------------------
Included changes:
- ensure bonding is used (if enabled) for packets coming in the soft
  interface
- fix race condition to avoid orig_nodes to be deleted right after
  being added
- avoid false positive lockdep splats by assigning lockclass to
  the proper hashtable lock objects
- avoid miscounting of multicast 'disabled' nodes in the network
- fix memory leak in the Global Translation Table in case of
  originator interval change

----------------------------------------------------------------
Linus Lüssing (4):
      batman-adv: fix delayed foreign originator recognition
      batman-adv: fix counter for multicast supporting nodes
      batman-adv: fix multicast counter when purging originators
      batman-adv: fix potential TT client + orig-node memory leak

Martin Hundebøll (1):
      batman-adv: fix lock class for decoding hash in network-coding.c

Simon Wunderlich (1):
      batman-adv: fix and simplify condition when bonding should be used

 net/batman-adv/multicast.c      | 11 +++++++----
 net/batman-adv/network-coding.c |  2 +-
 net/batman-adv/originator.c     |  7 ++++---
 net/batman-adv/routing.c        |  6 ++++--
 4 files changed, 16 insertions(+), 10 deletions(-)

^ permalink raw reply

* [PATCH 2/6] batman-adv: fix delayed foreign originator recognition
From: Antonio Quartulli @ 2015-01-06 11:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Linus Lüssing, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420542605-28865-1-git-send-email-antonio@meshcoding.com>

From: Linus Lüssing <linus.luessing@c0d3.blue>

Currently it can happen that the reception of an OGM from a new
originator is not being accepted. More precisely it can happen that
an originator struct gets allocated and initialized
(batadv_orig_node_new()), even the TQ gets calculated and set correctly
(batadv_iv_ogm_calc_tq()) but still the periodic orig_node purging
thread will decide to delete it if it has a chance to jump between
these two function calls.

This is because batadv_orig_node_new() initializes the last_seen value
to zero and its caller (batadv_iv_ogm_orig_get()) makes it visible to
other threads by adding it to the hash table already.
batadv_iv_ogm_calc_tq() will set the last_seen variable to the correct,
current time a few lines later but if the purging thread jumps in between
that it will think that the orig_node timed out and will wrongly
schedule it for deletion already.

If the purging interval is the same as the originator interval (which is
the default: 1 second), then this game can continue for several rounds
until the random OGM jitter added enough difference between these
two (in tests, two to about four rounds seemed common).

Fixing this by initializing the last_seen variable of an orig_node
to the current time before adding it to the hash table.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/originator.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 6a48451..648bdba 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -678,6 +678,7 @@ struct batadv_orig_node *batadv_orig_node_new(struct batadv_priv *bat_priv,
 	atomic_set(&orig_node->last_ttvn, 0);
 	orig_node->tt_buff = NULL;
 	orig_node->tt_buff_len = 0;
+	orig_node->last_seen = jiffies;
 	reset_time = jiffies - 1 - msecs_to_jiffies(BATADV_RESET_PROTECTION_MS);
 	orig_node->bcast_seqno_reset = reset_time;
 #ifdef CONFIG_BATMAN_ADV_MCAST
-- 
2.2.1

^ permalink raw reply related

* [PATCH 3/6] batman-adv: fix lock class for decoding hash in network-coding.c
From: Antonio Quartulli @ 2015-01-06 11:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Martin Hundebøll, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420542605-28865-1-git-send-email-antonio@meshcoding.com>

From: Martin Hundebøll <martin@hundeboll.net>

batadv_has_set_lock_class() is called with the wrong hash table as first
argument (probably due to a copy-paste error), which leads to false
positives when running with lockdep.

Introduced-by: 612d2b4fe0a1ff2f8389462a6f8be34e54124c05
("batman-adv: network coding - save overheard and tx packets for decoding")

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/network-coding.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/batman-adv/network-coding.c b/net/batman-adv/network-coding.c
index 8d04d17..fab47f1 100644
--- a/net/batman-adv/network-coding.c
+++ b/net/batman-adv/network-coding.c
@@ -133,7 +133,7 @@ int batadv_nc_mesh_init(struct batadv_priv *bat_priv)
 	if (!bat_priv->nc.decoding_hash)
 		goto err;
 
-	batadv_hash_set_lock_class(bat_priv->nc.coding_hash,
+	batadv_hash_set_lock_class(bat_priv->nc.decoding_hash,
 				   &batadv_nc_decoding_hash_lock_class_key);
 
 	INIT_DELAYED_WORK(&bat_priv->nc.work, batadv_nc_worker);
-- 
2.2.1

^ permalink raw reply related

* [PATCH 4/6] batman-adv: fix counter for multicast supporting nodes
From: Antonio Quartulli @ 2015-01-06 11:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Linus Lüssing, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420542605-28865-1-git-send-email-antonio@meshcoding.com>

From: Linus Lüssing <linus.luessing@c0d3.blue>

A miscounting of nodes having multicast optimizations enabled can lead
to multicast packet loss in the following scenario:

If the first OGM a node receives from another one has no multicast
optimizations support (no multicast tvlv) then we are missing to
increase the counter. This potentially leads to the wrong assumption
that we could safely use multicast optimizations.

Fixings this by increasing the counter if the initial OGM has the
multicast TVLV unset, too.

Introduced by 60432d756cf06e597ef9da511402dd059b112447
("batman-adv: Announce new capability via multicast TVLV")

Reported-by: Tobias Hachmer <tobias@hachmer.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/multicast.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
index ab6bb2a..d3503fb 100644
--- a/net/batman-adv/multicast.c
+++ b/net/batman-adv/multicast.c
@@ -685,11 +685,13 @@ static void batadv_mcast_tvlv_ogm_handler_v1(struct batadv_priv *bat_priv,
 		if (orig_initialized)
 			atomic_dec(&bat_priv->mcast.num_disabled);
 		orig->capabilities |= BATADV_ORIG_CAPA_HAS_MCAST;
-	/* If mcast support is being switched off increase the disabled
-	 * mcast node counter.
+	/* If mcast support is being switched off or if this is an initial
+	 * OGM without mcast support then increase the disabled mcast
+	 * node counter.
 	 */
 	} else if (!orig_mcast_enabled &&
-		   orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST) {
+		   (orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST ||
+		    !orig_initialized)) {
 		atomic_inc(&bat_priv->mcast.num_disabled);
 		orig->capabilities &= ~BATADV_ORIG_CAPA_HAS_MCAST;
 	}
-- 
2.2.1

^ permalink raw reply related

* [PATCH 5/6] batman-adv: fix multicast counter when purging originators
From: Antonio Quartulli @ 2015-01-06 11:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Linus Lüssing, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420542605-28865-1-git-send-email-antonio@meshcoding.com>

From: Linus Lüssing <linus.luessing@c0d3.blue>

When purging an orig_node we should only decrease counter tracking the
number of nodes without multicast optimizations support if it was
increased through this orig_node before.

A not yet quite initialized orig_node (meaning it did not have its turn
in the mcast-tvlv handler so far) which gets purged would not adhere to
this and will lead to a counter imbalance.

Fixing this by adding a check whether the orig_node is mcast-initalized
before decreasing the counter in the mcast-orig_node-purging routine.

Introduced by 60432d756cf06e597ef9da511402dd059b112447
("batman-adv: Announce new capability via multicast TVLV")

Reported-by: Tobias Hachmer <tobias@hachmer.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/multicast.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/batman-adv/multicast.c b/net/batman-adv/multicast.c
index d3503fb..b24e4bb 100644
--- a/net/batman-adv/multicast.c
+++ b/net/batman-adv/multicast.c
@@ -740,7 +740,8 @@ void batadv_mcast_purge_orig(struct batadv_orig_node *orig)
 {
 	struct batadv_priv *bat_priv = orig->bat_priv;
 
-	if (!(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST))
+	if (!(orig->capabilities & BATADV_ORIG_CAPA_HAS_MCAST) &&
+	    orig->capa_initialized & BATADV_ORIG_CAPA_HAS_MCAST)
 		atomic_dec(&bat_priv->mcast.num_disabled);
 
 	batadv_mcast_want_unsnoop_update(bat_priv, orig, BATADV_NO_FLAGS);
-- 
2.2.1

^ permalink raw reply related

* [PATCH 6/6] batman-adv: fix potential TT client + orig-node memory leak
From: Antonio Quartulli @ 2015-01-06 11:10 UTC (permalink / raw)
  To: davem
  Cc: netdev, b.a.t.m.a.n, Linus Lüssing, Marek Lindner,
	Antonio Quartulli
In-Reply-To: <1420542605-28865-1-git-send-email-antonio@meshcoding.com>

From: Linus Lüssing <linus.luessing@c0d3.blue>

This patch fixes a potential memory leak which can occur once an
originator times out. On timeout the according global translation table
entry might not get purged correctly. Furthermore, the non purged TT
entry will cause its orig-node to leak, too. Which additionally can lead
to the new multicast optimization feature not kicking in because of a
therefore bogus counter.

In detail: The batadv_tt_global_entry->orig_list holds the reference to
the orig-node. Usually this reference is released after
BATADV_PURGE_TIMEOUT through: _batadv_purge_orig()->
batadv_purge_orig_node()->batadv_update_route()->_batadv_update_route()->
batadv_tt_global_del_orig() which purges this global tt entry and
releases the reference to the orig-node.

However, if between two batadv_purge_orig_node() calls the orig-node
timeout grew to 2*BATADV_PURGE_TIMEOUT then this call path isn't
reached. Instead the according orig-node is removed from the
originator hash in _batadv_purge_orig(), the batadv_update_route()
part is skipped and won't be reached anymore.

Fixing the issue by moving batadv_tt_global_del_orig() out of the rcu
callback.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Acked-by: Antonio Quartulli <antonio@meshcoding.com>
Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
---
 net/batman-adv/originator.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/batman-adv/originator.c b/net/batman-adv/originator.c
index 648bdba..bea8198 100644
--- a/net/batman-adv/originator.c
+++ b/net/batman-adv/originator.c
@@ -570,9 +570,6 @@ static void batadv_orig_node_free_rcu(struct rcu_head *rcu)

 	batadv_frag_purge_orig(orig_node, NULL);

-	batadv_tt_global_del_orig(orig_node->bat_priv, orig_node, -1,
-				  "originator timed out");
-
 	if (orig_node->bat_priv->bat_algo_ops->bat_orig_free)
 		orig_node->bat_priv->bat_algo_ops->bat_orig_free(orig_node);

@@ -978,6 +975,9 @@ static void _batadv_purge_orig(struct batadv_priv *bat_priv)
 			if (batadv_purge_orig_node(bat_priv, orig_node)) {
 				batadv_gw_node_delete(bat_priv, orig_node);
 				hlist_del_rcu(&orig_node->hash_entry);
+				batadv_tt_global_del_orig(orig_node->bat_priv,
+							  orig_node, -1,
+							  "originator timed out");
 				batadv_orig_node_free_ref(orig_node);
 				continue;
 			}
-- 
2.2.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox