Netdev List
 help / color / mirror / Atom feed
* Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet @ 2018-10-30  2:53 UTC (permalink / raw)
  To: Cong Wang, Paweł Staszewski; +Cc: Linux Kernel Network Developers
In-Reply-To: <CAM_iQpWWZqur_dLC_eM7iX7BFDncrYFSXHohwxCt2xMaAjRpMQ@mail.gmail.com>



On 10/29/2018 07:27 PM, Cong Wang wrote:
> Hi,
> 
> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>>
>> Sorry not complete - followed by hw csum:
>>
>> [  342.190831] vlan1490: hw csum failure
>> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
>> [  342.190836] Call Trace:
>> [  342.190839]  <IRQ>
>> [  342.190849]  dump_stack+0x46/0x5b
>> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
>> [  342.190859]  tcp_v4_rcv+0xef/0x960
>> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
>> [  342.190866]  ip_local_deliver+0x5e/0xe0
>> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
>> [  342.190870]  ip_rcv+0x41/0xc0
>> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
>> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
>> [  342.190879]  napi_gro_receive+0xb7/0xe0
>> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
>> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
>> [  342.190888]  mlx5e_napi_poll+0xab/0xc90
> 
> 
> We got exactly the same backtrace in our data center. However,
> it is not easy for us to reproduce it, do you have any clue to reproduce it?
> 
> If you do, try to tcpdump the packets triggering this warning, it could
> be useful for debugging.
> 
> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
> occurs. We tried to revert the offending commit 88078d98d1bb, it
> disappears. So it is likely that commit 88078d98d1bb introduces
> more troubles than the one fixed by d55bef5059dd057bd.
> 

Or this could be that mlx5 driver is buggy when dealing with VLAN tags.

It both uses vlan_tci (hardware vlan offload) in skb _and_ this piece of code in mlx5e_handle_csum() 

		if (network_depth > ETH_HLEN)
			/* CQE csum is calculated from the IP header and does
			 * not cover VLAN headers (if present). This will add
			 * the checksum manually.
			 */
			skb->csum = csum_partial(skb->data + ETH_HLEN,
						 network_depth - ETH_HLEN,
						 skb->csum);


That seems strange to me, because skb_vlan_untag() will not adjust skb->csum in this case.

^ permalink raw reply

* Re: 4.19 - tons of hw csum failure errors
From: Cong Wang @ 2018-10-30  2:51 UTC (permalink / raw)
  To: nikola.ciprich; +Cc: Linux Kernel Network Developers, nik, Eric Dumazet
In-Reply-To: <20181027191837.GA4283@localhost.localdomain>

(Cc'ing Eric)

On Sat, Oct 27, 2018 at 12:47 PM Nikola Ciprich
<nikola.ciprich@linuxbox.cz> wrote:
>
> Hi,
>
> just wanted to report, thet after switching to 4.19 (fro 4.14.x, so maybe
> the problem appeared somewhere between), I'm getting tons of similar
> messages:
>
> Oct 27 09:06:27 xxx kernel: br501: hw csum failure
> Oct 27 09:06:27 xxx kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G            E     4.19.0lb7.00_01_PRE04 #1
> Oct 27 09:06:27 xxx kernel: Hardware name: Supermicro Super Server/X11DDW-NT, BIOS 2.0b 03/07/2018
> Oct 27 09:06:27 xxx kernel: Call Trace:
> Oct 27 09:06:27 xxx kernel:  <IRQ>
> Oct 27 09:06:27 xxx kernel:  dump_stack+0x5a/0x73
> Oct 27 09:06:27 xxx kernel:  __skb_checksum_complete+0xba/0xc0
> Oct 27 09:06:27 xxx kernel:  tcp_error+0x108/0x180 [nf_conntrack]
> Oct 27 09:06:27 xxx kernel:  nf_conntrack_in+0xd2/0x4b0 [nf_conntrack]
> Oct 27 09:06:27 xxx kernel:  ? csum_partial+0xd/0x20
> Oct 27 09:06:27 xxx kernel:  nf_hook_slow+0x3d/0xb0
> Oct 27 09:06:27 xxx kernel:  ip_rcv+0xb5/0xd0
> Oct 27 09:06:27 xxx kernel:  ? ip_rcv_finish_core.isra.12+0x370/0x370
> Oct 27 09:06:27 xxx kernel:  __netif_receive_skb_one_core+0x52/0x70
> Oct 27 09:06:27 xxx kernel:  process_backlog+0xa3/0x150
> Oct 27 09:06:27 xxx kernel:  net_rx_action+0x2af/0x3f0
> Oct 27 09:06:27 xxx kernel:  __do_softirq+0xd1/0x28c
> Oct 27 09:06:27 xxx kernel:  irq_exit+0xde/0xf0
> Oct 27 09:06:27 xxx kernel:  do_IRQ+0x54/0xe0
> Oct 27 09:06:27 xxx kernel:  common_interrupt+0xf/0xf
> Oct 27 09:06:27 xxx kernel:  </IRQ>


We got the same warning (but a different backtrace) with mlx5 driver.

It seems you are using a different driver. Do you have any clue to reproduce
it?

If you do, try to tcpdump the packets triggering this warning, it could
be useful for debugging.

As I explained in other thread, it is likely that commit 88078d98d1bb
introduces more troubles than the one fixed by d55bef5059dd057bd.
You can try to play with these two commits to see if you get the same
conclusion.

BTW, the offending commit has been backported to 4.14 too. ;)

Thanks!

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-10-30  2:43 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Linux Kernel Network Developers, Eric Dumazet, dmichail
In-Reply-To: <CAM_iQpWWZqur_dLC_eM7iX7BFDncrYFSXHohwxCt2xMaAjRpMQ@mail.gmail.com>

(Adding Eric and Dimitris into Cc)

On Mon, Oct 29, 2018 at 7:27 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> Hi,
>
> On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
> >
> > Sorry not complete - followed by hw csum:
> >
> > [  342.190831] vlan1490: hw csum failure
> > [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
> > [  342.190836] Call Trace:
> > [  342.190839]  <IRQ>
> > [  342.190849]  dump_stack+0x46/0x5b
> > [  342.190856]  __skb_checksum_complete+0x9a/0xa0
> > [  342.190859]  tcp_v4_rcv+0xef/0x960
> > [  342.190864]  ip_local_deliver_finish+0x49/0xd0
> > [  342.190866]  ip_local_deliver+0x5e/0xe0
> > [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
> > [  342.190870]  ip_rcv+0x41/0xc0
> > [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
> > [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
> > [  342.190879]  napi_gro_receive+0xb7/0xe0
> > [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
> > [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
> > [  342.190888]  mlx5e_napi_poll+0xab/0xc90
>
>
> We got exactly the same backtrace in our data center. However,
> it is not easy for us to reproduce it, do you have any clue to reproduce it?
>
> If you do, try to tcpdump the packets triggering this warning, it could
> be useful for debugging.
>
> Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
> occurs. We tried to revert the offending commit 88078d98d1bb, it
> disappears. So it is likely that commit 88078d98d1bb introduces
> more troubles than the one fixed by d55bef5059dd057bd.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Cong Wang @ 2018-10-30  2:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <5466e637-37d2-a11e-c30c-fc17a4dbeaf1@gmail.com>

On Mon, Oct 29, 2018 at 7:25 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 10/29/2018 07:21 PM, Cong Wang wrote:
> > On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?
> >
> > For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?
> >
> > If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
> > the end result may not be simpler.
>
> I meant to reinstate what was there before my patch in this error case
>
>        if (skb->ip_summed == CHECKSUM_COMPLETE)
>                skb->ip_summed = CHECKSUM_NONE;
>
> That would only be run in error (quite unlikely) path, instead of saving old_csum in all cases.

I know your point, however, I am not sure that is a desired behavior.

On failure, I think the whole skb should be restored to its previous state
before entering this function, changing it to CHECKSUM_NONE on failure
is inconsistent with success case.

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Cong Wang @ 2018-10-30  2:27 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Linux Kernel Network Developers
In-Reply-To: <ec5c3923-a268-aa2c-fea1-350f8bce65c2@itcare.pl>

Hi,

On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski <pstaszewski@itcare.pl> wrote:
>
> Sorry not complete - followed by hw csum:
>
> [  342.190831] vlan1490: hw csum failure
> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
> [  342.190836] Call Trace:
> [  342.190839]  <IRQ>
> [  342.190849]  dump_stack+0x46/0x5b
> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
> [  342.190859]  tcp_v4_rcv+0xef/0x960
> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
> [  342.190866]  ip_local_deliver+0x5e/0xe0
> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
> [  342.190870]  ip_rcv+0x41/0xc0
> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
> [  342.190879]  napi_gro_receive+0xb7/0xe0
> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
> [  342.190888]  mlx5e_napi_poll+0xab/0xc90


We got exactly the same backtrace in our data center. However,
it is not easy for us to reproduce it, do you have any clue to reproduce it?

If you do, try to tcpdump the packets triggering this warning, it could
be useful for debugging.

Also, we tried to apply commit d55bef5059dd057bd, the warning _still_
occurs. We tried to revert the offending commit 88078d98d1bb, it
disappears. So it is likely that commit 88078d98d1bb introduces
more troubles than the one fixed by d55bef5059dd057bd.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Eric Dumazet @ 2018-10-30  2:25 UTC (permalink / raw)
  To: Cong Wang, Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <CAM_iQpW8pT7yZ1pTmhNTh1+8yVhd+=mYLCNQ-cvCqPf127qRJA@mail.gmail.com>



On 10/29/2018 07:21 PM, Cong Wang wrote:
> On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?
> 
> For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?
> 
> If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
> the end result may not be simpler.

I meant to reinstate what was there before my patch in this error case

       if (skb->ip_summed == CHECKSUM_COMPLETE)
               skb->ip_summed = CHECKSUM_NONE;

That would only be run in error (quite unlikely) path, instead of saving old_csum in all cases.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Cong Wang @ 2018-10-30  2:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Linux Kernel Network Developers, Eric Dumazet
In-Reply-To: <ae7cf8de-af0c-4f06-1602-4f03144fbb8f@gmail.com>

On Mon, Oct 29, 2018 at 7:14 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?

For !CHECKSUM_COMPLETE, ip_summed should be untouched, right?

If you mean only setting to CHECKSUM_NONE for CHECKSUM_COMPLETE case,
the end result may not be simpler.

^ permalink raw reply

* Re: [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Eric Dumazet @ 2018-10-30  2:14 UTC (permalink / raw)
  To: Cong Wang, netdev; +Cc: Eric Dumazet
In-Reply-To: <20181030003515.12075-1-xiyou.wangcong@gmail.com>



On 10/29/2018 05:35 PM, Cong Wang wrote:
> Most callers of pskb_trim_rcsum() simply drops the skb when
> it fails, however, ip_check_defrag() still continues to pass
> the skb up to stack. In that case, we should restore its previous
> csum if __pskb_trim() fails.
> 
> Found this during code review.
> 
> Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
>  net/core/skbuff.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 946de0e24c87..5decd6e6d2b6 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -1843,6 +1843,9 @@ EXPORT_SYMBOL(___pskb_trim);
>   */
>  int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
>  {
> +	__wsum old_csum = skb->csum;
> +	int ret;
> +
>  	if (skb->ip_summed == CHECKSUM_COMPLETE) {
>  		int delta = skb->len - len;
>  
> @@ -1850,7 +1853,10 @@ int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
>  					   skb_checksum(skb, len, delta, 0),
>  					   len);
>  	}
> -	return __pskb_trim(skb, len);
> +	ret = __pskb_trim(skb, len);
> +	if (unlikely(ret))
> +		skb->csum = old_csum;

Would not it be simpler to set ip_summed to CHECKSUM_NONE (no need to save old_csum) ?

> +	return ret;
>  }
>  EXPORT_SYMBOL(pskb_trim_rcsum_slow);
>  
> 

^ permalink raw reply

* Re: [Patch V4 net 01/11] net: hns3: add error handler for hns3_nic_init_vector_data()
From: tanhuazhong @ 2018-10-30  1:43 UTC (permalink / raw)
  To: Joe Perches, davem, sergei.shtylyov
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <8bf932088a84f35111d2ae99dcd77051f3e854cc.camel@perches.com>



On 2018/10/30 9:31, Joe Perches wrote:
> On Tue, 2018-10-30 at 09:21 +0800, tanhuazhong wrote:
>>
>> On 2018/10/30 1:44, Joe Perches wrote:
>>> On Mon, 2018-10-29 at 21:54 +0800, Huazhong Tan wrote:
>>>> When hns3_nic_init_vector_data() fails to map ring to vector,
>>>> it should cancel the netif_napi_add() that has been successfully
>>>> done and then exits.
>>> []
>>>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
>>> []
>>>> @@ -2821,7 +2821,7 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
>>>>    	struct hnae3_handle *h = priv->ae_handle;
>>>>    	struct hns3_enet_tqp_vector *tqp_vector;
>>>>    	int ret = 0;
>>>> -	u16 i;
>>>> +	int i, j;
>>>>    
>>>>    	hns3_nic_set_cpumask(priv);
>>>>    
>>>> @@ -2868,13 +2868,19 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
>>>>    		hns3_free_vector_ring_chain(tqp_vector, &vector_ring_chain);
>>>>    
>>>>    		if (ret)
>>>> -			return ret;
>>>> +			goto map_ring_fail;
>>>>    
>>>>    		netif_napi_add(priv->netdev, &tqp_vector->napi,
>>>>    			       hns3_nic_common_poll, NAPI_POLL_WEIGHT);
>>>>    	}
>>>>    
>>>>    	return 0;
>>>> +
>>>> +map_ring_fail:
>>>> +	for (j = i - 1; j >= 0; j--)
>>>> +		netif_napi_del(&priv->tqp_vector[j].napi);
>>>
>>> style trivia:
>>>
>>> Error clearing is most commonly done without another variable
>>> by decrementing i in the loop.
>>>
>>
>> Hi, Joe.
>> Is the below one more suitable?
> 
> Yes.
> 
>> +
>> +map_ring_fail:
>> +	while(i--)
>> +		netif_napi_del(&priv->tqp_vector[i].napi);
>> +
>> +	return ret;
>>
>> Or do you have any other better suggestion?
> 
> No I do not.
> 
> "while (i--)" is a much more common style.
> 
> cheers, Joe
> 

Ok, thanks for help.
Huazhong.

> 
> 
> .
> 

^ permalink raw reply

* Re: [Patch V4 net 01/11] net: hns3: add error handler for hns3_nic_init_vector_data()
From: Joe Perches @ 2018-10-30  1:31 UTC (permalink / raw)
  To: tanhuazhong, davem, sergei.shtylyov
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <1ce6639d-fa73-52f0-1b02-a9b8e64e3608@huawei.com>

On Tue, 2018-10-30 at 09:21 +0800, tanhuazhong wrote:
> 
> On 2018/10/30 1:44, Joe Perches wrote:
> > On Mon, 2018-10-29 at 21:54 +0800, Huazhong Tan wrote:
> > > When hns3_nic_init_vector_data() fails to map ring to vector,
> > > it should cancel the netif_napi_add() that has been successfully
> > > done and then exits.
> > []
> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> > []
> > > @@ -2821,7 +2821,7 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
> > >   	struct hnae3_handle *h = priv->ae_handle;
> > >   	struct hns3_enet_tqp_vector *tqp_vector;
> > >   	int ret = 0;
> > > -	u16 i;
> > > +	int i, j;
> > >   
> > >   	hns3_nic_set_cpumask(priv);
> > >   
> > > @@ -2868,13 +2868,19 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
> > >   		hns3_free_vector_ring_chain(tqp_vector, &vector_ring_chain);
> > >   
> > >   		if (ret)
> > > -			return ret;
> > > +			goto map_ring_fail;
> > >   
> > >   		netif_napi_add(priv->netdev, &tqp_vector->napi,
> > >   			       hns3_nic_common_poll, NAPI_POLL_WEIGHT);
> > >   	}
> > >   
> > >   	return 0;
> > > +
> > > +map_ring_fail:
> > > +	for (j = i - 1; j >= 0; j--)
> > > +		netif_napi_del(&priv->tqp_vector[j].napi);
> > 
> > style trivia:
> > 
> > Error clearing is most commonly done without another variable
> > by decrementing i in the loop.
> > 
> 
> Hi, Joe.
> Is the below one more suitable?

Yes.

> +
> +map_ring_fail:
> +	while(i--)
> +		netif_napi_del(&priv->tqp_vector[i].napi);
> +
> +	return ret;
> 
> Or do you have any other better suggestion?

No I do not.

"while (i--)" is a much more common style.

cheers, Joe

^ permalink raw reply

* Re: [Patch V4 net 01/11] net: hns3: add error handler for hns3_nic_init_vector_data()
From: tanhuazhong @ 2018-10-30  1:21 UTC (permalink / raw)
  To: Joe Perches, davem, sergei.shtylyov
  Cc: netdev, linuxarm, salil.mehta, yisen.zhuang, lipeng321,
	linyunsheng
In-Reply-To: <a90e8a502366fef14d599dfbbaaeffa85cc77198.camel@perches.com>



On 2018/10/30 1:44, Joe Perches wrote:
> On Mon, 2018-10-29 at 21:54 +0800, Huazhong Tan wrote:
>> When hns3_nic_init_vector_data() fails to map ring to vector,
>> it should cancel the netif_napi_add() that has been successfully
>> done and then exits.
> []
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
> []
>> @@ -2821,7 +2821,7 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
>>   	struct hnae3_handle *h = priv->ae_handle;
>>   	struct hns3_enet_tqp_vector *tqp_vector;
>>   	int ret = 0;
>> -	u16 i;
>> +	int i, j;
>>   
>>   	hns3_nic_set_cpumask(priv);
>>   
>> @@ -2868,13 +2868,19 @@ static int hns3_nic_init_vector_data(struct hns3_nic_priv *priv)
>>   		hns3_free_vector_ring_chain(tqp_vector, &vector_ring_chain);
>>   
>>   		if (ret)
>> -			return ret;
>> +			goto map_ring_fail;
>>   
>>   		netif_napi_add(priv->netdev, &tqp_vector->napi,
>>   			       hns3_nic_common_poll, NAPI_POLL_WEIGHT);
>>   	}
>>   
>>   	return 0;
>> +
>> +map_ring_fail:
>> +	for (j = i - 1; j >= 0; j--)
>> +		netif_napi_del(&priv->tqp_vector[j].napi);
> 
> style trivia:
> 
> Error clearing is most commonly done without another variable
> by decrementing i in the loop.
> 

Hi, Joe.
Is the below one more suitable?

+
+map_ring_fail:
+	while(i--)
+		netif_napi_del(&priv->tqp_vector[i].napi);
+
+	return ret;

Or do you have any other better suggestion?
Thanks.

> 
> 
> .
> 

^ permalink raw reply

* [PATCH v2 3/3] Bluetooth: hci_qca: Set HCI_QUIRK_USE_BDADDR_PROPERTY for wcn3990
From: Matthias Kaehlcke @ 2018-10-30  0:44 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain
  Cc: linux-bluetooth, netdev, linux-kernel, Balakrishna Godavarthi,
	Brian Norris, Dmitry Grinberg, Matthias Kaehlcke
In-Reply-To: <20181030004415.237101-1-mka@chromium.org>

Set quirk for wcn3990 to read BD_ADDR from a firmware node property.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Changes in v2:
- patch added to the series

tested with https://lore.kernel.org/patchwork/patch/1003830
("Bluetooth: hci_qca: Add helper to set device address.")
---
 drivers/bluetooth/hci_qca.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index f036c8f98ea3..0535833caa52 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1193,6 +1193,7 @@ static int qca_setup(struct hci_uart *hu)
 		 * setup for every hci up.
 		 */
 		set_bit(HCI_QUIRK_NON_PERSISTENT_SETUP, &hdev->quirks);
+		set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
 		hu->hdev->shutdown = qca_power_off;
 		ret = qca_wcn3990_init(hu);
 		if (ret)
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related

* [PATCH v2 2/3] Bluetooth: btqcomsmd: use HCI_QUIRK_USE_BDADDR_PROPERTY
From: Matthias Kaehlcke @ 2018-10-30  0:44 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S . Miller, Loic Poulain
  Cc: linux-bluetooth, netdev, linux-kernel, Balakrishna Godavarthi,
	Brian Norris, Dmitry Grinberg, Matthias Kaehlcke
In-Reply-To: <20181030004415.237101-1-mka@chromium.org>

Use the HCI_QUIRK_USE_BDADDR_PROPERTY quirk to let the HCI
core handle the reading of 'local-bd-address'. With this there
is no need to set HCI_QUIRK_INVALID_BDADDR, the case of a
non-existing or invalid fwnode property is handled by the core
code.

Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>
---
I couldn't actually test the changes in this driver since I
don't have a device with this controller. Could someone
from Qualcomm help with this?

Changes in v2:
- removed now unused field 'bdaddr' from struct btqcomsmd
- added 'Reviewed-by: Balakrishna Godavarthi <bgodavar@codeaurora.org>' tag
---
 drivers/bluetooth/btqcomsmd.c | 29 +++--------------------------
 1 file changed, 3 insertions(+), 26 deletions(-)

diff --git a/drivers/bluetooth/btqcomsmd.c b/drivers/bluetooth/btqcomsmd.c
index 7df3eed1ef5e..b3020fab6c8e 100644
--- a/drivers/bluetooth/btqcomsmd.c
+++ b/drivers/bluetooth/btqcomsmd.c
@@ -28,7 +28,6 @@
 struct btqcomsmd {
 	struct hci_dev *hdev;
 
-	bdaddr_t bdaddr;
 	struct rpmsg_endpoint *acl_channel;
 	struct rpmsg_endpoint *cmd_channel;
 };
@@ -125,23 +124,10 @@ static int btqcomsmd_setup(struct hci_dev *hdev)
 		return PTR_ERR(skb);
 	kfree_skb(skb);
 
-	/* Devices do not have persistent storage for BD address. If no
-	 * BD address has been retrieved during probe, mark the device
-	 * as having an invalid BD address.
+	/* Devices do not have persistent storage for BD address. Retrieve
+	 * it from the firmware node property.
 	 */
-	if (!bacmp(&btq->bdaddr, BDADDR_ANY)) {
-		set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
-		return 0;
-	}
-
-	/* When setting a configured BD address fails, mark the device
-	 * as having an invalid BD address.
-	 */
-	err = qca_set_bdaddr_rome(hdev, &btq->bdaddr);
-	if (err) {
-		set_bit(HCI_QUIRK_INVALID_BDADDR, &hdev->quirks);
-		return 0;
-	}
+	set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
 
 	return 0;
 }
@@ -169,15 +155,6 @@ static int btqcomsmd_probe(struct platform_device *pdev)
 	if (IS_ERR(btq->cmd_channel))
 		return PTR_ERR(btq->cmd_channel);
 
-	/* The local-bd-address property is usually injected by the
-	 * bootloader which has access to the allocated BD address.
-	 */
-	if (!of_property_read_u8_array(pdev->dev.of_node, "local-bd-address",
-				       (u8 *)&btq->bdaddr, sizeof(bdaddr_t))) {
-		dev_info(&pdev->dev, "BD address %pMR retrieved from device-tree",
-			 &btq->bdaddr);
-	}
-
 	hdev = hci_alloc_dev();
 	if (!hdev)
 		return -ENOMEM;
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related

* [PATCH 2/2] net: xilinx_emaclite: recheck condition after timeout in mdio_wait()
From: Kurt Kanzenbach @ 2018-10-30  9:31 UTC (permalink / raw)
  To: Anirudha Sarangi, John Linn, David S. Miller
  Cc: Michal Simek, Radhey Shyam Pandey, Andrew Lunn, YueHaibing,
	netdev, linux-arm-kernel, linux-kernel, Kurt Kanzenbach
In-Reply-To: <20181030093139.10226-1-kurt@linutronix.de>

The function could report a false positive if it gets preempted between reading
the XEL_MDIOCTRL_OFFSET register and checking for the timeout.  In such a case,
the condition has to be rechecked to avoid false positives.

Therefore, check for expected condition even after the timeout occurred.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
 drivers/net/ethernet/xilinx/xilinx_emaclite.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_emaclite.c b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
index 639e3e99af46..957d03085bd0 100644
--- a/drivers/net/ethernet/xilinx/xilinx_emaclite.c
+++ b/drivers/net/ethernet/xilinx/xilinx_emaclite.c
@@ -714,19 +714,29 @@ static irqreturn_t xemaclite_interrupt(int irq, void *dev_id)
 static int xemaclite_mdio_wait(struct net_local *lp)
 {
 	unsigned long end = jiffies + 2;
+	u32 val;
 
 	/* wait for the MDIO interface to not be busy or timeout
 	 * after some time.
 	 */
-	while (xemaclite_readl(lp->base_addr + XEL_MDIOCTRL_OFFSET) &
-			XEL_MDIOCTRL_MDIOSTS_MASK) {
+	while (1) {
+		val = xemaclite_readl(lp->base_addr + XEL_MDIOCTRL_OFFSET);
+
+		if (!(val & XEL_MDIOCTRL_MDIOSTS_MASK))
+			return 0;
+
 		if (time_before_eq(end, jiffies)) {
-			WARN_ON(1);
-			return -ETIMEDOUT;
+			val = xemaclite_readl(lp->base_addr + XEL_MDIOCTRL_OFFSET);
+			break;
 		}
+
 		msleep(1);
 	}
-	return 0;
+	if (!(val & XEL_MDIOCTRL_MDIOSTS_MASK))
+		return 0;
+
+	WARN_ON(1);
+	return -ETIMEDOUT;
 }
 
 /**
-- 
2.11.0

^ permalink raw reply related

* [PATCH 1/2] net: axienet: recheck condition after timeout in mdio_wait()
From: Kurt Kanzenbach @ 2018-10-30  9:31 UTC (permalink / raw)
  To: Anirudha Sarangi, John Linn, David S. Miller
  Cc: Michal Simek, Radhey Shyam Pandey, Andrew Lunn, YueHaibing,
	netdev, linux-arm-kernel, linux-kernel, Kurt Kanzenbach
In-Reply-To: <20181030093139.10226-1-kurt@linutronix.de>

The function could report a false positive if it gets preempted between reading
the XAE_MDIO_MCR_OFFSET register and checking for the timeout.  In such a case,
the condition has to be rechecked to avoid false positives.

Therefore, check for expected condition even after the timeout occurred.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
---
 drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c
index 757a3b37ae8a..4f13125e4942 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c
@@ -21,15 +21,26 @@
 int axienet_mdio_wait_until_ready(struct axienet_local *lp)
 {
 	unsigned long end = jiffies + 2;
-	while (!(axienet_ior(lp, XAE_MDIO_MCR_OFFSET) &
-		 XAE_MDIO_MCR_READY_MASK)) {
+	u32 val;
+
+	while (1) {
+		val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
+
+		if (val & XAE_MDIO_MCR_READY_MASK)
+			return 0;
+
 		if (time_before_eq(end, jiffies)) {
-			WARN_ON(1);
-			return -ETIMEDOUT;
+			val = axienet_ior(lp, XAE_MDIO_MCR_OFFSET);
+			break;
 		}
+
 		udelay(1);
 	}
-	return 0;
+	if (val & XAE_MDIO_MCR_READY_MASK)
+		return 0;
+
+	WARN_ON(1);
+	return -ETIMEDOUT;
 }
 
 /**
-- 
2.11.0

^ permalink raw reply related

* [PATCH 0/2] net: xlinx: mdio: recheck condition after timeout
From: Kurt Kanzenbach @ 2018-10-30  9:31 UTC (permalink / raw)
  To: Anirudha Sarangi, John Linn, David S. Miller
  Cc: Michal Simek, Radhey Shyam Pandey, Andrew Lunn, YueHaibing,
	netdev, linux-arm-kernel, linux-kernel, Kurt Kanzenbach

Hi,

the Xilinx mdio wait functions may return false positives under certain
circumstances: If the functions get preempted between reading the corresponding
mdio register and checking for the timeout, they could falsely indicate a
timeout.

In order to avoid the issue, the condition should be rechecked in the timeout
case.

Kurt Kanzenbach (2):
  net: axienet: recheck condition after timeout in mdio_wait()
  net: xilinx_emaclite: recheck condition after timeout in mdio_wait()

 drivers/net/ethernet/xilinx/xilinx_axienet_mdio.c | 21 ++++++++++++++++-----
 drivers/net/ethernet/xilinx/xilinx_emaclite.c     | 20 +++++++++++++++-----
 2 files changed, 31 insertions(+), 10 deletions(-)

-- 
2.11.0

^ permalink raw reply

* Re: Latest net-next kernel 4.19.0+
From: Paweł Staszewski @ 2018-10-30  0:34 UTC (permalink / raw)
  To: Linux Kernel Network Developers
In-Reply-To: <ec5c3923-a268-aa2c-fea1-350f8bce65c2@itcare.pl>

W dniu 30.10.2018 o 01:11, Paweł Staszewski pisze:
> Sorry not complete - followed by hw csum:
>
> [  342.190831] vlan1490: hw csum failure
> [  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
> [  342.190836] Call Trace:
> [  342.190839]  <IRQ>
> [  342.190849]  dump_stack+0x46/0x5b
> [  342.190856]  __skb_checksum_complete+0x9a/0xa0
> [  342.190859]  tcp_v4_rcv+0xef/0x960
> [  342.190864]  ip_local_deliver_finish+0x49/0xd0
> [  342.190866]  ip_local_deliver+0x5e/0xe0
> [  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
> [  342.190870]  ip_rcv+0x41/0xc0
> [  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
> [  342.190877]  netif_receive_skb_internal+0x2f/0xd0
> [  342.190879]  napi_gro_receive+0xb7/0xe0
> [  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
> [  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
> [  342.190888]  mlx5e_napi_poll+0xab/0xc90
> [  342.190893]  ? kmem_cache_free_bulk+0x1e4/0x280
> [  342.190895]  net_rx_action+0x1f1/0x320
> [  342.190901]  __do_softirq+0xec/0x2b7
> [  342.190908]  irq_exit+0x7b/0x80
> [  342.190910]  do_IRQ+0x45/0xc0
> [  342.190912]  common_interrupt+0xf/0xf
> [  342.190914]  </IRQ>
> [  342.190916] RIP: 0010:mwait_idle+0x5f/0x1b0
> [  342.190917] Code: a8 01 0f 85 3f 01 00 00 31 d2 65 48 8b 04 25 80 
> 4c 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 40 01 00 00 31 c0 fb 
> 0f 01 c9 <65> 8b 2d 2a c9 6a 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 01 
> 00 f0
> [  342.190918] RSP: 0018:ffffc900034e7eb8 EFLAGS: 00000246 ORIG_RAX: 
> ffffffffffffffdd
> [  342.190920] RAX: 0000000000000000 RBX: 0000000000000034 RCX: 
> 0000000000000000
> [  342.190921] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> 0000000000000000
> [  342.190922] RBP: 0000000000000034 R08: 0000000000000057 R09: 
> ffff88086fa1fbc0
> [  342.190923] R10: 0000000000000000 R11: 00000001000028cc R12: 
> ffff88086d180000
> [  342.190923] R13: ffff88086d180000 R14: 0000000000000000 R15: 
> 0000000000000000
> [  342.190929]  do_idle+0x1a3/0x1c0
> [  342.190931]  cpu_startup_entry+0x14/0x20
> [  342.190934]  start_secondary+0x165/0x190
> [  342.190939]  secondary_startup_64+0xa4/0xb0
>
>
> W dniu 30.10.2018 o 01:10, Paweł Staszewski pisze:
>> Hi
>>
>>
>> Just checked in test lab latest kernel and have weird traces:
>>
>> [  219.888673] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
>> [  219.888674] Call Trace:
>> [  219.888676]  <IRQ>
>> [  219.888685]  dump_stack+0x46/0x5b
>> [  219.888691]  __skb_checksum_complete+0x9a/0xa0
>> [  219.888694]  tcp_v4_rcv+0xef/0x960
>> [  219.888698]  ip_local_deliver_finish+0x49/0xd0
>> [  219.888700]  ip_local_deliver+0x5e/0xe0
>> [  219.888702]  ? ip_sublist_rcv_finish+0x50/0x50
>> [  219.888703]  ip_rcv+0x41/0xc0
>> [  219.888706]  __netif_receive_skb_one_core+0x4b/0x70
>> [  219.888708]  netif_receive_skb_internal+0x2f/0xd0
>> [  219.888710]  napi_gro_receive+0xb7/0xe0
>> [  219.888714]  mlx5e_handle_rx_cqe+0x7a/0xd0
>> [  219.888716]  mlx5e_poll_rx_cq+0xc6/0x930
>> [  219.888717]  mlx5e_napi_poll+0xab/0xc90
>> [  219.888722]  ? enqueue_task_fair+0x286/0xc40
>> [  219.888723]  ? enqueue_task_fair+0x1d6/0xc40
>> [  219.888725]  net_rx_action+0x1f1/0x320
>> [  219.888730]  __do_softirq+0xec/0x2b7
>> [  219.888736]  irq_exit+0x7b/0x80
>> [  219.888737]  do_IRQ+0x45/0xc0
>> [  219.888740]  common_interrupt+0xf/0xf
>> [  219.888742]  </IRQ>
>> [  219.888743] RIP: 0010:mwait_idle+0x5f/0x1b0
>> [  219.888745] Code: a8 01 0f 85 3f 01 00 00 31 d2 65 48 8b 04 25 80 
>> 4c 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 40 01 00 00 31 c0 fb 
>> 0f 01 c9 <65> 8b 2d 2a c9 6a 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 
>> 01 00 f0
>> [  219.888746] RSP: 0018:ffffc900034e7eb8 EFLAGS: 00000246 ORIG_RAX: 
>> ffffffffffffffde
>> [  219.888749] RAX: 0000000000000000 RBX: 0000000000000034 RCX: 
>> 0000000000000000
>> [  219.888749] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
>> 0000000000000000
>> [  219.888750] RBP: 0000000000000034 R08: 000000000000003b R09: 
>> ffff88086fa1fbc0
>> [  219.888751] R10: 0000000000000000 R11: 00000000ffffb15d R12: 
>> ffff88086d180000
>> [  219.888752] R13: ffff88086d180000 R14: 0000000000000000 R15: 
>> 0000000000000000
>> [  219.888754]  do_idle+0x1a3/0x1c0
>> [  219.888757]  cpu_startup_entry+0x14/0x20
>> [  219.888760]  start_secondary+0x165/0x190
>>
>
>

Also some perf top attacked to this - 14G rx traffic on vlans (pktgen 
generated random destination ip's and forwarded by test server)

    PerfTop:   45296 irqs/sec  kernel:99.3%  exact:  0.0% [4000Hz 
cycles],  (all, 56 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

      7.43%  [kernel]       [k] mlx5e_skb_from_cqe_linear
      5.17%  [kernel]       [k] mlx5e_sq_xmit
      3.83%  [kernel]       [k] fib_table_lookup
      3.41%  [kernel]       [k] irq_entries_start
      2.91%  [kernel]       [k] build_skb
      2.50%  [kernel]       [k] mlx5_eq_int
      2.29%  [kernel]       [k] _raw_spin_lock
      2.27%  [kernel]       [k] tasklet_action_common.isra.21
      1.99%  [kernel]       [k] _raw_spin_lock_irqsave
      1.91%  [kernel]       [k] memcpy_erms
      1.77%  [kernel]       [k] __build_skb
      1.70%  [kernel]       [k] vlan_do_receive
      1.59%  [kernel]       [k] get_page_from_freelist
      1.56%  [kernel]       [k] mlx5e_poll_tx_cq
      1.53%  [kernel]       [k] __dev_queue_xmit
      1.40%  [kernel]       [k] pfifo_fast_dequeue
      1.37%  [kernel]       [k] dev_gro_receive
      1.34%  [kernel]       [k] ipt_do_table
      1.30%  [kernel]       [k] mlx5e_poll_rx_cq
      1.28%  [kernel]       [k] mlx5e_post_rx_wqes
      1.27%  [kernel]       [k] ip_finish_output2
      1.09%  [kernel]       [k] inet_gro_receive
      1.08%  [kernel]       [k] _raw_spin_lock_irq
      1.04%  [kernel]       [k] __sched_text_start
      0.99%  [kernel]       [k] tcp_gro_receive
      0.98%  [kernel]       [k] find_busiest_group
      0.97%  [kernel]       [k] __netif_receive_skb_core
      0.85%  [kernel]       [k] ip_route_input_rcu
      0.85%  [kernel]       [k] free_one_page
      0.84%  [kernel]       [k] mlx5e_handle_rx_cqe
      0.76%  [kernel]       [k] do_idle
      0.72%  [kernel]       [k] mlx5e_xmit
      0.71%  [kernel]       [k] cmd_exec
      0.71%  [kernel]       [k] __page_pool_put_page
      0.69%  [kernel]       [k] kmem_cache_alloc
      0.68%  [kernel]       [k] mlx5_cmd_comp_handler
      0.68%  [kernel]       [k] queued_spin_lock_slowpath
      0.68%  [kernel]       [k] cmd_work_handler
      0.68%  [kernel]       [k] pfifo_fast_enqueue
      0.67%  [kernel]       [k] try_to_wake_up
      0.66%  [kernel]       [k] _raw_spin_trylock
      0.62%  [kernel]       [k] dev_hard_start_xmit
      0.62%  [kernel]       [k] ip_forward
      0.62%  [kernel]       [k] swiotlb_map_page
      0.61%  [kernel]       [k] page_frag_free
      0.60%  [kernel]       [k] mlx5e_build_rx_skb
      0.60%  [kernel]       [k] skb_release_data
      0.57%  [kernel]       [k] netif_skb_features
      0.52%  [kernel]       [k] vlan_dev_hard_start_xmit
      0.50%  [kernel]       [k] kmem_cache_free_bulk
      0.49%  [kernel]       [k] enqueue_task_fair
      0.49%  [kernel]       [k] validate_xmit_skb.isra.142
      0.49%  [kernel]       [k] skb_gro_receive
      0.49%  [kernel]       [k] __qdisc_run

^ permalink raw reply

* Re: [PATCH net-next v2 5/6] net/ncsi: Reset channel state in ncsi_start_dev()
From: Samuel Mendoza-Jonas @ 2018-10-30  0:23 UTC (permalink / raw)
  To: Justin.Lee1, netdev; +Cc: davem, linux-kernel, openbmc
In-Reply-To: <d6172566ffe94dec83c1026108e24c98@AUSX13MPS306.AMER.DELL.COM>

On Fri, 2018-10-26 at 17:25 +0000, Justin.Lee1@Dell.com wrote:
> Hi Samuel,
> 
> I noticed a few issues and commented below.
> 
> Thanks,
> Justin
> 
> 
> >  /* Resources */
> > +int ncsi_reset_dev(struct ncsi_dev *nd);
> >  void ncsi_start_channel_monitor(struct ncsi_channel *nc);
> >  void ncsi_stop_channel_monitor(struct ncsi_channel *nc);
> >  struct ncsi_channel *ncsi_find_channel(struct ncsi_package *np,
> > diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c
> > index 014321ad31d3..9bad03e3fa5e 100644
> > --- a/net/ncsi/ncsi-manage.c
> > +++ b/net/ncsi/ncsi-manage.c
> > @@ -550,8 +550,10 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
> >  		spin_lock_irqsave(&nc->lock, flags);
> >  		nc->state = NCSI_CHANNEL_INACTIVE;
> >  		spin_unlock_irqrestore(&nc->lock, flags);
> > -		ncsi_process_next_channel(ndp);
> > -
> > +		if (ndp->flags & NCSI_DEV_RESET)
> > +			ncsi_reset_dev(nd);
> > +		else
> > +			ncsi_process_next_channel(ndp);
> >  		break;
> >  	default:
> >  		netdev_warn(nd->dev, "Wrong NCSI state 0x%x in suspend\n",
> > @@ -1554,7 +1556,7 @@ int ncsi_start_dev(struct ncsi_dev *nd)
> >  		return 0;
> >  	}
> >  
> > -	return ncsi_choose_active_channel(nd);
> > +	return ncsi_reset_dev(nd);
> 
> If there is no available channel due to the whitelist, ncsi_start_dev() function will return failed
> Status and the network interface may fail to bring up too. It is possible for user to disable all 
> channels and leave the interface up for checking the LOM status.
> 

I'm not sure that that is a bug, or at least not in the scope of this
series. If the whitelist is set such that no channels are valid then
there's nothing for NCSI to do. If we want to do something like always
monitor all channels then that would be best to do in another patch.

> >  }
> >  EXPORT_SYMBOL_GPL(ncsi_start_dev);
> 
> Also, if I send set_package_mask and set_channel_mask commands back to back in a program,
> the state machine doesn't work well. If I use command line and wait for it to complete for 
> each step, then it is fine.

Yeah that's not great; probably hitting some corner cases in the NCSI
locking. I'll look into the multi-channel related stuff but I have a
feeling that if you tried this with the existing set/clear commands you
would probably hit something similar, especially on your dual core
platform. If so this is probably something to fix separately.

> 
> npcm7xx-emc f0825000.eth eth2: NCSI: Multi-package enabled on ifindex 2, mask 0x00000001
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 0
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 0 state 0400
> npcm7xx-emc f0825000.eth eth2: NCSI: pkg 0 ch 0 set as preferred channel
> npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000003
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_stop_channel_monitor() - pkg 0 ch 1
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_suspend_channel() - pkg 0 ch 1 state 0400
> npcm7xx-emc f0825000.eth eth2: NCSI: Package 1 set to all channels disabled
> npcm7xx-emc f0825000.eth eth2: NCSI: Multi-channel enabled on ifindex 2, mask 0x00000000
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel()
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 0
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass pkg whitelist
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 0
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - ch 1
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pass ch whitelist
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - skip
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - next pkg
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_choose_active_channel() - pkg 1
> npcm7xx-emc f0825000.eth eth2: NCSI: No channel found to configure!
> npcm7xx-emc f0825000.eth eth2: NCSI interface down
> npcm7xx-emc f0825000.eth eth2: NCSI: ncsi_dev_work()
> npcm7xx-emc f0825000.eth eth2: Wrong NCSI state 0x100 in workqueue
> 
> All masks are set correctly, but you can see the PS column is not right and channel doesn't
> configure correctly.
> 
> /sys/kernel/debug/ncsi_protocol# cat ncsi_device_status
> IFIDX IFNAME NAME   PID CID RX TX MP MC WP WC PC PS LS RU CR NQ HA
> ===================================================================
>   2   eth2   ncsi0  000 000 1  1  1  1  1  1  1  0  1  1  1  0  1
>   2   eth2   ncsi1  000 001 1  0  1  1  1  1  0  0  1  1  1  0  1
>   2   eth2   ncsi2  001 000 0  0  1  1  0  0  0  0  1  1  1  0  1
>   2   eth2   ncsi3  001 001 0  0  1  1  0  0  0  0  1  1  1  0  1
> ===================================================================
> MP: Multi-mode Package     WP: Whitelist Package
> MC: Multi-mode Channel     WC: Whitelist Channel
> PC: Primary Channel
> PS: Poll Status
> LS: Link Status
> RU: Running
> CR: Carrier OK
> NQ: Queue Stopped
> HA: Hardware Arbitration
> 
> PS column is getting from (int)nc->monitor.enabled.

^ permalink raw reply

* [Patch net] net: make pskb_trim_rcsum_slow() robust
From: Cong Wang @ 2018-10-30  0:35 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Eric Dumazet

Most callers of pskb_trim_rcsum() simply drops the skb when
it fails, however, ip_check_defrag() still continues to pass
the skb up to stack. In that case, we should restore its previous
csum if __pskb_trim() fails.

Found this during code review.

Fixes: 88078d98d1bb ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/core/skbuff.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 946de0e24c87..5decd6e6d2b6 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1843,6 +1843,9 @@ EXPORT_SYMBOL(___pskb_trim);
  */
 int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
 {
+	__wsum old_csum = skb->csum;
+	int ret;
+
 	if (skb->ip_summed == CHECKSUM_COMPLETE) {
 		int delta = skb->len - len;
 
@@ -1850,7 +1853,10 @@ int pskb_trim_rcsum_slow(struct sk_buff *skb, unsigned int len)
 					   skb_checksum(skb, len, delta, 0),
 					   len);
 	}
-	return __pskb_trim(skb, len);
+	ret = __pskb_trim(skb, len);
+	if (unlikely(ret))
+		skb->csum = old_csum;
+	return ret;
 }
 EXPORT_SYMBOL(pskb_trim_rcsum_slow);
 
-- 
2.16.4

^ permalink raw reply related

* Re: Latest net-next kernel 4.19.0+
From: Paweł Staszewski @ 2018-10-30  0:11 UTC (permalink / raw)
  To: netdev
In-Reply-To: <59d5657c-ea0a-7b64-d5ff-5b55eb4fcccf@itcare.pl>

Sorry not complete - followed by hw csum:

[  342.190831] vlan1490: hw csum failure
[  342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
[  342.190836] Call Trace:
[  342.190839]  <IRQ>
[  342.190849]  dump_stack+0x46/0x5b
[  342.190856]  __skb_checksum_complete+0x9a/0xa0
[  342.190859]  tcp_v4_rcv+0xef/0x960
[  342.190864]  ip_local_deliver_finish+0x49/0xd0
[  342.190866]  ip_local_deliver+0x5e/0xe0
[  342.190869]  ? ip_sublist_rcv_finish+0x50/0x50
[  342.190870]  ip_rcv+0x41/0xc0
[  342.190874]  __netif_receive_skb_one_core+0x4b/0x70
[  342.190877]  netif_receive_skb_internal+0x2f/0xd0
[  342.190879]  napi_gro_receive+0xb7/0xe0
[  342.190884]  mlx5e_handle_rx_cqe+0x7a/0xd0
[  342.190886]  mlx5e_poll_rx_cq+0xc6/0x930
[  342.190888]  mlx5e_napi_poll+0xab/0xc90
[  342.190893]  ? kmem_cache_free_bulk+0x1e4/0x280
[  342.190895]  net_rx_action+0x1f1/0x320
[  342.190901]  __do_softirq+0xec/0x2b7
[  342.190908]  irq_exit+0x7b/0x80
[  342.190910]  do_IRQ+0x45/0xc0
[  342.190912]  common_interrupt+0xf/0xf
[  342.190914]  </IRQ>
[  342.190916] RIP: 0010:mwait_idle+0x5f/0x1b0
[  342.190917] Code: a8 01 0f 85 3f 01 00 00 31 d2 65 48 8b 04 25 80 4c 
01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 40 01 00 00 31 c0 fb 0f 01 
c9 <65> 8b 2d 2a c9 6a 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 01 00 f0
[  342.190918] RSP: 0018:ffffc900034e7eb8 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffffdd
[  342.190920] RAX: 0000000000000000 RBX: 0000000000000034 RCX: 
0000000000000000
[  342.190921] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[  342.190922] RBP: 0000000000000034 R08: 0000000000000057 R09: 
ffff88086fa1fbc0
[  342.190923] R10: 0000000000000000 R11: 00000001000028cc R12: 
ffff88086d180000
[  342.190923] R13: ffff88086d180000 R14: 0000000000000000 R15: 
0000000000000000
[  342.190929]  do_idle+0x1a3/0x1c0
[  342.190931]  cpu_startup_entry+0x14/0x20
[  342.190934]  start_secondary+0x165/0x190
[  342.190939]  secondary_startup_64+0xa4/0xb0


W dniu 30.10.2018 o 01:10, Paweł Staszewski pisze:
> Hi
>
>
> Just checked in test lab latest kernel and have weird traces:
>
> [  219.888673] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
> [  219.888674] Call Trace:
> [  219.888676]  <IRQ>
> [  219.888685]  dump_stack+0x46/0x5b
> [  219.888691]  __skb_checksum_complete+0x9a/0xa0
> [  219.888694]  tcp_v4_rcv+0xef/0x960
> [  219.888698]  ip_local_deliver_finish+0x49/0xd0
> [  219.888700]  ip_local_deliver+0x5e/0xe0
> [  219.888702]  ? ip_sublist_rcv_finish+0x50/0x50
> [  219.888703]  ip_rcv+0x41/0xc0
> [  219.888706]  __netif_receive_skb_one_core+0x4b/0x70
> [  219.888708]  netif_receive_skb_internal+0x2f/0xd0
> [  219.888710]  napi_gro_receive+0xb7/0xe0
> [  219.888714]  mlx5e_handle_rx_cqe+0x7a/0xd0
> [  219.888716]  mlx5e_poll_rx_cq+0xc6/0x930
> [  219.888717]  mlx5e_napi_poll+0xab/0xc90
> [  219.888722]  ? enqueue_task_fair+0x286/0xc40
> [  219.888723]  ? enqueue_task_fair+0x1d6/0xc40
> [  219.888725]  net_rx_action+0x1f1/0x320
> [  219.888730]  __do_softirq+0xec/0x2b7
> [  219.888736]  irq_exit+0x7b/0x80
> [  219.888737]  do_IRQ+0x45/0xc0
> [  219.888740]  common_interrupt+0xf/0xf
> [  219.888742]  </IRQ>
> [  219.888743] RIP: 0010:mwait_idle+0x5f/0x1b0
> [  219.888745] Code: a8 01 0f 85 3f 01 00 00 31 d2 65 48 8b 04 25 80 
> 4c 01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 40 01 00 00 31 c0 fb 
> 0f 01 c9 <65> 8b 2d 2a c9 6a 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 01 
> 00 f0
> [  219.888746] RSP: 0018:ffffc900034e7eb8 EFLAGS: 00000246 ORIG_RAX: 
> ffffffffffffffde
> [  219.888749] RAX: 0000000000000000 RBX: 0000000000000034 RCX: 
> 0000000000000000
> [  219.888749] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
> 0000000000000000
> [  219.888750] RBP: 0000000000000034 R08: 000000000000003b R09: 
> ffff88086fa1fbc0
> [  219.888751] R10: 0000000000000000 R11: 00000000ffffb15d R12: 
> ffff88086d180000
> [  219.888752] R13: ffff88086d180000 R14: 0000000000000000 R15: 
> 0000000000000000
> [  219.888754]  do_idle+0x1a3/0x1c0
> [  219.888757]  cpu_startup_entry+0x14/0x20
> [  219.888760]  start_secondary+0x165/0x190
>

^ permalink raw reply

* Latest net-next kernel 4.19.0+
From: Paweł Staszewski @ 2018-10-30  0:10 UTC (permalink / raw)
  To: netdev

Hi


Just checked in test lab latest kernel and have weird traces:

[  219.888673] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
[  219.888674] Call Trace:
[  219.888676]  <IRQ>
[  219.888685]  dump_stack+0x46/0x5b
[  219.888691]  __skb_checksum_complete+0x9a/0xa0
[  219.888694]  tcp_v4_rcv+0xef/0x960
[  219.888698]  ip_local_deliver_finish+0x49/0xd0
[  219.888700]  ip_local_deliver+0x5e/0xe0
[  219.888702]  ? ip_sublist_rcv_finish+0x50/0x50
[  219.888703]  ip_rcv+0x41/0xc0
[  219.888706]  __netif_receive_skb_one_core+0x4b/0x70
[  219.888708]  netif_receive_skb_internal+0x2f/0xd0
[  219.888710]  napi_gro_receive+0xb7/0xe0
[  219.888714]  mlx5e_handle_rx_cqe+0x7a/0xd0
[  219.888716]  mlx5e_poll_rx_cq+0xc6/0x930
[  219.888717]  mlx5e_napi_poll+0xab/0xc90
[  219.888722]  ? enqueue_task_fair+0x286/0xc40
[  219.888723]  ? enqueue_task_fair+0x1d6/0xc40
[  219.888725]  net_rx_action+0x1f1/0x320
[  219.888730]  __do_softirq+0xec/0x2b7
[  219.888736]  irq_exit+0x7b/0x80
[  219.888737]  do_IRQ+0x45/0xc0
[  219.888740]  common_interrupt+0xf/0xf
[  219.888742]  </IRQ>
[  219.888743] RIP: 0010:mwait_idle+0x5f/0x1b0
[  219.888745] Code: a8 01 0f 85 3f 01 00 00 31 d2 65 48 8b 04 25 80 4c 
01 00 48 89 d1 0f 01 c8 48 8b 00 a8 08 0f 85 40 01 00 00 31 c0 fb 0f 01 
c9 <65> 8b 2d 2a c9 6a 7e 0f 1f 44 00 00 65 48 8b 04 25 80 4c 01 00 f0
[  219.888746] RSP: 0018:ffffc900034e7eb8 EFLAGS: 00000246 ORIG_RAX: 
ffffffffffffffde
[  219.888749] RAX: 0000000000000000 RBX: 0000000000000034 RCX: 
0000000000000000
[  219.888749] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 
0000000000000000
[  219.888750] RBP: 0000000000000034 R08: 000000000000003b R09: 
ffff88086fa1fbc0
[  219.888751] R10: 0000000000000000 R11: 00000000ffffb15d R12: 
ffff88086d180000
[  219.888752] R13: ffff88086d180000 R14: 0000000000000000 R15: 
0000000000000000
[  219.888754]  do_idle+0x1a3/0x1c0
[  219.888757]  cpu_startup_entry+0x14/0x20
[  219.888760]  start_secondary+0x165/0x190

^ permalink raw reply

* Re: bnx2: rx_fw_discards: BCM5716 sporadically drops packets when update to driver version 2.2.6
From: maowenan @ 2018-10-30  9:03 UTC (permalink / raw)
  To: Mody, Rasesh, netdev@vger.kernel.org, f.fainelli@gmail.com,
	andrew@lunn.ch, linux-kernel@vger.kernel.org
  Cc: Rahman, Ameen
In-Reply-To: <BYAPR07MB536505C0A8C27C76C352C3239FCC0@BYAPR07MB5365.namprd07.prod.outlook.com>



On 2018/10/30 14:47, Mody, Rasesh wrote:
>> From: maowenan <maowenan@huawei.com>
>> Sent: Thursday, October 25, 2018 8:16 PM
>>
>> Hi,
>>
>> After I update version of bnx2 driver from 2.2.1 to 2.2.6, I find BCM5716
>> sporadically drops packets, which shows in rx_fw_discards.
>> C36-141-5:~ #  ethtool -S NIC0
>>
>> NIC statistics:
>>     rx_ucast_packets: 11902
>>
>>     rx_mcast_packets: 217
>>
>>     rx_bcast_packets: 4954320
>>
>>     rx_filtered_packets: 328793
>>
>>     rx_fw_discards: 2742
>>
>> C36-141-5:~ #
>>
>> 5s later:
>>
>> C36-141-5:~ #  ethtool -S NIC0
>>
>> NIC statistics:
>>     rx_ucast_packets: 11910
>>
>>     rx_mcast_packets: 217
>>
>>     rx_bcast_packets: 4958117
>>
>>     rx_filtered_packets: 328897
>>
>>     rx_fw_discards: 2750
>>
>> C36-141-5:~ #
>>
>> so rx_fw_discards: 2742-----> rx_fw_discards: 2750, lost 8 packets.
>>
>> the information of bnx2
>> C36-141-5:~ # modinfo bnx2
>> kernel/drivers/net/ethernet/broadcom/bnx2.ko
>>
>> firmware:       bnx2/bnx2-rv2p-09ax-6.0.17.fw
>>
>> firmware:       bnx2/bnx2-rv2p-09-6.0.17.fw
>>
>> firmware:       bnx2/bnx2-mips-09-6.2.1b.fw
>>
>> firmware:       bnx2/bnx2-rv2p-06-6.0.15.fw
>>
>> firmware:       bnx2/bnx2-mips-06-6.2.3.fw
>> version:        2.2.6
>>
>>
>> 1) Firstly, I check the patches from 2.2.1 to 2.2.6, below patch is interesting.
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id
>> =0021850d0417a4dc38ed871d929b651b87e2ead9
>> Do not enable filter SORT MODE in chip init routine. This patch addresses an
>> issue where BCM5716 sporadically drops packets when changing multicast list.
>>
>> diff --git a/drivers/net/ethernet/broadcom/bnx2.c
>> b/drivers/net/ethernet/broadcom/bnx2.c
>> index 8957eb5f4478..8c9a8b7787d2 100644
>> --- a/drivers/net/ethernet/broadcom/bnx2.c
>> +++ b/drivers/net/ethernet/broadcom/bnx2.c
>> @@ -4984,8 +4984,6 @@ bnx2_init_chip(struct bnx2 *bp)
>>
>>        bp->idle_chk_status_idx = 0xffff;
>>
>> -       bp->rx_mode = BNX2_EMAC_RX_MODE_SORT_MODE;
>> -
>>        /* Set up how to generate a link change interrupt. */
>>        BNX2_WR(bp, BNX2_EMAC_ATTENTION_ENA,
>> BNX2_EMAC_ATTENTION_ENA_LINK);
>>
>>
>> 2) Secondly, I revert this patch, after verify it, I find rx_fw_discards does not
>> increasing.
>> so I think this patch can fix current issue. But I'm not sure the issue of this
>> patch to fix will be reproduced?
>> I'm not convinced that what factor will trigger rx_fw_discards increasing?
>> And how to fix this?
> 
> Can you please reword your point above? i.e. what is working and what is not. I am not sure if I understand it completely.
> 
> Is the rx_fw_disacard count incrementing with 2.2.6 upstream driver on BCM5716? What is the kernel version?
> Which test is being run?


The case is based on BCM5716's driver version of 2.2.6, and both 3.10 and mainline kernel version exist this issue .
The testing seems like using one port of BCM5716 to receive packets, it can be found rx_fw_disacard increasing occasionally.
There is one patch in 2.2.6, 0021850d0417a4dc38ed871d929b651b87e2ead9, if I remove this patch, I can't reproduce this issue,
rx_fw_disacard doesn't increase.
So I don't know why this patch(0021850d0417a4dc38ed871d929b651b87e2ead9) can fix my problem(rx_fw_disacard increasing)?
If I remove this patch, how to fix the issue that the patch had resolved?

> 
>>
>> Thanks a lot.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> 

^ permalink raw reply

* Re: [PATCH iproute2] bpf: check map symbol type properly with newer llvm compiler
From: Daniel Borkmann @ 2018-10-30  0:08 UTC (permalink / raw)
  To: Yonghong Song, ast, netdev; +Cc: kernel-team
In-Reply-To: <20181029223203.3135200-1-yhs@fb.com>

On 10/29/2018 11:32 PM, Yonghong Song wrote:
> With llvm 7.0 or earlier, the map symbol type is STT_NOTYPE.
>   -bash-4.4$ cat t.c
>   __attribute__((section("maps"))) int g;
>   -bash-4.4$ clang -target bpf -O2 -c t.c
>   -bash-4.4$ readelf -s t.o
> 
>   Symbol table '.symtab' contains 2 entries:
>      Num:    Value          Size Type    Bind   Vis      Ndx Name
>        0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>        1: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    3 g
>   -bash-4.4$
> 
> The following llvm commit enables BPF target to generate
> proper symbol type and size.
>   commit bf6ec206615b9718869d48b4e5400d0c6e3638dd
>   Author: Yonghong Song <yhs@fb.com>
>   Date:   Wed Sep 19 16:04:13 2018 +0000
> 
>       [bpf] Symbol sizes and types in object file
> 
>       Clang-compiled object files currently don't include the symbol sizes and
>       types.  Some tools however need that information.  For example, ctfconvert
>       uses that information to generate FreeBSD's CTF representation from ELF
>       files.
>       With this patch, symbol sizes and types are included in object files.
> 
>       Signed-off-by: Paul Chaignon <paul.chaignon@orange.com>
>       Reported-by: Yutaro Hayakawa <yhayakawa3720@gmail.com>
> 
> Hence, for llvm 8.0.0 (currently trunk), symbol type will be not NOTYPE, but OBJECT.
>   -bash-4.4$ clang -target bpf -O2 -c t.c
>   -bash-4.4$ readelf -s t.o
> 
>   Symbol table '.symtab' contains 3 entries:
>      Num:    Value          Size Type    Bind   Vis      Ndx Name
>        0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>        1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS t.c
>        2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    3 g
> -bash-4.4$
> 
> This patch makes sure bpf library accepts both NOTYPE and OBJECT types
> of global map symbols.
> 
> Signed-off-by: Yonghong Song <yhs@fb.com>

Thanks!

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* [PATCH net 3/3] net/mlx4_en: use netdev_tx_sent_queue_more()
From: Eric Dumazet @ 2018-10-29 23:25 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Willem de Bruijn, Tariq Toukan,
	Eric Dumazet
In-Reply-To: <20181029232539.217268-1-edumazet@google.com>

This patch has two changes :

1) Use netdev_tx_sent_queue_more() for skbs with xmit_more
   This avoids mangling BQL status, since we only need to
   take care of it for the last skb of the batch.

2) doorbel only depends on xmit_more and netif_tx_queue_stopped()

  While not strictly necessary after 1), it is more consistent
  this way.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_tx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 1857ee0f0871d48285a6d3711f7c3e9a1e08a05f..3acce02ade6a115881ecd72e4710e332d3f380cb 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -1006,7 +1006,6 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		ring->packets++;
 	}
 	ring->bytes += tx_info->nr_bytes;
-	netdev_tx_sent_queue(ring->tx_queue, tx_info->nr_bytes);
 	AVG_PERF_COUNTER(priv->pstats.tx_pktsz_avg, skb->len);
 
 	if (tx_info->inl)
@@ -1044,7 +1043,14 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev)
 		netif_tx_stop_queue(ring->tx_queue);
 		ring->queue_stopped++;
 	}
-	send_doorbell = !skb->xmit_more || netif_xmit_stopped(ring->tx_queue);
+
+	if (skb->xmit_more) {
+		netdev_tx_sent_queue_more(ring->tx_queue, tx_info->nr_bytes);
+		send_doorbell = netif_tx_queue_stopped(ring->tx_queue);
+	} else {
+		netdev_tx_sent_queue(ring->tx_queue, tx_info->nr_bytes);
+		send_doorbell = true;
+	}
 
 	real_size = (real_size / 16) & 0x3f;
 
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related

* [PATCH net 2/3] net: do not abort bulk send on BQL status
From: Eric Dumazet @ 2018-10-29 23:25 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Willem de Bruijn, Tariq Toukan,
	Eric Dumazet
In-Reply-To: <20181029232539.217268-1-edumazet@google.com>

Before calling dev_hard_start_xmit(), upper layers tried
to cook optimal skb list based on BQL budget.

Problem is that GSO packets can end up comsuming more than
the BQL budget.

Breaking the loop is not useful, since requeued packets
are ahead of any packets still in the qdisc.

It is also more expensive, since next TX completion will
push these packets later, while skbs are not in cpu caches.

It is also a behavior difference with TSO packets, that can
break the BQL limit by a large amount.

Note that drivers should use netdev_tx_sent_queue_more()
in order to have optimal xmit_more support, and avoid
useless atomic operations as shown in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/core/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 77d43ae2a7bbe1267f8430d5c35637d1984f463c..0ffcbdd55fa9ee545c807f2ed3fc178830e3075a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3272,7 +3272,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *first, struct net_device *de
 		}
 
 		skb = next;
-		if (netif_xmit_stopped(txq) && skb) {
+		if (netif_tx_queue_stopped(txq) && skb) {
 			rc = NETDEV_TX_BUSY;
 			break;
 		}
-- 
2.19.1.568.g152ad8e336-goog

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox