* [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
@ 2016-06-27 18:22 Cong Wang
2016-06-27 21:08 ` Or Gerlitz
0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2016-06-27 18:22 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang, Shani Michaeli, Tariq Toukan, Yishai Hadas
The stack doesn't trust the complete csum by hardware
even when it is correct. In the case we fix csum by ourself
probably it is safe to just mark it as completed by software.
This should shut up a kernel warning from netdev_rx_csum_fault()
with mlx4 driver for ICMP packets.
Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
Cc: Shani Michaeli <shanim@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c1b3a9c..b44c434 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -732,6 +732,7 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
return -1;
#endif
+ skb->csum_complete_sw = 1;
return 0;
}
--
2.1.0
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 18:22 [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum Cong Wang
@ 2016-06-27 21:08 ` Or Gerlitz
2016-06-27 21:44 ` Cong Wang
0 siblings, 1 reply; 13+ messages in thread
From: Or Gerlitz @ 2016-06-27 21:08 UTC (permalink / raw)
To: Cong Wang, Tom Herbert
Cc: Linux Netdev List, Shani Michaeli, Tariq Toukan, Yishai Hadas,
Carol L Soto
On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> The stack doesn't trust the complete csum by hardware
> even when it is correct.
Can you explain that a little further?
> In the case we fix csum by ourself
> probably it is safe to just mark it as completed by software.
> This should shut up a kernel warning from netdev_rx_csum_fault()
> with mlx4 driver for ICMP packets.
can you point/paste the exact warning and how to reproduce that? is
that as simple as running ping and/or ping6?
> Fixes: f8c6455bb04b ('net/mlx4_en: Extend checksum offloading by CHECKSUM COMPLETE')
> Cc: Shani Michaeli <shanim@mellanox.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Yishai Hadas <yishaih@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index c1b3a9c..b44c434 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -732,6 +732,7 @@ static int check_csum(struct mlx4_cqe *cqe, struct sk_buff *skb, void *va,
> if (get_fixed_ipv6_csum(hw_checksum, skb, hdr))
> return -1;
> #endif
> + skb->csum_complete_sw = 1;
> return 0;
> }
>
> --
> 2.1.0
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 21:08 ` Or Gerlitz
@ 2016-06-27 21:44 ` Cong Wang
2016-06-27 21:47 ` Tom Herbert
2016-06-28 14:30 ` Or Gerlitz
0 siblings, 2 replies; 13+ messages in thread
From: Cong Wang @ 2016-06-27 21:44 UTC (permalink / raw)
To: Or Gerlitz
Cc: Tom Herbert, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> The stack doesn't trust the complete csum by hardware
>> even when it is correct.
>
> Can you explain that a little further?
Sure, here is the code in __skb_checksum_complete():
/* skb->csum holds pseudo checksum */
sum = csum_fold(csum_add(skb->csum, csum));
if (likely(!sum)) {
if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
!skb->csum_complete_sw)
netdev_rx_csum_fault(skb->dev);
}
So when sum == 0, it means the checksum is correct. And
we already set ->ip_summed to CHECKSUM_COMPLETE
after check_csum(), and ->csum_complete_sw is initialized
to 0 when we allocate the skb. This is why we trigger
netdev_rx_csum_fault().
>
>> In the case we fix csum by ourself
>> probably it is safe to just mark it as completed by software.
>
>> This should shut up a kernel warning from netdev_rx_csum_fault()
>> with mlx4 driver for ICMP packets.
>
> can you point/paste the exact warning and how to reproduce that? is
> that as simple as running ping and/or ping6?
Yes, ping is enough to reproduce it every time.
The warning is below:
[ 8693.680997] eth0: hw csum failure
[ 8693.681003] CPU: 5 PID: 34 Comm: ksoftirqd/5 Not tainted 4.1.20-t6.el5 #1
[ 8693.681005] Hardware name: SYNNEX HYVE-ZEUS/X9DRD-iF, BIOS 3.0.4 12/06/2013
[ 8693.681008] 0000000000000000 ffff88085c15fae8 ffffffff81502872
ffff881051397800
[ 8693.681011] ffff881054c08b01 ffff88085c15fb08 ffffffff814569c5
0000000000000000
[ 8693.681014] ffff8808572d7200 ffff88085c15fb38 ffffffff81450738
ffff8808572d7200
[ 8693.681017] Call Trace:
[ 8693.681025] [<ffffffff81502872>] dump_stack+0x4d/0x63
[ 8693.681030] [<ffffffff814569c5>] netdev_rx_csum_fault+0x38/0x3c
[ 8693.681033] [<ffffffff81450738>] __skb_checksum_complete+0x6e/0xb6
[ 8693.681036] [<ffffffff814b65d8>] icmp_rcv+0x17a/0x32f
[ 8693.681040] [<ffffffff8148b7f0>] ip_local_deliver_finish+0xd1/0x153
[ 8693.681042] [<ffffffff8148b9f7>] ip_local_deliver+0x8d/0x94
[ 8693.681045] [<ffffffff8148b71f>] ? xfrm4_policy_check.constprop.6+0x55/0x55
[ 8693.681048] [<ffffffff8148b651>] ip_rcv_finish+0x289/0x2cc
[ 8693.681050] [<ffffffff8148bc7b>] ip_rcv+0x27d/0x30a
[ 8693.681053] [<ffffffff81457e01>] __netif_receive_skb_core+0x3f2/0x483
[ 8693.681056] [<ffffffff81457eaa>] __netif_receive_skb+0x18/0x5a
[ 8693.681058] [<ffffffff81457f7c>] process_backlog+0x90/0x10c
[ 8693.681061] [<ffffffff814596e0>] net_rx_action+0x101/0x2aa
[ 8693.681066] [<ffffffff8106c39f>] __do_softirq+0x10c/0x26d
[ 8693.681068] [<ffffffff8106c51a>] run_ksoftirqd+0x1a/0x2f
[ 8693.681071] [<ffffffff81083675>] smpboot_thread_fn+0x149/0x167
[ 8693.681074] [<ffffffff8108352c>] ? sort_range+0x24/0x24
[ 8693.681076] [<ffffffff8108352c>] ? sort_range+0x24/0x24
[ 8693.681080] [<ffffffff81080d9c>] kthread+0xae/0xb6
[ 8693.681082] [<ffffffff81080303>] ? add_sysfs_param.isra.4+0xe1/0x18c
[ 8693.681085] [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
[ 8693.681088] [<ffffffff81508152>] ret_from_fork+0x42/0x70
[ 8693.681090] [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 21:44 ` Cong Wang
@ 2016-06-27 21:47 ` Tom Herbert
2016-06-27 21:49 ` Cong Wang
2016-06-28 14:30 ` Or Gerlitz
1 sibling, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2016-06-27 21:47 UTC (permalink / raw)
To: Cong Wang
Cc: Or Gerlitz, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> The stack doesn't trust the complete csum by hardware
>>> even when it is correct.
>>
>> Can you explain that a little further?
>
> Sure, here is the code in __skb_checksum_complete():
>
> /* skb->csum holds pseudo checksum */
> sum = csum_fold(csum_add(skb->csum, csum));
> if (likely(!sum)) {
> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
> !skb->csum_complete_sw)
> netdev_rx_csum_fault(skb->dev);
> }
>
> So when sum == 0, it means the checksum is correct. And
> we already set ->ip_summed to CHECKSUM_COMPLETE
> after check_csum(), and ->csum_complete_sw is initialized
> to 0 when we allocate the skb. This is why we trigger
> netdev_rx_csum_fault().
>
Yes, but this also means that the driver gave the stack a checksum
complete value that was incorrect. That's an error.
Tom
>
>>
>>> In the case we fix csum by ourself
>>> probably it is safe to just mark it as completed by software.
>>
>>> This should shut up a kernel warning from netdev_rx_csum_fault()
>>> with mlx4 driver for ICMP packets.
>>
>> can you point/paste the exact warning and how to reproduce that? is
>> that as simple as running ping and/or ping6?
>
> Yes, ping is enough to reproduce it every time.
>
> The warning is below:
>
> [ 8693.680997] eth0: hw csum failure
> [ 8693.681003] CPU: 5 PID: 34 Comm: ksoftirqd/5 Not tainted 4.1.20-t6.el5 #1
> [ 8693.681005] Hardware name: SYNNEX HYVE-ZEUS/X9DRD-iF, BIOS 3.0.4 12/06/2013
> [ 8693.681008] 0000000000000000 ffff88085c15fae8 ffffffff81502872
> ffff881051397800
> [ 8693.681011] ffff881054c08b01 ffff88085c15fb08 ffffffff814569c5
> 0000000000000000
> [ 8693.681014] ffff8808572d7200 ffff88085c15fb38 ffffffff81450738
> ffff8808572d7200
> [ 8693.681017] Call Trace:
> [ 8693.681025] [<ffffffff81502872>] dump_stack+0x4d/0x63
> [ 8693.681030] [<ffffffff814569c5>] netdev_rx_csum_fault+0x38/0x3c
> [ 8693.681033] [<ffffffff81450738>] __skb_checksum_complete+0x6e/0xb6
> [ 8693.681036] [<ffffffff814b65d8>] icmp_rcv+0x17a/0x32f
> [ 8693.681040] [<ffffffff8148b7f0>] ip_local_deliver_finish+0xd1/0x153
> [ 8693.681042] [<ffffffff8148b9f7>] ip_local_deliver+0x8d/0x94
> [ 8693.681045] [<ffffffff8148b71f>] ? xfrm4_policy_check.constprop.6+0x55/0x55
> [ 8693.681048] [<ffffffff8148b651>] ip_rcv_finish+0x289/0x2cc
> [ 8693.681050] [<ffffffff8148bc7b>] ip_rcv+0x27d/0x30a
> [ 8693.681053] [<ffffffff81457e01>] __netif_receive_skb_core+0x3f2/0x483
> [ 8693.681056] [<ffffffff81457eaa>] __netif_receive_skb+0x18/0x5a
> [ 8693.681058] [<ffffffff81457f7c>] process_backlog+0x90/0x10c
> [ 8693.681061] [<ffffffff814596e0>] net_rx_action+0x101/0x2aa
> [ 8693.681066] [<ffffffff8106c39f>] __do_softirq+0x10c/0x26d
> [ 8693.681068] [<ffffffff8106c51a>] run_ksoftirqd+0x1a/0x2f
> [ 8693.681071] [<ffffffff81083675>] smpboot_thread_fn+0x149/0x167
> [ 8693.681074] [<ffffffff8108352c>] ? sort_range+0x24/0x24
> [ 8693.681076] [<ffffffff8108352c>] ? sort_range+0x24/0x24
> [ 8693.681080] [<ffffffff81080d9c>] kthread+0xae/0xb6
> [ 8693.681082] [<ffffffff81080303>] ? add_sysfs_param.isra.4+0xe1/0x18c
> [ 8693.681085] [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
> [ 8693.681088] [<ffffffff81508152>] ret_from_fork+0x42/0x70
> [ 8693.681090] [<ffffffff81080cee>] ? __kthread_parkme+0x61/0x61
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 21:47 ` Tom Herbert
@ 2016-06-27 21:49 ` Cong Wang
2016-06-27 22:04 ` Tom Herbert
0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2016-06-27 21:49 UTC (permalink / raw)
To: Tom Herbert
Cc: Or Gerlitz, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> The stack doesn't trust the complete csum by hardware
>>>> even when it is correct.
>>>
>>> Can you explain that a little further?
>>
>> Sure, here is the code in __skb_checksum_complete():
>>
>> /* skb->csum holds pseudo checksum */
>> sum = csum_fold(csum_add(skb->csum, csum));
>> if (likely(!sum)) {
>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>> !skb->csum_complete_sw)
>> netdev_rx_csum_fault(skb->dev);
>> }
>>
>> So when sum == 0, it means the checksum is correct. And
>> we already set ->ip_summed to CHECKSUM_COMPLETE
>> after check_csum(), and ->csum_complete_sw is initialized
>> to 0 when we allocate the skb. This is why we trigger
>> netdev_rx_csum_fault().
>>
> Yes, but this also means that the driver gave the stack a checksum
> complete value that was incorrect. That's an error.
That is the whole purpose of commit f8c6455bb04b944edb69e,
isn't it?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 21:49 ` Cong Wang
@ 2016-06-27 22:04 ` Tom Herbert
2016-06-27 22:53 ` Cong Wang
0 siblings, 1 reply; 13+ messages in thread
From: Tom Herbert @ 2016-06-27 22:04 UTC (permalink / raw)
To: Cong Wang
Cc: Or Gerlitz, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Mon, Jun 27, 2016 at 2:49 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> The stack doesn't trust the complete csum by hardware
>>>>> even when it is correct.
>>>>
>>>> Can you explain that a little further?
>>>
>>> Sure, here is the code in __skb_checksum_complete():
>>>
>>> /* skb->csum holds pseudo checksum */
>>> sum = csum_fold(csum_add(skb->csum, csum));
>>> if (likely(!sum)) {
>>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>> !skb->csum_complete_sw)
>>> netdev_rx_csum_fault(skb->dev);
>>> }
>>>
>>> So when sum == 0, it means the checksum is correct. And
>>> we already set ->ip_summed to CHECKSUM_COMPLETE
>>> after check_csum(), and ->csum_complete_sw is initialized
>>> to 0 when we allocate the skb. This is why we trigger
>>> netdev_rx_csum_fault().
>>>
>> Yes, but this also means that the driver gave the stack a checksum
>> complete value that was incorrect. That's an error.
>
> That is the whole purpose of commit f8c6455bb04b944edb69e,
> isn't it?
No. Unless you've uncovered some other bug, what is probably happening
is that driver receives a packet with a checksum complete value. It
records the value in the skbuff and marks it as CHECKSUM_COMPLETE.
Subsequently, the stack tries to validate a transport layer checksum,
and the validation fails (checksum does not sum to zero). The stack
will then call __skb_checksum_complete from
__skb_checksum_validate_complete. In this case the stack computes that
transport checksum by hand and sees that transport checksum is valid--
so that means that the original value in checksum complete was not
correct, it is not set to the computed checksum of the whole packet.
This is an important error because it catches issues where checksum is
not correctly being pulled up.
Tom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 22:04 ` Tom Herbert
@ 2016-06-27 22:53 ` Cong Wang
2016-06-27 23:16 ` Tom Herbert
0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2016-06-27 22:53 UTC (permalink / raw)
To: Tom Herbert
Cc: Or Gerlitz, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Mon, Jun 27, 2016 at 3:04 PM, Tom Herbert <tom@herbertland.com> wrote:
> On Mon, Jun 27, 2016 at 2:49 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert <tom@herbertland.com> wrote:
>>> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>>> The stack doesn't trust the complete csum by hardware
>>>>>> even when it is correct.
>>>>>
>>>>> Can you explain that a little further?
>>>>
>>>> Sure, here is the code in __skb_checksum_complete():
>>>>
>>>> /* skb->csum holds pseudo checksum */
>>>> sum = csum_fold(csum_add(skb->csum, csum));
>>>> if (likely(!sum)) {
>>>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>> !skb->csum_complete_sw)
>>>> netdev_rx_csum_fault(skb->dev);
>>>> }
>>>>
>>>> So when sum == 0, it means the checksum is correct. And
>>>> we already set ->ip_summed to CHECKSUM_COMPLETE
>>>> after check_csum(), and ->csum_complete_sw is initialized
>>>> to 0 when we allocate the skb. This is why we trigger
>>>> netdev_rx_csum_fault().
>>>>
>>> Yes, but this also means that the driver gave the stack a checksum
>>> complete value that was incorrect. That's an error.
>>
>> That is the whole purpose of commit f8c6455bb04b944edb69e,
>> isn't it?
>
> No. Unless you've uncovered some other bug, what is probably happening
> is that driver receives a packet with a checksum complete value. It
> records the value in the skbuff and marks it as CHECKSUM_COMPLETE.
> Subsequently, the stack tries to validate a transport layer checksum,
> and the validation fails (checksum does not sum to zero). The stack
> will then call __skb_checksum_complete from
> __skb_checksum_validate_complete. In this case the stack computes that
> transport checksum by hand and sees that transport checksum is valid--
> so that means that the original value in checksum complete was not
> correct, it is not set to the computed checksum of the whole packet.
> This is an important error because it catches issues where checksum is
> not correctly being pulled up.
I see, the comments in mlx4 driver said:
/* Although the stack expects checksum which doesn't include the pseudo
* header, the HW adds it. To address that, we are subtracting the pseudo
* header checksum from the checksum value provided by the HW.
*/
which seems imply it calculates a correct checksum for the whole
packet here, but the stack disagrees. Therefore skb->csum is not
still not what the stack expects.
Given skb_checksum_simple_validate() always pass a null pseudo
header, it looks like either the fix-up for pseudo header is not needed
at all for ICMP case, OR we need to call skb_checksum_validate()
for ICMPv4 case. Hmm...
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 22:53 ` Cong Wang
@ 2016-06-27 23:16 ` Tom Herbert
0 siblings, 0 replies; 13+ messages in thread
From: Tom Herbert @ 2016-06-27 23:16 UTC (permalink / raw)
To: Cong Wang
Cc: Or Gerlitz, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Mon, Jun 27, 2016 at 3:53 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 3:04 PM, Tom Herbert <tom@herbertland.com> wrote:
>> On Mon, Jun 27, 2016 at 2:49 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>> On Mon, Jun 27, 2016 at 2:47 PM, Tom Herbert <tom@herbertland.com> wrote:
>>>> On Mon, Jun 27, 2016 at 2:44 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>>>>>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>>>>>>> The stack doesn't trust the complete csum by hardware
>>>>>>> even when it is correct.
>>>>>>
>>>>>> Can you explain that a little further?
>>>>>
>>>>> Sure, here is the code in __skb_checksum_complete():
>>>>>
>>>>> /* skb->csum holds pseudo checksum */
>>>>> sum = csum_fold(csum_add(skb->csum, csum));
>>>>> if (likely(!sum)) {
>>>>> if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE) &&
>>>>> !skb->csum_complete_sw)
>>>>> netdev_rx_csum_fault(skb->dev);
>>>>> }
>>>>>
>>>>> So when sum == 0, it means the checksum is correct. And
>>>>> we already set ->ip_summed to CHECKSUM_COMPLETE
>>>>> after check_csum(), and ->csum_complete_sw is initialized
>>>>> to 0 when we allocate the skb. This is why we trigger
>>>>> netdev_rx_csum_fault().
>>>>>
>>>> Yes, but this also means that the driver gave the stack a checksum
>>>> complete value that was incorrect. That's an error.
>>>
>>> That is the whole purpose of commit f8c6455bb04b944edb69e,
>>> isn't it?
>>
>> No. Unless you've uncovered some other bug, what is probably happening
>> is that driver receives a packet with a checksum complete value. It
>> records the value in the skbuff and marks it as CHECKSUM_COMPLETE.
>> Subsequently, the stack tries to validate a transport layer checksum,
>> and the validation fails (checksum does not sum to zero). The stack
>> will then call __skb_checksum_complete from
>> __skb_checksum_validate_complete. In this case the stack computes that
>> transport checksum by hand and sees that transport checksum is valid--
>> so that means that the original value in checksum complete was not
>> correct, it is not set to the computed checksum of the whole packet.
>> This is an important error because it catches issues where checksum is
>> not correctly being pulled up.
>
> I see, the comments in mlx4 driver said:
>
> /* Although the stack expects checksum which doesn't include the pseudo
> * header, the HW adds it. To address that, we are subtracting the pseudo
> * header checksum from the checksum value provided by the HW.
> */
>
> which seems imply it calculates a correct checksum for the whole
> packet here, but the stack disagrees. Therefore skb->csum is not
> still not what the stack expects.
>
Right, skb->csum is not what the stack expects. When it does the
computation over the same data it arrives at a different value than
what the driver sets. With this error pops that means the checksum in
the packet is correct, but driver or something in the stack messed up
skb->csum.
> Given skb_checksum_simple_validate() always pass a null pseudo
> header, it looks like either the fix-up for pseudo header is not needed
> at all for ICMP case, OR we need to call skb_checksum_validate()
> for ICMPv4 case. Hmm...
Pseudo header is not part of IPv4 checksum calculation so
skb_checksum_simple_validate is correct. Seems like a good chance
driver is doing fix-up wrong for ICMP.
Tom
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-27 21:44 ` Cong Wang
2016-06-27 21:47 ` Tom Herbert
@ 2016-06-28 14:30 ` Or Gerlitz
2016-06-28 16:52 ` Cong Wang
1 sibling, 1 reply; 13+ messages in thread
From: Or Gerlitz @ 2016-06-28 14:30 UTC (permalink / raw)
To: Cong Wang
Cc: Tom Herbert, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Tue, Jun 28, 2016 at 12:44 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> can you point/paste the exact warning and how to reproduce that? is
>> that as simple as running ping and/or ping6?
> Yes, ping is enough to reproduce it every time.
> The warning is below:
> [ 8693.680997] eth0: hw csum failure
Wow, can you please report us the exact card type (lspci -nn | grep -i
mellanox) and firmware version (ethtool -i $DEV)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-28 14:30 ` Or Gerlitz
@ 2016-06-28 16:52 ` Cong Wang
2016-06-29 14:23 ` Tariq Toukan
0 siblings, 1 reply; 13+ messages in thread
From: Cong Wang @ 2016-06-28 16:52 UTC (permalink / raw)
To: Or Gerlitz
Cc: Tom Herbert, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
On Tue, Jun 28, 2016 at 7:30 AM, Or Gerlitz <gerlitz.or@gmail.com> wrote:
> Wow, can you please report us the exact card type (lspci -nn | grep -i
> mellanox) and firmware version (ethtool -i $DEV)
See below. Does commit f8c6455bb04b944edb69e rely on any firmware
change to get an expected checksum?
$ lspci -nn | grep -i mellanox
82:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500
Family [ConnectX-3] [15b3:1003]
$ ethtool -i eth0
driver: mlx4_en
version: 2.2-1 (Feb 2014)
firmware-version: 2.33.5220
bus-info: 0000:82:00.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-28 16:52 ` Cong Wang
@ 2016-06-29 14:23 ` Tariq Toukan
2016-06-29 14:33 ` Or Gerlitz
2016-06-30 17:08 ` Cong Wang
0 siblings, 2 replies; 13+ messages in thread
From: Tariq Toukan @ 2016-06-29 14:23 UTC (permalink / raw)
To: Cong Wang, Or Gerlitz
Cc: Tom Herbert, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto
Hi Cong,
> See below. Does commit f8c6455bb04b944edb69e rely on any firmware
> change to get an expected checksum?
>
> $ lspci -nn | grep -i mellanox
> 82:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500
> Family [ConnectX-3] [15b3:1003]
>
> $ ethtool -i eth0
> driver: mlx4_en
> version: 2.2-1 (Feb 2014)
> firmware-version: 2.33.5220
> bus-info: 0000:82:00.0
I used same HW and FW, but was not able to reproduce.
I have kernel HEAD of net-next: 8a79813c1401 net: ethernet: dwc_eth_qos:
use phy_ethtool_{get|set}_link_ksettings
I ran regular ping, and ping6. Checksum complete counter increases, but
no warnings in dmesg.
What kernel do you use? Does it repro on latest net-next?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-29 14:23 ` Tariq Toukan
@ 2016-06-29 14:33 ` Or Gerlitz
2016-06-30 17:08 ` Cong Wang
1 sibling, 0 replies; 13+ messages in thread
From: Or Gerlitz @ 2016-06-29 14:33 UTC (permalink / raw)
To: Cong Wang
Cc: Tom Herbert, Linux Netdev List, Shani Michaeli, Tariq Toukan,
Yishai Hadas, Carol L Soto, Tariq Toukan
On Wed, Jun 29, 2016 at 5:23 PM, Tariq Toukan <ttoukan.linux@gmail.com> wrote:
> Hi Cong,
>
>> See below. Does commit f8c6455bb04b944edb69e rely on any firmware
>> change to get an expected checksum?
>>
>> $ lspci -nn | grep -i mellanox
>> 82:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500
>> Family [ConnectX-3] [15b3:1003]
>>
>> $ ethtool -i eth0
>> driver: mlx4_en
>> version: 2.2-1 (Feb 2014)
>> firmware-version: 2.33.5220
>> bus-info: 0000:82:00.0
>
> I used same HW and FW, but was not able to reproduce.
> I have kernel HEAD of net-next: 8a79813c1401 net: ethernet: dwc_eth_qos: use
> phy_ethtool_{get|set}_link_ksettings
>
> I ran regular ping, and ping6. Checksum complete counter increases, but no
> warnings in dmesg.
> What kernel do you use? Does it repro on latest net-next?
Also, can you share us somehow (e.g send to us as attachment, your
compressed .config)
Or.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum
2016-06-29 14:23 ` Tariq Toukan
2016-06-29 14:33 ` Or Gerlitz
@ 2016-06-30 17:08 ` Cong Wang
1 sibling, 0 replies; 13+ messages in thread
From: Cong Wang @ 2016-06-30 17:08 UTC (permalink / raw)
To: Tariq Toukan
Cc: Or Gerlitz, Tom Herbert, Linux Netdev List, Shani Michaeli,
Tariq Toukan, Yishai Hadas, Carol L Soto
On Wed, Jun 29, 2016 at 7:23 AM, Tariq Toukan <ttoukan.linux@gmail.com> wrote:
> Hi Cong,
>
>> See below. Does commit f8c6455bb04b944edb69e rely on any firmware
>> change to get an expected checksum?
>>
>> $ lspci -nn | grep -i mellanox
>> 82:00.0 Ethernet controller [0200]: Mellanox Technologies MT27500
>> Family [ConnectX-3] [15b3:1003]
>>
>> $ ethtool -i eth0
>> driver: mlx4_en
>> version: 2.2-1 (Feb 2014)
>> firmware-version: 2.33.5220
>> bus-info: 0000:82:00.0
>
> I used same HW and FW, but was not able to reproduce.
> I have kernel HEAD of net-next: 8a79813c1401 net: ethernet: dwc_eth_qos: use
> phy_ethtool_{get|set}_link_ksettings
>
> I ran regular ping, and ping6. Checksum complete counter increases, but no
> warnings in dmesg.
Thanks for testing it! This helps me a lot to identify the real bug.
Actually the bug is in mirred action instead of mlx4 driver, I have
a patch and just confirmed it fixes the bug. I will send it out in a
few minutes.
Again, thanks a lot for your help!
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-06-30 17:08 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-27 18:22 [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum Cong Wang
2016-06-27 21:08 ` Or Gerlitz
2016-06-27 21:44 ` Cong Wang
2016-06-27 21:47 ` Tom Herbert
2016-06-27 21:49 ` Cong Wang
2016-06-27 22:04 ` Tom Herbert
2016-06-27 22:53 ` Cong Wang
2016-06-27 23:16 ` Tom Herbert
2016-06-28 14:30 ` Or Gerlitz
2016-06-28 16:52 ` Cong Wang
2016-06-29 14:23 ` Tariq Toukan
2016-06-29 14:33 ` Or Gerlitz
2016-06-30 17:08 ` Cong Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).