Netdev List


* Re: BBR and TCP internal pacing causing interrupt storm with pfifo_fast
From: Eric Dumazet @ 2018-10-15 16:23 UTC
  To: Eric Dumazet, Gasper Zejn; +Cc: Eric Dumazet, Kevin Yang, netdev
In-Reply-To: <CANn89i+OoMLmx4jRTSRX8Svka0FRo-hPM-CqOxb9NJhV9KS7iQ@mail.gmail.com>

On 10/15/2018 07:50 AM, Eric Dumazet wrote:
> On Mon, Oct 15, 2018 at 3:26 AM Gasper Zejn <zelo.zejn@gmail.com> wrote:
>>
>>
>> I've tried to isolate the issue as best I could. There seems to be an
>> issue if the TCP socket has keepalive set and send queue is not empty
>> and the route goes away.
>>
>> https://github.com/zejn/bbr_pfifo_interrupts_issue
>>
>> Hope this helps,
>> Gasper
> 
> This is awesome Gasper, I will take a look thanks.
> 
> Note that we are about to send a patch series (targeting net-next) to
> polish the EDT patch series that was merged last month for linux-4.20.
> TCP internal pacing is going to be much better performance-wise.
> 

Yeah, I believe that commit c092dd5f4a7f4e4dbbcc8cf2e50b516bf07e432f
("tcp: switch tcp_internal_pacing() to tcp_wstamp_ns") has incidentally
fixed the issue.

That is because it calls tcp_internal_pacing() from
tcp_update_skb_after_send(), which is called only if the packet was
correctly sent by the IP layer.

Before this patch, tcp_internal_pacing() was called from
__tcp_transmit_skb() before we attempted to send the clone, and the
clone could be dropped right away in the IP layer (for lack of a route,
for example).

So if the packet was not sent because of a route problem, the
high-resolution timer would fire soon after and the TCP xmit path would
be entered again, triggering this loop.
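
The ordering problem described above can be illustrated with a toy model.
This is a sketch in Python, not kernel code: count_timer_fires,
arm_before_send, route_up and max_events are hypothetical names, and the
model ignores retransmit timers and everything else in the real TCP stack.
It only shows that arming the pacing timer before a send that fails
re-enters the transmit path on every expiry, while arming it only after a
successful send does not.

```python
def count_timer_fires(arm_before_send, route_up, max_events=1000):
    """Count pacing-timer expiries while one packet stays unsent (toy model)."""
    fires = 0
    for _ in range(max_events):
        # One pass through the (hypothetical) transmit path.
        timer_armed = arm_before_send      # old ordering: arm before sending
        sent = route_up                    # IP layer drops the clone without a route
        if sent:
            break                          # packet left the host; nothing to retry here
        if not timer_armed:
            break                          # new ordering: a failed send arms no timer
        fires += 1                         # timer expires and re-enters the xmit path
    return fires
```

With arm_before_send=True and route_up=False the count is bounded only by
max_events, which mirrors the loop described above; every other
combination returns 0.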

I am going to send the 2nd round of EDT patches, so that you can try
David Miller's net-next tree with all the patches we believe are needed
for 4.20. Once this is proven to work, we might have to backport the
series to 4.18 and 4.19.

Thanks !


* Re: [Bug 201423] New: eth0: hw csum failure
From: Stephen Hemminger @ 2018-10-15 16:21 UTC
  To: Eric Dumazet; +Cc: netdev, rossi.f
In-Reply-To: <CANn89iLA+rdFNXXdzogLHF1FqYg3CjpwXJbscWTJ8Bk8bN2Scw@mail.gmail.com>

On Mon, 15 Oct 2018 08:41:47 -0700
Eric Dumazet <edumazet@google.com> wrote:

> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> >
> >
> > Begin forwarded message:
> >
> > Date: Sun, 14 Oct 2018 10:42:48 +0000
> > From: bugzilla-daemon@bugzilla.kernel.org
> > To: stephen@networkplumber.org
> > Subject: [Bug 201423] New: eth0: hw csum failure
> >
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=201423
> >
> >             Bug ID: 201423
> >            Summary: eth0: hw csum failure
> >            Product: Networking
> >            Version: 2.5
> >     Kernel Version: 4.19.0-rc7
> >           Hardware: Intel
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: stephen@networkplumber.org
> >           Reporter: rossi.f@inwind.it
> >         Regression: No
> >
> > I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the
> > Ethernet ports. I get the following error message:
> >
> > [  433.727397] eth0: hw csum failure
> > [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> > [  433.727406] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  433.727407] Call Trace:
> > [  433.727409]  <IRQ>
> > [  433.727415]  dump_stack+0x46/0x5b
> > [  433.727419]  __skb_checksum_complete+0xb0/0xc0
> > [  433.727423]  tcp_v4_rcv+0x528/0xb60
> > [  433.727426]  ? ipt_do_table+0x2d0/0x400
> > [  433.727429]  ip_local_deliver_finish+0x5a/0x110
> > [  433.727430]  ip_local_deliver+0xe1/0xf0
> > [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  433.727432]  ip_rcv+0xca/0xe0
> > [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
> > [  433.727438]  netif_receive_skb_internal+0x4e/0x130
> > [  433.727439]  napi_gro_receive+0x6a/0x80
> > [  433.727442]  sky2_poll+0x707/0xd20
> > [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
> > [  433.727447]  net_rx_action+0x237/0x380
> > [  433.727449]  __do_softirq+0xdc/0x1e0
> > [  433.727452]  irq_exit+0xa9/0xb0
> > [  433.727453]  do_IRQ+0x45/0xc0
> > [  433.727455]  common_interrupt+0xf/0xf
> > [  433.727456]  </IRQ>
> > [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
> > [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
> > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
> > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> > [  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> > ffffffffffffffde
> > [  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> > 000000000000001f
> > [  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> > 0000000000000000
> > [  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
> > 000000650512105d
> > [  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
> > 00000064fc2a8b1c
> > [  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
> > ffffffff8204af20
> > [  433.727468]  ? cpuidle_enter_state+0x119/0x200
> > [  433.727471]  do_idle+0x1bf/0x200
> > [  433.727473]  cpu_startup_entry+0x6a/0x70
> > [  433.727475]  start_secondary+0x17f/0x1c0
> > [  433.727476]  secondary_startup_64+0xa4/0xb0
> > [  441.662954] eth0: hw csum failure
> > [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
> > [  441.662960] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  441.662960] Call Trace:
> > [  441.662963]  <IRQ>
> > [  441.662968]  dump_stack+0x46/0x5b
> > [  441.662972]  __skb_checksum_complete+0xb0/0xc0
> > [  441.662975]  tcp_v4_rcv+0x528/0xb60
> > [  441.662979]  ? ipt_do_table+0x2d0/0x400
> > [  441.662981]  ip_local_deliver_finish+0x5a/0x110
> > [  441.662983]  ip_local_deliver+0xe1/0xf0
> > [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  441.662986]  ip_rcv+0xca/0xe0
> > [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
> > [  441.662993]  netif_receive_skb_internal+0x4e/0x130
> > [  441.662994]  napi_gro_receive+0x6a/0x80
> > [  441.662998]  sky2_poll+0x707/0xd20
> > [  441.663000]  net_rx_action+0x237/0x380
> > [  441.663002]  __do_softirq+0xdc/0x1e0
> > [  441.663005]  irq_exit+0xa9/0xb0
> > [  441.663007]  do_IRQ+0x45/0xc0
> > [  441.663009]  common_interrupt+0xf/0xf
> > [  441.663010]  </IRQ>
> > [  441.663012] RIP: 0010:merge+0x22/0xb0
> > [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53
> > 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 c9
> > 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
> > [  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
> > ffffffffffffffde
> > [  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
> > ffff88021ab2d408
> > [  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
> > 0000000000000000
> > [  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
> > 0000000000008500
> > [  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
> > ffffffffa021c440
> > [  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
> > ffffc9000090b9e0
> > [  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663065]  ? merge+0x57/0xb0
> > [  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> > [  441.663082]  list_sort+0x8b/0x230
> > [  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
> > [  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
> > [  441.663113]  ? __switch_to_asm+0x34/0x70
> > [  441.663114]  ? __switch_to_asm+0x40/0x70
> > [  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> > [  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
> > [  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
> > [  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> > [  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
> > [  441.663171]  do_vfs_ioctl+0x9a/0x600
> > [  441.663173]  ksys_ioctl+0x35/0x60
> > [  441.663175]  __x64_sys_ioctl+0x11/0x20
> > [  441.663177]  do_syscall_64+0x3d/0xf0
> > [  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [  441.663180] RIP: 0033:0x7f9377377f37
> > [  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 ad
> > db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01
> > f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
> > [  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
> > 0000000000000010
> > [  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
> > 00007f9377377f37
> > [  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
> > 0000000000000010
> > [  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
> > 0000000000000000
> > [  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
> > 00000000c0206466
> > [  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
> > 0000564497a38120
> > [  462.833418] eth0: hw csum failure
> > [  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> > [  462.833429] Hardware name: System manufacturer System Product Name/P6T
> > DELUXE V2, BIOS 1202    12/22/2010
> > [  462.833429] Call Trace:
> > [  462.833432]  <IRQ>
> > [  462.833438]  dump_stack+0x46/0x5b
> > [  462.833442]  __skb_checksum_complete+0xb0/0xc0
> > [  462.833446]  tcp_v4_rcv+0x528/0xb60
> > [  462.833449]  ? ipt_do_table+0x2d0/0x400
> > [  462.833452]  ip_local_deliver_finish+0x5a/0x110
> > [  462.833454]  ip_local_deliver+0xe1/0xf0
> > [  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
> > [  462.833457]  ip_rcv+0xca/0xe0
> > [  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> > [  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
> > [  462.833464]  netif_receive_skb_internal+0x4e/0x130
> > [  462.833466]  napi_gro_receive+0x6a/0x80
> > [  462.833469]  sky2_poll+0x707/0xd20
> > [  462.833471]  net_rx_action+0x237/0x380
> > [  462.833474]  __do_softirq+0xdc/0x1e0
> > [  462.833477]  irq_exit+0xa9/0xb0
> > [  462.833479]  do_IRQ+0x45/0xc0
> > [  462.833481]  common_interrupt+0xf/0xf
> > [  462.833482]  </IRQ>
> > [  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
> > [  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
> > ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
> > 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> > [  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> > ffffffffffffffde
> > [  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> > 000000000000001f
> > [  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> > 0000000000000000
> > [  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
> > 0000000000000000
> > [  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
> > 0000006bc3052131
> > [  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
> > ffffffff8204af20
> > [  462.833498]  ? cpuidle_enter_state+0x119/0x200
> > [  462.833503]  do_idle+0x1bf/0x200
> > [  462.833506]  cpu_startup_entry+0x6a/0x70
> > [  462.833510]  start_secondary+0x17f/0x1c0
> > [  462.833513]  secondary_startup_64+0xa4/0xb0
> >
> > Something changed between 4.17.12 and 4.18; after bisecting the problem I
> > got the following first bad commit:
> >
> > commit 88078d98d1bb085d72af8437707279e203524fa5
> > Author: Eric Dumazet <edumazet@google.com>
> > Date:   Wed Apr 18 11:43:15 2018 -0700
> >
> >     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
> >
> >     After working on IP defragmentation lately, I found that some large
> >     packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
> >     zero paddings on the last (small) fragment.
> >
> >     While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
> >     to CHECKSUM_NONE, forcing a full csum validation, even if all prior
> >     fragments had CHECKSUM_COMPLETE set.
> >
> >     We can instead compute the checksum of the part we are trimming,
> >     usually smaller than the part we keep.
> >
> >     Signed-off-by: Eric Dumazet <edumazet@google.com>
> >     Signed-off-by: David S. Miller <davem@davemloft.net>
> >  
> 
> Thanks for bisecting !
> 
> This commit is known to expose some NIC/driver bugs.
> 
> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
> 
> I assume SKY2_HW_NEW_LE is not set on your NIC ?

There are two variants of this chip: one does a 1's complement checksum, and
the other does a TCP checksum. Maybe the 1's complement version is incorrectly
including the CRC.
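
To make the arithmetic concrete, here is a minimal Python sketch (userspace,
not kernel code) of the idea in the quoted commit message: the checksum of
the kept part is the full one's complement sum minus the sum of the trimmed
part. All names (fold, csum, csum_sub, csum_equal, payload, padding, extra)
are hypothetical, and the trimmed region is assumed to start at an even
offset. The second case models hardware that sums trailing bytes the stack
does not know about; whether sky2 actually does this with the CRC is only
the hypothesis raised above.

```python
def fold(total):
    """Fold a sum into 16 bits with end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def csum(data):
    """16-bit one's complement sum of data (not inverted)."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd lengths with a zero byte
    words = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return fold(sum(words))


def csum_sub(total, part):
    """Remove part's contribution from total (one's complement subtraction)."""
    return fold(total + (~part & 0xFFFF))


def csum_equal(a, b):
    """Compare sums, treating 0x0000 and 0xFFFF as the same value (both are zero)."""
    return a % 0xFFFF == b % 0xFFFF


payload = bytes(range(1, 41))              # 40 bytes the stack keeps
padding = bytes(6)                         # zero padding added by the NIC
extra = b"\xde\xad\xbe\xef"                # stand-in for 4 trailing bytes such as a CRC

# Hardware summed exactly what the stack thinks it summed: trimming works.
good = csum_sub(csum(payload + padding), csum(padding))

# Hardware also summed 4 trailing bytes the stack does not know about.
bad = csum_sub(csum(payload + padding + extra), csum(padding))
```

Here csum_equal(good, csum(payload)) holds, while bad no longer matches
csum(payload), which is the kind of mismatch that would surface as a
"hw csum failure" once the trimmed checksum is trusted.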

Side note: not sure why, but the driver only calls GRO for checksummed packets.
Is that necessary?


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Dave Stevenson @ 2018-10-15 16:12 UTC
  To: edumazet
  Cc: stephen, netdev, rossi.f, Woojung Huh,
	Microchip Linux Driver Support, Steve Glendinning
In-Reply-To: <CANn89iLA+rdFNXXdzogLHF1FqYg3CjpwXJbscWTJ8Bk8bN2Scw@mail.gmail.com>

Hi Eric.

On Mon, 15 Oct 2018 at 16:42, Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> > [...]
>
> Thanks for bisecting !
>
> This commit is known to expose some NIC/driver bugs.
>
> Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
> ("net: sungem: fix rx checksum support")  for one driver needing a fix.
>
> I assume SKY2_HW_NEW_LE is not set on your NIC ?

Just to say that we've also just hit this with both the LAN78xx and
SMSC9514 drivers, i.e. all Raspberry Pis with onboard Ethernet. Likewise,
that commit had been pinpointed as the cause, or at least as exposing an
underlying issue.
As the patch was backported in 4.14.71, it's hitting LTS users too.

Thanks for the pointer on sungem. I'll have a look into what's going
on and see if we can sort it, although I have cc'ed in the maintainers
of those chips in case they are already on the case.

Cheers.
  Dave


* Re: [bpf-next PATCH v2 2/2] bpf: bpftool, add flag to allow non-compat map definitions
From: Jakub Kicinski @ 2018-10-15 16:06 UTC
  To: John Fastabend; +Cc: ast, daniel, netdev
In-Reply-To: <20181015151753.9258.75330.stgit@john-Precision-Tower-5810>

On Mon, 15 Oct 2018 08:17:53 -0700, John Fastabend wrote:
> Multiple map definition structures exist and user may have non-zero
> fields in their definition that are not recognized by bpftool and
> libbpf. The normal behavior is to then fail loading the map. Although
> this is a good default behavior users may still want to load the map
> for debugging or other reasons. This patch adds a --mapcompat flag
> that can be used to override the default behavior and allow loading
> the map even when it has additional non-zero fields.
> 
> For now the only user is 'bpftool prog'; we can switch over other
> subcommands as needed. The library exposes an API that consumes
> a flags field now but I kept the original API around also in case
> users of the API don't want to expose this. The flags field is an
> int in case we need more control over how the API call handles
> errors/features/etc in the future.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

No strong opinion on the functionality, but may I be a grump and again
request adding the new option to completions and the man page? :)


* Re: [bpf-next PATCH v2 1/2] bpf: bpftool, add support for attaching programs to maps
From: Jakub Kicinski @ 2018-10-15 16:04 UTC
  To: John Fastabend; +Cc: ast, daniel, netdev
In-Reply-To: <20181015151748.9258.66154.stgit@john-Precision-Tower-5810>

On Mon, 15 Oct 2018 08:17:48 -0700, John Fastabend wrote:
> Sock map/hash introduce support for attaching programs to maps. To
> date I have been doing this with custom tooling but this is less than
> ideal as we shift to using bpftool as the single CLI for our BPF uses.
> This patch adds new sub commands 'attach' and 'detach' to the 'prog'
> command to attach programs to maps and then detach them.
> 
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>


* Re: Bug in MACSec - stops passing traffic after approx 5TB
From: Josh Coombs @ 2018-10-15 15:45 UTC
  To: sd; +Cc: netdev
In-Reply-To: <CACcUnf8uYKKbE8o=LvT7cENKiuB_sBMD+2VMCCOnwa=2W=Ri4w@mail.gmail.com>

And confirmed: starting with a high packet number results in a very
short testbed run, 296 packets and then nothing, just as you surmised.
Sorry for raising the alarm falsely.  Looks like I need to roll my own
build of wpa_supplicant, as the Ubuntu builds don't include the macsec
driver; I haven't tested Gentoo's ebuilds yet to see if they do.

Josh Coombs

On Sun, Oct 14, 2018 at 4:52 PM Josh Coombs <jcoombs@staff.gwi.net> wrote:
>
> On Sun, Oct 14, 2018 at 4:24 PM Sabrina Dubroca <sd@queasysnail.net> wrote:
> >
> > 2018-10-14, 10:59:31 -0400, Josh Coombs wrote:
> > > I initially mistook this for a traffic control issue, but after
> > > stripping the test beds down to just the MACSec component, I can still
> > > replicate the issue.  After approximately 5TB of transfer / 4 billion
> > > packets over a MACSec link it stops passing traffic.
> >
> > I think you're just hitting packet number exhaustion. After 2^32
> > packets, the packet number would wrap to 0 and start being reused,
> > which breaks the crypto used by macsec. Before this point, you have to
> > add a new SA, and tell the macsec device to switch to it.
>
> I had not considered that, I naively thought as long as I didn't
> specify a replay window, it'd roll the PN over on its own and life
> would be good.  I'll test that theory tomorrow, should be easy to
> prove out.
>
> > That's why you should be using wpa_supplicant. It will monitor the
> > growth of the packet number, and handle the rekey for you.
>
> Thank you for the heads up, I'll read up on this as well.
>
> Josh C
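
As a rough check of the packet number exhaustion explanation quoted above,
the arithmetic can be sketched in Python. This is an illustration only:
packets_remaining and bytes_before_exhaustion are hypothetical helpers, the
1200-byte average packet size is an assumption picked for the example, and
the exact off-by-one at the wrap boundary depends on the implementation.

```python
PN_BITS = 32                               # packet number width assumed here


def packets_remaining(next_pn):
    """Packets that can still be sent before the packet number would wrap to 0."""
    return 2 ** PN_BITS - next_pn


def bytes_before_exhaustion(avg_packet_bytes, next_pn=1):
    """Approximate traffic volume one SA can carry at a given average packet size."""
    return packets_remaining(next_pn) * avg_packet_bytes
```

Under these assumptions, bytes_before_exhaustion(1200) comes to roughly
5.15e12 bytes, in line with the reported "approximately 5TB / 4 billion
packets", and a starting packet number 296 below the wrap point leaves
packets_remaining(2**32 - 296) == 296 packets.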


* Re: Fw: [Bug 201423] New: eth0: hw csum failure
From: Eric Dumazet @ 2018-10-15 15:41 UTC
  To: Stephen Hemminger; +Cc: netdev, rossi.f
In-Reply-To: <20181015081519.0bf076bc@xeon-e3>

On Mon, Oct 15, 2018 at 8:15 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
>
>
> Begin forwarded message:
>
> Date: Sun, 14 Oct 2018 10:42:48 +0000
> From: bugzilla-daemon@bugzilla.kernel.org
> To: stephen@networkplumber.org
> Subject: [Bug 201423] New: eth0: hw csum failure
>
>
> https://bugzilla.kernel.org/show_bug.cgi?id=201423
>
>             Bug ID: 201423
>            Summary: eth0: hw csum failure
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.19.0-rc7
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: stephen@networkplumber.org
>           Reporter: rossi.f@inwind.it
>         Regression: No
>
> I have a P6T DELUXE V2 motherboard and am using the sky2 driver for the
> Ethernet ports. I get the following error message:
>
> [  433.727397] eth0: hw csum failure
> [  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> [  433.727406] Hardware name: System manufacturer System Product Name/P6T
> DELUXE V2, BIOS 1202    12/22/2010
> [  433.727407] Call Trace:
> [  433.727409]  <IRQ>
> [  433.727415]  dump_stack+0x46/0x5b
> [  433.727419]  __skb_checksum_complete+0xb0/0xc0
> [  433.727423]  tcp_v4_rcv+0x528/0xb60
> [  433.727426]  ? ipt_do_table+0x2d0/0x400
> [  433.727429]  ip_local_deliver_finish+0x5a/0x110
> [  433.727430]  ip_local_deliver+0xe1/0xf0
> [  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
> [  433.727432]  ip_rcv+0xca/0xe0
> [  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> [  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
> [  433.727438]  netif_receive_skb_internal+0x4e/0x130
> [  433.727439]  napi_gro_receive+0x6a/0x80
> [  433.727442]  sky2_poll+0x707/0xd20
> [  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
> [  433.727447]  net_rx_action+0x237/0x380
> [  433.727449]  __do_softirq+0xdc/0x1e0
> [  433.727452]  irq_exit+0xa9/0xb0
> [  433.727453]  do_IRQ+0x45/0xc0
> [  433.727455]  common_interrupt+0xf/0xf
> [  433.727456]  </IRQ>
> [  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
> [  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> [  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> ffffffffffffffde
> [  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> 000000000000001f
> [  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> 0000000000000000
> [  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
> 000000650512105d
> [  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
> 00000064fc2a8b1c
> [  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
> ffffffff8204af20
> [  433.727468]  ? cpuidle_enter_state+0x119/0x200
> [  433.727471]  do_idle+0x1bf/0x200
> [  433.727473]  cpu_startup_entry+0x6a/0x70
> [  433.727475]  start_secondary+0x17f/0x1c0
> [  433.727476]  secondary_startup_64+0xa4/0xb0
> [  441.662954] eth0: hw csum failure
> [  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
> [  441.662960] Hardware name: System manufacturer System Product Name/P6T
> DELUXE V2, BIOS 1202    12/22/2010
> [  441.662960] Call Trace:
> [  441.662963]  <IRQ>
> [  441.662968]  dump_stack+0x46/0x5b
> [  441.662972]  __skb_checksum_complete+0xb0/0xc0
> [  441.662975]  tcp_v4_rcv+0x528/0xb60
> [  441.662979]  ? ipt_do_table+0x2d0/0x400
> [  441.662981]  ip_local_deliver_finish+0x5a/0x110
> [  441.662983]  ip_local_deliver+0xe1/0xf0
> [  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
> [  441.662986]  ip_rcv+0xca/0xe0
> [  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> [  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
> [  441.662993]  netif_receive_skb_internal+0x4e/0x130
> [  441.662994]  napi_gro_receive+0x6a/0x80
> [  441.662998]  sky2_poll+0x707/0xd20
> [  441.663000]  net_rx_action+0x237/0x380
> [  441.663002]  __do_softirq+0xdc/0x1e0
> [  441.663005]  irq_exit+0xa9/0xb0
> [  441.663007]  do_IRQ+0x45/0xc0
> [  441.663009]  common_interrupt+0xf/0xf
> [  441.663010]  </IRQ>
> [  441.663012] RIP: 0010:merge+0x22/0xb0
> [  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53
> 48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 c9
> 74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
> [  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
> ffffffffffffffde
> [  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
> ffff88021ab2d408
> [  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
> 0000000000000000
> [  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
> 0000000000008500
> [  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
> ffffffffa021c440
> [  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
> ffffc9000090b9e0
> [  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> [  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> [  441.663065]  ? merge+0x57/0xb0
> [  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
> [  441.663082]  list_sort+0x8b/0x230
> [  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
> [  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
> [  441.663113]  ? __switch_to_asm+0x34/0x70
> [  441.663114]  ? __switch_to_asm+0x40/0x70
> [  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> [  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
> [  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
> [  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
> [  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
> [  441.663171]  do_vfs_ioctl+0x9a/0x600
> [  441.663173]  ksys_ioctl+0x35/0x60
> [  441.663175]  __x64_sys_ioctl+0x11/0x20
> [  441.663177]  do_syscall_64+0x3d/0xf0
> [  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  441.663180] RIP: 0033:0x7f9377377f37
> [  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 ad
> db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01
> f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
> [  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
> 00007f9377377f37
> [  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
> 0000000000000010
> [  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
> 0000000000000000
> [  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
> 00000000c0206466
> [  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
> 0000564497a38120
> [  462.833418] eth0: hw csum failure
> [  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
> [  462.833429] Hardware name: System manufacturer System Product Name/P6T
> DELUXE V2, BIOS 1202    12/22/2010
> [  462.833429] Call Trace:
> [  462.833432]  <IRQ>
> [  462.833438]  dump_stack+0x46/0x5b
> [  462.833442]  __skb_checksum_complete+0xb0/0xc0
> [  462.833446]  tcp_v4_rcv+0x528/0xb60
> [  462.833449]  ? ipt_do_table+0x2d0/0x400
> [  462.833452]  ip_local_deliver_finish+0x5a/0x110
> [  462.833454]  ip_local_deliver+0xe1/0xf0
> [  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
> [  462.833457]  ip_rcv+0xca/0xe0
> [  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
> [  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
> [  462.833464]  netif_receive_skb_internal+0x4e/0x130
> [  462.833466]  napi_gro_receive+0x6a/0x80
> [  462.833469]  sky2_poll+0x707/0xd20
> [  462.833471]  net_rx_action+0x237/0x380
> [  462.833474]  __do_softirq+0xdc/0x1e0
> [  462.833477]  irq_exit+0xa9/0xb0
> [  462.833479]  do_IRQ+0x45/0xc0
> [  462.833481]  common_interrupt+0xf/0xf
> [  462.833482]  </IRQ>
> [  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
> [  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
> ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
> 4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
> [  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
> ffffffffffffffde
> [  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
> 000000000000001f
> [  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
> 0000000000000000
> [  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
> 0000000000000000
> [  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
> 0000006bc3052131
> [  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
> ffffffff8204af20
> [  462.833498]  ? cpuidle_enter_state+0x119/0x200
> [  462.833503]  do_idle+0x1bf/0x200
> [  462.833506]  cpu_startup_entry+0x6a/0x70
> [  462.833510]  start_secondary+0x17f/0x1c0
> [  462.833513]  secondary_startup_64+0xa4/0xb0
>
> Something changed between 4.17.12 and 4.18; after bisecting the problem, I
> got the following first bad commit:
>
> commit 88078d98d1bb085d72af8437707279e203524fa5
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Wed Apr 18 11:43:15 2018 -0700
>
>     net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
>
>     After working on IP defragmentation lately, I found that some large
>     packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
>     zero paddings on the last (small) fragment.
>
>     While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
>     to CHECKSUM_NONE, forcing a full csum validation, even if all prior
>     fragments had CHECKSUM_COMPLETE set.
>
>     We can instead compute the checksum of the part we are trimming,
>     usually smaller than the part we keep.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>

Thanks for bisecting!

This commit is known to expose some NIC/driver bugs.

Look at commit 12b03558cef6d655d0d394f5e98a6fd07c1f6c0f
("net: sungem: fix rx checksum support") for one driver needing a fix.

I assume SKY2_HW_NEW_LE is not set on your NIC?
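
For anyone following along, the arithmetic behind that commit can be shown with a small user-space sketch. This is not the kernel code: csum16(), csum16_sub() and csum16_trim() are names made up for this illustration, the kernel works on 32-bit partial sums, and the sketch assumes the kept length is even so that the trimmed tail starts on a 16-bit boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement sum over len bytes (not inverted), big-endian words. */
static uint16_t csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (i < len)			/* odd trailing byte, padded with zero */
		sum += (uint32_t)buf[i] << 8;
	while (sum >> 16)		/* fold carries back into the low 16 bits */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement subtraction: a - b is computed as a + ~b. */
static uint16_t csum16_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Derive the sum of buf[0..keep) from the sum of buf[0..len) by summing
 * only the trimmed tail. Assumption of this sketch: keep is even.
 */
static uint16_t csum16_trim(uint16_t full, const uint8_t *buf,
			    size_t len, size_t keep)
{
	assert(keep <= len && keep % 2 == 0);
	return csum16_sub(full, csum16(buf + keep, len - keep));
}
```

If the sum reported for the full buffer does not cover exactly the bytes in the buffer, the subtraction gives a wrong result for the kept part, which is presumably the kind of driver bug this change brings to light.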

* [bpf-next PATCH v2 2/2] bpf: bpftool, add flag to allow non-compat map definitions
From: John Fastabend @ 2018-10-15 15:17 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev
In-Reply-To: <20181015151336.9258.37606.stgit@john-Precision-Tower-5810>

Multiple map definition structures exist, and users may have non-zero
fields in their definition that are not recognized by bpftool and
libbpf. The default behavior is then to fail loading the map. Although
this is a good default, users may still want to load the map for
debugging or other reasons. This patch adds a --mapcompat flag that
overrides the default behavior and allows loading the map even when it
has additional non-zero fields.

For now the only user is 'bpftool prog'; we can switch over other
subcommands as needed. The library now exposes an API that consumes a
flags field, but I kept the original API around as well in case users
of the API don't want to expose this. The flags field is an int in
case we need more control over how the API call handles
errors/features/etc. in the future.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/bpf/bpftool/main.c |    7 ++++++-
 tools/bpf/bpftool/main.h |    3 ++-
 tools/bpf/bpftool/prog.c |    2 +-
 tools/lib/bpf/bpf.h      |    3 +++
 tools/lib/bpf/libbpf.c   |   27 ++++++++++++++++++---------
 tools/lib/bpf/libbpf.h   |    2 ++
 6 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/tools/bpf/bpftool/main.c b/tools/bpf/bpftool/main.c
index 79dc3f1..828dde3 100644
--- a/tools/bpf/bpftool/main.c
+++ b/tools/bpf/bpftool/main.c
@@ -55,6 +55,7 @@
 bool pretty_output;
 bool json_output;
 bool show_pinned;
+int bpf_flags;
 struct pinned_obj_table prog_table;
 struct pinned_obj_table map_table;
 
@@ -341,6 +342,7 @@ int main(int argc, char **argv)
 		{ "pretty",	no_argument,	NULL,	'p' },
 		{ "version",	no_argument,	NULL,	'V' },
 		{ "bpffs",	no_argument,	NULL,	'f' },
+		{ "mapcompat",	no_argument,	NULL,	'm' },
 		{ 0 }
 	};
 	int opt, ret;
@@ -355,7 +357,7 @@ int main(int argc, char **argv)
 	hash_init(map_table.table);
 
 	opterr = 0;
-	while ((opt = getopt_long(argc, argv, "Vhpjf",
+	while ((opt = getopt_long(argc, argv, "Vhpjfm",
 				  options, NULL)) >= 0) {
 		switch (opt) {
 		case 'V':
@@ -379,6 +381,9 @@ int main(int argc, char **argv)
 		case 'f':
 			show_pinned = true;
 			break;
+		case 'm':
+			bpf_flags = MAPS_RELAX_COMPAT;
+			break;
 		default:
 			p_err("unrecognized option '%s'", argv[optind - 1]);
 			if (json_output)
diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 40492cd..91fd697 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -74,7 +74,7 @@
 #define HELP_SPEC_PROGRAM						\
 	"PROG := { id PROG_ID | pinned FILE | tag PROG_TAG }"
 #define HELP_SPEC_OPTIONS						\
-	"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-f|--bpffs} }"
+	"OPTIONS := { {-j|--json} [{-p|--pretty}] | {-f|--bpffs} | {-m|--mapcompat} }"
 #define HELP_SPEC_MAP							\
 	"MAP := { id MAP_ID | pinned FILE }"
 
@@ -89,6 +89,7 @@ enum bpf_obj_type {
 extern json_writer_t *json_wtr;
 extern bool json_output;
 extern bool show_pinned;
+extern int bpf_flags;
 extern struct pinned_obj_table prog_table;
 extern struct pinned_obj_table map_table;
 
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 99ab42c..3350289 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -908,7 +908,7 @@ static int do_load(int argc, char **argv)
 		}
 	}
 
-	obj = bpf_object__open_xattr(&attr);
+	obj = __bpf_object__open_xattr(&attr, bpf_flags);
 	if (IS_ERR_OR_NULL(obj)) {
 		p_err("failed to open object file");
 		goto err_free_reuse_maps;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 87520a8..69a4d40 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -69,6 +69,9 @@ struct bpf_load_program_attr {
 	__u32 prog_ifindex;
 };
 
+/* Flags to direct loading requirements */
+#define MAPS_RELAX_COMPAT	0x01
+
 /* Recommend log buffer size */
 #define BPF_LOG_BUF_SIZE (256 * 1024)
 int bpf_load_program_xattr(const struct bpf_load_program_attr *load_attr,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 176cf55..bd71efc 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -562,8 +562,9 @@ static int compare_bpf_map(const void *_a, const void *_b)
 }
 
 static int
-bpf_object__init_maps(struct bpf_object *obj)
+bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
+	bool strict = !(flags & MAPS_RELAX_COMPAT);
 	int i, map_idx, map_def_sz, nr_maps = 0;
 	Elf_Scn *scn;
 	Elf_Data *data;
@@ -685,7 +686,8 @@ static int compare_bpf_map(const void *_a, const void *_b)
 						   "has unrecognized, non-zero "
 						   "options\n",
 						   obj->path, map_name);
-					return -EINVAL;
+					if (strict)
+						return -EINVAL;
 				}
 			}
 			memcpy(&obj->maps[map_idx].def, def,
@@ -716,7 +718,7 @@ static bool section_have_execinstr(struct bpf_object *obj, int idx)
 	return false;
 }
 
-static int bpf_object__elf_collect(struct bpf_object *obj)
+static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 {
 	Elf *elf = obj->efile.elf;
 	GElf_Ehdr *ep = &obj->efile.ehdr;
@@ -843,7 +845,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 		return LIBBPF_ERRNO__FORMAT;
 	}
 	if (obj->efile.maps_shndx >= 0) {
-		err = bpf_object__init_maps(obj);
+		err = bpf_object__init_maps(obj, flags);
 		if (err)
 			goto out;
 	}
@@ -1515,7 +1517,7 @@ static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 
 static struct bpf_object *
 __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
-		   bool needs_kver)
+		   bool needs_kver, int flags)
 {
 	struct bpf_object *obj;
 	int err;
@@ -1531,7 +1533,7 @@ static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 
 	CHECK_ERR(bpf_object__elf_init(obj), err, out);
 	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
-	CHECK_ERR(bpf_object__elf_collect(obj), err, out);
+	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
 	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
 	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
 
@@ -1542,7 +1544,8 @@ static int bpf_object__validate(struct bpf_object *obj, bool needs_kver)
 	return ERR_PTR(err);
 }
 
-struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
+struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
+					    int flags)
 {
 	/* param validation */
 	if (!attr->file)
@@ -1551,7 +1554,13 @@ struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
 	pr_debug("loading %s\n", attr->file);
 
 	return __bpf_object__open(attr->file, NULL, 0,
-				  bpf_prog_type__needs_kver(attr->prog_type));
+				  bpf_prog_type__needs_kver(attr->prog_type),
+				  flags);
+}
+
+struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr)
+{
+	return __bpf_object__open_xattr(attr, 0);
 }
 
 struct bpf_object *bpf_object__open(const char *path)
@@ -1584,7 +1593,7 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf,
 	pr_debug("loading object '%s' from buffer\n",
 		 name);
 
-	return __bpf_object__open(name, obj_buf, obj_buf_sz, true);
+	return __bpf_object__open(name, obj_buf, obj_buf_sz, true, true);
 }
 
 int bpf_object__unload(struct bpf_object *obj)
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 8af8d36..7e9c801 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -61,6 +61,8 @@ struct bpf_object_open_attr {
 
 struct bpf_object *bpf_object__open(const char *path);
 struct bpf_object *bpf_object__open_xattr(struct bpf_object_open_attr *attr);
+struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr,
+					    int flags);
 struct bpf_object *bpf_object__open_buffer(void *obj_buf,
 					   size_t obj_buf_sz,
 					   const char *name);

* [bpf-next PATCH v2 1/2] bpf: bpftool, add support for attaching programs to maps
From: John Fastabend @ 2018-10-15 15:17 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev
In-Reply-To: <20181015151336.9258.37606.stgit@john-Precision-Tower-5810>

Sock map/hash introduce support for attaching programs to maps. To
date I have been doing this with custom tooling, but this is less than
ideal as we shift to using bpftool as the single CLI for our BPF uses.
This patch adds new subcommands 'attach' and 'detach' to the 'prog'
command to attach programs to maps and then detach them.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 tools/bpf/bpftool/Documentation/bpftool-prog.rst |   11 ++
 tools/bpf/bpftool/Documentation/bpftool.rst      |    2 
 tools/bpf/bpftool/bash-completion/bpftool        |   19 ++++
 tools/bpf/bpftool/prog.c                         |   99 ++++++++++++++++++++++
 4 files changed, 128 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 64156a1..12c8030 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -25,6 +25,8 @@ MAP COMMANDS
 |	**bpftool** **prog dump jited**  *PROG* [{**file** *FILE* | **opcodes**}]
 |	**bpftool** **prog pin** *PROG* *FILE*
 |	**bpftool** **prog load** *OBJ* *FILE* [**type** *TYPE*] [**map** {**idx** *IDX* | **name** *NAME*} *MAP*] [**dev** *NAME*]
+|       **bpftool** **prog attach** *PROG* *ATTACH_TYPE* *MAP*
+|       **bpftool** **prog detach** *PROG* *ATTACH_TYPE* *MAP*
 |	**bpftool** **prog help**
 |
 |	*MAP* := { **id** *MAP_ID* | **pinned** *FILE* }
@@ -37,6 +39,7 @@ MAP COMMANDS
 |		**cgroup/bind4** | **cgroup/bind6** | **cgroup/post_bind4** | **cgroup/post_bind6** |
 |		**cgroup/connect4** | **cgroup/connect6** | **cgroup/sendmsg4** | **cgroup/sendmsg6**
 |	}
+|       *ATTACH_TYPE* := { **msg_verdict** | **stream_verdict** | **stream_parser** }
 
 
 DESCRIPTION
@@ -90,6 +93,14 @@ DESCRIPTION
 
 		  Note: *FILE* must be located in *bpffs* mount.
 
+        **bpftool prog attach** *PROG* *ATTACH_TYPE* *MAP*
+                  Attach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
+                  to the map *MAP*.
+
+        **bpftool prog detach** *PROG* *ATTACH_TYPE* *MAP*
+                  Detach bpf program *PROG* (with type specified by *ATTACH_TYPE*)
+                  from the map *MAP*.
+
 	**bpftool prog help**
 		  Print short help message.
 
diff --git a/tools/bpf/bpftool/Documentation/bpftool.rst b/tools/bpf/bpftool/Documentation/bpftool.rst
index 8dda77d..25c0872 100644
--- a/tools/bpf/bpftool/Documentation/bpftool.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool.rst
@@ -26,7 +26,7 @@ SYNOPSIS
 	| **pin** | **event_pipe** | **help** }
 
 	*PROG-COMMANDS* := { **show** | **list** | **dump jited** | **dump xlated** | **pin**
-	| **load** | **help** }
+	| **load** | **attach** | **detach** | **help** }
 
 	*CGROUP-COMMANDS* := { **show** | **list** | **attach** | **detach** | **help** }
 
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index df1060b..0826519 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -292,6 +292,23 @@ _bpftool()
                     fi
                     return 0
                     ;;
+                attach|detach)
+                    if [[ ${#words[@]} == 7 ]]; then
+                        COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
+                        return 0
+                    fi
+
+                    if [[ ${#words[@]} == 6 ]]; then
+                        COMPREPLY=( $( compgen -W "msg_verdict stream_verdict stream_parser" -- "$cur" ) )
+                        return 0
+                    fi
+
+                    if [[ $prev == "$command" ]]; then
+                        COMPREPLY=( $( compgen -W "id pinned" -- "$cur" ) )
+                        return 0
+                    fi
+                    return 0
+                    ;;
                 load)
                     local obj
 
@@ -347,7 +364,7 @@ _bpftool()
                     ;;
                 *)
                     [[ $prev == $object ]] && \
-                        COMPREPLY=( $( compgen -W 'dump help pin load \
+                        COMPREPLY=( $( compgen -W 'dump help pin attach detach load \
                             show list' -- "$cur" ) )
                     ;;
             esac
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index b1cd3bc..99ab42c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -77,6 +77,26 @@
 	[BPF_PROG_TYPE_FLOW_DISSECTOR]	= "flow_dissector",
 };
 
+static const char * const attach_type_strings[] = {
+	[BPF_SK_SKB_STREAM_PARSER] = "stream_parser",
+	[BPF_SK_SKB_STREAM_VERDICT] = "stream_verdict",
+	[BPF_SK_MSG_VERDICT] = "msg_verdict",
+	[__MAX_BPF_ATTACH_TYPE] = NULL,
+};
+
+enum bpf_attach_type parse_attach_type(const char *str)
+{
+	enum bpf_attach_type type;
+
+	for (type = 0; type < __MAX_BPF_ATTACH_TYPE; type++) {
+		if (attach_type_strings[type] &&
+		    is_prefix(str, attach_type_strings[type]))
+			return type;
+	}
+
+	return __MAX_BPF_ATTACH_TYPE;
+}
+
 static void print_boot_time(__u64 nsecs, char *buf, unsigned int size)
 {
 	struct timespec real_time_ts, boot_time_ts;
@@ -697,6 +717,77 @@ int map_replace_compar(const void *p1, const void *p2)
 	return a->idx - b->idx;
 }
 
+static int do_attach(int argc, char **argv)
+{
+	enum bpf_attach_type attach_type;
+	int err, mapfd, progfd;
+
+	if (!REQ_ARGS(5)) {
+		p_err("too few parameters for map attach");
+		return -EINVAL;
+	}
+
+	progfd = prog_parse_fd(&argc, &argv);
+	if (progfd < 0)
+		return progfd;
+
+	attach_type = parse_attach_type(*argv);
+	if (attach_type == __MAX_BPF_ATTACH_TYPE) {
+		p_err("invalid attach type");
+		return -EINVAL;
+	}
+	NEXT_ARG();
+
+	mapfd = map_parse_fd(&argc, &argv);
+	if (mapfd < 0)
+		return mapfd;
+
+	err = bpf_prog_attach(progfd, mapfd, attach_type, 0);
+	if (err) {
+		p_err("failed prog attach to map");
+		return -EINVAL;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+	return 0;
+}
+
+static int do_detach(int argc, char **argv)
+{
+	enum bpf_attach_type attach_type;
+	int err, mapfd, progfd;
+
+	if (!REQ_ARGS(5)) {
+		p_err("too few parameters for map detach");
+		return -EINVAL;
+	}
+
+	progfd = prog_parse_fd(&argc, &argv);
+	if (progfd < 0)
+		return progfd;
+
+	attach_type = parse_attach_type(*argv);
+	if (attach_type == __MAX_BPF_ATTACH_TYPE) {
+		p_err("invalid attach type");
+		return -EINVAL;
+	}
+	NEXT_ARG();
+
+	mapfd = map_parse_fd(&argc, &argv);
+	if (mapfd < 0)
+		return mapfd;
+
+	err = bpf_prog_detach2(progfd, mapfd, attach_type);
+	if (err) {
+		p_err("failed prog detach from map");
+		return -EINVAL;
+	}
+
+	if (json_output)
+		jsonw_null(json_wtr);
+	return 0;
+}
 static int do_load(int argc, char **argv)
 {
 	enum bpf_attach_type expected_attach_type;
@@ -942,6 +1033,8 @@ static int do_help(int argc, char **argv)
 		"       %s %s pin   PROG FILE\n"
 		"       %s %s load  OBJ  FILE [type TYPE] [dev NAME] \\\n"
 		"                         [map { idx IDX | name NAME } MAP]\n"
+		"       %s %s attach PROG ATTACH_TYPE MAP\n"
+		"       %s %s detach PROG ATTACH_TYPE MAP\n"
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_MAP "\n"
@@ -953,10 +1046,12 @@ static int do_help(int argc, char **argv)
 		"                 cgroup/bind4 | cgroup/bind6 | cgroup/post_bind4 |\n"
 		"                 cgroup/post_bind6 | cgroup/connect4 | cgroup/connect6 |\n"
 		"                 cgroup/sendmsg4 | cgroup/sendmsg6 }\n"
+		"       ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser }\n"
 		"       " HELP_SPEC_OPTIONS "\n"
 		"",
 		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
-		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2]);
+		bin_name, argv[-2], bin_name, argv[-2], bin_name, argv[-2],
+		bin_name, argv[-2], bin_name, argv[-2]);
 
 	return 0;
 }
@@ -968,6 +1063,8 @@ static int do_help(int argc, char **argv)
 	{ "dump",	do_dump },
 	{ "pin",	do_pin },
 	{ "load",	do_load },
+	{ "attach",	do_attach },
+	{ "detach",	do_detach },
 	{ 0 }
 };
 

* [bpf-next PATCH v2 0/2] bpftool support for sockmap use cases
From: John Fastabend @ 2018-10-15 15:17 UTC (permalink / raw)
  To: jakub.kicinski, ast, daniel; +Cc: netdev

The first patch adds support for attaching programs to maps. This is
needed to support sock{map|hash} use from bpftool. Currently, I carry
around custom code to do this so doing it using standard bpftool will
be great.
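
As a rough sketch of how an ATTACH_TYPE keyword on the command line can be resolved to an enum value by prefix search over a sparse string table: the enum, the table and is_prefix() below are local stand-ins written for this example, not the UAPI enum or bpftool's own helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in enum; the real values come from the kernel UAPI header. */
enum attach_type {
	ATTACH_UNNAMED,		/* entry with no keyword, keeps the table sparse */
	ATTACH_STREAM_PARSER,
	ATTACH_STREAM_VERDICT,
	ATTACH_MSG_VERDICT,
	ATTACH_MAX,
};

static const char * const attach_names[ATTACH_MAX] = {
	[ATTACH_STREAM_PARSER]  = "stream_parser",
	[ATTACH_STREAM_VERDICT] = "stream_verdict",
	[ATTACH_MSG_VERDICT]    = "msg_verdict",
};

/* True when pfx is a non-empty prefix of str (choice made for this sketch). */
static int is_prefix(const char *pfx, const char *str)
{
	size_t n;

	assert(pfx != NULL && str != NULL);
	n = strlen(pfx);
	return n > 0 && n <= strlen(str) && strncmp(pfx, str, n) == 0;
}

/* Return the first table entry that word abbreviates, or ATTACH_MAX. */
static enum attach_type parse_attach(const char *word)
{
	int t;

	for (t = 0; t < ATTACH_MAX; t++) {
		/* Skip holes in the table before comparing. */
		if (attach_names[t] && is_prefix(word, attach_names[t]))
			return (enum attach_type)t;
	}
	return ATTACH_MAX;
}
```

Because the first matching entry wins, an ambiguous abbreviation such as "stream" resolves to whichever name comes first in the table, and a keyword that is not a prefix of any table entry is rejected.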

The second patch adds a compat mode to ignore non-zero entries in
the map def. This allows using bpftool with maps that have extra
fields that the user knows can be ignored. This is needed to work
correctly with maps being loaded by other tools or directly via
syscalls.
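
A minimal sketch of the strict versus relaxed check, assuming the loader only understands a fixed-size prefix of each map definition. struct known_map_def and load_map_def() are hypothetical names for this example; only MAPS_RELAX_COMPAT and its value come from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAPS_RELAX_COMPAT 0x01	/* value taken from the patch */

/* Hypothetical stand-in for the fields the loader understands. */
struct known_map_def {
	unsigned int type;
	unsigned int key_size;
	unsigned int value_size;
	unsigned int max_entries;
};

/*
 * Copy the recognized prefix of a definition that is def_sz bytes long.
 * Returns 0 on success, or -EINVAL when bytes beyond the recognized
 * prefix are non-zero and MAPS_RELAX_COMPAT is not set in flags.
 */
static int load_map_def(struct known_map_def *out, const void *def,
			size_t def_sz, int flags)
{
	int strict = !(flags & MAPS_RELAX_COMPAT);
	const unsigned char *b = def;
	size_t i;

	assert(out != NULL && def != NULL);
	memset(out, 0, sizeof(*out));

	/* Scan only the part of the definition we do not recognize. */
	for (i = sizeof(*out); i < def_sz; i++) {
		if (b[i] != 0 && strict)
			return -EINVAL;
	}

	memcpy(out, def, def_sz < sizeof(*out) ? def_sz : sizeof(*out));
	return 0;
}
```

Keeping flags as an int leaves room for further bits later without changing the function signature again.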

---

John Fastabend (2):
      bpf: bpftool, add support for attaching programs to maps
      bpf: bpftool, add flag to allow non-compat map definitions

 tools/bpf/bpftool/Documentation/bpftool-prog.rst |   11 ++
 tools/bpf/bpftool/Documentation/bpftool.rst      |    2 
 tools/bpf/bpftool/bash-completion/bpftool        |   19 ++++
 tools/bpf/bpftool/main.c                         |    7 +-
 tools/bpf/bpftool/main.h                         |    3 -
 tools/bpf/bpftool/prog.c                         |  101 ++++++++++++++++++++++
 tools/lib/bpf/bpf.h                              |    3 +
 tools/lib/bpf/libbpf.c                           |   27 ++++--
 tools/lib/bpf/libbpf.h                           |    2 
 9 files changed, 160 insertions(+), 15 deletions(-)

* Fw: [Bug 201423] New: eth0: hw csum failure
From: Stephen Hemminger @ 2018-10-15 15:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev



Begin forwarded message:

Date: Sun, 14 Oct 2018 10:42:48 +0000
From: bugzilla-daemon@bugzilla.kernel.org
To: stephen@networkplumber.org
Subject: [Bug 201423] New: eth0: hw csum failure


https://bugzilla.kernel.org/show_bug.cgi?id=201423

            Bug ID: 201423
           Summary: eth0: hw csum failure
           Product: Networking
           Version: 2.5
    Kernel Version: 4.19.0-rc7
          Hardware: Intel
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Other
          Assignee: stephen@networkplumber.org
          Reporter: rossi.f@inwind.it
        Regression: No

I have a P6T DELUXE V2 motherboard and using the sky2 driver for the ethernet
ports. I get the following error message:

[  433.727397] eth0: hw csum failure
[  433.727406] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
[  433.727406] Hardware name: System manufacturer System Product Name/P6T
DELUXE V2, BIOS 1202    12/22/2010
[  433.727407] Call Trace:
[  433.727409]  <IRQ>
[  433.727415]  dump_stack+0x46/0x5b
[  433.727419]  __skb_checksum_complete+0xb0/0xc0
[  433.727423]  tcp_v4_rcv+0x528/0xb60
[  433.727426]  ? ipt_do_table+0x2d0/0x400
[  433.727429]  ip_local_deliver_finish+0x5a/0x110
[  433.727430]  ip_local_deliver+0xe1/0xf0
[  433.727431]  ? ip_sublist_rcv_finish+0x60/0x60
[  433.727432]  ip_rcv+0xca/0xe0
[  433.727434]  ? ip_rcv_finish_core.isra.0+0x300/0x300
[  433.727436]  __netif_receive_skb_one_core+0x4b/0x70
[  433.727438]  netif_receive_skb_internal+0x4e/0x130
[  433.727439]  napi_gro_receive+0x6a/0x80
[  433.727442]  sky2_poll+0x707/0xd20
[  433.727446]  ? rcu_check_callbacks+0x1b4/0x900
[  433.727447]  net_rx_action+0x237/0x380
[  433.727449]  __do_softirq+0xdc/0x1e0
[  433.727452]  irq_exit+0xa9/0xb0
[  433.727453]  do_IRQ+0x45/0xc0
[  433.727455]  common_interrupt+0xf/0xf
[  433.727456]  </IRQ>
[  433.727459] RIP: 0010:cpuidle_enter_state+0x124/0x200
[  433.727461] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
[  433.727462] RSP: 0000:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
ffffffffffffffde
[  433.727463] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
000000000000001f
[  433.727464] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
0000000000000000
[  433.727465] RBP: ffff880237b263a0 R08: 0000000000000714 R09:
000000650512105d
[  433.727465] R10: 00000000ffffffff R11: 0000000000000342 R12:
00000064fc2a8b1c
[  433.727466] R13: 00000064fc25b35f R14: 0000000000000004 R15:
ffffffff8204af20
[  433.727468]  ? cpuidle_enter_state+0x119/0x200
[  433.727471]  do_idle+0x1bf/0x200
[  433.727473]  cpu_startup_entry+0x6a/0x70
[  433.727475]  start_secondary+0x17f/0x1c0
[  433.727476]  secondary_startup_64+0xa4/0xb0
[  441.662954] eth0: hw csum failure
[  441.662959] CPU: 4 PID: 4347 Comm: radeon_cs:0 Not tainted 4.19.0-rc7 #19
[  441.662960] Hardware name: System manufacturer System Product Name/P6T
DELUXE V2, BIOS 1202    12/22/2010
[  441.662960] Call Trace:
[  441.662963]  <IRQ>
[  441.662968]  dump_stack+0x46/0x5b
[  441.662972]  __skb_checksum_complete+0xb0/0xc0
[  441.662975]  tcp_v4_rcv+0x528/0xb60
[  441.662979]  ? ipt_do_table+0x2d0/0x400
[  441.662981]  ip_local_deliver_finish+0x5a/0x110
[  441.662983]  ip_local_deliver+0xe1/0xf0
[  441.662985]  ? ip_sublist_rcv_finish+0x60/0x60
[  441.662986]  ip_rcv+0xca/0xe0
[  441.662988]  ? ip_rcv_finish_core.isra.0+0x300/0x300
[  441.662990]  __netif_receive_skb_one_core+0x4b/0x70
[  441.662993]  netif_receive_skb_internal+0x4e/0x130
[  441.662994]  napi_gro_receive+0x6a/0x80
[  441.662998]  sky2_poll+0x707/0xd20
[  441.663000]  net_rx_action+0x237/0x380
[  441.663002]  __do_softirq+0xdc/0x1e0
[  441.663005]  irq_exit+0xa9/0xb0
[  441.663007]  do_IRQ+0x45/0xc0
[  441.663009]  common_interrupt+0xf/0xf
[  441.663010]  </IRQ>
[  441.663012] RIP: 0010:merge+0x22/0xb0
[  441.663014] Code: c3 31 c0 c3 90 90 90 90 41 56 41 55 41 54 55 48 89 d5 53
48 89 cb 48 83 ec 18 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 <48> 85 c9
74 70 48 85 d2 74 6b 49 89 fd 49 89 f6 49 89 e4 eb 14 48
[  441.663015] RSP: 0018:ffffc9000090b988 EFLAGS: 00000246 ORIG_RAX:
ffffffffffffffde
[  441.663017] RAX: 0000000000000000 RBX: ffff88021ab2d408 RCX:
ffff88021ab2d408
[  441.663018] RDX: ffff88021ab2d388 RSI: ffffffffa021c440 RDI:
0000000000000000
[  441.663019] RBP: ffff88021ab2d388 R08: 0000000000005ecf R09:
0000000000008500
[  441.663020] R10: ffffea000877ec00 R11: ffff880236803500 R12:
ffffffffa021c440
[  441.663021] R13: ffff88021ab2d448 R14: 0000000000000004 R15:
ffffc9000090b9e0
[  441.663048]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
[  441.663063]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
[  441.663065]  ? merge+0x57/0xb0
[  441.663080]  ? radeon_irq_kms_set_irq_n_enabled+0x120/0x120 [radeon]
[  441.663082]  list_sort+0x8b/0x230
[  441.663094]  radeon_cs_parser_fini+0xdf/0x110 [radeon]
[  441.663110]  radeon_cs_ioctl+0x2a4/0x710 [radeon]
[  441.663113]  ? __switch_to_asm+0x34/0x70
[  441.663114]  ? __switch_to_asm+0x40/0x70
[  441.663130]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
[  441.663141]  drm_ioctl_kernel+0xa3/0xe0 [drm]
[  441.663149]  drm_ioctl+0x2e2/0x380 [drm]
[  441.663164]  ? radeon_cs_parser_init+0x20/0x20 [radeon]
[  441.663168]  ? page_add_new_anon_rmap+0x42/0x70
[  441.663171]  do_vfs_ioctl+0x9a/0x600
[  441.663173]  ksys_ioctl+0x35/0x60
[  441.663175]  __x64_sys_ioctl+0x11/0x20
[  441.663177]  do_syscall_64+0x3d/0xf0
[  441.663179]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  441.663180] RIP: 0033:0x7f9377377f37
[  441.663182] Code: 00 00 00 75 0c 48 c7 c0 ff ff ff ff 48 83 c4 18 c3 e8 ad
db 01 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 b8 10 00 00 00 0f 05 <48> 3d 01
f0 ff ff 73 01 c3 48 8b 0d 21 4f 2c 00 f7 d8 64 89 01 48
[  441.663183] RSP: 002b:00007f92c3130d28 EFLAGS: 00000246 ORIG_RAX:
0000000000000010
[  441.663185] RAX: ffffffffffffffda RBX: 0000564498327ec0 RCX:
00007f9377377f37
[  441.663186] RDX: 0000564498337ec8 RSI: 00000000c0206466 RDI:
0000000000000010
[  441.663186] RBP: 0000564498337ec8 R08: 0000000000000000 R09:
0000000000000000
[  441.663187] R10: 0000000000000000 R11: 0000000000000246 R12:
00000000c0206466
[  441.663188] R13: 0000000000000010 R14: 0000000000000000 R15:
0000564497a38120
[  462.833418] eth0: hw csum failure
[  462.833428] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.19.0-rc7 #19
[  462.833429] Hardware name: System manufacturer System Product Name/P6T
DELUXE V2, BIOS 1202    12/22/2010
[  462.833429] Call Trace:
[  462.833432]  <IRQ>
[  462.833438]  dump_stack+0x46/0x5b
[  462.833442]  __skb_checksum_complete+0xb0/0xc0
[  462.833446]  tcp_v4_rcv+0x528/0xb60
[  462.833449]  ? ipt_do_table+0x2d0/0x400
[  462.833452]  ip_local_deliver_finish+0x5a/0x110
[  462.833454]  ip_local_deliver+0xe1/0xf0
[  462.833455]  ? ip_sublist_rcv_finish+0x60/0x60
[  462.833457]  ip_rcv+0xca/0xe0
[  462.833459]  ? ip_rcv_finish_core.isra.0+0x300/0x300
[  462.833461]  __netif_receive_skb_one_core+0x4b/0x70
[  462.833464]  netif_receive_skb_internal+0x4e/0x130
[  462.833466]  napi_gro_receive+0x6a/0x80
[  462.833469]  sky2_poll+0x707/0xd20
[  462.833471]  net_rx_action+0x237/0x380
[  462.833474]  __do_softirq+0xdc/0x1e0
[  462.833477]  irq_exit+0xa9/0xb0
[  462.833479]  do_IRQ+0x45/0xc0
[  462.833481]  common_interrupt+0xf/0xf
[  462.833482]  </IRQ>
[  462.833486] RIP: 0010:cpuidle_enter_state+0x124/0x200
[  462.833488] Code: 53 60 89 c3 e8 dd 90 ad ff 65 8b 3d 96 58 a7 7e e8 d1 8f
ad ff 31 ff 49 89 c4 e8 27 99 ad ff fb 48 ba cf f7 53 e3 a5 9b c4 20 <4c> 89 e1
4c 29 e9 48 89 c8 48 c1 f9 3f 48 f7 ea b8 ff ff ff 7f 48
[  462.833489] RSP: 0018:ffffc900000a3e98 EFLAGS: 00000282 ORIG_RAX:
ffffffffffffffde
[  462.833491] RAX: ffff880237b1f280 RBX: 0000000000000004 RCX:
000000000000001f
[  462.833492] RDX: 20c49ba5e353f7cf RSI: 000000002fe419c1 RDI:
0000000000000000
[  462.833493] RBP: ffff880237b263a0 R08: 0000000000000000 R09:
0000000000000000
[  462.833494] R10: 00000000ffffffff R11: 0000000000000273 R12:
0000006bc3052131
[  462.833495] R13: 0000006bc2f99f57 R14: 0000000000000004 R15:
ffffffff8204af20
[  462.833498]  ? cpuidle_enter_state+0x119/0x200
[  462.833503]  do_idle+0x1bf/0x200
[  462.833506]  cpu_startup_entry+0x6a/0x70
[  462.833510]  start_secondary+0x17f/0x1c0
[  462.833513]  secondary_startup_64+0xa4/0xb0

Something changed between 4.17.12 and 4.18; after bisecting the problem I
got the following first bad commit:

commit 88078d98d1bb085d72af8437707279e203524fa5
Author: Eric Dumazet <edumazet@google.com>
Date:   Wed Apr 18 11:43:15 2018 -0700

    net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

    After working on IP defragmentation lately, I found that some large
    packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
    zero paddings on the last (small) fragment.

    While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
    to CHECKSUM_NONE, forcing a full csum validation, even if all prior
    fragments had CHECKSUM_COMPLETE set.

    We can instead compute the checksum of the part we are trimming,
    usually smaller than the part we keep.

    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
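
For readers following the bisect result, here is a minimal userspace sketch of the idea the commit message describes: if the ones' complement sum of the whole buffer is already known, the sum of the part that is kept can be derived by subtracting the sum of the trimmed tail, without re-reading the kept bytes. This is not the kernel code. All names ending in _demo are hypothetical, the trim offset is assumed to be even, and 0x0000 and 0xffff must be treated as the same value in ones' complement arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Add the buffer to a 32-bit accumulator as 16-bit big-endian words.
 * An odd trailing byte is padded with a zero byte. */
static uint32_t csum_partial_demo(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the accumulator to 16 bits with end-around carry. */
static uint16_t csum_fold_demo(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement subtraction: a - b is a + ~b with end-around carry. */
static uint16_t csum_sub_demo(uint16_t a, uint16_t b)
{
	return csum_fold_demo((uint32_t)a + (uint16_t)~b);
}

/* Sum of the first 'keep' bytes, derived from the sum of the full buffer
 * by summing only the trimmed tail. 'keep' must be even in this sketch. */
static uint16_t csum_after_trim_demo(const uint8_t *buf, size_t len,
				     size_t keep, uint16_t full)
{
	uint16_t tail = csum_fold_demo(csum_partial_demo(buf + keep,
							 len - keep, 0));

	return csum_sub_demo(full, tail);
}
```

Only the tail is walked in csum_after_trim_demo(), which is the saving the commit message refers to when the trimmed part is smaller than the part that is kept.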

-- 
You are receiving this mail because:
You are the assignee for the bug.

^ permalink raw reply

* [PATCH] Bluetooth: Fix locking in bt_accept_enqueue() for BH context
From: Matthias Kaehlcke @ 2018-10-15 22:39 UTC (permalink / raw)
  To: Marcel Holtmann, Johan Hedberg, David S . Miller, Dean Jenkins
  Cc: linux-bluetooth, netdev, linux-kernel, Konstantin Khlebnikov,
	Balakrishna Godavarthi, Douglas Anderson, Dmitry Grinberg,
	Matthias Kaehlcke

With commit e16337622016 ("Bluetooth: Handle bt_accept_enqueue() socket
atomically") lock_sock[_nested]() is used to acquire the socket lock
before manipulating the socket. lock_sock[_nested]() may block, which
is problematic since bt_accept_enqueue() can be called in bottom half
context (e.g. from rfcomm_connect_ind()).

The socket API provides bh_lock_sock[_nested]() to acquire the socket
lock in bottom half context. Check the context in bt_accept_enqueue()
and use the appropriate locking mechanism for the context.

Fixes: e16337622016 ("Bluetooth: Handle bt_accept_enqueue() socket atomically")
Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
---
Not sure if this is the correct solution, it's certainly not elegant and
checkpatch.pl complains that in_atomic() shouldn't be used outside of
core kernel code. I'm open to other suggestions :)

 net/bluetooth/af_bluetooth.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index deacc52d7ff1..0f0540dbb44a 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -159,10 +159,20 @@ void bt_accept_enqueue(struct sock *parent, struct sock *sk)
 	BT_DBG("parent %p, sk %p", parent, sk);
 
 	sock_hold(sk);
-	lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
+
+	if (in_atomic())
+		bh_lock_sock_nested(sk);
+	else
+		lock_sock_nested(sk, SINGLE_DEPTH_NESTING);
+
 	list_add_tail(&bt_sk(sk)->accept_q, &bt_sk(parent)->accept_q);
 	bt_sk(sk)->parent = parent;
-	release_sock(sk);
+
+	if (in_atomic())
+		bh_unlock_sock(sk);
+	else
+		release_sock(sk);
+
 	parent->sk_ack_backlog++;
 }
 EXPORT_SYMBOL(bt_accept_enqueue);
-- 
2.19.1.331.ge82ca0e54c-goog

^ permalink raw reply related

* Re: BBR and TCP internal pacing causing interrupt storm with pfifo_fast
From: Eric Dumazet @ 2018-10-15 14:50 UTC (permalink / raw)
  To: Gasper Zejn; +Cc: Eric Dumazet, Kevin Yang, netdev
In-Reply-To: <252ea882-bcd2-b205-7d68-541e88b5d617@gmail.com>

On Mon, Oct 15, 2018 at 3:26 AM Gasper Zejn <zelo.zejn@gmail.com> wrote:
>
>
> I've tried to isolate the issue as best I could. There seems to be an
> issue if the TCP socket has keepalive set and send queue is not empty
> and the route goes away.
>
> https://github.com/zejn/bbr_pfifo_interrupts_issue
>
> Hope this helps,
> Gasper

This is awesome Gasper, I will take a look thanks.

Note that we are about to send a patch series (targeting net-next) to
polish the EDT patch series that was merged last month for linux-4.20.
TCP internal pacing is going to be much better performance-wise.

^ permalink raw reply

* Re: net/wan: hostess_sv11 + z85230 problems
From: Randy Dunlap @ 2018-10-15 22:24 UTC (permalink / raw)
  To: Krzysztof Hałasa; +Cc: netdev@vger.kernel.org, LKML, Alan Cox
In-Reply-To: <m34ldnd3ju.fsf@t19.piap.pl>

On 10/15/18 1:20 AM, Krzysztof Hałasa wrote:
> Hi,
> 
> Randy Dunlap <rdunlap@infradead.org> writes:
> 
>> kernel 4.19-rc7, on i386, with NO wan/hdlc/hostess/z85230 hardware:
>>
>> modprobe hostess_sv11 + autoload of z85230 give:
> 
> BTW Hostess SV11 is apparently an ISA card, with all those problems.

Yeah.

>> [ 3162.511877] Call Trace:
...
> 
> Not sure about z8530 internals (driver and hw), but I guess the sv11
> driver should initialize the hw first, and only then request_irq().
> Perhaps there should be no "default address" either? The user would
> have to provide the hardware parameters explicitly.
> 
> How about this (totally untested):
> Fix the Hostess SV11 driver trying to use the hardware before its
> existence is detected.
> 
> Signed-off-by: Krzysztof Halasa <khalasa@piap.pl>

Tested-by: Randy Dunlap <rdunlap@infradead.org>

or you can just rm it, like Alan suggested.

thanks.
-- 
~Randy

^ permalink raw reply

* [PATCH net] rxrpc: Fix a missing rxrpc_put_peer() in the error_report handler
From: David Howells @ 2018-10-15 21:37 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel

Fix a missing call to rxrpc_put_peer() on the main path through the
rxrpc_error_report() function.  This manifests itself as a ref leak
whenever an ICMP packet or other error comes in.

In commit f334430316e7, the hand-off of the ref to a work item was removed
and was not replaced with a put.

Fixes: f334430316e7 ("rxrpc: Fix error distribution")
Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/peer_event.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rxrpc/peer_event.c b/net/rxrpc/peer_event.c
index 05b51bdbdd41..bd2fa3b7caa7 100644
--- a/net/rxrpc/peer_event.c
+++ b/net/rxrpc/peer_event.c
@@ -195,6 +195,7 @@ void rxrpc_error_report(struct sock *sk)
 	rxrpc_store_error(peer, serr);
 	rcu_read_unlock();
 	rxrpc_free_skb(skb, rxrpc_skb_rx_freed);
+	rxrpc_put_peer(peer);
 
 	_leave("");
 }
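
As an illustration of the leak described above, here is a small userspace model, not rxrpc code. It assumes the handler takes a reference when it looks the peer up, and every name ending in _demo is hypothetical. Without a matching put on the main path, each error report leaves the count one higher.

```c
#include <stdbool.h>

struct peer_demo {
	int refs;	/* plain counter; the kernel uses an atomic refcount */
};

static struct peer_demo *peer_get_demo(struct peer_demo *p)
{
	p->refs++;
	return p;
}

static void peer_put_demo(struct peer_demo *p)
{
	p->refs--;
}

/* Model of an error handler: the lookup takes a ref, so the handler has
 * to drop it before returning. 'fixed' selects whether the put is done. */
static void error_report_demo(struct peer_demo *entry, bool fixed)
{
	struct peer_demo *peer = peer_get_demo(entry);	/* lookup takes a ref */

	/* ... record the error against the peer here ... */

	if (fixed)
		peer_put_demo(peer);	/* the put that the patch adds */
}
```

Calling error_report_demo() three times with fixed set to false raises refs by three, which is the per-packet leak the changelog mentions.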

^ permalink raw reply related

* Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice
From: Nicolas Dichtel @ 2018-10-15 12:24 UTC (permalink / raw)
  To: Stefano Brivio; +Cc: David S. Miller, Xin Long, Sabrina Dubroca, netdev
In-Reply-To: <20181015130830.1c177301@redhat.com>

Le 15/10/2018 à 13:08, Stefano Brivio a écrit :
> On Mon, 15 Oct 2018 12:19:41 +0200
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> 
>> Le 12/10/2018 à 23:53, Stefano Brivio a écrit :
>>> Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
>>> that we try updating PMTU for a non-existent destination, but didn't clean
>>> up cases where the check was already explicit. Drop those redundant checks.  
>> Yes, I leave them to avoid calculating the new mtu value when not needed. We are
>> in the xmit path.
> 
> Before 2/2 of this series, though, we call skb_dst_update_pmtu() (and
> in turn dst->ops->update_pmtu()) for *every* packet with a dst, which
Not if dst is of type md_dst_ops.

> I'd dare saying is by far the most common case. Besides, 2/2 needs
> anyway to calculate the MTU to fix a bug.
> 
> So I think this is a vast improvement overall.
Fair point.

^ permalink raw reply

* Re: [PATCH net-next, v3] hv_netvsc: fix vf serial matching with pci slot info
From: Stephen Hemminger @ 2018-10-15 19:56 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: haiyangz, davem, netdev, olaf, linux-kernel, devel, vkuznets
In-Reply-To: <20181015190615.30628-1-haiyangz@linuxonhyperv.com>

On Mon, 15 Oct 2018 19:06:15 +0000
Haiyang Zhang <haiyangz@linuxonhyperv.com> wrote:

> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> The VF device's serial number is saved as a string in PCI slot's
> kobj name, not the slot->number. This patch corrects the netvsc
> driver, so the VF device can be successfully paired with synthetic
> NIC.
> 
> Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number")
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>

Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>

^ permalink raw reply

* [PATCH net] sctp: use the pmtu from the icmp packet to update transport pathmtu
From: Xin Long @ 2018-10-15 11:58 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: davem, Marcelo Ricardo Leitner, Neil Horman

Besides syncing the asoc pmtu from all transports, sctp_assoc_sync_pmtu
also processes the transport pmtu_pending set by icmp packets. But it's
meaningless to use sctp_dst_mtu(t->dst) as the new pmtu for a transport.

The right pmtu value should come from the icmp packet. With this patch
it is saved into transport->mtu_info and used later, when the pmtu sync
happens in sctp_sendmsg_to_asoc or sctp_packet_config.

Besides, without this patch, pmtu can only be updated correctly when an
icmp packet is received while nothing holds the sock lock, so the update
can take a long time if the sock is busy sending packets.

Note that this patch doesn't process transport->mtu_info in
.release_cb(), as there is not enough information there for the pmtu
update, such as which asoc or transport it applies to. It is not worth
traversing all asocs to check pmtu_pending. So unlike tcp, sctp does
this in the tx path, which is why mtu_info needs to be atomic_t.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/structs.h | 2 ++
 net/sctp/associola.c       | 3 ++-
 net/sctp/input.c           | 1 +
 net/sctp/output.c          | 6 ++++++
 4 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 28a7c8e..a11f937 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -876,6 +876,8 @@ struct sctp_transport {
 	unsigned long sackdelay;
 	__u32 sackfreq;
 
+	atomic_t mtu_info;
+
 	/* When was the last time that we heard from this transport? We use
 	 * this to pick new active and retran paths.
 	 */
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 297d9cf..a827a1f 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1450,7 +1450,8 @@ void sctp_assoc_sync_pmtu(struct sctp_association *asoc)
 	/* Get the lowest pmtu of all the transports. */
 	list_for_each_entry(t, &asoc->peer.transport_addr_list, transports) {
 		if (t->pmtu_pending && t->dst) {
-			sctp_transport_update_pmtu(t, sctp_dst_mtu(t->dst));
+			sctp_transport_update_pmtu(t,
+						   atomic_read(&t->mtu_info));
 			t->pmtu_pending = 0;
 		}
 		if (!pmtu || (t->pathmtu < pmtu))
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 9bbc5f9..5c36a99 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -395,6 +395,7 @@ void sctp_icmp_frag_needed(struct sock *sk, struct sctp_association *asoc,
 		return;
 
 	if (sock_owned_by_user(sk)) {
+		atomic_set(&t->mtu_info, pmtu);
 		asoc->pmtu_pending = 1;
 		t->pmtu_pending = 1;
 		return;
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 7f849b0..67939ad 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -120,6 +120,12 @@ void sctp_packet_config(struct sctp_packet *packet, __u32 vtag,
 			sctp_assoc_sync_pmtu(asoc);
 	}
 
+	if (asoc->pmtu_pending) {
+		if (asoc->param_flags & SPP_PMTUD_ENABLE)
+			sctp_assoc_sync_pmtu(asoc);
+		asoc->pmtu_pending = 0;
+	}
+
 	/* If there a is a prepend chunk stick it on the list before
 	 * any other chunks get appended.
 	 */
-- 
2.1.0
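
The deferred update described in the changelog can be modelled in userspace roughly as follows. This is only a sketch using C11 atomics, not the sctp code: the ICMP side records the reported value while the socket is busy, and the tx side applies it later. The struct, the sock_busy parameter and every name ending in _demo are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct transport_demo {
	atomic_int mtu_info;	/* PMTU reported by the last ICMP message */
	bool pmtu_pending;	/* set when an update was deferred */
	int pathmtu;		/* value the tx path actually uses */
};

/* ICMP side: if the socket is busy, only record the value and defer. */
static void icmp_frag_needed_demo(struct transport_demo *t, int pmtu,
				  bool sock_busy)
{
	if (sock_busy) {
		atomic_store(&t->mtu_info, pmtu);
		t->pmtu_pending = true;
		return;
	}
	t->pathmtu = pmtu;
}

/* tx side: apply the recorded value, not a value derived from the route. */
static void sync_pmtu_demo(struct transport_demo *t)
{
	if (t->pmtu_pending) {
		t->pathmtu = atomic_load(&t->mtu_info);
		t->pmtu_pending = false;
	}
}
```

The point of the model is that sync_pmtu_demo() reads the value that arrived in the ICMP message, which is what the patch changes sctp_assoc_sync_pmtu() to do.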

^ permalink raw reply related

* Re: [PATCH net-next] netfilter: cttimeout: remove set but not used variable 'l3num'
From: Pablo Neira Ayuso @ 2018-10-15 11:55 UTC (permalink / raw)
  To: YueHaibing
  Cc: Jozsef Kadlecsik, Florian Westphal, netfilter-devel, coreteam,
	netdev, kernel-janitors
In-Reply-To: <1539137652-64831-1-git-send-email-yuehaibing@huawei.com>

On Wed, Oct 10, 2018 at 02:14:12AM +0000, YueHaibing wrote:
> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> net/netfilter/nfnetlink_cttimeout.c: In function 'cttimeout_default_set':
> net/netfilter/nfnetlink_cttimeout.c:353:8: warning:
>  variable 'l3num' set but not used [-Wunused-but-set-variable]
> 
> It is not used any more after
> commit dd2934a95701 ("netfilter: conntrack: remove l3->l4 mapping information")

Applied.

^ permalink raw reply

* Re: bond: take rcu lock in netpoll_send_skb_on_dev
From: Eran Ben Elisha @ 2018-10-15 11:36 UTC (permalink / raw)
  To: Dave Jones, netdev@vger.kernel.org
  Cc: Cong Wang, Tariq Toukan, Saeed Mahameed
In-Reply-To: <20180928202608.uycdlytob75iphfu@codemonkey.org.uk>



On 9/28/2018 11:26 PM, Dave Jones wrote:
> The bonding driver lacks the rcu lock when it calls down into
> netdev_lower_get_next_private_rcu from bond_poll_controller, which
> results in a trace like:
> 
> WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
> CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
> Workqueue: bond0 bond_mii_monitor
> RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
> Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
> RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
> RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
> R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
> R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
> FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
> Call Trace:
>   bond_poll_controller+0x52/0x170
>   netpoll_poll_dev+0x79/0x290
>   netpoll_send_skb_on_dev+0x158/0x2c0
>   netpoll_send_udp+0x2d5/0x430
>   write_ext_msg+0x1e0/0x210
>   console_unlock+0x3c4/0x630
>   vprintk_emit+0xfa/0x2f0
>   printk+0x52/0x6e
>   ? __netdev_printk+0x12b/0x220
>   netdev_info+0x64/0x80
>   ? bond_3ad_set_carrier+0xe9/0x180
>   bond_select_active_slave+0x1fc/0x310
>   bond_mii_monitor+0x709/0x9b0
>   process_one_work+0x221/0x5e0
>   worker_thread+0x4f/0x3b0
>   kthread+0x100/0x140
>   ? process_one_work+0x5e0/0x5e0
>   ? kthread_delayed_work_timer_fn+0x90/0x90
>   ret_from_fork+0x24/0x30
> 
> We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
> before we call down into netpoll_poll_dev, so just take the lock there.
> 
> Suggested-by: Cong Wang <xiyou.wangcong@gmail.com>
> Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
> 
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index 3219a2932463..692367d7c280 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -330,6 +330,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>   	/* It is up to the caller to keep npinfo alive. */
>   	struct netpoll_info *npinfo;
>   
> +	rcu_read_lock_bh();
Hi,

This suggested fix introduced a regression when the netconsole module is
used with the mlx5_core module loaded.

During irq handling, we hit a warning that this rcu_read_lock_bh cannot
be taken inside an IRQ.
Isn't a driver allowed to print to the kernel log from inside an irq
handler, or was the lock perhaps taken too high in the call chain of
bond_poll_controller?

Below is the trace we hit once we applied your patch on our systems.

[2018-10-15 10:45:30] mlx5_core 0000:00:09.0: firmware version: 16.22.8010
[2018-10-15 10:45:30] mlx5_core 0000:00:09.0: 63.008 Gb/s available PCIe 
bandwidth, limited by 8 GT/s x8 link at 0000:00:09.0 (capable of 126.016 
Gb/s with 8 GT/s x16 link)
[2018-10-15 10:45:31] (0000:00:09.0): E-Switch: Total vports 1, per 
vport: max uc(1024) max mc(16384)
[2018-10-15 10:45:31] mlx5_core 0000:00:09.0: Port module event: module 
0, Cable plugged
[2018-10-15 10:45:31] WARNING: CPU: 1 PID: 0 at kernel/softirq.c:168 
__local_bh_enable_ip+0x35/0x50
[2018-10-15 10:45:31] Modules linked in: mlx5_core(+) mlxfw bonding 
ip6_gre ip6_tunnel tunnel6 ip_gre ip_tunnel gre rdma_ucm ib_uverbs 
ib_ipoib ib_umad nfsv3 nfs_acl nfs lockd grace fscache netconsole 
mlx4_ib mlx4_en ptp pps_core mlx4_core cfg80211 devlink rfkill rpcrdma 
ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp sunrpc 
rdma_cm ib_cm iw_cm ib_core snd_hda_codec_generic snd_hda_intel 
snd_hda_codec snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm 
snd_timer snd soundcore pcspkr i2c_piix4 sch_fq_codel ip_tables cirrus 
drm_kms_helper ata_generic pata_acpi syscopyarea sysfillrect sysimgblt 
fb_sys_fops ttm drm virtio_net net_failover i2c_core failover serio_raw 
floppy ata_piix [last unloaded: mlxfw]
[2018-10-15 10:45:31] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
4.19.0-rc6-J4083-G9e91d710a170 #1
[2018-10-15 10:45:31] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[2018-10-15 10:45:31] RIP: 0010:__local_bh_enable_ip+0x35/0x50
[2018-10-15 10:45:31] Code: 7e a9 00 00 0f 00 75 22 83 ee 01 f7 de 65 01 
35 91 8c f7 7e 65 8b 05 8a 8c f7 7e a9 00 ff 1f 00 74 0c 65 ff 0d 7c 8c 
f7 7e c3 <0f> 0b eb da 65 66 8b 05 1f 4e f8 7e 66 85 c0 74 e7 e8 55 ff ff ff
[2018-10-15 10:45:31] RSP: 0018:ffff880237a43c10 EFLAGS: 00010006
[2018-10-15 10:45:31] RAX: 0000000080010200 RBX: 0000000000000006 RCX: 
0000000000000001
[2018-10-15 10:45:31] RDX: 0000000000000000 RSI: 0000000000000200 RDI: 
ffffffff817a1321
[2018-10-15 10:45:31] RBP: ffff880237a43c60 R08: 0000000000480020 R09: 
0000000000000000
[2018-10-15 10:45:31] R10: 000000020834c006 R11: 0000000000000000 R12: 
ffff880229963d68
[2018-10-15 10:45:31] R13: ffff88020834c034 R14: 0000000000006b00 R15: 
ffff8802297d8400
[2018-10-15 10:45:31] FS:  0000000000000000(0000) 
GS:ffff880237a40000(0000) knlGS:0000000000000000
[2018-10-15 10:45:31] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[2018-10-15 10:45:31] CR2: 00007f96d4f57080 CR3: 00000001a157d000 CR4: 
00000000000006e0
[2018-10-15 10:45:31] Call Trace:
[2018-10-15 10:45:31]  <IRQ>
[2018-10-15 10:45:31]  netpoll_send_udp+0x2de/0x410
[2018-10-15 10:45:31]  write_msg+0xdb/0xf0 [netconsole]
[2018-10-15 10:45:31]  console_unlock+0x33e/0x500
[2018-10-15 10:45:31]  vprintk_emit+0x211/0x280
[2018-10-15 10:45:31]  dev_vprintk_emit+0x10b/0x200
[2018-10-15 10:45:31]  dev_printk_emit+0x3b/0x50
[2018-10-15 10:45:31]  ? ttwu_do_wakeup+0x19/0x130
[2018-10-15 10:45:31]  _dev_info+0x55/0x60
[2018-10-15 10:45:31]  mlx5_eq_int+0x27a/0x690 [mlx5_core]
[2018-10-15 10:45:31]  __handle_irq_event_percpu+0x3a/0x190
[2018-10-15 10:45:31]  handle_irq_event_percpu+0x20/0x50
[2018-10-15 10:45:31]  handle_irq_event+0x27/0x50
[2018-10-15 10:45:31]  handle_edge_irq+0x6d/0x180
[2018-10-15 10:45:31]  handle_irq+0xa5/0x110
[2018-10-15 10:45:31]  do_IRQ+0x49/0xd0
[2018-10-15 10:45:31]  common_interrupt+0xf/0xf
[2018-10-15 10:45:31]  </IRQ>
[2018-10-15 10:45:31] RIP: 0010:native_safe_halt+0x2/0x10
[2018-10-15 10:45:31] Code: 7e ff ff ff 7f f3 c3 65 48 8b 04 25 80 5b 01 
00 f0 80 48 02 20 48 8b 00 a8 08 74 8b eb c1 90 90 90 90 90 90 90 90 90 
90 fb f4  0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 f4 c3 90 90 90 90 90 90
[2018-10-15 10:45:31] RSP: 0018:ffff88023663fed0 EFLAGS: 00000246 
ORIG_RAX: ffffffffffffffda
[2018-10-15 10:45:31] RAX: 0000000080000000 RBX: 0000000000000001 RCX: 
ffff880237a5a880
[2018-10-15 10:45:31] RDX: ffffffff8221cd48 RSI: ffff880237a5a880 RDI: 
0000000000000001
[2018-10-15 10:45:31] RBP: 0000000000000001 R08: 000000200b1d1602 R09: 
0000000000000000
[2018-10-15 10:45:31] R10: ffff880236627d20 R11: 0000000000000000 R12: 
0000000000000000
[2018-10-15 10:45:31] R13: 0000000000000000 R14: 0000000000000000 R15: 
0000000000000000
[2018-10-15 10:45:31]  default_idle+0x1c/0x140
[2018-10-15 10:45:31]  do_idle+0x194/0x240
[2018-10-15 10:45:31]  cpu_startup_entry+0x19/0x20
[2018-10-15 10:45:31]  start_secondary+0x138/0x170
[2018-10-15 10:45:31]  secondary_startup_64+0xa4/0xb0
[2018-10-15 10:45:31] ---[ end trace 10dfce1a9e88fa01 ]---

>   	lockdep_assert_irqs_disabled();
>   
>   	npinfo = rcu_dereference_bh(np->dev->npinfo);
> @@ -374,6 +375,7 @@ void netpoll_send_skb_on_dev(struct netpoll *np, struct sk_buff *skb,
>   		skb_queue_tail(&npinfo->txq, skb);
>   		schedule_delayed_work(&npinfo->tx_work,0);
>   	}
> +	rcu_read_unlock_bh();
>   }
>   EXPORT_SYMBOL(netpoll_send_skb_on_dev);
>   
> 

^ permalink raw reply

* [PATCH net-next,v3] hv_netvsc: fix vf serial matching with pci slot info
From: Haiyang Zhang @ 2018-10-15 19:06 UTC (permalink / raw)
  To: davem, netdev
  Cc: haiyangz, kys, sthemmin, olaf, vkuznets, devel, linux-kernel

From: Haiyang Zhang <haiyangz@microsoft.com>

The VF device's serial number is saved as a string in PCI slot's
kobj name, not the slot->number. This patch corrects the netvsc
driver, so the VF device can be successfully paired with synthetic
NIC.

Fixes: 00d7ddba1143 ("hv_netvsc: pair VF based on serial number")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 9bcaf204a7d4..cf36e7ff3191 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2030,14 +2030,15 @@ static void netvsc_vf_setup(struct work_struct *w)
 	rtnl_unlock();
 }
 
-/* Find netvsc by VMBus serial number.
- * The PCI hyperv controller records the serial number as the slot.
+/* Find netvsc by VF serial number.
+ * The PCI hyperv controller records the serial number as the slot kobj name.
  */
 static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
 {
 	struct device *parent = vf_netdev->dev.parent;
 	struct net_device_context *ndev_ctx;
 	struct pci_dev *pdev;
+	u32 serial;
 
 	if (!parent || !dev_is_pci(parent))
 		return NULL; /* not a PCI device */
@@ -2048,16 +2049,22 @@ static struct net_device *get_netvsc_byslot(const struct net_device *vf_netdev)
 		return NULL;
 	}
 
+	if (kstrtou32(pci_slot_name(pdev->slot), 10, &serial)) {
+		netdev_notice(vf_netdev, "Invalid vf serial:%s\n",
+			      pci_slot_name(pdev->slot));
+		return NULL;
+	}
+
 	list_for_each_entry(ndev_ctx, &netvsc_dev_list, list) {
 		if (!ndev_ctx->vf_alloc)
 			continue;
 
-		if (ndev_ctx->vf_serial == pdev->slot->number)
+		if (ndev_ctx->vf_serial == serial)
 			return hv_get_drvdata(ndev_ctx->device_ctx);
 	}
 
 	netdev_notice(vf_netdev,
-		      "no netdev found for slot %u\n", pdev->slot->number);
+		      "no netdev found for vf serial:%u\n", serial);
 	return NULL;
 }
 
-- 
2.18.0
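
For illustration only, here is a userspace analogue of the check the patch adds: the slot name is a string, so it has to be parsed as a decimal number and rejected if it is not one. This sketch uses strtoull() and is deliberately stricter than necessary; kstrtou32() has its own acceptance rules, which this does not try to reproduce. parse_u32_demo is a hypothetical name.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal string into a u32.
 * Returns 0 and stores the value on success, -1 on any malformed input. */
static int parse_u32_demo(const char *s, uint32_t *out)
{
	char *end;
	unsigned long long v;

	/* Reject empty strings, signs and leading whitespace up front. */
	if (*s < '0' || *s > '9')
		return -1;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno || *end != '\0' || v > UINT32_MAX)
		return -1;

	*out = (uint32_t)v;
	return 0;
}
```

A slot name such as "2" parses to 2, while a string that is not a plain decimal number is rejected instead of being silently compared against vf_serial.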

^ permalink raw reply related

* Re: [PATCH][net-next][v2] net: bridge: fix a possible memory leak in __vlan_add
From: Nikolay Aleksandrov @ 2018-10-15 11:13 UTC (permalink / raw)
  To: Li RongQing, netdev; +Cc: bridge, roopa
In-Reply-To: <1539601231-32755-1-git-send-email-lirongqing@baidu.com>

On 15/10/2018 14:00, Li RongQing wrote:
> After per-port vlan stats, vlan stats should be released
> when fail to add vlan
> 
> Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
> CC: bridge@lists.linux-foundation.org
> cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> CC: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  net/bridge/br_vlan.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index 9b707234e4ae..8c9297a01947 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c
> @@ -303,6 +303,10 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
>  	if (p) {
>  		__vlan_vid_del(dev, br, v->vid);
>  		if (masterv) {
> +			if (v->stats && masterv->stats != v->stats)
> +				free_percpu(v->stats);
> +			v->stats = NULL;
> +
>  			br_vlan_put_master(masterv);
>  			v->brvlan = NULL;
>  		}
> 

Thanks,
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

^ permalink raw reply

* Re: [PATCH net-next 11/18] vxlan: Add netif_is_vxlan()
From: Jakub Kicinski @ 2018-10-15 18:57 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: ivecera@redhat.com, andrew@lunn.ch, nikolay@cumulusnetworks.com,
	netdev@vger.kernel.org, roopa@cumulusnetworks.com,
	vivien.didelot@savoirfairelinux.com,
	f.fainelli@gmail.com, bridge@lists.linux-foundation.org,
	mlxsw <mlxsw@mellanox.com>, Jiri Pirko <jiri@mellanox.com>,
	Petr Machata <petrm@mellanox.com>
In-Reply-To: <20181013171725.3261-12-idosch@mellanox.com>

On Sat, 13 Oct 2018 17:18:38 +0000, Ido Schimmel wrote:
> Add the ability to determine whether a netdev is a VxLAN netdev by
> calling the above mentioned function that checks the netdev's private
> flags.
> 
> This will allow modules to identify netdev events involving a VxLAN
> netdev and act accordingly. For example, drivers capable of VxLAN
> offload will need to configure the underlying device when a VxLAN netdev
> is being enslaved to an offloaded bridge.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Reviewed-by: Petr Machata <petrm@mellanox.com>

Is this preferable over

!strcmp(netdev->rtnl_link_ops->kind, "vxlan")

which is what TC offloads do?

^ permalink raw reply

* Re: [PATCH net 1/2] geneve, vxlan: Don't check skb_dst() twice
From: Stefano Brivio @ 2018-10-15 11:08 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: David S. Miller, Xin Long, Sabrina Dubroca, netdev
In-Reply-To: <61596775-4b5f-884a-7a0d-d8c134bb7e8a@6wind.com>

On Mon, 15 Oct 2018 12:19:41 +0200
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:

> Le 12/10/2018 à 23:53, Stefano Brivio a écrit :
> > Commit f15ca723c1eb ("net: don't call update_pmtu unconditionally") avoids
> > that we try updating PMTU for a non-existent destination, but didn't clean
> > up cases where the check was already explicit. Drop those redundant checks.  
> Yes, I leave them to avoid calculating the new mtu value when not needed. We are
> in the xmit path.

Before 2/2 of this series, though, we call skb_dst_update_pmtu() (and
in turn dst->ops->update_pmtu()) for *every* packet with a dst, which
I'd dare say is by far the most common case. Besides, 2/2 needs to
calculate the MTU anyway to fix a bug.

So I think this is a vast improvement overall.

If we want to improve this further and avoid any indirect calls in the
most common path, we would need to cache the MTU in the dst -- it's
probably doable, but I would fix the specific issue addressed by 2/2
first.

-- 
Stefano

^ permalink raw reply

* [PATCH][net-next][v2] net: bridge: fix a possible memory leak in __vlan_add
From: Li RongQing @ 2018-10-15 11:00 UTC (permalink / raw)
  To: netdev; +Cc: bridge, nikolay, roopa

Since per-port vlan stats were added, the vlan stats should be released
when adding a vlan fails.

Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats")
CC: bridge@lists.linux-foundation.org
cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
CC: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Zhang Yu <zhangyu31@baidu.com>
Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 net/bridge/br_vlan.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 9b707234e4ae..8c9297a01947 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -303,6 +303,10 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags)
 	if (p) {
 		__vlan_vid_del(dev, br, v->vid);
 		if (masterv) {
+			if (v->stats && masterv->stats != v->stats)
+				free_percpu(v->stats);
+			v->stats = NULL;
+
 			br_vlan_put_master(masterv);
 			v->brvlan = NULL;
 		}
-- 
2.16.2
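
The ownership rule behind the added lines can be sketched in userspace like this. It is a model under the assumption that a port vlan either shares the master's stats pointer or owns a separate allocation; it is not the bridge code, and the names ending in _demo are hypothetical.

```c
#include <stdlib.h>

struct vlan_demo {
	long *stats;	/* shared with the master, or separately allocated */
};

/* Error-path cleanup: free the stats only when this vlan owns them.
 * Returns 1 if memory was freed, 0 if the pointer was shared or NULL. */
static int vlan_release_stats_demo(struct vlan_demo *v,
				   const struct vlan_demo *master)
{
	int freed = 0;

	if (v->stats && master->stats != v->stats) {
		free(v->stats);
		freed = 1;
	}
	v->stats = NULL;
	return freed;
}
```

Comparing against the master's pointer avoids freeing memory that the master vlan still uses, while a separately allocated block is released instead of leaking.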

^ permalink raw reply related
