netdev.vger.kernel.org archive mirror
* selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
@ 2023-08-29  9:01 Naresh Kamboju
  2023-08-30 11:26 ` Hillf Danton
  0 siblings, 1 reply; 16+ messages in thread
From: Naresh Kamboju @ 2023-08-29  9:01 UTC
  To: Netdev, open list:KERNEL SELFTEST FRAMEWORK, lkft-triage,
	open list
  Cc: David S. Miller, Jakub Kicinski, Stefano Brivio, Shuah Khan,
	Arnd Bergmann, Anders Roxell

Running selftests: net: pmtu.sh on Linux next triggered this kernel panic on
qemu-arm64.


logs:
====
[    0.000000] Linux version 6.5.0 (tuxmake@tuxmake) (Debian clang version 18.0.0 (++20230822112105+841c4dc7e51e-1~exp1~20230822112223.851), Debian LLD 18.0.0) #1 SMP PREEMPT @1693176139

...
# selftests: net: pmtu.sh
# TEST: ipv4: PMTU exceptions                                         [ OK ]
# TEST: ipv4: PMTU exceptions - nexthop objects                       [ OK ]
# TEST: ipv6: PMTU exceptions                                         [ OK ]
# TEST: ipv6: PMTU exceptions - nexthop objects                       [ OK ]

...

# TEST: vti4: PMTU exceptions, routed (ESP-in-UDP)                    [FAIL]
#   PMTU exception wasn't created after exceeding PMTU (IP payload length 1438)
# ./pmtu.sh: 938: kill: No such process
#
# ./pmtu.sh: 938: kill: No such process
#
<47>[  366.411270] systemd-journald[84]: Sent WATCHDOG=1 notification.
# TEST: vti4: default MTU assignment                                  [ OK ]
# TEST: vti6: default MTU assignment                                  [ OK ]
# TEST: vti4: MTU setting on link creation                            [ OK ]
# TEST: vti6: MTU setting on link creation                            [ OK ]
# TEST: vti6: MTU changes on link changes                             [ OK ]
# TEST: ipv4: cleanup of cached exceptions                            [ OK ]
# TEST: ipv4: cleanup of cached exceptions - nexthop objects          [ OK ]
# TEST: ipv6: cleanup of cached exceptions                            [ OK ]
# TEST: ipv6: cleanup of cached exceptions - nexthop objects          [ OK ]
<1>[  398.987591] Unable to handle kernel paging request at virtual address ffff5dbb8189f000
<1>[  398.988469] Mem abort info:
<1>[  398.988712]   ESR = 0x0000000097b58004
<1>[  398.989264]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[  398.989893]   SET = 0, FnV = 0
<1>[  398.990312]   EA = 0, S1PTW = 0
<1>[  398.990768]   FSC = 0x04: level 0 translation fault
<1>[  398.992330] Data abort info:
<1>[  398.992591]   Access size = 4 byte(s)
<1>[  398.992811]   SSE = 1, SRT = 21
<1>[  398.993008]   SF = 1, AR = 0
<1>[  398.993243]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[  398.993601]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[  398.994094] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d49000
<1>[  398.994603] [ffff5dbb8189f000] pgd=0000000000000000, p4d=0000000000000000
<0>[  398.995464] Internal error: Oops: 0000000097b58004 [#1] PREEMPT SMP
<4>[  398.996177] Modules linked in: fou6 sit fou bridge stp llc vxlan ip6_udp_tunnel udp_tunnel act_csum libcrc32c act_pedit cls_flower sch_prio veth vrf macvtap macvlan tap crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm dm_mod ip_tables x_tables [last unloaded: test_blackhole_dev]
<4>[  398.999384] CPU: 1 PID: 132 Comm: kworker/u4:3 Not tainted 6.5.0 #1
<4>[  399.000045] Hardware name: linux,dummy-virt (DT)
<4>[  399.001079] Workqueue: netns cleanup_net
<4>[  399.002441] pstate: 824000c9 (Nzcv daIF +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[  399.003881] pc : percpu_counter_add_batch+0x28/0xd0
<4>[  399.004637] lr : dst_destroy+0x44/0x1e4
<4>[  399.004904] sp : ffff80008000be70
<4>[  399.005864] x29: ffff80008000be70 x28: ffffa2457df6ea40 x27: ffffa2457df24008
<4>[  399.006586] x26: ffffa2457dffa000 x25: 000000000000000a x24: 0000000000000002
<4>[  399.007360] x23: 0000000000000000 x22: 000000000000000a x21: ffff0000c0e90ec0
<4>[  399.008276] x20: 0000000000000000 x19: ffff0000c4f83440 x18: 0000000000000000
<4>[  399.009327] x17: ffff5dbb8189f000 x16: ffff800080008000 x15: 0000000000000010
<4>[  399.009653] x14: 0000000000000010 x13: 0000000000000004 x12: 0000000000000701
<4>[  399.010159] x11: 0000000000000001 x10: fffffc00030c7260 x9 : ffff5dbb8189f000
<4>[  399.010742] x8 : 0000000000000000 x7 : 7f7f7f7f7f7f7f7f x6 : fefefefefefefeff
<4>[  399.012056] x5 : 000000000010000e x4 : fffffc00030c7260 x3 : 000000000010000e
<4>[  399.013082] x2 : 0000000000000020 x1 : ffffffffffffffff x0 : ffff0000c4f83440
<4>[  399.014716] Call trace:
<4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
<4>[  399.016399]  dst_destroy+0x44/0x1e4
<4>[  399.016681]  dst_destroy_rcu+0x14/0x20
<4>[  399.017009]  rcu_core+0x2d0/0x5e0
<4>[  399.017311]  rcu_core_si+0x10/0x1c
<4>[  399.017609]  __do_softirq+0xd4/0x23c
<4>[  399.017991]  ____do_softirq+0x10/0x1c
<4>[  399.018320]  call_on_irq_stack+0x24/0x4c
<4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
<4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
<4>[  399.023434]  irq_exit_rcu+0x10/0x1c
<4>[  399.023962]  el1_interrupt+0x8c/0xc0
<4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
<4>[  399.025324]  el1h_64_irq+0x64/0x68
<4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
<4>[  399.026102]  cleanup_net+0x280/0x45c
<4>[  399.026403]  process_one_work+0x1d4/0x310
<4>[  399.027140]  worker_thread+0x248/0x470
<4>[  399.027621]  kthread+0xfc/0x184
<4>[  399.028068]  ret_from_fork+0x10/0x20
<0>[  399.029333] Code: d50343df aa0003f3 f9401008 d53cd049 (b8a86935)
<4>[  399.030578] ---[ end trace 0000000000000000 ]---
<0>[  399.031422] Kernel panic - not syncing: Oops: Fatal exception in interrupt
<2>[  399.032320] SMP: stopping secondary CPUs
<0>[  399.033487] Kernel Offset: 0x2244fc200000 from 0xffff800080000000
<0>[  399.033819] PHYS_OFFSET: 0x40000000
<0>[  399.034096] CPU features: 0x00000000,68f167a1,ccc6773f
<0>[  399.034779] Memory Limit: none
<0>[  399.035768] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---

Links:
 - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.5/testrun/19373075/suite/log-parser-test/test/check-kernel-oops/log
 - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.5/testrun/19373075/suite/log-parser-test/tests/
 - https://storage.tuxsuite.com/public/linaro/lkft/builds/2UaRggcJ0lNsDMIbbFaiyz3Qwsi/

Steps to reproduce:
===================
#
# To install tuxrun to your home directory at ~/.local/bin:
# pip3 install -U --user tuxrun==0.48.0
#
# Or install a deb/rpm depending on the running distribution
# See https://tuxmake.org/install-deb/ or
# https://tuxmake.org/install-rpm/
#
# See https://tuxrun.org/ for complete documentation.
#
# Please follow the additional instructions if the tests are related to FVP:
# https://tuxrun.org/run-fvp/
#

tuxrun --runtime podman --device qemu-arm64 --boot-args rw \
  --kernel https://storage.tuxsuite.com/public/linaro/lkft/builds/2UaRggcJ0lNsDMIbbFaiyz3Qwsi/Image.gz \
  --modules https://storage.tuxsuite.com/public/linaro/lkft/builds/2UaRggcJ0lNsDMIbbFaiyz3Qwsi/modules.tar.xz \
  --rootfs https://storage.tuxboot.com/debian/bookworm/arm64/rootfs.ext4.xz \
  --parameters SHARD_INDEX=1 --parameters SKIPFILE=skipfile-lkft.yaml \
  --parameters SHARD_NUMBER=5 \
  --parameters KSELFTEST=https://storage.tuxsuite.com/public/linaro/lkft/builds/2UaRggcJ0lNsDMIbbFaiyz3Qwsi/kselftest.tar.xz \
  --image docker.io/linaro/tuxrun-dispatcher:v0.48.0 \
  --tests kselftest-net --timeouts boot=30 kselftest-net=30


--
Linaro LKFT
https://lkft.linaro.org


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-08-29  9:01 selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address Naresh Kamboju
@ 2023-08-30 11:26 ` Hillf Danton
  2023-08-30 12:44   ` Tetsuo Handa
  0 siblings, 1 reply; 16+ messages in thread
From: Hillf Danton @ 2023-08-30 11:26 UTC
  To: Naresh Kamboju
  Cc: Netdev, Eric Dumazet, Paul E. McKenney, Linus Torvalds,
	Tetsuo Handa, LKML

On Tue, 29 Aug 2023 14:31:28 +0530 Naresh Kamboju <naresh.kamboju@linaro.org>
> The selftests: net: pmtu.sh running on Linux next got this kernel panic on
> qemu-arm64.
> ...
> <4>[  399.014716] Call trace:
> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
> <4>[  399.016399]  dst_destroy+0x44/0x1e4
> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
> <4>[  399.017311]  rcu_core_si+0x10/0x1c
> <4>[  399.017609]  __do_softirq+0xd4/0x23c
> <4>[  399.017991]  ____do_softirq+0x10/0x1c
> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
> <4>[  399.025324]  el1h_64_irq+0x64/0x68
> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
> <4>[  399.026102]  cleanup_net+0x280/0x45c
> <4>[  399.026403]  process_one_work+0x1d4/0x310
> <4>[  399.027140]  worker_thread+0x248/0x470
> <4>[  399.027621]  kthread+0xfc/0x184
> <4>[  399.028068]  ret_from_fork+0x10/0x20

static void cleanup_net(struct work_struct *work)
{
	...

	synchronize_rcu();

	/* Run all of the network namespace exit methods */
	list_for_each_entry_reverse(ops, &pernet_list, list)
		ops_exit_list(ops, &net_exit_list);
	...

Why did the RCU sync above fail to work in this report, Eric?

Hillf


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-08-30 11:26 ` Hillf Danton
@ 2023-08-30 12:44   ` Tetsuo Handa
  2023-08-31 11:41     ` Hillf Danton
  0 siblings, 1 reply; 16+ messages in thread
From: Tetsuo Handa @ 2023-08-30 12:44 UTC
  To: Hillf Danton, Naresh Kamboju
  Cc: Netdev, Eric Dumazet, Paul E. McKenney, Linus Torvalds, LKML

On 2023/08/30 20:26, Hillf Danton wrote:
>> <4>[  399.014716] Call trace:
>> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
>> <4>[  399.016399]  dst_destroy+0x44/0x1e4
>> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
>> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
>> <4>[  399.017311]  rcu_core_si+0x10/0x1c
>> <4>[  399.017609]  __do_softirq+0xd4/0x23c
>> <4>[  399.017991]  ____do_softirq+0x10/0x1c
>> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
>> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
>> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
>> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
>> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
>> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
>> <4>[  399.025324]  el1h_64_irq+0x64/0x68
>> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
>> <4>[  399.026102]  cleanup_net+0x280/0x45c
>> <4>[  399.026403]  process_one_work+0x1d4/0x310
>> <4>[  399.027140]  worker_thread+0x248/0x470
>> <4>[  399.027621]  kthread+0xfc/0x184
>> <4>[  399.028068]  ret_from_fork+0x10/0x20
> 
> static void cleanup_net(struct work_struct *work)
> {
> 	...
> 
> 	synchronize_rcu();
> 
> 	/* Run all of the network namespace exit methods */
> 	list_for_each_entry_reverse(ops, &pernet_list, list)
> 		ops_exit_list(ops, &net_exit_list);
> 	...
> 
> Why did the RCU sync above fail to work in this report, Eric?

Why do you assume that synchronize_rcu() failed to work?
The trace merely says that an interrupt handler ran somewhere from
cleanup_net(), and something went wrong inside dst_destroy().

Please decode the trace into filename:line format (as syzbot reports do)
using the scripts/faddr2line tool, in order to find the exact location.
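As a complement to faddr2line (which needs the matching vmlinux), the `Code:` line of the oops can also be read by hand: on arm64 the word in parentheses, `b8a86935`, is the faulting instruction and the preceding words lead up to it. The sketch below is a hypothetical, hand-rolled decoder for only the two load encodings involved; it is not a general disassembler and not part of any kernel tool. It yields `ldr x8, [x0, #32]` for `f9401008` and `ldrsw x21, [x9, x8]` for `b8a86935` (x9 comes from the preceding `mrs` instruction, `d53cd049`). With x8 = 0 and x9 = ffff5dbb8189f000 from the register dump, the load address equals the reported fault address, which is consistent with `SRT = 21`, `SSE = 1` and `Access size = 4` in the data abort info. If, as assumed here and not confirmed in the thread, offset 32 is the `counters` pointer of `struct percpu_counter` on this build, a zero there would fit a counter that percpu_counter_destroy() had already torn down.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decodes only the two AArch64 loads seen in this oops. */
enum a64_kind { A64_UNKNOWN, A64_LDR_X_UIMM, A64_LDRSW_REG };

struct a64_load {
	enum a64_kind kind;
	unsigned int rt;      /* destination register number */
	unsigned int rn;      /* base register number */
	unsigned int rm;      /* index register (LDRSW register form only) */
	unsigned int offset;  /* byte offset (LDR immediate form only) */
	unsigned int shift;   /* index shift amount (LDRSW register form only) */
};

static struct a64_load a64_decode_load(uint32_t insn)
{
	struct a64_load l = { A64_UNKNOWN, 0, 0, 0, 0, 0 };

	l.rt = insn & 0x1f;
	l.rn = (insn >> 5) & 0x1f;

	if ((insn & 0xffc00000u) == 0xf9400000u) {
		/* LDR Xt, [Xn, #imm]: 64-bit load, unsigned imm12 scaled by 8 */
		l.kind = A64_LDR_X_UIMM;
		l.offset = ((insn >> 10) & 0xfffu) * 8u;
	} else if ((insn & 0xffe00c00u) == 0xb8a00800u) {
		/* LDRSW Xt, [Xn, Xm{, LSL #2}]: 32-bit load sign-extended to 64 bits.
		 * Assumes the LSL/UXTX option, as in b8a86935; extend variants are
		 * not distinguished here. */
		l.kind = A64_LDRSW_REG;
		l.rm = (insn >> 16) & 0x1f;
		l.shift = ((insn >> 12) & 1u) ? 2u : 0u;
	}
	return l;
}
```

Any word other than these two load forms, such as the `mrs` at `d53cd049`, is reported as `A64_UNKNOWN`.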



* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-08-30 12:44   ` Tetsuo Handa
@ 2023-08-31 11:41     ` Hillf Danton
  2023-08-31 13:12       ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Hillf Danton @ 2023-08-31 11:41 UTC
  To: Tetsuo Handa
  Cc: Netdev, Eric Dumazet, Paul E. McKenney, Linus Torvalds,
	Naresh Kamboju, LKML

On Wed, 30 Aug 2023 21:44:57 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
>On 2023/08/30 20:26, Hillf Danton wrote:
>>> <4>[  399.014716] Call trace:
>>> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
>>> <4>[  399.016399]  dst_destroy+0x44/0x1e4
>>> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
>>> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
>>> <4>[  399.017311]  rcu_core_si+0x10/0x1c
>>> <4>[  399.017609]  __do_softirq+0xd4/0x23c
>>> <4>[  399.017991]  ____do_softirq+0x10/0x1c
>>> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
>>> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
>>> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
>>> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
>>> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
>>> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
>>> <4>[  399.025324]  el1h_64_irq+0x64/0x68
>>> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
>>> <4>[  399.026102]  cleanup_net+0x280/0x45c
>>> <4>[  399.026403]  process_one_work+0x1d4/0x310
>>> <4>[  399.027140]  worker_thread+0x248/0x470
>>> <4>[  399.027621]  kthread+0xfc/0x184
>>> <4>[  399.028068]  ret_from_fork+0x10/0x20
>> 
>> static void cleanup_net(struct work_struct *work)
>> {
>> 	...
>> 
>> 	synchronize_rcu();
>> 
>> 	/* Run all of the network namespace exit methods */
>> 	list_for_each_entry_reverse(ops, &pernet_list, list)
>> 		ops_exit_list(ops, &net_exit_list);
>> 	...
>> 
>> Why did the RCU sync above fail to work in this report, Eric?
>
> Why do you assume that synchronize_rcu() failed to work?

In the ipv6 pernet_operations [1] for instance, dst_entries_destroy() is
invoked after RCU sync to ensure that nobody is using the exiting net,
but this report shows that this protection breaks down.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/route.c#n6557

> The trace merely says that an interrupt handler ran somewhere from
> cleanup_net(), and something went wrong inside dst_destroy().

But bc9d3a9f2afc and 483c26ff63f4 have been upstream for quite a while.
Not sure if it is arm64-specific.


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-08-31 11:41     ` Hillf Danton
@ 2023-08-31 13:12       ` Eric Dumazet
  2023-09-03  0:53         ` Hillf Danton
  2023-09-05  6:20         ` Naresh Kamboju
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-08-31 13:12 UTC
  To: Hillf Danton
  Cc: Tetsuo Handa, Netdev, Paul E. McKenney, Linus Torvalds,
	Naresh Kamboju, LKML

On Thu, Aug 31, 2023 at 2:17 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Wed, 30 Aug 2023 21:44:57 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> >On 2023/08/30 20:26, Hillf Danton wrote:
> >>> <4>[  399.014716] Call trace:
> >>> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
> >>> <4>[  399.016399]  dst_destroy+0x44/0x1e4
> >>> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
> >>> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
> >>> <4>[  399.017311]  rcu_core_si+0x10/0x1c
> >>> <4>[  399.017609]  __do_softirq+0xd4/0x23c
> >>> <4>[  399.017991]  ____do_softirq+0x10/0x1c
> >>> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
> >>> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
> >>> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
> >>> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
> >>> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
> >>> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
> >>> <4>[  399.025324]  el1h_64_irq+0x64/0x68
> >>> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
> >>> <4>[  399.026102]  cleanup_net+0x280/0x45c
> >>> <4>[  399.026403]  process_one_work+0x1d4/0x310
> >>> <4>[  399.027140]  worker_thread+0x248/0x470
> >>> <4>[  399.027621]  kthread+0xfc/0x184
> >>> <4>[  399.028068]  ret_from_fork+0x10/0x20
> >>
> >> static void cleanup_net(struct work_struct *work)
> >> {
> >>      ...
> >>
> >>      synchronize_rcu();
> >>
> >>      /* Run all of the network namespace exit methods */
> >>      list_for_each_entry_reverse(ops, &pernet_list, list)
> >>              ops_exit_list(ops, &net_exit_list);
> >>      ...
> >>
> >> Why did the RCU sync above fail to work in this report, Eric?
> >
> > Why do you assume that synchronize_rcu() failed to work?
>
> In the ipv6 pernet_operations [1] for instance, dst_entries_destroy() is
> invoked after RCU sync to ensure that nobody is using the exiting net,
> but this report shows that protection falls apart.

Because synchronize_rcu() is not the same as rcu_barrier().

The dst_entries_add() / percpu_counter_add_batch() call should not
happen after an RCU grace period.
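The distinction can be illustrated with a single-threaded toy model. All `toy_*` names are hypothetical and greatly simplify real RCU: `toy_synchronize_rcu()` only waits for readers that already exist (there are none in the model), while `toy_rcu_barrier()` waits until callbacks already queued with `toy_call_rcu()` have been invoked. A counter decrement deferred through the callback queue can therefore land after the counter has been torn down, unless the queue is drained first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model; none of these names are kernel APIs. */
#define TOY_MAX_CBS 8

struct toy_rcu {
	void (*cb[TOY_MAX_CBS])(void *);
	void *arg[TOY_MAX_CBS];
	size_t pending;               /* callbacks queued but not yet invoked */
};

struct toy_counter {              /* stands in for the per-netns dst counter */
	long value;
	bool destroyed;               /* set by the netns exit path */
	int late_updates;             /* updates that arrived after destruction */
};

static void toy_call_rcu(struct toy_rcu *r, void (*cb)(void *), void *arg)
{
	assert(r->pending < TOY_MAX_CBS);
	r->cb[r->pending] = cb;
	r->arg[r->pending] = arg;
	r->pending++;
}

/* Invoke every queued callback (what RCU callback processing eventually does). */
static void toy_run_callbacks(struct toy_rcu *r)
{
	for (size_t i = 0; i < r->pending; i++)
		r->cb[i](r->arg[i]);
	r->pending = 0;
}

/* synchronize_rcu(): waits for pre-existing readers only. There are no readers
 * in this model, so it returns at once; it does NOT drain the callback queue. */
static void toy_synchronize_rcu(struct toy_rcu *r)
{
	(void)r;
}

/* rcu_barrier(): waits until all callbacks queued so far have been invoked. */
static void toy_rcu_barrier(struct toy_rcu *r)
{
	toy_run_callbacks(r);
}

/* The deferred "destroy" step: decrements the counter. */
static void toy_deferred_uncount(void *arg)
{
	struct toy_counter *c = arg;

	if (c->destroyed)
		c->late_updates++;        /* in the kernel: access to freed per-CPU data */
	else
		c->value--;
}

/* Returns how many counter updates landed after the counter was destroyed. */
static int toy_netns_exit(bool use_barrier)
{
	struct toy_rcu r = { .pending = 0 };
	struct toy_counter c = { .value = 1 };

	toy_call_rcu(&r, toy_deferred_uncount, &c); /* last reference dropped */
	toy_synchronize_rcu(&r);                     /* as in cleanup_net() */
	if (use_barrier)
		toy_rcu_barrier(&r);
	c.destroyed = true;                          /* counter torn down */
	toy_run_callbacks(&r);                       /* pending callbacks run later */
	return c.late_updates;
}
```

`toy_netns_exit(false)` reports one late update and `toy_netns_exit(true)` reports none. Draining the queue with a barrier is only one way to restore the ordering; Eric's patch takes a different route and moves the decrement out of the RCU callback.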

Something like this (untested) patch

diff --git a/net/core/dst.c b/net/core/dst.c
index 980e2fd2f013b3e50cc47ed0666ee5f24f50444b..f02fdd1da6066a4d56c2a0aa8038eca76d62f8bd 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);

 void dst_release(struct dst_entry *dst)
 {
-       if (dst && rcuref_put(&dst->__rcuref))
+       if (dst && rcuref_put(&dst->__rcuref)) {
+               if (!(dst->flags & DST_NOCOUNT)) {
+                       dst->flags |= DST_NOCOUNT;
+                       dst_entries_add(dst->ops, -1);
+               }
                call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
+       }
 }
 EXPORT_SYMBOL(dst_release);
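The ordering this patch aims for can be sketched with another toy model. The `toy_*` names are hypothetical, `TOY_DST_NOCOUNT` stands in for `DST_NOCOUNT`, and the deferred RCU callback is invoked by hand. When the last reference is dropped, the counter is decremented immediately and the flag is set, so the deferred destroy step no longer touches the counter even if it runs after the counter is torn down. The model assumes, as Eric states later in the thread, that every dst_release() happens before the netns is removed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_DST_NOCOUNT 0x1u           /* stands in for DST_NOCOUNT */

struct toy_dst_ops {                    /* stands in for dst->ops and its counter */
	long entries;
	bool destroyed;                     /* counter already torn down */
	int late_updates;                   /* updates after teardown (a UAF in the kernel) */
};

struct toy_dst {
	int refcnt;
	unsigned int flags;
	struct toy_dst_ops *ops;
	bool destroy_queued;                /* models call_rcu_hurry(..., dst_destroy_rcu) */
};

static void toy_entries_add(struct toy_dst_ops *ops, long val)
{
	if (ops->destroyed)
		ops->late_updates++;
	else
		ops->entries += val;
}

/* Patched release: uncount at once when the last reference goes away. */
static void toy_dst_release(struct toy_dst *dst)
{
	if (dst && --dst->refcnt == 0) {
		if (!(dst->flags & TOY_DST_NOCOUNT)) {
			dst->flags |= TOY_DST_NOCOUNT;
			toy_entries_add(dst->ops, -1);
		}
		dst->destroy_queued = true;
	}
}

/* Deferred destroy: only uncounts if the NOCOUNT flag is still clear. */
static void toy_dst_destroy(struct toy_dst *dst)
{
	if (!(dst->flags & TOY_DST_NOCOUNT))
		toy_entries_add(dst->ops, -1);
	dst->destroy_queued = false;
}
```

Without the flag handling in `toy_dst_release()`, the same sequence would leave `entries` at 1 until the deferred step, which would then record a late update.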

It is not even clear why we still count dst entries these days.
We removed the ipv4 route cache a long time ago, and ipv6 got a
similar treatment.


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-08-31 13:12       ` Eric Dumazet
@ 2023-09-03  0:53         ` Hillf Danton
  2023-09-04 11:29           ` Eric Dumazet
  2023-09-05  6:20         ` Naresh Kamboju
  1 sibling, 1 reply; 16+ messages in thread
From: Hillf Danton @ 2023-09-03  0:53 UTC
  To: Eric Dumazet
  Cc: Tetsuo Handa, Netdev, Paul E. McKenney, Linus Torvalds,
	Naresh Kamboju, LKML

On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> On Thu, Aug 31, 2023 at 2:17 PM Hillf Danton <hdanton@sina.com>
> > On Wed, 30 Aug 2023 21:44:57 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > >On 2023/08/30 20:26, Hillf Danton wrote:
> > >>> <4>[  399.014716] Call trace:
> > >>> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
> > >>> <4>[  399.016399]  dst_destroy+0x44/0x1e4
> > >>> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
> > >>> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
> > >>> <4>[  399.017311]  rcu_core_si+0x10/0x1c
> > >>> <4>[  399.017609]  __do_softirq+0xd4/0x23c
> > >>> <4>[  399.017991]  ____do_softirq+0x10/0x1c
> > >>> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
> > >>> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
> > >>> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
> > >>> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
> > >>> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
> > >>> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
> > >>> <4>[  399.025324]  el1h_64_irq+0x64/0x68
> > >>> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
> > >>> <4>[  399.026102]  cleanup_net+0x280/0x45c
> > >>> <4>[  399.026403]  process_one_work+0x1d4/0x310
> > >>> <4>[  399.027140]  worker_thread+0x248/0x470
> > >>> <4>[  399.027621]  kthread+0xfc/0x184
> > >>> <4>[  399.028068]  ret_from_fork+0x10/0x20
> > >>
> > >> static void cleanup_net(struct work_struct *work)
> > >> {
> > >>      ...
> > >>
> > >>      synchronize_rcu();
> > >>
> > >>      /* Run all of the network namespace exit methods */
> > >>      list_for_each_entry_reverse(ops, &pernet_list, list)
> > >>              ops_exit_list(ops, &net_exit_list);
> > >>      ...
> > >>
> > >> Why did the RCU sync above fail to work in this report, Eric?
> > >
> > > Why do you assume that synchronize_rcu() failed to work?
> >
> > In the ipv6 pernet_operations [1] for instance, dst_entries_destroy() is
> > invoked after RCU sync to ensure that nobody is using the exiting net,
> > but this report shows that protection falls apart.
> 
> Because synchronize_rcu() is not the same as rcu_barrier().
> 
> The dst_entries_add()/ percpu_counter_add_batch() call should not
> happen after an rcu grace period.

	cpu2			cpu3
	====			====
	cleanup_net()		rcu_read_lock();
				it is safe to use either netns or dst
				rcu_read_unlock();
	synchronize_rcu();
				unsafe to access either now
> 
> Something like this (untested) patch
> 
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 980e2fd2f013b3e50cc47ed0666ee5f24f50444b..f02fdd1da6066a4d56c2a0aa8038eca76d62f8bd 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> 
>  void dst_release(struct dst_entry *dst)
>  {
> -       if (dst && rcuref_put(&dst->__rcuref))
> +       if (dst && rcuref_put(&dst->__rcuref)) {
> +               if (!(dst->flags & DST_NOCOUNT)) {
> +                       dst->flags |= DST_NOCOUNT;
> +                       dst_entries_add(dst->ops, -1);

Could this add happen after the rcu sync above?

> +               }
>                 call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> +       }
>  }
>  EXPORT_SYMBOL(dst_release);


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-09-03  0:53         ` Hillf Danton
@ 2023-09-04 11:29           ` Eric Dumazet
  2023-09-05 11:10             ` Hillf Danton
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2023-09-04 11:29 UTC
  To: Hillf Danton
  Cc: Tetsuo Handa, Netdev, Paul E. McKenney, Linus Torvalds,
	Naresh Kamboju, LKML

On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > On Thu, Aug 31, 2023 at 2:17 PM Hillf Danton <hdanton@sina.com>
> > > On Wed, 30 Aug 2023 21:44:57 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > >On 2023/08/30 20:26, Hillf Danton wrote:
> > > >>> <4>[  399.014716] Call trace:
> > > >>> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
> > > >>> <4>[  399.016399]  dst_destroy+0x44/0x1e4
> > > >>> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
> > > >>> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
> > > >>> <4>[  399.017311]  rcu_core_si+0x10/0x1c
> > > >>> <4>[  399.017609]  __do_softirq+0xd4/0x23c
> > > >>> <4>[  399.017991]  ____do_softirq+0x10/0x1c
> > > >>> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
> > > >>> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
> > > >>> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
> > > >>> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
> > > >>> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
> > > >>> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
> > > >>> <4>[  399.025324]  el1h_64_irq+0x64/0x68
> > > >>> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
> > > >>> <4>[  399.026102]  cleanup_net+0x280/0x45c
> > > >>> <4>[  399.026403]  process_one_work+0x1d4/0x310
> > > >>> <4>[  399.027140]  worker_thread+0x248/0x470
> > > >>> <4>[  399.027621]  kthread+0xfc/0x184
> > > >>> <4>[  399.028068]  ret_from_fork+0x10/0x20
> > > >>
> > > >> static void cleanup_net(struct work_struct *work)
> > > >> {
> > > >>      ...
> > > >>
> > > >>      synchronize_rcu();
> > > >>
> > > >>      /* Run all of the network namespace exit methods */
> > > >>      list_for_each_entry_reverse(ops, &pernet_list, list)
> > > >>              ops_exit_list(ops, &net_exit_list);
> > > >>      ...
> > > >>
> > > >> Why did the RCU sync above fail to work in this report, Eric?
> > > >
> > > > Why do you assume that synchronize_rcu() failed to work?
> > >
> > > In the ipv6 pernet_operations [1] for instance, dst_entries_destroy() is
> > > invoked after RCU sync to ensure that nobody is using the exiting net,
> > > but this report shows that protection falls apart.
> >
> > Because synchronize_rcu() is not the same as rcu_barrier().
> >
> > The dst_entries_add()/ percpu_counter_add_batch() call should not
> > happen after an rcu grace period.
>
>         cpu2                    cpu3
>         ====                    ====
>         cleanup_net()           rcu_read_lock();
>                                 it is safe to use either netns or dst
>                                 rcu_read_unlock();
>         synchronize_rcu();
>                                 unsafe to access anyone now
> >
> > Something like this (untested) patch
> >
> > diff --git a/net/core/dst.c b/net/core/dst.c
> > index 980e2fd2f013b3e50cc47ed0666ee5f24f50444b..f02fdd1da6066a4d56c2a0aa8038eca76d62f8bd 100644
> > --- a/net/core/dst.c
> > +++ b/net/core/dst.c
> > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> >
> >  void dst_release(struct dst_entry *dst)
> >  {
> > -       if (dst && rcuref_put(&dst->__rcuref))
> > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > +               if (!(dst->flags & DST_NOCOUNT)) {
> > +                       dst->flags |= DST_NOCOUNT;
> > +                       dst_entries_add(dst->ops, -1);
>
> Could this add happen after the rcu sync above?
>

I do not think so. All dst_release() calls should happen before netns removal.


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-08-31 13:12       ` Eric Dumazet
  2023-09-03  0:53         ` Hillf Danton
@ 2023-09-05  6:20         ` Naresh Kamboju
  1 sibling, 0 replies; 16+ messages in thread
From: Naresh Kamboju @ 2023-09-05  6:20 UTC
  To: Eric Dumazet
  Cc: Hillf Danton, Tetsuo Handa, Netdev, Paul E. McKenney,
	Linus Torvalds, LKML

On Thu, 31 Aug 2023 at 18:42, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Aug 31, 2023 at 2:17 PM Hillf Danton <hdanton@sina.com> wrote:
> >
> > On Wed, 30 Aug 2023 21:44:57 +0900 Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > >On 2023/08/30 20:26, Hillf Danton wrote:
> > >>> <4>[  399.014716] Call trace:
> > >>> <4>[  399.015702]  percpu_counter_add_batch+0x28/0xd0
> > >>> <4>[  399.016399]  dst_destroy+0x44/0x1e4
> > >>> <4>[  399.016681]  dst_destroy_rcu+0x14/0x20
> > >>> <4>[  399.017009]  rcu_core+0x2d0/0x5e0
> > >>> <4>[  399.017311]  rcu_core_si+0x10/0x1c
> > >>> <4>[  399.017609]  __do_softirq+0xd4/0x23c
> > >>> <4>[  399.017991]  ____do_softirq+0x10/0x1c
> > >>> <4>[  399.018320]  call_on_irq_stack+0x24/0x4c
> > >>> <4>[  399.018723]  do_softirq_own_stack+0x1c/0x28
> > >>> <4>[  399.022639]  __irq_exit_rcu+0x6c/0xcc
> > >>> <4>[  399.023434]  irq_exit_rcu+0x10/0x1c
> > >>> <4>[  399.023962]  el1_interrupt+0x8c/0xc0
> > >>> <4>[  399.024810]  el1h_64_irq_handler+0x18/0x24
> > >>> <4>[  399.025324]  el1h_64_irq+0x64/0x68
> > >>> <4>[  399.025612]  _raw_spin_lock_bh+0x0/0x6c
> > >>> <4>[  399.026102]  cleanup_net+0x280/0x45c
> > >>> <4>[  399.026403]  process_one_work+0x1d4/0x310
> > >>> <4>[  399.027140]  worker_thread+0x248/0x470
> > >>> <4>[  399.027621]  kthread+0xfc/0x184
> > >>> <4>[  399.028068]  ret_from_fork+0x10/0x20
> > >>
> > >> static void cleanup_net(struct work_struct *work)
> > >> {
> > >>      ...
> > >>
> > >>      synchronize_rcu();
> > >>
> > >>      /* Run all of the network namespace exit methods */
> > >>      list_for_each_entry_reverse(ops, &pernet_list, list)
> > >>              ops_exit_list(ops, &net_exit_list);
> > >>      ...
> > >>
> > >> Why did the RCU sync above fail to work in this report, Eric?
> > >
> > > Why do you assume that synchronize_rcu() failed to work?
> >
> > In the ipv6 pernet_operations [1] for instance, dst_entries_destroy() is
> > invoked after RCU sync to ensure that nobody is using the exiting net,
> > but this report shows that protection falls apart.
>
> Because synchronize_rcu() is not the same as rcu_barrier()
>
> The dst_entries_add()/ percpu_counter_add_batch() call should not
> happen after an rcu grace period.
>
> Something like this (untested) patch
>
> diff --git a/net/core/dst.c b/net/core/dst.c
> index 980e2fd2f013b3e50cc47ed0666ee5f24f50444b..f02fdd1da6066a4d56c2a0aa8038eca76d62f8bd 100644
> --- a/net/core/dst.c
> +++ b/net/core/dst.c
> @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
>
>  void dst_release(struct dst_entry *dst)
>  {
> -       if (dst && rcuref_put(&dst->__rcuref))
> +       if (dst && rcuref_put(&dst->__rcuref)) {
> +               if (!(dst->flags & DST_NOCOUNT)) {
> +                       dst->flags |= DST_NOCOUNT;
> +                       dst_entries_add(dst->ops, -1);
> +               }
>                 call_rcu_hurry(&dst->rcu_head, dst_destroy_rcu);
> +       }
>  }
>  EXPORT_SYMBOL(dst_release);
>

I applied the above patch on top of Linux next and ran
selftests: net: pmtu.sh; the test did not crash.

Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
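A hypothetical user-space model of the tested patch's logic shows why the crash goes away: the entry counter is decremented synchronously when the last reference is dropped, not later in the deferred destroy callback. struct toy_dst, TOY_DST_NOCOUNT and toy_entries are illustrative stand-ins (a plain int instead of rcuref_t and of the per-netns percpu_counter). The sketch assumes that dst_destroy() only decrements when DST_NOCOUNT is clear, which is what the patch's flag manipulation relies on.

```c
#include <assert.h>

/* Illustrative stand-in for DST_NOCOUNT; not the kernel definition. */
#define TOY_DST_NOCOUNT 0x1

struct toy_dst {
	int refcnt;             /* plain int instead of rcuref_t */
	unsigned int flags;
	int destroy_pending;    /* 1 once the deferred free has been "queued" */
};

static int toy_entries;         /* stands in for the dst_ops entry counter */

/* Deferred path (dst_destroy_rcu() -> dst_destroy() in the real code). */
static void toy_dst_destroy(struct toy_dst *dst)
{
	/* Skip the decrement if toy_dst_release() already accounted for it. */
	if (!(dst->flags & TOY_DST_NOCOUNT))
		toy_entries--;
	dst->destroy_pending = 0;
}

/* Patched release: account immediately, then defer only the free. */
static void toy_dst_release(struct toy_dst *dst)
{
	if (dst && --dst->refcnt == 0) {
		if (!(dst->flags & TOY_DST_NOCOUNT)) {
			dst->flags |= TOY_DST_NOCOUNT;
			toy_entries--;  /* happens before any grace period */
		}
		dst->destroy_pending = 1;  /* call_rcu_hurry() in the patch */
	}
}
```

The counter reaches zero at the final toy_dst_release() call rather than in the deferred toy_dst_destroy(), and the flag prevents a second decrement when the deferred path eventually runs.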

> It is not even clear why we are still counting dst these days.
> We removed the ipv4 route cache a long time ago, and ipv6 got a
> similar treatment.

Links:
https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2Uw58x3xJIqqpYDZspTytGe3L1V
https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/naresh/tests/2Uw58x3xJIqqpYDZspTytGe3L1V/logs?format=html


--
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-09-04 11:29           ` Eric Dumazet
@ 2023-09-05 11:10             ` Hillf Danton
  2023-09-05 12:24               ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Hillf Danton @ 2023-09-05 11:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tetsuo Handa, Netdev, Paul E. McKenney, Linus Torvalds,
	Naresh Kamboju, LKML

On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > --- a/net/core/dst.c
> > > +++ b/net/core/dst.c
> > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > >
> > >  void dst_release(struct dst_entry *dst)
> > >  {
> > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > +                       dst->flags |= DST_NOCOUNT;
> > > +                       dst_entries_add(dst->ops, -1);
> >
> > Could this add happen after the rcu sync above?
> >
> I do not think so. All dst_release() should happen before netns removal.

	cpu2                    cpu3
	====                    ====
	cleanup_net()           __sys_sendto
	                        sock_sendmsg()
	                        udpv6_sendmsg()
	synchronize_rcu();
				dst_release()

Could this one be an exception?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-09-05 11:10             ` Hillf Danton
@ 2023-09-05 12:24               ` Eric Dumazet
  2023-10-17 17:02                 ` Naresh Kamboju
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2023-09-05 12:24 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Tetsuo Handa, Netdev, Paul E. McKenney, Linus Torvalds,
	Naresh Kamboju, LKML

On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > --- a/net/core/dst.c
> > > > +++ b/net/core/dst.c
> > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > >
> > > >  void dst_release(struct dst_entry *dst)
> > > >  {
> > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > +                       dst->flags |= DST_NOCOUNT;
> > > > +                       dst_entries_add(dst->ops, -1);
> > >
> > > Could this add happen after the rcu sync above?
> > >
> > I do not think so. All dst_release() should happen before netns removal.
>
>         cpu2                    cpu3
>         ====                    ====
>         cleanup_net()           __sys_sendto
>                                 sock_sendmsg()
>                                 udpv6_sendmsg()
>         synchronize_rcu();
>                                 dst_release()
>
> Could this one be an exception?

No idea what you are trying to say.

Please give exact locations, instead of being rather vague.

Note that a UDP socket cannot send a packet while its netns is being dismantled,
because live sockets keep a reference on the netns.
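Eric's point about socket lifetimes can be modelled with a simple reference count. struct toy_net, struct toy_sock and the helpers below are hypothetical stand-ins for struct net, get_net()/put_net() and a user socket; the sketch only shows why a live socket keeps its namespace from being dismantled, not how cleanup_net() is actually scheduled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the names are illustrative, not kernel APIs. */
struct toy_net {
	int refcnt;
	bool dismantled;   /* set once "cleanup_net()" would be queued */
};

struct toy_sock {
	struct toy_net *net;
};

static void toy_put_net(struct toy_net *net)
{
	/* Teardown can only start when the last reference is dropped. */
	if (--net->refcnt == 0)
		net->dismantled = true;
}

static void toy_sock_init(struct toy_sock *sk, struct toy_net *net)
{
	net->refcnt++;     /* like get_net() when a user socket is created */
	sk->net = net;
}

static void toy_sock_close(struct toy_sock *sk)
{
	toy_put_net(sk->net);
	sk->net = NULL;
}

/* Returns true if a send could proceed on a live namespace. */
static bool toy_sendmsg(const struct toy_sock *sk)
{
	return sk->net != NULL && !sk->net->dismantled;
}
```

Dropping the namespace's own reference (for example on "ip netns del") does not dismantle it while a socket is still open; only closing the socket does.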

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-09-05 12:24               ` Eric Dumazet
@ 2023-10-17 17:02                 ` Naresh Kamboju
  2024-10-06 18:08                   ` Xin Long
  0 siblings, 1 reply; 16+ messages in thread
From: Naresh Kamboju @ 2023-10-17 17:02 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Hillf Danton, Tetsuo Handa, Netdev, Paul E. McKenney,
	Linus Torvalds, LKML, Dan Carpenter

On Tue, 5 Sept 2023 at 17:55, Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
> >
> > On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > > --- a/net/core/dst.c
> > > > > +++ b/net/core/dst.c
> > > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > > >
> > > > >  void dst_release(struct dst_entry *dst)
> > > > >  {
> > > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > > +                       dst->flags |= DST_NOCOUNT;
> > > > > +                       dst_entries_add(dst->ops, -1);
> > > >
> > > > Could this add happen after the rcu sync above?
> > > >
> > > I do not think so. All dst_release() should happen before netns removal.
> >
> >         cpu2                    cpu3
> >         ====                    ====
> >         cleanup_net()           __sys_sendto
> >                                 sock_sendmsg()
> >                                 udpv6_sendmsg()
> >         synchronize_rcu();
> >                                 dst_release()
> >
> > Could this one be an exception?
>
> No idea what you are trying to say.
>
> Please give exact locations, instead of being rather vague.
>
> Note that an UDP socket can not send a packet while its netns is dismantled,
> because alive sockets keep a reference on the netns.

Gentle reminder.
This is still an open issue.

# selftests: net: pmtu.sh
# TEST: ipv4: PMTU exceptions                                         [ OK ]
# TEST: ipv4: PMTU exceptions - nexthop objects                       [ OK ]
# TEST: ipv6: PMTU exceptions                                         [ OK ]
# TEST: ipv6: PMTU exceptions - nexthop objects                       [ OK ]
# TEST: ICMPv4 with DSCP and ECN: PMTU exceptions                     [ OK ]
# TEST: ICMPv4 with DSCP and ECN: PMTU exceptions - nexthop objects   [ OK ]
# TEST: UDPv4 with DSCP and ECN: PMTU exceptions                      [ OK ]
# TEST: UDPv4 with DSCP and ECN: PMTU exceptions - nexthop objects    [ OK ]
# TEST: IPv4 over vxlan4: PMTU exceptions                             [ OK ]
# TEST: IPv4 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
# TEST: IPv6 over vxlan4: PMTU exceptions                             [ OK ]
# TEST: IPv6 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
# TEST: IPv4 over vxlan6: PMTU exceptions                             [ OK ]
<1>[  155.820793] Unable to handle kernel paging request at virtual address ffff247020442000
<1>[  155.821495] Mem abort info:
<1>[  155.821719]   ESR = 0x0000000097b58004
<1>[  155.822046]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[  155.822412]   SET = 0, FnV = 0
<1>[  155.822648]   EA = 0, S1PTW = 0
<1>[  155.822925]   FSC = 0x04: level 0 translation fault
<1>[  155.823317] Data abort info:
<1>[  155.823590]   Access size = 4 byte(s)
<1>[  155.823886]   SSE = 1, SRT = 21
<1>[  155.824167]   SF = 1, AR = 0
<1>[  155.824450]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[  155.824847]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[  155.825345] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d84000
<1>[  155.827244] [ffff247020442000] pgd=0000000000000000, p4d=0000000000000000
<0>[  155.828511] Internal error: Oops: 0000000097b58004 [#1] PREEMPT SMP
<4>[  155.829155] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel act_csum libcrc32c act_pedit cls_flower sch_prio veth vrf macvtap macvlan tap crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight dm_mod ip_tables x_tables [last unloaded: test_blackhole_dev]
<4>[  155.832289] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.6.0-rc6 #1
<4>[  155.832896] Hardware name: linux,dummy-virt (DT)
<4>[  155.833927] pstate: 824000c9 (Nzcv daIF +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[  155.834496] pc : percpu_counter_add_batch+0x24/0xcc
<4>[  155.835735] lr : dst_destroy+0x44/0x1e4

Links:
- https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.6-rc6/testrun/20613439/suite/log-parser-test/test/check-kernel-oops/log
- https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.6-rc6/testrun/20613439/suite/log-parser-test/tests/

- Naresh

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2023-10-17 17:02                 ` Naresh Kamboju
@ 2024-10-06 18:08                   ` Xin Long
  2024-10-06 18:58                     ` Eric Dumazet
                                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Xin Long @ 2024-10-06 18:08 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Eric Dumazet, Hillf Danton, Tetsuo Handa, Netdev,
	Paul E. McKenney, Linus Torvalds, LKML, Dan Carpenter

Sorry for bringing this issue up again: it recently occurred on my aarch64
kernel with blackhole_netdev backported. I tracked it down, and when the
netns is deleted, the path is:

In cleanup_net():

  default_device_exit_batch()
    unregister_netdevice_many()
      addrconf_ifdown() -> call_rcu(rcu, fib6_info_destroy_rcu) <--- [1]
    netdev_run_todo()
      rcu_barrier() <--- [2]
  ip6_route_net_exit() -> dst_entries_destroy(net->ip6_dst_ops) <--- [3]

In fib6_info_destroy_rcu():

  dst_dev_put()
  dst_release() -> call_rcu(rcu, dst_destroy_rcu) <--- [5]

In dst_destroy_rcu():
  dst_destroy() -> dst_entries_add(dst->ops, -1); <--- [6]

fib6_info_destroy_rcu() is scheduled at [1], and rcu_barrier() at [2] waits
for it to complete. However, fib6_info_destroy_rcu() itself schedules
another callback, dst_destroy_rcu(), at [5], and nothing calls
rcu_barrier() to wait for dst_destroy_rcu() to finish. This means
dst_entries_add() at [6] may run after dst_entries_destroy() at [3], and
the resulting use-after-free (UAF) triggers the panic.
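The ordering problem described above can be reproduced in a toy, single-threaded user-space model of an RCU callback queue. The names (toy_call_rcu(), toy_rcu_barrier(), toy_cleanup_net(), ...) are hypothetical; the only property modelled is the one the analysis relies on: rcu_barrier() waits for callbacks that were already queued when it was called, not for callbacks that those callbacks queue later. The bracketed numbers in the comments refer to the steps in the analysis.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_CBS 16

typedef void (*toy_cb_t)(void);

static toy_cb_t toy_queue[TOY_MAX_CBS];
static size_t toy_head, toy_tail;

static void toy_call_rcu(toy_cb_t cb)
{
	assert(toy_tail < TOY_MAX_CBS);
	toy_queue[toy_tail++] = cb;
}

/* Like rcu_barrier(): wait only for callbacks queued before this call. */
static void toy_rcu_barrier(void)
{
	size_t snapshot = toy_tail;

	while (toy_head < snapshot)
		toy_queue[toy_head++]();
}

/* Whatever is left runs later, e.g. from softirq context. */
static void toy_run_pending(void)
{
	while (toy_head < toy_tail)
		toy_queue[toy_head++]();
}

static int counter_alive = 1;   /* stands in for the ip6_dst_ops counter */
static int use_after_destroy;   /* set if [6] runs after [3] */

static void toy_dst_destroy_rcu(void)          /* [6] */
{
	if (!counter_alive)
		use_after_destroy = 1;
}

static void toy_fib6_info_destroy_rcu(void)    /* queues [5] */
{
	toy_call_rcu(toy_dst_destroy_rcu);
}

int toy_cleanup_net(void)
{
	toy_call_rcu(toy_fib6_info_destroy_rcu);  /* [1] */
	toy_rcu_barrier();                         /* [2] */
	counter_alive = 0;                         /* [3] dst_entries_destroy() */
	toy_run_pending();                         /* [6] runs too late */
	return use_after_destroy;
}
```

In this model toy_cleanup_net() returns 1: the nested callback only runs after the counter was torn down. Doing the decrement synchronously in dst_release(), as in Eric's patch, moves it into the callback queued at [1], which the barrier at [2] does wait for.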

On Tue, Oct 17, 2023 at 1:02 PM Naresh Kamboju
<naresh.kamboju@linaro.org> wrote:
>
> On Tue, 5 Sept 2023 at 17:55, Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
> > >
> > > On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > > > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > > > --- a/net/core/dst.c
> > > > > > +++ b/net/core/dst.c
> > > > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > > > >
> > > > > >  void dst_release(struct dst_entry *dst)
> > > > > >  {
> > > > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > > > +                       dst->flags |= DST_NOCOUNT;
> > > > > > +                       dst_entries_add(dst->ops, -1);
> > > > >
So I think it makes sense NOT to call dst_entries_add() in the
dst_destroy_rcu() -> dst_destroy() path, as the patch above does,
but I don't see that it was ever posted.

Hi Eric, would you like to move forward with your patch above?

Alternatively, we could move the dst_entries_add(dst->ops, -1) call from
dst_destroy() to dst_release():

Note that dst_destroy() is not used outside net/core/dst.c, so we may be
able to delete EXPORT_SYMBOL(dst_destroy) in the future.

Thanks.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2024-10-06 18:08                   ` Xin Long
@ 2024-10-06 18:58                     ` Eric Dumazet
  2024-10-06 19:04                       ` Eric Dumazet
  2024-10-08 10:53                     ` Hillf Danton
  2024-10-08 12:47                     ` Eric Dumazet
  2 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2024-10-06 18:58 UTC (permalink / raw)
  To: Xin Long
  Cc: Naresh Kamboju, Hillf Danton, Tetsuo Handa, Netdev,
	Paul E. McKenney, Linus Torvalds, LKML, Dan Carpenter

On Sun, Oct 6, 2024 at 8:08 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> Sorry for bringing up this issue, it recently occurred on my aarch64 kernel
> with blackhole_netdev backported. I tracked it down, and when deleting
> the netns, the path is:
>
> In cleanup_net():
>
>   default_device_exit_batch()
>     unregister_netdevice_many()
>       addrconf_ifdown() -> call_rcu(rcu, fib6_info_destroy_rcu) <--- [1]
>     netdev_run_todo()
>       rcu_barrier() <- [2]
>   ip6_route_net_exit() -> dst_entries_destroy(net->ip6_dst_ops) <--- [3]
>
> In fib6_info_destroy_rcu():
>
>   dst_dev_put()
>   dst_release() -> call_rcu(rcu, dst_destroy_rcu) <--- [5]
>
> In dst_destroy_rcu():
>   dst_destroy() -> dst_entries_add(dst->ops, -1); <--- [6]
>
> fib6_info_destroy_rcu() is scheduled at [1], rcu_barrier() will wait
> for fib6_info_destroy_rcu() to be done at [2]. However, another callback
> dst_destroy_rcu() is scheduled() in fib6_info_destroy_rcu() at [5], and
> there's no place calling rcu_barrier() to wait for dst_destroy_rcu() to
> be done. It means dst_entries_add() at [6] might be run later than
> dst_entries_destroy() at [3], then this UAF will trigger the panic.
>
> On Tue, Oct 17, 2023 at 1:02 PM Naresh Kamboju
> <naresh.kamboju@linaro.org> wrote:
> >
> > On Tue, 5 Sept 2023 at 17:55, Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
> > > >
> > > > On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > > > > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > > > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > > > > --- a/net/core/dst.c
> > > > > > > +++ b/net/core/dst.c
> > > > > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > > > > >
> > > > > > >  void dst_release(struct dst_entry *dst)
> > > > > > >  {
> > > > > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > > > > +                       dst->flags |= DST_NOCOUNT;
> > > > > > > +                       dst_entries_add(dst->ops, -1);
> > > > > >
> So I think it makes sense to NOT call dst_entries_add() in the path
> dst_destroy_rcu() -> dst_destroy(), as it does on the patch above,
> but I don't see it get posted.
>
> Hi, Eric, would you like to move forward with your patch above ?
>
> Or we can also move the dst_entries_add(dst->ops, -1) from dst_destroy()
> to dst_release():
>
> Note, dst_destroy() is not used outside net/core/dst.c, we may delete
> EXPORT_SYMBOL(dst_destroy) in the future.
>
>

The current kernel has a known issue with dst_cache, which triggers quite
often with selftests: net: pmtu.sh

(Although for some reason it no longer triggers 'often' in my vng tests.)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2024-10-06 18:58                     ` Eric Dumazet
@ 2024-10-06 19:04                       ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-10-06 19:04 UTC (permalink / raw)
  To: Xin Long
  Cc: Naresh Kamboju, Hillf Danton, Tetsuo Handa, Netdev,
	Paul E. McKenney, Linus Torvalds, LKML, Dan Carpenter

On Sun, Oct 6, 2024 at 8:58 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Oct 6, 2024 at 8:08 PM Xin Long <lucien.xin@gmail.com> wrote:
> >
> > Sorry for bringing up this issue, it recently occurred on my aarch64 kernel
> > with blackhole_netdev backported. I tracked it down, and when deleting
> > the netns, the path is:
> >
> > In cleanup_net():
> >
> >   default_device_exit_batch()
> >     unregister_netdevice_many()
> >       addrconf_ifdown() -> call_rcu(rcu, fib6_info_destroy_rcu) <--- [1]
> >     netdev_run_todo()
> >       rcu_barrier() <- [2]
> >   ip6_route_net_exit() -> dst_entries_destroy(net->ip6_dst_ops) <--- [3]
> >
> > In fib6_info_destroy_rcu():
> >
> >   dst_dev_put()
> >   dst_release() -> call_rcu(rcu, dst_destroy_rcu) <--- [5]
> >
> > In dst_destroy_rcu():
> >   dst_destroy() -> dst_entries_add(dst->ops, -1); <--- [6]
> >
> > fib6_info_destroy_rcu() is scheduled at [1], rcu_barrier() will wait
> > for fib6_info_destroy_rcu() to be done at [2]. However, another callback
> > dst_destroy_rcu() is scheduled() in fib6_info_destroy_rcu() at [5], and
> > there's no place calling rcu_barrier() to wait for dst_destroy_rcu() to
> > be done. It means dst_entries_add() at [6] might be run later than
> > dst_entries_destroy() at [3], then this UAF will trigger the panic.
> >
> > On Tue, Oct 17, 2023 at 1:02 PM Naresh Kamboju
> > <naresh.kamboju@linaro.org> wrote:
> > >
> > > On Tue, 5 Sept 2023 at 17:55, Eric Dumazet <edumazet@google.com> wrote:
> > > >
> > > > On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
> > > > >
> > > > > On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > > > > > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > > > > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > > > > > --- a/net/core/dst.c
> > > > > > > > +++ b/net/core/dst.c
> > > > > > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > > > > > >
> > > > > > > >  void dst_release(struct dst_entry *dst)
> > > > > > > >  {
> > > > > > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > > > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > > > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > > > > > +                       dst->flags |= DST_NOCOUNT;
> > > > > > > > +                       dst_entries_add(dst->ops, -1);
> > > > > > >
> > So I think it makes sense to NOT call dst_entries_add() in the path
> > dst_destroy_rcu() -> dst_destroy(), as it does on the patch above,
> > but I don't see it get posted.
> >
> > Hi, Eric, would you like to move forward with your patch above ?
> >
> > Or we can also move the dst_entries_add(dst->ops, -1) from dst_destroy()
> > to dst_release():
> >
> > Note, dst_destroy() is not used outside net/core/dst.c, we may delete
> > EXPORT_SYMBOL(dst_destroy) in the future.
> >
> >
>
> Current kernel has known issue with dst_cache, triggering quite often
> with  selftests: net: pmtu.sh
>
> (Although for some reason it does no longer trigger 'often' any more
> in my vng tests)

Here is a simple hack/patch to 'disable' dst_cache, if you want to
confirm that the issue is there.


diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 70c634b9e7b02300188582a1634d5977838db132..53351ff58b35dbee37ff587f7ef8f72580d9e116 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -142,12 +142,7 @@ EXPORT_SYMBOL_GPL(dst_cache_get_ip6);

 int dst_cache_init(struct dst_cache *dst_cache, gfp_t gfp)
 {
-       dst_cache->cache = alloc_percpu_gfp(struct dst_cache_pcpu,
-                                           gfp | __GFP_ZERO);
-       if (!dst_cache->cache)
-               return -ENOMEM;
-
-       dst_cache_reset(dst_cache);
+       dst_cache->cache = NULL;
        return 0;
 }
 EXPORT_SYMBOL_GPL(dst_cache_init);
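The intended effect of the hack can be sketched in user space: if the cache pointer stays NULL, every lookup misses and nothing is ever stored, so the cache can never pin a dst. The types and helpers below are hypothetical. The sketch assumes, as we understand the real dst_cache_get()/dst_cache_set_*() helpers do, that a NULL cache pointer is treated as "no cache"; check net/core/dst_cache.c before relying on that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct dst_cache. */
struct toy_dst { int id; };

struct toy_dst_cache {
	struct toy_dst **cache;   /* per-CPU storage in the real code; one slot here */
};

static int toy_dst_cache_init(struct toy_dst_cache *c)
{
	c->cache = NULL;          /* the hack: never allocate storage */
	return 0;
}

static struct toy_dst *toy_dst_cache_get(struct toy_dst_cache *c)
{
	if (!c->cache)
		return NULL;          /* always a miss: caller does a full route lookup */
	return *c->cache;
}

static void toy_dst_cache_set(struct toy_dst_cache *c, struct toy_dst *dst)
{
	if (!c->cache)
		return;               /* nothing is stored, so nothing is pinned */
	*c->cache = dst;
}
```

If the crash disappears with the cache disabled this way, that points at dst references held by dst_cache outliving the netns.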

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2024-10-06 18:08                   ` Xin Long
  2024-10-06 18:58                     ` Eric Dumazet
@ 2024-10-08 10:53                     ` Hillf Danton
  2024-10-08 12:47                     ` Eric Dumazet
  2 siblings, 0 replies; 16+ messages in thread
From: Hillf Danton @ 2024-10-08 10:53 UTC (permalink / raw)
  To: Xin Long
  Cc: Naresh Kamboju, Eric Dumazet, Hillf Danton, Tetsuo Handa, Netdev,
	Paul E. McKenney, Linus Torvalds, LKML, Dan Carpenter

On Sun, 6 Oct 2024 14:08:18 -0400 Xin Long <lucien.xin@gmail.com>
> Sorry for bringing up this issue, it recently occurred on my aarch64 kernel
> with blackhole_netdev backported. I tracked it down, and when deleting
> the netns, the path is:
> 
> In cleanup_net():
> 
>   default_device_exit_batch()
>     unregister_netdevice_many()
>       addrconf_ifdown() -> call_rcu(rcu, fib6_info_destroy_rcu) <--- [1]
>     netdev_run_todo()
>       rcu_barrier() <- [2]
>   ip6_route_net_exit() -> dst_entries_destroy(net->ip6_dst_ops) <--- [3]
> 
> In fib6_info_destroy_rcu():
> 
>   dst_dev_put()
>   dst_release() -> call_rcu(rcu, dst_destroy_rcu) <--- [5]
> 
> In dst_destroy_rcu():
>   dst_destroy() -> dst_entries_add(dst->ops, -1); <--- [6]
> 
> fib6_info_destroy_rcu() is scheduled at [1], rcu_barrier() will wait
> for fib6_info_destroy_rcu() to be done at [2]. However, another callback
> dst_destroy_rcu() is scheduled() in fib6_info_destroy_rcu() at [5], and
> there's no place calling rcu_barrier() to wait for dst_destroy_rcu() to
> be done. It means dst_entries_add() at [6] might be run later than
> dst_entries_destroy() at [3], then this UAF will trigger the panic.
> 
There has been no more important discovery in the net core so far in 2024 than this one.

Thanks
Hillf

> On Tue, Oct 17, 2023 at 1:02 PM Naresh Kamboju
> <naresh.kamboju@linaro.org> wrote:
> >
> > On Tue, 5 Sept 2023 at 17:55, Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
> > > >
> > > > On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > > > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > > > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > > > > --- a/net/core/dst.c
> > > > > > > +++ b/net/core/dst.c
> > > > > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > > > > >
> > > > > > >  void dst_release(struct dst_entry *dst)
> > > > > > >  {
> > > > > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > > > > +                       dst->flags |= DST_NOCOUNT;
> > > > > > > +                       dst_entries_add(dst->ops, -1);
> > > > > >
> So I think it makes sense to NOT call dst_entries_add() in the path
> dst_destroy_rcu() -> dst_destroy(), as it does on the patch above,
> but I don't see it get posted.
> 
> Hi, Eric, would you like to move forward with your patch above ?
> 
> Or we can also move the dst_entries_add(dst->ops, -1) from dst_destroy()
> to dst_release():
> 
> Note, dst_destroy() is not used outside net/core/dst.c, we may delete
> EXPORT_SYMBOL(dst_destroy) in the future.
> 
> Thanks.
> 
> > > > > > Could this add happen after the rcu sync above?
> > > > > >
> > > > > I do not think so. All dst_release() should happen before netns removal.
> > > >
> > > >         cpu2                    cpu3
> > > >         ====                    ====
> > > >         cleanup_net()           __sys_sendto
> > > >                                 sock_sendmsg()
> > > >                                 udpv6_sendmsg()
> > > >         synchronize_rcu();
> > > >                                 dst_release()
> > > >
> > > > Could this one be an exception?
> > >
> > > No idea what you are trying to say.
> > >
> > > Please give exact locations, instead of being rather vague.
> > >
> > > Note that an UDP socket can not send a packet while its netns is dismantled,
> > > because alive sockets keep a reference on the netns.
> >
> > Gentle reminder.
> > This is still an open issue.
> >
> > # selftests: net: pmtu.sh
> > # TEST: ipv4: PMTU exceptions                                         [ OK ]
> > # TEST: ipv4: PMTU exceptions - nexthop objects                       [ OK ]
> > # TEST: ipv6: PMTU exceptions                                         [ OK ]
> > # TEST: ipv6: PMTU exceptions - nexthop objects                       [ OK ]
> > # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions                     [ OK ]
> > # TEST: ICMPv4 with DSCP and ECN: PMTU exceptions - nexthop objects   [ OK ]
> > # TEST: UDPv4 with DSCP and ECN: PMTU exceptions                      [ OK ]
> > # TEST: UDPv4 with DSCP and ECN: PMTU exceptions - nexthop objects    [ OK ]
> > # TEST: IPv4 over vxlan4: PMTU exceptions                             [ OK ]
> > # TEST: IPv4 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
> > # TEST: IPv6 over vxlan4: PMTU exceptions                             [ OK ]
> > # TEST: IPv6 over vxlan4: PMTU exceptions - nexthop objects           [ OK ]
> > # TEST: IPv4 over vxlan6: PMTU exceptions                             [ OK ]
> > <1>[  155.820793] Unable to handle kernel paging request at virtual address ffff247020442000
> > <1>[  155.821495] Mem abort info:
> > <1>[  155.821719]   ESR = 0x0000000097b58004
> > <1>[  155.822046]   EC = 0x25: DABT (current EL), IL = 32 bits
> > <1>[  155.822412]   SET = 0, FnV = 0
> > <1>[  155.822648]   EA = 0, S1PTW = 0
> > <1>[  155.822925]   FSC = 0x04: level 0 translation fault
> > <1>[  155.823317] Data abort info:
> > <1>[  155.823590]   Access size = 4 byte(s)
> > <1>[  155.823886]   SSE = 1, SRT = 21
> > <1>[  155.824167]   SF = 1, AR = 0
> > <1>[  155.824450]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > <1>[  155.824847]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > <1>[  155.825345] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041d84000
> > <1>[  155.827244] [ffff247020442000] pgd=0000000000000000, p4d=0000000000000000
> > <0>[  155.828511] Internal error: Oops: 0000000097b58004 [#1] PREEMPT SMP
> > <4>[  155.829155] Modules linked in: vxlan ip6_udp_tunnel udp_tunnel act_csum libcrc32c act_pedit cls_flower sch_prio veth vrf macvtap macvlan tap crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight dm_mod ip_tables x_tables [last unloaded: test_blackhole_dev]
> > <4>[  155.832289] CPU: 0 PID: 15 Comm: ksoftirqd/0 Not tainted 6.6.0-rc6 #1
> > <4>[  155.832896] Hardware name: linux,dummy-virt (DT)
> > <4>[  155.833927] pstate: 824000c9 (Nzcv daIF +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> > <4>[  155.834496] pc : percpu_counter_add_batch+0x24/0xcc
> > <4>[  155.835735] lr : dst_destroy+0x44/0x1e4
> >
> > Links:
> > - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.6-rc6/testrun/20613439/suite/log-parser-test/test/check-kernel-oops/log
> > - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.6-rc6/testrun/20613439/suite/log-parser-test/tests/
> >
> > - Naresh


* Re: selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address
  2024-10-06 18:08                   ` Xin Long
  2024-10-06 18:58                     ` Eric Dumazet
  2024-10-08 10:53                     ` Hillf Danton
@ 2024-10-08 12:47                     ` Eric Dumazet
  2 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2024-10-08 12:47 UTC (permalink / raw)
  To: Xin Long
  Cc: Naresh Kamboju, Hillf Danton, Tetsuo Handa, Netdev,
	Paul E. McKenney, Linus Torvalds, LKML, Dan Carpenter

On Sun, Oct 6, 2024 at 8:08 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> Sorry for bringing up this issue; it recently occurred on my aarch64 kernel
> with blackhole_netdev backported. I tracked it down, and when deleting
> the netns, the path is:
>
> In cleanup_net():
>
>   default_device_exit_batch()
>     unregister_netdevice_many()
>       addrconf_ifdown() -> call_rcu(rcu, fib6_info_destroy_rcu) <--- [1]
>     netdev_run_todo()
>       rcu_barrier() <- [2]
>   ip6_route_net_exit() -> dst_entries_destroy(net->ip6_dst_ops) <--- [3]
>
> In fib6_info_destroy_rcu():
>
>   dst_dev_put()
>   dst_release() -> call_rcu(rcu, dst_destroy_rcu) <--- [5]
>
> In dst_destroy_rcu():
>   dst_destroy() -> dst_entries_add(dst->ops, -1); <--- [6]
>
> fib6_info_destroy_rcu() is scheduled at [1], rcu_barrier() will wait
> for fib6_info_destroy_rcu() to be done at [2]. However, another callback
> dst_destroy_rcu() is scheduled in fib6_info_destroy_rcu() at [5], and
> there's no place calling rcu_barrier() to wait for dst_destroy_rcu() to
> be done. It means dst_entries_add() at [6] might be run later than
> dst_entries_destroy() at [3], then this UAF will trigger the panic.
>
> On Tue, Oct 17, 2023 at 1:02 PM Naresh Kamboju
> <naresh.kamboju@linaro.org> wrote:
> >
> > On Tue, 5 Sept 2023 at 17:55, Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Tue, Sep 5, 2023 at 1:52 PM Hillf Danton <hdanton@sina.com> wrote:
> > > >
> > > > On Mon, 4 Sep 2023 13:29:57 +0200 Eric Dumazet <edumazet@google.com>
> > > > > On Sun, Sep 3, 2023 at 5:57 AM Hillf Danton <hdanton@sina.com>
> > > > > > On Thu, 31 Aug 2023 15:12:30 +0200 Eric Dumazet <edumazet@google.com>
> > > > > > > --- a/net/core/dst.c
> > > > > > > +++ b/net/core/dst.c
> > > > > > > @@ -163,8 +163,13 @@ EXPORT_SYMBOL(dst_dev_put);
> > > > > > >
> > > > > > >  void dst_release(struct dst_entry *dst)
> > > > > > >  {
> > > > > > > -       if (dst && rcuref_put(&dst->__rcuref))
> > > > > > > +       if (dst && rcuref_put(&dst->__rcuref)) {
> > > > > > > +               if (!(dst->flags & DST_NOCOUNT)) {
> > > > > > > +                       dst->flags |= DST_NOCOUNT;
> > > > > > > +                       dst_entries_add(dst->ops, -1);
> > > > > >
> So I think it makes sense to NOT call dst_entries_add() in the path
> dst_destroy_rcu() -> dst_destroy(), as the patch above does,
> but I don't see it posted.
>
> Hi, Eric, would you like to move forward with your patch above?

I am planning to send it soon.

>
> Or we can also move the dst_entries_add(dst->ops, -1) from dst_destroy()
> to dst_release():

If we remove the code from dst_destroy(), we must do it in its two callers,
dst_release() and dst_release_immediate().

No big deal, I am adding a helper to make this a bit cleaner.


>
> Note, dst_destroy() is not used outside net/core/dst.c, we may delete
> EXPORT_SYMBOL(dst_destroy) in the future.

Which version are you looking at?

Upstream got this already.

commit 03ba6dc035c60991033529e630bd1552b2bca4d7
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date:   Fri Feb 2 17:37:46 2024 +0100

    net: dst: Make dst_destroy() static and return void.


end of thread, other threads:[~2024-10-08 12:47 UTC | newest]

Thread overview: 16+ messages
2023-08-29  9:01 selftests: net: pmtu.sh: Unable to handle kernel paging request at virtual address Naresh Kamboju
2023-08-30 11:26 ` Hillf Danton
2023-08-30 12:44   ` Tetsuo Handa
2023-08-31 11:41     ` Hillf Danton
2023-08-31 13:12       ` Eric Dumazet
2023-09-03  0:53         ` Hillf Danton
2023-09-04 11:29           ` Eric Dumazet
2023-09-05 11:10             ` Hillf Danton
2023-09-05 12:24               ` Eric Dumazet
2023-10-17 17:02                 ` Naresh Kamboju
2024-10-06 18:08                   ` Xin Long
2024-10-06 18:58                     ` Eric Dumazet
2024-10-06 19:04                       ` Eric Dumazet
2024-10-08 10:53                     ` Hillf Danton
2024-10-08 12:47                     ` Eric Dumazet
2023-09-05  6:20         ` Naresh Kamboju
