linux-fsdevel.vger.kernel.org archive mirror
* Null pointer dereference in nsfs_evict with CONFIG_NET_NS=n triggered via systemd-networkd's debugging
@ 2020-11-03 13:22 Jan Kundrát
From: Jan Kundrát @ 2020-11-03 13:22 UTC
  To: linux-fsdevel, Alexander Viro; +Cc: Vlastimil Babka, Lennart Poettering

Hi,
I'm getting the following oops on 5.9.3 (and 5.9.1, and 5.6.7, all with 
some unrelated patches, see [1]). In this crash, nsfs_evict() gets called 
with ns->ops being NULL.

[    6.947411] 8<--- cut here ---
[    6.950502] Unable to handle kernel NULL pointer dereference at virtual address 00000010
[    6.958685] pgd = da1de5c3
[    6.961417] [00000010] *pgd=3fcd2831
[    6.965047] Internal error: Oops: 17 [#1] SMP ARM
[    6.969781] CPU: 0 PID: 199 Comm: systemd-network Not tainted 5.9.1-cla-cfb #1
[    6.977033] Hardware name: Marvell Armada 380/385 (Device Tree)
[    6.982991] PC is at nsfs_evict+0x18/0x20
[    6.987029] LR is at evict+0xac/0x188
[    6.990716] pc : [<c029aa84>]    lr : [<c027d40c>]    psr: 60010013
[    6.997009] sp : ecdefed0  ip : 00000001  fp : 00000000
[    7.002258] r10: c0c03e8c  r9 : 5ac3c35a  r8 : ef036910
[    7.007508] r7 : ed2d4880  r6 : c090a5c0  r5 : ed23c910  r4 : ed23c858
[    7.014064] r3 : 00000000  r2 : ed23c918  r1 : 00000000  r0 : c0c60190
[    7.020621] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    7.027787] Control: 10c5387d  Table: 2cc4404a  DAC: 00000051
[    7.033566] Process systemd-network (pid: 199, stack limit = 0x7d1d3b46)
[    7.040299] Stack: (0xecdefed0 to 0xecdf0000)
[    7.044684] fec0:                                     ed2d4880 00000000 ed2d48d0 c0278804
[    7.052901] fee0: ed7bf0c0 0008801d ed23c858 c02615ec 00000000 ed23c858 00000000 c0265d1c
[    7.061117] ff00: 000007ff 00000000 ed6930ec ed692cc0 c0c73ee4 00000454 5ac3c35a c0140350
[    7.069339] ff20: ecdee000 ecdeffb0 c0100264 fffffe30 c0100264 c010a7b0 ed26c200 be8b0860
[    7.077572] ff40: 00004000 00000128 c0100264 c069b5d4 00000000 00000000 00000000 000000fe
[    7.085798] ff60: 00000000 00000000 00000000 c01401f8 00000000 c0c03e88 ed7bf0c0 ed7bf0c0
[    7.094012] ff80: 00000000 c0c03e88 ed7bf0c0 0000000b b6f794d0 01776af0 00000006 c0100264
[    7.102233] ffa0: ecdee000 00000006 00000000 c01000cc 00000000 be8b0860 00000000 00000000
[    7.110457] ffc0: 0000000b b6f794d0 01776af0 00000006 0000000b 0177445c b6f80000 00000000
[    7.118673] ffe0: b6f4b10c be8b1a40 b6e1e490 b6d1c320 60010010 0000000b 00000000 00000000
[    7.126898] [<c029aa84>] (nsfs_evict) from [<00000000>] (0x0)
[    7.132676] Code: ebff8a17 e1a00004 e5943004 e8bd4010 (e5933010) 
[    7.138841] ---[ end trace 2b44d591054a9910 ]---
[    7.143482] Kernel panic - not syncing: Fatal exception
[    7.148733] CPU1: stopping
[    7.151455] CPU: 1 PID: 331 Comm: bash Tainted: G      D           5.9.1-cla-cfb #1
[    7.159133] Hardware name: Marvell Armada 380/385 (Device Tree)
[    7.165080] [<c010f10c>] (unwind_backtrace) from [<c010add8>] (show_stack+0x10/0x14)
[    7.172849] [<c010add8>] (show_stack) from [<c07eccac>] (dump_stack+0x94/0xa8)
[    7.180095] [<c07eccac>] (dump_stack) from [<c010dda8>] (handle_IPI+0x340/0x378)
[    7.187516] [<c010dda8>] (handle_IPI) from [<c0430e34>] (gic_handle_irq+0x8c/0x90)
[    7.195110] [<c0430e34>] (gic_handle_irq) from [<c0100b0c>] (__irq_svc+0x6c/0x90)
[    7.202612] Exception stack(0xecedbe28 to 0xecedbe70)
[    7.207678] be20:                   edb83d90 edb72bc8 c022a3d8 0000015f edb72bc8 00000000
[    7.215879] be40: 0013f000 edb72ba0 edb72ba0 ed7774bc c0cc0e20 ed777480 00021000 ecedbe78
[    7.224079] be60: ed61d4e0 c022ac48 a00d0013 ffffffff
[    7.229148] [<c0100b0c>] (__irq_svc) from [<c022ac48>] (anon_vma_interval_tree_remove+0x1dc/0x2d4)
[    7.238135] [<c022ac48>] (anon_vma_interval_tree_remove) from [<c023efe4>] (unlink_anon_vmas+0xbc/0x1fc)
[    7.247644] [<c023efe4>] (unlink_anon_vmas) from [<c022f6f4>] (free_pgtables+0x48/0xb4)
[    7.255674] [<c022f6f4>] (free_pgtables) from [<c0238dd8>] (exit_mmap+0xe8/0x1b4)
[    7.263183] [<c0238dd8>] (exit_mmap) from [<c011c43c>] (mmput+0x48/0xec)
[    7.269905] [<c011c43c>] (mmput) from [<c0124430>] (do_exit+0x2d4/0x930)
[    7.276625] [<c0124430>] (do_exit) from [<c0124af4>] 
(do_group_exit+0x3c/0xb8)
[    7.283868] [<c0124af4>] (do_group_exit) from [<c0124b80>] 
(__wake_up_parent+0x0/0x18)
[    7.291815] Rebooting in 10 seconds..

Vlastimil Babka helped me debug this (thanks a lot!): ns->ops is
supposed to be set in net_ns_net_init(). That function, however, only
sets the ops pointer when CONFIG_NET_NS=y, and I have CONFIG_NET_NS=n.
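To make the failure mode concrete: the faulting instruction in the Code:
line, (e5933010), decodes as `ldr r3, [r3, #16]` with r3 = 00000000, and
the instruction before it, e5943004, is `ldr r3, [r4, #4]`, i.e. a load of
ns->ops followed by a load of ops->put through that NULL pointer. The
sketch below is a user-space model, not kernel code: all `model_*` names
are hypothetical, and the field order is assumed from the 5.9-era
struct proc_ns_operations and struct ns_common (ignoring
__randomize_layout). It shows how an #ifdef-guarded initializer leaves
ops NULL and why the resulting fault lands at 0x10 on 32-bit ARM.

```c
#include <stddef.h>

/* Hypothetical user-space mirror of struct proc_ns_operations as assumed
 * for ~v5.9 (include/linux/proc_ns.h); __randomize_layout is ignored. */
struct model_proc_ns_operations {
	const char *name;
	const char *real_ns_name;
	int type;
	void *(*get)(void *task);
	void (*put)(void *ns);
	int (*install)(void *nsset, void *ns);
	void *(*owner)(void *ns);
	void *(*get_parent)(void *ns);
};

/* Hypothetical mirror of struct ns_common: ops is the second member. */
struct model_ns_common {
	long stashed;                               /* atomic_long_t in the kernel */
	const struct model_proc_ns_operations *ops;
	unsigned int inum;
};

#ifdef MODEL_CONFIG_NET_NS
static void model_netns_put(void *ns) { (void)ns; }

static const struct model_proc_ns_operations model_netns_operations = {
	.name = "net",
	.put  = model_netns_put,
};
#endif

/* Same shape as the net_ns_net_init() logic described above: the ops
 * pointer is only assigned when the namespace option is built in. */
void model_net_ns_init(struct model_ns_common *ns)
{
#ifdef MODEL_CONFIG_NET_NS
	ns->ops = &model_netns_operations;
#else
	(void)ns; /* ops is left untouched, i.e. NULL for a zeroed object */
#endif
}

/* Offset of ops inside ns_common: the first load (ns + offset). */
size_t model_ops_offset(void)
{
	return offsetof(struct model_ns_common, ops);
}

/* Offset of put inside the ops table: with ops == NULL, calling
 * ops->put reads from NULL + this offset, the address in the oops. */
size_t model_put_fault_address(void)
{
	return offsetof(struct model_proc_ns_operations, put);
}
```

Built without -DMODEL_CONFIG_NET_NS, model_net_ns_init() leaves ops NULL.
On a 32-bit target model_ops_offset() is 4 and model_put_fault_address()
is 16 (0x10), matching the two loads above; on a 64-bit host they are 8
and 32.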

As for reproducing it, this is where the fun starts. I'm seeing this on an
ARM board (mvebu, SolidRun Clearfog Base). It started happening after
updating userland from systemd-243.4 to systemd-246.6 (and a ton of
unrelated bits including the toolchain; you know, embedded updates).
However, it *only* happens when the newer systemd-networkd is launched
with SYSTEMD_LOG_LEVEL=debug, and indeed, here's the relevant part of the
systemd diff (specifically, systemd commit
f6dbcebdc28cabf36e6665b67d52d43192fb88df):

@@ -164,12 +158,54 @@ int device_monitor_new_full(sd_device_monitor **ret, MonitorNetlinkGroup group,
 
         if (fd >= 0) {
                 r = monitor_set_nl_address(m);
-                if (r < 0)
-                        return log_debug_errno(r, "sd-device-monitor: Failed to set netlink address: %m");
+                if (r < 0) {
+                        log_debug_errno(r, "sd-device-monitor: Failed to set netlink address: %m");
+                        goto fail;
+                }
+        }
+
+        if (DEBUG_LOGGING) {
+                _cleanup_close_ int netns = -1;
+
+                /* So here's the thing: only AF_NETLINK sockets from the main network namespace will get
+                 * hardware events. Let's check if ours is from there, and if not generate a debug message,
+                 * since we cannot possibly work correctly otherwise. This is just a safety check to make
+                 * things easier to debug. */
+
+                netns = ioctl(m->sock, SIOCGSKNS);
+                if (netns < 0)
+                        log_debug_errno(errno, "sd-device-monitor: Unable to get network namespace of udev netlink socket, unable to determine if we are in host netns: %m");
+                else {
+                        struct stat a, b;
+
+                        if (fstat(netns, &a) < 0) {
+                                r = log_debug_errno(errno, "sd-device-monitor: Failed to stat netns of udev netlink socket: %m");
+                                goto fail;
+                        }
+
+                        if (stat("/proc/1/ns/net", &b) < 0) {
+                                if (ERRNO_IS_PRIVILEGE(errno))
+                                        /* If we can't access PID1's netns info due to permissions, it's fine, this is a
+                                         * safety check only after all. */
+                                        log_debug_errno(errno, "sd-device-monitor: No permission to stat PID1's netns, unable to determine if we are in host netns: %m");
+                                else
+                                        log_debug_errno(errno, "sd-device-monitor: Failed to stat PID1's netns: %m");
+
+                        } else if (a.st_dev != b.st_dev || a.st_ino != b.st_ino)
+                                log_debug("sd-device-monitor: Netlink socket we listen on is not from host netns, we won't see device events.");
+                }
         }

Apparently, when debug logging is enabled, systemd-networkd now calls
ioctl(SIOCGSKNS) on its netlink socket, fstat()s the returned netns fd and
stats /proc/1/ns/net, quite likely from a sandboxed/namespaced/whatever
process context, and none of that happened in the previous version of
systemd.
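For a self-contained reproducer, the debug-only hunk above boils down to
roughly the syscall sequence sketched below. This is an untested sketch,
not systemd's code: check_uevent_socket_netns() is a hypothetical name,
the NETLINK_KOBJECT_UEVENT protocol is an assumption (the hunk only shows
m->sock), and SIOCGSKNS requires CAP_NET_ADMIN. If the diagnosis above is
right, the interesting part on a CONFIG_NET_NS=n kernel is the descriptor
returned by ioctl(SIOCGSKNS): once it is closed, its nsfs inode would be
evicted through nsfs_evict() with ns->ops == NULL, so only run this on a
box you are happy to reboot.

```c
#include <errno.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/sockios.h>

#ifndef SIOCGSKNS
#define SIOCGSKNS 0x894C /* get socket network namespace, Linux >= 4.10 */
#endif

/* Hypothetical helper replaying the debug-only check from the hunk above.
 * Returns 1 if the socket's netns is PID 1's netns, 0 if it is not, or a
 * negative errno from the first step that failed.
 * WARNING: on an affected CONFIG_NET_NS=n kernel this may trigger the oops. */
int check_uevent_socket_netns(void)
{
	struct stat a, b;
	int r, sock, netns;

	/* Assumption: a uevent socket, like sd-device-monitor's m->sock. */
	sock = socket(AF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_KOBJECT_UEVENT);
	if (sock < 0)
		return -errno;

	/* Returns a new fd referring to the socket's network namespace;
	 * the kernel requires CAP_NET_ADMIN for this ioctl. */
	netns = ioctl(sock, SIOCGSKNS);
	if (netns < 0) {
		r = -errno;
		close(sock);
		return r;
	}

	if (fstat(netns, &a) < 0)
		r = -errno;
	else if (stat("/proc/1/ns/net", &b) < 0)
		r = -errno; /* the systemd hunk merely logs this case */
	else
		r = (a.st_dev == b.st_dev && a.st_ino == b.st_ino) ? 1 : 0;

	/* Per the diagnosis above, dropping this last reference to the
	 * namespace file is what would eventually reach nsfs_evict(). */
	close(netns);
	close(sock);
	return r;
}
```

Run as root in the host namespace on a CONFIG_NET_NS=y kernel, it should
simply return 1; a negative value is -errno from whichever step failed.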

Anyway, I'm so happy I can finally reproduce this "mysterious crash" on a
box with a remote console, so please feel free to ask for extra details if
needed. I'll also be happy to try patches, etc. Perhaps Lennart has a
small enough reproducer? Something as simple as `ls -al` from an SSH
session is not enough.

With kind regards,
Jan

[1] https://gerrit.cesnet.cz/plugins/gitiles/github/torvalds/linux/+log/refs/heads/cesnet/2020-11-03---5.9.3
