* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
[not found] ` <08995310-d853-ee77-ed1f-26cc336a4a30-CgwIDsGnGWjby3iVrkZq2A@public.gmane.org>
@ 2017-12-17 18:25 ` Randy Dunlap
[not found] ` <54a16e07-70e6-adda-ebdb-06349b4f8e86-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2017-12-17 18:25 UTC (permalink / raw)
To: Bronek Kozicki, linux-kernel-u79uwXL29TY76Z2rM5mHXA; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>
> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> getdents(5, /* 12 entries */, 32768) = 464
> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> read(8, <unfinished ...>) = ?
> +++ killed by SIGKILL +++
> [1] 12078 killed strace -- systemctl status
>
>
> B.
>
Hi,
Can you reproduce this without using (loading) the XFS modules?
They cause the kernel to be tainted.
Adding cgroups mailing list also.
>
> [ 1889.226051] ================================================================================
> [ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
> [ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
> [ 1889.249920] ================================================================================
> [ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
> [ 1889.267524] IP: pids_free+0x28/0xb0
> [ 1889.272394] PGD 0 P4D 0
> [ 1889.274925] Oops: 0000 [#1] SMP
> [ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
> [ 1889.349864] scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
> [ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P W O 4.14.7-1-ARCH #1
> [ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
> [ 1889.383474] task: ffff93149aaec140 task.stack: ffffa88c3836c000
> [ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
> [ 1889.393388] RSP: 0018:ffffa88c3836fcc8 EFLAGS: 00010282
> [ 1889.398605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
> [ 1889.405731] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
> [ 1889.412854] RBP: ffff931499ab2d58 R08: 000000000000079a R09: 0000000000000000
> [ 1889.419979] R10: 00000000001f5954 R11: 000000000003d040 R12: 0000000056e21a48
> [ 1889.427102] R13: ffffffffa91de5c0 R14: ffff93247b0598c0 R15: ffffffffa91cd0a0
> [ 1889.434227] FS: 00007f18eee6b8c0(0000) GS:ffff931ebfa40000(0000) knlGS:0000000000000000
> [ 1889.442302] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1889.448041] CR2: 00000000000000b0 CR3: 0000000611019003 CR4: 00000000001626e0
> [ 1889.455164] Call Trace:
> [ 1889.457610] cgroup_free+0xaa/0x190
> [ 1889.461095] __put_task_struct+0x68/0x230
> [ 1889.465105] ? seq_printf+0x4e/0x70
> [ 1889.468591] css_task_iter_next+0x74/0x90
> [ 1889.472594] kernfs_seq_next+0x58/0x110
> [ 1889.476424] seq_read+0x36c/0x620
> [ 1889.479735] __vfs_read+0x54/0x2e0
> [ 1889.483134] vfs_read+0x9d/0x200
> [ 1889.486358] SyS_read+0x52/0xc0
> [ 1889.489494] do_syscall_64+0x69/0x1e0
> [ 1889.493152] entry_SYSCALL64_slow_path+0x25/0x25
> [ 1889.497771] RIP: 0033:0x7f18ee784a11
> [ 1889.501341] RSP: 002b:00007ffd56942618 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 1889.508897] RAX: ffffffffffffffda RBX: 0000559a9ae6d260 RCX: 00007f18ee784a11
> [ 1889.516022] RDX: 0000000000001000 RSI: 0000559a9ae80f70 RDI: 0000000000000008
> [ 1889.523145] RBP: 0000000000000d68 R08: 0000000000000003 R09: ffffffffffffffb0
> [ 1889.530270] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f18eea4b700
> [ 1889.537395] R13: 00007f18eea4c240 R14: 0000559a9ae6d260 R15: 0000000000000000
> [ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
> [ 1889.563368] RIP: pids_free+0x28/0xb0 RSP: ffffa88c3836fcc8
> [ 1889.568846] CR2: 00000000000000b0
> [ 1889.572175] ---[ end trace eab2ed000b4d5c66 ]---
>
--
~Randy
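The "Tainted: P W O" field in the oops header above is what the taint question refers to. As a reading aid, here is a minimal shell sketch that spells out those three flags. The function name describe_taint is hypothetical and not from the thread; the meanings follow the kernel's tainted-kernels documentation, and only the letters seen in this log are covered.

```shell
#!/bin/sh
# Hypothetical helper (not part of the thread): explain the taint letters
# that appear in this oops ("Tainted: P W O"). Other letters exist but are
# deliberately not covered by this sketch.
describe_taint() {
    for flag in "$@"; do
        case $flag in
            P) echo "P: a proprietary module was loaded" ;;
            W) echo "W: the kernel has previously issued a warning" ;;
            O) echo "O: an out-of-tree module was loaded" ;;
            *) echo "$flag: not covered by this sketch" ;;
        esac
    done
}
```

Calling describe_taint P W O prints one line per flag. The (PO) and (O) suffixes in the "Modules linked in" list mark the modules that set those flags, here the ZFS and SPL modules.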
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
[not found] ` <54a16e07-70e6-adda-ebdb-06349b4f8e86-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2017-12-17 18:30 ` Bronek Kozicki
[not found] ` <04a4e27d-e291-66c3-ab88-e1343c6955f2-CgwIDsGnGWjby3iVrkZq2A@public.gmane.org>
2017-12-17 18:48 ` Bronek Kozicki
1 sibling, 1 reply; 7+ messages in thread
From: Bronek Kozicki @ 2017-12-17 18:30 UTC (permalink / raw)
To: Randy Dunlap, linux-kernel-u79uwXL29TY76Z2rM5mHXA; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
On 17/12/2017 18:25, Randy Dunlap wrote:
> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768) = 464
>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8, <unfinished ...>) = ?
>> +++ killed by SIGKILL +++
>> [1] 12078 killed strace -- systemctl status
>>
>>
>> B.
>>
>
> Hi,
>
> Can you reproduce this without using (loading) the XFS modules?
> They cause the kernel to be tainted.
I think you mean ZFS - I cannot do that. It is my root filesystem.
B.
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
[not found] ` <04a4e27d-e291-66c3-ab88-e1343c6955f2-CgwIDsGnGWjby3iVrkZq2A@public.gmane.org>
@ 2017-12-17 18:31 ` Randy Dunlap
0 siblings, 0 replies; 7+ messages in thread
From: Randy Dunlap @ 2017-12-17 18:31 UTC (permalink / raw)
To: Bronek Kozicki, linux-kernel-u79uwXL29TY76Z2rM5mHXA; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
On 12/17/2017 10:30 AM, Bronek Kozicki wrote:
> On 17/12/2017 18:25, Randy Dunlap wrote:
>> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>>
>>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>> getdents(5, /* 12 entries */, 32768) = 464
>>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>>> read(8, <unfinished ...>) = ?
>>> +++ killed by SIGKILL +++
>>> [1] 12078 killed strace -- systemctl status
>>>
>>>
>>> B.
>>>
>>
>> Hi,
>>
>> Can you reproduce this without using (loading) the XFS modules?
>> They cause the kernel to be tainted.
>
> I think you mean ZFS - I cannot do that. It is my root filesystem.
Sorry, yes, I did mean ZFS.
thanks,
--
~Randy
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
[not found] ` <54a16e07-70e6-adda-ebdb-06349b4f8e86-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2017-12-17 18:30 ` Bronek Kozicki
@ 2017-12-17 18:48 ` Bronek Kozicki
1 sibling, 0 replies; 7+ messages in thread
From: Bronek Kozicki @ 2017-12-17 18:48 UTC (permalink / raw)
To: Randy Dunlap, linux-kernel-u79uwXL29TY76Z2rM5mHXA; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
FWIW, I can do "cat" . I get a single number seemingly followed by an infinite stream of 0s (I tried wc -l, but did not want to wait very long and killed it). Here is what it looks like, if limited by "head":
root@gdansk ~ # cat '/sys/fs/cgroup/unified/machine.slice/machine-qemu\x2d1\x2dkartuzy\x2dspice.scope/cgroup.procs' | head
10649
0
0
0
0
0
0
0
0
0
root@gdansk ~ #
PID 10649 is indeed qemu process running the virtual machine in question:
root@gdansk ~ # ps lw 10649
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
6 0 10649 1 20 0 4815836 60252 - Sl ? 2:56 /usr/bin/qemu-system-x86_64 -name guest=kartuzy-spice,process=qemu:kartuzy-spice,debug-threads=on -S -object se
Sorry about taint by ZFS, but there is nothing I can do, it is my root filesystem. Since I am the only user of the package in question I could cheat and replace the license for the build of the ZFS module, but I do not see how that might help.
B.
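The symptom described above (one valid PID, then an apparently endless run of 0 entries) can be checked without reading the whole stream. A minimal sketch, assuming a POSIX shell; the function name count_zero_pids and the default limit of 100 lines are illustrative choices, not something from the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of the thread): count "0" entries among the
# first LIMIT lines of a cgroup.procs-style file. A healthy file lists only
# positive PIDs, one per line, so any non-zero count is suspicious.
count_zero_pids() {
    file=$1
    limit=${2:-100}
    # head bounds the read; grep -c -x counts lines that are exactly "0".
    # grep exits non-zero when the count is 0, so mask that status.
    head -n "$limit" "$file" | grep -c -x '0' || true
}
```

On an affected kernel, reading the real cgroup.procs file may itself hang or oops the machine, as reported elsewhere in this thread, so the sketch is best tried on a saved copy of the output.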
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
[not found] ` <20171217232448.yfaxxew2ijaay7iu-5Y5FpTStZqUl8ZggnyUIT4tm+1EbUQKi@public.gmane.org>
@ 2017-12-18 19:56 ` Bronek Kozicki
0 siblings, 0 replies; 7+ messages in thread
From: Bronek Kozicki @ 2017-12-18 19:56 UTC (permalink / raw)
To: vcaputo-IiWei5kqaphBDgjK7y7TUQ, linux-kernel-u79uwXL29TY76Z2rM5mHXA, tj-DgEjT+Ai2ygdnm+yROfE0A; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
On 17/12/2017 23:24, vcaputo-IiWei5kqaphBDgjK7y7TUQ@public.gmane.org wrote:
> On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768) = 464
>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8, <unfinished ...>) = ?
>> +++ killed by SIGKILL +++
>> [1] 12078 killed strace -- systemctl status
>>
>>
>
> This recently came through lkml, may be related:
> https://marc.info/?l=linux-kernel&m=151320108922415&w=2
thank you, it certainly seems related. Is there some debugging option I could enable, or patch I could apply, which would make the point of data corruption easier to find? I'm ok taking untested patches, if that helps finding the location of the bug.
B.
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
@ 2017-12-18 20:17 George Amanakis
[not found] ` <1513628274.1378.1.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 7+ messages in thread
From: George Amanakis @ 2017-12-18 20:17 UTC (permalink / raw)
To: linux-kernel-u79uwXL29TY76Z2rM5mHXA; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA
[-- Attachment #1: Type: text/plain, Size: 361 bytes --]
I can replicate this on a Thinkpad X230i running archlinux with latest
4.14.7 kernel, without the ZFS modules.
Steps to reproduce:
1) create a virtual machine using libvirt (attached xml)
2) virsh start vm
3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs
This hangs the laptop requiring a hard reset.
Regards,
George
[-- Attachment #2: vm.xml --]
[-- Type: application/xml, Size: 2967 bytes --]
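Step 3 writes the scope name with doubled backslashes because the path is unquoted; the earlier cat command in this thread single-quotes it instead. Both pass the same literal \x2d to the kernel, and \x2d is how systemd escapes "-" inside a unit name. A small sketch to confirm that, assuming a POSIX shell with sed; the variable names and the helper unescape_dashes are hypothetical, and only the \x2d escape is handled.

```shell
#!/bin/sh
# Two spellings of the same scope name (taken from step 3 above).
quoted='machine-qemu\x2d2\x2dvm.scope'     # single quotes keep the backslash
unquoted=machine-qemu\\x2d2\\x2dvm.scope   # unquoted: each \\ becomes one backslash

# Hypothetical helper: turn systemd's \x2d escape back into "-".
# Only this one escape is handled.
unescape_dashes() {
    printf '%s\n' "$1" | sed 's/\\x2d/-/g'
}

# Report whether the two spellings are byte-identical, then show the
# human-readable form of the scope name.
[ "$quoted" = "$unquoted" ] && echo "same string"
unescape_dashes "$quoted"
```

Using printf with %s rather than echo avoids any shell-specific backslash handling while the name is passed to sed.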
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
[not found] ` <1513628274.1378.1.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-19 22:37 ` Tejun Heo
0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2017-12-19 22:37 UTC (permalink / raw)
To: George Amanakis; +Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, cgroups-u79uwXL29TY76Z2rM5mHXA
Hello,
On Mon, Dec 18, 2017 at 03:17:54PM -0500, George Amanakis wrote:
> I can replicate this on a Thinkpad X230i running archlinux with latest
> 4.14.7 kernel, without the ZFS modules.
>
> Steps to reproduce:
> 1) create a virtual machine using libvirt (attached xml)
> 2) virsh start vm
> 3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs
It took some massaging but I can reproduce the problem. Will report when I know more.
Thanks.
--
tejun