public inbox for cgroups@vger.kernel.org
* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
       [not found]     ` <08995310-d853-ee77-ed1f-26cc336a4a30-CgwIDsGnGWjby3iVrkZq2A@public.gmane.org>
@ 2017-12-17 18:25       ` Randy Dunlap
       [not found]         ` <54a16e07-70e6-adda-ebdb-06349b4f8e86-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: Randy Dunlap @ 2017-12-17 18:25 UTC
  To: Bronek Kozicki, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
> 
> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> getdents(5, /* 12 entries */, 32768)    = 464
> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> read(8,  <unfinished ...>)              = ?
> +++ killed by SIGKILL +++
> [1]    12078 killed     strace -- systemctl status
> 
> 
> B.
> 

Hi,

Can you reproduce this without using (loading) the XFS modules?
They cause the kernel to be tainted.

Adding cgroups mailing list also.

> 
> [ 1889.226051] ================================================================================
> [ 1889.235286] UBSAN: Undefined behaviour in kernel/cgroup/pids.c:67:9
> [ 1889.241563] member access within null pointer of type 'struct pids_cgroup'
> [ 1889.249920] ================================================================================
> [ 1889.259698] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0
> [ 1889.267524] IP: pids_free+0x28/0xb0
> [ 1889.272394] PGD 0 P4D 0
> [ 1889.274925] Oops: 0000 [#1] SMP
> [ 1889.278061] Modules linked in: ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter devlink joydev hid_logitech_hidpp mxm_wmi intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd ext4 intel_cstate crc16 mbcache jbd2 fscrypto nls_iso8859_1 nls_cp437 evdev input_leds led_class vfat fat intel_rapl_perf mac_hid pcspkr hid_logitech_dj igb ptp mei_me pps_core i2c_i801 i2c_algo_bit mei lpc_ich ioatdma tpm_tis tpm_tis_core dca shpchp tpm wmi button sch_fq_codel sg ip_tables x_tables usbhid hid zfs(PO) zunicode(PO) zavl(PO) icp(PO) sd_mod serio_raw atkbd libps2 isci ahci libsas libahci xhci_pci ehci_pci mpt3sas xhci_hcd ehci_hcd raid_class libata
> [ 1889.349864]  scsi_transport_sas usbcore scsi_mod usb_common i8042 serio zcommon(PO) znvpair(PO) spl(O) nvme nvme_core bridge stp llc vhost_net tun tap vhost vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
> [ 1889.368439] CPU: 1 PID: 12084 Comm: systemctl Tainted: P        W  O    4.14.7-1-ARCH #1
> [ 1889.376525] Hardware name: Supermicro X9DA7/E/X9DA7/E, BIOS 3.0a 07/02/2014
> [ 1889.383474] task: ffff93149aaec140 task.stack: ffffa88c3836c000
> [ 1889.389387] RIP: 0010:pids_free+0x28/0xb0
> [ 1889.393388] RSP: 0018:ffffa88c3836fcc8 EFLAGS: 00010282
> [ 1889.398605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
> [ 1889.405731] RDX: 0000000000000000 RSI: 0000000000000202 RDI: 0000000000000202
> [ 1889.412854] RBP: ffff931499ab2d58 R08: 000000000000079a R09: 0000000000000000
> [ 1889.419979] R10: 00000000001f5954 R11: 000000000003d040 R12: 0000000056e21a48
> [ 1889.427102] R13: ffffffffa91de5c0 R14: ffff93247b0598c0 R15: ffffffffa91cd0a0
> [ 1889.434227] FS:  00007f18eee6b8c0(0000) GS:ffff931ebfa40000(0000) knlGS:0000000000000000
> [ 1889.442302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1889.448041] CR2: 00000000000000b0 CR3: 0000000611019003 CR4: 00000000001626e0
> [ 1889.455164] Call Trace:
> [ 1889.457610]  cgroup_free+0xaa/0x190
> [ 1889.461095]  __put_task_struct+0x68/0x230
> [ 1889.465105]  ? seq_printf+0x4e/0x70
> [ 1889.468591]  css_task_iter_next+0x74/0x90
> [ 1889.472594]  kernfs_seq_next+0x58/0x110
> [ 1889.476424]  seq_read+0x36c/0x620
> [ 1889.479735]  __vfs_read+0x54/0x2e0
> [ 1889.483134]  vfs_read+0x9d/0x200
> [ 1889.486358]  SyS_read+0x52/0xc0
> [ 1889.489494]  do_syscall_64+0x69/0x1e0
> [ 1889.493152]  entry_SYSCALL64_slow_path+0x25/0x25
> [ 1889.497771] RIP: 0033:0x7f18ee784a11
> [ 1889.501341] RSP: 002b:00007ffd56942618 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 1889.508897] RAX: ffffffffffffffda RBX: 0000559a9ae6d260 RCX: 00007f18ee784a11
> [ 1889.516022] RDX: 0000000000001000 RSI: 0000559a9ae80f70 RDI: 0000000000000008
> [ 1889.523145] RBP: 0000000000000d68 R08: 0000000000000003 R09: ffffffffffffffb0
> [ 1889.530270] R10: 0000000000001000 R11: 0000000000000246 R12: 00007f18eea4b700
> [ 1889.537395] R13: 00007f18eea4c240 R14: 0000559a9ae6d260 R15: 0000000000000000
> [ 1889.544518] Code: 44 00 00 0f 1f 44 00 00 48 81 ff c8 f7 ff ff 55 53 48 89 fb 74 4c 48 8b 9b 38 08 00 00 48 85 db 74 7c 48 8b 5b 50 48 85 db 74 63 <48> 83 bb b0 00 00 00 00 74 2a 48 c7 c5 60 2e 1e a9 48 89 df e8
> [ 1889.563368] RIP: pids_free+0x28/0xb0 RSP: ffffa88c3836fcc8
> [ 1889.568846] CR2: 00000000000000b0
> [ 1889.572175] ---[ end trace eab2ed000b4d5c66 ]---
> 


-- 
~Randy


* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
       [not found]         ` <54a16e07-70e6-adda-ebdb-06349b4f8e86-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2017-12-17 18:30           ` Bronek Kozicki
       [not found]             ` <04a4e27d-e291-66c3-ab88-e1343c6955f2-CgwIDsGnGWjby3iVrkZq2A@public.gmane.org>
  2017-12-17 18:48           ` Bronek Kozicki
  1 sibling, 1 reply; 7+ messages in thread
From: Bronek Kozicki @ 2017-12-17 18:30 UTC
  To: Randy Dunlap, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On 17/12/2017 18:25, Randy Dunlap wrote:
> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768)    = 464
>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8,  <unfinished ...>)              = ?
>> +++ killed by SIGKILL +++
>> [1]    12078 killed     strace -- systemctl status
>>
>>
>> B.
>>
> 
> Hi,
> 
> Can you reproduce this without using (loading) the XFS modules?
> They cause the kernel to be tainted.

I think you mean ZFS - I cannot do that. It is my root filesystem.


B.


* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
       [not found]             ` <04a4e27d-e291-66c3-ab88-e1343c6955f2-CgwIDsGnGWjby3iVrkZq2A@public.gmane.org>
@ 2017-12-17 18:31               ` Randy Dunlap
  0 siblings, 0 replies; 7+ messages in thread
From: Randy Dunlap @ 2017-12-17 18:31 UTC
  To: Bronek Kozicki, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On 12/17/2017 10:30 AM, Bronek Kozicki wrote:
> On 17/12/2017 18:25, Randy Dunlap wrote:
>> On 12/17/2017 09:49 AM, Bronek Kozicki wrote:
>>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>>
>>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>>> getdents(5, /* 12 entries */, 32768)    = 464
>>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>>> read(8,  <unfinished ...>)              = ?
>>> +++ killed by SIGKILL +++
>>> [1]    12078 killed     strace -- systemctl status
>>>
>>>
>>> B.
>>>
>>
>> Hi,
>>
>> Can you reproduce this without using (loading) the XFS modules?
>> They cause the kernel to be tainted.
> 
> I think you mean ZFS - I cannot do that. It is my root filesystem.

Sorry, yes, I did mean ZFS.

thanks,
-- 
~Randy


* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
       [not found]         ` <54a16e07-70e6-adda-ebdb-06349b4f8e86-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2017-12-17 18:30           ` Bronek Kozicki
@ 2017-12-17 18:48           ` Bronek Kozicki
  1 sibling, 0 replies; 7+ messages in thread
From: Bronek Kozicki @ 2017-12-17 18:48 UTC
  To: Randy Dunlap, linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

FWIW, I can "cat" the file. I get a single PID followed by a seemingly
infinite stream of 0s (I tried wc -l, but killed it rather than wait).
Here is what it looks like when limited by "head":

root@gdansk ~ # cat '/sys/fs/cgroup/unified/machine.slice/machine-qemu\x2d1\x2dkartuzy\x2dspice.scope/cgroup.procs' | head
10649
0
0
0
0
0
0
0
0
0
root@gdansk ~ #

PID 10649 is indeed the qemu process running the virtual machine in
question:

root@gdansk ~ # ps lw 10649
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
6     0 10649     1  20   0 4815836 60252 -     Sl   ?          2:56 /usr/bin/qemu-system-x86_64 -name guest=kartuzy-spice,process=qemu:kartuzy-spice,debug-threads=on -S -object se
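
For anyone wanting to inspect the file without risking an endless read, here is a
minimal Python sketch of the same idea as piping through "head": stop after a fixed
number of lines, then report the non-zero PIDs and how many 0 entries were seen. The
name summarize_procs and the 1000-line cap are illustrative assumptions, not part of
any kernel or systemd interface. On an affected kernel the read itself may still
trigger the oops, so this is only safe against a saved copy of the output.

```python
from itertools import islice
from typing import Iterable, List, Tuple


def summarize_procs(lines: Iterable[str], limit: int = 1000) -> Tuple[List[int], int]:
    """Return (distinct non-zero PIDs, number of 0 entries) from at most `limit` lines.

    `lines` can be an open file object or any iterable of text lines.
    The cap guarantees termination even if the source never reaches EOF.
    """
    pids: List[int] = []
    zeros = 0
    for line in islice(lines, limit):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        pid = int(text)
        if pid == 0:
            zeros += 1  # a 0 entry is not a PID expected in this listing
        elif pid not in pids:
            pids.append(pid)
    return pids, zeros
```

It could be called as summarize_procs(open(path), limit=10), where path is a
placeholder for a saved copy of the cgroup.procs output.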


Sorry about the taint from ZFS, but there is nothing I can do; it is my
root filesystem. Since I am the only user of the package in question, I
could cheat and change the license declared in my build of the ZFS module,
but I do not see how that would help.


B.


* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
       [not found]       ` <20171217232448.yfaxxew2ijaay7iu-5Y5FpTStZqUl8ZggnyUIT4tm+1EbUQKi@public.gmane.org>
@ 2017-12-18 19:56         ` Bronek Kozicki
  0 siblings, 0 replies; 7+ messages in thread
From: Bronek Kozicki @ 2017-12-18 19:56 UTC
  To: vcaputo-IiWei5kqaphBDgjK7y7TUQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, tj-DgEjT+Ai2ygdnm+yROfE0A
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

On 17/12/2017 23:24, vcaputo-IiWei5kqaphBDgjK7y7TUQ@public.gmane.org wrote:
> On Sun, Dec 17, 2017 at 05:49:44PM +0000, Bronek Kozicki wrote:
>> I just upgraded to 4.14.7 and tried to reproduce this error, this time under strace. As you can see this happens when systemctl tries to read a specific entry under /sys/fs . In case this matters, the entry is for a small virtual machine running under qemu/kvm and managed by libvirt.
>>
>> open("/sys/fs/cgroup/unified/machine.slice", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 5
>> fstat(5, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> getdents(5, /* 12 entries */, 32768)    = 464
>> openat(AT_FDCWD, "/sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d1\\x2dkartuzy\\x2dspice.scope/cgroup.procs", O_RDONLY|O_CLOEXEC) = 8
>> fstat(8, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
>> read(8,  <unfinished ...>)              = ?
>> +++ killed by SIGKILL +++
>> [1]    12078 killed     strace -- systemctl status
>>
>>
> 
> This recently came through lkml, may be related:
> https://marc.info/?l=linux-kernel&m=151320108922415&w=2

Thank you, it certainly seems related. Is there a debugging option I could enable, or a patch I could apply, that would make the point of data corruption easier to find? I am OK with taking untested patches if that helps locate the bug.


B.


* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
@ 2017-12-18 20:17 George Amanakis
       [not found] ` <1513628274.1378.1.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 7+ messages in thread
From: George Amanakis @ 2017-12-18 20:17 UTC
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA; +Cc: cgroups-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 361 bytes --]

I can replicate this on a ThinkPad X230i running Arch Linux with the latest
4.14.7 kernel, without the ZFS modules.

Steps to reproduce:
1) create a virtual machine using libvirt (attached xml)
2) virsh start vm
3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs

This hangs the laptop, requiring a hard reset.
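
A side note on step 3: the \x2d sequences in the scope name are systemd's escaped
form of "-". Below is a minimal Python sketch of decoding such a name, assuming only
\xXX hex escapes need handling. The helper name unescape_unit_name is hypothetical
and is not a systemd API.

```python
import re


def unescape_unit_name(name: str) -> str:
    r"""Replace each \xXX hex escape in a unit name with the character it encodes."""
    return re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda match: chr(int(match.group(1), 16)),
        name,
    )
```

Applied to the scope name from step 3, this gives the readable form
machine-qemu-2-vm.scope.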

Regards,
George

[-- Attachment #2: vm.xml --]
[-- Type: application/xml, Size: 2967 bytes --]


* Re: PROBLEM: NULL pointer dereference in kernel 4.14.6
       [not found] ` <1513628274.1378.1.camel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2017-12-19 22:37   ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2017-12-19 22:37 UTC
  To: George Amanakis
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA

Hello,

On Mon, Dec 18, 2017 at 03:17:54PM -0500, George Amanakis wrote:
> I can replicate this on a ThinkPad X230i running Arch Linux with the latest
> 4.14.7 kernel, without the ZFS modules.
> 
> Steps to reproduce:
> 1) create a virtual machine using libvirt (attached xml)
> 2) virsh start vm
> 3) head /sys/fs/cgroup/unified/machine.slice/machine-qemu\\x2d2\\x2dvm.scope/cgroup.procs

It took some massaging but I can reproduce the problem.  Will report
when I know more.

Thanks.

-- 
tejun

