linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128
@ 2022-07-09  5:33 Helge Deller
  0 siblings, 0 replies; 5+ messages in thread
From: Helge Deller @ 2022-07-09  5:33 UTC (permalink / raw)
  To: Alexander Viro, linux-fsdevel, linux-kernel, linux-parisc

On the debian buildd servers for the parisc architecture I'm facing
the following warning, after which it gets stuck so that I need to
reboot the server.
This happens rarely, but I can reproduce it after some time.
Filesystem is tmpfs.

I'm happy to add debug code if necessary...

Helge

[128321.224739] ------------[ cut here ]------------
[128321.283492] WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128
[128321.371369] Modules linked in: dm_mod(E) ipt_REJECT(E) nf_reject_ipv4(E) xt_multiport(E) nft_compat(E) nf_tables(E) nfnetlink(E) ipmi_si(E) ipmi_devintf(E) sg(E) ipmi_msghandler(E) fuse(E) configfs(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) zstd_compress(E) libcrc32c(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc64_rocksoft(E) crc64(E) crc_t10dif(E) crct10dif_generic(E) ohci_pci(E) crct10dif_common(E) ata_generic(E) sata_sil(E) ehci_pci(E) pata_sil680(E) mptspi(E) mptscsih(E) ohci_hcd(E) mptbase(E) scsi_transport_spi(E) ehci_hcd(E) libata(E) scsi_mod(E) usbcore(E) e1000(E) usb_common(E) scsi_common(E)
[128322.103374] CPU: 1 PID: 14735 Comm: cc1plus Tainted: G            E     5.18.9+ #27
[128322.195315] Hardware name: 9000/785/C8000
[128322.247318]
[128322.263314]      YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
[128322.323316] PSW: 00001000000001101111000000001111 Tainted: G            E
[128322.407315] r00-03  000000000806f00f 0000000239aa8700 000000004073be90 0000000239aa87a0
[128322.507313] r04-07  00000000410e1a00 000000411c8e3b40 0000000211715ad8 000000411c8e3b98
[128322.603314] r08-11  00000000410b8e80 0000000041147200 00000000000800e0 0000000041147200
[128322.699311] r12-15  0000000000000001 0000000010000000 0000000041147200 0000000041147200
[128322.795312] r16-19  000000411c8e3b98 0000000000000000 00000000415084d0 0000000002c55000
[128322.895311] r20-23  000000000800000f 000000004fb33600 000000000800000f 0000000211715b20
[128322.991311] r24-27  000000411c8e3bd0 000000411c8e3bd0 000000411c8e3b40 00000000410e1a00
[128323.087309] r28-31  00000000416bf9d8 0000000239aa8770 0000000239aa8820 000000000000cf14
[128323.187310] sr00-03  000000000500a800 0000000000000000 0000000000000000 000000000500a800
[128323.283324] sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
[128323.383313]
[128323.399309] IASQ: 0000000000000000 0000000000000000 IAOQ: 000000004073b3d4 000000004073b3d8
[128323.503310]  IIR: 03ffe01f    ISR: 0000000010340400  IOR: 000003e6aa2a87a8
[128323.587306]  CPU:        1   CR30: 0000000212ff0ce0 CR31: ffffffffffffffff
[128323.667328]  ORIG_R28: 0000000239aa8920
[128323.715307]  IAOQ[0]: dentry_free+0x100/0x128
[128323.771306]  IAOQ[1]: dentry_free+0x104/0x128
[128323.823305]  RP(r2): __dentry_kill+0x284/0x2e8
[128323.875347] Backtrace:
[128323.907319]  [<000000004073be90>] __dentry_kill+0x284/0x2e8
[128323.975309]  [<000000004073d6d8>] dput+0x334/0x5a8
[128324.031311]  [<0000000040726d94>] step_into+0x790/0xa88
[128324.095309]  [<0000000040728910>] path_openat+0x21c/0x1ba8
[128324.163310]  [<000000004072c018>] do_filp_open+0x9c/0x198
[128324.231310]  [<0000000040702320>] do_sys_openat2+0x14c/0x2a8
[128324.299309]  [<0000000040702c54>] compat_sys_openat+0x58/0xb8
[128324.367308]  [<0000000040303e30>] syscall_exit+0x0/0x10

after that:

[128324.451303] ---[ end trace 0000000000000000 ]---
[128345.511159] rcu: INFO: rcu_sched self-detected stall on CPU
[128345.511159] rcu:    0-...!: (5249 ticks this GP) idle=3c3/1/0x4000000000000002 softirq=35954078/35954078 fqs=54
[128345.511159]         (t=5250 jiffies g=33838821 q=175418)
[128345.511159] rcu: rcu_sched kthread starved for 5146 jiffies! g33838821 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
[128345.511159] rcu:    Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[128345.511159] ...
[128345.511159]
[128345.511159] rcu: Stack dump where RCU GP kthread last ran:
[128345.511159] Task dump for CPU 1:
[128345.511159] task:cc1plus         state:R  running task     stack:    0 pid:14735 ppid: 14734 flags:0x00000014
[128345.511159] Backtrace:
[128345.511159]  [<000000004071f5c8>] __legitimize_path+0x7c/0x108
[128345.511159]  [<00000000407210d4>] lookup_fast+0x1c8/0x290
[128345.511159]  [<00000000407277b4>] walk_component+0x1e8/0x330
[128345.511159]  [<0000000040727be8>] link_path_walk.part.0.constprop.0+0x2ec/0x4d0
[128345.511159]  [<0000000040728844>] path_openat+0x150/0x1ba8
[128345.511159]  [<000000004072c018>] do_filp_open+0x9c/0x198
[128345.511159]  [<0000000040702320>] do_sys_openat2+0x14c/0x2a8
[128345.511159]  [<0000000040702c54>] compat_sys_openat+0x58/0xb8
[128345.511159]  [<0000000040303e30>] syscall_exit+0x0/0x10
[128345.511159]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128
       [not found]           ` <20220720110032.1787-1-hdanton@sina.com>
@ 2022-07-20 17:06             ` Al Viro
  2022-07-20 23:15               ` Sam James
  0 siblings, 1 reply; 5+ messages in thread
From: Al Viro @ 2022-07-20 17:06 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Helge Deller, John David Anglin, linux-kernel, linux-parisc,
	linux-fsdevel

On Wed, Jul 20, 2022 at 07:00:32PM +0800, Hillf Danton wrote:

> To help debug it, de-union d_in_lookup_hash with d_alias and add debug
> info after dentry is killed. If any warning hits, we know where to add
> something like
> 
> 	WARN_ON(dentry->d_flags & DCACHE_DENTRY_KILLED);
> 
> before hlist_bl_add or hlist_add.

IDGI.  That clearly has nothing to do with in-lookup stuff - no
DCACHE_PAR_LOOKUP in reported flags, so it either never had it set,
or it went through __d_lookup_done() already.

If anything, it might have already been through d_free(), with
d_rcu being confused for d_alias.

I'd do something like
	WARN_ON(dentry->d_flags & (1U<<31));
	dentry->d_flags |= 1U << 31;
in the begining of d_free() (possibly with dumping dentry state if we
hit that, not that there would be much to report; d_name.name might
be informative, though).

Again, in-lookup looks like a red herring - DCACHE_PAR_LOOKUP is set
only in d_alloc_parallel(), right next to the insertion into the list
and removed only in __d_lookup_free(), right next to the removal from
the same.  No DCACHE_PAR_LOOKUP in ->d_flags (it's 0x8008 in reported
cases, i.e. DCACHE_OP_REVALIDATE | DCACHE_DENTRY_KILLED).

What's more, take a look at retain_dentry(); WARN_ON(d_in_lookup(dentry))
right at the top and it had not triggered in any of the reports I've
seen in that thread.  Granted, it's not called on each path to
__dentry_kill(), but it is on the call chains I've seen reported...

Another thing that might be interesting to know is ->d_sb, along with
->d_sb->s_type->name and ->d_sb->s_id.  That should tell which fs it's
on...

I wonder if anyone had seen anything similar outside of parisc...
I don't know if I have any chance to reproduce it here - the only
parisc box I've got is a 715/100 (assuming the disk is still alive)
and it's 32bit, unlike the reported setups and, er, not fast.
qemu seems to have some parisc support, but it's 32bit-only at the
moment...

PS: please, Cc fsdevel on anything VFS-related.  Very few people are
still subscribed to l-k these days - I am, but it's impossible to read
through and postings can easily get missed.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128
  2022-07-20 17:06             ` WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128 Al Viro
@ 2022-07-20 23:15               ` Sam James
  2022-07-21  3:54                 ` Helge Deller
  0 siblings, 1 reply; 5+ messages in thread
From: Sam James @ 2022-07-20 23:15 UTC (permalink / raw)
  To: Al Viro, Helge Deller
  Cc: Hillf Danton, Helge Deller, John David Anglin, linux-kernel,
	linux-parisc, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1067 bytes --]



> On 20 Jul 2022, at 18:06, Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> On Wed, Jul 20, 2022 at 07:00:32PM +0800, Hillf Danton wrote:
> 
>> To help debug it, de-union d_in_lookup_hash with d_alias and add debug
>> info after dentry is killed. If any warning hits, we know where to add
>> something like
>> 
>> 	WARN_ON(dentry->d_flags & DCACHE_DENTRY_KILLED);
>> 
>> before hlist_bl_add or hlist_add.

> [snip]
> I wonder if anyone had seen anything similar outside of parisc...
> I don't know if I have any chance to reproduce it here - the only
> parisc box I've got is a 715/100 (assuming the disk is still alive)
> and it's 32bit, unlike the reported setups and, er, not fast.
> qemu seems to have some parisc support, but it's 32bit-only at the
> moment...

I don't think I've seen this on parisc either, but I don't think
I've used tmpfs that heavily. I'll try it in case it's somehow more
likely to trigger it.

Helge, were there any particular steps to reproduce this? Or just
start doing your normal Debian builds on a tmpfs and it happens
soon enough?


[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 358 bytes --]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128
  2022-07-20 23:15               ` Sam James
@ 2022-07-21  3:54                 ` Helge Deller
  2022-07-30 20:21                   ` Helge Deller
  0 siblings, 1 reply; 5+ messages in thread
From: Helge Deller @ 2022-07-21  3:54 UTC (permalink / raw)
  To: Sam James, Al Viro
  Cc: Hillf Danton, John David Anglin, linux-kernel, linux-parisc,
	linux-fsdevel

On 7/21/22 01:15, Sam James wrote:
>> On 20 Jul 2022, at 18:06, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>
>> On Wed, Jul 20, 2022 at 07:00:32PM +0800, Hillf Danton wrote:
>>
>>> To help debug it, de-union d_in_lookup_hash with d_alias and add debug
>>> info after dentry is killed. If any warning hits, we know where to add
>>> something like
>>>
>>> 	WARN_ON(dentry->d_flags & DCACHE_DENTRY_KILLED);
>>>
>>> before hlist_bl_add or hlist_add.
>
>> [snip]
>> I wonder if anyone had seen anything similar outside of parisc...

Me too.
Of course it could be caused by the platform code, as we have had
issues with caches, spinlocks and so on.
On older kernels we also have seen RCU stalls in d_alloc_parallel().

>> I don't know if I have any chance to reproduce it here - the only
>> parisc box I've got is a 715/100 (assuming the disk is still alive)
>> and it's 32bit, unlike the reported setups and, er, not fast.

It's fun to boot it, but it will be too slow for actual testing.

>> qemu seems to have some parisc support, but it's 32bit-only at the
>> moment...

Yes. I think it will be hard to reproduce it in the VM.

> I don't think I've seen this on parisc either, but I don't think
> I've used tmpfs that heavily. I'll try it in case it's somehow more
> likely to trigger it.

It happened on the debian buildd server with tmpfs. To rule out tmpfs
I switched to ext4 (on SATA SSD) and it happened there as well.
I assume Dave's report is on ext3/ext4 with SCSI discs.

> Helge, were there any particular steps to reproduce this? Or just
> start doing your normal Debian builds on a tmpfs and it happens
> soon enough?

Currently it's not easy to reproduce for me either.
It happens on the debian buildd server (4-way c8000 machine) while building
the webkit2gtk package. I think it happens at the end when sbuild
cleans the build directories by deleting all files.
Maybe there is a filesystem test toolkit which you could try which hammers
the fs by deleting lots of files in parallel?

Helge

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128
  2022-07-21  3:54                 ` Helge Deller
@ 2022-07-30 20:21                   ` Helge Deller
  0 siblings, 0 replies; 5+ messages in thread
From: Helge Deller @ 2022-07-30 20:21 UTC (permalink / raw)
  To: Sam James, Al Viro
  Cc: Hillf Danton, John David Anglin, linux-kernel, linux-parisc,
	linux-fsdevel

On 7/21/22 05:54, Helge Deller wrote:
> On 7/21/22 01:15, Sam James wrote:
>>> On 20 Jul 2022, at 18:06, Al Viro <viro@zeniv.linux.org.uk> wrote:
>>>
>>> On Wed, Jul 20, 2022 at 07:00:32PM +0800, Hillf Danton wrote:
>>>
>>>> To help debug it, de-union d_in_lookup_hash with d_alias and add debug
>>>> info after dentry is killed. If any warning hits, we know where to add
>>>> something like
>>>>
>>>> 	WARN_ON(dentry->d_flags & DCACHE_DENTRY_KILLED);
>>>>
>>>> before hlist_bl_add or hlist_add.
>>
>>> [snip]
>>> I wonder if anyone had seen anything similar outside of parisc...
>
> Me too.
> Of course it could be caused by the platform code, as we have had
> issues with caches, spinlocks and so on.
> On older kernels we also have seen RCU stalls in d_alloc_parallel().
>
>>> I don't know if I have any chance to reproduce it here - the only
>>> parisc box I've got is a 715/100 (assuming the disk is still alive)
>>> and it's 32bit, unlike the reported setups and, er, not fast.
>
> It's fun to boot it, but it will be too slow for actual testing.
>
>>> qemu seems to have some parisc support, but it's 32bit-only at the
>>> moment...
>
> Yes. I think it will be hard to reproduce it in the VM.
>
>> I don't think I've seen this on parisc either, but I don't think
>> I've used tmpfs that heavily. I'll try it in case it's somehow more
>> likely to trigger it.
>
> It happened on the debian buildd server with tmpfs. To rule out tmpfs
> I switched to ext4 (on SATA SSD) and it happened there as well.
> I assume Dave's report is on ext3/ext4 with SCSI discs.
>
>> Helge, were there any particular steps to reproduce this? Or just
>> start doing your normal Debian builds on a tmpfs and it happens
>> soon enough?
>
> Currently it's not easy to reproduce for me either.
> It happens on the debian buildd server (4-way c8000 machine) while building
> the webkit2gtk package. I think it happens at the end when sbuild
> cleans the build directories by deleting all files.
> Maybe there is a filesystem test toolkit which you could try which hammers
> the fs by deleting lots of files in parallel?

I currently can't reproduce the issue any longer.
In case it pops up again, I'll follow up here again.

Helge

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-07-30 20:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20220709090756.2384-1-hdanton@sina.com>
     [not found] ` <20220715133300.1297-1-hdanton@sina.com>
     [not found]   ` <cff76e00-3561-4069-f5c7-26d3de4da3c4@gmx.de>
     [not found]     ` <Ytd2g72cj0Aq1MBG@ZenIV>
     [not found]       ` <860792bb-1fd6-66c6-ef31-4edd181e2954@gmx.de>
     [not found]         ` <YtepmwLj//zrD3V3@ZenIV>
     [not found]           ` <20220720110032.1787-1-hdanton@sina.com>
2022-07-20 17:06             ` WARNING: CPU: 1 PID: 14735 at fs/dcache.c:365 dentry_free+0x100/0x128 Al Viro
2022-07-20 23:15               ` Sam James
2022-07-21  3:54                 ` Helge Deller
2022-07-30 20:21                   ` Helge Deller
2022-07-09  5:33 Helge Deller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).