linux-fsdevel.vger.kernel.org archive mirror
* Re: Hard and soft lockups with FIO and LTP runs on a large system
       [not found] ` <CAOUHufawNerxqLm7L9Yywp3HJFiYVrYO26ePUb1jH-qxNGWzyA@mail.gmail.com>
@ 2024-07-10 12:03   ` Bharata B Rao
  2024-07-10 12:24     ` Mateusz Guzik
  2024-07-10 18:04     ` Yu Zhao
  0 siblings, 2 replies; 6+ messages in thread
From: Bharata B Rao @ 2024-07-10 12:03 UTC (permalink / raw)
  To: Yu Zhao, mjguzik, david, kent.overstreet
  Cc: linux-mm, linux-kernel, nikunj, Upadhyay, Neeraj, Andrew Morton,
	David Hildenbrand, willy, vbabka, kinseyho, Mel Gorman,
	linux-fsdevel

On 07-Jul-24 4:12 AM, Yu Zhao wrote:
>> Some experiments tried
>> ======================
>> 1) When MGLRU was enabled, many soft lockups were observed; no hard
>> lockups were seen during a 48-hour run. Below is one such soft lockup.
<snip>
>> The preemptirqsoff trace below points to preemption being disabled for more
>> than 10s, and the lock involved is the lruvec spinlock.
> 
> Also if you could try the other patch (mglru.patch) please. It should
> help reduce unnecessary rotations from deactivate_file_folio(), which
> in turn should reduce the contention on the LRU lock for MGLRU.

Thanks. With mglru.patch on an MGLRU-enabled system, the latency trace 
record below is no longer seen for a 30-hour workload run.

> 
>>       # tracer: preemptirqsoff
>>       #
>>       # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-mglru-irqstrc
>>       # --------------------------------------------------------------------
>>       # latency: 10382682 us, #4/4, CPU#128 | (M:desktop VP:0, KP:0, SP:0
>> HP:0 #P:512)
>>       #    -----------------
>>       #    | task: fio-2701523 (uid:0 nice:0 policy:0 rt_prio:0)
>>       #    -----------------
>>       #  => started at: deactivate_file_folio
>>       #  => ended at:   deactivate_file_folio
>>       #
>>       #
>>       #                    _------=> CPU#
>>       #                   / _-----=> irqs-off/BH-disabled
>>       #                  | / _----=> need-resched
>>       #                  || / _---=> hardirq/softirq
>>       #                  ||| / _--=> preempt-depth
>>       #                  |||| / _-=> migrate-disable
>>       #                  ||||| /     delay
>>       #  cmd     pid     |||||| time  |   caller
>>       #     \   /        ||||||  \    |    /
>>            fio-2701523 128...1.    0us$: deactivate_file_folio
>> <-deactivate_file_folio
>>            fio-2701523 128.N.1. 10382681us : deactivate_file_folio
>> <-deactivate_file_folio
>>            fio-2701523 128.N.1. 10382683us : tracer_preempt_on
>> <-deactivate_file_folio
>>            fio-2701523 128.N.1. 10382691us : <stack trace>
>>        => deactivate_file_folio
>>        => mapping_try_invalidate
>>        => invalidate_mapping_pages
>>        => invalidate_bdev
>>        => blkdev_common_ioctl
>>        => blkdev_ioctl
>>        => __x64_sys_ioctl
>>        => x64_sys_call
>>        => do_syscall_64
>>        => entry_SYSCALL_64_after_hwframe

However, the contention has now shifted to inode_hash_lock. Around 55 
soft lockups in ilookup() were observed:

# tracer: preemptirqsoff
#
# preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
# --------------------------------------------------------------------
# latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0 
#P:512)
#    -----------------
#    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
#    -----------------
#  => started at: ilookup
#  => ended at:   ilookup
#
#
#                    _------=> CPU#
#                   / _-----=> irqs-off/BH-disabled
#                  | / _----=> need-resched
#                  || / _---=> hardirq/softirq
#                  ||| / _--=> preempt-depth
#                  |||| / _-=> migrate-disable
#                  ||||| /     delay
#  cmd     pid     |||||| time  |   caller
#     \   /        ||||||  \    |    /
      fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
      fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
      fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
      fio-3244715 260.N.1. 10620440us : <stack trace>
=> _raw_spin_unlock
=> ilookup
=> blkdev_get_no_open
=> blkdev_open
=> do_dentry_open
=> vfs_open
=> path_openat
=> do_filp_open
=> do_sys_openat2
=> __x64_sys_openat
=> x64_sys_call
=> do_syscall_64
=> entry_SYSCALL_64_after_hwframe
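
For context, every open of the block device funnels through this one
lookup. A trimmed-by-hand view of the path (a simplified sketch, not the
verbatim source) looks like this:

/*
 * Sketch of the contended path: blkdev_open() -> blkdev_get_no_open()
 * -> ilookup(blockdev_superblock, dev), and ilookup() serializes all
 * callers on the single global inode_hash_lock.
 */
struct inode *ilookup(struct super_block *sb, unsigned long ino)
{
        struct hlist_head *head = inode_hashtable + hash(sb, ino);
        struct inode *inode;

        spin_lock(&inode_hash_lock);            /* one lock shared by all 512 CPUs */
        inode = find_inode_fast(sb, head, ino); /* walk a single hash chain */
        spin_unlock(&inode_hash_lock);

        /* ... wait_on_inode()/retry handling elided ... */
        return inode;
}

With many CPUs opening the same device concurrently, every open serializes
on inode_hash_lock, which matches the 10s+ preempt-off window in the trace
above.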

It appears that scalability issues with inode_hash_lock have been brought 
up multiple times in the past, and there have been patches to address them.

https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/

CC'ing FS folks/list for awareness/comments.

Regards,
Bharata.


* Re: Hard and soft lockups with FIO and LTP runs on a large system
  2024-07-10 12:03   ` Hard and soft lockups with FIO and LTP runs on a large system Bharata B Rao
@ 2024-07-10 12:24     ` Mateusz Guzik
  2024-07-10 13:04       ` Mateusz Guzik
  2024-07-10 18:04     ` Yu Zhao
  1 sibling, 1 reply; 6+ messages in thread
From: Mateusz Guzik @ 2024-07-10 12:24 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Yu Zhao, david, kent.overstreet, linux-mm, linux-kernel, nikunj,
	Upadhyay, Neeraj, Andrew Morton, David Hildenbrand, willy, vbabka,
	kinseyho, Mel Gorman, linux-fsdevel

On Wed, Jul 10, 2024 at 2:04 PM Bharata B Rao <bharata@amd.com> wrote:
>
> On 07-Jul-24 4:12 AM, Yu Zhao wrote:
> >> Some experiments tried
> >> ======================
> >> 1) When MGLRU was enabled, many soft lockups were observed; no hard
> >> lockups were seen during a 48-hour run. Below is one such soft lockup.
> <snip>
> >> The preemptirqsoff trace below points to preemption being disabled for more
> >> than 10s, and the lock involved is the lruvec spinlock.
> >
> > Also if you could try the other patch (mglru.patch) please. It should
> > help reduce unnecessary rotations from deactivate_file_folio(), which
> > in turn should reduce the contention on the LRU lock for MGLRU.
>
> Thanks. With mglru.patch on an MGLRU-enabled system, the latency trace
> record below is no longer seen for a 30-hour workload run.
>
> >
> >>       # tracer: preemptirqsoff
> >>       #
> >>       # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-mglru-irqstrc
> >>       # --------------------------------------------------------------------
> >>       # latency: 10382682 us, #4/4, CPU#128 | (M:desktop VP:0, KP:0, SP:0
> >> HP:0 #P:512)
> >>       #    -----------------
> >>       #    | task: fio-2701523 (uid:0 nice:0 policy:0 rt_prio:0)
> >>       #    -----------------
> >>       #  => started at: deactivate_file_folio
> >>       #  => ended at:   deactivate_file_folio
> >>       #
> >>       #
> >>       #                    _------=> CPU#
> >>       #                   / _-----=> irqs-off/BH-disabled
> >>       #                  | / _----=> need-resched
> >>       #                  || / _---=> hardirq/softirq
> >>       #                  ||| / _--=> preempt-depth
> >>       #                  |||| / _-=> migrate-disable
> >>       #                  ||||| /     delay
> >>       #  cmd     pid     |||||| time  |   caller
> >>       #     \   /        ||||||  \    |    /
> >>            fio-2701523 128...1.    0us$: deactivate_file_folio
> >> <-deactivate_file_folio
> >>            fio-2701523 128.N.1. 10382681us : deactivate_file_folio
> >> <-deactivate_file_folio
> >>            fio-2701523 128.N.1. 10382683us : tracer_preempt_on
> >> <-deactivate_file_folio
> >>            fio-2701523 128.N.1. 10382691us : <stack trace>
> >>        => deactivate_file_folio
> >>        => mapping_try_invalidate
> >>        => invalidate_mapping_pages
> >>        => invalidate_bdev
> >>        => blkdev_common_ioctl
> >>        => blkdev_ioctl
> >>        => __x64_sys_ioctl
> >>        => x64_sys_call
> >>        => do_syscall_64
> >>        => entry_SYSCALL_64_after_hwframe
>
> However, the contention has now shifted to inode_hash_lock. Around 55
> soft lockups in ilookup() were observed:
>
> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
> # --------------------------------------------------------------------
> # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
> #P:512)
> #    -----------------
> #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: ilookup
> #  => ended at:   ilookup
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off/BH-disabled
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
>       fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
>       fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
>       fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
>       fio-3244715 260.N.1. 10620440us : <stack trace>
> => _raw_spin_unlock
> => ilookup
> => blkdev_get_no_open
> => blkdev_open
> => do_dentry_open
> => vfs_open
> => path_openat
> => do_filp_open
> => do_sys_openat2
> => __x64_sys_openat
> => x64_sys_call
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe
>
> It appears that scalability issues with inode_hash_lock have been brought
> up multiple times in the past, and there have been patches to address them.
>
> https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
> https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
>
> CC'ing FS folks/list for awareness/comments.

Note my patch does not enable RCU usage in ilookup, but this can be
trivially added.

I can't even compile-test at the moment, but the diff below should do
it. Also note that the patches are present at
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu
(not yet integrated anywhere).

That said, if fio is operating on the same target inode every
time, then this is merely going to shift contention to the inode
spinlock usage in find_inode_fast.

diff --git a/fs/inode.c b/fs/inode.c
index ad7844ca92f9..70b0e6383341 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb,
unsigned long ino)
 {
        struct hlist_head *head = inode_hashtable + hash(sb, ino);
        struct inode *inode;
+
 again:
-       spin_lock(&inode_hash_lock);
-       inode = find_inode_fast(sb, head, ino, true);
-       spin_unlock(&inode_hash_lock);
+       inode = find_inode_fast(sb, head, ino, false);
+       if (IS_ERR_OR_NULL(inode)) {
+               spin_lock(&inode_hash_lock);
+               inode = find_inode_fast(sb, head, ino, true);
+               spin_unlock(&inode_hash_lock);
+       }

        if (inode) {
                if (IS_ERR(inode))



-- 
Mateusz Guzik <mjguzik gmail.com>


* Re: Hard and soft lockups with FIO and LTP runs on a large system
  2024-07-10 12:24     ` Mateusz Guzik
@ 2024-07-10 13:04       ` Mateusz Guzik
  2024-07-15  5:22         ` Bharata B Rao
  0 siblings, 1 reply; 6+ messages in thread
From: Mateusz Guzik @ 2024-07-10 13:04 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Yu Zhao, david, kent.overstreet, linux-mm, linux-kernel, nikunj,
	Upadhyay, Neeraj, Andrew Morton, David Hildenbrand, willy, vbabka,
	kinseyho, Mel Gorman, linux-fsdevel

On Wed, Jul 10, 2024 at 2:24 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Wed, Jul 10, 2024 at 2:04 PM Bharata B Rao <bharata@amd.com> wrote:
> >
> > On 07-Jul-24 4:12 AM, Yu Zhao wrote:
> > >> Some experiments tried
> > >> ======================
> > >> 1) When MGLRU was enabled, many soft lockups were observed; no hard
> > >> lockups were seen during a 48-hour run. Below is one such soft lockup.
> > <snip>
> > >> The preemptirqsoff trace below points to preemption being disabled for more
> > >> than 10s, and the lock involved is the lruvec spinlock.
> > >
> > > Also if you could try the other patch (mglru.patch) please. It should
> > > help reduce unnecessary rotations from deactivate_file_folio(), which
> > > in turn should reduce the contention on the LRU lock for MGLRU.
> >
> > Thanks. With mglru.patch on an MGLRU-enabled system, the latency trace
> > record below is no longer seen for a 30-hour workload run.
> >
> > >
> > >>       # tracer: preemptirqsoff
> > >>       #
> > >>       # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-mglru-irqstrc
> > >>       # --------------------------------------------------------------------
> > >>       # latency: 10382682 us, #4/4, CPU#128 | (M:desktop VP:0, KP:0, SP:0
> > >> HP:0 #P:512)
> > >>       #    -----------------
> > >>       #    | task: fio-2701523 (uid:0 nice:0 policy:0 rt_prio:0)
> > >>       #    -----------------
> > >>       #  => started at: deactivate_file_folio
> > >>       #  => ended at:   deactivate_file_folio
> > >>       #
> > >>       #
> > >>       #                    _------=> CPU#
> > >>       #                   / _-----=> irqs-off/BH-disabled
> > >>       #                  | / _----=> need-resched
> > >>       #                  || / _---=> hardirq/softirq
> > >>       #                  ||| / _--=> preempt-depth
> > >>       #                  |||| / _-=> migrate-disable
> > >>       #                  ||||| /     delay
> > >>       #  cmd     pid     |||||| time  |   caller
> > >>       #     \   /        ||||||  \    |    /
> > >>            fio-2701523 128...1.    0us$: deactivate_file_folio
> > >> <-deactivate_file_folio
> > >>            fio-2701523 128.N.1. 10382681us : deactivate_file_folio
> > >> <-deactivate_file_folio
> > >>            fio-2701523 128.N.1. 10382683us : tracer_preempt_on
> > >> <-deactivate_file_folio
> > >>            fio-2701523 128.N.1. 10382691us : <stack trace>
> > >>        => deactivate_file_folio
> > >>        => mapping_try_invalidate
> > >>        => invalidate_mapping_pages
> > >>        => invalidate_bdev
> > >>        => blkdev_common_ioctl
> > >>        => blkdev_ioctl
> > >>        => __x64_sys_ioctl
> > >>        => x64_sys_call
> > >>        => do_syscall_64
> > >>        => entry_SYSCALL_64_after_hwframe
> >
> > However, the contention has now shifted to inode_hash_lock. Around 55
> > soft lockups in ilookup() were observed:
> >
> > # tracer: preemptirqsoff
> > #
> > # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
> > # --------------------------------------------------------------------
> > # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
> > #P:512)
> > #    -----------------
> > #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
> > #    -----------------
> > #  => started at: ilookup
> > #  => ended at:   ilookup
> > #
> > #
> > #                    _------=> CPU#
> > #                   / _-----=> irqs-off/BH-disabled
> > #                  | / _----=> need-resched
> > #                  || / _---=> hardirq/softirq
> > #                  ||| / _--=> preempt-depth
> > #                  |||| / _-=> migrate-disable
> > #                  ||||| /     delay
> > #  cmd     pid     |||||| time  |   caller
> > #     \   /        ||||||  \    |    /
> >       fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
> >       fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
> >       fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
> >       fio-3244715 260.N.1. 10620440us : <stack trace>
> > => _raw_spin_unlock
> > => ilookup
> > => blkdev_get_no_open
> > => blkdev_open
> > => do_dentry_open
> > => vfs_open
> > => path_openat
> > => do_filp_open
> > => do_sys_openat2
> > => __x64_sys_openat
> > => x64_sys_call
> > => do_syscall_64
> > => entry_SYSCALL_64_after_hwframe
> >
> > It appears that scalability issues with inode_hash_lock have been brought
> > up multiple times in the past, and there have been patches to address them.
> >
> > https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
> > https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
> >
> > CC'ing FS folks/list for awareness/comments.
>
> Note my patch does not enable RCU usage in ilookup, but this can be
> trivially added.
>
> I can't even compile-test at the moment, but the diff below should do
> it. Also note that the patches are present at
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu
> (not yet integrated anywhere).
>
> That said, if fio is operating on the same target inode every
> time, then this is merely going to shift contention to the inode
> spinlock usage in find_inode_fast.
>
> diff --git a/fs/inode.c b/fs/inode.c
> index ad7844ca92f9..70b0e6383341 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb,
> unsigned long ino)
>  {
>         struct hlist_head *head = inode_hashtable + hash(sb, ino);
>         struct inode *inode;
> +
>  again:
> -       spin_lock(&inode_hash_lock);
> -       inode = find_inode_fast(sb, head, ino, true);
> -       spin_unlock(&inode_hash_lock);
> +       inode = find_inode_fast(sb, head, ino, false);
> +       if (IS_ERR_OR_NULL(inode)) {
> +               spin_lock(&inode_hash_lock);
> +               inode = find_inode_fast(sb, head, ino, true);
> +               spin_unlock(&inode_hash_lock);
> +       }
>
>         if (inode) {
>                 if (IS_ERR(inode))
>

I think I expressed myself poorly, so here is take two:
1. The inode hash soft lockups should get resolved if you apply
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.inode.rcu&id=7180f8d91fcbf252de572d9ffacc945effed0060
and the fix pasted above (not compile tested, though it should be
obvious what the intended fix looks like).
2. find_inode_fast() spinlocks the target inode. If your benchmark only
operates on one inode, then contention is going to shift there and you may
still be getting soft lockups. Not taking the spinlock in this
codepath is hackable, but I don't want to do it without a good
justification.
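
To make point 2 concrete, the walk is roughly shaped like this (a
hand-written sketch with a made-up name, not the actual find_inode_fast()):

/* Sketch only: shows where the per-inode lock sits in the lookup. */
static struct inode *find_inode_fast_sketch(struct super_block *sb,
                                            struct hlist_head *head,
                                            unsigned long ino)
{
        struct inode *inode;

        hlist_for_each_entry_rcu(inode, head, i_hash) {
                if (inode->i_ino != ino || inode->i_sb != sb)
                        continue;
                spin_lock(&inode->i_lock);      /* every opener of one bdev meets here */
                if (inode->i_state & (I_FREEING | I_WILL_FREE)) {
                        spin_unlock(&inode->i_lock);
                        return ERR_PTR(-ESTALE);        /* caller falls back / retries */
                }
                __iget(inode);                  /* take a reference under i_lock */
                spin_unlock(&inode->i_lock);
                return inode;
        }
        return NULL;
}

Even with inode_hash_lock out of the way, all openers of the same bdev
inode still meet on that single inode->i_lock.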


-- 
Mateusz Guzik <mjguzik gmail.com>


* Re: Hard and soft lockups with FIO and LTP runs on a large system
  2024-07-10 12:03   ` Hard and soft lockups with FIO and LTP runs on a large system Bharata B Rao
  2024-07-10 12:24     ` Mateusz Guzik
@ 2024-07-10 18:04     ` Yu Zhao
  1 sibling, 0 replies; 6+ messages in thread
From: Yu Zhao @ 2024-07-10 18:04 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: mjguzik, david, kent.overstreet, linux-mm, linux-kernel, nikunj,
	Upadhyay, Neeraj, Andrew Morton, David Hildenbrand, willy, vbabka,
	kinseyho, Mel Gorman, linux-fsdevel

On Wed, Jul 10, 2024 at 6:04 AM Bharata B Rao <bharata@amd.com> wrote:
>
> On 07-Jul-24 4:12 AM, Yu Zhao wrote:
> >> Some experiments tried
> >> ======================
> >> 1) When MGLRU was enabled, many soft lockups were observed; no hard
> >> lockups were seen during a 48-hour run. Below is one such soft lockup.
> <snip>
> >> The preemptirqsoff trace below points to preemption being disabled for more
> >> than 10s, and the lock involved is the lruvec spinlock.
> >
> > Also if you could try the other patch (mglru.patch) please. It should
> > help reduce unnecessary rotations from deactivate_file_folio(), which
> > in turn should reduce the contention on the LRU lock for MGLRU.
>
> Thanks. With mglru.patch on an MGLRU-enabled system, the latency trace
> record below is no longer seen for a 30-hour workload run.

Glad to hear. I will post a patch and add your Reported-by/Tested-by tags.

> >
> >>       # tracer: preemptirqsoff
> >>       #
> >>       # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-mglru-irqstrc
> >>       # --------------------------------------------------------------------
> >>       # latency: 10382682 us, #4/4, CPU#128 | (M:desktop VP:0, KP:0, SP:0
> >> HP:0 #P:512)
> >>       #    -----------------
> >>       #    | task: fio-2701523 (uid:0 nice:0 policy:0 rt_prio:0)
> >>       #    -----------------
> >>       #  => started at: deactivate_file_folio
> >>       #  => ended at:   deactivate_file_folio
> >>       #
> >>       #
> >>       #                    _------=> CPU#
> >>       #                   / _-----=> irqs-off/BH-disabled
> >>       #                  | / _----=> need-resched
> >>       #                  || / _---=> hardirq/softirq
> >>       #                  ||| / _--=> preempt-depth
> >>       #                  |||| / _-=> migrate-disable
> >>       #                  ||||| /     delay
> >>       #  cmd     pid     |||||| time  |   caller
> >>       #     \   /        ||||||  \    |    /
> >>            fio-2701523 128...1.    0us$: deactivate_file_folio
> >> <-deactivate_file_folio
> >>            fio-2701523 128.N.1. 10382681us : deactivate_file_folio
> >> <-deactivate_file_folio
> >>            fio-2701523 128.N.1. 10382683us : tracer_preempt_on
> >> <-deactivate_file_folio
> >>            fio-2701523 128.N.1. 10382691us : <stack trace>
> >>        => deactivate_file_folio
> >>        => mapping_try_invalidate
> >>        => invalidate_mapping_pages
> >>        => invalidate_bdev
> >>        => blkdev_common_ioctl
> >>        => blkdev_ioctl
> >>        => __x64_sys_ioctl
> >>        => x64_sys_call
> >>        => do_syscall_64
> >>        => entry_SYSCALL_64_after_hwframe
>
> However, the contention has now shifted to inode_hash_lock. Around 55
> soft lockups in ilookup() were observed:

This one is from fs/blk, so I'll leave it to those experts.

> # tracer: preemptirqsoff
> #
> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
> # --------------------------------------------------------------------
> # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
> #P:512)
> #    -----------------
> #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: ilookup
> #  => ended at:   ilookup
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off/BH-disabled
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
>       fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
>       fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
>       fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
>       fio-3244715 260.N.1. 10620440us : <stack trace>
> => _raw_spin_unlock
> => ilookup
> => blkdev_get_no_open
> => blkdev_open
> => do_dentry_open
> => vfs_open
> => path_openat
> => do_filp_open
> => do_sys_openat2
> => __x64_sys_openat
> => x64_sys_call
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe
>
> It appears that scalability issues with inode_hash_lock have been brought
> up multiple times in the past, and there have been patches to address them.
>
> https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
> https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
>
> CC'ing FS folks/list for awareness/comments.
>
> Regards,
> Bharata.


* Re: Hard and soft lockups with FIO and LTP runs on a large system
  2024-07-10 13:04       ` Mateusz Guzik
@ 2024-07-15  5:22         ` Bharata B Rao
  2024-07-15  6:48           ` Mateusz Guzik
  0 siblings, 1 reply; 6+ messages in thread
From: Bharata B Rao @ 2024-07-15  5:22 UTC (permalink / raw)
  To: Mateusz Guzik
  Cc: Yu Zhao, david, kent.overstreet, linux-mm, linux-kernel, nikunj,
	Upadhyay, Neeraj, Andrew Morton, David Hildenbrand, willy, vbabka,
	kinseyho, Mel Gorman, linux-fsdevel

On 10-Jul-24 6:34 PM, Mateusz Guzik wrote:
>>> However, the contention has now shifted to inode_hash_lock. Around 55
>>> soft lockups in ilookup() were observed:
>>>
>>> # tracer: preemptirqsoff
>>> #
>>> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
>>> # --------------------------------------------------------------------
>>> # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
>>> #P:512)
>>> #    -----------------
>>> #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
>>> #    -----------------
>>> #  => started at: ilookup
>>> #  => ended at:   ilookup
>>> #
>>> #
>>> #                    _------=> CPU#
>>> #                   / _-----=> irqs-off/BH-disabled
>>> #                  | / _----=> need-resched
>>> #                  || / _---=> hardirq/softirq
>>> #                  ||| / _--=> preempt-depth
>>> #                  |||| / _-=> migrate-disable
>>> #                  ||||| /     delay
>>> #  cmd     pid     |||||| time  |   caller
>>> #     \   /        ||||||  \    |    /
>>>        fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
>>>        fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
>>>        fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
>>>        fio-3244715 260.N.1. 10620440us : <stack trace>
>>> => _raw_spin_unlock
>>> => ilookup
>>> => blkdev_get_no_open
>>> => blkdev_open
>>> => do_dentry_open
>>> => vfs_open
>>> => path_openat
>>> => do_filp_open
>>> => do_sys_openat2
>>> => __x64_sys_openat
>>> => x64_sys_call
>>> => do_syscall_64
>>> => entry_SYSCALL_64_after_hwframe
>>>
>>> It appears that scalability issues with inode_hash_lock have been brought
>>> up multiple times in the past, and there have been patches to address them.
>>>
>>> https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
>>> https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
>>>
>>> CC'ing FS folks/list for awareness/comments.
>>
>> Note my patch does not enable RCU usage in ilookup, but this can be
>> trivially added.
>>
>> I can't even compile-test at the moment, but the diff below should do
>> it. Also note that the patches are present at
>> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu
>> (not yet integrated anywhere).
>>
>> That said, if fio is operating on the same target inode every
>> time, then this is merely going to shift contention to the inode
>> spinlock usage in find_inode_fast.
>>
>> diff --git a/fs/inode.c b/fs/inode.c
>> index ad7844ca92f9..70b0e6383341 100644
>> --- a/fs/inode.c
>> +++ b/fs/inode.c
>> @@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb,
>> unsigned long ino)
>>   {
>>          struct hlist_head *head = inode_hashtable + hash(sb, ino);
>>          struct inode *inode;
>> +
>>   again:
>> -       spin_lock(&inode_hash_lock);
>> -       inode = find_inode_fast(sb, head, ino, true);
>> -       spin_unlock(&inode_hash_lock);
>> +       inode = find_inode_fast(sb, head, ino, false);
>> +       if (IS_ERR_OR_NULL(inode)) {
>> +               spin_lock(&inode_hash_lock);
>> +               inode = find_inode_fast(sb, head, ino, true);
>> +               spin_unlock(&inode_hash_lock);
>> +       }
>>
>>          if (inode) {
>>                  if (IS_ERR(inode))
>>
> 
> I think I expressed myself poorly, so here is take two:
> 1. The inode hash soft lockups should get resolved if you apply
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.inode.rcu&id=7180f8d91fcbf252de572d9ffacc945effed0060
> and the fix pasted above (not compile tested, though it should be
> obvious what the intended fix looks like).
> 2. find_inode_fast() spinlocks the target inode. If your benchmark only
> operates on one inode, then contention is going to shift there and you may
> still be getting soft lockups. Not taking the spinlock in this
> codepath is hackable, but I don't want to do it without a good
> justification.

Thanks, Mateusz, for the fix. With this patch applied, the above-mentioned 
contention in ilookup() was not observed during a test run over the 
weekend.

Regards,
Bharata.


* Re: Hard and soft lockups with FIO and LTP runs on a large system
  2024-07-15  5:22         ` Bharata B Rao
@ 2024-07-15  6:48           ` Mateusz Guzik
  0 siblings, 0 replies; 6+ messages in thread
From: Mateusz Guzik @ 2024-07-15  6:48 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Yu Zhao, david, kent.overstreet, linux-mm, linux-kernel, nikunj,
	Upadhyay, Neeraj, Andrew Morton, David Hildenbrand, willy, vbabka,
	kinseyho, Mel Gorman, linux-fsdevel

On Mon, Jul 15, 2024 at 7:22 AM Bharata B Rao <bharata@amd.com> wrote:
>
> On 10-Jul-24 6:34 PM, Mateusz Guzik wrote:
> >>> However, the contention has now shifted to inode_hash_lock. Around 55
> >>> soft lockups in ilookup() were observed:
> >>>
> >>> # tracer: preemptirqsoff
> >>> #
> >>> # preemptirqsoff latency trace v1.1.5 on 6.10.0-rc3-trnmglru
> >>> # --------------------------------------------------------------------
> >>> # latency: 10620430 us, #4/4, CPU#260 | (M:desktop VP:0, KP:0, SP:0 HP:0
> >>> #P:512)
> >>> #    -----------------
> >>> #    | task: fio-3244715 (uid:0 nice:0 policy:0 rt_prio:0)
> >>> #    -----------------
> >>> #  => started at: ilookup
> >>> #  => ended at:   ilookup
> >>> #
> >>> #
> >>> #                    _------=> CPU#
> >>> #                   / _-----=> irqs-off/BH-disabled
> >>> #                  | / _----=> need-resched
> >>> #                  || / _---=> hardirq/softirq
> >>> #                  ||| / _--=> preempt-depth
> >>> #                  |||| / _-=> migrate-disable
> >>> #                  ||||| /     delay
> >>> #  cmd     pid     |||||| time  |   caller
> >>> #     \   /        ||||||  \    |    /
> >>>        fio-3244715 260...1.    0us$: _raw_spin_lock <-ilookup
> >>>        fio-3244715 260.N.1. 10620429us : _raw_spin_unlock <-ilookup
> >>>        fio-3244715 260.N.1. 10620430us : tracer_preempt_on <-ilookup
> >>>        fio-3244715 260.N.1. 10620440us : <stack trace>
> >>> => _raw_spin_unlock
> >>> => ilookup
> >>> => blkdev_get_no_open
> >>> => blkdev_open
> >>> => do_dentry_open
> >>> => vfs_open
> >>> => path_openat
> >>> => do_filp_open
> >>> => do_sys_openat2
> >>> => __x64_sys_openat
> >>> => x64_sys_call
> >>> => do_syscall_64
> >>> => entry_SYSCALL_64_after_hwframe
> >>>
> >>> It appears that scalability issues with inode_hash_lock have been brought
> >>> up multiple times in the past, and there have been patches to address them.
> >>>
> >>> https://lore.kernel.org/all/20231206060629.2827226-9-david@fromorbit.com/
> >>> https://lore.kernel.org/lkml/20240611173824.535995-2-mjguzik@gmail.com/
> >>>
> >>> CC'ing FS folks/list for awareness/comments.
> >>
> >> Note my patch does not enable RCU usage in ilookup, but this can be
> >> trivially added.
> >>
> >> I can't even compile-test at the moment, but the diff below should do
> >> it. Also note that the patches are present at
> >> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.inode.rcu
> >> (not yet integrated anywhere).
> >>
> >> That said, if fio is operating on the same target inode every
> >> time, then this is merely going to shift contention to the inode
> >> spinlock usage in find_inode_fast.
> >>
> >> diff --git a/fs/inode.c b/fs/inode.c
> >> index ad7844ca92f9..70b0e6383341 100644
> >> --- a/fs/inode.c
> >> +++ b/fs/inode.c
> >> @@ -1524,10 +1524,14 @@ struct inode *ilookup(struct super_block *sb,
> >> unsigned long ino)
> >>   {
> >>          struct hlist_head *head = inode_hashtable + hash(sb, ino);
> >>          struct inode *inode;
> >> +
> >>   again:
> >> -       spin_lock(&inode_hash_lock);
> >> -       inode = find_inode_fast(sb, head, ino, true);
> >> -       spin_unlock(&inode_hash_lock);
> >> +       inode = find_inode_fast(sb, head, ino, false);
> >> +       if (IS_ERR_OR_NULL(inode)) {
> >> +               spin_lock(&inode_hash_lock);
> >> +               inode = find_inode_fast(sb, head, ino, true);
> >> +               spin_unlock(&inode_hash_lock);
> >> +       }
> >>
> >>          if (inode) {
> >>                  if (IS_ERR(inode))
> >>
> >
> > I think I expressed myself poorly, so here is take two:
> > 1. The inode hash soft lockups should get resolved if you apply
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.inode.rcu&id=7180f8d91fcbf252de572d9ffacc945effed0060
> > and the fix pasted above (not compile tested, though it should be
> > obvious what the intended fix looks like).
> > 2. find_inode_fast() spinlocks the target inode. If your benchmark only
> > operates on one inode, then contention is going to shift there and you may
> > still be getting soft lockups. Not taking the spinlock in this
> > codepath is hackable, but I don't want to do it without a good
> > justification.
>
> Thanks, Mateusz, for the fix. With this patch applied, the above-mentioned
> contention in ilookup() was not observed during a test run over the
> weekend.
>

OK, I'll do some cleanups and send a proper patch to the VFS folks later today.

Thanks for testing.

-- 
Mateusz Guzik <mjguzik gmail.com>


end of thread

Thread overview: 6+ messages
     [not found] <d2841226-e27b-4d3d-a578-63587a3aa4f3@amd.com>
     [not found] ` <CAOUHufawNerxqLm7L9Yywp3HJFiYVrYO26ePUb1jH-qxNGWzyA@mail.gmail.com>
2024-07-10 12:03   ` Hard and soft lockups with FIO and LTP runs on a large system Bharata B Rao
2024-07-10 12:24     ` Mateusz Guzik
2024-07-10 13:04       ` Mateusz Guzik
2024-07-15  5:22         ` Bharata B Rao
2024-07-15  6:48           ` Mateusz Guzik
2024-07-10 18:04     ` Yu Zhao
