6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory

public inbox for linux-kernel@vger.kernel.org

* 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
@ 2024-06-25 20:56 Mikhail Gavrilov
  2024-06-26 10:48 ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-06-25 20:56 UTC
  To: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, fdmanana, dsterba, josef

[-- Attachment #1: Type: text/plain, Size: 2612 bytes --]

Hi,
After f1d97e769152 I noticed that the kswapd0 process uses much more
CPU time and that the system behaves as if it does not have enough
memory. Very often I see kswapd0 consuming 100% CPU [1].
Before f1d97e769152, kswapd0 had accumulated ~3:51 of CPU time after
one hour of uptime and ~10:13 after three hours.
After f1d97e769152, it accumulated ~25:48 after the first hour and
71:01 after three hours.
So kswapd0's CPU time has increased by a factor of 6-7.
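As a rough cross-check of the 6-7x figure, the cumulative TIME values that
`ps aux` prints for kswapd0 (minutes:seconds) can be converted to seconds and
compared. A minimal Python sketch; the helper name ps_time_to_seconds is made
up for illustration, and the inputs are simply the numbers quoted above:

```python
def ps_time_to_seconds(t: str) -> int:
    """Convert a `ps aux` TIME value such as '25:48' (minutes:seconds) to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + int(seconds)


# Cumulative kswapd0 CPU time quoted in the report, before and after the change.
before = {"1h": "3:51", "3h": "10:13"}
after = {"1h": "25:48", "3h": "71:01"}

ratios = {
    label: ps_time_to_seconds(after[label]) / ps_time_to_seconds(before[label])
    for label in before
}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x")
```

This prints 6.7x and 7.0x, consistent with the 6-7x estimate.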

f1d97e76915285013037c487d9513ab763005286 is the first bad commit
commit f1d97e76915285013037c487d9513ab763005286 (HEAD)
Author: Filipe Manana <fdmanana@suse.com>
Date:   Fri Mar 22 18:02:59 2024 +0000

    btrfs: add a global per cpu counter to track number of used extent maps

    Add a per cpu counter that tracks the total number of extent maps that are
    in extent trees of inodes that belong to fs trees. This is going to be
    used in an upcoming change that adds a shrinker for extent maps. Only
    extent maps for fs trees are considered, because for special trees such as
    the data relocation tree we don't want to evict their extent maps which
    are critical for the relocation to work, and since those are limited, it's
    not a concern to have them in memory during the relocation of a block
    group. Another case are extent maps for free space cache inodes, which
    must always remain in memory, but those are limited (there's only one per
    free space cache inode, which means one per block group).

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>

 fs/btrfs/disk-io.c    |  9 +++++++++
 fs/btrfs/extent_map.c | 17 +++++++++++++++++
 fs/btrfs/fs.h         |  2 ++
 3 files changed, 28 insertions(+)

Unfortunately I can't test a revert of f1d97e769152 because of conflicts:

> git reset --hard v6.10-rc1
HEAD is now at 1613e604df0c Linux 6.10-rc1

> git revert -n f1d97e76915285013037c487d9513ab763005286
Auto-merging fs/btrfs/disk-io.c
Auto-merging fs/btrfs/extent_map.c
Auto-merging fs/btrfs/fs.h
CONFLICT (content): Merge conflict in fs/btrfs/fs.h
error: could not revert f1d97e769152... btrfs: add a global per cpu counter to track number of used extent maps

However, I double-checked every bisect step and I am confident that
the result is correct.

I have also attached the full kernel log and the build config.

My hardware specs: https://linux-hardware.org/?probe=d377acdb9e

Filipe, can you look into this, please?

[1] https://postimg.cc/Xrn6qfxh

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: .config.zip --]
[-- Type: application/zip, Size: 66495 bytes --]

[-- Attachment #3: dmesg.zip --]
[-- Type: application/zip, Size: 52548 bytes --]


* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-06-25 20:56 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory Mikhail Gavrilov
@ 2024-06-26 10:48 ` Filipe Manana
  2024-06-26 14:16   ` Mikhail Gavrilov
From: Filipe Manana @ 2024-06-26 10:48 UTC
  To: Mikhail Gavrilov
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Tue, Jun 25, 2024 at 10:04 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> Hi,
> after f1d97e769152 I spotted increased execution time of the kswapd0
> process and symptoms as if there is not enough memory.
> Very often I see that kswapd0 consumes 100% CPU [1].
> Before f1d97e769152 after an hour kswapd0 is working ~3:51 and after
> three hours ~10:13 time.
> After f1d97e769152 kswapd0 time increased to ~25:48 after the first
> hour and three hours it hit 71:01 time.
> So execution time has increased by 6-7 times.
>
> f1d97e76915285013037c487d9513ab763005286 is the first bad commit
> commit f1d97e76915285013037c487d9513ab763005286 (HEAD)
> Author: Filipe Manana <fdmanana@suse.com>
> Date:   Fri Mar 22 18:02:59 2024 +0000
>
>     btrfs: add a global per cpu counter to track number of used extent maps
>
>     Add a per cpu counter that tracks the total number of extent maps that are
>     in extent trees of inodes that belong to fs trees. This is going to be
>     used in an upcoming change that adds a shrinker for extent maps. Only
>     extent maps for fs trees are considered, because for special trees such as
>     the data relocation tree we don't want to evict their extent maps which
>     are critical for the relocation to work, and since those are limited, it's
>     not a concern to have them in memory during the relocation of a block
>     group. Another case are extent maps for free space cache inodes, which
>     must always remain in memory, but those are limited (there's only one per
>     free space cache inode, which means one per block group).
>
>     Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>     Signed-off-by: Filipe Manana <fdmanana@suse.com>
>     Reviewed-by: David Sterba <dsterba@suse.com>
>     Signed-off-by: David Sterba <dsterba@suse.com>
>
>  fs/btrfs/disk-io.c    |  9 +++++++++
>  fs/btrfs/extent_map.c | 17 +++++++++++++++++
>  fs/btrfs/fs.h         |  2 ++
>  3 files changed, 28 insertions(+)
>
> Unfortunately I can't check the revert commit f1d97e769152 because of conflicts.

Yes, because there are follow up commits that depend on it.

I seriously doubt that this is correctly bisected, because that commit
only adds a counter for tracking the number of extent maps.
It uses a per cpu counter, and I can't think of anything more
efficient than that.
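For readers unfamiliar with per cpu counters: each CPU updates its own local
delta and only occasionally folds it into a shared total, so an increment
normally touches no shared state and takes no lock. The kernel's generic
percpu_counter facility works along these lines. The Python model below is a
toy illustration of that batching idea, not the kernel implementation; the
class name PerCpuCounter and the batch size are hypothetical, and the model
assumes each local slot is only ever written by the CPU it belongs to.

```python
import threading


class PerCpuCounter:
    """Toy model of a batched per-CPU counter (illustrative, not kernel code)."""

    def __init__(self, ncpus: int, batch: int = 32):
        self.batch = batch              # fold threshold (illustrative value)
        self.local = [0] * ncpus        # one pending delta per CPU
        self.total = 0                  # shared, folded total
        self.lock = threading.Lock()    # protects self.total

    def add(self, cpu: int, delta: int) -> None:
        """Add delta on behalf of `cpu`; shared state is touched only now and then."""
        pending = self.local[cpu] + delta
        if abs(pending) >= self.batch:
            with self.lock:             # slow path: fold into the shared total
                self.total += pending
            self.local[cpu] = 0
        else:
            self.local[cpu] = pending   # fast path: CPU-local update only

    def read_approx(self) -> int:
        """Cheap read; differs from the exact value by less than ncpus * batch."""
        return self.total

    def read_exact(self) -> int:
        """Exact read: the shared total plus every CPU's pending delta."""
        with self.lock:
            return self.total + sum(self.local)
```

With 4 CPUs and a batch of 8, twenty increments spread evenly across the CPUs
leave read_approx() at 0 while read_exact() returns 20; the shared total is
only updated once some CPU's pending delta reaches the batch size.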

The commit that adds the extent map shrinker, which is the next commit
(956a17d9d050761e34ae6f2624e9c1ce456de204), could explain what you
are observing.

The one you bisected to doesn't make sense, not only because it's
just a counter update but also because you are only seeing the
kswapd0 slowdown, and kswapd0 is what triggers the shrinker.

The shrinker itself can be improved, there's one place where I know it
might loop too much, and I'll improve that.

Thanks.

>
> > git reset --hard v6.10-rc1
> HEAD is now at 1613e604df0c Linux 6.10-rc1
>
> > git revert -n f1d97e76915285013037c487d9513ab763005286
> Auto-merging fs/btrfs/disk-io.c
> Auto-merging fs/btrfs/extent_map.c
> Auto-merging fs/btrfs/fs.h
> CONFLICT (content): Merge conflict in fs/btrfs/fs.h
> error: could not revert f1d97e769152... btrfs: add a global per cpu
> counter to track number of used extent maps
>
> However I double checked every bisect step and I am confident in the
> correctness of the result.
>
> I also attach here a full kernel log and build config.
>
> My hardware specs: https://linux-hardware.org/?probe=d377acdb9e
>
> Filipe can you look into this please?
>
> [1] https://postimg.cc/Xrn6qfxh
>
> --
> Best Regards,
> Mike Gavrilov.


* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-06-26 10:48 ` Filipe Manana
@ 2024-06-26 14:16   ` Mikhail Gavrilov
  2024-07-01  9:30     ` Filipe Manana
From: Mikhail Gavrilov @ 2024-06-26 14:16 UTC
  To: Filipe Manana
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Wed, Jun 26, 2024 at 3:49 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Tue, Jun 25, 2024 at 10:04 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > Hi,
> > after f1d97e769152 I spotted increased execution time of the kswapd0
> > process and symptoms as if there is not enough memory.
> > Very often I see that kswapd0 consumes 100% CPU [1].
> > Before f1d97e769152 after an hour kswapd0 is working ~3:51 and after
> > three hours ~10:13 time.
> > After f1d97e769152 kswapd0 time increased to ~25:48 after the first
> > hour and three hours it hit 71:01 time.
> > So execution time has increased by 6-7 times.
> >
> > f1d97e76915285013037c487d9513ab763005286 is the first bad commit
> > commit f1d97e76915285013037c487d9513ab763005286 (HEAD)
> > Author: Filipe Manana <fdmanana@suse.com>
> > Date:   Fri Mar 22 18:02:59 2024 +0000
> >
> >     btrfs: add a global per cpu counter to track number of used extent maps
> >
> >     Add a per cpu counter that tracks the total number of extent maps that are
> >     in extent trees of inodes that belong to fs trees. This is going to be
> >     used in an upcoming change that adds a shrinker for extent maps. Only
> >     extent maps for fs trees are considered, because for special trees such as
> >     the data relocation tree we don't want to evict their extent maps which
> >     are critical for the relocation to work, and since those are limited, it's
> >     not a concern to have them in memory during the relocation of a block
> >     group. Another case are extent maps for free space cache inodes, which
> >     must always remain in memory, but those are limited (there's only one per
> >     free space cache inode, which means one per block group).
> >
> >     Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> >     Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >     Reviewed-by: David Sterba <dsterba@suse.com>
> >     Signed-off-by: David Sterba <dsterba@suse.com>
> >
> >  fs/btrfs/disk-io.c    |  9 +++++++++
> >  fs/btrfs/extent_map.c | 17 +++++++++++++++++
> >  fs/btrfs/fs.h         |  2 ++
> >  3 files changed, 28 insertions(+)
> >
> > Unfortunately I can't check the revert commit f1d97e769152 because of conflicts.
>
> Yes, because there are follow up commits that depend on it.
>
> I seriously doubt that this is correctly bisected, because that commit
> only adds a counter for tracking the number of extent maps.
> It's using a per cpu counter and I can't think of anything more
> efficient than that.
>
> The commit that adds the extent map shrinker, which is the next commit
> (956a17d9d050761e34ae6f2624e9c1ce456de204), that can
> explain what you are observing.
>
> Now the one you bisected doesn't make sense, not just because it's
> just a counter update but also because you are
> only seeing the kswapd0 slowdown, which is what triggers the shrinker.

git bisect start
# status: waiting for both good and bad commits
# good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# bad: [1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0] Linux 6.10-rc1
git bisect bad 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0

# bad: [db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad db5d28c0bfe566908719bec8e25443aabecbb802
6.9.0-01-db5d28c0bfe566908719bec8e25443aabecbb802
up  1:01
root         269 17.4  0.0      0     0 ?        R    16:00  10:36 [kswapd0]
up  2:00
root         269 34.5  0.0      0     0 ?        S    16:00  41:36 [kswapd0]
up  3:00
root         269 40.2  0.0      0     0 ?        R    16:00  72:47 [kswapd0]
BAD

# bad: [b850dc206a57ae272c639e31ac202ec0c2f46960] Merge tag 'firewire-updates-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
git bisect bad b850dc206a57ae272c639e31ac202ec0c2f46960
6.9.0-02-b850dc206a57ae272c639e31ac202ec0c2f46960
up  1:00
root         269 25.4  0.0      0     0 ?        R    19:09  15:28 [kswapd0]
up  1:18
OOM KILLER
up  2:00
root         269 40.2  0.0      0     0 ?        R    19:09  48:18 [kswapd0]
up  3:00
root         269 43.0  0.0      0     0 ?        S    19:09  77:38 [kswapd0]
up  3:59
root         269 46.4  0.0      0     0 ?        S    19:09 111:09 [kswapd0]
BAD

# good: [59729c8a76544d9d7651287a5d28c5bf7fc9fccc] Merge tag 'tag-chrome-platform-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
git bisect good 59729c8a76544d9d7651287a5d28c5bf7fc9fccc
6.9.0-03-59729c8a76544d9d7651287a5d28c5bf7fc9fccc+
up  1:00
root         269  9.3  0.0      0     0 ?        S    10:08   5:38 [kswapd0]
up  2:02
root         269  8.8  0.0      0     0 ?        S    10:08  10:49 [kswapd0]
up  3:00
root         269  8.7  0.0      0     0 ?        S    10:08  15:42 [kswapd0]
up  3:56
root         269  8.1  0.0      0     0 ?        S    10:08  19:22 [kswapd0]
up  5:00
root         269  7.7  0.0      0     0 ?        S    10:08  23:16 [kswapd0]
up  6:00
root         269  7.5  0.0      0     0 ?        S    10:08  27:12 [kswapd0]
GOOD

# good: [101b7a97143a018b38b1f7516920a7d7d23d1745] Merge tag 'acpi-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 101b7a97143a018b38b1f7516920a7d7d23d1745
6.9.0-04-101b7a97143a018b38b1f7516920a7d7d23d1745
up  1:00
root         269  8.1  0.0      0     0 ?        S    17:17   4:53 [kswapd0]
up  2:00
root         269  6.9  0.0      0     0 ?        S    17:17   8:19 [kswapd0]
up  3:19
root         269  6.9  0.0      0     0 ?        S    17:17  13:57 [kswapd0]
up  4:01
root         269  7.9  0.0      0     0 ?        S    17:17  19:08 [kswapd0]
up  5:02
root         269  8.6  0.0      0     0 ?        R    17:17  26:16 [kswapd0]
up  6:00
root         269  8.3  0.0      0     0 ?        S    17:17  29:59 [kswapd0]
GOOD

# good: [47e9bff7fc042b28eb4cf375f0cf249ab708fdfa] Merge tag 'erofs-for-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
git bisect good 47e9bff7fc042b28eb4cf375f0cf249ab708fdfa
6.9.0-05-47e9bff7fc042b28eb4cf375f0cf249ab708fdfa
up  1:00
root         269  8.0  0.0      0     0 ?        S    14:00   4:49 [kswapd0]
up  3:00
root         269  7.2  0.0      0     0 ?        S    14:00  13:00 [kswapd0]
up  4:00
root         269  7.3  0.0      0     0 ?        S    14:00  17:36 [kswapd0]
up  5:08
root         269  6.5  0.0      0     0 ?        R    14:00  20:12 [kswapd0]
up  6:00
root         269  6.1  0.0      0     0 ?        S    14:00  22:14 [kswapd0]
GOOD

# bad: [b2665fe61d8a51ef70b27e1a830635a72dcc6ad8] Merge tag 'ata-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
git bisect bad b2665fe61d8a51ef70b27e1a830635a72dcc6ad8
6.9.0-06-b2665fe61d8a51ef70b27e1a830635a72dcc6ad8+
up  1:00
root         269 23.4  0.0      0     0 ?        R    20:31  14:06 [kswapd0]
up  2:00
root         269 22.1  0.0      0     0 ?        S    20:31  26:36 [kswapd0]
up  3:00
root         269 24.6  0.0      0     0 ?        R    20:31  44:21 [kswapd0]
up  4:00
root         269 26.6  0.0      0     0 ?        S    Jun22  63:57 [kswapd0]
up  5:07
root         269 27.8  0.0      0     0 ?        S    Jun22  85:35 [kswapd0]
BAD

# bad: [aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2] btrfs: handle errors in btrfs_reloc_clone_csums properly
git bisect bad aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2
6.9.0-rc7-07-aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2
up  1:00
root         268 24.7  0.0      0     0 ?        S    Jun23  14:57 [kswapd0]
up  2:00
root         268 45.1  0.0      0     0 ?        S    Jun23  54:13 [kswapd0]
BAD

# good: [d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028] btrfs: embed data_ref and tree_ref in btrfs_delayed_ref_node
git bisect good d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028
6.9.0-rc7-08-d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028
up  1:00
root         268  6.3  0.0      0     0 ?        S    01:42   3:51 [kswapd0]
up  1:00
root         268  8.1  0.0      0     0 ?        S    10:10   4:53 [kswapd0]
up  2:02
root         268  8.3  0.0      0     0 ?        S    10:10  10:13 [kswapd0]
up  3:00
root         268  7.6  0.0      0     0 ?        S    10:10  13:46 [kswapd0]
up  4:00
root         268  9.1  0.0      0     0 ?        S    10:10  21:56 [kswapd0]
GOOD

# good: [5fa8a6baff817c1b427aa7a8bfc1482043be6d58] btrfs: pass the extent map tree's inode to try_merge_map()
git bisect good 5fa8a6baff817c1b427aa7a8bfc1482043be6d58
6.9.0-rc7-09-5fa8a6baff817c1b427aa7a8bfc1482043be6d58
up  1:10
root         268  5.8  0.0      0     0 ?        S    14:15   4:09 [kswapd0]
up  2:09
root         268  5.3  0.0      0     0 ?        S    14:15   6:52 [kswapd0]
up  3:09
root         268  4.6  0.0      0     0 ?        S    14:15   8:47 [kswapd0]
up  4:04
root         268  4.2  0.0      0     0 ?        S    14:15  10:24 [kswapd0]
up  5:00
root         268  3.8  0.0      0     0 ?        R    14:15  11:35 [kswapd0]
up  6:06
root         268  3.9  0.0      0     0 ?        S    14:15  14:24 [kswapd0]
up  7:03
root         268  3.8  0.0      0     0 ?        S    14:15  16:26 [kswapd0]
GOOD

# bad: [9a7b68d32afc4e92909c21e166ad993801236be3] btrfs: report filemap_fdata<write|wait>_range() error
git bisect bad 9a7b68d32afc4e92909c21e166ad993801236be3
6.9.0-rc7-10-9a7b68d32afc4e92909c21e166ad993801236be3
up  1:00
root         268 32.5  0.0      0     0 ?        R    21:35  19:34 [kswapd0]
up  2:00
root         268 46.1  0.0      0     0 ?        R    21:35  55:24 [kswapd0]
BAD

# bad: [85d288309ab5463140a2d00b3827262fb14e7db4] btrfs: use btrfs_get_fs_generation() at try_release_extent_mapping()
git bisect bad 85d288309ab5463140a2d00b3827262fb14e7db4
6.9.0-rc7-11-85d288309ab5463140a2d00b3827262fb14e7db4
up  1:00
root         268 38.0  0.0      0     0 ?        S    00:36  22:50 [kswapd0]
up  2:01
root         268 32.7  0.0      0     0 ?        R    00:36  39:38 [kswapd0]
up  3:00
root         268 32.1  0.0      0     0 ?        S    00:36  58:01 [kswapd0]
BAD

# bad: [65bb9fb00b7012a78b2f5d1cd042bf098900c5d3] btrfs: update comment for btrfs_set_inode_full_sync() about locking
git bisect bad 65bb9fb00b7012a78b2f5d1cd042bf098900c5d3
6.9.0-rc7-12-65bb9fb00b7012a78b2f5d1cd042bf098900c5d3
up  1:06
root         268 17.3  0.0      0     0 ?        S    10:14  11:34 [kswapd0]
up  1:22
OOM KILLER
up  1:32
OOM KILLER
up  2:01
root         268 37.2  0.0      0     0 ?        R    10:14  45:07 [kswapd0]
up  3:01
root         268 33.1  0.0      0     0 ?        S    10:14  60:12 [kswapd0]
BAD

# bad: [956a17d9d050761e34ae6f2624e9c1ce456de204] btrfs: add a shrinker for extent maps
git bisect bad 956a17d9d050761e34ae6f2624e9c1ce456de204
6.9.0-rc7-13-956a17d9d050761e34ae6f2624e9c1ce456de204
up  1:01
root         268 42.1  0.0      0     0 ?        R    13:20  25:48 [kswapd0]
up  1:30
OOM KILLER
up  2:01
root         268 40.7  0.0      0     0 ?        R    13:20  49:27 [kswapd0]
up  2:34
root         268 46.0  0.0      0     0 ?        S    13:20  71:01 [kswapd0]
BAD

# bad: [f1d97e76915285013037c487d9513ab763005286] btrfs: add a global per cpu counter to track number of used extent maps
git bisect bad f1d97e76915285013037c487d9513ab763005286
6.9.0-rc7-14-f1d97e76915285013037c487d9513ab763005286
up  1:06
root         268 15.6  0.0      0     0 ?        S    16:15  10:27 [kswapd0]
up  2:00
root         268 12.0  0.0      0     0 ?        S    16:15  14:26 [kswapd0]
up  3:00
root         268  9.8  0.0      0     0 ?        S    16:15  17:48 [kswapd0]
GOOD! But I marked it as bad.
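The misjudged last step is easier to see when each run is reduced to kswapd0's
share of wall-clock time, i.e. its cumulative TIME divided by the uptime at
which it was sampled (the %CPU column of ps is essentially the same ratio,
since kswapd0 starts at boot). In the log, the one-hour samples are ambiguous:
the f1d97e769152 run shows 15.6% while the BAD db5d28c0bfe5 run shows 17.4%.
By the last sample, every GOOD run is below 10% and every BAD run is above 25%.
A minimal Python sketch that classifies a run from its last sample, assuming
the "up H:MM" lines and the ps aux column layout seen above; the 15% threshold
and the function names are hypothetical choices for illustration:

```python
def uptime_seconds(up_line: str) -> int:
    """Parse an 'up  H:MM' line from the log above into seconds."""
    hours, minutes = up_line.split()[1].split(":")
    return int(hours) * 3600 + int(minutes) * 60


def cpu_seconds(ps_line: str) -> int:
    """Parse the TIME column (minutes:seconds) of a `ps aux` line into seconds."""
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    minutes, seconds = ps_line.split()[9].split(":")
    return int(minutes) * 60 + int(seconds)


def verdict(up_line: str, ps_line: str, threshold: float = 0.15) -> str:
    """Classify a bisect step from its last (longest-uptime) sample."""
    share = cpu_seconds(ps_line) / uptime_seconds(up_line)
    return "BAD" if share > threshold else "GOOD"


# Last samples of the final two bisect steps, copied from the log above.
print(verdict("up  2:34", "root 268 46.0 0.0 0 0 ? S 13:20 71:01 [kswapd0]"))  # 956a17d9d050
print(verdict("up  3:00", "root 268  9.8 0.0 0 0 ? S 16:15 17:48 [kswapd0]"))  # f1d97e769152
```

This prints BAD for 956a17d9d050 (about 46% of wall-clock time) and GOOD for
f1d97e769152 (about 10%).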


Yes, my bad: I made a mistake in the last step.

The actual first bad commit is 956a17d9d050761e34ae6f2624e9c1ce456de204:
Author: Filipe Manana <fdmanana@suse.com>
Date:   Mon Apr 15 17:09:26 2024 +0100

    btrfs: add a shrinker for extent maps

    Extent maps are used either to represent existing file extent items, or to
    represent new extents that are going to be written and the respective file
    extent items are created when the ordered extent completes.

    We currently don't have any limit for how many extent maps we can have,
    neither per inode nor globally. Most of the time this is not too noticeable
    because extent maps are removed in the following situations:

    1) When evicting an inode;

    2) When releasing folios (pages) through the btrfs_release_folio() address
       space operation callback.

       However we won't release extent maps in the folio range if the folio is
       either dirty or under writeback or if the inode's i_size is less than
       or equal to 16M (see try_release_extent_mapping()). This 16M i_size
       constraint was added back in 2008 with commit 70dec8079d78 ("Btrfs:
       extent_io and extent_state optimizations"), but there's no explanation
       about why we have it or why the 16M value.

    This means that for buffered IO we can reach an OOM situation due to too
    many extent maps if either of the following happens:

    1) There's a set of tasks constantly doing IO on many files with a size
       not larger than 16M, specially if they keep the files open for very
       long periods, therefore preventing inode eviction.

       This requires a really high number of such files, and having many non
       mergeable extent maps (due to random 4K writes for example) and a
       machine with very little memory;

    2) There's a set of tasks constantly doing random write IO (therefore
       creating many non mergeable extent maps) on files and keeping them
       open for long periods of time, so inode eviction doesn't happen and
       there's always a lot of dirty pages or pages under writeback,
       preventing btrfs_release_folio() from releasing the respective extent
       maps.

    This second case was actually reported in the thread pointed by the Link
    tag below, and it requires a very large file under heavy IO and a machine
    with very little amount of RAM, which is probably hard to happen in
    practice in a real world use case.

    However when using direct IO this is not so hard to happen, because the
    page cache is not used, and therefore btrfs_release_folio() is never
    called. Which means extent maps are dropped only when evicting the inode,
    and that means that if we have tasks that keep a file descriptor open and
    keep doing IO on a very large file (or files), we can exhaust memory due
    to an unbounded amount of extent maps. This is especially easy to happen
    if we have a huge file with millions of small extents and their extent
    maps are not mergeable (non contiguous offsets and disk locations).
    This was reported in that thread with the following fio test:

       $ cat test.sh
       #!/bin/bash

       DEV=/dev/sdj
       MNT=/mnt/sdj
       MOUNT_OPTIONS="-o ssd"
       MKFS_OPTIONS=""

       cat <<EOF > /tmp/fio-job.ini
       [global]
       name=fio-rand-write
       filename=$MNT/fio-rand-write
       rw=randwrite
       bs=4K
       direct=1
       numjobs=16
       fallocate=none
       time_based
       runtime=90000

       [file1]
       size=300G
       ioengine=libaio
       iodepth=16

       EOF

       umount $MNT &> /dev/null
       mkfs.btrfs -f $MKFS_OPTIONS $DEV
       mount $MOUNT_OPTIONS $DEV $MNT

       fio /tmp/fio-job.ini
       umount $MNT

    Monitoring the btrfs_extent_map slab while running the test with:

       $ watch -d -n 1 'cat /sys/kernel/slab/btrfs_extent_map/objects \
                            /sys/kernel/slab/btrfs_extent_map/total_objects'

    Shows the number of active and total extent maps skyrocketing to tens of
    millions, and on systems with a short amount of memory it's easy and quick
    to get into an OOM situation, as reported in that thread.

    So to avoid this issue add a shrinker that will remove extent maps, as
    long as they are not pinned, and takes proper care with any concurrent
    fsync to avoid missing extents (setting the full sync flag while in the
    middle of a fast fsync). This shrinker is triggered through the callbacks
    nr_cached_objects and free_cached_objects of struct super_operations.

    The shrinker will iterate over all roots and over all inodes of each
    root, and keeps track of the last scanned root and inode, so that the
    next time it runs, it starts from that root and from the next inode.
    This is similar to what xfs does for its inode reclaim (implements those
    callbacks, and cycles through inodes by starting from where it ended
    last time).

    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Filipe Manana <fdmanana@suse.com>
    Reviewed-by: David Sterba <dsterba@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>

 fs/btrfs/extent_map.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/extent_map.h |   1 +
 fs/btrfs/fs.h         |   2 ++
 fs/btrfs/super.c      |  17 +++++++++++++++++
 4 files changed, 180 insertions(+)
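The commit message says the shrinker skips pinned extent maps and resumes from
the last scanned root and inode instead of restarting from the beginning on
every invocation. The Python model below sketches that resumable round-robin
scan under simplifying assumptions: roots and inodes are plain integer keys,
each extent map is reduced to a "pinned" flag, and the names Cursor and shrink
are hypothetical. It is not the btrfs implementation, which is reached through
the nr_cached_objects and free_cached_objects callbacks of struct
super_operations and must also coordinate with concurrent fsync.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    """Where the previous scan stopped (hypothetical model, not btrfs code)."""
    root: int = 0
    ino: int = 0  # lowest inode number to look at in `root`


def shrink(roots: dict[int, dict[int, list[bool]]], cur: Cursor, nr_to_scan: int) -> int:
    """Examine up to nr_to_scan extent maps, dropping those that are not pinned.

    `roots` maps root id -> inode number -> one 'pinned' flag per extent map.
    Returns the number of extent maps dropped and advances `cur`.
    """
    scanned = dropped = 0
    for root_id in sorted(r for r in roots if r >= cur.root):
        min_ino = cur.ino if root_id == cur.root else 0
        for ino in sorted(n for n in roots[root_id] if n >= min_ino):
            maps = roots[root_id][ino]
            i = 0
            while i < len(maps) and scanned < nr_to_scan:
                scanned += 1
                if maps[i]:
                    i += 1            # pinned: must stay in memory
                else:
                    maps.pop(i)       # not pinned: evict it
                    dropped += 1
            if scanned >= nr_to_scan:
                # Budget used up: the next call resumes at the next inode.
                cur.root, cur.ino = root_id, ino + 1
                return dropped
    # Walked past the last inode of the last root: start over next time.
    cur.root, cur.ino = 0, 0
    return dropped
```

In this model the budget counts examined maps, so pinned maps still consume
it, and an inode whose maps were only partly examined is skipped on the next
call; its remaining maps are revisited when the scan wraps around.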

> The shrinker itself can be improved, there's one place where I know it
> might loop too much, and I'll improve that.

Oh, great!
Can I test this patch when it is ready?

-- 
Best Regards,
Mike Gavrilov.


* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-06-26 14:16   ` Mikhail Gavrilov
@ 2024-07-01  9:30     ` Filipe Manana
  2024-07-02 14:13       ` Mikhail Gavrilov
From: Filipe Manana @ 2024-07-01  9:30 UTC
  To: Mikhail Gavrilov
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Wed, Jun 26, 2024 at 3:17 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Wed, Jun 26, 2024 at 3:49 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Tue, Jun 25, 2024 at 10:04 PM Mikhail Gavrilov
> > <mikhail.v.gavrilov@gmail.com> wrote:
> > >
> > > Hi,
> > > after f1d97e769152 I spotted increased execution time of the kswapd0
> > > process and symptoms as if there is not enough memory.
> > > Very often I see that kswapd0 consumes 100% CPU [1].
> > > Before f1d97e769152 after an hour kswapd0 is working ~3:51 and after
> > > three hours ~10:13 time.
> > > After f1d97e769152 kswapd0 time increased to ~25:48 after the first
> > > hour and three hours it hit 71:01 time.
> > > So execution time has increased by 6-7 times.
> > >
> > > f1d97e76915285013037c487d9513ab763005286 is the first bad commit
> > > commit f1d97e76915285013037c487d9513ab763005286 (HEAD)
> > > Author: Filipe Manana <fdmanana@suse.com>
> > > Date:   Fri Mar 22 18:02:59 2024 +0000
> > >
> > >     btrfs: add a global per cpu counter to track number of used extent maps
> > >
> > >     Add a per cpu counter that tracks the total number of extent maps that are
> > >     in extent trees of inodes that belong to fs trees. This is going to be
> > >     used in an upcoming change that adds a shrinker for extent maps. Only
> > >     extent maps for fs trees are considered, because for special trees such as
> > >     the data relocation tree we don't want to evict their extent maps which
> > >     are critical for the relocation to work, and since those are limited, it's
> > >     not a concern to have them in memory during the relocation of a block
> > >     group. Another case are extent maps for free space cache inodes, which
> > >     must always remain in memory, but those are limited (there's only one per
> > >     free space cache inode, which means one per block group).
> > >
> > >     Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> > >     Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > >     Reviewed-by: David Sterba <dsterba@suse.com>
> > >     Signed-off-by: David Sterba <dsterba@suse.com>
> > >
> > >  fs/btrfs/disk-io.c    |  9 +++++++++
> > >  fs/btrfs/extent_map.c | 17 +++++++++++++++++
> > >  fs/btrfs/fs.h         |  2 ++
> > >  3 files changed, 28 insertions(+)
> > >
> > > Unfortunately I can't check the revert commit f1d97e769152 because of conflicts.
> >
> > Yes, because there are follow up commits that depend on it.
> >
> > I seriously doubt that this is correctly bisected, because that commit
> > only adds a counter for tracking the number of extent maps.
> > It's using a per cpu counter and I can't think of anything more
> > efficient than that.
> >
> > The commit that adds the extent map shrinker, which is the next commit
> > (956a17d9d050761e34ae6f2624e9c1ce456de204), that can
> > explain what you are observing.
> >
> > Now the one you bisected doesn't make sense, not just because it's
> > just a counter update but also because you are
> > only seeing the kswapd0 slowdown, which is what triggers the shrinker.
>
> git bisect start
> # status: waiting for both good and bad commits
> # good: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
> git bisect good a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
> # bad: [1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0] Linux 6.10-rc1
> git bisect bad 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
>
> # bad: [db5d28c0bfe566908719bec8e25443aabecbb802] Merge tag
> 'drm-next-2024-05-15' of https://gitlab.freedesktop.org/drm/kernel
> git bisect bad db5d28c0bfe566908719bec8e25443aabecbb802
> 6.9.0-01-db5d28c0bfe566908719bec8e25443aabecbb802
> up  1:01
> root         269 17.4  0.0      0     0 ?        R    16:00  10:36 [kswapd0]
> up  2:00
> root         269 34.5  0.0      0     0 ?        S    16:00  41:36 [kswapd0]
> up  3:00
> root         269 40.2  0.0      0     0 ?        R    16:00  72:47 [kswapd0]
> BAD
>
> # bad: [b850dc206a57ae272c639e31ac202ec0c2f46960] Merge tag
> 'firewire-updates-6.10' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394
> git bisect bad b850dc206a57ae272c639e31ac202ec0c2f46960
> 6.9.0-02-b850dc206a57ae272c639e31ac202ec0c2f46960
> up  1:00
> root         269 25.4  0.0      0     0 ?        R    19:09  15:28 [kswapd0]
> up  1:18
> OOM KILLER
> up  2:00
> root         269 40.2  0.0      0     0 ?        R    19:09  48:18 [kswapd0]
> up  3:00
> root         269 43.0  0.0      0     0 ?        S    19:09  77:38 [kswapd0]
> up  3:59
> root         269 46.4  0.0      0     0 ?        S    19:09 111:09 [kswapd0]
> BAD
>
> # good: [59729c8a76544d9d7651287a5d28c5bf7fc9fccc] Merge tag
> 'tag-chrome-platform-for-v6.10' of
> git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
> git bisect good 59729c8a76544d9d7651287a5d28c5bf7fc9fccc
> 6.9.0-03-59729c8a76544d9d7651287a5d28c5bf7fc9fccc+
> up  1:00
> root         269  9.3  0.0      0     0 ?        S    10:08   5:38 [kswapd0]
> up  2:02
> root         269  8.8  0.0      0     0 ?        S    10:08  10:49 [kswapd0]
> up  3:00
> root         269  8.7  0.0      0     0 ?        S    10:08  15:42 [kswapd0]
> up  3:56
> root         269  8.1  0.0      0     0 ?        S    10:08  19:22 [kswapd0]
> up  5:00
> root         269  7.7  0.0      0     0 ?        S    10:08  23:16 [kswapd0]
> up  6:00
> root         269  7.5  0.0      0     0 ?        S    10:08  27:12 [kswapd0]
> GOOD
>
> # good: [101b7a97143a018b38b1f7516920a7d7d23d1745] Merge tag
> 'acpi-6.10-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
> git bisect good 101b7a97143a018b38b1f7516920a7d7d23d1745
> 6.9.0-04-101b7a97143a018b38b1f7516920a7d7d23d1745
> up  1:00
> root         269  8.1  0.0      0     0 ?        S    17:17   4:53 [kswapd0]
> up  2:00
> root         269  6.9  0.0      0     0 ?        S    17:17   8:19 [kswapd0]
> up  3:19
> root         269  6.9  0.0      0     0 ?        S    17:17  13:57 [kswapd0]
> up  4:01
> root         269  7.9  0.0      0     0 ?        S    17:17  19:08 [kswapd0]
> up  5:02
> root         269  8.6  0.0      0     0 ?        R    17:17  26:16 [kswapd0]
> up  6:00
> root         269  8.3  0.0      0     0 ?        S    17:17  29:59 [kswapd0]
> GOOD
>
> # good: [47e9bff7fc042b28eb4cf375f0cf249ab708fdfa] Merge tag
> 'erofs-for-6.10-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
> git bisect good 47e9bff7fc042b28eb4cf375f0cf249ab708fdfa
> 6.9.0-05-47e9bff7fc042b28eb4cf375f0cf249ab708fdfa
> up  1:00
> root         269  8.0  0.0      0     0 ?        S    14:00   4:49 [kswapd0]
> up  3:00
> root         269  7.2  0.0      0     0 ?        S    14:00  13:00 [kswapd0]
> up  4:00
> root         269  7.3  0.0      0     0 ?        S    14:00  17:36 [kswapd0]
> up  5:08
> root         269  6.5  0.0      0     0 ?        R    14:00  20:12 [kswapd0]
> up  6:00
> root         269  6.1  0.0      0     0 ?        S    14:00  22:14 [kswapd0]
> GOOD
>
> # bad: [b2665fe61d8a51ef70b27e1a830635a72dcc6ad8] Merge tag
> 'ata-6.10-rc1' of
> git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux
> git bisect bad b2665fe61d8a51ef70b27e1a830635a72dcc6ad8
> 6.9.0-06-b2665fe61d8a51ef70b27e1a830635a72dcc6ad8+
> up  1:00
> root         269 23.4  0.0      0     0 ?        R    20:31  14:06 [kswapd0]
> up  2:00
> root         269 22.1  0.0      0     0 ?        S    20:31  26:36 [kswapd0]
> up  3:00
> root         269 24.6  0.0      0     0 ?        R    20:31  44:21 [kswapd0]
> up  4:00
> root         269 26.6  0.0      0     0 ?        S    Jun22  63:57 [kswapd0]
> up  5:07
> root         269 27.8  0.0      0     0 ?        S    Jun22  85:35 [kswapd0]
> BAD
>
> # bad: [aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2] btrfs: handle errors
> in btrfs_reloc_clone_csums properly
> git bisect bad aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2
> 6.9.0-rc7-07-aa5ccf29173acfaa8aa2fdd1421aa6aca1a50cf2
> up  1:00
> root         268 24.7  0.0      0     0 ?        S    Jun23  14:57 [kswapd0]
> up  2:00
> root         268 45.1  0.0      0     0 ?        S    Jun23  54:13 [kswapd0]
> BAD
>
> # good: [d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028] btrfs: embed
> data_ref and tree_ref in btrfs_delayed_ref_node
> git bisect good d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028
> 6.9.0-rc7-08-d3fbb00f5e21c6dfaa6e820a21df0c9a3455a028
> up  1:00
> root         268  6.3  0.0      0     0 ?        S    01:42   3:51 [kswapd0]
> up  1:00
> root         268  8.1  0.0      0     0 ?        S    10:10   4:53 [kswapd0]
> up  2:02
> root         268  8.3  0.0      0     0 ?        S    10:10  10:13 [kswapd0]
> up  3:00
> root         268  7.6  0.0      0     0 ?        S    10:10  13:46 [kswapd0]
> up  4:00
> root         268  9.1  0.0      0     0 ?        S    10:10  21:56 [kswapd0]
> GOOD
>
> # good: [5fa8a6baff817c1b427aa7a8bfc1482043be6d58] btrfs: pass the
> extent map tree's inode to try_merge_map()
> git bisect good 5fa8a6baff817c1b427aa7a8bfc1482043be6d58
> 6.9.0-rc7-09-5fa8a6baff817c1b427aa7a8bfc1482043be6d58
> up  1:10
> root         268  5.8  0.0      0     0 ?        S    14:15   4:09 [kswapd0]
> up  2:09
> root         268  5.3  0.0      0     0 ?        S    14:15   6:52 [kswapd0]
> up  3:09
> root         268  4.6  0.0      0     0 ?        S    14:15   8:47 [kswapd0]
> up  4:04
> root         268  4.2  0.0      0     0 ?        S    14:15  10:24 [kswapd0]
> up  5:00
> root         268  3.8  0.0      0     0 ?        R    14:15  11:35 [kswapd0]
> up  6:06
> root         268  3.9  0.0      0     0 ?        S    14:15  14:24 [kswapd0]
> up  7:03
> root         268  3.8  0.0      0     0 ?        S    14:15  16:26 [kswapd0]
> GOOD
>
> # bad: [9a7b68d32afc4e92909c21e166ad993801236be3] btrfs: report
> filemap_fdata<write|wait>_range() error
> git bisect bad 9a7b68d32afc4e92909c21e166ad993801236be3
> 6.9.0-rc7-10-9a7b68d32afc4e92909c21e166ad993801236be3
> up  1:00
> root         268 32.5  0.0      0     0 ?        R    21:35  19:34 [kswapd0]
> up  2:00
> root         268 46.1  0.0      0     0 ?        R    21:35  55:24 [kswapd0]
> BAD
>
> # bad: [85d288309ab5463140a2d00b3827262fb14e7db4] btrfs: use
> btrfs_get_fs_generation() at try_release_extent_mapping()
> git bisect bad 85d288309ab5463140a2d00b3827262fb14e7db4
> 6.9.0-rc7-11-85d288309ab5463140a2d00b3827262fb14e7db4
> up  1:00
> root         268 38.0  0.0      0     0 ?        S    00:36  22:50 [kswapd0]
> up  2:01
> root         268 32.7  0.0      0     0 ?        R    00:36  39:38 [kswapd0]
> up  3:00
> root         268 32.1  0.0      0     0 ?        S    00:36  58:01 [kswapd0]
> BAD
>
> # bad: [65bb9fb00b7012a78b2f5d1cd042bf098900c5d3] btrfs: update
> comment for btrfs_set_inode_full_sync() about locking
> git bisect bad 65bb9fb00b7012a78b2f5d1cd042bf098900c5d3
> 6.9.0-rc7-12-65bb9fb00b7012a78b2f5d1cd042bf098900c5d3
> up  1:06
> root         268 17.3  0.0      0     0 ?        S    10:14  11:34 [kswapd0]
> up  1:22
> OOM KILLER
> up  1:32
> OOM KILLER
> up  2:01
> root         268 37.2  0.0      0     0 ?        R    10:14  45:07 [kswapd0]
> up  3:01
> root         268 33.1  0.0      0     0 ?        S    10:14  60:12 [kswapd0]
> BAD
>
> # bad: [956a17d9d050761e34ae6f2624e9c1ce456de204] btrfs: add a
> shrinker for extent maps
> git bisect bad 956a17d9d050761e34ae6f2624e9c1ce456de204
> 6.9.0-rc7-13-956a17d9d050761e34ae6f2624e9c1ce456de204
> up  1:01
> root         268 42.1  0.0      0     0 ?        R    13:20  25:48 [kswapd0]
> up  1:30
> OOM KILLER
> up  2:01
> root         268 40.7  0.0      0     0 ?        R    13:20  49:27 [kswapd0]
> up  2:34
> root         268 46.0  0.0      0     0 ?        S    13:20  71:01 [kswapd0]
> BAD
>
> # bad: [f1d97e76915285013037c487d9513ab763005286] btrfs: add a global
> per cpu counter to track number of used extent maps
> git bisect bad f1d97e76915285013037c487d9513ab763005286
> 6.9.0-rc7-14-f1d97e76915285013037c487d9513ab763005286
> up  1:06
> root         268 15.6  0.0      0     0 ?        S    16:15  10:27 [kswapd0]
> up  2:00
> root         268 12.0  0.0      0     0 ?        S    16:15  14:26 [kswapd0]
> up  3:00
> root         268  9.8  0.0      0     0 ?        S    16:15  17:48 [kswapd0]
> GOOD!!! But I answered - bad.
>
>
> Yeah my bad, I made a mistake on the last step.
>
> The correct first bad commit is 956a17d9d050761e34ae6f2624e9c1ce456de204
> Author: Filipe Manana <fdmanana@suse.com>
> Date:   Mon Apr 15 17:09:26 2024 +0100
>
>     btrfs: add a shrinker for extent maps
>
>     Extent maps are used either to represent existing file extent items, or to
>     represent new extents that are going to be written and the respective file
>     extent items are created when the ordered extent completes.
>
>     We currently don't have any limit for how many extent maps we can have,
>     neither per inode nor globally. Most of the time this is not too noticeable
>     because extent maps are removed in the following situations:
>
>     1) When evicting an inode;
>
>     2) When releasing folios (pages) through the btrfs_release_folio() address
>        space operation callback.
>
>        However we won't release extent maps in the folio range if the folio is
>        either dirty or under writeback or if the inode's i_size is less than
>        or equal to 16M (see try_release_extent_mapping()). This 16M i_size
>        constraint was added back in 2008 with commit 70dec8079d78 ("Btrfs:
>        extent_io and extent_state optimizations"), but there's no explanation
>        about why we have it or why the 16M value.
>
>     This means that for buffered IO we can reach an OOM situation due to too
>     many extent maps if either of the following happens:
>
>     1) There's a set of tasks constantly doing IO on many files with a size
>        not larger than 16M, especially if they keep the files open for very
>        long periods, therefore preventing inode eviction.
>
>        This requires a really high number of such files, and having many non
>        mergeable extent maps (due to random 4K writes for example) and a
>        machine with very little memory;
>
>     2) There's a set of tasks constantly doing random write IO (therefore
>        creating many non mergeable extent maps) on files and keeping them
>        open for long periods of time, so inode eviction doesn't happen and
>        there's always a lot of dirty pages or pages under writeback,
>        preventing btrfs_release_folio() from releasing the respective extent
>        maps.
>
>     This second case was actually reported in the thread pointed to by the Link
>     tag below, and it requires a very large file under heavy IO and a machine
>     with a very small amount of RAM, which is probably unlikely to happen in
>     practice in a real-world use case.
>
>     However, when using direct IO this is much more likely to happen, because
>     the page cache is not used, and therefore btrfs_release_folio() is never
>     called. This means extent maps are dropped only when evicting the inode,
>     and that if we have tasks that keep a file descriptor open and keep doing
>     IO on a very large file (or files), we can exhaust memory due to an
>     unbounded number of extent maps. This is especially likely to happen if
>     we have a huge file with millions of small extents and their extent
>     maps are not mergeable (non-contiguous offsets and disk locations).
>     This was reported in that thread with the following fio test:
>
>        $ cat test.sh
>        #!/bin/bash
>
>        DEV=/dev/sdj
>        MNT=/mnt/sdj
>        MOUNT_OPTIONS="-o ssd"
>        MKFS_OPTIONS=""
>
>        cat <<EOF > /tmp/fio-job.ini
>        [global]
>        name=fio-rand-write
>        filename=$MNT/fio-rand-write
>        rw=randwrite
>        bs=4K
>        direct=1
>        numjobs=16
>        fallocate=none
>        time_based
>        runtime=90000
>
>        [file1]
>        size=300G
>        ioengine=libaio
>        iodepth=16
>
>        EOF
>
>        umount $MNT &> /dev/null
>        mkfs.btrfs -f $MKFS_OPTIONS $DEV
>        mount $MOUNT_OPTIONS $DEV $MNT
>
>        fio /tmp/fio-job.ini
>        umount $MNT
>
>     Monitoring the btrfs_extent_map slab while running the test with:
>
>        $ watch -d -n 1 'cat /sys/kernel/slab/btrfs_extent_map/objects \
>                             /sys/kernel/slab/btrfs_extent_map/total_objects'
>
>     shows the number of active and total extent maps skyrocketing to tens of
>     millions, and on systems with a small amount of memory it's easy and quick
>     to get into an OOM situation, as reported in that thread.
>
>     So to avoid this issue, add a shrinker that will remove extent maps, as
>     long as they are not pinned, and takes proper care with any concurrent
>     fsync to avoid missing extents (setting the full sync flag while in the
>     middle of a fast fsync). This shrinker is triggered through the callbacks
>     nr_cached_objects and free_cached_objects of struct super_operations.
>
>     The shrinker will iterate over all roots and over all inodes of each
>     root, and keeps track of the last scanned root and inode, so that the
>     next time it runs, it starts from that root and from the next inode.
>     This is similar to what xfs does for its inode reclaim (implements those
>     callbacks, and cycles through inodes by starting from where it ended
>     last time).
>
>     Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>     Signed-off-by: Filipe Manana <fdmanana@suse.com>
>     Reviewed-by: David Sterba <dsterba@suse.com>
>     Signed-off-by: David Sterba <dsterba@suse.com>
>
>  fs/btrfs/extent_map.c | 160 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/btrfs/extent_map.h |   1 +
>  fs/btrfs/fs.h         |   2 ++
>  fs/btrfs/super.c      |  17 +++++++++++++++++
>  4 files changed, 180 insertions(+)
>
> > The shrinker itself can be improved, there's one place where I know it
> > might loop too much, and I'll improve that.
>
> Oh, great!
> Can I test this patch when it is ready?
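The last verdict in the bisect log above was entered by mistake (f1d97e769152 was marked bad although its kswapd0 figures look like the good runs). Rather than restarting the bisection, git can export the session with `git bisect log`, and `git bisect replay` can re-run an edited copy. A minimal sketch, assuming the mistaken verdict is the last `git bisect good|bad` line in the log; `drop_last_verdict` is a hypothetical helper, not a git command.

```shell
#!/bin/sh
# Hypothetical helper: print a bisect log with its last
# "git bisect good|bad" verdict removed. Comment lines (starting
# with "#") are kept; `git bisect replay` ignores them.
drop_last_verdict() {
    # First pass over the file records the line number of the last
    # verdict; second pass prints every line except that one.
    awk 'NR == FNR { if ($0 ~ /^git bisect (good|bad)( |$)/) last = FNR; next }
         FNR != last' "$1" "$1"
}

# Intended use inside the kernel tree (not run here):
#   git bisect log > bisect.log
#   drop_last_verdict bisect.log > bisect.fixed
#   git bisect reset
#   git bisect replay bisect.fixed
#   git bisect good    # the verdict that should have been given
```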

Try this:

https://lore.kernel.org/linux-btrfs/cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com/

That applies only to the "for-next", it will need conflict resolution
for 6.10-rc, as noted in the comments.
For a version that cleanly applies to 6.10-rc:

https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch

Btw, besides the longer kswapd execution times, what else do you observe?
Is it impacting performance of any applications?

I think no matter what we do, it's likely that kswapd will take more
time than before, because now there's extra work of going through
extent maps and dropping them.
We had to do it to prevent OOM situations because extent map creation
was unbounded.
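To check whether the extra kswapd0 work actually goes into dropping extent maps, the `btrfs_extent_map` slab counters watched in the quoted commit message can be logged next to kswapd0's CPU time. A minimal sketch, assuming a SLUB kernel that exposes `/sys/kernel/slab/btrfs_extent_map/` (reading it may require root, and if the cache is merged with other caches the counts are shared); the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the total from a SLUB sysfs counter such as
# /sys/kernel/slab/btrfs_extent_map/objects, whose content looks like
# "12345 N0=12345" (per-node counts follow the total on NUMA systems).
slab_count() {
    local total rest
    read -r total rest < "$1"
    echo "$total"
}

# Hypothetical sampler (runs until interrupted): one line per interval
# with a timestamp, extent map object counts and kswapd0's CPU time.
sample_em() {
    local dir="${1:-/sys/kernel/slab/btrfs_extent_map}"
    local interval="${2:-60}"
    while true; do
        printf '%s objects=%s total=%s kswapd0=%s\n' \
            "$(date +%T)" \
            "$(slab_count "$dir/objects")" \
            "$(slab_count "$dir/total_objects")" \
            "$(ps -o bsdtime= -C kswapd0 | tr -d ' ')"
        sleep "$interval"
    done
}
```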

Thanks.

>
> --
> Best Regards,
> Mike Gavrilov.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-01  9:30     ` Filipe Manana
@ 2024-07-02 14:13       ` Mikhail Gavrilov
  2024-07-02 17:22         ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-07-02 14:13 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Mon, Jul 1, 2024 at 2:31 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> Try this:
>
> https://lore.kernel.org/linux-btrfs/cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com/
>
> That applies only to the "for-next", it will need conflict resolution
> for 6.10-rc, as noted in the comments.
> For a version that cleanly applies to 6.10-rc:
>
> https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch

I tested this patch on top of v6.10-rc6

> Btw, besides the longer kswapd execution times, what else do you observe?
> Is it impacting performance of any applications?

I observe that the system freezes under load.
Demonstration: https://youtu.be/1-gUrnEi2aU
The GNOME shell stops responding, and even the clock in the GNOME
status bar stops updating seconds.
And this didn't happen when the v6.9 kernel was running. Second, I
spotted high CPU usage by process kswapd0 when freezes occurred.
Therefore, I decided to find the commit that led to high CPU
consumption by the kswapd0 process.
As we found out, this commit turned out to be 956a17d9d050.

> I think no matter what we do, it's likely that kswapd will take more
> time than before, because now there's extra work of going through
> extent maps and dropping them.
> We had to do it to prevent OOM situations because extent map creation
> was unbounded.

Unfortunately, the patch didn't improve anything.
kswapd0 still consumes 100% CPU under load.
And my system continues to freeze.

6.10.0-0.rc6.51.fc41.x86_64+debug with patch
up  1:00
root         269 13.1  0.0      0     0 ?        S    12:24   7:53 [kswapd0]
up  2:00
root         269 29.9  0.0      0     0 ?        R    12:24  36:02 [kswapd0]
up  3:00
root         269 37.8  0.0      0     0 ?        S    12:24  68:19 [kswapd0]
up  4:05
root         269 39.3  0.0      0     0 ?        R    12:24  96:40 [kswapd0]
up  5:01
root         269 38.8  0.0      0     0 ?        R    12:24 117:00 [kswapd0]
up  6:00
root         269 40.3  0.0      0     0 ?        S    12:24 145:24 [kswapd0]
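The TIME column in these `ps aux` samples is accumulated CPU time as minutes:seconds, so dividing it by the uptime gives kswapd0's average share of one CPU: 145:24 after 6:00 of uptime is about 40%, while one of the good kernels in the bisect log shows 27:12 after 6:00, about 7.6%. A small sketch of that arithmetic, assuming `up H:MM` means hours:minutes (uptimes longer than a day are not handled); the function names are hypothetical.

```shell
#!/bin/sh
# "MMM:SS" (the ps aux TIME column) -> seconds.
cputime_to_sec() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# "H:MM" (as printed after "up" by uptime) -> seconds.
uptime_to_sec() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 }'
}

# Average percentage of one CPU used since boot, one decimal place.
cpu_share() {
    awk -v c="$(cputime_to_sec "$1")" -v u="$(uptime_to_sec "$2")" \
        'BEGIN { printf "%.1f\n", 100 * c / u }'
}

# Live variant (not run here): compares kswapd0's CPU time with the
# seconds in /proc/uptime instead of a hand-copied uptime.
kswapd0_share_now() {
    local t up rest
    t=$(ps -o bsdtime= -C kswapd0 | tr -d ' ')
    read -r up rest < /proc/uptime
    awk -v c="$(cputime_to_sec "$t")" -v u="$up" \
        'BEGIN { printf "%.1f\n", 100 * c / u }'
}
```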

-- 
Best Regards,
Mike Gavrilov.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-02 14:13       ` Mikhail Gavrilov
@ 2024-07-02 17:22         ` Filipe Manana
  2024-07-02 19:46           ` Chris Murphy
  2024-07-03 10:31           ` Filipe Manana
  0 siblings, 2 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-02 17:22 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Tue, Jul 2, 2024 at 3:13 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Mon, Jul 1, 2024 at 2:31 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > Try this:
> >
> > https://lore.kernel.org/linux-btrfs/cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com/
> >
> > That applies only to the "for-next", it will need conflict resolution
> > for 6.10-rc, as noted in the comments.
> > For a version that cleanly applies to 6.10-rc:
> >
> > https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
>
> I tested this patch on top of v6.10-rc6
>
> > Btw, besides the longer kswapd execution times, what else do you observe?
> > Is it impacting performance of any applications?
>
> I observe that the system freezes under load.
> Demonstration: https://youtu.be/1-gUrnEi2aU
> The GNOME shell stops responding, and even the clock in the GNOME
> status bar stops updating seconds.
> And this didn't happen when the v6.9 kernel was running. Second, I
> spotted high CPU usage by process kswapd0 when freezes occurred.
> Therefore, I decided to find the commit that led to high CPU
> consumption by the kswapd0 process.
> As we found out, this commit turned out to be 956a17d9d050.
>
> > I think no matter what we do, it's likely that kswapd will take more
> > time than before, because now there's extra work of going through
> > extent maps and dropping them.
> > We had to do it to prevent OOM situations because extent map creation
> > was unbounded.
>
> Unfortunately, the patch didn't improve anything.
> kswapd0 still consumes 100% CPU under load.
> And my system continues to freeze.

Ok, the concerning part is the freezing and high cpu usage.

So besides that patch, try 2 other patches on top of it:

1) https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/aaf4c00fd40aaee0ee2788cd9fdfe2f083328c39/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
    (this is the patch you tried before)

2) https://gist.githubusercontent.com/fdmanana/f2275050f04d1830adb811745bfd99d4/raw/1001d8154133d862e305959ee9eedebf55941669/gistfile1.txt

3) https://gist.githubusercontent.com/fdmanana/0a71b9e0fe71f38f67a50b7b53d520e6/raw/680cab70d2ef32337583bee6a4fb6519241b2faa/0003-btrfs-prevent-extent-map-shrinker-from-monopolizing-.patch

Apply those patches on top of 6.10-rc in that order and let me know how it goes.
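Applying an ordered series like this can be scripted so it stops at the first patch that does not apply, which makes conflicts against a particular 6.10-rc tag obvious. A minimal sketch, assuming the patches were already downloaded (for example with `curl -fsSL -o 01.patch <raw URL>`) into files named in application order; `apply_in_order` is hypothetical, and it uses `git apply` rather than `git am` because a raw gist file is not necessarily in mailbox format.

```shell
#!/bin/sh
# Hypothetical helper: apply patch files to the current git work tree
# in the order given, checking each one first and stopping at the
# first patch that does not apply cleanly.
apply_in_order() {
    local p
    for p in "$@"; do
        if ! git apply --check "$p"; then
            echo "does not apply: $p" >&2
            return 1
        fi
        git apply "$p" || return 1
        echo "applied: $p"
    done
}

# Intended use in a kernel tree at a 6.10-rc tag (not run here):
#   apply_in_order 01.patch 02.patch 03.patch
```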
Thanks.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-02 17:22         ` Filipe Manana
@ 2024-07-02 19:46           ` Chris Murphy
  2024-07-03 10:32             ` Filipe Manana
  2024-07-03 10:31           ` Filipe Manana
  1 sibling, 1 reply; 56+ messages in thread
From: Chris Murphy @ 2024-07-02 19:46 UTC (permalink / raw)
  To: Filipe Manana,
	Михаил Гаврилов
  Cc: linux-kernel, Linux regressions mailing list, Btrfs BTRFS,
	David Sterba, Josef Bacik

On Tue, Jul 2, 2024, at 1:22 PM, Filipe Manana wrote:
> On Tue, Jul 2, 2024 at 3:13 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:

>> Unfortunately, the patch didn't improve anything.
>> kswapd0 still consumes 100% CPU under load.
>> And my system continues to freeze.
>
> Ok, the concerning part is the freezing and high cpu usage.

We're seeing this in Fedora Rawhide, which always uses the most recent mainline kernel.

A user first reported on June 25 that their backups were taking much longer: normally about 5 minutes, now 1+ hours, with frequent freezes of the DE. They noticed kswapd using 100% CPU, after which other processes also started hanging at 100% CPU. The only way out was a power cycle and reverting to the 6.9 series.

The workload is described as "restic via ssh to a repo on a backup server".

I can try to get more info to narrow down the last known good and first known bad kernels if that's useful.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-02 19:46           ` Chris Murphy
@ 2024-07-03 10:32             ` Filipe Manana
  0 siblings, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-03 10:32 UTC (permalink / raw)
  To: Chris Murphy
  Cc: Михаил Гаврилов,
	linux-kernel, Linux regressions mailing list, Btrfs BTRFS,
	David Sterba, Josef Bacik

On Tue, Jul 2, 2024 at 8:46 PM Chris Murphy <lists@colorremedies.com> wrote:
>
>
>
> On Tue, Jul 2, 2024, at 1:22 PM, Filipe Manana wrote:
> > On Tue, Jul 2, 2024 at 3:13 PM Mikhail Gavrilov
> > <mikhail.v.gavrilov@gmail.com> wrote:
>
> >> Unfortunately, the patch didn't improve anything.
> >> kswapd0 still consumes 100% CPU under load.
> >> And my system continues to freeze.
> >
> > Ok, the concerning part is the freezing and high cpu usage.
>
> We're seeing this in Fedora Rawhide, which always uses the most recent mainline kernel.
>
> A user first reported on June 25 that their backups were taking much longer: normally about 5 minutes, now 1+ hours, with frequent freezes of the DE. They noticed kswapd using 100% CPU, after which other processes also started hanging at 100% CPU. The only way out was a power cycle and reverting to the 6.9 series.
>
> The workload is described as "restic via ssh to a repo on a backup server".

Any idea how many files there are, their sizes, the sum of their sizes, etc.?

>
> I can try to get more info to narrow down the last known good and first known bad kernels if that's useful.

Isn't that what Mikhail did? He bisected it to a specific commit.

Thanks.
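Numbers like the ones asked about here (file count, sizes, total) can be gathered in one pass over the backed-up tree. A minimal sketch, assuming GNU find for `-printf`; the 16 MiB bucket is there only because the commit message quoted earlier in the thread says try_release_extent_mapping() does not release extent maps for inodes with an i_size of 16M or less. `file_size_stats` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count regular files under a directory (without
# crossing into other filesystems), sum their sizes, and count how many
# are 16 MiB or smaller. Requires GNU find for -printf.
file_size_stats() {
    find "$1" -xdev -type f -printf '%s\n' |
        awk -v limit=$((16 * 1024 * 1024)) '
            { n++; sum += $1; if ($1 <= limit) small++ }
            END { printf "files=%d bytes=%.0f small=%d\n", n, sum, small }'
}

# Example (not run here): file_size_stats /home
```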

>
>
> --
> Chris Murphy

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-02 17:22         ` Filipe Manana
  2024-07-02 19:46           ` Chris Murphy
@ 2024-07-03 10:31           ` Filipe Manana
  2024-07-03 10:44             ` Filipe Manana
  1 sibling, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-03 10:31 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Tue, Jul 2, 2024 at 6:22 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Tue, Jul 2, 2024 at 3:13 PM Mikhail Gavrilov
> <mikhail.v.gavrilov@gmail.com> wrote:
> >
> > On Mon, Jul 1, 2024 at 2:31 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > >
> > > Try this:
> > >
> > > https://lore.kernel.org/linux-btrfs/cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com/
> > >
> > > That applies only to the "for-next", it will need conflict resolution
> > > for 6.10-rc, as noted in the comments.
> > > For a version that cleanly applies to 6.10-rc:
> > >
> > > https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
> >
> > I tested this patch on top of v6.10-rc6
> >
> > > Btw, besides the longer kswapd execution times, what else do you observe?
> > > Is it impacting performance of any applications?
> >
> > I observe that the system freezes under load.
> > Demonstration: https://youtu.be/1-gUrnEi2aU
> > The GNOME shell stops responding, and even the clock in the GNOME
> > status bar stops updating seconds.
> > And this didn't happen when the v6.9 kernel was running. Second, I
> > spotted high CPU usage by process kswapd0 when freezes occurred.
> > Therefore, I decided to find the commit that led to high CPU
> > consumption by the kswapd0 process.
> > As we found out, this commit turned out to be 956a17d9d050.
> >
> > > I think no matter what we do, it's likely that kswapd will take more
> > > time than before, because now there's extra work of going through
> > > extent maps and dropping them.
> > > We had to do it to prevent OOM situations because extent map creation
> > > was unbounded.
> >
> > Unfortunately, the patch didn't improve anything.
> > kswapd0 still consumes 100% CPU under load.
> > And my system continues to freeze.
>
> Ok, the concerning part is the freezing and high cpu usage.
>
> So besides that patch, try 2 other patches on top of it:
>
> 1) https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/aaf4c00fd40aaee0ee2788cd9fdfe2f083328c39/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
>     (this is the patch you tried before)
>
> 2) https://gist.githubusercontent.com/fdmanana/f2275050f04d1830adb811745bfd99d4/raw/1001d8154133d862e305959ee9eedebf55941669/gistfile1.txt
>
> 3) https://gist.githubusercontent.com/fdmanana/0a71b9e0fe71f38f67a50b7b53d520e6/raw/680cab70d2ef32337583bee6a4fb6519241b2faa/0003-btrfs-prevent-extent-map-shrinker-from-monopolizing-.patch
>
> Apply those patches on top of 6.10-rc in that order and let me know how it goes.

Also, a 4th one:

https://gist.githubusercontent.com/fdmanana/638d90142e4db7cd462121d812075de7/raw/acb90d92c1cab512414e0bd5461640c9015da4ec/0004-btrfs-use-delayed-iput-during-extent-map-shrinking.patch

This one should apply in any order. Try all those 4 together please.
Thanks!


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-03 10:31           ` Filipe Manana
@ 2024-07-03 10:44             ` Filipe Manana
  2024-07-03 21:07               ` Andrea Gelmini
  0 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-03 10:44 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Linux List Kernel Mailing, Linux regressions mailing list,
	Btrfs BTRFS, dsterba, josef

On Wed, Jul 3, 2024 at 11:31 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Tue, Jul 2, 2024 at 6:22 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > On Tue, Jul 2, 2024 at 3:13 PM Mikhail Gavrilov
> > <mikhail.v.gavrilov@gmail.com> wrote:
> > >
> > > On Mon, Jul 1, 2024 at 2:31 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > > >
> > > > Try this:
> > > >
> > > > https://lore.kernel.org/linux-btrfs/cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com/
> > > >
> > > > That applies only to the "for-next", it will need conflict resolution
> > > > for 6.10-rc, as noted in the comments.
> > > > For a version that cleanly applies to 6.10-rc:
> > > >
> > > > https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
> > >
> > > I tested this patch on top of v6.10-rc6
> > >
> > > > Btw, besides the longer kswapd execution times, what else do you observe?
> > > > Is it impacting performance of any applications?
> > >
> > > I observe that the system freezes under load.
> > > Demonstration: https://youtu.be/1-gUrnEi2aU
> > > The GNOME shell stops responding, and even the clock in the GNOME
> > > status bar stops updating seconds.
> > > And this didn't happen when the v6.9 kernel was running. Second, I
> > > spotted high CPU usage by process kswapd0 when freezes occurred.
> > > Therefore, I decided to find the commit that led to high CPU
> > > consumption by the kswapd0 process.
> > > As we found out, this commit turned out to be 956a17d9d050.
> > >
> > > > I think no matter what we do, it's likely that kswapd will take more
> > > > time than before, because now there's extra work of going through
> > > > extent maps and dropping them.
> > > > We had to do it to prevent OOM situations because extent map creation
> > > > was unbounded.
> > >
> > > Unfortunately, the patch didn't improve anything.
> > > kswapd0 still consumes 100% CPU under load.
> > > And my system continues to freeze.
> >
> > Ok, the concerning part is the freezing and high cpu usage.
> >
> > So besides that patch, try 2 other patches on top of it:
> >
> > 1) https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/aaf4c00fd40aaee0ee2788cd9fdfe2f083328c39/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch
> >     (this is the patch you tried before)
> >
> > 2) https://gist.githubusercontent.com/fdmanana/f2275050f04d1830adb811745bfd99d4/raw/1001d8154133d862e305959ee9eedebf55941669/gistfile1.txt
> >
> > 3) https://gist.githubusercontent.com/fdmanana/0a71b9e0fe71f38f67a50b7b53d520e6/raw/680cab70d2ef32337583bee6a4fb6519241b2faa/0003-btrfs-prevent-extent-map-shrinker-from-monopolizing-.patch
> >
> > Apply those patches on top of 6.10-rc in that order and let me know how it goes.
>
> Also, a 4th one:
>
> https://gist.githubusercontent.com/fdmanana/638d90142e4db7cd462121d812075de7/raw/acb90d92c1cab512414e0bd5461640c9015da4ec/0004-btrfs-use-delayed-iput-during-extent-map-shrinking.patch
>
> This one should apply in any order. Try all those 4 together please.
> Thanks!

I'm collecting all the patches in this branch:

https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=em_shrinker_6.10

They apply cleanly to 6.10-rc.
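The branch can also be fetched into an existing kernel clone instead of applying the patches one by one. A minimal sketch; it assumes the clone URL is the cgit address above without the `/log/?h=...` suffix and that a local tag such as `v6.10-rc6` is the right base for listing the series. `fetch_series` is a hypothetical helper, and `git fetch` refuses to update the branch that is currently checked out.

```shell
#!/bin/sh
# Hypothetical helper: fetch a remote branch into a local branch of the
# same name, then list the commits it adds on top of a base revision.
fetch_series() {
    local url="$1" branch="$2" base="$3"
    git fetch -q "$url" "$branch:$branch" || return 1
    git log --oneline "$base..$branch"
}

# Intended use in a kernel clone (not run here):
#   fetch_series https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git \
#       em_shrinker_6.10 v6.10-rc6
```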


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-03 10:44             ` Filipe Manana
@ 2024-07-03 21:07               ` Andrea Gelmini
  2024-07-04  9:48                 ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-03 21:07 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Wed, Jul 3, 2024 at 1:59 PM Filipe Manana
<fdmanana@kernel.org> wrote:
>
> I'm collecting all the patches in this branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=em_shrinker_6.10
>
> They apply cleanly to 6.10-rc.

Yep, as I wrote before, same problem here.
I tried the branch on top of today's Linus git (master), and nothing changed.
But, good news, I can provide a few more details.

So, no need to use restic. On my laptop (nvme + ssd, 32GB RAM, Lenovo T480):
a) boot up;
b) just open Window Maker and two Konsoles, one with htop (with a few
tweaks to view PSI and so on);
c) in one terminal run: tar cp /home/ | pv > /dev/null
d) wait less than a minute, and I see "PSI full memory" rise above
50, memory pressure on swap, and two CPU threads (out of
eight) busy at 100%;
e) the system gets sluggish (in htop I see no process eating CPU);
f) if I kill tar, PSI memory keeps going up and down, and so do the busy threads.
After many minutes, everything goes back to no activity. During those
minutes iotop shows no activity on either the ssd or the nvme. Until
the end, the system is unresponsive, well, really slow.
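For reference, htop's "PSI full memory" meter presumably reads the `full` line of /proc/pressure/memory, whose format is `some|full avg10=X avg60=Y avg300=Z total=N`. A minimal sketch for logging that value next to the tar run without htop; the function name `psi_full_avg10` is hypothetical:

```sh
# Print the "full" avg10 value (percent of the last 10 s in which all
# non-idle tasks were stalled) from a PSI file.
# Default: /proc/pressure/memory. Expected two-line format:
#   some avg10=X avg60=Y avg300=Z total=N
#   full avg10=X avg60=Y avg300=Z total=N
psi_full_avg10() {
    file=${1:-/proc/pressure/memory}
    awk '$1 == "full" {
        for (i = 2; i <= NF; i++) {
            split($i, kv, "=")
            if (kv[1] == "avg10") print kv[2]
        }
    }' "$file"
}

# Example: sample every 3 seconds while tar runs in another terminal.
# while sleep 3; do printf '%s %s\n' "$(date +%T)" "$(psi_full_avg10)"; done
```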

My / is BTRFS. Not many years of aging. Usually with daily snapshots
and forced compression.

Fewer than 4,000,000 files on the system, mostly .git and source code.

root@glen:/home/gelma# btrfs filesystem usage /
Overall:
   Device size:                   3.54TiB
   Device allocated:              2.14TiB
   Device unallocated:            1.40TiB
   Device missing:                  0.00B
   Device slack:                    0.00B
   Used:                          2.03TiB
   Free (estimated):              1.50TiB      (min: 1.50TiB)
   Free (statfs, df):             1.50TiB
   Data ratio:                       1.00
   Metadata ratio:                   1.00
   Global reserve:              512.00MiB      (used: 0.00B)
   Multiple profiles:                  no

Data,single: Size:2.12TiB, Used:2.02TiB (95.09%)
  /dev/mapper/sda6_crypt          2.12TiB

Metadata,single: Size:16.00GiB, Used:14.73GiB (92.04%)
  /dev/mapper/sda6_crypt         16.00GiB

System,single: Size:32.00MiB, Used:320.00KiB (0.98%)
  /dev/mapper/sda6_crypt         32.00MiB

Unallocated:
  /dev/mapper/sda6_crypt          1.40TiB

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-03 21:07               ` Andrea Gelmini
@ 2024-07-04  9:48                 ` Filipe Manana
  2024-07-04  9:56                   ` Filipe Manana
  2024-07-04 11:18                   ` Andrea Gelmini
  0 siblings, 2 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-04  9:48 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Wed, Jul 3, 2024 at 10:07 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Wed, Jul 3, 2024 at 1:59 PM Filipe Manana
> <fdmanana@kernel.org> wrote:
> >
> > I'm collecting all the patches in this branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=em_shrinker_6.10
> >
> > They apply cleanly to 6.10-rc.
>
> Yep, as I wrote before, same problem here.
> I tried the branch on top of today's Linus git (master), and nothing changed.
> But, good news, I can provide a few more details.
>
> So, no need to use restic. On my laptop (nvme + ssd, 32GB RAM, Lenovo T480):
> a) boot up;
> b) just open Window Maker and two Konsoles, one with htop (with a few
> tweaks to view PSI and so on);
> c) in one terminal run: tar cp /home/ | pv > /dev/null
> d) wait less than a minute, and I see "PSI full memory" rise above
> 50, memory pressure on swap, and two CPU threads (out of
> eight) busy at 100%;

I'll try that soon and see if I can reproduce.

In the meanwhile, just curious: are you using swapfiles on btrfs?

Thanks.

> e) the system gets sluggish (in htop I see no process eating CPU);
> f) if I kill tar, PSI memory keeps going up and down, and so do the busy threads.
> After many minutes, everything goes back to no activity. During those
> minutes iotop shows no activity on either the ssd or the nvme. Until
> the end, the system is unresponsive, well, really slow.
>
> My / is BTRFS. Not many years of aging. Usually with daily snapshots
> and forced compression.
>
> Fewer than 4,000,000 files on the system, mostly .git and source code.
>
> root@glen:/home/gelma# btrfs filesystem usage /
> Overall:
>    Device size:                   3.54TiB
>    Device allocated:              2.14TiB
>    Device unallocated:            1.40TiB
>    Device missing:                  0.00B
>    Device slack:                    0.00B
>    Used:                          2.03TiB
>    Free (estimated):              1.50TiB      (min: 1.50TiB)
>    Free (statfs, df):             1.50TiB
>    Data ratio:                       1.00
>    Metadata ratio:                   1.00
>    Global reserve:              512.00MiB      (used: 0.00B)
>    Multiple profiles:                  no
>
> Data,single: Size:2.12TiB, Used:2.02TiB (95.09%)
>   /dev/mapper/sda6_crypt          2.12TiB
>
> Metadata,single: Size:16.00GiB, Used:14.73GiB (92.04%)
>   /dev/mapper/sda6_crypt         16.00GiB
>
> System,single: Size:32.00MiB, Used:320.00KiB (0.98%)
>   /dev/mapper/sda6_crypt         32.00MiB
>
> Unallocated:
>   /dev/mapper/sda6_crypt          1.40TiB

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04  9:48                 ` Filipe Manana
@ 2024-07-04  9:56                   ` Filipe Manana
  2024-07-04 10:50                     ` Mikhail Gavrilov
  2024-07-04 13:33                     ` Andrea Gelmini
  2024-07-04 11:18                   ` Andrea Gelmini
  1 sibling, 2 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-04  9:56 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 10:48 AM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Wed, Jul 3, 2024 at 10:07 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
> >
> > On Wed, Jul 3, 2024 at 1:59 PM Filipe Manana
> > <fdmanana@kernel.org> wrote:
> > >
> > > I'm collecting all the patches in this branch:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=em_shrinker_6.10
> > >
> > > They apply cleanly to 6.10-rc.
> >
> > Yep, as I wrote before, same problem here.
> > I tried the branch on top of today's Linus git (master), and nothing changed.
> > But, good news, I can provide a few more details.
> >
> > So, no need to use restic. On my laptop (nvme + ssd, 32GB RAM, Lenovo T480):
> > a) boot up;
> > b) just open Window Maker and two Konsoles, one with htop (with a few
> > tweaks to view PSI and so on);
> > c) in one terminal run: tar cp /home/ | pv > /dev/null
> > d) wait less than a minute, and I see "PSI full memory" rise above
> > 50, memory pressure on swap, and two CPU threads (out of
> > eight) busy at 100%;
>
> I'll try that soon and see if I can reproduce.
>
> In the meanwhile, just curious: are you using swapfiles on btrfs?

I wonder if you have bpftrace installed and can run the following
script while doing the test:

$ cat bpftrace-em-shrinker.sh
#!/usr/bin/bpftrace

// Log every entry into the btrfs extent map shrinker and remember
// when this thread entered it.
tracepoint:btrfs:btrfs_extent_map_shrinker_scan_enter
{
    time("%H:%M:%S ");
    @start_em_scan[tid] = nsecs;
    printf("%s enter shrinker scan %ld nr %ld root %llu ino %llu\n",
           comm, args->nr_to_scan, args->nr, args->last_root_id, args->last_ino);
}

// On exit, log what was dropped and how long the scan took (in us).
tracepoint:btrfs:btrfs_extent_map_shrinker_scan_exit
/@start_em_scan[tid]/
{
    time("%H:%M:%S ");
    $dur = (nsecs - @start_em_scan[tid]) / 1000;
    delete(@start_em_scan[tid]);
    printf("%s exit shrinker drop %ld nr %ld root %llu ino %llu | %llu us\n",
           comm, args->nr_dropped, args->nr, args->last_root_id,
           args->last_ino, $dur);
}

// Don't dump the leftover start times when the script is stopped.
END
{
    clear(@start_em_scan);
}

Then run it like:

$ ./bpftrace-em-shrinker.sh 2>&1 | tee em_shrinker_log.txt

And provide the log file.
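To get a quick overview of such a log, here is a small awk sketch (the function name `summarize_em_log` is hypothetical). It assumes the exact line layout produced by the printf calls above (time, comm, "enter"/"exit", ...) and command names without spaces. It reports how many times each task entered the shrinker, how many entries had a negative nr_to_scan, and the longest scan. Non-matching lines such as bpftrace's "Attaching ... probes" header are ignored.

```sh
# Summarize a log written by bpftrace-em-shrinker.sh.
# enter: TIME COMM enter shrinker scan NR_TO_SCAN nr NR root ROOT ino INO
# exit:  TIME COMM exit shrinker drop DROPPED nr NR root ROOT ino INO | DUR us
summarize_em_log() {
    awk '
    $3 == "enter" && $4 == "shrinker" {
        enters[$2]++                      # per-task entry count
        if ($6 + 0 < 0) negative++        # nr_to_scan should never be < 0
    }
    $3 == "exit" && $4 == "shrinker" {
        dur = $14 + 0                     # scan duration in microseconds
        if (dur > max_us) { max_us = dur; max_comm = $2 }
    }
    END {
        for (c in enters) printf "%s entered=%d\n", c, enters[c]
        printf "negative nr_to_scan: %d\n", negative + 0
        printf "longest scan: %d us (%s)\n", max_us + 0, max_comm
    }' "${1:-em_shrinker_log.txt}"
}

# summarize_em_log em_shrinker_log.txt
```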

Thanks.

>
> Thanks.
>
> > e) the system gets sluggish (in htop I see no process eating CPU);
> > f) if I kill tar, PSI memory keeps going up and down, and so do the busy threads.
> > After many minutes, everything goes back to no activity. During those
> > minutes iotop shows no activity on either the ssd or the nvme. Until
> > the end, the system is unresponsive, well, really slow.
> >
> > My / is BTRFS. Not many years of aging. Usually with daily snapshots
> > and forced compression.
> >
> > Fewer than 4,000,000 files on the system, mostly .git and source code.
> >
> > root@glen:/home/gelma# btrfs filesystem usage /
> > Overall:
> >    Device size:                   3.54TiB
> >    Device allocated:              2.14TiB
> >    Device unallocated:            1.40TiB
> >    Device missing:                  0.00B
> >    Device slack:                    0.00B
> >    Used:                          2.03TiB
> >    Free (estimated):              1.50TiB      (min: 1.50TiB)
> >    Free (statfs, df):             1.50TiB
> >    Data ratio:                       1.00
> >    Metadata ratio:                   1.00
> >    Global reserve:              512.00MiB      (used: 0.00B)
> >    Multiple profiles:                  no
> >
> > Data,single: Size:2.12TiB, Used:2.02TiB (95.09%)
> >   /dev/mapper/sda6_crypt          2.12TiB
> >
> > Metadata,single: Size:16.00GiB, Used:14.73GiB (92.04%)
> >   /dev/mapper/sda6_crypt         16.00GiB
> >
> > System,single: Size:32.00MiB, Used:320.00KiB (0.98%)
> >   /dev/mapper/sda6_crypt         32.00MiB
> >
> > Unallocated:
> >   /dev/mapper/sda6_crypt          1.40TiB

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04  9:56                   ` Filipe Manana
@ 2024-07-04 10:50                     ` Mikhail Gavrilov
  2024-07-04 13:33                     ` Andrea Gelmini
  1 sibling, 0 replies; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-07-04 10:50 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

[-- Attachment #1: Type: text/plain, Size: 1695 bytes --]

On Thu, Jul 4, 2024 at 2:57 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> I wonder if you have bpftrace installed and can run the following
> script while doing the test:
>
> $ cat bpftrace-em-shrinker.sh
> #!/usr/bin/bpftrace
>
> tracepoint:btrfs:btrfs_extent_map_shrinker_scan_enter
> {
> time("%H:%M:%S ");
> @start_em_scan[tid] = nsecs;
> printf("%s enter shrinker scan %ld nr %ld root %llu ino %llu\n",
>        comm, args->nr_to_scan, args->nr, args->last_root_id, args->last_ino);
> }
>
> tracepoint:btrfs:btrfs_extent_map_shrinker_scan_exit
> /@start_em_scan[tid]/
> {
> time("%H:%M:%S ");
> $dur = (nsecs - @start_em_scan[tid]) / 1000;
> delete(@start_em_scan[tid]);
> printf("%s exit shrinker drop %ld nr %ld root %llu ino %llu | %llu us\n",
>        comm, args->nr_dropped, args->nr, args->last_root_id,
> args->last_ino, $dur);
> }
>
> END
> {
> clear(@start_em_scan);
> }
>
> Then run it like:
>
> $ ./bpftrace-em-shrinker.sh 2>&1 | tee em_shrinker_log.txt
>
> And provide the log file.

I applied all four patches and still don't see any improvement:
6.10.0-0.rc6.53.fc41.x86_64+debug with patches 1-4
up  1:02
root         269 27.9  0.0      0     0 ?        R    10:00  17:29 [kswapd0]
up  2:00
root         269 43.7  0.0      0     0 ?        R    10:00  52:47 [kswapd0]
up  3:00
root         269 40.5  0.0      0     0 ?        R    10:00  73:22 [kswapd0]
up  4:00
root         269 43.9  0.0      0     0 ?        R    10:00 105:30 [kswapd0]
up  5:00
root         269 48.7  0.0      0     0 ?        R    10:00 146:22 [kswapd0]
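As a rough cross-check of these numbers: the ps %CPU column is cumulative CPU time divided by the time the process has been running, not an instantaneous load. Because kswapd0 starts at boot, uptime is a close proxy for its elapsed time. A small sketch (the function name `cpu_pct` is hypothetical) that turns the TIME column (MMM:SS) and the uptime (H:MM) into that lifetime-average percentage:

```sh
# Lifetime-average CPU % from ps TIME (MMM:SS) and uptime (H:MM).
cpu_pct() {
    awk -v t="$1" -v u="$2" 'BEGIN {
        split(t, a, ":"); cpu_s = a[1] * 60 + a[2]        # CPU seconds used
        split(u, b, ":"); up_s = b[1] * 3600 + b[2] * 60  # seconds since boot
        printf "%.1f\n", 100 * cpu_s / up_s
    }'
}

cpu_pct 17:29 1:02    # prints 28.2 (ps showed 27.9)
cpu_pct 146:22 5:00   # prints 48.8 (ps showed 48.7)
```

Since this is an average since boot, the climb from about 28% to about 49% means kswapd0 took a growing share of CPU as the session went on. Between the 1:02 and 5:00 samples it used about 7733 s of CPU in about 14280 s, i.e. roughly 54%.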

Also, I attached em_shrinker_log.txt as a zip archive.

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: em_shrinker_log.zip --]
[-- Type: application/zip, Size: 1623 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04  9:56                   ` Filipe Manana
  2024-07-04 10:50                     ` Mikhail Gavrilov
@ 2024-07-04 13:33                     ` Andrea Gelmini
  2024-07-04 13:47                       ` Andrea Gelmini
  1 sibling, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-04 13:33 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 11:57 AM Filipe Manana
<fdmanana@kernel.org> wrote:
> I wonder if you have bpftrace installed and can run the following
> script while doing the test:

Yep, no problem. It worked; I mean, I recorded it to an external USB stick... But...
On my laptop I have kernel 6.6.36 and rc6+branch...
Uhm... Yesterday I switched back to 6.6.36 after writing the email.
Well, not exactly: I usually work with 6.6.36 (because of the sluggishness
issue weeks ago, I hadn't moved off it), and yesterday I rebooted into
rc6 to test the branch patches.
Anyway, after writing you the report while watching the "live action", I
rebooted into 6.6.36.

Anyway, I tried again just now with bpftrace and I can't replicate it.
Exactly the same kernel. No recompiling or anything...

The only changes I can think of: a few LibreOffice git subvolumes deleted
(~215,000 files) and a few created, plus a few tens of gigabytes of files
deleted (mp4 and mkv). Uhm... I don't know if that could be related, or if
something changed in the BTRFS layout.

Well, I have the nightly snapshot (though not of every subvolume
of the fs). I'll run tar on that, and we'll see.

I'll let you know in a few hours. In the meantime, I'll keep running rc6+
and see if something happens.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 13:33                     ` Andrea Gelmini
@ 2024-07-04 13:47                       ` Andrea Gelmini
  2024-07-04 14:48                         ` Andrea Gelmini
  0 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-04 13:47 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 3:33 PM Andrea Gelmini
<andrea.gelmini@gmail.com> wrote:
> Well, I have the nightly snapshot (though not of every subvolume
> of the fs). I'll run tar on that, and we'll see.

Well, using the laptop for daily work and running tar on the snapshots
recreates the issue.
I am collecting the htop output and the bpftrace log.

I'll send you everything once I've collected enough data.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 13:47                       ` Andrea Gelmini
@ 2024-07-04 14:48                         ` Andrea Gelmini
  2024-07-04 17:25                           ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-04 14:48 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 3:47 PM Andrea Gelmini
<andrea.gelmini@gmail.com> wrote:
> I'll send you everything once I've collected enough data.

Here we are.

Kernel rc6+branch:
    Output of bpftrace:
    https://pastebin.com/P9RFp5mg

    Recording of tar session: (summary: start fast, then flipping super slow)
    https://asciinema.org/a/BxYI83TkrlOhEe42IWXNY135D

    Recording of htop session: (summary: PSI high and two threads at 100%)
    https://asciinema.org/a/ZwGSepZZ8TSpFfPssACUUXcCB


Kernel 6.6.36:
    Recording of tar session: (summary: tar always fast)
    https://asciinema.org/a/a6dOkbjyPFkkQ5aNTaRiFD3H8

    Recording of htop session: (summary: no threads and PSI load)
    https://asciinema.org/a/mFsypWzHfSdsjrIQf8zpzNpKo

If you need me to run it for longer, I can do it over the weekend.
If you need a dump of my BTRFS fs, no problem, but I need 'btrfs-image
-s' working (the point being: scrambling filenames).

Thanks a lot,
Gelma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 14:48                         ` Andrea Gelmini
@ 2024-07-04 17:25                           ` Filipe Manana
  2024-07-04 17:31                             ` Filipe Manana
                                               ` (4 more replies)
  0 siblings, 5 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-04 17:25 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 3:48 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 3:47 PM Andrea Gelmini
> <andrea.gelmini@gmail.com> wrote:
> > I send you everything when I collect enough data.
>
> Here we are.
>
> Kernel rc6+branch:
>     Output of bpftrace:
>     https://pastebin.com/P9RFp5mg

So, a couple of interesting things here, which we didn't get in the short
capture from Mikhail:

1) There are apparently multiple tasks entering the shrinker at the same time:
     kswapd0, Chrome_ChildIOT, Chrome_IOThread, chrome, Xorg.

2) In some cases we get very large negative numbers for the number of
extent maps to scan.
    This shouldn't happen: either our own btrfs counter overflowed or
there is some other bug, or the super block's shrinker is being called
with a negative sc->nr_to_scan, which is outside btrfs' control and
apparently also outside the control of the VFS's shrinker callback
(see fs/super.c:super_cache_scan()).

>
>     Recording of tar session: (summary: start fast, then flipping super slow)
>     https://asciinema.org/a/BxYI83TkrlOhEe42IWXNY135D
>
>     Recording of htop session: (summary: PSI high and two threads at 100%)
>     https://asciinema.org/a/ZwGSepZZ8TSpFfPssACUUXcCB

Ok, so maybe I missed it, but I haven't seen kswapd0 in there, nor anything
taking 100% CPU.
Maybe it was just Mikhail running into that?

I was looking at the memory PSI and I never noticed it going over 60%.
As for CPU and IO PSI, CPU was always low, under 3% from what I've
seen, and IO was even lower than that, very close to 0%.

So I'm surprised that you get an unresponsive desktop.

>
>
> Kernel 6.6.36:
>     Recording of tar session: (summary: tar always fast)
>     https://asciinema.org/a/a6dOkbjyPFkkQ5aNTaRiFD3H8
>
>     Recording of htop session: (summary: no threads and PSI load)
>     https://asciinema.org/a/mFsypWzHfSdsjrIQf8zpzNpKo

Interestingly, here the memory PSI stays at 0% or very close to that,
it never reaches anything close to the 60%.

>
> If you need to run for longer time, I can do it in the weekend.
> If you need dump of my BTRFS fs, no problem, but I need 'btrfs image
> -s" working (point is: scrambling filenames).

Ok, so I have been delaying my reply because I kept accumulating
things for you (or Mikhail) to try, to avoid sending several messages
with very little in each.

So, first, I tried reproducing your scenario with tar, as you described
in a previous message:

On a fresh btrfs filesystem, I cloned Linus' kernel tree into /mnt/git/linux
Compiled a kernel.
Then copied the tree 3 times like this:

cd /mnt/git
cp --reflink=never -r linux linux2
cp --reflink=never -r linux linux3
cp --reflink=never -r linux linux4

The total size of /mnt/git was 62G (as reported by: du -hs /mnt/git).

Then I ran:

cd /mnt
tar cp git/ | pv > /dev/null

With htop and the bpftrace script running in parallel, and since my htop
version doesn't show PSI information (probably older than yours), I
kept monitoring PSI like this:

watch -d -n 3 'echo "cpu:\n"; cat /proc/pressure/cpu; echo "\nmemory:\n"; cat /proc/pressure/memory; echo "\nio:\n"; cat /proc/pressure/io'
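A note on the watch command above: watch runs its command through `sh -c`, and whether `echo` expands `\n` depends on what /bin/sh is (dash's echo does, bash's builtin echo does not without -e). A sketch of an equivalent that uses printf so the output looks the same everywhere; the function name `show_psi` and its optional directory argument are hypothetical, the latter added so it can be pointed at copies of the files:

```sh
# Print cpu, memory and io pressure, one labelled section each.
show_psi() {
    dir=${1:-/proc/pressure}
    for res in cpu memory io; do
        printf '%s:\n' "$res"
        cat "$dir/$res"
        printf '\n'
    done
}

# watch cannot call shell functions, so inline the loop there:
# watch -d -n 3 'for r in cpu memory io; do printf "%s:\n" "$r"; cat /proc/pressure/$r; echo; done'
```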

Nothing went through the roof: the machine was always responsive, I never
saw kswapd0 anywhere near the top, and the process using the most CPU was
tar (always under 30%).
PSI had all values low.

The shrinker was being triggered very often, for small numbers (mostly
under 1000, and most of the time much less than that), but I never saw
those large negative numbers, nor, apparently, different tasks entering
it concurrently.
It took a few seconds at most in each run.

I also tried monitoring while running the "cp --reflink=never -r"
commands, and while IO PSI often peaked at 92% or 93%, the system was always
responsive (and such IO PSI seems reasonable, since we are doing a lot
of read and write IO).

So several different things to try here:

1) First let's check that the problem is really a consequence of the shrinker.
    Try this patch:

    https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt

    This disables the shrinker. It's just to confirm that I'm looking
in the right direction, that your problem is the same as Mikhail's, and
to double check his bisection.

2) Then drop that patch that disables the shrinker.
     With all the previous 4 patches applied, apply this one on top of them:

     https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt

     The goal here is to see if the extent map eviction done by the
shrinker is making reads from other tasks too slow, and check if
that's what's making your system unresponsive.

3) Then drop the patch from step 2), and on top of the previous 4
patches from my git tree, apply this one:

     https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt

     This is to confirm whether we really have concurrent calls to the
shrinker, as the tracing seems to suggest, and where the negative
numbers come from.
     It also helps to check whether disallowing concurrent calls to it,
by skipping if it's already running, makes the problems go away.
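The "skip if it's already running" idea in step 3 is a try-lock: a caller that fails to take the lock returns at once instead of waiting. The kernel patch itself is not reproduced here. As a userspace analogy only, the same pattern in shell using the non-blocking `-n` option of util-linux flock(1); the function name `run_if_idle` and the lock path are hypothetical:

```sh
# Run "$@" only if nobody else holds the lock; otherwise skip instead of waiting.
run_if_idle() {
    lock=$1; shift
    (
        # -n: fail immediately if another open of the lock file holds it.
        flock -n 9 || { echo skipped; exit 0; }
        "$@"
    ) 9>"$lock"
}

# run_if_idle /tmp/em_scan.lock echo ran
```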

>
> Thanks a lot,

Thanks a lot to you and Mikhail, not just for the reporting but also
for applying patches, compiling kernels, running the tests and making
all those valuable observations, which are all very time consuming.

Thanks!

> Gelma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 17:25                           ` Filipe Manana
@ 2024-07-04 17:31                             ` Filipe Manana
  2024-07-04 22:15                             ` Andrea Gelmini
                                               ` (3 subsequent siblings)
  4 siblings, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-04 17:31 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 6:25 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Thu, Jul 4, 2024 at 3:48 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
> >
> > On Thu, Jul 4, 2024 at 3:47 PM Andrea Gelmini
> > <andrea.gelmini@gmail.com> wrote:
> > > I send you everything when I collect enough data.
> >
> > Here we are.
> >
> > Kernel rc6+branch:
> >     Output of bpftrace:
> >     https://pastebin.com/P9RFp5mg
>
> So a couple interesting things here, which we didn't get in the short
> capture from Mikhail:
>
> 1) There's apparently multiple tasks entering the shrinker at the same time:
>      kswapd0, Chrome_ChildIOT, Chrome_IOThread, chrome, Xorg.
>
> 2) In some cases we get very large negative numbers for the number of
> extent maps to scan.
>     This shouldn't happen and either our own btrfs counter might have
> overflowed or some other bug,
>     or the super block's shrinker is being called with sc->nr_to_scan
> negative, and outside btrfs' control,
>     and it seems outside of control of the VFS's shrinker callback
> (see fs/super.c:super_cache_scan()).
>
> >
> >     Recording of tar session: (summary: start fast, then flipping super slow)
> >     https://asciinema.org/a/BxYI83TkrlOhEe42IWXNY135D
> >
> >     Recording of htop session: (summary: PSI high and two threads at 100%)
> >     https://asciinema.org/a/ZwGSepZZ8TSpFfPssACUUXcCB
>
> Ok, so maybe I missed it, but I haven't seen kswapd0 in there, nor anything
> taking 100% CPU.
> Maybe it was just Mikhail running into that?
>
> I was looking at the memory PSI and I never noticed it going over 60%.
> As for cpu and IO PSI, for cpu it was always low, under 3% from what
> I've seen and for IO even lower than that, very close to 0%.
>
> So I'm surprised that you get an unresponsive desktop.
>
> >
> >
> > Kernel 6.6.36:
> >     Recording of tar session: (summary: tar always fast)
> >     https://asciinema.org/a/a6dOkbjyPFkkQ5aNTaRiFD3H8
> >
> >     Recording of htop session: (summary: no threads and PSI load)
> >     https://asciinema.org/a/mFsypWzHfSdsjrIQf8zpzNpKo
>
> Interestingly, here the memory PSI stays at 0% or very close to that,
> it never reaches anything close to the 60%.
>
> >
> > If you need to run for longer time, I can do it in the weekend.
> > If you need dump of my BTRFS fs, no problem, but I need 'btrfs image
> > -s" working (point is: scrambling filenames).
>
> Ok, so I have been delaying my reply because I kept accumulating
> things for you (or Mikhail) to try, to avoid sending several messages
> with very little in each.
>
> So first thing, I tried reproducing your scenario like you described
> in a previous message using tar:
>
> On a fresh btrfs filesystem, I cloned Linus' kernel tree into /mnt/git/linux
> Compiled a kernel.
> Then copied the tree 3 times like this:
>
> cd /mnt/git
> cp --reflink=never -r linux linux2
> cp --reflink=never -r linux linux3
> cp --reflink=never -r linux linux4
>
> The total size of /mnt/git was 62G (as reported by: du -hs /mnt/git).
>
> Then I ran:
>
> cd /mnt
> tar cp git/ | pv > /dev/null
>
> With htop in parallel, the bpftrace script, and since my htop version
> doesn't show PSI information (probably an older version than yours), I
> kept monitoring PSI like this:
>
> watch -d -n 3 'echo "cpu:\n"; cat /proc/pressure/cpu ; echo
> "\nmemory:\n" ; cat /proc/pressure/memory ; echo "\nio:\n" ; cat
> /proc/pressure/io'
>
> Nothing went out of the roof, the machine was always responsive, never
> seen kswapd0 anywhere near the top, and the process using most CPU was
> tar (and always under 30%).
> PSI had all values low.
>
> The shrinker was being triggered very often, for small numbers (mostly
> under 1000, and most of the time much less than that), but I never had
> those large negative numbers nor apparently different tasks entering
> into it concurrently.
> It took a few seconds at most in each run.
>
> I also tried monitoring while doing the "cp --reflink=never -r"
> commands and while PSI often peaked to 92%, 93%, the system was always
> responsive (and such IO PSI seems reasonable since we are doing a lot
> of read and write IO).
>
> So several different things to try here:
>
> 1) First let's check that the problem is really a consequence of the shrinker.
>     Try this patch:
>
>     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
>
>     This disables the shrinker. This is just to confirm if I'm looking
> in the right direction, if your problem is the same as Mikhail's and
> double check his bisection.
>
> 2) Then drop that patch that disables the shrinker.
>      With all the previous 4 patches applied, apply this one on top of them:
>
>      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
>
>      The goal here is to see if the extent map eviction done by the
> shrinker is making reads from other tasks too slow, and check if
> that's what's making your system unresponsive.
>
> 3) Then drop the patch from step 2), and on top of the previous 4
> patches from my git tree, apply this one:
>
>      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
>
>      This is just to confirm if we do have concurrent calls to the
> shrinker, as the tracing seems to suggest, and where the negative
> numbers come from.
>      It also helps to check if not allowing concurrent calls to it, by
> skipping if it's already running, helps making the problems go away.

Oh, and for this one, please send your 'dmesg' output after testing, so we
can see if any stack traces or warning messages were logged (even if it
happens to solve all the problems).

Thanks!


>
> >
> > Thanks a lot,
>
> Thanks a lot to you and Mikhail, not just for the reporting but also
> to apply patches, compile a kernel, run the tests and do all those
> valuable observations which are all very time consuming.
>
> Thanks!
>
> > Gelma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 17:25                           ` Filipe Manana
  2024-07-04 17:31                             ` Filipe Manana
@ 2024-07-04 22:15                             ` Andrea Gelmini
  2024-07-04 22:23                               ` Andrea Gelmini
  2024-07-05 11:00                               ` Filipe Manana
  2024-07-05  6:30                             ` Andrea Gelmini
                                               ` (2 subsequent siblings)
  4 siblings, 2 replies; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-04 22:15 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 7:25 PM Filipe Manana
<fdmanana@kernel.org> wrote:
> 2) In some cases we get very large negative numbers for the number of
> extent maps to scan.
>     This shouldn't happen and either our own btrfs counter might have
> overflowed or some other bug,

Well, I was thinking about the oddities of my setup, and I tried this:
a) kernel 6.6.36;
b) created a shiny new btrfs on a spare nvme partition;
c) then mounted it forcing compression;
d) multiple parallel cp of the kernel and libreoffice sources;
e) rebooted into the same rc6+branch already used;
f) tar of the new btrfs: no problem at all;
g) let it finish;
h) tar of /.snapshots: PSI memory skyrockets, and the usual slow reading;
i) stop it;
j) tar of the new btrfs again: no problem;
k) repeat a few times.

You can see the output here:
https://asciinema.org/a/rJpGWvXYH6IDBXWYhtJckkKWo

At the end you can see me kill tar and let the PSI go down to zero, if
you are interested.

> Ok, so maybe I missed it, but I haven't kswapd0 in there, or nothing
> taking 100% CPU.
> Maybe it was just Mikhail running into that?

To get this effect and the extremely sluggish response (I mean, click
something and it takes more than 30 seconds to react)
I need to work on my laptop for at least a day. At that point even
switching virtual desktops takes a long time.

Thinking about how my use case differs:
a) I always suspend (to RAM, not disk, btw). I only reboot when I change
kernel, so I can work for weeks with the same kernel;
b) months ago I let beesd run for a day.

> So I'm surprised that you get an unresponsive desktop.
Same point as before. In this case it is not so sluggish, but, e.g., if
I click the screen lock it doesn't start immediately; it waits a
little more than one second.

> Interestingly, here the memory PSI stays at 0% or very close to that,
> it never reaches anything close to the 60%.

You can see the same thing in the last test with the new btrfs partition:
New partition: ~0%
/.snapshots/: near 60%.
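The memory PSI figures compared here (about 0% on the fresh partition, near 60% on /.snapshots) are pressure stall information. The kernel exports it in /proc/pressure/memory (Linux 4.20+ with CONFIG_PSI) as a "some" line and a "full" line. Each line has avg10/avg60/avg300 percentages and a cumulative total in microseconds. htop's PSI meters presumably read the same file. Below is a minimal parsing sketch; psi_parse() and struct psi_line are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct psi_line {
	double avg10, avg60, avg300; /* % of time stalled over 10s/60s/300s */
	unsigned long long total;    /* cumulative stall time, microseconds */
};

/* Find the "some" or "full" line in the contents of /proc/pressure/memory
 * and parse it. Returns 0 on success, -1 if missing or malformed. */
static int psi_parse(const char *text, const char *kind, struct psi_line *out)
{
	char prefix[16];
	const char *p = text;

	assert(strlen(kind) < sizeof(prefix) - 1);
	snprintf(prefix, sizeof(prefix), "%s ", kind);
	while (p && *p) {
		if (strncmp(p, prefix, strlen(prefix)) == 0)
			return sscanf(p + strlen(prefix),
				      "avg10=%lf avg60=%lf avg300=%lf total=%llu",
				      &out->avg10, &out->avg60, &out->avg300,
				      &out->total) == 4 ? 0 : -1;
		p = strchr(p, '\n');  /* advance to the next line, if any */
		if (p)
			p++;
	}
	return -1;
}
```

On a live system you would read the file into a buffer first, for example with fopen() and fread(), and pass that buffer in. The avg10 value on the "some" line is the share of the last 10 seconds during which at least one task was stalled waiting for memory.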


> With htop in parallel, the bpftrace script, and since my htop version
> doesn't show PSI information (probably an older version than yours), I
> kept monitoring PSI like this:

Well, mine comes from:
https://github.com/htop-dev/htop.git
Built with:
./configure --enable-capabilities --enable-delayacct --enable-sensors --enable-werror --enable-affinity
plus a tweaked config file. I can send it if you want.


> So several different things to try here:

I'll stop here for the moment; I have to sleep.
I'll do the rest over the weekend and reply to you!

> Thanks a lot to you and Mikhail, not just for the reporting but also
> to apply patches, compile a kernel, run the tests and do all those
> valuable observations which are all very time consuming.

My little contribution to free software!

Ciao,
Gelma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 22:15                             ` Andrea Gelmini
@ 2024-07-04 22:23                               ` Andrea Gelmini
  2024-07-05 11:00                               ` Filipe Manana
  1 sibling, 0 replies; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-04 22:23 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

[-- Attachment #1: Type: text/plain, Size: 231 bytes --]

On Fri, Jul 5, 2024 at 00:15 Andrea Gelmini
<andrea.gelmini@gmail.com> wrote:
>
> You can see the output here:
> https://asciinema.org/a/rJpGWvXYH6IDBXWYhtJckkKWo

Sorry, attached is the bpftrace log of this session.

[-- Attachment #2: em_shrinker_log.txt.bz2 --]
[-- Type: application/x-bzip, Size: 280773 bytes --]

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 22:15                             ` Andrea Gelmini
  2024-07-04 22:23                               ` Andrea Gelmini
@ 2024-07-05 11:00                               ` Filipe Manana
  1 sibling, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-05 11:00 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 11:15 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 19:25 Filipe Manana
> <fdmanana@kernel.org> wrote:
> > 2) In some cases we get very large negative numbers for the number of
> > extent maps to scan.
> >     This shouldn't happen and either our own btrfs counter might have
> > overflowed or some other bug,
>
> Well, I was thinking about what is peculiar to my setup, and I tried this:
> a) kernel 6.6.36;
> b) created a shiny new btrfs on a spare NVMe partition;
> c) mounted it with forced compression;
> d) ran multiple parallel cp of the kernel and LibreOffice sources;
> e) rebooted into the same rc6+branch kernel used before;
> f) tar of the new btrfs: no problem at all;
> g) let it finish;
> h) tar of /.snapshots: memory PSI skyrockets, and the usual slow reading;
> i) stopped it;
> l) tar of the new btrfs again: no problem;
> m) repeated a few times.
>
> You can see the output here:
> https://asciinema.org/a/rJpGWvXYH6IDBXWYhtJckkKWo
>
> At the end you can see me kill tar and let the PSI go back down to zero, if
> you are interested.
>
> > Ok, so maybe I missed it, but I haven't kswapd0 in there, or nothing
> > taking 100% CPU.
> > Maybe it was just Mikhail running into that?
>
> To get this effect and the extremely sluggish response (I mean, click
> something and it takes more than 30 seconds to react)
> I need to work on my laptop for at least a day. At that point even
> switching virtual desktops takes a long time.
>
> Thinking about how my use case differs:
> a) I always suspend (to RAM, not disk, btw). I only reboot when I change
> kernel, so I can work for weeks with the same kernel;
> b) months ago I let beesd run for a day.
>
> > So I'm surprised that you get an unresponsive desktop.
> Same point as before. In this case it is not so sluggish, but, e.g., if
> I click the screen lock it doesn't start immediately; it waits a
> little more than one second.

Oh, I see that on my main desktop, which only uses ext4 and always has 2
QEMU VMs running, usually Debian and openSUSE.
Sometimes, even when the VMs aren't doing anything but had been doing
IO-heavy testing, the desktop on the host becomes unresponsive:
clicking the screen lock often takes at least 5 seconds, changing
workspaces takes a few seconds too, etc. In theory that shouldn't
happen.

>
> > Interestingly, here the memory PSI stays at 0% or very close to that,
> > it never reaches anything close to the 60%.
>
> You can see the same thing in the last test with the new btrfs partition:
> New partition: ~0%
> /.snapshots/: near 60%.

It could be due to heavy fragmentation, but that should only make things
that slow on a spinning disk.
I think you mentioned somewhere that you use NVMe or an SSD.

Removing extent maps could cause extra metadata reads and be slow.
But the number of extent maps removed on each iteration is relatively
small, and it's done round-robin, so... it's strange that it causes such
huge pressure and desktop unresponsiveness.
We will know if that's the case with the 2nd test patch.
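Filipe describes a per-call budget: only a small number of extent maps are dropped on each shrinker call, visiting inodes round-robin. The sketch below illustrates that idea. It is hypothetical: struct em_cache and em_scan() are illustrative names, and the real btrfs shrinker is organized differently and has to take locks. Each call drops at most nr_to_scan maps and remembers where it stopped, so the next call resumes there instead of starting over from the first inode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each "inode" caches some extent maps. */
struct em_cache {
	size_t nr_inodes;
	long *maps;     /* maps[i] = extent maps cached by inode i */
	size_t cursor;  /* inode to resume from on the next call */
};

/* Drop up to nr_to_scan extent maps, round-robin; returns how many. */
static long em_scan(struct em_cache *c, long nr_to_scan)
{
	long freed = 0;
	size_t visited = 0;

	assert(c->nr_inodes > 0 && nr_to_scan >= 0);
	while (freed < nr_to_scan && visited < c->nr_inodes) {
		long *m = &c->maps[c->cursor];
		long take = nr_to_scan - freed;

		if (take > *m)
			take = *m;
		*m -= take;
		freed += take;
		if (*m > 0)
			break;  /* budget used up: resume at this inode next time */
		c->cursor = (c->cursor + 1) % c->nr_inodes;
		visited++;
	}
	return freed;
}
```

For example, suppose the inodes hold {3, 5, 2} maps and the budget is 4. The first call empties inode 0 and takes 1 map from inode 1. The second call picks up at inode 1.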

>
>
> > With htop in parallel, the bpftrace script, and since my htop version
> > doesn't show PSI information (probably an older version than yours), I
> > kept monitoring PSI like this:
>
> Well, mine comes from:
> https://github.com/htop-dev/htop.git
> Built with:
> ./configure --enable-capabilities --enable-delayacct --enable-sensors --enable-werror --enable-affinity
> plus a tweaked config file. I can send it if you want.

Thanks, I'll have to try it eventually.

>
>
> > So several different things to try here:
>
> I'll stop here for the moment; I have to sleep.
> I'll do the rest over the weekend and reply to you!

Sure, take your time. Patching and building kernels, plus the testing,
etc., all takes time.
Many thanks for that!

>
> > Thanks a lot to you and Mikhail, not just for the reporting but also
> > to apply patches, compile a kernel, run the tests and do all those
> > valuable observations which are all very time consuming.
>
> My little contribution to free software!
>
> Ciao,
> Gelma

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 17:25                           ` Filipe Manana
  2024-07-04 17:31                             ` Filipe Manana
  2024-07-04 22:15                             ` Andrea Gelmini
@ 2024-07-05  6:30                             ` Andrea Gelmini
  2024-07-05 11:06                               ` Filipe Manana
  2024-07-05 18:36                             ` Mikhail Gavrilov
  2024-07-06  0:11                             ` Andrea Gelmini
  4 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-05  6:30 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 19:25 Filipe Manana
<fdmanana@kernel.org> wrote:
> 1) First let's check that the problem is really a consequence of the shrinker.
>     Try this patch:
>
>     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
>
>     This disables the shrinker. This is just to confirm if I'm looking
> in the right direction, if your problem is the same as Mikhail's and
> double check his bisection.

Ok, I can confirm. With this change there's just a little memory PSI
sometimes (<3%), but no skyrocketing. Also, tar runs at full speed.

Now I'm going to prepare the btrfs image to send you.

I'll do the other steps later.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-05  6:30                             ` Andrea Gelmini
@ 2024-07-05 11:06                               ` Filipe Manana
  0 siblings, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-05 11:06 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Fri, Jul 5, 2024 at 7:30 AM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 19:25 Filipe Manana
> <fdmanana@kernel.org> wrote:
> > 1) First let's check that the problem is really a consequence of the shrinker.
> >     Try this patch:
> >
> >     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
> >
> >     This disables the shrinker. This is just to confirm if I'm looking
> > in the right direction, if your problem is the same as Mikhail's and
> > double check his bisection.
>
> Ok, I can confirm. With this change there's just a little memory PSI
> sometimes (<3%), but no skyrocketing. Also, tar runs at full speed.

Ok, so the bisection is reliable and it means you are experiencing the
same problem that Mikhail reported.

>
> Now I'm going to prepare the btrfs image to send you.

That might not be necessary, and I'm not sure how it would help. The 2nd
patch would confirm whether fragmentation is causing too many slow
reads after extent map eviction.
So save yourself some time for now, because making the image is likely slow.

>
> I'll do the other steps later.

Thanks!

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 17:25                           ` Filipe Manana
                                               ` (2 preceding siblings ...)
  2024-07-05  6:30                             ` Andrea Gelmini
@ 2024-07-05 18:36                             ` Mikhail Gavrilov
  2024-07-05 23:09                               ` Filipe Manana
  2024-07-06  0:11                             ` Andrea Gelmini
  4 siblings, 1 reply; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-07-05 18:36 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

[-- Attachment #1: Type: text/plain, Size: 3855 bytes --]

On Thu, Jul 4, 2024 at 10:25 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> So several different things to try here:
>
> 1) First let's check that the problem is really a consequence of the shrinker.
>     Try this patch:
>
>     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
>
>     This disables the shrinker. This is just to confirm if I'm looking
> in the right direction, if your problem is the same as Mikhail's and
> double check his bisection.

[1]
I can't check it because the patch does not apply on top of 661e504db04c.
$ git apply debug-1.patch
error: patch failed: fs/btrfs/super.c:2410
error: fs/btrfs/super.c: patch does not apply
$ cat debug-1.patch
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f05cce7c8b8d..06c0db641d18 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2410,8 +2410,10 @@ static const struct super_operations btrfs_super_ops = {
        .statfs         = btrfs_statfs,
        .freeze_fs      = btrfs_freeze,
        .unfreeze_fs    = btrfs_unfreeze,
+       /*
        .nr_cached_objects = btrfs_nr_cached_objects,
        .free_cached_objects = btrfs_free_cached_objects,
+       */
 };

 static const struct file_operations btrfs_ctl_fops = {
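The debug patch disables the shrinker simply by leaving the two super_operations hooks unset. As I understand it, the VFS superblock shrinker in fs/super.c calls nr_cached_objects and free_cached_objects only when they are non-NULL, so with the hooks unset btrfs contributes nothing to reclaim. A simplified, hypothetical model of that check follows. The real hooks take a struct super_block * and a struct shrink_control *; struct sb and struct sb_ops here are stand-ins.

```c
#include <assert.h>
#include <stddef.h>

struct sb;  /* stand-in for struct super_block */

/* Stand-in for the two optional super_operations hooks. */
struct sb_ops {
	long (*nr_cached_objects)(struct sb *sb);
	long (*free_cached_objects)(struct sb *sb, long nr_to_scan);
};

struct sb {
	const struct sb_ops *s_op;
	long cached;  /* objects the filesystem could release */
};

/* VFS side: a hook left NULL is simply never called. */
static long count_fs_objects(struct sb *sb)
{
	if (sb->s_op->nr_cached_objects)
		return sb->s_op->nr_cached_objects(sb);
	return 0;
}

static long scan_fs_objects(struct sb *sb, long nr_to_scan)
{
	if (sb->s_op->free_cached_objects)
		return sb->s_op->free_cached_objects(sb, nr_to_scan);
	return 0;
}

/* Filesystem side: trivial implementations of the hooks. */
static long demo_count(struct sb *sb)
{
	return sb->cached;
}

static long demo_free(struct sb *sb, long nr_to_scan)
{
	long n;

	assert(nr_to_scan >= 0);
	n = nr_to_scan < sb->cached ? nr_to_scan : sb->cached;
	sb->cached -= n;
	return n;
}
```

Commenting out the two initializers, as debug-1.patch does, is equivalent to an sb_ops of { NULL, NULL }: count and scan both report 0, and nothing is freed.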



> 2) Then drop that patch that disables the shrinker.
>      With all the previous 4 patches applied, apply this one on top of them:
>
>      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
>
>      The goal here is to see if the extent map eviction done by the
> shrinker is making reads from other tasks too slow, and check if
> that's what's making your system unresponsive.
>

[2]
6.10.0-rc6-661e504db04c-test2
up  1:00
root         269 15.5  0.0      0     0 ?        R    10:23   9:20 [kswapd0]
up  2:02
root         269 21.6  0.0      0     0 ?        S    10:23  26:27 [kswapd0]
up  3:10
root         269 25.2  0.0      0     0 ?        R    10:23  48:11 [kswapd0]
up  4:04
root         269 29.0  0.0      0     0 ?        S    10:23  71:12 [kswapd0]
up  5:04
root         269 26.8  0.0      0     0 ?        R    10:23  81:47 [kswapd0]
up  6:07
root         269 27.9  0.0      0     0 ?        R    10:23 102:40 [kswapd0]
dmesg attached below as 6.10.0-rc6-661e504db04c-test2.zip

> 3) Then drop the patch from step 2), and on top of the previous 4
> patches from my git tree, apply this one:
>
>      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
>
>      This is just to confirm if we do have concurrent calls to the
> shrinker, as the tracing seems to suggest, and where the negative
> numbers come from.
>      It also helps to check if not allowing concurrent calls to it, by
> skipping if it's already running, helps making the problems go away.

[3]
6.10.0-rc6-661e504db04c-test3
up  1:00
root         269 18.6  0.0      0     0 ?        S    17:09  11:12 [kswapd0]
up  2:00
root         269 23.7  0.0      0     0 ?        R    17:09  28:30 [kswapd0]
up  3:00
root         269 27.0  0.0      0     0 ?        S    17:09  48:47 [kswapd0]
up  4:00
root         269 28.8  0.0      0     0 ?        S    17:09  69:10 [kswapd0]
up  5:00
root         269 32.0  0.0      0     0 ?        S    17:09  96:17 [kswapd0]
up  6:00
root         269 29.7  0.0      0     0 ?        S    17:09 107:12 [kswapd0]
dmesg attached below as 6.10.0-rc6-661e504db04c-test3.zip

As we can see, kswapd0's CPU time has increased significantly: it used to be
30 min after 6 hours, and now it is about 100 min. That is, it became three times
worse even with proposed patches (1-4).
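Mikhail's "three times worse" can be checked against the ps TIME column, which is cumulative CPU time as minutes:seconds. Take his 30-minute figure for 6 hours as the baseline. Then 102:40 (test2) and 107:12 (test3) are roughly 3.4x and 3.6x that baseline. Also, 102:40 over 6:07 of uptime averages about 28% of one CPU, consistent with the %CPU column. Below is a small sketch of that arithmetic. The helper names are hypothetical, and because kswapd0 starts shortly after boot, uptime only approximates its elapsed time.

```c
#include <assert.h>
#include <stdio.h>

/* Convert a ps TIME value such as "102:40" (minutes:seconds of CPU
 * time) into minutes; returns -1.0 if the string does not parse. */
static double ps_time_minutes(const char *t)
{
	unsigned min, sec;

	if (sscanf(t, "%u:%u", &min, &sec) != 2 || sec >= 60)
		return -1.0;
	return min + sec / 60.0;
}

/* Average share of one CPU since boot: CPU minutes / wall-clock minutes,
 * with the uptime given as hours and minutes ("up H:MM"). */
static double cpu_share(const char *cpu_time, unsigned up_h, unsigned up_m)
{
	double cpu = ps_time_minutes(cpu_time);
	unsigned wall = up_h * 60 + up_m;

	assert(cpu >= 0.0 && wall > 0);
	return cpu / wall;
}
```

For example, ps_time_minutes("102:40") / 30.0 gives about 3.42, and cpu_share("102:40", 6, 7) gives about 0.28.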

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: 6.10.0-rc6-661e504db04c-test2.zip --]
[-- Type: application/zip, Size: 53393 bytes --]

[-- Attachment #3: 6.10.0-rc6-661e504db04c-test3.zip --]
[-- Type: application/zip, Size: 54961 bytes --]

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-05 18:36                             ` Mikhail Gavrilov
@ 2024-07-05 23:09                               ` Filipe Manana
  0 siblings, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-05 23:09 UTC (permalink / raw)
  To: Mikhail Gavrilov
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Fri, Jul 5, 2024 at 7:36 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 10:25 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > So several different things to try here:
> >
> > 1) First let's check that the problem is really a consequence of the shrinker.
> >     Try this patch:
> >
> >     https://gist.githubusercontent.com/fdmanana/b44abaade0000d28ba0e1e1ae3ac4fee/raw/5c9bf0beb5aa156b893be2837c9244d035962c74/gistfile1.txt
> >
> >     This disables the shrinker. This is just to confirm if I'm looking
> > in the right direction, if your problem is the same as Mikhail's and
> > double check his bisection.
>
> [1]
> I can't check it because the patch does not apply on top of 661e504db04c.
> $ git apply debug-1.patch
> error: patch failed: fs/btrfs/super.c:2410
> error: fs/btrfs/super.c: patch does not apply
> $ cat debug-1.patch
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f05cce7c8b8d..06c0db641d18 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2410,8 +2410,10 @@ static const struct super_operations btrfs_super_ops = {
>         .statfs         = btrfs_statfs,
>         .freeze_fs      = btrfs_freeze,
>         .unfreeze_fs    = btrfs_unfreeze,
> +       /*
>         .nr_cached_objects = btrfs_nr_cached_objects,
>         .free_cached_objects = btrfs_free_cached_objects,
> +       */
>  };
>
>  static const struct file_operations btrfs_ctl_fops = {
>
>
>
> > 2) Then drop that patch that disables the shrinker.
> >      With all the previous 4 patches applied, apply this one on top of them:
> >
> >      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
> >
> >      The goal here is to see if the extent map eviction done by the
> > shrinker is making reads from other tasks too slow, and check if
> > that's what's making your system unresponsive.
> >
>
> [2]
> 6.10.0-rc6-661e504db04c-test2
> up  1:00
> root         269 15.5  0.0      0     0 ?        R    10:23   9:20 [kswapd0]
> up  2:02
> root         269 21.6  0.0      0     0 ?        S    10:23  26:27 [kswapd0]
> up  3:10
> root         269 25.2  0.0      0     0 ?        R    10:23  48:11 [kswapd0]
> up  4:04
> root         269 29.0  0.0      0     0 ?        S    10:23  71:12 [kswapd0]
> up  5:04
> root         269 26.8  0.0      0     0 ?        R    10:23  81:47 [kswapd0]
> up  6:07
> root         269 27.9  0.0      0     0 ?        R    10:23 102:40 [kswapd0]
> dmesg attached below as 6.10.0-rc6-661e504db04c-test2.zip
>
> > 3) Then drop the patch from step 2), and on top of the previous 4
> > patches from my git tree, apply this one:
> >
> >      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
> >
> >      This is just to confirm if we do have concurrent calls to the
> > shrinker, as the tracing seems to suggest, and where the negative
> > numbers come from.
> >      It also helps to check if not allowing concurrent calls to it, by
> > skipping if it's already running, helps making the problems go away.
>
> [3]
> 6.10.0-rc6-661e504db04c-test3
> up  1:00
> root         269 18.6  0.0      0     0 ?        S    17:09  11:12 [kswapd0]
> up  2:00
> root         269 23.7  0.0      0     0 ?        R    17:09  28:30 [kswapd0]
> up  3:00
> root         269 27.0  0.0      0     0 ?        S    17:09  48:47 [kswapd0]
> up  4:00
> root         269 28.8  0.0      0     0 ?        S    17:09  69:10 [kswapd0]
> up  5:00
> root         269 32.0  0.0      0     0 ?        S    17:09  96:17 [kswapd0]
> up  6:00
> root         269 29.7  0.0      0     0 ?        S    17:09 107:12 [kswapd0]
> dmesg attached below as 6.10.0-rc6-661e504db04c-test3.zip
>
> As we can see, kswapd0's CPU time has increased significantly: it used to be
> 30 min after 6 hours, and now it is about 100 min. That is, it became three times
> worse even with proposed patches (1-4).

Can you try the following two branches based on 6.10-rc6?

1)  https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test1_em_shrinker_6.10

2)  https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test2_em_shrinker_6.10

Even if the first one makes things good, please also try the second one.

The first just includes some changes for the next merge window (for
6.11) that might help speed things up.
The second just has a change that would be simple to add to 6.10, and
which we'll probably always want in some form.

Thanks!

>
> --
> Best Regards,
> Mike Gavrilov.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 17:25                           ` Filipe Manana
                                               ` (3 preceding siblings ...)
  2024-07-05 18:36                             ` Mikhail Gavrilov
@ 2024-07-06  0:11                             ` Andrea Gelmini
  2024-07-06 12:07                               ` Andrea Gelmini
  4 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-06  0:11 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 19:25 Filipe Manana
<fdmanana@kernel.org> wrote:
> 2) Then drop that patch that disables the shrinker.
>      With all the previous 4 patches applied, apply this one on top of them:
>
>      https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
>
>      The goal here is to see if the extent map eviction done by the
> shrinker is making reads from other tasks too slow, and check if
> that's what's making your system unresponsive.
>
> 3) Then drop the patch from step 2), and on top of the previous 4
> patches from my git tree, apply this one:
>
>      https://gist.githubusercontent.com/fdmanana/a7c9c2abb69c978cf5b80c2f784243d5/raw/b4cca964904d3ec15c74e36ccf111a3a2f530520/gistfile1.txt
>
>      This is just to confirm if we do have concurrent calls to the
> shrinker, as the tracing seems to suggest, and where the negative
> numbers come from.
>      It also helps to check if not allowing concurrent calls to it, by
> skipping if it's already running, helps making the problems go away.

Uhm... good news...
To recap, here are this evening's tests:

Kernel 6.6.36:
   Fresh BTRFS: (tar cp . | pv -ta > /dev/null): 0:03:53 [ 231MiB/s] (time and average speed)
   Aged snapshots: (tar cp /.snapshots/ | pv -at -s 100G -S > /dev/null): 0:02:20 [ 726MiB/s]

Kernel rc6+branch+2nd patch:
   Fresh BTRFS: 0:03:14 [ 278MiB/s]
   Aged snapshots: I had to stop it. Memory PSI > 80%. Processes were stuck
most of the time, e.g. mplayer over NFS stalls every few seconds for a
while and switching virtual desktops takes >5 seconds. Also
"echo 3 > drop_caches" takes more than 5 minutes to finish (on the other
two kernels it was almost immediate).

Kernel rc6+branch+3rd patch:
   Fresh BTRFS: 0:03:40 [ 245MiB/s]
   Aged snapshots: 0:02:03 [ 826MiB/s]
   N.B.: no skyrocketing memory PSI, no swap pressure, no sluggishness!!!

Now, that was just one run. I'm going to use this patch for a few
days, and next week I can tell you for sure whether everything is right!
For the moment it seems we have a winner!
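pv -t -a reports elapsed time and average rate, so multiplying the two gives roughly how much data each tar read. This is a quick check that the runs are comparable. The three fresh-filesystem runs each come to roughly 53,800 to 53,950 MiB. The two completed aged-snapshot runs come to about 101,600 MiB each, close to the 100 GiB (102,400 MiB) at which -s 100G -S tells pv to stop. The sketch below shows the arithmetic, with pv_seconds() and pv_volume_mib() as hypothetical helpers. pv rounds the average rate, so the results are approximate.

```c
#include <assert.h>
#include <stdio.h>

/* Seconds in a pv timer value of the form H:MM:SS; -1 if malformed. */
static long pv_seconds(const char *t)
{
	unsigned h, m, s;

	if (sscanf(t, "%u:%u:%u", &h, &m, &s) != 3 || m >= 60 || s >= 60)
		return -1;
	return (long)h * 3600 + m * 60 + s;
}

/* Approximate data transferred, in MiB: elapsed seconds * average rate. */
static double pv_volume_mib(const char *t, double avg_mib_s)
{
	long secs = pv_seconds(t);

	assert(secs >= 0 && avg_mib_s >= 0.0);
	return secs * avg_mib_s;
}
```

For example, pv_volume_mib("0:03:53", 231.0) gives 53823 MiB for the 6.6.36 fresh run, and pv_volume_mib("0:02:03", 826.0) gives 101598 MiB for the 3rd-patch aged-snapshot run.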

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-06  0:11                             ` Andrea Gelmini
@ 2024-07-06 12:07                               ` Andrea Gelmini
  2024-07-06 17:37                                 ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-06 12:07 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sat, Jul 6, 2024 at 02:11 Andrea Gelmini
<andrea.gelmini@gmail.com> wrote:
> For the moment it seems we have a winner!

I confirm this, but I forgot to mention these (a lot of them):
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm firefox-bin nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm firefox-bin nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2
[sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
shrinker already running, comm cc1plus nr_to_scan 2

Just for the record, that was while compiling LibreOffice.

Meanwhile I was running restic (a full backup, to force reading
everything), with no sluggishness at all.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-06 12:07                               ` Andrea Gelmini
@ 2024-07-06 17:37                                 ` Filipe Manana
  2024-07-07  9:41                                   ` Filipe Manana
  2024-07-07 11:35                                   ` Mikhail Gavrilov
  0 siblings, 2 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-06 17:37 UTC (permalink / raw)
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sat, Jul 6, 2024 at 1:07 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Sat, Jul 6, 2024 at 02:11 Andrea Gelmini
> <andrea.gelmini@gmail.com> wrote:
> > For the moment it seems we have a winner!
>
> I confirm this, but I forgot to mention these (a lot of them):

Oh, those I added on purpose to confirm what the bpftrace logs
suggested: concurrent calls into the shrinker.
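Step 3) was described as not allowing concurrent calls into the shrinker by skipping if it is already running. The warnings above name the turned-away caller (comm) and its nr_to_scan. The patch's code is not shown in this thread, so the sketch below is only a hypothetical single-process model of such a "skip if already running" guard, using C11 atomics. The names are illustrative, and the real patch may well differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical guard: true while some caller is inside the scan. */
static atomic_bool shrinker_running = false;

static long scan_work(long nr_to_scan)
{
	return nr_to_scan;  /* stand-in for the real eviction work */
}

/* Returns the number of objects freed, or 0 if another caller already
 * holds the guard; that caller backs off instead of piling on. */
static long guarded_scan(const char *comm, long nr_to_scan)
{
	long freed;

	assert(nr_to_scan >= 0);
	if (atomic_exchange(&shrinker_running, true)) {
		fprintf(stderr, "shrinker already running, comm %s nr_to_scan %ld\n",
			comm, nr_to_scan);
		return 0;
	}
	freed = scan_work(nr_to_scan);
	atomic_store(&shrinker_running, false);
	return freed;
}
```

The trade-off is that a turned-away caller, such as cc1plus above, reclaims nothing from this cache on that attempt. In exchange, many tasks no longer contend inside the shrinker at the same time.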


> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm firefox-bin nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm firefox-bin nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
> [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> shrinker already running, comm cc1plus nr_to_scan 2
>
> Just for the record, compiling LibreOffice.
>
> Meanwhile, running restic (a full backup, to force reading
> everything), no sluggishness at all.

That's great!

So I've been working on a proper approach following all those test
results from you and Mikhail, and I would like to ask you both to try
this branch:

https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test3_em_shrinker_6.10

Again, this is based on 6.10-rc6 plus 3 fixes for this issue you're both having.

Can you guys test that branch?

Thank you a lot for all the time spent on this!
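The "extent shrinker already running" warnings quoted above were, as Filipe explains later in the thread, added on purpose to confirm concurrent calls into the shrinker. The comm field names the task that made the call (here cc1plus and firefox-bin). As a rough analogy only, here is a minimal Python sketch of that kind of check. The real check is C code inside btrfs and may look quite different; `ShrinkerGuard`, `scan()` and `do_scan` are hypothetical names. One caller runs the scan, and any caller that arrives in the meantime logs a warning and returns immediately instead of queueing behind it.

```python
import threading


class ShrinkerGuard:
    """Hypothetical guard letting only one caller run a shrinker scan at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.skipped = []  # comm names of callers that found a scan in progress

    def scan(self, comm, nr_to_scan, do_scan):
        # Non-blocking acquire: returns False immediately if a scan is running.
        if not self._lock.acquire(blocking=False):
            self.skipped.append(comm)
            print(f"extent shrinker already running, comm {comm} "
                  f"nr_to_scan {nr_to_scan}")
            return 0  # this caller freed nothing
        try:
            return do_scan(nr_to_scan)
        finally:
            self._lock.release()


guard = ShrinkerGuard()


def slow_scan(nr_to_scan):
    # Simulate a second task entering reclaim while this scan is still running.
    guard.scan("cc1plus", 2, lambda n: n)
    return nr_to_scan  # pretend every requested object was freed


freed = guard.scan("kswapd0", 128, slow_scan)
print(freed, guard.skipped)
```

The second caller is simulated from inside the first scan so the example stays single-threaded and deterministic. With real threads, the outcome depends on timing. Skipping instead of waiting keeps tasks from piling up behind one scan, at the cost of those tasks reclaiming nothing from this cache.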

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-06 17:37                                 ` Filipe Manana
@ 2024-07-07  9:41                                   ` Filipe Manana
  2024-07-07 10:15                                     ` Andrea Gelmini
  2024-07-07 11:35                                   ` Mikhail Gavrilov
  1 sibling, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-07  9:41 UTC
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sat, Jul 6, 2024 at 6:37 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> On Sat, Jul 6, 2024 at 1:07 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
> >
> > On Sat, Jul 6, 2024 at 02:11 Andrea Gelmini
> > <andrea.gelmini@gmail.com> wrote:
> > > For the moment it seems we have a winner!
> >
> > I confirm this, but I forgot to add this (a lot of these):
>
> Oh, those I added on purpose to confirm what the bpftrace logs
> suggested: concurrent calls into the shrinker.
>
>
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm firefox-bin nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm firefox-bin nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:06 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> > [sab lug  6 13:12:07 2024] BTRFS warning (device dm-0): extent
> > shrinker already running, comm cc1plus nr_to_scan 2
> >
> > Just for the record, compiling LibreOffice.
> >
> > Meanwhile, running restic (a full backup, to force reading
> > everything), no sluggishness at all.
>
> That's great!
>
> So I've been working on a proper approach following all those test
> results from you and Mikhail, and I would like to ask you both to try
> this branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test3_em_shrinker_6.10
>
> Again, this is based on 6.10-rc6 plus 3 fixes for this issue you're both having.
>
> Can you guys test that branch?

I just updated the branch with a last minute change to avoid an
unnecessary reschedule and re-lock, therefore helping reduce latency.
Thanks.

>
> Thank you a lot for all the time spent on this!

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07  9:41                                   ` Filipe Manana
@ 2024-07-07 10:15                                     ` Andrea Gelmini
  2024-07-07 10:28                                       ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-07 10:15 UTC
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sun, Jul 7, 2024 at 11:41 Filipe Manana
<fdmanana@kernel.org> wrote:
> > Again, this is based on 6.10-rc6 plus 3 fixes for this issue you're both having.
> >
> > Can you guys test that branch?

I used it yesterday and today. It seems fine. In a quick test I
sometimes see PSI memory spikes over 40, but (the important thing)
there's no effect on interactivity, so I didn't investigate further.

Just to be sure: I compiled the latest git (-rc6) with
test3_em_shrinker_6.10 and no other patches.

Anyway, just for the record:
kernel: test3
       fresh: 0:03:44 [ 241MiB/s]
       aged: 0:02:07 [ 801MiB/s]
       funny thing: subsequent runs of aged: no more than 0:03:22 [ 504MiB/s]
       (but, as I wrote, no problem with interactivity).

> I just updated the branch with a last minute change to avoid an
> unnecessary reschedule and re-lock, therefore helping reduce latency.

Ok, recompile now and test!

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07 10:15                                     ` Andrea Gelmini
@ 2024-07-07 10:28                                       ` Filipe Manana
  2024-07-07 11:15                                         ` Andrea Gelmini
  0 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-07 10:28 UTC
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sun, Jul 7, 2024 at 11:15 AM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Sun, Jul 7, 2024 at 11:41 Filipe Manana
> <fdmanana@kernel.org> wrote:
> > > Again, this is based on 6.10-rc6 plus 3 fixes for this issue you're both having.
> > >
> > > Can you guys test that branch?
>
> I used it yesterday and today. It seems fine. In a quick test I
> sometimes see PSI memory spikes over 40, but (the important thing)
> there's no effect on interactivity, so I didn't investigate further.

Awesome!

>
> Just to be sure: I compiled the latest git (-rc6) with
> test3_em_shrinker_6.10 and no other patches.

That's right, just that branch. It has all 3 necessary patches;
there's no need to apply any other patches on top of it.

>
> Anyway, just for the record:
> kernel: test3
>        fresh: 0:03:44 [ 241MiB/s]
>        aged: 0:02:07 [ 801MiB/s]
>        funny thing: subsequent runs of aged: no more than 0:03:22 [ 504MiB/s]
>        (but, as I wrote, no problem with interactivity).
>
> > I just updated the branch with a last minute change to avoid an
> > unnecessary reschedule and re-lock, therefore helping reduce latency.
>
> Ok, recompile now and test!

Thanks! Much appreciated!

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07 10:28                                       ` Filipe Manana
@ 2024-07-07 11:15                                         ` Andrea Gelmini
  2024-07-07 12:10                                           ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-07 11:15 UTC
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sun, Jul 7, 2024 at 12:28 Filipe Manana
<fdmanana@kernel.org> wrote:

> > Ok, recompile now and test!
>
> Thanks! Much appreciated!

So, usual benchmark:
    fresh: 0:03:16 [ 275MiB/s]
    aged: 0:02:30 [ 680MiB/s]

I'll let you know in a few days.
Would it make sense to add an option to disable the shrinker via /proc?

Thanks to you,
Gelma

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07 11:15                                         ` Andrea Gelmini
@ 2024-07-07 12:10                                           ` Filipe Manana
  0 siblings, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-07-07 12:10 UTC
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sun, Jul 7, 2024 at 12:15 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Sun, Jul 7, 2024 at 12:28 Filipe Manana
> <fdmanana@kernel.org> wrote:
>
> > > Ok, recompile now and test!
> >
> > Thanks! Much appreciated!
>
> So, usual benchmark:
>     fresh: 0:03:16 [ 275MiB/s]
>     aged: 0:02:30 [ 680MiB/s]
>
> I'll let you know in a few days.
> Would it make sense to add an option to disable the shrinker via /proc?

Maybe (through sysfs), but the shrinker is important to prevent OOM
situations, because otherwise an unlimited number of extent maps can
be created.
That can be triggered by a regular user, intentionally or not.
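The point about unbounded growth can be made concrete with a toy Python model: a cache whose entries (standing in for extent maps) only grow unless something evicts them. Linux shrinkers are built around a count callback (how many objects could be freed) and a scan callback that is asked to free up to `nr_to_scan` objects. The class and method names below only mimic that split. `ToyExtentMapCache` is hypothetical and is not how btrfs stores or evicts extent maps.

```python
from collections import OrderedDict


class ToyExtentMapCache:
    """Hypothetical stand-in for extent map trees; not btrfs code."""

    def __init__(self):
        self._maps = OrderedDict()  # file offset -> mapping, least recently used first

    def add(self, offset, mapping):
        # Nothing here limits the size: every new mapping is kept.
        self._maps[offset] = mapping
        self._maps.move_to_end(offset)

    def lookup(self, offset):
        mapping = self._maps.get(offset)
        if mapping is not None:
            self._maps.move_to_end(offset)  # recently used entries are evicted last
        return mapping

    def count_objects(self):
        # Mimics a shrinker count callback: how many objects could be freed.
        return len(self._maps)

    def scan_objects(self, nr_to_scan):
        # Mimics a shrinker scan callback: free up to nr_to_scan of the least
        # recently used entries and return how many were actually freed.
        freed = 0
        while self._maps and freed < nr_to_scan:
            self._maps.popitem(last=False)
            freed += 1
        return freed


cache = ToyExtentMapCache()
for offset in range(0, 1000 * 4096, 4096):  # e.g. reading a large, fragmented file
    cache.add(offset, f"extent@{offset}")
cache.lookup(0)                  # touch the oldest entry so it survives eviction
before = cache.count_objects()   # grows with the workload until something scans
freed = cache.scan_objects(900)  # memory pressure: reclaim asks for 900 objects
after = cache.count_objects()
print(before, freed, after)
```

In the kernel, shrinker callbacks are invoked from memory reclaim, both by kswapd and by tasks doing direct reclaim. That is consistent with the extent map shrinker's cost showing up in this thread as kswapd0 CPU time.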

>
> Thanks to you,
> Gelma

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-06 17:37                                 ` Filipe Manana
  2024-07-07  9:41                                   ` Filipe Manana
@ 2024-07-07 11:35                                   ` Mikhail Gavrilov
  2024-07-07 12:15                                     ` Filipe Manana
  1 sibling, 1 reply; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-07-07 11:35 UTC
  To: Filipe Manana
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

[-- Attachment #1: Type: text/plain, Size: 2624 bytes --]

On Sat, Jul 6, 2024 at 10:38 PM Filipe Manana <fdmanana@kernel.org> wrote:
> So I've been working on a proper approach following all those test
> results from you and Mikhail, and I would like to ask you both to try
> this branch:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test3_em_shrinker_6.10
>
> Again, this is based on 6.10-rc6 plus 3 fixes for this issue you're both having.
>
> Can you guys test that branch?
>
> Thank you a lot for all the time spent on this!

6.10.0-rc6-test1_em_shrinker_6.10
up  1:01
root         269 25.8  0.0      0     0 ?        R    10:59  15:47 [kswapd0]
up  2:00
root         269 25.5  0.0      0     0 ?        S    10:59  30:46 [kswapd0]
up  3:00
root         269 27.9  0.0      0     0 ?        S    10:59  50:18 [kswapd0]
up  4:00
root         269 27.8  0.0      0     0 ?        S    10:59  67:08 [kswapd0]
up  5:00
root         269 27.5  0.0      0     0 ?        S    10:59  83:01 [kswapd0]
up  6:00
root         269 27.5  0.0      0     0 ?        S    10:59  99:31 [kswapd0]
On the test1 branch, kswapd0 is as bad as in
https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt


6.10.0-rc6-test2_em_shrinker_6.10
up  1:00
root         269 11.7  0.0      0     0 ?        S    19:23   7:03 [kswapd0]
up  2:02
root         269 11.9  0.0      0     0 ?        S    19:23  14:38 [kswapd0]
up  3:00
root         269 11.9  0.0      0     0 ?        S    19:23  21:30 [kswapd0]
up  4:01
root         269 11.2  0.0      0     0 ?        S    19:23  27:15 [kswapd0]
up  5:00
root         269 11.4  0.0      0     0 ?        R    Jul06  34:25 [kswapd0]
up  6:00
root         269 13.9  0.0      0     0 ?        S    Jul06  50:14 [kswapd0]
On the test2 branch, kswapd0 is two times better.


6.10.0-rc6-test3_em_shrinker_6.10 (d22fedf5058d)
up  1:02
root         269 11.0  0.0      0     0 ?        S    09:54   6:50 [kswapd0]
up  2:00
root         269 10.7  0.0      0     0 ?        S    09:54  12:54 [kswapd0]
up  3:00
root         269 10.1  0.0      0     0 ?        S    09:54  18:18 [kswapd0]
up  4:00
root         269  9.5  0.0      0     0 ?        S    09:54  23:03 [kswapd0]
up  5:01
root         269 10.0  0.0      0     0 ?        S    09:54  30:24 [kswapd0]
up  6:00
root         269  9.9  0.0      0     0 ?        S    09:54  35:42 [kswapd0]
On the test3 branch, kswapd0 is three times better.

To catch up with the 6.9 branch, the timing needs to be 4 times better.
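The TIME column in this `ps aux` output is the cumulative CPU time in minutes:seconds (procps "bsdtime" format). So "99:31" after six hours of uptime means kswapd0 used roughly 99.5 minutes of CPU. The following Python sketch assumes that format and converts the six-hour readings into seconds and ratios, which puts numbers on the "two times" and "three times better" estimates. The helper name `ps_time_to_seconds` is made up for illustration.

```python
def ps_time_to_seconds(field):
    """Convert a ps TIME value to seconds.

    Accepts the "MMM:SS" form printed by `ps aux` (bsdtime) as well as
    "[DD-]HH:MM:SS" as printed by `ps -o time`.
    """
    days = 0
    if "-" in field:
        day_part, field = field.split("-", 1)
        days = int(day_part)
    seconds = 0
    for part in field.split(":"):
        seconds = seconds * 60 + int(part)
    return days * 86400 + seconds


# kswapd0 TIME after about 6 hours of uptime, copied from the runs above.
at_6h = {
    "test1": "99:31",
    "test2": "50:14",
    "test3 (d22fedf5058d)": "35:42",
}
uptime = 6 * 3600
baseline = ps_time_to_seconds(at_6h["test1"])
for name, field in at_6h.items():
    cpu = ps_time_to_seconds(field)
    print(f"{name:22} {cpu:5d} s  {cpu / uptime:5.1%} of one CPU  "
          f"test1/this = {baseline / cpu:.2f}")
```

This gives ratios of about 1.98 for test2 and 2.79 for test3 relative to test1. The per-CPU shares land close to the %CPU column; the small differences come from uptime not being exactly 6:00 when each sample was taken.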

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: 6.10.0-rc6-test1_em_shrinker_6.10.zip --]
[-- Type: application/zip, Size: 52872 bytes --]

[-- Attachment #3: 6.10.0-rc6-test2_em_shrinker_6.10.zip --]
[-- Type: application/zip, Size: 45286 bytes --]

[-- Attachment #4: 6.10.0-rc6-test3_em_shrinker_6.10.zip --]
[-- Type: application/zip, Size: 52841 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07 11:35                                   ` Mikhail Gavrilov
@ 2024-07-07 12:15                                     ` Filipe Manana
  2024-07-07 19:16                                       ` Mikhail Gavrilov
  0 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-07 12:15 UTC
  To: Mikhail Gavrilov
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sun, Jul 7, 2024 at 12:35 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Sat, Jul 6, 2024 at 10:38 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > So I've been working on a proper approach following all those test
> > results from you and Mikhail, and I would like to ask you both to try
> > this branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/log/?h=test3_em_shrinker_6.10
> >
> > Again, this is based on 6.10-rc6 plus 3 fixes for this issue you're both having.
> >
> > Can you guys test that branch?
> >
> > Thank you a lot for all the time spent on this!
>
> 6.10.0-rc6-test1_em_shrinker_6.10
> up  1:01
> root         269 25.8  0.0      0     0 ?        R    10:59  15:47 [kswapd0]
> up  2:00
> root         269 25.5  0.0      0     0 ?        S    10:59  30:46 [kswapd0]
> up  3:00
> root         269 27.9  0.0      0     0 ?        S    10:59  50:18 [kswapd0]
> up  4:00
> root         269 27.8  0.0      0     0 ?        S    10:59  67:08 [kswapd0]
> up  5:00
> root         269 27.5  0.0      0     0 ?        S    10:59  83:01 [kswapd0]
> up  6:00
> root         269 27.5  0.0      0     0 ?        S    10:59  99:31 [kswapd0]
> On the test1 branch, kswapd0 is as bad as in
> https://gist.githubusercontent.com/fdmanana/9cea16ca56594f8c7e20b67dc66c6c94/raw/557bd5f6b37b65d210218f8da8987b74bfe5e515/gistfile1.txt
>
>
> 6.10.0-rc6-test2_em_shrinker_6.10
> up  1:00
> root         269 11.7  0.0      0     0 ?        S    19:23   7:03 [kswapd0]
> up  2:02
> root         269 11.9  0.0      0     0 ?        S    19:23  14:38 [kswapd0]
> up  3:00
> root         269 11.9  0.0      0     0 ?        S    19:23  21:30 [kswapd0]
> up  4:01
> root         269 11.2  0.0      0     0 ?        S    19:23  27:15 [kswapd0]
> up  5:00
> root         269 11.4  0.0      0     0 ?        R    Jul06  34:25 [kswapd0]
> up  6:00
> root         269 13.9  0.0      0     0 ?        S    Jul06  50:14 [kswapd0]
> On the test2 branch, kswapd0 is two times better.
>
>
> 6.10.0-rc6-test3_em_shrinker_6.10 (d22fedf5058d)
> up  1:02
> root         269 11.0  0.0      0     0 ?        S    09:54   6:50 [kswapd0]
> up  2:00
> root         269 10.7  0.0      0     0 ?        S    09:54  12:54 [kswapd0]
> up  3:00
> root         269 10.1  0.0      0     0 ?        S    09:54  18:18 [kswapd0]
> up  4:00
> root         269  9.5  0.0      0     0 ?        S    09:54  23:03 [kswapd0]
> up  5:01
> root         269 10.0  0.0      0     0 ?        S    09:54  30:24 [kswapd0]
> up  6:00
> root         269  9.9  0.0      0     0 ?        S    09:54  35:42 [kswapd0]
> On the test3 branch, kswapd0 is three times better.

That's good. And is the DE unresponsiveness gone too?

I see you tested d22fedf5058d, but I updated the branch a couple of hours
ago; the top commit is now fa8b5dd7fa18.
Can you test the updated branch? It may help further in your case.

https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=test3_em_shrinker_6.10&id=fa8b5dd7fa18a4dc2ea6bdeaf5525b1af348f383

>
> To catch up with the 6.9 branch, the timing needs to be 4 times better.

Hopefully it will be much closer to that with the updated branch.
The upcoming changes for 6.11 should help there too, but regardless, we
can still optimize further on top of the 6.10-rc code.

Thanks Mikhail!

>
> --
> Best Regards,
> Mike Gavrilov.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07 12:15                                     ` Filipe Manana
@ 2024-07-07 19:16                                       ` Mikhail Gavrilov
  2024-07-08 14:15                                         ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-07-07 19:16 UTC
  To: Filipe Manana
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

[-- Attachment #1: Type: text/plain, Size: 1371 bytes --]

On Sunday, Jul 7, 2024, at 5:15 PM Filipe Manana <fdmanana@kernel.org> wrote:
> That's good. And is the DE unresponsiveness gone too?

Yes. I don’t know how to measure responsiveness objectively, but there
were no more freezes like those in my video.

> I see you tested d22fedf5058d, but I updated the branch a couple of hours
> ago; the top commit is now fa8b5dd7fa18.
> Can you test the updated branch? It may help further in your case.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=test3_em_shrinker_6.10&id=fa8b5dd7fa18a4dc2ea6bdeaf5525b1af348f383

6.10.0-rc6-test3_em_shrinker_6.10-fa8b5dd7fa18
up  1:00
root         269 13.1  0.0      0     0 ?        S    18:01   7:54 [kswapd0]
up  2:00
root         269  9.8  0.0      0     0 ?        S    18:01  11:46 [kswapd0]
up  3:00
root         269 10.8  0.0      0     0 ?        S    18:01  19:36 [kswapd0]
up  4:00
root         269 11.9  0.0      0     0 ?        R    18:01  28:37 [kswapd0]
up  5:00
root         269 13.1  0.0      0     0 ?        S    18:01  39:29 [kswapd0]
up  6:00
root         269 13.1  0.0      0     0 ?        S    Jul07  47:24 [kswapd0]

Based on the time measurements, it’s as if kswapd0 got worse (it is now
similar to the test2 branch), but subjectively, responsiveness got
better.

-- 
Best Regards,
Mike Gavrilov.

[-- Attachment #2: 6.10.0-rc6-test3_em_shrinker_6.10-fa8b5dd7fa18.zip --]
[-- Type: application/zip, Size: 54067 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-07 19:16                                       ` Mikhail Gavrilov
@ 2024-07-08 14:15                                         ` Filipe Manana
  2024-07-10  9:24                                           ` Mikhail Gavrilov
  0 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-08 14:15 UTC
  To: Mikhail Gavrilov
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Sun, Jul 7, 2024 at 8:16 PM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Sunday, Jul 7, 2024, at 5:15 PM Filipe Manana <fdmanana@kernel.org> wrote:
> > That's good. And is the DE unresponsiveness gone too?
>
> Yes. I don’t know how to measure responsiveness objectively, but there
> were no more freezes like those in my video.

That's good.

>
> > I see you tested d22fedf5058d, but I updated the branch a couple of hours
> > ago; the top commit is now fa8b5dd7fa18.
> > Can you test the updated branch? It may help further in your case.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux.git/commit/?h=test3_em_shrinker_6.10&id=fa8b5dd7fa18a4dc2ea6bdeaf5525b1af348f383
>
> 6.10.0-rc6-test3_em_shrinker_6.10-fa8b5dd7fa18
> up  1:00
> root         269 13.1  0.0      0     0 ?        S    18:01   7:54 [kswapd0]
> up  2:00
> root         269  9.8  0.0      0     0 ?        S    18:01  11:46 [kswapd0]
> up  3:00
> root         269 10.8  0.0      0     0 ?        S    18:01  19:36 [kswapd0]
> up  4:00
> root         269 11.9  0.0      0     0 ?        R    18:01  28:37 [kswapd0]
> up  5:00
> root         269 13.1  0.0      0     0 ?        S    18:01  39:29 [kswapd0]
> up  6:00
> root         269 13.1  0.0      0     0 ?        S    Jul07  47:24 [kswapd0]
>
> Based on the time measurements, it’s as if kswapd0 got worse (it is now
> similar to the test2 branch), but subjectively, responsiveness got
> better.

That's weird; I think you might be observing some variance.
I noticed that too in your reports for the test2 branch and the old
test3 branch: their code was nearly identical, yet you got a very
significant difference between them.

Thanks.

>
> --
> Best Regards,
> Mike Gavrilov.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-08 14:15                                         ` Filipe Manana
@ 2024-07-10  9:24                                           ` Mikhail Gavrilov
  2024-07-10 10:53                                             ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Mikhail Gavrilov @ 2024-07-10  9:24 UTC
  To: Filipe Manana
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Mon, Jul 8, 2024 at 7:16 PM Filipe Manana <fdmanana@kernel.org> wrote:
>
> That's weird; I think you might be observing some variance.
> I noticed that too in your reports for the test2 branch and the old
> test3 branch: their code was nearly identical, yet you got a very
> significant difference between them.
>
> Thanks.
>

up  1:00
root         269 10.2  0.0      0     0 ?        S    10:06   6:13 [kswapd0]
up  2:01
root         269  9.1  0.0      0     0 ?        S    10:06  11:07 [kswapd0]
up  3:00
root         269  8.4  0.0      0     0 ?        R    10:06  15:18 [kswapd0]
up  4:21
root         269 11.7  0.0      0     0 ?        S    10:06  30:33 [kswapd0]
up  5:01
root         269 11.7  0.0      0     0 ?        S    10:06  35:19 [kswapd0]
up  6:27
root         269 11.5  0.0      0     0 ?        S    10:06  44:39 [kswapd0]
up  7:00
root         269 11.2  0.0      0     0 ?        R    10:06  47:18 [kswapd0]

The measurement error can reach ±10 min.
Do you plan to get the fix merged before the 6.10 release?

-- 
Best Regards,
Mike Gavrilov.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-10  9:24                                           ` Mikhail Gavrilov
@ 2024-07-10 10:53                                             ` Filipe Manana
  2024-08-11  8:08                                               ` Jannik Glückert
  0 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-10 10:53 UTC
  To: Mikhail Gavrilov
  Cc: Andrea Gelmini, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Wed, Jul 10, 2024 at 10:24 AM Mikhail Gavrilov
<mikhail.v.gavrilov@gmail.com> wrote:
>
> On Mon, Jul 8, 2024 at 7:16 PM Filipe Manana <fdmanana@kernel.org> wrote:
> >
> > That's weird; I think you might be observing some variance.
> > I noticed that too in your reports for the test2 branch and the old
> > test3 branch: their code was nearly identical, yet you got a very
> > significant difference between them.
> >
> > Thanks.
> >
>
> up  1:00
> root         269 10.2  0.0      0     0 ?        S    10:06   6:13 [kswapd0]
> up  2:01
> root         269  9.1  0.0      0     0 ?        S    10:06  11:07 [kswapd0]
> up  3:00
> root         269  8.4  0.0      0     0 ?        R    10:06  15:18 [kswapd0]
> up  4:21
> root         269 11.7  0.0      0     0 ?        S    10:06  30:33 [kswapd0]
> up  5:01
> root         269 11.7  0.0      0     0 ?        S    10:06  35:19 [kswapd0]
> up  6:27
> root         269 11.5  0.0      0     0 ?        S    10:06  44:39 [kswapd0]
> up  7:00
> root         269 11.2  0.0      0     0 ?        R    10:06  47:18 [kswapd0]
>
> The measurement error can reach ±10 min.
> Do you plan to get the fix merged before the 6.10 release?

I've submitted a patchset intended to apply against 6.10 (see the
notes in the cover letter):

https://lore.kernel.org/linux-btrfs/cover.1720448663.git.fdmanana@suse.com/

But it's up to David, as the maintainer, to send it to Linus,
and I haven't heard from him yet.

I plan at least one more improvement for the shrinker, but I would
also like to know whether those patches go into 6.10 before it's
released, because they conflict with the for-next branch.

> --
> Best Regards,
> Mike Gavrilov.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-10 10:53                                             ` Filipe Manana
@ 2024-08-11  8:08                                               ` Jannik Glückert
  2024-08-11 15:33                                                 ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Jannik Glückert @ 2024-08-11  8:08 UTC
  To: fdmanana
  Cc: andrea.gelmini, dsterba, josef, linux-btrfs, linux-kernel,
	mikhail.v.gavrilov, regressions

[-- Attachment #1: Type: text/plain, Size: 794 bytes --]

Hello,

I am still encountering this issue on 6.10.3. As far as I can see, this
is the last post in the thread; if the discussion continued elsewhere,
please let me know.

My workload is a backup via restic; the system is otherwise idle.
This is on a Zen4 CPU with a very fast PCIe Gen4 NVMe drive, so perhaps
it was fixed for others because they had comparatively slow I/O or a
smaller workload?

I have attached the bpftrace run and a graph of the memory PSI. kswapd0
is at 100% during the critical sections. dmesg is empty.
Notable events were, e.g., 09:31-09:32 and 09:33-09:34, when the system
was completely unresponsive several times, for about 5 seconds at a time.
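Memory PSI graphs like the one attached (and the "PSI memory spike over 40" Andrea mentioned earlier in the thread) are based on the kernel's pressure stall information in /proc/pressure/memory. Its "some" and "full" lines report the percentage of time tasks were stalled on memory over 10, 60 and 300 second windows, plus a cumulative `total` in microseconds. How the attached graph was produced isn't stated. The following Python sketch is just one possible way to log the 10-second averages so that stalls like the 09:31-09:34 ones can be lined up with kswapd0 activity. `parse_psi` and `sample_memory_pressure` are hypothetical helpers, and the example input uses made-up numbers, not measurements from this thread.

```python
import time


def parse_psi(text):
    """Parse /proc/pressure/<resource> text into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, value = field.split("=", 1)
            # avg* are percentages; total is cumulative stall time in microseconds.
            values[key] = int(value) if key == "total" else float(value)
        result[kind] = values
    return result


def sample_memory_pressure(path="/proc/pressure/memory", interval=1.0, count=60):
    """Yield (timestamp, some_avg10, full_avg10) once per interval."""
    for _ in range(count):
        with open(path) as f:
            psi = parse_psi(f.read())
        yield time.strftime("%H:%M:%S"), psi["some"]["avg10"], psi["full"]["avg10"]
        time.sleep(interval)


# Illustrative input in the documented format; the numbers are made up.
example = (
    "some avg10=41.50 avg60=12.00 avg300=3.25 total=123456\n"
    "full avg10=30.00 avg60=9.50 avg300=2.00 total=98765\n"
)
psi = parse_psi(example)
print(psi["some"]["avg10"], psi["full"]["avg10"])
```

On a live system, `for ts, some, full in sample_memory_pressure(): print(ts, some, full)` would log one line per second. This requires a kernel built with CONFIG_PSI, and booting with `psi=1` if it was also built with CONFIG_PSI_DEFAULT_DISABLED.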

I also mentioned this on the #btrfs IRC channel, and other users are
still encountering this on 6.10.

Best
Jannik

[-- Attachment #2: psi_memory.jpg --]
[-- Type: image/jpeg, Size: 297648 bytes --]

[-- Attachment #3: bpftrace.log.gz --]
[-- Type: application/gzip, Size: 178863 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-11  8:08                                               ` Jannik Glückert
@ 2024-08-11 15:33                                                 ` Filipe Manana
  2024-08-14 21:24                                                   ` Jannik Glückert
  2024-08-15 22:21                                                   ` intelfx
  0 siblings, 2 replies; 56+ messages in thread
From: Filipe Manana @ 2024-08-11 15:33 UTC
  To: Jannik Glückert
  Cc: andrea.gelmini, dsterba, josef, linux-btrfs, linux-kernel,
	mikhail.v.gavrilov, regressions

On Sun, Aug 11, 2024 at 9:08 AM Jannik Glückert
<jannik.glueckert@gmail.com> wrote:
>
> Hello,
>
> I am still encountering this issue on 6.10.3. As far as I can see, this
> is the last post in the thread; if the discussion continued elsewhere,
> please let me know.
>
> My workload is a backup via restic, the system is idle otherwise.
> This is on a Zen4 CPU with a very fast PCIe Gen4 nvme, so perhaps it was
> fixed for others because they had comparatively slow IO or a smaller
> workload?
>
> I have attached the bpftrace run and a graph of the memory PSI. kswapd0
> is at 100% during the critical sections. dmesg is empty.
> Important events were, e.g., 09:31-09:32 and 09:33-09:34, when the system
> was completely unresponsive multiple times, for about 5 seconds at a time.
>
> I also mentioned this on the #btrfs IRC channel, and there are other
> users still encountering this on 6.10.

This came to my attention a couple days ago in a bugzilla report here:

https://bugzilla.kernel.org/show_bug.cgi?id=219121

There are also 2 other recent threads on the mailing list about it.

There's a fix there in the bugzilla, and I've just sent it to the mailing list.
In case you want to try it:

https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/

Thanks.



>
> Best
> Jannik

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-11 15:33                                                 ` Filipe Manana
@ 2024-08-14 21:24                                                   ` Jannik Glückert
  2024-08-15 22:21                                                   ` intelfx
  1 sibling, 0 replies; 56+ messages in thread
From: Jannik Glückert @ 2024-08-14 21:24 UTC (permalink / raw)
  To: Filipe Manana
  Cc: andrea.gelmini, dsterba, josef, linux-btrfs, linux-kernel,
	mikhail.v.gavrilov, regressions

On 8/11/24 17:33, Filipe Manana wrote:
> There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> In case you want to try it:
> 
> https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> 
> Thanks

Hi Filipe,

this patch mostly fixes the issue, but I still get a 1-2 second window 
of freezing every now and then. I also still see long periods of 100% 
kswapd0 usage, but without the periodic freezing.

Thanks
Jannik

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-11 15:33                                                 ` Filipe Manana
  2024-08-14 21:24                                                   ` Jannik Glückert
@ 2024-08-15 22:21                                                   ` intelfx
  2024-08-15 23:17                                                     ` intelfx
  1 sibling, 1 reply; 56+ messages in thread
From: intelfx @ 2024-08-15 22:21 UTC (permalink / raw)
  To: Filipe Manana, Jannik Glückert
  Cc: andrea.gelmini, dsterba, josef, linux-btrfs, linux-kernel,
	mikhail.v.gavrilov, regressions

[-- Attachment #1: Type: text/plain, Size: 1027 bytes --]

On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> <...>
> This came to my attention a couple days ago in a bugzilla report here:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=219121
> 
> There are also 2 other recent threads on the mailing list about it.
> 
> There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> In case you want to try it:
> 
> https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> 
> Thanks.

Hello,

I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
kernel threads is still happening on the latest 6.10 stable with all
quoted patches applied, making the system close to unusable (not to
mention excessive power usage which crosses the line well *into*
"unusable" for low-power systems such as laptops).

With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
cumulative CPU time of kswapd is already at 2 minutes.

Regards,
--
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-15 22:21                                                   ` intelfx
@ 2024-08-15 23:17                                                     ` intelfx
  2024-08-16  0:02                                                       ` David Sterba
                                                                         ` (2 more replies)
  0 siblings, 3 replies; 56+ messages in thread
From: intelfx @ 2024-08-15 23:17 UTC (permalink / raw)
  To: Filipe Manana, Jannik Glückert
  Cc: andrea.gelmini, dsterba, josef, linux-btrfs, linux-kernel,
	mikhail.v.gavrilov, regressions

[-- Attachment #1: Type: text/plain, Size: 1508 bytes --]

On 2024-08-16 at 00:21 +0200, intelfx@intelfx.name wrote:
> On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > <...>
> > This came to my attention a couple days ago in a bugzilla report here:
> > 
> > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > 
> > There are also 2 other recent threads on the mailing list about it.
> > 
> > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > In case you want to try it:
> > 
> > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > 
> > Thanks.
> 
> Hello,
> 
> I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> kernel threads is still happening on the latest 6.10 stable with all
> quoted patches applied, making the system close to unusable (not to
> mention excessive power usage which crosses the line well *into*
> "unusable" for low-power systems such as laptops).
> 
> With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> cumulative CPU time of kswapd is already at 2 minutes.

As a follow-up, after 1 hour of uptime on this system the total CPU
time of kswapd0 is exactly 30 minutes. So whatever theoretical OOM
issue the extent map shrinker is trying to solve, the solution
in its current form is clearly unacceptable.
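A minimal sketch of how the cumulative kswapd0 CPU time quoted above can be read and turned into an average CPU share. It assumes Linux procfs: utime and stime are fields 14 and 15 of /proc/<pid>/stat, counted in clock ticks (sysconf(_SC_CLK_TCK)). The function names are hypothetical, and the kswapd0 PID can be looked up with `pgrep kswapd0`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Return utime + stime (fields 14 and 15 of /proc/<pid>/stat, in clock
 * ticks) parsed from one stat line, or -1 if the line cannot be parsed.
 * Field 2 (comm) is in parentheses and may itself contain spaces or ')',
 * so parsing starts after the LAST ')'.
 */
static long long stat_cpu_ticks(const char *line)
{
    const char *p = strrchr(line, ')');
    unsigned long long utime, stime;

    if (p == NULL)
        return -1;
    /* Skip fields 3..13 (state .. cmajflt), then read utime and stime. */
    if (sscanf(p + 1,
               " %*c %*d %*d %*d %*d %*d %*u %*llu %*llu %*llu %*llu %llu %llu",
               &utime, &stime) != 2)
        return -1;
    return (long long)(utime + stime);
}

/* Read /proc/<pid>/stat and return the task's CPU time in seconds, or -1. */
static double proc_cpu_seconds(int pid)
{
    char path[64], line[1024];
    FILE *f;
    long long ticks;

    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    ticks = stat_cpu_ticks(line);
    if (ticks < 0)
        return -1;
    return (double)ticks / (double)sysconf(_SC_CLK_TCK);
}

/* Average share of one CPU consumed over a wall-clock interval. */
static double cpu_share(double cpu_seconds, double wall_seconds)
{
    assert(wall_seconds > 0);
    return cpu_seconds / wall_seconds;
}
```

With the figures from this report, cpu_share(30.0 * 60, 60.0 * 60) is 0.5, i.e. kswapd0 averaged half of one CPU over the first hour; the earlier figure of 2 minutes after 5 minutes of uptime gives 0.4.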

Can we please have it reverted on the basis of this severe regression,
until a better solution is found?

Thanks,
-- 
Ivan Shapovalov / intelfx /


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-15 23:17                                                     ` intelfx
@ 2024-08-16  0:02                                                       ` David Sterba
  2024-08-16  6:42                                                       ` Andrea Gelmini
  2024-08-16 10:58                                                       ` Filipe Manana
  2 siblings, 0 replies; 56+ messages in thread
From: David Sterba @ 2024-08-16  0:02 UTC (permalink / raw)
  To: intelfx
  Cc: Filipe Manana, Jannik Glückert, andrea.gelmini, dsterba,
	josef, linux-btrfs, linux-kernel, mikhail.v.gavrilov, regressions

On Fri, Aug 16, 2024 at 01:17:25AM +0200, intelfx@intelfx.name wrote:
> On 2024-08-16 at 00:21 +0200, intelfx@intelfx.name wrote:
> > On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > > <...>
> > > This came to my attention a couple days ago in a bugzilla report here:
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > > 
> > > There are also 2 other recent threads on the mailing list about it.
> > > 
> > > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > > In case you want to try it:
> > > 
> > > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > > 
> > > Thanks.
> > 
> > Hello,
> > 
> > I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> > kernel threads is still happening on the latest 6.10 stable with all
> > quoted patches applied, making the system close to unusable (not to
> > mention excessive power usage which crosses the line well *into*
> > "unusable" for low-power systems such as laptops).
> > 
> > With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> > cumulative CPU time of kswapd is already at 2 minutes.
> 
> As a follow-up, after 1 hour of uptime on this system the total CPU
> time of kswapd0 is exactly 30 minutes. So whatever theoretical OOM
> issue the extent map shrinker is trying to solve, the solution
> in its current form is clearly unacceptable.
> 
> Can we please have it reverted on the basis of this severe regression,
> until a better solution is found?

It's not just one patch, so a clean revert may not be possible. I'll see
if there's another way to either avoid depending on the shrinker to
free the data or apply a different workaround.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-15 23:17                                                     ` intelfx
  2024-08-16  0:02                                                       ` David Sterba
@ 2024-08-16  6:42                                                       ` Andrea Gelmini
  2024-08-16  6:47                                                         ` Ivan Shapovalov
  2024-08-16 10:58                                                       ` Filipe Manana
  2 siblings, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-08-16  6:42 UTC (permalink / raw)
  To: intelfx
  Cc: Filipe Manana, Jannik Glückert, dsterba, josef, linux-btrfs,
	linux-kernel, mikhail.v.gavrilov, regressions

On Fri, Aug 16, 2024 at 01:17 <intelfx@intelfx.name> wrote:
> Can we please have it reverted on the basis of this severe regression,
> until a better solution is found?

To disable the shrinker, I simply removed two items:

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index f05cce7c8b8d..4f958ba61e0e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2410,8 +2410,6 @@ static const struct super_operations btrfs_super_ops = {
 	.statfs         = btrfs_statfs,
 	.freeze_fs      = btrfs_freeze,
 	.unfreeze_fs    = btrfs_unfreeze,
-	.nr_cached_objects = btrfs_nr_cached_objects,
-	.free_cached_objects = btrfs_free_cached_objects,
 };
 
 static const struct file_operations btrfs_ctl_fops = {

This comes from my thread with Filipe on the same topic, which you can
find in the mailing list archive.
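Why removing those two entries is enough: as far as we understand fs/super.c, the per-superblock shrinker only counts filesystem-private objects when the optional nr_cached_objects hook is set, and it only calls free_cached_objects when that count is non-zero. The following is a simplified userspace model of that control flow, not the kernel code. All struct and function names are hypothetical stand-ins, and the real super_cache_count()/super_cache_scan() also deal with the dentry/inode LRUs, GFP flags and locking.

```c
#include <assert.h>
#include <stddef.h>

struct model_sb;

/* Hypothetical stand-in for the two optional super_operations hooks. */
struct model_super_ops {
    long (*nr_cached_objects)(struct model_sb *sb);
    long (*free_cached_objects)(struct model_sb *sb, long nr_to_scan);
};

struct model_sb {
    const struct model_super_ops *s_op;
    long dentries;    /* stand-in for the dentry LRU count */
    long inodes;      /* stand-in for the inode LRU count */
    long fs_objects;  /* stand-in for fs-private objects (extent maps) */
};

/* "Count" phase: fs-private objects are only counted if the hook exists. */
static long model_cache_count(struct model_sb *sb)
{
    long total = sb->dentries + sb->inodes;

    if (sb->s_op->nr_cached_objects)
        total += sb->s_op->nr_cached_objects(sb);
    return total;
}

/* "Scan" phase: the fs free hook is only called if objects were reported. */
static long model_cache_scan_fs(struct model_sb *sb, long nr_to_scan)
{
    long fs_objects = 0;

    if (sb->s_op->nr_cached_objects)
        fs_objects = sb->s_op->nr_cached_objects(sb);
    if (fs_objects == 0)
        return 0;
    assert(sb->s_op->free_cached_objects != NULL);
    return sb->s_op->free_cached_objects(sb, nr_to_scan);
}

/* Example filesystem hooks (hypothetical). */
static long demo_nr(struct model_sb *sb)
{
    return sb->fs_objects;
}

static long demo_free(struct model_sb *sb, long nr_to_scan)
{
    long n = nr_to_scan < sb->fs_objects ? nr_to_scan : sb->fs_objects;

    sb->fs_objects -= n;
    return n;
}

static const struct model_super_ops ops_with_shrinker = { demo_nr, demo_free };
/* Equivalent of deleting both initializers in the diff: hooks are NULL. */
static const struct model_super_ops ops_without_shrinker = { NULL, NULL };
```

With ops_without_shrinker the fs-private objects are never counted or freed, which matches the effect of the two deleted lines: the extent maps are simply left alone by the VFS shrinker.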

Ciao,
Gelma

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-16  6:42                                                       ` Andrea Gelmini
@ 2024-08-16  6:47                                                         ` Ivan Shapovalov
  2024-08-16  7:45                                                           ` Qu Wenruo
  0 siblings, 1 reply; 56+ messages in thread
From: Ivan Shapovalov @ 2024-08-16  6:47 UTC (permalink / raw)
  To: Andrea Gelmini; +Cc: linux-btrfs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1225 bytes --]

On 2024-08-16 at 08:42 +0200, Andrea Gelmini wrote:
> On Fri, Aug 16, 2024 at 01:17 <intelfx@intelfx.name> wrote:
> > Can we please have it reverted on the basis of this severe regression,
> > until a better solution is found?
> 
> To disable the shrinker I simply remove two items:
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f05cce7c8b8d..4f958ba61e0e 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2410,8 +2410,6 @@ static const struct super_operations btrfs_super_ops = {
>        .statfs         = btrfs_statfs,
>        .freeze_fs      = btrfs_freeze,
>        .unfreeze_fs    = btrfs_unfreeze,
> -   .nr_cached_objects = btrfs_nr_cached_objects,
> -   .free_cached_objects = btrfs_free_cached_objects,
> };
> 
> static const struct file_operations btrfs_ctl_fops = {
> 
> This is from my thread with Filipe about same topic you can find in
> the mailing list archive.

Yes, that's what I did locally so far, on those systems that I _can_
run custom kernels on. The others I had to downgrade to 6.9 for the
time being. So I do have a vested interest in this being resolved in
the mainline/stable tree :-)

-- 
Ivan Shapovalov / intelfx /

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-16  6:47                                                         ` Ivan Shapovalov
@ 2024-08-16  7:45                                                           ` Qu Wenruo
  0 siblings, 0 replies; 56+ messages in thread
From: Qu Wenruo @ 2024-08-16  7:45 UTC (permalink / raw)
  To: Ivan Shapovalov, Andrea Gelmini; +Cc: linux-btrfs, linux-kernel



On 2024/8/16 16:17, Ivan Shapovalov wrote:
> On 2024-08-16 at 08:42 +0200, Andrea Gelmini wrote:
>> On Fri, Aug 16, 2024 at 01:17 <intelfx@intelfx.name> wrote:
>>> Can we please have it reverted on the basis of this severe regression,
>>> until a better solution is found?
>>
>> To disable the shrinker I simply remove two items:
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index f05cce7c8b8d..4f958ba61e0e 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -2410,8 +2410,6 @@ static const struct super_operations btrfs_super_ops = {
>>         .statfs         = btrfs_statfs,
>>         .freeze_fs      = btrfs_freeze,
>>         .unfreeze_fs    = btrfs_unfreeze,
>> -   .nr_cached_objects = btrfs_nr_cached_objects,
>> -   .free_cached_objects = btrfs_free_cached_objects,
>> };
>>
>> static const struct file_operations btrfs_ctl_fops = {
>>
>> This is from my thread with Filipe about same topic you can find in
>> the mailing list archive.
>
> Yes, that's what I did locally so far, on those systems that I _can_
> run custom kernels on. The others I had to downgrade to 6.9 for the
> time being. So I do have a vested interest in this being resolved in
> the mainline/stable tree :-)
>

That's the most straightforward way to revert to the previous behavior.

Or you can try this patch, which is less obvious but should do the same
thing:
https://lore.kernel.org/linux-btrfs/09ca70ddac244d13780bd82866b8b708088362fb.1723770634.git.wqu@suse.com/T/#u

Meanwhile, after looking into how XFS triggers its reclaim, I believe we
should not even bother using those callbacks.

XFS handles the trigger by making sure there is only one reclaim work
item queued, and that work is always delayed by 18s by default.

So for btrfs, I believe it's better to do the reclaim in the cleaner thread.
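A single-threaded sketch of the pattern Qu describes: at most one reclaim run is queued at any time, and it only runs after a fixed delay. In the kernel, delayed work provides the "only one queued" part, since queue_delayed_work() returns false when the work item is already pending. Everything below is hypothetical: the names, the explicit millisecond clock (passed in so the logic can be checked), and the idea of calling reclaim_tick() from a periodic loop such as the cleaner thread. The 18s value is taken from the message, not verified against XFS.

```c
#include <assert.h>
#include <stdbool.h>

#define RECLAIM_DELAY_MS 18000L  /* delay quoted in the message above */

struct reclaim_ctl {
    bool pending;  /* true while a reclaim run is queued */
    long due_ms;   /* earliest time the queued run may start */
    long runs;     /* number of reclaim passes executed so far */
};

/*
 * Trigger reclaim (e.g. on memory pressure). Returns true only if this
 * call queued a new run; further triggers collapse into the pending one.
 */
static bool reclaim_request(struct reclaim_ctl *c, long now_ms)
{
    assert(now_ms >= 0);
    if (c->pending)
        return false;
    c->pending = true;
    c->due_ms = now_ms + RECLAIM_DELAY_MS;
    return true;
}

/* Called periodically by the worker; runs reclaim once it is due. */
static bool reclaim_tick(struct reclaim_ctl *c, long now_ms)
{
    if (!c->pending || now_ms < c->due_ms)
        return false;
    c->pending = false;  /* clear first so the pass may re-queue itself */
    c->runs++;
    /* ... free a bounded batch of cached objects here ... */
    return true;
}
```

The design point is that reclaim frequency is bounded by the delay rather than by how often memory pressure fires; a real implementation would also need the trigger and the worker to synchronize on the pending flag.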

I will craft a proper fix for you guys to test, and since Filipe is on
vacation, we may just disable the reclaim work for now.

Thanks,
Qu

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-15 23:17                                                     ` intelfx
  2024-08-16  0:02                                                       ` David Sterba
  2024-08-16  6:42                                                       ` Andrea Gelmini
@ 2024-08-16 10:58                                                       ` Filipe Manana
  2024-08-16 11:16                                                         ` Ivan Shapovalov
  2 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-08-16 10:58 UTC (permalink / raw)
  To: intelfx
  Cc: Jannik Glückert, andrea.gelmini, dsterba, josef, linux-btrfs,
	linux-kernel, mikhail.v.gavrilov, regressions

On Fri, Aug 16, 2024 at 12:17 AM <intelfx@intelfx.name> wrote:
>
> On 2024-08-16 at 00:21 +0200, intelfx@intelfx.name wrote:
> > On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > > <...>
> > > This came to my attention a couple days ago in a bugzilla report here:
> > >
> > > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > >
> > > There are also 2 other recent threads on the mailing list about it.
> > >
> > > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > > In case you want to try it:
> > >
> > > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > >
> > > Thanks.
> >
> > Hello,
> >
> > I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> > kernel threads is still happening on the latest 6.10 stable with all
> > quoted patches applied, making the system close to unusable (not to
> > mention excessive power usage which crosses the line well *into*
> > "unusable" for low-power systems such as laptops).
> >
> > With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> > cumulative CPU time of kswapd is already at 2 minutes.

Less than 24 hours before your message, there was a patch merged to
Linus' tree, which was not (and is not) yet in any stable release
(including 6.10.5 of course).
Have you tried that patch?

>
> As a follow-up, after 1 hour of uptime on this system the total CPU
> time of kswapd0 is exactly 30 minutes. So whatever theoretical OOM
> issue the extent map shrinker is trying to solve, the solution

It's not a theoretical problem.
It's a problem that any unprivileged user can trigger provided that
the amount of available disk space is much higher than total RAM,
which is by far the most common case.

The problem is explained in the commit changelog; there's a
reproducer, and it was even reported by a user:

https://lore.kernel.org/linux-btrfs/13f94633dcf04d29aaf1f0a43d42c55e@amazon.com/

This link was included in the changelog of the patch when submitted to
the list [1], but somehow it disappeared when it was merged to the git
repository.

Any user can effectively trigger a denial of service by creating an
unlimited number of extent maps that never get removed: it only needs
to keep a file descriptor open and keep doing writes, either with
direct IO, which is simpler, or even with buffered IO if the writes
create holes in the file (for example, repeatedly doing append writes
that start after the current EOF, creating a bunch of holes). Even if
the task doing that gets killed by the OOM killer, as long as there are
idle processes keeping the file open, the problem doesn't go away.
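The buffered-IO pattern described above can be sketched as a bounded loop that keeps one file descriptor open and starts every write some distance past the current EOF, leaving a hole before each data range. This only reproduces the write pattern, not the memory growth, which needs the loop to keep running for a long time with the file held open. The function name, path and sizes are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Append `iterations` chunks of `chunk` bytes, each starting `gap` bytes
 * past the current EOF, so every iteration leaves a hole followed by data.
 * Returns the final file size, or -1 on error.
 */
static off_t append_with_holes(const char *path, int iterations,
                               size_t chunk, size_t gap)
{
    char buf[4096];
    struct stat st;
    int fd;

    assert(iterations >= 0);
    if (chunk > sizeof(buf))
        return -1;
    memset(buf, 'x', chunk);
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;
    for (int i = 0; i < iterations; i++) {
        if (fstat(fd, &st) < 0)
            goto fail;
        /* Write `gap` bytes beyond EOF: the skipped range becomes a hole. */
        if (pwrite(fd, buf, chunk, st.st_size + (off_t)gap) != (ssize_t)chunk)
            goto fail;
    }
    if (fstat(fd, &st) < 0)
        goto fail;
    close(fd);
    return st.st_size;
fail:
    close(fd);
    return -1;
}
```

For example, append_with_holes("/tmp/holes_demo.bin", 8, 4096, 4096) should produce a 64 KiB file in which every other 4 KiB block is a hole, assuming the filesystem supports sparse files.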

[1] https://lore.kernel.org/linux-btrfs/1cb649870b6cad4411da7998735ab1141bb9f2f0.1712837044.git.fdmanana@suse.com/

> in its current form is clearly unacceptable.
>
> Can we please have it reverted on the basis of this severe regression,
> until a better solution is found?

Disabling the shrinker might be best for now. I'm on vacation and
can't write and test code, but I do plan to make it better and
solve any remaining issues.
There's already a patch from Qu that does that.

>
> Thanks,
> --
> Ivan Shapovalov / intelfx /
>

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-16 10:58                                                       ` Filipe Manana
@ 2024-08-16 11:16                                                         ` Ivan Shapovalov
  2024-09-26 13:45                                                           ` Filipe Manana
  0 siblings, 1 reply; 56+ messages in thread
From: Ivan Shapovalov @ 2024-08-16 11:16 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Jannik Glückert, andrea.gelmini, dsterba, josef, linux-btrfs,
	linux-kernel, mikhail.v.gavrilov, regressions

[-- Attachment #1: Type: text/plain, Size: 4004 bytes --]

On 2024-08-16 at 11:58 +0100, Filipe Manana wrote:
> On Fri, Aug 16, 2024 at 12:17 AM <intelfx@intelfx.name> wrote:
> > 
> > On 2024-08-16 at 00:21 +0200, intelfx@intelfx.name wrote:
> > > On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > > > <...>
> > > > This came to my attention a couple days ago in a bugzilla report here:
> > > > 
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > > > 
> > > > There are also 2 other recent threads on the mailing list about it.
> > > > 
> > > > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > > > In case you want to try it:
> > > > 
> > > > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > > > 
> > > > Thanks.
> > > 
> > > Hello,
> > > 
> > > I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> > > kernel threads is still happening on the latest 6.10 stable with all
> > > quoted patches applied, making the system close to unusable (not to
> > > mention excessive power usage which crosses the line well *into*
> > > "unusable" for low-power systems such as laptops).
> > > 
> > > With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> > > cumulative CPU time of kswapd is already at 2 minutes.
> 
> Less than 24 hours before your message, there was a patch merged to
> Linus' tree, which was not (and is not) yet in any stable release
> (including 6.10.5 of course).
> Have you tried that patch?

Yes, I did — as I said, I tried 6.10.5 with all combinations of patches
ever posted in this thread (skipping those that I was not able to
apply; it seems that there were a few mutually incompatible attempts to
improve the extent map shrinker, some of which have already gone into
the stable tree, thus making others inapplicable).

> > As a follow-up, after 1 hour of uptime on this system the total CPU
> > time of kswapd0 is exactly 30 minutes. So whatever theoretical OOM
> > issue the extent map shrinker is trying to solve, the solution
> 
> It's not a theoretical problem.
> It's a problem that any unprivileged user can trigger provided that
> the amount of available disk space is much higher than total RAM,
> which is by far the most common case.
> 
> The problem is explained in the commit changelog; there's a
> reproducer, and it was even reported by a user:
> 
> https://lore.kernel.org/linux-btrfs/13f94633dcf04d29aaf1f0a43d42c55e@amazon.com/
> 
> This link was included in the changelog of the patch when submitted to
> the list [1], but somehow it disappeared when it was merged to the git
> repository.
> 
> Any user can effectively trigger a denial of service by creating an
> unlimited number of extent maps that never get removed: it only needs
> to keep a file descriptor open and keep doing writes, either with
> direct IO, which is simpler, or even with buffered IO if the writes
> create holes in the file (for example, repeatedly doing append writes
> that start after the current EOF, creating a bunch of holes). Even if
> the task doing that gets killed by the OOM killer, as long as there are
> idle processes keeping the file open, the problem doesn't go away.

Sorry, I did not intend to sound dismissive — what I wanted to say was
that we fixed an edge case (and yes, I acknowledge that this edge case
could be a security problem) by instead pessimizing a common case.

-- 
Ivan Shapovalov / intelfx /

> [1] https://lore.kernel.org/linux-btrfs/1cb649870b6cad4411da7998735ab1141bb9f2f0.1712837044.git.fdmanana@suse.com/
> 
> > in its current form is clearly unacceptable.
> > 
> > Can we please have it reverted on the basis of this severe regression,
> > until a better solution is found?
> 
> Disabling the shrinker might be best for now. I'm on vacation and
> can't write and test code, but I do plan to make it better and
> solve any remaining issues.
> There's already a patch from Qu that does that.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-08-16 11:16                                                         ` Ivan Shapovalov
@ 2024-09-26 13:45                                                           ` Filipe Manana
  0 siblings, 0 replies; 56+ messages in thread
From: Filipe Manana @ 2024-09-26 13:45 UTC (permalink / raw)
  To: Ivan Shapovalov
  Cc: Jannik Glückert, andrea.gelmini, dsterba, josef, linux-btrfs,
	linux-kernel, mikhail.v.gavrilov, regressions

On Fri, Aug 16, 2024 at 12:16 PM Ivan Shapovalov <intelfx@intelfx.name> wrote:
>
> On 2024-08-16 at 11:58 +0100, Filipe Manana wrote:
> > On Fri, Aug 16, 2024 at 12:17 AM <intelfx@intelfx.name> wrote:
> > >
> > > On 2024-08-16 at 00:21 +0200, intelfx@intelfx.name wrote:
> > > > On 2024-08-11 at 16:33 +0100, Filipe Manana wrote:
> > > > > <...>
> > > > > This came to my attention a couple days ago in a bugzilla report here:
> > > > >
> > > > > https://bugzilla.kernel.org/show_bug.cgi?id=219121
> > > > >
> > > > > There are also 2 other recent threads on the mailing list about it.
> > > > >
> > > > > There's a fix there in the bugzilla, and I've just sent it to the mailing list.
> > > > > In case you want to try it:
> > > > >
> > > > > https://lore.kernel.org/linux-btrfs/d85d72b968a1f7b8538c581eeb8f5baa973dfc95.1723377230.git.fdmanana@suse.com/
> > > > >
> > > > > Thanks.
> > > >
> > > > Hello,
> > > >
> > > > I confirm that excessive "system" CPU usage by kswapd and btrfs-cleaner
> > > > kernel threads is still happening on the latest 6.10 stable with all
> > > > quoted patches applied, making the system close to unusable (not to
> > > > mention excessive power usage which crosses the line well *into*
> > > > "unusable" for low-power systems such as laptops).
> > > >
> > > > With just 5 minutes of uptime on a freshly booted 6.10.5 system, the
> > > > cumulative CPU time of kswapd is already at 2 minutes.
> >
> > Less than 24 hours before your message, there was a patch merged to
> > Linus' tree, which was not (and is not) yet in any stable release
> > (including 6.10.5 of course).
> > Have you tried that patch?
>
> Yes, I did — as I said, I tried 6.10.5 with all combinations of patches
> ever posted in this thread (skipping those that I was not able to
> apply; it seems that there were a few mutually incompatible attempts to
> improve the extent map shrinker, some of which have already gone into
> the stable tree, thus making others inapplicable).
>
> > > As a follow-up, after 1 hour of uptime on this system the total CPU
> > > time of kswapd0 is exactly 30 minutes. So whatever theoretical OOM
> > > issue the extent map shrinker is trying to solve, the solution
> >
> > It's not a theoretical problem.
> > It's a problem that any unprivileged user can trigger provided that
> > the amount of available disk space is much higher than total RAM,
> > which is by far the most common case.
> >
> > The problem is explained in the commit changelog; there's a
> > reproducer, and it was even reported by a user:
> >
> > https://lore.kernel.org/linux-btrfs/13f94633dcf04d29aaf1f0a43d42c55e@amazon.com/
> >
> > This link was included in the changelog of the patch when submitted to
> > the list [1], but somehow it disappeared when it was merged to the git
> > repository.
> >
> > Any user can effectively trigger a denial of service by creating an
> > unlimited number of extent maps that never get removed: it only needs
> > to keep a file descriptor open and keep doing writes, either with
> > direct IO, which is simpler, or even with buffered IO if the writes
> > create holes in the file (for example, repeatedly doing append writes
> > that start after the current EOF, creating a bunch of holes). Even if
> > the task doing that gets killed by the OOM killer, as long as there are
> > idle processes keeping the file open, the problem doesn't go away.
>
> Sorry, I did not intend to sound dismissive — what I wanted to say was
> that we fixed an edge case (and yes, I acknowledge that this edge case
> could be a security problem) by instead pessimizing a common case.

So I've recently sent out a patchset to update the shrinker and
re-enable it:

https://lore.kernel.org/linux-btrfs/cover.1727174151.git.fdmanana@suse.com/

It applies against the current for-next branch, and should apply
against a 6.11 release too, except for the last patch, because of a
rename inside one function (CONFIG_BTRFS_DEBUG to CONFIG_BTRFS_EXPERIMENTAL).
I can prepare a git branch based on a 6.11 release (or 6.10) if anyone
prefers that rather than manually picking patches and resolving
conflicts (or testing for-next which has many unrelated changes).

If any of you can test it and report back, it would be much appreciated.
Thanks.


>
> --
> Ivan Shapovalov / intelfx /
>
> > [1] https://lore.kernel.org/linux-btrfs/1cb649870b6cad4411da7998735ab1141bb9f2f0.1712837044.git.fdmanana@suse.com/
> >
> > > in its current form is clearly unacceptable.
> > >
> > > Can we please have it reverted on the basis of this severe regression,
> > > until a better solution is found?
> >
> > Disabling the shrinker might be best for now. I'm on vacation and
> > can't write and test code, but I do plan to make it better and
> > solve any remaining issues.
> > There's already a patch from Qu that does that.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04  9:48                 ` Filipe Manana
  2024-07-04  9:56                   ` Filipe Manana
@ 2024-07-04 11:18                   ` Andrea Gelmini
  2024-07-04 16:38                     ` Filipe Manana
  1 sibling, 1 reply; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-04 11:18 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef

On Thu, Jul 4, 2024 at 11:49 Filipe Manana
<fdmanana@kernel.org> wrote:
> I'll try that soon and see if I can reproduce.

I'm creating a qcow installation with everything, to replicate this.
Sorry, it takes time.

By the way, is it just me? (latest git master of btrfs-progs)
  btrfs-image /dev/blah-blah loop.dd
works perfectly, but
  btrfs-image -s /dev/blah-blah loop.dd
generates an image that cannot be mounted:
[Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): first mount of filesystem 496b800d-2f32-46bb-b8d0-03d6f71cf4b2
[Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): using crc32c (crc32c-intel) checksum algorithm
[Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): using free space tree
[Thu Jul  4 11:20:05 2024] BTRFS critical (device loop40): corrupt leaf: root=1 block=40297906176 slot=6 ino=6, name hash mismatch with key, have 0x00000000365ce506 expect 0x000000008dbfc2d2
[Thu Jul  4 11:20:05 2024] BTRFS error (device loop40): read time tree block corruption detected on logical 40297906176 mirror 1
[Thu Jul  4 11:20:05 2024] BTRFS critical (device loop40): corrupt leaf: root=1 block=40297906176 slot=6 ino=6, name hash mismatch with key, have 0x00000000365ce506 expect 0x000000008dbfc2d2
[Thu Jul  4 11:20:05 2024] BTRFS error (device loop40): read time tree block corruption detected on logical 40297906176 mirror 2
[Thu Jul  4 11:20:05 2024] BTRFS warning (device loop40): couldn't read tree root
[Thu Jul  4 11:20:05 2024] BTRFS error (device loop40): open_ctree failed

> In the meanwhile, just curious: are you using swapfiles on btrfs?

Never used them on Btrfs (I have a dedicated NVMe partition).

Same effect also with swap and THP disabled, btw.
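On the `btrfs-image -s` failure above: btrfs-image's own usage text says a single `-s` replaces file names with garbage, while passing it twice (`-ss`) asks for CRC collisions, so that each sanitized name still hashes to the value stored in its directory item key. The Python sketch below illustrates why such a collision is computable at all. It assumes the name hash is a raw CRC-32C (reflected polynomial 0x82F63B78, no final inversion) seeded with `~1`, and it uses a standard CRC-forging trick: the register update over four consecutive bytes can be inverted, so any prefix can be followed by four bytes that reach a chosen hash. This is not btrfs-progs' implementation; `forge_suffix` and `collide` are hypothetical helpers.

```python
# Minimal sketch: forge a same-length name with the same btrfs-style name hash.
# Assumption: name hash = raw CRC-32C (reflected poly 0x82F63B78) seeded with
# (u32)~1 and without final inversion. Helper names are hypothetical.

POLY = 0x82F63B78
SEED = 0xFFFFFFFE  # (u32)~1


def make_table() -> list:
    """Byte-at-a-time lookup table for the reflected CRC-32C polynomial."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        table.append(crc)
    return table


TABLE = make_table()
# The top byte of every table entry is distinct, so it identifies the index.
TOP_BYTE_TO_INDEX = {entry >> 24: i for i, entry in enumerate(TABLE)}


def crc32c_raw(crc: int, data: bytes) -> int:
    """Table-driven raw CRC-32C update (no pre- or post-inversion)."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc


def btrfs_name_hash(name: bytes) -> int:
    return crc32c_raw(SEED, name)


def forge_suffix(state: int, target: int) -> bytes:
    """Return 4 bytes that move the CRC register from `state` to `target`."""
    # Backward pass: recover which table entry each of the 4 steps must use.
    indices = []
    reg = target
    for _ in range(4):
        idx = TOP_BYTE_TO_INDEX[reg >> 24]
        indices.append(idx)
        reg = ((reg ^ TABLE[idx]) << 8) & 0xFFFFFFFF
    indices.reverse()
    # Forward pass: pick each byte so that (state ^ byte) selects that entry.
    out = bytearray()
    for idx in indices:
        out.append((state ^ idx) & 0xFF)
        state = (state >> 8) ^ TABLE[idx]
    return bytes(out)


def collide(original: bytes, prefix: bytes) -> bytes:
    """Hypothetical helper: prefix + 4 forged bytes, same length and hash as original."""
    if len(prefix) + 4 != len(original):
        raise ValueError("prefix must be exactly 4 bytes shorter than the name")
    state = crc32c_raw(SEED, prefix)
    return prefix + forge_suffix(state, btrfs_name_hash(original))


if __name__ == "__main__":
    name = b"report.txt"
    fake = collide(name, b"abcdef")
    print(fake, hex(btrfs_name_hash(name)), hex(btrfs_name_hash(fake)))
```

The four forged bytes are arbitrary and can be unprintable or contain `/` or NUL, neither of which is allowed in a file name, so a practical tool would have to retry with other prefixes or otherwise constrain its search.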

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 11:18                   ` Andrea Gelmini
@ 2024-07-04 16:38                     ` Filipe Manana
  2024-07-04 22:32                       ` Qu Wenruo
  0 siblings, 1 reply; 56+ messages in thread
From: Filipe Manana @ 2024-07-04 16:38 UTC
  To: Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef,
	Qu Wenruo

On Thu, Jul 4, 2024 at 12:19 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>
> On Thu, Jul 4, 2024 at 11:49 Filipe Manana
> <fdmanana@kernel.org> wrote:
> > I'll try that soon and see if I can reproduce.
>
> I'm creating a qcow installation with everything to replicate this.
> Sorry, it takes time.
>
> By the way, is it just me? (latest git master btrfs-progs)
>   btrfs-image /dev/blah-blah loop.dd
> works perfectly, but
>   btrfs-image -s /dev/blah-blah loop.dd
> generates an image that is impossible to mount:
> [Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): first mount of filesystem 496b800d-2f32-46bb-b8d0-03d6f71cf4b2
> [Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): using crc32c (crc32c-intel) checksum algorithm
> [Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): using free space tree
> [Thu Jul  4 11:20:05 2024] BTRFS critical (device loop40): corrupt leaf: root=1 block=40297906176 slot=6 ino=6, name hash mismatch with key,

Sorry, I have no idea about that. I don't use btrfs-image myself and I
don't think I have ever even looked at its source code.
CC'ing Qu, who might be interested in that.

I'll reply very soon to the emails about the performance issues
related to the shrinker; there are some interesting things to look for.

Thanks.

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 16:38                     ` Filipe Manana
@ 2024-07-04 22:32                       ` Qu Wenruo
  2024-07-05  6:18                         ` Andrea Gelmini
  0 siblings, 1 reply; 56+ messages in thread
From: Qu Wenruo @ 2024-07-04 22:32 UTC
  To: Filipe Manana, Andrea Gelmini
  Cc: Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef,
	Qu Wenruo



On 2024/7/5 02:08, Filipe Manana wrote:
> On Thu, Jul 4, 2024 at 12:19 PM Andrea Gelmini <andrea.gelmini@gmail.com> wrote:
>>
>> On Thu, Jul 4, 2024 at 11:49 Filipe Manana
>> <fdmanana@kernel.org> wrote:
>>> I'll try that soon and see if I can reproduce.
>>
>> I'm creating a qcow installation with everything to replicate this.
>> Sorry, it takes time.
>>
>> By the way, is it just me? (latest git master btrfs-progs)
>>   btrfs-image /dev/blah-blah loop.dd
>> works perfectly, but
>>   btrfs-image -s /dev/blah-blah loop.dd
>> generates an image that is impossible to mount:
>> [Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): first mount of filesystem 496b800d-2f32-46bb-b8d0-03d6f71cf4b2
>> [Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): using crc32c (crc32c-intel) checksum algorithm
>> [Thu Jul  4 11:20:05 2024] BTRFS info (device loop40): using free space tree
>> [Thu Jul  4 11:20:05 2024] BTRFS critical (device loop40): corrupt leaf: root=1 block=40297906176 slot=6 ino=6, name hash mismatch with key,
>
> Sorry, I have no idea about that. I don't use btrfs-image myself and I
> don't think I have ever even looked at its source code.
> CC'ing Qu, who might be interested in that.

That's the nature of the "-s" option, unfortunately: it replaces the
file names, but the name hashes stored in the keys stay unchanged.

The tree-checker has extra sanity checks to ensure the hash matches the name.

I think it's a little overkill for an image dump; I will fix it soon.

Thanks,
Qu
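The tree-checker rule Qu describes can be sketched as follows. This is a minimal Python illustration, not the kernel's tree-checker code: it assumes the name hash stored in a directory item key is a raw CRC-32C (reflected polynomial 0x82F63B78, no final inversion) seeded with `~1`, and the helpers `crc32c_raw`, `btrfs_name_hash` and `check_dir_item` are hypothetical. It shows why replacing a name with same-length garbage while leaving the key offset untouched, which is what a single `-s` does, makes the check fail at mount time.

```python
# Minimal sketch of a tree-checker style "name hash vs. key" check.
# Assumption: the hash is a raw CRC-32C (reflected poly 0x82F63B78) seeded
# with (u32)~1 and without final inversion; all names here are hypothetical.

POLY = 0x82F63B78
SEED = 0xFFFFFFFE  # (u32)~1


def crc32c_raw(crc: int, data: bytes) -> int:
    """Bitwise raw CRC-32C update; the caller supplies the seed."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc & 0xFFFFFFFF


def btrfs_name_hash(name: bytes) -> int:
    return crc32c_raw(SEED, name)


def check_dir_item(key_offset: int, name: bytes) -> None:
    """Reject an item whose stored name does not hash to its key offset."""
    have = btrfs_name_hash(name)
    if have != key_offset:
        # Wording follows the kernel log in this thread; treating "have" as
        # the recomputed hash and "expect" as the key offset is an assumption.
        raise ValueError(
            f"name hash mismatch with key, have 0x{have:016x} expect 0x{key_offset:016x}"
        )


if __name__ == "__main__":
    original = b"report.txt"
    key_offset = btrfs_name_hash(original)  # hash stored in the item key
    check_dir_item(key_offset, original)  # consistent item: no error
    sanitized = b"Xq3vLk9aPz"  # same-length garbage name, key left unchanged
    try:
        check_dir_item(key_offset, sanitized)
    except ValueError as err:
        print(err)
```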

* Re: 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory
  2024-07-04 22:32                       ` Qu Wenruo
@ 2024-07-05  6:18                         ` Andrea Gelmini
  0 siblings, 0 replies; 56+ messages in thread
From: Andrea Gelmini @ 2024-07-05  6:18 UTC
  To: Qu Wenruo
  Cc: Filipe Manana, Mikhail Gavrilov, Linux List Kernel Mailing,
	Linux regressions mailing list, Btrfs BTRFS, dsterba, josef,
	Qu Wenruo

On Fri, Jul 5, 2024 at 00:32 Qu Wenruo
<quwenruo.btrfs@gmx.com> wrote:
> I think it's a little overkill for an image dump; I will fix it soon.

Thanks a lot, Qu! You are always fast and smart!

end of thread, other threads:[~2024-09-26 13:45 UTC | newest]

Thread overview: 56+ messages
2024-06-25 20:56 6.10/regression/bisected - after f1d97e769152 I spotted increased execution time of the kswapd0 process and symptoms as if there is not enough memory Mikhail Gavrilov
2024-06-26 10:48 ` Filipe Manana
2024-06-26 14:16   ` Mikhail Gavrilov
2024-07-01  9:30     ` Filipe Manana
2024-07-02 14:13       ` Mikhail Gavrilov
2024-07-02 17:22         ` Filipe Manana
2024-07-02 19:46           ` Chris Murphy
2024-07-03 10:32             ` Filipe Manana
2024-07-03 10:31           ` Filipe Manana
2024-07-03 10:44             ` Filipe Manana
2024-07-03 21:07               ` Andrea Gelmini
2024-07-04  9:48                 ` Filipe Manana
2024-07-04  9:56                   ` Filipe Manana
2024-07-04 10:50                     ` Mikhail Gavrilov
2024-07-04 13:33                     ` Andrea Gelmini
2024-07-04 13:47                       ` Andrea Gelmini
2024-07-04 14:48                         ` Andrea Gelmini
2024-07-04 17:25                           ` Filipe Manana
2024-07-04 17:31                             ` Filipe Manana
2024-07-04 22:15                             ` Andrea Gelmini
2024-07-04 22:23                               ` Andrea Gelmini
2024-07-05 11:00                               ` Filipe Manana
2024-07-05  6:30                             ` Andrea Gelmini
2024-07-05 11:06                               ` Filipe Manana
2024-07-05 18:36                             ` Mikhail Gavrilov
2024-07-05 23:09                               ` Filipe Manana
2024-07-06  0:11                             ` Andrea Gelmini
2024-07-06 12:07                               ` Andrea Gelmini
2024-07-06 17:37                                 ` Filipe Manana
2024-07-07  9:41                                   ` Filipe Manana
2024-07-07 10:15                                     ` Andrea Gelmini
2024-07-07 10:28                                       ` Filipe Manana
2024-07-07 11:15                                         ` Andrea Gelmini
2024-07-07 12:10                                           ` Filipe Manana
2024-07-07 11:35                                   ` Mikhail Gavrilov
2024-07-07 12:15                                     ` Filipe Manana
2024-07-07 19:16                                       ` Mikhail Gavrilov
2024-07-08 14:15                                         ` Filipe Manana
2024-07-10  9:24                                           ` Mikhail Gavrilov
2024-07-10 10:53                                             ` Filipe Manana
2024-08-11  8:08                                               ` Jannik Glückert
2024-08-11 15:33                                                 ` Filipe Manana
2024-08-14 21:24                                                   ` Jannik Glückert
2024-08-15 22:21                                                   ` intelfx
2024-08-15 23:17                                                     ` intelfx
2024-08-16  0:02                                                       ` David Sterba
2024-08-16  6:42                                                       ` Andrea Gelmini
2024-08-16  6:47                                                         ` Ivan Shapovalov
2024-08-16  7:45                                                           ` Qu Wenruo
2024-08-16 10:58                                                       ` Filipe Manana
2024-08-16 11:16                                                         ` Ivan Shapovalov
2024-09-26 13:45                                                           ` Filipe Manana
2024-07-04 11:18                   ` Andrea Gelmini
2024-07-04 16:38                     ` Filipe Manana
2024-07-04 22:32                       ` Qu Wenruo
2024-07-05  6:18                         ` Andrea Gelmini
