* Re: Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels
2019-12-12 10:54 Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels Chen-Yu Tsai
@ 2019-12-12 11:19 ` Joerg Roedel
2019-12-12 11:22 ` Joerg Roedel
2019-12-12 11:19 ` Greg Kroah-Hartman
2019-12-13 18:57 ` Pavel Machek
2 siblings, 1 reply; 7+ messages in thread
From: Joerg Roedel @ 2019-12-12 11:19 UTC
To: Chen-Yu Tsai
Cc: Greg Kroah-Hartman, stable, Pavel Machek, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Andrew Morton
Hi,
On Thu, Dec 12, 2019 at 06:54:12PM +0800, Chen-Yu Tsai wrote:
> I'd like to report a very severe performance regression due to
>
> mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
Yes, that is a known problem, with a couple of reports already in the
past few months. I posted a fix that I thought was on its way
upstream, but apparently it's not:
https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/
Adding Andrew and the x86 maintainers to Cc.
Regards,
Joerg
>
> in v4.19.88. I believe this was included since v4.19.67. It is also
> in all the other LTS kernels, except 3.16.
>
> So today I switched an x86_64 production server from v5.1.21 to
> v4.19.88, because we kept hitting runaway kcompactd and kswapd.
> Plus there was a significant increase in memory usage compared to
> v5.1.5. I'm still bisecting that on another production server.
>
> The service we run is one of the largest forums in Taiwan [1].
> It is a terminal-based bulletin board system running over telnet,
> SSH or a custom WebSocket bridge. The service itself is the
> one-process-per-user type of design from the old days. This
> means a lot of forks when there are user spikes or reconnections.
>
> (Reconnections happen because a lot of people use mobile apps that
> wrap the service, but they get disconnected as soon as they are
> backgrounded.)
>
> With v4.19.88 we saw a lot of contention on pgd_lock in the process
> fork path with CONFIG_VMAP_STACK=y:
>
> Samples: 937K of event 'cycles:ppp', Event count (approx.): 499112453614
>   Children      Self  Command  Shared Object      Symbol
> +   31.15%     0.03%  mbbsd    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
> +   31.12%     0.02%  mbbsd    [kernel.kallsyms]  [k] do_syscall_64
> +   28.12%     0.42%  mbbsd    [kernel.kallsyms]  [k] do_raw_spin_lock
> -   27.70%    27.62%  mbbsd    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>    - 18.73% __libc_fork
>       - 18.33% entry_SYSCALL_64_after_hwframe
>            do_syscall_64
>          - _do_fork
>             - 18.33% copy_process.part.64
>                - 11.00% __vmalloc_node_range
>                   - 10.93% sync_global_pgds_l4
>                        do_raw_spin_lock
>                        queued_spin_lock_slowpath
>                - 7.27% mm_init.isra.59
>                     pgd_alloc
>                     do_raw_spin_lock
>                     queued_spin_lock_slowpath
>    - 8.68% 0x41fd89415541f689
>       - __libc_start_main
>          + 7.49% main
>          + 0.90% main
>
> This hit us pretty hard, with the service dropping below one-third
> of its original capacity.
>
> With CONFIG_VMAP_STACK=n, the fork code path skips this, but other
> vmalloc users are still affected. One other area is the tty layer.
> This also causes problems for us since there can be as many as 15k
> users over SSH, some coming and going. So we got a lot of hung sshd
> processes as well. Unfortunately I don't have any perf reports or
> kernel logs to go with.
>
> Now I understand that there is already a fix in -next:
>
> https://lore.kernel.org/patchwork/patch/1137341/
>
> However the code has changed a lot in mainline and I'm not sure how
> to backport this. For now I just reverted the commit by hand by
> removing the offending code. Seems to work OK, and based on the commit
> logs I guess it's safe to do so, as we're not running X86-32 or PTI.
>
>
> Regards
> ChenYu
>
> [1] https://en.wikipedia.org/wiki/PTT_Bulletin_Board_System
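For readers who want a rough number for the fork-path slowdown described in the report above, the following is a minimal sketch of a parallel fork-rate microbenchmark. It is not the reporter's workload or tooling: the function names (`fork_loop`, `parallel_fork_rate`) and the default worker counts are made up for illustration, and it only approximates many processes forking at once, which is the pattern under which contention on pgd_lock would be expected to show up.

```python
import os
import time


def fork_loop(n_forks: int) -> None:
    """Fork n_forks short-lived children, one at a time, reaping each."""
    for _ in range(n_forks):
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately, skipping Python cleanup handlers.
            os._exit(0)
        # Parent: reap the child so zombies do not accumulate.
        os.waitpid(pid, 0)


def parallel_fork_rate(workers: int = 4, forks_per_worker: int = 500) -> float:
    """Run fork_loop in several worker processes at once.

    Returns the aggregate number of forks per second (hypothetical metric,
    chosen for illustration only).
    """
    start = time.perf_counter()
    worker_pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Worker process: do the forking, then exit without returning
            # into the caller's code, even if fork_loop raised.
            try:
                fork_loop(forks_per_worker)
            finally:
                os._exit(0)
        worker_pids.append(pid)
    # Wait for every worker to finish before stopping the clock.
    for pid in worker_pids:
        os.waitpid(pid, 0)
    elapsed = time.perf_counter() - start
    return workers * forks_per_worker / elapsed


if __name__ == "__main__":
    print(f"{parallel_fork_rate():.0f} forks/s")
```

Running it on the same machine under two kernel builds (for example with and without the offending commit) and comparing the forks/s figure gives a coarse comparison; absolute numbers depend heavily on the hardware and on Python's own fork overhead.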
* Re: Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels
From: Joerg Roedel @ 2019-12-12 11:22 UTC
To: Chen-Yu Tsai
Cc: Greg Kroah-Hartman, stable, Pavel Machek, Thomas Gleixner,
Ingo Molnar, Borislav Petkov, Andrew Morton
On Thu, Dec 12, 2019 at 12:19:11PM +0100, Joerg Roedel wrote:
> Hi,
>
> On Thu, Dec 12, 2019 at 06:54:12PM +0800, Chen-Yu Tsai wrote:
> > I'd like to report a very severe performance regression due to
> >
> > mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
>
> Yes, that is a known problem, with a couple of reports already in the
> past few months. I posted a fix that I thought was on its way
> upstream, but apparently it's not:
>
> https://lore.kernel.org/lkml/20191009124418.8286-1-joro@8bytes.org/
Ah, I missed that it is in linux-next already. Sorry for the noise.
Joerg
* Re: Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels
From: Greg Kroah-Hartman @ 2019-12-12 11:19 UTC
To: Chen-Yu Tsai; +Cc: Joerg Roedel, stable, Pavel Machek
On Thu, Dec 12, 2019 at 06:54:12PM +0800, Chen-Yu Tsai wrote:
> Hi,
>
> I'd like to report a very severe performance regression due to
>
> mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
>
> in v4.19.88. I believe this was included since v4.19.67. It is also
> in all the other LTS kernels, except 3.16.
>
> So today I switched an x86_64 production server from v5.1.21 to
> v4.19.88, because we kept hitting runaway kcompactd and kswapd.
> Plus there was a significant increase in memory usage compared to
> v5.1.5. I'm still bisecting that on another production server.
>
> The service we run is one of the largest forums in Taiwan [1].
> It is a terminal-based bulletin board system running over telnet,
> SSH or a custom WebSocket bridge. The service itself is the
> one-process-per-user type of design from the old days. This
> means a lot of forks when there are user spikes or reconnections.
>
> (Reconnections happen because a lot of people use mobile apps that
> wrap the service, but they get disconnected as soon as they are
> backgrounded.)
>
> With v4.19.88 we saw a lot of contention on pgd_lock in the process
> fork path with CONFIG_VMAP_STACK=y:
>
> Samples: 937K of event 'cycles:ppp', Event count (approx.): 499112453614
>   Children      Self  Command  Shared Object      Symbol
> +   31.15%     0.03%  mbbsd    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
> +   31.12%     0.02%  mbbsd    [kernel.kallsyms]  [k] do_syscall_64
> +   28.12%     0.42%  mbbsd    [kernel.kallsyms]  [k] do_raw_spin_lock
> -   27.70%    27.62%  mbbsd    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
>    - 18.73% __libc_fork
>       - 18.33% entry_SYSCALL_64_after_hwframe
>            do_syscall_64
>          - _do_fork
>             - 18.33% copy_process.part.64
>                - 11.00% __vmalloc_node_range
>                   - 10.93% sync_global_pgds_l4
>                        do_raw_spin_lock
>                        queued_spin_lock_slowpath
>                - 7.27% mm_init.isra.59
>                     pgd_alloc
>                     do_raw_spin_lock
>                     queued_spin_lock_slowpath
>    - 8.68% 0x41fd89415541f689
>       - __libc_start_main
>          + 7.49% main
>          + 0.90% main
>
> This hit us pretty hard, with the service dropping below one-third
> of its original capacity.
>
> With CONFIG_VMAP_STACK=n, the fork code path skips this, but other
> vmalloc users are still affected. One other area is the tty layer.
> This also causes problems for us since there can be as many as 15k
> users over SSH, some coming and going. So we got a lot of hung sshd
> processes as well. Unfortunately I don't have any perf reports or
> kernel logs to go with.
>
> Now I understand that there is already a fix in -next:
>
> https://lore.kernel.org/patchwork/patch/1137341/
>
> However the code has changed a lot in mainline and I'm not sure how
> to backport this. For now I just reverted the commit by hand by
> removing the offending code. Seems to work OK, and based on the commit
> logs I guess it's safe to do so, as we're not running X86-32 or PTI.
The above commit should resolve the issue for you, can you try it out on
5.4? And any reason you have to stick with the old 4.19 kernel?
thanks,
greg k-h
* Re: Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels
From: Chen-Yu Tsai @ 2019-12-12 11:31 UTC
To: Greg Kroah-Hartman; +Cc: Chen-Yu Tsai, Joerg Roedel, stable, Pavel Machek
On Thu, Dec 12, 2019 at 7:19 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Dec 12, 2019 at 06:54:12PM +0800, Chen-Yu Tsai wrote:
> > Hi,
> >
> > I'd like to report a very severe performance regression due to
> >
> > mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
> >
> > in v4.19.88. I believe this was included since v4.19.67. It is also
> > in all the other LTS kernels, except 3.16.
> >
> > So today I switched an x86_64 production server from v5.1.21 to
> > v4.19.88, because we kept hitting runaway kcompactd and kswapd.
> > Plus there was a significant increase in memory usage compared to
> > v5.1.5. I'm still bisecting that on another production server.
> >
> > The service we run is one of the largest forums in Taiwan [1].
> > It is a terminal-based bulletin board system running over telnet,
> > SSH or a custom WebSocket bridge. The service itself is the
> > one-process-per-user type of design from the old days. This
> > means a lot of forks when there are user spikes or reconnections.
> >
> > (Reconnections happen because a lot of people use mobile apps that
> > wrap the service, but they get disconnected as soon as they are
> > backgrounded.)
> >
> > With v4.19.88 we saw a lot of contention on pgd_lock in the process
> > fork path with CONFIG_VMAP_STACK=y:
> >
> > Samples: 937K of event 'cycles:ppp', Event count (approx.): 499112453614
> >   Children      Self  Command  Shared Object      Symbol
> > +   31.15%     0.03%  mbbsd    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
> > +   31.12%     0.02%  mbbsd    [kernel.kallsyms]  [k] do_syscall_64
> > +   28.12%     0.42%  mbbsd    [kernel.kallsyms]  [k] do_raw_spin_lock
> > -   27.70%    27.62%  mbbsd    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
> >    - 18.73% __libc_fork
> >       - 18.33% entry_SYSCALL_64_after_hwframe
> >            do_syscall_64
> >          - _do_fork
> >             - 18.33% copy_process.part.64
> >                - 11.00% __vmalloc_node_range
> >                   - 10.93% sync_global_pgds_l4
> >                        do_raw_spin_lock
> >                        queued_spin_lock_slowpath
> >                - 7.27% mm_init.isra.59
> >                     pgd_alloc
> >                     do_raw_spin_lock
> >                     queued_spin_lock_slowpath
> >    - 8.68% 0x41fd89415541f689
> >       - __libc_start_main
> >          + 7.49% main
> >          + 0.90% main
> >
> > This hit us pretty hard, with the service dropping below one-third
> > of its original capacity.
> >
> > With CONFIG_VMAP_STACK=n, the fork code path skips this, but other
> > vmalloc users are still affected. One other area is the tty layer.
> > This also causes problems for us since there can be as many as 15k
> > users over SSH, some coming and going. So we got a lot of hung sshd
> > processes as well. Unfortunately I don't have any perf reports or
> > kernel logs to go with.
> >
> > Now I understand that there is already a fix in -next:
> >
> > https://lore.kernel.org/patchwork/patch/1137341/
> >
> > However the code has changed a lot in mainline and I'm not sure how
> > to backport this. For now I just reverted the commit by hand by
> > removing the offending code. Seems to work OK, and based on the commit
> > logs I guess it's safe to do so, as we're not running X86-32 or PTI.
>
> The above commit should resolve the issue for you, can you try it out on
> 5.4? And any reason you have to stick with the old 4.19 kernel?
We typically run new kernels on the other server (the one I'm currently
doing git bisect on) for a couple of weeks before running them on our main
server. That one doesn't see nearly as much load though. Also because
of the increased memory usage I was seeing in 5.1.21, I wasn't
particularly comfortable going directly to 5.4.
I suppose the reason for being overly cautious is that the server is a
pain to reboot. The service is monolithic, running on just the one server.
And any significant downtime _always_ hits the local newspapers. Combined
with the upcoming election, conspiracy theories start flying around. :(
Now that it looks stable, we probably won't be testing anything new until
mid-January.
ChenYu
* Re: Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels
From: Greg Kroah-Hartman @ 2019-12-12 12:19 UTC
To: Chen-Yu Tsai; +Cc: Joerg Roedel, stable, Pavel Machek
On Thu, Dec 12, 2019 at 07:31:54PM +0800, Chen-Yu Tsai wrote:
> On Thu, Dec 12, 2019 at 7:19 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Thu, Dec 12, 2019 at 06:54:12PM +0800, Chen-Yu Tsai wrote:
> > > Hi,
> > >
> > > I'd like to report a very severe performance regression due to
> > >
> > > mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
> > >
> > > in v4.19.88. I believe this was included since v4.19.67. It is also
> > > in all the other LTS kernels, except 3.16.
> > >
> > > So today I switched an x86_64 production server from v5.1.21 to
> > > v4.19.88, because we kept hitting runaway kcompactd and kswapd.
> > > Plus there was a significant increase in memory usage compared to
> > > v5.1.5. I'm still bisecting that on another production server.
> > >
> > > The service we run is one of the largest forums in Taiwan [1].
> > > It is a terminal-based bulletin board system running over telnet,
> > > SSH or a custom WebSocket bridge. The service itself is the
> > > one-process-per-user type of design from the old days. This
> > > means a lot of forks when there are user spikes or reconnections.
> > >
> > > (Reconnections happen because a lot of people use mobile apps that
> > > wrap the service, but they get disconnected as soon as they are
> > > backgrounded.)
> > >
> > > With v4.19.88 we saw a lot of contention on pgd_lock in the process
> > > fork path with CONFIG_VMAP_STACK=y:
> > >
> > > Samples: 937K of event 'cycles:ppp', Event count (approx.): 499112453614
> > >   Children      Self  Command  Shared Object      Symbol
> > > +   31.15%     0.03%  mbbsd    [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
> > > +   31.12%     0.02%  mbbsd    [kernel.kallsyms]  [k] do_syscall_64
> > > +   28.12%     0.42%  mbbsd    [kernel.kallsyms]  [k] do_raw_spin_lock
> > > -   27.70%    27.62%  mbbsd    [kernel.kallsyms]  [k] queued_spin_lock_slowpath
> > >    - 18.73% __libc_fork
> > >       - 18.33% entry_SYSCALL_64_after_hwframe
> > >            do_syscall_64
> > >          - _do_fork
> > >             - 18.33% copy_process.part.64
> > >                - 11.00% __vmalloc_node_range
> > >                   - 10.93% sync_global_pgds_l4
> > >                        do_raw_spin_lock
> > >                        queued_spin_lock_slowpath
> > >                - 7.27% mm_init.isra.59
> > >                     pgd_alloc
> > >                     do_raw_spin_lock
> > >                     queued_spin_lock_slowpath
> > >    - 8.68% 0x41fd89415541f689
> > >       - __libc_start_main
> > >          + 7.49% main
> > >          + 0.90% main
> > >
> > > This hit us pretty hard, with the service dropping below one-third
> > > of its original capacity.
> > >
> > > With CONFIG_VMAP_STACK=n, the fork code path skips this, but other
> > > vmalloc users are still affected. One other area is the tty layer.
> > > This also causes problems for us since there can be as many as 15k
> > > users over SSH, some coming and going. So we got a lot of hung sshd
> > > processes as well. Unfortunately I don't have any perf reports or
> > > kernel logs to go with.
> > >
> > > Now I understand that there is already a fix in -next:
> > >
> > > https://lore.kernel.org/patchwork/patch/1137341/
> > >
> > > However the code has changed a lot in mainline and I'm not sure how
> > > to backport this. For now I just reverted the commit by hand by
> > > removing the offending code. Seems to work OK, and based on the commit
> > > logs I guess it's safe to do so, as we're not running X86-32 or PTI.
> >
> > The above commit should resolve the issue for you, can you try it out on
> > 5.4? And any reason you have to stick with the old 4.19 kernel?
>
> We typically run new kernels on the other server (the one I'm currently
> doing git bisect on) for a couple of weeks before running them on our main
> server. That one doesn't see nearly as much load though. Also because
> of the increased memory usage I was seeing in 5.1.21, I wasn't
> particularly comfortable going directly to 5.4.
>
> I suppose the reason for being overly cautious is that the server is a
> pain to reboot. The service is monolithic, running on just the one server.
> And any significant downtime _always_ hits the local newspapers. Combined
> with the upcoming election, conspiracy theories start flying around. :(
> Now that it looks stable, we probably won't be testing anything new until
> mid-January.
Fair enough, good luck!
greg k-h
* Re: Regression from "mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()" in stable kernels
From: Pavel Machek @ 2019-12-13 18:57 UTC
To: Chen-Yu Tsai; +Cc: Joerg Roedel, Greg Kroah-Hartman, stable, Pavel Machek
Hi!
> I'd like to report a very severe performance regression due to
>
> mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy() in stable kernels
>
> in v4.19.88. I believe this was included since v4.19.67. It is also
> in all the other LTS kernels, except 3.16.
>
> So today I switched an x86_64 production server from v5.1.21 to
> v4.19.88, because we kept hitting runaway kcompactd and kswapd.
> Plus there was a significant increase in memory usage compared to
> v5.1.5. I'm still bisecting that on another production server.
>
> The service we run is one of the largest forums in Taiwan [1].
> It is a terminal-based bulletin board system running over telnet,
> SSH or a custom WebSocket bridge. The service itself is the
> one-process-per-user type of design from the old days. This
> means a lot of forks when there are user spikes or reconnections.
Sounds like fun :-).
I noticed that there's something vmalloc-related in 4.19.89,
Subject: [PATCH 4.19 210/243] x86/mm/32: Sync only to VMALLOC_END in
vmalloc_sync_all()
From: Joerg Roedel <jroedel@suse.de>
commit 9a62d20027da3164a22244d9f022c0c987261687 upstream.
But looking at the changelog again, it may not solve the performance
problem.
Best regards,
Pavel
--
DENX Software Engineering GmbH, Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany