[PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Baoquan He <bhe@redhat.com>
To: kexec@lists.infradead.org
Subject: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Date: Thu, 7 Apr 2022 10:32:07 +0800	[thread overview]
Message-ID: <Yk5NJ6CwNcf6yIfe@MiWiFi-R3L-srv> (raw)
In-Reply-To: <Yk31Bs06E1nFTelX@relinquished.localdomain>

On 04/06/22 at 01:16pm, Omar Sandoval wrote:
> On Wed, Apr 06, 2022 at 05:59:53PM +0800, Baoquan He wrote:
> > On 04/06/22 at 11:13am, Uladzislau Rezki wrote:
> > > > On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> > > > > A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> > > > > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> > > > > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> > > > > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> > > > > areas itself.
> > > > 
> > > > This fixes the bug and the interface also is better than what we had
> > > > before.  But a vmap/iounmap_eager would seem even better.  But hey,
> > > > right now it has one caller in always built ?n x86 arch code, so maybe
> > > > it isn't worth spending more effort on this.
> > > >
> > > IMHO, it just makes sense to remove it. The set_iounmap_nonlazy() was
> > > added in 2010 year:
> > > 
> > > <snip>
> > > commit 3ee48b6af49cf534ca2f481ecc484b156a41451d
> > > Author: Cliff Wickman <cpw@sgi.com>
> > > Date:   Thu Sep 16 11:44:02 2010 -0500
> > > 
> > >     mm, x86: Saving vmcore with non-lazy freeing of vmas
> > > 
> > >     During the reading of /proc/vmcore the kernel is doing
> > >     ioremap()/iounmap() repeatedly. And the buildup of un-flushed
> > >     vm_area_struct's is causing a great deal of overhead. (rb_next()
> > >     is chewing up most of that time).
> > > 
> > >     This solution is to provide function set_iounmap_nonlazy(). It
> > >     causes a subsequent call to iounmap() to immediately purge the
> > >     vma area (with try_purge_vmap_area_lazy()).
> > > 
> > >     With this patch we have seen the time for writing a 250MB
> > >     compressed dump drop from 71 seconds to 44 seconds.
> > > 
> > >     Signed-off-by: Cliff Wickman <cpw@sgi.com>
> > >     Cc: Andrew Morton <akpm@linux-foundation.org>
> > >     Cc: kexec at lists.infradead.org
> > >     Cc: <stable@kernel.org>
> > >     LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
> > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > <snip>
> > > 
> > > and the reason was the "slow vmap" code, i.e. due to poor performance
> > > they decided to drop the lazily ASAP. Now we have absolutely different
> > > picture when it comes to performance and the vmalloc/vmap code.
> > 
> > I would vote for the current code change, removing it. As pointed out by
> > Christoph, it's only used by x86, may not be so worth to introduce a new
> > interface.
> 
> I did a quick benchmark to see if this optimization is still needed.
> This is on a system with 32GB RAM. I timed
> `dd if=/proc/vmcore of=/dev/null` with 4k and 1M block sizes on 5.17,
> 5.18 with this fix, and 5.18 with the non-lazy cleanup removed entirely.
> It looks like Uladzislau has a point, and this "optimization" actually
> slows things down now:
> 
>   |5.17  |5.18+fix|5.18+removal
> 4k|40.86s|  40.09s|      26.73s
> 1M|24.47s|  23.98s|      21.84s
> 
> I'll send a v2 which removes set_iounmap_nonlazy() entirely.

Hi Omar,

Thanks for the effort on posting patch to fix this and further benchmark
testing.

The removing I said means what you are doing in v1. While from your
testing result, seems removing set_iounmap_nonlazy() directly is better.
I agree that the old optimization was made too long ago, should be not
needed any more. E.g the added purge_vmap_area_root tree will speed up
the searching, adding and removing of the purged vmap_area.

I am wondering if this is a real issue you met, or you just found it by
code inspecting. If it breaks vmcore dumping, we should add 'Fixes' tag
and cc stable.

I am wondering how your vmcore dumping is handled. Asking this because we
usually use makedumpfile utility to filter and dump those kernel data, the
unneeded data, e.g zero-ed pages, unused pages, are all filtered out. While
using makedumpfile, we use mmap which is 4M at one time by default, then
process the content. So the copy_oldmem_page() may only be called during
elfcorehdr and notes reading.

Thanks
Baoquan

WARNING: multiple messages have this Message-ID (diff)

From: Baoquan He <bhe@redhat.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Christoph Hellwig <hch@lst.de>,
	linux-mm@kvack.org, kexec@lists.infradead.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Cliff Wickman <cpw@sgi.com>,
	x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
Date: Thu, 7 Apr 2022 10:32:07 +0800	[thread overview]
Message-ID: <Yk5NJ6CwNcf6yIfe@MiWiFi-R3L-srv> (raw)
In-Reply-To: <Yk31Bs06E1nFTelX@relinquished.localdomain>

On 04/06/22 at 01:16pm, Omar Sandoval wrote:
> On Wed, Apr 06, 2022 at 05:59:53PM +0800, Baoquan He wrote:
> > On 04/06/22 at 11:13am, Uladzislau Rezki wrote:
> > > > On Tue, Apr 05, 2022 at 12:40:31PM -0700, Omar Sandoval wrote:
> > > > > A simple way to "fix" this would be to make set_iounmap_nonlazy() set
> > > > > vmap_lazy_nr to lazy_max_pages() instead of lazy_max_pages() + 1. But, I
> > > > > think it'd be better to get rid of this hack of clobbering vmap_lazy_nr.
> > > > > Instead, this fix makes __copy_oldmem_page() explicitly drain the vmap
> > > > > areas itself.
> > > > 
> > > > This fixes the bug and the interface also is better than what we had
> > > > before.  But a vmap/iounmap_eager would seem even better.  But hey,
> > > > right now it has one caller in always built іn x86 arch code, so maybe
> > > > it isn't worth spending more effort on this.
> > > >
> > > IMHO, it just makes sense to remove it. The set_iounmap_nonlazy() was
> > > added in 2010 year:
> > > 
> > > <snip>
> > > commit 3ee48b6af49cf534ca2f481ecc484b156a41451d
> > > Author: Cliff Wickman <cpw@sgi.com>
> > > Date:   Thu Sep 16 11:44:02 2010 -0500
> > > 
> > >     mm, x86: Saving vmcore with non-lazy freeing of vmas
> > > 
> > >     During the reading of /proc/vmcore the kernel is doing
> > >     ioremap()/iounmap() repeatedly. And the buildup of un-flushed
> > >     vm_area_struct's is causing a great deal of overhead. (rb_next()
> > >     is chewing up most of that time).
> > > 
> > >     This solution is to provide function set_iounmap_nonlazy(). It
> > >     causes a subsequent call to iounmap() to immediately purge the
> > >     vma area (with try_purge_vmap_area_lazy()).
> > > 
> > >     With this patch we have seen the time for writing a 250MB
> > >     compressed dump drop from 71 seconds to 44 seconds.
> > > 
> > >     Signed-off-by: Cliff Wickman <cpw@sgi.com>
> > >     Cc: Andrew Morton <akpm@linux-foundation.org>
> > >     Cc: kexec@lists.infradead.org
> > >     Cc: <stable@kernel.org>
> > >     LKML-Reference: <E1OwHZ4-0005WK-Tw@eag09.americas.sgi.com>
> > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > <snip>
> > > 
> > > and the reason was the "slow vmap" code, i.e. due to poor performance
> > > they decided to drop the lazily ASAP. Now we have absolutely different
> > > picture when it comes to performance and the vmalloc/vmap code.
> > 
> > I would vote for the current code change, removing it. As pointed out by
> > Christoph, it's only used by x86, may not be so worth to introduce a new
> > interface.
> 
> I did a quick benchmark to see if this optimization is still needed.
> This is on a system with 32GB RAM. I timed
> `dd if=/proc/vmcore of=/dev/null` with 4k and 1M block sizes on 5.17,
> 5.18 with this fix, and 5.18 with the non-lazy cleanup removed entirely.
> It looks like Uladzislau has a point, and this "optimization" actually
> slows things down now:
> 
>   |5.17  |5.18+fix|5.18+removal
> 4k|40.86s|  40.09s|      26.73s
> 1M|24.47s|  23.98s|      21.84s
> 
> I'll send a v2 which removes set_iounmap_nonlazy() entirely.

Hi Omar,

Thanks for the effort on posting patch to fix this and further benchmark
testing.

The removing I said means what you are doing in v1. While from your
testing result, seems removing set_iounmap_nonlazy() directly is better.
I agree that the old optimization was made too long ago, should be not
needed any more. E.g the added purge_vmap_area_root tree will speed up
the searching, adding and removing of the purged vmap_area.

I am wondering if this is a real issue you met, or you just found it by
code inspecting. If it breaks vmcore dumping, we should add 'Fixes' tag
and cc stable.

I am wondering how your vmcore dumping is handled. Asking this because we
usually use makedumpfile utility to filter and dump those kernel data, the
unneeded data, e.g zero-ed pages, unused pages, are all filtered out. While
using makedumpfile, we use mmap which is 4M at one time by default, then
process the content. So the copy_oldmem_page() may only be called during
elfcorehdr and notes reading.

Thanks
Baoquan

next prev parent reply	other threads:[~2022-04-07  2:32 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-05 19:40 [PATCH] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore Omar Sandoval
2022-04-05 19:40 ` Omar Sandoval
2022-04-05 20:28 ` Uladzislau Rezki
2022-04-05 20:28   ` Uladzislau Rezki
2022-04-06  4:42 ` Christoph Hellwig
2022-04-06  4:42   ` Christoph Hellwig
2022-04-06  9:13   ` Uladzislau Rezki
2022-04-06  9:13     ` Uladzislau Rezki
2022-04-06  9:59     ` Baoquan He
2022-04-06  9:59       ` Baoquan He
2022-04-06 20:16       ` Omar Sandoval
2022-04-06 20:16         ` Omar Sandoval
2022-04-07  2:32         ` Baoquan He [this message]
2022-04-07  2:32           ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yk5NJ6CwNcf6yIfe@MiWiFi-R3L-srv \
    --to=bhe@redhat.com \
    --cc=kexec@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.