Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE


From: SeongJae Park <sj@kernel.org>
To: Shakeel Butt <shakeel.butt@linux.dev>
Cc: SeongJae Park <sj@kernel.org>,
	"Liam R. Howlett" <howlett@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	kernel-team@meta.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [RFC PATCH 00/16] mm/madvise: batch tlb flushes for MADV_DONTNEED and MADV_FREE
Date: Wed,  5 Mar 2025 14:58:02 -0800	[thread overview]
Message-ID: <20250305225803.60171-1-sj@kernel.org> (raw)
In-Reply-To: <ro2wtggwxbmwk6lhvcixwrefo44x7ggeumevv7lyupvudwxjsg@onh2e46eqzcy>

On Wed, 5 Mar 2025 12:22:25 -0800 Shakeel Butt <shakeel.butt@linux.dev> wrote:

> On Wed, Mar 05, 2025 at 10:15:55AM -0800, SeongJae Park wrote:
> > For MADV_DONTNEED[_LOCKED] or MADV_FREE madvise requests, tlb flushes
> > can happen for each vma of the given address ranges.  Because such tlb
> > flushes are for address ranges of the same process, doing those in a batch
> > is more efficient while still being safe.  Modify madvise() and
> > process_madvise() entry level code path to do such batched tlb flushes,
> > while the internal unmap logics do only gathering of the tlb entries to
> > flush.
> > 
> > In more detail, modify the entry functions to initialize an mmu_gather
> > object and pass it to the internal logics.  Also modify the internal
> > logics to do only gathering of the tlb entries to flush into the
> > received mmu_gather object.  After all internal function calls are done,
> > the entry functions finish the mmu_gather object to flush the gathered
> > tlb entries in one batch.
> > 
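
To illustrate the gather-then-flush idea described above, here is a toy
userspace model.  It is not kernel code: struct toy_gather and every
toy_*() function are hypothetical names made up for this sketch.  In the
kernel, the real counterparts of init and finish would be
tlb_gather_mmu() and tlb_finish_mmu().  The model only counts how many
flushes each approach issues for the same number of ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model, NOT kernel code.  All names are hypothetical. */
struct toy_gather {
	size_t nr_pending;	/* ranges gathered but not yet flushed */
	size_t nr_flushes;	/* number of (expensive) flushes issued */
};

static void toy_gather_init(struct toy_gather *tlb)
{
	assert(tlb != NULL);
	tlb->nr_pending = 0;
	tlb->nr_flushes = 0;
}

/* Internal logic: only record that one more range needs flushing. */
static void toy_gather_add(struct toy_gather *tlb)
{
	tlb->nr_pending++;
}

/* Flush everything gathered so far with a single flush. */
static void toy_gather_finish(struct toy_gather *tlb)
{
	if (tlb->nr_pending) {
		tlb->nr_flushes++;
		tlb->nr_pending = 0;
	}
}

/* Unbatched: gather and flush once per range.  Returns the flush count. */
static size_t toy_unbatched(size_t nr_ranges)
{
	struct toy_gather tlb;
	size_t i;

	toy_gather_init(&tlb);
	for (i = 0; i < nr_ranges; i++) {
		toy_gather_add(&tlb);
		toy_gather_finish(&tlb);
	}
	return tlb.nr_flushes;
}

/* Batched: gather for every range, flush once at the entry level. */
static size_t toy_batched(size_t nr_ranges)
{
	struct toy_gather tlb;
	size_t i;

	toy_gather_init(&tlb);
	for (i = 0; i < nr_ranges; i++)
		toy_gather_add(&tlb);
	toy_gather_finish(&tlb);
	return tlb.nr_flushes;
}
```

In this model, 1024 ranges cost 1024 flushes when unbatched and one
flush when batched.  The model ignores everything else a real
mmu_gather tracks.
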
> > Patches Sequence
> > ================
> > 
> > First four patches are minor cleanups of madvise.c for readability.
> > 
> > Following four patches (patches 5-8) define a new data structure for
> > managing information that is required for batched tlb flushing
> > (mmu_gather and behavior), and update code paths for
> > MADV_DONTNEED[_LOCKED] and MADV_FREE handling internal logics to receive
> > it.
> > 
> > Three patches (patches 9-11) for making internal MADV_DONTNEED[_LOCKED]
> > and MADV_FREE handling logic ready for batched tlb flushing follow. 
> 
> I think you forgot to complete the above sentence or the 'follow' at the
> end seems weird.

Thank you for catching this.  I just wanted to say these three patches come
after the previous ones.  I will wordsmith this part in the next version.

> 
> > The
> > patches keep the support of unbatched tlb flushes use case, for
> > fine-grained and safe transitions.
> > 
> > Next three patches (patches 12-14) update madvise() and
> > process_madvise() code to do the batched tlb flushes utilizing the
> > changes introduced by the previous patches.
> > 
> > Final two patches (patches 15-16) clean up the internal logics'
> > unbatched tlb flushes use case support code, which is no longer used.
> > 
> > Test Results
> > ============
> > 
> > I measured the time to apply MADV_DONTNEED advice to 256 MiB memory
> > using multiple process_madvise() calls.  I apply the advice in 4 KiB
> > sized regions granularity, but with varying batch size (vlen) from 1 to
> > 1024.  The source code for the measurement is available at GitHub[1].
> > 
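
The batching scheme described above (4 KiB regions, vlen regions per
process_madvise() call) can be sketched as below.  This is not the
measurement program referenced as [1].  The helper names nr_calls() and
fill_batch() and the REGION_SZ macro are assumptions made for this
illustration.  The sketch only builds the iovec array for each call and
does not issue the system call itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define REGION_SZ	4096UL	/* 4 KiB regions, as in the description */

/* Number of process_madvise() calls needed to cover buf_sz bytes. */
static size_t nr_calls(size_t buf_sz, size_t vlen)
{
	size_t nr_regions = buf_sz / REGION_SZ;

	assert(vlen > 0);
	return (nr_regions + vlen - 1) / vlen;
}

/*
 * Fill iov[] for the call_idx-th call with up to vlen consecutive 4 KiB
 * regions of buf.  Returns the number of entries filled, which is less
 * than vlen for a trailing partial batch and 0 past the buffer's end.
 */
static size_t fill_batch(struct iovec *iov, char *buf, size_t buf_sz,
			 size_t call_idx, size_t vlen)
{
	size_t nr_regions = buf_sz / REGION_SZ;
	size_t first = call_idx * vlen;
	size_t i;

	for (i = 0; i < vlen && first + i < nr_regions; i++) {
		iov[i].iov_base = buf + (first + i) * REGION_SZ;
		iov[i].iov_len = REGION_SZ;
	}
	return i;
}
```

For the 256 MiB buffer, nr_calls() gives 65536 calls with vlen 1 and 64
calls with vlen 1024.  A measurement loop would pass each filled batch
to process_madvise(pidfd, iov, n, MADV_DONTNEED, 0) and time the whole
loop.
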
> > The measurement results are as below.  'sz_batches' column shows the
> > batch size of process_madvise() calls.  'before' and 'after' columns are
> > the measured time to apply MADV_DONTNEED to the 256 MiB memory buffer in
> > nanoseconds, on kernels that were built without and with the MADV_DONTNEED
> > tlb flushes batching patch of this series, respectively.  For the
> > baseline, mm-unstable tree of 2025-03-04[2] has been used.
> > 'after/before' column is the ratio of 'after' to 'before'.  So
> > 'after/before' value lower than 1.0 means this patch increased
> > efficiency over the baseline.  And lower value means better efficiency.
> 
> I would recommend replacing the after/before column with a percentage, i.e.
> percentage improvement or degradation.

Thank you for the nice suggestion.  I will do so in the next version.
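
As a minimal sketch of the suggested conversion, the helper below turns
a 'before' and 'after' pair from the quoted results table into a
percentage change.  The function name percent_change() and the sign
convention (negative means the patched kernel was faster) are
assumptions for this illustration, not something decided in this
thread.

```c
#include <assert.h>

/*
 * Percentage change of 'after' relative to 'before'.
 * Negative: 'after' took less time (improvement).
 * Positive: 'after' took more time (degradation).
 */
static double percent_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

With the table's numbers, batch size 4 (58823633 ns before, 51608504 ns
after) gives about -12.27 percent, and batch size 1 (102842895 ns
before, 106507398 ns after) gives about +3.56 percent.
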

> 
> > 
> >     sz_batches    before       after        after/before
> >     1             102842895    106507398    1.03563204828102
> >     2             73364942     74529223     1.01586971880929
> >     4             58823633     51608504     0.877343022998937
> >     8             47532390     44820223     0.942940655834895
> >     16            43591587     36727177     0.842529018271347
> >     32            44207282     33946975     0.767904595446515
> >     64            41832437     26738286     0.639175910310939
> >     128           40278193     23262940     0.577556694263817
> >     256           41568533     22355103     0.537789077136785
> >     512           41626638     22822516     0.54826709762148
> >     1024          44440870     22676017     0.510251419470411
> > 
> > For batch sizes <= 2, tlb flushes batching shows no big difference, only
> > slight overhead.  I think that's within the error range of this simple
> > micro-benchmark, and therefore can be ignored.
> 
> I would recommend running the experiment multiple times and reporting
> averages and standard deviations, which will support your error range
> claim.

Again, good suggestion.  I will do so.
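
One possible way to summarize repeated runs is sketched below, using
Welford's single-pass method for the mean and sample variance.  The
struct run_stats and the function summarize_runs() are hypothetical
names for this illustration.  How many runs to take is left open.

```c
#include <assert.h>
#include <stddef.h>

struct run_stats {
	double mean;		/* average of the measured times */
	double variance;	/* sample variance (n - 1 denominator) */
};

/* Summarize n >= 2 measured times (e.g. in nanoseconds) in one pass. */
static struct run_stats summarize_runs(const double *ns, size_t n)
{
	struct run_stats s = { 0.0, 0.0 };
	double m2 = 0.0;	/* running sum of squared deviations */
	size_t i;

	assert(n >= 2);
	for (i = 0; i < n; i++) {
		double delta = ns[i] - s.mean;

		s.mean += delta / (double)(i + 1);
		m2 += delta * (ns[i] - s.mean);
	}
	s.variance = m2 / (double)(n - 1);
	return s;
}
```

The standard deviation is the square root of the returned variance,
e.g. sqrt() from <math.h>, which typically needs linking with -lm.
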

> 
> > Starting from batch size
> > 4, however, tlb flushes batching shows clear efficiency gain.  The
> > efficiency gain tends to be proportional to the batch size, as expected.
> > The efficiency gain ranges from about 12 percent with batch size 4 up to
> > about 49 percent with batch size 1,024.
> > 
> > Please note that this is a very simple microbenchmark, so the
> > efficiency gain on real workloads could be very different.
> > 
> 
> I think you are running a single-threaded benchmark on a free machine. I
> expect this series to be much more beneficial on a loaded machine and for
> multi-threaded applications.

Your understanding of my test setup is correct, and I agree with your
expectation.

> No need to test that scenario but if you
> have already done that then it would be good to report.

I don't have such test results for now, nor plans with a specific timeline for
getting them.  I will share those if I get a chance, of course.


Thanks,
SJ

