[RFC PATCH 0/4] Introduce mempool pages bulk allocator the use it in dm-crypt

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Yang Shi <shy828301@gmail.com>
To: mgorman@techsingularity.net, agk@redhat.com, snitzer@kernel.org,
	dm-devel@redhat.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/4] Introduce mempool pages bulk allocator the use it in dm-crypt
Date: Wed,  5 Oct 2022 11:03:37 -0700	[thread overview]
Message-ID: <20221005180341.1738796-1-shy828301@gmail.com> (raw)


We have full disk encryption enabled, profiling shows page allocations may
incur a noticeable overhead when writing.  The dm-crypt creates an "out"
bio for writing.  And fill the "out" bio with the same amount of pages
as "in" bio.  But the driver allocates one page at a time in a loop.  For
1M bio it means the driver has to call page allocator 256 times.  It seems
not that efficient.

Since v5.13 we have page bulk allocator supported, so dm-crypt could use
it to do page allocations more efficiently.

I could just call the page bulk allocator in dm-crypt driver before the
mempool allocator, but it seems ad-hoc and the quick search shows some
others do the similar thing, for example, f2fs compress, block bounce,
g2fs, ufs, etc.  So it seems more neat to implement a bulk allocation
API for mempool.

So introduce the mempool page bulk allocator.
The below APIs are introduced:
    - mempool_init_pages_bulk()
    - mempool_create_pages_bulk()
    They initialize the mempool for page bulk allocator.  The pool is filled
    by alloc_page() in a loop.
    
    - mempool_alloc_pages_bulk_list()
    - mempool_alloc_pages_bulk_array()
    They do bulk allocation from mempool.
    They do the below conceptually:
      1. Call bulk page allocator
      2. If the allocation is fulfilled then return otherwise try to
         allocate the remaining pages from the mempool
      3. If it is fulfilled then return otherwise retry from #1 with sleepable
         gfp
      4. If it is still failed, sleep for a while to wait for the mempool is
         refilled, then retry from #1
    The populated pages will stay on the list or array until the callers
    consume them or free them.
    Since mempool allocator is guaranteed to success in the sleepable context,
    so the two APIs return true for success or false for fail.  It is the
    caller's responsibility to handle failure case (partial allocation), just
    like the page bulk allocator.
    
The mempool typically is an object agnostic allocator, but bulk allocation
is only supported by pages, so the mempool bulk allocator is for page
allocation only as well.

With the mempool bulk allocator the IOPS of dm-crypt with 1M I/O would get
improved by approxiamately 6%.  The test is done on a VM with 80 vCPU and
64GB memory with an encrypted ram device (the impact from storage hardware
could be minimized so that we could benchmark the dm-crypt layer more
accurately).

Before the patch:
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=402MiB/s][r=0,w=402 IOPS][eta 00m:00s]
crypt: (groupid=0, jobs=1): err= 0: pid=233950: Thu Sep 15 16:23:10 2022
  write: IOPS=402, BW=403MiB/s (423MB/s)(23.6GiB/60002msec)
    slat (usec): min=2425, max=3819, avg=2480.84, stdev=34.00
    clat (usec): min=7, max=165751, avg=156398.72, stdev=4691.03
     lat (msec): min=2, max=168, avg=158.88, stdev= 4.69
    clat percentiles (msec):
     |  1.00th=[  157],  5.00th=[  157], 10.00th=[  157], 20.00th=[  157],
     | 30.00th=[  157], 40.00th=[  157], 50.00th=[  157], 60.00th=[  157],
     | 70.00th=[  157], 80.00th=[  157], 90.00th=[  157], 95.00th=[  157],
     | 99.00th=[  159], 99.50th=[  159], 99.90th=[  165], 99.95th=[  165],
     | 99.99th=[  167]
   bw (  KiB/s): min=405504, max=413696, per=99.71%, avg=411845.53, stdev=1155.04, samples=120
   iops        : min=  396, max=  404, avg=402.17, stdev= 1.15, samples=120
  lat (usec)   : 10=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%, 100=0.08%
  lat (msec)   : 250=100.09%
  cpu          : usr=3.74%, sys=95.66%, ctx=27, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=103.1%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,24138,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=403MiB/s (423MB/s), 403MiB/s-403MiB/s (423MB/s-423MB/s), io=23.6GiB (25.4GB), run=60002-60002msec

After the patch:
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=430MiB/s][r=0,w=430 IOPS][eta 00m:00s]
crypt: (groupid=0, jobs=1): err= 0: pid=288730: Thu Sep 15 16:25:39 2022
  write: IOPS=430, BW=431MiB/s (452MB/s)(25.3GiB/60002msec)
    slat (usec): min=2253, max=3213, avg=2319.49, stdev=34.29
    clat (usec): min=6, max=149337, avg=146257.68, stdev=4239.52
     lat (msec): min=2, max=151, avg=148.58, stdev= 4.24
    clat percentiles (msec):
     |  1.00th=[  146],  5.00th=[  146], 10.00th=[  146], 20.00th=[  146],
     | 30.00th=[  146], 40.00th=[  146], 50.00th=[  146], 60.00th=[  146],
     | 70.00th=[  146], 80.00th=[  146], 90.00th=[  148], 95.00th=[  148],
     | 99.00th=[  148], 99.50th=[  148], 99.90th=[  150], 99.95th=[  150],
     | 99.99th=[  150]
   bw (  KiB/s): min=438272, max=442368, per=99.73%, avg=440463.57, stdev=1305.60, samples=120
   iops        : min=  428, max=  432, avg=430.12, stdev= 1.28, samples=120
  lat (usec)   : 10=0.01%
  lat (msec)   : 4=0.01%, 10=0.01%, 20=0.02%, 50=0.05%, 100=0.09%
  lat (msec)   : 250=100.07%
  cpu          : usr=3.78%, sys=95.37%, ctx=12778, majf=0, minf=4
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=103.1%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=0,25814,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
  WRITE: bw=431MiB/s (452MB/s), 431MiB/s-431MiB/s (452MB/s-452MB/s), io=25.3GiB (27.1GB), run=60002-60002msec

The function tracing also shows the time consumed by page allocations is
reduced significantly.  The test allocated 1M (256 pages) bio in the same
environment.

Before the patch:
It took approximately 600us by excluding the bio_add_page() calls.
2720.630754 |   56)  xfs_io-38859  |   2.571 us    |    mempool_alloc();
2720.630757 |   56)  xfs_io-38859  |   0.937 us    |    bio_add_page();
 2720.630758 |   56)  xfs_io-38859  |   1.772 us    |    mempool_alloc();
 2720.630760 |   56)  xfs_io-38859  |   0.852 us    |    bio_add_page();
….
2720.631559 |   56)  xfs_io-38859  |   2.058 us    |    mempool_alloc();
 2720.631561 |   56)  xfs_io-38859  |   0.717 us    |    bio_add_page();
 2720.631562 |   56)  xfs_io-38859  |   2.014 us    |    mempool_alloc();
 2720.631564 |   56)  xfs_io-38859  |   0.620 us    |    bio_add_page();

After the patch:
It took approxiamately 30us.
11564.266385 |   22) xfs_io-136183  | + 30.551 us   |    __alloc_pages_bulk();

Page allocations overhead is around 6% (600us/9853us) in dm-crypt layer shown by
function trace.  The data also matches the IOPS data shown by fio.

And the benchmark with 4K size I/O doesn't show measurable regression.


Yang Shi (4):
      mm: mempool: extract common initialization code
      mm: mempool: introduce page bulk allocator
      md: dm-crypt: move crypt_free_buffer_pages ahead
      md: dm-crypt: use mempool page bulk allocator

 drivers/md/dm-crypt.c   |  92 ++++++++++++++++-------------
 include/linux/mempool.h |  19 ++++++
 mm/mempool.c            | 227 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 3 files changed, 276 insertions(+), 62 deletions(-)

next             reply	other threads:[~2022-10-05 18:03 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-05 18:03 Yang Shi [this message]
2022-10-05 18:03 ` [PATCH 1/4] mm: mempool: extract common initialization code Yang Shi
2022-10-05 18:03 ` [PATCH 2/4] mm: mempool: introduce page bulk allocator Yang Shi
2022-10-05 19:35   ` kernel test robot
2022-10-06 14:47   ` Brian Foster
2022-10-06 18:43     ` Yang Shi
2022-10-14 12:03       ` Brian Foster
2022-10-18 17:51         ` Yang Shi
2022-10-13 12:38   ` Mel Gorman
2022-10-13 20:16     ` Yang Shi
2022-10-17  9:41       ` Mel Gorman
2022-10-18 18:01         ` Yang Shi
2022-10-21  9:19           ` Mel Gorman
2022-10-21 21:04             ` Yang Shi
2022-10-05 18:03 ` [PATCH 3/4] md: dm-crypt: move crypt_free_buffer_pages ahead Yang Shi
2022-10-05 18:03 ` [PATCH 4/4] md: dm-crypt: use mempool page bulk allocator Yang Shi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221005180341.1738796-1-shy828301@gmail.com \
    --to=shy828301@gmail.com \
    --cc=agk@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dm-devel@redhat.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=snitzer@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).