All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/3] Optimize ext4 DAX overwrites
@ 2020-08-22 11:34 Ritesh Harjani
  2020-08-22 11:34 ` [PATCHv2 1/3] ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument Ritesh Harjani
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Ritesh Harjani @ 2020-08-22 11:34 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, Dan Williams, Anju T Sudhakar, linux-fsdevel,
	linux-kernel, Ritesh Harjani

Hello,

RFC -> v2
1. Addressed comments from Jan.
2. Added xfstests results in cover letter.

In case of dax writes, currently we start a journal txn irrespective of whether
it's an overwrite or not. In case of an overwrite we don't need to start a
jbd2 txn since the blocks are already allocated.
So this patch optimizes away the txn start in case of DAX overwrites.
This could significantly boost performance for multi-threaded writes
specially random writes (overwrite).
Fio script used to collect perf numbers is mentioned below.

Below numbers were calculated on a QEMU setup on ppc64 box with simulated
pmem device. 

Didn't observe any new failures with this patch in xfstests "-g quick,dax"

Performance numbers with different threads - (~10x improvement)
==========================================

vanilla_kernel(kIOPS) (randomwrite)
 60 +-+------+-------+--------+--------+--------+-------+------+-+   
     |        +       +        +        +**      +       +        |   
  55 +-+                                 **                     +-+   
     |                          **       **                       |   
     |                          **       **                       |   
  50 +-+                        **       **                     +-+   
     |                          **       **                       |   
  45 +-+                        **       **                     +-+   
     |                          **       **                       |   
     |                          **       **                       |   
  40 +-+                        **       **                     +-+   
     |                          **       **                       |   
  35 +-+               **       **       **                     +-+   
     |                 **       **       **               **      |   
     |                 **       **       **      **       **      |   
  30 +-+      **       **       **       **      **       **    +-+   
     |        **      +**      +**      +**      **      +**      |   
  25 +-+------**------+**------+**------+**------**------+**----+-+   
              1       2        4        8       12      16            
                                     Threads                                   
patched_kernel(kIOPS) (randomwrite)
  600 +-+-----+--------+--------+-------+--------+-------+------+-+   
      |       +        +        +       +        +       +**      |   
      |                                                   **      |   
  500 +-+                                                 **    +-+   
      |                                                   **      |   
      |                                           **      **      |   
  400 +-+                                         **      **    +-+   
      |                                           **      **      |   
  300 +-+                                **       **      **    +-+   
      |                                  **       **      **      |   
      |                                  **       **      **      |   
  200 +-+                                **       **      **    +-+   
      |                         **       **       **      **      |   
      |                         **       **       **      **      |   
  100 +-+               **      **       **       **      **    +-+   
      |                 **      **       **       **      **      |   
      |       +**      +**      **      +**      +**     +**      |   
    0 +-+-----+**------+**------**------+**------+**-----+**----+-+   
              1        2        4       8       12      16            
                                    Threads                                   
fio script
==========
[global]
rw=randwrite
norandommap=1
invalidate=0
bs=4k
numjobs=16 		--> changed this for different thread options
time_based=1
ramp_time=30
runtime=60
group_reporting=1
ioengine=psync
direct=1
size=16G
filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
file_service_type=random
nrfiles=32
directory=/mnt/

[name]
directory=/mnt/
direct=1

NOTE:
======
1. Looking at ~10x perf delta, I probed a bit deeper to understand what's causing
this scalability problem. It seems when we are starting a jbd2 txn then slab
alloc code is observing some serious contention around spinlock.

Even though the spinlock contention could be related to some other
issue (looking into it internally). But I could still see the perf improvement
of close to ~2x on QEMU setup on x86 with simulated pmem device with the
patched_kernel v/s vanilla_kernel with same fio workload.

perf report from vanilla_kernel (this is not seen with patched kernel) (ppc64)
=======================================================================

  47.86%  fio              [kernel.vmlinux]            [k] do_raw_spin_lock
             |
             ---do_raw_spin_lock
                |
                |--19.43%--_raw_spin_lock
                |          |
                |           --19.31%--0
                |                     |
                |                     |--9.77%--deactivate_slab.isra.61
                |                     |          ___slab_alloc
                |                     |          __slab_alloc
                |                     |          kmem_cache_alloc
                |                     |          jbd2__journal_start
                |                     |          __ext4_journal_start_sb
<...>

2. This problem was reported by Dan Williams at [1]

Links
======
[1]: https://lore.kernel.org/linux-ext4/20190802144304.GP25064@quack2.suse.cz/T/

Ritesh Harjani (3):
  ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument
  ext4: Extend ext4_overwrite_io() for dax path
  ext4: Optimize ext4 DAX overwrites

 fs/ext4/ext4.h  |  2 ++
 fs/ext4/file.c  | 28 ++++++++++++++++++----------
 fs/ext4/inode.c | 11 +++++++++--
 3 files changed, 29 insertions(+), 12 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-10-03  3:59 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-08-22 11:34 [PATCHv2 0/3] Optimize ext4 DAX overwrites Ritesh Harjani
2020-08-22 11:34 ` [PATCHv2 1/3] ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument Ritesh Harjani
2020-08-24 12:15   ` Dan Carpenter
2020-08-24 12:15     ` Dan Carpenter
2020-10-03  3:59   ` Theodore Y. Ts'o
2020-08-22 11:34 ` [PATCHv2 2/3] ext4: Extend ext4_overwrite_io() for dax path Ritesh Harjani
2020-08-22 11:34 ` [PATCHv2 3/3] ext4: Optimize ext4 DAX overwrites Ritesh Harjani

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.