[PATCHv3 0/1] Optimize ext4 file overwrites - perf improvement

From: Ritesh Harjani <riteshh@linux.ibm.com>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu, jack@suse.cz, dan.j.williams@intel.com,
	anju@linux.vnet.ibm.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Ritesh Harjani <riteshh@linux.ibm.com>
Subject: [PATCHv3 0/1] Optimize ext4 file overwrites - perf improvement
Date: Fri, 18 Sep 2020 10:36:34 +0530
Message-ID: <cover.1600401668.git.riteshh@linux.ibm.com>

Hello,

v2 -> v3
1. Switched to the approach suggested by Jan, which makes the optimization
general for all file writes rather than for DAX only.
(So as of now both DAX & DIO should benefit from this, as both use the same
iomap path. Note, though, that I only tested the performance improvement for DAX.)

Ran xfstests with "-g quick,dax" and didn't observe any new
issues with this patch.

For file writes, we currently start a journal txn irrespective of whether
the write is an overwrite or not. For an overwrite we don't need to start a
jbd2 txn, since the blocks are already allocated.
So this patch optimizes away the txn start for file (DAX/DIO) overwrites.
This could significantly boost performance for multi-threaded writes,
especially random writes (overwrites).
The fio script used to collect the perf numbers is given below.
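
To illustrate the idea only, here is a minimal toy model in Python. It is not
ext4 or jbd2 code: ToyFile, is_overwrite() and the txn_starts counter are
hypothetical names made up for this sketch, and the real check in the patch
operates on ext4's block mapping rather than on a Python set. It just shows
that when every block of a write is already allocated, the txn start can be
skipped.

```python
BLOCK_SIZE = 4096  # same as bs=4k in the fio job; illustrative only


class ToyFile:
    """Toy model of a file: a set of allocated block numbers plus a txn counter."""

    def __init__(self):
        self.allocated = set()
        self.txn_starts = 0  # number of (pretend) journal txns started

    def _blocks(self, offset, length):
        # Block numbers touched by a write of `length` bytes at `offset`.
        first = offset // BLOCK_SIZE
        last = (offset + length - 1) // BLOCK_SIZE
        return range(first, last + 1)

    def is_overwrite(self, offset, length):
        # An overwrite here means: every touched block is already allocated.
        return all(b in self.allocated for b in self._blocks(offset, length))

    def write(self, offset, length, skip_txn_on_overwrite):
        if skip_txn_on_overwrite and self.is_overwrite(offset, length):
            return  # blocks already allocated: no txn is started in this model
        self.txn_starts += 1
        self.allocated.update(self._blocks(offset, length))


def count_txn_starts(skip_txn_on_overwrite):
    f = ToyFile()
    f.write(0, 8 * BLOCK_SIZE, skip_txn_on_overwrite)  # initial allocation
    for i in range(8):  # eight 4k overwrites of the allocated range
        f.write(i * BLOCK_SIZE, BLOCK_SIZE, skip_txn_on_overwrite)
    return f.txn_starts


print(count_txn_starts(False))  # txn started for every write
print(count_txn_starts(True))   # txn started only for the allocating write
```

In this model the unoptimized path starts one txn per write, while the
optimized path starts one only for the write that allocates blocks.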

The numbers below were collected on a QEMU setup on a ppc64 box with a simulated
pmem (fsdax) device.

Performance numbers with different thread counts (~10x improvement)
==========================================

vanilla_kernel(kIOPS) (randomwrite)
 60 +-+------+-------+--------+--------+--------+-------+------+-+   
     |        +       +        +        +**      +       +        |   
  55 +-+                                 **                     +-+   
     |                          **       **                       |   
     |                          **       **                       |   
  50 +-+                        **       **                     +-+   
     |                          **       **                       |   
  45 +-+                        **       **                     +-+   
     |                          **       **                       |   
     |                          **       **                       |   
  40 +-+                        **       **                     +-+   
     |                          **       **                       |   
  35 +-+               **       **       **                     +-+   
     |                 **       **       **               **      |   
     |                 **       **       **      **       **      |   
  30 +-+      **       **       **       **      **       **    +-+   
     |        **      +**      +**      +**      **      +**      |   
  25 +-+------**------+**------+**------+**------**------+**----+-+   
              1       2        4        8       12      16            
                                     Threads

patched_kernel(kIOPS) (randomwrite)
  600 +-+-----+--------+--------+-------+--------+-------+------+-+   
      |       +        +        +       +        +       +**      |   
      |                                                   **      |   
  500 +-+                                                 **    +-+   
      |                                                   **      |   
      |                                           **      **      |   
  400 +-+                                         **      **    +-+   
      |                                           **      **      |   
  300 +-+                                **       **      **    +-+   
      |                                  **       **      **      |   
      |                                  **       **      **      |   
  200 +-+                                **       **      **    +-+   
      |                         **       **       **      **      |   
      |                         **       **       **      **      |   
  100 +-+               **      **       **       **      **    +-+   
      |                 **      **       **       **      **      |   
      |       +**      +**      **      +**      +**     +**      |   
    0 +-+-----+**------+**------**------+**------+**-----+**----+-+   
              1        2        4       8       12      16            
                                    Threads

fio script
==========
[global]
rw=randwrite
norandommap=1
invalidate=0
bs=4k
; numjobs was changed across runs to get the different thread counts
numjobs=16
time_based=1
ramp_time=30
runtime=60
group_reporting=1
ioengine=psync
direct=1
size=16G
filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
file_service_type=random
nrfiles=32
directory=/mnt/

[name]
directory=/mnt/
direct=1
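
The long filename= line above follows a regular pattern (file1.0.0 through
file1.0.31, separated by ':'). As a convenience, a small Python helper along
these lines could regenerate it for a different nrfiles value. The function
name and its parameters are made up for this sketch; only the naming pattern
is taken from the job file above.

```python
def fio_filename_option(nrfiles, prefix="file1.0."):
    """Build a fio 'filename=' line listing nrfiles colon-separated file names."""
    names = [f"{prefix}{i}" for i in range(nrfiles)]
    return "filename=" + ":".join(names)


# Reproduce the line used in the job file above (nrfiles=32).
line = fio_filename_option(32)
print(line)
```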

NOTE:
======
1. Looking at the ~10x perf delta, I probed a bit deeper to understand what is
causing this scalability problem. It seems that when we start a jbd2 txn, the slab
alloc code sees serious contention on a spinlock.

I think the spinlock contention in the slab alloc path could be optimized
on PPC in general; I will look into that separately. But I could still see a
perf improvement of close to 2x on a QEMU setup on x86 with a simulated pmem device,
comparing patched_kernel vs. vanilla_kernel with the same fio workload.

perf report from vanilla_kernel (this is not seen with patched kernel) (ppc64)
=======================================================================

  47.86%  fio              [kernel.vmlinux]            [k] do_raw_spin_lock
             |
             ---do_raw_spin_lock
                |
                |--19.43%--_raw_spin_lock
                |          |
                |           --19.31%--0
                |                     |
                |                     |--9.77%--deactivate_slab.isra.61
                |                     |          ___slab_alloc
                |                     |          __slab_alloc
                |                     |          kmem_cache_alloc
                |                     |          jbd2__journal_start
                |                     |          __ext4_journal_start_sb
<...>

2. This problem was reported by Dan Williams at [1].

Links
======
[1]: https://lore.kernel.org/linux-ext4/20190802144304.GP25064@quack2.suse.cz/T/
[v2]: https://lkml.org/lkml/2020/8/22/123

Ritesh Harjani (1):
  ext4: Optimize file overwrites

 fs/ext4/inode.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

-- 
2.26.2
