* [PATCHv2 0/3] Optimize ext4 DAX overwrites
@ 2020-08-22 11:34 Ritesh Harjani
  2020-08-22 11:34 ` [PATCHv2 1/3] ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument Ritesh Harjani
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ritesh Harjani @ 2020-08-22 11:34 UTC (permalink / raw)
  To: linux-ext4
  Cc: jack, tytso, Dan Williams, Anju T Sudhakar, linux-fsdevel,
	linux-kernel, Ritesh Harjani

Hello,

RFC -> v2
1. Addressed comments from Jan.
2. Added xfstests results in cover letter.

For DAX writes, we currently start a journal txn regardless of whether the
write is an overwrite. For an overwrite we don't need to start a jbd2 txn,
since the blocks are already allocated.
So this series optimizes away the txn start for DAX overwrites.
This could significantly boost performance for multi-threaded writes,
especially random overwrites.
The fio script used to collect the perf numbers is given below.
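
To make the overwrite condition concrete, here is a minimal userspace sketch
of the decision described above. It is only a model: struct toy_mapping and
the toy_* helpers are hypothetical names made up for illustration, not the
ext4 API, and the sketch assumes a write counts as an overwrite only when it
ends within i_size and every block in the range is already allocated and
written.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. toy_mapping and the toy_* helpers are hypothetical
 * names for illustration; they are not the ext4 API.
 */
struct toy_mapping {
	uint32_t mapped_len;	/* blocks the lookup found already allocated */
	bool written;		/* true if those blocks are initialized, not unwritten */
};

/* An overwrite stays within i_size and hits only allocated, written blocks. */
static bool toy_is_overwrite(uint64_t end, uint64_t i_size, uint32_t len,
			     const struct toy_mapping *m)
{
	if (end > i_size)
		return false;
	return m->mapped_len == len && m->written;
}

/* Only non-overwrites need a journal transaction for block allocation. */
static bool toy_needs_journal_txn(uint64_t end, uint64_t i_size, uint32_t len,
				  const struct toy_mapping *m)
{
	return !toy_is_overwrite(end, i_size, len, m);
}
```

In this model a 4-block write that is fully mapped and written inside i_size
skips the txn, while a partially mapped range, an unwritten extent, or a
write extending past i_size still takes the journaled path.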

The numbers below were collected on a QEMU setup on a ppc64 box with a
simulated pmem device.

No new failures were observed with this series in xfstests "-g quick,dax".

Performance numbers with different threads - (~10x improvement)
==========================================

vanilla_kernel(kIOPS) (randomwrite)
 60 +-+------+-------+--------+--------+--------+-------+------+-+   
     |        +       +        +        +**      +       +        |   
  55 +-+                                 **                     +-+   
     |                          **       **                       |   
     |                          **       **                       |   
  50 +-+                        **       **                     +-+   
     |                          **       **                       |   
  45 +-+                        **       **                     +-+   
     |                          **       **                       |   
     |                          **       **                       |   
  40 +-+                        **       **                     +-+   
     |                          **       **                       |   
  35 +-+               **       **       **                     +-+   
     |                 **       **       **               **      |   
     |                 **       **       **      **       **      |   
  30 +-+      **       **       **       **      **       **    +-+   
     |        **      +**      +**      +**      **      +**      |   
  25 +-+------**------+**------+**------+**------**------+**----+-+   
              1       2        4        8       12      16            
                                     Threads                                   
patched_kernel(kIOPS) (randomwrite)
  600 +-+-----+--------+--------+-------+--------+-------+------+-+   
      |       +        +        +       +        +       +**      |   
      |                                                   **      |   
  500 +-+                                                 **    +-+   
      |                                                   **      |   
      |                                           **      **      |   
  400 +-+                                         **      **    +-+   
      |                                           **      **      |   
  300 +-+                                **       **      **    +-+   
      |                                  **       **      **      |   
      |                                  **       **      **      |   
  200 +-+                                **       **      **    +-+   
      |                         **       **       **      **      |   
      |                         **       **       **      **      |   
  100 +-+               **      **       **       **      **    +-+   
      |                 **      **       **       **      **      |   
      |       +**      +**      **      +**      +**     +**      |   
    0 +-+-----+**------+**------**------+**------+**-----+**----+-+   
              1        2        4       8       12      16            
                                    Threads                                   
fio script
==========
[global]
rw=randwrite
norandommap=1
invalidate=0
bs=4k
; numjobs was changed for the different thread counts
numjobs=16
time_based=1
ramp_time=30
runtime=60
group_reporting=1
ioengine=psync
direct=1
size=16G
filename=file1.0.0:file1.0.1:file1.0.2:file1.0.3:file1.0.4:file1.0.5:file1.0.6:file1.0.7:file1.0.8:file1.0.9:file1.0.10:file1.0.11:file1.0.12:file1.0.13:file1.0.14:file1.0.15:file1.0.16:file1.0.17:file1.0.18:file1.0.19:file1.0.20:file1.0.21:file1.0.22:file1.0.23:file1.0.24:file1.0.25:file1.0.26:file1.0.27:file1.0.28:file1.0.29:file1.0.30:file1.0.31
file_service_type=random
nrfiles=32
directory=/mnt/

[name]
directory=/mnt/
direct=1

NOTE:
======
1. Given the ~10x perf delta, I probed a bit deeper to understand what is
causing this scalability problem. It seems that when we start a jbd2 txn, the
slab alloc code sees serious contention on a spinlock.

The spinlock contention could be related to some other issue (looking into it
internally). Even so, I still see a perf improvement of close to ~2x on a QEMU
setup on x86 with a simulated pmem device, patched_kernel vs. vanilla_kernel,
with the same fio workload.

perf report from vanilla_kernel (this is not seen with patched kernel) (ppc64)
=======================================================================

  47.86%  fio              [kernel.vmlinux]            [k] do_raw_spin_lock
             |
             ---do_raw_spin_lock
                |
                |--19.43%--_raw_spin_lock
                |          |
                |           --19.31%--0
                |                     |
                |                     |--9.77%--deactivate_slab.isra.61
                |                     |          ___slab_alloc
                |                     |          __slab_alloc
                |                     |          kmem_cache_alloc
                |                     |          jbd2__journal_start
                |                     |          __ext4_journal_start_sb
<...>

2. This problem was reported by Dan Williams at [1]

Links
======
[1]: https://lore.kernel.org/linux-ext4/20190802144304.GP25064@quack2.suse.cz/T/

Ritesh Harjani (3):
  ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument
  ext4: Extend ext4_overwrite_io() for dax path
  ext4: Optimize ext4 DAX overwrites

 fs/ext4/ext4.h  |  2 ++
 fs/ext4/file.c  | 28 ++++++++++++++++++----------
 fs/ext4/inode.c | 11 +++++++++--
 3 files changed, 29 insertions(+), 12 deletions(-)

-- 
2.25.4


* Re: [PATCHv2 1/3] ext4: Refactor ext4_overwrite_io() to take ext4_map_blocks as argument
@ 2020-08-22 17:52 kernel test robot
  0 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2020-08-22 17:52 UTC (permalink / raw)
  To: kbuild

CC: kbuild-all(a)lists.01.org
In-Reply-To: <057a08972f818c035621a9fd3ff870bedcdf5e83.1598094830.git.riteshh@linux.ibm.com>
References: <057a08972f818c035621a9fd3ff870bedcdf5e83.1598094830.git.riteshh@linux.ibm.com>
TO: Ritesh Harjani <riteshh@linux.ibm.com>
TO: linux-ext4(a)vger.kernel.org
CC: jack(a)suse.cz
CC: tytso(a)mit.edu
CC: Dan Williams <dan.j.williams@intel.com>
CC: Anju T Sudhakar <anju@linux.vnet.ibm.com>
CC: linux-fsdevel(a)vger.kernel.org
CC: linux-kernel(a)vger.kernel.org
CC: Ritesh Harjani <riteshh@linux.ibm.com>

Hi Ritesh,

I love your patch! Perhaps something to improve:

[auto build test WARNING on v5.9-rc1]
[cannot apply to ext4/dev next-20200821]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Ritesh-Harjani/Optimize-ext4-DAX-overwrites/20200822-193615
base:    9123e3a74ec7b934a4a099e98af6a61c2f80bbf5
:::::: branch date: 6 hours ago
:::::: commit date: 6 hours ago
config: i386-randconfig-m021-20200822 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

New smatch warnings:
fs/ext4/file.c:194 ext4_overwrite_io() warn: should '(map->m_lblk + map->m_len) << blkbits' be a 64 bit type?

Old smatch warnings:
include/linux/fs.h:867 i_size_write() warn: statement has no effect 31
fs/ext4/file.c:585 ext4_dio_write_iter() warn: inconsistent returns 'inode->i_rwsem'.

# https://github.com/0day-ci/linux/commit/5d171d1d87ee0aca0a992b6843d154b41466e5e5
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ritesh-Harjani/Optimize-ext4-DAX-overwrites/20200822-193615
git checkout 5d171d1d87ee0aca0a992b6843d154b41466e5e5
vim +194 fs/ext4/file.c

e9e3bcecf44c04 Eric Sandeen   2011-02-12  189  
213bcd9ccbf04b Jan Kara       2016-11-20  190  /* Is IO overwriting allocated and initialized blocks? */
5d171d1d87ee0a Ritesh Harjani 2020-08-22  191  static bool ext4_overwrite_io(struct inode *inode, struct ext4_map_blocks *map)
213bcd9ccbf04b Jan Kara       2016-11-20  192  {
213bcd9ccbf04b Jan Kara       2016-11-20  193  	unsigned int blkbits = inode->i_blkbits;
5d171d1d87ee0a Ritesh Harjani 2020-08-22 @194  	loff_t end = (map->m_lblk + map->m_len) << blkbits;
5d171d1d87ee0a Ritesh Harjani 2020-08-22  195  	int err, blklen = map->m_len;
213bcd9ccbf04b Jan Kara       2016-11-20  196  
5d171d1d87ee0a Ritesh Harjani 2020-08-22  197  	if (end > i_size_read(inode))
213bcd9ccbf04b Jan Kara       2016-11-20  198  		return false;
213bcd9ccbf04b Jan Kara       2016-11-20  199  
5d171d1d87ee0a Ritesh Harjani 2020-08-22  200  	err = ext4_map_blocks(NULL, inode, map, 0);
213bcd9ccbf04b Jan Kara       2016-11-20  201  	/*
213bcd9ccbf04b Jan Kara       2016-11-20  202  	 * 'err==len' means that all of the blocks have been preallocated,
213bcd9ccbf04b Jan Kara       2016-11-20  203  	 * regardless of whether they have been initialized or not. To exclude
213bcd9ccbf04b Jan Kara       2016-11-20  204  	 * unwritten extents, we need to check m_flags.
213bcd9ccbf04b Jan Kara       2016-11-20  205  	 */
5d171d1d87ee0a Ritesh Harjani 2020-08-22  206  	return err == blklen && (map->m_flags & EXT4_MAP_MAPPED);
213bcd9ccbf04b Jan Kara       2016-11-20  207  }
213bcd9ccbf04b Jan Kara       2016-11-20  208  
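
The new smatch warning on line 194 is about integer width. A minimal
userspace sketch of the concern follows, assuming the logical block number
and length are both 32-bit values (as the warning suggests). The helper names
are hypothetical and exist only for this illustration; the cast in the second
helper is one possible way to widen the arithmetic, not necessarily the fix
that was applied.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration. lblk and len model the 32-bit
 * m_lblk and m_len fields; the return value models the loff_t 'end'.
 */
static int64_t end_offset_32bit_math(uint32_t lblk, uint32_t len,
				     unsigned int blkbits)
{
	/* Sum and shift happen in 32 bits; bits above bit 31 are lost. */
	return (int64_t)((lblk + len) << blkbits);
}

static int64_t end_offset_64bit_math(uint32_t lblk, uint32_t len,
				     unsigned int blkbits)
{
	/* Widen before adding and shifting so the byte offset keeps all bits. */
	return (int64_t)(((uint64_t)lblk + len) << blkbits);
}
```

With lblk = 1 << 20, len = 1 and blkbits = 12, the 32-bit version returns
4096 while the 64-bit version returns 4294971392, so the "end > i_size" check
could compare against a truncated offset for files larger than 4 GiB.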

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org
