From: Dmitry Monakhov <dmonakhov@gmail.com>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: linux-ext4@vger.kernel.org
Subject: Re: [PATCH] ext4: fix extent_status fragmentation for plain files
Date: Fri, 08 Nov 2019 18:21:56 +0300 [thread overview]
Message-ID: <874kze4cmj.fsf@dmws.yandex.net> (raw)
In-Reply-To: <20191107153819.GI26959@mit.edu>
[-- Attachment #1: Type: text/plain, Size: 3608 bytes --]
"Theodore Y. Ts'o" <tytso@mit.edu> writes:
> On Wed, Nov 06, 2019 at 12:25:02PM +0000, Dmitry Monakhov wrote:
>> It is appeared that extent are not cached for inodes with depth == 0
>> which result in suboptimal extent status populating inside ext4_map_blocks()
>> by map's result where size requested is usually smaller than extent size so
>> cache becomes fragmented
>>
>> # Example: I have plain file:
>> File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
>> ext: logical_offset: physical_offset: length: expected: flags:
>> 0: 0.. 8191: 40960.. 49151: 8192: last,eof
>>
>> $ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
>> $ perf script | grep ext4_es_insert_extent | head -n 10
>> fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
>> fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
>
> So this is certainly bad behavior, but the original intent was to not
> cached extents that were in the inode's i_blocks[] array because the
> information was already in the inode cache, and so we could save
> memory but just pulling the information out of the i_blocks away and
> there was no need to cache the extent in the es cache.
>
> There are cases where we do need to track the extent in the es cache
> --- for example, if we are writing the file and we need to track its
> delayed allocation status.
>
> So I wonder if we might be better off defining a new flag
> EXT4_MAP_INROOT, which gets set by ext4_ext_map_blocks() and
> ext4_ind_map_blocks() if the mapping is exclusively found in the
> i_blocks array, and if EXT4_MAP_INROOT is set, and we don't need to
> set EXTENT_STATUS_DELAYED, we skip the call to
> ext4_es_insert_extent().
>
> What do you think? This should significantly reduce the memory
> utilization of the es_cache, which would be good for low-memory
> workloads, and those where there are a large number of inodes that fit
> in the es_cache, which is probably true for most desktops, especially
> those belonging kernel developers. :-)
Sound reasonable, same happens in ext4_ext_precache()
See my patch below.
But this also means that on each ext4_map_blocks()
will fallback to regular block lookup:
down_read(&EXT4_I(inode)->i_data_sem)
ext4_ext_map_blocks (
->ext4_find_extent
path = kcalloc(depth + 2, sizeof(struct ext4_ext_path), GFP_NOFS)
kfree(path)
up_read(&EXT4_I(inode)->i_data_sem)
I thought that we can neglect this, but curiosity is a good thing
That it why I've tested nocache(see patch below) vs cache(first path) approach
## Production server ##
# CPU: Intel-Gold-6230
# DEV: /dev/ram0
# TEST: fio --direct=1 --rw=randread --bs=1k --filesize=64M
IOPS(nocache/cache): 729k vs 764k => +5%
# DEV: /dev/nvme0n1 (Samsung DC grade)
# TEST: fio --direct=1 --rw=randread --bs=4k --filesize=64M --ioengine=libaio --iodepth=128
IOPS(nocache/cache): 366k vs 378k => +3%
## My notebook Carbon/X1 ##
# CPU: i7-7600U
# DEV: /dev/nvme0n1 (Samsung MZVLB512HAJQ-000L7)
# TEST: fio --direct=1 --rw=randread --bs=4k --filesize=64M --ioengine=libaio --iodepth=128
IOPS(nocache/cache): 270k vs 270k => No difference
Difference is invisiable for my laptop, but visiable on fast NVME dev.
From other point of view unconditional caching result in memory
overhead sizeof(struct extent_status)/sizeof(ext4_inode_info) => 3.5% for everybody.
>
> - Ted
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-ext4-Don-t-cache-clean-inline-extents.patch --]
[-- Type: text/x-diff, Size: 1291 bytes --]
From 779e1322d47a5f28364343446853f24ac145e9ff Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@gmail.com>
Date: Fri, 8 Nov 2019 18:08:49 +0300
Subject: [PATCH] ext4: Don't cache clean inline extents
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
---
fs/ext4/inode.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index abaaf7d..6839ac4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -45,6 +45,7 @@
#include "xattr.h"
#include "acl.h"
#include "truncate.h"
+#include "ext4_extents.h"
#include <trace/events/ext4.h>
@@ -584,11 +585,13 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
map->m_lblk + map->m_len - 1))
status |= EXTENT_STATUS_DELAYED;
- ret = ext4_es_insert_extent(inode, map->m_lblk,
- map->m_len, map->m_pblk, status);
- if (ret < 0)
- retval = ret;
+ /* Don't cache if there are no external extent blocks */
+ if ((status & EXTENT_STATUS_DELAYED) || ext_depth(inode)) {
+ ret = ext4_es_insert_extent(inode, map->m_lblk,
+ map->m_len, map->m_pblk, status);
+ if (ret < 0)
+ retval = ret;
+ }
}
up_read((&EXT4_I(inode)->i_data_sem));
--
2.7.4
next prev parent reply other threads:[~2019-11-08 15:22 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-06 12:25 [PATCH] ext4: fix extent_status fragmentation for plain files Dmitry Monakhov
2019-11-07 15:38 ` Theodore Y. Ts'o
2019-11-08 15:21 ` Dmitry Monakhov [this message]
2020-01-25 2:22 ` Theodore Y. Ts'o
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=874kze4cmj.fsf@dmws.yandex.net \
--to=dmonakhov@gmail.com \
--cc=linux-ext4@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.