From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Dmitry Monakhov <dmonakhov@gmail.com>,
Theodore Tso <tytso@mit.edu>
Subject: [PATCH 4.9 01/80] ext4: fix extent_status fragmentation for plain files
Date: Fri, 1 May 2020 15:20:55 +0200
Message-ID: <20200501131514.101804599@linuxfoundation.org>
In-Reply-To: <20200501131513.810761598@linuxfoundation.org>
From: Dmitry Monakhov <dmonakhov@gmail.com>
commit 4068664e3cd2312610ceac05b74c4cf1853b8325 upstream.
Extents are cached only in read_extent_tree_block(). Inodes with
depth == 0 keep their extents in the inode itself, so
ext4_find_extent() never reads a tree block for them and their
extents are not cached. The result of the lookup is cached in
ext4_map_blocks(), but it covers only a subset of the extent on disk.
Consequently, the extent status cache can become badly fragmented for
certain workloads, such as a random 4k read workload.
File size of /mnt/test is 33554432 (8192 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..    8191:      40960..     49151:   8192:             last,eof
$ perf record -e 'ext4:ext4_es_*' /root/bin/fio --name=t --direct=0 --rw=randread --bs=4k --filesize=32M --size=32M --filename=/mnt/test
$ perf script | grep ext4_es_insert_extent | head -n 10
fio 131 [000] 13.975421: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [494/1) mapped 41454 status W
fio 131 [000] 13.975939: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6064/1) mapped 47024 status W
fio 131 [000] 13.976467: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6907/1) mapped 47867 status W
fio 131 [000] 13.976937: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3850/1) mapped 44810 status W
fio 131 [000] 13.977440: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3292/1) mapped 44252 status W
fio 131 [000] 13.977931: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [6882/1) mapped 47842 status W
fio 131 [000] 13.978376: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [3117/1) mapped 44077 status W
fio 131 [000] 13.978957: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [2896/1) mapped 43856 status W
fio 131 [000] 13.979474: ext4:ext4_es_insert_extent: dev 253,0 ino 12 es [7479/1) mapped 48439 status W
Fix this by caching the extents for inodes with depth == 0 in
ext4_find_extent().
[ Renamed ext4_es_cache_extents() to ext4_cache_extents() since this
newly added function is not in extents_cache.c, and to avoid
potential visual confusion with ext4_es_cache_extent(). -TYT ]
Signed-off-by: Dmitry Monakhov <dmonakhov@gmail.com>
Link: https://lore.kernel.org/r/20191106122502.19986-1-dmonakhov@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
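The fragmentation described in the changelog can be illustrated with a small userspace model. This is only a sketch, not kernel code: `toy_es_cache`, `toy_es_insert()` and `toy_es_count()` are hypothetical names, and the model assumes that logically adjacent entries always merge, which is plausible here only because the test file is a single contiguous written extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS 8192	/* the test file above: 8192 blocks of 4096 bytes */

/* Toy stand-in for a per-inode extent status cache: one flag per block. */
struct toy_es_cache {
	bool cached[NBLOCKS];
};

static void toy_es_init(struct toy_es_cache *c)
{
	memset(c->cached, 0, sizeof(c->cached));
}

/* Mark [lblk, lblk + len) as cached; the end is clamped to the file size. */
static void toy_es_insert(struct toy_es_cache *c, size_t lblk, size_t len)
{
	assert(lblk < NBLOCKS);
	for (size_t i = lblk; i < lblk + len && i < NBLOCKS; i++)
		c->cached[i] = true;
}

/*
 * Count maximal runs of cached blocks. Each run stands for one cache
 * entry under the merge assumption stated above.
 */
static size_t toy_es_count(const struct toy_es_cache *c)
{
	size_t runs = 0;

	for (size_t i = 0; i < NBLOCKS; i++)
		if (c->cached[i] && (i == 0 || !c->cached[i - 1]))
			runs++;
	return runs;
}
```

In this model, inserting the single blocks 494, 6064 and 6907 from the trace leaves three separate entries, while one insert of [0, 8192) leaves exactly one. The second case corresponds to what the patch aims for by caching the whole in-inode extent up front.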
fs/ext4/extents.c | 47 +++++++++++++++++++++++++++--------------------
1 file changed, 27 insertions(+), 20 deletions(-)
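For readers who want to experiment with the loop that this patch factors out, the following is a rough userspace approximation of the walk in ext4_cache_extents(). The types `toy_extent` and `toy_es` and the function `toy_cache_extents()` are hypothetical, host-endian simplifications; the real code reads little-endian on-disk fields and calls ext4_es_cache_extent() instead of filling an array.

```c
#include <stddef.h>
#include <stdint.h>

enum toy_status { TOY_WRITTEN, TOY_UNWRITTEN, TOY_HOLE };

/* Simplified, host-endian stand-in for struct ext4_extent. */
struct toy_extent {
	uint32_t lblk;		/* first logical block */
	uint32_t len;		/* number of blocks */
	uint64_t pblk;		/* first physical block */
	int unwritten;		/* non-zero for an unwritten extent */
};

/* One cache entry produced by the walk. */
struct toy_es {
	uint32_t lblk;
	uint32_t len;
	uint64_t pblk;
	enum toy_status status;
};

/*
 * Walk extents sorted by logical block. Emit a hole entry for each gap
 * between consecutive extents, then the extent itself. Returns the
 * number of entries written to out (never more than out_max).
 */
static size_t toy_cache_extents(const struct toy_extent *ex, size_t n,
				struct toy_es *out, size_t out_max)
{
	uint32_t prev = 0;
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		if (prev && prev != ex[i].lblk && used < out_max)
			out[used++] = (struct toy_es){
				prev, ex[i].lblk - prev, ~0ULL, TOY_HOLE };

		if (used < out_max)
			out[used++] = (struct toy_es){
				ex[i].lblk, ex[i].len, ex[i].pblk,
				ex[i].unwritten ? TOY_UNWRITTEN : TOY_WRITTEN };

		prev = ex[i].lblk + ex[i].len;
	}
	return used;
}
```

As in the patch, prev starts at 0, so a gap before the first extent is not recorded as a hole. Two extents at [0, 10) and [20, 25) yield three entries: the first extent, a hole covering [10, 20), and the second extent.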
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -510,6 +510,30 @@ int ext4_ext_check_inode(struct inode *i
 	return ext4_ext_check(inode, ext_inode_hdr(inode), ext_depth(inode), 0);
 }
 
+static void ext4_cache_extents(struct inode *inode,
+			       struct ext4_extent_header *eh)
+{
+	struct ext4_extent *ex = EXT_FIRST_EXTENT(eh);
+	ext4_lblk_t prev = 0;
+	int i;
+
+	for (i = le16_to_cpu(eh->eh_entries); i > 0; i--, ex++) {
+		unsigned int status = EXTENT_STATUS_WRITTEN;
+		ext4_lblk_t lblk = le32_to_cpu(ex->ee_block);
+		int len = ext4_ext_get_actual_len(ex);
+
+		if (prev && (prev != lblk))
+			ext4_es_cache_extent(inode, prev, lblk - prev, ~0,
+					     EXTENT_STATUS_HOLE);
+
+		if (ext4_ext_is_unwritten(ex))
+			status = EXTENT_STATUS_UNWRITTEN;
+		ext4_es_cache_extent(inode, lblk, len,
+				     ext4_ext_pblock(ex), status);
+		prev = lblk + len;
+	}
+}
+
 static struct buffer_head *
 __read_extent_tree_block(const char *function, unsigned int line,
 			 struct inode *inode, ext4_fsblk_t pblk, int depth,
@@ -540,26 +564,7 @@ __read_extent_tree_block(const char *fun
 	 */
 	if (!(flags & EXT4_EX_NOCACHE) && depth == 0) {
 		struct ext4_extent_header *eh = ext_block_hdr(bh);
-		struct ext4_extent *ex = EXT_FIRST_EXTENT(eh);
-		ext4_lblk_t prev = 0;
-		int i;
-
-		for (i = le16_to_cpu(eh->eh_entries); i > 0; i--, ex++) {
-			unsigned int status = EXTENT_STATUS_WRITTEN;
-			ext4_lblk_t lblk = le32_to_cpu(ex->ee_block);
-			int len = ext4_ext_get_actual_len(ex);
-
-			if (prev && (prev != lblk))
-				ext4_es_cache_extent(inode, prev,
-						     lblk - prev, ~0,
-						     EXTENT_STATUS_HOLE);
-
-			if (ext4_ext_is_unwritten(ex))
-				status = EXTENT_STATUS_UNWRITTEN;
-			ext4_es_cache_extent(inode, lblk, len,
-					     ext4_ext_pblock(ex), status);
-			prev = lblk + len;
-		}
+		ext4_cache_extents(inode, eh);
 	}
 	return bh;
 errout:
@@ -907,6 +912,8 @@ ext4_find_extent(struct inode *inode, ex
 	path[0].p_bh = NULL;
 
 	i = depth;
+	if (!(flags & EXT4_EX_NOCACHE) && depth == 0)
+		ext4_cache_extents(inode, eh);
 	/* walk through the tree */
 	while (i) {
 		ext_debug("depth %d: num %d, max %d\n",
Thread overview: 85+ messages
2020-05-01 13:20 [PATCH 4.9 00/80] 4.9.221-rc1 review Greg Kroah-Hartman
2020-05-01 13:20 ` Greg Kroah-Hartman [this message]
2020-05-01 13:20 ` [PATCH 4.9 02/80] net: ipv4: emulate READ_ONCE() on ->hdrincl bit-field in raw_sendmsg() Greg Kroah-Hartman
2020-05-01 13:20 ` [PATCH 4.9 03/80] net: ipv4: avoid unused variable warning for sysctl Greg Kroah-Hartman
2020-05-01 13:20 ` [PATCH 4.9 04/80] drm/msm: Use the correct dma_sync calls harder Greg Kroah-Hartman
2020-05-01 13:20 ` [PATCH 4.9 05/80] crypto: mxs-dcp - make symbols sha1_null_hash and sha256_null_hash static Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 06/80] vti4: removed duplicate log message Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 07/80] watchdog: reset last_hw_keepalive time at start Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 08/80] scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_login Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 09/80] ceph: return ceph_mdsc_do_request() errors from __get_parent() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 10/80] ceph: dont skip updating wanted caps when cap is stale Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 11/80] pwm: rcar: Fix late Runtime PM enablement Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 12/80] scsi: iscsi: Report unbind session event when the target has been removed Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 13/80] ASoC: Intel: atom: Take the drv->lock mutex before calling sst_send_slot_map() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 14/80] kernel/gcov/fs.c: gcov_seq_next() should increase position index Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 15/80] ipc/util.c: sysvipc_find_ipc() " Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 16/80] s390/cio: avoid duplicated ADD uevents Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 17/80] pwm: renesas-tpu: Fix late Runtime PM enablement Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 18/80] pwm: bcm2835: Dynamically allocate base Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 19/80] PCI/ASPM: Allow re-enabling Clock PM Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 20/80] ipv6: fix restrict IPV6_ADDRFORM operation Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 21/80] macsec: avoid to set wrong mtu Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 22/80] macvlan: fix null dereference in macvlan_device_event() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 23/80] net: netrom: Fix potential nr_neigh refcnt leak in nr_add_node Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 24/80] net/x25: Fix x25_neigh refcnt leak when receiving frame Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 25/80] tcp: cache line align MAX_TCP_HEADER Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 26/80] team: fix hang in team_mode_get() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 27/80] net: dsa: b53: Fix ARL register definitions Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 28/80] xfrm: Always set XFRM_TRANSFORMED in xfrm{4,6}_output_finish Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 29/80] ALSA: hda: Remove ASUS ROG Zenith from the blacklist Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 30/80] iio: xilinx-xadc: Fix ADC-B powerdown Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 31/80] iio: xilinx-xadc: Fix clearing interrupt when enabling trigger Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 32/80] iio: xilinx-xadc: Fix sequencer configuration for aux channels in simultaneous mode Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 33/80] fs/namespace.c: fix mountpoint reference counter race Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 34/80] USB: sisusbvga: Change port variable from signed to unsigned Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 35/80] USB: Add USB_QUIRK_DELAY_CTRL_MSG and USB_QUIRK_DELAY_INIT for Corsair K70 RGB RAPIDFIRE Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 36/80] USB: core: Fix free-while-in-use bug in the USB S-Glibrary Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 37/80] USB: hub: Fix handling of connect changes during sleep Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 38/80] overflow.h: Add arithmetic shift helper Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 39/80] vmalloc: fix remap_vmalloc_range() bounds checks Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 40/80] ALSA: usx2y: Fix potential NULL dereference Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 41/80] ALSA: usb-audio: Fix usb audio refcnt leak when getting spdif Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 42/80] ALSA: usb-audio: Filter out unsupported sample rates on Focusrite devices Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 43/80] tpm/tpm_tis: Free IRQ if probing fails Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 44/80] KVM: Check validity of resolved slot when searching memslots Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 45/80] KVM: VMX: Enable machine check support for 32bit targets Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 46/80] tty: hvc: fix buffer overflow during hvc_alloc() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 47/80] tty: rocket, avoid OOB access Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 48/80] usb-storage: Add unusual_devs entry for JMicron JMS566 Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 49/80] audit: check the length of userspace generated audit records Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 50/80] ASoC: dapm: fixup dapm kcontrol widget Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 51/80] ARM: imx: provide v7_cpu_resume() only on ARM_CPU_SUSPEND=y Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 52/80] staging: comedi: dt2815: fix writing hi byte of analog output Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 53/80] staging: comedi: Fix comedi_device refcnt leak in comedi_open Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 54/80] staging: vt6656: Fix drivers TBTT timing counter Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 55/80] staging: vt6656: Power save stop wake_up_count wrap around Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 56/80] UAS: no use logging any details in case of ENODEV Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 57/80] UAS: fix deadlock in error handling and PM flushing work Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 58/80] usb: f_fs: Clear OS Extended descriptor counts to zero in ffs_data_reset() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 59/80] remoteproc: Fix wrong rvring index computation Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 60/80] fuse: fix possibly missed wake-up after abort Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 61/80] mtd: cfi: fix deadloop in cfi_cmdset_0002.c do_write_buffer Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 62/80] usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 63/80] nfsd: memory corruption in nfsd4_lock() Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 64/80] net/cxgb4: Check the return from t4_query_params properly Greg Kroah-Hartman
2020-05-01 13:21 ` [PATCH 4.9 65/80] perf/core: fix parent pid/tid in task exit events Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 66/80] bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 67/80] xfs: fix partially uninitialized structure in xfs_reflink_remap_extent Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 68/80] scsi: target: fix PR IN / READ FULL STATUS for FC Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 69/80] objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 70/80] objtool: Support Clang non-section symbols in ORC dump Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 71/80] xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 72/80] ext4: convert BUG_ONs to WARN_ONs in mballoc.c Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 73/80] of: unittest: kmemleak on changeset destroy Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 74/80] hwmon: (jc42) Fix name to have no illegal characters Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 75/80] ext4: avoid declaring fs inconsistent due to invalid file handles Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 76/80] ext4: protect journal inodes blocks using block_validity Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 77/80] ext4: dont perform block validity checks on the journal inode Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 78/80] ext4: fix block validity checks for journal inodes using indirect blocks Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 79/80] ext4: unsigned int compared against zero Greg Kroah-Hartman
2020-05-01 13:22 ` [PATCH 4.9 80/80] ext4: check for non-zero journal inum in ext4_calculate_overhead Greg Kroah-Hartman
2020-05-01 15:16 ` [PATCH 4.9 00/80] 4.9.221-rc1 review Jon Hunter
2020-05-01 21:58 ` Guenter Roeck
2020-05-01 22:43 ` Naresh Kamboju
2020-05-02 23:18 ` shuah