From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>, Matthew Wilcox <willy@infradead.org>,
Guo Xuenan <guoxuenan@huawei.com>,
Andrew Morton <akpm@linux-foundation.org>,
Sasha Levin <sashal@kernel.org>
Subject: [PATCH 6.1 37/71] readahead: avoid multiple marked readahead pages
Date: Wed, 13 Mar 2024 12:39:23 -0400
Message-ID: <20240313163957.615276-38-sashal@kernel.org>
In-Reply-To: <20240313163957.615276-1-sashal@kernel.org>
From: Jan Kara <jack@suse.cz>
[ Upstream commit ab4443fe3ca6298663a55c4a70efc6c3ce913ca6 ]
ra_alloc_folio() marks the page that should trigger the next round of async
readahead. However, it rounds the computed index up to the order of the page
being allocated, which can lead to multiple consecutive pages being
marked with the readahead flag. Consider the situation with index == 1, mark ==
1, order == 0. We insert an order 0 page at index 1 and mark it. Then we
bump order to 1 and index to 2; mark (still == 1) is rounded up to 2, so the
page at index 2 is marked as well. Then we bump order to 2 and index is
incremented to 4; mark gets rounded up to 4, so the page at index 4 is marked
as well. Having multiple pages marked within a single readahead
window confuses the readahead logic and results in the readahead window being
trimmed back to 1. This situation is triggered in particular when the maximum
readahead window size is not a power of two (in the observed case it was
768 KB), and as a result sequential read throughput suffers.
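
The walk described above can be replayed in user space. The following is
only a rough sketch: align_up(), align_down() and count_marked() are
hypothetical helpers, not kernel functions, and the loop that bumps the
order by one per allocation is a simplification that mirrors the example
(index 1, 2, 4 with order 0, 1, 2), not the real allocation loop in
mm/readahead.c.

```c
#include <assert.h>

/* User-space stand-ins for the kernel's power-of-two rounding macros. */
unsigned long align_up(unsigned long x, unsigned long align)
{
	return (x + align - 1) & ~(align - 1);
}

unsigned long align_down(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/*
 * Replay 'steps' allocations starting at 'index' with order 0 and
 * bumping the order by one after each allocation (an assumption taken
 * from the example, not the kernel's actual policy). Returns how many
 * of the allocated pages would get the readahead flag.
 */
unsigned int count_marked(unsigned long index, unsigned long mark,
			  unsigned int steps, int use_round_up)
{
	unsigned int order, marked = 0;

	for (order = 0; order < steps; order++) {
		unsigned long size = 1UL << order;
		unsigned long rounded = use_round_up ?
			align_up(mark, size) : align_down(mark, size);

		if (index == rounded)
			marked++;
		index += size;
	}
	return marked;
}

/* The case from the commit message: index == 1, mark == 1, three steps. */
void check_commit_message_example(void)
{
	assert(count_marked(1, 1, 3, 1) == 3);	/* rounding up: 1, 2, 4 marked */
	assert(count_marked(1, 1, 3, 0) == 1);	/* rounding down: only 1 marked */
}
```

With rounding up, the pages at index 1, 2 and 4 all match the rounded
mark; with rounding down, only the page at index 1 does.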
Fix the problem by rounding 'mark' down instead of up. Because the index
is naturally aligned to 'order', we are guaranteed 'rounded mark' == index
iff 'mark' is within the page we are allocating at 'index'. Thus
exactly one page is marked with the readahead flag, as required by the
readahead code, and sequential read performance is restored.
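
The "iff" claim can be checked by brute force over a small range. This is
a sketch under stated assumptions: round_down_pow2(), mark_in_folio() and
round_down_marks_containing_folio() are hypothetical user-space helpers,
and the check only covers the naturally aligned indices and the small
bounds passed in by the caller.

```c
#include <stdbool.h>

/* User-space stand-in for round_down() with a power-of-two alignment. */
unsigned long round_down_pow2(unsigned long x, unsigned long align)
{
	return x & ~(align - 1);
}

/* True when 'mark' lies inside the 2^order pages starting at 'index'. */
bool mark_in_folio(unsigned long index, unsigned long mark,
		   unsigned int order)
{
	return mark >= index && mark - index < (1UL << order);
}

/*
 * For every order up to 'max_order', every naturally aligned index
 * below 'limit' and every mark below 'limit', verify that the rounded
 * down mark equals the index exactly when the mark is inside the page
 * allocated at that index.
 */
bool round_down_marks_containing_folio(unsigned int max_order,
				       unsigned long limit)
{
	unsigned int order;
	unsigned long index, mark;

	for (order = 0; order <= max_order; order++) {
		unsigned long size = 1UL << order;

		for (index = 0; index < limit; index += size) {
			for (mark = 0; mark < limit; mark++) {
				bool hit = round_down_pow2(mark, size) == index;

				if (hit != mark_in_folio(index, mark, order))
					return false;
			}
		}
	}
	return true;
}
```

Since the pages of one readahead window do not overlap, a given 'mark'
can fall inside at most one of them, which is why rounding down cannot
flag more than one page.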
This effectively reverts part of commit b9ff43dd2743 ("mm/readahead: Fix
readahead with large folios"). The commit changed the rounding with the
rationale:
"... we were setting the readahead flag on the folio which contains the
last byte read from the block. This is wrong because we will trigger
readahead at the end of the read without waiting to see if a subsequent
read is going to use the pages we just read."
Although this is true, the fact is this was always the case with read
sizes not aligned to folio boundaries and large folios in the page cache
just make the situation more obvious (and frequent). Also for sequential
read workloads it is better to trigger the readahead earlier rather than
later. It is true that the difference in the rounding and thus earlier
triggering of the readahead can result in reading more for semi-random
workloads. However workloads really suffering from this seem to be rare.
In particular I have verified that the workload described in commit
b9ff43dd2743 ("mm/readahead: Fix readahead with large folios") of reading
random 100k blocks from a file like:
[reader]
bs=100k
rw=randread
numjobs=1
size=64g
runtime=60s
is not impacted by the rounding change and achieves ~70MB/s in both cases.
[jack@suse.cz: fix one more place where mark rounding was done as well]
Link: https://lkml.kernel.org/r/20240123153254.5206-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240104085839.21029-1-jack@suse.cz
Fixes: b9ff43dd2743 ("mm/readahead: Fix readahead with large folios")
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Guo Xuenan <guoxuenan@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
mm/readahead.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index ba43428043a35..e4b772bb70e68 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -483,7 +483,7 @@ static inline int ra_alloc_folio(struct readahead_control *ractl, pgoff_t index,
 	if (!folio)
 		return -ENOMEM;
-	mark = round_up(mark, 1UL << order);
+	mark = round_down(mark, 1UL << order);
 	if (index == mark)
 		folio_set_readahead(folio);
 	err = filemap_add_folio(ractl->mapping, folio, index, gfp);
@@ -591,7 +591,7 @@ static void ondemand_readahead(struct readahead_control *ractl,
 	 * It's the expected callback index, assume sequential access.
 	 * Ramp up sizes, and push forward the readahead window.
 	 */
-	expected = round_up(ra->start + ra->size - ra->async_size,
+	expected = round_down(ra->start + ra->size - ra->async_size,
 			1UL << order);
 	if (index == expected || index == (ra->start + ra->size)) {
 		ra->start += ra->size;
--
2.43.0