From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Andrew Perepechko <andrew.perepechko@hpe.com>,
	Alexander Zarochentsev <alexander.zarochentsev@hpe.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 03/33] lustre: llite: EIO is possible on a race with page reclaim
Date: Sun,  2 Feb 2025 15:46:03 -0500
Message-ID: <20250202204633.1148872-4-jsimmons@infradead.org>
In-Reply-To: <20250202204633.1148872-1-jsimmons@infradead.org>

From: Patrick Farrell <pfarrell@whamcloud.com>

We must clear the 'uptodate' page flag when we delete a
page from Lustre, or stale reads can occur.  However,
generic_file_buffered_read requires any pages returned from
readpage() be uptodate.

So, we must retry reading if page truncation happens in
parallel with the read.

This implements the same fix for the buffered read path that
https://review.whamcloud.com/49647
commit e02cfe39f908 ("lustre: llite: SIGBUS is possible on a race with page reclaim")
implemented for the mmap path.

WC-bug-id: https://jira.whamcloud.com/browse/LU-16649
Lustre-commit: 1d98e5c32b41e19bb ("LU-16649 llite: EIO is possible on a race with page reclaim")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50344
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
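Note for readers: the retry logic this patch adds to vvp_io_read_start()
can be illustrated with a minimal user-space sketch. Everything below is
a hypothetical stand-in, not Lustre or kernel code: struct inv_seq is a
bare counter that only approximates lli_page_inv_lock (a real seqlock
also tracks writers in progress), and fake_read() stands in for
generic_file_read_iter(). The loop retries only when the read came back
short or with -EIO and the sequence changed during the attempt.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for lli_page_inv_lock: a plain sequence counter. */
struct inv_seq {
	atomic_uint seq;
};

static unsigned int inv_seq_begin(struct inv_seq *s)
{
	return atomic_load(&s->seq);
}

/* True if an invalidation happened since inv_seq_begin() returned start. */
static bool inv_seq_retry(struct inv_seq *s, unsigned int start)
{
	return atomic_load(&s->seq) != start;
}

static void inv_seq_invalidate(struct inv_seq *s)
{
	atomic_fetch_add(&s->seq, 1);
}

/* Hypothetical file model used only to drive the sketch. */
struct fake_file {
	struct inv_seq inv;
	int races_left;	/* reads that will race with page invalidation */
	bool hard_eio;	/* fail with -EIO without any invalidation */
	size_t calls;	/* number of read attempts made */
};

/* Stand-in for generic_file_read_iter(): all-or-nothing read. */
static long fake_read(struct fake_file *f, size_t *remaining)
{
	long n;

	f->calls++;
	if (f->races_left > 0) {
		/* a page was deleted under us: bump the sequence, fail */
		f->races_left--;
		inv_seq_invalidate(&f->inv);
		return -EIO;
	}
	if (f->hard_eio)
		return -EIO;
	n = (long)*remaining;
	*remaining = 0;
	return n;
}

/* Retry on a short read or -EIO, but only if an invalidation raced us. */
static long read_with_retry(struct fake_file *f, size_t count, size_t *total)
{
	size_t remaining = count;
	unsigned int seq;
	long result;

	*total = 0;
	do {
		seq = inv_seq_begin(&f->inv);
		result = fake_read(f, &remaining);
		if (result >= 0)
			*total += (size_t)result;
	} while (inv_seq_retry(&f->inv, seq) &&
		 ((result >= 0 && remaining > 0) || result == -EIO));

	return result;
}
```

In this sketch, a file set up with races_left = 2 makes three read
attempts before succeeding, while one set up with hard_eio = true fails
after a single attempt, because an -EIO that did not coincide with an
invalidation is treated as a genuine error.
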
 fs/lustre/include/obd_support.h |  1 +
 fs/lustre/llite/file.c          | 11 +++++++----
 fs/lustre/llite/rw.c            |  8 ++++++++
 fs/lustre/llite/vvp_io.c        | 25 ++++++++++++++++++++++---
 4 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/fs/lustre/include/obd_support.h b/fs/lustre/include/obd_support.h
index cee7e3164d66..0a63af11db35 100644
--- a/fs/lustre/include/obd_support.h
+++ b/fs/lustre/include/obd_support.h
@@ -492,6 +492,7 @@ extern char obd_jobid_var[];
 #define OBD_FAIL_LLITE_PAGE_INVALIDATE_PAUSE		0x1421
 #define OBD_FAIL_LLITE_READPAGE_PAUSE			0x1422
 #define OBD_FAIL_LLITE_PANIC_ON_ESTALE			0x1423
+#define OBD_FAIL_LLITE_READPAGE_PAUSE2			0x1424
 
 #define OBD_FAIL_FID_INDIR				0x1501
 #define OBD_FAIL_FID_INLMA				0x1502
diff --git a/fs/lustre/llite/file.c b/fs/lustre/llite/file.c
index c99e9c01bc65..b2751b571ea9 100644
--- a/fs/lustre/llite/file.c
+++ b/fs/lustre/llite/file.c
@@ -1990,12 +1990,15 @@ ll_do_fast_read(struct kiocb *iocb, struct iov_iter *iter)
 
 	result = generic_file_read_iter(iocb, iter);
 
-	/*
-	 * If the first page is not in cache, generic_file_aio_read() will be
-	 * returned with -ENODATA.
+	/* If the first page is not in cache, generic_file_aio_read() returns
+	 * -ENODATA.  Fall back to the full read path.
 	 * See corresponding code in ll_readpage().
+	 *
+	 * If we raced with page deletion, we might get EIO.  Rather than add
+	 * locking to the fast path for this rare case, fall back to the full
+	 * read path.  (See vvp_io_read_start() for the rest of the handling.)
 	 */
-	if (result == -ENODATA)
+	if (result == -ENODATA || result == -EIO)
 		result = 0;
 
 	if (result > 0) {
diff --git a/fs/lustre/llite/rw.c b/fs/lustre/llite/rw.c
index 0c73258428e6..92a9c252247e 100644
--- a/fs/lustre/llite/rw.c
+++ b/fs/lustre/llite/rw.c
@@ -2046,5 +2046,13 @@ int ll_readpage(struct file *file, struct page *vmpage)
 	if (ra.cra_release)
 		cl_read_ahead_release(env, &ra);
 
+	/* this delay gives time for the actual read of the page to finish and
+	 * unlock the page in vvp_page_completion_read before we return to our
+	 * caller and the caller tries to use the page, allowing us to test
+	 * races with the page being unlocked after readpage() but before it's
+	 * used by the caller
+	 */
+	OBD_FAIL_TIMEOUT(OBD_FAIL_LLITE_READPAGE_PAUSE2, cfs_fail_val);
+
 	return result;
 }
diff --git a/fs/lustre/llite/vvp_io.c b/fs/lustre/llite/vvp_io.c
index 26dfaaa76bd9..86dab3b1a39f 100644
--- a/fs/lustre/llite/vvp_io.c
+++ b/fs/lustre/llite/vvp_io.c
@@ -811,8 +811,10 @@ static int vvp_io_read_start(const struct lu_env *env,
 	size_t cnt = io->u.ci_rd.rd.crw_count;
 	size_t tot = vio->vui_tot_count;
 	struct ll_cl_context *lcc;
+	unsigned int seq;
 	int exceed = 0;
 	int result;
+	int total_bytes_read = 0;
 	struct iov_iter iter;
 	pgoff_t page_offset;
 
@@ -878,12 +880,29 @@ static int vvp_io_read_start(const struct lu_env *env,
 	lcc->lcc_end_index = DIV_ROUND_UP(pos + iter.count, PAGE_SIZE);
 	CDEBUG(D_VFSTRACE, "count:%ld iocb pos:%lld\n", iter.count, pos);
 
-	result = generic_file_read_iter(vio->vui_iocb, &iter);
+	/* this seqlock lets us notice if a page has been deleted on this inode
+	 * during the read, allowing us to catch an erroneous short read or
+	 * -EIO
+	 * See LU-16160
+	 */
+	do {
+		seq = read_seqbegin(&ll_i2info(inode)->lli_page_inv_lock);
+		result = generic_file_read_iter(vio->vui_iocb, &iter);
+		if (result >= 0) {
+			io->ci_nob += result;
+			total_bytes_read += result;
+		}
+	/* if we got a short read or -EIO and we raced with page invalidation,
+	 * retry
+	 */
+	} while (read_seqretry(&ll_i2info(inode)->lli_page_inv_lock, seq) &&
+		 ((result >= 0 && iov_iter_count(&iter) > 0) ||
+		  result == -EIO));
+
 out:
 	if (result >= 0) {
-		if (result < cnt)
+		if (total_bytes_read < cnt)
 			io->ci_continue = 0;
-		io->ci_nob += result;
 		result = 0;
 	} else if (result == -EIOCBQUEUED) {
 		io->ci_nob += vio->u.readwrite.vui_read;
-- 
2.39.3

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org
