[PATCH v2 2/3] bcache: fix io error during cache read race

Linux bcache driver list
 help / color / mirror / Atom feed

From: mingzhe.zou@easystack.cn
To: colyli@suse.de
Cc: linux-bcache@vger.kernel.org, dongsheng.yang@easystack.cn,
	zoumingzhe@qq.com
Subject: [PATCH v2 2/3] bcache: fix io error during cache read race
Date: Tue, 19 Nov 2024 15:40:30 +0800	[thread overview]
Message-ID: <20241119074031.27340-2-mingzhe.zou@easystack.cn> (raw)
In-Reply-To: <20241119074031.27340-1-mingzhe.zou@easystack.cn>

From: Mingzhe Zou <mingzhe.zou@easystack.cn>

In our production environment, bcache returned IO_ERROR(errno=-5).
These errors always happen during 1M read IO under high pressure
and without any message log. When the error occurred, we stopped
all reading and writing and used 1M read IO to read the entire disk
without any errors. Later we found that cache_read_races of cache_set
is non-zero.

If a large (1M) read bio is split into two or more bios, when one bio
reads dirty data, s->read_dirty_data will be set to true and remain.
If the bucket was reused while our subsequent read bio was in flight,
the read will be unrecoverable(cannot read data from backing).

This patch increases the count for bucket->pin to prevent the bucket
from being reclaimed and reused.

Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
---
 drivers/md/bcache/request.c | 39 ++++++++++++++++++++++++-------------
 1 file changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index af345dc6fde1..6c41957138e5 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -502,12 +502,8 @@ static void bch_cache_read_endio(struct bio *bio)
 	struct closure *cl = bio->bi_private;
 	struct search *s = container_of(cl, struct search, cl);
 
-	/*
-	 * If the bucket was reused while our bio was in flight, we might have
-	 * read the wrong data. Set s->error but not error so it doesn't get
-	 * counted against the cache device, but we'll still reread the data
-	 * from the backing device.
-	 */
+	BUG_ON(ptr_stale(s->iop.c, &b->key, 0)); // bucket should not be reused
+	atomic_dec(&PTR_BUCKET(s->iop.c, &b->key, 0)->pin);
 
 	if (bio->bi_status)
 		s->iop.status = bio->bi_status;
@@ -520,6 +516,8 @@ static void bch_cache_read_endio(struct bio *bio)
 	bch_bbio_endio(s->iop.c, bio, bio->bi_status, "reading from cache");
 }
 
+static void backing_request_endio(struct bio *bio);
+
 /*
  * Read from a single key, handling the initial cache miss if the key starts in
  * the middle of the bio
@@ -529,7 +527,6 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
 	struct search *s = container_of(op, struct search, op);
 	struct bio *n, *bio = &s->bio.bio;
 	struct bkey *bio_key;
-	unsigned int ptr;
 
 	if (bkey_cmp(k, &KEY(s->iop.inode, bio->bi_iter.bi_sector, 0)) <= 0)
 		return MAP_CONTINUE;
@@ -553,20 +550,36 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
 	if (!KEY_SIZE(k))
 		return MAP_CONTINUE;
 
-	/* XXX: figure out best pointer - for multiple cache devices */
-	ptr = 0;
+	/*
+	 * If the bucket was reused while our bio was in flight, we might have
+	 * read the wrong data. Set s->cache_read_races and reread the data
+	 * from the backing device.
+	 */
+	spin_lock(&PTR_BUCKET(b->c, k, 0)->lock);
+	if (ptr_stale(s->iop.c, k, 0)) {
+		spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
+		atomic_long_inc(&s->iop.c->cache_read_races);
+		pr_warn("%pU cache read race count: %lu", s->iop.c->sb.set_uuid,
+			atomic_long_read(&s->iop.c->cache_read_races));
 
-	PTR_BUCKET(b->c, k, ptr)->prio = INITIAL_PRIO;
+		n->bi_end_io	= backing_request_endio;
+		n->bi_private	= &s->cl;
+
+		/* I/O request sent to backing device */
+		closure_bio_submit(s->iop.c, n, &s->cl);
+		return n == bio ? MAP_DONE : MAP_CONTINUE;
+	}
+	atomic_inc(&PTR_BUCKET(s->iop.c, k, 0)->pin);
+	spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
 
-	if (KEY_DIRTY(k))
-		s->read_dirty_data = true;
+	PTR_BUCKET(b->c, k, 0)->prio = INITIAL_PRIO;
 
 	n = bio_next_split(bio, min_t(uint64_t, INT_MAX,
 				      KEY_OFFSET(k) - bio->bi_iter.bi_sector),
 			   GFP_NOIO, &s->d->bio_split);
 
 	bio_key = &container_of(n, struct bbio, bio)->key;
-	bch_bkey_copy_single_ptr(bio_key, k, ptr);
+	bch_bkey_copy_single_ptr(bio_key, k, 0);
 
 	bch_cut_front(&KEY(s->iop.inode, n->bi_iter.bi_sector, 0), bio_key);
 	bch_cut_back(&KEY(s->iop.inode, bio_end_sector(n), 0), bio_key);
-- 
2.34.1

next prev parent reply	other threads:[~2024-11-19 15:37 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-19  7:40 [PATCH v2 1/3] bcache: avoid invalidating buckets in use mingzhe.zou
2024-11-19  7:40 ` mingzhe.zou [this message]
2024-12-20  5:20   ` Re:[PATCH v2 2/3] bcache: fix io error during cache read race 邹明哲
2024-12-21 16:17     ` [PATCH " colyli
2024-12-24  2:38       ` 邹明哲
2024-11-19  7:40 ` [PATCH v2 3/3] bcache: remove unused parameters mingzhe.zou
2025-01-22  4:42 ` [PATCH v2 1/3] bcache: avoid invalidating buckets in use Coly Li

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:af345dc6fde dfblob:6c41957138e )
 OR (
bs:"[PATCH v2 2/3] bcache: fix io error during cache read race" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241119074031.27340-2-mingzhe.zou@easystack.cn \
    --to=mingzhe.zou@easystack.cn \
    --cc=colyli@suse.de \
    --cc=dongsheng.yang@easystack.cn \
    --cc=linux-bcache@vger.kernel.org \
    --cc=zoumingzhe@qq.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox