From: colyli <colyli@suse.de>
To: 邹明哲 <mingzhe.zou@easystack.cn>
Cc: linux-bcache <linux-bcache@vger.kernel.org>,
zoumingzhe <zoumingzhe@qq.com>
Subject: Re: [PATCH v2 2/3] bcache: fix io error during cache read race
Date: Sun, 22 Dec 2024 00:17:06 +0800 [thread overview]
Message-ID: <4da838e985ec3f7263a4ad4a9b307d15@suse.de> (raw)
In-Reply-To: <AMcAQwBHLpIksKhhq8fkdqro.3.1734672038253.Hmail.mingzhe.zou@easystack.cn>
On 2024-12-20 13:20, 邹明哲 wrote:
> Hi, Coly:
>
> Our users have reported this issue in their production
> environment!
>
> Please review these patches and provide feedback.
>
> Thank you very much.
Hi Mingzhe,
Yes, it is planned for next week; I will start to look at this series.
BTW, I don't see a change log from the v1 to the v2 series. Can I assume
the v2 series fixes the warning reported by the kernel test robot?
Thanks.
Coly Li
>
> mingzhe
>
> Original:
> From: mingzhe.zou <mingzhe.zou@easystack.cn>
> Date: 2024-11-19 15:40:30 (China (GMT+08:00))
> To: colyli <colyli@suse.de>
> Cc: linux-bcache <linux-bcache@vger.kernel.org>,
> dongsheng.yang <dongsheng.yang@easystack.cn>,
> zoumingzhe <zoumingzhe@qq.com>
> Subject: [PATCH v2 2/3] bcache: fix io error during cache read race
> From: Mingzhe Zou <mingzhe.zou@easystack.cn>
>
> In our production environment, bcache returned IO_ERROR (errno=-5).
> These errors always happened during 1M read IO under high pressure,
> with no log message. When the errors occurred, we stopped all reads
> and writes and used 1M read IO to read the entire disk without any
> errors. Later we found that cache_read_races of the cache_set was
> non-zero.
>
> If a large (1M) read bio is split into two or more bios, then once one
> bio reads dirty data, s->read_dirty_data is set to true and remains set.
> If the bucket is reused while a subsequent read bio is in flight, the
> read becomes unrecoverable (the data cannot be re-read from the
> backing device).
>
> This patch increments bucket->pin to prevent the bucket from being
> reclaimed and reused while a read is in flight.
>
> Signed-off-by: Mingzhe Zou <mingzhe.zou@easystack.cn>
> ---
> drivers/md/bcache/request.c | 39 ++++++++++++++++++++++++-------------
> 1 file changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index af345dc6fde1..6c41957138e5 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -502,12 +502,8 @@ static void bch_cache_read_endio(struct bio *bio)
> struct closure *cl = bio->bi_private;
> struct search *s = container_of(cl, struct search, cl);
>
> - /*
> - * If the bucket was reused while our bio was in flight, we might have
> - * read the wrong data. Set s->error but not error so it doesn't get
> - * counted against the cache device, but we'll still reread the data
> - * from the backing device.
> - */
> + BUG_ON(ptr_stale(s->iop.c, &b->key, 0)); // bucket should not be reused
> + atomic_dec(&PTR_BUCKET(s->iop.c, &b->key, 0)->pin);
>
> if (bio->bi_status)
> s->iop.status = bio->bi_status;
> @@ -520,6 +516,8 @@ static void bch_cache_read_endio(struct bio *bio)
> bch_bbio_endio(s->iop.c, bio, bio->bi_status, "reading from cache");
> }
>
> +static void backing_request_endio(struct bio *bio);
> +
> /*
> * Read from a single key, handling the initial cache miss if the key starts in
> * the middle of the bio
> @@ -529,7 +527,6 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
> struct search *s = container_of(op, struct search, op);
> struct bio *n, *bio = &s->bio.bio;
> struct bkey *bio_key;
> - unsigned int ptr;
>
> if (bkey_cmp(k, &KEY(s->iop.inode, bio->bi_iter.bi_sector, 0)) <= 0)
> return MAP_CONTINUE;
> @@ -553,20 +550,36 @@ static int cache_lookup_fn(struct btree_op *op, struct btree *b, struct bkey *k)
> if (!KEY_SIZE(k))
> return MAP_CONTINUE;
>
> - /* XXX: figure out best pointer - for multiple cache devices */
> - ptr = 0;
> + /*
> + * If the bucket was reused while our bio was in flight, we might have
> + * read the wrong data. Set s->cache_read_races and reread the data
> + * from the backing device.
> + */
> + spin_lock(&PTR_BUCKET(b->c, k, 0)->lock);
> + if (ptr_stale(s->iop.c, k, 0)) {
> + spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
> + atomic_long_inc(&s->iop.c->cache_read_races);
> + pr_warn("%pU cache read race count: %lu", s->iop.c->sb.set_uuid,
> + atomic_long_read(&s->iop.c->cache_read_races));
>
> - PTR_BUCKET(b->c, k, ptr)->prio = INITIAL_PRIO;
> + n->bi_end_io = backing_request_endio;
> + n->bi_private = &s->cl;
> +
> + /* I/O request sent to backing device */
> + closure_bio_submit(s->iop.c, n, &s->cl);
> + return n == bio ? MAP_DONE : MAP_CONTINUE;
> + }
> + atomic_inc(&PTR_BUCKET(s->iop.c, k, 0)->pin);
> + spin_unlock(&PTR_BUCKET(b->c, k, 0)->lock);
>
> - if (KEY_DIRTY(k))
> - s->read_dirty_data = true;
> + PTR_BUCKET(b->c, k, 0)->prio = INITIAL_PRIO;
>
> n = bio_next_split(bio, min_t(uint64_t, INT_MAX,
> KEY_OFFSET(k) - bio->bi_iter.bi_sector),
> GFP_NOIO, &s->d->bio_split);
>
> bio_key = &container_of(n, struct bbio, bio)->key;
> - bch_bkey_copy_single_ptr(bio_key, k, ptr);
> + bch_bkey_copy_single_ptr(bio_key, k, 0);
>
> bch_cut_front(&KEY(s->iop.inode, n->bi_iter.bi_sector, 0), bio_key);
> bch_cut_back(&KEY(s->iop.inode, bio_end_sector(n), 0), bio_key);
Thread overview:
2024-11-19 7:40 [PATCH v2 1/3] bcache: avoid invalidating buckets in use mingzhe.zou
2024-11-19 7:40 ` [PATCH v2 2/3] bcache: fix io error during cache read race mingzhe.zou
2024-12-20 5:20 ` 邹明哲
2024-12-21 16:17 ` colyli [this message]
2024-12-24 2:38 ` [PATCH " 邹明哲
2024-11-19 7:40 ` [PATCH v2 3/3] bcache: remove unused parameters mingzhe.zou
2025-01-22 4:42 ` [PATCH v2 1/3] bcache: avoid invalidating buckets in use Coly Li