Re: [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Chen Cheng" <chencheng@fnnas.com>
Cc: yukuai@fygo.io, linux-raid@vger.kernel.org
Subject: Re: [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write
Date: Tue, 16 Jun 2026 13:30:33 +0000	[thread overview]
Message-ID: <20260616133034.015501F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260616131310.3790554-1-chencheng@fnnas.com>

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [Critical] Missing full memory barriers between the bitmap update/scan and `seq_flush` access allows torn writes.
--

commit 45ef15a300eccd14518bc90e254146a73a49bf51
Author: Chen Cheng <chencheng@fnnas.com>

md/raid5: protect bitmap batch counters aka seq_flush/seq_write

This commit attempts to fix a KCSAN data-race report on conf->seq_flush
in the raid5 code by introducing READ_ONCE() and WRITE_ONCE().

> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index d0d7f96fd7cd9..ffc89ebbc4857 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3538,7 +3538,7 @@ static void __add_stripe_bio(struct stripe_head *sh, struct bio *bi,
>  		 sh->dev[dd_idx].sector);
>  
>  	if (conf->mddev->bitmap && firstwrite && !sh->batch_head) {
> -		sh->bm_seq = conf->seq_flush+1;
> +		sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;

[Severity: Critical]
This is a pre-existing issue, but does this lockless synchronization require
a full memory barrier (smp_mb()) before reading seq_flush?

The producer sequence updates the in-memory bitmap (which drops a lock,
providing only a Store-Release barrier) and then reads seq_flush. Because
there is no full memory barrier between the store to the bitmap and the load
of seq_flush, weakly-ordered CPUs (and even x86, which allows Store-Load
reordering) can reorder the read of seq_flush before the store to the
bitmap becomes globally visible.

Could this lead to silent data corruption on a crash if the consumer sees
the old bitmap state but the producer assigns the stripe to the current
batch?

>  		set_bit(STRIPE_BIT_DELAY, &sh->state);
>  	}
>  }

[ ... ]

> @@ -5755,7 +5755,7 @@ static void make_discard_request(struct mddev *mddev, struct bio *bi)
>  		}
>  		spin_unlock_irq(&sh->stripe_lock);
>  		if (conf->mddev->bitmap) {
> -			sh->bm_seq = conf->seq_flush + 1;
> +			sh->bm_seq = READ_ONCE(conf->seq_flush) + 1;

[Severity: Critical]
This is a pre-existing issue, but the same missing full memory barrier
pattern appears here in the discard path as well.

>  			set_bit(STRIPE_BIT_DELAY, &sh->state);
>  		}
>  

[ ... ]

> @@ -6805,12 +6805,13 @@ static void raid5d(struct md_thread *thread)
>  		if (
>  		    !list_empty(&conf->bitmap_list)) {
>  			/* Now is a good time to flush some bitmap updates */
> -			conf->seq_flush++;
> +			int seq = READ_ONCE(conf->seq_flush) + 1;
> +			WRITE_ONCE(conf->seq_flush, seq);

[Severity: Critical]
This isn't a bug introduced by this patch, but does the consumer side
also rely dangerously on implicit barriers?

The consumer (raid5d) writes seq_flush and relies on an implicit full
barrier deep inside unplug() (via test_and_clear_bit) to order the
subsequent bitmap scan.

If the producer's missing barrier allows the read of seq_flush to be
reordered, the consumer might flush the old batch and advance seq_write,
unblocking the producer's stripe to be written to disk without its
write-intent bitmap bit being flushed.

>  			spin_unlock_irq(&conf->device_lock);
>  			if (md_bitmap_enabled(mddev, true))
>  				mddev->bitmap_ops->unplug(mddev, true);
>  			spin_lock_irq(&conf->device_lock);
> -			conf->seq_write = conf->seq_flush;
> +			WRITE_ONCE(conf->seq_write, seq);
>  			activate_bit_delay(conf, conf->temp_inactive_list);
>  		}
>  		raid5_activate_delayed(conf);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260616131310.3790554-1-chencheng@fnnas.com?part=1

     prev parent reply	other threads:[~2026-06-16 13:30 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-16 13:13 [PATCH] md/raid5: protect bitmap batch counters aka seq_flush/seq_write Chen Cheng
2026-06-16 13:30 ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260616133034.015501F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=chencheng@fnnas.com \
    --cc=linux-raid@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=yukuai@fygo.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.