From: Ming Lei <ming.lei@redhat.com>
To: Waiman Long <longman@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org, Tejun Heo <tj@kernel.org>,
ming.lei@redhat.com
Subject: Re: [PATCH 2/2] blk-cgroup: fix list corruption from reorder of WRITE ->lqueued
Date: Thu, 16 May 2024 08:58:50 +0800 [thread overview]
Message-ID: <ZkVaSg154oU3XltA@fedora> (raw)
In-Reply-To: <2a0cae15-e9d9-4524-a0db-f665b1832db4@redhat.com>
On Wed, May 15, 2024 at 10:09:30AM -0400, Waiman Long wrote:
>
> On 5/14/24 21:31, Ming Lei wrote:
> > __blkcg_rstat_flush() can be run anytime, especially when blk_cgroup_bio_start
> > is being executed.
> >
> > If WRITE of `->lqueued` is re-ordered with READ of 'bisc->lnode.next' in
> > the loop of __blkcg_rstat_flush(), `next_bisc` can be assigned with one
> > stat instance being added in blk_cgroup_bio_start(), then the local
> > list in __blkcg_rstat_flush() could be corrupted.
> >
> > Fix the issue by adding one barrier.
> >
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Waiman Long <longman@redhat.com>
> > Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> > block/blk-cgroup.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 86752b1652b5..b36ba1d40ba1 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -1036,6 +1036,16 @@ static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
> > struct blkg_iostat cur;
> > unsigned int seq;
> > + /*
> > + * Order assignment of `next_bisc` from `bisc->lnode.next` in
> > + * llist_for_each_entry_safe and clearing `bisc->lqueued` for
> > + * avoiding to assign `next_bisc` with new next pointer added
> > + * in blk_cgroup_bio_start() in case of re-ordering.
> > + *
> > + * The pair barrier is implied in llist_add() in blk_cgroup_bio_start().
> > + */
> > + smp_mb();
> > +
> > WRITE_ONCE(bisc->lqueued, false);
>
> I believe replacing WRITE_ONCE() by smp_store_release() and the READ_ONCE()
> in blk_cgroup_bio_start() by smp_load_acquire() should provide enough
> barrier to prevent unexpected reordering as
Yeah, smp_load_acquire() and smp_store_release() pair works too, but with
one extra cost of smp_mb() around llist_add() which implies barrier
already.
So I prefer to this patch.
> the subsequent
> u64_stats_fetch_begin() will also provide a read barrier for subsequent
> read.
u64_stats_fetch_begin() is nop in case of `BITS_PER_LONG == 64`.
Thanks,
Ming
next prev parent reply other threads:[~2024-05-16 0:59 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-15 1:31 [PATCH 0/2] blk-cgroup: two fixes on list corruption Ming Lei
2024-05-15 1:31 ` [PATCH 1/2] blk-cgroup: fix list corruption from resetting io stat Ming Lei
2024-05-15 13:59 ` Waiman Long
2024-05-15 16:49 ` Tejun Heo
2024-05-15 1:31 ` [PATCH 2/2] blk-cgroup: fix list corruption from reorder of WRITE ->lqueued Ming Lei
2024-05-15 14:09 ` Waiman Long
2024-05-15 14:46 ` Waiman Long
2024-05-16 0:58 ` Ming Lei [this message]
2024-05-16 2:21 ` [PATCH 0/2] blk-cgroup: two fixes on list corruption Jens Axboe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZkVaSg154oU3XltA@fedora \
--to=ming.lei@redhat.com \
--cc=axboe@kernel.dk \
--cc=linux-block@vger.kernel.org \
--cc=longman@redhat.com \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.