From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Andrea Parri <parri.andrea@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Paul McKenney <paulmck@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: blk->id read race: was: [PATCH v2 2/3] printk: add lockless buffer
Date: Wed, 10 Jun 2020 10:42:48 +0200 [thread overview]
Message-ID: <20200610084248.GA4311@linux-b0ei> (raw)
In-Reply-To: <87tuzkuxtw.fsf@vostro.fn.ogness.net>
On Tue 2020-06-09 16:18:35, John Ogness wrote:
> On 2020-06-09, Petr Mladek <pmladek@suse.com> wrote:
> >> --- /dev/null
> >> +++ b/kernel/printk/printk_ringbuffer.c
> >> +/*
> >> + * Given a data ring (text or dict), put the associated descriptor of each
> >> + * data block from @lpos_begin until @lpos_end into the reusable state.
> >> + *
> >> + * If there is any problem making the associated descriptor reusable, either
> >> + * the descriptor has not yet been committed or another writer task has
> >> + * already pushed the tail lpos past the problematic data block. Regardless,
> >> + * on error the caller can re-load the tail lpos to determine the situation.
> >> + */
> >> +static bool data_make_reusable(struct printk_ringbuffer *rb,
> >> + struct prb_data_ring *data_ring,
> >> + unsigned long lpos_begin,
> >> + unsigned long lpos_end,
> >> + unsigned long *lpos_out)
> >> +{
> >> + struct prb_desc_ring *desc_ring = &rb->desc_ring;
> >> + struct prb_data_blk_lpos *blk_lpos;
> >> + struct prb_data_block *blk;
> >> + unsigned long tail_lpos;
> >> + enum desc_state d_state;
> >> + struct prb_desc desc;
> >> + unsigned long id;
> >> +
> >> + /*
> >> + * Using the provided @data_ring, point @blk_lpos to the correct
> >> + * blk_lpos within the local copy of the descriptor.
> >> + */
> >> + if (data_ring == &rb->text_data_ring)
> >> + blk_lpos = &desc.text_blk_lpos;
> >> + else
> >> + blk_lpos = &desc.dict_blk_lpos;
> >> +
> >> + /* Loop until @lpos_begin has advanced to or beyond @lpos_end. */
> >> + while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) {
> >> + blk = to_block(data_ring, lpos_begin);
> >> + id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */
> >
> > This would deserve some comment:
> >
> > 1. Compiler could not optimize out the read because there is a data
> > dependency on lpos_begin.
> >
> > 2. Compiler could not postpone the read because it is followed by
> > smp_rmb().
> >
> > So, is READ_ONCE() realy needed?
>
> I agree that it is not needed. Both the READ_ONCE() and its countering
> WRITE_ONCE() (data_alloc:B) only document the lockless shared access. I
> will remove both for the next version.
Sounds good.
> Do we still need a comment? Is it not obvious that there is a data
> dependency on @lpos_begin?
Sigh, I just wonder why I am always confusedby this. See below.
> blk = to_block(data_ring, lpos_begin);
> id = blk->id;
>
> > Well, blk->id clearly can be modified in parallel so we need to be
> > careful. There is smp_rmb() right below. Do we needed smp_rmb() also
> > before?
> >
> > What about the following scenario?:
> >
> >
> > CPU0 CPU1
> >
> > data_alloc()
> > data_push_tail()
> >
> > blk = to_block(data_ring, begin_lpos)
> > WRITE_ONCE(blk->id, id); /* LMM(data_alloc:B) */
> >
> > desc_push_tail()
> > data_push_tail()
> >
> > tail_lpos = data_ring->tail_lpos;
> > // see data_ring->tail_lpos already updated by CPU1
> >
> > data_make_reusable()
> >
> > // lpos_begin = tail_lpos via parameter
> > blk = to_block(data_ring, lpos_begin);
> > id = blk->id
> >
> > Now: CPU0 might see outdated blk->id before CPU1 wrote new value
> > because there is no read barrier betwen reading tail_lpos
> > and blk->id here.
>
> In your example, CPU1 is pushing the tail and then setting the block ID
> for the _newly_ allocated block, that is located is _before_ the new
> tail. If CPU0 sees the new tail already, it is still reading a valid
> block ID, which is _not_ from the block that CPU1 is in the process of
> writing.
Ah, I see. I wrongly assumed that both CPO0 and CPU1 are working with
the same block address. But if CPU0 sees the new tail_lpos, it is
already looking at another block. And it is the classic fight against
yet another potential CPUs that try to push the tail as well.
I wonder if the comment might look like:
/*
* No barrier is needed between reading tail_lpos and the related
* blk->id. Only CPU that modifies tail_lpos via cmpxchg is allowed
* to modify the related blk->id. CPUs that see the moved tail_lpos
* are looking at another block related to the new tail_lpos.
* It does not mater when the previous winner modifies the previous
* block.
*/
I am not sure how many people are confused like me. It is possible
that it is not worth it. I just know that I did this mistake
repeatedly ;-)
Best Regards,
Petr
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
WARNING: multiple messages have this Message-ID (diff)
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Andrea Parri <parri.andrea@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
Paul McKenney <paulmck@kernel.org>
Subject: Re: blk->id read race: was: [PATCH v2 2/3] printk: add lockless buffer
Date: Wed, 10 Jun 2020 10:42:48 +0200 [thread overview]
Message-ID: <20200610084248.GA4311@linux-b0ei> (raw)
In-Reply-To: <87tuzkuxtw.fsf@vostro.fn.ogness.net>
On Tue 2020-06-09 16:18:35, John Ogness wrote:
> On 2020-06-09, Petr Mladek <pmladek@suse.com> wrote:
> >> --- /dev/null
> >> +++ b/kernel/printk/printk_ringbuffer.c
> >> +/*
> >> + * Given a data ring (text or dict), put the associated descriptor of each
> >> + * data block from @lpos_begin until @lpos_end into the reusable state.
> >> + *
> >> + * If there is any problem making the associated descriptor reusable, either
> >> + * the descriptor has not yet been committed or another writer task has
> >> + * already pushed the tail lpos past the problematic data block. Regardless,
> >> + * on error the caller can re-load the tail lpos to determine the situation.
> >> + */
> >> +static bool data_make_reusable(struct printk_ringbuffer *rb,
> >> + struct prb_data_ring *data_ring,
> >> + unsigned long lpos_begin,
> >> + unsigned long lpos_end,
> >> + unsigned long *lpos_out)
> >> +{
> >> + struct prb_desc_ring *desc_ring = &rb->desc_ring;
> >> + struct prb_data_blk_lpos *blk_lpos;
> >> + struct prb_data_block *blk;
> >> + unsigned long tail_lpos;
> >> + enum desc_state d_state;
> >> + struct prb_desc desc;
> >> + unsigned long id;
> >> +
> >> + /*
> >> + * Using the provided @data_ring, point @blk_lpos to the correct
> >> + * blk_lpos within the local copy of the descriptor.
> >> + */
> >> + if (data_ring == &rb->text_data_ring)
> >> + blk_lpos = &desc.text_blk_lpos;
> >> + else
> >> + blk_lpos = &desc.dict_blk_lpos;
> >> +
> >> + /* Loop until @lpos_begin has advanced to or beyond @lpos_end. */
> >> + while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) {
> >> + blk = to_block(data_ring, lpos_begin);
> >> + id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */
> >
> > This would deserve some comment:
> >
> > 1. Compiler could not optimize out the read because there is a data
> > dependency on lpos_begin.
> >
> > 2. Compiler could not postpone the read because it is followed by
> > smp_rmb().
> >
> > So, is READ_ONCE() realy needed?
>
> I agree that it is not needed. Both the READ_ONCE() and its countering
> WRITE_ONCE() (data_alloc:B) only document the lockless shared access. I
> will remove both for the next version.
Sounds good.
> Do we still need a comment? Is it not obvious that there is a data
> dependency on @lpos_begin?
Sigh, I just wonder why I am always confusedby this. See below.
> blk = to_block(data_ring, lpos_begin);
> id = blk->id;
>
> > Well, blk->id clearly can be modified in parallel so we need to be
> > careful. There is smp_rmb() right below. Do we needed smp_rmb() also
> > before?
> >
> > What about the following scenario?:
> >
> >
> > CPU0 CPU1
> >
> > data_alloc()
> > data_push_tail()
> >
> > blk = to_block(data_ring, begin_lpos)
> > WRITE_ONCE(blk->id, id); /* LMM(data_alloc:B) */
> >
> > desc_push_tail()
> > data_push_tail()
> >
> > tail_lpos = data_ring->tail_lpos;
> > // see data_ring->tail_lpos already updated by CPU1
> >
> > data_make_reusable()
> >
> > // lpos_begin = tail_lpos via parameter
> > blk = to_block(data_ring, lpos_begin);
> > id = blk->id
> >
> > Now: CPU0 might see outdated blk->id before CPU1 wrote new value
> > because there is no read barrier betwen reading tail_lpos
> > and blk->id here.
>
> In your example, CPU1 is pushing the tail and then setting the block ID
> for the _newly_ allocated block, that is located is _before_ the new
> tail. If CPU0 sees the new tail already, it is still reading a valid
> block ID, which is _not_ from the block that CPU1 is in the process of
> writing.
Ah, I see. I wrongly assumed that both CPO0 and CPU1 are working with
the same block address. But if CPU0 sees the new tail_lpos, it is
already looking at another block. And it is the classic fight against
yet another potential CPUs that try to push the tail as well.
I wonder if the comment might look like:
/*
* No barrier is needed between reading tail_lpos and the related
* blk->id. Only CPU that modifies tail_lpos via cmpxchg is allowed
* to modify the related blk->id. CPUs that see the moved tail_lpos
* are looking at another block related to the new tail_lpos.
* It does not mater when the previous winner modifies the previous
* block.
*/
I am not sure how many people are confused like me. It is possible
that it is not worth it. I just know that I did this mistake
repeatedly ;-)
Best Regards,
Petr
next prev parent reply other threads:[~2020-06-10 8:42 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-01 9:40 [PATCH v2 0/3] printk: replace ringbuffer John Ogness
2020-05-01 9:40 ` John Ogness
2020-05-01 9:40 ` [PATCH v2 1/3] crash: add VMCOREINFO macro for anonymous structs John Ogness
2020-05-01 9:40 ` John Ogness
2020-06-03 10:16 ` Petr Mladek
2020-06-03 10:16 ` Petr Mladek
2020-05-01 9:40 ` [PATCH v2 2/3] printk: add lockless buffer John Ogness
2020-05-01 9:40 ` John Ogness
2020-05-18 13:03 ` John Ogness
2020-05-18 17:22 ` Linus Torvalds
2020-05-18 17:22 ` Linus Torvalds
2020-05-19 20:34 ` John Ogness
2020-05-19 20:34 ` John Ogness
2020-06-09 7:10 ` blk->id read race: was: " Petr Mladek
2020-06-09 7:10 ` Petr Mladek
2020-06-09 14:18 ` John Ogness
2020-06-09 14:18 ` John Ogness
2020-06-10 8:42 ` Petr Mladek [this message]
2020-06-10 8:42 ` Petr Mladek
2020-06-10 13:55 ` John Ogness
2020-06-10 13:55 ` John Ogness
2020-06-09 9:31 ` redundant check in make_data_reusable(): was " Petr Mladek
2020-06-09 9:31 ` Petr Mladek
2020-06-09 14:48 ` John Ogness
2020-06-09 14:48 ` John Ogness
2020-06-10 9:38 ` Petr Mladek
2020-06-10 9:38 ` Petr Mladek
2020-06-10 10:24 ` John Ogness
2020-06-10 10:24 ` John Ogness
2020-06-10 14:56 ` John Ogness
2020-06-10 14:56 ` John Ogness
2020-06-11 19:51 ` John Ogness
2020-06-11 19:51 ` John Ogness
2020-06-11 13:55 ` Petr Mladek
2020-06-11 13:55 ` Petr Mladek
2020-06-11 20:25 ` John Ogness
2020-06-11 20:25 ` John Ogness
2020-06-09 9:48 ` Full barrier in data_push_tail(): " Petr Mladek
2020-06-09 9:48 ` Petr Mladek
2020-06-09 15:03 ` John Ogness
2020-06-09 15:03 ` John Ogness
2020-06-09 11:37 ` Barrier before pushing desc_ring tail: " Petr Mladek
2020-06-09 11:37 ` Petr Mladek
2020-06-09 15:56 ` John Ogness
2020-06-09 15:56 ` John Ogness
2020-06-11 12:01 ` Petr Mladek
2020-06-11 12:01 ` Petr Mladek
2020-06-11 23:06 ` John Ogness
2020-06-11 23:06 ` John Ogness
2020-06-09 14:38 ` data_ring head_lpos and tail_lpos synchronization: " Petr Mladek
2020-06-09 14:38 ` Petr Mladek
2020-06-10 7:53 ` John Ogness
2020-06-10 7:53 ` John Ogness
2020-05-01 9:40 ` [PATCH v2 3/3] printk: use the lockless ringbuffer John Ogness
2020-05-01 9:40 ` John Ogness
2020-05-06 14:50 ` John Ogness
2020-05-06 14:50 ` John Ogness
2020-05-13 12:05 ` [PATCH v2 0/3] printk: replace ringbuffer Prarit Bhargava
2020-05-13 12:05 ` Prarit Bhargava
2020-05-15 10:24 ` Sergey Senozhatsky
2020-05-15 10:24 ` Sergey Senozhatsky
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200610084248.GA4311@linux-b0ei \
--to=pmladek@suse.com \
--cc=gregkh@linuxfoundation.org \
--cc=john.ogness@linutronix.de \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=parri.andrea@gmail.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=sergey.senozhatsky.work@gmail.com \
--cc=sergey.senozhatsky@gmail.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.