All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Andrea Parri <parri.andrea@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Paul McKenney <paulmck@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: redundant check in make_data_reusable(): was [PATCH v2 2/3] printk: add lockless buffer
Date: Thu, 11 Jun 2020 15:55:09 +0200	[thread overview]
Message-ID: <20200611135509.GE6581@linux-b0ei> (raw)
In-Reply-To: <87o8prp6bi.fsf@vostro.fn.ogness.net>

On Wed 2020-06-10 12:24:01, John Ogness wrote:
> On 2020-06-10, Petr Mladek <pmladek@suse.com> wrote:
> >> >> --- /dev/null
> >> >> +++ b/kernel/printk/printk_ringbuffer.c
> >> >> +/*
> >> >> + * Given a data ring (text or dict), put the associated descriptor of each
> >> >> + * data block from @lpos_begin until @lpos_end into the reusable state.
> >> >> + *
> >>>> + * If there is any problem making the associated descriptor reusable, either
> >>>> + * the descriptor has not yet been committed or another writer task has
> >>>> + * already pushed the tail lpos past the problematic data block. Regardless,
> >>>> + * on error the caller can re-load the tail lpos to determine the situation.
> >>>> + */
> >>>> +static bool data_make_reusable(struct printk_ringbuffer *rb,
> >>>> +			       struct prb_data_ring *data_ring,
> >>>> +			       unsigned long lpos_begin,
> >>>> +			       unsigned long lpos_end,
> >>>> +			       unsigned long *lpos_out)
> >>>> +{
> >>>> +	struct prb_desc_ring *desc_ring = &rb->desc_ring;
> >>>> +	struct prb_data_blk_lpos *blk_lpos;
> >>>> +	struct prb_data_block *blk;
> >>>> +	unsigned long tail_lpos;
> >>>> +	enum desc_state d_state;
> >>>> +	struct prb_desc desc;
> >>>> +	unsigned long id;
> >>>> +
> >>>> +	/*
> >>>> +	 * Using the provided @data_ring, point @blk_lpos to the correct
> >>>> +	 * blk_lpos within the local copy of the descriptor.
> >>>> +	 */
> >>>> +	if (data_ring == &rb->text_data_ring)
> >>>> +		blk_lpos = &desc.text_blk_lpos;
> >>>> +	else
> >>>> +		blk_lpos = &desc.dict_blk_lpos;
> >>>> +
> >>>> +	/* Loop until @lpos_begin has advanced to or beyond @lpos_end. */
> >>>> +	while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) {
> >>>> +		blk = to_block(data_ring, lpos_begin);
> >>>> +		id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */
> >>>> +
> >>>> +		/*
> >>>> +		 * Guarantee the block ID is loaded before checking the tail
> >>>> +		 * lpos. The loaded block ID can only be considered valid if
> >>>> +		 * the tail lpos has not overtaken @lpos_begin. This pairs
> >>>> +		 * with data_alloc:A.
> >>>> +		 *
> >>>> +		 * Memory barrier involvement:
> >>>> +		 *
> >>>> +		 * If data_make_reusable:A reads from data_alloc:B, then
> >>>> +		 * data_make_reusable:C reads from data_push_tail:D.
> >>>> +		 *
> >>>> +		 * Relies on:
> >>>> +		 *
> >>>> +		 * MB from data_push_tail:D to data_alloc:B
> >>>> +		 *    matching
> >>>> +		 * RMB from data_make_reusable:A to data_make_reusable:C
> >>>> +		 *
> >>>> +		 * Note: data_push_tail:D and data_alloc:B can be different
> >>>> +		 *       CPUs. However, the data_alloc:B CPU (which performs
> >>>> +		 *       the full memory barrier) must have previously seen
> >>>> +		 *       data_push_tail:D.
> >>>> +		 */
> >>>> +		smp_rmb(); /* LMM(data_make_reusable:B) */
> >>>> +
> >>>> +		tail_lpos = atomic_long_read(&data_ring->tail_lpos
> >>>> +					); /* LMM(data_make_reusable:C) */
> >>>> +
> >>>> +		/*
> >>>> +		 * If @lpos_begin has fallen behind the tail lpos, the read
> >>>> +		 * block ID cannot be trusted. Fast forward @lpos_begin to the
> >>>> +		 * tail lpos and try again.
> >>>> +		 */
> >>>> +		if (lpos_begin - tail_lpos >= DATA_SIZE(data_ring)) {
> >>>> +			lpos_begin = tail_lpos;
> >>>> +			continue;
> >>>> +		}
> >>>
> >>> I am sorry if we have had this discussion already in past.
> >> 
> >> We have [0]. (Search for "Ouch.")
> >
> > I see. Thanks a lot for the pointer.
> >
> >>> Well, it would just prove that it really needs a comment why this
> >>> check is necessary.
> >> 
> >> The comment says why it is necessary. The previous read of the block ID
> >> cannot be trusted if the the tail has been pushed beyond it.
> >
> > Not really. The comment describes what the check does. But it does not
> > explain why it is needed. The reason might be described be something like:
> >
> > 		* Make sure that the id read from tail_lpos is really
> > 		* pointing to this lpos. The block might have been
> > 		* reused in the meantime. Make sure to do not make
> > 		* the new owner reusable.
> 
> That is _not_ what this check is doing. I recommend looking closely at
> the example you posted. This is not about whether or not a descriptor is
> pointing to this lpos. In your example you showed that ID, state, and
> lpos values could all look good, but it is for the _new_ record and we
> should _not_ invalidate that one.

OK, let's make sure that we are talking about the same example.
I was talking about this one from
https://lore.kernel.org/lkml/87ftecy343.fsf@linutronix.de/

% [*] Another problem would be when data_make_reusable() see the new
%     data already in the committed state. It would make fresh new
%     data reusable.
%
%     I mean the following:
%
% CPU0				CPU1
%
% data_alloc()
%   begin_lpos = dr->head_lpos
%   data_push_tail()
%     lpos = dr->tail_lpos
%				prb_reserve()
%				  # reserve the location of current
%				  # dr->tail_lpos
%				prb_commit()
%
%     id = blk->id
%     # read id for the freshly written data on CPU1
%     # and happily make them reusable
%     data_make_reusable()

Sigh, sigh, sigh, there is a hugely misleading comment in the example:

%				  # reserve the location of current
%				  # dr->tail_lpos

It is true that it reserves part of this location. But it will use
data_ring->head_lpos for the related desc->text_blk_lpos.begin !!!

See below:

> We can detect the scenario you pointed out by verifying the tail has not
> moved beyond the data block that the ID was read from. The comment for
> this check says:
> 
>     If @lpos_begin has fallen behind the tail lpos,
>     the read block ID cannot be trusted.
> 
> This is exactly the why. It is only about whether we can trust that a
> non-garbage block ID was read. Or do you want me to add:
> 
>     ... because data read that is behind the tail lpos must be
>     considered garbage.
> 
> > But wait! This situation should get caught by the two existing descriptor
> > checks:
> >
> >>		case desc_committed:
> >>			/*
> >>			 * This data block is invalid if the descriptor
> >>			 * does not point back to it.
> >>			 */
> >>			if (blk_lpos->begin != lpos_begin)
> >>				return false;
> >>			desc_make_reusable(desc_ring, id);
> >>			break;
> >>		case desc_reusable:
> >>			/*
> >>			 * This data block is invalid if the descriptor
> >>			 * does not point back to it.
> >>			 */
> >>			if (blk_lpos->begin != lpos_begin)
> >>				return false;
> >>			break;
> 
> No. Your example showed that it is not caught here.

I am afraid that my example was wrong:

If blk->id comes from the new descriptor written by CPU1 then
blk_lpos->begin is based on the old data_ring->head_lpos.
Then it is different from lpos_begin.


Let's put it another way. The state of the descriptor defines validity
of the data. Descriptor in committed state _must not_ point to invalid
data block!!!

If a descriptor in committed state point to lpos that was in invalid range
before reading the descriptor then we have a huge hole in the design.

This is why I believe that the check of the descriptor must be enough.


Best Regards,
Petr

PS: I am sorry if I create too much confusion. It is easy to
get lost :-(

_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec

WARNING: multiple messages have this Message-ID (diff)
From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Andrea Parri <parri.andrea@gmail.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kexec@lists.infradead.org, linux-kernel@vger.kernel.org,
	Paul McKenney <paulmck@kernel.org>
Subject: Re: redundant check in make_data_reusable(): was [PATCH v2 2/3] printk: add lockless buffer
Date: Thu, 11 Jun 2020 15:55:09 +0200	[thread overview]
Message-ID: <20200611135509.GE6581@linux-b0ei> (raw)
In-Reply-To: <87o8prp6bi.fsf@vostro.fn.ogness.net>

On Wed 2020-06-10 12:24:01, John Ogness wrote:
> On 2020-06-10, Petr Mladek <pmladek@suse.com> wrote:
> >> >> --- /dev/null
> >> >> +++ b/kernel/printk/printk_ringbuffer.c
> >> >> +/*
> >> >> + * Given a data ring (text or dict), put the associated descriptor of each
> >> >> + * data block from @lpos_begin until @lpos_end into the reusable state.
> >> >> + *
> >>>> + * If there is any problem making the associated descriptor reusable, either
> >>>> + * the descriptor has not yet been committed or another writer task has
> >>>> + * already pushed the tail lpos past the problematic data block. Regardless,
> >>>> + * on error the caller can re-load the tail lpos to determine the situation.
> >>>> + */
> >>>> +static bool data_make_reusable(struct printk_ringbuffer *rb,
> >>>> +			       struct prb_data_ring *data_ring,
> >>>> +			       unsigned long lpos_begin,
> >>>> +			       unsigned long lpos_end,
> >>>> +			       unsigned long *lpos_out)
> >>>> +{
> >>>> +	struct prb_desc_ring *desc_ring = &rb->desc_ring;
> >>>> +	struct prb_data_blk_lpos *blk_lpos;
> >>>> +	struct prb_data_block *blk;
> >>>> +	unsigned long tail_lpos;
> >>>> +	enum desc_state d_state;
> >>>> +	struct prb_desc desc;
> >>>> +	unsigned long id;
> >>>> +
> >>>> +	/*
> >>>> +	 * Using the provided @data_ring, point @blk_lpos to the correct
> >>>> +	 * blk_lpos within the local copy of the descriptor.
> >>>> +	 */
> >>>> +	if (data_ring == &rb->text_data_ring)
> >>>> +		blk_lpos = &desc.text_blk_lpos;
> >>>> +	else
> >>>> +		blk_lpos = &desc.dict_blk_lpos;
> >>>> +
> >>>> +	/* Loop until @lpos_begin has advanced to or beyond @lpos_end. */
> >>>> +	while ((lpos_end - lpos_begin) - 1 < DATA_SIZE(data_ring)) {
> >>>> +		blk = to_block(data_ring, lpos_begin);
> >>>> +		id = READ_ONCE(blk->id); /* LMM(data_make_reusable:A) */
> >>>> +
> >>>> +		/*
> >>>> +		 * Guarantee the block ID is loaded before checking the tail
> >>>> +		 * lpos. The loaded block ID can only be considered valid if
> >>>> +		 * the tail lpos has not overtaken @lpos_begin. This pairs
> >>>> +		 * with data_alloc:A.
> >>>> +		 *
> >>>> +		 * Memory barrier involvement:
> >>>> +		 *
> >>>> +		 * If data_make_reusable:A reads from data_alloc:B, then
> >>>> +		 * data_make_reusable:C reads from data_push_tail:D.
> >>>> +		 *
> >>>> +		 * Relies on:
> >>>> +		 *
> >>>> +		 * MB from data_push_tail:D to data_alloc:B
> >>>> +		 *    matching
> >>>> +		 * RMB from data_make_reusable:A to data_make_reusable:C
> >>>> +		 *
> >>>> +		 * Note: data_push_tail:D and data_alloc:B can be different
> >>>> +		 *       CPUs. However, the data_alloc:B CPU (which performs
> >>>> +		 *       the full memory barrier) must have previously seen
> >>>> +		 *       data_push_tail:D.
> >>>> +		 */
> >>>> +		smp_rmb(); /* LMM(data_make_reusable:B) */
> >>>> +
> >>>> +		tail_lpos = atomic_long_read(&data_ring->tail_lpos
> >>>> +					); /* LMM(data_make_reusable:C) */
> >>>> +
> >>>> +		/*
> >>>> +		 * If @lpos_begin has fallen behind the tail lpos, the read
> >>>> +		 * block ID cannot be trusted. Fast forward @lpos_begin to the
> >>>> +		 * tail lpos and try again.
> >>>> +		 */
> >>>> +		if (lpos_begin - tail_lpos >= DATA_SIZE(data_ring)) {
> >>>> +			lpos_begin = tail_lpos;
> >>>> +			continue;
> >>>> +		}
> >>>
> >>> I am sorry if we have had this discussion already in past.
> >> 
> >> We have [0]. (Search for "Ouch.")
> >
> > I see. Thanks a lot for the pointer.
> >
> >>> Well, it would just prove that it really needs a comment why this
> >>> check is necessary.
> >> 
> >> The comment says why it is necessary. The previous read of the block ID
> >> cannot be trusted if the the tail has been pushed beyond it.
> >
> > Not really. The comment describes what the check does. But it does not
> > explain why it is needed. The reason might be described be something like:
> >
> > 		* Make sure that the id read from tail_lpos is really
> > 		* pointing to this lpos. The block might have been
> > 		* reused in the meantime. Make sure to do not make
> > 		* the new owner reusable.
> 
> That is _not_ what this check is doing. I recommend looking closely at
> the example you posted. This is not about whether or not a descriptor is
> pointing to this lpos. In your example you showed that ID, state, and
> lpos values could all look good, but it is for the _new_ record and we
> should _not_ invalidate that one.

OK, let's make sure that we are talking about the same example.
I was talking about this one from
https://lore.kernel.org/lkml/87ftecy343.fsf@linutronix.de/

% [*] Another problem would be when data_make_reusable() see the new
%     data already in the committed state. It would make fresh new
%     data reusable.
%
%     I mean the following:
%
% CPU0				CPU1
%
% data_alloc()
%   begin_lpos = dr->head_lpos
%   data_push_tail()
%     lpos = dr->tail_lpos
%				prb_reserve()
%				  # reserve the location of current
%				  # dr->tail_lpos
%				prb_commit()
%
%     id = blk->id
%     # read id for the freshly written data on CPU1
%     # and happily make them reusable
%     data_make_reusable()

Sigh, sigh, sigh, there is a hugely misleading comment in the example:

%				  # reserve the location of current
%				  # dr->tail_lpos

It is true that it reserves part of this location. But it will use
data_ring->head_lpos for the related desc->text_blk_lpos.begin !!!

See below:

> We can detect the scenario you pointed out by verifying the tail has not
> moved beyond the data block that the ID was read from. The comment for
> this check says:
> 
>     If @lpos_begin has fallen behind the tail lpos,
>     the read block ID cannot be trusted.
> 
> This is exactly the why. It is only about whether we can trust that a
> non-garbage block ID was read. Or do you want me to add:
> 
>     ... because data read that is behind the tail lpos must be
>     considered garbage.
> 
> > But wait! This situation should get caught by the two existing descriptor
> > checks:
> >
> >>		case desc_committed:
> >>			/*
> >>			 * This data block is invalid if the descriptor
> >>			 * does not point back to it.
> >>			 */
> >>			if (blk_lpos->begin != lpos_begin)
> >>				return false;
> >>			desc_make_reusable(desc_ring, id);
> >>			break;
> >>		case desc_reusable:
> >>			/*
> >>			 * This data block is invalid if the descriptor
> >>			 * does not point back to it.
> >>			 */
> >>			if (blk_lpos->begin != lpos_begin)
> >>				return false;
> >>			break;
> 
> No. Your example showed that it is not caught here.

I am afraid that my example was wrong:

If blk->id comes from the new descriptor written by CPU1 then
blk_lpos->begin is based on the old data_ring->head_lpos.
Then it is different from lpos_begin.


Let's put it another way. The state of the descriptor defines validity
of the data. Descriptor in committed state _must not_ point to invalid
data block!!!

If a descriptor in committed state point to lpos that was in invalid range
before reading the descriptor then we have a huge hole in the design.

This is why I believe that the check of the descriptor must be enough.


Best Regards,
Petr

PS: I am sorry if I create too much confusion. It is easy to
get lost :-(

  parent reply	other threads:[~2020-06-11 13:55 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-01  9:40 [PATCH v2 0/3] printk: replace ringbuffer John Ogness
2020-05-01  9:40 ` John Ogness
2020-05-01  9:40 ` [PATCH v2 1/3] crash: add VMCOREINFO macro for anonymous structs John Ogness
2020-05-01  9:40   ` John Ogness
2020-06-03 10:16   ` Petr Mladek
2020-06-03 10:16     ` Petr Mladek
2020-05-01  9:40 ` [PATCH v2 2/3] printk: add lockless buffer John Ogness
2020-05-01  9:40   ` John Ogness
2020-05-18 13:03   ` John Ogness
2020-05-18 17:22     ` Linus Torvalds
2020-05-18 17:22       ` Linus Torvalds
2020-05-19 20:34       ` John Ogness
2020-05-19 20:34         ` John Ogness
2020-06-09  7:10   ` blk->id read race: was: " Petr Mladek
2020-06-09  7:10     ` Petr Mladek
2020-06-09 14:18     ` John Ogness
2020-06-09 14:18       ` John Ogness
2020-06-10  8:42       ` Petr Mladek
2020-06-10  8:42         ` Petr Mladek
2020-06-10 13:55         ` John Ogness
2020-06-10 13:55           ` John Ogness
2020-06-09  9:31   ` redundant check in make_data_reusable(): was " Petr Mladek
2020-06-09  9:31     ` Petr Mladek
2020-06-09 14:48     ` John Ogness
2020-06-09 14:48       ` John Ogness
2020-06-10  9:38       ` Petr Mladek
2020-06-10  9:38         ` Petr Mladek
2020-06-10 10:24         ` John Ogness
2020-06-10 10:24           ` John Ogness
2020-06-10 14:56           ` John Ogness
2020-06-10 14:56             ` John Ogness
2020-06-11 19:51             ` John Ogness
2020-06-11 19:51               ` John Ogness
2020-06-11 13:55           ` Petr Mladek [this message]
2020-06-11 13:55             ` Petr Mladek
2020-06-11 20:25             ` John Ogness
2020-06-11 20:25               ` John Ogness
2020-06-09  9:48   ` Full barrier in data_push_tail(): " Petr Mladek
2020-06-09  9:48     ` Petr Mladek
2020-06-09 15:03     ` John Ogness
2020-06-09 15:03       ` John Ogness
2020-06-09 11:37   ` Barrier before pushing desc_ring tail: " Petr Mladek
2020-06-09 11:37     ` Petr Mladek
2020-06-09 15:56     ` John Ogness
2020-06-09 15:56       ` John Ogness
2020-06-11 12:01       ` Petr Mladek
2020-06-11 12:01         ` Petr Mladek
2020-06-11 23:06         ` John Ogness
2020-06-11 23:06           ` John Ogness
2020-06-09 14:38   ` data_ring head_lpos and tail_lpos synchronization: " Petr Mladek
2020-06-09 14:38     ` Petr Mladek
2020-06-10  7:53     ` John Ogness
2020-06-10  7:53       ` John Ogness
2020-05-01  9:40 ` [PATCH v2 3/3] printk: use the lockless ringbuffer John Ogness
2020-05-01  9:40   ` John Ogness
2020-05-06 14:50   ` John Ogness
2020-05-06 14:50     ` John Ogness
2020-05-13 12:05 ` [PATCH v2 0/3] printk: replace ringbuffer Prarit Bhargava
2020-05-13 12:05   ` Prarit Bhargava
2020-05-15 10:24 ` Sergey Senozhatsky
2020-05-15 10:24   ` Sergey Senozhatsky

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200611135509.GE6581@linux-b0ei \
    --to=pmladek@suse.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=john.ogness@linutronix.de \
    --cc=kexec@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=parri.andrea@gmail.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sergey.senozhatsky.work@gmail.com \
    --cc=sergey.senozhatsky@gmail.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.