From: John Ogness <john.ogness@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
Sergey Senozhatsky <sergey.senozhatsky@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Andrea Parri <parri.andrea@gmail.com>,
Paul McKenney <paulmck@kernel.org>,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support
Date: Thu, 27 Aug 2020 16:34:16 +0206 [thread overview]
Message-ID: <87mu2g5fjz.fsf@jogness.linutronix.de> (raw)
In-Reply-To: <20200827125047.GB4353@alley>
Hi Petr,
Thanks for the review. Most of your suggested changes are fine and I'll
integrate them for the next version. Here a few comments on some open
issues.
On 2020-08-27, Petr Mladek <pmladek@suse.com> wrote:
>> +static struct prb_desc *desc_reopen_last(struct prb_desc_ring *desc_ring,
>> + unsigned long *id_out)
>> +{
>> + unsigned long prev_state_val;
>> + struct prb_desc *d;
>> + unsigned long id;
>> +
>> + id = atomic_long_read(&desc_ring->head_id);
>> +
>> + d = to_desc(desc_ring, id);
>> +
>> + prev_state_val = id | DESC_COMMIT_MASK;
>> + if (!atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val, id | 0))
>> + return NULL;
>
> This might deserve a comment why any barrier is not needed here.
> I guess that you did not add it because the reader API allows
> to read data only when DESC_FINAL_MASK is set.
>
> But this assumption looks dangerous to me. As I already wrote above.
> We might later need to add an API to read the already committed
> parts of a continuous lines.
Supporting such an API will require more than just a memory barrier
here. When a record is re-opened, it actually transitions back to a
reserved state. At that point there is no difference between a record
being extended and one that is reserved for the first time.
Perhaps my idea of chaining blocks [0] might be more appropriate. Then
there are discrete committed pieces that can be read if needed.
Either way, such a partial-read feature will require a new memory
barrier pair so a reader would see the state change before any data was
modified. As the code is now (without such a feature), a memory barrier
is not needed.
>> +static void desc_finalize(struct prb_desc_ring *desc_ring, unsigned long id)
>> +{
>> + unsigned long prev_state_val = id | DESC_COMMIT_MASK;
>> + struct prb_desc *d = to_desc(desc_ring, id);
>> + bool is_final;
>> +
>> + while (!atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val,
>> + prev_state_val | DESC_FINAL_MASK)) {
>> +
>> + if (get_desc_state(id, prev_state_val, &is_final) != desc_reserved)
>> + break;
>> +
>> + if (is_final)
>> + break;
>> + }
>
> Do we need the cycle and check if it succeeded?
The initial cmpxchg() is assuming the previous descriptor is in the
commit+!final state (common case). But, for example, if it was in the
reserve+final state, the cmpxhg() would fail and the finalization is
already successful.
> When could the cmpxchg fail?
>
> + It is still reserved or reserved again => it should get
> finalized by the one who reserved it. This is related to
> the problem reported/discussed in the other thread,
> see https://lore.kernel.org/r/87lfi1ls2g.fsf@jogness.linutronix.de
>
> + It is already in the final (committed) state => success
>
> + It is in reusable state => already done in the past
>
> + It is already reused => already done in the past
+ It is being reserved but is still uninitialized (state_var = 0).
+ It is being reserved but is not yet set (id is one wrap behind).
> IMHO, the following simple call should be enough.
>
> atomic_long_try_cmpxchg_relaxed(&d->state_var, &prev_state_val,
> prev_state_val | DESC_FINAL_MASK);
This works if we define that the FINAL flag can only be set when the
COMMIT flag is set. (More on this below.)
> Well, I am not sure about the "relaxed" variant again. I would feel
> more comfortable when it is full barrier especially because it
> might be done by another CPU than what did the commit. And
> we need to avoid the race when nobody finalize the record.
The FINAL flag has nothing to do with whether or not the data is
consistent. It's only use is to prevent a writer from transitioning a
descriptor back to reserved via cmpxchg(). What would a memory barrier
here be pairing with?
>> +static void _prb_commit(struct prb_reserved_entry *e, unsigned long final_mask)
>> {
>> struct prb_desc_ring *desc_ring = &e->rb->desc_ring;
>> struct prb_desc *d = to_desc(desc_ring, e->id);
>> unsigned long prev_state_val = e->id | 0;
>>
>> - /* Now the writer has finished all writing: LMM(prb_commit:A) */
>> + /* Now the writer has finished all writing: LMM(_prb_commit:A) */
>>
>> /*
>> * Set the descriptor as committed. See "ABA Issues" about why
>> @@ -1244,14 +1518,66 @@ void prb_commit(struct prb_reserved_entry *e)
>> * this. This pairs with desc_read:B.
>> */
>> if (!atomic_long_try_cmpxchg(&d->state_var, &prev_state_val,
>> - e->id | DESC_COMMIT_MASK)) { /* LMM(prb_commit:B) */
>> - WARN_ON_ONCE(1);
>> + e->id | DESC_COMMIT_MASK |
>> + final_mask)) { /* LMM(_prb_commit:B) */
>> + /*
>> + * This reserved descriptor must have been finalized already.
>> + * Retry with a reserved+final expected value.
>> + */
>> + prev_state_val = e->id | 0 | DESC_FINAL_MASK;
>
> This does not make sense to me. The state "e->id | 0 | DESC_FINAL_MASK"
> must never happen. It would mean that someone finalized
> record that is still being modified.
Correct. Setting the FINAL flag means the descriptor cannot be
_reopened_. It has nothing to do with the current state of the
descriptor. Once the FINAL flag is set, it remains set for the remaining
lifetime of that record.
> Or we both have different understanding of the logic.
Yes.
> Well, there are actually two approaches:
>
> + I originally expected that FINAL bit could be set only when
> COMMIT bit is set. But this brings the problems that prb_commit()
> would need to set FINAL when it is not longer the last descriptor.
My first attempt was to implement this. It turned out complex because it
involves descriptors finalizing themselves _and_ descriptors finalizing
their predecessor. This required two new memory barrier pairs:
- between a writer committing and re-checking the head_id that another
writer may have modified
- between a writer setting the state and another writer checking that
state
After re-evaluating the purpose of the FINAL flag, I decided that it
would be simpler to implement the 2nd approach (below) and would not
require any new memory barrier pairs.
> + Another approach is that FINAL bit could be set even when the
> COMMIT is not set. It would always be set by the next
> prb_reserve(). But it causes that there are more possible
> combinations of COMMIT and FINAL bits. As a result, the caller
> would need try more variants of the cmpxchg() calls. And
> it creates another races/cycles, ...
It does not cause more races. And I don't see where it will cause more
cmpxchg() calls. It probably _does_ lead to more cmpxchg() _code_. But
those are fallbacks for when the common case fails.
> I guess that you wanted to implement the 2nd approach and ended in
> many troubles. I wonder if the 1st approach might be easier.
Well, the "many troubles" were due to my naive assumption about the
previous descriptor state. Once I realized that, the missing piece was
obvious.
I will reconsider the first approach. Perhaps adding memory barriers is
preferable if it reduces lines of code.
And we will need to clarify partial continuous line reading because
right now that will not work.
John Ogness
[0] https://lkml.kernel.org/r/875z9nvvl2.fsf@jogness.linutronix.de
next prev parent reply other threads:[~2020-08-27 14:32 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-08-24 10:35 [PATCH v2 0/7][next] printk: reimplement LOG_CONT handling John Ogness
2020-08-24 10:35 ` [PATCH v2 1/7][next] printk: ringbuffer: rename DESC_COMMITTED_MASK flag John Ogness
2020-08-25 17:09 ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 2/7][next] printk: ringbuffer: change representation of reusable John Ogness
2020-08-25 17:10 ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 3/7][next] printk: ringbuffer: relocate get_data() John Ogness
2020-08-25 17:14 ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 4/7][next] printk: ringbuffer: add BLK_DATALESS() macro John Ogness
2020-08-25 17:24 ` Petr Mladek
2020-08-24 10:35 ` [PATCH v2 5/7][next] printk: ringbuffer: add finalization/extension support John Ogness
2020-08-26 8:39 ` John Ogness
2020-08-26 10:01 ` Sergey Senozhatsky
2020-08-26 12:37 ` John Ogness
2020-08-26 15:07 ` Petr Mladek
2020-08-27 9:58 ` John Ogness
2020-08-27 15:17 ` Petr Mladek
2020-08-28 7:21 ` Petr Mladek
2020-08-28 10:01 ` John Ogness
2020-08-27 12:50 ` Petr Mladek
2020-08-27 14:28 ` John Ogness [this message]
2020-08-24 10:35 ` [PATCH v2 6/7][next] printk: reimplement log_cont using record extension John Ogness
2020-08-24 10:35 ` [PATCH v2 7/7][next] scripts/gdb: support printk finalized records John Ogness
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mu2g5fjz.fsf@jogness.linutronix.de \
--to=john.ogness@linutronix.de \
--cc=gregkh@linuxfoundation.org \
--cc=kexec@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=parri.andrea@gmail.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rostedt@goodmis.org \
--cc=sergey.senozhatsky.work@gmail.com \
--cc=sergey.senozhatsky@gmail.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox