From: Petr Mladek <pmladek@suse.com>
To: John Ogness <john.ogness@linutronix.de>
Cc: Daniil Tatianin <d-tatianin@yandex-team.ru>,
linux-kernel@vger.kernel.org,
Steven Rostedt <rostedt@goodmis.org>,
Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: Re: [PATCH v2 0/2] printk_ringbuffer: don't needlessly wrap data blocks around
Date: Mon, 15 Sep 2025 17:08:01 +0200 [thread overview]
Message-ID: <aMgr0dId_UfBptzW@pathway.suse.cz> (raw)
In-Reply-To: <84a52zy0iu.fsf@jogness.linutronix.de>
On Fri 2025-09-12 20:49:37, John Ogness wrote:
> Hi Petr,
>
> Summary: printk() is not in danger but we should correct a loose bounds
> check.
>
> On 2025-09-12, Petr Mladek <pmladek@suse.com> wrote:
> > Honestly, I would really like to limit the maximal record size to
> > 1/4 of the buffer size. I do not want to make the design more
> > complicated just to be able to fill just one record, definitely.
>
> So I was able to track this down. Your usage of
>
> DEFINE_PRINTKRB(test_rb, 4, 4);
>
> actually made it relatively easy because there are only 16
> descriptors. All I needed to do was dump the descriptors before each
> reserve, between reserve and commit, after commit, and when reserve
> fails. This allowed me to easily see exactly how the ringbuffer is
> behaving.
>
> The problem can be reproduced with a single writer, no reader
> needed. Using
>
> #define MAX_RBDATA_TEXT_SIZE (0x256 - sizeof(struct prbtest_rbdata))
>
> provides a wild range of attempts that trigger the problem within about
> 20 write cycles.
>
> The problem comes from the function data_make_reusable(). The job of
> this function is to push the data_ring tail forward, one data block at a
> time, while setting the related descriptors to reusable.
>
> After pushing the tail forward, if it still has not pushed it far enough
> for new requested reservation, it must push it further. For this it
> _assumes the current position of the tail is a descriptor ID for the
> next data block_. But what if the tail was pushed all the way to the
> head? Then there is no next data block and it will read in garbage,
> thinking it is the next descriptor ID to set reusable. And from there it
> just goes crazy because it is reading garbage to determine how big the
> data block is so that it can continue pushing the tail (beyond the head!).
>
> Example: Assume the 96 byte ringbuffer has a single message of 64
> bytes. Then we try to reserve space for a 72-byte
> message. data_make_reusable() will first set the descriptor of the
> 64-byte message to reusable and push the tail forward to index 64. But
> the new message needs 72 bytes, so data_make_reusable() will keep going
> and read the descriptor ID at index 64, but there is only random garbage
> at that position. 64 is the head and there is nothing valid after it.
Great catch and example!
I wondered why data_make_reusable() needed to push the tail that far.
The buffer was empty after making the 64 bytes long message free.
My understanding is that it is combination of the following effects:
1. The message is wrapped.
2. The ring buffer does not support proper wrapping. Instead,
the non-sufficient space at the end of the buffer stays
unused (last wrap). And the messages will be written
from the beginning of the buffer (next wrap).
=> the message will occupy more space than expected
unused space from last wrap + full message size in new wrap
In our case:
+ size of the buffer: 96
+ unused space in old wrap: 96 - 64 = 32
+ occupied space in new wrap: 72
=> total occupied space: = 32 + 72 = 104 > 96
=> lpos passed to data_push_tail() is from a never used space
=> This is why data_push_tail() tries to read
descriptor from a never used space and reads a garbage
> This situation can never happen for printk because of your 1/4 limit
> (MAX_LOG_TAKE_PART), although it is over-conservative.
I would say that it is conservative. It would survive mistakes from the
off-by-one family, ... ;-)
And it is still far from practical limits. Because having this
powerful ring buffer for 1, 2, or 4 messages looks line an overkill.
> It is enough to limit messages to 1/2 of the data ring
> (with Daniil's series). Otherwise the limit must be
> "1/2 - sizeof(long)" to also leave room for the
> trailing ID of a wrapping data block.
I am not sure why it is important to push it to the limits.
That said, I could live with it. Especially how, when we
understood what happened.
> I am still positive about Daniil's series.
Yes, the patch which prevents wrapping for perfectly fitting messages
looks good to me.
> And we should fix
> data_check_size() to be provide a proper limit as well as describe the
> critical relationship between data_check_size() and
> data_make_reusable().
Yup.
> I prefer not modify data_make_reusable() to handle this case. Currently
> data_make_reusable() does nothing with the head, so it would introduce
> new memory barriers. Also, the "push tail beyond head" scenario is a bit
> odd to handle. It is better just to document the assumption and put in
> the correct bounds checks.
It might be possible to catch this in either in data_alloc().
or in get_next_lpos(). They could ignore/yell about when
the really occupied space would be bigger than DATA_SIZE(data_ring).
Something like:
diff --git a/kernel/printk/printk_ringbuffer.c b/kernel/printk/printk_ringbuffer.c
index 17b741b2eccd..d7ba4c0d8c3b 100644
--- a/kernel/printk/printk_ringbuffer.c
+++ b/kernel/printk/printk_ringbuffer.c
@@ -1056,8 +1056,16 @@ static char *data_alloc(struct printk_ringbuffer *rb, unsigned int size,
do {
next_lpos = get_next_lpos(data_ring, begin_lpos, size);
- if (!data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
- /* Failed to allocate, specify a data-less block. */
+ /*
+ * Double check that the really used space won't be bigger than
+ * the ring buffer. Wrapped messages need to reserve more space,
+ * see get_next_lpos.
+ *
+ * Specify a data-less block when the check or the allocation
+ * fails.
+ */
+ if (WARN_ON_ONCE(next_lpos - begin_lpos > DATA_SIZE(data_ring)) ||
+ !data_push_tail(rb, next_lpos - DATA_SIZE(data_ring))) {
blk_lpos->begin = FAILED_LPOS;
blk_lpos->next = FAILED_LPOS;
return NULL;
Similar check would need to be done also in data_realloc().
I am not sure if it is worth it. Maybe, we could rule this out
when we limit the allocated size to 1/2 or 1/4 of the ring buffer size.
Best Regards,
Petr
next prev parent reply other threads:[~2025-09-15 15:08 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-09-05 14:41 [PATCH v2 0/2] printk_ringbuffer: don't needlessly wrap data blocks around Daniil Tatianin
2025-09-05 14:41 ` [PATCH v2 1/2] " Daniil Tatianin
2025-09-05 15:27 ` John Ogness
2025-09-05 15:29 ` Daniil Tatianin
2025-09-05 16:10 ` John Ogness
2025-09-11 8:34 ` Daniil Tatianin
2025-09-11 15:33 ` Petr Mladek
2025-09-05 15:30 ` John Ogness
2025-09-26 14:35 ` Daniil Tatianin
2025-09-26 14:44 ` Petr Mladek
2025-09-26 14:53 ` Daniil Tatianin
2025-10-22 12:46 ` Petr Mladek
2025-09-05 14:41 ` [PATCH v2 2/2] printk_ringbuffer: allow one data block to occupy the entire data ring Daniil Tatianin
2025-09-05 15:27 ` John Ogness
2025-09-11 15:30 ` [PATCH v2 0/2] printk_ringbuffer: don't needlessly wrap data blocks around Petr Mladek
2025-09-11 15:55 ` Petr Mladek
2025-09-11 16:11 ` Petr Mladek
2025-09-11 15:58 ` Petr Mladek
2025-09-11 16:12 ` John Ogness
2025-09-12 9:25 ` Petr Mladek
2025-09-12 9:54 ` Petr Mladek
2025-09-12 14:49 ` Petr Mladek
2025-09-12 15:15 ` Petr Mladek
2025-09-12 18:43 ` John Ogness
2025-09-13 17:38 ` Daniil Tatianin
2025-09-14 9:23 ` John Ogness
2025-09-14 9:56 ` Daniil Tatianin
2025-09-15 15:07 ` John Ogness
2025-09-15 16:00 ` Petr Mladek
2025-09-15 16:07 ` Daniil Tatianin
2025-09-15 15:08 ` Petr Mladek [this message]
2025-09-15 15:25 ` John Ogness
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aMgr0dId_UfBptzW@pathway.suse.cz \
--to=pmladek@suse.com \
--cc=d-tatianin@yandex-team.ru \
--cc=john.ogness@linutronix.de \
--cc=linux-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=senozhatsky@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.