* [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
@ 2026-05-20 13:07 Greg Kroah-Hartman
2026-05-20 13:36 ` Rafael J. Wysocki
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-05-20 13:07 UTC (permalink / raw)
To: driver-core
Cc: linux-kernel, Greg Kroah-Hartman, Rafael J. Wysocki,
Danilo Krummrich, NeilBrown, Tejun Heo
sysfs_kf_seq_show() defends against buggy show() callbacks that return
larger than PAGE_SIZE by clamping the value and printing a warning.
sysfs_kf_read(), the prealloc variant, has no such defense.
The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
whose show() callbacks are well-behaved, so this is hardening against
future drivers doing foolish things and out-of-tree code doing even more
foolish things.
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: NeilBrown <neil@brown.name>
Cc: Tejun Heo <tj@kernel.org>
Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/sysfs/file.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
index 5709cede1d75..25b44fe171a3 100644
--- a/fs/sysfs/file.c
+++ b/fs/sysfs/file.c
@@ -120,6 +120,10 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
len = ops->show(kobj, of->kn->priv, buf);
if (len < 0)
return len;
+ if (len >= (ssize_t)PAGE_SIZE) {
+ printk("fill_read_buffer: %pS returned bad count\n", ops->show);
+ len = PAGE_SIZE - 1;
+ }
if (pos) {
if (len <= pos)
return 0;
--
2.54.0
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-20 13:07 [PATCH] sysfs: clamp show() return value in sysfs_kf_read() Greg Kroah-Hartman
@ 2026-05-20 13:36 ` Rafael J. Wysocki
2026-05-20 14:43 ` Danilo Krummrich
` (2 subsequent siblings)
3 siblings, 0 replies; 11+ messages in thread
From: Rafael J. Wysocki @ 2026-05-20 13:36 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown, Tejun Heo
On Wed, May 20, 2026 at 3:06 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> sysfs_kf_seq_show() defends against buggy show() callbacks that return
> larger than PAGE_SIZE by clamping the value and printing a warning.
> sysfs_kf_read(), the prealloc variant, has no such defense.
>
> The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> whose show() callbacks are well-behaved, so this is hardening against
> future drivers doing foolish things and out-of-tree code doing even more
> foolish things.
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: NeilBrown <neil@brown.name>
> Cc: Tejun Heo <tj@kernel.org>
> Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
No issues found, so
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> ---
> fs/sysfs/file.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index 5709cede1d75..25b44fe171a3 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -120,6 +120,10 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
> len = ops->show(kobj, of->kn->priv, buf);
> if (len < 0)
> return len;
> + if (len >= (ssize_t)PAGE_SIZE) {
> + printk("fill_read_buffer: %pS returned bad count\n", ops->show);
> + len = PAGE_SIZE - 1;
> + }
> if (pos) {
> if (len <= pos)
> return 0;
> --
> 2.54.0
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-20 13:07 [PATCH] sysfs: clamp show() return value in sysfs_kf_read() Greg Kroah-Hartman
2026-05-20 13:36 ` Rafael J. Wysocki
@ 2026-05-20 14:43 ` Danilo Krummrich
2026-05-20 18:19 ` Tejun Heo
2026-05-20 22:11 ` David Laight
3 siblings, 0 replies; 11+ messages in thread
From: Danilo Krummrich @ 2026-05-20 14:43 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: driver-core, linux-kernel, Rafael J. Wysocki, NeilBrown,
Tejun Heo
On Wed May 20, 2026 at 3:07 PM CEST, Greg Kroah-Hartman wrote:
> sysfs_kf_seq_show() defends against buggy show() callbacks that return
> larger than PAGE_SIZE by clamping the value and printing a warning.
> sysfs_kf_read(), the prealloc variant, has no such defense.
>
> The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> whose show() callbacks are well-behaved, so this is hardening against
> future drivers doing foolish things and out-of-tree code doing even more
> foolish things.
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: NeilBrown <neil@brown.name>
> Cc: Tejun Heo <tj@kernel.org>
> Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-20 13:07 [PATCH] sysfs: clamp show() return value in sysfs_kf_read() Greg Kroah-Hartman
2026-05-20 13:36 ` Rafael J. Wysocki
2026-05-20 14:43 ` Danilo Krummrich
@ 2026-05-20 18:19 ` Tejun Heo
2026-05-21 6:18 ` Greg Kroah-Hartman
2026-05-20 22:11 ` David Laight
3 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2026-05-20 18:19 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown
Hello,
Two nits:
- Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp
to that than hardcode PAGE_SIZE.
- pr_warn() instead of bare printk()?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-20 13:07 [PATCH] sysfs: clamp show() return value in sysfs_kf_read() Greg Kroah-Hartman
` (2 preceding siblings ...)
2026-05-20 18:19 ` Tejun Heo
@ 2026-05-20 22:11 ` David Laight
2026-05-21 6:19 ` Greg Kroah-Hartman
3 siblings, 1 reply; 11+ messages in thread
From: David Laight @ 2026-05-20 22:11 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown, Tejun Heo
On Wed, 20 May 2026 15:07:01 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> sysfs_kf_seq_show() defends against buggy show() callbacks that return
> larger than PAGE_SIZE by clamping the value and printing a warning.
> sysfs_kf_read(), the prealloc variant, has no such defense.
>
> The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> whose show() callbacks are well-behaved, so this is hardening against
> future drivers doing foolish things and out-of-tree code doing even more
> foolish things.
What is the rational for it using PREALLOC?
From the code it looks like it could set a buffer size, but it doesn't seem to.
Which means there is a kmalloc(PAGE_SIZE + 1) in there.
If it did set a size, the check below would be wrong.
From the look of the fragment below it is just passed the address of the
pre-allocated buffer.
That also means it can't use sysfs_emit() because that relies on a
page-aligned buffer.
Also I suspect that kmalloc() grabbing a buffer from the per-cpu freelist
is cheaper than the mutex required to lock access to the preallocted buffer.
It would be 'nice' to change the type of the buffer passed to the show()
functions and to sysfs_emit() to something other than 'char *'.
Even if the initial change is just a typdef for 'char *' so that
you can tell which functions can call sysfs_emit().
At the moment it is pretty hard to actually decide whether it is safe
to change the printf() to sysfs_emit().
If all the show() functions are expected to include a terminating '\n'
then the wrapper code could add one if missing.
That would save a lot of strcat(buf, '\n'); return strlen(buf); sequences.
I also think (bad for me at 11pm) that the buffer should be 4k not PAGE_SIZE.
-- David
>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: NeilBrown <neil@brown.name>
> Cc: Tejun Heo <tj@kernel.org>
> Fixes: 2b75869bba67 ("sysfs/kernfs: allow attributes to request write buffer be pre-allocated.")
> Assisted-by: gregkh_clanker_t1000
> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> ---
> fs/sysfs/file.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/sysfs/file.c b/fs/sysfs/file.c
> index 5709cede1d75..25b44fe171a3 100644
> --- a/fs/sysfs/file.c
> +++ b/fs/sysfs/file.c
> @@ -120,6 +120,10 @@ static ssize_t sysfs_kf_read(struct kernfs_open_file *of, char *buf,
> len = ops->show(kobj, of->kn->priv, buf);
> if (len < 0)
> return len;
> + if (len >= (ssize_t)PAGE_SIZE) {
> + printk("fill_read_buffer: %pS returned bad count\n", ops->show);
> + len = PAGE_SIZE - 1;
> + }
> if (pos) {
> if (len <= pos)
> return 0;
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-20 18:19 ` Tejun Heo
@ 2026-05-21 6:18 ` Greg Kroah-Hartman
2026-05-21 10:04 ` David Laight
2026-05-21 16:18 ` Tejun Heo
0 siblings, 2 replies; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-05-21 6:18 UTC (permalink / raw)
To: Tejun Heo
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown
On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote:
> Hello,
>
> Two nits:
>
> - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp
> to that than hardcode PAGE_SIZE.
Where is that check at? And sysfs_kf_seq_show() doesn't check it this
way, should that change?
> - pr_warn() instead of bare printk()?
This is copying the message in sysfs_kf_seq_show() that has the same
exact format. I will send a follow-on patch to make both the same.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-20 22:11 ` David Laight
@ 2026-05-21 6:19 ` Greg Kroah-Hartman
2026-05-21 9:17 ` David Laight
0 siblings, 1 reply; 11+ messages in thread
From: Greg Kroah-Hartman @ 2026-05-21 6:19 UTC (permalink / raw)
To: David Laight
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown, Tejun Heo
On Wed, May 20, 2026 at 11:11:58PM +0100, David Laight wrote:
> On Wed, 20 May 2026 15:07:01 +0200
> Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
>
> > sysfs_kf_seq_show() defends against buggy show() callbacks that return
> > larger than PAGE_SIZE by clamping the value and printing a warning.
> > sysfs_kf_read(), the prealloc variant, has no such defense.
> >
> > The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> > whose show() callbacks are well-behaved, so this is hardening against
> > future drivers doing foolish things and out-of-tree code doing even more
> > foolish things.
>
> What is the rational for it using PREALLOC?
No idea, you might want to dig to find the commit that did this.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-21 6:19 ` Greg Kroah-Hartman
@ 2026-05-21 9:17 ` David Laight
0 siblings, 0 replies; 11+ messages in thread
From: David Laight @ 2026-05-21 9:17 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown, Tejun Heo
On Thu, 21 May 2026 08:19:16 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> On Wed, May 20, 2026 at 11:11:58PM +0100, David Laight wrote:
> > On Wed, 20 May 2026 15:07:01 +0200
> > Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> >
> > > sysfs_kf_seq_show() defends against buggy show() callbacks that return
> > > larger than PAGE_SIZE by clamping the value and printing a warning.
> > > sysfs_kf_read(), the prealloc variant, has no such defense.
> > >
> > > The only current in-tree user of __ATTR_PREALLOC is drivers/md/md.c,
> > > whose show() callbacks are well-behaved, so this is hardening against
> > > future drivers doing foolish things and out-of-tree code doing even more
> > > foolish things.
> >
> > What is the rational for it using PREALLOC?
>
> No idea, you might want to dig to find the commit that did this.
This one:
commit 750f199ee8b578062341e6ddfe36c59ac8ff2dcb
Author: NeilBrown <neil@brown.name>
Date: Tue Sep 30 08:53:05 2014 +1000
md: mark some attributes as pre-alloc
Since __ATTR_PREALLOC was introduced in v3.19-rc1~78^2~18
it can now be used by md.
This ensure that writing to these sysfs attributes will never
block due to a memory allocation.
Such blocking could become a deadlock if mdmon is trying to
reconfigure an array after a failure prior to re-enabling writes.
That might be better handled with a flag that changes the kmalloc()
to NOIO and falls back onto a global preallocated page in the
normal path.
It would certainly let a lot of code be deleted (always good).
The atomic_write_len would then just be a max_write_len
(and maybe renamed).
The only real use is the 'cgroup' code that wants to support writes
that are larger than 4k.
Currently, if atomic_write_len is zero writes are truncated to PAGE_SIZE,
if non-zero overlong writes are rejected.
AFAICT the file offset is never checked - so all writes are (sort of) appends.
I'm not at all sure that makes sense.
Possibly all overlong writes should be rejected.
-- David
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-21 6:18 ` Greg Kroah-Hartman
@ 2026-05-21 10:04 ` David Laight
2026-05-21 16:18 ` Tejun Heo
1 sibling, 0 replies; 11+ messages in thread
From: David Laight @ 2026-05-21 10:04 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Tejun Heo, driver-core, linux-kernel, Rafael J. Wysocki,
Danilo Krummrich, NeilBrown
On Thu, 21 May 2026 08:18:32 +0200
Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote:
> > Hello,
> >
> > Two nits:
> >
> > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp
> > to that than hardcode PAGE_SIZE.
>
> Where is that check at? And sysfs_kf_seq_show() doesn't check it this
> way, should that change?
It would be better to always fix the 'show' buffer at PAGE_SIZE (or 4k).
The only place that uses is different value is the cgroup code and I
think it needs to support longer writes (100 + 6 * NR_CPUS).
I've not looked at how it handles the matching reads - I presume they exist.
If the data is binary, or it uses seq_printf() then it can just support
a longer dynamically allocated buffer.
-- David
>
> > - pr_warn() instead of bare printk()?
>
> This is copying the message in sysfs_kf_seq_show() that has the same
> exact format. I will send a follow-on patch to make both the same.
>
> thanks,
>
> greg k-h
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-21 6:18 ` Greg Kroah-Hartman
2026-05-21 10:04 ` David Laight
@ 2026-05-21 16:18 ` Tejun Heo
2026-05-21 21:42 ` David Laight
1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2026-05-21 16:18 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: driver-core, linux-kernel, Rafael J. Wysocki, Danilo Krummrich,
NeilBrown
Hello,
On Thu, May 21, 2026 at 08:18:32AM +0200, Greg Kroah-Hartman wrote:
> On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote:
> > Hello,
> >
> > Two nits:
> >
> > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp
> > to that than hardcode PAGE_SIZE.
>
> Where is that check at? And sysfs_kf_seq_show() doesn't check it this
> way, should that change?
In kernfs_fop_open(), if ops->prealloc, we allocate of->prealloc_buf to the
size of of->atomic_write_len ?: PAGE_SIZE. Maybe we should just update
of->atomic_write_len. I think the intention was to allow >PAGE_SIZE atomic
content. seq_file now does dynamic buffer resizing, so this may not be
necessary. There's no clean interface to preemptly set the minimum buffer
size right now but that probably is the better direction. Then, we can just
always go through seq_file.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] sysfs: clamp show() return value in sysfs_kf_read()
2026-05-21 16:18 ` Tejun Heo
@ 2026-05-21 21:42 ` David Laight
0 siblings, 0 replies; 11+ messages in thread
From: David Laight @ 2026-05-21 21:42 UTC (permalink / raw)
To: Tejun Heo
Cc: Greg Kroah-Hartman, driver-core, linux-kernel, Rafael J. Wysocki,
Danilo Krummrich, NeilBrown
On Thu, 21 May 2026 06:18:09 -1000
Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On Thu, May 21, 2026 at 08:18:32AM +0200, Greg Kroah-Hartman wrote:
> > On Wed, May 20, 2026 at 08:19:34AM -1000, Tejun Heo wrote:
> > > Hello,
> > >
> > > Two nits:
> > >
> > > - Buffer is atomic_write_len ?: PAGE_SIZE, so probably better to clamp
> > > to that than hardcode PAGE_SIZE.
> >
> > Where is that check at? And sysfs_kf_seq_show() doesn't check it this
> > way, should that change?
>
> In kernfs_fop_open(), if ops->prealloc, we allocate of->prealloc_buf to the
> size of of->atomic_write_len ?: PAGE_SIZE. Maybe we should just update
> of->atomic_write_len. I think the intention was to allow >PAGE_SIZE atomic
> content. seq_file now does dynamic buffer resizing, so this may not be
> necessary. There's no clean interface to preemptly set the minimum buffer
> size right now but that probably is the better direction. Then, we can just
> always go through seq_file.
Except that the buffer size is always PAGE_SIZE in the prealloc paths.
A larger size is used by the cgroup code - but that is really for writes.
There seems to have been a lot of rework to all this code around 3.13.
I can't quite see what it looked like before.
But I think the write code should reject writes longer than
of->atomic_write_len ?: PAGE_SIZE because all writes are treated as
having 'offset zero'.
If op->atomic_write_len is non-zero overlong writes are rejected.
If op->atomic_write_len is zero then writes are truncated and userspace
is likely to do a second write for the remainder - which won't DTRT.
The naming suggests that loop existed in the kernel at some point,
but I can't find that version.
The 'prealloc' code was added later and reused 'atomic_write_len' for
the buffer size. But the only users leave it as default - allowing
PAGE_SIZE requests which require a kmalloc(PAGE_SIZE + 1) (aka 2 pages)
because of the trailing '\0' added to writes.
The 'prealloc' code is all about trying to avoid a kmalloc() that might
fail and cause a deadlock.
But I suspect the daemon doing the system call wont actually get to that
kmalloc() if memory to so short it fails to allocate the IO buffer.
Even if that were a problem an option to make that path 'try harder'
to kmalloc() the buffer would be better than preallocating pairs of pages
for each of the six? cgroup 'files'.
(It could even fall back to a single preallocated buffer for all users
that deem it important.)
Reading this code is hard because of the tangled mess between kernfs/file.c
and sysfs/file.c.
-- David
>
> Thanks.
>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2026-05-21 21:42 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-20 13:07 [PATCH] sysfs: clamp show() return value in sysfs_kf_read() Greg Kroah-Hartman
2026-05-20 13:36 ` Rafael J. Wysocki
2026-05-20 14:43 ` Danilo Krummrich
2026-05-20 18:19 ` Tejun Heo
2026-05-21 6:18 ` Greg Kroah-Hartman
2026-05-21 10:04 ` David Laight
2026-05-21 16:18 ` Tejun Heo
2026-05-21 21:42 ` David Laight
2026-05-20 22:11 ` David Laight
2026-05-21 6:19 ` Greg Kroah-Hartman
2026-05-21 9:17 ` David Laight
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox