[PATCH v2] seq_file: Unconditionally use vmalloc for buffer

Linux Hardening
 help / color / mirror / Atom feed

* [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
@ 2021-03-15 17:48 Kees Cook
  2021-03-15 18:33 ` Al Viro
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Kees Cook @ 2021-03-15 17:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Kees Cook, Greg Kroah-Hartman, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

The sysfs interface to seq_file continues to be rather fragile, as seen
with some recent exploits[1]. Move the seq_file buffer to the vmap area
(while retaining the accounting flag), since it has guard pages that
will catch and stop linear overflows. This seems justified given that
seq_file already uses kvmalloc(), is almost always using a PAGE_SIZE or
larger allocation, has allocations are normally short lived, and is not
normally on a performance critical path.

[1] https://blog.grimm-co.com/2021/03/new-old-bugs-in-linux-kernel.html

Signed-off-by: Kees Cook <keescook@chromium.org>
---
 fs/seq_file.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/seq_file.c b/fs/seq_file.c
index cb11a34fb871..16fb4a4e61e3 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -32,7 +32,12 @@ static void seq_set_overflow(struct seq_file *m)
 
 static void *seq_buf_alloc(unsigned long size)
 {
-	return kvmalloc(size, GFP_KERNEL_ACCOUNT);
+	/*
+	 * To be proactively defensive against buggy seq_get_buf() callers
+	 * (i.e. sysfs handlers), use the vmap area to gain the trailing
+	 * guard page which will protect against linear buffer overflows.
+	 */
+	return __vmalloc(size, GFP_KERNEL_ACCOUNT);
 }
 
 /**
@@ -130,7 +135,7 @@ static int traverse(struct seq_file *m, loff_t offset)
 
 Eoverflow:
 	m->op->stop(m, p);
-	kvfree(m->buf);
+	vfree(m->buf);
 	m->count = 0;
 	m->buf = seq_buf_alloc(m->size <<= 1);
 	return !m->buf ? -ENOMEM : -EAGAIN;
@@ -237,7 +242,7 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 			goto Fill;
 		// need a bigger buffer
 		m->op->stop(m, p);
-		kvfree(m->buf);
+		vfree(m->buf);
 		m->count = 0;
 		m->buf = seq_buf_alloc(m->size <<= 1);
 		if (!m->buf)
@@ -349,7 +354,7 @@ EXPORT_SYMBOL(seq_lseek);
 int seq_release(struct inode *inode, struct file *file)
 {
 	struct seq_file *m = file->private_data;
-	kvfree(m->buf);
+	vfree(m->buf);
 	kmem_cache_free(seq_file_cache, m);
 	return 0;
 }
@@ -585,7 +590,7 @@ int single_open_size(struct file *file, int (*show)(struct seq_file *, void *),
 		return -ENOMEM;
 	ret = single_open(file, show, data);
 	if (ret) {
-		kvfree(buf);
+		vfree(buf);
 		return ret;
 	}
 	((struct seq_file *)file->private_data)->buf = buf;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-15 17:48 [PATCH v2] seq_file: Unconditionally use vmalloc for buffer Kees Cook
@ 2021-03-15 18:33 ` Al Viro
  2021-03-15 20:43   ` Kees Cook
  2021-03-16  8:31 ` Michal Hocko
       [not found] ` <20210319140742.GC30349@xsang-OptiPlex-9020>
  2 siblings, 1 reply; 23+ messages in thread
From: Al Viro @ 2021-03-15 18:33 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Greg Kroah-Hartman, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

On Mon, Mar 15, 2021 at 10:48:51AM -0700, Kees Cook wrote:
> The sysfs interface to seq_file continues to be rather fragile, as seen
> with some recent exploits[1]. Move the seq_file buffer to the vmap area
> (while retaining the accounting flag), since it has guard pages that
> will catch and stop linear overflows. This seems justified given that
> seq_file already uses kvmalloc(), is almost always using a PAGE_SIZE or
> larger allocation, has allocations are normally short lived, and is not
> normally on a performance critical path.

You are attacking the wrong part of it.  Is there any reason for having
seq_get_buf() public in the first place?

For example, the use in blkcg_print_stat() is entirely due to the bogus
->pd_stat_fn() calling conventions.  Fuck scnprintf() games, just pass
seq_file to ->pd_stat_fn() and use seq_printf() instead.  Voila - no
seq_get_buf()/seq_commit()/scnprintf() garbage.

tegra use is no better, AFAICS.  inifinibarf one...  allow me to quote
that gem in full:
static int _driver_stats_seq_show(struct seq_file *s, void *v)
{
        loff_t *spos = v;
        char *buffer;
        u64 *stats = (u64 *)&hfi1_stats;
        size_t sz = seq_get_buf(s, &buffer);

        if (sz < sizeof(u64))
                return SEQ_SKIP;
        /* special case for interrupts */
        if (*spos == 0)
                *(u64 *)buffer = hfi1_sps_ints();
        else
                *(u64 *)buffer = stats[*spos];
        seq_commit(s,  sizeof(u64));
        return 0;
}
Yes, really.  Not to mention that there's seq_write(), what the _hell_
is it using seq_file for in the first place?  Oh, and hfi_stats is
actually this:
struct hfi1_ib_stats {
        __u64 sps_ints; /* number of interrupts handled */
        __u64 sps_errints; /* number of error interrupts */
        __u64 sps_txerrs; /* tx-related packet errors */
        __u64 sps_rcverrs; /* non-crc rcv packet errors */
        __u64 sps_hwerrs; /* hardware errors reported (parity, etc.) */
        __u64 sps_nopiobufs; /* no pio bufs avail from kernel */
        __u64 sps_ctxts; /* number of contexts currently open */
        __u64 sps_lenerrs; /* number of kernel packets where RHF != LRH len */
        __u64 sps_buffull;
        __u64 sps_hdrfull;
};
I won't go into further details - CDA might be dead and buried, but there
should be some limit to public obscenity ;-/

procfs use is borderline - it looks like there might be a good cause
for seq_escape_str().

And sysfs_kf_seq_show()...  Do we want to go through seq_file there at
all?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-15 18:33 ` Al Viro
@ 2021-03-15 20:43   ` Kees Cook
  2021-03-16  7:24     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2021-03-15 20:43 UTC (permalink / raw)
  To: Al Viro
  Cc: Andrew Morton, Greg Kroah-Hartman, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

On Mon, Mar 15, 2021 at 06:33:10PM +0000, Al Viro wrote:
> On Mon, Mar 15, 2021 at 10:48:51AM -0700, Kees Cook wrote:
> > The sysfs interface to seq_file continues to be rather fragile, as seen
> > with some recent exploits[1]. Move the seq_file buffer to the vmap area
> > (while retaining the accounting flag), since it has guard pages that
> > will catch and stop linear overflows. This seems justified given that
> > seq_file already uses kvmalloc(), is almost always using a PAGE_SIZE or
> > larger allocation, has allocations are normally short lived, and is not
> > normally on a performance critical path.
> 
> You are attacking the wrong part of it.  Is there any reason for having
> seq_get_buf() public in the first place?

Completely agreed. seq_get_buf() should be totally ripped out.
Unfortunately, this is going to be a long road because of sysfs's ATTR
stuff, there are something like 5000 callers, and the entire API was
designed to avoid refactoring all those callers from
sysfs_kf_seq_show().

However, since I also need to entirely rewrite the sysfs vs kobj APIs[1]
for CFI, I'm working on a plan to fix it all at once, but based on my
experience refactoring the timer struct, it's going to be a very painful
and long road.

So, in the meantime, I'd like to make this change so we can get bounds
checking for free on seq_file (since it's almost always PAGE_SIZE
anyway).

-Kees

[1] https://lore.kernel.org/lkml/202006112217.2E6CE093@keescook/

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-15 20:43   ` Kees Cook
@ 2021-03-16  7:24     ` Greg Kroah-Hartman
  2021-03-16 12:43       ` Al Viro
  0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-16  7:24 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Andrew Morton, Michal Hocko, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening

On Mon, Mar 15, 2021 at 01:43:59PM -0700, Kees Cook wrote:
> On Mon, Mar 15, 2021 at 06:33:10PM +0000, Al Viro wrote:
> > On Mon, Mar 15, 2021 at 10:48:51AM -0700, Kees Cook wrote:
> > > The sysfs interface to seq_file continues to be rather fragile, as seen
> > > with some recent exploits[1]. Move the seq_file buffer to the vmap area
> > > (while retaining the accounting flag), since it has guard pages that
> > > will catch and stop linear overflows. This seems justified given that
> > > seq_file already uses kvmalloc(), is almost always using a PAGE_SIZE or
> > > larger allocation, has allocations are normally short lived, and is not
> > > normally on a performance critical path.
> > 
> > You are attacking the wrong part of it.  Is there any reason for having
> > seq_get_buf() public in the first place?
> 
> Completely agreed. seq_get_buf() should be totally ripped out.
> Unfortunately, this is going to be a long road because of sysfs's ATTR
> stuff, there are something like 5000 callers, and the entire API was
> designed to avoid refactoring all those callers from
> sysfs_kf_seq_show().

What is wrong with the sysfs ATTR stuff?  That should make it so that we
do not have to change any caller for any specific change like this, why
can't sysfs or kernfs handle it automatically?

> However, since I also need to entirely rewrite the sysfs vs kobj APIs[1]
> for CFI, I'm working on a plan to fix it all at once, but based on my
> experience refactoring the timer struct, it's going to be a very painful
> and long road.

Oh yeah, that fun.  I don't think it's going to be as hard as you think,
as the underlying code is doing the "right thing" here, so this feels
like a problem in the CFI implementation more than anything else.

So what can I do today in sysfs to help fix the seq_get_buf() stuff?
What should it use instead?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-15 17:48 [PATCH v2] seq_file: Unconditionally use vmalloc for buffer Kees Cook
  2021-03-15 18:33 ` Al Viro
@ 2021-03-16  8:31 ` Michal Hocko
  2021-03-16 19:08   ` Kees Cook
       [not found] ` <20210319140742.GC30349@xsang-OptiPlex-9020>
  2 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2021-03-16  8:31 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Greg Kroah-Hartman, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening

On Mon 15-03-21 10:48:51, Kees Cook wrote:
> The sysfs interface to seq_file continues to be rather fragile, as seen
> with some recent exploits[1]. Move the seq_file buffer to the vmap area
> (while retaining the accounting flag), since it has guard pages that
> will catch and stop linear overflows. This seems justified given that
> seq_file already uses kvmalloc(), is almost always using a PAGE_SIZE or
> larger allocation, has allocations are normally short lived, and is not
> normally on a performance critical path.

I have already objected without having my concerns really addressed.

Your observation that most of buffers are PAGE_SIZE in the vast majority
cases matches my experience and kmalloc should perform better than
vmalloc. You should check the most common /proc readers at least.

Also this cannot really be done for configurations with a very limited
vmalloc space (32b for example). Those systems are more and more rare
but you shouldn't really allow userspace to deplete the vmalloc space.

I would be also curious to see how vmalloc scales with huge number of
single page allocations which would be easy to trigger with this patch.

> [1] https://blog.grimm-co.com/2021/03/new-old-bugs-in-linux-kernel.html
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  fs/seq_file.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/seq_file.c b/fs/seq_file.c
> index cb11a34fb871..16fb4a4e61e3 100644
> --- a/fs/seq_file.c
> +++ b/fs/seq_file.c
> @@ -32,7 +32,12 @@ static void seq_set_overflow(struct seq_file *m)
>  
>  static void *seq_buf_alloc(unsigned long size)
>  {
> -	return kvmalloc(size, GFP_KERNEL_ACCOUNT);
> +	/*
> +	 * To be proactively defensive against buggy seq_get_buf() callers
> +	 * (i.e. sysfs handlers), use the vmap area to gain the trailing
> +	 * guard page which will protect against linear buffer overflows.
> +	 */
> +	return __vmalloc(size, GFP_KERNEL_ACCOUNT);
>  }
>  
>  /**
> @@ -130,7 +135,7 @@ static int traverse(struct seq_file *m, loff_t offset)
>  
>  Eoverflow:
>  	m->op->stop(m, p);
> -	kvfree(m->buf);
> +	vfree(m->buf);
>  	m->count = 0;
>  	m->buf = seq_buf_alloc(m->size <<= 1);
>  	return !m->buf ? -ENOMEM : -EAGAIN;
> @@ -237,7 +242,7 @@ ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
>  			goto Fill;
>  		// need a bigger buffer
>  		m->op->stop(m, p);
> -		kvfree(m->buf);
> +		vfree(m->buf);
>  		m->count = 0;
>  		m->buf = seq_buf_alloc(m->size <<= 1);
>  		if (!m->buf)
> @@ -349,7 +354,7 @@ EXPORT_SYMBOL(seq_lseek);
>  int seq_release(struct inode *inode, struct file *file)
>  {
>  	struct seq_file *m = file->private_data;
> -	kvfree(m->buf);
> +	vfree(m->buf);
>  	kmem_cache_free(seq_file_cache, m);
>  	return 0;
>  }
> @@ -585,7 +590,7 @@ int single_open_size(struct file *file, int (*show)(struct seq_file *, void *),
>  		return -ENOMEM;
>  	ret = single_open(file, show, data);
>  	if (ret) {
> -		kvfree(buf);
> +		vfree(buf);
>  		return ret;
>  	}
>  	((struct seq_file *)file->private_data)->buf = buf;
> -- 
> 2.25.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16  7:24     ` Greg Kroah-Hartman
@ 2021-03-16 12:43       ` Al Viro
  2021-03-16 12:55         ` Greg Kroah-Hartman
                           ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Al Viro @ 2021-03-16 12:43 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kees Cook, Andrew Morton, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

On Tue, Mar 16, 2021 at 08:24:50AM +0100, Greg Kroah-Hartman wrote:

> > Completely agreed. seq_get_buf() should be totally ripped out.
> > Unfortunately, this is going to be a long road because of sysfs's ATTR
> > stuff, there are something like 5000 callers, and the entire API was
> > designed to avoid refactoring all those callers from
> > sysfs_kf_seq_show().
> 
> What is wrong with the sysfs ATTR stuff?  That should make it so that we
> do not have to change any caller for any specific change like this, why
> can't sysfs or kernfs handle it automatically?

Hard to tell, since that would require _finding_ the sodding ->show()
instances first.  Good luck with that, seeing that most of those appear
to come from templates-done-with-cpp...

AFAICS, Kees wants to protect against ->show() instances stomping beyond
the page size.  What I don't get is what do you get from using seq_file
if you insist on doing raw access to the buffer rather than using
seq_printf() and friends.  What's the point?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16 12:43       ` Al Viro
@ 2021-03-16 12:55         ` Greg Kroah-Hartman
  2021-03-16 13:01         ` Michal Hocko
  2021-03-16 19:18         ` Kees Cook
  2 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-16 12:55 UTC (permalink / raw)
  To: Al Viro
  Cc: Kees Cook, Andrew Morton, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

On Tue, Mar 16, 2021 at 12:43:12PM +0000, Al Viro wrote:
> On Tue, Mar 16, 2021 at 08:24:50AM +0100, Greg Kroah-Hartman wrote:
> 
> > > Completely agreed. seq_get_buf() should be totally ripped out.
> > > Unfortunately, this is going to be a long road because of sysfs's ATTR
> > > stuff, there are something like 5000 callers, and the entire API was
> > > designed to avoid refactoring all those callers from
> > > sysfs_kf_seq_show().
> > 
> > What is wrong with the sysfs ATTR stuff?  That should make it so that we
> > do not have to change any caller for any specific change like this, why
> > can't sysfs or kernfs handle it automatically?
> 
> Hard to tell, since that would require _finding_ the sodding ->show()
> instances first.  Good luck with that, seeing that most of those appear
> to come from templates-done-with-cpp...

Sure, auditing all of this is a pain, but the numbers that take a string
are low if someone wants to do that and convert them all to sysfs_emit()
today.

> AFAICS, Kees wants to protect against ->show() instances stomping beyond
> the page size.  What I don't get is what do you get from using seq_file
> if you insist on doing raw access to the buffer rather than using
> seq_printf() and friends.  What's the point?

I don't understand as I didn't switch kernfs to this api at all anyway,
as it seems to have come from the original sysfs code moving to kernfs
way back in 2013 with the work that Tejun did.  So I can't remember any
of that...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16 12:43       ` Al Viro
  2021-03-16 12:55         ` Greg Kroah-Hartman
@ 2021-03-16 13:01         ` Michal Hocko
  2021-03-16 19:18         ` Kees Cook
  2 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2021-03-16 13:01 UTC (permalink / raw)
  To: Al Viro
  Cc: Greg Kroah-Hartman, Kees Cook, Andrew Morton, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

On Tue 16-03-21 12:43:12, Al Viro wrote:
[...]
> AFAICS, Kees wants to protect against ->show() instances stomping beyond
> the page size.  What I don't get is what do you get from using seq_file
> if you insist on doing raw access to the buffer rather than using
> seq_printf() and friends.  What's the point?

I do not think there is any and as you have said in other response we
should really make seq_get_buf internal thing to seq_file and be done
with that. If there is a missing functionality that users workaround by
abusing seq_get_buf then it should be added into seq_file interface.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16  8:31 ` Michal Hocko
@ 2021-03-16 19:08   ` Kees Cook
  2021-03-17 12:08     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2021-03-16 19:08 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Greg Kroah-Hartman, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening

On Tue, Mar 16, 2021 at 09:31:23AM +0100, Michal Hocko wrote:
> On Mon 15-03-21 10:48:51, Kees Cook wrote:
> > The sysfs interface to seq_file continues to be rather fragile, as seen
> > with some recent exploits[1]. Move the seq_file buffer to the vmap area
> > (while retaining the accounting flag), since it has guard pages that
> > will catch and stop linear overflows. This seems justified given that
> > seq_file already uses kvmalloc(), is almost always using a PAGE_SIZE or
> > larger allocation, has allocations are normally short lived, and is not
> > normally on a performance critical path.
> 
> I have already objected without having my concerns really addressed.

Sorry, I didn't mean to ignore your comments!

> Your observation that most of buffers are PAGE_SIZE in the vast majority
> cases matches my experience and kmalloc should perform better than
> vmalloc. You should check the most common /proc readers at least.

Yeah, I'm going to build a quick test rig to see some before/after
timings, etc.

> Also this cannot really be done for configurations with a very limited
> vmalloc space (32b for example). Those systems are more and more rare
> but you shouldn't really allow userspace to deplete the vmalloc space.

This sounds like two objections:
- 32b has a small vmalloc space
- userspace shouldn't allow depletion of vmalloc space

I'd be happy to make this 64b only. For the latter, I would imagine
there are other vmalloc-exposed-to-userspace cases, but yes, this would
be much more direct. Is that a problem in practice?

> I would be also curious to see how vmalloc scales with huge number of
> single page allocations which would be easy to trigger with this patch.

Right -- what the best way to measure this (and what would be "too
much")?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16 12:43       ` Al Viro
  2021-03-16 12:55         ` Greg Kroah-Hartman
  2021-03-16 13:01         ` Michal Hocko
@ 2021-03-16 19:18         ` Kees Cook
  2021-03-17 10:44           ` Greg Kroah-Hartman
  2 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2021-03-16 19:18 UTC (permalink / raw)
  To: Al Viro
  Cc: Greg Kroah-Hartman, Andrew Morton, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-kernel,
	linux-fsdevel, linux-hardening

On Tue, Mar 16, 2021 at 12:43:12PM +0000, Al Viro wrote:
> On Tue, Mar 16, 2021 at 08:24:50AM +0100, Greg Kroah-Hartman wrote:
> 
> > > Completely agreed. seq_get_buf() should be totally ripped out.
> > > Unfortunately, this is going to be a long road because of sysfs's ATTR
> > > stuff, there are something like 5000 callers, and the entire API was
> > > designed to avoid refactoring all those callers from
> > > sysfs_kf_seq_show().
> > 
> > What is wrong with the sysfs ATTR stuff?  That should make it so that we
> > do not have to change any caller for any specific change like this, why
> > can't sysfs or kernfs handle it automatically?
> 
> Hard to tell, since that would require _finding_ the sodding ->show()
> instances first.  Good luck with that, seeing that most of those appear
> to come from templates-done-with-cpp...

I *think* I can get coccinelle to find them all, but my brute-force
approach was to just do a debug build changing the ATTR macro to be
typed, and changing the name of "show" and "store" in kobj_attribute
(to make the compiler find them all).

> AFAICS, Kees wants to protect against ->show() instances stomping beyond
> the page size.  What I don't get is what do you get from using seq_file
> if you insist on doing raw access to the buffer rather than using
> seq_printf() and friends.  What's the point?

To me, it looks like the kernfs/sysfs API happened around the time
"container_of" was gaining ground. It's trying to do the same thing
the "modern" callbacks do with finding a pointer from another, but it
did so by making sure everything had a 0 offset and an identical
beginning structure layout _but changed prototypes_.

It's the changed prototypes that freaks out CFI.

My current plan consists of these steps:

- add two new callbacks to the kobj_attribute struct (and its clones):
  "seq_show" and "seq_store", which will pass in the seq_file.
- convert all callbacks to kobject/kboj_attribute and use container_of()
  to find their respective pointers.
- remove "show" and "store"
- remove external use of seq_get_buf().

The first two steps require thousands of lines of code changed, so
I'm going to try to minimize it by trying to do as many conversions as
possible to the appropriate helpers first. e.g. DEVICE_ATTR_INT exists,
but there are only 2 users, yet there appears to be something like 500
DEVICE_ATTR callers that have an open-coded '%d':

$ git grep -B10 '\bDEVICE_ATTR' | grep '%d' | wc -l
530

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16 19:18         ` Kees Cook
@ 2021-03-17 10:44           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-17 10:44 UTC (permalink / raw)
  To: Kees Cook
  Cc: Al Viro, Andrew Morton, Michal Hocko, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening

On Tue, Mar 16, 2021 at 12:18:33PM -0700, Kees Cook wrote:
> On Tue, Mar 16, 2021 at 12:43:12PM +0000, Al Viro wrote:
> > On Tue, Mar 16, 2021 at 08:24:50AM +0100, Greg Kroah-Hartman wrote:
> > 
> > > > Completely agreed. seq_get_buf() should be totally ripped out.
> > > > Unfortunately, this is going to be a long road because of sysfs's ATTR
> > > > stuff, there are something like 5000 callers, and the entire API was
> > > > designed to avoid refactoring all those callers from
> > > > sysfs_kf_seq_show().
> > > 
> > > What is wrong with the sysfs ATTR stuff?  That should make it so that we
> > > do not have to change any caller for any specific change like this, why
> > > can't sysfs or kernfs handle it automatically?
> > 
> > Hard to tell, since that would require _finding_ the sodding ->show()
> > instances first.  Good luck with that, seeing that most of those appear
> > to come from templates-done-with-cpp...
> 
> I *think* I can get coccinelle to find them all, but my brute-force
> approach was to just do a debug build changing the ATTR macro to be
> typed, and changing the name of "show" and "store" in kobj_attribute
> (to make the compiler find them all).
> 
> > AFAICS, Kees wants to protect against ->show() instances stomping beyond
> > the page size.  What I don't get is what do you get from using seq_file
> > if you insist on doing raw access to the buffer rather than using
> > seq_printf() and friends.  What's the point?
> 
> To me, it looks like the kernfs/sysfs API happened around the time
> "container_of" was gaining ground. It's trying to do the same thing
> the "modern" callbacks do with finding a pointer from another, but it
> did so by making sure everything had a 0 offset and an identical
> beginning structure layout _but changed prototypes_.
> 
> It's the changed prototypes that freaks out CFI.
> 
> My current plan consists of these steps:
> 
> - add two new callbacks to the kobj_attribute struct (and its clones):
>   "seq_show" and "seq_store", which will pass in the seq_file.

Ick, why?  Why should the callback care about seq_file?  Shouldn't any
wrapper logic in the kobject code be able to handle this automatically?

> - convert all callbacks to kobject/kboj_attribute and use container_of()
>   to find their respective pointers.

Which callbacks are you talking about here?

> - remove "show" and "store"

Hah!

> - remove external use of seq_get_buf().

So is this the main goal?  I still don't understand the sequence file
problem here, what am I missing (becides the CFI stuff that is)?

> The first two steps require thousands of lines of code changed, so
> I'm going to try to minimize it by trying to do as many conversions as
> possible to the appropriate helpers first. e.g. DEVICE_ATTR_INT exists,
> but there are only 2 users, yet there appears to be something like 500
> DEVICE_ATTR callers that have an open-coded '%d':
> 
> $ git grep -B10 '\bDEVICE_ATTR' | grep '%d' | wc -l
> 530

That's going to be hard, and a pain, and I really doubt all that useful
as I still can't figure out why this is needed...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-16 19:08   ` Kees Cook
@ 2021-03-17 12:08     ` Michal Hocko
  2021-03-17 13:34       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2021-03-17 12:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andrew Morton, Greg Kroah-Hartman, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Tue 16-03-21 12:08:02, Kees Cook wrote:
> On Tue, Mar 16, 2021 at 09:31:23AM +0100, Michal Hocko wrote:
[...]
> > Also this cannot really be done for configurations with a very limited
> > vmalloc space (32b for example). Those systems are more and more rare
> > but you shouldn't really allow userspace to deplete the vmalloc space.
>
> This sounds like two objections:
> - 32b has a small vmalloc space
> - userspace shouldn't allow depletion of vmalloc space
>
> I'd be happy to make this 64b only. For the latter, I would imagine
> there are other vmalloc-exposed-to-userspace cases, but yes, this would
> be much more direct. Is that a problem in practice?

vmalloc space shouldn't be a problem for 64b systems but I am not sure
how does vmalloc scale with many small allocations. There were some
changes by Uladzislau who might give us more insight (CCed).

> > I would be also curious to see how vmalloc scales with huge number of
> > single page allocations which would be easy to trigger with this patch.
>
> Right -- what the best way to measure this (and what would be "too
> much")?

Proc is used quite heavily for all sorts of monitoring so I would be
worried about a noticeable slow down.

Btw. I still have problems with the approach. seq_file is intended to
provide safe way to dump values to the userspace. Sacrificing
performance just because of some abuser seems like a wrong way to go as
Al pointed out earlier. Can we simply stop the abuse and disallow to
manipulate the buffer directly? I do realize this might be more tricky
for reasons mentioned in other emails but this is definitely worth
doing.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 12:08     ` Michal Hocko
@ 2021-03-17 13:34       ` Greg Kroah-Hartman
  2021-03-17 14:44         ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-17 13:34 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> Btw. I still have problems with the approach. seq_file is intended to
> provide safe way to dump values to the userspace. Sacrificing
> performance just because of some abuser seems like a wrong way to go as
> Al pointed out earlier. Can we simply stop the abuse and disallow to
> manipulate the buffer directly? I do realize this might be more tricky
> for reasons mentioned in other emails but this is definitely worth
> doing.

We have to provide a buffer to "write into" somehow, so what is the best
way to stop "abuse" like this?

Right now, we do have helper functions, sysfs_emit(), that know to stop
the overflow of the buffer size, but porting the whole kernel to them is
going to take a bunch of churn, for almost no real benefit except a
potential random driver that might be doing bad things here that we have
not noticed yet.

Other than that, suggestions are welcome!

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 13:34       ` Greg Kroah-Hartman
@ 2021-03-17 14:44         ` Michal Hocko
  2021-03-17 14:56           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2021-03-17 14:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kees Cook, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed 17-03-21 14:34:27, Greg KH wrote:
> On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > Btw. I still have problems with the approach. seq_file is intended to
> > provide safe way to dump values to the userspace. Sacrificing
> > performance just because of some abuser seems like a wrong way to go as
> > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > manipulate the buffer directly? I do realize this might be more tricky
> > for reasons mentioned in other emails but this is definitely worth
> > doing.
> 
> We have to provide a buffer to "write into" somehow, so what is the best
> way to stop "abuse" like this?

What is wrong about using seq_* interface directly?

> Right now, we do have helper functions, sysfs_emit(), that know to stop
> the overflow of the buffer size, but porting the whole kernel to them is
> going to take a bunch of churn, for almost no real benefit except a
> potential random driver that might be doing bad things here that we have
> not noticed yet.

I am not familiar with sysfs, I just got lost in all the indirection but
replacing buffer by the seq_file and operate on that should be possible,
no?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 14:44         ` Michal Hocko
@ 2021-03-17 14:56           ` Greg Kroah-Hartman
  2021-03-17 15:20             ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-17 14:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> On Wed 17-03-21 14:34:27, Greg KH wrote:
> > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > Btw. I still have problems with the approach. seq_file is intended to
> > > provide safe way to dump values to the userspace. Sacrificing
> > > performance just because of some abuser seems like a wrong way to go as
> > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > manipulate the buffer directly? I do realize this might be more tricky
> > > for reasons mentioned in other emails but this is definitely worth
> > > doing.
> > 
> > We have to provide a buffer to "write into" somehow, so what is the best
> > way to stop "abuse" like this?
> 
> What is wrong about using seq_* interface directly?

Right now every show() callback of sysfs would have to be changed :(

> > Right now, we do have helper functions, sysfs_emit(), that know to stop
> > the overflow of the buffer size, but porting the whole kernel to them is
> > going to take a bunch of churn, for almost no real benefit except a
> > potential random driver that might be doing bad things here that we have
> > not noticed yet.
> 
> I am not familiar with sysfs, I just got lost in all the indirection but
> replacing buffer by the seq_file and operate on that should be possible,
> no?

sysfs files should be very simple and easy, and have a single value
being written to userspace.  I guess seq_printf() does handle the issue
of "big buffers", but there should not be a big buffer here to worry
about in the first place (yes, there was a bug where a driver took
unchecked data and sent it to userspace overflowing the buffer which
started this whole thread...)

I guess Kees wants to change all show functions to use the seq_ api,
which now makes a bit more sense, but still seems like a huge overkill.
But I now understand the idea here, the buffer management is handled by
the core kernel and overflows are impossible.

A "simpler" fix is to keep the api the same today, and just "force"
everyone to use sysfs_emit() which does the length checking
automatically.

I don't know, it all depends on how much effort we want to put into the
"drivers can not do stupid things because we prevent them from it"
type of work here...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 14:56           ` Greg Kroah-Hartman
@ 2021-03-17 15:20             ` Michal Hocko
  2021-03-17 15:38               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2021-03-17 15:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kees Cook, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed 17-03-21 15:56:44, Greg KH wrote:
> On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > provide safe way to dump values to the userspace. Sacrificing
> > > > performance just because of some abuser seems like a wrong way to go as
> > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > for reasons mentioned in other emails but this is definitely worth
> > > > doing.
> > > 
> > > We have to provide a buffer to "write into" somehow, so what is the best
> > > way to stop "abuse" like this?
> > 
> > What is wrong about using seq_* interface directly?
> 
> Right now every show() callback of sysfs would have to be changed :(

Is this really the case? Would it be too ugly to have an intermediate
buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.
Sure one copy more than necessary but it this shouldn't be a hot path or
even visible on small strings. So that might be worth destroying an
inherently dangerous seq API (seq_get_buf).
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 15:20             ` Michal Hocko
@ 2021-03-17 15:38               ` Greg Kroah-Hartman
  2021-03-17 15:48                 ` Michal Hocko
  2021-03-17 21:30                 ` Kees Cook
  0 siblings, 2 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-17 15:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed, Mar 17, 2021 at 04:20:52PM +0100, Michal Hocko wrote:
> On Wed 17-03-21 15:56:44, Greg KH wrote:
> > On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > > provide safe way to dump values to the userspace. Sacrificing
> > > > > performance just because of some abuser seems like a wrong way to go as
> > > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > > for reasons mentioned in other emails but this is definitely worth
> > > > > doing.
> > > > 
> > > > We have to provide a buffer to "write into" somehow, so what is the best
> > > > way to stop "abuse" like this?
> > > 
> > > What is wrong about using seq_* interface directly?
> > 
> > Right now every show() callback of sysfs would have to be changed :(
> 
> Is this really the case? Would it be too ugly to have an intermediate
> buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.

Oh, good idea.

> Sure one copy more than necessary but it this shouldn't be a hot path or
> even visible on small strings. So that might be worth destroying an
> inherently dangerous seq API (seq_get_buf).

I'm all for that, let me see if I can carve out some time tomorrow to
try this out.

But, you don't get rid of the "ability" to have a driver write more than
a PAGE_SIZE into the buffer passed to it.  I guess I could be paranoid
and do some internal checks (allocate a bunch of memory and check for
overflow by hand), if this is something to really be concerned about...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 15:38               ` Greg Kroah-Hartman
@ 2021-03-17 15:48                 ` Michal Hocko
  2021-03-17 21:30                 ` Kees Cook
  1 sibling, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2021-03-17 15:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Kees Cook, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed 17-03-21 16:38:57, Greg KH wrote:
> On Wed, Mar 17, 2021 at 04:20:52PM +0100, Michal Hocko wrote:
> > On Wed 17-03-21 15:56:44, Greg KH wrote:
> > > On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > > > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > > > provide safe way to dump values to the userspace. Sacrificing
> > > > > > performance just because of some abuser seems like a wrong way to go as
> > > > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > > > for reasons mentioned in other emails but this is definitely worth
> > > > > > doing.
> > > > > 
> > > > > We have to provide a buffer to "write into" somehow, so what is the best
> > > > > way to stop "abuse" like this?
> > > > 
> > > > What is wrong about using seq_* interface directly?
> > > 
> > > Right now every show() callback of sysfs would have to be changed :(
> > 
> > Is this really the case? Would it be too ugly to have an intermediate
> > buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.
> 
> Oh, good idea.
> 
> > Sure one copy more than necessary but it this shouldn't be a hot path or
> > even visible on small strings. So that might be worth destroying an
> > inherently dangerous seq API (seq_get_buf).
> 
> I'm all for that, let me see if I can carve out some time tomorrow to
> try this out.
> 
> But, you don't get rid of the "ability" to have a driver write more than
> a PAGE_SIZE into the buffer passed to it.  I guess I could be paranoid
> and do some internal checks (allocate a bunch of memory and check for
> overflow by hand), if this is something to really be concerned about...

Yes this is certainly possible and it will needs some way to address. My
point was that we shouldn't cripple seq_file just because the API allows
for an abuse. Sysfs needs to find a way to handle internal PAGE_SIZE
buffer assumption in any case.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 15:38               ` Greg Kroah-Hartman
  2021-03-17 15:48                 ` Michal Hocko
@ 2021-03-17 21:30                 ` Kees Cook
  2021-03-18  8:07                   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 23+ messages in thread
From: Kees Cook @ 2021-03-17 21:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Michal Hocko, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed, Mar 17, 2021 at 04:38:57PM +0100, Greg Kroah-Hartman wrote:
> On Wed, Mar 17, 2021 at 04:20:52PM +0100, Michal Hocko wrote:
> > On Wed 17-03-21 15:56:44, Greg KH wrote:
> > > On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > > > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > > > provide safe way to dump values to the userspace. Sacrificing
> > > > > > performance just because of some abuser seems like a wrong way to go as
> > > > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > > > for reasons mentioned in other emails but this is definitely worth
> > > > > > doing.
> > > > > 
> > > > > We have to provide a buffer to "write into" somehow, so what is the best
> > > > > way to stop "abuse" like this?
> > > > 
> > > > What is wrong about using seq_* interface directly?
> > > 
> > > Right now every show() callback of sysfs would have to be changed :(
> > 
> > Is this really the case? Would it be too ugly to have an intermediate
> > buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.
> 
> Oh, good idea.
> 
> > Sure one copy more than necessary but it this shouldn't be a hot path or
> > even visible on small strings. So that might be worth destroying an
> > inherently dangerous seq API (seq_get_buf).
> 
> I'm all for that, let me see if I can carve out some time tomorrow to
> try this out.

The trouble has been that C string APIs are just so impossibly fragile.
We just get too many bugs with it, so we really do need to rewrite the
callbacks to use seq_file, since it has a safe API.

I've been trying to write coccinelle scripts to do some of this
refactoring, but I have not found a silver bullet. (This is why I've
suggested adding the temporary "seq_show" and "seq_store" functions, so
we can transition all the callbacks without a flag day.)

> But, you don't get rid of the "ability" to have a driver write more than
> a PAGE_SIZE into the buffer passed to it.  I guess I could be paranoid
> and do some internal checks (allocate a bunch of memory and check for
> overflow by hand), if this is something to really be concerned about...

Besides the CFI prototype enforcement changes (which I can build into
the new seq_show/seq_store callbacks), the buffer management is the
primary issue: we just can't hand drivers a string (even with a length)
because the C functions are terrible. e.g. just look at the snprintf vs
scnprintf -- we constantly have to just build completely new API when
what we need is a safe way (i.e. obfuscated away from the caller) to
build a string. Luckily seq_file does this already, so leaning into that
is good here.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-17 21:30                 ` Kees Cook
@ 2021-03-18  8:07                   ` Greg Kroah-Hartman
  2021-03-18 15:51                     ` Kees Cook
  0 siblings, 1 reply; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-18  8:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michal Hocko, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Wed, Mar 17, 2021 at 02:30:47PM -0700, Kees Cook wrote:
> On Wed, Mar 17, 2021 at 04:38:57PM +0100, Greg Kroah-Hartman wrote:
> > On Wed, Mar 17, 2021 at 04:20:52PM +0100, Michal Hocko wrote:
> > > On Wed 17-03-21 15:56:44, Greg KH wrote:
> > > > On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > > > > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > > > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > > > > provide safe way to dump values to the userspace. Sacrificing
> > > > > > > performance just because of some abuser seems like a wrong way to go as
> > > > > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > > > > for reasons mentioned in other emails but this is definitely worth
> > > > > > > doing.
> > > > > > 
> > > > > > We have to provide a buffer to "write into" somehow, so what is the best
> > > > > > way to stop "abuse" like this?
> > > > > 
> > > > > What is wrong about using seq_* interface directly?
> > > > 
> > > > Right now every show() callback of sysfs would have to be changed :(
> > > 
> > > Is this really the case? Would it be too ugly to have an intermediate
> > > buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.
> > 
> > Oh, good idea.
> > 
> > > Sure one copy more than necessary but it this shouldn't be a hot path or
> > > even visible on small strings. So that might be worth destroying an
> > > inherently dangerous seq API (seq_get_buf).
> > 
> > I'm all for that, let me see if I can carve out some time tomorrow to
> > try this out.
> 
> The trouble has been that C string APIs are just so impossibly fragile.
> We just get too many bugs with it, so we really do need to rewrite the
> callbacks to use seq_file, since it has a safe API.
> 
> I've been trying to write coccinelle scripts to do some of this
> refactoring, but I have not found a silver bullet. (This is why I've
> suggested adding the temporary "seq_show" and "seq_store" functions, so
> we can transition all the callbacks without a flag day.)
> 
> > But, you don't get rid of the "ability" to have a driver write more than
> > a PAGE_SIZE into the buffer passed to it.  I guess I could be paranoid
> > and do some internal checks (allocate a bunch of memory and check for
> > overflow by hand), if this is something to really be concerned about...
> 
> Besides the CFI prototype enforcement changes (which I can build into
> the new seq_show/seq_store callbacks), the buffer management is the
> primary issue: we just can't hand drivers a string (even with a length)
> because the C functions are terrible. e.g. just look at the snprintf vs
> scnprintf -- we constantly have to just build completely new API when
> what we need is a safe way (i.e. obfuscated away from the caller) to
> build a string. Luckily seq_file does this already, so leaning into that
> is good here.

But, is it really worth the churn here?

Yes, strings in C is "hard", but this _should_ be a simple thing for any
driver to handle:
	return sysfs_emit(buffer, "%d\n", my_dev->value);

To change that to:
	return seq_printf(seq, "%d\n", my_dev->value);
feels very much "don't we have other more valuable things we could be
doing?"

So far we have found 1 driver that messed up and overflowed the buffer
that I know of.  While reworking apis to make it "hard to get wrong" is
a great goal, the work involved here vs. any "protection" feels very
low.

How about moving everyone to sysfs_emit() first?  That way it becomes
much more "obvious" when drivers are doing stupid things with their
sysfs buffer.  But even then, it would not have caught the iscsi issue
as that was printing a user-provided string so maybe I'm just feeling
grumpy about the potential churn here...

I don't know...

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-18  8:07                   ` Greg Kroah-Hartman
@ 2021-03-18 15:51                     ` Kees Cook
  2021-03-18 17:56                       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 23+ messages in thread
From: Kees Cook @ 2021-03-18 15:51 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Michal Hocko, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Thu, Mar 18, 2021 at 09:07:45AM +0100, Greg Kroah-Hartman wrote:
> On Wed, Mar 17, 2021 at 02:30:47PM -0700, Kees Cook wrote:
> > On Wed, Mar 17, 2021 at 04:38:57PM +0100, Greg Kroah-Hartman wrote:
> > > On Wed, Mar 17, 2021 at 04:20:52PM +0100, Michal Hocko wrote:
> > > > On Wed 17-03-21 15:56:44, Greg KH wrote:
> > > > > On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > > > > > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > > > > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > > > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > > > > > provide safe way to dump values to the userspace. Sacrificing
> > > > > > > > performance just because of some abuser seems like a wrong way to go as
> > > > > > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > > > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > > > > > for reasons mentioned in other emails but this is definitely worth
> > > > > > > > doing.
> > > > > > > 
> > > > > > > We have to provide a buffer to "write into" somehow, so what is the best
> > > > > > > way to stop "abuse" like this?
> > > > > > 
> > > > > > What is wrong about using seq_* interface directly?
> > > > > 
> > > > > Right now every show() callback of sysfs would have to be changed :(
> > > > 
> > > > Is this really the case? Would it be too ugly to have an intermediate
> > > > buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.
> > > 
> > > Oh, good idea.
> > > 
> > > > Sure one copy more than necessary but it this shouldn't be a hot path or
> > > > even visible on small strings. So that might be worth destroying an
> > > > inherently dangerous seq API (seq_get_buf).
> > > 
> > > I'm all for that, let me see if I can carve out some time tomorrow to
> > > try this out.
> > 
> > The trouble has been that C string APIs are just so impossibly fragile.
> > We just get too many bugs with it, so we really do need to rewrite the
> > callbacks to use seq_file, since it has a safe API.
> > 
> > I've been trying to write coccinelle scripts to do some of this
> > refactoring, but I have not found a silver bullet. (This is why I've
> > suggested adding the temporary "seq_show" and "seq_store" functions, so
> > we can transition all the callbacks without a flag day.)
> > 
> > > But, you don't get rid of the "ability" to have a driver write more than
> > > a PAGE_SIZE into the buffer passed to it.  I guess I could be paranoid
> > > and do some internal checks (allocate a bunch of memory and check for
> > > overflow by hand), if this is something to really be concerned about...
> > 
> > Besides the CFI prototype enforcement changes (which I can build into
> > the new seq_show/seq_store callbacks), the buffer management is the
> > primary issue: we just can't hand drivers a string (even with a length)
> > because the C functions are terrible. e.g. just look at the snprintf vs
> > scnprintf -- we constantly have to just build completely new API when
> > what we need is a safe way (i.e. obfuscated away from the caller) to
> > build a string. Luckily seq_file does this already, so leaning into that
> > is good here.
> 
> But, is it really worth the churn here?
> 
> Yes, strings in C is "hard", but this _should_ be a simple thing for any
> driver to handle:
> 	return sysfs_emit(buffer, "%d\n", my_dev->value);
> 
> To change that to:
> 	return seq_printf(seq, "%d\n", my_dev->value);
> feels very much "don't we have other more valuable things we could be
> doing?"
> 
> So far we have found 1 driver that messed up and overflowed the buffer
> that I know of.  While reworking apis to make it "hard to get wrong" is
> a great goal, the work involved here vs. any "protection" feels very
> low.

I haven't been keeping a list, but it's not the only one. The _other_
reason we need seq_file is so we can perform checks against f_cred for
things like %p obfuscation (as was needed for modules that I hacked
around) and is needed a proper bug fix for the kernel pointer exposure
bug from the same batch. So now I'm up to 3 distinct reasons that the
sysfs API is lacking -- I think it's worth the churn and time.

> How about moving everyone to sysfs_emit() first?  That way it becomes
> much more "obvious" when drivers are doing stupid things with their
> sysfs buffer.  But even then, it would not have caught the iscsi issue
> as that was printing a user-provided string so maybe I'm just feeling
> grumpy about the potential churn here...

I need to fix the prototypes for CFI sanity too. Switching to seq_file
solves 2 problems, and if we have to change the prototype once for that,
we can include the prototype fixes for CFI at the same time to avoid
double the churn.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] seq_file: Unconditionally use vmalloc for buffer
  2021-03-18 15:51                     ` Kees Cook
@ 2021-03-18 17:56                       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 23+ messages in thread
From: Greg Kroah-Hartman @ 2021-03-18 17:56 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michal Hocko, Andrew Morton, Alexey Dobriyan, Lee Duncan,
	Chris Leech, Adam Nichols, linux-kernel, linux-fsdevel,
	linux-hardening, Uladzislau Rezki

On Thu, Mar 18, 2021 at 08:51:45AM -0700, Kees Cook wrote:
> On Thu, Mar 18, 2021 at 09:07:45AM +0100, Greg Kroah-Hartman wrote:
> > On Wed, Mar 17, 2021 at 02:30:47PM -0700, Kees Cook wrote:
> > > On Wed, Mar 17, 2021 at 04:38:57PM +0100, Greg Kroah-Hartman wrote:
> > > > On Wed, Mar 17, 2021 at 04:20:52PM +0100, Michal Hocko wrote:
> > > > > On Wed 17-03-21 15:56:44, Greg KH wrote:
> > > > > > On Wed, Mar 17, 2021 at 03:44:16PM +0100, Michal Hocko wrote:
> > > > > > > On Wed 17-03-21 14:34:27, Greg KH wrote:
> > > > > > > > On Wed, Mar 17, 2021 at 01:08:21PM +0100, Michal Hocko wrote:
> > > > > > > > > Btw. I still have problems with the approach. seq_file is intended to
> > > > > > > > > provide safe way to dump values to the userspace. Sacrificing
> > > > > > > > > performance just because of some abuser seems like a wrong way to go as
> > > > > > > > > Al pointed out earlier. Can we simply stop the abuse and disallow to
> > > > > > > > > manipulate the buffer directly? I do realize this might be more tricky
> > > > > > > > > for reasons mentioned in other emails but this is definitely worth
> > > > > > > > > doing.
> > > > > > > > 
> > > > > > > > We have to provide a buffer to "write into" somehow, so what is the best
> > > > > > > > way to stop "abuse" like this?
> > > > > > > 
> > > > > > > What is wrong about using seq_* interface directly?
> > > > > > 
> > > > > > Right now every show() callback of sysfs would have to be changed :(
> > > > > 
> > > > > Is this really the case? Would it be too ugly to have an intermediate
> > > > > buffer and then seq_puts it into the seq file inside sysfs_kf_seq_show.
> > > > 
> > > > Oh, good idea.
> > > > 
> > > > > Sure one copy more than necessary but it this shouldn't be a hot path or
> > > > > even visible on small strings. So that might be worth destroying an
> > > > > inherently dangerous seq API (seq_get_buf).
> > > > 
> > > > I'm all for that, let me see if I can carve out some time tomorrow to
> > > > try this out.
> > > 
> > > The trouble has been that C string APIs are just so impossibly fragile.
> > > We just get too many bugs with it, so we really do need to rewrite the
> > > callbacks to use seq_file, since it has a safe API.
> > > 
> > > I've been trying to write coccinelle scripts to do some of this
> > > refactoring, but I have not found a silver bullet. (This is why I've
> > > suggested adding the temporary "seq_show" and "seq_store" functions, so
> > > we can transition all the callbacks without a flag day.)
> > > 
> > > > But, you don't get rid of the "ability" to have a driver write more than
> > > > a PAGE_SIZE into the buffer passed to it.  I guess I could be paranoid
> > > > and do some internal checks (allocate a bunch of memory and check for
> > > > overflow by hand), if this is something to really be concerned about...
> > > 
> > > Besides the CFI prototype enforcement changes (which I can build into
> > > the new seq_show/seq_store callbacks), the buffer management is the
> > > primary issue: we just can't hand drivers a string (even with a length)
> > > because the C functions are terrible. e.g. just look at the snprintf vs
> > > scnprintf -- we constantly have to just build completely new API when
> > > what we need is a safe way (i.e. obfuscated away from the caller) to
> > > build a string. Luckily seq_file does this already, so leaning into that
> > > is good here.
> > 
> > But, is it really worth the churn here?
> > 
> > Yes, strings in C is "hard", but this _should_ be a simple thing for any
> > driver to handle:
> > 	return sysfs_emit(buffer, "%d\n", my_dev->value);
> > 
> > To change that to:
> > 	return seq_printf(seq, "%d\n", my_dev->value);
> > feels very much "don't we have other more valuable things we could be
> > doing?"
> > 
> > So far we have found 1 driver that messed up and overflowed the buffer
> > that I know of.  While reworking apis to make it "hard to get wrong" is
> > a great goal, the work involved here vs. any "protection" feels very
> > low.
> 
> I haven't been keeping a list, but it's not the only one. The _other_
> reason we need seq_file is so we can perform checks against f_cred for
> things like %p obfuscation (as was needed for modules that I hacked
> around) and is needed a proper bug fix for the kernel pointer exposure
> bug from the same batch. So now I'm up to 3 distinct reasons that the
> sysfs API is lacking -- I think it's worth the churn and time.

Ok, if you think so.

But if we do this, can we not do a "raw" seqfile api?  I would like to
see only 1 function that works like sysfs_emit() does.  Perhaps:
	void sysfs_printf(struct attribute *attr, const char *fmt, ...);

and then from there we can "derive" things like:
	void device_printf(struct device_attribute *attr, const char *fmt, ...);

You can "hide" the needed seq_file structure in the attribute structure
for the buffer management, but I don't think we need the crazy multiple
ways that seq_printf() has morphed into over the years, right?

seq_path() anyone?

binary attribute files are a totally different thing, and probably can
just be left alone for now.

> > How about moving everyone to sysfs_emit() first?  That way it becomes
> > much more "obvious" when drivers are doing stupid things with their
> > sysfs buffer.  But even then, it would not have caught the iscsi issue
> > as that was printing a user-provided string so maybe I'm just feeling
> > grumpy about the potential churn here...
> 
> I need to fix the prototypes for CFI sanity too. Switching to seq_file
> solves 2 problems, and if we have to change the prototype once for that,
> we can include the prototype fixes for CFI at the same time to avoid
> double the churn.

Yes, let's not go through this twice...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [seq_file]  5fd6060e50:  stress-ng.eventfd.ops_per_sec -49.1% regression
       [not found] ` <20210319140742.GC30349@xsang-OptiPlex-9020>
@ 2021-03-19 19:31   ` Kees Cook
  0 siblings, 0 replies; 23+ messages in thread
From: Kees Cook @ 2021-03-19 19:31 UTC (permalink / raw)
  To: kernel test robot
  Cc: 0day robot, LKML, lkp, ying.huang, feng.tang, zhengjun.xing,
	Andrew Morton, Greg Kroah-Hartman, Michal Hocko, Alexey Dobriyan,
	Lee Duncan, Chris Leech, Adam Nichols, linux-fsdevel,
	linux-hardening

On Fri, Mar 19, 2021 at 10:07:42PM +0800, kernel test robot wrote:
> FYI, we noticed a -49.1% regression of stress-ng.eventfd.ops_per_sec due to commit:

Well, so it can be seen. ;) Though I feel slightly better that it's stress-ng
instead of a "normal" workload.

Thanks for the report!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2021-03-19 19:32 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-03-15 17:48 [PATCH v2] seq_file: Unconditionally use vmalloc for buffer Kees Cook
2021-03-15 18:33 ` Al Viro
2021-03-15 20:43   ` Kees Cook
2021-03-16  7:24     ` Greg Kroah-Hartman
2021-03-16 12:43       ` Al Viro
2021-03-16 12:55         ` Greg Kroah-Hartman
2021-03-16 13:01         ` Michal Hocko
2021-03-16 19:18         ` Kees Cook
2021-03-17 10:44           ` Greg Kroah-Hartman
2021-03-16  8:31 ` Michal Hocko
2021-03-16 19:08   ` Kees Cook
2021-03-17 12:08     ` Michal Hocko
2021-03-17 13:34       ` Greg Kroah-Hartman
2021-03-17 14:44         ` Michal Hocko
2021-03-17 14:56           ` Greg Kroah-Hartman
2021-03-17 15:20             ` Michal Hocko
2021-03-17 15:38               ` Greg Kroah-Hartman
2021-03-17 15:48                 ` Michal Hocko
2021-03-17 21:30                 ` Kees Cook
2021-03-18  8:07                   ` Greg Kroah-Hartman
2021-03-18 15:51                     ` Kees Cook
2021-03-18 17:56                       ` Greg Kroah-Hartman
     [not found] ` <20210319140742.GC30349@xsang-OptiPlex-9020>
2021-03-19 19:31   ` [seq_file] 5fd6060e50: stress-ng.eventfd.ops_per_sec -49.1% regression Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox