linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups
@ 2017-09-05 14:44 Joe Lawrence
  2017-09-05 14:44 ` [PATCH RFC 1/3] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit Joe Lawrence
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Joe Lawrence @ 2017-09-05 14:44 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Alexander Viro, Luis R. Rodriguez, Kees Cook, Mikulas Patocka,
	Michael Kerrisk

While backporting Michael's "pipe: fix limit handling" [1] patchset to a
distro-kernel, Mikulas noticed that current upstream pipe limit handling
contains a few problems:

  1 - round_pipe_size() nr_pages overflow on 32bit:  this would
      subsequently try roundup_pow_of_two(0), which is undefined.

  2 - visible non-rounded pipe-max-size value: there is no mutual
      exclusion or protection between the time pipe_max_size is assigned
      a raw value from proc_dointvec_minmax() and when it is rounded.

  3 - procfs signed wrap: echo'ing a large number into
      /proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
      negative value.


This RFC serves as a bug report and a contains a few possible fixes.
There may be better / more consistent ways to fix the overflows and
procfs bugs, but I figured I'd throw an RFC w/code out there for initial
conversation.  Suggestions welcome!

-- Joe


Testing
=======

Patch 1 - 32bit overflow
------------------------
>From userspace:

  fcntl(fd, F_SETPIPE_SZ, 0xffffffff);

- Before the fix, return value was 4096 as pipe size overflowed and
  was set to 4096

- After the fix, returns -1 and sets errno EINVAL, pipe size remains
  untouched


Patch 2 - non-rounded pipe-max-size value
-----------------------------------------
Keep plugging in values that need to be rounded:

  while (true); do echo 1048570 > /proc/sys/fs/pipe-max-size; done

and in another terminal, loop around reading the value:

  time (while (true); do SIZE=$(cat /proc/sys/fs/pipe-max-size); [[ $(( $SIZE % 4096 )) -ne 0 ]] && break; done; echo "$SIZE")
  1048570

  real    0m46.213s
  user    0m29.688s
  sys     0m20.042s

after the fix, the test loop never encountered a non-page-rounded value.


Patch 3 - procfs signed wrap
----------------------------
Before:

  % echo 2147483647 >/proc/sys/fs/pipe-max-size
  % cat /proc/sys/fs/pipe-max-size
  -2147483648

After:

  % echo 2147483647 >/proc/sys/fs/pipe-max-size
  % cat /proc/sys/fs/pipe-max-size
  2147483648


Joe Lawrence (3):
  pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
  pipe: protect pipe_max_size access with a mutex
  pipe: match pipe_max_size data type with procfs

 fs/pipe.c       | 48 +++++++++++++++++++++++++++++++++++++++++-------
 kernel/sysctl.c |  2 +-
 2 files changed, 42 insertions(+), 8 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH RFC 1/3] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
  2017-09-05 14:44 [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Joe Lawrence
@ 2017-09-05 14:44 ` Joe Lawrence
  2017-09-05 14:44 ` [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex Joe Lawrence
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Joe Lawrence @ 2017-09-05 14:44 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Alexander Viro, Luis R. Rodriguez, Kees Cook, Mikulas Patocka,
	Michael Kerrisk

The round_pipe_size() function contains a right-bit-shift expression
which may overflow, which would cause undefined results in a subsequent
roundup_pow_of_two() call.

  static inline unsigned int round_pipe_size(unsigned int size)
  {
          unsigned long nr_pages;

          nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
          return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
  }

PAGE_SIZE is defined as (1UL << PAGE_SHIFT), so:
  - 4 bytes wide on 32-bit (0 to 0xffffffff)
  - 8 bytes wide on 64-bit (0 to 0xffffffffffffffff)

That means that 32-bit round_pipe_size(), nr_pages may overflow to 0:

  size=0x00000000    nr_pages=0x0
  size=0x00000001    nr_pages=0x1
  size=0xfffff000    nr_pages=0xfffff
  size=0xfffff001    nr_pages=0x0         << !
  size=0xffffffff    nr_pages=0x0         << !

This is bad because roundup_pow_of_two(n) is undefined when n == 0!

64-bit is not a problem* as the unsigned int size is 4 bytes wide
(similar to 32-bit) and the larger, 8 byte wide unsigned long, is
sufficient to handle the largest value of the bit shift expression:

  size=0xffffffff    nr_pages=100000

(*On 64-bit, round_pipe_size(0xffffffff) will later overflow when the
result of roundup_pow_of_two(0x100000) is shifted left by PAGE_SHIFT.
This behavior is at least defined by the language, so callers can safely
sanity-check a 0 return value from round_pipe_size.)

Modify round_pipe_size() to return 0 if n == 0, reset the original
pipe_max_size value, and update callers to handle accordingly.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
---
 fs/pipe.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 97e5be897753..fa28910b3c59 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1017,13 +1017,16 @@ static int fifo_open(struct inode *inode, struct file *filp)
 
 /*
  * Currently we rely on the pipe array holding a power-of-2 number
- * of pages.
+ * of pages. Returns 0 on error.
  */
 static inline unsigned int round_pipe_size(unsigned int size)
 {
 	unsigned long nr_pages;
 
 	nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (nr_pages == 0)
+		return 0;
+
 	return roundup_pow_of_two(nr_pages) << PAGE_SHIFT;
 }
 
@@ -1039,6 +1042,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 	long ret = 0;
 
 	size = round_pipe_size(arg);
+	if (size == 0)
+		return -EINVAL;
 	nr_pages = size >> PAGE_SHIFT;
 
 	if (!nr_pages)
@@ -1122,13 +1127,22 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
 		 size_t *lenp, loff_t *ppos)
 {
+	unsigned int orig_pipe_max_size;
+	unsigned int rounded_pipe_max_size;
 	int ret;
 
+	orig_pipe_max_size = pipe_max_size;
 	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
 	if (ret < 0 || !write)
 		return ret;
 
-	pipe_max_size = round_pipe_size(pipe_max_size);
+	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
+	if (rounded_pipe_max_size == 0) {
+		pipe_max_size = orig_pipe_max_size;
+		return -EINVAL;
+	}
+
+	pipe_max_size = rounded_pipe_max_size;
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-05 14:44 [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Joe Lawrence
  2017-09-05 14:44 ` [PATCH RFC 1/3] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit Joe Lawrence
@ 2017-09-05 14:44 ` Joe Lawrence
  2017-09-14 23:09   ` Mikulas Patocka
  2017-09-05 14:44 ` [PATCH RFC 3/3] pipe: match pipe_max_size data type with procfs Joe Lawrence
  2017-09-14 13:26 ` [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Michael Kerrisk (man-pages)
  3 siblings, 1 reply; 15+ messages in thread
From: Joe Lawrence @ 2017-09-05 14:44 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Alexander Viro, Luis R. Rodriguez, Kees Cook, Mikulas Patocka,
	Michael Kerrisk

pipe_max_size is assigned directly via procfs sysctl:

  static struct ctl_table fs_table[] = {
          ...
          {
                  .procname       = "pipe-max-size",
                  .data           = &pipe_max_size,
                  .maxlen         = sizeof(int),
                  .mode           = 0644,
                  .proc_handler   = &pipe_proc_fn,
                  .extra1         = &pipe_min_size,
          },
          ...

  int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
                   size_t *lenp, loff_t *ppos)
  {
          ...
          ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
          ...

and then later rounded in-place a few statements later:

          ...
          pipe_max_size = round_pipe_size(pipe_max_size);
          ...

This leaves a window of time between initial assignment and rounding
that may be visible to other threads.  (For example, one thread sets a
non-rounded value to pipe_max_size while another reads its value.)

Similar reads of pipe_max_size are potentially racey:

  pipe.c :: alloc_pipe_info()
  pipe.c :: pipe_set_size()

Protect them and the procfs sysctl assignment with a mutex.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
---
 fs/pipe.c | 28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index fa28910b3c59..33bb11b0d78e 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -35,6 +35,11 @@
 unsigned int pipe_max_size = 1048576;
 
 /*
+ * Provide mutual exclusion around access to pipe_max_size
+ */
+static DEFINE_MUTEX(pipe_max_mutex);
+
+/*
  * Minimum pipe size, as required by POSIX
  */
 unsigned int pipe_min_size = PAGE_SIZE;
@@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
 	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
 	struct user_struct *user = get_current_user();
 	unsigned long user_bufs;
+	unsigned int max_size;
 
 	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
 	if (pipe == NULL)
 		goto out_free_uid;
 
-	if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
-		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
+	mutex_lock(&pipe_max_mutex);
+	max_size = pipe_max_size;
+	mutex_unlock(&pipe_max_mutex);
+
+	if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE))
+		pipe_bufs = max_size >> PAGE_SHIFT;
 
 	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
 
@@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 	struct pipe_buffer *bufs;
 	unsigned int size, nr_pages;
 	unsigned long user_bufs;
+	unsigned int max_size;
 	long ret = 0;
 
 	size = round_pipe_size(arg);
@@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
 	 * Decreasing the pipe capacity is always permitted, even
 	 * if the user is currently over a limit.
 	 */
+	mutex_lock(&pipe_max_mutex);
+	max_size = pipe_max_size;
+	mutex_unlock(&pipe_max_mutex);
 	if (nr_pages > pipe->buffers &&
-			size > pipe_max_size && !capable(CAP_SYS_RESOURCE))
+			size > max_size && !capable(CAP_SYS_RESOURCE))
 		return -EPERM;
 
 	user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
@@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
 	unsigned int rounded_pipe_max_size;
 	int ret;
 
+	mutex_lock(&pipe_max_mutex);
 	orig_pipe_max_size = pipe_max_size;
 	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
-	if (ret < 0 || !write)
+	if (ret < 0 || !write) {
+		mutex_unlock(&pipe_max_mutex);
 		return ret;
+	}
 
 	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
 	if (rounded_pipe_max_size == 0) {
 		pipe_max_size = orig_pipe_max_size;
+		mutex_unlock(&pipe_max_mutex);
 		return -EINVAL;
 	}
 
 	pipe_max_size = rounded_pipe_max_size;
+	mutex_unlock(&pipe_max_mutex);
+
 	return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFC 3/3] pipe: match pipe_max_size data type with procfs
  2017-09-05 14:44 [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Joe Lawrence
  2017-09-05 14:44 ` [PATCH RFC 1/3] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit Joe Lawrence
  2017-09-05 14:44 ` [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex Joe Lawrence
@ 2017-09-05 14:44 ` Joe Lawrence
  2017-09-14 13:26 ` [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Michael Kerrisk (man-pages)
  3 siblings, 0 replies; 15+ messages in thread
From: Joe Lawrence @ 2017-09-05 14:44 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel
  Cc: Alexander Viro, Luis R. Rodriguez, Kees Cook, Mikulas Patocka,
	Michael Kerrisk

pipe_max_size is defined as an unsigned int:

  unsigned int pipe_max_size = 1048576;

but its procfs/sysctl representation is an integer:

  static struct ctl_table fs_table[] = {
          ...
          {
                  .procname       = "pipe-max-size",
                  .data           = &pipe_max_size,
                  .maxlen         = sizeof(int),
                  .mode           = 0644,
                  .proc_handler   = &pipe_proc_fn,
                  .extra1         = &pipe_min_size,
          },
          ...

that is signed:

  int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
                   size_t *lenp, loff_t *ppos)
  {
          ...
          ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)

This leads to signed results via procfs for large values of
pipe_max_size:

  % echo 2147483647 >/proc/sys/fs/pipe-max-size
  % cat /proc/sys/fs/pipe-max-size
  -2147483648

Use unsigned operations on this variable to avoid such negative values.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
---
 fs/pipe.c       | 2 +-
 kernel/sysctl.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/pipe.c b/fs/pipe.c
index 33bb11b0d78e..3b10d39cc5d1 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1147,7 +1147,7 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
 
 	mutex_lock(&pipe_max_mutex);
 	orig_pipe_max_size = pipe_max_size;
-	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
+	ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
 	if (ret < 0 || !write) {
 		mutex_unlock(&pipe_max_mutex);
 		return ret;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6648fbbb8157..c976719bf37a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1825,7 +1825,7 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
 	{
 		.procname	= "pipe-max-size",
 		.data		= &pipe_max_size,
-		.maxlen		= sizeof(int),
+		.maxlen		= sizeof(pipe_max_size),
 		.mode		= 0644,
 		.proc_handler	= &pipe_proc_fn,
 		.extra1		= &pipe_min_size,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups
  2017-09-05 14:44 [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Joe Lawrence
                   ` (2 preceding siblings ...)
  2017-09-05 14:44 ` [PATCH RFC 3/3] pipe: match pipe_max_size data type with procfs Joe Lawrence
@ 2017-09-14 13:26 ` Michael Kerrisk (man-pages)
  2017-09-14 16:57   ` Randy Dunlap
  3 siblings, 1 reply; 15+ messages in thread
From: Michael Kerrisk (man-pages) @ 2017-09-14 13:26 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: lkml, linux-fsdevel@vger.kernel.org, Alexander Viro,
	Luis R. Rodriguez, Kees Cook, Mikulas Patocka

Hello Joe,

On 5 September 2017 at 16:44, Joe Lawrence <joe.lawrence@redhat.com> wrote:
> While backporting Michael's "pipe: fix limit handling" [1] patchset to a
> distro-kernel, Mikulas noticed that current upstream pipe limit handling
> contains a few problems:
>
>   1 - round_pipe_size() nr_pages overflow on 32bit:  this would
>       subsequently try roundup_pow_of_two(0), which is undefined.
>
>   2 - visible non-rounded pipe-max-size value: there is no mutual
>       exclusion or protection between the time pipe_max_size is assigned
>       a raw value from proc_dointvec_minmax() and when it is rounded.
>
>   3 - procfs signed wrap: echo'ing a large number into
>       /proc/sys/fs/pipe-max-size and then cat'ing it back out shows a
>       negative value.
>
>
> This RFC serves as a bug report and a contains a few possible fixes.
> There may be better / more consistent ways to fix the overflows and
> procfs bugs, but I figured I'd throw an RFC w/code out there for initial
> conversation.  Suggestions welcome!

Thank for working on this. I have no improvements to suggest. The
patches all look sane to me. For the whole series:

Reviewed-by: MIchael Kerrisk <mtk.manpages@gmail.com>

Cheers,

Michael


> Testing
> =======
>
> Patch 1 - 32bit overflow
> ------------------------
> From userspace:
>
>   fcntl(fd, F_SETPIPE_SZ, 0xffffffff);
>
> - Before the fix, return value was 4096 as pipe size overflowed and
>   was set to 4096
>
> - After the fix, returns -1 and sets errno EINVAL, pipe size remains
>   untouched
>
>
> Patch 2 - non-rounded pipe-max-size value
> -----------------------------------------
> Keep plugging in values that need to be rounded:
>
>   while (true); do echo 1048570 > /proc/sys/fs/pipe-max-size; done
>
> and in another terminal, loop around reading the value:
>
>   time (while (true); do SIZE=$(cat /proc/sys/fs/pipe-max-size); [[ $(( $SIZE % 4096 )) -ne 0 ]] && break; done; echo "$SIZE")
>   1048570
>
>   real    0m46.213s
>   user    0m29.688s
>   sys     0m20.042s
>
> after the fix, the test loop never encountered a non-page-rounded value.
>
>
> Patch 3 - procfs signed wrap
> ----------------------------
> Before:
>
>   % echo 2147483647 >/proc/sys/fs/pipe-max-size
>   % cat /proc/sys/fs/pipe-max-size
>   -2147483648
>
> After:
>
>   % echo 2147483647 >/proc/sys/fs/pipe-max-size
>   % cat /proc/sys/fs/pipe-max-size
>   2147483648
>
>
> Joe Lawrence (3):
>   pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
>   pipe: protect pipe_max_size access with a mutex
>   pipe: match pipe_max_size data type with procfs
>
>  fs/pipe.c       | 48 +++++++++++++++++++++++++++++++++++++++++-------
>  kernel/sysctl.c |  2 +-
>  2 files changed, 42 insertions(+), 8 deletions(-)
>
> --
> 1.8.3.1
>



-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups
  2017-09-14 13:26 ` [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Michael Kerrisk (man-pages)
@ 2017-09-14 16:57   ` Randy Dunlap
  2017-09-14 19:19     ` Joe Lawrence
  0 siblings, 1 reply; 15+ messages in thread
From: Randy Dunlap @ 2017-09-14 16:57 UTC (permalink / raw)
  To: mtk.manpages, Joe Lawrence
  Cc: lkml, linux-fsdevel@vger.kernel.org, Alexander Viro,
	Luis R. Rodriguez, Kees Cook, Mikulas Patocka

On 09/14/17 06:26, Michael Kerrisk (man-pages) wrote:
> Hello Joe,
> 
> On 5 September 2017 at 16:44, Joe Lawrence <joe.lawrence@redhat.com> wrote:
>> While backporting Michael's "pipe: fix limit handling" [1] patchset to a
>> distro-kernel, Mikulas noticed that current upstream pipe limit handling
>> contains a few problems:
>>
>>   1 - round_pipe_size() nr_pages overflow on 32bit:  this would
>>       subsequently try roundup_pow_of_two(0), which is undefined.

Hi,
Sorry I missed the initial posting of this.

The man page for F_SETPIPE_SZ (http://man7.org/linux/man-pages/man2/fcntl.2.html)
says:
"Attempts to set the pipe capacity below the page size are
silently rounded up to the page size."

That implies to me that setting pipe size to 0 would round up to PAGE_SIZE.
Doesn't patch 1/3 change that to return -EINVAL?


Otherwise all 3 patches look good to me.

thanks,
-- 
~Randy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups
  2017-09-14 16:57   ` Randy Dunlap
@ 2017-09-14 19:19     ` Joe Lawrence
  2017-09-14 20:22       ` Randy Dunlap
  0 siblings, 1 reply; 15+ messages in thread
From: Joe Lawrence @ 2017-09-14 19:19 UTC (permalink / raw)
  To: Randy Dunlap, mtk.manpages
  Cc: lkml, linux-fsdevel@vger.kernel.org, Alexander Viro,
	Luis R. Rodriguez, Kees Cook, Mikulas Patocka

On 09/14/2017 12:57 PM, Randy Dunlap wrote:
> On 09/14/17 06:26, Michael Kerrisk (man-pages) wrote:
>> Hello Joe,
>>
>> On 5 September 2017 at 16:44, Joe Lawrence <joe.lawrence@redhat.com> wrote:
>>> While backporting Michael's "pipe: fix limit handling" [1] patchset to a
>>> distro-kernel, Mikulas noticed that current upstream pipe limit handling
>>> contains a few problems:
>>>
>>>   1 - round_pipe_size() nr_pages overflow on 32bit:  this would
>>>       subsequently try roundup_pow_of_two(0), which is undefined.
> 
> Hi,
> Sorry I missed the initial posting of this.
> 
> The man page for F_SETPIPE_SZ (http://man7.org/linux/man-pages/man2/fcntl.2.html)
> says:
> "Attempts to set the pipe capacity below the page size are
> silently rounded up to the page size."
> 
> That implies to me that setting pipe size to 0 would round up to PAGE_SIZE.
> Doesn't patch 1/3 change that to return -EINVAL?

Good catch.  How about something like this:

/*
 * Minimum pipe size, as required by POSIX
 */
unsigned int pipe_min_size = PAGE_SIZE;

...

static inline unsigned int round_pipe_size(unsigned int size)
 {
        unsigned long nr_pages;

+       if (size < pipe_min_size)
+               size = pipe_min_size;
+
        nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
        if (nr_pages == 0)
                return 0;

> 
> Otherwise all 3 patches look good to me.

If the above is good, I can fold this into patch 1 and respin the set.

Thanks,

-- Joe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups
  2017-09-14 19:19     ` Joe Lawrence
@ 2017-09-14 20:22       ` Randy Dunlap
  0 siblings, 0 replies; 15+ messages in thread
From: Randy Dunlap @ 2017-09-14 20:22 UTC (permalink / raw)
  To: Joe Lawrence, mtk.manpages
  Cc: lkml, linux-fsdevel@vger.kernel.org, Alexander Viro,
	Luis R. Rodriguez, Kees Cook, Mikulas Patocka

On 09/14/17 12:19, Joe Lawrence wrote:
> On 09/14/2017 12:57 PM, Randy Dunlap wrote:
>> On 09/14/17 06:26, Michael Kerrisk (man-pages) wrote:
>>> Hello Joe,
>>>
>>> On 5 September 2017 at 16:44, Joe Lawrence <joe.lawrence@redhat.com> wrote:
>>>> While backporting Michael's "pipe: fix limit handling" [1] patchset to a
>>>> distro-kernel, Mikulas noticed that current upstream pipe limit handling
>>>> contains a few problems:
>>>>
>>>>   1 - round_pipe_size() nr_pages overflow on 32bit:  this would
>>>>       subsequently try roundup_pow_of_two(0), which is undefined.
>>
>> Hi,
>> Sorry I missed the initial posting of this.
>>
>> The man page for F_SETPIPE_SZ (http://man7.org/linux/man-pages/man2/fcntl.2.html)
>> says:
>> "Attempts to set the pipe capacity below the page size are
>> silently rounded up to the page size."
>>
>> That implies to me that setting pipe size to 0 would round up to PAGE_SIZE.
>> Doesn't patch 1/3 change that to return -EINVAL?
> 
> Good catch.  How about something like this:
> 
> /*
>  * Minimum pipe size, as required by POSIX
>  */
> unsigned int pipe_min_size = PAGE_SIZE;
> 
> ...
> 
> static inline unsigned int round_pipe_size(unsigned int size)
>  {
>         unsigned long nr_pages;
> 
> +       if (size < pipe_min_size)
> +               size = pipe_min_size;
> +
>         nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
>         if (nr_pages == 0)
>                 return 0;
> 
>>
>> Otherwise all 3 patches look good to me.
> 
> If the above is good, I can fold this into patch 1 and respin the set.

Yes, looks good to me.  Thanks.


-- 
~Randy

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-05 14:44 ` [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex Joe Lawrence
@ 2017-09-14 23:09   ` Mikulas Patocka
  2017-09-15 14:08     ` Joe Lawrence
  2017-09-19 21:47     ` Joe Lawrence
  0 siblings, 2 replies; 15+ messages in thread
From: Mikulas Patocka @ 2017-09-14 23:09 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk

I think this mutex is too heavy - if multiple processes simultaneously 
create a pipe, the mutex would cause performance degradation.

You can call do_proc_dointvec with a custom callback "conv" function that 
does the rounding of the pipe size value.

Mikulas

On Tue, 5 Sep 2017, Joe Lawrence wrote:

> pipe_max_size is assigned directly via procfs sysctl:
> 
>   static struct ctl_table fs_table[] = {
>           ...
>           {
>                   .procname       = "pipe-max-size",
>                   .data           = &pipe_max_size,
>                   .maxlen         = sizeof(int),
>                   .mode           = 0644,
>                   .proc_handler   = &pipe_proc_fn,
>                   .extra1         = &pipe_min_size,
>           },
>           ...
> 
>   int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>                    size_t *lenp, loff_t *ppos)
>   {
>           ...
>           ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
>           ...
> 
> and then later rounded in-place a few statements later:
> 
>           ...
>           pipe_max_size = round_pipe_size(pipe_max_size);
>           ...
> 
> This leaves a window of time between initial assignment and rounding
> that may be visible to other threads.  (For example, one thread sets a
> non-rounded value to pipe_max_size while another reads its value.)
> 
> Similar reads of pipe_max_size are potentially racey:
> 
>   pipe.c :: alloc_pipe_info()
>   pipe.c :: pipe_set_size()
> 
> Protect them and the procfs sysctl assignment with a mutex.
> 
> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
> ---
>  fs/pipe.c | 28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/pipe.c b/fs/pipe.c
> index fa28910b3c59..33bb11b0d78e 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -35,6 +35,11 @@
>  unsigned int pipe_max_size = 1048576;
>  
>  /*
> + * Provide mutual exclusion around access to pipe_max_size
> + */
> +static DEFINE_MUTEX(pipe_max_mutex);
> +
> +/*
>   * Minimum pipe size, as required by POSIX
>   */
>  unsigned int pipe_min_size = PAGE_SIZE;
> @@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
>  	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
>  	struct user_struct *user = get_current_user();
>  	unsigned long user_bufs;
> +	unsigned int max_size;
>  
>  	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
>  	if (pipe == NULL)
>  		goto out_free_uid;
>  
> -	if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
> -		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
> +	mutex_lock(&pipe_max_mutex);
> +	max_size = pipe_max_size;
> +	mutex_unlock(&pipe_max_mutex);
> +
> +	if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE))
> +		pipe_bufs = max_size >> PAGE_SHIFT;
>  
>  	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
>  
> @@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>  	struct pipe_buffer *bufs;
>  	unsigned int size, nr_pages;
>  	unsigned long user_bufs;
> +	unsigned int max_size;
>  	long ret = 0;
>  
>  	size = round_pipe_size(arg);
> @@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>  	 * Decreasing the pipe capacity is always permitted, even
>  	 * if the user is currently over a limit.
>  	 */
> +	mutex_lock(&pipe_max_mutex);
> +	max_size = pipe_max_size;
> +	mutex_unlock(&pipe_max_mutex);
>  	if (nr_pages > pipe->buffers &&
> -			size > pipe_max_size && !capable(CAP_SYS_RESOURCE))
> +			size > max_size && !capable(CAP_SYS_RESOURCE))
>  		return -EPERM;
>  
>  	user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
> @@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>  	unsigned int rounded_pipe_max_size;
>  	int ret;
>  
> +	mutex_lock(&pipe_max_mutex);
>  	orig_pipe_max_size = pipe_max_size;
>  	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
> -	if (ret < 0 || !write)
> +	if (ret < 0 || !write) {
> +		mutex_unlock(&pipe_max_mutex);
>  		return ret;
> +	}
>  
>  	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
>  	if (rounded_pipe_max_size == 0) {
>  		pipe_max_size = orig_pipe_max_size;
> +		mutex_unlock(&pipe_max_mutex);
>  		return -EINVAL;
>  	}
>  
>  	pipe_max_size = rounded_pipe_max_size;
> +	mutex_unlock(&pipe_max_mutex);
> +
>  	return ret;
>  }
>  
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-14 23:09   ` Mikulas Patocka
@ 2017-09-15 14:08     ` Joe Lawrence
  2017-09-19  7:53       ` Mikulas Patocka
  2017-09-19 21:47     ` Joe Lawrence
  1 sibling, 1 reply; 15+ messages in thread
From: Joe Lawrence @ 2017-09-15 14:08 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk, Randy Dunlap

On 09/14/2017 07:09 PM, Mikulas Patocka wrote:
> On Tue, 5 Sep 2017, Joe Lawrence wrote:
>> pipe_max_size is assigned directly via procfs sysctl:
>>
>>   static struct ctl_table fs_table[] = {
>>           ...
>>           {
>>                   .procname       = "pipe-max-size",
>>                   .data           = &pipe_max_size,
>>                   .maxlen         = sizeof(int),
>>                   .mode           = 0644,
>>                   .proc_handler   = &pipe_proc_fn,
>>                   .extra1         = &pipe_min_size,
>>           },
>>           ...
>>
>>   int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>>                    size_t *lenp, loff_t *ppos)
>>   {
>>           ...
>>           ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
>>           ...
>>
>> and then later rounded in-place a few statements later:
>>
>>           ...
>>           pipe_max_size = round_pipe_size(pipe_max_size);
>>           ...
>>
>> This leaves a window of time between initial assignment and rounding
>> that may be visible to other threads.  (For example, one thread sets a
>> non-rounded value to pipe_max_size while another reads its value.)
>>
>> Similar reads of pipe_max_size are potentially racey:
>>
>>   pipe.c :: alloc_pipe_info()
>>   pipe.c :: pipe_set_size()
>>
>> Protect them and the procfs sysctl assignment with a mutex.
>>
>> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
>> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
>> ---
>>  fs/pipe.c | 28 ++++++++++++++++++++++++----
>>  1 file changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/pipe.c b/fs/pipe.c
>> index fa28910b3c59..33bb11b0d78e 100644
>> --- a/fs/pipe.c
>> +++ b/fs/pipe.c
>> @@ -35,6 +35,11 @@
>>  unsigned int pipe_max_size = 1048576;
>>  
>>  /*
>> + * Provide mutual exclusion around access to pipe_max_size
>> + */
>> +static DEFINE_MUTEX(pipe_max_mutex);
>> +
>> +/*
>>   * Minimum pipe size, as required by POSIX
>>   */
>>  unsigned int pipe_min_size = PAGE_SIZE;
>> @@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
>>  	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
>>  	struct user_struct *user = get_current_user();
>>  	unsigned long user_bufs;
>> +	unsigned int max_size;
>>  
>>  	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
>>  	if (pipe == NULL)
>>  		goto out_free_uid;
>>  
>> -	if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
>> -		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
>> +	mutex_lock(&pipe_max_mutex);
>> +	max_size = pipe_max_size;
>> +	mutex_unlock(&pipe_max_mutex);
>> +
>> +	if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE))
>> +		pipe_bufs = max_size >> PAGE_SHIFT;
>>  
>>  	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
>>  
>> @@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>>  	struct pipe_buffer *bufs;
>>  	unsigned int size, nr_pages;
>>  	unsigned long user_bufs;
>> +	unsigned int max_size;
>>  	long ret = 0;
>>  
>>  	size = round_pipe_size(arg);
>> @@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>>  	 * Decreasing the pipe capacity is always permitted, even
>>  	 * if the user is currently over a limit.
>>  	 */
>> +	mutex_lock(&pipe_max_mutex);
>> +	max_size = pipe_max_size;
>> +	mutex_unlock(&pipe_max_mutex);
>>  	if (nr_pages > pipe->buffers &&
>> -			size > pipe_max_size && !capable(CAP_SYS_RESOURCE))
>> +			size > max_size && !capable(CAP_SYS_RESOURCE))
>>  		return -EPERM;
>>  
>>  	user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
>> @@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>>  	unsigned int rounded_pipe_max_size;
>>  	int ret;
>>  
>> +	mutex_lock(&pipe_max_mutex);
>>  	orig_pipe_max_size = pipe_max_size;
>>  	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
>> -	if (ret < 0 || !write)
>> +	if (ret < 0 || !write) {
>> +		mutex_unlock(&pipe_max_mutex);
>>  		return ret;
>> +	}
>>  
>>  	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
>>  	if (rounded_pipe_max_size == 0) {
>>  		pipe_max_size = orig_pipe_max_size;
>> +		mutex_unlock(&pipe_max_mutex);
>>  		return -EINVAL;
>>  	}
>>  
>>  	pipe_max_size = rounded_pipe_max_size;
>> +	mutex_unlock(&pipe_max_mutex);
>> +
>>  	return ret;
>>  }
>>  
>> -- 
>> 1.8.3.1
>>
> I think this mutex is too heavy - if multiple processes simultaneously
> create a pipe, the mutex would cause performance degradation.
>
> You can call do_proc_dointvec with a custom callback "conv" function that
> does the rounding of the pipe size value.
>
> Mikulas
>

Hi Mikulas,

I'm not strong when it comes to memory barriers, but one of the
side-effects of using the mutex is that pipe_set_size() and
alloc_pipe_info() should have a consistent view of pipe_max_size.

If I remove the mutex (and assume that I implement a custom
do_proc_dointvec "conv" callback), is it safe for these routines to
directly use pipe_max_size as they had done before?

If not, is it safe to alias through a temporary stack variable (ie,
could the compiler re-read pipe_max_size multiple times in the function)?

Would READ_ONCE() help in any way?

The mutex covered up some confusion on my part here.

OTOH, since pipe_max_size is read-only for pipe_set_size() and
alloc_pipe_info() and only updated occasionally by pipe_proc_fn(), would
rw_semaphore or RCU be a fit here?

Regards,

-- Joe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-15 14:08     ` Joe Lawrence
@ 2017-09-19  7:53       ` Mikulas Patocka
  2017-09-19 21:32         ` Joe Lawrence
  0 siblings, 1 reply; 15+ messages in thread
From: Mikulas Patocka @ 2017-09-19  7:53 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk, Randy Dunlap



On Fri, 15 Sep 2017, Joe Lawrence wrote:

> On 09/14/2017 07:09 PM, Mikulas Patocka wrote:
> > On Tue, 5 Sep 2017, Joe Lawrence wrote:
> >> pipe_max_size is assigned directly via procfs sysctl:
> >>
> >>   static struct ctl_table fs_table[] = {
> >>           ...
> >>           {
> >>                   .procname       = "pipe-max-size",
> >>                   .data           = &pipe_max_size,
> >>                   .maxlen         = sizeof(int),
> >>                   .mode           = 0644,
> >>                   .proc_handler   = &pipe_proc_fn,
> >>                   .extra1         = &pipe_min_size,
> >>           },
> >>           ...
> >>
> >>   int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
> >>                    size_t *lenp, loff_t *ppos)
> >>   {
> >>           ...
> >>           ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
> >>           ...
> >>
> >> and then later rounded in-place a few statements later:
> >>
> >>           ...
> >>           pipe_max_size = round_pipe_size(pipe_max_size);
> >>           ...
> >>
> >> This leaves a window of time between initial assignment and rounding
> >> that may be visible to other threads.  (For example, one thread sets a
> >> non-rounded value to pipe_max_size while another reads its value.)
> >>
> >> Similar reads of pipe_max_size are potentially racey:
> >>
> >>   pipe.c :: alloc_pipe_info()
> >>   pipe.c :: pipe_set_size()
> >>
> >> Protect them and the procfs sysctl assignment with a mutex.
> >>
> >> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
> >> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
> >> ---
> >>  fs/pipe.c | 28 ++++++++++++++++++++++++----
> >>  1 file changed, 24 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/fs/pipe.c b/fs/pipe.c
> >> index fa28910b3c59..33bb11b0d78e 100644
> >> --- a/fs/pipe.c
> >> +++ b/fs/pipe.c
> >> @@ -35,6 +35,11 @@
> >>  unsigned int pipe_max_size = 1048576;
> >>  
> >>  /*
> >> + * Provide mutual exclusion around access to pipe_max_size
> >> + */
> >> +static DEFINE_MUTEX(pipe_max_mutex);
> >> +
> >> +/*
> >>   * Minimum pipe size, as required by POSIX
> >>   */
> >>  unsigned int pipe_min_size = PAGE_SIZE;
> >> @@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
> >>  	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
> >>  	struct user_struct *user = get_current_user();
> >>  	unsigned long user_bufs;
> >> +	unsigned int max_size;
> >>  
> >>  	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
> >>  	if (pipe == NULL)
> >>  		goto out_free_uid;
> >>  
> >> -	if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
> >> -		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
> >> +	mutex_lock(&pipe_max_mutex);
> >> +	max_size = pipe_max_size;
> >> +	mutex_unlock(&pipe_max_mutex);
> >> +
> >> +	if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE))
> >> +		pipe_bufs = max_size >> PAGE_SHIFT;
> >>  
> >>  	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
> >>  
> >> @@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
> >>  	struct pipe_buffer *bufs;
> >>  	unsigned int size, nr_pages;
> >>  	unsigned long user_bufs;
> >> +	unsigned int max_size;
> >>  	long ret = 0;
> >>  
> >>  	size = round_pipe_size(arg);
> >> @@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
> >>  	 * Decreasing the pipe capacity is always permitted, even
> >>  	 * if the user is currently over a limit.
> >>  	 */
> >> +	mutex_lock(&pipe_max_mutex);
> >> +	max_size = pipe_max_size;
> >> +	mutex_unlock(&pipe_max_mutex);
> >>  	if (nr_pages > pipe->buffers &&
> >> -			size > pipe_max_size && !capable(CAP_SYS_RESOURCE))
> >> +			size > max_size && !capable(CAP_SYS_RESOURCE))
> >>  		return -EPERM;
> >>  
> >>  	user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
> >> @@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
> >>  	unsigned int rounded_pipe_max_size;
> >>  	int ret;
> >>  
> >> +	mutex_lock(&pipe_max_mutex);
> >>  	orig_pipe_max_size = pipe_max_size;
> >>  	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
> >> -	if (ret < 0 || !write)
> >> +	if (ret < 0 || !write) {
> >> +		mutex_unlock(&pipe_max_mutex);
> >>  		return ret;
> >> +	}
> >>  
> >>  	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
> >>  	if (rounded_pipe_max_size == 0) {
> >>  		pipe_max_size = orig_pipe_max_size;
> >> +		mutex_unlock(&pipe_max_mutex);
> >>  		return -EINVAL;
> >>  	}
> >>  
> >>  	pipe_max_size = rounded_pipe_max_size;
> >> +	mutex_unlock(&pipe_max_mutex);
> >> +
> >>  	return ret;
> >>  }
> >>  
> >> -- 
> >> 1.8.3.1
> >>
> > I think this mutex is too heavy - if multiple processes simultaneously
> > create a pipe, the mutex would cause performance degradation.
> >
> > You can call do_proc_dointvec with a custom callback "conv" function that
> > does the rounding of the pipe size value.
> >
> > Mikulas
> >
> 
> Hi Mikulas,
> 
> I'm not strong when it comes to memory barriers, but one of the
> side-effects of using the mutex is that pipe_set_size() and
> alloc_pipe_info() should have a consistent view of pipe_max_size.
> 
> If I remove the mutex (and assume that I implement a custom
> do_proc_dointvec "conv" callback), is it safe for these routines to
> directly use pipe_max_size as they had done before?
>
> If not, is it safe to alias through a temporary stack variable (ie,
> could the compiler re-read pipe_max_size multiple times in the function)?
>
> Would READ_ONCE() help in any way?

Theoretically re-reading the variable is possible and you should use 
ACCESS_ONCE or READ_ONCE+WRITE_ONCE on that variable.

In practice, ACCESS_ONCE/READ_ONCE/WRITE_ONCE is missing at a lot of 
kernel variables that could be modified asynchronously and no one is 
complaining about it and no one is making any systematic effort to fix it.

That re-reading happens (I have some test code that makes the gcc 
optimizer re-read a variable), but it happens very rarely.

Another theoretical problem is that when reading or writing a variable 
without ACCESS_ONCE, the compiler could read and write the variable using 
multiple smaller memory accesses. But in practice, it happens only on some 
non-common architectures.

> The mutex covered up some confusion on my part here.
> 
> OTOH, since pipe_max_size is read-only for pipe_set_size() and
> alloc_pipe_info() and only updated occasionally by pipe_proc_fn(), would
> rw_semaphore or RCU be a fit here?

RW semaphore causes cache-line ping-pong between CPUs, it slows down the 
kernel just like a normal spinlock or mutex.

RCU would be useless here (i.e. you don't want to allocate memory and 
atomically assign it with rcu_assign_pointer).

> Regards,
> 
> -- Joe

Mikulas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-19  7:53       ` Mikulas Patocka
@ 2017-09-19 21:32         ` Joe Lawrence
  2017-09-21 10:05           ` Mikulas Patocka
  0 siblings, 1 reply; 15+ messages in thread
From: Joe Lawrence @ 2017-09-19 21:32 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk, Randy Dunlap

On 09/19/2017 03:53 AM, Mikulas Patocka wrote:
> On Fri, 15 Sep 2017, Joe Lawrence wrote:
> [ ... snip ... ]
>> Hi Mikulas,
>>
>> I'm not strong when it comes to memory barriers, but one of the
>> side-effects of using the mutex is that pipe_set_size() and
>> alloc_pipe_info() should have a consistent view of pipe_max_size.
>>
>> If I remove the mutex (and assume that I implement a custom
>> do_proc_dointvec "conv" callback), is it safe for these routines to
>> directly use pipe_max_size as they had done before?
>>
>> If not, is it safe to alias through a temporary stack variable (ie,
>> could the compiler re-read pipe_max_size multiple times in the function)?
>>
>> Would READ_ONCE() help in any way?
> 
> Theoretically re-reading the variable is possible and you should use 
> ACCESS_ONCE or READ_ONCE+WRITE_ONCE on that variable.
> 
> In practice, ACCESS_ONCE/READ_ONCE/WRITE_ONCE is missing at a lot of 
> kernel variables that could be modified asynchronously and no one is 
> complaining about it and no one is making any systematic effort to fix it.
> 
> That re-reading happens (I have some test code that makes the gcc 
> optimizer re-read a variable), but it happens very rarely.

This would be interesting to look at if you are willing to share (can
send offlist).

> Another theoretical problem is that when reading or writing a variable 
> without ACCESS_ONCE, the compiler could read and write the variable using 
> multiple smaller memory accesses. But in practice, it happens only on some 
> non-common architectures.

Smaller access than word size?

>> The mutex covered up some confusion on my part here.
>>
>> OTOH, since pipe_max_size is read-only for pipe_set_size() and
>> alloc_pipe_info() and only updated occasionally by pipe_proc_fn(), would
>> rw_semaphore or RCU be a fit here?
> 
> RW semaphore causes cache-line ping-pong between CPUs, it slows down the 
> kernel just like a normal spinlock or mutex.

Ah right.

> RCU would be useless here (i.e. you don't want to allocate memory and 
> atomically assign it with rcu_assign_pointer).

And good point here.

Thanks for the explanations, they confirm and expand what I as already
thinking in this space.

--- Joe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-14 23:09   ` Mikulas Patocka
  2017-09-15 14:08     ` Joe Lawrence
@ 2017-09-19 21:47     ` Joe Lawrence
  2017-09-25 10:44       ` Mikulas Patocka
  1 sibling, 1 reply; 15+ messages in thread
From: Joe Lawrence @ 2017-09-19 21:47 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk

On 09/14/2017 07:09 PM, Mikulas Patocka wrote:
> On Tue, 5 Sep 2017, Joe Lawrence wrote:
> 
>> pipe_max_size is assigned directly via procfs sysctl:
>>
>>   static struct ctl_table fs_table[] = {
>>           ...
>>           {
>>                   .procname       = "pipe-max-size",
>>                   .data           = &pipe_max_size,
>>                   .maxlen         = sizeof(int),
>>                   .mode           = 0644,
>>                   .proc_handler   = &pipe_proc_fn,
>>                   .extra1         = &pipe_min_size,
>>           },
>>           ...
>>
>>   int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>>                    size_t *lenp, loff_t *ppos)
>>   {
>>           ...
>>           ret = proc_dointvec_minmax(table, write, buf, lenp, ppos)
>>           ...
>>
>> and then later rounded in-place a few statements later:
>>
>>           ...
>>           pipe_max_size = round_pipe_size(pipe_max_size);
>>           ...
>>
>> This leaves a window of time between initial assignment and rounding
>> that may be visible to other threads.  (For example, one thread sets a
>> non-rounded value to pipe_max_size while another reads its value.)
>>
>> Similar reads of pipe_max_size are potentially racey:
>>
>>   pipe.c :: alloc_pipe_info()
>>   pipe.c :: pipe_set_size()
>>
>> Protect them and the procfs sysctl assignment with a mutex.
>>
>> Reported-by: Mikulas Patocka <mpatocka@redhat.com>
>> Signed-off-by: Joe Lawrence <joe.lawrence@redhat.com>
>> ---
>>  fs/pipe.c | 28 ++++++++++++++++++++++++----
>>  1 file changed, 24 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/pipe.c b/fs/pipe.c
>> index fa28910b3c59..33bb11b0d78e 100644
>> --- a/fs/pipe.c
>> +++ b/fs/pipe.c
>> @@ -35,6 +35,11 @@
>>  unsigned int pipe_max_size = 1048576;
>>  
>>  /*
>> + * Provide mutual exclusion around access to pipe_max_size
>> + */
>> +static DEFINE_MUTEX(pipe_max_mutex);
>> +
>> +/*
>>   * Minimum pipe size, as required by POSIX
>>   */
>>  unsigned int pipe_min_size = PAGE_SIZE;
>> @@ -623,13 +628,18 @@ struct pipe_inode_info *alloc_pipe_info(void)
>>  	unsigned long pipe_bufs = PIPE_DEF_BUFFERS;
>>  	struct user_struct *user = get_current_user();
>>  	unsigned long user_bufs;
>> +	unsigned int max_size;
>>  
>>  	pipe = kzalloc(sizeof(struct pipe_inode_info), GFP_KERNEL_ACCOUNT);
>>  	if (pipe == NULL)
>>  		goto out_free_uid;
>>  
>> -	if (pipe_bufs * PAGE_SIZE > pipe_max_size && !capable(CAP_SYS_RESOURCE))
>> -		pipe_bufs = pipe_max_size >> PAGE_SHIFT;
>> +	mutex_lock(&pipe_max_mutex);
>> +	max_size = pipe_max_size;
>> +	mutex_unlock(&pipe_max_mutex);
>> +
>> +	if (pipe_bufs * PAGE_SIZE > max_size && !capable(CAP_SYS_RESOURCE))
>> +		pipe_bufs = max_size >> PAGE_SHIFT;
>>  
>>  	user_bufs = account_pipe_buffers(user, 0, pipe_bufs);
>>  
>> @@ -1039,6 +1049,7 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>>  	struct pipe_buffer *bufs;
>>  	unsigned int size, nr_pages;
>>  	unsigned long user_bufs;
>> +	unsigned int max_size;
>>  	long ret = 0;
>>  
>>  	size = round_pipe_size(arg);
>> @@ -1056,8 +1067,11 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned long arg)
>>  	 * Decreasing the pipe capacity is always permitted, even
>>  	 * if the user is currently over a limit.
>>  	 */
>> +	mutex_lock(&pipe_max_mutex);
>> +	max_size = pipe_max_size;
>> +	mutex_unlock(&pipe_max_mutex);
>>  	if (nr_pages > pipe->buffers &&
>> -			size > pipe_max_size && !capable(CAP_SYS_RESOURCE))
>> +			size > max_size && !capable(CAP_SYS_RESOURCE))
>>  		return -EPERM;
>>  
>>  	user_bufs = account_pipe_buffers(pipe->user, pipe->buffers, nr_pages);
>> @@ -1131,18 +1145,24 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>>  	unsigned int rounded_pipe_max_size;
>>  	int ret;
>>  
>> +	mutex_lock(&pipe_max_mutex);
>>  	orig_pipe_max_size = pipe_max_size;
>>  	ret = proc_dointvec_minmax(table, write, buf, lenp, ppos);
>> -	if (ret < 0 || !write)
>> +	if (ret < 0 || !write) {
>> +		mutex_unlock(&pipe_max_mutex);
>>  		return ret;
>> +	}
>>  
>>  	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
>>  	if (rounded_pipe_max_size == 0) {
>>  		pipe_max_size = orig_pipe_max_size;
>> +		mutex_unlock(&pipe_max_mutex);
>>  		return -EINVAL;
>>  	}
>>  
>>  	pipe_max_size = rounded_pipe_max_size;
>> +	mutex_unlock(&pipe_max_mutex);
>> +
>>  	return ret;
>>  }
>>  
>> -- 
>> 1.8.3.1
>>
> I think this mutex is too heavy - if multiple processes simultaneously
> create a pipe, the mutex would cause performance degradation.
>
> You can call do_proc_dointvec with a custom callback "conv" function
> that does the rounding of the pipe size value.

Okay, that would consolidate the fiddling with pipe_max_size into a
single place / assignment.

Implementation wise, are you suggesting something like the following
(work in progress).  round_pipe_size() would still be used by
fs/pipe.c::pipe_set_size() so it would need to be visible to both source
files.  The proc_do... function would be pipe-max-size specific as it's
basically:

 - proc_douintvec_minmax, but
    - without the "max" check
    - with an added PAGE_SIZE floor
    - PAGE_SIZE increment

and exported for pipe.c to call.

Plumbing in the opposite direction (export do_proc_douintvec() and stick
the new conv implementation in fs/pipe.c) might be a little easier, but
seems like the same ick-factor in the end.

Regards,

-- Joe

-->8-- -->8-- -->8-- -->8--

diff --git a/fs/pipe.c b/fs/pipe.c
index 8cbc97d97753..4db3cd2d139c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1019,7 +1019,7 @@ static int fifo_open(struct inode *inode, struct
file *filp)
  * Currently we rely on the pipe array holding a power-of-2 number
  * of pages. Returns 0 on error.
  */
-static inline unsigned int round_pipe_size(unsigned int size)
+unsigned int round_pipe_size(unsigned int size)
 {
 	unsigned long nr_pages;

@@ -1130,19 +1130,7 @@ static long pipe_set_size(struct pipe_inode_info
*pipe, unsigned long arg)
 int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
 		 size_t *lenp, loff_t *ppos)
 {
-	unsigned int rounded_pipe_max_size;
-	int ret;
-
-	ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
-	if (ret < 0 || !write)
-		return ret;
-
-	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
-	if (rounded_pipe_max_size == 0)
-		return -EINVAL;
-
-	pipe_max_size = rounded_pipe_max_size;
-	return ret;
+	return proc_dopipe_max_size(table, write, buf, lenp, ppos);
 }

 /*
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index e7497c9dde7f..485cf7a7aa8f 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -190,5 +190,6 @@ static inline int pipe_buf_steal(struct
pipe_inode_info *pipe,
 struct pipe_inode_info *get_pipe_info(struct file *file);

 int create_pipe_files(struct file **, int);
+unsigned int round_pipe_size(unsigned int size);

 #endif
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 1d4dba490fb6..ba24ca72800c 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -50,6 +50,9 @@ extern int proc_dointvec_minmax(struct ctl_table *, int,
 extern int proc_douintvec_minmax(struct ctl_table *table, int write,
 				 void __user *buffer, size_t *lenp,
 				 loff_t *ppos);
+extern int proc_dopipe_max_size(struct ctl_table *table, int write,
+				void __user *buffer, size_t *lenp,
+				loff_t *ppos);
 extern int proc_dointvec_jiffies(struct ctl_table *, int,
 				 void __user *, size_t *, loff_t *);
 extern int proc_dointvec_userhz_jiffies(struct ctl_table *, int,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index c976719bf37a..7a2913c5546e 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -67,6 +67,7 @@
 #include <linux/kexec.h>
 #include <linux/bpf.h>
 #include <linux/mount.h>
+#include <linux/pipe_fs_i.h>

 #include <linux/uaccess.h>
 #include <asm/processor.h>
@@ -2631,6 +2632,47 @@ int proc_douintvec_minmax(struct ctl_table
*table, int write,
 				 do_proc_douintvec_minmax_conv, &param);
 }

+struct do_proc_dopipe_max_size_conv_param {
+	unsigned int *min;
+};
+
+static int do_proc_dopipe_max_size_conv(unsigned long *lvalp,
+					unsigned int *valp,
+					int write, void *data)
+{
+	struct do_proc_dopipe_max_size_conv_param *param = data;
+
+	if (write) {
+		unsigned int val = round_pipe_size(*lvalp);
+
+		if (val == 0)
+			return -EINVAL;
+
+		if (param->min && *param->min > val)
+			return -ERANGE;
+
+		if (*lvalp > UINT_MAX)
+			return -EINVAL;
+
+		*valp = val;
+	} else {
+		unsigned int val = *valp;
+		*lvalp = (unsigned long) val;
+	}
+
+	return 0;
+}
+
+int proc_dopipe_max_size(struct ctl_table *table, int write,
+			 void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct do_proc_dopipe_max_size_conv_param param = {
+		.min = (unsigned int *) table->extra1,
+	};
+	return do_proc_douintvec(table, write, buffer, lenp, ppos,
+				 do_proc_dopipe_max_size_conv, &param);
+}
+
 static void validate_coredump_safety(void)
 {
 #ifdef CONFIG_COREDUMP
@@ -3179,6 +3221,7 @@ int proc_doulongvec_ms_jiffies_minmax(struct
ctl_table *table, int write,
 EXPORT_SYMBOL(proc_dointvec_jiffies);
 EXPORT_SYMBOL(proc_dointvec_minmax);
 EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
+EXPORT_SYMBOL_GPL(proc_dopipe_max_size);
 EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
 EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
 EXPORT_SYMBOL(proc_dostring);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-19 21:32         ` Joe Lawrence
@ 2017-09-21 10:05           ` Mikulas Patocka
  0 siblings, 0 replies; 15+ messages in thread
From: Mikulas Patocka @ 2017-09-21 10:05 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk, Randy Dunlap



On Tue, 19 Sep 2017, Joe Lawrence wrote:

> On 09/19/2017 03:53 AM, Mikulas Patocka wrote:
> > On Fri, 15 Sep 2017, Joe Lawrence wrote:
> > [ ... snip ... ]
> >> Hi Mikulas,
> >>
> >> I'm not strong when it comes to memory barriers, but one of the
> >> side-effects of using the mutex is that pipe_set_size() and
> >> alloc_pipe_info() should have a consistent view of pipe_max_size.
> >>
> >> If I remove the mutex (and assume that I implement a custom
> >> do_proc_dointvec "conv" callback), is it safe for these routines to
> >> directly use pipe_max_size as they had done before?
> >>
> >> If not, is it safe to alias through a temporary stack variable (ie,
> >> could the compiler re-read pipe_max_size multiple times in the function)?
> >>
> >> Would READ_ONCE() help in any way?
> > 
> > Theoretically re-reading the variable is possible and you should use 
> > ACCESS_ONCE or READ_ONCE+WRITE_ONCE on that variable.
> > 
> > In practice, ACCESS_ONCE/READ_ONCE/WRITE_ONCE is missing at a lot of 
> > kernel variables that could be modified asynchronously and no one is 
> > complaining about it and no one is making any systematic effort to fix it.
> > 
> > That re-reading happens (I have some test code that makes the gcc 
> > optimizer re-read a variable), but it happens very rarely.
> 
> This would be interesting to look at if you are willing to share (can
> send offlist).

struct s {
        unsigned a, b, c, d;
};

unsigned fn(struct s *s)
{
        unsigned a = s->a;
        s->b = a;
        asm("nop":::"ebx","ecx","edx","esi","edi","ebp");
        s->c = a;
        return s->d;
}

This piece of code makes gcc read the variable s->a twice (although it is 
read only once in the source code). Compile it with -m32 -O2. The result 
is this:

00000000 <fn>:
   0:   55                      push   %ebp
   1:   57                      push   %edi
   2:   56                      push   %esi
   3:   53                      push   %ebx
   4:   8b 44 24 14             mov    0x14(%esp),%eax
   8:   8b 10                   mov    (%eax),%edx   <--- 1st load of s->a
   a:   89 50 04                mov    %edx,0x4(%eax)
   d:   90                      nop
   e:   8b 08                   mov    (%eax),%ecx   <--- 2nd load of s->a
  10:   89 48 08                mov    %ecx,0x8(%eax)
  13:   8b 40 0c                mov    0xc(%eax),%eax
  16:   5b                      pop    %ebx
  17:   5e                      pop    %esi
  18:   5f                      pop    %edi
  19:   5d                      pop    %ebp
  1a:   c3                      ret


> > Another theoretical problem is that when reading or writing a variable 
> > without ACCESS_ONCE, the compiler could read and write the variable using 
> > multiple smaller memory accesses. But in practice, it happens only on some 
> > non-common architectures.
> 
> Smaller access than word size?

Yes. The file Documentation/memory-barriers.txt says:
 (*) For aligned memory locations whose size allows them to be accessed
     with a single memory-reference instruction, prevents "load tearing"
     and "store tearing," in which a single large access is replaced by
     multiple smaller accesses.  For example, given an architecture having
     16-bit store instructions with 7-bit immediate fields, the compiler
     might be tempted to use two 16-bit store-immediate instructions to
     implement the following 32-bit store:

        p = 0x00010002;

     Please note that GCC really does use this sort of optimization,
     which is not surprising given that it would likely take more
     than two instructions to build the constant and then store it.

But it doesn't say on which architecture gcc does it.

Mikulas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex
  2017-09-19 21:47     ` Joe Lawrence
@ 2017-09-25 10:44       ` Mikulas Patocka
  0 siblings, 0 replies; 15+ messages in thread
From: Mikulas Patocka @ 2017-09-25 10:44 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: linux-kernel, linux-fsdevel, Alexander Viro, Luis R. Rodriguez,
	Kees Cook, Michael Kerrisk



On Tue, 19 Sep 2017, Joe Lawrence wrote:

> On 09/14/2017 07:09 PM, Mikulas Patocka wrote:
> >
> > I think this mutex is too heavy - if multiple processes simultaneously
> > create a pipe, the mutex would cause performance degradation.
> >
> > You can call do_proc_dointvec with a custom callback "conv" function
> > that does the rounding of the pipe size value.
> 
> Okay, that would consolidate the fiddling with pipe_max_size into a
> single place / assignment.
> 
> Implementation wise, are you suggesting something like the following
> (work in progress).  round_pipe_size() would still be used by
> fs/pipe.c::pipe_set_size() so it would need to be visible to both source
> files.  The proc_do... function would be pipe-max-size specific as it's
> basically:
> 
>  - proc_douintvec_minmax, but
>     - without the "max" check
>     - with an added PAGE_SIZE floor
>     - PAGE_SIZE increment
> 
> and exported for pipe.c to call.
> 
> Plumbing in the opposite direction (export do_proc_douintvec() and stick
> the new conv implementation in fs/pipe.c) might be a little easier, but
> seems like the same ick-factor in the end.
> 
> Regards,
> 
> -- Joe

Yes. I think this is correct.

Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>

> -->8-- -->8-- -->8-- -->8--
> 
> diff --git a/fs/pipe.c b/fs/pipe.c
> index 8cbc97d97753..4db3cd2d139c 100644
> --- a/fs/pipe.c
> +++ b/fs/pipe.c
> @@ -1019,7 +1019,7 @@ static int fifo_open(struct inode *inode, struct
> file *filp)
>   * Currently we rely on the pipe array holding a power-of-2 number
>   * of pages. Returns 0 on error.
>   */
> -static inline unsigned int round_pipe_size(unsigned int size)
> +unsigned int round_pipe_size(unsigned int size)
>  {
>  	unsigned long nr_pages;
> 
> @@ -1130,19 +1130,7 @@ static long pipe_set_size(struct pipe_inode_info
> *pipe, unsigned long arg)
>  int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf,
>  		 size_t *lenp, loff_t *ppos)
>  {
> -	unsigned int rounded_pipe_max_size;
> -	int ret;
> -
> -	ret = proc_douintvec_minmax(table, write, buf, lenp, ppos);
> -	if (ret < 0 || !write)
> -		return ret;
> -
> -	rounded_pipe_max_size = round_pipe_size(pipe_max_size);
> -	if (rounded_pipe_max_size == 0)
> -		return -EINVAL;
> -
> -	pipe_max_size = rounded_pipe_max_size;
> -	return ret;
> +	return proc_dopipe_max_size(table, write, buf, lenp, ppos);
>  }
> 
>  /*
> diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
> index e7497c9dde7f..485cf7a7aa8f 100644
> --- a/include/linux/pipe_fs_i.h
> +++ b/include/linux/pipe_fs_i.h
> @@ -190,5 +190,6 @@ static inline int pipe_buf_steal(struct
> pipe_inode_info *pipe,
>  struct pipe_inode_info *get_pipe_info(struct file *file);
> 
>  int create_pipe_files(struct file **, int);
> +unsigned int round_pipe_size(unsigned int size);
> 
>  #endif
> diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
> index 1d4dba490fb6..ba24ca72800c 100644
> --- a/include/linux/sysctl.h
> +++ b/include/linux/sysctl.h
> @@ -50,6 +50,9 @@ extern int proc_dointvec_minmax(struct ctl_table *, int,
>  extern int proc_douintvec_minmax(struct ctl_table *table, int write,
>  				 void __user *buffer, size_t *lenp,
>  				 loff_t *ppos);
> +extern int proc_dopipe_max_size(struct ctl_table *table, int write,
> +				void __user *buffer, size_t *lenp,
> +				loff_t *ppos);
>  extern int proc_dointvec_jiffies(struct ctl_table *, int,
>  				 void __user *, size_t *, loff_t *);
>  extern int proc_dointvec_userhz_jiffies(struct ctl_table *, int,
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index c976719bf37a..7a2913c5546e 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -67,6 +67,7 @@
>  #include <linux/kexec.h>
>  #include <linux/bpf.h>
>  #include <linux/mount.h>
> +#include <linux/pipe_fs_i.h>
> 
>  #include <linux/uaccess.h>
>  #include <asm/processor.h>
> @@ -2631,6 +2632,47 @@ int proc_douintvec_minmax(struct ctl_table
> *table, int write,
>  				 do_proc_douintvec_minmax_conv, &param);
>  }
> 
> +struct do_proc_dopipe_max_size_conv_param {
> +	unsigned int *min;
> +};
> +
> +static int do_proc_dopipe_max_size_conv(unsigned long *lvalp,
> +					unsigned int *valp,
> +					int write, void *data)
> +{
> +	struct do_proc_dopipe_max_size_conv_param *param = data;
> +
> +	if (write) {
> +		unsigned int val = round_pipe_size(*lvalp);
> +
> +		if (val == 0)
> +			return -EINVAL;
> +
> +		if (param->min && *param->min > val)
> +			return -ERANGE;
> +
> +		if (*lvalp > UINT_MAX)
> +			return -EINVAL;
> +
> +		*valp = val;
> +	} else {
> +		unsigned int val = *valp;
> +		*lvalp = (unsigned long) val;
> +	}
> +
> +	return 0;
> +}
> +
> +int proc_dopipe_max_size(struct ctl_table *table, int write,
> +			 void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +	struct do_proc_dopipe_max_size_conv_param param = {
> +		.min = (unsigned int *) table->extra1,
> +	};
> +	return do_proc_douintvec(table, write, buffer, lenp, ppos,
> +				 do_proc_dopipe_max_size_conv, &param);
> +}
> +
>  static void validate_coredump_safety(void)
>  {
>  #ifdef CONFIG_COREDUMP
> @@ -3179,6 +3221,7 @@ int proc_doulongvec_ms_jiffies_minmax(struct
> ctl_table *table, int write,
>  EXPORT_SYMBOL(proc_dointvec_jiffies);
>  EXPORT_SYMBOL(proc_dointvec_minmax);
>  EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
> +EXPORT_SYMBOL_GPL(proc_dopipe_max_size);
>  EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
>  EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
>  EXPORT_SYMBOL(proc_dostring);
> -- 
> 1.8.3.1
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-25 10:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-05 14:44 [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Joe Lawrence
2017-09-05 14:44 ` [PATCH RFC 1/3] pipe: avoid round_pipe_size() nr_pages overflow on 32-bit Joe Lawrence
2017-09-05 14:44 ` [PATCH RFC 2/3] pipe: protect pipe_max_size access with a mutex Joe Lawrence
2017-09-14 23:09   ` Mikulas Patocka
2017-09-15 14:08     ` Joe Lawrence
2017-09-19  7:53       ` Mikulas Patocka
2017-09-19 21:32         ` Joe Lawrence
2017-09-21 10:05           ` Mikulas Patocka
2017-09-19 21:47     ` Joe Lawrence
2017-09-25 10:44       ` Mikulas Patocka
2017-09-05 14:44 ` [PATCH RFC 3/3] pipe: match pipe_max_size data type with procfs Joe Lawrence
2017-09-14 13:26 ` [PATCH RFC 0/3] A few round_pipe_size() and pipe-max-size fixups Michael Kerrisk (man-pages)
2017-09-14 16:57   ` Randy Dunlap
2017-09-14 19:19     ` Joe Lawrence
2017-09-14 20:22       ` Randy Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).