Linux Security Modules development

Linux Security Modules development
 help / color / mirror / Atom feed

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christian Borntraeger @ 2020-06-24 18:37 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <4d8fbcea-a892-3453-091f-d57c03f9aa90@de.ibm.com>



On 24.06.20 20:32, Christian Borntraeger wrote:
[...]> 
> So the translations look correct. But your change is actually a sematic change
> if(ret) will only trigger if there is an error
> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
> and we did not do it before. 
> 

So the right fix is

diff --git a/kernel/umh.c b/kernel/umh.c
index f81e8698e36e..a3a3196e84d1 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
                 * the real error code is already in sub_info->retval or
                 * sub_info->retval is 0 anyway, so don't mess with it then.
                 */
-               if (KWIFEXITED(ret))
+               if (KWEXITSTATUS(ret))
                        sub_info->retval = KWEXITSTATUS(ret);
        }
 


I think.

^ permalink raw reply related

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christian Borntraeger @ 2020-06-24 18:32 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <4e27098e-ac8d-98f0-3a9a-ea25242e24ec@de.ibm.com>



On 24.06.20 20:09, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 19:58, Christian Borntraeger wrote:
>>
>>
>> On 24.06.20 18:09, Luis Chamberlain wrote:
>>> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
>>>>
>>>>
>>>> On 24.06.20 16:43, Christoph Hellwig wrote:
>>>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>>>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>>>>>> linux-bridge on s390?
>>>>>
>>>>> Are we even sure this is s390 specific and doesn't happen on other
>>>>> architectures with the same bridge setup?
>>>>
>>>> Fair point. AFAIK nobody has tested this yet on x86.
>>>
>>> Regardless, can you enable dynamic debug prints, to see if the kernel
>>> reveals anything on the bridge code which may be relevant:
>>>
>>> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control
>>>
>>>   Luis
>>
>> When I start a guest the following happens with the patch:
>>
>> [   47.420237] virbr0: port 2(vnet0) entered blocking state
>> [   47.420242] virbr0: port 2(vnet0) entered disabled state
>> [   47.420315] device vnet0 entered promiscuous mode
>> [   47.420365] virbr0: port 2(vnet0) event 16
>> [   47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   47.420373] virbr0: toggle option: 12 state: 0 -> 0
>> [   47.420536] virbr0: port 2(vnet0) entered blocking state
>> [   47.420538] virbr0: port 2(vnet0) event 16
>> [   47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0
>>
>> and the nothing happens.
>>
>>
>> without the patch
>> [   33.805410] virbr0: hello timer expired
>> [   35.805413] virbr0: hello timer expired
>> [   36.184349] virbr0: port 2(vnet0) entered blocking state
>> [   36.184353] virbr0: port 2(vnet0) entered disabled state
>> [   36.184427] device vnet0 entered promiscuous mode
>> [   36.184479] virbr0: port 2(vnet0) event 16
>> [   36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   36.184487] virbr0: toggle option: 12 state: 0 -> 0
>> [   36.184636] virbr0: port 2(vnet0) entered blocking state
>> [   36.184638] virbr0: port 2(vnet0) entered listening state
>> [   36.184639] virbr0: port 2(vnet0) event 16
>> [   36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   36.184645] virbr0: port 2(vnet0) event 16
>> [   36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   37.805478] virbr0: hello timer expired
>> [   38.205413] virbr0: port 2(vnet0) forward delay timer
>> [   38.205414] virbr0: port 2(vnet0) entered learning state
>> [   38.205427] virbr0: port 2(vnet0) event 16
>> [   38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   38.765414] virbr0: port 2(vnet0) hold timer expired
>> [   39.805415] virbr0: hello timer expired
>> [   40.285410] virbr0: port 2(vnet0) forward delay timer
>> [   40.285411] virbr0: port 2(vnet0) entered forwarding state
>> [   40.285418] virbr0: topology change detected, propagating
>> [   40.285420] virbr0: decreasing ageing time to 400
>> [   40.285427] virbr0: port 2(vnet0) event 16
>> [   40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0
>> [   40.765408] virbr0: port 2(vnet0) hold timer expired
>> [   41.805415] virbr0: hello timer expired
>> [   42.765426] virbr0: port 2(vnet0) hold timer expired
>> [   43.805425] virbr0: hello timer expired
>> [   44.765426] virbr0: port 2(vnet0) hold timer expired
>> [   45.805418] virbr0: hello timer expired
>>
>> and continuing....
> 
> Just reverting the umh.c parts like this makes the problem go away.
> 
> diff --git a/kernel/umh.c b/kernel/umh.c
> index f81e8698e36e..79f139a7ca03 100644
> --- a/kernel/umh.c
> +++ b/kernel/umh.c
> @@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
>                  * the real error code is already in sub_info->retval or
>                  * sub_info->retval is 0 anyway, so don't mess with it then.
>                  */
> -               if (KWIFEXITED(ret))
> -                       sub_info->retval = KWEXITSTATUS(ret);
> +               if (ret)
> +                       sub_info->retval = ret;
>         }
>  
>         /* Restore default kernel sig handler */

I instrumented this:

[    5.118528] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[    9.409235] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.114914] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.116308] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.117690] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.118732] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.119899] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.121365] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 
[   10.128044] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.945814] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.962799] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.983755] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   10.992705] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.118877] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.124359] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.129364] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.142139] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.148385] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.153686] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.158861] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.164068] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.192445] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.200605] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.208493] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.216612] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.226467] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.525363] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.532758] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.535279] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.697660] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.701634] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.818652] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   11.836228] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   12.082266] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 
[   40.965898] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 

So the translations look correct. But your change is actually a sematic change
if(ret) will only trigger if there is an error
if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD
and we did not do it before. 

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christian Borntraeger @ 2020-06-24 18:09 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <ea41e2a9-61f7-aec1-79e5-7b08b6dd5119@de.ibm.com>



On 24.06.20 19:58, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 18:09, Luis Chamberlain wrote:
>> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
>>>
>>>
>>> On 24.06.20 16:43, Christoph Hellwig wrote:
>>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>>>>> linux-bridge on s390?
>>>>
>>>> Are we even sure this is s390 specific and doesn't happen on other
>>>> architectures with the same bridge setup?
>>>
>>> Fair point. AFAIK nobody has tested this yet on x86.
>>
>> Regardless, can you enable dynamic debug prints, to see if the kernel
>> reveals anything on the bridge code which may be relevant:
>>
>> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control
>>
>>   Luis
> 
> When I start a guest the following happens with the patch:
> 
> [   47.420237] virbr0: port 2(vnet0) entered blocking state
> [   47.420242] virbr0: port 2(vnet0) entered disabled state
> [   47.420315] device vnet0 entered promiscuous mode
> [   47.420365] virbr0: port 2(vnet0) event 16
> [   47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   47.420373] virbr0: toggle option: 12 state: 0 -> 0
> [   47.420536] virbr0: port 2(vnet0) entered blocking state
> [   47.420538] virbr0: port 2(vnet0) event 16
> [   47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0
> 
> and the nothing happens.
> 
> 
> without the patch
> [   33.805410] virbr0: hello timer expired
> [   35.805413] virbr0: hello timer expired
> [   36.184349] virbr0: port 2(vnet0) entered blocking state
> [   36.184353] virbr0: port 2(vnet0) entered disabled state
> [   36.184427] device vnet0 entered promiscuous mode
> [   36.184479] virbr0: port 2(vnet0) event 16
> [   36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   36.184487] virbr0: toggle option: 12 state: 0 -> 0
> [   36.184636] virbr0: port 2(vnet0) entered blocking state
> [   36.184638] virbr0: port 2(vnet0) entered listening state
> [   36.184639] virbr0: port 2(vnet0) event 16
> [   36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   36.184645] virbr0: port 2(vnet0) event 16
> [   36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   37.805478] virbr0: hello timer expired
> [   38.205413] virbr0: port 2(vnet0) forward delay timer
> [   38.205414] virbr0: port 2(vnet0) entered learning state
> [   38.205427] virbr0: port 2(vnet0) event 16
> [   38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   38.765414] virbr0: port 2(vnet0) hold timer expired
> [   39.805415] virbr0: hello timer expired
> [   40.285410] virbr0: port 2(vnet0) forward delay timer
> [   40.285411] virbr0: port 2(vnet0) entered forwarding state
> [   40.285418] virbr0: topology change detected, propagating
> [   40.285420] virbr0: decreasing ageing time to 400
> [   40.285427] virbr0: port 2(vnet0) event 16
> [   40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0
> [   40.765408] virbr0: port 2(vnet0) hold timer expired
> [   41.805415] virbr0: hello timer expired
> [   42.765426] virbr0: port 2(vnet0) hold timer expired
> [   43.805425] virbr0: hello timer expired
> [   44.765426] virbr0: port 2(vnet0) hold timer expired
> [   45.805418] virbr0: hello timer expired
> 
> and continuing....

Just reverting the umh.c parts like this makes the problem go away.

diff --git a/kernel/umh.c b/kernel/umh.c
index f81e8698e36e..79f139a7ca03 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info)
                 * the real error code is already in sub_info->retval or
                 * sub_info->retval is 0 anyway, so don't mess with it then.
                 */
-               if (KWIFEXITED(ret))
-                       sub_info->retval = KWEXITSTATUS(ret);
+               if (ret)
+                       sub_info->retval = ret;
        }
 
        /* Restore default kernel sig handler */



^ permalink raw reply related

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christian Borntraeger @ 2020-06-24 17:58 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <20200624160953.GH4332@42.do-not-panic.com>



On 24.06.20 18:09, Luis Chamberlain wrote:
> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
>>
>>
>> On 24.06.20 16:43, Christoph Hellwig wrote:
>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>>>> linux-bridge on s390?
>>>
>>> Are we even sure this is s390 specific and doesn't happen on other
>>> architectures with the same bridge setup?
>>
>> Fair point. AFAIK nobody has tested this yet on x86.
> 
> Regardless, can you enable dynamic debug prints, to see if the kernel
> reveals anything on the bridge code which may be relevant:
> 
> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control
> 
>   Luis

When I start a guest the following happens with the patch:

[   47.420237] virbr0: port 2(vnet0) entered blocking state
[   47.420242] virbr0: port 2(vnet0) entered disabled state
[   47.420315] device vnet0 entered promiscuous mode
[   47.420365] virbr0: port 2(vnet0) event 16
[   47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   47.420373] virbr0: toggle option: 12 state: 0 -> 0
[   47.420536] virbr0: port 2(vnet0) entered blocking state
[   47.420538] virbr0: port 2(vnet0) event 16
[   47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0

and the nothing happens.


without the patch
[   33.805410] virbr0: hello timer expired
[   35.805413] virbr0: hello timer expired
[   36.184349] virbr0: port 2(vnet0) entered blocking state
[   36.184353] virbr0: port 2(vnet0) entered disabled state
[   36.184427] device vnet0 entered promiscuous mode
[   36.184479] virbr0: port 2(vnet0) event 16
[   36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   36.184487] virbr0: toggle option: 12 state: 0 -> 0
[   36.184636] virbr0: port 2(vnet0) entered blocking state
[   36.184638] virbr0: port 2(vnet0) entered listening state
[   36.184639] virbr0: port 2(vnet0) event 16
[   36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   36.184645] virbr0: port 2(vnet0) event 16
[   36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   37.805478] virbr0: hello timer expired
[   38.205413] virbr0: port 2(vnet0) forward delay timer
[   38.205414] virbr0: port 2(vnet0) entered learning state
[   38.205427] virbr0: port 2(vnet0) event 16
[   38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   38.765414] virbr0: port 2(vnet0) hold timer expired
[   39.805415] virbr0: hello timer expired
[   40.285410] virbr0: port 2(vnet0) forward delay timer
[   40.285411] virbr0: port 2(vnet0) entered forwarding state
[   40.285418] virbr0: topology change detected, propagating
[   40.285420] virbr0: decreasing ageing time to 400
[   40.285427] virbr0: port 2(vnet0) event 16
[   40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0
[   40.765408] virbr0: port 2(vnet0) hold timer expired
[   41.805415] virbr0: hello timer expired
[   42.765426] virbr0: port 2(vnet0) hold timer expired
[   43.805425] virbr0: hello timer expired
[   44.765426] virbr0: port 2(vnet0) hold timer expired
[   45.805418] virbr0: hello timer expired

and continuing....





^ permalink raw reply

* Re: [RFC][PATCH] net/bpfilter: Remove this broken and apparently unmantained
From: Alexei Starovoitov @ 2020-06-24 17:54 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: Tetsuo Handa, linux-security-module, Eric W. Biederman,
	Linus Torvalds, Kees Cook, Andrew Morton, Alexei Starovoitov,
	David Miller, Al Viro, bpf, linux-fsdevel, Daniel Borkmann,
	Jakub Kicinski, Masahiro Yamada, Gary Lin, Bruno Meneguele
In-Reply-To: <748ef005-7f64-ab9b-c767-c617ec995df4@schaufler-ca.com>

On Wed, Jun 24, 2020 at 08:41:37AM -0700, Casey Schaufler wrote:
> On 6/24/2020 12:05 AM, Tetsuo Handa wrote:
> > Forwarding to LSM-ML again. Any comments?
> 
> Hey, BPF folks - you *really* need to do better about keeping the LSM
> community in the loop when you're discussing LSM issues. 
> 
> >
> > On 2020/06/24 15:39, Alexei Starovoitov wrote:
> >> On Wed, Jun 24, 2020 at 01:58:33PM +0900, Tetsuo Handa wrote:
> >>> On 2020/06/24 13:00, Alexei Starovoitov wrote:
> >>>>> However, regarding usermode_blob, although the byte array (which contains code / data)
> >>>>> might be initially loaded from the kernel space (which is protected), that byte array
> >>>>> is no longer protected (e.g. SIGKILL, strace()) when executed because they are placed
> >>>>> in the user address space. Thus, LSM modules (including pathname based security) want
> >>>>> to control how that byte array can behave.
> >>>> It's privileged memory regardless. root can poke into kernel or any process memory.
> >>> LSM is there to restrict processes running as "root".
> >> hmm. do you really mean that it's possible for an LSM to restrict CAP_SYS_ADMIN effectively?
> 
> I think that SELinux works hard to do just that. SELinux implements it's own
> privilege model that is tangential to the capabilities model.

of course. no argument here.

> More directly, it is simple to create a security module to provide finer privilege
> granularity than capabilities. I have one lurking in a source tree, and I would
> be surprised if it's the only one waiting for the next round of LSM stacking.

no one is arguing with that either.

> 
> >> LSM can certainly provide extra level of foolproof-ness against accidental
> >> mistakes, but it's not a security boundary.
> 
> Gasp! Them's fight'n words. How do you justify such an outrageous claim?

.. for root user processes.
What's outrageous about that?
Did you capture the context or just replying to few sentences out of the context?

^ permalink raw reply

* clean up kernel_{read,write} & friends v5
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel

Hi Al,

this series fixes a few issues and cleans up the helpers that read from
or write to kernel space buffers, and ensures that we don't change the
address limit if we are using the ->read_iter and ->write_iter methods
that don't need the changed address limit.

I did not add your suggested comments on the instances using
uaccess_kernel as all of them already have comments.  If you have
anything better in mind feel free to throw in additional comments.


Changes since v4:
 - warn on calling __kernel_write on files not open for write
 - add a FMODE_READ check and warning in __kernel_read
 - add a new patch to remove kernel_readv
 - stop preferring the iter variants if normal read/write is
   present

Changes since v3:
 - keep call_read_iter/call_write_iter for now
 - don't modify an existing long line
 - update a change log

Changes since v2:
 - picked up a few ACKs

Changes since v1:
 - __kernel_write must not take sb_writers
 - unexport __kernel_write

Diffstat:
 fs/autofs/waitq.c            |    2 
 fs/cachefiles/rdwr.c         |    2 
 fs/read_write.c              |  171 ++++++++++++++++++++++++++-----------------
 fs/splice.c                  |   53 +++----------
 include/linux/fs.h           |    4 -
 net/bpfilter/bpfilter_kern.c |    2 
 security/integrity/iint.c    |   14 ---
 7 files changed, 125 insertions(+), 123 deletions(-)

^ permalink raw reply

* [PATCH 04/14] fs: unexport __kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

This is a very special interface that skips sb_writes protection, and not
used by modules anymore.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index bbfa9b12b15eb7..2c601d853ff3d8 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -522,7 +522,6 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	inc_syscw(current);
 	return ret;
 }
-EXPORT_SYMBOL(__kernel_write);
 
 ssize_t kernel_write(struct file *file, const void *buf, size_t count,
 			    loff_t *pos)
-- 
2.26.2


^ permalink raw reply related

* [PATCH 05/14] fs: check FMODE_WRITE in __kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

Add a WARN_ON_ONCE if the file isn't actually open for write.  This
matches the check done in vfs_write, but actually warn warns as a
kernel user calling write on a file not opened for writing is a pretty
obvious programming error.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index 2c601d853ff3d8..8f9fc05990ae8b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -505,6 +505,8 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	const char __user *p;
 	ssize_t ret;
 
+	if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE)))
+		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 06/14] fs: implement kernel_write using __kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

Consolidate the two in-kernel write helpers to make upcoming changes
easier.  The only difference are the missing call to rw_verify_area
in kernel_write, and an access_ok check that doesn't make sense for
kernel buffers to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 8f9fc05990ae8b..5110cd1e6e2771 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -499,6 +499,7 @@ static ssize_t __vfs_write(struct file *file, const char __user *p,
 		return -EINVAL;
 }
 
+/* caller is responsible for file_start_write/file_end_write */
 ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
@@ -528,16 +529,16 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 ssize_t kernel_write(struct file *file, const void *buf, size_t count,
 			    loff_t *pos)
 {
-	mm_segment_t old_fs;
-	ssize_t res;
+	ssize_t ret;
 
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_write(file, (__force const char __user *)buf, count, pos);
-	set_fs(old_fs);
+	ret = rw_verify_area(WRITE, file, pos, count);
+	if (ret)
+		return ret;
 
-	return res;
+	file_start_write(file);
+	ret =  __kernel_write(file, buf, count, pos);
+	file_end_write(file);
+	return ret;
 }
 EXPORT_SYMBOL(kernel_write);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 07/14] fs: remove __vfs_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 46 ++++++++++++++++++++++------------------------
 1 file changed, 22 insertions(+), 24 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 5110cd1e6e2771..96e8e354f99b45 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -488,17 +488,6 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 	return ret;
 }
 
-static ssize_t __vfs_write(struct file *file, const char __user *p,
-			   size_t count, loff_t *pos)
-{
-	if (file->f_op->write)
-		return file->f_op->write(file, p, count, pos);
-	else if (file->f_op->write_iter)
-		return new_sync_write(file, p, count, pos);
-	else
-		return -EINVAL;
-}
-
 /* caller is responsible for file_start_write/file_end_write */
 ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
 {
@@ -516,7 +505,12 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	p = (__force const char __user *)buf;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	ret = __vfs_write(file, p, count, pos);
+	if (file->f_op->write)
+		ret = file->f_op->write(file, p, count, pos);
+	else if (file->f_op->write_iter)
+		ret = new_sync_write(file, p, count, pos);
+	else
+		ret = -EINVAL;
 	set_fs(old_fs);
 	if (ret > 0) {
 		fsnotify_modify(file);
@@ -554,19 +548,23 @@ ssize_t vfs_write(struct file *file, const char __user *buf, size_t count, loff_
 		return -EFAULT;
 
 	ret = rw_verify_area(WRITE, file, pos, count);
-	if (!ret) {
-		if (count > MAX_RW_COUNT)
-			count =  MAX_RW_COUNT;
-		file_start_write(file);
-		ret = __vfs_write(file, buf, count, pos);
-		if (ret > 0) {
-			fsnotify_modify(file);
-			add_wchar(current, ret);
-		}
-		inc_syscw(current);
-		file_end_write(file);
+	if (ret)
+		return ret;
+	if (count > MAX_RW_COUNT)
+		count =  MAX_RW_COUNT;
+	file_start_write(file);
+	if (file->f_op->write)
+		ret = file->f_op->write(file, buf, count, pos);
+	else if (file->f_op->write_iter)
+		ret = new_sync_write(file, buf, count, pos);
+	else
+		ret = -EINVAL;
+	if (ret > 0) {
+		fsnotify_modify(file);
+		add_wchar(current, ret);
 	}
-
+	inc_syscw(current);
+	file_end_write(file);
 	return ret;
 }
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 10/14] integrity/ima: switch to using __kernel_read
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

__kernel_read has a bunch of additional sanity checks, and this moves
the set_fs out of non-core code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 security/integrity/iint.c | 14 +-------------
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/security/integrity/iint.c b/security/integrity/iint.c
index e12c4900510f60..1d20003243c3fb 100644
--- a/security/integrity/iint.c
+++ b/security/integrity/iint.c
@@ -188,19 +188,7 @@ DEFINE_LSM(integrity) = {
 int integrity_kernel_read(struct file *file, loff_t offset,
 			  void *addr, unsigned long count)
 {
-	mm_segment_t old_fs;
-	char __user *buf = (char __user *)addr;
-	ssize_t ret;
-
-	if (!(file->f_mode & FMODE_READ))
-		return -EBADF;
-
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	ret = __vfs_read(file, buf, count, &offset);
-	set_fs(old_fs);
-
-	return ret;
+	return __kernel_read(file, addr, count, &offset);
 }
 
 /*
-- 
2.26.2


^ permalink raw reply related

* [PATCH 08/14] fs: don't change the address limit for ->write_iter in __kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

If we write to a file that implements ->write_iter there is no need
to change the address limit if we send a kvec down.  Implement that
case, and prefer it over using plain ->write with a changed address
limit if available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 96e8e354f99b45..bd46c959799e97 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -489,10 +489,9 @@ static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t
 }
 
 /* caller is responsible for file_start_write/file_end_write */
-ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t *pos)
+ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
+		loff_t *pos)
 {
-	mm_segment_t old_fs;
-	const char __user *p;
 	ssize_t ret;
 
 	if (WARN_ON_ONCE(!(file->f_mode & FMODE_WRITE)))
@@ -500,18 +499,29 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count, loff_t
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
 
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	p = (__force const char __user *)buf;
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	if (file->f_op->write)
-		ret = file->f_op->write(file, p, count, pos);
-	else if (file->f_op->write_iter)
-		ret = new_sync_write(file, p, count, pos);
-	else
+	if (file->f_op->write_iter) {
+		struct kvec iov = { .iov_base = (void *)buf, .iov_len = count };
+		struct kiocb kiocb;
+		struct iov_iter iter;
+
+		init_sync_kiocb(&kiocb, file);
+		kiocb.ki_pos = *pos;
+		iov_iter_kvec(&iter, WRITE, &iov, 1, count);
+		ret = file->f_op->write_iter(&kiocb, &iter);
+		if (ret > 0)
+			*pos = kiocb.ki_pos;
+	} else if (file->f_op->write) {
+		mm_segment_t old_fs = get_fs();
+
+		set_fs(KERNEL_DS);
+		ret = file->f_op->write(file, (__force const char __user *)buf,
+				count, pos);
+		set_fs(old_fs);
+	} else {
 		ret = -EINVAL;
-	set_fs(old_fs);
+	}
 	if (ret > 0) {
 		fsnotify_modify(file);
 		add_wchar(current, ret);
-- 
2.26.2


^ permalink raw reply related

* [PATCH 11/14] fs: implement kernel_read using __kernel_read
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

Consolidate the two in-kernel read helpers to make upcoming changes
easier.  The only difference are the missing call to rw_verify_area
in kernel_read, and an access_ok check that doesn't make sense for
kernel buffers to start with.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index cc8e0b4f3cd697..a0a0b5d1d9249c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -455,15 +455,12 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 
 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
-	mm_segment_t old_fs;
-	ssize_t result;
+	ssize_t ret;
 
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/* The cast to a user pointer is valid due to the set_fs() */
-	result = vfs_read(file, (void __user *)buf, count, pos);
-	set_fs(old_fs);
-	return result;
+	ret = rw_verify_area(READ, file, pos, count);
+	if (ret)
+		return ret;
+	return __kernel_read(file, buf, count, pos);
 }
 EXPORT_SYMBOL(kernel_read);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 14/14] fs: don't change the address limit for ->read_iter in __kernel_read
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

If we read to a file that implements ->read_iter there is no need
to change the address limit if we send a kvec down.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c | 40 +++++++++++++++++++++++++---------------
 1 file changed, 25 insertions(+), 15 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 1c41c25e548d8c..e7f36b15683049 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -421,7 +421,6 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
-	mm_segment_t old_fs = get_fs();
 	ssize_t ret;
 
 	if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
@@ -431,14 +430,25 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	set_fs(KERNEL_DS);
-	if (file->f_op->read)
+	if (file->f_op->read) {
+		mm_segment_t old_fs = get_fs();
+
+		set_fs(KERNEL_DS);
 		ret = file->f_op->read(file, (void __user *)buf, count, pos);
-	else if (file->f_op->read_iter)
-		ret = new_sync_read(file, (void __user *)buf, count, pos);
-	else
+		set_fs(old_fs);
+	} else if (file->f_op->read_iter) {
+		struct kvec iov = { .iov_base = buf, .iov_len = count };
+		struct kiocb kiocb;
+		struct iov_iter iter;
+
+		init_sync_kiocb(&kiocb, file);
+		kiocb.ki_pos = *pos;
+		iov_iter_kvec(&iter, READ, &iov, 1, count);
+		ret = file->f_op->read_iter(&kiocb, &iter);
+		*pos = kiocb.ki_pos;
+	} else {
 		ret = -EINVAL;
-	set_fs(old_fs);
+	}
 	if (ret > 0) {
 		fsnotify_access(file);
 		add_rchar(current, ret);
@@ -520,7 +530,14 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
-	if (file->f_op->write_iter) {
+	if (file->f_op->write) {
+		mm_segment_t old_fs = get_fs();
+
+		set_fs(KERNEL_DS);
+		ret = file->f_op->write(file, (__force const char __user *)buf,
+				count, pos);
+		set_fs(old_fs);
+	} else if (file->f_op->write_iter) {
 		struct kvec iov = { .iov_base = (void *)buf, .iov_len = count };
 		struct kiocb kiocb;
 		struct iov_iter iter;
@@ -531,13 +548,6 @@ ssize_t __kernel_write(struct file *file, const void *buf, size_t count,
 		ret = file->f_op->write_iter(&kiocb, &iter);
 		if (ret > 0)
 			*pos = kiocb.ki_pos;
-	} else if (file->f_op->write) {
-		mm_segment_t old_fs = get_fs();
-
-		set_fs(KERNEL_DS);
-		ret = file->f_op->write(file, (__force const char __user *)buf,
-				count, pos);
-		set_fs(old_fs);
 	} else {
 		ret = -EINVAL;
 	}
-- 
2.26.2


^ permalink raw reply related

* [PATCH 09/14] fs: add a __kernel_read helper
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

This is the counterpart to __kernel_write, and skip the rw_verify_area
call compared to kernel_read.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    | 23 +++++++++++++++++++++++
 include/linux/fs.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/fs/read_write.c b/fs/read_write.c
index bd46c959799e97..cc8e0b4f3cd697 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -430,6 +430,29 @@ ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
 		return -EINVAL;
 }
 
+ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
+{
+	mm_segment_t old_fs = get_fs();
+	ssize_t ret;
+
+	if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ)))
+		return -EINVAL;
+	if (!(file->f_mode & FMODE_CAN_READ))
+		return -EINVAL;
+
+	if (count > MAX_RW_COUNT)
+		count =  MAX_RW_COUNT;
+	set_fs(KERNEL_DS);
+	ret = __vfs_read(file, (void __user *)buf, count, pos);
+	set_fs(old_fs);
+	if (ret > 0) {
+		fsnotify_access(file);
+		add_rchar(current, ret);
+	}
+	inc_syscr(current);
+	return ret;
+}
+
 ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea746..22cbe7b2e91994 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3033,6 +3033,7 @@ extern int kernel_read_file_from_path_initns(const char *, void **, loff_t *, lo
 extern int kernel_read_file_from_fd(int, void **, loff_t *, loff_t,
 				    enum kernel_read_file_id);
 extern ssize_t kernel_read(struct file *, void *, size_t, loff_t *);
+ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos);
 extern ssize_t kernel_write(struct file *, const void *, size_t, loff_t *);
 extern ssize_t __kernel_write(struct file *, const void *, size_t, loff_t *);
 extern struct file * open_exec(const char *);
-- 
2.26.2


^ permalink raw reply related

* [PATCH 12/14] fs: remove __vfs_read
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

Fold it into the two callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    | 43 +++++++++++++++++++++----------------------
 include/linux/fs.h |  1 -
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index a0a0b5d1d9249c..6a2170eaee64f9 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -419,17 +419,6 @@ static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, lo
 	return ret;
 }
 
-ssize_t __vfs_read(struct file *file, char __user *buf, size_t count,
-		   loff_t *pos)
-{
-	if (file->f_op->read)
-		return file->f_op->read(file, buf, count, pos);
-	else if (file->f_op->read_iter)
-		return new_sync_read(file, buf, count, pos);
-	else
-		return -EINVAL;
-}
-
 ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 {
 	mm_segment_t old_fs = get_fs();
@@ -443,7 +432,12 @@ ssize_t __kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
 	if (count > MAX_RW_COUNT)
 		count =  MAX_RW_COUNT;
 	set_fs(KERNEL_DS);
-	ret = __vfs_read(file, (void __user *)buf, count, pos);
+	if (file->f_op->read)
+		ret = file->f_op->read(file, (void __user *)buf, count, pos);
+	else if (file->f_op->read_iter)
+		ret = new_sync_read(file, (void __user *)buf, count, pos);
+	else
+		ret = -EINVAL;
 	set_fs(old_fs);
 	if (ret > 0) {
 		fsnotify_access(file);
@@ -476,17 +470,22 @@ ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
 		return -EFAULT;
 
 	ret = rw_verify_area(READ, file, pos, count);
-	if (!ret) {
-		if (count > MAX_RW_COUNT)
-			count =  MAX_RW_COUNT;
-		ret = __vfs_read(file, buf, count, pos);
-		if (ret > 0) {
-			fsnotify_access(file);
-			add_rchar(current, ret);
-		}
-		inc_syscr(current);
-	}
+	if (ret)
+		return ret;
+	if (count > MAX_RW_COUNT)
+		count =  MAX_RW_COUNT;
 
+	if (file->f_op->read)
+		ret = file->f_op->read(file, buf, count, pos);
+	else if (file->f_op->read_iter)
+		ret = new_sync_read(file, buf, count, pos);
+	else
+		ret = -EINVAL;
+	if (ret > 0) {
+		fsnotify_access(file);
+		add_rchar(current, ret);
+	}
+	inc_syscr(current);
 	return ret;
 }
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 22cbe7b2e91994..0c0ec76b600b50 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1917,7 +1917,6 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 			      struct iovec *fast_pointer,
 			      struct iovec **ret_pointer);
 
-extern ssize_t __vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
 extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
-- 
2.26.2


^ permalink raw reply related

* [PATCH 13/14] fs: implement default_file_splice_read using __kernel_read
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

default_file_splice_read goes through great lenght to create an
iovec array and iov_iter for all the reads, but is a helper only
useful for files not implementing ->read_iter as we have the much
better generic_file_splice_read version available for those.  Remove
the iters and just call __kernel_read in a loop instead.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/read_write.c    |  2 +-
 fs/splice.c        | 53 +++++++++++++---------------------------------
 include/linux/fs.h |  2 --
 3 files changed, 16 insertions(+), 41 deletions(-)

diff --git a/fs/read_write.c b/fs/read_write.c
index 6a2170eaee64f9..1c41c25e548d8c 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1070,7 +1070,7 @@ ssize_t vfs_iter_write(struct file *file, struct iov_iter *iter, loff_t *ppos,
 }
 EXPORT_SYMBOL(vfs_iter_write);
 
-ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
+static ssize_t vfs_readv(struct file *file, const struct iovec __user *vec,
 		  unsigned long vlen, loff_t *pos, rwf_t flags)
 {
 	struct iovec iovstack[UIO_FASTIOV];
diff --git a/fs/splice.c b/fs/splice.c
index d7c8a7c4db07ff..d1efc53875bd93 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -342,38 +342,26 @@ const struct pipe_buf_operations nosteal_pipe_buf_ops = {
 };
 EXPORT_SYMBOL(nosteal_pipe_buf_ops);
 
-static ssize_t kernel_readv(struct file *file, const struct kvec *vec,
-			    unsigned long vlen, loff_t offset)
-{
-	mm_segment_t old_fs;
-	loff_t pos = offset;
-	ssize_t res;
-
-	old_fs = get_fs();
-	set_fs(KERNEL_DS);
-	/* The cast to a user pointer is valid due to the set_fs() */
-	res = vfs_readv(file, (const struct iovec __user *)vec, vlen, &pos, 0);
-	set_fs(old_fs);
-
-	return res;
-}
-
 static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 				 struct pipe_inode_info *pipe, size_t len,
 				 unsigned int flags)
 {
-	struct kvec *vec, __vec[PIPE_DEF_BUFFERS];
 	struct iov_iter to;
 	struct page **pages;
 	unsigned int nr_pages;
 	unsigned int mask;
 	size_t offset, base, copied = 0;
+	loff_t pos;
 	ssize_t res;
 	int i;
 
 	if (pipe_full(pipe->head, pipe->tail, pipe->max_usage))
 		return -EAGAIN;
 
+	res = rw_verify_area(READ, in, ppos, len);
+	if (res < 0)
+		return res;
+
 	/*
 	 * Try to keep page boundaries matching to source pagecache ones -
 	 * it probably won't be much help, but...
@@ -386,37 +374,26 @@ static ssize_t default_file_splice_read(struct file *in, loff_t *ppos,
 	if (res <= 0)
 		return -ENOMEM;
 
-	nr_pages = DIV_ROUND_UP(res + base, PAGE_SIZE);
-
-	vec = __vec;
-	if (nr_pages > PIPE_DEF_BUFFERS) {
-		vec = kmalloc_array(nr_pages, sizeof(struct kvec), GFP_KERNEL);
-		if (unlikely(!vec)) {
-			res = -ENOMEM;
-			goto out;
-		}
-	}
-
 	mask = pipe->ring_size - 1;
 	pipe->bufs[to.head & mask].offset = offset;
 	pipe->bufs[to.head & mask].len -= offset;
 
+	nr_pages = DIV_ROUND_UP(res + base, PAGE_SIZE);
+
+	pos = *ppos;
 	for (i = 0; i < nr_pages; i++) {
 		size_t this_len = min_t(size_t, len, PAGE_SIZE - offset);
-		vec[i].iov_base = page_address(pages[i]) + offset;
-		vec[i].iov_len = this_len;
+
+		res = __kernel_read(in, page_address(pages[i]) + offset,
+				this_len, &pos);
+		if (res < 0)
+			goto out;
 		len -= this_len;
 		offset = 0;
 	}
+	copied = pos - *ppos;
+	*ppos = pos;
 
-	res = kernel_readv(in, vec, nr_pages, *ppos);
-	if (res > 0) {
-		copied = res;
-		*ppos += res;
-	}
-
-	if (vec != __vec)
-		kfree(vec);
 out:
 	for (i = 0; i < nr_pages; i++)
 		put_page(pages[i]);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0c0ec76b600b50..fac6aead402a98 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1919,8 +1919,6 @@ ssize_t rw_copy_check_uvector(int type, const struct iovec __user * uvector,
 
 extern ssize_t vfs_read(struct file *, char __user *, size_t, loff_t *);
 extern ssize_t vfs_write(struct file *, const char __user *, size_t, loff_t *);
-extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
-		unsigned long, loff_t *, rwf_t);
 extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *,
 				   loff_t, size_t, unsigned int);
 extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in,
-- 
2.26.2


^ permalink raw reply related

* [PATCH 01/14] cachefiles: switch to kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

__kernel_write doesn't take a sb_writers references, which we need here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
---
 fs/cachefiles/rdwr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index e7726f5f1241c2..3080cda9e82457 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -937,7 +937,7 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page)
 	}
 
 	data = kmap(page);
-	ret = __kernel_write(file, data, len, &pos);
+	ret = kernel_write(file, data, len, &pos);
 	kunmap(page);
 	fput(file);
 	if (ret != len)
-- 
2.26.2


^ permalink raw reply related

* [PATCH 02/14] autofs: switch to kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

While pipes don't really need sb_writers projection, __kernel_write is an
interface better kept private, and the additional rw_verify_area does not
hurt here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ian Kent <raven@themaw.net>
---
 fs/autofs/waitq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/autofs/waitq.c b/fs/autofs/waitq.c
index b04c528b19d342..74c886f7c51cbe 100644
--- a/fs/autofs/waitq.c
+++ b/fs/autofs/waitq.c
@@ -53,7 +53,7 @@ static int autofs_write(struct autofs_sb_info *sbi,
 
 	mutex_lock(&sbi->pipe_mutex);
 	while (bytes) {
-		wr = __kernel_write(file, data, bytes, &file->f_pos);
+		wr = kernel_write(file, data, bytes, &file->f_pos);
 		if (wr <= 0)
 			break;
 		data += wr;
-- 
2.26.2


^ permalink raw reply related

* [PATCH 03/14] bpfilter: switch to kernel_write
From: Christoph Hellwig @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Al Viro
  Cc: Linus Torvalds, Ian Kent, David Howells, linux-kernel,
	linux-fsdevel, linux-security-module, netfilter-devel
In-Reply-To: <20200624161335.1810359-1-hch@lst.de>

While pipes don't really need sb_writers projection, __kernel_write is an
interface better kept private, and the additional rw_verify_area does not
hurt here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 net/bpfilter/bpfilter_kern.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bpfilter/bpfilter_kern.c b/net/bpfilter/bpfilter_kern.c
index c0f0990f30b604..1905e01c3aa9a7 100644
--- a/net/bpfilter/bpfilter_kern.c
+++ b/net/bpfilter/bpfilter_kern.c
@@ -50,7 +50,7 @@ static int __bpfilter_process_sockopt(struct sock *sk, int optname,
 	req.len = optlen;
 	if (!bpfilter_ops.info.pid)
 		goto out;
-	n = __kernel_write(bpfilter_ops.info.pipe_to_umh, &req, sizeof(req),
+	n = kernel_write(bpfilter_ops.info.pipe_to_umh, &req, sizeof(req),
 			   &pos);
 	if (n != sizeof(req)) {
 		pr_err("write fail %zd\n", n);
-- 
2.26.2


^ permalink raw reply related

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Luis Chamberlain @ 2020-06-24 16:13 UTC (permalink / raw)
  To: Christian Borntraeger, Andrew Morton, Martin Doucha, hch
  Cc: ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro, yangtiezhu,
	netdev, markward, linux-s390
In-Reply-To: <20200624131725.GL13911@42.do-not-panic.com>

On Wed, Jun 24, 2020 at 01:17:25PM +0000, Luis Chamberlain wrote:
> I found however an LTP bug indicating the need to test for
> s390 wait macros [0] in light of a recent bug in glibc for s390.
> I am asking for references to that issue given I cannot find
> any mention of this on glibc yet.
> 
> [0] https://github.com/linux-test-project/ltp/issues/605

I looked into this and the bug associated was:

https://sourceware.org/bugzilla/show_bug.cgi?id=19613

The commit in question was upstream glibc commit
b49ab5f4503f36dcbf43f821f817da66b2931fe6 ("Remove union wait [BZ
#19613]"), and while I don't see anything s390 mentioned there,
the issue there was due to the caller of the wait using a long
instead of an int for the return value.

In other words, that'd not the droid we are looking for.

So the issue is something else.

  Luis

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Luis Chamberlain @ 2020-06-24 16:09 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Christoph Hellwig, ast, axboe, bfields, bridge, chainsaw,
	christian.brauner, chuck.lever, davem, dhowells, gregkh,
	jarkko.sakkinen, jmorris, josh, keescook, keyrings, kuba,
	lars.ellenberg, linux-fsdevel, linux-kernel, linux-nfs,
	linux-security-module, nikolay, philipp.reisner, ravenexp, roopa,
	serge, slyfox, viro, yangtiezhu, netdev, markward, linux-s390
In-Reply-To: <9e767819-9bbe-2181-521e-4d8ca28ca4f7@de.ibm.com>

On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote:
> 
> 
> On 24.06.20 16:43, Christoph Hellwig wrote:
> > On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
> >> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
> >> linux-bridge on s390?
> > 
> > Are we even sure this is s390 specific and doesn't happen on other
> > architectures with the same bridge setup?
> 
> Fair point. AFAIK nobody has tested this yet on x86.

Regardless, can you enable dynamic debug prints, to see if the kernel
reveals anything on the bridge code which may be relevant:

echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control

  Luis

^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christian Borntraeger @ 2020-06-24 15:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: mcgrof, ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro, yangtiezhu,
	netdev, markward, linux-s390
In-Reply-To: <20200624144311.GA5839@infradead.org>



On 24.06.20 16:43, Christoph Hellwig wrote:
> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
>> linux-bridge on s390?
> 
> Are we even sure this is s390 specific and doesn't happen on other
> architectures with the same bridge setup?

Fair point. AFAIK nobody has tested this yet on x86.

^ permalink raw reply

* Re: [RFC][PATCH] net/bpfilter: Remove this broken and apparently unmantained
From: Casey Schaufler @ 2020-06-24 15:41 UTC (permalink / raw)
  To: Tetsuo Handa, linux-security-module
  Cc: Alexei Starovoitov, Eric W. Biederman, Linus Torvalds, Kees Cook,
	Andrew Morton, Alexei Starovoitov, David Miller, Al Viro, bpf,
	linux-fsdevel, Daniel Borkmann, Jakub Kicinski, Masahiro Yamada,
	Gary Lin, Bruno Meneguele, Casey Schaufler
In-Reply-To: <321b85b4-95f0-2f9b-756a-8405adc97230@i-love.sakura.ne.jp>

On 6/24/2020 12:05 AM, Tetsuo Handa wrote:
> Forwarding to LSM-ML again. Any comments?

Hey, BPF folks - you *really* need to do better about keeping the LSM
community in the loop when you're discussing LSM issues. 

>
> On 2020/06/24 15:39, Alexei Starovoitov wrote:
>> On Wed, Jun 24, 2020 at 01:58:33PM +0900, Tetsuo Handa wrote:
>>> On 2020/06/24 13:00, Alexei Starovoitov wrote:
>>>>> However, regarding usermode_blob, although the byte array (which contains code / data)
>>>>> might be initially loaded from the kernel space (which is protected), that byte array
>>>>> is no longer protected (e.g. SIGKILL, strace()) when executed because they are placed
>>>>> in the user address space. Thus, LSM modules (including pathname based security) want
>>>>> to control how that byte array can behave.
>>>> It's privileged memory regardless. root can poke into kernel or any process memory.
>>> LSM is there to restrict processes running as "root".
>> hmm. do you really mean that it's possible for an LSM to restrict CAP_SYS_ADMIN effectively?

I think that SELinux works hard to do just that. SELinux implements it's own
privilege model that is tangential to the capabilities model.

More directly, it is simple to create a security module to provide finer privilege
granularity than capabilities. I have one lurking in a source tree, and I would
be surprised if it's the only one waiting for the next round of LSM stacking.

>> LSM can certainly provide extra level of foolproof-ness against accidental
>> mistakes, but it's not a security boundary.

Gasp! Them's fight'n words. How do you justify such an outrageous claim?

>>> Your "root can poke into kernel or any process memory." response is out of step with the times.
>>>
>>> Initial byte array used for usermode blob might be protected because of "part of .rodata or
>>> .init.rodata of kernel module", but that byte array after started in userspace is no longer
>>> protected. 
>>>
>>> I don't trust such byte array as "part of kernel module", and I'm asking you how
>>> such byte array does not interfere (or be interfered by) the rest of the system.
>> Could you please explain the attack vector that you see in such scenario?
>> How elf binaries embedded in the kernel modules different from pid 1?
>> If anything can peek into their memory the system is compromised.
>> Say, there are no user blobs in kernel modules. How pid 1 memory is different
>> from all the JITed images? How is it different for all memory regions shared
>> between kernel and user processes?
>> I see an opportunity for an LSM to provide a protection against non-security
>> bugs when system is running trusted apps, but not when arbitrary code can
>> execute under root.
>>


^ permalink raw reply

* Re: linux-next: umh: fix processed error when UMH_WAIT_PROC is used seems to break linux bridge on s390x (bisected)
From: Christoph Hellwig @ 2020-06-24 14:43 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: mcgrof, ast, axboe, bfields, bridge, chainsaw, christian.brauner,
	chuck.lever, davem, dhowells, gregkh, jarkko.sakkinen, jmorris,
	josh, keescook, keyrings, kuba, lars.ellenberg, linux-fsdevel,
	linux-kernel, linux-nfs, linux-security-module, nikolay,
	philipp.reisner, ravenexp, roopa, serge, slyfox, viro, yangtiezhu,
	netdev, markward, linux-s390
In-Reply-To: <3118dc0d-a3af-9337-c897-2380062a8644@de.ibm.com>

On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote:
> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the
> linux-bridge on s390?

Are we even sure this is s390 specific and doesn't happen on other
architectures with the same bridge setup?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox