LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: properly support exec and wait with kernel pointers
From: Arnd Bergmann @ 2020-06-15 13:42 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20200615130032.931285-1-hch@lst.de>

On Mon, Jun 15, 2020 at 3:00 PM Christoph Hellwig <hch@lst.de> wrote:
>
> Hi all,
>
> this series first cleans up the exec code and then adds proper
> kernel_execveat and kernel_wait callers instead of relying on the fact
> that the early init code and kernel threads implicitly run with
> the address limit set to KERNEL_DS.
>
> Note that the cleanup removes the compat execve(at) handlers (almost)
> entirely, as we can handle the compat difference very nicely in a
> unified codebase.  The only exception is x86 where this would list the
> handlers twice in the same syscall table due to the messed up x32
> design.  I had to add an extra compat handler just for that case, but
> maybe someone has a better idea.

I looked at all the patches and I like it a lot. I replied with some suggestions
for x32, but maybe I misunderstood what its problem is, as I don't see
anything preventing us from having two entries in the x32 table pointing
to the same function.

       Arnd

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Arnd Bergmann @ 2020-06-15 13:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20200615130032.931285-3-hch@lst.de>

On Mon, Jun 15, 2020 at 3:00 PM Christoph Hellwig <hch@lst.de> wrote:
>
> The only differenence betweeen the compat exec* syscalls and their
> native versions is that compat_ptr sign extension, and the fact that
> the pointer arithmetics for the two dimensional arrays needs to use
> the compat pointer size.  Instead of the compat wrappers and the
> struct user_arg_ptr machinery just use in_compat_syscall() to do the
> right thing for the compat case deep inside get_user_arg_ptr().
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Nice!

I have an older patch doing the same for sys_mount() somewhere,
but never got around to send that. One day we should see if we
can just do this for all of them.

> -
> -static const char __user *get_user_arg_ptr(struct user_arg_ptr argv, int nr)
> +static const char __user *
> +get_user_arg_ptr(const char __user *const __user *argv, int nr)
>  {
>         const char __user *native;
>
>  #ifdef CONFIG_COMPAT
> -       if (unlikely(argv.is_compat)) {
> +       if (in_compat_syscall()) {
> +               const compat_uptr_t __user *compat_argv =
> +                       compat_ptr((unsigned long)argv);
>                 compat_uptr_t compat;
>
> -               if (get_user(compat, argv.ptr.compat + nr))
> +               if (get_user(compat, compat_argv + nr))
>                         return ERR_PTR(-EFAULT);
>
>                 return compat_ptr(compat);
>         }
>  #endif

I would expect that the "#ifdef CONFIG_COMPAT" can be removed
now, since compat_ptr() and in_compat_syscall() are now defined
unconditionally. I have not tried that though.

> +/*
> + * x32 syscalls are listed in the same table as x86_64 ones, so we need to
> + * define compat syscalls that are exactly the same as the native version for
> + * the syscall table machinery to work.  Sigh..
> + */
> +#ifdef CONFIG_X86_X32
>  COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> -       const compat_uptr_t __user *, argv,
> -       const compat_uptr_t __user *, envp)
> +                      const char __user *const __user *, argv,
> +                      const char __user *const __user *, envp)
>  {
> -       return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
> +       return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
>  }

Maybe move it to arch/x86/kernel/process_64.c or arch/x86/entry/syscall_x32.c
to keep it out of the common code if this is needed. I don't really understand
the comment, why can't this just use this?

--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -379,7 +379,7 @@
 517    x32     recvfrom                compat_sys_recvfrom
 518    x32     sendmsg                 compat_sys_sendmsg
 519    x32     recvmsg                 compat_sys_recvmsg
-520    x32     execve                  compat_sys_execve
+520    x32     execve                  sys_execve
 521    x32     ptrace                  compat_sys_ptrace
 522    x32     rt_sigpending           compat_sys_rt_sigpending
 523    x32     rt_sigtimedwait         compat_sys_rt_sigtimedwait_time64
@@ -404,7 +404,7 @@
 542    x32     getsockopt              compat_sys_getsockopt
 543    x32     io_setup                compat_sys_io_setup
 544    x32     io_submit               compat_sys_io_submit
-545    x32     execveat                compat_sys_execveat
+545    x32     execveat                sys_execveat
 546    x32     preadv2                 compat_sys_preadv64v2
 547    x32     pwritev2                compat_sys_pwritev64v2
 548    x32     process_madvise         compat_sys_process_madvise

       Arnd

^ permalink raw reply

* Re: [PATCH 0/8 v2] PCI: Align return values of PCIe capability and PCI accessors
From: Jason Gunthorpe @ 2020-06-15 13:46 UTC (permalink / raw)
  To: refactormyself
  Cc: Don Brace, Sam Bobroff, Mike Marciniszyn, linux-scsi,
	Martin K. Petersen, linux-rdma, linux-pci, Dennis Dalessandro,
	esc.storagedev, Doug Ledford, linux-kernel, dmaengine, Vinod Koul,
	helgaas, skhan, bjorn, Oliver O'Halloran, linuxppc-dev,
	James E.J. Bottomley, linux-kernel-mentees
In-Reply-To: <20200615073225.24061-1-refactormyself@gmail.com>

On Mon, Jun 15, 2020 at 09:32:17AM +0200, refactormyself@gmail.com wrote:
> From: Bolarinwa Olayemi Saheed <refactormyself@gmail.com>
> 
> 
> PATCH 1/8 to 7/8:
> PCIBIOS_ error codes have positive values and they are passed down the
> call heirarchy from accessors. For functions which are meant to return
> only a negative value on failure, passing on this value is a bug.
> To mitigate this, call pcibios_err_to_errno() before passing on return
> value from PCIe capability accessors call heirarchy. This function
> converts any positive PCIBIOS_ error codes to negative generic error
> values.
> 
> PATCH 8/8:
> The PCIe capability accessors can return 0, -EINVAL, or any PCIBIOS_ error
> code. The pci accessor on the other hand can only return 0 or any PCIBIOS_
> error code.This inconsistency among these accessor makes it harder for
> callers to check for errors.
> Return PCIBIOS_BAD_REGISTER_NUMBER instead of -EINVAL in all PCIe
> capability accessors.
> 
> MERGING:
> These may all be merged via the PCI tree, since it is a collection of
> similar fixes. This way they all get merged at once.

I prefer this not happen for active trees, it just risks needless
merge conflicts.

I will take the hfi1 patches at least, let me know when they are
reviewed

Thanks,
Jason

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Christoph Hellwig @ 2020-06-15 14:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Christoph Hellwig, Linux ARM
In-Reply-To: <CAK8P3a0bRD3RzE_X6Tjzu9Tj+OhHhP+S=k6+VYODBGko8oQhew@mail.gmail.com>

On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote:
> >  #ifdef CONFIG_COMPAT
> > -       if (unlikely(argv.is_compat)) {
> > +       if (in_compat_syscall()) {
> > +               const compat_uptr_t __user *compat_argv =
> > +                       compat_ptr((unsigned long)argv);
> >                 compat_uptr_t compat;
> >
> > -               if (get_user(compat, argv.ptr.compat + nr))
> > +               if (get_user(compat, compat_argv + nr))
> >                         return ERR_PTR(-EFAULT);
> >
> >                 return compat_ptr(compat);
> >         }
> >  #endif
> 
> I would expect that the "#ifdef CONFIG_COMPAT" can be removed
> now, since compat_ptr() and in_compat_syscall() are now defined
> unconditionally. I have not tried that though.

True, I'll give it a spin.

> > +/*
> > + * x32 syscalls are listed in the same table as x86_64 ones, so we need to
> > + * define compat syscalls that are exactly the same as the native version for
> > + * the syscall table machinery to work.  Sigh..
> > + */
> > +#ifdef CONFIG_X86_X32
> >  COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> > -       const compat_uptr_t __user *, argv,
> > -       const compat_uptr_t __user *, envp)
> > +                      const char __user *const __user *, argv,
> > +                      const char __user *const __user *, envp)
> >  {
> > -       return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
> > +       return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
> >  }
> 
> Maybe move it to arch/x86/kernel/process_64.c or arch/x86/entry/syscall_x32.c
> to keep it out of the common code if this is needed.

I'd rather keep it in common code as that allows all the low-level
exec stuff to be marked static, and avoid us growing new pointless
compat variants through copy and paste.
smart compiler to d

> I don't really understand
> the comment, why can't this just use this?

That errors out with:

ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
`__x32_sys_execve'
ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
`__x32_sys_execveat'
make: *** [Makefile:1139: vmlinux] Error 1

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Christoph Hellwig @ 2020-06-15 14:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Christoph Hellwig, Linux ARM
In-Reply-To: <CAK8P3a2MeZhayZWkPbd4Ckq3n410p_n808NJTwN=JjzqHRiAXg@mail.gmail.com>

On Mon, Jun 15, 2020 at 04:40:28PM +0200, Arnd Bergmann wrote:
> > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
> > `__x32_sys_execve'
> > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
> > `__x32_sys_execveat'
> > make: *** [Makefile:1139: vmlinux] Error 1
> 
> Ah, I see: it's marked x32-only, so arch/x86/entry/syscall_x32.c
> uses the __x32 prefix instead of the __x64 one. Marking it 'common'
> instead would make it work, but also create an extra entry point
> for native processes, something that commit
> 6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table")
> was trying to avoid.

Marking it common also doesn't compile at all because __NR_execve
and __NR_execveat get redefined in unistd_64.h.  I then tried to rename
the x32 versions, which failed in yet another way.  At that point I gave
up instead of digging myself into a deeper hole..

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Arnd Bergmann @ 2020-06-15 14:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20200615141239.GA12951@lst.de>

On Mon, Jun 15, 2020 at 4:12 PM Christoph Hellwig <hch@lst.de> wrote:
> On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote:

>
> > I don't really understand
> > the comment, why can't this just use this?
>
> That errors out with:
>
> ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
> `__x32_sys_execve'
> ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
> `__x32_sys_execveat'
> make: *** [Makefile:1139: vmlinux] Error 1

Ah, I see: it's marked x32-only, so arch/x86/entry/syscall_x32.c
uses the __x32 prefix instead of the __x64 one. Marking it 'common'
instead would make it work, but also create an extra entry point
for native processes, something that commit
6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table")
was trying to avoid.

      Arnd

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Arnd Bergmann @ 2020-06-15 14:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20200615144310.GA15101@lst.de>

On Mon, Jun 15, 2020 at 4:43 PM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jun 15, 2020 at 04:40:28PM +0200, Arnd Bergmann wrote:
> > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
> > > `__x32_sys_execve'
> > > ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
> > > `__x32_sys_execveat'
> > > make: *** [Makefile:1139: vmlinux] Error 1
> >
> > Ah, I see: it's marked x32-only, so arch/x86/entry/syscall_x32.c
> > uses the __x32 prefix instead of the __x64 one. Marking it 'common'
> > instead would make it work, but also create an extra entry point
> > for native processes, something that commit
> > 6365b842aae4 ("x86/syscalls: Split the x32 syscalls into their own table")
> > was trying to avoid.
>
> Marking it common also doesn't compile at all because __NR_execve
> and __NR_execveat get redefined in unistd_64.h.  I then tried to rename
> the x32 versions, which failed in yet another way.  At that point I gave
> up instead of digging myself into a deeper hole..

How about this one:

diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index 3d8d70d3896c..0ce15807cf54 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -16,6 +16,9 @@
 #undef __SYSCALL_X32
 #undef __SYSCALL_COMMON

+#define __x32_sys_execve __x64_sys_execve
+#define __x32_sys_execveat __x64_sys_execveat
+
 #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
 #define __SYSCALL_COMMON(nr, sym) [nr] = __x64_##sym,

Still ugly, but much simpler and more localized (if it works).

        Arnd

^ permalink raw reply related

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Brian Gerst @ 2020-06-15 14:48 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, linux-s390, Parisc List, Arnd Bergmann,
	the arch/x86 maintainers, open list:BROADCOM NVRAM DRIVER,
	linux-kernel@vger.kernel.org, Linux FS-devel Mailing List,
	Luis Chamberlain, Al Viro, sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20200615141239.GA12951@lst.de>

On Mon, Jun 15, 2020 at 10:13 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jun 15, 2020 at 03:31:35PM +0200, Arnd Bergmann wrote:
> > >  #ifdef CONFIG_COMPAT
> > > -       if (unlikely(argv.is_compat)) {
> > > +       if (in_compat_syscall()) {
> > > +               const compat_uptr_t __user *compat_argv =
> > > +                       compat_ptr((unsigned long)argv);
> > >                 compat_uptr_t compat;
> > >
> > > -               if (get_user(compat, argv.ptr.compat + nr))
> > > +               if (get_user(compat, compat_argv + nr))
> > >                         return ERR_PTR(-EFAULT);
> > >
> > >                 return compat_ptr(compat);
> > >         }
> > >  #endif
> >
> > I would expect that the "#ifdef CONFIG_COMPAT" can be removed
> > now, since compat_ptr() and in_compat_syscall() are now defined
> > unconditionally. I have not tried that though.
>
> True, I'll give it a spin.
>
> > > +/*
> > > + * x32 syscalls are listed in the same table as x86_64 ones, so we need to
> > > + * define compat syscalls that are exactly the same as the native version for
> > > + * the syscall table machinery to work.  Sigh..
> > > + */
> > > +#ifdef CONFIG_X86_X32
> > >  COMPAT_SYSCALL_DEFINE3(execve, const char __user *, filename,
> > > -       const compat_uptr_t __user *, argv,
> > > -       const compat_uptr_t __user *, envp)
> > > +                      const char __user *const __user *, argv,
> > > +                      const char __user *const __user *, envp)
> > >  {
> > > -       return do_compat_execve(AT_FDCWD, getname(filename), argv, envp, 0);
> > > +       return do_execveat(AT_FDCWD, getname(filename), argv, envp, 0, NULL);
> > >  }
> >
> > Maybe move it to arch/x86/kernel/process_64.c or arch/x86/entry/syscall_x32.c
> > to keep it out of the common code if this is needed.
>
> I'd rather keep it in common code as that allows all the low-level
> exec stuff to be marked static, and avoid us growing new pointless
> compat variants through copy and paste.
> smart compiler to d
>
> > I don't really understand
> > the comment, why can't this just use this?
>
> That errors out with:
>
> ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1040): undefined reference to
> `__x32_sys_execve'
> ld: arch/x86/entry/syscall_x32.o:(.rodata+0x1108): undefined reference to
> `__x32_sys_execveat'
> make: *** [Makefile:1139: vmlinux] Error 1

I think I have a fix for this, by modifying the syscall wrappers to
add an alias for the __x32 variant to the native __x64_sys_foo().
I'll get back to you with a patch.

--
Brian Gerst

^ permalink raw reply

* Re: [PATCH net] ibmvnic: Harden device login requests
From: Thomas Falcon @ 2020-06-15 14:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linuxppc-dev, danymadden
In-Reply-To: <20200612.141040.977929535227856014.davem@davemloft.net>

On 6/12/20 4:10 PM, David Miller wrote:
> From: Thomas Falcon <tlfalcon@linux.ibm.com>
> Date: Fri, 12 Jun 2020 13:31:39 -0500
>
>> @@ -841,13 +841,14 @@ static int ibmvnic_login(struct net_device *netdev)
>>   {
>>   	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
>>   	unsigned long timeout = msecs_to_jiffies(30000);
>> +	int retries = 10;
>>   	int retry_count = 0;
>>   	bool retry;
>>   	int rc;
> Reverse christmas tree, please.

Oops, sending a v2 soon.

Thanks,

Tom


^ permalink raw reply

* Re: [PATCH v2 00/12] x86: tag application address space for devices
From: Fenghua Yu @ 2020-06-15 14:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615075202.GI2497@hirez.programming.kicks-ass.net>

Hi, Peter,
On Mon, Jun 15, 2020 at 09:52:02AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2020 at 05:41:21PM -0700, Fenghua Yu wrote:
> 
> > This series only provides simple and basic support for ENQCMD and the MSR:
> > 1. Clean up type definitions (patch 1-3). These patches can be in a
> >    separate series.
> >    - Define "pasid" as "unsigned int" consistently (patch 1 and 2).
> >    - Define "flags" as "unsigned int"
> > 2. Explain different various technical terms used in the series (patch 4).
> > 3. Enumerate support for ENQCMD in the processor (patch 5).
> > 4. Handle FPU PASID state and the MSR during context switch (patches 6-7).
> > 5. Define "pasid" in mm_struct (patch 8).
> > 5. Clear PASID state for new mm and forked and cloned thread (patch 9-10).
> > 6. Allocate and free PASID for a process (patch 11).
> > 7. Fix up the PASID MSR in #GP handler when one thread in a process
> >    executes ENQCMD for the first time (patches 12).
> 
> If this is per mm, should not switch_mm() update the MSR ? I'm not
> seeing that, nor do I see it explained why not.

PASID value is per mm and all threads in a process have the same PASID
value in the MSR. However, the MSR is per thread and is context switched
by XSAVES/XRSTROS in patches 6-7.

Thanks.

-Fenghua

^ permalink raw reply

* Re: [PATCH] powerpc/fsl_booke/32: fix build with CONFIG_RANDOMIZE_BASE
From: Scott Wood @ 2020-06-15 14:53 UTC (permalink / raw)
  To: Arseny Solokha, Michael Ellerman, Jason Yan, linuxppc-dev
  Cc: Christophe Leroy, linux-kernel, stable
In-Reply-To: <20200613162801.1946619-1-asolokha@kb.kras.ru>

On Sat, 2020-06-13 at 23:28 +0700, Arseny Solokha wrote:
> Building the current 5.8 kernel for a e500 machine with
> CONFIG_RANDOMIZE_BASE set yields the following failure:
> 
>   arch/powerpc/mm/nohash/kaslr_booke.c: In function 'kaslr_early_init':
>   arch/powerpc/mm/nohash/kaslr_booke.c:387:2: error: implicit declaration
> of function 'flush_icache_range'; did you mean 'flush_tlb_range'?
> [-Werror=implicit-function-declaration]
> 
> Indeed, including asm/cacheflush.h into kaslr_booke.c fixes the build.
> 
> The issue dates back to the introduction of that file and probably went
> unnoticed because there's no in-tree defconfig with CONFIG_RANDOMIZE_BASE
> set.
> 
> Fixes: 2b0e86cc5de6 ("powerpc/fsl_booke/32: implement KASLR infrastructure")
> Cc: stable@vger.kernel.org
> Signed-off-by: Arseny Solokha <asolokha@kb.kras.ru>
> ---
>  arch/powerpc/mm/nohash/kaslr_booke.c | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Scott Wood <oss@buserror.net>

-Scott



^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Christoph Hellwig @ 2020-06-15 15:09 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, linux-s390, Parisc List, the arch/x86 maintainers,
	open list:BROADCOM NVRAM DRIVER, linux-kernel@vger.kernel.org,
	Linux FS-devel Mailing List, Luis Chamberlain, Al Viro,
	sparclinux, linuxppc-dev, Christoph Hellwig, Linux ARM
In-Reply-To: <CAK8P3a17h782gO65qJ9Mmz0EuiTSKQPEyr_=nvqOtnmQZuh9Kw@mail.gmail.com>

On Mon, Jun 15, 2020 at 04:46:15PM +0200, Arnd Bergmann wrote:
> How about this one:
> 
> diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> index 3d8d70d3896c..0ce15807cf54 100644
> --- a/arch/x86/entry/syscall_x32.c
> +++ b/arch/x86/entry/syscall_x32.c
> @@ -16,6 +16,9 @@
>  #undef __SYSCALL_X32
>  #undef __SYSCALL_COMMON
> 
> +#define __x32_sys_execve __x64_sys_execve
> +#define __x32_sys_execveat __x64_sys_execveat
> +


arch/x86/entry/syscall_x32.c:19:26: error: ‘__x64_sys_execve’ undeclared here (not in a function); did you mean ‘__x32_sys_execve’?
   19 | #define __x32_sys_execve __x64_sys_execve
      |                          ^~~~~~~~~~~~~~~~
arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro ‘__x32_sys_execve’
   22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
      |                                       ^~~~~~
./arch/x86/include/generated/asm/syscalls_64.h:344:1: note: in expansion of macro ‘__SYSCALL_X32’
  344 | __SYSCALL_X32(520, sys_execve)
      | ^~~~~~~~~~~~~
arch/x86/entry/syscall_x32.c:20:28: error: ‘__x64_sys_execveat’ undeclared here (not in a function); did you mean ‘__x32_sys_execveat’?
   20 | #define __x32_sys_execveat __x64_sys_execveat
      |                            ^~~~~~~~~~~~~~~~~~
arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro ‘__x32_sys_execveat’
   22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
      |                                       ^~~~~~
./arch/x86/include/generated/asm/syscalls_64.h:369:1: note: in expansion of macro ‘__SYSCALL_X32’
  369 | __SYSCALL_X32(545, sys_execveat)
      | ^~~~~~~~~~~~~
make[2]: *** [scripts/Makefile.build:281: arch/x86/entry/syscall_x32.o] Error 1
make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2
make[1]: *** Waiting for unfinished jobs....
kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable instruction
make: *** [Makefile:1764: arch/x86] Error 2
make: *** Waiting for unfinished jobs....

^ permalink raw reply

* Re: [PATCH] scsi: target/sbp: remove firewire SBP target driver
From: Chris Boot @ 2020-06-15 15:00 UTC (permalink / raw)
  To: Finn Thain
  Cc: Bart Van Assche, linux-scsi, Chuhong Yuan, linux-kernel,
	Nicholas Bellinger, target-devel, Martin K . Petersen,
	linux1394-devel, linuxppc-dev, Stefan Richter
In-Reply-To: <alpine.LNX.2.22.394.2006150919110.8@nippy.intranet>

On 15/06/2020 00:28, Finn Thain wrote:
> On Sun, 14 Jun 2020, Chris Boot wrote:
> 
>> I expect that if someone finds this useful it can stick around (but 
>> that's not my call).
> 
> Who's call is that? If the patch had said "From: Martin K. Petersen" and 
> "This driver is being removed because it has the following defects..." 
> that would be some indication of a good-faith willingness to accept users 
> as developers in the spirit of the GPL, which is what you seem to be 
> alluding to (?).

If you're asking me, I'd say it was martin's call:

> SCSI TARGET SUBSYSTEM                                                          
> M:      "Martin K. Petersen" <martin.petersen@oracle.com>                      
[...]
> F:      drivers/target/                                                        
> F:      include/target/                                                        

>> I just don't have the time or inclination or hardware to be able to 
>> maintain it anymore, so someone else would have to pick it up.
>>
> 
> Which is why most drivers get orphaned, right?

Sure, but that's not what Martin asked me to do, hence this patch.

-- 
Chris Boot
bootc@boo.tc

^ permalink raw reply

* [PATCH net v2] ibmvnic: Harden device login requests
From: Thomas Falcon @ 2020-06-15 15:29 UTC (permalink / raw)
  To: davem; +Cc: netdev, Thomas Falcon, linuxppc-dev, danymadden

The VNIC driver's "login" command sequence is the final step
in the driver's initialization process with device firmware,
confirming the available device queue resources to be utilized
by the driver. Under high system load, firmware may not respond
to the request in a timely manner or may abort the request. In
such cases, the driver should reattempt the login command
sequence. In case of a device error, the number of retries
is bounded.

Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
---
v2: declare variables in Reverse Christmas tree format
---
 drivers/net/ethernet/ibm/ibmvnic.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 1b4d04e..2baf7b3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -842,12 +842,13 @@ static int ibmvnic_login(struct net_device *netdev)
 	struct ibmvnic_adapter *adapter = netdev_priv(netdev);
 	unsigned long timeout = msecs_to_jiffies(30000);
 	int retry_count = 0;
+	int retries = 10;
 	bool retry;
 	int rc;
 
 	do {
 		retry = false;
-		if (retry_count > IBMVNIC_MAX_QUEUES) {
+		if (retry_count > retries) {
 			netdev_warn(netdev, "Login attempts exceeded\n");
 			return -1;
 		}
@@ -862,11 +863,23 @@ static int ibmvnic_login(struct net_device *netdev)
 
 		if (!wait_for_completion_timeout(&adapter->init_done,
 						 timeout)) {
-			netdev_warn(netdev, "Login timed out\n");
-			return -1;
+			netdev_warn(netdev, "Login timed out, retrying...\n");
+			retry = true;
+			adapter->init_done_rc = 0;
+			retry_count++;
+			continue;
 		}
 
-		if (adapter->init_done_rc == PARTIALSUCCESS) {
+		if (adapter->init_done_rc == ABORTED) {
+			netdev_warn(netdev, "Login aborted, retrying...\n");
+			retry = true;
+			adapter->init_done_rc = 0;
+			retry_count++;
+			/* FW or device may be busy, so
+			 * wait a bit before retrying login
+			 */
+			msleep(500);
+		} else if (adapter->init_done_rc == PARTIALSUCCESS) {
 			retry_count++;
 			release_sub_crqs(adapter, 1);
 
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Brian Gerst @ 2020-06-15 15:33 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-arch, linux-s390, Parisc List, Arnd Bergmann,
	the arch/x86 maintainers, open list:BROADCOM NVRAM DRIVER,
	linux-kernel@vger.kernel.org, Linux FS-devel Mailing List,
	Luis Chamberlain, Al Viro, sparclinux, linuxppc-dev, Linux ARM
In-Reply-To: <20200615150926.GA17108@lst.de>

On Mon, Jun 15, 2020 at 11:10 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Jun 15, 2020 at 04:46:15PM +0200, Arnd Bergmann wrote:
> > How about this one:
> >
> > diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
> > index 3d8d70d3896c..0ce15807cf54 100644
> > --- a/arch/x86/entry/syscall_x32.c
> > +++ b/arch/x86/entry/syscall_x32.c
> > @@ -16,6 +16,9 @@
> >  #undef __SYSCALL_X32
> >  #undef __SYSCALL_COMMON
> >
> > +#define __x32_sys_execve __x64_sys_execve
> > +#define __x32_sys_execveat __x64_sys_execveat
> > +
>
>
> arch/x86/entry/syscall_x32.c:19:26: error: ‘__x64_sys_execve’ undeclared here (not in a function); did you mean ‘__x32_sys_execve’?
>    19 | #define __x32_sys_execve __x64_sys_execve
>       |                          ^~~~~~~~~~~~~~~~
> arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro ‘__x32_sys_execve’
>    22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
>       |                                       ^~~~~~
> ./arch/x86/include/generated/asm/syscalls_64.h:344:1: note: in expansion of macro ‘__SYSCALL_X32’
>   344 | __SYSCALL_X32(520, sys_execve)
>       | ^~~~~~~~~~~~~
> arch/x86/entry/syscall_x32.c:20:28: error: ‘__x64_sys_execveat’ undeclared here (not in a function); did you mean ‘__x32_sys_execveat’?
>    20 | #define __x32_sys_execveat __x64_sys_execveat
>       |                            ^~~~~~~~~~~~~~~~~~
> arch/x86/entry/syscall_x32.c:22:39: note: in expansion of macro ‘__x32_sys_execveat’
>    22 | #define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
>       |                                       ^~~~~~
> ./arch/x86/include/generated/asm/syscalls_64.h:369:1: note: in expansion of macro ‘__SYSCALL_X32’
>   369 | __SYSCALL_X32(545, sys_execveat)
>       | ^~~~~~~~~~~~~
> make[2]: *** [scripts/Makefile.build:281: arch/x86/entry/syscall_x32.o] Error 1
> make[1]: *** [scripts/Makefile.build:497: arch/x86/entry] Error 2
> make[1]: *** Waiting for unfinished jobs....
> kernel/exit.o: warning: objtool: __x64_sys_exit_group()+0x14: unreachable instruction
> make: *** [Makefile:1764: arch/x86] Error 2
> make: *** Waiting for unfinished jobs....

If you move those aliases above all the __SYSCALL_* defines it will
work, since that will get the forward declaration too.  This would be
the simplest workaround.

--
Brian Gerst

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Fenghua Yu @ 2020-06-15 15:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615075649.GK2497@hirez.programming.kicks-ass.net>

Hi, Peter,
On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > +/*
> > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > + * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
> > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > + * guesses incorrectly, take one more #GP fault.
> 
> How is that going to help? Aren't we then going to run this same
> heuristic again and again and again?

The heuristic always initializes the MSR with the per mm PASID IIF the
mm has a valid PASID but the MSR doesn't have one. This heuristic usually
happens only once on the first #GP in a thread.

If the next #GP still comes in, the heuristic finds out the MSR already
has a valid PASID and thus will not fixup the MSR any more. The fixup()
returns "false" and lets others to handle the #GP.

So the heuristic will be executed once (at most) and won't be executed
again and again.

> 
> > + */
> > +bool __fixup_pasid_exception(void)
> > +{
> > +	u64 pasid_msr;
> > +	unsigned int pasid;
> > +
> > +	/*
> > +	 * This function is called only when this #GP was triggered from user
> > +	 * space. So the mm cannot be NULL.
> > +	 */
> > +	pasid = current->mm->pasid;
> > +	/* If the mm doesn't have a valid PASID, then can't help. */
> > +	if (invalid_pasid(pasid))
> > +		return false;
> > +
> > +	/*
> > +	 * Since IRQ is disabled now, the current task still owns the FPU on
> 
> That's just weird and confusing. What you want to say is that you rely
> on the exception disabling the interrupt.

I checked SDM again. You are right. #GP can be interrupted by machine check
or other interrupts. So I cannot assume the current task still owns the FPU.
Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
either the MSR on the processor or the PASID state in the memory.

> 
> > +	 * this CPU and the PASID MSR can be directly accessed.
> > +	 *
> > +	 * If the MSR has a valid PASID, the #GP must be for some other reason.
> > +	 *
> > +	 * If rdmsr() is really a performance issue, a TIF_ flag may be
> > +	 * added to check if the thread has a valid PASID instead of rdmsr().
> 
> I don't understand any of this. Nobody except us writes to this MSR, we
> should bloody well know what's in it. What gives?

Patch 4 describes how to manage the MSR and patch 7 describes the format
of the MSR (20-bit PASID value and bit 31 is valid bit).

Are they sufficient to help? Or do you mean something else?

> > +	 */
> > +	rdmsrl(MSR_IA32_PASID, pasid_msr);
> > +	if (pasid_msr & MSR_IA32_PASID_VALID)
> > +		return false;
> > +
> > +	/* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > +	wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> > +
> > +	return true;
> > +}
> > -- 
> > 2.19.1
> > 

Thanks.

-Fenghua

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Peter Zijlstra @ 2020-06-15 16:03 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615154854.GB13792@romley-ivt3.sc.intel.com>

On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote:
> Hi, Peter,
> On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > > +/*
> > > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > > + * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
> > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > > + * guesses incorrectly, take one more #GP fault.
> > 
> > How is that going to help? Aren't we then going to run this same
> > heuristic again and again and again?
> 
> The heuristic always initializes the MSR with the per mm PASID IIF the
> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
> happens only once on the first #GP in a thread.

But it doesn't guarantee the PASID is the right one. Suppose both the mm
has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
Then we NO-OP. So if the exception was due to us having the wrong PASID,
we stuck.

> If the next #GP still comes in, the heuristic finds out the MSR already
> has a valid PASID and thus will not fixup the MSR any more. The fixup()
> returns "false" and lets others to handle the #GP.
> 
> So the heuristic will be executed once (at most) and won't be executed
> again and again.

So I get that you want to set the PASID on-demand, but I find this all
really weird code to make that happen.

> > > +bool __fixup_pasid_exception(void)
> > > +{
> > > +	u64 pasid_msr;
> > > +	unsigned int pasid;
> > > +
> > > +	/*
> > > +	 * This function is called only when this #GP was triggered from user
> > > +	 * space. So the mm cannot be NULL.
> > > +	 */
> > > +	pasid = current->mm->pasid;
> > > +	/* If the mm doesn't have a valid PASID, then can't help. */
> > > +	if (invalid_pasid(pasid))
> > > +		return false;
> > > +
> > > +	/*
> > > +	 * Since IRQ is disabled now, the current task still owns the FPU on
> > 
> > That's just weird and confusing. What you want to say is that you rely
> > on the exception disabling the interrupt.
> 
> I checked SDM again. You are right. #GP can be interrupted by machine check
> or other interrupts. So I cannot assume the current task still owns the FPU.
> Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
> either the MSR on the processor or the PASID state in the memory.

That's not in fact what I meant, but yes, you can take exceptions while
!IF just fine.

> > > +	 * this CPU and the PASID MSR can be directly accessed.
> > > +	 *
> > > +	 * If the MSR has a valid PASID, the #GP must be for some other reason.
> > > +	 *
> > > +	 * If rdmsr() is really a performance issue, a TIF_ flag may be
> > > +	 * added to check if the thread has a valid PASID instead of rdmsr().
> > 
> > I don't understand any of this. Nobody except us writes to this MSR, we
> > should bloody well know what's in it. What gives?
> 
> Patch 4 describes how to manage the MSR and patch 7 describes the format
> of the MSR (20-bit PASID value and bit 31 is valid bit).
> 
> Are they sufficient to help? Or do you mean something else?

I don't get why you need a rdmsr here, or why not having one would
require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?

> > > +	 */
> > > +	rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > +	if (pasid_msr & MSR_IA32_PASID_VALID)
> > > +		return false;
> > > +
> > > +	/* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > +	wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);

How much more expensive is the wrmsr over the rdmsr? Can't we just
unconditionally write the current PASID and call it a day?

^ permalink raw reply

* Re: [PATCH 2/6] exec: simplify the compat syscall handling
From: Christoph Hellwig @ 2020-06-15 16:41 UTC (permalink / raw)
  To: Brian Gerst
  Cc: linux-arch, linux-s390, Parisc List, Arnd Bergmann,
	the arch/x86 maintainers, open list:BROADCOM NVRAM DRIVER,
	linux-kernel@vger.kernel.org, Linux FS-devel Mailing List,
	Luis Chamberlain, Al Viro, sparclinux, linuxppc-dev,
	Christoph Hellwig, Linux ARM
In-Reply-To: <CAMzpN2htYX7s6pmRg-c8qwZL1f1_+sB=ztDG_L=617hWsm-=8g@mail.gmail.com>

On Mon, Jun 15, 2020 at 11:33:49AM -0400, Brian Gerst wrote:
> If you move those aliases above all the __SYSCALL_* defines it will
> work, since that will get the forward declaration too.  This would be
> the simplest workaround.

That compiles and also passes my exaustive x32 tests (chroot + ls -l).

This is the updated version:

http://git.infradead.org/users/hch/misc.git/commitdiff/c8d319711ad2f53be003ae8e9be08519068bdcee

^ permalink raw reply

* Re: [PATCH v1 4/4] KVM: PPC: Book3S HV: migrate hot plugged memory
From: Laurent Dufour @ 2020-06-15 17:00 UTC (permalink / raw)
  To: Ram Pai, kvm-ppc, linuxppc-dev
  Cc: cclaudio, bharata, aneesh.kumar, sukadev, bauerman, david
In-Reply-To: <1590892071-25549-5-git-send-email-linuxram@us.ibm.com>

Le 31/05/2020 à 04:27, Ram Pai a écrit :
> From: Laurent Dufour <ldufour@linux.ibm.com>
> 
> When a memory slot is hot plugged to a SVM, GFNs associated with that
> memory slot automatically default to secure GFN. Hence migrate the
> PFNs associated with these GFNs to device-PFNs.
> 
> uv_migrate_mem_slot() is called to achieve that. It will not call
> UV_PAGE_IN since this request is ignored by the Ultravisor.
> NOTE: Ultravisor does not trust any page content provided by
> the Hypervisor, ones the VM turns secure.
> 
> Cc: Paul Mackerras <paulus@ozlabs.org>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Cc: Laurent Dufour <ldufour@linux.ibm.com>
> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Cc: David Gibson <david@gibson.dropbear.id.au>
> Cc: Claudio Carvalho <cclaudio@linux.ibm.com>
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> 	(fixed merge conflicts. Modified the commit message)
> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
> ---
>   arch/powerpc/include/asm/kvm_book3s_uvmem.h |  4 ++++
>   arch/powerpc/kvm/book3s_hv.c                | 11 +++++++----
>   arch/powerpc/kvm/book3s_hv_uvmem.c          |  3 +--
>   3 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_uvmem.h b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> index f0c5708..2ec2e5afb 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_uvmem.h
> @@ -23,6 +23,7 @@ unsigned long kvmppc_h_svm_page_out(struct kvm *kvm,
>   void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>   			     struct kvm *kvm, bool skip_page_out,
>   			     bool purge_gfn);
> +int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot);
>   #else
>   static inline int kvmppc_uvmem_init(void)
>   {
> @@ -78,5 +79,8 @@ static inline int kvmppc_send_page_to_uv(struct kvm *kvm, unsigned long gfn)
>   kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *free,
>   			struct kvm *kvm, bool skip_page_out,
>   			bool purge_gfn) { }
> +
> +static int uv_migrate_mem_slot(struct kvm *kvm,
> +		const struct kvm_memory_slot *memslot);
>   #endif /* CONFIG_PPC_UV */
>   #endif /* __ASM_KVM_BOOK3S_UVMEM_H__ */
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 4c62bfe..604d062 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -4516,13 +4516,16 @@ static void kvmppc_core_commit_memory_region_hv(struct kvm *kvm,
>   	case KVM_MR_CREATE:
>   		if (kvmppc_uvmem_slot_init(kvm, new))
>   			return;
> -		uv_register_mem_slot(kvm->arch.lpid,
> -				     new->base_gfn << PAGE_SHIFT,
> -				     new->npages * PAGE_SIZE,
> -				     0, new->id);
> +		if (uv_register_mem_slot(kvm->arch.lpid,
> +					 new->base_gfn << PAGE_SHIFT,
> +					 new->npages * PAGE_SIZE,
> +					 0, new->id))
> +			return;
> +		uv_migrate_mem_slot(kvm, new);
>   		break;
>   	case KVM_MR_DELETE:
>   		uv_unregister_mem_slot(kvm->arch.lpid, old->id);
> +		kvmppc_uvmem_drop_pages(old, kvm, true, true);

My mistake, kvmppc_radix_flush_memslot() called just before is already 
triggering the call to kvmppc_uvmem_drop_pages(), so that call is useless.

You should remove it in your v2.

>   		kvmppc_uvmem_slot_free(kvm, old);
>   		break;
>   	default:
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 36dda1d..1fa5f2a 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -377,8 +377,7 @@ static int kvmppc_svm_migrate_page(struct vm_area_struct *vma,
>   	return ret;
>   }
>   
> -static int uv_migrate_mem_slot(struct kvm *kvm,
> -		const struct kvm_memory_slot *memslot)
> +int uv_migrate_mem_slot(struct kvm *kvm, const struct kvm_memory_slot *memslot)
>   {
>   	unsigned long gfn = memslot->base_gfn;
>   	unsigned long end;
> 


^ permalink raw reply

* RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Luck, Tony @ 2020-06-15 17:11 UTC (permalink / raw)
  To: Peter Zijlstra, Yu, Fenghua
  Cc: Hansen, Dave, H Peter Anvin, Jiang, Dave, Raj, Ashok,
	Joerg Roedel, x86, amd-gfx, Ingo Molnar, Shankar, Ravi V,
	Yu, Yu-cheng, Andrew Donnellan, Borislav Petkov, Mehta, Sohil,
	Thomas Gleixner, David Woodhouse, Felix Kuehling, linux-kernel,
	iommu@lists.linux-foundation.org, Pan, Jacob jun, Frederic Barrat,
	linuxppc-dev, Lu Baolu
In-Reply-To: <20200615160357.GA2531@hirez.programming.kicks-ass.net>

>> The heuristic always initializes the MSR with the per mm PASID IIF the
>> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
>> happens only once on the first #GP in a thread.
>
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

ENQCMD only checks the 'valid' bit of the IA32_PASID MSR to decide whether
to #GP or not.  H/W has no concept of the "right" pasid value.

If IA32_PASID is valid with the wrong value ... then the system is about to
see some major corruption because the operations in the accelerator are
not going to translate to the physical addresses for pages owned by the process
that issued the ENQCMD.

-Tony

^ permalink raw reply

* Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Dan Carpenter @ 2020-06-15 18:07 UTC (permalink / raw)
  To: Waiman Long
  Cc: linux-cifs, linux-wireless, Jarkko Sakkinen, virtualization,
	David Howells, linux-mm, linux-sctp, target-devel, kasan-dev,
	cocci, devel, linux-s390, linux-scsi, x86, James Morris,
	Matthew Wilcox, linux-stm32, intel-wired-lan, David Rientjes,
	Serge E. Hallyn, linux-pm, ecryptfs, linuxppc-dev, linux-fscrypt,
	linux-mediatek, linux-amlogic, linux-btrfs, linux-integrity,
	linux-arm-kernel, linux-nfs, Linus Torvalds, samba-technical,
	linux-kernel, linux-bluetooth, linux-security-module, keyrings,
	tipc-discussion, linux-crypto, netdev, Joe Perches, Andrew Morton,
	linux-wpan, wireguard, linux-ppp
In-Reply-To: <20200413211550.8307-2-longman@redhat.com>

On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote:
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 23c7500eea7d..c08bc7eb20bd 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1707,17 +1707,17 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)
>  EXPORT_SYMBOL(krealloc);
>  
>  /**
> - * kzfree - like kfree but zero memory
> + * kfree_sensitive - Clear sensitive information in memory before freeing
>   * @p: object to free memory of
>   *
>   * The memory of the object @p points to is zeroed before freed.
> - * If @p is %NULL, kzfree() does nothing.
> + * If @p is %NULL, kfree_sensitive() does nothing.
>   *
>   * Note: this function zeroes the whole allocated buffer which can be a good
>   * deal bigger than the requested buffer size passed to kmalloc(). So be
>   * careful when using this function in performance sensitive code.
>   */
> -void kzfree(const void *p)
> +void kfree_sensitive(const void *p)
>  {
>  	size_t ks;
>  	void *mem = (void *)p;
> @@ -1725,10 +1725,10 @@ void kzfree(const void *p)
>  	if (unlikely(ZERO_OR_NULL_PTR(mem)))
>  		return;
>  	ks = ksize(mem);
> -	memset(mem, 0, ks);
> +	memzero_explicit(mem, ks);
        ^^^^^^^^^^^^^^^^^^^^^^^^^
This is an unrelated bug fix.  It really needs to be pulled into a
separate patch by itself and back ported to stable kernels.

>  	kfree(mem);
>  }
> -EXPORT_SYMBOL(kzfree);
> +EXPORT_SYMBOL(kfree_sensitive);
>  
>  /**
>   * ksize - get the actual amount of memory allocated for a given object

regards,
dan carpenter

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Fenghua Yu @ 2020-06-15 18:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615160357.GA2531@hirez.programming.kicks-ass.net>

On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote:
> > Hi, Peter,
> > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > > > +/*
> > > > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > > > + * that hasn't had the IA32_PASID MSR initialized.  If it looks like that
> > > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > > > + * guesses incorrectly, take one more #GP fault.
> > > 
> > > How is that going to help? Aren't we then going to run this same
> > > heuristic again and again and again?
> > 
> > The heuristic always initializes the MSR with the per mm PASID IIF the
> > mm has a valid PASID but the MSR doesn't have one. This heuristic usually
> > happens only once on the first #GP in a thread.
> 
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

The MSR for each thread was cleared during fork() and clone(). The PASID
was cleared during mm_init(). The per-mm PASID was assigned when fist
bind_mm() is called and remains the same value until process exit().

The MSR is only fixed up when the first ENQCMD is executed in a thread:
bit 31 in the MSR is 0 and the PASID in the mm is non-zero.

The MSR remains the same PASID value once it's fixed up until the thread
exits.

So the work flow ensures the PASID goes from mm's PASID to the MSR.

The PASID could be unbund from the mm. In this case, iommu will generate
#PF and kernel oops instead of #GP.

> 
> > If the next #GP still comes in, the heuristic finds out the MSR already
> > has a valid PASID and thus will not fixup the MSR any more. The fixup()
> > returns "false" and lets others to handle the #GP.
> > 
> > So the heuristic will be executed once (at most) and won't be executed
> > again and again.
> 
> So I get that you want to set the PASID on-demand, but I find this all
> really weird code to make that happen.

We could keep PASID same in all threads sychronously by propogating the MSRs
when the PASID is bound to the mm via IPIs or taskworks to all
threads in the process. But the code is complex and error-prone and
overhead could be high:
1. The user can call driver to do binding and unbinding multiple times.
   The IPIs or taskworks will be sent multiple times to make sure only
   the last IPIs or taskworks take action.
2. Even if a thread never executes ENQCMD and thus never uses the MSR,
   the MSR still needs to be updated whenever bind_mm() and needs to be
   context switched each time. This could cause high overhead.

Setting the PASID on-demand is simpler and cleaner and was recommended
by Thomas.

> 
> > > > +bool __fixup_pasid_exception(void)
> > > > +{
> > > > +	u64 pasid_msr;
> > > > +	unsigned int pasid;
> > > > +
> > > > +	/*
> > > > +	 * This function is called only when this #GP was triggered from user
> > > > +	 * space. So the mm cannot be NULL.
> > > > +	 */
> > > > +	pasid = current->mm->pasid;
> > > > +	/* If the mm doesn't have a valid PASID, then can't help. */
> > > > +	if (invalid_pasid(pasid))
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * Since IRQ is disabled now, the current task still owns the FPU on
> > > 
> > > That's just weird and confusing. What you want to say is that you rely
> > > on the exception disabling the interrupt.
> > 
> > I checked SDM again. You are right. #GP can be interrupted by machine check
> > or other interrupts. So I cannot assume the current task still owns the FPU.
> > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
> > either the MSR on the processor or the PASID state in the memory.
> 
> That's not in fact what I meant, but yes, you can take exceptions while
> !IF just fine.
> 
> > > > +	 * this CPU and the PASID MSR can be directly accessed.
> > > > +	 *
> > > > +	 * If the MSR has a valid PASID, the #GP must be for some other reason.
> > > > +	 *
> > > > +	 * If rdmsr() is really a performance issue, a TIF_ flag may be
> > > > +	 * added to check if the thread has a valid PASID instead of rdmsr().
> > > 
> > > I don't understand any of this. Nobody except us writes to this MSR, we
> > > should bloody well know what's in it. What gives?
> > 
> > Patch 4 describes how to manage the MSR and patch 7 describes the format
> > of the MSR (20-bit PASID value and bit 31 is valid bit).
> > 
> > Are they sufficient to help? Or do you mean something else?
> 
> I don't get why you need a rdmsr here, or why not having one would
> require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?

My concern is TIF flags are precious (only 3 slots available). Defining
a new TIF flag may be not worth it while rdmsr() can check if PASID
is valid in the MSR. And performance here might not be a big issue
in #GP.

But if you think using TIF flag is better, I can define a new TIF flag
and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
Then we can avoid using rdmsr() to check valid PASID in the MSR.

> 
> > > > +	 */
> > > > +	rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > +	if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > +		return false;
> > > > +
> > > > +	/* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > +	wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> 
> How much more expensive is the wrmsr over the rdmsr? Can't we just
> unconditionally write the current PASID and call it a day?

rdmsr() and wrmsr() might have same cost.

If using a TIF flag to check if the thread has a valid PASID MSR, then
I can replace rdmsr() by checking the TIF flag and only do wrmsr()
when the TIF flag is set.

Please advice.

Thanks.

-Fenghua

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Peter Zijlstra @ 2020-06-15 18:31 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Dave Hansen, H Peter Anvin, Dave Jiang, Ashok Raj, Joerg Roedel,
	x86, amd-gfx, Ingo Molnar, Ravi V Shankar, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615181259.GC13792@romley-ivt3.sc.intel.com>

On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote:
> > I don't get why you need a rdmsr here, or why not having one would
> > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> 
> My concern is TIF flags are precious (only 3 slots available). Defining
> a new TIF flag may be not worth it while rdmsr() can check if PASID
> is valid in the MSR. And performance here might not be a big issue
> in #GP.
> 
> But if you think using TIF flag is better, I can define a new TIF flag
> and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
> Then we can avoid using rdmsr() to check valid PASID in the MSR.

WHY ?!?! What do you need a TIF flag for?

^ permalink raw reply

* Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID
From: Peter Zijlstra @ 2020-06-15 18:32 UTC (permalink / raw)
  To: Raj, Ashok
  Cc: Ravi V Shankar, Dave Hansen, H Peter Anvin, Dave Jiang,
	Joerg Roedel, x86, amd-gfx, Ingo Molnar, Fenghua Yu, Yu-cheng Yu,
	Andrew Donnellan, Borislav Petkov, Sohil Mehta, Thomas Gleixner,
	Tony Luck, linuxppc-dev, Felix Kuehling, linux-kernel, iommu,
	Jacob Jun Pan, Frederic Barrat, David Woodhouse, Lu Baolu
In-Reply-To: <20200615181921.GA33928@otc-nc-03>

On Mon, Jun 15, 2020 at 11:19:21AM -0700, Raj, Ashok wrote:
> On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> > 
> > I don't get why you need a rdmsr here, or why not having one would
> > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> > 
> > > > > +	 */
> > > > > +	rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > > +	if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > > +		return false;
> > > > > +
> > > > > +	/* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > > +	wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> > 
> > How much more expensive is the wrmsr over the rdmsr? Can't we just
> > unconditionally write the current PASID and call it a day?
> 
> The reason to check the rdmsr() is because we are using a hueristic taking
> GP faults. If we already setup the MSR, but we get it a second time it
> means the reason is something other than PASID_MSR not being set.
> 
> Ideally we should use the TIF_ to track this would be cheaper, but we were
> told those bits aren't easy to give out. 

Why do you need a TIF flag? Why not any other random flag in current?

^ permalink raw reply

* Re: [PATCH 1/2] mm, treewide: Rename kzfree() to kfree_sensitive()
From: Waiman Long @ 2020-06-15 18:39 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-cifs, linux-wireless, Jarkko Sakkinen, virtualization,
	David Howells, linux-mm, linux-sctp, target-devel, kasan-dev,
	cocci, devel, linux-s390, linux-scsi, x86, James Morris,
	Matthew Wilcox, linux-stm32, intel-wired-lan, David Rientjes,
	Serge E. Hallyn, linux-pm, ecryptfs, linuxppc-dev, linux-fscrypt,
	linux-mediatek, linux-amlogic, linux-btrfs, linux-integrity,
	linux-arm-kernel, linux-nfs, Linus Torvalds, samba-technical,
	linux-kernel, linux-bluetooth, linux-security-module, keyrings,
	tipc-discussion, linux-crypto, netdev, Joe Perches, Andrew Morton,
	linux-wpan, wireguard, linux-ppp
In-Reply-To: <20200615180753.GJ4151@kadam>

On 6/15/20 2:07 PM, Dan Carpenter wrote:
> On Mon, Apr 13, 2020 at 05:15:49PM -0400, Waiman Long wrote:
>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>> index 23c7500eea7d..c08bc7eb20bd 100644
>> --- a/mm/slab_common.c
>> +++ b/mm/slab_common.c
>> @@ -1707,17 +1707,17 @@ void *krealloc(const void *p, size_t new_size, gfp_t flags)
>>   EXPORT_SYMBOL(krealloc);
>>   
>>   /**
>> - * kzfree - like kfree but zero memory
>> + * kfree_sensitive - Clear sensitive information in memory before freeing
>>    * @p: object to free memory of
>>    *
>>    * The memory of the object @p points to is zeroed before freed.
>> - * If @p is %NULL, kzfree() does nothing.
>> + * If @p is %NULL, kfree_sensitive() does nothing.
>>    *
>>    * Note: this function zeroes the whole allocated buffer which can be a good
>>    * deal bigger than the requested buffer size passed to kmalloc(). So be
>>    * careful when using this function in performance sensitive code.
>>    */
>> -void kzfree(const void *p)
>> +void kfree_sensitive(const void *p)
>>   {
>>   	size_t ks;
>>   	void *mem = (void *)p;
>> @@ -1725,10 +1725,10 @@ void kzfree(const void *p)
>>   	if (unlikely(ZERO_OR_NULL_PTR(mem)))
>>   		return;
>>   	ks = ksize(mem);
>> -	memset(mem, 0, ks);
>> +	memzero_explicit(mem, ks);
>          ^^^^^^^^^^^^^^^^^^^^^^^^^
> This is an unrelated bug fix.  It really needs to be pulled into a
> separate patch by itself and back ported to stable kernels.
>
>>   	kfree(mem);
>>   }
>> -EXPORT_SYMBOL(kzfree);
>> +EXPORT_SYMBOL(kfree_sensitive);
>>   
>>   /**
>>    * ksize - get the actual amount of memory allocated for a given object
> regards,
> dan carpenter
>
Thanks for the suggestion. I will break it out and post a version soon.

Cheers,
Longman


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox