LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 4.19] mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race
From: Greg KH @ 2020-11-04  9:05 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: peterz, linuxppc-dev, npiggin, linux-kernel, stable
In-Reply-To: <20201104011406.598487-1-mpe@ellerman.id.au>

On Wed, Nov 04, 2020 at 12:14:06PM +1100, Michael Ellerman wrote:
> From: Nicholas Piggin <npiggin@gmail.com>
> 
> commit d53c3dfb23c45f7d4f910c3a3ca84bf0a99c6143 upstream.
> 
> Reading and modifying current->mm and current->active_mm and switching
> mm should be done with irqs off, to prevent races seeing an intermediate
> state.
> 
> This is similar to commit 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB
> invalidate"). At exec-time when the new mm is activated, the old one
> should usually be single-threaded and no longer used, unless something
> else is holding an mm_users reference (which may be possible).
> 
> Absent other mm_users, there is also a race with preemption and lazy tlb
> switching. Consider the kernel_execve case where the current thread is
> using a lazy tlb active mm:
> 
>   call_usermodehelper()
>     kernel_execve()
>       old_mm = current->mm;
>       active_mm = current->active_mm;
>       *** preempt *** -------------------->  schedule()
>                                                prev->active_mm = NULL;
>                                                mmdrop(prev active_mm);
>                                              ...
>                       <--------------------  schedule()
>       current->mm = mm;
>       current->active_mm = mm;
>       if (!old_mm)
>           mmdrop(active_mm);
> 
> If we switch back to the kernel thread from a different mm, there is a
> double free of the old active_mm, and a missing free of the new one.
> 
> Closing this race only requires interrupts to be disabled while ->mm
> and ->active_mm are being switched, but the TLB problem requires also
> holding interrupts off over activate_mm. Unfortunately not all archs
> can do that yet, e.g., arm defers the switch if irqs are disabled and
> expects finish_arch_post_lock_switch() to be called to complete the
> flush; um takes a blocking lock in activate_mm().
> 
> So as a first step, disable interrupts across the mm/active_mm updates
> to close the lazy tlb preempt race, and provide an arch option to
> extend that to activate_mm which allows architectures doing IPI based
> TLB shootdowns to close the second race.
> 
> This is a bit ugly, but in the interest of fixing the bug and backporting
> before all architectures are converted this is a compromise.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> [mpe: Manual backport to 4.19 due to membarrier_exec_mmap(mm) changes]
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> Link: https://lore.kernel.org/r/20200914045219.3736466-2-npiggin@gmail.com
> ---
>  arch/Kconfig |  7 +++++++
>  fs/exec.c    | 15 ++++++++++++++-
>  2 files changed, 21 insertions(+), 1 deletion(-)

Now queued up, thanks!

greg k-h


* Re: [PATCH] powerpc/32s: Setup the early hash table at all time.
From: Serge Belyshev @ 2020-11-04  7:57 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Paul Mackerras, Andreas Schwab, linuxppc-dev, linux-kernel
In-Reply-To: <5acd7caf-99e9-9cb5-ed24-578d2e0a5ee1@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:

> To be sure we are not facing a long-standing bug, could you try
> CONFIG_KASAN=y on v5.9?

Indeed it started to fail somewhere between v5.6 and v5.7.

v5.7 fails early, printing a few messages on the console before rebooting;
v5.8 and later hang right at the bootloader.

I'm bisecting now.


* Re: [PATCH] powerpc/32s: Setup the early hash table at all time.
From: Christophe Leroy @ 2020-11-04  6:44 UTC (permalink / raw)
  To: Serge Belyshev; +Cc: Paul Mackerras, Andreas Schwab, linuxppc-dev, linux-kernel
In-Reply-To: <875z6mmfna.fsf@depni.sinp.msu.ru>



Le 03/11/2020 à 19:58, Serge Belyshev a écrit :
>> Would you mind checking that with that patch reverted, you are able to
>> boot a kernel built with CONFIG_KASAN ?
> 
> I can reproduce the same problem on a powerbook G4, and no,
> CONFIG_KASAN=y kernel with that patch reverted also does not boot with
> the same symptom: white screen at the bootloader right after "Booting Linux
> via __start() @ 0x0140000 ..."
> 

Thanks for the test Serge.

To be sure we are not facing a long-standing bug, could you try CONFIG_KASAN=y on v5.9?

Christophe


* Re: [PATCH 01/18] powerpc/pci: Add ppc_md.discover_phbs()
From: kernel test robot @ 2020-11-04  4:07 UTC (permalink / raw)
  To: Oliver O'Halloran, linuxppc-dev
  Cc: clang-built-linux, Oliver O'Halloran, kbuild-all,
	Paul Mackerras
In-Reply-To: <20201103043523.916109-1-oohall@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2261 bytes --]

Hi Oliver,

I love your patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on v5.10-rc2 next-20201103]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Oliver-O-Halloran/powerpc-pci-Add-ppc_md-discover_phbs/20201103-130935
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r024-20201103 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1fcd5d5655e29f85e12b402e32974f207cfedf32)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc64 cross compiling tool for clang build
        # apt-get install binutils-powerpc64-linux-gnu
        # https://github.com/0day-ci/linux/commit/76dcfc8e7ec9ceaee251e156ffe07140bf1f1a5d
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Oliver-O-Halloran/powerpc-pci-Add-ppc_md-discover_phbs/20201103-130935
        git checkout 76dcfc8e7ec9ceaee251e156ffe07140bf1f1a5d
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/powerpc/kernel/pci-common.c:1630:12: warning: no previous prototype for function 'discover_phbs' [-Wmissing-prototypes]
   int __init discover_phbs(void)
              ^
   arch/powerpc/kernel/pci-common.c:1630:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int __init discover_phbs(void)
   ^
   static 
   1 warning generated.

vim +/discover_phbs +1630 arch/powerpc/kernel/pci-common.c

  1628	
  1629	
> 1630	int __init discover_phbs(void)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
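
For reference, -Wmissing-prototypes fires when a non-static function is
defined without a prior declaration in scope. The sketch below shows the two
usual remedies. It is only an illustration: the empty __init define stands in
for the kernel annotation, discover_phbs_local is a made-up name, and the
report does not say which remedy the series should take.

```c
#include <assert.h>

#ifndef __init
#define __init	/* empty userspace stand-in for the kernel's __init annotation */
#endif

/*
 * Remedy 1: make a prototype visible before the definition. In a real tree
 * this declaration would live in a header shared with the callers.
 */
int discover_phbs(void);

int __init discover_phbs(void)
{
	return 0;
}

/*
 * Remedy 2: if nothing outside this translation unit calls the function,
 * give it internal linkage, as the compiler note suggests.
 * (discover_phbs_local is a hypothetical name used only for this sketch.)
 */
static int __init discover_phbs_local(void)
{
	return 0;
}
```

Either form compiles without the warning under W=1 style flags; the first is
the one to pick when the function is referenced from another file.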

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34042 bytes --]


* [PATCH 4.19] mm: fix exec activate_mm vs TLB shootdown and lazy tlb switching race
From: Michael Ellerman @ 2020-11-04  1:14 UTC (permalink / raw)
  To: stable; +Cc: peterz, linuxppc-dev, linux-kernel, npiggin, gregkh

From: Nicholas Piggin <npiggin@gmail.com>

commit d53c3dfb23c45f7d4f910c3a3ca84bf0a99c6143 upstream.

Reading and modifying current->mm and current->active_mm and switching
mm should be done with irqs off, to prevent races seeing an intermediate
state.

This is similar to commit 38cf307c1f20 ("mm: fix kthread_use_mm() vs TLB
invalidate"). At exec-time when the new mm is activated, the old one
should usually be single-threaded and no longer used, unless something
else is holding an mm_users reference (which may be possible).

Absent other mm_users, there is also a race with preemption and lazy tlb
switching. Consider the kernel_execve case where the current thread is
using a lazy tlb active mm:

  call_usermodehelper()
    kernel_execve()
      old_mm = current->mm;
      active_mm = current->active_mm;
      *** preempt *** -------------------->  schedule()
                                               prev->active_mm = NULL;
                                               mmdrop(prev active_mm);
                                             ...
                      <--------------------  schedule()
      current->mm = mm;
      current->active_mm = mm;
      if (!old_mm)
          mmdrop(active_mm);

If we switch back to the kernel thread from a different mm, there is a
double free of the old active_mm, and a missing free of the new one.

Closing this race only requires interrupts to be disabled while ->mm
and ->active_mm are being switched, but the TLB problem requires also
holding interrupts off over activate_mm. Unfortunately not all archs
can do that yet, e.g., arm defers the switch if irqs are disabled and
expects finish_arch_post_lock_switch() to be called to complete the
flush; um takes a blocking lock in activate_mm().

So as a first step, disable interrupts across the mm/active_mm updates
to close the lazy tlb preempt race, and provide an arch option to
extend that to activate_mm which allows architectures doing IPI based
TLB shootdowns to close the second race.

This is a bit ugly, but in the interest of fixing the bug and backporting
before all architectures are converted this is a compromise.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[mpe: Manual backport to 4.19 due to membarrier_exec_mmap(mm) changes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200914045219.3736466-2-npiggin@gmail.com
---
 arch/Kconfig |  7 +++++++
 fs/exec.c    | 15 ++++++++++++++-
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index a336548487e6..e3a030f7a722 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -366,6 +366,13 @@ config HAVE_RCU_TABLE_FREE
 config HAVE_RCU_TABLE_INVALIDATE
 	bool
 
+config ARCH_WANT_IRQS_OFF_ACTIVATE_MM
+	bool
+	help
+	  Temporary select until all architectures can be converted to have
+	  irqs disabled over activate_mm. Architectures that do IPI based TLB
+	  shootdowns should enable this.
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/fs/exec.c b/fs/exec.c
index cece8c14f377..52788644c4af 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1028,10 +1028,23 @@ static int exec_mmap(struct mm_struct *mm)
 		}
 	}
 	task_lock(tsk);
+
+	local_irq_disable();
 	active_mm = tsk->active_mm;
-	tsk->mm = mm;
 	tsk->active_mm = mm;
+	tsk->mm = mm;
+	/*
+	 * This prevents preemption while active_mm is being loaded and
+	 * it and mm are being updated, which could cause problems for
+	 * lazy tlb mm refcounting when these are updated by context
+	 * switches. Not all architectures can handle irqs off over
+	 * activate_mm yet.
+	 */
+	if (!IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+		local_irq_enable();
 	activate_mm(active_mm, mm);
+	if (IS_ENABLED(CONFIG_ARCH_WANT_IRQS_OFF_ACTIVATE_MM))
+		local_irq_enable();
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
-- 
2.25.1
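
The refcount imbalance described in the changelog can be modelled in
userspace. The sketch below is a toy, not kernel code: every toy_* name is
invented for illustration, mm_count is a plain int, and the context switch
is reduced to "drop the borrowed lazy mm, then borrow another one". It only
shows why sampling active_mm before a preemption leads to one mm being
dropped twice and another never being dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct mm_struct: only the lazy-tlb reference count. */
struct toy_mm {
	int mm_count;
};

/* Toy stand-in for the task_struct of a kernel thread. */
struct toy_task {
	struct toy_mm *mm;		/* NULL for a kernel thread */
	struct toy_mm *active_mm;	/* borrowed (lazy tlb) mm */
};

static void toy_mmgrab(struct toy_mm *mm) { mm->mm_count++; }
static void toy_mmdrop(struct toy_mm *mm) { mm->mm_count--; }

/* Switch away (lazy mm is dropped), then switch back from another mm. */
static void toy_preempt_round_trip(struct toy_task *tsk, struct toy_mm *other)
{
	toy_mmdrop(tsk->active_mm);
	tsk->active_mm = NULL;
	toy_mmgrab(other);
	tsk->active_mm = other;
}

/* Models the unpatched sequence; the flag injects the preemption. */
static void toy_exec_mmap(struct toy_task *tsk, struct toy_mm *new_mm,
			  struct toy_mm *other, bool preempt_in_window)
{
	struct toy_mm *old_mm = tsk->mm;
	struct toy_mm *active_mm = tsk->active_mm;	/* sampled early */

	if (preempt_in_window)
		toy_preempt_round_trip(tsk, other);

	tsk->mm = new_mm;
	tsk->active_mm = new_mm;
	if (!old_mm)
		toy_mmdrop(active_mm);	/* may drop a stale pointer */
}

struct toy_result {
	int lazy_count;		/* count left on the original lazy mm */
	int other_count;	/* count left on the mm borrowed after preemption */
};

static struct toy_result toy_run(bool preempt_in_window)
{
	struct toy_mm lazy = { .mm_count = 1 };
	struct toy_mm other = { .mm_count = 1 };
	struct toy_mm new_mm = { .mm_count = 1 };
	struct toy_task tsk = { .mm = NULL, .active_mm = &lazy };
	struct toy_result res;

	toy_exec_mmap(&tsk, &new_mm, &other, preempt_in_window);
	res.lazy_count = lazy.mm_count;
	res.other_count = other.mm_count;
	return res;
}
```

With preempt_in_window false (the situation the irqs-off section enforces)
the lazy mm ends at 0 and the other mm keeps its single reference. With it
true, the lazy mm goes to -1 and the other mm is left at 2, which matches
the "double free" and "missing free" in the description.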



* Re: [PATCH seccomp 0/8] seccomp: add bitmap cache support on remaining arches and report cache in procfs
From: Kees Cook @ 2020-11-04  0:11 UTC (permalink / raw)
  To: YiFei Zhu
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	containers, linux-kernel, Andy Lutomirski, Dimitrios Skarlatos,
	David Laight, Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

On Tue, Nov 03, 2020 at 07:42:56AM -0600, YiFei Zhu wrote:
> From: YiFei Zhu <yifeifz2@illinois.edu>
> 
> This patch series enables bitmap cache for the remaining arches with
> SECCOMP_FILTER, other than MIPS.
> 
> I was unable to find any of the arches having subarch-specific NR_syscalls
> macros, so generic NR_syscalls is used. SH's syscall_get_arch seems to
> only have the 32-bit subarch implementation. I'm not sure if this is
> expected.
> 
> This series has not been tested; I have not built all the cross compilers
> necessary to build test, let alone run the kernel or benchmark the
> performance, so help on making sure the bitmap cache works as expected
> would be appreciated. The series applies on top of Kees's for-next/seccomp
> branch.

Thank you! This looks good. I wonder if the different handling of little
endian is worth solving -- I'm suspicious about powerpc's use of
__LITTLE_ENDIAN__ vs a CONFIG, but I guess the compiler would match the
target endian-ness. Regardless, it captures what the architectures are
doing, and gets things standardized.

> 
> YiFei Zhu (8):
>   csky: Enable seccomp architecture tracking
>   parisc: Enable seccomp architecture tracking

I don't have compilers for these.

>   powerpc: Enable seccomp architecture tracking
>   riscv: Enable seccomp architecture tracking
>   s390: Enable seccomp architecture tracking

These I can build-test immediately.

>   sh: Enable seccomp architecture tracking
>   xtensa: Enable seccomp architecture tracking

These two are available in Ubuntu's cross compiler set, so I'll get them
added to my cross-builders.

>   seccomp/cache: Report cache data through /proc/pid/seccomp_cache

In the meantime, I'll wait a bit to see if we can get some Acks/Reviews
from arch maintainers. :)

-Kees

> 
>  arch/Kconfig                       | 15 ++++++++
>  arch/csky/include/asm/Kbuild       |  1 -
>  arch/csky/include/asm/seccomp.h    | 11 ++++++
>  arch/parisc/include/asm/Kbuild     |  1 -
>  arch/parisc/include/asm/seccomp.h  | 22 +++++++++++
>  arch/powerpc/include/asm/seccomp.h | 21 +++++++++++
>  arch/riscv/include/asm/seccomp.h   | 10 +++++
>  arch/s390/include/asm/seccomp.h    |  9 +++++
>  arch/sh/include/asm/seccomp.h      | 10 +++++
>  arch/xtensa/include/asm/Kbuild     |  1 -
>  arch/xtensa/include/asm/seccomp.h  | 11 ++++++
>  fs/proc/base.c                     |  6 +++
>  include/linux/seccomp.h            |  7 ++++
>  kernel/seccomp.c                   | 59 ++++++++++++++++++++++++++++++
>  14 files changed, 181 insertions(+), 3 deletions(-)
>  create mode 100644 arch/csky/include/asm/seccomp.h
>  create mode 100644 arch/parisc/include/asm/seccomp.h
>  create mode 100644 arch/xtensa/include/asm/seccomp.h
> 
> 
> base-commit: 38c37e8fd3d2590c4234d8cfbc22158362f0eb04
> --
> 2.29.2

-- 
Kees Cook


* Re: Kernel panic from malloc() on SUSE 15.1?
From: Carl Jacobsen @ 2020-11-03 22:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <878sbjuqe6.fsf@mpe.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 9539 bytes --]

The panic (on a call to malloc from statically linked libcrypto) looks like
this:

Bad kernel stack pointer 7fffffffeac0 at 700
Oops: Bad kernel stack pointer, sig: 6 [#1]
SMP NR_CPUS=2048
NUMA
pSeries
Modules linked in: scsi_transport_iscsi af_packet xt_tcpudp ip6t_rpfilter
ip6t_REJECT ipt_REJECT xt_conntrack ip_set nfnetlink ebtable_nat
ebtable_broute br_netfilter bridge stp llc ip6table_nat nf_conntrack_ipv6
nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security
ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables
x_tables ibmveth(X) vmx_crypto gf128mul crct10dif_vpmsum rtc_generic btrfs
xor zstd_decompress zstd_compress xxhash raid6_pq sr_mod cdrom sd_mod
ibmvscsi(X) scsi_transport_srp crc32c_vpmsum sg dm_multipath dm_mod
scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
Supported: Yes, External
CPU: 0 PID: 14144 Comm: rand_test_no_pt Tainted: G 4.12.14-197.18-default #1 SLE15-SP1
task: c00000002fa23b80 task.stack: c000000032824000
NIP: 0000000000000700 LR: 0000000010004ad0 CTR: 0000000000000000
REGS: c00000001ec2fd40 TRAP: 0300   Tainted: G (4.12.14-197.18-default)
MSR: 8000000000001000 <SF,ME>  CR: 44000844  XER: 20000000
CFAR: 00000000000010f0 DAR: ffffffffffffb27a DSISR: 40000000 SOFTE: 0
GPR00: 0000000020000000 00007fffffffeac0 00000000102af788 fffffffffffffffd
GPR04: 0000000000000020 0000000000000030 00000000102b0550 0000000000000001
GPR08: 0000000000000000 00007fffb7dacc00 00000000102b0520 800000010280f033
GPR12: 0000000000004000 00007fffb7ffa100 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR24: 0000000000000000 0000000000000000 0000000000000000 00007fffb7fef4b8
GPR28: 00007fffb7ff0000 0000000000000000 0000000000000000 00007fffffffeac0
NIP [0000000000000700] 0x700
LR [0000000010004ad0] 0x10004ad0
Call Trace:
Instruction dump:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000 7db243a6 7db142a6 f92d0080 7d20e2a6
---[ end trace cc04515f274cfbf6 ]---

Sending IPI to other CPUs
IPI complete
kexec: Starting switchover sequence.
I'm in purgatory
 -> smp_release_cpus()
spinning_secondaries = 0
 <- smp_release_cpus()
Kernel panic - not syncing: Out of memory and no killable processes...

CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
Call Trace:
[c000000012457210] [c000000008a20140] dump_stack+0xb0/0xf0 (unreliable)
[c000000012457250] [c000000008a1ccd4] panic+0x144/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ../drivers/tty/vt/vt.c:3887 do_unblank_screen+0x1d0/0x270
Modules linked in:
Supported: Yes
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.14-197.18-default #1 SLE15-SP1
task: c000000012449680 task.stack: c000000012454000
NIP: c0000000086d1ac0 LR: c0000000086d1918 CTR: c0000000085fc390
REGS: c000000012456f60 TRAP: 0700   Not tainted  (4.12.14-197.18-default)
MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28242222  XER: 20000008
CFAR: c0000000086d1934 SOFTE: 0
GPR00: c0000000086d1918 c0000000124571e0 c000000009240000 0000000000000000
GPR04: 0000000000000000 c00000001237e00e 00000000000010b9 c000000012457170
GPR08: 000000000a610000 0000000000000000 c0000000090f38f0 c0000000122bc3d7
GPR12: 0000000028242428 c00000000f6c0000 00000000014200c2 00000000014200c2
GPR16: 00000000014200c2 0000000000000001 0000000000000000 0000000000000240
GPR20: 0000000000000001 0000000000000240 0000000000000000 c0000000140e1d10
GPR24: 0000000000000000 0000000000000000 0000000000000115 c000000009282374
GPR28: c000000009403508 c0000000094034d8 0000000000000000 0000000000000000
NIP [c0000000086d1ac0] do_unblank_screen+0x1d0/0x270
LR [c0000000086d1918] do_unblank_screen+0x28/0x270
Call Trace:
[c0000000124571e0] [c000000012457250] 0xc000000012457250 (unreliable)
[c000000012457250] [c000000008a1cd44] panic+0x1b4/0x31c
[c0000000124572e0] [c0000000082efcc0] out_of_memory+0x3f0/0x700
[c000000012457380] [c0000000082f7ed4] __alloc_pages_nodemask+0x1004/0x10b0
[c000000012457570] [c00000000837f4d8] alloc_page_interleave+0x58/0x110
[c0000000124575b0] [c0000000083800bc] alloc_pages_current+0x16c/0x1d0
[c000000012457610] [c0000000082e8398] __page_cache_alloc+0xd8/0x150
[c000000012457650] [c0000000082e8574] pagecache_get_page+0x164/0x440
[c0000000124576b0] [c0000000082e8884] grab_cache_page_write_begin+0x34/0x70
[c0000000124576e0] [c00000000840ede8] simple_write_begin+0x48/0x190
[c000000012457720] [c0000000082e7c7c] generic_perform_write+0xec/0x270
[c0000000124577b0] [c0000000082ea2e0] __generic_file_write_iter+0x250/0x2a0
[c000000012457810] [c0000000082ea53c] generic_file_write_iter+0x20c/0x2e0
[c000000012457850] [c0000000083cc0e0] __vfs_write+0x120/0x1e0
[c0000000124578e0] [c0000000083cdfc8] vfs_write+0xd8/0x220
[c000000012457930] [c0000000083cfeec] SyS_write+0x6c/0x110
[c000000012457980] [c000000008d154c4] xwrite+0x54/0xb8
[c0000000124579c0] [c000000008d15574] do_copy+0x4c/0x17c
[c0000000124579f0] [c000000008d15140] write_buffer+0x64/0x90
[c000000012457a20] [c000000008d151d4] flush_buffer+0x68/0xf4
[c000000012457a70] [c000000008d62268] unxz+0x210/0x398
[c000000012457b10] [c000000008d15efc] unpack_to_rootfs+0x1f0/0x360
[c000000012457bc0] [c000000008d16108] populate_rootfs+0x9c/0x188
[c000000012457c40] [c00000000800f5d4] do_one_initcall+0x64/0x1d0
[c000000012457d00] [c000000008d14474] kernel_init_freeable+0x294/0x388
[c000000012457dc0] [c00000000801026c] kernel_init+0x2c/0x160
[c000000012457e30] [c00000000800b560] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
3d22001c 39293920 81290000 2f890000 409cff00 ebe10068 38210070 e8010010
ebc1fff0 7c0803a6 4e800020 60000000 <0fe00000> 4bfffe74 60000000 60000000
---[ end trace ad1803c957b45442 ]---
---[ end Kernel panic - not syncing: Out of memory and no killable processes...


On Mon, Nov 2, 2020 at 6:26 PM Michael Ellerman <mpe@ellerman.id.au> wrote:

> Carl Jacobsen <cjacobsen@storix.com> writes:
> > I've got a SUSE 15.1 install (on ppc64le) that kernel panics on a very
> > simple test program, built in a slightly unusual way.
> >
> > I'm compiling on SUSE 12, using gcc 4.8.3. I'm linking to a static
> > copy of libcrypto.a (from openssl-1.1.1g), built without threads.
> > I have a 10 line C test program that compiles and runs fine on the
> > SUSE 12 system. If I compile the same program on SUSE 15.1 (with
> > gcc 7.4.1), it runs fine on SUSE 15.1.
> >
> > But, if I run the version that I compiled on SUSE 12, on the SUSE 15.1
> > system, the call to RAND_status() gets to a malloc() and then panics.
> > (And, of course, if I just compile a call to malloc(), that runs fine
> > on both systems.) Here's the test program, it's really just a call to
> > RAND_status():
> >
> >     #include <stdio.h>
> >     #include <openssl/rand.h>
> >
> >     int main(int argc, char **argv)
> >     {
> >         int has_enough_data = RAND_status();
> >         printf("The PRNG %s been seeded with enough data\n",
> >                has_enough_data ? "HAS" : "has NOT");
> >         return 0;
> >     }
> >
> > openssl is configured/built with:
> >     ./config no-shared no-dso no-threads -fPIC -ggdb3 -debug -static
> >     make
> >
> > and the test program is compiled with:
> >     gcc -ggdb3 -o rand_test rand_test.c libcrypto.a
> >
> > The kernel on SUSE 12 is: 3.12.28-4-default
> > And glibc is: 2.19
> >
> > The kernel on SUSE 15.1 is: 4.12.14-197.18-default
> > And glibc is: 2.26
> >
> > In a previous iteration it was panicking in pthread_once(), so
> > I compiled openssl without pthreads support, and now it panics
> > calling malloc().
>
> What's the panic look like?
>
> cheers
>


-- 
Carl Jacobsen
Storix, Inc.



* Re: [PATCH] x86/mpx: fix recursive munmap() corruption
From: Dmitry Safonov @ 2020-11-03 21:08 UTC (permalink / raw)
  To: Laurent Dufour, Christophe Leroy, Michael Ellerman
  Cc: mhocko, rguenther, linux-mm, Dave Hansen, x86, stable, LKML,
	Dave Hansen, Thomas Gleixner, luto, linuxppc-dev, Andrew Morton,
	vbabka
In-Reply-To: <452b347c-0a86-c710-16ba-5a98c12a47e3@linux.vnet.ibm.com>

Hi Laurent, Christophe, Michael, all,

On 11/3/20 5:11 PM, Laurent Dufour wrote:
> Le 23/10/2020 à 14:28, Christophe Leroy a écrit :
[..]
>>>> That seems like it would work for CRIU and make sense in general?
>>>
>>> Sorry for the late answer, yes this would make more sense.
>>>
>>> Here is a patch doing that.
>>>
>>
>> In your patch, the test seems overkill:
>>
>> +    if ((start <= vdso_base && vdso_end <= end) ||  /* 1   */
>> +        (vdso_base <= start && start < vdso_end) || /* 3,4 */
>> +        (vdso_base < end && end <= vdso_end))       /* 2,3 */
>> +        mm->context.vdso_base = mm->context.vdso_end = 0;
>>
>> What about
>>
>>      if (start < vdso_end && vdso_base < end)
>>          mm->context.vdso_base = mm->context.vdso_end = 0;
>>
>> This should cover all cases, or am I missing something ?
>>
>>
>> And do we really need to store vdso_end in the context?
>> I think it should be possible to re-calculate it: the size of the VDSO
>> should be (&vdso32_end - &vdso32_start) + PAGE_SIZE for the 32-bit VDSO,
>> and (&vdso64_end - &vdso64_start) + PAGE_SIZE for the 64-bit VDSO.
> 
> Thanks Christophe for the advice.
> 
> That covers all the cases, and indeed is similar to Michael's
> proposal, which I missed last year.
> 
> I'll send a patch fixing this issue following your proposal.

It's probably not necessary anymore. I've sent patches [1], currently in
akpm's tree; the last one forbids splitting of vm_special_mapping.
So a user is able to munmap() or mremap() the vdso as a whole, but not partially.

[1]:
https://lore.kernel.org/linux-mm/20201013013416.390574-1-dima@arista.com/

Thanks,
          Dmitry
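
The claim in the quoted review that a single interval-overlap test covers
the same cases as the three-clause test can be checked by brute force. The
sketch below is a userspace illustration, not kernel code; the function
names are invented, and it assumes both ranges are non-empty half-open
intervals, [start, end) and [vdso_base, vdso_end).

```c
#include <assert.h>
#include <stdbool.h>

/* The three-clause condition from the quoted patch. */
static bool overlap_three_clause(unsigned long start, unsigned long end,
				 unsigned long vdso_base, unsigned long vdso_end)
{
	return (start <= vdso_base && vdso_end <= end) ||
	       (vdso_base <= start && start < vdso_end) ||
	       (vdso_base < end && end <= vdso_end);
}

/* The shorter condition suggested in the review. */
static bool overlap_simple(unsigned long start, unsigned long end,
			   unsigned long vdso_base, unsigned long vdso_end)
{
	return start < vdso_end && vdso_base < end;
}

/*
 * Compare both conditions for every pair of non-empty ranges whose
 * endpoints lie in [0, limit]; returns how often they disagree.
 */
static unsigned long count_mismatches(unsigned long limit)
{
	unsigned long start, end, base, vend, mismatches = 0;

	for (start = 0; start < limit; start++)
		for (end = start + 1; end <= limit; end++)
			for (base = 0; base < limit; base++)
				for (vend = base + 1; vend <= limit; vend++)
					if (overlap_three_clause(start, end, base, vend) !=
					    overlap_simple(start, end, base, vend))
						mismatches++;
	return mismatches;
}
```

The two conditions agree on every non-empty pair, including the edge case
where the unmapped range ends exactly at vdso_base, which neither treats as
an overlap.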


* Re: [patch V3 22/37] highmem: High implementation details and document API
From: Thomas Gleixner @ 2020-11-03 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Juri Lelli, linux-aio, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Chris Mason,
	Huang Rui, Paul Mackerras, Gerd Hoffmann,
	Daniel Bristot de Oliveira, linux-sparc, Vincent Chen,
	Christoph Hellwig, Vincent Guittot, Paul McKenney, Max Filippov,
	the arch/x86 maintainers, Russell King, linux-csky, Ingo Molnar,
	David Airlie, VMware Graphics, Mel Gorman, nouveau, Dave Airlie,
	open list:SYNOPSYS ARC ARCHITECTURE, Ben Skeggs, linux-xtensa,
	Arnd Bergmann, intel-gfx, Roland Scheidegger, Josef Bacik,
	Steven Rostedt, Rodrigo Vivi, Alexander Viro, spice-devel,
	David Sterba, virtualization, Dietmar Eggemann, Linux ARM,
	Jani Nikula, Chris Zankel, Michal Simek, Thomas Bogendoerfer,
	Nick Hu, Linux-MM, Vineet Gupta, LKML, Christian Koenig,
	Benjamin LaHaise, Daniel Vetter, linux-fsdevel, Andrew Morton,
	linuxppc-dev, David S. Miller, linux-btrfs, Greentime Hu
In-Reply-To: <CAHk-=wg2D_yjgKYkXCybD3uf0dtwYh6HxZ9BQJfV5t+EBqLGQQ@mail.gmail.com>

On Tue, Nov 03 2020 at 09:48, Linus Torvalds wrote:
> I have no complaints about the patch, but it strikes me that if people
> want to actually have much better debug coverage, this is where it
> should be (I like the "every other address" thing too, don't get me
> wrong).
>
> In particular, instead of these PageHighMem(page) tests, I think
> something like this would be better:
>
>    #ifdef CONFIG_DEBUG_HIGHMEM
>      #define page_use_kmap(page) ((page),1)
>    #else
>      #define page_use_kmap(page) PageHighMem(page)
>    #endif
>
> and then replace those "if (!PageHighMem(page))" tests with "if
> (!page_use_kmap())" instead.
>
> IOW, in debug mode, it would _always_ remap the page, whether it's
> highmem or not. That would really stress the highmem code and find any
> fragilities.

Yes, that makes a lot of sense. We just have to avoid that for the
architectures with aliasing issues.

> Anyway, this is all separate from the series, which still looks fine
> to me. Just a reaction to seeing the patch, and Thomas' earlier
> mention that the highmem debugging doesn't actually do much.

Right, forcing it for both kmap and kmap_local is straightforward. I'll
cook a patch on top for that.

Thanks,

        tglx
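
The effect of the proposed page_use_kmap() macro can be sketched in
userspace. Everything below is a stand-in for illustration: struct toy_page
and the toy_* helpers are invented, PageHighMem() is reduced to a flag test,
and the debug variant uses a (void) cast so the comma expression does not
trigger an unused-value warning. It is not the patch that was eventually
written.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page descriptor: only records whether the page is in highmem. */
struct toy_page {
	bool highmem;
};

/* Stand-in for the kernel's PageHighMem() test. */
static bool PageHighMem(const struct toy_page *page)
{
	return page->highmem;
}

#ifdef CONFIG_DEBUG_HIGHMEM
/* Debug builds: pretend every page needs a temporary mapping. */
#define page_use_kmap(page)	((void)(page), true)
#else
/* Normal builds: only real highmem pages take the mapping path. */
#define page_use_kmap(page)	PageHighMem(page)
#endif

/* Returns true when the (toy) kmap path would set up a temporary mapping. */
static bool toy_kmap_remaps(const struct toy_page *page)
{
	if (!page_use_kmap(page))
		return false;	/* lowmem shortcut: direct-map address is used */
	return true;		/* temporary mapping would be installed */
}

/* Convenience wrapper for trying both kinds of page. */
static bool toy_remaps(bool highmem)
{
	struct toy_page page = { .highmem = highmem };

	return toy_kmap_remaps(&page);
}
```

Built without CONFIG_DEBUG_HIGHMEM, only highmem pages take the mapping
path; built with -DCONFIG_DEBUG_HIGHMEM, lowmem pages take it too, which is
the extra coverage being asked for.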




* Re: [PATCH net-next 04/15] net: mlx5: Replace in_irq() usage.
From: Saeed Mahameed @ 2020-11-03 19:31 UTC (permalink / raw)
  To: Jakub Kicinski, Leon Romanovsky
  Cc: Aymen Sghaier, Madalin Bucur, Sebastian Andrzej Siewior,
	Zhu Yanjun, Samuel Chessman, Ping-Ke Shih, Herbert Xu,
	Horia Geantă, linux-rdma, Rain River, Kalle Valo,
	Ulrich Kunitz, Jouni Malinen, Daniel Drake, Thomas Gleixner,
	linux-arm-kernel, netdev, linux-wireless, Li Yang, linux-crypto,
	Jon Mason, linuxppc-dev, David S. Miller
In-Reply-To: <20201031095938.3878412e@kicinski-fedora-PC1C0HJN.hsd1.ca.comcast.net>

On Sat, 2020-10-31 at 09:59 -0700, Jakub Kicinski wrote:
> On Tue, 27 Oct 2020 23:54:43 +0100 Sebastian Andrzej Siewior wrote:
> > mlx5_eq_async_int() uses in_irq() to decide whether eq::lock needs to be
> > acquired and released with spin_[un]lock() or the irq saving/restoring
> > variants.
> > 
> > The usage of in_*() in drivers is phased out and Linus clearly requested
> > that code which changes behaviour depending on context should either be
> > separated or the context be conveyed in an argument passed by the caller,
> > which usually knows the context.
> > 
> > mlx5_eq_async_int() knows the context via the action argument already so
> > using it for the lock variant decision is a straightforward replacement
> > for in_irq().
> > 
> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> > Cc: Saeed Mahameed <saeedm@nvidia.com>
> > Cc: Leon Romanovsky <leon@kernel.org>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: linux-rdma@vger.kernel.org
> 
> Saeed, please pick this up into your tree.

Applied to net-next-mlx5; will submit to net-next shortly.
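
The pattern described in the quoted changelog, choosing the lock variant
from an argument instead of asking in_irq(), can be sketched as a userspace
model. All names below (the toy_* functions and TOY_ACTION_* values) are
hypothetical and do not come from the mlx5 driver; the "locks" only count
which variant was taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action codes telling the handler who is calling it. */
enum toy_action {
	TOY_ACTION_IRQ_HANDLER,	/* called from hard interrupt context */
	TOY_ACTION_RECOVER,	/* called from process context */
};

/* Toy lock that records which variant was used. */
struct toy_lock {
	int plain_locks;
	int irqsave_locks;
};

static void toy_spin_lock(struct toy_lock *lock) { lock->plain_locks++; }
static void toy_spin_lock_irqsave(struct toy_lock *lock) { lock->irqsave_locks++; }

/*
 * The caller states its context through 'action', so the handler does not
 * need to probe it. In hard interrupt context the plain variant is enough;
 * anywhere else the irq-saving variant is taken.
 */
static void toy_eq_int(struct toy_lock *lock, enum toy_action action)
{
	if (action == TOY_ACTION_IRQ_HANDLER)
		toy_spin_lock(lock);
	else
		toy_spin_lock_irqsave(lock);
	/* event processing and the matching unlock are omitted in this model */
}

/* Helper reporting whether a given action ends up on the irqsave variant. */
static bool toy_uses_irqsave(enum toy_action action)
{
	struct toy_lock lock = { 0, 0 };

	toy_eq_int(&lock, action);
	return lock.irqsave_locks == 1 && lock.plain_locks == 0;
}
```

The decision then depends only on what the caller passed in, which is the
property the changelog asks for.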



* Re: [PATCH] powerpc: Don't use asm goto for put_user() with GCC 4.9
From: Christophe Leroy @ 2020-11-03 19:22 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: schwab, linuxppc-dev
In-Reply-To: <20201103185829.GM2672@gate.crashing.org>



Le 03/11/2020 à 19:58, Segher Boessenkool a écrit :
> On Tue, Nov 03, 2020 at 03:43:55PM +0100, Christophe Leroy wrote:
>> Le 03/11/2020 à 14:29, Michael Ellerman a écrit :
>>> For now though let's just not use asm goto with GCC 4.9, to avoid this
>>> bug and any other issues we haven't noticed yet. Possibly in future we
>>> can find a smaller workaround.
>>
>> Is that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 ?
> 
> That was fixed in 4.8.1 (and all 4.9), so probably not.
> 

Ok.

Regardless, using "asm_volatile_goto()" instead of "asm volatile goto()" seems to fix the issue.

Christophe


* Re: [PATCH] powerpc/32s: Setup the early hash table at all time.
From: Serge Belyshev @ 2020-11-03 18:58 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Paul Mackerras, Andreas Schwab, linuxppc-dev, linux-kernel
In-Reply-To: <1f8494cd-36db-e3a2-8ea4-28fb976468e7@csgroup.eu>

> Would you mind checking that with that patch reverted, you are able to
> boot a kernel built with CONFIG_KASAN ?

I can reproduce the same problem on a powerbook G4, and no: a
CONFIG_KASAN=y kernel with that patch reverted also fails to boot, with
the same symptom: white screen at the bootloader right after "Booting Linux
via __start() @ 0x0140000 ..."

^ permalink raw reply

* Re: [PATCH] powerpc: Don't use asm goto for put_user() with GCC 4.9
From: Segher Boessenkool @ 2020-11-03 18:58 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: schwab, linuxppc-dev
In-Reply-To: <4fe837f8-ecae-f009-c193-8da386a70705@csgroup.eu>

On Tue, Nov 03, 2020 at 03:43:55PM +0100, Christophe Leroy wrote:
> On 03/11/2020 at 14:29, Michael Ellerman wrote:
> >For now though let's just not use asm goto with GCC 4.9, to avoid this
> >bug and any other issues we haven't noticed yet. Possibly in future we
> >can find a smaller workaround.
> 
> Is that https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 ?

That was fixed in 4.8.1 (and all 4.9), so probably not.


Segher

^ permalink raw reply

* Re: C vdso
From: Christophe Leroy @ 2020-11-03 18:13 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <877drhxeg8.fsf@mpe.ellerman.id.au>



On 23/10/2020 at 15:24, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> On 24/09/2020 at 15:17, Christophe Leroy wrote:
>>> On 17/09/2020 at 14:33, Michael Ellerman wrote:
>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>
>>>>> What is the status with the generic C vdso merge ?
>>>>> In some mail, you mentioned having difficulties getting it working on
>>>>> ppc64, any progress ? What's the problem ? Can I help ?
>>>>
>>>> Yeah sorry I was hoping to get time to work on it but haven't been able
>>>> to.
>>>>
>>>> It's causing crashes on ppc64 ie. big endian.
> ...
>>>
>>> Can you tell what defconfig you are using ? I have been able to setup a full glibc PPC64 cross
>>> compilation chain and been able to test it under QEMU with success, using Nathan's vdsotest tool.
>>
>> What config are you using ?
> 
> ppc64_defconfig + guest.config
> 
> Or pseries_defconfig.
> 
> I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other toolchains too.
> 
> At a minimum we're seeing relocations in the output, which is a problem:
> 
>    $ readelf -r build\~/arch/powerpc/kernel/vdso64/vdso64.so
>    
>    Relocation section '.rela.dyn' at offset 0x12a8 contains 8 entries:
>      Offset          Info           Type           Sym. Value    Sym. Name + Addend
>    000000001368  000000000016 R_PPC64_RELATIVE                     7c0
>    000000001370  000000000016 R_PPC64_RELATIVE                     9300
>    000000001380  000000000016 R_PPC64_RELATIVE                     970
>    000000001388  000000000016 R_PPC64_RELATIVE                     9300
>    000000001398  000000000016 R_PPC64_RELATIVE                     a90
>    0000000013a0  000000000016 R_PPC64_RELATIVE                     9300
>    0000000013b0  000000000016 R_PPC64_RELATIVE                     b20
>    0000000013b8  000000000016 R_PPC64_RELATIVE                     9300

Looks like it's due to the OPD and the relation between function() and .function()

By using DOTSYM() in the 'bl' call, the dot function is called directly, so the OPD is no longer
used and can be dropped.

Now I get a .rela.dyn full of 0; I don't know if we should drop it explicitly.

Christophe

^ permalink raw reply

* Re: C vdso
From: Christophe Leroy @ 2020-11-03 18:11 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev@ozlabs.org
In-Reply-To: <874kmkx7gi.fsf@mpe.ellerman.id.au>



On 24/10/2020 at 12:07, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> On 24/09/2020 at 15:17, Christophe Leroy wrote:
>>>> On 17/09/2020 at 14:33, Michael Ellerman wrote:
>>>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>>>>
>>>>>> What is the status with the generic C vdso merge ?
>>>>>> In some mail, you mentioned having difficulties getting it working on
>>>>>> ppc64, any progress ? What's the problem ? Can I help ?
>>>>>
>>>>> Yeah sorry I was hoping to get time to work on it but haven't been able
>>>>> to.
>>>>>
>>>>> It's causing crashes on ppc64 ie. big endian.
>> ...
>>>>
>>>> Can you tell what defconfig you are using ? I have been able to setup a full glibc PPC64 cross
>>>> compilation chain and been able to test it under QEMU with success, using Nathan's vdsotest tool.
>>>
>>> What config are you using ?
>>
>> ppc64_defconfig + guest.config
>>
>> Or pseries_defconfig.
>>
>> I'm using Ubuntu GCC 9.3.0 mostly, but it happens with other toolchains too.
> 
> I'm also seeing warnings because of the feature fixups:
> 

[...]

> 
> That's happening because the 32-bit VDSO is built with CONFIG_PPC32=y,
> due to config-fake32.h, and that causes the feature fixup entries to be
> the wrong size.
> 
> See the logic in feature-fixup.h:
> 
>    #if defined(CONFIG_PPC64) && !defined(__powerpc64__)
>    /* 64 bits kernel, 32 bits code (ie. vdso32) */
>    #define FTR_ENTRY_LONG		.8byte
>    #define FTR_ENTRY_OFFSET	.long 0xffffffff; .long
>    #elif defined(CONFIG_PPC64)
>    #define FTR_ENTRY_LONG		.8byte
>    #define FTR_ENTRY_OFFSET	.8byte
>    #else
>    #define FTR_ENTRY_LONG		.long
>    #define FTR_ENTRY_OFFSET	.long
>    #endif
> 
> 
> We expect the fixup entries to still use 64-bit values, even for the
> 32-bit VDSO in a 64-bit kernel.
> 
> TBH I'm not sure how config-fake32.h can work long term, it's so fragile
> to be defining/redefining a handful of CONFIG symbols like that.
> 
> The generic VDSO code is fairly careful to only include uapi and vdso
> headers, not linux ones. So I think we need to better split our headers
> so that we can build the VDSO code with few or no linux headers, and so
> avoid the need to define any (or most) CONFIG symbols.
> 

In the end, it was easy to do: I just had to change a couple of __powerpc64__ into CONFIG_PPC64 in
asm/cputable.h, and move the asm/time.h functions playing with the timebase into asm/timebase.h

Christophe

^ permalink raw reply

* Re: [patch V3 22/37] highmem: High implementation details and document API
From: Linus Torvalds @ 2020-11-03 17:48 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Juri Lelli, linux-aio, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, linux-mips, Ben Segall, Chris Mason,
	Huang Rui, Paul Mackerras, Gerd Hoffmann,
	Daniel Bristot de Oliveira, linux-sparc, Vincent Chen,
	Christoph Hellwig, Vincent Guittot, Paul McKenney, Max Filippov,
	the arch/x86 maintainers, Russell King, linux-csky, Ingo Molnar,
	David Airlie, VMware Graphics, Mel Gorman, nouveau, Dave Airlie,
	open list:SYNOPSYS ARC ARCHITECTURE, Ben Skeggs, linux-xtensa,
	Arnd Bergmann, intel-gfx, Roland Scheidegger, Josef Bacik,
	Steven Rostedt, Rodrigo Vivi, Alexander Viro, spice-devel,
	David Sterba, virtualization, Dietmar Eggemann, Linux ARM,
	Jani Nikula, Chris Zankel, Michal Simek, Thomas Bogendoerfer,
	Nick Hu, Linux-MM, Vineet Gupta, LKML, Christian Koenig,
	Benjamin LaHaise, Daniel Vetter, linux-fsdevel, Andrew Morton,
	linuxppc-dev, David S. Miller, linux-btrfs, Greentime Hu
In-Reply-To: <20201103095858.827582066@linutronix.de>

On Tue, Nov 3, 2020 at 2:33 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>
> +static inline void *kmap(struct page *page)
> +{
> +       void *addr;
> +
> +       might_sleep();
> +       if (!PageHighMem(page))
> +               addr = page_address(page);
> +       else
> +               addr = kmap_high(page);
> +       kmap_flush_tlb((unsigned long)addr);
> +       return addr;
> +}
> +
> +static inline void kunmap(struct page *page)
> +{
> +       might_sleep();
> +       if (!PageHighMem(page))
> +               return;
> +       kunmap_high(page);
> +}

I have no complaints about the patch, but it strikes me that if people
want to actually have much better debug coverage, this is where it
should be (I like the "every other address" thing too, don't get me
wrong).

In particular, instead of these PageHighMem(page) tests, I think
something like this would be better:

   #ifdef CONFIG_DEBUG_HIGHMEM
     #define page_use_kmap(page) ((page),1)
   #else
     #define page_use_kmap(page) PageHighMem(page)
   #endif

and then replace those "if (!PageHighMem(page))" tests with "if
(!page_use_kmap(page))" instead.

IOW, in debug mode, it would _always_ remap the page, whether it's
highmem or not. That would really stress the highmem code and find any
fragilities.

No?

Anyway, this is all separate from the series, which still looks fine
to me. Just a reaction to seeing the patch, and Thomas' earlier
mention that the highmem debugging doesn't actually do much.
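
To make the suggestion above concrete, here is a minimal userspace sketch of
the idea. It is not kernel code: struct page, PageHighMem() and
kmap_takes_high_path() are hypothetical stand-ins for illustration, and only
the page_use_kmap() macro follows the proposal (with a (void) cast added so
the comma expression does not trigger an unused-value warning).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the highmem flag matters here. */
struct page {
	bool highmem;
};

/* Stand-in for the kernel's PageHighMem() test. */
static inline bool PageHighMem(const struct page *page)
{
	return page->highmem;
}

/*
 * Build with -DCONFIG_DEBUG_HIGHMEM to force every page down the kmap
 * path, whether it is highmem or not.
 */
#ifdef CONFIG_DEBUG_HIGHMEM
#define page_use_kmap(page) ((void)(page), true)
#else
#define page_use_kmap(page) PageHighMem(page)
#endif

/* Returns true when kmap() would remap the page instead of using page_address(). */
static inline bool kmap_takes_high_path(const struct page *page)
{
	assert(page != NULL);
	return page_use_kmap(page);
}
```

In a default build only highmem pages take the remapping path; with the debug
define, every call does, which is what would stress the highmem code.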

               Linus

^ permalink raw reply

* [patch V4 24/37] sched: highmem: Store local kmaps in task struct
From: Thomas Gleixner @ 2020-11-03 13:51 UTC (permalink / raw)
  To: LKML
  Cc: Juri Lelli, linux-aio, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, virtualization, Ben Segall,
	Chris Mason, Huang Rui, Paul Mackerras, Gerd Hoffmann,
	Daniel Bristot de Oliveira, sparclinux, Vincent Chen,
	Christoph Hellwig, Vincent Guittot, Paul McKenney, Max Filippov,
	x86, Russell King, linux-csky, Ingo Molnar, David Airlie,
	VMware Graphics, Mel Gorman, nouveau, Dave Airlie, linux-snps-arc,
	Ben Skeggs, linux-xtensa, Arnd Bergmann, intel-gfx,
	Roland Scheidegger, Josef Bacik, Steven Rostedt, Linus Torvalds,
	Alexander Viro, spice-devel, David Sterba, Rodrigo Vivi,
	Dietmar Eggemann, linux-arm-kernel, Jani Nikula, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, linux-mm,
	Vineet Gupta, linux-mips, Christian Koenig, Benjamin LaHaise,
	Daniel Vetter, linux-fsdevel, Andrew Morton, linuxppc-dev,
	David S. Miller, linux-btrfs, Greentime Hu
In-Reply-To: <20201103095859.038791330@linutronix.de>

Instead of storing the map per CPU, provide and use per-task storage. That
prepares for local kmaps which are preemptible.

The context switch code is preparatory and not yet in use because
kmap_atomic() runs with preemption disabled. Will be made usable in the
next step.

The context switch logic is safe even when an interrupt happens after
clearing or before restoring the kmaps. The kmap index in task struct is
not modified so any nesting kmap in an interrupt will use unused indices
and on return the counter is the same as before.

Also add an assert into the return to user space code. Going back to user
space with an active kmap local is a no-no.
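
The bookkeeping described above can be sketched in userspace C as follows.
This is an illustration under stated assumptions, not the kernel
implementation: cpu_slots[] stands in for the per-CPU fixmap PTEs, pteval is
a plain unsigned long, KM_INCR and KM_MAX_IDX have made-up values, and all
sketch_*() names are hypothetical. The point it shows is that scheduling out
and back in clears and restores the slots without touching the task's index.

```c
#include <assert.h>

#define KM_INCR    1	/* assumed value for the non-debug case */
#define KM_MAX_IDX 16	/* assumed size, for illustration only */

/* Modeled on struct kmap_ctrl from the patch; pte_t replaced by unsigned long. */
struct kmap_ctrl {
	int idx;
	unsigned long pteval[KM_MAX_IDX];
};

/* Stand-in for the per-CPU fixmap slots that the real code maps PTEs into. */
static unsigned long cpu_slots[KM_MAX_IDX];

/* Push a mapping: bump the task index, then record the value in both places. */
static int sketch_map(struct kmap_ctrl *c, unsigned long pteval)
{
	int slot;

	c->idx += KM_INCR;
	assert(c->idx < KM_MAX_IDX);	/* plays the role of BUG_ON() */
	slot = c->idx - 1;
	cpu_slots[slot] = pteval;
	c->pteval[slot] = pteval;
	return slot;
}

/* Pop the most recent mapping. */
static void sketch_unmap(struct kmap_ctrl *c)
{
	int slot;

	assert(c->idx > 0);
	slot = c->idx - 1;
	cpu_slots[slot] = 0;
	c->pteval[slot] = 0;
	c->idx -= KM_INCR;
}

/* Context switch out: clear the CPU slots, leave the task state untouched. */
static void sketch_sched_out(const struct kmap_ctrl *c)
{
	for (int i = 0; i < c->idx; i++)
		cpu_slots[i] = 0;
}

/* Context switch in: replay the saved values into the CPU slots. */
static void sketch_sched_in(const struct kmap_ctrl *c)
{
	for (int i = 0; i < c->idx; i++)
		cpu_slots[i] = c->pteval[i];
}
```

Because the index lives in the task and is never modified by the switch
helpers, a nested map that happens between sched-out and sched-in would use
the next unused slot and pop it again, leaving the counter as it was.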

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
V4: Use the version which actually compiles and works
V3: Handle the debug case correctly
---
 include/linux/highmem-internal.h |   10 +++
 include/linux/sched.h            |    9 +++
 kernel/entry/common.c            |    2 
 kernel/fork.c                    |    1 
 kernel/sched/core.c              |   18 +++++++
 mm/highmem.c                     |   99 +++++++++++++++++++++++++++++++++++----
 6 files changed, 129 insertions(+), 10 deletions(-)

--- a/include/linux/highmem-internal.h
+++ b/include/linux/highmem-internal.h
@@ -9,6 +9,16 @@
 void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot);
 void *__kmap_local_page_prot(struct page *page, pgprot_t prot);
 void kunmap_local_indexed(void *vaddr);
+void kmap_local_fork(struct task_struct *tsk);
+void __kmap_local_sched_out(void);
+void __kmap_local_sched_in(void);
+static inline void kmap_assert_nomap(void)
+{
+	DEBUG_LOCKS_WARN_ON(current->kmap_ctrl.idx);
+}
+#else
+static inline void kmap_local_fork(struct task_struct *tsk) { }
+static inline void kmap_assert_nomap(void) { }
 #endif
 
 #ifdef CONFIG_HIGHMEM
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -34,6 +34,7 @@
 #include <linux/rseq.h>
 #include <linux/seqlock.h>
 #include <linux/kcsan.h>
+#include <asm/kmap_size.h>
 
 /* task_struct member predeclarations (sorted alphabetically): */
 struct audit_context;
@@ -629,6 +630,13 @@ struct wake_q_node {
 	struct wake_q_node *next;
 };
 
+struct kmap_ctrl {
+#ifdef CONFIG_KMAP_LOCAL
+	int				idx;
+	pte_t				pteval[KM_MAX_IDX];
+#endif
+};
+
 struct task_struct {
 #ifdef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -1294,6 +1302,7 @@ struct task_struct {
 	unsigned int			sequential_io;
 	unsigned int			sequential_io_avg;
 #endif
+	struct kmap_ctrl		kmap_ctrl;
 #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
 	unsigned long			task_state_change;
 #endif
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -2,6 +2,7 @@
 
 #include <linux/context_tracking.h>
 #include <linux/entry-common.h>
+#include <linux/highmem.h>
 #include <linux/livepatch.h>
 #include <linux/audit.h>
 
@@ -194,6 +195,7 @@ static void exit_to_user_mode_prepare(st
 
 	/* Ensure that the address limit is intact and no locks are held */
 	addr_limit_user_check();
+	kmap_assert_nomap();
 	lockdep_assert_irqs_disabled();
 	lockdep_sys_exit();
 }
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -930,6 +930,7 @@ static struct task_struct *dup_task_stru
 	account_kernel_stack(tsk, 1);
 
 	kcov_task_init(tsk);
+	kmap_local_fork(tsk);
 
 #ifdef CONFIG_FAULT_INJECTION
 	tsk->fail_nth = 0;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4053,6 +4053,22 @@ static inline void finish_lock_switch(st
 # define finish_arch_post_lock_switch()	do { } while (0)
 #endif
 
+static inline void kmap_local_sched_out(void)
+{
+#ifdef CONFIG_KMAP_LOCAL
+	if (unlikely(current->kmap_ctrl.idx))
+		__kmap_local_sched_out();
+#endif
+}
+
+static inline void kmap_local_sched_in(void)
+{
+#ifdef CONFIG_KMAP_LOCAL
+	if (unlikely(current->kmap_ctrl.idx))
+		__kmap_local_sched_in();
+#endif
+}
+
 /**
  * prepare_task_switch - prepare to switch tasks
  * @rq: the runqueue preparing to switch
@@ -4075,6 +4091,7 @@ prepare_task_switch(struct rq *rq, struc
 	perf_event_task_sched_out(prev, next);
 	rseq_preempt(prev);
 	fire_sched_out_preempt_notifiers(prev, next);
+	kmap_local_sched_out();
 	prepare_task(next);
 	prepare_arch_switch(next);
 }
@@ -4141,6 +4158,7 @@ static struct rq *finish_task_switch(str
 	finish_lock_switch(rq);
 	finish_arch_post_lock_switch();
 	kcov_finish_switch(current);
+	kmap_local_sched_in();
 
 	fire_sched_in_preempt_notifiers(current);
 	/*
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -365,8 +365,6 @@ EXPORT_SYMBOL(kunmap_high);
 
 #include <asm/kmap_size.h>
 
-static DEFINE_PER_CPU(int, __kmap_local_idx);
-
 /*
  * With DEBUG_HIGHMEM the stack depth is doubled and every second
  * slot is unused which acts as a guard page
@@ -379,23 +377,21 @@ static DEFINE_PER_CPU(int, __kmap_local_
 
 static inline int kmap_local_idx_push(void)
 {
-	int idx = __this_cpu_add_return(__kmap_local_idx, KM_INCR) - 1;
-
 	WARN_ON_ONCE(in_irq() && !irqs_disabled());
-	BUG_ON(idx >= KM_MAX_IDX);
-	return idx;
+	current->kmap_ctrl.idx += KM_INCR;
+	BUG_ON(current->kmap_ctrl.idx >= KM_MAX_IDX);
+	return current->kmap_ctrl.idx - 1;
 }
 
 static inline int kmap_local_idx(void)
 {
-	return __this_cpu_read(__kmap_local_idx) - 1;
+	return current->kmap_ctrl.idx - 1;
 }
 
 static inline void kmap_local_idx_pop(void)
 {
-	int idx = __this_cpu_sub_return(__kmap_local_idx, KM_INCR);
-
-	BUG_ON(idx < 0);
+	current->kmap_ctrl.idx -= KM_INCR;
+	BUG_ON(current->kmap_ctrl.idx < 0);
 }
 
 #ifndef arch_kmap_local_post_map
@@ -461,6 +457,7 @@ void *__kmap_local_pfn_prot(unsigned lon
 	pteval = pfn_pte(pfn, prot);
 	set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval);
 	arch_kmap_local_post_map(vaddr, pteval);
+	current->kmap_ctrl.pteval[kmap_local_idx()] = pteval;
 	preempt_enable();
 
 	return (void *)vaddr;
@@ -505,10 +502,92 @@ void kunmap_local_indexed(void *vaddr)
 	arch_kmap_local_pre_unmap(addr);
 	pte_clear(&init_mm, addr, kmap_pte - idx);
 	arch_kmap_local_post_unmap(addr);
+	current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0);
 	kmap_local_idx_pop();
 	preempt_enable();
 }
 EXPORT_SYMBOL(kunmap_local_indexed);
+
+/*
+ * Invoked before switch_to(). This is safe even when during or after
+ * clearing the maps an interrupt which needs a kmap_local happens because
+ * the task::kmap_ctrl.idx is not modified by the unmapping code so a
+ * nested kmap_local will use the next unused index and restore the index
+ * on unmap. The already cleared kmaps of the outgoing task are irrelevant
+ * because the interrupt context does not know about them. The same applies
+ * when scheduling back in for an interrupt which happens before the
+ * restore is complete.
+ */
+void __kmap_local_sched_out(void)
+{
+	struct task_struct *tsk = current;
+	pte_t *kmap_pte = kmap_get_pte();
+	int i;
+
+	/* Clear kmaps */
+	for (i = 0; i < tsk->kmap_ctrl.idx; i++) {
+		pte_t pteval = tsk->kmap_ctrl.pteval[i];
+		unsigned long addr;
+		int idx;
+
+		/* With debug all even slots are unmapped and act as guard */
+		if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) {
+			WARN_ON_ONCE(!pte_none(pteval));
+			continue;
+		}
+		if (WARN_ON_ONCE(pte_none(pteval)))
+			continue;
+
+		/*
+		 * This is a horrible hack for XTENSA to calculate the
+		 * coloured PTE index. Uses the PFN encoded into the pteval
+		 * and the map index calculation because the actual mapped
+		 * virtual address is not stored in task::kmap_ctrl.
+		 * For any sane architecture this is optimized out.
+		 */
+		idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
+
+		addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+		arch_kmap_local_pre_unmap(addr);
+		pte_clear(&init_mm, addr, kmap_pte - idx);
+		arch_kmap_local_post_unmap(addr);
+	}
+}
+
+void __kmap_local_sched_in(void)
+{
+	struct task_struct *tsk = current;
+	pte_t *kmap_pte = kmap_get_pte();
+	int i;
+
+	/* Restore kmaps */
+	for (i = 0; i < tsk->kmap_ctrl.idx; i++) {
+		pte_t pteval = tsk->kmap_ctrl.pteval[i];
+		unsigned long addr;
+		int idx;
+
+		/* With debug all even slots are unmapped and act as guard */
+		if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) {
+			WARN_ON_ONCE(!pte_none(pteval));
+			continue;
+		}
+		if (WARN_ON_ONCE(pte_none(pteval)))
+			continue;
+
+		/* See comment in __kmap_local_sched_out() */
+		idx = arch_kmap_local_map_idx(i, pte_pfn(pteval));
+		addr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
+		set_pte_at(&init_mm, addr, kmap_pte - idx, pteval);
+		arch_kmap_local_post_map(addr, pteval);
+	}
+}
+
+void kmap_local_fork(struct task_struct *tsk)
+{
+	if (WARN_ON_ONCE(tsk->kmap_ctrl.idx))
+		memset(&tsk->kmap_ctrl, 0, sizeof(tsk->kmap_ctrl));
+}
+
 #endif
 
 #if defined(HASHED_PAGE_VIRTUAL)

^ permalink raw reply

* Re: [patch V3 24/37] sched: highmem: Store local kmaps in task struct
From: Thomas Gleixner @ 2020-11-03 13:49 UTC (permalink / raw)
  To: LKML
  Cc: Juri Lelli, linux-aio, Peter Zijlstra, Sebastian Andrzej Siewior,
	Joonas Lahtinen, dri-devel, virtualization, Ben Segall,
	Chris Mason, Huang Rui, Paul Mackerras, Gerd Hoffmann,
	Daniel Bristot de Oliveira, sparclinux, Vincent Chen,
	Christoph Hellwig, Vincent Guittot, Paul McKenney, Max Filippov,
	x86, Russell King, linux-csky, Ingo Molnar, David Airlie,
	VMware Graphics, Mel Gorman, nouveau, Dave Airlie, linux-snps-arc,
	Ben Skeggs, linux-xtensa, Arnd Bergmann, intel-gfx,
	Roland Scheidegger, Josef Bacik, Steven Rostedt, Linus Torvalds,
	Alexander Viro, spice-devel, David Sterba, Rodrigo Vivi,
	Dietmar Eggemann, linux-arm-kernel, Jani Nikula, Chris Zankel,
	Michal Simek, Thomas Bogendoerfer, Nick Hu, linux-mm,
	Vineet Gupta, linux-mips, Christian Koenig, Benjamin LaHaise,
	Daniel Vetter, linux-fsdevel, Andrew Morton, linuxppc-dev,
	David S. Miller, linux-btrfs, Greentime Hu
In-Reply-To: <20201103095859.038791330@linutronix.de>

On Tue, Nov 03 2020 at 10:27, Thomas Gleixner wrote:
> +struct kmap_ctrl {
> +#ifdef CONFIG_KMAP_LOCAL
> +	int				idx;
> +	pte_t				pteval[KM_TYPE_NR];

I'm a moron. Fixed it on the test machine ...

^ permalink raw reply

* [PATCH seccomp 6/8] sh: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:43 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for sh.
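
As a rough userspace illustration of the mapping described above (not part of
the patch), the native audit architecture value for sh can be composed from
the constants in the uapi header <linux/audit.h>. The helper name
sh_seccomp_arch() is hypothetical, and the sketch assumes Linux uapi headers
and a C11 compiler are available.

```c
#include <assert.h>
#include <linux/audit.h>	/* AUDIT_ARCH_SH, AUDIT_ARCH_SHEL, __AUDIT_ARCH_LE */

/* The little-endian variant is the base value with the LE convention bit set. */
static_assert((AUDIT_ARCH_SH | __AUDIT_ARCH_LE) == AUDIT_ARCH_SHEL,
	      "sh LE audit arch is AUDIT_ARCH_SH plus the LE bit");

/*
 * Hypothetical runtime equivalent of the compile-time selection in the
 * patch: fold in the LE bit only for little-endian configurations.
 */
static unsigned int sh_seccomp_arch(int little_endian)
{
	unsigned int le_bit = little_endian ? __AUDIT_ARCH_LE : 0;

	return AUDIT_ARCH_SH | le_bit;
}
```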

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 arch/sh/include/asm/seccomp.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/sh/include/asm/seccomp.h b/arch/sh/include/asm/seccomp.h
index 54111e4d32b8..b8d169292a34 100644
--- a/arch/sh/include/asm/seccomp.h
+++ b/arch/sh/include/asm/seccomp.h
@@ -8,4 +8,14 @@
 #define __NR_seccomp_exit __NR_exit
 #define __NR_seccomp_sigreturn __NR_rt_sigreturn
 
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define __SECCOMP_ARCH_LE_BIT		__AUDIT_ARCH_LE
+#else
+#define __SECCOMP_ARCH_LE_BIT		0
+#endif
+
+#define SECCOMP_ARCH_NATIVE		(AUDIT_ARCH_SH | __SECCOMP_ARCH_LE_BIT)
+#define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+#define SECCOMP_ARCH_NATIVE_NAME	"sh"
+
 #endif /* __ASM_SECCOMP_H */
-- 
2.29.2


^ permalink raw reply related

* [PATCH seccomp 8/8] seccomp/cache: Report cache data through /proc/pid/seccomp_cache
From: YiFei Zhu @ 2020-11-03 13:43 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

Currently the kernel does not provide an infrastructure to translate
architecture numbers to a human-readable name. Translating syscall
numbers to syscall names is possible through FTRACE_SYSCALL
infrastructure but it does not provide support for compat syscalls.

This will create a file for each PID as /proc/pid/seccomp_cache.
The file will be empty when no seccomp filters are loaded, or be
in the format of:
<arch name> <decimal syscall number> <ALLOW | FILTER>
where ALLOW means the cache is guaranteed to allow the syscall,
and FILTER means the cache will pass the syscall to the BPF filter.

For the docker default profile on x86_64 it looks like:
x86_64 0 ALLOW
x86_64 1 ALLOW
x86_64 2 ALLOW
x86_64 3 ALLOW
[...]
x86_64 132 ALLOW
x86_64 133 ALLOW
x86_64 134 FILTER
x86_64 135 FILTER
x86_64 136 FILTER
x86_64 137 ALLOW
x86_64 138 ALLOW
x86_64 139 FILTER
x86_64 140 ALLOW
x86_64 141 ALLOW
[...]

This file is guarded by CONFIG_SECCOMP_CACHE_DEBUG with a default
of N because I think certain users of seccomp might not want the
application to know which syscalls are definitely usable. For
the same reason, it is also guarded by CAP_SYS_ADMIN.
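
For illustration only, the line format described above can be reproduced in
userspace with a small sketch. The sketch_*() helpers are hypothetical and
only imitate the "<arch name> <decimal syscall number> <ALLOW | FILTER>"
layout from a plain bitmap; they are not the kernel's seq_file code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Userspace stand-in for test_bit(): is syscall number nr marked as cached? */
static bool sketch_test_bit(size_t nr, const unsigned long *bitmap)
{
	return (bitmap[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}

/* Format one line; returns the snprintf() result. */
static int sketch_cache_line(char *buf, size_t len, const char *arch,
			     size_t nr, const unsigned long *bitmap)
{
	const char *status;

	assert(buf != NULL && bitmap != NULL);
	status = sketch_test_bit(nr, bitmap) ? "ALLOW" : "FILTER";
	return snprintf(buf, len, "%s %zu %s\n", arch, nr, status);
}

/* Dump nbits entries, one line per syscall number. */
static void sketch_dump_cache(FILE *out, const char *arch,
			      const unsigned long *bitmap, size_t nbits)
{
	char line[64];

	for (size_t nr = 0; nr < nbits; nr++) {
		sketch_cache_line(line, sizeof(line), arch, nr, bitmap);
		fputs(line, out);
	}
}
```

A consumer parsing the real file should still treat the layout as unstable,
since the Kconfig help text in this patch says the format is subject to
change.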

Suggested-by: Jann Horn <jannh@google.com>
Link: https://lore.kernel.org/lkml/CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com/
Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 arch/Kconfig            | 15 +++++++++++
 fs/proc/base.c          |  6 +++++
 include/linux/seccomp.h |  7 +++++
 kernel/seccomp.c        | 59 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+)

diff --git a/arch/Kconfig b/arch/Kconfig
index 56b6ccc0e32d..6e2eb7171da0 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -514,6 +514,21 @@ config SECCOMP_FILTER
 
 	  See Documentation/userspace-api/seccomp_filter.rst for details.
 
+config SECCOMP_CACHE_DEBUG
+	bool "Show seccomp filter cache status in /proc/pid/seccomp_cache"
+	depends on SECCOMP
+	depends on SECCOMP_FILTER && !HAVE_SPARSE_SYSCALL_NR
+	depends on PROC_FS
+	help
+	  This enables the /proc/pid/seccomp_cache interface to monitor
+	  seccomp cache data. The file format is subject to change. Reading
+	  the file requires CAP_SYS_ADMIN.
+
+	  This option is for debugging only. Enabling presents the risk that
+	  an adversary may be able to infer the seccomp filter logic.
+
+	  If unsure, say N.
+
 config HAVE_ARCH_STACKLEAK
 	bool
 	help
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 0f707003dda5..d652f9dbaecc 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3261,6 +3261,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 #ifdef CONFIG_PROC_PID_ARCH_STATUS
 	ONE("arch_status", S_IRUGO, proc_pid_arch_status),
 #endif
+#ifdef CONFIG_SECCOMP_CACHE_DEBUG
+	ONE("seccomp_cache", S_IRUSR, proc_pid_seccomp_cache),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
@@ -3590,6 +3593,9 @@ static const struct pid_entry tid_base_stuff[] = {
 #ifdef CONFIG_PROC_PID_ARCH_STATUS
 	ONE("arch_status", S_IRUGO, proc_pid_arch_status),
 #endif
+#ifdef CONFIG_SECCOMP_CACHE_DEBUG
+	ONE("seccomp_cache", S_IRUSR, proc_pid_seccomp_cache),
+#endif
 };
 
 static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx)
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 02aef2844c38..76963ec4641a 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -121,4 +121,11 @@ static inline long seccomp_get_metadata(struct task_struct *task,
 	return -EINVAL;
 }
 #endif /* CONFIG_SECCOMP_FILTER && CONFIG_CHECKPOINT_RESTORE */
+
+#ifdef CONFIG_SECCOMP_CACHE_DEBUG
+struct seq_file;
+
+int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns,
+			   struct pid *pid, struct task_struct *task);
+#endif
 #endif /* _LINUX_SECCOMP_H */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index d8cf468dbe1e..76f524e320b1 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -553,6 +553,9 @@ void seccomp_filter_release(struct task_struct *tsk)
 {
 	struct seccomp_filter *orig = tsk->seccomp.filter;
 
+	/* We are effectively holding the siglock by not having any sighand. */
+	WARN_ON(tsk->sighand != NULL);
+
 	/* Detach task from its filter tree. */
 	tsk->seccomp.filter = NULL;
 	__seccomp_filter_release(orig);
@@ -2335,3 +2338,59 @@ static int __init seccomp_sysctl_init(void)
 device_initcall(seccomp_sysctl_init)
 
 #endif /* CONFIG_SYSCTL */
+
+#ifdef CONFIG_SECCOMP_CACHE_DEBUG
+/* Currently CONFIG_SECCOMP_CACHE_DEBUG implies SECCOMP_ARCH_NATIVE */
+static void proc_pid_seccomp_cache_arch(struct seq_file *m, const char *name,
+					const void *bitmap, size_t bitmap_size)
+{
+	int nr;
+
+	for (nr = 0; nr < bitmap_size; nr++) {
+		bool cached = test_bit(nr, bitmap);
+		char *status = cached ? "ALLOW" : "FILTER";
+
+		seq_printf(m, "%s %d %s\n", name, nr, status);
+	}
+}
+
+int proc_pid_seccomp_cache(struct seq_file *m, struct pid_namespace *ns,
+			   struct pid *pid, struct task_struct *task)
+{
+	struct seccomp_filter *f;
+	unsigned long flags;
+
+	/*
+	 * We don't want some sandboxed process to know what their seccomp
+	 * filters consist of.
+	 */
+	if (!file_ns_capable(m->file, &init_user_ns, CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (!lock_task_sighand(task, &flags))
+		return -ESRCH;
+
+	f = READ_ONCE(task->seccomp.filter);
+	if (!f) {
+		unlock_task_sighand(task, &flags);
+		return 0;
+	}
+
+	/* prevent filter from being freed while we are printing it */
+	__get_seccomp_filter(f);
+	unlock_task_sighand(task, &flags);
+
+	proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_NATIVE_NAME,
+				    f->cache.allow_native,
+				    SECCOMP_ARCH_NATIVE_NR);
+
+#ifdef SECCOMP_ARCH_COMPAT
+	proc_pid_seccomp_cache_arch(m, SECCOMP_ARCH_COMPAT_NAME,
+				    f->cache.allow_compat,
+				    SECCOMP_ARCH_COMPAT_NR);
+#endif /* SECCOMP_ARCH_COMPAT */
+
+	__put_seccomp_filter(f);
+	return 0;
+}
+#endif /* CONFIG_SECCOMP_CACHE_DEBUG */
-- 
2.29.2


^ permalink raw reply related

* [PATCH seccomp 7/8] xtensa: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:43 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for xtensa.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 arch/xtensa/include/asm/Kbuild    |  1 -
 arch/xtensa/include/asm/seccomp.h | 11 +++++++++++
 2 files changed, 11 insertions(+), 1 deletion(-)
 create mode 100644 arch/xtensa/include/asm/seccomp.h

diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index c59c42a1221a..9718e9593564 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -7,5 +7,4 @@ generic-y += mcs_spinlock.h
 generic-y += param.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
-generic-y += seccomp.h
 generic-y += user.h
diff --git a/arch/xtensa/include/asm/seccomp.h b/arch/xtensa/include/asm/seccomp.h
new file mode 100644
index 000000000000..f1cb6b0a9e1f
--- /dev/null
+++ b/arch/xtensa/include/asm/seccomp.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include <asm-generic/seccomp.h>
+
+#define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_XTENSA
+#define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+#define SECCOMP_ARCH_NATIVE_NAME	"xtensa"
+
+#endif /* _ASM_SECCOMP_H */
-- 
2.29.2


^ permalink raw reply related

* [PATCH seccomp 4/8] riscv: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:43 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for riscv.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 arch/riscv/include/asm/seccomp.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/riscv/include/asm/seccomp.h b/arch/riscv/include/asm/seccomp.h
index bf7744ee3b3d..c7ee6a3507be 100644
--- a/arch/riscv/include/asm/seccomp.h
+++ b/arch/riscv/include/asm/seccomp.h
@@ -7,4 +7,14 @@
 
 #include <asm-generic/seccomp.h>
 
+#ifdef CONFIG_64BIT
+# define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_RISCV64
+# define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME	"riscv64"
+#else /* !CONFIG_64BIT */
+# define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_RISCV32
+# define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME	"riscv32"
+#endif
+
 #endif /* _ASM_SECCOMP_H */
-- 
2.29.2


^ permalink raw reply related

* [PATCH seccomp 5/8] s390: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:43 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for s390.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
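Note for readers (not part of the commit): a small sketch of how the
NATIVE/COMPAT pairs might be consumed, here to turn an audit arch value
into a printable name. ex_arch_name() is a hypothetical helper, the
AUDIT_ARCH_* values are stand-ins, and CONFIG_COMPAT is defined by hand
so the compat branch is compiled in.

```c
#include <stddef.h>

/* Stand-ins for kernel definitions; the values are illustrative only. */
#define CONFIG_COMPAT			1
#define AUDIT_ARCH_S390			0x00000016u
#define AUDIT_ARCH_S390X		0x80000016u

/* Subset of the macros added by the patch. */
#define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_S390X
#define SECCOMP_ARCH_NATIVE_NAME	"s390x"
#ifdef CONFIG_COMPAT
# define SECCOMP_ARCH_COMPAT		AUDIT_ARCH_S390
# define SECCOMP_ARCH_COMPAT_NAME	"s390"
#endif

/* Map an audit arch value to its name; NULL if the arch is not tracked. */
static const char *ex_arch_name(unsigned int arch)
{
	if (arch == SECCOMP_ARCH_NATIVE)
		return SECCOMP_ARCH_NATIVE_NAME;
#ifdef SECCOMP_ARCH_COMPAT
	if (arch == SECCOMP_ARCH_COMPAT)
		return SECCOMP_ARCH_COMPAT_NAME;
#endif
	return NULL;
}
```

Keying the compat branch on SECCOMP_ARCH_COMPAT rather than on
CONFIG_COMPAT keeps the consumer independent of how each architecture
decides whether a compat ABI exists.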
 arch/s390/include/asm/seccomp.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/s390/include/asm/seccomp.h b/arch/s390/include/asm/seccomp.h
index 795bbe0d7ca6..71d46f0ba97b 100644
--- a/arch/s390/include/asm/seccomp.h
+++ b/arch/s390/include/asm/seccomp.h
@@ -16,4 +16,13 @@
 
 #include <asm-generic/seccomp.h>
 
+#define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_S390X
+#define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+#define SECCOMP_ARCH_NATIVE_NAME	"s390x"
+#ifdef CONFIG_COMPAT
+# define SECCOMP_ARCH_COMPAT		AUDIT_ARCH_S390
+# define SECCOMP_ARCH_COMPAT_NR		NR_syscalls
+# define SECCOMP_ARCH_COMPAT_NAME	"s390"
+#endif
+
 #endif	/* _ASM_S390_SECCOMP_H */
-- 
2.29.2



* [PATCH seccomp 2/8] parisc: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:42 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for parisc.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
 arch/parisc/include/asm/Kbuild    |  1 -
 arch/parisc/include/asm/seccomp.h | 22 ++++++++++++++++++++++
 2 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 arch/parisc/include/asm/seccomp.h

diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index e3ee5c0bfe80..f16c4db80116 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -5,5 +5,4 @@ generated-y += syscall_table_c32.h
 generic-y += kvm_para.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
-generic-y += seccomp.h
 generic-y += user.h
diff --git a/arch/parisc/include/asm/seccomp.h b/arch/parisc/include/asm/seccomp.h
new file mode 100644
index 000000000000..b058b2220322
--- /dev/null
+++ b/arch/parisc/include/asm/seccomp.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include <asm-generic/seccomp.h>
+
+#ifdef CONFIG_64BIT
+# define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_PARISC64
+# define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME	"parisc64"
+# ifdef CONFIG_COMPAT
+#  define SECCOMP_ARCH_COMPAT		AUDIT_ARCH_PARISC
+#  define SECCOMP_ARCH_COMPAT_NR	NR_syscalls
+#  define SECCOMP_ARCH_COMPAT_NAME	"parisc"
+# endif
+#else /* !CONFIG_64BIT */
+# define SECCOMP_ARCH_NATIVE		AUDIT_ARCH_PARISC
+# define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME	"parisc"
+#endif
+
+#endif /* _ASM_SECCOMP_H */
-- 
2.29.2



* [PATCH seccomp 3/8] powerpc: Enable seccomp architecture tracking
From: YiFei Zhu @ 2020-11-03 13:42 UTC (permalink / raw)
  To: containers
  Cc: linux-sh, Tobin Feldman-Fitzthum, Hubertus Franke, Jack Chen,
	linux-riscv, Andrea Arcangeli, linux-s390, YiFei Zhu, linux-csky,
	Tianyin Xu, linux-xtensa, Kees Cook, Jann Horn, Valentin Rothberg,
	Aleksa Sarai, Josep Torrellas, Will Drewry, linux-parisc,
	linux-kernel, Andy Lutomirski, Dimitrios Skarlatos, David Laight,
	Giuseppe Scrivano, linuxppc-dev, Tycho Andersen
In-Reply-To: <cover.1604410035.git.yifeifz2@illinois.edu>

From: YiFei Zhu <yifeifz2@illinois.edu>

To enable seccomp constant action bitmaps, we need to have a static
mapping to the audit architecture and system call table size. Add these
for powerpc.

Signed-off-by: YiFei Zhu <yifeifz2@illinois.edu>
---
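Note for readers (not part of the commit): powerpc is the one
architecture in this series whose audit arch value depends on the
build's endianness, so the little-endian flag is OR-ed into the base
value. The sketch below shows that composition at run time. All EX_*
names and ex_ppc_audit_arch() are hypothetical, and the constants are
stand-ins that should be checked against include/uapi/linux/audit.h
and the ELF machine numbers before being relied on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in constants; verify against the kernel headers. */
#define EX_EM_PPC		20u
#define EX_EM_PPC64		21u
#define EX_AUDIT_ARCH_64BIT	0x80000000u
#define EX_AUDIT_ARCH_LE	0x40000000u

/* Compose an audit arch value from word size and endianness. */
static uint32_t ex_ppc_audit_arch(bool is_64bit, bool little_endian)
{
	uint32_t arch = is_64bit ? (EX_EM_PPC64 | EX_AUDIT_ARCH_64BIT)
				 : EX_EM_PPC;

	if (little_endian)
		arch |= EX_AUDIT_ARCH_LE;
	return arch;
}
```

In the header the same choice is made by the preprocessor, so the
result is a compile-time constant.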
 arch/powerpc/include/asm/seccomp.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/seccomp.h b/arch/powerpc/include/asm/seccomp.h
index 51209f6071c5..3efcc83e9cc6 100644
--- a/arch/powerpc/include/asm/seccomp.h
+++ b/arch/powerpc/include/asm/seccomp.h
@@ -8,4 +8,25 @@
 
 #include <asm-generic/seccomp.h>
 
+#ifdef __LITTLE_ENDIAN__
+#define __SECCOMP_ARCH_LE		__AUDIT_ARCH_LE
+#else
+#define __SECCOMP_ARCH_LE		0
+#endif
+
+#ifdef CONFIG_PPC64
+# define SECCOMP_ARCH_NATIVE		(AUDIT_ARCH_PPC64 | __SECCOMP_ARCH_LE)
+# define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME	"ppc64"
+# ifdef CONFIG_COMPAT
+#  define SECCOMP_ARCH_COMPAT		(AUDIT_ARCH_PPC | __SECCOMP_ARCH_LE)
+#  define SECCOMP_ARCH_COMPAT_NR	NR_syscalls
+#  define SECCOMP_ARCH_COMPAT_NAME	"powerpc"
+# endif
+#else /* !CONFIG_PPC64 */
+# define SECCOMP_ARCH_NATIVE		(AUDIT_ARCH_PPC | __SECCOMP_ARCH_LE)
+# define SECCOMP_ARCH_NATIVE_NR		NR_syscalls
+# define SECCOMP_ARCH_NATIVE_NAME	"powerpc"
+#endif
+
 #endif	/* _ASM_POWERPC_SECCOMP_H */
-- 
2.29.2

