* Re: [RFC v1 0/2] Plumbing to support multiple secure memory backends.
From: Christoph Hellwig @ 2020-10-14 6:31 UTC (permalink / raw)
To: Ram Pai; +Cc: bharata, linuxppc-dev, kvm-ppc, farosas
In-Reply-To: <1602487663-7321-1-git-send-email-linuxram@us.ibm.com>
Please don't add an abstraction without a second implementation.
Once we have the implementation we can consider the tradeoffs. E.g.
if expensive indirect function calls are really needed vs simple
branches.
^ permalink raw reply
* Re: [PATCH 05/14] fs: don't allow kernel reads and writes without iter ops
From: Christoph Hellwig @ 2020-10-14 5:51 UTC (permalink / raw)
To: Alexander Viro
Cc: linux-arch, linuxppc-dev, Kees Cook, the arch/x86 maintainers,
Linux Kernel Mailing List, Alexey Dobriyan, Eric Biggers,
Luis Chamberlain, Al Viro, linux-fsdevel, Linus Torvalds,
Christoph Hellwig
In-Reply-To: <20201010015524.GB101464@shell-el7.hosts.prod.upshift.rdu2.redhat.com>
On Sat, Oct 10, 2020 at 01:55:24AM +0000, Alexander Viro wrote:
> FWIW, I hadn't pushed that branch out (or merged it into #for-next yet);
> for one thing, uml part (mconsole) is simply broken, for another...
> IMO ##5--8 are asking for kernel_pread() and if you look at binfmt_elf.c,
> you'll see elf_read() being pretty much that. acct.c, keys and usermode
> parts are asking for kernel_pwrite() as well.
>
> I've got stuck looking through the drivers/target stuff - it would've
> been another kernel_pwrite() candidate, but it smells like its use of
> filp_open() is really asking for trouble, starting with symlink attacks.
> Not sure - I'm not familiar with the area, but...
Can you just pull in the minimal fix so that the branch gets fixed
for this merge window? All the cleanups can come later.
* Re: [PATCH] powerpc/perf: fix Threshold Event CounterMultiplier width for P10
From: Madhavan Srinivasan @ 2020-10-14 5:50 UTC (permalink / raw)
To: Michal Suchánek; +Cc: atrajeev, linuxppc-dev
In-Reply-To: <20201013155842.GY29778@kitsune.suse.cz>
On 10/13/20 9:28 PM, Michal Suchánek wrote:
> On Tue, Oct 13, 2020 at 06:27:05PM +0530, Madhavan Srinivasan wrote:
>> On 10/12/20 4:59 PM, Michal Suchánek wrote:
>>> Hello,
>>>
>>> On Mon, Oct 12, 2020 at 04:01:28PM +0530, Madhavan Srinivasan wrote:
>>>> Power9 and isa v3.1 has 7bit mantissa field for Threshold Event Counter
>>> ^^^ Shouldn't this be 3.0?
>> My bad, What I meant was
>>
>> Power9, ISA v3.0 and ISA v3.1 define a 7 bit mantissa field for Threshold
>> Event Counter Multiplier(TECM).
> I am really confused.
>
> The following text and the code suggests that the mantissa is 8bit on
> POWER10 and ISA v3.1.
Ok got it. Will fix the CPU_FTR_ARCH_31 check.
Thanks for review
Maddy
>
> Thanks
>
> Michal
>> Maddy
>>
>>>> Multiplier (TECM). TECM is part of Monitor Mode Control Register A (MMCRA).
>>>> This field along with Threshold Event Counter Exponent (TECE) is used to
>>>> get the threshold counter value. In Power10, the width of the TECM field is
>>>> increased to 8 bits. This patch modifies the MMCRA[TECM] extraction
>>>> macro to handle this change.
>>>>
>>>> Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
>>>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>>>> ---
>>>> arch/powerpc/perf/isa207-common.c | 3 +++
>>>> arch/powerpc/perf/isa207-common.h | 4 ++++
>>>> 2 files changed, 7 insertions(+)
>>>>
>>>> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
>>>> index 964437adec18..5fe129f02290 100644
>>>> --- a/arch/powerpc/perf/isa207-common.c
>>>> +++ b/arch/powerpc/perf/isa207-common.c
>>>> @@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
>>>> u64 sier = mfspr(SPRN_SIER);
>>>> u64 val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
>>>> + if (cpu_has_feature(CPU_FTR_ARCH_31))
>>>> + mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
>>>> +
>>>> if (val == 0 || val == 7)
>>>> *weight = 0;
>>>> else
>>>> diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
>>>> index 044de65e96b9..71380e854f48 100644
>>>> --- a/arch/powerpc/perf/isa207-common.h
>>>> +++ b/arch/powerpc/perf/isa207-common.h
>>>> @@ -219,6 +219,10 @@
>>>> #define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
>>>> MMCRA_THR_CTR_EXP_MASK)
>>>> +#define P10_MMCRA_THR_CTR_MANT_MASK 0xFFul
>>>> +#define P10_MMCRA_THR_CTR_MANT(v) (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
>>>> + P10_MMCRA_THR_CTR_MANT_MASK)
>>>> +
>>>> /* MMCRA Threshold Compare bit constant for power9 */
>>>> #define p9_MMCRA_THR_CMP_SHIFT 45
>>>> --
>>>> 2.26.2
>>>>
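To make the width difference concrete, here is a minimal Python sketch of what the extraction macros compute. The 0xFF mask is taken from the patch above and the 0x7F mask follows from the 7 bit field width stated in the thread. The shift value of 19 is an assumption used only for illustration (the thread does not show MMCRA_THR_CTR_MANT_SHIFT), and the Python names are hypothetical.

```python
# Illustrative model of the MMCRA[TECM] mantissa extraction.
THR_CTR_MANT_SHIFT = 19        # assumed value, not shown in the thread
THR_CTR_MANT_MASK = 0x7F       # 7 bit field (before Power10)
P10_THR_CTR_MANT_MASK = 0xFF   # 8 bit field, as in the patch


def thr_ctr_mant(mmcra, arch_31):
    """Return the TECM mantissa, using the 8 bit mask on ISA v3.1."""
    mask = P10_THR_CTR_MANT_MASK if arch_31 else THR_CTR_MANT_MASK
    return (mmcra >> THR_CTR_MANT_SHIFT) & mask


# A mantissa with its top (8th) bit set loses that bit under the 7 bit mask.
mmcra = 0xC5 << THR_CTR_MANT_SHIFT
print(hex(thr_ctr_mant(mmcra, arch_31=False)))
print(hex(thr_ctr_mant(mmcra, arch_31=True)))
```

Selecting the mask from the CPU feature mirrors the cpu_has_feature(CPU_FTR_ARCH_31) check in the patch.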
* Re: [PATCH] powerpc/features: Remove CPU_FTR_NODSISRALIGN
From: Aneesh Kumar K.V @ 2020-10-14 3:19 UTC (permalink / raw)
To: Michael Ellerman, Christophe Leroy, Benjamin Herrenschmidt,
Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <87wnzuzb1x.fsf@mpe.ellerman.id.au>
On 10/13/20 3:45 PM, Michael Ellerman wrote:
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 13/10/2020 à 09:23, Aneesh Kumar K.V a écrit :
>>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>>
>>>> CPU_FTR_NODSISRALIGN has not been used since
>>>> commit 31bfdb036f12 ("powerpc: Use instruction emulation
>>>> infrastructure to handle alignment faults")
>>>>
>>>> Remove it.
>>>>
>>>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>>>> ---
>>>> arch/powerpc/include/asm/cputable.h | 22 ++++++++++------------
>>>> arch/powerpc/kernel/dt_cpu_ftrs.c | 8 --------
>>>> arch/powerpc/kernel/prom.c | 2 +-
>>>> 3 files changed, 11 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>> index 1098863e17ee..c598961d9f15 100644
>>>> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
>>>> @@ -273,13 +273,6 @@ static int __init feat_enable_idle_nap(struct dt_cpu_feature *f)
>>>> return 1;
>>>> }
>>>>
>>>> -static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
>>>> -{
>>>> - cur_cpu_spec->cpu_features &= ~CPU_FTR_NODSISRALIGN;
>>>> -
>>>> - return 1;
>>>> -}
>>>> -
>>>> static int __init feat_enable_idle_stop(struct dt_cpu_feature *f)
>>>> {
>>>> u64 lpcr;
>>>> @@ -641,7 +634,6 @@ static struct dt_cpu_feature_match __initdata
>>>> {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
>>>> {"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
>>>> {"idle-nap", feat_enable_idle_nap, 0},
>>>> - {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
>
> Rather than removing it entirely, I'd rather we left a comment, so that
> it's obvious that we are ignoring that feature on purpose, not because
> we forget about it.
>
> eg:
>
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index f204ad79b6b5..45cb7e59bd13 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -640,7 +640,7 @@ static struct dt_cpu_feature_match __initdata
> {"tm-suspend-hypervisor-assist", feat_enable, CPU_FTR_P9_TM_HV_ASSIST},
> {"tm-suspend-xer-so-bug", feat_enable, CPU_FTR_P9_TM_XER_SO_BUG},
> {"idle-nap", feat_enable_idle_nap, 0},
> - {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
> + // "alignment-interrupt-dsisr" ignored
> {"idle-stop", feat_enable_idle_stop, 0},
> {"machine-check-power8", feat_enable_mce_power8, 0},
> {"performance-monitor-power8", feat_enable_pmu_power8, 0},
>
why not do it as
static int __init feat_enable_align_dsisr(struct dt_cpu_feature *f)
{
	/* This feature should not be enabled */
#ifdef DEBUG
	WARN_ON(1);
#endif
	return 1;
}
-aneesh
* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
From: Aneesh Kumar K.V @ 2020-10-14 3:15 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-mm, Anshuman Khandual, linuxppc-dev
In-Reply-To: <20201013135858.f4a7f0c5f3b0a69a2a304cfe@linux-foundation.org>
On 10/14/20 2:28 AM, Andrew Morton wrote:
> On Wed, 2 Sep 2020 17:12:09 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
>
>> This patch series includes fixes for debug_vm_pgtable test code so that
>> they follow page table updates rules correctly. The first two patches introduce
>> changes w.r.t ppc64. The patches are included in this series for completeness. We can
>> merge them via ppc64 tree if required.
>
> Do you think this series is ready to be merged?
Hopefully, except for the RISC-V crash.
>
> Possibly-unresolved issues which I have recorded are
>
> Against
> mm-debug_vm_pgtable-locks-move-non-page-table-modifying-test-together.patch:
>
> https://lkml.kernel.org/r/56830efb-887e-0000-a46e-ae015e5854cd@arm.com
I guess the full series does boot fine on arm.
> https://lkml.kernel.org/r/20200910075752.GC26874@shao2-debian
This should be fixed by
https://ozlabs.org/~akpm/mmots/broken-out/mm-debug_vm_pgtable-avoid-doing-memory-allocation-with-pgtable_t-mapped.patch
>
> Against mm-debug_vm_pgtable-avoid-none-pte-in-pte_clear_test.patch:
>
> https://lkml.kernel.org/r/87zh5wx51b.fsf@linux.ibm.com
Yes, this one we should get fixed. I was hoping someone familiar with
RISC-V PTE update rules would pitch in. IIUC we need to update
RANDOM_ORVALUE similar to how we updated it for s390 and ppc64.
Alternatively we can do this
modified mm/debug_vm_pgtable.c
@@ -548,7 +548,7 @@ static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
pte_t pte = pfn_pte(pfn, prot);
pr_debug("Validating PTE clear\n");
- pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+// pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
set_pte_at(mm, vaddr, ptep, pte);
barrier();
pte_clear(mm, vaddr, ptep);
till we get that feedback from the RISC-V team?
> https://lkml.kernel.org/r/37a9facc-ca36-290f-3748-16c4a7a778fa@arm.com
same as the above.
> https://lkml.kernel.org/r/20201011200258.GA91021@roeck-us.net
>
same as the above.
-aneesh
* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
From: Alexey Kardashevskiy @ 2020-10-14 2:55 UTC (permalink / raw)
To: Cédric Le Goater, Qian Cai, Michael Ellerman
Cc: linuxppc-dev, linux-next, Oliver O'Halloran, linux-kernel,
Stephen Rothwell
In-Reply-To: <fce8ffe1-521c-8344-c7ad-53550e408cdc@kaod.org>
On 23/09/2020 17:06, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
>> On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>>> When a passthrough IO adapter is removed from a pseries machine using
>>> hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>>> guest OS to clear all page table entries related to the adapter. If
>>> some are still present, the RTAS call which isolates the PCI slot
>>> returns error 9001 "valid outstanding translations" and the removal of
>>> the IO adapter fails. This is because when the PHBs are scanned, Linux
>>> maps automatically the INTx interrupts in the Linux interrupt number
>>> space but these are never removed.
>>>
>>> To solve this problem, we introduce a PPC platform specific
>>> pcibios_remove_bus() routine which clears all interrupt mappings when
>>> the bus is removed. This also clears the associated page table entries
>>> of the ESB pages when using XIVE.
>>>
>>> For this purpose, we record the logical interrupt numbers of the
>>> mapped interrupt under the PHB structure and let pcibios_remove_bus()
>>> do the clean up.
>>>
>>> Since some PCI adapters, like GPUs, use the "interrupt-map" property
>>> to describe interrupt mappings other than the legacy INTx interrupts,
>>> we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>>> number of interrupt mappings is computed from the "interrupt-map"
>>> property and the mapping array is allocated accordingly.
>>>
>>> Cc: "Oliver O'Halloran" <oohall@gmail.com>
>>> Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>
>> Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed to
>> this patch.
>>
>> .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>
> OK. The patch is missing a NULL assignment after kfree() and that
> might be the issue.
>
> I did try PHB removal under PowerNV, so I would like to understand
> how we managed to remove twice the PCI bus and possibly reproduce.
> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
How do you remove PHBs exactly? There is no such thing in the powernv
platform; I thought someone had added this and you were fixing it, but no.
PHBs on powernv are created at boot time and there is no way to
remove them; you can only try removing all the bridges.
So what exactly are you doing?
--
Alexey
* [PATCH] selftests/powerpc: Fix eeh-basic.sh exit codes
From: Oliver O'Halloran @ 2020-10-14 2:47 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Oliver O'Halloran
The kselftests test running infrastructure expects tests to finish with an
exit code of 4 if the test decided it should be skipped. Currently
eeh-basic.sh exits with the number of devices that failed to recover, so if
four devices didn't recover we'll report a skip instead of a fail.
Fix this by checking if the failure count is non-zero and reporting success
or failure by returning 0 or 1 respectively. For the cases where we should
actually skip, return 4.
Fixes: 85d86c8aa52e ("selftests/powerpc: Add basic EEH selftest")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
tools/testing/selftests/powerpc/eeh/eeh-basic.sh | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
index 8a8d0f456946..0d783e1065c8 100755
--- a/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
+++ b/tools/testing/selftests/powerpc/eeh/eeh-basic.sh
@@ -1,17 +1,19 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-only
+KSELFTESTS_SKIP=4
+
. ./eeh-functions.sh
if ! eeh_supported ; then
echo "EEH not supported on this system, skipping"
- exit 0;
+ exit $KSELFTESTS_SKIP;
fi
if [ ! -e "/sys/kernel/debug/powerpc/eeh_dev_check" ] && \
[ ! -e "/sys/kernel/debug/powerpc/eeh_dev_break" ] ; then
echo "debugfs EEH testing files are missing. Is debugfs mounted?"
- exit 1;
+ exit $KSELFTESTS_SKIP;
fi
pre_lspci=`mktemp`
@@ -84,4 +86,5 @@ echo "$failed devices failed to recover ($dev_count tested)"
lspci | diff -u $pre_lspci -
rm -f $pre_lspci
-exit $failed
+test "$failed" == 0
+exit $?
--
2.26.2
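The exit code mapping described in the commit message can be sketched in a few lines of Python. This is only a model of the intended behaviour, not the script itself: the function name and parameters are hypothetical, and the two skip conditions from the patch are folded into simple booleans. The values 0, 1 and 4 are the ones named in the commit message.

```python
KSFT_PASS = 0   # test passed
KSFT_FAIL = 1   # test failed
KSFT_SKIP = 4   # test decided it should be skipped


def eeh_basic_exit_code(eeh_supported, debugfs_files_present, failed):
    """Map the test outcome to an exit code; 'failed' is the device count."""
    if not eeh_supported or not debugfs_files_present:
        return KSFT_SKIP
    return KSFT_PASS if failed == 0 else KSFT_FAIL


# Before the fix the script exited with the raw count, so four failed
# devices produced exit code 4, which reads as a skip.
print(eeh_basic_exit_code(True, True, failed=4))
print(eeh_basic_exit_code(True, True, failed=0))
print(eeh_basic_exit_code(False, True, failed=0))
```

Collapsing the count to 0 or 1 keeps the failure count from colliding with the skip code.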
* Re: [PATCH 1/2] powerpc: Fix user data corruption with P9N DD2.1 VSX CI load workaround emulation
From: Michael Ellerman @ 2020-10-14 0:32 UTC (permalink / raw)
To: Michael Neuling; +Cc: mikey, linuxppc-dev
In-Reply-To: <20201013043741.743413-1-mikey@neuling.org>
Michael Neuling <mikey@neuling.org> writes:
> __get_user_atomic_128_aligned() stores to kaddr using stvx which is a
> VMX store instruction, hence kaddr must be 16 byte aligned otherwise
> the store won't occur as expected.
>
> Unfortunately when we call __get_user_atomic_128_aligned() in
> p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
> guaranteed to be 16B aligned. This means that the write to vbuf in
> __get_user_atomic_128_aligned() has the bottom bits of the address
> truncated. This results in other local variables being
> overwritten. Also vbuf will not contain the correct data which results
> in the userspace emulation being wrong and hence user data corruption.
>
> In the past we've been mostly lucky as vbuf has ended up aligned but
> this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
> particular can change the stack arrangement enough that our luck runs
> out.
Below is a script which takes a System.map and vmlinux (or objdump
output) and tries to check if the stack layout is susceptible to the
bug.
cheers
#!/usr/bin/python3

import os
import sys
import re
from subprocess import Popen, PIPE

# eg: c00000000002ea88: ce 49 00 7c stvx v0,0,r9
stvx_pattern = re.compile(r'^c[0-9a-f]{15}:\s+(?:[0-9a-f]{2} ){4}\s+stvx\s+v0,0,(r\d+)\s*')
# eg: c00000000002ea80: 28 00 21 39 addi r9,r1,40
addi_pattern = r'^c[0-9a-f]{15}:\s+(?:[0-9a-f]{2} ){4}\s+addi\s+%s,r1,(\d+)\s*'


def main(args):
    if len(args) != 2:
        print('Usage: %s <objdump|vmlinux> <System.map>' % sys.argv[0])
        return -1

    if os.path.basename(sys.argv[1]).startswith('vmlinu'):
        dump = Popen(['objdump', '-d', sys.argv[1]], stdout=PIPE, encoding='utf-8').stdout
    else:
        dump = open(sys.argv[1])

    syms = read_symbols(sys.argv[2])

    func_lines = extract_func(dump, 'handle_hmi_exception', syms)
    if func_lines is None:
        print("Error: couldn't find handle_hmi_exception in objdump output")
        return -1

    # Find the stvx that stores to the stack buffer
    match = None
    i = 0
    while i < len(func_lines):
        match = stvx_pattern.match(func_lines[i])
        if match:
            break
        i += 1

    if match is None:
        print("Error: couldn't find stvx in handle_hmi_exception")
        return -1

    stvx_reg = match.group(1)
    print('stvx found using register %s:\n%s\n' % (stvx_reg, match.group(0).rstrip()))

    # Walk backwards to the addi that computes the buffer address from r1
    match = None
    i -= 1
    while i > 0:
        pattern = re.compile(addi_pattern % stvx_reg)
        match = pattern.match(func_lines[i])
        if match:
            break
        i -= 1

    if match is None:
        print("Error: couldn't find addi in handle_hmi_exception")
        return -1

    stack_offset = int(match.group(1))
    print('addi found using offset %d:\n%s\n' % (stack_offset, match.group(0).rstrip()))

    if stack_offset & 0xf:
        print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
        print('!! Offset is misaligned - bug present !!')
        print('!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!')
        return 1
    else:
        print('OK - offset is aligned')
        return 0


def extract_func(f, func_name, syms):
    func_addr, func_size = find_symbol_and_size(syms, func_name)
    if func_addr is None:
        # Symbol not present in System.map
        return None

    num_lines = int(func_size / 4)
    pattern = re.compile('^%016x:' % func_addr)

    match = None
    line = f.readline()
    while len(line):
        match = pattern.match(line)
        if match:
            break
        line = f.readline()

    if match is None:
        return None

    lines = []
    for i in range(0, num_lines):
        lines.append(f.readline())

    return lines


def read_symbols(map_path):
    last_function = ''
    last_addr = 0
    lines = open(map_path).readlines()
    addrs = []
    last_addr = 0
    for line in lines:
        tokens = line.split()
        if len(tokens) == 3:
            addr = int(tokens[0], 16)
            sym_type = tokens[1]
            name = tokens[2]
        elif len(tokens) == 2:
            addr = last_addr
            sym_type = tokens[0]
            name = tokens[1]
        else:
            raise Exception("Couldn't grok System.map")

        addrs.append((addr, name, sym_type))
        last_addr = addr

    return addrs


def find_symbol_and_size(symbol_map, name):
    dot_name = '.%s' % name
    saddr = None
    i = 0
    for addr, cur_name, sym_type in symbol_map:
        if cur_name == name or cur_name == dot_name:
            saddr = addr
            break
        i += 1

    if saddr is None:
        return (None, None)

    # Size is the distance to the next symbol in the map
    i += 1
    if i >= len(symbol_map):
        size = -1
    else:
        size = symbol_map[i][0] - saddr

    return (saddr, size)


sys.exit(main(sys.argv[1:]))
* Re: [PATCH 1/2] powerpc: Fix user data corruption with P9N DD2.1 VSX CI load workaround emulation
From: Michael Ellerman @ 2020-10-14 0:13 UTC (permalink / raw)
To: Michael Neuling; +Cc: mikey, linuxppc-dev
In-Reply-To: <20201013043741.743413-1-mikey@neuling.org>
Michael Neuling <mikey@neuling.org> writes:
> __get_user_atomic_128_aligned() stores to kaddr using stvx which is a
> VMX store instruction, hence kaddr must be 16 byte aligned otherwise
> the store won't occur as expected.
>
> Unfortunately when we call __get_user_atomic_128_aligned() in
> p9_hmi_special_emu(), the buffer we pass as kaddr (ie. vbuf) isn't
> guaranteed to be 16B aligned. This means that the write to vbuf in
> __get_user_atomic_128_aligned() has the bottom bits of the address
> truncated. This results in other local variables being
> overwritten. Also vbuf will not contain the correct data which results
> in the userspace emulation being wrong and hence user data corruption.
>
> In the past we've been mostly lucky as vbuf has ended up aligned but
> this is fragile and isn't always true. CONFIG_STACKPROTECTOR in
> particular can change the stack arrangement enough that our luck runs
> out.
Actually I'm yet to find a kernel with CONFIG_STACKPROTECTOR=n that is
vulnerable to the bug.
Turning on STACKPROTECTOR changes the order GCC allocates locals on the
stack, from bottom-up to top-down. That in conjunction with the 8 byte
stack canary means we end up with 8 bytes of space below the locals,
which misaligns vbuf.
But obviously other things can change the stack layout too, so no
guarantees that CONFIG_STACKPROTECTOR=n makes it safe.
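A small sketch of the arithmetic, assuming (as the patch description says) that the store drops the bottom bits of the address, here the low four bits for a 16 byte store. The addresses are made up for illustration and the function names are hypothetical.

```python
def stvx_store_address(kaddr):
    """Address actually written by a 16 byte store: low 4 bits dropped."""
    return kaddr & ~0xF


def misalignment(kaddr):
    """Number of bytes by which the store lands below the intended buffer."""
    return kaddr - stvx_store_address(kaddr)


aligned_vbuf = 0x7000
shifted_vbuf = aligned_vbuf + 8   # e.g. 8 extra bytes below the locals
for kaddr in (aligned_vbuf, shifted_vbuf):
    print(hex(kaddr), hex(stvx_store_address(kaddr)), misalignment(kaddr))
```

With the 8 byte shift the store starts 8 bytes below vbuf, so it overwrites whatever sits there and fills only the lower half of vbuf.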
cheers
* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
From: Michael Ellerman @ 2020-10-13 23:42 UTC (permalink / raw)
To: Qian Cai, Cédric Le Goater
Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
Oliver O'Halloran, linuxppc-dev
In-Reply-To: <90922c43c670e4b55e6cf421be19146333e2ae7b.camel@redhat.com>
Qian Cai <cai@redhat.com> writes:
> On Wed, 2020-09-23 at 09:06 +0200, Cédric Le Goater wrote:
>> On 9/23/20 2:33 AM, Qian Cai wrote:
>> > On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
>> > > When a passthrough IO adapter is removed from a pseries machine using
>> > > hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
>> > > guest OS to clear all page table entries related to the adapter. If
>> > > some are still present, the RTAS call which isolates the PCI slot
>> > > returns error 9001 "valid outstanding translations" and the removal of
>> > > the IO adapter fails. This is because when the PHBs are scanned, Linux
>> > > maps automatically the INTx interrupts in the Linux interrupt number
>> > > space but these are never removed.
>> > >
>> > > To solve this problem, we introduce a PPC platform specific
>> > > pcibios_remove_bus() routine which clears all interrupt mappings when
>> > > the bus is removed. This also clears the associated page table entries
>> > > of the ESB pages when using XIVE.
>> > >
>> > > For this purpose, we record the logical interrupt numbers of the
>> > > mapped interrupt under the PHB structure and let pcibios_remove_bus()
>> > > do the clean up.
>> > >
>> > > Since some PCI adapters, like GPUs, use the "interrupt-map" property
>> > > to describe interrupt mappings other than the legacy INTx interrupts,
>> > > we can not restrict the size of the mapping array to PCI_NUM_INTX. The
>> > > number of interrupt mappings is computed from the "interrupt-map"
>> > > property and the mapping array is allocated accordingly.
>> > >
>> > > Cc: "Oliver O'Halloran" <oohall@gmail.com>
>> > > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
>> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
>> >
>> > Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed
>> > to
>> > this patch.
>> >
>> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>>
>> OK. The patch is missing a NULL assignment after kfree() and that
>> might be the issue.
>>
>> I did try PHB removal under PowerNV, so I would like to understand
>> how we managed to remove twice the PCI bus and possibly reproduce.
>> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
>
> Any update on this? Maybe Michael or Stephen could drop this for now, so our
> fuzzing could continue to find something else new?
Someone send me a revert?
cheers
* Re: [PATCH v4 00/13] mm/debug_vm_pgtable fixes
From: Andrew Morton @ 2020-10-13 20:58 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: linux-mm, Anshuman Khandual, linuxppc-dev
In-Reply-To: <20200902114222.181353-1-aneesh.kumar@linux.ibm.com>
On Wed, 2 Sep 2020 17:12:09 +0530 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> wrote:
> This patch series includes fixes for debug_vm_pgtable test code so that
> they follow page table updates rules correctly. The first two patches introduce
> changes w.r.t ppc64. The patches are included in this series for completeness. We can
> merge them via ppc64 tree if required.
Do you think this series is ready to be merged?
Possibly-unresolved issues which I have recorded are
Against
mm-debug_vm_pgtable-locks-move-non-page-table-modifying-test-together.patch:
https://lkml.kernel.org/r/56830efb-887e-0000-a46e-ae015e5854cd@arm.com
https://lkml.kernel.org/r/20200910075752.GC26874@shao2-debian
Against mm-debug_vm_pgtable-avoid-none-pte-in-pte_clear_test.patch:
https://lkml.kernel.org/r/87zh5wx51b.fsf@linux.ibm.com
https://lkml.kernel.org/r/37a9facc-ca36-290f-3748-16c4a7a778fa@arm.com
https://lkml.kernel.org/r/20201011200258.GA91021@roeck-us.net
* Re: [PATCH RFC PKS/PMEM 24/58] fs/freevxfs: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-13 20:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, x86, ceph-devel, amd-gfx, io-uring, cluster-devel,
Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu,
linux-afs, linux-um, intel-gfx, ecryptfs, linux-erofs,
reiserfs-devel, linux-block, linux-bcache, Borislav Petkov,
Andy Lutomirski, Dan Williams, Andrew Morton, linux-cachefs,
linux-nfs, linux-ntfs-dev, netdev, kexec, linux-kernel,
linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013112544.GA5249@infradead.org>
On Tue, Oct 13, 2020 at 12:25:44PM +0100, Christoph Hellwig wrote:
> > - kaddr = kmap(pp);
> > + kaddr = kmap_thread(pp);
> > memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
> > - kunmap(pp);
> > + kunmap_thread(pp);
>
> You only Cced me on this particular patch, which means I have absolutely
> no idea what kmap_thread and kunmap_thread actually do, and thus can't
> provide an informed review.
Sorry the list was so big I struggled with who to CC and on which patches.
>
> That being said I think your life would be a lot easier if you add
> helpers for the above code sequence and its counterpart that copies
> to a potential highmem page first, as that hides the implementation
> details from most users.
Matthew Wilcox and Al Viro have suggested similar ideas.
https://lore.kernel.org/lkml/20201013205012.GI2046448@iweiny-DESK2.sc.intel.com/
Ira
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-13 20:50 UTC (permalink / raw)
To: Al Viro
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
amd-gfx list, linux-kselftest, samba-technical, Thomas Gleixner,
drbd-dev, devel, linux-cifs, linux-nilfs, linux-scsi,
linux-nvdimm, linux-rdma, X86 ML, ceph-devel, Matthew Wilcox,
io-uring, cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Dan Williams, bpf,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, Andrew Morton, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013200149.GI3576660@ZenIV.linux.org.uk>
On Tue, Oct 13, 2020 at 09:01:49PM +0100, Al Viro wrote:
> On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
>
> > static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> > {
> > char *vto = kmap_atomic(to);
> >
> > memcpy(vto, vfrom, size);
> > kunmap_atomic(vto);
> > }
> >
> > in linux/highmem.h ?
>
> You mean, like
> static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
> {
> char *from = kmap_atomic(page);
> memcpy(to, from + offset, len);
> kunmap_atomic(from);
> }
>
> static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
> {
> char *to = kmap_atomic(page);
> memcpy(to + offset, from, len);
> kunmap_atomic(to);
> }
>
> static void memzero_page(struct page *page, size_t offset, size_t len)
> {
> char *addr = kmap_atomic(page);
> memset(addr + offset, 0, len);
> kunmap_atomic(addr);
> }
>
> in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and
> highmem.h as location - these make perfect sense regardless of highmem;
> they are normal memory operations with page + offset used instead of
> a pointer...
I was thinking along those lines as well especially because of the direction
this patch set takes kmap().
Thanks for pointing these out to me. How about I lift them to a common header?
But if not highmem.h where?
Ira
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Ira Weiny @ 2020-10-13 20:45 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, X86 ML, ceph-devel, amd-gfx list, io-uring,
cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Dan Williams, bpf,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, Andrew Morton, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013193643.GK20115@casper.infradead.org>
On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
> On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> > On Fri, Oct 9, 2020 at 12:52 PM <ira.weiny@intel.com> wrote:
> > >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the overhead of global PKRS updates use the new kmap_thread() call.
> > >
> > > Cc: Nicolas Pitre <nico@fluxnic.net>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > ---
> > > fs/cramfs/inode.c | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > > index 912308600d39..003c014a42ed 100644
> > > --- a/fs/cramfs/inode.c
> > > +++ b/fs/cramfs/inode.c
> > > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > > struct page *page = pages[i];
> > >
> > > if (page) {
> > > - memcpy(data, kmap(page), PAGE_SIZE);
> > > - kunmap(page);
> > > + memcpy(data, kmap_thread(page), PAGE_SIZE);
> > > + kunmap_thread(page);
> >
> > Why does this need a sleepable kmap? This looks like a textbook
> > kmap_atomic() use case.
>
> There's a lot of code of this form. Could we perhaps have:
>
> static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> {
> char *vto = kmap_atomic(to);
>
> memcpy(vto, vfrom, size);
> kunmap_atomic(vto);
> }
>
> in linux/highmem.h ?
Christoph had the same idea. I'll work on it.
Ira
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Al Viro @ 2020-10-13 20:01 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
linux-kselftest, samba-technical, Weiny, Ira, Dan Williams,
drbd-dev, devel, linux-cifs, linux-nilfs, linux-scsi,
linux-nvdimm, linux-rdma, X86 ML, ceph-devel, amd-gfx list,
io-uring, cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Andrew Morton,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013193643.GK20115@casper.infradead.org>
On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
> static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> {
> char *vto = kmap_atomic(to);
>
> memcpy(vto, vfrom, size);
> kunmap_atomic(vto);
> }
>
> in linux/highmem.h ?
You mean, like
static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
{
char *from = kmap_atomic(page);
memcpy(to, from + offset, len);
kunmap_atomic(from);
}
static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
{
char *to = kmap_atomic(page);
memcpy(to + offset, from, len);
kunmap_atomic(to);
}
static void memzero_page(struct page *page, size_t offset, size_t len)
{
char *addr = kmap_atomic(page);
memset(addr + offset, 0, len);
kunmap_atomic(addr);
}
in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and
highmem.h as location - these make perfect sense regardless of highmem;
they are normal memory operations with page + offset used instead of
a pointer...
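For illustration only, here is a userspace sketch of how such page + offset helpers behave and how the cramfs hunk could read with them. Everything outside the three helper bodies is an assumption made so the example builds on its own: struct page is a hypothetical stand-in that owns its storage, kmap_atomic()/kunmap_atomic() are trivial stubs, PAGE_SIZE is fixed at 4 KiB, the bounds assert is added for the model, and copy_page_or_zero() is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the kernel's struct page: it owns its storage. */
struct page {
	unsigned char data[PAGE_SIZE];
};

/* Userspace stubs. In the kernel these set up and tear down a short-lived
 * mapping and must stay strictly paired. */
static void *kmap_atomic(struct page *page) { return page->data; }
static void kunmap_atomic(void *addr) { (void)addr; }

static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
{
	char *from = kmap_atomic(page);

	/* Model-only check: the copy must stay inside the page. */
	assert(offset <= PAGE_SIZE && len <= PAGE_SIZE - offset);
	memcpy(to, from + offset, len);
	kunmap_atomic(from);
}

static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
{
	char *to = kmap_atomic(page);

	assert(offset <= PAGE_SIZE && len <= PAGE_SIZE - offset);
	memcpy(to + offset, from, len);
	kunmap_atomic(to);
}

static void memzero_page(struct page *page, size_t offset, size_t len)
{
	char *addr = kmap_atomic(page);

	assert(offset <= PAGE_SIZE && len <= PAGE_SIZE - offset);
	memset(addr + offset, 0, len);
	kunmap_atomic(addr);
}

/* Hypothetical caller shaped like the cramfs hunk: copy one whole page, or
 * zero-fill the destination when the page is absent. */
static void copy_page_or_zero(char *data, struct page *page)
{
	if (page)
		memcpy_from_page(data, page, 0, PAGE_SIZE);
	else
		memset(data, 0, PAGE_SIZE);
}
```

With helpers of this shape the caller never sees the mapped address, so the map and unmap calls cannot become unbalanced at the call site. The real cramfs code would still need its put_page() after the copy.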
^ permalink raw reply
* Re: [PATCH v2] ima: defer arch_ima_get_secureboot() call to IMA init time
From: Mimi Zohar @ 2020-10-13 19:45 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-efi, Dmitry Kasatkin, James Morris, Chester Lin,
linux-security-module, linux-integrity,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Serge E. Hallyn
In-Reply-To: <CAMj1kXFZVR46_oeYTxJ59q-7u+zFCFtOQuSQoiEzKLhXzpydow@mail.gmail.com>
On Tue, 2020-10-13 at 18:59 +0200, Ard Biesheuvel wrote:
> Suggestion: can we take the get_sb_mode() code from ima_arch.c in
> arch/x86, and generalize it for all EFI architectures? That way, we
> can enable 32-bit ARM and RISC-V seamlessly once someone gets around
> to enabling IMA on those platforms. In fact, get_sb_mode() itself
> should probably be factored out into a generic helper for use outside
> of IMA as well (Xen/x86 has code that does roughly the same already)
On Power, there are three different policies - secure, trusted, and
secure & trusted boot policy rules. Based on whether secure or trusted
boot is enabled, the appropriate policy is enabled. On x86, if
secure_boot is enabled (and CONFIG_IMA_ARCH_POLICY is enabled) both the
secure and trusted boot rules are defined. Is this design fine enough
granularity or should there be a get_trustedboot_mode() function
as well?
Agreed, the code should not be duplicated across architectures. As for making
get_sb_mode() generic, not dependent on IMA, where would it reside?
Would this be in EFI?
thanks,
Mimi
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Dan Williams @ 2020-10-13 19:41 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
linux-kselftest, samba-technical, Weiny, Ira, ceph-devel,
drbd-dev, devel, linux-cifs, linux-nilfs, linux-scsi,
linux-nvdimm, linux-rdma, X86 ML, amd-gfx list, io-uring,
cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Andrew Morton,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201013193643.GK20115@casper.infradead.org>
On Tue, Oct 13, 2020 at 12:37 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> > On Fri, Oct 9, 2020 at 12:52 PM <ira.weiny@intel.com> wrote:
> > >
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > The kmap() calls in this FS are localized to a single thread. To avoid
> > > the overhead of global PKRS updates use the new kmap_thread() call.
> > >
> > > Cc: Nicolas Pitre <nico@fluxnic.net>
> > > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > > ---
> > > fs/cramfs/inode.c | 10 +++++-----
> > > 1 file changed, 5 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > > index 912308600d39..003c014a42ed 100644
> > > --- a/fs/cramfs/inode.c
> > > +++ b/fs/cramfs/inode.c
> > > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > > struct page *page = pages[i];
> > >
> > > if (page) {
> > > - memcpy(data, kmap(page), PAGE_SIZE);
> > > - kunmap(page);
> > > + memcpy(data, kmap_thread(page), PAGE_SIZE);
> > > + kunmap_thread(page);
> >
> > Why does this need a sleepable kmap? This looks like a textbook
> > kmap_atomic() use case.
>
> There's a lot of code of this form. Could we perhaps have:
>
> static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
> {
> char *vto = kmap_atomic(to);
>
> memcpy(vto, vfrom, size);
> kunmap_atomic(vto);
> }
>
> in linux/highmem.h ?
Nice, yes, that could also replace the local ones in lib/iov_iter.c
(memcpy_{to,from}_page())
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Matthew Wilcox @ 2020-10-13 19:36 UTC (permalink / raw)
To: Dan Williams
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
linux-kselftest, samba-technical, Weiny, Ira, ceph-devel,
drbd-dev, devel, linux-cifs, linux-nilfs, linux-scsi,
linux-nvdimm, linux-rdma, X86 ML, amd-gfx list, io-uring,
cluster-devel, Ingo Molnar, intel-wired-lan, xen-devel,
linux-ext4, Fenghua Yu, linux-afs, linux-um, intel-gfx, ecryptfs,
linux-erofs, reiserfs-devel, linux-block, linux-bcache,
Borislav Petkov, Andy Lutomirski, Thomas Gleixner, Andrew Morton,
linux-cachefs, linux-nfs, Nicolas Pitre, linux-ntfs-dev, Netdev,
Kexec Mailing List, Linux Kernel Mailing List, linux-f2fs-devel,
linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <CAPcyv4gL3jfw4d+SJGPqAD3Dp4F_K=X3domuN4ndAA1FQDGcPg@mail.gmail.com>
On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
> On Fri, Oct 9, 2020 at 12:52 PM <ira.weiny@intel.com> wrote:
> >
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > The kmap() calls in this FS are localized to a single thread. To avoid
> > the overhead of global PKRS updates use the new kmap_thread() call.
> >
> > Cc: Nicolas Pitre <nico@fluxnic.net>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> > ---
> > fs/cramfs/inode.c | 10 +++++-----
> > 1 file changed, 5 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > index 912308600d39..003c014a42ed 100644
> > --- a/fs/cramfs/inode.c
> > +++ b/fs/cramfs/inode.c
> > @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > struct page *page = pages[i];
> >
> > if (page) {
> > - memcpy(data, kmap(page), PAGE_SIZE);
> > - kunmap(page);
> > + memcpy(data, kmap_thread(page), PAGE_SIZE);
> > + kunmap_thread(page);
>
> Why does this need a sleepable kmap? This looks like a textbook
> kmap_atomic() use case.
There's a lot of code of this form. Could we perhaps have:
static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
{
char *vto = kmap_atomic(to);
memcpy(vto, vfrom, size);
kunmap_atomic(vto);
}
in linux/highmem.h ?
^ permalink raw reply
* Re: [PATCH v2] powerpc/pci: unmap legacy INTx interrupts when a PHB is removed
From: Qian Cai @ 2020-10-13 19:33 UTC (permalink / raw)
To: Cédric Le Goater, Michael Ellerman
Cc: Stephen Rothwell, Alexey Kardashevskiy, linux-kernel, linux-next,
Oliver O'Halloran, linuxppc-dev
In-Reply-To: <fce8ffe1-521c-8344-c7ad-53550e408cdc@kaod.org>
On Wed, 2020-09-23 at 09:06 +0200, Cédric Le Goater wrote:
> On 9/23/20 2:33 AM, Qian Cai wrote:
> > On Fri, 2020-08-07 at 12:18 +0200, Cédric Le Goater wrote:
> > > When a passthrough IO adapter is removed from a pseries machine using
> > > hash MMU and the XIVE interrupt mode, the POWER hypervisor expects the
> > > guest OS to clear all page table entries related to the adapter. If
> > > some are still present, the RTAS call which isolates the PCI slot
> > > returns error 9001 "valid outstanding translations" and the removal of
> > > the IO adapter fails. This is because when the PHBs are scanned, Linux
> > > maps automatically the INTx interrupts in the Linux interrupt number
> > > space but these are never removed.
> > >
> > > To solve this problem, we introduce a PPC platform specific
> > > pcibios_remove_bus() routine which clears all interrupt mappings when
> > > the bus is removed. This also clears the associated page table entries
> > > of the ESB pages when using XIVE.
> > >
> > > For this purpose, we record the logical interrupt numbers of the
> > > mapped interrupt under the PHB structure and let pcibios_remove_bus()
> > > do the clean up.
> > >
> > > Since some PCI adapters, like GPUs, use the "interrupt-map" property
> > > to describe interrupt mappings other than the legacy INTx interrupts,
> > > we can not restrict the size of the mapping array to PCI_NUM_INTX. The
> > > number of interrupt mappings is computed from the "interrupt-map"
> > > property and the mapping array is allocated accordingly.
> > >
> > > Cc: "Oliver O'Halloran" <oohall@gmail.com>
> > > Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
> > > Signed-off-by: Cédric Le Goater <clg@kaod.org>
> >
> > Some syscall fuzzing will trigger this on POWER9 NV where the traces pointed
> > to
> > this patch.
> >
> > .config: https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config
>
> OK. The patch is missing a NULL assignment after kfree() and that
> might be the issue.
>
> I did try PHB removal under PowerNV, so I would like to understand
> how we managed to remove twice the PCI bus and possibly reproduce.
> Any chance we could grab what the syscall fuzzer (syzkaller) did ?
Any update on this? Maybe Michael or Stephen could drop this for now, so our
fuzzing could continue to find something else new?
It can still be reproduced on today's linux-next. BTW, this is running trinity
as an unprivileged user. This is a snapshot of each fuzzing thread when
this happens.
http://people.redhat.com/qcai/pcibios_remove_bus/trinity-post-mortem.log
It can be reproduced by simply running this for a while:
$ trinity -C <total number of CPUs> --arch 64
[19611.946827][T1717146] pci_bus 0035:03: busn_res: [bus 03-07] is released
[19611.950956][T1717146] pci_bus 0035:08: busn_res: [bus 08-0c] is released
[19611.951260][T1717146] =============================================================================
[19611.952336][T1717146] BUG kmalloc-16 (Tainted: G W O ): Object already free
[19611.952365][T1717146] -----------------------------------------------------------------------------
[19611.952365][T1717146]
[19611.952411][T1717146] Disabling lock debugging due to kernel taint
[19611.952438][T1717146] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=1960714 cpu=4 pid=1
[19611.952481][T1717146] __slab_alloc+0xa4/0xf0
[19611.952500][T1717146] __kmalloc+0x294/0x330
[19611.952519][T1717146] pcibios_scan_phb+0x104/0x3e0
[19611.952549][T1717146] pcibios_init+0x84/0x124
[19611.952578][T1717146] do_one_initcall+0xac/0x528
[19611.952599][T1717146] kernel_init_freeable+0x35c/0x3fc
[19611.952618][T1717146] kernel_init+0x24/0x148
[19611.952646][T1717146] ret_from_kernel_thread+0x5c/0x80
[19611.952665][T1717146] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=0 cpu=16 pid=1717146
[19611.952691][T1717146] kfree+0x49c/0x510
[19611.952700][T1717146] pcibios_remove_bus+0x70/0x90
[19611.952711][T1717146] pci_remove_bus+0xe4/0x110
[19611.952730][T1717146] pci_remove_bus_device+0x74/0x170
[19611.952749][T1717146] pci_remove_bus_device+0x4c/0x170
[19611.952768][T1717146] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.952798][T1717146] remove_store+0xc0/0xe0
[19611.952819][T1717146] dev_attr_store+0x30/0x50
[19611.952852][T1717146] sysfs_kf_write+0x68/0xb0
[19611.952870][T1717146] kernfs_fop_write+0x114/0x260
[19611.952904][T1717146] vfs_write+0xe4/0x260
[19611.952922][T1717146] ksys_write+0x74/0x130
[19611.952951][T1717146] system_call_exception+0xf8/0x1d0
[19611.952970][T1717146] system_call_common+0xe8/0x218
[19611.952990][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=174 fp=0x00000000006a64b0 flags=0x7fff8000000201
[19611.953004][T1717146] INFO: Object 0x00000000f360132d @offset=30192 fp=0x0000000000000000
[19611.953004][T1717146]
[19611.953048][T1717146] Redzone 00000000acef7298: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[19611.953080][T1717146] Object 00000000f360132d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
[19611.953114][T1717146] Redzone 0000000083758aaa: bb bb bb bb bb bb bb bb ........
[19611.953146][T1717146] Padding 00000000cbb228a2: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[19611.953189][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G B W O 5.9.0-next-20201013 #1
[19611.953223][T1717146] Call Trace:
[19611.953242][T1717146] [c000200022557800] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.953291][T1717146] [c000200022557840] [c000000000363688] print_trailer+0x278/0x2a0
[19611.953323][T1717146] [c0002000225578d0] [c00000000035aa8c] free_debug_processing+0x57c/0x600
[19611.953356][T1717146] [c0002000225579b0] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.953391][T1717146] [c000200022557a80] [c00000000035b55c] kfree+0x49c/0x510
[19611.953423][T1717146] [c000200022557b10] [c0000000000432a0] pcibios_remove_bus+0x70/0x90
pci_irq_map_dispose at arch/powerpc/kernel/pci-common.c:456
(inlined by) pcibios_remove_bus at arch/powerpc/kernel/pci-common.c:461
[19611.953454][T1717146] [c000200022557b40] [c000000000677f94] pci_remove_bus+0xe4/0x110
[19611.953477][T1717146] [c000200022557b70] [c000000000678134] pci_remove_bus_device+0x74/0x170
[19611.953510][T1717146] [c000200022557bb0] [c000000000678120] pci_remove_bus_device+0x60/0x170
[19611.953543][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.953567][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.953599][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.953621][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.953652][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.953684][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.953717][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.953762][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.953795][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.953821][T1717146] FIX kmalloc-16: Object at 0x00000000f360132d not freed
[19611.954111][T1717146] =============================================================================
[19611.954144][T1717146] BUG kmalloc-16 (Tainted: G B W O ): Wrong object count. Counter is 174 but counted were 176
[19611.954176][T1717146] -----------------------------------------------------------------------------
[19611.954176][T1717146]
[19611.954221][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=174 fp=0x00000000006a64b0 flags=0x7fff8000000201
[19611.954237][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G B W O 5.9.0-next-20201013 #1
[19611.954269][T1717146] Call Trace:
[19611.954286][T1717146] [c0002000225576f0] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.954329][T1717146] [c000200022557730] [c000000000363368] slab_err+0x78/0xb0
[19611.954364][T1717146] [c000200022557810] [c000000000359f94] on_freelist+0x364/0x390
[19611.954390][T1717146] [c0002000225578b0] [c00000000035a798] free_debug_processing+0x288/0x600
[19611.954428][T1717146] [c000200022557990] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.954459][T1717146] [c000200022557a60] [c00000000035b55c] kfree+0x49c/0x510
[19611.954507][T1717146] [c000200022557af0] [c0000000002bd5a0] kfree_const+0x60/0x80
[19611.954540][T1717146] [c000200022557b10] [c0000000006553ec] kobject_release+0x7c/0xd0
[19611.954562][T1717146] [c000200022557b50] [c0000000006e66c0] put_device+0x20/0x40
[19611.954594][T1717146] [c000200022557b70] [c00000000067820c] pci_remove_bus_device+0x14c/0x170
[19611.954627][T1717146] [c000200022557bb0] [c000000000678120] pci_remove_bus_device+0x60/0x170
[19611.954652][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.954686][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.954717][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.954749][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.954784][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.954884][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.954972][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.955050][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.955144][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.955228][T1717146] FIX kmalloc-16: Object count adjusted.
[19611.955300][T1717146] pci_bus 0035:0d: busn_res: [bus 0d-11] is released
[19611.955394][T1717146] =============================================================================
[19611.955493][T1717146] BUG kmalloc-16 (Tainted: G B W O ): Object already free
[19611.955572][T1717146] -----------------------------------------------------------------------------
[19611.955572][T1717146]
[19611.955732][T1717146] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=1960715 cpu=4 pid=1
[19611.955847][T1717146] __slab_alloc+0xa4/0xf0
[19611.955902][T1717146] __kmalloc+0x294/0x330
[19611.955948][T1717146] pcibios_scan_phb+0x104/0x3e0
[19611.955994][T1717146] pcibios_init+0x84/0x124
[19611.956064][T1717146] do_one_initcall+0xac/0x528
[19611.956101][T1717146] kernel_init_freeable+0x35c/0x3fc
[19611.956164][T1717146] kernel_init+0x24/0x148
[19611.956215][T1717146] ret_from_kernel_thread+0x5c/0x80
[19611.956283][T1717146] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=1 cpu=16 pid=1717146
[19611.956385][T1717146] kfree+0x49c/0x510
[19611.956419][T1717146] pcibios_remove_bus+0x70/0x90
[19611.956481][T1717146] pci_remove_bus+0xe4/0x110
[19611.956532][T1717146] pci_remove_bus_device+0x74/0x170
[19611.956608][T1717146] pci_remove_bus_device+0x4c/0x170
[19611.956652][T1717146] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.956722][T1717146] remove_store+0xc0/0xe0
[19611.956793][T1717146] dev_attr_store+0x30/0x50
[19611.956850][T1717146] sysfs_kf_write+0x68/0xb0
[19611.956914][T1717146] kernfs_fop_write+0x114/0x260
[19611.956964][T1717146] vfs_write+0xe4/0x260
[19611.957009][T1717146] ksys_write+0x74/0x130
[19611.957055][T1717146] system_call_exception+0xf8/0x1d0
[19611.957101][T1717146] system_call_common+0xe8/0x218
[19611.957173][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=175 fp=0x00000000f4222fd7 flags=0x7fff8000000201
[19611.957304][T1717146] INFO: Object 0x00000000f360132d @offset=30192 fp=0x0000000000000000
[19611.957304][T1717146]
[19611.957429][T1717146] Redzone 00000000acef7298: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[19611.957543][T1717146] Object 00000000f360132d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
[19611.957684][T1717146] Redzone 0000000083758aaa: bb bb bb bb bb bb bb bb ........
[19611.957781][T1717146] Padding 00000000cbb228a2: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[19611.957912][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G B W O 5.9.0-next-20201013 #1
[19611.958033][T1717146] Call Trace:
[19611.958085][T1717146] [c000200022557800] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.958182][T1717146] [c000200022557840] [c000000000363688] print_trailer+0x278/0x2a0
[19611.958261][T1717146] [c0002000225578d0] [c00000000035aa8c] free_debug_processing+0x57c/0x600
[19611.958385][T1717146] [c0002000225579b0] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.958486][T1717146] [c000200022557a80] [c00000000035b55c] kfree+0x49c/0x510
[19611.958555][T1717146] [c000200022557b10] [c0000000000432a0] pcibios_remove_bus+0x70/0x90
[19611.958665][T1717146] [c000200022557b40] [c000000000677f94] pci_remove_bus+0xe4/0x110
[19611.958745][T1717146] [c000200022557b70] [c000000000678134] pci_remove_bus_device+0x74/0x170
[19611.958842][T1717146] [c000200022557bb0] [c000000000678120] pci_remove_bus_device+0x60/0x170
[19611.958953][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.959062][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.959142][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.959242][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.959323][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.959425][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.959506][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.959613][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.959713][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.959819][T1717146] FIX kmalloc-16: Object at 0x00000000f360132d not freed
[19611.960653][T1717146] pci 0035:02 : [PE# fc] Releasing PE
[19611.960831][T1717146] pci_bus 0035:02: busn_res: [bus 02-11] is released
[19611.960913][T1717146] =============================================================================
[19611.960934][T1717146] BUG kmalloc-16 (Tainted: G B W O ): Object already free
[19611.960954][T1717146] -----------------------------------------------------------------------------
[19611.960954][T1717146]
[19611.960991][T1717146] INFO: Allocated in pcibios_scan_phb+0x104/0x3e0 age=1960715 cpu=4 pid=1
[19611.961024][T1717146] __slab_alloc+0xa4/0xf0
[19611.961052][T1717146] __kmalloc+0x294/0x330
[19611.961070][T1717146] pcibios_scan_phb+0x104/0x3e0
[19611.961089][T1717146] pcibios_init+0x84/0x124
[19611.961108][T1717146] do_one_initcall+0xac/0x528
[19611.961169][T1717146] kernel_init_freeable+0x35c/0x3fc
[19611.961213][T1717146] kernel_init+0x24/0x148
[19611.961276][T1717146] ret_from_kernel_thread+0x5c/0x80
[19611.961321][T1717146] INFO: Freed in pcibios_remove_bus+0x70/0x90 age=1 cpu=16 pid=1717146
[19611.961441][T1717146] kfree+0x49c/0x510
[19611.961497][T1717146] pcibios_remove_bus+0x70/0x90
[19611.961554][T1717146] pci_remove_bus+0xe4/0x110
[19611.961621][T1717146] pci_remove_bus_device+0x74/0x170
[19611.961670][T1717146] pci_remove_bus_device+0x4c/0x170
[19611.961730][T1717146] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.961810][T1717146] remove_store+0xc0/0xe0
[19611.961855][T1717146] dev_attr_store+0x30/0x50
[19611.961912][T1717146] sysfs_kf_write+0x68/0xb0
[19611.961965][T1717146] kernfs_fop_write+0x114/0x260
[19611.962017][T1717146] vfs_write+0xe4/0x260
[19611.962071][T1717146] ksys_write+0x74/0x130
[19611.962127][T1717146] system_call_exception+0xf8/0x1d0
[19611.962194][T1717146] system_call_common+0xe8/0x218
[19611.962239][T1717146] INFO: Slab 0x0000000099caaf22 objects=178 used=174 fp=0x00000000253d72f3 flags=0x7fff8000000201
[19611.962365][T1717146] INFO: Object 0x00000000f360132d @offset=30192 fp=0x0000000000000000
[19611.962365][T1717146]
[19611.962501][T1717146] Redzone 00000000acef7298: bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb ................
[19611.962628][T1717146] Object 00000000f360132d: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkkkkkkkkkk.
[19611.962729][T1717146] Redzone 0000000083758aaa: bb bb bb bb bb bb bb bb ........
[19611.962836][T1717146] Padding 00000000cbb228a2: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
[19611.962962][T1717146] CPU: 16 PID: 1717146 Comm: trinity-c8 Tainted: G B W O 5.9.0-next-20201013 #1
[19611.963077][T1717146] Call Trace:
[19611.963113][T1717146] [c000200022557840] [c00000000064c208] dump_stack+0xec/0x144 (unreliable)
[19611.963210][T1717146] [c000200022557880] [c000000000363688] print_trailer+0x278/0x2a0
[19611.963300][T1717146] [c000200022557910] [c00000000035aa8c] free_debug_processing+0x57c/0x600
[19611.963395][T1717146] [c0002000225579f0] [c00000000035af24] __slab_free+0x414/0x5b0
[19611.963490][T1717146] [c000200022557ac0] [c00000000035b55c] kfree+0x49c/0x510
[19611.963585][T1717146] [c000200022557b50] [c0000000000432a0] pcibios_remove_bus+0x70/0x90
[19611.963710][T1717146] [c000200022557b80] [c000000000677f94] pci_remove_bus+0xe4/0x110
[19611.963788][T1717146] [c000200022557bb0] [c000000000678134] pci_remove_bus_device+0x74/0x170
[19611.963883][T1717146] [c000200022557bf0] [c0000000006782a4] pci_stop_and_remove_bus_device_locked+0x34/0x50
[19611.963988][T1717146] [c000200022557c20] [c000000000687690] remove_store+0xc0/0xe0
[19611.964102][T1717146] [c000200022557c70] [c0000000006e5320] dev_attr_store+0x30/0x50
[19611.964189][T1717146] [c000200022557c90] [c0000000004a53b8] sysfs_kf_write+0x68/0xb0
[19611.964302][T1717146] [c000200022557cd0] [c0000000004a45e4] kernfs_fop_write+0x114/0x260
[19611.964390][T1717146] [c000200022557d20] [c0000000003aff74] vfs_write+0xe4/0x260
[19611.964493][T1717146] [c000200022557d70] [c0000000003b02a4] ksys_write+0x74/0x130
[19611.964541][T1717146] [c000200022557dc0] [c00000000002a3e8] system_call_exception+0xf8/0x1d0
[19611.964747][T1717146] [c000200022557e20] [c00000000000d0a8] system_call_common+0xe8/0x218
[19611.964851][T1717146] FIX kmalloc-16: Object at 0x00000000f360132d not freed
[19611.966211][T1717146] pci 0035:01 : [PE# fd] Releasing PE
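The "Object already free" reports above name the same kmalloc-16 object each time, allocated in pcibios_scan_phb() and freed from pcibios_remove_bus() as successive buses under 0035 are released. That would be consistent with the missing NULL assignment mentioned earlier in the thread. Below is a minimal userspace sketch of the pattern, not the actual patch: struct phb_model, its field names, and both functions are hypothetical, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-PHB state holding the interrupt map. */
struct phb_model {
	unsigned int *irq_map;
	unsigned int irq_count;
};

/* Allocate the map once, as would happen when the PHB is scanned. */
static int phb_model_init(struct phb_model *phb, unsigned int count)
{
	assert(count > 0); /* calloc(0, ...) is implementation-defined */
	phb->irq_map = calloc(count, sizeof(*phb->irq_map));
	phb->irq_count = phb->irq_map ? count : 0;
	return phb->irq_map ? 0 : -1;
}

/* Called once per removed bus. Without the NULL assignment, a second call
 * would pass the stale pointer to free() again, which is a double free. */
static void irq_map_dispose(struct phb_model *phb)
{
	free(phb->irq_map);   /* free(NULL) is a no-op, as is kfree(NULL) */
	phb->irq_map = NULL;  /* makes any later call harmless */
	phb->irq_count = 0;
}
```

Clearing the pointer only hides the repeated call. Whether the map should be disposed per bus or once per PHB is a separate question that the sketch does not answer.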
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Dan Williams @ 2020-10-13 18:44 UTC (permalink / raw)
To: Weiny, Ira
Cc: linux-aio, linux-efi, KVM list, Linux Doc Mailing List,
Peter Zijlstra, linux-mmc, Dave Hansen,
Maling list - DRI developers, Linux MM, target-devel, linux-mtd,
linux-kselftest, samba-technical, ceph-devel, drbd-dev, devel,
linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm, linux-rdma,
X86 ML, amd-gfx list, io-uring, cluster-devel, Ingo Molnar,
intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu, linux-afs,
linux-um, intel-gfx, ecryptfs, linux-erofs, reiserfs-devel,
linux-block, linux-bcache, Borislav Petkov, Andy Lutomirski,
Thomas Gleixner, Andrew Morton, linux-cachefs, linux-nfs,
Nicolas Pitre, linux-ntfs-dev, Netdev, Kexec Mailing List,
Linux Kernel Mailing List, linux-f2fs-devel, linux-fsdevel, bpf,
linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-34-ira.weiny@intel.com>
On Fri, Oct 9, 2020 at 12:52 PM <ira.weiny@intel.com> wrote:
>
> From: Ira Weiny <ira.weiny@intel.com>
>
> The kmap() calls in this FS are localized to a single thread. To avoid
> the overhead of global PKRS updates use the new kmap_thread() call.
>
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
> fs/cramfs/inode.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 912308600d39..003c014a42ed 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> struct page *page = pages[i];
>
> if (page) {
> - memcpy(data, kmap(page), PAGE_SIZE);
> - kunmap(page);
> + memcpy(data, kmap_thread(page), PAGE_SIZE);
> + kunmap_thread(page);
Why does this need a sleepable kmap? This looks like a textbook
kmap_atomic() use case.
^ permalink raw reply
* Re: [PATCH RFC PKS/PMEM 33/58] fs/cramfs: Utilize new kmap_thread()
From: Nicolas Pitre @ 2020-10-13 18:36 UTC (permalink / raw)
To: Ira Weiny
Cc: linux-aio, linux-efi, kvm, linux-doc, Peter Zijlstra, linux-mmc,
Dave Hansen, dri-devel, linux-mm, target-devel, linux-mtd,
linux-kselftest, samba-technical, Thomas Gleixner, drbd-dev,
devel, linux-cifs, linux-nilfs, linux-scsi, linux-nvdimm,
linux-rdma, x86, ceph-devel, amd-gfx, io-uring, cluster-devel,
Ingo Molnar, intel-wired-lan, xen-devel, linux-ext4, Fenghua Yu,
linux-afs, linux-um, intel-gfx, ecryptfs, linux-erofs,
reiserfs-devel, linux-block, linux-bcache, Borislav Petkov,
Andy Lutomirski, Dan Williams, Andrew Morton, linux-cachefs,
linux-nfs, linux-ntfs-dev, netdev, kexec, linux-kernel,
linux-f2fs-devel, linux-fsdevel, bpf, linuxppc-dev, linux-btrfs
In-Reply-To: <20201009195033.3208459-34-ira.weiny@intel.com>
On Fri, 9 Oct 2020, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> The kmap() calls in this FS are localized to a single thread. To avoid
> the overhead of global PKRS updates use the new kmap_thread() call.
>
> Cc: Nicolas Pitre <nico@fluxnic.net>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
> fs/cramfs/inode.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 912308600d39..003c014a42ed 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> struct page *page = pages[i];
>
> if (page) {
> - memcpy(data, kmap(page), PAGE_SIZE);
> - kunmap(page);
> + memcpy(data, kmap_thread(page), PAGE_SIZE);
> + kunmap_thread(page);
> put_page(page);
> } else
> memset(data, 0, PAGE_SIZE);
> @@ -826,7 +826,7 @@ static int cramfs_readpage(struct file *file, struct page *page)
>
> maxblock = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
> bytes_filled = 0;
> - pgdata = kmap(page);
> + pgdata = kmap_thread(page);
>
> if (page->index < maxblock) {
> struct super_block *sb = inode->i_sb;
> @@ -914,13 +914,13 @@ static int cramfs_readpage(struct file *file, struct page *page)
>
> memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled);
> flush_dcache_page(page);
> - kunmap(page);
> + kunmap_thread(page);
> SetPageUptodate(page);
> unlock_page(page);
> return 0;
>
> err:
> - kunmap(page);
> + kunmap_thread(page);
> ClearPageUptodate(page);
> SetPageError(page);
> unlock_page(page);
> --
> 2.28.0.rc0.12.gb6a658bd00c9
>
>
^ permalink raw reply
* Re: [PATCH v2] ima: defer arch_ima_get_secureboot() call to IMA init time
From: Ard Biesheuvel @ 2020-10-13 16:59 UTC (permalink / raw)
To: Mimi Zohar
Cc: linux-efi, Dmitry Kasatkin, James Morris, Chester Lin,
linux-security-module, linux-integrity,
open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Serge E. Hallyn
In-Reply-To: <ae9ab2560f6d7b114726efb1ec26f0a36f695335.camel@linux.ibm.com>
On Tue, 13 Oct 2020 at 18:46, Mimi Zohar <zohar@linux.ibm.com> wrote:
>
> [Cc'ing linuxppc-dev@lists.ozlabs.org]
>
> On Tue, 2020-10-13 at 10:18 +0200, Ard Biesheuvel wrote:
> > Chester reports that it is necessary to introduce a new way to pass
> > the EFI secure boot status between the EFI stub and the core kernel
> > on ARM systems. The usual way of obtaining this information is by
> > checking the SecureBoot and SetupMode EFI variables, but this can
> > only be done after the EFI variable workqueue is created, which
> > occurs in a subsys_initcall(), whereas arch_ima_get_secureboot()
> > is called much earlier by the IMA framework.
> >
> > However, the IMA framework itself is started as a late_initcall,
> > and the only reason the call to arch_ima_get_secureboot() occurs
> > so early is because it happens in the context of a __setup()
> > callback that parses the ima_appraise= command line parameter.
> >
> > So let's refactor this code a little bit, by using a core_param()
> > callback to capture the command line argument, and deferring any
> > reasoning based on its contents to the IMA init routine.
> >
> > Cc: Chester Lin <clin@suse.com>
> > Cc: Mimi Zohar <zohar@linux.ibm.com>
> > Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
> > Cc: James Morris <jmorris@namei.org>
> > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > Link: https://lore.kernel.org/linux-arm-kernel/20200904072905.25332-2-clin@suse.com/
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > v2: rebase onto series 'integrity: improve user feedback for invalid bootparams'
>
> Thanks, Ard. Based on my initial, limited testing on Power, it looks
> good, but I'm hesitant to include it in the integrity 5.10 pull
> request without it having been in linux-next and some additional
> testing. It's now queued in the next-integrity-testing branch awaiting
> some tags.
>
Thanks. No rush as far as I am concerned, although I suppose Chester
may want to rebase his arm64 IMA enablement series on this.

Suggestion: can we take the get_sb_mode() code from ima_arch.c in
arch/x86, and generalize it for all EFI architectures? That way, we
can enable 32-bit ARM and RISC-V seamlessly once someone gets around
to enabling IMA on those platforms. In fact, get_sb_mode() itself
should probably be factored out into a generic helper for use outside
of IMA as well (Xen/x86 has code that does roughly the same already).
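
The deferral described in the quoted commit message can be illustrated with
a small userspace sketch. It is not the patch itself: every ex_* name is
hypothetical, the secure boot state is a plain variable standing in for
arch_ima_get_secureboot(), and the rule that secure boot forces enforcing
mode is an assumption made for the example. The only point is the split
between an early hook that records the argument and a later init routine
that interprets it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical appraisal modes, for the example only. */
enum ex_appraise_mode { EX_ENFORCE, EX_LOG, EX_FIX, EX_OFF };

/* Raw command line value: stored early, interpreted later. */
static const char *ex_appraise_arg;

/* Stand-in for firmware state that is only readable late in boot. */
static bool ex_secureboot_state;

/* Early phase: a core_param()-style hook that only records the string. */
void ex_capture_appraise_arg(const char *str)
{
	ex_appraise_arg = str;
}

/* Stand-in for arch_ima_get_secureboot(). */
bool ex_get_secureboot(void)
{
	return ex_secureboot_state;
}

/* Late phase: the init routine reasons about the saved string. */
enum ex_appraise_mode ex_appraise_init(void)
{
	enum ex_appraise_mode mode = EX_ENFORCE;

	if (ex_appraise_arg == NULL)
		return mode;

	if (strcmp(ex_appraise_arg, "off") == 0)
		mode = EX_OFF;
	else if (strcmp(ex_appraise_arg, "log") == 0)
		mode = EX_LOG;
	else if (strcmp(ex_appraise_arg, "fix") == 0)
		mode = EX_FIX;

	/* Assumed policy: secure boot overrides any relaxed mode. */
	if (mode != EX_ENFORCE && ex_get_secureboot())
		mode = EX_ENFORCE;

	return mode;
}

/* Usage: the secure boot query happens at init time, not at parse time. */
void ex_demo(void)
{
	ex_capture_appraise_arg("fix");   /* early: no firmware access */
	ex_secureboot_state = true;       /* becomes known later */
	assert(ex_appraise_init() == EX_ENFORCE);
}
```

Because ex_capture_appraise_arg() never calls ex_get_secureboot(), the
ordering problem with the EFI variable workqueue does not arise in the
early phase.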
^ permalink raw reply
* Re: [PATCH v2] ima: defer arch_ima_get_secureboot() call to IMA init time
From: Mimi Zohar @ 2020-10-13 16:46 UTC (permalink / raw)
To: Ard Biesheuvel, linux-efi
Cc: Dmitry Kasatkin, James Morris, Chester Lin, linux-security-module,
linux-integrity, linuxppc-dev, Serge E. Hallyn
In-Reply-To: <20201013081804.17332-1-ardb@kernel.org>
[Cc'ing linuxppc-dev@lists.ozlabs.org]
On Tue, 2020-10-13 at 10:18 +0200, Ard Biesheuvel wrote:
> Chester reports that it is necessary to introduce a new way to pass
> the EFI secure boot status between the EFI stub and the core kernel
> on ARM systems. The usual way of obtaining this information is by
> checking the SecureBoot and SetupMode EFI variables, but this can
> only be done after the EFI variable workqueue is created, which
> occurs in a subsys_initcall(), whereas arch_ima_get_secureboot()
> is called much earlier by the IMA framework.
>
> However, the IMA framework itself is started as a late_initcall,
> and the only reason the call to arch_ima_get_secureboot() occurs
> so early is because it happens in the context of a __setup()
> callback that parses the ima_appraise= command line parameter.
>
> So let's refactor this code a little bit, by using a core_param()
> callback to capture the command line argument, and deferring any
> reasoning based on its contents to the IMA init routine.
>
> Cc: Chester Lin <clin@suse.com>
> Cc: Mimi Zohar <zohar@linux.ibm.com>
> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Link: https://lore.kernel.org/linux-arm-kernel/20200904072905.25332-2-clin@suse.com/
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> v2: rebase onto series 'integrity: improve user feedback for invalid bootparams'
Thanks, Ard. Based on my initial, limited testing on Power, it looks
good, but I'm hesitant to include it in the integrity 5.10 pull
request without it having been in linux-next and some additional
testing. It's now queued in the next-integrity-testing branch awaiting
some tags.
thanks,
Mimi
^ permalink raw reply
* Re: [PATCH] powerpc/perf: fix Threshold Event CounterMultiplier width for P10
From: Michal Suchánek @ 2020-10-13 15:58 UTC (permalink / raw)
To: Madhavan Srinivasan; +Cc: atrajeev, linuxppc-dev
In-Reply-To: <b840fcf3-6546-159e-e23a-c8fe00123539@linux.ibm.com>
On Tue, Oct 13, 2020 at 06:27:05PM +0530, Madhavan Srinivasan wrote:
>
> On 10/12/20 4:59 PM, Michal Suchánek wrote:
> > Hello,
> >
> > On Mon, Oct 12, 2020 at 04:01:28PM +0530, Madhavan Srinivasan wrote:
> > > Power9 and isa v3.1 has 7bit mantissa field for Threshold Event Counter
> > > ^^^ Shouldn't this be 3.0?
>
> My bad. What I meant was:
>
> Power9, ISA v3.0 and ISA v3.1 define a 7-bit mantissa field for Threshold
> Event Counter Multiplier (TECM).
I am really confused.
The following text and the code suggest that the mantissa is 8 bits wide
on POWER10 and ISA v3.1.
Thanks
Michal
>
> Maddy
>
> >
> > > Multiplier (TECM). TECM is part of Monitor Mode Control Register A (MMCRA).
> > > This field along with Threshold Event Counter Exponent (TECE) is used to
> > > get the threshold counter value. In Power10, the width of the TECM field
> > > is increased to 8 bits. This patch modifies the MMCRA[TECM] extraction
> > > macro to handle this change.
> > >
> > > Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
> > > Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
> > > ---
> > > arch/powerpc/perf/isa207-common.c | 3 +++
> > > arch/powerpc/perf/isa207-common.h | 4 ++++
> > > 2 files changed, 7 insertions(+)
> > >
> > > diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> > > index 964437adec18..5fe129f02290 100644
> > > --- a/arch/powerpc/perf/isa207-common.c
> > > +++ b/arch/powerpc/perf/isa207-common.c
> > > @@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
> > > u64 sier = mfspr(SPRN_SIER);
> > > u64 val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
> > > + if (cpu_has_feature(CPU_FTR_ARCH_31))
> > > + mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
> > > +
> > > if (val == 0 || val == 7)
> > > *weight = 0;
> > > else
> > > diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
> > > index 044de65e96b9..71380e854f48 100644
> > > --- a/arch/powerpc/perf/isa207-common.h
> > > +++ b/arch/powerpc/perf/isa207-common.h
> > > @@ -219,6 +219,10 @@
> > > #define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
> > > MMCRA_THR_CTR_EXP_MASK)
> > > +#define P10_MMCRA_THR_CTR_MANT_MASK 0xFFul
> > > +#define P10_MMCRA_THR_CTR_MANT(v) (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
> > > + P10_MMCRA_THR_CTR_MANT_MASK)
> > > +
> > > /* MMCRA Threshold Compare bit constant for power9 */
> > > #define p9_MMCRA_THR_CMP_SHIFT 45
> > > --
> > > 2.26.2
> > >
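
To make the width question concrete, here is a minimal standalone sketch
of the two extractions under discussion. The 0xFF mask is the one added by
the quoted patch; the 7-bit mask and the shift value are assumptions chosen
for illustration (the thread does not show the existing definitions), and
all EX_/ex_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for the example; the real shift is defined in isa207-common.h. */
#define EX_THR_CTR_MANT_SHIFT     19
/* 7-bit mantissa mask (assumed pre-Power10 width). */
#define EX_P9_THR_CTR_MANT_MASK   0x7Ful
/* 8-bit mantissa mask, matching P10_MMCRA_THR_CTR_MANT_MASK in the patch. */
#define EX_P10_THR_CTR_MANT_MASK  0xFFul

/* Extract the TECM mantissa from an MMCRA value with the chosen width. */
uint64_t ex_thr_ctr_mant(uint64_t mmcra, int is_isa_v31)
{
	uint64_t mask = is_isa_v31 ? EX_P10_THR_CTR_MANT_MASK
				   : EX_P9_THR_CTR_MANT_MASK;

	return (mmcra >> EX_THR_CTR_MANT_SHIFT) & mask;
}

/* Usage: the top mantissa bit survives only with the 8-bit mask. */
void ex_self_check(void)
{
	uint64_t mmcra = (uint64_t)0x80 << EX_THR_CTR_MANT_SHIFT;

	assert(ex_thr_ctr_mant(mmcra, 0) == 0x00);
	assert(ex_thr_ctr_mant(mmcra, 1) == 0x80);
}
```

With a 7-bit mask the eighth bit is silently dropped, which is the
behaviour the patch is meant to avoid on hardware where the field is
8 bits wide.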
^ permalink raw reply
* Re: [PATCH] powerpc/perf: fix Threshold Event CounterMultiplier width for P10
From: Madhavan Srinivasan @ 2020-10-13 12:57 UTC (permalink / raw)
To: Michal Suchánek; +Cc: atrajeev, linuxppc-dev
In-Reply-To: <20201012112905.GQ29778@kitsune.suse.cz>
On 10/12/20 4:59 PM, Michal Suchánek wrote:
> Hello,
>
> On Mon, Oct 12, 2020 at 04:01:28PM +0530, Madhavan Srinivasan wrote:
>> Power9 and isa v3.1 has 7bit mantissa field for Threshold Event Counter
> ^^^ Shouldn't this be 3.0?
My bad. What I meant was:
Power9, ISA v3.0 and ISA v3.1 define a 7-bit mantissa field for
Threshold Event Counter Multiplier (TECM).
Maddy
>
>> Multiplier (TECM). TECM is part of Monitor Mode Control Register A (MMCRA).
>> This field along with Threshold Event Counter Exponent (TECE) is used to
>> get the threshold counter value. In Power10, the width of the TECM field
>> is increased to 8 bits. This patch modifies the MMCRA[TECM] extraction
>> macro to handle this change.
>>
>> Fixes: 170a315f41c64 ('powerpc/perf: Support to export MMCRA[TEC*] field to userspace')
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>> ---
>> arch/powerpc/perf/isa207-common.c | 3 +++
>> arch/powerpc/perf/isa207-common.h | 4 ++++
>> 2 files changed, 7 insertions(+)
>>
>> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
>> index 964437adec18..5fe129f02290 100644
>> --- a/arch/powerpc/perf/isa207-common.c
>> +++ b/arch/powerpc/perf/isa207-common.c
>> @@ -247,6 +247,9 @@ void isa207_get_mem_weight(u64 *weight)
>> u64 sier = mfspr(SPRN_SIER);
>> u64 val = (sier & ISA207_SIER_TYPE_MASK) >> ISA207_SIER_TYPE_SHIFT;
>>
>> + if (cpu_has_feature(CPU_FTR_ARCH_31))
>> + mantissa = P10_MMCRA_THR_CTR_MANT(mmcra);
>> +
>> if (val == 0 || val == 7)
>> *weight = 0;
>> else
>> diff --git a/arch/powerpc/perf/isa207-common.h b/arch/powerpc/perf/isa207-common.h
>> index 044de65e96b9..71380e854f48 100644
>> --- a/arch/powerpc/perf/isa207-common.h
>> +++ b/arch/powerpc/perf/isa207-common.h
>> @@ -219,6 +219,10 @@
>> #define MMCRA_THR_CTR_EXP(v) (((v) >> MMCRA_THR_CTR_EXP_SHIFT) &\
>> MMCRA_THR_CTR_EXP_MASK)
>>
>> +#define P10_MMCRA_THR_CTR_MANT_MASK 0xFFul
>> +#define P10_MMCRA_THR_CTR_MANT(v) (((v) >> MMCRA_THR_CTR_MANT_SHIFT) &\
>> + P10_MMCRA_THR_CTR_MANT_MASK)
>> +
>> /* MMCRA Threshold Compare bit constant for power9 */
>> #define p9_MMCRA_THR_CMP_SHIFT 45
>>
>> --
>> 2.26.2
>>
^ permalink raw reply