LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] ASoC: fsl_rpmsg: add soc specific data structure
From: Shengjiu Wang @ 2021-08-27  6:14 UTC (permalink / raw)
  To: Fabio Estevam
  Cc: Linux-ALSA, Timur Tabi, Xiubo Li, Shengjiu Wang, linux-kernel,
	Takashi Iwai, Nicolin Chen, Mark Brown, linuxppc-dev
In-Reply-To: <CAOMZO5BCsTMjJJPtLN6_seVcWb24A2ms11FP3HzR0i7t3GLSuA@mail.gmail.com>

On Thu, Aug 26, 2021 at 7:54 PM Fabio Estevam <festevam@gmail.com> wrote:
>
> Hi Shengjiu,
>
> On Thu, Aug 26, 2021 at 8:19 AM Shengjiu Wang <shengjiu.wang@nxp.com> wrote:
>
> > +       rpmsg->soc_data = of_device_get_match_data(&pdev->dev);
> > +       if (rpmsg->soc_data) {
>
> This check is not necessary, because rpmsg->soc_data is always non-NULL.
>
> Other than that:
>
> Reviewed-by: Fabio Estevam <festevam@gmail.com>

Thanks, I will update in v2.

Best regards
wang shengjiu

^ permalink raw reply

* [PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode
From: Alexey Kardashevskiy @ 2021-08-27  4:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Leonardo Bras, kvm-ppc

Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated
only when needed. This allows skipping any update when clearing TCEs.
This works mostly fine as TCE updates are handled when MMU is enabled.
The realmode handlers fail with H_TOO_HARD when pages are not yet
allocated except when clearing a TCE in which case KVM prints a warning
but proceeds to dereference a NULL pointer which crashes the host OS.

This has not been caught so far as the change is reasonably new,
POWER9 runs mostly radix which does not use realmode handlers.
With hash, the default TCE table is memset() by QEMU the machine reset
which triggers page faults and the KVM TCE device's kvm_spapr_tce_fault()
handles those with MMU on. And the huge DMA windows are not cleared
by VMs whicn instead successfully create a DMA window big enough to map
the VM memory 1:1 and then VMs just map everything without clearing.

This started crashing now as upcoming sriov-under-powervm support added
a mode when a dymanic DMA window not big enough to map the VM memory 1:1
but it is used anyway, and the VM now is the first (i.e. not QEMU) to
clear a just created table. Note that the upstream QEMU needs to be
modified to trigger the VM to trigger the host OS crash.

This replaces WARN_ON_ONCE_RM() with a check and return.
This adds another warning if TCE is not being cleared.

Cc: Leonardo Bras <leobras.c@gmail.com>
Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand too")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

With recent changes in the printk() department, calling pr_err() when MMU
off causes lockdep lockups which I did not dig any further so we should
start getting rid of the realmode's WARN_ON_ONCE_RM().
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 083a4e037718..e5ba96c41f3f 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct kvmppc_spapr_tce_table *stt,
 	idx -= stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	/*
-	 * page must not be NULL in real mode,
-	 * kvmppc_rm_ioba_validate() must have taken care of this.
+	 * kvmppc_rm_ioba_validate() allows pages not be allocated if TCE is
+	 * being cleared, otherwise it returns H_TOO_HARD and we skip this.
 	 */
-	WARN_ON_ONCE_RM(!page);
+	if (!page) {
+		WARN_ON_ONCE_RM(tce != 0);
+		return;
+	}
 	tbl = kvmppc_page_address(page);

 	tbl[idx % TCES_PER_PAGE] = tce;
-- 
2.30.2

^ permalink raw reply related

* Re: [PATCH v15 10/12] swiotlb: Add restricted DMA pool initialization
From: Claire Chang @ 2021-08-27  3:50 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: heikki.krogerus, thomas.hellstrom, peterz, joonas.lahtinen,
	dri-devel, chris, grant.likely, paulus, Frank Rowand, mingo,
	Marek Szyprowski, Stefano Stabellini, Saravana Kannan,
	Joerg Roedel, Rafael J . Wysocki, Christoph Hellwig,
	Bartosz Golaszewski, bskeggs, linux-pci, xen-devel,
	Thierry Reding, intel-gfx, matthew.auld, linux-devicetree,
	Jianxiong Gao, Daniel Vetter, Will Deacon, Konrad Rzeszutek Wilk,
	maarten.lankhorst, airlied, Dan Williams, linuxppc-dev,
	jani.nikula, Rob Herring, rodrigo.vivi, Bjorn Helgaas,
	boris.ostrovsky, Andy Shevchenko, jgross, Nicolas Boichat,
	Greg KH, Randy Dunlap, Qian Cai, lkml, Tomasz Figa,
	list@263.net:IOMMU DRIVERS, Jim Quinlan, xypron.glpk,
	Tom Lendacky, Robin Murphy, bauerman
In-Reply-To: <20210824142601.GA3393158@roeck-us.net>

On Tue, Aug 24, 2021 at 10:26 PM Guenter Roeck <linux@roeck-us.net> wrote:
>
> Hi Claire,
>
> On Thu, Jun 24, 2021 at 11:55:24PM +0800, Claire Chang wrote:
> > Add the initialization function to create restricted DMA pools from
> > matching reserved-memory nodes.
> >
> > Regardless of swiotlb setting, the restricted DMA pool is preferred if
> > available.
> >
> > The restricted DMA pools provide a basic level of protection against the
> > DMA overwriting buffer contents at unexpected times. However, to protect
> > against general data leakage and system memory corruption, the system
> > needs to provide a way to lock down the memory access, e.g., MPU.
> >
> > Signed-off-by: Claire Chang <tientzu@chromium.org>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Tested-by: Stefano Stabellini <sstabellini@kernel.org>
> > Tested-by: Will Deacon <will@kernel.org>
> > ---
> >  include/linux/swiotlb.h |  3 +-
> >  kernel/dma/Kconfig      | 14 ++++++++
> >  kernel/dma/swiotlb.c    | 76 +++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 92 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> > index 3b9454d1e498..39284ff2a6cd 100644
> > --- a/include/linux/swiotlb.h
> > +++ b/include/linux/swiotlb.h
> > @@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
> >   *           range check to see if the memory was in fact allocated by this
> >   *           API.
> >   * @nslabs:  The number of IO TLB blocks (in groups of 64) between @start and
> > - *           @end. This is command line adjustable via setup_io_tlb_npages.
> > + *           @end. For default swiotlb, this is command line adjustable via
> > + *           setup_io_tlb_npages.
> >   * @used:    The number of used IO TLB block.
> >   * @list:    The free list describing the number of free entries available
> >   *           from each index.
> > diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> > index 77b405508743..3e961dc39634 100644
> > --- a/kernel/dma/Kconfig
> > +++ b/kernel/dma/Kconfig
> > @@ -80,6 +80,20 @@ config SWIOTLB
> >       bool
> >       select NEED_DMA_MAP_STATE
> >
> > +config DMA_RESTRICTED_POOL
> > +     bool "DMA Restricted Pool"
> > +     depends on OF && OF_RESERVED_MEM
> > +     select SWIOTLB
>
> This makes SWIOTLB user configurable, which in turn results in
>
> mips64-linux-ld: arch/mips/kernel/setup.o: in function `arch_mem_init':
> setup.c:(.init.text+0x19c8): undefined reference to `plat_swiotlb_setup'
> make[1]: *** [Makefile:1280: vmlinux] Error 1
>
> when building mips:allmodconfig.
>
> Should this possibly be "depends on SWIOTLB" ?

Patch is sent here: https://lkml.org/lkml/2021/8/26/932

>
> Thanks,
> Guenter

Thanks,
Claire

^ permalink raw reply

* Re: [PATCH v2 3/4] powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use
From: Nicholas Piggin @ 2021-08-27  1:33 UTC (permalink / raw)
  To: linuxppc-dev, kernel test robot; +Cc: kbuild-all
In-Reply-To: <202108262232.PzC05uqr-lkp@intel.com>

Excerpts from kernel test robot's message of August 27, 2021 1:04 am:
> Hi Nicholas,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on powerpc/next]
> [also build test ERROR on v5.14-rc7 next-20210826]
> [cannot apply to scottwood/next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-interrupt-speedups/20210825-204209
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
> config: powerpc-allnoconfig (attached as .config)

Fix for 32-bit:

diff --git a/arch/powerpc/include/asm/hw_irq.h b/arch/powerpc/include/asm/hw_irq.h
index 8c78c40c006e..55e3fa44f280 100644
--- a/arch/powerpc/include/asm/hw_irq.h
+++ b/arch/powerpc/include/asm/hw_irq.h
@@ -437,7 +437,7 @@ static inline bool arch_irq_disabled_regs(struct pt_regs *regs)
        return !(regs->msr & MSR_EE);
 }
 
-static inline bool may_hard_irq_enable(void)
+static inline bool should_hard_irq_enable(void)
 {
        return false;
 }



^ permalink raw reply related

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Nicholas Piggin @ 2021-08-27  1:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210826153048.GD1583@gate.crashing.org>

Excerpts from Segher Boessenkool's message of August 27, 2021 1:30 am:
> On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
>> >> No, they are all dispatched and issue to the BRU for execution. It's 
>> >> trivial to construct a test of a lot of not taken branches in a row
>> >> and time a loop of it to see it executes at 1 cycle per branch.
>> > 
>> > (s/dispatched/issued/)
>> 
>> ?
> 
> Dispatch is from decode to the issue queues.  Issue is from there to
> execution units.  Dispatch is in-order, issue is not.

I know what those mean, I wonder what your s/dispatched/issued means.
I was saying they are dispatched in response to you saying they never
hit the issue queue.

> 
>> >> How could it validate prediction without issuing? It wouldn't know when
>> >> sources are ready.
>> > 
>> > In the backend.  But that is just how it worked on older cores :-/
>> 
>> Okay. I don't know about older cores than POWER9. Backend would normally 
>> include execution though.
>> Only other place you could do it if you don't
>> issue/exec would be after it goes back in order, like completion.
> 
> You do not have to do the verification in-order: the insn cannot finish
> until it is no longer speculative, that takes care of all ordering
> needed.

Branches *can* finish out of order and speculative as they do in P9 and 
P10. Are you talking about these CPUs or something else which can 
verify branches without issuing them?

> 
>> But that would be horrible for mispredict penalty.
> 
> See the previous point.  Also, any insn known to be mispredicted can be
> flushed immediately anyway.

The point is it has to know sources (CR) to verify (aka execute) the 
branch prediction was right, and if it needs sources then it needs to 
either issue and execute in the out of order part, or it needs to wait
until completion which would seem to be prohibitively expensive. I am
interested to know how it works.

> 
>> >> >> The first problem seems like the show stopper though. AFAIKS it would 
>> >> >> need a special builtin support that does something to create the table
>> >> >> entry, or a guarantee that we could put an inline asm right after the
>> >> >> builtin as a recognized pattern and that would give us the instruction
>> >> >> following the trap.
>> >> > 
>> >> > I'm not quite sure what this means.  Can't you always just put a
>> >> > 
>> >> > bla:	asm("");
>> >> > 
>> >> > in there, and use the address of "bla"?
>> >> 
>> >> Not AFAIKS. Put it where?
>> > 
>> > After wherever you want to know the address after.  You will have to
>> > make sure they stay together somehow.
>> 
>> I still don't follow.
> 
> some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
> empty_asm_that_we_can_take_the_address_of_known_as_B;
> 
> You have to make sure the compiler keeps A and B together, does not
> insert anything between them, does put them in the assembler output in
> the same fragment, etc.

How does all this help our problem of putting the address of the trap 
into the table?

> 
>> If you could give a built in that put a label at the address of the trap 
>> instruction that could be used later by inline asm then that could work
>> too:
>> 
>>     __builtin_labeled_trap("1:");
>>     asm ("    .section __bug_table,\"aw\"  \n\t"
>>          "2:  .4byte 1b - 2b               \n\t"
>>          "    .previous");
> 
> How could a compiler do anything like that?!

How could it add a label at the trap instruction it generates? It didn't 
seem like an outlandish thing to do, but I'm not a compiler writer. It was 
just a handwaving idea to show what we want to be able to do.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v2 RESEND] powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
From: Paul Moore @ 2021-08-27  0:52 UTC (permalink / raw)
  To: Michael Ellerman, rgb
  Cc: linux-kernel, Eric Paris, linux-audit, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <87tujc9srr.fsf@mpe.ellerman.id.au>

On Thu, Aug 26, 2021 at 10:37 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
> Paul Moore <paul@paul-moore.com> writes:
> > On Tue, Aug 24, 2021 at 1:11 PM Christophe Leroy
> > <christophe.leroy@csgroup.eu> wrote:
> >> Le 24/08/2021 à 16:47, Paul Moore a écrit :
> >> > On Tue, Aug 24, 2021 at 9:36 AM Christophe Leroy
> >> > <christophe.leroy@csgroup.eu> wrote:
> >> >>
> >> >> Commit e65e1fc2d24b ("[PATCH] syscall class hookup for all normal
> >> >> targets") added generic support for AUDIT but that didn't include
> >> >> support for bi-arch like powerpc.
> >> >>
> >> >> Commit 4b58841149dc ("audit: Add generic compat syscall support")
> >> >> added generic support for bi-arch.
> >> >>
> >> >> Convert powerpc to that bi-arch generic audit support.
> >> >>
> >> >> Cc: Paul Moore <paul@paul-moore.com>
> >> >> Cc: Eric Paris <eparis@redhat.com>
> >> >> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> >> >> ---
> >> >> Resending v2 with Audit people in Cc
> >> >>
> >> >> v2:
> >> >> - Missing 'git add' for arch/powerpc/include/asm/unistd32.h
> >> >> - Finalised commit description
> >> >> ---
> >> >>   arch/powerpc/Kconfig                |  5 +-
> >> >>   arch/powerpc/include/asm/unistd32.h |  7 +++
> >> >>   arch/powerpc/kernel/Makefile        |  3 --
> >> >>   arch/powerpc/kernel/audit.c         | 84 -----------------------------
> >> >>   arch/powerpc/kernel/compat_audit.c  | 44 ---------------
> >> >>   5 files changed, 8 insertions(+), 135 deletions(-)
> >> >>   create mode 100644 arch/powerpc/include/asm/unistd32.h
> >> >>   delete mode 100644 arch/powerpc/kernel/audit.c
> >> >>   delete mode 100644 arch/powerpc/kernel/compat_audit.c
> >> >
> >> > Can you explain, in detail please, the testing you have done to verify
> >> > this patch?
> >> >
> >>
> >> I built ppc64_defconfig and checked that the generated code is functionnaly equivalent.
> >>
> >> ppc32_classify_syscall() is exactly the same as audit_classify_compat_syscall() except that the
> >> later takes the syscall as second argument (ie in r4) whereas the former takes it as first argument
> >> (ie in r3).
> >>
> >> audit_classify_arch() and powerpc audit_classify_syscall() are slightly different between the
> >> powerpc version and the generic version because the powerpc version checks whether it is
> >> AUDIT_ARCH_PPC or not (ie value 20), while the generic one checks whether it has bit
> >> __AUDIT_ARCH_64BIT set or not (__AUDIT_ARCH_64BIT is the sign bit of a word), but taking into
> >> account that the abi is either AUDIT_ARCH_PPC, AUDIT_ARCH_PPC64 or AUDIT_ARCH_PPC64LE, the result is
> >> the same.
> >>
> >> If you are asking I guess you saw something wrong ?
> >
> > I was asking because I didn't see any mention of testing, and when you
> > are enabling something significant like this it is nice to see that it
> > has been verified to work :)
> >
> > While binary dumps and comparisons are nice, it is always good to see
> > verification from a test suite.  I don't have access to the necessary
> > hardware to test this, but could you verify that the audit-testsuite
> > passes on your test system with your patches applied?
> >
> >  * https://github.com/linux-audit/audit-testsuite
>
> I tested on ppc64le. Both before and after the patch I get the result
> below.
>
> So I guess the patch is OK, but maybe we have some existing issue.
>
> I had a bit of a look at the test code, but my perl is limited. I think
> it was running the command below, and it returned "<no matches>", but
> not really sure what that means.

If it makes you feel any better, my perl is *very* limited; thankfully
this isn't my first time looking at that test :)

It's a little odd, but after some basic sanity tests at the top, the
test sets a watch on a file, /tmp/<rando_string>, and tells the kernel
to generate audit records for any syscall that operates on that file.
It then creates, and removes, a series of exclude audit filters to
check if the exclude filtering is working as expected, e.g. when
syscall filtering is excluded there should be no syscall records in
the audit log.

In the case you describe, it looks like it looks like the audit
exclude filter is removed (that's what line 147 does), the
/tmp/<rando_string> file is removed (line 152), and then we check to
see if any syscall records exist (line 164, and yes, there should be
*something* there for the unlink/rm).

It may be of little consolation, but this test works just fine on
recent kernels running on both x86_64 and aarch64.  I don't have
access to a powerpc system of any vintage, but I added Richard to the
To line above in case he has easier access to a test system (I suspect
the RH/IBM linkage should help in this regard).  Otherwise I would
suggest starting to debug this by simply doing some basic tests using
auditctl to create rules and exclude rules to see what is working, and
what isn't; that might provide some clues.

Sorry I'm not much more help at this point :/

>   $ sudo ausearch -i -m SYSCALL -p 216440 -ui 0 -gi 0 -ul 0 -su unconfined _u:unconfined_r:unconfined_t:s0-s0:c0.c1023 -ts recent
>   <no matches>
>
> cheers
>
>
>
> Running as   user    root
>         with context unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
>         on   system  Fedora
>
> backlog_wait_time_actual_reset/test .. ok
> exec_execve/test ..................... ok
> exec_name/test ....................... ok
> file_create/test ..................... ok
> file_delete/test ..................... ok
> file_rename/test ..................... ok
> filter_exclude/test .................. 1/21
> # Test 20 got: "256" (filter_exclude/test at line 167)
> #    Expected: "0"
> #  filter_exclude/test line 167 is: ok( $result, 0 );
> # Test 21 got: "0" (filter_exclude/test at line 179)
> #    Expected: "1"
> #  filter_exclude/test line 179 is: ok( $found_msg, 1 );
> filter_exclude/test .................. Failed 2/21 subtests
> filter_saddr_fam/test ................ ok
> filter_sessionid/test ................ ok
> login_tty/test ....................... ok
> lost_reset/test ...................... ok
> netfilter_pkt/test ................... ok
> syscalls_file/test ................... ok
> syscall_module/test .................. ok
> time_change/test ..................... ok
> user_msg/test ........................ ok
> fanotify/test ........................ ok
> bpf/test ............................. ok
>
> Test Summary Report
> -------------------
> filter_exclude/test                (Wstat: 0 Tests: 21 Failed: 2)
>   Failed tests:  20-21
> Files=18, Tests=202, 45 wallclock secs ( 0.18 usr  0.03 sys + 20.15 cusr  0.92 csys = 21.28 CPU)
> Result: FAIL
> Failed 1/18 test programs. 2/202 subtests failed.



-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH v2 2/3] selftests/powerpc: Add test for real address error handling
From: Ganesh @ 2021-08-26 12:57 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: mikey, mahesh, npiggin
In-Reply-To: <87eeagc2c1.fsf@mpe.ellerman.id.au>

[-- Attachment #1: Type: text/plain, Size: 1767 bytes --]


On 8/26/21 8:57 AM, Michael Ellerman wrote:
> Ganesh <ganeshgr@linux.ibm.com> writes:
>> On 8/24/21 6:18 PM, Michael Ellerman wrote:
>>
>>> Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
>>>> Add test for real address or control memory address access
>>>> error handling, using NX-GZIP engine.
>>>>
>>>> The error is injected by accessing the control memory address
>>>> using illegal instruction, on successful handling the process
>>>> attempting to access control memory address using illegal
>>>> instruction receives SIGBUS.
>>> ...
>>>
>>>> diff --git a/tools/testing/selftests/powerpc/mce/inject-ra-err.sh b/tools/testing/selftests/powerpc/mce/inject-ra-err.sh
>>>> new file mode 100755
>>>> index 000000000000..3633cdc651a1
>>>> --- /dev/null
>>>> +++ b/tools/testing/selftests/powerpc/mce/inject-ra-err.sh
>>>> @@ -0,0 +1,18 @@
>>>> +#!/bin/bash
>>>> +# SPDX-License-Identifier: GPL-2.0-or-later
>>>> +
>>>> +if [[ ! -w /dev/crypto/nx-gzip ]]; then
>>>> +	echo "WARN: Can't access /dev/crypto/nx-gzip, skipping"
>>>> +	exit 0
>>>> +fi
>>>> +
>>>> +timeout 5 ./inject-ra-err
>>>> +
>>>> +# 128 + 7 (SIGBUS) = 135, 128 is a exit code with special meaning.
>>>> +if [ $? -ne 135 ]; then
>>>> +	echo "FAILED: Real address or Control memory access error not handled"
>>>> +	exit $?
>>>> +fi
>>>> +
>>>> +echo "OK: Real address or Control memory access error is handled"
>>>> +exit 0
>>> I don't think we really need the shell script, we should be able to do
>>> all that in the C code.
>>>
>>> Can you try this?
>> it works!, We need to set timeout, with 120 sec timeout we may flood the dmesg.
> Hmm. Does it keep faulting? The regs->nip += 4 is meant to avoid that.

Yes, it keeps faulting, if we fail to handle and not send SIGBUS to the process.

>
> cheers

[-- Attachment #2: Type: text/html, Size: 2865 bytes --]

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
From: Nathan Chancellor @ 2021-08-26 23:55 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, Nick Desaulniers, linux-kernel, clang-built-linux,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <YSfjWfGbZbpYBn+w@Ryzen-9-3900X.localdomain>

On Thu, Aug 26, 2021 at 11:54:17AM -0700, Nathan Chancellor wrote:
> On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> > Nathan Chancellor <nathan@kernel.org> writes:
> > > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> > >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> > >> flexibility to GCC.
> > ...
> > >
> > > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > > klist_add_tail to trigger over and over on boot when compiling with
> > > clang:
> > >
> > > [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > > [    2.177456][    T1] Modules linked in:
> > > [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> > > [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > > [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> > > [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> > > [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> > > [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > > [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > > [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > > [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > > [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > > [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > > [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > > [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [    2.178088][    T1] Call Trace:
> > > [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > > [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > > [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > > [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > > [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > > [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > > [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > > [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > > [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > > [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > > [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > > [    2.178569][    T1] Instruction dump:
> > > [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > > [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > > [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
> > >
> > > Is this a bug with clang or is there something wrong with the patch? The
> > > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > > command and the warning at boot can be viewed at [2]. If there is any
> > > other information I can provide, please let me know.
> > >
> > > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> > 
> > Thanks.
> > 
> > This is the generated assembly:
> > 
> > c0000000007ff600 <.klist_add_tail>:
> > c0000000007ff600:       7c 08 02 a6     mflr    r0
> > c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
> > c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
> > c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
> > c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
> > c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
> > c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
> > c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
> > c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
> > c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> > c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
> > c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
> > c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
> > c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
> > c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> > c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> > 
> > 
> > From:
> > 
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > 	knode->n_klist = klist;
> > 	/* no knode deserves to start its life dead */
> > 	WARN_ON(knode_dead(knode));
> >                 ^^^^^^^^^^^^^^^^^
> > }
> > 
> > Which expands to:
> > 
> > static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> > {
> > 	knode->n_klist = klist;
> > 
> > 	({
> > 		bool __ret_warn_on = false;
> > 		do {
> >                 ...
> > 			} else {
> > 				__label__ __label_warn_on;
> > 				do {
> > 					asm goto(
> > 						"1:   "
> > 						"tdnei"
> > 						"
> > 						" " % 4,
> > 						0 " "\n " ".section __ex_table,\"a\";"
> > 										" "
> > 										".balign 4;"
> > 										" "
> > 										".long (1b) - . ;"
> > 										" "
> > 										".long (%l[__label_warn_on]) - . ;"
> > 										" "
> > 										".previous"
> > 										" "
> > 										".section __bug_table,\"aw\"\n"
> > 										"2:\t.4byte 1b - 2b, %0 - 2b\n"
> > 										"\t.short %1, %2\n"
> > 										".org 2b+%3\n"
> > 										".previous\n"
> > 						:
> > 						: "i"("lib/klist.c"), "i"(62),
> > 						  "i"((1 << 0) | ((9) << 8)),
> > 						  "i"(sizeof(struct bug_entry)),
> > 						  "r"(knode_dead(knode))
> >                                                   ^^^^^^^^^^^^^^^^^^^^^
> > 
> > 						:
> > 						: __label_warn_on);
> > 					asm("");
> > 				} while (0);
> > 				break;
> > 			__label_warn_on:
> > 				__ret_warn_on = true;
> > 			}
> > 		} while (0);
> > 		__builtin_expect(!!(__ret_warn_on), 0);
> > 	});
> > }
> > 
> > And knode_dead is:
> > 
> > #define KNODE_DEAD		1LU
> > 
> > static bool knode_dead(struct klist_node *knode)
> > {
> > 	return (unsigned long)knode->n_klist & KNODE_DEAD;
> > }
> > 
> > 
> > So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> > 
> > But in the asm:
> > 
> > c0000000007ff600 <.klist_add_tail>:
> > ...
> > c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> > ...
> > c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> > c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> > 
> > 
> > It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> > 
> > In the GCC output you can see it:
> > 
> > c0000000008c8a30 <.klist_node_init>:
> > c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
> > c0000000008c8a34:       39 40 00 01     li      r10,1
> > c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
> > c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
> > c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
> > c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
> > c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
> > c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0
> > 
> > ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> > except bit 0, which is equivalent to "& KNODE_DEAD".
> > 
> > 
> > So there seems to be some misunderstanding with clang, it doesn't like
> > us passing an expression to the inline asm.
> > 
> > AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> > doesn't have to just be a variable name.
> > 
> > This patch seems to fix it. Not sure if that's just papering over it though.
> > 
> > diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> > index 1ee0f22313ee..75fcb4370d96 100644
> > --- a/arch/powerpc/include/asm/bug.h
> > +++ b/arch/powerpc/include/asm/bug.h
> > @@ -119,7 +119,7 @@ __label_warn_on:						\
> >  								\
> >  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
> >  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> > -				   __label_warn_on, "r" (x));	\
> > +				   __label_warn_on, "r" (!!(x))); \
> >  			break;					\
> >  __label_warn_on:						\
> >  			__ret_warn_on = true;			\
> > 
> > 
> > Generates:
> > 
> > c0000000008e2ac0 <.klist_add_tail>:
> > c0000000008e2ac0:       7c 08 02 a6     mflr    r0
> > c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
> > c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
> > c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
> > c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
> > c0000000008e2ad4:       38 60 00 01     li      r3,1
> > c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
> > c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
> > c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
> > c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
> > c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
> > c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
> > c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
> > c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
> > c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
> > c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
> > c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-
> 
> Thank you for the in-depth explanation and triage! I have gone ahead and
> created a standalone reproducer that shows this based on the
> preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
> so the LLVM developers can take a look.

Based on Eli Friedman's comment in the bug, would something like this
work and not regress GCC? I noticed that the BUG_ON macro does a cast as
well. Nick pointed out to me privately that we have run into what seems
like a similar issue before in commit 6c58f25e6938 ("riscv/atomic: Fix
sign extension for RV64I").

Cheers,
Nathan

diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..35022667f57d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -119,7 +119,7 @@ __label_warn_on:                                            \
                                                                \
                        WARN_ENTRY(PPC_TLNEI " %4, 0",          \
                                   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN), \
-                                  __label_warn_on, "r" (x));   \
+                                  __label_warn_on, "r" ((__force long)(x)));   \
                        break;                                  \
 __label_warn_on:                                               \
                        __ret_warn_on = true;                   \

^ permalink raw reply related

* Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
From: Sean Christopherson @ 2021-08-26 23:54 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: KVM list, Peter Zijlstra, Catalin Marinas, linux-kernel,
	Will Deacon, Guo Ren, linux-kselftest, Ben Gardon, shuah,
	Paul Mackerras, linux-s390, gor, Russell King, ARM Linux,
	linux-csky, Christian Borntraeger, Ingo Molnar, dvhart,
	linux-mips, Boqun Feng, paulmck, Heiko Carstens, rostedt,
	Shakeel Butt, Andy Lutomirski, Thomas Gleixner, Peter Foley,
	linux-arm-kernel, Thomas Bogendoerfer, Oleg Nesterov,
	Paolo Bonzini, linuxppc-dev
In-Reply-To: <1700758714.29394.1630003332081.JavaMail.zimbra@efficios.com>

On Thu, Aug 26, 2021, Mathieu Desnoyers wrote:
> ----- On Aug 25, 2021, at 8:51 PM, Sean Christopherson seanjc@google.com wrote:
> >> >> +		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
> >> >> +		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
> >> >> +			    errno, strerror(errno));
> >> >> +		atomic_inc(&seq_cnt);
> >> >> +
> >> >> +		CPU_CLR(cpu, &allowed_mask);
> >> >> +
> >> >> +		/*
> >> >> +		 * Let the read-side get back into KVM_RUN to improve the odds
> >> >> +		 * of task migration coinciding with KVM's run loop.
> >> > 
> >> > This comment should be about increasing the odds of letting the seqlock
> >> > read-side complete. Otherwise, the delay between the two back-to-back
> >> > atomic_inc is so small that the seqlock read-side may never have time to
> >> > complete the reading the rseq cpu id and the sched_getcpu() call, and can
> >> > retry forever.
> > 
> > Hmm, but that's not why there's a delay.  I'm not arguing that a livelock isn't
> > possible (though that syscall would have to be screaming fast),
> 
> I don't think we have the same understanding of the livelock scenario. AFAIU the livelock
> would be caused by a too-small delay between the two consecutive atomic_inc() from one
> loop iteration to the next compared to the time it takes to perform a read-side critical
> section of the seqlock. Back-to-back atomic_inc can be performed very quickly, so I
> doubt that the sched_getcpu implementation have good odds to be fast enough to complete
> in that narrow window, leading to lots of read seqlock retry.

Ooooh, yeah, brain fart on my side.  I was thinking of the two atomic_inc() in the
same loop iteration and completely ignoring the next iteration.  Yes, I 100% agree
that a delay to ensure forward progress is needed.  An assertion in main() that the
reader complete at least some reasonable number of KVM_RUNs is also probably a good
idea, e.g. to rule out a false pass due to the reader never making forward progress.

FWIW, the do-while loop does make forward progress without a delay, but at ~50% 
throughput, give or take.

> > but the primary motivation is very much to allow the read-side enough time
> > to get back into KVM proper.
> 
> I'm puzzled by your statement. AFAIU, let's say we don't have the delay, then we
> have:
> 
> migration thread                             KVM_RUN/read-side thread
> -----------------------------------------------------------------------------------
>                                              - ioctl(KVM_RUN)
> - atomic_inc_seq_cst(&seqcnt)
> - sched_setaffinity
> - atomic_inc_seq_cst(&seqcnt)
>                                              - a = atomic_load(&seqcnt) & ~1
>                                              - smp_rmb()
>                                              - b = LOAD_ONCE(__rseq_abi->cpu_id);
>                                              - sched_getcpu()
>                                              - smp_rmb()
>                                              - re-load seqcnt/compare (succeeds)
>                                                - Can only succeed if entire read-side happens while the seqcnt
>                                                  is in an even numbered state.
>                                              - if (a != b) abort()
>   /* no delay. Even counter state is very
>      short. */
> - atomic_inc_seq_cst(&seqcnt)
>   /* Let's suppose the lack of delay causes the
>      setaffinity to complete too early compared
>      with KVM_RUN ioctl */
> - sched_setaffinity
> - atomic_inc_seq_cst(&seqcnt)
> 
>   /* no delay. Even counter state is very
>      short. */
> - atomic_inc_seq_cst(&seqcnt)
>   /* Then a setaffinity from a following
>      migration thread loop will run
>      concurrently with KVM_RUN */
>                                              - ioctl(KVM_RUN)
> - sched_setaffinity
> - atomic_inc_seq_cst(&seqcnt)
> 
> As pointed out here, if the first setaffinity runs too early compared with KVM_RUN,
> a following setaffinity will run concurrently with it. However, the fact that 
> the even counter state is very short may very well hurt progress of the read seqlock.

*sigh*

Several hours later, I think I finally have my head wrapped around everything.

Due to the way the test is written and because of how KVM's run loop works,
TIF_NOTIFY_RESUME or TIF_NEED_RESCHED effectively has to be set before KVM actually
enters the guest, otherwise KVM will exit to userspace without touching the flag,
i.e. it will be handled by the normal exit_to_user_mode_loop().

Where I got lost was trying to figure out why I couldn't make the bug reproduce by
causing the guest to exit to KVM, but not userspace, in which case KVM should
easily trigger the problematic flow as the window for sched_getcpu() to collide
with KVM would be enormous.  The reason I didn't go down this route for the
"official" test is that, unless there's something clever I'm overlooking, it
requires arch specific guest code, and ialso I don't know that forcing an exit
would even be necessary/sufficient on other architectures.

Anyways, I was trying to confirm that the bug was being hit without a delay, while
still retaining the sequence retry in the test.  The test doesn't fail because the
back-to-back atomic_inc() changes seqcnt too fast.  The read-side makes forward
progress, but it never observes failure because the do-while loop only ever
completes after another sched_setaffinity(), never after the one that collides
with KVM because it takes too long to get out of ioctl(KVM_RUN) and back to the
test.  I.e. the atomic_inc() in the next loop iteration (makes seq_cnt odd) always
completes before the check, and so the check ends up spinning until another
migration, which correctly updates rseq.  This was expected and didn't confuse me.

What confused me is that I was trying to confirm the bug was being hit from within
the kernel by confirming KVM observed TIF_NOTIFY_RESUME, but I misunderstood when
TIF_NOTIFY_RESUME would get set.  KVM can observe TIF_NOTIFY_RESUME directly, but
it's rare, and I suspect happens iff sched_setaffinity() hits the small window where
it collides with KVM_RUN before KVM enters the guest.

More commonly, the bug occurs when KVM sees TIF_NEED_RESCHED.  In that case, KVM
calls xfer_to_guest_mode_work(), which does schedule() and _that_ sets
TIF_NOTIFY_RESUME.  xfer_to_guest_mode_work() then mishandles TIF_NOTIFY_RESUME
and the bug is hit, but my confirmation logic in KVM never fired.

So there are effectively three reasons we want a delay:

  1. To allow sched_setaffinity() to coincide with ioctl(KVM_RUN) before KVM can
     enter the guest so that the guest doesn't need an arch-specific VM-Exit source.

  2. To let ioctl(KVM_RUN) make its way back to the test before the next round
     of migration.

  3. To ensure the read-side can make forward progress, e.g. if sched_getcpu()
     involves a syscall.

After looking at KVM for arm64 and s390, #1 is a bit tenuous because x86 is the
only arch that currently uses xfer_to_guest_mode_work(), i.e. the test could be
tweaked to be overtly x86-specific.  But since a delay is needed for #2 and #3,
I'd prefer to rely on it for #1 as well in the hopes that this test provides
coverage for arm64 and/or s390 if they're ever converted to use the common
xfer_to_guest_mode_work().

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
From: Nathan Chancellor @ 2021-08-26 18:54 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <87h7fcc2m4.fsf@mpe.ellerman.id.au>

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> Nathan Chancellor <nathan@kernel.org> writes:
> > On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
> >> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
> >> flexibility to GCC.
> ...
> >
> > This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
> > flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
> > klist_add_tail to trigger over and over on boot when compiling with
> > clang:
> >
> > [    2.177416][    T1] WARNING: CPU: 0 PID: 1 at lib/klist.c:62 .klist_add_tail+0x3c/0x110
> > [    2.177456][    T1] Modules linked in:
> > [    2.177481][    T1] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W         5.14.0-rc7-next-20210825 #1
> > [    2.177520][    T1] NIP:  c0000000007ff81c LR: c00000000090a038 CTR: 0000000000000000
> > [    2.177557][    T1] REGS: c0000000073c32a0 TRAP: 0700   Tainted: G        W          (5.14.0-rc7-next-20210825)
> > [    2.177593][    T1] MSR:  8000000002029032 <SF,VEC,EE,ME,IR,DR,RI>  CR: 22000a40  XER: 00000000
> > [    2.177667][    T1] CFAR: c00000000090a034 IRQMASK: 0
> > [    2.177667][    T1] GPR00: c00000000090a038 c0000000073c3540 c000000001be3200 0000000000000001
> > [    2.177667][    T1] GPR04: c0000000072d65c0 0000000000000000 c0000000091ba798 c0000000091bb0a0
> > [    2.177667][    T1] GPR08: 0000000000000001 0000000000000000 c000000008581918 fffffffffffffc00
> > [    2.177667][    T1] GPR12: 0000000044000240 c000000001dd0000 c000000000012300 0000000000000000
> > [    2.177667][    T1] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    2.177667][    T1] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [    2.177667][    T1] GPR24: 0000000000000000 c0000000017e3200 0000000000000000 c000000001a0e778
> > [    2.177667][    T1] GPR28: c0000000072d65b0 c0000000072d65a8 c000000007de72c8 c0000000073c35d0
> > [    2.178019][    T1] NIP [c0000000007ff81c] .klist_add_tail+0x3c/0x110
> > [    2.178058][    T1] LR [c00000000090a038] .bus_add_driver+0x148/0x290
> > [    2.178088][    T1] Call Trace:
> > [    2.178105][    T1] [c0000000073c3540] [c0000000073c35d0] 0xc0000000073c35d0 (unreliable)
> > [    2.178150][    T1] [c0000000073c35d0] [c00000000090a038] .bus_add_driver+0x148/0x290
> > [    2.178190][    T1] [c0000000073c3670] [c00000000090fae8] .driver_register+0xb8/0x190
> > [    2.178234][    T1] [c0000000073c3700] [c000000000be55c0] .__hid_register_driver+0x70/0xd0
> > [    2.178275][    T1] [c0000000073c37a0] [c00000000116955c] .redragon_driver_init+0x34/0x58
> > [    2.178314][    T1] [c0000000073c3820] [c000000000011ae0] .do_one_initcall+0x130/0x3b0
> > [    2.178357][    T1] [c0000000073c3bb0] [c0000000011065e0] .do_initcall_level+0xd8/0x188
> > [    2.178403][    T1] [c0000000073c3c50] [c0000000011064a8] .do_initcalls+0x7c/0xdc
> > [    2.178445][    T1] [c0000000073c3ce0] [c000000001106238] .kernel_init_freeable+0x178/0x21c
> > [    2.178491][    T1] [c0000000073c3d90] [c000000000012334] .kernel_init+0x34/0x220
> > [    2.178530][    T1] [c0000000073c3e10] [c00000000000cf50] .ret_from_kernel_thread+0x58/0x60
> > [    2.178569][    T1] Instruction dump:
> > [    2.178592][    T1] fba10078 7c7d1b78 38600001 fb810070 3b9d0008 fbc10080 7c9e2378 389d0018
> > [    2.178662][    T1] fb9d0008 fb9d0010 90640000 fbdd0000 <0b1e0000> e87e0018 28230000 41820024
> > [    2.178728][    T1] ---[ end trace 52ed3431f58f1847 ]---
> >
> > Is this a bug with clang or is there something wrong with the patch? The
> > vmlinux image is available at [1] if you want to inspect it and our QEMU
> > command and the warning at boot can be viewed at [2]. If there is any
> > other information I can provide, please let me know.
> >
> > [1] https://builds.tuxbuild.com/1xDcmp3Tvno0TTGxDVPedRKIKM2/
> > [2] https://github.com/ClangBuiltLinux/continuous-integration2/commit/cee159b66a58eb57fa2359e7888074b9da24126c/checks/3422232736/logs
> 
> Thanks.
> 
> This is the generated assembly:
> 
> c0000000007ff600 <.klist_add_tail>:
> c0000000007ff600:       7c 08 02 a6     mflr    r0
> c0000000007ff604:       f8 01 00 10     std     r0,16(r1)
> c0000000007ff608:       f8 21 ff 71     stdu    r1,-144(r1)	^ prolog
> c0000000007ff60c:       fb a1 00 78     std     r29,120(r1)	save r29 to stack
> c0000000007ff610:       7c 7d 1b 78     mr      r29,r3		r29 = struct klist_node *n
> c0000000007ff614:       38 60 00 01     li      r3,1		r3 = 1
> c0000000007ff618:       fb 81 00 70     std     r28,112(r1)	save r28 to stack
> c0000000007ff61c:       3b 9d 00 08     addi    r28,r29,8	r28 = &n->n_node
> c0000000007ff620:       fb c1 00 80     std     r30,128(r1)	save r30 to stack
> c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> c0000000007ff628:       38 9d 00 18     addi    r4,r29,24	r4 = &n->n_ref
> c0000000007ff62c:       fb 9d 00 08     std     r28,8(r29)	n->n_node.next = &n->n_node	INIT_LIST_HEAD
> c0000000007ff630:       fb 9d 00 10     std     r28,16(r29)	n->n_node.prev = &n->n_node
> c0000000007ff634:       90 64 00 00     stw     r3,0(r4)	kref_init(&n->n_ref)
> c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> 
> 
> From:
> 
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> 	knode->n_klist = klist;
> 	/* no knode deserves to start its life dead */
> 	WARN_ON(knode_dead(knode));
>                 ^^^^^^^^^^^^^^^^^
> }
> 
> Which expands to:
> 
> static void knode_set_klist(struct klist_node *knode, struct klist *klist)
> {
> 	knode->n_klist = klist;
> 
> 	({
> 		bool __ret_warn_on = false;
> 		do {
>                 ...
> 			} else {
> 				__label__ __label_warn_on;
> 				do {
> 					asm goto(
> 						"1:   "
> 						"tdnei"
> 						"
> 						" " % 4,
> 						0 " "\n " ".section __ex_table,\"a\";"
> 										" "
> 										".balign 4;"
> 										" "
> 										".long (1b) - . ;"
> 										" "
> 										".long (%l[__label_warn_on]) - . ;"
> 										" "
> 										".previous"
> 										" "
> 										".section __bug_table,\"aw\"\n"
> 										"2:\t.4byte 1b - 2b, %0 - 2b\n"
> 										"\t.short %1, %2\n"
> 										".org 2b+%3\n"
> 										".previous\n"
> 						:
> 						: "i"("lib/klist.c"), "i"(62),
> 						  "i"((1 << 0) | ((9) << 8)),
> 						  "i"(sizeof(struct bug_entry)),
> 						  "r"(knode_dead(knode))
>                                                   ^^^^^^^^^^^^^^^^^^^^^
> 
> 						:
> 						: __label_warn_on);
> 					asm("");
> 				} while (0);
> 				break;
> 			__label_warn_on:
> 				__ret_warn_on = true;
> 			}
> 		} while (0);
> 		__builtin_expect(!!(__ret_warn_on), 0);
> 	});
> }
> 
> And knode_dead is:
> 
> #define KNODE_DEAD		1LU
> 
> static bool knode_dead(struct klist_node *knode)
> {
> 	return (unsigned long)knode->n_klist & KNODE_DEAD;
> }
> 
> 
> So it's meant to warn if (n_klist & KNODE_DEAD) is not equal to zero.
> 
> But in the asm:
> 
> c0000000007ff600 <.klist_add_tail>:
> ...
> c0000000007ff624:       7c 9e 23 78     mr      r30,r4		r30 = struct klist *k
> ...
> c0000000007ff638:       fb dd 00 00     std     r30,0(r29)	n->n_klist = k
> c0000000007ff63c:       0b 1e 00 00     tdnei   r30,0		trap if r30 (k) is not zero
> 
> 
> It's just warning if n_klist is not equal to zero. ie. we lost the "& KNODE_DEAD".
> 
> In the GCC output you can see it:
> 
> c0000000008c8a30 <.klist_node_init>:
> c0000000008c8a30:       39 24 00 08     addi    r9,r4,8
> c0000000008c8a34:       39 40 00 01     li      r10,1
> c0000000008c8a38:       f9 24 00 08     std     r9,8(r4)
> c0000000008c8a3c:       f9 24 00 10     std     r9,16(r4)
> c0000000008c8a40:       91 44 00 18     stw     r10,24(r4)
> c0000000008c8a44:       f8 64 00 00     std     r3,0(r4)
> c0000000008c8a48:       54 69 07 fe     clrlwi  r9,r3,31
> c0000000008c8a4c:       0b 09 00 00     tdnei   r9,0
> 
> ie. the clrlwi is "clear left (word) immediate", and zeroes everything
> except bit 0, which is equivalent to "& KNODE_DEAD".
> 
> 
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
> 
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.
> 
> This patch seems to fix it. Not sure if that's just papering over it though.
> 
> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
> index 1ee0f22313ee..75fcb4370d96 100644
> --- a/arch/powerpc/include/asm/bug.h
> +++ b/arch/powerpc/include/asm/bug.h
> @@ -119,7 +119,7 @@ __label_warn_on:						\
>  								\
>  			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>  				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
> -				   __label_warn_on, "r" (x));	\
> +				   __label_warn_on, "r" (!!(x))); \
>  			break;					\
>  __label_warn_on:						\
>  			__ret_warn_on = true;			\
> 
> 
> Generates:
> 
> c0000000008e2ac0 <.klist_add_tail>:
> c0000000008e2ac0:       7c 08 02 a6     mflr    r0
> c0000000008e2ac4:       f8 01 00 10     std     r0,16(r1)
> c0000000008e2ac8:       f8 21 ff 71     stdu    r1,-144(r1)
> c0000000008e2acc:       fb a1 00 78     std     r29,120(r1)
> c0000000008e2ad0:       7c 7d 1b 78     mr      r29,r3
> c0000000008e2ad4:       38 60 00 01     li      r3,1
> c0000000008e2ad8:       fb c1 00 80     std     r30,128(r1)
> c0000000008e2adc:       7c 9e 23 78     mr      r30,r4
> c0000000008e2ae0:       38 9d 00 18     addi    r4,r29,24
> c0000000008e2ae4:       57 c5 07 fe     clrlwi  r5,r30,31	<-
> c0000000008e2ae8:       fb 81 00 70     std     r28,112(r1)
> c0000000008e2aec:       3b 9d 00 08     addi    r28,r29,8
> c0000000008e2af0:       fb 9d 00 08     std     r28,8(r29)
> c0000000008e2af4:       fb 9d 00 10     std     r28,16(r29)
> c0000000008e2af8:       90 64 00 00     stw     r3,0(r4)
> c0000000008e2afc:       fb dd 00 00     std     r30,0(r29)
> c0000000008e2b00:       0b 05 00 00     tdnei   r5,0		<-

Thank you for the in-depth explanation and triage! I have gone ahead and
created a standalone reproducer that shows this based on the
preprocessed file and opened https://bugs.llvm.org/show_bug.cgi?id=51634
so the LLVM developers can take a look.

Cheers,
Nathan

^ permalink raw reply

* Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs
From: Mathieu Desnoyers @ 2021-08-26 18:42 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: KVM list, Peter Zijlstra, Catalin Marinas, linux-kernel,
	Will Deacon, Guo Ren, linux-kselftest, Ben Gardon, shuah,
	Paul Mackerras, linux-s390, gor, Russell King, ARM Linux,
	linux-csky, Christian Borntraeger, Ingo Molnar, dvhart,
	linux-mips, Boqun Feng, paulmck, Heiko Carstens, rostedt,
	Shakeel Butt, Andy Lutomirski, Thomas Gleixner, Peter Foley,
	linux-arm-kernel, Thomas Bogendoerfer, Oleg Nesterov,
	Paolo Bonzini, linuxppc-dev
In-Reply-To: <YSblqrrpKcORzilX@google.com>

----- On Aug 25, 2021, at 8:51 PM, Sean Christopherson seanjc@google.com wrote:

> On Mon, Aug 23, 2021, Mathieu Desnoyers wrote:
>> [ re-send to Darren Hart ]
>> 
>> ----- On Aug 23, 2021, at 11:18 AM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>> 
>> > ----- On Aug 20, 2021, at 6:50 PM, Sean Christopherson seanjc@google.com wrote:
>> > 
>> >> Add a test to verify an rseq's CPU ID is updated correctly if the task is
>> >> migrated while the kernel is handling KVM_RUN.  This is a regression test
>> >> for a bug introduced by commit 72c3c0fe54a3 ("x86/kvm: Use generic xfer
>> >> to guest work function"), where TIF_NOTIFY_RESUME would be cleared by KVM
>> >> without updating rseq, leading to a stale CPU ID and other badness.
>> >> 
>> > 
>> > [...]
>> > 
>> > +#define RSEQ_SIG 0xdeadbeef
>> > 
>> > Is there any reason for defining a custom signature rather than including
>> > tools/testing/selftests/rseq/rseq.h ? This should take care of including
>> > the proper architecture header which will define the appropriate signature.
>> > 
>> > Arguably you don't define rseq critical sections in this test per se, but
>> > I'm wondering why the custom signature here.
> 
> Partly to avoid taking a dependency on rseq.h, and partly to try to call out
> that
> the test doesn't actually do any rseq critical sections.

It might be good to add a comment near this define stating this then, so nobody
attempts to copy this as an example.

> 
>> > [...]
>> > 
>> >> +
>> >> +static void *migration_worker(void *ign)
>> >> +{
>> >> +	cpu_set_t allowed_mask;
>> >> +	int r, i, nr_cpus, cpu;
>> >> +
>> >> +	CPU_ZERO(&allowed_mask);
>> >> +
>> >> +	nr_cpus = CPU_COUNT(&possible_mask);
>> >> +
>> >> +	for (i = 0; i < 20000; i++) {
>> >> +		cpu = i % nr_cpus;
>> >> +		if (!CPU_ISSET(cpu, &possible_mask))
>> >> +			continue;
>> >> +
>> >> +		CPU_SET(cpu, &allowed_mask);
>> >> +
>> >> +		/*
>> >> +		 * Bump the sequence count twice to allow the reader to detect
>> >> +		 * that a migration may have occurred in between rseq and sched
>> >> +		 * CPU ID reads.  An odd sequence count indicates a migration
>> >> +		 * is in-progress, while a completely different count indicates
>> >> +		 * a migration occurred since the count was last read.
>> >> +		 */
>> >> +		atomic_inc(&seq_cnt);
>> > 
>> > So technically this atomic_inc contains the required barriers because the
>> > selftests implementation uses "__sync_add_and_fetch(&addr->val, 1)". But
>> > it's rather odd that the semantic differs from the kernel implementation in
>> > terms of memory barriers: the kernel implementation of atomic_inc
>> > guarantees no memory barriers, but this one happens to provide full
>> > barriers pretty much by accident (selftests futex/include/atomic.h
>> > documents no such guarantee).
> 
> Yeah, I got quite lost trying to figure out what atomics the test would actually
> end up with.

At the very least, until things are clarified in the selftests atomic header, I would
recommend adding a comment stating which memory barriers are required alongside each
use of atomic_inc here. I would even prefer that we add extra (currently unneeded)
write barriers to make extra sure that this stays documented. Performance of the write-side
does not matter much here.

> 
>> > If this full barrier guarantee is indeed provided by the selftests atomic.h
>> > header, I would really like a comment stating that in the atomic.h header
>> > so the carpet is not pulled from under our feet by a future optimization.
>> > 
>> > 
>> >> +		r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
>> >> +		TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
>> >> +			    errno, strerror(errno));
>> >> +		atomic_inc(&seq_cnt);
>> >> +
>> >> +		CPU_CLR(cpu, &allowed_mask);
>> >> +
>> >> +		/*
>> >> +		 * Let the read-side get back into KVM_RUN to improve the odds
>> >> +		 * of task migration coinciding with KVM's run loop.
>> > 
>> > This comment should be about increasing the odds of letting the seqlock
>> > read-side complete. Otherwise, the delay between the two back-to-back
>> > atomic_inc is so small that the seqlock read-side may never have time to
>> > complete the reading the rseq cpu id and the sched_getcpu() call, and can
>> > retry forever.
> 
> Hmm, but that's not why there's a delay.  I'm not arguing that a livelock isn't
> possible (though that syscall would have to be screaming fast),

I don't think we have the same understanding of the livelock scenario. AFAIU the livelock
would be caused by a too-small delay between the two consecutive atomic_inc() from one
loop iteration to the next compared to the time it takes to perform a read-side critical
section of the seqlock. Back-to-back atomic_inc can be performed very quickly, so I
doubt that the sched_getcpu implementation have good odds to be fast enough to complete
in that narrow window, leading to lots of read seqlock retry.

> but the primary
> motivation is very much to allow the read-side enough time to get back into KVM
> proper.

I'm puzzled by your statement. AFAIU, let's say we don't have the delay, then we
have:

migration thread                             KVM_RUN/read-side thread
-----------------------------------------------------------------------------------
                                             - ioctl(KVM_RUN)
- atomic_inc_seq_cst(&seqcnt)
- sched_setaffinity
- atomic_inc_seq_cst(&seqcnt)
                                             - a = atomic_load(&seqcnt) & ~1
                                             - smp_rmb()
                                             - b = LOAD_ONCE(__rseq_abi->cpu_id);
                                             - sched_getcpu()
                                             - smp_rmb()
                                             - re-load seqcnt/compare (succeeds)
                                               - Can only succeed if entire read-side happens while the seqcnt
                                                 is in an even numbered state.
                                             - if (a != b) abort()
  /* no delay. Even counter state is very
     short. */
- atomic_inc_seq_cst(&seqcnt)
  /* Let's suppose the lack of delay causes the
     setaffinity to complete too early compared
     with KVM_RUN ioctl */
- sched_setaffinity
- atomic_inc_seq_cst(&seqcnt)

  /* no delay. Even counter state is very
     short. */
- atomic_inc_seq_cst(&seqcnt)
  /* Then a setaffinity from a following
     migration thread loop will run
     concurrently with KVM_RUN */
                                             - ioctl(KVM_RUN)
- sched_setaffinity
- atomic_inc_seq_cst(&seqcnt)

As pointed out here, if the first setaffinity runs too early compared with KVM_RUN,
a following setaffinity will run concurrently with it. However, the fact that 
the even counter state is very short may very well hurt progress of the read seqlock.

> 
> To encounter the bug, TIF_NOTIFY_RESUME has to be recognized by KVM in its run
> loop, i.e. sched_setaffinity() must induce task migration after the read-side
> has
> invoked ioctl(KVM_RUN).

No argument here.

> 
>> > I'm wondering if 1 microsecond is sufficient on other architectures as
>> > well.
> 
> I'm definitely wondering that as well :-)
> 
>> > One alternative way to make this depend less on the architecture's
>> > implementation of sched_getcpu (whether it's a vDSO, or goes through a
>> > syscall) would be to read the rseq cpu id and call sched_getcpu a few times
>> > (e.g. 3 times) in the migration thread rather than use usleep, and throw
>> > away the value read. This would ensure the delay is appropriate on all
>> > architectures.
> 
> As above, I think an arbitrary delay is required regardless of how fast
> sched_getcpu() can execute.  One thought would be to do sched_getcpu() _and_
> usleep() to account for sched_getcpu() overhead and to satisfy the KVM_RUN part,
> but I don't know that that adds meaningful value.
> 
> The real test is if someone could see if the bug repros on non-x86 hardware...

As I explain in the scenario above, I don't think we agree on the reason why the
delay is required. One way to confirm this would be to run the code without delay,
and count how many seqcnt read-side succeed vs fail. We can then compare that with
a run with a 1us delay between the migration thread loops. This data should help us
come to a common understanding of the delay's role.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply

* Re: [next-20210820][ppc][nvme/raid] pci unbind WARNS at fs/kernfs/dir.c:1524 kernfs_remove_by_name_ns+
From: Christoph Hellwig @ 2021-08-26 16:30 UTC (permalink / raw)
  To: Abdul Haleem
  Cc: Brian King, linux-next, linuxppc-dev, linux-kernel, linux-scsi
In-Reply-To: <063e6cf0-94ab-26f2-4fed-aebf1499127c@linux.vnet.ibm.com>

Are you sure this is the very latest linux-next?  This should hav been
fixed by:

    "block: add back the bd_holder_dir reference in bd_link_disk_holder"

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Segher Boessenkool @ 2021-08-26 15:30 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <1629989540.drlhb24t2w.astroid@bobo.none>

On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> >> No, they are all dispatched and issue to the BRU for execution. It's 
> >> trivial to construct a test of a lot of not taken branches in a row
> >> and time a loop of it to see it executes at 1 cycle per branch.
> > 
> > (s/dispatched/issued/)
> 
> ?

Dispatch is from decode to the issue queues.  Issue is from there to
execution units.  Dispatch is in-order, issue is not.

> >> How could it validate prediction without issuing? It wouldn't know when
> >> sources are ready.
> > 
> > In the backend.  But that is just how it worked on older cores :-/
> 
> Okay. I don't know about older cores than POWER9. Backend would normally 
> include execution though.
> Only other place you could do it if you don't
> issue/exec would be after it goes back in order, like completion.

You do not have to do the verification in-order: the insn cannot finish
until it is no longer speculative, that takes care of all ordering
needed.

> But that would be horrible for mispredict penalty.

See the previous point.  Also, any insn known to be mispredicted can be
flushed immediately anyway.

> >> >> The first problem seems like the show stopper though. AFAIKS it would 
> >> >> need a special builtin support that does something to create the table
> >> >> entry, or a guarantee that we could put an inline asm right after the
> >> >> builtin as a recognized pattern and that would give us the instruction
> >> >> following the trap.
> >> > 
> >> > I'm not quite sure what this means.  Can't you always just put a
> >> > 
> >> > bla:	asm("");
> >> > 
> >> > in there, and use the address of "bla"?
> >> 
> >> Not AFAIKS. Put it where?
> > 
> > After wherever you want to know the address after.  You will have to
> > make sure they stay together somehow.
> 
> I still don't follow.

some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
empty_asm_that_we_can_take_the_address_of_known_as_B;

You have to make sure the compiler keeps A and B together, does not
insert anything between them, does put them in the assembler output in
the same fragment, etc.

> If you could give a built in that put a label at the address of the trap 
> instruction that could be used later by inline asm then that could work
> too:
> 
>     __builtin_labeled_trap("1:");
>     asm ("    .section __bug_table,\"aw\"  \n\t"
>          "2:  .4byte 1b - 2b               \n\t"
>          "    .previous");

How could a compiler do anything like that?!


Segher

^ permalink raw reply

* Re: [PATCH v2 3/4] powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use
From: kernel test robot @ 2021-08-26 15:04 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: kbuild-all, Nicholas Piggin
In-Reply-To: <20210825123714.706201-4-npiggin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 6119 bytes --]

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.14-rc7 next-20210826]
[cannot apply to scottwood/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-interrupt-speedups/20210825-204209
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-allnoconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a0eb195f49a01ed45b3f97815470f9c8acaa4991
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nicholas-Piggin/powerpc-64s-interrupt-speedups/20210825-204209
        git checkout a0eb195f49a01ed45b3f97815470f9c8acaa4991
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/kernel/irq.c: In function '__do_irq':
>> arch/powerpc/kernel/irq.c:742:13: error: implicit declaration of function 'should_hard_irq_enable'; did you mean 'do_hard_irq_enable'? [-Werror=implicit-function-declaration]
     742 |         if (should_hard_irq_enable())
         |             ^~~~~~~~~~~~~~~~~~~~~~
         |             do_hard_irq_enable
   In file included from <command-line>:
   In function 'do_hard_irq_enable',
       inlined from '__do_irq' at arch/powerpc/kernel/irq.c:743:3:
>> include/linux/compiler_types.h:328:45: error: call to '__compiletime_assert_34' declared with attribute error: BUILD_BUG failed
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:309:25: note: in definition of macro '__compiletime_assert'
     309 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:328:9: note: in expansion of macro '_compiletime_assert'
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
         |                     ^~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/hw_irq.h:447:9: note: in expansion of macro 'BUILD_BUG'
     447 |         BUILD_BUG();
         |         ^~~~~~~~~
   cc1: all warnings being treated as errors
--
   arch/powerpc/kernel/time.c: In function '____timer_interrupt':
>> arch/powerpc/kernel/time.c:570:13: error: implicit declaration of function 'should_hard_irq_enable'; did you mean 'do_hard_irq_enable'? [-Werror=implicit-function-declaration]
     570 |         if (should_hard_irq_enable()) {
         |             ^~~~~~~~~~~~~~~~~~~~~~
         |             do_hard_irq_enable
   In file included from <command-line>:
   In function 'do_hard_irq_enable',
       inlined from '____timer_interrupt' at arch/powerpc/kernel/time.c:584:3,
       inlined from 'timer_interrupt' at arch/powerpc/kernel/time.c:553:1:
>> include/linux/compiler_types.h:328:45: error: call to '__compiletime_assert_34' declared with attribute error: BUILD_BUG failed
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |                                             ^
   include/linux/compiler_types.h:309:25: note: in definition of macro '__compiletime_assert'
     309 |                         prefix ## suffix();                             \
         |                         ^~~~~~
   include/linux/compiler_types.h:328:9: note: in expansion of macro '_compiletime_assert'
     328 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
      39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
         |                                     ^~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:59:21: note: in expansion of macro 'BUILD_BUG_ON_MSG'
      59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
         |                     ^~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/hw_irq.h:447:9: note: in expansion of macro 'BUILD_BUG'
     447 |         BUILD_BUG();
         |         ^~~~~~~~~
   cc1: all warnings being treated as errors


vim +742 arch/powerpc/kernel/irq.c

   727	
   728	void __do_irq(struct pt_regs *regs)
   729	{
   730		unsigned int irq;
   731	
   732		trace_irq_entry(regs);
   733	
   734		/*
   735		 * Query the platform PIC for the interrupt & ack it.
   736		 *
   737		 * This will typically lower the interrupt line to the CPU
   738		 */
   739		irq = ppc_md.get_irq();
   740	
   741		/* We can hard enable interrupts now to allow perf interrupts */
 > 742		if (should_hard_irq_enable())
   743			do_hard_irq_enable();
   744	
   745		/* And finally process it */
   746		if (unlikely(!irq))
   747			__this_cpu_inc(irq_stat.spurious_irqs);
   748		else
   749			generic_handle_irq(irq);
   750	
   751		trace_irq_exit(regs);
   752	}
   753	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 7152 bytes --]

^ permalink raw reply

* Re: [GIT PULL] retire legacy WR sbc8548 and sbc8641 platforms
From: Michael Ellerman @ 2021-08-26 15:05 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: Scott Wood, Paul Mackerras, linuxppc-dev, linux-kernel
In-Reply-To: <20210824174209.GB160508@windriver.com>

Paul Gortmaker <paul.gortmaker@windriver.com> writes:
> This is unchanged from the original wr_sbc-delete branch sent in January,
> other than to add the Acks from Scott in July, and update the baseline.
>
> Built with ppc64 defconfig and mpc85xx_cds_defconfig and mpc86xx_defconfig
> just to make sure I didn't fat finger anything in the baseline update.

Thanks for following up on this.

I ended up cherry-picking the patches into my branch. I like to keep my
next based on rc2, and merging this would have pulled in everything up
to rc7 into my branch.

I don't think you were planning to merge this branch anywhere else, so
it shouldn't make any difference, but let me know if it's a problem.

It should appear here soon:

  https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git/log/?h=next


cheers
 

> Original v1 text follows below, from:
>
> https://lore.kernel.org/lkml/20210111082823.99562-1-paul.gortmaker@windriver.com
>
> It would be nice to get this in and off our collective to-do list.
>
> Thanks,
> Paul.
>
>   ---
>
> In v2.6.27 (2008, 917f0af9e5a9) the sbc8260 support was implicitly
> retired by not being carried forward through the ppc --> powerpc
> device tree transition.
>
> Then, in v3.6 (2012, b048b4e17cbb) we retired the support for the
> sbc8560 boards.
>
> Next, in v4.18 (2017, 3bc6cf5a86e5) we retired the support for the
> 2006 vintage sbc834x boards.
>
> The sbc8548 and sbc8641d boards were maybe 1-2 years newer than the
> sbc834x boards, but it is also 3+ years later, so it makes sense to
> now retire them as well - which is what is done here.
>
> These two remaining WR boards were based on the Freescale MPC8548-CDS
> and the MPC8641D-HPCN reference board implementations.  Having had the
> chance to use these and many other Fsl ref boards, I know this:  The
> Freescale reference boards were typically produced in limited quantity
> and primarily available to BSP developers and hardware designers, and
> not likely to have found a 2nd life with hobbyists and/or collectors.
>
> It was good to have that BSP code subjected to mainline review and
> hence also widely available back in the day. But given the above, we
> should probably also be giving serious consideration to retiring
> additional similar age/type reference board platforms as well.
>
> I've always felt it is important for us to be proactive in retiring
> old code, since it has a genuine non-zero carrying cost, as described
> in the 930d52c012b8 merge log.  But for the here and now, we just
> clean up the remaining BSP code that I had added for SBC platforms.
>
> --- 
>
> The following changes since commit e22ce8eb631bdc47a4a4ea7ecf4e4ba499db4f93:
>
>   Linux 5.14-rc7 (2021-08-22 14:24:56 -0700)
>
> are available in the Git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git wr_sbc-delete-v2
>
> for you to fetch changes up to d44e2dc12ea2112e74cdd25090eeda2727ed09cc:
>
>   MAINTAINERS: update for Paul Gortmaker (2021-08-24 08:19:01 -0400)
>
> ----------------------------------------------------------------
> Paul Gortmaker (3):
>       powerpc: retire sbc8548 board support
>       powerpc: retire sbc8641d board support
>       MAINTAINERS: update for Paul Gortmaker
>
>  MAINTAINERS                                 |   1 -
>  arch/powerpc/boot/Makefile                  |   1 -
>  arch/powerpc/boot/dts/fsl/sbc8641d.dts      | 176 -----------------
>  arch/powerpc/boot/dts/sbc8548-altflash.dts  | 111 -----------
>  arch/powerpc/boot/dts/sbc8548-post.dtsi     | 289 ----------------------------
>  arch/powerpc/boot/dts/sbc8548-pre.dtsi      |  48 -----
>  arch/powerpc/boot/dts/sbc8548.dts           | 106 ----------
>  arch/powerpc/boot/wrapper                   |   2 +-
>  arch/powerpc/configs/85xx/sbc8548_defconfig |  50 -----
>  arch/powerpc/configs/mpc85xx_base.config    |   1 -
>  arch/powerpc/configs/mpc86xx_base.config    |   1 -
>  arch/powerpc/configs/ppc6xx_defconfig       |   1 -
>  arch/powerpc/platforms/85xx/Kconfig         |   6 -
>  arch/powerpc/platforms/85xx/Makefile        |   1 -
>  arch/powerpc/platforms/85xx/sbc8548.c       | 134 -------------
>  arch/powerpc/platforms/86xx/Kconfig         |   8 +-
>  arch/powerpc/platforms/86xx/Makefile        |   1 -
>  arch/powerpc/platforms/86xx/sbc8641d.c      |  87 ---------
>  18 files changed, 2 insertions(+), 1022 deletions(-)
>  delete mode 100644 arch/powerpc/boot/dts/fsl/sbc8641d.dts
>  delete mode 100644 arch/powerpc/boot/dts/sbc8548-altflash.dts
>  delete mode 100644 arch/powerpc/boot/dts/sbc8548-post.dtsi
>  delete mode 100644 arch/powerpc/boot/dts/sbc8548-pre.dtsi
>  delete mode 100644 arch/powerpc/boot/dts/sbc8548.dts
>  delete mode 100644 arch/powerpc/configs/85xx/sbc8548_defconfig
>  delete mode 100644 arch/powerpc/platforms/85xx/sbc8548.c
>  delete mode 100644 arch/powerpc/platforms/86xx/sbc8641d.c

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Nicholas Piggin @ 2021-08-26 15:04 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210826143708.GC1583@gate.crashing.org>

Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
> On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
>> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> >> trap is always just speculated not to hit. Branches may also have a
>> >> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> >> vs 4 per cycle on POWER9).
>> >> > 
>> >> > I thought only *taken* branches are just one per cycle?
>> >> 
>> >> Taken branches are fetched by the front end at one per cycle (assuming 
>> >> they hit the BTAC), but all branches have to be executed by BR at one 
>> >> per cycle
>> > 
>> > This is not true.  (Simple) predicted not-taken conditional branches are
>> > just folded out, never hit the issue queues.  And they are fetched as
>> > many together as fit in a fetch group, can complete without limits as
>> > well.
>> 
>> No, they are all dispatched and issue to the BRU for execution. It's 
>> trivial to construct a test of a lot of not taken branches in a row
>> and time a loop of it to see it executes at 1 cycle per branch.
> 
> (s/dispatched/issued/)

?

> 
> Huh, this was true on p8 already.  Sorry for my confusion.  In my
> defence, this doesn't matter for performance on "real code".
> 
>> > Correctly predicted simple conditional branches just get their prediction
>> > validated (and that is not done in the execution units).  Incorrectly
>> > predicted branches the same, but those cause a redirect and refetch.
>> 
>> How could it validate prediction without issuing? It wouldn't know when
>> sources are ready.
> 
> In the backend.  But that is just how it worked on older cores :-/

Okay. I don't know about older cores than POWER9. Backend would normally 
include execution though. Only other place you could do it if you don't
issue/exec would be after it goes back in order, like completion. But
that would be horrible for mispredict penalty.

>> >> The first problem seems like the show stopper though. AFAIKS it would 
>> >> need a special builtin support that does something to create the table
>> >> entry, or a guarantee that we could put an inline asm right after the
>> >> builtin as a recognized pattern and that would give us the instruction
>> >> following the trap.
>> > 
>> > I'm not quite sure what this means.  Can't you always just put a
>> > 
>> > bla:	asm("");
>> > 
>> > in there, and use the address of "bla"?
>> 
>> Not AFAIKS. Put it where?
> 
> After wherever you want to know the address after.  You will have to
> make sure they stay together somehow.

I still don't follow.

> It is much easier to get the address of something, not the address after
> it.  If you know it is just one insn anyway, that isn't hard to handle
> either (even if prefixed insns exist).
> 
>> > If not, you need to say a lot
>> > more about what you actually want to do :-/
>> 
>> We need to put the address (or relative offset) of the trap instruction
>> into an entry in a different section. Basically this:
>> 
>>    __builtin_trap();
>>    asm ("1:                               \n\t"
>>         "    .section __bug_table,\"aw\"  \n\t"
>>         "2:  .4byte 1b - 2b - 4           \n\t"
>>         "    .previous");
>> 
>> Where the 1: label must follow immediately after the trap instruction, 
>> and where the asm must be emitted even for the case of no-return traps 
>> (but anything following the asm could be omitted).
> 
> The address *after* the trap insn?  That is much much harder than the
> address *of* the trap insn.

No the address of the trap instruction, hence the -4. I have the label
after because that is the semantics I have from inline asm.

If you could give a built in that put a label at the address of the trap 
instruction that could be used later by inline asm then that could work
too:

    __builtin_labeled_trap("1:");
    asm ("    .section __bug_table,\"aw\"  \n\t"
         "2:  .4byte 1b - 2b               \n\t"
         "    .previous");

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
From: Christophe Leroy @ 2021-08-26 14:53 UTC (permalink / raw)
  To: Michael Ellerman, Nathan Chancellor
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <87sfyw9sel.fsf@mpe.ellerman.id.au>



Le 26/08/2021 à 16:45, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>>> Nathan Chancellor <nathan@kernel.org> writes:
>>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>>> flexibility to GCC.
>>> ...
>>>>
>>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>>> klist_add_tail to trigger over and over on boot when compiling with
>>>> clang:
>>
>> ...
>>
>>>
>>> This patch seems to fix it. Not sure if that's just papering over it though.
>>>
>>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>>> index 1ee0f22313ee..75fcb4370d96 100644
>>> --- a/arch/powerpc/include/asm/bug.h
>>> +++ b/arch/powerpc/include/asm/bug.h
>>> @@ -119,7 +119,7 @@ __label_warn_on:						\
>>>    								\
>>>    			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>>>    				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>>> -				   __label_warn_on, "r" (x));	\
>>> +				   __label_warn_on, "r" (!!(x))); \
>>>    			break;					\
>>>    __label_warn_on:						\
>>>    			__ret_warn_on = true;			\
>>
>> But for a simple WARN_ON() call:
>>
>> void test(unsigned long b)
>> {
>> 	WARN_ON(b);
>> }
>>
>> Without your change with GCC you get:
>>
>> 00000000000012d0 <.test>:
>>       12d0:	0b 03 00 00 	tdnei   r3,0
>>       12d4:	4e 80 00 20 	blr
>>
>>
>> With the !! change you get:
>>
>> 00000000000012d0 <.test>:
>>       12d0:	31 23 ff ff 	addic   r9,r3,-1
>>       12d4:	7d 29 19 10 	subfe   r9,r9,r3
>>       12d8:	0b 09 00 00 	tdnei   r9,0
>>       12dc:	4e 80 00 20 	blr
> 
> Yeah that's a pity.
> 
> We could do something like below, which is ugly, but would be better
> than having to revert the whole thing.

Yes looks nice, we already had that kind of stuff in the past, even more ugly.

> 
> Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

What's the warning ?


> 
> So possibly we need a CLANG ifdef around the whole thing, and use the
> old style warn for clang.
> 

Christophe

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
From: Michael Ellerman @ 2021-08-26 14:45 UTC (permalink / raw)
  To: Christophe Leroy, Nathan Chancellor
  Cc: llvm, linux-kernel, clang-built-linux, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <3fad8702-278a-d9f9-1882-6958ce570bcc@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 26/08/2021 à 05:21, Michael Ellerman a écrit :
>> Nathan Chancellor <nathan@kernel.org> writes:
>>> On Tue, Apr 13, 2021 at 04:38:10PM +0000, Christophe Leroy wrote:
>>>> Using asm goto in __WARN_FLAGS() and WARN_ON() allows more
>>>> flexibility to GCC.
>> ...
>>>
>>> This patch as commit 1e688dd2a3d6 ("powerpc/bug: Provide better
>>> flexibility to WARN_ON/__WARN_FLAGS() with asm goto") cause a WARN_ON in
>>> klist_add_tail to trigger over and over on boot when compiling with
>>> clang:
>
> ...
>
>> 
>> This patch seems to fix it. Not sure if that's just papering over it though.
>> 
>> diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
>> index 1ee0f22313ee..75fcb4370d96 100644
>> --- a/arch/powerpc/include/asm/bug.h
>> +++ b/arch/powerpc/include/asm/bug.h
>> @@ -119,7 +119,7 @@ __label_warn_on:						\
>>   								\
>>   			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
>>   				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
>> -				   __label_warn_on, "r" (x));	\
>> +				   __label_warn_on, "r" (!!(x))); \
>>   			break;					\
>>   __label_warn_on:						\
>>   			__ret_warn_on = true;			\
>
> But for a simple WARN_ON() call:
>
> void test(unsigned long b)
> {
> 	WARN_ON(b);
> }
>
> Without your change with GCC you get:
>
> 00000000000012d0 <.test>:
>      12d0:	0b 03 00 00 	tdnei   r3,0
>      12d4:	4e 80 00 20 	blr
>
>
> With the !! change you get:
>
> 00000000000012d0 <.test>:
>      12d0:	31 23 ff ff 	addic   r9,r3,-1
>      12d4:	7d 29 19 10 	subfe   r9,r9,r3
>      12d8:	0b 09 00 00 	tdnei   r9,0
>      12dc:	4e 80 00 20 	blr

Yeah that's a pity.

We could do something like below, which is ugly, but would be better
than having to revert the whole thing.

Although this doesn't fix the strange warning in drivers/net/ethernet/sfc.

So possibly we need a CLANG ifdef around the whole thing, and use the
old style warn for clang.

cheers


diff --git a/arch/powerpc/include/asm/bug.h b/arch/powerpc/include/asm/bug.h
index 1ee0f22313ee..d978d9004d0d 100644
--- a/arch/powerpc/include/asm/bug.h
+++ b/arch/powerpc/include/asm/bug.h
@@ -106,6 +106,12 @@ __label_warn_on:						\
 	}							\
 } while (0)
 
+#ifdef CONFIG_CC_IS_CLANG
+#define __clang_warn_hack(x)	(!!(x))
+#else
+#define __clang_warn_hack(x)	(x)
+#endif
+
 #define WARN_ON(x) ({						\
 	bool __ret_warn_on = false;				\
 	do {							\
@@ -119,7 +125,8 @@ __label_warn_on:						\
 								\
 			WARN_ENTRY(PPC_TLNEI " %4, 0",		\
 				   BUGFLAG_WARNING | BUGFLAG_TAINT(TAINT_WARN),	\
-				   __label_warn_on, "r" (x));	\
+				   __label_warn_on,		\
+				   "r" __clang_warn_hack(x));	\
 			break;					\
 __label_warn_on:						\
 			__ret_warn_on = true;			\



^ permalink raw reply related

* [next-20210820][ppc][nvme/raid] pci unbind WARNS at fs/kernfs/dir.c:1524 kernfs_remove_by_name_ns+
From: Abdul Haleem @ 2021-08-26 14:41 UTC (permalink / raw)
  To: linux-next; +Cc: Brian King, linuxppc-dev, linux-kernel, linux-scsi

Greeting's

Today's linux-next kernel WARN's while unbind of pci ssd flash device on 
my powerpc box

# lspci -nn
001d:80:00.0 Non-Volatile memory controller [0108]: Seagate Technology 
PLC Nytro Flash Storage [1bb1:0100]
001e:90:00.0 Non-Volatile memory controller [0108]: Seagate Technology 
PLC Nytro Flash Storage [1bb1:0100]

$ echo -n 001d:80:00.0 > /sys/bus/pci/drivers/nvme/unbind

md: md127: raid0 array has a missing/failed member
Buffer I/O error on dev md127, logical block 1498959, async page read
Buffer I/O error on dev md127, logical block 1498959, async page read
md127: detected capacity change from 191866880 to 0
md: md127 stopped.
------------[ cut here ]------------
kernfs: can not remove 'md127', no directory
WARNING: CPU: 21 PID: 11006 at fs/kernfs/dir.c:1524 
kernfs_remove_by_name_ns+0xc0/0x110

Modules linked in: dm_mod rpadlpar_io rpaphp kvm_pr kvm nf_tables 
libcrc32c nfnetlink tcp_diag udp_diag inet_diag unix_diag af_packet_diag 
netlink_diag bridge stp llc rfkill sg pseries_rng raid0 xts vmx_crypto 
uio_pdrv_genirq gf128mul uio nfsd auth_rpcgss nfs_acl lockd grace sunrpc 
binfmt_misc sch_fq_codel ip_tables ext4 mbcache jbd2 sd_mod ibmvscsi 
ibmvnic ibmveth scsi_transport_srp nvme nvme_core t10_piCPU: 21 PID: 
11006 Comm: mdadm Not tainted 5.14.0-rc6-next-20210820-autotest #1

NIP:  c00000000056dda0 LR: c00000000056dd9c CTR: 00000000007088ec
REGS: c000000004593680 TRAP: 0700   Not tainted 
(5.14.0-rc6-next-20210820-autotest)
MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 48044224  
XER: 0000000d
CFAR: c000000000145f00 IRQMASK: 0
GPR00: c00000000056dd9c c000000004593920 c0000000019b3200 000000000000002c
GPR04: 00000000ffff7fff c0000000045935f0 0000000000000027 c00000077cd07e08
GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000000018694b0
GPR12: 0000000000004000 c00000001ec40680 00007fff84875040 ffffffffffffffff
GPR16: 0000000000000000 0000000000000001 0000000000000000 000001001e990280
GPR20: 0000000000000000 0000000000000000 0000000000000066 0000000000030d40
GPR24: c00000002bc591e8 0000000000000000 c00000002bc591c8 c000000025765c00
GPR28: c000000025765c70 c000000025765c00 0000000000000000 c000000025790640
NIP [c00000000056dda0] kernfs_remove_by_name_ns+0xc0/0x110
LR [c00000000056dd9c] kernfs_remove_by_name_ns+0xbc/0x110
Call Trace:
[c000000004593920] [c00000000056dd9c] 
kernfs_remove_by_name_ns+0xbc/0x110 (unreliable)
[c0000000045939b0] [c000000000572368] sysfs_remove_link+0x28/0x70
[c0000000045939d0] [c000000000678bd4] bd_unlink_disk_holder+0xc4/0x130
[c000000004593a10] [c00000000098e4e0] unbind_rdev_from_array+0x40/0x1b0
[c000000004593ac0] [c0000000009921e0] do_md_stop+0x410/0x5a0
[c000000004593bc0] [c0000000009966ac] md_ioctl+0xc6c/0x1c60
[c000000004593cd0] [c000000000654ebc] blkdev_ioctl+0x2fc/0x740
[c000000004593d40] [c0000000004ce284] block_ioctl+0x74/0x90
[c000000004593d60] [c00000000047b6e8] sys_ioctl+0xf8/0x150
[c000000004593db0] [c00000000002ff28] system_call_exception+0x158/0x2c0
[c000000004593e10] [c00000000000c764] system_call_common+0xf4/0x258
--- interrupt: c00 at 0x7fff84790290
NIP:  00007fff84790290 LR: 000000013d5c2ef0 CTR: 0000000000000000
REGS: c000000004593e80 TRAP: 0c00   Not tainted 
(5.14.0-rc6-next-20210820-autotest)
MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 
28044288  XER: 00000000
IRQMASK: 0
GPR00: 0000000000000036 00007fffd0148020 00007fff84877300 0000000000000003
GPR04: 0000000020000932 0000000000000000 0000000000030d40 00007fffd0148028
GPR08: 0000000000000003 0000000000000000 0000000000000000 0000000000000000
GPR12: 0000000000000000 00007fff8494a780 00007fff84875040 ffffffffffffffff
GPR16: 0000000000000000 0000000000000001 0000000000000000 000001001e990280
GPR20: 0000000000000000 0000000000000000 0000000000000066 0000000000030d40
GPR24: 0000000020000932 000001001e992ac0 ffffffffffffffff 000001001e990a40
GPR28: ffffffffffffffff 0000000000000018 00007fffd0148100 0000000000000003
NIP [00007fff84790290] 0x7fff84790290
LR [000000013d5c2ef0] 0x13d5c2ef0
--- interrupt: c00
Instruction dump:
ebe10088 38210090 e8010010 ebc1fff0 7c0803a6 4e800020 60000000 60000000
3c62ff5c 3863d168 4bbd8109 60000000 <0fe00000> fba10078 fbe10088 60000000
---[ end trace 634fa04d6dac7dfd ]---

-- 
Regard's

Abdul Haleem
IBM Linux Technology Center


^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Segher Boessenkool @ 2021-08-26 14:37 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <1629983260.5jkx2jf0y8.astroid@bobo.none>

On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> >> This one possibly the branches end up in predictors, whereas conditional 
> >> >> trap is always just speculated not to hit. Branches may also have a
> >> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> >> vs 4 per cycle on POWER9).
> >> > 
> >> > I thought only *taken* branches are just one per cycle?
> >> 
> >> Taken branches are fetched by the front end at one per cycle (assuming 
> >> they hit the BTAC), but all branches have to be executed by BR at one 
> >> per cycle
> > 
> > This is not true.  (Simple) predicted not-taken conditional branches are
> > just folded out, never hit the issue queues.  And they are fetched as
> > many together as fit in a fetch group, can complete without limits as
> > well.
> 
> No, they are all dispatched and issue to the BRU for execution. It's 
> trivial to construct a test of a lot of not taken branches in a row
> and time a loop of it to see it executes at 1 cycle per branch.

(s/dispatched/issued/)

Huh, this was true on p8 already.  Sorry for my confusion.  In my
defence, this doesn't matter for performance on "real code".

> > Correctly predicted simple conditional branches just get their prediction
> > validated (and that is not done in the execution units).  Incorrectly
> > predicted branches the same, but those cause a redirect and refetch.
> 
> How could it validate prediction without issuing? It wouldn't know when
> sources are ready.

In the backend.  But that is just how it worked on older cores :-/

> >> The first problem seems like the show stopper though. AFAIKS it would 
> >> need a special builtin support that does something to create the table
> >> entry, or a guarantee that we could put an inline asm right after the
> >> builtin as a recognized pattern and that would give us the instruction
> >> following the trap.
> > 
> > I'm not quite sure what this means.  Can't you always just put a
> > 
> > bla:	asm("");
> > 
> > in there, and use the address of "bla"?
> 
> Not AFAIKS. Put it where?

After wherever you want to know the address after.  You will have to
make sure they stay together somehow.

It is much easier to get the address of something, not the address after
it.  If you know it is just one insn anyway, that isn't hard to handle
either (even if prefixed insns exist).

> > If not, you need to say a lot
> > more about what you actually want to do :-/
> 
> We need to put the address (or relative offset) of the trap instruction
> into an entry in a different section. Basically this:
> 
>    __builtin_trap();
>    asm ("1:                               \n\t"
>         "    .section __bug_table,\"aw\"  \n\t"
>         "2:  .4byte 1b - 2b - 4           \n\t"
>         "    .previous");
> 
> Where the 1: label must follow immediately after the trap instruction, 
> and where the asm must be emitted even for the case of no-return traps 
> (but anything following the asm could be omitted).

The address *after* the trap insn?  That is much much harder than the
address *of* the trap insn.


Segher

^ permalink raw reply

* Re: [PATCH v2 RESEND] powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC
From: Michael Ellerman @ 2021-08-26 14:37 UTC (permalink / raw)
  To: Paul Moore, Christophe Leroy
  Cc: linux-kernel, Eric Paris, linux-audit, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <CAHC9VhSG8tPAkAAz5Z77HDMKXLAiaEOanxR+oY5c1E_Xoiso9Q@mail.gmail.com>

Paul Moore <paul@paul-moore.com> writes:
> On Tue, Aug 24, 2021 at 1:11 PM Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>> Le 24/08/2021 à 16:47, Paul Moore a écrit :
>> > On Tue, Aug 24, 2021 at 9:36 AM Christophe Leroy
>> > <christophe.leroy@csgroup.eu> wrote:
>> >>
>> >> Commit e65e1fc2d24b ("[PATCH] syscall class hookup for all normal
>> >> targets") added generic support for AUDIT but that didn't include
>> >> support for bi-arch like powerpc.
>> >>
>> >> Commit 4b58841149dc ("audit: Add generic compat syscall support")
>> >> added generic support for bi-arch.
>> >>
>> >> Convert powerpc to that bi-arch generic audit support.
>> >>
>> >> Cc: Paul Moore <paul@paul-moore.com>
>> >> Cc: Eric Paris <eparis@redhat.com>
>> >> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> >> ---
>> >> Resending v2 with Audit people in Cc
>> >>
>> >> v2:
>> >> - Missing 'git add' for arch/powerpc/include/asm/unistd32.h
>> >> - Finalised commit description
>> >> ---
>> >>   arch/powerpc/Kconfig                |  5 +-
>> >>   arch/powerpc/include/asm/unistd32.h |  7 +++
>> >>   arch/powerpc/kernel/Makefile        |  3 --
>> >>   arch/powerpc/kernel/audit.c         | 84 -----------------------------
>> >>   arch/powerpc/kernel/compat_audit.c  | 44 ---------------
>> >>   5 files changed, 8 insertions(+), 135 deletions(-)
>> >>   create mode 100644 arch/powerpc/include/asm/unistd32.h
>> >>   delete mode 100644 arch/powerpc/kernel/audit.c
>> >>   delete mode 100644 arch/powerpc/kernel/compat_audit.c
>> >
>> > Can you explain, in detail please, the testing you have done to verify
>> > this patch?
>> >
>>
>> I built ppc64_defconfig and checked that the generated code is functionnaly equivalent.
>>
>> ppc32_classify_syscall() is exactly the same as audit_classify_compat_syscall() except that the
>> later takes the syscall as second argument (ie in r4) whereas the former takes it as first argument
>> (ie in r3).
>>
>> audit_classify_arch() and powerpc audit_classify_syscall() are slightly different between the
>> powerpc version and the generic version because the powerpc version checks whether it is
>> AUDIT_ARCH_PPC or not (ie value 20), while the generic one checks whether it has bit
>> __AUDIT_ARCH_64BIT set or not (__AUDIT_ARCH_64BIT is the sign bit of a word), but taking into
>> account that the abi is either AUDIT_ARCH_PPC, AUDIT_ARCH_PPC64 or AUDIT_ARCH_PPC64LE, the result is
>> the same.
>>
>> If you are asking I guess you saw something wrong ?
>
> I was asking because I didn't see any mention of testing, and when you
> are enabling something significant like this it is nice to see that it
> has been verified to work :)
>
> While binary dumps and comparisons are nice, it is always good to see
> verification from a test suite.  I don't have access to the necessary
> hardware to test this, but could you verify that the audit-testsuite
> passes on your test system with your patches applied?
>
>  * https://github.com/linux-audit/audit-testsuite

I tested on ppc64le. Both before and after the patch I get the result
below.

So I guess the patch is OK, but maybe we have some existing issue.

I had a bit of a look at the test code, but my perl is limited. I think
it was running the command below, and it returned "<no matches>", but
not really sure what that means.

  $ sudo ausearch -i -m SYSCALL -p 216440 -ui 0 -gi 0 -ul 0 -su unconfined _u:unconfined_r:unconfined_t:s0-s0:c0.c1023 -ts recent
  <no matches>

cheers



Running as   user    root
        with context unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
        on   system  Fedora

backlog_wait_time_actual_reset/test .. ok
exec_execve/test ..................... ok
exec_name/test ....................... ok
file_create/test ..................... ok
file_delete/test ..................... ok
file_rename/test ..................... ok
filter_exclude/test .................. 1/21
# Test 20 got: "256" (filter_exclude/test at line 167)
#    Expected: "0"
#  filter_exclude/test line 167 is: ok( $result, 0 );
# Test 21 got: "0" (filter_exclude/test at line 179)
#    Expected: "1"
#  filter_exclude/test line 179 is: ok( $found_msg, 1 );
filter_exclude/test .................. Failed 2/21 subtests
filter_saddr_fam/test ................ ok
filter_sessionid/test ................ ok
login_tty/test ....................... ok
lost_reset/test ...................... ok
netfilter_pkt/test ................... ok
syscalls_file/test ................... ok
syscall_module/test .................. ok
time_change/test ..................... ok
user_msg/test ........................ ok
fanotify/test ........................ ok
bpf/test ............................. ok

Test Summary Report
-------------------
filter_exclude/test                (Wstat: 0 Tests: 21 Failed: 2)
  Failed tests:  20-21
Files=18, Tests=202, 45 wallclock secs ( 0.18 usr  0.03 sys + 20.15 cusr  0.92 csys = 21.28 CPU)
Result: FAIL
Failed 1/18 test programs. 2/202 subtests failed.

^ permalink raw reply

* Re: [PATCH linux-next] powerpc/tm: remove duplicate include in tm-poison.c
From: Michael Ellerman @ 2021-08-26 14:26 UTC (permalink / raw)
  To: Christophe Leroy, Shuah Khan, cgel.zte
  Cc: yong.yiran, shuah, Zeal Robot, linux-kernel, paulus,
	linux-kselftest, linuxppc-dev
In-Reply-To: <4bc97c33-7fc0-ff9d-041b-e773f682c5d2@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 24/08/2021 à 16:40, Shuah Khan a écrit :
>> On 8/5/21 12:52 AM, cgel.zte@gmail.com wrote:
>>> From: yong yiran <yong.yiran@zte.com.cn>
>>>
>>> 'inttypes.h' included in 'tm-poison.c' is duplicated.
>>> Remove all but the first include of inttypes.h from tm-poison.c.
>>>
>>> Reported-by: Zeal Robot <zealci@zte.com.cn>
>>> Signed-off-by: yong yiran <yong.yiran@zte.com.cn>
>>> ---
>>>   tools/testing/selftests/powerpc/tm/tm-poison.c | 1 -
>>>   1 file changed, 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/powerpc/tm/tm-poison.c 
>>> b/tools/testing/selftests/powerpc/tm/tm-poison.c
>>> index 29e5f26af7b9..27c083a03d1f 100644
>>> --- a/tools/testing/selftests/powerpc/tm/tm-poison.c
>>> +++ b/tools/testing/selftests/powerpc/tm/tm-poison.c
>>> @@ -20,7 +20,6 @@
>>>   #include <sched.h>
>>>   #include <sys/types.h>
>>>   #include <signal.h>
>>> -#include <inttypes.h>
>>>   #include "tm.h"
>>>
>> 
>> We can't accept this patch. The from and Signed-off-by don't match.
>> 
>
> As far as I can see they match. You have:
>
> From: yong yiran <yong.yiran@zte.com.cn>
> Signed-off-by: yong yiran <yong.yiran@zte.com.cn>

Regardless I already have a patch queued to fix this, from a different
CI bot.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc: Make set_endian() return EINVAL when not supporting little endian
From: Michael Ellerman @ 2021-08-26 14:23 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <df2e9b65-9303-070e-b803-c64e20e2620d@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Le 26/08/2021 à 05:41, Michael Ellerman a écrit :
>> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>>> There is no point in modifying MSR_LE bit on CPUs not supporting
>>> little endian.
>> 
>> Isn't that an ABI break?
>
> Or an ABI fix ? I don't know.

It could break existing applications, even if the new semantics make
more sense. So that's a break IMHO :)

> My first thought was that all other 32 bits architectures were returning -EINVAL, but looking at the 
> man page of prctl, it is explicit that this is powerpc only.

It could be generic, but yeah seems we're the only arch that implements
it.

>> set_endian(PR_ENDIAN_BIG) should work on a big endian CPU, even if it
>> does nothing useful.
>
> Fair enough. But shouldn't in that case get_endian() return PR_ENDIAN_BIG instead of returning EINVAL ?

> We can do one or the other, but I think it should at least be consistant between them, shouldn't it ?

It should be consistent, but it isn't, and if we change it now we
potentially break existing userspace, which is bad.

I don't think it's widely used, and the risk of breakage would be
minimal, but it's not zero.

So I'm not sure it's worth changing it just for the sake of consistency.

cheers

^ permalink raw reply

* Re: [PATCH v2 2/2] powerpc/bug: Provide better flexibility to WARN_ON/__WARN_FLAGS() with asm goto
From: Segher Boessenkool @ 2021-08-26 14:12 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: llvm, linux-kernel, Nathan Chancellor, clang-built-linux,
	Paul Mackerras, linuxppc-dev
In-Reply-To: <87h7fcc2m4.fsf@mpe.ellerman.id.au>

On Thu, Aug 26, 2021 at 01:21:39PM +1000, Michael Ellerman wrote:
> So there seems to be some misunderstanding with clang, it doesn't like
> us passing an expression to the inline asm.
> 
> AFAIK it is legal to pass expressions as inputs to inline asm, ie. it
> doesn't have to just be a variable name.

It certainly is.  That is the whole point of inline asm!  This way, all
of the C code "around" the asm can be optimised.

> This patch seems to fix it. Not sure if that's just papering over it though.

It is, and it makes less optimised code (also on GCC), as Christophe
points out.


Segher

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32
From: Nicholas Piggin @ 2021-08-26 13:57 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev
In-Reply-To: <20210826124901.GY1583@gate.crashing.org>

Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
> 
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional 
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> > 
>> > I thought only *taken* branches are just one per cycle?
>> 
>> Taken branches are fetched by the front end at one per cycle (assuming 
>> they hit the BTAC), but all branches have to be executed by BR at one 
>> per cycle
> 
> This is not true.  (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues.  And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.

No, they are all dispatched and issue to the BRU for execution. It's 
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.

> The BTAC is a frontend thing, used for target address prediction.  It
> does not limit execution.

I didn't say it did.

> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units).  Incorrectly
> predicted branches the same, but those cause a redirect and refetch.

How could it validate prediction without issuing? It wouldn't know when
sources are ready.

> 
>> > Internally *all* traps are conditional, in GCC.  It also can optimise
>> > them quite well.  There must be something in the kernel macros that
>> > prevents good optimisation.
>> 
>> I did take a look at it at one point.
>> 
>> One problem is that the kernel needs the address of the trap instruction 
>> to create the entry for it. The other problem is that __builtin_trap 
>> does not return so it can't be used for WARN. LLVM at least seems to 
>> have a __builtin_debugtrap which does return.
> 
> This is <https://gcc.gnu.org/PR99299>.

Aha.

>> The first problem seems like the show stopper though. AFAIKS it would 
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
> 
> I'm not quite sure what this means.  Can't you always just put a
> 
> bla:	asm("");
> 
> in there, and use the address of "bla"?

Not AFAIKS. Put it where?

> If not, you need to say a lot
> more about what you actually want to do :-/

We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:

   __builtin_trap();
   asm ("1:                               \n\t"
        "    .section __bug_table,\"aw\"  \n\t"
        "2:  .4byte 1b - 2b - 4           \n\t"
        "    .previous");

Where the 1: label must follow immediately after the trap instruction, 
and where the asm must be emitted even for the case of no-return traps 
(but anything following the asm could be omitted).

Thanks,
Nick

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox