LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Arnd Bergmann @ 2020-09-22  9:01 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, linux-block, Al Viro, Andy Lutomirski,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <f25b4708-eba6-78d6-03f9-5bfb04e07627@gmail.com>

On Tue, Sep 22, 2020 at 9:59 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> On 22/09/2020 10:23, Arnd Bergmann wrote:
> > On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >> On 22/09/2020 03:58, Andy Lutomirski wrote:
> >>> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>> I may be looking at a different kernel than you, but aren't you
> >>> preventing creating an io_uring regardless of whether SQPOLL is
> >>> requested?
> >>
> >> I diffed a not-saved file on a sleepy head, thanks for noticing.
> >> As you said, there should be an SQPOLL check.
> >>
> >> ...
> >> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
> >>         goto err;
> >
> > Wouldn't that mean that now 32-bit containers behave differently
> > between compat and native execution?
> >
> > I think if you want to prevent 32-bit applications from using SQPOLL,
> > it needs to be done the same way on both to be consistent:
>
> The intention was to disable only compat not native 32-bit.

I'm not following why that would be considered a valid option,
as that clearly breaks existing users that update from a 32-bit
kernel to a 64-bit one.

Taking away the features from users that are still on 32-bit kernels
already seems questionable to me, but being inconsistent
about it seems much worse, in particular when the regression
is on the upgrade path.

> > Can we expect all existing and future user space to have a sane
> > fallback when IORING_SETUP_SQPOLL fails?
>
> SQPOLL has a few differences with non-SQPOLL modes, but it's easy
> to convert between them. Anyway, SQPOLL is a privileged special
> case that's here for performance/latency reasons, I don't think
> there will be any non-accidental users of it.

Ok, so the behavior of 32-bit tasks would be the same as running
the same application as unprivileged 64-bit tasks, with applications
already having to implement that fallback, right?

      Arnd

^ permalink raw reply

* Re: [PATCH v3] powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
From: Tony Ambardar @ 2020-09-22  8:38 UTC (permalink / raw)
  To: Sasha Levin; +Cc: Paul Mackerras, linuxppc-dev, Arnd Bergmann, Stable
In-Reply-To: <20200921125452.32E0E21D7A@mail.kernel.org>

On Mon, 21 Sep 2020 at 05:54, Sasha Levin <sashal@kernel.org> wrote:
>
> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.8.10, v5.4.66, v4.19.146, v4.14.198, v4.9.236, v4.4.236.
>
> v5.8.10: Build OK!
> v5.4.66: Build OK!
> v4.19.146: Build OK!
> v4.14.198: Failed to apply! Possible dependencies:
>     7af7919f0f4b ("tools include s390: Grab a copy of arch/s390/include/uapi/asm/unistd.h")
>     95f28190aa01 ("tools include arch: Grab a copy of errno.h for arch's supported by perf")
>     a3f22d505f56 ("s390/perf: add callback to perf to enable using AUX buffer")
>     a81c42136604 ("perf s390: add regs_query_register_offset()")
>     a9fc2db0a8ab ("s390/perf: define common DWARF register string table")
>     f704ef44602f ("s390/perf: add support for perf_regs and libdw")
>
> v4.9.236: Failed to apply! Possible dependencies:
>     0c744ea4f77d ("Linux 4.10-rc2")
>     2bd6bf03f4c1 ("Linux 4.14-rc1")
>     2ea659a9ef48 ("Linux 4.12-rc1")
>     49def1853334 ("Linux 4.10-rc4")
>     566cf877a1fc ("Linux 4.10-rc6")
>     5771a8c08880 ("Linux v4.13-rc1")
>     7089db84e356 ("Linux 4.10-rc8")
>     7a308bb3016f ("Linux 4.10-rc5")
>     7af7919f0f4b ("tools include s390: Grab a copy of arch/s390/include/uapi/asm/unistd.h")
>     7ce7d89f4883 ("Linux 4.10-rc1")
>     95f28190aa01 ("tools include arch: Grab a copy of errno.h for arch's supported by perf")
>     a121103c9228 ("Linux 4.10-rc3")
>     a81c42136604 ("perf s390: add regs_query_register_offset()")
>     a9fc2db0a8ab ("s390/perf: define common DWARF register string table")
>     b24413180f56 ("License cleanup: add SPDX GPL-2.0 license identifier to files with no license")
>     c1ae3cfa0e89 ("Linux 4.11-rc1")
>     c470abd4fde4 ("Linux 4.10")
>     d5adbfcd5f7b ("Linux 4.10-rc7")
>
> v4.4.236: Failed to apply! Possible dependencies:
>     0c4d40d58075 ("tools build: Add BPF feature check to test-all")
>     1925459b4d92 ("tools build: Fix feature Makefile issues with 'O='")
>     58683600dfe3 ("perf build: Use FEATURE-DUMP in bpf subproject")
>     76ee2ff34274 ("tools build feature: Move dwarf post unwind choice output into perf")
>     7af7919f0f4b ("tools include s390: Grab a copy of arch/s390/include/uapi/asm/unistd.h")
>     8ee4646038e4 ("perf build: Add libcrypto feature detection")
>     95f28190aa01 ("tools include arch: Grab a copy of errno.h for arch's supported by perf")
>     96b9e70b8e6c ("perf build: Introduce FEATURES_DUMP make variable")
>     9fd4186ac19a ("tools build: Allow subprojects select all feature checkers")
>     abb26210a395 ("perf tools: Force fixdep compilation at the start of the build")
>     aeafd623f866 ("perf tools: Move headers check into bash script")
>     c053a1506fae ("perf build: Select all feature checkers for feature-dump")
>     d4dfdf00d43e ("perf jvmti: Plug compilation into perf build")
>     d58ac0bf8d1e ("perf build: Add clang and llvm compile and linking support")
>     d8ad6a15cc3a ("tools lib bpf: Don't do a feature check when cleaning")
>     e12b202f8fb9 ("perf jitdump: Build only on supported archs")
>     e26e63be64a1 ("perf build: Add sdt feature detection")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
>
[cc: linux-ppc, Arnd, Paul]

The patch makes identical changes to
'arch/powerpc/include/uapi/asm/errno.h' and its copy
'tools/arch/powerpc/include/uapi/asm/errno.h' first created in kernel
v4.16. Since it's the patch
hunk for the latter file which is failing on backports to < v4.16, I
would think it OK to skip
that hunk where the latter file is missing. I'd prefer to let Michael
decide the best course as
he's still reviewing the patch.

Thanks,
Tony
> --
> Thanks
> Sasha

^ permalink raw reply

* [PATCH] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_reboot_notifier
From: Srikar Dronamraju @ 2020-09-22  8:02 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linuxppc-dev, Srikar Dronamraju, Pratik Rajesh Sampat,
	Daniel Axtens

The patch avoids allocating cpufreq_policy on stack hence fixing frame
size overflow in 'powernv_cpufreq_reboot_notifier'

./drivers/cpufreq/powernv-cpufreq.c: In function _powernv_cpufreq_reboot_notifier_:
./drivers/cpufreq/powernv-cpufreq.c:906:1: error: the frame size of 2064 bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
 }
 ^
cc1: all warnings being treated as errors
make[3]: *** [./scripts/Makefile.build:316: drivers/cpufreq/powernv-cpufreq.o] Error 1
make[2]: *** [./scripts/Makefile.build:556: drivers/cpufreq] Error 2
make[1]: *** [./Makefile:1072: drivers] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:157: sub-make] Error 2

Fixes: cf30af76 ("cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec")
Cc: Pratik Rajesh Sampat <psampat@linux.ibm.com>
Cc: Daniel Axtens <dja@axtens.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 drivers/cpufreq/powernv-cpufreq.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
index a9af15e..e439b43 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -885,12 +885,15 @@ static int powernv_cpufreq_reboot_notifier(struct notifier_block *nb,
 				unsigned long action, void *unused)
 {
 	int cpu;
-	struct cpufreq_policy cpu_policy;
+	struct cpufreq_policy *cpu_policy;
 
 	rebooting = true;
 	for_each_online_cpu(cpu) {
-		cpufreq_get_policy(&cpu_policy, cpu);
-		powernv_cpufreq_target_index(&cpu_policy, get_nominal_index());
+		cpu_policy = cpufreq_cpu_get(cpu);
+		if (!cpu_policy)
+			continue;
+		powernv_cpufreq_target_index(cpu_policy, get_nominal_index());
+		cpufreq_cpu_put(cpu_policy);
 	}
 
 	return NOTIFY_DONE;
-- 
1.8.3.1


^ permalink raw reply related

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Pavel Begunkov @ 2020-09-22  7:57 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, linux-block, Al Viro, Andy Lutomirski,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CAK8P3a2-6JNS38EbZcLrk=cTT526oP=Rf0aoqWNSJ-k4XTYehQ@mail.gmail.com>

On 22/09/2020 10:23, Arnd Bergmann wrote:
> On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>> On 22/09/2020 03:58, Andy Lutomirski wrote:
>>> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>> I may be looking at a different kernel than you, but aren't you
>>> preventing creating an io_uring regardless of whether SQPOLL is
>>> requested?
>>
>> I diffed a not-saved file on a sleepy head, thanks for noticing.
>> As you said, there should be an SQPOLL check.
>>
>> ...
>> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
>>         goto err;
> 
> Wouldn't that mean that now 32-bit containers behave differently
> between compat and native execution?
> 
> I think if you want to prevent 32-bit applications from using SQPOLL,
> it needs to be done the same way on both to be consistent:

The intention was to disable only compat not native 32-bit.

> 
>    if ((!IS_ENABLED(CONFIG_64BIT) || ctx->compat) &&
>         (p->flags & IORING_SETUP_SQPOLL))
>             goto err;
> 
> I don't really see how taking away SQPOLL from 32-bit tasks is
> any better than just preventing access to the known-broken files
> as Al suggested, or adding the hack to make it work as in
> Christoph's original patch.

That's why I'm hoping that Christoph's work and the discussion will
reach consensus, but the bug should be patched in the end. IMHO,
it's a good and easy enough fallback option (temporal?).

> 
> Can we expect all existing and future user space to have a sane
> fallback when IORING_SETUP_SQPOLL fails?

SQPOLL has a few differences with non-SQPOLL modes, but it's easy
to convert between them. Anyway, SQPOLL is a privileged special
case that's here for performance/latency reasons, I don't think
there will be any non-accidental users of it.
-- 
Pavel Begunkov

^ permalink raw reply

* Re: [PATCH] powerpc/64: Make VDSO32 track COMPAT on 64-bit
From: Srikar Dronamraju @ 2020-09-22  7:55 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, msuchanek, srikar, Stephen Rothwell
In-Reply-To: <160034211494.3342081.16977957933269532766.b4-ty@ellerman.id.au>

* Michael Ellerman <patch-notifications@ellerman.id.au> [2020-09-17 21:28:46]:

> On Tue, 8 Sep 2020 22:58:50 +1000, Michael Ellerman wrote:
> > When we added the VDSO32 kconfig symbol, which controls building of
> > the 32-bit VDSO, we made it depend on CPU_BIG_ENDIAN (for 64-bit).
> > 
> > That was because back then COMPAT was always enabled for 64-bit, so
> > depending on it would have left the 32-bit VDSO always enabled, which
> > we didn't want.
> > 
> > [...]
> 
> Applied to powerpc/next.
> 
> [1/1] powerpc/64: Make VDSO32 track COMPAT on 64-bit
>       https://git.kernel.org/powerpc/c/231b232df8f67e7d37af01259c21f2a131c3911e
> 
> cheers

With this commit which is part of powerpc/next and with
/opt/at12.0/bin/gcc --version
gcc (GCC) 8.4.1 20191125 (Advance-Toolchain 12.0-3) [e25f27eea473]
throws up a compile error on a witherspoon/PowerNV with CONFIG_COMPAT.
CONFIG_COMPAT got carried from the distro config. (And looks like most
distros seem to be having this config)

cc1: error: _-m32_ not supported in this configuration
make[4]: *** [arch/powerpc/kernel/vdso32/sigtramp.o] Error 1
make[4]: *** Waiting for unfinished jobs....
cc1: error: _-m32_ not supported in this configuration
make[4]: *** [arch/powerpc/kernel/vdso32/gettimeofday.o] Error 1
make[3]: *** [arch/powerpc/kernel/vdso32] Error 2
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [arch/powerpc/kernel] Error 2
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [arch/powerpc] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [__sub-make] Error 2

I don't seem to be facing with other compilers like "gcc (Ubuntu
7.4.0-1ubuntu1~18.04.1) 7.4.0" and I was able to disable CONFIG_COMPAT and
proceed with the build.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Arnd Bergmann @ 2020-09-22  7:23 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, linux-block, Al Viro, Andy Lutomirski,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <e0a1b4d1-ff47-18d1-d535-c62812cb3105@gmail.com>

On Tue, Sep 22, 2020 at 8:32 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> On 22/09/2020 03:58, Andy Lutomirski wrote:
> > On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
> > I may be looking at a different kernel than you, but aren't you
> > preventing creating an io_uring regardless of whether SQPOLL is
> > requested?
>
> I diffed a not-saved file on a sleepy head, thanks for noticing.
> As you said, there should be an SQPOLL check.
>
> ...
> if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
>         goto err;

Wouldn't that mean that now 32-bit containers behave differently
between compat and native execution?

I think if you want to prevent 32-bit applications from using SQPOLL,
it needs to be done the same way on both to be consistent:

   if ((!IS_ENABLED(CONFIG_64BIT) || ctx->compat) &&
        (p->flags & IORING_SETUP_SQPOLL))
            goto err;

I don't really see how taking away SQPOLL from 32-bit tasks is
any better than just preventing access to the known-broken files
as Al suggested, or adding the hack to make it work as in
Christoph's original patch.

Can we expect all existing and future user space to have a sane
fallback when IORING_SETUP_SQPOLL fails?

      Arnd

^ permalink raw reply

* Re: [PATCH V2] powerpc/perf: Exclude pmc5/6 from the irrelevant PMU group constraints
From: Madhavan Srinivasan @ 2020-09-22  6:53 UTC (permalink / raw)
  To: Athira Rajeev, mpe; +Cc: maddy, linuxppc-dev
In-Reply-To: <1600672204-1610-1-git-send-email-atrajeev@linux.vnet.ibm.com>



On 9/21/20 12:40 PM, Athira Rajeev wrote:
> PMU counter support functions enforces event constraints for group of
> events to check if all events in a group can be monitored. Incase of
> event codes using PMC5 and PMC6 ( 500fa and 600f4 respectively ),
> not all constraints are applicable, say the threshold or sample bits.
> But current code includes pmc5 and pmc6 in some group constraints (like
> IC_DC Qualifier bits) which is actually not applicable and hence results
> in those events not getting counted when scheduled along with group of
> other events. Patch fixes this by excluding PMC5/6 from constraints
> which are not relevant for it.

Changes looks fine to me.

Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>

we need to CC this in Stable too.

> Fixes: 7ffd948 ("powerpc/perf: factor out power8 pmu functions")
> Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
> Changes in v2:
> - Added a block comment in the fix path explaining
>    why the change is needed.
>
>   arch/powerpc/perf/isa207-common.c | 10 ++++++++++
>   1 file changed, 10 insertions(+)
>
> diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
> index 964437a..12153da 100644
> --- a/arch/powerpc/perf/isa207-common.c
> +++ b/arch/powerpc/perf/isa207-common.c
> @@ -288,6 +288,15 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
>
>   		mask  |= CNST_PMC_MASK(pmc);
>   		value |= CNST_PMC_VAL(pmc);
> +
> +		/*
> +		 * PMC5 and PMC6 are used to count cycles and instructions
> +		 * and these doesnot support most of the constraint bits.
> +		 * Add a check to exclude PMC5/6 from most of the constraints
> +		 * except for ebb/bhrb.
> +		 */
> +		if (pmc >= 5)
> +			goto ebb_bhrb;
>   	}
>
>   	if (pmc <= 4) {
> @@ -357,6 +366,7 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
>   		}
>   	}
>
> +ebb_bhrb:
>   	if (!pmc && ebb)
>   		/* EBB events must specify the PMC */
>   		return -1;


^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Pavel Begunkov @ 2020-09-22  6:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, Al Viro,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrUEC81va8-fuUXG1uA5rbKxnKDYsDOXC70_HtKD4LAeAg@mail.gmail.com>

On 22/09/2020 03:58, Andy Lutomirski wrote:
> On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
>>>>>>> and that one would in fact be broken by your patch in the hypothetical
>>>>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
>>>>>>>
>>>>>>> For reference, I checked the socket timestamp handling that has a
>>>>>>> number of corner cases with time32/time64 formats in compat mode,
>>>>>>> but none of those appear to be affected by the problem.
>>>>>>>
>>>>>>>> Aside from the potentially nasty use of per-task variables, one thing
>>>>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>>>>>>>> going to have a generic mechanism for this, shouldn't we allow a full
>>>>>>>> override of the syscall arch instead of just allowing forcing compat
>>>>>>>> so that a compat syscall can do a non-compat operation?
>>>>>>>
>>>>>>> The only reason it's needed here is that the caller is in a kernel
>>>>>>> thread rather than a system call. Are there any possible scenarios
>>>>>>> where one would actually need the opposite?
>>>>>>>
>>>>>>
>>>>>> I can certainly imagine needing to force x32 mode from a kernel thread.
>>>>>>
>>>>>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
>>>>>
>>>>> It's rather the second one. Even though AFAIR it wasn't discussed
>>>>> specifically, that how it works now (_partially_).
>>>>
>>>> Double checked -- I'm wrong, that's the former one. Most of it is based
>>>> on a flag that was set an creation.
>>>>
>>>
>>> Could we get away with making io_uring_enter() return -EINVAL (or
>>> maybe -ENOTTY?) if you try to do it with bitness that doesn't match
>>> the io_uring?  And disable SQPOLL in compat mode?
>>
>> Something like below. If PF_FORCE_COMPAT or any other solution
>> doesn't lend by the time, I'll take a look whether other io_uring's
>> syscalls need similar checks, etc.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 0458f02d4ca8..aab20785fa9a 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>>         if (ctx->flags & IORING_SETUP_R_DISABLED)
>>                 goto out;
>>
>> +       ret = -EINVAl;
>> +       if (ctx->compat != in_compat_syscall())
>> +               goto out;
>> +
> 
> This seems entirely reasonable to me.  Sharing an io_uring ring
> between programs with different ABIs seems a bit nutty.
> 
>>         /*
>>          * For SQ polling, the thread will do all submissions and completions.
>>          * Just return the requested submit count, and wake the thread if
>> @@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
>>         if (ret)
>>                 goto err;
>>
>> +       ret = -EINVAL;
>> +       if (ctx->compat)
>> +               goto err;
>> +
> 
> I may be looking at a different kernel than you, but aren't you
> preventing creating an io_uring regardless of whether SQPOLL is
> requested?

I diffed a not-saved file on a sleepy head, thanks for noticing.
As you said, there should be an SQPOLL check.

...
if (ctx->compat && (p->flags & IORING_SETUP_SQPOLL))
	goto err;

-- 
Pavel Begunkov

^ permalink raw reply

* Re: [PATCH v3 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Jordan Niethe @ 2020-09-22  6:10 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: Oliver O'Halloran, linuxppc-dev, Nicholas Piggin
In-Reply-To: <a4bc673b-74e2-98ea-dac7-4e6d86d10d15@csgroup.eu>

On Tue, Sep 22, 2020 at 3:59 PM Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
>
>
> Le 22/09/2020 à 07:53, Jordan Niethe a écrit :
> > Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
> > is called before a stack has been set up in r1. This was previously fine
> > as the cpu_restore() functions were implemented in assembly and did not
> > use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
> > device tree binding for discovering CPU features") used
> > __restore_cpu_cpufeatures() as the cpu_restore() function for a
> > device-tree features based cputable entry. This is a C function and
> > hence uses a stack in r1.
> >
> > generic_secondary_smp_init() is entered on the secondary cpus via the
> > primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
> > thread has its own stack. The OPAL call is ran in the primary's hardware
> > thread. During the call, a job is scheduled on a secondary cpu that will
> > start executing at the address of generic_secondary_smp_init().  Hence
> > the value that will be left in r1 when the secondary cpu enters the
> > kernel is part of that secondary cpu's individual OPAL stack. This means
> > that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
> > not horribly bad as each hardware thread has its own stack and the call
> > that enters the kernel from OPAL never returns, but it is still wrong
> > and should be corrected.
> >
> > Create the temp kernel stack before calling cpu_restore().
> >
> > As noted by mpe, for a kexec boot, the secondary CPUs are released from
> > the spin loop at address 0x60 by smp_release_cpus() and then jump to
> > generic_secondary_smp_init(). The call to smp_release_cpus() is in
> > setup_arch(), and it comes before the call to emergency_stack_init().
> > emergency_stack_init() allocates an emergency stack in the PACA for each
> > CPU.  This address in the PACA is what is used to set up the temp kernel
> > stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
> > to after the PACAs have been allocated an emergency stack, otherwise the
> > PACA stack pointer will contain garbage and hence the temp kernel stack
> > created from it will be broken.
> >
> > Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
> > Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> > ---
> > v2: Add more detail to the commit message
> > v3: Release secondary CPUs after the emergency stack is created
> > ---
> >   arch/powerpc/kernel/head_64.S      | 8 ++++----
> >   arch/powerpc/kernel/setup-common.c | 6 ++++--
> >   2 files changed, 8 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> > index 0e05a9a47a4b..4b7f4c6c2600 100644
> > --- a/arch/powerpc/kernel/head_64.S
> > +++ b/arch/powerpc/kernel/head_64.S
> > @@ -420,6 +420,10 @@ generic_secondary_common_init:
> >       /* From now on, r24 is expected to be logical cpuid */
> >       mr      r24,r5
> >
> > +     /* Create a temp kernel stack for use before relocation is on.  */
> > +     ld      r1,PACAEMERGSP(r13)
> > +     subi    r1,r1,STACK_FRAME_OVERHEAD
> > +
> >       /* See if we need to call a cpu state restore handler */
> >       LOAD_REG_ADDR(r23, cur_cpu_spec)
> >       ld      r23,0(r23)
> > @@ -448,10 +452,6 @@ generic_secondary_common_init:
> >       sync                            /* order paca.run and cur_cpu_spec */
> >       isync                           /* In case code patching happened */
> >
> > -     /* Create a temp kernel stack for use before relocation is on.  */
> > -     ld      r1,PACAEMERGSP(r13)
> > -     subi    r1,r1,STACK_FRAME_OVERHEAD
> > -
> >       b       __secondary_start
> >   #endif /* SMP */
> >
> > diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> > index 808ec9fab605..fff714e36b37 100644
> > --- a/arch/powerpc/kernel/setup-common.c
> > +++ b/arch/powerpc/kernel/setup-common.c
> > @@ -919,8 +919,6 @@ void __init setup_arch(char **cmdline_p)
> >
> >       /* On BookE, setup per-core TLB data structures. */
> >       setup_tlb_core_data();
> > -
> > -     smp_release_cpus();
> >   #endif
> >
> >       /* Print various info about the machine that has been gathered so far. */
> > @@ -944,6 +942,10 @@ void __init setup_arch(char **cmdline_p)
> >       exc_lvl_early_init();
> >       emergency_stack_init();
> >
> > +#ifdef CONFIG_SMP
> > +     smp_release_cpus();
> > +#endif
>
> Are you sure you need that #ifdef ?
Thanks, you are right, should not be necessary.
>
> In asm/smp.h, we have:
>
> #if defined(CONFIG_PPC64) && (defined(CONFIG_SMP) ||
> defined(CONFIG_KEXEC_CORE))
> extern void smp_release_cpus(void);
> #else
> static inline void smp_release_cpus(void) { };
> #endif
>
>
> > +
> >       initmem_init();
> >
> >       early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
> >
>
> Christophe

^ permalink raw reply

* Re: [PATCH v3 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Christophe Leroy @ 2020-09-22  5:59 UTC (permalink / raw)
  To: Jordan Niethe, linuxppc-dev; +Cc: oohall, npiggin
In-Reply-To: <20200922055307.10647-1-jniethe5@gmail.com>



Le 22/09/2020 à 07:53, Jordan Niethe a écrit :
> Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
> is called before a stack has been set up in r1. This was previously fine
> as the cpu_restore() functions were implemented in assembly and did not
> use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
> device tree binding for discovering CPU features") used
> __restore_cpu_cpufeatures() as the cpu_restore() function for a
> device-tree features based cputable entry. This is a C function and
> hence uses a stack in r1.
> 
> generic_secondary_smp_init() is entered on the secondary cpus via the
> primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
> thread has its own stack. The OPAL call is ran in the primary's hardware
> thread. During the call, a job is scheduled on a secondary cpu that will
> start executing at the address of generic_secondary_smp_init().  Hence
> the value that will be left in r1 when the secondary cpu enters the
> kernel is part of that secondary cpu's individual OPAL stack. This means
> that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
> not horribly bad as each hardware thread has its own stack and the call
> that enters the kernel from OPAL never returns, but it is still wrong
> and should be corrected.
> 
> Create the temp kernel stack before calling cpu_restore().
> 
> As noted by mpe, for a kexec boot, the secondary CPUs are released from
> the spin loop at address 0x60 by smp_release_cpus() and then jump to
> generic_secondary_smp_init(). The call to smp_release_cpus() is in
> setup_arch(), and it comes before the call to emergency_stack_init().
> emergency_stack_init() allocates an emergency stack in the PACA for each
> CPU.  This address in the PACA is what is used to set up the temp kernel
> stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
> to after the PACAs have been allocated an emergency stack, otherwise the
> PACA stack pointer will contain garbage and hence the temp kernel stack
> created from it will be broken.
> 
> Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v2: Add more detail to the commit message
> v3: Release secondary CPUs after the emergency stack is created
> ---
>   arch/powerpc/kernel/head_64.S      | 8 ++++----
>   arch/powerpc/kernel/setup-common.c | 6 ++++--
>   2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 0e05a9a47a4b..4b7f4c6c2600 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -420,6 +420,10 @@ generic_secondary_common_init:
>   	/* From now on, r24 is expected to be logical cpuid */
>   	mr	r24,r5
>   
> +	/* Create a temp kernel stack for use before relocation is on.	*/
> +	ld	r1,PACAEMERGSP(r13)
> +	subi	r1,r1,STACK_FRAME_OVERHEAD
> +
>   	/* See if we need to call a cpu state restore handler */
>   	LOAD_REG_ADDR(r23, cur_cpu_spec)
>   	ld	r23,0(r23)
> @@ -448,10 +452,6 @@ generic_secondary_common_init:
>   	sync				/* order paca.run and cur_cpu_spec */
>   	isync				/* In case code patching happened */
>   
> -	/* Create a temp kernel stack for use before relocation is on.	*/
> -	ld	r1,PACAEMERGSP(r13)
> -	subi	r1,r1,STACK_FRAME_OVERHEAD
> -
>   	b	__secondary_start
>   #endif /* SMP */
>   
> diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
> index 808ec9fab605..fff714e36b37 100644
> --- a/arch/powerpc/kernel/setup-common.c
> +++ b/arch/powerpc/kernel/setup-common.c
> @@ -919,8 +919,6 @@ void __init setup_arch(char **cmdline_p)
>   
>   	/* On BookE, setup per-core TLB data structures. */
>   	setup_tlb_core_data();
> -
> -	smp_release_cpus();
>   #endif
>   
>   	/* Print various info about the machine that has been gathered so far. */
> @@ -944,6 +942,10 @@ void __init setup_arch(char **cmdline_p)
>   	exc_lvl_early_init();
>   	emergency_stack_init();
>   
> +#ifdef CONFIG_SMP
> +	smp_release_cpus();
> +#endif

Are you sure you need that #ifdef ?

In asm/smp.h, we have:

#if defined(CONFIG_PPC64) && (defined(CONFIG_SMP) || 
defined(CONFIG_KEXEC_CORE))
extern void smp_release_cpus(void);
#else
static inline void smp_release_cpus(void) { };
#endif


> +
>   	initmem_init();
>   
>   	early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
> 

Christophe

^ permalink raw reply

* [PATCH v3 2/2] powerpc/64s: Convert some cpu_setup() and cpu_restore() functions to C
From: Jordan Niethe @ 2020-09-22  5:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: oohall, npiggin, Jordan Niethe
In-Reply-To: <20200922055307.10647-1-jniethe5@gmail.com>

The only thing keeping the cpu_setup() and cpu_restore() functions used
in the cputable entries for Power7, Power8, Power9 and Power10 in
assembly was cpu_restore() being called before there was a stack in
generic_secondary_smp_init(). Commit ("powerpc/64: Set up a kernel stack
for secondaries before cpu_restore()") means that it is now possible to
use C.

Rewrite the functions in C so they are a little bit easier to read. This
is not changing their functionality.

Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
 arch/powerpc/include/asm/cpu_setup_power.h |  12 +
 arch/powerpc/kernel/cpu_setup_power.S      | 252 -------------------
 arch/powerpc/kernel/cpu_setup_power.c      | 269 +++++++++++++++++++++
 arch/powerpc/kernel/cputable.c             |  12 +-
 4 files changed, 285 insertions(+), 260 deletions(-)
 create mode 100644 arch/powerpc/include/asm/cpu_setup_power.h
 delete mode 100644 arch/powerpc/kernel/cpu_setup_power.S
 create mode 100644 arch/powerpc/kernel/cpu_setup_power.c

diff --git a/arch/powerpc/include/asm/cpu_setup_power.h b/arch/powerpc/include/asm/cpu_setup_power.h
new file mode 100644
index 000000000000..24be9131f803
--- /dev/null
+++ b/arch/powerpc/include/asm/cpu_setup_power.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2020 IBM Corporation
+ */
+void __setup_cpu_power7(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power7(void);
+void __setup_cpu_power8(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power8(void);
+void __setup_cpu_power9(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power9(void);
+void __setup_cpu_power10(unsigned long offset, struct cpu_spec *spec);
+void __restore_cpu_power10(void);
diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S
deleted file mode 100644
index 704e8b9501ee..000000000000
--- a/arch/powerpc/kernel/cpu_setup_power.S
+++ /dev/null
@@ -1,252 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-or-later */
-/*
- * This file contains low level CPU setup functions.
- *    Copyright (C) 2003 Benjamin Herrenschmidt (benh@kernel.crashing.org)
- */
-
-#include <asm/processor.h>
-#include <asm/page.h>
-#include <asm/cputable.h>
-#include <asm/ppc_asm.h>
-#include <asm/asm-offsets.h>
-#include <asm/cache.h>
-#include <asm/book3s/64/mmu-hash.h>
-
-/* Entry: r3 = crap, r4 = ptr to cputable entry
- *
- * Note that we can be called twice for pseudo-PVRs
- */
-_GLOBAL(__setup_cpu_power7)
-	mflr	r11
-	bl	__init_hvmode_206
-	mtlr	r11
-	beqlr
-	li	r0,0
-	mtspr	SPRN_LPID,r0
-	LOAD_REG_IMMEDIATE(r0, PCR_MASK)
-	mtspr	SPRN_PCR,r0
-	mfspr	r3,SPRN_LPCR
-	li	r4,(LPCR_LPES1 >> LPCR_LPES_SH)
-	bl	__init_LPCR_ISA206
-	mtlr	r11
-	blr
-
-_GLOBAL(__restore_cpu_power7)
-	mflr	r11
-	mfmsr	r3
-	rldicl.	r0,r3,4,63
-	beqlr
-	li	r0,0
-	mtspr	SPRN_LPID,r0
-	LOAD_REG_IMMEDIATE(r0, PCR_MASK)
-	mtspr	SPRN_PCR,r0
-	mfspr	r3,SPRN_LPCR
-	li	r4,(LPCR_LPES1 >> LPCR_LPES_SH)
-	bl	__init_LPCR_ISA206
-	mtlr	r11
-	blr
-
-_GLOBAL(__setup_cpu_power8)
-	mflr	r11
-	bl	__init_FSCR
-	bl	__init_PMU
-	bl	__init_PMU_ISA207
-	bl	__init_hvmode_206
-	mtlr	r11
-	beqlr
-	li	r0,0
-	mtspr	SPRN_LPID,r0
-	LOAD_REG_IMMEDIATE(r0, PCR_MASK)
-	mtspr	SPRN_PCR,r0
-	mfspr	r3,SPRN_LPCR
-	ori	r3, r3, LPCR_PECEDH
-	li	r4,0 /* LPES = 0 */
-	bl	__init_LPCR_ISA206
-	bl	__init_HFSCR
-	bl	__init_PMU_HV
-	bl	__init_PMU_HV_ISA207
-	mtlr	r11
-	blr
-
-_GLOBAL(__restore_cpu_power8)
-	mflr	r11
-	bl	__init_FSCR
-	bl	__init_PMU
-	bl	__init_PMU_ISA207
-	mfmsr	r3
-	rldicl.	r0,r3,4,63
-	mtlr	r11
-	beqlr
-	li	r0,0
-	mtspr	SPRN_LPID,r0
-	LOAD_REG_IMMEDIATE(r0, PCR_MASK)
-	mtspr	SPRN_PCR,r0
-	mfspr   r3,SPRN_LPCR
-	ori	r3, r3, LPCR_PECEDH
-	li	r4,0 /* LPES = 0 */
-	bl	__init_LPCR_ISA206
-	bl	__init_HFSCR
-	bl	__init_PMU_HV
-	bl	__init_PMU_HV_ISA207
-	mtlr	r11
-	blr
-
-_GLOBAL(__setup_cpu_power10)
-	mflr	r11
-	bl	__init_FSCR_power10
-	bl	__init_PMU
-	bl	__init_PMU_ISA31
-	b	1f
-
-_GLOBAL(__setup_cpu_power9)
-	mflr	r11
-	bl	__init_FSCR_power9
-	bl	__init_PMU
-1:	bl	__init_hvmode_206
-	mtlr	r11
-	beqlr
-	li	r0,0
-	mtspr	SPRN_PSSCR,r0
-	mtspr	SPRN_LPID,r0
-	mtspr	SPRN_PID,r0
-	LOAD_REG_IMMEDIATE(r0, PCR_MASK)
-	mtspr	SPRN_PCR,r0
-	mfspr	r3,SPRN_LPCR
-	LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE  | LPCR_HEIC)
-	or	r3, r3, r4
-	LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
-	andc	r3, r3, r4
-	li	r4,0 /* LPES = 0 */
-	bl	__init_LPCR_ISA300
-	bl	__init_HFSCR
-	bl	__init_PMU_HV
-	mtlr	r11
-	blr
-
-_GLOBAL(__restore_cpu_power10)
-	mflr	r11
-	bl	__init_FSCR_power10
-	bl	__init_PMU
-	bl	__init_PMU_ISA31
-	b	1f
-
-_GLOBAL(__restore_cpu_power9)
-	mflr	r11
-	bl	__init_FSCR_power9
-	bl	__init_PMU
-1:	mfmsr	r3
-	rldicl.	r0,r3,4,63
-	mtlr	r11
-	beqlr
-	li	r0,0
-	mtspr	SPRN_PSSCR,r0
-	mtspr	SPRN_LPID,r0
-	mtspr	SPRN_PID,r0
-	LOAD_REG_IMMEDIATE(r0, PCR_MASK)
-	mtspr	SPRN_PCR,r0
-	mfspr   r3,SPRN_LPCR
-	LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC)
-	or	r3, r3, r4
-	LOAD_REG_IMMEDIATE(r4, LPCR_UPRT | LPCR_HR)
-	andc	r3, r3, r4
-	li	r4,0 /* LPES = 0 */
-	bl	__init_LPCR_ISA300
-	bl	__init_HFSCR
-	bl	__init_PMU_HV
-	mtlr	r11
-	blr
-
-__init_hvmode_206:
-	/* Disable CPU_FTR_HVMODE and exit if MSR:HV is not set */
-	mfmsr	r3
-	rldicl.	r0,r3,4,63
-	bnelr
-	ld	r5,CPU_SPEC_FEATURES(r4)
-	LOAD_REG_IMMEDIATE(r6,CPU_FTR_HVMODE | CPU_FTR_P9_TM_HV_ASSIST)
-	andc	r5,r5,r6
-	std	r5,CPU_SPEC_FEATURES(r4)
-	blr
-
-__init_LPCR_ISA206:
-	/* Setup a sane LPCR:
-	 *   Called with initial LPCR in R3 and desired LPES 2-bit value in R4
-	 *
-	 *   LPES = 0b01 (HSRR0/1 used for 0x500)
-	 *   PECE = 0b111
-	 *   DPFD = 4
-	 *   HDICE = 0
-	 *   VC = 0b100 (VPM0=1, VPM1=0, ISL=0)
-	 *   VRMASD = 0b10000 (L=1, LP=00)
-	 *
-	 * Other bits untouched for now
-	 */
-	li	r5,0x10
-	rldimi	r3,r5, LPCR_VRMASD_SH, 64-LPCR_VRMASD_SH-5
-
-	/* POWER9 has no VRMASD */
-__init_LPCR_ISA300:
-	rldimi	r3,r4, LPCR_LPES_SH, 64-LPCR_LPES_SH-2
-	ori	r3,r3,(LPCR_PECE0|LPCR_PECE1|LPCR_PECE2)
-	li	r5,4
-	rldimi	r3,r5, LPCR_DPFD_SH, 64-LPCR_DPFD_SH-3
-	clrrdi	r3,r3,1		/* clear HDICE */
-	li	r5,4
-	rldimi	r3,r5, LPCR_VC_SH, 0
-	mtspr	SPRN_LPCR,r3
-	isync
-	blr
-
-__init_FSCR_power10:
-	mfspr	r3, SPRN_FSCR
-	ori	r3, r3, FSCR_PREFIX
-	mtspr	SPRN_FSCR, r3
-	// fall through
-
-__init_FSCR_power9:
-	mfspr	r3, SPRN_FSCR
-	ori	r3, r3, FSCR_SCV
-	mtspr	SPRN_FSCR, r3
-	// fall through
-
-__init_FSCR:
-	mfspr	r3,SPRN_FSCR
-	ori	r3,r3,FSCR_TAR|FSCR_EBB
-	mtspr	SPRN_FSCR,r3
-	blr
-
-__init_HFSCR:
-	mfspr	r3,SPRN_HFSCR
-	ori	r3,r3,HFSCR_TAR|HFSCR_TM|HFSCR_BHRB|HFSCR_PM|\
-		      HFSCR_DSCR|HFSCR_VECVSX|HFSCR_FP|HFSCR_EBB|HFSCR_MSGP
-	mtspr	SPRN_HFSCR,r3
-	blr
-
-__init_PMU_HV:
-	li	r5,0
-	mtspr	SPRN_MMCRC,r5
-	blr
-
-__init_PMU_HV_ISA207:
-	li	r5,0
-	mtspr	SPRN_MMCRH,r5
-	blr
-
-__init_PMU:
-	li	r5,0
-	mtspr	SPRN_MMCRA,r5
-	mtspr	SPRN_MMCR0,r5
-	mtspr	SPRN_MMCR1,r5
-	mtspr	SPRN_MMCR2,r5
-	blr
-
-__init_PMU_ISA207:
-	li	r5,0
-	mtspr	SPRN_MMCRS,r5
-	blr
-
-__init_PMU_ISA31:
-	li	r5,0
-	mtspr	SPRN_MMCR3,r5
-	LOAD_REG_IMMEDIATE(r5, MMCRA_BHRB_DISABLE)
-	mtspr	SPRN_MMCRA,r5
-	blr
diff --git a/arch/powerpc/kernel/cpu_setup_power.c b/arch/powerpc/kernel/cpu_setup_power.c
new file mode 100644
index 000000000000..cf5201b0579d
--- /dev/null
+++ b/arch/powerpc/kernel/cpu_setup_power.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2020 IBM Corporation
+ * This file contains low level CPU setup functions.
+ * Originally written in assembly by
+ * Benjamin Herrenschmidt (benh@kernel.crashing.org)
+ */
+#include <asm/reg.h>
+#include <asm/synch.h>
+#include <linux/bitops.h>
+#include <asm/cputable.h>
+#include <asm/cpu_setup_power.h>
+
+/* Disable CPU_FTR_HVMODE and return false if MSR:HV is not set */
+static bool init_hvmode_206(struct cpu_spec *t)
+{
+	u64 msr;
+
+	msr = mfmsr();
+	if (msr & MSR_HV)
+		return true;
+
+	t->cpu_features &= ~(CPU_FTR_HVMODE | CPU_FTR_P9_TM_HV_ASSIST);
+	return false;
+}
+
+static void init_LPCR_ISA300(u64 lpcr, u64 lpes)
+{
+	/* POWER9 has no VRMASD */
+	lpcr |= (lpes << LPCR_LPES_SH) & LPCR_LPES;
+	lpcr |= LPCR_PECE0|LPCR_PECE1|LPCR_PECE2;
+	lpcr |= (4ull << LPCR_DPFD_SH) & LPCR_DPFD;
+	lpcr &= ~LPCR_HDICE;	/* clear HDICE */
+	lpcr |= (4ull << LPCR_VC_SH);
+	mtspr(SPRN_LPCR, lpcr);
+	isync();
+}
+
+/*
+ * Setup a sane LPCR:
+ *   Called with initial LPCR and desired LPES 2-bit value
+ *
+ *   LPES = 0b01 (HSRR0/1 used for 0x500)
+ *   PECE = 0b111
+ *   DPFD = 4
+ *   HDICE = 0
+ *   VC = 0b100 (VPM0=1, VPM1=0, ISL=0)
+ *   VRMASD = 0b10000 (L=1, LP=00)
+ *
+ * Other bits untouched for now
+ */
+static void init_LPCR_ISA206(u64 lpcr, u64 lpes)
+{
+	lpcr |= (0x10ull << LPCR_VRMASD_SH) & LPCR_VRMASD;
+	init_LPCR_ISA300(lpcr, lpes);
+}
+
+static void init_FSCR(void)
+{
+	u64 fscr;
+
+	fscr = mfspr(SPRN_FSCR);
+	fscr |= FSCR_TAR|FSCR_EBB;
+	mtspr(SPRN_FSCR, fscr);
+}
+
+static void init_FSCR_power9(void)
+{
+	u64 fscr;
+
+	fscr = mfspr(SPRN_FSCR);
+	fscr |= FSCR_SCV;
+	mtspr(SPRN_FSCR, fscr);
+	init_FSCR();
+}
+
+static void init_FSCR_power10(void)
+{
+	u64 fscr;
+
+	fscr = mfspr(SPRN_FSCR);
+	fscr |= FSCR_PREFIX;
+	mtspr(SPRN_FSCR, fscr);
+	init_FSCR_power9();
+}
+
+static void init_HFSCR(void)
+{
+	u64 hfscr;
+
+	hfscr = mfspr(SPRN_HFSCR);
+	hfscr |= HFSCR_TAR|HFSCR_TM|HFSCR_BHRB|HFSCR_PM|HFSCR_DSCR|\
+		 HFSCR_VECVSX|HFSCR_FP|HFSCR_EBB|HFSCR_MSGP;
+	mtspr(SPRN_HFSCR, hfscr);
+}
+
+static void init_PMU_HV(void)
+{
+	mtspr(SPRN_MMCRC, 0);
+}
+
+static void init_PMU_HV_ISA207(void)
+{
+	mtspr(SPRN_MMCRH, 0);
+}
+
+static void init_PMU(void)
+{
+	mtspr(SPRN_MMCRA, 0);
+	mtspr(SPRN_MMCR0, 0);
+	mtspr(SPRN_MMCR1, 0);
+	mtspr(SPRN_MMCR2, 0);
+}
+
+static void init_PMU_ISA207(void)
+{
+	mtspr(SPRN_MMCRS, 0);
+}
+
+static void init_PMU_ISA31(void)
+{
+	mtspr(SPRN_MMCR3, 0);
+	mtspr(SPRN_MMCRA, MMCRA_BHRB_DISABLE);
+}
+
+/*
+ * Note that we can be called twice of pseudo-PVRs.
+ * The parameter offset is not used.
+ */
+
+void __setup_cpu_power7(unsigned long offset, struct cpu_spec *t)
+{
+	if (!init_hvmode_206(t))
+		return;
+
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
+}
+
+void __restore_cpu_power7(void)
+{
+	u64 msr;
+
+	msr = mfmsr();
+	if (!(msr & MSR_HV))
+		return;
+
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA206(mfspr(SPRN_LPCR), LPCR_LPES1 >> LPCR_LPES_SH);
+}
+
+void __setup_cpu_power8(unsigned long offset, struct cpu_spec *t)
+{
+	init_FSCR();
+	init_PMU();
+	init_PMU_ISA207();
+
+	if (!init_hvmode_206(t))
+		return;
+
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
+	init_HFSCR();
+	init_PMU_HV();
+	init_PMU_HV_ISA207();
+}
+
+void __restore_cpu_power8(void)
+{
+	u64 msr;
+
+	init_FSCR();
+	init_PMU();
+	init_PMU_ISA207();
+
+	msr = mfmsr();
+	if (!(msr & MSR_HV))
+		return;
+
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA206(mfspr(SPRN_LPCR) | LPCR_PECEDH, 0); /* LPES = 0 */
+	init_HFSCR();
+	init_PMU_HV();
+	init_PMU_HV_ISA207();
+}
+
+void __setup_cpu_power9(unsigned long offset, struct cpu_spec *t)
+{
+	init_FSCR_power9();
+	init_PMU();
+
+	if (!init_hvmode_206(t))
+		return;
+
+	mtspr(SPRN_PSSCR, 0);
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+			 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+	init_HFSCR();
+	init_PMU_HV();
+}
+
+void __restore_cpu_power9(void)
+{
+	u64 msr;
+
+	init_FSCR_power9();
+	init_PMU();
+
+	msr = mfmsr();
+	if (!(msr & MSR_HV))
+		return;
+
+	mtspr(SPRN_PSSCR, 0);
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+			 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+	init_HFSCR();
+	init_PMU_HV();
+}
+
+void __setup_cpu_power10(unsigned long offset, struct cpu_spec *t)
+{
+	init_FSCR_power10();
+	init_PMU();
+	init_PMU_ISA31();
+
+	if (!init_hvmode_206(t))
+		return;
+
+	mtspr(SPRN_PSSCR, 0);
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+			 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+	init_HFSCR();
+	init_PMU_HV();
+}
+
+void __restore_cpu_power10(void)
+{
+	u64 msr;
+
+	init_FSCR_power10();
+	init_PMU();
+	init_PMU_ISA31();
+
+	msr = mfmsr();
+	if (!(msr & MSR_HV))
+		return;
+
+	mtspr(SPRN_PSSCR, 0);
+	mtspr(SPRN_LPID, 0);
+	mtspr(SPRN_PID, 0);
+	mtspr(SPRN_PCR, PCR_MASK);
+	init_LPCR_ISA300((mfspr(SPRN_LPCR) | LPCR_PECEDH | LPCR_PECE_HVEE |\
+			 LPCR_HVICE | LPCR_HEIC) & ~(LPCR_UPRT | LPCR_HR), 0);
+	init_HFSCR();
+	init_PMU_HV();
+}
diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 2aa89c6b2896..26a56c9d6650 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -59,19 +59,15 @@ extern void __setup_cpu_7410(unsigned long offset, struct cpu_spec* spec);
 extern void __setup_cpu_745x(unsigned long offset, struct cpu_spec* spec);
 #endif /* CONFIG_PPC32 */
 #ifdef CONFIG_PPC64
+#include <asm/cpu_setup_power.h>
 extern void __setup_cpu_ppc970(unsigned long offset, struct cpu_spec* spec);
 extern void __setup_cpu_ppc970MP(unsigned long offset, struct cpu_spec* spec);
 extern void __setup_cpu_pa6t(unsigned long offset, struct cpu_spec* spec);
 extern void __restore_cpu_pa6t(void);
 extern void __restore_cpu_ppc970(void);
-extern void __setup_cpu_power7(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power7(void);
-extern void __setup_cpu_power8(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power8(void);
-extern void __setup_cpu_power9(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power9(void);
-extern void __setup_cpu_power10(unsigned long offset, struct cpu_spec* spec);
-extern void __restore_cpu_power10(void);
+extern long __machine_check_early_realmode_p7(struct pt_regs *regs);
+extern long __machine_check_early_realmode_p8(struct pt_regs *regs);
+extern long __machine_check_early_realmode_p9(struct pt_regs *regs);
 #endif /* CONFIG_PPC64 */
 #if defined(CONFIG_E500)
 extern void __setup_cpu_e5500(unsigned long offset, struct cpu_spec* spec);
-- 
2.17.1


^ permalink raw reply related

* [PATCH v3 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Jordan Niethe @ 2020-09-22  5:53 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: oohall, npiggin, Jordan Niethe

Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
is called before a stack has been set up in r1. This was previously fine
as the cpu_restore() functions were implemented in assembly and did not
use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
device tree binding for discovering CPU features") used
__restore_cpu_cpufeatures() as the cpu_restore() function for a
device-tree features based cputable entry. This is a C function and
hence uses a stack in r1.

generic_secondary_smp_init() is entered on the secondary cpus via the
primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
thread has its own stack. The OPAL call is ran in the primary's hardware
thread. During the call, a job is scheduled on a secondary cpu that will
start executing at the address of generic_secondary_smp_init().  Hence
the value that will be left in r1 when the secondary cpu enters the
kernel is part of that secondary cpu's individual OPAL stack. This means
that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
not horribly bad as each hardware thread has its own stack and the call
that enters the kernel from OPAL never returns, but it is still wrong
and should be corrected.

Create the temp kernel stack before calling cpu_restore().

As noted by mpe, for a kexec boot, the secondary CPUs are released from
the spin loop at address 0x60 by smp_release_cpus() and then jump to
generic_secondary_smp_init(). The call to smp_release_cpus() is in
setup_arch(), and it comes before the call to emergency_stack_init().
emergency_stack_init() allocates an emergency stack in the PACA for each
CPU.  This address in the PACA is what is used to set up the temp kernel
stack in generic_secondary_smp_init(). Move releasing the secondary CPUs
to after the PACAs have been allocated an emergency stack, otherwise the
PACA stack pointer will contain garbage and hence the temp kernel stack
created from it will be broken.

Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
---
v2: Add more detail to the commit message
v3: Release secondary CPUs after the emergency stack is created
---
 arch/powerpc/kernel/head_64.S      | 8 ++++----
 arch/powerpc/kernel/setup-common.c | 6 ++++--
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
index 0e05a9a47a4b..4b7f4c6c2600 100644
--- a/arch/powerpc/kernel/head_64.S
+++ b/arch/powerpc/kernel/head_64.S
@@ -420,6 +420,10 @@ generic_secondary_common_init:
 	/* From now on, r24 is expected to be logical cpuid */
 	mr	r24,r5

+	/* Create a temp kernel stack for use before relocation is on.	*/
+	ld	r1,PACAEMERGSP(r13)
+	subi	r1,r1,STACK_FRAME_OVERHEAD
+
 	/* See if we need to call a cpu state restore handler */
 	LOAD_REG_ADDR(r23, cur_cpu_spec)
 	ld	r23,0(r23)
@@ -448,10 +452,6 @@ generic_secondary_common_init:
 	sync				/* order paca.run and cur_cpu_spec */
 	isync				/* In case code patching happened */

-	/* Create a temp kernel stack for use before relocation is on.	*/
-	ld	r1,PACAEMERGSP(r13)
-	subi	r1,r1,STACK_FRAME_OVERHEAD
-
 	b	__secondary_start
 #endif /* SMP */

diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 808ec9fab605..fff714e36b37 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -919,8 +919,6 @@ void __init setup_arch(char **cmdline_p)

 	/* On BookE, setup per-core TLB data structures. */
 	setup_tlb_core_data();
-
-	smp_release_cpus();
 #endif

 	/* Print various info about the machine that has been gathered so far. */
@@ -944,6 +942,10 @@ void __init setup_arch(char **cmdline_p)
 	exc_lvl_early_init();
 	emergency_stack_init();

+#ifdef CONFIG_SMP
+	smp_release_cpus();
+#endif
+
 	initmem_init();

 	early_memtest(min_low_pfn << PAGE_SHIFT, max_low_pfn << PAGE_SHIFT);
-- 
2.17.1

^ permalink raw reply related

* Re: [PATCH] ibmvfc: Avoid link down on FS9100 canister reboot
From: Martin K. Petersen @ 2020-09-22  3:56 UTC (permalink / raw)
  To: linux-scsi, Brian King; +Cc: tyreld, linuxppc-dev, Martin K . Petersen
In-Reply-To: <1599859706-8505-1-git-send-email-brking@linux.vnet.ibm.com>

On Fri, 11 Sep 2020 16:28:26 -0500, Brian King wrote:

> When a canister on a FS9100, or similar storage, running in NPIV mode,
> is rebooted, its WWPNs will fail over to another canister. When this
> occurs, we see a WWPN going away from the fabric at one N-Port ID,
> and, a short time later, the same WWPN appears at a different N-Port ID.
> When the canister is fully operational again, the WWPNs fail back to
> the original canister. If there is any I/O outstanding to the target
> when this occurs, it will result in the implicit logout the ibmvfc driver
> issues before removing the rport to fail. When the WWPN then shows up at a
> different N-Port ID, and we issue a PLOGI to it, the VIOS will
> see that it still has a login for this WWPN at the old N-Port ID,
> which results in the VIOS simulating a link down / link up sequence
> to the client, in order to get the VIOS and client LPAR in sync.
> 
> [...]

Applied to 5.10/scsi-queue, thanks!

[1/1] scsi: ibmvfc: Avoid link down on FS9100 canister reboot
      https://git.kernel.org/mkp/scsi/c/4b29cb6197d9

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/64: Set up a kernel stack for secondaries before cpu_restore()
From: Jordan Niethe @ 2020-09-22  3:52 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Oliver O'Halloran, linuxppc-dev, Nicholas Piggin
In-Reply-To: <877dsro8iy.fsf@mpe.ellerman.id.au>

On Fri, Sep 18, 2020 at 5:21 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Hi Jordan,
>
> Jordan Niethe <jniethe5@gmail.com> writes:
> > Currently in generic_secondary_smp_init(), cur_cpu_spec->cpu_restore()
> > is called before a stack has been set up in r1. This was previously fine
> > as the cpu_restore() functions were implemented in assembly and did not
> > use a stack. However commit 5a61ef74f269 ("powerpc/64s: Support new
> > device tree binding for discovering CPU features") used
> > __restore_cpu_cpufeatures() as the cpu_restore() function for a
> > device-tree features based cputable entry. This is a C function and
> > hence uses a stack in r1.
> >
> > generic_secondary_smp_init() is entered on the secondary cpus via the
> > primary cpu using the OPAL call opal_start_cpu(). In OPAL, each hardware
> > thread has its own stack. The OPAL call is ran in the primary's hardware
> > thread. During the call, a job is scheduled on a secondary cpu that will
> > start executing at the address of generic_secondary_smp_init().  Hence
> > the value that will be left in r1 when the secondary cpu enters the
> > kernel is part of that secondary cpu's individual OPAL stack. This means
> > that __restore_cpu_cpufeatures() will write to that OPAL stack. This is
> > not horribly bad as each hardware thread has its own stack and the call
> > that enters the kernel from OPAL never returns, but it is still wrong
> > and should be corrected.
> >
> > Create the temp kernel stack before calling cpu_restore().
> >
> > Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features")
> > Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> > ---
> > v2: Add more detail to the commit message
> > ---
> >  arch/powerpc/kernel/head_64.S | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
>
> Unfortunately this breaks booting via kexec.
>
> In that case the secondaries come in to 0x60 and spin until they're
> released by smp_release_cpus(), which is before emergency_stack_init()
> has run. That means they pick up a bad r1 value and crash/get stuck.
>
> I'm not sure what the best solution is.
Would it be simplest to just call smp_release_cpus() after setting up the stack?
>
> I've thought in the past that it would be nicer if the CPU setup didn't
> run until the secondary is told to start (via PACAPROCSTART), ie. more
> the CPU setup call below there.
>
> But that opens the possibility that we run threads with different
> settings of some SPRs until SMP bringup, and if the user has said not to
> start secondaries then possibly for ever. And I haven't though hard
> enough about whether that's actually problematic (running with different
> SPR values).
>
> cheers
>
>
> > diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> > index 0e05a9a47a4b..4b7f4c6c2600 100644
> > --- a/arch/powerpc/kernel/head_64.S
> > +++ b/arch/powerpc/kernel/head_64.S
> > @@ -420,6 +420,10 @@ generic_secondary_common_init:
> >       /* From now on, r24 is expected to be logical cpuid */
> >       mr      r24,r5
> >
> > +     /* Create a temp kernel stack for use before relocation is on.  */
> > +     ld      r1,PACAEMERGSP(r13)
> > +     subi    r1,r1,STACK_FRAME_OVERHEAD
> > +
> >       /* See if we need to call a cpu state restore handler */
> >       LOAD_REG_ADDR(r23, cur_cpu_spec)
> >       ld      r23,0(r23)
> > @@ -448,10 +452,6 @@ generic_secondary_common_init:
> >       sync                            /* order paca.run and cur_cpu_spec */
> >       isync                           /* In case code patching happened */
> >
> > -     /* Create a temp kernel stack for use before relocation is on.  */
> > -     ld      r1,PACAEMERGSP(r13)
> > -     subi    r1,r1,STACK_FRAME_OVERHEAD
> > -
> >       b       __secondary_start
> >  #endif /* SMP */
> >
> > --
> > 2.17.1

^ permalink raw reply

* Re: [PATCH kernel] powerpc/dma: Fix dma_map_ops::get_required_mask
From: Alexey Kardashevskiy @ 2020-09-22  2:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Oliver OHalloran, linuxppc-dev@lists.ozlabs.org,
	Cédric Le Goater
In-Reply-To: <20200915065022.GA19658@lst.de>

On 15/09/2020 16:50, Christoph Hellwig wrote:
> On Wed, Sep 09, 2020 at 07:36:04PM +1000, Alexey Kardashevskiy wrote:
>> I want dma_get_required_mask() to return the bigger mask always.
>>
>> Now it depends on (in dma_alloc_direct()):
>> 1. dev->dma_ops_bypass: set via pci_set_(coherent_)dma_mask();
>> 2. dev->coherent_dma_mask - the same;
>> 3. dev->bus_dma_limit - usually not set at all.
>>
>> So until we set the mask, dma_get_required_mask() returns smaller mask.
>> So aacraid and likes (which calls dma_get_required_mask() before setting
>> it) will remain prone for breaks.
> 
> Well, the original intent of dma_get_required_mask is to return the
> mask that the driver then uses to figure out what to set, so what aacraid
> does fits that use case. 

What was the original intent exactly? The driver asks for the minimum or
maximum DMA mask the platform supports?

As for now, we (ppc64/powernv) can do:
1. bypass (==64bit)
2. a DMA window which used to be limited by 2GB but not anymore.

I can understand if the driver asked for required mask in expectation to
receive "less or equal than 32bit" and "more than 32 bit" and choose.
And this probably was the intent as at the time when the bug was
introduced, the window was always smaller than 4GB.

But today the window is bigger than than (44 bits now, or a similar
value, depends on max page order) so the returned mask is >32. Which
still enables that DAC in aacraid but I suspect this is accidental.

> Of course that idea is pretty bogus for
> PCIe devices.

Why? From the PHB side, there are windows. From the device side, there
are many crippled devices, like, no GPU I saw in last years supported
more than 48bit.

> I suspect the right fix is to just not query dma_get_required_mask for
> PCIe devices in aacraid (and other drivers that do something similar).

May be, if you write nice and big comment next to
dma_get_required_mask() explaining exactly what it does, then I will
realize I am getting this all wrong and we will move to fixing the
drivers :)

-- 
Alexey

^ permalink raw reply

* RE: [PATCH 3/5] arm: dts: ls1021a: fix that FlexTimer cannot wakeup system in deep sleep
From: Ran Wang @ 2020-09-22  2:18 UTC (permalink / raw)
  To: Leo Li, Rob Herring, Shawn Guo
  Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Biwen Li
In-Reply-To: <VE1PR04MB6687F86E5F609DC77E050CEA8F3A0@VE1PR04MB6687.eurprd04.prod.outlook.com>

Hi Leo,

On Tuesday, September 22, 2020 6:59 AM, Leo Li wrote:
> 
> > -----Original Message-----
> > From: Ran Wang <ran.wang_1@nxp.com>
> > Sent: Wednesday, September 16, 2020 3:18 AM
> > To: Leo Li <leoyang.li@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> > Shawn Guo <shawnguo@kernel.org>
> > Cc: linuxppc-dev@lists.ozlabs.org;
> > linux-arm-kernel@lists.infradead.org;
> > devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Biwen Li
> > <biwen.li@nxp.com>; Ran Wang <ran.wang_1@nxp.com>
> > Subject: [PATCH 3/5] arm: dts: ls1021a: fix that FlexTimer cannot
> > wakeup system in deep sleep
> 
> A better description should be enabling the A-008646 workaround to be
> consistent with other patches.

OK, will update this.

Regards,
Ran
> >
> > From: Biwen Li <biwen.li@nxp.com>
> >
> > The patch fixes a bug that FlexTimer cannot wakeup system in deep sleep.
> >
> > Signed-off-by: Biwen Li <biwen.li@nxp.com>
> > Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> > ---
> >  arch/arm/boot/dts/ls1021a.dtsi | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm/boot/dts/ls1021a.dtsi
> > b/arch/arm/boot/dts/ls1021a.dtsi index 827373e..089fe86 100644
> > --- a/arch/arm/boot/dts/ls1021a.dtsi
> > +++ b/arch/arm/boot/dts/ls1021a.dtsi
> > @@ -1007,6 +1007,7 @@
> >  			compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm- 2.1+";
> >  			reg = <0x0 0x1ee2140 0x0 0x8>;
> >  			#fsl,rcpm-wakeup-cells = <2>;
> > +			fsl,ippdexpcr1-alt-addr = <&scfg 0x51c>;
> >  		};
> >
> >  		ftm_alarm0: timer0@29d0000 {
> > --
> > 2.7.4


^ permalink raw reply

* RE: [PATCH 2/5] soc: fsl: handle RCPM errata A-008646 on SoC LS1021A
From: Ran Wang @ 2020-09-22  2:16 UTC (permalink / raw)
  To: Leo Li, Rob Herring, Shawn Guo
  Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, Biwen Li
In-Reply-To: <AM0PR04MB66769C7080A32C95AD438ADF8F3A0@AM0PR04MB6676.eurprd04.prod.outlook.com>

Hi Leo

Tuesday, September 22, 2020 6:43 AM, Leo Li wrote:
> 
> 
> > -----Original Message-----
> > From: Ran Wang <ran.wang_1@nxp.com>
> > Sent: Wednesday, September 16, 2020 3:18 AM
> > To: Leo Li <leoyang.li@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> > Shawn Guo <shawnguo@kernel.org>
> > Cc: linuxppc-dev@lists.ozlabs.org;
> > linux-arm-kernel@lists.infradead.org;
> > devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Biwen Li
> > <biwen.li@nxp.com>; Ran Wang <ran.wang_1@nxp.com>
> > Subject: [PATCH 2/5] soc: fsl: handle RCPM errata A-008646 on SoC
> > LS1021A
> >
> > From: Biwen Li <biwen.li@nxp.com>
> >
> > Description:
> > 	- Reading configuration register RCPM_IPPDEXPCR1
> > 	  always return zero
> >
> > Workaround:
> > 	- Save register RCPM_IPPDEXPCR1's value to
> > 	  register SCFG_SPARECR8.(uboot's psci also
> > 	  need reading value from the register SCFG_SPARECR8
> > 	  to set register RCPM_IPPDEXPCR1)
> >
> > Impact:
> > 	- FlexTimer module will cannot wakeup system in
> Will not..
> Also it will be better to merge this with the issue description part above to
> prevent confusion.

OK

> > 	  deep sleep on SoC LS1021A
> >
> > Signed-off-by: Biwen Li <biwen.li@nxp.com>
> > Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> > ---
> >  drivers/soc/fsl/rcpm.c | 42
> > +++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 41 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c index
> > a093dbe..e6354f5 100644
> > --- a/drivers/soc/fsl/rcpm.c
> > +++ b/drivers/soc/fsl/rcpm.c
> > @@ -2,7 +2,7 @@
> >  //
> >  // rcpm.c - Freescale QorIQ RCPM driver  // -// Copyright 2019 NXP
> > +// Copyright 2019-2020 NXP
> >  //
> >  // Author: Ran Wang <ran.wang_1@nxp.com>
> >
> > @@ -13,6 +13,9 @@
> >  #include <linux/slab.h>
> >  #include <linux/suspend.h>
> >  #include <linux/kernel.h>
> > +#include <linux/acpi.h>
> > +#include <linux/mfd/syscon.h>
> > +#include <linux/regmap.h>
> >
> >  #define RCPM_WAKEUP_CELL_MAX_SIZE	7
> >
> > @@ -37,6 +40,9 @@ static int rcpm_pm_prepare(struct device *dev)
> >  	struct device_node	*np = dev->of_node;
> >  	u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1];
> >  	u32 setting[RCPM_WAKEUP_CELL_MAX_SIZE] = {0};
> > +	struct regmap *scfg_addr_regmap = NULL;
> > +	u32 reg_offset[2];
> > +	u32 reg_value = 0;
> >
> >  	rcpm = dev_get_drvdata(dev);
> >  	if (!rcpm)
> > @@ -90,6 +96,40 @@ static int rcpm_pm_prepare(struct device *dev)
> >  			tmp |= ioread32be(address);
> >  			iowrite32be(tmp, address);
> >  		}
> > +		/*
> > +		 * Workaround of errata A-008646 on SoC LS1021A:
> > +		 * There is a bug of register ippdexpcr1.
> > +		 * Reading configuration register RCPM_IPPDEXPCR1
> > +		 * always return zero. So save ippdexpcr1's value
> > +		 * to register SCFG_SPARECR8.And the value of
> > +		 * ippdexpcr1 will be read from SCFG_SPARECR8.
> > +		 */
> > +		if (device_property_present(dev, "fsl,ippdexpcr1-alt-addr"))
> > {
> > +			if (dev_of_node(dev)) {
> > +				scfg_addr_regmap =
> > syscon_regmap_lookup_by_phandle(np,
> > +
> > 	   "fsl,ippdexpcr1-alt-addr");
> > +			} else if (is_acpi_node(dev->fwnode)) {
> > +				continue;
> > +			}
> > +
> > +			if (scfg_addr_regmap && (i == 1)) {
> > +				if (device_property_read_u32_array(dev,
> > +				    "fsl,ippdexpcr1-alt-addr",
> > +				    reg_offset,
> > +				    2)) {
> 
> It is not necessary to read out the whole fsl,ippdexpcr1-alt-addr property if we
> only need the offset.  Also you can change to use the
> syscon_regmap_lookup_by_phandle_args() API above to simplify the code.

Got it, will update it in next version, thanks.

Regards,
Ran

> > +					scfg_addr_regmap = NULL;
> > +					continue;
> > +				}
> > +				/* Read value from register SCFG_SPARECR8
> > */
> > +				regmap_read(scfg_addr_regmap,
> > +					    reg_offset[1],
> > +					    &reg_value);
> > +				/* Write value to register SCFG_SPARECR8 */
> > +				regmap_write(scfg_addr_regmap,
> > +					     reg_offset[1],
> > +					     tmp | reg_value);
> > +			}
> > +		}
> >  	}
> >
> >  	return 0;
> > --
> > 2.7.4


^ permalink raw reply

* RE: [PATCH 1/5] Documentation: dt: binding: fsl: Add 'fsl,ippdexpcr1-alt-addr' property
From: Ran Wang @ 2020-09-22  2:12 UTC (permalink / raw)
  To: Leo Li, Rob Herring
  Cc: Biwen Li, devicetree@vger.kernel.org, Shawn Guo,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <VE1PR04MB6687A3E64431C16831D09C568F3A0@VE1PR04MB6687.eurprd04.prod.outlook.com>

Hi Leo, Rob,


On Tuesday, September 22, 2020 6:20 AM, Leo Li wrote:
> 
> > -----Original Message-----
> > From: Ran Wang <ran.wang_1@nxp.com>
> > Sent: Wednesday, September 16, 2020 3:18 AM
> > To: Leo Li <leoyang.li@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> > Shawn Guo <shawnguo@kernel.org>
> > Cc: linuxppc-dev@lists.ozlabs.org;
> > linux-arm-kernel@lists.infradead.org;
> > devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Biwen Li
> > <biwen.li@nxp.com>; Ran Wang <ran.wang_1@nxp.com>
> > Subject: [PATCH 1/5] Documentation: dt: binding: fsl: Add
> > 'fsl,ippdexpcr1-alt- addr' property
> >
> > From: Biwen Li <biwen.li@nxp.com>
> >
> > The 'fsl,ippdexpcr1-alt-addr' property is used to handle an errata
> > A-008646 on LS1021A
> 
> It looks like the previous version of this patch has gotten the reviewed-by from
> Rob.  It would be good to be added to the patch for new submission.

Actually this patch has one update from previous version (https://lore.kernel.org/patchwork/patch/1161631/):
Reduce entry number from 3 to 2.

So I'd like to have a review for this one, sorry for not highlight this in advance.

Regards,
Ran

> >
> > Signed-off-by: Biwen Li <biwen.li@nxp.com>
> > Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> > ---
> >  Documentation/devicetree/bindings/soc/fsl/rcpm.txt | 19
> > +++++++++++++++++++
> >  1 file changed, 19 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > index 5a33619..1be58a3 100644
> > --- a/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > +++ b/Documentation/devicetree/bindings/soc/fsl/rcpm.txt
> > @@ -34,6 +34,11 @@ Chassis Version		Example Chips
> >  Optional properties:
> >   - little-endian : RCPM register block is Little Endian. Without it RCPM
> >     will be Big Endian (default case).
> > + - fsl,ippdexpcr1-alt-addr : The property is related to a hardware issue
> > +   on SoC LS1021A and only needed on SoC LS1021A.
> > +   Must include 2 entries:
> > +   The first entry must be a link to the SCFG device node.
> > +   The 2nd entry must be offset of register IPPDEXPCR1 in SCFG.
> >
> >  Example:
> >  The RCPM node for T4240:
> > @@ -43,6 +48,20 @@ The RCPM node for T4240:
> >  		#fsl,rcpm-wakeup-cells = <2>;
> >  	};
> >
> > +The RCPM node for LS1021A:
> > +	rcpm: rcpm@1ee2140 {
> > +		compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-2.1+";
> > +		reg = <0x0 0x1ee2140 0x0 0x8>;
> > +		#fsl,rcpm-wakeup-cells = <2>;
> > +
> > +		/*
> > +		 * The second and third entry compose an alt offset
> > +		 * address for IPPDEXPCR1(SCFG_SPARECR8)
> > +		 */
> > +		fsl,ippdexpcr1-alt-addr = <&scfg 0x51c>;
> > +	};
> > +
> > +
> >  * Freescale RCPM Wakeup Source Device Tree Bindings
> >  -------------------------------------------
> >  Required fsl,rcpm-wakeup property should be added to a device node if
> > the device
> > --
> > 2.7.4


^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Andy Lutomirski @ 2020-09-22  0:58 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, Al Viro,
	Andy Lutomirski, io-uring, linux-arm-kernel, Jens Axboe,
	Parisc List, Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <d5c6736a-2cb4-4e22-78da-a667bda5c05a@gmail.com>

On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
>
>
> On 22/09/2020 02:51, Andy Lutomirski wrote:
> > On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
> >>
> >> On 21/09/2020 19:10, Pavel Begunkov wrote:
> >>> On 20/09/2020 01:22, Andy Lutomirski wrote:
> >>>>
> >>>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>>>>
> >>>>> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>>>>>>> Said that, why not provide a variant that would take an explicit
> >>>>>>>> "is it compat" argument and use it there?  And have the normal
> >>>>>>>> one pass in_compat_syscall() to that...
> >>>>>>>
> >>>>>>> That would help to not introduce a regression with this series yes.
> >>>>>>> But it wouldn't fix existing bugs when io_uring is used to access
> >>>>>>> read or write methods that use in_compat_syscall().  One example that
> >>>>>>> I recently ran into is drivers/scsi/sg.c.
> >>>>>
> >>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
> >>>>> and that one would in fact be broken by your patch in the hypothetical
> >>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
> >>>>>
> >>>>> For reference, I checked the socket timestamp handling that has a
> >>>>> number of corner cases with time32/time64 formats in compat mode,
> >>>>> but none of those appear to be affected by the problem.
> >>>>>
> >>>>>> Aside from the potentially nasty use of per-task variables, one thing
> >>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> >>>>>> going to have a generic mechanism for this, shouldn't we allow a full
> >>>>>> override of the syscall arch instead of just allowing forcing compat
> >>>>>> so that a compat syscall can do a non-compat operation?
> >>>>>
> >>>>> The only reason it's needed here is that the caller is in a kernel
> >>>>> thread rather than a system call. Are there any possible scenarios
> >>>>> where one would actually need the opposite?
> >>>>>
> >>>>
> >>>> I can certainly imagine needing to force x32 mode from a kernel thread.
> >>>>
> >>>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
> >>>
> >>> It's rather the second one. Even though AFAIR it wasn't discussed
> >>> specifically, that how it works now (_partially_).
> >>
> >> Double checked -- I'm wrong, that's the former one. Most of it is based
> >> on a flag that was set an creation.
> >>
> >
> > Could we get away with making io_uring_enter() return -EINVAL (or
> > maybe -ENOTTY?) if you try to do it with bitness that doesn't match
> > the io_uring?  And disable SQPOLL in compat mode?
>
> Something like below. If PF_FORCE_COMPAT or any other solution
> doesn't lend by the time, I'll take a look whether other io_uring's
> syscalls need similar checks, etc.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0458f02d4ca8..aab20785fa9a 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>         if (ctx->flags & IORING_SETUP_R_DISABLED)
>                 goto out;
>
> +       ret = -EINVAl;
> +       if (ctx->compat != in_compat_syscall())
> +               goto out;
> +

This seems entirely reasonable to me.  Sharing an io_uring ring
between programs with different ABIs seems a bit nutty.

>         /*
>          * For SQ polling, the thread will do all submissions and completions.
>          * Just return the requested submit count, and wake the thread if
> @@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
>         if (ret)
>                 goto err;
>
> +       ret = -EINVAL;
> +       if (ctx->compat)
> +               goto err;
> +

I may be looking at a different kernel than you, but aren't you
preventing creating an io_uring regardless of whether SQPOLL is
requested?

>         /* Only gets the ring fd, doesn't install it in the file table */
>         fd = io_uring_get_fd(ctx, &file);
>         if (fd < 0) {
> --
> Pavel Begunkov

^ permalink raw reply

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Pavel Begunkov @ 2020-09-22  0:22 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, Al Viro,
	io-uring, linux-arm-kernel, Jens Axboe, Parisc List,
	Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <CALCETrW3rwGsgfLNnu_0JAcL5jvrPVTLTWM3JpbB5P9Hye6Fdw@mail.gmail.com>



On 22/09/2020 02:51, Andy Lutomirski wrote:
> On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 21/09/2020 19:10, Pavel Begunkov wrote:
>>> On 20/09/2020 01:22, Andy Lutomirski wrote:
>>>>
>>>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>>
>>>>> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
>>>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
>>>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
>>>>>>>> Said that, why not provide a variant that would take an explicit
>>>>>>>> "is it compat" argument and use it there?  And have the normal
>>>>>>>> one pass in_compat_syscall() to that...
>>>>>>>
>>>>>>> That would help to not introduce a regression with this series yes.
>>>>>>> But it wouldn't fix existing bugs when io_uring is used to access
>>>>>>> read or write methods that use in_compat_syscall().  One example that
>>>>>>> I recently ran into is drivers/scsi/sg.c.
>>>>>
>>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
>>>>> and that one would in fact be broken by your patch in the hypothetical
>>>>> case that someone tried to use io_uring to read /dev/input/event on x32...
>>>>>
>>>>> For reference, I checked the socket timestamp handling that has a
>>>>> number of corner cases with time32/time64 formats in compat mode,
>>>>> but none of those appear to be affected by the problem.
>>>>>
>>>>>> Aside from the potentially nasty use of per-task variables, one thing
>>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
>>>>>> going to have a generic mechanism for this, shouldn't we allow a full
>>>>>> override of the syscall arch instead of just allowing forcing compat
>>>>>> so that a compat syscall can do a non-compat operation?
>>>>>
>>>>> The only reason it's needed here is that the caller is in a kernel
>>>>> thread rather than a system call. Are there any possible scenarios
>>>>> where one would actually need the opposite?
>>>>>
>>>>
>>>> I can certainly imagine needing to force x32 mode from a kernel thread.
>>>>
>>>> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
>>>
>>> It's rather the second one. Even though AFAIR it wasn't discussed
>>> specifically, that how it works now (_partially_).
>>
>> Double checked -- I'm wrong, that's the former one. Most of it is based
>> on a flag that was set an creation.
>>
> 
> Could we get away with making io_uring_enter() return -EINVAL (or
> maybe -ENOTTY?) if you try to do it with bitness that doesn't match
> the io_uring?  And disable SQPOLL in compat mode?

Something like below. If PF_FORCE_COMPAT or any other solution
doesn't lend by the time, I'll take a look whether other io_uring's
syscalls need similar checks, etc.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 0458f02d4ca8..aab20785fa9a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (ctx->flags & IORING_SETUP_R_DISABLED)
 		goto out;
 
+	ret = -EINVAl;
+	if (ctx->compat != in_compat_syscall())
+		goto out;
+
 	/*
 	 * For SQ polling, the thread will do all submissions and completions.
 	 * Just return the requested submit count, and wake the thread if
@@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;
 
+	ret = -EINVAL;
+	if (ctx->compat)
+		goto err;
+
 	/* Only gets the ring fd, doesn't install it in the file table */
 	fd = io_uring_get_fd(ctx, &file);
 	if (fd < 0) {
-- 
Pavel Begunkov

^ permalink raw reply related

* Re: [PATCH 1/9] kernel: add a PF_FORCE_COMPAT flag
From: Andy Lutomirski @ 2020-09-21 23:51 UTC (permalink / raw)
  To: Pavel Begunkov
  Cc: linux-aio, open list:MIPS, David Howells, Linux-MM, keyrings,
	sparclinux, Christoph Hellwig, linux-arch, linux-s390,
	Linux SCSI List, X86 ML, Arnd Bergmann, linux-block, Al Viro,
	Andy Lutomirski, io-uring, linux-arm-kernel, Jens Axboe,
	Parisc List, Network Development, LKML, LSM List, Linux FS Devel,
	Andrew Morton, linuxppc-dev
In-Reply-To: <486c92d0-0f2e-bd61-1ab8-302524af5e08@gmail.com>

On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 21/09/2020 19:10, Pavel Begunkov wrote:
> > On 20/09/2020 01:22, Andy Lutomirski wrote:
> >>
> >>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>>
> >>> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <luto@kernel.org> wrote:
> >>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>>>>> Said that, why not provide a variant that would take an explicit
> >>>>>> "is it compat" argument and use it there?  And have the normal
> >>>>>> one pass in_compat_syscall() to that...
> >>>>>
> >>>>> That would help to not introduce a regression with this series yes.
> >>>>> But it wouldn't fix existing bugs when io_uring is used to access
> >>>>> read or write methods that use in_compat_syscall().  One example that
> >>>>> I recently ran into is drivers/scsi/sg.c.
> >>>
> >>> Ah, so reading /dev/input/event* would suffer from the same issue,
> >>> and that one would in fact be broken by your patch in the hypothetical
> >>> case that someone tried to use io_uring to read /dev/input/event on x32...
> >>>
> >>> For reference, I checked the socket timestamp handling that has a
> >>> number of corner cases with time32/time64 formats in compat mode,
> >>> but none of those appear to be affected by the problem.
> >>>
> >>>> Aside from the potentially nasty use of per-task variables, one thing
> >>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> >>>> going to have a generic mechanism for this, shouldn't we allow a full
> >>>> override of the syscall arch instead of just allowing forcing compat
> >>>> so that a compat syscall can do a non-compat operation?
> >>>
> >>> The only reason it's needed here is that the caller is in a kernel
> >>> thread rather than a system call. Are there any possible scenarios
> >>> where one would actually need the opposite?
> >>>
> >>
> >> I can certainly imagine needing to force x32 mode from a kernel thread.
> >>
> >> As for the other direction: what exactly are the desired bitness/arch semantics of io_uring?  Is the operation bitness chosen by the io_uring creation or by the io_uring_enter() bitness?
> >
> > It's rather the second one. Even though AFAIR it wasn't discussed
> > specifically, that how it works now (_partially_).
>
> Double checked -- I'm wrong, that's the former one. Most of it is based
> on a flag that was set an creation.
>

Could we get away with making io_uring_enter() return -EINVAL (or
maybe -ENOTTY?) if you try to do it with bitness that doesn't match
the io_uring?  And disable SQPOLL in compat mode?

--Andy

^ permalink raw reply

* Re: [PATCH V2] Doc: admin-guide: Add entry for kvm_cma_resv_ratio kernel param
From: Paul Mackerras @ 2020-09-21 23:19 UTC (permalink / raw)
  To: sathnaga
  Cc: Randy Dunlap, Jonathan Corbet, linux-doc, linux-kernel, kvm-ppc,
	linuxppc-dev
In-Reply-To: <20200921090220.14981-1-sathnaga@linux.vnet.ibm.com>

On Mon, Sep 21, 2020 at 02:32:20PM +0530, sathnaga@linux.vnet.ibm.com wrote:
> From: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> 
> Add document entry for kvm_cma_resv_ratio kernel param which
> is used to alter the KVM contiguous memory allocation percentage
> for hash pagetable allocation used by hash mode PowerPC KVM guests.
> 
> Cc: linux-kernel@vger.kernel.org
> Cc: kvm-ppc@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> ---
> 
> V2: 
> Addressed review comments from Randy.
> 
> V1: https://lkml.org/lkml/2020/9/16/72
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a1068742a6df..932ed45740c9 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2258,6 +2258,14 @@
>  			[KVM,ARM] Allow use of GICv4 for direct injection of
>  			LPIs.
>  
> +	kvm_cma_resv_ratio=n [PPC]
> +			Reserves given percentage from system memory area for
> +			contiguous memory allocation for KVM hash pagetable
> +			allocation.
> +			By default it reserves 5% of total system memory.

I am concerned that using the term "reserve" here could give the
impression that this memory is then not available for any other use.
It is in fact available for other uses as long as they are movable
allocations.  So this memory is available for uses such as process
anonymous memory and page cache, just not for things like kmalloc.

I'm not sure what would be a better term than "reserve", though.
Perhaps we need to add a sentence something like "The reserved memory
is available for use as process memory and page cache when it is not
being used by KVM."

Paul.

^ permalink raw reply

* RE: [PATCH 4/5] arm: dts: ls1021a: fix flextimer failed to wake system
From: Leo Li @ 2020-09-21 23:01 UTC (permalink / raw)
  To: Ran Wang, Rob Herring, Shawn Guo
  Cc: devicetree@vger.kernel.org, Ran Wang,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200916081831.24747-4-ran.wang_1@nxp.com>



> -----Original Message-----
> From: Ran Wang <ran.wang_1@nxp.com>
> Sent: Wednesday, September 16, 2020 3:19 AM
> To: Leo Li <leoyang.li@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> Shawn Guo <shawnguo@kernel.org>
> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-kernel@lists.infradead.org;
> devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Ran Wang
> <ran.wang_1@nxp.com>
> Subject: [PATCH 4/5] arm: dts: ls1021a: fix flextimer failed to wake system
> 
> The data of property 'fsl,rcpm-wakeup' is not corrcet, which causing RCPM
> driver incorrectly program register IPPDEXPCR1, then flextimer is wrongly
> clock gated during system suspend, can't send interrupt to wake.
> 
> Signed-off-by: Ran Wang <ran.wang_1@nxp.com>

Acked-by: Li Yang <leoyang.li@nxp.com>

> ---
>  arch/arm/boot/dts/ls1021a.dtsi | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
> index 089fe86..e372630f 100644
> --- a/arch/arm/boot/dts/ls1021a.dtsi
> +++ b/arch/arm/boot/dts/ls1021a.dtsi
> @@ -1014,7 +1014,7 @@
>  			compatible = "fsl,ls1021a-ftm-alarm";
>  			reg = <0x0 0x29d0000 0x0 0x10000>;
>  			reg-names = "ftm";
> -			fsl,rcpm-wakeup = <&rcpm 0x20000 0x0>;
> +			fsl,rcpm-wakeup = <&rcpm 0x0 0x20000000>;
>  			interrupts = <GIC_SPI 118 IRQ_TYPE_LEVEL_HIGH>;
>  			big-endian;
>  		};
> --
> 2.7.4


^ permalink raw reply

* RE: [PATCH 3/5] arm: dts: ls1021a: fix that FlexTimer cannot wakeup system in deep sleep
From: Leo Li @ 2020-09-21 22:58 UTC (permalink / raw)
  To: Ran Wang, Rob Herring, Shawn Guo
  Cc: devicetree@vger.kernel.org, Biwen Li,
	linux-kernel@vger.kernel.org, Ran Wang,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200916081831.24747-3-ran.wang_1@nxp.com>



> -----Original Message-----
> From: Ran Wang <ran.wang_1@nxp.com>
> Sent: Wednesday, September 16, 2020 3:18 AM
> To: Leo Li <leoyang.li@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> Shawn Guo <shawnguo@kernel.org>
> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-kernel@lists.infradead.org;
> devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Biwen Li
> <biwen.li@nxp.com>; Ran Wang <ran.wang_1@nxp.com>
> Subject: [PATCH 3/5] arm: dts: ls1021a: fix that FlexTimer cannot wakeup
> system in deep sleep

A better description should be enabling the A-008646 workaround to be consistent with other patches.

> 
> From: Biwen Li <biwen.li@nxp.com>
> 
> The patch fixes a bug that FlexTimer cannot wakeup system in deep sleep.
> 
> Signed-off-by: Biwen Li <biwen.li@nxp.com>
> Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> ---
>  arch/arm/boot/dts/ls1021a.dtsi | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
> index 827373e..089fe86 100644
> --- a/arch/arm/boot/dts/ls1021a.dtsi
> +++ b/arch/arm/boot/dts/ls1021a.dtsi
> @@ -1007,6 +1007,7 @@
>  			compatible = "fsl,ls1021a-rcpm", "fsl,qoriq-rcpm-
> 2.1+";
>  			reg = <0x0 0x1ee2140 0x0 0x8>;
>  			#fsl,rcpm-wakeup-cells = <2>;
> +			fsl,ippdexpcr1-alt-addr = <&scfg 0x51c>;
>  		};
> 
>  		ftm_alarm0: timer0@29d0000 {
> --
> 2.7.4


^ permalink raw reply

* RE: [PATCH 2/5] soc: fsl: handle RCPM errata A-008646 on SoC LS1021A
From: Leo Li @ 2020-09-21 22:42 UTC (permalink / raw)
  To: Ran Wang, Rob Herring, Shawn Guo
  Cc: devicetree@vger.kernel.org, Biwen Li,
	linux-kernel@vger.kernel.org, Ran Wang,
	linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <20200916081831.24747-2-ran.wang_1@nxp.com>



> -----Original Message-----
> From: Ran Wang <ran.wang_1@nxp.com>
> Sent: Wednesday, September 16, 2020 3:18 AM
> To: Leo Li <leoyang.li@nxp.com>; Rob Herring <robh+dt@kernel.org>;
> Shawn Guo <shawnguo@kernel.org>
> Cc: linuxppc-dev@lists.ozlabs.org; linux-arm-kernel@lists.infradead.org;
> devicetree@vger.kernel.org; linux-kernel@vger.kernel.org; Biwen Li
> <biwen.li@nxp.com>; Ran Wang <ran.wang_1@nxp.com>
> Subject: [PATCH 2/5] soc: fsl: handle RCPM errata A-008646 on SoC LS1021A
> 
> From: Biwen Li <biwen.li@nxp.com>
> 
> Description:
> 	- Reading configuration register RCPM_IPPDEXPCR1
> 	  always return zero
> 
> Workaround:
> 	- Save register RCPM_IPPDEXPCR1's value to
> 	  register SCFG_SPARECR8.(uboot's psci also
> 	  need reading value from the register SCFG_SPARECR8
> 	  to set register RCPM_IPPDEXPCR1)
> 
> Impact:
> 	- FlexTimer module will cannot wakeup system in
Will not..
Also it will be better to merge this with the issue description part above to prevent confusion.

> 	  deep sleep on SoC LS1021A
> 
> Signed-off-by: Biwen Li <biwen.li@nxp.com>
> Signed-off-by: Ran Wang <ran.wang_1@nxp.com>
> ---
>  drivers/soc/fsl/rcpm.c | 42
> +++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/soc/fsl/rcpm.c b/drivers/soc/fsl/rcpm.c
> index a093dbe..e6354f5 100644
> --- a/drivers/soc/fsl/rcpm.c
> +++ b/drivers/soc/fsl/rcpm.c
> @@ -2,7 +2,7 @@
>  //
>  // rcpm.c - Freescale QorIQ RCPM driver
>  //
> -// Copyright 2019 NXP
> +// Copyright 2019-2020 NXP
>  //
>  // Author: Ran Wang <ran.wang_1@nxp.com>
> 
> @@ -13,6 +13,9 @@
>  #include <linux/slab.h>
>  #include <linux/suspend.h>
>  #include <linux/kernel.h>
> +#include <linux/acpi.h>
> +#include <linux/mfd/syscon.h>
> +#include <linux/regmap.h>
> 
>  #define RCPM_WAKEUP_CELL_MAX_SIZE	7
> 
> @@ -37,6 +40,9 @@ static int rcpm_pm_prepare(struct device *dev)
>  	struct device_node	*np = dev->of_node;
>  	u32 value[RCPM_WAKEUP_CELL_MAX_SIZE + 1];
>  	u32 setting[RCPM_WAKEUP_CELL_MAX_SIZE] = {0};
> +	struct regmap *scfg_addr_regmap = NULL;
> +	u32 reg_offset[2];
> +	u32 reg_value = 0;
> 
>  	rcpm = dev_get_drvdata(dev);
>  	if (!rcpm)
> @@ -90,6 +96,40 @@ static int rcpm_pm_prepare(struct device *dev)
>  			tmp |= ioread32be(address);
>  			iowrite32be(tmp, address);
>  		}
> +		/*
> +		 * Workaround of errata A-008646 on SoC LS1021A:
> +		 * There is a bug of register ippdexpcr1.
> +		 * Reading configuration register RCPM_IPPDEXPCR1
> +		 * always return zero. So save ippdexpcr1's value
> +		 * to register SCFG_SPARECR8.And the value of
> +		 * ippdexpcr1 will be read from SCFG_SPARECR8.
> +		 */
> +		if (device_property_present(dev, "fsl,ippdexpcr1-alt-addr"))
> {
> +			if (dev_of_node(dev)) {
> +				scfg_addr_regmap =
> syscon_regmap_lookup_by_phandle(np,
> +
> 	   "fsl,ippdexpcr1-alt-addr");
> +			} else if (is_acpi_node(dev->fwnode)) {
> +				continue;
> +			}
> +
> +			if (scfg_addr_regmap && (i == 1)) {
> +				if (device_property_read_u32_array(dev,
> +				    "fsl,ippdexpcr1-alt-addr",
> +				    reg_offset,
> +				    2)) {

It is not necessary to read out the whole fsl,ippdexpcr1-alt-addr property if we only need the offset.  Also you can change to use the syscon_regmap_lookup_by_phandle_args() API above to simplify the code.

> +					scfg_addr_regmap = NULL;
> +					continue;
> +				}
> +				/* Read value from register SCFG_SPARECR8
> */
> +				regmap_read(scfg_addr_regmap,
> +					    reg_offset[1],
> +					    &reg_value);
> +				/* Write value to register SCFG_SPARECR8 */
> +				regmap_write(scfg_addr_regmap,
> +					     reg_offset[1],
> +					     tmp | reg_value);
> +			}
> +		}
>  	}
> 
>  	return 0;
> --
> 2.7.4


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox