* Re: [PATCH] powerpc/powermac: Fix low_sleep_handler with KUAP and KUEP
From: Michael Ellerman @ 2020-09-10 23:56 UTC (permalink / raw)
To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <74472ea2a7e8698273643cde7d382bd9f31cd1dd.1598945727.git.christophe.leroy@csgroup.eu>
Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> low_sleep_handler() has an hardcoded restore of segment registers
> that doesn't take KUAP and KUEP into account.
>
> Use head_32's load_segment_registers() routine instead.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> Fixes: a68c31fc01ef ("powerpc/32s: Implement Kernel Userspace Access Protection")
> Fixes: 31ed2b13c48d ("powerpc/32s: Implement Kernel Userspace Execution Prevention.")
> Cc: stable@vger.kernel.org
> ---
> arch/powerpc/platforms/powermac/sleep.S | 9 +--------
> 1 file changed, 1 insertion(+), 8 deletions(-)
Doesn't build? pmac32_defconfig, gcc 9.3.0:
ld: arch/powerpc/platforms/powermac/sleep.o: in function `core99_wake_up':
(.text+0x25c): undefined reference to `load_segment_registers'
Missing _GLOBAL() presumably?
cheers
> diff --git a/arch/powerpc/platforms/powermac/sleep.S b/arch/powerpc/platforms/powermac/sleep.S
> index f9a680fdd9c4..51bfdfe85058 100644
> --- a/arch/powerpc/platforms/powermac/sleep.S
> +++ b/arch/powerpc/platforms/powermac/sleep.S
> @@ -294,14 +294,7 @@ grackle_wake_up:
> * we do any r1 memory access as we are not sure they
> * are in a sane state above the first 256Mb region
> */
> - li r0,16 /* load up segment register values */
> - mtctr r0 /* for context 0 */
> - lis r3,0x2000 /* Ku = 1, VSID = 0 */
> - li r4,0
> -3: mtsrin r3,r4
> - addi r3,r3,0x111 /* increment VSID */
> - addis r4,r4,0x1000 /* address of next segment */
> - bdnz 3b
> + bl load_segment_registers
> sync
> isync
>
> --
> 2.25.0
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 23:21 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200910195749.795232d1@thinkpad>
On Thu, Sep 10, 2020 at 07:57:49PM +0200, Gerald Schaefer wrote:
> On Thu, 10 Sep 2020 10:02:33 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> > On Thu, Sep 10, 2020 at 11:39:25AM +0200, Alexander Gordeev wrote:
> >
> > > As Gerald mentioned, it is very difficult to explain in a clear way.
> > > Hopefully, one could make sense ot of it.
> >
> > I would say the page table API requires this invariant:
> >
> > pud = pud_offset(p4d, addr);
> > do {
> > WARN_ON(pud != pud_offset(p4d, addr);
> > next = pud_addr_end(addr, end);
> > } while (pud++, addr = next, addr != end);
> >
> > ie pud++ is supposed to be a shortcut for
> > pud_offset(p4d, next)
> >
>
> Hmm, IIUC, all architectures with static folding will simply return
> the passed-in p4d pointer for pud_offset(p4d, addr), for 3-level
> pagetables.
It is probably moot now, but since other arch's don't crash they also
return pud_addr_end() == end so the loop only does one iteration.
ie pud == pud_offset(p4d, addr) for all iterations as the pud++ never
happens.
Which is what this addr_end patch does for s390..
Jason
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: John Hubbard @ 2020-09-10 22:17 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Gerald Schaefer, Heiko Carstens,
Arnd Bergmann, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200910221116.GQ87483@ziepe.ca>
On 9/10/20 3:11 PM, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 02:22:37PM -0700, John Hubbard wrote:
>
>> Or am I way off here, and it really is possible (aside from the current
>> s390 situation) to observe something that "is no longer a page table"?
>
> Yes, that is the issue. Remember there is no locking for GUP
> fast. While a page table cannot be freed there is nothing preventing
> the page table entry from being concurrently modified.
>
OK, then we are saying the same thing after all, good.
> Without the stack variable it looks like this:
>
> pud_t pud = READ_ONCE(*pudp);
> if (!pud_present(pud))
> return
> pmd_offset(pudp, address);
>
> And pmd_offset() expands to
>
> return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
>
> Between the READ_ONCE(*pudp) and (*pud) inside pmd_offset() the value
> of *pud can change, eg to !pud_present.
>
> Then pud_page_vaddr(*pud) will crash. It is not use after free, it
> is using data that has not been validated.
>
Right, that matches what I had in mind, too: you can still have a problem
even though you're in the same page table. I just wanted to confirm that
there's not some odd way to launch out into completely non-page-table
memory.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 22:11 UTC (permalink / raw)
To: John Hubbard
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Gerald Schaefer, Heiko Carstens,
Arnd Bergmann, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <0c9bcb54-914b-e582-dd6d-3861267b6c94@nvidia.com>
On Thu, Sep 10, 2020 at 02:22:37PM -0700, John Hubbard wrote:
> Or am I way off here, and it really is possible (aside from the current
> s390 situation) to observe something that "is no longer a page table"?
Yes, that is the issue. Remember there is no locking for GUP
fast. While a page table cannot be freed there is nothing preventing
the page table entry from being concurrently modified.
Without the stack variable it looks like this:
pud_t pud = READ_ONCE(*pudp);
if (!pud_present(pud))
return
pmd_offset(pudp, address);
And pmd_offset() expands to
return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
Between the READ_ONCE(*pudp) and (*pud) inside pmd_offset() the value
of *pud can change, eg to !pud_present.
Then pud_page_vaddr(*pud) will crash. It is not use after free, it
is using data that has not been validated.
Jason
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 21:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Gerald Schaefer, Heiko Carstens,
Arnd Bergmann, John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm, LKML,
Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <CAHk-=wg3ggXU98Mnv-ss-hEcvUNc9vCtgSRc7GpcGfvyOw_h3g@mail.gmail.com>
On Thu, Sep 10, 2020 at 12:32:05PM -0700, Linus Torvalds wrote:
> Yeah, I get hung up on naming sometimes. I don't tend to care much
> about private local variables ("i" is a perfectly fine variable name),
> but these kinds of somewhat subtle cross-architecture definitions I
> feel matter.
One of the first replys to this patch was to ask "when would I use
_orig vs normal", so you are not alone. The name should convey it..
So, I suggest pXX_offset_unlocked()
Since it is safe to call without the page table lock, while pXX_offset()
requires the page table lock to be held as the internal *pXX is a data
race otherwise.
Patch 1 might be OK for a stable backport, but to get to a clear
pXX_offset_unlocked() all the arches would want to be changed to
implement that API and the generic code would provide the wrapper:
#define pXX_offset(pXXp, address) pXX_offset_unlocked(pXXp, *(pXXp), address)
Arches would not have a *pXX inside their code.
Then we can talk about auditing call sites of pXX_offset and think
about using the _unlocked version in places where the page table lock
is not held.
For instance mm/pagewalk.c should be changed. So should
huge_pte_offset() and probably other places. These places might
already be exsting data-race bugs.
It is code-as-documentation indicating an unlocked page table walk.
Now it is not just a S390 story but a change that makes the data
concurrency much clearer, so I think I prefer this version to the
addr_end one too.
Regards,
Jason
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: John Hubbard @ 2020-09-10 21:22 UTC (permalink / raw)
To: Jason Gunthorpe, Linus Torvalds
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Gerald Schaefer, Heiko Carstens,
Arnd Bergmann, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm, LKML,
Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <20200910181319.GO87483@ziepe.ca>
On 9/10/20 11:13 AM, Jason Gunthorpe wrote:
> On Thu, Sep 10, 2020 at 10:35:38AM -0700, Linus Torvalds wrote:
>> On Thu, Sep 10, 2020 at 2:40 AM Alexander Gordeev
>> <agordeev@linux.ibm.com> wrote:
>>>
>>> It is only gup_fast case that exposes the issue. It hits because
>>> pointers to stack copies are passed to gup_pXd_range iterators, not
>>> pointers to real page tables itself.
>>
>> Can we possibly change fast-gup to not do the stack copies?
>>
>> I'd actually rather do something like that, than the "addr_end" thing.
>
>> As you say, none of the other page table walking code does what the
>> GUP code does, and I don't think it's required.
>
> As I understand it, the requirement is because fast-gup walks without
> the page table spinlock, or mmap_sem held so it must READ_ONCE the
> *pXX.
>
> It then checks that it is a valid page table pointer, then calls
> pXX_offset().
>
> The arch implementation of pXX_offset() derefs again the passed pXX
> pointer. So it defeats the READ_ONCE and the 2nd load could observe
> something that is no longer a page table pointer and crash.
Just to be clear, though, that makes it sound a little wilder and
reckless than it really is, right?
Because actually, the page tables cannot be freed while gup_fast is
walking them, due to either IPI blocking during the walk, or the moral
equivalent (MMU_GATHER_RCU_TABLE_FREE) for non-IPI architectures. So the
pages tables can *change* underneath gup_fast, and for example pages can
be unmapped. But they remain valid page tables, it's just that their
contents are unstable. Even if pXd_none()==true.
Or am I way off here, and it really is possible (aside from the current
s390 situation) to observe something that "is no longer a page table"?
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply
* [PATCH 4/7] powerpc/xive: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
CC arch/powerpc/sysdev/xive/common.o
../arch/powerpc/sysdev/xive/common.c:1568:6: error: no previous prototype for ‘xive_debug_show_cpu’ [-Werror=missing-prototypes]
void xive_debug_show_cpu(struct seq_file *m, int cpu)
^~~~~~~~~~~~~~~~~~~
../arch/powerpc/sysdev/xive/common.c:1602:6: error: no previous prototype for ‘xive_debug_show_irq’ [-Werror=missing-prototypes]
void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d)
^~~~~~~~~~~~~~~~~~~
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/sysdev/xive/common.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index f591be9f01f4..a80440af491a 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1565,7 +1565,7 @@ static int __init xive_off(char *arg)
}
__setup("xive=off", xive_off);
-void xive_debug_show_cpu(struct seq_file *m, int cpu)
+static void xive_debug_show_cpu(struct seq_file *m, int cpu)
{
struct xive_cpu *xc = per_cpu(xive_cpu, cpu);
@@ -1599,7 +1599,7 @@ void xive_debug_show_cpu(struct seq_file *m, int cpu)
seq_puts(m, "\n");
}
-void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d)
+static void xive_debug_show_irq(struct seq_file *m, u32 hw_irq, struct irq_data *d)
{
struct irq_chip *chip = irq_data_get_irq_chip(d);
int rc;
--
2.25.4
^ permalink raw reply related
* [PATCH 7/7] powerpc/32: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Christophe Leroy, linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
CC arch/powerpc/kernel/traps.o
../arch/powerpc/kernel/traps.c:1663:6: error: no previous prototype for ‘stack_overflow_exception’ [-Werror=missing-prototypes]
void stack_overflow_exception(struct pt_regs *regs)
^~~~~~~~~~~~~~~~~~~~~~~~
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Fixes: 3978eb78517c ("powerpc/32: Add early stack overflow detection with VMAP stack.")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/include/asm/asm-prototypes.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index de14b1a34d56..4957119604c7 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -67,6 +67,7 @@ void single_step_exception(struct pt_regs *regs);
void program_check_exception(struct pt_regs *regs);
void alignment_exception(struct pt_regs *regs);
void StackOverflow(struct pt_regs *regs);
+void stack_overflow_exception(struct pt_regs *regs);
void kernel_fp_unavailable_exception(struct pt_regs *regs);
void altivec_unavailable_exception(struct pt_regs *regs);
void vsx_unavailable_exception(struct pt_regs *regs);
--
2.25.4
^ permalink raw reply related
* [PATCH 6/7] powerpc/perf: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Anju T Sudhakar, linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
CC arch/powerpc/perf/imc-pmu.o
../arch/powerpc/perf/imc-pmu.c: In function ‘trace_imc_event_init’:
../arch/powerpc/perf/imc-pmu.c:1429:22: error: variable ‘target’ set but not used [-Werror=unused-but-set-variable]
struct task_struct *target;
^~~~~~
Cc: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/perf/imc-pmu.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 62d0b54086f8..9ed4fcccf8a9 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1426,8 +1426,6 @@ static void trace_imc_event_del(struct perf_event *event, int flags)
static int trace_imc_event_init(struct perf_event *event)
{
- struct task_struct *target;
-
if (event->attr.type != event->pmu->type)
return -ENOENT;
@@ -1458,7 +1456,6 @@ static int trace_imc_event_init(struct perf_event *event)
mutex_unlock(&imc_global_refc.lock);
event->hw.idx = -1;
- target = event->hw.target;
event->pmu->task_ctx_nr = perf_hw_context;
event->destroy = reset_global_refc;
--
2.25.4
^ permalink raw reply related
* [PATCH 1/7] powerpc/sysfs: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Madhavan Srinivasan, linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
arch/powerpc/kernel/sysfs.c: In function ‘sysfs_create_dscr_default’:
arch/powerpc/kernel/sysfs.c:228:7: error: variable ‘err’ set but not used [-Werror=unused-but-set-variable]
int err = 0;
^~~
cc1: all warnings being treated as errors
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/kernel/sysfs.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 46b4ebc33db7..821a3dc4c924 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -225,14 +225,13 @@ static DEVICE_ATTR(dscr_default, 0600,
static void sysfs_create_dscr_default(void)
{
if (cpu_has_feature(CPU_FTR_DSCR)) {
- int err = 0;
int cpu;
dscr_default = spr_default_dscr;
for_each_possible_cpu(cpu)
paca_ptrs[cpu]->dscr_default = dscr_default;
- err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
+ device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
}
}
#endif /* CONFIG_PPC64 */
--
2.25.4
^ permalink raw reply related
* [PATCH 0/7] powerpc: Fix a few W=1 compile warnings
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev, Cédric Le Goater
Hello,
Here is a small contribution on improving compile with W=1.
Thanks,
C.
Cédric Le Goater (7):
powerpc/sysfs: Fix W=1 compile warning
powerpc/prom: Fix W=1 compile warning
powerpc/sstep: Fix W=1 compile warning
powerpc/xive: Fix W=1 compile warning
powerpc/powernv/pci: Fix W=1 compile warning
powerpc/perf: Fix W=1 compile warning
powerpc/32: Fix W=1 compile warning
arch/powerpc/include/asm/asm-prototypes.h | 1 +
arch/powerpc/kernel/prom.c | 51 ++++++++++++-----------
arch/powerpc/kernel/sysfs.c | 3 +-
arch/powerpc/lib/sstep.c | 3 +-
arch/powerpc/perf/imc-pmu.c | 3 --
arch/powerpc/platforms/powernv/pci-ioda.c | 8 ----
arch/powerpc/sysdev/xive/common.c | 4 +-
7 files changed, 32 insertions(+), 41 deletions(-)
--
2.25.4
^ permalink raw reply
* [PATCH 2/7] powerpc/prom: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Christophe Leroy, linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
arch/powerpc/kernel/prom.c: In function ‘early_reserve_mem’:
arch/powerpc/kernel/prom.c:625:10: error: variable ‘reserve_map’ set but not used [-Werror=unused-but-set-variable]
__be64 *reserve_map;
^~~~~~~~~~~
cc1: all warnings being treated as errors
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/kernel/prom.c | 51 +++++++++++++++++++-------------------
1 file changed, 26 insertions(+), 25 deletions(-)
diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
index d8a2fb87ba0c..4bae9ebc7d0b 100644
--- a/arch/powerpc/kernel/prom.c
+++ b/arch/powerpc/kernel/prom.c
@@ -622,11 +622,6 @@ static void __init early_reserve_mem_dt(void)
static void __init early_reserve_mem(void)
{
- __be64 *reserve_map;
-
- reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
- fdt_off_mem_rsvmap(initial_boot_params));
-
/* Look for the new "reserved-regions" property in the DT */
early_reserve_mem_dt();
@@ -639,28 +634,34 @@ static void __init early_reserve_mem(void)
}
#endif /* CONFIG_BLK_DEV_INITRD */
-#ifdef CONFIG_PPC32
- /*
- * Handle the case where we might be booting from an old kexec
- * image that setup the mem_rsvmap as pairs of 32-bit values
- */
- if (be64_to_cpup(reserve_map) > 0xffffffffull) {
- u32 base_32, size_32;
- __be32 *reserve_map_32 = (__be32 *)reserve_map;
-
- DBG("Found old 32-bit reserve map\n");
-
- while (1) {
- base_32 = be32_to_cpup(reserve_map_32++);
- size_32 = be32_to_cpup(reserve_map_32++);
- if (size_32 == 0)
- break;
- DBG("reserving: %x -> %x\n", base_32, size_32);
- memblock_reserve(base_32, size_32);
+ if (IS_ENABLED(CONFIG_PPC32)) {
+ __be64 *reserve_map;
+
+ reserve_map = (__be64 *)(((unsigned long)initial_boot_params) +
+ fdt_off_mem_rsvmap(initial_boot_params));
+
+ /*
+ * Handle the case where we might be booting from an
+ * old kexec image that setup the mem_rsvmap as pairs
+ * of 32-bit values
+ */
+ if (be64_to_cpup(reserve_map) > 0xffffffffull) {
+ u32 base_32, size_32;
+ __be32 *reserve_map_32 = (__be32 *)reserve_map;
+
+ DBG("Found old 32-bit reserve map\n");
+
+ while (1) {
+ base_32 = be32_to_cpup(reserve_map_32++);
+ size_32 = be32_to_cpup(reserve_map_32++);
+ if (size_32 == 0)
+ break;
+ DBG("reserving: %x -> %x\n", base_32, size_32);
+ memblock_reserve(base_32, size_32);
+ }
+ return;
}
- return;
}
-#endif
}
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
--
2.25.4
^ permalink raw reply related
* [PATCH 5/7] powerpc/powernv/pci: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman
Cc: Oliver O'Halloran, linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
CC arch/powerpc/platforms/powernv/pci-ioda.o
../arch/powerpc/platforms/powernv/pci-ioda.c: In function ‘pnv_ioda_configure_pe’:
../arch/powerpc/platforms/powernv/pci-ioda.c:897:18: error: variable ‘parent’ set but not used [-Werror=unused-but-set-variable]
struct pci_dev *parent;
^~~~~~
Cc: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/platforms/powernv/pci-ioda.c | 8 --------
1 file changed, 8 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 023a4f987bb2..2b4ceb5e6ce4 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -894,7 +894,6 @@ int pnv_ioda_deconfigure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
{
- struct pci_dev *parent;
uint8_t bcomp, dcomp, fcomp;
long rc, rid_end, rid;
@@ -904,7 +903,6 @@ int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
dcomp = OPAL_IGNORE_RID_DEVICE_NUMBER;
fcomp = OPAL_IGNORE_RID_FUNCTION_NUMBER;
- parent = pe->pbus->self;
if (pe->flags & PNV_IODA_PE_BUS_ALL)
count = resource_size(&pe->pbus->busn_res);
else
@@ -925,12 +923,6 @@ int pnv_ioda_configure_pe(struct pnv_phb *phb, struct pnv_ioda_pe *pe)
}
rid_end = pe->rid + (count << 8);
} else {
-#ifdef CONFIG_PCI_IOV
- if (pe->flags & PNV_IODA_PE_VF)
- parent = pe->parent_dev;
- else
-#endif /* CONFIG_PCI_IOV */
- parent = pe->pdev->bus->self;
bcomp = OpalPciBusAll;
dcomp = OPAL_COMPARE_RID_DEVICE_NUMBER;
fcomp = OPAL_COMPARE_RID_FUNCTION_NUMBER;
--
2.25.4
^ permalink raw reply related
* [PATCH 3/7] powerpc/sstep: Fix W=1 compile warning
From: Cédric Le Goater @ 2020-09-10 21:02 UTC (permalink / raw)
To: Michael Ellerman; +Cc: Jordan Niethe, linuxppc-dev, Cédric Le Goater
In-Reply-To: <20200910210250.1962595-1-clg@kaod.org>
../arch/powerpc/lib/sstep.c: In function ‘mlsd_8lsd_ea’:
../arch/powerpc/lib/sstep.c:225:3: error: suggest braces around empty body in an ‘if’ statement [-Werror=empty-body]
; /* Invalid form. Should already be checked for by caller! */
^
Cc: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
arch/powerpc/lib/sstep.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index caee8cc77e19..14572af16e55 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -221,8 +221,9 @@ static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
; /* Leave ea as is */
else if (prefix_r && !ra)
ea += regs->nip;
- else if (prefix_r && ra)
+ else if (prefix_r && ra) {
; /* Invalid form. Should already be checked for by caller! */
+ }
return ea;
}
--
2.25.4
^ permalink raw reply related
* Re: [PATCH] kbuild: preprocess module linker script
From: Kees Cook @ 2020-09-10 19:45 UTC (permalink / raw)
To: Masahiro Yamada
Cc: linux-ia64, Peter Zijlstra, Paul Mackerras, linux-riscv,
Will Deacon, Ard Biesheuvel, Anton Ivanov, linux-arch,
Mauro Carvalho Chehab, Richard Weinberger, Russell King,
Ingo Molnar, Geert Uytterhoeven, Catalin Marinas, Fenghua Yu,
Albert Ou, Arnd Bergmann, linux-kbuild, Jeff Dike, linux-um,
linux-m68k, Michal Marek, Paul Walmsley, linux-arm-kernel,
Tony Luck, linux-kernel, Palmer Dabbelt, Jessica Yu, linuxppc-dev
In-Reply-To: <20200904133122.133071-1-masahiroy@kernel.org>
On Fri, Sep 04, 2020 at 10:31:21PM +0900, Masahiro Yamada wrote:
> There was a request to preprocess the module linker script like we do
> for the vmlinux one (https://lkml.org/lkml/2020/8/21/512).
>
> The difference between vmlinux.lds and module.lds is that the latter
> is needed for external module builds, thus must be cleaned up by
> 'make mrproper' instead of 'make clean' (also, it must be created by
> 'make modules_prepare').
>
> You cannot put it in arch/*/kernel/ because 'make clean' descends into
> it. I moved arch/*/kernel/module.lds to arch/*/include/asm/module.lds.h,
> which is included from scripts/module.lds.S.
>
> scripts/module.lds is fine because 'make clean' keeps all the build
> artifacts under scripts/.
>
> You can add arch-specific sections in <asm/module.lds.h>.
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
--
Kees Cook
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Linus Torvalds @ 2020-09-10 19:32 UTC (permalink / raw)
To: Gerald Schaefer
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Christian Borntraeger,
Richard Weinberger, linux-x86, Russell King, Jason Gunthorpe,
Ingo Molnar, Catalin Marinas, Andrey Ryabinin, Heiko Carstens,
Arnd Bergmann, John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm, LKML,
Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <20200910211010.46d064a7@thinkpad>
On Thu, Sep 10, 2020 at 12:11 PM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> That sounds a lot like the pXd_offset_orig() from Martins first approach
> in this thread:
> https://lore.kernel.org/linuxppc-dev/20190418100218.0a4afd51@mschwideX1/
I have to admit to finding that name horrible, but aside from that, yes.
I don't think "pXd_offset_orig()" makes any sense as a name. Yes,
"orig" may make sense as the variable name (as in "this was the
original value we read"), but a function name should describe what it
*does*, not what the arguments are.
Plus "original" doesn't make sense to me anyway, since we're not
modifying it. To me, "original" means that there's a final version
too, which this interface in no way implies. It's just "this is the
value we already read".
("orig" does make some sense in that fault path - because by
definition we *are* going to modify the page table entry, that's the
whole point of the fault - we need to do something to not keep
faulting. But here, we're not at all necessarily modifying the page
table contents, we're just following them and readign the values once)
Of course, I don't know what a better name would be to describe what
is actually going on, I'm just explaining why I hate that naming.
*Maybe* something like just "pXd_offset_value()" together with a
comment explaining that it's given the upper pXd pointer _and_ the
value behind it, and it needs to return the next level offset? I
dunno. "value" doesn't really seem horribly descriptive either, but at
least it doesn't feel actively misleading to me.
Yeah, I get hung up on naming sometimes. I don't tend to care much
about private local variables ("i" is a perfectly fine variable name),
but these kinds of somewhat subtle cross-architecture definitions I
feel matter.
Linus
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Gerald Schaefer @ 2020-09-10 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Christian Borntraeger,
Richard Weinberger, linux-x86, Russell King, Jason Gunthorpe,
Ingo Molnar, Catalin Marinas, Andrey Ryabinin, Heiko Carstens,
Arnd Bergmann, John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm, LKML,
Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <CAHk-=wh3SjOE2r4WCfagL5Zq4Oj4Jsu1=1jTTi2GxGDTxP-J0Q@mail.gmail.com>
On Thu, 10 Sep 2020 11:33:17 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Sep 10, 2020 at 11:13 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > So.. To change away from the stack option I think we'd have to pass
> > the READ_ONCE value to pXX_offset() as an extra argument instead of it
> > derefing the pointer internally.
>
> Yeah, but I think that would actually be the better model than passing
> an address to a random stack location.
>
> It's also effectively what we do in some other places, eg the whole
> logic with "orig" in the regular pte fault handling is basically doing
> unlocked loads of the pte, various decisions on that, and then doing a
> final "is this still the same pte" after it has gotten the page table
> lock.
That sounds a lot like the pXd_offset_orig() from Martins first approach
in this thread:
https://lore.kernel.org/linuxppc-dev/20190418100218.0a4afd51@mschwideX1/
It is also the "Patch 1" option from the start of this thread:
https://lore.kernel.org/lkml/20200828140314.8556-1-gerald.schaefer@linux.ibm.com/
I guess I chose wrongly there, should have had more trust in Martins
approach, and not try so hard to do it like others...
So, maybe we can start over again, from that patch option. It would of
course also initially introduce some gup-specific helpers, like with
the other approach. It seemed harder to generalize when I thought
about it back then, but I guess it should not be a lot harder than
the _addr_end stuff.
Or, maybe this time, just not to risk Christian getting a heart attack,
we could go for the gup-specific helper first, so that we would at
least have a fix for the possible s390 data corruption. Jason, would
you agree that we send a new RFC, this time with pXd_offset_orig()
approach, and have that accepted as short-term fix?
Or would you rather also wait for some proper generic change? Have
lost that option from my radar, so cannot really judge how much more
effort it would be. I'm on vacation next week anyway, but Alexander
or Vasily (who did the option 1 patch) could look into this further.
^ permalink raw reply
* Re: remove the last set_fs() in common code, and remove it for x86 and powerpc v3
From: Segher Boessenkool @ 2020-09-10 17:15 UTC (permalink / raw)
To: David Laight
Cc: linux-arch, Kees Cook, linuxppc-dev, the arch/x86 maintainers,
Nick Desaulniers, Linux Kernel Mailing List, Christoph Hellwig,
Luis Chamberlain, Al Viro, linux-fsdevel,
'Linus Torvalds', Alexey Dobriyan
In-Reply-To: <18fdbaeacba349a0a8bf7568f709e991@AcuMS.aculab.com>
On Thu, Sep 10, 2020 at 03:31:53PM +0000, David Laight wrote:
> > > asm volatile ("" : "+r" (eax));
> > > // So here eax must contain the value set by the "xxxxx" instructions.
> >
> > No, the register eax will contain the value of the eax variable. In the
> > asm; it might well be there before or after the asm as well, but none of
> > that is guaranteed.
>
> Perhaps not 'guaranteed', but very unlikely to be wrong.
> It doesn't give gcc much scope for not generating the desired code.
Wanna bet? :-)
Correct is correct. Anything else is not.
Segher
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Linus Torvalds @ 2020-09-10 18:33 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Gerald Schaefer, Heiko Carstens,
Arnd Bergmann, John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm, LKML,
Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <20200910181319.GO87483@ziepe.ca>
On Thu, Sep 10, 2020 at 11:13 AM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> So.. To change away from the stack option I think we'd have to pass
> the READ_ONCE value to pXX_offset() as an extra argument instead of it
> derefing the pointer internally.
Yeah, but I think that would actually be the better model than passing
an address to a random stack location.
It's also effectively what we do in some other places, eg the whole
logic with "orig" in the regular pte fault handling is basically doing
unlocked loads of the pte, various decisions on that, and then doing a
final "is this still the same pte" after it has gotten the page table
lock.
(And yes, those other pte fault handling cases are different, since
they _do_ hold the mmap lock, so they know the page *tables* are
stable, and it's only the last level that then gets re-checked against
the pte once the pte itself has also been stabilized with the page
table lock).
So I think it would actually be a better conceptual match to make the
page table walking interface be "here, this is the value I read once
carefully, and this is the address, now give me the next address".
The folded case would then just return the address it was given, and
the non-folded case would return the inner page table based on the
value.
I dunno. I don't actually feel all that strongly about this, so
whatever works, I guess.
Linus
^ permalink raw reply
* Re: [PATCH v3] pseries/drmem: don't cache node id in drmem_lmb struct
From: Scott Cheloha @ 2020-09-10 18:28 UTC (permalink / raw)
To: Laurent Dufour
Cc: Nathan Lynch, David Hildenbrand, linuxppc-dev, Michal Suchanek,
Rick Lindsley
In-Reply-To: <2704bebf-8c7b-7912-5e03-ddcc57acf8d1@linux.vnet.ibm.com>
On Fri, Aug 21, 2020 at 10:33:10AM +0200, Laurent Dufour wrote:
> Le 11/08/2020 à 03:51, Scott Cheloha a écrit :
> >
> > [...]
> >
> > @@ -631,7 +638,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
> > static int dlpar_add_lmb(struct drmem_lmb *lmb)
> > {
> > unsigned long block_sz;
> > - int rc;
> > + int nid, rc;
> >
> > if (lmb->flags & DRCONF_MEM_ASSIGNED)
> > return -EINVAL;
> > @@ -642,11 +649,13 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
> > return rc;
> > }
> >
> > - lmb_set_nid(lmb);
> > block_sz = memory_block_size_bytes();
> >
> > + /* Find the node id for this address. */
> > + nid = memory_add_physaddr_to_nid(lmb->base_addr);
>
> I think we could be more efficient here.
> Here is the call stack behind memory_add_physaddr_to_nid():
>
> memory_add_physaddr_to_nid(lmb->base_addr)
> hot_add_scn_to_nid()
> if (of_find_node_by_path("/ibm,dynamic-reconfiguration-memory")) == true*
> then
> hot_add_drconf_scn_to_nid()
> for_each_drmem_lmb() to find the LMB based on lmb->base_addr
> of_drconf_to_nid_single(found LMB)
> use lmb->aa_index to get the nid.
>
> * that test is necessarily true when called from dlpar_add_lmb()
> otherwise the call to update_lmb_associativity_index() would have
> failed earlier.
>
> Basically, we have a LMB and we later walk all the LMBs to find that lmb
> again. In the case of dlpar_add_lmb(), it would be more efficient to
> directly call of_drconf_to_nid_single(). That function is not exported
> from arch/powerpc/mm/numa.c but it may be good to export it through that
> patch.
I've posted a patch for this:
https://lore.kernel.org/linuxppc-dev/20200910175637.2865160-1-cheloha@linux.ibm.com/T/#u
The speedup is nice, especially for LMBs at the end of the array.
^ permalink raw reply
* Re: [PATCHv7 12/12] misc: pci_endpoint_test: Add driver data for Layerscape PCIe controllers
From: Rob Herring @ 2020-09-10 18:17 UTC (permalink / raw)
To: Zhiqiang Hou
Cc: roy.zang, lorenzo.pieralisi, jingoohan1, linuxppc-dev,
linux-kernel, leoyang.li, kishon, minghuan.Lian, devicetree,
robh+dt, mingkai.hu, linux-pci, bhelgaas, andrew.murray,
gustavo.pimentel, shawnguo, linux-arm-kernel
In-Reply-To: <20200811095441.7636-13-Zhiqiang.Hou@nxp.com>
On Tue, 11 Aug 2020 17:54:41 +0800, Zhiqiang Hou wrote:
> From: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
>
> The commit 0a121f9bc3f5 ("misc: pci_endpoint_test: Use streaming DMA
> APIs for buffer allocation") changed to use streaming DMA APIs, however,
> dma_map_single() might not return a 4KB aligned address, so add the
> default_data as driver data for Layerscape PCIe controllers to make it
> 4KB aligned.
>
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
> V7:
> - New patch.
>
> drivers/misc/pci_endpoint_test.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
Acked-by: Rob Herring <robh@kernel.org>
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Jason Gunthorpe @ 2020-09-10 18:13 UTC (permalink / raw)
To: Linus Torvalds
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Gerald Schaefer, Heiko Carstens,
Arnd Bergmann, John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm, LKML,
Andrew Morton, linux-power, Mike Rapoport
In-Reply-To: <CAHk-=wh4SuNvThq1nBiqk0N-fW6NsY5w=VawC=rJs7ekmjAhjA@mail.gmail.com>
On Thu, Sep 10, 2020 at 10:35:38AM -0700, Linus Torvalds wrote:
> On Thu, Sep 10, 2020 at 2:40 AM Alexander Gordeev
> <agordeev@linux.ibm.com> wrote:
> >
> > It is only gup_fast case that exposes the issue. It hits because
> > pointers to stack copies are passed to gup_pXd_range iterators, not
> > pointers to real page tables itself.
>
> Can we possibly change fast-gup to not do the stack copies?
>
> I'd actually rather do something like that, than the "addr_end" thing.
> As you say, none of the other page table walking code does what the
> GUP code does, and I don't think it's required.
As I understand it, the requirement is because fast-gup walks without
the page table spinlock, or mmap_sem held so it must READ_ONCE the
*pXX.
It then checks that it is a valid page table pointer, then calls
pXX_offset().
The arch implementation of pXX_offset() derefs again the passed pXX
pointer. So it defeats the READ_ONCE and the 2nd load could observe
something that is no longer a page table pointer and crash.
Passing it the address of the stack value is a way to force
pXX_offset() to use the READ_ONCE result which has already been tested
to be a page table pointer.
Other page walking code that holds the mmap_sem tends to use
pmd_trans_unstable() which solves this problem by injecting a
barrier. The load hidden in pte_offset() after a pmd_trans_unstable()
can't be re-ordered and will only see a page table entry under the
mmap_sem.
However, I think that logic would have been much clearer following the
GUP model of READ_ONCE vs extra reads and a hidden barrier. At least
it took me a long time to work it out :(
I also think there are real bugs here where places are reading *pXX
multiple times without locking the page table. One was found recently
in the wild in the huge tlb code IIRC.
The mm/pagewalk.c has these missing READ_ONCE bugs too.
So.. To change away from the stack option I think we'd have to pass
the READ_ONCE value to pXX_offset() as an extra argument instead of it
derefing the pointer internally.
Jason
^ permalink raw reply
* Re: [PATCHv7 04/12] PCI: designware-ep: Modify MSI and MSIX CAP way of finding
From: Rob Herring @ 2020-09-10 18:10 UTC (permalink / raw)
To: Zhiqiang Hou
Cc: devicetree, lorenzo.pieralisi, Xiaowei Bao, roy.zang, linux-pci,
linux-kernel, leoyang.li, minghuan.Lian, jingoohan1,
andrew.murray, mingkai.hu, gustavo.pimentel, bhelgaas, shawnguo,
kishon, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200811095441.7636-5-Zhiqiang.Hou@nxp.com>
On Tue, Aug 11, 2020 at 05:54:33PM +0800, Zhiqiang Hou wrote:
> From: Xiaowei Bao <xiaowei.bao@nxp.com>
>
> Each PF of EP device should have its own MSI or MSIX capabitily
> struct, so create a dw_pcie_ep_func struct and move the msi_cap
> and msix_cap to this struct from dw_pcie_ep, and manage the PFs
> via a list.
>
> Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
> V7:
> - Rebase the patch without functionality change.
>
> .../pci/controller/dwc/pcie-designware-ep.c | 139 +++++++++++++++---
> drivers/pci/controller/dwc/pcie-designware.h | 18 ++-
> 2 files changed, 136 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index 56bd1cd71f16..4680a51c49c0 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -28,6 +28,19 @@ void dw_pcie_ep_init_notify(struct dw_pcie_ep *ep)
> }
> EXPORT_SYMBOL_GPL(dw_pcie_ep_init_notify);
>
> +struct dw_pcie_ep_func *
> +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no)
> +{
> + struct dw_pcie_ep_func *ep_func;
> +
> + list_for_each_entry(ep_func, &ep->func_list, list) {
> + if (ep_func->func_no == func_no)
> + return ep_func;
> + }
> +
> + return NULL;
> +}
> +
> static unsigned int dw_pcie_ep_func_select(struct dw_pcie_ep *ep, u8 func_no)
> {
> unsigned int func_offset = 0;
> @@ -68,6 +81,47 @@ void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
> __dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
> }
>
> +static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie_ep *ep, u8 func_no,
> + u8 cap_ptr, u8 cap)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + unsigned int func_offset = 0;
> + u8 cap_id, next_cap_ptr;
> + u16 reg;
> +
> + if (!cap_ptr)
> + return 0;
> +
> + func_offset = dw_pcie_ep_func_select(ep, func_no);
> +
> + reg = dw_pcie_readw_dbi(pci, func_offset + cap_ptr);
> + cap_id = (reg & 0x00ff);
> +
> + if (cap_id > PCI_CAP_ID_MAX)
> + return 0;
> +
> + if (cap_id == cap)
> + return cap_ptr;
> +
> + next_cap_ptr = (reg & 0xff00) >> 8;
> + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap);
> +}
> +
> +static u8 dw_pcie_ep_find_capability(struct dw_pcie_ep *ep, u8 func_no, u8 cap)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + unsigned int func_offset = 0;
> + u8 next_cap_ptr;
> + u16 reg;
> +
> + func_offset = dw_pcie_ep_func_select(ep, func_no);
> +
> + reg = dw_pcie_readw_dbi(pci, func_offset + PCI_CAPABILITY_LIST);
> + next_cap_ptr = (reg & 0x00ff);
> +
> + return __dw_pcie_ep_find_next_cap(ep, func_no, next_cap_ptr, cap);
> +}
These are almost the same as __dw_pcie_find_next_cap and
dw_pcie_find_capability. Please modify them to take a function offset
and work for both host and endpoints.
> +
> static int dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,
> struct pci_epf_header *hdr)
> {
> @@ -257,13 +311,18 @@ static int dw_pcie_ep_get_msi(struct pci_epc *epc, u8 func_no)
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> u32 val, reg;
> unsigned int func_offset = 0;
> + struct dw_pcie_ep_func *ep_func;
>
> - if (!ep->msi_cap)
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
> +
> + if (!ep_func->msi_cap)
> return -EINVAL;
>
> func_offset = dw_pcie_ep_func_select(ep, func_no);
>
> - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
> val = dw_pcie_readw_dbi(pci, reg);
> if (!(val & PCI_MSI_FLAGS_ENABLE))
> return -EINVAL;
> @@ -279,13 +338,18 @@ static int dw_pcie_ep_set_msi(struct pci_epc *epc, u8 func_no, u8 interrupts)
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> u32 val, reg;
> unsigned int func_offset = 0;
> + struct dw_pcie_ep_func *ep_func;
> +
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
>
> - if (!ep->msi_cap)
> + if (!ep_func->msi_cap)
> return -EINVAL;
>
> func_offset = dw_pcie_ep_func_select(ep, func_no);
>
> - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
If msi_cap is now per function, then shouldn't it already include
'func_offset'?
> val = dw_pcie_readw_dbi(pci, reg);
> val &= ~PCI_MSI_FLAGS_QMASK;
> val |= (interrupts << 1) & PCI_MSI_FLAGS_QMASK;
> @@ -302,13 +366,18 @@ static int dw_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no)
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> u32 val, reg;
> unsigned int func_offset = 0;
> + struct dw_pcie_ep_func *ep_func;
> +
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
>
> - if (!ep->msix_cap)
> + if (!ep_func->msix_cap)
> return -EINVAL;
>
> func_offset = dw_pcie_ep_func_select(ep, func_no);
>
> - reg = ep->msix_cap + func_offset + PCI_MSIX_FLAGS;
> + reg = ep_func->msix_cap + func_offset + PCI_MSIX_FLAGS;
> val = dw_pcie_readw_dbi(pci, reg);
> if (!(val & PCI_MSIX_FLAGS_ENABLE))
> return -EINVAL;
> @@ -325,25 +394,30 @@ static int dw_pcie_ep_set_msix(struct pci_epc *epc, u8 func_no, u16 interrupts,
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> u32 val, reg;
> unsigned int func_offset = 0;
> + struct dw_pcie_ep_func *ep_func;
>
> - if (!ep->msix_cap)
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
> +
> + if (!ep_func->msix_cap)
> return -EINVAL;
>
> dw_pcie_dbi_ro_wr_en(pci);
>
> func_offset = dw_pcie_ep_func_select(ep, func_no);
>
> - reg = ep->msix_cap + func_offset + PCI_MSIX_FLAGS;
> + reg = ep_func->msix_cap + func_offset + PCI_MSIX_FLAGS;
> val = dw_pcie_readw_dbi(pci, reg);
> val &= ~PCI_MSIX_FLAGS_QSIZE;
> val |= interrupts;
> dw_pcie_writew_dbi(pci, reg, val);
>
> - reg = ep->msix_cap + func_offset + PCI_MSIX_TABLE;
> + reg = ep_func->msix_cap + func_offset + PCI_MSIX_TABLE;
> val = offset | bir;
> dw_pcie_writel_dbi(pci, reg, val);
>
> - reg = ep->msix_cap + func_offset + PCI_MSIX_PBA;
> + reg = ep_func->msix_cap + func_offset + PCI_MSIX_PBA;
> val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
> dw_pcie_writel_dbi(pci, reg, val);
>
> @@ -426,6 +500,7 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
> u8 interrupt_num)
> {
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct dw_pcie_ep_func *ep_func;
> struct pci_epc *epc = ep->epc;
> unsigned int aligned_offset;
> unsigned int func_offset = 0;
> @@ -435,25 +510,29 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
> bool has_upper;
> int ret;
>
> - if (!ep->msi_cap)
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
> +
> + if (!ep_func->msi_cap)
> return -EINVAL;
>
> func_offset = dw_pcie_ep_func_select(ep, func_no);
>
> /* Raise MSI per the PCI Local Bus Specification Revision 3.0, 6.8.1. */
> - reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_FLAGS;
> msg_ctrl = dw_pcie_readw_dbi(pci, reg);
> has_upper = !!(msg_ctrl & PCI_MSI_FLAGS_64BIT);
> - reg = ep->msi_cap + func_offset + PCI_MSI_ADDRESS_LO;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_ADDRESS_LO;
> msg_addr_lower = dw_pcie_readl_dbi(pci, reg);
> if (has_upper) {
> - reg = ep->msi_cap + func_offset + PCI_MSI_ADDRESS_HI;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_ADDRESS_HI;
> msg_addr_upper = dw_pcie_readl_dbi(pci, reg);
> - reg = ep->msi_cap + func_offset + PCI_MSI_DATA_64;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_DATA_64;
> msg_data = dw_pcie_readw_dbi(pci, reg);
> } else {
> msg_addr_upper = 0;
> - reg = ep->msi_cap + func_offset + PCI_MSI_DATA_32;
> + reg = ep_func->msi_cap + func_offset + PCI_MSI_DATA_32;
> msg_data = dw_pcie_readw_dbi(pci, reg);
> }
> aligned_offset = msg_addr_lower & (epc->mem->window.page_size - 1);
> @@ -489,6 +568,7 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> u16 interrupt_num)
> {
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + struct dw_pcie_ep_func *ep_func;
> struct pci_epf_msix_tbl *msix_tbl;
> struct pci_epc *epc = ep->epc;
> unsigned int func_offset = 0;
> @@ -499,9 +579,16 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> int ret;
> u8 bir;
>
> + ep_func = dw_pcie_ep_get_func_from_ep(ep, func_no);
> + if (!ep_func)
> + return -EINVAL;
> +
> + if (!ep_func->msix_cap)
> + return -EINVAL;
> +
> func_offset = dw_pcie_ep_func_select(ep, func_no);
>
> - reg = ep->msix_cap + func_offset + PCI_MSIX_TABLE;
> + reg = ep_func->msix_cap + func_offset + PCI_MSIX_TABLE;
> tbl_offset = dw_pcie_readl_dbi(pci, reg);
> bir = (tbl_offset & PCI_MSIX_TABLE_BIR);
> tbl_offset &= PCI_MSIX_TABLE_OFFSET;
> @@ -596,11 +683,15 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> {
> int ret;
> void *addr;
> + u8 func_no;
> struct pci_epc *epc;
> struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> struct device *dev = pci->dev;
> struct device_node *np = dev->of_node;
> const struct pci_epc_features *epc_features;
> + struct dw_pcie_ep_func *ep_func;
> +
> + INIT_LIST_HEAD(&ep->func_list);
>
> if (!pci->dbi_base || !pci->dbi_base2) {
> dev_err(dev, "dbi_base/dbi_base2 is not populated\n");
> @@ -660,9 +751,19 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> if (ret < 0)
> epc->max_functions = 1;
>
> - ep->msi_cap = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> + for (func_no = 0; func_no < epc->max_functions; func_no++) {
> + ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL);
> + if (!ep_func)
> + return -ENOMEM;
>
> - ep->msix_cap = dw_pcie_find_capability(pci, PCI_CAP_ID_MSIX);
> + ep_func->func_no = func_no;
> + ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no,
> + PCI_CAP_ID_MSI);
> + ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no,
> + PCI_CAP_ID_MSIX);
> +
> + list_add_tail(&ep_func->list, &ep->func_list);
> + }
>
> if (ep->ops->ep_init)
> ep->ops->ep_init(ep);
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index 745b4938225a..19c4ba486239 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -230,8 +230,16 @@ struct dw_pcie_ep_ops {
> unsigned int (*func_conf_select)(struct dw_pcie_ep *ep, u8 func_no);
> };
>
> +struct dw_pcie_ep_func {
> + struct list_head list;
> + u8 func_no;
> + u8 msi_cap; /* MSI capability offset */
> + u8 msix_cap; /* MSI-X capability offset */
> +};
> +
> struct dw_pcie_ep {
> struct pci_epc *epc;
> + struct list_head func_list;
> const struct dw_pcie_ep_ops *ops;
> phys_addr_t phys_base;
> size_t addr_size;
> @@ -244,8 +252,6 @@ struct dw_pcie_ep {
> u32 num_ob_windows;
> void __iomem *msi_mem;
> phys_addr_t msi_mem_phys;
> - u8 msi_cap; /* MSI capability offset */
> - u8 msix_cap; /* MSI-X capability offset */
> struct pci_epf_bar *epf_bar[PCI_STD_NUM_BARS];
> };
>
> @@ -440,6 +446,8 @@ int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
> u16 interrupt_num);
> void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
> +struct dw_pcie_ep_func *
> +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no);
> #else
> static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
> {
> @@ -490,5 +498,11 @@ static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep,
> static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
> {
> }
> +
> +static inline struct dw_pcie_ep_func *
> +dw_pcie_ep_get_func_from_ep(struct dw_pcie_ep *ep, u8 func_no)
> +{
> + return NULL;
> +}
> #endif
> #endif /* _PCIE_DESIGNWARE_H */
> --
> 2.17.1
>
^ permalink raw reply
* Re: [RFC PATCH v2 1/3] mm/gup: fix gup_fast with dynamic page table folding
From: Gerald Schaefer @ 2020-09-10 17:57 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Peter Zijlstra, Dave Hansen, Dave Hansen, Paul Mackerras,
linux-sparc, Alexander Gordeev, Claudio Imbrenda, Will Deacon,
linux-arch, linux-s390, Vasily Gorbik, Richard Weinberger,
linux-x86, Russell King, Christian Borntraeger, Ingo Molnar,
Catalin Marinas, Andrey Ryabinin, Heiko Carstens, Arnd Bergmann,
John Hubbard, Jeff Dike, linux-um, Borislav Petkov,
Andy Lutomirski, Thomas Gleixner, linux-arm, linux-mm,
linux-power, LKML, Andrew Morton, Linus Torvalds, Mike Rapoport
In-Reply-To: <20200910130233.GK87483@ziepe.ca>
On Thu, 10 Sep 2020 10:02:33 -0300
Jason Gunthorpe <jgg@ziepe.ca> wrote:
> On Thu, Sep 10, 2020 at 11:39:25AM +0200, Alexander Gordeev wrote:
>
> > As Gerald mentioned, it is very difficult to explain in a clear way.
> > Hopefully, one could make sense ot of it.
>
> I would say the page table API requires this invariant:
>
> pud = pud_offset(p4d, addr);
> do {
> WARN_ON(pud != pud_offset(p4d, addr);
> next = pud_addr_end(addr, end);
> } while (pud++, addr = next, addr != end);
>
> ie pud++ is supposed to be a shortcut for
> pud_offset(p4d, next)
>
Hmm, IIUC, all architectures with static folding will simply return
the passed-in p4d pointer for pud_offset(p4d, addr), for 3-level
pagetables. There is no difference for s390. For gup_fast, that p4d
pointer is not really a pointer to a value in a pagetable, but
to some local copy of such a value, and not just for s390.
So, pud = p4d = pointer to copy, and increasing that pud pointer
cannot be the same as pud_offset(p4d, next). I do see your point
however, at last I think :-) My problem is that I do not see where
we would have an s390-specific issue here. Maybe my understanding
of how it works for others with static folding is wrong. That
would explain my difficulties in getting your point...
> While S390 does not follow this. Fixing addr_end brings it into
> alignment by preventing pud++ from happening.
Exactly, only that nobody seems to follow it, IIUC. Fixing it up
with pXd_addr_end was my impression of what we need to do, in order to
have it work the same way as for others.
> The only currently known side effect is that gup_fast crashes, but it
> sure is an unexpected thing.
Well, from my understanding it feels more unexpected that something
that is supposed to be a pointer to an entry in a page table, really is
just a pointer to some copy somewhere.
^ permalink raw reply
* Re: [PATCHv7 02/12] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode
From: Rob Herring @ 2020-09-10 17:58 UTC (permalink / raw)
To: Zhiqiang Hou
Cc: devicetree, lorenzo.pieralisi, Xiaowei Bao, roy.zang, linux-pci,
linux-kernel, leoyang.li, minghuan.Lian, jingoohan1,
andrew.murray, mingkai.hu, gustavo.pimentel, bhelgaas, shawnguo,
kishon, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20200811095441.7636-3-Zhiqiang.Hou@nxp.com>
On Tue, Aug 11, 2020 at 05:54:31PM +0800, Zhiqiang Hou wrote:
> From: Xiaowei Bao <xiaowei.bao@nxp.com>
>
> Add the doorbell mode of MSI-X in DWC EP driver.
>
> Signed-off-by: Xiaowei Bao <xiaowei.bao@nxp.com>
> Reviewed-by: Andrew Murray <andrew.murray@arm.com>
> Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@nxp.com>
> ---
> V7:
> - Rebase the patch without functionality change.
>
> drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++++++++++++++
> drivers/pci/controller/dwc/pcie-designware.h | 12 ++++++++++++
> 2 files changed, 26 insertions(+)
>
> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c
> index e5bd3a5ef380..e76b504ed465 100644
> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> @@ -471,6 +471,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
> return 0;
> }
>
> +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
return void. It never has an error.
It could also just be an inline function.
> + u16 interrupt_num)
> +{
> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> + u32 msg_data;
> +
> + msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
> + (interrupt_num - 1);
> +
> + dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
> +
> + return 0;
> +}
> +
> int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> u16 interrupt_num)
> {
> diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h
> index 89f8271ec5ee..745b4938225a 100644
> --- a/drivers/pci/controller/dwc/pcie-designware.h
> +++ b/drivers/pci/controller/dwc/pcie-designware.h
> @@ -97,6 +97,9 @@
> #define PCIE_MISC_CONTROL_1_OFF 0x8BC
> #define PCIE_DBI_RO_WR_EN BIT(0)
>
> +#define PCIE_MSIX_DOORBELL 0x948
> +#define PCIE_MSIX_DOORBELL_PF_SHIFT 24
> +
> #define PCIE_PL_CHK_REG_CONTROL_STATUS 0xB20
> #define PCIE_PL_CHK_REG_CHK_REG_START BIT(0)
> #define PCIE_PL_CHK_REG_CHK_REG_CONTINUOUS BIT(1)
> @@ -434,6 +437,8 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep *ep, u8 func_no,
> u8 interrupt_num);
> int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> u16 interrupt_num);
> +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8 func_no,
> + u16 interrupt_num);
> void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
> #else
> static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
> @@ -475,6 +480,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> return 0;
> }
>
> +static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep,
> + u8 func_no,
> + u16 interrupt_num)
> +{
> + return 0;
> +}
> +
> static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
> {
> }
> --
> 2.17.1
>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox