LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] macintosh/adb-iop: Send correct poll command
From: Geert Uytterhoeven @ 2020-12-04 10:03 UTC (permalink / raw)
  To: Finn Thain; +Cc: linuxppc-dev, Linux Kernel Mailing List, Joshua Thompson
In-Reply-To: <58bba4310da4c29b068345a4b36af8a531397ff7.1605847196.git.fthain@telegraphics.com.au>

Hi Finn,

On Fri, Nov 20, 2020 at 5:54 AM Finn Thain <fthain@telegraphics.com.au> wrote:
> The behaviour of the IOP firmware is not well documented but we do know
> that IOP message reply data can be used to issue new ADB commands.
> Use the message reply to better control autopoll behaviour by sending
> a Talk Register 0 command after every ADB response, not unlike the
> algorithm in the via-macii driver. This poll command is addressed to
> that device which last received a Talk command (explicit or otherwise).
>
> Cc: Joshua Thompson <funaho@jurai.org>
> Fixes: fa3b5a9929fc ("macintosh/adb-iop: Implement idle -> sending state transition")

WARNING: Unknown commit id 'fa3b5a9929fc', maybe rebased or not pulled?

32226e817043?

> Tested-by: Stan Johnson <userm57@yahoo.com>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>

Thanks, will queue in the m68k for-v5.11 branch.

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH v3 05/19] powerpc: interrupt handler wrapper functions
From: Nicholas Piggin @ 2020-12-04  8:31 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <87o8jfpa5k.fsf@linux.ibm.com>

Excerpts from Aneesh Kumar K.V's message of November 30, 2020 5:37 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
> .....
>  +#endif
>> +DECLARE_INTERRUPT_HANDLER(emulation_assist_interrupt);
>> +DECLARE_INTERRUPT_HANDLER_RAW(do_slb_fault);
> 
> Can we add comments here explaining why some of these handlers need to remain RAW()?

I possibly could. My patch doesn't change the reason, of course, they 
already have these issues.

It wants to avoid reconciling interrupts and probably context tracking 
because you can take SLB faults within those subsystems, which are not 
expecting re-entrant calls into them. It's okay to avoid these things 
because the interrupts don't enable interrupts, go to process context, 
add any timers, etc.

Now that I look at it, possibly the primary do_hash_fault handler needs 
to be RAW as well for the same reason.

>> +DECLARE_INTERRUPT_HANDLER(do_bad_slb_fault);
>> +DECLARE_INTERRUPT_HANDLER_RET(do_hash_fault);
>> +DECLARE_INTERRUPT_HANDLER_RET(do_page_fault);
>> +DECLARE_INTERRUPT_HANDLER(do_bad_page_fault);
>> +
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(timer_interrupt);
>> +DECLARE_INTERRUPT_HANDLER_NMI(performance_monitor_exception_nmi);
>> +DECLARE_INTERRUPT_HANDLER_ASYNC(performance_monitor_exception_async);
>> +DECLARE_INTERRUPT_HANDLER_RAW(performance_monitor_exception);
> 
> Same for this.

That's just because nmi vs async is decided at runtime for PMIs. I can 
add a comment, although it's more obvious when looking at the body.

Thanks,
Nick

^ permalink raw reply

* Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code
From: Nadav Amit @ 2020-12-04  8:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, Rik van Riel, Will Deacon, X86 ML,
	Dave Hansen, LKML, Nicholas Piggin, Linux-MM, Mathieu Desnoyers,
	Catalin Marinas, Jann Horn, linuxppc-dev
In-Reply-To: <203d39d11562575fd8bd6a094d97a3a332d8b265.1607059162.git.luto@kernel.org>

I am not very familiar with membarrier, but here are my 2 cents while trying
to answer your questions.

> On Dec 3, 2020, at 9:26 PM, Andy Lutomirski <luto@kernel.org> wrote:
> @@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 		 * from one thread in a process to another thread in the same
> 		 * process. No TLB flush required.
> 		 */
> +
> +		// XXX: why is this okay wrt membarrier?
> 		if (!was_lazy)
> 			return;

I am confused.

On one hand, it seems that membarrier_private_expedited() would issue an IPI
to that core, as it would find that this core’s cpu_rq(cpu)->curr->mm is the
same as the one that the membarrier applies to. But… (see below)


> @@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
> 		smp_mb();
> 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
> 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> -				next_tlb_gen)
> +		    next_tlb_gen) {
> +			/*
> +			 * We're reactivating an mm, and membarrier might
> +			 * need to serialize.  Tell membarrier.
> +			 */
> +
> +			// XXX: I can't understand the logic in
> +			// membarrier_mm_sync_core_before_usermode().  What's
> +			// the mm check for?
> +			membarrier_mm_sync_core_before_usermode(next);

On the other hand the reason for this mm check that you mention contradicts
my previous understanding as the git log says:

commit 2840cf02fae627860156737e83326df354ee4ec6
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Thu Sep 19 13:37:01 2019 -0400

    sched/membarrier: Call sync_core only before usermode for same mm
    
    When the prev and next task's mm change, switch_mm() provides the core
    serializing guarantees before returning to usermode. The only case
    where an explicit core serialization is needed is when the scheduler
    keeps the same mm for prev and next.

> 	/*
> 	 * When switching through a kernel thread, the loop in
> 	 * membarrier_{private,global}_expedited() may have observed that
> 	 * kernel thread and not issued an IPI. It is therefore possible to
> 	 * schedule between user->kernel->user threads without passing though
> 	 * switch_mm(). Membarrier requires a barrier after storing to
> -	 * rq->curr, before returning to userspace, so provide them here:
> +	 * rq->curr, before returning to userspace, and mmdrop() provides
> +	 * this barrier.
> 	 *
> -	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> -	 *   provided by mmdrop(),
> -	 * - a sync_core for SYNC_CORE.
> +	 * XXX: I don't think mmdrop() actually does this.  There's no
> +	 * smp_mb__before/after_atomic() in there.

I presume that since x86 is the only one that needs
membarrier_mm_sync_core_before_usermode(), nobody noticed the missing
smp_mb__before/after_atomic(). These are anyhow a compiler barrier in x86,
and such a barrier would take place before the return to userspace.


^ permalink raw reply

* Re: [PATCH v8 11/12] mm/vmalloc: Hugepage vmalloc mappings
From: Nicholas Piggin @ 2020-12-04  8:12 UTC (permalink / raw)
  To: akpm@linux-foundation.org, linux-mm@kvack.org, Edgecombe, Rick P
  Cc: linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	hch@infradead.org, lizefan@huawei.com,
	Jonathan.Cameron@Huawei.com, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <e9d3a50e1b18f9ea1cdfdc221bef75db19273417.camel@intel.com>

Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am:
> On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option
>> HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>> 
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD
>> size
>> or larger, and fall back to small pages if that was unsuccessful.
>> 
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use
>> huge pages, because not all callers expect this (e.g., module
>> allocations vs strict module rwx).
> 
> Several architectures (x86, arm64, others?) allocate modules initially
> with PAGE_KERNEL and so I think this test will not exclude module
> allocations in those cases.

Ah, thanks. I guess archs must additionally ensure that their
PAGE_KERNEL allocations are suitable for huge page mappings before
enabling the option.

If there is interest from those archs to support this, I have an
early (un-posted) patch that adds an explicit VM_HUGE flag that could
override the pessemistic arch default. It's not much trouble to add this 
to the large system hash allocations. It's very out of date now but I 
can at least give what I have to anyone doing an arch support that
wants it.

> 
> [snip]
> 
>> @@ -2400,6 +2453,7 @@ static inline void set_area_direct_map(const
>> struct vm_struct *area,
>>  {
>>  	int i;
>>  
>> +	/* HUGE_VMALLOC passes small pages to set_direct_map */
>>  	for (i = 0; i < area->nr_pages; i++)
>>  		if (page_address(area->pages[i]))
>>  			set_direct_map(area->pages[i]);
>> @@ -2433,11 +2487,12 @@ static void vm_remove_mappings(struct
>> vm_struct *area, int deallocate_pages)
>>  	 * map. Find the start and end range of the direct mappings to
>> make sure
>>  	 * the vm_unmap_aliases() flush includes the direct map.
>>  	 */
>> -	for (i = 0; i < area->nr_pages; i++) {
>> +	for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>>  		unsigned long addr = (unsigned long)page_address(area-
>> >pages[i]);
>>  		if (addr) {
>> +			unsigned long page_size = PAGE_SIZE << area-
>> >page_order;
>>  			start = min(addr, start);
>> -			end = max(addr + PAGE_SIZE, end);
>> +			end = max(addr + page_size, end);
>>  			flush_dmap = 1;
>>  		}
>>  	}
> 
> The logic around this is a bit tangled. The reset of the direct map has
> to succeed, but if the set_direct_map_() functions require a split they
> could fail. For x86, set_memory_ro() calls on a vmalloc alias will
> mirror the page size and permission on the direct map and so the direct
> map will be broken to 4k pages if it's a RO vmalloc allocation.
> 
> But after this, module vmalloc()'s could have large pages which would
> result in large RO pages on the direct map. Then it could possibly fail
> when trying to reset a 4k page out of a large RO direct map mapping. 
> 
> I think either module allocations need to be actually excluded from
> having large pages (seems like you might have seen other issues as
> well?), or another option could be to use the changes here:
> https://lore.kernel.org/lkml/20201125092208.12544-4-rppt@kernel.org/
> to reset the direct map for a large page range at a time for large 
> vmalloc pages.
> 

Right, x86 would have to do something about that before enabling.
A VM_HUGE flag might be quick and easy but maybe other options are not 
too difficult.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH] powerpc/hotplug: assign hot added LMB to the right node
From: Laurent Dufour @ 2020-12-04  8:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Nathan Lynch, Scott Cheloha, linux-kernel, stable, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <X8ktsoAv4/h+p1/8@kroah.com>

Le 03/12/2020 à 19:25, Greg KH a écrit :
> On Thu, Dec 03, 2020 at 11:15:14AM +0100, Laurent Dufour wrote:
>> This patch applies to 5.9 and earlier kernels only.
>>
>> Since 5.10, this has been fortunately fixed by the commit
>> e5e179aa3a39 ("pseries/drmem: don't cache node id in drmem_lmb struct").
> 
> Why can't we just backport that patch instead?  It's almost always
> better to do that than to have a one-off patch, as almost always those
> have bugs in them.

That's a good option too.
I was thinking that this 5.10 patch was not matching the stable release's 
guidelines since it was targeting performance issue, but since it is also fixing 
this issue, I'm certainly wrong.

So, forget that patch.

Thanks,
Laurent.

^ permalink raw reply

* Re: [RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting
From: Nicholas Piggin @ 2020-12-04  7:54 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Nadav Amit, X86 ML, Arnd Bergmann, Jann Horn,
	Catalin Marinas, Rik van Riel, LKML, Linux-MM, Dave Hansen,
	Will Deacon, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <e69827244c2f1e534aa83a40334ffa00bafe54c2.1607059162.git.luto@kernel.org>

Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm:
> This is a mockup.  It's designed to illustrate the algorithm and how the
> code might be structured.  There are several things blatantly wrong with
> it:
> 
> The coding stype is not up to kernel standards.  I have prototypes in the
> wrong places and other hacks.
> 
> There's a problem with mm_cpumask() not being reliable.

Interesting, this might be a way to reduce those IPIs with fairly 
minimal fast path cost. Would be interesting to see how much performance 
advantage it has over my dumb simple shoot-lazies.

For powerpc I don't think we'd be inclined to go that way, so don't feel 
the need to add this complexity for us alone -- we'd be more inclined to 
move the exit lazy to the final TLB shootdown path, which we're slowly 
getting more infrastructure in place to do.

(The powerpc hash MMU code which we're slowly moving away from might 
never get that capability because it's complex there and hard to do with
that virtualisation model so current big systems (and radix MMU until we
finish the TLB flushing stuff) want something here, but for those the
shoot-lazies could quite likely be sufficient)

But if core code was moved over to something like this for the benefit of
others archs we'd probably just as happily do that.

There's a few nits but I don't think I can see a fundamental problem 
yet.

Thanks,
Nick

^ permalink raw reply

* Re: [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code
From: Nicholas Piggin @ 2020-12-04  7:06 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Nadav Amit, X86 ML, Arnd Bergmann, Jann Horn,
	Catalin Marinas, Rik van Riel, LKML, Linux-MM, Dave Hansen,
	Will Deacon, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <203d39d11562575fd8bd6a094d97a3a332d8b265.1607059162.git.luto@kernel.org>

Excerpts from Andy Lutomirski's message of December 4, 2020 3:26 pm:
> The core scheduler isn't a great place for
> membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't
> actually know whether we are lazy.  With the old code, if a CPU is
> running a membarrier-registered task, goes idle, gets unlazied via a TLB
> shootdown IPI, and switches back to the membarrier-registered task, it
> will do an unnecessary core sync.
> 
> Conveniently, x86 is the only architecture that does anything in this
> hook, so we can just move the code.

This should go on top of my series that adds the exit_lazy_mm call
and switches x86 over, at least.

> XXX: there are some comments in swich_mm_irqs_off() that seem to be
> trying to document what barriers are expected, and it's not clear to me
> that these barriers are actually present in all paths through the
> code.  So I think this change makes the code more comprehensible and
> has no effect on the code's correctness, but I'm not at all convinced
> that the code is correct.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/mm/tlb.c   | 17 ++++++++++++++++-
>  kernel/sched/core.c | 14 +++++++-------
>  2 files changed, 23 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 3338a1feccf9..23df035b80e8 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -8,6 +8,7 @@
>  #include <linux/export.h>
>  #include <linux/cpu.h>
>  #include <linux/debugfs.h>
> +#include <linux/sched/mm.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/mmu_context.h>
> @@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  		 * from one thread in a process to another thread in the same
>  		 * process. No TLB flush required.
>  		 */
> +
> +		// XXX: why is this okay wrt membarrier?
>  		if (!was_lazy)
>  			return;
>  
> @@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
>  		smp_mb();
>  		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
>  		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
> -				next_tlb_gen)
> +		    next_tlb_gen) {
> +			/*
> +			 * We're reactivating an mm, and membarrier might
> +			 * need to serialize.  Tell membarrier.
> +			 */
> +
> +			// XXX: I can't understand the logic in
> +			// membarrier_mm_sync_core_before_usermode().  What's
> +			// the mm check for?

Writing CR3 is serializing, apparently. Another x86ism that gets 
commented and moved into arch/x86 with my patch.


> +			membarrier_mm_sync_core_before_usermode(next);
>  			return;
> +		}
>  
>  		/*
>  		 * TLB contents went out of date while we were in lazy
>  		 * mode. Fall through to the TLB switching code below.
> +		 * No need for an explicit membarrier invocation -- the CR3
> +		 * write will serialize.
>  		 */
>  		new_asid = prev_asid;
>  		need_flush = true;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d95dc3f4644..6c4b76147166 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3619,22 +3619,22 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	kcov_finish_switch(current);
>  
>  	fire_sched_in_preempt_notifiers(current);
> +
>  	/*
>  	 * When switching through a kernel thread, the loop in
>  	 * membarrier_{private,global}_expedited() may have observed that
>  	 * kernel thread and not issued an IPI. It is therefore possible to
>  	 * schedule between user->kernel->user threads without passing though
>  	 * switch_mm(). Membarrier requires a barrier after storing to
> -	 * rq->curr, before returning to userspace, so provide them here:
> +	 * rq->curr, before returning to userspace, and mmdrop() provides
> +	 * this barrier.
>  	 *
> -	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
> -	 *   provided by mmdrop(),
> -	 * - a sync_core for SYNC_CORE.
> +	 * XXX: I don't think mmdrop() actually does this.  There's no
> +	 * smp_mb__before/after_atomic() in there.

mmdrop definitely does provide a full barrier.

>  	 */
> -	if (mm) {
> -		membarrier_mm_sync_core_before_usermode(mm);
> +	if (mm)
>  		mmdrop(mm);
> -	}
> +
>  	if (unlikely(prev_state == TASK_DEAD)) {
>  		if (prev->sched_class->task_dead)
>  			prev->sched_class->task_dead(prev);
> -- 
> 2.28.0
> 
> 

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Don't see NULL pointer dereference as a KUAP fault
From: Christophe Leroy @ 2020-12-04  6:30 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <87mtyvumrn.fsf@mpe.ellerman.id.au>



Le 03/12/2020 à 12:55, Michael Ellerman a écrit :
> Christophe Leroy <christophe.leroy@csgroup.eu> writes:
>> Sometimes, NULL pointer dereferences are expected. Even when they
>> are accidental they are unlikely an exploit attempt because the
>> first page is never mapped.
> 
> The first page can be mapped if mmap_min_addr is 0.
> 
> Blocking all faults to the first page would potentially break any
> program that does that.
> 
> Also if there is something mapped at 0 it's a good chance it is an
> exploit attempt :)

Ok, I see.

In fact, we hit this warning because we don't provide copy_from_kernel_nofault_allowed()

I'll cook a patch for that.

Christophe

^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Christophe Leroy @ 2020-12-04  6:21 UTC (permalink / raw)
  To: Qian Cai
  Cc: cmr, spoorts2, npiggin, linuxppc-dev, Christoph Hellwig,
	Daniel Axtens
In-Reply-To: <da02e10d6b5a63dc10159d4420def15aa0bc4c19.camel@redhat.com>


Quoting Qian Cai <qcai@redhat.com>:

> On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote:
>> []
>> > +static inline bool
>> > +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool  
>> is_write)
>> > +{
>> > +	return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
>> > +		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE :  
>> AMR_KUAP_BLOCK_READ)),
>> > +		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
>> > +}
>>
>> A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
>> endlessly on POWER9 NV.
>
> I have just realized the patch just moved this warning around, so  
> the issue was
> pre-existent. Since I have not tested sysrq-t regularly, I am not  
> sure when it
> started to break. So far, I have reverted some of those for testing which did
> not help, i.e., the sysrq-t issue remains.
>
> 16852975f0f  Revert "powerpc/64s: Use early_mmu_has_feature() in set_kuap()"
> 129e240ead32 Revert "powerpc: Implement user_access_save() and  
> user_access_restore()"
> edb0046c842c Revert "powerpc/64s/kuap: Add missing isync to KUAP  
> restore paths"
> 2d46ee87ce44 Revert "powerpc/64/kuap: Conditionally restore AMR in  
> interrupt exit"
> c1e0e805fc57 Revert "powerpc/64s/kuap: Conditionally restore AMR in  
> kuap_restore_amr asm"
> 7f30b7aaf23a Revert "selftests/powerpc: rfi_flush: disable entry  
> flush if present"
> bc9b9967a100 Revert "powerpc/64s: flush L1D on kernel entry"
> b77e7b54f5eb Revert "powerpc/64s: flush L1D after user accesses"
> 22dddf532c64 Revert "powerpc: Only include kup-radix.h for 64-bit Book3S"
> 2679d155c46a Revert "selftests/powerpc: entry flush test"
> 87954b9b4243 Revert "selftests/powerpc: refactor entry and rfi_flush tests"
> 342d82bd4c5d Revert "powerpc/64s: rename pnv|pseries_setup_rfi_flush  
> to _setup_security_mitigations"

I also hit that WARNING in the same way earlier this week.

I think it has been broken by commit c33165253492 ("powerpc: use  
non-set_fs based maccess routines")

IIUC we should provide copy_from_kernel_nofault_allowed() to avoid that.

Christophe

^ permalink raw reply

* [RFC v2 2/2] [MOCKUP] sched/mm: Lightweight lazy mm refcounting
From: Andy Lutomirski @ 2020-12-04  5:26 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Nadav Amit, Arnd Bergmann, Rik van Riel, Will Deacon,
	X86 ML, Dave Hansen, LKML, Linux-MM, Mathieu Desnoyers,
	Andy Lutomirski, Catalin Marinas, Jann Horn, linuxppc-dev
In-Reply-To: <cover.1607059162.git.luto@kernel.org>

This is a mockup.  It's designed to illustrate the algorithm and how the
code might be structured.  There are several things blatantly wrong with
it:

The coding stype is not up to kernel standards.  I have prototypes in the
wrong places and other hacks.

There's a problem with mm_cpumask() not being reliable.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/fork.c        |   4 ++
 kernel/sched/core.c  | 128 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/sched.h |  11 +++-
 3 files changed, 126 insertions(+), 17 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index da8d360fb032..0887a33cf84f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1066,6 +1066,8 @@ struct mm_struct *mm_alloc(void)
 	return mm_init(mm, current, current_user_ns());
 }
 
+extern void mm_fixup_lazy_refs(struct mm_struct *mm);
+
 static inline void __mmput(struct mm_struct *mm)
 {
 	VM_BUG_ON(atomic_read(&mm->mm_users));
@@ -1084,6 +1086,8 @@ static inline void __mmput(struct mm_struct *mm)
 	}
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
+
+	mm_fixup_lazy_refs(mm);
 	mmdrop(mm);
 }
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6c4b76147166..69dfdfe0e5b4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3555,6 +3555,75 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
 	prepare_arch_switch(next);
 }
 
+static void drop_extra_mm_refs(struct rq *rq)
+{
+	unsigned long old_mm;
+
+	if (likely(!atomic_long_read(&rq->mm_to_mmdrop)))
+		return;
+
+	/*
+	 * Slow path.  This only happens when we recently stopped using
+	 * an mm that is exiting.
+	 */
+	old_mm = atomic_long_xchg_relaxed(&rq->mm_to_mmdrop, 0);
+	if (old_mm)
+		mmdrop((struct mm_struct *)old_mm);
+}
+
+/*
+ * This ensures that all lazy_mm refs to mm are converted to mm_count
+ * refcounts.  Our caller holds an mm_count reference, so we don't need
+ * to worry about mm being freed out from under us.
+ */
+void mm_fixup_lazy_refs(struct mm_struct *mm)
+{
+	int cpu;
+
+	/*
+	 * mm_users is zero, so no new lazy refs will be taken.
+	 */
+	WARN_ON_ONCE(atomic_read(&mm->mm_users) != 0);
+
+	/*
+	 * XXX: this is wrong on arm64 and possibly on other architectures.
+	 * Maybe we need a config option for this?  Or a
+	 * for_each_possible_lazy_cpu(cpu, mm) helper?
+	 */
+	for_each_cpu(cpu, mm_cpumask(mm)) {
+		struct rq *rq = cpu_rq(cpu);
+		unsigned long old;
+
+		if (READ_ONCE(rq->lazy_mm) != mm)
+			continue;
+
+		// XXX: we could optimize this by doing a big addition to
+		// mm_count up front instead of incrementing it separately
+		// for each CPU.
+		mmgrab(mm);
+
+		// XXX: could this be relaxed instead?
+		old = atomic_long_xchg(&rq->mm_to_mmdrop, (unsigned long)mm);
+
+		// At this point, mm can be mmdrop()ed at any time, probably
+		// by the target cpu.
+
+		if (!old)
+			continue;  // All done!
+
+		WARN_ON_ONCE(old == (unsigned long)mm);
+
+		// Uh oh!  We just stole an mm reference from the target CPU.
+		// Fortunately, we just observed the target's lazy_mm pointing
+		// to something other than old, and we observed this after
+		// bringing mm_users down to 0.  This means that the remote
+		// cpu is definitely done with old.  So we can drop it on the
+		// remote CPU's behalf.
+
+		mmdrop((struct mm_struct *)old);
+	}
+}
+
 /**
  * finish_task_switch - clean up after a task-switch
  * @prev: the thread we just switched away from.
@@ -3578,7 +3647,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	__releases(rq->lock)
 {
 	struct rq *rq = this_rq();
-	struct mm_struct *mm = rq->prev_mm;
 	long prev_state;
 
 	/*
@@ -3597,8 +3665,6 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 		      current->comm, current->pid, preempt_count()))
 		preempt_count_set(FORK_PREEMPT_COUNT);
 
-	rq->prev_mm = NULL;
-
 	/*
 	 * A task struct has one reference for the use as "current".
 	 * If a task dies, then it sets TASK_DEAD in tsk->state and calls
@@ -3629,11 +3695,28 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 * rq->curr, before returning to userspace, and mmdrop() provides
 	 * this barrier.
 	 *
-	 * XXX: I don't think mmdrop() actually does this.  There's no
-	 * smp_mb__before/after_atomic() in there.
+	 * XXX: I don't think mmdrop() actually did this.  There's no
+	 * smp_mb__before/after_atomic() in there.  But mmdrop()
+	 * certainly doesn't do this now, since we don't call mmdrop().
 	 */
-	if (mm)
-		mmdrop(mm);
+	if (current->mm && rq->lazy_mm) {
+		/*
+		 * We are unlazying.  Any remote CPU that observes our
+		 * store to lazy_mm is permitted to free the mm if mm_users
+		 * and mm_count are both zero.
+		 */
+		WRITE_ONCE(rq->lazy_mm, NULL);
+	}
+
+	// Do this unconditionally.  There's a race in which a remote CPU
+	// sees rq->lazy_mm != NULL and gives us an extra mm ref while we
+	// are executing this code and we don't notice.  Instead of letting
+	// that ref sit around until the next time we unlazy, do it on every
+	// context switch.
+	//
+	// XXX: maybe we should do this at the beginning of a context switch
+	// instead?
+	drop_extra_mm_refs(rq);
 
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
@@ -3737,20 +3820,31 @@ context_switch(struct rq *rq, struct task_struct *prev,
 	arch_start_context_switch(prev);
 
 	/*
-	 * kernel -> kernel   lazy + transfer active
-	 *   user -> kernel   lazy + mmgrab() active
+	 * TODO: write a new comment!
 	 *
-	 * kernel ->   user   switch + mmdrop() active
-	 *   user ->   user   switch
+	 * NB: none of this is kept in sync with the arch code.
+	 * In particular, active_mm can point to an mm that is no longer
+	 * in use by the arch mm code, and this condition can persist
+	 * across multiple context switches.  This isn't a problem
+	 * per se, but it does mean that using active_mm for anything
+	 * other than keeping an mm from being freed is a bit dubious.
 	 */
 	if (!next->mm) {                                // to kernel
 		enter_lazy_tlb(prev->active_mm, next);
 
 		next->active_mm = prev->active_mm;
-		if (prev->mm)                           // from user
-			mmgrab(prev->active_mm);
-		else
+		if (prev->mm) {                         // from user
+			WARN_ON_ONCE(rq->lazy_mm);
+			WRITE_ONCE(rq->lazy_mm, next->active_mm);
+			/*
+			 * barrier here?  this needs to be visible to any
+			 * remote CPU that starts executing __mmput().  That
+			 * can't happen until either we call mmput() or until
+			 * prev migrates elsewhere.
+			 */
+		} else {
 			prev->active_mm = NULL;
+		}
 	} else {                                        // to user
 		membarrier_switch_mm(rq, prev->active_mm, next->mm);
 		/*
@@ -3760,12 +3854,14 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		 * The below provides this either through switch_mm(), or in
 		 * case 'prev->active_mm == next->mm' through
 		 * finish_task_switch()'s mmdrop().
+		 *
+		 * XXX: mmdrop() didn't do this before, and the new
+		 * code doesn't even call mmdrop().
 		 */
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop() in finish_task_switch(). */
-			rq->prev_mm = prev->active_mm;
+			/* will release lazy_mm in finish_task_switch(). */
 			prev->active_mm = NULL;
 		}
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 28709f6b0975..e0caee5f158e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -950,7 +950,16 @@ struct rq {
 	struct task_struct	*idle;
 	struct task_struct	*stop;
 	unsigned long		next_balance;
-	struct mm_struct	*prev_mm;
+
+	/*
+	 * Hazard pointer for an mm that we might be using lazily.
+	 */
+	struct mm_struct	*lazy_mm;
+
+	/*
+	 * An mm that needs mmdrop()ing.
+	 */
+	atomic_long_t		mm_to_mmdrop;
 
 	unsigned int		clock_update_flags;
 	u64			clock;
-- 
2.28.0


^ permalink raw reply related

* [RFC v2 1/2] [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch code
From: Andy Lutomirski @ 2020-12-04  5:26 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Nadav Amit, Arnd Bergmann, Rik van Riel, Will Deacon,
	X86 ML, Dave Hansen, LKML, Linux-MM, Mathieu Desnoyers,
	Andy Lutomirski, Catalin Marinas, Jann Horn, linuxppc-dev
In-Reply-To: <cover.1607059162.git.luto@kernel.org>

The core scheduler isn't a great place for
membarrier_mm_sync_core_before_usermode() -- the core scheduler doesn't
actually know whether we are lazy.  With the old code, if a CPU is
running a membarrier-registered task, goes idle, gets unlazied via a TLB
shootdown IPI, and switches back to the membarrier-registered task, it
will do an unnecessary core sync.

Conveniently, x86 is the only architecture that does anything in this
hook, so we can just move the code.

XXX: there are some comments in swich_mm_irqs_off() that seem to be
trying to document what barriers are expected, and it's not clear to me
that these barriers are actually present in all paths through the
code.  So I think this change makes the code more comprehensible and
has no effect on the code's correctness, but I'm not at all convinced
that the code is correct.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/mm/tlb.c   | 17 ++++++++++++++++-
 kernel/sched/core.c | 14 +++++++-------
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3338a1feccf9..23df035b80e8 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,6 +8,7 @@
 #include <linux/export.h>
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
+#include <linux/sched/mm.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -496,6 +497,8 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		 * from one thread in a process to another thread in the same
 		 * process. No TLB flush required.
 		 */
+
+		// XXX: why is this okay wrt membarrier?
 		if (!was_lazy)
 			return;
 
@@ -508,12 +511,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		smp_mb();
 		next_tlb_gen = atomic64_read(&next->context.tlb_gen);
 		if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
-				next_tlb_gen)
+		    next_tlb_gen) {
+			/*
+			 * We're reactivating an mm, and membarrier might
+			 * need to serialize.  Tell membarrier.
+			 */
+
+			// XXX: I can't understand the logic in
+			// membarrier_mm_sync_core_before_usermode().  What's
+			// the mm check for?
+			membarrier_mm_sync_core_before_usermode(next);
 			return;
+		}
 
 		/*
 		 * TLB contents went out of date while we were in lazy
 		 * mode. Fall through to the TLB switching code below.
+		 * No need for an explicit membarrier invocation -- the CR3
+		 * write will serialize.
 		 */
 		new_asid = prev_asid;
 		need_flush = true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d95dc3f4644..6c4b76147166 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3619,22 +3619,22 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	kcov_finish_switch(current);
 
 	fire_sched_in_preempt_notifiers(current);
+
 	/*
 	 * When switching through a kernel thread, the loop in
 	 * membarrier_{private,global}_expedited() may have observed that
 	 * kernel thread and not issued an IPI. It is therefore possible to
 	 * schedule between user->kernel->user threads without passing though
 	 * switch_mm(). Membarrier requires a barrier after storing to
-	 * rq->curr, before returning to userspace, so provide them here:
+	 * rq->curr, before returning to userspace, and mmdrop() provides
+	 * this barrier.
 	 *
-	 * - a full memory barrier for {PRIVATE,GLOBAL}_EXPEDITED, implicitly
-	 *   provided by mmdrop(),
-	 * - a sync_core for SYNC_CORE.
+	 * XXX: I don't think mmdrop() actually does this.  There's no
+	 * smp_mb__before/after_atomic() in there.
 	 */
-	if (mm) {
-		membarrier_mm_sync_core_before_usermode(mm);
+	if (mm)
 		mmdrop(mm);
-	}
+
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
 			prev->sched_class->task_dead(prev);
-- 
2.28.0


^ permalink raw reply related

* [RFC v2 0/2] lazy mm refcounting
From: Andy Lutomirski @ 2020-12-04  5:26 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Nadav Amit, Arnd Bergmann, Rik van Riel, Will Deacon,
	X86 ML, Dave Hansen, LKML, Linux-MM, Mathieu Desnoyers,
	Andy Lutomirski, Catalin Marinas, Jann Horn, linuxppc-dev

This is part of a larger series here, but the beginning bit is irrelevant
to the current discussion:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm&id=203d39d11562575fd8bd6a094d97a3a332d8b265

This is IMO a lot better than v1.  It's now almost entirely in generic
code.  (It looks like it's 100% generic, but that's a lie -- the
generic code currently that all possible lazy mm refs are in
mm_cpumask(), and that's not true on all arches.  So, if we take my
approach, we'll need to have a little arch hook to control this.)

Here's how I think it fits with various arches:

x86: On bare metal (i.e. paravirt flush unavailable), the loop won't do
much.  The existing TLB shootdown when user tables are freed will
empty mm_cpumask of everything but the calling CPU.  So x86 ends up
pretty close to as good as we can get short of reworking mm_cpumask() itself.

arm64: It needs the fixup above for correctness, but I think performance
should be pretty good.  Compared to current kernels, we lose an mmgrab()
and mmdrop() on each lazy transition, and we add a reasonably fast loop
over all cpus on process exit.  Someone (probably me) needs to make
sure we don't need some extra barriers.

power: Similar to x86.

s390x: Should be essentially the same as arm64.

Other arches: I don't know.  Further research is required.

What do you all think?

Andy Lutomirski (2):
  [NEEDS HELP] x86/mm: Handle unlazying membarrier core sync in the arch
    code
  [MOCKUP] sched/mm: Lightweight lazy mm refcounting

 arch/x86/mm/tlb.c    |  17 +++++-
 kernel/fork.c        |   4 ++
 kernel/sched/core.c  | 134 +++++++++++++++++++++++++++++++++++++------
 kernel/sched/sched.h |  11 +++-
 4 files changed, 145 insertions(+), 21 deletions(-)

-- 
2.28.0

^ permalink raw reply

* [PATCH 1/3] powerpc/smp: Parse ibm, thread-groups with multiple properties
From: Gautham R. Shenoy @ 2020-12-04  4:48 UTC (permalink / raw)
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
	Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
	Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel
In-Reply-To: <1607057327-29822-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed up into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

where in,

a) provides information of Property "1" being shared by "2" groups,
   each with "4" threads each. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
   the second group is {9,11,13,15}. Property "1" is indicative of
   the thread in the group sharing L1 cache, translation cache and
   Instruction Data flow.

b) provides information of Property "2" being shared by "2" groups,
   each group with "4" threads. The "ibm,ppc-interrupt-server#s" of
   the first group is {8,10,12,14} and the
   "ibm,ppc-interrupt-server#s" of the second group is
   {9,11,13,15}. Property "2" indicates that the threads in each group
   share the L2-cache.
   
The existing code assumes that the "ibm,thread-groups" encodes
information about only one property. Hence even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

This patch extends the parsing support on platforms which encode
information about multiple properties being shared by the
corresponding groups of threads.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 146 +++++++++++++++++++++++++++++-----------------
 1 file changed, 92 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8c2857c..6a242a3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -106,6 +106,15 @@ struct thread_groups {
 	unsigned int thread_list[MAX_THREAD_LIST_SIZE];
 };
 
+/* Maximum number of properties that groups of threads within a core can share */
+#define MAX_THREAD_GROUP_PROPERTIES 1
+
+struct thread_groups_list {
+	unsigned int nr_properties;
+	struct thread_groups property_tgs[MAX_THREAD_GROUP_PROPERTIES];
+};
+
+static struct thread_groups_list tgl[NR_CPUS] __initdata;
 /*
  * On big-cores system, cpu_l1_cache_map for each CPU corresponds to
  * the set its siblings that share the L1-cache.
@@ -695,81 +704,94 @@ static void or_cpumasks_related(int i, int j, struct cpumask *(*srcmask)(int),
 /*
  * parse_thread_groups: Parses the "ibm,thread-groups" device tree
  *                      property for the CPU device node @dn and stores
- *                      the parsed output in the thread_groups
- *                      structure @tg if the ibm,thread-groups[0]
- *                      matches @property.
+ *                      the parsed output in the thread_groups_list
+ *                      structure @tglp.
  *
  * @dn: The device node of the CPU device.
- * @tg: Pointer to a thread group structure into which the parsed
+ * @tglp: Pointer to a thread group list structure into which the parsed
  *      output of "ibm,thread-groups" is stored.
- * @property: The property of the thread-group that the caller is
- *            interested in.
  *
  * ibm,thread-groups[0..N-1] array defines which group of threads in
  * the CPU-device node can be grouped together based on the property.
  *
- * ibm,thread-groups[0] tells us the property based on which the
+ * This array can represent thread groupings for multiple properties.
+ *
+ * ibm,thread-groups[i + 0] tells us the property based on which the
  * threads are being grouped together. If this value is 1, it implies
  * that the threads in the same group share L1, translation cache.
  *
- * ibm,thread-groups[1] tells us how many such thread groups exist.
+ * ibm,thread-groups[i+1] tells us how many such thread groups exist for the
+ * property ibm,thread-groups[i]
  *
- * ibm,thread-groups[2] tells us the number of threads in each such
+ * ibm,thread-groups[i+2] tells us the number of threads in each such
  * group.
+ * Suppose k = (ibm,thread-groups[i+1] * ibm,thread-groups[i+2]), then,
  *
- * ibm,thread-groups[3..N-1] is the list of threads identified by
+ * ibm,thread-groups[i+3..i+k+2] (is the list of threads identified by
  * "ibm,ppc-interrupt-server#s" arranged as per their membership in
  * the grouping.
  *
- * Example: If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12] it
- * implies that there are 2 groups of 4 threads each, where each group
- * of threads share L1, translation cache.
+ * Example:
+ * If ibm,thread-groups = [1,2,4,5,6,7,8,9,10,11,12,2,2,4,5,7,9,11,6,8,10,12]
+ * This can be decomposed up into two consecutive arrays:
+ * a) [1,2,4,5,6,7,8,9,10,11,12]
+ * b) [2,2,4,5,7,9,11,6,8,10,12]
+ * where in,
  *
- * The "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8}
- * and the "ibm,ppc-interrupt-server#s" of the second group is {9, 10,
- * 11, 12} structure
+ * a) there are 2 groups of 4 threads each, where each group of
+ * threads share Property 1 (L1, translation cache). The
+ * "ibm,ppc-interrupt-server#s" of the first group is {5,6,7,8} and
+ * the "ibm,ppc-interrupt-server#s" of the second group is {9, 10, 11,
+ * 12}.
+ *
+ * b) there are 2 groups of 4 threads each, where each group of
+ *    threads share some property indicated by the first value 2. The
+ *    "ibm,ppc-interrupt-server#s" of the first group is {5,7,9,11}
+ *    and the "ibm,ppc-interrupt-server#s" of the second group is
+ *    {6,8,10,12} structure
  *
  * Returns 0 on success, -EINVAL if the property does not exist,
  * -ENODATA if property does not have a value, and -EOVERFLOW if the
  * property data isn't large enough.
  */
 static int parse_thread_groups(struct device_node *dn,
-			       struct thread_groups *tg,
-			       unsigned int property)
+			       struct thread_groups_list *tglp)
 {
-	int i;
-	u32 thread_group_array[3 + MAX_THREAD_LIST_SIZE];
+	int i = 0;
+	u32 *thread_group_array;
 	u32 *thread_list;
 	size_t total_threads;
-	int ret;
+	int ret = 0, count;
+	unsigned int property_idx = 0;
 
+	count = of_property_count_u32_elems(dn, "ibm,thread-groups");
+	thread_group_array = kcalloc(count, sizeof(u32), GFP_KERNEL);
 	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
-					 thread_group_array, 3);
+					 thread_group_array, count);
 	if (ret)
-		return ret;
-
-	tg->property = thread_group_array[0];
-	tg->nr_groups = thread_group_array[1];
-	tg->threads_per_group = thread_group_array[2];
-	if (tg->property != property ||
-	    tg->nr_groups < 1 ||
-	    tg->threads_per_group < 1)
-		return -ENODATA;
+		goto out_free;
 
-	total_threads = tg->nr_groups * tg->threads_per_group;
+	while (i < count && property_idx < MAX_THREAD_GROUP_PROPERTIES) {
+		int j;
+		struct thread_groups *tg = &tglp->property_tgs[property_idx++];
 
-	ret = of_property_read_u32_array(dn, "ibm,thread-groups",
-					 thread_group_array,
-					 3 + total_threads);
-	if (ret)
-		return ret;
+		tg->property = thread_group_array[i];
+		tg->nr_groups = thread_group_array[i + 1];
+		tg->threads_per_group = thread_group_array[i + 2];
+		total_threads = tg->nr_groups * tg->threads_per_group;
+
+		thread_list = &thread_group_array[i + 3];
 
-	thread_list = &thread_group_array[3];
+		for (j = 0; j < total_threads; j++)
+			tg->thread_list[j] = thread_list[j];
+		i = i + 3 + total_threads;
+	}
 
-	for (i = 0 ; i < total_threads; i++)
-		tg->thread_list[i] = thread_list[i];
+	tglp->nr_properties = property_idx;
 
-	return 0;
+out_free:
+	kfree(thread_group_array);
+	return ret;
 }
 
 /*
@@ -805,24 +827,39 @@ static int get_cpu_thread_group_start(int cpu, struct thread_groups *tg)
 	return -1;
 }
 
-static int init_cpu_l1_cache_map(int cpu)
+static int init_cpu_cache_map(int cpu, unsigned int cache_property)
 
 {
 	struct device_node *dn = of_get_cpu_node(cpu, NULL);
-	struct thread_groups tg = {.property = 0,
-				   .nr_groups = 0,
-				   .threads_per_group = 0};
+	struct thread_groups *tg = NULL;
 	int first_thread = cpu_first_thread_sibling(cpu);
 	int i, cpu_group_start = -1, err = 0;
+	cpumask_var_t *mask;
+	struct thread_groups_list *cpu_tgl = &tgl[cpu];
 
 	if (!dn)
 		return -ENODATA;
 
-	err = parse_thread_groups(dn, &tg, THREAD_GROUP_SHARE_L1);
-	if (err)
-		goto out;
+	if (!(cache_property == THREAD_GROUP_SHARE_L1))
+		return -EINVAL;
 
-	cpu_group_start = get_cpu_thread_group_start(cpu, &tg);
+	if (!cpu_tgl->nr_properties) {
+		err = parse_thread_groups(dn, cpu_tgl);
+		if (err)
+			goto out;
+	}
+
+	for (i = 0; i < cpu_tgl->nr_properties; i++) {
+		if (cpu_tgl->property_tgs[i].property == cache_property) {
+			tg = &cpu_tgl->property_tgs[i];
+			break;
+		}
+	}
+
+	if (!tg)
+		return -EINVAL;
+
+	cpu_group_start = get_cpu_thread_group_start(cpu, tg);
 
 	if (unlikely(cpu_group_start == -1)) {
 		WARN_ON_ONCE(1);
@@ -830,11 +867,12 @@ static int init_cpu_l1_cache_map(int cpu)
 		goto out;
 	}
 
-	zalloc_cpumask_var_node(&per_cpu(cpu_l1_cache_map, cpu),
-				GFP_KERNEL, cpu_to_node(cpu));
+	mask = &per_cpu(cpu_l1_cache_map, cpu);
+
+	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
 
 	for (i = first_thread; i < first_thread + threads_per_core; i++) {
-		int i_group_start = get_cpu_thread_group_start(i, &tg);
+		int i_group_start = get_cpu_thread_group_start(i, tg);
 
 		if (unlikely(i_group_start == -1)) {
 			WARN_ON_ONCE(1);
@@ -843,7 +881,7 @@ static int init_cpu_l1_cache_map(int cpu)
 		}
 
 		if (i_group_start == cpu_group_start)
-			cpumask_set_cpu(i, per_cpu(cpu_l1_cache_map, cpu));
+			cpumask_set_cpu(i, *mask);
 	}
 
 out:
@@ -924,7 +962,7 @@ static int init_big_cores(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		int err = init_cpu_l1_cache_map(cpu);
+		int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L1);
 
 		if (err)
 			return err;
-- 
1.9.4


^ permalink raw reply related

* [PATCH 3/3] powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache
From: Gautham R. Shenoy @ 2020-12-04  4:48 UTC (permalink / raw)
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
	Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
	Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel
In-Reply-To: <1607057327-29822-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On POWER platforms where only some groups of threads within a core
share the L2-cache (indicated by the ibm,thread-groups device-tree
property), we currently print the incorrect shared_cpu_map/list for
L2-cache in the sysfs.

This patch reports the correct shared_cpu_map/list on such platforms.

Example:
On a platform with "ibm,thread-groups" set to
                 00000001 00000002 00000004 00000000
                 00000002 00000004 00000006 00000001
                 00000003 00000005 00000007 00000002
                 00000002 00000004 00000000 00000002
                 00000004 00000006 00000001 00000003
                 00000005 00000007

This indicates that threads {0,2,4,6} in the core share the L2-cache
and threads {1,3,5,7} in the core share the L2 cache.

However, without the patch, the shared_cpu_map/list for L2 for CPUs 0,
1 is reported in the sysfs as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,000000ff

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:0-7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000ff

With the patch, the shared_cpu_map/list for L2 cache for CPUs 0, 1 is
correctly reported as follows:

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/cacheinfo.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/cacheinfo.c b/arch/powerpc/kernel/cacheinfo.c
index 65ab9fc..1cc8f37 100644
--- a/arch/powerpc/kernel/cacheinfo.c
+++ b/arch/powerpc/kernel/cacheinfo.c
@@ -651,15 +651,22 @@ static unsigned int index_dir_to_cpu(struct cache_index_dir *index)
 	return dev->id;
 }
 
+extern bool thread_group_shares_l2;
 /*
  * On big-core systems, each core has two groups of CPUs each of which
  * has its own L1-cache. The thread-siblings which share l1-cache with
  * @cpu can be obtained via cpu_smallcore_mask().
+ *
+ * On some big-core systems, the L2 cache is shared only between some
+ * groups of siblings. This is already parsed and encoded in
+ * cpu_l2_cache_mask().
  */
 static const struct cpumask *get_big_core_shared_cpu_map(int cpu, struct cache *cache)
 {
 	if (cache->level == 1)
 		return cpu_smallcore_mask(cpu);
+	if (cache->level == 2 && thread_group_shares_l2)
+		return cpu_l2_cache_mask(cpu);
 
 	return &cache->shared_cpu_map;
 }
-- 
1.9.4


^ permalink raw reply related

* [PATCH 2/3] powerpc/smp: Add support detecting thread-groups sharing L2 cache
From: Gautham R. Shenoy @ 2020-12-04  4:48 UTC (permalink / raw)
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
	Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
	Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel
In-Reply-To: <1607057327-29822-1-git-send-email-ego@linux.vnet.ibm.com>

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

On POWER systems, groups of threads within a core sharing the L2-cache
can be indicated by the "ibm,thread-groups" property array with the
identifier "2".

This patch adds support for detecting this, and when present, populate
the populating the cpu_l2_cache_mask of every CPU to the core-siblings
which share L2 with the CPU as specified in the by the
"ibm,thread-groups" property array.

On a platform with the following "ibm,thread-group" configuration
		 00000001 00000002 00000004 00000000
		 00000002 00000004 00000006 00000001
		 00000003 00000005 00000007 00000002
		 00000002 00000004 00000000 00000002
		 00000004 00000006 00000001 00000003
		 00000005 00000007

Without this patch, the sched-domain hierarchy for CPUs 0,1 would be
	CPU0 attaching sched-domain(s):
	domain-0: span=0,2,4,6 level=SMT
	domain-1: span=0-7 level=CACHE
	domain-2: span=0-15,24-39,48-55 level=MC
	domain-3: span=0-55 level=DIE

	CPU1 attaching sched-domain(s):
	domain-0: span=1,3,5,7 level=SMT
	domain-1: span=0-7 level=CACHE
	domain-2: span=0-15,24-39,48-55 level=MC
	domain-3: span=0-55 level=DIE

The CACHE domain at 0-7 is incorrect since the ibm,thread-groups
sub-array
[00000002 00000002 00000004
 00000000 00000002 00000004 00000006
 00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups of threads where each group contains "4"
threads each. The groups being {0,2,4,6} and {1,3,5,7}.

With this patch, the sched-domain hierarchy for CPUs 0,1 would be
     	CPU0 attaching sched-domain(s):
	domain-0: span=0,2,4,6 level=SMT
	domain-1: span=0-15,24-39,48-55 level=MC
	domain-2: span=0-55 level=DIE

	CPU1 attaching sched-domain(s):
	domain-0: span=1,3,5,7 level=SMT
	domain-1: span=0-15,24-39,48-55 level=MC
	domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1
resp.) gets degenerated into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.

Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/smp.c | 66 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6a242a3..a116d2d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -76,6 +76,7 @@
 struct task_struct *secondary_current;
 bool has_big_cores;
 bool coregroup_enabled;
+bool thread_group_shares_l2;
 
 DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
 DEFINE_PER_CPU(cpumask_var_t, cpu_smallcore_map);
@@ -99,6 +100,7 @@ enum {
 
 #define MAX_THREAD_LIST_SIZE	8
 #define THREAD_GROUP_SHARE_L1   1
+#define THREAD_GROUP_SHARE_L2   2
 struct thread_groups {
 	unsigned int property;
 	unsigned int nr_groups;
@@ -107,7 +109,7 @@ struct thread_groups {
 };
 
 /* Maximum number of properties that groups of threads within a core can share */
-#define MAX_THREAD_GROUP_PROPERTIES 1
+#define MAX_THREAD_GROUP_PROPERTIES 2
 
 struct thread_groups_list {
 	unsigned int nr_properties;
@@ -121,6 +123,13 @@ struct thread_groups_list {
  */
 DEFINE_PER_CPU(cpumask_var_t, cpu_l1_cache_map);
 
+/*
+ * On some big-cores system, thread_group_l2_cache_map for each CPU
+ * corresponds to the set its siblings within the core that share the
+ * L2-cache.
+ */
+DEFINE_PER_CPU(cpumask_var_t, thread_group_l2_cache_map);
+
 /* SMP operations for this machine */
 struct smp_ops_t *smp_ops;
 
@@ -718,7 +727,9 @@ static void or_cpumasks_related(int i, int j, struct cpumask *(*srcmask)(int),
  *
  * ibm,thread-groups[i + 0] tells us the property based on which the
  * threads are being grouped together. If this value is 1, it implies
- * that the threads in the same group share L1, translation cache.
+ * that the threads in the same group share L1, translation cache. If
+ * the value is 2, it implies that the threads in the same group share
+ * the same L2 cache.
  *
  * ibm,thread-groups[i+1] tells us how many such thread groups exist for the
  * property ibm,thread-groups[i]
@@ -745,10 +756,10 @@ static void or_cpumasks_related(int i, int j, struct cpumask *(*srcmask)(int),
  * 12}.
  *
  * b) there are 2 groups of 4 threads each, where each group of
- *    threads share some property indicated by the first value 2. The
- *    "ibm,ppc-interrupt-server#s" of the first group is {5,7,9,11}
- *    and the "ibm,ppc-interrupt-server#s" of the second group is
- *    {6,8,10,12} structure
+ *    threads share some property indicated by the first value 2 (L2
+ *    cache). The "ibm,ppc-interrupt-server#s" of the first group is
+ *    {5,7,9,11} and the "ibm,ppc-interrupt-server#s" of the second
+ *    group is {6,8,10,12} structure
  *
  * Returns 0 on success, -EINVAL if the property does not exist,
  * -ENODATA if property does not have a value, and -EOVERFLOW if the
@@ -840,7 +851,8 @@ static int init_cpu_cache_map(int cpu, unsigned int cache_property)
 	if (!dn)
 		return -ENODATA;
 
-	if (!(cache_property == THREAD_GROUP_SHARE_L1))
+	if (!(cache_property == THREAD_GROUP_SHARE_L1 ||
+	      cache_property == THREAD_GROUP_SHARE_L2))
 		return -EINVAL;
 
 	if (!cpu_tgl->nr_properties) {
@@ -867,7 +879,10 @@ static int init_cpu_cache_map(int cpu, unsigned int cache_property)
 		goto out;
 	}
 
-	mask = &per_cpu(cpu_l1_cache_map, cpu);
+	if (cache_property == THREAD_GROUP_SHARE_L1)
+		mask = &per_cpu(cpu_l1_cache_map, cpu);
+	else if (cache_property == THREAD_GROUP_SHARE_L2)
+		mask = &per_cpu(thread_group_l2_cache_map, cpu);
 
 	zalloc_cpumask_var_node(mask, GFP_KERNEL, cpu_to_node(cpu));
 
@@ -973,6 +988,16 @@ static int init_big_cores(void)
 	}
 
 	has_big_cores = true;
+
+	for_each_possible_cpu(cpu) {
+		int err = init_cpu_cache_map(cpu, THREAD_GROUP_SHARE_L2);
+
+		if (err)
+			return err;
+	}
+
+	thread_group_shares_l2 = true;
+	pr_info("Thread-groups in a core share L2-cache\n");
 	return 0;
 }
 
@@ -1287,6 +1312,31 @@ static bool update_mask_by_l2(int cpu, cpumask_var_t *mask)
 	if (has_big_cores)
 		submask_fn = cpu_smallcore_mask;
 
+
+	/*
+	 * If the threads in a thread-group share L2 cache, then then
+	 * the L2-mask can be obtained from thread_group_l2_cache_map.
+	 */
+	if (thread_group_shares_l2) {
+		/* Siblings that share L1 is a subset of siblings that share L2.*/
+		or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
+		if (*mask) {
+			cpumask_andnot(*mask,
+				       per_cpu(thread_group_l2_cache_map, cpu),
+				       cpu_l2_cache_mask(cpu));
+		} else {
+			mask = &per_cpu(thread_group_l2_cache_map, cpu);
+		}
+
+		for_each_cpu(i, *mask) {
+			if (!cpu_online(i))
+				continue;
+			set_cpus_related(i, cpu, cpu_l2_cache_mask);
+		}
+
+		return true;
+	}
+
 	l2_cache = cpu_to_l2cache(cpu);
 	if (!l2_cache || !*mask) {
 		/* Assume only core siblings share cache with this CPU */
-- 
1.9.4


^ permalink raw reply related

* [PATCH 0/3] Extend Parsing "ibm, thread-groups" for Shared-L2 information
From: Gautham R. Shenoy @ 2020-12-04  4:48 UTC (permalink / raw)
  To: Srikar Dronamraju, Anton Blanchard, Vaidyanathan Srinivasan,
	Michael Ellerman, Michael Neuling, Nicholas Piggin, Nathan Lynch,
	Peter Zijlstra, Valentin Schneider
  Cc: Gautham R. Shenoy, linuxppc-dev, linux-kernel

From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>

The "ibm,thread-groups" device-tree property is an array that is used
to indicate if groups of threads within a core share certain
properties. It provides details of which property is being shared by
which groups of threads. This array can encode information about
multiple properties being shared by different thread-groups within the
core.

Example: Suppose,
"ibm,thread-groups" = [1,2,4,8,10,12,14,9,11,13,15,2,2,4,8,10,12,14,9,11,13,15]

This can be decomposed up into two consecutive arrays:

a) [1,2,4,8,10,12,14,9,11,13,15]
b) [2,2,4,8,10,12,14,9,11,13,15]

where in,

a) provides information of Property "1" being shared by "2" groups,
   each with "4" threads each. The "ibm,ppc-interrupt-server#s" of the
   first group is {8,10,12,14} and the "ibm,ppc-interrupt-server#s" of
   the second group is {9,11,13,15}. Property "1" is indicative of
   the thread in the group sharing L1 cache, translation cache and
   Instruction Data flow.

b) provides information of Property "2" being shared by "2" groups,
   each group with "4" threads. The "ibm,ppc-interrupt-server#s" of
   the first group is {8,10,12,14} and the
   "ibm,ppc-interrupt-server#s" of the second group is
   {9,11,13,15}. Property "2" indicates that the threads in each group
   share the L2-cache.

The existing code assumes that the "ibm,thread-groups" encodes
information about only one property. Hence even on platforms which
encode information about multiple properties being shared by the
corresponding groups of threads, the current code will only pick the
first one. (In the above example, it will only consider
[1,2,4,8,10,12,14,9,11,13,15] but not [2,2,4,8,10,12,14,9,11,13,15]).

Furthermore, currently on platforms where groups of threads share L2
cache, we incorrectly create an extra CACHE level sched-domain that
maps to all the threads of the core.

For example, if "ibm,thread-groups" is 
		 00000001 00000002 00000004 00000000
		 00000002 00000004 00000006 00000001
		 00000003 00000005 00000007 00000002
		 00000002 00000004 00000000 00000002
		 00000004 00000006 00000001 00000003
		 00000005 00000007

then, the sub-array
[00000002 00000002 00000004
 00000000 00000002 00000004 00000006
 00000001 00000003 00000005 00000007]
indicates that L2 (Property "2") is shared only between the threads of a single
group. There are "2" groups of threads where each group contains "4"
threads each. The groups being {0,2,4,6} and {1,3,5,7}.

However, the sched-domain hierarchy for CPUs 0,1 is
	CPU0 attaching sched-domain(s):
	domain-0: span=0,2,4,6 level=SMT
	domain-1: span=0-7 level=CACHE
	domain-2: span=0-15,24-39,48-55 level=MC
	domain-3: span=0-55 level=DIE

	CPU1 attaching sched-domain(s):
	domain-0: span=1,3,5,7 level=SMT
	domain-1: span=0-7 level=CACHE
	domain-2: span=0-15,24-39,48-55 level=MC
	domain-3: span=0-55 level=DIE

where the CACHE domain reports that L2 is shared across the entire
core which is incorrect on such platforms.

This patchset remedies these issues by extending the parsing support
for "ibm,thread-groups" to discover information about multiple
properties being shared by the corresponding groups of threads. In
particular we cano now detect if the groups of threads within a core
share the L2-cache. On such platforms, we populate the populating the
cpu_l2_cache_mask of every CPU to the core-siblings which share L2
with the CPU as specified in the by the "ibm,thread-groups" property
array.

With the patchset, the sched-domain hierarchy is correctly
reported. For eg for CPUs 0,1, with the patchset

     	CPU0 attaching sched-domain(s):
	domain-0: span=0,2,4,6 level=SMT
	domain-1: span=0-15,24-39,48-55 level=MC
	domain-2: span=0-55 level=DIE

	CPU1 attaching sched-domain(s):
	domain-0: span=1,3,5,7 level=SMT
	domain-1: span=0-15,24-39,48-55 level=MC
	domain-2: span=0-55 level=DIE

The CACHE domain with span=0,2,4,6 for CPU 0 (span=1,3,5,7 for CPU 1
resp.) gets degenerated into the SMT domain. Furthermore, the
last-level-cache domain gets correctly set to the SMT sched-domain.

Finally, this patchset reports the correct shared_cpu_map/list in the
sysfs for L2 cache on such platforms. With the patchset for CPUs0, 1,
for L2 cache we see the correct shared_cpu_map/list

/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_list:0,2,4,6
/sys/devices/system/cpu/cpu0/cache/index2/shared_cpu_map:000000,00000055

/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_list:1,3,5,7
/sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map:000000,000000aa

The patchset has been tested on older platforms which encode only the
L1 sharing information via "ibm,thread-groups" and there is no
regression found.

Gautham R. Shenoy (3):
  powerpc/smp: Parse ibm,thread-groups with multiple properties
  powerpc/smp: Add support detecting thread-groups sharing L2 cache
  powerpc/cacheinfo: Print correct cache-sibling map/list for L2 cache

 arch/powerpc/kernel/cacheinfo.c |   7 ++
 arch/powerpc/kernel/smp.c       | 198 +++++++++++++++++++++++++++++-----------
 2 files changed, 150 insertions(+), 55 deletions(-)

-- 
1.9.4

^ permalink raw reply

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Andy Lutomirski @ 2020-12-04  2:17 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, X86 ML, Arnd Bergmann, Linux-MM, Peter Zijlstra,
	Catalin Marinas, Rik van Riel, LKML, Dave Hansen, Will Deacon,
	Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <1607033145.hcppy9ndl4.astroid@bobo.none>


> On Dec 3, 2020, at 2:13 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
>>> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
>>> 
>>> power: same as ARM, except that the loop may be rather larger since
>>> the systems are bigger.  But I imagine it's still faster than Nick's
>>> approach -- a cmpxchg to a remote cacheline should still be faster than
>>> an IPI shootdown. 
>> 
>> While a single atomic might be cheaper than an IPI, the comparison
>> doesn't work out nicely. You do the xchg() on every unlazy, while the
>> IPI would be once per process exit.
>> 
>> So over the life of the process, it might do very many unlazies, adding
>> up to a total cost far in excess of what the single IPI would've been.
> 
> Yeah this is the concern, I looked at things that add cost to the
> idle switch code and it gets hard to justify the scalability improvement
> when you slow these fundmaental things down even a bit.

v2 fixes this and is generally much nicer. I’ll send it out in a couple hours.

> 
> I still think working on the assumption that IPIs = scary expensive 
> might not be correct. An IPI itself is, but you only issue them when 
> you've left a lazy mm on another CPU which just isn't that often.
> 
> Thanks,
> Nick

^ permalink raw reply

* Re: [PATCH kernel v2] vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
From: Alex Williamson @ 2020-12-03 23:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, kvm-ppc, stable, Leonardo Augusto Guimarães Garcia,
	linuxppc-dev, David Gibson
In-Reply-To: <20201122073950.15684-1-aik@ozlabs.ru>

On Sun, 22 Nov 2020 18:39:50 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> We execute certain NPU2 setup code (such as mapping an LPID to a device
> in NPU2) unconditionally if an Nvlink bridge is detected. However this
> cannot succeed on POWER8NVL machines as the init helpers return an error
> other than ENODEV which means the device is there is and setup failed so
> vfio_pci_enable() fails and pass through is not possible.
> 
> This changes the two NPU2 related init helpers to return -ENODEV if
> there is no "memory-region" device tree property as this is
> the distinction between NPU and NPU2.
> 
> Tested on
> - POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
>   NVIDIA GV100 10de:1db1 driver 418.39
> - POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
>   NVIDIA P100 10de:15f9 driver 396.47
> 
> Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
> Cc: stable@vger.kernel.org # 5.0
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * updated commit log with tested configs and replaced P8+ with POWER8NVL for clarity
> ---
>  drivers/vfio/pci/vfio_pci_nvlink2.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Thanks, applies to vfio next branch for v5.11.

Alex


> 
> diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c b/drivers/vfio/pci/vfio_pci_nvlink2.c
> index 65c61710c0e9..9adcf6a8f888 100644
> --- a/drivers/vfio/pci/vfio_pci_nvlink2.c
> +++ b/drivers/vfio/pci/vfio_pci_nvlink2.c
> @@ -231,7 +231,7 @@ int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev)
>  		return -EINVAL;
>  
>  	if (of_property_read_u32(npu_node, "memory-region", &mem_phandle))
> -		return -EINVAL;
> +		return -ENODEV;
>  
>  	mem_node = of_find_node_by_phandle(mem_phandle);
>  	if (!mem_node)
> @@ -393,7 +393,7 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
>  	int ret;
>  	struct vfio_pci_npu2_data *data;
>  	struct device_node *nvlink_dn;
> -	u32 nvlink_index = 0;
> +	u32 nvlink_index = 0, mem_phandle = 0;
>  	struct pci_dev *npdev = vdev->pdev;
>  	struct device_node *npu_node = pci_device_to_OF_node(npdev);
>  	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> @@ -408,6 +408,9 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
>  	if (!pnv_pci_get_gpu_dev(vdev->pdev))
>  		return -ENODEV;
>  
> +	if (of_property_read_u32(npu_node, "memory-region", &mem_phandle))
> +		return -ENODEV;
> +
>  	/*
>  	 * NPU2 normally has 8 ATSD registers (for concurrency) and 6 links
>  	 * so we can allocate one register per link, using nvlink index as


^ permalink raw reply

* Re: [PATCH] ASoC: fsl_aud2htx: mark PM functions as __maybe_unused
From: Nicolin Chen @ 2020-12-03 22:36 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: alsa-devel, linuxppc-dev, Arnd Bergmann, Timur Tabi,
	Liam Girdwood, Shengjiu Wang, Shengjiu Wang, Xiubo Li,
	Takashi Iwai, Jaroslav Kysela, Mark Brown, Fabio Estevam,
	linux-kernel
In-Reply-To: <20201203222900.1042578-1-arnd@kernel.org>

On Thu, Dec 03, 2020 at 11:28:47PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> When CONFIG_PM is disabled, we get a warning for unused functions:
> 
> sound/soc/fsl/fsl_aud2htx.c:261:12: error: unused function 'fsl_aud2htx_runtime_suspend' [-Werror,-Wunused-function]
> static int fsl_aud2htx_runtime_suspend(struct device *dev)
> sound/soc/fsl/fsl_aud2htx.c:271:12: error: unused function 'fsl_aud2htx_runtime_resume' [-Werror,-Wunused-function]
> static int fsl_aud2htx_runtime_resume(struct device *dev)
> 
> Mark these as __maybe_unused to avoid the warning without adding
> an #ifdef.
> 
> Fixes: 8a24c834c053 ("ASoC: fsl_aud2htx: Add aud2htx module driver")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>

^ permalink raw reply

* Re: [PATCH 0/6] Add documentation for Documentation/features at the built docs
From: Jonathan Corbet @ 2020-12-03 22:36 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rich Felker, linux-ia64, Linux Doc Mailing List, x86, linux-mips,
	James E.J. Bottomley, Paul Mackerras, H. Peter Anvin, linux-riscv,
	Will Deacon, Thomas Gleixner, Jonas Bonn, linux-s390,
	Yoshinori Sato, Helge Deller, linux-sh, Christian Borntraeger,
	Ingo Molnar, Catalin Marinas, Fenghua Yu, Albert Ou, Kees Cook,
	Vasily Gorbik, Heiko Carstens, Jonathan Neuschäfer,
	Stefan Kristiansson, Tony Luck, Borislav Petkov, Paul Walmsley,
	Stafford Horne, Daniel W. S. Almeida, linux-arm-kernel,
	Thomas Bogendoerfer, linux-parisc, Andrew Cooper, linux-kernel,
	openrisc, Palmer Dabbelt, Masami Hiramatsu, Greg Kroah-Hartman,
	linuxppc-dev
In-Reply-To: <cover.1606748711.git.mchehab+huawei@kernel.org>

On Mon, 30 Nov 2020 16:36:29 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> This series got already submitted last year:
> 
>    https://lore.kernel.org/lkml/cover.1561222784.git.mchehab+samsung@kernel.org/
> 
> Yet, on that time, there were too many other patches related to ReST
> conversion floating around. So, at the end, I guess this one got missed.
> 
> So, I did a rebase on the top of upstream, and added a few new changes.

OK, I've gone ahead and applied these; it gains me a new trivial conflict
with x86, but so be it...

That said, I think that the RST table formatting could be *way* improved.
The current tables are all white space and hard to make sense of.  What if
we condensed the information?  Just looking at the first entry in
Documentation/admin-guide/features.html, perhaps it could look like:

    FEATURE	KCONFIG/DESCRIPTION		STATUS

    cBPF-JIT	HAVE_CBPF_JIT			TODO: alpha, arc, arm...
    						ok: mips, powerpc, ...
		arch supports cBPF JIT
		optimizations

The result would be far more compact and easy to read, IMO.  I may get
around to giving this a try if (hint :) nobody else gets there first.

Thanks,

jon

^ permalink raw reply

* [PATCH] ASoC: fsl_aud2htx: mark PM functions as __maybe_unused
From: Arnd Bergmann @ 2020-12-03 22:28 UTC (permalink / raw)
  To: Timur Tabi, Nicolin Chen, Xiubo Li, Liam Girdwood, Mark Brown,
	Jaroslav Kysela, Takashi Iwai, Shengjiu Wang
  Cc: alsa-devel, Arnd Bergmann, Fabio Estevam, linuxppc-dev,
	linux-kernel, Shengjiu Wang

From: Arnd Bergmann <arnd@arndb.de>

When CONFIG_PM is disabled, we get a warning for unused functions:

sound/soc/fsl/fsl_aud2htx.c:261:12: error: unused function 'fsl_aud2htx_runtime_suspend' [-Werror,-Wunused-function]
static int fsl_aud2htx_runtime_suspend(struct device *dev)
sound/soc/fsl/fsl_aud2htx.c:271:12: error: unused function 'fsl_aud2htx_runtime_resume' [-Werror,-Wunused-function]
static int fsl_aud2htx_runtime_resume(struct device *dev)

Mark these as __maybe_unused to avoid the warning without adding
an #ifdef.

Fixes: 8a24c834c053 ("ASoC: fsl_aud2htx: Add aud2htx module driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 sound/soc/fsl/fsl_aud2htx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_aud2htx.c b/sound/soc/fsl/fsl_aud2htx.c
index 4091ccc7c3e9..d70d5e75a30c 100644
--- a/sound/soc/fsl/fsl_aud2htx.c
+++ b/sound/soc/fsl/fsl_aud2htx.c
@@ -258,7 +258,7 @@ static int fsl_aud2htx_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static int fsl_aud2htx_runtime_suspend(struct device *dev)
+static int __maybe_unused fsl_aud2htx_runtime_suspend(struct device *dev)
 {
 	struct fsl_aud2htx *aud2htx = dev_get_drvdata(dev);
 
@@ -268,7 +268,7 @@ static int fsl_aud2htx_runtime_suspend(struct device *dev)
 	return 0;
 }
 
-static int fsl_aud2htx_runtime_resume(struct device *dev)
+static int __maybe_unused fsl_aud2htx_runtime_resume(struct device *dev)
 {
 	struct fsl_aud2htx *aud2htx = dev_get_drvdata(dev);
 	int ret;
-- 
2.27.0


^ permalink raw reply related

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Nicholas Piggin @ 2020-12-03 22:13 UTC (permalink / raw)
  To: Andy Lutomirski, Peter Zijlstra
  Cc: linux-arch, X86 ML, Arnd Bergmann, Linux-MM, Catalin Marinas,
	Rik van Riel, LKML, Dave Hansen, Will Deacon, Mathieu Desnoyers,
	linuxppc-dev
In-Reply-To: <20201203084448.GF2414@hirez.programming.kicks-ass.net>

Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
> 
>> power: same as ARM, except that the loop may be rather larger since
>> the systems are bigger.  But I imagine it's still faster than Nick's
>> approach -- a cmpxchg to a remote cacheline should still be faster than
>> an IPI shootdown. 
> 
> While a single atomic might be cheaper than an IPI, the comparison
> doesn't work out nicely. You do the xchg() on every unlazy, while the
> IPI would be once per process exit.
> 
> So over the life of the process, it might do very many unlazies, adding
> up to a total cost far in excess of what the single IPI would've been.

Yeah this is the concern, I looked at things that add cost to the
idle switch code and it gets hard to justify the scalability improvement
when you slow these fundmaental things down even a bit.

I still think working on the assumption that IPIs = scary expensive 
might not be correct. An IPI itself is, but you only issue them when 
you've left a lazy mm on another CPU which just isn't that often.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Qian Cai @ 2020-12-03 18:50 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev; +Cc: cmr, spoorts2, npiggin
In-Reply-To: <e82f315e08fe9f13ce4e94259968e0782ebb57a3.camel@redhat.com>

On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote:
> []
> > +static inline bool
> > +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> > +{
> > +	return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
> > +		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
> > +		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
> > +}
> 
> A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
> endlessly on POWER9 NV.

I have just realized the patch just moved this warning around, so the issue was
pre-existent. Since I have not tested sysrq-t regularly, I am not sure when it
started to break. So far, I have reverted some of those for testing which did
not help, i.e., the sysrq-t issue remains.

16852975f0f  Revert "powerpc/64s: Use early_mmu_has_feature() in set_kuap()"
129e240ead32 Revert "powerpc: Implement user_access_save() and user_access_restore()"
edb0046c842c Revert "powerpc/64s/kuap: Add missing isync to KUAP restore paths"
2d46ee87ce44 Revert "powerpc/64/kuap: Conditionally restore AMR in interrupt exit"
c1e0e805fc57 Revert "powerpc/64s/kuap: Conditionally restore AMR in kuap_restore_amr asm"
7f30b7aaf23a Revert "selftests/powerpc: rfi_flush: disable entry flush if present"
bc9b9967a100 Revert "powerpc/64s: flush L1D on kernel entry"
b77e7b54f5eb Revert "powerpc/64s: flush L1D after user accesses"
22dddf532c64 Revert "powerpc: Only include kup-radix.h for 64-bit Book3S"
2679d155c46a Revert "selftests/powerpc: entry flush test"
87954b9b4243 Revert "selftests/powerpc: refactor entry and rfi_flush tests"
342d82bd4c5d Revert "powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations"


^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Qian Cai @ 2020-12-03 17:31 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev; +Cc: cmr, spoorts2, npiggin
In-Reply-To: <e82f315e08fe9f13ce4e94259968e0782ebb57a3.camel@redhat.com>

On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote:
> A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
> endlessly on Power8 NV.

Correction -- POWER9 NV.


^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Qian Cai @ 2020-12-03 17:17 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev; +Cc: cmr, spoorts2, npiggin
In-Reply-To: <20201119231333.361771-4-dja@axtens.net>

On Fri, 2020-11-20 at 10:13 +1100, Daniel Axtens wrote:
> From: Nicholas Piggin <npiggin@gmail.com>
> 
> IBM Power9 processors can speculatively operate on data in the L1 cache
> before it has been completely validated, via a way-prediction mechanism. It
> is not possible for an attacker to determine the contents of impermissible
> memory using this method, since these systems implement a combination of
> hardware and software security measures to prevent scenarios where
> protected data could be leaked.
> 
> However these measures don't address the scenario where an attacker induces
> the operating system to speculatively execute instructions using data that
> the attacker controls. This can be used for example to speculatively bypass
> "kernel user access prevention" techniques, as discovered by Anthony
> Steinhauser of Google's Safeside Project. This is not an attack by itself,
> but there is a possibility it could be used in conjunction with
> side-channels or other weaknesses in the privileged code to construct an
> attack.
> 
> This issue can be mitigated by flushing the L1 cache between privilege
> boundaries of concern. This patch flushes the L1 cache after user accesses.
> 
> This is part of the fix for CVE-2020-4788.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  .../admin-guide/kernel-parameters.txt         |  4 +
>  .../powerpc/include/asm/book3s/64/kup-radix.h | 66 ++++++++------
>  arch/powerpc/include/asm/exception-64s.h      |  3 +
>  arch/powerpc/include/asm/feature-fixups.h     |  9 ++
>  arch/powerpc/include/asm/kup.h                | 19 +++--
>  arch/powerpc/include/asm/security_features.h  |  3 +
>  arch/powerpc/include/asm/setup.h              |  1 +
>  arch/powerpc/kernel/exceptions-64s.S          | 85 ++++++-------------
>  arch/powerpc/kernel/setup_64.c                | 62 ++++++++++++++
>  arch/powerpc/kernel/vmlinux.lds.S             |  7 ++
>  arch/powerpc/lib/feature-fixups.c             | 50 +++++++++++
>  arch/powerpc/platforms/powernv/setup.c        | 10 ++-
>  arch/powerpc/platforms/pseries/setup.c        |  4 +
>  13 files changed, 233 insertions(+), 90 deletions(-)
[]
> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index 3ee1ec60be84..97c2394e7dea 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
[]
> +static inline bool
> +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> +{
> +	return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
> +		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
> +		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
> +}

A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
endlessly on Power8 NV.

.config: 
https://cailca.coding.net/public/linux/mm/git/files/master/powerpc.config

[  391.734028][ T1986] Bug: Read fault blocked by AMR!
[  391.734032][ T1986] WARNING: CPU: 80 PID: 1986 at arch/powerpc/include/asm/book3s/64/kup-radix.h:145 do_page_fault+0x8fc/0xb70
[  391.734232][ T1986] Modules linked in: kvm_hv kvm ip_tables x_tables sd_mod ahci libahci tg3 libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[  391.734425][ T1986] CPU: 80 PID: 1986 Comm: bash Tainted: G        W         5.10.0-rc6-next-20201203+ #3
[  391.734535][ T1986] NIP:  c00000000004dd1c LR: c00000000004dd18 CTR: 0000000000000000
[  391.734648][ T1986] REGS: c00020003a0bf3a0 TRAP: 0700   Tainted: G        W          (5.10.0-rc6-next-20201203+)
[  391.734768][ T1986] MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48222284  XER: 00000000
[  391.734906][ T1986] CFAR: c0000000000bb05c IRQMASK: 1 
[  391.734906][ T1986] GPR00: c00000000004dd18 c00020003a0bf630 c000000007fe0d00 000000000000001f 
[  391.734906][ T1986] GPR04: c000000000f1cc58 0000000000000003 0000000000000027 c000201cc6207218 
[  391.734906][ T1986] GPR08: 0000000000000023 0000000000000002 c00020004753bd80 c000000007f1cee8 
[  391.734906][ T1986] GPR12: 0000000000002000 c000201fff7f8380 0000000040000000 0000000110929798 
[  391.734906][ T1986] GPR16: 0000000110929724 00000001108c6988 000000011085f290 000000011092d568 
[  391.734906][ T1986] GPR20: 00000001229f1f80 0000000000000001 0000000000000001 c000000000aa8dc8 
[  391.734906][ T1986] GPR24: c000000000ab4a00 c00020001cc8c880 0000000000000000 0000000000000000 
[  391.734906][ T1986] GPR28: c00000000801aa18 0000000000000160 c00020003a0bf760 0000000000000300 
[  391.735865][ T1986] NIP [c00000000004dd1c] do_page_fault+0x8fc/0xb70
[  391.735947][ T1986] LR [c00000000004dd18] do_page_fault+0x8f8/0xb70
[  391.736033][ T1986] Call Trace:
[  391.736072][ T1986] [c00020003a0bf630] [c00000000004dd18] do_page_fault+0x8f8/0xb70 (unreliable)
[  391.736181][ T1986] [c00020003a0bf6f0] [c00000000000c1b8] handle_page_fault+0x10/0x2c
[  391.736294][ T1986] --- interrupt: 300 at copy_from_kernel_nofault+0x68/0x190
[  391.736294][ T1986]     LR = copy_from_kernel_nofault+0x40/0x190
[  391.736441][ T1986] [c00020003a0bf9f0] [c00020003a0bfa30] 0xc00020003a0bfa30 (unreliable)
[  391.736565][ T1986] [c00020003a0bfa30] [c0000000000edd98] print_worker_info+0xe8/0x1c0
[  391.736672][ T1986] [c00020003a0bfaf0] [c000000000104b0c] sched_show_task+0x2dc/0x350
[  391.736807][ T1986] [c00020003a0bfb70] [c000000000112cd8] show_state_filter+0x148/0x320
[  391.736899][ T1986] [c00020003a0bfbe0] [c00000000070a3f4] sysrq_handle_showstate+0x24/0x40
[  391.736995][ T1986] [c00020003a0bfc00] [c00000000070add4] __handle_sysrq+0x164/0x280
[  391.737111][ T1986] [c00020003a0bfca0] [c00000000070b03c] write_sysrq_trigger+0xfc/0x13c
[  391.737233][ T1986] [c00020003a0bfce0] [c0000000004c579c] proc_reg_write+0x10c/0x1b0
[  391.737327][ T1986] [c00020003a0bfd10] [c0000000003e5ec4] vfs_write+0xf4/0x480
[  391.737431][ T1986] [c00020003a0bfd70] [c0000000003e642c] ksys_write+0x7c/0x140
[  391.737536][ T1986] [c00020003a0bfdc0] [c00000000002c578] system_call_exception+0xf8/0x1d0
[  391.737623][ T1986] [c00020003a0bfe20] [c00000000000d1c8] system_call_common+0xe8/0x218
[  391.737708][ T1986] Instruction dump:
[  391.737777][ T1986] 60000000 2fbb0000 e93e0168 419e007c 2fa90000 3c82f8ac 388467a0 409cfed4 
[  391.737913][ T1986] 3c62f8ac 386368a8 4806d2e1 60000000 <0fe00000> e80100d0 3ae0000b eb410090 
[  391.738041][ T1986] irq event stamp: 126198
[  391.738077][ T1986] hardirqs last  enabled at (126197): [<c00000000002cbf4>] interrupt_exit_kernel_prepare+0xb4/0x250
[  391.738196][ T1986] hardirqs last disabled at (126198): [<c00000000000897c>] data_access_common_virt+0x16c/0x180
[  391.738327][ T1986] softirqs last  enabled at (126196): [<c000000000948f08>] __do_softirq+0x388/0x704
[  391.738427][ T1986] softirqs last disabled at (126191): [<c0000000000c75b8>] irq_exit+0x198/0x1c0
[  392.177827][ T1986] ---[ end trace 8eaf99b33f09def0 ]---
[  392.177934][ T1986] Workqueue:  0x0 (mm_percpu_wq)
[  392.177994][ T1986] Call Trace:
[  392.178048][ T1986] [c00000002afbf9d0] [c000000000ab0ee8] __func__.4060+0x125178/0x185


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox