* Re: [PATCH] powerpc/pseries: hcall_exit tracepoint retval should be signed
From: Ravi Bangoria @ 2018-05-09 9:46 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev, anton
In-Reply-To: <20180507130355.25115-1-mpe@ellerman.id.au>
On 05/07/2018 06:33 PM, Michael Ellerman wrote:
> The hcall_exit() tracepoint has retval defined as unsigned long. That
> leads to humours results like:
>
> bash-3686 [009] d..2 854.134094: hcall_entry: opcode=24
> bash-3686 [009] d..2 854.134095: hcall_exit: opcode=24 retval=18446744073709551609
>
> It's normal for some hcalls to return negative values, displaying them
> as unsigned isn't very helpful. So change it to signed.
>
> bash-3711 [001] d..2 471.691008: hcall_entry: opcode=24
> bash-3711 [001] d..2 471.691008: hcall_exit: opcode=24 retval=-7
>
> Which can be more easily compared to H_NOT_FOUND in hvcall.h
Tested-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Thank,
Ravi
^ permalink raw reply
* [PATCH] crypto: nx: fix spelling mistake: "seqeunce" -> "sequence"
From: Colin King @ 2018-05-09 9:16 UTC (permalink / raw)
To: Haren Myneni, Herbert Xu, David S . Miller,
Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
linux-crypto, linuxppc-dev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
Trivial fix to spelling mistake in CSB_ERR error message text
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/crypto/nx/nx-842-powernv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/nx/nx-842-powernv.c b/drivers/crypto/nx/nx-842-powernv.c
index 1e87637c412d..36afd6d8753c 100644
--- a/drivers/crypto/nx/nx-842-powernv.c
+++ b/drivers/crypto/nx/nx-842-powernv.c
@@ -334,7 +334,7 @@ static int wait_for_csb(struct nx842_workmem *wmem,
return -EPROTO;
case CSB_CC_SEQUENCE:
/* should not happen, we don't use chained CRBs */
- CSB_ERR(csb, "CRB seqeunce number error");
+ CSB_ERR(csb, "CRB sequence number error");
return -EPROTO;
case CSB_CC_UNKNOWN_CODE:
CSB_ERR(csb, "Unknown subfunction code");
--
2.17.0
^ permalink raw reply related
* Re: [PATCH] powerpc/64s/radix: reset mm_cpumask for single thread process when possible
From: Benjamin Herrenschmidt @ 2018-05-09 8:31 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20180509065613.14762-1-npiggin@gmail.com>
On Wed, 2018-05-09 at 16:56 +1000, Nicholas Piggin wrote:
> When a single-threaded process has a non-local mm_cpumask and requires
> a full PID tlbie invalidation, use that as an opportunity to reset the
> cpumask back to the current CPU we're running on.
>
> No other thread can concurrently switch to this mm, because it must
> have had a reference on mm_users before it could use_mm. mm_users can
> be asynchronously incremented e.g., by mmget_not_zero, but those users
> must not be doing use_mm.
What do you mean ? I don't fully understand how this isn't racy with
another thread being created, switching to that mm, and then having its
bit cleared by us ? Also why not use_mm ? what prevents it ?
Cheers,
Ben.
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/64s/radix: do not flush TLB when relaxing access
From: Balbir Singh @ 2018-05-09 8:27 UTC (permalink / raw)
To: Nicholas Piggin
Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Alistair Popple
In-Reply-To: <20180509174356.361abdba@roar.ozlabs.ibm.com>
On Wed, May 9, 2018 at 5:43 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Wed, 9 May 2018 17:07:47 +1000
> Balbir Singh <bsingharora@gmail.com> wrote:
>
>> On Wed, May 9, 2018 at 4:51 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>> > Radix flushes the TLB when updating ptes to increase permissiveness
>> > of protection (increase access authority). Book3S does not require
>> > TLB flushing in this case, and it is not done on hash. This patch
>> > avoids the flush for radix.
>> >
>> > From Power ISA v3.0B, p.1090:
>> >
>> > Setting a Reference or Change Bit or Upgrading Access Authority
>> > (PTE Subject to Atomic Hardware Updates)
>> >
>> > If the only change being made to a valid PTE that is subject to
>> > atomic hardware updates is to set the Reference or Change bit to 1
>> > or to add access authorities, a simpler sequence suffices because
>> > the translation hardware will refetch the PTE if an access is
>> > attempted for which the only problems were reference and/or change
>> > bits needing to be set or insufficient access authority.
>> >
>> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> > ---
>> > arch/powerpc/mm/pgtable-book3s64.c | 1 -
>> > arch/powerpc/mm/pgtable.c | 3 ++-
>> > 2 files changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
>> > index 518518fb7c45..6e991eaccab4 100644
>> > --- a/arch/powerpc/mm/pgtable-book3s64.c
>> > +++ b/arch/powerpc/mm/pgtable-book3s64.c
>> > @@ -40,7 +40,6 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
>> > if (changed) {
>> > __ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
>> > pmd_pte(entry), address);
>> > - flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>>
>> The comment states that this can be used for missing execution
>> permissions as well. I am not convinced we can skip a flush in those
>> cases
>
> Why not? Execute is part of the access authority. And they're already no
> ops on hash. What am I missing?
I have not reviewed the hash code, but if relaxing access means
allowing the code to provide execute permission, won't this result in
spurious faults? A simple test might be to run a JIT workload and see
if the number of faults go up with and without the patch?
Balbir
^ permalink raw reply
* Re: [PATCH] powerpc/64s/radix: reset mm_cpumask for single thread process when possible
From: Balbir Singh @ 2018-05-09 8:23 UTC (permalink / raw)
To: Nicholas Piggin; +Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT)
In-Reply-To: <20180509065613.14762-1-npiggin@gmail.com>
On Wed, May 9, 2018 at 4:56 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> When a single-threaded process has a non-local mm_cpumask and requires
> a full PID tlbie invalidation, use that as an opportunity to reset the
> cpumask back to the current CPU we're running on.
>
> No other thread can concurrently switch to this mm, because it must
> have had a reference on mm_users before it could use_mm. mm_users can
> be asynchronously incremented e.g., by mmget_not_zero, but those users
> must not be doing use_mm.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/include/asm/mmu_context.h | 19 +++++++++
> arch/powerpc/include/asm/tlb.h | 8 ++++
> arch/powerpc/mm/tlb-radix.c | 57 +++++++++++++++++++-------
> 3 files changed, 70 insertions(+), 14 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> index 1835ca1505d6..df12a994529f 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -6,6 +6,7 @@
> #include <linux/kernel.h>
> #include <linux/mm.h>
> #include <linux/sched.h>
> +#include <linux/sched/mm.h>
> #include <linux/spinlock.h>
> #include <asm/mmu.h>
> #include <asm/cputable.h>
> @@ -201,6 +202,24 @@ static inline void activate_mm(struct mm_struct *prev, struct mm_struct *next)
> static inline void enter_lazy_tlb(struct mm_struct *mm,
> struct task_struct *tsk)
> {
> +#ifdef CONFIG_PPC_BOOK3S_64
> + /*
> + * Under radix, we do not want to keep lazy PIDs around because
> + * even if the CPU does not access userspace, it can still bring
> + * in translations through speculation and prefetching.
> + *
> + * Switching away here allows us to trim back the mm_cpumask in
> + * cases where we know the process is not running on some CPUs
> + * (see mm/tlb-radix.c).
> + */
> + if (radix_enabled() && mm != &init_mm) {
> + mmgrab(&init_mm);
This is called when a kernel thread decides to unuse a mm, I agree switching
to init_mm as active_mm is reasonable thing to do.
> + tsk->active_mm = &init_mm;
Are we called with irqs disabled? Don't we need it below?
> + switch_mm_irqs_off(mm, tsk->active_mm, tsk);
> + mmdrop(mm);
> + }
> +#endif
> +
> /* 64-bit Book3E keeps track of current PGD in the PACA */
> #ifdef CONFIG_PPC_BOOK3E_64
> get_paca()->pgd = NULL;
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index a7eabff27a0f..006fce98c403 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -76,6 +76,14 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
> return false;
> return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
> }
> +static inline void mm_reset_thread_local(struct mm_struct *mm)
reset_thread_local --> reset_to_thread_local?
> +{
> + WARN_ON(atomic_read(&mm->context.copros) > 0);
Can we put this under DEBUG_VM, VM_WARN_ON?
> + WARN_ON(!(atomic_read(&mm->mm_users) == 1 && current->mm == mm));
> + atomic_set(&mm->context.active_cpus, 1);
> + cpumask_clear(mm_cpumask(mm));
> + cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
> +}
> #else /* CONFIG_PPC_BOOK3S_64 */
> static inline int mm_is_thread_local(struct mm_struct *mm)
> {
> diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
> index 5ac3206c51cc..d5593a78702a 100644
> --- a/arch/powerpc/mm/tlb-radix.c
> +++ b/arch/powerpc/mm/tlb-radix.c
> @@ -504,6 +504,15 @@ void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmadd
> }
> EXPORT_SYMBOL(radix__local_flush_tlb_page);
>
> +static bool mm_is_singlethreaded(struct mm_struct *mm)
> +{
mm_tlb_context_is_local? We should also skip init_mm from these checks
> + if (atomic_read(&mm->context.copros) > 0)
> + return false;
> + if (atomic_read(&mm->mm_users) == 1 && current->mm == mm)
> + return true;
> + return false;
> +}
> +
> static bool mm_needs_flush_escalation(struct mm_struct *mm)
> {
> /*
> @@ -511,7 +520,9 @@ static bool mm_needs_flush_escalation(struct mm_struct *mm)
> * caching PTEs and not flushing them properly when
> * RIC = 0 for a PID/LPID invalidate
> */
> - return atomic_read(&mm->context.copros) != 0;
> + if (atomic_read(&mm->context.copros) > 0)
> + return true;
> + return false;
> }
>
> #ifdef CONFIG_SMP
> @@ -525,12 +536,17 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
>
> preempt_disable();
> if (!mm_is_thread_local(mm)) {
> - if (mm_needs_flush_escalation(mm))
> + if (mm_is_singlethreaded(mm)) {
> _tlbie_pid(pid, RIC_FLUSH_ALL);
> - else
> + mm_reset_thread_local(mm);
> + } else if (mm_needs_flush_escalation(mm)) {
> + _tlbie_pid(pid, RIC_FLUSH_ALL);
> + } else {
> _tlbie_pid(pid, RIC_FLUSH_TLB);
> - } else
> + }
> + } else {
> _tlbiel_pid(pid, RIC_FLUSH_TLB);
> + }
> preempt_enable();
> }
> EXPORT_SYMBOL(radix__flush_tlb_mm);
> @@ -544,10 +560,13 @@ void radix__flush_all_mm(struct mm_struct *mm)
> return;
>
> preempt_disable();
> - if (!mm_is_thread_local(mm))
> + if (!mm_is_thread_local(mm)) {
> _tlbie_pid(pid, RIC_FLUSH_ALL);
> - else
> + if (mm_is_singlethreaded(mm))
> + mm_reset_thread_local(mm);
> + } else {
> _tlbiel_pid(pid, RIC_FLUSH_ALL);
> + }
> preempt_enable();
> }
> EXPORT_SYMBOL(radix__flush_all_mm);
> @@ -644,10 +663,14 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
> if (local) {
> _tlbiel_pid(pid, RIC_FLUSH_TLB);
> } else {
> - if (mm_needs_flush_escalation(mm))
> + if (mm_is_singlethreaded(mm)) {
> + _tlbie_pid(pid, RIC_FLUSH_ALL);
> + mm_reset_thread_local(mm);
> + } else if (mm_needs_flush_escalation(mm)) {
> _tlbie_pid(pid, RIC_FLUSH_ALL);
> - else
> + } else {
> _tlbie_pid(pid, RIC_FLUSH_TLB);
> + }
> }
> } else {
> bool hflush = false;
> @@ -802,13 +825,19 @@ static inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
> }
>
> if (full) {
> - if (!local && mm_needs_flush_escalation(mm))
> - also_pwc = true;
> -
> - if (local)
> + if (local) {
> _tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
> - else
> - _tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL: RIC_FLUSH_TLB);
> + } else {
> + if (mm_is_singlethreaded(mm)) {
> + _tlbie_pid(pid, RIC_FLUSH_ALL);
> + mm_reset_thread_local(mm);
> + } else {
> + if (mm_needs_flush_escalation(mm))
> + also_pwc = true;
> +
> + _tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
> + }
> + }
> } else {
> if (local)
> _tlbiel_va_range(start, end, pid, page_size, psize, also_pwc);
Looks good otherwise
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
Balbir Singh.
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/64s/radix: do not flush TLB when relaxing access
From: Nicholas Piggin @ 2018-05-09 7:43 UTC (permalink / raw)
To: Balbir Singh
Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Alistair Popple
In-Reply-To: <CAKTCnzmeTiSK8naAz32PAcCdq3Nu3tb02DY4Kn0V+pG593tgOg@mail.gmail.com>
On Wed, 9 May 2018 17:07:47 +1000
Balbir Singh <bsingharora@gmail.com> wrote:
> On Wed, May 9, 2018 at 4:51 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> > Radix flushes the TLB when updating ptes to increase permissiveness
> > of protection (increase access authority). Book3S does not require
> > TLB flushing in this case, and it is not done on hash. This patch
> > avoids the flush for radix.
> >
> > From Power ISA v3.0B, p.1090:
> >
> > Setting a Reference or Change Bit or Upgrading Access Authority
> > (PTE Subject to Atomic Hardware Updates)
> >
> > If the only change being made to a valid PTE that is subject to
> > atomic hardware updates is to set the Reference or Change bit to 1
> > or to add access authorities, a simpler sequence suffices because
> > the translation hardware will refetch the PTE if an access is
> > attempted for which the only problems were reference and/or change
> > bits needing to be set or insufficient access authority.
> >
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> > ---
> > arch/powerpc/mm/pgtable-book3s64.c | 1 -
> > arch/powerpc/mm/pgtable.c | 3 ++-
> > 2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> > index 518518fb7c45..6e991eaccab4 100644
> > --- a/arch/powerpc/mm/pgtable-book3s64.c
> > +++ b/arch/powerpc/mm/pgtable-book3s64.c
> > @@ -40,7 +40,6 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
> > if (changed) {
> > __ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
> > pmd_pte(entry), address);
> > - flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
>
> The comment states that this can be used for missing execution
> permissions as well. I am not convinced we can skip a flush in those
> cases
Why not? Execute is part of the access authority. And they're already no
ops on hash. What am I missing?
Thanks,
Nick
^ permalink raw reply
* Re: [PATCH 1/2] powerpc/64s/radix: do not flush TLB when relaxing access
From: Balbir Singh @ 2018-05-09 7:07 UTC (permalink / raw)
To: Nicholas Piggin
Cc: open list:LINUX FOR POWERPC (32-BIT AND 64-BIT), Alistair Popple
In-Reply-To: <20180509065152.14213-2-npiggin@gmail.com>
On Wed, May 9, 2018 at 4:51 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> Radix flushes the TLB when updating ptes to increase permissiveness
> of protection (increase access authority). Book3S does not require
> TLB flushing in this case, and it is not done on hash. This patch
> avoids the flush for radix.
>
> From Power ISA v3.0B, p.1090:
>
> Setting a Reference or Change Bit or Upgrading Access Authority
> (PTE Subject to Atomic Hardware Updates)
>
> If the only change being made to a valid PTE that is subject to
> atomic hardware updates is to set the Reference or Change bit to 1
> or to add access authorities, a simpler sequence suffices because
> the translation hardware will refetch the PTE if an access is
> attempted for which the only problems were reference and/or change
> bits needing to be set or insufficient access authority.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> arch/powerpc/mm/pgtable-book3s64.c | 1 -
> arch/powerpc/mm/pgtable.c | 3 ++-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
> index 518518fb7c45..6e991eaccab4 100644
> --- a/arch/powerpc/mm/pgtable-book3s64.c
> +++ b/arch/powerpc/mm/pgtable-book3s64.c
> @@ -40,7 +40,6 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
> if (changed) {
> __ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
> pmd_pte(entry), address);
> - flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
The comment states that this can be used for missing execution
permissions as well. I am not convinced we can skip a flush in those
cases
> }
> return changed;
> }
> diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
> index 9f361ae571e9..5b07a626df5b 100644
> --- a/arch/powerpc/mm/pgtable.c
> +++ b/arch/powerpc/mm/pgtable.c
> @@ -224,7 +224,8 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
> if (!is_vm_hugetlb_page(vma))
> assert_pte_locked(vma->vm_mm, address);
> __ptep_set_access_flags(vma->vm_mm, ptep, entry, address);
> - flush_tlb_page(vma, address);
> + if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
> + flush_tlb_page(vma, address);
Same as above
Balbir Singh.
^ permalink raw reply
* [PATCH] powerpc/64s/radix: reset mm_cpumask for single thread process when possible
From: Nicholas Piggin @ 2018-05-09 6:56 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin
When a single-threaded process has a non-local mm_cpumask and requires
a full PID tlbie invalidation, use that as an opportunity to reset the
cpumask back to the current CPU we're running on.
No other thread can concurrently switch to this mm, because it must
have had a reference on mm_users before it could use_mm. mm_users can
be asynchronously incremented e.g., by mmget_not_zero, but those users
must not be doing use_mm.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/mmu_context.h | 19 +++++++++
arch/powerpc/include/asm/tlb.h | 8 ++++
arch/powerpc/mm/tlb-radix.c | 57 +++++++++++++++++++-------
3 files changed, 70 insertions(+), 14 deletions(-)
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 1835ca1505d6..df12a994529f 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -6,6 +6,7 @@
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/sched.h>
+#include <linux/sched/mm.h>
#include <linux/spinlock.h>
#include <asm/mmu.h>
#include <asm/cputable.h>
@@ -201,6 +202,24 @@ static inline void activate_mm(struct mm_struct *prev, struct mm_struct *next)
static inline void enter_lazy_tlb(struct mm_struct *mm,
struct task_struct *tsk)
{
+#ifdef CONFIG_PPC_BOOK3S_64
+ /*
+ * Under radix, we do not want to keep lazy PIDs around because
+ * even if the CPU does not access userspace, it can still bring
+ * in translations through speculation and prefetching.
+ *
+ * Switching away here allows us to trim back the mm_cpumask in
+ * cases where we know the process is not running on some CPUs
+ * (see mm/tlb-radix.c).
+ */
+ if (radix_enabled() && mm != &init_mm) {
+ mmgrab(&init_mm);
+ tsk->active_mm = &init_mm;
+ switch_mm_irqs_off(mm, tsk->active_mm, tsk);
+ mmdrop(mm);
+ }
+#endif
+
/* 64-bit Book3E keeps track of current PGD in the PACA */
#ifdef CONFIG_PPC_BOOK3E_64
get_paca()->pgd = NULL;
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index a7eabff27a0f..006fce98c403 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -76,6 +76,14 @@ static inline int mm_is_thread_local(struct mm_struct *mm)
return false;
return cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm));
}
+static inline void mm_reset_thread_local(struct mm_struct *mm)
+{
+ WARN_ON(atomic_read(&mm->context.copros) > 0);
+ WARN_ON(!(atomic_read(&mm->mm_users) == 1 && current->mm == mm));
+ atomic_set(&mm->context.active_cpus, 1);
+ cpumask_clear(mm_cpumask(mm));
+ cpumask_set_cpu(smp_processor_id(), mm_cpumask(mm));
+}
#else /* CONFIG_PPC_BOOK3S_64 */
static inline int mm_is_thread_local(struct mm_struct *mm)
{
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 5ac3206c51cc..d5593a78702a 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -504,6 +504,15 @@ void radix__local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmadd
}
EXPORT_SYMBOL(radix__local_flush_tlb_page);
+static bool mm_is_singlethreaded(struct mm_struct *mm)
+{
+ if (atomic_read(&mm->context.copros) > 0)
+ return false;
+ if (atomic_read(&mm->mm_users) == 1 && current->mm == mm)
+ return true;
+ return false;
+}
+
static bool mm_needs_flush_escalation(struct mm_struct *mm)
{
/*
@@ -511,7 +520,9 @@ static bool mm_needs_flush_escalation(struct mm_struct *mm)
* caching PTEs and not flushing them properly when
* RIC = 0 for a PID/LPID invalidate
*/
- return atomic_read(&mm->context.copros) != 0;
+ if (atomic_read(&mm->context.copros) > 0)
+ return true;
+ return false;
}
#ifdef CONFIG_SMP
@@ -525,12 +536,17 @@ void radix__flush_tlb_mm(struct mm_struct *mm)
preempt_disable();
if (!mm_is_thread_local(mm)) {
- if (mm_needs_flush_escalation(mm))
+ if (mm_is_singlethreaded(mm)) {
_tlbie_pid(pid, RIC_FLUSH_ALL);
- else
+ mm_reset_thread_local(mm);
+ } else if (mm_needs_flush_escalation(mm)) {
+ _tlbie_pid(pid, RIC_FLUSH_ALL);
+ } else {
_tlbie_pid(pid, RIC_FLUSH_TLB);
- } else
+ }
+ } else {
_tlbiel_pid(pid, RIC_FLUSH_TLB);
+ }
preempt_enable();
}
EXPORT_SYMBOL(radix__flush_tlb_mm);
@@ -544,10 +560,13 @@ void radix__flush_all_mm(struct mm_struct *mm)
return;
preempt_disable();
- if (!mm_is_thread_local(mm))
+ if (!mm_is_thread_local(mm)) {
_tlbie_pid(pid, RIC_FLUSH_ALL);
- else
+ if (mm_is_singlethreaded(mm))
+ mm_reset_thread_local(mm);
+ } else {
_tlbiel_pid(pid, RIC_FLUSH_ALL);
+ }
preempt_enable();
}
EXPORT_SYMBOL(radix__flush_all_mm);
@@ -644,10 +663,14 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
if (local) {
_tlbiel_pid(pid, RIC_FLUSH_TLB);
} else {
- if (mm_needs_flush_escalation(mm))
+ if (mm_is_singlethreaded(mm)) {
+ _tlbie_pid(pid, RIC_FLUSH_ALL);
+ mm_reset_thread_local(mm);
+ } else if (mm_needs_flush_escalation(mm)) {
_tlbie_pid(pid, RIC_FLUSH_ALL);
- else
+ } else {
_tlbie_pid(pid, RIC_FLUSH_TLB);
+ }
}
} else {
bool hflush = false;
@@ -802,13 +825,19 @@ static inline void __radix__flush_tlb_range_psize(struct mm_struct *mm,
}
if (full) {
- if (!local && mm_needs_flush_escalation(mm))
- also_pwc = true;
-
- if (local)
+ if (local) {
_tlbiel_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
- else
- _tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL: RIC_FLUSH_TLB);
+ } else {
+ if (mm_is_singlethreaded(mm)) {
+ _tlbie_pid(pid, RIC_FLUSH_ALL);
+ mm_reset_thread_local(mm);
+ } else {
+ if (mm_needs_flush_escalation(mm))
+ also_pwc = true;
+
+ _tlbie_pid(pid, also_pwc ? RIC_FLUSH_ALL : RIC_FLUSH_TLB);
+ }
+ }
} else {
if (local)
_tlbiel_va_range(start, end, pid, page_size, psize, also_pwc);
--
2.17.0
^ permalink raw reply related
* [PATCH 2/2] powerpc/64s/radix: do not flush TLB on spurious fault
From: Nicholas Piggin @ 2018-05-09 6:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Alistair Popple
In-Reply-To: <20180509065152.14213-1-npiggin@gmail.com>
In the case of a spurious fault (which can happen due to a race with
another thread that changes the page table), the default Linux mm code
calls flush_tlb_page for that address. This is not required because
the pte will be re-fetched. Hash does not wire this up to a hardware
TLB flush for this reason. This patch avoids the flush for radix.
>From Power ISA v3.0B, p.1090:
Setting a Reference or Change Bit or Upgrading Access Authority
(PTE Subject to Atomic Hardware Updates)
If the only change being made to a valid PTE that is subject to
atomic hardware updates is to set the Refer- ence or Change bit to
1 or to add access authorities, a simpler sequence suffices
because the translation hardware will refetch the PTE if an access
is attempted for which the only problems were reference and/or
change bits needing to be set or insufficient access authority.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/include/asm/book3s/64/tlbflush.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index 0cac17253513..9c43431e01c5 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -137,6 +137,13 @@ static inline void flush_all_mm(struct mm_struct *mm)
#define flush_tlb_page(vma, addr) local_flush_tlb_page(vma, addr)
#define flush_all_mm(mm) local_flush_all_mm(mm)
#endif /* CONFIG_SMP */
+
+#define flush_tlb_fix_spurious_fault flush_tlb_fix_spurious_fault
+static inline void flush_tlb_fix_spurious_fault(struct vm_area_struct *vma,
+ unsigned long address)
+{
+}
+
/*
* flush the page walk cache for the address
*/
--
2.17.0
^ permalink raw reply related
* [PATCH 1/2] powerpc/64s/radix: do not flush TLB when relaxing access
From: Nicholas Piggin @ 2018-05-09 6:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Alistair Popple
In-Reply-To: <20180509065152.14213-1-npiggin@gmail.com>
Radix flushes the TLB when updating ptes to increase permissiveness
of protection (increase access authority). Book3S does not require
TLB flushing in this case, and it is not done on hash. This patch
avoids the flush for radix.
>From Power ISA v3.0B, p.1090:
Setting a Reference or Change Bit or Upgrading Access Authority
(PTE Subject to Atomic Hardware Updates)
If the only change being made to a valid PTE that is subject to
atomic hardware updates is to set the Reference or Change bit to 1
or to add access authorities, a simpler sequence suffices because
the translation hardware will refetch the PTE if an access is
attempted for which the only problems were reference and/or change
bits needing to be set or insufficient access authority.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
arch/powerpc/mm/pgtable-book3s64.c | 1 -
arch/powerpc/mm/pgtable.c | 3 ++-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/pgtable-book3s64.c b/arch/powerpc/mm/pgtable-book3s64.c
index 518518fb7c45..6e991eaccab4 100644
--- a/arch/powerpc/mm/pgtable-book3s64.c
+++ b/arch/powerpc/mm/pgtable-book3s64.c
@@ -40,7 +40,6 @@ int pmdp_set_access_flags(struct vm_area_struct *vma, unsigned long address,
if (changed) {
__ptep_set_access_flags(vma->vm_mm, pmdp_ptep(pmdp),
pmd_pte(entry), address);
- flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
}
return changed;
}
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index 9f361ae571e9..5b07a626df5b 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -224,7 +224,8 @@ int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
if (!is_vm_hugetlb_page(vma))
assert_pte_locked(vma->vm_mm, address);
__ptep_set_access_flags(vma->vm_mm, ptep, entry, address);
- flush_tlb_page(vma, address);
+ if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64))
+ flush_tlb_page(vma, address);
}
return changed;
}
--
2.17.0
^ permalink raw reply related
* [PATCH 0/2] powerpc/64s/radix: avoid unnecessary TLB flushes on fault
From: Nicholas Piggin @ 2018-05-09 6:51 UTC (permalink / raw)
To: linuxppc-dev; +Cc: Nicholas Piggin, Alistair Popple
These two patches make radix match hash and not flush the TLB after
fixing up faults unnecessarily.
There was some concern that accelerators need to have this flush, but
nothing is documented or commented, so it should just be removed. We
have a coprocessor count in the mm context now, and that can easily
be special cased if neccesary.
This and a few other changes reduce our broadcast tlbie rates by 10x
on a kernel compile benchmark, so it would be good to get it in.
Thanks,
Nick
Nicholas Piggin (2):
powerpc/64s/radix: do not flush TLB when relaxing access
powerpc/64s/radix: do not flush TLB on spurious fault
arch/powerpc/include/asm/book3s/64/tlbflush.h | 7 +++++++
arch/powerpc/mm/pgtable-book3s64.c | 1 -
arch/powerpc/mm/pgtable.c | 3 ++-
3 files changed, 9 insertions(+), 2 deletions(-)
--
2.17.0
^ permalink raw reply
* [PATCH v4 5/7] ocxl: Expose the thread_id needed for wait on POWER9
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
In order to successfully issue as_notify, an AFU needs to know the TID
to notify, which in turn means that this information should be
available in userspace so it can be communicated to the AFU.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
drivers/misc/ocxl/context.c | 5 +++-
drivers/misc/ocxl/file.c | 53 +++++++++++++++++++++++++++++++++++++++
drivers/misc/ocxl/link.c | 36 ++++++++++++++++++++++++++
drivers/misc/ocxl/ocxl_internal.h | 1 +
include/misc/ocxl.h | 9 +++++++
include/uapi/misc/ocxl.h | 10 ++++++++
6 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 909e8807824a..95f74623113e 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
mutex_init(&ctx->xsl_error_lock);
mutex_init(&ctx->irq_lock);
idr_init(&ctx->irq_idr);
+ ctx->tidr = 0;
+
/*
* Keep a reference on the AFU to make sure it's valid for the
* duration of the life of the context
@@ -65,6 +67,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
{
int rc;
+ // Locks both status & tidr
mutex_lock(&ctx->status_mutex);
if (ctx->status != OPENED) {
rc = -EIO;
@@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
}
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
- current->mm->context.id, 0, amr, current->mm,
+ current->mm->context.id, ctx->tidr, amr, current->mm,
xsl_fault_error, ctx);
if (rc)
goto out;
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 038509e5d031..eb409a469f21 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -5,6 +5,8 @@
#include <linux/sched/signal.h>
#include <linux/uaccess.h>
#include <uapi/misc/ocxl.h>
+#include <asm/reg.h>
+#include <asm/switch_to.h>
#include "ocxl_internal.h"
@@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct ocxl_context *ctx,
return 0;
}
+#ifdef CONFIG_PPC64
+static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
+ struct ocxl_ioctl_p9_wait __user *uarg)
+{
+ struct ocxl_ioctl_p9_wait arg;
+
+ memset(&arg, 0, sizeof(arg));
+
+ if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
+ enum ocxl_context_status status;
+
+ // Locks both status & tidr
+ mutex_lock(&ctx->status_mutex);
+ if (!ctx->tidr) {
+ if (set_thread_tidr(current))
+ return -ENOENT;
+
+ ctx->tidr = current->thread.tidr;
+ }
+
+ status = ctx->status;
+ mutex_unlock(&ctx->status_mutex);
+
+ if (status == ATTACHED) {
+ int rc;
+ struct link *link = ctx->afu->fn->link;
+
+ rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
+ if (rc)
+ return rc;
+ }
+
+ arg.thread_id = ctx->tidr;
+ } else
+ return -ENOENT;
+
+ if (copy_to_user(uarg, &arg, sizeof(arg)))
+ return -EFAULT;
+
+ return 0;
+}
+#endif
+
#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
+ x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : \
"UNKNOWN")
static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -186,6 +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
(struct ocxl_ioctl_metadata __user *) args);
break;
+#ifdef CONFIG_PPC64
+ case OCXL_IOCTL_ENABLE_P9_WAIT:
+ rc = afu_ioctl_enable_p9_wait(ctx,
+ (struct ocxl_ioctl_p9_wait __user *) args);
+ break;
+#endif
+
default:
rc = -EINVAL;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 656e8610eec2..88876ae8f330 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
}
EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+{
+ struct link *link = (struct link *) link_handle;
+ struct spa *spa = link->spa;
+ struct ocxl_process_element *pe;
+ int pe_handle, rc;
+
+ if (pasid > SPA_PASID_MAX)
+ return -EINVAL;
+
+ pe_handle = pasid & SPA_PE_MASK;
+ pe = spa->spa_mem + pe_handle;
+
+ mutex_lock(&spa->spa_lock);
+
+ pe->tid = tid;
+
+ /*
+ * The barrier makes sure the PE is updated
+ * before we clear the NPU context cache below, so that the
+ * old PE cannot be reloaded erroneously.
+ */
+ mb();
+
+ /*
+ * hook to platform code
+ * On powerpc, the entry needs to be cleared from the context
+ * cache of the NPU.
+ */
+ rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
+ WARN_ON(rc);
+
+ mutex_unlock(&spa->spa_lock);
+ return rc;
+}
+
int ocxl_link_remove_pe(void *link_handle, int pasid)
{
struct link *link = (struct link *) link_handle;
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 5d421824afd9..a32f2151029f 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -77,6 +77,7 @@ struct ocxl_context {
struct ocxl_xsl_error xsl_error;
struct mutex irq_lock;
struct idr irq_idr;
+ u16 tidr; // Thread ID used for P9 wait implementation
};
struct ocxl_process_element {
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 51ccf76db293..9ff6ddc28e22 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data);
+/**
+ * Update values within a Process Element
+ *
+ * link_handle: the link handle associated with the process element
+ * pasid: the PASID for the AFU context
+ * tid: the new thread id for the process element
+ */
+extern int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+
/*
* Remove a Process Element from the Shared Process Area for a link
*/
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 0af83d80fb3e..8d2748e69c84 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -48,6 +48,15 @@ struct ocxl_ioctl_metadata {
__u64 reserved[13]; // Total of 16*u64
};
+struct ocxl_ioctl_p9_wait {
+ __u16 thread_id; // The thread ID required to wake this thread
+ __u16 reserved1;
+ __u32 reserved2;
+ __u64 reserved3[3];
+};
+
+};
+
struct ocxl_ioctl_irq_fd {
__u64 irq_offset;
__s32 eventfd;
@@ -62,5 +71,6 @@ struct ocxl_ioctl_irq_fd {
#define OCXL_IOCTL_IRQ_FREE _IOW(OCXL_MAGIC, 0x12, __u64)
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
#define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
+#define OCXL_IOCTL_ENABLE_P9_WAIT _IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
#endif /* _UAPI_MISC_OCXL_H */
--
2.14.3
^ permalink raw reply related
* [PATCH v4 7/7] ocxl: Document new OCXL IOCTLs
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
Documentation/accelerators/ocxl.rst | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/Documentation/accelerators/ocxl.rst b/Documentation/accelerators/ocxl.rst
index ddcc58d01cfb..14cefc020e2d 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -157,6 +157,17 @@ OCXL_IOCTL_GET_METADATA:
Obtains configuration information from the card, such at the size of
MMIO areas, the AFU version, and the PASID for the current context.
+OCXL_IOCTL_ENABLE_P9_WAIT:
+
+ Allows the AFU to wake a userspace thread executing 'wait'. Returns
+ information to userspace to allow it to configure the AFU. Note that
+ this is only available on POWER9.
+
+OCXL_IOCTL_GET_FEATURES:
+
+ Reports on which CPU features that affect OpenCAPI are usable from
+ userspace.
+
mmap
----
--
2.14.3
^ permalink raw reply related
* [PATCH v4 3/7] powerpc: use task_pid_nr() for TID allocation
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
The current implementation of TID allocation, using a global IDR, may
result in an errant process starving the system of available TIDs.
Instead, use task_pid_nr(), as mentioned by the original author. The
scenario described which prevented it's use is not applicable, as
set_thread_tidr can only be called after the task struct has been
populated.
In the unlikely event that 2 threads share the TID and are waiting,
all potential outcomes have been determined safe.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
arch/powerpc/include/asm/switch_to.h | 1 -
arch/powerpc/kernel/process.c | 122 ++++++++---------------------------
2 files changed, 28 insertions(+), 95 deletions(-)
diff --git a/arch/powerpc/include/asm/switch_to.h b/arch/powerpc/include/asm/switch_to.h
index be8c9fa23983..5b03d8a82409 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -94,6 +94,5 @@ static inline void clear_task_ebb(struct task_struct *t)
extern int set_thread_uses_vas(void);
extern int set_thread_tidr(struct task_struct *t);
-extern void clear_thread_tidr(struct task_struct *t);
#endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 3b00da47699b..c5b8e53acbae 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1496,103 +1496,41 @@ int set_thread_uses_vas(void)
}
#ifdef CONFIG_PPC64
-static DEFINE_SPINLOCK(vas_thread_id_lock);
-static DEFINE_IDA(vas_thread_ida);
-
-/*
- * We need to assign a unique thread id to each thread in a process.
+/**
+ * Assign a TIDR (thread ID) for task @t and set it in the thread
+ * structure. For now, we only support setting TIDR for 'current' task.
*
- * This thread id, referred to as TIDR, and separate from the Linux's tgid,
- * is intended to be used to direct an ASB_Notify from the hardware to the
- * thread, when a suitable event occurs in the system.
+ * Since the TID value is a truncated form of it PID, it is possible
+ * (but unlikely) for 2 threads to have the same TID. In the unlikely event
+ * that 2 threads share the same TID and are waiting, one of the following
+ * cases will happen:
*
- * One such event is a "paste" instruction in the context of Fast Thread
- * Wakeup (aka Core-to-core wake up in the Virtual Accelerator Switchboard
- * (VAS) in POWER9.
+ * 1. The correct thread is running, the wrong thread is not
+ * In this situation, the correct thread is woken and proceeds to pass it's
+ * condition check.
*
- * To get a unique TIDR per process we could simply reuse task_pid_nr() but
- * the problem is that task_pid_nr() is not yet available copy_thread() is
- * called. Fixing that would require changing more intrusive arch-neutral
- * code in code path in copy_process()?.
+ * 2. Neither threads are running
+ * In this situation, neither thread will be woken. When scheduled, the waiting
+ * threads will execute either a wait, which will return immediately, followed
+ * by a condition check, which will pass for the correct thread and fail
+ * for the wrong thread, or they will execute the condition check immediately.
*
- * Further, to assign unique TIDRs within each process, we need an atomic
- * field (or an IDR) in task_struct, which again intrudes into the arch-
- * neutral code. So try to assign globally unique TIDRs for now.
+ * 3. The wrong thread is running, the correct thread is not
+ * The wrong thread will be woken, but will fail it's condition check and
+ * re-execute wait. The correct thread, when scheduled, will execute either
+ * it's condition check (which will pass), or wait, which returns immediately
+ * when called the first time after the thread is scheduled, followed by it's
+ * condition check (which will pass).
*
- * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
- * For now, only threads that expect to be notified by the VAS
- * hardware need a TIDR value and we assign values > 0 for those.
- */
-#define MAX_THREAD_CONTEXT ((1 << 16) - 1)
-static int assign_thread_tidr(void)
-{
- int index;
- int err;
- unsigned long flags;
-
-again:
- if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
- return -ENOMEM;
-
- spin_lock_irqsave(&vas_thread_id_lock, flags);
- err = ida_get_new_above(&vas_thread_ida, 1, &index);
- spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-
- if (err == -EAGAIN)
- goto again;
- else if (err)
- return err;
-
- if (index > MAX_THREAD_CONTEXT) {
- spin_lock_irqsave(&vas_thread_id_lock, flags);
- ida_remove(&vas_thread_ida, index);
- spin_unlock_irqrestore(&vas_thread_id_lock, flags);
- return -ENOMEM;
- }
-
- return index;
-}
-
-static void free_thread_tidr(int id)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&vas_thread_id_lock, flags);
- ida_remove(&vas_thread_ida, id);
- spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-}
-
-/*
- * Clear any TIDR value assigned to this thread.
- */
-void clear_thread_tidr(struct task_struct *t)
-{
- if (!t->thread.tidr)
- return;
-
- if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
- WARN_ON_ONCE(1);
- return;
- }
-
- mtspr(SPRN_TIDR, 0);
- free_thread_tidr(t->thread.tidr);
- t->thread.tidr = 0;
-}
-
-void arch_release_task_struct(struct task_struct *t)
-{
- clear_thread_tidr(t);
-}
-
-/*
- * Assign a unique TIDR (thread id) for task @t and set it in the thread
- * structure. For now, we only support setting TIDR for 'current' task.
+ * 4. Both threads are running
+ * Both threads will be woken. The wrong thread will fail it's condition check
+ * and execute another wait, while the correct thread will pass it's condition
+ * check.
+ *
+ * @t: the task to set the thread ID for
*/
int set_thread_tidr(struct task_struct *t)
{
- int rc;
-
if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
@@ -1602,11 +1540,7 @@ int set_thread_tidr(struct task_struct *t)
if (t->thread.tidr)
return 0;
- rc = assign_thread_tidr();
- if (rc < 0)
- return rc;
-
- t->thread.tidr = rc;
+ t->thread.tidr = (u16)task_pid_nr(t);
mtspr(SPRN_TIDR, t->thread.tidr);
return 0;
--
2.14.3
^ permalink raw reply related
* [PATCH v4 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
Switch the use of TIDR on it's CPU feature, rather than assuming it
is available based on architecture.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
arch/powerpc/kernel/process.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1237f13fed51..3b00da47699b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1154,7 +1154,7 @@ static inline void restore_sprs(struct thread_struct *old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
- if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+ if (cpu_has_feature(CPU_FTR_P9_TIDR) &&
old_thread->tidr != new_thread->tidr)
mtspr(SPRN_TIDR, new_thread->tidr);
#endif
@@ -1570,7 +1570,7 @@ void clear_thread_tidr(struct task_struct *t)
if (!t->thread.tidr)
return;
- if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+ if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
WARN_ON_ONCE(1);
return;
}
@@ -1593,7 +1593,7 @@ int set_thread_tidr(struct task_struct *t)
{
int rc;
- if (!cpu_has_feature(CPU_FTR_ARCH_300))
+ if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
if (t != current)
--
2.14.3
^ permalink raw reply related
* [PATCH v4 6/7] ocxl: Add an IOCTL so userspace knows what OCXL features are available
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
In order for a userspace AFU driver to call the POWER9 specific
OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
make that call.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
drivers/misc/ocxl/file.c | 25 +++++++++++++++++++++++++
include/uapi/misc/ocxl.h | 4 ++++
2 files changed, 29 insertions(+)
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index eb409a469f21..33ae46ce0a8a 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -168,12 +168,32 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
}
#endif
+
+static long afu_ioctl_get_features(struct ocxl_context *ctx,
+ struct ocxl_ioctl_features __user *uarg)
+{
+ struct ocxl_ioctl_features arg;
+
+ memset(&arg, 0, sizeof(arg));
+
+#ifdef CONFIG_PPC64
+ if (cpu_has_feature(CPU_FTR_P9_TIDR))
+ arg.flags[0] |= OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT;
+#endif
+
+ if (copy_to_user(uarg, &arg, sizeof(arg)))
+ return -EFAULT;
+
+ return 0;
+}
+
#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : \
+ x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" : \
"UNKNOWN")
static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -239,6 +259,11 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
break;
#endif
+ case OCXL_IOCTL_GET_FEATURES:
+ rc = afu_ioctl_get_features(ctx,
+ (struct ocxl_ioctl_features __user *) args);
+ break;
+
default:
rc = -EINVAL;
}
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 8d2748e69c84..bb80f294b429 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -55,6 +55,9 @@ struct ocxl_ioctl_p9_wait {
__u64 reserved3[3];
};
+#define OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT 0x01
+struct ocxl_ioctl_features {
+ __u64 flags[4];
};
struct ocxl_ioctl_irq_fd {
@@ -72,5 +75,6 @@ struct ocxl_ioctl_irq_fd {
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
#define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
#define OCXL_IOCTL_ENABLE_P9_WAIT _IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
+#define OCXL_IOCTL_GET_FEATURES _IOR(OCXL_MAGIC, 0x16, struct ocxl_ioctl_platform)
#endif /* _UAPI_MISC_OCXL_H */
--
2.14.3
^ permalink raw reply related
* [PATCH v4 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
The function removes the process element from NPU cache.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
arch/powerpc/include/asm/pnv-ocxl.h | 2 +-
arch/powerpc/platforms/powernv/ocxl.c | 4 ++--
drivers/misc/ocxl/link.c | 2 +-
3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index f6945d3bc971..208b5503f4ed 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,7 +28,7 @@ extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void __iomem **dsisr,
extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
void **platform_data);
extern void pnv_ocxl_spa_release(void *platform_data);
-extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
extern void pnv_ocxl_free_xive_irq(u32 irq);
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index fa9b53af3c7b..8c65aacda9c8 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,7 +475,7 @@ void pnv_ocxl_spa_release(void *platform_data)
}
EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
-int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
+int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
{
struct spa_data *data = (struct spa_data *) platform_data;
int rc;
@@ -483,7 +483,7 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
rc = opal_npu_spa_clear_cache(data->phb_opal_id, data->bdfn, pe_handle);
return rc;
}
-EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
+EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
{
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index f30790582dc0..656e8610eec2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -599,7 +599,7 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
* On powerpc, the entry needs to be cleared from the context
* cache of the NPU.
*/
- rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
+ rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
WARN_ON(rc);
pe_data = radix_tree_delete(&spa->pe_tree, pe_handle);
--
2.14.3
^ permalink raw reply related
* [PATCH v4 1/7] powerpc: Add TIDR CPU feature for POWER9
From: Alastair D'Silva @ 2018-05-09 5:35 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509053506.9754-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
arch/powerpc/include/asm/cputable.h | 3 ++-
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/cputable.h b/arch/powerpc/include/asm/cputable.h
index 4e332f3531c5..54c4cbbe57b4 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -215,6 +215,7 @@ static inline void cpu_feature_keys_init(void) { }
#define CPU_FTR_P9_TM_HV_ASSIST LONG_ASM_CONST(0x0000100000000000)
#define CPU_FTR_P9_TM_XER_SO_BUG LONG_ASM_CONST(0x0000200000000000)
#define CPU_FTR_P9_TLBIE_BUG LONG_ASM_CONST(0x0000400000000000)
+#define CPU_FTR_P9_TIDR LONG_ASM_CONST(0x0000800000000000)
#ifndef __ASSEMBLY__
@@ -462,7 +463,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
- CPU_FTR_P9_TLBIE_BUG)
+ CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
#define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
(~CPU_FTR_SAO))
#define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 11a3a4fed3fb..10f8b7f55637 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -722,6 +722,7 @@ static __init void cpufeatures_cpu_quirks(void)
if ((version & 0xffff0000) == 0x004e0000) {
cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+ cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR;
}
}
--
2.14.3
^ permalink raw reply related
* [PATCH v4 0/7] ocxl: Implement Power9 as_notify/wait for OpenCAPI
From: Alastair D'Silva @ 2018-05-09 5:34 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509004212.4506-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
The Power 9 as_notify/wait feature provides a lower latency way to
signal a thread that work is complete. This series enables the use of
this feature from OpenCAPI adapters, as well as addressing a potential
starvation issue when allocating thread IDs.
Changelog:
v4:
Remove the "unique" statement from the set_thread_tidr function and
move the text explaining why it is safe from the commit message
to the function description
v3:
Fix references to POWER9
Remove stray whitespace edit from docs
Add more details to commit message for "use task_pid_nr()"
Retitle patch 6 to indicate OCXL rather than CPU features
v2:
Rename get_platform IOCTL to get_features
Move stray edit from patch 1 to patch 3
Alastair D'Silva (7):
powerpc: Add TIDR CPU feature for POWER9
powerpc: Use TIDR CPU feature to control TIDR allocation
powerpc: use task_pid_nr() for TID allocation
ocxl: Rename pnv_ocxl_spa_remove_pe to clarify it's action
ocxl: Expose the thread_id needed for wait on POWER9
ocxl: Add an IOCTL so userspace knows what OCXL features are available
ocxl: Document new OCXL IOCTLs
Documentation/accelerators/ocxl.rst | 11 ++++
arch/powerpc/include/asm/cputable.h | 3 +-
arch/powerpc/include/asm/pnv-ocxl.h | 2 +-
arch/powerpc/include/asm/switch_to.h | 1 -
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/kernel/process.c | 101 +---------------------------------
arch/powerpc/platforms/powernv/ocxl.c | 4 +-
drivers/misc/ocxl/context.c | 5 +-
drivers/misc/ocxl/file.c | 78 ++++++++++++++++++++++++++
drivers/misc/ocxl/link.c | 38 ++++++++++++-
drivers/misc/ocxl/ocxl_internal.h | 1 +
include/misc/ocxl.h | 9 +++
include/uapi/misc/ocxl.h | 14 +++++
13 files changed, 163 insertions(+), 105 deletions(-)
--
2.14.3
^ permalink raw reply
* [PATCH v2 5/9] powerpc/mm/radix: implement LPID based TLB flushes to be used by KVM
From: Nicholas Piggin @ 2018-05-09 2:20 UTC (permalink / raw)
To: kvm-ppc; +Cc: Nicholas Piggin, linuxppc-dev
In-Reply-To: <20180509022022.21226-1-npiggin@gmail.com>
Implement a local TLB flush for invalidating an LPID with variants for
process or partition scope. And a global TLB flush for invalidating
a partition scoped page of an LPID.
These will be used by KVM in subsequent patches.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
.../include/asm/book3s/64/tlbflush-radix.h | 7 +
arch/powerpc/mm/tlb-radix.c | 207 ++++++++++++++++++
2 files changed, 214 insertions(+)
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 19b45ba6caf9..ef5c3f2994c9 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -51,4 +51,11 @@ extern void radix__flush_tlb_all(void);
extern void radix__flush_tlb_pte_p9_dd1(unsigned long old_pte, struct mm_struct *mm,
unsigned long address);
+extern void radix__flush_tlb_lpid_page(unsigned int lpid,
+ unsigned long addr,
+ unsigned long page_size);
+extern void radix__flush_pwc_lpid(unsigned int lpid);
+extern void radix__local_flush_tlb_lpid(unsigned int lpid);
+extern void radix__local_flush_tlb_lpid_guest(unsigned int lpid);
+
#endif
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index a5d7309c2d05..5ac3206c51cc 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -118,6 +118,53 @@ static inline void __tlbie_pid(unsigned long pid, unsigned long ric)
trace_tlbie(0, 0, rb, rs, ric, prs, r);
}
+static inline void __tlbiel_lpid(unsigned long lpid, int set,
+ unsigned long ric)
+{
+ unsigned long rb,rs,prs,r;
+
+ rb = PPC_BIT(52); /* IS = 2 */
+ rb |= set << PPC_BITLSHIFT(51);
+ rs = 0; /* LPID comes from LPIDR */
+ prs = 0; /* partition scoped */
+ r = 1; /* radix format */
+
+ asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
+ : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+ trace_tlbie(lpid, 1, rb, rs, ric, prs, r);
+}
+
+static inline void __tlbie_lpid(unsigned long lpid, unsigned long ric)
+{
+ unsigned long rb,rs,prs,r;
+
+ rb = PPC_BIT(52); /* IS = 2 */
+ rs = lpid;
+ prs = 0; /* partition scoped */
+ r = 1; /* radix format */
+
+ asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+ : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+ trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
+}
+
+static inline void __tlbiel_lpid_guest(unsigned long lpid, int set,
+ unsigned long ric)
+{
+ unsigned long rb,rs,prs,r;
+
+ rb = PPC_BIT(52); /* IS = 2 */
+ rb |= set << PPC_BITLSHIFT(51);
+ rs = 0; /* LPID comes from LPIDR */
+ prs = 1; /* process scoped */
+ r = 1; /* radix format */
+
+ asm volatile(PPC_TLBIEL(%0, %4, %3, %2, %1)
+ : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+ trace_tlbie(lpid, 1, rb, rs, ric, prs, r);
+}
+
+
static inline void __tlbiel_va(unsigned long va, unsigned long pid,
unsigned long ap, unsigned long ric)
{
@@ -150,6 +197,22 @@ static inline void __tlbie_va(unsigned long va, unsigned long pid,
trace_tlbie(0, 0, rb, rs, ric, prs, r);
}
+static inline void __tlbie_lpid_va(unsigned long va, unsigned long lpid,
+ unsigned long ap, unsigned long ric)
+{
+ unsigned long rb,rs,prs,r;
+
+ rb = va & ~(PPC_BITMASK(52, 63));
+ rb |= ap << PPC_BITLSHIFT(58);
+ rs = lpid;
+ prs = 0; /* partition scoped */
+ r = 1; /* radix format */
+
+ asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+ : : "r"(rb), "i"(r), "i"(prs), "i"(ric), "r"(rs) : "memory");
+ trace_tlbie(lpid, 0, rb, rs, ric, prs, r);
+}
+
static inline void fixup_tlbie(void)
{
unsigned long pid = 0;
@@ -161,6 +224,16 @@ static inline void fixup_tlbie(void)
}
}
+static inline void fixup_tlbie_lpid(unsigned long lpid)
+{
+ unsigned long va = ((1UL << 52) - 1);
+
+ if (cpu_has_feature(CPU_FTR_P9_TLBIE_BUG)) {
+ asm volatile("ptesync": : :"memory");
+ __tlbie_lpid_va(va, lpid, mmu_get_ap(MMU_PAGE_64K), RIC_FLUSH_TLB);
+ }
+}
+
/*
* We use 128 set in radix mode and 256 set in hpt mode.
*/
@@ -214,6 +287,86 @@ static inline void _tlbie_pid(unsigned long pid, unsigned long ric)
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
+static inline void _tlbiel_lpid(unsigned long lpid, unsigned long ric)
+{
+ int set;
+
+ VM_BUG_ON(mfspr(SPRN_LPID) != lpid);
+
+ asm volatile("ptesync": : :"memory");
+
+ /*
+ * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
+ * also flush the entire Page Walk Cache.
+ */
+ __tlbiel_lpid(lpid, 0, ric);
+
+ /* For PWC, only one flush is needed */
+ if (ric == RIC_FLUSH_PWC) {
+ asm volatile("ptesync": : :"memory");
+ return;
+ }
+
+ /* For the remaining sets, just flush the TLB */
+ for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
+ __tlbiel_lpid(lpid, set, RIC_FLUSH_TLB);
+
+ asm volatile("ptesync": : :"memory");
+ asm volatile(PPC_INVALIDATE_ERAT "; isync" : : :"memory");
+}
+
+static inline void _tlbie_lpid(unsigned long lpid, unsigned long ric)
+{
+ asm volatile("ptesync": : :"memory");
+
+ /*
+ * Workaround the fact that the "ric" argument to __tlbie_pid
+ * must be a compile-time contraint to match the "i" constraint
+ * in the asm statement.
+ */
+ switch (ric) {
+ case RIC_FLUSH_TLB:
+ __tlbie_lpid(lpid, RIC_FLUSH_TLB);
+ break;
+ case RIC_FLUSH_PWC:
+ __tlbie_lpid(lpid, RIC_FLUSH_PWC);
+ break;
+ case RIC_FLUSH_ALL:
+ default:
+ __tlbie_lpid(lpid, RIC_FLUSH_ALL);
+ }
+ fixup_tlbie_lpid(lpid);
+ asm volatile("eieio; tlbsync; ptesync": : :"memory");
+}
+
+static inline void _tlbiel_lpid_guest(unsigned long lpid, unsigned long ric)
+{
+ int set;
+
+ VM_BUG_ON(mfspr(SPRN_LPID) != lpid);
+
+ asm volatile("ptesync": : :"memory");
+
+ /*
+ * Flush the first set of the TLB, and if we're doing a RIC_FLUSH_ALL,
+ * also flush the entire Page Walk Cache.
+ */
+ __tlbiel_lpid_guest(lpid, 0, ric);
+
+ /* For PWC, only one flush is needed */
+ if (ric == RIC_FLUSH_PWC) {
+ asm volatile("ptesync": : :"memory");
+ return;
+ }
+
+ /* For the remaining sets, just flush the TLB */
+ for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
+ __tlbiel_lpid_guest(lpid, set, RIC_FLUSH_TLB);
+
+ asm volatile("ptesync": : :"memory");
+}
+
+
static inline void __tlbiel_va_range(unsigned long start, unsigned long end,
unsigned long pid, unsigned long page_size,
unsigned long psize)
@@ -268,6 +421,17 @@ static inline void _tlbie_va(unsigned long va, unsigned long pid,
asm volatile("eieio; tlbsync; ptesync": : :"memory");
}
+static inline void _tlbie_lpid_va(unsigned long va, unsigned long lpid,
+ unsigned long psize, unsigned long ric)
+{
+ unsigned long ap = mmu_get_ap(psize);
+
+ asm volatile("ptesync": : :"memory");
+ __tlbie_lpid_va(va, lpid, ap, ric);
+ fixup_tlbie_lpid(lpid);
+ asm volatile("eieio; tlbsync; ptesync": : :"memory");
+}
+
static inline void _tlbie_va_range(unsigned long start, unsigned long end,
unsigned long pid, unsigned long page_size,
unsigned long psize, bool also_pwc)
@@ -534,6 +698,49 @@ static int radix_get_mmu_psize(int page_size)
return psize;
}
+/*
+ * Flush partition scoped LPID address translation for all CPUs.
+ */
+void radix__flush_tlb_lpid_page(unsigned int lpid,
+ unsigned long addr,
+ unsigned long page_size)
+{
+ int psize = radix_get_mmu_psize(page_size);
+
+ _tlbie_lpid_va(addr, lpid, psize, RIC_FLUSH_TLB);
+}
+EXPORT_SYMBOL_GPL(radix__flush_tlb_lpid_page);
+
+/*
+ * Flush partition scoped PWC from LPID for all CPUs.
+ */
+void radix__flush_pwc_lpid(unsigned int lpid)
+{
+ _tlbie_lpid(lpid, RIC_FLUSH_PWC);
+}
+EXPORT_SYMBOL_GPL(radix__flush_pwc_lpid);
+
+/*
+ * Flush partition scoped translations from LPID (=LPIDR)
+ */
+void radix__local_flush_tlb_lpid(unsigned int lpid)
+{
+ _tlbiel_lpid(lpid, RIC_FLUSH_ALL);
+}
+EXPORT_SYMBOL_GPL(radix__local_flush_tlb_lpid);
+
+/*
+ * Flush process scoped translations from LPID (=LPIDR).
+ * Important difference, the guest normally manages its own translations,
+ * but some cases e.g., vCPU CPU migration require KVM to flush.
+ */
+void radix__local_flush_tlb_lpid_guest(unsigned int lpid)
+{
+ _tlbiel_lpid_guest(lpid, RIC_FLUSH_ALL);
+}
+EXPORT_SYMBOL_GPL(radix__local_flush_tlb_lpid_guest);
+
+
static void radix__flush_tlb_pwc_range_psize(struct mm_struct *mm, unsigned long start,
unsigned long end, int psize);
--
2.17.0
^ permalink raw reply related
* Re: [PATCH 8/8] mm/pkeys, x86, powerpc: Display pkey in smaps if arch supports pkeys
From: Michael Ellerman @ 2018-05-09 1:57 UTC (permalink / raw)
To: Dave Hansen, linuxram; +Cc: mingo, linuxppc-dev, linux-mm, x86, linux-kernel
In-Reply-To: <cd58dbfe-55f9-bb9a-d907-0f92740a8c2e@intel.com>
Dave Hansen <dave.hansen@intel.com> writes:
> On 05/08/2018 07:59 AM, Michael Ellerman wrote:
>> Currently the architecture specific code is expected to display the
>> protection keys in smap for a given vma. This can lead to redundant
>> code and possibly to divergent formats in which the key gets
>> displayed.
>>
>> This patch changes the implementation. It displays the pkey only if
>> the architecture support pkeys, i.e arch_pkeys_enabled() returns true.
>
>
> For this, along with 6/8 and 7/8:
>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Thanks for reviewing them all.
cheers
^ permalink raw reply
* Re: [PATCH 5/8] x86/pkeys: Move vma_pkey() into asm/pkeys.h
From: Michael Ellerman @ 2018-05-09 1:56 UTC (permalink / raw)
To: Dave Hansen, linuxram; +Cc: mingo, linuxppc-dev, linux-mm, x86, linux-kernel
In-Reply-To: <f9240079-2ff2-9d3f-ba15-fefe65c67779@intel.com>
Dave Hansen <dave.hansen@intel.com> writes:
> On 05/08/2018 07:59 AM, Michael Ellerman wrote:
>> Move the last remaining pkey helper, vma_pkey() into asm/pkeys.h
>
> Fine with me, as long as it compiles. :)
Yeah fair point :)
It survived the kbuild robot, so fingers crossed. I'll let it sit in
linux-next for a while too.
cheers
From: kbuild test robot <lkp@intel.com> (Today 04:11)
Subject: [powerpc:topic/pkey] BUILD SUCCESS 1a03cd3b474084aaa967f646379495a14716632d
To: Michael Ellerman <mpe@ellerman.id.au>
Date: Wed, 09 May 2018 02:11:24 +0800
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git topic/pkey
branch HEAD: 1a03cd3b474084aaa967f646379495a14716632d mm/pkeys, x86, powerpc: Display pkey in smaps if arch supports pkeys
elapsed time: 194m
configs tested: 113
The following configs have been built successfully.
More configs may be tested in the coming days.
i386 tinyconfig
i386 randconfig-x010-201818
i386 randconfig-x011-201818
i386 randconfig-x012-201818
i386 randconfig-x013-201818
i386 randconfig-x014-201818
i386 randconfig-x015-201818
i386 randconfig-x016-201818
i386 randconfig-x017-201818
i386 randconfig-x018-201818
i386 randconfig-x019-201818
microblaze mmu_defconfig
microblaze nommu_defconfig
i386 randconfig-i1-201818
i386 randconfig-i0-201818
alpha defconfig
parisc allnoconfig
parisc b180_defconfig
parisc c3000_defconfig
parisc defconfig
x86_64 randconfig-x010-201818
x86_64 randconfig-x011-201818
x86_64 randconfig-x012-201818
x86_64 randconfig-x013-201818
x86_64 randconfig-x014-201818
x86_64 randconfig-x015-201818
x86_64 randconfig-x016-201818
x86_64 randconfig-x017-201818
x86_64 randconfig-x018-201818
x86_64 randconfig-x019-201818
x86_64 kexec
x86_64 federa-25
x86_64 rhel
x86_64 rhel-7.2
i386 randconfig-a0-201818
i386 randconfig-a1-201818
x86_64 randconfig-s0-05090041
x86_64 randconfig-s1-05090041
x86_64 randconfig-s2-05090041
openrisc or1ksim_defconfig
um i386_defconfig
um x86_64_defconfig
x86_64 acpi-redef
x86_64 allyesdebian
x86_64 nfsroot
m68k sun3_defconfig
m68k multi_defconfig
m68k m5475evb_defconfig
i386 randconfig-s1-201818
i386 randconfig-s0-201818
x86_64 randconfig-s3-05090001
x86_64 randconfig-s4-05090001
x86_64 randconfig-s5-05090001
x86_64 randconfig-i0-201818
i386 randconfig-b0-05090042
x86_64 randconfig-u0-05090045
c6x evmc6678_defconfig
h8300 h8300h-sim_defconfig
nios2 10m50_defconfig
xtensa common_defconfig
xtensa iss_defconfig
i386 allmodconfig
i386 randconfig-x008-201818
i386 randconfig-x006-201818
i386 randconfig-x000-201818
i386 randconfig-x004-201818
i386 randconfig-x009-201818
i386 randconfig-x003-201818
i386 randconfig-x001-201818
i386 randconfig-x002-201818
i386 randconfig-x005-201818
i386 randconfig-x007-201818
arm at91_dt_defconfig
arm allnoconfig
arm efm32_defconfig
arm64 defconfig
arm multi_v5_defconfig
arm sunxi_defconfig
arm64 allnoconfig
arm exynos_defconfig
arm shmobile_defconfig
arm multi_v7_defconfig
i386 randconfig-x073-201818
i386 randconfig-x075-201818
i386 randconfig-x079-201818
i386 randconfig-x071-201818
i386 randconfig-x076-201818
i386 randconfig-x074-201818
i386 randconfig-x078-201818
i386 randconfig-x077-201818
i386 randconfig-x070-201818
i386 randconfig-x072-201818
x86_64 randconfig-x002-201818
x86_64 randconfig-x006-201818
x86_64 randconfig-x005-201818
x86_64 randconfig-x001-201818
x86_64 randconfig-x009-201818
x86_64 randconfig-x004-201818
x86_64 randconfig-x003-201818
x86_64 randconfig-x007-201818
x86_64 randconfig-x000-201818
x86_64 randconfig-x008-201818
ia64 alldefconfig
ia64 allnoconfig
ia64 defconfig
i386 allnoconfig
i386 defconfig
i386 alldefconfig
mips jz4740
mips malta_kvm_defconfig
mips allnoconfig
mips fuloong2e_defconfig
mips txx9
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
^ permalink raw reply
* Re: [PATCH v3] ppc64le livepatch: implement reliable stacktrace for newer consistency models
From: Michael Ellerman @ 2018-05-09 1:41 UTC (permalink / raw)
To: Josh Poimboeuf, Torsten Duwe
Cc: Jiri Kosina, linuxppc-dev, linux-kernel, Nicholas Piggin,
live-patching
In-Reply-To: <20180509001432.5nd55ictwf5yi7sm@treble>
Josh Poimboeuf <jpoimboe@redhat.com> writes:
> On Tue, May 08, 2018 at 10:38:32AM +0200, Torsten Duwe wrote:
>> On Mon, 7 May 2018 10:42:08 -0500
>> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>
>> > The subject doesn't actively describe what the patch does, maybe
>> > change it to something like:
>> >
>> > powerpc: Add support for HAVE_RELIABLE_STACKTRACE
>> >
>> > or maybe
>> >
>> > powerpc: Add support for livepatch consistency model
>>
>> Maybe $SUBJECT? You're absolutely right, the old subject was just a
>> leftover of my original attempt to just set the flag, before Miroslav
>> corrected me. I just kept on copying it without a second thought. Thanks
>> for noting.
>
> Generally we refer to it as "the consistency model". So I would
> propose a minor edit:
>
> ppc64le/livepatch: Implement reliable stack tracing for the consistency model
We use "powerpc" as the prefix.
So I've used:
powerpc/livepatch: Implement reliable stack tracing for the consistency model
cheers
^ permalink raw reply
* [PATCH v3 5/7] ocxl: Expose the thread_id needed for wait on POWER9
From: Alastair D'Silva @ 2018-05-09 0:42 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509004212.4506-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
In order to successfully issue as_notify, an AFU needs to know the TID
to notify, which in turn means that this information should be
available in userspace so it can be communicated to the AFU.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
drivers/misc/ocxl/context.c | 5 +++-
drivers/misc/ocxl/file.c | 53 +++++++++++++++++++++++++++++++++++++++
drivers/misc/ocxl/link.c | 36 ++++++++++++++++++++++++++
drivers/misc/ocxl/ocxl_internal.h | 1 +
include/misc/ocxl.h | 9 +++++++
include/uapi/misc/ocxl.h | 10 ++++++++
6 files changed, 113 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 909e8807824a..95f74623113e 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct ocxl_afu *afu,
mutex_init(&ctx->xsl_error_lock);
mutex_init(&ctx->irq_lock);
idr_init(&ctx->irq_idr);
+ ctx->tidr = 0;
+
/*
* Keep a reference on the AFU to make sure it's valid for the
* duration of the life of the context
@@ -65,6 +67,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
{
int rc;
+ // Locks both status & tidr
mutex_lock(&ctx->status_mutex);
if (ctx->status != OPENED) {
rc = -EIO;
@@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
}
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
- current->mm->context.id, 0, amr, current->mm,
+ current->mm->context.id, ctx->tidr, amr, current->mm,
xsl_fault_error, ctx);
if (rc)
goto out;
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 038509e5d031..eb409a469f21 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -5,6 +5,8 @@
#include <linux/sched/signal.h>
#include <linux/uaccess.h>
#include <uapi/misc/ocxl.h>
+#include <asm/reg.h>
+#include <asm/switch_to.h>
#include "ocxl_internal.h"
@@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct ocxl_context *ctx,
return 0;
}
+#ifdef CONFIG_PPC64
+static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
+ struct ocxl_ioctl_p9_wait __user *uarg)
+{
+ struct ocxl_ioctl_p9_wait arg;
+
+ memset(&arg, 0, sizeof(arg));
+
+ if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
+ enum ocxl_context_status status;
+
+ // Locks both status & tidr
+ mutex_lock(&ctx->status_mutex);
+ if (!ctx->tidr) {
+ if (set_thread_tidr(current))
+ return -ENOENT;
+
+ ctx->tidr = current->thread.tidr;
+ }
+
+ status = ctx->status;
+ mutex_unlock(&ctx->status_mutex);
+
+ if (status == ATTACHED) {
+ int rc;
+ struct link *link = ctx->afu->fn->link;
+
+ rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
+ if (rc)
+ return rc;
+ }
+
+ arg.thread_id = ctx->tidr;
+ } else
+ return -ENOENT;
+
+ if (copy_to_user(uarg, &arg, sizeof(arg)))
+ return -EFAULT;
+
+ return 0;
+}
+#endif
+
#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
+ x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : \
"UNKNOWN")
static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -186,6 +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
(struct ocxl_ioctl_metadata __user *) args);
break;
+#ifdef CONFIG_PPC64
+ case OCXL_IOCTL_ENABLE_P9_WAIT:
+ rc = afu_ioctl_enable_p9_wait(ctx,
+ (struct ocxl_ioctl_p9_wait __user *) args);
+ break;
+#endif
+
default:
rc = -EINVAL;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 656e8610eec2..88876ae8f330 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
}
EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+{
+ struct link *link = (struct link *) link_handle;
+ struct spa *spa = link->spa;
+ struct ocxl_process_element *pe;
+ int pe_handle, rc;
+
+ if (pasid > SPA_PASID_MAX)
+ return -EINVAL;
+
+ pe_handle = pasid & SPA_PE_MASK;
+ pe = spa->spa_mem + pe_handle;
+
+ mutex_lock(&spa->spa_lock);
+
+ pe->tid = tid;
+
+ /*
+ * The barrier makes sure the PE is updated
+ * before we clear the NPU context cache below, so that the
+ * old PE cannot be reloaded erroneously.
+ */
+ mb();
+
+ /*
+ * hook to platform code
+ * On powerpc, the entry needs to be cleared from the context
+ * cache of the NPU.
+ */
+ rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
+ WARN_ON(rc);
+
+ mutex_unlock(&spa->spa_lock);
+ return rc;
+}
+
int ocxl_link_remove_pe(void *link_handle, int pasid)
{
struct link *link = (struct link *) link_handle;
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 5d421824afd9..a32f2151029f 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -77,6 +77,7 @@ struct ocxl_context {
struct ocxl_xsl_error xsl_error;
struct mutex irq_lock;
struct idr irq_idr;
+ u16 tidr; // Thread ID used for P9 wait implementation
};
struct ocxl_process_element {
diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 51ccf76db293..9ff6ddc28e22 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -188,6 +188,15 @@ extern int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data);
+/**
+ * Update values within a Process Element
+ *
+ * link_handle: the link handle associated with the process element
+ * pasid: the PASID for the AFU context
+ * tid: the new thread id for the process element
+ */
+extern int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+
/*
* Remove a Process Element from the Shared Process Area for a link
*/
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 0af83d80fb3e..8d2748e69c84 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -48,6 +48,15 @@ struct ocxl_ioctl_metadata {
__u64 reserved[13]; // Total of 16*u64
};
+struct ocxl_ioctl_p9_wait {
+ __u16 thread_id; // The thread ID required to wake this thread
+ __u16 reserved1;
+ __u32 reserved2;
+ __u64 reserved3[3];
+};
+
+};
+
struct ocxl_ioctl_irq_fd {
__u64 irq_offset;
__s32 eventfd;
@@ -62,5 +71,6 @@ struct ocxl_ioctl_irq_fd {
#define OCXL_IOCTL_IRQ_FREE _IOW(OCXL_MAGIC, 0x12, __u64)
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
#define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
+#define OCXL_IOCTL_ENABLE_P9_WAIT _IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
#endif /* _UAPI_MISC_OCXL_H */
--
2.14.3
^ permalink raw reply related
* [PATCH v3 6/7] ocxl: Add an IOCTL so userspace knows what OCXL features are available
From: Alastair D'Silva @ 2018-05-09 0:42 UTC (permalink / raw)
To: linuxppc-dev
Cc: linux-kernel, linux-doc, mikey, vaibhav, aneesh.kumar, malat,
felix, pombredanne, sukadev, npiggin, gregkh, arnd,
andrew.donnellan, fbarrat, corbet, Alastair D'Silva
In-Reply-To: <20180509004212.4506-1-alastair@au1.ibm.com>
From: Alastair D'Silva <alastair@d-silva.org>
In order for a userspace AFU driver to call the POWER9 specific
OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
make that call.
Signed-off-by: Alastair D'Silva <alastair@d-silva.org>
---
drivers/misc/ocxl/file.c | 25 +++++++++++++++++++++++++
include/uapi/misc/ocxl.h | 4 ++++
2 files changed, 29 insertions(+)
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index eb409a469f21..33ae46ce0a8a 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -168,12 +168,32 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
}
#endif
+
+static long afu_ioctl_get_features(struct ocxl_context *ctx,
+ struct ocxl_ioctl_features __user *uarg)
+{
+ struct ocxl_ioctl_features arg;
+
+ memset(&arg, 0, sizeof(arg));
+
+#ifdef CONFIG_PPC64
+ if (cpu_has_feature(CPU_FTR_P9_TIDR))
+ arg.flags[0] |= OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT;
+#endif
+
+ if (copy_to_user(uarg, &arg, sizeof(arg)))
+ return -EFAULT;
+
+ return 0;
+}
+
#define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" : \
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" : \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : \
+ x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" : \
"UNKNOWN")
static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -239,6 +259,11 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
break;
#endif
+ case OCXL_IOCTL_GET_FEATURES:
+ rc = afu_ioctl_get_features(ctx,
+ (struct ocxl_ioctl_features __user *) args);
+ break;
+
default:
rc = -EINVAL;
}
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 8d2748e69c84..bb80f294b429 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -55,6 +55,9 @@ struct ocxl_ioctl_p9_wait {
__u64 reserved3[3];
};
+#define OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT 0x01
+struct ocxl_ioctl_features {
+ __u64 flags[4];
};
struct ocxl_ioctl_irq_fd {
@@ -72,5 +75,6 @@ struct ocxl_ioctl_irq_fd {
#define OCXL_IOCTL_IRQ_SET_FD _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
#define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct ocxl_ioctl_metadata)
#define OCXL_IOCTL_ENABLE_P9_WAIT _IOR(OCXL_MAGIC, 0x15, struct ocxl_ioctl_p9_wait)
+#define OCXL_IOCTL_GET_FEATURES _IOR(OCXL_MAGIC, 0x16, struct ocxl_ioctl_platform)
#endif /* _UAPI_MISC_OCXL_H */
--
2.14.3
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox