* Re: [PATCH v2 1/1] powerpc/kvm: Save Timebase Offset to fix sched_clock() while running guest code.
From: Leonardo Bras @ 2021-02-05 20:35 UTC (permalink / raw)
To: Fabiano Rosas, Paul Mackerras, Michael Ellerman,
Benjamin Herrenschmidt, Christophe Leroy, Athira Rajeev,
Aneesh Kumar K.V, Jordan Niethe, Nicholas Piggin,
Frederic Weisbecker, Thomas Gleixner, Geert Uytterhoeven
Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <874kiqy82t.fsf@linux.ibm.com>
Hello Fabiano,
Thanks for reviewing!
(answers inline)
On Fri, 2021-02-05 at 10:09 -0300, Fabiano Rosas wrote:
> Leonardo Bras <leobras.c@gmail.com> writes:
>
> > Before guest entry, TBU40 register is changed to reflect guest timebase.
> > After exiting guest, the register is reverted to its original value.
> >
> > If one tries to get the timestamp from host between those changes, it
> > will present an incorrect value.
> >
> > An example would be trying to add a tracepoint in
> > kvmppc_guest_entry_inject_int(), which depending on last tracepoint
> > acquired could actually cause the host to crash.
> >
> > Save the Timebase Offset to PACA and use it on sched_clock() to always
> > get the correct timestamp.
> >
> > Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> > Suggested-by: Paul Mackerras <paulus@ozlabs.org>
> > ---
> > Changes since v1:
> > - Subtracts offset only when CONFIG_KVM_BOOK3S_HANDLER and
> > CONFIG_PPC_BOOK3S_64 are defined.
> > ---
> > arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
> > arch/powerpc/kernel/asm-offsets.c | 1 +
> > arch/powerpc/kernel/time.c | 8 +++++++-
> > arch/powerpc/kvm/book3s_hv.c | 2 ++
> > arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++
> > 5 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> > index 078f4648ea27..e2c12a10eed2 100644
> > --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> > +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> > @@ -131,6 +131,7 @@ struct kvmppc_host_state {
> > u64 cfar;
> > u64 ppr;
> > u64 host_fscr;
> > + u64 tb_offset; /* Timebase offset: keeps correct
> > timebase while on guest */
>
> Couldn't you use the vc->tb_offset_applied for this? We have a reference
> for the vcore in the hstate already.
But it's a pointer, which means we would have to keep checking for NULL
every time we need sched_clock().
Potentially it would cost a cache miss for the PACA memory region that
contains vc, and another for the part of *vc that contains
tb_offset_applied, instead of only one for the PACA struct region that
contains tb_offset.
On the other hand, it got me thinking: if the offset is applied per
CPU, why don't we keep this info only in the PACA, instead of in vc?
It could be a general way to apply an offset for any purpose and
still get sched_clock() right.
(Not that I have any idea what other purpose we could use it for.)
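To make the comparison concrete, here is a rough userspace sketch of the two lookups. The struct and function names (vcore_model, hstate_model, host_tb_via_vcore, host_tb_via_hstate) are hypothetical stand-ins, not the kernel's definitions; only the field names tb_offset and tb_offset_applied come from the patch and the review comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the vcore; only the relevant field is modelled. */
struct vcore_model {
	uint64_t tb_offset_applied;
};

/* Hypothetical stand-in for the per-CPU host state kept in the PACA. */
struct hstate_model {
	struct vcore_model *vc;	/* may be NULL when no vcore is loaded */
	uint64_t tb_offset;	/* 0 whenever the host timebase is in effect */
};

/* Variant A: follow the vcore pointer (NULL check plus a second load). */
static uint64_t host_tb_via_vcore(const struct hstate_model *hs, uint64_t raw_tb)
{
	assert(hs != NULL);
	if (hs->vc != NULL)
		return raw_tb - hs->vc->tb_offset_applied;
	return raw_tb;
}

/* Variant B: read the offset stored directly in the per-CPU state. */
static uint64_t host_tb_via_hstate(const struct hstate_model *hs, uint64_t raw_tb)
{
	assert(hs != NULL);
	return raw_tb - hs->tb_offset;
}
```

Both variants return the same host timebase as long as the two fields are kept in sync; variant B simply avoids the branch and the extra dereference.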
Best regards!
Leonardo Bras
>
> > #endif
> > };
> >
> > diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> > index b12d7c049bfe..0beb8fdc6352 100644
> > --- a/arch/powerpc/kernel/asm-offsets.c
> > +++ b/arch/powerpc/kernel/asm-offsets.c
> > @@ -706,6 +706,7 @@ int main(void)
> > HSTATE_FIELD(HSTATE_CFAR, cfar);
> > HSTATE_FIELD(HSTATE_PPR, ppr);
> > HSTATE_FIELD(HSTATE_HOST_FSCR, host_fscr);
> > + HSTATE_FIELD(HSTATE_TB_OFFSET, tb_offset);
> > #endif /* CONFIG_PPC_BOOK3S_64 */
> >
> > #else /* CONFIG_PPC_BOOK3S */
> > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> > index 67feb3524460..f27f0163792b 100644
> > --- a/arch/powerpc/kernel/time.c
> > +++ b/arch/powerpc/kernel/time.c
> > @@ -699,7 +699,13 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
> > */
> > notrace unsigned long long sched_clock(void)
> > {
> > - return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> > + u64 tb = get_tb() - boot_tb;
> > +
> > +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_HANDLER)
> > + tb -= local_paca->kvm_hstate.tb_offset;
> > +#endif
> > +
> > + return mulhdu(tb, tb_to_ns_scale) << tb_to_ns_shift;
> > }
> >
> >
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index b3731572295e..c08593c63353 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -3491,6 +3491,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> > if ((tb & 0xffffff) < (new_tb & 0xffffff))
> > mtspr(SPRN_TBU40, new_tb + 0x1000000);
> > vc->tb_offset_applied = vc->tb_offset;
> > + local_paca->kvm_hstate.tb_offset = vc->tb_offset;
> > }
> >
> > if (vc->pcr)
> > @@ -3594,6 +3595,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> > if ((tb & 0xffffff) < (new_tb & 0xffffff))
> > mtspr(SPRN_TBU40, new_tb + 0x1000000);
> > vc->tb_offset_applied = 0;
> > + local_paca->kvm_hstate.tb_offset = 0;
> > }
> >
> > mtspr(SPRN_HDEC, 0x7fffffff);
> > diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > index b73140607875..8f7a9f7f4ee6 100644
> > --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> > @@ -632,6 +632,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> > cmpdi r8,0
> > beq 37f
> > std r8, VCORE_TB_OFFSET_APPL(r5)
> > + std r8, HSTATE_TB_OFFSET(r13)
> > mftb r6 /* current host timebase */
> > add r8,r8,r6
> > mtspr SPRN_TBU40,r8 /* update upper 40 bits */
> > @@ -1907,6 +1908,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> > beq 17f
> > li r0, 0
> > std r0, VCORE_TB_OFFSET_APPL(r5)
> > + std r0, HSTATE_TB_OFFSET(r13)
> > mftb r6 /* current guest timebase */
> > subf r8,r8,r6
> > mtspr SPRN_TBU40,r8 /* update upper 40 bits */
* Re: [PATCH] powerpc/pseries/dlpar: handle ibm,configure-connector delay status
From: Tyrel Datwyler @ 2021-02-05 19:46 UTC (permalink / raw)
To: Nathan Lynch, linuxppc-dev; +Cc: brking
In-Reply-To: <20210107025900.410369-1-nathanl@linux.ibm.com>
On 1/6/21 6:59 PM, Nathan Lynch wrote:
> dlpar_configure_connector() has two problems in its handling of
> ibm,configure-connector's return status:
>
> 1. When the status is -2 (busy, call again), we call
> ibm,configure-connector again immediately without checking whether
> to schedule, which can result in monopolizing the CPU.
> 2. Extended delay status (9900..9905) goes completely unhandled,
> causing the configuration to unnecessarily terminate.
>
> Fix both of these issues by using rtas_busy_delay().
>
> Fixes: ab519a011caa ("powerpc/pseries: Kernel DLPAR Infrastructure")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
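
For readers unfamiliar with the status codes in the description above, here is a small userspace model of how a caller might classify them. It is a sketch, not the kernel's rtas_busy_delay(): the names delay_ms_for_status() and replay_statuses() are hypothetical, the 1 ms wait for the plain busy status is an arbitrary choice for the model, and the 10^x ms reading of status 990x is my understanding of the extended delay convention rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define STATUS_BUSY		(-2)	/* "busy, call again" */
#define STATUS_EXT_DELAY_MIN	9900
#define STATUS_EXT_DELAY_MAX	9905

/* Suggested wait in ms, or 0 if the status is not a busy/delay code. */
static unsigned int delay_ms_for_status(int status)
{
	unsigned int ms = 0;

	if (status == STATUS_BUSY) {
		ms = 1;
	} else if (status >= STATUS_EXT_DELAY_MIN &&
		   status <= STATUS_EXT_DELAY_MAX) {
		int order = status - STATUS_EXT_DELAY_MIN;

		for (ms = 1; order > 0; order--)
			ms *= 10;
	}
	return ms;
}

/*
 * Replay a recorded sequence of return statuses, accumulating the time a
 * caller would have waited. Stops at the first status that is neither
 * busy nor an extended delay and returns that status.
 */
static int replay_statuses(const int *statuses, size_t n,
			   unsigned int *waited_ms)
{
	int status = 0;

	assert(waited_ms != NULL);
	*waited_ms = 0;
	for (size_t i = 0; i < n; i++) {
		unsigned int ms;

		status = statuses[i];
		ms = delay_ms_for_status(status);
		if (ms == 0)
			break;		/* success or hard error: stop retrying */
		*waited_ms += ms;	/* a real caller would sleep here */
	}
	return status;
}
```

The point of the model is that both kinds of status lead to "wait, then call again" instead of either spinning or giving up.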
* [PATCH v2] powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
From: Aneesh Kumar K.V @ 2021-02-05 18:17 UTC (permalink / raw)
To: linuxppc-dev, mpe
Cc: Jens Axboe, Aneesh Kumar K.V, Zorro Lang, Nicholas Piggin
This fixes the bad fault reported by KUAP when io_wqe_worker accesses userspace.
Bug: Read fault blocked by KUAP!
WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
..........
Call Trace:
[c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
[c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
[c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
--- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
..........
NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
interrupt: 300
[c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
[c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
[c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
[c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
[c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
[c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
[c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
[c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
[c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
[c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
[c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
The kernel considers the thread AMR value for a kernel thread to be
AMR_KUAP_BLOCKED. Hence access to userspace is denied. This is
of course not correct, and we should allow userspace access after
kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
AMR value of the operating address space. But the AMR value is
thread-specific, and we inherit the address space and not the thread
access restrictions. Because of this, ignore the AMR value when accessing
userspace via a kernel thread.
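
As a rough userspace model of the behaviour change (not the kernel code itself), the sketch below contrasts the old and new fallbacks for a task without thread.regs. The struct names and the two constants are hypothetical placeholders; in particular the values do not reflect the real AMR encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values only; the real AMR encodings are different. */
#define MODEL_AMR_BLOCKED	UINT64_C(0xffffffffffffffff)
#define MODEL_DEFAULT_AMR	UINT64_C(0)

struct regs_model {
	uint64_t amr;
};

/* regs == NULL models a kernel thread that has no user register frame. */
struct task_model {
	struct regs_model *regs;
};

/* Old behaviour: a kernel thread is treated as fully blocked. */
static uint64_t thread_amr_old(const struct task_model *t)
{
	assert(t != NULL);
	if (t->regs)
		return t->regs->amr;
	return MODEL_AMR_BLOCKED;
}

/* New behaviour: a kernel thread falls back to the default AMR. */
static uint64_t thread_amr_new(const struct task_model *t)
{
	assert(t != NULL);
	if (t->regs)
		return t->regs->amr;
	return MODEL_DEFAULT_AMR;
}
```

For a user task both functions agree; they differ only for the regs == NULL case, which is the io_wqe_worker situation in the traces.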
current_thread_amr/iamr() are also updated, because we use them in the
stack below.
....
[ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3
....
NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
LR [c0000000004b9278] gup_pte_range+0x188/0x420
--- interrupt: 700
[c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
[c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
[c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
[c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
[c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
[c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
[c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
[c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
[c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
[c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
[c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
[c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
[c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
[c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
[c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
[c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
[c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
Cc: Zorro Lang <zlang@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/kup.h | 16 +++++++++++-----
arch/powerpc/include/asm/book3s/64/pkeys.h | 4 ----
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index f50f72e535aa..7d1ef7b9754e 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -199,25 +199,31 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
#ifdef CONFIG_PPC_PKEY
+extern u64 __ro_after_init default_uamor;
+extern u64 __ro_after_init default_amr;
+extern u64 __ro_after_init default_iamr;
+
#include <asm/mmu.h>
#include <asm/ptrace.h>
-/*
- * For kernel thread that doesn't have thread.regs return
- * default AMR/IAMR values.
+/* usage of kthread_use_mm() should inherit the
+ * AMR value of the operating address space. But, the AMR value is
+ * thread-specific and we inherit the address space and not thread
+ * access restrictions. Because of this ignore AMR value when accessing
+ * userspace via kernel thread.
*/
static inline u64 current_thread_amr(void)
{
if (current->thread.regs)
return current->thread.regs->amr;
- return AMR_KUAP_BLOCKED;
+ return default_amr;
}
static inline u64 current_thread_iamr(void)
{
if (current->thread.regs)
return current->thread.regs->iamr;
- return AMR_KUEP_BLOCKED;
+ return default_iamr;
}
#endif /* CONFIG_PPC_PKEY */
diff --git a/arch/powerpc/include/asm/book3s/64/pkeys.h b/arch/powerpc/include/asm/book3s/64/pkeys.h
index 3b8640498f5b..5b178139f3c0 100644
--- a/arch/powerpc/include/asm/book3s/64/pkeys.h
+++ b/arch/powerpc/include/asm/book3s/64/pkeys.h
@@ -5,10 +5,6 @@
#include <asm/book3s/64/hash-pkey.h>
-extern u64 __ro_after_init default_uamor;
-extern u64 __ro_after_init default_amr;
-extern u64 __ro_after_init default_iamr;
-
static inline u64 vmflag_to_pte_pkey_bits(u64 vm_flags)
{
if (!mmu_has_feature(MMU_FTR_PKEY))
--
2.29.2
* Re: [PATCH v2 1/2] ima: Free IMA measurement buffer on error
From: Lakshmi Ramasubramanian @ 2021-02-05 17:59 UTC (permalink / raw)
To: Mimi Zohar, Greg KH
Cc: sashal, dmitry.kasatkin, linux-kernel, tyhicks, ebiederm,
linux-integrity, linuxppc-dev, bauerman
In-Reply-To: <6a5b7a1767265122d21f185c81399692d12191f4.camel@linux.ibm.com>
On 2/5/21 9:49 AM, Mimi Zohar wrote:
Hi Mimi,
> On Fri, 2021-02-05 at 09:39 -0800, Lakshmi Ramasubramanian wrote:
>> On 2/5/21 2:05 AM, Greg KH wrote:
>>> On Thu, Feb 04, 2021 at 09:49:50AM -0800, Lakshmi Ramasubramanian wrote:
>>>> IMA allocates kernel virtual memory to carry forward the measurement
>>>> list, from the current kernel to the next kernel on kexec system call,
>>>> in ima_add_kexec_buffer() function. In error code paths this memory
>>>> is not freed resulting in memory leak.
>>>>
>>>> Free the memory allocated for the IMA measurement list in
>>>> the error code paths in ima_add_kexec_buffer() function.
>>>>
>>>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>>>> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
>>>> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
>>>> ---
>>>> security/integrity/ima/ima_kexec.c | 1 +
>>>> 1 file changed, 1 insertion(+)
>>>
>>> <formletter>
>>>
>>> This is not the correct way to submit patches for inclusion in the
>>> stable kernel tree. Please read:
>>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>>> for how to do this properly.
>>>
>>> </formletter>
>>>
>>
>> Thanks for the info Greg.
>>
>> I will re-submit the two patches in the proper format.
>
> No need. I'm testing these patches now. I'm not exactly sure what the
> problem is. Stable wasn't Cc'ed. Is it that you sent the patch
> directly to Greg or added "Fixes"?
>
I had not Cc'ed stable, but had a "Fixes" tag in the patch.
Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
The problem is that the buffer allocated for forwarding the IMA
measurement list is not freed - at the end of the kexec call and also in
an error path. Please see the patch description for
[PATCH v2 2/2] ima: Free IMA measurement buffer after kexec syscall
IMA allocates kernel virtual memory to carry forward the measurement
list, from the current kernel to the next kernel on kexec system call,
in ima_add_kexec_buffer() function. This buffer is not freed before
completing the kexec system call resulting in memory leak.
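
The pattern behind both patches can be sketched in plain userspace C. This is only an illustration under stated assumptions: add_buffer_model(), buf_alloc(), buf_free() and the live_buffers counter are hypothetical names, malloc/free stand in for the kernel allocator, and the register_step_fails flag stands in for whatever later step can fail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of buffers allocated and not yet freed; used to detect leaks. */
static int live_buffers;

static void *buf_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	if (p) {
		free(p);
		live_buffers--;
	}
}

/*
 * Model of "allocate, hand over, clean up". Returns 0 on success and a
 * negative value on failure. The buffer is released on every exit path.
 */
static int add_buffer_model(size_t size, int register_step_fails)
{
	void *buf = buf_alloc(size);

	if (!buf)
		return -1;

	if (register_step_fails) {
		buf_free(buf);	/* error path: free instead of leaking */
		return -2;
	}

	/* ... the buffer contents would be consumed here ... */

	buf_free(buf);		/* success path: free once the call is done */
	return 0;
}
```

After either outcome the live_buffers counter is back to zero, which is the property the two patches restore.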
thanks,
-lakshmi
* Re: [PATCH v2 1/2] ima: Free IMA measurement buffer on error
From: Mimi Zohar @ 2021-02-05 17:49 UTC (permalink / raw)
To: Lakshmi Ramasubramanian, Greg KH
Cc: sashal, dmitry.kasatkin, linux-kernel, tyhicks, ebiederm,
linux-integrity, linuxppc-dev, bauerman
In-Reply-To: <7000d128-272e-3654-8480-e46bf7dfad74@linux.microsoft.com>
On Fri, 2021-02-05 at 09:39 -0800, Lakshmi Ramasubramanian wrote:
> On 2/5/21 2:05 AM, Greg KH wrote:
> > On Thu, Feb 04, 2021 at 09:49:50AM -0800, Lakshmi Ramasubramanian wrote:
> >> IMA allocates kernel virtual memory to carry forward the measurement
> >> list, from the current kernel to the next kernel on kexec system call,
> >> in ima_add_kexec_buffer() function. In error code paths this memory
> >> is not freed resulting in memory leak.
> >>
> >> Free the memory allocated for the IMA measurement list in
> >> the error code paths in ima_add_kexec_buffer() function.
> >>
> >> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> >> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> >> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
> >> ---
> >> security/integrity/ima/ima_kexec.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >
> > <formletter>
> >
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree. Please read:
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> >
> > </formletter>
> >
>
> Thanks for the info Greg.
>
> I will re-submit the two patches in the proper format.
No need. I'm testing these patches now. I'm not exactly sure what the
problem is. Stable wasn't Cc'ed. Is it that you sent the patch
directly to Greg or added "Fixes"?
thanks,
Mimi
* Re: [PATCH] mm/pmem: Avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled
From: Dan Williams @ 2021-02-05 17:47 UTC (permalink / raw)
To: Aneesh Kumar K.V
Cc: Jan Kara, linux-nvdimm, Linux MM, Andrew Morton, linuxppc-dev,
Kirill A . Shutemov
In-Reply-To: <20210205023956.417587-1-aneesh.kumar@linux.ibm.com>
[ add Andrew ]
On Thu, Feb 4, 2021 at 6:40 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Differentiate between hardware not supporting hugepages and user disabling THP
> via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
>
> For the devdax namespace, the kernel handles the above via the
> supported_alignment attribute and failing to initialize the namespace
> if the namespace align value is not supported on the platform.
>
> For the fsdax namespace, the kernel will continue to initialize
> the namespace. This can result in the kernel creating a huge pte
> entry even though the hardware doesn't support it.
>
> We do want hugepage support with pmem even if the end-user disabled THP
> via sysfs file (/sys/kernel/mm/transparent_hugepage/enabled). Hence
> differentiate between hardware/firmware lacking support vs user-controlled
> disable of THP and prevent a huge fault if the hardware lacks hugepage
> support.
Looks good to me.
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
I assume this will go through Andrew.
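
A minimal sketch of the distinction drawn in the description, assuming two boolean inputs. The function names are hypothetical and this is not how the patch implements the check; it only models the intended decision.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Regular THP for anonymous memory: needs hardware support and must not
 * have been disabled by the user through sysfs.
 */
static bool anon_thp_allowed(bool hw_supports_hugepage, bool user_thp_enabled)
{
	return hw_supports_hugepage && user_thp_enabled;
}

/*
 * Huge faults on pmem (fsdax): the user's THP setting is deliberately
 * ignored, but missing hardware/firmware support must still block them.
 */
static bool pmem_huge_fault_allowed(bool hw_supports_hugepage,
				    bool user_thp_enabled)
{
	(void)user_thp_enabled;
	return hw_supports_hugepage;
}
```

The only case where the two differ is "hardware supports hugepages, user wrote never", which is exactly the case the description wants to keep working for pmem.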
* Re: [PATCH v2 1/2] ima: Free IMA measurement buffer on error
From: Lakshmi Ramasubramanian @ 2021-02-05 17:39 UTC (permalink / raw)
To: Greg KH
Cc: sashal, dmitry.kasatkin, linux-kernel, zohar, tyhicks, ebiederm,
linux-integrity, linuxppc-dev, bauerman
In-Reply-To: <YB0YdqbbdAdbEOQw@kroah.com>
On 2/5/21 2:05 AM, Greg KH wrote:
> On Thu, Feb 04, 2021 at 09:49:50AM -0800, Lakshmi Ramasubramanian wrote:
>> IMA allocates kernel virtual memory to carry forward the measurement
>> list, from the current kernel to the next kernel on kexec system call,
>> in ima_add_kexec_buffer() function. In error code paths this memory
>> is not freed resulting in memory leak.
>>
>> Free the memory allocated for the IMA measurement list in
>> the error code paths in ima_add_kexec_buffer() function.
>>
>> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
>> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
>> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
>> ---
>> security/integrity/ima/ima_kexec.c | 1 +
>> 1 file changed, 1 insertion(+)
>
> <formletter>
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree. Please read:
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
>
> </formletter>
>
Thanks for the info Greg.
I will re-submit the two patches in the proper format.
-lakshmi
* [PATCH] KVM: PPC: Don't always report hash MMU capability for P9 < DD2.2
From: Fabiano Rosas @ 2021-02-05 16:41 UTC (permalink / raw)
To: kvm-ppc; +Cc: linuxppc-dev, npiggin
In-Reply-To: <20210118062809.1430920-2-npiggin@gmail.com>
These machines don't support running both MMU types at the same time,
so remove the KVM_CAP_PPC_MMU_HASH_V3 capability when the host is
using Radix MMU.
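
The resulting decision can be modelled as a pure function of four booleans. This is a userspace sketch with a hypothetical name (hash_v3_possible_model); the parameters stand in for radix_enabled(), no_mixing_hpt_and_radix and the two CPU feature checks used in the diff.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Model of the capability check: hash guests are ruled out when the host
 * runs radix on a machine that cannot mix the two MMU types; otherwise
 * the original feature test applies.
 */
static bool hash_v3_possible_model(bool radix_enabled, bool no_mixing,
				   bool arch_300, bool hv_mode)
{
	if (radix_enabled && no_mixing)
		return false;

	return arch_300 && hv_mode;
}
```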
Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
---
arch/powerpc/include/asm/kvm_ppc.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 10 ++++++++++
arch/powerpc/kvm/powerpc.c | 3 +--
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0a056c64c317..b36abc89baf3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -314,6 +314,7 @@ struct kvmppc_ops {
int size);
int (*enable_svm)(struct kvm *kvm);
int (*svm_off)(struct kvm *kvm);
+ bool (*hash_v3_possible)(void);
};
extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6f612d240392..d20c0682cae5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5599,6 +5599,15 @@ static int kvmhv_svm_off(struct kvm *kvm)
return ret;
}
+static bool kvmppc_hash_v3_possible(void)
+{
+ if (radix_enabled() && no_mixing_hpt_and_radix)
+ return false;
+
+ return cpu_has_feature(CPU_FTR_ARCH_300) &&
+ cpu_has_feature(CPU_FTR_HVMODE);
+}
+
static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -5642,6 +5651,7 @@ static struct kvmppc_ops kvm_ops_hv = {
.store_to_eaddr = kvmhv_store_to_eaddr,
.enable_svm = kvmhv_enable_svm,
.svm_off = kvmhv_svm_off,
+ .hash_v3_possible = kvmppc_hash_v3_possible,
};
static int kvm_init_subcore_bitmap(void)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index cf52d26f49cd..b9fb2f20f879 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -611,8 +611,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = !!(hv_enabled && radix_enabled());
break;
case KVM_CAP_PPC_MMU_HASH_V3:
- r = !!(hv_enabled && cpu_has_feature(CPU_FTR_ARCH_300) &&
- cpu_has_feature(CPU_FTR_HVMODE));
+ r = !!(hv_enabled && kvmppc_hv_ops->hash_v3_possible());
break;
case KVM_CAP_PPC_NESTED_HV:
r = !!(hv_enabled && kvmppc_hv_ops->enable_nested &&
--
2.29.2
* Re: [PATCH] powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
From: Zorro Lang @ 2021-02-05 16:12 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Jens Axboe, linuxppc-dev, Nicholas Piggin
In-Reply-To: <871rdur5e7.fsf@linux.ibm.com>
On Fri, Feb 05, 2021 at 07:19:36PM +0530, Aneesh Kumar K.V wrote:
> Zorro Lang <zlang@redhat.com> writes:
>
> ....
>
> > ...
> > [ 530.180466] run fstests generic/617 at 2021-02-05 03:41:10
> > [ 530.707969] ------------[ cut here ]------------
> > [ 530.708006] kernel BUG at arch/powerpc/include/asm/book3s/64/kup.h:207!
> > [ 530.708013] Oops: Exception in kernel mode, sig: 5 [#1]
> > [ 530.708018] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> > [ 530.708022] Modules linked in: bonding rfkill sunrpc uio_pdrv_genirq pseries_rng uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto
> > [ 530.708049] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Not tainted 5.11.0-r
>
> ok so we call current_thread_amr() with kthread.
>
> commit ae33fb7b069ebb41e32f55ae397c887031e47472
> Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Date: Fri Feb 5 19:11:49 2021 +0530
>
>
> The other stack that matters is
> ...
> [ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3
> ....
>
> NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
> LR [c0000000004b9278] gup_pte_range+0x188/0x420
> --- interrupt: 700
> [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
> [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
> [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
> [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
> [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
> [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
> [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
> [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
> [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
> [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
> [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
> [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
> [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
> [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
> [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
> [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
> [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
>
> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
> index 2064621ae7b6..21e59c1f0d67 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
> @@ -204,14 +204,16 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
>
> static inline u64 current_thread_amr(void)
> {
> - VM_BUG_ON(!current->thread.regs);
> - return current->thread.regs->amr;
> + if (current->thread.regs)
> + return current->thread.regs->amr;
> + return 0;
> }
>
> static inline u64 current_thread_iamr(void)
> {
> - VM_BUG_ON(!current->thread.regs);
> - return current->thread.regs->iamr;
> + if (current->thread.regs)
> + return current->thread.regs->iamr;
> + return 0;
> }
> #endif /* CONFIG_PPC_PKEY */
This change avoids the regression above:
# ./check generic/013 generic/616 generic/617
FSTYP -- xfs (debug)
PLATFORM -- Linux/ppc64le ibm-p9z-xx-xxx 5.11.0-rc6+ #4 SMP Fri Feb 5 10:22:14 EST 2021
MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=1,reflink=1,inobtcount=1,bigtime=1 /dev/sda3
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
generic/013 37s ... 42s
generic/616 166s
generic/617 16s ... 20s
Ran: generic/013 generic/616 generic/617
Passed all 3 tests
>
>
* Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
From: Kai-Heng Feng @ 2021-02-05 15:17 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: Joerg Roedel,
open list:PCI ENHANCED ERROR HANDLING (EEH) FOR POWERPC,
open list:PCI SUBSYSTEM, open list, Lalithambika Krishnakumar,
Alex Williamson, Oliver O'Halloran, Bjorn Helgaas,
Mika Westerberg, Lu Baolu
In-Reply-To: <20210204232758.GA125392@bjorn-Precision-5520>
On Fri, Feb 5, 2021 at 7:28 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Alex]
>
> On Thu, Jan 28, 2021 at 12:09:37PM +0800, Kai-Heng Feng wrote:
> > On Thu, Jan 28, 2021 at 4:51 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Thu, Jan 28, 2021 at 01:31:00AM +0800, Kai-Heng Feng wrote:
> > > > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in
> > > > hint") enables ACS, and some platforms lose their NVMe after resume from
> > > > firmware:
> > > > [ 50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
> > > > [ 50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
> > > > [ 50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> > > > [ 50.947830] pcieport 0000:00:1b.0: device [8086:06ac] error status/mask=00200000/00010000
> > > > [ 50.947831] pcieport 0000:00:1b.0: [21] ACSViol (First)
> > > > [ 50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
> > > > [ 50.947843] nvme nvme0: frozen state error detected, reset controller
> > > >
> > > > It happens right after ACS gets enabled during resume.
> > > >
> > > > To prevent that from happening, disable AER interrupt and enable it on
> > > > system suspend and resume, respectively.
> > >
> > > Lots of questions here. Maybe this is what we'll end up doing, but I
> > > am curious about why the error is reported in the first place.
> > >
> > > Is this a consequence of the link going down and back up?
> >
> > Could be. From the observations, it only happens when firmware suspend
> > (S3) is used.
> > Maybe it happens when it gets powered up, but I don't have equipment
> > to debug at hardware level.
> >
> > If we use non-firmware suspend method, enabling ACS after resume won't
> > trip AER and DPC.
> >
> > > Is it consequence of the device doing a DMA when it shouldn't?
> >
> > If it's doing DMA while suspending, the same error should also happen
> > after NVMe is suspended and before PCIe port suspending.
> > Furthermore, if non-firmware suspend method is used, there's no such
> > issue, so less likely to be any DMA operation.
> >
> > > Are we doing something in the wrong order during suspend? Or maybe
> > > resume, since I assume the error is reported during resume?
> >
> > Yes the error is reported during resume. The suspend/resume order
> > seems fine as non-firmware suspend doesn't have this issue.
>
> I really feel like we need a better understanding of what's going on
> here. Disabling the AER interrupt is like closing our eyes and
> pretending that because we don't see it, it didn't happen.
>
> An ACS error is triggered by a DMA, right? I'm assuming an MMIO
> access from the CPU wouldn't trigger this error. And it sounds like
> the error is triggered before we even start running the driver after
> resume.
>
> If we're powering up an NVMe device from D3cold and it DMAs before the
> driver touches it, something would be seriously broken. I doubt
> that's what's happening. Maybe a device could resume some previously
> programmed DMA after powering up from D3hot.
I am not that familiar with PCIe ACS/AER/DPC, so I can't really answer
the questions you raised.
It also doesn't help that the PCIe spec doesn't specify the suspend/resume order.
However, I really think it's a system firmware issue.
I've seen some suspend-to-idle platforms whose NVMe can reach D3cold;
those are unaffected.
>
> Or maybe the error occurred on suspend, like if the device wasn't
> quiesced or something, but we didn't notice it until resume? The
> AER error status bits are RW1CS, which means they can be preserved
> across hot/warm/cold resets.
>
> Can you instrument the code to see whether the AER error status bit is
> set before enabling ACS? I'm not sure that merely enabling ACS (I
> assume you mean pci_std_enable_acs(), where we write PCI_ACS_CTRL)
> should cause an interrupt for a previously-logged error. I suspect
> that could happen when enabling *AER*, but I wouldn't think it would
> happen when enabling *ACS*.
Diff to print AER status:
https://bugzilla.kernel.org/show_bug.cgi?id=209149#c11
And dmesg:
https://bugzilla.kernel.org/show_bug.cgi?id=209149#c12
Looks like the reads before suspend and after resume are both fine.
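
For reference, the check being discussed boils down to testing one bit of the uncorrectable error status value. The sketch below is a userspace model with hypothetical names; the bit position is taken from the "[21] ACSViol" line and the 00200000 status value in the log, and rw1c_write() models the write-1-to-clear behaviour of the status bits mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 21, matching "[21] ACSViol" and status 00200000 in the log. */
#define UNCOR_ACS_VIOLATION	(UINT32_C(1) << 21)

/* True if the ACS violation bit is set in an uncorrectable status value. */
static bool acs_violation_logged(uint32_t uncor_status)
{
	return (uncor_status & UNCOR_ACS_VIOLATION) != 0;
}

/*
 * Model of a write to a write-1-to-clear register: bits written as 1 are
 * cleared, bits written as 0 keep their current value.
 */
static uint32_t rw1c_write(uint32_t current, uint32_t written)
{
	return current & ~written;
}
```

Because the bits are sticky, a set bit stays set until software writes a 1 to it, so reading the status before ACS is enabled shows whether an error was already logged.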
>
> Does this error happen on multiple machines from different vendors?
> Wondering if it could be a BIOS issue, e.g., BIOS not cleaning up
> after it did something to cause an error.
AFAIK, systems from both HP and Dell are affected.
I was told that the reference platform from Intel uses
suspend-to-idle, but vendors changed the sleep method to S3 to get
lower power consumption and pass regulation.
Kai-Heng
>
> > > If we *do* take the error, why doesn't DPC recovery work?
> >
> > It works for the root port, but not for the NVMe drive:
> > [ 50.947816] pcieport 0000:00:1b.0: DPC: containment event, status:0x1f01 source:0x0000
> > [ 50.947817] pcieport 0000:00:1b.0: DPC: unmasked uncorrectable error detected
> > [ 50.947829] pcieport 0000:00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
> > [ 50.947830] pcieport 0000:00:1b.0: device [8086:06ac] error status/mask=00200000/00010000
> > [ 50.947831] pcieport 0000:00:1b.0: [21] ACSViol (First)
> > [ 50.947841] pcieport 0000:00:1b.0: AER: broadcast error_detected message
> > [ 50.947843] nvme nvme0: frozen state error detected, reset controller
> > [ 50.948400] ACPI: EC: event unblocked
> > [ 50.948432] xhci_hcd 0000:00:14.0: PME# disabled
> > [ 50.948444] xhci_hcd 0000:00:14.0: enabling bus mastering
> > [ 50.949056] pcieport 0000:00:1b.0: PME# disabled
> > [ 50.949068] pcieport 0000:00:1c.0: PME# disabled
> > [ 50.949416] e1000e 0000:00:1f.6: PME# disabled
> > [ 50.949463] e1000e 0000:00:1f.6: enabling bus mastering
> > [ 50.951606] sd 0:0:0:0: [sda] Starting disk
> > [ 50.951610] nvme 0000:01:00.0: can't change power state from D3hot
> > to D0 (config space inaccessible)
> > [ 50.951730] nvme nvme0: Removing after probe failure status: -19
> > [ 50.952360] nvme nvme0: failed to set APST feature (-19)
> > [ 50.971136] snd_hda_intel 0000:00:1f.3: PME# disabled
> > [ 51.089330] pcieport 0000:00:1b.0: AER: broadcast resume message
> > [ 51.089345] pcieport 0000:00:1b.0: AER: device recovery successful
> >
> > But I think why recovery doesn't work for NVMe is for another discussion...
> >
> > Kai-Heng
> >
> > >
> > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149
> > > > Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint")
> > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > > ---
> > > > drivers/pci/pcie/aer.c | 18 ++++++++++++++++++
> > > > 1 file changed, 18 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > > > index 77b0f2c45bc0..0e9a85530ae6 100644
> > > > --- a/drivers/pci/pcie/aer.c
> > > > +++ b/drivers/pci/pcie/aer.c
> > > > @@ -1365,6 +1365,22 @@ static int aer_probe(struct pcie_device *dev)
> > > > return 0;
> > > > }
> > > >
> > > > +static int aer_suspend(struct pcie_device *dev)
> > > > +{
> > > > + struct aer_rpc *rpc = get_service_data(dev);
> > > > +
> > > > + aer_disable_rootport(rpc);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int aer_resume(struct pcie_device *dev)
> > > > +{
> > > > + struct aer_rpc *rpc = get_service_data(dev);
> > > > +
> > > > + aer_enable_rootport(rpc);
> > > > + return 0;
> > > > +}
> > > > +
> > > > /**
> > > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP
> > > > * @dev: pointer to Root Port, RCEC, or RCiEP
> > > > @@ -1437,6 +1453,8 @@ static struct pcie_port_service_driver aerdriver = {
> > > > .service = PCIE_PORT_SERVICE_AER,
> > > >
> > > > .probe = aer_probe,
> > > > + .suspend = aer_suspend,
> > > > + .resume = aer_resume,
> > > > .remove = aer_remove,
> > > > };
> > > >
> > > > --
> > > > 2.29.2
> > > >
^ permalink raw reply
* Re: [PATCH 5/7] ASoC: imx-pcm-rpmsg: Add platform driver for audio base on rpmsg
From: Mark Brown @ 2021-02-05 14:58 UTC (permalink / raw)
To: Shengjiu Wang
Cc: devicetree, alsa-devel, timur, lgirdwood, linuxppc-dev, Xiubo.Lee,
linux-kernel, tiwai, nicoleotsuka, robh+dt, perex, festevam
In-Reply-To: <1612508250-10586-6-git-send-email-shengjiu.wang@nxp.com>
On Fri, Feb 05, 2021 at 02:57:28PM +0800, Shengjiu Wang wrote:
> + if (params_format(params) == SNDRV_PCM_FORMAT_S16_LE)
> + msg->s_msg.param.format = RPMSG_S16_LE;
> + else if (params_format(params) == SNDRV_PCM_FORMAT_S24_LE)
Again this should be a switch statement.
> + if (params_channels(params) == 1)
> + msg->s_msg.param.channels = RPMSG_CH_LEFT;
> + else
> + msg->s_msg.param.channels = RPMSG_CH_STEREO;
Shouldn't this be reporting an error if the number of channels is more
than 2?
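As a rough illustration of the two review comments above (use a switch, and
reject unsupported channel counts), the mapping could look something like the
sketch below. Every name in it is a hypothetical stand-in, not the driver's
real constants or helpers, and returning -EINVAL is an assumption about what
an appropriate error would be.

```c
#include <errno.h>

/* Hypothetical stand-ins for the ALSA and rpmsg constants in the patch. */
enum pcm_format { PCM_FORMAT_S16_LE, PCM_FORMAT_S24_LE, PCM_FORMAT_OTHER };
enum msg_format { MSG_S16_LE, MSG_S24_LE };
enum msg_channels { MSG_CH_LEFT, MSG_CH_STEREO };

struct msg_param {
	enum msg_format format;
	enum msg_channels channels;
};

/* Returns 0 on success, -EINVAL for any format or channel count not listed. */
static int map_params(enum pcm_format format, unsigned int channels,
		      struct msg_param *out)
{
	switch (format) {
	case PCM_FORMAT_S16_LE:
		out->format = MSG_S16_LE;
		break;
	case PCM_FORMAT_S24_LE:
		out->format = MSG_S24_LE;
		break;
	default:
		return -EINVAL;
	}

	switch (channels) {
	case 1:
		out->channels = MSG_CH_LEFT;
		break;
	case 2:
		out->channels = MSG_CH_STEREO;
		break;
	default:
		/* Zero or more than two channels is rejected, not mapped to stereo. */
		return -EINVAL;
	}

	return 0;
}
```

With this shape, adding another format or channel layout is one more case
label, and anything unexpected fails loudly instead of falling into the
stereo branch.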
> + /*
> + * if the data in the buffer is less than one period
> + * send message immediately.
> + * if there is more than one period data, delay one
> + * period (timer) to send the message.
> + */
> + if ((avail - writen_num * period_size) <= period_size) {
> + imx_rpmsg_insert_workqueue(substream, msg, info);
> + } else if (rpmsg->force_lpa && !timer_pending(timer)) {
> + int time_msec;
> +
> + time_msec = (int)(runtime->period_size * 1000 / runtime->rate);
> + mod_timer(timer, jiffies + msecs_to_jiffies(time_msec));
> + }
The comment here is at least confusing - why would we not send a full
buffer immediately if we have one? This sounds like it's the opposite
way round to what we'd do if we were trying to cut down the number of
messages. It might help to say which buffer and where?
> + /**
> + * Every work in the work queue, first we check if there
/** comments are only for kerneldoc.
^ permalink raw reply
* Re: [PATCH 4/7] ASoC: imx-audio-rpmsg: Add rpmsg_driver for audio channel
From: Mark Brown @ 2021-02-05 14:25 UTC (permalink / raw)
To: Shengjiu Wang
Cc: devicetree, alsa-devel, timur, lgirdwood, linuxppc-dev, Xiubo.Lee,
linux-kernel, tiwai, nicoleotsuka, robh+dt, perex, festevam
In-Reply-To: <1612508250-10586-5-git-send-email-shengjiu.wang@nxp.com>
On Fri, Feb 05, 2021 at 02:57:27PM +0800, Shengjiu Wang wrote:
> + /* TYPE C is notification from M core */
> + if (r_msg->header.type == MSG_TYPE_C) {
> + if (r_msg->header.cmd == TX_PERIOD_DONE) {
> + } else if (r_msg->header.cmd == RX_PERIOD_DONE) {
A switch statement would be clearer and more extensible...
> + /* TYPE B is response msg */
> + if (r_msg->header.type == MSG_TYPE_B) {
> + memcpy(&info->r_msg, r_msg, sizeof(struct rpmsg_r_msg));
> + complete(&info->cmd_complete);
> + }
...and make this flow clearer for example. Do we need to warn on
unknown messages?
^ permalink raw reply
* Re: [PATCH 2/7] ASoC: fsl_rpmsg: Add CPU DAI driver for audio base on rpmsg
From: Mark Brown @ 2021-02-05 14:02 UTC (permalink / raw)
To: Shengjiu Wang
Cc: devicetree, alsa-devel, timur, lgirdwood, linuxppc-dev, Xiubo.Lee,
linux-kernel, tiwai, nicoleotsuka, robh+dt, perex, festevam
In-Reply-To: <1612508250-10586-3-git-send-email-shengjiu.wang@nxp.com>
On Fri, Feb 05, 2021 at 02:57:25PM +0800, Shengjiu Wang wrote:
> This is a dummy cpu dai driver for rpmsg audio use case,
> which is mainly used for getting the user's configuration
This is actually doing stuff, it's not a dummy driver.
> +static int fsl_rpmsg_remove(struct platform_device *pdev)
> +{
> + return 0;
> +}
If this isn't needed just remove it.
^ permalink raw reply
* Re: [PATCH] powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
From: Aneesh Kumar K.V @ 2021-02-05 13:49 UTC (permalink / raw)
To: Zorro Lang; +Cc: Jens Axboe, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210205095820.GI14354@localhost.localdomain>
Zorro Lang <zlang@redhat.com> writes:
....
> ...
> [ 530.180466] run fstests generic/617 at 2021-02-05 03:41:10
> [ 530.707969] ------------[ cut here ]------------
> [ 530.708006] kernel BUG at arch/powerpc/include/asm/book3s/64/kup.h:207!
> [ 530.708013] Oops: Exception in kernel mode, sig: 5 [#1]
> [ 530.708018] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [ 530.708022] Modules linked in: bonding rfkill sunrpc uio_pdrv_genirq pseries_rng uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto
> [ 530.708049] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Not tainted 5.11.0-r
OK, so we call current_thread_amr() from a kthread.
commit ae33fb7b069ebb41e32f55ae397c887031e47472
Author: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Date: Fri Feb 5 19:11:49 2021 +0530
The other stack that matters is
...
[ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3
....
NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
LR [c0000000004b9278] gup_pte_range+0x188/0x420
--- interrupt: 700
[c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
[c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
[c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
[c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
[c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
[c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
[c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
[c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
[c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
[c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
[c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
[c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
[c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
[c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
[c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
[c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
[c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
index 2064621ae7b6..21e59c1f0d67 100644
--- a/arch/powerpc/include/asm/book3s/64/kup.h
+++ b/arch/powerpc/include/asm/book3s/64/kup.h
@@ -204,14 +204,16 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
static inline u64 current_thread_amr(void)
{
- VM_BUG_ON(!current->thread.regs);
- return current->thread.regs->amr;
+ if (current->thread.regs)
+ return current->thread.regs->amr;
+ return 0;
}
static inline u64 current_thread_iamr(void)
{
- VM_BUG_ON(!current->thread.regs);
- return current->thread.regs->iamr;
+ if (current->thread.regs)
+ return current->thread.regs->iamr;
+ return 0;
}
#endif /* CONFIG_PPC_PKEY */
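The guard added in the diff above can be illustrated outside the kernel. The
sketch below uses hypothetical, heavily reduced stand-ins for the kernel's
task structures. It assumes, as the BUG in the report suggests, that a kernel
thread has no saved user register frame (regs is NULL), and it treats a
return value of 0 as "no key-based restriction", mirroring the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real kernel structures carry far more state. */
struct regs_sketch {
	uint64_t amr;
	uint64_t iamr;
};

struct thread_sketch {
	struct regs_sketch *regs;	/* NULL for a kernel thread */
};

/* Mirrors the patched current_thread_amr(): fall back to 0 without regs. */
static uint64_t thread_amr(const struct thread_sketch *t)
{
	if (t->regs)
		return t->regs->amr;
	return 0;
}

/* Mirrors the patched current_thread_iamr() in the same way. */
static uint64_t thread_iamr(const struct thread_sketch *t)
{
	if (t->regs)
		return t->regs->iamr;
	return 0;
}
```

The design choice is to replace an assertion with a defined fallback, so a
kthread that has adopted a user mm through kthread_use_mm() no longer trips
the check.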
^ permalink raw reply related
* Re: [PATCH] powerpc/pseries/dlpar: handle ibm, configure-connector delay status
From: Nathan Lynch @ 2021-02-05 13:15 UTC (permalink / raw)
To: linuxppc-dev; +Cc: tyreld, brking, Laurent Dufour, Scott Cheloha
In-Reply-To: <20210107025900.410369-1-nathanl@linux.ibm.com>
Nathan Lynch <nathanl@linux.ibm.com> writes:
> dlpar_configure_connector() has two problems in its handling of
> ibm,configure-connector's return status:
>
> 1. When the status is -2 (busy, call again), we call
> ibm,configure-connector again immediately without checking whether
> to schedule, which can result in monopolizing the CPU.
> 2. Extended delay status (9900..9905) goes completely unhandled,
> causing the configuration to unnecessarily terminate.
>
> Fix both of these issues by using rtas_busy_delay().
>
> Fixes: ab519a011caa ("powerpc/pseries: Kernel DLPAR Infrastructure")
> Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Just following up and adding some people to cc in hopes of getting some
review for this.
> ---
> arch/powerpc/platforms/pseries/dlpar.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/platforms/pseries/dlpar.c b/arch/powerpc/platforms/pseries/dlpar.c
> index 16e86ba8aa20..f6b7749d6ada 100644
> --- a/arch/powerpc/platforms/pseries/dlpar.c
> +++ b/arch/powerpc/platforms/pseries/dlpar.c
> @@ -127,7 +127,6 @@ void dlpar_free_cc_nodes(struct device_node *dn)
> #define NEXT_PROPERTY 3
> #define PREV_PARENT 4
> #define MORE_MEMORY 5
> -#define CALL_AGAIN -2
> #define ERR_CFG_USE -9003
>
> struct device_node *dlpar_configure_connector(__be32 drc_index,
> @@ -168,6 +167,9 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
>
> spin_unlock(&rtas_data_buf_lock);
>
> + if (rtas_busy_delay(rc))
> + continue;
> +
> switch (rc) {
> case COMPLETE:
> break;
> @@ -216,9 +218,6 @@ struct device_node *dlpar_configure_connector(__be32 drc_index,
> last_dn = last_dn->parent;
> break;
>
> - case CALL_AGAIN:
> - break;
> -
> case MORE_MEMORY:
> case ERR_CFG_USE:
> default:
> --
> 2.29.2
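For reviewers less familiar with the RTAS status codes mentioned in the commit
message, here is a small userspace sketch of how the two "busy" classes could
be told apart. It is not the kernel's rtas_busy_delay() itself. The macro and
function names are hypothetical, and the delay values (1 ms for plain busy,
10^(status - 9900) ms for the extended range) reflect my understanding of the
convention rather than anything stated in this thread.

```c
#include <stdbool.h>

/* Hypothetical names for the status values discussed in the commit message. */
#define SKETCH_RTAS_BUSY	(-2)
#define SKETCH_EXT_DELAY_MIN	9900
#define SKETCH_EXT_DELAY_MAX	9905

/* Suggested wait in milliseconds, or 0 if the status is not a busy status. */
static unsigned int busy_delay_ms(int status)
{
	unsigned int ms = 0;
	int order;

	if (status == SKETCH_RTAS_BUSY) {
		ms = 1;		/* assumed minimal back-off for "call again" */
	} else if (status >= SKETCH_EXT_DELAY_MIN &&
		   status <= SKETCH_EXT_DELAY_MAX) {
		/* Extended delay: one power of ten per step above 9900. */
		ms = 1;
		for (order = status - SKETCH_EXT_DELAY_MIN; order > 0; order--)
			ms *= 10;
	}

	return ms;
}

/* True when the caller should wait and then repeat the same RTAS call. */
static bool should_retry(int status)
{
	return busy_delay_ms(status) != 0;
}
```

A retry loop built on should_retry() covers both cases with a single check,
which is the effect the patch gets by calling rtas_busy_delay() before the
switch statement.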
^ permalink raw reply
* Re: [PATCH v2 1/1] powerpc/kvm: Save Timebase Offset to fix sched_clock() while running guest code.
From: Fabiano Rosas @ 2021-02-05 13:09 UTC (permalink / raw)
To: Leonardo Bras, Paul Mackerras, Michael Ellerman,
Benjamin Herrenschmidt, Christophe Leroy, Athira Rajeev,
Aneesh Kumar K.V, Leonardo Bras, Jordan Niethe, Nicholas Piggin,
Frederic Weisbecker, Thomas Gleixner, Geert Uytterhoeven
Cc: linuxppc-dev, linux-kernel, kvm-ppc
In-Reply-To: <20210205060643.233481-1-leobras.c@gmail.com>
Leonardo Bras <leobras.c@gmail.com> writes:
> Before guest entry, TBU40 register is changed to reflect guest timebase.
> After exiting guest, the register is reverted to its original value.
>
> If one tries to get the timestamp from host between those changes, it
> will present an incorrect value.
>
> An example would be trying to add a tracepoint in
> kvmppc_guest_entry_inject_int(), which depending on last tracepoint
> acquired could actually cause the host to crash.
>
> Save the Timebase Offset to PACA and use it on sched_clock() to always
> get the correct timestamp.
>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> Suggested-by: Paul Mackerras <paulus@ozlabs.org>
> ---
> Changes since v1:
> - Subtracts offset only when CONFIG_KVM_BOOK3S_HANDLER and
> CONFIG_PPC_BOOK3S_64 are defined.
> ---
> arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
> arch/powerpc/kernel/asm-offsets.c | 1 +
> arch/powerpc/kernel/time.c | 8 +++++++-
> arch/powerpc/kvm/book3s_hv.c | 2 ++
> arch/powerpc/kvm/book3s_hv_rmhandlers.S | 2 ++
> 5 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index 078f4648ea27..e2c12a10eed2 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -131,6 +131,7 @@ struct kvmppc_host_state {
> u64 cfar;
> u64 ppr;
> u64 host_fscr;
> + u64 tb_offset; /* Timebase offset: keeps correct
> timebase while on guest */
Couldn't you use the vc->tb_offset_applied for this? We have a reference
for the vcore in the hstate already.
> #endif
> };
>
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index b12d7c049bfe..0beb8fdc6352 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -706,6 +706,7 @@ int main(void)
> HSTATE_FIELD(HSTATE_CFAR, cfar);
> HSTATE_FIELD(HSTATE_PPR, ppr);
> HSTATE_FIELD(HSTATE_HOST_FSCR, host_fscr);
> + HSTATE_FIELD(HSTATE_TB_OFFSET, tb_offset);
> #endif /* CONFIG_PPC_BOOK3S_64 */
>
> #else /* CONFIG_PPC_BOOK3S */
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 67feb3524460..f27f0163792b 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -699,7 +699,13 @@ EXPORT_SYMBOL_GPL(tb_to_ns);
> */
> notrace unsigned long long sched_clock(void)
> {
> - return mulhdu(get_tb() - boot_tb, tb_to_ns_scale) << tb_to_ns_shift;
> + u64 tb = get_tb() - boot_tb;
> +
> +#if defined(CONFIG_PPC_BOOK3S_64) && defined(CONFIG_KVM_BOOK3S_HANDLER)
> + tb -= local_paca->kvm_hstate.tb_offset;
> +#endif
> +
> + return mulhdu(tb, tb_to_ns_scale) << tb_to_ns_shift;
> }
>
>
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index b3731572295e..c08593c63353 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3491,6 +3491,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> if ((tb & 0xffffff) < (new_tb & 0xffffff))
> mtspr(SPRN_TBU40, new_tb + 0x1000000);
> vc->tb_offset_applied = vc->tb_offset;
> + local_paca->kvm_hstate.tb_offset = vc->tb_offset;
> }
>
> if (vc->pcr)
> @@ -3594,6 +3595,7 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
> if ((tb & 0xffffff) < (new_tb & 0xffffff))
> mtspr(SPRN_TBU40, new_tb + 0x1000000);
> vc->tb_offset_applied = 0;
> + local_paca->kvm_hstate.tb_offset = 0;
> }
>
> mtspr(SPRN_HDEC, 0x7fffffff);
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index b73140607875..8f7a9f7f4ee6 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -632,6 +632,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
> cmpdi r8,0
> beq 37f
> std r8, VCORE_TB_OFFSET_APPL(r5)
> + std r8, HSTATE_TB_OFFSET(r13)
> mftb r6 /* current host timebase */
> add r8,r8,r6
> mtspr SPRN_TBU40,r8 /* update upper 40 bits */
> @@ -1907,6 +1908,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
> beq 17f
> li r0, 0
> std r0, VCORE_TB_OFFSET_APPL(r5)
> + std r0, HSTATE_TB_OFFSET(r13)
> mftb r6 /* current guest timebase */
> subf r8,r8,r6
> mtspr SPRN_TBU40,r8 /* update upper 40 bits */
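The arithmetic in the sched_clock() hunk quoted above can be tried in
userspace. The sketch below is an illustration only. It assumes mulhdu()
returns the high 64 bits of an unsigned 64x64-bit multiply and that the guest
timebase equals the host timebase plus the saved offset. The function
host_ns() and its parameters are hypothetical names, and unsigned __int128 is
a GCC/Clang extension.

```c
#include <stdint.h>

/* High 64 bits of a 64x64-bit unsigned product (GCC/Clang __int128). */
static uint64_t mulhdu(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/*
 * Convert a raw timebase reading to nanoseconds since boot.
 * guest_offset is nonzero only while the guest timebase is loaded, so
 * subtracting it recovers the host view of time in both states.
 */
static uint64_t host_ns(uint64_t raw_tb, uint64_t boot_tb,
			uint64_t guest_offset, uint64_t scale,
			unsigned int shift)
{
	uint64_t tb = raw_tb - boot_tb;

	tb -= guest_offset;
	return mulhdu(tb, scale) << shift;
}
```

With the offset recorded at guest entry and cleared at exit, a reading taken
while the guest timebase is loaded converts to the same nanosecond value as a
host reading taken at the same instant.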
^ permalink raw reply
* Re: [PATCH v2 2/2] ima: Free IMA measurement buffer after kexec syscall
From: Greg KH @ 2021-02-05 10:05 UTC (permalink / raw)
To: Lakshmi Ramasubramanian
Cc: sashal, dmitry.kasatkin, linux-kernel, zohar, tyhicks, ebiederm,
linux-integrity, linuxppc-dev, bauerman
In-Reply-To: <20210204174951.25771-2-nramas@linux.microsoft.com>
On Thu, Feb 04, 2021 at 09:49:51AM -0800, Lakshmi Ramasubramanian wrote:
> IMA allocates kernel virtual memory to carry forward the measurement
> list, from the current kernel to the next kernel on kexec system call,
> in ima_add_kexec_buffer() function. This buffer is not freed before
> completing the kexec system call, resulting in a memory leak.
>
> Add ima_buffer field in "struct kimage" to store the virtual address
> of the buffer allocated for the IMA measurement list.
> Free the memory allocated for the IMA measurement list in
> kimage_file_post_load_cleanup() function.
>
> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> Reviewed-by: Thiago Jung Bauermann <bauerman@linux.ibm.com>
> Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
> ---
> include/linux/kexec.h | 5 +++++
> kernel/kexec_file.c | 5 +++++
> security/integrity/ima/ima_kexec.c | 2 ++
> 3 files changed, 12 insertions(+)
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
^ permalink raw reply
* Re: [PATCH v2 1/2] ima: Free IMA measurement buffer on error
From: Greg KH @ 2021-02-05 10:05 UTC (permalink / raw)
To: Lakshmi Ramasubramanian
Cc: sashal, dmitry.kasatkin, linux-kernel, zohar, tyhicks, ebiederm,
linux-integrity, linuxppc-dev, bauerman
In-Reply-To: <20210204174951.25771-1-nramas@linux.microsoft.com>
On Thu, Feb 04, 2021 at 09:49:50AM -0800, Lakshmi Ramasubramanian wrote:
> IMA allocates kernel virtual memory to carry forward the measurement
> list, from the current kernel to the next kernel on kexec system call,
> in ima_add_kexec_buffer() function. In error code paths this memory
> is not freed, resulting in a memory leak.
>
> Free the memory allocated for the IMA measurement list in
> the error code paths in ima_add_kexec_buffer() function.
>
> Signed-off-by: Lakshmi Ramasubramanian <nramas@linux.microsoft.com>
> Suggested-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> Fixes: 7b8589cc29e7 ("ima: on soft reboot, save the measurement list")
> ---
> security/integrity/ima/ima_kexec.c | 1 +
> 1 file changed, 1 insertion(+)
<formletter>
This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.
</formletter>
^ permalink raw reply
* Re: [PATCH] powerpc/kuap: Allow kernel thread to access userspace after kthread_use_mm
From: Zorro Lang @ 2021-02-05 9:58 UTC (permalink / raw)
To: Aneesh Kumar K.V; +Cc: Jens Axboe, linuxppc-dev, Nicholas Piggin
In-Reply-To: <20210205030426.430331-1-aneesh.kumar@linux.ibm.com>
On Fri, Feb 05, 2021 at 08:34:26AM +0530, Aneesh Kumar K.V wrote:
> This fixes the bad fault reported by KUAP when io_wqe_worker accesses userspace.
>
> Bug: Read fault blocked by KUAP!
> WARNING: CPU: 1 PID: 101841 at arch/powerpc/mm/fault.c:229 __do_page_fault+0x6b4/0xcd0
> NIP [c00000000009e7e4] __do_page_fault+0x6b4/0xcd0
> LR [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0
> ..........
> Call Trace:
> [c000000016367330] [c00000000009e7e0] __do_page_fault+0x6b0/0xcd0 (unreliable)
> [c0000000163673e0] [c00000000009ee3c] do_page_fault+0x3c/0x120
> [c000000016367430] [c00000000000c848] handle_page_fault+0x10/0x2c
> --- interrupt: 300 at iov_iter_fault_in_readable+0x148/0x6f0
> ..........
> NIP [c0000000008e8228] iov_iter_fault_in_readable+0x148/0x6f0
> LR [c0000000008e834c] iov_iter_fault_in_readable+0x26c/0x6f0
> interrupt: 300
> [c0000000163677e0] [c0000000007154a0] iomap_write_actor+0xc0/0x280
> [c000000016367880] [c00000000070fc94] iomap_apply+0x1c4/0x780
> [c000000016367990] [c000000000710330] iomap_file_buffered_write+0xa0/0x120
> [c0000000163679e0] [c00800000040791c] xfs_file_buffered_aio_write+0x314/0x5e0 [xfs]
> [c000000016367a90] [c0000000006d74bc] io_write+0x10c/0x460
> [c000000016367bb0] [c0000000006d80e4] io_issue_sqe+0x8d4/0x1200
> [c000000016367c70] [c0000000006d8ad0] io_wq_submit_work+0xc0/0x250
> [c000000016367cb0] [c0000000006e2578] io_worker_handle_work+0x498/0x800
> [c000000016367d40] [c0000000006e2cdc] io_wqe_worker+0x3fc/0x4f0
> [c000000016367da0] [c0000000001cb0a4] kthread+0x1c4/0x1d0
> [c000000016367e10] [c00000000000dbf0] ret_from_kernel_thread+0x5c/0x6c
>
> The kernel considers the thread AMR value for a kernel thread to be
> AMR_KUAP_BLOCKED. Hence access to userspace is denied. This is
> of course not correct, and we should allow userspace access after
> kthread_use_mm(). To be precise, kthread_use_mm() should inherit the
> AMR value of the operating address space. But the AMR value is
> thread-specific, and we inherit the address space and not the thread
> access restrictions. Because of this, ignore the AMR value when accessing
> userspace via a kernel thread.
>
> Cc: Zorro Lang <zlang@redhat.com>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
> Cc: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
Hi,
I ran a simple test on ppc64le with the latest 5.11.0-rc6+.
1) Reproduced this bug at first:
# ./check generic/013
FSTYP -- xfs (debug)
PLATFORM -- Linux/ppc64le ibm-p9z-xxx-xxx 5.11.0-rc6+ #2 SMP Fri Feb 5 01:40:25 EST 2021
MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=1,reflink=1,inobtcount=1,bigtime=1 /dev/sda3
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
generic/013 49s ... _check_dmesg: something found in dmesg (see /var/lib/xfstests/results//generic/013.dmesg)
Ran: generic/013
Failures: generic/013
Failed 1 of 1 tests
# cat results//generic/013.dmesg
...
[ 4261.095623] Kernel attempted to read user page (1003a0648b0) - exploit attempt? (uid: 0)
[ 4261.095640] ------------[ cut here ]------------
[ 4261.095643] Bug: Read fault blocked by KUAP!
[ 4261.095647] WARNING: CPU: 7 PID: 287137 at arch/powerpc/mm/fault.c:229 bad_kernel_fault+0x180/0x310
...
...
2) Test passed on the kernel with this patch:
# ./check generic/013 generic/051
FSTYP -- xfs (debug)
PLATFORM -- Linux/ppc64le ibm-p9z-xx-xxx 5.11.0-rc6+ #3 SMP Fri Feb 5 02:44:31 EST 2021
MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=1,reflink=1,inobtcount=1,bigtime=1 /dev/sda3
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
generic/013 49s ... 42s
generic/051 87s
Ran: generic/013 generic/051
Passed all 2 tests
3) But when I tested a little more, a test case hung and triggered the kernel BUG below.
I think it is a regression from this patch.
https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/tree/tests/generic/617
# ./check generic/616 generic/617
FSTYP -- xfs (debug)
PLATFORM -- Linux/ppc64le ibm-p9z-xx-xxx 5.11.0-rc6+ #3 SMP Fri Feb 5 02:44:31 EST 2021
MKFS_OPTIONS -- -f -m crc=1,finobt=1,rmapbt=1,reflink=1,inobtcount=1,bigtime=1 /dev/sda3
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 /dev/sda3 /mnt/xfstests/scratch
generic/616 170s
generic/617 ^C^C^C^C
# dmesg
...
[ 530.180466] run fstests generic/617 at 2021-02-05 03:41:10
[ 530.707969] ------------[ cut here ]------------
[ 530.708006] kernel BUG at arch/powerpc/include/asm/book3s/64/kup.h:207!
[ 530.708013] Oops: Exception in kernel mode, sig: 5 [#1]
[ 530.708018] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[ 530.708022] Modules linked in: bonding rfkill sunrpc uio_pdrv_genirq pseries_rng uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto
[ 530.708049] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Not tainted 5.11.0-rc6+ #3
[ 530.708055] NIP: c0000000000aa0c8 LR: c0000000004b9278 CTR: 0000000000000000
[ 530.708059] REGS: c00000001c4ef150 TRAP: 0700 Not tainted (5.11.0-rc6+)
[ 530.708064] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24022804 XER: 2004000a
[ 530.708079] CFAR: c0000000000aa494 IRQMASK: 1
GPR00: c0000000004b9278 c00000001c4ef3f0 c000000002127000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000008
GPR08: 0000000000000000 000000000000003e 0000000000000001 ffffffffffffffff
GPR12: 0000000044022804 c00000001ec7d200 00301d1c00000080 c000000002201a78
GPR16: 00007fffa0930000 c00000001c4ef5d4 0000000000000000 0000000000000000
GPR20: 0000000008000000 0008000000000040 07000000000000c0 00007fffa0930000
GPR24: c00000002d889f80 0000000000000002 0000000000000000 c0800000460b0386
GPR28: 0000000000000004 00007fffa0920000 c00000001c1d3490 86030b46000080c0
[ 530.708139] NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
[ 530.708146] LR [c0000000004b9278] gup_pte_range+0x188/0x420
[ 530.708152] Call Trace:
[ 530.708154] [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
[ 530.708160] [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
[ 530.708167] [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
[ 530.708173] [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
[ 530.708180] [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
[ 530.708186] [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
[ 530.708191] [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
[ 530.708197] [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
[ 530.708203] [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
[ 530.708273] [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
[ 530.708340] [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
[ 530.708346] [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
[ 530.708352] [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
[ 530.708358] [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
[ 530.708364] [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
[ 530.708369] [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
[ 530.708375] [c00000001c4efe10] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 530.708381] Instruction dump:
[ 530.708384] 60000000 60000000 7c0802a6 60000000 e94d0968 e90a2970 2c250000 5463083c
[ 530.708395] 2123003e 7d0a0074 794ad182 4082004c <0b0a0000> 2c240000 e9480168 4082001c
[ 530.708407] ---[ end trace 346ddedf8bc4b5b3 ]---
[ 530.710799] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:49
[ 530.710803] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 5587, name: io_wqe_worker-0
[ 530.710808] INFO: lockdep is turned off.
[ 530.710811] irq event stamp: 124
[ 530.710814] hardirqs last enabled at (123): [<c000000000fb8874>] _raw_spin_unlock_irqrestore+0x94/0xd0
[ 530.710821] hardirqs last disabled at (124): [<c0000000004bdd28>] internal_get_user_pages_fast+0x318/0x410
[ 530.710827] softirqs last enabled at (0): [<c00000000015abc8>] copy_process+0x688/0x1600
[ 530.710833] softirqs last disabled at (0): [<0000000000000000>] 0x0
[ 530.710838] CPU: 13 PID: 5587 Comm: io_wqe_worker-0 Tainted: G D 5.11.0-rc6+ #3
[ 530.710844] Call Trace:
[ 530.710846] [c00000001c4eee20] [c0000000008a6a14] dump_stack+0xe8/0x144 (unreliable)
[ 530.710854] [c00000001c4eee70] [c0000000001b0898] ___might_sleep+0x2e8/0x300
[ 530.710861] [c00000001c4eef00] [c00000000017e31c] exit_signals+0x4c/0x490
[ 530.710867] [c00000001c4eef50] [c000000000168b38] do_exit+0x108/0x740
[ 530.710873] [c00000001c4eefe0] [c00000000002c3dc] oops_end+0x18c/0x1c0
[ 530.710880] [c00000001c4ef060] [c00000000002e7c4] program_check_exception+0x2c4/0x3c0
[ 530.710886] [c00000001c4ef0e0] [c0000000000098fc] program_check_common_virt+0x30c/0x360
[ 530.710893] --- interrupt: 700 at pkey_access_permitted+0x28/0x90
[ 530.710898] NIP: c0000000000aa0c8 LR: c0000000004b9278 CTR: 0000000000000000
[ 530.710902] REGS: c00000001c4ef150 TRAP: 0700 Tainted: G D (5.11.0-rc6+)
[ 530.710907] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24022804 XER: 2004000a
[ 530.710922] CFAR: c0000000000aa494 IRQMASK: 1
GPR00: c0000000004b9278 c00000001c4ef3f0 c000000002127000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000008
GPR08: 0000000000000000 000000000000003e 0000000000000001 ffffffffffffffff
GPR12: 0000000044022804 c00000001ec7d200 00301d1c00000080 c000000002201a78
GPR16: 00007fffa0930000 c00000001c4ef5d4 0000000000000000 0000000000000000
GPR20: 0000000008000000 0008000000000040 07000000000000c0 00007fffa0930000
GPR24: c00000002d889f80 0000000000000002 0000000000000000 c0800000460b0386
GPR28: 0000000000000004 00007fffa0920000 c00000001c1d3490 86030b46000080c0
[ 530.710980] NIP [c0000000000aa0c8] pkey_access_permitted+0x28/0x90
[ 530.710985] LR [c0000000004b9278] gup_pte_range+0x188/0x420
[ 530.710989] --- interrupt: 700
[ 530.710992] [c00000001c4ef3f0] [0000000000000000] 0x0 (unreliable)
[ 530.710997] [c00000001c4ef490] [c0000000004bd39c] gup_pgd_range+0x3ac/0xa20
[ 530.711003] [c00000001c4ef5a0] [c0000000004bdd44] internal_get_user_pages_fast+0x334/0x410
[ 530.711009] [c00000001c4ef620] [c000000000852028] iov_iter_get_pages+0xf8/0x5c0
[ 530.711016] [c00000001c4ef6a0] [c0000000007da44c] bio_iov_iter_get_pages+0xec/0x700
[ 530.711021] [c00000001c4ef770] [c0000000006a325c] iomap_dio_bio_actor+0x2ac/0x4f0
[ 530.711027] [c00000001c4ef810] [c00000000069cd94] iomap_apply+0x2b4/0x740
[ 530.711032] [c00000001c4ef920] [c0000000006a38b8] __iomap_dio_rw+0x238/0x5c0
[ 530.711038] [c00000001c4ef9d0] [c0000000006a3c60] iomap_dio_rw+0x20/0x80
[ 530.711044] [c00000001c4ef9f0] [c008000001927a30] xfs_file_dio_aio_write+0x1f8/0x650 [xfs]
[ 530.711115] [c00000001c4efa60] [c0080000019284dc] xfs_file_write_iter+0xc4/0x130 [xfs]
[ 530.711180] [c00000001c4efa90] [c000000000669984] io_write+0x104/0x4b0
[ 530.711186] [c00000001c4efbb0] [c00000000066cea4] io_issue_sqe+0x3d4/0xf50
[ 530.711192] [c00000001c4efc60] [c000000000670200] io_wq_submit_work+0xb0/0x2f0
[ 530.711198] [c00000001c4efcb0] [c000000000674268] io_worker_handle_work+0x248/0x4a0
[ 530.711204] [c00000001c4efd30] [c0000000006746e8] io_wqe_worker+0x228/0x2a0
[ 530.711210] [c00000001c4efda0] [c00000000019d994] kthread+0x1b4/0x1c0
[ 530.711215] [c00000001c4efe10] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
[ 530.807015] ------------[ cut here ]------------
[ 530.807020] WARNING: CPU: 13 PID: 0 at kernel/kthread.c:97 free_kthread_struct+0x44/0x60
[ 530.807027] Modules linked in: bonding rfkill sunrpc uio_pdrv_genirq pseries_rng uio drm fuse drm_panel_orientation_quirks ip_tables xfs libcrc32c sd_mod t10_pi ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto
[ 530.807051] CPU: 13 PID: 0 Comm: swapper/13 Tainted: G D W 5.11.0-rc6+ #3
[ 530.807056] NIP: c00000000019fe04 LR: c000000000158fa8 CTR: c0000000007821a0
[ 530.807060] REGS: c0000000086d3410 TRAP: 0700 Tainted: G D W (5.11.0-rc6+)
[ 530.807065] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE> CR: 82022888 XER: 20040007
[ 530.807077] CFAR: c000000000158fa4 IRQMASK: 0
GPR00: c000000000158fa8 c0000000086d36b0 c000000002127000 c000000025db6c00
GPR04: c0000000001a1fb4 c000000021827c00 ffffffffffffffff 0000000000000000
GPR08: 0000000000000000 c000000003815208 0000000000000000 000000000000000d
GPR12: 0000000000002000 c00000001ec7d200 0000000000000001 000000001ef2d9a0
GPR16: c000000002157a00 00000001000059d4 c00000000159b910 c0000000017e8480
GPR20: c00000000025a374 c000000001fad680 0000000000000000 c00000047b159ea0
GPR24: c00000000025a374 000000000000000a c0000000086d3760 c00000047b159e00
GPR28: c000000002180820 c0000000021802c8 c000000021827c00 c000000021827c00
[ 530.807135] NIP [c00000000019fe04] free_kthread_struct+0x44/0x60
[ 530.807140] LR [c000000000158fa8] free_task+0x98/0xe0
[ 530.807145] Call Trace:
[ 530.807147] [c0000000086d36b0] [c00000000036e838] ftrace_graph_exit_task+0x28/0x40 (unreliable)
[ 530.807156] [c0000000086d36d0] [c000000000158fa8] free_task+0x98/0xe0
[ 530.807162] [c0000000086d3700] [c00000000016557c] delayed_put_task_struct+0x16c/0x270
[ 530.807168] [c0000000086d3740] [c00000000025a3d8] rcu_do_batch+0x268/0x750
[ 530.807175] [c0000000086d37e0] [c00000000025b04c] rcu_core+0x36c/0x4a0
[ 530.807180] [c0000000086d3840] [c000000000fb9a30] __do_softirq+0x190/0x718
[ 530.807187] [c0000000086d3950] [c00000000016b008] __irq_exit_rcu+0x218/0x260
[ 530.807193] [c0000000086d3980] [c00000000016b280] irq_exit+0x20/0x50
[ 530.807199] [c0000000086d39a0] [c00000000002b730] timer_interrupt+0x1a0/0x520
[ 530.807206] [c0000000086d3a10] [c000000000009dd8] decrementer_common_virt+0x1d8/0x1e0
[ 530.807212] --- interrupt: 900 at plpar_hcall_norets+0x1c/0x28
[ 530.807218] NIP: c0000000000fe900 LR: c000000000c05c94 CTR: 0000000000000000
[ 530.807222] REGS: c0000000086d3a80 TRAP: 0900 Tainted: G D W (5.11.0-rc6+)
[ 530.807226] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24000888 XER: 20040007
[ 530.807238] CFAR: 0000000000000c00 IRQMASK: 0
GPR00: 0000000000000000 c0000000086d3d20 c000000002127000 0000000000000000
GPR04: 000000000000000e 0000000000000300 0000000000000400 000000000000ffff
GPR08: 0000000000000000 000a000000000000 00000000000e0380 0000000000000001
GPR12: 00000000000dd080 c00000001ec7d200 0000000000000000 000000001ef2d9a0
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR24: 000000000000000d 0000000000000000 0000007b95f781fe 0000000000000001
GPR28: 0000000000000000 0000000000000001 c000000001596fd8 c000000001596fe0
[ 530.807296] NIP [c0000000000fe900] plpar_hcall_norets+0x1c/0x28
[ 530.807301] LR [c000000000c05c94] check_and_cede_processor.part.0+0x24/0x70
[ 530.807307] --- interrupt: 900
[ 530.807309] [c0000000086d3d20] [0000000000000000] 0x0 (unreliable)
[ 530.807315] [c0000000086d3d80] [c000000000c062b4] dedicated_cede_loop+0x164/0x210
[ 530.807321] [c0000000086d3dc0] [c000000000c02cbc] cpuidle_enter_state+0x2bc/0x500
[ 530.807327] [c0000000086d3e20] [c000000000c02f9c] cpuidle_enter+0x4c/0x70
[ 530.807332] [c0000000086d3e60] [c0000000001c7c90] cpuidle_idle_call+0x1c0/0x2f0
[ 530.807338] [c0000000086d3eb0] [c0000000001c7f34] do_idle+0x174/0x230
[ 530.807344] [c0000000086d3f10] [c0000000001c83ec] cpu_startup_entry+0x3c/0x40
[ 530.807351] [c0000000086d3f40] [c000000000060b38] start_secondary+0x278/0x280
[ 530.807357] [c0000000086d3f90] [c00000000000cb54] start_secondary_prolog+0x10/0x14
[ 530.807362] Instruction dump:
[ 530.807366] f8010010 f821ffe1 81230114 6d290020 79295fe2 0b090000 e86307f8 2c230000
[ 530.807376] 41820014 e92300e0 2c290000 41820008 <0fe00000> 483a3231 60000000 38210020
[ 530.807387] irq event stamp: 1487390
[ 530.807390] hardirqs last enabled at (1487389): [<c00000000029cb84>] tick_nohz_idle_exit+0x94/0x200
[ 530.807396] hardirqs last disabled at (1487390): [<c000000000fae964>] __schedule+0x344/0x8b0
[ 530.807402] softirqs last enabled at (1487378): [<c000000000fb9f48>] __do_softirq+0x6a8/0x718
[ 530.807409] softirqs last disabled at (1487373): [<c00000000016b008>] __irq_exit_rcu+0x218/0x260
[ 530.807414] ---[ end trace 346ddedf8bc4b5b4 ]---
> arch/powerpc/include/asm/book3s/64/kup.h | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/book3s/64/kup.h b/arch/powerpc/include/asm/book3s/64/kup.h
> index f50f72e535aa..2064621ae7b6 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup.h
> @@ -202,22 +202,16 @@ DECLARE_STATIC_KEY_FALSE(uaccess_flush_key);
> #include <asm/mmu.h>
> #include <asm/ptrace.h>
>
> -/*
> - * For kernel thread that doesn't have thread.regs return
> - * default AMR/IAMR values.
> - */
> static inline u64 current_thread_amr(void)
> {
> - if (current->thread.regs)
> - return current->thread.regs->amr;
> - return AMR_KUAP_BLOCKED;
> + VM_BUG_ON(!current->thread.regs);
> + return current->thread.regs->amr;
> }
>
> static inline u64 current_thread_iamr(void)
> {
> - if (current->thread.regs)
> - return current->thread.regs->iamr;
> - return AMR_KUEP_BLOCKED;
> + VM_BUG_ON(!current->thread.regs);
> + return current->thread.regs->iamr;
> }
> #endif /* CONFIG_PPC_PKEY */
>
> @@ -384,7 +378,14 @@ static __always_inline void allow_user_access(void __user *to, const void __user
> // This is written so we can resolve to a single case at build time
> BUILD_BUG_ON(!__builtin_constant_p(dir));
>
> - if (mmu_has_feature(MMU_FTR_PKEY))
> + /*
> + * Kernel threads may access user mm with kthread_use_mm() but
> + * can't use current_thread_amr because they have thread.regs==NULL,
> + * but they have no pkeys.
> + */
> + if (current->flags & PF_KTHREAD)
> + thread_amr = 0;
> + else if (mmu_has_feature(MMU_FTR_PKEY))
> thread_amr = current_thread_amr();
>
> if (dir == KUAP_READ)
> --
> 2.29.2
>
* Re: [PATCH] mm/memtest: Add ARCH_USE_MEMTEST
From: Vladimir Murzin @ 2021-02-05 9:20 UTC (permalink / raw)
To: Anshuman Khandual, linux-mm
Cc: Chris Zankel, Thomas Bogendoerfer, Catalin Marinas, Will Deacon,
linux-xtensa, linux-kernel, Russell King, Max Filippov,
Ingo Molnar, Paul Mackerras, Thomas Gleixner, linux-mips,
linuxppc-dev, linux-arm-kernel
In-Reply-To: <1612498242-31579-1-git-send-email-anshuman.khandual@arm.com>
Hi Anshuman,
On 2/5/21 4:10 AM, Anshuman Khandual wrote:
> early_memtest() does not get called from all architectures. Hence enabling
> CONFIG_MEMTEST and providing a valid memtest=[1..N] kernel command line
> option might not trigger the memory pattern tests as would be expected in
> normal circumstances. This situation is misleading.
Documentation already mentions which architectures support that:
memtest= [KNL,X86,ARM,PPC] Enable memtest
yet I admit that not all of them are reflected there
>
> The change here prevents the above mentioned problem after introducing a
> new config option ARCH_USE_MEMTEST that should be subscribed on platforms
> that call early_memtest(), in order to enable the config CONFIG_MEMTEST.
> Conversely CONFIG_MEMTEST cannot be enabled on platforms where it would
> not be tested anyway.
>
Is that a generic pattern? What about other cross-arch parameters? Do they already
use a similar subscription, or do they rely on documentation?
I'm not against the patch, I just want to check whether things are consistent...
Cheers
Vladimir
* [PATCH V2] powerpc/perf: Record counter overflow always if SAMPLE_IP is unset
From: Athira Rajeev @ 2021-02-05 9:14 UTC (permalink / raw)
To: mpe; +Cc: maddy, linuxppc-dev
While sampling for marked events, currently we record the sample only
if the SIAR valid bit of Sampled Instruction Event Register (SIER) is
set. SIAR_VALID bit is used for fetching the instruction address from
Sampled Instruction Address Register (SIAR). But there are some use cases
where the user is interested only in the PMU stats at each counter
overflow and the exact IP of the overflow event is not required.
Dropping SIAR invalid samples will fail to record some of the counter
overflows in such cases.
An example of such a use case is dumping the PMU stats (event counts)
after some regular amount of instructions/events from the userspace
(ex: via ptrace). Here counter overflow is indicated to userspace via
signal handler, and captured by monitoring and enabling I/O
signaling on the event file descriptor. In these cases, we expect to
get sample/overflow indication after each specified sample_period.
Perf event attribute will not have PERF_SAMPLE_IP set in the
sample_type if exact IP of the overflow event is not requested. So
while profiling if SAMPLE_IP is not set, just record the counter overflow
irrespective of SIAR_VALID check.
Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
Changes in v2:
-- Changed the approach to include PERF_SAMPLE_IP
condition while checking siar_valid as suggested by
Michael Ellerman.
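For reference, the decision this patch implements can be modelled in
userspace roughly as below. This is only a sketch of the logic, not the
kernel code: struct overflow_state and should_record() are made-up names,
and the booleans stand in for (event->attr.sample_type & PERF_SAMPLE_IP),
siar_valid(regs), event->attr.exclude_kernel and
is_kernel_addr(mfspr(SPRN_SIAR)). It covers only the path where the
sample period has elapsed.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the inputs record_and_restart() looks at. */
struct overflow_state {
	bool sample_ip;      /* sample_type has PERF_SAMPLE_IP set */
	bool siar_valid;     /* what siar_valid(regs) would return */
	bool exclude_kernel; /* attr.exclude_kernel */
	bool siar_in_kernel; /* what is_kernel_addr(SIAR) would return */
};

/* Decide whether an overflow is recorded once the period has elapsed. */
static bool should_record(const struct overflow_state *s)
{
	bool record;

	/* Only insist on a valid SIAR when the IP was actually requested. */
	if (s->sample_ip)
		record = s->siar_valid;
	else
		record = true;

	/* The kernel-address filter is only meaningful when the IP is sampled. */
	if (s->exclude_kernel && s->sample_ip && s->siar_in_kernel)
		record = false;

	return record;
}
```

With sample_ip unset, should_record() returns true whatever the other
three fields say, which is the behaviour the ptrace-style use case above
relies on.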
arch/powerpc/perf/core-book3s.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 28206b1fe172..0ddbe33798ce 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2149,7 +2149,17 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
left += period;
if (left <= 0)
left = period;
- record = siar_valid(regs);
+
+ /*
+ * If address is not requested in the sample
+ * via PERF_SAMPLE_IP, just record that sample
+ * irrespective of SIAR valid check.
+ */
+ if (event->attr.sample_type & PERF_SAMPLE_IP)
+ record = siar_valid(regs);
+ else
+ record = 1;
+
event->hw.last_period = event->hw.sample_period;
}
if (left < 0x80000000LL)
@@ -2167,9 +2177,10 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
* MMCR2. Check attr.exclude_kernel and address to drop the sample in
* these cases.
*/
- if (event->attr.exclude_kernel && record)
- if (is_kernel_addr(mfspr(SPRN_SIAR)))
- record = 0;
+ if (event->attr.exclude_kernel &&
+ (event->attr.sample_type & PERF_SAMPLE_IP) &&
+ is_kernel_addr(mfspr(SPRN_SIAR)))
+ record = 0;
/*
* Finally record data if requested.
--
1.8.3.1
* [PATCH] powerpc/8xx: Fix software emulation interrupt
From: Christophe Leroy @ 2021-02-05 8:56 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, npiggin
Cc: linuxppc-dev, linux-kernel
For unimplemented instructions or unimplemented SPRs, the 8xx triggers
a "Software Emulation Exception" (0x1000). That interrupt doesn't set
reason bits in SRR1 as the "Program Check Exception" does.
Go through emulation_assist_interrupt() to set REASON_ILLEGAL.
Fixes: fbbcc3bb139e ("powerpc/8xx: Remove SoftwareEmulation()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
I'm wondering whether it wouldn't be better to set REASON_ILLEGAL
in the exception prolog and still call program_check_exception.
And do the same in book3s/64 to avoid the nightmare of an
INTERRUPT_HANDLER calling another INTERRUPT_HANDLER.
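The nesting concern can be pictured with a small userspace model. This is
a sketch of one possible shape, not what this patch does and not the
kernel's DEFINE_INTERRUPT_HANDLER machinery: every name below is
hypothetical, MODEL_REASON_ILLEGAL is a placeholder bit, and
handler_enter()/handler_exit() merely stand in for whatever entry/exit
accounting a wrapper performs. Both entry points share one plain body, so
the accounting runs once per interrupt.

```c
#include <stdint.h>

#define MODEL_REASON_ILLEGAL 0x80000u	/* placeholder bit for this sketch */

struct model_regs {
	uint64_t msr;
};

static int handler_depth;	/* wrapped handlers currently active */
static int max_handler_depth;	/* deepest nesting observed */
static uint64_t last_reason;	/* reason bits seen by the common body */

/* Stand-ins for the accounting done on handler entry and exit. */
static void handler_enter(void)
{
	if (++handler_depth > max_handler_depth)
		max_handler_depth = handler_depth;
}

static void handler_exit(void)
{
	--handler_depth;
}

/* Plain body with no accounting, callable from any wrapped handler. */
static void do_program_check(struct model_regs *regs)
{
	last_reason = regs->msr & MODEL_REASON_ILLEGAL;
}

void model_program_check_exception(struct model_regs *regs)
{
	handler_enter();
	do_program_check(regs);
	handler_exit();
}

void model_emulation_assist_interrupt(struct model_regs *regs)
{
	handler_enter();
	regs->msr |= MODEL_REASON_ILLEGAL;
	do_program_check(regs);	/* not the wrapped handler, so no nesting */
	handler_exit();
}
```

If model_emulation_assist_interrupt() called
model_program_check_exception() instead, handler_enter() would run twice
for a single interrupt, which is the situation the note above would like
to avoid.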
---
arch/powerpc/kernel/head_8xx.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 52702f3db6df..9eb63cf6ac38 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -165,7 +165,7 @@ SystemCall:
/* On the MPC8xx, this is a software emulation interrupt. It occurs
* for all unimplemented and illegal instructions.
*/
- EXCEPTION(0x1000, SoftEmu, program_check_exception, EXC_XFER_STD)
+ EXCEPTION(0x1000, SoftEmu, emulation_assist_interrupt, EXC_XFER_STD)
. = 0x1100
/*
--
2.25.0
* Re: [PATCH] mm/pmem: Avoid inserting hugepage PTE entry with fsdax if hugepage support is disabled
From: David Hildenbrand @ 2021-02-05 8:29 UTC (permalink / raw)
To: Aneesh Kumar K.V, linux-nvdimm, dan.j.williams,
Kirill A . Shutemov, Jan Kara
Cc: linux-mm, linuxppc-dev
In-Reply-To: <20210205023956.417587-1-aneesh.kumar@linux.ibm.com>
On 05.02.21 03:39, Aneesh Kumar K.V wrote:
> Differentiate between hardware not supporting hugepages and user disabling THP
> via 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
>
> For the devdax namespace, the kernel handles the above via the
> supported_alignment attribute and failing to initialize the namespace
> if the namespace align value is not supported on the platform.
>
> For the fsdax namespace, the kernel will continue to initialize
> the namespace. This can result in the kernel creating a huge pte
> entry even though the hardware doesn't support it.
>
> We do want hugepage support with pmem even if the end-user disabled THP
> via sysfs file (/sys/kernel/mm/transparent_hugepage/enabled). Hence
> differentiate between hardware/firmware lacking support vs user-controlled
> disable of THP and prevent a huge fault if the hardware lacks hugepage
> support.
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
> include/linux/huge_mm.h | 15 +++++++++------
> mm/huge_memory.c | 6 +++++-
> 2 files changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 6a19f35f836b..ba973efcd369 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -78,6 +78,7 @@ static inline vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn,
> }
>
> enum transparent_hugepage_flag {
> + TRANSPARENT_HUGEPAGE_NEVER_DAX,
> TRANSPARENT_HUGEPAGE_FLAG,
> TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG,
> TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
> @@ -123,6 +124,13 @@ extern unsigned long transparent_hugepage_flags;
> */
> static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
> {
> +
> + /*
> + * If the hardware/firmware marked hugepage support disabled.
> + */
> + if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
> + return false;
> +
> if (vma->vm_flags & VM_NOHUGEPAGE)
> return false;
>
> @@ -134,12 +142,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
>
> if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG))
> return true;
> - /*
> - * For dax vmas, try to always use hugepage mappings. If the kernel does
> - * not support hugepages, fsdax mappings will fallback to PAGE_SIZE
> - * mappings, and device-dax namespaces, that try to guarantee a given
> - * mapping size, will fail to enable
> - */
> +
> if (vma_is_dax(vma))
> return true;
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 9237976abe72..d698b7e27447 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -386,7 +386,11 @@ static int __init hugepage_init(void)
> struct kobject *hugepage_kobj;
>
> if (!has_transparent_hugepage()) {
> - transparent_hugepage_flags = 0;
> + /*
> + * Hardware doesn't support hugepages, hence disable
> + * DAX PMD support.
> + */
> + transparent_hugepage_flags = 1 << TRANSPARENT_HUGEPAGE_NEVER_DAX;
> return -EINVAL;
> }
>
>
Looks sane to me from my limited understanding of that code :)
--
Thanks,
David / dhildenb
* Re: [PATCH] arch:powerpc simple_write_to_buffer return check
From: Mayank Suman @ 2021-02-05 8:29 UTC (permalink / raw)
To: Christophe Leroy, ruscur, oohall, mpe, benh, paulus, linuxppc-dev,
linux-kernel
In-Reply-To: <8be2b91b-cef1-ea68-836a-94c8a574d760@csgroup.eu>
On 05/02/21 12:51 pm, Christophe Leroy wrote:
> Please provide some description of the change.
>
> And please clarify the patch subject, because as far as I can see, the return is already checked although the check seems wrong.
This was my first patch. I will try to provide a better description of the changes and a clearer subject in later patches.
> Le 04/02/2021 à 19:16, Mayank Suman a écrit :
>> Signed-off-by: Mayank Suman <mayanksuman@live.com>
>> ---
>> arch/powerpc/kernel/eeh.c | 8 ++++----
>> arch/powerpc/platforms/powernv/eeh-powernv.c | 4 ++--
>> 2 files changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> index 813713c9120c..2dbe1558a71f 100644
>> --- a/arch/powerpc/kernel/eeh.c
>> +++ b/arch/powerpc/kernel/eeh.c
>> @@ -1628,8 +1628,8 @@ static ssize_t eeh_force_recover_write(struct file *filp,
>> char buf[20];
>> int ret;
>> - ret = simple_write_to_buffer(buf, sizeof(buf), ppos, user_buf, count);
>> - if (!ret)
>> + ret = simple_write_to_buffer(buf, sizeof(buf)-1, ppos, user_buf, count);
>> + if (ret <= 0)
>> return -EFAULT;
>
> Why return -EFAULT when the function has returned -EINVAL ?
If -EINVAL is returned by simple_write_to_buffer, we should return -EINVAL.
> And why is it -EFAULT when ret is 0 ? EFAULT means error accessing memory.
>
The earlier check returned -EFAULT when ret was 0. Most probably, the assumption was
that writing 0 bytes (by simple_write_to_buffer) means an error accessing memory.
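A rough userspace sketch of the return check under discussion follows. It
rests on assumptions: model_write_to_buffer() only imitates the return
convention of simple_write_to_buffer() (a negative errno on error,
otherwise the number of bytes copied, which may be 0), a NULL source
stands in for a faulting user pointer, and parse_write() is a made-up
caller rather than the eeh code.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Userspace stand-in for simple_write_to_buffer(): returns a negative
 * errno on error, otherwise the number of bytes copied (possibly 0).
 * A NULL 'from' simulates a faulting user pointer.
 */
static ssize_t model_write_to_buffer(void *to, size_t available,
				     long long *ppos, const void *from,
				     size_t count)
{
	long long pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((unsigned long long)pos >= available || !count)
		return 0;
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	if (!from)
		return -EFAULT;
	memcpy((char *)to + pos, from, count);
	*ppos = pos + (long long)count;
	return (ssize_t)count;
}

/* Hypothetical caller: pass errors through unchanged, then terminate. */
static ssize_t parse_write(char *buf, size_t size, const void *user_buf,
			   size_t count, long long *ppos)
{
	ssize_t ret;

	/* Keep one byte free for the terminating NUL. */
	ret = model_write_to_buffer(buf, size - 1, ppos, user_buf, count);
	if (ret < 0)
		return ret;	/* -EINVAL stays -EINVAL, -EFAULT stays -EFAULT */
	if (ret == 0)
		return 0;	/* nothing written, which is not a memory fault */

	buf[*ppos] = '\0';	/* *ppos <= size - 1 after a successful copy */
	return ret;
}
```

The point of the sketch is that the three outcomes stay distinct: an
error code from the helper is returned as is, a zero-byte write is not
reported as -EFAULT, and the buffer is only NUL-terminated after a copy
that is known to fit.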
* Re: [PATCH v7 28/42] powerpc: convert interrupt handlers to use wrappers
From: Christophe Leroy @ 2021-02-05 8:09 UTC (permalink / raw)
To: Nicholas Piggin, linuxppc-dev; +Cc: Athira Rajeev
In-Reply-To: <20210130130852.2952424-29-npiggin@gmail.com>
Le 30/01/2021 à 14:08, Nicholas Piggin a écrit :
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index f70d3f6174c8..7ff915aae8ec 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -1462,7 +1474,7 @@ static int emulate_math(struct pt_regs *regs)
> static inline int emulate_math(struct pt_regs *regs) { return -1; }
> #endif
>
> -void program_check_exception(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(program_check_exception)
> {
> enum ctx_state prev_state = exception_enter();
> unsigned int reason = get_reason(regs);
> @@ -1587,14 +1599,14 @@ NOKPROBE_SYMBOL(program_check_exception);
> * This occurs when running in hypervisor mode on POWER6 or later
> * and an illegal instruction is encountered.
> */
> -void emulation_assist_interrupt(struct pt_regs *regs)
> +DEFINE_INTERRUPT_HANDLER(emulation_assist_interrupt)
> {
> regs->msr |= REASON_ILLEGAL;
> program_check_exception(regs);
Is it correct that an INTERRUPT_HANDLER calls another INTERRUPT_HANDLER ?
> }
> NOKPROBE_SYMBOL(emulation_assist_interrupt);
>