From: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
To: Paul Mackerras <paulus@ozlabs.org>
Cc: linuxppc-dev@lists.ozlabs.org, npiggin@gmail.com
Subject: Re: [PATCH v4 09/22] powerpc/kvm/book3s: Add helper to walk partition scoped linux page table.
Date: Thu, 28 May 2020 11:31:04 +0530 [thread overview]
Message-ID: <e732e386-4a8c-2a7d-220c-e22e85b7a6c3@linux.ibm.com> (raw)
In-Reply-To: <20200528014338.GC307798@thinks.paulus.ozlabs.org>
On 5/28/20 7:13 AM, Paul Mackerras wrote:
> On Tue, May 05, 2020 at 12:47:16PM +0530, Aneesh Kumar K.V wrote:
>> The locking rules for walking a partition scoped table are different from those
>> for a process scoped table. Hence add a helper for the secondary linux page table
>> walk, and also add a check that we are holding the right locks.
>
> This patch is causing new warnings to appear when testing migration,
> like this:
>
> [ 142.090159] ------------[ cut here ]------------
> [ 142.090160] find_kvm_secondary_pte called with kvm mmu_lock not held
> [ 142.090176] WARNING: CPU: 23 PID: 5341 at arch/powerpc/include/asm/kvm_book3s_64.h:644 kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
> [ 142.090177] Modules linked in: xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables bpfilter overlay binfmt_misc input_leds raid_class scsi_transport_sas sch_fq_codel sunrpc kvm_hv kvm
> [ 142.090188] CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: G W 5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432
> [ 142.090189] NIP: c008000000fe848c LR: c008000000fe8488 CTR: 0000000000000000
> [ 142.090190] REGS: c000001e19f077e0 TRAP: 0700 Tainted: G W (5.7.0-rc5-kvm-00211-g9ccf10d6d088)
> [ 142.090191] MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42222422 XER: 20040000
> [ 142.090196] CFAR: c00000000012f5ac IRQMASK: 0
> GPR00: c008000000fe8488 c000001e19f07a70 c008000000ffe200 0000000000000039
> GPR04: 0000000000000001 c000001ffc8b4900 0000000000018840 0000000000000007
> GPR08: 0000000000000003 0000000000000001 0000000000000007 0000000000000001
> GPR12: 0000000000002000 c000001fff6d9400 000000011f884678 00007fff70b70000
> GPR16: 00007fff7137cb90 00007fff7dcb4410 0000000000000001 0000000000000000
> GPR20: 000000000ffe0000 0000000000000000 0000000000000001 0000000000000000
> GPR24: 8000000000000000 0000000000000001 c000001e1f67e600 c000001e1fd82410
> GPR28: 0000000000001000 c000001e2e410000 0000000000000fff 0000000000000ffe
> [ 142.090217] NIP [c008000000fe848c] kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv]
> [ 142.090223] LR [c008000000fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv]
> [ 142.090224] Call Trace:
> [ 142.090230] [c000001e19f07a70] [c008000000fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable)
> [ 142.090236] [c000001e19f07b50] [c008000000fd42e4] kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv]
> [ 142.090292] [c000001e19f07be0] [c008000000eea878] kvm_vm_ioctl_get_dirty_log+0x30/0x50 [kvm]
> [ 142.090300] [c000001e19f07c00] [c008000000edc818] kvm_vm_ioctl+0x2b0/0xc00 [kvm]
> [ 142.090302] [c000001e19f07d50] [c00000000046e148] ksys_ioctl+0xf8/0x150
> [ 142.090305] [c000001e19f07da0] [c00000000046e1c8] sys_ioctl+0x28/0x80
> [ 142.090307] [c000001e19f07dc0] [c00000000003652c] system_call_exception+0x16c/0x240
> [ 142.090309] [c000001e19f07e20] [c00000000000d070] system_call_common+0xf0/0x278
> [ 142.090310] Instruction dump:
> [ 142.090312] 7d3a512a 4200ffd0 7ffefb78 4bfffdc4 60000000 3c820000 e8848468 3c620000
> [ 142.090317] e86384a8 38840010 4800673d e8410018 <0fe00000> 4bfffdd4 60000000 60000000
> [ 142.090322] ---[ end trace 619d45057b6919e0 ]---
>
> Indeed, kvm_radix_test_clear_dirty() tests the PTE dirty bit
> locklessly, and only takes the kvm->mmu_lock once it finds a dirty
> PTE. I think that is important for performance, since on any given
> scan of the guest real address space we may only find a small
> proportion of the guest pages to be dirty.
>
> Are you now relying on the kvm->mmu_lock to protect the existence of
> the PTEs, or just their content?
>
The patch series should not change any rules w.r.t. the kvm partition scoped
page table walk. We only added helpers to make it explicit that this walk is
different from a regular linux page table walk, and kvm->mmu_lock is what was
already being used to protect the partition scoped walk.

In this specific case, what we probably need is an open coded kvm partition
scoped walk, with a comment explaining why it is ok to walk the partition
scoped table without taking kvm->mmu_lock.
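For illustration, that open coded walk could look something like the
(untested) sketch below. It assumes, as in this series, that
find_kvm_secondary_pte() is essentially __find_linux_pte() on
kvm->arch.pgtable plus the mmu_lock assertion:

	/*
	 * The dirty bit is tested locklessly here for performance; a
	 * parallel invalidate of this pte is handled by rechecking it
	 * after taking kvm->mmu_lock below, so skip the usual lock
	 * assertion and walk the partition scoped table directly.
	 */
	ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift);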
What happens when a parallel invalidate hits the QEMU address space? Since we
are not holding kvm->mmu_lock, the mmu notifier will complete and we will go
ahead and unmap the partition scoped table entry underneath this lockless
walk.
Do we need a change like below?
@@ -1040,7 +1040,7 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 {
 	unsigned long gfn = memslot->base_gfn + pagenum;
 	unsigned long gpa = gfn << PAGE_SHIFT;
-	pte_t *ptep;
+	pte_t *ptep, pte;
 	unsigned int shift;
 	int ret = 0;
 	unsigned long old, *rmapp;
@@ -1049,11 +1049,23 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm,
 		return ret;
 
 	ptep = find_kvm_secondary_pte(kvm, gpa, &shift);
-	if (ptep && pte_present(*ptep) && pte_dirty(*ptep)) {
+	if (!ptep)
+		return 0;
+
+	pte = READ_ONCE(*ptep);
+	if (pte_present(pte) && pte_dirty(pte)) {
 		ret = 1;
 		if (shift)
 			ret = 1 << (shift - PAGE_SHIFT);
 		spin_lock(&kvm->mmu_lock);
+		/*
+		 * Recheck the pte again
+		 */
+		if (pte_val(pte) != pte_val(*ptep)) {
+			spin_unlock(&kvm->mmu_lock);
+			return 0;
+		}
+
 		old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_DIRTY, 0,
 					      gpa, shift);
 		kvmppc_radix_tlbie_page(kvm, gpa, shift, kvm->arch.lpid);
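
If it helps to see the ordering in isolation, here is a minimal,
self-contained sketch of the same snapshot-then-revalidate pattern in plain C
(a pthread mutex stands in for kvm->mmu_lock and a 64-bit word for the pte;
this is only an illustration of the locking pattern, not kernel code):

	#include <pthread.h>
	#include <stdbool.h>
	#include <stdint.h>

	static pthread_mutex_t mmu_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Clear dirty_bit in *entry only if the entry is unchanged since
	 * the lockless snapshot, mirroring the diff above. */
	static bool test_clear_dirty(uint64_t *entry, uint64_t dirty_bit)
	{
		/* 1. Lockless snapshot, like READ_ONCE(*ptep). */
		uint64_t old = __atomic_load_n(entry, __ATOMIC_RELAXED);

		if (!(old & dirty_bit))
			return false;

		pthread_mutex_lock(&mmu_lock);
		/* 2. Revalidate under the lock; a parallel invalidate that
		 * ran in between will have changed the entry, so bail out. */
		if (__atomic_load_n(entry, __ATOMIC_RELAXED) != old) {
			pthread_mutex_unlock(&mmu_lock);
			return false;
		}
		/* 3. The entry is now stable: the invalidate path also takes
		 * mmu_lock, so it cannot change until we drop the lock. */
		__atomic_store_n(entry, old & ~dirty_bit, __ATOMIC_RELAXED);
		pthread_mutex_unlock(&mmu_lock);
		return true;
	}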