Re: [GIT PULL v1 10/12] KVM: s390: Fix cmma dirty tracking

Linux s390 Architecture development
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Claudio Imbrenda" <imbrenda@linux.ibm.com>
Cc: "Alexander Gordeev" <agordeev@linux.ibm.com>,
	linux-s390@vger.kernel.org, kvm@vger.kernel.org,
	"Heiko Carstens" <hca@linux.ibm.com>,
	"Vasily Gorbik" <gor@linux.ibm.com>
Subject: Re: [GIT PULL v1 10/12] KVM: s390: Fix cmma dirty tracking
Date: Wed, 24 Jun 2026 11:46:40 +0000	[thread overview]
Message-ID: <20260624114640.C7EBF1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260624111928.144283-11-imbrenda@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Race between `kvm_s390_vm_stop_migration()` and `handle_essa()` causes permanent corruption of the `cmma_dirty_pages` counter.
- [High] Unbounded guest address space scan in `dat_get_cmma()` while holding the `mmu_lock` read lock causes soft lockups.
- [High] The `cmma_dirty_pages` counter is not decremented when PGSTEs are zapped, leading to a permanent counter leak.

Pre-existing issues:
- [High] Decoupled locking in `__do_essa()` causes `cmma_dirty_pages` to temporarily underflow, exposing a massive `remaining` value to userspace.
- [High] `KVM_S390_GET_CMMA_BITS` leaks uninitialized kernel heap memory to userspace due to unpopulated array elements.
--

commit f98e5de8f917aa1f013fc20f1b092133a210644d
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix cmma dirty tracking

It is possible that some guest memory areas have not been touched yet
when starting migration mode, and thus have no ptes allocated. Only
existing and allocated ptes should count toward the total of dirty cmma
entries.

[ ... ]

> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index cffac7782c4bf..0ad4ebc80ebaf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1253,6 +1253,9 @@ int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values,
>  	};
>  
>  	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
> +	/* If no dirty pages were found, wrap around and continue searching */
> +	if (*start && state.start == -1)
> +		_dat_walk_gfn_range(0, *start, asce, &ops, DAT_WALK_IGN_HOLES, &state);

[Severity: High]
Can this wrap-around scan trigger a host soft lockup?

Since both the initial and wrap-around scans execute entirely under the
mmu_lock read lock, preemption is disabled. If a guest has a massive
memory address space that is fully populated but contains no dirty CMMA
pages, dat_get_cmma could iterate over millions of page table entries in
a tight loop without ever checking need_resched or yielding.

[Severity: High]
This is a pre-existing issue, but does dat_get_cmma leak uninitialized
kernel heap memory to userspace here?

When KVM_S390_GET_CMMA_BITS is invoked, the values array is allocated using
vmalloc without zero-initialization. If dat_get_cmma encounters clean pages
or unpopulated holes interleaved with dirty pages, __dat_get_cmma_pte skips
those array indices. The subsequent copy_to_user in kvm_s390_get_cmma_bits
then copies the entire contiguous array back to userspace, taking those
uninitialized vmalloc bytes with it.

>  	if (state.start == -1) {
>  		*count = 0;

[ ... ]

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 221b2fb199d4e..9ad6bd4edbce3 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
[ ... ]
> @@ -1210,28 +1210,54 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
>  	kvm_for_each_memslot(ms, bkt, slots) {
>  		if (!ms->dirty_bitmap)
>  			return -EINVAL;
> -		ram_pages += ms->npages;
>  	}
> -	/* mark all the pages as dirty */
> -	gmap_set_cmma_all_dirty(kvm->arch.gmap);
> -	atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
> -	kvm->arch.migration_mode = 1;
> +	/*
> +	 * Set the flag and let KVM handle ESSA manually, potentially setting
> +	 * the cmma_d bit in some PGSTEs and increasing cmma_dirty_pages.
> +	 * At this point cmma_dirty_pages is still 0, and all existing PGSTEs
> +	 * have their cmma_d bit set to 0.
> +	 * Any newly allocated page table has its entries marked as cmma-clean,
> +	 * which is fine because the CMMA values are not dirty.
> +	 */
> +	WRITE_ONCE(kvm->arch.migration_mode, 1);
>  	kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
> +	/*
> +	 * Mark all PGSTEs as cmma-dirty, increasing cmma_dirty_pages as needed,
> +	 * but without double-counting pages that have become dirty on their own
> +	 * in the meantime.
> +	 * At this point some pages might have become dirty on their own already
> +	 * and cmma_dirty_pages might therefore be non-zero.
> +	 */
> +	gmap_set_cmma_all_dirty(kvm->arch.gmap);

[Severity: High]
Could this lead to a permanent leak in the cmma_dirty_pages counter?

This establishes an invariant that the cmma_dirty_pages counter mirrors the
global number of PGSTEs with the cmma_d bit set. However, if host userspace
unmaps memory or deletes a memslot while migration mode is active,
dat_free_pt destroys the page tables and associated PGSTEs without
decrementing cmma_dirty_pages. This leaves the counter artificially inflated,
potentially trapping the VMM in an infinite migration loop.

>  	return 0;
>  }
>  
>  /*
> - * Must be called with kvm->slots_lock to avoid races with ourselves and
> - * kvm_s390_vm_start_migration.
> + * Must be called with kvm->slots_lock to avoid races with ourselves,
> + * kvm_s390_vm_start_migration() and kvm_s390_get_cmma_bits().
>   */
>  static int kvm_s390_vm_stop_migration(struct kvm *kvm)
>  {
>  	/* migration mode already disabled */
>  	if (!kvm->arch.migration_mode)
>  		return 0;
> -	kvm->arch.migration_mode = 0;
> +	/*
> +	 * Unset the flag and propagate to all vCPUs. From now on the cmma_d
> +	 * bit will not be touched on any PGSTE.
> +	 * At this point cmma_dirty_pages is possibly non-zero, and thus some
> +	 * PGSTEs might have cmma_d set.
> +	 */
> +	WRITE_ONCE(kvm->arch.migration_mode, 0);
>  	if (kvm->arch.use_cmma)
>  		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
> +	/* Clear cmma_d on all existing PGSTEs and set cmma_dirty_pages to 0. */
> +	gmap_set_cmma_all_clean(kvm->arch.gmap);
> +	atomic64_set(&kvm->arch.cmma_dirty_pages, 0);

[Severity: High]
Is there a race condition here with concurrent ESSA instructions?

kvm_s390_sync_request_broadcast does not wait for vCPUs currently executing
host code. If a vCPU executing handle_essa reads migration_mode as 1, it
will proceed to __do_essa. Because gmap_set_cmma_all_clean and __do_essa
can overlap, the stop-migration thread might clear cmma_dirty_pages to 0
right before __do_essa calls atomic64_inc. This would leave the counter at
1 with no actual dirty PGSTEs, breaking migration invariants.

> +	/*
> +	 * At this point the system has the expected state: migration_mode is 0,
> +	 * cmma_dirty_pages is 0, and all existing PGSTEs have their cmma_d bit
> +	 * set to 0.
> +	 */
>  	return 0;
>  }

[ ... ]

> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 9bc6fd02ff777..ad0ddc433a73c 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -1236,7 +1236,7 @@ static int handle_essa(struct kvm_vcpu *vcpu)
>  						: ESSA_SET_STABLE_IF_RESIDENT))
>  		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
>  
> -	if (!vcpu->kvm->arch.migration_mode) {
> +	if (!READ_ONCE(vcpu->kvm->arch.migration_mode)) {

[Severity: High]
This is a pre-existing issue, but does the decoupling of the PGSTE lock
and the global counter increment in __do_essa open a race window?

In __do_essa, dat_perform_essa sets the cmma_d bit under the PGSTE lock,
but the global cmma_dirty_pages counter is incremented outside of that
critical section. Concurrently, dat_get_cmma could acquire the same PGSTE
lock, see the bit set, clear it, and decrement cmma_dirty_pages before
__do_essa has incremented it. If the counter starts at 0, it temporarily
underflows to -1. If kvm_s390_get_cmma_bits reads it at that moment, it
exposes ULLONG_MAX to userspace as the remaining count.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624111928.144283-1-imbrenda@linux.ibm.com?part=10

next prev parent reply	other threads:[~2026-06-24 11:46 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24 11:19 [GIT PULL v1 00/12] KVM: s390: Fix S390_USER_OPEREXEC and more gmap fixes Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 01/12] KVM: s390: Fix S390_USER_OPEREXEC enablement without STFLE 74 Claudio Imbrenda
2026-06-24 11:40   ` sashiko-bot
2026-06-24 11:19 ` [GIT PULL v1 02/12] KVM: s390: selftests: Extended user_operexec tests Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 03/12] KVM: s390: Fix typo in UCONTROL documentation Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 04/12] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
2026-06-24 11:42   ` sashiko-bot
2026-06-24 11:19 ` [GIT PULL v1 05/12] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
2026-06-24 11:38   ` sashiko-bot
2026-06-24 11:19 ` [GIT PULL v1 06/12] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
2026-06-24 11:37   ` sashiko-bot
2026-06-24 11:19 ` [GIT PULL v1 07/12] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 08/12] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 09/12] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 10/12] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
2026-06-24 11:46   ` sashiko-bot [this message]
2026-06-24 11:19 ` [GIT PULL v1 11/12] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
2026-06-24 11:19 ` [GIT PULL v1 12/12] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda
2026-06-24 16:56 ` [GIT PULL v1 00/12] KVM: s390: Fix S390_USER_OPEREXEC and more gmap fixes Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260624114640.C7EBF1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox