Kernel KVM virtualization development
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Claudio Imbrenda" <imbrenda@linux.ibm.com>
Cc: linux-s390@vger.kernel.org, "Heiko Carstens" <hca@linux.ibm.com>,
	"Vasily Gorbik" <gor@linux.ibm.com>,
	"Alexander Gordeev" <agordeev@linux.ibm.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking
Date: Tue, 23 Jun 2026 15:50:27 +0000	[thread overview]
Message-ID: <20260623155028.CF4821F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623153331.233784-8-imbrenda@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Race condition between kvm_s390_vm_stop_migration() and handle_essa() causes cmma_dirty_pages undercounting and data loss.
- [Medium] Missing curly braces around the kvm->arch.use_cmma condition in kvm_s390_vm_stop_migration() causes an unconditional, highly expensive guest page table walk.

Pre-existing issues:
- [High] Page table walk skips pages and silently overwrites return values in dat_crste_walk_range(), breaking dat_test_age_gfn() and dat_reset_skeys().
- [High] Uninitialized vmalloc memory is copied to user space in kvm_s390_get_cmma_bits(), leaking kernel memory.
- [High] The CMMA dirty tracking allows native hardware execution of ESSA instructions during the full memory walk, potentially causing dirty CMMA state to be missed during live migration.
--

commit 9bad94601846b4f7b13bcf1ade41fc1b47e4dedf
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix cmma dirty tracking

This commit modifies the CMMA dirty tracking to set the migration_mode flag
before marking all existing PGSTEs as cmma-dirty. It also attempts to clear
the cmma_d bit and reset the dirty page count when stopping migration, and
addresses a wrap-around case in dat_get_cmma() when searching for pages.

> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index cffac7782c4bf..0ad4ebc80ebaf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1253,6 +1253,9 @@ int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values,
>  	};
>  
>  	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);

[Severity: High]
This is a pre-existing issue, but does dat_get_cmma() leave memory uninitialized
when skipping clean pages?

Looking at kvm_s390_get_cmma_bits(), the values array is allocated using
vmalloc(args->count), which leaves the memory uninitialized. When
dat_get_cmma() writes the dirty states, it skips clean pages (cmma_d == 0)
and unmapped holes without writing to those indices. Does the subsequent
copy_to_user() in kvm_s390_get_cmma_bits() then leak this uninitialized
kernel memory to user space?

[Severity: High]
This is also a pre-existing issue, but does the page table walk in
dat_crste_walk_range() silently overwrite return values?

When _dat_walk_gfn_range() calls dat_crste_walk_range(), the loop logic
places the if (rc) break; condition before the child walk is checked:

arch/s390/kvm/dat.c:dat_crste_walk_range() {
    ...
    the_op = walk->ops->crste_ops[crste.h.tt];
    if (the_op) {
        rc = the_op(walk->last, cur, next, walk);
        ...
    }
    if (rc)
        break;
    if (!crste.h.i && !crste.h.fc) {
        if (!is_pmd(crste))
            rc = dat_crste_walk_range(...);
        else if (walk->ops->pte_entry)
            rc = dat_pte_walk_range(...);
    }
}

When dat_pte_walk_range() returns a positive rc, the loop proceeds to the
next iteration. If the_op executes (for example, _dat_test_young_crste
returning 0), does it overwrite the previous positive rc, causing the loop
to fail to break and ignoring yield requests or young-page hits?

[ ... ]

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fe0ae15e5ad07..91cf7a3c55c79 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1210,28 +1210,54 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
>  	kvm_for_each_memslot(ms, bkt, slots) {
>  		if (!ms->dirty_bitmap)
>  			return -EINVAL;
> -		ram_pages += ms->npages;
>  	}
> -	/* mark all the pages as dirty */
> +	/*
> +	 * Set the flag and let KVM handle ESSA manually, potentially setting
> +	 * the cmma_d bit in some PGSTEs and increasing cmma_dirty_pages.
> +	 * At this point cmma_dirty_pages is still 0, and all existing PGSTEs
> +	 * have their cmma_d bit set to 0.
> +	 * Any newly allocated page table has its entries marked as cmma-clean,
> +	 * which is fine because the CMMA values are not dirty.
> +	 */
> +	WRITE_ONCE(kvm->arch.migration_mode, 1);
> +	/*
> +	 * Mark all PGSTEs as cmma-dirty, increasing cmma_dirty_pages as needed,
> +	 * but without double-counting pages that have become dirty on their own
> +	 * in the meantime.
> +	 * At this point some pages might have become dirty on their own already
> +	 * and cmma_dirty_pages might therefore be non-zero.
> +	 */
>  	gmap_set_cmma_all_dirty(kvm->arch.gmap);
> -	atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
> -	kvm->arch.migration_mode = 1;
>  	kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);

[Severity: High]
This is a pre-existing issue, but does this sequence allow native hardware
execution of ESSA instructions during the full memory walk?

Since the KVM_REQ_START_MIGRATION broadcast (which clears ECB2_CMMA to
disable hardware ESSA execution) is issued after gmap_set_cmma_all_dirty()
completes, a vCPU could execute ESSA natively on a newly allocated page
that lies behind the current walk pointer. Since newly allocated PGSTEs
start with cmma_d = 0 and the walk has already passed them, could this
cause dirty CMMA state to be missed during live migration?

>  	return 0;
>  }
>  
>  /*
> - * Must be called with kvm->slots_lock to avoid races with ourselves and
> - * kvm_s390_vm_start_migration.
> + * Must be called with kvm->slots_lock to avoid races with ourselves,
> + * kvm_s390_vm_start_migration() and kvm_s390_get_cmma_bits().
>   */
>  static int kvm_s390_vm_stop_migration(struct kvm *kvm)
>  {
>  	/* migration mode already disabled */
>  	if (!kvm->arch.migration_mode)
>  		return 0;
> -	kvm->arch.migration_mode = 0;
> +	/*
> +	 * Unset the flag and propagate to all vCPUs. From now on the cmma_d
> +	 * bit will not be touched on any PGSTE.
> +	 * At this point cmma_dirty_pages is possibly non-zero, and thus some
> +	 * PGSTEs might have cmma_d set.
> +	 */
> +	WRITE_ONCE(kvm->arch.migration_mode, 0);
>  	if (kvm->arch.use_cmma)
>  		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
> +	/* Clear cmma_d on all existing PGSTEs and set cmma_dirty_pages to 0. */
> +	gmap_set_cmma_all_clean(kvm->arch.gmap);
> +	atomic64_set(&kvm->arch.cmma_dirty_pages, 0);

[Severity: Medium]
Are curly braces missing around the kvm->arch.use_cmma condition?

Because they are absent, gmap_set_cmma_all_clean() and atomic64_set()
execute unconditionally. For VMs where use_cmma is false, this seems to force
an unnecessary full walk of the entire guest page table under mmu_lock to
clear bits that were never set, which could stall the host and VM.

[Severity: High]
Can a race condition between kvm_s390_vm_stop_migration() and handle_essa()
cause cmma_dirty_pages to undercount?

If handle_essa() reads migration_mode locklessly and enters __do_essa()
concurrently with kvm_s390_vm_stop_migration():

CPU1 (kvm_s390_vm_stop_migration):
    gmap_set_cmma_all_clean(kvm->arch.gmap);

CPU2 (handle_essa -> __do_essa):
    Sets cmma_d = 1 on a PTE that CPU1 already cleaned
    atomic64_inc(&kvm->arch.cmma_dirty_pages);

CPU1 (kvm_s390_vm_stop_migration):
    atomic64_set(&kvm->arch.cmma_dirty_pages, 0);

Would this unconditional reset to 0 overwrite CPU2's increment, leaving a
PTE with cmma_d == 1 but cmma_dirty_pages == 0, leading to premature stops
in dat_get_cmma() and data loss during the next migration?

> +	/*
> +	 * At this point the system has the expected state: migration_mode is 0,
> +	 * cmma_dirty_pages is 0, and all existing PGSTEs have their cmma_d bit
> +	 * set to 0.
> +	 */
>  	return 0;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623153331.233784-1-imbrenda@linux.ibm.com?part=7

  reply	other threads:[~2026-06-23 15:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 15:33 [PATCH v6 00/10] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 01/10] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
2026-06-23 15:54   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 02/10] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
2026-06-23 15:49   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 03/10] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
2026-06-23 15:55   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 04/10] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 05/10] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 06/10] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
2026-06-23 15:49   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
2026-06-23 15:50   ` sashiko-bot [this message]
2026-06-23 15:33 ` [PATCH v6 08/10] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 09/10] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 10/10] KVM: s390: vsie: Avoid potential deadlock with real spaces Claudio Imbrenda
2026-06-23 15:51   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623155028.CF4821F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox