Re: [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Claudio Imbrenda" <imbrenda@linux.ibm.com>
Cc: linux-s390@vger.kernel.org, "Heiko Carstens" <hca@linux.ibm.com>,
	"Vasily Gorbik" <gor@linux.ibm.com>,
	"Alexander Gordeev" <agordeev@linux.ibm.com>,
	kvm@vger.kernel.org
Subject: Re: [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking
Date: Tue, 23 Jun 2026 15:50:27 +0000	[thread overview]
Message-ID: <20260623155028.CF4821F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623153331.233784-8-imbrenda@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Race condition between kvm_s390_vm_stop_migration() and handle_essa() causes cmma_dirty_pages undercounting and data loss.
- [Medium] Missing curly braces around the kvm->arch.use_cmma condition in kvm_s390_vm_stop_migration() causes an unconditional, highly expensive guest page table walk.

Pre-existing issues:
- [High] Page table walk skips pages and silently overwrites return values in dat_crste_walk_range(), breaking dat_test_age_gfn() and dat_reset_skeys().
- [High] Uninitialized vmalloc memory is copied to user space in kvm_s390_get_cmma_bits(), leaking kernel memory.
- [High] The CMMA dirty tracking allows native hardware execution of ESSA instructions during the full memory walk, potentially causing dirty CMMA state to be missed during live migration.
--

commit 9bad94601846b4f7b13bcf1ade41fc1b47e4dedf
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>

KVM: s390: Fix cmma dirty tracking

This commit modifies the CMMA dirty tracking to set the migration_mode flag
before marking all existing PGSTEs as cmma-dirty. It also attempts to clear
the cmma_d bit and reset the dirty page count when stopping migration, and
addresses a wrap-around case in dat_get_cmma() when searching for pages.

> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index cffac7782c4bf..0ad4ebc80ebaf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1253,6 +1253,9 @@ int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values,
>  	};
>  
>  	_dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);

[Severity: High]
This is a pre-existing issue, but does dat_get_cmma() leave memory uninitialized
when skipping clean pages?

Looking at kvm_s390_get_cmma_bits(), the values array is allocated using
vmalloc(args->count), which leaves the memory uninitialized. When
dat_get_cmma() writes the dirty states, it skips clean pages (cmma_d == 0)
and unmapped holes without writing to those indices. Does the subsequent
copy_to_user() in kvm_s390_get_cmma_bits() then leak this uninitialized
kernel memory to user space?

[Severity: High]
This is also a pre-existing issue, but does the page table walk in
dat_crste_walk_range() silently overwrite return values?

When _dat_walk_gfn_range() calls dat_crste_walk_range(), the loop logic
places the if (rc) break; condition before the child walk is checked:

arch/s390/kvm/dat.c:dat_crste_walk_range() {
    ...
    the_op = walk->ops->crste_ops[crste.h.tt];
    if (the_op) {
        rc = the_op(walk->last, cur, next, walk);
        ...
    }
    if (rc)
        break;
    if (!crste.h.i && !crste.h.fc) {
        if (!is_pmd(crste))
            rc = dat_crste_walk_range(...);
        else if (walk->ops->pte_entry)
            rc = dat_pte_walk_range(...);
    }
}

When dat_pte_walk_range() returns a positive rc, the loop proceeds to the
next iteration. If the_op executes (for example, _dat_test_young_crste
returning 0), does it overwrite the previous positive rc, causing the loop
to fail to break and ignoring yield requests or young-page hits?

[ ... ]

> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fe0ae15e5ad07..91cf7a3c55c79 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1210,28 +1210,54 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
>  	kvm_for_each_memslot(ms, bkt, slots) {
>  		if (!ms->dirty_bitmap)
>  			return -EINVAL;
> -		ram_pages += ms->npages;
>  	}
> -	/* mark all the pages as dirty */
> +	/*
> +	 * Set the flag and let KVM handle ESSA manually, potentially setting
> +	 * the cmma_d bit in some PGSTEs and increasing cmma_dirty_pages.
> +	 * At this point cmma_dirty_pages is still 0, and all existing PGSTEs
> +	 * have their cmma_d bit set to 0.
> +	 * Any newly allocated page table has its entries marked as cmma-clean,
> +	 * which is fine because the CMMA values are not dirty.
> +	 */
> +	WRITE_ONCE(kvm->arch.migration_mode, 1);
> +	/*
> +	 * Mark all PGSTEs as cmma-dirty, increasing cmma_dirty_pages as needed,
> +	 * but without double-counting pages that have become dirty on their own
> +	 * in the meantime.
> +	 * At this point some pages might have become dirty on their own already
> +	 * and cmma_dirty_pages might therefore be non-zero.
> +	 */
>  	gmap_set_cmma_all_dirty(kvm->arch.gmap);
> -	atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
> -	kvm->arch.migration_mode = 1;
>  	kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);

[Severity: High]
This is a pre-existing issue, but does this sequence allow native hardware
execution of ESSA instructions during the full memory walk?

Since the KVM_REQ_START_MIGRATION broadcast (which clears ECB2_CMMA to
disable hardware ESSA execution) is issued after gmap_set_cmma_all_dirty()
completes, a vCPU could execute ESSA natively on a newly allocated page
that lies behind the current walk pointer. Since newly allocated PGSTEs
start with cmma_d = 0 and the walk has already passed them, could this
cause dirty CMMA state to be missed during live migration?

>  	return 0;
>  }
>  
>  /*
> - * Must be called with kvm->slots_lock to avoid races with ourselves and
> - * kvm_s390_vm_start_migration.
> + * Must be called with kvm->slots_lock to avoid races with ourselves,
> + * kvm_s390_vm_start_migration() and kvm_s390_get_cmma_bits().
>   */
>  static int kvm_s390_vm_stop_migration(struct kvm *kvm)
>  {
>  	/* migration mode already disabled */
>  	if (!kvm->arch.migration_mode)
>  		return 0;
> -	kvm->arch.migration_mode = 0;
> +	/*
> +	 * Unset the flag and propagate to all vCPUs. From now on the cmma_d
> +	 * bit will not be touched on any PGSTE.
> +	 * At this point cmma_dirty_pages is possibly non-zero, and thus some
> +	 * PGSTEs might have cmma_d set.
> +	 */
> +	WRITE_ONCE(kvm->arch.migration_mode, 0);
>  	if (kvm->arch.use_cmma)
>  		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
> +	/* Clear cmma_d on all existing PGSTEs and set cmma_dirty_pages to 0. */
> +	gmap_set_cmma_all_clean(kvm->arch.gmap);
> +	atomic64_set(&kvm->arch.cmma_dirty_pages, 0);

[Severity: Medium]
Are curly braces missing around the kvm->arch.use_cmma condition?

Because they are absent, gmap_set_cmma_all_clean() and atomic64_set()
execute unconditionally. For VMs where use_cmma is false, this seems to force
an unnecessary full walk of the entire guest page table under mmu_lock to
clear bits that were never set, which could stall the host and VM.

[Severity: High]
Can a race condition between kvm_s390_vm_stop_migration() and handle_essa()
cause cmma_dirty_pages to undercount?

If handle_essa() reads migration_mode locklessly and enters __do_essa()
concurrently with kvm_s390_vm_stop_migration():

CPU1 (kvm_s390_vm_stop_migration):
    gmap_set_cmma_all_clean(kvm->arch.gmap);

CPU2 (handle_essa -> __do_essa):
    Sets cmma_d = 1 on a PTE that CPU1 already cleaned
    atomic64_inc(&kvm->arch.cmma_dirty_pages);

CPU1 (kvm_s390_vm_stop_migration):
    atomic64_set(&kvm->arch.cmma_dirty_pages, 0);

Would this unconditional reset to 0 overwrite CPU2's increment, leaving a
PTE with cmma_d == 1 but cmma_dirty_pages == 0, leading to premature stops
in dat_get_cmma() and data loss during the next migration?

> +	/*
> +	 * At this point the system has the expected state: migration_mode is 0,
> +	 * cmma_dirty_pages is 0, and all existing PGSTEs have their cmma_d bit
> +	 * set to 0.
> +	 */
>  	return 0;
>  }

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623153331.233784-1-imbrenda@linux.ibm.com?part=7

next prev parent reply	other threads:[~2026-06-23 15:50 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-23 15:33 [PATCH v6 00/10] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 01/10] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
2026-06-23 15:54   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 02/10] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
2026-06-23 15:49   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 03/10] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
2026-06-23 15:55   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 04/10] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 05/10] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 06/10] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
2026-06-23 15:49   ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
2026-06-23 15:50   ` sashiko-bot [this message]
2026-06-23 15:33 ` [PATCH v6 08/10] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 09/10] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 10/10] KVM: s390: vsie: Avoid potential deadlock with real spaces Claudio Imbrenda
2026-06-23 15:51   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260623155028.CF4821F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=agordeev@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=imbrenda@linux.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.