From: sashiko-bot@kernel.org
To: "Claudio Imbrenda" <imbrenda@linux.ibm.com>
Cc: linux-s390@vger.kernel.org, "Heiko Carstens" <hca@linux.ibm.com>,
"Vasily Gorbik" <gor@linux.ibm.com>,
"Alexander Gordeev" <agordeev@linux.ibm.com>,
kvm@vger.kernel.org
Subject: Re: [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking
Date: Tue, 23 Jun 2026 15:50:27 +0000 [thread overview]
Message-ID: <20260623155028.CF4821F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260623153331.233784-8-imbrenda@linux.ibm.com>
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [High] Race condition between kvm_s390_vm_stop_migration() and handle_essa() causes cmma_dirty_pages undercounting and data loss.
- [Medium] Missing curly braces around the kvm->arch.use_cmma condition in kvm_s390_vm_stop_migration() causes an unconditional, highly expensive guest page table walk.
Pre-existing issues:
- [High] Page table walk skips pages and silently overwrites return values in dat_crste_walk_range(), breaking dat_test_age_gfn() and dat_reset_skeys().
- [High] Uninitialized vmalloc memory is copied to user space in kvm_s390_get_cmma_bits(), leaking kernel memory.
- [High] The CMMA dirty tracking allows native hardware execution of ESSA instructions during the full memory walk, potentially causing dirty CMMA state to be missed during live migration.
--
commit 9bad94601846b4f7b13bcf1ade41fc1b47e4dedf
Author: Claudio Imbrenda <imbrenda@linux.ibm.com>
KVM: s390: Fix cmma dirty tracking
This commit modifies the CMMA dirty tracking to set the migration_mode flag
before marking all existing PGSTEs as cmma-dirty. It also attempts to clear
the cmma_d bit and reset the dirty page count when stopping migration, and
addresses a wrap-around case in dat_get_cmma() when searching for pages.
> diff --git a/arch/s390/kvm/dat.c b/arch/s390/kvm/dat.c
> index cffac7782c4bf..0ad4ebc80ebaf 100644
> --- a/arch/s390/kvm/dat.c
> +++ b/arch/s390/kvm/dat.c
> @@ -1253,6 +1253,9 @@ int dat_get_cmma(union asce asce, gfn_t *start, unsigned int *count, u8 *values,
> };
>
> _dat_walk_gfn_range(*start, asce_end(asce), asce, &ops, DAT_WALK_IGN_HOLES, &state);
[Severity: High]
This is a pre-existing issue, but does dat_get_cmma() leave memory uninitialized
when skipping clean pages?
Looking at kvm_s390_get_cmma_bits(), the values array is allocated using
vmalloc(args->count), which leaves the memory uninitialized. When
dat_get_cmma() writes the dirty states, it skips clean pages (cmma_d == 0)
and unmapped holes without writing to those indices. Does the subsequent
copy_to_user() in kvm_s390_get_cmma_bits() then leak this uninitialized
kernel memory to user space?
[Severity: High]
This is also a pre-existing issue, but does the page table walk in
dat_crste_walk_range() silently overwrite return values?
When _dat_walk_gfn_range() calls dat_crste_walk_range(), the loop logic
places the if (rc) break; condition before the child walk is checked:
arch/s390/kvm/dat.c:dat_crste_walk_range() {
...
the_op = walk->ops->crste_ops[crste.h.tt];
if (the_op) {
rc = the_op(walk->last, cur, next, walk);
...
}
if (rc)
break;
if (!crste.h.i && !crste.h.fc) {
if (!is_pmd(crste))
rc = dat_crste_walk_range(...);
else if (walk->ops->pte_entry)
rc = dat_pte_walk_range(...);
}
}
When dat_pte_walk_range() returns a positive rc, the loop proceeds to the
next iteration. If the_op executes (for example, _dat_test_young_crste
returning 0), does it overwrite the previous positive rc, causing the loop
to fail to break and ignoring yield requests or young-page hits?
[ ... ]
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fe0ae15e5ad07..91cf7a3c55c79 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -1210,28 +1210,54 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
> kvm_for_each_memslot(ms, bkt, slots) {
> if (!ms->dirty_bitmap)
> return -EINVAL;
> - ram_pages += ms->npages;
> }
> - /* mark all the pages as dirty */
> + /*
> + * Set the flag and let KVM handle ESSA manually, potentially setting
> + * the cmma_d bit in some PGSTEs and increasing cmma_dirty_pages.
> + * At this point cmma_dirty_pages is still 0, and all existing PGSTEs
> + * have their cmma_d bit set to 0.
> + * Any newly allocated page table has its entries marked as cmma-clean,
> + * which is fine because the CMMA values are not dirty.
> + */
> + WRITE_ONCE(kvm->arch.migration_mode, 1);
> + /*
> + * Mark all PGSTEs as cmma-dirty, increasing cmma_dirty_pages as needed,
> + * but without double-counting pages that have become dirty on their own
> + * in the meantime.
> + * At this point some pages might have become dirty on their own already
> + * and cmma_dirty_pages might therefore be non-zero.
> + */
> gmap_set_cmma_all_dirty(kvm->arch.gmap);
> - atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
> - kvm->arch.migration_mode = 1;
> kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
[Severity: High]
This is a pre-existing issue, but does this sequence allow native hardware
execution of ESSA instructions during the full memory walk?
Since the KVM_REQ_START_MIGRATION broadcast (which clears ECB2_CMMA to
disable hardware ESSA execution) is issued after gmap_set_cmma_all_dirty()
completes, a vCPU could execute ESSA natively on a newly allocated page
that lies behind the current walk pointer. Since newly allocated PGSTEs
start with cmma_d = 0 and the walk has already passed them, could this
cause dirty CMMA state to be missed during live migration?
> return 0;
> }
>
> /*
> - * Must be called with kvm->slots_lock to avoid races with ourselves and
> - * kvm_s390_vm_start_migration.
> + * Must be called with kvm->slots_lock to avoid races with ourselves,
> + * kvm_s390_vm_start_migration() and kvm_s390_get_cmma_bits().
> */
> static int kvm_s390_vm_stop_migration(struct kvm *kvm)
> {
> /* migration mode already disabled */
> if (!kvm->arch.migration_mode)
> return 0;
> - kvm->arch.migration_mode = 0;
> + /*
> + * Unset the flag and propagate to all vCPUs. From now on the cmma_d
> + * bit will not be touched on any PGSTE.
> + * At this point cmma_dirty_pages is possibly non-zero, and thus some
> + * PGSTEs might have cmma_d set.
> + */
> + WRITE_ONCE(kvm->arch.migration_mode, 0);
> if (kvm->arch.use_cmma)
> kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
> + /* Clear cmma_d on all existing PGSTEs and set cmma_dirty_pages to 0. */
> + gmap_set_cmma_all_clean(kvm->arch.gmap);
> + atomic64_set(&kvm->arch.cmma_dirty_pages, 0);
[Severity: Medium]
Are curly braces missing around the kvm->arch.use_cmma condition?
Because they are absent, gmap_set_cmma_all_clean() and atomic64_set()
execute unconditionally. For VMs where use_cmma is false, this seems to force
an unnecessary full walk of the entire guest page table under mmu_lock to
clear bits that were never set, which could stall the host and VM.
[Severity: High]
Can a race condition between kvm_s390_vm_stop_migration() and handle_essa()
cause cmma_dirty_pages to undercount?
If handle_essa() reads migration_mode locklessly and enters __do_essa()
concurrently with kvm_s390_vm_stop_migration():
CPU1 (kvm_s390_vm_stop_migration):
gmap_set_cmma_all_clean(kvm->arch.gmap);
CPU2 (handle_essa -> __do_essa):
Sets cmma_d = 1 on a PTE that CPU1 already cleaned
atomic64_inc(&kvm->arch.cmma_dirty_pages);
CPU1 (kvm_s390_vm_stop_migration):
atomic64_set(&kvm->arch.cmma_dirty_pages, 0);
Would this unconditional reset to 0 overwrite CPU2's increment, leaving a
PTE with cmma_d == 1 but cmma_dirty_pages == 0, leading to premature stops
in dat_get_cmma() and data loss during the next migration?
> + /*
> + * At this point the system has the expected state: migration_mode is 0,
> + * cmma_dirty_pages is 0, and all existing PGSTEs have their cmma_d bit
> + * set to 0.
> + */
> return 0;
> }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260623153331.233784-1-imbrenda@linux.ibm.com?part=7
next prev parent reply other threads:[~2026-06-23 15:50 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-23 15:33 [PATCH v6 00/10] KVM: s390: A bunch of gmap-related fixes Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 01/10] s390/mm: Fix handling of _PAGE_UNUSED pte bit Claudio Imbrenda
2026-06-23 15:54 ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 02/10] KVM: s390: Fix dat_peek_cmma() overflow Claudio Imbrenda
2026-06-23 15:49 ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 03/10] KVM: s390: Do not set special large pages dirty Claudio Imbrenda
2026-06-23 15:55 ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 04/10] KVM: s390: Fix code typo in gmap_protect_asce_top_level() Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 05/10] KVM: s390: Fix handle_{sske,pfmf} under memory pressure Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 06/10] KVM: s390: Fix locking in kvm_s390_set_mem_control() Claudio Imbrenda
2026-06-23 15:49 ` sashiko-bot
2026-06-23 15:33 ` [PATCH v6 07/10] KVM: s390: Fix cmma dirty tracking Claudio Imbrenda
2026-06-23 15:50 ` sashiko-bot [this message]
2026-06-23 15:33 ` [PATCH v6 08/10] KVM: s390: selftests: Fix cmma selftest Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 09/10] KVM: s390: Return failure in case of failure in kvm_s390_set_cmma_bits() Claudio Imbrenda
2026-06-23 15:33 ` [PATCH v6 10/10] KVM: s390: vsie: Avoid potential deadlock with real spaces Claudio Imbrenda
2026-06-23 15:51 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260623155028.CF4821F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=agordeev@linux.ibm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=imbrenda@linux.ibm.com \
--cc=kvm@vger.kernel.org \
--cc=linux-s390@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox