From: Janosch Frank
To: Claudio Imbrenda, kvm@vger.kernel.org
Cc: borntraeger@de.ibm.com, cohuck@redhat.com, pbonzini@redhat.com, david@redhat.com, schwidefsky@de.ibm.com
Subject: Re: [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
Date: Wed, 17 Jan 2018 14:01:11 +0100
References: <1516035791-10609-1-git-send-email-imbrenda@linux.vnet.ibm.com> <1516035791-10609-3-git-send-email-imbrenda@linux.vnet.ibm.com>
In-Reply-To: <1516035791-10609-3-git-send-email-imbrenda@linux.vnet.ibm.com>

On 15.01.2018 18:03, Claudio Imbrenda wrote:
> This is a fix for several issues that were found in the original code
> for storage attributes migration.
>
> Now no bitmap is allocated to keep track of dirty storage attributes;
> the extra bits of the per-memslot bitmap that are always present anyway
> are now used for this purpose.
>
> The code has also been refactored a little to improve readability.

Second pass.

>
> Signed-off-by: Claudio Imbrenda
> Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
> Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes")
> ---
>
> +static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
> +			      u8 *res, unsigned long bufsize)
> +{
> +	unsigned long pgstev, cur, hva, i = 0;
> +	int r, ret = 0;
> +
> +	cur = args->start_gfn;

s/cur/cur_gfn/ in a lot of places.
> +	while (i < bufsize) {
> +		hva = gfn_to_hva(kvm, cur);
> +		if (kvm_is_error_hva(hva)) {
> +			if (!i)
> +				ret = -EFAULT;

A short comment would help here, something like: if we fail right away,
return an error; if we hit the fault only after successful reads, pass
the successes down.

> +			break;
> +		}
> +		r = get_pgste(kvm->mm, hva, &pgstev);
> +		if (r < 0)
> +			pgstev = 0;
> +		res[i++] = (pgstev >> 24) & 0x43;
> +		cur++;
> +	}
> +	args->count = i;

If you feel comfortable with it, you could count with args->count
directly.

> +
> +	return ret;
> +}
> +
> +static unsigned long kvm_s390_next_dirty_cmma(struct kvm *kvm,
> +					      unsigned long cur)
> +{
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *ms;
> +	int slotidx;
> +
> +	slotidx = gfn_to_memslot_approx(kvm, cur);
> +	ms = slots->memslots + slotidx;

I prefer proper indexing: ms = &slots->memslots[slotidx];

> +
> +	if (ms->base_gfn + ms->npages <= cur) {

That's the hole handling, right?

> +		slotidx--;
> +		/* If we are above the highest slot, wrap around */
> +		if (slotidx < 0)
> +			slotidx = slots->used_slots - 1;
> +
> +		ms = slots->memslots + slotidx;
> +		cur = ms->base_gfn;
> +	}
> +	cur = find_next_bit(_cmma_bitmap(ms), ms->npages, cur - ms->base_gfn);

So cur is now the index of the next set bit, right? I do not like that
mixing of gfns and bit offsets. Your next dirty bit is in another
memslot. :)

> +	while ((slotidx > 0) && (cur >= ms->npages)) {
> +		slotidx--;
> +		ms = slots->memslots + slotidx;
> +		cur = find_next_bit(_cmma_bitmap(ms), ms->npages,
> +				    cur - ms->base_gfn);
> +	}
> +	return cur + ms->base_gfn;

I'd switch the summands (ms->base_gfn + cur), personal preference.
> +}
> +
> +static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
> +			     u8 *res, unsigned long bufsize)
> +{
> +	unsigned long next, mem_end, cur, hva, pgstev, i = 0;
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *ms;
> +	int r, ret = 0;
> +
> +	cur = kvm_s390_next_dirty_cmma(kvm, args->start_gfn);
> +	ms = gfn_to_memslot(kvm, cur);
> +	args->count = 0;
> +	args->start_gfn = 0;
> +	if (!ms)
> +		return 0;
> +	next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
> +	mem_end = slots->memslots[0].base_gfn + slots->memslots[0].npages;
> +
> +	args->start_gfn = cur;
> +	while (i < bufsize) {
> +		hva = gfn_to_hva(kvm, cur);
> +		if (kvm_is_error_hva(hva))
> +			break;
> +		/* decrement only if we actually flipped the bit to 0 */
> +		if (test_and_clear_bit(cur - ms->base_gfn, _cmma_bitmap(ms)))
> +			atomic64_dec(&kvm->arch.cmma_dirty_pages);
> +		r = get_pgste(kvm->mm, hva, &pgstev);
> +		if (r < 0)
> +			pgstev = 0;
> +		/* save the value */
> +		res[i++] = (pgstev >> 24) & 0x43;
> +		/*
> +		 * if the next bit is too far away, stop.
> +		 * if we reached the previous "next", find the next one
> +		 */
> +		if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
> +			break;
> +		if (cur == next)
> +			next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
> +		/* reached the end of the bitmap or of the buffer, stop */
> +		if ((next >= mem_end) || (next - args->start_gfn >= bufsize))
> +			break;
> +		cur++;
> +		if (cur - ms->base_gfn >= ms->npages) {
> +			ms = gfn_to_memslot(kvm, cur);
> +			if (!ms)
> +				break;
> +		}
> +	}
> +	args->count = i;
> +	return ret;

s/ret/0/

> +}
> +
>  /*
>   * This function searches for the next page with dirty CMMA attributes, and
>   * saves the attributes in the buffer up to either the end of the buffer or
> @@ -1562,90 +1647,47 @@ static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
>  static int kvm_s390_get_cmma_bits(struct kvm *kvm,
>  				  struct kvm_s390_cmma_log *args)
>  {
> -	struct kvm_s390_migration_state *s = kvm->arch.migration_state;
> -	unsigned long bufsize, hva, pgstev, i, next, cur;
> -	int srcu_idx, peek, r = 0, rr;
> +	unsigned long bufsize;
> +	int srcu_idx, peek, s, rr, r = 0;
>  	u8 *res;
>
> -	cur = args->start_gfn;
> -	i = next = pgstev = 0;
> +	s = atomic_read(&kvm->arch.migration_mode);

s? Please do not overuse single-character variable names!

> +	if (peek)
> +		r = kvm_s390_peek_cmma(kvm, args, res, bufsize);
> +	else
> +		r = kvm_s390_get_cmma(kvm, args, res, bufsize);
>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>  	up_read(&kvm->mm->mmap_sem);
> -	args->count = i;
> -	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
> +
> +	args->remaining = s ? atomic64_read(&kvm->arch.cmma_dirty_pages) : 0;

I didn't yet have time to read the QEMU part, so this might be trivial,
but what if s == 0 and we didn't transfer all pgstes with this
invocation?
There would be leftover data which would not be retrieved.

>
>  	rr = copy_to_user((void __user *)args->values, res, args->count);
>  	if (rr)

Please get rid of rr, we don't really need it here.

> @@ -2096,10 +2138,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  	kvm_s390_destroy_adapters(kvm);
>  	kvm_s390_clear_float_irqs(kvm);
>  	kvm_s390_vsie_destroy(kvm);
> -	if (kvm->arch.migration_state) {
> -		vfree(kvm->arch.migration_state->pgste_bitmap);
> -		kfree(kvm->arch.migration_state);
> -	}
>  	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
>  }
>
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 572496c..321d6b2 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -954,9 +954,11 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
>
> -static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
> +/*
> + * Must be called with relevant read locks held (kvm->mm->mmap_sem, kvm->srcu)
> + */
> +static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc)
>  {
> -	struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
>  	int r1, r2, nappended, entries;
>  	unsigned long gfn, hva, res, pgstev, ptev;
>  	unsigned long *cbrlo;
> @@ -965,7 +967,6 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
>  	 * We don't need to set SD.FPF.SK to 1 here, because if we have a
>  	 * machine check here we either handle it or crash
>  	 */
> -
>  	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
>  	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
>  	hva = gfn_to_hva(vcpu->kvm, gfn);
> @@ -1007,9 +1008,17 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
>  	}
>
>  	if (orc) {
> -		/* increment only if we are really flipping the bit to 1 */
> -		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
> -			atomic64_inc(&ms->dirty_pages);
> +		struct kvm_memory_slot *ms = gfn_to_memslot(vcpu->kvm, gfn);
> +		unsigned long *bm;
> +
> +		if (ms) {
> +			/* The cmma bitmap is right after the memory bitmap */
> +			bm =
ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms)
> +				/ sizeof(*ms->dirty_bitmap);

Hrm, maybe we could put the utility function in kvm-s390.h so we can
also use it here.