From: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
To: 王金浦 <jinpuwang@gmail.com>
Cc: KVM list <kvm@vger.kernel.org>, nik@linuxbox.cz, stable@vger.kernel.org
Subject: Re: 4.14.18 -> 4.14.24 - almost all guests hanged
Date: Wed, 7 Mar 2018 21:29:10 +0100
Message-ID: <20180307202910.GA1527@localhost.localdomain>
In-Reply-To: <20180307145623.GH28488@pcnci.linuxbox.cz>

Hi,

> > > I'd like to report that when upgrading our cluster from 4.14.18 to
> > > 4.14.24-rc1 (with live guest migration), almost none of the guests survived.
> > What's your hardware setup? Intel with IBPB-enabled microcode?
> Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
> 
> so I suppose no IBPB (at least the meltdown checker reports so)
> 
> 
> > Do guests hang right after live migration?
> yes, just tried it.
> 
> 
> > 
> > Are you able to reproduce the problem? Does it work with latest upstream?
> yup, I'm able to reproduce it quickly. I'll revert the cluster to 4.14.18 now,
> but set up a test system right afterwards and test the patch you've proposed.
> 
> > 
> > Not sure it helps, but the following patch is missing in 4.14.24:
> > 
> > commit 37b95951c58fdf08dc10afa9d02066ed9f176fb5 upstream.
> > 
> > kvm_valid_sregs() should use X86_CR0_PG and X86_CR4_PAE to check bit
> > status rather than X86_CR0_PG_BIT and X86_CR4_PAE_BIT. This patch is
> > to fix it.
> > 
> > Fixes: f29810335965a(KVM/x86: Check input paging mode when cs.l is set)
> > Reported-by: Jeremi Piotrowski <jeremi.piotrowski@gmail.com>
> > Cc: Paolo Bonzini <pbonzini@redhat.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
> > Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> 
> I'll test and report.

So indeed, applying this one on top of 4.14.24-rc1 fixes the migration for me.
Greg, could you queue this one up please?

Jack, thanks for the hint!
BR
nik



> 
> n.
> 
> 
> > 
> > Regards,
> > Jack
> > >
> > > > I noticed that most of them got stuck in the "paused" state with no
> > > > possibility to resume (virsh just reported that the guest cannot be
> > > > continued and needs to be rebooted).
> > > >
> > > > In dmesg, lots of the following messages appeared:
> > >
> > > [  116.593508] device vnet0 entered promiscuous mode
> > > [  124.143532] *** Guest State ***
> > > [  124.143594] CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7
> > > [  124.143668] CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871
> > > [  124.143871] CR3 = 0x00000000feffc000
> > > [  124.143984] RSP = 0xffffffff82003e98  RIP = 0xffffffff816df002
> > > [  124.144102] RFLAGS=0x00000246         DR7 = 0x0000000000000400
> > > [  124.144221] Sysenter RSP=0000000000000000 CS:RIP=0000:0000000000000000
> > > [  124.144341] CS:   sel=0xf000, attr=0x0009b, limit=0x0000ffff, base=0x00000000ffff0000
> > > [  124.144516] DS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.144692] SS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.144907] ES:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.145089] FS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.145272] GS:   sel=0x0000, attr=0x00093, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.145447] GDTR:                           limit=0x0000ffff, base=0x0000000000000000
> > > [  124.145626] LDTR: sel=0x0000, attr=0x00082, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.145814] IDTR:                           limit=0x0000ffff, base=0x0000000000000000
> > > [  124.145995] TR:   sel=0x0000, attr=0x0008b, limit=0x0000ffff, base=0x0000000000000000
> > > [  124.146173] EFER =     0x0000000000000000  PAT = 0x0007040600070406
> > > [  124.146292] DebugCtl = 0x0000000000000000  DebugExceptions = 0x0000000000000000
> > > [  124.146466] Interruptibility = 00000000  ActivityState = 00000000
> > > [  124.146579] *** Host State ***
> > > [  124.146687] RIP = 0xffffffffa046a817  RSP = 0xffffc900200a7cb8
> > > [  124.146832] CS=0010 SS=0018 DS=0000 ES=0000 FS=0000 GS=0000 TR=0040
> > > [  124.146961] FSBase=00007fe82eff7700 GSBase=ffff881fffb40000 TRBase=fffffe00000df000
> > > [  124.147144] GDTBase=fffffe00000dd000 IDTBase=fffffe0000000000
> > > [  124.147262] CR0=0000000080050033 CR3=0000001f5b8fe004 CR4=00000000000626e0
> > > [  124.147381] Sysenter RSP=fffffe00000de200 CS:RIP=0010:ffffffff81801f60
> > > [  124.147499] EFER = 0x0000000000000d01  PAT = 0x0407050600070106
> > > [  124.147614] *** Control State ***
> > > [  124.147734] PinBased=0000007f CPUBased=96a1e9fa SecondaryExec=000004f2
> > > [  124.147849] EntryControls=0000d1ff ExitControls=002fefff
> > > [  124.147965] ExceptionBitmap=00060042 PFECmask=00000000 PFECmatch=00000000
> > > [  124.148085] VMEntry: intr_info=80000081 errcode=00000000 ilen=00000000
> > > [  124.148201] VMExit: intr_info=00000000 errcode=00000000 ilen=00000000
> > > [  124.148318]         reason=80000021 qualification=0000000000000000
> > > [  124.148432] IDTVectoring: info=00000000 errcode=00000000
> > > [  124.148545] TSC Offset = 0xffed7296fb06bc34
> > > [  124.148655] TPR Threshold = 0x00
> > > [  124.148770] EPT pointer = 0x0000001f1a0af01e
> > > [  124.148882] PLE Gap=00000080 Window=00001000
> > > [  124.148995] Virtual processor ID = 0x0001
> > >
> > > (never seen anything like that)
> > >
> > > > I haven't yet gone through all the patches between those two versions,
> > > > so I don't have any suspicion yet. If anyone recognizes this as a known
> > > > problem, please let me know.
> > > >
> > > > I'm going to check whether I'm able to reproduce the problem.
> > >
> > > BR
> > >
> > > nik
> > 
> 
> -- 
> -------------------------------------
> Ing. Nikola CIPRICH
> LinuxBox.cz, s.r.o.
> 28.rijna 168, 709 00 Ostrava
> 
> tel.:   +420 591 166 214
> fax:    +420 596 621 273
> mobil:  +420 777 093 799
> www.linuxbox.cz
> 
> mobil servis: +420 737 238 656
> email servis: servis@linuxbox.cz
> -------------------------------------
> 

-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------
