public inbox for linux-kernel@vger.kernel.org
From: Petr Vandrovec <petr@vmware.com>
To: Andi Kleen <ak@suse.de>
Cc: Gerd Hoffmann <kraxel@suse.de>, john stultz <johnstul@us.ibm.com>,
	caglar@pardus.org.tr, lkml <linux-kernel@vger.kernel.org>,
	Zachary Amsden <zach@vmware.com>
Subject: Re: Vmware problems was Re: [RFC] Avoid PIT SMP lockups
Date: Mon, 16 Oct 2006 15:16:02 -0700
Message-ID: <453404A2.6090008@vmware.com>
In-Reply-To: <200610161822.48500.ak@suse.de>

Andi Kleen wrote:
> On Monday 16 October 2006 18:08, Gerd Hoffmann wrote:
> 
>>john stultz wrote:
>>
>>>Hey Gerd,
>>>	Looks like the smp replacements code in 2.6.18 is breaking with vmware.
>>>I'm guessing we're taking an interrupt while apply_replacements is
>>>running. Any ideas?
>>
>>It's not the smp alternatives code, it's the one for processor-specific
>>instructions.  The eip offset for alternative_instructions() in the
>>trace suggests it is the first call to apply_replacements.  The second
>>one is the one for the smp alternatives (which doesn't do anything btw
>>as we patch away the lock prefixes only).
> 
> 
> I would have expected that they trap those writes and invalidate the cache.
> Even qemu and valgrind do that fine.
> 
> Perhaps Zach has some clue or can refer to someone who has.

Why do you think it has something to do with VMware's emulation - apart 
from timing?  From what I see, these bytes were in place

0xF0 0x83 0x44 0x24 0x00 0x00

before apply_alternatives() was entered (binutils 2.17 would generate 
code that is one byte shorter, 0xF0 0x83 0x04 0x24 0x00, but that's another 
story - 2.16 and older treat 0(%esp) differently from (%esp)).  Now 
the alternatives update comes in and starts overwriting the original 
instruction - note that it does that in two steps - first it memcpy()s 
the replacement, then it memcpy()s the nop padding.  So after the first 
memcpy() the code looks like

0x0F 0xAE 0xE8 0x24 0x00 0x00

Now a timer interrupt arrives, and these bytes are interpreted as

lfence; andb $0,%al; add %cl,0x465B9415(%ebx)

If that did not crash, then once the handler returned, 0x24 0x00 0x00 
would be overwritten with a 3-byte NOP sequence and everybody would be happy.
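
To make the intermediate state concrete, here is a minimal user-space sketch of the two-step overwrite acting on a plain byte buffer. It is not the kernel code: the names OLD_SITE, LFENCE, patch_step1() and patch_step2() are made up for illustration, and the padding is assumed to be single-byte 0x90 NOPs rather than whatever multi-byte NOP sequence the kernel picks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Original bytes quoted in this message: the 6-byte encoding emitted by
 * binutils 2.16 and older. */
static const unsigned char OLD_SITE[6] = {0xF0, 0x83, 0x44, 0x24, 0x00, 0x00};

/* Replacement instruction: lfence. */
static const unsigned char LFENCE[3] = {0x0F, 0xAE, 0xE8};

/* Step 1 (hypothetical helper): copy the replacement over the start of
 * the site. The tail of the old instruction is still in place after this. */
static void patch_step1(unsigned char *site, size_t site_len,
                        const unsigned char *repl, size_t repl_len)
{
    assert(repl_len <= site_len);
    memcpy(site, repl, repl_len);
}

/* Step 2 (hypothetical helper): fill the rest of the site with padding.
 * Assumption: single-byte 0x90 NOPs stand in for the real NOP sequence. */
static void patch_step2(unsigned char *site, size_t site_len, size_t repl_len)
{
    assert(repl_len <= site_len);
    memset(site + repl_len, 0x90, site_len - repl_len);
}
```

Copying OLD_SITE into a scratch buffer and calling patch_step1() leaves 0x0F 0xAE 0xE8 0x24 0x00 0x00, which is the sequence shown above; only after patch_step2() does the tail become padding. Anything that executes the site between the two calls sees the stale 0x24 0x00 0x00.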

AFAICT you do not see this because you have to use old binutils to 
reproduce it - I'm unable to reproduce this on a Debian box with binutils 
2.17, as there the byte sequence is a valid instruction even when 
partially overwritten - it just clears %al to zero, but nobody notices that...

So as far as I can tell, interrupts (and NMIs?) should be disabled while 
apply_alternatives() runs if interrupt handlers use alternatives - 
and as was just shown, they do.
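
As a rough illustration of why masking helps, the following user-space model defers a timer tick that lands between the two copy steps until the site is complete. Everything here is hypothetical (struct cpu_model, timer_tick(), patch_site() are not kernel interfaces), the padding is again assumed to be 0x90 bytes, and the model says nothing about NMIs, which an interrupt-flag style mask does not block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical single-CPU model; none of these names are kernel APIs. */
struct cpu_model {
    bool irqs_enabled;  /* models the interrupt-enable flag */
    bool irq_pending;   /* a tick arrived while interrupts were masked */
    int  handler_runs;  /* how many times the handler executed */
    int  torn_runs;     /* runs that saw a half-patched site */
};

/* The handler "executes" the patch site; that is unsafe mid-patch. */
static void timer_handler(struct cpu_model *cpu, bool site_mid_patch)
{
    cpu->handler_runs++;
    if (site_mid_patch)
        cpu->torn_runs++;
}

/* A tick is delivered at once if interrupts are enabled, else held pending. */
static void timer_tick(struct cpu_model *cpu, bool site_mid_patch)
{
    if (cpu->irqs_enabled)
        timer_handler(cpu, site_mid_patch);
    else
        cpu->irq_pending = true;
}

/* Two-step patch with a tick arriving between the steps. */
static void patch_site(struct cpu_model *cpu, unsigned char *site,
                       size_t site_len, const unsigned char *repl,
                       size_t repl_len, bool mask_irqs)
{
    assert(repl_len <= site_len);
    bool saved = cpu->irqs_enabled;
    if (mask_irqs)
        cpu->irqs_enabled = false;

    memcpy(site, repl, repl_len);                       /* step 1 */
    timer_tick(cpu, true);                              /* tick lands mid-patch */
    memset(site + repl_len, 0x90, site_len - repl_len); /* step 2: padding */

    cpu->irqs_enabled = saved;                          /* restore old state */
    if (cpu->irqs_enabled && cpu->irq_pending) {
        cpu->irq_pending = false;
        timer_handler(cpu, false);                      /* sees finished site */
    }
}
```

With mask_irqs false the handler runs once and sees the half-patched site; with mask_irqs true it still runs once, but only after the padding is written. The final bytes are the same in both cases, so the only difference is when the handler gets to look at them.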

The reason you see this happen so often in a VM is that this first 
alternatives run invalidates a lot of internal state, and it takes so long 
that the next timer interrupt is certain to be pending as soon as the 
first "rep movsb" in alternatives finishes.

						Petr

