Date: Thu, 27 Nov 2008 03:05:26 +0100
From: Carl-Daniel Hailfinger
Subject: Re: [Qemu-devel] Modeling x86 early initialization accurately
To: qemu-devel@nongnu.org
Message-ID: <492E0066.7040206@gmx.net>
In-Reply-To: <492D7B2A.3060404@redhat.com>

On 26.11.2008 17:36, Avi Kivity wrote:
> Carl-Daniel Hailfinger wrote:
>>> This is pretty reasonable.
>>
>> So this would be my first patch, together with a patch to change the
>> allocation to read/write once a special MSR is written.
>>
>> Is it possible to change the type of an allocation from read-only to
>> read/write if the backing store has been allocated with
>> qemu_ram_alloc()?
>> Can I simply call cpu_register_physical_memory() again for the same
>> target region, so that the newer registration takes precedence?
>> Is the "special MSR" solution acceptable? If yes, which number
>> should I pick? Or is that my choice?
>
> Isn't this usually a chipset function? In this case a chipset
> register is more appropriate. Best would be to implement the actual
> chipset that qemu emulates.

Yes, it is a chipset function. However, implementing complete RAM
initialization control for the i440FX in a way that is compatible with
any BIOS code written to the spec is not a trivial task. Besides that,
it would make addressing/initializing more than 1 GB of RAM a real
problem. Newer Intel chipsets supporting more RAM have even more
complicated memory initialization interfaces, and some have docs only
under NDA. Please don't get me wrong, accurate simulation of the RAM
controller in QEMU would be cool, but the payoff seems rather limited.

For the AMD K8 line and its successors, the RAM controller sits inside
the CPU and is thus not a chipset function. That doesn't mean the
interface would be any easier, though.

>>> Yes, I saw this patch but since it's just debugging code, it's not
>>> interesting for inclusion.
>>
>> Quite a few x86 processors reset themselves if they encounter an
>> unknown MSR write. Should we do the same? If not, would spewing a
>> loud debug message be appropriate?
>
> The standard behaviour is to #GP.

Grepping the source code didn't turn up any code signaling a #GP for
unknown MSRs. I'd be thankful for any hints so I can create a patch
for this. (And give a warning to the ReactOS developers, because
their latest code will #GP with that change: they read and write MSR
0x0000008b, which is unimplemented in QEMU.)
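Just to illustrate what I have in mind, here is a rough and untested
sketch against the switch-on-ECX structure of helper_wrmsr() in
target-i386/op_helper.c. Only the default case is new;
raise_exception_err() and EXCP0D_GPF are the existing QEMU exception
helpers:

void helper_wrmsr(void)
{
    switch ((uint32_t)ECX) {
    /* ... all currently known MSRs handled exactly as before ... */
    default:
        /* write to an unimplemented MSR: raise #GP(0) as real CPUs do */
        raise_exception_err(EXCP0D_GPF, 0);
        break;
    }
}

helper_rdmsr() would need the same default case, since reads of
unimplemented MSRs also #GP on real hardware.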
>>> I'm concerned that modeling this could have a non-negligible
>>> overhead and could be very difficult in something like KVM. Can
>>> you describe exactly what coreboot is expecting that we are not
>>> implementing? How is it relying on cache locking?
>>
>> Since there is no RAM before RAM initialization, we have no way to
>> keep a stack. That rules out implementing RAM init in C (which is
>> fond of using a stack for local variables, parameters and call
>> return addresses) unless you can either fake some RAM or have a C
>> compiler which needs no stack. Faking some RAM is way easier.
>> Basically, we use the MTRRs to declare everything uncached except
>> one small area (4-64 KB in size, with page granularity) in the CPU
>> address space which has cache type write-back. That area is called
>> the CAR (Cache-as-RAM) area. Reads in that area will allocate a
>> cache line, and subsequent reads will hit the cache directly.
>> Writes in that area will allocate a cache line if none exists yet
>> for the given address; writes to the area will never be passed
>> through to RAM. Reads and writes outside the CAR area will go
>> directly through to RAM/ROM, and writes outside the CAR area will
>> be discarded (RAM is not initialized yet). Since everything besides
>> the CAR area is declared uncached and no access outside the CAR
>> area can cause cache line evictions, the cache is effectively
>> locked.
>>
>> From a firmware perspective, the following implementation is good
>> enough:
>> 1. CAR enable: Copy the contents of the address area designated for
>>    CAR from the underlying (read-only RAM/ROM) backing store to a
>>    new "CAR" read/write backing store mapped at the same CPU
>>    physical address area.
>> 2. CAR usage: All reads/writes to the CAR area hit the "CAR"
>>    read/write backing store. All other reads hit the normal backing
>>    store. All writes outside the CAR area are discarded if they
>>    would have ended up in RAM; writes to MMIO regions are still
>>    honored.
>> 3. RAM enabling: The backing store for RAM outside the CAR area now
>>    accepts writes.
>> 4. CAR disabling: The "CAR" backing store is either discarded (INVD
>>    instruction) or written back to RAM (WBINVD instruction).
>>
>> The runtime performance hit of this implementation should be
>> negligible because there is no need to check for CAR on each memory
>> access; only the relevant MSR writes need to be handled to change
>> the allocation type. Once CAR is disabled, the memory allocation
>> and mapping should match exactly what the current code does. That
>> means any performance hit would only matter while CAR is active,
>> which is probably a few hundred instructions after poweron until
>> RAM is enabled.
>
> If we can detect this, we can handle it with kvm by allocating a
> memory slot to back the cache. But I don't see how we can detect it
> reliably (mtrrs are handled completely within the kernel, and I
> wouldn't want this hack in the kernel).

AFAICS the MTRRs of x86 targets are ignored completely by QEMU. They
are handled as unknown MSR reads/writes and do not fault; see
target-i386/op_helper.c:helper_rdmsr() and helper_wrmsr(). I'm not
familiar with how KVM handles the MTRRs, and the KVM code in QEMU
doesn't provide many clues. Your statement about MTRR handling in the
kernel is not entirely clear to me: are all MSR writes handled in the
kernel by KVM?

Detection of CAR mode activation can be performed in two ways,
depending on how close to the hardware we want to get:
1. Coreboot-specific: trigger on the exact sequence of MSR writes
   performed by coreboot.
2. BIOS/firmware-agnostic: trigger any time the cache control bits
   and the MTRR MSRs are in the right state (sketched below).
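To make variant 2 a bit more concrete, here is a rough sketch of the
state check I have in mind. It is untested, and the env->mtrr_*
fields are hypothetical: since QEMU currently ignores MTRR writes,
the MTRR state would first have to be stored in CPUX86State. The bit
positions are the architectural ones (enable bit 11 in
IA32_MTRR_DEF_TYPE, valid bit 11 in the PhysMask registers, memory
type in the low byte, CR0.CD in bit 30):

/* Return nonzero if the MTRR/CR0 state looks like coreboot-style
   Cache-as-RAM: default memory type uncacheable, one variable-range
   MTRR marking a small write-back region, caching globally enabled. */
static int car_mode_active(CPUX86State *env)
{
    uint64_t deftype = env->mtrr_deftype;   /* hypothetical field */
    uint64_t base = env->mtrr_var_base[0];  /* hypothetical field */
    uint64_t mask = env->mtrr_var_mask[0];  /* hypothetical field */

    /* MTRRs enabled, default memory type uncacheable (type 0) */
    if (!(deftype & (1 << 11)) || (deftype & 0xff) != 0)
        return 0;
    /* first variable-range MTRR valid and of type write-back (6) */
    if (!(mask & (1 << 11)) || (base & 0xff) != 6)
        return 0;
    /* caching must not be globally disabled via CR0.CD */
    if (env->cr[0] & (1 << 30))
        return 0;
    return 1;
}

Checking only the first variable-range MTRR is of course a
simplification; a real patch would scan all of them and also verify
that the write-back region is small enough to fit in the cache.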
Regards,
Carl-Daniel

-- 
http://www.hailfinger.org/