From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1L5B3F-00008T-P1
	for qemu-devel@nongnu.org; Tue, 25 Nov 2008 22:26:53 -0500
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1L5B3E-000080-RY
	for qemu-devel@nongnu.org; Tue, 25 Nov 2008 22:26:53 -0500
Received: from [199.232.76.173] (port=60076 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1L5B3E-00007t-Im
	for qemu-devel@nongnu.org; Tue, 25 Nov 2008 22:26:52 -0500
Received: from mail.gmx.net ([213.165.64.20]:42091)
	by monty-python.gnu.org with smtp (Exim 4.60)
	(envelope-from <c-d.hailfinger.devel.2006@gmx.net>)
	id 1L5B3D-0001WB-SL
	for qemu-devel@nongnu.org; Tue, 25 Nov 2008 22:26:52 -0500
Message-ID: <492CC1F9.5050408@gmx.net>
Date: Wed, 26 Nov 2008 04:26:49 +0100
From: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Modeling x86 early initialization accurately
References: <492C80BF.4010103@gmx.net> <492CAEC8.4010306@codemonkey.ws>
In-Reply-To: <492CAEC8.4010306@codemonkey.ws>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

On 26.11.2008 03:04, Anthony Liguori wrote:
> Carl-Daniel Hailfinger wrote:
>> current svn HEAD of QEMU assumes all RAM is available directly at x86
>> CPU startup. The ability to lock processor caches to function as RAM
>> (Cache-as-RAM) is unimplemented as well.
>> While that does make it easier for the shipped BIOS to set up working
>> RAM (i.e. it does nothing about that right now), that simplification
>> reduces the ability to run alternative firmwares for x86 in QEMU.
>> coreboot (a free x86 firmware/BIOS replacement) is unable to use
>> standard x86 early initialization because the MSRs for cache control
>> (MTRRs) are completely unimplemented and ignored.
>> Modeling ACPI S3 (Suspend-to-RAM) suffers from similar issues.
>>
>> Things which need to be changed to model x86 better:
>> - Start up with all RAM being readonly. Writes should be discarded,
>> reads will usually return 0xff or be undefined. The "undefined" variant
>> would allow the code to allocate RAM once and just switch write access
>> on/off.
>>   
>
> This is pretty reasonable.

So this would be my first patch, together with a patch to change the
allocation to read/write once a special MSR is written.

Is it possible to change the type of allocation from readonly to
read/write if the backing store has been allocated with qemu_ram_alloc()?
Can I simply call cpu_register_physical_memory() again for the same
target region, and will the newer registration take precedence?
Is the "special MSR" solution acceptable? If yes, which number should I
pick? Or is that my choice?
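
To make the question concrete, here is a rough, self-contained sketch of
the behaviour I have in mind. It does not use the real QEMU memory API;
struct ram_model, the helper names and the MSR index are all made up for
illustration, and the index in particular is a placeholder, not a
proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Placeholder MSR index for this sketch only; not a proposed number. */
#define MSR_SKETCH_RAM_ENABLE 0x12345678u
#define RAM_SIZE 4096u

/* Hypothetical stand-in for a RAM region; not a QEMU structure. */
struct ram_model {
    uint8_t data[RAM_SIZE];
    bool writable;              /* false after reset: writes are dropped */
};

static void ram_reset(struct ram_model *ram)
{
    /* Reads of uninitialized RAM return 0xff in this model. */
    memset(ram->data, 0xff, sizeof(ram->data));
    ram->writable = false;
}

static uint8_t ram_read8(const struct ram_model *ram, uint32_t addr)
{
    assert(addr < RAM_SIZE);
    return ram->data[addr];
}

static void ram_write8(struct ram_model *ram, uint32_t addr, uint8_t val)
{
    assert(addr < RAM_SIZE);
    if (ram->writable)          /* silently discard while read-only */
        ram->data[addr] = val;
}

/* Writing bit 0 of the placeholder MSR switches write access on or off. */
static void ram_model_wrmsr(struct ram_model *ram, uint32_t index,
                            uint64_t val)
{
    if (index == MSR_SKETCH_RAM_ENABLE)
        ram->writable = (val & 1) != 0;
}
```

The backing store is allocated once; only the write gate changes, which
is the "allocate once and switch write access on/off" variant from my
first mail.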


>> - Support MTRRs.
>> -- Mention MTRR support in CPUID.
>> -- I sent a patch to dump unknown MSR accesses in general and MTRR
>> reads/writes in particular. The subject was "[Qemu-devel] [PATCH] x86
>> MTRR access dumping".
>>   
>
> Yes, I saw this patch but since it's just debugging code, it's not
> interesting for inclusion.

Quite a few x86 processors reset themselves if they encounter an unknown
MSR write. Should we do the same? If not, would spewing a loud debug
message be appropriate?
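
The options could be sketched roughly as below. This is only an
illustration of the policy choice: the enum names and
msr_write_checked() are hypothetical, nothing here is existing QEMU
code, and the "known" list (eight variable-range MTRR pairs plus
MTRRdefType) is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical policy knob for writes to unimplemented MSRs. */
enum unknown_msr_policy {
    UNKNOWN_MSR_IGNORE,         /* drop the write silently */
    UNKNOWN_MSR_LOG,            /* drop the write, but complain loudly */
    UNKNOWN_MSR_RESET           /* behave like hardware that resets */
};

enum wrmsr_result { WRMSR_OK, WRMSR_IGNORED, WRMSR_CPU_RESET };

/* Assumption: only MTRR physbase/physmask 0..7 and MTRRdefType exist. */
static bool msr_is_known(uint32_t index)
{
    return (index >= 0x200 && index <= 0x20f) || index == 0x2ff;
}

static enum wrmsr_result msr_write_checked(uint32_t index, uint64_t val,
                                           enum unknown_msr_policy policy)
{
    if (msr_is_known(index))
        return WRMSR_OK;        /* a real model would store val here */

    switch (policy) {
    case UNKNOWN_MSR_RESET:
        return WRMSR_CPU_RESET;
    case UNKNOWN_MSR_LOG:
        fprintf(stderr, "wrmsr: unknown MSR 0x%08x <- 0x%016llx\n",
                (unsigned)index, (unsigned long long)val);
        return WRMSR_IGNORED;
    case UNKNOWN_MSR_IGNORE:
    default:
        return WRMSR_IGNORED;
    }
}
```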


>> -- It is not really needed to completely implement L1/L2 caches, but the
>> ability to lock the cache with the help of MTRRs should be available.
>> Areas with active locked cache do not send writes down to the RAM which
>> is still readonly. The cache locking is done on a per-page basis (or
>> even larger granularity), so it should be easier than having to handle
>> single cache lines.
>>   
>
> I'm concerned that modeling this could have a non negligible overhead
> and could be very difficult in something like KVM.  Can you describe
> exactly what coreboot is expecting that we are not implementing?  How
> is it relying on cache locking?

Since there is no RAM before RAM initialization, we have no way to keep
a stack. That rules out implementing RAM init in C (which is fond of
using a stack for local variables, parameters and call return addresses)
unless you either can fake some RAM or have a C compiler which needs no
stack. Faking some RAM is way easier.
Basically, we use MTRRs to declare everything uncached except one small
area (4-64 KB, page granularity) in the CPU address space, which gets
cache type writeback. That area is called the CAR (Cache-as-RAM) area.
Reads in that area allocate a cache line, and subsequent reads hit the
cache directly. Writes in that area allocate a cache line if none exists
yet for the given address, and they are never passed on to RAM. Reads
and writes outside the CAR area bypass the cache and go straight to
RAM/ROM; since RAM is not working yet, those writes are effectively
discarded. Because everything besides the CAR area is declared uncached,
no access outside the CAR area can cause a cache line eviction, so the
cache is effectively locked.
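
As an illustration of the MTRR part, the following sketch computes the
three MSR values involved: one variable-range MTRR pair marking the CAR
area as writeback, and MTRRdefType enabling MTRRs with a default type of
uncacheable. The MSR numbers and bit positions are the architectural
ones; struct msr_write and car_mtrr_setup() are hypothetical names, and
the sketch only calculates values. It leaves out everything else real
firmware has to do (CR0 cache control, touching the area to fill the
cache, and so on).

```c
#include <assert.h>
#include <stdint.h>

#define MSR_MTRR_PHYSBASE0    0x200u
#define MSR_MTRR_PHYSMASK0    0x201u
#define MSR_MTRR_DEF_TYPE     0x2ffu

#define MTRR_TYPE_UNCACHEABLE 0u
#define MTRR_TYPE_WRITEBACK   6u
#define MTRR_PHYSMASK_VALID   (1ull << 11)
#define MTRR_DEF_TYPE_ENABLE  (1ull << 11)

/* Hypothetical record of one MSR write the firmware would issue. */
struct msr_write {
    uint32_t index;
    uint64_t value;
};

/*
 * Fill out[0..2] with the MSR writes for a CAR area at base/size.
 * size must be a power of two of at least 4 KB and base must be
 * aligned to size. phys_bits is the CPU physical address width.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int car_mtrr_setup(uint64_t base, uint64_t size, unsigned phys_bits,
                          struct msr_write out[3])
{
    uint64_t addr_mask;

    if (phys_bits < 32 || phys_bits > 52)
        return -1;
    addr_mask = (1ull << phys_bits) - 1;

    if (size < 4096 || (size & (size - 1)) != 0)
        return -1;
    if ((base & (size - 1)) != 0 || base > addr_mask)
        return -1;

    out[0].index = MSR_MTRR_PHYSBASE0;
    out[0].value = base | MTRR_TYPE_WRITEBACK;
    out[1].index = MSR_MTRR_PHYSMASK0;
    out[1].value = (~(size - 1) & addr_mask) | MTRR_PHYSMASK_VALID;
    out[2].index = MSR_MTRR_DEF_TYPE;
    out[2].value = MTRR_DEF_TYPE_ENABLE | MTRR_TYPE_UNCACHEABLE;
    return 0;
}
```

An emulator that wants to detect "CAR is being set up" would watch for
exactly these writes instead of checking every memory access.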

Seen from a firmware perspective, the following implementation is good
enough:
1. CAR enable: Copy the contents of the address area designated for CAR
from the underlying (readonly RAM/ROM) backing store to a new "CAR"
read/write backing store mapped to the same CPU physical address area.
2. CAR usage: All reads/writes to the CAR area hit the "CAR" read/write
backing store. All other reads outside the CAR area hit the normal
backing store. All writes outside the CAR area are discarded if they
would have ended up in RAM. Writes to MMIO regions are still honored.
3. RAM enabling: The backing store for RAM outside the CAR area now
accepts writes.
4. CAR disabling: The "CAR" backing store is either discarded (INVD
instruction) or written to RAM (WBINVD instruction).
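
The steps above can be sketched as a tiny standalone model. This is an
assumption-laden illustration, not QEMU code: struct machine and all
function names are invented, memory is a flat byte array, and MMIO is
left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_SIZE 0x4000u
#define CAR_MAX  0x1000u

/* Hypothetical machine model with a normal and a "CAR" backing store. */
struct machine {
    uint8_t ram[MEM_SIZE];      /* normal backing store */
    uint8_t car[CAR_MAX];       /* separate read/write CAR store */
    uint32_t car_base, car_size;
    bool car_active;
    bool ram_writable;
};

static void machine_reset(struct machine *m)
{
    memset(m->ram, 0xff, sizeof(m->ram));
    memset(m->car, 0, sizeof(m->car));
    m->car_base = m->car_size = 0;
    m->car_active = false;
    m->ram_writable = false;
}

static bool in_car(const struct machine *m, uint32_t addr)
{
    return m->car_active && addr >= m->car_base &&
           addr - m->car_base < m->car_size;
}

/* CAR enable: copy the underlying contents into the CAR store. */
static void car_enable(struct machine *m, uint32_t base, uint32_t size)
{
    assert(size <= CAR_MAX && base <= MEM_SIZE - size);
    memcpy(m->car, m->ram + base, size);
    m->car_base = base;
    m->car_size = size;
    m->car_active = true;
}

/* CAR usage: accesses inside the area hit the CAR store only. */
static uint8_t mem_read8(const struct machine *m, uint32_t addr)
{
    assert(addr < MEM_SIZE);
    return in_car(m, addr) ? m->car[addr - m->car_base] : m->ram[addr];
}

static void mem_write8(struct machine *m, uint32_t addr, uint8_t val)
{
    assert(addr < MEM_SIZE);
    if (in_car(m, addr))
        m->car[addr - m->car_base] = val;
    else if (m->ram_writable)   /* otherwise the write is discarded */
        m->ram[addr] = val;
}

/* RAM enabling: the normal backing store now accepts writes. */
static void ram_enable(struct machine *m)
{
    m->ram_writable = true;
}

/* CAR disabling: write_back models WBINVD, !write_back models INVD. */
static void car_disable(struct machine *m, bool write_back)
{
    if (m->car_active && write_back && m->ram_writable)
        memcpy(m->ram + m->car_base, m->car, m->car_size);
    m->car_active = false;
}
```

Note that nothing in mem_read8()/mem_write8() would have to survive in
the fast path of a real implementation; the two stores would simply be
mapped and unmapped when the relevant MSR writes happen.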

The runtime performance hit of this implementation should be negligible
because there is no need to check for CAR on each memory access. Only
the relevant MSR writes need to be handled to change allocation type.
Once CAR is disabled, the memory allocation and mapping should match
exactly what the current code does. That means any performance hit would
only matter while CAR is active. That's probably a few hundred
instructions after power-on until RAM is enabled.


>> - Decide what to do for RAM initialization. Do we switch RAM into
>> read-write mode by a simple QEMU-specific MSR write? Do we want to
>> implement all memory initialization hardware instead?
>> - Adapt the currently shipped BIOS to these tasks and/or switch to
>> coreboot+SeaBIOS.
>>   
>
> BTW, I'd love to switch to something like coreboot but the legacy BIOS
> support payload is too incomplete.  SeaBIOS is a good option too but
> it needs some heavy regression testing first.

SeaBIOS is now the official coreboot payload for any legacy BIOS needs.
The SeaBIOS maintainer is pretty responsive to bug reports, so I think
it will be working well. After all, SeaBIOS as a coreboot payload can
boot Windows XP in QEMU and on some (the number of testers is rather
limited) real hardware.


>> I'm willing to do most of the work if I know that this won't be rejected
>> outright.
>>   
>
> In general, better modeling of processor modes, provided that there
> isn't a regression in performance, is a good thing.  Dividing the
> effort into incremental bits that are posted early for inclusion is
> also a good thing.

Thanks, I'll heed your advice.

I hope the explanations I gave were precise and understandable enough.
If anything seems unclear or incomplete, feel free to ask. x86 CAR is
difficult to explain and understand and there are almost no public docs
about it.


Regards,
Carl-Daniel

> Regards,
>
> Anthony Liguori
-- 
http://www.hailfinger.org/