qemu-devel.nongnu.org archive mirror
* [Qemu-devel] SMBIOS vs. NUMA (was: Build full type 19 tables)
@ 2014-03-12 21:55 Gabriel L. Somlo
From: Gabriel L. Somlo @ 2014-03-12 21:55 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: agraf, qemu-devel, armbru, alex.williamson, kevin, lersek

On Wed, Mar 12, 2014 at 02:24:54PM +0100, Gerd Hoffmann wrote:
> On Mi, 2014-03-12 at 09:05 -0400, Gabriel L. Somlo wrote:
> > On Wed, Mar 12, 2014 at 09:27:18AM +0100, Gerd Hoffmann wrote:
> > > I think we should just use e820_table (see pc.c) here.  Loop over it and
> > > add a type 19 table for each ram region in there.
> > 
> > I'm assuming this should be another post-SeaBIOS-compatibility patch,
> > at the end of the series, and I should still do the (start,size)
> > arithmetic cut'n'pasted from SeaBIOS first, right ?
> 
> You should get identical results with both methods.  It's just that the
> e820 method is more future proof, i.e. if the numa people add support
> for non-contiguous memory some day we don't have to adapt the smbios
> code to handle it.

So I spent some time reverse-engineering the way Type 16..20 (memory)
smbios tables are built in SeaBIOS, and therefore in the QEMU smbios
patch set currently under revision... And I came up with the following
picture (caution: ascii art, fixed-width font strongly recommended):

 ----------------------------------------------------------------------------
|                               Type16  0x1000                               |
 ----------------------------------------------------------------------------
 ^             ^               ^           ^                    ^           ^
 |             |               |           |                    |           |
 |         ----+---        ----+----   ----+----       ---------+--------   |
 |        | Type17 |      | Type17  | | Type17  |     | Type17           |  |
 |        | 0..16G |      | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
 |        | 0x1100 |      | 0x1101  | | 0x1102  |     | 0x110<N>         |  |
 |         --------        ---------   ---------       ------------------   |
 |          ^   ^              ^           ^                    ^           |
 |          |   |              |           |                    |           |
 |       +--+   +--+           |           |                    |           |
 |       |         |           |           |                    |           |
 |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
 |  | Type20 | | Type20 | | Type20  | | Type20  |     | Type20           |  |
 |  | 0..4G  | | 4..16G | | 16..32G | | 32..48G | ... | N*16G..(N+1)*16G |  |
 |  | 0x1400 | | 0x1401 | | 0x1402  | | 0x1403  |     | 0x140<N+1>       |  |
 |   ----+---   ---+----   ----+----   ----+----       ---------+--------   |
 |       |         |           |           |                    |           |
 |       |         |           +-------+   |   +----------------+           |
 |       |         +----------------+  |   |   |                            |
 |       |                          |  |   |   |                            |
 |       v                          v  v   v   v                            |
 |   --------                      --------------                           |
 |  | Type19 |                    | Type19       |                          |
 |  | 0..4G  |                    | 4G..ram_size |                          |
 |  | 0x1300 |                    | 0x1301       |                          |
 |   ----+---                      ------+-------                           |
 |       |                               |                                  |
 +-------+                               +----------------------------------+

Here are some of the limit values, and some questions and thoughts:

- Type16 max == 2T - 1K;

Should we just assert((ram_size >> 10) < 0x80000000), and officially
limit guests to < 2T ?
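
To spell out where the 2T - 1K figure comes from, here is a minimal sketch
of that check. The helper names are made up for illustration and are not
from the patch set; the comment about the reserved value reflects my reading
of the SMBIOS spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Type 16 "Maximum Capacity" is a 32-bit count of kilobytes.  The value
 * 0x80000000 is (as I read the spec) reserved to mean "look at the
 * extended field instead", so the largest size the plain field can hold
 * is 0x7FFFFFFF KB, i.e. 2T - 1K.
 */
#define TYPE16_MAX_KB 0x7FFFFFFFULL

/* Hypothetical helper: can ram_size (in bytes) go into the plain field? */
static bool type16_can_represent(uint64_t ram_size)
{
    return (ram_size >> 10) <= TYPE16_MAX_KB;
}

/* Hypothetical wrapper doing what the proposed assert would do. */
static void type16_check(uint64_t ram_size)
{
    assert(type16_can_represent(ram_size));
}
```

With this, a guest of exactly 2T is rejected, while 2T - 1K still passes.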

- Type17 max == 32G - 1M;

This explains why we create Type17 device tables in increments of 16G,
since that's the largest possible value that's a nice, round power of
two :)

- Type19 & Type20 max == 4T - 1K;

If we limit ourselves to what Type16 can currently represent (2T),
this should be plenty enough to work with...

So, currently, we split available ram into blobs of up to 16G each,
and assign each blob a Type17 node.

We then split available ram into <4G and 4G+, and create up to two
Type19 nodes for these two areas.
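
As a rough sketch of those two splits (not the SeaBIOS or QEMU code; all
names below are hypothetical, and 4G is taken as the literal split point
exactly as drawn in the figure, which may not match the real below-4G
boundary on a given machine):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

struct mem_range {
    uint64_t start;   /* byte offset */
    uint64_t size;    /* length in bytes */
};

/* Number of Type17 devices: ram_size split into blobs of up to 16G. */
static size_t type17_count(uint64_t ram_size)
{
    return (size_t)((ram_size + 16 * GiB - 1) / (16 * GiB));
}

/* Size of the i-th Type17 device; only the last one may be short. */
static uint64_t type17_size(uint64_t ram_size, size_t i)
{
    uint64_t start = (uint64_t)i * 16 * GiB;
    uint64_t left = ram_size > start ? ram_size - start : 0;

    return left < 16 * GiB ? left : 16 * GiB;
}

/* Fill in up to two Type19 ranges (<4G and 4G+); returns how many. */
static size_t type19_ranges(uint64_t ram_size, struct mem_range out[2])
{
    size_t n = 0;
    uint64_t below = ram_size < 4 * GiB ? ram_size : 4 * GiB;

    assert(out != NULL);
    if (below > 0) {
        out[n++] = (struct mem_range){ 0, below };
    }
    if (ram_size > 4 * GiB) {
        out[n++] = (struct mem_range){ 4 * GiB, ram_size - 4 * GiB };
    }
    return n;
}
```

For a 40G guest this gives three Type17 devices (16G, 16G, 8G) and two
Type19 ranges (0..4G and 4G..40G).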

Now, re. e820: currently, the expectation is that the (up to) two
Type19 nodes in the above figure correspond to (up to) two entries of
type E820_RAM in the e820 table.
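
For illustration, the e820-driven variant might look roughly like the
sketch below. The struct and function names are local stand-ins I made up,
not QEMU's actual definitions; the only things taken from the discussion
are "one Type19 node per E820_RAM entry" and the 0x1300+ handles from the
figure. Addresses are stored in kilobytes, on the assumption that Type19
holds KB-granular start/end values:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1u   /* e820 type code for usable RAM */

/* Local stand-in for an e820 entry (hypothetical layout). */
struct e820_ent {
    uint64_t address;
    uint64_t length;
    uint32_t type;
};

/* Hypothetical, simplified view of a Type19 node. */
struct type19_range {
    uint64_t start_kb;   /* first kilobyte of the range */
    uint64_t end_kb;     /* last kilobyte of the range (inclusive) */
    uint16_t handle;     /* 0x1300, 0x1301, ... as in the figure */
};

/* Emit one Type19 node per non-empty E820_RAM entry; returns the count. */
static size_t build_type19(const struct e820_ent *tbl, size_t n,
                           struct type19_range *out, size_t max)
{
    size_t count = 0;

    assert(tbl != NULL && out != NULL);
    for (size_t i = 0; i < n && count < max; i++) {
        if (tbl[i].type != E820_RAM || tbl[i].length == 0) {
            continue;   /* skip reserved, ACPI, etc. and empty entries */
        }
        out[count].start_kb = tbl[i].address >> 10;
        out[count].end_kb = (tbl[i].address + tbl[i].length - 1) >> 10;
        out[count].handle = (uint16_t)(0x1300 + count);
        count++;
    }
    return count;
}
```

With exactly two E820_RAM entries this reproduces the bottom row of the
figure; with more entries it simply produces more Type19 nodes, which is
where the question further down comes from.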


Then, a type20 node is assigned to the sub-4G portion of the first
Type17 "device", and another type20 node is assigned to the over-4G
portion of the same.

From then on, type20 nodes correspond to the rest of the 16G-or-less
type17 devices pretty much on a 1:1 basis.
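
Put as code, the Type20 rule described here comes out roughly as follows.
This is a sketch under the same simplifications as the figure (literal 4G
split, 16G devices), with hypothetical names and the handle numbering
copied from the drawing:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Hypothetical, simplified view of a Type20 node. */
struct type20_map {
    uint64_t start, end;      /* byte range, end exclusive */
    uint16_t handle;          /* 0x1400 + index */
    uint16_t type17_handle;   /* device this range belongs to */
    uint16_t type19_handle;   /* array range this maps into */
};

static size_t emit(struct type20_map *out, size_t n, size_t max,
                   uint64_t start, uint64_t end, size_t dev)
{
    assert(out != NULL);
    if (n >= max) {
        return n;             /* out of room: drop silently */
    }
    out[n].start = start;
    out[n].end = end;
    out[n].handle = (uint16_t)(0x1400 + n);
    out[n].type17_handle = (uint16_t)(0x1100 + dev);
    out[n].type19_handle = start < 4 * GiB ? 0x1300 : 0x1301;
    return n + 1;
}

/* Walk the 16G devices; only a device straddling 4G yields two nodes. */
static size_t build_type20(uint64_t ram_size, struct type20_map *out,
                           size_t max)
{
    size_t n = 0, dev = 0;

    for (uint64_t start = 0; start < ram_size; start += 16 * GiB, dev++) {
        uint64_t end = ram_size - start < 16 * GiB ? ram_size
                                                   : start + 16 * GiB;
        uint64_t lo = start;

        if (lo < 4 * GiB && end > 4 * GiB) {
            n = emit(out, n, max, lo, 4 * GiB, dev);
            lo = 4 * GiB;
        }
        n = emit(out, n, max, lo, end, dev);
    }
    return n;
}
```

For a 40G guest this yields four Type20 nodes: 0..4G and 4..16G both on
device 0x1100, then 16..32G on 0x1101 and 32..40G on 0x1102, matching the
N+1 numbering in the figure.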


If the e820 table ends up containing more than just two E820_RAM
entries, and we therefore have more than the two Type19 nodes on the
bottom row, what are the rules for extending the rest of the figure
accordingly (i.e. how do we hook together more Type17 and Type20 nodes
to go along with the extra Type19 nodes)?

Thanks much,
--Gabriel

