From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ed Swierk
Subject: Re: Crash in set_cpu_sibling_map() booting Xen 4.6.0 on Fusion
Date: Tue, 24 Nov 2015 06:13:27 -0800
References: <564F030C02000078000B70A6@prv-mh.provo.novell.com> <5652F61A02000078000B7B64@prv-mh.provo.novell.com> <56544B5502000078000B84F4@prv-mh.provo.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <56544B5502000078000B84F4@prv-mh.provo.novell.com>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Jan Beulich
Cc: Chao Peng, xen-devel@lists.xen.org, Andrew Cooper
List-Id: xen-devel@lists.xenproject.org

On Tue, Nov 24, 2015 at 2:34 AM, Jan Beulich wrote:
> Indeed, and I think I had said so. The algorithm does, however, tell
> us that with the above output CPU 3 (APIC ID 6) is on socket 6 (both
> shifts being zero), which for the whole system results in sockets 1,
> 3, and 5 unused. While not explicitly excluded, I'm not sure how far
> we should go in expecting all kinds of odd configurations (along those
> lines we e.g. have a limit on the largest APIC ID we allow: MAX_APICS /
> MAX_LOCAL_APIC, which for big systems is 4 times the number of
> CPUs we support).

That's why I thought it reasonable to substitute MAX_APICS for
nr_sockets in sizing the socket_cpumask array.

> Taking it to set_nr_sockets(), a pretty basic assumption is broken by
> the above way of presenting topology: We would have to have more
> sockets than there are CPUs. I would have wanted to check what
> e.g. Linux does here, but there doesn't seem to be any support of
> CAT (and hence any need for per-socket data) there.

I looked at Linux, and there is no per-socket bookkeeping, AFAICT.

> (I am, btw, now also confused by you saying that e.g. for a 3-CPU
> config things work. If the topology data gets presented in similar
> ways in that case, I can't see why you wouldn't run into the same
> problem. Unless memory corruption occurs silently in one case, but
> "loudly" in the other.)

For 3, 6 and 12 CPUs, Fusion presents a completely different topology,
with 3-core sockets numbered consecutively starting with 0.

> Bottom line - for the moment I do not see a reasonable way of
> dealing with that situation. The closest I could see would be what
> we iirc had temporarily during the review cycles of the initial CAT
> series: A command line option to specify the number of sockets. Or
> make all accesses to socket_cpumask[] conditional upon PSR being
> enabled (which would have the bad side effect of making future
> uses for other purposes more cumbersome), or go through and
> range check the socket number on all of those accesses.

Could we avoid the issue by replacing socket_cpumask array with a list
or hashtable, indexed by socket ID?

> Chao, could you - inside Intel - please check whether there are
> any assumptions on the respective CPUID leaf output that aren't
> explicitly stated in the SDM right now (like resulting in contiguous
> socket numbers), and ask for them getting made explicit (if there
> are any), or it being made explicit that no assumptions at all are
> to be made at all on the presented values (in which case we'd
> have to consume MADT parsing data in set_nr_sockets(), e.g.
> by replacing num_processors there with one more than the
> maximum APIC ID of any non-disabled CPU)?

I suppose the key is whether Intel has encoded such assumptions in the
BIOS reference code, or has otherwise communicated them to AMI et al.

--Ed
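
To make the list-or-hashtable suggestion above a bit more concrete, here
is a rough, standalone sketch of per-socket bookkeeping keyed by socket
ID rather than indexed into a fixed-size array. It is not Xen code and
not a proposed patch: socket_entry, socket_hash, socket_lookup(),
socket_add_cpu(), socket_count() and SOCKET_HASH_BUCKETS are all
hypothetical names, a plain uint64_t stands in for cpumask_t, and
calloc() stands in for whatever allocator the hypervisor would use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SOCKET_HASH_BUCKETS 16u   /* hypothetical size; must be a power of two */

/* One entry per socket ID actually seen (assumption: <= 64 CPUs). */
struct socket_entry {
    unsigned int socket_id;
    uint64_t cpumask;             /* bit n set => CPU n is on this socket */
    struct socket_entry *next;    /* chain of entries sharing a bucket */
};

static struct socket_entry *socket_hash[SOCKET_HASH_BUCKETS];

/* Return the entry for socket_id, or NULL if no CPU was seen on it. */
static struct socket_entry *socket_lookup(unsigned int socket_id)
{
    struct socket_entry *e =
        socket_hash[socket_id & (SOCKET_HASH_BUCKETS - 1)];

    while (e && e->socket_id != socket_id)
        e = e->next;
    return e;
}

/* Record that cpu sits on socket_id. Returns 0, or -1 on allocation failure. */
static int socket_add_cpu(unsigned int socket_id, unsigned int cpu)
{
    struct socket_entry *e = socket_lookup(socket_id);

    assert(cpu < 64);             /* limit of the uint64_t stand-in mask */
    if (!e) {
        unsigned int b = socket_id & (SOCKET_HASH_BUCKETS - 1);

        e = calloc(1, sizeof(*e));
        if (!e)
            return -1;
        e->socket_id = socket_id;
        e->next = socket_hash[b];
        socket_hash[b] = e;
    }
    e->cpumask |= UINT64_C(1) << cpu;
    return 0;
}

/* Number of distinct socket IDs seen, regardless of how sparse they are. */
static unsigned int socket_count(void)
{
    unsigned int b, n = 0;
    const struct socket_entry *e;

    for (b = 0; b < SOCKET_HASH_BUCKETS; b++)
        for (e = socket_hash[b]; e; e = e->next)
            n++;
    return n;
}
```

With the 4-CPU topology quoted above (sockets 0, 2, 4 and 6 populated,
sockets 1, 3 and 5 unused), only four entries would be allocated, and a
lookup of an unused socket ID returns NULL instead of indexing past the
end of an array. The cost is a pointer chase on every access, which may
or may not be acceptable on the paths that use socket_cpumask today.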