From: Daniel J Blueman <daniel@numascale-asia.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
H Peter Anvin <hpa@zytor.com>,
x86@kernel.org, linux-kernel@vger.kernel.org,
Andreas Herrmann <herrmann.der.user@gmail.com>,
Steffen Persvold <sp@numascale.com>
Subject: Re: [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains
Date: Wed, 31 Oct 2012 13:23:36 +0800 [thread overview]
Message-ID: <5090B5D8.3000209@numascale-asia.com> (raw)
In-Reply-To: <20121029103217.GD4326@liondog.tnic>
On 29/10/2012 18:32, Borislav Petkov wrote:
> + Andreas.
>
> Dude, look at this boot log below:
>
> http://quora.org/2012/16-server-boot-2.txt
>
> That's 192 F10h's!
We were booting 384 a while back, but I'll let you know when we reach 4096!
> On Mon, Oct 29, 2012 at 04:54:59PM +0800, Daniel J Blueman wrote:
>>> A number of other callers lookup the PCI device based on index
>>> 0..amd_nb_num(), but we can't easily allocate contiguous northbridge IDs
>>> from the PCI device in the first place.
>>
>>> OTOH we can simplify this code by changing amd_get_node_id to generate a
>>> linear northbridge ID from the index of the matching entry in the
>>> northbridge array.
>>>
>>> I'll get a patch together to see if there are any snags.
>
> I suspected that after we have this nice approach, you guys would come
> with non-contiguous node numbers. Maan, can't you build your systems so
> that software people can have it easy at least for once??!
It depends on the definition of node, of course. The only change we're
considering is compliance with the Intel x2APIC spec, using the upper
16 bits of the APIC ID as the server ("cluster") ID, since Linux has
optimisations for this.
>> This really is a lot less intrusive [1] and boots well on top of
>> 3.7-rc3 on one of our 16-server/192-core/512GB systems [2].
>>
>> If you're happy with this simpler approach for now, I'll present
>> this and a separate patch cleaning up the inconsistent use of
>> unsigned and u8 node ID variables to u16?
>
> Sure, bring it on.
Yes, I've prepared a patch series and it tests out well.
>> diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
>> index b3341e9..b88fc7a 100644
>> --- a/arch/x86/include/asm/amd_nb.h
>> +++ b/arch/x86/include/asm/amd_nb.h
>> @@ -81,6 +81,18 @@ static inline struct amd_northbridge *node_to_amd_nb(int node)
>>  	return (node < amd_northbridges.num) ?
>>  			&amd_northbridges.nb[node] : NULL;
>>  }
>>
>> +static inline u8 get_node_id(struct pci_dev *pdev)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i != amd_nb_num(); i++)
>> +		if (pci_domain_nr(node_to_amd_nb(i)->misc->bus) == pci_domain_nr(pdev->bus) &&
>> +		    PCI_SLOT(node_to_amd_nb(i)->misc->devfn) == PCI_SLOT(pdev->devfn))
>> +			return i;
>
> Looks ok, can you send the whole patch please?
>
>> + BUG();
>
> I'm not sure about this - maybe WARN()? Are we absolutely sure we
> unconditionally should panic after not finding an NB descriptor?
It looks like the only way we could be looking up a non-existent NB
descriptor is if the array or variable in hand was corrupted. It may be
better to panic immediately than to leave elusive debugging for later.
I've tweaked this to warn and return the first Northbridge ID to avoid
further issues, but even that isn't ideal.
> Btw, this shouldn't happen on those CPUs:
>
> [ 39.279131] TSC synchronization [CPU#0 -> CPU#12]:
> [ 39.287223] Measured 22750019569 cycles TSC warp between CPUs, turning off TSC clock.
> [ 0.030000] tsc: Marking TSC unstable due to check_tsc_sync_source failed
>
> I guess TSCs are not starting at the same moment on all boards.
As these are physically separate servers (off-the-shelf servers in fact,
a key benefit of NumaConnect), the TSC clocks diverge. Later, I'll be
cooking up a patch series to keep them in sync, allowing fast TSC use.
> You definitely need ucode on those too:
>
> [ 113.392460] microcode: CPU0: patch_level=0x00000000
Good tip!
Thanks,
Daniel
--
Daniel J Blueman
Principal Software Engineer, Numascale Asia
Thread overview: 12+ messages
2012-10-25 8:32 [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains Daniel J Blueman
2012-10-25 11:03 ` Borislav Petkov
2012-10-25 11:56 ` Ingo Molnar
2012-10-25 13:59 ` Multiple patch authors (was: Re: [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains) Borislav Petkov
2012-10-25 14:32 ` Multiple patch authors H. Peter Anvin
2012-10-25 14:36 ` Borislav Petkov
2012-10-25 14:41 ` H. Peter Anvin
2012-10-25 15:23 ` Borislav Petkov
2012-10-29 6:17 ` [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains Daniel J Blueman
2012-10-29 8:54 ` Daniel J Blueman
2012-10-29 10:32 ` Borislav Petkov
2012-10-31 5:23 ` Daniel J Blueman [this message]