All of lore.kernel.org
 help / color / mirror / Atom feed
From: Daniel J Blueman <daniel@numascale-asia.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@amd64.org>, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	H Peter Anvin <hpa@zytor.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains
Date: Mon, 29 Oct 2012 16:54:59 +0800	[thread overview]
Message-ID: <508E4463.3080503@numascale-asia.com> (raw)
In-Reply-To: <508E1F64.3080806@numascale-asia.com>

On 29/10/2012 14:17, Daniel J Blueman wrote:
> On 25/10/2012 19:03, Borislav Petkov wrote:
>> On Thu, Oct 25, 2012 at 04:32:52PM +0800, Daniel J Blueman wrote:
>>> The AMD Northbridge initialisation code and EDAC assume the
>>> Northbridge IDs
>>> are contiguous, which no longer holds on federated systems with multiple
>>> HyperTransport fabrics and multiple PCI domains, eg on Numascale's
>>> Numaconnect systems with NumaChip.
>>>
>>> Address this assumption by searching the Northbridge ID array, rather
>>> than
>>> directly indexing it, using the upper bits for the PCI domain.
>>>
>>> RFC->v2: Correct array initialisation
>>> v2->v3: Add Boris's neater linked list approach
>>>
>>> Todo:
>>> 1. fix kobject/sysfs oops (see
>>> http://quora.org/2012/16-server-boot.txt later)
>>> 2. reorder amd64_edac.c or add
>>> amd64_per_family_init/pci_get_related_function
>>>     forward declarations, based on feedback
>>>
>>> Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
>>
>> This patch contains code from both of us and thus needs both our SOBs:
>>
>> Signed-off-by: Borislav Petkov <bp@alien8.de>
>
> I'll use "Based-on-patch-from: Borislav Petkov <bp@alien8.de>", great.
>
>>> ---
>>>   arch/x86/include/asm/amd_nb.h            |   63 +++++++++++++++-
>>>   arch/x86/include/asm/numachip/numachip.h |   22 ++++++
>>>   arch/x86/kernel/amd_gart_64.c            |    8 +-
>>>   arch/x86/kernel/amd_nb.c                 |   85 ++++++++++++---------
>>>   arch/x86/pci/numachip.c                  |  121
>>> ++++++++++++++++++++++++++++++
>>>   drivers/char/agp/amd64-agp.c             |   12 +--
>>>   drivers/edac/amd64_edac.c                |   34 +++++----
>>>   drivers/edac/amd64_edac.h                |    6 --
>>>   8 files changed, 283 insertions(+), 68 deletions(-)
>>>   create mode 100644 arch/x86/include/asm/numachip/numachip.h
>>>   create mode 100644 arch/x86/pci/numachip.c
>>>
>>> diff --git a/arch/x86/include/asm/amd_nb.h
>>> b/arch/x86/include/asm/amd_nb.h
>>> index b3341e9..6a27226 100644
>>> --- a/arch/x86/include/asm/amd_nb.h
>>> +++ b/arch/x86/include/asm/amd_nb.h
>>> @@ -4,6 +4,8 @@
>>>   #include <linux/ioport.h>
>>>   #include <linux/pci.h>
>>>
>>> +#define NUM_POSSIBLE_NBS    8
>>> +
>>>   struct amd_nb_bus_dev_range {
>>>       u8 bus;
>>>       u8 dev_base;
>>> @@ -51,12 +53,22 @@ struct amd_northbridge {
>>>       struct pci_dev *link;
>>>       struct amd_l3_cache l3_cache;
>>>       struct threshold_bank *bank4;
>>> +    u16 node;
>>> +    struct list_head nbl;
>>>   };
>>>
>>>   struct amd_northbridge_info {
>>>       u16 num;
>>>       u64 flags;
>>> -    struct amd_northbridge *nb;
>>> +
>>> +    /*
>>> +     * The first 8 elems are for fast lookup of NB descriptors on
>>> single-
>>> +     * system setups, i.e. "normal" boxes. The nb_list, OTOH, is
>>> list of
>>> +     * additional NB descriptors which exist on confederate systems
>>> +     * like using Numascale's Numaconnect/NumaChip.
>>> +     */
>>> +    struct amd_northbridge *nbs[NUM_POSSIBLE_NBS];
>>> +    struct list_head nb_list;
>>>   };
>>>   extern struct amd_northbridge_info amd_northbridges;
>>>
>>> @@ -78,7 +90,54 @@ static inline bool amd_nb_has_feature(unsigned
>>> feature)
>>>
>>>   static inline struct amd_northbridge *node_to_amd_nb(int node)
>>>   {
>>> -    return (node < amd_northbridges.num) ?
>>> &amd_northbridges.nb[node] : NULL;
>>> +    struct amd_northbridge_info *nbi = &amd_northbridges;
>>> +    struct amd_northbridge *nb;
>>> +    int i;
>>> +
>>> +    /* Quick search for first domain */
>>> +    if (node < NUM_POSSIBLE_NBS) {
>>> +        if (node < nbi->num)
>>> +            return nbi->nbs[node];
>>> +        else
>>> +            return NULL;
>>> +    }
>>
>> Why change that here from what I had before?
>>
>> nbi->nbs[node] will either return a valid descriptor or NULL because it
>> is statically allocated in amd_northbridge_info.
>>
>> So why add a conditional where you clearly don't need it?
>
> True; fixed up.
>
>>> +    /* Search for NBs from later domains in array */
>>> +    for (i = 0; i < NUM_POSSIBLE_NBS; i++)
>>> +        if (nbi->nbs[i]->node == node)
>>> +            return nbi->nbs[i];
>>
>> And then this is not needed.
>
> Eg with two servers with two Northbridges per server, interconnected,
> Linux sees two PCI domains (bits 15:3) and the nbs array would have node
> IDs:
>
> [0x00]
> [0x01]
> [0x08]
> [0x09]
>
> Without that check, searching for node 0x08 would only hit the linked
> list, though this doesn't affect the fast-path (id < 0x8) of course.
>
> We can use the static array for only the first PCI domain by changing
> _alloc_nb_desc to use the list when nbi->node > NUM_POSSIBLE_NBS, rather
> than nbi->num; we'd then need to introduce a variable to struct
> amd_northbridge_info to keep track of how many static array entries are
> used, for a linear lookup in index_to_amd_nb.
>
>>> +
>>> +    list_for_each_entry(nb, &nbi->nb_list, nbl)
>>> +        if (node == nb->node)
>>> +            return nb;
>>
>> And why change the list_for_each_entry_safe variant? It is not needed
>> now but who knows what code changes where in the future.
>
> Changed also.
>
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static inline struct amd_northbridge *index_to_amd_nb(int index)
>>> +{
>>> +    struct amd_northbridge_info *nbi = &amd_northbridges;
>>> +    struct amd_northbridge *nb;
>>> +    int count = NUM_POSSIBLE_NBS;
>>> +
>>> +    if (index < NUM_POSSIBLE_NBS) {
>>> +        if (index < nbi->num)
>>> +            return nbi->nbs[index];
>>> +        else
>>> +            return NULL;
>>> +    }
>>> +
>>> +    list_for_each_entry(nb, &nbi->nb_list, nbl) {
>>> +        if (count++ == index)
>>> +            return nb;
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>
>> Huh, what do we need that function for? node should be equal to index
>> for the first 8 and then we use the linked list. What's up?
>
> A number of other callers lookup the PCI device based on index
> 0..amd_nb_num(), but we can't easily allocate contiguous northbridge IDs
> from the PCI device in the first place.

> OTOH we can simply this code by changing amd_get_node_id to generate a
> linear northbridge ID from the index of the matching entry in the
> northbridge array.
>
> I'll get a patch together to see if there are any snags.

This really is a lot less intrusive [1] and boots well on top of 3.7-rc3 
on one of our 16-server/192-core/512GB systems [2].

If you're happy with this simpler approach for now, I'll present this 
and a separate patch cleaning up the inconsistent use of unsigned and u8 
node ID variables to u16?

Thanks,
   Daniel

--- [1]

diff --git a/arch/x86/include/asm/amd_nb.h b/arch/x86/include/asm/amd_nb.h
index b3341e9..b88fc7a 100644
--- a/arch/x86/include/asm/amd_nb.h
+++ b/arch/x86/include/asm/amd_nb.h
@@ -81,6 +81,18 @@ static inline struct amd_northbridge 
*node_to_amd_nb(int node)
         return (node < amd_northbridges.num) ? 
&amd_northbridges.nb[node] : NULL;
  }

+static inline u8 get_node_id(struct pci_dev *pdev)
+{
+       int i;
+
+       for (i = 0; i != amd_nb_num(); i++)
+               if (pci_domain_nr(node_to_amd_nb(i)->misc->bus) == 
pci_domain_nr(pdev->bus) &&
+                   PCI_SLOT(node_to_amd_nb(i)->misc->devfn) == 
PCI_SLOT(pdev->devfn))
+                       return i;
+
+       BUG();
+}
+
  #else

  #define amd_nb_num(x)          0
diff --git a/drivers/edac/amd64_edac.h b/drivers/edac/amd64_edac.h
index 8d48047..90cae61 100644
--- a/drivers/edac/amd64_edac.h
+++ b/drivers/edac/amd64_edac.h
@@ -290,12 +290,6 @@
  /* MSRs */
  #define MSR_MCGCTL_NBE                 BIT(4)

-/* AMD sets the first MC device at device ID 0x18. */
-static inline u8 get_node_id(struct pci_dev *pdev)
-{
-       return PCI_SLOT(pdev->devfn) - 0x18;
-}
-
  enum amd_families {
         K8_CPUS = 0,
         F10_CPUS,

[2] http://quora.org/2012/16-server-boot-2.txt
-- 
Daniel J Blueman
Principal Software Engineer, Numascale Asia

  reply	other threads:[~2012-10-29  8:55 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-25  8:32 [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains Daniel J Blueman
2012-10-25 11:03 ` Borislav Petkov
2012-10-25 11:56   ` Ingo Molnar
2012-10-25 13:59     ` Multiple patch authors (was: Re: [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains) Borislav Petkov
2012-10-25 14:32       ` Multiple patch authors H. Peter Anvin
2012-10-25 14:36         ` Borislav Petkov
2012-10-25 14:41           ` H. Peter Anvin
2012-10-25 15:23             ` Borislav Petkov
2012-10-29  6:17   ` [PATCH v3] Add support for AMD64 EDAC on multiple PCI domains Daniel J Blueman
2012-10-29  8:54     ` Daniel J Blueman [this message]
2012-10-29 10:32       ` Borislav Petkov
2012-10-31  5:23         ` Daniel J Blueman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=508E4463.3080503@numascale-asia.com \
    --to=daniel@numascale-asia.com \
    --cc=bp@alien8.de \
    --cc=bp@amd64.org \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.