From: Suravee Suthikulanit <suravee.suthikulpanit@amd.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	Jan Beulich <JBeulich@suse.com>
Cc: xen-devel <xen-devel@lists.xenproject.org>,
	Dario Faggioli <dario.faggioli@citrix.com>,
	Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Subject: Re: [PATCH 5/5] AMD IOMMU: widen NUMA nodes to be allocated from
Date: Mon, 9 Mar 2015 14:02:15 -0500
Message-ID: <54FDEE37.7020008@amd.com>
In-Reply-To: <54FDD7C4.5020905@citrix.com>

On 3/9/2015 12:26 PM, Andrew Cooper wrote:
> On 09/03/15 15:42, Suravee Suthikulanit wrote:
>> On 3/6/2015 6:15 AM, Andrew Cooper wrote:
>>> On 06/03/2015 07:50, Jan Beulich wrote:
>>>>>>> On 05.03.15 at 18:30, <andrew.cooper3@citrix.com> wrote:
>>>>> On 26/02/15 13:56, Jan Beulich wrote:
>>>>>> --- a/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>>>>>> +++ b/xen/include/asm-x86/hvm/svm/amd-iommu-proto.h
>>>>>> @@ -158,12 +158,12 @@ static inline unsigned long region_to_pa
>>>>>>        return (PAGE_ALIGN(addr + size) - (addr & PAGE_MASK)) >> PAGE_SHIFT;
>>>>>>    }
>>>>>>
>>>>>> -static inline struct page_info* alloc_amd_iommu_pgtable(void)
>>>>>> +static inline struct page_info *alloc_amd_iommu_pgtable(struct domain *d)
>>>>>>    {
>>>>>>        struct page_info *pg;
>>>>>>        void *vaddr;
>>>>>>
>>>>>> -    pg = alloc_domheap_page(NULL, 0);
>>>>>> +    pg = alloc_domheap_page(d, MEMF_no_owner);
>>>>> Same comment as with the VT-d side of things.  This should be based on
>>>>> the proximity information of the IOMMU, not of the owning domain.
>>>> I think I buy this argument on the VT-d side (under the assumption
>>>> that there's going to be at least one IOMMU per node), but I'm not
>>>> sure here: The most modern AMD box I have has just a single
>>>> IOMMU for 4 nodes it reports.
>>>
>>> It is not possible for an IOMMU to cover multiple NUMA nodes worth of
>>> IO, because of the position it has to sit relative to the IO root ports
>>> and QPI/HT links.
>>>
>>> In AMD systems, the IOMMUs live in the northbridges, meaning one per
>>> NUMA node (as it is the northbridges which contain the hypertransport
>>> links).
>>>
>>> The BIOS/firmware will only report IOMMUs from northbridges which have
>>> IO connected to their IO hypertransport link (most systems in the wild
>>> have all IO hanging off one or two Numa nodes).  On the other hand, I
>>> have an AMD system with 8 IOMMUs in use.
>>
>>
>> Actually, a single IOMMU could handle multiple nodes. For example, in
>> a multi-chip-module (MCM) setup, there could be at least 2-4 nodes
>> sharing one IOMMU, depending on how the platform vendor configures
>> the system. In server platforms, the IOMMU is in the AMD northbridge
>> chipsets (e.g. SR56xx). This website has an example of such a system
>> configuration
>> (http://www.qdpma.com/systemarchitecture/SystemArchitecture_Opteron.html).
>
> Ok - I was basing my example on the last layout I had the manual for,
> which I believe was Bulldozer.
>
> However, my point still stands that there is an IOMMU between any IO and
> RAM.  An individual IOMMU will always benefit from having its
> iopagetables on the local numa node, rather than the numa node(s) which
> the domain owning the device is running on.
>

I agree that having the IO page tables on the NUMA node that is closest
to the IOMMU would be beneficial.  However, I am not sure that this
information can be easily determined at the moment.  I think ACPI _PXM
for devices should be able to provide it, but _PXM is optional and
often not available.
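
To make the trade-off concrete, here is a rough sketch of how the node
choice could look if we prefer the IOMMU's own node and only fall back to
the domain's node when no proximity information exists.  All names here
(struct iommu_info, struct domain_info, NODE_UNKNOWN, pgtable_alloc_node)
are hypothetical stand-ins for illustration, not the actual Xen structures
or allocator interface, and the fallback policy itself is an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker: firmware provided no _PXM for this IOMMU. */
#define NODE_UNKNOWN (-1)

/* Hypothetical, simplified stand-ins; not the real Xen structures. */
struct iommu_info {
    int pxm_node;        /* node derived from ACPI _PXM, or NODE_UNKNOWN */
};

struct domain_info {
    int preferred_node;  /* a node the owning domain runs on */
};

/*
 * Pick the NUMA node to allocate an IO page-table page from: use the
 * IOMMU's own node when it is known, otherwise fall back to the node
 * of the domain owning the device.
 */
static int pgtable_alloc_node(const struct iommu_info *iommu,
                              const struct domain_info *d)
{
    assert(iommu != NULL && d != NULL);

    if (iommu->pxm_node != NODE_UNKNOWN)
        return iommu->pxm_node;

    return d->preferred_node;
}
```

With something like this, a platform that does report _PXM gets IOMMU-local
page tables, and one that does not keeps the domain-based behaviour of the
patch as posted.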

>>
>> For AMD IOMMU, the IVRS table specifies the PCI bus/device ranges to
>> be handled by each IOMMU. This should probably be considered here.
>
> Presumably a PCI transaction must never get onto the HT bus without
> having already undergone translation, or there can be no guarantee that
> it would be routed via the IOMMU?  Or are you saying that there are
> cases where a transaction will enter the HT bus, route sideways to an
> IOMMU, undergo translation, then route back onto the HT bus to the
> target RAM/processor?
>
> ~Andrew
>

The IOMMU sits between the PCI devices (downstream) and HT (upstream), so
all DMA transactions from downstream must go through the IOMMU. On the
other hand, the I/O page translation is handled by the IOMMU itself, and
it is separate traffic from the downstream device DMA transactions.

Suravee

