Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

From: Mike Habeck <habeck@sgi.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: Mike Travis <travis@sgi.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	iommu@lists.linux-foundation.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges
Date: Thu, 31 Mar 2011 20:05:35 -0500
Message-ID: <4D9524DF.10405@sgi.com>
In-Reply-To: <20110331235657.GG18712@sequoia.sous-sol.org>



Chris Wright wrote:
> * Mike Habeck (habeck@sgi.com) wrote:
>> On 03/31/2011 06:25 PM, Mike Travis wrote:
>>> I'll probably need help from our Hardware PCI Engineer to help explain
>>> this further, though here's a pointer to an earlier email thread:
>>>
>>> http://marc.info/?l=linux-kernel&m=129259816925973&w=2
>>>
>>> I'll also dig out the specs you're asking for.
>>>
>>> Thanks,
>>> Mike
>>>
>>> Chris Wright wrote:
>>>> * Mike Travis (travis@sgi.com) wrote:
>>>>> Chris - did you have any comment on this patch?
>>>> It doesn't actually look right to me. It means that particular range
>>>> is no longer reserved. But perhaps I've misunderstood something.
>>>>
>>>>> Mike Travis wrote:
>>>>>> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
>>>>>> prevent handing out a DMA map that would overlap with the MMIO range.
>>>>>> The problem is that while the Nvidia GPU has 64bit BARs and is capable
>>>>>> of receiving > 40bit PIOs, it can't generate > 40bit DMAs.
>>>> I don't understand what you mean here.
>> What Mike is getting at is there is no reason to reserve the MMIO
>> range if it's greater than the dma_mask, given the MMIO range is
>> outside of what the IOVA code will ever hand back to the IOMMU
>> code.  In this case the nVidia card has a 64bit BAR and is assigned
>> the MMIO range [0xf8200000000 - 0xf820fffffff].  But the Nvidia
>> card can only generate a 40bit DMA (thus has a 40bit dma_mask). If
>> the IOVA code honors the limit_pfn (i.e., dma_mask) passed in it
>> will never hand a >40bit address back to the IOMMU code. Thus
>> there is no reason to reserve the card's MMIO range if it is greater
>> than the dma_mask. (And that is what the patch is doing).
> 
> The reserved ranges are for all devices.  Another device with a 64bit
> dma_mask could get that region if it's not properly reserved.  The
> driver would then program that device to dma to an address that is an
> alias to a MMIO region.  The memory transaction travels up towards
> root...and sees the MMIO range in some bridge and would go straight down
> to the GPU.

Chris,

OK, I understand now what you meant by the patch possibly causing
the DMA transaction to become a peer-to-peer transaction.  Mike and
I will have to rethink this one.  Thanks for your input.

-mike
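
The point Chris makes above (the reserved ranges apply to every device, not just the GPU) can be illustrated with a small sketch. This is not kernel code: dma_mask_bits(), range_reachable() and must_reserve() are hypothetical helpers, and the only values taken from the thread are the GPU's MMIO start address and its 40bit DMA limit. The second device with a 64bit dma_mask is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Start of the MMIO range assigned to the GPU's 64-bit BAR (from the thread). */
#define GPU_MMIO_START 0xf8200000000ULL

/* Hypothetical helper: a DMA mask with the low `bits` bits set. */
static uint64_t dma_mask_bits(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((1ULL << bits) - 1);
}

/* Could an allocator bounded by `mask` ever hand out an address
 * at or above `range_start`? */
static bool range_reachable(uint64_t mask, uint64_t range_start)
{
    return range_start <= mask;
}

/* A range may only be left unreserved if NO device sharing the
 * reserved list can reach it. */
static bool must_reserve(const uint64_t *masks, size_t n, uint64_t range_start)
{
    for (size_t i = 0; i < n; i++)
        if (range_reachable(masks[i], range_start))
            return true;
    return false;
}
```

With only the 40bit GPU in the list, must_reserve() returns false for GPU_MMIO_START, which is the reasoning behind the original patch. Add one device with a 64bit mask and it returns true, which is why skipping the reservation is unsafe.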


> 
>> More below...
>>
>>>>>> So when the iommu code reserves these MMIO ranges a > 40bit
>>>>>> entry ends up getting in the rbtree. On a UV test system with
>>>>>> the Nvidia cards, the BARs are:
>>>>>>
>>>>>> 0001:36:00.0 VGA compatible controller: nVidia Corporation
>>>>>> GT200GL Region 0: Memory at 92000000 (32-bit, non-prefetchable)
>>>>>> [size=16M]
>>>>>> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
>>>>>> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>>>>>>
>>>>>> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
>>>>>> maps get added and deleted from the rbtree we can end up getting a cached
>>>>>> entry to this 0xf8200000000 entry... this is what results in the code
>>>>>> handing out the invalid DMA map of 0xf81fffff000:
>>>>>>
>>>>>> [ (0xf8200000000 - 1) >> PAGE_SHIFT << PAGE_SHIFT ]
>>>>>>
>>>>>> The IOVA code needs to better honor the "limit_pfn" when allocating
>>>>>> these maps.
>>>> This means we could get the MMIO address range (it's no longer reserved).
>> Not true, the MMIO address is greater than the dma_mask (i.e., the
>> limit_pfn passed into alloc_iova()), thus the IOVA code will never
>> hand back that address range given it's greater than the dma_mask.
> 
> Well, as you guys are seeing, the iova allocation code is making the
> assumption that if the range is in the tree, it's valid.  And it is
> handing out an address that's too large.
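
The arithmetic behind the "too large" address can be checked with a short sketch. It assumes 4 KiB pages (PAGE_SHIFT of 12) and a 40bit dma_mask as stated in the thread; last_page_below() and fits_dma_mask() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Start address of the last page that lies entirely below `addr`. */
static uint64_t last_page_below(uint64_t addr)
{
    assert(addr > 0);
    return ((addr - 1) >> PAGE_SHIFT) << PAGE_SHIFT;
}

/* True if a device limited to `dma_mask` can generate `addr`. */
static bool fits_dma_mask(uint64_t addr, uint64_t dma_mask)
{
    return addr <= dma_mask;
}
```

For the BAR at 0xf8200000000, last_page_below() gives 0xf81fffff000, the address reported in the thread. That value needs 44 bits, so fits_dma_mask() rejects it for a 40bit mask of 0xffffffffff.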
> 
>>>> It seems to me the DMA transaction would then become a peer to peer
>>>> transaction if ACS is not enabled, which could show up as a random register
>>>> write in that GPU's 256M BAR (i.e. broken).
>>>>
>>>> The iova allocation should not hand out an address bigger than the
>>>> dma_mask. What is the device's dma_mask?
>> Agree.  But there is a bug.  The IOVA code doesn't validate the limit_pfn
>> if it uses the cached entry.  One could argue that it should validate
>> the limit_pfn, but then again an entry outside the limit_pfn should
>> never have gotten into the rbtree...  (it got in due to the IOMMU's
>> dmar_init_reserved_ranges() adding it).
> 
> Yeah, I think it needs to be in the global reserved list.  But perhaps
> not copied into the domain specific iova.  Or simply skipped on iova
> allocation (don't just assume rb_last is <= dma_mask).
> 
> thanks,
> -chris
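
Chris's last suggestion (keep the range reserved, but skip entries above the limit during allocation instead of assuming the last entry is <= dma_mask) can be sketched as a toy top-down allocator. This is a simplified illustration, not the kernel's alloc_iova(): it uses a sorted array in place of the rbtree, ignores alignment and caching, and the names struct range, alloc_below() and ALLOC_FAIL are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALLOC_FAIL UINT64_MAX

/* A reserved or allocated range of page frame numbers, inclusive. */
struct range {
    uint64_t pfn_lo;
    uint64_t pfn_hi;
};

/*
 * Find the highest free block of `size` pfns that ends at or below
 * `limit_pfn`. `r` must be sorted ascending and non-overlapping.
 * Returns the first pfn of the block, or ALLOC_FAIL.
 */
static uint64_t alloc_below(const struct range *r, size_t n,
                            uint64_t size, uint64_t limit_pfn)
{
    uint64_t top = limit_pfn;  /* highest pfn still available */

    assert(size > 0 && limit_pfn < UINT64_MAX);

    for (size_t i = n; i-- > 0; ) {
        if (r[i].pfn_lo > top)
            continue;  /* entry lies wholly above the limit: skip it */
        if (r[i].pfn_hi < top && top - r[i].pfn_hi >= size)
            return top - size + 1;  /* gap above this entry is big enough */
        if (r[i].pfn_lo == 0)
            return ALLOC_FAIL;  /* nothing left below this entry */
        top = r[i].pfn_lo - 1;  /* continue searching below this entry */
    }
    return top + 1 >= size ? top - size + 1 : ALLOC_FAIL;
}
```

With the three BARs from the thread reserved and a 40bit limit (limit_pfn of 0xfffffff), the GPU's 44bit entry is skipped and the result stays within the limit. A caller whose limit falls inside the GPU's range is pushed down to pfn 0xf81fffff, just below the reserved range, so the range stays protected for devices that can reach it.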

Thread overview: 27+ messages
2011-03-29 23:36 [PATCH 0/4] pci: Speed up processing of IOMMU related functions Mike Travis
2011-03-29 23:36 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-03-30 17:51   ` Chris Wright
2011-03-30 18:30     ` Mike Travis
2011-03-30 19:15       ` Chris Wright
2011-03-30 19:25         ` Mike Travis
2011-03-30 19:57           ` Chris Wright
2011-03-29 23:36 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-03-30 19:19   ` Chris Wright
2011-03-30 19:29     ` Mike Travis
2011-03-31  0:33     ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function v2 Mike Travis
2011-03-29 23:36 ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Mike Travis
2011-03-31 22:11   ` Mike Travis
2011-03-31 22:53     ` Chris Wright
2011-03-31 23:25       ` Mike Travis
2011-03-31 23:40         ` Mike Habeck
2011-03-31 23:56           ` Chris Wright
2011-04-01  1:05             ` Mike Habeck [this message]
2011-04-02  0:32               ` [PATCH 3/4 v2] intel-iommu: don't cache iova above 32bit caching boundary Chris Wright
2011-04-06  0:39                 ` [PATCH 3/4 v3] " Chris Wright
2011-03-31 23:39       ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Chris Wright
2011-03-29 23:36 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
2011-03-30 18:02   ` Chris Wright
2011-04-01  2:57     ` FUJITA Tomonori
2011-04-07 19:47 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-04-07 19:51 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-04-07 19:52 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
