Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges

From: Mike Habeck <habeck@sgi.com>
To: Chris Wright <chrisw@sous-sol.org>
Cc: Mike Travis <travis@sgi.com>,
	David Woodhouse <dwmw2@infradead.org>,
	Jesse Barnes <jbarnes@virtuousgeek.org>,
	iommu@lists.linux-foundation.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges
Date: Thu, 31 Mar 2011 20:05:35 -0500
Message-ID: <4D9524DF.10405@sgi.com>
In-Reply-To: <20110331235657.GG18712@sequoia.sous-sol.org>



Chris Wright wrote:
> * Mike Habeck (habeck@sgi.com) wrote:
>> On 03/31/2011 06:25 PM, Mike Travis wrote:
>>> I'll probably need help from our Hardware PCI Engineer to help explain
>>> this further, though here's a pointer to an earlier email thread:
>>>
>>> http://marc.info/?l=linux-kernel&m=129259816925973&w=2
>>>
>>> I'll also dig out the specs you're asking for.
>>>
>>> Thanks,
>>> Mike
>>>
>>> Chris Wright wrote:
>>>> * Mike Travis (travis@sgi.com) wrote:
>>>>> Chris - did you have any comment on this patch?
>>>> It doesn't actually look right to me. It means that particular range
>>>> is no longer reserved. But perhaps I've misunderstood something.
>>>>
>>>>> Mike Travis wrote:
>>>>>> dmar_init_reserved_ranges() reserves the card's MMIO ranges to
>>>>>> prevent handing out a DMA map that would overlap with the MMIO range.
>>>>>> The problem is that while the Nvidia GPU has 64bit BARs and is capable
>>>>>> of receiving > 40bit PIOs, it can't generate > 40bit DMAs.
>>>> I don't understand what you mean here.
>> What Mike is getting at is there is no reason to reserve the MMIO
>> range if it's greater than the dma_mask, given the MMIO range is
>> outside of what the IOVA code will ever hand back to the IOMMU
>> code.  In this case the nVidia card has a 64bit BAR and is assigned
>> the MMIO range [0xf8200000000 - 0xf820fffffff].  But the Nvidia
>> card can only generate a 40bit DMA (thus has a 40bit dma_mask). If
>> the IOVA code honors the limit_pfn (i.e., dma_mask) passed in it
>> will never hand a >40bit address back to the IOMMU code. Thus
>> there is no reason to reserve the card's MMIO range if it is greater
>> than the dma_mask. (And that is what the patch is doing).
> 
> The reserved ranges are for all devices.  Another device with a 64bit
> dma_mask could get that region if it's not properly reserved.  The
> driver would then program that device to dma to an address that is an
> alias to an MMIO region.  The memory transaction travels up towards
> root...and sees the MMIO range in some bridge and would go straight down
> to the GPU.

Chris,

OK, I understand now what you meant by the patch possibly causing
the DMA transaction to become a peer to peer transaction.  Mike and
I will have to rethink this one.  Thanks for your input.

-mike
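
The concern Chris raises above can be sketched in a few lines of userspace C. This is only an illustration, not kernel code: `struct addr_range`, `dma_mask_for_bits()` and `range_reachable()` are hypothetical names, and the BAR values are the Region 1 addresses quoted in this thread. It shows that the GPU's 256M BAR is out of reach for a device with a 40bit dma_mask but within reach of a device with a 64bit dma_mask, which is why dropping the reservation is unsafe when the reserved ranges are shared by all devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, e.g. a BAR's MMIO window. */
struct addr_range {
    uint64_t start;
    uint64_t end;
};

/* DMA mask for a device that can generate addresses up to `bits` wide. */
static uint64_t dma_mask_for_bits(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * True if any part of `r` lies at or below `dma_mask`, i.e. an allocator
 * bounded by that mask could hand out an address inside `r` unless the
 * range is reserved.
 */
static bool range_reachable(struct addr_range r, uint64_t dma_mask)
{
    return r.start <= dma_mask;
}

/* The GPU's Region 1 BAR from the lspci output quoted in this thread. */
static const struct addr_range gpu_bar = {
    UINT64_C(0xf8200000000), UINT64_C(0xf820fffffff)
};
```

With these definitions, `range_reachable(gpu_bar, dma_mask_for_bits(40))` is false while `range_reachable(gpu_bar, dma_mask_for_bits(64))` is true, so the reservation only looks redundant from the point of view of the 40bit device.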


> 
>> More below...
>>
>>>>>> So when the iommu code reserves these MMIO ranges a > 40bit
>>>>>> entry ends up getting in the rbtree. On a UV test system with
>>>>>> the Nvidia cards, the BARs are:
>>>>>>
>>>>>> 0001:36:00.0 VGA compatible controller: nVidia Corporation GT200GL
>>>>>> Region 0: Memory at 92000000 (32-bit, non-prefetchable) [size=16M]
>>>>>> Region 1: Memory at f8200000000 (64-bit, prefetchable) [size=256M]
>>>>>> Region 3: Memory at 90000000 (64-bit, non-prefetchable) [size=32M]
>>>>>>
>>>>>> So this 44bit MMIO address 0xf8200000000 ends up in the rbtree. As DMA
>>>>>> maps get added and deleted from the rbtree we can end up getting a cached
>>>>>> entry to this 0xf8200000000 entry... this is what results in the code
>>>>>> handing out the invalid DMA map of 0xf81fffff000:
>>>>>>
>>>>>> [ (0xf8200000000 - 1) >> PAGE_SHIFT << PAGE_SHIFT ]
>>>>>>
>>>>>> The IOVA code needs to better honor the "limit_pfn" when allocating
>>>>>> these maps.
>>>> This means we could get the MMIO address range (it's no longer reserved).
>> Not true, the MMIO address is greater than the dma_mask (i.e., the
>> limit_pfn passed into alloc_iova()) thus the IOVA code will never
>> hand back that address range given it's greater than the dma_mask.
> 
> Well, as you guys are seeing, the iova allocation code is making the
> assumption that if the range is in the tree, it's valid.  And it is
> handing out an address that's too large.
> 
>>>> It seems to me the DMA transaction would then become a peer to peer
>>>> transaction if ACS is not enabled, which could show up as random register
>>>> writes in that GPU's 256M BAR (i.e. broken).
>>>>
>>>> The iova allocation should not hand out an address bigger than the
>>>> dma_mask. What is the device's dma_mask?
>> Agree.  But there is a bug.  The IOVA doesn't validate the limit_pfn
>> if it uses the cached entry.  One could argue that it should validate
>> the limit_pfn, but then again an entry outside the limit_pfn should
>> have never got into the rbtree...  (it got in due to the IOMMU's
>> dmar_init_reserved_ranges() adding it).
> 
> Yeah, I think it needs to be in the global reserved list.  But perhaps
> not copied into the domain specific iova.  Or simply skipped on iova
> allocation (don't just assume rb_last is <= dma_mask).
> 
> thanks,
> -chris
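
For reference, the arithmetic behind the invalid DMA map quoted above, and the limit_pfn check the thread says is missing on the cached-entry path, can be sketched as follows. This is a userspace illustration only, not the kernel's IOVA allocator: `page_floor()`, `mask_to_limit_pfn()` and `pfn_within_limit()` are hypothetical helpers, and 4 KiB pages (`PAGE_SHIFT` of 12) are an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Round an address down to the start of its page. */
static uint64_t page_floor(uint64_t addr)
{
    return (addr >> PAGE_SHIFT) << PAGE_SHIFT;
}

/* Highest page frame number addressable with a `bits`-wide DMA mask. */
static uint64_t mask_to_limit_pfn(unsigned int bits)
{
    assert(bits >= PAGE_SHIFT && bits <= 64);
    uint64_t mask = (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
    return mask >> PAGE_SHIFT;
}

/*
 * The check under discussion: a candidate page is only valid for a
 * device if its pfn does not exceed the limit_pfn derived from the
 * device's dma_mask.
 */
static bool pfn_within_limit(uint64_t pfn, uint64_t limit_pfn)
{
    return pfn <= limit_pfn;
}
```

Here `page_floor(0xf8200000000 - 1)` evaluates to 0xf81fffff000, the address reported in the thread, and its pfn (0xf81fffff) exceeds `mask_to_limit_pfn(40)` (0xfffffff), so a check like `pfn_within_limit()` would reject it for the 40bit device.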


Thread overview: 27+ messages
2011-03-29 23:36 [PATCH 0/4] pci: Speed up processing of IOMMU related functions Mike Travis
2011-03-29 23:36 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-03-30 17:51   ` Chris Wright
2011-03-30 18:30     ` Mike Travis
2011-03-30 19:15       ` Chris Wright
2011-03-30 19:25         ` Mike Travis
2011-03-30 19:57           ` Chris Wright
2011-03-29 23:36 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-03-30 19:19   ` Chris Wright
2011-03-30 19:29     ` Mike Travis
2011-03-31  0:33     ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function v2 Mike Travis
2011-03-29 23:36 ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Mike Travis
2011-03-31 22:11   ` Mike Travis
2011-03-31 22:53     ` Chris Wright
2011-03-31 23:25       ` Mike Travis
2011-03-31 23:40         ` Mike Habeck
2011-03-31 23:56           ` Chris Wright
2011-04-01  1:05             ` Mike Habeck [this message]
2011-04-02  0:32               ` [PATCH 3/4 v2] intel-iommu: don't cache iova above 32bit caching boundary Chris Wright
2011-04-06  0:39                 ` [PATCH 3/4 v3] " Chris Wright
2011-03-31 23:39       ` [PATCH 3/4] Intel pci: Limit dmar_init_reserved_ranges Chris Wright
2011-03-29 23:36 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
2011-03-30 18:02   ` Chris Wright
2011-04-01  2:57     ` FUJITA Tomonori
2011-04-07 19:47 ` [PATCH 1/4] Intel pci: Remove Host Bridge devices from identity mapping Mike Travis
2011-04-07 19:51 ` [PATCH 2/4] Intel iommu: Speed up processing of the identity_mapping function Mike Travis
2011-04-07 19:52 ` [PATCH 4/4] Intel pci: Use coherent DMA mask when requested Mike Travis
