From: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>,
linux-kernel@vger.kernel.org, ak@suse.de, gregkh@suse.de,
muli@il.ibm.com, suresh.b.siddha@intel.com,
arjan@linux.intel.com, ashok.raj@intel.com, davem@davemloft.net,
clameter@sgi.com
Subject: Re: [Intel IOMMU 04/10] IOVA allocation and management routines
Date: Tue, 26 Jun 2007 09:16:31 -0700 [thread overview]
Message-ID: <20070626161631.GB3374@linux-os.sc.intel.com> (raw)
In-Reply-To: <20070625230747.2c2d4c11.akpm@linux-foundation.org>
On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:05 -0700 "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> wrote:
>
> All the inlines in this code are pretty pointless: all those functions have
> a single callsite so the compiler inlines them anyway. If we later add
> more callsites for these functions, they're too big to be inlined.
>
> inline is usually wrong: don't do it!
Yup, I agree and will follow this in the future.
> > +
> > +/**
> > + * find_iova - finds an iova for a given pfn
> > + * @iovad: iova domain in question.
> > + * @pfn: page frame number
> > + * This function finds and returns an iova belonging to the
> > + * given domain which matches the given pfn.
> > + */
> > +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > + unsigned long flags;
> > + struct rb_node *node;
> > +
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + node = iovad->rbroot.rb_node;
> > + while (node) {
> > + struct iova *iova = container_of(node, struct iova, node);
> > +
> > + /* If pfn falls within iova's range, return iova */
> > + if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + return iova;
> > + }
> > +
> > + if (pfn < iova->pfn_lo)
> > + node = node->rb_left;
> > + else if (pfn > iova->pfn_lo)
> > + node = node->rb_right;
> > + }
> > +
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + return NULL;
> > +}
>
> So we take the lock, look up an item, then drop the lock then return the
> item we just found. We took no refcount on it and we didn't do anything to
> keep this object alive.
>
> Is that a bug, or does the (afacit undocumented) lifecycle management of
> these things take care of it in some manner? If yes, please reply via an
> add-a-comment patch.
Nope, this is not a bug. The comment patch below explains why.
>
>
> > +/**
> > + * __free_iova - frees the given iova
> > + * @iovad: iova domain in question.
> > + * @iova: iova in question.
> > + * Frees the given iova belonging to the given domain
> > + */
> > +void
> > +__free_iova(struct iova_domain *iovad, struct iova *iova)
> > +{
> > + unsigned long flags;
> > +
> > + if (iova) {
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + __cached_rbnode_delete_update(iovad, iova);
> > + rb_erase(&iova->node, &iovad->rbroot);
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + free_iova_mem(iova);
> > + }
> > +}
>
> Can this really be called with NULL? If so, under what circumstances?
> (This reader couldn't work it out from a brief look at the code, so perhaps
> others will not be able to either. Perhaps a comment is needed)
It was getting called with a possibly-NULL iova from only one place, free_iova().
The patch below addresses your concern.
>
> > +/**
> > + * free_iova - finds and frees the iova for a given pfn
> > + * @iovad: - iova domain in question.
> > + * @pfn: - pfn that is allocated previously
> > + * This function finds an iova for a given pfn and then
> > + * frees the iova from that domain.
> > + */
> > +void
> > +free_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > + struct iova *iova = find_iova(iovad, pfn);
> > + __free_iova(iovad, iova);
> > +
> > +}
> > +
> > +/**
> > + * put_iova_domain - destroys the iova domain
> > + * @iovad: - iova domain in question.
> > + * All the iova's in that domain are destroyed.
> > + */
> > +void put_iova_domain(struct iova_domain *iovad)
> > +{
> > + struct rb_node *node;
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + node = rb_first(&iovad->rbroot);
> > + while (node) {
> > + struct iova *iova = container_of(node, struct iova, node);
> > + rb_erase(node, &iovad->rbroot);
> > + free_iova_mem(iova);
> > + node = rb_first(&iovad->rbroot);
> > + }
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +}
>
> Right, so I suspect what's happening here is that all iova's remain valid
> until their entire domain is destroyed, yes?
Nope. IOVAs are valid only for the window between the DMA map and DMA unmap calls.
In the case of the Intel IOMMU driver, an iova is valid only between the
__intel_map_single() and __intel_unmap_single() calls.
>
> What is the upper bound to the memory consumpotion here, and what provides
> it?
As explained above, iovas are freed at DMA unmap time and reused by later DMA
map calls, so consumption is bounded by the number of outstanding mappings.
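To make that lifecycle concrete, here is a minimal user-space sketch of the pattern: an iova exists only between the map call that allocates it and the unmap call that finds and frees it. All names here (mock_map(), mock_unmap(), struct mock_iova) are illustrative stand-ins, and a singly linked list replaces the driver's red-black tree; this is not the driver's real API.

```c
#include <assert.h>
#include <stdlib.h>

/* One mapped range, covering page frames [pfn_lo, pfn_hi]. */
struct mock_iova {
	unsigned long pfn_lo, pfn_hi;
	struct mock_iova *next;
};

static struct mock_iova *domain; /* head of the "domain" list */

/* map: allocate an iova covering [pfn, pfn + pages - 1] */
static struct mock_iova *mock_map(unsigned long pfn, unsigned long pages)
{
	struct mock_iova *iova = malloc(sizeof(*iova));

	iova->pfn_lo = pfn;
	iova->pfn_hi = pfn + pages - 1;
	iova->next = domain;
	domain = iova;
	return iova;
}

/* unmap: find the iova containing pfn, unlink and free it.
 * Returns 1 if a mapping was found, 0 otherwise. */
static int mock_unmap(unsigned long pfn)
{
	struct mock_iova **pp;

	for (pp = &domain; *pp; pp = &(*pp)->next) {
		if (pfn >= (*pp)->pfn_lo && pfn <= (*pp)->pfn_hi) {
			struct mock_iova *iova = *pp;

			*pp = iova->next;
			free(iova);
			return 1;
		}
	}
	return 0; /* nothing mapped at this pfn */
}
```

After the unmap, a second unmap of the same pfn finds nothing, mirroring how an iova does not outlive its DMA mapping.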
>
> Again, some code comments about these design issues are appropriate.
>
> > +/*
> > + * We need a fixed PAGE_SIZE of 4K irrespective of
> > + * arch PAGE_SIZE for IOMMU page tables.
> > + */
> > +#define PAGE_SHIFT_4K (12)
> > +#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K)
> > +#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
> > +#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
>
> Am still wondering why we cannot use PAGE_SIZE, PAGE_SHIFT, etc here.
The VT-d hardware (a.k.a. Intel IOMMU hardware) page table size is always
4K irrespective of the OS PAGE_SIZE. We want to use the same code for
IA64 too, where the OS PAGE_SIZE may not be 4K, and hence
had to define these here for the IOMMU.
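For illustration, the fixed-4K macros can be exercised stand-alone in user space. This is a sketch mirroring the macros quoted above, with uint64_t standing in for the kernel's u64:

```c
#include <assert.h>
#include <stdint.h>

/* Fixed 4K page geometry for the IOMMU page tables, independent of
 * the CPU's PAGE_SIZE. */
#define PAGE_SHIFT_4K (12)
#define PAGE_SIZE_4K  (1UL << PAGE_SHIFT_4K)
#define PAGE_MASK_4K  (((uint64_t)-1) << PAGE_SHIFT_4K)
/* Round addr up to the next 4K boundary, e.g. 0x1001 -> 0x2000. */
#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
```

An already-aligned address passes through unchanged, while anything past a boundary rounds up to the next 4K page, regardless of what the host's PAGE_SIZE happens to be.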
>
> > +#define IOVA_START_ADDR (0x1000)
>
> What determined that address? (Needs comment)
Fixed. Please see the below patch.
>
> > +#define IOVA_START_PFN (IOVA_START_ADDR >> PAGE_SHIFT_4K)
> > +
> > +#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K)
>
> So I'm looking at this and wondering "what type does addr have"?
>
> If it's unsigned long then perhaps we have a problem on x86_32 PAE. Maybe
> we don't support x86_32 PAE, but still, I'd have thought that the
> appropriate type here is dma_addr_t.
Yup, that is correct. We don't support x86_32.
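To illustrate the PAE concern, the sketch below contrasts a 32-bit address type with a 64-bit one. The helpers iova_pfn() and iova_pfn_32() are hypothetical, not driver code: uint32_t models the 32-bit unsigned long of x86_32, and uint64_t models dma_addr_t.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K (12)

/* On x86_32 PAE an unsigned long is 32 bits while a bus address can
 * be 36 bits wide, so shifting an unsigned long silently drops the
 * upper bits.  A typed inline makes the truncation visible where a
 * macro would hide it. */
static inline uint64_t iova_pfn(uint64_t addr)     /* dma_addr_t-wide */
{
	return addr >> PAGE_SHIFT_4K;
}

static inline uint32_t iova_pfn_32(uint32_t addr)  /* 32-bit ulong */
{
	return addr >> PAGE_SHIFT_4K;
}
```

A 36-bit address such as 0x100000000 yields pfn 0x100000 through the 64-bit helper, but collapses to 0 once squeezed through the 32-bit type, which is why dma_addr_t is the appropriate type if x86_32 PAE were ever to be supported.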
>
> But alas, it was needlessly implemented as a macro, so the reader cannot
> tell.
I am in the process of making the same code base work for the
IA64 architecture too, and in that process I will do more cleanup.
Thanks.
Please apply the patch below as a fix to the existing patch.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
---
drivers/pci/iova.c | 22 ++++++++++++++--------
drivers/pci/iova.h | 4 ++--
2 files changed, 16 insertions(+), 10 deletions(-)
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.c 2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-26 08:25:23.000000000 -0700
@@ -166,6 +166,7 @@
unsigned long flags;
struct rb_node *node;
+ /* Take the lock so that no other thread is manipulating the rbtree */
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
node = iovad->rbroot.rb_node;
while (node) {
@@ -174,6 +175,12 @@
/* If pfn falls within iova's range, return iova */
if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+ /* We are not holding the lock while this iova
+ * is referenced by the caller as the same thread
+ * which called this function also calls __free_iova()
+ * and it is by design that only one thread can possibly
+ * reference a particular iova and hence no conflict.
+ */
return iova;
}
@@ -198,13 +205,11 @@
{
unsigned long flags;
- if (iova) {
- spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
- __cached_rbnode_delete_update(iovad, iova);
- rb_erase(&iova->node, &iovad->rbroot);
- spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
- free_iova_mem(iova);
- }
+ spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+ __cached_rbnode_delete_update(iovad, iova);
+ rb_erase(&iova->node, &iovad->rbroot);
+ spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+ free_iova_mem(iova);
}
/**
@@ -218,7 +223,8 @@
free_iova(struct iova_domain *iovad, unsigned long pfn)
{
struct iova *iova = find_iova(iovad, pfn);
- __free_iova(iovad, iova);
+ if (iova)
+ __free_iova(iovad, iova);
}
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.h 2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.h 2007-06-26 08:28:05.000000000 -0700
@@ -24,8 +24,8 @@
#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
-#define IOVA_START_ADDR (0x1000)
-#define IOVA_START_PFN (IOVA_START_ADDR >> PAGE_SHIFT_4K)
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN (1)
#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K)
#define DMA_32BIT_PFN IOVA_PFN(DMA_32BIT_MASK)
>