From: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>,
linux-kernel@vger.kernel.org, ak@suse.de, gregkh@suse.de,
muli@il.ibm.com, suresh.b.siddha@intel.com,
arjan@linux.intel.com, ashok.raj@intel.com, davem@davemloft.net,
clameter@sgi.com
Subject: Re: [Intel IOMMU 04/10] IOVA allocation and management routines
Date: Tue, 26 Jun 2007 09:16:31 -0700 [thread overview]
Message-ID: <20070626161631.GB3374@linux-os.sc.intel.com> (raw)
In-Reply-To: <20070625230747.2c2d4c11.akpm@linux-foundation.org>
On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:05 -0700 "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> wrote:
>
> All the inlines in this code are pretty pointless: all those functions have
> a single callsite so the compiler inlines them anyway. If we later add
> more callsites for these functions, they're too big to be inlined.
>
> inline is usually wrong: don't do it!
Yup, I agree and will follow this in the future.
> > +
> > +/**
> > + * find_iova - finds an iova for a given pfn
> > + * @iovad: iova domain in question.
> > + * @pfn: page frame number
> > + * This function finds and returns an iova belonging to the
> > + * given domain which matches the given pfn.
> > + */
> > +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > + unsigned long flags;
> > + struct rb_node *node;
> > +
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + node = iovad->rbroot.rb_node;
> > + while (node) {
> > + struct iova *iova = container_of(node, struct iova, node);
> > +
> > + /* If pfn falls within iova's range, return iova */
> > + if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + return iova;
> > + }
> > +
> > + if (pfn < iova->pfn_lo)
> > + node = node->rb_left;
> > + else if (pfn > iova->pfn_lo)
> > + node = node->rb_right;
> > + }
> > +
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + return NULL;
> > +}
>
> So we take the lock, look up an item, then drop the lock then return the
> item we just found. We took no refcount on it and we didn't do anything to
> keep this object alive.
>
> Is that a bug, or does the (afacit undocumented) lifecycle management of
> these things take care of it in some manner? If yes, please reply via an
> add-a-comment patch.
Nope, this is not a bug. The comment patch below explains why.
>
>
> > +/**
> > + * __free_iova - frees the given iova
> > + * @iovad: iova domain in question.
> > + * @iova: iova in question.
> > + * Frees the given iova belonging to the given domain
> > + */
> > +void
> > +__free_iova(struct iova_domain *iovad, struct iova *iova)
> > +{
> > + unsigned long flags;
> > +
> > + if (iova) {
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + __cached_rbnode_delete_update(iovad, iova);
> > + rb_erase(&iova->node, &iovad->rbroot);
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > + free_iova_mem(iova);
> > + }
> > +}
>
> Can this really be called with NULL? If so, under what circumstances?
> (This reader couldn't work it out from a brief look at the code, so perhaps
> others will not be able to either. Perhaps a comment is needed)
It was getting called with a possibly-NULL iova from only one place, free_iova().
The patch below addresses your concern.
>
> > +/**
> > + * free_iova - finds and frees the iova for a given pfn
> > + * @iovad: - iova domain in question.
> > + * @pfn: - pfn that is allocated previously
> > + * This function finds an iova for a given pfn and then
> > + * frees the iova from that domain.
> > + */
> > +void
> > +free_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > + struct iova *iova = find_iova(iovad, pfn);
> > + __free_iova(iovad, iova);
> > +
> > +}
> > +
> > +/**
> > + * put_iova_domain - destroys the iova domain
> > + * @iovad: - iova domain in question.
> > + * All the iova's in that domain are destroyed.
> > + */
> > +void put_iova_domain(struct iova_domain *iovad)
> > +{
> > + struct rb_node *node;
> > + unsigned long flags;
> > +
> > + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > + node = rb_first(&iovad->rbroot);
> > + while (node) {
> > + struct iova *iova = container_of(node, struct iova, node);
> > + rb_erase(node, &iovad->rbroot);
> > + free_iova_mem(iova);
> > + node = rb_first(&iovad->rbroot);
> > + }
> > + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +}
>
> Right, so I suspect what's happening here is that all iova's remain valid
> until their entire domain is destroyed, yes?
Nope. IOVAs are valid only for the window between the DMA map and DMA unmap calls.
In the case of the Intel IOMMU driver, an iova is valid only between the
__intel_map_single() and __intel_unmap_single() calls.
>
> What is the upper bound to the memory consumpotion here, and what provides
> it?
As explained above, iovas are freed at DMA unmap time and reused by later DMA
map calls, so consumption is bounded by the number of outstanding mappings.
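To make that lifecycle concrete, here is a minimal user-space sketch of the pattern: an iova exists only between the map call that allocates it and the unmap call that finds and frees it. All names here (mock_map(), mock_unmap(), struct mock_iova) are illustrative stand-ins, and a singly linked list replaces the driver's red-black tree; this is not the driver's real API.

```c
#include <assert.h>
#include <stdlib.h>

/* One mapped range, covering page frames [pfn_lo, pfn_hi]. */
struct mock_iova {
	unsigned long pfn_lo, pfn_hi;
	struct mock_iova *next;
};

static struct mock_iova *domain; /* head of the "domain" list */

/* map: allocate an iova covering [pfn, pfn + pages - 1] */
static struct mock_iova *mock_map(unsigned long pfn, unsigned long pages)
{
	struct mock_iova *iova = malloc(sizeof(*iova));

	iova->pfn_lo = pfn;
	iova->pfn_hi = pfn + pages - 1;
	iova->next = domain;
	domain = iova;
	return iova;
}

/* unmap: find the iova containing pfn, unlink and free it.
 * Returns 1 if a mapping was found, 0 otherwise. */
static int mock_unmap(unsigned long pfn)
{
	struct mock_iova **pp;

	for (pp = &domain; *pp; pp = &(*pp)->next) {
		if (pfn >= (*pp)->pfn_lo && pfn <= (*pp)->pfn_hi) {
			struct mock_iova *iova = *pp;

			*pp = iova->next;
			free(iova);
			return 1;
		}
	}
	return 0; /* nothing mapped at this pfn */
}
```

After the unmap, a second unmap of the same pfn finds nothing, mirroring how an iova does not outlive its DMA mapping.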
>
> Again, some code comments about these design issues are appropriate.
>
> > +/*
> > + * We need a fixed PAGE_SIZE of 4K irrespective of
> > + * arch PAGE_SIZE for IOMMU page tables.
> > + */
> > +#define PAGE_SHIFT_4K (12)
> > +#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K)
> > +#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
> > +#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
>
> Am still wondering why we cannot use PAGE_SIZE, PAGE_SHIFT, etc here.
The VT-d hardware (a.k.a. Intel IOMMU hardware) page table size is always
4K irrespective of the OS PAGE_SIZE. We want to use the same code for
IA64 too, where the OS PAGE_SIZE may not be 4K, and hence
had to define these here for the IOMMU.
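For illustration, the fixed-4K macros can be exercised stand-alone in user space. This is a sketch mirroring the macros quoted above, with uint64_t standing in for the kernel's u64:

```c
#include <assert.h>
#include <stdint.h>

/* Fixed 4K page geometry for the IOMMU page tables, independent of
 * the CPU's PAGE_SIZE. */
#define PAGE_SHIFT_4K (12)
#define PAGE_SIZE_4K  (1UL << PAGE_SHIFT_4K)
#define PAGE_MASK_4K  (((uint64_t)-1) << PAGE_SHIFT_4K)
/* Round addr up to the next 4K boundary, e.g. 0x1001 -> 0x2000. */
#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
```

An already-aligned address passes through unchanged, while anything past a boundary rounds up to the next 4K page, regardless of what the host's PAGE_SIZE happens to be.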
>
> > +#define IOVA_START_ADDR (0x1000)
>
> What determined that address? (Needs comment)
Fixed. Please see the below patch.
>
> > +#define IOVA_START_PFN (IOVA_START_ADDR >> PAGE_SHIFT_4K)
> > +
> > +#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K)
>
> So I'm looking at this and wondering "what type does addr have"?
>
> If it's unsigned long then perhaps we have a problem on x86_32 PAE. Maybe
> we don't support x86_32 PAE, but still, I'd have thought that the
> appropriate type here is dma_addr_t.
Yup, that is correct. We don't support x86_32.
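To illustrate the PAE concern, the sketch below contrasts a 32-bit address type with a 64-bit one. The helpers iova_pfn() and iova_pfn_32() are hypothetical, not driver code: uint32_t models the 32-bit unsigned long of x86_32, and uint64_t models dma_addr_t.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K (12)

/* On x86_32 PAE an unsigned long is 32 bits while a bus address can
 * be 36 bits wide, so shifting an unsigned long silently drops the
 * upper bits.  A typed inline makes the truncation visible where a
 * macro would hide it. */
static inline uint64_t iova_pfn(uint64_t addr)     /* dma_addr_t-wide */
{
	return addr >> PAGE_SHIFT_4K;
}

static inline uint32_t iova_pfn_32(uint32_t addr)  /* 32-bit ulong */
{
	return addr >> PAGE_SHIFT_4K;
}
```

A 36-bit address such as 0x100000000 yields pfn 0x100000 through the 64-bit helper, but collapses to 0 once squeezed through the 32-bit type, which is why dma_addr_t is the appropriate type if x86_32 PAE were ever to be supported.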
>
> But alas, it was needlessly implemented as a macro, so the reader cannot
> tell.
I am in the process of making the same code base work for the
IA64 architecture too, and in that process I will do more cleanup.
Thanks.
Please apply the patch below as a fix to the existing patch.
Signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
---
drivers/pci/iova.c | 22 ++++++++++++++--------
drivers/pci/iova.h | 4 ++--
2 files changed, 16 insertions(+), 10 deletions(-)
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.c 2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-26 08:25:23.000000000 -0700
@@ -166,6 +166,7 @@
unsigned long flags;
struct rb_node *node;
+ /* Take the lock so that no other thread is manipulating the rbtree */
spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
node = iovad->rbroot.rb_node;
while (node) {
@@ -174,6 +175,12 @@
/* If pfn falls within iova's range, return iova */
if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+ /* We are not holding the lock while this iova
+ * is referenced by the caller as the same thread
+ * which called this function also calls __free_iova()
+ * and it is by design that only one thread can possibly
+ * reference a particular iova and hence no conflict.
+ */
return iova;
}
@@ -198,13 +205,11 @@
{
unsigned long flags;
- if (iova) {
- spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
- __cached_rbnode_delete_update(iovad, iova);
- rb_erase(&iova->node, &iovad->rbroot);
- spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
- free_iova_mem(iova);
- }
+ spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+ __cached_rbnode_delete_update(iovad, iova);
+ rb_erase(&iova->node, &iovad->rbroot);
+ spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+ free_iova_mem(iova);
}
/**
@@ -218,7 +223,8 @@
free_iova(struct iova_domain *iovad, unsigned long pfn)
{
struct iova *iova = find_iova(iovad, pfn);
- __free_iova(iovad, iova);
+ if (iova)
+ __free_iova(iovad, iova);
}
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.h 2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.h 2007-06-26 08:28:05.000000000 -0700
@@ -24,8 +24,8 @@
#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K)
#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
-#define IOVA_START_ADDR (0x1000)
-#define IOVA_START_PFN (IOVA_START_ADDR >> PAGE_SHIFT_4K)
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN (1)
#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K)
#define DMA_32BIT_PFN IOVA_PFN(DMA_32BIT_MASK)
>