Re: [Intel IOMMU 04/10] IOVA allocation and management routines

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com>,
	linux-kernel@vger.kernel.org, ak@suse.de, gregkh@suse.de,
	muli@il.ibm.com, suresh.b.siddha@intel.com,
	arjan@linux.intel.com, ashok.raj@intel.com, davem@davemloft.net,
	clameter@sgi.com
Subject: Re: [Intel IOMMU 04/10] IOVA allocation and management routines
Date: Tue, 26 Jun 2007 09:16:31 -0700	[thread overview]
Message-ID: <20070626161631.GB3374@linux-os.sc.intel.com> (raw)
In-Reply-To: <20070625230747.2c2d4c11.akpm@linux-foundation.org>

On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote:
> On Tue, 19 Jun 2007 14:37:05 -0700 "Keshavamurthy, Anil S" <anil.s.keshavamurthy@intel.com> wrote:
> 
> All the inlines in this code are pretty pointless: all those functions have
> a single callsite so the compiler inlines them anyway.  If we later add
> more callsites for these functions, they're too big to be inlined.
> 
> inline is usually wrong: don't do it!
Yup, I agree and will follow in future.

> > +
> > +/**
> > + * find_iova - find's an iova for a given pfn
> > + * @iovad - iova domain in question.
> > + * pfn - page frame number
> > + * This function finds and returns an iova belonging to the
> > + * given doamin which matches the given pfn.
> > + */
> > +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > +	unsigned long flags;
> > +	struct rb_node *node;
> > +
> > +	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > +	node = iovad->rbroot.rb_node;
> > +	while (node) {
> > +		struct iova *iova = container_of(node, struct iova, node);
> > +
> > +		/* If pfn falls within iova's range, return iova */
> > +		if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
> > +			spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +			return iova;
> > +		}
> > +
> > +		if (pfn < iova->pfn_lo)
> > +			node = node->rb_left;
> > +		else if (pfn > iova->pfn_lo)
> > +			node = node->rb_right;
> > +	}
> > +
> > +	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +	return NULL;
> > +}
> 
> So we take the lock, look up an item, then drop the lock then return the
> item we just found.  We took no refcount on it and we didn't do anything to
> keep this object alive.
> 
> Is that a bug, or does the (afacit undocumented) lifecycle management of
> these things take care of it in some manner?  If yes, please reply via an
> add-a-comment patch.

Nope, this is not a bug. Adding a comment patch which explains the same.

> 
> 
> > +/**
> > + * __free_iova - frees the given iova
> > + * @iovad: iova domain in question.
> > + * @iova: iova in question.
> > + * Frees the given iova belonging to the giving domain
> > + */
> > +void
> > +__free_iova(struct iova_domain *iovad, struct iova *iova)
> > +{
> > +	unsigned long flags;
> > +
> > +	if (iova) {
> > +		spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > +		__cached_rbnode_delete_update(iovad, iova);
> > +		rb_erase(&iova->node, &iovad->rbroot);
> > +		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +		free_iova_mem(iova);
> > +	}
> > +}
> 

> Can this really be called with NULL?  If so, under what circumstances? 
> (This reader couldn't work it out from a brief look at the code, so perhaps
> others will not be able to either.  Perhaps a comment is needed)

It was getting called from only one place free_iova().
Below patch address your concern.

> 
> > +/**
> > + * free_iova - finds and frees the iova for a given pfn
> > + * @iovad: - iova domain in question.
> > + * @pfn: - pfn that is allocated previously
> > + * This functions finds an iova for a given pfn and then
> > + * frees the iova from that domain.
> > + */
> > +void
> > +free_iova(struct iova_domain *iovad, unsigned long pfn)
> > +{
> > +	struct iova *iova = find_iova(iovad, pfn);
> > +	__free_iova(iovad, iova);
> > +
> > +}
> > +
> > +/**
> > + * put_iova_domain - destroys the iova doamin
> > + * @iovad: - iova domain in question.
> > + * All the iova's in that domain are destroyed.
> > + */
> > +void put_iova_domain(struct iova_domain *iovad)
> > +{
> > +	struct rb_node *node;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
> > +	node = rb_first(&iovad->rbroot);
> > +	while (node) {
> > +		struct iova *iova = container_of(node, struct iova, node);
> > +		rb_erase(node, &iovad->rbroot);
> > +		free_iova_mem(iova);
> > +		node = rb_first(&iovad->rbroot);
> > +	}
> > +	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
> > +}
> 
> Right, so I suspect what's happening here is that all iova's remain valid
> until their entire domain is destroyed, yes?

Nope. IOVA are valid only for the duration of DMA MAP and DMA UNMAP calls.
In case of Intel-iommu driver, the iova's are valid only for the duration
of __intel_map_singl() and __intel_unmap_single() calls.

> 
> What is the upper bound to the memory consumpotion here, and what provides
> it?
As explained above, iova are freed and reused again during the DMA map calls.

> 
> Again, some code comments about these design issues are appropriate.
> 
> > +/*
> > + * We need a fixed PAGE_SIZE of 4K irrespective of
> > + * arch PAGE_SIZE for IOMMU page tables.
> > + */
> > +#define PAGE_SHIFT_4K		(12)
> > +#define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
> > +#define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
> > +#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
> 
> Am still wondering why we cannot use PAGE_SIZE, PAGE_SHIFT, etc here.
VT-d hardware(a.k.a Intel IOMMU hardware) page table size is always
4K irrespective of the OS PAGE_SIZE. We want to use the same code for 
IA64 too where the OS PAGE_SIZE may not be 4K size and hence
had to define here for IOMMU.

> 
> > +#define IOVA_START_ADDR		(0x1000)
> 
> What determined that address?  (Needs comment)
Fixed. Please see bleow patch.

> 
> > +#define IOVA_START_PFN		(IOVA_START_ADDR >> PAGE_SHIFT_4K)
> > +
> > +#define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
> 
> So I'm looking at this and wondering "what type does addr have"?
> 
> If it's unsigned long then perhaps we have a problem on x86_32 PAE.  Maybe
> we don't support x86_32 PAE, but still, I'd have thought that the
> appropriate type here is dma_addr_t.
Yup that is correct. We don;t support x86_32.
> 
> But alas, it was needlessly implemented as a macro, so the reader cannot
> tell.
I am in the process of making the same code base to work for
IA64 architecure too and in this process I will do more cleanup.

Thanks.

Please apply the below patch as a fix the existing patch.


signed-off-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>

---
 drivers/pci/iova.c |   22 ++++++++++++++--------
 drivers/pci/iova.h |    4 ++--
 2 files changed, 16 insertions(+), 10 deletions(-)

Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.c	2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c	2007-06-26 08:25:23.000000000 -0700
@@ -166,6 +166,7 @@
 	unsigned long flags;
 	struct rb_node *node;
 
+	/* Take the lock so that no other thread is manipulating the rbtree */
 	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
 	node = iovad->rbroot.rb_node;
 	while (node) {
@@ -174,6 +175,12 @@
 		/* If pfn falls within iova's range, return iova */
 		if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) {
 			spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+			/* We are not holding the lock while this iova
+			 * is referenced by the caller as the same thread
+			 * which called this function also calls __free_iova()
+			 * and it is by desing that only one thread can possibly
+			 * reference a particular iova and hence no conflict.
+			 */
 			return iova;
 		}
 
@@ -198,13 +205,11 @@
 {
 	unsigned long flags;
 
-	if (iova) {
-		spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
-		__cached_rbnode_delete_update(iovad, iova);
-		rb_erase(&iova->node, &iovad->rbroot);
-		spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
-		free_iova_mem(iova);
-	}
+	spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+	__cached_rbnode_delete_update(iovad, iova);
+	rb_erase(&iova->node, &iovad->rbroot);
+	spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+	free_iova_mem(iova);
 }
 
 /**
@@ -218,7 +223,8 @@
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
 	struct iova *iova = find_iova(iovad, pfn);
-	__free_iova(iovad, iova);
+	if (iova)
+		__free_iova(iovad, iova);
 
 }
 
Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.h
===================================================================
--- linux-2.6.22-rc4-mm2.orig/drivers/pci/iova.h	2007-06-26 07:50:34.000000000 -0700
+++ linux-2.6.22-rc4-mm2/drivers/pci/iova.h	2007-06-26 08:28:05.000000000 -0700
@@ -24,8 +24,8 @@
 #define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
 #define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
 
-#define IOVA_START_ADDR		(0x1000)
-#define IOVA_START_PFN		(IOVA_START_ADDR >> PAGE_SHIFT_4K)
+/* IO virtual address start page frame number */
+#define IOVA_START_PFN		(1)
 
 #define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
 #define DMA_32BIT_PFN	IOVA_PFN(DMA_32BIT_MASK)
>

next prev parent reply	other threads:[~2007-06-26 16:21 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-06-19 21:37 [Intel IOMMU 00/10] Intel IOMMU support, take #2 Keshavamurthy, Anil S
2007-06-19 21:37 ` [Intel IOMMU 01/10] DMAR detection and parsing logic Keshavamurthy, Anil S
2007-07-04  9:18   ` Peter Zijlstra
2007-07-04 10:04     ` Andrew Morton
2007-07-04 10:14       ` Peter Zijlstra
2007-06-19 21:37 ` [Intel IOMMU 02/10] PCI generic helper function Keshavamurthy, Anil S
2007-06-26  5:49   ` Andrew Morton
2007-06-26 14:44     ` Keshavamurthy, Anil S
2007-06-19 21:37 ` [Intel IOMMU 03/10] clflush_cache_range now takes size param Keshavamurthy, Anil S
2007-06-19 21:37 ` [Intel IOMMU 04/10] IOVA allocation and management routines Keshavamurthy, Anil S
2007-06-26  6:07   ` Andrew Morton
2007-06-26 16:16     ` Keshavamurthy, Anil S [this message]
2007-06-19 21:37 ` [Intel IOMMU 05/10] Intel IOMMU driver Keshavamurthy, Anil S
2007-06-19 23:32   ` Christoph Lameter
2007-06-19 23:50     ` Keshavamurthy, Anil S
2007-06-19 23:56       ` Christoph Lameter
2007-06-26  6:32     ` Andrew Morton
2007-06-26 16:29       ` Keshavamurthy, Anil S
2007-06-26  6:25   ` Andrew Morton
2007-06-26 16:33     ` Keshavamurthy, Anil S
2007-06-26  6:30   ` Andrew Morton
2007-06-19 21:37 ` [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls Keshavamurthy, Anil S
2007-06-19 23:25   ` Christoph Lameter
2007-06-19 23:27     ` Arjan van de Ven
2007-06-19 23:34       ` Christoph Lameter
2007-06-20  0:02         ` Arjan van de Ven
2007-06-20  8:06   ` Peter Zijlstra
2007-06-20 13:03     ` Arjan van de Ven
2007-06-20 17:30       ` Siddha, Suresh B
2007-06-20 18:05         ` Peter Zijlstra
2007-06-20 19:14           ` Arjan van de Ven
2007-06-20 20:08             ` Peter Zijlstra
2007-06-20 23:03               ` Keshavamurthy, Anil S
2007-06-21  6:10                 ` Peter Zijlstra
2007-06-21  6:11                   ` Arjan van de Ven
2007-06-21  6:29                     ` Peter Zijlstra
2007-06-21  6:37                       ` Keshavamurthy, Anil S
2007-06-21  7:13                         ` Peter Zijlstra
2007-06-21 19:51                           ` Keshavamurthy, Anil S
2007-06-21  6:30                     ` Keshavamurthy, Anil S
2007-06-26  5:34     ` Andrew Morton
2007-06-19 21:37 ` [Intel IOMMU 07/10] Intel iommu cmdline option - forcedac Keshavamurthy, Anil S
2007-06-19 21:37 ` [Intel IOMMU 08/10] DMAR fault handling support Keshavamurthy, Anil S
2007-06-19 21:37 ` [Intel IOMMU 09/10] Iommu Gfx workaround Keshavamurthy, Anil S
2007-06-19 21:37 ` [Intel IOMMU 10/10] Iommu floppy workaround Keshavamurthy, Anil S
2007-06-26  6:42   ` Andrew Morton
2007-06-26 10:37     ` Andi Kleen
2007-06-26 19:25       ` Keshavamurthy, Anil S
2007-06-26 16:26     ` Keshavamurthy, Anil S
2007-06-26  6:45 ` [Intel IOMMU 00/10] Intel IOMMU support, take #2 Andrew Morton
2007-06-26  7:12   ` Andi Kleen
2007-06-26 11:13     ` Muli Ben-Yehuda
2007-06-26 15:03       ` Arjan van de Ven
2007-06-26 15:11         ` Muli Ben-Yehuda
2007-06-26 15:48           ` Keshavamurthy, Anil S
2007-06-26 16:00             ` Muli Ben-Yehuda
2007-06-26 15:56       ` Andi Kleen
2007-06-26 15:09         ` Muli Ben-Yehuda
2007-06-26 15:36           ` Andi Kleen
2007-06-26 15:15         ` Arjan van de Ven
2007-06-26 15:33           ` Andi Kleen
2007-06-26 16:25             ` Arjan van de Ven
2007-06-26 17:31               ` Andi Kleen
2007-06-26 20:10                 ` Jesse Barnes
2007-06-26 22:35                   ` Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20070626161631.GB3374@linux-os.sc.intel.com \
    --to=anil.s.keshavamurthy@intel.com \
    --cc=ak@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=arjan@linux.intel.com \
    --cc=ashok.raj@intel.com \
    --cc=clameter@sgi.com \
    --cc=davem@davemloft.net \
    --cc=gregkh@suse.de \
    --cc=linux-kernel@vger.kernel.org \
    --cc=muli@il.ibm.com \
    --cc=suresh.b.siddha@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.