All of lore.kernel.org
 help / color / mirror / Atom feed
From: Robert Jennings <rcj@linux.vnet.ibm.com>
To: Olof Johansson <olof@lixom.net>
Cc: Brian King <brking@linux.vnet.ibm.com>,
	linuxppc-dev@ozlabs.org, paulus@samba.org,
	David Darrington <ddarring@linux.vnet.ibm.com>
Subject: Re: [PATCH 11/19] powerpc: iommu enablement for CMO
Date: Fri, 20 Jun 2008 10:03:55 -0500	[thread overview]
Message-ID: <20080620150354.GA6594@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080613014349.GB32705@lixom.net>

* Olof Johansson (olof@lixom.net) wrote:
> Hi,
> 
> Some comments and questions below.
> 
> 
> -Olof
> 
> On Thu, Jun 12, 2008 at 05:19:36PM -0500, Robert Jennings wrote:
> > Index: b/arch/powerpc/kernel/iommu.c
> > ===================================================================
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -183,6 +183,49 @@ static unsigned long iommu_range_alloc(s
> >  	return n;
> >  }
> >  
> > +/** iommu_undo - Clear iommu_table bits without calling platform tce_free.
> > + *
> > + * @tbl - struct iommu_table to alter
> > + * @dma_addr - DMA address to free entries for
> > + * @npages - number of pages to free entries for
> > + *
> > + * This is the same as __iommu_free without the call to ppc_md.tce_free();
> 
> __iommu_free has the __ prepended to indicate that it's not locking.
> Since this does the same, please keep the __. Also see comments below.
> 
> > + *
> > + * To clean up after ppc_md.tce_build() errors we need to clear bits
> > + * in the table without calling the ppc_md.tce_free() method; calling
> > + * ppc_md.tce_free() could alter entries that were not touched due to a
> > + * premature failure in ppc_md.tce_build().
> > + *
> > + * The ppc_md.tce_build() needs to perform its own clean up prior to
> > + * returning its error.
> > + */
> > +static void iommu_undo(struct iommu_table *tbl, dma_addr_t dma_addr,
> > +			 unsigned int npages)
> > +{
> > +	unsigned long entry, free_entry;
> > +
> > +	entry = dma_addr >> IOMMU_PAGE_SHIFT;
> > +	free_entry = entry - tbl->it_offset;
> > +
> > +	if (((free_entry + npages) > tbl->it_size) ||
> > +	    (entry < tbl->it_offset)) {
> > +		if (printk_ratelimit()) {
> > +			printk(KERN_INFO "iommu_undo: invalid entry\n");
> > +			printk(KERN_INFO "\tentry    = 0x%lx\n", entry);
> > +			printk(KERN_INFO "\tdma_addr = 0x%lx\n", (u64)dma_addr);
> > +			printk(KERN_INFO "\tTable    = 0x%lx\n", (u64)tbl);
> > +			printk(KERN_INFO "\tbus#     = 0x%lx\n", tbl->it_busno);
> > +			printk(KERN_INFO "\tsize     = 0x%lx\n", tbl->it_size);
> > +			printk(KERN_INFO "\tstartOff = 0x%lx\n", tbl->it_offset);
> > +			printk(KERN_INFO "\tindex    = 0x%lx\n", tbl->it_index);
> > +			WARN_ON(1);
> > +		}
> > +		return;
> > +	}
> > +
> > +	iommu_area_free(tbl->it_map, free_entry, npages);
> > +}
> 
> Ick, This should just be refactored to reuse code together with
> iommu_free() instead of duplicating it. Also, the error checking
> shouldn't be needed here.
> 
> Actually, is there harm in calling tce_free for these cases anyway? I'm
> guessing it's not a performance critical path.

There is no harm and I will get rid of this and just call __iommu_free.

> > @@ -275,7 +330,7 @@ int iommu_map_sg(struct device *dev, str
> >  	dma_addr_t dma_next = 0, dma_addr;
> >  	unsigned long flags;
> >  	struct scatterlist *s, *outs, *segstart;
> > -	int outcount, incount, i;
> > +	int outcount, incount, i, rc = 0;
> >  	unsigned int align;
> >  	unsigned long handle;
> >  	unsigned int max_seg_size;
> > @@ -336,7 +391,10 @@ int iommu_map_sg(struct device *dev, str
> >  			    npages, entry, dma_addr);
> >  
> >  		/* Insert into HW table */
> > -		ppc_md.tce_build(tbl, entry, npages, vaddr & IOMMU_PAGE_MASK, direction);
> > +		rc = ppc_md.tce_build(tbl, entry, npages,
> > +		                      vaddr & IOMMU_PAGE_MASK, direction);
> > +		if(unlikely(rc))
> > +			goto failure;
> >  
> >  		/* If we are in an open segment, try merging */
> >  		if (segstart != s) {
> > @@ -399,7 +457,10 @@ int iommu_map_sg(struct device *dev, str
> >  
> >  			vaddr = s->dma_address & IOMMU_PAGE_MASK;
> >  			npages = iommu_num_pages(s->dma_address, s->dma_length);
> > -			__iommu_free(tbl, vaddr, npages);
> > +			if (!rc)
> > +				__iommu_free(tbl, vaddr, npages);
> > +			else
> > +				iommu_undo(tbl, vaddr, npages);
> 
> 'rc' is a quite generic name to carry state this far away from where
> it's set. Either a more descriptive name (build_fail, whatever), or if
> the above is true, just call __iommu_free here as well.

I'll fix this.

> > -static void tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
> > +static void tce_free_pSeriesLP(struct iommu_table*, long, long);
> > +static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
> > +
> > +static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >  				long npages, unsigned long uaddr,
> >  				enum dma_data_direction direction)
> >  {
> > -	u64 rc;
> > +	u64 rc = 0;
> >  	u64 proto_tce, tce;
> >  	u64 rpn;
> > +	int sleep_msecs, ret = 0;
> > +	long tcenum_start = tcenum, npages_start = npages;
> >  
> >  	rpn = (virt_to_abs(uaddr)) >> TCE_SHIFT;
> >  	proto_tce = TCE_PCI_READ;
> > @@ -108,7 +115,21 @@ static void tce_build_pSeriesLP(struct i
> >  
> >  	while (npages--) {
> >  		tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
> > -		rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
> > +		do {
> > +			rc = plpar_tce_put((u64)tbl->it_index,
> > +			                   (u64)tcenum << 12, tce);
> > +			if (unlikely(H_IS_LONG_BUSY(rc))) {
> > +				sleep_msecs = plpar_get_longbusy_msecs(rc);
> > +				mdelay(sleep_msecs);
> 
> Ouch! You're holding locks and stuff here. Do you really want this right
> here?

All of this delay business will be removed.  The firmware will not be
returning any H_LONG_BUSY return codes.

> > +			}
> > +		} while (unlikely(H_IS_LONG_BUSY(rc)));
> 
> Do you also want to keep doing this forever, or eventually just fail
> instead?
> 
> > +		if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
> > +			ret = (int)rc;
> > +			tce_free_pSeriesLP(tbl, tcenum_start,
> > +			                   (npages_start - (npages + 1)));
> > +			break;
> > +		}
> >  
> >  		if (rc && printk_ratelimit()) {
> >  			printk("tce_build_pSeriesLP: plpar_tce_put failed. rc=%ld\n", rc);
> > @@ -121,19 +142,22 @@ static void tce_build_pSeriesLP(struct i
> >  		tcenum++;
> >  		rpn++;
> >  	}
> > +	return ret;
> >  }
> >  
> >  static DEFINE_PER_CPU(u64 *, tce_page) = NULL;
> >  
> > -static void tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> > +static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> >  				     long npages, unsigned long uaddr,
> >  				     enum dma_data_direction direction)
> >  {
> > -	u64 rc;
> > +	u64 rc = 0;
> >  	u64 proto_tce;
> >  	u64 *tcep;
> >  	u64 rpn;
> >  	long l, limit;
> > +	long tcenum_start = tcenum, npages_start = npages;
> > +	int sleep_msecs, ret = 0;
> >  
> >  	if (npages == 1)
> >  		return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> > @@ -171,15 +195,26 @@ static void tce_buildmulti_pSeriesLP(str
> >  			rpn++;
> >  		}
> >  
> > -		rc = plpar_tce_put_indirect((u64)tbl->it_index,
> > -					    (u64)tcenum << 12,
> > -					    (u64)virt_to_abs(tcep),
> > -					    limit);
> > +		do {
> > +			rc = plpar_tce_put_indirect(tbl->it_index, tcenum << 12,
> > +						    virt_to_abs(tcep), limit);
> > +			if (unlikely(H_IS_LONG_BUSY(rc))) {
> > +				sleep_msecs = plpar_get_longbusy_msecs(rc);
> > +				mdelay(sleep_msecs);
> > +			}
> > +		} while (unlikely(H_IS_LONG_BUSY(rc)));
> >  
> >  		npages -= limit;
> >  		tcenum += limit;
> >  	} while (npages > 0 && !rc);
> >  
> > +	if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
> > +		ret = (int)rc;
> > +		tce_freemulti_pSeriesLP(tbl, tcenum_start,
> > +		                        (npages_start - (npages + limit)));
> > +		return ret;
> > +	}
> > +
> >  	if (rc && printk_ratelimit()) {
> >  		printk("tce_buildmulti_pSeriesLP: plpar_tce_put failed. rc=%ld\n", rc);
> >  		printk("\tindex   = 0x%lx\n", (u64)tbl->it_index);
> > @@ -187,14 +222,23 @@ static void tce_buildmulti_pSeriesLP(str
> >  		printk("\ttce[0] val = 0x%lx\n", tcep[0]);
> >  		show_stack(current, (unsigned long *)__get_SP());
> >  	}
> > +	return ret;
> >  }
> >  
> >  static void tce_free_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
> >  {
> > +	int sleep_msecs;
> >  	u64 rc;
> >  
> >  	while (npages--) {
> > -		rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, 0);
> > +		do {
> > +			rc = plpar_tce_put((u64)tbl->it_index,
> > +			                   (u64)tcenum << 12, 0);
> > +			if (unlikely(H_IS_LONG_BUSY(rc))) {
> > +				sleep_msecs = plpar_get_longbusy_msecs(rc);
> > +				mdelay(sleep_msecs);
> > +			}
> > +		} while (unlikely(H_IS_LONG_BUSY(rc)));
> 
> Can this ever happen? I would hope that any entry that's got an active
> mapping is actually pinned in memory, what other than paging in from
> disk can result in long busy?

This won't happen, I'll remove it.

> > @@ -210,9 +254,17 @@ static void tce_free_pSeriesLP(struct io
> >  
> >  static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
> >  {
> > +	int sleep_msecs;
> >  	u64 rc;
> >  
> > -	rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
> > +	do {
> > +		rc = plpar_tce_stuff((u64)tbl->it_index,
> > +		                     (u64)tcenum << 12, 0, npages);
> > +		if (unlikely(H_IS_LONG_BUSY(rc))) {
> > +			sleep_msecs = plpar_get_longbusy_msecs(rc);
> > +			mdelay(sleep_msecs);
> > +		}
> > +	} while (unlikely(H_IS_LONG_BUSY(rc)));
> >  
> >  	if (rc && printk_ratelimit()) {
> >  		printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");

  reply	other threads:[~2008-06-20 15:04 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-06-12 21:53 [PATCH 00/19] powerpc: pSeries Cooperative Memory Overcommitment support Robert Jennings
2008-06-12 22:08 ` [PATCH 01/19] powerpc: Remove extraneous error reporting for hcall failures in lparcfg Robert Jennings
2008-06-12 22:08 ` [PATCH 02/19] powerpc: Split processor entitlement retrieval and gathering to helper routines Robert Jennings
2008-06-13  0:23   ` Stephen Rothwell
2008-06-13 19:11     ` Nathan Fontenot
2008-06-16 16:07   ` Nathan Fontenot
2008-06-12 22:09 ` [PATCH 03/19] powerpc: Add memory entitlement capabilities to /proc/ppc64/lparcfg Robert Jennings
2008-06-16 16:09   ` [PATCH 03/19][v2] " Nathan Fontenot
2008-06-16 20:47   ` [PATCH 03/19][v3] " Nathan Fontenot
2008-06-24 14:23     ` Brian King
2008-06-24 15:26   ` [PATCH 03/19] " Nathan Fontenot
2008-06-12 22:11 ` [PATCH 04/19] powerpc: Split retrieval of processor entitlement data into a helper routine Robert Jennings
2008-06-12 22:11 ` [PATCH 05/19] powerpc: Enable CMO feature during platform setup Robert Jennings
2008-06-12 22:12 ` [PATCH 06/19] powerpc: Utilities to set firmware page state Robert Jennings
2008-06-12 22:13 ` [PATCH 07/19] powerpc: Add collaborative memory manager Robert Jennings
2008-06-12 22:13 ` [PATCH 08/19] powerpc: Do not probe PCI buses or eBus devices if CMO is enabled Robert Jennings
2008-06-12 22:14 ` [PATCH 09/19] powerpc: Add CMO paging statistics Robert Jennings
2008-06-12 22:15 ` [PATCH 10/19] powerpc: move get_longbusy_msecs out of ehca/ehea Robert Jennings
2008-06-12 22:18   ` [PATCH 10/19] [repost] " Robert Jennings
2008-06-12 22:18     ` Robert Jennings
2008-06-13 18:24     ` Brian King
2008-06-13 18:24       ` Brian King
2008-06-13 19:55       ` Jeff Garzik
2008-06-13 19:55         ` Jeff Garzik
2008-06-12 22:19 ` [PATCH 11/19] powerpc: iommu enablement for CMO Robert Jennings
2008-06-13  1:43   ` Olof Johansson
2008-06-20 15:03     ` Robert Jennings [this message]
2008-06-20 15:12   ` [PATCH 11/19][v2] " Robert Jennings
2008-06-12 22:19 ` [PATCH 12/19] powerpc: vio bus support " Robert Jennings
2008-06-13  5:12   ` Stephen Rothwell
2008-06-23 20:23     ` Robert Jennings
2008-06-23 20:25   ` [PATCH 12/19][v2] " Robert Jennings
2008-06-12 22:21 ` [PATCH 13/19] powerpc: Verify CMO memory entitlement updates with virtual I/O Robert Jennings
2008-06-12 22:21 ` [PATCH 14/19] powerpc: hvc enablement for CMO Robert Jennings
2008-06-12 22:22 ` [PATCH 15/19] powerpc: hvcs " Robert Jennings
2008-06-12 22:22 ` [PATCH 16/19] ibmveth: Automatically enable larger rx buffer pools for larger mtu Robert Jennings
2008-06-12 22:22   ` Robert Jennings
2008-06-13  5:18   ` Stephen Rothwell
2008-06-13  5:18     ` Stephen Rothwell
2008-06-23 20:21   ` [PATCH 16/19][v2] " Robert Jennings
2008-06-23 20:21     ` Robert Jennings
2008-06-12 22:23 ` [PATCH 17/19] ibmveth: enable driver for CMO Robert Jennings
2008-06-12 22:23   ` Robert Jennings
2008-06-13  5:25   ` Stephen Rothwell
2008-06-13  5:25     ` Stephen Rothwell
2008-06-23 20:20   ` [PATCH 17/19][v2] " Robert Jennings
2008-06-23 20:20     ` Robert Jennings
2008-06-12 22:24 ` [PATCH 18/19] ibmvscsi: driver enablement " Robert Jennings
2008-06-13 18:30   ` Brian King
2008-06-13 18:30     ` Brian King
2008-06-12 22:31 ` [PATCH 19/19] powerpc: Update arch vector to indicate support " Robert Jennings

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080620150354.GA6594@linux.vnet.ibm.com \
    --to=rcj@linux.vnet.ibm.com \
    --cc=brking@linux.vnet.ibm.com \
    --cc=ddarring@linux.vnet.ibm.com \
    --cc=linuxppc-dev@ozlabs.org \
    --cc=olof@lixom.net \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.