From: Robert Jennings <rcj@linux.vnet.ibm.com>
To: Olof Johansson <olof@lixom.net>
Cc: Brian King <brking@linux.vnet.ibm.com>,
linuxppc-dev@ozlabs.org, paulus@samba.org,
David Darrington <ddarring@linux.vnet.ibm.com>
Subject: Re: [PATCH 11/19] powerpc: iommu enablement for CMO
Date: Fri, 20 Jun 2008 10:03:55 -0500 [thread overview]
Message-ID: <20080620150354.GA6594@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080613014349.GB32705@lixom.net>
* Olof Johansson (olof@lixom.net) wrote:
> Hi,
>
> Some comments and questions below.
>
>
> -Olof
>
> On Thu, Jun 12, 2008 at 05:19:36PM -0500, Robert Jennings wrote:
> > Index: b/arch/powerpc/kernel/iommu.c
> > ===================================================================
> > --- a/arch/powerpc/kernel/iommu.c
> > +++ b/arch/powerpc/kernel/iommu.c
> > @@ -183,6 +183,49 @@ static unsigned long iommu_range_alloc(s
> > return n;
> > }
> >
> > +/** iommu_undo - Clear iommu_table bits without calling platform tce_free.
> > + *
> > + * @tbl - struct iommu_table to alter
> > + * @dma_addr - DMA address to free entries for
> > + * @npages - number of pages to free entries for
> > + *
> > + * This is the same as __iommu_free without the call to ppc_md.tce_free();
>
> __iommu_free has the __ prepended to indicate that it's not locking.
> Since this does the same, please keep the __. Also see comments below.
>
> > + *
> > + * To clean up after ppc_md.tce_build() errors we need to clear bits
> > + * in the table without calling the ppc_md.tce_free() method; calling
> > + * ppc_md.tce_free() could alter entries that were not touched due to a
> > + * premature failure in ppc_md.tce_build().
> > + *
> > + * The ppc_md.tce_build() needs to perform its own clean up prior to
> > + * returning its error.
> > + */
> > +static void iommu_undo(struct iommu_table *tbl, dma_addr_t dma_addr,
> > + unsigned int npages)
> > +{
> > + unsigned long entry, free_entry;
> > +
> > + entry = dma_addr >> IOMMU_PAGE_SHIFT;
> > + free_entry = entry - tbl->it_offset;
> > +
> > + if (((free_entry + npages) > tbl->it_size) ||
> > + (entry < tbl->it_offset)) {
> > + if (printk_ratelimit()) {
> > + printk(KERN_INFO "iommu_undo: invalid entry\n");
> > + printk(KERN_INFO "\tentry = 0x%lx\n", entry);
> > + printk(KERN_INFO "\tdma_addr = 0x%lx\n", (u64)dma_addr);
> > + printk(KERN_INFO "\tTable = 0x%lx\n", (u64)tbl);
> > + printk(KERN_INFO "\tbus# = 0x%lx\n", tbl->it_busno);
> > + printk(KERN_INFO "\tsize = 0x%lx\n", tbl->it_size);
> > + printk(KERN_INFO "\tstartOff = 0x%lx\n", tbl->it_offset);
> > + printk(KERN_INFO "\tindex = 0x%lx\n", tbl->it_index);
> > + WARN_ON(1);
> > + }
> > + return;
> > + }
> > +
> > + iommu_area_free(tbl->it_map, free_entry, npages);
> > +}
>
> Ick, This should just be refactored to reuse code together with
> iommu_free() instead of duplicating it. Also, the error checking
> shouldn't be needed here.
>
> Actually, is there harm in calling tce_free for these cases anyway? I'm
> guessing it's not a performance critical path.
There is no harm and I will get rid of this and just call __iommu_free.
> > @@ -275,7 +330,7 @@ int iommu_map_sg(struct device *dev, str
> > dma_addr_t dma_next = 0, dma_addr;
> > unsigned long flags;
> > struct scatterlist *s, *outs, *segstart;
> > - int outcount, incount, i;
> > + int outcount, incount, i, rc = 0;
> > unsigned int align;
> > unsigned long handle;
> > unsigned int max_seg_size;
> > @@ -336,7 +391,10 @@ int iommu_map_sg(struct device *dev, str
> > npages, entry, dma_addr);
> >
> > /* Insert into HW table */
> > - ppc_md.tce_build(tbl, entry, npages, vaddr & IOMMU_PAGE_MASK, direction);
> > + rc = ppc_md.tce_build(tbl, entry, npages,
> > + vaddr & IOMMU_PAGE_MASK, direction);
> > + if(unlikely(rc))
> > + goto failure;
> >
> > /* If we are in an open segment, try merging */
> > if (segstart != s) {
> > @@ -399,7 +457,10 @@ int iommu_map_sg(struct device *dev, str
> >
> > vaddr = s->dma_address & IOMMU_PAGE_MASK;
> > npages = iommu_num_pages(s->dma_address, s->dma_length);
> > - __iommu_free(tbl, vaddr, npages);
> > + if (!rc)
> > + __iommu_free(tbl, vaddr, npages);
> > + else
> > + iommu_undo(tbl, vaddr, npages);
>
> 'rc' is a quite generic name to carry state this far away from where
> it's set. Either a more descriptive name (build_fail, whatever), or if
> the above is true, just call __iommu_free here as well.
I'll fix this.
> > -static void tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
> > +static void tce_free_pSeriesLP(struct iommu_table*, long, long);
> > +static void tce_freemulti_pSeriesLP(struct iommu_table*, long, long);
> > +
> > +static int tce_build_pSeriesLP(struct iommu_table *tbl, long tcenum,
> > long npages, unsigned long uaddr,
> > enum dma_data_direction direction)
> > {
> > - u64 rc;
> > + u64 rc = 0;
> > u64 proto_tce, tce;
> > u64 rpn;
> > + int sleep_msecs, ret = 0;
> > + long tcenum_start = tcenum, npages_start = npages;
> >
> > rpn = (virt_to_abs(uaddr)) >> TCE_SHIFT;
> > proto_tce = TCE_PCI_READ;
> > @@ -108,7 +115,21 @@ static void tce_build_pSeriesLP(struct i
> >
> > while (npages--) {
> > tce = proto_tce | (rpn & TCE_RPN_MASK) << TCE_RPN_SHIFT;
> > - rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, tce);
> > + do {
> > + rc = plpar_tce_put((u64)tbl->it_index,
> > + (u64)tcenum << 12, tce);
> > + if (unlikely(H_IS_LONG_BUSY(rc))) {
> > + sleep_msecs = plpar_get_longbusy_msecs(rc);
> > + mdelay(sleep_msecs);
>
> Ouch! You're holding locks and stuff here. Do you really want this right
> here?
All of this delay business will be removed. The firmware will not be
returning any H_LONG_BUSY return codes.
> > + }
> > + } while (unlikely(H_IS_LONG_BUSY(rc)));
>
> Do you also want to keep doing this forever, or eventually just fail
> instead?
>
> > + if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
> > + ret = (int)rc;
> > + tce_free_pSeriesLP(tbl, tcenum_start,
> > + (npages_start - (npages + 1)));
> > + break;
> > + }
> >
> > if (rc && printk_ratelimit()) {
> > printk("tce_build_pSeriesLP: plpar_tce_put failed. rc=%ld\n", rc);
> > @@ -121,19 +142,22 @@ static void tce_build_pSeriesLP(struct i
> > tcenum++;
> > rpn++;
> > }
> > + return ret;
> > }
> >
> > static DEFINE_PER_CPU(u64 *, tce_page) = NULL;
> >
> > -static void tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> > +static int tce_buildmulti_pSeriesLP(struct iommu_table *tbl, long tcenum,
> > long npages, unsigned long uaddr,
> > enum dma_data_direction direction)
> > {
> > - u64 rc;
> > + u64 rc = 0;
> > u64 proto_tce;
> > u64 *tcep;
> > u64 rpn;
> > long l, limit;
> > + long tcenum_start = tcenum, npages_start = npages;
> > + int sleep_msecs, ret = 0;
> >
> > if (npages == 1)
> > return tce_build_pSeriesLP(tbl, tcenum, npages, uaddr,
> > @@ -171,15 +195,26 @@ static void tce_buildmulti_pSeriesLP(str
> > rpn++;
> > }
> >
> > - rc = plpar_tce_put_indirect((u64)tbl->it_index,
> > - (u64)tcenum << 12,
> > - (u64)virt_to_abs(tcep),
> > - limit);
> > + do {
> > + rc = plpar_tce_put_indirect(tbl->it_index, tcenum << 12,
> > + virt_to_abs(tcep), limit);
> > + if (unlikely(H_IS_LONG_BUSY(rc))) {
> > + sleep_msecs = plpar_get_longbusy_msecs(rc);
> > + mdelay(sleep_msecs);
> > + }
> > + } while (unlikely(H_IS_LONG_BUSY(rc)));
> >
> > npages -= limit;
> > tcenum += limit;
> > } while (npages > 0 && !rc);
> >
> > + if (unlikely(rc == H_NOT_ENOUGH_RESOURCES)) {
> > + ret = (int)rc;
> > + tce_freemulti_pSeriesLP(tbl, tcenum_start,
> > + (npages_start - (npages + limit)));
> > + return ret;
> > + }
> > +
> > if (rc && printk_ratelimit()) {
> > printk("tce_buildmulti_pSeriesLP: plpar_tce_put failed. rc=%ld\n", rc);
> > printk("\tindex = 0x%lx\n", (u64)tbl->it_index);
> > @@ -187,14 +222,23 @@ static void tce_buildmulti_pSeriesLP(str
> > printk("\ttce[0] val = 0x%lx\n", tcep[0]);
> > show_stack(current, (unsigned long *)__get_SP());
> > }
> > + return ret;
> > }
> >
> > static void tce_free_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
> > {
> > + int sleep_msecs;
> > u64 rc;
> >
> > while (npages--) {
> > - rc = plpar_tce_put((u64)tbl->it_index, (u64)tcenum << 12, 0);
> > + do {
> > + rc = plpar_tce_put((u64)tbl->it_index,
> > + (u64)tcenum << 12, 0);
> > + if (unlikely(H_IS_LONG_BUSY(rc))) {
> > + sleep_msecs = plpar_get_longbusy_msecs(rc);
> > + mdelay(sleep_msecs);
> > + }
> > + } while (unlikely(H_IS_LONG_BUSY(rc)));
>
> Can this ever happen? I would hope that any entry that's got an active
> mapping is actually pinned in memory, what other than paging in from
> disk can result in long busy?
This won't happen, I'll remove it.
> > @@ -210,9 +254,17 @@ static void tce_free_pSeriesLP(struct io
> >
> > static void tce_freemulti_pSeriesLP(struct iommu_table *tbl, long tcenum, long npages)
> > {
> > + int sleep_msecs;
> > u64 rc;
> >
> > - rc = plpar_tce_stuff((u64)tbl->it_index, (u64)tcenum << 12, 0, npages);
> > + do {
> > + rc = plpar_tce_stuff((u64)tbl->it_index,
> > + (u64)tcenum << 12, 0, npages);
> > + if (unlikely(H_IS_LONG_BUSY(rc))) {
> > + sleep_msecs = plpar_get_longbusy_msecs(rc);
> > + mdelay(sleep_msecs);
> > + }
> > + } while (unlikely(H_IS_LONG_BUSY(rc)));
> >
> > if (rc && printk_ratelimit()) {
> > printk("tce_freemulti_pSeriesLP: plpar_tce_stuff failed\n");
next prev parent reply other threads:[~2008-06-20 15:04 UTC|newest]
Thread overview: 41+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-06-12 21:53 [PATCH 00/19] powerpc: pSeries Cooperative Memory Overcommitment support Robert Jennings
2008-06-12 22:08 ` [PATCH 01/19] powerpc: Remove extraneous error reporting for hcall failures in lparcfg Robert Jennings
2008-06-12 22:08 ` [PATCH 02/19] powerpc: Split processor entitlement retrieval and gathering to helper routines Robert Jennings
2008-06-13 0:23 ` Stephen Rothwell
2008-06-13 19:11 ` Nathan Fontenot
2008-06-16 16:07 ` Nathan Fontenot
2008-06-12 22:09 ` [PATCH 03/19] powerpc: Add memory entitlement capabilities to /proc/ppc64/lparcfg Robert Jennings
2008-06-16 16:09 ` [PATCH 03/19][v2] " Nathan Fontenot
2008-06-16 20:47 ` [PATCH 03/19][v3] " Nathan Fontenot
2008-06-24 14:23 ` Brian King
2008-06-24 15:26 ` [PATCH 03/19] " Nathan Fontenot
2008-06-12 22:11 ` [PATCH 04/19] powerpc: Split retrieval of processor entitlement data into a helper routine Robert Jennings
2008-06-12 22:11 ` [PATCH 05/19] powerpc: Enable CMO feature during platform setup Robert Jennings
2008-06-12 22:12 ` [PATCH 06/19] powerpc: Utilities to set firmware page state Robert Jennings
2008-06-12 22:13 ` [PATCH 07/19] powerpc: Add collaborative memory manager Robert Jennings
2008-06-12 22:13 ` [PATCH 08/19] powerpc: Do not probe PCI buses or eBus devices if CMO is enabled Robert Jennings
2008-06-12 22:14 ` [PATCH 09/19] powerpc: Add CMO paging statistics Robert Jennings
2008-06-12 22:15 ` [PATCH 10/19] powerpc: move get_longbusy_msecs out of ehca/ehea Robert Jennings
2008-06-12 22:18 ` [PATCH 10/19] [repost] " Robert Jennings
2008-06-13 18:24 ` Brian King
2008-06-13 19:55 ` Jeff Garzik
2008-06-12 22:19 ` [PATCH 11/19] powerpc: iommu enablement for CMO Robert Jennings
2008-06-13 1:43 ` Olof Johansson
2008-06-20 15:03 ` Robert Jennings [this message]
2008-06-20 15:12 ` [PATCH 11/19][v2] " Robert Jennings
2008-06-12 22:19 ` [PATCH 12/19] powerpc: vio bus support " Robert Jennings
2008-06-13 5:12 ` Stephen Rothwell
2008-06-23 20:23 ` Robert Jennings
2008-06-23 20:25 ` [PATCH 12/19][v2] " Robert Jennings
2008-06-12 22:21 ` [PATCH 13/19] powerpc: Verify CMO memory entitlement updates with virtual I/O Robert Jennings
2008-06-12 22:21 ` [PATCH 14/19] powerpc: hvc enablement for CMO Robert Jennings
2008-06-12 22:22 ` [PATCH 15/19] powerpc: hvcs " Robert Jennings
2008-06-12 22:22 ` [PATCH 16/19] ibmveth: Automatically enable larger rx buffer pools for larger mtu Robert Jennings
2008-06-13 5:18 ` Stephen Rothwell
2008-06-23 20:21 ` [PATCH 16/19][v2] " Robert Jennings
2008-06-12 22:23 ` [PATCH 17/19] ibmveth: enable driver for CMO Robert Jennings
2008-06-13 5:25 ` Stephen Rothwell
2008-06-23 20:20 ` [PATCH 17/19][v2] " Robert Jennings
2008-06-12 22:24 ` [PATCH 18/19] ibmvscsi: driver enablement " Robert Jennings
2008-06-13 18:30 ` Brian King
2008-06-12 22:31 ` [PATCH 19/19] powerpc: Update arch vector to indicate support " Robert Jennings
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080620150354.GA6594@linux.vnet.ibm.com \
--to=rcj@linux.vnet.ibm.com \
--cc=brking@linux.vnet.ibm.com \
--cc=ddarring@linux.vnet.ibm.com \
--cc=linuxppc-dev@ozlabs.org \
--cc=olof@lixom.net \
--cc=paulus@samba.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).