All of lore.kernel.org
 help / color / mirror / Atom feed
From: Shuah Khan <shuah.khan@hp.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: konrad.wilk@oracle.com, tglx@linutronix.de, mingo@redhat.com,
	hpa@zytor.com, rob@landley.net, stern@rowland.harvard.edu,
	joerg.roedel@amd.com, bhelgaas@google.com,
	LKML <linux-kernel@vger.kernel.org>,
	linux-doc@vger.kernel.org, devel@linuxdriverproject.org,
	x86@kernel.org, shuahkhan@gmail.com
Subject: Re: [PATCH v4] dma-debug: New interfaces to debug dma mapping errors
Date: Mon, 08 Oct 2012 11:07:20 -0600	[thread overview]
Message-ID: <1349716040.2674.19.camel@lorien2> (raw)
In-Reply-To: <20121005155108.d8d88fd2.akpm@linux-foundation.org>

On Fri, 2012-10-05 at 15:51 -0700, Andrew Morton wrote:

> 
> Still seems overly complicated to me, but whatev.
> 
> I think the way to handle this is pretty simple: set a flag in the dma
> entry when someone runs dma_mapping_error() and, if that flag wasn't
> set at unmap time, emit a loud warning.
> 
> From my reading of the code, this patch indeed does that, along with a
> bunch of other (unnecessary?) stuff.  But boy, the changelog conceals
> this information well!

Are you referring to the system wide error counters when you say
unnecessary stuff. The reason I added those was to catch errors when
drivers don't do unmap. Several drivers fail to do unmap. Checking flag
from unmap debug interfaces, doesn't cover these cases. However, I think
the value of system wide counters is limited in the sense that they
don't really identify the driver that needs fixing. In that sense it can
be deemed unnecessary. I dropped them in v5 patch, which I am sending
out.

I also changed the changelog to be clear.

> 
> >  Documentation/DMA-API.txt          |   13 +++++
> >  arch/x86/include/asm/dma-mapping.h |    1 +
> >  include/linux/dma-debug.h          |    7 +++
> >  lib/dma-debug.c                    |  110 ++++++++++++++++++++++++++++++++++--
> 
> Please, go through Documentation/DMA-API-HOWTO.txt with a toothcomb and
> ensure that all this is appropriately covered and that the examples are
> completed for error checking.
> 
> >
> > ...
> >
> > +static inline int check_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	const struct dma_map_ops *ops = get_dma_ops(dev);
> > +	if (ops->mapping_error)
> > +		return ops->mapping_error(dev, dma_addr);
> > +
> > +	return (dma_addr == DMA_ERROR_CODE);
> > +}
> 
> I'm not a fan of functions called check_foo() because I never know
> whether their return value means "foo is true" or "foo is false".  It
> doesn't help that the return value here is undocumented.  Names such as
> has_mapping_error() or mapping_has_error() will remove this question,
> thereby making the call sites more readable.

Changed the name to has_mapping_error().

> 
> >  static void check_unmap(struct dma_debug_entry *ref)
> >  {
> >  	struct dma_debug_entry *entry;
> >  	struct hash_bucket *bucket;
> >  	unsigned long flags;
> >  
> > -	if (dma_mapping_error(ref->dev, ref->dev_addr)) {
> > +	if (unlikely(check_mapping_error(ref->dev, ref->dev_addr))) {
> > +		unmap_errors += 1;
> >  		err_printk(ref->dev, NULL, "DMA-API: device driver tries "
> >  			   "to free an invalid DMA memory address\n");
> >  		return;
> > @@ -915,6 +975,15 @@ static void check_unmap(struct dma_debug_entry *ref)
> >  			   dir2name[ref->direction]);
> >  	}
> >  
> > +	if (entry->map_err_type == MAP_ERR_NOT_CHECKED) {
> > +		err_printk(ref->dev, entry,
> > +			   "DMA-API: device driver failed to check map error"
> > +			   "[device address=0x%016llx] [size=%llu bytes] "
> > +			   "[mapped as %s]",
> > +			   ref->dev_addr, ref->size,
> > +			   type2name[entry->type]);
> > +	}
> 
> It's important that this warning be associated with a backtrace so we
> can identify the offending call site in the usual fashion. 
> err_printk() does include a backtrace under some circumstances, but
> those circumstances hurt my brain.
> 
> Is it guaranteed that we'll have that backtrace?  If not, I'd suggest
> making it so.

Yes it does. Here is one from my test run:

[ 5232.368453] e1000e 0000:00:19.0: DMA-API: device driver failed to
check map error[device address=0x00000000ffe4e1c0] [size=1522 bytes]
[mapped as single]

[ 5232.368546] Pid: 0, comm: swapper/1 Not tainted 3.6.0-next-20121002+
#2
[ 5232.368549] Call Trace:
[ 5232.368553]  <IRQ>  [<ffffffff81055faf>] warn_slowpath_common
+0x7f/0xc0
[ 5232.368570]  [<ffffffff810560a6>] warn_slowpath_fmt+0x46/0x50
[ 5232.368579]  [<ffffffff81341fbe>] check_unmap+0x48e/0x860
[ 5232.368588]  [<ffffffff81092ee9>] ? check_preempt_wakeup+0x1c9/0x270
[ 5232.368595]  [<ffffffff8134251c>] debug_dma_unmap_page+0x5c/0x60
[ 5232.368616]  [<ffffffffa0016e4e>] e1000_clean_rx_irq+0x1be/0x410
[e1000e]
[ 5232.368626]  [<ffffffff8108c885>] ? wake_up_process+0x15/0x20
[ 5232.368644]  [<ffffffffa001b35c>] e1000e_poll+0x7c/0x400 [e1000e]
[ 5232.368653]  [<ffffffff8107e539>] ? enqueue_hrtimer+0x39/0xc0
[ 5232.368662]  [<ffffffff815756c4>] net_rx_action+0x134/0x240
[ 5232.368670]  [<ffffffff813b15c3>] ? arch_local_irq_enable+0x8/0xd
[ 5232.368678]  [<ffffffff8105e9d0>] __do_softirq+0xc0/0x240
[ 5232.368688]  [<ffffffff816856ce>] ? _raw_spin_lock+0xe/0x20
[ 5232.368697]  [<ffffffff8168f31c>] call_softirq+0x1c/0x30
[ 5232.368706]  [<ffffffff81016315>] do_softirq+0x65/0xa0
[ 5232.368713]  [<ffffffff8105ecae>] irq_exit+0x8e/0xb0
[ 5232.368721]  [<ffffffff8168fba3>] do_IRQ+0x63/0xe0
[ 5232.368728]  [<ffffffff81685cad>] common_interrupt+0x6d/0x6d
[ 5232.368731]  <EOI>  [<ffffffff813b15c3>] ? arch_local_irq_enable
+0x8/0xd
[ 5232.368744]  [<ffffffff813b2159>] acpi_idle_enter_c1+0x94/0xb9
[ 5232.368752]  [<ffffffff81533fd9>] cpuidle_enter+0x19/0x20
[ 5232.368758]  [<ffffffff815346ac>] cpuidle_idle_call+0xac/0x290
[ 5232.368767]  [<ffffffff8101d59f>] cpu_idle+0xcf/0x120
[ 5232.368775]  [<ffffffff81670bc6>] start_secondary+0x1ec/0x1f3
[ 5232.368780] ---[ end trace 58eeb6eaa1eca88f ]---
[ 5232.368782] Mapped at:
[ 5232.368785]  [<ffffffff81340fcb>] debug_dma_map_page+0x8b/0x150
[ 5232.368792]  [<ffffffffa001823f>] e1000_alloc_rx_buffers+0x15f/0x2d0
[e1000e]
[ 5232.368808]  [<ffffffffa0016f8b>] e1000_clean_rx_irq+0x2fb/0x410
[e1000e]
[ 5232.368823]  [<ffffffffa001b35c>] e1000e_poll+0x7c/0x400 [e1000e]
[ 5232.368838]  [<ffffffff815756c4>] net_rx_action+0x134/0x240

> 
> >  	hash_bucket_del(entry);
> >  	dma_entry_free(entry);
> >  
> >
> > ...
> >
> > +void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
> > +{
> > +	struct dma_debug_entry ref;
> > +	struct dma_debug_entry *entry;
> > +	struct hash_bucket *bucket;
> > +	unsigned long flags;
> > +
> > +	if (unlikely(global_disable))
> > +		return;
> > +
> > +	ref.dev = dev;
> > +	ref.dev_addr = dma_addr;
> > +	bucket = get_hash_bucket(&ref, &flags);
> > +	entry = bucket_find_exact(bucket, &ref);
> > +
> > +	if (!entry) {
> > +		/* very likley dma-api didn't call debug_dma_map_page() or
> > +		   debug_dma_map_page() detected mapping error */
> > +		if (map_errors_not_checked)
> > +			map_errors_not_checked -= 1;
> > +		goto out;
> > +	}
> > +
> > +	entry->map_err_type = MAP_ERR_CHECKED;
> > +out:
> > +	put_hash_bucket(bucket, &flags);
> > +}
> > +EXPORT_SYMBOL(debug_dma_mapping_error);
> 
> Well, it's a global, exported-to-modules symbol.  Some formal
> documentation is appropriate for such things.

I added information on what it does and who can/should call to
DMA-API.txt
> 
> >
> > ...
> >
> 



  reply	other threads:[~2012-10-08 17:07 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-09-17  0:52 [PATCH] dma-debug: New interfaces to debug dma mapping errors Shuah Khan
2012-09-17  2:07 ` Greg KH
2012-09-17 14:45   ` Shuah Khan
2012-09-17 15:25     ` Greg KH
2012-09-17 13:39 ` Konrad Rzeszutek Wilk
2012-09-17 15:52   ` Shuah Khan
2012-09-17 17:23     ` Konrad Rzeszutek Wilk
2012-09-17 22:45       ` Shuah Khan
2012-09-18 13:34         ` Joerg Roedel
2012-09-18 19:42           ` Shuah Khan
2012-09-18 19:45             ` Konrad Rzeszutek Wilk
2012-09-18 20:34               ` Shuah Khan
2012-09-19 13:08             ` Joerg Roedel
2012-09-19 19:16               ` Shuah Khan
2012-09-26  1:05 ` [PATCH v2] " Shuah Khan
2012-09-26 13:12   ` Konrad Rzeszutek Wilk
2012-09-26 16:23     ` Shuah Khan
2012-09-27 10:20   ` Joerg Roedel
2012-09-27 14:13     ` Shuah Khan
2012-10-03 14:55   ` [PATCH v3] " Shuah Khan
2012-10-03 21:45     ` Andrew Morton
2012-10-04 14:01       ` Konrad Rzeszutek Wilk
2012-10-04 22:16         ` Shuah Khan
2012-10-04 17:38     ` Konrad Rzeszutek Wilk
2012-10-04 22:19       ` Shuah Khan
2012-10-05  1:23     ` [PATCH v4] " Shuah Khan
2012-10-05 22:51       ` Andrew Morton
2012-10-08 17:07         ` Shuah Khan [this message]
2012-10-09 21:02           ` Andrew Morton
2012-10-10 21:50             ` Shuah Khan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1349716040.2674.19.camel@lorien2 \
    --to=shuah.khan@hp.com \
    --cc=akpm@linux-foundation.org \
    --cc=bhelgaas@google.com \
    --cc=devel@linuxdriverproject.org \
    --cc=hpa@zytor.com \
    --cc=joerg.roedel@amd.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=rob@landley.net \
    --cc=shuahkhan@gmail.com \
    --cc=stern@rowland.harvard.edu \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.