Re: Bug: mv643xxx fails with highmem

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: David Miller <davem@davemloft.net>,
	Nimrod Andy <B38611@freescale.com>,
	Fabio Estevam <fabio.estevam@freescale.com>,
	netdev@vger.kernel.org, fugang.duan@freescale.com
Subject: Re: Bug: mv643xxx fails with highmem
Date: Thu, 18 Dec 2014 10:13:19 -0300	[thread overview]
Message-ID: <5492D2EF.6050807@free-electrons.com> (raw)
In-Reply-To: <20141218000357.GX11285@n2100.arm.linux.org.uk>

On 12/17/2014 09:03 PM, Russell King - ARM Linux wrote:
> On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
>> On the other side, I haven't been able to reproduce this on my boards. I
>> did try to put a hack to hold most lowmem pages, but it didn't make a
>> difference. (In fact, I haven't been able to clearly see how the pages
>> for the skbuff are allocated from high memory.)
> 
> To be honest, I don't know either.  All that I can do is describe what
> happened...
> 
> I've been running 3.17 since a week after it came out, and never saw a
> problem there.
> 
> Then I moved forward to 3.18, and ended up with memory corruption, which
> seemed to be the GPU scribbling over kernel text (since the oops revealed
> pixel values in the Code: line.)
> 
> I thought it was a GPU problem - which seemed a reasonable assumption as
> I know that the runtime PM I implemented for the GPU doesn't properly
> restore the hardware state yet.  So, I rebooted back into 3.18, this
> time with all GPU users disabled, intending to download a kernel with
> GPU runtime PM disabled.  I'd also added additional debug to my X DDX
> driver which logged the GPU command stream to a file on a NFS mount -
> this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
> time it submits a block of commands.
> 
> However, while my scripts to download the built kernel to the Cubox
> were running, the kernel oopsed in the depths of dma_map_single() - the
> kernel was trying to access a struct page for phys address 0x40000000,
> which didn't exist.  I decided to go back to 3.17 to get the updated
> kernel on it, hoping that would sort it out.
> 
> With the updated 3.18 kernel (with GPU runtime PM disabled), I found
> that I'd still get oopses in from the network driver while X was starting
> up, again from dma_map_single().  So, with all GPU users again disabled,
> I set about debugging the this issue.
> 
> I added a BUG_ON(!addr) after the page_address(), and that fired.  I
> added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
> (Each time, I had to boot back to 3.17 in order to download the new
> kernel, because very time I tried with 3.18, I'd hit this bug.)
> 
> It's then when I reported the issue and asked the questions...
> 
> I've since done a simple change, taking advantage that on ARM (or any
> asm-generic/dma-mapping-common.h user), dma_unmap_single() and
> dma_unmap_page() are the same function:
> 
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index d44560d1d268..c343ab03ab8b 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -879,10 +879,8 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
>                 skb_frag_t *this_frag;
>                 int tx_index;
>                 struct tx_desc *desc;
> -               void *addr;
> 
>                 this_frag = &skb_shinfo(skb)->frags[frag];
> -               addr = page_address(this_frag->page.p) + this_frag->page_offset;                tx_index = txq->tx_curr_desc++;
>                 if (txq->tx_curr_desc == txq->tx_ring_size)
>                         txq->tx_curr_desc = 0;
> @@ -902,8 +900,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
> 
>                 desc->l4i_chk = 0;
>                 desc->byte_cnt = skb_frag_size(this_frag);
> -               desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
> -                                              desc->byte_cnt, DMA_TO_DEVICE);
> +               desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
> +                                                this_frag, 0,
> +                                                desc->byte_cnt, DMA_TO_DEVICE);        }
>  }
> 
> 
> I've been running that for the last five days, and I've yet to see
> /any/ issues what so ever, and that includes running with the GPU
> logging all that time:
> 
> -rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin
> 
> During that time, I've been using the device over the network, running
> various git commands, running builds, running the occasional build
> via NFS, etc.
> 
> So, for me it was trivially easy to reproduce (without my fix in place)
> and all problems have gone away when I've fixed the apparent problem.
> 

Well that's interesting. You've fixed only the non-TSO egress path,
yet your original ethtool output showed tcp-segmentation-offload enabled.

This seems to imply the highmem pages are found only for the non-TSO path.

> However, exactly how it occurs, I don't know.  My understanding from
> reading the various feature flags was that NETIF_F_HIGHDMA was required
> for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
> be seeing highmem fragments - which is why I asked the question in my
> original email.
> 
> If you want me to revert my fix above, and reproduce again, I can
> certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
> in there, but I seem to remember that it wasn't particularly useful as
> the backtrace didn't show where the memory actually came from.
> 

No, that's OK. Thanks a lot for all the details. I'll try to come up with a
fix soon.
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

next prev parent reply	other threads:[~2014-12-18 13:15 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-11 19:49 Bug: mv643xxx fails with highmem Russell King - ARM Linux
2014-12-11 20:10 ` David Miller
2014-12-11 20:12   ` Ezequiel Garcia
2014-12-11 20:25   ` Russell King - ARM Linux
2014-12-11 20:25     ` Ezequiel Garcia
2014-12-11 20:27     ` David Miller
2014-12-12  5:34       ` fugang.duan
2014-12-15 18:04         ` Russell King - ARM Linux
2014-12-16  2:19           ` fugang.duan
2014-12-16 10:57             ` Russell King - ARM Linux
2014-12-18  2:29               ` fugang.duan
2014-12-17 21:18     ` Ezequiel Garcia
2014-12-18  0:03       ` Russell King - ARM Linux
2014-12-18 13:13         ` Ezequiel Garcia [this message]
2014-12-21 16:51           ` Russell King - ARM Linux
2015-01-20 13:41             ` Russell King - ARM Linux
2015-01-20 13:42               ` Ezequiel Garcia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5492D2EF.6050807@free-electrons.com \
    --to=ezequiel.garcia@free-electrons.com \
    --cc=B38611@freescale.com \
    --cc=davem@davemloft.net \
    --cc=fabio.estevam@freescale.com \
    --cc=fugang.duan@freescale.com \
    --cc=linux@arm.linux.org.uk \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.