From: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
To: Russell King - ARM Linux <linux@arm.linux.org.uk>
Cc: David Miller <davem@davemloft.net>,
Nimrod Andy <B38611@freescale.com>,
Fabio Estevam <fabio.estevam@freescale.com>,
netdev@vger.kernel.org, fugang.duan@freescale.com
Subject: Re: Bug: mv643xxx fails with highmem
Date: Thu, 18 Dec 2014 10:13:19 -0300 [thread overview]
Message-ID: <5492D2EF.6050807@free-electrons.com> (raw)
In-Reply-To: <20141218000357.GX11285@n2100.arm.linux.org.uk>
On 12/17/2014 09:03 PM, Russell King - ARM Linux wrote:
> On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
>> On the other side, I haven't been able to reproduce this on my boards. I
>> did try to put a hack to hold most lowmem pages, but it didn't make a
>> difference. (In fact, I haven't been able to clearly see how the pages
>> for the skbuff are allocated from high memory.)
>
> To be honest, I don't know either. All that I can do is describe what
> happened...
>
> I've been running 3.17 since a week after it came out, and never saw a
> problem there.
>
> Then I moved forward to 3.18, and ended up with memory corruption, which
> seemed to be the GPU scribbling over kernel text (since the oops revealed
> pixel values in the Code: line.)
>
> I thought it was a GPU problem - which seemed a reasonable assumption as
> I know that the runtime PM I implemented for the GPU doesn't properly
> restore the hardware state yet. So, I rebooted back into 3.18, this
> time with all GPU users disabled, intending to download a kernel with
> GPU runtime PM disabled. I'd also added additional debug to my X DDX
> driver which logged the GPU command stream to a file on a NFS mount -
> this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
> time it submits a block of commands.
>
> However, while my scripts to download the built kernel to the Cubox
> were running, the kernel oopsed in the depths of dma_map_single() - the
> kernel was trying to access a struct page for phys address 0x40000000,
> which didn't exist. I decided to go back to 3.17 to get the updated
> kernel on it, hoping that would sort it out.
>
> With the updated 3.18 kernel (with GPU runtime PM disabled), I found
> that I'd still get oopses in from the network driver while X was starting
> up, again from dma_map_single(). So, with all GPU users again disabled,
> I set about debugging the this issue.
>
> I added a BUG_ON(!addr) after the page_address(), and that fired. I
> added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
> (Each time, I had to boot back to 3.17 in order to download the new
> kernel, because very time I tried with 3.18, I'd hit this bug.)
>
> It's then when I reported the issue and asked the questions...
>
> I've since done a simple change, taking advantage that on ARM (or any
> asm-generic/dma-mapping-common.h user), dma_unmap_single() and
> dma_unmap_page() are the same function:
>
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index d44560d1d268..c343ab03ab8b 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -879,10 +879,8 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
> skb_frag_t *this_frag;
> int tx_index;
> struct tx_desc *desc;
> - void *addr;
>
> this_frag = &skb_shinfo(skb)->frags[frag];
> - addr = page_address(this_frag->page.p) + this_frag->page_offset; tx_index = txq->tx_curr_desc++;
> if (txq->tx_curr_desc == txq->tx_ring_size)
> txq->tx_curr_desc = 0;
> @@ -902,8 +900,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
>
> desc->l4i_chk = 0;
> desc->byte_cnt = skb_frag_size(this_frag);
> - desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
> - desc->byte_cnt, DMA_TO_DEVICE);
> + desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
> + this_frag, 0,
> + desc->byte_cnt, DMA_TO_DEVICE); }
> }
>
>
> I've been running that for the last five days, and I've yet to see
> /any/ issues what so ever, and that includes running with the GPU
> logging all that time:
>
> -rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin
>
> During that time, I've been using the device over the network, running
> various git commands, running builds, running the occasional build
> via NFS, etc.
>
> So, for me it was trivially easy to reproduce (without my fix in place)
> and all problems have gone away when I've fixed the apparent problem.
>
Well that's interesting. You've fixed only the non-TSO egress path,
yet your original ethtool output showed tcp-segmentation-offload enabled.
This seems to imply the highmem pages are found only for the non-TSO path.
> However, exactly how it occurs, I don't know. My understanding from
> reading the various feature flags was that NETIF_F_HIGHDMA was required
> for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
> be seeing highmem fragments - which is why I asked the question in my
> original email.
>
> If you want me to revert my fix above, and reproduce again, I can
> certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
> in there, but I seem to remember that it wasn't particularly useful as
> the backtrace didn't show where the memory actually came from.
>
No, that's OK. Thanks a lot for all the details. I'll try to come up with a
fix soon.
--
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com
next prev parent reply other threads:[~2014-12-18 13:15 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-12-11 19:49 Bug: mv643xxx fails with highmem Russell King - ARM Linux
2014-12-11 20:10 ` David Miller
2014-12-11 20:12 ` Ezequiel Garcia
2014-12-11 20:25 ` Russell King - ARM Linux
2014-12-11 20:25 ` Ezequiel Garcia
2014-12-11 20:27 ` David Miller
2014-12-12 5:34 ` fugang.duan
2014-12-15 18:04 ` Russell King - ARM Linux
2014-12-16 2:19 ` fugang.duan
2014-12-16 10:57 ` Russell King - ARM Linux
2014-12-18 2:29 ` fugang.duan
2014-12-17 21:18 ` Ezequiel Garcia
2014-12-18 0:03 ` Russell King - ARM Linux
2014-12-18 13:13 ` Ezequiel Garcia [this message]
2014-12-21 16:51 ` Russell King - ARM Linux
2015-01-20 13:41 ` Russell King - ARM Linux
2015-01-20 13:42 ` Ezequiel Garcia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5492D2EF.6050807@free-electrons.com \
--to=ezequiel.garcia@free-electrons.com \
--cc=B38611@freescale.com \
--cc=davem@davemloft.net \
--cc=fabio.estevam@freescale.com \
--cc=fugang.duan@freescale.com \
--cc=linux@arm.linux.org.uk \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).