Bug: mv643xxx fails with highmem

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Bug: mv643xxx fails with highmem
@ 2014-12-11 19:49 Russell King - ARM Linux
  2014-12-11 20:10 ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-12-11 19:49 UTC (permalink / raw)
  To: Ezequiel Garcia, David S. Miller; +Cc: netdev

Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
all fragments with dma_map_single().  This fails when the driver is
used in an environment with highmem.

With this patch in place in 3.18:

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index d44560d1d268..14d1fc9ff485 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -882,7 +882,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
 		void *addr;
  
 		this_frag = &skb_shinfo(skb)->frags[frag];
+		BUG_ON(PageHighMem(this_frag->page.p));
 		addr = page_address(this_frag->page.p) + this_frag->page_offset;
+		BUG_ON(!addr);
 		tx_index = txq->tx_curr_desc++;
 		if (txq->tx_curr_desc == txq->tx_ring_size)
 			txq->tx_curr_desc = 0;

I regularly hit the first BUG_ON().  Without the BUG_ON(), the machine
either freezes or oopses in the DMA API functions due to them being
passed an invalid struct page *.

So, I'm unclear whether this is a driver bug, or whether it's a core
netdev bug: should netdev be passing skbuffs with highmem pages
attached when the driver has NETIF_F_HIGHDMA clear, or is Ezequiel's
patch removing the ability to map highmem pages from this driver
incorrect?

Here's the ethtool -k eth0 output for this driver:

Features for eth0:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: off [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-ipip-segmentation: off [fixed]
tx-sit-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-mpls-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]

Thanks.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-11 19:49 Bug: mv643xxx fails with highmem Russell King - ARM Linux
@ 2014-12-11 20:10 ` David Miller
  2014-12-11 20:12   ` Ezequiel Garcia
  2014-12-11 20:25   ` Russell King - ARM Linux
  0 siblings, 2 replies; 17+ messages in thread
From: David Miller @ 2014-12-11 20:10 UTC (permalink / raw)
  To: linux; +Cc: ezequiel.garcia, netdev

From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Thu, 11 Dec 2014 19:49:20 +0000

> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
> all fragments with dma_map_single().  This fails when the driver is
> used in an environment with highmem.

This change looks really buggy to me.

Unfortunately, all the changes he subsequently makes for software TSO
support depend upon this :-/

The change is definitely wrong.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-11 20:10 ` David Miller
@ 2014-12-11 20:12   ` Ezequiel Garcia
  2014-12-11 20:25   ` Russell King - ARM Linux
  1 sibling, 0 replies; 17+ messages in thread
From: Ezequiel Garcia @ 2014-12-11 20:12 UTC (permalink / raw)
  To: David Miller, linux; +Cc: netdev

On 12/11/2014 05:10 PM, David Miller wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 19:49:20 +0000
> 
>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>> all fragments with dma_map_single().  This fails when the driver is
>> used in an environment with highmem.
> 
> This change looks really buggy to me.
> 
> Unfortunately, all the changes he subsequently makes for software TSO
> support depend upon this :-/
> 
> The change is definitely wrong.
> 

Got it. I'll take a closer look and will try to think a fix for this.

-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-11 20:10 ` David Miller
  2014-12-11 20:12   ` Ezequiel Garcia
@ 2014-12-11 20:25   ` Russell King - ARM Linux
  2014-12-11 20:25     ` Ezequiel Garcia
                       ` (2 more replies)
  1 sibling, 3 replies; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-12-11 20:25 UTC (permalink / raw)
  To: David Miller, Nimrod Andy, Fabio Estevam; +Cc: ezequiel.garcia, netdev

On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 19:49:20 +0000
> 
> > Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
> > all fragments with dma_map_single().  This fails when the driver is
> > used in an environment with highmem.
> 
> This change looks really buggy to me.
> 
> Unfortunately, all the changes he subsequently makes for software TSO
> support depend upon this :-/
> 
> The change is definitely wrong.

Thanks for confirming where the bug is.

Would other drivers need fixing for this as well?  Eg, fec_main.c
does the following, and this driver is used on iMX6 which can also have
highmem:

static int
fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
                             struct sk_buff *skb,
                             struct net_device *ndev)
{
                bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
...
                addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
                                      DMA_TO_DEVICE);

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-11 20:25   ` Russell King - ARM Linux
@ 2014-12-11 20:25     ` Ezequiel Garcia
  2014-12-11 20:27     ` David Miller
  2014-12-17 21:18     ` Ezequiel Garcia
  2 siblings, 0 replies; 17+ messages in thread
From: Ezequiel Garcia @ 2014-12-11 20:25 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Miller, Nimrod Andy,
	Fabio Estevam
  Cc: netdev, Thomas Petazzoni, Gregory CLEMENT

On 12/11/2014 05:25 PM, Russell King - ARM Linux wrote:
> On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
>> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Date: Thu, 11 Dec 2014 19:49:20 +0000
>>
>>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>>> all fragments with dma_map_single().  This fails when the driver is
>>> used in an environment with highmem.
>>
>> This change looks really buggy to me.
>>
>> Unfortunately, all the changes he subsequently makes for software TSO
>> support depend upon this :-/
>>
>> The change is definitely wrong.
> 
> Thanks for confirming where the bug is.
> 
> Would other drivers need fixing for this as well?  Eg, fec_main.c
> does the following, and this driver is used on iMX6 which can also have
> highmem:
> 
> static int
> fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
>                              struct sk_buff *skb,
>                              struct net_device *ndev)
> {
>                 bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
> ...
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);
> 

And mvneta seems to have that pattern too: see mvneta_tx_frag_process().
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-11 20:25   ` Russell King - ARM Linux
  2014-12-11 20:25     ` Ezequiel Garcia
@ 2014-12-11 20:27     ` David Miller
  2014-12-12  5:34       ` fugang.duan
  2014-12-17 21:18     ` Ezequiel Garcia
  2 siblings, 1 reply; 17+ messages in thread
From: David Miller @ 2014-12-11 20:27 UTC (permalink / raw)
  To: linux; +Cc: B38611, fabio.estevam, ezequiel.garcia, netdev

From: Russell King - ARM Linux <linux@arm.linux.org.uk>
Date: Thu, 11 Dec 2014 20:25:07 +0000

> Would other drivers need fixing for this as well?  Eg, fec_main.c
> does the following, and this driver is used on iMX6 which can also have
> highmem:
> 
> static int
> fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
>                              struct sk_buff *skb,
>                              struct net_device *ndev)
> {
>                 bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
> ...
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);

Probably, yes.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Bug: mv643xxx fails with highmem
  2014-12-11 20:27     ` David Miller
@ 2014-12-12  5:34       ` fugang.duan
  2014-12-15 18:04         ` Russell King - ARM Linux
  0 siblings, 1 reply; 17+ messages in thread
From: fugang.duan @ 2014-12-12  5:34 UTC (permalink / raw)
  To: David Miller, linux@arm.linux.org.uk
  Cc: Fabio.Estevam@freescale.com, ezequiel.garcia@free-electrons.com,
	netdev@vger.kernel.org

From: David Miller <davem@davemloft.net> Sent: Friday, December 12, 2014 4:28 AM
> To: linux@arm.linux.org.uk
> Cc: Duan Fugang-B38611; Estevam Fabio-R49496; ezequiel.garcia@free-
> electrons.com; netdev@vger.kernel.org
> Subject: Re: Bug: mv643xxx fails with highmem
> 
> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
> Date: Thu, 11 Dec 2014 20:25:07 +0000
> 
> > Would other drivers need fixing for this as well?  Eg, fec_main.c does
> > the following, and this driver is used on iMX6 which can also have
> > highmem:
> >
> > static int
> > fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq,
> >                              struct sk_buff *skb,
> >                              struct net_device *ndev) {
> >                 bufaddr = page_address(this_frag->page.p) +
> > this_frag->page_offset; ...
> >                 addr = dma_map_single(&fep->pdev->dev, bufaddr,
> frag_len,
> >                                       DMA_TO_DEVICE);
> 
> Probably, yes.

Hi, all,

I will submit one patch to fix the issue.

Regards,
Andy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-12  5:34       ` fugang.duan
@ 2014-12-15 18:04         ` Russell King - ARM Linux
  2014-12-16  2:19           ` fugang.duan
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-12-15 18:04 UTC (permalink / raw)
  To: fugang.duan@freescale.com
  Cc: David Miller, Fabio.Estevam@freescale.com,
	ezequiel.garcia@free-electrons.com, netdev@vger.kernel.org

On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com wrote:
> I will submit one patch to fix the issue.

There's more bugs in the FEC driver... here's the relevant bits:

static void
fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
{
        bdp = txq->dirty_tx;

        bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);

        while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
                /* current queue is empty */
                if (bdp == txq->cur_tx)
                        break;

                skb = txq->tx_skbuff[index];
                txq->tx_skbuff[index] = NULL;
                if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
                        dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
                                        bdp->cbd_datlen, DMA_TO_DEVICE);
                bdp->cbd_bufaddr = 0;
                if (!skb) {
                        bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
                        continue;
                }
...
                txq->dirty_tx = bdp;
                bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
        }

Consider the following code path:
- we enter this function
- get the dirty_tx pointer
- move to the next descriptor (which we'll call descriptor A)
- next descriptor indicates that TX_READY = 0
- bdp != txq->cur_tx
- we unmap if needed
- we set bdp->cmdbufaddr = 0
- assume skb is NULL, so we move to the next descriptor (we'll call this B)
- next descriptor _may_ have TX_READY = 1
- we break out of the loop, and return

Some time later, we re-enter:
- get the dirty_tx pointer
- move to the next descriptor (which is descriptor A above)
- next descriptor indicates that TX_READY = 0
- bdp != txq->cur_tx
- we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously zeroed
  - the DMA API debugging complains that FEC is unmapping memory which it
    doesn't own

Unfortunately, this does appear to happen - from a paste from Jon
Nettleton from iMX6Q:

 32. [   45.033001] unmapping this address 0x0 size 66
 33. [   45.037470] ------------[ cut here ]------------
 34. [   45.042127] WARNING: CPU: 0 PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries to free DMA memory it has not a]

(where the printk at line 32 is something that was added to debug this.)

The sad thing is that the remainder of my FEC patches did go a long way
to clean up these kinds of issues in the driver (and there's /many/ of
them), but unfortunately other conflicting changes got merged before I
could finish rebasing them, I decided to move on to other things and
discard the remainder of my patch set.  Marek showed some interest in
taking the patch set over, but I've not heard anything more - and I'm
not about to resurect my efforts only to get into the same situation
where I'm carrying 50 odd patches which I can't merge back into mainline
without spending weeks endlessly rebasing them.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Bug: mv643xxx fails with highmem
  2014-12-15 18:04         ` Russell King - ARM Linux
@ 2014-12-16  2:19           ` fugang.duan
  2014-12-16 10:57             ` Russell King - ARM Linux
  0 siblings, 1 reply; 17+ messages in thread
From: fugang.duan @ 2014-12-16  2:19 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: David Miller, Fabio.Estevam@freescale.com,
	ezequiel.garcia@free-electrons.com, netdev@vger.kernel.org

From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday, December 16, 2014 2:05 AM
> To: Duan Fugang-B38611
> Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> electrons.com; netdev@vger.kernel.org
> Subject: Re: Bug: mv643xxx fails with highmem
> 
> On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com wrote:
> > I will submit one patch to fix the issue.
> 
> There's more bugs in the FEC driver... here's the relevant bits:
> 
> static void
> fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) {
>         bdp = txq->dirty_tx;
> 
>         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> 
>         while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
>                 /* current queue is empty */
>                 if (bdp == txq->cur_tx)
>                         break;
> 
>                 skb = txq->tx_skbuff[index];
>                 txq->tx_skbuff[index] = NULL;
>                 if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
>                         dma_unmap_single(&fep->pdev->dev, bdp-
> >cbd_bufaddr,
>                                         bdp->cbd_datlen, DMA_TO_DEVICE);
>                 bdp->cbd_bufaddr = 0;
>                 if (!skb) {
>                         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
>                         continue;
>                 }
> ...
>                 txq->dirty_tx = bdp;
>                 bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
>         }
> 
> Consider the following code path:
> - we enter this function
> - get the dirty_tx pointer
> - move to the next descriptor (which we'll call descriptor A)
> - next descriptor indicates that TX_READY = 0
> - bdp != txq->cur_tx
> - we unmap if needed
> - we set bdp->cmdbufaddr = 0
> - assume skb is NULL, so we move to the next descriptor (we'll call this
> B)
> - next descriptor _may_ have TX_READY = 1
> - we break out of the loop, and return
> 
> Some time later, we re-enter:
> - get the dirty_tx pointer
> - move to the next descriptor (which is descriptor A above)
> - next descriptor indicates that TX_READY = 0
> - bdp != txq->cur_tx
> - we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously
> zeroed
>   - the DMA API debugging complains that FEC is unmapping memory which it
>     doesn't own
> 
> Unfortunately, this does appear to happen - from a paste from Jon
> Nettleton from iMX6Q:
> 
>  32. [   45.033001] unmapping this address 0x0 size 66  33. [   45.037470]
> ------------[ cut here ]------------  34. [   45.042127] WARNING: CPU: 0
> PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
> 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries to
> free DMA memory it has not a]
> 
> (where the printk at line 32 is something that was added to debug this.)
> 
> The sad thing is that the remainder of my FEC patches did go a long way
> to clean up these kinds of issues in the driver (and there's /many/ of
> them), but unfortunately other conflicting changes got merged before I
> could finish rebasing them, I decided to move on to other things and
> discard the remainder of my patch set.  Marek showed some interest in
> taking the patch set over, but I've not heard anything more - and I'm not
> about to resurect my efforts only to get into the same situation where
> I'm carrying 50 odd patches which I can't merge back into mainline
> without spending weeks endlessly rebasing them.
> 
Russell, many thanks for your effort and thanks for your pointing out the bug.
I will think one method to fix it.

And I have one question for highmem dma mapping issue as below:
fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, struct net_device *ndev)
{
	...
		    bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;

                index = fec_enet_get_bd_index(txq->tx_bd_base, bdp, fep);
                if (((unsigned long) bufaddr) & fep->tx_align ||
                        fep->quirks & FEC_QUIRK_SWAP_FRAME) {
                        memcpy(txq->tx_bounce[index], bufaddr, frag_len);
                        bufaddr = txq->tx_bounce[index];

                        if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
                                swap_buffer(bufaddr, frag_len);
                }

                addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
                                      DMA_TO_DEVICE);
                if (dma_mapping_error(&fep->pdev->dev, addr)) {
                        dev_kfree_skb_any(skb);
                        if (net_ratelimit())
                                netdev_err(ndev, "Tx DMA memory map failed\n");
                        goto dma_mapping_error;
                }
	...
}
If the frag page is located at high memory, use dma_map_single() is not right, must use skb_frag_dma_map() or dma_map_page().
But before mapping, if tx has buffer alignment limitation (tx_align is not zero), there need to do memcpy for buffer alignment.
So, there we need to check whether the page is in highmem, if so, we need to call kmap_atomic() or kmap_high_get() to get cpu address,
And then do memcpy or swap buffer operation.

Do you think the above solution is right ? or maybe there have better method to fix it ?

Regards,
Andy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-16  2:19           ` fugang.duan
@ 2014-12-16 10:57             ` Russell King - ARM Linux
  2014-12-18  2:29               ` fugang.duan
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-12-16 10:57 UTC (permalink / raw)
  To: fugang.duan@freescale.com
  Cc: David Miller, Fabio.Estevam@freescale.com,
	ezequiel.garcia@free-electrons.com, netdev@vger.kernel.org

On Tue, Dec 16, 2014 at 02:19:38AM +0000, fugang.duan@freescale.com wrote:
> From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday, December 16, 2014 2:05 AM
> > To: Duan Fugang-B38611
> > Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> > electrons.com; netdev@vger.kernel.org
> > Subject: Re: Bug: mv643xxx fails with highmem
> > 
> > On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com wrote:
> > > I will submit one patch to fix the issue.
> > 
> > There's more bugs in the FEC driver... here's the relevant bits:
> > 
> > static void
> > fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) {
> >         bdp = txq->dirty_tx;
> > 
> >         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> > 
> >         while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> >                 /* current queue is empty */
> >                 if (bdp == txq->cur_tx)
> >                         break;
> > 
> >                 skb = txq->tx_skbuff[index];
> >                 txq->tx_skbuff[index] = NULL;
> >                 if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
> >                         dma_unmap_single(&fep->pdev->dev, bdp-
> > >cbd_bufaddr,
> >                                         bdp->cbd_datlen, DMA_TO_DEVICE);
> >                 bdp->cbd_bufaddr = 0;
> >                 if (!skb) {
> >                         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> >                         continue;
> >                 }
> > ...
> >                 txq->dirty_tx = bdp;
> >                 bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> >         }
> > 
> > Consider the following code path:
> > - we enter this function
> > - get the dirty_tx pointer
> > - move to the next descriptor (which we'll call descriptor A)
> > - next descriptor indicates that TX_READY = 0
> > - bdp != txq->cur_tx
> > - we unmap if needed
> > - we set bdp->cmdbufaddr = 0
> > - assume skb is NULL, so we move to the next descriptor (we'll call this
> > B)
> > - next descriptor _may_ have TX_READY = 1
> > - we break out of the loop, and return
> > 
> > Some time later, we re-enter:
> > - get the dirty_tx pointer
> > - move to the next descriptor (which is descriptor A above)
> > - next descriptor indicates that TX_READY = 0
> > - bdp != txq->cur_tx
> > - we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously
> > zeroed
> >   - the DMA API debugging complains that FEC is unmapping memory which it
> >     doesn't own
> > 
> > Unfortunately, this does appear to happen - from a paste from Jon
> > Nettleton from iMX6Q:
> > 
> >  32. [   45.033001] unmapping this address 0x0 size 66  33. [   45.037470]
> > ------------[ cut here ]------------  34. [   45.042127] WARNING: CPU: 0
> > PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
> > 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries to
> > free DMA memory it has not a]
> > 
> > (where the printk at line 32 is something that was added to debug this.)
> > 
> > The sad thing is that the remainder of my FEC patches did go a long way
> > to clean up these kinds of issues in the driver (and there's /many/ of
> > them), but unfortunately other conflicting changes got merged before I
> > could finish rebasing them, I decided to move on to other things and
> > discard the remainder of my patch set.  Marek showed some interest in
> > taking the patch set over, but I've not heard anything more - and I'm not
> > about to resurect my efforts only to get into the same situation where
> > I'm carrying 50 odd patches which I can't merge back into mainline
> > without spending weeks endlessly rebasing them.
> > 
> Russell, many thanks for your effort and thanks for your pointing out the bug.
> I will think one method to fix it.
> 
> And I have one question for highmem dma mapping issue as below:
> fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct sk_buff *skb, struct net_device *ndev)
> {
> 	...
> 		    bufaddr = page_address(this_frag->page.p) + this_frag->page_offset;
> 
>                 index = fec_enet_get_bd_index(txq->tx_bd_base, bdp, fep);
>                 if (((unsigned long) bufaddr) & fep->tx_align ||
>                         fep->quirks & FEC_QUIRK_SWAP_FRAME) {
>                         memcpy(txq->tx_bounce[index], bufaddr, frag_len);
>                         bufaddr = txq->tx_bounce[index];
> 
>                         if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
>                                 swap_buffer(bufaddr, frag_len);
>                 }
> 
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);
>                 if (dma_mapping_error(&fep->pdev->dev, addr)) {
>                         dev_kfree_skb_any(skb);
>                         if (net_ratelimit())
>                                 netdev_err(ndev, "Tx DMA memory map failed\n");
>                         goto dma_mapping_error;
>                 }
> 	...
> }
>
> If the frag page is located at high memory, use dma_map_single() is not
> right, must use skb_frag_dma_map() or dma_map_page().

Correct.

> But before mapping, if tx has buffer alignment limitation (tx_align is
> not zero), there need to do memcpy for buffer alignment.

Right, and that can be detected by simply checking this_frag->page_offset
as we know that a page address is always aligned to a page.

> So, there we need to check whether the page is in highmem, if so, we
> need to call kmap_atomic() or kmap_high_get() to get cpu address,
> And then do memcpy or swap buffer operation.

Yes - you'd need to do something like this:

	if (this_frag->page_offset & fep->tx_align ||
	    fep->quirks & FEC_QUIRK_SWAP_FRAME) {
		bufaddr = kmap_atomic(this_frag->page.p) + this_frag->page_offset;
		memcpy(txq->tx_bounce[index], bufaddr, frag_len);
		kunmap_atomic(bufaddr);

		bufaddr = txq->tx_bounce[index];
		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
			swap_buffer(bufaddr, frag_len);
                addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
                                      DMA_TO_DEVICE);
	} else {
		addr = skb_frag_dma_map(&fep->pdev->dev, this_frag, 0,
					frag_len, DMA_TO_DEVICE);
	}

	if (dma_mapping_error(&fep->pdev->dev, addr)) {
		dev_kfree_skb_any(skb);
		if (net_ratelimit())
			netdev_err(ndev, "Tx DMA memory map failed\n");
		goto dma_mapping_error;
	}

You'll also need to record whether you should use dma_unmap_page() or
dma_unmap_single().

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: Bug: mv643xxx fails with highmem
  2014-12-16 10:57             ` Russell King - ARM Linux
@ 2014-12-18  2:29               ` fugang.duan
  0 siblings, 0 replies; 17+ messages in thread
From: fugang.duan @ 2014-12-18  2:29 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: David Miller, Fabio.Estevam@freescale.com,
	ezequiel.garcia@free-electrons.com, netdev@vger.kernel.org

From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday, December 16, 2014 6:57 PM
> To: Duan Fugang-B38611
> Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> electrons.com; netdev@vger.kernel.org
> Subject: Re: Bug: mv643xxx fails with highmem
> 
> On Tue, Dec 16, 2014 at 02:19:38AM +0000, fugang.duan@freescale.com wrote:
> > From: Russell King - ARM Linux <linux@arm.linux.org.uk> Sent: Tuesday,
> December 16, 2014 2:05 AM
> > > To: Duan Fugang-B38611
> > > Cc: David Miller; Estevam Fabio-R49496; ezequiel.garcia@free-
> > > electrons.com; netdev@vger.kernel.org
> > > Subject: Re: Bug: mv643xxx fails with highmem
> > >
> > > On Fri, Dec 12, 2014 at 05:34:01AM +0000, fugang.duan@freescale.com
> wrote:
> > > > I will submit one patch to fix the issue.
> > >
> > > There's more bugs in the FEC driver... here's the relevant bits:
> > >
> > > static void
> > > fec_enet_tx_queue(struct net_device *ndev, u16 queue_id) {
> > >         bdp = txq->dirty_tx;
> > >
> > >         bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> > >
> > >         while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> > >                 /* current queue is empty */
> > >                 if (bdp == txq->cur_tx)
> > >                         break;
> > >
> > >                 skb = txq->tx_skbuff[index];
> > >                 txq->tx_skbuff[index] = NULL;
> > >                 if (!IS_TSO_HEADER(txq, bdp->cbd_bufaddr))
> > >                         dma_unmap_single(&fep->pdev->dev, bdp-
> > > >cbd_bufaddr,
> > >                                         bdp->cbd_datlen,
> DMA_TO_DEVICE);
> > >                 bdp->cbd_bufaddr = 0;
> > >                 if (!skb) {
> > >                         bdp = fec_enet_get_nextdesc(bdp, fep,
> queue_id);
> > >                         continue;
> > >                 }
> > > ...
> > >                 txq->dirty_tx = bdp;
> > >                 bdp = fec_enet_get_nextdesc(bdp, fep, queue_id);
> > >         }
> > >
> > > Consider the following code path:
> > > - we enter this function
> > > - get the dirty_tx pointer
> > > - move to the next descriptor (which we'll call descriptor A)
> > > - next descriptor indicates that TX_READY = 0
> > > - bdp != txq->cur_tx
> > > - we unmap if needed
> > > - we set bdp->cmdbufaddr = 0
> > > - assume skb is NULL, so we move to the next descriptor (we'll call
> this
> > > B)
> > > - next descriptor _may_ have TX_READY = 1
> > > - we break out of the loop, and return
> > >
> > > Some time later, we re-enter:
> > > - get the dirty_tx pointer
> > > - move to the next descriptor (which is descriptor A above)
> > > - next descriptor indicates that TX_READY = 0
> > > - bdp != txq->cur_tx
> > > - we call dma_unmap_single(..., bdp->cbd_bufaddr, which we previously
> > > zeroed
> > >   - the DMA API debugging complains that FEC is unmapping memory
> which it
> > >     doesn't own
> > >
> > > Unfortunately, this does appear to happen - from a paste from Jon
> > > Nettleton from iMX6Q:
> > >
> > >  32. [   45.033001] unmapping this address 0x0 size 66
> 33. [   45.037470]
> > > ------------[ cut here ]------------  34. [   45.042127] WARNING: CPU:
> 0
> > > PID: 102 at lib/dma-debug.c:1080 check_unmap+0x784/0x9f4()
> > > 35. [   45.050066] fec 2188000.ethernet: DMA-API: device driver tries
> to
> > > free DMA memory it has not a]
> > >
> > > (where the printk at line 32 is something that was added to debug
> this.)
> > >
> > > The sad thing is that the remainder of my FEC patches did go a long
> way
> > > to clean up these kinds of issues in the driver (and there's /many/
> of
> > > them), but unfortunately other conflicting changes got merged before
> I
> > > could finish rebasing them, I decided to move on to other things and
> > > discard the remainder of my patch set.  Marek showed some interest in
> > > taking the patch set over, but I've not heard anything more - and I'm
> not
> > > about to resurect my efforts only to get into the same situation
> where
> > > I'm carrying 50 odd patches which I can't merge back into mainline
> > > without spending weeks endlessly rebasing them.
> > >
> > Russell, many thanks for your effort and thanks for your pointing out
> the bug.
> > I will think one method to fix it.
> >
> > And I have one question for highmem dma mapping issue as below:
> > fec_enet_txq_submit_frag_skb(struct fec_enet_priv_tx_q *txq, struct
> sk_buff *skb, struct net_device *ndev)
> > {
> > 	...
> > 		    bufaddr = page_address(this_frag->page.p) + this_frag-
> >page_offset;
> >
> >                 index = fec_enet_get_bd_index(txq->tx_bd_base, bdp,
> fep);
> >                 if (((unsigned long) bufaddr) & fep->tx_align ||
> >                         fep->quirks & FEC_QUIRK_SWAP_FRAME) {
> >                         memcpy(txq->tx_bounce[index], bufaddr,
> frag_len);
> >                         bufaddr = txq->tx_bounce[index];
> >
> >                         if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> >                                 swap_buffer(bufaddr, frag_len);
> >                 }
> >
> >                 addr = dma_map_single(&fep->pdev->dev, bufaddr,
> frag_len,
> >                                       DMA_TO_DEVICE);
> >                 if (dma_mapping_error(&fep->pdev->dev, addr)) {
> >                         dev_kfree_skb_any(skb);
> >                         if (net_ratelimit())
> >                                 netdev_err(ndev, "Tx DMA memory map
> failed\n");
> >                         goto dma_mapping_error;
> >                 }
> > 	...
> > }
> >
> > If the frag page is located at high memory, use dma_map_single() is not
> > right, must use skb_frag_dma_map() or dma_map_page().
> 
> Correct.
> 
> > But before mapping, if tx has buffer alignment limitation (tx_align is
> > not zero), there need to do memcpy for buffer alignment.
> 
> Right, and that can be detected by simply checking this_frag->page_offset
> as we know that a page address is always aligned to a page.
> 
> > So, there we need to check whether the page is in highmem, if so, we
> > need to call kmap_atomic() or kmap_high_get() to get cpu address,
> > And then do memcpy or swap buffer operation.
> 
> Yes - you'd need to do something like this:
> 
> 	if (this_frag->page_offset & fep->tx_align ||
> 	    fep->quirks & FEC_QUIRK_SWAP_FRAME) {
> 		bufaddr = kmap_atomic(this_frag->page.p) + this_frag-
> >page_offset;
> 		memcpy(txq->tx_bounce[index], bufaddr, frag_len);
> 		kunmap_atomic(bufaddr);
> 
> 		bufaddr = txq->tx_bounce[index];
> 		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> 			swap_buffer(bufaddr, frag_len);
>                 addr = dma_map_single(&fep->pdev->dev, bufaddr, frag_len,
>                                       DMA_TO_DEVICE);
> 	} else {
> 		addr = skb_frag_dma_map(&fep->pdev->dev, this_frag, 0,
> 					frag_len, DMA_TO_DEVICE);
> 	}
> 
> 	if (dma_mapping_error(&fep->pdev->dev, addr)) {
> 		dev_kfree_skb_any(skb);
> 		if (net_ratelimit())
> 			netdev_err(ndev, "Tx DMA memory map failed\n");
> 		goto dma_mapping_error;
> 	}
> 
> You'll also need to record whether you should use dma_unmap_page() or
> dma_unmap_single().
> 
Russell, thanks for your suggestion and comments. I will generate one patch to fix it like your suggestion.
(Sorry for late to reply your mail since I missed to read this mail)

Regards,
Andy

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-11 20:25   ` Russell King - ARM Linux
  2014-12-11 20:25     ` Ezequiel Garcia
  2014-12-11 20:27     ` David Miller
@ 2014-12-17 21:18     ` Ezequiel Garcia
  2014-12-18  0:03       ` Russell King - ARM Linux
  2 siblings, 1 reply; 17+ messages in thread
From: Ezequiel Garcia @ 2014-12-17 21:18 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Miller, Nimrod Andy,
	Fabio Estevam
  Cc: netdev, fugang.duan

Russell, David:

On 12/11/2014 05:25 PM, Russell King - ARM Linux wrote:
> On Thu, Dec 11, 2014 at 03:10:55PM -0500, David Miller wrote:
>> From: Russell King - ARM Linux <linux@arm.linux.org.uk>
>> Date: Thu, 11 Dec 2014 19:49:20 +0000
>>
>>> Commit 69ad0dd7af22 removed skb_frag_dma_map() in favour of mapping
>>> all fragments with dma_map_single().  This fails when the driver is
>>> used in an environment with highmem.
>>
>> This change looks really buggy to me.
>>
>> Unfortunately, all the changes he subsequently makes for software TSO
>> support depend upon this :-/
>>
>> The change is definitely wrong.

I've been trying to find a fix for this issue, and also trying to
reproduce the bug.

As for the fix, we need to fix the non-TSO and TSO paths independently.
The former is fairly straightforward, but the latter might be a bit more
involved.

The problem is that the tso_t struct holds a pointer to the skb linear
and non-linear data.

struct tso_t {
        int next_frag_idx;
        void *data;
        size_t size;
        u16 ip_id;
        u32 tcp_seq;
};

Instead, we should deal with pages, and only map the non-linear skb with
skb_frag_dma_map().

On the other side, I haven't been able to reproduce this on my boards. I
did try to put a hack to hold most lowmem pages, but it didn't make a
difference. (In fact, I haven't been able to clearly see how the pages
for the skbuff are allocated from high memory.)

Russell, would you share any hints about your setup? I don't have access
to any Dove boards at the moment, but I do have Kirkwoods, Armadas and
i.MX6.

Thanks a lot for your report and help!
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-17 21:18     ` Ezequiel Garcia
@ 2014-12-18  0:03       ` Russell King - ARM Linux
  2014-12-18 13:13         ` Ezequiel Garcia
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-12-18  0:03 UTC (permalink / raw)
  To: Ezequiel Garcia
  Cc: David Miller, Nimrod Andy, Fabio Estevam, netdev, fugang.duan

On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
> On the other side, I haven't been able to reproduce this on my boards. I
> did try to put a hack to hold most lowmem pages, but it didn't make a
> difference. (In fact, I haven't been able to clearly see how the pages
> for the skbuff are allocated from high memory.)

To be honest, I don't know either.  All that I can do is describe what
happened...

I've been running 3.17 since a week after it came out, and never saw a
problem there.

Then I moved forward to 3.18, and ended up with memory corruption, which
seemed to be the GPU scribbling over kernel text (since the oops revealed
pixel values in the Code: line.)

I thought it was a GPU problem - which seemed a reasonable assumption as
I know that the runtime PM I implemented for the GPU doesn't properly
restore the hardware state yet.  So, I rebooted back into 3.18, this
time with all GPU users disabled, intending to download a kernel with
GPU runtime PM disabled.  I'd also added additional debug to my X DDX
driver which logged the GPU command stream to a file on a NFS mount -
this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
time it submits a block of commands.

However, while my scripts to download the built kernel to the Cubox
were running, the kernel oopsed in the depths of dma_map_single() - the
kernel was trying to access a struct page for phys address 0x40000000,
which didn't exist.  I decided to go back to 3.17 to get the updated
kernel on it, hoping that would sort it out.

With the updated 3.18 kernel (with GPU runtime PM disabled), I found
that I'd still get oopses in from the network driver while X was starting
up, again from dma_map_single().  So, with all GPU users again disabled,
I set about debugging the this issue.

I added a BUG_ON(!addr) after the page_address(), and that fired.  I
added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
(Each time, I had to boot back to 3.17 in order to download the new
kernel, because very time I tried with 3.18, I'd hit this bug.)

It's then when I reported the issue and asked the questions...

I've since done a simple change, taking advantage that on ARM (or any
asm-generic/dma-mapping-common.h user), dma_unmap_single() and
dma_unmap_page() are the same function:

diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
index d44560d1d268..c343ab03ab8b 100644
--- a/drivers/net/ethernet/marvell/mv643xx_eth.c
+++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
@@ -879,10 +879,8 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
                skb_frag_t *this_frag;
                int tx_index;
                struct tx_desc *desc;
-               void *addr;

                this_frag = &skb_shinfo(skb)->frags[frag];
-               addr = page_address(this_frag->page.p) + this_frag->page_offset;                tx_index = txq->tx_curr_desc++;
                if (txq->tx_curr_desc == txq->tx_ring_size)
                        txq->tx_curr_desc = 0;
@@ -902,8 +900,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)

                desc->l4i_chk = 0;
                desc->byte_cnt = skb_frag_size(this_frag);
-               desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
-                                              desc->byte_cnt, DMA_TO_DEVICE);
+               desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
+                                                this_frag, 0,
+                                                desc->byte_cnt, DMA_TO_DEVICE);        }
 }

I've been running that for the last five days, and I've yet to see
/any/ issues what so ever, and that includes running with the GPU
logging all that time:

-rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin

During that time, I've been using the device over the network, running
various git commands, running builds, running the occasional build
via NFS, etc.

So, for me it was trivially easy to reproduce (without my fix in place)
and all problems have gone away when I've fixed the apparent problem.

However, exactly how it occurs, I don't know.  My understanding from
reading the various feature flags was that NETIF_F_HIGHDMA was required
for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
be seeing highmem fragments - which is why I asked the question in my
original email.

If you want me to revert my fix above, and reproduce again, I can
certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
in there, but I seem to remember that it wasn't particularly useful as
the backtrace didn't show where the memory actually came from.

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-18  0:03       ` Russell King - ARM Linux
@ 2014-12-18 13:13         ` Ezequiel Garcia
  2014-12-21 16:51           ` Russell King - ARM Linux
  0 siblings, 1 reply; 17+ messages in thread
From: Ezequiel Garcia @ 2014-12-18 13:13 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: David Miller, Nimrod Andy, Fabio Estevam, netdev, fugang.duan

On 12/17/2014 09:03 PM, Russell King - ARM Linux wrote:
> On Wed, Dec 17, 2014 at 06:18:58PM -0300, Ezequiel Garcia wrote:
>> On the other side, I haven't been able to reproduce this on my boards. I
>> did try to put a hack to hold most lowmem pages, but it didn't make a
>> difference. (In fact, I haven't been able to clearly see how the pages
>> for the skbuff are allocated from high memory.)
> 
> To be honest, I don't know either.  All that I can do is describe what
> happened...
> 
> I've been running 3.17 since a week after it came out, and never saw a
> problem there.
> 
> Then I moved forward to 3.18, and ended up with memory corruption, which
> seemed to be the GPU scribbling over kernel text (since the oops revealed
> pixel values in the Code: line.)
> 
> I thought it was a GPU problem - which seemed a reasonable assumption as
> I know that the runtime PM I implemented for the GPU doesn't properly
> restore the hardware state yet.  So, I rebooted back into 3.18, this
> time with all GPU users disabled, intending to download a kernel with
> GPU runtime PM disabled.  I'd also added additional debug to my X DDX
> driver which logged the GPU command stream to a file on a NFS mount -
> this does open(, O_CREAT|O_WRONLY|O_APPEND), write(), close() each
> time it submits a block of commands.
> 
> However, while my scripts to download the built kernel to the Cubox
> were running, the kernel oopsed in the depths of dma_map_single() - the
> kernel was trying to access a struct page for phys address 0x40000000,
> which didn't exist.  I decided to go back to 3.17 to get the updated
> kernel on it, hoping that would sort it out.
> 
> With the updated 3.18 kernel (with GPU runtime PM disabled), I found
> that I'd still get oopses in from the network driver while X was starting
> up, again from dma_map_single().  So, with all GPU users again disabled,
> I set about debugging the this issue.
> 
> I added a BUG_ON(!addr) after the page_address(), and that fired.  I
> added a BUG_ON(PageHighMem(this_frag->page.p)) and that fired too.
> (Each time, I had to boot back to 3.17 in order to download the new
> kernel, because very time I tried with 3.18, I'd hit this bug.)
> 
> It's then when I reported the issue and asked the questions...
> 
> I've since done a simple change, taking advantage that on ARM (or any
> asm-generic/dma-mapping-common.h user), dma_unmap_single() and
> dma_unmap_page() are the same function:
> 
> diff --git a/drivers/net/ethernet/marvell/mv643xx_eth.c b/drivers/net/ethernet/marvell/mv643xx_eth.c
> index d44560d1d268..c343ab03ab8b 100644
> --- a/drivers/net/ethernet/marvell/mv643xx_eth.c
> +++ b/drivers/net/ethernet/marvell/mv643xx_eth.c
> @@ -879,10 +879,8 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
>                 skb_frag_t *this_frag;
>                 int tx_index;
>                 struct tx_desc *desc;
> -               void *addr;
> 
>                 this_frag = &skb_shinfo(skb)->frags[frag];
> -               addr = page_address(this_frag->page.p) + this_frag->page_offset;                tx_index = txq->tx_curr_desc++;
>                 if (txq->tx_curr_desc == txq->tx_ring_size)
>                         txq->tx_curr_desc = 0;
> @@ -902,8 +900,9 @@ static void txq_submit_frag_skb(struct tx_queue *txq, struct sk_buff *skb)
> 
>                 desc->l4i_chk = 0;
>                 desc->byte_cnt = skb_frag_size(this_frag);
> -               desc->buf_ptr = dma_map_single(mp->dev->dev.parent, addr,
> -                                              desc->byte_cnt, DMA_TO_DEVICE);
> +               desc->buf_ptr = skb_frag_dma_map(mp->dev->dev.parent,
> +                                                this_frag, 0,
> +                                                desc->byte_cnt, DMA_TO_DEVICE);        }
>  }
> 
> 
> I've been running that for the last five days, and I've yet to see
> /any/ issues what so ever, and that includes running with the GPU
> logging all that time:
> 
> -rw-r--r-- 1 root root 17113616 Dec 17 23:52 /shared/etnaviv.bin
> 
> During that time, I've been using the device over the network, running
> various git commands, running builds, running the occasional build
> via NFS, etc.
> 
> So, for me it was trivially easy to reproduce (without my fix in place)
> and all problems have gone away when I've fixed the apparent problem.
> 

Well that's interesting. You've fixed only the non-TSO egress path,
yet your original ethtool output showed tcp-segmentation-offload enabled.

This seems to imply the highmem pages are found only for the non-TSO path.

> However, exactly how it occurs, I don't know.  My understanding from
> reading the various feature flags was that NETIF_F_HIGHDMA was required
> for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
> be seeing highmem fragments - which is why I asked the question in my
> original email.
> 
> If you want me to revert my fix above, and reproduce again, I can
> certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
> in there, but I seem to remember that it wasn't particularly useful as
> the backtrace didn't show where the memory actually came from.
> 

No, that's OK. Thanks a lot for all the details. I'll try to come up with a
fix soon.
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-18 13:13         ` Ezequiel Garcia
@ 2014-12-21 16:51           ` Russell King - ARM Linux
  2015-01-20 13:41             ` Russell King - ARM Linux
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2014-12-21 16:51 UTC (permalink / raw)
  To: Ezequiel Garcia, David Miller
  Cc: Nimrod Andy, Fabio Estevam, netdev, fugang.duan

On Thu, Dec 18, 2014 at 10:13:19AM -0300, Ezequiel Garcia wrote:
> On 12/17/2014 09:03 PM, Russell King - ARM Linux wrote:
> > However, exactly how it occurs, I don't know.  My understanding from
> > reading the various feature flags was that NETIF_F_HIGHDMA was required
> > for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
> > be seeing highmem fragments - which is why I asked the question in my
> > original email.
> > 
> > If you want me to revert my fix above, and reproduce again, I can
> > certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
> > in there, but I seem to remember that it wasn't particularly useful as
> > the backtrace didn't show where the memory actually came from.
> > 
> 
> No, that's OK. Thanks a lot for all the details. I'll try to come up with a
> fix soon.

Well, I decided to add the WARN_ON_ONCE() and re-test.  This I provoked
by touching etna_viv/src/etnaviv/etna_bo.c, and re-running make (etnaviv
is on a shared NFS mount.)

WARNING: CPU: 0 PID: 0 at /home/rmk/git/linux-cubox/drivers/net/ethernet/marvell/mv643xx_eth.c:884 mv643xx_eth_xmit+0x850/0x8dc()
Modules linked in: bnep rfcomm bluetooth nfsd exportfs ext3 jbd ext2 etnaviv(C)
snd_soc_spdif_tx orion_wdt snd_soc_kirkwood dove vmeta bmm_dmabuf hwmon snd_soc_kirkwood_spdif
CPU: 0 PID: 0 Comm: swapper Tainted: G         C     3.18.0+ #1056
Backtrace:
[<c0011f54>] (dump_backtrace) from [<c0012228>] (show_stack+0x18/0x1c)
 r6:00000374 r5:00000009 r4:00000000 r3:00000000
[<c0012210>] (show_stack) from [<c04992d8>] (dump_stack+0x20/0x28)
[<c04992b8>] (dump_stack) from [<c0050be4>] (warn_slowpath_common+0x6c/0x8c)
[<c0050b78>] (warn_slowpath_common) from [<c0050c28>] (warn_slowpath_null+0x24/0x2c)
 r8:c064ea80 r7:e8a5d880 r6:d00d0d70 r5:e614877c r4:00000001
[<c0050c04>] (warn_slowpath_null) from [<c02fee9c>] (mv643xx_eth_xmit+0x850/0x8dc)
[<c02fe64c>] (mv643xx_eth_xmit) from [<c03b94fc>] (dev_hard_start_xmit+0x19c/0x328)
 r10:c0648054 r9:d0261f60 r8:e6148000 r7:d00ec1c0 r6:c0648040 r5:e623ec00
 r4:d013c580
[<c03b9360>] (dev_hard_start_xmit) from [<c03d2728>] (sch_direct_xmit+0x148/0x24c)
 r10:e623ec00 r9:e63a4580 r8:e6148000 r7:e61f4e00 r6:c063e000 r5:00000000
 r4:00000000
[<c03d25e0>] (sch_direct_xmit) from [<c03b985c>] (__dev_queue_xmit+0x1d4/0x590)
 r10:e623ec00 r9:00000000 r8:00000000 r7:e6148000 r6:c063e000 r5:e61f4e00
 r4:d0163e70
[<c03b9688>] (__dev_queue_xmit) from [<c03b9c40>] (dev_queue_xmit+0x14/0x18)
 r10:c063e000 r9:00000000 r8:00000000 r7:d0163e70 r6:0000000e r5:d00caa00
 r4:d00caa94
[<c03b9c2c>] (dev_queue_xmit) from [<c043c58c>] (ip6_finish_output2+0x1a0/0x524)[<c043c3ec>] (ip6_finish_output2) from [<c043e008>] (ip6_output+0xb4/0x174)
 r10:d0163e70 r9:c063e000 r8:c066f678 r7:00000000 r6:d0163e70 r5:00000000
 r4:000021c0
[<c043df54>] (ip6_output) from [<c043c114>] (ip6_xmit+0x278/0x550)
 r7:00000000 r6:00000001 r5:00000000 r4:001463b6
[<c043be9c>] (ip6_xmit) from [<c0466fbc>] (inet6_csk_xmit+0x74/0xa8)
 r10:d0163e70 r9:00000020 r8:d00d2080 r7:d00d25a0 r6:d0163e70 r5:00000000
 r4:d00d2080
[<c0466f48>] (inet6_csk_xmit) from [<c03f87ac>] (tcp_transmit_skb+0x494/0x990)
 r7:e6094100 r6:c0658910 r5:00000020 r4:ffff5165
[<c03f8318>] (tcp_transmit_skb) from [<c03f9970>] (tcp_write_xmit+0x138/0xc1c)
 r10:00002178 r9:00000000 r8:00002ca0 r7:00000006 r6:00000594 r5:d0163dc0
 r4:d00d2080
[<c03f9838>] (tcp_write_xmit) from [<c03fa4d4>] (__tcp_push_pending_frames+0x38/0x98)
 r10:00000002 r9:00000078 r8:d03d7600 r7:d00d25a0 r6:d02841c0 r5:e448f778
 r4:d00d2080
[<c03fa49c>] (__tcp_push_pending_frames) from [<c03f567c>] (tcp_rcv_established+0x15c/0x600)
 r4:d00d2080
[<c03f5520>] (tcp_rcv_established) from [<c0461918>] (tcp_v6_do_rcv+0x2bc/0x46c) r10:00000002 r9:00000078 r8:d03d7600 r7:d00d25a0 r6:00000000 r5:d00d2080
 r4:d02841c0
[<c046165c>] (tcp_v6_do_rcv) from [<c0462828>] (tcp_v6_rcv+0x7f8/0x810)
 r8:00000000 r7:d00d2080 r6:c063e000 r5:c066f678 r4:d02841c0
[<c0462030>] (tcp_v6_rcv) from [<c043e750>] (ip6_input+0xec/0x424)
 r10:c066f678 r9:c052a75c r8:d02841c0 r7:c0649720 r6:00000006 r5:e6168400
 r4:00000006
[<c043e664>] (ip6_input) from [<c043e100>] (ip6_rcv_finish+0x38/0xa4)
 r10:e6168400 r9:e6148000 r8:d02841c0 r7:00000001 r6:c066f678 r5:00000000
 r4:d02841c0 r3:c043e664
[<c043e0c8>] (ip6_rcv_finish) from [<c043e494>] (ipv6_rcv+0x328/0x4f8)
 r4:e448f750 r3:00000000
[<c043e16c>] (ipv6_rcv) from [<c03b43c0>] (__netif_receive_skb_core+0x2fc/0x5d0) r10:d02841c0 r9:c0649050 r8:c06480c4 r7:e6148000 r6:00000000 r5:0000dd86
 r4:c043e16c
[<c03b40c4>] (__netif_receive_skb_core) from [<c03b6bac>] (__netif_receive_skb+0x2c/0x88)
 r10:00000001 r9:d02841c0 r8:e61484e0 r7:e8a5b310 r6:2cc7fffe r5:00000003
 r4:d02841c0
[<c03b6b80>] (__netif_receive_skb) from [<c03b6d30>] (netif_receive_skb_internal+0x2c/0x68)
 r5:00000003 r4:d02841c0
[<c03b6d04>] (netif_receive_skb_internal) from [<c03b7690>] (napi_gro_receive+0x7c/0xa8)
 r4:d02841c0
[<c03b7614>] (napi_gro_receive) from [<c0300738>] (mv643xx_eth_poll+0x58c/0x6ac) r5:e6148000 r4:e614864c
[<c03001ac>] (mv643xx_eth_poll) from [<c03b7370>] (net_rx_action+0xa4/0x1a8)
 r10:c0658910 r9:c0675940 r8:c0675940 r7:0000012c r6:00000040 r5:e61485c0
 r4:c03001ac
[<c03b72cc>] (net_rx_action) from [<c00533c0>] (__do_softirq+0xf0/0x214)
 r10:00000003 r9:00000101 r8:c063e000 r7:00000003 r6:c0677480 r5:c067748c
 r4:00000000
[<c00532d0>] (__do_softirq) from [<c005377c>] (irq_exit+0xac/0xfc)
 r10:e6ffcd40 r9:560f5815 r8:00000000 r7:0000001e r6:00000000 r5:00000000
 r4:c063e000
[<c00536d0>] (irq_exit) from [<c0085330>] (__handle_domain_irq+0x7c/0xc0)
 r4:c065ef68 r3:00010001
[<c00852b4>] (__handle_domain_irq) from [<c000fb0c>] (handle_IRQ+0x24/0x28)
 r8:00000001 r7:c063ff74 r6:ffffffff r5:600f0013 r4:c0077408 r3:c063ff40
[<c000fae8>] (handle_IRQ) from [<c0008600>] (dove_legacy_handle_irq+0x34/0x5c)
[<c00085cc>] (dove_legacy_handle_irq) from [<c0012ce0>] (__irq_svc+0x40/0x74)
Exception stack(0xc063ff40 to 0xc063ff88)
ff40: 00000000 0001114c c0658324 c001d940 c063e000 c06470c8 c06759fe c06759fe
ff60: 00000001 560f5815 e6ffcd40 c063ff9c c063ff88 c063ff88 c000fc38 c0077408
ff80: 600f0013 ffffffff
[<c0077354>] (cpu_startup_entry) from [<c04952c4>] (rest_init+0x78/0x90)
 r7:ffffffff r3:c04b0554
[<c049524c>] (rest_init) from [<c0609ca8>] (start_kernel+0x34c/0x3b0)
 r4:c06479b0 r3:00000001
[<c060995c>] (start_kernel) from [<00008070>] (0x8070)

Obviously, the useful bit is only down to just above
__tcp_push_pending_frames(), since that's traces back to the TCP socket's
send queue.

If I had to guess where the highmem pages were coming from, I'd suggest
that it was via the page cache and NFS - maybe something like GCC opens
a new NFS file, writes to it.  Pages are allocated to it, which happen
to be allocated from highmem.  NFS eventually sends these pages to the
NFS server via TCP, which fragments the page and leaves pointers to the
highmem page in the TCP skbuff.  My NFS mounts are IPv6.

For Freescale iMX6 FEC, I haven't yet been able to reproduce this there.
The FEC has tx-checksum-ipv6 and rx-vlan-offload enabled, which the
mv643xxx driver doesn't have.  The FEC platform has twice the memory of
the Dove platform (so has much more highmem) so I would've thought it
would have been easier to reproduce there.  Slightly different kernels
too - Dove runs 3.18 plus additions, iMX6 is running Linus' tip from
two days ago plus very similar additions (but same GPU and X server
code.)

Other stuff... Dove memory is 0 - 0x3fffffff.  mv643xxx DMA mask
uninitialised (coherent DMA mask set to 0xffffffff).

iMX6 is 0x10000000 - 0x8fffffff, with the DMA mask defaulting to
point at the coherent mask (due to being DT based) which is
0xffffffff.

Here's the diff of the ethtool -k output:

--- eth0.dove       2014-12-21 16:38:39.000000000 +0000
+++ eth0.imx6       2014-12-21 16:38:50.792453703 +0000
@@ -3,7 +3,7 @@
 tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
-       tx-checksum-ipv6: off [fixed]
+       tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
 scatter-gather: on
@@ -17,7 +17,7 @@
 generic-segmentation-offload: on
 generic-receive-offload: on
 large-receive-offload: off [fixed]
-rx-vlan-offload: off [fixed]
+rx-vlan-offload: on
 tx-vlan-offload: off [fixed]
 ntuple-filters: off [fixed]
 receive-hashing: off [fixed]
@@ -32,7 +32,6 @@
 tx-ipip-segmentation: off [fixed]
 tx-sit-segmentation: off [fixed]
 tx-udp_tnl-segmentation: off [fixed]
-tx-mpls-segmentation: off [fixed]
 fcoe-mtu: off [fixed]
 tx-nocache-copy: off
 loopback: off [fixed]

Hmm.  I'm now wondering about this:

static netdev_features_t harmonize_features(struct sk_buff *skb,
        netdev_features_t features)
{
...
        if (skb->ip_summed != CHECKSUM_NONE &&
            !can_checksum_protocol(features, type)) {
                features &= ~NETIF_F_ALL_CSUM;
        } else if (illegal_highdma(skb->dev, skb)) {
                features &= ~NETIF_F_SG;
        }

For Dove, can_checksum_protocol() would return false for IPv6, which
would allow the first "if" statement to succeed, hence clearing
NETIF_F_ALL_CSUM.

This would prevent the second if() being evaluated - which seems to
remove the check for any fragments in highmem.  David - shouldn't these
two checks be independent?

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2014-12-21 16:51           ` Russell King - ARM Linux
@ 2015-01-20 13:41             ` Russell King - ARM Linux
  2015-01-20 13:42               ` Ezequiel Garcia
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2015-01-20 13:41 UTC (permalink / raw)
  To: Ezequiel Garcia, David Miller, Linus Torvalds
  Cc: Nimrod Andy, Fabio Estevam, netdev, fugang.duan

Ping.  What's happening on this?

I guess we don't care about regressions in the Linux kernel?

This bug breaks one of my machines, and I'm having to patch around it
by reverting the first two hunks of 69ad0dd7af22, which is only correct
for platforms where dma_unmap_page and dma_unmap_single result in the
same underlying code being used.

I'm not convinced that it's directly caused by 69ad0dd7af22 though.

On Sun, Dec 21, 2014 at 04:51:06PM +0000, Russell King - ARM Linux wrote:
> On Thu, Dec 18, 2014 at 10:13:19AM -0300, Ezequiel Garcia wrote:
> > On 12/17/2014 09:03 PM, Russell King - ARM Linux wrote:
> > > However, exactly how it occurs, I don't know.  My understanding from
> > > reading the various feature flags was that NETIF_F_HIGHDMA was required
> > > for highmem (see illegal_highdma()) so as this isn't set, we shouldn't
> > > be seeing highmem fragments - which is why I asked the question in my
> > > original email.
> > > 
> > > If you want me to revert my fix above, and reproduce again, I can
> > > certainly try that - or put a WARN_ON_ONCE(PageHighMem(this_frag->page.p))
> > > in there, but I seem to remember that it wasn't particularly useful as
> > > the backtrace didn't show where the memory actually came from.
> > > 
> > 
> > No, that's OK. Thanks a lot for all the details. I'll try to come up with a
> > fix soon.
> 
> Well, I decided to add the WARN_ON_ONCE() and re-test.  This I provoked
> by touching etna_viv/src/etnaviv/etna_bo.c, and re-running make (etnaviv
> is on a shared NFS mount.)
> 
> WARNING: CPU: 0 PID: 0 at /home/rmk/git/linux-cubox/drivers/net/ethernet/marvell/mv643xx_eth.c:884 mv643xx_eth_xmit+0x850/0x8dc()
> Modules linked in: bnep rfcomm bluetooth nfsd exportfs ext3 jbd ext2 etnaviv(C)
> snd_soc_spdif_tx orion_wdt snd_soc_kirkwood dove vmeta bmm_dmabuf hwmon snd_soc_kirkwood_spdif
> CPU: 0 PID: 0 Comm: swapper Tainted: G         C     3.18.0+ #1056
> Backtrace:
> [<c0011f54>] (dump_backtrace) from [<c0012228>] (show_stack+0x18/0x1c)
>  r6:00000374 r5:00000009 r4:00000000 r3:00000000
> [<c0012210>] (show_stack) from [<c04992d8>] (dump_stack+0x20/0x28)
> [<c04992b8>] (dump_stack) from [<c0050be4>] (warn_slowpath_common+0x6c/0x8c)
> [<c0050b78>] (warn_slowpath_common) from [<c0050c28>] (warn_slowpath_null+0x24/0x2c)
>  r8:c064ea80 r7:e8a5d880 r6:d00d0d70 r5:e614877c r4:00000001
> [<c0050c04>] (warn_slowpath_null) from [<c02fee9c>] (mv643xx_eth_xmit+0x850/0x8dc)
> [<c02fe64c>] (mv643xx_eth_xmit) from [<c03b94fc>] (dev_hard_start_xmit+0x19c/0x328)
>  r10:c0648054 r9:d0261f60 r8:e6148000 r7:d00ec1c0 r6:c0648040 r5:e623ec00
>  r4:d013c580
> [<c03b9360>] (dev_hard_start_xmit) from [<c03d2728>] (sch_direct_xmit+0x148/0x24c)
>  r10:e623ec00 r9:e63a4580 r8:e6148000 r7:e61f4e00 r6:c063e000 r5:00000000
>  r4:00000000
> [<c03d25e0>] (sch_direct_xmit) from [<c03b985c>] (__dev_queue_xmit+0x1d4/0x590)
>  r10:e623ec00 r9:00000000 r8:00000000 r7:e6148000 r6:c063e000 r5:e61f4e00
>  r4:d0163e70
> [<c03b9688>] (__dev_queue_xmit) from [<c03b9c40>] (dev_queue_xmit+0x14/0x18)
>  r10:c063e000 r9:00000000 r8:00000000 r7:d0163e70 r6:0000000e r5:d00caa00
>  r4:d00caa94
> [<c03b9c2c>] (dev_queue_xmit) from [<c043c58c>] (ip6_finish_output2+0x1a0/0x524)[<c043c3ec>] (ip6_finish_output2) from [<c043e008>] (ip6_output+0xb4/0x174)
>  r10:d0163e70 r9:c063e000 r8:c066f678 r7:00000000 r6:d0163e70 r5:00000000
>  r4:000021c0
> [<c043df54>] (ip6_output) from [<c043c114>] (ip6_xmit+0x278/0x550)
>  r7:00000000 r6:00000001 r5:00000000 r4:001463b6
> [<c043be9c>] (ip6_xmit) from [<c0466fbc>] (inet6_csk_xmit+0x74/0xa8)
>  r10:d0163e70 r9:00000020 r8:d00d2080 r7:d00d25a0 r6:d0163e70 r5:00000000
>  r4:d00d2080
> [<c0466f48>] (inet6_csk_xmit) from [<c03f87ac>] (tcp_transmit_skb+0x494/0x990)
>  r7:e6094100 r6:c0658910 r5:00000020 r4:ffff5165
> [<c03f8318>] (tcp_transmit_skb) from [<c03f9970>] (tcp_write_xmit+0x138/0xc1c)
>  r10:00002178 r9:00000000 r8:00002ca0 r7:00000006 r6:00000594 r5:d0163dc0
>  r4:d00d2080
> [<c03f9838>] (tcp_write_xmit) from [<c03fa4d4>] (__tcp_push_pending_frames+0x38/0x98)
>  r10:00000002 r9:00000078 r8:d03d7600 r7:d00d25a0 r6:d02841c0 r5:e448f778
>  r4:d00d2080
> [<c03fa49c>] (__tcp_push_pending_frames) from [<c03f567c>] (tcp_rcv_established+0x15c/0x600)
>  r4:d00d2080
> [<c03f5520>] (tcp_rcv_established) from [<c0461918>] (tcp_v6_do_rcv+0x2bc/0x46c) r10:00000002 r9:00000078 r8:d03d7600 r7:d00d25a0 r6:00000000 r5:d00d2080
>  r4:d02841c0
> [<c046165c>] (tcp_v6_do_rcv) from [<c0462828>] (tcp_v6_rcv+0x7f8/0x810)
>  r8:00000000 r7:d00d2080 r6:c063e000 r5:c066f678 r4:d02841c0
> [<c0462030>] (tcp_v6_rcv) from [<c043e750>] (ip6_input+0xec/0x424)
>  r10:c066f678 r9:c052a75c r8:d02841c0 r7:c0649720 r6:00000006 r5:e6168400
>  r4:00000006
> [<c043e664>] (ip6_input) from [<c043e100>] (ip6_rcv_finish+0x38/0xa4)
>  r10:e6168400 r9:e6148000 r8:d02841c0 r7:00000001 r6:c066f678 r5:00000000
>  r4:d02841c0 r3:c043e664
> [<c043e0c8>] (ip6_rcv_finish) from [<c043e494>] (ipv6_rcv+0x328/0x4f8)
>  r4:e448f750 r3:00000000
> [<c043e16c>] (ipv6_rcv) from [<c03b43c0>] (__netif_receive_skb_core+0x2fc/0x5d0) r10:d02841c0 r9:c0649050 r8:c06480c4 r7:e6148000 r6:00000000 r5:0000dd86
>  r4:c043e16c
> [<c03b40c4>] (__netif_receive_skb_core) from [<c03b6bac>] (__netif_receive_skb+0x2c/0x88)
>  r10:00000001 r9:d02841c0 r8:e61484e0 r7:e8a5b310 r6:2cc7fffe r5:00000003
>  r4:d02841c0
> [<c03b6b80>] (__netif_receive_skb) from [<c03b6d30>] (netif_receive_skb_internal+0x2c/0x68)
>  r5:00000003 r4:d02841c0
> [<c03b6d04>] (netif_receive_skb_internal) from [<c03b7690>] (napi_gro_receive+0x7c/0xa8)
>  r4:d02841c0
> [<c03b7614>] (napi_gro_receive) from [<c0300738>] (mv643xx_eth_poll+0x58c/0x6ac) r5:e6148000 r4:e614864c
> [<c03001ac>] (mv643xx_eth_poll) from [<c03b7370>] (net_rx_action+0xa4/0x1a8)
>  r10:c0658910 r9:c0675940 r8:c0675940 r7:0000012c r6:00000040 r5:e61485c0
>  r4:c03001ac
> [<c03b72cc>] (net_rx_action) from [<c00533c0>] (__do_softirq+0xf0/0x214)
>  r10:00000003 r9:00000101 r8:c063e000 r7:00000003 r6:c0677480 r5:c067748c
>  r4:00000000
> [<c00532d0>] (__do_softirq) from [<c005377c>] (irq_exit+0xac/0xfc)
>  r10:e6ffcd40 r9:560f5815 r8:00000000 r7:0000001e r6:00000000 r5:00000000
>  r4:c063e000
> [<c00536d0>] (irq_exit) from [<c0085330>] (__handle_domain_irq+0x7c/0xc0)
>  r4:c065ef68 r3:00010001
> [<c00852b4>] (__handle_domain_irq) from [<c000fb0c>] (handle_IRQ+0x24/0x28)
>  r8:00000001 r7:c063ff74 r6:ffffffff r5:600f0013 r4:c0077408 r3:c063ff40
> [<c000fae8>] (handle_IRQ) from [<c0008600>] (dove_legacy_handle_irq+0x34/0x5c)
> [<c00085cc>] (dove_legacy_handle_irq) from [<c0012ce0>] (__irq_svc+0x40/0x74)
> Exception stack(0xc063ff40 to 0xc063ff88)
> ff40: 00000000 0001114c c0658324 c001d940 c063e000 c06470c8 c06759fe c06759fe
> ff60: 00000001 560f5815 e6ffcd40 c063ff9c c063ff88 c063ff88 c000fc38 c0077408
> ff80: 600f0013 ffffffff
> [<c0077354>] (cpu_startup_entry) from [<c04952c4>] (rest_init+0x78/0x90)
>  r7:ffffffff r3:c04b0554
> [<c049524c>] (rest_init) from [<c0609ca8>] (start_kernel+0x34c/0x3b0)
>  r4:c06479b0 r3:00000001
> [<c060995c>] (start_kernel) from [<00008070>] (0x8070)
> 
> Obviously, the useful bit is only down to just above
> __tcp_push_pending_frames(), since that's traces back to the TCP socket's
> send queue.
> 
> If I had to guess where the highmem pages were coming from, I'd suggest
> that it was via the page cache and NFS - maybe something like GCC opens
> a new NFS file, writes to it.  Pages are allocated to it, which happen
> to be allocated from highmem.  NFS eventually sends these pages to the
> NFS server via TCP, which fragments the page and leaves pointers to the
> highmem page in the TCP skbuff.  My NFS mounts are IPv6.
> 
> For Freescale iMX6 FEC, I haven't yet been able to reproduce this there.
> The FEC has tx-checksum-ipv6 and rx-vlan-offload enabled, which the
> mv643xxx driver doesn't have.  The FEC platform has twice the memory of
> the Dove platform (so has much more highmem) so I would've thought it
> would have been easier to reproduce there.  Slightly different kernels
> too - Dove runs 3.18 plus additions, iMX6 is running Linus' tip from
> two days ago plus very similar additions (but same GPU and X server
> code.)
> 
> Other stuff... Dove memory is 0 - 0x3fffffff.  mv643xxx DMA mask
> uninitialised (coherent DMA mask set to 0xffffffff).
> 
> iMX6 is 0x10000000 - 0x8fffffff, with the DMA mask defaulting to
> point at the coherent mask (due to being DT based) which is
> 0xffffffff.
> 
> Here's the diff of the ethtool -k output:
> 
> --- eth0.dove       2014-12-21 16:38:39.000000000 +0000
> +++ eth0.imx6       2014-12-21 16:38:50.792453703 +0000
> @@ -3,7 +3,7 @@
>  tx-checksumming: on
>         tx-checksum-ipv4: on
>         tx-checksum-ip-generic: off [fixed]
> -       tx-checksum-ipv6: off [fixed]
> +       tx-checksum-ipv6: on
>         tx-checksum-fcoe-crc: off [fixed]
>         tx-checksum-sctp: off [fixed]
>  scatter-gather: on
> @@ -17,7 +17,7 @@
>  generic-segmentation-offload: on
>  generic-receive-offload: on
>  large-receive-offload: off [fixed]
> -rx-vlan-offload: off [fixed]
> +rx-vlan-offload: on
>  tx-vlan-offload: off [fixed]
>  ntuple-filters: off [fixed]
>  receive-hashing: off [fixed]
> @@ -32,7 +32,6 @@
>  tx-ipip-segmentation: off [fixed]
>  tx-sit-segmentation: off [fixed]
>  tx-udp_tnl-segmentation: off [fixed]
> -tx-mpls-segmentation: off [fixed]
>  fcoe-mtu: off [fixed]
>  tx-nocache-copy: off
>  loopback: off [fixed]
> 
> Hmm.  I'm now wondering about this:
> 
> static netdev_features_t harmonize_features(struct sk_buff *skb,
>         netdev_features_t features)
> {
> ...
>         if (skb->ip_summed != CHECKSUM_NONE &&
>             !can_checksum_protocol(features, type)) {
>                 features &= ~NETIF_F_ALL_CSUM;
>         } else if (illegal_highdma(skb->dev, skb)) {
>                 features &= ~NETIF_F_SG;
>         }
> 
> For Dove, can_checksum_protocol() would return false for IPv6, which
> would allow the first "if" statement to succeed, hence clearing
> NETIF_F_ALL_CSUM.
> 
> This would prevent the second if() being evaluated - which seems to
> remove the check for any fragments in highmem.  David - shouldn't these
> two checks be independent?
> 
> -- 
> FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
> according to speedtest.net.

-- 
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Bug: mv643xxx fails with highmem
  2015-01-20 13:41             ` Russell King - ARM Linux
@ 2015-01-20 13:42               ` Ezequiel Garcia
  0 siblings, 0 replies; 17+ messages in thread
From: Ezequiel Garcia @ 2015-01-20 13:42 UTC (permalink / raw)
  To: Russell King - ARM Linux, David Miller, Linus Torvalds
  Cc: Nimrod Andy, Fabio Estevam, netdev, fugang.duan

On 01/20/2015 10:41 AM, Russell King - ARM Linux wrote:
> Ping.  What's happening on this?
> 
> I guess we don't care about regressions in the Linux kernel?
> 

Sorry, a holiday trip got in the middle of this.

I'll try to submit a fix this week.
-- 
Ezequiel García, Free Electrons
Embedded Linux, Kernel and Android Engineering
http://free-electrons.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-01-20 13:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-12-11 19:49 Bug: mv643xxx fails with highmem Russell King - ARM Linux
2014-12-11 20:10 ` David Miller
2014-12-11 20:12   ` Ezequiel Garcia
2014-12-11 20:25   ` Russell King - ARM Linux
2014-12-11 20:25     ` Ezequiel Garcia
2014-12-11 20:27     ` David Miller
2014-12-12  5:34       ` fugang.duan
2014-12-15 18:04         ` Russell King - ARM Linux
2014-12-16  2:19           ` fugang.duan
2014-12-16 10:57             ` Russell King - ARM Linux
2014-12-18  2:29               ` fugang.duan
2014-12-17 21:18     ` Ezequiel Garcia
2014-12-18  0:03       ` Russell King - ARM Linux
2014-12-18 13:13         ` Ezequiel Garcia
2014-12-21 16:51           ` Russell King - ARM Linux
2015-01-20 13:41             ` Russell King - ARM Linux
2015-01-20 13:42               ` Ezequiel Garcia

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).