Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures


* Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures
@ 2009-08-12 22:09 Jonathan Haws
  2009-08-12 22:37 ` Chris Friesen
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Haws @ 2009-08-12 22:09 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

All,

I am having some issues with my target and was hoping that someone could
lend a hand.  I am using an AMCC 405EX (Kilauea) board running Linux
kernel 2.6.31.

Here is the problem.  I have some code that receives jumbo frames via the
EMAC, sticks the data in a buffer, and writes the data out to a
solid-state SATA disk (using a Silicon Image 3531 controller).

What is happening is that I appear to be running out of memory and I
cannot figure out why.  The closest thing I can tell is that the sil24
driver for the SATA controller does not seem to be releasing memory back
to the kernel for some reason.  After some time of capturing data and
logging it to disk, I get the following kernel dump:

kswapd0: page allocation failure. order:2, mode:0x4020
Call Trace:
[cfaa19a0] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[cfaa19e0] [c006f5e4] __alloc_pages_nodemask+0x38c/0x4f8
[cfaa1a60] [c006f770] __get_free_pages+0x20/0x50
[cfaa1a70] [c00955d4] __kmalloc_track_caller+0xcc/0xf0
[cfaa1a90] [c01c437c] __alloc_skb+0x60/0x140
[cfaa1ab0] [c01a319c] emac_poll_rx+0x46c/0x7e4
[cfaa1af0] [c019e85c] mal_poll+0xa8/0x1ec
[cfaa1b20] [c01cfddc] net_rx_action+0x9c/0x1b4
[cfaa1b50] [c003b3a8] __do_softirq+0xc4/0x148
[cfaa1b90] [c0004d18] do_softirq+0x78/0x80
[cfaa1ba0] [c003af94] irq_exit+0x64/0x7c
[cfaa1bb0] [c0005210] do_IRQ+0x9c/0xb4
[cfaa1bd0] [c000fa7c] ret_from_except+0x0/0x18
[cfaa1c90] [c0094dc4] kmem_cache_free+0x74/0xcc
[cfaa1cb0] [c00c0570] free_buffer_head+0x38/0x84
[cfaa1cc0] [c00c0b8c] try_to_free_buffers+0x94/0xe0
[cfaa1cf0] [c0067e70] try_to_release_page+0x6c/0x84
[cfaa1d00] [c0075f58] shrink_page_list+0x648/0x818
[cfaa1de0] [c0076620] shrink_zone+0x4f8/0xac4
[cfaa1f00] [c0077294] kswapd+0x4a0/0x4bc
[cfaa1fc0] [c004d6d8] kthread+0x70/0x74
[cfaa1ff0] [c000f220] kernel_thread+0x4c/0x68
Mem-Info:
DMA per-cpu:
CPU    0: hi:   90, btch:  15 usd:  54
Active_anon:5155 active_file:626 inactive_anon:5216
 inactive_file:42474 unevictable:0 dirty:176 writeback:0 unstable:0
 free:631 slab:6416 mapped:324 pagetables:32 bounce:0
DMA free:2524kB min:2036kB low:2544kB high:3052kB active_anon:20620kB inactive_anon:20864kB active_file:2504kB inactive_file:169896kB unevictable:0kB present:260096kB pages_scanned:64 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 345*4kB 119*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 2524kB
43129 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 0kB
Total swap = 0kB
65536 pages RAM
1397 pages reserved
43434 pages shared
20347 pages non-shared

I am not sure what is causing this.  It only happens when I run both the
network and the SATA disk at the same time.  If I only capture data on
the EMAC, things work just fine (I ran the system overnight, capturing
data at 36 Mbytes/s without even a hiccup).  If I only write data to
disk, things seem to work fine.  But when I combine the two, then things
go crazy.

Here is the loop:

for(;;)
{
	/* Flush once another 9000-byte frame might not fit in the 16 MB buffer */
	if( dataLength + 9000 > 16*1024*1024 )
	{
		write(fd, (char*)&rxBuf[count][0], dataLength);
		fsync(fd);
		wrBytes += dataLength;
		dataLength = 0;

		/* Move on to the next receive buffer */
		count = (count+1)%RXCNT;
	}

	/* Append the next frame to the current buffer */
	bytes = recvfrom(sock.socket,(char*)&rxBuf[count][dataLength],
		MTUSIZE, 0, NULL, NULL);

	rxBytes += bytes;
	dataLength += bytes;

	sched_yield();

} /* for(;;) */

A pretty simple loop to receive the data, place it into a buffer, and
write it to disk when ready.
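
As an aside on the write step: the loop above does not look at the return
value of write(), so a short or failed write would go unnoticed.  A minimal
sketch of a helper that retries until everything is written could look like
the following.  The name write_all is hypothetical and not part of the
program above; this is only an illustration, not a fix for the allocation
failure.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the original program): keep calling write()
 * until len bytes have been written.
 * Returns 0 on success, -1 on error (errno is left as set by write()).
 */
static int write_all(int fd, const char *buf, size_t len)
{
	while (len > 0) {
		ssize_t n = write(fd, buf, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted before writing; retry */
			return -1;
		}
		buf += n;		/* skip what was written */
		len -= (size_t)n;	/* and write the remainder */
	}
	return 0;
}
```

In the loop, the write() call would then become a write_all() call whose
result is checked before wrBytes is updated.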

What is it about the write call that would not release memory?

Any ideas?  Has anyone seen this type of behavior before?

Thanks!

Jonathan


* Re: Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures
  2009-08-12 22:09 Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures Jonathan Haws
@ 2009-08-12 22:37 ` Chris Friesen
  2009-08-18 22:56   ` Jonathan Haws
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2009-08-12 22:37 UTC (permalink / raw)
  To: Jonathan Haws; +Cc: linuxppc-dev@lists.ozlabs.org

Jonathan Haws wrote:
> All,
> 
> I am having some issues with my target and was hoping that someone
> could lend a hand.  I am using an AMCC 405EX (Kilauea) board running
> Linux kernel 2.6.31.
> 
> Here is the problem.  I have some code that receives jumbo frames via
> the EMAC, sticks the data in a buffer, and writes the data out to a
> solid-state SATA disk (using a Silicon Image 3531 controller).
> 
> What is happening is that I appear to be running out of memory and I
> cannot figure out why.  The closest thing I can tell is that the
> sil24 driver for the SATA controller does not seem to be releasing
> memory back to the kernel for some reason.  After some time of
> capturing data and logging it to disk, I get the following kernel
> dump:
> 
> kswapd0: page allocation failure. order:2, mode:0x4020 Call Trace:

I ran into something similar on e1000 a long time ago.  Notice the
"order 2".  That means that you're requesting 16KB of physically
contiguous memory from the kernel, or four physical pages.

It's possible to get into a scenario where there is lots of memory
available but it's all fragmented.  It looks like you have a 64K area
and a 128K area available still so you may not be hitting this, but you
might want to fix it anyways.
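
To make the order arithmetic concrete, here is a small sketch in C.  It
assumes 4 kB pages (as the "345*4kB" column of the dump suggests) and uses
the per-order free block counts copied from the "DMA:" line of the original
report.  The function and array names are made up for this illustration.

```c
#include <stddef.h>

#define PAGE_KB 4UL	/* assumed page size, taken from the "4kB" column */

/* Size in kB of one block of the given order: 2^order pages. */
static unsigned long order_kb(unsigned int order)
{
	return PAGE_KB << order;
}

/* Total free memory in kB described by a per-order free block table. */
static unsigned long free_kb(const unsigned long *nr_free, size_t orders)
{
	unsigned long total = 0;

	for (size_t o = 0; o < orders; o++)
		total += nr_free[o] * order_kb((unsigned int)o);
	return total;
}

/* Number of free blocks that are large enough for a min_order request. */
static unsigned long blocks_at_least(const unsigned long *nr_free,
				     size_t orders, unsigned int min_order)
{
	unsigned long n = 0;

	for (size_t o = min_order; o < orders; o++)
		n += nr_free[o];
	return n;
}

/* From the dump: 345*4kB 119*8kB 0*16kB 0*32kB 1*64kB 1*128kB 0*256kB ... */
static const unsigned long dump_nr_free[] = {
	345, 119, 0, 0, 1, 1, 0, 0, 0, 0, 0
};
```

With these numbers order_kb(2) is 16, free_kb() adds up to the 2524 kB the
dump reports, and only two free blocks (the 64 kB and the 128 kB one) are
big enough for an order-2 request; everything else is in 4 kB and 8 kB
pieces.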

If the hardware supports it, the best way to deal with it is to set up
the driver so that it only ever deals in single pages.  See the page
split code in the current e1000/e1000e for examples of this.

Chris


* RE: Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures
  2009-08-12 22:37 ` Chris Friesen
@ 2009-08-18 22:56   ` Jonathan Haws
  2009-10-26 15:38     ` Jumbo Frame bug in ibm_newemac driver (was Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures) Jonathan Haws
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Haws @ 2009-08-18 22:56 UTC (permalink / raw)
  To: Chris Friesen; +Cc: linuxppc-dev@lists.ozlabs.org

> If the hardware supports it, the best way to deal with it is to set up
> the driver so that it only ever deals in single pages.

I am working on fixing the driver to support NETIF_F_SG and have changed
how it receives packets to follow how the e1000 driver does it.

Here is where I am at:

When I get the first part of the frame, I allocate an skb for the packet.
I call dev->page = alloc_page(GFP_ATOMIC) to allocate a page for the 4080
bytes coming from the MAL.
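
As a rough check on the sizes involved, the sketch below counts how many
MAL buffers a frame needs when each buffer carries at most 4080 bytes, and
checks that one such buffer fits in a single page.  The 4096-byte page size
is an assumption (it matches the 4096 passed to dma_map_page in the source
below), the 9000-byte frame length is the figure used in the capture loop
earlier in the thread, and the macro and function names are illustrative
only.

```c
#define MAL_MAX_RX_BYTES 4080u	/* per-buffer size quoted above */
#define PAGE_BYTES       4096u	/* assumed page size */

/* Number of MAL buffers needed for a frame of frame_len bytes (round up). */
static unsigned int mal_buffers_for(unsigned int frame_len)
{
	return (frame_len + MAL_MAX_RX_BYTES - 1) / MAL_MAX_RX_BYTES;
}

/* Nonzero if a buffer of len bytes fits in one page. */
static int fits_in_one_page(unsigned int len)
{
	return len <= PAGE_BYTES;
}
```

Under these assumptions a 9000-byte frame arrives in three buffers, each of
which fits in one order-0 page.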

I then set up a DMA mapping for that page to get the data out of the MAL
(the original code simply used dma_map_single, but I need a page).

Once the DMA map has been set up and data transferred, I call
skb_fill_page_desc() to put the data into the skb.  I then wrote a
function called emac_consume_page, which unmaps the DMA mapping, frees the
page, and updates the lengths in the skb.
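
The ownership of the page in this sequence is worth spelling out.  The
following is a user-space model only, not kernel code.  It assumes that
attaching a page to an skb fragment does not take an extra reference, and
that freeing the skb later drops one reference per attached fragment.
Every name in it (model_page and so on) is invented for the illustration.

```c
struct model_page { int count; };	/* stand-in for a page reference count */
struct model_skb  { struct model_page *frag[4]; int nr_frags; };

static void model_alloc_page(struct model_page *p) { p->count = 1; }
static void model_put_page(struct model_page *p)   { p->count--; }

/* Attach a page as a fragment WITHOUT taking a reference (assumption). */
static void model_fill_page_desc(struct model_skb *skb, struct model_page *p)
{
	skb->frag[skb->nr_frags++] = p;
}

/* Freeing the skb drops one reference per attached fragment (assumption). */
static void model_free_skb(struct model_skb *skb)
{
	for (int i = 0; i < skb->nr_frags; i++)
		model_put_page(skb->frag[i]);
	skb->nr_frags = 0;
}

/* Sequence as described: attach the page, then the driver frees it. */
static int count_after_driver_free(void)
{
	struct model_page page;
	struct model_skb skb = { .nr_frags = 0 };

	model_alloc_page(&page);		/* count = 1 */
	model_fill_page_desc(&skb, &page);	/* still 1, no reference taken */
	model_put_page(&page);			/* driver frees the page: 0 */
	model_free_skb(&skb);			/* skb freed later: -1 */
	return page.count;
}

/* Alternative: the skb keeps the only reference and the driver frees nothing. */
static int count_when_skb_owns_page(void)
{
	struct model_page page;
	struct model_skb skb = { .nr_frags = 0 };

	model_alloc_page(&page);		/* count = 1 */
	model_fill_page_desc(&skb, &page);
	model_free_skb(&skb);			/* 0, released exactly once */
	return page.count;
}
```

In the model the first sequence ends with a count of -1 and the second with
0.  A negative count is at least consistent with the count:-3 in the BUG
output further down, though the model by itself does not prove that this is
the cause.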

The relevant source code is at the end of this email.

My problem is this:

When I run this code, it appears to create the fragmented packet just
fine, but when it passes it up the stack, the kernel spits out these bugs,
one after another:

BUG: Bad page state in process swapper  pfn:0ee9b
page:c051f360 flags:(null) count:-3 mapcount:0 mapping:(null) index:766
Call Trace:
[c032bc30] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[c032bc70] [c006c438] bad_page+0x94/0x130
[c032bc90] [c006d4a0] get_page_from_freelist+0x458/0x4d4
[c032bd20] [c006d5f4] __alloc_pages_nodemask+0xd8/0x4f8
[c032bda0] [c01a1174] emac_poll_rx+0x300/0x9c8
[c032bdf0] [c019cb64] mal_poll+0xa8/0x1ec
[c032be20] [c01cf218] net_rx_action+0x9c/0x1b4
[c032be50] [c0039678] __do_softirq+0xc4/0x148
[c032be90] [c0004d18] do_softirq+0x78/0x80
[c032bea0] [c0039264] irq_exit+0x64/0x7c
[c032beb0] [c0005210] do_IRQ+0x9c/0xb4
[c032bed0] [c000fa7c] ret_from_except+0x0/0x18
[c032bf90] [c000808c] cpu_idle+0xdc/0xec
[c032bfb0] [c00028fc] rest_init+0x70/0x84
[c032bfc0] [c02e0864] start_kernel+0x240/0x2c4
[c032bff0] [c0002254] start_here+0x44/0xb0
BUG: Bad page state in process swapper  pfn:0ee8c
page:c051f180 flags:(null) count:-3 mapcount:0 mapping:(null) index:757
Call Trace:
[c032bc30] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[c032bc70] [c006c438] bad_page+0x94/0x130
[c032bc90] [c006d4a0] get_page_from_freelist+0x458/0x4d4
[c032bd20] [c006d5f4] __alloc_pages_nodemask+0xd8/0x4f8
[c032bda0] [c01a1174] emac_poll_rx+0x300/0x9c8
[c032bdf0] [c019cb64] mal_poll+0xa8/0x1ec
[c032be20] [c01cf218] net_rx_action+0x9c/0x1b4
[c032be50] [c0039678] __do_softirq+0xc4/0x148
[c032be90] [c0004d18] do_softirq+0x78/0x80
[c032bea0] [c0039264] irq_exit+0x64/0x7c
[c032beb0] [c0005210] do_IRQ+0x9c/0xb4
[c032bed0] [c000fa7c] ret_from_except+0x0/0x18
[c032bf90] [c000808c] cpu_idle+0xdc/0xec
[c032bfb0] [c00028fc] rest_init+0x70/0x84
[c032bfc0] [c02e0864] start_kernel+0x240/0x2c4
[c032bff0] [c0002254] start_here+0x44/0xb0

I know that I am missing something when it comes to allocating the pages
for the fragments, but when I compare my methodology to the e1000 driver,
they appear to be functionally the same?

Any ideas?  I can send the entire source file for the driver if needs be.

Thanks!

Jonathan


Here is the source:

static int emac_poll_rx(void *param, int budget)
{

... /* Other code is here */

push_packet:
	skb->dev = dev->ndev;
	skb->protocol = eth_type_trans(skb, dev->ndev);
	emac_rx_csum(dev, skb, ctrl);

	if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
		++dev->estats.rx_dropped_stack;
next:
	++dev->stats.rx_packets;
skip:
	dev->stats.rx_bytes += len;
	slot = (slot + 1) % NUM_RX_BUFF;
	--budget;
	++received;
	continue;
sg:
if (ctrl & MAL_RX_CTRL_FIRST) {
	BUG_ON(dev->rx_sg_skb);
	if (unlikely(emac_alloc_rx_skb2(dev, slot, GFP_ATOMIC))) {
		DBG(dev, "rx OOM %d (%d) (%d)" NL, slot, dev->rx_skb_size, len);
		++dev->estats.rx_dropped_oom;
		emac_recycle_rx_skb(dev, slot, 0);
	} else {
		dev->rx_sg_skb = skb;
		skb_fill_page_desc(dev->rx_sg_skb, 0, dev->page, 0, len);
		emac_consume_page(dev, len, slot);
		dev->rx_sg_skb->len += ETH_HLEN;
	}
} else if (!emac_rx_sg_append(dev, slot) && (ctrl & MAL_RX_CTRL_LAST)) {
	skb = dev->rx_sg_skb;
	dev->rx_sg_skb = NULL;

	ctrl &= EMAC_BAD_RX_MASK;
	if (unlikely(ctrl && ctrl != EMAC_RX_TAH_BAD_CSUM)) {
		emac_parse_rx_error(dev, ctrl);
		++dev->estats.rx_dropped_error;
		dev_kfree_skb(skb);
		len = 0;
	} else
		goto push_packet;
}

... /* Other code is here */
} /* end of emac_poll_rx */

static inline int emac_alloc_rx_skb2(struct emac_instance *dev, int slot,
				    gfp_t flags)
{
	struct sk_buff *skb = alloc_skb(242, flags);
	if (unlikely(!skb))
		return -ENOMEM;


	dev->rx_skb[slot] = skb;
	dev->rx_desc[slot].data_len = 0;

	dev->page = alloc_page(flags);
	DBG(dev, "emac_alloc_skb2: page %x" NL, dev->page);
	if(unlikely(!dev->page))
	{
		return -1;
	}
	dev->rx_desc[slot].data_ptr = dma_map_page(&dev->ofdev->dev, dev->page,
						   0, 4096, DMA_FROM_DEVICE);

	wmb();
	dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
	    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);

	return 0;
} /* end of emac_alloc_rx_skb2 */

static inline void emac_consume_page(struct emac_instance* dev, int length,
				     int slot)
{
	dma_unmap_page(&dev->ofdev->dev, dev->rx_desc[slot].data_ptr, 4096,
		       DMA_FROM_DEVICE);
	wmb();
	__free_page(dev->page);
	dev->page = NULL;
	dev->rx_sg_skb->len += length;
	dev->rx_sg_skb->data_len += length;
	dev->rx_sg_skb->truesize += length;
}

static inline int emac_rx_sg_append(struct emac_instance *dev, int slot)
{
	if (likely(dev->rx_sg_skb != NULL)) {
		int len = dev->rx_desc[slot].data_len;
		int tot_len = dev->rx_sg_skb->len + len;

		if (unlikely(tot_len + 2 > dev->max_mtu)) {
			++dev->estats.rx_dropped_mtu;
			dev_kfree_skb(dev->rx_sg_skb);
			dev->rx_sg_skb = NULL;
		} else {
			dev->page = alloc_page(GFP_ATOMIC);
			if(unlikely(!dev->page))
			{
				return -ENOMEM;
			}
			dev->rx_desc[slot].data_ptr =
				dma_map_page(&dev->ofdev->dev, dev->page, 0,
					     4096, DMA_FROM_DEVICE);
			dev->rx_desc[slot].data_len = 0;
			wmb();
			dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
				(slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);
			skb_fill_page_desc(dev->rx_sg_skb,
					   skb_shinfo(dev->rx_sg_skb)->nr_frags,
					   dev->page, 0, len);
			emac_consume_page(dev, len, slot);
			return 0;
		}
	}
	emac_recycle_rx_skb(dev, slot, 0);
	return -1;
} /* end of emac_rx_sg_append */


* Jumbo Frame bug in ibm_newemac driver (was Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures)
  2009-08-18 22:56   ` Jonathan Haws
@ 2009-10-26 15:38     ` Jonathan Haws
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Haws @ 2009-10-26 15:38 UTC (permalink / raw)
  To: linuxppc-dev@lists.ozlabs.org

Okay, I need to revisit this issue.  I have had my time taken away for
other things the past couple of months, but I am now back at this network
issue.

Here is what I have done:

1. I modified the ibm_newemac driver to follow scatter-gather chains on
the RX path.  The idea was to set up the driver to only ever deal with
single pages.  The MAL in the PPC only supports data transfers of up to
4080 bytes (less than a single page), so it appears that the hardware
should support single page chains.  I set this up just like the e1000
driver.  For whatever reason, this did not work.  It is probably because I
do not fully understand the Linux network stack yet (as is apparent in the
next iteration).

2. I reverted to the original driver and found that, contrary to what I
had thought earlier, the driver does allocate a ring of skbs for use in
the driver.  However, when a jumbo packet is received (larger than 4080
bytes) it uses the skb that was pre-allocated for the jumbo packet and
allocates a new skb to replace the one in the ring.  This is where the
problem is - in that new allocation to replace the one in the stack.  So,
to remedy this, I pre-allocated the same number of jumbo skbs for the sole
purpose of being used as new skbs for the rx ring.  Here is some code that
shows the idea:

static int emac_open(struct net_device *ndev)
{
	...

        /* Allocate RX ring */
        for (i = 0; i < NUM_RX_BUFF; ++i)
        {
                if (emac_alloc_rx_skb(dev, i, GFP_KERNEL)) {
                        printk(KERN_ERR "%s: failed to allocate RX ring\n",
                               ndev->name);
                        goto oom;
                }

        }

	...
}

static inline int emac_alloc_rx_skb2(struct emac_instance *dev, int slot,
                                    gfp_t flags)
{
        struct sk_buff *skb = dev->rx_skb_pool[slot];
        if (unlikely(!skb))
                return -ENOMEM;

        if (skb_recycle_check(skb, emac_rx_skb_size(dev->rx_skb_size))) {
                dev->rx_skb[slot] = skb;
                dev->rx_desc[slot].data_len = 0;

                skb_reserve(skb, EMAC_RX_SKB_HEADROOM + 2);
                dev->rx_desc[slot].data_ptr =
                    dma_map_single(&dev->ofdev->dev, skb->data - 2,
                                   dev->rx_sync_size, DMA_FROM_DEVICE) + 2;
                wmb();
                dev->rx_desc[slot].ctrl = MAL_RX_CTRL_EMPTY |
                    (slot == (NUM_RX_BUFF - 1) ? MAL_RX_CTRL_WRAP : 0);

                return 0;
        } else {
                printk(KERN_NOTICE "EMAC: SKB not recycleable\n");
                return -ENOMEM;
        }
}

static int emac_poll_rx(void *param, int budget)
{
	...
	      sg:
                if (ctrl & MAL_RX_CTRL_FIRST) {
                        BUG_ON(dev->rx_sg_skb);
                        if (unlikely(emac_alloc_rx_skb2(dev, slot, GFP_ATOMIC))) {
                                DBG(dev, "rx OOM %d" NL, slot);
                                ++dev->estats.rx_dropped_oom;
                                emac_recycle_rx_skb(dev, slot, 0);
                        } else {
                                dev->rx_sg_skb = skb;
                                emac_recycle_rx_skb(dev, slot, len);
                                skb_put(skb, len);
                        }
                } else if (!emac_rx_sg_append(dev, slot) &&
                           (ctrl & MAL_RX_CTRL_LAST)) {

                        skb = dev->rx_sg_skb;
                        dev->rx_sg_skb = NULL;

                        ctrl &= EMAC_BAD_RX_MASK;
                        if (unlikely(ctrl && ctrl != EMAC_RX_TAH_BAD_CSUM)) {
                                emac_parse_rx_error(dev, ctrl);
                                ++dev->estats.rx_dropped_error;
                                dev_kfree_skb(skb);
                                len = 0;
                        } else {
                        /*      printk(KERN_NOTICE "EMAC: pushing sg packet\n");*/
                                goto push_packet;
                        }
                }
                goto skip;
	...
}

The changes are the allocation of the rx_skb_pool in emac_open(), the
function call emac_alloc_rx_skb2() in emac_poll_rx(), and the
modifications to emac_alloc_rx_skb to create emac_alloc_rx_skb2.  Also,
corresponding allocations for rx_skb_pool are found in
emac_resize_rx_ring() for when we need to resize the pool.

Now the problem that I am having is this - the first time through the
ring, things work just fine.  But the second time through the loop, the
buffers are not cleaned out - they still think they contain data.  I have
tried calling skb_recycle_check() to restore the skb to a new state,
however that call fails because apparently the skb cannot be reused for
receive.  Why is that the case?  What am I missing?  It seems like I am
missing something that allows the skb to be reused?
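
One way to reason about the reuse question is with a small user-space
model.  It is not driver code and every name in it is invented.  It assumes
that a buffer handed up the stack is released by the stack once it is done
with it, and that a recycle check only accepts a buffer that is still alive
and whose single remaining reference belongs to the caller.

```c
#include <stdbool.h>

/* Stand-in for a reference-counted receive buffer. */
struct model_buf {
	int users;	/* outstanding references */
	bool freed;	/* set once the last reference is dropped */
};

static void model_buf_init(struct model_buf *b)
{
	b->users = 1;
	b->freed = false;
}

/* Take an extra reference, e.g. so a pool can keep the buffer alive. */
static void model_buf_get(struct model_buf *b)
{
	b->users++;
}

/* Drop a reference; the buffer is gone when the last one is dropped. */
static void model_buf_put(struct model_buf *b)
{
	if (--b->users == 0)
		b->freed = true;
}

/* The stack consumes the reference it was handed (assumption). */
static void model_stack_receive(struct model_buf *b)
{
	model_buf_put(b);
}

/* Reusable only if alive and exclusively owned by the caller (assumption). */
static bool model_recyclable(const struct model_buf *b)
{
	return !b->freed && b->users == 1;
}
```

In this model a pool that only remembers the pointer ends up with a freed
buffer after the first pass.  A pool that takes its own reference keeps the
buffer alive, but the buffer only becomes recyclable again after the stack
has dropped its reference.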

I will admit, I am not a Linux network driver expert, though I am
learning.  If anyone can lend any advice or can see a problem in my logic,
then please let me know.

Thanks!

Jonathan



end of thread, other threads:[~2009-10-26 15:38 UTC | newest]

Thread overview: 5+ messages
2009-08-12 22:09 Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures Jonathan Haws
2009-08-12 22:37 ` Chris Friesen
2009-08-18 22:56   ` Jonathan Haws
2009-10-26 15:38     ` Jumbo Frame bug in ibm_newemac driver (was Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures) Jonathan Haws
  -- strict thread matches above, loose matches on Subject: below --
2009-08-12 22:11 Jumbo Frames, sil24 SATA driver, and kswapd0 page allocation failures Jonathan Haws
