netdev.vger.kernel.org archive mirror
* 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
@ 2007-11-24 23:44 Alistair John Strachan
  2007-11-25  0:25 ` Francois Romieu
  2007-11-25  0:39 ` Alan Cox
  0 siblings, 2 replies; 7+ messages in thread
From: Alistair John Strachan @ 2007-11-24 23:44 UTC
  To: Francois Romieu; +Cc: netdev, linux-kernel

Hi,

I have recently assembled a Core 2 Duo system with 4GB RAM and I believe there 
might be a bug in the r8169 driver in >=4GB RAM configurations.

Initially I can use one of the two r8169 NICs on the motherboard with this 
quantity of RAM, alongside other devices, without issue. But after some amount 
of data (generally about 50MB), no more network packets are sent or received.

The "choke" affects other devices on the system too, notably libata, which 
does not recover gracefully. In my logs, I see a stream of:

DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0

The device 0000:04:00.0 corresponds to one of the r8169s.

The reason I believe r8169 is at fault is that I was rebuilding my RAID5 
across 3 SATA drives via libata's ahci driver while transferring over the 
network. When the "choke" occurred, the RAID sync stopped and libata errors 
were logged; I simply did an "ifconfig br0 down" (br0 being the bridge that 
contained the r8169) and the messages went away. After bringing the NIC up 
again it would work briefly, then very rapidly go back to the same error 
messages.

The Intel chipset I am using does not support any kind of hardware IOMMU, so I 
am forced to use swiotlb in a 4GB RAM configuration. In an attempt to delay 
the failures, I used the swiotlb option to increase the swiotlb's page 
allocation with "swiotlb=65536" (which seems to correspond to a 256MB bounce 
buffer).
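
As a rough check on that figure: the swiotlb= boot option takes a slab count rather than a byte count. The sketch below assumes each slab is 2KB (IO_TLB_SHIFT = 11, as defined in lib/swiotlb.c of this period); under that assumption swiotlb=65536 would give a 128MB bounce buffer, so the 256MB estimate above may be off by a factor of two.

```c
#include <assert.h>

/* Assumption: one swiotlb slab is (1 << IO_TLB_SHIFT) bytes, with
 * IO_TLB_SHIFT = 11 (2KB), as in lib/swiotlb.c of this period. */
#define IO_TLB_SHIFT 11

/* Bounce-buffer size in bytes for a "swiotlb=<nslabs>" boot option. */
static unsigned long long swiotlb_pool_bytes(unsigned long nslabs)
{
	return (unsigned long long)nslabs << IO_TLB_SHIFT;
}

/* The same size expressed in whole megabytes. */
static unsigned long long swiotlb_pool_mb(unsigned long nslabs)
{
	return swiotlb_pool_bytes(nslabs) >> 20;
}
```

Under the same assumption, a 256MB pool would need swiotlb=131072.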

Assuming both libata and r8169 use the swiotlb, and both are impaired when 
these messages appear, taking the r8169 out of the picture appears to be the 
key. Indeed, even without significant libata activity, the problem still 
occurs on the NIC after approximately the same amount of transfer.

The larger swiotlb delays the failure for some time, but it still happens 
eventually, which makes me suspect that the driver is somehow pinning an area 
of the buffer and never releasing it. (I searched Bugzilla for reports similar 
to this one, but couldn't find anything.)

When I tested the r8169 driver on an AMD system with 4GB RAM, I did not 
experience the same problems, so this could be a bug specific to swiotlb. I 
would have added more people to CC, but I have no idea who might be 
responsible.

Andrew, I've added you just in case you're aware of other similar reports 
(maybe r8169 on big iron) and know anybody from the sw-iommu camp who could 
be added to CC.

-- 
Cheers,
Alistair.

137/1 Warrender Park Road, Edinburgh, UK.


* Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
  2007-11-24 23:44 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space Alistair John Strachan
@ 2007-11-25  0:25 ` Francois Romieu
  2007-11-25  1:27   ` Francois Romieu
  2007-11-25  1:48   ` Alistair John Strachan
  2007-11-25  0:39 ` Alan Cox
  1 sibling, 2 replies; 7+ messages in thread
From: Francois Romieu @ 2007-11-25  0:25 UTC
  To: Alistair John Strachan; +Cc: netdev, linux-kernel

Alistair John Strachan <alistair@devzero.co.uk> :
[...]
> The "choke" affects other devices on the system too, notably libata, which 
> does not recover gracefully. In my logs, I see a stream of:
> 
> DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
> DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0

You are using jumbo frames, aren't you ?

-- 
Ueimor


* Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
  2007-11-24 23:44 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space Alistair John Strachan
  2007-11-25  0:25 ` Francois Romieu
@ 2007-11-25  0:39 ` Alan Cox
  2007-11-25  1:15   ` Francois Romieu
  1 sibling, 1 reply; 7+ messages in thread
From: Alan Cox @ 2007-11-25  0:39 UTC
  To: Alistair John Strachan; +Cc: Francois Romieu, netdev, linux-kernel

> when these messages appear, removing r8169 would appear to be key. Indeed, if 
> there is no significant libata activity, the problem still occurs on the NIC 
> within approximately the same amount of transfer.

You seem to have a leak, which actually isn't surprising:

	rtl8169_xmit_frags allocates a set of maps for a fragmented packet

	rtl8169_start_xmit allocates a buffer

When we finish the transmit we free the main buffer (always using skb->len,
when sometimes it's skb->headlen). We don't seem to free the fragment
buffers at all.

Looks like the unmap path for fragmented packets is broken with any kind
of IOMMU.

Alan
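
Alan's point about unmap lengths can be illustrated with a toy userspace model. With a bounce-buffer IOMMU such as swiotlb, unmap releases slots according to the size it is given, so each unmap should pass the same size as the matching map. In the sketch below, pool_map(), pool_unmap() and struct tx_entry are hypothetical stand-ins rather than driver or kernel APIs, and slot sizing assumes 2KB slabs. Each descriptor records the length it was mapped with, and completion unmaps with that recorded length.

```c
#include <assert.h>
#include <stddef.h>

#define SLAB_SHIFT  11  /* assumed 2KB bounce slabs */
#define MAX_ENTRIES 16  /* toy ring size */

/* Toy bounce pool: only counts slots currently in use. */
static long pool_in_use;

/* Slots needed for a mapping of 'len' bytes, rounded up. */
static long slots_for(size_t len)
{
	size_t slab = (size_t)1 << SLAB_SHIFT;

	return (long)((len + slab - 1) / slab);
}

static void pool_map(size_t len)   { pool_in_use += slots_for(len); }
static void pool_unmap(size_t len) { pool_in_use -= slots_for(len); }

/* One Tx descriptor: remembers the length it was mapped with. */
struct tx_entry {
	size_t len;
};

/* Map the linear head (headlen bytes) plus nfrags fragments, one
 * descriptor each, recording every mapped length. Returns entries used. */
static int map_packet(struct tx_entry *ring, size_t headlen,
		      const size_t *frag_len, int nfrags)
{
	int i;

	assert(nfrags + 1 <= MAX_ENTRIES);
	ring[0].len = headlen;
	pool_map(headlen);
	for (i = 0; i < nfrags; i++) {
		ring[i + 1].len = frag_len[i];
		pool_map(frag_len[i]);
	}
	return nfrags + 1;
}

/* Completion: unmap every descriptor with the length it was mapped with. */
static void unmap_packet(const struct tx_entry *ring, int used)
{
	int i;

	for (i = 0; i < used; i++)
		pool_unmap(ring[i].len);
}
```

In this model, unmapping the 1000-byte head of a fragmented packet with the packet's total length (8096 bytes) would release four slots instead of one, and skipping the fragment unmaps would leave their slots allocated.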


* Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
  2007-11-25  0:39 ` Alan Cox
@ 2007-11-25  1:15   ` Francois Romieu
  0 siblings, 0 replies; 7+ messages in thread
From: Francois Romieu @ 2007-11-25  1:15 UTC
  To: Alan Cox; +Cc: Alistair John Strachan, netdev, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> :
[...]
> You seem to have a leak, which actually isn't surprising:
> 
> 	rtl8169_xmit_frags allocates a set of maps for a fragmented packet
> 
> 	rtl8169_start_xmit allocates a buffer
> 
> When we finish the transmit we free the main buffer (always using skb->len,
> when sometimes it's skb->headlen). We don't seem to free the fragment
> buffers at all.
>
> Looks like the unmap path for fragmented packets is broken with any kind
> of IOMMU.

Are you referring to the pci_unmap part ?

There is a 1:1 correspondence between a Tx descriptor entry and
{an unfragmented skb or a fragment of an skb}. AFAICS rtl8169_unmap_tx_skb()
is issued for each Tx descriptor entry, be it after a Tx completion irq or
a general Tx ring cleanup.

I'll read it again after some sleep but the leak does not seem clear to me.
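
The 1:1 argument can be sketched with a toy ring: free-running counters in the style of cur_tx (which rtl8169_start_xmit above reduces modulo NUM_TX_DESC) and a matching dirty_tx index the ring, and cleanup walks every entry between them exactly once, so each mapped descriptor is unmapped once even when the counters wrap. The structures and functions are hypothetical, and NUM_TX_DESC = 64 is an assumed power-of-two size; a power of two keeps the modulo indexing continuous when an unsigned counter overflows. A real completion handler would also stop at descriptors the NIC still owns.

```c
#include <assert.h>

#define NUM_TX_DESC 64  /* assumed ring size; must be a power of two */

struct tx_desc_model {
	int mapped;  /* 1 while the descriptor holds a DMA mapping */
};

struct tx_ring_model {
	struct tx_desc_model desc[NUM_TX_DESC];
	unsigned int cur_tx;    /* free-running producer count */
	unsigned int dirty_tx;  /* free-running consumer count */
};

/* Queue n descriptors (one per unfragmented skb or per fragment). */
static void ring_queue(struct tx_ring_model *r, unsigned int n)
{
	while (n--) {
		struct tx_desc_model *d = &r->desc[r->cur_tx % NUM_TX_DESC];

		assert(!d->mapped);  /* slot must have been cleaned already */
		d->mapped = 1;
		r->cur_tx++;
	}
}

/* Cleanup: unmap every entry from dirty_tx up to cur_tx exactly once.
 * Returns the number of entries unmapped. */
static unsigned int ring_clean(struct tx_ring_model *r)
{
	unsigned int done = 0;

	while (r->dirty_tx != r->cur_tx) {
		struct tx_desc_model *d = &r->desc[r->dirty_tx % NUM_TX_DESC];

		assert(d->mapped);  /* never unmap the same entry twice */
		d->mapped = 0;
		r->dirty_tx++;
		done++;
	}
	return done;
}
```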

-- 
Ueimor


* Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
  2007-11-25  0:25 ` Francois Romieu
@ 2007-11-25  1:27   ` Francois Romieu
  2007-11-25  2:02     ` Alistair John Strachan
  2007-11-25  1:48   ` Alistair John Strachan
  1 sibling, 1 reply; 7+ messages in thread
From: Francois Romieu @ 2007-11-25  1:27 UTC
  To: Alistair John Strachan; +Cc: netdev, linux-kernel

Francois Romieu <romieu@fr.zoreil.com> :
> Alistair John Strachan <alistair@devzero.co.uk> :
> [...]
> > The "choke" affects other devices on the system too, notably libata, which 
> > does not recover gracefully. In my logs, I see a stream of:
> > 
> > DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
> > DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
> 
> You are using jumbo frames, aren't you ?

See below for my late night crap. At least it should avoid the driver
issuing Rx/Tx DMA with the single static buffer of lib/swiotlb.c
(io_tlb_overflow_buffer). Ghee.

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index 1f647b9..72a7370 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -2262,10 +2262,16 @@ static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
 	mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
 				 PCI_DMA_FROMDEVICE);
 
+	if (pci_dma_mapping_error(mapping))
+		goto err_kfree_skb;
+
 	rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
 out:
 	return skb;
 
+err_kfree_skb:
+	dev_kfree_skb(skb);
+	skb = NULL;
 err_out:
 	rtl8169_make_unusable_by_asic(desc);
 	goto out;
@@ -2486,6 +2492,7 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		dma_addr_t mapping;
 		u32 status, len;
 		void *addr;
+		int rc;
 
 		entry = (entry + 1) % NUM_TX_DESC;
 
@@ -2493,6 +2500,22 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
 		len = frag->size;
 		addr = ((void *) page_address(frag->page)) + frag->page_offset;
 		mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
+		rc = pci_dma_mapping_error(mapping);
+		if (unlikely(rc < 0)) {
+			while (cur_frag-- > 0) {
+				frag = info->frags + cur_frag;
+				entry = (entry - 1) % NUM_TX_DESC;
+				txd = tp->TxDescArray + entry;
+				len = frag->size;
+				mapping = le64_to_cpu(txd->addr);
+				pci_unmap_single(tp->pci_dev, mapping, len,
+						 PCI_DMA_TODEVICE);
+				txd->opts1 = 0x00;
+				txd->opts2 = 0x00;
+				txd->addr = 0x00;
+			}
+			return rc;
+		}
 
 		/* anti gcc 2.95.3 bugware (sic) */
 		status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
@@ -2534,13 +2557,13 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
 static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct rtl8169_private *tp = netdev_priv(dev);
-	unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC;
+	unsigned int entry = tp->cur_tx % NUM_TX_DESC;
 	struct TxDesc *txd = tp->TxDescArray + entry;
 	void __iomem *ioaddr = tp->mmio_addr;
 	dma_addr_t mapping;
 	u32 status, len;
 	u32 opts1;
-	int ret = NETDEV_TX_OK;
+	int frags, ret = NETDEV_TX_OK;
 
 	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
 		if (netif_msg_drv(tp)) {
@@ -2557,7 +2580,11 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	opts1 = DescOwn | rtl8169_tso_csum(skb, dev);
 
 	frags = rtl8169_xmit_frags(tp, skb, opts1);
-	if (frags) {
+	if (frags < 0) {
+		printk(KERN_ERR "%s: PCI mapping failure (%d).\n", dev->name,
+		       frags);
+		goto err_busy;
+	} else if (frags > 0) {
 		len = skb_headlen(skb);
 		opts1 |= FirstFrag;
 	} else {
@@ -2605,6 +2632,7 @@ out:
 
 err_stop:
 	netif_stop_queue(dev);
+err_busy:
 	ret = NETDEV_TX_BUSY;
 err_update_stats:
 	dev->stats.tx_dropped++;
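
The error path added to rtl8169_xmit_frags follows a common pattern: map each fragment, check the result, and on failure unwind the fragments already mapped, in reverse ring order. A userspace sketch of that pattern follows. In lib/swiotlb.c a failed mapping comes back as the address of io_tlb_overflow_buffer, which pci_dma_mapping_error() recognizes; here a hypothetical MAP_ERROR sentinel and toy_map()/toy_unmap() stand in for that. The sketch returns an explicit negative errno on failure so that a caller testing for a negative value (as the frags < 0 check in rtl8169_start_xmit does) can always tell an error from a fragment count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_TX_DESC 64                  /* assumed ring size */
#define MAP_ERROR ((unsigned long)-1)   /* hypothetical failure sentinel */

/* Toy mapper: succeeds until map_budget mappings are outstanding. */
static int map_budget;
static int maps_outstanding;

static unsigned long toy_map(size_t len)
{
	if (maps_outstanding >= map_budget)
		return MAP_ERROR;
	maps_outstanding++;
	return 0x1000 + len;  /* fake bus address */
}

static void toy_unmap(unsigned long addr)
{
	(void)addr;
	maps_outstanding--;
}

/* Map nfrags fragments into the ring entries following 'first'.
 * On failure, unmap the already-mapped fragments in reverse order,
 * clear their entries, and return -EIO. */
static int map_frags(unsigned long *ring, unsigned int first,
		     const size_t *frag_len, int nfrags)
{
	unsigned int entry = first;
	int cur;

	for (cur = 0; cur < nfrags; cur++) {
		unsigned long addr;

		entry = (entry + 1) % NUM_TX_DESC;
		addr = toy_map(frag_len[cur]);
		if (addr == MAP_ERROR) {
			/* Walk back over fragments 0 .. cur-1. */
			while (cur-- > 0) {
				entry = (entry + NUM_TX_DESC - 1) % NUM_TX_DESC;
				toy_unmap(ring[entry]);
				ring[entry] = 0;
			}
			return -EIO;
		}
		ring[entry] = addr;
	}
	return nfrags;
}
```

Stepping back with (entry + NUM_TX_DESC - 1) % NUM_TX_DESC avoids relying on unsigned wraparound, so it stays correct even if the ring size were not a power of two.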


* Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
  2007-11-25  0:25 ` Francois Romieu
  2007-11-25  1:27   ` Francois Romieu
@ 2007-11-25  1:48   ` Alistair John Strachan
  1 sibling, 0 replies; 7+ messages in thread
From: Alistair John Strachan @ 2007-11-25  1:48 UTC
  To: Francois Romieu; +Cc: netdev, linux-kernel

On Sunday 25 November 2007 00:25:10 Francois Romieu wrote:
> Alistair John Strachan <alistair@devzero.co.uk> :
> [...]
>
> > The "choke" affects other devices on the system too, notably libata,
> > which does not recover gracefully. In my logs, I see a stream of:
> >
> > DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
> > DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
>
> You are using jumbo frames, aren't you ?

Yes, 7200 byte frames. I'll certainly try out your patch and report back.
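
The 7222 bytes in the log are consistent with 7200-byte jumbo frames if the Rx buffer is sized as MTU + ETH_HLEN (14) + 8 bytes. That sizing rule is an assumption chosen because it reproduces the logged figure, not a quote of drivers/net/r8169.c. The sketch also counts how many swiotlb slabs one such mapping occupies, again assuming 2KB slabs.

```c
#include <assert.h>

#define ETH_HLEN     14  /* Ethernet header, bytes */
#define IO_TLB_SHIFT 11  /* assumed swiotlb slab size: 2KB */

/* Hypothetical sizing rule: MTU + Ethernet header + 8 bytes.
 * It reproduces the 7222 bytes in the log for a 7200-byte MTU,
 * but is an assumption, not the driver's actual code. */
static unsigned int rx_buf_size(unsigned int mtu)
{
	return mtu + ETH_HLEN + 8;
}

/* Slabs one mapping of 'size' bytes occupies (rounded up). */
static unsigned int slabs_per_mapping(unsigned int size)
{
	unsigned int slab = 1u << IO_TLB_SHIFT;

	return (size + slab - 1) / slab;
}
```

Under these assumptions each 7222-byte mapping occupies four contiguous slabs (8KB of bounce space), where a standard-MTU buffer would fit in one.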

-- 
Cheers,
Alistair.


* Re: 2.6.24-rc3, 4GB RAM, swiotlb, r8169, out of space
  2007-11-25  1:27   ` Francois Romieu
@ 2007-11-25  2:02     ` Alistair John Strachan
  0 siblings, 0 replies; 7+ messages in thread
From: Alistair John Strachan @ 2007-11-25  2:02 UTC
  To: Francois Romieu; +Cc: netdev, linux-kernel, Alan Cox

On Sunday 25 November 2007 01:27:54 Francois Romieu wrote:
> Francois Romieu <romieu@fr.zoreil.com> :
> > Alistair John Strachan <alistair@devzero.co.uk> :
> > [...]
> >
> > > The "choke" affects other devices on the system too, notably libata,
> > > which does not recover gracefully. In my logs, I see a stream of:
> > >
> > > DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
> > > DMA: Out of SW-IOMMU space for 7222 bytes at device 0000:04:00.0
> >
> > You are using jumbo frames, aren't you ?
>
> See below for my late night crap. At least it should avoid the driver
> issuing Rx/Tx DMA with the single static buffer of lib/swiotlb.c
> (io_tlb_overflow_buffer). Ghee.

No improvement. It might be possible to reproduce the problem on your end if 
you enable IOMMU support and force the swiotlb on (which should be possible 
even with <4GB RAM).

> diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
> index 1f647b9..72a7370 100644
> --- a/drivers/net/r8169.c
> +++ b/drivers/net/r8169.c
> @@ -2262,10 +2262,16 @@ static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
>  	mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
>  				 PCI_DMA_FROMDEVICE);
>
> +	if (pci_dma_mapping_error(mapping))
> +		goto err_kfree_skb;
> +
>  	rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
>  out:
>  	return skb;
>
> +err_kfree_skb:
> +	dev_kfree_skb(skb);
> +	skb = NULL;
>  err_out:
>  	rtl8169_make_unusable_by_asic(desc);
>  	goto out;
> @@ -2486,6 +2492,7 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
>  		dma_addr_t mapping;
>  		u32 status, len;
>  		void *addr;
> +		int rc;
>
>  		entry = (entry + 1) % NUM_TX_DESC;
>
> @@ -2493,6 +2500,22 @@ static int rtl8169_xmit_frags(struct rtl8169_private *tp, struct sk_buff *skb,
>  		len = frag->size;
>  		addr = ((void *) page_address(frag->page)) + frag->page_offset;
>  		mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
> +		rc = pci_dma_mapping_error(mapping);
> +		if (unlikely(rc < 0)) {
> +			while (cur_frag-- > 0) {
> +				frag = info->frags + cur_frag;
> +				entry = (entry - 1) % NUM_TX_DESC;
> +				txd = tp->TxDescArray + entry;
> +				len = frag->size;
> +				mapping = le64_to_cpu(txd->addr);
> +				pci_unmap_single(tp->pci_dev, mapping, len,
> +						 PCI_DMA_TODEVICE);
> +				txd->opts1 = 0x00;
> +				txd->opts2 = 0x00;
> +				txd->addr = 0x00;
> +			}
> +			return rc;
> +		}
>
>  		/* anti gcc 2.95.3 bugware (sic) */
>  		status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
> @@ -2534,13 +2557,13 @@ static inline u32 rtl8169_tso_csum(struct sk_buff *skb, struct net_device *dev)
>  static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  {
>  	struct rtl8169_private *tp = netdev_priv(dev);
> -	unsigned int frags, entry = tp->cur_tx % NUM_TX_DESC;
> +	unsigned int entry = tp->cur_tx % NUM_TX_DESC;
>  	struct TxDesc *txd = tp->TxDescArray + entry;
>  	void __iomem *ioaddr = tp->mmio_addr;
>  	dma_addr_t mapping;
>  	u32 status, len;
>  	u32 opts1;
> -	int ret = NETDEV_TX_OK;
> +	int frags, ret = NETDEV_TX_OK;
>
>  	if (unlikely(TX_BUFFS_AVAIL(tp) < skb_shinfo(skb)->nr_frags)) {
>  		if (netif_msg_drv(tp)) {
> @@ -2557,7 +2580,11 @@ static int rtl8169_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	opts1 = DescOwn | rtl8169_tso_csum(skb, dev);
>
>  	frags = rtl8169_xmit_frags(tp, skb, opts1);
> -	if (frags) {
> +	if (frags < 0) {
> +		printk(KERN_ERR "%s: PCI mapping failure (%d).\n", dev->name,
> +		       frags);
> +		goto err_busy;
> +	} else if (frags > 0) {
>  		len = skb_headlen(skb);
>  		opts1 |= FirstFrag;
>  	} else {
> @@ -2605,6 +2632,7 @@ out:
>
>  err_stop:
>  	netif_stop_queue(dev);
> +err_busy:
>  	ret = NETDEV_TX_BUSY;
>  err_update_stats:
>  	dev->stats.tx_dropped++;

-- 
Cheers,
Alistair.
