Netdev List
* Re: [PATCH] Fix SCTP failure with ipv6 source address routing
From: Vlad Yasevich @ 2010-04-14  0:47 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: netdev
In-Reply-To: <1271198256-20477-1-git-send-email-paul.gortmaker@windriver.com>



Paul Gortmaker wrote:
> From: Weixing Shi <Weixing.Shi@windriver.com>
> 
> Given the below test case, using source address routing, SCTP
> does not work.
> 
> Node-A:
>   1)ifconfig eth0 inet6 add 2001:1::1/64
>   2)ip -6 rule add from 2001:1::1 table 100 pref 100
>   3)ip -6 route add 2001:2::1 dev eth0 table 100
>   4)sctp_darn -H 2001:1::1 -P 250 -l &
> 
> Node-B:
>   1)ifconfig eth0 inet6 add 2001:2::1/64
>   2)ip -6 rule add from 2001:2::1 table 100 pref 100
>   3)ip -6 route add 2001:1::1 dev eth0 table 100
>   4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s
> 
> Root cause:
>   Node-A and Node-B use source address routing, and in the
>   beginning, the source address will be NULL.  So SCTP will search
>   the routing table by the destination address (because it is using
>   the source address routing table), and hence the resulting dst_entry
>   will be NULL.
> 
> Solution:
>   After SCTP gets the correct source address, then we search for
>   dst_entry again, and then we will get the correct value.

The problem here is that ipv6 route lookup code in sctp doesn't bother
searching for the source address, unlike the v4 route lookup code.

Compare sctp_v4_get_dst() and sctp_v6_get_dst().  The v4 version bends over
backwards trying to get the correct route, while the v6 version simply does
a single lookup and returns the result.

The v6 route lookup code needs to be fixed to take into account the bound
address list.
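
To illustrate the suggestion above, here is a minimal user-space sketch of
"retry the lookup with each bound address". It is a toy model, not kernel
code: struct toy_route, toy_lookup() and toy_get_dst() are hypothetical names
invented for this example, and the real sctp_v4_get_dst() does considerably
more work.

```c
#include <stddef.h>
#include <string.h>

/* Toy policy-routing table: a route matches only when both the source
 * and the destination agree, mimicking "ip -6 rule add from <src>". */
struct toy_route {
	const char *src;   /* source address selected by the rule */
	const char *dst;   /* destination reachable via this table */
};

/* Hypothetical lookup helper: src may be NULL when no source address
 * has been chosen yet, in which case a source-based rule cannot match. */
static const struct toy_route *toy_lookup(const struct toy_route *tbl, size_t n,
					  const char *src, const char *dst)
{
	size_t i;

	if (!src)
		return NULL;
	for (i = 0; i < n; i++)
		if (!strcmp(tbl[i].src, src) && !strcmp(tbl[i].dst, dst))
			return &tbl[i];
	return NULL;
}

/* Try the lookup without a source first; if it fails, retry once per
 * address in the bound-address list and return the first route found. */
static const struct toy_route *toy_get_dst(const struct toy_route *tbl, size_t n,
					   const char *const *bound, size_t nbound,
					   const char *dst)
{
	const struct toy_route *rt = toy_lookup(tbl, n, NULL, dst);
	size_t i;

	for (i = 0; !rt && i < nbound; i++)
		rt = toy_lookup(tbl, n, bound[i], dst);
	return rt;
}
```

With the Node-A setup from the test case, the source-less lookup finds
nothing, while walking a bound-address list that contains 2001:1::1 finds the
route in table 100.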

-vlad
	
> 
> Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/sctp/transport.c |   11 +++++++++--
>  1 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index be4d63d..b5ae18c 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -295,9 +295,16 @@ void sctp_transport_route(struct sctp_transport *transport,
>  
>  	if (saddr)
>  		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
> -	else
> +	else {
>  		af->get_saddr(opt, asoc, dst, daddr, &transport->saddr);
> -
> +		/* When using source address routing, since dst was
> +		 * looked up prior to filling in the source address, dst
> +		 * needs to be looked up again to get the correct dst
> +		 */
> +		if (dst)
> +			dst_release(dst);
> +		dst = af->get_dst(asoc, daddr, &transport->saddr);
> +	}
>  	transport->dst = dst;
>  	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
>  		return;

^ permalink raw reply

* Phylib polling when doing mdio_read will cause system response and transfer speed drop
From: Bryan Wu @ 2010-04-14  0:27 UTC (permalink / raw)
  To: afleming, davem; +Cc: netdev, LKML

Hi Andy and David,

After I posted a patch to add phylib support to drivers/net/fec.c, we found
a performance regression on the Freescale i.MX51 Babbage board.

Patch is 
http://git.kernel.org/?p=linux/kernel/git/davem/net-next-2.6.git;a=commitdiff;h=e6b043d512fa8d9a3801bf5d72bfa3b8fc3b3cc8.

Bug tracker is here: 
https://bugs.launchpad.net/ubuntu/+source/linux-fsl-imx51/+bug/546649

I found the root cause is the polling operation in the mdio_read function. When
we transfer large files, the poll times out many times. So I have several
questions:
1. Do I need to return -ETIMEDOUT when the poll times out? If I don't return
-ETIMEDOUT, performance improves a lot. I checked other drivers: some don't
return anything, some return 0, and some return a negative value. What is the
rule for this mdio_read polling timeout case?

2. How should the polling busy-wait be done? Normally we won't busy-wait very
long when polling, but hardware is not perfect every time. Running cpu_relax()
10000 times while polling makes system response very bad when the hardware
doesn't set the flag as expected. Maybe udelay(25) 10 times or msleep(1) 10
times is better than that.
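
The second question can be pictured with a minimal user-space sketch of a
bounded poll: try a fixed number of times, pause between attempts, and report
-ETIMEDOUT if the flag never shows up. Everything here is hypothetical
scaffolding for illustration (struct fake_phy, fake_done(), fake_delay(),
poll_mii_done()); in a driver the "done" check would read a hardware register
and the delay would be udelay() or msleep(). It does not settle which return
value phylib expects.

```c
#include <errno.h>
#include <stdbool.h>

/* Fake device for illustration: the "transfer done" flag becomes set
 * on the ready_after-th poll. */
struct fake_phy {
	unsigned int polls;        /* how many times the flag was read */
	unsigned int ready_after;  /* poll count at which the flag is set */
	unsigned int delays;       /* how many times we paused */
};

/* Stand-in for reading the hardware's "MII transfer done" flag. */
static bool fake_done(void *ctx)
{
	struct fake_phy *p = ctx;

	p->polls++;
	return p->polls >= p->ready_after;
}

/* Stand-in for udelay()/msleep() between polls; only counts calls. */
static void fake_delay(void *ctx)
{
	struct fake_phy *p = ctx;

	p->delays++;
}

/* Poll at most max_tries times, pausing between attempts, and report a
 * timeout to the caller instead of silently succeeding. */
static int poll_mii_done(bool (*done)(void *), void (*delay)(void *),
			 void *ctx, unsigned int max_tries)
{
	unsigned int i;

	for (i = 0; i < max_tries; i++) {
		if (done(ctx))
			return 0;
		delay(ctx);
	}
	return -ETIMEDOUT;
}
```

The worst case is then bounded by max_tries times the delay, rather than by
however long 10000 iterations of cpu_relax() happen to take.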

I have a patch that works around this issue:
http://kernel.ubuntu.com/git?p=roc/ubuntu-lucid.git;a=commitdiff;h=5d77e3409b319ca84183bf1d2fd158a9c864e03f.

Thanks a lot,
-- 
Bryan Wu <bryan.wu@canonical.com>
Kernel Developer    +86.138-1617-6545 Mobile
Ubuntu Kernel Team | Hardware Enablement Team
Canonical Ltd.      www.canonical.com
Ubuntu - Linux for human beings | www.ubuntu.com

^ permalink raw reply

* Re: [PATCH] Add somaxconn to Documentation/sysctl/net.txt
From: Rob Landley @ 2010-04-13 23:54 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: linux-kernel, linux-doc, netdev
In-Reply-To: <1271184012.16881.549.camel@edumazet-laptop>

On Tuesday 13 April 2010 13:40:12 Eric Dumazet wrote:
> Le mardi 13 avril 2010 à 13:25 -0500, Rob Landley a écrit :
> > From: Rob Landley <rob@landley.net>
> >
> > Add somaxconn to Documentation/sysctl/net.txt
> >
> > Signed-off-by: Rob Landley <rob@landley.net>
> > ---
> >
> >  Documentation/sysctl/net.txt |    6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> > index df38ef0..2740085 100644
> > --- a/Documentation/sysctl/net.txt
> > +++ b/Documentation/sysctl/net.txt
> > @@ -90,6 +90,12 @@ optmem_max
> >  Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
> >  of struct cmsghdr structures with appended data.
> >
> > +somaxconn
> > +---------
> > +
> > +Maximum backlog of unanswered connections for a listening socket.  Provides
> > +an upper bound on the "backlog" parameter of the listen() syscall.
> > +
> >  2. /proc/sys/net/unix - Parameters for Unix domain sockets
> >  -------------------------------------------------------
>
> Please cc netdev for such patches
>
> Extract of Documentation/networking/ip-sysctl.txt
>
> somaxconn - INTEGER
> 	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
> 	Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
> 	for TCP sockets.
>
> I guess you need to change both files ?

Dunno.  I just got a question on the busybox mailing list:

  http://lists.busybox.net/pipermail/busybox/2010-April/072090.html

Looked in Documentation to see what /proc/sys/net/core/somaxconn actually 
_did_, found it was undocumented, grepped the kernel source for somaxconn, 
found just one chunk of code actually using it, replied to the guy's question:

  http://lists.busybox.net/pipermail/busybox/2010-April/072096.html

And then tweaked the documentation with what I'd found, and sent in a doc 
patch so I wouldn't have to do that twice.
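
The behaviour described in the doc patch ("an upper bound on the backlog
parameter") can be sketched in a few lines of user-space C. This is only an
illustration of the clamping idea; clamp_backlog() is a hypothetical name and
the kernel's actual code path is not reproduced here.

```c
/* Clamp the backlog requested via listen() to the somaxconn limit.
 * Comparing as unsigned means a negative backlog is treated as a very
 * large value and is clamped as well. */
static int clamp_backlog(int backlog, int somaxconn)
{
	if ((unsigned int)backlog > (unsigned int)somaxconn)
		backlog = somaxconn;
	return backlog;
}
```

With the default of 128 quoted above, a listen(fd, 1024) request would end up
with a backlog of 128, while listen(fd, 16) would keep 16.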

It's quite possible I got it wrong.  Maybe it's per interface or something?

Rob
-- 
Latency is more important than throughput. It's that simple. - Linus Torvalds

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sk_dst_cache RCUification
From: David Miller @ 2010-04-13 23:11 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, paulmck
In-Reply-To: <1271199845.16881.586.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 14 Apr 2010 01:04:05 +0200

> Instead of using rcu on whole "struct socket", my plan is to use a small
> structure :
> 
> struct wait_queue_head_rcu {
> 	wait_queue_head_t wait;
> 	struct rcu_head	  rcu;
> } ____cacheline_aligned_in_smp;
> 
> and make sk->sk_sleep point to this 'wait' field.

So you're relying upon the fact that in the non-FASYNC case
the struct socket's wait queue is never actually used?

^ permalink raw reply

* Re: [PATCH net-next-2.6] net: sk_dst_cache RCUification
From: Eric Dumazet @ 2010-04-13 23:04 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, paulmck
In-Reply-To: <20100413.015232.67916764.davem@davemloft.net>

Le mardi 13 avril 2010 à 01:52 -0700, David Miller a écrit :

> Applied, thanks for doing this work Eric.

Thanks David :)

I am now working on the sk_callback_lock case, to speed up
sock_def_readable() and sock_def_write_space() in typical cases
(SOCK_FASYNC not set).

Instead of using rcu on whole "struct socket", my plan is to use a small
structure :

struct wait_queue_head_rcu {
	wait_queue_head_t wait;
	struct rcu_head	  rcu;
} ____cacheline_aligned_in_smp;

and make sk->sk_sleep point to this 'wait' field.



^ permalink raw reply

* Re: [PATCH v2] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-13 22:43 UTC (permalink / raw)
  To: paulmck; +Cc: Eric Dumazet, David S. Miller, netdev
In-Reply-To: <20100413155227.GC2538@linux.vnet.ibm.com>

On Tue, Apr 13, 2010 at 11:52 PM, Paul E. McKenney
<paulmck@linux.vnet.ibm.com> wrote:
> On Tue, Apr 13, 2010 at 05:50:29PM +0800, Changli Gao wrote:
>> On Tue, Apr 13, 2010 at 4:08 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >
>> >        Probably not necessary.
>> >
>> >> +     volatile bool           flush_processing_queue;
>> >
>> > Use of 'volatile' is strongly discouraged, I would say, forbidden.
>>
>> volatile is used to avoid compiler optimization.
>
> Would it be reasonable to use ACCESS_ONCE() where this variable is used?

Oh, thanks. ACCESS_ONCE() is just what I need.
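
For readers following along, a minimal user-space sketch of the idea: keep the
field non-volatile and force a single, non-optimized access only at the places
that need it. The macro below has the same shape as the kernel's ACCESS_ONCE()
and relies on the GCC/Clang __typeof__ extension; struct toy_softnet and the
two helpers are hypothetical names for illustration, not the code from the
patch under review.

```c
#include <stdbool.h>

/* Force exactly one load or store through a volatile-qualified lvalue,
 * so the compiler cannot cache, merge or refetch this access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct toy_softnet {
	bool flush_processing_queue;   /* plain field, no 'volatile' */
};

/* Writer side: publish the request with a single store. */
static void request_flush(struct toy_softnet *sd)
{
	ACCESS_ONCE(sd->flush_processing_queue) = true;
}

/* Reader side: sample the flag with a single load. */
static bool flush_requested(struct toy_softnet *sd)
{
	return ACCESS_ONCE(sd->flush_processing_queue);
}
```

Note that this only constrains the compiler; it does not by itself order the
access against other memory operations on other CPUs.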

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH] Fix SCTP failure with ipv6 source address routing
From: Paul Gortmaker @ 2010-04-13 22:37 UTC (permalink / raw)
  To: netdev; +Cc: vladislav.yasevich

From: Weixing Shi <Weixing.Shi@windriver.com>

Given the below test case, using source address routing, SCTP
does not work.

Node-A:
  1)ifconfig eth0 inet6 add 2001:1::1/64
  2)ip -6 rule add from 2001:1::1 table 100 pref 100
  3)ip -6 route add 2001:2::1 dev eth0 table 100
  4)sctp_darn -H 2001:1::1 -P 250 -l &

Node-B:
  1)ifconfig eth0 inet6 add 2001:2::1/64
  2)ip -6 rule add from 2001:2::1 table 100 pref 100
  3)ip -6 route add 2001:1::1 dev eth0 table 100
  4)sctp_darn -H 2001:2::1 -P 250 -h 2001:1::1 -p 250 -s

Root cause:
  Node-A and Node-B use source address routing, and in the
  beginning, the source address will be NULL.  So SCTP will search
  the routing table by the destination address (because it is using
  the source address routing table), and hence the resulting dst_entry
  will be NULL.

Solution:
  After SCTP gets the correct source address, then we search for
  dst_entry again, and then we will get the correct value.

Signed-off-by: Weixing Shi <Weixing.Shi@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/sctp/transport.c |   11 +++++++++--
 1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index be4d63d..b5ae18c 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -295,9 +295,16 @@ void sctp_transport_route(struct sctp_transport *transport,
 
 	if (saddr)
 		memcpy(&transport->saddr, saddr, sizeof(union sctp_addr));
-	else
+	else {
 		af->get_saddr(opt, asoc, dst, daddr, &transport->saddr);
-
+		/* When using source address routing, since dst was
+		 * looked up prior to filling in the source address, dst
+		 * needs to be looked up again to get the correct dst
+		 */
+		if (dst)
+			dst_release(dst);
+		dst = af->get_dst(asoc, daddr, &transport->saddr);
+	}
 	transport->dst = dst;
 	if ((transport->param_flags & SPP_PMTUD_DISABLE) && transport->pathmtu) {
 		return;
-- 
1.6.5.2


^ permalink raw reply related

* Re: [PATCH 0/9] net: support multiple independant multicast routing instances
From: David Miller @ 2010-04-13 21:51 UTC (permalink / raw)
  To: kaber; +Cc: netdev
In-Reply-To: <1271171003-11901-1-git-send-email-kaber@trash.net>

From: Patrick McHardy <kaber@trash.net>
Date: Tue, 13 Apr 2010 17:03:14 +0200

> this is an updated patchset of my patches to support multiple independant
> multicast routing instances. Changes since the last posting are:
> 
> - rebase to the current net-next-2.6.git tree
> - fix up patch subjects to consistently refer to "ipv4: ipmr:"
> - fix up list_head conversion patch to add new elements at the head of
>   the list instead of at the tail
> 
> Please apply or pull from:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6.git master

I applied the patches instead of pulling just to check your email
patch submission format, and it was perfect! :-)

I'll do a git pull next time.

All applied to net-next-2.6, thanks!

^ permalink raw reply

* Re: forcedeth driver hangs under heavy load
From: Eric Dumazet @ 2010-04-13 21:46 UTC (permalink / raw)
  To: David Miller; +Cc: smulcahy, bhutchings, netdev, ben, aabdulla, 572201
In-Reply-To: <20100413.144340.138717714.davem@davemloft.net>

Le mardi 13 avril 2010 à 14:43 -0700, David Miller a écrit :
> Do you really come to the conclusion that TSO is broken with the above
> test results?
> 
> I would conclude that there is a TX checksumming issue, since merely
> turning TSO off does not fix the problem whereas turning TX
> checksumming off does.

Indeed, we clarified the point and it is a TX checksum issue.



^ permalink raw reply

* Re: forcedeth driver hangs under heavy load
From: David Miller @ 2010-04-13 21:43 UTC (permalink / raw)
  To: eric.dumazet; +Cc: smulcahy, bhutchings, netdev, ben, aabdulla, 572201
In-Reply-To: <1271169741.16881.437.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 13 Apr 2010 16:42:21 +0200

> Le mardi 13 avril 2010 à 15:27 +0100, stephen mulcahy a écrit :
>> Ok, I've tried both of the following with my reproducer
>> 
>> 1. ethtool -K eth0 tso off
>> 
>> RESULT: reproducer causes multiple hosts to become unresponsive on 
>> first run.
>> 
>> 2. ethtool -K eth0 tx off
>> 
>> RESULT: reproducer runs three times without any hosts becoming unresponsive.
>> 
>> -stephen
> 
> Thanks Stephen !
> 
> Now some brave souls to check the 6410 lines of this driver ? ;)
> 
> Question of the day : Why is TSO broken in forcedeth ?
> Is it generically broken or is it broken for specific NICs ?

Do you really come to the conclusion that TSO is broken with the above
test results?

I would conclude that there is a TX checksumming issue, since merely
turning TSO off does not fix the problem whereas turning TX
checksumming off does.

^ permalink raw reply

* RE: [PATCH 2/3] cxgb4i: main driver files
From: Karen Xie @ 2010-04-13 21:41 UTC (permalink / raw)
  To: Mike Christie, open-iscsi
  Cc: Rakesh Ranjan, netdev, linux-scsi, linux-kernel, davem,
	James.Bottomley
In-Reply-To: <4BC4D711.5030802@cs.wisc.edu>

Hi, Mike,

Yes, will do that for the next submission.

Thanks,
Karen

-----Original Message-----
From: Mike Christie [mailto:michaelc@cs.wisc.edu] 
Sent: Tuesday, April 13, 2010 1:42 PM
To: open-iscsi@googlegroups.com
Cc: Rakesh Ranjan; netdev@vger.kernel.org; linux-scsi@vger.kernel.org;
linux-kernel@vger.kernel.org; Karen Xie; davem@davemloft.net;
James.Bottomley@hansenpartnership.com
Subject: Re: [PATCH 2/3] cxgb4i: main driver files

On 04/08/2010 07:14 AM, Rakesh Ranjan wrote:
> +static inline int cxgb4i_ddp_gl_map(struct pci_dev *pdev,
> +				struct cxgb4i_gather_list *gl)
> +{
> +	int i;
> +
> +	for (i = 0; i < gl->nelem; i++) {
> +		gl->phys_addr[i] = pci_map_page(pdev, gl->pages[i], 0,
> +						PAGE_SIZE,

Hey Rakesh,

I guess we are trying to move away from the pci mapping functions and move 
to the dma ones. On your next submission, could you fix those up too?

^ permalink raw reply

* Re: [PATCH Resubmission] drivers/net/usb: Add new driver ipheth
From: David Miller @ 2010-04-13 21:29 UTC (permalink / raw)
  To: agimenez
  Cc: linux-kernel, dgiagio, dborca, gregkh, jonas.sjoquist,
	steve.glendinning, torgny.johansson, dbrownell, omar.oberthur,
	linux-usb, netdev
In-Reply-To: <4BC4BFFD.9040802@sysvalve.es>

From: "L. Alberto Giménez" <agimenez@sysvalve.es>
Date: Tue, 13 Apr 2010 21:03:25 +0200

> Thanks for the info. I didn't know that I had to add an entry on the
> upper level Makefile. I guess that something like
> obj-$(CONFIG_USB_IPHETH) += usb/ should be enough? (I got it from the
> other USB net drivers).

Yes.

^ permalink raw reply

* [PATCH net-next-2.6] drivers: net: use skb_headlen()
From: Eric Dumazet @ 2010-04-13 20:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

Replace (skb->len - skb->data_len) occurrences with skb_headlen(skb).
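
As a reminder of what the helper stands for, here is a small user-space sketch
using a toy structure. struct toy_skb and toy_skb_headlen() are hypothetical
stand-ins that model only the two sk_buff fields involved; they are not the
kernel definitions.

```c
/* Toy stand-in for the two struct sk_buff fields involved. */
struct toy_skb {
	unsigned int len;       /* total bytes: linear area plus fragments */
	unsigned int data_len;  /* bytes held in paged fragments only */
};

/* Length of the linear (header) area: total minus paged data. */
static inline unsigned int toy_skb_headlen(const struct toy_skb *skb)
{
	return skb->len - skb->data_len;
}
```

For a 1500-byte packet with 1434 bytes in fragments this gives a 66-byte
linear area, which is the length the drivers below hand to their DMA mapping
calls for skb->data.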

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 drivers/atm/eni.c                  |    2 +-
 drivers/atm/he.c                   |    4 ++--
 drivers/net/3c59x.c                |    4 ++--
 drivers/net/atl1e/atl1e_main.c     |    2 +-
 drivers/net/atlx/atl1.c            |    4 ++--
 drivers/net/benet/be_main.c        |    4 ++--
 drivers/net/chelsio/sge.c          |    8 ++++----
 drivers/net/e1000/e1000_main.c     |    4 ++--
 drivers/net/e1000e/netdev.c        |    4 ++--
 drivers/net/ehea/ehea_main.c       |   10 +++++-----
 drivers/net/forcedeth.c            |    4 ++--
 drivers/net/ixgbevf/ixgbevf_main.c |    2 +-
 drivers/net/ksz884x.c              |    2 +-
 drivers/net/myri10ge/myri10ge.c    |    2 +-
 drivers/net/s2io.c                 |    4 ++--
 drivers/net/tehuti.c               |    2 +-
 drivers/net/tsi108_eth.c           |    4 ++--
 17 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/atm/eni.c b/drivers/atm/eni.c
index 719ec5a..90a5a7c 100644
--- a/drivers/atm/eni.c
+++ b/drivers/atm/eni.c
@@ -1131,7 +1131,7 @@ DPRINTK("doing direct send\n"); /* @@@ well, this doesn't work anyway */
 			if (i == -1)
 				put_dma(tx->index,eni_dev->dma,&j,(unsigned long)
 				    skb->data,
-				    skb->len - skb->data_len);
+				    skb_headlen(skb));
 			else
 				put_dma(tx->index,eni_dev->dma,&j,(unsigned long)
 				    skb_shinfo(skb)->frags[i].page + skb_shinfo(skb)->frags[i].page_offset,
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index c213e0d..56c2e99 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -2664,8 +2664,8 @@ he_send(struct atm_vcc *vcc, struct sk_buff *skb)
 
 #ifdef USE_SCATTERGATHER
 	tpd->iovec[slot].addr = pci_map_single(he_dev->pci_dev, skb->data,
-				skb->len - skb->data_len, PCI_DMA_TODEVICE);
-	tpd->iovec[slot].len = skb->len - skb->data_len;
+				skb_headlen(skb), PCI_DMA_TODEVICE);
+	tpd->iovec[slot].len = skb_headlen(skb);
 	++slot;
 
 	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
diff --git a/drivers/net/3c59x.c b/drivers/net/3c59x.c
index 5f92fdb..9752530 100644
--- a/drivers/net/3c59x.c
+++ b/drivers/net/3c59x.c
@@ -2129,8 +2129,8 @@ boomerang_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		int i;
 
 		vp->tx_ring[entry].frag[0].addr = cpu_to_le32(pci_map_single(VORTEX_PCI(vp), skb->data,
-										skb->len-skb->data_len, PCI_DMA_TODEVICE));
-		vp->tx_ring[entry].frag[0].length = cpu_to_le32(skb->len-skb->data_len);
+										skb_headlen(skb), PCI_DMA_TODEVICE));
+		vp->tx_ring[entry].frag[0].length = cpu_to_le32(skb_headlen(skb));
 
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
diff --git a/drivers/net/atl1e/atl1e_main.c b/drivers/net/atl1e/atl1e_main.c
index b6605d4..d45356f 100644
--- a/drivers/net/atl1e/atl1e_main.c
+++ b/drivers/net/atl1e/atl1e_main.c
@@ -1679,7 +1679,7 @@ static void atl1e_tx_map(struct atl1e_adapter *adapter,
 {
 	struct atl1e_tpd_desc *use_tpd = NULL;
 	struct atl1e_tx_buffer *tx_buffer = NULL;
-	u16 buf_len = skb->len - skb->data_len;
+	u16 buf_len = skb_headlen(skb);
 	u16 map_len = 0;
 	u16 mapped_len = 0;
 	u16 hdr_len = 0;
diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 0ebd820..33448a0 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2347,7 +2347,7 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb,
 {
 	struct atl1_adapter *adapter = netdev_priv(netdev);
 	struct atl1_tpd_ring *tpd_ring = &adapter->tpd_ring;
-	int len = skb->len;
+	int len;
 	int tso;
 	int count = 1;
 	int ret_val;
@@ -2359,7 +2359,7 @@ static netdev_tx_t atl1_xmit_frame(struct sk_buff *skb,
 	unsigned int f;
 	unsigned int proto_hdr_len;
 
-	len -= skb->data_len;
+	len = skb_headlen(skb);
 
 	if (unlikely(skb->len <= 0)) {
 		dev_kfree_skb_any(skb);
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 18e0a80..fa10f13 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -432,7 +432,7 @@ static int make_tx_wrbs(struct be_adapter *adapter,
 	map_head = txq->head;
 
 	if (skb->len > skb->data_len) {
-		int len = skb->len - skb->data_len;
+		int len = skb_headlen(skb);
 		busaddr = pci_map_single(pdev, skb->data, len,
 					 PCI_DMA_TODEVICE);
 		if (pci_dma_mapping_error(pdev, busaddr))
@@ -1098,7 +1098,7 @@ static void be_tx_compl_process(struct be_adapter *adapter, u16 last_index)
 		cur_index = txq->tail;
 		wrb = queue_tail_node(txq);
 		unmap_tx_frag(adapter->pdev, wrb, (unmap_skb_hdr &&
-					sent_skb->len > sent_skb->data_len));
+					skb_headlen(sent_skb)));
 		unmap_skb_hdr = false;
 
 		num_wrbs++;
diff --git a/drivers/net/chelsio/sge.c b/drivers/net/chelsio/sge.c
index a8ffc1e..f01cfdb 100644
--- a/drivers/net/chelsio/sge.c
+++ b/drivers/net/chelsio/sge.c
@@ -1123,7 +1123,7 @@ static inline unsigned int compute_large_page_tx_descs(struct sk_buff *skb)
 
 	if (PAGE_SIZE > SGE_TX_DESC_MAX_PLEN) {
 		unsigned int nfrags = skb_shinfo(skb)->nr_frags;
-		unsigned int i, len = skb->len - skb->data_len;
+		unsigned int i, len = skb_headlen(skb);
 		while (len > SGE_TX_DESC_MAX_PLEN) {
 			count++;
 			len -= SGE_TX_DESC_MAX_PLEN;
@@ -1219,10 +1219,10 @@ static inline void write_tx_descs(struct adapter *adapter, struct sk_buff *skb,
 	ce = &q->centries[pidx];
 
 	mapping = pci_map_single(adapter->pdev, skb->data,
-				skb->len - skb->data_len, PCI_DMA_TODEVICE);
+				 skb_headlen(skb), PCI_DMA_TODEVICE);
 
 	desc_mapping = mapping;
-	desc_len = skb->len - skb->data_len;
+	desc_len = skb_headlen(skb);
 
 	flags = F_CMD_DATAVALID | F_CMD_SOP |
 	    V_CMD_EOP(nfrags == 0 && desc_len <= SGE_TX_DESC_MAX_PLEN) |
@@ -1258,7 +1258,7 @@ static inline void write_tx_descs(struct adapter *adapter, struct sk_buff *skb,
 
 	ce->skb = NULL;
 	dma_unmap_addr_set(ce, dma_addr, mapping);
-	dma_unmap_len_set(ce, dma_len, skb->len - skb->data_len);
+	dma_unmap_len_set(ce, dma_len, skb_headlen(skb));
 
 	for (i = 0; nfrags--; i++) {
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index 47da5fc..974a02d 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -2929,7 +2929,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 	unsigned int first, max_per_txd = E1000_MAX_DATA_PER_TXD;
 	unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
 	unsigned int tx_flags = 0;
-	unsigned int len = skb->len - skb->data_len;
+	unsigned int len = skb_headlen(skb);
 	unsigned int nr_frags;
 	unsigned int mss;
 	int count = 0;
@@ -2980,7 +2980,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 					dev_kfree_skb_any(skb);
 					return NETDEV_TX_OK;
 				}
-				len = skb->len - skb->data_len;
+				len = skb_headlen(skb);
 				break;
 			default:
 				/* do nothing */
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index 38390b5..214db04 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -4130,7 +4130,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 	unsigned int max_per_txd = E1000_MAX_PER_TXD;
 	unsigned int max_txd_pwr = E1000_MAX_TXD_PWR;
 	unsigned int tx_flags = 0;
-	unsigned int len = skb->len - skb->data_len;
+	unsigned int len = skb_headlen(skb);
 	unsigned int nr_frags;
 	unsigned int mss;
 	int count = 0;
@@ -4180,7 +4180,7 @@ static netdev_tx_t e1000_xmit_frame(struct sk_buff *skb,
 				dev_kfree_skb_any(skb);
 				return NETDEV_TX_OK;
 			}
-			len = skb->len - skb->data_len;
+			len = skb_headlen(skb);
 		}
 	}
 
diff --git a/drivers/net/ehea/ehea_main.c b/drivers/net/ehea/ehea_main.c
index e2d25fb..3f445ef 100644
--- a/drivers/net/ehea/ehea_main.c
+++ b/drivers/net/ehea/ehea_main.c
@@ -1618,7 +1618,7 @@ static void write_swqe2_TSO(struct sk_buff *skb,
 {
 	struct ehea_vsgentry *sg1entry = &swqe->u.immdata_desc.sg_entry;
 	u8 *imm_data = &swqe->u.immdata_desc.immediate_data[0];
-	int skb_data_size = skb->len - skb->data_len;
+	int skb_data_size = skb_headlen(skb);
 	int headersize;
 
 	/* Packet is TCP with TSO enabled */
@@ -1629,7 +1629,7 @@ static void write_swqe2_TSO(struct sk_buff *skb,
 	 */
 	headersize = ETH_HLEN + ip_hdrlen(skb) + tcp_hdrlen(skb);
 
-	skb_data_size = skb->len - skb->data_len;
+	skb_data_size = skb_headlen(skb);
 
 	if (skb_data_size >= headersize) {
 		/* copy immediate data */
@@ -1651,7 +1651,7 @@ static void write_swqe2_TSO(struct sk_buff *skb,
 static void write_swqe2_nonTSO(struct sk_buff *skb,
 			       struct ehea_swqe *swqe, u32 lkey)
 {
-	int skb_data_size = skb->len - skb->data_len;
+	int skb_data_size = skb_headlen(skb);
 	u8 *imm_data = &swqe->u.immdata_desc.immediate_data[0];
 	struct ehea_vsgentry *sg1entry = &swqe->u.immdata_desc.sg_entry;
 
@@ -2108,8 +2108,8 @@ static void ehea_xmit3(struct sk_buff *skb, struct net_device *dev,
 	} else {
 		/* first copy data from the skb->data buffer ... */
 		skb_copy_from_linear_data(skb, imm_data,
-					  skb->len - skb->data_len);
-		imm_data += skb->len - skb->data_len;
+					  skb_headlen(skb));
+		imm_data += skb_headlen(skb);
 
 		/* ... then copy data from the fragments */
 		for (i = 0; i < nfrags; i++) {
diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
index 3267b23..6c18834 100644
--- a/drivers/net/forcedeth.c
+++ b/drivers/net/forcedeth.c
@@ -2148,7 +2148,7 @@ static netdev_tx_t nv_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	unsigned int i;
 	u32 offset = 0;
 	u32 bcnt;
-	u32 size = skb->len-skb->data_len;
+	u32 size = skb_headlen(skb);
 	u32 entries = (size >> NV_TX2_TSO_MAX_SHIFT) + ((size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
 	u32 empty_slots;
 	struct ring_desc* put_tx;
@@ -2269,7 +2269,7 @@ static netdev_tx_t nv_start_xmit_optimized(struct sk_buff *skb,
 	unsigned int i;
 	u32 offset = 0;
 	u32 bcnt;
-	u32 size = skb->len-skb->data_len;
+	u32 size = skb_headlen(skb);
 	u32 entries = (size >> NV_TX2_TSO_MAX_SHIFT) + ((size & (NV_TX2_TSO_MAX_SIZE-1)) ? 1 : 0);
 	u32 empty_slots;
 	struct ring_desc_ex* put_tx;
diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
index 960e985..f484161 100644
--- a/drivers/net/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ixgbevf/ixgbevf_main.c
@@ -604,7 +604,7 @@ static bool ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 		 * packets not getting split correctly
 		 */
 		if (staterr & IXGBE_RXD_STAT_LB) {
-			u32 header_fixup_len = skb->len - skb->data_len;
+			u32 header_fixup_len = skb_headlen(skb);
 			if (header_fixup_len < 14)
 				skb_push(skb, header_fixup_len);
 		}
diff --git a/drivers/net/ksz884x.c b/drivers/net/ksz884x.c
index 4a231bd..cc0bc8a 100644
--- a/drivers/net/ksz884x.c
+++ b/drivers/net/ksz884x.c
@@ -4684,7 +4684,7 @@ static void send_packet(struct sk_buff *skb, struct net_device *dev)
 		int frag;
 		skb_frag_t *this_frag;
 
-		dma_buf->len = skb->len - skb->data_len;
+		dma_buf->len = skb_headlen(skb);
 
 		dma_buf->dma = pci_map_single(
 			hw_priv->pdev, skb->data, dma_buf->len,
diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index 958dc28..e0b47cc 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -2757,7 +2757,7 @@ again:
 	}
 
 	/* map the skb for DMA */
-	len = skb->len - skb->data_len;
+	len = skb_headlen(skb);
 	idx = tx->req & tx->mask;
 	tx->info[idx].skb = skb;
 	bus = pci_map_single(mgp->pdev, skb->data, len, PCI_DMA_TODEVICE);
diff --git a/drivers/net/s2io.c b/drivers/net/s2io.c
index bab0061..f155928 100644
--- a/drivers/net/s2io.c
+++ b/drivers/net/s2io.c
@@ -2400,7 +2400,7 @@ static struct sk_buff *s2io_txdl_getskb(struct fifo_info *fifo_data,
 		return NULL;
 	}
 	pci_unmap_single(nic->pdev, (dma_addr_t)txds->Buffer_Pointer,
-			 skb->len - skb->data_len, PCI_DMA_TODEVICE);
+			 skb_headlen(skb), PCI_DMA_TODEVICE);
 	frg_cnt = skb_shinfo(skb)->nr_frags;
 	if (frg_cnt) {
 		txds++;
@@ -4202,7 +4202,7 @@ static netdev_tx_t s2io_xmit(struct sk_buff *skb, struct net_device *dev)
 		txdp->Control_2 |= TXD_VLAN_TAG(vlan_tag);
 	}
 
-	frg_len = skb->len - skb->data_len;
+	frg_len = skb_headlen(skb);
 	if (offload_type == SKB_GSO_UDP) {
 		int ufo_size;
 
diff --git a/drivers/net/tehuti.c b/drivers/net/tehuti.c
index a38aede..93affdc 100644
--- a/drivers/net/tehuti.c
+++ b/drivers/net/tehuti.c
@@ -1508,7 +1508,7 @@ bdx_tx_map_skb(struct bdx_priv *priv, struct sk_buff *skb,
 	int nr_frags = skb_shinfo(skb)->nr_frags;
 	int i;
 
-	db->wptr->len = skb->len - skb->data_len;
+	db->wptr->len = skb_headlen(skb);
 	db->wptr->addr.dma = pci_map_single(priv->pdev, skb->data,
 					    db->wptr->len, PCI_DMA_TODEVICE);
 	pbl->len = CPU_CHIP_SWAP32(db->wptr->len);
diff --git a/drivers/net/tsi108_eth.c b/drivers/net/tsi108_eth.c
index 1292d23..a03730b 100644
--- a/drivers/net/tsi108_eth.c
+++ b/drivers/net/tsi108_eth.c
@@ -704,8 +704,8 @@ static int tsi108_send_packet(struct sk_buff * skb, struct net_device *dev)
 
 		if (i == 0) {
 			data->txring[tx].buf0 = dma_map_single(NULL, skb->data,
-					skb->len - skb->data_len, DMA_TO_DEVICE);
-			data->txring[tx].len = skb->len - skb->data_len;
+					skb_headlen(skb), DMA_TO_DEVICE);
+			data->txring[tx].len = skb_headlen(skb);
 			misc |= TSI108_TX_SOF;
 		} else {
 			skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];



^ permalink raw reply related

* Re: [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 20:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271191086.16881.570.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 10:38:06PM +0200, Eric Dumazet wrote:
> Le mardi 13 avril 2010 à 23:25 +0300, Michael S. Tsirkin a écrit :
> > On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > > Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit :
> > > 
> > > > > When a socket with inflight tx packets is closed, we dont block the
> > > > > close, we only delay the socket freeing once all packets were delivered
> > > > > and freed.
> > > > > 
> > > > 
> > > > Which is wrong, since this is under userspace control, so you get
> > > > unkillable processes.
> > > > 
> > > 
> > > We do not get unkillable processes, at least with sockets I was thinking
> > > about (TCP/UDP ones).
> > > 
> > > Maybe tun sockets can behave the same ?
> > 
> > Looks like that's what my patch does: ip_rcv seems to call
> > skb_orphan too.
> 
> Well, I was speaking of tx side, you speak of receiving side.

Point is, both ip_rcv and my patch call skb_orphan.

> An external flood (coming from another domain) is another problem.
> 
> A sender might flood the 'network' inside our domain. How can we
> reasonably limit the sender ?
> 
> Maybe the answer is 'We can not', but it should be stated somewhere, so
> that someone can address this point later.
> 

And whatever's done should ideally work for tap to IP
and IP to IP sockets as well, not just tap to tap.

-- 
MST

^ permalink raw reply

* Re: [PATCH 2/3] cxgb4i: main driver files
From: Mike Christie @ 2010-04-13 20:41 UTC (permalink / raw)
  To: open-iscsi
  Cc: Rakesh Ranjan, netdev, linux-scsi, linux-kernel, kxie, davem,
	James.Bottomley
In-Reply-To: <1270728855-20951-3-git-send-email-rakesh@chelsio.com>

On 04/08/2010 07:14 AM, Rakesh Ranjan wrote:
> +static inline int cxgb4i_ddp_gl_map(struct pci_dev *pdev,
> +				struct cxgb4i_gather_list *gl)
> +{
> +	int i;
> +
> +	for (i = 0; i < gl->nelem; i++) {
> +		gl->phys_addr[i] = pci_map_page(pdev, gl->pages[i], 0,
> +						PAGE_SIZE,

Hey Rakesh,

I guess we are trying to move away from the pci mapping functions and move 
to the dma ones. On your next submission, could you fix those up too?


* Re: [PATCH] tun: orphan an skb on tx
From: Eric Dumazet @ 2010-04-13 20:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <20100413202548.GA3582@redhat.com>

On Tuesday, 13 April 2010 at 23:25 +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > On Tuesday, 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:
> > 
> > > > When a socket with inflight tx packets is closed, we don't block the
> > > > close, we only delay the socket freeing until all packets have been
> > > > delivered and freed.
> > > > 
> > > 
> > > Which is wrong, since this is under userspace control, so you get
> > > unkillable processes.
> > > 
> > 
> > We do not get unkillable processes, at least with sockets I was thinking
> > about (TCP/UDP ones).
> > 
> > Maybe tun sockets can behave the same ?
> 
> Looks like that's what my patch does: ip_rcv seems to call
> skb_orphan too.

Well, I was speaking of tx side, you speak of receiving side.
An external flood (coming from another domain) is another problem.

A sender might flood the 'network' inside our domain. How can we
reasonably limit the sender ?

Maybe the answer is 'We can not', but it should be stated somewhere, so
that someone can address this point later.





* Re: [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 20:25 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271183463.16881.545.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> On Tuesday, 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:
> 
> > > When a socket with inflight tx packets is closed, we don't block the
> > > close, we only delay the socket freeing until all packets have been
> > > delivered and freed.
> > > 
> > 
> > Which is wrong, since this is under userspace control, so you get
> > unkillable processes.
> > 
> 
> We do not get unkillable processes, at least with sockets I was thinking
> about (TCP/UDP ones).
> 
> Maybe tun sockets can behave the same ?

Looks like that's what my patch does: ip_rcv seems to call
skb_orphan too.

> Herbert Acked your patch, so I guess it's OK, but I think it can be
> dangerous.
> Anyway my feeling is that we try to add various mechanisms to keep a
> hostile user from flooding another one.
> 
> For example, UDP got memory accounting quite recently, and we added
> socket backlog limits very recently. It was considered not needed a few
> years ago.
> 





* usb-sound circular locking again?
From: Richard Zidlicky @ 2010-04-13 20:30 UTC (permalink / raw)
  To: Takashi Iwai; +Cc: Andrew Morton, linux-kernel, netdev
In-Reply-To: <s5hocm9som6.wl%tiwai@suse.de>

Hi,

is this the same old issue? Any way to fix it? Seeing it triggered in a sync
syscall does not make me comfortable.

Apr 13 02:01:36 localhost kernel: [ 8569.449882] PM: Syncing filesystems ... 
Apr 13 02:01:36 localhost kernel: [ 8569.449998] =======================================================
Apr 13 02:01:36 localhost kernel: [ 8569.450049] [ INFO: possible circular locking dependency detected ]
Apr 13 02:01:36 localhost kernel: [ 8569.450078] 2.6.33.2v2 #4
Apr 13 02:01:36 localhost kernel: [ 8569.450101] -------------------------------------------------------
Apr 13 02:01:36 localhost kernel: [ 8569.450130] pm-hibernate/17348 is trying to acquire lock:
Apr 13 02:01:36 localhost kernel: [ 8569.450158]  (mutex){+.+...}, at: [<c04e6670>] sync_filesystems+0x14/0xd6
Apr 13 02:01:36 localhost kernel: [ 8569.450252] 
Apr 13 02:01:36 localhost kernel: [ 8569.450253] but task is already holding lock:
Apr 13 02:01:36 localhost kernel: [ 8569.450266]  (pm_mutex){+.+.+.}, at: [<c0466658>] hibernate+0x13/0x18d
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] which lock already depends on the new lock.
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] the existing dependency chain (in reverse order) is:
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] -> #6 (pm_mutex){+.+.+.}:
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0466658>] hibernate+0x13/0x18d
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c046551c>] state_store+0x56/0xa8
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c05acb19>] kobj_attr_store+0x1a/0x22
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c050f306>] sysfs_write_file+0xb9/0xe4
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c04cc821>] vfs_write+0x84/0xdf
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c04cc915>] sys_write+0x3b/0x60
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 02:01:36 localhost kernel: [ 8569.450266] 
Apr 13 02:01:36 localhost kernel: [ 8569.450266] -> #5 (s_active){++++.+}:
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c05102f8>] sysfs_addrm_finish+0x89/0xde
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c050eaf7>] sysfs_hash_and_remove+0x3d/0x4f
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c0511100>] sysfs_remove_group+0x74/0xa3
Apr 13 02:01:36 localhost kernel: [ 8569.450266]        [<c062e16c>] dpm_sysfs_remove+0x10/0x12
Apr 13 09:39:32 localhost kernel: [ 8569.450266]        [<c062933f>] device_del+0x33/0x154
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0629488>] device_unregister+0x28/0x4b
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c067b7c5>] usb_remove_ep_devs+0x15/0x1f
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0675c92>] remove_intf_ep_devs+0x21/0x32
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0676d53>] usb_set_interface+0x18c/0x22c
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<f8302c46>] snd_usb_capture_close+0x26/0x3f [snd_usb_audio]
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<f80fbb08>] snd_pcm_release_substream+0x3d/0x66 [snd_pcm]
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<f80fbb8d>] snd_pcm_release+0x5c/0x9e [snd_pcm]
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04cd12a>] __fput+0xf0/0x187
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04cd1da>] fput+0x19/0x1b
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04b2e9f>] remove_vma+0x3e/0x5d
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04b3b2a>] do_munmap+0x23c/0x259
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c04b3b77>] sys_munmap+0x30/0x3f
Apr 13 09:39:34 localhost kernel: [ 8569.450266]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.450266] 
Apr 13 09:39:34 localhost kernel: [ 8569.450266] -> #4 (&pcm->open_mutex){+.+.+.}:
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<f80fbb86>] snd_pcm_release+0x55/0x9e [snd_pcm]
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04cd12a>] __fput+0xf0/0x187
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04cd1da>] fput+0x19/0x1b
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04b2e9f>] remove_vma+0x3e/0x5d
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04b3b2a>] do_munmap+0x23c/0x259
Apr 13 09:39:34 localhost kernel: [ 8569.454127]        [<c04b3b77>] sys_munmap+0x30/0x3f
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.455127] 
Apr 13 09:39:34 localhost kernel: [ 8569.455127] -> #3 (&mm->mmap_sem){++++++}:
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c04add1a>] might_fault+0x64/0x81
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c05b3828>] copy_to_user+0x2c/0xfc
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c04d784b>] filldir64+0x97/0xcd
Apr 13 09:39:34 localhost kernel: [ 8569.455127]        [<c04e299c>] dcache_readdir+0x5a/0x1af
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04d7a5d>] vfs_readdir+0x68/0x94
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04d7aec>] sys_getdents64+0x63/0xa0
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.456129] 
Apr 13 09:39:34 localhost kernel: [ 8569.456129] -> #2 (&sb->s_type->i_mutex_key#3){+.+.+.}:
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c051164f>] devpts_get_sb+0x1c0/0x29f
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04ce0db>] vfs_kern_mount+0x86/0x11f
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04ce1b8>] do_kern_mount+0x32/0xbe
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04e02c2>] do_mount+0x671/0x6d0
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c04e0382>] sys_mount+0x61/0x8f
Apr 13 09:39:34 localhost kernel: [ 8569.456129]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.456129] 
Apr 13 09:39:34 localhost kernel: [ 8569.456129] -> #1 (&type->s_umount_key#19){++++..}:
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b60a>] __lock_acquire+0xa2d/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c0737310>] down_read+0x31/0x45
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e66cf>] sync_filesystems+0x73/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e676e>] sys_sync+0x11/0x2d
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.458127] 
Apr 13 09:39:34 localhost kernel: [ 8569.458127] -> #0 (mutex){+.+...}:
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b517>] __lock_acquire+0x93a/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e6670>] sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04e676e>] sys_sync+0x11/0x2d
Apr 13 09:39:34 localhost kernel: [ 8569.458127]        [<c04666c2>] hibernate+0x7d/0x18d
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c046551c>] state_store+0x56/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c05acb19>] kobj_attr_store+0x1a/0x22
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c050f306>] sysfs_write_file+0xb9/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.459761]        [<c04cc821>] vfs_write+0x84/0xdf
Apr 13 09:39:34 localhost kernel: [ 8569.460128]        [<c04cc915>] sys_write+0x3b/0x60
Apr 13 09:39:34 localhost kernel: [ 8569.460128]        [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.460128] 
Apr 13 09:39:34 localhost kernel: [ 8569.460128] other info that might help us debug this:
Apr 13 09:39:34 localhost kernel: [ 8569.460128] 
Apr 13 09:39:34 localhost kernel: [ 8569.460128] 4 locks held by pm-hibernate/17348:
Apr 13 09:39:34 localhost kernel: [ 8569.460128]  #0:  (&buffer->mutex){+.+.+.}, at: [<c050f272>] sysfs_write_file+0x25/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.460128]  #1:  (s_active){++++.+}, at: [<c0510544>] sysfs_get_active_two+0x16/0x36
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  #2:  (s_active){++++.+}, at: [<c051054f>] sysfs_get_active_two+0x21/0x36
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  #3:  (pm_mutex){+.+.+.}, at: [<c0466658>] hibernate+0x13/0x18d
Apr 13 09:39:34 localhost kernel: [ 8569.461127] 
Apr 13 09:39:34 localhost kernel: [ 8569.461127] stack backtrace:
Apr 13 09:39:34 localhost kernel: [ 8569.461127] Pid: 17348, comm: pm-hibernate Not tainted 2.6.33.2v2 #4
Apr 13 09:39:34 localhost kernel: [ 8569.461127] Call Trace:
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  [<c0735b79>] ? printk+0xf/0x16
Apr 13 09:39:34 localhost kernel: [ 8569.461127]  [<c045a8a0>] print_circular_bug+0x90/0x9c
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c045b517>] __lock_acquire+0x93a/0xbb7
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c042730d>] ? update_curr+0x177/0x17f
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c0459bf5>] ? mark_lock+0x1e/0x1ea
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c045b828>] lock_acquire+0x94/0xb1
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] ? sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c0736d84>] __mutex_lock_common+0x35/0x2f3
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] ? sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e3423>] ? bdi_alloc_queue_work+0x84/0xa0
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c07370e0>] mutex_lock_nested+0x30/0x38
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] ? sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e6670>] sync_filesystems+0x14/0xd6
Apr 13 09:39:34 localhost kernel: [ 8569.462128]  [<c04e676e>] sys_sync+0x11/0x2d
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04666c2>] hibernate+0x7d/0x18d
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04654c6>] ? state_store+0x0/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c046551c>] state_store+0x56/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04654c6>] ? state_store+0x0/0xa8
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c05acb19>] kobj_attr_store+0x1a/0x22
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c050f306>] sysfs_write_file+0xb9/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c050f24d>] ? sysfs_write_file+0x0/0xe4
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04cc821>] vfs_write+0x84/0xdf
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c04cc915>] sys_write+0x3b/0x60
Apr 13 09:39:34 localhost kernel: [ 8569.463127]  [<c0738295>] syscall_call+0x7/0xb
Apr 13 09:39:34 localhost kernel: [ 8569.484133] done.

Apr 13 09:39:34 localhost kernel: [ 8569.484223] Freezing user space processes ... (elapsed 0.04 seconds) done.
Apr 13 09:39:34 localhost kernel: [ 8569.528142] Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Apr 13 09:39:34 localhost kernel: [ 8569.539272] PM: Preallocating image memory... done (allocated 349210 pages)
Apr 13 09:39:34 localhost kernel: [ 8583.627118] PM: Allocated 1396840 kbytes in 14.08 seconds (99.20 MB/s)
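
As far as I can read the report above, each entry in the chain is a
lock that was at some point taken while the entry below it was held
(mutex, then s_umount, then i_mutex, then mmap_sem, then open_mutex,
then s_active, then pm_mutex), and the new acquisition of mutex under
pm_mutex is what would close the loop. The sketch below illustrates
that kind of check with a small lock-order graph. It is not lockdep's
implementation; the struct, the function names and the enum indices are
all made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* Hypothetical indices for the lock classes named in the report. */
enum {
	L_MUTEX, L_S_UMOUNT, L_I_MUTEX, L_MMAP_SEM,
	L_OPEN_MUTEX, L_S_ACTIVE, L_PM_MUTEX
};

/* order[a][b] is true when lock b was ever taken while holding lock a. */
struct lock_graph {
	bool order[MAX_LOCKS][MAX_LOCKS];
};

static void graph_init(struct lock_graph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Depth-first search: is `to` reachable from `from` via recorded edges? */
static bool graph_reaches(const struct lock_graph *g, int from, int to,
			  bool *visited)
{
	if (from == to)
		return true;
	visited[from] = true;
	for (int next = 0; next < MAX_LOCKS; next++) {
		if (g->order[from][next] && !visited[next] &&
		    graph_reaches(g, next, to, visited))
			return true;
	}
	return false;
}

/*
 * Record "taken was acquired while holding held". Returns false,
 * without recording anything, if the new edge would close a cycle.
 */
static bool graph_add_order(struct lock_graph *g, int held, int taken)
{
	bool visited[MAX_LOCKS] = { false };

	if (graph_reaches(g, taken, held, visited))
		return false;
	g->order[held][taken] = true;
	return true;
}
```

Feeding the six existing dependencies in and then asking for
graph_add_order(&g, L_PM_MUTEX, L_MUTEX) is rejected in this model,
because pm_mutex is already reachable from mutex through the other
locks.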

Regards,
Richard



* Re: forcedeth driver hangs under heavy load
From: Eric Dumazet @ 2010-04-13 20:01 UTC (permalink / raw)
  To: stephen mulcahy
  Cc: Ben Hutchings, netdev, Ben Hutchings, Ayaz Abdulla, 572201
In-Reply-To: <4BC48CE0.1080504@gmail.com>

On Tuesday, 13 April 2010 at 16:25 +0100, stephen mulcahy wrote:
> Eric Dumazet wrote:
> > OK, thanks for clarification.
> > 
> > Last question, did you try a vanilla kernel, aka 2.6.33.2 for
> > example ?
> 
> I built a Debian package from the vanilla 2.6.33.2 and installed that on 
> all nodes and tried my reproducer with the same results - nodes becoming 
> unresponsive.
> 
> I didn't try changing the tso and tx settings with the 2.6.33.2 kernel 
> though. Let me know if that would be useful (and/or if there is another 
> kernel that you would like me to test with) and I'll try to fit it in.
> 

I tried 2.6.34-rc4 (64bits) on an old machine I had lying at home.



00:0a.0 Bridge: nVidia Corporation CK804 Ethernet Controller (rev a3)
	Subsystem: ASUSTeK Computer Inc. K8N4-E or A8N-E Mainboard
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
	Latency: 0 (250ns min, 5000ns max)
	Interrupt: pin A routed to IRQ 21
	Region 0: Memory at d4000000 (32-bit, non-prefetchable) [size=4K]
	Region 1: I/O ports at b000 [size=8]
	Capabilities: [44] Power Management version 2
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=0 PME-
	Kernel driver in use: forcedeth
	Kernel modules: forcedeth

I could not reproduce the problem you have.

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 31
model name	: AMD Athlon(tm) 64 Processor 3200+
stepping	: 0
cpu MHz		: 1000.000
cache size	: 512 KB
fpu		: yes
fpu_exception	: yes
cpuid level	: 1
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow rep_good lahf_lm
bogomips	: 2010.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


RAM : 3 Gbytes 

Only strange thing I noticed is ethtool -S results with an insane tx_broadcast

# ethtool -S eth1
NIC statistics:
     tx_bytes: 90388
     tx_zero_rexmt: 348
     tx_one_rexmt: 0
     tx_many_rexmt: 0
     tx_late_collision: 0
     tx_fifo_errors: 0
     tx_carrier_errors: 0
     tx_excess_deferral: 0
     tx_retry_error: 0
     rx_frame_error: 0
     rx_extra_byte: 0
     rx_late_collision: 0
     rx_runt: 0
     rx_frame_too_long: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_align_error: 0
     rx_length_error: 0
     rx_unicast: 413
     rx_multicast: 22
     rx_broadcast: 2
     rx_packets: 437
     rx_errors_total: 0
     tx_errors_total: 0
     tx_deferral: 718
     tx_packets: 718
     rx_bytes: 718
     tx_pause: 718
     rx_pause: 718
     rx_drop_frame: 718
     tx_unicast: 15748
     tx_multicast: 5552
     tx_broadcast: 115174309658

[root@localhost ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr 00:11:D8:9A:6D:06  
          inet addr:192.168.99.99  Bcast:192.168.99.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:466 errors:0 dropped:0 overruns:0 frame:0
          TX packets:354 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:50751 (49.5 KiB)  TX bytes:92974 (90.7 KiB)
          Interrupt:21 Base address:0x2000 

[root@localhost ~]# grep eth1 /proc/interrupts 
 21:        954   IO-APIC-fasteoi   eth1




* Re: [Bugme-new] [Bug 15777] New: Changing MTU after enabling GSO/GRO breaks incoming IPv6 neighbour discovery
From: Andrew Morton @ 2010-04-13 19:37 UTC (permalink / raw)
  To: netdev; +Cc: bugzilla-daemon, bugme-daemon, roman
In-Reply-To: <bug-15777-10286@https.bugzilla.kernel.org/>


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 13 Apr 2010 16:17:44 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15777
> 
>            Summary: Changing MTU after enabling GSO/GRO breaks incoming
>                     IPv6 neighbour discovery
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 2.6.33
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: IPV6
>         AssignedTo: yoshfuji@linux-ipv6.org
>         ReportedBy: roman@rm.pp.ru
>         Regression: No
> 
> 
> I have discovered that on one machine, if I enable either GSO or GRO (with
> ethtool -K), and then change the interface MTU, the machine ceases to be IPv6
> neighbour-discoverable. After the following commands:
> 
>   ethtool -K eth0 gro on gso on # doesn't matter which of them, or both
>   ifconfig eth0 mtu 4082
> 
> the machine is no longer ping6'able from LAN by "new" hosts (which haven't seen
> it recently) -- until something ELSE is adjusted on the same interface of that
> machine, e.g. the following command helps (I don't know why, the PROMISC mode
> is already disabled when it runs):
> 
>   ifconfig eth0 -promisc
> 
> The NIC (using the "skge" driver):
> 
>   00:08.0 Ethernet controller: 3Com Corporation 3c940 10/100/1000Base-T
> [Marvell] (rev 10)
> 
> The system is a  Debian Squeeze with 2.6.33 kernel and ethtool 2.6.33.
> 
> The issue is 100% reproducible.



* Re: [PATCH Resubmission] drivers/net/usb: Add new driver ipheth
From: "L. Alberto Giménez" @ 2010-04-13 19:03 UTC (permalink / raw)
  To: David Miller
  Cc: linux-kernel, dgiagio, dborca, gregkh, jonas.sjoquist,
	steve.glendinning, torgny.johansson, dbrownell, omar.oberthur,
	linux-usb, netdev
In-Reply-To: <20100413.011540.115903049.davem@davemloft.net>

David Miller wrote:
> Unless you add a rule to drivers/net/Makefile, the build won't
> actually get to your driver unless one of the other USB networking
> devices is configured.
>   

Hi David,

Thanks for the info. I didn't know that I had to add an entry in the 
upper-level Makefile. I guess that something like 
obj-$(CONFIG_USB_IPHETH) += usb/ should be enough? (I got it from the 
other USB net drivers).

I won't be able to work on it today; I've been very busy and can't 
look into this right now, but I've not given up :)

> Please fix this up and resubmit. 

I have also in my queue a patch sent from upstream to fix the latest 
issues pointed out by Roland Dreier, and I need to test it a little bit 
more.

Thanks for your comments and patience!


Best regards,
L. Alberto Giménez


* Re: [PATCH] Add somaxconn to Documentation/sysctl/net.txt
From: Eric Dumazet @ 2010-04-13 18:40 UTC (permalink / raw)
  To: Rob Landley; +Cc: linux-kernel, linux-doc, netdev
In-Reply-To: <201004131325.29104.rob@landley.net>

On Tuesday, 13 April 2010 at 13:25 -0500, Rob Landley wrote:
> From: Rob Landley <rob@landley.net>
> 
> Add somaxconn to Documentation/sysctl/net.txt
> 
> Signed-off-by: Rob Landley <rob@landley.net>
> ---
> 
>  Documentation/sysctl/net.txt |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.txt
> index df38ef0..2740085 100644
> --- a/Documentation/sysctl/net.txt
> +++ b/Documentation/sysctl/net.txt
> @@ -90,6 +90,12 @@ optmem_max
>  Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
>  of struct cmsghdr structures with appended data.
>  
> +somaxconn
> +---------
> +
> +Maximum backlog of unanswered connections for a listening socket.  Provides
> +an upper bound on the "backlog" parameter of the listen() syscall.
> +
>  2. /proc/sys/net/unix - Parameters for Unix domain sockets
>  -------------------------------------------------------
>  
> 

Please cc netdev for such patches

Extract of Documentation/networking/ip-sysctl.txt

somaxconn - INTEGER
	Limit of socket listen() backlog, known in userspace as SOMAXCONN.
	Defaults to 128.  See also tcp_max_syn_backlog for additional tuning
	for TCP sockets.

I guess you need to change both files ?
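
To make the "upper bound" wording concrete, here is a tiny sketch of
the clamping both texts describe. effective_backlog() is a hypothetical
helper written for this mail, not the kernel's code, and treating a
negative request as a very large one (via the unsigned comparison) is
an assumption of the sketch.

```c
/*
 * Hypothetical helper: the backlog a listening socket would end up
 * with once the somaxconn limit is applied. Mirrors the documented
 * "upper bound" behaviour only.
 */
static int effective_backlog(int requested, int somaxconn)
{
	/* Compare as unsigned so a negative request is also capped. */
	if ((unsigned int)requested > (unsigned int)somaxconn)
		return somaxconn;
	return requested;
}
```

With the default of 128 quoted above, a listen() backlog of 1024 would
come out as 128 in this model, while a backlog of 16 is left alone.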




* Re: [PATCH] tun: orphan an skb on tx
From: Eric Dumazet @ 2010-04-13 18:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <20100413173919.GC26011@redhat.com>

On Tuesday, 13 April 2010 at 20:39 +0300, Michael S. Tsirkin wrote:

> > When a socket with inflight tx packets is closed, we don't block the
> > close, we only delay the socket freeing until all packets have been
> > delivered and freed.
> > 
> 
> Which is wrong, since this is under userspace control, so you get
> unkillable processes.
> 

We do not get unkillable processes, at least with sockets I was thinking
about (TCP/UDP ones).

Maybe tun sockets can behave the same ?

Herbert Acked your patch, so I guess it's OK, but I think it can be
dangerous.

Anyway my feeling is that we try to add various mechanisms to keep a
hostile user from flooding another one.

For example, UDP got memory accounting quite recently, and we added
socket backlog limits very recently. It was considered not needed a few
years ago.





* Re: [PATCH] tun: orphan an skb on tx
From: Michael S. Tsirkin @ 2010-04-13 17:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Kiszka, David S. Miller, Herbert Xu, Paul Moore,
	David Woodhouse, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, qemu-devel
In-Reply-To: <1271176838.16881.537.camel@edumazet-laptop>

On Tue, Apr 13, 2010 at 06:40:38PM +0200, Eric Dumazet wrote:
> On Tuesday, 13 April 2010 at 17:36 +0200, Jan Kiszka wrote:
> > Michael S. Tsirkin wrote:
> > > The following situation was observed in the field:
> > > tap1 sends packets, tap2 does not consume them, as a result
> > > tap1 can not be closed.
> > 
> > And before that, tap1 may not be able to send further packets to anyone
> > else on the bridge as its TX resources were blocked by tap2 - that's
> > what we saw in the field.
> > 
> 
> After the patch, tap1 is able to flood tap2, and tap3/tap4 are not able to
> send a single frame. Is that OK ?

Yes :) This was always possible. Number of senders needed to flood
a receiver might vary depending on send/recv queue size
that you set. External sources can also fill your RX queue
if you let them. In the end, we need to rely on the scheduler for fairness,
or apply packet shaping.

> Back to the problem : tap1 cannot be closed.
> 
> Why ? because of refcounts ?

Yes.

> When a socket with inflight tx packets is closed, we don't block the
> close, we only delay the socket freeing until all packets have been
> delivered and freed.
> 

Which is wrong, since this is under userspace control, so you get
unkillable processes.

-- 
MST


* Re: [Bonding-devel] [v3 Patch 2/3] bridge: make bridge support netpoll
From: Stephen Hemminger @ 2010-04-13 17:33 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: Cong Wang, Eric Dumazet, Neil Horman, netdev, Andy Gospodarek,
	bridge, linux-kernel, bonding-devel, Jeff Moyer, Matt Mackall,
	David Miller
In-Reply-To: <8304.1271177567@death.nxdomain.ibm.com>

On Tue, 13 Apr 2010 09:52:47 -0700
Jay Vosburgh <fubar@us.ibm.com> wrote:

> Cong Wang <amwang@redhat.com> wrote:
> 
> >Stephen Hemminger wrote:
> >> On Mon, 12 Apr 2010 12:38:57 +0200
> >> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >> 
> >>> On Monday, 12 April 2010 at 18:37 +0800, Cong Wang wrote:
> >>>> Stephen Hemminger wrote:
> >>>>> There is no protection on dev->priv_flags for SMP access.
> >>>>> It would be better as a bit value in dev->state if you are using it as control flag.
> >>>>>
> >>>>> Then you could use 
> >>>>> 			if (unlikely(test_and_clear_bit(__IN_NETPOLL, &skb->dev->state)))
> >>>>> 				netpoll_send_skb(...)
> >>>>>
> >>>>>
> >>>> Hmm, I think we can't use ->state here, it is not for this kind of purpose,
> >>>> according to its comments.
> >>>>
> >>>> Also, I find other usages of IFF_XXX flags of ->priv_flags are also using
> >>>> &, | to set or clear the flags. So there must be some other things preventing
> >>>> the race...
> >>> Yes, it's RTNL that protects priv_flags changes, hopefully...
> >>>
> >> 
> >> The patch was not protecting priv_flags with RTNL.
> >> For example..
> >> 
> >> 
> >> @@ -308,7 +312,9 @@ static void netpoll_send_skb(struct netp
> >>  		     tries > 0; --tries) {
> >>  			if (__netif_tx_trylock(txq)) {
> >>  				if (!netif_tx_queue_stopped(txq)) {
> >> +					dev->priv_flags |= IFF_IN_NETPOLL;
> >>  					status = ops->ndo_start_xmit(skb, dev);
> >> +					dev->priv_flags &= ~IFF_IN_NETPOLL;
> >>  					if (status == NETDEV_TX_OK)
> >>  						txq_trans_update(txq);
> >
> >Hmm, but I checked the bonding case (IFF_BONDING), it doesn't
> >hold rtnl_lock. Strange.
> 
> 	I looked, and there are a couple of cases in bonding that don't
> have RTNL for adjusting priv_flags (in bond_ab_arp_probe when no slaves
> are up, and a couple of cases in 802.3ad).  I think the solution there
> is to move bonding away from priv_flags for some of this (e.g., convert
> bonding to use a frame hook like bridge and macvlan, and greatly
> simplify skb_bond_should_drop), but that's a separate topic.
> 
> 	The majority of the cases, however, do hold RTNL.  Bonding
> generally doesn't have to acquire RTNL itself, since whatever called
> into bonding is holding it already.  For example, the slave add and
> remove paths (bond_enslave, bond_release) are called either via sysfs or
> ioctl, both of which acquire RTNL.  All of the set and clear operations
> for IFF_BONDING fall into this category; look at bonding_store_slaves
> for an example.
> 
> 	Bonding does acquire RTNL itself when performing failovers,
> e.g., bond_mii_monitor holds RTNL prior to calling bond_miimon_commit,
> which will change priv_flags.
> 

All this was related to netpoll. And netpoll processing often needs to occur
in hard IRQ context. Therefore netpoll stuff and RTNL (which is a mutex),
really don't mix well.  Keep RTNL for what it was meant for: network
reconfiguration. Don't turn it into a network special BKL.
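
To spell out the SMP concern with the quoted hunk: a plain
"priv_flags |= X" followed by "priv_flags &= ~X" is a read-modify-write
of the whole word, so an update to another bit made on a different CPU
in between can be lost. The userspace sketch below shows the atomic
alternative using C11 atomics as a stand-in for the kernel's bit
operations. The model_* names and the bit number are hypothetical, and
this is an illustration of the idea rather than the proposed patch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, standing in for an "in netpoll" state bit. */
#define MODEL_IN_NETPOLL 0

/* Atomically set a bit; concurrent updates of other bits are not lost. */
static void model_set_bit(int nr, atomic_ulong *state)
{
	atomic_fetch_or(state, 1UL << nr);
}

/* Atomically clear a bit and report whether it had been set. */
static bool model_test_and_clear_bit(int nr, atomic_ulong *state)
{
	unsigned long mask = 1UL << nr;
	unsigned long old = atomic_fetch_and(state, ~mask);

	return (old & mask) != 0;
}
```

Because each helper is a single atomic operation, no lock is needed
around it, which matters here since the netpoll path cannot sleep on a
mutex such as RTNL.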



-- 


