Netdev List


* [PATCH  kernel 2.6.34-rc5] lib8390: to be SMP safe
From: Ken Kawasaki @ 2010-05-03 10:43 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20100307070256.cb86716d.ken_kawasaki@spring.nifty.jp>


lib8390:
	Write the value ENISR_ALL to register EN0_IMR only after
	enable_irq_lockdep_irqrestore() has been called.

	This patch avoids frequent transmit errors on SMP systems.


Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

---

--- linux-2.6.34-rc6/drivers/net/lib8390.c.orig	2010-05-02 16:49:57.000000000 +0900
+++ linux-2.6.34-rc6/drivers/net/lib8390.c	2010-05-02 18:09:18.000000000 +0900
@@ -367,9 +367,9 @@ static netdev_tx_t __ei_start_xmit(struc
 				dev->name, ei_local->tx1, ei_local->tx2, ei_local->lasttx);
 		ei_local->irqlock = 0;
 		netif_stop_queue(dev);
-		ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 		spin_unlock(&ei_local->page_lock);
 		enable_irq_lockdep_irqrestore(dev->irq, &flags);
+		ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 		dev->stats.tx_errors++;
 		return NETDEV_TX_BUSY;
 	}
@@ -407,10 +407,10 @@ static netdev_tx_t __ei_start_xmit(struc
 
 	/* Turn 8390 interrupts back on. */
 	ei_local->irqlock = 0;
-	ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
 	spin_unlock(&ei_local->page_lock);
 	enable_irq_lockdep_irqrestore(dev->irq, &flags);
+	ei_outb_p(ENISR_ALL, e8390_base + EN0_IMR);
 
 	dev_kfree_skb (skb);
 	dev->stats.tx_bytes += send_length;


* Re: [patch v2.2 1/4] [PATCH v2.1 1/4] netfilter: xt_ipvs (netfilter matcher for IPVS)
From: Hannes Eder @ 2010-05-03 11:29 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Simon Horman, lvs-devel, netdev, linux-kernel, netfilter,
	Wensong Zhang, Julius Volz, David S. Miller,
	Netfilter Development Mailinglist
In-Reply-To: <4BDC543F.7060500@trash.net>

Thank you for picking this series of patches up again and thanks for
the feedback.

I'll send an updated version in the next few days.

Cheers, -Hannes

On Sat, May 1, 2010 at 18:18, Patrick McHardy <kaber@trash.net> wrote:
> Simon Horman wrote:
>
>> @@ -0,0 +1,25 @@
>> +#ifndef _XT_IPVS_H
>> +#define _XT_IPVS_H 1
>
> You don't need to define a value.
>
>> +config NETFILTER_XT_MATCH_IPVS
>> +     tristate '"ipvs" match support'
>> +     depends on IP_VS
>> +     depends on NETFILTER_ADVANCED
>> +     help
>> +       This option allows you to match against IPVS properties of a packet.
>> +
>> +       If unsure, say N.
>
> You're using conntrack symbols, so this seems to need a dependency
> on NF_CONNTRACK.
>
>> +static bool ipvs_mt_check(const struct xt_mtchk_param *par)
>
> We've changed the signature to "int" in nf-next to be able to
> return errno codes. Please rebase your patches onto nf-next-2.6.git.
>
> Please also CC netfilter-devel at least for those parts that affect
> non-IPVS netfilter.
>
>> +{
>> +     if (par->family != NFPROTO_IPV4
>> +#ifdef CONFIG_IP_VS_IPV6
>> +         && par->family != NFPROTO_IPV6
>> +#endif
>> +             ) {
>> +             pr_info("protocol family %u not supported\n", par->family);
>> +             return false;
>> +     }
>> +
>> +     return true;
>> +}
>
>


* Re: [net-next-2.6 PATCH 2/2] add ndo_set_port_profile op support for enic dynamic vnics
From: Arnd Bergmann @ 2010-05-03 11:32 UTC (permalink / raw)
  To: Vivek Kashyap; +Cc: Scott Feldman, davem, netdev, chrisw, Jens Osterkamp
In-Reply-To: <alpine.LFD.2.00.1005022119140.16925@vk>

On Monday 03 May 2010, Vivek Kashyap wrote:
> > After a successful pre-associate-with-resource-reservation step, we
> > know that the actual associate step will be both fast and successful.
> > After it completes, the VSI is known to be on the destination
> > and all traffic goes there (replacing the gratuitous ARP method we do
> > today).
> >
> > I don't think we'd ever do a pre-associate without the
> > resource-reservation, but the standard defines both. In theory,
> > we could do a pre-associate at every switch in the data center
> > in order to find out if it's possible to migrate there.
> >
> > If you want to have more details, please look at the draft spec at
> > http://www.ieee802.org/1/files/public/docs2010/bg-joint-evb-0410v1.pdf
> 
> The basic difference is that in 'pre-associate with resource reservation', the
> local buffers and resources needed for the eventual 'associate' are reserved
> at the switch port.  Therefore the associate will not fail with 
> 'insufficient resources'. It might otherwise.

Yes, that's exactly what I wrote. So do you have any idea why we would
ever not want to do the resource reservation?

	Arnd


* VLAN I/F's and TX queue.
From: Joakim Tjernlund @ 2010-05-03 11:34 UTC (permalink / raw)
  To: netdev

We noticed dropped packets on our VLAN interfaces and I started to look
for a cause. Here is an ifconfig example:

eth0      Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8886910 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8880219 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:1626842951 (1.5 GiB)  TX bytes:1555540810 (1.4 GiB)

eth0.1    Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2467090557 (2.2 GiB)  TX bytes:2480246455 (2.3 GiB)

eth0.1.1  Link encap:Ethernet  HWaddr 00:AA:BB:CC:DD:EE
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2163164 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2161943 errors:0 dropped:98 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2458437901 (2.2 GiB)  TX bytes:2471598683 (2.3 GiB)

Here I note that txqueuelen is 0 for eth0.1/eth0.1.1 and 100 for eth0, and
that only eth0.1 and eth0.1.1 drop packets. It feels as if eth0.1
bypasses eth0's tx queue and passes packets directly to the HW driver. Is that so?
If so, that feels a bit strange and I am not sure how best to
fix this. Any ideas?

Using kernel 2.6.33

     Jocke


* Re: ep93xx_eth stopps receiving packages
From: Stefan Agner @ 2010-05-03 11:37 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev
In-Reply-To: <20100502104350.GS4586@mail.wantstofly.org>

Quoting Lennert Buytenhek <buytenh@wantstofly.org>:

> On Mon, Apr 19, 2010 at 05:38:13PM +0200, Stefan Agner wrote:
>
>> I'm using Linux 2.6.32.9 on a technologic systems TS-7250 SBC board, with
>> the ep93xx_eth driver for networking. On three identical, but independent
>> systems I noted that the system is unreachable after a while. On a serial
>> terminal I noted that only the TX counter counts onward, RX stays where it
>> is,
>> no matter if i try to ping from or to the system. Wireshark tells me exactly
>> that too: I see helpless ARP requests which gets answered, but no ICMP. The
>> system doesnt receive the ARP requests, and just sends another one.
>
> (So does the board or does it not respond to ARP requests for its IP?)
The board does not respond to ARP requests for its IP...

>> With a simple program which sends small packages in a fast pace I can
>> reproduce the problem after several seconds (additional CPU load seem to
>> provoke the problem even more). Remove and replug the network cable doesn't
>> solve the problem, but ifup/down does. I don't see any messages in dmesg,
>> memory is still available.
>
> Do you see interrupts increasing in /proc/interrupts when this happens?
No, the interrupt count doesn't increase anymore when it happens...

I debugged the problem myself in the meantime, I just did not take the time to
format and send the patch, sorry! There is a bug when interrupts are disabled
for a longer period and a frame arrives while one is just being processed.
ep93xx_rx then gets called twice, and the second call marks too many buffers
as released. I corrected this error by releasing the correct number of buffers
when ep93xx_poll ends... Patch follows!
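
The accounting problem described above can be modelled in user space. This is only a sketch under stated assumptions, not the driver code: `struct fake_mac`, `rx_pass()`, `poll_buggy()` and `poll_fixed()` are hypothetical names, and the model assumes that the count handed to each receive pass is a running total for the whole poll, so writing it to the enqueue register on every pass releases the earlier frames again.

```c
#include <assert.h>

/* Hypothetical stand-in for the enqueue register: it simply counts
 * how many receive descriptors have been handed back to the MAC. */
struct fake_mac {
	int enqueued;
};

/* Models one receive pass. 'processed' is the running total for this
 * poll, 'arrived' is the number of frames completed in this pass.
 * Returns the new running total. */
static int rx_pass(int processed, int arrived)
{
	return processed + arrived;
}

/* Old behaviour (assumed): every pass writes the running total,
 * so frames from earlier passes are released more than once. */
static int poll_buggy(struct fake_mac *mac, const int *bursts, int n)
{
	int rx = 0;

	for (int i = 0; i < n; i++) {
		rx = rx_pass(rx, bursts[i]);
		if (rx)
			mac->enqueued += rx;
	}
	return rx;
}

/* Fixed behaviour: the total is written exactly once, at the end
 * of the poll, mirroring where the patch moves the register writes. */
static int poll_fixed(struct fake_mac *mac, const int *bursts, int n)
{
	int rx = 0;

	for (int i = 0; i < n; i++)
		rx = rx_pass(rx, bursts[i]);
	if (rx)
		mac->enqueued += rx;
	return rx;
}
```

With two passes of 3 and 2 frames, the buggy variant releases 3 + 5 descriptors for only 5 processed frames, while the fixed variant releases 5.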

Stefan


-- 
Stefan Agner


* [PATCH] ep93xx_eth stopps receiving packets
From: Stefan Agner @ 2010-05-03 11:42 UTC (permalink / raw)
  To: Lennert Buytenhek; +Cc: netdev
In-Reply-To: <20100502104350.GS4586@mail.wantstofly.org>

Receiving small packets at a fast pace leads to no packets being
received at all after some time.

After Ethernet packets have arrived, the receive descriptor is incremented
by the number of frames processed. If another packet arrives while
processing, it is handled in another call of ep93xx_rx. This
second call leads to too many receive descriptors being released.

This fix increments the receive descriptors by the right number of
processed frames, even in this case.

Signed-off-by: Stefan Agner <stefan@agner.ch>
---
  drivers/net/arm/ep93xx_eth.c |   10 +++++-----
  1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/arm/ep93xx_eth.c b/drivers/net/arm/ep93xx_eth.c
index 6995169..cd6cd3e 100644
--- a/drivers/net/arm/ep93xx_eth.c
+++ b/drivers/net/arm/ep93xx_eth.c
@@ -311,11 +311,6 @@ err:
  		processed++;
  	}

-	if (processed) {
-		wrw(ep, REG_RXDENQ, processed);
-		wrw(ep, REG_RXSTSENQ, processed);
-	}
-
  	return processed;
  }

@@ -350,6 +345,11 @@ poll_some_more:
  			goto poll_some_more;
  	}

+	if (rx) {
+		wrw(ep, REG_RXDENQ, rx);
+		wrw(ep, REG_RXSTSENQ, rx);
+	}
+
  	return rx;
  }

-- 
1.7.0


* Re: [PATCH 1/2] ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid
From: Simon Arlott @ 2010-05-03 11:50 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, paulus, linux-ppp
In-Reply-To: <20100502.232520.146109082.davem@davemloft.net>

[-- Attachment #1: Type: text/plain, Size: 781 bytes --]

On Mon, May 3, 2010 07:25, David Miller wrote:
> From: Simon Arlott <simon@fire.lp0.eu>
> Date: Fri, 30 Apr 2010 19:41:17 +0100
>> @@ -1572,8 +1572,18 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
>>  		return;
>>  	}
>>
>> -	proto = PPP_PROTO(skb);
>> +
>>  	read_lock_bh(&pch->upl);
>> +	if (!pskb_may_pull(skb, 2)) {
>
> This makes the skb->len == 0 test at the beginning completely redundant.
>
> Put your pskb_may_pull(skb, 2) call there and remove the skb->len==0
> check entirely.

If I move pskb_may_pull(skb, 2) up to where skb->len == 0 is then it can't
increment rx_length_errors because it doesn't have the read lock on pch->upl,
so I can only remove the redundant skb->len == 0 if that error count is to
remain.
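
The check under discussion, making sure two bytes are directly readable before the protocol field is interpreted, can be sketched in user space. This is a toy model under stated assumptions: `struct toy_buf`, `toy_may_pull()` and `toy_proto()` are hypothetical and only loosely imitate what pskb_may_pull() and PPP_PROTO() do for a non-linear skb.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy buffer: 'head' holds the bytes that can be read directly,
 * 'frag' holds the remainder of the packet (the non-linear part). */
struct toy_buf {
	unsigned char head[64];
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/* Rough analogue of pskb_may_pull(): ensure 'len' bytes are readable
 * from 'head', copying from 'frag' if needed. Returns 1 on success,
 * 0 if the packet is too short (or the toy head is too small). */
static int toy_may_pull(struct toy_buf *b, size_t len)
{
	size_t need;

	if (len <= b->head_len)
		return 1;
	need = len - b->head_len;
	if (need > b->frag_len || len > sizeof(b->head))
		return 0;
	memcpy(b->head + b->head_len, b->frag, need);
	b->head_len += need;
	b->frag += need;
	b->frag_len -= need;
	return 1;
}

/* Returns the 16-bit protocol number, or -1 when fewer than two
 * bytes exist, instead of reading past the valid data. */
static int toy_proto(struct toy_buf *b)
{
	if (!toy_may_pull(b, 2))
		return -1;
	return (b->head[0] << 8) | b->head[1];
}
```

A buffer with one byte in `head` and the rest in `frag` yields the right protocol number only because the second byte is pulled first; a one-byte packet is rejected rather than misread.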

Updated patch attached.

-- 
Simon Arlott

[-- Attachment #2: 0001-ppp_generic-pull-2-bytes-so-that-PPP_PROTO-skb-is-va.patch --]
[-- Type: application/octet-stream, Size: 2627 bytes --]

From f6d225971143db1ff5353008d20579e1de75f00d Mon Sep 17 00:00:00 2001
From: Simon Arlott <simon@fire.lp0.eu>
Date: Fri, 30 Apr 2010 19:04:33 +0100
Subject: [PATCH 1/2] ppp_generic: pull 2 bytes so that PPP_PROTO(skb) is valid

In ppp_input(), PPP_PROTO(skb) may refer to invalid data in the skb.

If this happens and (proto >= 0xc000 || proto == PPP_CCPFRAG) then
the packet is passed directly to pppd.

This occurs frequently when using PPPoE with an interface MTU
greater than 1500 because the skb is more likely to be non-linear.

The next 2 bytes need to be pulled in ppp_input(). The pull of 2
bytes in ppp_receive_frame() has been removed as it is no longer
required.

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
---
 drivers/net/ppp_generic.c |   29 ++++++++++++++++++-----------
 1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c
index 6e281bc..75e8903 100644
--- a/drivers/net/ppp_generic.c
+++ b/drivers/net/ppp_generic.c
@@ -1567,13 +1567,22 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
 	struct channel *pch = chan->ppp;
 	int proto;
 
-	if (!pch || skb->len == 0) {
+	if (!pch) {
 		kfree_skb(skb);
 		return;
 	}
 
-	proto = PPP_PROTO(skb);
 	read_lock_bh(&pch->upl);
+	if (!pskb_may_pull(skb, 2)) {
+		kfree_skb(skb);
+		if (pch->ppp) {
+			++pch->ppp->dev->stats.rx_length_errors;
+			ppp_receive_error(pch->ppp);
+		}
+		goto done;
+	}
+
+	proto = PPP_PROTO(skb);
 	if (!pch->ppp || proto >= 0xc000 || proto == PPP_CCPFRAG) {
 		/* put it on the channel queue */
 		skb_queue_tail(&pch->file.rq, skb);
@@ -1585,6 +1594,8 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb)
 	} else {
 		ppp_do_recv(pch->ppp, skb, pch);
 	}
+
+done:
 	read_unlock_bh(&pch->upl);
 }
 
@@ -1617,7 +1628,8 @@ ppp_input_error(struct ppp_channel *chan, int code)
 static void
 ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
 {
-	if (pskb_may_pull(skb, 2)) {
+	/* note: a 0-length skb is used as an error indication */
+	if (skb->len > 0) {
 #ifdef CONFIG_PPP_MULTILINK
 		/* XXX do channel-level decompression here */
 		if (PPP_PROTO(skb) == PPP_MP)
@@ -1625,15 +1637,10 @@ ppp_receive_frame(struct ppp *ppp, struct sk_buff *skb, struct channel *pch)
 		else
 #endif /* CONFIG_PPP_MULTILINK */
 			ppp_receive_nonmp_frame(ppp, skb);
-		return;
+	} else {
+		kfree_skb(skb);
+		ppp_receive_error(ppp);
 	}
-
-	if (skb->len > 0)
-		/* note: a 0-length skb is used as an error indication */
-		++ppp->dev->stats.rx_length_errors;
-
-	kfree_skb(skb);
-	ppp_receive_error(ppp);
 }
 
 static void
-- 
1.7.0.4



* Re: [PATCH 3/3] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Kumar Gala @ 2010-05-03 12:35 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Netdev, linuxppc-dev, devicetree-discuss, Sandeep Gopalpet
In-Reply-To: <20100503062617.GA3310@riccoc20.at.omicron.at>

On May 3, 2010, at 1:26 AM, Richard Cochran wrote:

> On Sat, May 01, 2010 at 11:36:12AM -0500, Kumar Gala wrote:
>> Is there a binding document that describes this node you are adding?
> 
> No, but I will add one to Documentation/powerpc/dts-bindings.

Please do so we can review and comment.

- k


* [PATCH] unix/garbage: kill copy of the skb queue walker
From: Ilpo Järvinen @ 2010-05-03 13:22 UTC (permalink / raw)
  To: David Miller; +Cc: Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1569 bytes --]

Worse yet, it seems that its arguments were in reverse order. Also
remove one related helper which seems hardly worth keeping.

Compile tested.
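
For readers unfamiliar with the "safe" walker pattern, here is a minimal user-space sketch. It is an illustration only: `struct node`, `walk_safe()` and the helpers are hypothetical, and the macro merely copies the (queue, cursor, lookahead) argument order that skb_queue_walk_safe() is called with in the diff below.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next, *prev;
	int val;
};

/* The list head is a sentinel; an empty list points at itself. */
static void list_init(struct node *head)
{
	head->next = head->prev = head;
	head->val = 0;
}

static void list_add_tail(struct node *head, struct node *n)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_unlink(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

/* 'tmp' always holds the successor of 'pos', fetched before the loop
 * body runs, so the body may unlink 'pos' without breaking the walk. */
#define walk_safe(head, pos, tmp)			\
	for (pos = (head)->next, tmp = pos->next;	\
	     pos != (head);				\
	     pos = tmp, tmp = pos->next)

/* Unlink every node with an odd value; returns how many remain. */
static int drop_odd(struct node *head)
{
	struct node *pos, *tmp;
	int kept = 0;

	walk_safe(head, pos, tmp) {
		if (pos->val & 1)
			list_unlink(pos);
		else
			kept++;
	}
	return kept;
}
```

Swapping the cursor and lookahead arguments at a call site would make the loop body operate on the successor rather than the current element, which is why the argument order matters.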

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/unix/garbage.c |   13 ++-----------
 1 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 14c22c3..c8df6fd 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -153,15 +153,6 @@ void unix_notinflight(struct file *fp)
 	}
 }
 
-static inline struct sk_buff *sock_queue_head(struct sock *sk)
-{
-	return (struct sk_buff *)&sk->sk_receive_queue;
-}
-
-#define receive_queue_for_each_skb(sk, next, skb) \
-	for (skb = sock_queue_head(sk)->next, next = skb->next; \
-	     skb != sock_queue_head(sk); skb = next, next = skb->next)
-
 static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 			  struct sk_buff_head *hitlist)
 {
@@ -169,7 +160,7 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
 	struct sk_buff *next;
 
 	spin_lock(&x->sk_receive_queue.lock);
-	receive_queue_for_each_skb(x, next, skb) {
+	skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
 		/*
 		 *	Do we have file descriptors ?
 		 */
@@ -225,7 +216,7 @@ static void scan_children(struct sock *x, void (*func)(struct unix_sock *),
 		 * and perform a scan on them as well.
 		 */
 		spin_lock(&x->sk_receive_queue.lock);
-		receive_queue_for_each_skb(x, next, skb) {
+		skb_queue_walk_safe(&x->sk_receive_queue, skb, next) {
 			u = unix_sk(skb->sk);
 
 			/*
-- 
1.5.6.3


* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Arjan van de Ven @ 2010-05-03 14:09 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Eric Dumazet, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100503103426.GA25809@one.firstfloor.org>

On Mon, 3 May 2010 12:34:26 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> > > Maybe its low cost, (apparently, it is, since I can reach ~900.000
> > > ipis on my 16 cores machine) but multiply this by 16 or 32 or 64
> > > cpus, and clockevents_notify() cost appears to be a killer, all
> > > cpus compete on a single lock.
> > > 
> > > Maybe this notifier could use RCU ?
> > 
> > could this be an artifact of the local apic stopping in deeper C
> > states? (which is finally fixed in the Westmere generation)
> 
> Yes it is I think.
> 
> But I suspect Eric wants a solution for Nehalem.

sure ;-)

so the hard problem is that on going idle, the local timers need to be
funneled to the external HPET. Afaik right now we use one channel of
the hpet, with the result that we have one global lock for this.

HPETs have more than one channel (2 or 3 historically, newer chipsets
iirc have a few more), so in principle we can split this lock at least
a little bit... if we can get to one hpet channel per level 3 cache
domain we'd already make huge progress in terms of cost of the
contention....
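
To make the lock-splitting idea concrete, here is a small sketch of how CPUs might be grouped per cache domain. Everything in it is hypothetical: the function names, the fixed number of CPUs per L3 domain, and the round-robin assignment of domains to channels are assumptions for illustration, not how the clockevents code works.

```c
#include <assert.h>

/* Hypothetical mapping: CPUs sharing an L3 cache share one HPET
 * channel, and therefore one lock. Both divisors must be non-zero. */
static unsigned int hpet_channel_for_cpu(unsigned int cpu,
					 unsigned int cpus_per_l3,
					 unsigned int nr_channels)
{
	unsigned int domain = cpu / cpus_per_l3;

	return domain % nr_channels;
}

/* Worst-case number of CPUs contending on a single channel lock. */
static unsigned int max_contenders(unsigned int nr_cpus,
				   unsigned int cpus_per_l3,
				   unsigned int nr_channels)
{
	unsigned int worst = 0;

	for (unsigned int ch = 0; ch < nr_channels; ch++) {
		unsigned int n = 0;

		for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
			if (hpet_channel_for_cpu(cpu, cpus_per_l3,
						 nr_channels) == ch)
				n++;
		if (n > worst)
			worst = n;
	}
	return worst;
}
```

With 16 CPUs in four L3 domains, a single channel means 16 contenders on one lock, three channels leave at most 8, and four channels leave 4.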

-- 
Arjan van de Ven 	Intel Open Source Technology Centre
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* [PATCH v2] ethernet: call __skb_pull() in eth_type_trans()
From: Changli Gao @ 2010-05-03 14:12 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev, Changli Gao

Call __skb_pull() in eth_type_trans().

The callers of eth_type_trans() should always feed it long enough packets. When
the length of the packet is less than ETH_ZLEN, a warning message will be shown,
and the subsequent behavior is undefined.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
---
 net/ethernet/eth.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 61ec032..1df31cc 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -162,7 +162,10 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
 
 	skb->dev = dev;
 	skb_reset_mac_header(skb);
-	skb_pull_inline(skb, ETH_HLEN);
+	if (unlikely(skb->len < ETH_ZLEN))
+		dev_warn(&dev->dev, "too small ethernet packet: %u bytes\n",
+			 skb->len);
+	__skb_pull(skb, ETH_HLEN);
 	eth = eth_hdr(skb);
 
 	if (unlikely(is_multicast_ether_addr(eth->h_dest))) {


* Re: [PATCH] [RFC] C/R: inet4 and inet6 unicast routes (v2)
From: Dan Smith @ 2010-05-03 14:21 UTC (permalink / raw)
  To: hadi; +Cc: Daniel Lezcano, containers, Vlad Yasevich, netdev, David Miller
In-Reply-To: <1272673614.14499.10.camel@bigi>

j> The problem as i see it (with all net structures not just routes -
j> i was equally pessimistic when i saw those other net structure
j> checkpoint/restore changes) is you are faced with a herculean
j> high-maintainance effort...  You have a separate piece of code
j> which populates structures that _you_ maintain for attributes that
j> are defined elsewhere by other people.  Nobody adding a new
j> attribute that is very important to route restoration for example
j> is likely to change your code. Unless you tie the two together (so
j> changing one forces the coder to change the other).  And once
j> people deploy kernels it is hard to change. Historically (for
j> pragmatic reasons) such rich interfaces sit in user space - much
j> easier to update user space.

The benefits of doing what we can in userspace are well-understood and
arguing for doing so where it makes sense is, of course, a good idea.

However, it seems to me that the rtnl interface provides us a
reasonable layer of isolation between us and such changes.  Am I
wrong?  The rtnl messages appear to be rather generic and timeless,
and in most cases have a significant amount of flexibility with
respect to allowing advanced attributes to be ignored (which implies
taking the default).

In many other areas of C/R we're not so lucky and don't have a
well-defined interface for dumping that information out of the
kernel...

-- 
Dan Smith
IBM Linux Technology Center
email: danms@us.ibm.com


* [PATCH] sky2: Avoid race in sky2_change_mtu
From: Mike McCormack @ 2010-05-03 14:18 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev

netif_stop_queue does not ensure all in-progress transmits are complete,
so use netif_tx_disable() instead.

Make sure NAPI polls are disabled, otherwise NAPI might trigger a TX
restart between when we stop the queue and NAPI is disabled.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
 drivers/net/sky2.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 088c797..b839bae 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2236,8 +2236,8 @@ static int sky2_change_mtu(struct net_device *dev, int new_mtu)
 	sky2_write32(hw, B0_IMSK, 0);
 
 	dev->trans_start = jiffies;	/* prevent tx timeout */
-	netif_stop_queue(dev);
 	napi_disable(&hw->napi);
+	netif_tx_disable(dev);
 
 	synchronize_irq(hw->pdev->irq);
 
-- 
1.5.6.5



* Re: mmotm 2010-04-28 - RCU whinges
From: Valdis.Kletnieks @ 2010-05-03 14:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Peter Zijlstra, Patrick McHardy, David S. Miller,
	linux-kernel, netfilter-devel, netdev, Paul E. McKenney
In-Reply-To: <1272865137.2173.179.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 2704 bytes --]

On Mon, 03 May 2010 07:38:57 +0200, Eric Dumazet said:
> Le dimanche 02 mai 2010 à 13:46 -0400, Valdis.Kletnieks@vt.edu a écrit :
> > On Wed, 28 Apr 2010 16:53:32 PDT, akpm@linux-foundation.org said:
> > > The mm-of-the-moment snapshot 2010-04-28-16-53 has been uploaded to
> > > 
> > >    http://userweb.kernel.org/~akpm/mmotm/
> > 
> > I thought we swatted all these, hit another one...

> Thanks for the report !
> 
> We can use rcu_dereference_protected() in those cases.
> 
> [PATCH] net: Use rcu_dereference_protected in nf_conntrack_ecache
> 
> Writers own nf_ct_ecache_mutex.

I *really* thought we swatted a bunch of these - did the fixes not make it
into linux-next or -mm?  Your patch fixed that one, but then:

[    9.128899] Netfilter messages via NETLINK v0.30.
[    9.128919] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
[    9.129108] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
[    9.129110] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
[    9.129113] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
[    9.129135] ctnetlink v0.93: registering with nfnetlink.
[    9.129452] ip_tables: (C) 2000-2006 Netfilter Core Team
[    9.129506] 
[    9.129507] ===================================================
[    9.129683] [ INFO: suspicious rcu_dereference_check() usage. ]
[    9.129777] ---------------------------------------------------
[    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
[    9.129969] 
[    9.129969] other info that might help us debug this:
[    9.129970] 
[    9.130232] 
[    9.130232] rcu_scheduler_active = 1, debug_locks = 0
[    9.130407] 1 lock held by swapper/1:
[    9.130525]  #0:  (nf_log_mutex){+.+...}, at: [<ffffffff81481154>] nf_log_register+0x57/0x10f
[    9.130955] 
[    9.130956] stack backtrace:
[    9.131162] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #2
[    9.131259] Call Trace:
[    9.131370]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[    9.131466]  [<ffffffff814811db>] nf_log_register+0xde/0x10f
[    9.131579]  [<ffffffff81b5ca28>] ? log_tg_init+0x0/0x29
[    9.131689]  [<ffffffff81b5ca4d>] log_tg_init+0x25/0x29
[    9.131800]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
[    9.131912]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
[    9.132033]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
[    9.132146]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
[    9.132257]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
[    9.132370]  [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
[    9.132513] TCP bic registered



* Re: [PATCH v2] ethernet: call __skb_pull() in eth_type_trans()
From: Eric Dumazet @ 2010-05-03 14:44 UTC (permalink / raw)
  To: Changli Gao; +Cc: David Miller, netdev
In-Reply-To: <1272895972-13799-1-git-send-email-xiaosuo@gmail.com>

Le lundi 03 mai 2010 à 22:12 +0800, Changli Gao a écrit :
> call __skb_pull() in eth_type_trans().
> 
> The callers of eth_type_trans() should always feed it long enough packets. When
> the length of the packet is less than ETH_ZLEN, a warning message will be shown,
> and the later behaviors are undefined.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
> ----
>  net/ethernet/eth.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
> index 61ec032..1df31cc 100644
> --- a/net/ethernet/eth.c
> +++ b/net/ethernet/eth.c
> @@ -162,7 +162,10 @@ __be16 eth_type_trans(struct sk_buff *skb, struct net_device *dev)
>  
>  	skb->dev = dev;
>  	skb_reset_mac_header(skb);
> -	skb_pull_inline(skb, ETH_HLEN);
> +	if (unlikely(skb->len < ETH_ZLEN))
> +		dev_warn(&dev->dev, "too small ethernet packet: %u bytes\n",
> +			 skb->len);
> +	__skb_pull(skb, ETH_HLEN);
>  	eth = eth_hdr(skb);
>  
>  	if (unlikely(is_multicast_ether_addr(eth->h_dest))) {


Hmm, I feel very uncomfortable with this patch.

I am pretty sure some callers don't check the minimum Ethernet frame length.

At least a WARN_ON_ONCE() is needed, just in case...
In fact our stack has different requirements.

Check net/ipv4/ip_gre.c for example.

                if (tunnel->dev->type == ARPHRD_ETHER) {
                        if (!pskb_may_pull(skb, ETH_HLEN)) {
                                stats->rx_length_errors++;
                                stats->rx_errors++;
                                goto drop;
                        }

                        iph = ip_hdr(skb);
                        skb->protocol = eth_type_trans(skb, tunnel->dev);
                        skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
                }
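
The mismatch between the two length requirements can be shown with a tiny user-space sketch. `caller_accepts()` and `patch_warns()` are hypothetical helpers; only the constants ETH_HLEN (14) and ETH_ZLEN (60) are the real values from <linux/if_ether.h>, redefined here so the example builds on its own.

```c
#include <assert.h>

#define ETH_HLEN 14	/* Ethernet header length in bytes */
#define ETH_ZLEN 60	/* minimum frame length in bytes, without FCS */

/* What a tunnel-style caller such as the snippet above guarantees
 * before calling eth_type_trans(): only that the header is present. */
static int caller_accepts(unsigned int len)
{
	return len >= ETH_HLEN;
}

/* The condition the proposed patch would warn on. */
static int patch_warns(unsigned int len)
{
	return len < ETH_ZLEN;
}
```

Any frame between 14 and 59 bytes passes the caller's check and would still trigger the proposed warning, which is the gap being pointed out.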




* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Brian Bloniarz @ 2010-05-03 14:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Eric Dumazet, David Miller, hadi, xiaosuo, therbert,
	shemminger, netdev, lenb
In-Reply-To: <20100503070925.572bbee6@infradead.org>

Arjan van de Ven wrote:
> On Mon, 3 May 2010 12:34:26 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
> 
>>>> Maybe its low cost, (apparently, it is, since I can reach ~900.000
>>>> ipis on my 16 cores machine) but multiply this by 16 or 32 or 64
>>>> cpus, and clockevents_notify() cost appears to be a killer, all
>>>> cpus compete on a single lock.
>>>>
>>>> Maybe this notifier could use RCU ?
>>> could this be an artifact of the local apic stopping in deeper C
>>> states? (which is finally fixed in the Westmere generation)
>> Yes it is I think.
>>
>> But I suspect Eric wants a solution for Nehalem.
> 
> sure ;-)
> 
> 
> so the hard problem is that on going idle, the local timers need to be
> funneled to the external HPET. Afaik right now we use one channel of
> the hpet, with the result that we have one global lock for this.

Does the HPET only need to be programmed when going idle?
That could mean that this isn't a big performance issue.
Who cares if you spin for a while when you're about to sleep for
at least 60usec?

> HPETs have more than one channel (2 or 3 historically, newer chipsets
> iirc have a few more), so in principle we can split this lock at least
> a little bit... if we can get to one hpet channel per level 3 cache
> domain we'd already make huge progress in terms of cost of the
> contention....

Another possible approach: if a core needs the HPET and finds it
locked, it could queue up its request to a backlog which the
locking core will service.


* Re: mmotm 2010-04-28 - RCU whinges
From: Eric Dumazet @ 2010-05-03 14:48 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andrew Morton, Peter Zijlstra, Patrick McHardy, David S. Miller,
	linux-kernel, netfilter-devel, netdev, Paul E. McKenney
In-Reply-To: <5933.1272897014@localhost>

Le lundi 03 mai 2010 à 10:30 -0400, Valdis.Kletnieks@vt.edu a écrit :

> 
> I *really* thought we swatted a bunch of these - did the fixes not make it
> into linux-next or -mm?  Your patch fixed that one, but then:
> 
> [    9.128899] Netfilter messages via NETLINK v0.30.
> [    9.128919] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
> [    9.129108] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
> [    9.129110] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
> [    9.129113] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
> [    9.129135] ctnetlink v0.93: registering with nfnetlink.
> [    9.129452] ip_tables: (C) 2000-2006 Netfilter Core Team
> [    9.129506] 
> [    9.129507] ===================================================
> [    9.129683] [ INFO: suspicious rcu_dereference_check() usage. ]
> [    9.129777] ---------------------------------------------------
> [    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
> [    9.129969] 
> [    9.129969] other info that might help us debug this:
> [    9.129970] 
> [    9.130232] 
> [    9.130232] rcu_scheduler_active = 1, debug_locks = 0
> [    9.130407] 1 lock held by swapper/1:
> [    9.130525]  #0:  (nf_log_mutex){+.+...}, at: [<ffffffff81481154>] nf_log_register+0x57/0x10f
> [    9.130955] 
> [    9.130956] stack backtrace:
> [    9.131162] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #2
> [    9.131259] Call Trace:
> [    9.131370]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
> [    9.131466]  [<ffffffff814811db>] nf_log_register+0xde/0x10f
> [    9.131579]  [<ffffffff81b5ca28>] ? log_tg_init+0x0/0x29
> [    9.131689]  [<ffffffff81b5ca4d>] log_tg_init+0x25/0x29
> [    9.131800]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
> [    9.131912]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
> [    9.132033]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
> [    9.132146]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
> [    9.132257]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
> [    9.132370]  [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
> [    9.132513] TCP bic registered
> 

You probably know this PROVE_RCU thing is new and reserved for
developers?

We still have to change all the spots where rcu_dereference() was used
without rcu_read_lock(). Not a bug by itself; lockdep just needs to be
instructed not to shout.

Maybe 30 patches are already in, and maybe 30 more are still needed.


* Re: mmotm 2010-04-28 - RCU whinges
From: Eric Dumazet @ 2010-05-03 14:58 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andrew Morton, Peter Zijlstra, Patrick McHardy, David S. Miller,
	linux-kernel, netfilter-devel, netdev, Paul E. McKenney
In-Reply-To: <5933.1272897014@localhost>

Le lundi 03 mai 2010 à 10:30 -0400, Valdis.Kletnieks@vt.edu a écrit :

> [    9.128899] Netfilter messages via NETLINK v0.30.
> [    9.128919] nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
> [    9.129108] CONFIG_NF_CT_ACCT is deprecated and will be removed soon. Please use
> [    9.129110] nf_conntrack.acct=1 kernel parameter, acct=1 nf_conntrack module option or
> [    9.129113] sysctl net.netfilter.nf_conntrack_acct=1 to enable it.
> [    9.129135] ctnetlink v0.93: registering with nfnetlink.
> [    9.129452] ip_tables: (C) 2000-2006 Netfilter Core Team
> [    9.129506] 
> [    9.129507] ===================================================
> [    9.129683] [ INFO: suspicious rcu_dereference_check() usage. ]
> [    9.129777] ---------------------------------------------------
> [    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
> [    9.129969] 
> [    9.129969] other info that might help us debug this:
> [    9.129970] 
> [    9.130232] 
> [    9.130232] rcu_scheduler_active = 1, debug_locks = 0
> [    9.130407] 1 lock held by swapper/1:
> [    9.130525]  #0:  (nf_log_mutex){+.+...}, at: [<ffffffff81481154>] nf_log_register+0x57/0x10f
> [    9.130955] 
> [    9.130956] stack backtrace:
> [    9.131162] Pid: 1, comm: swapper Tainted: G        W   2.6.34-rc5-mmotm0428 #2
> [    9.131259] Call Trace:
> [    9.131370]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
> [    9.131466]  [<ffffffff814811db>] nf_log_register+0xde/0x10f
> [    9.131579]  [<ffffffff81b5ca28>] ? log_tg_init+0x0/0x29
> [    9.131689]  [<ffffffff81b5ca4d>] log_tg_init+0x25/0x29
> [    9.131800]  [<ffffffff810001ef>] do_one_initcall+0x59/0x14e
> [    9.131912]  [<ffffffff81b2e68a>] kernel_init+0x144/0x1ce
> [    9.132033]  [<ffffffff81003414>] kernel_thread_helper+0x4/0x10
> [    9.132146]  [<ffffffff81598a40>] ? restore_args+0x0/0x30
> [    9.132257]  [<ffffffff81b2e546>] ? kernel_init+0x0/0x1ce
> [    9.132370]  [<ffffffff81003410>] ? kernel_thread_helper+0x0/0x10
> [    9.132513] TCP bic registered
> 

Thanks for the report!

[PATCH] net: nf_log RCU fixes

nf_log_register() and nf_log_unregister() use a mutex to have exclusive
access to nf_loggers[]. Use the appropriate rcu_dereference_protected()
lockdep annotation.

Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
diff --git a/net/netfilter/nf_log.c b/net/netfilter/nf_log.c
index 015725a..7df37fd 100644
--- a/net/netfilter/nf_log.c
+++ b/net/netfilter/nf_log.c
@@ -52,7 +52,8 @@ int nf_log_register(u_int8_t pf, struct nf_logger *logger)
 	} else {
 		/* register at end of list to honor first register win */
 		list_add_tail(&logger->list[pf], &nf_loggers_l[pf]);
-		llog = rcu_dereference(nf_loggers[pf]);
+		llog = rcu_dereference_protected(nf_loggers[pf],
+						 lockdep_is_held(&nf_log_mutex));
 		if (llog == NULL)
 			rcu_assign_pointer(nf_loggers[pf], logger);
 	}
@@ -70,7 +71,8 @@ void nf_log_unregister(struct nf_logger *logger)
 
 	mutex_lock(&nf_log_mutex);
 	for (i = 0; i < ARRAY_SIZE(nf_loggers); i++) {
-		c_logger = rcu_dereference(nf_loggers[i]);
+		c_logger = rcu_dereference_protected(nf_loggers[i],
+						     lockdep_is_held(&nf_log_mutex));
 		if (c_logger == logger)
 			rcu_assign_pointer(nf_loggers[i], NULL);
 		list_del(&logger->list[i]);
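
The idea behind the annotation can be illustrated outside the kernel. The
following is a minimal userspace sketch, not kernel code: every toy_* name
is hypothetical, a plain flag stands in for lockdep_is_held(), and a counter
stands in for the lockdep report. It only shows that an update-side read
names the condition that makes it safe, and that the checker complains when
the condition is false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct toy_logger { const char *name; };

#define TOY_NPROTO 4

static struct toy_logger *toy_loggers[TOY_NPROTO];
static bool toy_mutex_held;	/* stands in for lockdep_is_held(&nf_log_mutex) */
static int toy_warnings;	/* counts would-be "suspicious usage" reports */

static void toy_lock(void)   { toy_mutex_held = true; }
static void toy_unlock(void) { toy_mutex_held = false; }

/*
 * Update-side read: the caller states which condition makes the access
 * safe, and the checker complains if that condition does not hold.
 */
static struct toy_logger *toy_deref_protected(struct toy_logger **slot,
					      bool cond)
{
	if (!cond)
		toy_warnings++;	/* the real checker prints a lockdep report */
	return *slot;
}

/* Same shape as the register path: the first registration wins. */
static int toy_register(size_t pf, struct toy_logger *lg)
{
	if (pf >= TOY_NPROTO)
		return -1;
	toy_lock();
	if (toy_deref_protected(&toy_loggers[pf], toy_mutex_held) == NULL)
		toy_loggers[pf] = lg;
	toy_unlock();
	return 0;
}
```

Registering two loggers for the same slot keeps the first one and raises no
warning, while calling toy_deref_protected() with a false condition bumps
toy_warnings, which is the toy analogue of the report quoted above.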

^ permalink raw reply related

* [PATCH] net/gianfar: drop recycled skbs on MTU change
From: Sebastian Andrzej Siewior @ 2010-05-03 15:17 UTC (permalink / raw)
  To: Andy Fleming; +Cc: netdev

From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

An skb added to the recycle list is sized from the current descriptor
size, which follows the current MTU. gfar_new_skb() uses the same size.
So after changing, or at least increasing, the MTU, all recycled skbs
should be dropped.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
I'm not 100% sure, but the current code looks wrong to me.

 drivers/net/gianfar.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index 5267c27..9093106 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -2287,8 +2287,10 @@ static int gfar_change_mtu(struct net_device *dev, int new_mtu)
 
 	/* Only stop and start the controller if it isn't already
 	 * stopped, and we changed something */
-	if ((oldsize != tempsize) && (dev->flags & IFF_UP))
+	if ((oldsize != tempsize) && (dev->flags & IFF_UP)) {
 		stop_gfar(dev);
+		skb_queue_purge(&priv->rx_recycle);
+	}
 
 	priv->rx_buffer_size = tempsize;
 
-- 
1.6.6.1
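
The reasoning in the changelog can be sketched in userspace. This is a toy
model under stated assumptions, not the gianfar code: all toy_* names are
hypothetical, and the allocator is assumed to prefer a recycled buffer over
a fresh allocation. It shows why buffers recycled at the old size have to be
purged when the receive buffer size changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MAX 8

/* Hypothetical stand-in for an skb: only its buffer size matters here. */
struct toy_buf { size_t size; };

struct toy_rx {
	struct toy_buf *recycle[TOY_POOL_MAX];	/* recycled buffers */
	size_t nr_recycled;
	size_t buffer_size;			/* derived from the MTU */
};

static struct toy_buf *toy_buf_new(size_t size)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (b)
		b->size = size;
	return b;
}

/* Keep a used buffer for later reuse; free it if the pool is full. */
static void toy_recycle(struct toy_rx *rx, struct toy_buf *b)
{
	if (rx->nr_recycled < TOY_POOL_MAX)
		rx->recycle[rx->nr_recycled++] = b;
	else
		free(b);
}

/* Prefer a recycled buffer, otherwise allocate at the current size. */
static struct toy_buf *toy_rx_alloc(struct toy_rx *rx)
{
	if (rx->nr_recycled > 0)
		return rx->recycle[--rx->nr_recycled];
	return toy_buf_new(rx->buffer_size);
}

/* Drop every recycled buffer. */
static void toy_purge(struct toy_rx *rx)
{
	while (rx->nr_recycled > 0)
		free(rx->recycle[--rx->nr_recycled]);
}

/* On a size change, purge first so no stale-sized buffer is handed out. */
static void toy_change_size(struct toy_rx *rx, size_t new_size)
{
	if (rx->buffer_size != new_size)
		toy_purge(rx);
	rx->buffer_size = new_size;
}
```

Without the toy_purge() call, a buffer recycled at the small size would be
returned by toy_rx_alloc() after the size was raised, which is the situation
the patch is meant to avoid.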

^ permalink raw reply related

* Re: mmotm 2010-04-28 - RCU whinges
From: Valdis.Kletnieks @ 2010-05-03 15:29 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Andrew Morton, Peter Zijlstra, Patrick McHardy, David S. Miller,
	linux-kernel, netfilter-devel, netdev, Paul E. McKenney
In-Reply-To: <1272898726.2226.47.camel@edumazet-laptop>


On Mon, 03 May 2010 16:58:46 +0200, Eric Dumazet said:
> On Monday, 03 May 2010 at 10:30 -0400, Valdis.Kletnieks@vt.edu wrote:

> > [    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!

> Thanks for the report !
> 
> [PATCH] net: nf_log RCU fixes
> 
> nf_log_register() and nf_log_unregister() use a mutex to have exclusive
> access to nf_loggers[]. Use the appropriate rcu_dereference_protected()
> lockdep annotation.

Confirming that one is fixed.  Now it lives a whole 36 seconds before whinging:

[   35.328729] ===================================================
[   35.328803] [ INFO: suspicious rcu_dereference_check() usage. ]
[   35.328837] ---------------------------------------------------
[   35.328872] net/ipv6/addrconf.c:2977 invoked rcu_dereference_check() without protection!
[   35.328926] 
[   35.328927] other info that might help us debug this:
[   35.328928] 
[   35.329016] 
[   35.329016] rcu_scheduler_active = 1, debug_locks = 0
[   35.329089] 2 locks held by ifconfig/2680:
[   35.329120]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff810f5db8>] seq_read+0x3a/0x42d
[   35.329217]  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff814eec68>] rcu_read_lock_bh+0x0/0x35
[   35.329322] 
[   35.329323] stack backtrace:
[   35.329380] Pid: 2680, comm: ifconfig Tainted: G        W   2.6.34-rc5-mmotm0428 #3
[   35.329439] Call Trace:
[   35.329471]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
[   35.329514]  [<ffffffff814ef3ae>] if6_get_next+0x34/0x6d
[   35.329554]  [<ffffffff814ef3f8>] if6_seq_next+0x11/0x18
[   35.329595]  [<ffffffff810f6083>] seq_read+0x305/0x42d
[   35.329635]  [<ffffffff810f5d7e>] ? seq_read+0x0/0x42d
[   35.329676]  [<ffffffff81129e8c>] proc_reg_read+0x8d/0xac
[   35.329717]  [<ffffffff810db7e0>] vfs_read+0xe0/0x140
[   35.329758]  [<ffffffff810db8f6>] sys_read+0x45/0x69
[   35.329799]  [<ffffffff810025eb>] system_call_fastpath+0x16/0x1b

Maybe I need to go and stick the "RCU whinge multiple times" patch on this
kernel and get it over with. :)



^ permalink raw reply

* Re: [PATCHv7] add mergeable buffers support to vhost_net
From: David Stevens @ 2010-05-03 15:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kvm, netdev, virtualization
In-Reply-To: <20100503103410.GA11113@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote on 05/03/2010 03:34:11 AM:

> On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
> > This patch adds mergeable receive buffer support to vhost_net.
> > 
> > Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> 
> I've been doing some more testing before sending out a pull
> request, and I see a drastic performance degradation in guest to host
> traffic when this is applied but mergeable buffers are not in use
> by userspace (existing qemu-kvm userspace).

        Actually, I wouldn't expect it to work at all; the qemu-kvm
patch (particularly the feature bit setting bug fix) is required.
Without it, I think the existing code will tell the guest to use
mergeable buffers while turning it off in vhost. That was completely
non-functional for me -- what version of qemu-kvm are you using?
        What I did to test w/o mergeable buffers is turn off the
bit in VHOST_FEATURES. I'll recheck these, but qemu-kvm definitely
must be updated; the original doesn't correctly handle feature bits.

                                                                +-DLS


^ permalink raw reply

* Re: mmotm 2010-04-28 - RCU whinges
From: Paul E. McKenney @ 2010-05-03 15:43 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Eric Dumazet, Andrew Morton, Peter Zijlstra, Patrick McHardy,
	David S. Miller, linux-kernel, netfilter-devel, netdev
In-Reply-To: <5112.1272900590@localhost>

On Mon, May 03, 2010 at 11:29:50AM -0400, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 03 May 2010 16:58:46 +0200, Eric Dumazet said:
> > On Monday, 03 May 2010 at 10:30 -0400, Valdis.Kletnieks@vt.edu wrote:
> 
> > > [    9.129872] net/netfilter/nf_log.c:55 invoked rcu_dereference_check() without protection!
> 
> > Thanks for the report !
> > 
> > [PATCH] net: nf_log RCU fixes
> > 
> > nf_log_register() and nf_log_unregister() use a mutex to have exclusive
> > access to nf_loggers[]. Use the appropriate rcu_dereference_protected()
> > lockdep annotation.
> 
> Confirming that one is fixed.  Now it lives a whole 36 seconds before whinging:
> 
> [   35.328729] ===================================================
> [   35.328803] [ INFO: suspicious rcu_dereference_check() usage. ]
> [   35.328837] ---------------------------------------------------
> [   35.328872] net/ipv6/addrconf.c:2977 invoked rcu_dereference_check() without protection!
> [   35.328926] 
> [   35.328927] other info that might help us debug this:
> [   35.328928] 
> [   35.329016] 
> [   35.329016] rcu_scheduler_active = 1, debug_locks = 0
> [   35.329089] 2 locks held by ifconfig/2680:
> [   35.329120]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff810f5db8>] seq_read+0x3a/0x42d
> [   35.329217]  #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff814eec68>] rcu_read_lock_bh+0x0/0x35
> [   35.329322] 
> [   35.329323] stack backtrace:
> [   35.329380] Pid: 2680, comm: ifconfig Tainted: G        W   2.6.34-rc5-mmotm0428 #3
> [   35.329439] Call Trace:
> [   35.329471]  [<ffffffff81064832>] lockdep_rcu_dereference+0xaa/0xb2
> [   35.329514]  [<ffffffff814ef3ae>] if6_get_next+0x34/0x6d
> [   35.329554]  [<ffffffff814ef3f8>] if6_seq_next+0x11/0x18
> [   35.329595]  [<ffffffff810f6083>] seq_read+0x305/0x42d
> [   35.329635]  [<ffffffff810f5d7e>] ? seq_read+0x0/0x42d
> [   35.329676]  [<ffffffff81129e8c>] proc_reg_read+0x8d/0xac
> [   35.329717]  [<ffffffff810db7e0>] vfs_read+0xe0/0x140
> [   35.329758]  [<ffffffff810db8f6>] sys_read+0x45/0x69
> [   35.329799]  [<ffffffff810025eb>] system_call_fastpath+0x16/0x1b
> 
> Maybe I need to go and stick the "RCU whinge multiple times" patch on this
> kernel and get it over with. :)

Highly recommended.  ;-)

And thanks to you for your testing efforts and to Eric for the fixes!!!

							Thanx, Paul

^ permalink raw reply

* Re: [PATCH v6] net: batch skb dequeueing from softnet input_pkt_queue
From: Andi Kleen @ 2010-05-03 15:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Eric Dumazet, David Miller, hadi, xiaosuo, therbert, shemminger,
	netdev, lenb
In-Reply-To: <20100503070925.572bbee6@infradead.org>

> so the hard problem is that on going idle, the local timers need to be
> funneled to the external HPET. Afaik right now we use one channel of
> the hpet, with the result that we have one global lock for this.
> 
> HPETs have more than one channel (2 or 3 historically, newer chipsets
> iirc have a few more), so in principle we can split this lock at least
> a little bit... if we can get to one hpet channel per level 3 cache
> domain we'd already make huge progress in terms of cost of the
> contention....

I suggested the same thing a few emails up @) (great minds think
alike etc. etc. @).

I'm not sure how difficult it would be to implement though.

Potential issues:

Some user applications use the hpet channels directly through
the character device interface so there would be a potential
compatibility issue (but maybe that should be just moved
to be emulated with a hrtimer ?)

And if multiple broadcast controllers are elected this might
make it harder to become idle.

-Andi

^ permalink raw reply

* Re: [PATCHv7] add mergeable buffers support to vhost_net
From: Michael S. Tsirkin @ 2010-05-03 15:56 UTC (permalink / raw)
  To: David Stevens; +Cc: kvm, netdev, virtualization
In-Reply-To: <OF639FB51C.80B669F0-ON88257718.0054F654-88257718.0055FB45@us.ibm.com>

On Mon, May 03, 2010 at 08:39:08AM -0700, David Stevens wrote:
> "Michael S. Tsirkin" <mst@redhat.com> wrote on 05/03/2010 03:34:11 AM:
> 
> > On Wed, Apr 28, 2010 at 01:57:12PM -0700, David L Stevens wrote:
> > > This patch adds mergeable receive buffer support to vhost_net.
> > > 
> > > Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
> > 
> > I've been doing some more testing before sending out a pull
> > request, and I see a drastic performance degradation in guest to host
> > traffic when this is applied but mergeable buffers are not in use
> > by userspace (existing qemu-kvm userspace).
> 
>         Actually, I wouldn't expect it to work at all; the qemu-kvm
> patch (particularly the feature bit setting bug fix) is required.

Which bugfix is that?

> Without it, I think the existing code will tell the guest to use
> mergeable buffers while turning it off in vhost.

It should not do that, specifically
vhost_net_get_features does:
	features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
unconditionally. This was added with:
5751995a20e77cd9d61d00f7390401895fa172a6

I forced mergeable buffers off with -global virtio-net-pci.mrg_rxbuf=off
and it did not seem to help either.
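
For reference, the masking step quoted above can be sketched as a small
standalone example. This is an illustration under stated assumptions, not
the qemu-kvm code: mask_mrg_rxbuf(), has_feature() and TOY_F_OTHER are
hypothetical names, and the value 15 for VIRTIO_NET_F_MRG_RXBUF is the bit
number given in the virtio network device specification.

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_MRG_RXBUF	15	/* bit number from the virtio spec */
#define TOY_F_OTHER		5	/* hypothetical unrelated feature bit */

/* Clear the mergeable-rx-buffer bit and leave every other bit untouched. */
static uint64_t mask_mrg_rxbuf(uint64_t features)
{
	return features & ~(UINT64_C(1) << VIRTIO_NET_F_MRG_RXBUF);
}

/* Report whether a given feature bit is set. */
static int has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1;
}
```

Once the mask has been applied, the guest can no longer negotiate mergeable
receive buffers, whatever the device offered before the mask.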

> That was completely
> non-functional for me -- what version of qemu-kvm are you using?

992cc816c433332f2e93db033919a9ddbfcd1da4 or later should work well
AFAIK.

>         What I did to test w/o mergeable buffers is turn off the
> bit in VHOST_FEATURES.

It should be enough to force mergeable buffers off on the qemu command
line: -global virtio-net-pci.mrg_rxbuf=off

> I'll recheck these, but qemu-kvm definitely
> must be updated; the original doesn't correctly handle feature bits.
> 
>                                                                 +-DLS

Hmm, I don't see the bug.

-- 
MST

^ permalink raw reply

* Re: mmotm 2010-04-28 - RCU whinges
From: Eric Dumazet @ 2010-05-03 16:14 UTC (permalink / raw)
  To: paulmck
  Cc: Valdis.Kletnieks, Andrew Morton, Peter Zijlstra, Patrick McHardy,
	David S. Miller, linux-kernel, netfilter-devel, netdev
In-Reply-To: <20100503154357.GF2597@linux.vnet.ibm.com>

On Monday, 03 May 2010 at 08:43 -0700, Paul E. McKenney wrote:

> Highly recommended.  ;-)
> 
> And thanks to you for your testing efforts and to Eric for the fixes!!!
> 

For this last one, I think you should push the following patch, Paul.

Follow-up to commit 3120438ad6
(rcu: Disable lockdep checking in RCU list-traversal primitives)

Or we might introduce a hlist_for_each_entry_continue_rcu_bh() macro...



diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 004908b..b0c7e24 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -435,10 +435,10 @@ static inline void hlist_add_after_rcu(struct hlist_node *prev,
  * @member:	the name of the hlist_node within the struct.
  */
 #define hlist_for_each_entry_continue_rcu(tpos, pos, member)		\
-	for (pos = rcu_dereference((pos)->next);			\
+	for (pos = rcu_dereference_raw((pos)->next);			\
 	     pos && ({ prefetch(pos->next); 1; }) &&			\
 	     ({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; });  \
-	     pos = rcu_dereference(pos->next))
+	     pos = rcu_dereference_raw(pos->next))
 
 
 #endif	/* __KERNEL__ */



^ permalink raw reply related
