Netdev List
* Re: [PATCH 1/2][RESEND] ehea: error handling improvement
From: David Miller @ 2010-04-22  5:36 UTC (permalink / raw)
  To: tklein; +Cc: netdev, linuxppc-dev, linux-kernel, themann
In-Reply-To: <201004211110.55986.tklein@de.ibm.com>

From: Thomas Klein <tklein@de.ibm.com>
Date: Wed, 21 Apr 2010 11:10:55 +0200

> Reset a port's resources only if they're actually in an error state
> 
> Signed-off-by: Thomas Klein <tklein@de.ibm.com>
> ---
> 
> Patch created against net-2.6

I thought you were sorry for wasting my time and that you were going
to follow the directions I gave you last time, and I quote:

--------------------
3) These are not appropriate for net-2.6 as we are deep in
   the -rcX series at this point and only the most diabolical
   bug fixes are appropriate.  Therefore, please generate these
   against net-next-2.6, thanks.
--------------------

And here you are generating your patches against net-2.6.  Heck, you
even feel it's worth mentioning explicitly.

Lucky for you the patches happen to apply cleanly to net-next-2.6 so
I've put them there.

^ permalink raw reply

* Re: IPv6 duplicate address detection erroneously marking address as duplicate when a host receives its own multicast packets?
From: David Miller @ 2010-04-22  5:30 UTC (permalink / raw)
  To: herbert; +Cc: brian.haley, sam.cannell, netdev
In-Reply-To: <20100422024140.GA7215@gondor.apana.org.au>

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 22 Apr 2010 10:41:40 +0800

> Brian Haley <brian.haley@hp.com> wrote:
>> 
>> Well, my initial reaction is XVM is doing the wrong thing looping-back
>> multicast packets.  You can try the following (untested) patch, I can
>> only confirm it compiles.
> 
> I agree, whatever is looping the packet back should be fixed.

Ethernet does not send multicasts to itself, so we're definitely not
going to cater to this XVM behavior.

^ permalink raw reply

* Re: [PATCH] tcp: fix outsegs stat for TSO segments
From: David Miller @ 2010-04-22  5:28 UTC (permalink / raw)
  To: therbert; +Cc: netdev
In-Reply-To: <alpine.DEB.1.00.1004212214110.14731@pokey.mtv.corp.google.com>

From: Tom Herbert <therbert@google.com>
Date: Wed, 21 Apr 2010 22:17:24 -0700 (PDT)

>  	if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
> -		TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
> +		TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
> +		    tcp_skb_pcount(skb));

Please follow proper coding style and make the new line
with the 'tcp_skb_pcount(skb)' argument line up with
the start of the macro arguments on the previous line.

^ permalink raw reply

* Re: [PATCH] net: change recvfrom to return same address length as getsockname on unnamed unix sockets
From: David Miller @ 2010-04-22  5:26 UTC (permalink / raw)
  To: ppergame; +Cc: netdev, linux-kernel
In-Reply-To: <v2x7447a0ac1004212029qd1866eaekc769fee5b13ac09d@mail.gmail.com>

From: Pavel Pergamenshchik <ppergame@gmail.com>
Date: Wed, 21 Apr 2010 20:29:25 -0700

> unix_*_recvmsg() returns zero-length sockaddr if the sender is an
> unnamed AF_UNIX socket. Change it to return a two-byte sockaddr with
> just the address family, to be consistent with unix_getname().
> 
> Signed-off-by: Pavel Pergamenshchik <ppergame@gmail.com>

Since we've behaved this way for at least 10 years, the existing
behavior is the user visible ABI and the risk of breaking things by
making the change is too great.

I'm not applying this, sorry.
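
For anyone who wants to see the existing behavior for themselves, here is a
minimal userspace sketch. It is not the submitter's test program; it assumes
that both ends of a socketpair() count as unnamed AF_UNIX sockets, and the
names addr_lens and probe_unnamed_sender are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Address lengths reported for an unnamed AF_UNIX datagram sender. */
struct addr_lens {
	socklen_t from_recvfrom;	/* length filled in by recvfrom() */
	socklen_t from_getsockname;	/* length filled in by getsockname() */
};

/* Hypothetical helper: returns 0 on success, -1 on any socket error. */
static int probe_unnamed_sender(struct addr_lens *out)
{
	int sv[2];
	char buf[8];
	struct sockaddr_un addr;
	socklen_t len;
	int ret = -1;

	assert(out != NULL);

	/* Neither end of a socketpair is bound, so both are unnamed. */
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	if (send(sv[0], "x", 1, 0) != 1)
		goto out;

	/* Ask the receiver who sent the datagram. */
	memset(&addr, 0, sizeof(addr));
	len = sizeof(addr);
	if (recvfrom(sv[1], buf, sizeof(buf), 0,
		     (struct sockaddr *)&addr, &len) != 1)
		goto out;
	out->from_recvfrom = len;

	/* Ask the sender for its own name. */
	memset(&addr, 0, sizeof(addr));
	len = sizeof(addr);
	if (getsockname(sv[0], (struct sockaddr *)&addr, &len) < 0)
		goto out;
	out->from_getsockname = len;
	ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

On Linux the two fields should come back as 0 and 2, matching the numbers
quoted in the patch notes; other systems may report different lengths.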

^ permalink raw reply

* [PATCH] tcp: fix outsegs stat for TSO segments
From: Tom Herbert @ 2010-04-22  5:17 UTC (permalink / raw)
  To: davem, netdev

Account for TSO segments of an skb in TCP_MIB_OUTSEGS counter.  Without
this, the counter can be off by orders of magnitude from the
actual number of segments sent.

Signed-off-by: Tom Herbert <therbert@google.com>
---
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 884fdbb..92456f1 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -133,6 +133,8 @@ struct linux_xfrm_mib {
 			__this_cpu_add(mib[0]->mibs[field], addend)
 #define SNMP_ADD_STATS_USER(mib, field, addend)	\
 			this_cpu_add(mib[1]->mibs[field], addend)
+#define SNMP_ADD_STATS(mib, field, addend)	\
+			this_cpu_add(mib[0]->mibs[field], addend)
 /*
  * Use "__typeof__(*mib[0]) *ptr" instead of "__typeof__(mib[0]) ptr"
  * to make @ptr a non-percpu pointer.
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 70c5159..91640fe 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -294,6 +294,7 @@ extern struct proto tcp_prot;
 #define TCP_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.tcp_statistics, field)
 #define TCP_DEC_STATS(net, field)	SNMP_DEC_STATS((net)->mib.tcp_statistics, field)
 #define TCP_ADD_STATS_USER(net, field, val) SNMP_ADD_STATS_USER((net)->mib.tcp_statistics, field, val)
+#define TCP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.tcp_statistics, field, val)
 
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2b7d71f..f89fadc 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -888,7 +888,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 		tcp_event_data_sent(tp, skb, sk);
 
 	if (after(tcb->end_seq, tp->snd_nxt) || tcb->seq == tcb->end_seq)
-		TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
+		TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS,
+		    tcp_skb_pcount(skb));
 
 	err = icsk->icsk_af_ops->queue_xmit(skb);
 	if (likely(err <= 0))
@@ -2503,7 +2504,7 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
 	th->window = htons(min(req->rcv_wnd, 65535U));
 	tcp_options_write((__be32 *)(th + 1), tp, &opts);
 	th->doff = (tcp_header_size >> 2);
-	TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
+	TCP_ADD_STATS(sock_net(sk), TCP_MIB_OUTSEGS, tcp_skb_pcount(skb));
 
 #ifdef CONFIG_TCP_MD5SIG
 	/* Okay, we have all we need - do the md5 hash if needed */
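
To give a feel for how far apart the two accounting styles can drift, here is
a small userspace sketch. It is not kernel code: sketch_pcount is a
hypothetical stand-in for tcp_skb_pcount(), and it assumes the per-skb segment
count is the payload length divided by the MSS, rounded up, with a data-less
segment counted once.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for tcp_skb_pcount(): number of wire segments
 * one buffer turns into. Assumption: ceil(payload_len / mss), and a
 * buffer with no payload (e.g. a pure ACK) still counts as one.
 */
static unsigned int sketch_pcount(unsigned int payload_len, unsigned int mss)
{
	assert(mss > 0);
	if (payload_len == 0)
		return 1;
	return (payload_len + mss - 1) / mss;
}

/* Two counters, one per accounting style, for side-by-side comparison. */
struct sketch_counters {
	unsigned long inc_style;	/* +1 per buffer (old behavior) */
	unsigned long add_style;	/* +pcount per buffer (patched behavior) */
};

static void sketch_account(struct sketch_counters *c,
			   unsigned int payload_len, unsigned int mss)
{
	c->inc_style += 1;
	c->add_style += sketch_pcount(payload_len, mss);
}
```

With an MSS of 1448, a single 65160-byte TSO buffer adds 1 to the first
counter and 45 to the second.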

^ permalink raw reply related

* [PATCH] net: change recvfrom to return same address length as getsockname on unnamed unix sockets
From: Pavel Pergamenshchik @ 2010-04-22  3:29 UTC (permalink / raw)
  To: netdev, davem; +Cc: linux-kernel

unix_*_recvmsg() returns zero-length sockaddr if the sender is an
unnamed AF_UNIX socket. Change it to return a two-byte sockaddr with
just the address family, to be consistent with unix_getname().

Signed-off-by: Pavel Pergamenshchik <ppergame@gmail.com>

---
Minimal example at http://xzrq.net/uaddrwtf.c
Solaris/OS X print 16 and 16. Linux prints 0 and 2 as described above.

--- a/net/unix/af_unix.c	2010-04-01 16:02:33.000000000 -0700
+++ b/net/unix/af_unix.c	2010-04-21 20:17:43.564703748 -0700
@@ -1634,9 +1634,13 @@
 static void unix_copy_addr(struct msghdr *msg, struct sock *sk)
 {
 	struct unix_sock *u = unix_sk(sk);
+	struct sockaddr_un *sunaddr;

-	msg->msg_namelen = 0;
-	if (u->addr) {
+	if (!u->addr) {
+		msg->msg_namelen = sizeof(short);
+		sunaddr = msg->msg_name;
+		sunaddr->sun_family = AF_UNIX;
+	} else {
 		msg->msg_namelen = u->addr->len;
 		memcpy(msg->msg_name, u->addr->name, u->addr->len);
 	}

^ permalink raw reply

* Re: IPv6 duplicate address detection erroneously marking address as duplicate when a host receives its own multicast packets?
From: Herbert Xu @ 2010-04-22  2:41 UTC (permalink / raw)
  To: Brian Haley; +Cc: sam.cannell, netdev
In-Reply-To: <4BCFA615.8060205@hp.com>

Brian Haley <brian.haley@hp.com> wrote:
> 
> Well, my initial reaction is XVM is doing the wrong thing looping-back
> multicast packets.  You can try the following (untested) patch, I can
> only confirm it compiles.

I agree, whatever is looping the packet back should be fixed.

And if we are going to filter them out at our end, then it should
be done below IP.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Bug#577640: linux-image-2.6.33-2-amd64: Kernel warnings in netns thread
From: Eric W. Biederman @ 2010-04-22  2:38 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Martín Ferrari, 577640, netdev, Eric W. Biederman,
	Alexey Dobriyan, Mathieu Lacage
In-Reply-To: <1271895278.2582.3.camel@localhost>

Ben Hutchings <ben@decadent.org.uk> writes:

> On Wed, 2010-04-21 at 12:36 -0700, Eric W. Biederman wrote:
>> Martín Ferrari <martin.ferrari@gmail.com> writes:
>> 
>> > I'm not starting a new thread/bug, as this is probably related...
>> >
>> > I just discovered that in 2.6.33, if I create a veth inside a
>> > namespace and then move one of the halves into the main namespace,
>> > when I kill the namespace, I get one of these warnings followed by an
>> > oops. This does not happen if the veth is created from the main ns and
>> > then moved, nor in 2.6.32. This happens both in Qemu and on real
>> > hardware (both amd64)
>> >
>> > To reproduce:
>> >
>> > $ sudo ./startns bash
>> > # ip l a type veth
>> > # ip l s veth0 netns 1
>> > # exit
>> 
>> Nasty weird. I did a quick test here, and I'm not seeing that.
>> Does the 2.6.33 experimental kernel have any patches applied?
>
> Yes, but not many beyond the stable updates, and nothing in this area.
> You can see the list at:
> http://svn.debian.org/wsvn/kernel/dists/trunk/linux-2.6/debian/patches/series/base

Then I should ask what is startns?

Either that is doing something different from my equivalent program, or I have
patches to fix this, that just haven't been merged yet.

Eric

^ permalink raw reply

* Re: IPv6: race condition in __ipv6_ifa_notify() and dst_free() ?
From: Herbert Xu @ 2010-04-22  2:32 UTC (permalink / raw)
  To: Jiri Bohac; +Cc: Hideaki YOSHIFUJI, netdev, David Miller, Stephen Hemminger
In-Reply-To: <20100421213429.GA2799@midget.suse.cz>

On Wed, Apr 21, 2010 at 11:34:29PM +0200, Jiri Bohac wrote:
> Hi,
> 
> On Tue, Apr 20, 2010 at 07:44:01PM +0200, Jiri Bohac wrote:
> > What is the reason __ipv6_ifa_notify() calls dst_free() when
> > ip6_del_rt() fails? I don't see a way ip6_del_rt() could fail
> > with the dst still needing to be freed.
> 
> checked again and I still think that if ip6_del_rt() fails,
> ifp->rt must have been freed already. Anybody with a
> counterexample?

I agree with your diagnosis and the two duplicate NDISC messages
scenario sounds plausible.

Anyway, I think the root of the issue is the fact that NDISC is
calling addrconf_dad_failure with no locking whatsoever.  The
latter is not idempotent so some form of locking is needed.

This bug appears to have been around since the very start.

I'll dig deeper to see where we might be able to add some locks.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v3] net: batch skb dequeueing from softnet input_pkt_queue
From: Changli Gao @ 2010-04-22  1:35 UTC (permalink / raw)
  To: Tom Herbert; +Cc: Eric Dumazet, David S. Miller, netdev, jamal
In-Reply-To: <k2n65634d661004211623k3ce51c95o2c329529ce402eda@mail.gmail.com>

On Thu, Apr 22, 2010 at 7:23 AM, Tom Herbert <therbert@google.com> wrote:
>
> How about just using two input_pkt_queue's (define
> input_pkt_queue[2])?  One that is used to enqueue from RPS, and one
> that is being processed by process_backlog.  Then the only thing that
> needs to be done under lock in process_backlog is to switch the
> queues;  something like sd->current_input_pkt_queue ^= 1
>

It is a better idea, IMO.
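
A rough userspace sketch of the two-queue idea, to make the locking pattern
concrete. Everything here is hypothetical (the two_queues type, the tq_*
helpers, and a C11 atomic flag standing in for the real queue lock), and it
assumes a single consumer that fully drains the retired queue before it
flips again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
	int id;
};

struct pkt_list {
	struct pkt *head;
	struct pkt *tail;
};

struct two_queues {
	atomic_bool locked;	/* tiny spinlock standing in for the queue lock */
	struct pkt_list q[2];
	unsigned int cur;	/* index of the queue producers append to */
};

static void tq_init(struct two_queues *tq)
{
	atomic_init(&tq->locked, false);
	tq->q[0].head = tq->q[0].tail = NULL;
	tq->q[1].head = tq->q[1].tail = NULL;
	tq->cur = 0;
}

static void tq_lock(struct two_queues *tq)
{
	while (atomic_exchange_explicit(&tq->locked, true,
					memory_order_acquire))
		;	/* spin */
}

static void tq_unlock(struct two_queues *tq)
{
	atomic_store_explicit(&tq->locked, false, memory_order_release);
}

/* Producer side: append to whichever queue is current, under the lock. */
static void tq_enqueue(struct two_queues *tq, struct pkt *p)
{
	struct pkt_list *l;

	assert(p != NULL);
	p->next = NULL;
	tq_lock(tq);
	l = &tq->q[tq->cur];
	if (l->tail)
		l->tail->next = p;
	else
		l->head = p;
	l->tail = p;
	tq_unlock(tq);
}

/*
 * Consumer side: the only work done under the lock is the index flip.
 * The retired queue is then walked without the lock, since producers
 * no longer touch it. Returns the number of packets processed.
 */
static int tq_process(struct two_queues *tq, void (*fn)(struct pkt *))
{
	struct pkt_list *old;
	struct pkt *p, *next;
	int n = 0;

	tq_lock(tq);
	old = &tq->q[tq->cur];
	tq->cur ^= 1;
	tq_unlock(tq);

	for (p = old->head; p; p = next) {
		next = p->next;
		if (fn)
			fn(p);
		n++;
	}
	old->head = old->tail = NULL;
	return n;
}
```

The retired queue has to be empty before the next flip hands it back to the
producers, which is why the sketch only works with one consumer at a time.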

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: IPv6 duplicate address detection erroneously marking address as duplicate when a host receives its own multicast packets?
From: Brian Haley @ 2010-04-22  1:27 UTC (permalink / raw)
  To: Sam Cannell; +Cc: netdev
In-Reply-To: <1271880831.6685.6.camel@spathi>

Sam Cannell wrote:
> I've been having some trouble with ip6 duplicate address detection in a
> Linux VM (under XVM on OpenSolaris).  It seems that the ethernet bridge
> in XVM sends a host's own multicast packets back to it, which leads the
> duplicate address detection code in Linux to decide that another host on
> the network is using the same address.
<snip>
>
> I'd happily put this down to a failing in XVM, however the stateless
> autoconfiguration RFC (4862) states that the stack shouldn't decide an
> address is duplicate based on receipt of a neighbor solicitation message
> that it sent itself:
<snip>
> 
> Assuming my understanding of the RFC is correct, this suggests to me
> that duplicate address detection in Linux is being a little too hasty to
> mark the address as invalid.  Thoughts?

Well, my initial reaction is XVM is doing the wrong thing looping-back
multicast packets.  You can try the following (untested) patch, I can
only confirm it compiles.

-Brian


Add a check for looped-back DAD packets on Ethernet interfaces.

Signed-off-by: Brian Haley <brian.haley@hp.com>

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index da0a4d2..33a7212 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -57,6 +57,7 @@
 #include <linux/net.h>
 #include <linux/in6.h>
 #include <linux/route.h>
+#include <linux/etherdevice.h>
 #include <linux/init.h>
 #include <linux/rcupdate.h>
 #include <linux/slab.h>
@@ -800,6 +801,16 @@ static void ndisc_recv_ns(struct sk_buff *skb)
 					}
 				}
 
+				if (dev->type == ARPHRD_ETHER) {
+					struct ethhdr *eth = eth_hdr(skb);
+					if (!compare_ether_addr_64bits(
+								dev->dev_addr,
+								eth->h_source)){
+						/* looped-back to us */
+						goto out;
+					}
+				}
+
 				/*
 				 * We are colliding with another node
 				 * who is doing DAD
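
The check in the patch boils down to comparing the frame's source MAC with
the receiving interface's own MAC. A standalone sketch of that comparison,
using plain memcmp() and made-up names (SKETCH_ETH_ALEN, sketch_is_own_frame,
sketch_dad_ns_action) rather than the kernel helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN 6	/* bytes in an Ethernet MAC address */

enum sketch_dad_action {
	SKETCH_DAD_IGNORE,	/* our own solicitation came back to us */
	SKETCH_DAD_COLLISION,	/* another node appears to be doing DAD */
};

/* True when the frame's source MAC equals the interface's own MAC. */
static bool sketch_is_own_frame(const uint8_t *dev_addr, const uint8_t *src)
{
	assert(dev_addr != NULL && src != NULL);
	return memcmp(dev_addr, src, SKETCH_ETH_ALEN) == 0;
}

/* Decide what to do with a neighbor solicitation seen while DAD runs. */
static enum sketch_dad_action
sketch_dad_ns_action(const uint8_t *dev_addr, const uint8_t *src)
{
	if (sketch_is_own_frame(dev_addr, src))
		return SKETCH_DAD_IGNORE;
	return SKETCH_DAD_COLLISION;
}
```

This only illustrates the comparison; it says nothing about whether the
filtering belongs at this layer.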

^ permalink raw reply related

* Re: rps performance WAS(Re: rps: question
From: Changli Gao @ 2010-04-22  1:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: hadi, Rick Jones, David Miller, therbert, netdev, robert, andi
In-Reply-To: <1271876480.7895.3106.camel@edumazet-laptop>

On Thu, Apr 22, 2010 at 3:01 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Thanks a lot Jamal, this is really useful
>
> Drawback of using a fixed src ip from your generator is that all flows
> share the same struct dst entry on SUT. This might explain some glitches
> you noticed (ip_route_input + ip_rcv at high level on slave/application
> cpus)
> Also note your test is one way. If some data was replied we would see
> much use of the 'flows'
>
> I notice epoll_ctl() used a lot, are you re-arming epoll each time you
> receive a datagram ?
>
> I see slave/application cpus hit _raw_spin_lock_irqsave() and
> _raw_spin_unlock_irqrestore().
>
> Maybe a ring buffer could help (instead of a double linked queue) for
> backlog, or the double queue trick, if Changli wants to respin his
> patch.
>
>

OK, I'll post a new patch against the current tree, so Jamal can give it
a try. I am sorry, but I don't have a suitable computer for benchmarking.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Re: [PATCH] mac8390: change an error return code and some cleanup, take 3
From: Finn Thain @ 2010-04-22  1:13 UTC (permalink / raw)
  To: David Miller; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <20100421.163041.158540277.davem@davemloft.net>


On Wed, 21 Apr 2010, David Miller wrote:

> From: Finn Thain <fthain@telegraphics.com.au>
> Date: Sat, 17 Apr 2010 13:16:04 +1000 (EST)
> 
> > 
> > Change an error return code from -EAGAIN to -EBUSY since the former is 
> > misleading.
> > 
> > Nubus slots are geographically addressed and their irqs are equally 
> > inflexible. -EAGAIN is misleading because retrying will not help fix 
> > whatever bug it was that made the irq unavailable.
> 
> request_irq() itself returns an appropriate error code, so the
> correct change is to do:
> 
> 	err = request_irq( ... );
> 	if (err) {
> 	...
> 
> and return 'err'.

OK. I'll send a new patch once 2.6.34 is out and I have time to test this 
and some other patches.

Finn

^ permalink raw reply

* Re: Bug#577640: linux-image-2.6.33-2-amd64: Kernel warnings in netns thread
From: Ben Hutchings @ 2010-04-22  0:14 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Martín Ferrari, 577640, netdev, Eric W. Biederman,
	Alexey Dobriyan, Mathieu Lacage
In-Reply-To: <m1ljcgzjh4.fsf@fess.ebiederm.org>

On Wed, 2010-04-21 at 12:36 -0700, Eric W. Biederman wrote:
> Martín Ferrari <martin.ferrari@gmail.com> writes:
> 
> > I'm not starting a new thread/bug, as this is probably related...
> >
> > I just discovered that in 2.6.33, if I create a veth inside a
> > namespace and then move one of the halves into the main namespace,
> > when I kill the namespace, I get one of these warnings followed by an
> > oops. This does not happen if the veth is created from the main ns and
> > then moved, nor in 2.6.32. This happens both in Qemu and on real
> > hardware (both amd64)
> >
> > To reproduce:
> >
> > $ sudo ./startns bash
> > # ip l a type veth
> > # ip l s veth0 netns 1
> > # exit
> 
> Nasty weird. I did a quick test here, and I'm not seeing that.
> Does the 2.6.33 experimental kernel have any patches applied?

Yes, but not many beyond the stable updates, and nothing in this area.
You can see the list at:
http://svn.debian.org/wsvn/kernel/dists/trunk/linux-2.6/debian/patches/series/base

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

^ permalink raw reply

* Re: [net-next,1/2] add iovnl netlink support
From: Scott Feldman @ 2010-04-22  0:01 UTC (permalink / raw)
  To: Chris Wright, Arnd Bergmann; +Cc: davem, netdev
In-Reply-To: <20100421221806.GD28829@x200.localdomain>

On 4/21/10 3:18 PM, "Chris Wright" <chrisw@redhat.com> wrote:

>> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
>> as I can tell. Interestingly, this is not actually implemented in
>> the enic driver in patch 2/2. So if we all agree that this is out of the
>> scope of iovnl, let's just remove it from the interface and find another
>> way for it (ethtool, iplink, ..., as listed above).
> 
> Scott, any objection?  At least a way to keep moving forward on the port
> profile bit.

Yes, that's fine with me, port-profile bit is the most important part.
 
>> Note that we still need to pass the MAC address and VLAN ID (or a list
>> of these) to the external switch, my point is just that this should be
>> separate from enforcing it in the hypervisor.
> 
> Yup, we should focus on reconciling the diff of enic vs vpd port profile
> needs.

-scott


^ permalink raw reply

* Re: [net-next,1/2] add iovnl netlink support
From: Scott Feldman @ 2010-04-21 23:54 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Chris Wright, davem, netdev
In-Reply-To: <201004212313.05060.arnd@arndb.de>

On 4/21/10 2:13 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

> On Wednesday 21 April 2010, Scott Feldman wrote:
>> On 4/21/10 12:39 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>> 
>>>>> 1. Setting up the slave device
>>>>>  a) create an SR-IOV VF to assign to a guest
>>>>>  b) create a macvtap device to pass to qemu or vhost
>>>>>  c) attach a tap device to a bridge
>>>>>  d) create a macvlan device and put it into a container
>>>>>  e) create a virtual interface for a VMDq adapter
>>>> 
>>>> OK, but iovnl isn't doing this.
>>> 
>>> The set_mac_vlan that Scott's patch adds seems to implement 1a), as far
>>> as I can tell. Interestingly, this is not actually implemented in
>>> the enic driver in patch 2/2. So if we all agree that this is out of the
>>> scope of iovnl, let's just remove it from the interface and find another
>>> way for it (ethtool, iplink, ..., as listed above).
>> 
>> You're right, not needed for enic since mac addr is included with
>> port-profile push and vlan membership is implied by port-profile.  So I put
>> set_mac_vlan in there basically to elicit feedback.
> 
> Ok. Two points though:
> 
> - when you say that the mac address is included in the port-profile push,
>   does that imply that the VF does not have a mac address prior to this?

Correct, VF has no mac addr prior to port-profile being applied.  The
mac_addr is the mac_addr of the VM guest interface that's to use the VF.  If
the port-profile defines L2 mac spoofing, for example, the switch wants to
know the mac address before i/o starts.   I/o doesn't start until
port-profile is applied and the switch virtual port is setup.

>   This would again mix the NIC configuration phase with the switch
>   association, which I think we really need to avoid if we want to be
>   able to implement the association in user space!
> 
> - The VLAN ID being implied in the port profile seems to be another
>   difference between what enic is doing and the current draft VDP
>   that will eventually become 802.1Qbg, and I fear that this difference
>   will be visible in the iovnl protocol.

It's not just a VLAN ID, but the entire VLAN membership for the switch
virtual port.  The port-profile may define a single native VLAN for access
mode on the switch port, or a trunk mode with a list of allowed vlans, with
one native vlan.

The key is the port-profile.  The port-profile resolves the configuration of
the switch virtual port.  The configuration of the switch virtual port
includes many settings like I mentioned earlier: VLAN membership, QoS (rate
limits, priority class), L2 security, etc.
 
>> There really wouldn't be much different between iplink and iovnl since
>> they're both rtnetlink...seems we should keep IOV-related APIs in one place.
>> Maybe there are other IOV APIs to add to iovnl in the future like:
>> 
>>     vf <- add_vf(pf)
>>     del_vf(pf, vf)
>> 
>> Ethtool doesn't seem the right place for this.
> 
> Right. My preference would probably be make these a subcategory of
> the if_link, and use the existing RTM_NEWLINK/RTM_DELLINK commands.
> This would make it resemble the existing interfaces and mean you can
> use
>
> ip link add link eth0 type macvlan    # for a container
> ip link add link eth0 type macvtap    # for qemu/vhost
> ip link add link eth0 type vf         # for device assignment
> 
> There are obviously significant differences between these three, but
> they also share enough of their properties to let us treat them
> in similar ways.
> 

I don't have strong preference for iovnl vs. extending if_link.  I thought I
had a reason against if_link, but I can't recall that now...it'll probably
come to me when I look at it again.  Let me look again...
 
> If we integrate the iovnl client into iproute2, the sequence for setting
> up an enic VF and associating it to the port profile could be
> 
> # create vf0, pass mac and vlan id to HW, no association yet
> ip link add link eth0 name vf0 type vf mac fe:dc:ba:12:34:56 vlan 78
> 
> # associate vf with port profile, mac address must match the one assigned
> #  to the interface before.
> ip iov assoc eth0 port-profile "general" host-uuid
> "dcf2a873-f5ee-41dd-a7ad-802a544e48c2" \
> mac fe:dc:ba:12:34:56

Ya, that sounds pretty close.  I still want the flexibility to direct ops to
a PF link for a VF link.

-scott


^ permalink raw reply

* Re: [PATCH] cxgb3: fix linkup issue
From: David Miller @ 2010-04-21 23:34 UTC (permalink / raw)
  To: divy; +Cc: h-shimamoto, netdev, linux-kernel
In-Reply-To: <4BCF4E0E.1080805@chelsio.com>

From: Divy Le Ray <divy@chelsio.com>
Date: Wed, 21 Apr 2010 12:12:14 -0700

> Hiroshi Shimamoto wrote:
>> From: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
>>
>> I encountered an issue where the link does not come up on a cxgb3 fabric.
>> I bisected and found that this regression was introduced by
>> 0f07c4ee8c800923ae7918c231532a9256233eed.
>>
>> Fix it by passing phy_addr to cphy_init() in t3_xaui_direct_phy_prep().
>>
>> Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
>>   
> 
> Sorry for the review delay, I just came back from some time off.
> Acked-by: Divy Le Ray <divy@chelsio.com>

Applied to net-2.6, thanks.

^ permalink raw reply

* Re: [PATCH] ks8842: Add platform data for setting mac address
From: David Miller @ 2010-04-21 23:33 UTC (permalink / raw)
  To: richard.rojfors; +Cc: netdev, bhutchings
In-Reply-To: <1271628741.18194.7.camel@debian>

From: Richard Röjfors <richard.rojfors@pelagicore.com>
Date: Mon, 19 Apr 2010 00:12:21 +0200

> This patch adds platform data to the ks8842 driver.
> 
> Via the platform data a MAC address, to be used by the controller,
> can be passed.
> 
> To ensure this MAC address is used, the MAC address is written
> after each hardware reset.
> 
> Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com>

Applied to net-next-2.6, thanks Richard.

^ permalink raw reply

* Re: [PATCH] X25 fix dead unaccepted sockets
From: David Miller @ 2010-04-21 23:32 UTC (permalink / raw)
  To: andrew.hendry; +Cc: netdev
In-Reply-To: <1271549852.2802.37.camel@ibex>

From: Andrew Hendry <andrew.hendry@gmail.com>
Date: Sun, 18 Apr 2010 10:17:32 +1000

> 
> 1, An X25 program binds and listens
> 2, calls arrive waiting to be accepted
> 3, Program exits without accepting
> 4, Sockets time out but don't get correctly cleaned up
> 5, cat /proc/net/x25/socket shows the dead sockets with bad inode fields.
> 
> This line borrowed from AX25 sets the dying socket so the timers clean up later.
> 
> Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH] mac8390: change an error return code and some cleanup, take 3
From: David Miller @ 2010-04-21 23:30 UTC (permalink / raw)
  To: fthain; +Cc: joe, p_gortmaker, netdev, linux-kernel, linux-m68k
In-Reply-To: <alpine.OSX.2.00.1004171258330.358@localhost>

From: Finn Thain <fthain@telegraphics.com.au>
Date: Sat, 17 Apr 2010 13:16:04 +1000 (EST)

> 
> Change an error return code from -EAGAIN to -EBUSY since the former is 
> misleading.
> 
> Nubus slots are geographically addressed and their irqs are equally 
> inflexible. -EAGAIN is misleading because retrying will not help fix 
> whatever bug it was that made the irq unavailable.

request_irq() itself returns an appropriate error code, so the
correct change is to do:

	err = request_irq( ... );
	if (err) {
	...

and return 'err'.
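
Spelled out as a self-contained userspace sketch, with a mock in place of
request_irq(). The names (sketch_request_irq, sketch_open, SKETCH_NR_IRQS)
and the particular error codes the mock returns are illustrative
assumptions, not the driver's code:

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_NR_IRQS 16

static int sketch_irq_claimed[SKETCH_NR_IRQS];

/* Mock of request_irq(): 0 on success, negative errno on failure. */
static int sketch_request_irq(unsigned int irq)
{
	if (irq >= SKETCH_NR_IRQS)
		return -EINVAL;
	if (sketch_irq_claimed[irq])
		return -EBUSY;
	sketch_irq_claimed[irq] = 1;
	return 0;
}

/* The caller passes the callee's code through instead of picking its own. */
static int sketch_open(unsigned int irq)
{
	int err;

	err = sketch_request_irq(irq);
	if (err)
		return err;
	return 0;
}

/* Small demonstration of the three outcomes. */
static void sketch_demo(void)
{
	assert(sketch_open(3) == 0);		/* first claim succeeds */
	assert(sketch_open(3) == -EBUSY);	/* second claim is refused */
	assert(sketch_open(99) == -EINVAL);	/* out-of-range irq */
}
```

The point is only that the caller never has to guess which errno fits.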

^ permalink raw reply

* Re: [PATCH] KS8851: NULL pointer dereference if list is empty
From: David Miller @ 2010-04-21 23:29 UTC (permalink / raw)
  To: abraham.arce.moreno; +Cc: netdev
In-Reply-To: <k2ocb8016981004161748s1a91f926x3c29b3fbd45ad46c@mail.gmail.com>

From: Abraham Arce <abraham.arce.moreno@gmail.com>
Date: Fri, 16 Apr 2010 19:48:43 -0500

> Fix NULL pointer dereference in ks8851_tx_work by checking if dequeued
> list is already empty before writing the packet to TX FIFO
> 
>  Unable to handle kernel NULL pointer dereference at virtual address 00000050
>  PC is at ks8851_tx_work+0xdc/0x1b0
>  LR is at wait_for_common+0x148/0x164
>  pc : [<c01c0df4>]    lr : [<c025a980>]    psr: 20000013
>  Backtrace:
>   ks8851_tx_work+0x0/0x1b0
>   worker_thread+0x0/0x190
>   kthread+0x0/0x90
> 
> Signed-off-by: Abraham Arce <x0066660@ti.com>

Applied to net-2.6, thanks.

^ permalink raw reply

* Re: [PATCH] drivers/net/pcmcia/3c574_cs: fixing stats.tx_bytes counter
From: David Miller @ 2010-04-21 23:28 UTC (permalink / raw)
  To: linux; +Cc: akurz, netdev
In-Reply-To: <20100416130101.GB7877@comet.dominikbrodowski.net>

From: Dominik Brodowski <linux@dominikbrodowski.net>
Date: Fri, 16 Apr 2010 15:01:01 +0200

> David,
> 
> as this is more netdev-related than PCMCIA-related, could you pick it up?
> Else, I'm willing to take it upstream, but would prefer your ACK on this.

Applied to net-2.6, thanks Dominik.

^ permalink raw reply

* Re: [PATCH] RCU: don't turn off lockdep when find suspicious rcu_dereference_check() usage
From: Eric W. Biederman @ 2010-04-21 23:26 UTC (permalink / raw)
  To: paulmck
  Cc: Eric Dumazet, Miles Lane, Eric Paris, Lai Jiangshan, Ingo Molnar,
	Peter Zijlstra, LKML, vgoyal, nauman, netdev
In-Reply-To: <20100421221426.GS2563@linux.vnet.ibm.com>

"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> writes:

> On Wed, Apr 21, 2010 at 11:57:09PM +0200, Eric Dumazet wrote:
>> Le mercredi 21 avril 2010 à 14:35 -0700, Paul E. McKenney a écrit :
>> 
>> > > [   33.425087] [ INFO: suspicious rcu_dereference_check() usage. ]
>> > > [   33.425090] ---------------------------------------------------
>> > > [   33.425094] net/core/dev.c:1993 invoked rcu_dereference_check()
>> > > without protection!
>> > > [   33.425098]
>> > > [   33.425098] other info that might help us debug this:
>> > > [   33.425100]
>> > > [   33.425103]
>> > > [   33.425104] rcu_scheduler_active = 1, debug_locks = 1
>> > > [   33.425108] 2 locks held by canberra-gtk-pl/4208:
>> > > [   33.425111]  #0:  (sk_lock-AF_INET){+.+.+.}, at:
>> > > [<ffffffff81394ffd>] inet_stream_connect+0x3a/0x24d
>> > > [   33.425125]  #1:  (rcu_read_lock_bh){.+....}, at:
>> > > [<ffffffff8134a809>] dev_queue_xmit+0x14e/0x4b8
>> > > [   33.425137]
>> > > [   33.425138] stack backtrace:
>> > > [   33.425142] Pid: 4208, comm: canberra-gtk-pl Not tainted 2.6.34-rc5 #18
>> > > [   33.425146] Call Trace:
>> > > [   33.425154]  [<ffffffff81067fc2>] lockdep_rcu_dereference+0x9d/0xa5
>> > > [   33.425161]  [<ffffffff8134a914>] dev_queue_xmit+0x259/0x4b8
>> > > [   33.425167]  [<ffffffff8134a809>] ? dev_queue_xmit+0x14e/0x4b8
>> > > [   33.425173]  [<ffffffff81041c52>] ? _local_bh_enable_ip+0xcd/0xda
>> > > [   33.425180]  [<ffffffff8135375a>] neigh_resolve_output+0x234/0x285
>> > > [   33.425188]  [<ffffffff8136f71f>] ip_finish_output2+0x257/0x28c
>> > > [   33.425193]  [<ffffffff8136f7bc>] ip_finish_output+0x68/0x6a
>> > > [   33.425198]  [<ffffffff813704b3>] T.866+0x52/0x59
>> > > [   33.425203]  [<ffffffff813706fe>] ip_output+0xaa/0xb4
>> > > [   33.425209]  [<ffffffff8136ebb8>] ip_local_out+0x20/0x24
>> > > [   33.425215]  [<ffffffff8136f204>] ip_queue_xmit+0x309/0x368
>> > > [   33.425223]  [<ffffffff810e41e6>] ? __kmalloc_track_caller+0x111/0x155
>> > > [   33.425230]  [<ffffffff813831ef>] ? tcp_connect+0x223/0x3d3
>> > > [   33.425236]  [<ffffffff81381971>] tcp_transmit_skb+0x707/0x745
>> > > [   33.425243]  [<ffffffff81383342>] tcp_connect+0x376/0x3d3
>> > > [   33.425250]  [<ffffffff81268ac3>] ? secure_tcp_sequence_number+0x55/0x6f
>> > > [   33.425256]  [<ffffffff813872f0>] tcp_v4_connect+0x3df/0x455
>> > > [   33.425263]  [<ffffffff8133cbd9>] ? lock_sock_nested+0xf3/0x102
>> > > [   33.425269]  [<ffffffff81395067>] inet_stream_connect+0xa4/0x24d
>> > > [   33.425276]  [<ffffffff8133b418>] sys_connect+0x90/0xd0
>> > > [   33.425283]  [<ffffffff81002b9c>] ? sysret_check+0x27/0x62
>> > > [   33.425289]  [<ffffffff81068922>] ? trace_hardirqs_on_caller+0x114/0x13f
>> > > [   33.425296]  [<ffffffff813ced00>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>> > > [   33.425303]  [<ffffffff81002b6b>] system_call_fastpath+0x16/0x1b
>> > 
>> > This looks like an rcu_dereference() needs to instead be
>> > rcu_dereference_bh(), but the line numbering in my version of
>> > net/core/dev.c does not match yours.  CCing netdev, hopefully
>> > someone there will know which rcu_dereference() is indicated.
>> 
>> This is already sorted out in David's trees
>
> Very good!!!  ;-)
>
>> > > [   85.939528] [ INFO: suspicious rcu_dereference_check() usage. ]
>> > > [   85.939531] ---------------------------------------------------
>> > > [   85.939535] include/net/inet_timewait_sock.h:227 invoked
>> > > rcu_dereference_check() without protection!
>> > > [   85.939539]
>> > > [   85.939540] other info that might help us debug this:
>> > > [   85.939541]
>> > > [   85.939544]
>> > > [   85.939545] rcu_scheduler_active = 1, debug_locks = 1
>> > > [   85.939549] 2 locks held by gwibber-service/4798:
>> > > [   85.939552]  #0:  (&p->lock){+.+.+.}, at: [<ffffffff811034b2>]
>> > > seq_read+0x37/0x381
>> > > [   85.939566]  #1:  (&(&hashinfo->ehash_locks[i])->rlock){+.-...},
>> > > at: [<ffffffff81386355>] established_get_next+0xc4/0x132
>> > > [   85.939579]
>> > > [   85.939580] stack backtrace:
>> > > [   85.939585] Pid: 4798, comm: gwibber-service Not tainted 2.6.34-rc5 #18
>> > > [   85.939588] Call Trace:
>> > > [   85.939598]  [<ffffffff81067fc2>] lockdep_rcu_dereference+0x9d/0xa5
>> > > [   85.939604]  [<ffffffff81385018>] twsk_net+0x4f/0x57
>> > > [   85.939610]  [<ffffffff813862e5>] established_get_next+0x54/0x132
>> > > [   85.939615]  [<ffffffff813864c7>] tcp_seq_next+0x5d/0x6a
>> > > [   85.939621]  [<ffffffff81103701>] seq_read+0x286/0x381
>> > > [   85.939627]  [<ffffffff8110347b>] ? seq_read+0x0/0x381
>> > > [   85.939633]  [<ffffffff81133240>] proc_reg_read+0x8d/0xac
>> > > [   85.939640]  [<ffffffff810ea110>] vfs_read+0xa6/0x103
>> > > [   85.939645]  [<ffffffff810ea223>] sys_read+0x45/0x69
>> > > [   85.939652]  [<ffffffff81002b6b>] system_call_fastpath+0x16/0x1b
>> > 
>> > This one appears to be a case of missing rcu_read_lock(), but it is
>> > not clear to me at what level it needs to go.
>> > 
>> > Eric, any enlightenment on this one and the next one?
>> 
>> Coming from commit b099ce2602d806deb41caaa578731848995cdb2a
>> From Eric Biederman (CCed)
>> 
>> Apparently he added rcu to twsk_net(), but the Changelog doesn't mention it.
>
> Thank you for chasing this down, Eric Dumazet!
>
> Eric Biederman, any enlightenment?

That change to twsk_net probably should have come in
575f4cd5a5b639457747434dbe18d175fa767db4.  The point was to make
twsk_net usable in an rcu context, instead of requiring a lock. 

Should it become rcu_dereference_raw now that we have lockdep support?
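
To make the question concrete, here is a rough user-space model of the
difference between a checked and a raw dereference. It is only a sketch,
not kernel code: the struct layouts, the model_* helpers, the depth
counter standing in for the lockdep check, and the splat counter are all
assumptions made up for illustration. It mirrors the second trace above,
where the caller holds a hash lock but is not inside an RCU read-side
section.

```c
/* User-space model only: none of these are the kernel's definitions. */
#include <assert.h>
#include <stdio.h>

struct net { int id; };
struct tw_sock { struct net *tw_net; };

static int read_lock_depth;   /* hypothetical stand-in for "read-side section held" */
static int splat_count;       /* number of "suspicious usage" reports so far */

static void model_rcu_read_lock(void)   { read_lock_depth++; }
static void model_rcu_read_unlock(void) { read_lock_depth--; }

/* Checked flavour: report when no read-side section is active. */
static struct net *twsk_net_checked(const struct tw_sock *tw)
{
    if (read_lock_depth == 0) {
        splat_count++;
        fprintf(stderr, "suspicious dereference without protection\n");
    }
    return tw->tw_net;
}

/* Raw flavour: no check at all; the caller vouches for its own protection. */
static struct net *twsk_net_raw(const struct tw_sock *tw)
{
    return tw->tw_net;
}

/* Runs the three cases and returns how many reports were produced. */
static int model_demo(void)
{
    struct net ns = { .id = 1 };
    struct tw_sock tw = { .tw_net = &ns };
    struct net *p;

    splat_count = 0;

    /* Caller relies on some other lock, not a read-side section. */
    p = twsk_net_checked(&tw);      /* reports once */
    assert(p == &ns);
    p = twsk_net_raw(&tw);          /* silent */
    assert(p == &ns);

    model_rcu_read_lock();
    p = twsk_net_checked(&tw);      /* silent: section is active */
    assert(p == &ns);
    model_rcu_read_unlock();

    return splat_count;
}
```

In this model all three calls return the same pointer; the only
difference is whether a report is produced. The raw flavour silences the
report for every caller, including ones that really are unprotected,
which is the trade-off behind the question.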

commit 575f4cd5a5b639457747434dbe18d175fa767db4
Author: Eric W. Biederman <ebiederm@xmission.com>
Date:   Thu Dec 3 02:29:08 2009 +0000

    net: Use rcu lookups in inet_twsk_purge.
    
    While we are looking up entries to free there is no reason to take
    the lock in inet_twsk_purge.  We have to drop locks and restart
    occasionally anyway so adding a few more in case we get on the
    wrong list because of a timewait move is no big deal.  At the
    same time not taking the lock for long periods of time is much
    more polite to the rest of the users of the hash table.
    
    In my test configuration of killing 4k network namespaces
    this change causes 4k back to back runs of inet_twsk_purge on an
    empty hash table to go from roughly 20.7s to 3.3s, and the total
    time to destroy 4k network namespaces goes from roughly 44s to
    3.3s.
    
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>



Eric


* Re: [RFC][PATCH] xfrm6 refcnt problem in bundle creation
From: David Miller @ 2010-04-21 23:25 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev
In-Reply-To: <4BC73FB7.9090106@dev.6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@dev.6wind.com>
Date: Thu, 15 Apr 2010 18:32:55 +0200

> Subject: [PATCH] xfrm6: ensure to use the same dev when building a bundle
> 
> When building a bundle, we set dst.dev and rt6.rt6i_idev.
> We must ensure to set the same device for both fields.
> 
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>

What we are doing now is definitely wrong and I think your
patch is the correct fix.

Applied to net-2.6, thanks!


* Re: [PATCH] net: small cleanup of lib8390
From: David Miller @ 2010-04-21 23:23 UTC (permalink / raw)
  To: knikanth; +Cc: p_gortmaker, netdev, viro, jeff
In-Reply-To: <201004151751.23899.knikanth@suse.de>

From: Nikanth Karthikesan <knikanth@suse.de>
Date: Thu, 15 Apr 2010 17:51:23 +0530

> Remove the always true #if 1. Also the unnecessary re-test of ei_local->irqlock
> and the unreachable printk format string.
> 
> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>

Applied to net-next-2.6, thanks.


