Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCHv2] vhost-net: utilize PUBLISH_USED_IDX feature
From: Juan Quintela @ 2010-05-18 11:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: davem, Rusty Russell, Paul E. McKenney, Arnd Bergmann, kvm,
	virtualization, netdev, linux-kernel, alex.williamson, amit.shah
In-Reply-To: <20100518022105.GA23129@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> With PUBLISH_USED_IDX, guest tells us which used entries
> it has consumed. This can be used to reduce the number
> of interrupts: after we write a used entry, if the guest has not yet
> consumed the previous entry, or if the guest has already consumed the
> new entry, we do not need to interrupt.
> This imporves bandwidth by 30% under some workflows.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>
> This is on top of Rusty's tree and depends on the virtio patch.
>
> Changes from v1:
> fix build

Minor nit if you have to respin it:

>  /* This actually signals the guest, using eventfd. */
> -void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
>  	__u16 flags;
> +	__u16 used;

local var named "used" here

>  	/* Flush out used index updates. This is paired
>  	 * with the barrier that the Guest executes when enabling
>  	 * interrupts. */
> @@ -1053,6 +1057,17 @@ void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	     !vhost_has_feature(dev, VIRTIO_F_NOTIFY_ON_EMPTY)))
>  		return;
>  
> +	if (vhost_has_feature(dev, VIRTIO_RING_F_PUBLISH_USED)) {
> +		__u16 used;

and a "more" local one also named "used" :)

I guess you want to remove the previous declaration, as var is only used
in this block.

> +		if (get_user(used, vq->last_used)) {
> +			vq_err(vq, "Failed to get last used idx");
> +			return;
> +		}
> +
> +		if (used != (u16)(vq->last_used_idx - 1))
> +			return;
> +	}
> +
>  	/* Signal the Guest tell them we used something up. */
>  	if (vq->call_ctx)
>  		eventfd_signal(vq->call_ctx, 1);

Rest of patch looks ok, and as a bonus un-export two functions.

Later, Juan.

^ permalink raw reply

* Re: [PATCHv2] virtio: put last seen used index into ring itself
From: Juan Quintela @ 2010-05-18 11:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, Jiri Pirko, Shirley Ma, Amit Shah, Mark McLoughlin,
	netdev, linux-kernel, alex.williamson
In-Reply-To: <20100518021315.GA22852@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> wrote:
> Generally, the Host end of the virtio ring doesn't need to see where
> Guest is up to in consuming the ring.  However, to completely understand
> what's going on from the outside, this information must be exposed.
> For example, host can reduce the number of interrupts by detecting
> that the guest is currently handling previous buffers.
>
> Fortunately, we have room to expand: the ring is always a whole number
> of pages and there's hundreds of bytes of padding after the avail ring
> and the used ring, whatever the number of descriptors (which must be a
> power of 2).
>
> We add a feature bit so the guest can tell the host that it's writing
> out the current value there, if it wants to use that.
>
> This is based on a patch by Rusty Russell, with the main difference
> being that we dedicate a feature bit to guest to tell the host it is
> writing the used index.  This way we don't need to force host to publish
> the last available index until we have a use for it.
>
> Another difference is that while the feature helps virtio-net,
> there have been conflicting reports wrt virtio-blk.
> The reason is unknown, it could be due to the fact that
> virtio-blk does not bother to disable interrupts at all.
> So for now, this patch only acks this feature for -net.
>
> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

It looks good.

Later, Juan.

^ permalink raw reply

* RE: get beyond 1Gbps with pktgen on 10Gb nic?
From: Jon Zhou @ 2010-05-18 11:14 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev@vger.kernel.org
In-Reply-To: <4BEAF44E.8090201@hp.com>

hi rick:

do you mean "TCP_NODELAY" will send with packet size as I expect

without this option,netperf might sent packet with large size? (but eventually it will be splitted into MTU size?)

thanks
jon

-----Original Message-----
From: Rick Jones [mailto:rick.jones2@hp.com] 
Sent: Thursday, May 13, 2010 2:33 AM
To: Jesse Brandeburg
Cc: Jon Zhou; Ben Greear; Ben Hutchings; netdev@vger.kernel.org
Subject: Re: get beyond 1Gbps with pktgen on 10Gb nic?

Jesse Brandeburg wrote:
> On Tue, May 11, 2010 at 9:00 PM, Jon Zhou <Jon.Zhou@jdsu.com> wrote:
> 
>> I just used multi netperf instances to reach 900K pps/8Gb+ traffic on the
>> Broadcom 10G nic:

>>
>>command:
>>
>>for i in 1 2 3 4 5 6 7 8 9 10
>>do
>>netperf -l 60 -H 192.168.0.53 -- -m 60 -s 100M -S 100M &
>>done

100 Megabytes seems a trifle excessive as a socket buffer size.  I would suggest 
  lopping-off a few zeros and use 1M instead.  Or, one can let linux auto-tune 
the socket buffers/windows - just don't accept the socket buffer size reported 
by the classic netperf command - it is from the initial creation of the socket. 
  To get what it became by the end of the test one should use the "omni" tests. 
  Contact me offlist or via netperf-talk in the netperf.org domain for more on that.

>> the msg size was assigned as 64 bytes, but when I checked the file captured
>> by tcpdump, found that netperf sent many frames which are large than 64
>> bytes(i.e.4000-10K+ bytes) and these frames were truncated by tcpdump.
>>
>> so that the actual avg packet size is around 1500 bytes, but what I want is
>> avg packet: 300-400 bytes and reach 5Gb+.
>>
>>does it make sense?
> 
> 
> if you set the TCP_NODELAY (to disable nagle) option on netperf 

If he was seeing 4K to 10K byte frames in his tcpdump, that was likely TSO above 
and beyond nagle.  I was going to say it also suggests he was running tcpdump on 
the sending side rather than the receiver, but then there is LRO/GRO isn't there...

> (check netperf -t TCP_STREAM -- -h) then you should be able to control packet
>  size.

Or at least influence it meaningfully :)  If he was seeing 4K to 10K byte frames 
in his tcpdump, that was likely TSO above and beyond nagle.If there are packet 
losses and retransmissions, the retransmissions, which may or may not include 
new data may be larger.

The "netperf -t TCP_STREAM -- -h" to get test-specific help shown by Jesse will 
show the option you need to add to set TCP_NODELAY.  For additional descriptions 
of netperf command options:

http://www.netperf.org/svn/netperf2/tags/netperf-2.4.5/doc/netperf.html

For "quick and dirty" testing, the loop as it appears above is "ok" but I would 
suggest abusing the confidence intervals code to minimize the skew error:

http://www.netperf.org/svn/netperf2/tags/netperf-2.4.5/doc/netperf.html#Using-Netperf-to-Measure-Aggregate-Performance

happy benchmarking,

rick jones

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 3/4] ipv6: Use POSTDAD state
From: Herbert Xu @ 2010-05-18 11:04 UTC (permalink / raw)
  To: David Miller, jbohac, yoshfuji, netdev, shemminger
In-Reply-To: <20100518110243.GA7750@gondor.apana.org.au>

ipv6: Use POSTDAD state

This patch makes use of the new POSTDAD state.  This prevents
a race between DAD completion and failure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv6/addrconf.c |   29 ++++++++++++++++++++++++-----
 1 file changed, 24 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d8d7c63..e7e9643 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1412,10 +1412,27 @@ static void addrconf_dad_stop(struct inet6_ifaddr *ifp, int dad_failed)
 		ipv6_del_addr(ifp);
 }
 
+static int addrconf_dad_end(struct inet6_ifaddr *ifp)
+{
+	int err = -ENOENT;
+
+	spin_lock(&ifp->state_lock);
+	if (ifp->state == INET6_IFADDR_STATE_DAD) {
+		ifp->state = INET6_IFADDR_STATE_POSTDAD;
+		err = 0;
+	}
+	spin_unlock(&ifp->state_lock);
+
+	return err;
+}
+
 void addrconf_dad_failure(struct inet6_ifaddr *ifp)
 {
 	struct inet6_dev *idev = ifp->idev;
 
+	if (addrconf_dad_end(ifp))
+		return;
+
 	if (net_ratelimit())
 		printk(KERN_INFO "%s: IPv6 duplicate address %pI6c detected!\n",
 			ifp->idev->dev->name, &ifp->addr);
@@ -2731,6 +2748,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 			/* Flag it for later restoration when link comes up */
 			ifa->flags |= IFA_F_TENTATIVE;
+			ifa->state = INET6_IFADDR_STATE_DAD;
 
 			write_unlock_bh(&idev->lock);
 
@@ -2895,6 +2913,9 @@ static void addrconf_dad_timer(unsigned long data)
 	struct inet6_dev *idev = ifp->idev;
 	struct in6_addr mcaddr;
 
+	if (!ifp->probes && addrconf_dad_end(ifp))
+		goto out;
+
 	read_lock(&idev->lock);
 	if (idev->dead || !(idev->if_flags & IF_READY)) {
 		read_unlock(&idev->lock);
@@ -2967,12 +2988,10 @@ static void addrconf_dad_run(struct inet6_dev *idev) {
 	read_lock_bh(&idev->lock);
 	for (ifp = idev->addr_list; ifp; ifp = ifp->if_next) {
 		spin_lock(&ifp->lock);
-		if (!(ifp->flags & IFA_F_TENTATIVE)) {
-			spin_unlock(&ifp->lock);
-			continue;
-		}
+		if (ifp->flags & IFA_F_TENTATIVE &&
+		    ifp->state == INET6_IFADDR_STATE_DAD)
+			addrconf_dad_kick(ifp);
 		spin_unlock(&ifp->lock);
-		addrconf_dad_kick(ifp);
 	}
 	read_unlock_bh(&idev->lock);
 }

^ permalink raw reply related

* [PATCH 4/4] ipv6: Never schedule DAD timer on dead address
From: Herbert Xu @ 2010-05-18 11:04 UTC (permalink / raw)
  To: David Miller, jbohac, yoshfuji, netdev, shemminger
In-Reply-To: <20100518110243.GA7750@gondor.apana.org.au>

ipv6: Never schedule DAD timer on dead address

This patch ensures that all places that schedule the DAD timer
look at the address state in a safe manner before scheduling the
timer.  This ensures that we don't end up with pending timers
after deleting an address.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv6/addrconf.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index e7e9643..dd72f9c 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2865,10 +2865,10 @@ static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags)
 	net_srandom(ifp->addr.s6_addr32[3]);
 
 	read_lock_bh(&idev->lock);
+	spin_lock(&ifp->lock);
 	if (ifp->state == INET6_IFADDR_STATE_DEAD)
 		goto out;
 
-	spin_lock(&ifp->lock);
 	if (dev->flags&(IFF_NOARP|IFF_LOOPBACK) ||
 	    idev->cnf.accept_dad < 1 ||
 	    !(ifp->flags&IFA_F_TENTATIVE) ||
@@ -2902,8 +2902,8 @@ static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags)
 		ip6_ins_rt(ifp->rt);
 
 	addrconf_dad_kick(ifp);
-	spin_unlock(&ifp->lock);
 out:
+	spin_unlock(&ifp->lock);
 	read_unlock_bh(&idev->lock);
 }
 
@@ -2923,6 +2923,12 @@ static void addrconf_dad_timer(unsigned long data)
 	}
 
 	spin_lock(&ifp->lock);
+	if (ifp->state == INET6_IFADDR_STATE_DEAD) {
+		spin_unlock(&ifp->lock);
+		read_unlock(&idev->lock);
+		goto out;
+	}
+
 	if (ifp->probes == 0) {
 		/*
 		 * DAD was successful

^ permalink raw reply related

* [PATCH 1/4] ipv6: Replace inet6_ifaddr->dead with state
From: Herbert Xu @ 2010-05-18 11:04 UTC (permalink / raw)
  To: David Miller, jbohac, yoshfuji, netdev, shemminger
In-Reply-To: <20100518110243.GA7750@gondor.apana.org.au>

ipv6: Replace inet6_ifaddr->dead with state

This patch replaces the boolean dead flag on inet6_ifaddr with
a state enum.  This allows us to roll back changes when deleting
an address according to whether DAD has completed or not.

This patch only adds the state field and does not change the logic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/net/if_inet6.h |   12 ++++++++++--
 net/ipv6/addrconf.c    |   11 ++++++-----
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/include/net/if_inet6.h b/include/net/if_inet6.h
index 545d8b0..421a1e8 100644
--- a/include/net/if_inet6.h
+++ b/include/net/if_inet6.h
@@ -30,6 +30,13 @@
 #define IF_PREFIX_ONLINK	0x01
 #define IF_PREFIX_AUTOCONF	0x02
 
+enum {
+	INET6_IFADDR_STATE_DAD,
+	INET6_IFADDR_STATE_POSTDAD,
+	INET6_IFADDR_STATE_UP,
+	INET6_IFADDR_STATE_DEAD,
+};
+
 #ifdef __KERNEL__
 
 struct inet6_ifaddr {
@@ -40,6 +47,9 @@ struct inet6_ifaddr {
 	__u32			prefered_lft;
 	atomic_t		refcnt;
 	spinlock_t		lock;
+	spinlock_t		state_lock;
+
+	int			state;
 
 	__u8			probes;
 	__u8			flags;
@@ -62,8 +72,6 @@ struct inet6_ifaddr {
 	struct inet6_ifaddr	*ifpub;
 	int			regen_count;
 #endif
-
-	int			dead;
 };
 
 struct ip6_sf_socklist {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 413054f..14b2ccf 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -539,7 +539,7 @@ void inet6_ifa_finish_destroy(struct inet6_ifaddr *ifp)
 	if (del_timer(&ifp->timer))
 		printk("Timer is still running, when freeing ifa=%p\n", ifp);
 
-	if (!ifp->dead) {
+	if (ifp->state != INET6_IFADDR_STATE_DEAD) {
 		printk("Freeing alive inet6 address %p\n", ifp);
 		return;
 	}
@@ -642,6 +642,7 @@ ipv6_add_addr(struct inet6_dev *idev, const struct in6_addr *addr, int pfxlen,
 	ipv6_addr_copy(&ifa->addr, addr);
 
 	spin_lock_init(&ifa->lock);
+	spin_lock_init(&ifa->state_lock);
 	init_timer(&ifa->timer);
 	ifa->timer.data = (unsigned long) ifa;
 	ifa->scope = scope;
@@ -716,7 +717,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 
 	hash = ipv6_addr_hash(&ifp->addr);
 
-	ifp->dead = 1;
+	ifp->state = INET6_IFADDR_STATE_DEAD;
 
 	write_lock_bh(&addrconf_hash_lock);
 	for (ifap = &inet6_addr_lst[hash]; (ifa=*ifap) != NULL;
@@ -2679,7 +2680,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	while ((ifa = idev->tempaddr_list) != NULL) {
 		idev->tempaddr_list = ifa->tmp_next;
 		ifa->tmp_next = NULL;
-		ifa->dead = 1;
+		ifa->state = INET6_IFADDR_STATE_DEAD;
 		write_unlock_bh(&idev->lock);
 		spin_lock_bh(&ifa->lock);
 
@@ -2724,7 +2725,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 			ifa->flags |= IFA_F_TENTATIVE;
 			in6_ifa_hold(ifa);
 		} else {
-			ifa->dead = 1;
+			ifa->state = INET6_IFADDR_STATE_DEAD;
 		}
 		write_unlock_bh(&idev->lock);
 
@@ -2827,7 +2828,7 @@ static void addrconf_dad_start(struct inet6_ifaddr *ifp, u32 flags)
 	net_srandom(ifp->addr.s6_addr32[3]);
 
 	read_lock_bh(&idev->lock);
-	if (ifp->dead)
+	if (ifp->state == INET6_IFADDR_STATE_DEAD)
 		goto out;
 
 	spin_lock(&ifp->lock);

^ permalink raw reply related

* [PATCH 2/4] ipv6: Use state_lock to protect ifa state
From: Herbert Xu @ 2010-05-18 11:04 UTC (permalink / raw)
  To: David Miller, jbohac, yoshfuji, netdev, shemminger
In-Reply-To: <20100518110243.GA7750@gondor.apana.org.au>

ipv6: Use state_lock to protect ifa state

This patch makes use of the new state_lock to synchronise between
updates to the ifa state.  This fixes the issue where a remotely
triggered address deletion (through DAD failure) coincides with a
local administrative address deletion, causing certain actions to
be performed twice incorrectly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 net/ipv6/addrconf.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 14b2ccf..d8d7c63 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -711,13 +711,20 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 {
 	struct inet6_ifaddr *ifa, **ifap;
 	struct inet6_dev *idev = ifp->idev;
+	int state;
 	int hash;
 	int deleted = 0, onlink = 0;
 	unsigned long expires = jiffies;
 
 	hash = ipv6_addr_hash(&ifp->addr);
 
+	spin_lock_bh(&ifp->state_lock);
+	state = ifp->state;
 	ifp->state = INET6_IFADDR_STATE_DEAD;
+	spin_unlock_bh(&ifp->state_lock);
+
+	if (state == INET6_IFADDR_STATE_DEAD)
+		goto out;
 
 	write_lock_bh(&addrconf_hash_lock);
 	for (ifap = &inet6_addr_lst[hash]; (ifa=*ifap) != NULL;
@@ -831,6 +838,7 @@ static void ipv6_del_addr(struct inet6_ifaddr *ifp)
 		dst_release(&rt->u.dst);
 	}
 
+out:
 	in6_ifa_put(ifp);
 }
 
@@ -2621,6 +2629,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	struct inet6_dev *idev;
 	struct inet6_ifaddr *ifa, *keep_list, **bifa;
 	struct net *net = dev_net(dev);
+	int state;
 	int i;
 
 	ASSERT_RTNL();
@@ -2680,7 +2689,6 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 	while ((ifa = idev->tempaddr_list) != NULL) {
 		idev->tempaddr_list = ifa->tmp_next;
 		ifa->tmp_next = NULL;
-		ifa->state = INET6_IFADDR_STATE_DEAD;
 		write_unlock_bh(&idev->lock);
 		spin_lock_bh(&ifa->lock);
 
@@ -2723,14 +2731,25 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 			/* Flag it for later restoration when link comes up */
 			ifa->flags |= IFA_F_TENTATIVE;
+
+			write_unlock_bh(&idev->lock);
+
 			in6_ifa_hold(ifa);
 		} else {
+			write_unlock_bh(&idev->lock);
+			spin_lock_bh(&ifa->state_lock);
+			state = ifa->state;
 			ifa->state = INET6_IFADDR_STATE_DEAD;
+			spin_unlock_bh(&ifa->state_lock);
+
+			if (state == INET6_IFADDR_STATE_DEAD)
+				goto put_ifa;
 		}
-		write_unlock_bh(&idev->lock);
 
 		__ipv6_ifa_notify(RTM_DELADDR, ifa);
 		atomic_notifier_call_chain(&inet6addr_chain, NETDEV_DOWN, ifa);
+
+put_ifa:
 		in6_ifa_put(ifa);
 
 		write_lock_bh(&idev->lock);

^ permalink raw reply related

* [0/4] Fix addrconf race conditions
From: Herbert Xu @ 2010-05-18 11:02 UTC (permalink / raw)
  To: David Miller; +Cc: jbohac, yoshfuji, netdev, shemminger
In-Reply-To: <20100423150540.GA26611@gondor.apana.org.au>

On Fri, Apr 23, 2010 at 11:05:40PM +0800, Herbert Xu wrote:
>
> This stuff is more broken than I thought.  For example, we perform
> a number of actions when DAD succeeds, e.g., joining an anycast
> group.  However, this is not synchronised with respect to address
> deletion at all, so if DAD succeeds just as someone deletes the
> address, you can easily get stuck on that anycast group.
> 
> I will try to untangle this mess tomorrow.

Tomorrow took a while to arrive :)

Here is a first batch of patches.  Note that this is by no means
a comprehensive fix for all the ndisc/addrconf race conditions.
It is just a first step in trying to address the problems.

The patchset revolves around a new lock, ifp->state_lock.  I
added it instead of trying to reuse the existing ifp->lock because
the latter has serious nesting issues that prevent it from easily
being used.  My long term plan is to restructure the locking and
eventually phase out ifp->lock in favour of ifp->state_lock.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Kernel crash with sky2
From: Roedel, Joerg @ 2010-05-18 11:01 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20100517122236.7f33101f@nehalam>

Hi Stephen,

On Mon, May 17, 2010 at 03:22:36PM -0400, Stephen Hemminger wrote:
> On Mon, 17 May 2010 20:52:28 +0200
> Joerg Roedel <joerg.roedel@amd.com> wrote:
> > I experience the following crash with 2.6.34 in the sky2 code on my
> > laptop when I plug off the lan-cable and then plug-off the power cable
> > and switching to battery. It does not happen with acpi=off.
> 
> So you have a busted BIOS that powers off the device.

Yeah you are right, its a BIOS issue. I tried to find out how the OS is
informed about the device taken away. But none of the hotplug
drivers or enabling acpi debug showed anything here. I wonder how this
is done in the "other" operating system. Or how the device could be
re-enabled, plugging the wires back doesn't help. Very weird.
Anyway, I found a BIOS option to disable this behavior and now things
work unpatched. Thanks for your help.

	Joerg

^ permalink raw reply

* Re: [PATCH] virtio: put last seen used index into ring itself
From: Michael S. Tsirkin @ 2010-05-18 10:25 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Rusty Russell, Jiri Pirko, Shirley Ma, Amit Shah, Mark McLoughlin,
	netdev, linux-kernel, quintela, alex.williamson
In-Reply-To: <4BF26B87.3080902@redhat.com>

On Tue, May 18, 2010 at 12:27:19PM +0200, Jes Sorensen wrote:
> On 05/18/10 02:47, Michael S. Tsirkin wrote:
> > Generally, the Host end of the virtio ring doesn't need to see where
> > Guest is up to in consuming the ring.  However, to completely understand
> > what's going on from the outside, this information must be exposed.
> > For example, host can reduce the number of interrupts by detecting
> > that the guest is currently handling previous buffers.
> > 
> > Fortunately, we have room to expand: the ring is always a whole number
> > of pages and there's hundreds of bytes of padding after the avail ring
> > and the used ring, whatever the number of descriptors (which must be a
> > power of 2).
> 
> Hi Michael,
> 
> Small build fix for this against Linus upstream.
> 
> Jes
> 
> 
> Fix build of virtio_net.c
> 
> Signed-off-by: Jes.Sorensen@redhat.com

Thanks, this is already fixed in v2 that I sent.

> ---
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d37e5be..679e2df 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -23,6 +23,7 @@
>  #include <linux/module.h>
>  #include <linux/virtio.h>
>  #include <linux/virtio_net.h>
> +#include <linux/virtio_ring.h>
>  #include <linux/scatterlist.h>
>  #include <linux/if_vlan.h>
>  #include <linux/slab.h>
> 

^ permalink raw reply

* Re: [PATCH] virtio: put last seen used index into ring itself
From: Jes Sorensen @ 2010-05-18 10:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Rusty Russell, Jiri Pirko, Shirley Ma, Amit Shah, Mark McLoughlin,
	netdev, linux-kernel, quintela, alex.williamson
In-Reply-To: <20100518004744.GA21359@redhat.com>

On 05/18/10 02:47, Michael S. Tsirkin wrote:
> Generally, the Host end of the virtio ring doesn't need to see where
> Guest is up to in consuming the ring.  However, to completely understand
> what's going on from the outside, this information must be exposed.
> For example, host can reduce the number of interrupts by detecting
> that the guest is currently handling previous buffers.
> 
> Fortunately, we have room to expand: the ring is always a whole number
> of pages and there's hundreds of bytes of padding after the avail ring
> and the used ring, whatever the number of descriptors (which must be a
> power of 2).

Hi Michael,

Small build fix for this against Linus upstream.

Jes


Fix build of virtio_net.c

Signed-off-by: Jes.Sorensen@redhat.com
---
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d37e5be..679e2df 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -23,6 +23,7 @@
 #include <linux/module.h>
 #include <linux/virtio.h>
 #include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>

^ permalink raw reply related

* dev_get_valid_name buggy with hash collision
From: Daniel Lezcano @ 2010-05-18 10:17 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: Linux Netdev List

[-- Attachment #1: Type: text/plain, Size: 3982 bytes --]


Hi all,

the commit:

commit d90310243fd750240755e217c5faa13e24f41536
Author: Octavian Purdila <opurdila@ixiacom.com>
Date:   Wed Nov 18 02:36:59 2009 +0000

     net: device name allocation cleanups

introduced a bug when there is a hash collision making impossible to 
rename a device with eth%d

This bug is very hard to reproduce and appears rarely, but finally I 
succeed to reproduce it with the program in attachment which fail to 
rename a device with errno ENFILE.

The test program creates a new network namespace in order to avoid 
messing the real network and to be sure we don't have any ethernet 
devices. Hence, we know if we create one ethernet device with the name 
eth%d the result will be 'eth0'.

The first step is to find a conflicting name with 'eth0':
  1) We compute the hash of 'eth0' with the hashing functions imported 
from the kernel
  2) We create a temporary name, compute its hash
  3) We compare the hash with the one we found for 'eth0'.

We loop until the hashes are different. When they are the same, then the 
temporary name is a conflicting name.

We create a dummy device with the temporary conflicting name and then we 
try to rename it with 'eth%d' (expecting 'eth0'), that fails with ENFILE 
due to the kernel bug.

 From the kernel POV, this is what happen:

dev_change_name does:
---------------------

     [ ... ]

     dev_get_valid_name(net, newname, dev->name, 1);

     [ ... ]

Note the dev->name parameter and newname is 'eth%d'.

dev_get_valid_name does:
------------------------

     [ ... ]

     if (fmt && strchr(name, '%'))
         return __dev_alloc_name(net, name, buf);

     [ ... ]

Note the 'buf' parameter is the 'dev->name' parameter and 'name' is "eth%d"

__dev_alloc_name does:
----------------------

     [ ... ]

         for_each_netdev(net, d) {
             if (!sscanf(d->name, name, &i))
                 continue;
             if (i < 0 || i >= max_netdevices)
                 continue;

             /*  avoid cases where sscanf is not exact inverse of printf */
             snprintf(buf, IFNAMSIZ, name, i);
             if (!strncmp(buf, d->name, IFNAMSIZ))
                 set_bit(i, inuse);
         }

     [ ... ]

Here the buf parameter is 'dev->name', so while we are browsing the 
network devices, we change the name of our eth device we want to rename. 
But in the context of our test program, that does not happen because 
there is no "eth[0-9]" network devices in the namespace, then we exit 
the loop with 'i == 0'.

Right after we do:

     if (buf != name)
         snprintf(buf, IFNAMSIZ, name, i);

Here buf and name pointers are different, so we modify 'buf' which is 
'dev->name', that is the network device name directly. So we have 
'dev->name' == "eth0" after this line.


So right after we are trying to find ourself :)

     [ ... ]

     if (!__dev_get_by_name(net, buf))
         return i;

     [ ... ]

When hashing are the same for the oldname and the newname, the function 
'__dev_get_by_name':

     [ ... ]

     struct hlist_head *head = dev_name_hash(net, name);

     hlist_for_each_entry(dev, p, head, name_hlist)
         if (!strncmp(dev->name, name, IFNAMSIZ))
             return dev;

     [ ... ]

will find the network device because we do __dev_get_by_name(net, 
"eth0"), the dev->name is already modified with "eth0" and the hashing 
of the temporary name and "eth0" are the same so returning the same hash 
entry.

We are lucky, most of the time, because the name of the network device 
and the new name have different hash entry, so we compare to ourself 
very rarely.


IMO, the bug is to pass the 'dev->name' parameter to __dev_alloc_name 
directly instead of a temporary name.

The patch in attachment fix the problem but I am not sure we shouldn't 
go further and do more cleanup around this bug, so you may consider it 
as a RFC more than a fix. If it is acceptable, I will send it as a patch 
against net-2.6.

Thanks
   -- Daniel















[-- Attachment #2: myethash.c --]
[-- Type: text/x-csrc, Size: 2623 bytes --]

#include <stdio.h>
#include <net/if.h>
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <sys/types.h>

/* 
 * We import the hash function from the kernel, so we can compute a conflicting
 * name directly in this program and reproduce the bug easily
 */

#define NETDEV_HASHBITS  8

/*************** from include/linux/dcache.h *****************************/

#define init_name_hash()  0
static inline unsigned long
partial_name_hash(unsigned long c, unsigned long prevhash)
{
        return (prevhash + (c << 4) + (c >> 4)) * 11;
}

static inline unsigned long end_name_hash(unsigned long hash)
{
        return (unsigned int) hash;
}

static inline unsigned int
full_name_hash(const unsigned char *name, unsigned int len)
{
        unsigned long hash = init_name_hash();
        while (len--)
                hash = partial_name_hash(*name++, hash);
        return end_name_hash(hash);
}

/***************** from include/linux/hash.h *****************************/

/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL

static inline unsigned int hash_32(unsigned int val, unsigned int bits)
{
        /* On some cpus multiply is faster, on others gcc will do shifts */
        unsigned int hash = val * GOLDEN_RATIO_PRIME_32;

        /* High bits are more random, so use them. */
        return hash >> (32 - bits);
}

/*************************************************************************/


unsigned int dev_name_hash(const char *name)
{
	return hash_32(full_name_hash(name, strnlen(name, IFNAMSIZ)), 
                        NETDEV_HASHBITS);
}

int main(int argc, char *argv[])
{
	char devname[IFNAMSIZ];
	char cmd[4096];
	unsigned int val, val2;
	
	if (getuid()) {
		fprintf(stderr, "you have to be root !\n");
		return -1;
	}

	/*
	 * Unshare the network namespace, we don't mess the network
         * and we can assume the eth%d rename will be eth0
	 */
	if (unshare(CLONE_NEWNET)) {
		perror("unshare");
		return -1;
	}		

	val = dev_name_hash("eth0");

	while (1) {

		snprintf(devname, IFNAMSIZ, "ethXXXXXX");
		mktemp(devname);

		val2 = dev_name_hash(devname);

		if (val == val2) {
			printf("'%s' has same hash entry\n", devname);
			break;
		}
	}

	/*
	 * We create a dummy interface with the conflicting name
	 * and then we rename it with an kernel allocated name
	 * The kernel will fail to rename with -ENFILE.
	 */

	sprintf(cmd, "ip link add %s type dummy", devname);

	if (system(cmd)) {
		perror("system");
		return -1;
	}

	sprintf(cmd, "ip link set name eth%%d dev %s", devname);
	printf("%s\n", cmd);
	system(cmd); 

	return 0;
}

[-- Attachment #3: fix-dev_get_valid_name.patch --]
[-- Type: text/x-diff, Size: 2315 bytes --]

Subject: fix dev_get_valid_name
From: Daniel Lezcano <daniel.lezcano@free.fr>

the commit:

commit d90310243fd750240755e217c5faa13e24f41536
Author: Octavian Purdila <opurdila@ixiacom.com>
Date:   Wed Nov 18 02:36:59 2009 +0000

    net: device name allocation cleanups

introduced a bug when there is a hash collision making impossible
to rename a device with eth%d. This bug is very hard to reproduce
and appears rarely.

The problem is coming from we don't pass a temporary buffer to
__dev_alloc_name but 'dev->name' which is modified by the function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
---
 net/core/dev.c |   20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

Index: net-2.6/net/core/dev.c
===================================================================
--- net-2.6.orig/net/core/dev.c
+++ net-2.6/net/core/dev.c
@@ -936,18 +936,22 @@ int dev_alloc_name(struct net_device *de
 }
 EXPORT_SYMBOL(dev_alloc_name);
 
-static int dev_get_valid_name(struct net *net, const char *name, char *buf,
-			      bool fmt)
+static int dev_get_valid_name(struct net_device *dev, const char *name, bool fmt)
 {
+	struct net *net;
+
+	BUG_ON(!dev_net(dev));
+	net = dev_net(dev);
+
 	if (!dev_valid_name(name))
 		return -EINVAL;
 
 	if (fmt && strchr(name, '%'))
-		return __dev_alloc_name(net, name, buf);
+		return dev_alloc_name(dev, name);
 	else if (__dev_get_by_name(net, name))
 		return -EEXIST;
-	else if (buf != name)
-		strlcpy(buf, name, IFNAMSIZ);
+	else if (strncmp(dev->name, name, IFNAMSIZ))
+		 strlcpy(dev->name, name, IFNAMSIZ);
 
 	return 0;
 }
@@ -979,7 +983,7 @@ int dev_change_name(struct net_device *d
 
 	memcpy(oldname, dev->name, IFNAMSIZ);
 
-	err = dev_get_valid_name(net, newname, dev->name, 1);
+	err = dev_get_valid_name(dev, newname, 1);
 	if (err < 0)
 		return err;
 
@@ -5083,7 +5087,7 @@ int register_netdevice(struct net_device
 		}
 	}
 
-	ret = dev_get_valid_name(net, dev->name, dev->name, 0);
+	ret = dev_get_valid_name(dev, dev->name, 0);
 	if (ret)
 		goto err_uninit;
 
@@ -5661,7 +5665,7 @@ int dev_change_net_namespace(struct net_
 		/* We get here if we can't use the current device name */
 		if (!pat)
 			goto out;
-		if (dev_get_valid_name(net, pat, dev->name, 1))
+		if (dev_get_valid_name(dev, pat, 1))
 			goto out;
 	}
 

^ permalink raw reply

* Re: [PATCH] bfin_mac: fix memleak in mii_bus{probe|remove}
From: Sonic Zhang @ 2010-05-18 10:05 UTC (permalink / raw)
  To: kirjanov, davem, michael.hennerich, sonic.zhang, cooloney,
	uclinux-dist-devel, netdev
In-Reply-To: <20100518092839.GA21791@coldcone>

2010/5/18 Denis Kirjanov <kirjanov@gmail.com <kirjanov@gmail.com>:
> Fix memory leak with miibus->irq
>
> Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
> ---
>
> drivers/net/bfin_mac.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
> index 39a54ba..7a17b8a 100644
> --- a/drivers/net/bfin_mac.c
> +++ b/drivers/net/bfin_mac.c
> @@ -1627,6 +1627,7 @@ static int __devinit bfin_mii_bus_probe(struct platform_device *pdev)
>
>  out_err_mdiobus_register:
>        mdiobus_free(miibus);
> +       kfree(miibus->irq);

Should you move this kfree before mdiobus_free?

>  out_err_alloc:
>        peripheral_free_list(pin_req);
>
> @@ -1638,6 +1639,7 @@ static int __devexit bfin_mii_bus_remove(struct platform_device *pdev)
>        struct mii_bus *miibus = platform_get_drvdata(pdev);
>        platform_set_drvdata(pdev, NULL);
>        mdiobus_unregister(miibus);
> +       kfree(miibus->irq);
>        mdiobus_free(miibus);
>        peripheral_free_list(pin_req);
>        return 0;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* [PATCH] bfin_mac: fix memleak in mii_bus{probe|remove}
From: Denis Kirjanov <kirjanov@gmail.com @ 2010-05-18  9:28 UTC (permalink / raw)
  To: davem; +Cc: michael.hennerich, sonic.zhang, cooloney, uclinux-dist-devel,
	netdev

Fix memory leak with miibus->irq

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---

drivers/net/bfin_mac.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/net/bfin_mac.c b/drivers/net/bfin_mac.c
index 39a54ba..7a17b8a 100644
--- a/drivers/net/bfin_mac.c
+++ b/drivers/net/bfin_mac.c
@@ -1627,6 +1627,7 @@ static int __devinit bfin_mii_bus_probe(struct platform_device *pdev)
 
 out_err_mdiobus_register:
 	mdiobus_free(miibus);
+	kfree(miibus->irq);
 out_err_alloc:
 	peripheral_free_list(pin_req);
 
@@ -1638,6 +1639,7 @@ static int __devexit bfin_mii_bus_remove(struct platform_device *pdev)
 	struct mii_bus *miibus = platform_get_drvdata(pdev);
 	platform_set_drvdata(pdev, NULL);
 	mdiobus_unregister(miibus);
+	kfree(miibus->irq);
 	mdiobus_free(miibus);
 	peripheral_free_list(pin_req);
 	return 0;

^ permalink raw reply related

* [PATCH NEXT 1/1] qlcnic: adding co maintainer
From: Amit Kumar Salecha @ 2010-05-18  9:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty

Adding Anirban as co maintainer

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 MAINTAINERS |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index abd0a77..1ace1b9 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3971,6 +3971,7 @@ F:	include/linux/*device.h
 
 NETXEN (1/10) GbE SUPPORT
 M:	Amit Kumar Salecha <amit.salecha@qlogic.com>
+M:	Anirban Chakraborty <anirban.chakraborty@qlogic.com>
 L:	netdev@vger.kernel.org
 W:	http://www.qlogic.com
 S:	Supported
-- 
1.6.3.3


^ permalink raw reply related

* [PATCH] iproute2: rework SR-IOV VF support
From: Chris Wright @ 2010-05-18  7:57 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Williams, Mitch A

The kernel interface changed just before 2.6.34 was released.  This brings
iproute2 in line with the current changes.  The VF portion of setlink is
comprised of a set of nested attributes.

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
---
Didn't forget the doc updates, tomorrow.

 ip/ipaddress.c |   60 ++++++++++++++++++------------
 ip/iplink.c    |  113 +++++++++++++++++++++++++++++++++++--------------------
 2 files changed, 108 insertions(+), 65 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 48f7b1e..f6db183 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -187,6 +187,36 @@ static void print_linktype(FILE *fp, struct rtattr *tb)
 	}
 }
 
+static void print_vfinfo(FILE *fp, struct rtattr *vfinfo)
+{
+	struct ifla_vf_mac *vf_mac;
+	struct ifla_vf_vlan *vf_vlan;
+	struct ifla_vf_tx_rate *vf_tx_rate;
+	struct rtattr *vf[IFLA_VF_MAX+1];
+	SPRINT_BUF(b1);
+
+	if (vfinfo->rta_type != IFLA_VF_INFO) {
+		fprintf(stderr, "BUG: rta type is %d\n", vfinfo->rta_type);
+		return;
+	}
+
+	parse_rtattr_nested(vf, IFLA_VF_MAX, vfinfo);
+
+	vf_mac = RTA_DATA(vf[IFLA_VF_MAC]);
+	vf_vlan = RTA_DATA(vf[IFLA_VF_VLAN]);
+	vf_tx_rate = RTA_DATA(vf[IFLA_VF_TX_RATE]);
+
+	fprintf(fp, "\n    vf %d MAC %s", vf_mac->vf,
+		ll_addr_n2a((unsigned char *)&vf_mac->mac,
+		ETH_ALEN, 0, b1, sizeof(b1)));
+	if (vf_vlan->vlan)
+		fprintf(fp, ", vlan %d", vf_vlan->vlan);
+	if (vf_vlan->qos)
+		fprintf(fp, ", qos %d", vf_vlan->qos);
+	if (vf_tx_rate->rate)
+		fprintf(fp, ", tx rate %d (Mbps)", vf_tx_rate->rate);
+}
+
 int print_linkinfo(const struct sockaddr_nl *who,
 		   struct nlmsghdr *n, void *arg)
 {
@@ -331,31 +361,13 @@ int print_linkinfo(const struct sockaddr_nl *who,
 				);
 		}
 	}
-	if (do_link && tb[IFLA_VFINFO] && tb[IFLA_NUM_VF]) {
-		SPRINT_BUF(b1);
-		struct rtattr *rta = tb[IFLA_VFINFO];
-		struct ifla_vf_info *ivi;
-		int i;
-		for (i = 0; i < *(int *)RTA_DATA(tb[IFLA_NUM_VF]); i++) {
-			if (rta->rta_type != IFLA_VFINFO) {
-				fprintf(stderr, "BUG: rta type is %d\n", rta->rta_type);
-				break;
-			}
-			ivi = RTA_DATA(rta);
-			fprintf(fp, "\n    vf %d: MAC %s",
-				ivi->vf,
-				ll_addr_n2a((unsigned char *)&ivi->mac,
-					    ETH_ALEN, 0, b1, sizeof(b1)));
-				if (ivi->vlan)
-					fprintf(fp, ", vlan %d", ivi->vlan);
-				if (ivi->qos)
-					fprintf(fp, ", qos %d", ivi->qos);
-				if (ivi->tx_rate)
-					fprintf(fp, ", tx rate %d (Mbps_",
-						ivi->tx_rate);
-			rta = (struct rtattr *)((char *)rta + RTA_ALIGN(rta->rta_len));
-		}
+	if (do_link && tb[IFLA_VFINFO_LIST] && tb[IFLA_NUM_VF]) {
+		struct rtattr *i, *vflist = tb[IFLA_VFINFO_LIST];
+		int rem = RTA_PAYLOAD(vflist);
+		for (i = RTA_DATA(vflist); RTA_OK(i, rem); i = RTA_NEXT(i, rem))
+			print_vfinfo(fp, i);
 	}
+
 	fprintf(fp, "\n");
 	fflush(fp);
 	return 0;
diff --git a/ip/iplink.c b/ip/iplink.c
index 97fca8b..cb2c4f5 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -176,6 +176,73 @@ struct iplink_req {
 	char			buf[1024];
 };
 
+int iplink_parse_vf(int vf, int *argcp, char ***argvp,
+			   struct iplink_req *req)
+{
+	int len, argc = *argcp;
+	char **argv = *argvp;
+	struct rtattr *vfinfo;
+
+	vfinfo = addattr_nest(&req->n, sizeof(*req), IFLA_VF_INFO);
+
+	while (NEXT_ARG_OK()) {
+		NEXT_ARG();
+		if (matches(*argv, "mac") == 0) {
+			struct ifla_vf_mac ivm;
+			NEXT_ARG();
+			ivm.vf = vf;
+			len = ll_addr_a2n((char *)ivm.mac, 32, *argv);
+			if (len < 0)
+				return -1;
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_MAC, &ivm, sizeof(ivm));
+		} else if (matches(*argv, "vlan") == 0) {
+			struct ifla_vf_vlan ivv;
+			NEXT_ARG();
+			if (get_unsigned(&ivv.vlan, *argv, 0)) {
+				invarg("Invalid \"vlan\" value\n", *argv);
+			}
+			ivv.vf = vf;
+			ivv.qos = 0;
+			if (NEXT_ARG_OK()) {
+				NEXT_ARG();
+				if (matches(*argv, "qos") == 0) {
+					NEXT_ARG();
+					if (get_unsigned(&ivv.qos, *argv, 0)) {
+						invarg("Invalid \"qos\" value\n", *argv);
+					}
+				} else {
+					/* rewind arg */
+					PREV_ARG();
+				}
+			}
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_VLAN, &ivv, sizeof(ivv));
+		} else if (matches(*argv, "rate") == 0) {
+			struct ifla_vf_tx_rate ivt;
+			NEXT_ARG();
+			if (get_unsigned(&ivt.rate, *argv, 0)) {
+				invarg("Invalid \"rate\" value\n", *argv);
+			}
+			ivt.vf = vf;
+			addattr_l(&req->n, sizeof(*req), IFLA_VF_TX_RATE, &ivt, sizeof(ivt));
+		
+		} else {
+			/* rewind arg */
+			PREV_ARG();
+			break;
+		}
+	}
+
+	if (argc == *argcp)
+		incomplete_command();
+
+	addattr_nest_end(&req->n, vfinfo);
+
+	*argcp = argc;
+	*argvp = argv;
+	return 0;
+}
+
+
 int iplink_parse(int argc, char **argv, struct iplink_req *req,
 		char **name, char **type, char **link, char **dev)
 {
@@ -283,53 +350,17 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req,
 			} else
 				return on_off("noarp");
 		} else if (strcmp(*argv, "vf") == 0) {
+			struct rtattr *vflist;
 			NEXT_ARG();
 			if (get_integer(&vf,  *argv, 0)) {
 				invarg("Invalid \"vf\" value\n", *argv);
 			}
-		} else if (matches(*argv, "mac") == 0) {
-			struct ifla_vf_mac ivm;
-			NEXT_ARG();
-			if (vf < 0) {
-				missarg("vf");
-			}
-			ivm.vf = vf;
-			len = ll_addr_a2n((char *)ivm.mac, 32, *argv);
+			vflist = addattr_nest(&req->n, sizeof(*req),
+					      IFLA_VFINFO_LIST);
+			len = iplink_parse_vf(vf, &argc, &argv, req);
 			if (len < 0)
 				return -1;
-			addattr_l(&req->n, sizeof(*req), IFLA_VF_MAC, &ivm, sizeof(ivm));
-		} else if (matches(*argv, "vlan") == 0) {
-			struct ifla_vf_vlan ivv;
-			NEXT_ARG();
-			if (vf < 0) {
-				missarg("vf");
-			}
-			if (get_unsigned(&ivv.vlan, *argv, 0)) {
-				invarg("Invalid \"vlan\" value\n", *argv);
-			}
-			ivv.vf = vf;
-			ivv.qos = 0;
-			if (NEXT_ARG_OK()) {
-				NEXT_ARG();
-				if (matches(*argv, "qos") == 0) {
-					NEXT_ARG();
-					if (get_unsigned(&ivv.qos, *argv, 0)) {
-						invarg("Invalid \"qos\" value\n", *argv);
-					}
-				}
-			}
-			addattr_l(&req->n, sizeof(*req), IFLA_VF_VLAN, &ivv, sizeof(ivv));
-		} else if (matches(*argv, "rate") == 0) {
-			struct ifla_vf_tx_rate ivt;
-			NEXT_ARG();
-			if (vf < 0) {
-				missarg("vf");
-			}
-			if (get_unsigned(&ivt.rate, *argv, 0)) {
-				invarg("Invalid \"rate\" value\n", *argv);
-			}
-			ivt.vf = vf;
-			addattr_l(&req->n, sizeof(*req), IFLA_VF_TX_RATE, &ivt, sizeof(ivt));
+			addattr_nest_end(&req->n, vflist);
 #ifdef IFF_DYNAMIC
 		} else if (matches(*argv, "dynamic") == 0) {
 			NEXT_ARG();

^ permalink raw reply related

* [PATCH] tcp: tcp_md5_hash_skb_data() frag_list handling
From: Eric Dumazet @ 2010-05-18  6:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100517.231403.148532542.davem@davemloft.net>

Le lundi 17 mai 2010 à 23:14 -0700, David Miller a écrit :
> Eric, would you please formally submit that tcp md5 fraglist
> fix to net/ipv4/tcp.c with proper signoffs?

Here it is, thanks !


[PATCH] tcp: tcp_md5_hash_skb_data() frag_list handling

tcp_md5_hash_skb_data() should handle skb->frag_list, and eventually
recurse.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6596b4f..49d0d2b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2999,6 +2999,7 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 	const unsigned head_data_len = skb_headlen(skb) > header_len ?
 				       skb_headlen(skb) - header_len : 0;
 	const struct skb_shared_info *shi = skb_shinfo(skb);
+	struct sk_buff *frag_iter;
 
 	sg_init_table(&sg, 1);
 
@@ -3013,6 +3014,10 @@ int tcp_md5_hash_skb_data(struct tcp_md5sig_pool *hp,
 			return 1;
 	}
 
+	skb_walk_frags(skb, frag_iter)
+		if (tcp_md5_hash_skb_data(hp, frag_iter, 0))
+			return 1;
+
 	return 0;
 }
 



^ permalink raw reply related

* Re: [PATCH v3 3/3] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Richard Cochran @ 2010-05-18  6:36 UTC (permalink / raw)
  To: Scott Wood
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ
In-Reply-To: <4BF18582.6000500-KZfg59tc24xl57MIdRCFDg@public.gmane.org>

On Mon, May 17, 2010 at 01:05:54PM -0500, Scott Wood wrote:
> >>> >+  - tmr_fiper1   Fixed interval period pulse generator.
> >>> >+  - tmr_fiper2   Fixed interval period pulse generator.
> >>
> 
> MPC8572 and P2020 have fiper3 as well.

I doubt they really have a third fiper.

First of all, this signal is not routed anywhere on the boards. Also,
according to the documentation, it has no bit in the TMR_CTRL or the
TMR_TEMASK registers. Unless there is a bit in TMR_TEMASK, you cannot
get an interrupt from it.

If you cannot use the signal externally (in the "real" world) and you
cannot get an interrupt, what good is it to have such a periodic
signal? Polling the bit in the TMR_TEVENT to see when a pulse occurs
seems pointless.

Scott, you have connections, right? Can you clarify this for me?

Thanks,

Richard

^ permalink raw reply

* Re: tcp md5 fraglist...
From: David Miller @ 2010-05-18  6:35 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1274164224.2567.35.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 18 May 2010 08:30:24 +0200

> Le lundi 17 mai 2010 à 23:14 -0700, David Miller a écrit :
>> Eric, would you please formally submit that tcp md5 fraglist
>> fix to net/ipv4/tcp.c with proper signoffs?
>> 
>> Thanks!
> 
> Sure, will do, but I doubt it is a real problem, because GRO or RSC
> should not coalesce two segments with MD5 signature (they are guaranteed
> to differ for each segment ?)

Right, and I just checked that GRO does in fact check if the TCP
options are different in any way.

^ permalink raw reply

* Re: tcp md5 fraglist...
From: Eric Dumazet @ 2010-05-18  6:30 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20100517.231403.148532542.davem@davemloft.net>

Le lundi 17 mai 2010 à 23:14 -0700, David Miller a écrit :
> Eric, would you please formally submit that tcp md5 fraglist
> fix to net/ipv4/tcp.c with proper signoffs?
> 
> Thanks!

Sure, will do, but I doubt it is a real problem, because GRO or RSC
should not coalesce two segments with MD5 signature (they are guaranteed
to differ for each segment ?)




^ permalink raw reply

* tcp md5 fraglist...
From: David Miller @ 2010-05-18  6:14 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev


Eric, would you please formally submit that tcp md5 fraglist
fix to net/ipv4/tcp.c with proper signoffs?

Thanks!

^ permalink raw reply

* Re: [PATCH] ARCNET: Limit com20020 PCI ID matches for SOHARD cards
From: David Miller @ 2010-05-18  6:12 UTC (permalink / raw)
  To: aeb; +Cc: netdev, linux-kernel, stable
In-Reply-To: <20100512004852.GA25845@amos.infernal>

From: Andreas Bombe <aeb@debian.org>
Date: Wed, 12 May 2010 02:49:06 +0200

> The SH SOHARD ARCNET cards are implemented using generic PLX Technology
> PCI<->IOBus bridges. Subvendor and subdevice IDs were not specified,
> causing the driver to attach to any such bridge and likely crash the
> system by attempting to initialize an unrelated device.
> 
> Fix by specifying subvendor and subdevice according to the values found
> in the PCI-ID Repository at http://pci-ids.ucw.cz/ .
> 
> Cc: stable@kernel.org
> Signed-off-by: Andreas Bombe <aeb@debian.org>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: Remove unnecessary returns from void function()s
From: David Miller @ 2010-05-18  6:08 UTC (permalink / raw)
  To: joe; +Cc: netdev
In-Reply-To: <1273599941.20514.235.camel@Joe-Laptop.home>

From: Joe Perches <joe@perches.com>
Date: Tue, 11 May 2010 10:45:41 -0700

> This patch removes from net/ (but not any netfilter files)
> all the unnecessary return; statements that precede the
> last closing brace of void functions.
> 
> It does not remove the returns that are immediately
> preceded by a label as gcc doesn't like that.
> 
> Done via:
> $ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
>   xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'
> 
> Signed-off-by: Joe Perches <joe@perches.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH] net sched: cleanup and rate limit warning
From: David Miller @ 2010-05-18  6:06 UTC (permalink / raw)
  To: hadi; +Cc: shemminger, kaber, netdev
In-Reply-To: <1273768808.6117.5.camel@bigi>

From: jamal <hadi@cyberus.ca>
Date: Thu, 13 May 2010 12:40:08 -0400

> On Thu, 2010-05-13 at 09:26 -0700, Stephen Hemminger wrote:
> 
>> > 
>> 
>> And the kernel message needs to be fixed to prevent total
>> log overload for the next poor sop who makes the same mistake.
>> 
>> Please accept the patch.
> 
> I have no problem with the patch. Dont think it needs to go to stable
> but i'll let Dave call it.
> 
> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>

Applied, I don't think it's -stable material personally.  If we start
queueing stuff like this, it will be 200 instead of 100 patches in
each -stable release :-)

^ permalink raw reply

* (unknown), 
From: Jonas Bonn @ 2010-05-18  5:39 UTC (permalink / raw)
  To: netdev

Two patches for the ethoc driver and one to the Micrel PHY driver.

I'd like feedback on the patch 1/3 (write bus addresses to registers) as I'm 
not totally confident that this won't break for someone else.  The MAC registers
should definitely be getting bus/physical addresses and ->membase is thus
the wrong value to be using (it's virtual); however, since I presume that 
this driver was working for somebody else before (???), then I'd like to know
that whatever use case they have isn't broken by this -- what configuration
would have ->membase == ->mem_start anyway?  NO_MMU?

The other two patches are pretty trivial.

Thanks,
Jonas

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox