Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] bonding: check if clients MAC addr has changed
From: Brian Haley @ 2010-06-29 15:39 UTC (permalink / raw)
  To: Flavio Leitner; +Cc: bonding-devel, Jay Vosburgh, netdev, Andy Gospodarek
In-Reply-To: <1277822481-25175-1-git-send-email-fleitner@redhat.com>

On 06/29/2010 10:41 AM, Flavio Leitner wrote:
> diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
> index 40fdc41..67154bb 100644
> --- a/drivers/net/bonding/bond_alb.c
> +++ b/drivers/net/bonding/bond_alb.c
> @@ -340,7 +340,8 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)
>  
>  	if ((client_info->assigned) &&
>  	    (client_info->ip_src == arp->ip_dst) &&
> -	    (client_info->ip_dst == arp->ip_src)) {
> +	    (client_info->ip_dst == arp->ip_src) &&
> +	    (memcmp(client_info->mac_dst, arp->mac_src, ETH_ALEN))) {
>  		/* update the clients MAC address */
>  		memcpy(client_info->mac_dst, arp->mac_src, ETH_ALEN);
>  		client_info->ntt = 1;

compare_ether_addr_64bits() ?

-Brian

^ permalink raw reply

* RFC:  Allow 'ip' to run in daemon mode?
From: Ben Greear @ 2010-06-29 15:34 UTC (permalink / raw)
  To: NetDev; +Cc: Stephen Hemminger

I'm considering modifying 'ip' to be able to run in daemon
mode so that I can do lots of IP commands without having to
pay the startup cost of iproute.

The -batch option almost works, but it's hard to programatically
figure out failure codes.

I'm thinking about making these changes:

1)  Move all of the error printing code into common methods (basically,
    wrap printf).  In daemon mode this text can be sent back to the
    calling process, and in normal mode, it will be printed to stdout/stderr
    as it is currently.

2)  Remove all or most calls to 'exit' and instead return error codes
    to the calling logic.

3)  Add ability to listen on a unix socket for commands, basically treat
    them just like batch commands, one command per packet.

4)  Return well formatted error code and text response to calling process
    over the unix socket, maybe something like:

RV: [errno or equiv, zero for success]\n
CMD: [ command string this relates to ]\n
[ Optional free form text ]

Does something like this have any chance of upstream inclusion?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* [PATCH -next] qlge: fix a eeh handler to not add a pending timer
From: leitao @ 2010-06-29 15:24 UTC (permalink / raw)
  To: ron.mercer; +Cc: netdev, Breno Leitao

On some ocasions the function qlge_io_resume() tries to add a
pending timer, which causes the system to hit the BUG() on
add_timer() function.

This patch only add a new timer if the timer is not pending, if so,
it just reload it.

Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com>
---
 drivers/net/qlge/qlge_main.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/qlge/qlge_main.c b/drivers/net/qlge/qlge_main.c
index fa4b24c..7b10521 100644
--- a/drivers/net/qlge/qlge_main.c
+++ b/drivers/net/qlge/qlge_main.c
@@ -4808,8 +4808,14 @@ static void qlge_io_resume(struct pci_dev *pdev)
 		netif_err(qdev, ifup, qdev->ndev,
 			  "Device was not running prior to EEH.\n");
 	}
-	qdev->timer.expires = jiffies + (5*HZ);
-	add_timer(&qdev->timer);
+
+	if (timer_pending(&qdev->timer))
+		mod_timer(&qdev->timer, jiffies + (5*HZ));
+	else{
+		qdev->timer.expires = jiffies + (5*HZ);
+		add_timer(&qdev->timer);
+	}
+
 	netif_device_attach(ndev);
 }
 
-- 
1.6.5.2


^ permalink raw reply related

* [PATCH v3] fragment: add fast path for in-order fragments
From: Changli Gao @ 2010-06-29 14:39 UTC (permalink / raw)
  To: David Miller
  Cc: Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Eric Dumazet, netdev,
	Mitchell Erblich, Changli Gao

add fast path for in-order fragments

As the fragments are sent in order in most of OSes, such as Windows, Darwin and
FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
In the fast path, we check if the skb at the end of the inet_frag_queue is the
prev we expect.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 include/net/inet_frag.h |    1 +
 net/ipv4/ip_fragment.c  |   12 ++++++++++++
 net/ipv6/reassembly.c   |   11 +++++++++++
 3 files changed, 24 insertions(+)
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 39f2dc9..16ff29a 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -20,6 +20,7 @@ struct inet_frag_queue {
 	atomic_t		refcnt;
 	struct timer_list	timer;      /* when will this queue expire? */
 	struct sk_buff		*fragments; /* list of received fragments */
+	struct sk_buff		*fragments_tail;
 	ktime_t			stamp;
 	int			len;        /* total length of orig datagram */
 	int			meat;
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 858d346..dbe8999 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -314,6 +314,7 @@ static int ip_frag_reinit(struct ipq *qp)
 	qp->q.len = 0;
 	qp->q.meat = 0;
 	qp->q.fragments = NULL;
+	qp->q.fragments_tail = NULL;
 	qp->iif = 0;
 
 	return 0;
@@ -386,6 +387,11 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 	 * in the chain of fragments so far.  We must know where to put
 	 * this fragment, right?
 	 */
+	prev = qp->q.fragments_tail;
+	if (!prev || FRAG_CB(prev)->offset < offset) {
+		next = NULL;
+		goto found;
+	}
 	prev = NULL;
 	for (next = qp->q.fragments; next != NULL; next = next->next) {
 		if (FRAG_CB(next)->offset >= offset)
@@ -393,6 +399,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 		prev = next;
 	}
 
+found:
 	/* We found where to put this one.  Check for overlap with
 	 * preceding fragment, and, if needed, align things so that
 	 * any overlaps are eliminated.
@@ -451,6 +458,8 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
 
 	/* Insert this fragment in the chain of fragments. */
 	skb->next = next;
+	if (!next)
+		qp->q.fragments_tail = skb;
 	if (prev)
 		prev->next = skb;
 	else
@@ -504,6 +513,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 			goto out_nomem;
 
 		fp->next = head->next;
+		if (!fp->next)
+			qp->q.fragments_tail = fp;
 		prev->next = fp;
 
 		skb_morph(head, qp->q.fragments);
@@ -574,6 +585,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
 	iph->tot_len = htons(len);
 	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
 	qp->q.fragments = NULL;
+	qp->q.fragments_tail = NULL;
 	return 0;
 
 out_nomem:
diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 0b97230..b832f7b 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -333,6 +333,11 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 	 * in the chain of fragments so far.  We must know where to put
 	 * this fragment, right?
 	 */
+	prev = fq->q.fragments_tail;
+	if (!prev || FRAG6_CB(prev)->offset < offset) {
+		next = NULL;
+		goto found;
+	}
 	prev = NULL;
 	for(next = fq->q.fragments; next != NULL; next = next->next) {
 		if (FRAG6_CB(next)->offset >= offset)
@@ -340,6 +345,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 		prev = next;
 	}
 
+found:
 	/* We found where to put this one.  Check for overlap with
 	 * preceding fragment, and, if needed, align things so that
 	 * any overlaps are eliminated.
@@ -397,6 +403,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
 
 	/* Insert this fragment in the chain of fragments. */
 	skb->next = next;
+	if (!next)
+		fq->q.fragments_tail = skb;
 	if (prev)
 		prev->next = skb;
 	else
@@ -463,6 +471,8 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 			goto out_oom;
 
 		fp->next = head->next;
+		if (!fp->next)
+			fq->q.fragments_tail = fp;
 		prev->next = fp;
 
 		skb_morph(head, fq->q.fragments);
@@ -549,6 +559,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
 	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
 	rcu_read_unlock();
 	fq->q.fragments = NULL;
+	fq->q.fragments_tail = NULL;
 	return 1;
 
 out_oversize:

^ permalink raw reply related

* Re: [iproute2] iproute2: Allow 'ip addr flush' to loop more than 10 times.
From: Ben Greear @ 2010-06-29 15:10 UTC (permalink / raw)
  To: David Miller; +Cc: greearb, netdev
In-Reply-To: <20100628.233600.242129599.davem@davemloft.net>

On 06/28/2010 11:36 PM, David Miller wrote:
> From: Ben Greear<greearb@candelatech.com>
> Date: Mon, 28 Jun 2010 23:27:39 -0700
>
>> I'm not sure I understand how this loop could have run forever
>> anyway, unless some other process(es) was constantly adding
>> addresses at the same time?  Or maybe some ipv6 auto config thing?
>>
>> It appears there is already code to detect when the loop
>> is done (flushing ~70 IPv4 addresses with -l 0 was one of my
>> test cases, and worked as expected).
>
> What happens is that we are simply limited by how many addresses
> we can delete in one go, and that limit is 4096 bytes of netlink
> message size.
>
> So we have to iterate, reusing that buffer each time, to get them all
> done.
>
> The limit exists because meanwhile it is possible that some other
> entity could add addresses and thus cause us to loop forever and
> never actually delete all of the addresses because every time we
> delete a bunch the other entity adds more.
>
> I can understand the reasoning behind the limit, because if this is
> run by something automated it's not like someone is at the command
> line and hit Ctrl-C to break out of a looping instance.
>
> But practically speaking I bet this never happens.
>
> So what makes sense to me is:
>
> 1) Loop forever by default.
>
> 2) When the number of loops exceeds a threshold (calculated by the
>     number of addresses we see the first dump, divided by the number
>     of deletes we can squeeze into the 4096 byte message), we emit
>     a warning.
>
> 3) A hard limit, off by default, it available via your "-l" new option.
>
> But seriously we can determine forward progress quite easily I think.
>
> Each loop, we see if the dump returns a smaller number of addresses
> than the last iteration.  If so, we just keep going.
>
> If the number of addresses increases, I think we can bail in this
> case.
>
> This logic would only ever trigger iff another entity is adding a
> large number of addresses simultaneously with our flush.  And frankly
> speaking the person doing the flush probably doesn't expect that to be
> happening.  You're flushing all of the addresses so you can start with
> a clean slate and then add specific addresses back, or whatever.

If I understand your proposal properly, this would seem to be
somewhat O(N^2) if we have large numbers of addresses, and I'm
hoping to support thousands of IPs with decent performance.

What do you think about improving the kernel side so that we can send
a single netlink msg to delete all addresses on an interface, and just
let the kernel do the looping/locking needed to make it happen?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH net-next-2.6] snmp: 64bit ipstats_mib for all arches
From: YOSHIFUJI Hideaki @ 2010-06-29 15:07 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev, YOSHIFUJI Hideaki
In-Reply-To: <1277819286.3531.555.camel@edumazet-laptop>

Hello.

Thank you for doing this work!

Eric Dumazet wrote:
> /proc/net/snmp and /proc/net/netstat expose SNMP counters.
> 
> Width of these counters is either 32 or 64 bits, depending on the size
> of "unsigned long" in kernel.
> 
> This means user program parsing these files must already be prepared to
> deal with 64bit values, regardless of user program being 32 or 64 bit.

Well, I'm rather not in favor of breaking user-space apps.
How about leaving legacy procfs as-is and fix netlink-side only?

--yoshfuji

^ permalink raw reply

* Re: [PATCH -next] qlcnic: fail when try to setup unsupported features
From: Ben Hutchings @ 2010-06-29 15:05 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Amit Salecha, netdev@vger.kernel.org, Amerigo Wang,
	Anirban Chakraborty
In-Reply-To: <20100629170025.7a130e28@dhcp-lab-109.englab.brq.redhat.com>

On Tue, 2010-06-29 at 17:00 +0200, Stanislaw Gruszka wrote:
> On Tue, 29 Jun 2010 15:41:24 +0100
> Ben Hutchings <bhutchings@solarflare.com> wrote:
> 
> > On Mon, 2010-06-28 at 15:18 +0100, Ben Hutchings wrote:
> > > On Mon, 2010-06-28 at 16:14 +0200, Stanislaw Gruszka wrote:
> > > [...]
> > > > My plan is something like that:
> > > > 
> > > > static const struct ethtool_ops my_ethtool_ops = {
> > > >         .get_flags              = ethtool_op_get_flags,
> > > >         .set_flags              = ethtool_op_set_flags,
> > > > 	.supported_flags	= ETH_FLAG_LRO
> > > > }
> > > > 
> > > > Plus op->supported_flags check in ethtool_op_set_flags. That will allow
> > > > to define flags per driver. There is also possible to add supported_flags
> > > > to netdev, but I would like to avoid that - in such case drivers can use
> > > > custom .set_flags function.
> > > 
> > > Sounds good to me.
> > 
> > On second thoughts, this is not going work - supported_flags may need to
> > be different for different chips handled by the same driver.  
> 
> I thought about driver custom ethtool_ops::set_flags in that case.
> 
> > In fact,
> > this is already the case in sfc.  So I think you should do what I
> > suggested previously - add a supported_flags parameter to
> > ethtool_op_set_flags.
> 
> What about call from net/core/ethtool.c ? 

ethtool_op_set_flags() doesn't provide a correct default behaviour since
it ignores unknown flags.  So it cannot be used directly as an
implementation of ethtool_ops::set_flags even now.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [iproute2] iproute2:  Allow 'ip addr flush' to loop more than 10 times.
From: Ben Greear @ 2010-06-29 15:04 UTC (permalink / raw)
  To: Alexander Clouter; +Cc: netdev
In-Reply-To: <50mof7-b9p.ln1@chipmunk.wormnet.eu>

On 06/29/2010 01:03 AM, Alexander Clouter wrote:
> greearb@gmail.com wrote:
>>
>> The default remains at 10 for backwards compatibility.
>>
>> For instance:
>> # ip addr flush dev eth2
>> *** Flush remains incomplete after 10 rounds. ***
>> # ip -l 20 addr flush dev eth2
>> *** Flush remains incomplete after 20 rounds. ***
>> # ip -loops 0 addr flush dev eth2
>> #
>>
>> This is useful for getting rid of large numbers of IP
>> addresses in scripts.
>>
> Maybe I am missing a trick, but what is wrong with putting this trivial
> logic into the script:
>
> ip addr show ${DEV} | awk '/inet6? / { print $2 }' | xargs -I{} ip addr del '{}' dev ${DEV}

This isn't going to be fast if you have thousands of addresses.

> You can probably speed things up with '-P' too, '-P 2' gives me a huge
> huge speed up for the work I do with 'ip route'.

Where are you using the -P at?  It's not a supported option of 'ip'
as far as I can tell.

> Why the need to cram more functionality and options into iproute when
> it is something that can be pushed into the wrapper script?

Speed and ease of use.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH -next] qlcnic: fail when try to setup unsupported features
From: Stanislaw Gruszka @ 2010-06-29 15:00 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Amit Salecha, netdev@vger.kernel.org, Amerigo Wang,
	Anirban Chakraborty
In-Reply-To: <1277822484.2112.19.camel@achroite.uk.solarflarecom.com>

On Tue, 29 Jun 2010 15:41:24 +0100
Ben Hutchings <bhutchings@solarflare.com> wrote:

> On Mon, 2010-06-28 at 15:18 +0100, Ben Hutchings wrote:
> > On Mon, 2010-06-28 at 16:14 +0200, Stanislaw Gruszka wrote:
> > [...]
> > > My plan is something like that:
> > > 
> > > static const struct ethtool_ops my_ethtool_ops = {
> > >         .get_flags              = ethtool_op_get_flags,
> > >         .set_flags              = ethtool_op_set_flags,
> > > 	.supported_flags	= ETH_FLAG_LRO
> > > }
> > > 
> > > Plus op->supported_flags check in ethtool_op_set_flags. That will allow
> > > to define flags per driver. There is also possible to add supported_flags
> > > to netdev, but I would like to avoid that - in such case drivers can use
> > > custom .set_flags function.
> > 
> > Sounds good to me.
> 
> On second thoughts, this is not going work - supported_flags may need to
> be different for different chips handled by the same driver.  

I thought about driver custom ethtool_ops::set_flags in that case.

> In fact,
> this is already the case in sfc.  So I think you should do what I
> suggested previously - add a supported_flags parameter to
> ethtool_op_set_flags.

What about call from net/core/ethtool.c ? 

Stanislaw

^ permalink raw reply

* Re: [iproute2] iproute2: Allow 'ip addr flush' to loop more than 10 times.
From: Ben Greear @ 2010-06-29 14:59 UTC (permalink / raw)
  To: Andreas Henriksson; +Cc: David Miller, netdev, shemminger
In-Reply-To: <20100629095830.GA6775@amd64.fatal.se>

On 06/29/2010 02:58 AM, Andreas Henriksson wrote:
> Hello all!
>
> I'm sorry if I forgot to CC someone in this reply. I'm not subscribed
> and all list archives seems to be very scared of showing recipiets
> these days.
>
> [...]
>>
>> I can understand the reasoning behind the limit, because if this is
>> run by something automated it's not like someone is at the command
>> line and hit Ctrl-C to break out of a looping instance.
>>
>> But practically speaking I bet this never happens.
>
> I'm sorry to bring bad news, but your bet is wrong!
>
> There are atleast two different places (IIRC route flush, addr flush)
> in iproute2 which have these limits because they've been preventing
> people from booting their systems in the past! I know atleast
> ubuntu users has been having problems booting their computers because
> firewalling scripts executed by init use iproute2 commands and
> expect them to finish.

First, my patch doesn't change the default behaviour, so whatever
bugs the old hack was working around, the new hack should work
around as well.

Second, these hacks were put in 4+ years ago..maybe the
underlying issues have been resolved since then? If not,
it would be good to figure out the root cause and fix it,
because having 'ip addr flush' not actually flush all the
addresses isn't much better than having it spin forever.

>> If the number of addresses increases, I think we can bail in this
>> case.
>>
>> This logic would only ever trigger iff another entity is adding a
>> large number of addresses simultaneously with our flush.  And frankly
>> speaking the person doing the flush probably doesn't expect that to be
>> happening.  You're flushing all of the addresses so you can start with
>> a clean slate and then add specific addresses back, or whatever.
>>
>
> How about implementing it in the kernel so iproute2 can tell the kernel
> via netlink "flush<addresses|routes>  on interface X" with a single
> netlink message?
> I guess the kernel side has some kind of lock here that will prevent
> addresses being added and removed at the same time?

I like this idea.  It should help with performance issues with
deleting very large numbers of addresses as well.  (I'm hoping
for 5k+ virtual IPs per interface.)

Unfortunately, I generally make a mess when trying to write such
kernel patches, but I will see if I can find someone who wants to
give it a try.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

^ permalink raw reply

* Re: [PATCH -next] sfc: set/clear NETIF_F_RXHASH bit directly
From: Ben Hutchings @ 2010-06-29 14:43 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: netdev, Amerigo Wang
In-Reply-To: <20100629163520.642590bf@dhcp-lab-109.englab.brq.redhat.com>

On Tue, 2010-06-29 at 16:35 +0200, Stanislaw Gruszka wrote:
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[...]

I don't think this is a positive change.

Please change ethtool_op_set_flags; then in efx_ethtool_set_flags() you
can do:

-	if (data & ~supported)
- 		return -EOPNOTSUPP;
- 
-	return ethtool_op_set_flags(net_dev, data);
+	return ethtool_op_set_flags(net_dev, data, supported);

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] bonding: check if clients MAC addr has changed
From: Flavio Leitner @ 2010-06-29 14:41 UTC (permalink / raw)
  To: bonding-devel, Jay Vosburgh, netdev, Andy Gospodarek; +Cc: Flavio Leitner

When two systems using bonding devices in adaptive load
balancing (ALB) communicates with each other, an endless
ping-pong of ARP replies starts between these two systems.

What happens? In the ALB mode, bonding driver keeps track
of each client connected in a hash table, so it can do the
receive load balancing (RLB). This hash table is updated
when an ARP reply is received, then it scans for the client
entry, updates its MAC address and flag it to be announced
later. Therefore, two seconds later, the alb monitor runs
and send for each updated client entry two ARP replies
updating this specific client. The same process happens on
the receiving system, causing the endless ping-pong of arp
replies.

See more information including the relevant functions below:

   System 1                          System 2
    bond0                             bond0

   ping <system2>
    ARP request  --------->
                           <--------- ARP reply

+->rlb_arp_recv  <---------------------+   <--- loop begins
|  rlb_update_entry_from_arp           |
|  client_info->ntt = 1;               |
|  bond_info->rx_ntt = 1;              |
|                                      |
|         <communication succeed>      |
|                                      |
|  bond_alb_monitor                    |
|  rlb_update_rx_clients               |
|  rlb_update_client                   |
|  arp_create(ARPOP_REPLY)             |
|   send ARP reply -------------->     V
|   send ARP reply -------------->
|                               rlb_arp_recv
|                               rlb_update_entry_from_arp
|                               client_info->ntt = 1;
|                               bond_info->rx_ntt = 1;
|                           < snipped, same as in system 1>
+-------           <-------------- send ARP reply
                   <-------------- send ARP reply

Besides the unneeded networking traffic, this loop breaks
a cluster because a backup system can't take over the IP
address. There is always one system sending an ARP reply
poisoning the network.

This patch fixes the problem adding a check for the MAC
address before updating it. Thus, if the MAC address didn't
change, there is no need to update neither to announce it later.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
 drivers/net/bonding/bond_alb.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 40fdc41..67154bb 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -340,7 +340,8 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)

 	if ((client_info->assigned) &&
 	    (client_info->ip_src == arp->ip_dst) &&
-	    (client_info->ip_dst == arp->ip_src)) {
+	    (client_info->ip_dst == arp->ip_src) &&
+	    (memcmp(client_info->mac_dst, arp->mac_src, ETH_ALEN))) {
 		/* update the clients MAC address */
 		memcpy(client_info->mac_dst, arp->mac_src, ETH_ALEN);
 		client_info->ntt = 1;
-- 
1.7.0.1

^ permalink raw reply related

* Re: [PATCH -next] qlcnic: fail when try to setup unsupported features
From: Ben Hutchings @ 2010-06-29 14:41 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Amit Salecha, netdev@vger.kernel.org, Amerigo Wang,
	Anirban Chakraborty
In-Reply-To: <1277734724.2089.10.camel@achroite.uk.solarflarecom.com>

On Mon, 2010-06-28 at 15:18 +0100, Ben Hutchings wrote:
> On Mon, 2010-06-28 at 16:14 +0200, Stanislaw Gruszka wrote:
> [...]
> > My plan is something like that:
> > 
> > static const struct ethtool_ops my_ethtool_ops = {
> >         .get_flags              = ethtool_op_get_flags,
> >         .set_flags              = ethtool_op_set_flags,
> > 	.supported_flags	= ETH_FLAG_LRO
> > }
> > 
> > Plus op->supported_flags check in ethtool_op_set_flags. That will allow
> > to define flags per driver. There is also possible to add supported_flags
> > to netdev, but I would like to avoid that - in such case drivers can use
> > custom .set_flags function.
> 
> Sounds good to me.

On second thoughts, this is not going work - supported_flags may need to
be different for different chips handled by the same driver.  In fact,
this is already the case in sfc.  So I think you should do what I
suggested previously - add a supported_flags parameter to
ethtool_op_set_flags.

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH -next] ixgbe: use NETIF_F_LRO
From: Stanislaw Gruszka @ 2010-06-29 14:38 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Kirsher

Both ETH_FLAG_LRO and NETIF_F_LRO have the same value, but NETIF_F_LRO
is intended to use with netdev->features.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/ixgbe/ixgbe_ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 873b45e..e2ab4ae 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -2227,7 +2227,7 @@ static int ixgbe_set_flags(struct net_device *netdev, u32 data)
 				break;
 			}
 		} else if (!adapter->rx_itr_setting) {
-			netdev->features &= ~ETH_FLAG_LRO;
+			netdev->features &= ~NETIF_F_LRO;
 			if (data & ETH_FLAG_LRO)
 				e_info("rx-usecs set to 0, "
 					"LRO/RSC cannot be enabled.\n");
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH -next] myri10ge: clear NETIF_F_LRO bit directly
From: Stanislaw Gruszka @ 2010-06-29 14:37 UTC (permalink / raw)
  To: netdev; +Cc: Amerigo Wang, Andrew Gallatin, Brice Goglin

Do not use ethtool_op_set_flags() to clear one bit in ->features.
Inform user about disabling LRO.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/myri10ge/myri10ge.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index e0b47cc..2259168 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1725,17 +1725,15 @@ static u32 myri10ge_get_rx_csum(struct net_device *netdev)
 static int myri10ge_set_rx_csum(struct net_device *netdev, u32 csum_enabled)
 {
 	struct myri10ge_priv *mgp = netdev_priv(netdev);
-	int err = 0;
 
 	if (csum_enabled)
 		mgp->csum_flag = MXGEFW_FLAGS_CKSUM;
 	else {
-		u32 flags = ethtool_op_get_flags(netdev);
-		err = ethtool_op_set_flags(netdev, (flags & ~ETH_FLAG_LRO));
 		mgp->csum_flag = 0;
-
+		netdev->features &= ~NETIF_F_LRO;
+		netdev_info(netdev, "RX checksumming set off, disabling LRO\n");
 	}
-	return err;
+	return 0;
 }
 
 static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH -next] sfc: set/clear NETIF_F_RXHASH bit directly
From: Stanislaw Gruszka @ 2010-06-29 14:35 UTC (permalink / raw)
  To: netdev; +Cc: Amerigo Wang, Ben Hutchings

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/sfc/ethtool.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index 7693cfb..fd55123 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -554,7 +554,12 @@ static int efx_ethtool_set_flags(struct net_device *net_dev, u32 data)
 	if (data & ~supported)
 		return -EOPNOTSUPP;
 
-	return ethtool_op_set_flags(net_dev, data);
+	if (data & ETH_FLAG_RXHASH)
+		net_dev->features |= NETIF_F_RXHASH;
+	else
+		net_dev->features &= ~NETIF_F_RXHASH;
+
+	return 0;
 }
 
 static void efx_ethtool_self_test(struct net_device *net_dev,
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH -next] enic: fail when try to setup unsupported features
From: Stanislaw Gruszka @ 2010-06-29 14:33 UTC (permalink / raw)
  To: netdev; +Cc: Amerigo Wang, Scott Feldman

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/enic/enic_main.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 6c6795b..77a7f87 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -365,7 +365,6 @@ static const struct ethtool_ops enic_ethtool_ops = {
 	.get_coalesce = enic_get_coalesce,
 	.set_coalesce = enic_set_coalesce,
 	.get_flags = ethtool_op_get_flags,
-	.set_flags = ethtool_op_set_flags,
 };
 
 static void enic_free_wq_buf(struct vnic_wq *wq, struct vnic_wq_buf *buf)
-- 
1.5.5.6


^ permalink raw reply related

* Re: [PATCH v2] fragment: add fast path
From: YOSHIFUJI Hideaki @ 2010-06-29 14:29 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Patrick McHardy, netdev, Mitchell Erblich,
	YOSHIFUJI Hideaki
In-Reply-To: <AANLkTimZ9Q-i9eTOfE3dae9Fru0dV2p2cEQqO3l8XBe9@mail.gmail.com>

Changli Gao wrote:
> On Tue, Jun 29, 2010 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Le vendredi 25 juin 2010 à 10:54 +0800, Changli Gao a écrit :
>>> add fast path
>>>
>>> As the fragments are sent in order in most of OSes, such as Windows, Darwin and
>>> FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
>>> In the fast path, we check if the skb at the end of the inet_frag_queue is the
>>> prev we expect.
>>>
>>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> This patch is fine, but they are two indentation glitches.
>>
> 
> Oh, Thanks. I'll fix them.

And, I think it is better not to just say it as "fast path"
because it does not sufficient.  Probably "fast path for
in-order fragments" or something like that.

Regards,

--yoshfuji

^ permalink raw reply

* Re: [PATCH v2] fragment: add fast path
From: Changli Gao @ 2010-06-29 14:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Mitchell Erblich
In-Reply-To: <1277819660.3531.568.camel@edumazet-laptop>

On Tue, Jun 29, 2010 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Le vendredi 25 juin 2010 à 10:54 +0800, Changli Gao a écrit :
>> add fast path
>>
>> As the fragments are sent in order in most of OSes, such as Windows, Darwin and
>> FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
>> In the fast path, we check if the skb at the end of the inet_frag_queue is the
>> prev we expect.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>
> This patch is fine, but they are two indentation glitches.
>

Oh, Thanks. I'll fix them.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Why is destructor_arg in shinfo?
From: Jeremy Fitzhardinge @ 2010-06-29 14:09 UTC (permalink / raw)
  To: Johann Baudy; +Cc: NetDev, Ian Campbell, David Miller

Hi,

I'm wondering why "net: TX_RING and packet mmap" (69e3c75) ended up
putting the skb destructor's arg in the skb's shinfo.  It seems like an
odd mismatch, since the skb and the shinfo have different lifetimes. 
And in principle you might have two skbs with different destructors
sharing a shinfo, and therefore conflict over the use of destructor_arg.

What's the rationale?

Would it make sense to have a shinfo destructor as well?

Thanks,
    J

^ permalink raw reply

* Re: [PATCH v2] fragment: add fast path
From: Eric Dumazet @ 2010-06-29 13:54 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Mitchell Erblich
In-Reply-To: <1277434472-2845-1-git-send-email-xiaosuo@gmail.com>

Le vendredi 25 juin 2010 à 10:54 +0800, Changli Gao a écrit :
> add fast path
> 
> As the fragments are sent in order in most of OSes, such as Windows, Darwin and
> FreeBSD, it is likely the new fragments are at the end of the inet_frag_queue.
> In the fast path, we check if the skb at the end of the inet_frag_queue is the
> prev we expect.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

This patch is fine, but they are two indentation glitches.

> ----
>  include/net/inet_frag.h |    1 +
>  net/ipv4/ip_fragment.c  |   12 ++++++++++++
>  net/ipv6/reassembly.c   |   11 +++++++++++
>  3 files changed, 24 insertions(+)
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 39f2dc9..16ff29a 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -20,6 +20,7 @@ struct inet_frag_queue {
>  	atomic_t		refcnt;
>  	struct timer_list	timer;      /* when will this queue expire? */
>  	struct sk_buff		*fragments; /* list of received fragments */
> +	struct sk_buff		*fragments_tail;
>  	ktime_t			stamp;
>  	int			len;        /* total length of orig datagram */
>  	int			meat;
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 858d346..dbe8999 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -314,6 +314,7 @@ static int ip_frag_reinit(struct ipq *qp)
>  	qp->q.len = 0;
>  	qp->q.meat = 0;
>  	qp->q.fragments = NULL;
> +	qp->q.fragments_tail = NULL;
>  	qp->iif = 0;
>  
>  	return 0;
> @@ -386,6 +387,11 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  	 * in the chain of fragments so far.  We must know where to put
>  	 * this fragment, right?
>  	 */
> +	prev = qp->q.fragments_tail;
> +	if (!prev || FRAG_CB(prev)->offset < offset) {

strange indentation : one tab in excess

> +			next = NULL;
> +			goto found;
> +	}
>  	prev = NULL;
>  	for (next = qp->q.fragments; next != NULL; next = next->next) {
>  		if (FRAG_CB(next)->offset >= offset)
> @@ -393,6 +399,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  		prev = next;
>  	}
>  
> +found:
>  	/* We found where to put this one.  Check for overlap with
>  	 * preceding fragment, and, if needed, align things so that
>  	 * any overlaps are eliminated.
> @@ -451,6 +458,8 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  
>  	/* Insert this fragment in the chain of fragments. */
>  	skb->next = next;
> +	if (!next)
> +		qp->q.fragments_tail = skb;
>  	if (prev)
>  		prev->next = skb;
>  	else
> @@ -504,6 +513,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
>  			goto out_nomem;
>  
>  		fp->next = head->next;
> +		if (!fp->next)
> +			qp->q.fragments_tail = fp;
>  		prev->next = fp;
>  
>  		skb_morph(head, qp->q.fragments);
> @@ -574,6 +585,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
>  	iph->tot_len = htons(len);
>  	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
>  	qp->q.fragments = NULL;
> +	qp->q.fragments_tail = NULL;
>  	return 0;
>  
>  out_nomem:
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index 0b97230..b832f7b 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -333,6 +333,11 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  	 * in the chain of fragments so far.  We must know where to put
>  	 * this fragment, right?
>  	 */
> +	prev = fq->q.fragments_tail;
> +	if (!prev || FRAG6_CB(prev)->offset < offset) {

same here : one tab in excess

> +			next = NULL;
> +			goto found;
> +	}
>  	prev = NULL;
>  	for(next = fq->q.fragments; next != NULL; next = next->next) {
>  		if (FRAG6_CB(next)->offset >= offset)
> @@ -340,6 +345,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  		prev = next;
>  	}
>  
> +found:
>  	/* We found where to put this one.  Check for overlap with
>  	 * preceding fragment, and, if needed, align things so that
>  	 * any overlaps are eliminated.
> @@ -397,6 +403,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  
>  	/* Insert this fragment in the chain of fragments. */
>  	skb->next = next;
> +	if (!next)
> +		fq->q.fragments_tail = skb;
>  	if (prev)
>  		prev->next = skb;
>  	else
> @@ -463,6 +471,8 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
>  			goto out_oom;
>  
>  		fp->next = head->next;
> +		if (!fp->next)
> +			fq->q.fragments_tail = fp;
>  		prev->next = fp;
>  
>  		skb_morph(head, fq->q.fragments);
> @@ -549,6 +559,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
>  	rcu_read_unlock();
>  	fq->q.fragments = NULL;
> +	fq->q.fragments_tail = NULL;
>  	return 1;
>  
>  out_oversize:



^ permalink raw reply

* [PATCH net-next-2.6] snmp: 64bit ipstats_mib for all arches
From: Eric Dumazet @ 2010-06-29 13:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1277398051.2816.644.camel@edumazet-laptop>

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

Width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in kernel.

This means user program parsing these files must already be prepared to
deal with 64bit values, regardless of user program being 32 or 64 bit.

This patch introduces 64bit snmp values for IPSTAT mib, where some
counters can wrap pretty fast if they are 32bit wide.

# netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ip.h    |   20 +++++++----
 include/net/ipv6.h  |   12 +++---
 include/net/snmp.h  |   75 +++++++++++++++++++++++++++++++++++++++---
 net/ipv4/af_inet.c  |   36 ++++++++++++++++++++
 net/ipv4/proc.c     |   15 +++++---
 net/ipv6/addrconf.c |   18 +++++++++-
 net/ipv6/proc.c     |   17 +++++++--
 7 files changed, 167 insertions(+), 26 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 3b524df..890f972 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -165,12 +165,12 @@ struct ipv4_config {
 };
 
 extern struct ipv4_config ipv4_config;
-#define IP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.ip_statistics, field)
-#define IP_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.ip_statistics, field)
-#define IP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.ip_statistics, field, val)
-#define IP_ADD_STATS_BH(net, field, val) SNMP_ADD_STATS_BH((net)->mib.ip_statistics, field, val)
-#define IP_UPD_PO_STATS(net, field, val) SNMP_UPD_PO_STATS((net)->mib.ip_statistics, field, val)
-#define IP_UPD_PO_STATS_BH(net, field, val) SNMP_UPD_PO_STATS_BH((net)->mib.ip_statistics, field, val)
+#define IP_INC_STATS(net, field)	SNMP_INC_STATS64((net)->mib.ip_statistics, field)
+#define IP_INC_STATS_BH(net, field)	SNMP_INC_STATS64_BH((net)->mib.ip_statistics, field)
+#define IP_ADD_STATS(net, field, val)	SNMP_ADD_STATS64((net)->mib.ip_statistics, field, val)
+#define IP_ADD_STATS_BH(net, field, val) SNMP_ADD_STATS64_BH((net)->mib.ip_statistics, field, val)
+#define IP_UPD_PO_STATS(net, field, val) SNMP_UPD_PO_STATS64((net)->mib.ip_statistics, field, val)
+#define IP_UPD_PO_STATS_BH(net, field, val) SNMP_UPD_PO_STATS64_BH((net)->mib.ip_statistics, field, val)
 #define NET_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.net_statistics, field)
 #define NET_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.net_statistics, field)
 #define NET_INC_STATS_USER(net, field) 	SNMP_INC_STATS_USER((net)->mib.net_statistics, field)
@@ -178,6 +178,14 @@ extern struct ipv4_config ipv4_config;
 #define NET_ADD_STATS_USER(net, field, adnd) SNMP_ADD_STATS_USER((net)->mib.net_statistics, field, adnd)
 
 extern unsigned long snmp_fold_field(void __percpu *mib[], int offt);
+#if BITS_PER_LONG==32
+extern u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t sync_off);
+#else
+static inline u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp_off)
+{
+	return snmp_fold_field(mib, offt);
+}
+#endif
 extern int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align);
 extern void snmp_mib_free(void __percpu *ptr[2]);
 
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index f5808d5..1f84124 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -136,17 +136,17 @@ extern struct ctl_path net_ipv6_ctl_path[];
 /* MIBs */
 
 #define IP6_INC_STATS(net, idev,field)		\
-		_DEVINC(net, ipv6, , idev, field)
+		_DEVINC(net, ipv6, 64, idev, field)
 #define IP6_INC_STATS_BH(net, idev,field)	\
-		_DEVINC(net, ipv6, _BH, idev, field)
+		_DEVINC(net, ipv6, 64_BH, idev, field)
 #define IP6_ADD_STATS(net, idev,field,val)	\
-		_DEVADD(net, ipv6, , idev, field, val)
+		_DEVADD(net, ipv6, 64, idev, field, val)
 #define IP6_ADD_STATS_BH(net, idev,field,val)	\
-		_DEVADD(net, ipv6, _BH, idev, field, val)
+		_DEVADD(net, ipv6, 64_BH, idev, field, val)
 #define IP6_UPD_PO_STATS(net, idev,field,val)   \
-		_DEVUPD(net, ipv6, , idev, field, val)
+		_DEVUPD(net, ipv6, 64, idev, field, val)
 #define IP6_UPD_PO_STATS_BH(net, idev,field,val)   \
-		_DEVUPD(net, ipv6, _BH, idev, field, val)
+		_DEVUPD(net, ipv6, 64_BH, idev, field, val)
 #define ICMP6_INC_STATS(net, idev, field)	\
 		_DEVINC(net, icmpv6, , idev, field)
 #define ICMP6_INC_STATS_BH(net, idev, field)	\
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 899003d..a0e6180 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -47,15 +47,16 @@ struct snmp_mib {
 }
 
 /*
- * We use all unsigned longs. Linux will soon be so reliable that even 
- * these will rapidly get too small 8-). Seriously consider the IpInReceives 
- * count on the 20Gb/s + networks people expect in a few years time!
+ * We use unsigned longs for most mibs but u64 for ipstats.
  */
+#include <linux/u64_stats_sync.h>
 
 /* IPstats */
 #define IPSTATS_MIB_MAX	__IPSTATS_MIB_MAX
 struct ipstats_mib {
-	unsigned long	mibs[IPSTATS_MIB_MAX];
+	/* mibs[] must be first field of struct ipstats_mib */
+	u64		mibs[IPSTATS_MIB_MAX];
+	struct u64_stats_sync syncp;
 };
 
 /* ICMP */
@@ -155,4 +156,70 @@ struct linux_xfrm_mib {
 		ptr->mibs[basefield##PKTS]++; \
 		ptr->mibs[basefield##OCTETS] += addend;\
 	} while (0)
+
+
+#if BITS_PER_LONG==32
+
+#define SNMP_ADD_STATS64_BH(mib, field, addend) 			\
+	do {								\
+		__typeof__(*mib[0]) *ptr = __this_cpu_ptr((mib)[0]);	\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[field] += addend;				\
+		u64_stats_update_end(&ptr->syncp);			\
+	} while (0)
+#define SNMP_ADD_STATS64_USER(mib, field, addend) 			\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		preempt_disable();					\
+		ptr = __this_cpu_ptr((mib)[1]);				\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[field] += addend;				\
+		u64_stats_update_end(&ptr->syncp);			\
+		preempt_enable();					\
+	} while (0)
+#define SNMP_ADD_STATS64(mib, field, addend)				\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		preempt_disable();					\
+		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[field] += addend;				\
+		u64_stats_update_end(&ptr->syncp);			\
+		preempt_enable();					\
+	} while (0)
+#define SNMP_INC_STATS64_BH(mib, field) SNMP_ADD_STATS64_BH(mib, field, 1)
+#define SNMP_INC_STATS64_USER(mib, field) SNMP_ADD_STATS64_USER(mib, field, 1)
+#define SNMP_INC_STATS64(mib, field) SNMP_ADD_STATS64(mib, field, 1)
+#define SNMP_UPD_PO_STATS64(mib, basefield, addend)			\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		preempt_disable();					\
+		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[basefield##PKTS]++;				\
+		ptr->mibs[basefield##OCTETS] += addend;			\
+		u64_stats_update_end(&ptr->syncp);			\
+		preempt_enable();					\
+	} while (0)
+#define SNMP_UPD_PO_STATS64_BH(mib, basefield, addend)			\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[basefield##PKTS]++;				\
+		ptr->mibs[basefield##OCTETS] += addend;			\
+		u64_stats_update_end(&ptr->syncp);			\
+	} while (0)
+#else
+#define SNMP_INC_STATS64_BH(mib, field)		SNMP_INC_STATS_BH(mib, field)
+#define SNMP_INC_STATS64_USER(mib, field)	SNMP_INC_STATS_USER(mib, field)
+#define SNMP_INC_STATS64(mib, field)		SNMP_INC_STATS(mib, field)
+#define SNMP_DEC_STATS64(mib, field)		SNMP_DEC_STATS(mib, field)
+#define SNMP_ADD_STATS64_BH(mib, field, addend) SNMP_ADD_STATS_BH(mib, field, addend)
+#define SNMP_ADD_STATS64_USER(mib, field, addend) SNMP_ADD_STATS_USER(mib, field, addend)
+#define SNMP_ADD_STATS64(mib, field, addend)	SNMP_ADD_STATS(mib, field, addend)
+#define SNMP_UPD_PO_STATS64(mib, basefield, addend) SNMP_UPD_PO_STATS(mib, basefield, addend)
+#define SNMP_UPD_PO_STATS64_BH(mib, basefield, addend) SNMP_UPD_PO_STATS_BH(mib, basefield, addend)
+#endif
+
 #endif
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 640db9b..3ceb025 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1427,6 +1427,42 @@ unsigned long snmp_fold_field(void __percpu *mib[], int offt)
 }
 EXPORT_SYMBOL_GPL(snmp_fold_field);
 
+#if BITS_PER_LONG==32
+
+u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp_offset)
+{
+	u64 res = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		void *bhptr, *userptr;
+		struct u64_stats_sync *syncp;
+		u64 v_bh, v_user;
+		unsigned int start;
+
+		/* first mib used by softirq context, we must use _bh() accessors */
+		bhptr = per_cpu_ptr(SNMP_STAT_BHPTR(mib), cpu);
+		syncp = (struct u64_stats_sync *)(bhptr + syncp_offset);
+		do {
+			start = u64_stats_fetch_begin_bh(syncp);
+			v_bh = *(((u64 *) bhptr) + offt);
+		} while (u64_stats_fetch_retry_bh(syncp, start));
+
+		/* second mib used in USER context */
+		userptr = per_cpu_ptr(SNMP_STAT_USRPTR(mib), cpu);
+		syncp = (struct u64_stats_sync *)(userptr + syncp_offset);
+		do {
+			start = u64_stats_fetch_begin(syncp);
+			v_user = *(((u64 *) userptr) + offt);
+		} while (u64_stats_fetch_retry(syncp, start));
+
+		res += v_bh + v_user;
+	}
+	return res;
+}
+EXPORT_SYMBOL_GPL(snmp_fold_field64);
+#endif
+
 int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align)
 {
 	BUG_ON(ptr == NULL);
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index e320ca6..4ae1f20 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -343,10 +343,12 @@ static int snmp_seq_show(struct seq_file *seq, void *v)
 		   IPV4_DEVCONF_ALL(net, FORWARDING) ? 1 : 2,
 		   sysctl_ip_default_ttl);
 
+	BUILD_BUG_ON(offsetof(struct ipstats_mib, mibs) != 0);
 	for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
-		seq_printf(seq, " %lu",
-			   snmp_fold_field((void __percpu **)net->mib.ip_statistics,
-					   snmp4_ipstats_list[i].entry));
+		seq_printf(seq, " %llu",
+			   snmp_fold_field64((void __percpu **)net->mib.ip_statistics,
+					     snmp4_ipstats_list[i].entry,
+					     offsetof(struct ipstats_mib, syncp)));
 
 	icmp_put(seq);	/* RFC 2011 compatibility */
 	icmpmsg_put(seq);
@@ -432,9 +434,10 @@ static int netstat_seq_show(struct seq_file *seq, void *v)
 
 	seq_puts(seq, "\nIpExt:");
 	for (i = 0; snmp4_ipextstats_list[i].name != NULL; i++)
-		seq_printf(seq, " %lu",
-			   snmp_fold_field((void __percpu **)net->mib.ip_statistics,
-					   snmp4_ipextstats_list[i].entry));
+		seq_printf(seq, " %llu",
+			   snmp_fold_field64((void __percpu **)net->mib.ip_statistics,
+					     snmp4_ipextstats_list[i].entry,
+					     offsetof(struct ipstats_mib, syncp)));
 
 	seq_putc(seq, '\n');
 	return 0;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c20a7c2..56165ae 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3858,12 +3858,28 @@ static inline void __snmp6_fill_stats(u64 *stats, void __percpu **mib,
 	memset(&stats[items], 0, pad);
 }
 
+static inline void __snmp6_fill_stats64(u64 *stats, void __percpu **mib,
+				      int items, int bytes, size_t syncpoff)
+{
+	int i;
+	int pad = bytes - sizeof(u64) * items;
+	BUG_ON(pad < 0);
+
+	/* Use put_unaligned() because stats may not be aligned for u64. */
+	put_unaligned(items, &stats[0]);
+	for (i = 1; i < items; i++)
+		put_unaligned(snmp_fold_field64(mib, i, syncpoff), &stats[i]);
+
+	memset(&stats[items], 0, pad);
+}
+
 static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
 			     int bytes)
 {
 	switch (attrtype) {
 	case IFLA_INET6_STATS:
-		__snmp6_fill_stats(stats, (void __percpu **)idev->stats.ipv6, IPSTATS_MIB_MAX, bytes);
+		__snmp6_fill_stats64(stats, (void __percpu **)idev->stats.ipv6,
+				     IPSTATS_MIB_MAX, bytes, offsetof(struct ipstats_mib, syncp));
 		break;
 	case IFLA_INET6_ICMP6STATS:
 		__snmp6_fill_stats(stats, (void __percpu **)idev->stats.icmpv6, ICMP6_MIB_MAX, bytes);
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 566798d..4777691 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -174,17 +174,28 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu **mib,
 				const struct snmp_mib *itemlist)
 {
 	int i;
-	for (i=0; itemlist[i].name; i++)
+
+	for (i = 0; itemlist[i].name; i++)
 		seq_printf(seq, "%-32s\t%lu\n", itemlist[i].name,
 			   snmp_fold_field(mib, itemlist[i].entry));
 }
 
+static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu **mib,
+				  const struct snmp_mib *itemlist, size_t syncpoff)
+{
+	int i;
+
+	for (i = 0; itemlist[i].name; i++)
+		seq_printf(seq, "%-32s\t%llu\n", itemlist[i].name,
+			   snmp_fold_field64(mib, itemlist[i].entry, syncpoff));
+}
+
 static int snmp6_seq_show(struct seq_file *seq, void *v)
 {
 	struct net *net = (struct net *)seq->private;
 
-	snmp6_seq_show_item(seq, (void __percpu **)net->mib.ipv6_statistics,
-			    snmp6_ipstats_list);
+	snmp6_seq_show_item64(seq, (void __percpu **)net->mib.ipv6_statistics,
+			    snmp6_ipstats_list, offsetof(struct ipstats_mib, syncp));
 	snmp6_seq_show_item(seq, (void __percpu **)net->mib.icmpv6_statistics,
 			    snmp6_icmp6_list);
 	snmp6_seq_show_icmpv6msg(seq,



^ permalink raw reply related

* Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: Michael S. Tsirkin @ 2010-06-29 13:04 UTC (permalink / raw)
  To: David Miller
  Cc: arozansk, herbert.xu, quintela, kvm, virtualization, netdev,
	linux-kernel, ykaul, markmc
In-Reply-To: <20100629.003647.214219303.davem@davemloft.net>

On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Mon, 28 Jun 2010 13:08:07 +0300
> 
> > Userspace virtio server has the following hack
> > so guests rely on it, and we have to replicate it, too:
> > 
> > Use port number to detect incoming IPv4 DHCP response packets,
> > and fill in the checksum for these.
> > 
> > The issue we are solving is that on linux guests, some apps
> > that use recvmsg with AF_PACKET sockets, don't know how to
> > handle CHECKSUM_PARTIAL;
> > The interface to return the relevant information was added
> > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > and older userspace does not use it.
> > One important user of recvmsg with AF_PACKET is dhclient,
> > so we add a work-around just for DHCP.
> > 
> > Don't bother applying the hack to IPv6 as userspace virtio does not
> > have a work-around for that - let's hope guests will do the right
> > thing wrt IPv6.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Yikes, this is awful too.
> 
> Nothing in the kernel should be mucking around with procotol packets
> like this by default.  In particular, what the heck does port 67 mean?
> Locally I can use it for whatever I want for my own purposes, I don't
> have to follow the conventions for service ports as specified by the
> IETF.
> 
> But I can't have the packet checksum state be left alone for port 67
> traffic on a box using virtio because you have this hack there.
> 
> And yes it's broken on machines using the qemu thing, but at least the
> hack there is restricted to userspace.

Yes, and I think it was a mistake to add the hack there. This is what
prevented applications from using the new interface in the 3 years
since it was first introduced.

> I really don't want anything in the kernel that looks like this.
> 
> These applications are broken, and we've provided a way for them to
> work properly.  What's the point of having fixed applications if
> all of these hacks grow like fungus over every virtualization transport?
> 
> It just means that people won't fix the apps, since they don't have
> to.  There is no incentive, and the mechanism we created to properly
> handle this loses it's value.
> 
> At best, you can write a netfilter module that mucks up the packet
> checksum state in these situations.  At least in that case, you can
> make it generic (it mangles iff a packet matches a certain rule,
> so for your virtio guests you'd make it match for DHCP frames) instead
> of being some hard-coded DHCP thing by design.

Nod.
One question on implementation:
why does skb_checksum_help set the checksum state to
CHECKSUM_NONE? Shouldn't it be CHECKSUM_COMPLETE?



> And since this is so cleanly seperated and portable you don't even
> need to push it upstream.  It's a temporary workaround for a temporary
> problem.  You can just delete it as soon as the majority of guests
> have the fixed dhcp.  The qemu crap should disappear similarly.

Since using the module involves updating the management tools
as well, if we go down this route it will be much less painful
for everyone to do push it upstream.

-- 
MST

^ permalink raw reply

* Re: [0/8] netpoll/bridge fixes
From: Yanko Kaneti @ 2010-06-29 12:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S.Tsirkin, Qianfeng Zhang, David S.Miller, netdev,
	WANG Cong, Stephen Hemminger, Matt Mackall
In-Reply-To: <20100610124047.GA16658@gondor.apana.org.au>

On Thu, 2010-06-10 at 22:40 +1000, Herbert Xu wrote:
> Hi:
> 
> Qianfeng Zhang reported that he was seeing crashes with the
> attached backtrace.
> 
> I tracked this down to the recently added netpoll support in
> the bridge device.  It's a classic use-after-free problem.
> 
> Trying to solve it brought out a host of other issues, some of
> which existed prior to the new bridge code.  The following patches
> attempt to address some of these issues.
> 
> Warning, this is completely untested (apart from compiling with
> everything enabled) so please look but don't merge :)

FWIW 2.6.35-0.2.rc3.git0.fc14.x86_64 and later rawhide kernels are
causing quite reproducible __br_deliver crashes on routine f13
netinstalls in a kvm guest here.

To test I cherry picked this series +
netpoll-Use-correct-primitives-for-RCU-dereferencing and
net-fix-netpoll-Allow-netpoll_setup-cleanup-recursion from net-next on
top of 2.6.35-0.15.rc3.git3.fc14.x86_64 (which is todays linus tree) and
it seems to fix the crashes for me.

Perhaps the netpoll fixes should find their way to a rc before 2.6.35
goes golden.

Regards
Yanko

^ permalink raw reply

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Andrei Emeltchenko @ 2010-06-29 12:40 UTC (permalink / raw)
  To: David Miller
  Cc: ospite-aNJ+ML1ZbiP93QAQaVx+gl6hYfS7NtTn,
	alan-yzvJWuRpmD1zbRFIqnYvSA, marcel-kz+m5ild9QBg9hUCZPvPmw,
	jkosina-AlSwsSmVLrQ, mdpoole-IZmAEv5cUt1AfugRpC6u6w,
	hadess-0MeiytkfxGOsTnJN9+BGXg,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20100629.001216.91341775.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>

Hi,

On Tue, Jun 29, 2010 at 10:12 AM, David Miller <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org> wrote:
> From: Antonio Ospite <ospite-aNJ+ML1ZbiP93QAQaVx+gl6hYfS7NtTn@public.gmane.org>
> Date: Mon, 28 Jun 2010 13:14:37 +0200
>
>> On Sun, 13 Jun 2010 18:20:01 -0400
>> Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org> wrote:
>>
>>> This patch adds support or getting and setting feature reports for bluetooth
>>> HID devices from HIDRAW.
>>>
>>> Signed-off-by: Alan Ott <alan-yzvJWuRpmD1zbRFIqnYvSA@public.gmane.org>
>>> ---
>>
>> Ping.
>
> We effectively don't have a bluetooth maintainer at the current point
> in time.  I've tried to let patches sit for a while hoping the listed
> maintainer would do something, at least occaisionally, but that simply
> isn't happening.
> So I'll just pick patches up directly as I find time to review them,
> but I have to warn that for me it's going to be done in a very low
> priority way because I really don't find bluetooth all that exciting. :-)

This would be good. We have a backlog of bluetooth kernel patches
waiting for several months.
This takes too much time to return to them again and again...

Please keep this ML informed.

Regards,
Andrei Emeltchenko

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox