Netdev List


* Re: [PATCH -next] sfc: set/clear NETIF_F_RXHASH bit directly
From: Ben Hutchings @ 2010-06-29 14:43 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: netdev, Amerigo Wang
In-Reply-To: <20100629163520.642590bf@dhcp-lab-109.englab.brq.redhat.com>

On Tue, 2010-06-29 at 16:35 +0200, Stanislaw Gruszka wrote:
> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
[...]

I don't think this is a positive change.

Please change ethtool_op_set_flags; then in efx_ethtool_set_flags() you
can do:

-	if (data & ~supported)
- 		return -EOPNOTSUPP;
- 
-	return ethtool_op_set_flags(net_dev, data);
+	return ethtool_op_set_flags(net_dev, data, supported);
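A minimal userspace sketch of the helper shape proposed here, where the caller passes its own supported mask. The struct, the function name `demo_op_set_flags`, and the flag values are illustrative stand-ins (the values are assumed to mirror ETH_FLAG_LRO and ETH_FLAG_RXHASH), not the kernel's definitions. Leaving the unsupported bits of `features` untouched is also an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Illustrative flag values (assumed to mirror ETH_FLAG_LRO and
 * ETH_FLAG_RXHASH); a real driver would use the kernel headers. */
#define DEMO_FLAG_LRO    (1u << 15)
#define DEMO_FLAG_RXHASH (1u << 28)

/* Hypothetical stand-in for struct net_device: only the feature word. */
struct demo_net_device {
	uint32_t features;
};

/*
 * Reject any requested bit outside `supported`, then make the supported
 * bits of `features` match `data`, leaving all other bits alone.
 */
static int demo_op_set_flags(struct demo_net_device *dev, uint32_t data,
			     uint32_t supported)
{
	assert(dev);
	if (data & ~supported)
		return -EOPNOTSUPP;
	dev->features = (dev->features & ~supported) | (data & supported);
	return 0;
}
```

With a helper like this, the driver's own set_flags handler only computes `supported` and returns the helper's result, which is what the two-line diff expresses.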

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH] bonding: check if clients MAC addr has changed
From: Flavio Leitner @ 2010-06-29 14:41 UTC (permalink / raw)
  To: bonding-devel, Jay Vosburgh, netdev, Andy Gospodarek; +Cc: Flavio Leitner

When two systems using bonding devices in adaptive load
balancing (ALB) mode communicate with each other, an endless
ping-pong of ARP replies starts between the two systems.

What happens? In ALB mode, the bonding driver keeps track
of each connected client in a hash table, so that it can do
receive load balancing (RLB). This hash table is updated
when an ARP reply is received: the driver looks up the client
entry, updates its MAC address and flags it to be announced
later. Two seconds later, the ALB monitor runs and sends two
ARP replies for each updated client entry. The same process
happens on the receiving system, causing the endless
ping-pong of ARP replies.
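The loop can be reproduced with a toy model. This is a hypothetical simplification, not the bonding driver's code: each system holds one client entry for its peer, a received ARP reply rewrites that entry and sets its `ntt` flag, and on every monitor round a flagged system sends two replies. The `check_mac` switch models the fix this patch proposes (skip the update when the MAC is unchanged).

```c
#include <assert.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical, heavily simplified model of one ALB client entry. */
struct demo_client {
	unsigned char mac_dst[DEMO_ETH_ALEN]; /* peer MAC we last learned */
	int ntt;                              /* "need to transmit" flag */
};

struct demo_system {
	unsigned char own_mac[DEMO_ETH_ALEN];
	struct demo_client peer;              /* entry for the other system */
};

/* Receive path: update the entry from an ARP reply's source MAC. */
static void demo_arp_recv(struct demo_system *sys,
			  const unsigned char *mac_src, int check_mac)
{
	assert(sys && mac_src);
	if (check_mac && memcmp(sys->peer.mac_dst, mac_src, DEMO_ETH_ALEN) == 0)
		return; /* nothing changed, nothing to announce */
	memcpy(sys->peer.mac_dst, mac_src, DEMO_ETH_ALEN);
	sys->peer.ntt = 1;
}

/* Run `ticks` monitor rounds and return how many ARP replies were sent. */
static unsigned demo_count_replies(int check_mac, int ticks)
{
	struct demo_system a = { { 0x02, 0, 0, 0, 0, 0x01 }, { { 0 }, 0 } };
	struct demo_system b = { { 0x02, 0, 0, 0, 0, 0x02 }, { { 0 }, 0 } };
	unsigned sent = 0;

	/* B already knows A's MAC (it answered A's ARP request) ... */
	memcpy(b.peer.mac_dst, a.own_mac, DEMO_ETH_ALEN);
	/* ... and A learns B's MAC from that reply, flagging the entry. */
	demo_arp_recv(&a, b.own_mac, check_mac);

	for (int t = 0; t < ticks; t++) {
		int a_sends = a.peer.ntt, b_sends = b.peer.ntt;

		a.peer.ntt = 0;
		b.peer.ntt = 0;
		for (int i = 0; i < 2; i++) { /* two replies per flagged entry */
			if (a_sends) {
				demo_arp_recv(&b, a.own_mac, check_mac);
				sent++;
			}
			if (b_sends) {
				demo_arp_recv(&a, b.own_mac, check_mac);
				sent++;
			}
		}
	}
	return sent;
}
```

Without the check, every round sends two replies, so the traffic never stops (`demo_count_replies(0, 10)` returns 20); with it, only the initial announcement goes out (`demo_count_replies(1, 10)` returns 2).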

See more information including the relevant functions below:

   System 1                          System 2
    bond0                             bond0

   ping <system2>
    ARP request  --------->
                           <--------- ARP reply

+->rlb_arp_recv  <---------------------+   <--- loop begins
|  rlb_update_entry_from_arp           |
|  client_info->ntt = 1;               |
|  bond_info->rx_ntt = 1;              |
|                                      |
|         <communication succeeds>     |
|                                      |
|  bond_alb_monitor                    |
|  rlb_update_rx_clients               |
|  rlb_update_client                   |
|  arp_create(ARPOP_REPLY)             |
|   send ARP reply -------------->     V
|   send ARP reply -------------->
|                               rlb_arp_recv
|                               rlb_update_entry_from_arp
|                               client_info->ntt = 1;
|                               bond_info->rx_ntt = 1;
|                           < snipped, same as in system 1>
+-------           <-------------- send ARP reply
                   <-------------- send ARP reply

Besides the unneeded network traffic, this loop breaks
a cluster because a backup system can't take over the IP
address: there is always one system sending ARP replies
that poison the network.

This patch fixes the problem by checking the MAC address
before updating it. If the MAC address didn't change, there
is no need to update the entry or to announce it later.
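A standalone sketch of the patched condition, using hypothetical cut-down versions of the driver's `client_info` and `arp_pkt` structures. Only the fields the diff touches are modeled, and the return value (1 when the entry was updated) is added here for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ETH_ALEN 6

/* Hypothetical subsets of the bonding driver's client_info and arp_pkt. */
struct demo_client_info {
	int assigned;
	uint32_t ip_src, ip_dst;
	unsigned char mac_dst[DEMO_ETH_ALEN];
	int ntt;
};

struct demo_arp_pkt {
	uint32_t ip_src, ip_dst;
	unsigned char mac_src[DEMO_ETH_ALEN];
};

/* Update and flag the entry only if the peer's MAC really changed. */
static int demo_update_entry_from_arp(struct demo_client_info *ci,
				      const struct demo_arp_pkt *arp)
{
	assert(ci && arp);
	if (ci->assigned &&
	    ci->ip_src == arp->ip_dst &&
	    ci->ip_dst == arp->ip_src &&
	    memcmp(ci->mac_dst, arp->mac_src, DEMO_ETH_ALEN) != 0) {
		memcpy(ci->mac_dst, arp->mac_src, DEMO_ETH_ALEN);
		ci->ntt = 1; /* announce the new MAC on the next monitor run */
		return 1;
	}
	return 0;
}
```

Feeding the same ARP reply twice updates the entry once; the second call sees an identical MAC and leaves `ntt` clear.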

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
---
 drivers/net/bonding/bond_alb.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 40fdc41..67154bb 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -340,7 +340,8 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)

 	if ((client_info->assigned) &&
 	    (client_info->ip_src == arp->ip_dst) &&
-	    (client_info->ip_dst == arp->ip_src)) {
+	    (client_info->ip_dst == arp->ip_src) &&
+	    (memcmp(client_info->mac_dst, arp->mac_src, ETH_ALEN))) {
 		/* update the clients MAC address */
 		memcpy(client_info->mac_dst, arp->mac_src, ETH_ALEN);
 		client_info->ntt = 1;
-- 
1.7.0.1

^ permalink raw reply related

* Re: [PATCH -next] qlcnic: fail when try to setup unsupported features
From: Ben Hutchings @ 2010-06-29 14:41 UTC (permalink / raw)
  To: Stanislaw Gruszka
  Cc: Amit Salecha, netdev@vger.kernel.org, Amerigo Wang,
	Anirban Chakraborty
In-Reply-To: <1277734724.2089.10.camel@achroite.uk.solarflarecom.com>

On Mon, 2010-06-28 at 15:18 +0100, Ben Hutchings wrote:
> On Mon, 2010-06-28 at 16:14 +0200, Stanislaw Gruszka wrote:
> [...]
> > My plan is something like that:
> > 
> > static const struct ethtool_ops my_ethtool_ops = {
> >         .get_flags              = ethtool_op_get_flags,
> >         .set_flags              = ethtool_op_set_flags,
> > 	.supported_flags	= ETH_FLAG_LRO
> > }
> > 
> > Plus an op->supported_flags check in ethtool_op_set_flags. That will allow
> > flags to be defined per driver. It would also be possible to add supported_flags
> > to netdev, but I would like to avoid that - in such a case drivers can use
> > a custom .set_flags function.
> 
> Sounds good to me.

On second thoughts, this is not going to work - supported_flags may need to
be different for different chips handled by the same driver.  In fact,
this is already the case in sfc.  So I think you should do what I
suggested previously - add a supported_flags parameter to
ethtool_op_set_flags.
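A sketch of this point, with hypothetical chip types and assumed flag values: the supported mask is computed per device instance and handed to a generic helper. A single static `supported_flags` field in `ethtool_ops` could not express this difference between chips.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed illustrative values, not taken from kernel headers. */
#define DEMO_FLAG_NTUPLE (1u << 27)
#define DEMO_FLAG_RXHASH (1u << 28)

/* Hypothetical chip generations handled by one driver. */
enum demo_chip { DEMO_CHIP_OLD, DEMO_CHIP_NEW };

struct demo_nic {
	enum demo_chip chip;
	uint32_t features;
};

/* The supported mask depends on the device instance, not the driver. */
static uint32_t demo_supported_flags(const struct demo_nic *nic)
{
	if (nic->chip == DEMO_CHIP_NEW)
		return DEMO_FLAG_NTUPLE | DEMO_FLAG_RXHASH;
	return DEMO_FLAG_RXHASH;
}

/* Generic helper: the mask arrives as a parameter. */
static int demo_set_flags(struct demo_nic *nic, uint32_t data,
			  uint32_t supported)
{
	if (data & ~supported)
		return -EOPNOTSUPP;
	nic->features = (nic->features & ~supported) | data;
	return 0;
}

/* Driver entry point: compute the mask per device, then delegate. */
static int demo_driver_set_flags(struct demo_nic *nic, uint32_t data)
{
	assert(nic);
	return demo_set_flags(nic, data, demo_supported_flags(nic));
}
```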

Ben.

-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* [PATCH -next] ixgbe: use NETIF_F_LRO
From: Stanislaw Gruszka @ 2010-06-29 14:38 UTC (permalink / raw)
  To: netdev; +Cc: Jeff Kirsher

Both ETH_FLAG_LRO and NETIF_F_LRO have the same value, but NETIF_F_LRO
is the one intended to be used with netdev->features.
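Because the change relies on the two macros sharing a value, a compile-time check can document that assumption. In the kernel, BUILD_BUG_ON serves this purpose; the userspace sketch below uses C11 `_Static_assert` instead, and its values are assumed to match the kernel headers of the time rather than taken from them.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to match <linux/ethtool.h> and <linux/netdevice.h>;
 * a real build would include those headers instead. */
#define DEMO_ETH_FLAG_LRO (1u << 15)
#define DEMO_NETIF_F_LRO  32768u

/* Compile-time guard: the two names must keep sharing one bit. */
_Static_assert(DEMO_ETH_FLAG_LRO == DEMO_NETIF_F_LRO,
	       "ETH_FLAG_LRO and NETIF_F_LRO are expected to share a value");

/* Clear LRO using the netdev feature name, as the patch does. */
static uint32_t demo_clear_lro(uint32_t features)
{
	return features & ~DEMO_NETIF_F_LRO;
}
```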

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/ixgbe/ixgbe_ethtool.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethtool.c b/drivers/net/ixgbe/ixgbe_ethtool.c
index 873b45e..e2ab4ae 100644
--- a/drivers/net/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ixgbe/ixgbe_ethtool.c
@@ -2227,7 +2227,7 @@ static int ixgbe_set_flags(struct net_device *netdev, u32 data)
 				break;
 			}
 		} else if (!adapter->rx_itr_setting) {
-			netdev->features &= ~ETH_FLAG_LRO;
+			netdev->features &= ~NETIF_F_LRO;
 			if (data & ETH_FLAG_LRO)
 				e_info("rx-usecs set to 0, "
 					"LRO/RSC cannot be enabled.\n");
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH -next] myri10ge: clear NETIF_F_LRO bit directly
From: Stanislaw Gruszka @ 2010-06-29 14:37 UTC (permalink / raw)
  To: netdev; +Cc: Amerigo Wang, Andrew Gallatin, Brice Goglin

Do not use ethtool_op_set_flags() to clear one bit in ->features.
Inform the user about disabling LRO.
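A sketch of the pattern in this patch, with a hypothetical private structure and an assumed bit value. Turning RX checksumming off clears the LRO bit in the feature word directly instead of round-tripping through ethtool_op_get_flags()/ethtool_op_set_flags(). The patch logs with netdev_info(); this userspace version uses fprintf() and, unlike the patch, only logs when LRO was actually enabled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define DEMO_NETIF_F_LRO (1u << 15) /* assumed value of NETIF_F_LRO */

/* Hypothetical driver state: a checksum flag plus the feature word. */
struct demo_priv {
	uint32_t csum_flag;
	uint32_t features;
};

/* Disabling RX checksumming also drops LRO, as in the patch. */
static int demo_set_rx_csum(struct demo_priv *p, int csum_enabled)
{
	assert(p);
	if (csum_enabled) {
		p->csum_flag = 1;
	} else {
		p->csum_flag = 0;
		if (p->features & DEMO_NETIF_F_LRO)
			fprintf(stderr, "RX checksumming set off, disabling LRO\n");
		p->features &= ~DEMO_NETIF_F_LRO; /* clear the bit directly */
	}
	return 0;
}
```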

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/myri10ge/myri10ge.c |    8 +++-----
 1 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
index e0b47cc..2259168 100644
--- a/drivers/net/myri10ge/myri10ge.c
+++ b/drivers/net/myri10ge/myri10ge.c
@@ -1725,17 +1725,15 @@ static u32 myri10ge_get_rx_csum(struct net_device *netdev)
 static int myri10ge_set_rx_csum(struct net_device *netdev, u32 csum_enabled)
 {
 	struct myri10ge_priv *mgp = netdev_priv(netdev);
-	int err = 0;
 
 	if (csum_enabled)
 		mgp->csum_flag = MXGEFW_FLAGS_CKSUM;
 	else {
-		u32 flags = ethtool_op_get_flags(netdev);
-		err = ethtool_op_set_flags(netdev, (flags & ~ETH_FLAG_LRO));
 		mgp->csum_flag = 0;
-
+		netdev->features &= ~NETIF_F_LRO;
+		netdev_info(netdev, "RX checksumming set off, disabling LRO\n");
 	}
-	return err;
+	return 0;
 }
 
 static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH -next] sfc: set/clear NETIF_F_RXHASH bit directly
From: Stanislaw Gruszka @ 2010-06-29 14:35 UTC (permalink / raw)
  To: netdev; +Cc: Amerigo Wang, Ben Hutchings

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/sfc/ethtool.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sfc/ethtool.c b/drivers/net/sfc/ethtool.c
index 7693cfb..fd55123 100644
--- a/drivers/net/sfc/ethtool.c
+++ b/drivers/net/sfc/ethtool.c
@@ -554,7 +554,12 @@ static int efx_ethtool_set_flags(struct net_device *net_dev, u32 data)
 	if (data & ~supported)
 		return -EOPNOTSUPP;
 
-	return ethtool_op_set_flags(net_dev, data);
+	if (data & ETH_FLAG_RXHASH)
+		net_dev->features |= NETIF_F_RXHASH;
+	else
+		net_dev->features &= ~NETIF_F_RXHASH;
+
+	return 0;
 }
 
 static void efx_ethtool_self_test(struct net_device *net_dev,
-- 
1.5.5.6


^ permalink raw reply related

* [PATCH -next] enic: fail when try to setup unsupported features
From: Stanislaw Gruszka @ 2010-06-29 14:33 UTC (permalink / raw)
  To: netdev; +Cc: Amerigo Wang, Scott Feldman

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
---
 drivers/net/enic/enic_main.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 6c6795b..77a7f87 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -365,7 +365,6 @@ static const struct ethtool_ops enic_ethtool_ops = {
 	.get_coalesce = enic_get_coalesce,
 	.set_coalesce = enic_set_coalesce,
 	.get_flags = ethtool_op_get_flags,
-	.set_flags = ethtool_op_set_flags,
 };
 
 static void enic_free_wq_buf(struct vnic_wq *wq, struct vnic_wq_buf *buf)
-- 
1.5.5.6


^ permalink raw reply related

* Re: [PATCH v2] fragment: add fast path
From: YOSHIFUJI Hideaki @ 2010-06-29 14:29 UTC (permalink / raw)
  To: Changli Gao
  Cc: Eric Dumazet, David Miller, Alexey Kuznetsov, Pekka Savola (ipv6),
	James Morris, Patrick McHardy, netdev, Mitchell Erblich,
	YOSHIFUJI Hideaki
In-Reply-To: <AANLkTimZ9Q-i9eTOfE3dae9Fru0dV2p2cEQqO3l8XBe9@mail.gmail.com>

Changli Gao wrote:
> On Tue, Jun 29, 2010 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Friday 25 June 2010 at 10:54 +0800, Changli Gao wrote:
>>> add fast path
>>>
>>> As fragments are sent in order by most OSes, such as Windows, Darwin and
>>> FreeBSD, new fragments are likely to be at the end of the inet_frag_queue.
>>> In the fast path, we check whether the skb at the end of the inet_frag_queue is the
>>> prev we expect.
>>>
>>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>> This patch is fine, but there are two indentation glitches.
>>
> 
> Oh, Thanks. I'll fix them.

And I think it is better not to just call it "fast path",
because that is not descriptive enough.  Perhaps "fast path
for in-order fragments" or something like that.
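A userspace sketch of the in-order fast path discussed in this thread, with a hypothetical fragment type that carries only an offset. It keeps a tail pointer, tries the tail first, and falls back to the full walk only when a fragment arrives out of order. Overlap trimming, which the real ip_frag_queue() performs, is left out, and the `slow_path_walks` counter exists only to make the behavior observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment: just an offset and a next pointer. */
struct demo_frag {
	int offset;
	struct demo_frag *next;
};

/* Queue with a tail pointer, mirroring the fragments_tail idea. */
struct demo_queue {
	struct demo_frag *head;
	struct demo_frag *tail;
	int slow_path_walks; /* counts list walks, for illustration only */
};

static void demo_queue_insert(struct demo_queue *q, struct demo_frag *f)
{
	struct demo_frag *prev, *next;

	assert(q && f);
	/* Fast path: empty queue, or the new fragment goes after the tail. */
	prev = q->tail;
	if (!prev || prev->offset < f->offset) {
		next = NULL;
		goto found;
	}

	/* Slow path: walk from the head to find the insertion point. */
	q->slow_path_walks++;
	prev = NULL;
	for (next = q->head; next != NULL; next = next->next) {
		if (next->offset >= f->offset)
			break;
		prev = next;
	}

found:
	f->next = next;
	if (!next)
		q->tail = f; /* f is now the last fragment */
	if (prev)
		prev->next = f;
	else
		q->head = f;
}
```

In-order arrivals never walk the list; a single late fragment costs one walk and leaves the tail pointer unchanged.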

Regards,

--yoshfuji

^ permalink raw reply

* Re: [PATCH v2] fragment: add fast path
From: Changli Gao @ 2010-06-29 14:15 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Mitchell Erblich
In-Reply-To: <1277819660.3531.568.camel@edumazet-laptop>

On Tue, Jun 29, 2010 at 9:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Friday 25 June 2010 at 10:54 +0800, Changli Gao wrote:
>> add fast path
>>
>> As fragments are sent in order by most OSes, such as Windows, Darwin and
>> FreeBSD, new fragments are likely to be at the end of the inet_frag_queue.
>> In the fast path, we check whether the skb at the end of the inet_frag_queue is the
>> prev we expect.
>>
>> Signed-off-by: Changli Gao <xiaosuo@gmail.com>
>
> This patch is fine, but there are two indentation glitches.
>

Oh, Thanks. I'll fix them.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* Why is destructor_arg in shinfo?
From: Jeremy Fitzhardinge @ 2010-06-29 14:09 UTC (permalink / raw)
  To: Johann Baudy; +Cc: NetDev, Ian Campbell, David Miller

Hi,

I'm wondering why "net: TX_RING and packet mmap" (69e3c75) ended up
putting the skb destructor's arg in the skb's shinfo.  It seems like an
odd mismatch, since the skb and the shinfo have different lifetimes. 
And in principle you might have two skbs with different destructors
sharing a shinfo, and therefore conflict over the use of destructor_arg.

What's the rationale?

Would it make sense to have a shinfo destructor as well?
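A toy model of the conflict described above, with hypothetical names that are not kernel structures: two handles share one info block holding a single destructor_arg slot, so the second handle's argument overwrites the first's, and the first handle's destructor then sees the wrong value. Freeing the shared block when its refcount reaches zero is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: several buffer handles may share one info block. */
struct demo_shinfo {
	int refcount;         /* number of handles pointing here */
	void *destructor_arg; /* a single slot, shared by all handles */
};

struct demo_buf {
	struct demo_shinfo *shinfo;
	void (*destructor)(struct demo_buf *);
};

/* Records which argument the most recent destructor call observed. */
static void *demo_last_seen_arg;

static void demo_destructor(struct demo_buf *b)
{
	demo_last_seen_arg = b->shinfo->destructor_arg;
}

/* Each handle stores "its" argument, but the slot is shared. */
static void demo_set_arg(struct demo_buf *b, void *arg)
{
	b->shinfo->destructor_arg = arg;
}

/* Releasing a handle runs its destructor and drops one reference. */
static void demo_release(struct demo_buf *b)
{
	assert(b->shinfo->refcount > 0);
	if (b->destructor)
		b->destructor(b);
	b->shinfo->refcount--;
}
```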

Thanks,
    J

^ permalink raw reply

* Re: [PATCH v2] fragment: add fast path
From: Eric Dumazet @ 2010-06-29 13:54 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, Alexey Kuznetsov, Pekka Savola (ipv6), James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, netdev, Mitchell Erblich
In-Reply-To: <1277434472-2845-1-git-send-email-xiaosuo@gmail.com>

On Friday 25 June 2010 at 10:54 +0800, Changli Gao wrote:
> add fast path
> 
> As fragments are sent in order by most OSes, such as Windows, Darwin and
> FreeBSD, new fragments are likely to be at the end of the inet_frag_queue.
> In the fast path, we check whether the skb at the end of the inet_frag_queue is the
> prev we expect.
> 
> Signed-off-by: Changli Gao <xiaosuo@gmail.com>

This patch is fine, but there are two indentation glitches.

> ----
>  include/net/inet_frag.h |    1 +
>  net/ipv4/ip_fragment.c  |   12 ++++++++++++
>  net/ipv6/reassembly.c   |   11 +++++++++++
>  3 files changed, 24 insertions(+)
> diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
> index 39f2dc9..16ff29a 100644
> --- a/include/net/inet_frag.h
> +++ b/include/net/inet_frag.h
> @@ -20,6 +20,7 @@ struct inet_frag_queue {
>  	atomic_t		refcnt;
>  	struct timer_list	timer;      /* when will this queue expire? */
>  	struct sk_buff		*fragments; /* list of received fragments */
> +	struct sk_buff		*fragments_tail;
>  	ktime_t			stamp;
>  	int			len;        /* total length of orig datagram */
>  	int			meat;
> diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
> index 858d346..dbe8999 100644
> --- a/net/ipv4/ip_fragment.c
> +++ b/net/ipv4/ip_fragment.c
> @@ -314,6 +314,7 @@ static int ip_frag_reinit(struct ipq *qp)
>  	qp->q.len = 0;
>  	qp->q.meat = 0;
>  	qp->q.fragments = NULL;
> +	qp->q.fragments_tail = NULL;
>  	qp->iif = 0;
>  
>  	return 0;
> @@ -386,6 +387,11 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  	 * in the chain of fragments so far.  We must know where to put
>  	 * this fragment, right?
>  	 */
> +	prev = qp->q.fragments_tail;
> +	if (!prev || FRAG_CB(prev)->offset < offset) {

strange indentation: one tab too many

> +			next = NULL;
> +			goto found;
> +	}
>  	prev = NULL;
>  	for (next = qp->q.fragments; next != NULL; next = next->next) {
>  		if (FRAG_CB(next)->offset >= offset)
> @@ -393,6 +399,7 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  		prev = next;
>  	}
>  
> +found:
>  	/* We found where to put this one.  Check for overlap with
>  	 * preceding fragment, and, if needed, align things so that
>  	 * any overlaps are eliminated.
> @@ -451,6 +458,8 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb)
>  
>  	/* Insert this fragment in the chain of fragments. */
>  	skb->next = next;
> +	if (!next)
> +		qp->q.fragments_tail = skb;
>  	if (prev)
>  		prev->next = skb;
>  	else
> @@ -504,6 +513,8 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
>  			goto out_nomem;
>  
>  		fp->next = head->next;
> +		if (!fp->next)
> +			qp->q.fragments_tail = fp;
>  		prev->next = fp;
>  
>  		skb_morph(head, qp->q.fragments);
> @@ -574,6 +585,7 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *prev,
>  	iph->tot_len = htons(len);
>  	IP_INC_STATS_BH(net, IPSTATS_MIB_REASMOKS);
>  	qp->q.fragments = NULL;
> +	qp->q.fragments_tail = NULL;
>  	return 0;
>  
>  out_nomem:
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index 0b97230..b832f7b 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -333,6 +333,11 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  	 * in the chain of fragments so far.  We must know where to put
>  	 * this fragment, right?
>  	 */
> +	prev = fq->q.fragments_tail;
> +	if (!prev || FRAG6_CB(prev)->offset < offset) {

same here: one tab too many

> +			next = NULL;
> +			goto found;
> +	}
>  	prev = NULL;
>  	for(next = fq->q.fragments; next != NULL; next = next->next) {
>  		if (FRAG6_CB(next)->offset >= offset)
> @@ -340,6 +345,7 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  		prev = next;
>  	}
>  
> +found:
>  	/* We found where to put this one.  Check for overlap with
>  	 * preceding fragment, and, if needed, align things so that
>  	 * any overlaps are eliminated.
> @@ -397,6 +403,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
>  
>  	/* Insert this fragment in the chain of fragments. */
>  	skb->next = next;
> +	if (!next)
> +		fq->q.fragments_tail = skb;
>  	if (prev)
>  		prev->next = skb;
>  	else
> @@ -463,6 +471,8 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
>  			goto out_oom;
>  
>  		fp->next = head->next;
> +		if (!fp->next)
> +			fq->q.fragments_tail = fp;
>  		prev->next = fp;
>  
>  		skb_morph(head, fq->q.fragments);
> @@ -549,6 +559,7 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *prev,
>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMOKS);
>  	rcu_read_unlock();
>  	fq->q.fragments = NULL;
> +	fq->q.fragments_tail = NULL;
>  	return 1;
>  
>  out_oversize:



^ permalink raw reply

* [PATCH net-next-2.6] snmp: 64bit ipstats_mib for all arches
From: Eric Dumazet @ 2010-06-29 13:48 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <1277398051.2816.644.camel@edumazet-laptop>

/proc/net/snmp and /proc/net/netstat expose SNMP counters.

The width of these counters is either 32 or 64 bits, depending on the size
of "unsigned long" in the kernel.

This means user programs parsing these files must already be prepared to
deal with 64-bit values, regardless of whether the user program is 32 or 64 bit.

This patch introduces 64-bit SNMP values for the IPSTATS MIB, where some
counters can wrap pretty fast if they are 32 bits wide.

# netstat -s|egrep "InOctets|OutOctets"
    InOctets: 244068329096
    OutOctets: 244069348848
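To put numbers on the wrap-around: the InOctets value in the netstat output, held in a 32-bit counter, would have wrapped 56 times and would read 3550160520. The helpers below compute this. The 10 Gbit/s line rate in the usage note is an assumed example, not a figure from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* What a 32-bit "unsigned long" counter would report for a true total. */
static uint32_t demo_as_32bit(uint64_t total)
{
	return (uint32_t)total; /* keeps only total mod 2^32 */
}

/* Number of times a 32-bit counter has wrapped to reach `total`. */
static uint64_t demo_wraps(uint64_t total)
{
	return total >> 32;
}

/* Whole seconds until a 32-bit octet counter wraps at a given rate. */
static uint64_t demo_seconds_to_wrap(uint64_t bytes_per_sec)
{
	assert(bytes_per_sec > 0);
	return (UINT64_C(1) << 32) / bytes_per_sec;
}
```

`demo_seconds_to_wrap(1250000000)` (10 Gbit/s expressed in bytes per second) returns 3: at that rate a 32-bit octet counter wraps roughly every 3.4 seconds, and integer division truncates that to 3.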


Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/ip.h    |   20 +++++++----
 include/net/ipv6.h  |   12 +++---
 include/net/snmp.h  |   75 +++++++++++++++++++++++++++++++++++++++---
 net/ipv4/af_inet.c  |   36 ++++++++++++++++++++
 net/ipv4/proc.c     |   15 +++++---
 net/ipv6/addrconf.c |   18 +++++++++-
 net/ipv6/proc.c     |   17 +++++++--
 7 files changed, 167 insertions(+), 26 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 3b524df..890f972 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -165,12 +165,12 @@ struct ipv4_config {
 };
 
 extern struct ipv4_config ipv4_config;
-#define IP_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.ip_statistics, field)
-#define IP_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.ip_statistics, field)
-#define IP_ADD_STATS(net, field, val)	SNMP_ADD_STATS((net)->mib.ip_statistics, field, val)
-#define IP_ADD_STATS_BH(net, field, val) SNMP_ADD_STATS_BH((net)->mib.ip_statistics, field, val)
-#define IP_UPD_PO_STATS(net, field, val) SNMP_UPD_PO_STATS((net)->mib.ip_statistics, field, val)
-#define IP_UPD_PO_STATS_BH(net, field, val) SNMP_UPD_PO_STATS_BH((net)->mib.ip_statistics, field, val)
+#define IP_INC_STATS(net, field)	SNMP_INC_STATS64((net)->mib.ip_statistics, field)
+#define IP_INC_STATS_BH(net, field)	SNMP_INC_STATS64_BH((net)->mib.ip_statistics, field)
+#define IP_ADD_STATS(net, field, val)	SNMP_ADD_STATS64((net)->mib.ip_statistics, field, val)
+#define IP_ADD_STATS_BH(net, field, val) SNMP_ADD_STATS64_BH((net)->mib.ip_statistics, field, val)
+#define IP_UPD_PO_STATS(net, field, val) SNMP_UPD_PO_STATS64((net)->mib.ip_statistics, field, val)
+#define IP_UPD_PO_STATS_BH(net, field, val) SNMP_UPD_PO_STATS64_BH((net)->mib.ip_statistics, field, val)
 #define NET_INC_STATS(net, field)	SNMP_INC_STATS((net)->mib.net_statistics, field)
 #define NET_INC_STATS_BH(net, field)	SNMP_INC_STATS_BH((net)->mib.net_statistics, field)
 #define NET_INC_STATS_USER(net, field) 	SNMP_INC_STATS_USER((net)->mib.net_statistics, field)
@@ -178,6 +178,14 @@ extern struct ipv4_config ipv4_config;
 #define NET_ADD_STATS_USER(net, field, adnd) SNMP_ADD_STATS_USER((net)->mib.net_statistics, field, adnd)
 
 extern unsigned long snmp_fold_field(void __percpu *mib[], int offt);
+#if BITS_PER_LONG==32
+extern u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t sync_off);
+#else
+static inline u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp_off)
+{
+	return snmp_fold_field(mib, offt);
+}
+#endif
 extern int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align);
 extern void snmp_mib_free(void __percpu *ptr[2]);
 
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index f5808d5..1f84124 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -136,17 +136,17 @@ extern struct ctl_path net_ipv6_ctl_path[];
 /* MIBs */
 
 #define IP6_INC_STATS(net, idev,field)		\
-		_DEVINC(net, ipv6, , idev, field)
+		_DEVINC(net, ipv6, 64, idev, field)
 #define IP6_INC_STATS_BH(net, idev,field)	\
-		_DEVINC(net, ipv6, _BH, idev, field)
+		_DEVINC(net, ipv6, 64_BH, idev, field)
 #define IP6_ADD_STATS(net, idev,field,val)	\
-		_DEVADD(net, ipv6, , idev, field, val)
+		_DEVADD(net, ipv6, 64, idev, field, val)
 #define IP6_ADD_STATS_BH(net, idev,field,val)	\
-		_DEVADD(net, ipv6, _BH, idev, field, val)
+		_DEVADD(net, ipv6, 64_BH, idev, field, val)
 #define IP6_UPD_PO_STATS(net, idev,field,val)   \
-		_DEVUPD(net, ipv6, , idev, field, val)
+		_DEVUPD(net, ipv6, 64, idev, field, val)
 #define IP6_UPD_PO_STATS_BH(net, idev,field,val)   \
-		_DEVUPD(net, ipv6, _BH, idev, field, val)
+		_DEVUPD(net, ipv6, 64_BH, idev, field, val)
 #define ICMP6_INC_STATS(net, idev, field)	\
 		_DEVINC(net, icmpv6, , idev, field)
 #define ICMP6_INC_STATS_BH(net, idev, field)	\
diff --git a/include/net/snmp.h b/include/net/snmp.h
index 899003d..a0e6180 100644
--- a/include/net/snmp.h
+++ b/include/net/snmp.h
@@ -47,15 +47,16 @@ struct snmp_mib {
 }
 
 /*
- * We use all unsigned longs. Linux will soon be so reliable that even 
- * these will rapidly get too small 8-). Seriously consider the IpInReceives 
- * count on the 20Gb/s + networks people expect in a few years time!
+ * We use unsigned longs for most mibs but u64 for ipstats.
  */
+#include <linux/u64_stats_sync.h>
 
 /* IPstats */
 #define IPSTATS_MIB_MAX	__IPSTATS_MIB_MAX
 struct ipstats_mib {
-	unsigned long	mibs[IPSTATS_MIB_MAX];
+	/* mibs[] must be first field of struct ipstats_mib */
+	u64		mibs[IPSTATS_MIB_MAX];
+	struct u64_stats_sync syncp;
 };
 
 /* ICMP */
@@ -155,4 +156,70 @@ struct linux_xfrm_mib {
 		ptr->mibs[basefield##PKTS]++; \
 		ptr->mibs[basefield##OCTETS] += addend;\
 	} while (0)
+
+
+#if BITS_PER_LONG==32
+
+#define SNMP_ADD_STATS64_BH(mib, field, addend) 			\
+	do {								\
+		__typeof__(*mib[0]) *ptr = __this_cpu_ptr((mib)[0]);	\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[field] += addend;				\
+		u64_stats_update_end(&ptr->syncp);			\
+	} while (0)
+#define SNMP_ADD_STATS64_USER(mib, field, addend) 			\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		preempt_disable();					\
+		ptr = __this_cpu_ptr((mib)[1]);				\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[field] += addend;				\
+		u64_stats_update_end(&ptr->syncp);			\
+		preempt_enable();					\
+	} while (0)
+#define SNMP_ADD_STATS64(mib, field, addend)				\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		preempt_disable();					\
+		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[field] += addend;				\
+		u64_stats_update_end(&ptr->syncp);			\
+		preempt_enable();					\
+	} while (0)
+#define SNMP_INC_STATS64_BH(mib, field) SNMP_ADD_STATS64_BH(mib, field, 1)
+#define SNMP_INC_STATS64_USER(mib, field) SNMP_ADD_STATS64_USER(mib, field, 1)
+#define SNMP_INC_STATS64(mib, field) SNMP_ADD_STATS64(mib, field, 1)
+#define SNMP_UPD_PO_STATS64(mib, basefield, addend)			\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		preempt_disable();					\
+		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[basefield##PKTS]++;				\
+		ptr->mibs[basefield##OCTETS] += addend;			\
+		u64_stats_update_end(&ptr->syncp);			\
+		preempt_enable();					\
+	} while (0)
+#define SNMP_UPD_PO_STATS64_BH(mib, basefield, addend)			\
+	do {								\
+		__typeof__(*mib[0]) *ptr;				\
+		ptr = __this_cpu_ptr((mib)[!in_softirq()]);		\
+		u64_stats_update_begin(&ptr->syncp);			\
+		ptr->mibs[basefield##PKTS]++;				\
+		ptr->mibs[basefield##OCTETS] += addend;			\
+		u64_stats_update_end(&ptr->syncp);			\
+	} while (0)
+#else
+#define SNMP_INC_STATS64_BH(mib, field)		SNMP_INC_STATS_BH(mib, field)
+#define SNMP_INC_STATS64_USER(mib, field)	SNMP_INC_STATS_USER(mib, field)
+#define SNMP_INC_STATS64(mib, field)		SNMP_INC_STATS(mib, field)
+#define SNMP_DEC_STATS64(mib, field)		SNMP_DEC_STATS(mib, field)
+#define SNMP_ADD_STATS64_BH(mib, field, addend) SNMP_ADD_STATS_BH(mib, field, addend)
+#define SNMP_ADD_STATS64_USER(mib, field, addend) SNMP_ADD_STATS_USER(mib, field, addend)
+#define SNMP_ADD_STATS64(mib, field, addend)	SNMP_ADD_STATS(mib, field, addend)
+#define SNMP_UPD_PO_STATS64(mib, basefield, addend) SNMP_UPD_PO_STATS(mib, basefield, addend)
+#define SNMP_UPD_PO_STATS64_BH(mib, basefield, addend) SNMP_UPD_PO_STATS_BH(mib, basefield, addend)
+#endif
+
 #endif
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 640db9b..3ceb025 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1427,6 +1427,42 @@ unsigned long snmp_fold_field(void __percpu *mib[], int offt)
 }
 EXPORT_SYMBOL_GPL(snmp_fold_field);
 
+#if BITS_PER_LONG==32
+
+u64 snmp_fold_field64(void __percpu *mib[], int offt, size_t syncp_offset)
+{
+	u64 res = 0;
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		void *bhptr, *userptr;
+		struct u64_stats_sync *syncp;
+		u64 v_bh, v_user;
+		unsigned int start;
+
+		/* first mib used by softirq context, we must use _bh() accessors */
+		bhptr = per_cpu_ptr(SNMP_STAT_BHPTR(mib), cpu);
+		syncp = (struct u64_stats_sync *)(bhptr + syncp_offset);
+		do {
+			start = u64_stats_fetch_begin_bh(syncp);
+			v_bh = *(((u64 *) bhptr) + offt);
+		} while (u64_stats_fetch_retry_bh(syncp, start));
+
+		/* second mib used in USER context */
+		userptr = per_cpu_ptr(SNMP_STAT_USRPTR(mib), cpu);
+		syncp = (struct u64_stats_sync *)(userptr + syncp_offset);
+		do {
+			start = u64_stats_fetch_begin(syncp);
+			v_user = *(((u64 *) userptr) + offt);
+		} while (u64_stats_fetch_retry(syncp, start));
+
+		res += v_bh + v_user;
+	}
+	return res;
+}
+EXPORT_SYMBOL_GPL(snmp_fold_field64);
+#endif
+
 int snmp_mib_init(void __percpu *ptr[2], size_t mibsize, size_t align)
 {
 	BUG_ON(ptr == NULL);
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index e320ca6..4ae1f20 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -343,10 +343,12 @@ static int snmp_seq_show(struct seq_file *seq, void *v)
 		   IPV4_DEVCONF_ALL(net, FORWARDING) ? 1 : 2,
 		   sysctl_ip_default_ttl);
 
+	BUILD_BUG_ON(offsetof(struct ipstats_mib, mibs) != 0);
 	for (i = 0; snmp4_ipstats_list[i].name != NULL; i++)
-		seq_printf(seq, " %lu",
-			   snmp_fold_field((void __percpu **)net->mib.ip_statistics,
-					   snmp4_ipstats_list[i].entry));
+		seq_printf(seq, " %llu",
+			   snmp_fold_field64((void __percpu **)net->mib.ip_statistics,
+					     snmp4_ipstats_list[i].entry,
+					     offsetof(struct ipstats_mib, syncp)));
 
 	icmp_put(seq);	/* RFC 2011 compatibility */
 	icmpmsg_put(seq);
@@ -432,9 +434,10 @@ static int netstat_seq_show(struct seq_file *seq, void *v)
 
 	seq_puts(seq, "\nIpExt:");
 	for (i = 0; snmp4_ipextstats_list[i].name != NULL; i++)
-		seq_printf(seq, " %lu",
-			   snmp_fold_field((void __percpu **)net->mib.ip_statistics,
-					   snmp4_ipextstats_list[i].entry));
+		seq_printf(seq, " %llu",
+			   snmp_fold_field64((void __percpu **)net->mib.ip_statistics,
+					     snmp4_ipextstats_list[i].entry,
+					     offsetof(struct ipstats_mib, syncp)));
 
 	seq_putc(seq, '\n');
 	return 0;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index c20a7c2..56165ae 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3858,12 +3858,28 @@ static inline void __snmp6_fill_stats(u64 *stats, void __percpu **mib,
 	memset(&stats[items], 0, pad);
 }
 
+static inline void __snmp6_fill_stats64(u64 *stats, void __percpu **mib,
+				      int items, int bytes, size_t syncpoff)
+{
+	int i;
+	int pad = bytes - sizeof(u64) * items;
+	BUG_ON(pad < 0);
+
+	/* Use put_unaligned() because stats may not be aligned for u64. */
+	put_unaligned(items, &stats[0]);
+	for (i = 1; i < items; i++)
+		put_unaligned(snmp_fold_field64(mib, i, syncpoff), &stats[i]);
+
+	memset(&stats[items], 0, pad);
+}
+
 static void snmp6_fill_stats(u64 *stats, struct inet6_dev *idev, int attrtype,
 			     int bytes)
 {
 	switch (attrtype) {
 	case IFLA_INET6_STATS:
-		__snmp6_fill_stats(stats, (void __percpu **)idev->stats.ipv6, IPSTATS_MIB_MAX, bytes);
+		__snmp6_fill_stats64(stats, (void __percpu **)idev->stats.ipv6,
+				     IPSTATS_MIB_MAX, bytes, offsetof(struct ipstats_mib, syncp));
 		break;
 	case IFLA_INET6_ICMP6STATS:
 		__snmp6_fill_stats(stats, (void __percpu **)idev->stats.icmpv6, ICMP6_MIB_MAX, bytes);
diff --git a/net/ipv6/proc.c b/net/ipv6/proc.c
index 566798d..4777691 100644
--- a/net/ipv6/proc.c
+++ b/net/ipv6/proc.c
@@ -174,17 +174,28 @@ static void snmp6_seq_show_item(struct seq_file *seq, void __percpu **mib,
 				const struct snmp_mib *itemlist)
 {
 	int i;
-	for (i=0; itemlist[i].name; i++)
+
+	for (i = 0; itemlist[i].name; i++)
 		seq_printf(seq, "%-32s\t%lu\n", itemlist[i].name,
 			   snmp_fold_field(mib, itemlist[i].entry));
 }
 
+static void snmp6_seq_show_item64(struct seq_file *seq, void __percpu **mib,
+				  const struct snmp_mib *itemlist, size_t syncpoff)
+{
+	int i;
+
+	for (i = 0; itemlist[i].name; i++)
+		seq_printf(seq, "%-32s\t%llu\n", itemlist[i].name,
+			   snmp_fold_field64(mib, itemlist[i].entry, syncpoff));
+}
+
 static int snmp6_seq_show(struct seq_file *seq, void *v)
 {
 	struct net *net = (struct net *)seq->private;
 
-	snmp6_seq_show_item(seq, (void __percpu **)net->mib.ipv6_statistics,
-			    snmp6_ipstats_list);
+	snmp6_seq_show_item64(seq, (void __percpu **)net->mib.ipv6_statistics,
+			    snmp6_ipstats_list, offsetof(struct ipstats_mib, syncp));
 	snmp6_seq_show_item(seq, (void __percpu **)net->mib.icmpv6_statistics,
 			    snmp6_icmp6_list);
 	snmp6_seq_show_icmpv6msg(seq,



^ permalink raw reply related

* Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
From: Michael S. Tsirkin @ 2010-06-29 13:04 UTC (permalink / raw)
  To: David Miller
  Cc: arozansk, herbert.xu, quintela, kvm, virtualization, netdev,
	linux-kernel, ykaul, markmc
In-Reply-To: <20100629.003647.214219303.davem@davemloft.net>

On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Mon, 28 Jun 2010 13:08:07 +0300
> 
> > Userspace virtio server has the following hack
> > so guests rely on it, and we have to replicate it, too:
> > 
> > Use port number to detect incoming IPv4 DHCP response packets,
> > and fill in the checksum for these.
> > 
> > The issue we are solving is that on Linux guests, some apps
> > that use recvmsg with AF_PACKET sockets don't know how to
> > handle CHECKSUM_PARTIAL.
> > The interface to return the relevant information was added
> > in 8dc4194474159660d7f37c495e3fc3f10d0db8cc,
> > and older userspace does not use it.
> > One important user of recvmsg with AF_PACKET is dhclient,
> > so we add a work-around just for DHCP.
> > 
> > Don't bother applying the hack to IPv6 as userspace virtio does not
> > have a work-around for that - let's hope guests will do the right
> > thing wrt IPv6.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> Yikes, this is awful too.
> 
> Nothing in the kernel should be mucking around with protocol packets
> like this by default.  In particular, what the heck does port 67 mean?
> Locally I can use it for whatever I want for my own purposes, I don't
> have to follow the conventions for service ports as specified by the
> IETF.
> 
> But I can't have the packet checksum state be left alone for port 67
> traffic on a box using virtio because you have this hack there.
> 
> And yes it's broken on machines using the qemu thing, but at least the
> hack there is restricted to userspace.

Yes, and I think it was a mistake to add the hack there. This is what
prevented applications from using the new interface in the 3 years
since it was first introduced.

> I really don't want anything in the kernel that looks like this.
> 
> These applications are broken, and we've provided a way for them to
> work properly.  What's the point of having fixed applications if
> all of these hacks grow like fungus over every virtualization transport?
> 
> It just means that people won't fix the apps, since they don't have
> to.  There is no incentive, and the mechanism we created to properly
> handle this loses its value.
> 
> At best, you can write a netfilter module that mucks up the packet
> checksum state in these situations.  At least in that case, you can
> make it generic (it mangles iff a packet matches a certain rule,
> so for your virtio guests you'd make it match for DHCP frames) instead
> of being some hard-coded DHCP thing by design.

Nod.
One question on implementation:
why does skb_checksum_help set the checksum state to
CHECKSUM_NONE? Shouldn't it be CHECKSUM_COMPLETE?



> And since this is so cleanly separated and portable you don't even
> need to push it upstream.  It's a temporary workaround for a temporary
> problem.  You can just delete it as soon as the majority of guests
> have the fixed dhcp.  The qemu crap should disappear similarly.

Since using the module involves updating the management tools
as well, if we go down this route it will be much less painful
for everyone to push it upstream.

-- 
MST

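The quoted patch describes detecting incoming IPv4 DHCP responses by UDP port and filling in their checksum. The sketch below shows what filling in the checksum involves for a raw IPv4/UDP packet held in a byte buffer: a sum over the RFC 768 pseudo-header and the UDP segment, followed by the RFC 1071 one's-complement fold. It is a hypothetical user-space illustration, not the vhost-net or qemu code. Matching on UDP source port 67 (the BOOTP/DHCP server port) is an assumption about what "use port number" means. As David points out, nothing guarantees that port 67 traffic is actually DHCP.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPPROTO_UDP 17
#define DEMO_BOOTPS_PORT 67   /* assumed match: UDP source port of a DHCP server */

/* Add big-endian 16-bit words to a running one's-complement sum (RFC 1071). */
static uint32_t demo_csum_add(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += (uint32_t)p[0] << 8 | p[1];
		p += 2;
		len -= 2;
	}
	if (len)                        /* odd trailing byte, zero-padded */
		sum += (uint32_t)p[0] << 8;
	return sum;
}

/* Fold carries back into 16 bits and complement. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* If ip[0..len) is an IPv4/UDP packet from port 67, recompute its UDP
 * checksum in place and return 1; otherwise leave it alone and return 0. */
static int demo_fix_dhcp_csum(uint8_t *ip, size_t len)
{
	size_t ihl, udp_len;
	uint8_t *udp;
	uint32_t sum;
	uint16_t csum;

	if (len < 20 || (ip[0] >> 4) != 4 || ip[9] != DEMO_IPPROTO_UDP)
		return 0;
	ihl = (size_t)(ip[0] & 0x0f) * 4;
	if (ihl < 20 || len < ihl + 8)
		return 0;
	udp = ip + ihl;
	if (((unsigned)udp[0] << 8 | udp[1]) != DEMO_BOOTPS_PORT)
		return 0;
	udp_len = (size_t)udp[4] << 8 | udp[5];
	if (udp_len < 8 || ihl + udp_len > len)
		return 0;

	/* Pseudo-header: source and destination address, protocol, UDP length. */
	sum = demo_csum_add(0, ip + 12, 8);
	sum += DEMO_IPPROTO_UDP + (uint32_t)udp_len;
	udp[6] = udp[7] = 0;            /* checksum field counts as zero */
	sum = demo_csum_add(sum, udp, udp_len);
	csum = demo_csum_fold(sum);
	if (csum == 0)
		csum = 0xffff;          /* RFC 768: 0 would mean "no checksum" */
	udp[6] = csum >> 8;
	udp[7] = csum & 0xff;
	return 1;
}
```

A correctly filled checksum can be checked the same way. Summing the pseudo-header and the whole segment, with the new checksum in place, folds to zero.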

* Re: [0/8] netpoll/bridge fixes
From: Yanko Kaneti @ 2010-06-29 12:53 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Michael S.Tsirkin, Qianfeng Zhang, David S.Miller, netdev,
	WANG Cong, Stephen Hemminger, Matt Mackall
In-Reply-To: <20100610124047.GA16658@gondor.apana.org.au>

On Thu, 2010-06-10 at 22:40 +1000, Herbert Xu wrote:
> Hi:
> 
> Qianfeng Zhang reported that he was seeing crashes with the
> attached backtrace.
> 
> I tracked this down to the recently added netpoll support in
> the bridge device.  It's a classic use-after-free problem.
> 
> Trying to solve it brought out a host of other issues, some of
> which existed prior to the new bridge code.  The following patches
> attempt to address some of these issues.
> 
> Warning, this is completely untested (apart from compiling with
> everything enabled) so please look but don't merge :)

FWIW 2.6.35-0.2.rc3.git0.fc14.x86_64 and later rawhide kernels are
causing quite reproducible __br_deliver crashes on routine f13
netinstalls in a kvm guest here.

To test I cherry picked this series +
netpoll-Use-correct-primitives-for-RCU-dereferencing and
net-fix-netpoll-Allow-netpoll_setup-cleanup-recursion from net-next on
top of 2.6.35-0.15.rc3.git3.fc14.x86_64 (which is todays linus tree) and
it seems to fix the crashes for me.

Perhaps the netpoll fixes should find their way into an -rc before 2.6.35
goes golden.

Regards
Yanko

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Andrei Emeltchenko @ 2010-06-29 12:40 UTC (permalink / raw)
  To: David Miller
  Cc: ospite, alan, marcel, jkosina, mdpoole, hadess, eric.dumazet,
	linux-bluetooth, linux-kernel, netdev
In-Reply-To: <20100629.001216.91341775.davem@davemloft.net>

Hi,

On Tue, Jun 29, 2010 at 10:12 AM, David Miller <davem@davemloft.net> wrote:
> From: Antonio Ospite
> Date: Mon, 28 Jun 2010 13:14:37 +0200
>
>> On Sun, 13 Jun 2010 18:20:01 -0400
>> Alan Ott <alan@signal11.us> wrote:
>>
>>> This patch adds support for getting and setting feature reports for bluetooth
>>> HID devices from HIDRAW.
>>>
>>> Signed-off-by: Alan Ott <alan@signal11.us>
>>> ---
>>
>> Ping.
>
> We effectively don't have a bluetooth maintainer at the current point
> in time.  I've tried to let patches sit for a while hoping the listed
> maintainer would do something, at least occasionally, but that simply
> isn't happening.
> So I'll just pick patches up directly as I find time to review them,
> but I have to warn that for me it's going to be done in a very low
> priority way because I really don't find bluetooth all that exciting. :-)

This would be good. We have a backlog of bluetooth kernel patches
that have been waiting for several months.
It takes too much time to keep coming back to them again and again...

Please keep this ML informed.

Regards,
Andrei Emeltchenko

* RE: [REGRESSION] e1000e stopped working
From: Maxim Levitsky @ 2010-06-29 10:32 UTC (permalink / raw)
  To: Allan, Bruce W; +Cc: netdev@vger.kernel.org
In-Reply-To: <8DD2590731AB5D4C9DBF71A877482A90015918FAB6@orsmsx509.amr.corp.intel.com>

On Mon, 2010-06-28 at 18:09 -0700, Allan, Bruce W wrote:
> On Monday, June 28, 2010 10:14 AM, Maxim Levitsky wrote:
> > On Mon, 2010-06-28 at 10:04 -0700, Allan, Bruce W wrote:
> >> On Sunday, June 27, 2010 10:47 AM, Maxim Levitsky wrote:
> >>> On Sun, 2010-06-27 at 20:43 +0300, Maxim Levitsky wrote:
> >>>> On Sun, 2010-06-27 at 20:29 +0300, Maxim Levitsky wrote:
> >>>>> On Sun, 2010-06-27 at 20:27 +0300, Maxim Levitsky wrote:
> >>>>>> Just that,
> >>>>>> 
> >>>>>> It doesn't receive anything from my internet router during DHCP.
> >>>>>> 
> >>>>>> 
> >>>>>> 00:19.0 Ethernet controller [0200]: Intel Corporation 82566DC Gigabit Network Connection [8086:104b] (rev 02)
> >>>>>> 	Subsystem: Intel Corporation Device [8086:0001]
> >>>>>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
> >>>>>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>> 	Latency: 0
> >>>>>> 	Interrupt: pin A routed to IRQ 47
> >>>>>> 	Region 0: Memory at 50300000 (32-bit, non-prefetchable) [size=128K]
> >>>>>> 	Region 1: Memory at 50324000 (32-bit, non-prefetchable) [size=4K]
> >>>>>> 	Region 2: I/O ports at 30e0 [size=32]
> >>>>>> 	Capabilities: [c8] Power Management version 2
> >>>>>> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> >>>>>> 		Status: D0 PME-Enable- DSel=0 DScale=1 PME-
> >>>>>> 	Capabilities: [d0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/0 Enable+
> >>>>>> 		Address: 00000000fee0100c  Data: 41c9
> >>>>>> 	Kernel driver in use: e1000e
> >>>>>> 	Kernel modules: e1000e
> >>>>>> 
> >>>>>> I use vanilla tree, commit
> >>>>>> bf2937695fe2330bfd8933a2310e7bdd2581dc2e 
> >>>>>> 
> >>>>>> 
> >>>>>> Best regards,
> >>>>>> 	Maxim Levitsky
> >>>>>> 
> >>>>> 
> >>>>> It appears to work now after reboot.
> >>>>> Will keep a look for this.
> >>>>> 
> >>>>> Disregard for now.
> >>>> 
> >>>> 
> >>>> I just did an s2ram cycle and the problem is back.
> >>>> I did a full reboot (power off, then on): same thing, the card doesn't
> >>>> work...
> >>>> 
> >>> 
> >>> Yep, s2ram sometimes 'fixes', sometimes breaks the card.
> >>> Something got broken in device initialization path.
> >>> 
> >>> Best regards,
> >>>  	Maxim Levitsky
> >> 
> >> What distro are you using?  If RedHat, since you are using DHCP will
> >> you please try putting a "LINKDELAY=10" in the
> >> /etc/sysconfig/network-scripts/ifcfg-ethX config file.  
> >> 
> > I use ubuntu 9.10
> > 
> >> Is there anything in the system log that might help narrow down the
> >> issue? 
> > 
> > Nothing, really nothing.
> > It seems to detect link, and the DHCP client sends requests, but it doesn't
> > receive a thing (I even tried promisc mode; it doesn't help)
> > 
> > 
> > 
> > Best regards,
> > 	Maxim Levitsky
> 
> Since you say this is a regression, when did this last work for you without this problem, i.e. which distro, which kernel?

I always compile my own kernels, and the last kernel I compiled here was vanilla
2.6.33-rc4.
It works just fine.

I mostly use my laptop, and therefore haven't updated the kernel on my desktop
for a long time.

If I find some free time I'll try to bisect the problem.



Best regards,
	Maxim Levitsky

* Re: [iproute2] iproute2: Allow 'ip addr flush' to loop more than 10 times.
From: Andreas Henriksson @ 2010-06-29  9:58 UTC (permalink / raw)
  To: David Miller; +Cc: Ben Greear, netdev, shemminger
In-Reply-To: <20100628.233600.242129599.davem@davemloft.net>

Hello all!

I'm sorry if I forgot to CC someone in this reply. I'm not subscribed,
and all the list archives seem to be very reluctant to show recipients
these days.

[...]
>
> I can understand the reasoning behind the limit, because if this is
> run by something automated it's not like someone is at the command
> line who can hit Ctrl-C to break out of a looping instance.
>
> But practically speaking I bet this never happens.

I'm sorry to bring bad news, but your bet is wrong!

There are at least two different places (IIRC route flush, addr flush)
in iproute2 which have these limits because they've been preventing
people from booting their systems in the past! I know at least
Ubuntu users have been having problems booting their computers because
firewalling scripts executed by init use iproute2 commands and
expect them to finish.

>
> So what makes sense to me is:
>
> 1) Loop forever by default.

I think this is a completely insane default. The iproute2 tools
are low-level and will be executed by higher level tools. Authors
of these higher level tools expect iproute2 commands to always finish.
Since iproute2 will *usually* finish, it's hard for these authors to
realize that they need to add some switch to always get this behaviour.

>
> 2) When the number of loops exceeds a threshold (calculated by the
>    number of addresses we see the first dump, divided by the number
>    of deletes we can squeeze into the 4096 byte message), we emit
>    a warning.
>
> 3) A hard limit, off by default, is available via your new "-l" option.
>
> But seriously we can determine forward progress quite easily I think.
>
> Each loop, we see if the dump returns a smaller number of addresses
> than the last iteration.  If so, we just keep going.

This would be a much better solution!

>
> If the number of addresses increases, I think we can bail in this
> case.
>
> This logic would only ever trigger iff another entity is adding a
> large number of addresses simultaneously with our flush.  And frankly
> speaking the person doing the flush probably doesn't expect that to be
> happening.  You're flushing all of the addresses so you can start with
> a clean slate and then add specific addresses back, or whatever.
>

How about implementing it in the kernel so iproute2 can tell the kernel
via netlink "flush <addresses|routes> on interface X" with a single
netlink message?
I guess the kernel side has some kind of lock here that will prevent
addresses being added and removed at the same time?

Please think about this more before re-introducing the same problems again!

I guess with modern distributions now using parallel boot the issue
will not block the entire bootup anymore, but firewalls being blocked
forever by iproute2 and not coming up might be very bad in some circumstances.

-- 
Andreas Henriksson

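David's proposal works as follows. Keep looping while each dump shows fewer addresses than the last, and bail out when the count grows. Warn once the number of rounds exceeds what the first dump predicts, and make a hard limit optional. It can be sketched as shown below, under stated assumptions. The dump and delete-round callbacks are hypothetical stand-ins for iproute2's netlink dump and batched delete, not its real functions. per_msg, the number of deletes that fit in one 4096-byte request, is passed in rather than computed. This sketch also treats an unchanged count as "no progress", which goes slightly beyond what was proposed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callbacks: dump() returns how many addresses remain,
 * del_round() sends one batch of deletes.  Neither is an iproute2 API. */
typedef size_t (*demo_dump_fn)(void *ctx);
typedef void (*demo_round_fn)(void *ctx);

enum demo_flush_result {
	DEMO_FLUSH_DONE,        /* nothing left */
	DEMO_FLUSH_NO_PROGRESS, /* count did not shrink: someone is adding */
	DEMO_FLUSH_LIMIT,       /* optional hard limit reached */
};

struct demo_flush_stats {
	size_t rounds;
	int warned;
};

/* Rounds the first dump predicts: ceil(naddrs / per_msg), per_msg > 0. */
static size_t demo_expected_rounds(size_t naddrs, size_t per_msg)
{
	return (naddrs + per_msg - 1) / per_msg;
}

/* Loop while each dump shows fewer addresses than the previous one.
 * max_rounds == 0 means no hard limit (the proposed default). */
static enum demo_flush_result
demo_flush(void *ctx, demo_dump_fn dump, demo_round_fn del_round,
	   size_t per_msg, size_t max_rounds, struct demo_flush_stats *st)
{
	size_t prev = dump(ctx);
	size_t expected = demo_expected_rounds(prev, per_msg);

	st->rounds = 0;
	st->warned = 0;
	while (prev > 0) {
		size_t now;

		if (max_rounds && st->rounds == max_rounds)
			return DEMO_FLUSH_LIMIT;
		del_round(ctx);
		st->rounds++;
		if (!st->warned && st->rounds > expected) {
			st->warned = 1;
			fprintf(stderr, "warning: flush needed more than %zu rounds\n",
				expected);
		}
		now = dump(ctx);
		if (now >= prev)
			return DEMO_FLUSH_NO_PROGRESS;
		prev = now;
	}
	return DEMO_FLUSH_DONE;
}
```

Because progress is checked on every round, a script running the flush cannot loop forever. Yet a large but shrinking address list is no longer cut off after an arbitrary 10 rounds.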

* [net-next-2.6 PATCH] be2net: memory barrier fixes on IBM p7 platform
From: Sathya Perla @ 2010-06-29 10:11 UTC (permalink / raw)
  To: netdev

The IBM p7 architecture seems to reorder memory accesses more
aggressively than previous ppc64 architectures. This requires memory
barriers to ensure that rx/tx doorbells are pressed only after
memory to be DMAed is written.

Signed-off-by: Sathya Perla <sathyap@serverengines.com>
---
 drivers/net/benet/be_cmds.c |    2 ++
 drivers/net/benet/be_main.c |    9 ++++++++-
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index ee1ad96..65e3260 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -25,6 +25,8 @@ static void be_mcc_notify(struct be_adapter *adapter)
 
 	val |= mccq->id & DB_MCCQ_RING_ID_MASK;
 	val |= 1 << DB_MCCQ_NUM_POSTED_SHIFT;
+
+	wmb();
 	iowrite32(val, adapter->db + DB_MCCQ_OFFSET);
 }
 
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index 01eb447..62484b8 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -89,6 +89,8 @@ static void be_rxq_notify(struct be_adapter *adapter, u16 qid, u16 posted)
 	u32 val = 0;
 	val |= qid & DB_RQ_RING_ID_MASK;
 	val |= posted << DB_RQ_NUM_POSTED_SHIFT;
+
+	wmb();
 	iowrite32(val, adapter->db + DB_RQ_OFFSET);
 }
 
@@ -97,6 +99,8 @@ static void be_txq_notify(struct be_adapter *adapter, u16 qid, u16 posted)
 	u32 val = 0;
 	val |= qid & DB_TXULP_RING_ID_MASK;
 	val |= (posted & DB_TXULP_NUM_POSTED_MASK) << DB_TXULP_NUM_POSTED_SHIFT;
+
+	wmb();
 	iowrite32(val, adapter->db + DB_TXULP1_OFFSET);
 }
 
@@ -972,7 +976,8 @@ static struct be_eth_rx_compl *be_rx_compl_get(struct be_adapter *adapter)
 
 	if (rxcp->dw[offsetof(struct amap_eth_rx_compl, valid) / 32] == 0)
 		return NULL;
-
+
+	rmb();
 	be_dws_le_to_cpu(rxcp, sizeof(*rxcp));
 
 	queue_tail_inc(&adapter->rx_obj.cq);
@@ -1066,6 +1071,7 @@ static struct be_eth_tx_compl *be_tx_compl_get(struct be_queue_info *tx_cq)
 	if (txcp->dw[offsetof(struct amap_eth_tx_compl, valid) / 32] == 0)
 		return NULL;
 
+	rmb();
 	be_dws_le_to_cpu(txcp, sizeof(*txcp));
 
 	txcp->dw[offsetof(struct amap_eth_tx_compl, valid) / 32] = 0;
@@ -1113,6 +1119,7 @@ static inline struct be_eq_entry *event_get(struct be_eq_obj *eq_obj)
 	if (!eqe->evt)
 		return NULL;
 
+	rmb();
 	eqe->evt = le32_to_cpu(eqe->evt);
 	queue_tail_inc(&eq_obj->q);
 	return eqe;
-- 
1.6.5.2

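The patch enforces a fixed order on both sides. On the posting side, it writes the memory the device will read, issues a write barrier, and only then rings the doorbell. On the completion side, it checks the valid word, issues a read barrier, and only then reads the rest of the entry. The sketch below expresses that ordering with C11 fences in user space. It is an analogy under stated assumptions: the ring, the completion layout and all demo_* names are hypothetical, and an atomic_uint stands in for the MMIO doorbell. C11 fences only order accesses between CPUs. Driver code still needs wmb()/rmb() to order accesses against a DMA-capable device, as the patch does.

```c
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8

/* Hypothetical TX descriptor and ring; doorbell stands in for the
 * MMIO doorbell register the driver writes with iowrite32(). */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
};

struct demo_ring {
	struct demo_desc desc[DEMO_RING_SIZE];
	unsigned int head;
	atomic_uint doorbell;
};

/* Hypothetical completion entry with a separate "valid" word. */
struct demo_compl {
	uint32_t bytes;
	atomic_uint valid;
};

/* Producer: fill the descriptor, then ring the doorbell.  The release
 * fence plays the part of wmb(): descriptor stores are ordered before
 * the doorbell store. */
static void demo_post(struct demo_ring *r, uint64_t addr, uint32_t len)
{
	struct demo_desc *d = &r->desc[r->head % DEMO_RING_SIZE];

	d->addr = addr;
	d->len = len;
	r->head++;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->doorbell, r->head, memory_order_relaxed);
}

/* Simulated device side: write the completion body, then mark it valid. */
static void demo_device_complete(struct demo_compl *c, uint32_t bytes)
{
	c->bytes = bytes;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->valid, 1, memory_order_relaxed);
}

/* Consumer: check valid first; the acquire fence plays the part of rmb()
 * so the body is not read before the valid word.  Returns 1 and the
 * byte count if an entry was consumed, 0 if none was ready. */
static int demo_compl_get(struct demo_compl *c, uint32_t *bytes)
{
	if (atomic_load_explicit(&c->valid, memory_order_relaxed) == 0)
		return 0;
	atomic_thread_fence(memory_order_acquire);
	*bytes = c->bytes;
	atomic_store_explicit(&c->valid, 0, memory_order_relaxed);
	return 1;
}
```

Consuming an entry clears its valid word. This mirrors how be_tx_compl_get() zeroes the valid dword after reading the completion.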

* Re: IGB driver upgrade
From: sbs @ 2010-06-29  9:50 UTC (permalink / raw)
  To: Alexander Duyck; +Cc: netdev@vger.kernel.org, e1000-devel@lists.sourceforge.net
In-Reply-To: <4C293792.6070508@intel.com>

On Tue, Jun 29, 2010 at 4:00 AM, Alexander Duyck
<alexander.h.duyck@intel.com> wrote:
> sbs wrote:
>>
>> Hello guys.
>>
>> Is it possible to upgrade intel gigabit adapter's e1000 driver to
>> 2.2.9? This is the latest version according to Intel website.
>>
> I assume you are referring to the igb driver since that is what is in the
> subject and not the e1000 driver correct?


Yes, sorry, I mean igb of course.



>
>> I've got a problem with 2.1.0-k2 drivers statically compiled into kernel.
>>
>> Surely I can download the drivers from Intel and compile them as a module, but
>> I need to compile them statically.
>> Intel drivers do not provide sources for static compilation :(
>>
>> Is it possible to upgrade drivers to igb-2.2.9 in the source tree and
>> allow the static compilation of them?
>>
> Normally the upstream driver should be up to date with any fixes that are in
> the standalone module.  What version of the kernel are you currently
> running?  Also have you tried testing the older standalone igb modules to
> see if the same issues exist there in either igb 2.1.9 or  igb 2.0.6?  This
> would help us to determine what changes in the standalone module might be
> missing from the in-kernel version of the driver.



We're running kernel version 2.6.33.2.
I have compiled and installed the 2.0.6 and 2.1.9 modules on two of our servers.

The strange freezes usually appear within a day or two,
so I'll let you know when it happens and which module version
is responsible for it.


>
>> Because having 2.1.0-k2 we experience some strange random freezes with
>> network interface which can be fixed only by restarting network.
>>
>> 2.2.9 module has no such problems but we need to use static kernels
>> according to our policy.
>
> I have CCed our e1000-devel list.  In the future you may want to CC this
> list as it will provide better visibility to the Intel wired networking
> maintainers.
>


Thank you for that.


> Thanks,
>
> Alex
>

* Re: [linux-pm] [PATCH 3/3] pm_qos: get rid of the allocation in pm_qos_add_request()
From: Rafael J. Wysocki @ 2010-06-29  9:20 UTC (permalink / raw)
  To: James Bottomley; +Cc: Takashi Iwai, netdev, linux-pm, markgross
In-Reply-To: <1277763049.10879.204.camel@mulgrave.site>

On Tuesday, June 29, 2010, James Bottomley wrote:
> On Mon, 2010-06-28 at 23:59 +0200, Rafael J. Wysocki wrote:
> > On Monday, June 28, 2010, James Bottomley wrote:
> > > Since every caller has to squirrel away the returned pointer anyway,
> > > they might as well supply the memory area.  This fixes a bug in a few of
> > > the call sites where the returned pointer was dereferenced without
> > > checking it for NULL (which gets returned if the kzalloc failed).
> > > 
> > > I'd like to hear how sound and netdev feel about this: it will add
> > > about two more pointers worth of data to struct netdev and struct
> > > snd_pcm_substream .. but I think it's worth it.  If you're OK, I'll add
> > > your acks and send through the pm tree.
> > > 
> > > This also looks to me like an android independent clean up (even though
> > > it renders the request_add atomically callable).  I also added include
> > > guards to include/linux/pm_qos_params.h
> > > 
> > > cc: netdev@vger.kernel.org
> > > cc: Takashi Iwai <tiwai@suse.de>
> > > Signed-off-by: James Bottomley <James.Bottomley@suse.de>
> > 
> > I like all of the patches in this series, thanks a lot for doing this!
> > 
> > I guess it might be worth sending a CC to the LKML next round so that people
> > can see [1/3] (I don't expect any objections, but anyway it would be nice).
> 
> I cc'd the latest owners of plist.h ... although Daniel Walker has
> apparently since left MontaVista, Thomas Gleixner is still current ...
> and he can speak for the RT people, who are the primary plist users.
> 
> I can do another round and cc lkml, I was just hoping this would be the
> last revision.

OK, let's see if there's any feedback on [3/3] from netdev and Takashi.
If there's none, I'll just put the series into my linux-next branch.

Rafael

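James's patch changes pm_qos_add_request() so that the caller supplies the request memory instead of receiving a pointer allocated with kzalloc(). This removes the unchecked-NULL failure mode and lets the request live inside the network device or PCM substream structure. The sketch below shows that calling convention in isolation. Every demo_* name is hypothetical, a singly linked list replaces the plist, and locking is omitted. The aggregation rule (the smallest requested value wins) is an assumption made for the example.

```c
#include <stddef.h>

/* Hypothetical request object; every demo_* name is invented. */
struct demo_qos_request {
	int value;
	int active;
	struct demo_qos_request *next;  /* a plain list stands in for plist */
};

struct demo_qos_class {
	struct demo_qos_request *head;
};

/* The caller owns *req (e.g. embedded in its device structure), so there
 * is no allocation here: nothing can fail and no NULL check is needed. */
static void demo_qos_add_request(struct demo_qos_class *c,
				 struct demo_qos_request *req, int value)
{
	req->value = value;
	req->active = 1;
	req->next = c->head;
	c->head = req;
}

static void demo_qos_remove_request(struct demo_qos_class *c,
				    struct demo_qos_request *req)
{
	struct demo_qos_request **pp;

	for (pp = &c->head; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			break;
		}
	}
	req->active = 0;
}

/* Assumed aggregation for the example: the smallest requested value wins. */
static int demo_qos_target(const struct demo_qos_class *c, int dflt)
{
	int v = dflt;

	for (const struct demo_qos_request *r = c->head; r; r = r->next)
		if (r->value < v)
			v = r->value;
	return v;
}

/* The request is embedded in the device structure, as the patch proposes. */
struct demo_net_device {
	const char *name;
	struct demo_qos_request qos_req;
};
```

Since adding a request no longer allocates, the add path has no sleeping allocation. That is the property James refers to when he says it becomes atomically callable. The cost is the extra per-device storage he mentions.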

* Re: [PATCH -next] vmxnet3: fail when try to setup unsupported features
From: Stanislaw Gruszka @ 2010-06-29  9:15 UTC (permalink / raw)
  To: Shreyas Bhatewara; +Cc: netdev@vger.kernel.org, Amerigo Wang
In-Reply-To: <89E2752CFA8EC044846EB8499819134102BCC3C10A@EXCH-MBX-4.vmware.com>

On Mon, Jun 28, 2010 at 10:45:57AM -0700, Shreyas Bhatewara wrote:
> > +vmxnet3_set_flags(struct net_device *netdev, u32 data)
> > +{
> >  	struct vmxnet3_adapter *adapter = netdev_priv(netdev);
> >  	u8 lro_requested = (data & ETH_FLAG_LRO) == 0 ? 0 : 1;
> >  	u8 lro_present = (netdev->features & NETIF_F_LRO) == 0 ? 0 : 1;
> > 
> > +	if (data & ~ETH_FLAG_LRO)
> > +		return -EOPNOTSUPP;
> > +
> >  	if (lro_requested ^ lro_present) {
> >  		/* toggle the LRO feature*/
> >  		netdev->features ^= NETIF_F_LRO;
> > --
> > 1.5.5.6
> 
> 
> Does not make sense to me. Switching LRO on/off is supported from the driver, why should the function return -EOPNOTSUPP ?

We return -EOPNOTSUPP only if someone tries to set up features other
than LRO. If data == ETH_FLAG_LRO we turn LRO on, and we turn it off
when data == 0.

Stanislaw

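Stanislaw's explanation amounts to a generic rule. First, reject any requested flag outside a supported mask. Then toggle only the features whose requested state differs from the current one. A user-space sketch under stated assumptions: the DEMO_* bit values and struct demo_netdev are hypothetical, since the real ETH_FLAG_* and NETIF_F_* constants come from the kernel headers. Passing the supported mask as a parameter is a design choice of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Values for this sketch only; use the kernel's ETH_FLAG_* and
 * NETIF_F_* definitions in real code. */
#define DEMO_ETH_FLAG_LRO    (1u << 15)
#define DEMO_ETH_FLAG_NTUPLE (1u << 27)
#define DEMO_NETIF_F_LRO     (1u << 15)

struct demo_netdev {
	uint32_t features;
};

/* Reject any requested flag outside 'supported', then toggle LRO only
 * if its requested state differs from the current one. */
static int demo_set_flags(struct demo_netdev *dev, uint32_t data,
			  uint32_t supported)
{
	int lro_requested = (data & DEMO_ETH_FLAG_LRO) != 0;
	int lro_present = (dev->features & DEMO_NETIF_F_LRO) != 0;

	if (data & ~supported)
		return -EOPNOTSUPP;
	if (lro_requested != lro_present)
		dev->features ^= DEMO_NETIF_F_LRO;
	return 0;
}
```

With supported set to DEMO_ETH_FLAG_LRO, data == DEMO_ETH_FLAG_LRO turns LRO on and data == 0 turns it off. Any other bit returns -EOPNOTSUPP without touching the features.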

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Johan Hedberg @ 2010-06-29  9:07 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: David Miller, ospite-aNJ+ML1ZbiP93QAQaVx+gl6hYfS7NtTn,
	alan-yzvJWuRpmD1zbRFIqnYvSA, marcel-kz+m5ild9QBg9hUCZPvPmw,
	mdpoole-IZmAEv5cUt1AfugRpC6u6w, hadess-0MeiytkfxGOsTnJN9+BGXg,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-bluetooth-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <alpine.LNX.2.00.1006291047560.13809-ztGlSCb7Y1iN3ZZ/Hiejyg@public.gmane.org>

Hi,

On Tue, Jun 29, 2010, Jiri Kosina wrote:
> Frankly, I don't understand what exactly the current situation with 
> in-kernel bluetooth stack is anyway. 
> 
> What is the relation between what we have in net/bluetooth and the tree at 
> [1], which seems to be quite actively developed?

The difference is that the userspace part has essentially two
maintainers (me and Marcel) whereas the kernel side only has one
(Marcel). So even when Marcel is inactive I can at least take care of
the userspace side. I'd volunteer for the kernel also but I don't really
have any experience there (only 3-4 patches so far).

Johan

* Re: [PATCH 1/1] Bluetooth: hidp: Add support for hidraw HIDIOCGFEATURE and HIDIOCSFEATURE
From: Jiri Kosina @ 2010-06-29  8:50 UTC (permalink / raw)
  To: David Miller
  Cc: ospite, alan, marcel, mdpoole, hadess, eric.dumazet,
	linux-bluetooth, linux-kernel, netdev
In-Reply-To: <20100629.001216.91341775.davem@davemloft.net>

On Tue, 29 Jun 2010, David Miller wrote:

> >> This patch adds support for getting and setting feature reports for bluetooth
> >> HID devices from HIDRAW.
> >> 
> >> Signed-off-by: Alan Ott <alan@signal11.us>
> >> ---
> > 
> > Ping.
> 
> We effectively don't have a bluetooth maintainer at the current point in 
> time.  I've tried to let patches sit for a while hoping the listed 
> maintainer would do something, at least occasionally, but that simply 
> isn't happening.

Frankly, I don't understand what exactly the current situation with 
in-kernel bluetooth stack is anyway. 

What is the relation between what we have in net/bluetooth and the tree at 
[1], which seems to be quite actively developed?

> So I'll just pick patches up directly as I find time to review them, but 
> I have to warn that for me it's going to be done in a very low priority 
> way because I really don't find bluetooth all that exciting. :-)

If needed, I can at least take over the net/bluetooth/hidp part, as I 
maintain the rest of the HID code anyway.

[1] http://git.kernel.org/?p=bluetooth/bluez.git;a=summary

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

* Re: b44: Reset due to FIFO overflow.
From: James Courtier-Dutton @ 2010-06-29  8:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <AANLkTikVEwIj2Jjk8YMan4sHDNxpzhK3tlvpuAxYybSI@mail.gmail.com>

On 28 June 2010 22:37, James Courtier-Dutton <james.dutton@gmail.com> wrote:
>
> I tried the patch.
> I also tried without the patch, but bypassed the hw reset in the RFO case.
>
> In both cases, the hardware did not recover from the overflow.
> An "ifconfig eth0 down" then "ifconfig eth0 up" was required to bring
> it back to life, I.e. A manual hw reset.
>
> What I did find is that once the RFO state is reached, it is not cleared.
> I think we need to find a way to clear the RFO state.
> The RFO state is cleared after a HW reset.
>
> Kind Regards
>
> James
>

On further analysis, I have found that RFO is not cleared by a
write via bw32(bp, B44_ISTAT, istat);
whereas most other conditions should be cleared by this.

So, I went searching in the hardware reset functions for when the RFO
was cleared.

I found it:
A call to this:
ssb_device_enable(bp->sdev, 0)
in the b44_chip_reset function is what actually clears the RFO.
So, does anyone have any data sheets on the ssb ?
The ssb looks to me like the DMA engine.

On a more positive note, if we can get the ssb to reset without the
phy resetting, we could have our smooth recovery achieved.

Kind Regards

James

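James's findings suggest a particular interrupt path. Acknowledge ISTAT as usual, re-read it, and fall back to a reset only if RFO is still set. The sketch below models that path with a simulated register, under stated assumptions. The bit values, struct demo_regs and the demo_* helpers are hypothetical. The model simply encodes the reported behaviour: writing a 1 clears every bit except RFO, which only a core reset clears. It does not attempt the PHY-preserving reset James hopes for.

```c
#include <stdint.h>

/* Hypothetical bit values for this sketch only. */
#define DEMO_ISTAT_RX  (1u << 16)
#define DEMO_ISTAT_RFO (1u << 13)

/* Simulated interrupt-status register plus a reset counter. */
struct demo_regs {
	uint32_t istat;
	int resets;
};

/* Write-1-to-clear, except RFO, which stays set (as reported). */
static void demo_write_istat(struct demo_regs *r, uint32_t val)
{
	r->istat &= ~(val & ~DEMO_ISTAT_RFO);
}

/* Stands in for the core reset, which per the report is where
 * ssb_device_enable() in b44_chip_reset() clears RFO. */
static void demo_core_reset(struct demo_regs *r)
{
	r->istat = 0;
	r->resets++;
}

/* Ack the pending bits, then reset only if RFO survived the ack. */
static void demo_handle_irq(struct demo_regs *r)
{
	uint32_t istat = r->istat;

	demo_write_istat(r, istat);
	if (r->istat & DEMO_ISTAT_RFO)
		demo_core_reset(r);
}
```

With this structure, ordinary interrupts never trigger a reset. The reset path is taken only when RFO survives the acknowledgement, which matches what the report observed.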

* Re: [iproute2] iproute2:  Allow 'ip addr flush' to loop more than 10 times.
From: Alexander Clouter @ 2010-06-29  8:03 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1277790959-28075-1-git-send-email-greearb@candelatech.com>

greearb@gmail.com wrote:
> 
> The default remains at 10 for backwards compatibility.
> 
> For instance:
> # ip addr flush dev eth2
> *** Flush remains incomplete after 10 rounds. ***
> # ip -l 20 addr flush dev eth2
> *** Flush remains incomplete after 20 rounds. ***
> # ip -loops 0 addr flush dev eth2
> #
> 
> This is useful for getting rid of large numbers of IP
> addresses in scripts.
> 
Maybe I am missing a trick, but what is wrong with putting this trivial 
logic into the script:

ip addr show ${DEV} | awk '/inet6? / { print $2 }' | xargs -I{} ip addr del '{}' dev ${DEV}

You can probably speed things up with '-P' too; '-P 2' gives me a huge
speed-up for the work I do with 'ip route'.

If you still have addresses on your interface after the above command, 
your looping approach probably would have failed also.

Why the need to cram more functionality and options into iproute when 
it is something that can be pushed into the wrapper script? 

Cheers

-- 
Alexander Clouter
.sigmonster says: Lend money to a bad debtor and he will hate you.

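The awk filter in Alexander's one-liner does a simple job. It keeps lines containing "inet " or "inet6 " and prints their second whitespace-separated field, the address with its prefix length, which xargs then hands to ip addr del. A C sketch of the same per-line extraction follows. demo_addr_from_line is a hypothetical helper, and it assumes the usual ip addr show line layout.

```c
#include <stddef.h>
#include <string.h>

/* Mimics awk '/inet6? / { print $2 }' for one line of "ip addr show"
 * output: if the line contains "inet " or "inet6 ", copy its second
 * whitespace-separated field into out.  Returns 1 on success. */
static int demo_addr_from_line(const char *line, char *out, size_t outlen)
{
	const char *p = line;
	size_t n;

	if (!strstr(line, "inet ") && !strstr(line, "inet6 "))
		return 0;
	p += strspn(p, " \t");          /* leading blanks */
	p += strcspn(p, " \t\n");       /* field 1, e.g. "inet" */
	p += strspn(p, " \t");          /* blanks before field 2 */
	n = strcspn(p, " \t\n");        /* field 2, e.g. "10.0.0.1/24" */
	if (n == 0 || n >= outlen)
		return 0;
	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}
```

Each extracted string is exactly the argument the one-liner passes to "ip addr del ... dev ${DEV}". Other lines, such as link/ether lines, are skipped.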