Netdev List

* Re: [RFC] possible bug in inet->opt handling
From: Herbert Xu @ 2011-04-15 17:17 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1302881994.3613.34.camel@edumazet-laptop>

On Fri, Apr 15, 2011 at 05:39:54PM +0200, Eric Dumazet wrote:
>
> My plan is to add RCU protection on inet->opt, unless someone has better
> idea ?

inet->opt is rarely non-NULL.  So perhaps just throw some locks
around the memcpy.
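A userspace analogue of the "locks around the memcpy" idea, as a sketch only: a pthread mutex stands in for whatever socket lock the kernel would actually use, and struct opts, struct sock_like, set_opts() and copy_opts() are hypothetical names, not kernel APIs. The reader copies the options while holding the same lock the writer holds when it replaces them, so the copy can never run against a block that is being freed.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a socket's IP options (at most 40 bytes). */
struct opts {
	unsigned char optlen;
	unsigned char data[40];
};

/* Hypothetical socket-like object; opt is usually NULL. */
struct sock_like {
	pthread_mutex_t opt_lock;	/* protects opt */
	struct opts *opt;
};

/* Writer: publish new options (or NULL) under the lock, then free the
 * old block; after the unlock no reader can still reach it. */
static void set_opts(struct sock_like *sk, struct opts *newopt)
{
	struct opts *old;

	pthread_mutex_lock(&sk->opt_lock);
	old = sk->opt;
	sk->opt = newopt;
	pthread_mutex_unlock(&sk->opt_lock);
	free(old);
}

/* Reader: copy the options while holding the lock, so a concurrent
 * set_opts() cannot free them mid-memcpy. Returns 1 if copied, 0 if none. */
static int copy_opts(struct sock_like *sk, struct opts *dst)
{
	int copied = 0;

	pthread_mutex_lock(&sk->opt_lock);
	if (sk->opt) {
		memcpy(dst, sk->opt, sizeof(*dst));
		copied = 1;
	}
	pthread_mutex_unlock(&sk->opt_lock);
	return copied;
}
```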

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC] possible bug in inet->opt handling
From: Herbert Xu @ 2011-04-15 17:24 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <20110415171715.GA5648@gondor.apana.org.au>

On Sat, Apr 16, 2011 at 01:17:15AM +0800, Herbert Xu wrote:
> On Fri, Apr 15, 2011 at 05:39:54PM +0200, Eric Dumazet wrote:
> >
> > My plan is to add RCU protection on inet->opt, unless someone has better
> > idea ?
> 
> inet->opt is rarely non-NULL.  So perhaps just throw some locks
> around the memcpy.

Ah, I missed your other point about inet->opt going away.  The
other option would be to always kmalloc/memcpy in udp_sendmsg
and have ip_setup_cork simply steal the reference from ipc.
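The copy-then-steal alternative can also be sketched in userspace. Assumptions: struct opts, struct ipc_like and struct cork_like are hypothetical stand-ins for the options, the per-call control data and the cork, and the function names only loosely mirror udp_sendmsg and ip_setup_cork. "Stealing the reference" here means moving the pointer and clearing the source, so exactly one owner ever frees the buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not kernel structures. */
struct opts {
	unsigned char optlen;
	unsigned char data[40];
};

struct ipc_like {
	struct opts *opt;	/* private copy owned by this call, or NULL */
};

struct cork_like {
	struct opts *opt;	/* owned by the cork once stolen */
};

/* sendmsg-like step: always take a private heap copy, so later changes
 * to (or freeing of) the socket's options cannot affect this send. */
static int ipc_copy_opts(struct ipc_like *ipc, const struct opts *src)
{
	ipc->opt = NULL;
	if (!src)
		return 0;
	ipc->opt = malloc(sizeof(*ipc->opt));
	if (!ipc->opt)
		return -1;
	memcpy(ipc->opt, src, sizeof(*src));
	return 0;
}

/* setup_cork-like step: take over ownership instead of copying again;
 * clearing ipc->opt ensures only the cork frees the buffer. */
static void cork_steal_opts(struct cork_like *cork, struct ipc_like *ipc)
{
	cork->opt = ipc->opt;
	ipc->opt = NULL;
}

/* Each side releases whatever it still owns; free(NULL) is a no-op. */
static void ipc_release(struct ipc_like *ipc)
{
	free(ipc->opt);
	ipc->opt = NULL;
}

static void cork_release(struct cork_like *cork)
{
	free(cork->opt);
	cork->opt = NULL;
}
```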

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [PATCH] net: forcedeth: convert to hw_features
From: Michał Mirosław @ 2011-04-15 17:44 UTC
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1302885588.2845.4.camel@bwh-desktop>

On Fri, Apr 15, 2011 at 05:39:48PM +0100, Ben Hutchings wrote:
> On Fri, 2011-04-15 at 16:50 +0200, Michał Mirosław wrote:
> > This also fixes a race around np->txrxctl_bits while changing RXCSUM offload.
> > 
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  drivers/net/forcedeth.c |   78 +++++++++++++++-------------------------------
> >  1 files changed, 26 insertions(+), 52 deletions(-)
> > 
> > diff --git a/drivers/net/forcedeth.c b/drivers/net/forcedeth.c
> > index d5ab4da..ec9a32d 100644
> > --- a/drivers/net/forcedeth.c
> > +++ b/drivers/net/forcedeth.c
> > @@ -774,7 +774,6 @@ struct fe_priv {
> >  	u32 driver_data;
> >  	u32 device_id;
> >  	u32 register_size;
> > -	int rx_csum;
> >  	u32 mac_in_use;
> >  	int mgmt_version;
> >  	int mgmt_sema;
> > @@ -4480,58 +4479,36 @@ static int nv_set_pauseparam(struct net_device *dev, struct ethtool_pauseparam*
> >  	return 0;
> >  }
> >  
> > -static u32 nv_get_rx_csum(struct net_device *dev)
> > +static u32 nv_fix_features(struct net_device *dev, u32 features)
> >  {
> > -	struct fe_priv *np = netdev_priv(dev);
> > -	return np->rx_csum != 0;
> > +	/* vlan is dependent on rx checksum offload */
> > +	if (features & (NETIF_F_HW_VLAN_TX|NETIF_F_HW_VLAN_RX))
> > +		features |= NETIF_F_RXCSUM;
> [...]
> 
> Shouldn't this be done the other way round:
> 
> 	if (!(features & NETIF_F_RXCSUM))
> 		features &= ~(NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX);
> 
> So long as the VLAN feature flags are still set in wanted_features, they
> will be turned back on automatically if RXCSUM is re-enabled.

Yes, but this way is a direct translation of the old ethtool_ops logic.
Changing it would require implementing HW_VLAN feature toggling in set_features.
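The two directions differ in which user request wins. A sketch with hypothetical FEAT_* bit values (not the real NETIF_F_* constants) and plain functions in place of ndo_fix_features: the first mirrors the patch, the second mirrors Ben's suggestion.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define FEAT_HW_VLAN_TX	(1u << 0)
#define FEAT_HW_VLAN_RX	(1u << 1)
#define FEAT_RXCSUM	(1u << 2)
#define FEAT_VLAN	(FEAT_HW_VLAN_TX | FEAT_HW_VLAN_RX)

/* Direction used in the patch: any VLAN offload forces RX checksum on. */
static uint32_t fix_features_patch(uint32_t features)
{
	if (features & FEAT_VLAN)
		features |= FEAT_RXCSUM;
	return features;
}

/* Direction Ben suggests: RX checksum off drops the VLAN offloads. */
static uint32_t fix_features_suggested(uint32_t features)
{
	if (!(features & FEAT_RXCSUM))
		features &= ~FEAT_VLAN;
	return features;
}
```

With fix_features_patch(), asking for VLAN offload with RX checksum off silently turns RX checksum back on; with fix_features_suggested(), the same request drops the VLAN bits instead, which per Ben's note is recoverable because they stay in wanted_features.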

Best Regards,
Michał Mirosław

* Re: [RFC v3 5/6] j1939: rename NAME to UUID?
From: Oliver Hartkopp @ 2011-04-15 17:57 UTC
  To: Kurt Van Dijck
  Cc: socketcan-core, netdev
In-Reply-To: <20110413044928.GA289-ozGf4kBk5synFtIcQ8t7k3L8HoS0Hn3T@public.gmane.org>

On 13.04.2011 06:49, Kurt Van Dijck wrote:
> Oliver et al.,
> 
> On Sun, Mar 20, 2011 at 04:56:46PM +0100, Oliver Hartkopp wrote:
>> On 14.03.2011 14:59, Kurt Van Dijck wrote:
>>
>> Then you suggest to attach static and/or dynamic addresses to the interface.
>>
>>> +  Assigning addresses is done via
>>> +  $ ip addr add dev canX j1939 0xXX
>>> +  statically or
>>> +  $ ip addr add dev canX j1939 name 0xXX
>>> +  dynamically. In the latter case, address claiming must take place
>>> +  before other traffic can leave.
>>
>> like you would have using DHCP/DNS (adapted for j1939) ...
>>
> I suspect the confusion with DHCP/DNS comes from the terminology used.
> 
> The specifications talk about a 64-bit NAME, where it actually is a 64-bit UUID.
> Calling this number a UUID may clarify things, but departs from the
> terminology of the spec.
> 
> One would then do:
> $ ip addr add dev canX j1939 uuid XXXX
> 
> Would that be a good way to progress?

Hello Kurt,

I don't know if it helps, at least for j1939 users, to rename the NAME used
for j1939 address claiming to UUID. A UUID is usually 128 bits long and has a
quite different meaning than the J1939 NAME, which consists of:

   1. Arbitrary address bit
   2. Industry group, length 3 bits
   3. Vehicle system instance, length 4 bits
   4. Vehicle system, length 7 bits
   5. Reserved bit
   6. Function, length 8 bits
   7. Function instance, length 5 bits
   8. ECU instance, length 3 bits
   9. Manufacturer code, length 11 bits
  10. Identity number, length 21 bits

(from http://www.kvaser.com/en/about-can/higher-layer-protocols/36.html)

This is not comparable to the ideas from RFC 4122 ...
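The ten fields add up to exactly 64 bits (1+3+4+7+1+8+5+3+11+21), half the size of an RFC 4122 UUID. A packing sketch, assuming the list order runs from the most significant bit (arbitrary address capable, bit 63) down to the identity number (bits 20..0); struct j1939_name_fields and j1939_name_pack() are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical container for the ten NAME fields listed above. */
struct j1939_name_fields {
	unsigned arbitrary_address;		/* 1 bit   */
	unsigned industry_group;		/* 3 bits  */
	unsigned vehicle_system_instance;	/* 4 bits  */
	unsigned vehicle_system;		/* 7 bits  */
	unsigned reserved;			/* 1 bit   */
	unsigned function;			/* 8 bits  */
	unsigned function_instance;		/* 5 bits  */
	unsigned ecu_instance;			/* 3 bits  */
	unsigned manufacturer_code;		/* 11 bits */
	uint32_t identity_number;		/* 21 bits */
};

/* Pack MSB-first in list order; each field is masked to its width. */
static uint64_t j1939_name_pack(const struct j1939_name_fields *f)
{
	uint64_t name = 0;

	name |= (uint64_t)(f->arbitrary_address & 0x1) << 63;
	name |= (uint64_t)(f->industry_group & 0x7) << 60;
	name |= (uint64_t)(f->vehicle_system_instance & 0xF) << 56;
	name |= (uint64_t)(f->vehicle_system & 0x7F) << 49;
	name |= (uint64_t)(f->reserved & 0x1) << 48;
	name |= (uint64_t)(f->function & 0xFF) << 40;
	name |= (uint64_t)(f->function_instance & 0x1F) << 35;
	name |= (uint64_t)(f->ecu_instance & 0x7) << 32;
	name |= (uint64_t)(f->manufacturer_code & 0x7FF) << 21;
	name |= (uint64_t)(f->identity_number & 0x1FFFFF);
	return name;
}
```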

Thinking about implementing j1939 address claiming (AC) in userspace, I see
two approaches, both of which could be hidden behind easy-to-use helper
functions:

1. Implement a thread (e.g. within a library) that opens a CAN_RAW socket on
a specific CAN interface, performs the AC procedure, and monitors ongoing AC
procedures on the bus. In this case every j1939 application requiring AC
would monitor all AC traffic itself (which should not be a problem in
general, since the code is written only once).

2. Create j1939ac daemon(s) using PF_UNIX sockets, named e.g. j1939ac_can0,
j1939ac_can1, etc. These daemons take care of all AC requirements of the host
they run on. The PF_UNIX sockets are used in SOCK_DGRAM mode: only the j1939
processes that need AC register their NAME by sending a request datagram, and
get back the j1939 address once it is claimed (plus all later updates). As
the j1939ac daemons run on the same host as the j1939 application processes,
the process' PID could optionally be provided to the daemon during
registration, so that the daemon can send a signal to a signal handler of the
application process (if you would like to avoid select() for handling both
the j1939 and PF_UNIX sockets).

->   <Req><Name="A3B5667799332242" PID="12345">
<-   <Resp><ACState="claimed" Name="A3B5667799332242" Address="1B">
(some time)
<-   <Resp><ACState="changed" Name="A3B5667799332242" Address="1C">

This is a sketch that could be put into simple C-structs that are sent via the
PF_UNIX DGRAM socket.
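Such structs could look roughly like the following. Everything here is an assumption for illustration: the struct layouts, enum values and function names are hypothetical, and socketpair() replaces the named PF_UNIX socket (e.g. j1939ac_can0) so the round trip can run in one process. SOCK_DGRAM preserves message boundaries, so each request and response is exactly one fixed-size datagram; sending raw structs is acceptable only because client and daemon run on the same host, as the proposal assumes.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wire format; none of these names exist in a real API. */
enum j1939ac_state {
	J1939AC_CLAIMED = 1,	/* NAME now owns 'address' */
	J1939AC_CHANGED = 2,	/* a later AC round moved NAME elsewhere */
};

struct j1939ac_req {
	uint64_t name;		/* 64-bit J1939 NAME to register */
	int32_t pid;		/* optional PID for signal delivery, 0 if unused */
};

struct j1939ac_resp {
	uint64_t name;		/* NAME the response refers to */
	uint8_t state;		/* enum j1939ac_state */
	uint8_t address;	/* claimed 8-bit source address */
};

/* Stand-in for the daemon: answer one request with a fixed address. */
static int fake_daemon_serve_once(int fd, uint8_t addr)
{
	struct j1939ac_req req;
	struct j1939ac_resp resp;

	if (recv(fd, &req, sizeof(req), 0) != (ssize_t)sizeof(req))
		return -1;
	memset(&resp, 0, sizeof(resp));
	resp.name = req.name;
	resp.state = J1939AC_CLAIMED;
	resp.address = addr;
	if (send(fd, &resp, sizeof(resp), 0) != (ssize_t)sizeof(resp))
		return -1;
	return 0;
}

/* Client side of one registration, run against the fake daemon. */
static int j1939ac_demo(uint64_t name, uint8_t addr, struct j1939ac_resp *out)
{
	struct j1939ac_req req;
	int sv[2], rc = -1;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;
	memset(&req, 0, sizeof(req));
	req.name = name;
	req.pid = (int32_t)getpid();

	/* client -> daemon, daemon -> client: one datagram each way */
	if (send(sv[0], &req, sizeof(req), 0) == (ssize_t)sizeof(req) &&
	    fake_daemon_serve_once(sv[1], addr) == 0 &&
	    recv(sv[0], out, sizeof(*out), 0) == (ssize_t)sizeof(*out))
		rc = 0;

	close(sv[0]);
	close(sv[1]);
	return rc;
}
```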

In all suggested cases (a thread, or a daemon with or without signals) the AC
procedure can be managed in userspace without real pain, and in particular
with less pain than putting the AC process into kernelspace and providing
your suggested socket API, where bind/connect/... behave in very different ways.

Regards,
Oliver

* Re: [PATCH] net: myri10ge: convert to hw_features
From: Andrew Gallatin @ 2011-04-15 18:36 UTC
  To: Jon Mason; +Cc: Michał Mirosław, netdev, Brice Goglin
In-Reply-To: <20110415182922.GA2458@myri.com>

On 04/15/11 14:29, Jon Mason wrote:
> On Fri, Apr 15, 2011 at 04:50:50PM +0200, Michał Mirosław wrote:
>> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
>> ---
>>   drivers/net/myri10ge/myri10ge.c |   66 +++++++-------------------------------
>>   1 files changed, 12 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
>> index 1446de5..a48eb92 100644
>> --- a/drivers/net/myri10ge/myri10ge.c
>> +++ b/drivers/net/myri10ge/myri10ge.c
>> @@ -205,7 +205,6 @@ struct myri10ge_priv {
>>   	int tx_boundary;	/* boundary transmits cannot cross */
>>   	int num_slices;
>>   	int running;		/* running?             */
>> -	int csum_flag;		/* rx_csums?            */
>
> Get rid of MXGEFW_FLAGS_CKSUM in drivers/net/myri10ge/myri10ge_mcp.h,
> as this was the only thing using it.
>

No, please don't.  MXGEFW_FLAGS_CKSUM is a TX descriptor flag that was
(ab)used as a device state flag as well; see the flags in myri10ge_xmit().
I think that early in development the value of mgp->csum_flag was
assigned directly into the descriptor, which is why they shared the
value.

Drew

* Re: The bonding driver should notify userspace of MAC address change
From: Nicolas de Pesloüan @ 2011-04-15 18:40 UTC
  To: Michał Górny; +Cc: netdev, roy, Jay Vosburgh, Andy Gospodarek
In-Reply-To: <20110415184407.550abd88@pomiocik.lan>

On 15/04/2011 18:44, Michał Górny wrote:
> Hello,
>
> I'd like to file a feature request for the bonding driver. Currently,
> there is no way for userspace to know whether the driver has actually
> got a MAC address. As a result, dhcpcd sends MAC-less DHCP packets
> through the bonding device if it is started before the bond gets any
> slaves.

A similar subject, involving bridge instead of bonding, was discussed a few weeks ago in this
thread: http://marc.info/?l=linux-netdev&m=129939017116310&w=2

In particular, I suggested applying Stephen's suggestion not only to bridge but also to bonding.

(http://marc.info/?l=linux-netdev&m=129948385024680&w=2)

A bonding device should not report link up to userspace until at least one slave is present and up.

And possibly, a bonding device should report link down if all slaves are down or all slaves were removed.

Jay, Andy, does this sound sensible to you?

> I've reported that problem upstream [1], and the author suggested that
> the bonding driver would have to notify userspace about MAC address
> changes, using an RTM_NEWLINK message.
>
> I wanted to write a patch for that but I cannot find any
> appropriate IFF_* flag for that particular kind of event. The dhcpcd
> author suggested using 'ifi->ifi_change = ~0U' but I'm not sure if it's
> appropriate.
>
> Could you either add such a kind of notification or give me a tip on
> how to proceed with adding it? Thanks in advance.
>
> [1] http://roy.marples.name/projects/dhcpcd/ticket/212
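On the receiving side, a client listening on a NETLINK_ROUTE socket gets RTM_NEWLINK messages whose IFLA_ADDRESS attribute carries the interface's hardware address. A parsing sketch using the NLMSG_* and RTA_* macros from <linux/rtnetlink.h>; link_msg_mac() and build_newlink() are hypothetical helpers, and the demo builds a message by hand rather than reading one from a socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/rtnetlink.h>

/* Extract IFLA_ADDRESS from an RTM_NEWLINK message.
 * Returns the number of address bytes copied into mac, or -1. */
static int link_msg_mac(const struct nlmsghdr *nlh, unsigned char *mac, int maxlen)
{
	const struct ifinfomsg *ifi;
	const struct rtattr *rta;
	int len;

	if (nlh->nlmsg_type != RTM_NEWLINK)
		return -1;
	ifi = NLMSG_DATA(nlh);
	/* attributes start right after the aligned ifinfomsg header */
	rta = (const struct rtattr *)((const char *)ifi + NLMSG_ALIGN(sizeof(*ifi)));
	len = (int)nlh->nlmsg_len - (int)NLMSG_LENGTH(sizeof(*ifi));

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == IFLA_ADDRESS) {
			int alen = (int)RTA_PAYLOAD(rta);

			if (alen > maxlen)
				return -1;
			memcpy(mac, RTA_DATA(rta), alen);
			return alen;
		}
	}
	return -1;
}

/* Demo only: build a minimal RTM_NEWLINK carrying one IFLA_ADDRESS.
 * buf must be suitably aligned and large enough. */
static void build_newlink(void *buf, int ifindex, const unsigned char *mac, int alen)
{
	struct nlmsghdr *nlh = buf;
	struct ifinfomsg *ifi;
	struct rtattr *rta;

	memset(buf, 0, NLMSG_SPACE(sizeof(*ifi)) + RTA_SPACE(alen));
	nlh->nlmsg_type = RTM_NEWLINK;
	nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifi));
	ifi = NLMSG_DATA(nlh);
	ifi->ifi_family = AF_UNSPEC;
	ifi->ifi_index = ifindex;
	rta = (struct rtattr *)((char *)nlh + NLMSG_ALIGN(nlh->nlmsg_len));
	rta->rta_type = IFLA_ADDRESS;
	rta->rta_len = RTA_LENGTH(alen);
	memcpy(RTA_DATA(rta), mac, alen);
	nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}
```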


* Re: [PATCH] net: myri10ge: convert to hw_features
From: Michał Mirosław @ 2011-04-15 18:47 UTC
  To: Jon Mason; +Cc: netdev, Andrew Gallatin, Brice Goglin
In-Reply-To: <20110415182922.GA2458@myri.com>

On Fri, Apr 15, 2011 at 01:29:22PM -0500, Jon Mason wrote:
> On Fri, Apr 15, 2011 at 04:50:50PM +0200, Michał Mirosław wrote:
> > Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > ---
> >  drivers/net/myri10ge/myri10ge.c |   66 +++++++-------------------------------
> >  1 files changed, 12 insertions(+), 54 deletions(-)
[...]
> > -static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
> > -{
> > -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> > -	u32 flags = mgp->features & (NETIF_F_TSO6 | NETIF_F_TSO);
> > -
> > -	if (tso_enabled)
> > -		netdev->features |= flags;
> > -	else
> > -		netdev->features &= ~flags;
> > -	return 0;
> > -}
> ethtool_op_set_tso does not support TSO6.  This would remove the
> enable/disable of that feature.

Please test this. You'll see it still works.
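The disagreement is about whether TSO6 can still be toggled once myri10ge_set_tso() is gone. A simplified model of the hw_features idea, stated as an assumption rather than the exact core code (ndo_fix_features and dependency handling are ignored): bits the driver lists in hw_features follow the user's request, and every other bit keeps its current value. Since the patch puts mgp->features, which may contain TSO6, into hw_features, TSO6 would stay toggleable in this model. The FEAT_* values and effective_features() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define FEAT_SG		(1u << 0)
#define FEAT_TSO	(1u << 1)
#define FEAT_TSO6	(1u << 2)
#define FEAT_HIGHDMA	(1u << 3)

/* Simplified model: user-changeable bits come from 'wanted',
 * fixed bits are carried over from the current feature set. */
static uint32_t effective_features(uint32_t features, uint32_t hw_features,
				   uint32_t wanted)
{
	return (features & ~hw_features) | (wanted & hw_features);
}
```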

Best Regards,
Michał Mirosław

* Re: The bonding driver should notify userspace of MAC address change
From: Jay Vosburgh @ 2011-04-15 18:53 UTC
  To: Nicolas de Pesloüan
  Cc: Michał Górny, netdev, roy, Andy Gospodarek
In-Reply-To: <4DA89114.9040900@gmail.com>

Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com> wrote:

>On 15/04/2011 18:44, Michał Górny wrote:
>> Hello,
>>
>> I'd like to file a feature request for the bonding driver. Currently,
>> there is no way for userspace to know whether the driver actually gets
>> a MAC address. This results in the fact that dhcpcd sends MAC-less DHCP
>> packets through bonding device if it is started before bond gets any
>> slaves.
>
>A similar subject, involving bridge instead of bonding, was discussed a
>few weeks ago in this thread:
>http://marc.info/?l=linux-netdev&m=129939017116310&w=2
>
>In particular, I suggested to apply Stephen's suggestion not only to bridge but also to bonding.
>
>(http://marc.info/?l=linux-netdev&m=129948385024680&w=2)
>
>A bonding device should not report link up to userspace until at least one slave is present and up.
>
>And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
>
>Jay, Andy, does this sounds sensible to you?

	I was just reading their bug and doing an experiment; I don't
see that bonding reports carrier up until there's at least one slave
(even if it's configured up), e.g.,

# modprobe bonding
# ifconfig bond0 up
# cat /sys/class/net/bond0/carrier
0
# echo +eth5 > /sys/class/net/bond0/bonding/slaves
# cat /sys/class/net/bond0/carrier
1

	If there's a slave, there's a MAC assigned, since bond_enslave
sets the master's MAC before it calls bond_set_carrier.

	In bond_create, as soon as register_netdevice returns, we call
netif_carrier_off, and it stays off until bond_enslave runs
successfully.

	Is there some race window there between the register and the
netif_carrier_off?

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: [PATCH] net: cxgb4{,vf}: convert to hw_features
From: Dimitris Michailidis @ 2011-04-15 19:00 UTC
  To: Michał Mirosław; +Cc: netdev, Casey Leedom
In-Reply-To: <20110415145050.6A43913A6A@rere.qmqm.pl>

Michał Mirosław wrote:

> +#define TSO_FLAGS (NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_TSO_ECN)
>  #define VLAN_FEAT (NETIF_F_SG | NETIF_F_IP_CSUM | TSO_FLAGS | \
>  		   NETIF_F_IPV6_CSUM | NETIF_F_HIGHDMA)
>  
> @@ -3665,14 +3627,14 @@ static int __devinit init_one(struct pci_dev *pdev,
>  		pi = netdev_priv(netdev);
>  		pi->adapter = adapter;
>  		pi->xact_addr_filt = -1;
> -		pi->rx_offload = RX_CSO;
>  		pi->port_id = i;
>  		netdev->irq = pdev->irq;
>  
> -		netdev->features |= NETIF_F_SG | TSO_FLAGS;
> -		netdev->features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM;
> -		netdev->features |= NETIF_F_GRO | NETIF_F_RXHASH | highdma;
> -		netdev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> +		netdev->hw_features = NETIF_F_SG | TSO_FLAGS |
> +			NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> +			NETIF_F_RXCSUM | NETIF_F_RXHASH |
> +			NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> +		netdev->features |= netdev->hw_features | highdma;
>  		netdev->vlan_features = netdev->features & VLAN_FEAT;

Here vlan_features does not include NETIF_F_RXCSUM but the cxgb4vf bits 
below do include it.  I looked at some other drivers and saw again some 
include it and some don't.  The core VLAN code handles NETIF_F_RXCSUM on its 
own.  Is there some rule for whether drivers should set it in their 
vlan_features or not?

> diff --git a/drivers/net/cxgb4/sge.c b/drivers/net/cxgb4/sge.c
> index 311471b..e8f6f8e 100644
> --- a/drivers/net/cxgb4/sge.c
> +++ b/drivers/net/cxgb4/sge.c
> @@ -1587,7 +1587,7 @@ int t4_ethrx_handler(struct sge_rspq *q, const __be64 *rsp,
>  	pi = netdev_priv(skb->dev);
>  	rxq->stats.pkts++;
>  
> -	if (csum_ok && (pi->rx_offload & RX_CSO) &&
> +	if (csum_ok && (q->netdev->features & NETIF_F_RXCSUM) &&
>  	    (pkt->l2info & htonl(RXF_UDP | RXF_TCP))) {
>  		if (!pkt->ip_frag) {
>  			skb->ip_summed = CHECKSUM_UNNECESSARY;

With this change variable 'pi' can be removed but I can do this cleanup 
after this patch goes in.

> diff --git a/drivers/net/cxgb4vf/cxgb4vf_main.c b/drivers/net/cxgb4vf/cxgb4vf_main.c
> index c662679..04a5c2d 100644
> --- a/drivers/net/cxgb4vf/cxgb4vf_main.c
> +++ b/drivers/net/cxgb4vf/cxgb4vf_main.c
> @@ -2638,14 +2597,13 @@ static int __devinit cxgb4vf_pci_probe(struct pci_dev *pdev,
>  		 * it.
>  		 */
>  		pi->xact_addr_filt = -1;
> -		pi->rx_offload = RX_CSO;
>  		netif_carrier_off(netdev);
>  		netdev->irq = pdev->irq;
>  
> -		netdev->features = (NETIF_F_SG | TSO_FLAGS |
> -				    NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> -				    NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX |
> -				    NETIF_F_GRO);
> +		netdev->hw_features = NETIF_F_SG | TSO_FLAGS | NETIF_F_RXCSUM |
> +			NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
> +			NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_RX;
> +		netdev->features = netdev->hw_features;
>  		if (pci_using_dac)
>  			netdev->features |= NETIF_F_HIGHDMA;
>  		netdev->vlan_features =

cxgb4vf does not implement toggling of NETIF_F_HW_VLAN_RX, so I think the
flag should be set in features but not in hw_features, or maybe the driver
needs to handle it in fix_features?
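Both options keep RX VLAN stripping on when a user tries to disable it. A sketch under a simplified model, stated as an assumption rather than the exact core code: only bits in hw_features follow the user's request, all others keep their current value. The FEAT_* bits, apply_wanted() and vf_fix_features() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical bit values for illustration only. */
#define FEAT_HW_VLAN_TX	(1u << 0)
#define FEAT_HW_VLAN_RX	(1u << 1)
#define FEAT_RXCSUM	(1u << 2)

/* Simplified model of applying a user request. */
static uint32_t apply_wanted(uint32_t features, uint32_t hw_features,
			     uint32_t wanted)
{
	return (features & ~hw_features) | (wanted & hw_features);
}

/* Option 2: VLAN_RX stays in hw_features, but the fix-up step forces it
 * back on because the device cannot actually turn it off. */
static uint32_t vf_fix_features(uint32_t features)
{
	return features | FEAT_HW_VLAN_RX;
}
```

In option 1, VLAN_RX is simply left out of hw_features, so a request to clear it never reaches the device; option 2 keeps it listed as changeable, but the request is overridden every time.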

* Re: [PATCH] net: myri10ge: convert to hw_features
From: Jon Mason @ 2011-04-15 18:29 UTC
  To: Michał Mirosław; +Cc: netdev, Andrew Gallatin, Brice Goglin
In-Reply-To: <20110415145050.0D65C13A66@rere.qmqm.pl>

On Fri, Apr 15, 2011 at 04:50:50PM +0200, Michał Mirosław wrote:
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
>  drivers/net/myri10ge/myri10ge.c |   66 +++++++-------------------------------
>  1 files changed, 12 insertions(+), 54 deletions(-)
> 
> diff --git a/drivers/net/myri10ge/myri10ge.c b/drivers/net/myri10ge/myri10ge.c
> index 1446de5..a48eb92 100644
> --- a/drivers/net/myri10ge/myri10ge.c
> +++ b/drivers/net/myri10ge/myri10ge.c
> @@ -205,7 +205,6 @@ struct myri10ge_priv {
>  	int tx_boundary;	/* boundary transmits cannot cross */
>  	int num_slices;
>  	int running;		/* running?             */
> -	int csum_flag;		/* rx_csums?            */

Get rid of MXGEFW_FLAGS_CKSUM in drivers/net/myri10ge/myri10ge_mcp.h,
as this was the only thing using it.

>  	int small_bytes;
>  	int big_bytes;
>  	int max_intr_slots;
> @@ -1386,7 +1385,7 @@ myri10ge_rx_done(struct myri10ge_slice_state *ss, int len, __wsum csum,
>  	skb->protocol = eth_type_trans(skb, dev);
>  	skb_record_rx_queue(skb, ss - &mgp->ss[0]);
>  
> -	if (mgp->csum_flag) {
> +	if (dev->features & NETIF_F_RXCSUM) {
>  		if ((skb->protocol == htons(ETH_P_IP)) ||
>  		    (skb->protocol == htons(ETH_P_IPV6))) {
>  			skb->csum = csum;
> @@ -1757,43 +1756,6 @@ myri10ge_get_ringparam(struct net_device *netdev,
>  	ring->tx_pending = ring->tx_max_pending;
>  }
>  
> -static u32 myri10ge_get_rx_csum(struct net_device *netdev)
> -{
> -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> -
> -	if (mgp->csum_flag)
> -		return 1;
> -	else
> -		return 0;
> -}
> -
> -static int myri10ge_set_rx_csum(struct net_device *netdev, u32 csum_enabled)
> -{
> -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> -	int err = 0;
> -
> -	if (csum_enabled)
> -		mgp->csum_flag = MXGEFW_FLAGS_CKSUM;
> -	else {
> -		netdev->features &= ~NETIF_F_LRO;
> -		mgp->csum_flag = 0;
> -
> -	}
> -	return err;
> -}
> -
> -static int myri10ge_set_tso(struct net_device *netdev, u32 tso_enabled)
> -{
> -	struct myri10ge_priv *mgp = netdev_priv(netdev);
> -	u32 flags = mgp->features & (NETIF_F_TSO6 | NETIF_F_TSO);
> -
> -	if (tso_enabled)
> -		netdev->features |= flags;
> -	else
> -		netdev->features &= ~flags;
> -	return 0;
> -}

ethtool_op_set_tso does not support TSO6.  This would remove the
enable/disable of that feature.

> -
>  static const char myri10ge_gstrings_main_stats[][ETH_GSTRING_LEN] = {
>  	"rx_packets", "tx_packets", "rx_bytes", "tx_bytes", "rx_errors",
>  	"tx_errors", "rx_dropped", "tx_dropped", "multicast", "collisions",
> @@ -1944,11 +1906,6 @@ static u32 myri10ge_get_msglevel(struct net_device *netdev)
>  	return mgp->msg_enable;
>  }
>  
> -static int myri10ge_set_flags(struct net_device *netdev, u32 value)
> -{
> -	return ethtool_op_set_flags(netdev, value, ETH_FLAG_LRO);
> -}
> -
>  static const struct ethtool_ops myri10ge_ethtool_ops = {
>  	.get_settings = myri10ge_get_settings,
>  	.get_drvinfo = myri10ge_get_drvinfo,
> @@ -1957,19 +1914,12 @@ static const struct ethtool_ops myri10ge_ethtool_ops = {
>  	.get_pauseparam = myri10ge_get_pauseparam,
>  	.set_pauseparam = myri10ge_set_pauseparam,
>  	.get_ringparam = myri10ge_get_ringparam,
> -	.get_rx_csum = myri10ge_get_rx_csum,
> -	.set_rx_csum = myri10ge_set_rx_csum,
> -	.set_tx_csum = ethtool_op_set_tx_hw_csum,
> -	.set_sg = ethtool_op_set_sg,
> -	.set_tso = myri10ge_set_tso,
>  	.get_link = ethtool_op_get_link,
>  	.get_strings = myri10ge_get_strings,
>  	.get_sset_count = myri10ge_get_sset_count,
>  	.get_ethtool_stats = myri10ge_get_ethtool_stats,
>  	.set_msglevel = myri10ge_set_msglevel,
>  	.get_msglevel = myri10ge_get_msglevel,
> -	.get_flags = ethtool_op_get_flags,
> -	.set_flags = myri10ge_set_flags
>  };
>  
>  static int myri10ge_allocate_rings(struct myri10ge_slice_state *ss)
> @@ -3136,6 +3086,14 @@ static int myri10ge_set_mac_address(struct net_device *dev, void *addr)
>  	return 0;
>  }
>  
> +static u32 myri10ge_fix_features(struct net_device *dev, u32 features)
> +{
> +	if (!(features & NETIF_F_RXCSUM))
> +		features &= ~NETIF_F_LRO;
> +
> +	return features;
> +}
> +
>  static int myri10ge_change_mtu(struct net_device *dev, int new_mtu)
>  {
>  	struct myri10ge_priv *mgp = netdev_priv(dev);
> @@ -3834,6 +3792,7 @@ static const struct net_device_ops myri10ge_netdev_ops = {
>  	.ndo_get_stats		= myri10ge_get_stats,
>  	.ndo_validate_addr	= eth_validate_addr,
>  	.ndo_change_mtu		= myri10ge_change_mtu,
> +	.ndo_fix_features	= myri10ge_fix_features,
>  	.ndo_set_multicast_list = myri10ge_set_multicast_list,
>  	.ndo_set_mac_address	= myri10ge_set_mac_address,
>  };
> @@ -3860,7 +3819,6 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	mgp = netdev_priv(netdev);
>  	mgp->dev = netdev;
>  	mgp->pdev = pdev;
> -	mgp->csum_flag = MXGEFW_FLAGS_CKSUM;
>  	mgp->pause = myri10ge_flow_control;
>  	mgp->intr_coal_delay = myri10ge_intr_coal_delay;
>  	mgp->msg_enable = netif_msg_init(myri10ge_debug, MYRI10GE_MSG_DEFAULT);
> @@ -3976,11 +3934,11 @@ static int myri10ge_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	netdev->netdev_ops = &myri10ge_netdev_ops;
>  	netdev->mtu = myri10ge_initial_mtu;
>  	netdev->base_addr = mgp->iomem_base;
> -	netdev->features = mgp->features;
> +	netdev->hw_features = mgp->features | NETIF_F_LRO | NETIF_F_RXCSUM;
> +	netdev->features = netdev->hw_features;
>  
>  	if (dac_enabled)
>  		netdev->features |= NETIF_F_HIGHDMA;
> -	netdev->features |= NETIF_F_LRO;
>  
>  	netdev->vlan_features |= mgp->features;
>  	if (mgp->fw_ver_tiny < 37)
> -- 
> 1.7.2.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

* [RFC net-next] bonding: notify when bonding device address changes
From: Stephen Hemminger @ 2011-04-15 19:10 UTC
  To: Jay Vosburgh
  Cc: Nicolas de Pesloüan, Michał Górny, netdev, roy,
	Andy Gospodarek
In-Reply-To: <10227.1302893590@death>

When a device changes its hardware address, it needs to call the network
device notifiers to inform protocols.

Compile tested only.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--- a/drivers/net/bonding/bond_main.c	2011-04-15 11:21:02.142866195 -0700
+++ b/drivers/net/bonding/bond_main.c	2011-04-15 11:28:06.491408825 -0700
@@ -967,9 +967,11 @@ static void bond_do_fail_over_mac(struct
 
 	switch (bond->params.fail_over_mac) {
 	case BOND_FOM_ACTIVE:
-		if (new_active)
+		if (new_active) {
 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
 			       new_active->dev->addr_len);
+			call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
+		}
 		break;
 	case BOND_FOM_FOLLOW:
 		/*
@@ -1386,6 +1388,7 @@ static int bond_sethwaddr(struct net_dev
 	pr_debug("slave_dev=%p\n", slave_dev);
 	pr_debug("slave_dev->addr_len=%d\n", slave_dev->addr_len);
 	memcpy(bond_dev->dev_addr, slave_dev->dev_addr, slave_dev->addr_len);
+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
 	return 0;
 }
 
@@ -1644,10 +1647,11 @@ int bond_enslave(struct net_device *bond
 
 	/* If this is the first slave, then we need to set the master's hardware
 	 * address to be the same as the slave's. */
-	if (is_zero_ether_addr(bond->dev->dev_addr))
+	if (is_zero_ether_addr(bond->dev->dev_addr)) {
 		memcpy(bond->dev->dev_addr, slave_dev->dev_addr,
 		       slave_dev->addr_len);
-
+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
+	}
 
 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
 	if (!new_slave) {
@@ -2067,6 +2071,7 @@ int bond_release(struct net_device *bond
 		 * to the mac address of the first slave
 		 */
 		memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
 
 		if (!bond->vlgrp) {
 			bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
@@ -2252,6 +2257,7 @@ static int bond_release_all(struct net_d
 	 * first slave
 	 */
 	memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
 
 	if (!bond->vlgrp) {
 		bond_dev->features |= NETIF_F_VLAN_CHALLENGED;

* Re: The bonding driver should notify userspace of MAC address change
From: Phil Oester @ 2011-04-15 19:12 UTC
  To: Jay Vosburgh
  Cc: Nicolas de Pesloüan,
	Michał Górny, netdev, roy, Andy Gospodarek
In-Reply-To: <10227.1302893590@death>

On Fri, Apr 15, 2011 at 11:53:10AM -0700, Jay Vosburgh wrote:
> >A bonding device should not report link up to userspace until at least one slave is present and up.
> >
> >And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
> >
> >Jay, Andy, does this sounds sensible to you?
> 
> 	I was just reading their bug and doing an experiment; I don't
> see that bonding reports carrier up until there's at least one slave
> (even if it's configured up), e.g.,
> 

This was only recently fixed - see e826eafa65c6f1f7c8db5a237556cebac57ebcc5
(bonding: Call netif_carrier_off after register_netdevice). Perhaps the
reporter is not using a recent kernel?

Phil 

* Re: The bonding driver should notify userspace of MAC address change
From: Nicolas de Pesloüan @ 2011-04-15 19:22 UTC
  To: Jay Vosburgh; +Cc: Michał Górny, netdev, roy, Andy Gospodarek
In-Reply-To: <10227.1302893590@death>

On 15/04/2011 20:53, Jay Vosburgh wrote:
> Nicolas de Pesloüan 	<nicolas.2p.debian@gmail.com>  wrote:
>
>> On 15/04/2011 18:44, Michał Górny wrote:
>>> Hello,
>>>
>>> I'd like to file a feature request for the bonding driver. Currently,
>>> there is no way for userspace to know whether the driver actually gets
>>> a MAC address. This results in the fact that dhcpcd sends MAC-less DHCP
>>> packets through bonding device if it is started before bond gets any
>>> slaves.
>>
>> A similar subject, involving bridge instead of bonding, was discussed a
>> few weeks ago in this thread:
>> http://marc.info/?l=linux-netdev&m=129939017116310&w=2
>>
>> In particular, I suggested to apply Stephen's suggestion not only to bridge but also to bonding.
>>
>> (http://marc.info/?l=linux-netdev&m=129948385024680&w=2)
>>
>> A bonding device should not report link up to userspace until at least one slave is present and up.
>>
>> And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
>>
>> Jay, Andy, does this sounds sensible to you?
>
> 	I was just reading their bug and doing an experiment; I don't
> see that bonding reports carrier up until there's at least one slave
> (even if it's configured up), e.g.,
>
> # modprobe bonding
> # ifconfig bond0 up
> # cat /sys/class/net/bond0/carrier
> 0
> # echo +eth5 > /sys/class/net/bond0/bonding/slaves
> # cat /sys/class/net/bond0/carrier
> 1
>
> 	If there's a slave, there's a MAC assigned, since bond_enslave
> sets the master's MAC before it calls bond_set_carrier.
>
> 	In bond_create, as soon as register_netdevice returns, we call
> netif_carrier_off, and it stays off until bond_enslave runs
> successfully.

Agreed.

> 	Is there some race window there between the register and the
> netif_carrier_off?

It might be that dhcpcd does not wait for the link to be up before it starts sending DHCP requests.
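If that is the cause, a client could hold off until the bond reports carrier and a non-zero hardware address. A sketch, assuming the sysfs carrier file from Jay's experiment plus the corresponding /sys/class/net/<if>/address file; read_first_line(), mac_str_is_zero() and link_ready() are hypothetical helpers, not dhcpcd code. Reading carrier fails on an interface that is administratively down, which the sketch treats as "not ready".

```c
#include <ctype.h>
#include <stdio.h>

/* Read the first line of a (sysfs) file; returns 0 on success. */
static int read_first_line(const char *path, char *buf, int len)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, len, f) != NULL;
	fclose(f);
	return ok ? 0 : -1;
}

/* 1 if "xx:xx:..." is all zeroes, 0 if not, -1 if not a MAC string. */
static int mac_str_is_zero(const char *s)
{
	int digits = 0;

	for (; *s && *s != '\n'; s++) {
		if (*s == ':')
			continue;
		if (!isxdigit((unsigned char)*s))
			return -1;
		if (*s != '0')
			return 0;
		digits++;
	}
	return digits ? 1 : -1;
}

/* Ready only when carrier is 1 and the address is a non-zero MAC. */
static int link_ready(const char *ifname)
{
	char path[256], buf[64];

	snprintf(path, sizeof(path), "/sys/class/net/%s/carrier", ifname);
	if (read_first_line(path, buf, sizeof(buf)) < 0 || buf[0] != '1')
		return 0;
	snprintf(path, sizeof(path), "/sys/class/net/%s/address", ifname);
	if (read_first_line(path, buf, sizeof(buf)) < 0)
		return 0;
	return mac_str_is_zero(buf) == 0;
}
```

A client would poll link_ready("bond0") (or, better, react to link notifications) before sending its first DHCP request.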

	Nicolas.

* Re: The bonding driver should notify userspace of MAC address change
From: Jay Vosburgh @ 2011-04-15 19:22 UTC
  To: Phil Oester
  Cc: Nicolas de Pesloüan,
	Michał Górny, netdev, roy, Andy Gospodarek
In-Reply-To: <20110415191249.GA6879@linuxace.com>

Phil Oester <kernel@linuxace.com> wrote:

>On Fri, Apr 15, 2011 at 11:53:10AM -0700, Jay Vosburgh wrote:
>> >A bonding device should not report link up to userspace until at least one slave is present and up.
>> >
>> >And possibly, a bonding device should report link down if all slaves are down or all slave were removed.
>> >
>> >Jay, Andy, does this sounds sensible to you?
>> 
>> 	I was just reading their bug and doing an experiment; I don't
>> see that bonding reports carrier up until there's at least one slave
>> (even if it's configured up), e.g.,
>> 
>
>This was only recently fixed - see e826eafa65c6f1f7c8db5a237556cebac57ebcc5
>(bonding: Call netif_carrier_off after register_netdevice). Perhaps the
>reporter is not using a recent kernel?

	Yah, I looked that up after I'd sent my prior email.  Since that
change went in only a month ago, maybe they don't have it.

	On the other hand, I did my test on a FC 14 kernel,
2.6.35.6-45.fc14, which claims to have been built in October 2010, and
it seemed to behave properly.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: [RFC net-next] bonding: notify when bonding device address changes
From: Jay Vosburgh @ 2011-04-15 19:28 UTC
  To: Stephen Hemminger
  Cc: Nicolas de Pesloüan, Michał Górny, netdev, roy,
	Andy Gospodarek
In-Reply-To: <20110415121054.73717900@nehalam>

Stephen Hemminger <shemminger@vyatta.com> wrote:

>When a device changes its hardware address, it needs to call the network
>device notifiers to inform protocols.
>
>Compile tested only.

	We'll need to test this, I think.  If I'm not mistaken,
inetdev_event will issue gratuitous ARPs when it gets the
NETDEV_CHANGEADDR, and we need to make sure those are correct for all
cases.

>Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>
>--- a/drivers/net/bonding/bond_main.c	2011-04-15 11:21:02.142866195 -0700
>+++ b/drivers/net/bonding/bond_main.c	2011-04-15 11:28:06.491408825 -0700
>@@ -967,9 +967,11 @@ static void bond_do_fail_over_mac(struct
>
> 	switch (bond->params.fail_over_mac) {
> 	case BOND_FOM_ACTIVE:
>-		if (new_active)
>+		if (new_active) {
> 			memcpy(bond->dev->dev_addr,  new_active->dev->dev_addr,
> 			       new_active->dev->addr_len);
>+			call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
>+		}
> 		break;
> 	case BOND_FOM_FOLLOW:
> 		/*
>@@ -1386,6 +1388,7 @@ static int bond_sethwaddr(struct net_dev
> 	pr_debug("slave_dev=%p\n", slave_dev);
> 	pr_debug("slave_dev->addr_len=%d\n", slave_dev->addr_len);
> 	memcpy(bond_dev->dev_addr, slave_dev->dev_addr, slave_dev->addr_len);
>+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);
> 	return 0;
> }
>
>@@ -1644,10 +1647,11 @@ int bond_enslave(struct net_device *bond
>
> 	/* If this is the first slave, then we need to set the master's hardware
> 	 * address to be the same as the slave's. */
>-	if (is_zero_ether_addr(bond->dev->dev_addr))
>+	if (is_zero_ether_addr(bond->dev->dev_addr)) {
> 		memcpy(bond->dev->dev_addr, slave_dev->dev_addr,
> 		       slave_dev->addr_len);
>-
>+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond->dev);
>+	}
>
> 	new_slave = kzalloc(sizeof(struct slave), GFP_KERNEL);
> 	if (!new_slave) {
>@@ -2067,6 +2071,7 @@ int bond_release(struct net_device *bond
> 		 * to the mac address of the first slave
> 		 */
> 		memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
>+		call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);

	This one in particular I'm not sure about; should the system
send a gratuitous ARP for a MAC address of all zeroes?

> 		if (!bond->vlgrp) {
> 			bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
>@@ -2252,6 +2257,7 @@ static int bond_release_all(struct net_d
> 	 * first slave
> 	 */
> 	memset(bond_dev->dev_addr, 0, bond_dev->addr_len);
>+	call_netdevice_notifiers(NETDEV_CHANGEADDR, bond_dev);

	Same comment for this one.

> 	if (!bond->vlgrp) {
> 		bond_dev->features |= NETIF_F_VLAN_CHALLENGED;
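Jay's question can be framed as code with a userspace sketch of a guard that a caller could check before raising NETDEV_CHANGEADDR. The policy is hypothetical and not something the patch or bonding implements: skip the notification when the new address is all zeroes (as after the memset() in bond_release() and bond_release_all()) or unchanged. In the kernel the old address would have to be saved before the memcpy()/memset() for the "unchanged" test to work. `mac_is_zero()` re-implements the is_zero_ether_addr() check for userspace, and both function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's is_zero_ether_addr() check. */
static int mac_is_zero(const unsigned char *addr, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (addr[i])
			return 0;
	return 1;
}

/* Hypothetical policy: announce only a real change to a non-zero address.
 * Returns 1 if, under this policy, NETDEV_CHANGEADDR should be raised. */
int should_notify_addr_change(const unsigned char *old_addr,
			      const unsigned char *new_addr, size_t len)
{
	size_t i;

	assert(old_addr != NULL && new_addr != NULL);
	if (mac_is_zero(new_addr, len))
		return 0;	/* e.g. after memset(dev_addr, 0, ...) on last release */
	for (i = 0; i < len; i++)
		if (old_addr[i] != new_addr[i])
			return 1;
	return 0;		/* same address: nothing to announce */
}
```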

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply

* Feature request: "inverted" ping -a (beep on failure)
From: Christian Boltz @ 2011-04-15 19:35 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA

Hello,

ping -a (beep on ping success) is quite a useful command, but it can be
annoying.

I'd like to have the exact opposite of it: beep when pinging fails.

I understand that this is slightly difficult because "ping success" is
easier to detect (incoming packet) than "ping failure" (no incoming
packet or firewall reject) - my proposal is to have a timeout for every
packet (if no reply packet comes in) and beep if no reply is seen
after the timeout is over.

For the timeout, the -W option could be used. The default timeout seems
to be 10 seconds, which is OK.

Use case / why this would be useful for me:
Basically for server monitoring. The exact use case is that I have rented
a "root server" and asked the hosting provider to exchange a broken hard disk.
With the "inverted" ping -a, it would be easy to notice when they switch
off the server to replace the disk.

Please consider this feature for the next version of ping ;-)

(The iputils homepage does not list any bugtracker or similar, therefore 
I'm asking here.)

Regards

Christian Boltz
-- 
"we will support any library from any repo combined with
any application" is something that NO ONE does.
Or if they do, they are insane, or lying, or both.
[Greg KH in opensuse-factory]
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Feature request: "inverted" ping -a (beep on failure)
From: Randy Dunlap @ 2011-04-15 19:49 UTC (permalink / raw)
  To: Christian Boltz; +Cc: netdev
In-Reply-To: <201104152135.33171@tux.boltz.de.vu>

On Fri, 15 Apr 2011 21:35:32 +0200 Christian Boltz wrote:

> Hello,
> 
> ping -a (beep on ping success) is quite a useful command, but it can be
> annoying.
> 
> I'd like to have the exact opposite of it: beep when pinging fails.
> 
> I understand that this is slightly difficult because "ping success" is
> easier to detect (incoming packet) than "ping failure" (no incoming
> packet or firewall reject) - my proposal is to have a timeout for every
> packet (if no reply packet comes in) and beep if no reply is seen
> after the timeout is over.
> 
> For the timeout, the -W option could be used. The default timeout seems
> to be 10 seconds, which is OK.
> 
> Use case / why this would be useful for me:
> Basically for server monitoring. The exact use case is that I have rented
> a "root server" and asked the hosting provider to exchange a broken hard disk.
> With the "inverted" ping -a, it would be easy to notice when they switch
> off the server to replace the disk.
> 
> Please consider this feature for the next version of ping ;-)
> 
> 
> (The iputils homepage does not list any bugtracker or similar, therefore 
> I'm asking here.)

Couldn't you look for exit code (status) 1 and then do a bell/beep
(or play a sound file :)?

Or do you want ping to beep and then continue running?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* Re: unregister_netdevice: waiting for lo to become free. Usage count = 8
From: Julian Anastasov @ 2011-04-15 20:11 UTC (permalink / raw)
  To: Hans Schillstrom; +Cc: Simon Horman, netdev, lvs-devel, Eric W. Biederman
In-Reply-To: <201104150901.47214.hans@schillstrom.com>


	Hello,

On Fri, 15 Apr 2011, Hans Schillstrom wrote:

> Hello Julian
> 
> I'm trying to fix the cleanup process when a namespace gets "killed",
> which is a new feature for ipvs. However, an old problem appears again.
> 
> When there has been traffic through ipvs where the destination is unreachable,
> the usage count on the loopback dev increases by one for every packet....

	What is the kernel version?

> I guess that's because of this rule:
> 
> # ip route list table all
> ...
> unreachable default dev lo  table 0  proto kernel  metric 4294967295  error -101 hoplimit 25
> ...
> 
> I made a test just forwarding packets through the same container (ipvs loaded)
> to an unreachable destination and that test had a balanced count i.e. it was possible to reboot the container.

	Can you explain what you mean by an unreachable
destination? Are you adding some rejecting route?

> Do you have an idea why this happens in the ipvs case?

	Do you see the "Removing destination" messages at debug
level 3? Only real servers can hold a dest->dst_cache reference
to the dev, which can be a problem because real servers are not
deleted immediately: while they still have traffic they are moved
to the trash list. But ip_vs_trash_cleanup() should remove any
leftover structures. You should check in the debug output that all
servers are deleted. If all real server structures are freed but the
problem remains, we should look more deeply at the
dest->dst_cache usage. Is DR or NAT used?

	I assume cleanup really happens in this order:

ip_vs_cleanup():
	nf_unregister_hooks()
	...
	ip_vs_conn_cleanup()
	...
	ip_vs_control_cleanup()
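To make the reference-counting concern concrete, here is a toy userspace model in which every name is hypothetical: a device with a usage count, destinations that cache a reference to it, and a trash list whose cleanup drops those references. It is not IPVS code; it only illustrates why a destination parked on the trash list keeps the count above its baseline until something like ip_vs_trash_cleanup() runs, which is what would keep unregister_netdevice waiting.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy device with a usage count (think of lo's refcount). */
struct model_dev {
	int refcnt;
};

/* Toy destination that caches a route and therefore holds a device ref. */
struct model_dest {
	struct model_dev *dst_cache;
	struct model_dest *next;	/* link on the trash list */
};

static void model_dev_hold(struct model_dev *d)
{
	d->refcnt++;
}

static void model_dev_put(struct model_dev *d)
{
	assert(d->refcnt > 0);
	d->refcnt--;
}

/* Cache a route on dest: take the new reference, then drop the old one. */
void dest_set_cache(struct model_dest *dest, struct model_dev *dev)
{
	model_dev_hold(dev);
	if (dest->dst_cache)
		model_dev_put(dest->dst_cache);
	dest->dst_cache = dev;
}

/* Deleting a destination that still has traffic only parks it. */
void dest_move_to_trash(struct model_dest **trash, struct model_dest *dest)
{
	dest->next = *trash;
	*trash = dest;
}

/* Trash cleanup: drop each cached reference, then free the entry. */
void trash_cleanup(struct model_dest **trash)
{
	while (*trash) {
		struct model_dest *d = *trash;

		*trash = d->next;
		if (d->dst_cache)
			model_dev_put(d->dst_cache);
		free(d);
	}
}
```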

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: Feature request: "inverted" ping -a (beep on failure)
From: Christian Boltz @ 2011-04-15 20:11 UTC (permalink / raw)
  To: Randy Dunlap, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20110415124937.6e746646.rdunlap-/UHa2rfvQTnk1uMJSBkQmQ@public.gmane.org>

Hello,

On Friday, 15 April 2011, Randy Dunlap wrote:
> On Fri, 15 Apr 2011 21:35:32 +0200 Christian Boltz wrote:
> > I'd like to have the exact opposite of it: beep when pinging fails.
[...]
> Couldn't you look for exit code (status) 1 and then do a bell/beep
> (or play a sound file :)?

That would require that I know in advance when exactly the server is 
unreachable - but in this case, I wouldn't need to ping it ;-)

To have this working, ping would need an option "exit on error", which 
it doesn't have AFAIK.

A workaround is to run ping -c1 in a loop:

while true ; do
    ping -c1 $server || beep
    sleep 1
done

but I'd prefer to have something like this directly in ping ;-)

> Or do you want ping to beep and then continue running?

Yes, that's exactly what I want.
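Until ping grows such an option, the shell loop above can be mirrored in C to show the "beep and keep running" behaviour. This is a sketch, assuming a POSIX system where system() returns a wait status; `probe_once()` and `watch_host()` are illustrative names, and the probe command string is up to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one probe command through the shell; on failure write a BEL.
 * Returns 1 if the probe failed (and a beep was written), 0 otherwise. */
int probe_once(const char *cmd, FILE *out)
{
	int status;

	assert(cmd != NULL && out != NULL);
	status = system(cmd);
	if (status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
		return 0;
	fputc('\a', out);	/* BEL: a terminal with its bell enabled beeps */
	fflush(out);
	return 1;
}

/* Probe 'rounds' times, sleeping interval_s seconds between probes.
 * Keeps going after failures; returns the number of failed probes. */
int watch_host(const char *cmd, int rounds, unsigned int interval_s, FILE *out)
{
	int i, failures = 0;

	for (i = 0; i < rounds; i++) {
		failures += probe_once(cmd, out);
		if (i + 1 < rounds)
			sleep(interval_s);
	}
	return failures;
}
```

For example, `watch_host("ping -c1 -W 10 example.host >/dev/null", 3600, 1, stdout)` would probe up to 3600 times with a one-second pause between probes; the host name is a placeholder, and `-W` is ping's per-reply timeout in seconds.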


Regards

Christian Boltz
-- 
> I'd like to scan some users who their mails via a mail server
> (sendmail preferred, postfix also possible).
A photocopier will do for that. Pants down, sit the user on it and press "Copy"!
[> Ralf Thomas and Sandy Drobic in suse-linux]

^ permalink raw reply

* Re: Feature request: "inverted" ping -a (beep on failure)
From: Randy Dunlap @ 2011-04-15 20:14 UTC (permalink / raw)
  To: Christian Boltz; +Cc: netdev
In-Reply-To: <201104152211.46180@tux.boltz.de.vu>

On Fri, 15 Apr 2011 22:11:45 +0200 Christian Boltz wrote:

> Hello,
> 
> On Friday, 15 April 2011, Randy Dunlap wrote:
> > On Fri, 15 Apr 2011 21:35:32 +0200 Christian Boltz wrote:
> > > I'd like to have the exact opposite of it: beep when pinging fails.
> [...]
> > Couldn't you look for exit code (status) 1 and then do a bell/beep
> > (or play a sound file :)?
> 
> That would require that I know in advance when exactly the server is 
> unreachable - but in this case, I wouldn't need to ping it ;-)

I didn't follow that, but it's OK.

> To have this working, ping would need an option "exit on error", which 
> it doesn't have AFAIK.

'man ping' discusses exit status codes:

       If  ping  does  not  receive any reply packets at all it will exit with
       code 1. If a packet count and deadline are both  specified,  and  fewer
       than  count  packets are received by the time the deadline has arrived,
       it will also exit with code 1.  On other error it exits  with  code  2.
       Otherwise  it exits with code 0. This makes it possible to use the exit
       code to see if a host is alive or not.
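The exit codes quoted above are enough to tell "no reply" apart from other failures. A minimal sketch, assuming a caller that already has ping's exit code (for instance from waitpid()); the enum and function names are made up for illustration, and mapping every code other than 0 and 1 to an error follows the man page's "on other error" wording.

```c
#include <assert.h>

enum ping_result {
	PING_ALIVE,	/* exit 0: replies received */
	PING_NO_REPLY,	/* exit 1: no reply (or too few before the deadline) */
	PING_ERROR	/* exit 2, or any other code: some other error */
};

enum ping_result classify_ping_exit(int code)
{
	assert(code >= 0);
	switch (code) {
	case 0:
		return PING_ALIVE;
	case 1:
		return PING_NO_REPLY;
	default:
		return PING_ERROR;
	}
}

/* "Inverted ping -a": beep whenever the host did not answer. */
int should_beep(int code)
{
	return classify_ping_exit(code) != PING_ALIVE;
}
```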


> A workaround is to run ping -c1 in a loop:
> 
> while true ; do
>     ping -c1 $server || beep
>     sleep 1
> done
> 
> but I'd prefer to have something like this directly in ping ;-)
> 
> > Or do you want ping to beep and then continue running?
> 
> Yes, that's exactly what I want.


---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply

* net: Automatic IRQ siloing for network devices
From: Neil Horman @ 2011-04-15 20:17 UTC (permalink / raw)
  To: netdev; +Cc: davem

Automatic IRQ siloing for network devices

At last year's netconf:
http://vger.kernel.org/netconf2010.html

Tom Herbert gave a talk in which he outlined some of the things we can do to
improve scalability and throughput in our network stack.

One of the big items on the slides was the notion of siloing irqs, which is the
practice of setting irq affinity to a cpu or cpu set that was 'close' to the
process that would be consuming data.  The idea was to ensure that a hard irq
for a nic (and its subsequent softirq) would execute on the same cpu as the
process consuming the data, increasing cache hit rates and speeding up overall
throughput.

I had taken an idea away from that talk, and have finally gotten around to
implementing it.  One of the problems with the above approach is that it's all
quite manual.  I.e. to properly enact this siloing, you have to do a few things
by hand:

1) decide which process is the heaviest user of a given rx queue 
2) restrict the cpus which that task will run on
3) identify the irq which the rx queue in (1) maps to
4) manually set the affinity for the irq in (3) to cpus which match the cpus in
(2)
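Of the manual steps above, only step 4 is mechanical enough to script directly. A userspace sketch, assuming the standard /proc/irq/<n>/smp_affinity file (writable by root, hex cpumask) and, for simplicity, CPUs numbered below 32 so the mask fits in a single hex word; the helper names are hypothetical, and steps 1 to 3 (finding the heaviest consumer and its irq) are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the path of the writable affinity file for an irq. */
int irq_affinity_path(unsigned int irq, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hex mask for the cpus chosen in step 2 (assumes every cpu < 32). */
int cpus_to_hexmask(const unsigned int *cpus, size_t ncpus, char *buf, size_t len)
{
	unsigned long mask = 0;
	size_t i;
	int n;

	assert(buf != NULL);
	for (i = 0; i < ncpus; i++) {
		if (cpus[i] >= 32)
			return -1;
		mask |= 1UL << cpus[i];
	}
	n = snprintf(buf, len, "%lx", mask);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Step 4: write the mask (needs root; the kernel may still reject it). */
int set_irq_affinity(unsigned int irq, const char *hexmask)
{
	char path[64];
	FILE *f;

	if (irq_affinity_path(irq, path, sizeof(path)) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%s\n", hexmask) < 0) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}
```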

That configuration of course has to change in response to workload changes (what
if your consumer process gets reworked so that it's no longer the largest network
user, etc).

I thought it would be good if we could automate some amount of this, and I think
I've found a way to do that.  With this patch set I introduce the ability to:

A) Register common affinity monitoring routines against a given irq which can
implement various algorithms to determine a suggested placement of said irq's
affinity

B) Add an algorithm to the network subsystem to track the amount of data that
flows through each entry in a given rx queue's rps_flow_table, and use that data
to suggest an affinity for the irq associated with that rx queue.
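Item B boils down to "add each packet's length to its flow's weight, and hint the CPU of the heaviest flow". A simplified userspace model of that selection, assuming a flat array in place of rps_flow_table; MODEL_NO_CPU stands in for RPS_NO_CPU, and the function names are hypothetical. Resetting a flow's weight when RFS moves it mirrors what the patch does in set_rps_cpu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_NO_CPU 0xffff	/* stands in for RPS_NO_CPU */

struct model_flow {
	uint16_t cpu;		/* cpu RFS last steered this flow to */
	uint32_t weight;	/* bytes seen since the last steering change */
};

/* Account one received packet of len bytes against a flow. */
void flow_account(struct model_flow *f, uint32_t len)
{
	f->weight += len;
}

/* When RFS moves a flow, its old byte count no longer applies. */
void flow_move(struct model_flow *f, uint16_t new_cpu)
{
	f->cpu = new_cpu;
	f->weight = 0;
}

/* CPU of the heaviest flow that has a valid cpu, or MODEL_NO_CPU if no
 * such flow has any weight yet. */
uint16_t max_weight_cpu(const struct model_flow *flows, size_t n)
{
	uint32_t best_w = 0;
	uint16_t best_cpu = MODEL_NO_CPU;
	size_t i;

	assert(flows != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (flows[i].cpu == MODEL_NO_CPU)
			continue;
		if (flows[i].weight > best_w) {
			best_w = flows[i].weight;
			best_cpu = flows[i].cpu;
		}
	}
	return best_cpu;
}
```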

This patchset lets these affinity suggestions get exported via the
/proc/irq/<n>/affinity_hint interface (which is unused in the kernel with the
exception of ixgbe).  It also exports a new proc file affinity_alg which informs
anyone interested in the affinity_hint how the hint is being computed.

Testing:
	I've been running this patchset on my dual core system here with a cxgb4
as my network interface.  I've been running a netperf TCP_STREAM test in 2
minute increments under various conditions.  I've found experimentally that (as
you might expect) optimal performance is reached when irq affinity is bound to a
core that is not the cpu core identified by the largest RFS flow, but is as
close to it as possible (ideally sharing an L2 cache).  That way we avoid
cpu contention between the softirq and the application, while still
maximizing cache hits.  In conjunction with the irqbalance patch I hacked up
here:

http://people.redhat.com/nhorman/irqbalance.patch

(which steers irqs whose affinity hint is computed by the rfs max weight
algorithm to cpus that are as close as possible to the hinted cpu), I'm able
to get approximately a 3% speedup in receive rates over the pessimal case, and
about a 1% speedup over the nominal case (statically setting irq affinity to a
single cpu).

Note: Currently this patch set only updates cxgb4 to use the new hinting
mechanism.  If this gets accepted, I have more cards to test with and plan to
update them, but I thought for a first pass it would be better to simply update
what I tested with.

Thoughts/Opinions appreciated

Thanks & Regards
Neil

^ permalink raw reply

* [PATCH 1/3] irq: Add registered affinity guidance infrastructure
From: Neil Horman @ 2011-04-15 20:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, nhorman, Dimitris Michailidis, Thomas Gleixner,
	David Howells, Eric Dumazet, Tom Herbert
In-Reply-To: <1302898677-3833-1-git-send-email-nhorman@tuxdriver.com>

From: nhorman <nhorman@devel2.think-freely.org>

This patch adds the needed data to the irq_desc struct, as well as the needed
API calls to allow the requester of an irq to register a handler function to
determine the affinity_hint of that irq when queried from user space.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

CC: Dimitris Michailidis <dm@chelsio.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
---
 include/linux/interrupt.h |   38 +++++++++++++++++++++++++++++++++++++-
 include/linux/irq.h       |    9 +++++++++
 include/linux/irqdesc.h   |    4 ++++
 kernel/irq/Kconfig        |   12 +++++++++++-
 kernel/irq/manage.c       |   39 +++++++++++++++++++++++++++++++++++++++
 kernel/irq/proc.c         |   35 +++++++++++++++++++++++++++++++++++
 6 files changed, 135 insertions(+), 2 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 59b72ca..6edb364 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -118,6 +118,17 @@ struct irqaction {
 } ____cacheline_internodealigned_in_smp;
 
 extern irqreturn_t no_action(int cpl, void *dev_id);
+#ifdef CONFIG_AFFINITY_UPDATE
+extern int setup_affinity_data(int irq, irq_affinity_init_t, void *);
+#else
+static inline int setup_affinity_data(int irq,
+				      irq_affinity_init_t init, void *d)
+{
+	return 0;
+}
+#endif
+
+extern void free_irq(unsigned int, void *);
 
 #ifdef CONFIG_GENERIC_HARDIRQS
 extern int __must_check
@@ -125,6 +136,32 @@ request_threaded_irq(unsigned int irq, irq_handler_t handler,
 		     irq_handler_t thread_fn,
 		     unsigned long flags, const char *name, void *dev);
 
+#ifdef CONFIG_AFFINITY_UPDATE
+static inline int __must_check
+request_affinity_irq(unsigned int irq, irq_handler_t handler,
+		     irq_handler_t thread_fn,
+		     unsigned long flags, const char *name, void *dev,
+		     irq_affinity_init_t af_init, void *af_priv)
+{
+	int rc;
+
+	rc = request_threaded_irq(irq, handler, thread_fn, flags, name, dev);
+	if (rc)
+		goto out;
+
+	if (af_init)
+		rc = setup_affinity_data(irq, af_init, af_priv);
+	if (rc)
+		free_irq(irq, dev);
+
+out:
+	return rc;
+}
+#else
+#define request_affinity_irq(irq, hnd, tfn, flg, nm, dev, init, priv) \
+	request_threaded_irq(irq, hnd, tfn, flg, nm, dev)
+#endif
+
 static inline int __must_check
 request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
 	    const char *name, void *dev)
@@ -167,7 +204,6 @@ request_any_context_irq(unsigned int irq, irq_handler_t handler,
 static inline void exit_irq_thread(void) { }
 #endif
 
-extern void free_irq(unsigned int, void *);
 
 struct device;
 
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 1d3577f..4bff14f 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -30,6 +30,15 @@
 
 struct irq_desc;
 struct irq_data;
+struct affin_data {
+	void *priv;
+	char *affinity_alg;
+	void (*affin_update)(int irq, struct affin_data *ad);
+	void (*affin_cleanup)(int irq, struct affin_data *ad);
+};
+
+typedef int (*irq_affinity_init_t)(int, struct affin_data*, void *);
+
 typedef	void (*irq_flow_handler_t)(unsigned int irq,
 					    struct irq_desc *desc);
 typedef	void (*irq_preflow_handler_t)(struct irq_data *data);
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index 0021837..14a22fb 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -64,6 +64,10 @@ struct irq_desc {
 	struct timer_rand_state *timer_rand_state;
 	unsigned int __percpu	*kstat_irqs;
 	irq_flow_handler_t	handle_irq;
+#ifdef CONFIG_AFFINITY_UPDATE
+	struct affin_data	*af_data;
+#endif
+
 #ifdef CONFIG_IRQ_PREFLOW_FASTEOI
 	irq_preflow_handler_t	preflow_handler;
 #endif
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 09bef82..abaf19c 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -51,6 +51,17 @@ config IRQ_PREFLOW_FASTEOI
 config IRQ_FORCED_THREADING
        bool
 
+config AFFINITY_UPDATE
+	bool "Support irq affinity direction"
+	depends on GENERIC_HARDIRQS
+	---help---
+
+	Affinity updating adds the ability for requestors of irqs to
+	register affinity update methods against the irq in question.
+	In so doing, the requestor is informed every time user space
+	queries an irq for its optimal affinity, giving the requestor the
+	chance to tell user space where the irq can be optimally handled.
+
 config SPARSE_IRQ
 	bool "Support sparse irq numbering"
 	depends on HAVE_SPARSE_IRQ
@@ -64,6 +75,5 @@ config SPARSE_IRQ
 	    out the interrupt descriptors in a more NUMA-friendly way. )
 
 	  If you don't know what to do here, say N.
-
 endmenu
 endif
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index acd599a..257ea4d 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1159,6 +1159,17 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id)
 
 	unregister_handler_proc(irq, action);
 
+#ifdef CONFIG_AFFINITY_UPDATE
+	/*
+	 * Have to do this after we unregister proc accessors
+	 */
+	if (desc->af_data) {
+		if (desc->af_data->affin_cleanup)
+			desc->af_data->affin_cleanup(irq, desc->af_data);
+		kfree(desc->af_data);
+		desc->af_data = NULL;
+	}
+#endif
 	/* Make sure it's not being used on another CPU: */
 	synchronize_irq(irq);
 
@@ -1345,6 +1356,34 @@ int request_threaded_irq(unsigned int irq, irq_handler_t handler,
 }
 EXPORT_SYMBOL(request_threaded_irq);
 
+#ifdef CONFIG_AFFINITY_UPDATE
+int setup_affinity_data(int irq, irq_affinity_init_t af_init, void *af_priv)
+{
+	struct affin_data *data;
+	struct irq_desc *desc;
+	int rc;
+
+	desc = irq_to_desc(irq);
+	if (!desc)
+		return -ENOENT;
+
+	data = kzalloc(sizeof(struct affin_data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	rc = af_init(irq, data, af_priv);
+	if (rc) {
+		kfree(data);
+		return rc;
+	}
+
+	desc->af_data = data;
+
+	return 0;
+}
+EXPORT_SYMBOL(setup_affinity_data);
+#endif
+
 /**
  *	request_any_context_irq - allocate an interrupt line
  *	@irq: Interrupt line to allocate
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 4cc2e5e..8fecb05 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -42,6 +42,11 @@ static int irq_affinity_hint_proc_show(struct seq_file *m, void *v)
 	if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
 		return -ENOMEM;
 
+#ifdef CONFIG_AFFINITY_UPDATE
+	if (desc->af_data && desc->af_data->affin_update)
+		desc->af_data->affin_update((long)m->private, desc->af_data);
+#endif
+
 	raw_spin_lock_irqsave(&desc->lock, flags);
 	if (desc->affinity_hint)
 		cpumask_copy(mask, desc->affinity_hint);
@@ -54,6 +59,19 @@ static int irq_affinity_hint_proc_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int irq_affinity_alg_proc_show(struct seq_file *m, void *v)
+{
+	char *alg = "none";
+#ifdef CONFIG_AFFINITY_UPDATE
+	struct irq_desc *desc = irq_to_desc((long)m->private);
+
+	if (desc->af_data->affinity_alg)
+		alg = desc->af_data->affinity_alg;
+#endif
+	seq_printf(m, "%s\n", alg);
+	return 0;
+}
+
 #ifndef is_affinity_mask_valid
 #define is_affinity_mask_valid(val) 1
 #endif
@@ -110,6 +128,11 @@ static int irq_affinity_hint_proc_open(struct inode *inode, struct file *file)
 	return single_open(file, irq_affinity_hint_proc_show, PDE(inode)->data);
 }
 
+static int irq_affinity_alg_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, irq_affinity_alg_proc_show, PDE(inode)->data);
+}
+
 static const struct file_operations irq_affinity_proc_fops = {
 	.open		= irq_affinity_proc_open,
 	.read		= seq_read,
@@ -125,6 +148,13 @@ static const struct file_operations irq_affinity_hint_proc_fops = {
 	.release	= single_release,
 };
 
+static const struct file_operations irq_affinity_alg_proc_fops = {
+	.open		= irq_affinity_alg_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
 static int default_affinity_show(struct seq_file *m, void *v)
 {
 	seq_cpumask(m, irq_default_affinity);
@@ -288,6 +318,11 @@ void register_irq_proc(unsigned int irq, struct irq_desc *desc)
 	/* create /proc/irq/<irq>/affinity_hint */
 	proc_create_data("affinity_hint", 0400, desc->dir,
 			 &irq_affinity_hint_proc_fops, (void *)(long)irq);
+#ifdef CONFIG_AFFINITY_UPDATE
+	/* Create /proc/irq/<irq>/affinity_alg */
+	proc_create_data("affinity_alg", 0400, desc->dir,
+			&irq_affinity_alg_proc_fops, (void *)(long)irq);
+#endif
 
 	proc_create_data("node", 0444, desc->dir,
 			 &irq_node_proc_fops, (void *)(long)irq);
-- 
1.7.4.2
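The registration flow added by this patch (allocate an affin_data, let the requester's init callback fill it in, attach it only on success, tear it down on free) can be exercised in userspace with a mock descriptor. A sketch under that assumption: `struct affin_data` and `irq_affinity_init_t` copy the patch's definitions, while `model_irq_desc`, the `model_*` functions and `demo_init()` are hypothetical stand-ins. `model_affinity_alg()` includes the NULL check on af_data that patch 2/3 adds to irq_affinity_alg_proc_show().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as the structure added to include/linux/irq.h. */
struct affin_data {
	void *priv;
	char *affinity_alg;
	void (*affin_update)(int irq, struct affin_data *ad);
	void (*affin_cleanup)(int irq, struct affin_data *ad);
};
typedef int (*irq_affinity_init_t)(int, struct affin_data *, void *);

/* Hypothetical stand-in for struct irq_desc: only the new field. */
struct model_irq_desc {
	struct affin_data *af_data;
};

/* Hypothetical init callback: refuses to set up without private data. */
static int demo_init(int irq, struct affin_data *ad, void *priv)
{
	(void)irq;
	if (!priv)
		return -EINVAL;
	ad->priv = priv;
	ad->affinity_alg = "demo:static";
	return 0;
}

/* Same flow as setup_affinity_data(): attach data only if init succeeds. */
int model_setup_affinity_data(struct model_irq_desc *desc, int irq,
			      irq_affinity_init_t af_init, void *af_priv)
{
	struct affin_data *data;
	int rc;

	if (!desc)
		return -ENOENT;
	data = calloc(1, sizeof(*data));
	if (!data)
		return -ENOMEM;
	rc = af_init(irq, data, af_priv);
	if (rc) {
		free(data);
		return rc;
	}
	desc->af_data = data;
	return 0;
}

/* What /proc/irq/<n>/affinity_alg would report, "none" if unset. */
const char *model_affinity_alg(const struct model_irq_desc *desc)
{
	if (desc->af_data && desc->af_data->affinity_alg)
		return desc->af_data->affinity_alg;
	return "none";
}

/* Mirrors the teardown in __free_irq(): cleanup callback, then free. */
void model_free_affinity_data(struct model_irq_desc *desc, int irq)
{
	if (desc->af_data) {
		if (desc->af_data->affin_cleanup)
			desc->af_data->affin_cleanup(irq, desc->af_data);
		free(desc->af_data);
		desc->af_data = NULL;
	}
}
```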


^ permalink raw reply related

* [PATCH 2/3] net: Add net device irq siloing feature
From: Neil Horman @ 2011-04-15 20:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, Neil Horman, Dimitris Michailidis, Thomas Gleixner,
	David Howells, Eric Dumazet, Tom Herbert
In-Reply-To: <1302898677-3833-1-git-send-email-nhorman@tuxdriver.com>

Using the irq affinity infrastructure, we can now allow net devices to call
request_irq using a new wrapper function (request_net_irq), which will attach a
common affinity_update handler to each requested irq.  This affinity update
mechanism correlates each tracked irq to the flow(s) that said irq processes
most frequently.  The highest traffic flow is noted, marked and exported to user
space via the affinity_hint proc file for each irq. In this way, utilities like
irqbalance are able to determine which cpu is receiving the most data from each
rx queue on a given NIC, and set irq affinity accordingly.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

CC: Dimitris Michailidis <dm@chelsio.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
---
 include/linux/netdevice.h  |   18 +++++++
 kernel/irq/proc.c          |    2 +-
 net/Kconfig                |   12 +++++
 net/core/dev.c             |  107 ++++++++++++++++++++++++++++++++++++++++++++
 net/core/sysctl_net_core.c |    9 ++++
 5 files changed, 147 insertions(+), 1 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5eeb2cd..ba6191f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -609,6 +609,9 @@ struct rps_map {
 struct rps_dev_flow {
 	u16 cpu;
 	u16 filter;
+#ifdef CONFIG_RFS_SILOING
+	u32 weight;
+#endif
 	unsigned int last_qtail;
 };
 #define RPS_NO_FILTER 0xffff
@@ -1631,6 +1634,21 @@ static inline void unregister_netdevice(struct net_device *dev)
 	unregister_netdevice_queue(dev, NULL);
 }
 
+#ifdef CONFIG_RFS_SILOING
+extern int netdev_rxq_silo_init(int irq, struct affin_data *afd, void *priv);
+extern int sysctl_irq_siloing_period;
+
+static inline int __must_check
+request_net_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
+		const char *name, void *dev, struct net_device *ndev, int rxq)
+{
+	return request_affinity_irq(irq, handler, NULL, flags, name, dev,
+				    netdev_rxq_silo_init, &ndev->_rx[rxq]);
+}
+#else
+#define request_net_irq(i, h, f, n, d, nd, r) request_irq(i, h, f, n, d)
+#endif
+
 extern int 		netdev_refcnt_read(const struct net_device *dev);
 extern void		free_netdev(struct net_device *dev);
 extern void		synchronize_net(void);
diff --git a/kernel/irq/proc.c b/kernel/irq/proc.c
index 8fecb05..d5a7e4d 100644
--- a/kernel/irq/proc.c
+++ b/kernel/irq/proc.c
@@ -65,7 +65,7 @@ static int irq_affinity_alg_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_AFFINITY_UPDATE
 	struct irq_desc *desc = irq_to_desc((long)m->private);
 
-	if (desc->af_data->affinity_alg)
+	if (desc->af_data && desc->af_data->affinity_alg)
 		alg = desc->af_data->affinity_alg;
 #endif
 	seq_printf(m, "%s\n", alg);
diff --git a/net/Kconfig b/net/Kconfig
index 79cabf1..d6ef6f5 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -232,6 +232,18 @@ config XPS
 	depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
 	default y
 
+config RFS_SILOING
+	boolean
+	depends on RFS_ACCEL && AFFINITY_UPDATE
+	default y
+	---help---
+	 This feature allows appropriately enabled network drivers to
+	 export affinity_hint data to user space based on the RFS flow hash
+	 table for the rx queue associated with a given interrupt.  This allows
+	 userspace to optimize irq affinity such that a given rx queue has its
+	 interrupt serviced on the same cpu/l2 cache/numa node running the process
+	 that consumes most of its data.
+
 menu "Network testing"
 
 config NET_PKTGEN
diff --git a/net/core/dev.c b/net/core/dev.c
index 0b88eba..4d86137 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -173,6 +173,9 @@
 #define PTYPE_HASH_SIZE	(16)
 #define PTYPE_HASH_MASK	(PTYPE_HASH_SIZE - 1)
 
+#ifdef CONFIG_RFS_SILOING
+int sysctl_irq_siloing_period;
+#endif
 static DEFINE_SPINLOCK(ptype_lock);
 static struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 static struct list_head ptype_all __read_mostly;	/* Taps */
@@ -2640,6 +2643,9 @@ set_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		rflow->filter = rc;
 		if (old_rflow->filter == rflow->filter)
 			old_rflow->filter = RPS_NO_FILTER;
+#ifdef CONFIG_RFS_SILOING
+		old_rflow->weight = rflow->weight = 0;
+#endif
 	out:
 #endif
 		rflow->last_qtail =
@@ -2723,6 +2729,10 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb,
 		      rflow->last_qtail)) >= 0))
 			rflow = set_rps_cpu(dev, skb, rflow, next_cpu);
 
+#ifdef CONFIG_RFS_SILOING
+		rflow->weight += skb->len;
+#endif
+
 		if (tcpu != RPS_NO_CPU && cpu_online(tcpu)) {
 			*rflowp = rflow;
 			cpu = tcpu;
@@ -6224,6 +6234,103 @@ static struct hlist_head *netdev_create_hash(void)
 	return hash;
 }
 
+#ifdef CONFIG_RFS_SILOING
+struct netdev_rxq_affin_data {
+	struct netdev_rx_queue *q;
+	unsigned long last_update;
+	cpumask_var_t affinity_mask;
+};
+
+static void netdev_rxq_silo_affin_update(int irq, struct affin_data *afd)
+{
+	struct netdev_rxq_affin_data *afdp = afd->priv;
+	struct netdev_rx_queue *q = afdp->q;
+	struct rps_dev_flow_table *flow_table;
+	int i;
+	u16 tcpu;
+	u32 mw;
+	unsigned long next_update;
+
+	mw = tcpu = 0;
+
+	next_update = afdp->last_update + (sysctl_irq_siloing_period * HZ);
+
+	if (time_after(next_update, jiffies))
+		return;
+
+	afdp->last_update = jiffies;
+
+	irq_set_affinity_hint(irq, NULL);
+	cpumask_clear(afdp->affinity_mask);
+	rcu_read_lock();
+	flow_table = rcu_dereference(q->rps_flow_table);
+
+	if (!flow_table)
+		goto out;
+
+	for (i = 0; (i & flow_table->mask) == i; i++) {
+		u16 cpu = ACCESS_ONCE(flow_table->flows[i].cpu);
+		if (cpu == RPS_NO_CPU)
+			continue;
+		if (mw < flow_table->flows[i].weight) {
+			mw = flow_table->flows[i].weight;
+			tcpu = cpu;
+		}
+	}
+
+	if (mw) {
+		cpumask_set_cpu(tcpu, afdp->affinity_mask);
+		irq_set_affinity_hint(irq, afdp->affinity_mask);
+	}
+out:
+	rcu_read_unlock();
+	return;
+}
+
+static void netdev_rxq_silo_cleanup(int irq, struct affin_data *afd)
+{
+	struct netdev_rxq_affin_data *afdp = afd->priv;
+
+	free_cpumask_var(afdp->affinity_mask);
+	kfree(afdp);
+	afd->priv = NULL;
+}
+
+/**
+ *	netdev_rxq_silo_init - setup an irq to be siloed
+ *
+ *	initializes the irq data required to allow the networking
+ *	subsystem to determine which cpu is best suited to
+ *	service the passed in irq, and then export that data
+ *	via the affinity_hint proc interface
+ */
+int netdev_rxq_silo_init(int irq, struct affin_data *afd, void *priv)
+{
+	struct netdev_rxq_affin_data *afdp;
+
+	afd->priv = afdp = kzalloc(sizeof(struct netdev_rxq_affin_data),
+				   GFP_KERNEL);
+	if (!afdp)
+		return -ENOMEM;
+
+	if (!alloc_cpumask_var(&afdp->affinity_mask, GFP_KERNEL)) {
+		kfree(afdp);
+		return -ENOMEM;
+	}
+
+	cpumask_clear(afdp->affinity_mask);
+
+	afdp->q = priv;
+	afdp->last_update = jiffies;
+	afd->affin_update = netdev_rxq_silo_affin_update;
+	afd->affin_cleanup = netdev_rxq_silo_cleanup;
+	afd->affinity_alg = "net:rfs max weight";
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(netdev_rxq_silo_init);
+#endif
+
 /* Initialize per network namespace state */
 static int __net_init netdev_init(struct net *net)
 {
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 385b609..b5c733e 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -158,6 +158,15 @@ static struct ctl_table net_core_table[] = {
 		.proc_handler	= rps_sock_flow_sysctl
 	},
 #endif
+#ifdef CONFIG_RFS_SILOING
+	{
+		.procname	= "irq_siloing_period",
+		.data		= &sysctl_irq_siloing_period,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.procname	= "netdev_budget",
-- 
1.7.4.2
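netdev_rxq_silo_affin_update() recomputes the hint at most once per irq_siloing_period seconds, using a jiffies comparison that stays correct across counter wraparound. A userspace model of just that check, assuming a non-negative period and a caller-supplied tick rate in place of HZ; `model_time_after()` reproduces the usual time_after() definition, and `silo_update_due()` is a hypothetical name. With the sysctl left at its default of 0, every read of affinity_hint triggers a recomputation.

```c
#include <assert.h>
#include <limits.h>

/* Same test as the kernel's time_after(a, b): true if a is later than b,
 * even when the counter has wrapped between the two values. */
static int model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Return 1 if a new hint should be computed at tick 'now', 0 otherwise.
 * Mirrors: next_update = last_update + period * HZ;
 *          if (time_after(next_update, jiffies)) return; */
int silo_update_due(unsigned long last_update, unsigned long now,
		    int period_s, unsigned long hz)
{
	unsigned long next_update;

	assert(period_s >= 0);	/* assumption: negative periods not handled */
	next_update = last_update + (unsigned long)period_s * hz;
	return !model_time_after(next_update, now);
}
```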


^ permalink raw reply related

* [PATCH 3/3] net: Adding siloing irqs to cxgb4 driver
From: Neil Horman @ 2011-04-15 20:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, Neil Horman, Dimitris Michailidis, Thomas Gleixner,
	David Howells, Eric Dumazet, Tom Herbert
In-Reply-To: <1302898677-3833-1-git-send-email-nhorman@tuxdriver.com>

cxgb4 hardware has been tested here and shows correct functionality with
the affinity hinting infrastructure.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

CC: Dimitris Michailidis <dm@chelsio.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Eric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
---
 drivers/net/cxgb4/cxgb4_main.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/cxgb4/cxgb4_main.c b/drivers/net/cxgb4/cxgb4_main.c
index 5352c8a..11aeef6 100644
--- a/drivers/net/cxgb4/cxgb4_main.c
+++ b/drivers/net/cxgb4/cxgb4_main.c
@@ -562,9 +562,11 @@ static int request_msix_queue_irqs(struct adapter *adap)
 		return err;
 
 	for_each_ethrxq(s, ethqidx) {
-		err = request_irq(adap->msix_info[msi].vec, t4_sge_intr_msix, 0,
+		err = request_net_irq(adap->msix_info[msi].vec, t4_sge_intr_msix, 0,
 				  adap->msix_info[msi].desc,
-				  &s->ethrxq[ethqidx].rspq);
+				  &s->ethrxq[ethqidx].rspq,
+				  adap->port[ethqidx/MAX_NPORTS],
+				  ethqidx % MAX_NPORTS);
 		if (err)
 			goto unwind;
 		msi++;
-- 
1.7.4.2
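The new request_net_irq() call derives a port and a per-port queue number from the rx queue index, using ethqidx / MAX_NPORTS and ethqidx % MAX_NPORTS. The small C sketch below reproduces only that arithmetic so the resulting mapping can be inspected. It assumes MAX_NPORTS is 4, since its value is not shown in this patch. map_ethq() and struct qmap are hypothetical names. The sketch makes no claim that this matches how cxgb4 actually assigns queues to ports.

```c
#include <stdio.h>

/* Assumed value; the patch uses MAX_NPORTS without showing its definition. */
#define MAX_NPORTS 4

struct qmap {
    int port;   /* index used for adap->port[] */
    int queue;  /* value passed as request_net_irq()'s last argument */
};

/* Mirror the expressions in the patch. */
struct qmap map_ethq(int ethqidx)
{
    struct qmap m;

    m.port = ethqidx / MAX_NPORTS;
    m.queue = ethqidx % MAX_NPORTS;
    return m;
}

/* Print the mapping for the first nqueues rx queue indices. */
void print_map(int nqueues)
{
    int i;

    for (i = 0; i < nqueues; i++) {
        struct qmap m = map_ethq(i);
        printf("ethqidx %d -> port %d, queue %d\n", i, m.port, m.queue);
    }
}
```

Under that assumption, the mapping is port-major with four queues per port. An ethqidx of 16 or more therefore yields a port index of 4 or more, which would fall outside an array of MAX_NPORTS entries.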



* Re: Feature request: "inverted" ping -a (beep on failure)
From: Denys Fedoryshchenko @ 2011-04-15 20:10 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Christian Boltz, netdev
In-Reply-To: <20110415124937.6e746646.rdunlap@xenotime.net>

On Fri, 15 Apr 2011 12:49:37 -0700, Randy Dunlap wrote:
> On Fri, 15 Apr 2011 21:35:32 +0200 Christian Boltz wrote:
>
>> Hello,
>>
>> ping -a (beep on ping success) is quite a useful option, but it can be
>> annoying.
>>
>> I'd like to have the exact opposite of it: beep when pinging fails.
>>
>> I understand that this is slightly difficult because "ping success" is
>> easier to detect (incoming packet) than "ping failure" (no incoming
>> packet or firewall reject); my proposal is to have a timeout for every
>> packet (if no reply packet comes in) and beep if no reply is seen
>> after the timeout is over.
>>
>> For the timeout, the -W option could be used. The default timeout seems
>> to be 10 seconds, which is OK.
>>
>> Use case / why this would be useful for me:
>> Basically for server monitoring. The exact use case is that I have rented
>> a "root server" and asked the hoster to replace a broken hard disk.
>> With the "inverted" ping -a, it would be easy to notice when they switch
>> off the server to replace the disk.
>>
>> Please consider this feature for the next version of ping ;-)
>>
>>
>> (The iputils homepage does not list any bug tracker or similar, therefore
>> I'm asking here.)
>
> Couldn't you look for exit code (status) 1 and then do a bell/beep
> (or play a sound file :)?
>
> Or do you want ping to beep and then continue running?
>
I wrote my own tool and call it ping watchdog (I saw the idea of a ping
watchdog in other projects and just improved it a little) :-)
It may be useful here: it can run a script if ping fails for more than N
packets. It is a bit undocumented and cryptic, but I can improve it.

http://code.google.com/p/sysadmin-tools/source/browse/trunk/pingwdog/pingwdog.c
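Randy's suggestion to act on ping's exit status can be combined with the pingwdog idea of alerting after N lost replies in a short loop. The C sketch below is not pingwdog's code, and ping_once(), watchdog_update() and run_watchdog() are hypothetical names. It assumes iputils ping, whose man page documents exit status 1 when no reply is received and 2 on other errors. It also passes the host to the shell unquoted, so treat it as an illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send one echo request and wait up to timeout_s seconds (-W) for a reply.
 * Returns ping's exit status: 0 = reply, 1 = no reply, 2 = other error.
 * NOTE: host is interpolated into a shell command unquoted (illustration only). */
int ping_once(const char *host, int timeout_s)
{
    char cmd[256];
    int status;

    snprintf(cmd, sizeof(cmd), "ping -c 1 -W %d %s >/dev/null 2>&1",
             timeout_s, host);
    status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return 2;
    return WEXITSTATUS(status);
}

struct watchdog {
    int threshold;  /* consecutive failures needed before alerting */
    int misses;     /* current run of consecutive failures */
};

/* Feed one result. Returns 1 while the run of failures has reached the
 * threshold (so the alert repeats while the host stays down), else 0.
 * Any success resets the run. */
int watchdog_update(struct watchdog *w, int ping_ok)
{
    if (ping_ok) {
        w->misses = 0;
        return 0;
    }
    w->misses++;
    return w->misses >= w->threshold;
}

/* Loop forever, ringing the terminal bell ('\a') while the host is down. */
void run_watchdog(const char *host, int timeout_s, int threshold)
{
    struct watchdog w = { threshold, 0 };

    for (;;) {
        int ok = ping_once(host, timeout_s) == 0;

        if (watchdog_update(&w, ok)) {
            putchar('\a');
            fflush(stdout);
        }
        sleep(1);
    }
}
```

Unlike ping -a, this keeps running and beeps only while replies are missing. A threshold above 1 avoids alerts on a single lost packet.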


