Netdev List
 help / color / mirror / Atom feed
* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Nicolas de Pesloüan @ 2011-02-20 12:12 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jay Vosburgh, David Miller, kaber, eric.dumazet, netdev,
	shemminger, andy, Fischer, Anna
In-Reply-To: <20110220103609.GA2750@psychotron.redhat.com>

Le 20/02/2011 11:36, Jiri Pirko a écrit :
> Sat, Feb 19, 2011 at 09:27:37PM CET, nicolas.2p.debian@gmail.com wrote:
>> Le 19/02/2011 14:46, Jiri Pirko a écrit :
>>> Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@gmail.com wrote:
>>>> Le 19/02/2011 12:28, Jiri Pirko a écrit :
>>>>> Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>>>>>> Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>>> Le 19/02/2011 09:05, Jiri Pirko a écrit :
>>>>>>>> This patch converts bonding to use rx_handler. Results in cleaner
>>>>>>>> __netif_receive_skb() with much less exceptions needed. Also
>>>>>>>> bond-specific work is moved into bond code.
>>>>>>>>
>>>>>>>> Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>>>>>
>>>>>>>> v1->v2:
>>>>>>>>          using skb_iif instead of new input_dev to remember original
>>>>>>>> 	device
>>>>>>>> v2->v3:
>>>>>>>> 	set orig_dev = skb->dev if skb_iif is set
>>>>>>>>
>>>>>>>
>>>>>>> Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>>>>>
>>>>>>> Bonding used to be handled with very few overhead, simply replacing
>>>>>>> skb->dev with skb->dev->master. Time has passed and we eventually
>>>>>>> added many special processing for bonding into __netif_receive_skb(),
>>>>>>> but the overhead remained very light.
>>>>>>>
>>>>>>> Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>>>>>
>>>>>>> Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>>>> whatever need to be delivered, to whoever need, inside the loop ?
>>>>>>>
>>>>>>> rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>>>> while (rx_handler) {
>>>>>>> 	/* ...  */
>>>>>>> 	orig_dev = skb->dev;
>>>>>>> 	skb = rx_handler(skb);
>>>>>>> 	/* ... */
>>>>>>> 	rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>>>> }
>>>>>>>
>>>>>>> This would reduce the overhead, while still allowing nesting: vlan on
>>>>>>> top on bonding, bridge on top on bonding, ...
>>>>>>
>>>>>> I see your point. Makes sense to me. But the loop would have to include
>>>>>> at least processing of ptype_all too. I'm going to cook a follow-up
>>>>>> patch.
>>>>>>
>>>>>
>>>>> DRAFT (doesn't modify rx_handlers):
>>>>>
>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>> index 4ebf7fe..e5dba47 100644
>>>>> --- a/net/core/dev.c
>>>>> +++ b/net/core/dev.c
>>>>> @@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>   {
>>>>>   	struct packet_type *ptype, *pt_prev;
>>>>>   	rx_handler_func_t *rx_handler;
>>>>> +	struct net_device *dev;
>>>>>   	struct net_device *orig_dev;
>>>>>   	struct net_device *null_or_dev;
>>>>>   	int ret = NET_RX_DROP;
>>>>> @@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>   	if (netpoll_receive_skb(skb))
>>>>>   		return NET_RX_DROP;
>>>>>
>>>>> -	__this_cpu_inc(softnet_data.processed);
>>>>> +	skb->skb_iif = skb->dev->ifindex;
>>>>> +	orig_dev = skb->dev;
>>>>
>>>> orig_dev should be set inside the loop, to reflect "previously
>>>> crossed device", while following the path:
>>>>
>>>> eth0 ->   bond0 ->   br0.
>>>>
>>>> First step inside loop:
>>>>
>>>> orig_dev = eth0
>>>> skb->dev = bond0 (at the end of the loop).
>>>>
>>>> Second step inside loop:
>>>>
>>>> orig_dev = bond0
>>>> skb->dev = br0 (et the end of the loop).
>>>>
>>>> This would allow for exact match delivery to bond0 if someone bind there.
>>>>
>>>>> +
>>>>>   	skb_reset_network_header(skb);
>>>>>   	skb_reset_transport_header(skb);
>>>>>   	skb->mac_len = skb->network_header - skb->mac_header;
>>>>> @@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>
>>>>>   	rcu_read_lock();
>>>>>
>>>>> -	if (!skb->skb_iif) {
>>>>> -		skb->skb_iif = skb->dev->ifindex;
>>>>> -		orig_dev = skb->dev;
>>>>> -	} else {
>>>>> -		orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>>>> -	}
>>>>
>>>> I like the fact that it removes the above part.
>>>>
>>>>> +another_round:
>>>>> +	__this_cpu_inc(softnet_data.processed);
>>>>> +	dev = skb->dev;
>>>>>
>>>>>   #ifdef CONFIG_NET_CLS_ACT
>>>>>   	if (skb->tc_verd&    TC_NCLS) {
>>>>> @@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>   #endif
>>>>>
>>>>>   	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>> -		if (!ptype->dev || ptype->dev == skb->dev) {
>>>>> +		if (!ptype->dev || ptype->dev == dev) {
>>>>>   			if (pt_prev)
>>>>>   				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>   			pt_prev = ptype;
>>>>
>>>> Inside the loop, we should only do exact match delivery, for
>>>> &ptype_all and for&ptype_base[ntohs(type)&   PTYPE_HASH_MASK]:
>>>>
>>>>         list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>> -               if (!ptype->dev || ptype->dev == dev) {
>>>> +               if (ptype->dev == dev) {
>>>>                         if (pt_prev)
>>>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>                         pt_prev = ptype;
>>>>                 }
>>>>         }
>>>>
>>>>
>>>>         list_for_each_entry_rcu(ptype,
>>>>                         &ptype_base[ntohs(type)&   PTYPE_HASH_MASK], list) {
>>>>                 if (ptype->type == type&&
>>>> -                   (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>> +                   (ptype->dev == skb->dev)) {
>>>>                         if (pt_prev)
>>>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>                         pt_prev = ptype;
>>>>                 }
>>>>         }
>>>>
>>>> After leaving the loop, we can do wilcard delivery, if skb is not NULL.
>>>>
>>>>         list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>> -               if (!ptype->dev || ptype->dev == dev) {
>>>> +               if (!ptype->dev) {
>>>>                         if (pt_prev)
>>>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>                         pt_prev = ptype;
>>>>                }
>>>>         }
>>>>
>>>>
>>>>         list_for_each_entry_rcu(ptype,
>>>>                         &ptype_base[ntohs(type)&   PTYPE_HASH_MASK], list) {
>>>> -               if (ptype->type == type&&
>>>> -                   (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>> +		if (ptype->type == type&&   !ptype->dev) {
>>>>                         if (pt_prev)
>>>>                                 ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>                         pt_prev = ptype;
>>>>                 }
>>>>         }
>>>>
>>>> This would reduce the number of tests inside the
>>>> list_for_each_entry_rcu() loops. And because we match only ptype->dev
>>>> == dev inside the loop and !ptype->dev outside the loop, this should
>>>> avoid duplicate delivery.
>>>
>>> Would you care to put this into patch so I can see the whole picture?
>>> Thanks.
>>
>> Here is what I have in mind. It is based on your previous DRAFT patch, and don't modify rx_handlers yet.
>>
>> Only compile tested !!
>>
>> I don't know if every pieces are at the right place. I wonder what to
>> do with CONFIG_NET_CLS_ACT part, that currently is between ptype_all
>> and ptype_base processing.
>>
>> Anyway, the general idea is there.
>>
>> 	Nicolas.
>>
>> net/core/dev.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
>> 1 files changed, 60 insertions(+), 10 deletions(-)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index e5dba47..7e007a9 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> 	rx_handler_func_t *rx_handler;
>> 	struct net_device *dev;
>> 	struct net_device *orig_dev;
>> -	struct net_device *null_or_dev;
>> 	int ret = NET_RX_DROP;
>> 	__be16 type;
>>
>> @@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>> 	if (netpoll_receive_skb(skb))
>> 		return NET_RX_DROP;
>>
>> -	skb->skb_iif = skb->dev->ifindex;
>> -	orig_dev = skb->dev;
>> -
>> 	skb_reset_network_header(skb);
>> 	skb_reset_transport_header(skb);
>> 	skb->mac_len = skb->network_header - skb->mac_header;
>> @@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>
>> another_round:
>> 	__this_cpu_inc(softnet_data.processed);
>> +	skb->skb_iif = skb->dev->ifindex;
>> +	orig_dev = skb->dev;
> orig_dev should be set at the end of the loop. Now you are going to have
> it always the same as dev and skb->dev.
>

Yes, you are right.

I thinking about all this, I wonder what the protocol handlers expect as the orig_dev value ?

Lest imagine the following configuration: eth0 -> bond0 -> br0.

What does a protocol handler listening on br0 expect for orig_dev ? bond0 or eth0 ? Current 
implementation give eth0, but I think bond0 should be the right value, for proper nesting.

>> 	dev = skb->dev;
>>
>> #ifdef CONFIG_NET_CLS_ACT
>> @@ -3152,8 +3150,13 @@ another_round:
>> 	}
>> #endif
>>
>> +	/*
>> +	 * Deliver to ptype_all protocol handlers that match current dev.
>> +	 * This happens before rx_handler is given a chance to change skb->dev.
>> +	 */
>> +
>> 	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>> -		if (!ptype->dev || ptype->dev == dev) {
>> +		if (ptype->dev == dev) {
>> 			if (pt_prev)
>> 				ret = deliver_skb(skb, pt_prev, orig_dev);
>> 			pt_prev = ptype;
>> @@ -3167,6 +3170,31 @@ another_round:
>> ncls:
>> #endif
>>
>> +	/*
>> +	 * Deliver to ptype_base protocol handlers that match current dev.
>> +	 * This happens before rx_handler is given a chance to change skb->dev.
>> +	 */
>> +
>> +	type = skb->protocol;
>> +	list_for_each_entry_rcu(ptype,
>> +			&ptype_base[ntohs(type)&  PTYPE_HASH_MASK], list) {
>> +		if (ptype->type == type&&  ptype->dev == skb->dev) {
>> +			if (pt_prev)
>> +				ret = deliver_skb(skb, pt_prev, orig_dev);
>> +			pt_prev = ptype;
>> +		}
>> +	}
>
> I'm not sure it is ok to deliver ptype_base here. See comment above
> ptype_head() (I'm not sure I understand that correctly)

Anyway, all this is probably plain wrong: Delivering the skb to protocol handlers while still 
changing the skb is guaranteed to cause strange behaviors.

If we want to be able to deliver the skb to different protocol handlers and give all of them the 
right values for dev->skb and orig_dev (or previous_dev), we might end up with copying the skb. I 
hate the idea, but currently can't find a cleaner way to do so.

We first need to clarify what orig_dev should be, as stated above.

>> +
>> +	/*
>> +	 * Call rx_handler for current device.
>> +	 * If rx_handler return NULL, skip wilcard protocol handler delivery.
>> +	 * Else, if skb->dev changed, restart the whole delivery process, to
>> +	 * allow for device nesting.
>> +	 *
>> +	 * Warning:
>> +	 * rx_handlers must kfree_skb(skb) if they return NULL.
> Well this is not true. They can return NULL and call netif_rx as they
> have before. No changes necessary I believe.

I don't really know. This needs to be double checked, anyway.

>> +	 */
>> +
>> 	rx_handler = rcu_dereference(dev->rx_handler);
>> 	if (rx_handler) {
>> 		if (pt_prev) {
>> @@ -3176,10 +3204,15 @@ ncls:
>> 		skb = rx_handler(skb);
>> 		if (!skb)
>> 			goto out;
>> -		if (dev != skb->dev)
>> +		if (skb->dev != dev)
>> 			goto another_round;
>> 	}
>>
>> +	/*
>> +	 * FIXME: The part below should use rx_handler instead of being hard
>> +	 * coded here.
> I'm not sure it is doable atm. For bridge and bond it should not be a
> problem, but for macvlan, there is possible to have macvlans and vlans
> on the same dev. This possibility should persist.
> /me scratches head on the idea to have multiple rx_handlers although it
> was his original idea....

I think your original proposal of having several rx_handlers per device was right.

At the time you introduced the rx_handler system, only bridge and macvlan used it. Even if using 
bridge and macvlan on the same base device might be useless, this is not true for every possible 
rx_handler configuration.

Now that we want to move bonding and vlan to the rx_handler system, it becomes obvious that we need 
several rx_handlers per device. At least, vlan should properly mix with bridge. And who know what 
would be the fifth rx_handler...

>> +	 */
>> +
>> 	if (vlan_tx_tag_present(skb)) {
>> 		if (pt_prev) {
>> 			ret = deliver_skb(skb, pt_prev, orig_dev);
>> @@ -3192,16 +3225,33 @@ ncls:
>> 			goto out;
>> 	}
>>
>> +	/*
>> +	 * FIXME: Can't this be moved into the rx_handler for bonding,
>> +	 * or into a futur rx_handler for vlan?
> This hook is something I do not like at all :/ But anyway if should be in vlan
> part I think.

Yes, and in order for the future rx_handler for vlan to properly handle it, it needs to know the 
device just below it, not the pure original device. Hence, my question about the exact meaning of 
orig_dev...

	Nicolas.

>> +	 */
>> +
>> 	vlan_on_bond_hook(skb);
>>
>> -	/* deliver only exact match when indicated */
>> -	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
>> +	/*
>> +	 * Deliver to wildcard ptype_all protocol handlers.
>> +	 */
>> +
>> +	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>> +		if (!ptype->dev) {
>> +			if (pt_prev)
>> +				ret = deliver_skb(skb, pt_prev, orig_dev);
>> +			pt_prev = ptype;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Deliver to wildcard ptype_all protocol handlers.
>> +	 */
>>
>> 	type = skb->protocol;
>> 	list_for_each_entry_rcu(ptype,
>> 			&ptype_base[ntohs(type)&  PTYPE_HASH_MASK], list) {
>> -		if (ptype->type == type&&
>> -		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>> +		if (ptype->type == type&&  !ptype->dev) {
>> 			if (pt_prev)
>> 				ret = deliver_skb(skb, pt_prev, orig_dev);
>> 			pt_prev = ptype;
>> --
>> 1.7.2.3
>>
>>
>>

^ permalink raw reply

* Re: [PATCH] net: fix unreg list corruption in dev_deactivate()
From: Stanislaw Gruszka @ 2011-02-20 12:44 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Octavian Purdila, David S. Miller
In-Reply-To: <1298203890.8559.54.camel@edumazet-laptop>

On Sun, Feb 20, 2011 at 01:11:30PM +0100, Eric Dumazet wrote:
> Hmm, you should read Eric B patch, he already addressed this problem a
> few hours ago.

Ajjj, my few hours of debugging wasted :-/

Stanislaw

^ permalink raw reply

* Re: IGMP and rwlock: Dead ocurred again on TILEPro
From: Chris Metcalf @ 2011-02-20 13:33 UTC (permalink / raw)
  To: Cypher Wu
  Cc: David Miller, xiyou.wangcong, linux-kernel, eric.dumazet, netdev
In-Reply-To: <AANLkTi=uQTQjrXn4_w-YwKCutpENFdFrxQed5kVKXTDF@mail.gmail.com>

On 2/18/2011 11:07 PM, Cypher Wu wrote:
> On Sat, Feb 19, 2011 at 5:51 AM, Chris Metcalf <cmetcalf@tilera.com> wrote:
>> I heard from one of our support folks that you were asking through that
>> channel, so I asked him to go ahead and give you the spinlock sources
>> directly.  I will be spending time next week syncing up our internal tree
>> with the public git repository so you'll see it on LKML at that time.
> I've got your source code, thank you very much.
>
> There is still two more question:
> 1. Why we merge the inlined code and the *_slow into none inlined functions?

Those functions were always borderline in terms of being sensible inlined
functions.  In my opinion, adding the SPR writes as well pushed them over
the edge, so I made them just straight function calls instead, for code
density reasons.  It also makes the code simpler, which is a plus.  And
since I was changing the read_lock versions I changed the write_lock
versions as well for consistency.

> 2. I've seen the use of 'mb()' in unlock operation, but we don't use
> that in the lock operation.

You don't need a memory barrier when acquiring a lock.  (Well, some
architectures require a read barrier, but Tile doesn't speculate loads past
control dependencies at the moment.)

> I've released a temporary version with that modification under our
> customer' demand, since they want to do a long time test though this
> weekend. I'll appreciate that if you gave some comment on my
> modifications:

It seems OK functionally, and has the advantage of addressing the deadlock
without changing the module API, so it's appropriate if you're trying to
maintain binary compatibility.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com

^ permalink raw reply

* [PATCH net-2.6] bnx2x: Add a missing bit for PXP parity register of 57712.
From: Vlad Zolotarov @ 2011-02-20 14:27 UTC (permalink / raw)
  To: Dave Miller, netdev@vger.kernel.org, Eilon Greenstein

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
---
 drivers/net/bnx2x/bnx2x_init.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/bnx2x/bnx2x_init.h b/drivers/net/bnx2x/bnx2x_init.h
index 5a268e9..fa6dbe3 100644
--- a/drivers/net/bnx2x/bnx2x_init.h
+++ b/drivers/net/bnx2x/bnx2x_init.h
@@ -241,7 +241,7 @@ static const struct {
 	/* Block IGU, MISC, PXP and PXP2 parity errors as long as we don't
 	 * want to handle "system kill" flow at the moment.
 	 */
-	BLOCK_PRTY_INFO(PXP, 0x3ffffff, 0x3ffffff, 0x3ffffff, 0x3ffffff),
+	BLOCK_PRTY_INFO(PXP, 0x7ffffff, 0x3ffffff, 0x3ffffff, 0x7ffffff),
 	BLOCK_PRTY_INFO_0(PXP2,	0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff),
 	BLOCK_PRTY_INFO_1(PXP2,	0x7ff, 0x7f, 0x7f, 0x7ff),
 	BLOCK_PRTY_INFO(HC, 0x7, 0x7, 0x7, 0),
-- 
1.7.0.4



^ permalink raw reply related

* Re: [PATCH] connector: Convert char *name to const char *name
From: Evgeniy Polyakov @ 2011-02-20 14:32 UTC (permalink / raw)
  To: Joe Perches
  Cc: Javier Martinez Canillas, Greg Kroah-Hartman, devel,
	K. Y. Srinivasan, netdev
In-Reply-To: <1298159129.7179.32.camel@Joe-Laptop>

Hi Joe.

On Sat, Feb 19, 2011 at 03:45:29PM -0800, Joe Perches (joe@perches.com) wrote:
> Allow more const declarations.
> 
> Signed-off-by: Joe Perches <joe@perches.com>
> 
> ---
> 
> Better to change the declarations and uses as this argument
> is not modified.

Looks good, thank you.
Greg, please push it into your tree.

Acked-by: Evgeniy Polyakov <zbr@ioremap.net>

-- 
	Evgeniy Polyakov

^ permalink raw reply

* [PATCH  kernel 2.6.38-rc5] fmvj18x_cs: add new id
From: Ken Kawasaki @ 2011-02-20 15:07 UTC (permalink / raw)
  To: netdev
In-Reply-To: <20110131061616.05b2fa6f.ken_kawasaki@spring.nifty.jp>


fmvj18x_cs:add new id
           Toshiba lan&modem multifuction card (model name:IPC5010A)

Signed-off-by: Ken Kawasaki <ken_kawasaki@spring.nifty.jp>

---

--- linux-2.6.38-rc5/drivers/net/pcmcia/fmvj18x_cs.c.orig	2011-02-20 14:04:06.000000000 +0900
+++ linux-2.6.38-rc5/drivers/net/pcmcia/fmvj18x_cs.c	2011-02-20 14:04:21.000000000 +0900
@@ -691,6 +691,7 @@ static struct pcmcia_device_id fmvj18x_i
 	PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0105, 0x0e0a),
 	PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x0e01),
 	PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x0a05),
+	PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x0b05),
 	PCMCIA_PFC_DEVICE_MANF_CARD(0, 0x0032, 0x1101),
 	PCMCIA_DEVICE_NULL,
 };
--- linux-2.6.38-rc5/drivers/tty/serial/serial_cs.c.orig	2011-02-20 14:05:09.000000000 +0900
+++ linux-2.6.38-rc5/drivers/tty/serial/serial_cs.c	2011-02-20 14:05:28.000000000 +0900
@@ -712,6 +712,7 @@ static struct pcmcia_device_id serial_id
 	PCMCIA_PFC_DEVICE_PROD_ID12(1, "Xircom", "CreditCard Ethernet+Modem II", 0x2e3ee845, 0xeca401bf),
 	PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x0e01),
 	PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x0a05),
+	PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x0b05),
 	PCMCIA_PFC_DEVICE_MANF_CARD(1, 0x0032, 0x1101),
 	PCMCIA_MFC_DEVICE_MANF_CARD(0, 0x0104, 0x0070),
 	PCMCIA_MFC_DEVICE_MANF_CARD(1, 0x0101, 0x0562),


^ permalink raw reply

* Re: [patch net-next-2.6 V3] net: convert bonding to use rx_handler
From: Jiri Pirko @ 2011-02-20 15:07 UTC (permalink / raw)
  To: Nicolas de Pesloüan
  Cc: Jay Vosburgh, David Miller, kaber, eric.dumazet, netdev,
	shemminger, andy, Fischer, Anna
In-Reply-To: <4D610511.4050902@gmail.com>

Sun, Feb 20, 2011 at 01:12:01PM CET, nicolas.2p.debian@gmail.com wrote:
>Le 20/02/2011 11:36, Jiri Pirko a écrit :
>>Sat, Feb 19, 2011 at 09:27:37PM CET, nicolas.2p.debian@gmail.com wrote:
>>>Le 19/02/2011 14:46, Jiri Pirko a écrit :
>>>>Sat, Feb 19, 2011 at 02:18:00PM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>Le 19/02/2011 12:28, Jiri Pirko a écrit :
>>>>>>Sat, Feb 19, 2011 at 12:08:31PM CET, jpirko@redhat.com wrote:
>>>>>>>Sat, Feb 19, 2011 at 11:56:23AM CET, nicolas.2p.debian@gmail.com wrote:
>>>>>>>>Le 19/02/2011 09:05, Jiri Pirko a écrit :
>>>>>>>>>This patch converts bonding to use rx_handler. Results in cleaner
>>>>>>>>>__netif_receive_skb() with much less exceptions needed. Also
>>>>>>>>>bond-specific work is moved into bond code.
>>>>>>>>>
>>>>>>>>>Signed-off-by: Jiri Pirko<jpirko@redhat.com>
>>>>>>>>>
>>>>>>>>>v1->v2:
>>>>>>>>>         using skb_iif instead of new input_dev to remember original
>>>>>>>>>	device
>>>>>>>>>v2->v3:
>>>>>>>>>	set orig_dev = skb->dev if skb_iif is set
>>>>>>>>>
>>>>>>>>
>>>>>>>>Why do we need to let the rx_handlers call netif_rx() or __netif_receive_skb()?
>>>>>>>>
>>>>>>>>Bonding used to be handled with very few overhead, simply replacing
>>>>>>>>skb->dev with skb->dev->master. Time has passed and we eventually
>>>>>>>>added many special processing for bonding into __netif_receive_skb(),
>>>>>>>>but the overhead remained very light.
>>>>>>>>
>>>>>>>>Calling netif_rx() (or __netif_receive_skb()) to allow nesting would probably lead to some overhead.
>>>>>>>>
>>>>>>>>Can't we, instead, loop inside __netif_receive_skb(), and deliver
>>>>>>>>whatever need to be delivered, to whoever need, inside the loop ?
>>>>>>>>
>>>>>>>>rx_handler = rcu_dereference(skb->dev->rx_handler);
>>>>>>>>while (rx_handler) {
>>>>>>>>	/* ...  */
>>>>>>>>	orig_dev = skb->dev;
>>>>>>>>	skb = rx_handler(skb);
>>>>>>>>	/* ... */
>>>>>>>>	rx_handler = (skb->dev != orig_dev) ? rcu_dereference(skb->dev->rx_handler) : NULL;
>>>>>>>>}
>>>>>>>>
>>>>>>>>This would reduce the overhead, while still allowing nesting: vlan on
>>>>>>>>top on bonding, bridge on top on bonding, ...
>>>>>>>
>>>>>>>I see your point. Makes sense to me. But the loop would have to include
>>>>>>>at least processing of ptype_all too. I'm going to cook a follow-up
>>>>>>>patch.
>>>>>>>
>>>>>>
>>>>>>DRAFT (doesn't modify rx_handlers):
>>>>>>
>>>>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>index 4ebf7fe..e5dba47 100644
>>>>>>--- a/net/core/dev.c
>>>>>>+++ b/net/core/dev.c
>>>>>>@@ -3115,6 +3115,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>>  {
>>>>>>  	struct packet_type *ptype, *pt_prev;
>>>>>>  	rx_handler_func_t *rx_handler;
>>>>>>+	struct net_device *dev;
>>>>>>  	struct net_device *orig_dev;
>>>>>>  	struct net_device *null_or_dev;
>>>>>>  	int ret = NET_RX_DROP;
>>>>>>@@ -3129,7 +3130,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>>  	if (netpoll_receive_skb(skb))
>>>>>>  		return NET_RX_DROP;
>>>>>>
>>>>>>-	__this_cpu_inc(softnet_data.processed);
>>>>>>+	skb->skb_iif = skb->dev->ifindex;
>>>>>>+	orig_dev = skb->dev;
>>>>>
>>>>>orig_dev should be set inside the loop, to reflect "previously
>>>>>crossed device", while following the path:
>>>>>
>>>>>eth0 ->   bond0 ->   br0.
>>>>>
>>>>>First step inside loop:
>>>>>
>>>>>orig_dev = eth0
>>>>>skb->dev = bond0 (at the end of the loop).
>>>>>
>>>>>Second step inside loop:
>>>>>
>>>>>orig_dev = bond0
>>>>>skb->dev = br0 (et the end of the loop).
>>>>>
>>>>>This would allow for exact match delivery to bond0 if someone bind there.
>>>>>
>>>>>>+
>>>>>>  	skb_reset_network_header(skb);
>>>>>>  	skb_reset_transport_header(skb);
>>>>>>  	skb->mac_len = skb->network_header - skb->mac_header;
>>>>>>@@ -3138,12 +3141,9 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>>
>>>>>>  	rcu_read_lock();
>>>>>>
>>>>>>-	if (!skb->skb_iif) {
>>>>>>-		skb->skb_iif = skb->dev->ifindex;
>>>>>>-		orig_dev = skb->dev;
>>>>>>-	} else {
>>>>>>-		orig_dev = dev_get_by_index_rcu(dev_net(skb->dev), skb->skb_iif);
>>>>>>-	}
>>>>>
>>>>>I like the fact that it removes the above part.
>>>>>
>>>>>>+another_round:
>>>>>>+	__this_cpu_inc(softnet_data.processed);
>>>>>>+	dev = skb->dev;
>>>>>>
>>>>>>  #ifdef CONFIG_NET_CLS_ACT
>>>>>>  	if (skb->tc_verd&    TC_NCLS) {
>>>>>>@@ -3153,7 +3153,7 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>>>>  #endif
>>>>>>
>>>>>>  	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>>>-		if (!ptype->dev || ptype->dev == skb->dev) {
>>>>>>+		if (!ptype->dev || ptype->dev == dev) {
>>>>>>  			if (pt_prev)
>>>>>>  				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>>  			pt_prev = ptype;
>>>>>
>>>>>Inside the loop, we should only do exact match delivery, for
>>>>>&ptype_all and for&ptype_base[ntohs(type)&   PTYPE_HASH_MASK]:
>>>>>
>>>>>        list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>>-               if (!ptype->dev || ptype->dev == dev) {
>>>>>+               if (ptype->dev == dev) {
>>>>>                        if (pt_prev)
>>>>>                                ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>                        pt_prev = ptype;
>>>>>                }
>>>>>        }
>>>>>
>>>>>
>>>>>        list_for_each_entry_rcu(ptype,
>>>>>                        &ptype_base[ntohs(type)&   PTYPE_HASH_MASK], list) {
>>>>>                if (ptype->type == type&&
>>>>>-                   (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>>>+                   (ptype->dev == skb->dev)) {
>>>>>                        if (pt_prev)
>>>>>                                ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>                        pt_prev = ptype;
>>>>>                }
>>>>>        }
>>>>>
>>>>>After leaving the loop, we can do wilcard delivery, if skb is not NULL.
>>>>>
>>>>>        list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>>>-               if (!ptype->dev || ptype->dev == dev) {
>>>>>+               if (!ptype->dev) {
>>>>>                        if (pt_prev)
>>>>>                                ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>                        pt_prev = ptype;
>>>>>               }
>>>>>        }
>>>>>
>>>>>
>>>>>        list_for_each_entry_rcu(ptype,
>>>>>                        &ptype_base[ntohs(type)&   PTYPE_HASH_MASK], list) {
>>>>>-               if (ptype->type == type&&
>>>>>-                   (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>>>+		if (ptype->type == type&&   !ptype->dev) {
>>>>>                        if (pt_prev)
>>>>>                                ret = deliver_skb(skb, pt_prev, orig_dev);
>>>>>                        pt_prev = ptype;
>>>>>                }
>>>>>        }
>>>>>
>>>>>This would reduce the number of tests inside the
>>>>>list_for_each_entry_rcu() loops. And because we match only ptype->dev
>>>>>== dev inside the loop and !ptype->dev outside the loop, this should
>>>>>avoid duplicate delivery.
>>>>
>>>>Would you care to put this into patch so I can see the whole picture?
>>>>Thanks.
>>>
>>>Here is what I have in mind. It is based on your previous DRAFT patch, and don't modify rx_handlers yet.
>>>
>>>Only compile tested !!
>>>
>>>I don't know if every pieces are at the right place. I wonder what to
>>>do with CONFIG_NET_CLS_ACT part, that currently is between ptype_all
>>>and ptype_base processing.
>>>
>>>Anyway, the general idea is there.
>>>
>>>	Nicolas.
>>>
>>>net/core/dev.c |   70 ++++++++++++++++++++++++++++++++++++++++++++++++--------
>>>1 files changed, 60 insertions(+), 10 deletions(-)
>>>
>>>diff --git a/net/core/dev.c b/net/core/dev.c
>>>index e5dba47..7e007a9 100644
>>>--- a/net/core/dev.c
>>>+++ b/net/core/dev.c
>>>@@ -3117,7 +3117,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>	rx_handler_func_t *rx_handler;
>>>	struct net_device *dev;
>>>	struct net_device *orig_dev;
>>>-	struct net_device *null_or_dev;
>>>	int ret = NET_RX_DROP;
>>>	__be16 type;
>>>
>>>@@ -3130,9 +3129,6 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>	if (netpoll_receive_skb(skb))
>>>		return NET_RX_DROP;
>>>
>>>-	skb->skb_iif = skb->dev->ifindex;
>>>-	orig_dev = skb->dev;
>>>-
>>>	skb_reset_network_header(skb);
>>>	skb_reset_transport_header(skb);
>>>	skb->mac_len = skb->network_header - skb->mac_header;
>>>@@ -3143,6 +3139,8 @@ static int __netif_receive_skb(struct sk_buff *skb)
>>>
>>>another_round:
>>>	__this_cpu_inc(softnet_data.processed);
>>>+	skb->skb_iif = skb->dev->ifindex;
>>>+	orig_dev = skb->dev;
>>orig_dev should be set at the end of the loop. Now you are going to have
>>it always the same as dev and skb->dev.
>>
>
>Yes, you are right.
>
>I thinking about all this, I wonder what the protocol handlers expect as the orig_dev value ?
>
>Lest imagine the following configuration: eth0 -> bond0 -> br0.
>
>What does a protocol handler listening on br0 expect for orig_dev ?
>bond0 or eth0 ? Current implementation give eth0, but I think bond0
>should be the right value, for proper nesting.

I agree with you.

>
>>>	dev = skb->dev;
>>>
>>>#ifdef CONFIG_NET_CLS_ACT
>>>@@ -3152,8 +3150,13 @@ another_round:
>>>	}
>>>#endif
>>>
>>>+	/*
>>>+	 * Deliver to ptype_all protocol handlers that match current dev.
>>>+	 * This happens before rx_handler is given a chance to change skb->dev.
>>>+	 */
>>>+
>>>	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>-		if (!ptype->dev || ptype->dev == dev) {
>>>+		if (ptype->dev == dev) {
>>>			if (pt_prev)
>>>				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>			pt_prev = ptype;
>>>@@ -3167,6 +3170,31 @@ another_round:
>>>ncls:
>>>#endif
>>>
>>>+	/*
>>>+	 * Deliver to ptype_base protocol handlers that match current dev.
>>>+	 * This happens before rx_handler is given a chance to change skb->dev.
>>>+	 */
>>>+
>>>+	type = skb->protocol;
>>>+	list_for_each_entry_rcu(ptype,
>>>+			&ptype_base[ntohs(type)&  PTYPE_HASH_MASK], list) {
>>>+		if (ptype->type == type&&  ptype->dev == skb->dev) {
>>>+			if (pt_prev)
>>>+				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>+			pt_prev = ptype;
>>>+		}
>>>+	}
>>
>>I'm not sure it is ok to deliver ptype_base here. See comment above
>>ptype_head() (I'm not sure I understand that correctly)
>
>Anyway, all this is probably plain wrong: Delivering the skb to
>protocol handlers while still changing the skb is guaranteed to cause
>strange behaviors.
>
>If we want to be able to deliver the skb to different protocol
>handlers and give all of them the right values for dev->skb and
>orig_dev (or previous_dev), we might end up with copying the skb. I
>hate the idea, but currently can't find a cleaner way to do so.

That would be unfortunate :/
>
>We first need to clarify what orig_dev should be, as stated above.
>
>>>+
>>>+	/*
>>>+	 * Call rx_handler for current device.
>>>+	 * If rx_handler return NULL, skip wilcard protocol handler delivery.
>>>+	 * Else, if skb->dev changed, restart the whole delivery process, to
>>>+	 * allow for device nesting.
>>>+	 *
>>>+	 * Warning:
>>>+	 * rx_handlers must kfree_skb(skb) if they return NULL.
>>Well this is not true. They can return NULL and call netif_rx as they
>>have before. No changes necessary I believe.
>
>I don't really know. This needs to be double checked, anyway.
>
>>>+	 */
>>>+
>>>	rx_handler = rcu_dereference(dev->rx_handler);
>>>	if (rx_handler) {
>>>		if (pt_prev) {
>>>@@ -3176,10 +3204,15 @@ ncls:
>>>		skb = rx_handler(skb);
>>>		if (!skb)
>>>			goto out;
>>>-		if (dev != skb->dev)
>>>+		if (skb->dev != dev)
>>>			goto another_round;
>>>	}
>>>
>>>+	/*
>>>+	 * FIXME: The part below should use rx_handler instead of being hard
>>>+	 * coded here.
>>I'm not sure it is doable atm. For bridge and bond it should not be a
>>problem, but for macvlan, there is possible to have macvlans and vlans
>>on the same dev. This possibility should persist.
>>/me scratches head on the idea to have multiple rx_handlers although it
>>was his original idea....
>
>I think your original proposal of having several rx_handlers per device was right.
>
>At the time you introduced the rx_handler system, only bridge and
>macvlan used it. Even if using bridge and macvlan on the same base
>device might be useless, this is not true for every possible
>rx_handler configuration.
>
>Now that we want to move bonding and vlan to the rx_handler system,
>it becomes obvious that we need several rx_handlers per device. At
>least, vlan should properly mix with bridge. And who know what would
>be the fifth rx_handler...
>
>>>+	 */
>>>+
>>>	if (vlan_tx_tag_present(skb)) {
>>>		if (pt_prev) {
>>>			ret = deliver_skb(skb, pt_prev, orig_dev);
>>>@@ -3192,16 +3225,33 @@ ncls:
>>>			goto out;
>>>	}
>>>
>>>+	/*
>>>+	 * FIXME: Can't this be moved into the rx_handler for bonding,
>>>+	 * or into a futur rx_handler for vlan?
>>This hook is something I do not like at all :/ But anyway if should be in vlan
>>part I think.
>
>Yes, and in order for the future rx_handler for vlan to properly
>handle it, it needs to know the device just below it, not the pure
>original device. Hence, my question about the exact meaning of
>orig_dev...
>
>	Nicolas.
>
>>>+	 */
>>>+
>>>	vlan_on_bond_hook(skb);
>>>
>>>-	/* deliver only exact match when indicated */
>>>-	null_or_dev = skb->deliver_no_wcard ? skb->dev : NULL;
>>>+	/*
>>>+	 * Deliver to wildcard ptype_all protocol handlers.
>>>+	 */
>>>+
>>>+	list_for_each_entry_rcu(ptype,&ptype_all, list) {
>>>+		if (!ptype->dev) {
>>>+			if (pt_prev)
>>>+				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>+			pt_prev = ptype;
>>>+		}
>>>+	}
>>>+
>>>+	/*
>>>+	 * Deliver to wildcard ptype_all protocol handlers.
>>>+	 */
>>>
>>>	type = skb->protocol;
>>>	list_for_each_entry_rcu(ptype,
>>>			&ptype_base[ntohs(type)&  PTYPE_HASH_MASK], list) {
>>>-		if (ptype->type == type&&
>>>-		    (ptype->dev == null_or_dev || ptype->dev == skb->dev)) {
>>>+		if (ptype->type == type&&  !ptype->dev) {
>>>			if (pt_prev)
>>>				ret = deliver_skb(skb, pt_prev, orig_dev);
>>>			pt_prev = ptype;
>>>--
>>>1.7.2.3
>>>
>>>
>>>

^ permalink raw reply

* Re: Advice on network driver design
From: Micha Nelissen @ 2011-02-20 18:14 UTC (permalink / raw)
  To: Felix Radensky; +Cc: netdev@vger.kernel.org
In-Reply-To: <4D5FC7A7.5050704@embedded-sol.com>

Felix Radensky wrote:
> The host platform is Freescale P2020. The custom hardware is
> FPGA with several TX FIFOs, single RX FIFO and a set of registers.

Wasn't it easier to use the in-SOC ethernet controllers?

Micha

^ permalink raw reply

* Re: Question about tg3 and bnx2 driver suppliers
From: Micha Nelissen @ 2011-02-20 18:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Michael Durket, netdev@vger.kernel.org
In-Reply-To: <1297952255.2604.115.camel@edumazet-laptop>

Eric Dumazet wrote:
> One possible cause of packet drops is when softirqs are disabled for too
> long periods, even if NIC has a big RX ring (check ethtool -g eth0)

Why aren't the softirqs converted to workqueues? Wouldn't that cut 
dependencies to other softirq users and improve latency?

Probably a stupid question, thanks.

Micha

^ permalink raw reply

* Re: [PATCH]tcp: document tcp_max_ssthresh (Limited Slow-Start)
From: David Miller @ 2011-02-20 19:10 UTC (permalink / raw)
  To: shanwei; +Cc: ilpo.jarvinen, netdev, jheffner
In-Reply-To: <4D60C849.40905@cn.fujitsu.com>

From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Sun, 20 Feb 2011 15:52:41 +0800

> From: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> 
> Base on Ilpo's patch about documenting tcp_max_ssthresh.
> (see http://marc.info/?l=linux-netdev&m=117950581307310&w=2)
> 
> According to errata of RFC3742, fix the number of segments increased 
> during RTT time. 
> 
> Just to state the occasion to use this parameter, But
> about how to set parameter value, maybe some others can do it. 
> 
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH] tcp: Remove debug macro of TCP_CHECK_TIMER
From: David Miller @ 2011-02-20 19:10 UTC (permalink / raw)
  To: shanwei; +Cc: netdev, kuznet, pekkas, jmorris, kaber
In-Reply-To: <4D60C901.8050307@cn.fujitsu.com>

From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Sun, 20 Feb 2011 15:55:45 +0800

> 
> Now, TCP_CHECK_TIMER is not used for debuging, it does nothing.
> And, it has been there for several years, maybe 6 years.
> 
> Remove it to keep code clearer.
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] sctp: fix compile warnings in sctp_tsnmap_num_gabs
From: David Miller @ 2011-02-20 19:10 UTC (permalink / raw)
  To: shanwei; +Cc: vladislav.yasevich, netdev, linux-sctp
In-Reply-To: <4D60C966.2000302@cn.fujitsu.com>

From: Shan Wei <shanwei@cn.fujitsu.com>
Date: Sun, 20 Feb 2011 15:57:26 +0800

> net/sctp/tsnmap.c: In function ‘sctp_tsnmap_num_gabs’:
> net/sctp/tsnmap.c:347: warning: ‘start’ may be used uninitialized in this function
> net/sctp/tsnmap.c:347: warning: ‘end’ may be used uninitialized in this function
> 
> Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>

Applied.

^ permalink raw reply

* Re: Advice on network driver design
From: arnd @ 2011-02-20 19:13 UTC (permalink / raw)
  To: Felix Radensky; +Cc: netdev@vger.kernel.org
In-Reply-To: <4D5FC7A7.5050704@embedded-sol.com>

On Saturday 19 February 2011 14:37:43 Felix Radensky wrote:
> Hi,
> 
> I'm in the process of designing a network driver for a custom
> hardware and would like to get some advice from linux network
> gurus.
> 
> The host platform is Freescale P2020. The custom hardware is
> FPGA with several TX FIFOs, single RX FIFO and a set of registers.
> FPGA is connected to CPU via PCI-E. Host CPU DMA controller is used
> to get packets to/from FIFOs. Each FIFO has its set of events, 
> generating interrupts, which can be enabled and disabled. Status 
> register reflects the current state of events, the bit in status 
> register is cleared by FPGA when event is handled. Reads or writes to 
> status register have no impact on its contents.
> 
> The device driver should support 80Mbit/sec of traffic in each direction.
> 
> So far I have TX side working. I'm using Linux dmaengine APIs to
> transfer packets to FIFOs. The DMA completion interrupt is handled
> by per-fifo work queue.
> 
> My question is about RX. Would such design benefit from NAPI ?
> If my understanding of NAPI is correct, it runs in softirq context,
> so I cannot do any DMA work in dev->poll(). If I were to use NAPI,
> I should probably disable RX interrupts, do all DMA work in some
> work queue, keep RX packets in a list and only then call dev->poll().
> Is that correct ?
> 
> Any other advice and how to write an efficient driver for this 
> hardware is most welcome. I can influence FPGA design to some degree, 
> so if you think FPGA should be changed to improve things, please let 
> me know.

There are currently discussions ongoing about using virtio for this
kind of connection. See http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg49294.html
for an archive.

When you use virtio as the base, you can use the regular virtio-net
driver or any other virtio high-level driver on top.

	Arnd

^ permalink raw reply

* Re: Question about tg3 and bnx2 driver suppliers
From: Eric Dumazet @ 2011-02-20 19:17 UTC (permalink / raw)
  To: Micha Nelissen; +Cc: Michael Durket, netdev@vger.kernel.org
In-Reply-To: <4D615B2D.5080804@neli.hopto.org>

Le dimanche 20 février 2011 à 19:19 +0100, Micha Nelissen a écrit :
> Eric Dumazet wrote:
> > One possible cause of packet drops is when softirqs are disabled for too
> > long periods, even if NIC has a big RX ring (check ethtool -g eth0)
> 
> Why aren't the softirqs converted to workqueues? Wouldn't that cut 
> dependencies to other softirq users and improve latency?
> 

Because it was done like that in the old days.

Its a bit less important these days, now typical machines have 8+ cpus.
Each device interrupt can be handled by its own cpu :)






^ permalink raw reply

* Re: how to listen() on single IP address but very many ports?
From: Eric Dumazet @ 2011-02-20 19:20 UTC (permalink / raw)
  To: Chris Friesen; +Cc: netdev
In-Reply-To: <4D5EB2AE.5050703@genband.com>

Le vendredi 18 février 2011 à 11:55 -0600, Chris Friesen a écrit :
> I have an application team that needs to listen() for tcp connections on
> many ports (and by many I mean pretty much all 64K ports).  However, the
> connections are short-lived, and the number of active connections at any
> given time is small.
> 
> Apparently when they tried this before on an older kernel the
> performance of the naive "open 60K sockets and call listen()" solution
> was not acceptable, so they used NAT with port mapping to direct all the
> incoming packets to a single real port.  However, they now want to add
> support for IPv6 and this solution won't work.
> 
> What's the recommended method for efficiently listening on this many
> ports?  Should I be able to efficiently listen() on that many sockets
> using epoll or similar?  If there isn't a way to do this, is there an
> equivalent IPv6 workaround?
> 
> One possible solution that came up was to implement a PORT_ANY which
> would match any incoming request that didn't already have an explicit
> listener.  Even better would be a way to bind a single listening socket
> to a range of ports.
> 
> Has anyone ever considered something like this?
> 

I really dont see how listening to 60K sockets can be "not acceptable".

It just runs OK, at exactly same speed than 1 socket, if using epoll.

Only 'problem' could be memory usage, a bit more heavy of course, but
who cares ?




^ permalink raw reply

* Re: [PATCH] net: fix unreg list corruption in dev_deactivate()
From: David Miller @ 2011-02-20 19:50 UTC (permalink / raw)
  To: eric.dumazet; +Cc: stf_xl, netdev, opurdila
In-Reply-To: <1298203890.8559.54.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 20 Feb 2011 13:11:30 +0100

> Hmm, you should read Eric B patch, he already addressed this problem a
> few hours ago.
> 
> A full audit _is_ needed.
> 
> https://lkml.org/lkml/2011/2/20/4

I'll apply Eric B.'s patch to net-2.6

^ permalink raw reply

* Re: Question about tg3 and bnx2 driver suppliers
From: Stephen Hemminger @ 2011-02-20 20:41 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Micha Nelissen, Michael Durket, netdev@vger.kernel.org
In-Reply-To: <1298229420.8559.59.camel@edumazet-laptop>

On Sun, 20 Feb 2011 20:17:00 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le dimanche 20 février 2011 à 19:19 +0100, Micha Nelissen a écrit :
> > Eric Dumazet wrote:
> > > One possible cause of packet drops is when softirqs are disabled for too
> > > long periods, even if NIC has a big RX ring (check ethtool -g eth0)
> > 
> > Why aren't the softirqs converted to workqueues? Wouldn't that cut 
> > dependencies to other softirq users and improve latency?
> > 
> 
> Because it was done like that in the old days.
> 
> Its a bit less important these days, now typical machines have 8+ cpus.
> Each device interrupt can be handled by its own cpu :)

The latency to schedule a work queue is still much higher
than the latency to do a softirq. Last time I played around with it,
things like loopback performance dropped 10% if using work queue.

^ permalink raw reply

* Re: Advice on network driver design
From: Felix Radensky @ 2011-02-20 21:01 UTC (permalink / raw)
  To: Micha Nelissen; +Cc: netdev@vger.kernel.org
In-Reply-To: <4D615A1E.5070002@neli.hopto.org>

Hi Micha

On 02/20/2011 08:14 PM, Micha Nelissen wrote:
> Felix Radensky wrote:
>> The host platform is Freescale P2020. The custom hardware is
>> FPGA with several TX FIFOs, single RX FIFO and a set of registers.
>
> Wasn't it easier to use the in-SOC ethernet controllers?

FPGA talks some satellite protocol to the outside world, so the
target box is essentially a ethernet to satellite router.

Felix.

^ permalink raw reply

* libmnl limitation...
From: David Miller @ 2011-02-20 21:41 UTC (permalink / raw)
  To: pablo; +Cc: netdev


Unfortunately, libmnl only allows setting socket options
that are of SOL_NETLINK.

This precludes setting socket options such as SO_ATTACH_FILTER
which need to be of level SOL_SOCKET.

It seems to be a mistake to have hard-coded the socket level
instead of simply letting the user specify it :-/

^ permalink raw reply

* [PATCH net-next 2/6] be2net: fixes in ethtool selftest
From: Ajit Khaparde @ 2011-02-20 21:41 UTC (permalink / raw)
  To: netdev, davem

> add missing separator between items in ethtool self_test array
> fix reporting of test resluts when link is down and
  when selftest command fails.

From: Suresh R <suresh.reddy@emulex.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be_ethtool.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 82a9a27..0833cbd 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -127,7 +127,7 @@ static const char et_self_tests[][ETH_GSTRING_LEN] = {
 	"MAC Loopback test",
 	"PHY Loopback test",
 	"External Loopback test",
-	"DDR DMA test"
+	"DDR DMA test",
 	"Link test"
 };
 
@@ -642,7 +642,8 @@ be_self_test(struct net_device *netdev, struct ethtool_test *test, u64 *data)
 				&qos_link_speed) != 0) {
 		test->flags |= ETH_TEST_FL_FAILED;
 		data[4] = -1;
-	} else if (mac_speed) {
+	} else if (!mac_speed) {
+		test->flags |= ETH_TEST_FL_FAILED;
 		data[4] = 1;
 	}
 }
-- 
1.7.1


^ permalink raw reply related

* [PATCH net-next 1/6] be2net: add new counters to display via ethtool stats
From: Ajit Khaparde @ 2011-02-20 21:41 UTC (permalink / raw)
  To: netdev, davem

New counters:
	> jabber frame stats
	> red drop stats

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be_cmds.h    |   12 ++++++++++--
 drivers/net/benet/be_ethtool.c |   13 +++++++++++--
 2 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 91c5d2b..331e954 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -619,7 +619,10 @@ struct be_rxf_stats {
 	u32 rx_drops_invalid_ring;	/* dword 145*/
 	u32 forwarded_packets;	/* dword 146*/
 	u32 rx_drops_mtu;	/* dword 147*/
-	u32 rsvd0[15];
+	u32 rsvd0[7];
+	u32 port0_jabber_events;
+	u32 port1_jabber_events;
+	u32 rsvd1[6];
 };
 
 struct be_erx_stats {
@@ -630,11 +633,16 @@ struct be_erx_stats {
 	u32 debug_pmem_pbuf_dealloc;       /* dword 47*/
 };
 
+struct be_pmem_stats {
+	u32 eth_red_drops;
+	u32 rsvd[4];
+};
+
 struct be_hw_stats {
 	struct be_rxf_stats rxf;
 	u32 rsvd[48];
 	struct be_erx_stats erx;
-	u32 rsvd1[6];
+	struct be_pmem_stats pmem;
 };
 
 struct be_cmd_req_get_stats {
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 07b4ab9..82a9a27 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -26,7 +26,8 @@ struct be_ethtool_stat {
 	int offset;
 };
 
-enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT};
+enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT,
+			PMEMSTAT};
 #define FIELDINFO(_struct, field) FIELD_SIZEOF(_struct, field), \
 					offsetof(_struct, field)
 #define NETSTAT_INFO(field) 	#field, NETSTAT,\
@@ -43,6 +44,8 @@ enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT};
 						field)
 #define ERXSTAT_INFO(field) 	#field, ERXSTAT,\
 					FIELDINFO(struct be_erx_stats, field)
+#define PMEMSTAT_INFO(field) 	#field, PMEMSTAT,\
+					FIELDINFO(struct be_pmem_stats, field)
 
 static const struct be_ethtool_stat et_stats[] = {
 	{NETSTAT_INFO(rx_packets)},
@@ -99,7 +102,10 @@ static const struct be_ethtool_stat et_stats[] = {
 	{MISCSTAT_INFO(rx_drops_too_many_frags)},
 	{MISCSTAT_INFO(rx_drops_invalid_ring)},
 	{MISCSTAT_INFO(forwarded_packets)},
-	{MISCSTAT_INFO(rx_drops_mtu)}
+	{MISCSTAT_INFO(rx_drops_mtu)},
+	{MISCSTAT_INFO(port0_jabber_events)},
+	{MISCSTAT_INFO(port1_jabber_events)},
+	{PMEMSTAT_INFO(eth_red_drops)}
 };
 #define ETHTOOL_STATS_NUM ARRAY_SIZE(et_stats)
 
@@ -276,6 +282,9 @@ be_get_ethtool_stats(struct net_device *netdev,
 		case MISCSTAT:
 			p = &hw_stats->rxf;
 			break;
+		case PMEMSTAT:
+			p = &hw_stats->pmem;
+			break;
 		}
 
 		p = (u8 *)p + et_stats[i].offset;
-- 
1.7.1


^ permalink raw reply related

* [PATCH net-next 4/6] be2net: fix to ignore transparent vlan ids wrongly indicated by NIC
From: Ajit Khaparde @ 2011-02-20 21:41 UTC (permalink / raw)
  To: netdev, davem

With transparent VLAN tagging, the ASIC wrongly indicates packets with VLAN ID.
Strip them off in the driver. The VLAN Tag to be stripped will be given to the host
as an async message.

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be.h      |    1 +
 drivers/net/benet/be_cmds.c |   14 ++++++++++++++
 drivers/net/benet/be_cmds.h |   13 +++++++++++++
 drivers/net/benet/be_main.c |    6 ++++++
 4 files changed, 34 insertions(+), 0 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index fb605e8..7bf8dd4 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -311,6 +311,7 @@ struct be_adapter {
 	struct be_vf_cfg vf_cfg[BE_MAX_VF];
 	u8 is_virtfn;
 	u32 sli_family;
+	u16 pvid;
 };
 
 #define be_physfn(adapter) (!adapter->is_virtfn)
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 7120106..59d25ac 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -124,6 +124,16 @@ static void be_async_grp5_qos_speed_process(struct be_adapter *adapter,
 	}
 }
 
+/*Grp5 PVID evt*/
+static void be_async_grp5_pvid_state_process(struct be_adapter *adapter,
+		struct be_async_event_grp5_pvid_state *evt)
+{
+	if (evt->enabled)
+		adapter->pvid = evt->tag;
+	else
+		adapter->pvid = 0;
+}
+
 static void be_async_grp5_evt_process(struct be_adapter *adapter,
 		u32 trailer, struct be_mcc_compl *evt)
 {
@@ -141,6 +151,10 @@ static void be_async_grp5_evt_process(struct be_adapter *adapter,
 		be_async_grp5_qos_speed_process(adapter,
 		(struct be_async_event_grp5_qos_link_speed *)evt);
 	break;
+	case ASYNC_EVENT_PVID_STATE:
+		be_async_grp5_pvid_state_process(adapter,
+		(struct be_async_event_grp5_pvid_state *)evt);
+	break;
 	default:
 		dev_warn(&adapter->pdev->dev, "Unknown grp5 event!\n");
 		break;
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 331e954..a5af296 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -88,6 +88,7 @@ struct be_mcc_compl {
 #define ASYNC_EVENT_CODE_GRP_5		0x5
 #define ASYNC_EVENT_QOS_SPEED		0x1
 #define ASYNC_EVENT_COS_PRIORITY	0x2
+#define ASYNC_EVENT_PVID_STATE		0x3
 struct be_async_event_trailer {
 	u32 code;
 };
@@ -134,6 +135,18 @@ struct be_async_event_grp5_cos_priority {
 	struct be_async_event_trailer trailer;
 } __packed;
 
+/* When the event code of an async trailer is GRP5 and event type is
+ * PVID state, the mcc_compl must be interpreted as follows
+ */
+struct be_async_event_grp5_pvid_state {
+	u8 enabled;
+	u8 rsvd0;
+	u16 tag;
+	u32 event_tag;
+	u32 rsvd1;
+	struct be_async_event_trailer trailer;
+} __packed;
+
 struct be_mcc_mailbox {
 	struct be_mcc_wrb wrb;
 	struct be_mcc_compl compl;
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index b9e170d..cd6fda7 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1047,6 +1047,9 @@ static void be_rx_compl_process(struct be_adapter *adapter,
 	if ((adapter->function_mode & 0x400) && !vtm)
 		vlanf = 0;
 
+	if ((adapter->pvid == vlanf) && !adapter->vlan_tag[vlanf])
+		vlanf = 0;
+
 	if (unlikely(vlanf)) {
 		if (!adapter->vlan_grp || adapter->vlans_added == 0) {
 			kfree_skb(skb);
@@ -1087,6 +1090,9 @@ static void be_rx_compl_process_gro(struct be_adapter *adapter,
 	if ((adapter->function_mode & 0x400) && !vtm)
 		vlanf = 0;
 
+	if ((adapter->pvid == vlanf) && !adapter->vlan_tag[vlanf])
+		vlanf = 0;
+
 	skb = napi_get_frags(&eq_obj->napi);
 	if (!skb) {
 		be_rx_compl_discard(adapter, rxo, rxcp);
-- 
1.7.1


^ permalink raw reply related

* [PATCH net-next 3/6] be2net: variable name change
From: Ajit Khaparde @ 2011-02-20 21:41 UTC (permalink / raw)
  To: netdev, davem

change occurances of stats_ioctl_sent to stats_cmd_sent

Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be.h      |    2 +-
 drivers/net/benet/be_cmds.c |    4 ++--
 drivers/net/benet/be_main.c |    2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 3a800e2..fb605e8 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -298,7 +298,7 @@ struct be_adapter {
 	u32 rx_fc;		/* Rx flow control */
 	u32 tx_fc;		/* Tx flow control */
 	bool ue_detected;
-	bool stats_ioctl_sent;
+	bool stats_cmd_sent;
 	int link_speed;
 	u8 port_type;
 	u8 transceiver;
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 619ebc2..7120106 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -81,7 +81,7 @@ static int be_mcc_compl_process(struct be_adapter *adapter,
 			be_dws_le_to_cpu(&resp->hw_stats,
 						sizeof(resp->hw_stats));
 			netdev_stats_update(adapter);
-			adapter->stats_ioctl_sent = false;
+			adapter->stats_cmd_sent = false;
 		}
 	} else if ((compl_status != MCC_STATUS_NOT_SUPPORTED) &&
 		   (compl->tag0 != OPCODE_COMMON_NTWK_MAC_QUERY)) {
@@ -1075,7 +1075,7 @@ int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd)
 	sge->len = cpu_to_le32(nonemb_cmd->size);
 
 	be_mcc_notify(adapter);
-	adapter->stats_ioctl_sent = true;
+	adapter->stats_cmd_sent = true;
 
 err:
 	spin_unlock_bh(&adapter->mcc_lock);
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index aad7ea3..b9e170d 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -1873,7 +1873,7 @@ static void be_worker(struct work_struct *work)
 		goto reschedule;
 	}
 
-	if (!adapter->stats_ioctl_sent)
+	if (!adapter->stats_cmd_sent)
 		be_cmd_get_stats(adapter, &adapter->stats_cmd);
 
 	be_tx_rate_update(adapter);
-- 
1.7.1


^ permalink raw reply related

* [PATCH net-next 6/6] be2net: use hba_port_num instead of port_num
From: Ajit Khaparde @ 2011-02-20 21:42 UTC (permalink / raw)
  To: netdev, davem

Use hba_port_num for phy loopback and ethtool phy identification.

From: Suresh R <suresh.reddy@emulex.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be.h         |    1 +
 drivers/net/benet/be_cmds.c    |   54 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/benet/be_cmds.h    |   12 +++++++++
 drivers/net/benet/be_ethtool.c |   12 ++++----
 drivers/net/benet/be_hw.h      |   47 ++++++++++++++++++++++++++++++++++
 drivers/net/benet/be_main.c    |    4 +++
 6 files changed, 124 insertions(+), 6 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 46b951f..ed709a5 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -318,6 +318,7 @@ struct be_adapter {
 	struct be_vf_cfg vf_cfg[BE_MAX_VF];
 	u8 is_virtfn;
 	u32 sli_family;
+	u8 hba_port_num;
 	u16 pvid;
 };
 
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index ff62aae..1822ecd 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -1954,3 +1954,57 @@ err:
 	spin_unlock_bh(&adapter->mcc_lock);
 	return status;
 }
+
+int be_cmd_get_cntl_attributes(struct be_adapter *adapter)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_cntl_attribs *req;
+	struct be_cmd_resp_cntl_attribs *resp;
+	struct be_sge *sge;
+	int status;
+	int payload_len = max(sizeof(*req), sizeof(*resp));
+	struct mgmt_controller_attrib *attribs;
+	struct be_dma_mem attribs_cmd;
+
+	memset(&attribs_cmd, 0, sizeof(struct be_dma_mem));
+	attribs_cmd.size = sizeof(struct be_cmd_resp_cntl_attribs);
+	attribs_cmd.va = pci_alloc_consistent(adapter->pdev, attribs_cmd.size,
+						&attribs_cmd.dma);
+	if (!attribs_cmd.va) {
+		dev_err(&adapter->pdev->dev,
+				"Memory allocation failure\n");
+		return -ENOMEM;
+	}
+
+	if (mutex_lock_interruptible(&adapter->mbox_lock))
+		return -1;
+
+	wrb = wrb_from_mbox(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+	req = attribs_cmd.va;
+	sge = nonembedded_sgl(wrb);
+
+	be_wrb_hdr_prepare(wrb, payload_len, false, 1,
+			OPCODE_COMMON_GET_CNTL_ATTRIBUTES);
+	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+			 OPCODE_COMMON_GET_CNTL_ATTRIBUTES, payload_len);
+	sge->pa_hi = cpu_to_le32(upper_32_bits(attribs_cmd.dma));
+	sge->pa_lo = cpu_to_le32(attribs_cmd.dma & 0xFFFFFFFF);
+	sge->len = cpu_to_le32(attribs_cmd.size);
+
+	status = be_mbox_notify_wait(adapter);
+	if (!status) {
+		attribs = (struct mgmt_controller_attrib *)( attribs_cmd.va +
+					sizeof(struct be_cmd_resp_hdr));
+		adapter->hba_port_num = attribs->hba_attribs.phy_port;
+	}
+
+err:
+	mutex_unlock(&adapter->mbox_lock);
+	pci_free_consistent(adapter->pdev, attribs_cmd.size, attribs_cmd.va,
+					attribs_cmd.dma);
+	return status;
+}
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index 6e89de8..93e5768 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -169,6 +169,7 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_SET_QOS				28
 #define OPCODE_COMMON_MCC_CREATE_EXT			90
 #define OPCODE_COMMON_SEEPROM_READ			30
+#define OPCODE_COMMON_GET_CNTL_ATTRIBUTES               32
 #define OPCODE_COMMON_NTWK_RX_FILTER    		34
 #define OPCODE_COMMON_GET_FW_VERSION			35
 #define OPCODE_COMMON_SET_FLOW_CONTROL			36
@@ -1030,6 +1031,16 @@ struct be_cmd_resp_set_qos {
 	u32 rsvd;
 };
 
+/*********************** Controller Attributes ***********************/
+struct be_cmd_req_cntl_attribs {
+	struct be_cmd_req_hdr hdr;
+};
+
+struct be_cmd_resp_cntl_attribs {
+	struct be_cmd_resp_hdr hdr;
+	struct mgmt_controller_attrib attribs;
+};
+
 extern int be_pci_fnum_get(struct be_adapter *adapter);
 extern int be_cmd_POST(struct be_adapter *adapter);
 extern int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
@@ -1115,4 +1126,5 @@ extern int be_cmd_get_phy_info(struct be_adapter *adapter,
 extern int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain);
 extern void be_detect_dump_ue(struct be_adapter *adapter);
 extern int be_cmd_get_die_temperature(struct be_adapter *adapter);
+extern int be_cmd_get_cntl_attributes(struct be_adapter *adapter);
 
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 4766693..6e5e433 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -513,7 +513,7 @@ be_phys_id(struct net_device *netdev, u32 data)
 	int status;
 	u32 cur;
 
-	be_cmd_get_beacon_state(adapter, adapter->port_num, &cur);
+	be_cmd_get_beacon_state(adapter, adapter->hba_port_num, &cur);
 
 	if (cur == BEACON_STATE_ENABLED)
 		return 0;
@@ -521,12 +521,12 @@ be_phys_id(struct net_device *netdev, u32 data)
 	if (data < 2)
 		data = 2;
 
-	status = be_cmd_set_beacon_state(adapter, adapter->port_num, 0, 0,
+	status = be_cmd_set_beacon_state(adapter, adapter->hba_port_num, 0, 0,
 			BEACON_STATE_ENABLED);
 	set_current_state(TASK_INTERRUPTIBLE);
 	schedule_timeout(data*HZ);
 
-	status = be_cmd_set_beacon_state(adapter, adapter->port_num, 0, 0,
+	status = be_cmd_set_beacon_state(adapter, adapter->hba_port_num, 0, 0,
 			BEACON_STATE_DISABLED);
 
 	return status;
@@ -605,12 +605,12 @@ err:
 static u64 be_loopback_test(struct be_adapter *adapter, u8 loopback_type,
 				u64 *status)
 {
-	be_cmd_set_loopback(adapter, adapter->port_num,
+	be_cmd_set_loopback(adapter, adapter->hba_port_num,
 				loopback_type, 1);
-	*status = be_cmd_loopback_test(adapter, adapter->port_num,
+	*status = be_cmd_loopback_test(adapter, adapter->hba_port_num,
 				loopback_type, 1500,
 				2, 0xabc);
-	be_cmd_set_loopback(adapter, adapter->port_num,
+	be_cmd_set_loopback(adapter, adapter->hba_port_num,
 				BE_NO_LOOPBACK, 1);
 	return *status;
 }
diff --git a/drivers/net/benet/be_hw.h b/drivers/net/benet/be_hw.h
index 4096d97..3f459f7 100644
--- a/drivers/net/benet/be_hw.h
+++ b/drivers/net/benet/be_hw.h
@@ -327,6 +327,53 @@ struct be_eth_rx_compl {
 	u32 dw[4];
 };
 
+struct mgmt_hba_attribs {
+	u8 flashrom_version_string[32];
+	u8 manufacturer_name[32];
+	u32 supported_modes;
+	u32 rsvd0[3];
+	u8 ncsi_ver_string[12];
+	u32 default_extended_timeout;
+	u8 controller_model_number[32];
+	u8 controller_description[64];
+	u8 controller_serial_number[32];
+	u8 ip_version_string[32];
+	u8 firmware_version_string[32];
+	u8 bios_version_string[32];
+	u8 redboot_version_string[32];
+	u8 driver_version_string[32];
+	u8 fw_on_flash_version_string[32];
+	u32 functionalities_supported;
+	u16 max_cdblength;
+	u8 asic_revision;
+	u8 generational_guid[16];
+	u8 hba_port_count;
+	u16 default_link_down_timeout;
+	u8 iscsi_ver_min_max;
+	u8 multifunction_device;
+	u8 cache_valid;
+	u8 hba_status;
+	u8 max_domains_supported;
+	u8 phy_port;
+	u32 firmware_post_status;
+	u32 hba_mtu[8];
+	u32 rsvd1[4];
+};
+
+struct mgmt_controller_attrib {
+	struct mgmt_hba_attribs hba_attribs;
+	u16 pci_vendor_id;
+	u16 pci_device_id;
+	u16 pci_sub_vendor_id;
+	u16 pci_sub_system_id;
+	u8 pci_bus_number;
+	u8 pci_device_number;
+	u8 pci_function_number;
+	u8 interface_type;
+	u64 unique_identifier;
+	u32 rsvd0[5];
+};
+
 struct controller_id {
 	u32 vendor;
 	u32 device;
diff --git a/drivers/net/benet/be_main.c b/drivers/net/benet/be_main.c
index cd6fda7..0bdccb1 100644
--- a/drivers/net/benet/be_main.c
+++ b/drivers/net/benet/be_main.c
@@ -2868,6 +2868,10 @@ static int be_get_config(struct be_adapter *adapter)
 	else
 		adapter->max_vlans = BE_NUM_VLANS_SUPPORTED;
 
+	status = be_cmd_get_cntl_attributes(adapter);
+	if (status)
+		return status;
+
 	return 0;
 }
 
-- 
1.7.1


^ permalink raw reply related

* [PATCH net-next 5/6] be2net: add code to display temperature of ASIC
From: Ajit Khaparde @ 2011-02-20 21:42 UTC (permalink / raw)
  To: netdev, davem

Add support to display temperature of ASIC via ethtool -S

From: Somnath K <somnath.kotur@emulex.com>
Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com>
---
 drivers/net/benet/be.h         |    7 ++++++
 drivers/net/benet/be_cmds.c    |   44 ++++++++++++++++++++++++++++++++++++++++
 drivers/net/benet/be_cmds.h    |   16 ++++++++++++++
 drivers/net/benet/be_ethtool.c |   11 ++++++++-
 4 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/drivers/net/benet/be.h b/drivers/net/benet/be.h
index 7bf8dd4..46b951f 100644
--- a/drivers/net/benet/be.h
+++ b/drivers/net/benet/be.h
@@ -225,6 +225,10 @@ struct be_rx_obj {
 	u32 cache_line_barrier[15];
 };
 
+struct be_drv_stats {
+	u8 be_on_die_temperature;
+};
+
 struct be_vf_cfg {
 	unsigned char vf_mac_addr[ETH_ALEN];
 	u32 vf_if_handle;
@@ -234,6 +238,7 @@ struct be_vf_cfg {
 };
 
 #define BE_INVALID_PMAC_ID		0xffffffff
+
 struct be_adapter {
 	struct pci_dev *pdev;
 	struct net_device *netdev;
@@ -269,6 +274,7 @@ struct be_adapter {
 	u32 big_page_size;	/* Compounded page size shared by rx wrbs */
 
 	u8 msix_vec_next_idx;
+	struct be_drv_stats drv_stats;
 
 	struct vlan_group *vlan_grp;
 	u16 vlans_added;
@@ -281,6 +287,7 @@ struct be_adapter {
 	struct be_dma_mem stats_cmd;
 	/* Work queue used to perform periodic tasks like getting statistics */
 	struct delayed_work work;
+	u16 work_counter;
 
 	/* Ethtool knobs and info */
 	bool rx_csum; 		/* BE card must perform rx-checksumming */
diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
index 59d25ac..ff62aae 100644
--- a/drivers/net/benet/be_cmds.c
+++ b/drivers/net/benet/be_cmds.c
@@ -18,6 +18,9 @@
 #include "be.h"
 #include "be_cmds.h"
 
+/* Must be a power of 2 or else MODULO will BUG_ON */
+static int be_get_temp_freq = 32;
+
 static void be_mcc_notify(struct be_adapter *adapter)
 {
 	struct be_queue_info *mccq = &adapter->mcc_obj.q;
@@ -1069,6 +1072,9 @@ int be_cmd_get_stats(struct be_adapter *adapter, struct be_dma_mem *nonemb_cmd)
 	struct be_sge *sge;
 	int status = 0;
 
+	if (MODULO(adapter->work_counter, be_get_temp_freq) == 0)
+		be_cmd_get_die_temperature(adapter);
+
 	spin_lock_bh(&adapter->mcc_lock);
 
 	wrb = wrb_from_mccq(adapter);
@@ -1136,6 +1142,44 @@ err:
 	return status;
 }
 
+/* Uses synchronous mcc */
+int be_cmd_get_die_temperature(struct be_adapter *adapter)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_get_cntl_addnl_attribs *req;
+	int status;
+
+	spin_lock_bh(&adapter->mcc_lock);
+
+	wrb = wrb_from_mccq(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+	req = embedded_payload(wrb);
+
+	be_wrb_hdr_prepare(wrb, sizeof(*req), true, 0,
+			OPCODE_COMMON_GET_CNTL_ADDITIONAL_ATTRIBUTES);
+
+	be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+		OPCODE_COMMON_GET_CNTL_ADDITIONAL_ATTRIBUTES, sizeof(*req));
+
+	status = be_mcc_notify_wait(adapter);
+	if (!status) {
+		struct be_cmd_resp_get_cntl_addnl_attribs *resp =
+						embedded_payload(wrb);
+		adapter->drv_stats.be_on_die_temperature =
+						resp->on_die_temperature;
+	}
+	/* If IOCTL fails once, do not bother issuing it again */
+	else
+		be_get_temp_freq = 0;
+
+err:
+	spin_unlock_bh(&adapter->mcc_lock);
+	return status;
+}
+
 /* Uses Mbox */
 int be_cmd_get_fw_ver(struct be_adapter *adapter, char *fw_ver)
 {
diff --git a/drivers/net/benet/be_cmds.h b/drivers/net/benet/be_cmds.h
index a5af296..6e89de8 100644
--- a/drivers/net/benet/be_cmds.h
+++ b/drivers/net/benet/be_cmds.h
@@ -189,6 +189,7 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_GET_BEACON_STATE			70
 #define OPCODE_COMMON_READ_TRANSRECV_DATA		73
 #define OPCODE_COMMON_GET_PHY_DETAILS			102
+#define OPCODE_COMMON_GET_CNTL_ADDITIONAL_ATTRIBUTES	121
 
 #define OPCODE_ETH_RSS_CONFIG				1
 #define OPCODE_ETH_ACPI_CONFIG				2
@@ -668,6 +669,20 @@ struct be_cmd_resp_get_stats {
 	struct be_hw_stats hw_stats;
 };
 
+struct be_cmd_req_get_cntl_addnl_attribs {
+	struct be_cmd_req_hdr hdr;
+	u8 rsvd[8];
+};
+
+struct be_cmd_resp_get_cntl_addnl_attribs {
+	struct be_cmd_resp_hdr hdr;
+	u16 ipl_file_number;
+	u8 ipl_file_version;
+	u8 rsvd0;
+	u8 on_die_temperature; /* in degrees centigrade*/
+	u8 rsvd1[3];
+};
+
 struct be_cmd_req_vlan_config {
 	struct be_cmd_req_hdr hdr;
 	u8 interface_id;
@@ -1099,4 +1114,5 @@ extern int be_cmd_get_phy_info(struct be_adapter *adapter,
 		struct be_dma_mem *cmd);
 extern int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain);
 extern void be_detect_dump_ue(struct be_adapter *adapter);
+extern int be_cmd_get_die_temperature(struct be_adapter *adapter);
 
diff --git a/drivers/net/benet/be_ethtool.c b/drivers/net/benet/be_ethtool.c
index 0833cbd..4766693 100644
--- a/drivers/net/benet/be_ethtool.c
+++ b/drivers/net/benet/be_ethtool.c
@@ -27,7 +27,7 @@ struct be_ethtool_stat {
 };
 
 enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT,
-			PMEMSTAT};
+			PMEMSTAT, DRVSTAT};
 #define FIELDINFO(_struct, field) FIELD_SIZEOF(_struct, field), \
 					offsetof(_struct, field)
 #define NETSTAT_INFO(field) 	#field, NETSTAT,\
@@ -46,6 +46,9 @@ enum {NETSTAT, PORTSTAT, MISCSTAT, DRVSTAT_TX, DRVSTAT_RX, ERXSTAT,
 					FIELDINFO(struct be_erx_stats, field)
 #define PMEMSTAT_INFO(field) 	#field, PMEMSTAT,\
 					FIELDINFO(struct be_pmem_stats, field)
+#define	DRVSTAT_INFO(field)	#field, DRVSTAT,\
+					FIELDINFO(struct be_drv_stats, \
+						field)
 
 static const struct be_ethtool_stat et_stats[] = {
 	{NETSTAT_INFO(rx_packets)},
@@ -105,7 +108,8 @@ static const struct be_ethtool_stat et_stats[] = {
 	{MISCSTAT_INFO(rx_drops_mtu)},
 	{MISCSTAT_INFO(port0_jabber_events)},
 	{MISCSTAT_INFO(port1_jabber_events)},
-	{PMEMSTAT_INFO(eth_red_drops)}
+	{PMEMSTAT_INFO(eth_red_drops)},
+	{DRVSTAT_INFO(be_on_die_temperature)}
 };
 #define ETHTOOL_STATS_NUM ARRAY_SIZE(et_stats)
 
@@ -285,6 +289,9 @@ be_get_ethtool_stats(struct net_device *netdev,
 		case PMEMSTAT:
 			p = &hw_stats->pmem;
 			break;
+		case DRVSTAT:
+			p = &adapter->drv_stats;
+			break;
 		}
 
 		p = (u8 *)p + et_stats[i].offset;
-- 
1.7.1


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox