Netdev List


* Re: [PATCH] TBF: stop qdisc infanticide
From: Patrick McHardy @ 2010-05-13 16:30 UTC
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20100513092728.766ee059@nehalam>

Stephen Hemminger wrote:
> On Thu, 13 May 2010 18:22:56 +0200
> Patrick McHardy <kaber@trash.net> wrote:
> 
>> Stephen Hemminger wrote:
>>> Several netem users have complained that when using TBF for rate control
>>> that any change to TBF parameters destroys the child qdisc. A typical
>>> use is to have a test that sets up netem + TBF then changes bandwidth
>>> setting.  But every time the parameters of TBF are changed it destroys
>>> the child qdisc, requiring reconfiguration. Other qdiscs, like HTB,
>>> don't do this.
>>>
>>> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
>>>
>>>
>>> --- a/net/sched/sch_tbf.c	2010-05-12 20:41:06.257006386 -0700
>>> +++ b/net/sched/sch_tbf.c	2010-05-12 20:52:35.671216316 -0700
>>> @@ -273,7 +273,11 @@ static int tbf_change(struct Qdisc* sch,
>>>  	if (max_size < 0)
>>>  		goto done;
>>>  
>>> -	if (qopt->limit > 0) {
>>> +	if (q->qdisc) {
>>> +		err = fifo_set_limit(q->qdisc, qopt->limit);
>>> +		if (err)
>>> +			goto done;
>> q->qdisc is never NULL since a noop_qdisc is assigned by default. Also
>> this should check that the child is in fact one of the *fifos.
> 
> But the child will be netem and fifo_set_limit ignores non-fifo.

OK, but it does need to make sure the child is not a noop_qdisc,
otherwise it won't create the default bfifo.
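To make Patrick's objection concrete, here is a minimal user-space model of what a TBF parameter change should do to its child. Everything in it is hypothetical: `struct model_qdisc`, `model_noop_qdisc`, `model_fifo_set_limit()` and `tbf_update_child()` stand in for the kernel's `struct Qdisc`, `noop_qdisc` and `fifo_set_limit()`, and the "ignore non-fifo children" behaviour is taken from Stephen's description above rather than from the kernel source. The key point is that the placeholder child is recognised by identity, not by a NULL test.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of a qdisc; not the kernel's struct Qdisc. */
struct model_qdisc {
	const char *kind;	/* "noop", "bfifo", "pfifo", "netem", ... */
	unsigned int limit;
};

/* Stand-in for the shared placeholder child a new TBF starts with. */
static struct model_qdisc model_noop_qdisc = { "noop", 0 };

/* Models fifo_set_limit() as described in the thread: only bfifo/pfifo
 * children take the new limit, anything else (e.g. netem) is ignored. */
static void model_fifo_set_limit(struct model_qdisc *q, unsigned int limit)
{
	if (strcmp(q->kind, "bfifo") == 0 || strcmp(q->kind, "pfifo") == 0)
		q->limit = limit;
}

/* Hypothetical helper: what a TBF parameter change does to the child.
 * A NULL test would never fire, because the child starts out pointing at
 * the noop placeholder, so the placeholder is checked for explicitly. */
static struct model_qdisc *tbf_update_child(struct model_qdisc *child,
					    unsigned int limit,
					    struct model_qdisc *spare)
{
	assert(child != NULL);
	if (child == &model_noop_qdisc) {
		/* First configuration: create the default bfifo. */
		assert(spare != NULL);
		spare->kind = "bfifo";
		spare->limit = limit;
		return spare;
	}
	/* Later changes keep the existing child instead of destroying it. */
	model_fifo_set_limit(child, limit);
	return child;
}
```

For example, the first change on a freshly created TBF yields a new bfifo with the requested limit; later changes only adjust that bfifo's limit, and a netem child keeps its own settings.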


* Re: [PATCH] net sched: cleanup and rate limit warning
From: jamal @ 2010-05-13 16:40 UTC
  To: Stephen Hemminger; +Cc: Patrick McHardy, David Miller, netdev
In-Reply-To: <20100513092643.6551cb48@nehalam>

On Thu, 2010-05-13 at 09:26 -0700, Stephen Hemminger wrote:

> > 
> 
> And the kernel message needs to be fixed to prevent total
> log overload for the next poor sop who makes the same mistake.
> 
> Please accept the patch.

I have no problem with the patch. I don't think it needs to go to stable,
but I'll let Dave call it.

Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>


cheers,
jamal
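In kernel code, the throttling Stephen asks for is usually done by guarding the message with `net_ratelimit()`. The sketch below only models the idea behind such a limiter (at most `burst` messages per `interval`, the rest counted as suppressed); the structure and function names are hypothetical, it is not the kernel's implementation, and time is passed in explicitly so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interval/burst limiter, modelled loosely on the idea behind
 * net_ratelimit(); not its actual implementation. */
struct model_ratelimit {
	unsigned long interval;	/* window length, in caller-defined ticks */
	unsigned int burst;	/* messages allowed per window */
	unsigned long begin;	/* start of the current window */
	unsigned int printed;	/* messages let through in this window */
	unsigned int missed;	/* messages suppressed in this window */
};

/* Returns true if a warning may be logged at time `now`. */
static bool model_ratelimit_ok(struct model_ratelimit *rs, unsigned long now)
{
	assert(rs->burst > 0);
	if (now - rs->begin >= rs->interval) {
		/* New window: reset the counters (a fuller version might
		 * first report how many messages were suppressed). */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```

A caller logs only when `model_ratelimit_ok()` returns true, so a misconfiguration that triggers the warning on every packet produces at most `burst` lines per window.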



* Re: [PATCH net-next] drivers/net: Remove unnecessary returns from void function()s
From: Joe Perches @ 2010-05-13 16:58 UTC
  To: netdev
In-Reply-To: <1273606224.20514.295.camel@Joe-Laptop.home>

On Tue, 2010-05-11 at 12:30 -0700, Joe Perches wrote:
> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
> index 2aa3367..02497bc 100644
> --- a/drivers/net/bonding/bonding.h
> +++ b/drivers/net/bonding/bonding.h
> @@ -368,15 +368,12 @@ void bond_unregister_ipv6_notifier(void);
>  #else
>  static inline void bond_send_unsolicited_na(struct bonding *bond)
>  {
> -	return;
>  }
>  static inline void bond_register_ipv6_notifier(void)
>  {
> -	return;
>  }
>  static inline void bond_unregister_ipv6_notifier(void)
>  {
> -	return;
>  }
>  #endif

fyi: Patrick McHardy prefers null statement void functions to
keep the return.

There are some more removals of return like this in the patch.
If a new patch should be generated, do tell.




* Re: [PATCH net-next-2.6 2/2] bonding: allow user-controlled output slave selection
From: Neil Horman @ 2010-05-13 17:15 UTC
  To: Andy Gospodarek; +Cc: Jay Vosburgh, netdev
In-Reply-To: <20100512221408.GI7497@gospo.rdu.redhat.com>

On Wed, May 12, 2010 at 06:14:08PM -0400, Andy Gospodarek wrote:
> On Wed, May 12, 2010 at 12:41:54PM -0700, Jay Vosburgh wrote:
> > Neil Horman <nhorman@tuxdriver.com> wrote:
> > 
> > >On Tue, May 11, 2010 at 01:09:39PM -0700, Jay Vosburgh wrote:
> > >> Andy Gospodarek <andy@greyhouse.net> wrote:
> > >> 
> > >> >This patch gives the user the ability to control the output slave for
> > >> >round-robin and active-backup bonding.  Similar functionality was
> > >> >discussed in the past, but Jay Vosburgh indicated he would rather see a
> > >> >feature like this added to existing modes rather than creating a
> > >> >completely new mode.  Jay's thoughts as well as Neil's input surrounding
> > >> >some of the issues with the first implementation pushed us toward a
> > >> >design that relied on the queue_mapping rather than skb marks.
> > >> >Round-robin and active-backup modes were chosen as the first users of
> > >> >this slave selection as they seemed like the most logical choices when
> > >> >considering a multi-switch environment.
> > >> >
> > >> >Round-robin mode works without any modification, but active-backup does
> > >> >require inclusion of the first patch in this series and setting
> > >> >the 'keep_all' flag.  This will allow reception of unicast traffic on
> > >> >any of the backup interfaces.
> > >> 
> > >> 	Yes, I did think that the mark business fit better into existing
> > >> modes (I thought of it as kind of a new hash for xor and 802.3ad modes).
> > >> I also didn't expect to see so much new stuff (this, as well as the FCOE
> > >> special cases being discussed elsewhere) being shoehorned into the
> > >> active-backup mode.  I'm not so sure that adding so many special cases
> > >> to active-backup is a good thing.
> > >> 
> > >> 	Now, I'm starting to wonder if you were right, and it would be
> > >> better overall to have a "manual" mode that would hopefully satisfy this
> > >> case as well as the FCOE special case.  I don't think either of these is
> > >> a bad use case, I'm just not sure the right way to handle them is
> > >> another special knob in active-backup mode (either directly, or
> > >> implicitly in __netif_receive_skb), which wasn't what I expected to see.
> > >> 
> > >I honestly don't think a separate mode is warranted here.  While I'm not opposed
> > >to adding a new mode, I really think doing so is no different from overloading
> > >an existing mode.  I say that because to add a new mode in which we explicitly
> > >expect traffic to be directed to various slaves requires that we implement a
> > >policy for frames which have no queue mapping determined on egress.  Any policy
> > >I can think of is really an approximation of an existing policy, so we may as
> > >well reuse the policy code that we already have in place.  About the only way a
> > >separate mode makes sense is in the 'passthrough' queue mode you document below.
> > >In this model, in which queue ids map to slaves in a 1:1 fashion, it doesn't make
> > >sense.
> > 
> > 	One goal I'm hoping to achieve is something that would satisfy
> > both the queue map stuff that you're looking for, and would meet the
> > needs of the FCOE people who also want to disable the duplicate
> > suppression (i.e., permit incoming traffic on the inactive slave) for a
> > different special case.
> > 
> > 	The FCOE proposal revolves around, essentially, active-backup
> > mode, but permitting incoming traffic on the inactive slave.  At the
> > moment, the patches attempt to special case things such that only
> > dev_add_pack listeners directly bound to the inactive slave are checked
> > (to permit the FCOE traffic to pass on the inactive slave, but still
> > dropping IP, as ip_rcv is a wildcard bind).
> > 
> > 	Your keep_all patch is, by and large, the same thing, except
> > that it permits anything to come in on the "inactive" slave, and it's a
> > switch that has to be turned on.
> > 
> > 	This seems like needless duplication to me; I'd prefer to see a
> > single solution that handles both cases instead of two special cases
> > that each do 90% of what the other does.
> > 
> > 	As far as a new mode goes, one major reason I think a separate
> > mode is warranted is the semantics: with either of these changes (to
> > permit more or less regular use of the "inactive" slaves), the mode
> > isn't really an "active-backup" mode any longer; there is no "inactive"
> > or "backup" slave.  I think of this as being a major change of
> > functionality, not simply a minor option.
> > 
> > 	Hence my thought that "active-backup" could stay as a "true" hot
> > standby mode (backup slaves are just that: for backup, only), and this
> > new mode would be the place to do the special queue-map / FCOE /
> > whatever that isn't really a hot standby configuration any longer.
> > 
> > 	As far as the behavior of the new mode (your concern about its
> > policy map approximations), in the end, it would probably act pretty
> > much like active-backup with your patch applied: traffic goes out the
> > active slave, unless directed otherwise.  It's a lot less complicated
> > than I had feared.
> > 
> 
> It's beginning to sound like the 'FCoE use-case' and the one Neil and I
> are proposing are quite similar.  The main goal of both is to have the
> option to have multiple slaves send and receive traffic during the
> steady-state, but in the event of a failover all traffic would run on a
> single interface.
> 
> The implementation proposed with this patch is a bit different than the
> 'mark-mode' patch you may recall I posted a few months ago.  It created
> a new mode that essentially did exactly what you are describing --
> transmit on the primary interface unless pushed to another interface via
> info in the skb and receive on all interfaces.  We initially did not
> create a new mode based on your reservations about the previous
> mark-mode patch and went the direction of enhancing one or two modes
> initially (figuring it would be good to run before walking), with the
> idea that other modes could take care of this output slave selection
> logic in the future.


So, it's sounding to me like everyone is leaning toward a new mode approach here.
Before we go ahead and start coding, here are the bullet points I hear for this
approach:

1) Implement a new bond mode where queue ids are used to steer frames to output
interfaces

2) Use said mode to imply universal frame reception (i.e. remove the keep_all
knob, and turn on that behavior when this new mode is selected)

3) Use John F.'s skb_should_drop changes to implement the keep_all feature.
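Point 1, combined with the fallback Jay described (traffic goes out the active slave unless directed otherwise), could look roughly like the following. This is a minimal sketch under stated assumptions: the `struct model_slave` layout, the convention that a queue id of 0 means "not mapped", and `model_select_slave()` itself are hypothetical, not code from the bonding driver or from Andy's patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slave record for this sketch; not bonding's struct slave. */
struct model_slave {
	const char *name;
	unsigned int queue_id;	/* 0 = not mapped to any queue id (assumed) */
	bool link_up;
};

/* Choose the output slave for a frame: honour a queue id that maps to a
 * slave whose link is up, otherwise fall back to the active slave. */
static struct model_slave *model_select_slave(struct model_slave *slaves,
					      size_t nslaves,
					      struct model_slave *active,
					      unsigned int queue_mapping)
{
	assert(active != NULL);
	if (queue_mapping != 0) {
		for (size_t i = 0; i < nslaves; i++)
			if (slaves[i].queue_id == queue_mapping &&
			    slaves[i].link_up)
				return &slaves[i];
	}
	/* No mapping, or the mapped slave is down: use the active slave. */
	return active;
}
```

If the mapped slave's link is down, the frame falls back to the active slave, i.e. the mode degrades to plain active-backup transmit behaviour.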

Does that sound about right?

Regards
Neil



* Re: [PATCH net-next] drivers/net: Remove unnecessary returns from void function()s
From: Harvey Harrison @ 2010-05-13 18:37 UTC
  To: Joe Perches; +Cc: netdev
In-Reply-To: <1273769886.21514.72.camel@Joe-Laptop.home>

On Thu, May 13, 2010 at 9:58 AM, Joe Perches <joe@perches.com> wrote:
> On Tue, 2010-05-11 at 12:30 -0700, Joe Perches wrote:
>> diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>> index 2aa3367..02497bc 100644
>> --- a/drivers/net/bonding/bonding.h
>> +++ b/drivers/net/bonding/bonding.h
>> @@ -368,15 +368,12 @@ void bond_unregister_ipv6_notifier(void);
>>  #else
>>  static inline void bond_send_unsolicited_na(struct bonding *bond)
>>  {
>> -     return;
>>  }
>>  static inline void bond_register_ipv6_notifier(void)
>>  {
>> -     return;
>>  }
>>  static inline void bond_unregister_ipv6_notifier(void)
>>  {
>> -     return;
>>  }
>>  #endif
>
> fyi: Patrick McHardy prefers null statement void functions to
> keep the return.
>
> There are some more removals of return like this in the patch.
> If a new patch should be generated, do tell.
>

If you're looking to save a few more lines, many places do the following:

static inline void bond_unregister_ipv6_notifier(void) {}

Cheers,

Harvey


* Re: [PATCH net-next-2.6 2/2] bonding: allow user-controlled output slave selection
From: Jay Vosburgh @ 2010-05-13 18:54 UTC
  To: John Fastabend; +Cc: Andy Gospodarek, Neil Horman, netdev@vger.kernel.org
In-Reply-To: <4BEBAAF4.5040904@intel.com>

John Fastabend <john.r.fastabend@intel.com> wrote:

>Andy Gospodarek wrote:
>> On Wed, May 12, 2010 at 12:41:54PM -0700, Jay Vosburgh wrote:
[...]
>>>       One goal I'm hoping to achieve is something that would satisfy
>>> both the queue map stuff that you're looking for, and would meet the
>>> needs of the FCOE people who also want to disable the duplicate
>>> suppression (i.e., permit incoming traffic on the inactive slave) for a
>>> different special case.
>>>
>>>       The FCOE proposal revolves around, essentially, active-backup
>>> mode, but permitting incoming traffic on the inactive slave.  At the
>>> moment, the patches attempt to special case things such that only
>>> dev_add_pack listeners directly bound to the inactive slave are checked
>>> (to permit the FCOE traffic to pass on the inactive slave, but still
>>> dropping IP, as ip_rcv is a wildcard bind).
>>>
>>>       Your keep_all patch is, by and large, the same thing, except
>>> that it permits anything to come in on the "inactive" slave, and it's a
>>> switch that has to be turned on.
>>>
>>>       This seems like needless duplication to me; I'd prefer to see a
>>> single solution that handles both cases instead of two special cases
>>> that each do 90% of what the other does.
>>>
>>>       As far as a new mode goes, one major reason I think a separate
>>> mode is warranted is the semantics: with either of these changes (to
>>> permit more or less regular use of the "inactive" slaves), the mode
>>> isn't really an "active-backup" mode any longer; there is no "inactive"
>>> or "backup" slave.  I think of this as being a major change of
>>> functionality, not simply a minor option.
>>>
>>>       Hence my thought that "active-backup" could stay as a "true" hot
>>> standby mode (backup slaves are just that: for backup, only), and this
>>> new mode would be the place to do the special queue-map / FCOE /
>>> whatever that isn't really a hot standby configuration any longer.
>>>
>>>       As far as the behavior of the new mode (your concern about its
>>> policy map approximations), in the end, it would probably act pretty
>>> much like active-backup with your patch applied: traffic goes out the
>>> active slave, unless directed otherwise.  It's a lot less complicated
>>> than I had feared.
>>>
>>
>> It's beginning to sound like the 'FCoE use-case' and the one Neil and I
>> are proposing are quite similar.  The main goal of both is to have the
>> option to have multiple slaves send and receive traffic during the
>> steady-state, but in the event of a failover all traffic would run on a
>> single interface.
>>
>
>I believe they are similar, although I never considered using FCoE over a
>device that is actually in the bond.  For example the current FCoE use
>case is,
>
>bond0 ------> ethx
>               |
>vlan-fcoe -->  |
>
>Here vlan-fcoe is not a slave of bond0.  With the keep_all patch this
>would work plus an additional configuration,

	I also recall discussion that another valid FCOE configuration
is to simply bind to ethx, with no VLANs involved.

>bond0 --> vlan-fcoe1  ---> ethx
>   \                        |
>    \ --- vlan-fcoe2  --->  |
>
>Here both vlan-fcoe1 and vlan-fcoe2 are slaves of bond0.
>
>Even with the keep_all patch it still seems a little inconsistent to drop
>a packet outright if it is received on an inactive slave and destined for
>a vlan on the bond and then to deliver the packet to devices that have
>exact matches if it is received on an inactive slave but destined for the
>bond device.  I'll post a patch in just a moment that hopefully
>illustrates what I see as an unexpected side effect.

	Yes, I understand this, and I view this as a separate concern
from the duplicate suppressor (although they are linked to a degree).

	The ultimate intent (for your changes) is to permit slaves to
operate simultaneously as members of the bond as well as independent
entities, which is a significant behavior change from the past.

>> The implementation proposed with this patch is a bit different than the
>> 'mark-mode' patch you may recall I posted a few months ago.  It created
>> a new mode that essentially did exactly what you are describing --
>> transmit on the primary interface unless pushed to another interface via
>> info in the skb and receive on all interfaces.  We initially did not
>> create a new mode based on your reservations about the previous
>> mark-mode patch and went the direction of enhancing one or two modes
>> initially (figuring it would be good to run before walking), with the
>> idea that other modes could take care of this output slave selection
>> logic in the future.
>>
>>
>>>>>    I presume you're overloading active-backup because it's not
>>>>> etherchannel, 802.3ad, etc, and just talks right to the switch.  For the
>>>>> regular load balance modes, I still think overlay into the existing
>>>>> modes is preferable (more on that later); I'm thinking of "manual"
>>>>> instead of another tweak to active-backup.
>>>>>
>>>>>    If users want to have actual hot-standby functionality, then
>>>>> active-backup would do that, and nothing else (and it can be multi-queue
>>>>> aware, but only one slave active at a time).
>>>>>
>>>> Yes, but active backup doesn't provide preferred output path selection in and of
>>>> itself.  That's the feature here.
>>>       I understand that; I'm suggesting that active-backup should
>>> provide no service other than hot standby, and not be overloaded into a
>>> manual load balancing scheme (both for your use, and for FCOE).
>>>
>>>       Maybe I'm worrying too much about defending the purity of the
>>> active-backup mode; I understand what you're trying to do a little
>>> better now, and yes, the "manual" mode I think of (in your queue mapping
>>> scheme, not the other doodads I talked about) would basically be
>>> active-backup with your queue mapper, minus the duplicate suppressor.
>>>
>>
>> It doesn't matter terribly to me which direction is taken.  Again, a
>> major reason this route was proposed was that you were not as keen on
>> creating a new mode as I was at the time of that patch posting.  It's
>> somewhat understandable as once a mode is added it's tough to take away,
>> but when one sees how much we are really changing the way active-backup
>> might behave in some cases maybe it makes sense to use a new mode?
>>
>> I guess I like the idea of adding this output selection to existing
>> modes because it at least gives us the option to use queue maps to
>> select output interfaces for more than a mode that looks like
>> present-day active-backup minus the duplicate suppression.   I'm happy to
>> code-up a patch that creates a new mode, but before I go do that and
>> test it, I'd like to know we have come to an agreement on the direction
>> for the future.
>>
>>>>>    Users who want the set of bonded slaves to look like a big
>>>>> multiqueue buffet could use this "manual" mode and set things up however
>>>>> they want.  One way to set it up is simply that the bond is N queues
>>>>> wide, where N is the total of the queue counts of all the slaves.  If a
>>>>> slave fails, N gets smaller, and the user code has to deal with that.
>>>>> Since the queue count of a device can't change dynamically, the bond
>>>>> would have to actually be set up with some big number of queues, and
>>>>> then only a subset is actually active (or there is some sort of wrap).
>>>>>
>>>>>    In such an implementation, each slave would have a range of
>>>>> queue IDs, not necessarily just one.  I'm a bit leery of exposing an API
>>>>> where each slave is one queue ID, as it could make transitioning to real
>>>>> multi-queue awareness difficult.
>>>>>
>>>> I'm sorry, what exactly do you mean when you say 'real' multi queue
>>>> awareness?  How is this any less real than any other implementation?  The
>>>> approach you outline above isn't any more or less valid than this one.
>>>       That was my misunderstanding of how you planned to handle
>>> things.  I had thought this patch was simply a scheme to use the queue
>>> IDs for slave selection, without any method to further perform queue
>>> selection on the slave itself (I hadn't thought of placing a tc action
>>> on the slave itself, which you described later on).  I had been thinking
>>> in terms of schemes to expose all of the slave queues on the bonding
>>> device.
>>
>> It wasn't our original intention either.  I didn't mention it in my
>> original post as it wasn't really the intent of our patch, but a nice
>> side-effect for the informed user. :) Obviously a bit more testing could
>> take place and we could add more examples to the documentation for the
>> nice side-effect feature of this patch, but since this wasn't our
>> original intent and we didn't test it, we did not advertise it.
>>
>>>       So, I don't see any issues with the queue mapping part.  I still
>>> want to find a common solution for FCOE / your patch with regards to the
>>> duplicate suppressor.
>>
>> Understood.
>>
>>>> While we're on the subject, Andy and I did discuss a model similar to what you
>>>> describe above (what I'll refer to as a queue id passthrough model), in which
>>>> you can tell the bonding driver to map a frame to a queue, and the bonding
>>>> driver doesn't really do anything with the queue id other than pass to the slave
>>>> device for hardware based multiqueue tx handling.  While we could do that, it's
>>>> my feeling such a model isn't the way to go for two primary reasons:
>>>>
>>>> 1) Inconsistent behavior.  Such an implementation makes assumptions regarding
>>>> queue id specification within a driver.  For example, what if one of the slaves
>>>> reserves some fixed number of low order queues for a specific purpose, and as
>>>> such general use queues begin at an offset from zero, while other slaves do not?
>>>> While it's easy to accommodate such needs when writing the tc filters, if a slave
>>>> fails over, such a bias would change output traffic behavior, as the bonding
>>>> driver can't be clearly informed of such a bias.  Likewise, what if a slave
>>>> driver allocates more queues than it actually supports in hardware (like the
>>>> implementation you propose; ixgbe IIRC actually does this)?  If slaves handled
>>>> unimplemented tx queues differently (if one wrapped queues, while the other simply
>>>> dropped frames to unimplemented queues, for instance), a failover would change
>>>> traffic patterns dramatically.
>>>>
>>>> 2) Need.  While (1) can pretty easily be managed with a few configuration
>>>> guidelines (output queues on slaves have to be configured identically, lest
>>>> chaos and madness befall you, etc), there's really no reason to bind users to
>>>> such a system.  We're using tc filters to set the queue id on skbs enqueued to
>>>> the bonding driver; there's absolutely no reason you can't add additional filters to
>>>> the slaves directly.  Since the bonding driver uses dev_queue_xmit to send a
>>>> frame to a slave, it has the opportunity to pass through another set of queuing
>>>> disciplines and filters that can reset and re-assign the skb's queue mapping.  So
>>>> with the approach in this patch you can get both direct output control without
>>>> sacrificing actual hardware tx output queue control.  With a passthrough model,
>>>> you save a bit of filter configuration, but at the expense of having to be much
>>>> more careful about how you configure your slave nics, and detecting such errors
>>>> in configuration would be rather difficult to track down, as it would require
>>>> the generation of traffic that hit the right filter after a failover.
>>>       I don't disagree with any of this.  One thought I do have is
>>> that Eric Dumazet, I believe, has mentioned that the read lock in
>>> bonding is a limiting factor on 10G performance.  In the far distant
>>> future when bonding is RCU, going through the lock(s) on the tc actions
>>> of the slave could have the same net effect, and in such a case, a
>>> qdisc-less path may be of benefit.  Not a concern for today, I suspect.
>>>
>>>>>    There might also be a way to tie it in to the new RPS code on
>>>>> the receive side.
>>>>>
>>>>>    If the slaves all have the same MAC and attach to a single
>>>>> switch via etherchannel, then it all looks pretty much like a single big
>>>>> honkin' multiqueue device.  The switch probably won't map the flows back
>>>>> the same way, though.
>>>>>
>>>> I agree, they probably won't.  Receive side handling wasn't really our focus here
>>>> though.  That's largely why we chose round robin and active backup as our first
>>>> modes to use this with.  They are already written to expect frames on either
>>>> interface.
>>>>
>>>>>    If the slaves are on discrete switches (without etherchannel),
>>>>> things become more complicated.  If the slaves have the same MAC, then
>>>>> the switches will be irritated about seeing that same MAC coming in from
>>>>> multiple places.  If the slaves have different MACs, then ARP has the
>>>>> same sort of issues.
>>>>>
>>>>>    In thinking about it, if it's linux bonding at both ends, there
>>>>> could be any number of discrete switches in the path, and it wouldn't
>>>>> matter as long as the linux end can work things out, e.g.,
>>>>>
>>>>>         -- switch 1 --
>>>>> hostA  /              \  hostB
>>>>> bond  ---- switch 2 ---- bond
>>>>>        \              /
>>>>>         -- switch 3 --
>>>>>
>>>>>    For something like this, the switches would never share MAC
>>>>> information for the bonding slaves.  The issue here then becomes more of
>>>>> detecting link failures (it would require either a "trunk failover" type
>>>>> of function on the switch, or some kind of active probe between the
>>>>> bonds).
>>>>>
>>>>>    Now, I realize that I'm babbling a bit, as from reading your
>>>>> description, this isn't necessarily your target topology (which sounded
>>>>> more like a case of slave A can reach only network X, and slave B can
>>>>> reach anywhere, so sending to network X should use slave A
>>>>> preferentially), or, as long as I'm doing ASCII-art,
>>>>>
>>>>>        --- switch 1 ---- network X
>>>>> hostA /               /
>>>>> bond  ---- switch 2 -+-- anywhere
>>>>>
>>>>>    Is that an accurate representation?  Or is it something a bit
>>>>> different, e.g.,
>>>>>
>>>>>        --- switch 1 ---- network X -\
>>>>> hostA /                             /
>>>>> bond  ---- switch 2 ---- anywhere --
>>>>>
>>>>>    I.e., the "anywhere" connects back to network X from the
>>>>> outside, so to speak.  Or, oh, maybe I'm missing it entirely, and you're
>>>>> thinking of something like this:
>>>>>
>>>>>        --- switch 1 --- VPN --- web site
>>>>> hostA /                          /
>>>>> bond  ---- switch 2 - Internet -/
>>>>>
>>>>>    Where you prefer to hit "web site" via the VPN (perhaps it's a
>>>>> more efficient or secure path), but can do it from the public network at
>>>>> large if necessary.
>>>>>
>>>> Yes, this one.  I think the other models are equally interesting, but this model
>>>> in which either path had universal reachability, but for some classes of traffic
>>>> one path is preferred over the other is the one we had in mind.
>>>>
>>>>>    Now, regardless of the above, your first patch ("keep_all") is
>>>>> to deal with the reverse problem, if this is a piggyback on top of
>>>>> active-backup mode: how to get packets back, when both channels can be
>>>>> active simultaneously.  That actually dovetails to a degree with work
>>>>> I've been doing lately, but the solution there probably isn't what
>>>>> you're looking for (there's a user space daemon to do path finding, and
>>>>> the "bond IP" address is piggybacked on the slaves' MAC addresses, which
>>>>> are not changed; the "bond IP" set exists in a separate subnet all its
>>>>> own).
>>>>>
>>>>>    As I said, I'm not convinced that the "keep_all" option to
>>>>> active-backup is really better than just a "manual" mode that lacks the
>>>>> dup suppression and expects the user to set everything up.
>>>>>
>>>>>    As for the round-robin change in this patch, if I'm reading it
>>>>> right, then the way it works is that the packets are round-robined,
>>>>> unless there's a queue id passed in, in which case it's assigned to the
>>>>> slave mapped to that queue id.  I'm not entirely sure why you picked
>>>>> round-robin mode for that over balance-xor; it doesn't seem to fit well
>>>>> with the description in the documentation.  Or is it just sort of a
>>>>> demonstrator?
>>>>>
>>>> It was selected because round robin allows transmits on any interface already,
>>>> and expects frames on any interface, so it was a 'safe' choice.  I would think
>>>> balance-xor would also work.  Ideally it would be nice to get more modes
>>>> supporting this mechanism.
>>>       I think that this should work for balance-xor and 802.3ad.  The
>>> only limitation for 802.3ad is that the spec requires "conversations" to
>>> not be striped or to skip around in a manner that could lead to out of
>>> order delivery.
>>
>> Agreed.  Checking would probably also have to be done to make sure that
>> we were not transmitting on an inactive aggregator.
>>
>>>       I'm not so sure about the alb/tlb modes; at first thought, I
>>> think it could have conflicts with the internal balancing done within
>>> the modes (if, e.g., the tc action put traffic for the same destination
>>> on two slaves).
>>>
>>
>> TLB and ALB modes would certainly have to be done differently.  It
>> should not be terribly difficult to move from the existing hashing
>> that's done to one that relies on the queue_mapping, but it will take a
>> bit to make sure it's not a complete hack.
>>
>> We decided against doing that for all modes on the first pass as it
>> seemed like the active-backup and round-robin were the most-likely
>> users.  We also wanted to present the code early rather than spending time
>> supporting this on every mode to find out that it just wasn't rational
>> to do it on some of them.
>>
>>>>>    I do like one other aspect of the patch, and that's the concept
>>>>> of overlaying the queue map on top of the balance algorithm.  So, e.g.,
>>>>> balance-xor would do its usual thing, unless the packet is queue mapped,
>>>>> in which case the packet's assignment is obeyed.  The balance-xor could
>>>>> even optionally do its xor across the full set of all slaves output
>>>>> queues instead of just across the slaves.  Round-robin can operate
>>>>> similarly.  For those modes, a "balance by queue vs. balance by slave"
>>>>> seems like a reasonable knob to have.
>>>> Not sure what you mean here.  In the model implemented by this patch, there is
>>>> one output queue per slave, and as such, balance by queue == balance by slave.
>>>> That would make sense in the model you describe earlier in this note, but not in
>>>> the model presented by this patch.
>>>       Yes, I was thinking about what I had described; again,
>>> predicated on my misunderstanding of how it all worked.
>>>
>>>>>    I do understand that you're proposing something relatively
>>>>> simple, and I'm thinking out loud about alternate or additional
>>>>> implementation details.  Some of this is "ooh ahh what if", but we also
>>>>> don't want to end up with something that's forwards incompatible, and
>>>>> I'm hoping to find one solution to multiple problems.
>>>>>
>>>> For clarification, can you enumerate what other problems you are trying to
>>>> solve with this feature, or features similar to this?  From this email, the one
>>>> that I most clearly see is the desire to allow a passthrough mode of queue
>>>> selection, which I think I've noted can be done already (even without this
>>>> patch), by attaching additional tc filters to the slaves output queues directly.
>>>> What else do you have in mind?
>>>       As I said above, I hadn't thought of stacking tc actions on to
>>> the slaves directly, so I was thinking on ways to expose the slave
>>> queues.
>>>
>>>       I still find something intriguing about a round-robin or xor
>>> mode that robins/xors through all of the slave queues, though, but that
>>> should be something separate (I'm not sure if such a scheme is actually
>>> "better", either).
>>>
>>>       -J
>
>It would be best if there was a solution for the FCoE use case that works
>with the current bonding modes, including 802.3ad.  There is switch support
>for running MPIO FCoE while doing link aggregation on the LAN side that we
>should support.  I'm not sure the keep_all patch would be good in this
>case.  Jay, I think you mentioned this at some point, but I missed the
>conclusion.  Although maybe it would be OK; I'll think about it some more
>tomorrow.
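The overlay idea from the quoted exchange can be sketched in C. In that idea, balance-xor keeps hashing as usual, but a packet that carries a queue mapping goes to the slave it names. This is a hypothetical illustration, not bonding driver code. `struct pkt`, `select_slave()`, and the convention that a queue mapping of 0 means "no mapping" are all assumptions. The hash XORs only the last byte of the source and destination MAC addresses, in the spirit of a layer2 transmit hash.

```c
#include <stdint.h>

/* Hypothetical packet descriptor; not the kernel's struct sk_buff. */
struct pkt {
	uint8_t src_mac[6];
	uint8_t dst_mac[6];
	uint16_t queue_mapping;	/* assumed: 0 = no mapping, N = slave N-1 */
};

/* Layer2-style XOR hash: last byte of each MAC, modulo slave count. */
static int xor_hash_slave(const struct pkt *p, int n_slaves)
{
	return (p->src_mac[5] ^ p->dst_mac[5]) % n_slaves;
}

/*
 * Overlay: obey the queue mapping when it names a valid slave,
 * otherwise fall back to the usual balance-xor choice.
 * Returns -1 if there are no slaves.
 */
static int select_slave(const struct pkt *p, int n_slaves)
{
	if (n_slaves <= 0)
		return -1;
	if (p->queue_mapping > 0 && p->queue_mapping <= n_slaves)
		return p->queue_mapping - 1;
	return xor_hash_slave(p, n_slaves);
}
```

With this shape, unmapped traffic is balanced exactly as before, and the queue mapping takes precedence only when it is set and in range.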

	How does that mpio FCOE / switch support function?  Does it rely
on utilizing ports (for the FC traffic) that are not members of the
802.3ad active aggregator?  E.g.:

       / eth0,eth1 -- switch A -- etc
bond0 -
       \ eth2,eth3 -- switch B -- etc

	Bond0 has four slaves, eth0 - eth3.  Eth0 and eth1 connect to
switch A; eth2 and eth3 to switch B.  Presuming that the switches aren't
stacked / magic, either eth0/eth1 or eth2/eth3 will be the active
aggregator (linux bonding only supports one active aggregator).
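The C sketch below makes the four-slave example concrete. Slaves are grouped into aggregators by the partner switch they see, and only one aggregator is marked active. It is illustrative only. `struct slave`, `pick_active_aggregator()`, and the selection rule are assumptions, not the bonding driver's actual aggregator selection logic. The assumed rule is that the group with the most ports wins, with ties going to the lowest partner ID.

```c
#include <stdbool.h>

/* Hypothetical per-slave state for the example. */
struct slave {
	const char *name;
	int partner_id;		/* e.g. 0 = switch A, 1 = switch B */
	bool active;		/* set if in the chosen aggregator */
};

/*
 * Group slaves by partner_id and activate only one group.
 * Assumed rule: the group with the most ports wins; ties go to
 * the lowest partner_id. Returns the chosen partner_id, or -1.
 */
static int pick_active_aggregator(struct slave *s, int n)
{
	int best = -1, best_count = 0;

	for (int i = 0; i < n; i++) {
		int count = 0;

		for (int j = 0; j < n; j++)
			if (s[j].partner_id == s[i].partner_id)
				count++;
		if (count > best_count ||
		    (count == best_count && s[i].partner_id < best)) {
			best = s[i].partner_id;
			best_count = count;
		}
	}
	for (int i = 0; i < n; i++)
		s[i].active = (best >= 0 && s[i].partner_id == best);
	return best;
}
```

In the four-slave example, this leaves eth2 and eth3 outside the active aggregator. That is exactly the situation the following questions about FCoE traffic are probing.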

	Am I correct in presuming that the FCOE balancer gizmo doesn't
care about the 802.3ad state of the ports, and it and the switch will
run FC traffic across all four ports, regardless of which ports are
active and which are not?

	Or is the switch even simpler than that, and it processes all
FCOE traffic to ports, regardless of how the ports are configured
(etherchannel, 802.3ad, etc)?

	As for other bonding modes, balance-xor or balance-rr (round
robin) shouldn't have the same problems with the duplicate suppression
logic that active-backup and 802.3ad have.  The alb/tlb modes might or
might not be workable at all, depending upon how the FCOE traffic looks
(e.g., what source and destination MAC addresses are in the FCOE
frames?).

	In any event, wanting to run FCOE in conjunction with a variety
of bonding modes suggests that Neil was right all along, and the
duplicate suppressor change should be an option, not a new mode.
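Making the duplicate suppressor change an option rather than a new mode can be sketched as a per-bond flag checked on receive. This is a hypothetical C illustration. The `keep_all` flag name is borrowed from the patch name mentioned earlier in the thread. `enum bond_mode`, `struct bond_cfg`, `should_drop_rx()`, and the exact drop rule are assumptions, not the driver's code.

```c
#include <stdbool.h>

/* Subset of bonding modes relevant to the discussion. */
enum bond_mode {
	MODE_ROUNDROBIN,
	MODE_ACTIVEBACKUP,
	MODE_XOR,
	MODE_8023AD,
};

/* Hypothetical per-bond configuration. */
struct bond_cfg {
	enum bond_mode mode;
	bool keep_all;		/* option: deliver frames from inactive slaves */
};

/*
 * Decide whether a frame received on a slave should be dropped as a
 * likely duplicate. Only active-backup and 802.3ad suppress frames
 * arriving on inactive slaves, and keep_all disables that
 * suppression regardless of mode.
 */
static bool should_drop_rx(const struct bond_cfg *cfg, bool slave_inactive)
{
	if (!slave_inactive || cfg->keep_all)
		return false;
	return cfg->mode == MODE_ACTIVEBACKUP || cfg->mode == MODE_8023AD;
}
```

Because the flag is independent of the mode, one knob covers both active-backup and 802.3ad without adding a mode.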

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

* Re: ixgbe - problem with packet/bytes count on all queues
From: Paweł Staszewski @ 2010-05-13 19:01 UTC
  To: Brandeburg, Jesse; +Cc: Linux Network Development list, e1000-devel
In-Reply-To: <4BE9B2C8.8040909@itcare.pl>

On 2010-05-11 21:40, Paweł Staszewski wrote:
> On 2010-05-11 20:00, Brandeburg, Jesse wrote:
>>
>> On Sun, 18 Apr 2010, Paweł Staszewski wrote:
>>
>>> Hello
>>>
>>> I want to ask whether this is normal behavior for the ixgbe driver and the 82598EB NIC.
>>> Look at the tx_queue_7 stats:
>> Hi, sorry no-one replied.
>>
> Thanks for the reply :)
>>>    ethtool -S eth2
>>> NIC statistics:
>>>        rx_packets: 35103252
>>>        tx_packets: 1770371731
>>>        rx_bytes: 3602052416
>>>        tx_bytes: 1369778276
>>>        rx_pkts_nic: 138121006018
>>>        tx_pkts_nic: 122033163226
>>>        rx_bytes_nic: 101484528847981
>>>        tx_bytes_nic: 92258799092069
>>>        lsc_int: 1
>>>        tx_busy: 0
>>>        non_eop_descs: 0
>>>        rx_errors: 0
>>>        tx_errors: 0
>>>        rx_dropped: 0
>>>        tx_dropped: 0
>>>        multicast: 490226
>>>        broadcast: 124104912
>>>        rx_no_buffer_count: 0
>>>        collisions: 0
>>>        rx_over_errors: 0
>>>        rx_crc_errors: 0
>>>        rx_frame_errors: 0
>>>        hw_rsc_aggregated: 0
>>>        hw_rsc_flushed: 0
>>>        fdir_match: 0
>>>        fdir_miss: 0
>>>        rx_fifo_errors: 0
>>>        rx_missed_errors: 0
>>>        tx_aborted_errors: 0
>>>        tx_carrier_errors: 0
>>>        tx_fifo_errors: 0
>>>        tx_heartbeat_errors: 0
>>>        tx_timeout_count: 0
>>>        tx_restart_queue: 111130
>>>        rx_long_length_errors: 38599
>>>        rx_short_length_errors: 0
>>>        tx_flow_control_xon: 0
>>>        rx_flow_control_xon: 0
>>>        tx_flow_control_xoff: 0
>>>        rx_flow_control_xoff: 0
>>>        rx_csum_offload_errors: 1554191
>>>        alloc_rx_page_failed: 0
>>>        alloc_rx_buff_failed: 0
>>>        rx_no_dma_resources: 0
>>>        tx_queue_0_packets: 108685351623
>>>        tx_queue_0_bytes: 79701402025544
>>>        tx_queue_1_packets: 3988024698
>>>        tx_queue_1_bytes: 3353530467775
>>>        tx_queue_2_packets: 1893305707
>>>        tx_queue_2_bytes: 1705357186034
>>>        tx_queue_3_packets: 1787852613
>>>        tx_queue_3_bytes: 1518632482370
>>>        tx_queue_4_packets: 1843108684
>>>        tx_queue_4_bytes: 1641474602504
>>>        tx_queue_5_packets: 1882637467
>>>        tx_queue_5_bytes: 1629905766993
>>>        tx_queue_6_packets: 1952759802
>>>        tx_queue_6_bytes: 1680666591771
>>>        tx_queue_7_packets: 0
>>>        tx_queue_7_bytes: 0
>>>        rx_queue_0_packets: 17361735592
>>>        rx_queue_0_bytes: 12585728518077
>>>        rx_queue_1_packets: 17194262916
>>>        rx_queue_1_bytes: 12518731583464
>>>        rx_queue_2_packets: 17342312348
>>>        rx_queue_2_bytes: 12734959063176
>>>        rx_queue_3_packets: 17367632051
>>>        rx_queue_3_bytes: 12656219984521
>>>        rx_queue_4_packets: 17150307164
>>>        rx_queue_4_bytes: 12408526754019
>>>        rx_queue_5_packets: 17206721842
>>>        rx_queue_5_bytes: 12470666039893
>>>        rx_queue_6_packets: 17202210572
>>>        rx_queue_6_bytes: 12431429298950
>>>        rx_queue_7_packets: 17295822822
>>>        rx_queue_7_bytes: 12573299488239
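A dump like this makes idle queues easy to spot by scanning the `tx_queue_N_packets` counters. The C sketch below parses text in the `ethtool -S` layout shown here, reading it from a string. The function name `find_idle_tx_queues()` is hypothetical, chosen for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan "ethtool -S" style text and record the index of every
 * tx_queue_N_packets counter whose value is 0. Returns the number of
 * idle queues stored in idle[] (at most max).
 */
static int find_idle_tx_queues(const char *text, int *idle, int max)
{
	int n = 0;
	const char *line = text;

	while (line && *line && n < max) {
		int q;
		unsigned long long pkts;
		const char *p = strstr(line, "tx_queue_");
		const char *eol = strchr(line, '\n');

		/* Only accept a match that lies on the current line. */
		if (p && (!eol || p < eol) &&
		    sscanf(p, "tx_queue_%d_packets: %llu", &q, &pkts) == 2 &&
		    pkts == 0)
			idle[n++] = q;
		line = eol ? eol + 1 : NULL;
	}
	return n;
}
```

Fed the statistics above, this would flag only tx_queue_7. The `_bytes` lines fail the `_packets` match, and `rx_queue_*` lines do not contain `tx_queue_`.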
>>>
>>> And here, look at multiq queue number 8:
>>> tc -s -d class show dev eth2
>>> class multiq 1:1 parent 1:
>>>    Sent 6905560675905 bytes 510743840 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:2 parent 1:
>>>    Sent 280699743990 bytes 330210442 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:3 parent 1:
>>>    Sent 128528666971 bytes 142053106 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:4 parent 1:
>>>    Sent 123086710694 bytes 140454119 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:5 parent 1:
>>>    Sent 121027779083 bytes 146164066 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:6 parent 1:
>>>    Sent 116245520195 bytes 141597610 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:7 parent 1:
>>>    Sent 133310553887 bytes 151141714 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
>>> class multiq 1:8 parent 1:
>>>    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>>>    backlog 0b 0p requeues 0
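Comparing the tc and ethtool views requires converting class handles into queue indexes. The sketch below assumes two things. First, multiq class 1:N and mq class :N correspond to transmit queue N-1. Second, tc prints the minor number in hexadecimal, as the `:a` and `:1a` entries in the mq listing later in this message suggest. Under those assumptions, a small C helper could do the conversion. `class_to_txq()` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Convert a tc class handle such as "1:8" or ":5a" into a zero-based
 * transmit queue index, assuming class minor N maps to queue N-1 and
 * that the minor is written in hex. Returns -1 on malformed input.
 */
static long class_to_txq(const char *handle)
{
	const char *colon = strchr(handle, ':');
	char *end;
	unsigned long minor;

	if (!colon || colon[1] == '\0')
		return -1;
	minor = strtoul(colon + 1, &end, 16);
	if (*end != '\0' || minor == 0)
		return -1;
	return (long)minor - 1;
}
```

Under that mapping, multiq class 1:8 is tx_queue_7, the counter that stays at zero in the ethtool output.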
>>>
>>> Is it normal that the driver doesn't use queue number 8?
>> This seems extremely unusual, can you tell us what kernel version you're
>> using and what kind of test you're running?
>>
> Kernel 2.6.33.1
> Traffic type - normal Internet traffic from many users.
> 2Gbit/s RX + 2.6Gbit/s TX
>
> tc -s -d qdisc show dev eth2
> qdisc mq 0: root
>  Sent 71590101434962 bytes 2410582579 pkt (dropped 0, overlimits 0 requeues 199799)
>  backlog 0b 0p requeues 199799
>
> Configuration for this nic:
> 4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 10000
> 8: vlan0100@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 9: vlan0101@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 10: vlan0102@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 11: vlan0103@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 12: vlan0104@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 13: vlan0105@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 14: vlan0106@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 15: vlan0107@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 16: vlan0108@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 17: vlan0109@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 18: vlan0110@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 19: vlan0111@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 20: vlan0112@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 21: vlan0113@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 22: vlan0140@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 23: vlan0141@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 24: vlan0143@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 25: vlan0300@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 26: vlan0114@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 27: vlan0450@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 28: vlan0401@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 29: vlan0402@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 30: vlan0301@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 31: vlan0302@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 32: vlan0303@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 33: vlan0304@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 34: vlan0305@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 35: vlan0306@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 36: vlan0307@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 37: vlan0308@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 38: vlan0309@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 39: vlan0310@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 40: vlan0311@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 41: vlan0312@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 42: vlan0313@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 43: vlan0403@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 44: vlan0314@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 45: vlan0315@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 46: vlan0316@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 47: vlan0317@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 48: vlan0318@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 49: vlan0404@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 50: vlan0405@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 51: vlan0115@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 52: vlan0406@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 53: vlan0116@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 54: vlan0490@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 55: vlan0491@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 56: vlan0319@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 57: vlan0320@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 58: vlan0321@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 59: vlan0322@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 60: vlan0323@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 61: vlan0324@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 62: vlan0325@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 63: vlan0326@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 64: vlan0327@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 65: vlan0328@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 66: vlan0329@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 67: vlan0330@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 68: vlan0331@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 69: vlan0332@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 70: vlan0333@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 71: vlan0334@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 72: vlan0335@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 73: vlan0336@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 74: vlan0337@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 75: vlan0338@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 76: vlan0339@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 77: vlan0340@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 78: vlan0341@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 79: vlan0342@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 80: vlan0343@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 81: vlan0344@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 82: vlan0345@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 83: vlan0117@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 84: vlan0118@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 85: vlan0119@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 100
> 86: vlan0120@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 87: vlan0121@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 88: vlan0122@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 89: vlan0407@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 90: vlan0408@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 91: vlan0409@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 92: vlan0410@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 93: vlan0411@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 94: vlan0430@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 95: vlan0431@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 96: vlan0432@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 97: vlan0433@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 98: vlan0434@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 99: vlan0435@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 100: vlan0436@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 101: vlan0437@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 102: vlan0438@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 103: vlan0439@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 104: vlan0440@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 105: vlan0451@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 106: vlan0452@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 107: vlan0453@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 108: vlan0454@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 109: vlan0455@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 110: vlan0456@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 111: vlan0457@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 112: vlan0458@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 113: vlan0459@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 114: vlan0461@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 115: vlan0202@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 116: vlan0460@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 117: vlan0462@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 118: vlan0463@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 119: vlan0464@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 120: vlan0203@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 121: vlan0503@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 122: vlan0504@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 123: vlan0130@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 124: vlan0131@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 125: vlan0132@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 126: vlan0133@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 127: vlan0134@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 128: vlan0135@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 129: vlan0136@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 130: vlan0137@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 131: vlan0138@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 132: vlan0123@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 133: vlan0124@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 134: vlan0125@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 135: vlan0126@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 136: vlan0127@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 137: vlan0128@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 138: vlan0129@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 139: vlan0139@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 140: vlan0465@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 141: vlan0466@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 142: vlan0467@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 143: vlan0468@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 144: vlan0469@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 145: vlan0470@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 146: vlan0471@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 147: vlan0472@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 148: vlan0473@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 149: vlan0215@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 150: vlan0144@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 151: vlan0145@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 152: vlan0146@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 153: vlan0147@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 154: vlan0148@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 155: vlan0150@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 156: vlan0151@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 157: vlan0152@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 158: vlan0153@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc hfsc state UP qlen 100
> 159: vlan0412@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 160: vlan0413@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 161: vlan0414@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 162: vlan0415@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 163: vlan0416@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 164: vlan0154@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 165: vlan0155@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 166: vlan0156@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 167: vlan0157@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 168: vlan0158@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 169: vlan0159@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 170: vlan0160@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 171: vlan0161@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 172: vlan0162@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
> 173: vlan0163@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 100
>
> More info about the NIC:
> ethtool -i eth2
> driver: ixgbe
> version: 2.0.44-k2
> firmware-version: 1.12-2
> bus-info: 0000:01:00.0
>
> ethtool -k eth2
> Offload parameters for eth2:
> rx-checksumming: on
> tx-checksumming: on
> scatter-gather: on
> tcp-segmentation-offload: on
> udp-fragmentation-offload: off
> generic-segmentation-offload: off
> generic-receive-offload: off
> large-receive-offload: off
>
>
>
> Another weird thing happens after deleting the qdisc;
> I think this is also not normal.
>
> tc qdisc del dev eth2 root
> tc -s -d class show dev eth2
> class mq :1 root
>  Sent 2608239 bytes 3000 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2 root
>  Sent 3831841 bytes 3301 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3 root
>  Sent 3518993 bytes 4016 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4 root
>  Sent 1750040 bytes 2657 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5 root
>  Sent 740596 bytes 1221 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6 root
>  Sent 143782921 bytes 210547 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7 root
>  Sent 3935866 bytes 5059 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :8 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :9 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :10 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :11 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :12 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :13 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :14 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :15 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :16 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :17 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :18 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :19 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :1a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :1b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :1c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :1d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :1e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :1f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :20 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :21 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :22 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :23 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :24 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :25 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :26 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :27 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :28 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :29 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :2f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :30 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :31 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :32 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :33 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :34 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :35 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :36 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :37 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :38 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :39 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :3f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :40 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :41 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :42 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :43 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :44 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :45 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :46 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :47 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :48 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :49 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :4f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :50 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :51 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :52 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :53 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :54 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :55 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :56 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :57 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :58 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :59 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :5f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :60 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :61 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :62 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :63 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :64 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :65 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :66 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :67 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :68 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :69 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :6f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :70 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :71 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :72 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :73 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :74 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :75 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :76 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :77 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :78 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :79 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7a root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7b root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7c root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7d root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7e root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :7f root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class mq :80 root
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>
>
> Normally I have a multiq qdisc attached to the device, and there is no
> difference whether it is bfifo, pfifo or multiq:
> tc qdisc add dev eth2 root handle 1: multiq
> then
>  tc -s -d class show dev eth2
> class multiq 1:1 parent 1:
>  Sent 2458266 bytes 3288 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:2 parent 1:
>  Sent 6259789 bytes 5390 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:3 parent 1:
>  Sent 4451430 bytes 5457 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:4 parent 1:
>  Sent 2915648 bytes 3917 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:5 parent 1:
>  Sent 1156897 bytes 1761 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:6 parent 1:
>  Sent 181776227 bytes 255856 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:7 parent 1:
>  Sent 5510686 bytes 6832 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
> class multiq 1:8 parent 1:
>  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>
>
>
>> it almost seems that there is an off by one somewhere, what kind of
>> traffic is being transmitted?
>>
>> Jesse
>>
>>
Additional info on this:

After running for some time, the only counters that still increase are:
      tx_queue_0_packets: 3611561261
      tx_queue_0_bytes: 2855348411016

On the remaining TX queues, the counters have stopped.

  ethtool -S eth2
NIC statistics:
      rx_packets: 4107258563
      tx_packets: 3812022192
      rx_bytes: 1138855016
      tx_bytes: 2436290757
      rx_pkts_nic: 4125112688
      tx_pkts_nic: 3812022192
      rx_bytes_nic: 2931206969763
      tx_bytes_nic: 3058175756001
      lsc_int: 2
      tx_busy: 0
      non_eop_descs: 0
      rx_errors: 0
      tx_errors: 0
      rx_dropped: 0
      tx_dropped: 0
      multicast: 14179
      broadcast: 5326455
      rx_no_buffer_count: 0
      collisions: 0
      rx_over_errors: 0
      rx_crc_errors: 0
      rx_frame_errors: 0
      hw_rsc_aggregated: 0
      hw_rsc_flushed: 0
      fdir_match: 0
      fdir_miss: 0
      rx_fifo_errors: 0
      rx_missed_errors: 0
      tx_aborted_errors: 0
      tx_carrier_errors: 0
      tx_fifo_errors: 0
      tx_heartbeat_errors: 0
      tx_timeout_count: 0
      tx_restart_queue: 790
      rx_long_length_errors: 38
      rx_short_length_errors: 0
      tx_flow_control_xon: 0
      rx_flow_control_xon: 0
      tx_flow_control_xoff: 0
      rx_flow_control_xoff: 0
      rx_csum_offload_errors: 15663
      alloc_rx_page_failed: 0
      alloc_rx_buff_failed: 0
      rx_no_dma_resources: 0
      tx_queue_0_packets: 3611561261
      tx_queue_0_bytes: 2855348411016
      tx_queue_1_packets: 55713901
      tx_queue_1_bytes: 51724064701
      tx_queue_2_packets: 32077638
      tx_queue_2_bytes: 26637159254
      tx_queue_3_packets: 30157483
      tx_queue_3_bytes: 22398763585
      tx_queue_4_packets: 27389438
      tx_queue_4_bytes: 21974458604
      tx_queue_5_packets: 26757646
      tx_queue_5_bytes: 23280148038
      tx_queue_6_packets: 28364824
      tx_queue_6_bytes: 24730261889
      tx_queue_7_packets: 0
      tx_queue_7_bytes: 0
      rx_queue_0_packets: 495752660
      rx_queue_0_bytes: 326420962428
      rx_queue_1_packets: 519436668
      rx_queue_1_bytes: 369318739796
      rx_queue_2_packets: 519817774
      rx_queue_2_bytes: 367813366140
      rx_queue_3_packets: 526592761
      rx_queue_3_bytes: 378001222304
      rx_queue_4_packets: 502134859
      rx_queue_4_bytes: 354272049076
      rx_queue_5_packets: 523547673
      rx_queue_5_bytes: 383190146632
      rx_queue_6_packets: 527252376
      rx_queue_6_bytes: 372095739303
      rx_queue_7_packets: 510577598
      rx_queue_7_bytes: 347093806181

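One way to quantify the skew in the dump above is to compute each TX queue's share of the summed per-queue counters. This is a small C sketch, not part of any tool: the array holds the tx_queue_N_packets values copied from the `ethtool -S eth2` output, and the function names are hypothetical. The counters are cumulative since the driver was loaded, so a lifetime share cannot show *when* queues 1-7 stopped advancing; that would need two snapshots taken some time apart and a per-queue difference.

```c
#include <stddef.h>
#include <stdint.h>

/* tx_queue_0_packets .. tx_queue_7_packets from the ethtool -S eth2 dump. */
static const uint64_t tx_queue_packets[] = {
	3611561261ULL, 55713901ULL, 32077638ULL, 30157483ULL,
	27389438ULL,   26757646ULL, 28364824ULL, 0ULL,
};

#define NUM_TX_QUEUES (sizeof(tx_queue_packets) / sizeof(tx_queue_packets[0]))

/* Sum of all per-queue TX packet counters. */
static uint64_t tx_queue_total(void)
{
	uint64_t total = 0;

	for (size_t i = 0; i < NUM_TX_QUEUES; i++)
		total += tx_queue_packets[i];
	return total;
}

/* Fraction of the summed per-queue packets that went through queue q
 * (0.0 for an out-of-range queue or an all-zero dump). */
static double tx_queue_share(size_t q)
{
	uint64_t total = tx_queue_total();

	if (q >= NUM_TX_QUEUES || total == 0)
		return 0.0;
	return (double)tx_queue_packets[q] / (double)total;
}
```

With these numbers, queue 0 carries roughly 94.7% of all packets counted per queue, and queue 7 carries none.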


^ permalink raw reply

* [PATCH net-next] ixgbevf: Enable GRO by default
From: Shirley Ma @ 2010-05-13 19:51 UTC (permalink / raw)
  To: davem; +Cc: kvm, netdev, e1000-devel

Enable GRO by default for performance. 

Signed-off-by: Shirley Ma <xma@us.ibm.com>
---
 
 drivers/net/ixgbevf/ixgbevf_main.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ixgbevf/ixgbevf_main.c b/drivers/net/ixgbevf/ixgbevf_main.c
index 40f47b8..1bbb05e 100644
--- a/drivers/net/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ixgbevf/ixgbevf_main.c
@@ -3415,6 +3415,7 @@ static int __devinit ixgbevf_probe(struct pci_dev *pdev,
 	netdev->features |= NETIF_F_IPV6_CSUM;
 	netdev->features |= NETIF_F_TSO;
 	netdev->features |= NETIF_F_TSO6;
+	netdev->features |= NETIF_F_GRO;
 	netdev->vlan_features |= NETIF_F_TSO;
 	netdev->vlan_features |= NETIF_F_TSO6;
 	netdev->vlan_features |= NETIF_F_IP_CSUM;



^ permalink raw reply related

* [PATCH 6/20] drivers/net: Use kzalloc
From: Julia Lawall @ 2010-05-13 20:00 UTC (permalink / raw)
  To: Lennert Buytenhek, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/mv643xx_eth.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff -u -p a/drivers/net/mv643xx_eth.c b/drivers/net/mv643xx_eth.c
--- a/drivers/net/mv643xx_eth.c
+++ b/drivers/net/mv643xx_eth.c
@@ -2608,10 +2608,9 @@ static int mv643xx_eth_shared_probe(stru
 		goto out;
 
 	ret = -ENOMEM;
-	msp = kmalloc(sizeof(*msp), GFP_KERNEL);
+	msp = kzalloc(sizeof(*msp), GFP_KERNEL);
 	if (msp == NULL)
 		goto out;
-	memset(msp, 0, sizeof(*msp));
 
 	msp->base = ioremap(res->start, res->end - res->start + 1);
 	if (msp->base == NULL)

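The semantic patch above relies on kzalloc(size, flags) behaving like kmalloc() with the memory already zeroed (it passes __GFP_ZERO), which makes a memset() of the same size directly after the NULL check redundant. Kernel allocators cannot run outside the kernel, so the sketch below shows the same before/after shape in userspace C, with calloc(1, size) standing in for kzalloc(). The function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Before: allocate, check for failure, then clear by hand
 * (the kmalloc() + memset() shape the semantic patch removes). */
static void *alloc_then_clear(size_t size)
{
	void *p = malloc(size);

	if (p == NULL)
		return NULL;
	memset(p, 0, size);
	return p;
}

/* After: a single call that returns zeroed memory
 * (the kzalloc() shape; calloc(1, size) is the userspace analogue). */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}
```

The rule only fires when the memset() clears exactly the size that was allocated; a partial clear has to stay as it is.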
^ permalink raw reply

* [PATCH 11/20] drivers/net/wireless/orinoco: Use kzalloc
From: Julia Lawall @ 2010-05-13 20:02 UTC (permalink / raw)
  To: Pavel Roskin, David Gibson, John W. Linville, linux-wireless,
	orinoco-users

From: Julia Lawall <julia@diku.dk>

Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/wireless/orinoco/orinoco_usb.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff -u -p a/drivers/net/wireless/orinoco/orinoco_usb.c b/drivers/net/wireless/orinoco/orinoco_usb.c
--- a/drivers/net/wireless/orinoco/orinoco_usb.c
+++ b/drivers/net/wireless/orinoco/orinoco_usb.c
@@ -356,12 +356,10 @@ static struct request_context *ezusb_all
 {
 	struct request_context *ctx;
 
-	ctx = kmalloc(sizeof(*ctx), GFP_ATOMIC);
+	ctx = kzalloc(sizeof(*ctx), GFP_ATOMIC);
 	if (!ctx)
 		return NULL;
 
-	memset(ctx, 0, sizeof(*ctx));
-
 	ctx->buf = kmalloc(BULK_BUF_SIZE, GFP_ATOMIC);
 	if (!ctx->buf) {
 		kfree(ctx);

^ permalink raw reply

* [PATCH 13/20] net/caif: Use kzalloc
From: Julia Lawall @ 2010-05-13 20:03 UTC (permalink / raw)
  To: David S. Miller, netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use kzalloc rather than the combination of kmalloc and memset.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 net/caif/cfcnfg.c |    3 +--
 net/caif/cfctrl.c |    3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff -u -p a/net/caif/cfcnfg.c b/net/caif/cfcnfg.c
--- a/net/caif/cfcnfg.c
+++ b/net/caif/cfcnfg.c
@@ -65,12 +65,11 @@ struct cfcnfg *cfcnfg_create(void)
 	struct cfcnfg *this;
 	struct cfctrl_rsp *resp;
 	/* Initiate this layer */
-	this = kmalloc(sizeof(struct cfcnfg), GFP_ATOMIC);
+	this = kzalloc(sizeof(struct cfcnfg), GFP_ATOMIC);
 	if (!this) {
 		pr_warning("CAIF: %s(): Out of memory\n", __func__);
 		return NULL;
 	}
-	memset(this, 0, sizeof(struct cfcnfg));
 	this->mux = cfmuxl_create();
 	if (!this->mux)
 		goto out_of_mem;
diff -u -p a/net/caif/cfctrl.c b/net/caif/cfctrl.c
--- a/net/caif/cfctrl.c
+++ b/net/caif/cfctrl.c
@@ -284,12 +284,11 @@ int cfctrl_linkup_request(struct cflayer
 			   __func__, param->linktype);
 		return -EINVAL;
 	}
-	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	req = kzalloc(sizeof(*req), GFP_KERNEL);
 	if (!req) {
 		pr_warning("CAIF: %s(): Out of memory\n", __func__);
 		return -ENOMEM;
 	}
-	memset(req, 0, sizeof(*req));
 	req->client_layer = user_layer;
 	req->cmd = CFCTRL_CMD_LINK_SETUP;
 	req->param = *param;

^ permalink raw reply

* [PATCH 14/20] drivers/net/vmxnet3: Use kzalloc
From: Julia Lawall @ 2010-05-13 20:05 UTC (permalink / raw)
  To: Shreyas Bhatewara, VMware, Inc., netdev, linux-kernel,
	kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/vmxnet3/vmxnet3_drv.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff -u -p a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1369,13 +1369,12 @@ vmxnet3_rq_create(struct vmxnet3_rx_queu
 
 	sz = sizeof(struct vmxnet3_rx_buf_info) * (rq->rx_ring[0].size +
 						   rq->rx_ring[1].size);
-	bi = kmalloc(sz, GFP_KERNEL);
+	bi = kzalloc(sz, GFP_KERNEL);
 	if (!bi) {
 		printk(KERN_ERR "%s: failed to allocate rx bufinfo\n",
 		       adapter->netdev->name);
 		goto err;
 	}
-	memset(bi, 0, sz);
 	rq->buf_info[0] = bi;
 	rq->buf_info[1] = bi + rq->rx_ring[0].size;
 

^ permalink raw reply

* [PATCH 15/20] drivers/net: Use kcalloc or kzalloc
From: Julia Lawall @ 2010-05-13 20:06 UTC (permalink / raw)
  To: netdev, linux-kernel, kernel-janitors

From: Julia Lawall <julia@diku.dk>

Use kcalloc or kzalloc rather than the combination of kmalloc and memset.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,y,flags;
statement S;
type T;
@@

x = 
-   kmalloc
+   kcalloc
           (
-           y * sizeof(T),
+           y, sizeof(T),
                flags);
 if (x == NULL) S
-memset(x, 0, y * sizeof(T));

@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>

---
 drivers/net/ibmveth.c |    3 +--
 drivers/net/ksz884x.c |    3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff -u -p a/drivers/net/ibmveth.c b/drivers/net/ibmveth.c
--- a/drivers/net/ibmveth.c
+++ b/drivers/net/ibmveth.c
@@ -199,7 +199,7 @@ static int ibmveth_alloc_buffer_pool(str
 		return -1;
 	}
 
-	pool->skbuff = kmalloc(sizeof(void*) * pool->size, GFP_KERNEL);
+	pool->skbuff = kcalloc(pool->size, sizeof(void *), GFP_KERNEL);
 
 	if(!pool->skbuff) {
 		kfree(pool->dma_addr);
@@ -210,7 +210,6 @@ static int ibmveth_alloc_buffer_pool(str
 		return -1;
 	}
 
-	memset(pool->skbuff, 0, sizeof(void*) * pool->size);
 	memset(pool->dma_addr, 0, sizeof(dma_addr_t) * pool->size);
 
 	for(i = 0; i < pool->size; ++i) {
diff -u -p a/drivers/net/ksz884x.c b/drivers/net/ksz884x.c
--- a/drivers/net/ksz884x.c
+++ b/drivers/net/ksz884x.c
@@ -7049,10 +7049,9 @@ static int __init pcidev_init(struct pci
 			mib_port_count = SWITCH_PORT_NUM;
 		}
 		hw->mib_port_cnt = TOTAL_PORT_NUM;
-		hw->ksz_switch = kmalloc(sizeof(struct ksz_switch), GFP_KERNEL);
+		hw->ksz_switch = kzalloc(sizeof(struct ksz_switch), GFP_KERNEL);
 		if (!hw->ksz_switch)
 			goto pcidev_init_alloc_err;
-		memset(hw->ksz_switch, 0, sizeof(struct ksz_switch));
 
 		sw = hw->ksz_switch;
 	}

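The first rule in this semantic patch goes further than kzalloc. kcalloc(n, size, flags) takes the element count and the element size separately, so it can refuse a request whose product n * size would overflow, whereas kmalloc(n * size, ...) would silently allocate a buffer that is too small. A userspace sketch of that check follows; array_alloc_zeroed() is a hypothetical stand-in for kcalloc(), and wrapped_request() only shows the byte count a plain multiplication would hand to the allocator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for kcalloc(n, size, flags): reject requests whose
 * total size would overflow, otherwise return zeroed memory. */
static void *array_alloc_zeroed(size_t n, size_t size)
{
	void *p;

	if (size != 0 && n > SIZE_MAX / size)
		return NULL;		/* n * size would wrap around */
	p = malloc(n * size);
	if (p != NULL)
		memset(p, 0, n * size);
	return p;
}

/* The byte count a plain kmalloc(n * size, ...) call would receive:
 * unsigned arithmetic silently wraps on overflow. */
static size_t wrapped_request(size_t n, size_t size)
{
	return n * size;
}
```

In the ibmveth hunk this is the difference between kmalloc(sizeof(void*) * pool->size, ...) and kcalloc(pool->size, sizeof(void *), ...).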
^ permalink raw reply

* [net-next-2.6 V6 PATCH 0/2] Add virtual port netlink support
From: Scott Feldman @ 2010-05-13 20:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd

The following series adds virtual port netlink support and adds an
implementation to Cisco's enic netdev driver:

	1/2: Adds virtual netlink RTM_SETLINK/RTM_GETLINK support, and
	     adds matching netdev ops net_{set|get}_vf_port.

	2/2: Adds enic support for net_{set|get}_vf_port for enic
	     dynamic devices.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>

^ permalink raw reply

* [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Scott Feldman @ 2010-05-13 20:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd
In-Reply-To: <20100513201714.25579.53530.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>

Add new netdev ops ndo_{set|get}_vf_port to allow setting a
port-profile on a netdev interface.  Extend the netlink RTM_SETLINK/
RTM_GETLINK messages with a new attribute, IFLA_VF_PORT (added to the
end of the IFLA_* list).

A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev.  A port-profile is an identifier known to the switch.  How
port-profiles are installed on the switch, and how available port-profiles
are made known to the host, is outside the scope of this patch.

There are two types of port-profile specs in the netlink msg.  The first spec
is for the 802.1Qbg (pre-)standard VDP protocol.  The second spec is for
devices that run a protocol similar to VDP, but in firmware, thus hiding the
protocol details.  In either case, the specs have much in common, so it makes
sense to define the netlink msg as the union of the two specs.  For example,
both specs have a notion of associating/disassociating a port-profile, and
both specs require some information from the hypervisor manager, such as the
client port instance ID.

The general flow is: the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile.  What
those settings comprise and how they are managed is again outside the scope
of this patch, since this patch only deals with the first step in the flow.

RTM_GETLINK returns the port-profile settings of an interface and the
status of the last port-profile association.

IFLA_VF_PORT is modeled after the existing IFLA_VF_* attributes, where a
VF number is passed in to identify the virtual function (VF) of an
SR-IOV-capable device.  In this case, the target of the IFLA_VF_PORT msg
is the netdev physical function (PF) device, and the PF applies the
port-profile to the VF.  IFLA_VF_PORT can also be used for devices that
don't adhere to SR-IOV and can apply the port-profile directly to the
target netdev.  In this case, the VF number is ignored.

Passing a NULL port-profile deletes the port-profile association.

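The split between the two specs described above also shows up in the response codes this patch adds to if_link.h: VDP responses occupy 0x00-0xFF, and the firmware port-profile responses start at 0x100. The sketch below shows one way a user-space consumer of IFLA_VF_PORT_RESPONSE (a __u16) might sort a value into the two families. The enum copies only a few of the patch's values, and the classification type and helper names are hypothetical, not part of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* A few of the response values from the if_link.h hunk of this patch. */
enum {
	VF_PORT_VDP_RESPONSE_SUCCESS		= 0,
	VF_PORT_VDP_RESPONSE_OUT_OF_SYNC	= 6,
	VF_PORT_PROFILE_RESPONSE_SUCCESS	= 0x100,
	VF_PORT_PROFILE_RESPONSE_INPROGRESS	= 0x101,
};

/* Hypothetical classification of an IFLA_VF_PORT_RESPONSE value. */
enum vf_port_response_family {
	VF_PORT_FAMILY_VDP,	/* 802.1Qbg VDP codes, 0x00-0xFF */
	VF_PORT_FAMILY_PROFILE,	/* firmware port-profile codes, 0x100 and up */
};

static enum vf_port_response_family vf_port_response_classify(uint16_t resp)
{
	return resp < 0x100 ? VF_PORT_FAMILY_VDP : VF_PORT_FAMILY_PROFILE;
}

/* Only the two SUCCESS codes count as a completed, successful request. */
static bool vf_port_response_ok(uint16_t resp)
{
	return resp == VF_PORT_VDP_RESPONSE_SUCCESS ||
	       resp == VF_PORT_PROFILE_RESPONSE_SUCCESS;
}
```

How a caller should treat VF_PORT_PROFILE_RESPONSE_INPROGRESS (retry or report) is left open by this sketch.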
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 include/linux/if_link.h   |   52 +++++++++++++++++++
 include/linux/netdevice.h |   10 ++++
 net/core/rtnetlink.c      |  122 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 183 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_link.h b/include/linux/if_link.h
index cfd420b..d93a4a5 100644
--- a/include/linux/if_link.h
+++ b/include/linux/if_link.h
@@ -116,6 +116,7 @@ enum {
 	IFLA_VF_TX_RATE,	/* TX Bandwidth Allocation */
 	IFLA_VFINFO,
 	IFLA_STATS64,
+	IFLA_VF_PORT,
 	__IFLA_MAX
 };
 
@@ -259,4 +260,55 @@ struct ifla_vf_info {
 	__u32 qos;
 	__u32 tx_rate;
 };
+
+/* VF Port management section */
+
+enum {
+	IFLA_VF_PORT_UNSPEC,
+	IFLA_VF_PORT_VF,		/* __u32 */
+	IFLA_VF_PORT_PROFILE,		/* string */
+	IFLA_VF_PORT_VSI_TYPE,		/* 802.1Qbg (pre-)standard VDP */
+	IFLA_VF_PORT_INSTANCE_UUID,	/* binary UUID */
+	IFLA_VF_PORT_HOST_UUID,		/* binary UUID */
+	IFLA_VF_PORT_REQUEST,		/* __u8 */
+	IFLA_VF_PORT_RESPONSE,		/* __u16, output only */
+	__IFLA_VF_PORT_MAX,
+};
+
+#define IFLA_VF_PORT_MAX (__IFLA_VF_PORT_MAX - 1)
+
+#define VF_PORT_PROFILE_MAX	40
+#define VF_PORT_UUID_MAX	16
+
+enum {
+	VF_PORT_REQUEST_PREASSOCIATE = 0,
+	VF_PORT_REQUEST_PREASSOCIATE_RR,
+	VF_PORT_REQUEST_ASSOCIATE,
+	VF_PORT_REQUEST_DISASSOCIATE,
+};
+
+enum {
+	VF_PORT_VDP_RESPONSE_SUCCESS = 0,
+	VF_PORT_VDP_RESPONSE_INVALID_FORMAT,
+	VF_PORT_VDP_RESPONSE_INSUFFICIENT_RESOURCES,
+	VF_PORT_VDP_RESPONSE_UNUSED_VTID,
+	VF_PORT_VDP_RESPONSE_VTID_VIOLATION,
+	VF_PORT_VDP_RESPONSE_VTID_VERSION_VIOALTION,
+	VF_PORT_VDP_RESPONSE_OUT_OF_SYNC,
+	/* 0x08-0xFF reserved for future VDP use */
+	VF_PORT_PROFILE_RESPONSE_SUCCESS = 0x100,
+	VF_PORT_PROFILE_RESPONSE_INPROGRESS,
+	VF_PORT_PROFILE_RESPONSE_INVALID,
+	VF_PORT_PROFILE_RESPONSE_BADSTATE,
+	VF_PORT_PROFILE_RESPONSE_INSUFFICIENT_RESOURCES,
+	VF_PORT_PROFILE_RESPONSE_ERROR,
+};
+
+struct ifla_vf_port_vsi {
+	__u8 vsi_mgr_id;
+	__u8 vsi_type_id[3];
+	__u8 vsi_type_version;
+	__u8 pad[3];
+};
+
 #endif /* _LINUX_IF_LINK_H */
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 69022d4..c2ba8d4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -686,6 +686,10 @@ struct netdev_rx_queue {
  * int (*ndo_set_vf_tx_rate)(struct net_device *dev, int vf, int rate);
  * int (*ndo_get_vf_config)(struct net_device *dev,
  *			    int vf, struct ifla_vf_info *ivf);
+ * int (*ndo_set_vf_port)(struct net_device *dev, int vf,
+ *			  struct nlattr *vf_port[]);
+ * int (*ndo_get_vf_port)(struct net_device *dev, int vf,
+ *			  struct sk_buff *skb);
  */
 #define HAVE_NET_DEVICE_OPS
 struct net_device_ops {
@@ -735,6 +739,12 @@ struct net_device_ops {
 	int			(*ndo_get_vf_config)(struct net_device *dev,
 						     int vf,
 						     struct ifla_vf_info *ivf);
+	int			(*ndo_set_vf_port)(struct net_device *dev,
+						   int vf,
+						   struct nlattr *vf_port[]);
+	int			(*ndo_get_vf_port)(struct net_device *dev,
+						   int vf,
+						   struct sk_buff *skb);
 #if defined(CONFIG_FCOE) || defined(CONFIG_FCOE_MODULE)
 	int			(*ndo_fcoe_enable)(struct net_device *dev);
 	int			(*ndo_fcoe_disable)(struct net_device *dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 23a71cb..de14d36 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -653,6 +653,26 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev)
 		return 0;
 }
 
+static size_t rtnl_vf_port_size(const struct net_device *dev)
+{
+	size_t vf_port_size = nla_total_size(sizeof(struct nlattr))
+						     /* VF_PORT_VF */
+		+ nla_total_size(VF_PORT_PROFILE_MAX)/* VF_PORT_PROFILE */
+		+ nla_total_size(sizeof(struct ifla_vf_port_vsi))
+						     /* VF_PORT_VSI_TYPE */
+		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_VSI_INSTANCE */
+		+ nla_total_size(VF_PORT_UUID_MAX)   /* VF_PORT_HOST_UUID */
+		+ nla_total_size(1)		     /* VF_PROT_VDP_REQUEST */
+		+ nla_total_size(2);		     /* VF_PORT_VDP_RESPONSE */
+
+	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
+		return 0;
+	if (dev_num_vf(dev->dev.parent))
+		return vf_port_size * dev_num_vf(dev->dev.parent);
+	else
+		return vf_port_size;
+}
+
 static inline size_t if_nlmsg_size(const struct net_device *dev)
 {
 	return NLMSG_ALIGN(sizeof(struct ifinfomsg))
@@ -673,9 +693,67 @@ static inline size_t if_nlmsg_size(const struct net_device *dev)
 	       + nla_total_size(1) /* IFLA_LINKMODE */
 	       + nla_total_size(4) /* IFLA_NUM_VF */
 	       + nla_total_size(rtnl_vfinfo_size(dev)) /* IFLA_VFINFO */
+	       + rtnl_vf_port_size(dev) /* IFLA_VF_PORT */
 	       + rtnl_link_get_size(dev); /* IFLA_LINKINFO */
 }
 
+static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
+	int vf)
+{
+	struct nlattr *data;
+	int err;
+
+	data = nla_nest_start(skb, IFLA_VF_PORT);
+	if (!data)
+		return -EMSGSIZE;
+
+	if (vf >= 0)
+		nla_put_u32(skb, IFLA_VF_PORT_VF, vf);
+
+	err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
+	if (err == -EMSGSIZE) {
+		nla_nest_cancel(skb, data);
+		return -EMSGSIZE;
+	} else if (err) {
+		nla_nest_cancel(skb, data);
+		return 0;
+	}
+
+	nla_nest_end(skb, data);
+
+	return 0;
+}
+
+static int rtnl_vf_port_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	int num_vf;
+	int err;
+
+	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
+		return 0;
+
+	num_vf = dev_num_vf(dev->dev.parent);
+
+	if (num_vf) {
+		int i;
+
+		for (i = 0; i < num_vf; i++) {
+			err = rtnl_vf_port_fill_nest(skb, dev, i);
+			if (err)
+				goto nla_put_failure;
+		}
+	} else  {
+		err = rtnl_vf_port_fill_nest(skb, dev, -1);
+		if (err)
+			goto nla_put_failure;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 			    int type, u32 pid, u32 seq, u32 change,
 			    unsigned int flags)
@@ -747,17 +825,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 		goto nla_put_failure;
 	copy_rtnl_link_stats64(nla_data(attr), stats);
 
+	if (dev->dev.parent)
+		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
+
 	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
 		int i;
 		struct ifla_vf_info ivi;
 
-		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
 		for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
 			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
 				break;
 			NLA_PUT(skb, IFLA_VFINFO, sizeof(ivi), &ivi);
 		}
 	}
+
+	if (rtnl_vf_port_fill(skb, dev))
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -824,6 +908,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 				    .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
 				    .len = sizeof(struct ifla_vf_tx_rate) },
+	[IFLA_VF_PORT]		= { .type = NLA_NESTED },
 };
 EXPORT_SYMBOL(ifla_policy);
 
@@ -832,6 +917,20 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
 	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
 };
 
+static const struct nla_policy ifla_vf_port_policy[IFLA_VF_PORT_MAX+1] = {
+	[IFLA_VF_PORT_VF]		= { .type = NLA_U32 },
+	[IFLA_VF_PORT_PROFILE]		= { .type = NLA_STRING,
+				.len = VF_PORT_PROFILE_MAX },
+	[IFLA_VF_PORT_VSI_TYPE]		= { .type = NLA_BINARY,
+				.len = sizeof(struct ifla_vf_port_vsi)},
+	[IFLA_VF_PORT_INSTANCE_UUID]	= { .type = NLA_BINARY,
+				.len = VF_PORT_UUID_MAX },
+	[IFLA_VF_PORT_HOST_UUID]	= { .type = NLA_STRING,
+				.len = VF_PORT_UUID_MAX },
+	[IFLA_VF_PORT_REQUEST]		= { .type = NLA_U8, },
+	[IFLA_VF_PORT_RESPONSE]		= { .type = NLA_U16, },
+};
+
 struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
 {
 	struct net *net;
@@ -1028,6 +1127,27 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
 	}
 	err = 0;
 
+	if (tb[IFLA_VF_PORT]) {
+		struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
+		int vf = -1;
+
+		err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
+			tb[IFLA_VF_PORT], ifla_vf_port_policy);
+		if (err < 0)
+			goto errout;
+
+		if (vf_port[IFLA_VF_PORT_VF])
+			vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
+
+		err = -EOPNOTSUPP;
+		if (ops->ndo_set_vf_port)
+			err = ops->ndo_set_vf_port(dev, vf, vf_port);
+		if (err < 0)
+			goto errout;
+		modified = 1;
+	}
+	err = 0;
+
 errout:
 	if (err < 0 && modified && net_ratelimit())
 		printk(KERN_WARNING "A link change request failed with "

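rtnl_vf_port_size() above reserves message space with nla_total_size(), which rounds each attribute (a 4-byte header plus its payload) up to a 4-byte boundary. The self-contained sketch below redoes that arithmetic for one IFLA_VF_PORT entry, using sizeof(struct ifla_vf_port_vsi) == 8 from the patch's struct definition. The SK_* macros are local copies of the NLA_ALIGN/NLA_HDRLEN rules from <linux/netlink.h>, the function names are hypothetical, and the sketch reproduces the patch's terms one by one rather than auditing them.

```c
#include <stddef.h>

/* Local copies of the NLA_ALIGN/NLA_HDRLEN rules from <linux/netlink.h>:
 * every attribute is a 4-byte header (nla_len, nla_type) plus payload,
 * padded to a multiple of 4 bytes. */
#define SK_NLA_ALIGNTO		4
#define SK_NLA_ALIGN(len)	(((len) + SK_NLA_ALIGNTO - 1) & ~(SK_NLA_ALIGNTO - 1))
#define SK_NLA_HDRLEN		((size_t)SK_NLA_ALIGN(4))

/* Equivalent of the kernel's nla_total_size(payload). */
static size_t sk_nla_total_size(size_t payload)
{
	return SK_NLA_ALIGN(SK_NLA_HDRLEN + payload);
}

/* The per-entry terms of rtnl_vf_port_size(), in the patch's order. */
static size_t vf_port_entry_size(void)
{
	return sk_nla_total_size(4)	/* sizeof(struct nlattr) term */
	     + sk_nla_total_size(40)	/* VF_PORT_PROFILE_MAX */
	     + sk_nla_total_size(8)	/* sizeof(struct ifla_vf_port_vsi) */
	     + sk_nla_total_size(16)	/* VF_PORT_UUID_MAX: instance UUID */
	     + sk_nla_total_size(16)	/* VF_PORT_UUID_MAX: host UUID */
	     + sk_nla_total_size(1)	/* __u8 request */
	     + sk_nla_total_size(2);	/* __u16 response */
}

/* One entry per VF, or a single entry when the device reports no VFs
 * (assumes ndo_get_vf_port is set and the device has a parent). */
static size_t vf_port_reserved_size(unsigned int num_vf)
{
	return vf_port_entry_size() * (num_vf ? num_vf : 1);
}
```

Each entry comes to 120 bytes, so a PF reporting four VFs would reserve 480 bytes for IFLA_VF_PORT, and a device without VFs reserves a single 120-byte entry.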

^ permalink raw reply related

* [net-next-2.6 V6 PATCH 2/2] Add ndo_{set|get}_vf_port support for enic dynamic vnics
From: Scott Feldman @ 2010-05-13 20:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, chrisw, arnd
In-Reply-To: <20100513201714.25579.53530.stgit@savbu-pc100.cisco.com>

From: Scott Feldman <scofeldm@cisco.com>

Add enic ndo_{set|get}_vf_port ops to support setting/getting
port-profile for enic dynamic devices.  Enic dynamic devices are just like
normal enic eth devices except dynamic enics require an extra configuration
step to assign a port-profile identifier to the interface before the
interface is usable.  Once a port-profile is assigned, link comes up on the
interface and it is ready for I/O.  The port-profile is used to configure the
network port assigned to the interface.  The network port configuration
includes VLAN membership, QoS policies, and port security settings typical
of a data center network.

A dynamic enic initially has an all-zero MAC address.  Before a port-profile
is assigned, a valid non-zero unicast MAC address should be assigned to the
dynamic enic interface.

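The requirement above, a valid non-zero unicast MAC address, is the same condition the kernel's is_valid_ether_addr() helper in <linux/etherdevice.h> tests: the address must not be all zeros and must not have the group (multicast) bit set. The enic code that enforces the requirement is not part of this excerpt, so the userspace sketch below only illustrates the check itself; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_ETH_ALEN 6	/* bytes in an Ethernet address (ETH_ALEN) */

static bool mac_is_zero(const uint8_t addr[SK_ETH_ALEN])
{
	for (int i = 0; i < SK_ETH_ALEN; i++)
		if (addr[i] != 0)
			return false;
	return true;
}

/* The group bit (multicast/broadcast) is the least significant bit
 * of the first octet. */
static bool mac_is_multicast(const uint8_t addr[SK_ETH_ALEN])
{
	return (addr[0] & 0x01) != 0;
}

/* Same condition as the kernel's is_valid_ether_addr(): unicast and
 * not all zeros. */
static bool mac_is_valid_unicast(const uint8_t addr[SK_ETH_ALEN])
{
	return !mac_is_zero(addr) && !mac_is_multicast(addr);
}
```

A freshly created dynamic enic, with its all-zero address, fails this check until a unicast address such as 02:00:00:00:00:01 (an arbitrary locally administered example) is assigned.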
Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>
---
 drivers/net/enic/Makefile    |    2 
 drivers/net/enic/enic.h      |    9 +
 drivers/net/enic/enic_main.c |  321 +++++++++++++++++++++++++++++++++++++++---
 drivers/net/enic/enic_res.c  |    5 -
 drivers/net/enic/enic_res.h  |    1 
 drivers/net/enic/vnic_dev.c  |   58 +++++++-
 drivers/net/enic/vnic_dev.h  |    7 +
 drivers/net/enic/vnic_vic.c  |   73 ++++++++++
 drivers/net/enic/vnic_vic.h  |   59 ++++++++
 9 files changed, 500 insertions(+), 35 deletions(-)

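The address rule in the changelog above (a dynamic enic may carry the all-zero MAC, a regular enic may not) is what the new enic_set_mac_addr() check enforces. A minimal userspace sketch, assuming the kernel's definition of a valid Ethernet address (not multicast, not all zero). The three ether helpers are local re-implementations of the <linux/etherdevice.h> functions of the same names, and mac_allowed() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6

/* Local stand-ins for the <linux/etherdevice.h> helpers. */
static bool is_zero_ether_addr(const unsigned char *addr)
{
	static const unsigned char zero[ETH_ALEN];

	return memcmp(addr, zero, ETH_ALEN) == 0;
}

static bool is_multicast_ether_addr(const unsigned char *addr)
{
	return addr[0] & 0x01;	/* I/G bit; broadcast also sets it */
}

static bool is_valid_ether_addr(const unsigned char *addr)
{
	return !is_multicast_ether_addr(addr) && !is_zero_ether_addr(addr);
}

/* Same decision as enic_set_mac_addr() in the patch: a dynamic vNIC may
 * also be set back to the all-zero address, a static vNIC may not. */
static bool mac_allowed(bool dynamic, const unsigned char *addr)
{
	assert(addr != NULL);
	if (is_valid_ether_addr(addr))
		return true;
	return dynamic && is_zero_ether_addr(addr);
}
```

With this, mac_allowed(true, zero) accepts the initial all-zero address of a dynamic vNIC, while a multicast address is rejected in both modes.
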
diff --git a/drivers/net/enic/Makefile b/drivers/net/enic/Makefile
index 391c3bc..e7b6c31 100644
--- a/drivers/net/enic/Makefile
+++ b/drivers/net/enic/Makefile
@@ -1,5 +1,5 @@
 obj-$(CONFIG_ENIC) := enic.o
 
 enic-y := enic_main.o vnic_cq.o vnic_intr.o vnic_wq.o \
-	enic_res.o vnic_dev.o vnic_rq.o
+	enic_res.o vnic_dev.o vnic_rq.o vnic_vic.o
 
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 5fa56f1..91088be 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -34,7 +34,7 @@
 
 #define DRV_NAME		"enic"
 #define DRV_DESCRIPTION		"Cisco VIC Ethernet NIC Driver"
-#define DRV_VERSION		"1.3.1.1"
+#define DRV_VERSION		"1.3.1.1-pp"
 #define DRV_COPYRIGHT		"Copyright 2008-2009 Cisco Systems, Inc"
 #define PFX			DRV_NAME ": "
 
@@ -74,6 +74,12 @@ struct enic_msix_entry {
 	void *devid;
 };
 
+struct enic_port_profile {
+	char name[VF_PORT_PROFILE_MAX];
+	u8 instance_uuid[VF_PORT_UUID_MAX];
+	u8 host_uuid[VF_PORT_UUID_MAX];
+};
+
 /* Per-instance private data structure */
 struct enic {
 	struct net_device *netdev;
@@ -95,6 +101,7 @@ struct enic {
 	u32 port_mtu;
 	u32 rx_coalesce_usecs;
 	u32 tx_coalesce_usecs;
+	struct enic_port_profile pp;
 
 	/* work queue cache line section */
 	____cacheline_aligned struct vnic_wq wq[ENIC_WQ_MAX];
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index 1232887..fb8bf4a 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -29,6 +29,7 @@
 #include <linux/etherdevice.h>
 #include <linux/if_ether.h>
 #include <linux/if_vlan.h>
+#include <linux/if_link.h>
 #include <linux/ethtool.h>
 #include <linux/in.h>
 #include <linux/ip.h>
@@ -40,6 +41,7 @@
 #include "vnic_dev.h"
 #include "vnic_intr.h"
 #include "vnic_stats.h"
+#include "vnic_vic.h"
 #include "enic_res.h"
 #include "enic.h"
 
@@ -49,10 +51,12 @@
 #define ENIC_DESC_MAX_SPLITS		(MAX_TSO / WQ_ENET_MAX_DESC_LEN + 1)
 
 #define PCI_DEVICE_ID_CISCO_VIC_ENET         0x0043  /* ethernet vnic */
+#define PCI_DEVICE_ID_CISCO_VIC_ENET_DYN     0x0044  /* enet dynamic vnic */
 
 /* Supported devices */
 static DEFINE_PCI_DEVICE_TABLE(enic_id_table) = {
 	{ PCI_VDEVICE(CISCO, PCI_DEVICE_ID_CISCO_VIC_ENET) },
+	{ PCI_VDEVICE(CISCO, PCI_DEVICE_ID_CISCO_VIC_ENET_DYN) },
 	{ 0, }	/* end of table */
 };
 
@@ -113,6 +117,11 @@ static const struct enic_stat enic_rx_stats[] = {
 static const unsigned int enic_n_tx_stats = ARRAY_SIZE(enic_tx_stats);
 static const unsigned int enic_n_rx_stats = ARRAY_SIZE(enic_rx_stats);
 
+static int enic_is_dynamic(struct enic *enic)
+{
+	return enic->pdev->device == PCI_DEVICE_ID_CISCO_VIC_ENET_DYN;
+}
+
 static int enic_get_settings(struct net_device *netdev,
 	struct ethtool_cmd *ecmd)
 {
@@ -810,14 +819,73 @@ static void enic_reset_mcaddrs(struct enic *enic)
 
 static int enic_set_mac_addr(struct net_device *netdev, char *addr)
 {
-	if (!is_valid_ether_addr(addr))
-		return -EADDRNOTAVAIL;
+	struct enic *enic = netdev_priv(netdev);
+
+	if (enic_is_dynamic(enic)) {
+		if (!is_valid_ether_addr(addr) && !is_zero_ether_addr(addr))
+			return -EADDRNOTAVAIL;
+	} else {
+		if (!is_valid_ether_addr(addr))
+			return -EADDRNOTAVAIL;
+	}
 
 	memcpy(netdev->dev_addr, addr, netdev->addr_len);
 
 	return 0;
 }
 
+static int enic_dev_add_station_addr(struct enic *enic)
+{
+	int err = 0;
+
+	if (is_valid_ether_addr(enic->netdev->dev_addr)) {
+		spin_lock(&enic->devcmd_lock);
+		err = vnic_dev_add_addr(enic->vdev, enic->netdev->dev_addr);
+		spin_unlock(&enic->devcmd_lock);
+	}
+
+	return err;
+}
+
+static int enic_dev_del_station_addr(struct enic *enic)
+{
+	int err = 0;
+
+	if (is_valid_ether_addr(enic->netdev->dev_addr)) {
+		spin_lock(&enic->devcmd_lock);
+		err = vnic_dev_del_addr(enic->vdev, enic->netdev->dev_addr);
+		spin_unlock(&enic->devcmd_lock);
+	}
+
+	return err;
+}
+
+static int enic_set_mac_address(struct net_device *netdev, void *p)
+{
+	struct enic *enic = netdev_priv(netdev);
+	struct sockaddr *saddr = p;
+	char *addr = saddr->sa_data;
+	int err = -EOPNOTSUPP;
+
+	if (enic_is_dynamic(enic)) {
+		if (netif_running(enic->netdev)) {
+			err = enic_dev_del_station_addr(enic);
+			if (err)
+				return err;
+		}
+		err = enic_set_mac_addr(netdev, addr);
+		if (err)
+			return err;
+		if (netif_running(enic->netdev)) {
+			err = enic_dev_add_station_addr(enic);
+			if (err)
+				return err;
+		}
+	}
+
+	return err;
+}
+
 /* netif_tx_lock held, BHs disabled */
 static void enic_set_multicast_list(struct net_device *netdev)
 {
@@ -922,6 +990,209 @@ static void enic_tx_timeout(struct net_device *netdev)
 	schedule_work(&enic->reset);
 }
 
+static int enic_vnic_dev_deinit(struct enic *enic)
+{
+	int err;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_deinit(enic->vdev);
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static int enic_dev_init_prov(struct enic *enic, struct vic_provinfo *vp)
+{
+	int err;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_init_prov(enic->vdev,
+		(u8 *)vp, vic_provinfo_size(vp));
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static int enic_dev_init_done(struct enic *enic, int *done, int *error)
+{
+	int err;
+
+	spin_lock(&enic->devcmd_lock);
+	err = vnic_dev_init_done(enic->vdev, done, error);
+	spin_unlock(&enic->devcmd_lock);
+
+	return err;
+}
+
+static int enic_set_port_profile(struct enic *enic, int vf, u8 *mac,
+	char *name, u8 *instance_uuid, u8 *host_uuid)
+{
+	struct vic_provinfo *vp;
+	u8 oui[3] = VIC_PROVINFO_CISCO_OUI;
+	unsigned short *uuid;
+	char uuid_str[38];
+	static char *uuid_fmt = "%04X%04X-%04X-%04X-%04X-%04X%04X%04X";
+	int err;
+
+	if (!name)
+		return -EINVAL;
+
+	if (!is_valid_ether_addr(mac))
+		return -EADDRNOTAVAIL;
+
+	vp = vic_provinfo_alloc(GFP_KERNEL, oui, VIC_PROVINFO_LINUX_TYPE);
+	if (!vp)
+		return -ENOMEM;
+
+	vic_provinfo_add_tlv(vp,
+		VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR,
+		strlen(name) + 1, name);
+
+	vic_provinfo_add_tlv(vp,
+		VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR,
+		ETH_ALEN, mac);
+
+	if (instance_uuid) {
+		uuid = (unsigned short *)instance_uuid;
+		sprintf(uuid_str, uuid_fmt,
+			uuid[0], uuid[1], uuid[2], uuid[3],
+			uuid[4], uuid[5], uuid[6], uuid[7]);
+		vic_provinfo_add_tlv(vp,
+			VIC_LINUX_PROV_TLV_CLIENT_UUID_STR,
+			sizeof(uuid_str), uuid_str);
+	}
+
+	if (host_uuid) {
+		uuid = (unsigned short *)host_uuid;
+		sprintf(uuid_str, uuid_fmt,
+			uuid[0], uuid[1], uuid[2], uuid[3],
+			uuid[4], uuid[5], uuid[6], uuid[7]);
+		vic_provinfo_add_tlv(vp,
+			VIC_LINUX_PROV_TLV_HOST_UUID_STR,
+			sizeof(uuid_str), uuid_str);
+	}
+
+	err = enic_vnic_dev_deinit(enic);
+	if (err)
+		goto err_out;
+
+	memset(&enic->pp, 0, sizeof(enic->pp));
+
+	err = enic_dev_init_prov(enic, vp);
+	if (err)
+		goto err_out;
+
+	memcpy(enic->pp.name, name, VF_PORT_PROFILE_MAX);
+	if (instance_uuid)
+		memcpy(enic->pp.instance_uuid,
+			instance_uuid, VF_PORT_UUID_MAX);
+	if (host_uuid)
+		memcpy(enic->pp.host_uuid,
+			host_uuid, VF_PORT_UUID_MAX);
+
+err_out:
+	vic_provinfo_free(vp);
+
+	return err;
+}
+
+static int enic_unset_port_profile(struct enic *enic, int vf)
+{
+	memset(&enic->pp, 0, sizeof(enic->pp));
+	return enic_vnic_dev_deinit(enic);
+}
+
+static int enic_set_vf_port(struct net_device *netdev, int vf,
+	struct nlattr *vf_port[])
+{
+	struct enic *enic = netdev_priv(netdev);
+	char *name = NULL;
+	u8 *instance_uuid = NULL;
+	u8 *host_uuid = NULL;
+	u8 request = VF_PORT_REQUEST_DISASSOCIATE;
+
+	if (!enic_is_dynamic(enic))
+		return -EOPNOTSUPP;
+
+	if (vf_port[IFLA_VF_PORT_REQUEST])
+		request = nla_get_u8(vf_port[IFLA_VF_PORT_REQUEST]);
+
+	switch (request) {
+	case VF_PORT_REQUEST_ASSOCIATE:
+
+		if (vf_port[IFLA_VF_PORT_PROFILE])
+			name = nla_data(vf_port[IFLA_VF_PORT_PROFILE]);
+
+		if (vf_port[IFLA_VF_PORT_INSTANCE_UUID])
+			instance_uuid =
+				nla_data(vf_port[IFLA_VF_PORT_INSTANCE_UUID]);
+
+		if (vf_port[IFLA_VF_PORT_HOST_UUID])
+			host_uuid = nla_data(vf_port[IFLA_VF_PORT_HOST_UUID]);
+
+		return enic_set_port_profile(enic, vf,
+			netdev->dev_addr, name,
+			instance_uuid, host_uuid);
+
+	case VF_PORT_REQUEST_DISASSOCIATE:
+
+		return enic_unset_port_profile(enic, vf);
+
+	default:
+		break;
+	}
+
+	return -EOPNOTSUPP;
+}
+
+static int enic_get_vf_port(struct net_device *netdev, int vf,
+	struct sk_buff *skb)
+{
+	struct enic *enic = netdev_priv(netdev);
+	int err, error, done;
+	u16 response = VF_PORT_PROFILE_RESPONSE_SUCCESS;
+
+	if (!enic_is_dynamic(enic))
+		return -EOPNOTSUPP;
+
+	err = enic_dev_init_done(enic, &done, &error);
+
+	if (err)
+		return err;
+
+	switch (error) {
+	case ERR_SUCCESS:
+		if (!done)
+			response = VF_PORT_PROFILE_RESPONSE_INPROGRESS;
+		break;
+	case ERR_EINVAL:
+		response = VF_PORT_PROFILE_RESPONSE_INVALID;
+		break;
+	case ERR_EBADSTATE:
+		response = VF_PORT_PROFILE_RESPONSE_BADSTATE;
+		break;
+	case ERR_ENOMEM:
+		response = VF_PORT_PROFILE_RESPONSE_INSUFFICIENT_RESOURCES;
+		break;
+	default:
+		response = VF_PORT_PROFILE_RESPONSE_ERROR;
+		break;
+	}
+
+	NLA_PUT_U16(skb, IFLA_VF_PORT_RESPONSE, response);
+	NLA_PUT(skb, IFLA_VF_PORT_PROFILE, VF_PORT_PROFILE_MAX,
+		enic->pp.name);
+	NLA_PUT(skb, IFLA_VF_PORT_INSTANCE_UUID, VF_PORT_UUID_MAX,
+		enic->pp.instance_uuid);
+	NLA_PUT(skb, IFLA_VF_PORT_HOST_UUID, VF_PORT_UUID_MAX,
+		enic->pp.host_uuid);
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static void enic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf)
 {
 	struct enic *enic = vnic_dev_priv(rq->vdev);
@@ -1440,9 +1711,7 @@ static int enic_open(struct net_device *netdev)
 	for (i = 0; i < enic->rq_count; i++)
 		vnic_rq_enable(&enic->rq[i]);
 
-	spin_lock(&enic->devcmd_lock);
-	enic_add_station_addr(enic);
-	spin_unlock(&enic->devcmd_lock);
+	enic_dev_add_station_addr(enic);
 	enic_set_multicast_list(netdev);
 
 	netif_wake_queue(netdev);
@@ -1489,6 +1758,8 @@ static int enic_stop(struct net_device *netdev)
 	netif_carrier_off(netdev);
 	netif_tx_disable(netdev);
 
+	enic_dev_del_station_addr(enic);
+
 	for (i = 0; i < enic->wq_count; i++) {
 		err = vnic_wq_disable(&enic->wq[i]);
 		if (err)
@@ -1775,20 +2046,22 @@ static void enic_clear_intr_mode(struct enic *enic)
 }
 
 static const struct net_device_ops enic_netdev_ops = {
-	.ndo_open		= enic_open,
-	.ndo_stop		= enic_stop,
-	.ndo_start_xmit		= enic_hard_start_xmit,
-	.ndo_get_stats		= enic_get_stats,
-	.ndo_validate_addr	= eth_validate_addr,
-	.ndo_set_mac_address 	= eth_mac_addr,
-	.ndo_set_multicast_list	= enic_set_multicast_list,
-	.ndo_change_mtu		= enic_change_mtu,
-	.ndo_vlan_rx_register	= enic_vlan_rx_register,
-	.ndo_vlan_rx_add_vid	= enic_vlan_rx_add_vid,
-	.ndo_vlan_rx_kill_vid	= enic_vlan_rx_kill_vid,
-	.ndo_tx_timeout		= enic_tx_timeout,
+	.ndo_open			= enic_open,
+	.ndo_stop			= enic_stop,
+	.ndo_start_xmit			= enic_hard_start_xmit,
+	.ndo_get_stats			= enic_get_stats,
+	.ndo_validate_addr		= eth_validate_addr,
+	.ndo_set_multicast_list		= enic_set_multicast_list,
+	.ndo_set_mac_address		= enic_set_mac_address,
+	.ndo_change_mtu			= enic_change_mtu,
+	.ndo_vlan_rx_register		= enic_vlan_rx_register,
+	.ndo_vlan_rx_add_vid		= enic_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid		= enic_vlan_rx_kill_vid,
+	.ndo_tx_timeout			= enic_tx_timeout,
+	.ndo_set_vf_port		= enic_set_vf_port,
+	.ndo_get_vf_port		= enic_get_vf_port,
 #ifdef CONFIG_NET_POLL_CONTROLLER
-	.ndo_poll_controller	= enic_poll_controller,
+	.ndo_poll_controller		= enic_poll_controller,
 #endif
 };
 
@@ -2010,11 +2283,13 @@ static int __devinit enic_probe(struct pci_dev *pdev,
 
 	netif_carrier_off(netdev);
 
-	err = vnic_dev_init(enic->vdev, 0);
-	if (err) {
-		printk(KERN_ERR PFX
-			"vNIC dev init failed, aborting.\n");
-		goto err_out_dev_close;
+	if (!enic_is_dynamic(enic)) {
+		err = vnic_dev_init(enic->vdev, 0);
+		if (err) {
+			printk(KERN_ERR PFX
+				"vNIC dev init failed, aborting.\n");
+			goto err_out_dev_close;
+		}
 	}
 
 	err = enic_dev_init(enic);
diff --git a/drivers/net/enic/enic_res.c b/drivers/net/enic/enic_res.c
index 02839bf..9b18840 100644
--- a/drivers/net/enic/enic_res.c
+++ b/drivers/net/enic/enic_res.c
@@ -103,11 +103,6 @@ int enic_get_vnic_config(struct enic *enic)
 	return 0;
 }
 
-void enic_add_station_addr(struct enic *enic)
-{
-	vnic_dev_add_addr(enic->vdev, enic->mac_addr);
-}
-
 void enic_add_multicast_addr(struct enic *enic, u8 *addr)
 {
 	vnic_dev_add_addr(enic->vdev, addr);
diff --git a/drivers/net/enic/enic_res.h b/drivers/net/enic/enic_res.h
index abc1974..494664f 100644
--- a/drivers/net/enic/enic_res.h
+++ b/drivers/net/enic/enic_res.h
@@ -131,7 +131,6 @@ static inline void enic_queue_rq_desc(struct vnic_rq *rq,
 struct enic;
 
 int enic_get_vnic_config(struct enic *);
-void enic_add_station_addr(struct enic *enic);
 void enic_add_multicast_addr(struct enic *enic, u8 *addr);
 void enic_del_multicast_addr(struct enic *enic, u8 *addr);
 void enic_add_vlan(struct enic *enic, u16 vlanid);
diff --git a/drivers/net/enic/vnic_dev.c b/drivers/net/enic/vnic_dev.c
index d43a9d4..2b3e16d 100644
--- a/drivers/net/enic/vnic_dev.c
+++ b/drivers/net/enic/vnic_dev.c
@@ -530,7 +530,7 @@ void vnic_dev_packet_filter(struct vnic_dev *vdev, int directed, int multicast,
 		printk(KERN_ERR "Can't set packet filter\n");
 }
 
-void vnic_dev_add_addr(struct vnic_dev *vdev, u8 *addr)
+int vnic_dev_add_addr(struct vnic_dev *vdev, u8 *addr)
 {
 	u64 a0 = 0, a1 = 0;
 	int wait = 1000;
@@ -543,9 +543,11 @@ void vnic_dev_add_addr(struct vnic_dev *vdev, u8 *addr)
 	err = vnic_dev_cmd(vdev, CMD_ADDR_ADD, &a0, &a1, wait);
 	if (err)
 		printk(KERN_ERR "Can't add addr [%pM], %d\n", addr, err);
+
+	return err;
 }
 
-void vnic_dev_del_addr(struct vnic_dev *vdev, u8 *addr)
+int vnic_dev_del_addr(struct vnic_dev *vdev, u8 *addr)
 {
 	u64 a0 = 0, a1 = 0;
 	int wait = 1000;
@@ -558,6 +560,8 @@ void vnic_dev_del_addr(struct vnic_dev *vdev, u8 *addr)
 	err = vnic_dev_cmd(vdev, CMD_ADDR_DEL, &a0, &a1, wait);
 	if (err)
 		printk(KERN_ERR "Can't del addr [%pM], %d\n", addr, err);
+
+	return err;
 }
 
 int vnic_dev_raise_intr(struct vnic_dev *vdev, u16 intr)
@@ -682,6 +686,56 @@ int vnic_dev_init(struct vnic_dev *vdev, int arg)
 	return r;
 }
 
+int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err)
+{
+	u64 a0 = 0, a1 = 0;
+	int wait = 1000;
+	int ret;
+
+	*done = 0;
+
+	ret = vnic_dev_cmd(vdev, CMD_INIT_STATUS, &a0, &a1, wait);
+	if (ret)
+		return ret;
+
+	*done = (a0 == 0);
+
+	*err = (a0 == 0) ? a1 : 0;
+
+	return 0;
+}
+
+int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len)
+{
+	u64 a0, a1 = len;
+	int wait = 1000;
+	u64 prov_pa;
+	void *prov_buf;
+	int ret;
+
+	prov_buf = pci_alloc_consistent(vdev->pdev, len, &prov_pa);
+	if (!prov_buf)
+		return -ENOMEM;
+
+	memcpy(prov_buf, buf, len);
+
+	a0 = prov_pa;
+
+	ret = vnic_dev_cmd(vdev, CMD_INIT_PROV_INFO, &a0, &a1, wait);
+
+	pci_free_consistent(vdev->pdev, len, prov_buf, prov_pa);
+
+	return ret;
+}
+
+int vnic_dev_deinit(struct vnic_dev *vdev)
+{
+	u64 a0 = 0, a1 = 0;
+	int wait = 1000;
+
+	return vnic_dev_cmd(vdev, CMD_DEINIT, &a0, &a1, wait);
+}
+
 int vnic_dev_link_status(struct vnic_dev *vdev)
 {
 	if (vdev->linkstatus)
diff --git a/drivers/net/enic/vnic_dev.h b/drivers/net/enic/vnic_dev.h
index f5be640..caccce3 100644
--- a/drivers/net/enic/vnic_dev.h
+++ b/drivers/net/enic/vnic_dev.h
@@ -103,8 +103,8 @@ int vnic_dev_stats_dump(struct vnic_dev *vdev, struct vnic_stats **stats);
 int vnic_dev_hang_notify(struct vnic_dev *vdev);
 void vnic_dev_packet_filter(struct vnic_dev *vdev, int directed, int multicast,
 	int broadcast, int promisc, int allmulti);
-void vnic_dev_add_addr(struct vnic_dev *vdev, u8 *addr);
-void vnic_dev_del_addr(struct vnic_dev *vdev, u8 *addr);
+int vnic_dev_add_addr(struct vnic_dev *vdev, u8 *addr);
+int vnic_dev_del_addr(struct vnic_dev *vdev, u8 *addr);
 int vnic_dev_mac_addr(struct vnic_dev *vdev, u8 *mac_addr);
 int vnic_dev_raise_intr(struct vnic_dev *vdev, u16 intr);
 int vnic_dev_notify_setcmd(struct vnic_dev *vdev,
@@ -124,6 +124,9 @@ int vnic_dev_disable(struct vnic_dev *vdev);
 int vnic_dev_open(struct vnic_dev *vdev, int arg);
 int vnic_dev_open_done(struct vnic_dev *vdev, int *done);
 int vnic_dev_init(struct vnic_dev *vdev, int arg);
+int vnic_dev_init_done(struct vnic_dev *vdev, int *done, int *err);
+int vnic_dev_init_prov(struct vnic_dev *vdev, u8 *buf, u32 len);
+int vnic_dev_deinit(struct vnic_dev *vdev);
 int vnic_dev_soft_reset(struct vnic_dev *vdev, int arg);
 int vnic_dev_soft_reset_done(struct vnic_dev *vdev, int *done);
 void vnic_dev_set_intr_mode(struct vnic_dev *vdev,
diff --git a/drivers/net/enic/vnic_vic.c b/drivers/net/enic/vnic_vic.c
new file mode 100644
index 0000000..d769772
--- /dev/null
+++ b/drivers/net/enic/vnic_vic.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+
+#include "vnic_vic.h"
+
+struct vic_provinfo *vic_provinfo_alloc(gfp_t flags, u8 *oui, u8 type)
+{
+	struct vic_provinfo *vp = kzalloc(VIC_PROVINFO_MAX_DATA, flags);
+
+	if (!vp || !oui)
+		return NULL;
+
+	memcpy(vp->oui, oui, sizeof(vp->oui));
+	vp->type = type;
+	vp->length = htonl(sizeof(vp->num_tlvs));
+
+	return vp;
+}
+
+void vic_provinfo_free(struct vic_provinfo *vp)
+{
+	kfree(vp);
+}
+
+int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
+	void *value)
+{
+	struct vic_provinfo_tlv *tlv;
+
+	if (!vp || !value)
+		return -EINVAL;
+
+	if (ntohl(vp->length) + sizeof(*tlv) + length >
+		VIC_PROVINFO_MAX_TLV_DATA)
+		return -ENOMEM;
+
+	tlv = (struct vic_provinfo_tlv *)((u8 *)vp->tlv +
+		ntohl(vp->length) - sizeof(vp->num_tlvs));
+
+	tlv->type = htons(type);
+	tlv->length = htons(length);
+	memcpy(tlv->value, value, length);
+
+	vp->num_tlvs = htonl(ntohl(vp->num_tlvs) + 1);
+	vp->length = htonl(ntohl(vp->length) + sizeof(*tlv) + length);
+
+	return 0;
+}
+
+size_t vic_provinfo_size(struct vic_provinfo *vp)
+{
+	return vp ?  ntohl(vp->length) + sizeof(*vp) - sizeof(vp->num_tlvs) : 0;
+}
diff --git a/drivers/net/enic/vnic_vic.h b/drivers/net/enic/vnic_vic.h
new file mode 100644
index 0000000..085c2a2
--- /dev/null
+++ b/drivers/net/enic/vnic_vic.h
@@ -0,0 +1,59 @@
+/*
+ * Copyright 2010 Cisco Systems, Inc.  All rights reserved.
+ *
+ * This program is free software; you may redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef _VNIC_VIC_H_
+#define _VNIC_VIC_H_
+
+/* Note: All integer fields in NETWORK byte order */
+
+/* Note: String field lengths include null char */
+
+#define VIC_PROVINFO_CISCO_OUI		{ 0x00, 0x00, 0x0c }
+#define VIC_PROVINFO_LINUX_TYPE		0x2
+
+enum vic_linux_prov_tlv_type {
+	VIC_LINUX_PROV_TLV_PORT_PROFILE_NAME_STR = 0,
+	VIC_LINUX_PROV_TLV_CLIENT_MAC_ADDR = 1,			/* u8[6] */
+	VIC_LINUX_PROV_TLV_CLIENT_NAME_STR = 2,
+	VIC_LINUX_PROV_TLV_HOST_UUID_STR = 8,
+	VIC_LINUX_PROV_TLV_CLIENT_UUID_STR = 9,
+};
+
+struct vic_provinfo {
+	u8 oui[3];		/* OUI of data provider */
+	u8 type;		/* provider-specific type */
+	u32 length;		/* length of data below */
+	u32 num_tlvs;		/* number of tlvs */
+	struct vic_provinfo_tlv {
+		u16 type;
+		u16 length;
+		u8 value[0];
+	} tlv[0];
+} __attribute__ ((packed));
+
+#define VIC_PROVINFO_MAX_DATA		1385
+#define VIC_PROVINFO_MAX_TLV_DATA (VIC_PROVINFO_MAX_DATA - \
+	sizeof(struct vic_provinfo))
+
+struct vic_provinfo *vic_provinfo_alloc(gfp_t flags, u8 *oui, u8 type);
+void vic_provinfo_free(struct vic_provinfo *vp);
+int vic_provinfo_add_tlv(struct vic_provinfo *vp, u16 type, u16 length,
+	void *value);
+size_t vic_provinfo_size(struct vic_provinfo *vp);
+
+#endif	/* _VNIC_VIC_H_ */

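The provisioning blob assembled by vnic_vic.c is a 12-byte header (3-byte OUI, 1-byte type, then big-endian 32-bit length and TLV-count fields) followed by big-endian type/length/value records. The length field counts the TLV-count field plus all TLV bytes, so vic_provinfo_size() comes out as length + 8. A userspace sketch of the same layout, assuming POSIX htonl()/htons()/ntohl() and using a plain byte buffer with explicit offsets instead of the packed struct (prov_buf and the prov_* functions are hypothetical names):

```c
#include <arpa/inet.h>	/* htonl(), htons(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROV_MAX_DATA	1385	/* VIC_PROVINFO_MAX_DATA in the patch */
#define PROV_HDR_LEN	12	/* oui[3] + type + length + num_tlvs */
#define TLV_HDR_LEN	4	/* u16 type + u16 length */
#define OFF_LENGTH	4
#define OFF_NUM_TLVS	8

struct prov_buf {
	uint8_t data[PROV_MAX_DATA];
};

static uint32_t get_be32(const uint8_t *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return ntohl(v);
}

static void put_be32(uint8_t *p, uint32_t v)
{
	v = htonl(v);
	memcpy(p, &v, sizeof(v));
}

static void put_be16(uint8_t *p, uint16_t v)
{
	v = htons(v);
	memcpy(p, &v, sizeof(v));
}

/* Like vic_provinfo_alloc(): length starts out covering num_tlvs only. */
static void prov_init(struct prov_buf *b, const uint8_t oui[3], uint8_t type)
{
	memset(b->data, 0, sizeof(b->data));
	memcpy(b->data, oui, 3);
	b->data[3] = type;
	put_be32(b->data + OFF_LENGTH, 4);
}

/* Like vic_provinfo_add_tlv(): append one record, bump count and length. */
static int prov_add_tlv(struct prov_buf *b, uint16_t type, uint16_t len,
			const void *value)
{
	uint32_t length = get_be32(b->data + OFF_LENGTH);
	/* TLVs start after num_tlvs; 'length' already includes num_tlvs. */
	size_t off = OFF_NUM_TLVS + length;

	if (length + TLV_HDR_LEN + len > PROV_MAX_DATA - PROV_HDR_LEN)
		return -1;	/* the patch returns -ENOMEM here */

	put_be16(b->data + off, type);
	put_be16(b->data + off + 2, len);
	memcpy(b->data + off + TLV_HDR_LEN, value, len);

	put_be32(b->data + OFF_NUM_TLVS, get_be32(b->data + OFF_NUM_TLVS) + 1);
	put_be32(b->data + OFF_LENGTH, length + TLV_HDR_LEN + len);
	return 0;
}

/* Like vic_provinfo_size(): number of bytes handed to the firmware. */
static size_t prov_size(const struct prov_buf *b)
{
	return get_be32(b->data + OFF_LENGTH) + PROV_HDR_LEN - 4;
}
```

The capacity check mirrors the one in vic_provinfo_add_tlv(). String TLVs such as the port-profile name are added with their terminating NUL, as the note in vnic_vic.h says.
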

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Chris Wright @ 2010-05-13 20:28 UTC
  To: Scott Feldman; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <20100513201720.25579.51230.stgit@savbu-pc100.cisco.com>

* Scott Feldman (scofeldm@cisco.com) wrote:
> From: Scott Feldman <scofeldm@cisco.com>
> 
> Add new netdev ops ndo_{set|get}_vf_port to allow setting of
> port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
> RTM_GETLINK with new sub cmd called IFLA_VF_PORT (added to end of
> IFLA_cmd list).
> 
> A port-profile is used to configure/enable the external switch virtual port
> backing the netdev interface, not to configure the host-facing side of the
> netdev.  A port-profile is an identifier known to the switch.  How port-
> profiles are installed on the switch, or how available port-profiles are
> made known to the host, is outside the scope of this patch.
> 
> There are two types of port-profile specs in the netlink msg.  The first spec
> is for the 802.1Qbg (pre-)standard VDP protocol.  The second spec is for devices
> that run a protocol similar to VDP but in firmware, thus hiding the protocol
> details.  In either case, the specs have much in common, so it makes sense to
> define the netlink msg as the union of the two specs.  For example, both specs
> have a notion of associating/disassociating a port-profile, and both specs
> require some information from the hypervisor manager, such as the client port
> instance ID.
> 
> The general flow is the port-profile is applied to a host netdev interface
> using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
> switch, and the switch virtual port backing the host netdev interface is
> configured/enabled based on the settings defined by the port-profile.  What
> those settings comprise, and how those settings are managed is again
> outside the scope of this patch, since this patch only deals with the
> first step in the flow.
> 
> There is an RTM_GETLINK cmd to return the port-profile setting of an
> interface and also the status of the last port-profile association.
> 
> IFLA_VF_PORT is modeled after the existing IFLA_VF_* cmd where a
> VF number is passed in to identify the virtual function (VF) of an SR-IOV-
> capable device.  In this case, the target of IFLA_VF_PORT msg is the
> netdev physical function (PF) device.  The PF will apply the port-profile
> to the VF.  IFLA_VF_PORT can also be used for devices that don't
> adhere to SR-IOV and can apply the port-profile directly to the netdev
> target.  In this case, the VF number is ignored.
> 
> Passing in a NULL port-profile is used to delete the port-profile association.
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Scott Feldman <scofeldm@cisco.com>
> Signed-off-by: Roopa Prabhu<roprabhu@cisco.com>

Nice, this looks good to me.

Acked-by: Chris Wright <chrisw@redhat.com>

* Re: [PATCH 6/20] drivers/net: Use kzalloc
From: Lennert Buytenhek @ 2010-05-13 20:29 UTC
  To: Julia Lawall; +Cc: netdev, linux-kernel, kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005132200050.6282@ask.diku.dk>

On Thu, May 13, 2010 at 10:00:22PM +0200, Julia Lawall wrote:

> From: Julia Lawall <julia@diku.dk>
> 
> Use kzalloc rather than the combination of kmalloc and memset.
> 
> The semantic patch that makes this change is as follows:
> (http://coccinelle.lip6.fr/)
> 
> // <smpl>
> @@
> expression x,size,flags;
> statement S;
> @@
> 
> -x = kmalloc(size,flags);
> +x = kzalloc(size,flags);
>  if (x == NULL) S
> -memset(x, 0, size);
> // </smpl>
> 
> Signed-off-by: Julia Lawall <julia@diku.dk>

Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
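
Outside the kernel, the same rewrite is malloc() plus memset() turning into calloc(). A small userspace illustration of the before/after shapes the semantic patch matches (kzalloc() itself is kmalloc() with __GFP_ZERO added to the flags; the function names here are hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Before: the allocate, NULL check, memset() shape the rule matches. */
static void *alloc_then_clear(size_t size)
{
	void *x = malloc(size);

	if (x == NULL)
		return NULL;
	memset(x, 0, size);	/* same 'size' expression as the allocation */
	return x;
}

/* After: a single zeroing allocator, as with kzalloc(). */
static void *alloc_zeroed(size_t size)
{
	return calloc(1, size);
}
```

The rule only fires when the memset() clears exactly the size just allocated and only the NULL check sits in between, which is what keeps the rewrite behavior-preserving.
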

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-13 20:40 UTC
  To: Scott Feldman; +Cc: davem, netdev, chrisw, arnd
In-Reply-To: <20100513201720.25579.51230.stgit@savbu-pc100.cisco.com>

Scott Feldman wrote:
> +struct ifla_vf_port_vsi {
> +	__u8 vsi_mgr_id;
> +	__u8 vsi_type_id[3];
> +	__u8 vsi_type_version;
> +	__u8 pad[3];
> +};

Where is this actually used? The only use I could find is in the
size calculation.

> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 23a71cb..de14d36 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> +static int rtnl_vf_port_fill_nest(struct sk_buff *skb, struct net_device *dev,
> +	int vf)

Please keep the style used in that file consistent and align arguments
to the beginning of the opening '('.

> +{
> +	struct nlattr *data;
> +	int err;
> +
> +	data = nla_nest_start(skb, IFLA_VF_PORT);
> +	if (!data)
> +		return -EMSGSIZE;
> +
> +	if (vf >= 0)
> +		nla_put_u32(skb, IFLA_VF_PORT_VF, vf);
> +
> +	err = dev->netdev_ops->ndo_get_vf_port(dev, vf, skb);
> +	if (err == -EMSGSIZE) {
> +		nla_nest_cancel(skb, data);
> +		return -EMSGSIZE;
> +	} else if (err) {
> +		nla_nest_cancel(skb, data);
> +		return 0;

Why is the error not returned in this case?

> +	}
> +
> +	nla_nest_end(skb, data);
> +
> +	return 0;
> +}
> +
> +static int rtnl_vf_port_fill(struct sk_buff *skb, struct net_device *dev)
> +{
> +	int num_vf;
> +	int err;
> +
> +	if (!dev->netdev_ops->ndo_get_vf_port || !dev->dev.parent)
> +		return 0;
> +
> +	num_vf = dev_num_vf(dev->dev.parent);
> +
> +	if (num_vf) {
> +		int i;
> +
> +		for (i = 0; i < num_vf; i++) {
> +			err = rtnl_vf_port_fill_nest(skb, dev, i);
> +			if (err)
> +				goto nla_put_failure;
> +		}
> +	} else  {
> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);

What does -1 mean?

> +		if (err)
> +			goto nla_put_failure;
> +	}
> +
> +	return 0;
> +
> +nla_put_failure:
> +	return -EMSGSIZE;
> +}
> +
>  static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  			    int type, u32 pid, u32 seq, u32 change,
>  			    unsigned int flags)
> @@ -747,17 +825,23 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
>  		goto nla_put_failure;
>  	copy_rtnl_link_stats64(nla_data(attr), stats);
>  
> +	if (dev->dev.parent)
> +		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));

Should this attribute really be included even if the number is zero?

> +
>  	if (dev->netdev_ops->ndo_get_vf_config && dev->dev.parent) {
>  		int i;
>  		struct ifla_vf_info ivi;
>  
> -		NLA_PUT_U32(skb, IFLA_NUM_VF, dev_num_vf(dev->dev.parent));
>  		for (i = 0; i < dev_num_vf(dev->dev.parent); i++) {
>  			if (dev->netdev_ops->ndo_get_vf_config(dev, i, &ivi))
>  				break;
>  			NLA_PUT(skb, IFLA_VFINFO, sizeof(ivi), &ivi);
>  		}
>  	}
> +
> +	if (rtnl_vf_port_fill(skb, dev))
> +		goto nla_put_failure;
> +
>  	if (dev->rtnl_link_ops) {
>  		if (rtnl_link_fill(skb, dev) < 0)
>  			goto nla_put_failure;
> @@ -824,6 +908,7 @@ const struct nla_policy ifla_policy[IFLA_MAX+1] = {
>  				    .len = sizeof(struct ifla_vf_vlan) },
>  	[IFLA_VF_TX_RATE]	= { .type = NLA_BINARY,
>  				    .len = sizeof(struct ifla_vf_tx_rate) },
> +	[IFLA_VF_PORT]		= { .type = NLA_NESTED },
>  };
>  EXPORT_SYMBOL(ifla_policy);
>  
> @@ -832,6 +917,20 @@ static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
>  	[IFLA_INFO_DATA]	= { .type = NLA_NESTED },
>  };
>  
> +static const struct nla_policy ifla_vf_port_policy[IFLA_VF_PORT_MAX+1] = {
> +	[IFLA_VF_PORT_VF]		= { .type = NLA_U32 },
> +	[IFLA_VF_PORT_PROFILE]		= { .type = NLA_STRING,
> +				.len = VF_PORT_PROFILE_MAX },

This is oddly indented, please align .len to .type as in the
existing attributes.

> +	[IFLA_VF_PORT_VSI_TYPE]		= { .type = NLA_BINARY,
> +				.len = sizeof(struct ifla_vf_port_vsi)},
> +	[IFLA_VF_PORT_INSTANCE_UUID]	= { .type = NLA_BINARY,
> +				.len = VF_PORT_UUID_MAX },
> +	[IFLA_VF_PORT_HOST_UUID]	= { .type = NLA_STRING,
> +				.len = VF_PORT_UUID_MAX },
> +	[IFLA_VF_PORT_REQUEST]		= { .type = NLA_U8, },
> +	[IFLA_VF_PORT_RESPONSE]		= { .type = NLA_U16, },
> +};
> +
>  struct net *rtnl_link_get_net(struct net *src_net, struct nlattr *tb[])
>  {
>  	struct net *net;
> @@ -1028,6 +1127,27 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
>  	}
>  	err = 0;
>  
> +	if (tb[IFLA_VF_PORT]) {
> +		struct nlattr *vf_port[IFLA_VF_PORT_MAX+1];
> +		int vf = -1;
> +
> +		err = nla_parse_nested(vf_port, IFLA_VF_PORT_MAX,
> +			tb[IFLA_VF_PORT], ifla_vf_port_policy);
> +		if (err < 0)
> +			goto errout;
> +
> +		if (vf_port[IFLA_VF_PORT_VF])
> +			vf = nla_get_u32(vf_port[IFLA_VF_PORT_VF]);
> +		err = -EOPNOTSUPP;
> +		if (ops->ndo_set_vf_port)
> +			err = ops->ndo_set_vf_port(dev, vf, vf_port);

This appears to address a single VF per command.
I already explained this during the last set of VF patches:
messages are supposed to be symmetrical.  Since you're dumping
state for all existing VFs, you also need to accept configuration
for multiple VFs.  Basically, the kernel must be able to receive
a message it created during a dump and fully recreate the state.
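
The symmetry requirement can be shown without netlink at all: a set request must accept everything a dump emits (one entry per VF) and apply it in full. A minimal sketch in which plain structs stand in for the nested IFLA_VF_PORT attributes; every name here is hypothetical, and PROFILE_MAX is only a stand-in for VF_PORT_PROFILE_MAX:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PROFILE_MAX 40	/* stand-in for VF_PORT_PROFILE_MAX */

/* Per-VF state held by the device (hypothetical). */
struct vf_port_state {
	char profile[PROFILE_MAX];
};

/* One record per VF, playing the role of one nested IFLA_VF_PORT. */
struct vf_port_rec {
	unsigned int vf;
	char profile[PROFILE_MAX];
};

/* Dump: one record for every VF, as rtnl_vf_port_fill() does. */
static size_t dump_vf_ports(const struct vf_port_state *st, size_t num_vf,
			    struct vf_port_rec *out)
{
	size_t i;

	for (i = 0; i < num_vf; i++) {
		out[i].vf = (unsigned int)i;
		memcpy(out[i].profile, st[i].profile, PROFILE_MAX);
	}
	return num_vf;
}

/* Set: accept any number of records, validating all before applying any. */
static int set_vf_ports(struct vf_port_state *st, size_t num_vf,
			const struct vf_port_rec *recs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (recs[i].vf >= num_vf)
			return -1;	/* e.g. -EINVAL in the kernel */
	for (i = 0; i < n; i++)
		memcpy(st[recs[i].vf].profile, recs[i].profile, PROFILE_MAX);
	return 0;
}
```

Dumping one device state and feeding the result to set_vf_ports() on a fresh state reproduces it exactly; validating every record first keeps a rejected request from leaving the VFs half-configured.
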

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Chris Wright @ 2010-05-13 20:46 UTC
  To: Patrick McHardy; +Cc: Scott Feldman, davem, netdev, chrisw, arnd
In-Reply-To: <4BEC63DB.2090306@trash.net>

* Patrick McHardy (kaber@trash.net) wrote:
> > +	} else  {
> > +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> 
> What does -1 mean?

It means no VFs.  Could be made a macro/enum constant

thanks,
-chris

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-13 20:49 UTC
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev, arnd
In-Reply-To: <20100513204614.GB30483@x200.localdomain>

Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>>> +	} else  {
>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>> What does -1 mean?
> 
> It means no VFs.  Could be made a macro/enum constant

Why call rtnl_vf_port_fill_nest at all in that case? It even
calls the ndo_get_vf_port() callback.

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Chris Wright @ 2010-05-13 21:08 UTC
  To: Patrick McHardy; +Cc: Chris Wright, Scott Feldman, davem, netdev, arnd
In-Reply-To: <4BEC65BC.5040208@trash.net>

* Patrick McHardy (kaber@trash.net) wrote:
> Chris Wright wrote:
> > * Patrick McHardy (kaber@trash.net) wrote:
> >>> +	} else  {
> >>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> >> What does -1 mean?
> > 
> > It means no VFs.  Could be made a macro/enum constant
> 
> Why call rtnl_vf_port_fill_nest at all in that case? It even
> calls the ndo_get_vf_port() callback.

For the case where port profile is set on net dev that does not
have VFs (e.g. the enic case in 2/2).

thanks,
-chris
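
Chris's suggestion to name the -1 could look like the following minimal sketch. VF_PORT_SELF and vf_port_targets() are hypothetical names; the function only mirrors which vf values rtnl_vf_port_fill() would pass to ndo_get_vf_port(): one per VF when the parent device has VFs, otherwise a single call for the netdev itself, as in the enic case.

```c
#include <assert.h>

/* Hypothetical name for the bare -1: "no VF, the netdev itself". */
enum { VF_PORT_SELF = -1 };

/* Fill 'vfs' with the vf argument for each ndo_get_vf_port() call a
 * dump would make; returns the number of calls. */
static int vf_port_targets(int num_vf, int *vfs)
{
	int i;

	if (num_vf == 0) {
		vfs[0] = VF_PORT_SELF;	/* e.g. enic dynamic vNICs */
		return 1;
	}
	for (i = 0; i < num_vf; i++)
		vfs[i] = i;
	return num_vf;
}
```

In the patch, rtnl_vf_port_fill_nest() already omits IFLA_VF_PORT_VF when vf is negative, so a named constant would change readability, not behavior.
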

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Patrick McHardy @ 2010-05-13 21:11 UTC (permalink / raw)
  To: Chris Wright; +Cc: Scott Feldman, davem, netdev, arnd
In-Reply-To: <20100513210828.GD30483@x200.localdomain>

Chris Wright wrote:
> * Patrick McHardy (kaber@trash.net) wrote:
>> Chris Wright wrote:
>>> * Patrick McHardy (kaber@trash.net) wrote:
>>>>> +	} else  {
>>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
>>>> What does -1 mean?
>>> It means no VFs.  Could be made a macro/enum constant
>> Why call rtnl_vf_port_fill_nest at all in that case? It even
>> calls the ndo_get_vf_port() callback.
> 
> For the case where a port profile is set on a net device that does not
> have VFs (e.g. the enic case in 2/2).

Thanks for the explanation. I guess an enum constant would be nice
to have. But the bigger problem is the asymmetrical message
parsing/construction.
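To illustrate what symmetric parsing/construction means, a minimal sketch in which the "get" side builds exactly the attribute layout the "set" side parses, so a dump can be fed straight back. The attribute types, struct port_cfg and the helpers are hypothetical, and the encoding is a simplified type-length-value format without netlink's alignment padding:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute types, not the ones defined by the patch. */
enum { ATTR_VF = 1, ATTR_REQUEST = 2 };

#define ATTR_HDRLEN 4	/* 2-byte type + 2-byte length, no padding */

struct port_cfg {
	int32_t vf;
	uint8_t request;
};

/* Append one attribute at offset off; returns the new offset or -1. */
static int put_attr(uint8_t *buf, size_t cap, size_t off, uint16_t type,
		    const void *data, uint16_t len)
{
	if (off + ATTR_HDRLEN + len > cap)
		return -1;
	memcpy(buf + off, &type, sizeof(type));
	memcpy(buf + off + 2, &len, sizeof(len));
	memcpy(buf + off + ATTR_HDRLEN, data, len);
	return (int)(off + ATTR_HDRLEN + len);
}

/* "get" direction: construct a message from the configuration. */
static int port_cfg_build(const struct port_cfg *cfg, uint8_t *buf, size_t cap)
{
	int off = put_attr(buf, cap, 0, ATTR_VF, &cfg->vf, sizeof(cfg->vf));

	if (off < 0)
		return -1;
	return put_attr(buf, cap, (size_t)off, ATTR_REQUEST,
			&cfg->request, sizeof(cfg->request));
}

/* "set" direction: parse exactly the layout port_cfg_build() emits. */
static int port_cfg_parse(const uint8_t *buf, size_t len, struct port_cfg *cfg)
{
	size_t off = 0;

	assert(buf != NULL && cfg != NULL);

	while (off + ATTR_HDRLEN <= len) {
		uint16_t type, alen;

		memcpy(&type, buf + off, sizeof(type));
		memcpy(&alen, buf + off + 2, sizeof(alen));
		if (off + ATTR_HDRLEN + alen > len)
			return -1;	/* truncated attribute */
		if (type == ATTR_VF && alen == sizeof(cfg->vf))
			memcpy(&cfg->vf, buf + off + ATTR_HDRLEN, alen);
		else if (type == ATTR_REQUEST && alen == sizeof(cfg->request))
			memcpy(&cfg->request, buf + off + ATTR_HDRLEN, alen);
		else
			return -1;	/* unknown or malformed attribute */
		off += ATTR_HDRLEN + alen;
	}
	return off == len ? 0 : -1;
}
```

The only property shown is that whatever the dump path emits is accepted unchanged by the set path.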

BTW:

> +enum {
> +	VF_PORT_REQUEST_PREASSOCIATE = 0,
> +	VF_PORT_REQUEST_PREASSOCIATE_RR,
> +	VF_PORT_REQUEST_ASSOCIATE,
> +	VF_PORT_REQUEST_DISASSOCIATE,
> +};

Do multiple of these commands have to be issued in order to
reach "associated" state? That also wouldn't fit into the
rtnetlink design, which contains state, not commands.
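A sketch of the distinction, assuming a hypothetical state attribute: userspace sends the desired state, and the kernel derives the request from the patch's enum that gets there, so re-sending the same state is a no-op. The state names, REQ_NONE and port_request_for() are illustrative only, and the sketch assumes every state is reachable with a single request:

```c
#include <assert.h>

/* Hypothetical states a state-oriented attribute could carry. */
enum port_state {
	PORT_STATE_DISASSOCIATED,
	PORT_STATE_PREASSOCIATED,
	PORT_STATE_PREASSOCIATED_RR,
	PORT_STATE_ASSOCIATED,
};

/* Request values from the patch, plus a hypothetical "nothing to do". */
enum port_request {
	REQ_NONE = -1,
	VF_PORT_REQUEST_PREASSOCIATE = 0,
	VF_PORT_REQUEST_PREASSOCIATE_RR,
	VF_PORT_REQUEST_ASSOCIATE,
	VF_PORT_REQUEST_DISASSOCIATE,
};

/* Map (current, desired) state to the request that reaches the desired
 * state.  Sending the same desired state again yields REQ_NONE, so the
 * operation is idempotent. */
static enum port_request port_request_for(enum port_state cur,
					  enum port_state want)
{
	assert((int)want >= 0 && (int)want <= (int)PORT_STATE_ASSOCIATED);

	if (cur == want)
		return REQ_NONE;

	switch (want) {
	case PORT_STATE_PREASSOCIATED:
		return VF_PORT_REQUEST_PREASSOCIATE;
	case PORT_STATE_PREASSOCIATED_RR:
		return VF_PORT_REQUEST_PREASSOCIATE_RR;
	case PORT_STATE_ASSOCIATED:
		return VF_PORT_REQUEST_ASSOCIATE;
	case PORT_STATE_DISASSOCIATED:
		break;
	}
	return VF_PORT_REQUEST_DISASSOCIATE;
}
```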

* Re: [net-next-2.6 V6 PATCH 1/2] Add netlink support for virtual port management (was iovnl)
From: Chris Wright @ 2010-05-13 21:18 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Chris Wright, Scott Feldman, davem, netdev, arnd
In-Reply-To: <4BEC6B19.1040808@trash.net>

* Patrick McHardy (kaber@trash.net) wrote:
> Chris Wright wrote:
> > * Patrick McHardy (kaber@trash.net) wrote:
> >> Chris Wright wrote:
> >>> * Patrick McHardy (kaber@trash.net) wrote:
> >>>>> +	} else  {
> >>>>> +		err = rtnl_vf_port_fill_nest(skb, dev, -1);
> >>>> What does -1 mean?
> >>> It means no VFs.  Could be made a macro/enum constant
> >> Why call rtnl_vf_port_fill_nest at all in that case? It even
> >> calls the ndo_get_vf_port() callback.
> > 
> > For the case where a port profile is set on a net device that does not
> > have VFs (e.g. the enic case in 2/2).
> 
> Thanks for the explanation. I guess an enum constant would be nice
> to have. But the bigger problem is the asymmetrical message
> parsing/construction.

Yeah, what would you like to do there?  I think we have to keep the
existing ones, and just break out a symmetric set/get?

> BTW:
> 
> > +enum {
> > +	VF_PORT_REQUEST_PREASSOCIATE = 0,
> > +	VF_PORT_REQUEST_PREASSOCIATE_RR,
> > +	VF_PORT_REQUEST_ASSOCIATE,
> > +	VF_PORT_REQUEST_DISASSOCIATE,
> > +};
> 
> Do multiple of these commands have to be issued in order to
> reach "associated" state? That also wouldn't fit into the
> rtnetlink design, which contains state, not commands.

It's optional.  At the very least, you need one associate/disassociate for
each logical link up/down.

For VM migration (or perhaps failover modes) you can optionally issue a
preassociate.  Preassociate has two flavors: one which is purely advisory,
and another which reserves resources on the switch.  These all represent
state transitions in the switch, but only associate should allow the final
logical link up and traffic to flow.
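Roughly, as a hypothetical userspace model of the switch side (the state names and helpers are not from the patch): each request moves the port to a state, preassociate comes in an advisory and a reserving flavor, and only the associated state lets the logical link come up.

```c
#include <assert.h>
#include <stdbool.h>

/* Request values as in the patch under review. */
enum vf_port_request {
	VF_PORT_REQUEST_PREASSOCIATE = 0,	/* advisory only */
	VF_PORT_REQUEST_PREASSOCIATE_RR,	/* also reserves switch resources */
	VF_PORT_REQUEST_ASSOCIATE,
	VF_PORT_REQUEST_DISASSOCIATE,
};

/* Hypothetical switch-side port states. */
enum switch_port_state {
	SW_PORT_DISASSOCIATED,
	SW_PORT_PREASSOCIATED,
	SW_PORT_PREASSOCIATED_RR,
	SW_PORT_ASSOCIATED,
};

/* State the switch port ends up in after a request. */
static enum switch_port_state switch_port_state_after(enum vf_port_request req)
{
	assert((int)req >= 0 && (int)req <= (int)VF_PORT_REQUEST_DISASSOCIATE);

	switch (req) {
	case VF_PORT_REQUEST_PREASSOCIATE:
		return SW_PORT_PREASSOCIATED;
	case VF_PORT_REQUEST_PREASSOCIATE_RR:
		return SW_PORT_PREASSOCIATED_RR;
	case VF_PORT_REQUEST_ASSOCIATE:
		return SW_PORT_ASSOCIATED;
	case VF_PORT_REQUEST_DISASSOCIATE:
		break;
	}
	return SW_PORT_DISASSOCIATED;
}

/* Only an associated port may bring the logical link up and pass traffic. */
static bool switch_port_link_allowed(enum switch_port_state s)
{
	return s == SW_PORT_ASSOCIATED;
}
```

In a migration, for example, the destination could issue PREASSOCIATE_RR ahead of time and ASSOCIATE only once the VM is running there.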

thanks,
-chris

