Netdev List
* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Andy Gospodarek @ 2014-12-04 15:28 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <20141204151512.GE1861@nanopsycho.orion>

On Thu, Dec 04, 2014 at 04:15:12PM +0100, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
> >On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
> >> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
> >> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> >> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> >> ---
> >> >>  include/linux/if_link.h | 1 +
> >> >>  ip/ipaddress.c          | 8 ++++++++
> >> >>  2 files changed, 9 insertions(+)
> >> >> 
> >> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> >> >> index 4732063..a6e2594 100644
> >> >> --- a/include/linux/if_link.h
> >> >> +++ b/include/linux/if_link.h
> >> >> @@ -145,6 +145,7 @@ enum {
> >> >>  	IFLA_CARRIER,
> >> >>  	IFLA_PHYS_PORT_ID,
> >> >>  	IFLA_CARRIER_CHANGES,
> >> >> +	IFLA_PHYS_SWITCH_ID,
> >> >
> >> >Serious question for Stephen et al, once we take this to iproute2 are we
> >> >going to be able to change the name but the string displayed if needed?
> >> >
> >> >I had a patch that called this IFLA_PARENT_ID that I was using with the
> >> >older github tree used by Jiri and Scott before these were in net-next.
> >> >I wanted to submit that as a change to what became
> >> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> >> >be accepted and the dust settled.
> >> >
> >> >I like the parent/child/sibling nomenclature better for 4 reasons:
> >> >
> >> >- Most did not seem to like the term 'offload' since that term would be
> >> >  confusing with, GRO, GSO, etc.
> >> >- A *significant* use case for many of the high-end ASICs in datacenters
> >> >  is routing.
> >> >- switchid does not make sense in the OVS/flow case because it is all
> >> >  about flows, not switches or routers, and parent made sense there.
> >> 
> >> well ovs is all about flows and has the "switch" word in the name. I
> >> believe that people are talking about "switches" in case of these "flow
> >> devices" as well. I see nothing wrong in that. I think that "switch"
> >> became generic name for "packet forwarding machines".
> >
> >Just because one chose to use it that way does not mean I agree with it
> >and that we should copy their bad decision.  :-)
> >
> >> "parent" is very generic and may mean 100 things...
> >
> >But in this case, it means 1 thing.  The netdev you are using is
> >connected to another device that controls it rather than being just a
> >NIC.
> >
> >> 
> >> 
> >> >- I wanted to combine this for use with SR-IOV use case, so one can more
> >> >  easily map PF->VF using this.
> >> 
> >> Ugh, please don't mix this up with pf, vf. That is completely different
> >> thing.
> >> 
> >> pf vf mapping is done in sysfs. In netlink, physportid is used for that
> >> purpose. We can expose this phys port id for pf as well (as a different
> >> attr) and we are done.
> >
> >I know that attribute is there, but I find it more valuable for
> >solutions like nPAR than for PF/VF use-case.  Parent/child relationship
> >makes more sense to me since for forwarding will be controlled by the
> >embedded switch on those devices.  (Notice I specifically chose not to
> 
> see, "embedded switch", not "embedded parent" or "embedded offload". And
> yes, the "embedded switch" also may (and some of them do today) work
> with flows.

It's fine to exclude the PF/VF part for now since you seem to disagree with
that as a valid reason. The other 3 reasons still hold in my mind, so that was
why I asked whether Stephen would mind waiting for me to post my set.  I'm
not asking you to do anything here but to be patient and let me get
the set out.

> 
> 
> >use 'master' since that is already overloaded by bridge, bonding,
> >teaming, etc.)
> >
> >> 
> >> >
> >> >Can you give me a bit (a day) to clean-up that patch and submit an
> >> >alternative proposal to these?
> >> >
> >> >>  	__IFLA_MAX
> >> >>  };
> >> >>  
> >> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> >> >> index 4d99324..bd36a07 100644
> >> >> --- a/ip/ipaddress.c
> >> >> +++ b/ip/ipaddress.c
> >> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >> >>  				      b1, sizeof(b1)));
> >> >>  	}
> >> >>  
> >> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> >> >> +		SPRINT_BUF(b1);
> >> >> +		fprintf(fp, "switchid %s ",
> >> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> >> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> >> >> +				      b1, sizeof(b1)));
> >> >> +	}
> >> >> +
> >> >>  	if (tb[IFLA_OPERSTATE])
> >> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> >> >>  
> >> >> -- 
> >> >> 1.9.3
> >> >> 
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Where exactly will arch_fast_hash be used
From: Thomas Graf @ 2014-12-04 15:26 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Daniel Borkmann, David S. Miller, Theodore Ts'o, netdev,
	Linux Kernel Mailing List
In-Reply-To: <20141204081147.GA19030@gondor.apana.org.au>

On 12/04/14 at 04:11pm, Herbert Xu wrote:
> Hi:
> 
> While working on rhashtable it came to me that this whole concept
> of arch_fast_hash is flawed.  CRCs are linear functions so it's
> fairly easy for an attacker to identify collisions or at least
> eliminate a large amount of search space (e.g., controlling the
> last bit of the hash result is almost trivial, even when you add
> a random seed).
> 
> So what exactly are we going to use arch_fast_hash for? Presumably
> it's places where security is never going to be an issue, right?
> 
> Even if security wasn't an issue, straight CRC32 has really poor
> lower-order bit distribution, which makes it a terrible choice for
> a hash table that simply uses the lower-order bits.

As Daniel pointed out, this work originated from the OVS edge use
case, where security is of less concern and the rehashing is
sufficient. Identifying collisions is of less interest, as the user
space fallback provides a greater attack surface.
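
A minimal sketch of the linearity concern raised in the quoted text,
assuming a plain bitwise CRC-32 with the seed used as the starting register
value. crc32_seeded() and crc32_diff() are hypothetical helpers written for
this illustration and are not the kernel's arch_fast_hash. Because the CRC
is affine over GF(2), the XOR of the hashes of two equal-length inputs does
not depend on the seed, so a pair that collides under one seed collides
under all of them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) with the starting
 * register value exposed as a "seed". Hypothetical helper for illustration
 * only; seed 0 gives the common CRC-32. */
static uint32_t crc32_seeded(uint32_t seed, const uint8_t *data, size_t len)
{
	uint32_t crc = ~seed;
	size_t i;
	int bit;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= data[i];
		/* Shift out one bit at a time, folding in the polynomial
		 * whenever the bit shifted out is set. */
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* XOR difference between the hashes of two equal-length messages under the
 * same seed. The seed contribution cancels out, so the result is the same
 * for every seed. */
static uint32_t crc32_diff(uint32_t seed, const uint8_t *a, const uint8_t *b,
			   size_t len)
{
	return crc32_seeded(seed, a, len) ^ crc32_seeded(seed, b, len);
}
```

With seed 0 this reduces to the common CRC-32, so hashing the nine bytes
"123456789" yields 0xCBF43926, and crc32_diff(s, a, b, n) returns the same
value for every s.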


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Jiri Pirko @ 2014-12-04 15:15 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <20141204145743.GS27416@gospo.rtplab.test>

Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
>> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >> ---
>> >>  include/linux/if_link.h | 1 +
>> >>  ip/ipaddress.c          | 8 ++++++++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> >> index 4732063..a6e2594 100644
>> >> --- a/include/linux/if_link.h
>> >> +++ b/include/linux/if_link.h
>> >> @@ -145,6 +145,7 @@ enum {
>> >>  	IFLA_CARRIER,
>> >>  	IFLA_PHYS_PORT_ID,
>> >>  	IFLA_CARRIER_CHANGES,
>> >> +	IFLA_PHYS_SWITCH_ID,
>> >
>> >Serious question for Stephen et al, once we take this to iproute2 are we
>> >going to be able to change the name but the string displayed if needed?
>> >
>> >I had a patch that called this IFLA_PARENT_ID that I was using with the
>> >older github tree used by Jiri and Scott before these were in net-next.
>> >I wanted to submit that as a change to what became
>> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>> >be accepted and the dust settled.
>> >
>> >I like the parent/child/sibling nomenclature better for 4 reasons:
>> >
>> >- Most did not seem to like the term 'offload' since that term would be
>> >  confusing with, GRO, GSO, etc.
>> >- A *significant* use case for many of the high-end ASICs in datacenters
>> >  is routing.
>> >- switchid does not make sense in the OVS/flow case because it is all
>> >  about flows, not switches or routers, and parent made sense there.
>> 
>> well ovs is all about flows and has the "switch" word in the name. I
>> believe that people are talking about "switches" in case of these "flow
>> devices" as well. I see nothing wrong in that. I think that "switch"
>> became generic name for "packet forwarding machines".
>
>Just because one chose to use it that way does not mean I agree with it
>and that we should copy their bad decision.  :-)
>
>> "parent" is very generic and may mean 100 things...
>
>But in this case, it means 1 thing.  The netdev you are using is
>connected to another device that controls it rather than being just a
>NIC.
>
>> 
>> 
>> >- I wanted to combine this for use with SR-IOV use case, so one can more
>> >  easily map PF->VF using this.
>> 
>> Ugh, please don't mix this up with pf, vf. That is completely different
>> thing.
>> 
>> pf vf mapping is done in sysfs. In netlink, physportid is used for that
>> purpose. We can expose this phys port id for pf as well (as a different
>> attr) and we are done.
>
>I know that attribute is there, but I find it more valuable for
>solutions like nPAR than for PF/VF use-case.  Parent/child relationship
>makes more sense to me since for forwarding will be controlled by the
>embedded switch on those devices.  (Notice I specifically chose not to

see, "embedded switch", not "embedded parent" or "embedded offload". And
yes, the "embedded switch" also may (and some of them do today) work
with flows.


>use 'master' since that is already overloaded by bridge, bonding,
>teaming, etc.)
>
>> 
>> >
>> >Can you give me a bit (a day) to clean-up that patch and submit an
>> >alternative proposal to these?
>> >
>> >>  	__IFLA_MAX
>> >>  };
>> >>  
>> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> >> index 4d99324..bd36a07 100644
>> >> --- a/ip/ipaddress.c
>> >> +++ b/ip/ipaddress.c
>> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>> >>  				      b1, sizeof(b1)));
>> >>  	}
>> >>  
>> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> >> +		SPRINT_BUF(b1);
>> >> +		fprintf(fp, "switchid %s ",
>> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      b1, sizeof(b1)));
>> >> +	}
>> >> +
>> >>  	if (tb[IFLA_OPERSTATE])
>> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> >>  
>> >> -- 
>> >> 1.9.3
>> >> 


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Jiri Pirko @ 2014-12-04 15:12 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <20141204145743.GS27416@gospo.rtplab.test>

Thu, Dec 04, 2014 at 03:57:43PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
>> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> >> ---
>> >>  include/linux/if_link.h | 1 +
>> >>  ip/ipaddress.c          | 8 ++++++++
>> >>  2 files changed, 9 insertions(+)
>> >> 
>> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> >> index 4732063..a6e2594 100644
>> >> --- a/include/linux/if_link.h
>> >> +++ b/include/linux/if_link.h
>> >> @@ -145,6 +145,7 @@ enum {
>> >>  	IFLA_CARRIER,
>> >>  	IFLA_PHYS_PORT_ID,
>> >>  	IFLA_CARRIER_CHANGES,
>> >> +	IFLA_PHYS_SWITCH_ID,
>> >
>> >Serious question for Stephen et al, once we take this to iproute2 are we
>> >going to be able to change the name but the string displayed if needed?
>> >
>> >I had a patch that called this IFLA_PARENT_ID that I was using with the
>> >older github tree used by Jiri and Scott before these were in net-next.
>> >I wanted to submit that as a change to what became
>> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>> >be accepted and the dust settled.
>> >
>> >I like the parent/child/sibling nomenclature better for 4 reasons:
>> >
>> >- Most did not seem to like the term 'offload' since that term would be
>> >  confusing with, GRO, GSO, etc.
>> >- A *significant* use case for many of the high-end ASICs in datacenters
>> >  is routing.
>> >- switchid does not make sense in the OVS/flow case because it is all
>> >  about flows, not switches or routers, and parent made sense there.
>> 
>> well ovs is all about flows and has the "switch" word in the name. I
>> believe that people are talking about "switches" in case of these "flow
>> devices" as well. I see nothing wrong in that. I think that "switch"
>> became generic name for "packet forwarding machines".
>
>Just because one chose to use it that way does not mean I agree with it
>and that we should copy their bad decision.  :-)
>
>> "parent" is very generic and may mean 100 things...
>
>But in this case, it means 1 thing.  The netdev you are using is
>connected to another device that controls it rather than being just a
>NIC.

When we discussed the linkage now called upper/lower devices, there was a
proposal to use "parent" as well. And back then it meant only 1 thing as well.
Why, for example, is pf not the parent of vf? Using the word "parent" here and
assuming that it just makes sense for this case is IMHO wrong. Too
generic. "switch_parent" - that I am okay with.


>
>> 
>> 
>> >- I wanted to combine this for use with SR-IOV use case, so one can more
>> >  easily map PF->VF using this.
>> 
>> Ugh, please don't mix this up with pf, vf. That is completely different
>> thing.
>> 
>> pf vf mapping is done in sysfs. In netlink, physportid is used for that
>> purpose. We can expose this phys port id for pf as well (as a different
>> attr) and we are done.
>
>I know that attribute is there, but I find it more valuable for
>solutions like nPAR than for PF/VF use-case.

Why is it not good enough for pf/vf? There are SR-IOV drivers using
this:
$ git grep ndo_get_phys_port_id
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:       .ndo_get_phys_port_id   = bnx2x_get_phys_port_id,
drivers/net/ethernet/intel/i40e/i40e_main.c:    .ndo_get_phys_port_id   = i40e_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
drivers/net/ethernet/mellanox/mlx4/en_netdev.c: .ndo_get_phys_port_id   = mlx4_en_get_phys_port_id,
drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c:       .ndo_get_phys_port_id   = qlcnic_get_phys_port_id,
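
A sketch of how a userspace tool might consume the physical port ID
mentioned above, assuming it treats the ID as an opaque byte string and
considers two netdevs (for example a PF and its VFs) to sit on the same
physical port when both report identical IDs. struct phys_id, phys_id_set()
and phys_id_same() are hypothetical names for this illustration, and the
32-byte limit is an assumption of the sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define PHYS_ID_MAX_LEN 32	/* assumed upper bound for this sketch */

/* Hypothetical userspace copy of an opaque physical port ID. */
struct phys_id {
	unsigned char id[PHYS_ID_MAX_LEN];
	size_t len;		/* 0 means "no ID reported" */
};

/* Store an ID; rejects empty or oversized input and leaves dst empty. */
static bool phys_id_set(struct phys_id *dst, const void *data, size_t len)
{
	assert(dst != NULL);
	dst->len = 0;
	if (data == NULL || len == 0 || len > PHYS_ID_MAX_LEN)
		return false;
	memcpy(dst->id, data, len);
	dst->len = len;
	return true;
}

/* Two netdevs are taken to share a physical port only if both report an ID
 * and the IDs match byte for byte. */
static bool phys_id_same(const struct phys_id *a, const struct phys_id *b)
{
	if (a->len == 0 || b->len == 0)
		return false;
	return a->len == b->len && memcmp(a->id, b->id, a->len) == 0;
}
```

A missing ID never matches anything here, so netdevs whose drivers do not
implement the callback are not grouped together by accident.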


>Parent/child relationship
>makes more sense to me since for forwarding will be controlled by the
>embedded switch on those devices.  (Notice I specifically chose not to
>use 'master' since that is already overloaded by bridge, bonding,
>teaming, etc.)

But is that embedded switch the same thing as PF? I doubt that. Leave the
PF/VF aspect aside and look at it as an ordinary switch. The eswitch
driver needs to be implemented and to add netdevs for internal ports (they
do not exist now).

You have to look at PF/VF and eSwitch as separate things.

>
>> 
>> >
>> >Can you give me a bit (a day) to clean-up that patch and submit an
>> >alternative proposal to these?
>> >
>> >>  	__IFLA_MAX
>> >>  };
>> >>  
>> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> >> index 4d99324..bd36a07 100644
>> >> --- a/ip/ipaddress.c
>> >> +++ b/ip/ipaddress.c
>> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>> >>  				      b1, sizeof(b1)));
>> >>  	}
>> >>  
>> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> >> +		SPRINT_BUF(b1);
>> >> +		fprintf(fp, "switchid %s ",
>> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> >> +				      b1, sizeof(b1)));
>> >> +	}
>> >> +
>> >>  	if (tb[IFLA_OPERSTATE])
>> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>> >>  
>> >> -- 
>> >> 1.9.3
>> >> 


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Andy Gospodarek @ 2014-12-04 14:57 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <20141204143306.GB1861@nanopsycho.orion>

On Thu, Dec 04, 2014 at 03:33:06PM +0100, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
> >On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> >> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> >> ---
> >>  include/linux/if_link.h | 1 +
> >>  ip/ipaddress.c          | 8 ++++++++
> >>  2 files changed, 9 insertions(+)
> >> 
> >> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> >> index 4732063..a6e2594 100644
> >> --- a/include/linux/if_link.h
> >> +++ b/include/linux/if_link.h
> >> @@ -145,6 +145,7 @@ enum {
> >>  	IFLA_CARRIER,
> >>  	IFLA_PHYS_PORT_ID,
> >>  	IFLA_CARRIER_CHANGES,
> >> +	IFLA_PHYS_SWITCH_ID,
> >
> >Serious question for Stephen et al, once we take this to iproute2 are we
> >going to be able to change the name but the string displayed if needed?
> >
> >I had a patch that called this IFLA_PARENT_ID that I was using with the
> >older github tree used by Jiri and Scott before these were in net-next.
> >I wanted to submit that as a change to what became
> >82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> >be accepted and the dust settled.
> >
> >I like the parent/child/sibling nomenclature better for 4 reasons:
> >
> >- Most did not seem to like the term 'offload' since that term would be
> >  confusing with, GRO, GSO, etc.
> >- A *significant* use case for many of the high-end ASICs in datacenters
> >  is routing.
> >- switchid does not make sense in the OVS/flow case because it is all
> >  about flows, not switches or routers, and parent made sense there.
> 
> well ovs is all about flows and has the "switch" word in the name. I
> believe that people are talking about "switches" in case of these "flow
> devices" as well. I see nothing wrong in that. I think that "switch"
> became generic name for "packet forwarding machines".

Just because one chose to use it that way does not mean I agree with it
and that we should copy their bad decision.  :-)

> "parent" is very generic and may mean 100 things...

But in this case, it means 1 thing.  The netdev you are using is
connected to another device that controls it rather than being just a
NIC.

> 
> 
> >- I wanted to combine this for use with SR-IOV use case, so one can more
> >  easily map PF->VF using this.
> 
> Ugh, please don't mix this up with pf, vf. That is completely different
> thing.
> 
> pf vf mapping is done in sysfs. In netlink, physportid is used for that
> purpose. We can expose this phys port id for pf as well (as a different
> attr) and we are done.

I know that attribute is there, but I find it more valuable for
solutions like nPAR than for PF/VF use-case.  Parent/child relationship
makes more sense to me since forwarding will be controlled by the
embedded switch on those devices.  (Notice I specifically chose not to
use 'master' since that is already overloaded by bridge, bonding,
teaming, etc.)

> 
> >
> >Can you give me a bit (a day) to clean-up that patch and submit an
> >alternative proposal to these?
> >
> >>  	__IFLA_MAX
> >>  };
> >>  
> >> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> >> index 4d99324..bd36a07 100644
> >> --- a/ip/ipaddress.c
> >> +++ b/ip/ipaddress.c
> >> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
> >>  				      b1, sizeof(b1)));
> >>  	}
> >>  
> >> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> >> +		SPRINT_BUF(b1);
> >> +		fprintf(fp, "switchid %s ",
> >> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> >> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> >> +				      b1, sizeof(b1)));
> >> +	}
> >> +
> >>  	if (tb[IFLA_OPERSTATE])
> >>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
> >>  
> >> -- 
> >> 1.9.3
> >> 


* Re: [PATCH v2] drivers: net : cpsw: Update Kconfig for CPSW
From: Felipe Balbi @ 2014-12-04 14:48 UTC (permalink / raw)
  To: Lokesh Vutla
  Cc: netdev, davem, mugunthanvnm, linux-omap, grygorii.strashko, balbi,
	nsekhar, t-kristo
In-Reply-To: <1417668869-25616-1-git-send-email-lokeshvutla@ti.com>


On Thu, Dec 04, 2014 at 10:24:29AM +0530, Lokesh Vutla wrote:
> CPSW is present in AM33xx, AM43xx, DRA7xx.
> Updating the Kconfig to depend on ARCH_OMAP2PLUS instead of listing
> all SoC's.
> 
> Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>

Reviewed-by: Felipe Balbi <balbi@ti.com>

> ---
>  drivers/net/ethernet/ti/Kconfig | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/Kconfig b/drivers/net/ethernet/ti/Kconfig
> index 5d8cb79..7051255 100644
> --- a/drivers/net/ethernet/ti/Kconfig
> +++ b/drivers/net/ethernet/ti/Kconfig
> @@ -5,7 +5,7 @@
>  config NET_VENDOR_TI
>  	bool "Texas Instruments (TI) devices"
>  	default y
> -	depends on PCI || EISA || AR7 || (ARM && (ARCH_DAVINCI || ARCH_OMAP3 || SOC_AM33XX || ARCH_KEYSTONE))
> +	depends on PCI || EISA || AR7 || ARCH_DAVINCI || ARCH_OMAP2PLUS || ARCH_KEYSTONE
>  	---help---
>  	  If you have a network (Ethernet) card belonging to this class, say Y
>  	  and read the Ethernet-HOWTO, available from
> @@ -32,7 +32,7 @@ config TI_DAVINCI_EMAC
>  
>  config TI_DAVINCI_MDIO
>  	tristate "TI DaVinci MDIO Support"
> -	depends on ARM && ( ARCH_DAVINCI || ARCH_OMAP3 || SOC_AM33XX || ARCH_KEYSTONE )
> +	depends on ARCH_DAVINCI || ARCH_OMAP2PLUS || ARCH_KEYSTONE
>  	select PHYLIB
>  	---help---
>  	  This driver supports TI's DaVinci MDIO module.
> @@ -42,7 +42,7 @@ config TI_DAVINCI_MDIO
>  
>  config TI_DAVINCI_CPDMA
>  	tristate "TI DaVinci CPDMA Support"
> -	depends on ARM && ( ARCH_DAVINCI || ARCH_OMAP3 || SOC_AM33XX )
> +	depends on ARCH_DAVINCI || ARCH_OMAP2PLUS
>  	---help---
>  	  This driver supports TI's DaVinci CPDMA dma engine.
>  
> @@ -58,7 +58,7 @@ config TI_CPSW_PHY_SEL
>  
>  config TI_CPSW
>  	tristate "TI CPSW Switch Support"
> -	depends on ARM && (ARCH_DAVINCI || SOC_AM33XX)
> +	depends on ARCH_DAVINCI || ARCH_OMAP2PLUS
>  	select TI_DAVINCI_CPDMA
>  	select TI_DAVINCI_MDIO
>  	select TI_CPSW_PHY_SEL
> -- 
> 1.9.1
> 

-- 
balbi



* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Roopa Prabhu @ 2014-12-04 14:45 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <20141204143400.GC1861@nanopsycho.orion>

On 12/4/14, 6:34 AM, Jiri Pirko wrote:
> Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>> On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>> Jiri Pirko (1):
>>>    iproute2: ipa: show switch id
>>>
>>> Scott Feldman (5):
>>>    bridge/fdb: fix statistics output spacing
>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>      device
>>>    bridge/link: add new offload hwmode swdev
>> Ack to most patches but nack on this one. The todo list still has a note to
>> revisit the flag to indicate switchdev offloads.
>> Exposing this to userspace does not help that.
> Hmm, note that this is already exposed to userspace, this patchset is
> for iproute2 (userspace tool).
Hmmm, all feedback on the switchdev patches seemed to indicate we can
change this later.
I don't see swdev mode being used in the kernel anywhere today.
I will send a patch to remove it. It's still in net-next, so it can be
changed?
I was going to resend my patch to introduce a common offload flag for
all link objects.
It would be nice if all of them had a consistent flag to indicate hw
offload and iproute2 could display the same flag for all,
including bonds and vxlans.

>
>>>    link: add missing IFLA_BRPORT_PROXYARP
>>>    bridge/link: add learning_sync policy flag
>>>
>>>   bridge/fdb.c              |  4 +++-
>>>   bridge/link.c             | 17 +++++++++++++++--
>>>   include/linux/if_bridge.h |  1 +
>>>   include/linux/if_link.h   |  3 +++
>>>   include/linux/neighbour.h |  1 +
>>>   ip/ipaddress.c            |  8 ++++++++
>>>   man/man8/bridge.8         | 19 ++++++++++++++-----
>>>   7 files changed, 45 insertions(+), 8 deletions(-)
>>>


* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Jiri Pirko @ 2014-12-04 14:34 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <54806F2A.2050109@cumulusnetworks.com>

Thu, Dec 04, 2014 at 03:26:50PM CET, roopa@cumulusnetworks.com wrote:
>On 12/4/14, 12:57 AM, Jiri Pirko wrote:
>>Jiri Pirko (1):
>>   iproute2: ipa: show switch id
>>
>>Scott Feldman (5):
>>   bridge/fdb: fix statistics output spacing
>>   bridge/fdb: add flag/indication for FDB entry synced from offload
>>     device
>>   bridge/link: add new offload hwmode swdev
>Ack to most patches but nack on this one. The todo list still has a note to
>revisit the flag to indicate switchdev offloads.
>Exposing this to userspace does not help that.

Hmm, note that this is already exposed to userspace, this patchset is
for iproute2 (userspace tool).

>
>>   link: add missing IFLA_BRPORT_PROXYARP
>>   bridge/link: add learning_sync policy flag
>>
>>  bridge/fdb.c              |  4 +++-
>>  bridge/link.c             | 17 +++++++++++++++--
>>  include/linux/if_bridge.h |  1 +
>>  include/linux/if_link.h   |  3 +++
>>  include/linux/neighbour.h |  1 +
>>  ip/ipaddress.c            |  8 ++++++++
>>  man/man8/bridge.8         | 19 ++++++++++++++-----
>>  7 files changed, 45 insertions(+), 8 deletions(-)
>>
>


* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Jiri Pirko @ 2014-12-04 14:33 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <20141204142015.GR27416@gospo.rtplab.test>

Thu, Dec 04, 2014 at 03:20:15PM CET, gospo@cumulusnetworks.com wrote:
>On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>  include/linux/if_link.h | 1 +
>>  ip/ipaddress.c          | 8 ++++++++
>>  2 files changed, 9 insertions(+)
>> 
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>  	IFLA_CARRIER,
>>  	IFLA_PHYS_PORT_ID,
>>  	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
>
>Serious question for Stephen et al, once we take this to iproute2 are we
>going to be able to change the name but the string displayed if needed?
>
>I had a patch that called this IFLA_PARENT_ID that I was using with the
>older github tree used by Jiri and Scott before these were in net-next.
>I wanted to submit that as a change to what became
>82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
>be accepted and the dust settled.
>
>I like the parent/child/sibling nomenclature better for 4 reasons:
>
>- Most did not seem to like the term 'offload' since that term would be
>  confusing with, GRO, GSO, etc.
>- A *significant* use case for many of the high-end ASICs in datacenters
>  is routing.
>- switchid does not make sense in the OVS/flow case because it is all
>  about flows, not switches or routers, and parent made sense there.

Well, ovs is all about flows and has the word "switch" in its name. I
believe that people talk about "switches" in the case of these "flow
devices" as well. I see nothing wrong with that. I think that "switch"
has become a generic name for "packet forwarding machines".

"parent" is very generic and may mean 100 things...


>- I wanted to combine this for use with SR-IOV use case, so one can more
>  easily map PF->VF using this.

Ugh, please don't mix this up with pf, vf. That is a completely different
thing.

pf vf mapping is done in sysfs. In netlink, physportid is used for that
purpose. We can expose this phys port id for pf as well (as a different
attr) and we are done.

>
>Can you give me a bit (a day) to clean-up that patch and submit an
>alternative proposal to these?
>
>>  	__IFLA_MAX
>>  };
>>  
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>  				      b1, sizeof(b1)));
>>  	}
>>  
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>  	if (tb[IFLA_OPERSTATE])
>>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>>  
>> -- 
>> 1.9.3
>> 
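
For readers following the ip/ipaddress.c hunk quoted above: the new
attribute is an opaque byte string that gets printed as hex. Below is a
minimal, self-contained sketch of that kind of rendering. Note that
hex_encode() is a hypothetical stand-in written for illustration only; it
is not iproute2's hexstring_n2a(), and its truncation behaviour is an
assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of iproute2): render len bytes of id as
 * lowercase hex into buf. Only whole bytes that fit are written, and buf
 * is always NUL-terminated when blen > 0. Returns buf. */
static char *hex_encode(const unsigned char *id, size_t len,
			char *buf, size_t blen)
{
	size_t i, off = 0;

	if (blen == 0)
		return buf;
	buf[0] = '\0';
	/* Each byte needs two hex digits plus room for the trailing NUL. */
	for (i = 0; i < len && off + 3 <= blen; i++) {
		snprintf(buf + off, blen - off, "%02x", id[i]);
		off += 2;
	}
	return buf;
}
```

A caller would pass the attribute payload and its length, then print the
returned string after a label such as "switchid".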

^ permalink raw reply

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Jamal Hadi Salim @ 2014-12-04 14:31 UTC (permalink / raw)
  To: Roopa Prabhu, Andy Gospodarek
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <54806E1B.5010402@cumulusnetworks.com>

On 12/04/14 09:22, Roopa Prabhu wrote:
> On 12/4/14, 5:56 AM, Andy Gospodarek wrote:

> I think jamal meant the people in the cc (not netdev).
> I actually like the cc, but that probably cant last long because it is a
> lot of people.

There are 34 people on that list. I support copying all stake holders
but it is too large.

cheers,
jamal

^ permalink raw reply

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Roopa Prabhu @ 2014-12-04 14:29 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, jhs,
	sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, bcrl, hemal
In-Reply-To: <20141204142015.GR27416@gospo.rtplab.test>

On 12/4/14, 6:20 AM, Andy Gospodarek wrote:
> On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
>> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
>> ---
>>   include/linux/if_link.h | 1 +
>>   ip/ipaddress.c          | 8 ++++++++
>>   2 files changed, 9 insertions(+)
>>
>> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
>> index 4732063..a6e2594 100644
>> --- a/include/linux/if_link.h
>> +++ b/include/linux/if_link.h
>> @@ -145,6 +145,7 @@ enum {
>>   	IFLA_CARRIER,
>>   	IFLA_PHYS_PORT_ID,
>>   	IFLA_CARRIER_CHANGES,
>> +	IFLA_PHYS_SWITCH_ID,
> Serious question for Stephen et al, once we take this to iproute2 are we
> going to be able to change the name but the string displayed if needed?
>
> I had a patch that called this IFLA_PARENT_ID that I was using with the
> older github tree used by Jiri and Scott before these were in net-next.
> I wanted to submit that as a change to what became
> 82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
> be accepted and the dust settled.
>
> I like the parent/child/sibling nomenclature better for 4 reasons:
>
> - Most did not seem to like the term 'offload' since that term would be
>    confusing with, GRO, GSO, etc.
> - A *significant* use case for many of the high-end ASICs in datacenters
>    is routing.
> - switchid does not make sense in the OVS/flow case because it is all
>    about flows, not switches or routers, and parent made sense there.
> - I wanted to combine this for use with SR-IOV use case, so one can more
>    easily map PF->VF using this.

ack.., I have voted for calling it a generic parent id in the past.
>
> Can you give me a bit (a day) to clean-up that patch and submit an
> alternative proposal to these?
>
>>   	__IFLA_MAX
>>   };
>>   
>> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
>> index 4d99324..bd36a07 100644
>> --- a/ip/ipaddress.c
>> +++ b/ip/ipaddress.c
>> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>>   				      b1, sizeof(b1)));
>>   	}
>>   
>> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
>> +		SPRINT_BUF(b1);
>> +		fprintf(fp, "switchid %s ",
>> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
>> +				      b1, sizeof(b1)));
>> +	}
>> +
>>   	if (tb[IFLA_OPERSTATE])
>>   		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>>   
>> -- 
>> 1.9.3
>>

^ permalink raw reply

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Roopa Prabhu @ 2014-12-04 14:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, gospo, bcrl, hemal
In-Reply-To: <1417683438-10935-1-git-send-email-jiri@resnulli.us>

On 12/4/14, 12:57 AM, Jiri Pirko wrote:
> Jiri Pirko (1):
>    iproute2: ipa: show switch id
>
> Scott Feldman (5):
>    bridge/fdb: fix statistics output spacing
>    bridge/fdb: add flag/indication for FDB entry synced from offload
>      device
>    bridge/link: add new offload hwmode swdev
Ack to most patches but nack on this one. The todo list still has a note 
to revisit the flag to indicate switchdev offloads.
Exposing this to userspace does not help that.

>    link: add missing IFLA_BRPORT_PROXYARP
>    bridge/link: add learning_sync policy flag
>
>   bridge/fdb.c              |  4 +++-
>   bridge/link.c             | 17 +++++++++++++++--
>   include/linux/if_bridge.h |  1 +
>   include/linux/if_link.h   |  3 +++
>   include/linux/neighbour.h |  1 +
>   ip/ipaddress.c            |  8 ++++++++
>   man/man8/bridge.8         | 19 ++++++++++++++-----
>   7 files changed, 45 insertions(+), 8 deletions(-)
>

^ permalink raw reply

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Roopa Prabhu @ 2014-12-04 14:22 UTC (permalink / raw)
  To: Andy Gospodarek
  Cc: Jamal Hadi Salim, Jiri Pirko, netdev, davem, nhorman, andy, tgraf,
	dborkman, ogerlitz, jesse, pshelar, azhou, ben, stephen,
	jeffrey.t.kirsher, vyasevic, xiyou.wangcong, john.r.fastabend,
	edumazet, sfeldma, f.fainelli, linville, jasowang, ebiederm,
	nicolas.dichtel, ryazanov.s.a, buytenh, aviadr, nbd,
	alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, bcrl
In-Reply-To: <20141204135621.GQ27416@gospo.rtplab.test>

On 12/4/14, 5:56 AM, Andy Gospodarek wrote:
> On Thu, Dec 04, 2014 at 08:16:38AM -0500, Jamal Hadi Salim wrote:
>> General comment:
>> iproute2 patches need to be addressed to the maintainer
>> stephen@networkplumber.org
>> And are we mainstream enough now we can drop the 1000 people on the
>> To: list? I doubt most of these people are even following the discussion (I
>> planted an easter egg for Aviad and didnt see him
>> reacting) ;->
> I actually don't mind them coming here since it is a change to netlink.
> Thanks for sending them to netdev, Jiri.

I think jamal meant the people in the cc (not netdev).
I actually like the cc, but that probably cant last long because it is a 
lot of people.
>
>> cheers,
>> jamal
>>
>> On 12/04/14 03:57, Jiri Pirko wrote:
>>> Jiri Pirko (1):
>>>    iproute2: ipa: show switch id
>>>
>>> Scott Feldman (5):
>>>    bridge/fdb: fix statistics output spacing
>>>    bridge/fdb: add flag/indication for FDB entry synced from offload
>>>      device
>>>    bridge/link: add new offload hwmode swdev
>>>    link: add missing IFLA_BRPORT_PROXYARP
>>>    bridge/link: add learning_sync policy flag
>>>
>>>   bridge/fdb.c              |  4 +++-
>>>   bridge/link.c             | 17 +++++++++++++++--
>>>   include/linux/if_bridge.h |  1 +
>>>   include/linux/if_link.h   |  3 +++
>>>   include/linux/neighbour.h |  1 +
>>>   ip/ipaddress.c            |  8 ++++++++
>>>   man/man8/bridge.8         | 19 ++++++++++++++-----
>>>   7 files changed, 45 insertions(+), 8 deletions(-)
>>>

^ permalink raw reply

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Andy Gospodarek @ 2014-12-04 14:20 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse,
	pshelar, azhou, ben, stephen, jeffrey.t.kirsher, vyasevic,
	xiyou.wangcong, john.r.fastabend, edumazet, jhs, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <1417683438-10935-2-git-send-email-jiri@resnulli.us>

On Thu, Dec 04, 2014 at 09:57:13AM +0100, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>  include/linux/if_link.h | 1 +
>  ip/ipaddress.c          | 8 ++++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>  	IFLA_CARRIER,
>  	IFLA_PHYS_PORT_ID,
>  	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,

Serious question for Stephen et al, once we take this to iproute2 are we
going to be able to change the name but the string displayed if needed?

I had a patch that called this IFLA_PARENT_ID that I was using with the
older github tree used by Jiri and Scott before these were in net-next.
I wanted to submit that as a change to what became
82f2841291cfaf4d225aa1766424280254d3e3b2, but was waiting for things to
be accepted and the dust settled.

I like the parent/child/sibling nomenclature better for 4 reasons:

- Most did not seem to like the term 'offload' since that term would be
  confusing with, GRO, GSO, etc.
- A *significant* use case for many of the high-end ASICs in datacenters
  is routing.
- switchid does not make sense in the OVS/flow case because it is all
  about flows, not switches or routers, and parent made sense there.
- I wanted to combine this for use with SR-IOV use case, so one can more
  easily map PF->VF using this.

Can you give me a bit (a day) to clean-up that patch and submit an
alternative proposal to these?

>  	__IFLA_MAX
>  };
>  
> diff --git a/ip/ipaddress.c b/ip/ipaddress.c
> index 4d99324..bd36a07 100644
> --- a/ip/ipaddress.c
> +++ b/ip/ipaddress.c
> @@ -589,6 +589,14 @@ int print_linkinfo(const struct sockaddr_nl *who,
>  				      b1, sizeof(b1)));
>  	}
>  
> +	if (tb[IFLA_PHYS_SWITCH_ID]) {
> +		SPRINT_BUF(b1);
> +		fprintf(fp, "switchid %s ",
> +			hexstring_n2a(RTA_DATA(tb[IFLA_PHYS_SWITCH_ID]),
> +				      RTA_PAYLOAD(tb[IFLA_PHYS_SWITCH_ID]),
> +				      b1, sizeof(b1)));
> +	}
> +
>  	if (tb[IFLA_OPERSTATE])
>  		print_operstate(fp, rta_getattr_u8(tb[IFLA_OPERSTATE]));
>  
> -- 
> 1.9.3
> 

^ permalink raw reply

* Re: [patch iproute2 6/6] bridge/link: add learning_sync policy flag
From: Jamal Hadi Salim @ 2014-12-04 14:15 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl,
	hemal
In-Reply-To: <54806115.9080008@mojatatu.com>

On 12/04/14 08:26, Jamal Hadi Salim wrote:

> I actually dont think any of these patches needed an ack because they
> are off.
       ^^^ meant ok ;-> or obvious.

cheers,
jamal

^ permalink raw reply

* Re: FIXME in solos-pci.c
From: chas williams - CONTRACTOR @ 2014-12-04 14:05 UTC (permalink / raw)
  To: nick, Guy Ellis; +Cc: linux-atm-general, netdev, linux-kernel
In-Reply-To: <547FDBF6.5020008@gmail.com>

The last I heard on this topic was from Guy Ellis,

	From: Guy Ellis <guy@traverse.com.au>
	To: linux-atm-general@lists.sourceforge.net
	Subject: Re: [Linux-ATM-General] solos-pci.c: Fix me
	Date: Tue, 22 Jul 2014 07:34:30 +1000

	Hi Chas/Nick,

	I think the FIXME is a reminder to deal correctly with an unknown command.
	At the moment an unknown command is treated as a PKT_COMMAND which is 
	incorrect.
	I think the correct behaviour would be to ignore unknown commands and 
	just free the skb.

	That said I doubt this condition ever occurs, which is probably why it 
	has been this way since day 1.

	Nathan is on vacation at the moment, when he gets back in August I'll 
	ask him to tidy this up.

So perhaps, Guy could let us know if he had a chance for Nathan to
look at removing this FIXME.
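
To make the behaviour Guy describes concrete, here is a small sketch of a
dispatch in which unknown packet types get their own default branch and
are dropped, instead of sharing the PKT_COMMAND path. The enum values and
the classify_packet() helper are hypothetical and exist only for this
illustration; the real driver has its own constants and works on skbs.

```c
/* Hypothetical packet types, for illustration only. */
enum pkt_type { PKT_DATA, PKT_COMMAND };

/* What the receive path should do with a packet. */
enum pkt_action { ACT_DELIVER, ACT_QUEUE_CLI, ACT_DROP };

/* Decide how to handle a received packet. Unlike the quoted driver code,
 * unknown types are not treated as PKT_COMMAND: they fall into a separate
 * default branch and are dropped. */
static enum pkt_action classify_packet(int type)
{
	switch (type) {
	case PKT_DATA:
		return ACT_DELIVER;
	case PKT_COMMAND:
		return ACT_QUEUE_CLI;
	default:
		return ACT_DROP;	/* caller would free the buffer */
	}
}
```

In the driver itself, the ACT_DROP case would presumably correspond to
freeing the skb and moving on, as Guy suggests.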


On Wed, 03 Dec 2014 22:58:46 -0500
nick <xerofoify@gmail.com> wrote:

> Greetings Chas,
> I am wondering if there is any reason for the FIXME message in the code below. After reading the other parts of the function, the code and logic seem similar and correct, unless I am missing something about the hardware.
> Cheers Nick 
>  case PKT_COMMAND:
>                        default: /* FIXME: Not really, surely? */
>                               if (process_command(card, port, skb))
>                                         break;
>                                spin_lock(&card->cli_queue_lock);
>                                if (skb_queue_len(&card->cli_queue[port]) > 10) {
>                                      if (net_ratelimit())
>                                                 dev_warn(&card->dev->dev, "Dropping console response on port %d\n",
>                                                          port);
>                                         dev_kfree_skb_any(skb);
>                                 } else
>                                          skb_queue_tail(&card->cli_queue[port], skb);
>                                spin_unlock(&card->cli_queue_lock);
>                                break;
>                        }
>  
> 

^ permalink raw reply

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Andy Gospodarek @ 2014-12-04 13:56 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Jiri Pirko, netdev, davem, nhorman, andy, tgraf, dborkman,
	ogerlitz, jesse, pshelar, azhou, ben, stephen, jeffrey.t.kirsher,
	vyasevic, xiyou.wangcong, john.r.fastabend, edumazet, sfeldma,
	f.fainelli, roopa, linville, jasowang, ebiederm, nicolas.dichtel,
	ryazanov.s.a, buytenh, aviadr, nbd, alexei.starovoitov,
	Neil.Jerram, ronye, simon.horman, alexander.h.duyck, john.ronciak,
	mleitner, shrijeet, bcrl, hemal
In-Reply-To: <54805EB6.2010901@mojatatu.com>

On Thu, Dec 04, 2014 at 08:16:38AM -0500, Jamal Hadi Salim wrote:
> 
> General comment:
> iproute2 patches need to be addressed to the maintainer
> stephen@networkplumber.org
> And are we mainstream enough now we can drop the 1000 people on the
> To: list? I doubt most of these people are even following the discussion (I
> planted an easter egg for Aviad and didnt see him
> reacting) ;->

I actually don't mind them coming here since it is a change to netlink.
Thanks for sending them to netdev, Jiri.

> 
> cheers,
> jamal
> 
> On 12/04/14 03:57, Jiri Pirko wrote:
> >Jiri Pirko (1):
> >   iproute2: ipa: show switch id
> >
> >Scott Feldman (5):
> >   bridge/fdb: fix statistics output spacing
> >   bridge/fdb: add flag/indication for FDB entry synced from offload
> >     device
> >   bridge/link: add new offload hwmode swdev
> >   link: add missing IFLA_BRPORT_PROXYARP
> >   bridge/link: add learning_sync policy flag
> >
> >  bridge/fdb.c              |  4 +++-
> >  bridge/link.c             | 17 +++++++++++++++--
> >  include/linux/if_bridge.h |  1 +
> >  include/linux/if_link.h   |  3 +++
> >  include/linux/neighbour.h |  1 +
> >  ip/ipaddress.c            |  8 ++++++++
> >  man/man8/bridge.8         | 19 ++++++++++++++-----
> >  7 files changed, 45 insertions(+), 8 deletions(-)
> >
> 

^ permalink raw reply

* Re: [PATCH net] net: sctp: use MAX_HEADER for headroom reserve in output path
From: Neil Horman @ 2014-12-04 13:41 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, linux-sctp, netdev, robert
In-Reply-To: <1417605238-9936-1-git-send-email-dborkman@redhat.com>

On Wed, Dec 03, 2014 at 12:13:58PM +0100, Daniel Borkmann wrote:
> To accommodate enough headroom for tunnels, use MAX_HEADER instead
> of LL_MAX_HEADER. Robert reported that he has hit after roughly 40hrs
> of trinity an skb_under_panic() via SCTP output path (see reference).
> I couldn't reproduce it from here, but not using MAX_HEADER as elsewhere
> in other protocols might be one possible cause for this.
> 
> In any case, it looks like accounting on the chunks themselves is
> fine, as the skb already passed the SCTP output path and did not hit
> any skb_over_panic(). Given tunneling was enabled in his .config, the
> headroom would have been expanded by MAX_HEADER in this case.
> 
> Reported-by: Robert Święcki <robert@swiecki.net>
> Reference: https://lkml.org/lkml/2014/12/1/507
> Fixes: 594ccc14dfe4d ("[SCTP] Replace incorrect use of dev_alloc_skb with alloc_skb in sctp_packet_transmit().")
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  net/sctp/output.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sctp/output.c b/net/sctp/output.c
> index 42dffd4..fc5e45b 100644
> --- a/net/sctp/output.c
> +++ b/net/sctp/output.c
> @@ -401,12 +401,12 @@ int sctp_packet_transmit(struct sctp_packet *packet)
>  	sk = chunk->skb->sk;
>  
>  	/* Allocate the new skb.  */
> -	nskb = alloc_skb(packet->size + LL_MAX_HEADER, GFP_ATOMIC);
> +	nskb = alloc_skb(packet->size + MAX_HEADER, GFP_ATOMIC);
>  	if (!nskb)
>  		goto nomem;
>  
>  	/* Make sure the outbound skb has enough header room reserved. */
> -	skb_reserve(nskb, packet->overhead + LL_MAX_HEADER);
> +	skb_reserve(nskb, packet->overhead + MAX_HEADER);
>  
>  	/* Set the owning socket so that we know where to get the
>  	 * destination IP address.
> -- 
> 1.7.11.7
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
Acked-by: Neil Horman <nhorman@tuxdriver.com>
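
As background for the headroom argument in the quoted patch, here is a toy
model of why too little reserved headroom leads to an underrun when lower
layers prepend headers. The struct and helpers are hypothetical userspace
stand-ins, loosely analogous to skb_reserve() and skb_push(); the sizes
used in the usage note are illustrative assumptions, not the kernel's
configured values.

```c
#include <stddef.h>

/* Toy packet buffer: 'headroom' is the number of free bytes in front of
 * the payload that later layers may use for their headers. */
struct toy_buf {
	size_t size;		/* total bytes allocated */
	size_t headroom;	/* free bytes before the payload */
};

/* Reserve headroom in an empty buffer (loosely like skb_reserve). */
static void toy_reserve(struct toy_buf *b, size_t len)
{
	b->headroom += len;
}

/* Prepend a header of len bytes (loosely like skb_push). Returns 0 on
 * success, or -1 if the header does not fit in the remaining headroom,
 * which is the situation the kernel reports via skb_under_panic(). */
static int toy_push(struct toy_buf *b, size_t len)
{
	if (len > b->headroom)
		return -1;
	b->headroom -= len;
	return 0;
}
```

For example, with 32 bytes reserved, a 20-byte header fits but a further
48-byte tunnel header does not; reserving 32 + 48 bytes up front lets both
succeed.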

^ permalink raw reply

* Re: [patch iproute2 6/6] bridge/link: add learning_sync policy flag
From: Jamal Hadi Salim @ 2014-12-04 13:26 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl,
	hemal
In-Reply-To: <1417683438-10935-7-git-send-email-jiri@resnulli.us>

On 12/04/14 03:57, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> Add 'learning_sync' flag to turn on/off syncing of learned FDB entries from
> offload device to bridge's FDB.   Flag would be set/cleared on SELF using
> hwmode qualifier 'swdev'.  E.g.:
>
>    $ sudo bridge link set dev swp1 hwmode swdev learning_sync on
>
>    $ bridge -d link show dev swp1
>    2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 2
>        hairpin off guard off root_block off fastleave off learning off flood off
>    2: swp1 state UNKNOWN : <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0
>        learning on learning_sync on hwmode swdev
>
> Adds new IFLA_BRPORT_LEARNING_SYNC attribute for IFLA_PROTINFO on the SELF
> brport.
>
> Signed-off-by: Scott Feldman <sfeldma@gmail.com>
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>

I actually dont think any of these patches needed an ack because they
are off. My ack is just to show support (despite the noun challenges).

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [patch iproute2 4/6] bridge/link: add new offload hwmode swdev
From: Jamal Hadi Salim @ 2014-12-04 13:23 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl,
	hemal
In-Reply-To: <1417683438-10935-5-git-send-email-jiri@resnulli.us>

On 12/04/14 03:57, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>
> To support full-featured switch devices offloading bridge functionality,
> add new hwmode 'swdev'.  Like 'vepa' and 'veb', 'swdev' indicates bridge
> port functionality is being offloaded to hardware.
>

Unhappy with the name swdev ..
Ok, go ahead and beat up on me.

cheers,
jamal

^ permalink raw reply

* Re: [patch iproute2 3/6] bridge/fdb: add flag/indication for FDB entry synced from offload device
From: Jamal Hadi Salim @ 2014-12-04 13:19 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl,
	hemal
In-Reply-To: <1417683438-10935-4-git-send-email-jiri@resnulli.us>

On 12/04/14 03:57, Jiri Pirko wrote:
> From: Scott Feldman <sfeldma@gmail.com>
>

Did i say i like it already?;->

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [patch iproute2 1/6] iproute2: ipa: show switch id
From: Jamal Hadi Salim @ 2014-12-04 13:17 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl,
	hemal
In-Reply-To: <1417683438-10935-2-git-send-email-jiri@resnulli.us>

On 12/04/14 03:57, Jiri Pirko wrote:
> Signed-off-by: Jiri Pirko <jiri@resnulli.us>
> ---
>   include/linux/if_link.h | 1 +
>   ip/ipaddress.c          | 8 ++++++++
>   2 files changed, 9 insertions(+)
>
> diff --git a/include/linux/if_link.h b/include/linux/if_link.h
> index 4732063..a6e2594 100644
> --- a/include/linux/if_link.h
> +++ b/include/linux/if_link.h
> @@ -145,6 +145,7 @@ enum {
>   	IFLA_CARRIER,
>   	IFLA_PHYS_PORT_ID,
>   	IFLA_CARRIER_CHANGES,
> +	IFLA_PHYS_SWITCH_ID,
>   	__IFLA_MAX
>   };
>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

^ permalink raw reply

* Re: [patch iproute2 0/6] iproute2: add changes for switchdev
From: Jamal Hadi Salim @ 2014-12-04 13:16 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, nhorman, andy, tgraf, dborkman, ogerlitz, jesse, pshelar,
	azhou, ben, stephen, jeffrey.t.kirsher, vyasevic, xiyou.wangcong,
	john.r.fastabend, edumazet, sfeldma, f.fainelli, roopa, linville,
	jasowang, ebiederm, nicolas.dichtel, ryazanov.s.a, buytenh,
	aviadr, nbd, alexei.starovoitov, Neil.Jerram, ronye, simon.horman,
	alexander.h.duyck, john.ronciak, mleitner, shrijeet, gospo, bcrl,
	hemal
In-Reply-To: <1417683438-10935-1-git-send-email-jiri@resnulli.us>


General comment:
iproute2 patches need to be addressed to the maintainer
stephen@networkplumber.org
And are we mainstream enough now we can drop the 1000 people on the
To: list? I doubt most of these people are even following the discussion 
(I planted an easter egg for Aviad and didnt see him
reacting) ;->

cheers,
jamal

On 12/04/14 03:57, Jiri Pirko wrote:
> Jiri Pirko (1):
>    iproute2: ipa: show switch id
>
> Scott Feldman (5):
>    bridge/fdb: fix statistics output spacing
>    bridge/fdb: add flag/indication for FDB entry synced from offload
>      device
>    bridge/link: add new offload hwmode swdev
>    link: add missing IFLA_BRPORT_PROXYARP
>    bridge/link: add learning_sync policy flag
>
>   bridge/fdb.c              |  4 +++-
>   bridge/link.c             | 17 +++++++++++++++--
>   include/linux/if_bridge.h |  1 +
>   include/linux/if_link.h   |  3 +++
>   include/linux/neighbour.h |  1 +
>   ip/ipaddress.c            |  8 ++++++++
>   man/man8/bridge.8         | 19 ++++++++++++++-----
>   7 files changed, 45 insertions(+), 8 deletions(-)
>

^ permalink raw reply

* [PATCH net-next 04/10] net/mlx4: Change QP allocation scheme
From: Or Gerlitz @ 2014-12-04 13:13 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Tal Alon, Jack Morgenstein,
	Eugenia Emantayev, Or Gerlitz
In-Reply-To: <1417698835-11050-1-git-send-email-ogerlitz@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.co.il>

When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 64 will have bits 6/7 set.

This problem is not specific to the Ethernet driver; any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.
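
The bit arithmetic behind this can be checked with a short sketch. With a
256-aligned base, offsets 0..63 leave bits 6 and 7 clear, and offset 64 is
the first one that sets bit 6. The macro and function names below are
hypothetical, chosen for this illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 6 and 7 of the QPN must be clear for BlueFlame on Ethernet,
 * per the description above. Hypothetical name. */
#define BF_QPN_FORBIDDEN_MASK 0xc0u	/* (1 << 6) | (1 << 7) */

/* True if this QPN may be used with BlueFlame. */
static bool qpn_bf_eligible(uint32_t qpn)
{
	return (qpn & BF_QPN_FORBIDDEN_MASK) == 0;
}

/* Count how many QPNs in [base, base + count) are BF-eligible. */
static unsigned int count_bf_eligible(uint32_t base, unsigned int count)
{
	unsigned int i, n = 0;

	for (i = 0; i < count; i++)
		if (qpn_bf_eligible(base + i))
			n++;
	return n;
}
```

Under these assumptions, a contiguous 256-aligned range never yields more
than 64 eligible QPNs, no matter how many QPs are reserved in it.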

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
   and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0]  - number of QPs
b. param1[31:24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.
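
As an illustration of the param1 layout described above, the following
sketch packs and unpacks the two fields. All names are hypothetical and
used only here; the assumption is that the 8-bit flags field occupies
param1[31:24], so its top bit lands on param1 bit 31.

```c
#include <stdint.h>

#define QP_COUNT_MASK	0x00ffffffu	/* param1[23:0]: number of QPs */
#define QP_FLAGS_SHIFT	24		/* param1[31:24]: reservation flags */
#define QP_FLAG_ETH_BF	0x80u		/* flags bit 7, i.e. param1 bit 31 */

/* Pack a QP count and an 8-bit flags field into one 32-bit parameter. */
static uint32_t encode_param1(uint32_t count, uint8_t flags)
{
	return ((uint32_t)flags << QP_FLAGS_SHIFT) | (count & QP_COUNT_MASK);
}

/* Extract the QP count from param1[23:0]. */
static uint32_t param1_count(uint32_t param1)
{
	return param1 & QP_COUNT_MASK;
}

/* Extract the flags from param1[31:24]. */
static uint8_t param1_flags(uint32_t param1)
{
	return (uint8_t)(param1 >> QP_FLAGS_SHIFT);
}
```

Requesting a single Ethernet BF-capable QP would then be encoded as
encode_param1(1, QP_FLAG_ETH_BF).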

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.
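
The difference between best-effort and must-have attributes can be
sketched as follows. The flag value and helper names are hypothetical and
serve only to illustrate the masking idea described above; they are not
taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define ATTR_ETH_BF 0x80u	/* hypothetical attribute flag */

/* Best-effort: attributes that are not supported are silently masked
 * out of the request rather than failing the allocation. */
static uint8_t mask_attrs(uint8_t requested, uint8_t supported)
{
	return requested & supported;
}

/* Must-have: a caller that cannot live without an attribute checks
 * support up front, before attempting the allocation. */
static bool attrs_supported(uint8_t required, uint8_t supported)
{
	return (required & ~supported) == 0;
}
```

A caller that needs a BF-capable QP would first test
attrs_supported(ATTR_ETH_BF, supported) and only then allocate; a caller
that merely prefers one can request it and accept whatever mask_attrs()
leaves.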


Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/infiniband/hw/mlx4/main.c                  |    2 +-
 drivers/infiniband/hw/mlx4/qp.c                    |   11 +++--
 drivers/net/ethernet/mellanox/mlx4/alloc.c         |   43 +++++++++++++++++---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |   10 +----
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |    4 +-
 drivers/net/ethernet/mellanox/mlx4/en_tx.c         |   14 +++++-
 drivers/net/ethernet/mellanox/mlx4/fw.c            |   20 +++++++++-
 drivers/net/ethernet/mellanox/mlx4/fw.h            |    1 +
 drivers/net/ethernet/mellanox/mlx4/main.c          |   11 +++++-
 drivers/net/ethernet/mellanox/mlx4/mlx4.h          |    5 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h       |    2 +-
 drivers/net/ethernet/mellanox/mlx4/qp.c            |   24 +++++++++--
 .../net/ethernet/mellanox/mlx4/resource_tracker.c  |    7 +++-
 include/linux/mlx4/device.h                        |   21 +++++++++-
 14 files changed, 137 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 0c33755..57ecc5b 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2227,7 +2227,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
 		ibdev->steer_qpn_count = MLX4_IB_UC_MAX_NUM_QPS;
 		err = mlx4_qp_reserve_range(dev, ibdev->steer_qpn_count,
 					    MLX4_IB_UC_STEER_QPN_ALIGN,
-					    &ibdev->steer_qpn_base);
+					    &ibdev->steer_qpn_base, 0);
 		if (err)
 			goto err_counter;
 
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 9c5150c..506d1bd 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -802,16 +802,19 @@ static int create_qp_common(struct mlx4_ib_dev *dev, struct ib_pd *pd,
 			}
 		}
 	} else {
-		/* Raw packet QPNs must be aligned to 8 bits. If not, the WQE
-		 * BlueFlame setup flow wrongly causes VLAN insertion. */
+		/* Raw packet QPNs may not have bits 6,7 set in their qp_num;
+		 * otherwise, the WQE BlueFlame setup flow wrongly causes
+		 * VLAN insertion. */
 		if (init_attr->qp_type == IB_QPT_RAW_PACKET)
-			err = mlx4_qp_reserve_range(dev->dev, 1, 1 << 8, &qpn);
+			err = mlx4_qp_reserve_range(dev->dev, 1, 1, &qpn,
+						    init_attr->cap.max_send_wr ?
+						    MLX4_RESERVE_ETH_BF_QP : 0);
 		else
 			if (qp->flags & MLX4_IB_QP_NETIF)
 				err = mlx4_ib_steer_qp_alloc(dev, 1, &qpn);
 			else
 				err = mlx4_qp_reserve_range(dev->dev, 1, 1,
-							    &qpn);
+							    &qpn, 0);
 		if (err)
 			goto err_proxy;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx4/alloc.c b/drivers/net/ethernet/mellanox/mlx4/alloc.c
index b0297da..91a8acc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx4/alloc.c
@@ -76,22 +76,53 @@ void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj, int use_rr)
 	mlx4_bitmap_free_range(bitmap, obj, 1, use_rr);
 }
 
-u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align)
+static unsigned long find_aligned_range(unsigned long *bitmap,
+					u32 start, u32 nbits,
+					int len, int align, u32 skip_mask)
+{
+	unsigned long end, i;
+
+again:
+	start = ALIGN(start, align);
+
+	while ((start < nbits) && (test_bit(start, bitmap) ||
+				   (start & skip_mask)))
+		start += align;
+
+	if (start >= nbits)
+		return -1;
+
+	end = start+len;
+	if (end > nbits)
+		return -1;
+
+	for (i = start + 1; i < end; i++) {
+		if (test_bit(i, bitmap) || ((u32)i & skip_mask)) {
+			start = i + 1;
+			goto again;
+		}
+	}
+
+	return start;
+}
+
+u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt,
+			    int align, u32 skip_mask)
 {
 	u32 obj;
 
-	if (likely(cnt == 1 && align == 1))
+	if (likely(cnt == 1 && align == 1 && !skip_mask))
 		return mlx4_bitmap_alloc(bitmap);
 
 	spin_lock(&bitmap->lock);
 
-	obj = bitmap_find_next_zero_area(bitmap->table, bitmap->max,
-				bitmap->last, cnt, align - 1);
+	obj = find_aligned_range(bitmap->table, bitmap->last,
+				 bitmap->max, cnt, align, skip_mask);
 	if (obj >= bitmap->max) {
 		bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
 				& bitmap->mask;
-		obj = bitmap_find_next_zero_area(bitmap->table, bitmap->max,
-						0, cnt, align - 1);
+		obj = find_aligned_range(bitmap->table, 0, bitmap->max,
+					 cnt, align, skip_mask);
 	}
 
 	if (obj < bitmap->max) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index b7c9978..6537631 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -595,7 +595,7 @@ static int mlx4_en_get_qp(struct mlx4_en_priv *priv)
 		return 0;
 	}
 
-	err = mlx4_qp_reserve_range(dev, 1, 1, qpn);
+	err = mlx4_qp_reserve_range(dev, 1, 1, qpn, 0);
 	en_dbg(DRV, priv, "Reserved qp %d\n", *qpn);
 	if (err) {
 		en_err(priv, "Failed to reserve qp for mac registration\n");
@@ -1974,15 +1974,8 @@ int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 {
 	struct mlx4_en_port_profile *prof = priv->prof;
 	int i;
-	int err;
 	int node;
 
-	err = mlx4_qp_reserve_range(priv->mdev->dev, priv->tx_ring_num, 256, &priv->base_tx_qpn);
-	if (err) {
-		en_err(priv, "failed reserving range for TX rings\n");
-		return err;
-	}
-
 	/* Create tx Rings */
 	for (i = 0; i < priv->tx_ring_num; i++) {
 		node = cpu_to_node(i % num_online_cpus());
@@ -1991,7 +1984,6 @@ int mlx4_en_alloc_resources(struct mlx4_en_priv *priv)
 			goto err;
 
 		if (mlx4_en_create_tx_ring(priv, &priv->tx_ring[i],
-					   priv->base_tx_qpn + i,
 					   prof->tx_ring_size, TXBB_SIZE,
 					   node, i))
 			goto err;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 3a9f9bf..4862552 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1132,7 +1132,7 @@ int mlx4_en_create_drop_qp(struct mlx4_en_priv *priv)
 	int err;
 	u32 qpn;
 
-	err = mlx4_qp_reserve_range(priv->mdev->dev, 1, 1, &qpn);
+	err = mlx4_qp_reserve_range(priv->mdev->dev, 1, 1, &qpn, 0);
 	if (err) {
 		en_err(priv, "Failed reserving drop qpn\n");
 		return err;
@@ -1175,7 +1175,7 @@ int mlx4_en_config_rss_steer(struct mlx4_en_priv *priv)
 	en_dbg(DRV, priv, "Configuring rss steering\n");
 	err = mlx4_qp_reserve_range(mdev->dev, priv->rx_ring_num,
 				    priv->rx_ring_num,
-				    &rss_map->base_qpn);
+				    &rss_map->base_qpn, 0);
 	if (err) {
 		en_err(priv, "Failed reserving %d qps\n", priv->rx_ring_num);
 		return err;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index d0cecbd..a308d41 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -46,7 +46,7 @@
 #include "mlx4_en.h"
 
 int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
-			   struct mlx4_en_tx_ring **pring, int qpn, u32 size,
+			   struct mlx4_en_tx_ring **pring, u32 size,
 			   u16 stride, int node, int queue_index)
 {
 	struct mlx4_en_dev *mdev = priv->mdev;
@@ -112,11 +112,17 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	       ring, ring->buf, ring->size, ring->buf_size,
 	       (unsigned long long) ring->wqres.buf.direct.map);
 
-	ring->qpn = qpn;
+	err = mlx4_qp_reserve_range(mdev->dev, 1, 1, &ring->qpn,
+				    MLX4_RESERVE_ETH_BF_QP);
+	if (err) {
+		en_err(priv, "failed reserving qp for TX ring\n");
+		goto err_map;
+	}
+
 	err = mlx4_qp_alloc(mdev->dev, ring->qpn, &ring->qp, GFP_KERNEL);
 	if (err) {
 		en_err(priv, "Failed allocating qp %d\n", ring->qpn);
-		goto err_map;
+		goto err_reserve;
 	}
 	ring->qp.event = mlx4_en_sqp_event;
 
@@ -143,6 +149,8 @@ int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 	*pring = ring;
 	return 0;
 
+err_reserve:
+	mlx4_qp_release_range(mdev->dev, ring->qpn, 1);
 err_map:
 	mlx4_en_unmap_buffer(&ring->wqres.buf);
 err_hwq_res:
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 8c9ea70..745deb7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -266,10 +266,15 @@ int mlx4_QUERY_FUNC_CAP_wrapper(struct mlx4_dev *dev, int slave,
 #define QUERY_FUNC_CAP_MTT_QUOTA_OFFSET		0x64
 #define QUERY_FUNC_CAP_MCG_QUOTA_OFFSET		0x68
 
+#define QUERY_FUNC_CAP_EXTRA_FLAGS_OFFSET	0x6c
+
 #define QUERY_FUNC_CAP_FMR_FLAG			0x80
 #define QUERY_FUNC_CAP_FLAG_RDMA		0x40
 #define QUERY_FUNC_CAP_FLAG_ETH			0x80
 #define QUERY_FUNC_CAP_FLAG_QUOTAS		0x10
+#define QUERY_FUNC_CAP_FLAG_VALID_MAILBOX	0x04
+
+#define QUERY_FUNC_CAP_EXTRA_FLAGS_BF_QP_ALLOC_FLAG	(1UL << 31)
 
 /* when opcode modifier = 1 */
 #define QUERY_FUNC_CAP_PHYS_PORT_OFFSET		0x3
@@ -339,7 +344,7 @@ int mlx4_QUERY_FUNC_CAP_wrapper(struct mlx4_dev *dev, int slave,
 			mlx4_get_active_ports(dev, slave);
 		/* enable rdma and ethernet interfaces, and new quota locations */
 		field = (QUERY_FUNC_CAP_FLAG_ETH | QUERY_FUNC_CAP_FLAG_RDMA |
-			 QUERY_FUNC_CAP_FLAG_QUOTAS);
+			 QUERY_FUNC_CAP_FLAG_QUOTAS | QUERY_FUNC_CAP_FLAG_VALID_MAILBOX);
 		MLX4_PUT(outbox->buf, field, QUERY_FUNC_CAP_FLAGS_OFFSET);
 
 		field = min(
@@ -401,6 +406,8 @@ int mlx4_QUERY_FUNC_CAP_wrapper(struct mlx4_dev *dev, int slave,
 		MLX4_PUT(outbox->buf, size, QUERY_FUNC_CAP_MCG_QUOTA_OFFSET);
 		MLX4_PUT(outbox->buf, size, QUERY_FUNC_CAP_MCG_QUOTA_OFFSET_DEP);
 
+		size = QUERY_FUNC_CAP_EXTRA_FLAGS_BF_QP_ALLOC_FLAG;
+		MLX4_PUT(outbox->buf, size, QUERY_FUNC_CAP_EXTRA_FLAGS_OFFSET);
 	} else
 		err = -EINVAL;
 
@@ -493,6 +500,17 @@ int mlx4_QUERY_FUNC_CAP(struct mlx4_dev *dev, u8 gen_or_port,
 		MLX4_GET(size, outbox, QUERY_FUNC_CAP_RESERVED_EQ_OFFSET);
 		func_cap->reserved_eq = size & 0xFFFFFF;
 
+		func_cap->extra_flags = 0;
+
+		/* Mailbox data from 0x6c and onward should only be treated if
+		 * QUERY_FUNC_CAP_FLAG_VALID_MAILBOX is set in func_cap->flags
+		 */
+		if (func_cap->flags & QUERY_FUNC_CAP_FLAG_VALID_MAILBOX) {
+			MLX4_GET(size, outbox, QUERY_FUNC_CAP_EXTRA_FLAGS_OFFSET);
+			if (size & QUERY_FUNC_CAP_EXTRA_FLAGS_BF_QP_ALLOC_FLAG)
+				func_cap->extra_flags |= MLX4_QUERY_FUNC_FLAGS_BF_RES_QP;
+		}
+
 		goto out;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 475215e..0e910a4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -144,6 +144,7 @@ struct mlx4_func_cap {
 	u8	port_flags;
 	u8	flags1;
 	u64	phys_port_id;
+	u32	extra_flags;
 };
 
 struct mlx4_func {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 3044f9e..6a9a941 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -466,8 +466,13 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct mlx4_dev_cap *dev_cap)
 	    mlx4_is_master(dev))
 		dev->caps.function_caps |= MLX4_FUNC_CAP_64B_EQE_CQE;
 
-	if (!mlx4_is_slave(dev))
+	if (!mlx4_is_slave(dev)) {
 		mlx4_enable_cqe_eqe_stride(dev);
+		dev->caps.alloc_res_qp_mask =
+			(dev->caps.bf_reg_size ? MLX4_RESERVE_ETH_BF_QP : 0);
+	} else {
+		dev->caps.alloc_res_qp_mask = 0;
+	}
 
 	return 0;
 }
@@ -817,6 +822,10 @@ static int mlx4_slave_cap(struct mlx4_dev *dev)
 
 	slave_adjust_steering_mode(dev, &dev_cap, &hca_param);
 
+	if (func_cap.extra_flags & MLX4_QUERY_FUNC_FLAGS_BF_RES_QP &&
+	    dev->caps.bf_reg_size)
+		dev->caps.alloc_res_qp_mask |= MLX4_RESERVE_ETH_BF_QP;
+
 	return 0;
 
 err_mem:
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4.h b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
index b67ef48..6834da6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4.h
@@ -884,7 +884,8 @@ extern struct workqueue_struct *mlx4_wq;
 
 u32 mlx4_bitmap_alloc(struct mlx4_bitmap *bitmap);
 void mlx4_bitmap_free(struct mlx4_bitmap *bitmap, u32 obj, int use_rr);
-u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt, int align);
+u32 mlx4_bitmap_alloc_range(struct mlx4_bitmap *bitmap, int cnt,
+			    int align, u32 skip_mask);
 void mlx4_bitmap_free_range(struct mlx4_bitmap *bitmap, u32 obj, int cnt,
 			    int use_rr);
 u32 mlx4_bitmap_avail(struct mlx4_bitmap *bitmap);
@@ -970,7 +971,7 @@ int mlx4_DMA_wrapper(struct mlx4_dev *dev, int slave,
 		     struct mlx4_cmd_mailbox *outbox,
 		     struct mlx4_cmd_info *cmd);
 int __mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align,
-			    int *base);
+			    int *base, u8 flags);
 void __mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt);
 int __mlx4_register_mac(struct mlx4_dev *dev, u8 port, u64 mac);
 void __mlx4_unregister_mac(struct mlx4_dev *dev, u8 port, u64 mac);
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index aaa7efb..576dd07 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -778,7 +778,7 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev);
 
 int mlx4_en_create_tx_ring(struct mlx4_en_priv *priv,
 			   struct mlx4_en_tx_ring **pring,
-			   int qpn, u32 size, u16 stride,
+			   u32 size, u16 stride,
 			   int node, int queue_index);
 void mlx4_en_destroy_tx_ring(struct mlx4_en_priv *priv,
 			     struct mlx4_en_tx_ring **pring);
diff --git a/drivers/net/ethernet/mellanox/mlx4/qp.c b/drivers/net/ethernet/mellanox/mlx4/qp.c
index 2301365..40e82ed 100644
--- a/drivers/net/ethernet/mellanox/mlx4/qp.c
+++ b/drivers/net/ethernet/mellanox/mlx4/qp.c
@@ -42,6 +42,10 @@
 #include "mlx4.h"
 #include "icm.h"
 
+/* QP to support BF should have bits 6,7 cleared */
+#define MLX4_BF_QP_SKIP_MASK	0xc0
+#define MLX4_MAX_BF_QP_RANGE	0x40
+
 void mlx4_qp_event(struct mlx4_dev *dev, u32 qpn, int event_type)
 {
 	struct mlx4_qp_table *qp_table = &mlx4_priv(dev)->qp_table;
@@ -207,26 +211,36 @@ int mlx4_qp_modify(struct mlx4_dev *dev, struct mlx4_mtt *mtt,
 EXPORT_SYMBOL_GPL(mlx4_qp_modify);
 
 int __mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align,
-				   int *base)
+			    int *base, u8 flags)
 {
+	int bf_qp = !!(flags & (u8)MLX4_RESERVE_ETH_BF_QP);
+
 	struct mlx4_priv *priv = mlx4_priv(dev);
 	struct mlx4_qp_table *qp_table = &priv->qp_table;
 
-	*base = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align);
+	if (cnt > MLX4_MAX_BF_QP_RANGE && bf_qp)
+		return -ENOMEM;
+
+	*base = mlx4_bitmap_alloc_range(&qp_table->bitmap, cnt, align,
+					bf_qp ? MLX4_BF_QP_SKIP_MASK : 0);
 	if (*base == -1)
 		return -ENOMEM;
 
 	return 0;
 }
 
-int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base)
+int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align,
+			  int *base, u8 flags)
 {
 	u64 in_param = 0;
 	u64 out_param;
 	int err;
 
+	/* Turn off all unsupported QP allocation flags */
+	flags &= dev->caps.alloc_res_qp_mask;
+
 	if (mlx4_is_mfunc(dev)) {
-		set_param_l(&in_param, cnt);
+		set_param_l(&in_param, (((u32)flags) << 24) | (u32)cnt);
 		set_param_h(&in_param, align);
 		err = mlx4_cmd_imm(dev, in_param, &out_param,
 				   RES_QP, RES_OP_RESERVE,
@@ -238,7 +252,7 @@ int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base)
 		*base = get_param_l(&out_param);
 		return 0;
 	}
-	return __mlx4_qp_reserve_range(dev, cnt, align, base);
+	return __mlx4_qp_reserve_range(dev, cnt, align, base, flags);
 }
 EXPORT_SYMBOL_GPL(mlx4_qp_reserve_range);
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
index 16f617b..4efbd1e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
+++ b/drivers/net/ethernet/mellanox/mlx4/resource_tracker.c
@@ -1543,16 +1543,21 @@ static int qp_alloc_res(struct mlx4_dev *dev, int slave, int op, int cmd,
 	int align;
 	int base;
 	int qpn;
+	u8 flags;
 
 	switch (op) {
 	case RES_OP_RESERVE:
 		count = get_param_l(&in_param) & 0xffffff;
+		/* Turn off all unsupported QP allocation flags that the
+		 * slave tries to set.
+		 */
+		flags = (get_param_l(&in_param) >> 24) & dev->caps.alloc_res_qp_mask;
 		align = get_param_h(&in_param);
 		err = mlx4_grant_resource(dev, slave, RES_QP, count, 0);
 		if (err)
 			return err;
 
-		err = __mlx4_qp_reserve_range(dev, count, align, &base);
+		err = __mlx4_qp_reserve_range(dev, count, align, &base, flags);
 		if (err) {
 			mlx4_release_resource(dev, slave, RES_QP, count, 0);
 			return err;
diff --git a/include/linux/mlx4/device.h b/include/linux/mlx4/device.h
index 3951b53..272aa25 100644
--- a/include/linux/mlx4/device.h
+++ b/include/linux/mlx4/device.h
@@ -195,6 +195,22 @@ enum {
 };
 
 enum {
+	MLX4_QUERY_FUNC_FLAGS_BF_RES_QP		= 1LL << 0
+};
+
+/* bit enums for an 8-bit flags field indicating special use
+ * QPs which require special handling in qp_reserve_range.
+ * Currently, this only includes QPs used by the ETH interface,
+ * where we expect to use blueflame.  These QPs must not have
+ * bits 6 and 7 set in their qp number.
+ *
+ * This enum may use only bits 0..7.
+ */
+enum {
+	MLX4_RESERVE_ETH_BF_QP	= 1 << 7,
+};
+
+enum {
 	MLX4_DEV_CAP_64B_EQE_ENABLED	= 1LL << 0,
 	MLX4_DEV_CAP_64B_CQE_ENABLED	= 1LL << 1,
 	MLX4_DEV_CAP_CQE_STRIDE_ENABLED	= 1LL << 2,
@@ -501,6 +517,7 @@ struct mlx4_caps {
 	u64			phys_port_id[MLX4_MAX_PORTS + 1];
 	int			tunnel_offload_mode;
 	u8			rx_checksum_flags_port[MLX4_MAX_PORTS + 1];
+	u8			alloc_res_qp_mask;
 };
 
 struct mlx4_buf_list {
@@ -950,8 +967,8 @@ int mlx4_cq_alloc(struct mlx4_dev *dev, int nent, struct mlx4_mtt *mtt,
 		  struct mlx4_uar *uar, u64 db_rec, struct mlx4_cq *cq,
 		  unsigned vector, int collapsed, int timestamp_en);
 void mlx4_cq_free(struct mlx4_dev *dev, struct mlx4_cq *cq);
-
-int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align, int *base);
+int mlx4_qp_reserve_range(struct mlx4_dev *dev, int cnt, int align,
+			  int *base, u8 flags);
 void mlx4_qp_release_range(struct mlx4_dev *dev, int base_qpn, int cnt);
 
 int mlx4_qp_alloc(struct mlx4_dev *dev, int qpn, struct mlx4_qp *qp,
-- 
1.7.1
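
For readers following the allocation path in this patch, here is a minimal
userspace sketch of how the pieces appear to fit together: the flags byte is
packed into the top 8 bits of the 32-bit command parameter, the receiver
masks off unsupported flags, and a BF request becomes a skip mask for the
range search. Everything below is an illustration and not the driver code.
The names find_range, pack_reserve_param, reserve_from_param, NBITS and
NO_RANGE are hypothetical, the byte-per-entry "bitmap" stands in for the
kernel bitmap, locking is omitted, and, unlike the patch, the sketch masks
the count to 24 bits on the sender side. align must be nonzero.

```c
#include <stdint.h>

#define NBITS         512u        /* size of the toy QP number space (assumption) */
#define BF_SKIP_MASK  0xc0u       /* mirrors MLX4_BF_QP_SKIP_MASK */
#define RESERVE_BF_QP 0x80u       /* mirrors MLX4_RESERVE_ETH_BF_QP (1 << 7) */
#define NO_RANGE      UINT32_MAX  /* hypothetical "nothing found" marker */

/* One byte per entry instead of a real bitmap, to keep the sketch short. */
static unsigned char used[NBITS];

/*
 * Return the first index >= start that is a multiple of align and begins
 * a run of len free entries, none of which has a bit of skip_mask set.
 * Return NO_RANGE if there is no such run.
 */
static uint32_t find_range(uint32_t start, uint32_t len,
			   uint32_t align, uint32_t skip_mask)
{
	uint32_t i;

again:
	/* Round up to the next multiple of align. */
	start = (start + align - 1) / align * align;

	while (start < NBITS && (used[start] || (start & skip_mask)))
		start += align;

	if (start >= NBITS || start + len > NBITS)
		return NO_RANGE;

	for (i = start + 1; i < start + len; i++) {
		if (used[i] || (i & skip_mask)) {
			/* Restart the search just past the offending entry. */
			start = i + 1;
			goto again;
		}
	}

	return start;
}

/* Sender side: flags in bits 24..31, count in bits 0..23. */
static uint32_t pack_reserve_param(uint8_t flags, uint32_t cnt)
{
	return ((uint32_t)flags << 24) | (cnt & 0xffffffu);
}

/*
 * Receiver side: unpack the parameter, drop flags that are not in
 * allowed_mask, then search and mark the range as used.
 */
static uint32_t reserve_from_param(uint32_t param, uint8_t allowed_mask,
				   uint32_t align)
{
	uint32_t cnt = param & 0xffffffu;
	uint8_t flags = (uint8_t)((param >> 24) & allowed_mask);
	uint32_t skip = (flags & RESERVE_BF_QP) ? BF_SKIP_MASK : 0;
	uint32_t base, i;

	base = find_range(0, cnt, align, skip);
	if (base == NO_RANGE)
		return NO_RANGE;

	for (i = 0; i < cnt; i++)
		used[base + i] = 1;

	return base;
}
```

With the 0xc0 mask, usable numbers come in windows of 64 (0..63, 256..319,
and so on), so a BF request for more than 64 contiguous entries can never
succeed. That seems to be the reason for the MLX4_MAX_BF_QP_RANGE check in
__mlx4_qp_reserve_range.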


* [PATCH net-next 08/10] net/mlx4_core: Add explicit error message when rule doesn't meet configuration
From: Or Gerlitz @ 2014-12-04 13:13 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Matan Barak, Amir Vadai, Tal Alon, Jack Morgenstein,
	Or Gerlitz
In-Reply-To: <1417698835-11050-1-git-send-email-ogerlitz@mellanox.com>

From: Matan Barak <matanb@mellanox.com>

When a flow steering rule is invalid with respect to the current steering
configuration, print an explicit error message to the system log.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/mcg.c |   21 ++++++++++++++++++---
 1 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mcg.c b/drivers/net/ethernet/mellanox/mlx4/mcg.c
index 8728431..a3867e7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mcg.c
+++ b/drivers/net/ethernet/mellanox/mlx4/mcg.c
@@ -999,12 +999,27 @@ int mlx4_flow_attach(struct mlx4_dev *dev,
 	}
 
 	ret = mlx4_QP_FLOW_STEERING_ATTACH(dev, mailbox, size >> 2, reg_id);
-	if (ret == -ENOMEM)
+	if (ret == -ENOMEM) {
 		mlx4_err_rule(dev,
 			      "mcg table is full. Fail to register network rule\n",
 			      rule);
-	else if (ret)
-		mlx4_err_rule(dev, "Fail to register network rule\n", rule);
+	} else if (ret) {
+		if (ret == -ENXIO) {
+			if (dev->caps.steering_mode != MLX4_STEERING_MODE_DEVICE_MANAGED)
+				mlx4_err_rule(dev,
+					      "DMFS is not enabled, "
+					      "failed to register network rule.\n",
+					      rule);
+			else
+				mlx4_err_rule(dev,
+					      "Rule exceeds the dmfs_high_rate_mode limitations, "
+					      "failed to register network rule.\n",
+					      rule);
+
+		} else {
+			mlx4_err_rule(dev, "Fail to register network rule.\n", rule);
+		}
+	}
 
 	mlx4_free_cmd_mailbox(dev, mailbox);
 
-- 
1.7.1
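
The message selection added to mlx4_flow_attach() above can be read as a
small decision table. The sketch below restates it as a standalone function,
assuming a simplified steering-mode enum. The names attach_error_message and
enum steering_mode are hypothetical and exist only for illustration; the
message strings are the ones from the patch, with the split literals joined.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for dev->caps.steering_mode (assumption). */
enum steering_mode {
	STEERING_OTHER,
	STEERING_DEVICE_MANAGED,
};

/* Return the log message for an attach result, or NULL on success. */
static const char *attach_error_message(int ret, enum steering_mode mode)
{
	if (!ret)
		return NULL;

	if (ret == -ENOMEM)
		return "mcg table is full. Fail to register network rule\n";

	if (ret == -ENXIO) {
		if (mode != STEERING_DEVICE_MANAGED)
			return "DMFS is not enabled, failed to register network rule.\n";
		return "Rule exceeds the dmfs_high_rate_mode limitations, failed to register network rule.\n";
	}

	return "Fail to register network rule.\n";
}
```

Any error other than -ENOMEM and -ENXIO still falls through to the generic
message, so the patch only changes what is printed for -ENXIO.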
