Netdev List
* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 17:16 UTC
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D653C9C.8010205@trash.net>

On Wednesday, 23 February 2011 at 17:58 +0100, Patrick McHardy wrote:
> On 23.02.2011 17:48, Eric Dumazet wrote:
> > On Wednesday, 23 February 2011 at 17:24 +0100, Patrick McHardy wrote:
> > 
> >> This needs to be a per-skb property, otherwise you could have the
> >> situation:
> >>
> >> - enqueue skb, double_buffering=0, increment buffer 0
> >> - enable double buffering
> >> - swap buffers
> >> - dequeue same skb, decrement buffer 1
> >>
> >> after which the qlen values of buffer 1 will be incorrect.
> >>
> > 
> > Normally it's OK, because we bzero() the zone, and the "decrement" is
> > 0-bounded.
> 
> Yeah, but we might decrement buckets of different flows which
> are non-zero. Probably not too bad, but still not correct.
> 
> > I had this idea (of storing two bits per skb), but:
> > 
> > - It means that swap_buffer() should not touch (bzero) the 'old' bins
> 
> Yes, it means we have to properly decrement the old buffer
> until all bins have reached zero.
> 
> > - Since the hash perturbation is changed, we have to store the two hash
> > values per skb (instead of one u32 / classid).
> 
> Indeed.

BTW, I got the idea of storing the double_buffer flag per skb while
reading the SFB paper, because the paper says double buffering is really
needed only for inelastic flows, not for all packets.

Quote from the paper:

As one set of bins is being used for queue management, a second set of
bins using the next set of hash functions can be warmed up. In this
case, any time a flow is classified as non-responsive, it is hashed
using the second set of hash functions and the marking probabilities of
the corresponding bins in the warmup set are updated.

So storing two hash values per skb is the way to go, with 0 as a special
value meaning the skb was not inserted into that set of virtual queues.
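A minimal sketch of the two-hash idea, assuming a simplified model: the struct and function names (sfb_cb_sketch, sfb_enqueue_sketch, and so on), the 8 levels of 16 buckets, and the 4-bits-per-level bucket selection are all hypothetical and not taken from the SFB patch. Each packet records the hash it was counted under in each bin set, with 0 meaning "not inserted", so dequeue decrements exactly what enqueue incremented even if the sets were swapped in between. The old set can then drain on its own instead of being cleared with bzero().

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry, for illustration only. */
#define SFB_LEVELS	8
#define SFB_BUCKETS	16	/* 4 hash bits per level */
#define SFB_SLOTS	2	/* active set + warm-up set */

/* Per-skb state: one hash per bin set; 0 means "not inserted there". */
struct sfb_cb_sketch {
	uint32_t hashes[SFB_SLOTS];
};

struct sfb_bins_sketch {
	uint16_t qlen[SFB_SLOTS][SFB_LEVELS][SFB_BUCKETS];
};

static unsigned int bucket_of(uint32_t hash, int level)
{
	return (hash >> (level * 4)) & (SFB_BUCKETS - 1);
}

/* Count the packet in the active set and, if double buffering is on, in
 * the warm-up set; remember each hash so dequeue can undo exactly this. */
static void sfb_enqueue_sketch(struct sfb_bins_sketch *b, struct sfb_cb_sketch *cb,
			       uint32_t hash_active, uint32_t hash_warmup,
			       int active, int double_buffering)
{
	memset(cb, 0, sizeof(*cb));
	cb->hashes[active] = hash_active ? hash_active : 1;	/* 0 is reserved */
	if (double_buffering)
		cb->hashes[!active] = hash_warmup ? hash_warmup : 1;

	for (int s = 0; s < SFB_SLOTS; s++) {
		if (!cb->hashes[s])
			continue;
		for (int l = 0; l < SFB_LEVELS; l++)
			b->qlen[s][l][bucket_of(cb->hashes[s], l)]++;
	}
}

/* Undo the enqueue using the stored hashes, even if the active set has been
 * swapped since; the old set therefore drains without needing bzero(). */
static void sfb_dequeue_sketch(struct sfb_bins_sketch *b, const struct sfb_cb_sketch *cb)
{
	for (int s = 0; s < SFB_SLOTS; s++) {
		if (!cb->hashes[s])
			continue;
		for (int l = 0; l < SFB_LEVELS; l++) {
			uint16_t *q = &b->qlen[s][l][bucket_of(cb->hashes[s], l)];

			if (*q)
				(*q)--;
		}
	}
}
```

In this model, a packet enqueued before double buffering was enabled has hashes[1] == 0, so dequeuing it after a swap leaves the other set untouched. Reserving 0 as the sentinel means a genuine hash of 0 is remapped (here to 1); that trade-off is an assumption of the sketch.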





* Kernel panic nf_nat_setup_info+0x5b3/0x6e0
From: "Oleg A. Arkhangelsky" @ 2011-02-23 17:07 UTC
  To: netfilter-devel; +Cc: netdev

Hello,

Got this panic yesterday:
http://www.progtech.ru/~oleg/crash.txt

The offending instruction is:
cmpb 54(%edx), %cl # <variable>.tuple.dst.protonum,

and here is the assembler code of net/ipv4/netfilter/nf_nat_core.c:
http://www.progtech.ru/~oleg/nf_nat_core.s

A quick investigation led me to the conclusion that the problem is in
the return statement of the same_src() function:

        return (t->dst.protonum == tuple->dst.protonum &&
                t->src.u3.ip == tuple->src.u3.ip &&
                t->src.u.all == tuple->src.u.all);

So either the t or the tuple pointer is bad, but I don't understand how
this can happen.

This looks similar to the situation described here:
https://bugzilla.kernel.org/show_bug.cgi?id=21512

Any thoughts on this?
Thank you!

-- 
wbr, Oleg.


* Re: [Lxc-users] Bad checksums and lost packets with macvlan on dummy
From: Andrian Nord @ 2011-02-23 17:13 UTC
  To: Daniel Lezcano; +Cc: lxc-users, Patrick McHardy, Linux Netdev List
In-Reply-To: <4D628DC3.9000400@free.fr>


On Mon, Feb 21, 2011 at 05:07:31PM +0100, Daniel Lezcano wrote:
> I Cc'ed the netdev mailing list and Patrick in case my analysis is wrong 
> or incomplete.

I can confirm that this happens only when the macvlans are on top of a
dummy net device. With a physical interface under the macvlan, there are
no lost packets and no broken checksums.



* [PATCH] net_sched: long word align struct qdisc_skb_cb data
From: Eric Dumazet @ 2011-02-23 17:05 UTC
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D6534C3.1080305@trash.net>

On Wednesday, 23 February 2011 at 17:24 +0100, Patrick McHardy wrote:
> On 23.02.2011 16:14, Eric Dumazet wrote:
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 16626a0..f40d32e 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -218,6 +218,7 @@ struct tcf_proto {
> >  
> >  struct qdisc_skb_cb {
> >  	unsigned int		pkt_len;
> > +	unsigned int		sfb_classid;
> >  	char			data[];
> >  };
> 
> This could be moved into a SFB specific cb, similar to what netem
> does.

Hmm... well... I want to be sure no other qdisc will overwrite my values.

netem seems buggy then.

Probably the following patch is needed?

Thanks

[PATCH] net_sched: long word align struct qdisc_skb_cb data

netem_skb_cb() does:

return (struct netem_skb_cb *)qdisc_skb_cb(skb)->data;

Unfortunately, the data member of struct qdisc_skb_cb is not long-word
aligned, so accesses to the psched_time_t time_to_send field are unaligned.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sch_generic.h |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 16626a0..dbdf2b5 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -218,7 +218,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
-	char			data[];
+	long			data[];
 };
 
 static inline int qdisc_qlen(struct Qdisc *q)
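The alignment problem can be checked in user space with mirror copies of the two layouts; the *_before/*_after struct names are illustrative, not kernel names. On a typical LP64 build, where unsigned int is 4 bytes and long is 8, data moves from offset 4 to offset 8, so a u64 placed at the start of the private area (as time_to_send is assumed to be in netem's cb) becomes naturally aligned.

```c
#include <stddef.h>

/* User-space mirrors of the layout before and after the patch. */
struct qdisc_skb_cb_before {
	unsigned int pkt_len;
	char data[];
};

struct qdisc_skb_cb_after {
	unsigned int pkt_len;
	long data[];
};

/* Offset of the private area that netem_skb_cb() casts to its own struct. */
static size_t data_offset_before(void)
{
	return offsetof(struct qdisc_skb_cb_before, data);
}

static size_t data_offset_after(void)
{
	return offsetof(struct qdisc_skb_cb_after, data);
}
```

Declaring the flexible array as long rather than char is what moves the offset: the compiler pads after pkt_len to the alignment of long.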




* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Patrick McHardy @ 2011-02-23 16:58 UTC
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <1298479711.3301.373.camel@edumazet-laptop>

On 23.02.2011 17:48, Eric Dumazet wrote:
> On Wednesday, 23 February 2011 at 17:24 +0100, Patrick McHardy wrote:
> 
>> This needs to be a per-skb property, otherwise you could have the
>> situation:
>>
>> - enqueue skb, double_buffering=0, increment buffer 0
>> - enable double buffering
>> - swap buffers
>> - dequeue same skb, decrement buffer 1
>>
>> after which the qlen values of buffer 1 will be incorrect.
>>
> 
> Normally it's OK, because we bzero() the zone, and the "decrement" is
> 0-bounded.

Yeah, but we might decrement buckets of different flows which
are non-zero. Probably not too bad, but still not correct.
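The sequence described above can be reproduced with a deliberately simplified, hypothetical model: one level of 16 buckets per bin set and an XOR-based per-set perturbation, neither taken from the patch. A packet of flow 3 is counted in set 0, the sets are swapped, and the dequeue path then decrements set 1 using the current perturbation, which lands on the bucket holding a warm-up count for flow 19.

```c
#include <stdint.h>

#define NBUCKETS 16

/* Hypothetical single-level model of SFB's two bin sets. */
struct bins_model {
	uint32_t qlen[2][NBUCKETS];
	int active;             /* set used for queue management */
	uint32_t perturb[2];    /* per-set hash perturbation */
};

static unsigned int bucket(const struct bins_model *m, int set, uint32_t flow)
{
	return (flow ^ m->perturb[set]) % NBUCKETS;
}

/* Count the packet in the currently active set only. */
static void enqueue(struct bins_model *m, uint32_t flow)
{
	m->qlen[m->active][bucket(m, m->active, flow)]++;
}

/* Flawed dequeue: uses the set that is active *now*, with its current
 * perturbation, instead of the set the packet was counted in. */
static void dequeue_current_slot(struct bins_model *m, uint32_t flow)
{
	uint32_t *q = &m->qlen[m->active][bucket(m, m->active, flow)];

	if (*q)                 /* 0-bounded decrement */
		(*q)--;
}

static void swap_buffers(struct bins_model *m)
{
	m->active ^= 1;
}

/* Returns flow 19's count in set 1 after flow 3's packet leaves. */
static uint32_t run_scenario(struct bins_model *m)
{
	m->active = 1;
	enqueue(m, 19);               /* stands in for a warm-up count in set 1 */
	m->active = 0;
	enqueue(m, 3);                /* flow 3 counted in set 0 */
	swap_buffers(m);              /* set 1 becomes active */
	dequeue_current_slot(m, 3);   /* hits set 1, bucket (3 ^ 5) % 16 */
	return m->qlen[1][bucket(m, 1, 19)];
}
```

With perturb = {0, 5}, chosen so that flows 3 and 19 collide in set 1, run_scenario() returns 0: flow 19 lost its count, while flow 3's bucket in set 0 stays at 1 and is never drained.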

> I had this idea (of storing two bits per skb), but:
> 
> - It means that swap_buffer() should not touch (bzero) the 'old' bins

Yes, it means we have to properly decrement the old buffer
until all bins have reached zero.

> - Since the hash perturbation is changed, we have to store the two hash
> values per skb (instead of one u32 / classid).

Indeed.


* Re: [PATCH V11 2/4] ptp: Added a clock that uses the eTSEC found on the MPC85xx.
From: Grant Likely @ 2011-02-23 16:50 UTC
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, Alan Cox, Arnd Bergmann,
	Christoph Lameter, David Miller, John Stultz, Krzysztof Halasa,
	Peter Zijlstra, Rodolfo Giometti, Thomas Gleixner,
	Benjamin Herrenschmidt, Mike Frysinger, Paul Mackerras,
	Russell King
In-Reply-To: <be1f5e801e8cd0145dd23aadae7c2055bb3c1d47.1298447722.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>

On Wed, Feb 23, 2011 at 11:38:17AM +0100, Richard Cochran wrote:
> The eTSEC includes a PTP clock with quite a few features. This patch adds
> support for the basic clock adjustment functions, plus two external time
> stamps, one alarm, and the PPS callback.
> 
> Signed-off-by: Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>
> Acked-by: John Stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
> ---
>  .../devicetree/bindings/net/fsl-tsec-phy.txt       |   57 +++
>  arch/powerpc/boot/dts/mpc8313erdb.dts              |   14 +
>  arch/powerpc/boot/dts/mpc8572ds.dts                |   14 +
>  arch/powerpc/boot/dts/p2020ds.dts                  |   14 +
>  arch/powerpc/boot/dts/p2020rdb.dts                 |   14 +
>  drivers/net/Makefile                               |    1 +
>  drivers/net/gianfar_ptp.c                          |  448 ++++++++++++++++++++
>  drivers/net/gianfar_ptp_reg.h                      |  113 +++++
>  drivers/ptp/Kconfig                                |   13 +
>  9 files changed, 688 insertions(+), 0 deletions(-)
>  create mode 100644 drivers/net/gianfar_ptp.c
>  create mode 100644 drivers/net/gianfar_ptp_reg.h
> 
> diff --git a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> index edb7ae1..f6edbb8 100644
> --- a/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> +++ b/Documentation/devicetree/bindings/net/fsl-tsec-phy.txt
> @@ -74,3 +74,60 @@ Example:
>  		interrupt-parent = <&mpic>;
>  		phy-handle = <&phy0>
>  	};
> +
> +* Gianfar PTP clock nodes
> +
> +General Properties:
> +
> +  - compatible   Should be "fsl,etsec-ptp"

This should specify an *exact* part, for example "fsl,mpc8313-etsec-ptp"
instead of trying to define a generic catchall.  The reason is that the
same marketing name can end up getting applied to a wide range of parts.

Instead, choose one specific device to stand in as the 'common'
implementation and get all parts with the same core to claim
compatibility with it.  For example, a P2020 might have:

	compatible = "fsl,p2020-etsec-ptp", "fsl,mpc8313-etsec-ptp";
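To illustrate the suggested convention, here is a simplified sketch of compatible-list matching: the device lists its strings from most to least specific, and the first one the driver knows wins. This is not the kernel's of_match_node() implementation, and the compatible strings used with it are hypothetical examples.

```c
#include <stddef.h>
#include <string.h>

/* Returns the driver table entry matching the most specific compatible
 * string of the device, or NULL if none match. */
static const char *match_compatible(const char *const *dev_compat, size_t n_dev,
				    const char *const *drv_table, size_t n_drv)
{
	for (size_t i = 0; i < n_dev; i++)          /* most specific first */
		for (size_t j = 0; j < n_drv; j++)
			if (strcmp(dev_compat[i], drv_table[j]) == 0)
				return drv_table[j];
	return NULL;
}
```

A driver that lists only the common string binds to every part claiming compatibility with it; a part-specific entry can be added later if a quirk turns up, without changing existing device trees.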

> +  - reg          Offset and length of the register set for the device
> +  - interrupts   There should be at least two interrupts. Some devices
> +                 have as many as four PTP related interrupts.
> +
> +Clock Properties:
> +
> +  - tclk-period  Timer reference clock period in nanoseconds.
> +  - tmr-prsc     Prescaler, divides the output clock.
> +  - tmr-add      Frequency compensation value.
> +  - cksel        0= external clock, 1= eTSEC system clock, 3= RTC clock input.
> +                 Currently the driver only supports choice "1".

I'd be hesitant about defining something that isn't actually
implemented yet.  You may find the binding to be insufficient at a
later date.

> +  - tmr-fiper1   Fixed interval period pulse generator.
> +  - tmr-fiper2   Fixed interval period pulse generator.
> +  - max-adj      Maximum frequency adjustment in parts per billion.

These are all custom properties (not part of any shared binding) so
they should probably be prefixed with 'fsl,'.

> +
> +  These properties set the operational parameters for the PTP
> +  clock. You must choose these carefully for the clock to work right.
> +  Here is how to figure good values:
> +
> +  TimerOsc     = system clock               MHz
> +  tclk_period  = desired clock period       nanoseconds
> +  NominalFreq  = 1000 / tclk_period         MHz
> +  FreqDivRatio = TimerOsc / NominalFreq     (must be greater than 1.0)
> +  tmr_add      = ceil(2^32 / FreqDivRatio)
> +  OutputClock  = NominalFreq / tmr_prsc     MHz
> +  PulseWidth   = 1 / OutputClock            microseconds
> +  FiperFreq1   = desired frequency in Hz
> +  FiperDiv1    = 1000000 * OutputClock / FiperFreq1
> +  tmr_fiper1   = tmr_prsc * tclk_period * FiperDiv1 - tclk_period
> +  max_adj      = 1000000000 * (FreqDivRatio - 1.0) - 1
> +
> +  The calculation for tmr_fiper2 is the same as for tmr_fiper1. The
> +  driver expects that tmr_fiper1 will be correctly set to produce a 1
> +  Pulse Per Second (PPS) signal, since this will be offered to the PPS
> +  subsystem to synchronize the Linux clock.

Good documentation, thanks.  Question though: how many of these values
will the end user (or board builder) be likely to want to change?  It
is risky to encode the calculation results into the device tree when
they aren't the actual parameters that will be manipulated, or at
least it is very user-unfriendly.
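As a sketch of the calculation described in the binding text, the function below derives tmr_add, max_adj and a FIPER value from the inputs a board designer would pick. It uses integer arithmetic instead of the fractional MHz values in the text, and the struct and function names are hypothetical. TimerOsc values of 250 MHz and 300 MHz reproduce the tmr-add, max-adj and tmr-fiper values in the p2020ds.dts and mpc8572ds.dts nodes of this patch; those frequencies are back-solved from tmr-add, not taken from board documentation.

```c
#include <stdint.h>

/* Inputs a board designer would actually choose (hypothetical names). */
struct etsec_ptp_inputs {
	uint64_t timer_osc_hz;   /* TimerOsc, the eTSEC system clock */
	uint32_t tclk_period_ns; /* desired clock period */
	uint32_t tmr_prsc;       /* prescaler for the output clock */
	uint32_t fiper_hz;       /* desired FIPER pulse frequency */
};

struct etsec_ptp_params {
	uint32_t tmr_add;
	uint32_t tmr_fiper;
	uint32_t max_adj;        /* parts per billion */
};

/* Integer version of the binding's formulas. Returns -1 on invalid input,
 * including FreqDivRatio not greater than 1.0. */
static int etsec_ptp_compute(const struct etsec_ptp_inputs *in,
			     struct etsec_ptp_params *out)
{
	uint64_t nominal_hz, output_hz, fiper_div;

	if (!in->tclk_period_ns || !in->tmr_prsc || !in->fiper_hz)
		return -1;
	nominal_hz = 1000000000ULL / in->tclk_period_ns;   /* NominalFreq */
	if (in->timer_osc_hz <= nominal_hz)
		return -1;

	/* tmr_add = ceil(2^32 / FreqDivRatio) = ceil(2^32 * Nominal / TimerOsc) */
	out->tmr_add = (uint32_t)(((1ULL << 32) * nominal_hz + in->timer_osc_hz - 1)
				  / in->timer_osc_hz);

	/* max_adj = 1e9 * (FreqDivRatio - 1) - 1 */
	out->max_adj = (uint32_t)(1000000000ULL * (in->timer_osc_hz - nominal_hz)
				  / nominal_hz - 1);

	/* OutputClock = Nominal / tmr_prsc; FiperDiv = OutputClock / FiperFreq */
	output_hz = nominal_hz / in->tmr_prsc;
	fiper_div = output_hz / in->fiper_hz;
	out->tmr_fiper = (uint32_t)((uint64_t)in->tmr_prsc * in->tclk_period_ns
				    * fiper_div - in->tclk_period_ns);
	return 0;
}
```

Computing these in the driver from a handful of board-level inputs is one possible way to address the concern above, though it is only a suggestion.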

> +
> +Example:
> +
> +	ptp_clock@24E00 {
> +		compatible = "fsl,etsec-ptp";
> +		reg = <0x24E00 0xB0>;
> +		interrupts = <12 0x8 13 0x8>;
> +		interrupt-parent = < &ipic >;
> +		tclk-period = <10>;
> +		tmr-prsc    = <100>;
> +		tmr-add     = <0x999999A4>;
> +		cksel       = <0x1>;
> +		tmr-fiper1  = <0x3B9AC9F6>;
> +		tmr-fiper2  = <0x00018696>;
> +		max-adj     = <659999998>;
> +	};
> diff --git a/arch/powerpc/boot/dts/mpc8313erdb.dts b/arch/powerpc/boot/dts/mpc8313erdb.dts
> index 183f2aa..85a7eaa 100644
> --- a/arch/powerpc/boot/dts/mpc8313erdb.dts
> +++ b/arch/powerpc/boot/dts/mpc8313erdb.dts
> @@ -208,6 +208,20 @@
>  			sleep = <&pmc 0x00300000>;
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <12 0x8 13 0x8>;
> +			interrupt-parent = < &ipic >;
> +			tclk-period = <10>;
> +			tmr-prsc    = <100>;
> +			tmr-add     = <0x999999A4>;
> +			cksel       = <0x1>;
> +			tmr-fiper1  = <0x3B9AC9F6>;
> +			tmr-fiper2  = <0x00018696>;
> +			max-adj     = <659999998>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/arch/powerpc/boot/dts/mpc8572ds.dts b/arch/powerpc/boot/dts/mpc8572ds.dts
> index cafc128..74208cd 100644
> --- a/arch/powerpc/boot/dts/mpc8572ds.dts
> +++ b/arch/powerpc/boot/dts/mpc8572ds.dts
> @@ -324,6 +324,20 @@
>  			};
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <68 2 69 2 70 2 71 2>;
> +			interrupt-parent = < &mpic >;
> +			tclk-period = <5>;
> +			tmr-prsc = <200>;
> +			tmr-add = <0xAAAAAAAB>;
> +			cksel = <1>;
> +			tmr-fiper1 = <0x3B9AC9FB>;
> +			tmr-fiper2 = <0x3B9AC9FB>;
> +			max-adj = <499999999>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/arch/powerpc/boot/dts/p2020ds.dts b/arch/powerpc/boot/dts/p2020ds.dts
> index 1101914..39d73bb 100644
> --- a/arch/powerpc/boot/dts/p2020ds.dts
> +++ b/arch/powerpc/boot/dts/p2020ds.dts
> @@ -336,6 +336,20 @@
>  			phy_type = "ulpi";
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <68 2 69 2 70 2>;
> +			interrupt-parent = < &mpic >;
> +			tclk-period = <5>;
> +			tmr-prsc = <200>;
> +			tmr-add = <0xCCCCCCCD>;
> +			cksel = <1>;
> +			tmr-fiper1 = <0x3B9AC9FB>;
> +			tmr-fiper2 = <0x0001869B>;
> +			max-adj = <249999999>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/arch/powerpc/boot/dts/p2020rdb.dts b/arch/powerpc/boot/dts/p2020rdb.dts
> index da4cb0d..5498fb9 100644
> --- a/arch/powerpc/boot/dts/p2020rdb.dts
> +++ b/arch/powerpc/boot/dts/p2020rdb.dts
> @@ -396,6 +396,20 @@
>  			phy_type = "ulpi";
>  		};
>  
> +		ptp_clock@24E00 {
> +			compatible = "fsl,etsec-ptp";
> +			reg = <0x24E00 0xB0>;
> +			interrupts = <68 2 69 2 70 2>;
> +			interrupt-parent = < &mpic >;
> +			tclk-period = <5>;
> +			tmr-prsc = <200>;
> +			tmr-add = <0xCCCCCCCD>;
> +			cksel = <1>;
> +			tmr-fiper1 = <0x3B9AC9FB>;
> +			tmr-fiper2 = <0x0001869B>;
> +			max-adj = <249999999>;
> +		};
> +
>  		enet0: ethernet@24000 {
>  			#address-cells = <1>;
>  			#size-cells = <1>;
> diff --git a/drivers/net/Makefile b/drivers/net/Makefile
> index b90738d..c303f5f 100644
> --- a/drivers/net/Makefile
> +++ b/drivers/net/Makefile
> @@ -31,6 +31,7 @@ obj-$(CONFIG_ATL2) += atlx/
>  obj-$(CONFIG_ATL1E) += atl1e/
>  obj-$(CONFIG_ATL1C) += atl1c/
>  obj-$(CONFIG_GIANFAR) += gianfar_driver.o
> +obj-$(CONFIG_PTP_1588_CLOCK_GIANFAR) += gianfar_ptp.o
>  obj-$(CONFIG_TEHUTI) += tehuti.o
>  obj-$(CONFIG_ENIC) += enic/
>  obj-$(CONFIG_JME) += jme.o
> diff --git a/drivers/net/gianfar_ptp.c b/drivers/net/gianfar_ptp.c
> new file mode 100644
> index 0000000..84fff15
> --- /dev/null
> +++ b/drivers/net/gianfar_ptp.c
> @@ -0,0 +1,448 @@
> +/*
> + * PTP 1588 clock using the eTSEC
> + *
> + * Copyright (C) 2010 OMICRON electronics GmbH
> + *
> + *  This program is free software; you can redistribute it and/or modify
> + *  it under the terms of the GNU General Public License as published by
> + *  the Free Software Foundation; either version 2 of the License, or
> + *  (at your option) any later version.
> + *
> + *  This program is distributed in the hope that it will be useful,
> + *  but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *  GNU General Public License for more details.
> + *
> + *  You should have received a copy of the GNU General Public License
> + *  along with this program; if not, write to the Free Software
> + *  Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
> + */
> +#include <linux/device.h>
> +#include <linux/hrtimer.h>
> +#include <linux/init.h>
> +#include <linux/interrupt.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_platform.h>
> +#include <linux/timex.h>
> +#include <linux/io.h>
> +
> +#include <linux/ptp_clock_kernel.h>
> +
> +#include "gianfar_ptp_reg.h"
> +#include "gianfar.h"
> +
> +#define DRIVER		"gianfar_ptp"
> +#define N_ALARM		1 /* first alarm is used internally to reset fipers */
> +#define N_EXT_TS	2
> +#define REG_SIZE	sizeof(struct gianfar_ptp_registers)
> +
> +struct etsects {
> +	struct gianfar_ptp_registers *regs;
> +	struct ptp_clock *clock;
> +	struct ptp_clock_info caps;
> +	int irq;
> +	u64 alarm_interval; /* for periodic alarm */
> +	u64 alarm_value;
> +	u32 tclk_period;  /* nanoseconds */
> +	u32 tmr_prsc;
> +	u32 tmr_add;
> +	u32 cksel;
> +	u32 tmr_fiper1;
> +	u32 tmr_fiper2;
> +};
> +
> +/* Private globals */
> +static struct etsects the_clock;
> +DEFINE_SPINLOCK(register_lock);
> +
> +/*
> + * Register access functions
> + */
> +
> +static u64 tmr_cnt_read(struct etsects *etsects)
> +{
> +	u64 ns;
> +	u32 lo, hi;
> +
> +	lo = gfar_read(&etsects->regs->tmr_cnt_l);
> +	hi = gfar_read(&etsects->regs->tmr_cnt_h);
> +	ns = ((u64) hi) << 32;
> +	ns |= lo;
> +	return ns;
> +}
> +
> +static void tmr_cnt_write(struct etsects *etsects, u64 ns)
> +{
> +	u32 hi = ns >> 32;
> +	u32 lo = ns & 0xffffffff;
> +
> +	gfar_write(&etsects->regs->tmr_cnt_l, lo);
> +	gfar_write(&etsects->regs->tmr_cnt_h, hi);
> +}
> +
> +static void set_alarm(struct etsects *etsects)
> +{
> +	u64 ns;
> +	u32 lo, hi;
> +
> +	ns = tmr_cnt_read(etsects) + 1500000000ULL;
> +	ns = div_u64(ns, 1000000000UL) * 1000000000ULL;
> +	ns -= etsects->tclk_period;
> +	hi = ns >> 32;
> +	lo = ns & 0xffffffff;
> +	gfar_write(&etsects->regs->tmr_alarm1_l, lo);
> +	gfar_write(&etsects->regs->tmr_alarm1_h, hi);
> +}
> +
> +static void set_fipers(struct etsects *etsects)
> +{
> +	u32 tmr_ctrl = gfar_read(&etsects->regs->tmr_ctrl);
> +
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl & (~TE));
> +	gfar_write(&etsects->regs->tmr_prsc,   etsects->tmr_prsc);
> +	gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
> +	gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
> +	set_alarm(etsects);
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl|TE);
> +}
> +
> +/*
> + * Interrupt service routine
> + */
> +
> +static irqreturn_t isr(int irq, void *priv)
> +{
> +	struct etsects *etsects = priv;
> +	struct ptp_clock_event event;
> +	u64 ns;
> +	u32 ack = 0, lo, hi, mask, val;
> +
> +	val = gfar_read(&etsects->regs->tmr_tevent);
> +
> +	if (val & ETS1) {
> +		ack |= ETS1;
> +		hi = gfar_read(&etsects->regs->tmr_etts1_h);
> +		lo = gfar_read(&etsects->regs->tmr_etts1_l);
> +		event.type = PTP_CLOCK_EXTTS;
> +		event.index = 0;
> +		event.timestamp = ((u64) hi) << 32;
> +		event.timestamp |= lo;
> +		ptp_clock_event(etsects->clock, &event);
> +	}
> +
> +	if (val & ETS2) {
> +		ack |= ETS2;
> +		hi = gfar_read(&etsects->regs->tmr_etts2_h);
> +		lo = gfar_read(&etsects->regs->tmr_etts2_l);
> +		event.type = PTP_CLOCK_EXTTS;
> +		event.index = 1;
> +		event.timestamp = ((u64) hi) << 32;
> +		event.timestamp |= lo;
> +		ptp_clock_event(etsects->clock, &event);
> +	}
> +
> +	if (val & ALM2) {
> +		ack |= ALM2;
> +		if (etsects->alarm_value) {
> +			event.type = PTP_CLOCK_ALARM;
> +			event.index = 0;
> +			event.timestamp = etsects->alarm_value;
> +			ptp_clock_event(etsects->clock, &event);
> +		}
> +		if (etsects->alarm_interval) {
> +			ns = etsects->alarm_value + etsects->alarm_interval;
> +			hi = ns >> 32;
> +			lo = ns & 0xffffffff;
> +			spin_lock(&register_lock);
> +			gfar_write(&etsects->regs->tmr_alarm2_l, lo);
> +			gfar_write(&etsects->regs->tmr_alarm2_h, hi);
> +			spin_unlock(&register_lock);
> +			etsects->alarm_value = ns;
> +		} else {
> +			gfar_write(&etsects->regs->tmr_tevent, ALM2);
> +			spin_lock(&register_lock);
> +			mask = gfar_read(&etsects->regs->tmr_temask);
> +			mask &= ~ALM2EN;
> +			gfar_write(&etsects->regs->tmr_temask, mask);
> +			spin_unlock(&register_lock);
> +			etsects->alarm_value = 0;
> +			etsects->alarm_interval = 0;
> +		}
> +	}
> +
> +	if (val & PP1) {
> +		ack |= PP1;
> +		event.type = PTP_CLOCK_PPS;
> +		ptp_clock_event(etsects->clock, &event);
> +	}
> +
> +	if (ack) {
> +		gfar_write(&etsects->regs->tmr_tevent, ack);
> +		return IRQ_HANDLED;
> +	} else
> +		return IRQ_NONE;
> +}
> +
> +/*
> + * PTP clock operations
> + */
> +
> +static int ptp_gianfar_adjfreq(struct ptp_clock_info *ptp, s32 ppb)
> +{
> +	u64 adj;
> +	u32 diff, tmr_add;
> +	int neg_adj = 0;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	if (ppb < 0) {
> +		neg_adj = 1;
> +		ppb = -ppb;
> +	}
> +	tmr_add = etsects->tmr_add;
> +	adj = tmr_add;
> +	adj *= ppb;
> +	diff = div_u64(adj, 1000000000ULL);
> +
> +	tmr_add = neg_adj ? tmr_add - diff : tmr_add + diff;
> +
> +	gfar_write(&etsects->regs->tmr_add, tmr_add);
> +
> +	return 0;
> +}
> +
> +static int ptp_gianfar_adjtime(struct ptp_clock_info *ptp, s64 delta)
> +{
> +	s64 now;
> +	unsigned long flags;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	spin_lock_irqsave(&register_lock, flags);
> +
> +	now = tmr_cnt_read(etsects);
> +	now += delta;
> +	tmr_cnt_write(etsects, now);
> +
> +	spin_unlock_irqrestore(&register_lock, flags);
> +
> +	set_fipers(etsects);
> +
> +	return 0;
> +}
> +
> +static int ptp_gianfar_gettime(struct ptp_clock_info *ptp, struct timespec *ts)
> +{
> +	u64 ns;
> +	u32 remainder;
> +	unsigned long flags;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	spin_lock_irqsave(&register_lock, flags);
> +
> +	ns = tmr_cnt_read(etsects);
> +
> +	spin_unlock_irqrestore(&register_lock, flags);
> +
> +	ts->tv_sec = div_u64_rem(ns, 1000000000, &remainder);
> +	ts->tv_nsec = remainder;
> +	return 0;
> +}
> +
> +static int ptp_gianfar_settime(struct ptp_clock_info *ptp,
> +			       const struct timespec *ts)
> +{
> +	u64 ns;
> +	unsigned long flags;
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +
> +	ns = ts->tv_sec * 1000000000ULL;
> +	ns += ts->tv_nsec;
> +
> +	spin_lock_irqsave(&register_lock, flags);
> +
> +	tmr_cnt_write(etsects, ns);
> +	set_fipers(etsects);
> +
> +	spin_unlock_irqrestore(&register_lock, flags);
> +
> +	return 0;
> +}
> +
> +static int ptp_gianfar_enable(struct ptp_clock_info *ptp,
> +			      struct ptp_clock_request *rq, int on)
> +{
> +	struct etsects *etsects = container_of(ptp, struct etsects, caps);
> +	unsigned long flags;
> +	u32 bit, mask;
> +
> +	switch (rq->type) {
> +	case PTP_CLK_REQ_EXTTS:
> +		switch (rq->extts.index) {
> +		case 0:
> +			bit = ETS1EN;
> +			break;
> +		case 1:
> +			bit = ETS2EN;
> +			break;
> +		default:
> +			return -EINVAL;
> +		}
> +		spin_lock_irqsave(&register_lock, flags);
> +		mask = gfar_read(&etsects->regs->tmr_temask);
> +		if (on)
> +			mask |= bit;
> +		else
> +			mask &= ~bit;
> +		gfar_write(&etsects->regs->tmr_temask, mask);
> +		spin_unlock_irqrestore(&register_lock, flags);
> +		return 0;
> +
> +	case PTP_CLK_REQ_PPS:
> +		spin_lock_irqsave(&register_lock, flags);
> +		mask = gfar_read(&etsects->regs->tmr_temask);
> +		if (on)
> +			mask |= PP1EN;
> +		else
> +			mask &= ~PP1EN;
> +		gfar_write(&etsects->regs->tmr_temask, mask);
> +		spin_unlock_irqrestore(&register_lock, flags);
> +		return 0;
> +
> +	default:
> +		break;
> +	}
> +
> +	return -EOPNOTSUPP;
> +}
> +
> +static struct ptp_clock_info ptp_gianfar_caps = {
> +	.owner		= THIS_MODULE,
> +	.name		= "gianfar clock",
> +	.max_adj	= 512000,
> +	.n_alarm	= N_ALARM,
> +	.n_ext_ts	= N_EXT_TS,
> +	.n_per_out	= 0,
> +	.pps		= 1,
> +	.adjfreq	= ptp_gianfar_adjfreq,
> +	.adjtime	= ptp_gianfar_adjtime,
> +	.gettime	= ptp_gianfar_gettime,
> +	.settime	= ptp_gianfar_settime,
> +	.enable		= ptp_gianfar_enable,
> +};
> +
> +/* OF device tree */
> +
> +static int get_of_u32(struct device_node *node, char *str, u32 *val)
> +{
> +	int plen;
> +	const u32 *prop = of_get_property(node, str, &plen);
> +
> +	if (!prop || plen != sizeof(*prop))
> +		return -1;
> +	*val = *prop;
> +	return 0;
> +}
> +
> +static int gianfar_ptp_probe(struct platform_device *dev,
> +			     const struct of_device_id *match)
> +{
> +	struct device_node *node = dev->dev.of_node;
> +	struct etsects *etsects = &the_clock;
> +	struct timespec now;
> +	u32 tmr_ctrl;
> +
> +	etsects->caps = ptp_gianfar_caps;
> +
> +	if (get_of_u32(node, "tclk-period", &etsects->tclk_period) ||
> +	    get_of_u32(node, "tmr-prsc", &etsects->tmr_prsc) ||
> +	    get_of_u32(node, "tmr-add", &etsects->tmr_add) ||
> +	    get_of_u32(node, "cksel", &etsects->cksel) ||
> +	    get_of_u32(node, "tmr-fiper1", &etsects->tmr_fiper1) ||
> +	    get_of_u32(node, "tmr-fiper2", &etsects->tmr_fiper2) ||
> +	    get_of_u32(node, "max-adj", &etsects->caps.max_adj)) {
> +		pr_err("device tree node missing required elements\n");
> +		return -ENODEV;
> +	}
> +
> +	etsects->irq = irq_of_parse_and_map(node, 0);

Use platform_get_irq().

> +
> +	if (etsects->irq == NO_IRQ) {
> +		pr_err("irq not in device tree\n");
> +		return -ENODEV;
> +	}
> +	if (request_irq(etsects->irq, isr, 0, DRIVER, etsects)) {
> +		pr_err("request_irq failed\n");
> +		return -ENODEV;
> +	}
> +	etsects->regs = of_iomap(node, 0);

Use platform_get_resource(), and don't forget to request the
resources.

> +	if (!etsects->regs) {
> +		pr_err("of_iomap ptp registers failed\n");
> +		return -EINVAL;
> +	}
> +	getnstimeofday(&now);
> +	ptp_gianfar_settime(&etsects->caps, &now);
> +
> +	tmr_ctrl =
> +	  (etsects->tclk_period & TCLK_PERIOD_MASK) << TCLK_PERIOD_SHIFT |
> +	  (etsects->cksel & CKSEL_MASK) << CKSEL_SHIFT;
> +
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl);
> +	gfar_write(&etsects->regs->tmr_add,    etsects->tmr_add);
> +	gfar_write(&etsects->regs->tmr_prsc,   etsects->tmr_prsc);
> +	gfar_write(&etsects->regs->tmr_fiper1, etsects->tmr_fiper1);
> +	gfar_write(&etsects->regs->tmr_fiper2, etsects->tmr_fiper2);
> +	set_alarm(etsects);
> +	gfar_write(&etsects->regs->tmr_ctrl,   tmr_ctrl|FS|RTPE|TE);
> +
> +	etsects->clock = ptp_clock_register(&etsects->caps);
> +
> +	return IS_ERR(etsects->clock) ? PTR_ERR(etsects->clock) : 0;
> +}
> +
> +static int gianfar_ptp_remove(struct platform_device *dev)
> +{
> +	gfar_write(&the_clock.regs->tmr_temask, 0);
> +	gfar_write(&the_clock.regs->tmr_ctrl,   0);
> +
> +	ptp_clock_unregister(the_clock.clock);
> +	free_irq(the_clock.irq, &the_clock);
> +	iounmap(the_clock.regs);
> +
> +	return 0;
> +}
> +
> +static struct of_device_id match_table[] = {
> +	{ .compatible = "fsl,etsec-ptp" },
> +	{},
> +};
> +
> +static struct of_platform_driver gianfar_ptp_driver = {

Use a platform_driver instead.  of_platform_driver is deprecated and
being removed.

> +	.driver = {
> +		.name		= "gianfar_ptp",
> +		.of_match_table	= match_table,
> +		.owner		= THIS_MODULE,
> +	},
> +	.probe       = gianfar_ptp_probe,
> +	.remove      = gianfar_ptp_remove,
> +};
> +
> +/* module operations */
> +
> +static int __init ptp_gianfar_init(void)
> +{
> +	return of_register_platform_driver(&gianfar_ptp_driver);
> +}
> +
> +module_init(ptp_gianfar_init);
> +
> +static void __exit ptp_gianfar_exit(void)
> +{
> +	of_unregister_platform_driver(&gianfar_ptp_driver);
> +}
> +
> +module_exit(ptp_gianfar_exit);
> +
> +MODULE_AUTHOR("Richard Cochran <richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>");
> +MODULE_DESCRIPTION("PTP clock using the eTSEC");
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/net/gianfar_ptp_reg.h b/drivers/net/gianfar_ptp_reg.h
> new file mode 100644
> index 0000000..95e171f
> --- /dev/null
> +++ b/drivers/net/gianfar_ptp_reg.h

This data is only used by gianfar_ptp.c, so there is no need for a
separate include file.  Move the contents of gianfar_ptp_reg.h into
gianfar_ptp.c.

g.
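On the max-adj property: ptp_gianfar_adjfreq() in the patch computes diff = tmr_add * |ppb| / 10^9 and writes tmr_add ± diff into the 32-bit TMR_ADD register. The sketch below repeats that arithmetic in 64 bits (the function names are hypothetical) to show why the binding's formula, max_adj = 10^9 * (FreqDivRatio - 1) - 1, appears to be the right bound: with the P2020 value tmr_add = 0xCCCCCCCD, the largest allowed adjustment still fits in 32 bits, while one more ppb reaches exactly 2^32, which would wrap to 0 in a u32.

```c
#include <stdint.h>

/* Same arithmetic as ptp_gianfar_adjfreq() in the patch, but the result is
 * kept in 64 bits so an overflow of the 32-bit register is visible. */
static uint64_t adjusted_addend(uint32_t tmr_add, int32_t ppb)
{
	uint64_t mag = ppb < 0 ? (uint64_t)(-(int64_t)ppb) : (uint64_t)ppb;
	uint64_t diff = (uint64_t)tmr_add * mag / 1000000000ULL;

	return ppb < 0 ? (uint64_t)tmr_add - diff : (uint64_t)tmr_add + diff;
}

/* Nonzero if the largest upward adjustment keeps TMR_ADD within 32 bits
 * (downward adjustments cannot overflow). */
static int max_adj_fits(uint32_t tmr_add, int32_t max_adj)
{
	return adjusted_addend(tmr_add, max_adj) <= UINT32_MAX;
}
```

Because tmr_add is about 2^32 / FreqDivRatio, scaling it by (1 + max_adj / 10^9) stays below 2^32 only while max_adj / 10^9 is less than FreqDivRatio - 1, which is where the "- 1" in the formula comes from.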


* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 16:48 UTC
  To: Patrick McHardy
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <4D6534C3.1080305@trash.net>

On Wednesday, 23 February 2011 at 17:24 +0100, Patrick McHardy wrote:

> This needs to be a per-skb property, otherwise you could have the
> situation:
> 
> - enqueue skb, double_buffering=0, increment buffer 0
> - enable double buffering
> - swap buffers
> - dequeue same skb, decrement buffer 1
> 
> after which the qlen values of buffer 1 will be incorrect.
> 

Normally it's OK, because we bzero() the zone, and the "decrement" is
0-bounded.

I had this idea (of storing two bits per skb), but:

- It means that swap_buffer() should not touch (bzero) the 'old' bins

- Since the hash perturbation is changed, we have to store the two hash
values per skb (instead of one u32 / classid).


> 
> > +		slot ^= 1;
> > +		sfbhash = sfb_hash(skb, slot, q);
> 
> Isn't there room in the cb to store both hash values?

Yes, I am going to implement your idea; it's probably OK to use two u32
in skb->cb for this.

Thanks !




* Re: [RFC PATCH 09/10] netdev: octeon_mgmt: Convert to use device tree.
From: Grant Likely @ 2011-02-23 16:32 UTC
  To: David Daney
  Cc: linux-mips-6z/3iImG2C8G8FEW9MqTrA, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, ralf-6z/3iImG2C8G8FEW9MqTrA,
	David S. Miller
In-Reply-To: <1298408274-20856-10-git-send-email-ddaney-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>

On Tue, Feb 22, 2011 at 12:57:53PM -0800, David Daney wrote:
> The device tree will supply the register bank base addresses, so make
> register addressing relative to those.  PHY connection is now
> described by the device tree.
> 
> Signed-off-by: David Daney <ddaney-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> Cc: "David S. Miller" <davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
> Cc: netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
>  arch/mips/cavium-octeon/octeon-platform.c |   62 -------
>  drivers/net/octeon/octeon_mgmt.c          |  265 +++++++++++++++++------------
>  2 files changed, 160 insertions(+), 167 deletions(-)
> 
> diff --git a/drivers/net/octeon/octeon_mgmt.c b/drivers/net/octeon/octeon_mgmt.c
> index b264f0f..6c301c7 100644
> --- a/drivers/net/octeon/octeon_mgmt.c
> +++ b/drivers/net/octeon/octeon_mgmt.c
> @@ -6,16 +6,21 @@
>   * Copyright (C) 2009 Cavium Networks
>   */
>  
> -#include <linux/capability.h>
> -#include <linux/dma-mapping.h>
> -#include <linux/init.h>
>  #include <linux/platform_device.h>
> -#include <linux/netdevice.h>
> +#include <linux/dma-mapping.h>
>  #include <linux/etherdevice.h>
> +#include <linux/of_platform.h>
> +#include <linux/capability.h>
> +#include <linux/of_address.h>
> +#include <linux/netdevice.h>
> +#include <linux/spinlock.h>
>  #include <linux/if_vlan.h>
> +#include <linux/of_mdio.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_net.h>
> +#include <linux/init.h>
>  #include <linux/slab.h>
>  #include <linux/phy.h>
> -#include <linux/spinlock.h>
>  
>  #include <asm/octeon/octeon.h>
>  #include <asm/octeon/cvmx-mixx-defs.h>
> @@ -55,8 +60,56 @@ union mgmt_port_ring_entry {
>  	} s;
>  };
>  
> +#define MIX_ORING1	0x0
> +#define MIX_ORING2	0x8
> +#define MIX_IRING1	0x10
> +#define MIX_IRING2	0x18
> +#define MIX_CTL		0x20
> +#define MIX_IRHWM	0x28
> +#define MIX_IRCNT	0x30
> +#define MIX_ORHWM	0x38
> +#define MIX_ORCNT	0x40
> +#define MIX_ISR		0x48
> +#define MIX_INTENA	0x50
> +#define MIX_REMCNT	0x58
> +#define MIX_BIST	0x78
> +
> +#define AGL_GMX_PRT_CFG			0x10
> +#define AGL_GMX_RX_FRM_CTL		0x18
> +#define AGL_GMX_RX_FRM_MAX		0x30
> +#define AGL_GMX_RX_JABBER		0x38
> +#define AGL_GMX_RX_STATS_CTL		0x50
> +
> +#define AGL_GMX_RX_STATS_PKTS_DRP	0xb0
> +#define AGL_GMX_RX_STATS_OCTS_DRP	0xb8
> +#define AGL_GMX_RX_STATS_PKTS_BAD	0xc0
> +
> +#define AGL_GMX_RX_ADR_CTL		0x100
> +#define AGL_GMX_RX_ADR_CAM_EN		0x108
> +#define AGL_GMX_RX_ADR_CAM0		0x180
> +#define AGL_GMX_RX_ADR_CAM1		0x188
> +#define AGL_GMX_RX_ADR_CAM2		0x190
> +#define AGL_GMX_RX_ADR_CAM3		0x198
> +#define AGL_GMX_RX_ADR_CAM4		0x1a0
> +#define AGL_GMX_RX_ADR_CAM5		0x1a8
> +
> +#define AGL_GMX_TX_STATS_CTL		0x268
> +#define AGL_GMX_TX_CTL			0x270
> +#define AGL_GMX_TX_STAT0		0x280
> +#define AGL_GMX_TX_STAT1		0x288
> +#define AGL_GMX_TX_STAT2		0x290
> +#define AGL_GMX_TX_STAT3		0x298
> +#define AGL_GMX_TX_STAT4		0x2a0
> +#define AGL_GMX_TX_STAT5		0x2a8
> +#define AGL_GMX_TX_STAT6		0x2b0
> +#define AGL_GMX_TX_STAT7		0x2b8
> +#define AGL_GMX_TX_STAT8		0x2c0
> +#define AGL_GMX_TX_STAT9		0x2c8
> +
>  struct octeon_mgmt {
>  	struct net_device *netdev;
> +	u64 mix;
> +	u64 agl;
>  	int port;
>  	int irq;
>  	u64 *tx_ring;
> @@ -82,31 +135,30 @@ struct octeon_mgmt {
>  	struct napi_struct napi;
>  	struct tasklet_struct tx_clean_tasklet;
>  	struct phy_device *phydev;
> +	struct device_node *phy_np;
>  };
>  
>  static void octeon_mgmt_set_rx_irq(struct octeon_mgmt *p, int enable)
>  {
> -	int port = p->port;
>  	union cvmx_mixx_intena mix_intena;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&p->lock, flags);
> -	mix_intena.u64 = cvmx_read_csr(CVMX_MIXX_INTENA(port));
> +	mix_intena.u64 = cvmx_read_csr(p->mix + MIX_INTENA);
>  	mix_intena.s.ithena = enable ? 1 : 0;
> -	cvmx_write_csr(CVMX_MIXX_INTENA(port), mix_intena.u64);
> +	cvmx_write_csr(p->mix + MIX_INTENA, mix_intena.u64);
>  	spin_unlock_irqrestore(&p->lock, flags);
>  }
>  
>  static void octeon_mgmt_set_tx_irq(struct octeon_mgmt *p, int enable)
>  {
> -	int port = p->port;
>  	union cvmx_mixx_intena mix_intena;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&p->lock, flags);
> -	mix_intena.u64 = cvmx_read_csr(CVMX_MIXX_INTENA(port));
> +	mix_intena.u64 = cvmx_read_csr(p->mix + MIX_INTENA);
>  	mix_intena.s.othena = enable ? 1 : 0;
> -	cvmx_write_csr(CVMX_MIXX_INTENA(port), mix_intena.u64);
> +	cvmx_write_csr(p->mix + MIX_INTENA, mix_intena.u64);
>  	spin_unlock_irqrestore(&p->lock, flags);
>  }
>  
> @@ -143,7 +195,6 @@ static unsigned int ring_size_to_bytes(unsigned int ring_size)
>  static void octeon_mgmt_rx_fill_ring(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  
>  	while (p->rx_current_fill < ring_max_fill(OCTEON_MGMT_RX_RING_SIZE)) {
>  		unsigned int size;
> @@ -174,24 +225,23 @@ static void octeon_mgmt_rx_fill_ring(struct net_device *netdev)
>  			(p->rx_next_fill + 1) % OCTEON_MGMT_RX_RING_SIZE;
>  		p->rx_current_fill++;
>  		/* Ring the bell.  */
> -		cvmx_write_csr(CVMX_MIXX_IRING2(port), 1);
> +		cvmx_write_csr(p->mix + MIX_IRING2, 1);
>  	}
>  }
>  
>  static void octeon_mgmt_clean_tx_buffers(struct octeon_mgmt *p)
>  {
> -	int port = p->port;
>  	union cvmx_mixx_orcnt mix_orcnt;
>  	union mgmt_port_ring_entry re;
>  	struct sk_buff *skb;
>  	int cleaned = 0;
>  	unsigned long flags;
>  
> -	mix_orcnt.u64 = cvmx_read_csr(CVMX_MIXX_ORCNT(port));
> +	mix_orcnt.u64 = cvmx_read_csr(p->mix + MIX_ORCNT);
>  	while (mix_orcnt.s.orcnt) {
>  		spin_lock_irqsave(&p->tx_list.lock, flags);
>  
> -		mix_orcnt.u64 = cvmx_read_csr(CVMX_MIXX_ORCNT(port));
> +		mix_orcnt.u64 = cvmx_read_csr(p->mix + MIX_ORCNT);
>  
>  		if (mix_orcnt.s.orcnt == 0) {
>  			spin_unlock_irqrestore(&p->tx_list.lock, flags);
> @@ -211,7 +261,7 @@ static void octeon_mgmt_clean_tx_buffers(struct octeon_mgmt *p)
>  		mix_orcnt.s.orcnt = 1;
>  
>  		/* Acknowledge to hardware that we have the buffer.  */
> -		cvmx_write_csr(CVMX_MIXX_ORCNT(port), mix_orcnt.u64);
> +		cvmx_write_csr(p->mix + MIX_ORCNT, mix_orcnt.u64);
>  		p->tx_current_fill--;
>  
>  		spin_unlock_irqrestore(&p->tx_list.lock, flags);
> @@ -221,7 +271,7 @@ static void octeon_mgmt_clean_tx_buffers(struct octeon_mgmt *p)
>  		dev_kfree_skb_any(skb);
>  		cleaned++;
>  
> -		mix_orcnt.u64 = cvmx_read_csr(CVMX_MIXX_ORCNT(port));
> +		mix_orcnt.u64 = cvmx_read_csr(p->mix + MIX_ORCNT);
>  	}
>  
>  	if (cleaned && netif_queue_stopped(p->netdev))
> @@ -238,13 +288,12 @@ static void octeon_mgmt_clean_tx_tasklet(unsigned long arg)
>  static void octeon_mgmt_update_rx_stats(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	unsigned long flags;
>  	u64 drop, bad;
>  
>  	/* These reads also clear the count registers.  */
> -	drop = cvmx_read_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_DRP(port));
> -	bad = cvmx_read_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_BAD(port));
> +	drop = cvmx_read_csr(p->agl + AGL_GMX_RX_STATS_PKTS_DRP);
> +	bad = cvmx_read_csr(p->agl + AGL_GMX_RX_STATS_PKTS_BAD);
>  
>  	if (drop || bad) {
>  		/* Do an atomic update. */
> @@ -258,15 +307,14 @@ static void octeon_mgmt_update_rx_stats(struct net_device *netdev)
>  static void octeon_mgmt_update_tx_stats(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	unsigned long flags;
>  
>  	union cvmx_agl_gmx_txx_stat0 s0;
>  	union cvmx_agl_gmx_txx_stat1 s1;
>  
>  	/* These reads also clear the count registers.  */
> -	s0.u64 = cvmx_read_csr(CVMX_AGL_GMX_TXX_STAT0(port));
> -	s1.u64 = cvmx_read_csr(CVMX_AGL_GMX_TXX_STAT1(port));
> +	s0.u64 = cvmx_read_csr(p->agl + AGL_GMX_TX_STAT0);
> +	s1.u64 = cvmx_read_csr(p->agl + AGL_GMX_TX_STAT1);
>  
>  	if (s0.s.xsdef || s0.s.xscol || s1.s.scol || s1.s.mcol) {
>  		/* Do an atomic update. */
> @@ -305,7 +353,6 @@ static u64 octeon_mgmt_dequeue_rx_buffer(struct octeon_mgmt *p,
>  
>  static int octeon_mgmt_receive_one(struct octeon_mgmt *p)
>  {
> -	int port = p->port;
>  	struct net_device *netdev = p->netdev;
>  	union cvmx_mixx_ircnt mix_ircnt;
>  	union mgmt_port_ring_entry re;
> @@ -378,18 +425,17 @@ done:
>  	/* Tell the hardware we processed a packet.  */
>  	mix_ircnt.u64 = 0;
>  	mix_ircnt.s.ircnt = 1;
> -	cvmx_write_csr(CVMX_MIXX_IRCNT(port), mix_ircnt.u64);
> +	cvmx_write_csr(p->mix + MIX_IRCNT, mix_ircnt.u64);
>  	return rc;
>  }
>  
>  static int octeon_mgmt_receive_packets(struct octeon_mgmt *p, int budget)
>  {
> -	int port = p->port;
>  	unsigned int work_done = 0;
>  	union cvmx_mixx_ircnt mix_ircnt;
>  	int rc;
>  
> -	mix_ircnt.u64 = cvmx_read_csr(CVMX_MIXX_IRCNT(port));
> +	mix_ircnt.u64 = cvmx_read_csr(p->mix + MIX_IRCNT);
>  	while (work_done < budget && mix_ircnt.s.ircnt) {
>  
>  		rc = octeon_mgmt_receive_one(p);
> @@ -397,7 +443,7 @@ static int octeon_mgmt_receive_packets(struct octeon_mgmt *p, int budget)
>  			work_done++;
>  
>  		/* Check for more packets. */
> -		mix_ircnt.u64 = cvmx_read_csr(CVMX_MIXX_IRCNT(port));
> +		mix_ircnt.u64 = cvmx_read_csr(p->mix + MIX_IRCNT);
>  	}
>  
>  	octeon_mgmt_rx_fill_ring(p->netdev);
> @@ -431,16 +477,16 @@ static void octeon_mgmt_reset_hw(struct octeon_mgmt *p)
>  	union cvmx_agl_gmx_bist agl_gmx_bist;
>  
>  	mix_ctl.u64 = 0;
> -	cvmx_write_csr(CVMX_MIXX_CTL(p->port), mix_ctl.u64);
> +	cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
>  	do {
> -		mix_ctl.u64 = cvmx_read_csr(CVMX_MIXX_CTL(p->port));
> +		mix_ctl.u64 = cvmx_read_csr(p->mix + MIX_CTL);
>  	} while (mix_ctl.s.busy);
>  	mix_ctl.s.reset = 1;
> -	cvmx_write_csr(CVMX_MIXX_CTL(p->port), mix_ctl.u64);
> -	cvmx_read_csr(CVMX_MIXX_CTL(p->port));
> +	cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
> +	cvmx_read_csr(p->mix + MIX_CTL);
>  	cvmx_wait(64);
>  
> -	mix_bist.u64 = cvmx_read_csr(CVMX_MIXX_BIST(p->port));
> +	mix_bist.u64 = cvmx_read_csr(p->mix + MIX_BIST);
>  	if (mix_bist.u64)
>  		dev_warn(p->dev, "MIX failed BIST (0x%016llx)\n",
>  			(unsigned long long)mix_bist.u64);
> @@ -471,7 +517,6 @@ static void octeon_mgmt_cam_state_add(struct octeon_mgmt_cam_state *cs,
>  static void octeon_mgmt_set_rx_filtering(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union cvmx_agl_gmx_rxx_adr_ctl adr_ctl;
>  	union cvmx_agl_gmx_prtx_cfg agl_gmx_prtx;
>  	unsigned long flags;
> @@ -517,29 +562,29 @@ static void octeon_mgmt_set_rx_filtering(struct net_device *netdev)
>  	spin_lock_irqsave(&p->lock, flags);
>  
>  	/* Disable packet I/O. */
> -	agl_gmx_prtx.u64 = cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +	agl_gmx_prtx.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  	prev_packet_enable = agl_gmx_prtx.s.en;
>  	agl_gmx_prtx.s.en = 0;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), agl_gmx_prtx.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, agl_gmx_prtx.u64);
>  
>  	adr_ctl.u64 = 0;
>  	adr_ctl.s.cam_mode = cam_mode;
>  	adr_ctl.s.mcst = multicast_mode;
>  	adr_ctl.s.bcst = 1;     /* Allow broadcast */
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CTL(port), adr_ctl.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CTL, adr_ctl.u64);
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM0(port), cam_state.cam[0]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM1(port), cam_state.cam[1]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM2(port), cam_state.cam[2]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM3(port), cam_state.cam[3]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM4(port), cam_state.cam[4]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM5(port), cam_state.cam[5]);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_ADR_CAM_EN(port), cam_state.cam_mask);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM0, cam_state.cam[0]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM1, cam_state.cam[1]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM2, cam_state.cam[2]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM3, cam_state.cam[3]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM4, cam_state.cam[4]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM5, cam_state.cam[5]);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_ADR_CAM_EN, cam_state.cam_mask);
>  
>  	/* Restore packet I/O. */
>  	agl_gmx_prtx.s.en = prev_packet_enable;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), agl_gmx_prtx.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, agl_gmx_prtx.u64);
>  
>  	spin_unlock_irqrestore(&p->lock, flags);
>  }
> @@ -561,7 +606,6 @@ static int octeon_mgmt_set_mac_address(struct net_device *netdev, void *addr)
>  static int octeon_mgmt_change_mtu(struct net_device *netdev, int new_mtu)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	int size_without_fcs = new_mtu + OCTEON_MGMT_RX_HEADROOM;
>  
>  	/*
> @@ -577,8 +621,8 @@ static int octeon_mgmt_change_mtu(struct net_device *netdev, int new_mtu)
>  
>  	netdev->mtu = new_mtu;
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_FRM_MAX(port), size_without_fcs);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_JABBER(port),
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_FRM_MAX, size_without_fcs);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_JABBER,
>  		       (size_without_fcs + 7) & 0xfff8);
>  
>  	return 0;
> @@ -588,14 +632,13 @@ static irqreturn_t octeon_mgmt_interrupt(int cpl, void *dev_id)
>  {
>  	struct net_device *netdev = dev_id;
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union cvmx_mixx_isr mixx_isr;
>  
> -	mixx_isr.u64 = cvmx_read_csr(CVMX_MIXX_ISR(port));
> +	mixx_isr.u64 = cvmx_read_csr(p->mix + MIX_ISR);
>  
>  	/* Clear any pending interrupts */
> -	cvmx_write_csr(CVMX_MIXX_ISR(port), mixx_isr.u64);
> -	cvmx_read_csr(CVMX_MIXX_ISR(port));
> +	cvmx_write_csr(p->mix + MIX_ISR, mixx_isr.u64);
> +	cvmx_read_csr(p->mix + MIX_ISR);
>  
>  	if (mixx_isr.s.irthresh) {
>  		octeon_mgmt_disable_rx_irq(p);
> @@ -626,7 +669,6 @@ static int octeon_mgmt_ioctl(struct net_device *netdev,
>  static void octeon_mgmt_adjust_link(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union cvmx_agl_gmx_prtx_cfg prtx_cfg;
>  	unsigned long flags;
>  	int link_changed = 0;
> @@ -637,11 +679,9 @@ static void octeon_mgmt_adjust_link(struct net_device *netdev)
>  			link_changed = 1;
>  		if (p->last_duplex != p->phydev->duplex) {
>  			p->last_duplex = p->phydev->duplex;
> -			prtx_cfg.u64 =
> -				cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +			prtx_cfg.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  			prtx_cfg.s.duplex = p->phydev->duplex;
> -			cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port),
> -				       prtx_cfg.u64);
> +			cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, prtx_cfg.u64);
>  		}
>  	} else {
>  		if (p->last_link)
> @@ -667,18 +707,16 @@ static void octeon_mgmt_adjust_link(struct net_device *netdev)
>  static int octeon_mgmt_init_phy(struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	char phy_id[20];
>  
> -	if (octeon_is_simulation()) {
> +	if (octeon_is_simulation() || p->phy_np == NULL) {
>  		/* No PHYs in the simulator. */
>  		netif_carrier_on(netdev);
>  		return 0;
>  	}
>  
> -	snprintf(phy_id, sizeof(phy_id), PHY_ID_FMT, "0", p->port);
> -
> -	p->phydev = phy_connect(netdev, phy_id, octeon_mgmt_adjust_link, 0,
> -				PHY_INTERFACE_MODE_MII);
> +	p->phydev = of_phy_connect(netdev, p->phy_np,
> +				   octeon_mgmt_adjust_link, 0,
> +				   PHY_INTERFACE_MODE_MII);
>  
>  	if (IS_ERR(p->phydev)) {
>  		p->phydev = NULL;
> @@ -734,14 +772,14 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  
>  	octeon_mgmt_reset_hw(p);
>  
> -	mix_ctl.u64 = cvmx_read_csr(CVMX_MIXX_CTL(port));
> +	mix_ctl.u64 = cvmx_read_csr(p->mix + MIX_CTL);
>  
>  	/* Bring it out of reset if needed. */
>  	if (mix_ctl.s.reset) {
>  		mix_ctl.s.reset = 0;
> -		cvmx_write_csr(CVMX_MIXX_CTL(port), mix_ctl.u64);
> +		cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
>  		do {
> -			mix_ctl.u64 = cvmx_read_csr(CVMX_MIXX_CTL(port));
> +			mix_ctl.u64 = cvmx_read_csr(p->mix + MIX_CTL);
>  		} while (mix_ctl.s.reset);
>  	}
>  
> @@ -752,17 +790,17 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	oring1.u64 = 0;
>  	oring1.s.obase = p->tx_ring_handle >> 3;
>  	oring1.s.osize = OCTEON_MGMT_TX_RING_SIZE;
> -	cvmx_write_csr(CVMX_MIXX_ORING1(port), oring1.u64);
> +	cvmx_write_csr(p->mix + MIX_ORING1, oring1.u64);
>  
>  	iring1.u64 = 0;
>  	iring1.s.ibase = p->rx_ring_handle >> 3;
>  	iring1.s.isize = OCTEON_MGMT_RX_RING_SIZE;
> -	cvmx_write_csr(CVMX_MIXX_IRING1(port), iring1.u64);
> +	cvmx_write_csr(p->mix + MIX_IRING1, iring1.u64);
>  
>  	/* Disable packet I/O. */
> -	prtx_cfg.u64 = cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +	prtx_cfg.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  	prtx_cfg.s.en = 0;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), prtx_cfg.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, prtx_cfg.u64);
>  
>  	memcpy(sa.sa_data, netdev->dev_addr, ETH_ALEN);
>  	octeon_mgmt_set_mac_address(netdev, &sa);
> @@ -779,7 +817,7 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	mix_ctl.s.nbtarb = 0;       /* Arbitration mode */
>  	/* MII CB-request FIFO programmable high watermark */
>  	mix_ctl.s.mrq_hwm = 1;
> -	cvmx_write_csr(CVMX_MIXX_CTL(port), mix_ctl.u64);
> +	cvmx_write_csr(p->mix + MIX_CTL, mix_ctl.u64);
>  
>  	if (OCTEON_IS_MODEL(OCTEON_CN56XX_PASS1_X)
>  	    || OCTEON_IS_MODEL(OCTEON_CN52XX_PASS1_X)) {
> @@ -806,16 +844,16 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  
>  	/* Clear statistics. */
>  	/* Clear on read. */
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_STATS_CTL(port), 1);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_DRP(port), 0);
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_STATS_PKTS_BAD(port), 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_STATS_CTL, 1);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_STATS_PKTS_DRP, 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_STATS_PKTS_BAD, 0);
>  
> -	cvmx_write_csr(CVMX_AGL_GMX_TXX_STATS_CTL(port), 1);
> -	cvmx_write_csr(CVMX_AGL_GMX_TXX_STAT0(port), 0);
> -	cvmx_write_csr(CVMX_AGL_GMX_TXX_STAT1(port), 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_TX_STATS_CTL, 1);
> +	cvmx_write_csr(p->agl + AGL_GMX_TX_STAT0, 0);
> +	cvmx_write_csr(p->agl + AGL_GMX_TX_STAT1, 0);
>  
>  	/* Clear any pending interrupts */
> -	cvmx_write_csr(CVMX_MIXX_ISR(port), cvmx_read_csr(CVMX_MIXX_ISR(port)));
> +	cvmx_write_csr(p->mix + MIX_ISR, cvmx_read_csr(p->mix + MIX_ISR));
>  
>  	if (request_irq(p->irq, octeon_mgmt_interrupt, 0, netdev->name,
>  			netdev)) {
> @@ -826,18 +864,18 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	/* Interrupt every single RX packet */
>  	mix_irhwm.u64 = 0;
>  	mix_irhwm.s.irhwm = 0;
> -	cvmx_write_csr(CVMX_MIXX_IRHWM(port), mix_irhwm.u64);
> +	cvmx_write_csr(p->mix + MIX_IRHWM, mix_irhwm.u64);
>  
>  	/* Interrupt when we have 1 or more packets to clean.  */
>  	mix_orhwm.u64 = 0;
>  	mix_orhwm.s.orhwm = 1;
> -	cvmx_write_csr(CVMX_MIXX_ORHWM(port), mix_orhwm.u64);
> +	cvmx_write_csr(p->mix + MIX_ORHWM, mix_orhwm.u64);
>  
>  	/* Enable receive and transmit interrupts */
>  	mix_intena.u64 = 0;
>  	mix_intena.s.ithena = 1;
>  	mix_intena.s.othena = 1;
> -	cvmx_write_csr(CVMX_MIXX_INTENA(port), mix_intena.u64);
> +	cvmx_write_csr(p->mix + MIX_INTENA, mix_intena.u64);
>  
>  
>  	/* Enable packet I/O. */
> @@ -868,7 +906,7 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	 * frame.  GMX checks that the PREAMBLE is sent correctly.
>  	 */
>  	rxx_frm_ctl.s.pre_chk = 1;
> -	cvmx_write_csr(CVMX_AGL_GMX_RXX_FRM_CTL(port), rxx_frm_ctl.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_RX_FRM_CTL, rxx_frm_ctl.u64);
>  
>  	/* Enable the AGL block */
>  	agl_gmx_inf_mode.u64 = 0;
> @@ -876,13 +914,13 @@ static int octeon_mgmt_open(struct net_device *netdev)
>  	cvmx_write_csr(CVMX_AGL_GMX_INF_MODE, agl_gmx_inf_mode.u64);
>  
>  	/* Configure the port duplex and enables */
> -	prtx_cfg.u64 = cvmx_read_csr(CVMX_AGL_GMX_PRTX_CFG(port));
> +	prtx_cfg.u64 = cvmx_read_csr(p->agl + AGL_GMX_PRT_CFG);
>  	prtx_cfg.s.tx_en = 1;
>  	prtx_cfg.s.rx_en = 1;
>  	prtx_cfg.s.en = 1;
>  	p->last_duplex = 1;
>  	prtx_cfg.s.duplex = p->last_duplex;
> -	cvmx_write_csr(CVMX_AGL_GMX_PRTX_CFG(port), prtx_cfg.u64);
> +	cvmx_write_csr(p->agl + AGL_GMX_PRT_CFG, prtx_cfg.u64);
>  
>  	p->last_link = 0;
>  	netif_carrier_off(netdev);
> @@ -946,7 +984,6 @@ static int octeon_mgmt_stop(struct net_device *netdev)
>  static int octeon_mgmt_xmit(struct sk_buff *skb, struct net_device *netdev)
>  {
>  	struct octeon_mgmt *p = netdev_priv(netdev);
> -	int port = p->port;
>  	union mgmt_port_ring_entry re;
>  	unsigned long flags;
>  	int rv = NETDEV_TX_BUSY;
> @@ -990,7 +1027,7 @@ static int octeon_mgmt_xmit(struct sk_buff *skb, struct net_device *netdev)
>  	netdev->stats.tx_bytes += skb->len;
>  
>  	/* Ring the bell.  */
> -	cvmx_write_csr(CVMX_MIXX_ORING2(port), 1);
> +	cvmx_write_csr(p->mix + MIX_ORING2, 1);
>  
>  	rv = NETDEV_TX_OK;
>  out:
> @@ -1067,12 +1104,14 @@ static const struct net_device_ops octeon_mgmt_ops = {
>  #endif
>  };
>  
> -static int __devinit octeon_mgmt_probe(struct platform_device *pdev)
> +static int __devinit octeon_mgmt_probe(struct platform_device *pdev,
> +				       const struct of_device_id *match)

Same as my comments on the i2c patch; this driver does *not* need to
be converted into an of_platform_driver.  Normal platform_driver will
work just fine.

>  {
> -	struct resource *res_irq;
>  	struct net_device *netdev;
>  	struct octeon_mgmt *p;
> -	int i;
> +	const __be32 *data;
> +	const u8 *mac;
> +	int len;
>  
>  	netdev = alloc_etherdev(sizeof(struct octeon_mgmt));
>  	if (netdev == NULL)
> @@ -1086,14 +1125,21 @@ static int __devinit octeon_mgmt_probe(struct platform_device *pdev)
>  	p->netdev = netdev;
>  	p->dev = &pdev->dev;
>  
> -	p->port = pdev->id;
> +	data = of_get_property(pdev->dev.of_node, "unit-number", &len);
> +	if (data && len == sizeof(*data))
> +		p->port = be32_to_cpup(data);
> +	else
> +		p->port = 0;

Port numbers should be dynamically assigned.  If you *really* want to
assign logical names to devices, then add properties to the /aliases
node.  unit-number is the sort of thing that causes problems in the
future.
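A minimal sketch of the alias approach, assuming a board DTS carries an /aliases entry such as `mgmt1` pointing at the second management-port node (the alias name is hypothetical). The unit number is then taken from the digits that follow the alias stem instead of from a per-node "unit-number" property. `alias_unit_id()` is a hypothetical user-space helper, not a kernel API:

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: for an /aliases property name such as "mgmt1" and
 * the stem "mgmt", return the trailing unit id, or -1 if the name is not
 * of the form <stem><decimal digits>. */
static int alias_unit_id(const char *alias, const char *stem)
{
	size_t n = strlen(stem);
	int id = 0;

	if (strncmp(alias, stem, n) != 0 || alias[n] == '\0')
		return -1;
	for (const char *c = alias + n; *c != '\0'; c++) {
		if (!isdigit((unsigned char)*c))
			return -1;
		if (id > (INT_MAX - 9) / 10)	/* avoid signed overflow */
			return -1;
		id = id * 10 + (*c - '0');
	}
	return id;
}
```

For example, "mgmt12" with stem "mgmt" yields 12, while "mgmt" alone or "mgmt1a" is rejected. Nodes without an alias would simply get a dynamically assigned number.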

> +
>  	snprintf(netdev->name, IFNAMSIZ, "mgmt%d", p->port);
>  
> -	res_irq = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
> -	if (!res_irq)
> +	p->irq = irq_of_parse_and_map(pdev->dev.of_node, 0);
> +	if (!p->irq)
>  		goto err;

Drop this hunk

>  
> -	p->irq = res_irq->start;
> +	p->mix = (u64)of_iomap(pdev->dev.of_node, 0);
> +	p->agl = (u64)of_iomap(pdev->dev.of_node, 1);
> +
>  	spin_lock_init(&p->lock);
>  
>  	skb_queue_head_init(&p->tx_list);
> @@ -1104,15 +1150,15 @@ static int __devinit octeon_mgmt_probe(struct platform_device *pdev)
>  	netdev->netdev_ops = &octeon_mgmt_ops;
>  	netdev->ethtool_ops = &octeon_mgmt_ethtool_ops;
>  
> -	/* The mgmt ports get the first N MACs.  */
> -	for (i = 0; i < 6; i++)
> -		netdev->dev_addr[i] = octeon_bootinfo->mac_addr_base[i];
> -	netdev->dev_addr[5] += p->port;
> +	mac = of_get_mac_address(pdev->dev.of_node);
> +
> +	if (mac)
> +		memcpy(netdev->dev_addr, mac, 6);
>  
> -	if (p->port >= octeon_bootinfo->mac_addr_count)
> -		dev_err(&pdev->dev,
> -			"Error %s: Using MAC outside of the assigned range: %pM\n",
> -			netdev->name, netdev->dev_addr);
> +	p->phy_np = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
> +
> +	pdev->dev.coherent_dma_mask = DMA_BIT_MASK(64);
> +	pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
>  
>  	if (register_netdev(netdev))
>  		goto err;
> @@ -1133,10 +1179,19 @@ static int __devexit octeon_mgmt_remove(struct platform_device *pdev)
>  	return 0;
>  }
>  
> -static struct platform_driver octeon_mgmt_driver = {
> +static struct of_device_id octeon_mgmt_match[] = {
> +	{
> +		.compatible = "octeon,mgmt",

Ditto to the i2c comment.  The compatible value should be in the form
<vendor>,<soc-part-number>-<device>, and it needs to be documented in
Documentation/devicetree/bindings.
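A rough user-space check of the string shape described here. It only looks at punctuation (exactly one comma, a non-empty vendor prefix, and a '-' separating a non-empty part number from a non-empty device name), so it cannot tell whether the vendor prefix itself is the right one. The function name and the example strings are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Rough shape check for "<vendor>,<soc-part-number>-<device>". */
static bool compatible_has_vendor_soc_device_form(const char *compat)
{
	const char *comma = strchr(compat, ',');
	const char *dash;

	if (!comma || comma == compat)	/* need a non-empty vendor prefix */
		return false;
	if (strchr(comma + 1, ','))	/* exactly one comma */
		return false;
	dash = strrchr(comma + 1, '-');
	/* non-empty part number before the last '-', non-empty device after */
	return dash && dash > comma + 1 && dash[1] != '\0';
}
```

Under this check, the patch's "octeon,mgmt" fails because it has no part-number/device split, while a hypothetical "acme,soc1234-mgmt" passes.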

> +	},
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, octeon_mgmt_match);
> +
> +static struct of_platform_driver octeon_mgmt_driver = {

of_platform_driver is deprecated; just stick with a normal
platform_driver.
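With a plain platform_driver, as suggested here, the device-tree match table is still attached through `.driver.of_match_table` (the patch already sets it), and the probe keeps its single-argument form, so the extra `of_device_id` parameter goes away. The following is a simplified user-space model of matching a node against such a sentinel-terminated table. `model_of_device_id` and `model_match()` are hypothetical stand-ins, and the kernel's real matching logic differs in detail:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct of_device_id: only the compatible field. */
struct model_of_device_id {
	const char *compatible;
};

/* Walk a table terminated by an empty entry ({}) and return the first
 * entry equal to one of the node's compatible strings, trying the node's
 * strings in order (most specific first). */
static const struct model_of_device_id *
model_match(const struct model_of_device_id *table,
	    const char *const *node_compat, size_t ncompat)
{
	for (size_t i = 0; i < ncompat; i++)
		for (const struct model_of_device_id *m = table;
		     m->compatible != NULL; m++)
			if (strcmp(m->compatible, node_compat[i]) == 0)
				return m;
	return NULL;
}
```

The empty `{}` entry plays the same role as the sentinel at the end of `octeon_mgmt_match[]` in the patch: it is what stops the walk.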

>  	.driver = {
>  		.name		= "octeon_mgmt",
>  		.owner		= THIS_MODULE,
> +		.of_match_table = octeon_mgmt_match,
>  	},
>  	.probe		= octeon_mgmt_probe,
>  	.remove		= __devexit_p(octeon_mgmt_remove),
> @@ -1148,12 +1203,12 @@ static int __init octeon_mgmt_mod_init(void)
>  {
>  	/* Force our mdiobus driver module to be loaded first. */
>  	octeon_mdiobus_force_mod_depencency();
> -	return platform_driver_register(&octeon_mgmt_driver);
> +	return of_register_platform_driver(&octeon_mgmt_driver);

Drop this hunk

>  }
>  
>  static void __exit octeon_mgmt_mod_exit(void)
>  {
> -	platform_driver_unregister(&octeon_mgmt_driver);
> +	of_unregister_platform_driver(&octeon_mgmt_driver);

Ditto

>  }
>  
>  module_init(octeon_mgmt_mod_init);
> -- 
> 1.7.2.3
> 
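The core of this patch replaces per-port CSR address macros (`CVMX_MIXX_INTENA(port)` and friends) with a per-device base address plus a fixed byte offset (`p->mix + MIX_INTENA`). Below is a user-space sketch of that addressing scheme. Plain arrays stand in for two MIX register banks, and the `model_*` accessors stand in for `cvmx_read_csr()`/`cvmx_write_csr()`. The interrupt-enable bit position is hypothetical, and the driver's spinlock is omitted:

```c
#include <assert.h>
#include <stdint.h>

#define MIX_INTENA	0x50		/* byte offset, as defined in the patch */
#define MIX_BANK_WORDS	16		/* 0x80 bytes of 64-bit registers */
#define MODEL_ITHENA	(1ull << 0)	/* hypothetical bit position */

/* Stand-ins for cvmx_read_csr()/cvmx_write_csr(): here a CSR "address"
 * is simply a host pointer value held in an integer. */
static uint64_t model_read_csr(uintptr_t addr)
{
	return *(volatile uint64_t *)addr;
}

static void model_write_csr(uintptr_t addr, uint64_t val)
{
	*(volatile uint64_t *)addr = val;
}

struct model_mgmt {
	uintptr_t mix;	/* bank base, e.g. as obtained from the device tree */
};

/* Read-modify-write of one field, addressed relative to the bank base. */
static void model_set_rx_irq(struct model_mgmt *p, int enable)
{
	uint64_t v;

	assert(p->mix % sizeof(uint64_t) == 0);
	v = model_read_csr(p->mix + MIX_INTENA);
	v = enable ? (v | MODEL_ITHENA) : (v & ~MODEL_ITHENA);
	model_write_csr(p->mix + MIX_INTENA, v);
}
```

Two instances with different bases touch independent banks, which is why the port number is no longer needed to compute register addresses.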


* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Patrick McHardy @ 2011-02-23 16:24 UTC
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Stephen Hemminger, netdev, Andi Kleen
In-Reply-To: <1298474091.3301.364.camel@edumazet-laptop>

On 23.02.2011 16:14, Eric Dumazet wrote:
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index 16626a0..f40d32e 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -218,6 +218,7 @@ struct tcf_proto {
>  
>  struct qdisc_skb_cb {
>  	unsigned int		pkt_len;
> +	unsigned int		sfb_classid;
>  	char			data[];
>  };

This could be moved into an SFB-specific cb, similar to what netem
does.
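A user-space sketch of such a layout, modeled on the way netem overlays private state on `qdisc_skb_cb()->data`. The 48-byte buffer mirrors `sk_buff`'s control block. The `model_*` names and the choice to keep one hash per bin set are assumptions for illustration, and a compile-time check confirms that the private part fits:

```c
#include <assert.h>
#include <stdint.h>

/* 48 bytes, like the sk_buff control block. */
struct model_skb {
	_Alignas(8) char cb[48];
};

/* Generic qdisc layer state; per-qdisc private data follows in data[]. */
struct model_qdisc_skb_cb {
	unsigned int pkt_len;
	char data[];
};

/* Hypothetical SFB-private state: one hash per bin set (slot). */
struct model_sfb_skb_cb {
	uint32_t hashes[2];
};

static struct model_qdisc_skb_cb *model_qdisc_cb(struct model_skb *skb)
{
	return (struct model_qdisc_skb_cb *)skb->cb;
}

static struct model_sfb_skb_cb *model_sfb_cb(struct model_skb *skb)
{
	/* Fail the build if the private area would overflow the cb. */
	_Static_assert(sizeof(struct model_qdisc_skb_cb) +
		       sizeof(struct model_sfb_skb_cb) <=
		       sizeof(((struct model_skb *)0)->cb),
		       "SFB cb too large");
	return (struct model_sfb_skb_cb *)model_qdisc_cb(skb)->data;
}
```

This keeps `struct qdisc_skb_cb` itself unchanged for every other qdisc, while SFB gets room for both hash values.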

> diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
> new file mode 100644
> index 0000000..b7f1c6e
> --- /dev/null
> +++ b/net/sched/sch_sfb.c
> @@ -0,0 +1,696 @@
> +/*
> + * net/sched/sch_sfb.c	  Stochastic Fair Blue
> + *
> + * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
> + * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: 
> + * A New Class of Active Queue Management Algorithms. 
> + * U. Michigan CSE-TR-387-99, April 1999.
> + *
> + * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <linux/random.h>
> +#include <linux/jhash.h>
> +#include <net/ip.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +
> +/*
> + * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
> + * This implementation uses L = 8 and N = 16
> + * This permits us to split one 32bit hash (provided per packet by rxhash or
> + * external classifier) into 8 subhashes of 4 bits.
> + */
> +#define SFB_BUCKET_SHIFT 4

If you want to make this dynamic, there are a couple of papers analyzing
combined hash functions for Bloom filters, for instance
"Less Hashing, Same Performance: Building a Better Bloom Filter".
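The quoted comment derives the eight per-level bucket indices by slicing one 32-bit hash into 4-bit pieces. That ties the number of levels and buckets to the hash width. The cited paper (Kirsch and Mitzenmacher) instead derives the i-th index from two base hashes as g_i = (h1 + i * h2) mod N. The following user-space sketch contrasts the two approaches. The function names are hypothetical, and `double_hash()` is an illustration of the paper's technique, not code from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define SFB_LEVELS		8
#define SFB_BUCKET_SHIFT	4
#define SFB_BUCKET_MASK		((1u << SFB_BUCKET_SHIFT) - 1)

/* Scheme in the patch: 8 levels x 4 bits consume exactly 32 hash bits. */
static void split_hash(uint32_t hash, uint32_t idx[SFB_LEVELS])
{
	for (int i = 0; i < SFB_LEVELS; i++) {
		idx[i] = hash & SFB_BUCKET_MASK;
		hash >>= SFB_BUCKET_SHIFT;
	}
}

/* Kirsch-Mitzenmacher double hashing: g_i = (h1 + i * h2) mod nbuckets.
 * Levels and bucket count become free parameters; 64-bit arithmetic
 * avoids wraparound before the final reduction. */
static void double_hash(uint32_t h1, uint32_t h2, uint32_t nlevels,
			uint32_t nbuckets, uint32_t *idx)
{
	assert(nbuckets > 0);
	for (uint32_t i = 0; i < nlevels; i++)
		idx[i] = (uint32_t)(((uint64_t)h1 + (uint64_t)i * h2) % nbuckets);
}
```

One caveat of double hashing is that if h2 is a multiple of the bucket count, every level maps to the same bucket. Practical variants therefore constrain h2, for example by forcing it to be odd when the bucket count is a power of two.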

> +/*
> + * If using 'internal' SFB flow classifier, sfb_classid is skb rxhash
> + * If using external classifier, sfb_classid contains the classid.
> + */
> +static u32 sfb_hash(const struct sk_buff *skb, u32 slot,
> +		    struct sfb_sched_data *q)
> +{
> +	return jhash_1word(qdisc_skb_cb(skb)->sfb_classid,
> +			   q->bins[slot].perturbation);
> +}
> +
> +/* Probabilities are coded as Q0.16 fixed-point values,
> + * with 0xFFFF representing 65535/65536 (almost 1.0)
> + * Addition and subtraction are saturating in [0, 65535]
> + */
> +static u32 prob_plus(u32 p1, u32 p2)
> +{
> +	u32 res = p1 + p2;
> +
> +	return min_t(u32, res, SFB_MAX_PROB);
> +}
> +
> +static u32 prob_minus(u32 p1, u32 p2)
> +{
> +	return p1 > p2 ? p1 - p2 : 0;
> +}
> +
> +static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
> +{
> +	int i;
> +	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
> +
> +	for (i = 0; i < SFB_LEVELS; i++) {
> +		u32 hash = sfbhash & SFB_BUCKET_MASK;
> +
> +		sfbhash >>= SFB_BUCKET_SHIFT;
> +		if (b[hash].qlen < 0xFFFF)
> +			b[hash].qlen++;
> +		b += SFB_NUMBUCKETS; /* next level */
> +	}
> +}
> +
> +static void increment_qlen(u32 hashes[2], struct sfb_sched_data *q)
> +{
> +	u32 slot = q->slot;
> +
> +	increment_one_qlen(hashes[slot], slot, q);
> +	if (q->double_buffering) {
> +		slot ^= 1;
> +		increment_one_qlen(hashes[slot], slot, q);
> +	}
> +}
> +
> +static void decrement_one_qlen(u32 sfbhash, u32 slot,
> +			       struct sfb_sched_data *q)
> +{
> +	int i;
> +	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
> +
> +	for (i = 0; i < SFB_LEVELS; i++) {
> +		u32 hash = sfbhash & SFB_BUCKET_MASK;
> +
> +		sfbhash >>= SFB_BUCKET_SHIFT;
> +		if (b[hash].qlen > 0)
> +			b[hash].qlen--;
> +		b += SFB_NUMBUCKETS; /* next level */
> +	}
> +}
> +
> +static void decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
> +{
> +	u32 slot = q->slot;
> +	u32 sfbhash = sfb_hash(skb, slot, q);
> +
> +	decrement_one_qlen(sfbhash, slot, q);
> +	if (q->double_buffering) {

This needs to be a per-skb property, otherwise you could have the
situation:

- enqueue skb, double_buffering=0, increment buffer 0
- enable double buffering
- swap buffers
- dequeue same skb, decrement buffer 1

after which the qlen values of buffer 1 will be incorrect.
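A minimal user-space sketch of making the accounting a per-skb property. The skb remembers both hash values and which bin sets (slots) it was charged to at enqueue time, and dequeue undoes exactly those charges. The `model_*` names, the `charged` bitmask, and the flattened two-slot layout are assumptions for illustration, not the patch's data structures. A buffer swap is modeled as simply flipping `q->slot`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LEVELS	8
#define BUCKETS	16			/* 4 hash bits per level */

struct model_q {
	uint16_t qlen[2][LEVELS][BUCKETS];	/* two bin sets ("slots") */
	unsigned int slot;			/* currently active slot */
	int double_buffering;
};

/* Hypothetical per-skb state kept in the skb control block. */
struct model_skb {
	uint32_t hashes[2];		/* one hash per slot */
	uint8_t charged;		/* bit s set: slot s was incremented */
};

static void adjust_one(struct model_q *q, unsigned int slot, uint32_t h,
		       int delta)
{
	for (int i = 0; i < LEVELS; i++) {
		uint16_t *b = &q->qlen[slot][i][h & (BUCKETS - 1)];

		h >>= 4;
		if (delta > 0 && *b < 0xFFFF)
			(*b)++;
		else if (delta < 0 && *b > 0)
			(*b)--;
	}
}

static void model_enqueue(struct model_q *q, struct model_skb *skb)
{
	unsigned int s = q->slot;

	adjust_one(q, s, skb->hashes[s], +1);
	skb->charged = (uint8_t)(1u << s);
	if (q->double_buffering) {
		s ^= 1;
		adjust_one(q, s, skb->hashes[s], +1);
		skb->charged |= (uint8_t)(1u << s);
	}
}

/* Undo exactly what enqueue did, whatever q->slot and
 * q->double_buffering are by now. */
static void model_dequeue(struct model_q *q, const struct model_skb *skb)
{
	for (unsigned int s = 0; s < 2; s++)
		if (skb->charged & (1u << s))
			adjust_one(q, s, skb->hashes[s], -1);
}
```

The sequence described above plays out as follows. Enqueue skb A while double buffering is off (slot 0 only), enable double buffering and flip `q->slot`, enqueue skb B (both slots), then dequeue A. Only A's slot-0 bins are decremented, so B's slot-1 counts stay intact even if A's slot-1 hash collides with B's. This stays correct only if a swap does not clear bins that still hold charged packets, which is the trade-off discussed elsewhere in the thread.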


> +		slot ^= 1;
> +		sfbhash = sfb_hash(skb, slot, q);

Isn't there room in the cb to store both hash values?

> +		decrement_one_qlen(sfbhash, slot, q);
> +	}
> +}
> +
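The quoted code describes marking probabilities as saturating Q0.16 fixed-point values. Below is a standalone user-space copy of the two helpers, plus a conversion to a real number and a minimum over a packet's levels. The SFB paper marks a packet with the minimum marking probability of the bins it hashes to. `SFB_MAX_PROB` is assumed to be 0xFFFF, as the comment implies, because its definition is not part of the quoted hunk. The extra helpers are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: matches SFB_MAX_PROB in the patch (65535/65536). */
#define MODEL_SFB_MAX_PROB 0xFFFFu

static uint32_t prob_plus(uint32_t p1, uint32_t p2)
{
	uint32_t res = p1 + p2;		/* cannot wrap for Q0.16 inputs */

	return res < MODEL_SFB_MAX_PROB ? res : MODEL_SFB_MAX_PROB;
}

static uint32_t prob_minus(uint32_t p1, uint32_t p2)
{
	return p1 > p2 ? p1 - p2 : 0;
}

/* Q0.16 to a real number in [0, 1). */
static double prob_to_double(uint32_t p)
{
	return p / 65536.0;
}

/* Minimum probability across the n bins a packet maps to (n > 0). */
static uint32_t prob_min(const uint32_t *p, int n)
{
	uint32_t m = p[0];

	for (int i = 1; i < n; i++)
		if (p[i] < m)
			m = p[i];
	return m;
}
```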


* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Patrick McHardy @ 2011-02-23 16:20 UTC
  To: Eric Dumazet
  Cc: Stephen Hemminger, David Miller, Juliusz Chroboczek,
	John W. Linville, netdev, Andi Kleen
In-Reply-To: <1298477600.3301.367.camel@edumazet-laptop>

On 23.02.2011 17:13, Eric Dumazet wrote:
> On Wednesday, 23 February 2011 at 07:43 -0800, Stephen Hemminger wrote:
>> On Wed, 23 Feb 2011 16:14:51 +0100
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>> 1) SFB default child qdisc is pfifo_fast. It can be changed by another
>>> qdisc but a child qdisc MUST not drop a packet previously queued. This
>>> is because SFB needs to handle a dequeued packet in order to maintain
>>> its virtual queue states. pfifo_head_drop or CHOKe should not be used.
>>
>> Why not add a flag field to Qdisc_ops to mark qdiscs that
>> are (or are not) work-conserving?
>>
> 
> That was my initial idea, but I have no idea how to implement it (outside
> of the fast path, I mean...)

This also doesn't really have anything to do with work-conserving
qdiscs: SFB, for instance, is work-conserving but might still drop other
packets. Actually, I don't think any qdisc besides the
*fifos can reasonably be used with SFB, so we might as well
only support a built-in qdisc.



* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 16:13 UTC
  To: Stephen Hemminger
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen
In-Reply-To: <20110223074305.755c5713@nehalam>

On Wednesday, 23 February 2011 at 07:43 -0800, Stephen Hemminger wrote:
> On Wed, 23 Feb 2011 16:14:51 +0100
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > 1) SFB default child qdisc is pfifo_fast. It can be changed by another
> > qdisc but a child qdisc MUST not drop a packet previously queued. This
> > is because SFB needs to handle a dequeued packet in order to maintain
> > its virtual queue states. pfifo_head_drop or CHOKe should not be used.
> 
> Why not add a flag field to Qdisc_ops to mark qdiscs that
> are (or are not) work-conserving?
> 

That was my initial idea, but I have no idea how to implement it (outside
of the fast path, I mean...)





* Re: [v3 RFC PATCH 0/4] Implement multiqueue virtio-net
From: Michael S. Tsirkin @ 2011-02-23 15:55 UTC
  To: Krishna Kumar2
  Cc: anthony, arnd, avi, davem, eric.dumazet, Simon Horman, kvm,
	netdev, rusty
In-Reply-To: <OFA5C641DD.CCB1702C-ON65257840.0024C22E-65257840.002545D9@in.ibm.com>

On Wed, Feb 23, 2011 at 12:18:36PM +0530, Krishna Kumar2 wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> wrote on 02/23/2011 12:09:15 PM:
> 
> Hi Michael,
> 
> > > Yes. Michael Tsirkin had wanted to see how the MQ RX patch
> > > would look like, so I was in the process of getting the two
> > > working together. The patch is ready and is being tested.
> > > Should I send a RFC patch at this time?
> >
> > Yes, please do.
> 
> Sure, will get a build/test on latest bits and send in 1-2 days.
> 
> > > The TX-only patch helped the guest TX path but didn't help
> > > host->guest much (as tested using TCP_MAERTS from the guest).
> > > But with the TX+RX patch, both directions are getting
> > > improvements.
> >
> > Also, my hope is that with appropriate queue mapping,
> > we might be able to do away with heuristics to detect
> > single stream load that TX only code needs.
> 
> Yes, that whole stuff is removed, and the TX/RX path is
> unchanged with this patch (thankfully :)

Cool. I was wondering whether, in that case, we could
do without host kernel changes at all
and use a separate fd for each TX/RX pair.
The advantage of that approach is that
the max fd limit naturally sets an upper bound
on the amount of resources userspace can use up.

Thoughts?

In any case, please don't let the above delay
sending an RFC.

> > > Remote testing is still to be done.
> >
> > Others might be able to help here once you post the patch.
> 
> That's great, will appreciate any help.
> 
> Thanks,
> 
> - KK


* Re: [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Stephen Hemminger @ 2011-02-23 15:43 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, Juliusz Chroboczek, John W. Linville,
	Patrick McHardy, netdev, Andi Kleen
In-Reply-To: <1298474091.3301.364.camel@edumazet-laptop>

On Wed, 23 Feb 2011 16:14:51 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> 1) SFB default child qdisc is pfifo. It can be replaced by another
> qdisc, but a child qdisc MUST NOT drop a packet it has already queued,
> because SFB must see every packet at dequeue() time to keep its virtual
> queue state consistent. pfifo_head_drop or CHOKe should not be used.

Why not add a flags field to Qdisc_ops to mark qdiscs that
are (or are not) work-conserving?

-- 


* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Cong Wang @ 2011-02-23 15:34 UTC (permalink / raw)
  To: Neil Horman
  Cc: linux-kernel, Herbert Xu, David S. Miller, Neil Horman,
	Eric Dumazet, netdev
In-Reply-To: <20110223153041.GE11496@hmsreliant.think-freely.org>

On 2011-02-23 23:30, Neil Horman wrote:
>
> seems reasonable, thanks!
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
>

Thanks.


* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Neil Horman @ 2011-02-23 15:30 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, Herbert Xu, David S. Miller, Neil Horman,
	Eric Dumazet, netdev
In-Reply-To: <4D6526BD.6030306@redhat.com>

On Wed, Feb 23, 2011 at 11:24:45PM +0800, Cong Wang wrote:
> Neil, ping. :)

seems reasonable, thanks!
Acked-by: Neil Horman <nhorman@tuxdriver.com>


* Re: [V4 PATCH 3/3] bond: service netpoll arp queue on master device
From: Cong Wang @ 2011-02-23 15:24 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: linux-kernel, Neil Horman, Herbert Xu, David S. Miller,
	Neil Horman, Eric Dumazet, netdev
In-Reply-To: <1298022215-21059-3-git-send-email-amwang@redhat.com>

Neil, ping. :)

On 2011-02-18 17:43, Amerigo Wang wrote:
> Neil pointed out that we can't send ARP replies on behalf of slaves;
> we need to move the ARP queue to their bond device.
>
> Signed-off-by: WANG Cong <amwang@redhat.com>
> Cc: Neil Horman <nhorman@redhat.com>
>
> ---
>   net/core/netpoll.c |    9 +++++++++
>   1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/net/core/netpoll.c b/net/core/netpoll.c
> index f68e694..013e04a 100644
> --- a/net/core/netpoll.c
> +++ b/net/core/netpoll.c
> @@ -193,6 +193,15 @@ void netpoll_poll_dev(struct net_device *dev)
>
>   	poll_napi(dev);
>
> +	if (dev->priv_flags & IFF_SLAVE) {
> +		if (dev->npinfo) {
> +			struct net_device *bond_dev = dev->master;
> +			struct sk_buff *skb;
> +			while ((skb = skb_dequeue(&dev->npinfo->arp_tx)))
> +				skb_queue_tail(&bond_dev->npinfo->arp_tx, skb);
> +		}
> +	}
> +
>   	service_arp_queue(dev->npinfo);
>
>   	zap_completion_queue();


-- 
The more people I know, the more I like dogs.


* [PATCH net-next-2.6 v3] net_sched: SFB flow scheduler
From: Eric Dumazet @ 2011-02-23 15:14 UTC (permalink / raw)
  To: David Miller, Juliusz Chroboczek
  Cc: John W. Linville, Stephen Hemminger, Patrick McHardy, netdev,
	Andi Kleen
In-Reply-To: <1298390536.2861.9.camel@edumazet-laptop>

Hi David & Juliusz

Here is v3 of SFB (previous versions were from Juliusz).

Thanks

[PATCH net-next-2.6 v3] net_sched: SFB flow scheduler

This is the Stochastic Fair Blue scheduler, based on work from:

W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.

http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf

This implementation is based on work done by Juliusz Chroboczek.

The general SFB algorithm is given in figure 14, page 15:

B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
   if (B[i][h{i}].qlen > bin_size)
      B[i][h{i}].pm += delta;
   else if (B[i][h{i}].qlen == 0)
      B[i][h{i}].pm -= delta;
pmin = min(B[0][h{0}].pm ... B[L-1][h{L-1}].pm);
if (pmin == 1.0)
    ratelimit();
else
    mark/drop with probability pmin;
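The figure-14 pseudocode above can be rendered as a small, self-contained C sketch of one enqueue step. It assumes L = 8 and N = 16 (the values the patch uses), Q0.16 probabilities saturating at 0xFFFF, a single delta exactly as in the pseudocode (the patch itself uses separate increment/decrement values and tests qlen >= target), and a caller-supplied random value r so the marking decision is deterministic. The `sk_*` names are hypothetical; rate limiting is not modelled, the function only reports that it would apply.

```c
#include <stdint.h>

#define SK_LEVELS   8        /* L: number of levels (hash functions) */
#define SK_BUCKETS  16       /* N: bins per level */
#define SK_MAX_PROB 0xFFFFu  /* Q0.16 value treated as "1.0" */

struct sk_bin {
	uint32_t qlen;  /* virtual queue length of this bin */
	uint32_t pm;    /* marking probability, Q0.16 */
};

enum sk_verdict { SK_ENQUEUE, SK_MARK, SK_RATELIMIT };

/* h[i] is the bin index (0..N-1) chosen at level i for this packet;
 * r is a uniform random value in [0, 0xFFFF]. */
static enum sk_verdict sk_enqueue(struct sk_bin B[SK_LEVELS][SK_BUCKETS],
				  const unsigned h[SK_LEVELS],
				  uint32_t bin_size, uint32_t delta, uint32_t r)
{
	uint32_t pmin = SK_MAX_PROB;

	for (int i = 0; i < SK_LEVELS; i++) {
		struct sk_bin *b = &B[i][h[i]];

		if (b->qlen > bin_size)          /* bin congested: raise pm */
			b->pm = (b->pm + delta > SK_MAX_PROB) ?
				SK_MAX_PROB : b->pm + delta;
		else if (b->qlen == 0)           /* bin idle: lower pm */
			b->pm = (b->pm > delta) ? b->pm - delta : 0;
		if (b->pm < pmin)
			pmin = b->pm;
	}
	if (pmin == SK_MAX_PROB)     /* every level saturated: inelastic flow */
		return SK_RATELIMIT;
	return (r < pmin) ? SK_MARK : SK_ENQUEUE;  /* mark with prob. pmin */
}
```

Taking the minimum over levels is what protects a well-behaved flow: it is only marked if all of its bins are congested, which is unlikely unless it actually shares every bin with a misbehaving flow.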

I adapted Juliusz's code to current kernel standards and made various
changes to address previous comments:

http://thread.gmane.org/gmane.linux.network/90225
http://thread.gmane.org/gmane.linux.network/90375

The default flow classifier is the rxhash introduced by RPS in 2.6.35, but
an external flow classifier can be used instead if desired:

tc qdisc add dev $IFB parent 1:11 handle 11:  \
	est 0.5sec 2sec sfb limit 128

tc filter add dev $IFB protocol ip parent 11: handle 3 \
	flow hash keys dst divisor 1024

Notes:

1) SFB default child qdisc is pfifo. It can be replaced by another
qdisc, but a child qdisc MUST NOT drop a packet it has already queued,
because SFB must see every packet at dequeue() time to keep its virtual
queue state consistent. pfifo_head_drop or CHOKe should not be used.

2) I added one field in qdisc_skb_cb because SFB needs to remember the
hash/classid of an skb to decrement the virtual queue lengths at dequeue()
time.

3) ECN is enabled by default, unlike RED/CHOKe/GRED.
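Note 2 and the 8 x 16 bin layout can be illustrated with a short sketch. It assumes, as the patch does, that one 32-bit value is cut into eight 4-bit bin indices (lowest nibble for level 0) and that bin lengths saturate at 0xFFFF and never go below zero. `split_hash()`, `vq_account()` and `vq_total()` are hypothetical names. The point is that dequeue must use the same value as enqueue: if it uses a different one (for example a hash computed under a different perturbation), it decrements the wrong bins and the original ones stay inflated.

```c
#include <stdint.h>

#define SUB_SHIFT   4                  /* bits per sub-hash */
#define SUB_LEVELS  (32 / SUB_SHIFT)   /* 8 levels */
#define SUB_BUCKETS (1u << SUB_SHIFT)  /* 16 bins per level */
#define SUB_MASK    (SUB_BUCKETS - 1)

struct vq {
	uint16_t qlen[SUB_LEVELS][SUB_BUCKETS];  /* virtual queue lengths */
};

/* Cut one 32-bit hash into 8 bin indices, lowest nibble = level 0. */
static void split_hash(uint32_t hash, unsigned idx[SUB_LEVELS])
{
	for (int i = 0; i < SUB_LEVELS; i++) {
		idx[i] = hash & SUB_MASK;
		hash >>= SUB_SHIFT;
	}
}

/* Saturating +1 on enqueue, floored -1 on dequeue, at every level. */
static void vq_account(struct vq *q, uint32_t hash, int enqueue)
{
	unsigned idx[SUB_LEVELS];

	split_hash(hash, idx);
	for (int i = 0; i < SUB_LEVELS; i++) {
		uint16_t *len = &q->qlen[i][idx[i]];

		if (enqueue && *len < 0xFFFF)
			(*len)++;
		else if (!enqueue && *len > 0)
			(*len)--;
	}
}

/* Sum of all virtual queue lengths, used to spot leftovers. */
static unsigned long vq_total(const struct vq *q)
{
	unsigned long sum = 0;

	for (int i = 0; i < SUB_LEVELS; i++)
		for (unsigned j = 0; j < SUB_BUCKETS; j++)
			sum += q->qlen[i][j];
	return sum;
}
```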

With help from Patrick McHardy & Andi Kleen

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
---
 include/linux/pkt_sched.h |   38 +
 include/net/sch_generic.h |    1 
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_sfb.c       |  696 ++++++++++++++++++++++++++++++++++++
 5 files changed, 747 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index d4bb6f5..629a8b0 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -522,4 +522,42 @@ struct tc_mqprio_qopt {
 	__u16	offset[TC_QOPT_MAX_QUEUE];
 };
 
+/* SFB */
+
+enum {
+	TCA_SFB_UNSPEC,
+	TCA_SFB_PARMS,
+	__TCA_SFB_MAX,
+};
+
+#define TCA_SFB_MAX (__TCA_SFB_MAX - 1)
+
+/*
+ * Note: increment, decrement are Q0.16 fixed-point values.
+ */
+struct tc_sfb_qopt {
+	__u32 rehash_interval;	/* delay between hash flip, in seconds */
+	__u32 db_interval;	/* double buffering interval in seconds (db_interval < rehash_interval) */
+	__u32 max;		/* max len of qlen_min */
+	__u32 target;		/* bin_size */
+	__u32 increment;	/* delta, (d1 in Blue) */
+	__u32 decrement;	/* delta, (d2 in Blue) */
+	__u32 limit;		/* max SFB queue length */
+	__u32 penalty_rate;
+	__u32 penalty_burst;
+};
+
+struct tc_sfb_xstats {
+	__u32 earlydrop;
+	__u32 penaltydrop;
+	__u32 bucketdrop;
+	__u32 queuedrop;
+	__u32 childdrop; /* drops in child qdisc */
+	__u32 marked;
+	__u32 maxqlen;
+	__u32 maxprob;
+};
+
+#define SFB_MAX_PROB 0xFFFF
+
 #endif
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 16626a0..f40d32e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -218,6 +218,7 @@ struct tcf_proto {
 
 struct qdisc_skb_cb {
 	unsigned int		pkt_len;
+	unsigned int		sfb_classid;
 	char			data[];
 };
 
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 8c19b6e..a7a5583 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -126,6 +126,17 @@ config NET_SCH_RED
 	  To compile this code as a module, choose M here: the
 	  module will be called sch_red.
 
+config NET_SCH_SFB
+	tristate "Stochastic Fair Blue (SFB)"
+	---help---
+	  Say Y here if you want to use the Stochastic Fair Blue (SFB)
+	  packet scheduling algorithm.
+
+	  See the top of <file:net/sched/sch_sfb.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_sfb.
+
 config NET_SCH_SFQ
 	tristate "Stochastic Fairness Queueing (SFQ)"
 	---help---
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 06c6cdf..2e77b8d 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_NET_SCH_RED)	+= sch_red.o
 obj-$(CONFIG_NET_SCH_GRED)	+= sch_gred.o
 obj-$(CONFIG_NET_SCH_INGRESS)	+= sch_ingress.o 
 obj-$(CONFIG_NET_SCH_DSMARK)	+= sch_dsmark.o
+obj-$(CONFIG_NET_SCH_SFB)	+= sch_sfb.o
 obj-$(CONFIG_NET_SCH_SFQ)	+= sch_sfq.o
 obj-$(CONFIG_NET_SCH_TBF)	+= sch_tbf.o
 obj-$(CONFIG_NET_SCH_TEQL)	+= sch_teql.o
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
new file mode 100644
index 0000000..b7f1c6e
--- /dev/null
+++ b/net/sched/sch_sfb.c
@@ -0,0 +1,696 @@
+/*
+ * net/sched/sch_sfb.c	  Stochastic Fair Blue
+ *
+ * Copyright (c) 2008-2011 Juliusz Chroboczek <jch@pps.jussieu.fr>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: 
+ * A New Class of Active Queue Management Algorithms. 
+ * U. Michigan CSE-TR-387-99, April 1999.
+ *
+ * http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/random.h>
+#include <linux/jhash.h>
+#include <net/ip.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+
+/*
+ * SFB uses two B[l][n] : L x N arrays of bins (L levels, N bins per level)
+ * This implementation uses L = 8 and N = 16
+ * This permits us to split one 32bit hash (provided per packet by rxhash or
+ * external classifier) into 8 subhashes of 4 bits.
+ */
+#define SFB_BUCKET_SHIFT 4
+#define SFB_NUMBUCKETS	(1 << SFB_BUCKET_SHIFT) /* N bins per Level */
+#define SFB_BUCKET_MASK (SFB_NUMBUCKETS - 1)
+#define SFB_LEVELS	(32 / SFB_BUCKET_SHIFT) /* L */
+
+/* SFB algo uses a virtual queue, named "bin" */
+struct sfb_bucket {
+	u16		qlen; /* length of virtual queue */
+	u16		pm; /* marking probability */
+};
+
+/* We use a double buffering right before hash change
+ * (Section 4.4 of SFB reference : moving hash functions)
+ */
+struct sfb_bins {
+	u32		  perturbation; /* jhash perturbation */
+	struct sfb_bucket bins[SFB_LEVELS][SFB_NUMBUCKETS];
+};
+
+struct sfb_sched_data {
+	struct Qdisc	*qdisc;
+	struct tcf_proto *filter_list;
+	unsigned long	rehash_interval;
+	unsigned long	db_interval;	/* interval of double buffering */
+	u32		max;
+	u32		target;		/* bin_size */
+	u32		increment;	/* d1 */
+	u32		decrement;	/* d2 */
+	u32		limit;		/* HARD maximal queue length */
+	u32		penalty_rate;
+	u32		penalty_burst;
+	u32		tokens_avail;
+	unsigned long	rehash_time;
+	unsigned long	token_time;
+
+	u8		slot;		/* current active bins (0 or 1) */
+	bool		double_buffering;
+	struct sfb_bins bins[2];
+
+	struct {
+		u32	earlydrop;
+		u32	penaltydrop;
+		u32	bucketdrop;
+		u32	queuedrop;
+		u32	childdrop;	/* drops in child qdisc */
+		u32	marked;		/* ECN mark */
+	} stats;
+};
+
+/*
+ * If using 'internal' SFB flow classifier, sfb_classid is skb rxhash
+ * If using external classifier, sfb_classid contains the classid.
+ */
+static u32 sfb_hash(const struct sk_buff *skb, u32 slot,
+		    struct sfb_sched_data *q)
+{
+	return jhash_1word(qdisc_skb_cb(skb)->sfb_classid,
+			   q->bins[slot].perturbation);
+}
+
+/* Probabilities are coded as Q0.16 fixed-point values,
+ * with 0xFFFF representing 65535/65536 (almost 1.0)
+ * Addition and subtraction are saturating in [0, 65535]
+ */
+static u32 prob_plus(u32 p1, u32 p2)
+{
+	u32 res = p1 + p2;
+
+	return min_t(u32, res, SFB_MAX_PROB);
+}
+
+static u32 prob_minus(u32 p1, u32 p2)
+{
+	return p1 > p2 ? p1 - p2 : 0;
+}
+
+static void increment_one_qlen(u32 sfbhash, u32 slot, struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen < 0xFFFF)
+			b[hash].qlen++;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void increment_qlen(u32 hashes[2], struct sfb_sched_data *q)
+{
+	u32 slot = q->slot;
+
+	increment_one_qlen(hashes[slot], slot, q);
+	if (q->double_buffering) {
+		slot ^= 1;
+		increment_one_qlen(hashes[slot], slot, q);
+	}
+}
+
+static void decrement_one_qlen(u32 sfbhash, u32 slot,
+			       struct sfb_sched_data *q)
+{
+	int i;
+	struct sfb_bucket *b = &q->bins[slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b[hash].qlen > 0)
+			b[hash].qlen--;
+		b += SFB_NUMBUCKETS; /* next level */
+	}
+}
+
+static void decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	u32 slot = q->slot;
+	u32 sfbhash = sfb_hash(skb, slot, q);
+
+	decrement_one_qlen(sfbhash, slot, q);
+	if (q->double_buffering) {
+		slot ^= 1;
+		sfbhash = sfb_hash(skb, slot, q);
+		decrement_one_qlen(sfbhash, slot, q);
+	}
+}
+
+static void decrement_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->pm =	prob_minus(b->pm, q->decrement);
+}
+
+static void increment_prob(struct sfb_bucket *b, struct sfb_sched_data *q)
+{
+	b->pm = prob_plus(b->pm, q->increment);
+}
+
+static void sfb_zero_all_buckets(int slot, struct sfb_sched_data *q)
+{
+	memset(&q->bins[slot], 0, sizeof(q->bins[slot]));
+}
+
+/*
+ * compute max qlen and max pm
+ */
+static u32 sfb_compute_qlen(u32 *prob_r, const struct sfb_sched_data *q)
+{
+	int i;
+	u32 qlen = 0, prob = 0;
+	const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
+
+	for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
+		if (qlen < b->qlen)
+			qlen = b->qlen;
+		if (prob < b->pm)
+			prob = b->pm;
+		b++;
+	}
+	*prob_r = prob;
+	return qlen;
+}
+
+
+static void sfb_init_perturbation(u32 slot, struct sfb_sched_data *q)
+{
+	q->bins[slot].perturbation = net_random();
+}
+
+static void sfb_swap_buffers(struct sfb_sched_data *q)
+{
+	sfb_zero_all_buckets(q->slot, q);
+	sfb_init_perturbation(q->slot, q);
+	q->slot ^= 1;
+	q->double_buffering = false;
+}
+
+/* Non elastic flows are allowed to use part of the bandwidth, expressed
+ * in "penalty_rate" packets per second, with "penalty_burst" burst
+ */
+static bool sfb_rate_limit(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+	if (q->penalty_rate == 0 || q->penalty_burst == 0)
+		return true;
+
+	if (q->tokens_avail < 1) {
+		unsigned long age = min(10UL * HZ, jiffies - q->token_time);
+
+		q->tokens_avail = (age * q->penalty_rate) / HZ;
+		if (q->tokens_avail > q->penalty_burst)
+			q->tokens_avail = q->penalty_burst;
+		q->token_time = jiffies;
+		if (q->tokens_avail < 1)
+			return true;
+	}
+
+	q->tokens_avail--;
+	return false;
+}
+
+static bool sfb_classify(struct sk_buff *skb, struct sfb_sched_data *q,
+			 int *qerr)
+{
+	struct tcf_result res;
+	int result;
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return false;
+		}
+#endif
+		qdisc_skb_cb(skb)->sfb_classid = TC_H_MIN(res.classid);
+		return true;
+	}
+	return false;
+}
+
+static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	int i;
+	u32 minprob = SFB_MAX_PROB;
+	u32 minqlen = ~0;
+	u32 r, slot, hashes[2], sfbhash;
+	int ret = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (q->filter_list) {
+		/* If using external classifiers, get result and record it. */
+		if (!sfb_classify(skb, q, &ret))
+			goto other_drop;
+	} else {
+		qdisc_skb_cb(skb)->sfb_classid = skb_get_rxhash(skb);
+	}
+
+	if (q->rehash_interval > 0) {
+		unsigned long limit = q->rehash_time + q->rehash_interval;
+
+		if (unlikely(time_after(jiffies, limit))) {
+			sfb_swap_buffers(q);
+			q->rehash_time = jiffies;
+		} else if (unlikely(!q->double_buffering && q->db_interval > 0 &&
+				    time_after(jiffies, limit - q->db_interval))) {
+			q->double_buffering = true;
+		}
+	}
+
+	slot = q->slot;
+
+	hashes[slot] = sfbhash = sfb_hash(skb, slot, q);
+	for (i = 0; i < SFB_LEVELS; i++) {
+		u32 hash = sfbhash & SFB_BUCKET_MASK;
+		struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+		sfbhash >>= SFB_BUCKET_SHIFT;
+		if (b->qlen == 0)
+			decrement_prob(b, q);
+		else if (unlikely(b->qlen >= q->target))
+			increment_prob(b, q);
+		if (minqlen > b->qlen)
+			minqlen = b->qlen;
+		if (minprob > b->pm)
+			minprob = b->pm;
+	}
+
+	if (q->double_buffering) {
+		slot ^= 1;
+		hashes[slot] = sfbhash = sfb_hash(skb, slot, q);
+		for (i = 0; i < SFB_LEVELS; i++) {
+			u32 hash = sfbhash & SFB_BUCKET_MASK;
+			struct sfb_bucket *b = &q->bins[slot].bins[i][hash];
+
+			sfbhash >>= SFB_BUCKET_SHIFT;
+			if (b->qlen == 0)
+				decrement_prob(b, q);
+			else if (unlikely(b->qlen >= q->target))
+				increment_prob(b, q);
+		}
+	}
+
+	if (unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+		sch->qstats.overlimits++;
+		if (minqlen >= q->max)
+			q->stats.bucketdrop++;
+		else
+			q->stats.queuedrop++;
+		goto drop;
+	}
+
+	if (unlikely(minprob >= SFB_MAX_PROB)) {
+		/* Inelastic flow */
+		if (sfb_rate_limit(skb, q)) {
+			sch->qstats.overlimits++;
+			q->stats.penaltydrop++;
+			goto drop;
+		}
+		goto enqueue;
+	}
+
+	r = net_random() & SFB_MAX_PROB;
+
+	if (unlikely(r < minprob)) {
+		if (unlikely(minprob > SFB_MAX_PROB / 2)) {
+			/* If we're marking that many packets, then either
+			 * this flow is unresponsive, or we're badly congested.
+			 * In either case, we want to start dropping packets.
+			 */
+			if (r < (minprob - SFB_MAX_PROB / 2) * 2) {
+				q->stats.earlydrop++;
+				goto drop;
+			}
+		}
+		if (INET_ECN_set_ce(skb)) {
+			q->stats.marked++;
+		} else {
+			q->stats.earlydrop++;
+			goto drop;
+		}
+	}
+
+enqueue:
+	ret = qdisc_enqueue(skb, child);
+	if (likely(ret == NET_XMIT_SUCCESS)) {
+		sch->q.qlen++;
+		increment_qlen(hashes, q);
+	} else if (net_xmit_drop_count(ret)) {
+		q->stats.childdrop++;
+		sch->qstats.drops++;
+	}
+	return ret;
+
+drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+other_drop:
+	if (ret & __NET_XMIT_BYPASS)
+		sch->qstats.drops++;
+	kfree_skb(skb);
+	return ret;
+}
+
+static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+	struct sk_buff *skb;
+
+	skb = child->dequeue(q->qdisc);
+
+	if (skb) {
+		qdisc_bstats_update(sch, skb);
+		sch->q.qlen--;
+		decrement_qlen(skb, q);
+	}
+
+	return skb;
+}
+
+static struct sk_buff *sfb_peek(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = q->qdisc;
+
+	return child->ops->peek(child);
+}
+
+/* No sfb_drop -- impossible since the child doesn't return the dropped skb. */
+
+static void sfb_reset(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	qdisc_reset(q->qdisc);
+	sch->q.qlen = 0;
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(0, q);
+	sfb_zero_all_buckets(1, q);
+	sfb_init_perturbation(0, q);
+}
+
+static void sfb_destroy(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	qdisc_destroy(q->qdisc);
+}
+
+static const struct nla_policy sfb_policy[TCA_SFB_MAX + 1] = {
+	[TCA_SFB_PARMS]	= { .len = sizeof(struct tc_sfb_qopt) },
+};
+
+static int sfb_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = NULL;
+	struct nlattr *tb[TCA_SFB_MAX + 1];
+	struct tc_sfb_qopt *ctl;
+	u32 rehash_interval, db_interval;
+	u32 limit;
+	u32 max, target;
+	u32 increment, decrement;
+	u32 penalty_rate, penalty_burst;
+	int err;
+
+	if (opt == NULL) {
+		rehash_interval = 600;
+		db_interval = 60;
+		limit = 0;
+		max = 25;
+		target = 20;
+		increment = (SFB_MAX_PROB + 500) / 1000; /* 0.1 % */
+		decrement = (SFB_MAX_PROB + 3000) / 6000;
+		penalty_rate = 10;
+		penalty_burst = 20;
+	} else {
+		err = nla_parse_nested(tb, TCA_SFB_MAX, opt, sfb_policy);
+		if (err < 0)
+			return -EINVAL;
+
+		if (tb[TCA_SFB_PARMS] == NULL)
+			return -EINVAL;
+
+		ctl = nla_data(tb[TCA_SFB_PARMS]);
+
+		rehash_interval = ctl->rehash_interval;
+		db_interval = ctl->db_interval;
+		limit = ctl->limit;
+		max = ctl->max;
+		target = ctl->target;
+		increment = ctl->increment;
+		decrement = ctl->decrement;
+		penalty_rate = ctl->penalty_rate;
+		penalty_burst = ctl->penalty_burst;
+	}
+
+	if (limit == 0)
+		limit = qdisc_dev(sch)->tx_queue_len;
+	if (limit == 0)
+		limit = 1;
+
+	child = fifo_create_dflt(sch, &pfifo_qdisc_ops, limit);
+	if (IS_ERR(child))
+		return PTR_ERR(child);
+
+	sch_tree_lock(sch);
+
+	qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+	qdisc_destroy(q->qdisc);
+	q->qdisc = child;
+
+	q->rehash_interval = (unsigned long)rehash_interval * HZ;
+	q->db_interval = (unsigned long)db_interval * HZ;
+	q->rehash_time = jiffies;
+	q->limit = limit;
+	q->increment = increment;
+	q->decrement = decrement;
+	q->max = max;
+	q->target = target;
+	q->penalty_rate = penalty_rate;
+	q->penalty_burst = penalty_burst;
+	q->tokens_avail = penalty_burst;
+	q->token_time = jiffies;
+
+	q->slot = 0;
+	q->double_buffering = false;
+	sfb_zero_all_buckets(0, q);
+	sfb_zero_all_buckets(1, q);
+	sfb_init_perturbation(0, q);
+
+	sch_tree_unlock(sch);
+
+	return 0;
+}
+
+static int sfb_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	q->qdisc = &noop_qdisc;
+	return sfb_change(sch, opt);
+}
+
+static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+	struct tc_sfb_qopt opt = {
+		.rehash_interval = q->rehash_interval / HZ,
+		.db_interval = q->db_interval / HZ,
+		.limit = q->limit,
+		.max = q->max,
+		.target = q->target,
+		.increment = q->increment,
+		.decrement = q->decrement,
+		.penalty_rate = q->penalty_rate,
+		.penalty_burst = q->penalty_burst,
+	};
+
+	sch->qstats.backlog = q->qdisc->qstats.backlog;
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	NLA_PUT(skb, TCA_SFB_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct tc_sfb_xstats st = {
+		.earlydrop = q->stats.earlydrop,
+		.penaltydrop = q->stats.penaltydrop,
+		.bucketdrop = q->stats.bucketdrop,
+		.queuedrop = q->stats.queuedrop,
+		.childdrop = q->stats.childdrop,
+		.marked = q->stats.marked,
+	};
+
+	st.maxqlen = sfb_compute_qlen(&st.maxprob, q);
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static int sfb_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	return -ENOSYS;
+}
+
+static int sfb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		     struct Qdisc **old)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (new == NULL)
+		new = &noop_qdisc;
+
+	sch_tree_lock(sch);
+	*old = q->qdisc;
+	q->qdisc = new;
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static struct Qdisc *sfb_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	return q->qdisc;
+}
+
+static unsigned long sfb_get(struct Qdisc *sch, u32 classid)
+{
+	return 1;
+}
+
+static void sfb_put(struct Qdisc *sch, unsigned long arg)
+{
+}
+
+static int sfb_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+			    struct nlattr **tca, unsigned long *arg)
+{
+	return -ENOSYS;
+}
+
+static int sfb_delete(struct Qdisc *sch, unsigned long cl)
+{
+	return -ENOSYS;
+}
+
+static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
+{
+	if (!walker->stop) {
+		if (walker->count >= walker->skip)
+			if (walker->fn(sch, 1, walker) < 0) {
+				walker->stop = 1;
+				return;
+			}
+		walker->count++;
+	}
+}
+
+static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static unsigned long sfb_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	return 0;
+}
+
+
+static const struct Qdisc_class_ops sfb_class_ops = {
+	.graft		=	sfb_graft,
+	.leaf		=	sfb_leaf,
+	.get		=	sfb_get,
+	.put		=	sfb_put,
+	.change		=	sfb_change_class,
+	.delete		=	sfb_delete,
+	.walk		=	sfb_walk,
+	.tcf_chain	=	sfb_find_tcf,
+	.bind_tcf	=	sfb_bind,
+	.unbind_tcf	=	sfb_put,
+	.dump		=	sfb_dump_class,
+};
+
+struct Qdisc_ops sfb_qdisc_ops __read_mostly = {
+	.id		=	"sfb",
+	.priv_size	=	sizeof(struct sfb_sched_data),
+	.cl_ops		=	&sfb_class_ops,
+	.enqueue	=	sfb_enqueue,
+	.dequeue	=	sfb_dequeue,
+	.peek		=	sfb_peek,
+	.init		=	sfb_init,
+	.reset		=	sfb_reset,
+	.destroy	=	sfb_destroy,
+	.change		=	sfb_change,
+	.dump		=	sfb_dump,
+	.dump_stats	=	sfb_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init sfb_module_init(void)
+{
+	return register_qdisc(&sfb_qdisc_ops);
+}
+
+static void __exit sfb_module_exit(void)
+{
+	unregister_qdisc(&sfb_qdisc_ops);
+}
+
+module_init(sfb_module_init)
+module_exit(sfb_module_exit)
+
+MODULE_DESCRIPTION("Stochastic Fair Blue queue discipline");
+MODULE_AUTHOR("Juliusz Chroboczek");
+MODULE_LICENSE("GPL");




* [PATCH 2/2 net-next-2.6] r6040: bump to version 0.27 and date 23Feb2011
From: Florian Fainelli @ 2011-02-23 14:32 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: David Miller

From: Florian Fainelli <florian@openwrt.org>

Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index 7965ae4..e3ebd90 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -49,8 +49,8 @@
 #include <asm/processor.h>

 #define DRV_NAME	"r6040"
-#define DRV_VERSION	"0.26"
-#define DRV_RELDATE	"30May2010"
+#define DRV_VERSION	"0.27"
+#define DRV_RELDATE	"23Feb2011"

 /* PHY CHIP Address */
 #define PHY1_ADDR	1	/* For MAC1 */
--
1.7.2.3



* [PATCH 1/2 net-next-2.6] r6040: fix multicast operations
From: Florian Fainelli @ 2011-02-23 14:32 UTC (permalink / raw)
  To: netdev@vger.kernel.org; +Cc: David Miller

From: Shawn Lin <shawn@dmp.com.tw>

The original code does not work well when the number of multicast
addresses to handle is greater than MCAST_MAX. It only enables promiscuous
mode instead of multicast hash table mode, so the hash table function
is never activated and all multicast frames are received in this
case.

This patch fixes the following issues with the r6040 NIC operating in
multicast mode:

1) When the IFF_ALLMULTI flag is set, we should write 0xffff to the NIC
hash table registers to make it process multicast traffic.

2) When the number of multicast addresses to handle is not greater than
MCAST_MAX, we should use the NIC multicast registers MID1_{L,M,H}.

3) The hashing of the address was not correct, due to an invalid
subtraction (15 - (crc & 0x0f)) instead of (crc & 0x0f) and an
incorrect crc algorithm (ether_crc_le) instead of (ether_crc).

4) If necessary, we should set the HASH_EN flag in MCR0 to enable the
multicast hash table function.
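Item 3 can be checked with a small userspace sketch of the corrected hashing. It assumes that the bit-serial CRC below matches what the kernel's ether_crc() returns (MSB-first register, polynomial 0x04C11DB7, data bits fed LSB-first, initial value 0xFFFFFFFF, no final inversion); ether_crc_le(), which the old code used, keeps the register bit-reflected instead. The `sketch_*` names are hypothetical. As in the patch, the top 6 CRC bits select one of 64 filter bits spread over the four 16-bit MAR0..MAR3 registers.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_ALEN 6

/* Bit-serial Ethernet CRC-32 in the ether_crc() convention assumed above. */
static uint32_t sketch_ether_crc(size_t len, const uint8_t *data)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		uint8_t octet = data[i];

		for (int bit = 0; bit < 8; bit++, octet >>= 1) {
			uint32_t msb = crc >> 31;

			crc <<= 1;
			if (msb ^ (octet & 1u))
				crc ^= 0x04C11DB7u;
		}
	}
	return crc;
}

/* idx = top 6 CRC bits; bits 5..4 pick the register, bits 3..0 the bit
 * (no "15 -" inversion, per the fix). */
static void sketch_hash_set_from_crc(uint32_t crc, uint16_t hash_table[4])
{
	uint32_t idx = crc >> 26;

	hash_table[idx >> 4] |= (uint16_t)(1u << (idx & 0xf));
}

/* Add one multicast address to the 64-bit hash filter. */
static void sketch_hash_add(const uint8_t addr[SKETCH_ETH_ALEN],
			    uint16_t hash_table[4])
{
	sketch_hash_set_from_crc(sketch_ether_crc(SKETCH_ETH_ALEN, addr),
				 hash_table);
}
```

Each address sets exactly one of the 64 bits, so with more than MCAST_MAX groups the filter is imperfect by design: unrelated groups that hash to the same bit are also accepted and must be filtered by the stack.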

CC: stable@kernel.org
Reported-by: Marc Leclerc <marc-leclerc@signaturealpha.com>
Tested-by: Marc Leclerc <marc-leclerc@signaturealpha.com>
Signed-off-by: Shawn Lin <shawn@dmp.com.tw>
Signed-off-by: Albert Chen <albert.chen@rdc.com.tw>
Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
Stable: 2.6.33+

diff --git a/drivers/net/r6040.c b/drivers/net/r6040.c
index 27e6f6d..7965ae4 100644
--- a/drivers/net/r6040.c
+++ b/drivers/net/r6040.c
@@ -69,6 +69,8 @@

 /* MAC registers */
 #define MCR0		0x00	/* Control register 0 */
+#define  MCR0_PROMISC	0x0020	/* Promiscuous mode */
+#define  MCR0_HASH_EN	0x0100	/* Enable multicast hash table function */
 #define MCR1		0x04	/* Control register 1 */
 #define  MAC_RST	0x0001	/* Reset the MAC */
 #define MBCR		0x08	/* Bus control */
@@ -851,77 +853,92 @@ static void r6040_multicast_list(struct net_device *dev)
 {
 	struct r6040_private *lp = netdev_priv(dev);
 	void __iomem *ioaddr = lp->base;
-	u16 *adrp;
-	u16 reg;
 	unsigned long flags;
 	struct netdev_hw_addr *ha;
 	int i;
+	u16 *adrp;
+	u16 hash_table[4] = { 0 };
+
+	spin_lock_irqsave(&lp->lock, flags);

-	/* MAC Address */
+	/* Keep our MAC Address */
 	adrp = (u16 *)dev->dev_addr;
 	iowrite16(adrp[0], ioaddr + MID_0L);
 	iowrite16(adrp[1], ioaddr + MID_0M);
 	iowrite16(adrp[2], ioaddr + MID_0H);

-	/* Promiscous Mode */
-	spin_lock_irqsave(&lp->lock, flags);
-
 	/* Clear AMCP & PROM bits */
-	reg = ioread16(ioaddr) & ~0x0120;
-	if (dev->flags & IFF_PROMISC) {
-		reg |= 0x0020;
-		lp->mcr0 |= 0x0020;
-	}
-	/* Too many multicast addresses
-	 * accept all traffic */
-	else if ((netdev_mc_count(dev) > MCAST_MAX) ||
-		 (dev->flags & IFF_ALLMULTI))
-		reg |= 0x0020;
+	lp->mcr0 = ioread16(ioaddr + MCR0) & ~(MCR0_PROMISC | MCR0_HASH_EN);

-	iowrite16(reg, ioaddr);
-	spin_unlock_irqrestore(&lp->lock, flags);
+	/* Promiscuous mode */
+	if (dev->flags & IFF_PROMISC)
+		lp->mcr0 |= MCR0_PROMISC;

-	/* Build the hash table */
-	if (netdev_mc_count(dev) > MCAST_MAX) {
-		u16 hash_table[4];
-		u32 crc;
+	/* Enable multicast hash table function to
+	 * receive all multicast packets. */
+	else if (dev->flags & IFF_ALLMULTI) {
+		lp->mcr0 |= MCR0_HASH_EN;

-		for (i = 0; i < 4; i++)
-			hash_table[i] = 0;
+		for (i = 0; i < MCAST_MAX ; i++) {
+			iowrite16(0, ioaddr + MID_1L + 8 * i);
+			iowrite16(0, ioaddr + MID_1M + 8 * i);
+			iowrite16(0, ioaddr + MID_1H + 8 * i);
+		}

+		for (i = 0; i < 4; i++)
+			hash_table[i] = 0xffff;
+	}
+	/* Use internal multicast address registers if the number of
+	 * multicast addresses is not greater than MCAST_MAX. */
+	else if (netdev_mc_count(dev) <= MCAST_MAX) {
+		i = 0;
 		netdev_for_each_mc_addr(ha, dev) {
-			char *addrs = ha->addr;
+			u16 *adrp = (u16 *) ha->addr;
+			iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
+			iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
+			iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
+			i++;
+		}
+		while (i < MCAST_MAX) {
+			iowrite16(0, ioaddr + MID_1L + 8 * i);
+			iowrite16(0, ioaddr + MID_1M + 8 * i);
+			iowrite16(0, ioaddr + MID_1H + 8 * i);
+			i++;
+		}
+	}
+	/* Otherwise, Enable multicast hash table function. */
+	else {
+		u32 crc;

-			if (!(*addrs & 1))
-				continue;
+		lp->mcr0 |= MCR0_HASH_EN;
+
+		for (i = 0; i < MCAST_MAX ; i++) {
+			iowrite16(0, ioaddr + MID_1L + 8 * i);
+			iowrite16(0, ioaddr + MID_1M + 8 * i);
+			iowrite16(0, ioaddr + MID_1H + 8 * i);
+		}

-			crc = ether_crc_le(6, addrs);
+		/* Build multicast hash table */
+		netdev_for_each_mc_addr(ha, dev) {
+			u8 *addrs = ha->addr;
+
+			crc = ether_crc(ETH_ALEN, addrs);
 			crc >>= 26;
-			hash_table[crc >> 4] |= 1 << (15 - (crc & 0xf));
+			hash_table[crc >> 4] |= 1 << (crc & 0xf);
 		}
-		/* Fill the MAC hash tables with their values */
+	}
+
+	iowrite16(lp->mcr0, ioaddr + MCR0);
+
+	/* Fill the MAC hash tables with their values */
+	if (lp->mcr0 & MCR0_HASH_EN) {
 		iowrite16(hash_table[0], ioaddr + MAR0);
 		iowrite16(hash_table[1], ioaddr + MAR1);
 		iowrite16(hash_table[2], ioaddr + MAR2);
 		iowrite16(hash_table[3], ioaddr + MAR3);
 	}
-	/* Multicast Address 1~4 case */
-	i = 0;
-	netdev_for_each_mc_addr(ha, dev) {
-		if (i >= MCAST_MAX)
-			break;
-		adrp = (u16 *) ha->addr;
-		iowrite16(adrp[0], ioaddr + MID_1L + 8 * i);
-		iowrite16(adrp[1], ioaddr + MID_1M + 8 * i);
-		iowrite16(adrp[2], ioaddr + MID_1H + 8 * i);
-		i++;
-	}
-	while (i < MCAST_MAX) {
-		iowrite16(0xffff, ioaddr + MID_1L + 8 * i);
-		iowrite16(0xffff, ioaddr + MID_1M + 8 * i);
-		iowrite16(0xffff, ioaddr + MID_1H + 8 * i);
-		i++;
-	}
+
+	spin_unlock_irqrestore(&lp->lock, flags);
 }

 static void netdev_get_drvinfo(struct net_device *dev,
--
1.7.2.3
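The hash-filter branch in this patch folds each multicast address into a 64-bit table: the top six bits of ether_crc() select one of 64 bits, `crc >> 4` picks one of the four 16-bit MAR registers, and `crc & 0xf` picks the bit within it. Below is a userspace sketch of that indexing. model_ether_crc() and model_hash_add() are hypothetical helpers. model_ether_crc() is intended to match the kernel's ether_crc() (polynomial 0x04C11DB7, register preset to all ones, octets consumed least-significant bit first, no final inversion). The register layout comes from the patch, not from a datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's ether_crc():
 * CRC-32 polynomial 0x04C11DB7, register preset to all ones,
 * each octet consumed least-significant bit first, no final XOR. */
static uint32_t model_ether_crc(size_t len, const uint8_t *data)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		uint8_t octet = data[i];

		for (int bit = 0; bit < 8; bit++, octet >>= 1) {
			uint32_t msb = crc >> 31;

			crc <<= 1;
			if (msb ^ (octet & 1u))
				crc ^= 0x04C11DB7u;
		}
	}
	return crc;
}

/* Set the hash-filter bit for one 6-byte multicast address using the
 * patch's indexing: a 6-bit index spread over 4 x 16-bit registers. */
static void model_hash_add(uint16_t hash_table[4], const uint8_t addr[6])
{
	uint32_t idx = model_ether_crc(6, addr) >> 26;	/* 0..63 */

	hash_table[idx >> 4] |= (uint16_t)(1u << (idx & 0xf));
}
```

Each address sets exactly one bit. Several addresses may share a bit, which is why the hash mode is an imperfect filter, unlike the MCAST_MAX exact-match registers.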


* check for return value of clk_get() in mpc512x_can_get_clock()
From: Andrej Krutak @ 2011-02-23 13:35 UTC
  To: Wolfgang Grandegger; +Cc: socketcan-core, netdev

Hi,

just wanted to let you know: in the mpc512x_can_get_clock() function
(drivers/net/can/mscan/mpc5xxx_can.c), the results of the clk_get() calls
are later checked with

if (!sys_clk) {
and
if (!ref_clk) {

but these should be

if (IS_ERR(sys_clk)) {
and
if (IS_ERR(ref_clk)) {

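clk_get() reports failure with an ERR_PTR()-encoded pointer, not NULL,
so the current checks never catch the error (and you'll get an oops when
the error pointer is later used as if it were a valid clock).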
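For reference, here is a userspace sketch of why the NULL test misses the failure. An error pointer is non-NULL, so only IS_ERR() detects it. ERR_PTR(), PTR_ERR() and IS_ERR() below are simplified stand-ins modelled on <linux/err.h>. model_clk_get() and model_get_clock_rate() are hypothetical and are not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins modelled on <linux/err.h>. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct clk {
	unsigned long rate;
};

/* Hypothetical clk_get() stand-in: on failure it returns an error
 * pointer (never NULL), as the real clk_get() does. */
static struct clk *model_clk_get(int simulate_failure)
{
	static struct clk sys_clk = { 66000000UL };

	return simulate_failure ? ERR_PTR(-ENOENT) : &sys_clk;
}

/* Correct pattern: test with IS_ERR() and propagate PTR_ERR(). */
static int model_get_clock_rate(int simulate_failure, unsigned long *rate)
{
	struct clk *sys_clk = model_clk_get(simulate_failure);

	if (IS_ERR(sys_clk))
		return (int)PTR_ERR(sys_clk);
	*rate = sys_clk->rate;
	return 0;
}
```

A `!sys_clk` test on the failing path above evaluates to false, so execution would go on to dereference the error pointer.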


-- 
Best regards | S pozdravom | Mit freundlichen Grüßen

Andrej Krutak

* [PATCHv2 NEXT 2/2] qlcnic: Remove validation for max tx and max rx queues
From: Amit Kumar Salecha @ 2011-02-23 13:21 UTC
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty, Rajesh Borundia
In-Reply-To: <1298467285-7201-1-git-send-email-amit.salecha@qlogic.com>

From: Rajesh Borundia <rajesh.borundia@qlogic.com>

The maximum numbers of RX and TX queues are governed by firmware,
so the driver should not validate these values.

Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    4 ----
 drivers/net/qlcnic/qlcnic_main.c |    6 ++----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index fa7f794..dc44564 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -1132,14 +1132,10 @@ struct qlcnic_eswitch {
 #define MAX_BW			100	/* % of link speed */
 #define MAX_VLAN_ID		4095
 #define MIN_VLAN_ID		2
-#define MAX_TX_QUEUES		1
-#define MAX_RX_QUEUES		4
 #define DEFAULT_MAC_LEARN	1
 
 #define IS_VALID_VLAN(vlan)	(vlan >= MIN_VLAN_ID && vlan < MAX_VLAN_ID)
 #define IS_VALID_BW(bw)		(bw <= MAX_BW)
-#define IS_VALID_TX_QUEUES(que)	(que > 0 && que <= MAX_TX_QUEUES)
-#define IS_VALID_RX_QUEUES(que)	(que > 0 && que <= MAX_RX_QUEUES)
 
 struct qlcnic_pci_func_cfg {
 	u16	func_type;
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 4994b94..cd88c7e 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -3653,10 +3653,8 @@ validate_npar_config(struct qlcnic_adapter *adapter,
 		if (adapter->npars[pci_func].type != QLCNIC_TYPE_NIC)
 			return QL_STATUS_INVALID_PARAM;
 
-		if (!IS_VALID_BW(np_cfg[i].min_bw)
-				|| !IS_VALID_BW(np_cfg[i].max_bw)
-				|| !IS_VALID_RX_QUEUES(np_cfg[i].max_rx_queues)
-				|| !IS_VALID_TX_QUEUES(np_cfg[i].max_tx_queues))
+		if (!IS_VALID_BW(np_cfg[i].min_bw) ||
+		    !IS_VALID_BW(np_cfg[i].max_bw))
 			return QL_STATUS_INVALID_PARAM;
 	}
 	return 0;
-- 
1.7.3.2
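Here is a minimal sketch of the per-function check that remains after this patch. It assumes a simplified, hypothetical config struct; the real qlcnic structure has more fields. Only the bandwidth limits are validated, so queue counts that firmware chooses are no longer rejected. The macro is a parenthesised variant of the qlcnic.h one.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_BW			100	/* % of link speed, as in qlcnic.h */
#define IS_VALID_BW(bw)		((bw) <= MAX_BW)

/* Hypothetical, simplified stand-in for one NPAR function config. */
struct model_npar_cfg {
	uint16_t min_bw;
	uint16_t max_bw;
	uint16_t max_rx_queues;	/* no longer checked: firmware decides */
	uint16_t max_tx_queues;	/* no longer checked: firmware decides */
};

/* Returns 0 if valid, -1 otherwise (the driver returns
 * QL_STATUS_INVALID_PARAM instead). */
static int model_validate_npar_cfg(const struct model_npar_cfg *cfg)
{
	if (!IS_VALID_BW(cfg->min_bw) || !IS_VALID_BW(cfg->max_bw))
		return -1;
	return 0;
}
```

For instance, a configuration requesting 8 RX queues now passes. The removed IS_VALID_RX_QUEUES() capped that value at MAX_RX_QUEUES (4).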


* [PATCHv2 NEXT 1/2] qlcnic: fix checks for auto_fw_reset
From: Amit Kumar Salecha @ 2011-02-23 13:21 UTC
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty
In-Reply-To: <1298467285-7201-1-git-send-email-amit.salecha@qlogic.com>

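o Remove the checks that compare the auto_fw_reset module parameter against 1.
  auto_fw_reset is an int and can hold values > 1, so any non-zero
  value should enable automatic firmware reset.
o Remove the now-unnecessary AUTO_FW_RESET_ENABLED #define (value 1).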

Signed-off-by: Amit Kumar Salecha <amit.salecha@qlogic.com>
---
 drivers/net/qlcnic/qlcnic.h      |    1 -
 drivers/net/qlcnic/qlcnic_main.c |    9 ++++-----
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/qlcnic/qlcnic.h b/drivers/net/qlcnic/qlcnic.h
index 44e316f..fa7f794 100644
--- a/drivers/net/qlcnic/qlcnic.h
+++ b/drivers/net/qlcnic/qlcnic.h
@@ -867,7 +867,6 @@ struct qlcnic_nic_intr_coalesce {
 #define LINKEVENT_LINKSPEED_MBPS	0
 #define LINKEVENT_LINKSPEED_ENCODED	1
 
-#define AUTO_FW_RESET_ENABLED	0x01
 /* firmware response header:
  *	63:58 - message type
  *	57:56 - owner
diff --git a/drivers/net/qlcnic/qlcnic_main.c b/drivers/net/qlcnic/qlcnic_main.c
index 37c04b4..4994b94 100644
--- a/drivers/net/qlcnic/qlcnic_main.c
+++ b/drivers/net/qlcnic/qlcnic_main.c
@@ -42,7 +42,7 @@ static int use_msi_x = 1;
 module_param(use_msi_x, int, 0444);
 MODULE_PARM_DESC(use_msi_x, "MSI-X interrupt (0=disabled, 1=enabled");
 
-static int auto_fw_reset = AUTO_FW_RESET_ENABLED;
+static int auto_fw_reset = 1;
 module_param(auto_fw_reset, int, 0644);
 MODULE_PARM_DESC(auto_fw_reset, "Auto firmware reset (0=disabled, 1=enabled");
 
@@ -2959,8 +2959,7 @@ qlcnic_check_health(struct qlcnic_adapter *adapter)
 		if (adapter->need_fw_reset)
 			goto detach;
 
-		if (adapter->reset_context &&
-		    auto_fw_reset == AUTO_FW_RESET_ENABLED) {
+		if (adapter->reset_context && auto_fw_reset) {
 			qlcnic_reset_hw_context(adapter);
 			adapter->netdev->trans_start = jiffies;
 		}
@@ -2973,7 +2972,7 @@ qlcnic_check_health(struct qlcnic_adapter *adapter)
 
 	qlcnic_dev_request_reset(adapter);
 
-	if ((auto_fw_reset == AUTO_FW_RESET_ENABLED))
+	if (auto_fw_reset)
 		clear_bit(__QLCNIC_FW_ATTACHED, &adapter->state);
 
 	dev_info(&netdev->dev, "firmware hang detected\n");
@@ -2982,7 +2981,7 @@ detach:
 	adapter->dev_state = (state == QLCNIC_DEV_NEED_QUISCENT) ? state :
 		QLCNIC_DEV_NEED_RESET;
 
-	if ((auto_fw_reset == AUTO_FW_RESET_ENABLED) &&
+	if (auto_fw_reset &&
 		!test_and_set_bit(__QLCNIC_RESETTING, &adapter->state)) {
 
 		qlcnic_schedule_work(adapter, qlcnic_detach_work, 0);
-- 
1.7.3.2
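The behavioural difference is easiest to see side by side. Below is a small sketch, with hypothetical helper names, that applies the old and new tests to the module parameter value. A relevant case is loading the module with auto_fw_reset=2.

```c
#include <assert.h>
#include <stdbool.h>

#define AUTO_FW_RESET_ENABLED	0x01	/* the constant this patch removes */

/* Pre-patch test: only the exact value 1 enabled automatic reset. */
static bool auto_reset_enabled_old(int auto_fw_reset)
{
	return auto_fw_reset == AUTO_FW_RESET_ENABLED;
}

/* Post-patch test: any non-zero parameter value enables it. */
static bool auto_reset_enabled_new(int auto_fw_reset)
{
	return auto_fw_reset != 0;
}
```

With auto_fw_reset=2, the old test silently disables automatic firmware reset, while the new test enables it.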


* [PATCHv2 NEXT 0/2]qlcnic: minor fixes
From: Amit Kumar Salecha @ 2011-02-23 13:21 UTC
  To: davem; +Cc: netdev, ameen.rahman, anirban.chakraborty

Hi,
  This is v2 of the minor fixes series; please apply it on net-next.
  In this version, patch 1 no longer changes the module parameter's name or its properties.

-Amit

* Re: [PATCH V3] Export ACPI _DSM provided firmware instance number and string name to sysfs
From: Narendra_K @ 2011-02-23 13:00 UTC
  To: mjg59
  Cc: linux-pci, linux-hotplug, netdev, Matt_Domsch, Charles_Rose,
	Jordan_Hargrave, Shyam_Iyer
In-Reply-To: <20110223124419.GA5695@srcf.ucam.org>

On Wed, Feb 23, 2011 at 06:14:19PM +0530, Matthew Garrett wrote:
> I think this version will still break the build. You need to depend on 
> CONFIG_NLS.

Matthew,

Thanks. I posted a patch to linux-next to fix the build failure.

With regards,
Narendra K

* [PATCH] f_phonet: avoid pskb_pull(), fix OOPS with CONFIG_HIGHMEM
From: Rémi Denis-Courmont @ 2011-02-23 12:51 UTC
  To: netdev
In-Reply-To: <linux-usb@vger.kernel.org>

This mirrors what we already do in cdc-phonet.c in the same
situation: pskb_pull() refuses to work with CONFIG_HIGHMEM, even when
the socket buffer is known to be entirely in "low" memory.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
---
 drivers/usb/gadget/f_phonet.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/f_phonet.c b/drivers/usb/gadget/f_phonet.c
index 3c6e1a0..5e14950 100644
--- a/drivers/usb/gadget/f_phonet.c
+++ b/drivers/usb/gadget/f_phonet.c
@@ -346,14 +346,19 @@ static void pn_rx_complete(struct usb_ep *ep, struct usb_request *req)
 
 		if (unlikely(!skb))
 			break;
-		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page, 0,
-				req->actual);
-		page = NULL;
 
-		if (req->actual < req->length) { /* Last fragment */
+		if (skb->len == 0) { /* First fragment */
 			skb->protocol = htons(ETH_P_PHONET);
 			skb_reset_mac_header(skb);
-			pskb_pull(skb, 1);
+			/* Can't use pskb_pull() on page in IRQ */
+			memcpy(skb_put(skb, 1), page_address(page), 1);
+		}
+
+		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+				skb->len == 0, req->actual);
+		page = NULL;
+
+		if (req->actual < req->length) { /* Last fragment */
 			skb->dev = dev;
 			dev->stats.rx_packets++;
 			dev->stats.rx_bytes += skb->len;
-- 
1.7.1
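The fragment assembly can be modelled in userspace. In this model, the leading byte of the first buffer lives in the linear area and the page fragment starts just after it, so each received byte is stored exactly once. The sketch uses hypothetical stand-ins (model_skb, model_rx_fragment(), model_linearize()), not the kernel's sk_buff API, and it does not cover what the Phonet stack later does with that leading byte. The model records the fragment offset before copying the byte, because skb->len is already 1 once skb_put() has run. It also shortens the fragment by the same amount so the fragment covers only received data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_FRAGS 8

/* Hypothetical flat model of an skb: a 1-byte linear area followed by
 * page fragments described as (page, offset, size). */
struct model_frag {
	const unsigned char *page;
	size_t off;
	size_t size;
};

struct model_skb {
	unsigned char head[1];
	size_t head_len;
	struct model_frag frags[MODEL_MAX_FRAGS];
	int nr_frags;
	size_t len;
};

/* Append one received buffer of 'actual' bytes. The offset is decided
 * while len is still 0; afterwards len == 1. */
static int model_rx_fragment(struct model_skb *skb,
			     const unsigned char *page, size_t actual)
{
	size_t off = 0;

	if (skb->nr_frags >= MODEL_MAX_FRAGS)
		return -1;
	if (skb->len == 0) {		/* first fragment */
		if (actual == 0)
			return -1;
		skb->head[0] = page[0];	/* models memcpy(skb_put(skb, 1), ...) */
		skb->head_len = 1;
		skb->len = 1;
		off = 1;		/* page data starts after the copied byte */
	}
	skb->frags[skb->nr_frags++] =
		(struct model_frag){ page, off, actual - off };
	skb->len += actual - off;
	return 0;
}

/* Copy the linear byte and all fragments into 'out' (must hold skb->len). */
static size_t model_linearize(const struct model_skb *skb, unsigned char *out)
{
	size_t n = skb->head_len;

	memcpy(out, skb->head, skb->head_len);
	for (int i = 0; i < skb->nr_frags; i++) {
		memcpy(out + n, skb->frags[i].page + skb->frags[i].off,
		       skb->frags[i].size);
		n += skb->frags[i].size;
	}
	return n;
}
```

If the first buffer holds 4 bytes, the linear area gets byte 0 and the first fragment covers bytes 1 to 3. Later buffers are appended whole from offset 0.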

