Netdev List

Netdev List
 help / color / mirror / Atom feed

* RE: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS support on X540
From: Keller, Jacob E @ 2012-05-11 18:10 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <20120511054036.GC2170@netboy.at.omicron.at>

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Thursday, May 10, 2012 10:41 PM
> To: Keller, Jacob E
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
> gospo@redhat.com; sassmann@redhat.com
> Subject: Re: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS
> support on X540
> 
> On Thu, May 10, 2012 at 10:08:44PM +0000, Keller, Jacob E wrote:
> >
> > Oops stupid mail program sent that on accident. Anyways: I think you
> > might be right, Richard. We don't read those timestamp values unless
> > the stat err bit for timestamps is set on the descriptor. But I am not
> > sure what happens when the tjmestamped packet is dropped off the end
> > of the ring. What would you propose here? How can we detect if this
> > timestamp doesn't match the packet? I can look into using the extra
> > hardware features for matching timestamps. That might be a more useful, in
> that it would help prevent this case.
> 
> [ Talking about the Rx time stamping locking from other patch... ]
> 
> The IGB provides some PTP event packet identification fields (seqNum,
> etc) just for the purpose of matching time stamps to packets. Some of the
> other PHC drivers (ixp4xx, dp83640) have code that does the matching.
> 

Ixgbe has the sequence number also. I'll take a look at the PHC drivers
that match already, and see how difficult it would be to perform this check.

- Jake

> HTH,
> Richard
> 

^ permalink raw reply

* RE: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS support on X540
From: Keller, Jacob E @ 2012-05-11 18:09 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <20120511053605.GB2170@netboy.at.omicron.at>

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Thursday, May 10, 2012 10:36 PM
> To: Keller, Jacob E
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
> gospo@redhat.com; sassmann@redhat.com
> Subject: Re: [net-next 07/12] ixgbe: Enable timesync clock-out feature for PPS
> support on X540
> 
> On Thu, May 10, 2012 at 09:54:52PM +0000, Keller, Jacob E wrote:
> > > Since this function is called in every interrupt, I would check this
> > > flag first thing.
> > >
> > Not sure what you mean? Check this before checking the other interrupts?
> > I can do that.
> 
> I only meant that, assuming that the other interrupt sources are much more
> frequent than the PPS, the normal case will be that the PPS flag is not set.
> 
> It would be more efficient (that is, shorter ISR code path in normal
> case) to check the flag first, perhaps like this.
> 
> ixgbe_intr()
> {
> 	...
> #ifdef CONFIG_IXGBE_PTP
> 	if (eicr & IXGBE_EICR_TIMESYNC)
> 		ixgbe_ptp_check_pps_event(adapter, eicr); #endif
> 	...
> }

Ok. This makes sense. :) Check the flag before checking the hardware type.

- Jake

> 
> Thanks,
> Richard

^ permalink raw reply

* RE: [net-next 06/12] ixgbe: Hardware Timestamping + PTP Hardware Clock (PHC)
From: Keller, Jacob E @ 2012-05-11 18:05 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Kirsher, Jeffrey T, davem@davemloft.net, netdev@vger.kernel.org,
	gospo@redhat.com, sassmann@redhat.com
In-Reply-To: <20120511051509.GA2170@netboy.at.omicron.at>

> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Thursday, May 10, 2012 10:15 PM
> To: Keller, Jacob E
> Cc: Kirsher, Jeffrey T; davem@davemloft.net; netdev@vger.kernel.org;
> gospo@redhat.com; sassmann@redhat.com
> Subject: Re: [net-next 06/12] ixgbe: Hardware Timestamping + PTP Hardware
> Clock (PHC)
> 
> On Thu, May 10, 2012 at 09:53:18PM +0000, Keller, Jacob E wrote:
> > > > +	/*
> > > > +	 * If this bit is set, then the RX registers contain the time
> stamp. No
> > > > +	 * other packet will be time stamped until we read these
> registers, so
> > > > +	 * read the registers to make them available again. Because only
> one
> > > > +	 * packet can be time stamped at a time, we know that the
> register
> > > > +	 * values must belong to this one here and therefore we don't
> need to
> > > > +	 * compare any of the additional attributes stored for it.
> > >
> > > I suspect that this assumption is wrong. What happens if the time
> > > stamping logic locks a value but the packet is lost because the ring is
> full?
> > >
> > > BTW, the IGB driver also has this defect.
> > >
> >
> > Note how I read the rx registers first? So it will always clear the value.
> > That should unlock the value for the next rx stamp packet.
> 
> 1. Hw recognizes ptp event packet, locks time stamp 2. Hw drops packet because
> queue is full 3. No more time stamps are ever generated
> 
> Can this happen? The docs seems to say it can.
> 
> Richard

It might be possible. I'm curious what's the best method to solve this. I don't
think this generation of hardware can get an interrupt when the RX timestamp is
locked, and I don't know how easy it would be to process the sequence number within
the driver. It might be the case that the next packet would still be marked in the
descriptor and therefor the previous timestamp would apply to the next packet. I am
not sure, but I know we haven't seen issues where timestamps drop completely. This
is something I am not sure what's the best solution.

- Jake

^ permalink raw reply

* Re: [PATCH] net: of/phy: fix build error when phylib is built as a module
From: David Daney @ 2012-05-11 17:58 UTC (permalink / raw)
  To: Bjørn Mork, David Miller
  Cc: paul.gortmaker@windriver.com, rdunlap@xenotime.net,
	sfr@canb.auug.org.au, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1336751221-19127-1-git-send-email-bjorn@mork.no>

On 05/11/2012 08:47 AM, Bjørn Mork wrote:
> CONFIG_OF_MDIO is tristate and will be m if PHYLIB is m.  Use
> IS_ENABLED macro to prevent build error:
>
>   ERROR: "of_mdio_find_bus" [drivers/net/phy/mdio-mux.ko] undefined!
>
> Reported-by: Randy Dunlap<rdunlap@xenotime.net>
> Cc: David Daney<david.daney@cavium.com>
> Signed-off-by: Bjørn Mork<bjorn@mork.no>

I was able to reproduce the failure, and this patch both fixes it and 
seems correct, so...

Acked-by: David Daney<david.daney@cavium.com>

Sorry about this failure.


> ---
> I wonder if this could be as banal as this?  Not even build tested...
>
> Should be wrapped into commit 25106022 if it works, to ensure
> bisectability.
>
>
>
> Bjørn
>
>   drivers/net/phy/mdio_bus.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index 83d5c9f..683ef1c 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -88,7 +88,7 @@ static struct class mdio_bus_class = {
>   	.dev_release	= mdiobus_release,
>   };
>
> -#ifdef CONFIG_OF_MDIO
> +#if IS_ENABLED(CONFIG_OF_MDIO)
>   /* Helper function for of_mdio_find_bus */
>   static int of_mdio_bus_match(struct device *dev, void *mdio_bus_np)
>   {

^ permalink raw reply

* [PATCH] dm9000: some coldfire boards need this
From: Steven King @ 2012-05-11 16:49 UTC (permalink / raw)
  To: netdev

Some coldfire boards (ie m5253demo) have a dm9000 onboard.

Signed-off-by: Steven King <sfking@fdwdc.com>
---
 drivers/net/ethernet/davicom/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/davicom/Kconfig b/drivers/net/ethernet/davicom/Kconfig
index 972b62b..9745fe5 100644
--- a/drivers/net/ethernet/davicom/Kconfig
+++ b/drivers/net/ethernet/davicom/Kconfig
@@ -4,7 +4,7 @@
 
 config DM9000
 	tristate "DM9000 support"
-	depends on ARM || BLACKFIN || MIPS
+	depends on ARM || BLACKFIN || MIPS || COLDFIRE
 	select CRC32
 	select NET_CORE
 	select MII

^ permalink raw reply related

* [PATCH] Net ipv6: Fixed checkpatch errors
From: Cristian Chilipirea @ 2012-05-11 16:55 UTC (permalink / raw)
  To: davem
  Cc: kuznet, jmorris, yoshfuji, kaber, netdev, linux-kernel,
	daniel.baluta, Cristian Chilipirea

Fixed all of ERROR "foo* bar" should be "foo *bar"

Signed-off-by: Cristian Chilipirea <cristian.chilipirea@gmail.com>
---
 net/ipv6/addrconf.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)
 mode change 100644 => 100755 net/ipv6/addrconf.c

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
old mode 100644
new mode 100644
index 7d5cb97..9dbc266
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -339,7 +339,7 @@ void in6_dev_finish_destroy(struct inet6_dev *idev)
 
 EXPORT_SYMBOL(in6_dev_finish_destroy);
 
-static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
+static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
 {
 	struct inet6_dev *ndev;
 
@@ -441,7 +441,7 @@ static struct inet6_dev * ipv6_add_dev(struct net_device *dev)
 	return ndev;
 }
 
-static struct inet6_dev * ipv6_find_idev(struct net_device *dev)
+static struct inet6_dev *ipv6_find_idev(struct net_device *dev)
 {
 	struct inet6_dev *idev;
 
@@ -1522,7 +1522,7 @@ static int addrconf_ifid_arcnet(u8 *eui, struct net_device *dev)
 	if (dev->addr_len != ARCNET_ALEN)
 		return -1;
 	memset(eui, 0, 7);
-	eui[7] = *(u8*)dev->dev_addr;
+	eui[7] = *(u8 *)dev->dev_addr;
 	return 0;
 }
 
@@ -1667,7 +1667,8 @@ out:
 	in6_dev_put(idev);
 }
 
-static int __ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmpaddr) {
+static int __ipv6_try_regen_rndid(struct inet6_dev *idev, struct in6_addr *tmpaddr)
+{
 	int ret = 0;
 
 	if (tmpaddr && memcmp(idev->rndid, &tmpaddr->s6_addr[8], 8) == 0)
@@ -1908,7 +1909,7 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
 	/* Try to figure out our local address for this prefix */
 
 	if (pinfo->autoconf && in6_dev->cnf.autoconf) {
-		struct inet6_ifaddr * ifp;
+		struct inet6_ifaddr *ifp;
 		struct in6_addr addr;
 		int create = 0, update_lft = 0;
 
@@ -2362,9 +2363,9 @@ static void sit_add_v4_addrs(struct inet6_dev *idev)
 	}
 
 	for_each_netdev(net, dev) {
-		struct in_device * in_dev = __in_dev_get_rtnl(dev);
+		struct in_device *in_dev = __in_dev_get_rtnl(dev);
 		if (in_dev && (dev->flags & IFF_UP)) {
-			struct in_ifaddr * ifa;
+			struct in_ifaddr *ifa;
 
 			int flag = scope;
 
@@ -2410,7 +2411,7 @@ static void init_loopback(struct net_device *dev)
 
 static void addrconf_add_linklocal(struct inet6_dev *idev, const struct in6_addr *addr)
 {
-	struct inet6_ifaddr * ifp;
+	struct inet6_ifaddr *ifp;
 	u32 addr_flags = IFA_F_PERMANENT;
 
 #ifdef CONFIG_IPV6_OPTIMISTIC_DAD
@@ -2431,7 +2432,7 @@ static void addrconf_add_linklocal(struct inet6_dev *idev, const struct in6_addr
 static void addrconf_dev_config(struct net_device *dev)
 {
 	struct in6_addr addr;
-	struct inet6_dev    * idev;
+	struct inet6_dev    *idev;
 
 	ASSERT_RTNL();
 
@@ -2567,7 +2568,7 @@ static void addrconf_ip6_tnl_config(struct net_device *dev)
 }
 
 static int addrconf_notify(struct notifier_block *this, unsigned long event,
-			   void * data)
+			   void *data)
 {
 	struct net_device *dev = (struct net_device *) data;
 	struct inet6_dev *idev = __in6_dev_get(dev);
@@ -3791,7 +3792,7 @@ static int inet6_dump_ifacaddr(struct sk_buff *skb, struct netlink_callback *cb)
 	return inet6_dump_addr(skb, cb, type);
 }
 
-static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr* nlh,
+static int inet6_rtm_getaddr(struct sk_buff *in_skb, struct nlmsghdr *nlh,
 			     void *arg)
 {
 	struct net *net = sock_net(in_skb->sk);
-- 
1.7.5.4

^ permalink raw reply related

* Re: linux-next: Tree for May 10 (net/phy)
From: David Daney @ 2012-05-11 16:14 UTC (permalink / raw)
  To: David Miller
  Cc: paul.gortmaker@windriver.com, rdunlap@xenotime.net,
	sfr@canb.auug.org.au, linux-next@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Bjørn Mork
In-Reply-To: <20120510.224352.2149006669429730060.davem@davemloft.net>

On 05/10/2012 07:43 PM, David Miller wrote:
> From: Paul Gortmaker<paul.gortmaker@windriver.com>
> Date: Thu, 10 May 2012 22:36:55 -0400
>
>> On Thu, May 10, 2012 at 5:40 PM, Randy Dunlap<rdunlap@xenotime.net>  wrote:
>>> On 05/10/2012 02:26 AM, Stephen Rothwell wrote:
>>>
>>>> Hi all,
>>>>
>>>> Changes since 20120508:
>>>
>>>
>>>
>>> ERROR: "of_mdio_find_bus" [drivers/net/phy/mdio-mux.ko] undefined!
>>
>> Not just randconfig, but also powerpc allmodconfig:
>>
>> http://kisskb.ellerman.id.au/kisskb/buildresult/6291463/
>>
>> Adding ppc ML to cc.
>
> Adding the guilty party to the CC:.  This really stinks, especially
> after I did give the patch submitter a hard time about getting the
> dependencies right. :-/

Indeed, how embarrassing.

It looks like Bjørn may have already posted a fix.  I will try to verify 
that it works, and take appropriate action.

Sorry about the screw up,
David Daney

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Eric Dumazet @ 2012-05-11 16:08 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, netdev, Dave Taht, Kathleen Nichols, Van Jacobson,
	Tom Herbert, Matt Mathis, Yuchung Cheng, Stephen Hemminger
In-Reply-To: <1336749810.31653.176.camel@edumazet-glaptop>

On Fri, 2012-05-11 at 17:23 +0200, Eric Dumazet wrote:

> Its possible I missed something, because in my first coding I had 3
> lists.
> 
> Anyway I'll send a V2 because I left .change method to NULL, while the
> intent was to permit a change on fq_codel.

Before sending v2, here is the diff against v1 :

 net/sched/sch_fq_codel.c |   26 ++++++++++++++------------
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 8675ff8..f29a967 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -231,18 +231,16 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct sk_buff *skb;
 	struct fq_codel_flow *flow;
+	struct list_head *head;
 
 begin:
-	if (!list_empty(&q->new_flows))
-		flow = list_first_entry(&q->new_flows,
-					struct fq_codel_flow,
-					flowchain);
-	else if (!list_empty(&q->old_flows))
-		flow = list_first_entry(&q->old_flows,
-					struct fq_codel_flow,
-					flowchain);
-	else
-		return NULL;
+	head = &q->new_flows;
+	if (list_empty(head)) {
+		head = &q->old_flows;
+		if (list_empty(head))
+			return NULL;
+	}
+	flow = list_first_entry(head, struct fq_codel_flow, flowchain);
 
 	if (flow->deficit <= 0) {
 		flow->deficit += q->quantum;
@@ -252,7 +250,11 @@ begin:
 	skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
 			    dequeue, &q->backlogs[flow - q->flows]);
 	if (!skb) {
-		list_del_init(&flow->flowchain);
+		/* force a pass through old_flows to prevent starvation */
+		if ((head == &q->new_flows) && !list_empty(&q->old_flows))
+			list_move_tail(&flow->flowchain, &q->old_flows);
+		else
+			list_del_init(&flow->flowchain);
 		goto begin;
 	}
 	qdisc_bstats_update(sch, skb);
@@ -573,7 +575,7 @@ static struct Qdisc_ops fq_codel_qdisc_ops __read_mostly = {
 	.init		=	fq_codel_init,
 	.reset		=	fq_codel_reset,
 	.destroy	=	fq_codel_destroy,
-	.change		=	NULL,
+	.change		=	fq_codel_change,
 	.dump		=	fq_codel_dump,
 	.dump_stats =	fq_codel_dump_stats,
 	.owner		=	THIS_MODULE,

^ permalink raw reply related

* [PATCH] net: of/phy: fix build error when phylib is built as a module
From: Bjørn Mork @ 2012-05-11 15:47 UTC (permalink / raw)
  To: David Miller
  Cc: paul.gortmaker, rdunlap, sfr, linux-next, linux-kernel, netdev,
	linuxppc-dev, david.daney, Bjørn Mork
In-Reply-To: <20120510.224352.2149006669429730060.davem@davemloft.net>

CONFIG_OF_MDIO is tristate and will be m if PHYLIB is m.  Use
IS_ENABLED macro to prevent build error:

 ERROR: "of_mdio_find_bus" [drivers/net/phy/mdio-mux.ko] undefined!

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: David Daney <david.daney@cavium.com>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
---
I wonder if this could be as banal as this?  Not even build tested...

Should be wrapped into commit 25106022 if it works, to ensure
bisectability.



Bjørn

 drivers/net/phy/mdio_bus.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 83d5c9f..683ef1c 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -88,7 +88,7 @@ static struct class mdio_bus_class = {
 	.dev_release	= mdiobus_release,
 };
 
-#ifdef CONFIG_OF_MDIO
+#if IS_ENABLED(CONFIG_OF_MDIO)
 /* Helper function for of_mdio_find_bus */
 static int of_mdio_bus_match(struct device *dev, void *mdio_bus_np)
 {
-- 
1.7.10

^ permalink raw reply related

* Re: [PATCH 00/17] Swap-over-NBD without deadlocking V10
From: Mel Gorman @ 2012-05-11 15:45 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.010445.1020972261904383892.davem@davemloft.net>

On Fri, May 11, 2012 at 01:04:45AM -0400, David Miller wrote:
> 
> Ok, I'm generally happy with the networking parts.
> 

Great!

> If you address my feedback I'll sign off on it.
> 

I didn't get through all the feedback and respond today but I will
during next week, get it retested and reposted. Thanks a lot.

> The next question is whose tree this stuff goes through :-)

Yep, that's going to be entertaining.  I had structured this so it could
go through multiple trees but it's not perfect. If I switch patches 14
(slab-related) and 15 (network related), then it becomes

Patch 1 gets dropped after the next merge window as it'll be in mainline anyway
Patch 2-3 goes through Pekka's sl*b tree
Patch 4-7 goes through akpm
Patch 8-14 goes through linux-net
Patch 15-17 goes through akpm

That sort of multiple staging is messy though and correctness would depend
on what order linux-next pulls trees from. I think I should be able to
move 15-17 before linux-net which might simplify things a little although
that would be a bit odd from a bisection perspective.

>From my point of view, the ideal would be that all the patches go through
akpm's tree or yours but that probably will cause merge difficulties.

Any recommendations?

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* CONTACT WESTERN UNION FOR YOUR PAYMENT MTCN
From: Dr. James William @ 2012-05-11 14:50 UTC (permalink / raw)




-- 
Attention.

 I’ve sent you the first payment of $5.000.00 today from your winning
funds total amount of $4.7MillionUSD,Therefore you need to contact western
union agent for him to give you the transfer payment information and MTCN,

Agent: Mr.Adam Smith
Phone: +229-9813-6203
Email: wu_34live@superposta.com

Please try to contact him today for him to forward the payment information
to you, remember to indicate the registration code of EB-2520 to him when
e-mailing or calling he also he will be sending you $5,000.00 daily as per
my discussion with him.

Do get in touch with me once you have received the transfer.
Sincerely,
Mr. James Williams.

^ permalink raw reply

* Re: [PATCH v2 5/5] net: sh_eth: use NAPI
From: Ben Hutchings @ 2012-05-11 15:35 UTC (permalink / raw)
  To: Shimoda, Yoshihiro; +Cc: netdev, SH-Linux
In-Reply-To: <4FACD007.1070308@renesas.com>

On Fri, 2012-05-11 at 17:38 +0900, Shimoda, Yoshihiro wrote:
> This patch modifies the driver to use NAPI.
[...]
> +static int sh_eth_poll(struct napi_struct *napi, int budget)
> +{
> +	struct sh_eth_private *mdp = container_of(napi, struct sh_eth_private,
> +						  napi);
> +	struct net_device *ndev = mdp->ndev;
> +	struct sh_eth_cpu_data *cd = mdp->cd;
> +	int work_done = 0, txfree_num;
> +	u32 intr_status = sh_eth_read(ndev, EESR);
> +
> +	/* Clear interrupt flags */
> +	sh_eth_write(ndev, intr_status, EESR);
> +
> +	/* check txdesc */
> +	txfree_num = sh_eth_txfree(ndev);
> +	if (txfree_num)
>  		netif_wake_queue(ndev);
> -	}
> 
> +	/* check rxdesc */
> +	sh_eth_rx(ndev, &work_done, budget);
> +
> +	/* check error flags */
>  	if (intr_status & cd->eesr_err_check)
>  		sh_eth_error(ndev, intr_status);
> 
> -other_irq:
> -	spin_unlock(&mdp->lock);
> +	/* get current interrupt flags */
> +	intr_status = sh_eth_read(ndev, EESR);
> 
> -	return ret;
> +	/* check whether the controller doesn't have any events */
> +	if (!txfree_num && !(intr_status & cd->eesr_err_check) &&
> +	    work_done < budget) {
> +		napi_complete(napi);

If and only if you return a value less than the budget then you *must*
call napi_complete().  You can't add these extra conditions.

> +		/* Enable all interrupts */
> +		sh_eth_write(ndev, cd->eesipr_value, EESIPR);
> +	}
> +
> +	return work_done;
>  }
> 
>  /* PHY state control function */
> @@ -1565,6 +1590,8 @@ static int sh_eth_open(struct net_device *ndev)
> 
>  	pm_runtime_get_sync(&mdp->pdev->dev);
> 
> +	napi_enable(&mdp->napi);
> +
>  	ret = request_irq(ndev->irq, sh_eth_interrupt,
>  #if defined(CONFIG_CPU_SUBTYPE_SH7763) || \
>  	defined(CONFIG_CPU_SUBTYPE_SH7764) || \
> @@ -1705,6 +1732,8 @@ static int sh_eth_close(struct net_device *ndev)
>  		phy_disconnect(mdp->phydev);
>  	}
> 
> +	napi_disable(&mdp->napi);
> +
[...]

You will also need to call napi_disable() and napi_enable() in the
set_ringparam implementation.

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply

* Re: [PATCH v2 4/5] net: sh_eth: add support for set_ringparam/get_ringparam
From: Ben Hutchings @ 2012-05-11 15:27 UTC (permalink / raw)
  To: Shimoda, Yoshihiro; +Cc: netdev, SH-Linux
In-Reply-To: <4FACD005.3010800@renesas.com>

On Fri, 2012-05-11 at 17:38 +0900, Shimoda, Yoshihiro wrote:
> This patch supports the ethtool's set_ringparam() and get_ringparam().
[...]
> +static int sh_eth_set_ringparam(struct net_device *ndev,
> +				struct ethtool_ringparam *ring)
> +{
> +	struct sh_eth_private *mdp = netdev_priv(ndev);
> +	int ringsize;
> +
> +	if (ring->tx_pending > TX_RING_MAX ||
> +	    ring->rx_pending > RX_RING_MAX ||
> +	    ring->tx_pending < TX_RING_MIN ||
> +	    ring->rx_pending < RX_RING_MIN)
> +		return -EINVAL;
> +	if (ring->rx_mini_pending || ring->rx_jumbo_pending)
> +		return -EINVAL;
> +
> +	if (netif_running(ndev)) {
> +		/* Disable interrupts by clearing the interrupt mask. */
> +		sh_eth_write(ndev, 0x0000, EESIPR);
> +		/* Stop the chip's Tx and Rx processes. */
> +		sh_eth_write(ndev, 0, EDTRR);
> +		sh_eth_write(ndev, 0, EDRRR);

You'll also need to call netif_tx_disable() and synchronize_irq().

> +	}
> +
> +	/* Free all the skbuffs in the Rx queue. */
> +	sh_eth_ring_free(ndev);
> +	/* Free DMA buffer */
> +	ringsize = sizeof(struct sh_eth_rxdesc) * mdp->num_rx_ring;
> +	dma_free_coherent(NULL, ringsize, mdp->rx_ring, mdp->rx_desc_dma);
> +	ringsize = sizeof(struct sh_eth_txdesc) * mdp->num_tx_ring;
> +	dma_free_coherent(NULL, ringsize, mdp->tx_ring, mdp->tx_desc_dma);
> +
> +	/* Set new parameters */
> +	mdp->num_rx_ring = ring->rx_pending;
> +	mdp->num_tx_ring = ring->tx_pending;
> +
> +	sh_eth_ring_init(ndev);
> +	sh_eth_dev_init(ndev);

And what if either of these fails?

> +	if (netif_running(ndev)) {
> +		sh_eth_write(ndev, mdp->cd->eesipr_value, EESIPR);
> +		/* Setting the Rx mode will start the Rx process. */
> +		sh_eth_write(ndev, EDRRR_R, EDRRR);

You'll need to call netif_wake_queue().

> +	}
> +
> +	return 0;
> +}
[...]

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Eric Dumazet @ 2012-05-11 15:23 UTC (permalink / raw)
  To: Changli Gao
  Cc: David Miller, netdev, Dave Taht, Kathleen Nichols, Van Jacobson,
	Tom Herbert, Matt Mathis, Yuchung Cheng, Stephen Hemminger
In-Reply-To: <CABa6K_Fk07CUR1+hOx06cwJ5MuNYi7vpeLYFOc_1Qwmu5ubYhQ@mail.gmail.com>

On Fri, 2012-05-11 at 23:03 +0800, Changli Gao wrote:
> On Fri, May 11, 2012 at 9:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Fair Queue Codel implementation.
> >
> > Principles :
> >
> > - Packets are classified (internal classifier or external) on flows.
> > - This is a Stochastic model (as we use a hash, several flows might
> >                              be hashed on same slot)
> > - Each flow has a CoDel managed queue.
> > - Flows are linked onto two (Round Robin) lists,
> >  so that new flows have priority on old ones.
> 
> I don't think it is a good idea, as the old ones may be starved. It isn't
> fair. Why not use the conventional DRR?
> 

Hey, its DRR, but with 64 bytes per flow instead of more than 256.
One cache line per flow, that was my goal, sharing the codel_params and
stats for all flows.

A 'struct fq_codel_flow' can be in three states :

- Detached state
- In new flow list
- In old flow list

And its the dequeue() that can put a flow in detached state, only if
coming from old flow list.

Its possible I missed something, because in my first coding I had 3
lists.

Anyway I'll send a V2 because I left .change method to NULL, while the
intent was to permit a change on fq_codel.

> > +
> > +       /* Queue is full! Find the fat flow and drop packet from it.
> > +        * This might sound expensive, but with 1024 flows, we scan
> > +        * 4KB of memory, and we dont need to handle a complex tree
> > +        * in fast path (packet queue/enqueue) with many cache misses.
> > +        */
> 
> How about the tricks used by SFQ?

They are too expensive in term of cache misses and limits.
Code is complex and difficult to maintain.
That was a nice compromise 20 years ago when memory was expensive.
Now, memory is cheap but still slow.

Also adding the 'priority to new flows' is too difficult with SFQ.

^ permalink raw reply

* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Mel Gorman @ 2012-05-11 15:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, akpm, linux-mm, netdev, linux-kernel, neilb,
	michaelc, emunson
In-Reply-To: <1336747350.1017.22.camel@twins>

On Fri, May 11, 2012 at 04:42:30PM +0200, Peter Zijlstra wrote:
> On Fri, 2012-05-11 at 15:32 +0100, Mel Gorman wrote:
> > > > +extern atomic_t memalloc_socks;
> > > > +static inline int sk_memalloc_socks(void)
> > > > +{
> > > > +   return atomic_read(&memalloc_socks);
> > > > +}
> > > 
> > > Please change this to be a static branch.
> > > 
> > 
> > Will do. I renamed memalloc_socks to sk_memalloc_socks, made it a int as
> > atomics are unnecessary and I check it directly in a branch instead of a
> > static inline. It should be relatively easy for the branch predictor. 
> 
> David means you to use include/linux/jump_label.h.
> 

Ah, that makes a whole lot more sense. Thanks for the clarification.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Changli Gao @ 2012-05-11 15:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Miller, netdev, Dave Taht, Kathleen Nichols, Van Jacobson,
	Tom Herbert, Matt Mathis, Yuchung Cheng, Stephen Hemminger
In-Reply-To: <1336744796.31653.164.camel@edumazet-glaptop>

On Fri, May 11, 2012 at 9:59 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Fair Queue Codel implementation.
>
> Principles :
>
> - Packets are classified (internal classifier or external) on flows.
> - This is a Stochastic model (as we use a hash, several flows might
>                              be hashed on same slot)
> - Each flow has a CoDel managed queue.
> - Flows are linked onto two (Round Robin) lists,
>  so that new flows have priority on old ones.

I don't think it is a good idea, as the old ones may be starved. It isn't
fair. Why not use the conventional DRR?

> +
> +       /* Queue is full! Find the fat flow and drop packet from it.
> +        * This might sound expensive, but with 1024 flows, we scan
> +        * 4KB of memory, and we dont need to handle a complex tree
> +        * in fast path (packet queue/enqueue) with many cache misses.
> +        */

How about the tricks used by SFQ?

Thanks.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)

^ permalink raw reply

* [PATCH 1/2 net v2] 6lowpan: rework data fetching from skb
From: Alexander Smirnov @ 2012-05-11 14:58 UTC (permalink / raw)
  To: davem; +Cc: netdev, eric.dumazet, Alexander Smirnov
In-Reply-To: <4fabc166.9208cc0a.4de8.ffff9211@mx.google.com>

This patch reworks functions responsible for fetching data from skb. Now they
work more accurately and can notify if something went wrong.

Signed-off-by: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
---
 net/ieee802154/6lowpan.c |   75 ++++++++++++++++++++++++++-------------------
 1 files changed, 43 insertions(+), 32 deletions(-)

diff --git a/net/ieee802154/6lowpan.c b/net/ieee802154/6lowpan.c
index 32eb417..c2bbf01 100644
--- a/net/ieee802154/6lowpan.c
+++ b/net/ieee802154/6lowpan.c
@@ -291,25 +291,31 @@ lowpan_compress_udp_header(u8 **hc06_ptr, struct sk_buff *skb)
 	*hc06_ptr += 2;
 }
 
-static u8 lowpan_fetch_skb_u8(struct sk_buff *skb)
+static inline int lowpan_fetch_skb_u8(struct sk_buff *skb, u8 *val)
 {
-	u8 ret;
+	if (WARN_ON_ONCE(!pskb_may_pull(skb, 1))) {
+		/*
+		 * Uhhh, something went terribly wrong.
+		 * Check the bottom layers code
+		 */
+		return -EINVAL;
+	}
 
-	ret = skb->data[0];
+	*val = skb->data[0];
 	skb_pull(skb, 1);
 
-	return ret;
+	return 0;
 }
 
-static u16 lowpan_fetch_skb_u16(struct sk_buff *skb)
+static inline int lowpan_fetch_skb_u16(struct sk_buff *skb, u16 *val)
 {
-	u16 ret;
-
-	BUG_ON(!pskb_may_pull(skb, 2));
+	if (WARN_ON_ONCE(!pskb_may_pull(skb, 2)))
+		return -EINVAL;
 
-	ret = skb->data[0] | (skb->data[1] << 8);
+	*val = skb->data[0] | (skb->data[1] << 8);
 	skb_pull(skb, 2);
-	return ret;
+
+	return 0;
 }
 
 static int
@@ -318,7 +324,8 @@ lowpan_uncompress_udp_header(struct sk_buff *skb)
 	struct udphdr *uh = udp_hdr(skb);
 	u8 tmp;
 
-	tmp = lowpan_fetch_skb_u8(skb);
+	if (lowpan_fetch_skb_u8(skb, &tmp))
+		goto err;
 
 	if ((tmp & LOWPAN_NHC_UDP_MASK) == LOWPAN_NHC_UDP_ID) {
 		pr_debug("(%s): UDP header uncompression\n", __func__);
@@ -710,7 +717,9 @@ lowpan_process_data(struct sk_buff *skb)
 	/* at least two bytes will be used for the encoding */
 	if (skb->len < 2)
 		goto drop;
-	iphc0 = lowpan_fetch_skb_u8(skb);
+
+	if (lowpan_fetch_skb_u8(skb, &iphc0))
+		goto drop;
 
 	/* fragments assembling */
 	switch (iphc0 & LOWPAN_DISPATCH_MASK) {
@@ -722,8 +731,9 @@ lowpan_process_data(struct sk_buff *skb)
 		u16 tag;
 		bool found = false;
 
-		len = lowpan_fetch_skb_u8(skb); /* frame length */
-		tag = lowpan_fetch_skb_u16(skb);
+		if (lowpan_fetch_skb_u8(skb, &len) || /* frame length */
+		    lowpan_fetch_skb_u16(skb, &tag))  /* fragment tag */
+			goto drop;
 
 		/*
 		 * check if frame assembling with the same tag is
@@ -747,7 +757,8 @@ lowpan_process_data(struct sk_buff *skb)
 		if ((iphc0 & LOWPAN_DISPATCH_MASK) == LOWPAN_DISPATCH_FRAG1)
 			goto unlock_and_drop;
 
-		offset = lowpan_fetch_skb_u8(skb); /* fetch offset */
+		if (lowpan_fetch_skb_u8(skb, &offset)) /* fetch offset */
+			goto unlock_and_drop;
 
 		/* if payload fits buffer, copy it */
 		if (likely((offset * 8 + skb->len) <= frame->length))
@@ -769,7 +780,10 @@ lowpan_process_data(struct sk_buff *skb)
 			dev_kfree_skb(skb);
 			skb = frame->skb;
 			kfree(frame);
-			iphc0 = lowpan_fetch_skb_u8(skb);
+
+			if (lowpan_fetch_skb_u8(skb, &iphc0))
+				goto unlock_and_drop;
+
 			break;
 		}
 		spin_unlock(&flist_lock);
@@ -780,7 +794,8 @@ lowpan_process_data(struct sk_buff *skb)
 		break;
 	}
 
-	iphc1 = lowpan_fetch_skb_u8(skb);
+	if (lowpan_fetch_skb_u8(skb, &iphc1))
+		goto drop;
 
 	_saddr = mac_cb(skb)->sa.hwaddr;
 	_daddr = mac_cb(skb)->da.hwaddr;
@@ -791,9 +806,8 @@ lowpan_process_data(struct sk_buff *skb)
 	if (iphc1 & LOWPAN_IPHC_CID) {
 		pr_debug("(%s): CID flag is set, increase header with one\n",
 								__func__);
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &num_context))
 			goto drop;
-		num_context = lowpan_fetch_skb_u8(skb);
 	}
 
 	hdr.version = 6;
@@ -805,9 +819,9 @@ lowpan_process_data(struct sk_buff *skb)
 	 * ECN + DSCP + 4-bit Pad + Flow Label (4 bytes)
 	 */
 	case 0: /* 00b */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &tmp))
 			goto drop;
-		tmp = lowpan_fetch_skb_u8(skb);
+
 		memcpy(&hdr.flow_lbl, &skb->data[0], 3);
 		skb_pull(skb, 3);
 		hdr.priority = ((tmp >> 2) & 0x0f);
@@ -819,9 +833,9 @@ lowpan_process_data(struct sk_buff *skb)
 	 * ECN + DSCP (1 byte), Flow Label is elided
 	 */
 	case 1: /* 10b */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &tmp))
 			goto drop;
-		tmp = lowpan_fetch_skb_u8(skb);
+
 		hdr.priority = ((tmp >> 2) & 0x0f);
 		hdr.flow_lbl[0] = ((tmp << 6) & 0xC0) | ((tmp >> 2) & 0x30);
 		hdr.flow_lbl[1] = 0;
@@ -832,9 +846,9 @@ lowpan_process_data(struct sk_buff *skb)
 	 * ECN + 2-bit Pad + Flow Label (3 bytes), DSCP is elided
 	 */
 	case 2: /* 01b */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &tmp))
 			goto drop;
-		tmp = lowpan_fetch_skb_u8(skb);
+
 		hdr.flow_lbl[0] = (skb->data[0] & 0x0F) | ((tmp >> 2) & 0x30);
 		memcpy(&hdr.flow_lbl[1], &skb->data[0], 2);
 		skb_pull(skb, 2);
@@ -853,9 +867,9 @@ lowpan_process_data(struct sk_buff *skb)
 	/* Next Header */
 	if ((iphc0 & LOWPAN_IPHC_NH_C) == 0) {
 		/* Next header is carried inline */
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &(hdr.nexthdr)))
 			goto drop;
-		hdr.nexthdr = lowpan_fetch_skb_u8(skb);
+
 		pr_debug("(%s): NH flag is set, next header is carried "
 			 "inline: %02x\n", __func__, hdr.nexthdr);
 	}
@@ -864,9 +878,8 @@ lowpan_process_data(struct sk_buff *skb)
 	if ((iphc0 & 0x03) != LOWPAN_IPHC_TTL_I)
 		hdr.hop_limit = lowpan_ttl_values[iphc0 & 0x03];
 	else {
-		if (!skb->len)
+		if (lowpan_fetch_skb_u8(skb, &(hdr.hop_limit)))
 			goto drop;
-		hdr.hop_limit = lowpan_fetch_skb_u8(skb);
 	}
 
 	/* Extract SAM to the tmp variable */
@@ -894,10 +907,8 @@ lowpan_process_data(struct sk_buff *skb)
 			pr_debug("(%s): destination address non-context-based"
 				 " multicast compression\n", __func__);
 			if (0 < tmp && tmp < 3) {
-				if (!skb->len)
+				if (lowpan_fetch_skb_u8(skb, &prefix[1]))
 					goto drop;
-				else
-					prefix[1] = lowpan_fetch_skb_u8(skb);
 			}
 
 			err = lowpan_uncompress_addr(skb, &hdr.daddr, prefix,
-- 
1.7.2.3

^ permalink raw reply related

* Re: [PATCH 12/17] netvm: Propagate page->pfmemalloc from netdev_alloc_page to skb
From: Mel Gorman @ 2012-05-11 14:46 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.010109.1698578316660207883.davem@davemloft.net>

On Fri, May 11, 2012 at 01:01:09AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:05 +0100
> 
> > +/**
> > + *	propagate_pfmemalloc_skb - Propagate pfmemalloc if skb is allocated after RX page
> > + *	@page: The page that was allocated from netdev_alloc_page
> > + *	@skb: The skb that may need pfmemalloc set
> > + */
> > +static inline void propagate_pfmemalloc_skb(struct page *page,
> > +						struct sk_buff *skb)
> 
> Please use consistent prefixes in the names for new interfaces.
> 

Understood.

> This one should probably be named "skb_propagate_pfmemalloc()" and
> go into skbuff.h since it needs no knowledge of netdevices.
> 

I used a netdev prefix and placed it in skbuff.h which was stupid. The
screw-up was because I was partially reverting a patch that deleted
netdev_alloc_page but I didn't need any device information so the naming
was poor. I renamed netdev_alloc_page to skb_alloc_page and will fix up
the documentation appropriately.

Thanks.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Peter Zijlstra @ 2012-05-11 14:42 UTC (permalink / raw)
  To: Mel Gorman
  Cc: David Miller, akpm, linux-mm, netdev, linux-kernel, neilb,
	michaelc, emunson
In-Reply-To: <20120511143218.GS11435@suse.de>

On Fri, 2012-05-11 at 15:32 +0100, Mel Gorman wrote:
> > > +extern atomic_t memalloc_socks;
> > > +static inline int sk_memalloc_socks(void)
> > > +{
> > > +   return atomic_read(&memalloc_socks);
> > > +}
> > 
> > Please change this to be a static branch.
> > 
> 
> Will do. I renamed memalloc_socks to sk_memalloc_socks, made it a int as
> atomics are unnecessary and I check it directly in a branch instead of a
> static inline. It should be relatively easy for the branch predictor. 

David means you to use include/linux/jump_label.h.

static struct static_key sk_memalloc_socks = STATIC_KEY_INIT_FALSE;

and have your function read:

static inline bool sk_memalloc_socks(void)
{
	return static_key_false(&sk_memalloc_socks);
}

which can be modified using:

  static_key_slow_inc(&sk_memalloc_socks);

or

  static_key_slow_dec(&sk_memalloc_socks);

This magic goo turns the branch into self-modifying code such that the
branch is an unconditional jump at runtime.

  

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 10/17] netvm: Allow skb allocation to use PFMEMALLOC reserves
From: Mel Gorman @ 2012-05-11 14:32 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.005740.210437168371869566.davem@davemloft.net>

On Fri, May 11, 2012 at 12:57:40AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:03 +0100
> 
> > +/* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
> > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
> 
> I know this gets added in an earlier patch, but it seems slightly
> overkill to have a function call just for a simply bit test.
> 

It's not that simple. gfp_pfmemalloc_allowed calls gfp_to_alloc_flags()
which is quite involved and probably should not be duplicated. In the slab
case, it's called from slow paths where we are already under memory pressure
and swapping to network so it's not a major problem. In the network case,
it is called when kmalloc() has already failed and also a slow path.

> > +extern atomic_t memalloc_socks;
> > +static inline int sk_memalloc_socks(void)
> > +{
> > +	return atomic_read(&memalloc_socks);
> > +}
> 
> Please change this to be a static branch.
> 

Will do. I renamed memalloc_socks to sk_memalloc_socks, made it a int as
atomics are unnecessary and I check it directly in a branch instead of a
static inline. It should be relatively easy for the branch predictor.

> > +	skb = __alloc_skb(length + NET_SKB_PAD, gfp_mask,
> > +						SKB_ALLOC_RX, NUMA_NO_NODE);
> 
> Please fix the argument indentation.
> 

Done.

> > +	data = kmalloc_reserve(size + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> > +		       gfp_mask, NUMA_NO_NODE, NULL);
> 
> Likewise.

Done

> 
> > -	struct sk_buff *n = alloc_skb(newheadroom + skb->len + newtailroom,
> > -				      gfp_mask);
> > +	struct sk_buff *n = __alloc_skb(newheadroom + skb->len + newtailroom,
> > +				      gfp_mask, skb_alloc_rx_flag(skb),
> > +				      NUMA_NO_NODE);
> 
> Likewise.
> 

Done.

> > -			nskb = alloc_skb(hsize + doffset + headroom,
> > -					 GFP_ATOMIC);
> > +			nskb = __alloc_skb(hsize + doffset + headroom,
> > +					 GFP_ATOMIC, skb_alloc_rx_flag(skb),
> > +					 NUMA_NO_NODE);
> 
> Likewise.

Done.

Thanks.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH net-next v2] be2net: Fix to allow get/set of debug levels in the firmware.
From: Somnath Kotur @ 2012-05-11 14:20 UTC (permalink / raw)
  To: netdev, bhutchings; +Cc: Somnath Kotur, Suresh Reddy

Incorporated review comments by Ben Hutchings.

Signed-off-by: Suresh Reddy <suresh.reddy@emulex.com>
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
---
 drivers/net/ethernet/emulex/benet/be.h         |    3 +
 drivers/net/ethernet/emulex/benet/be_cmds.c    |   56 +++++++++++++++++
 drivers/net/ethernet/emulex/benet/be_cmds.h    |   57 +++++++++++++++++
 drivers/net/ethernet/emulex/benet/be_ethtool.c |   77 ++++++++++++++++++++++++
 drivers/net/ethernet/emulex/benet/be_main.c    |   37 +++++++++++
 5 files changed, 230 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/emulex/benet/be.h b/drivers/net/ethernet/emulex/benet/be.h
index c3ee910..9817fed 100644
--- a/drivers/net/ethernet/emulex/benet/be.h
+++ b/drivers/net/ethernet/emulex/benet/be.h
@@ -415,6 +415,7 @@ struct be_adapter {
 	bool wol;
 	u32 max_pmac_cnt;	/* Max secondary UC MACs programmable */
 	u32 uc_macs;		/* Count of secondary UC MAC programmed */
+	u32 msg_enable;
 };
 
 #define be_physfn(adapter) (!adapter->is_virtfn)
@@ -603,4 +604,6 @@ extern void be_parse_stats(struct be_adapter *adapter);
 extern int be_load_fw(struct be_adapter *adapter, u8 *func);
 extern bool be_is_wol_supported(struct be_adapter *adapter);
 extern bool be_pause_supported(struct be_adapter *adapter);
+extern u32 be_get_fw_log_level(struct be_adapter *adapter);
+
 #endif				/* BE_H */
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c
index 43167e8..b24623c 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.c
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.c
@@ -2589,4 +2589,60 @@ err:
 	mutex_unlock(&adapter->mbox_lock);
 	pci_free_consistent(adapter->pdev, cmd.size, cmd.va, cmd.dma);
 	return status;
+
+}
+int be_cmd_get_ext_fat_capabilites(struct be_adapter *adapter,
+				   struct be_dma_mem *cmd)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_get_ext_fat_caps *req;
+	int status;
+
+	if (mutex_lock_interruptible(&adapter->mbox_lock))
+		return -1;
+
+	wrb = wrb_from_mbox(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+
+	req = cmd->va;
+	be_wrb_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+			       OPCODE_COMMON_GET_EXT_FAT_CAPABILITES,
+			       cmd->size, wrb, cmd);
+	req->parameter_type = cpu_to_le32(1);
+
+	status = be_mbox_notify_wait(adapter);
+err:
+	mutex_unlock(&adapter->mbox_lock);
+	return status;
+}
+
+int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
+				   struct be_dma_mem *cmd,
+				   struct be_fat_conf_params *configs)
+{
+	struct be_mcc_wrb *wrb;
+	struct be_cmd_req_set_ext_fat_caps *req;
+	int status;
+
+	spin_lock_bh(&adapter->mcc_lock);
+
+	wrb = wrb_from_mccq(adapter);
+	if (!wrb) {
+		status = -EBUSY;
+		goto err;
+	}
+
+	req = cmd->va;
+	memcpy(&req->set_params, configs, sizeof(struct be_fat_conf_params));
+	be_wrb_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_COMMON,
+			       OPCODE_COMMON_SET_EXT_FAT_CAPABILITES,
+			       cmd->size, wrb, cmd);
+
+	status = be_mcc_notify_wait(adapter);
+err:
+	spin_unlock_bh(&adapter->mcc_lock);
+	return status;
 }
diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.h b/drivers/net/ethernet/emulex/benet/be_cmds.h
index 944f031..0b1029b 100644
--- a/drivers/net/ethernet/emulex/benet/be_cmds.h
+++ b/drivers/net/ethernet/emulex/benet/be_cmds.h
@@ -189,6 +189,8 @@ struct be_mcc_mailbox {
 #define OPCODE_COMMON_GET_PHY_DETAILS			102
 #define OPCODE_COMMON_SET_DRIVER_FUNCTION_CAP		103
 #define OPCODE_COMMON_GET_CNTL_ADDITIONAL_ATTRIBUTES	121
+#define OPCODE_COMMON_GET_EXT_FAT_CAPABILITES		125
+#define OPCODE_COMMON_SET_EXT_FAT_CAPABILITES		126
 #define OPCODE_COMMON_GET_MAC_LIST			147
 #define OPCODE_COMMON_SET_MAC_LIST			148
 #define OPCODE_COMMON_GET_HSW_CONFIG			152
@@ -1602,6 +1604,56 @@ static inline void *be_erx_stats_from_cmd(struct be_adapter *adapter)
 	}
 }
 
+
+/************** get fat capabilites *******************/
+#define MAX_MODULES 27
+#define MAX_MODES 4
+#define MODE_UART 0
+#define FW_LOG_LEVEL_DEFAULT 48
+#define FW_LOG_LEVEL_FATAL 64
+
+struct ext_fat_mode {
+	u8 mode;
+	u8 rsvd0;
+	u16 port_mask;
+	u32 dbg_lvl;
+	u64 fun_mask;
+} __packed;
+
+struct ext_fat_modules {
+	u8 modules_str[32];
+	u32 modules_id;
+	u32 num_modes;
+	struct ext_fat_mode trace_lvl[MAX_MODES];
+} __packed;
+
+struct be_fat_conf_params {
+	u32 max_log_entries;
+	u32 log_entry_size;
+	u8 log_type;
+	u8 max_log_funs;
+	u8 max_log_ports;
+	u8 rsvd0;
+	u32 supp_modes;
+	u32 num_modules;
+	struct ext_fat_modules module[MAX_MODULES];
+} __packed;
+
+struct be_cmd_req_get_ext_fat_caps {
+	struct be_cmd_req_hdr hdr;
+	u32 parameter_type;
+};
+
+struct be_cmd_resp_get_ext_fat_caps {
+	struct be_cmd_resp_hdr hdr;
+	struct be_fat_conf_params get_params;
+};
+
+struct be_cmd_req_set_ext_fat_caps {
+	struct be_cmd_req_hdr hdr;
+	struct be_fat_conf_params set_params;
+};
+
 extern int be_pci_fnum_get(struct be_adapter *adapter);
 extern int be_cmd_POST(struct be_adapter *adapter);
 extern int be_cmd_mac_addr_query(struct be_adapter *adapter, u8 *mac_addr,
@@ -1707,4 +1759,9 @@ extern int be_cmd_set_hsw_config(struct be_adapter *adapter, u16 pvid,
 extern int be_cmd_get_hsw_config(struct be_adapter *adapter, u16 *pvid,
 			u32 domain, u16 intf_id);
 extern int be_cmd_get_acpi_wol_cap(struct be_adapter *adapter);
+extern int be_cmd_get_ext_fat_capabilites(struct be_adapter *adapter,
+					  struct be_dma_mem *cmd);
+extern int be_cmd_set_ext_fat_capabilites(struct be_adapter *adapter,
+					  struct be_dma_mem *cmd,
+					  struct be_fat_conf_params *cfgs);
 
diff --git a/drivers/net/ethernet/emulex/benet/be_ethtool.c b/drivers/net/ethernet/emulex/benet/be_ethtool.c
index 747f68f..c40147d 100644
--- a/drivers/net/ethernet/emulex/benet/be_ethtool.c
+++ b/drivers/net/ethernet/emulex/benet/be_ethtool.c
@@ -878,6 +878,81 @@ be_read_eeprom(struct net_device *netdev, struct ethtool_eeprom *eeprom,
 	return status;
 }
 
+static u32 be_get_msg_level(struct net_device *netdev)
+{
+	struct be_adapter *adapter = netdev_priv(netdev);
+
+	if (lancer_chip(adapter)) {
+		dev_err(&adapter->pdev->dev, "Operation not supported\n");
+		return -EOPNOTSUPP;
+	}
+
+	return adapter->msg_enable;
+}
+
+static void be_set_fw_log_level(struct be_adapter *adapter, u32 level)
+{
+	struct be_dma_mem extfat_cmd;
+	struct be_fat_conf_params *cfgs;
+	int status;
+	int i, j;
+
+	memset(&extfat_cmd, 0, sizeof(struct be_dma_mem));
+	extfat_cmd.size = sizeof(struct be_cmd_resp_get_ext_fat_caps);
+	extfat_cmd.va = pci_alloc_consistent(adapter->pdev, extfat_cmd.size,
+					     &extfat_cmd.dma);
+	if (!extfat_cmd.va) {
+		dev_err(&adapter->pdev->dev, "%s: Memory allocation failure\n",
+			__func__);
+		goto err;
+	}
+	status = be_cmd_get_ext_fat_capabilites(adapter, &extfat_cmd);
+	if (!status) {
+		cfgs = (struct be_fat_conf_params *)(extfat_cmd.va +
+					sizeof(struct be_cmd_resp_hdr));
+		for (i = 0; i < cfgs->num_modules; i++) {
+			for (j = 0; j < cfgs->module[i].num_modes; j++) {
+				if (cfgs->module[i].trace_lvl[j].mode ==
+								MODE_UART)
+					cfgs->module[i].trace_lvl[j].dbg_lvl =
+							cpu_to_le32(level);
+			}
+		}
+		status = be_cmd_set_ext_fat_capabilites(adapter, &extfat_cmd,
+							cfgs);
+		if (status)
+			dev_err(&adapter->pdev->dev,
+				"Message level set failed\n");
+	} else {
+		dev_err(&adapter->pdev->dev, "Message level get failed\n");
+	}
+
+	pci_free_consistent(adapter->pdev, extfat_cmd.size, extfat_cmd.va,
+			    extfat_cmd.dma);
+err:
+	return;
+}
+
+static void be_set_msg_level(struct net_device *netdev, u32 level)
+{
+	struct be_adapter *adapter = netdev_priv(netdev);
+
+	if (lancer_chip(adapter)) {
+		dev_err(&adapter->pdev->dev, "Operation not supported\n");
+		return;
+	}
+
+	if (adapter->msg_enable == level)
+		return;
+
+	if (level & NETIF_MSG_HW != adapter->msg_enable & NETIF_MSG_HW) {
+		be_set_fw_log_level(adapter, level & NETIF_MSG_HW ?
+			       	    FW_LOG_LEVEL_DEFAULT : FW_LOG_LEVEL_FATAL);
+	adapter->msg_enable = level;
+
+	return;
+}
+
 const struct ethtool_ops be_ethtool_ops = {
 	.get_settings = be_get_settings,
 	.get_drvinfo = be_get_drvinfo,
@@ -893,6 +968,8 @@ const struct ethtool_ops be_ethtool_ops = {
 	.set_pauseparam = be_set_pauseparam,
 	.get_strings = be_get_stat_strings,
 	.set_phys_id = be_set_phys_id,
+	.get_msglevel = be_get_msg_level,
+	.set_msglevel = be_set_msg_level,
 	.get_sset_count = be_get_sset_count,
 	.get_ethtool_stats = be_get_ethtool_stats,
 	.get_regs_len = be_get_reg_len,
diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index 6d5d30b..c2286a2 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -3367,9 +3367,43 @@ bool be_is_wol_supported(struct be_adapter *adapter)
 		!be_is_wol_excluded(adapter)) ? true : false;
 }
 
+u32 be_get_fw_log_level(struct be_adapter *adapter)
+{
+	struct be_dma_mem extfat_cmd;
+	struct be_fat_conf_params *cfgs;
+	int status;
+	u32 level = 0;
+	int j;
+
+	memset(&extfat_cmd, 0, sizeof(struct be_dma_mem));
+	extfat_cmd.size = sizeof(struct be_cmd_resp_get_ext_fat_caps);
+	extfat_cmd.va = pci_alloc_consistent(adapter->pdev, extfat_cmd.size,
+					     &extfat_cmd.dma);
+	if (!extfat_cmd.va) {
+		dev_err(&adapter->pdev->dev, "%s: Memory allocation failure\n",
+			__func__);
+		goto err;
+	}
+
+	status = be_cmd_get_ext_fat_capabilites(adapter, &extfat_cmd);
+	if (!status) {
+		cfgs = (struct be_fat_conf_params *)(extfat_cmd.va +
+						sizeof(struct be_cmd_resp_hdr));
+		for (j = 0; j < cfgs->module[0].num_modes; j++) {
+			if (cfgs->module[0].trace_lvl[j].mode == MODE_UART)
+				level = cfgs->module[0].trace_lvl[j].dbg_lvl;
+		}
+	}
+	pci_free_consistent(adapter->pdev, extfat_cmd.size, extfat_cmd.va,
+			    extfat_cmd.dma);
+err:
+	return level;
+}
+
 static int be_get_config(struct be_adapter *adapter)
 {
 	int status;
+	u32 level;
 
 	status = be_cmd_query_fw_cfg(adapter, &adapter->port_num,
 			&adapter->function_mode, &adapter->function_caps);
@@ -3407,6 +3441,9 @@ static int be_get_config(struct be_adapter *adapter)
 	if (be_is_wol_supported(adapter))
 		adapter->wol = true;
 
+	level = be_get_fw_log_level(adapter);
+	adapter->msg_enable = level <= FW_LOG_LEVEL_DEFAULT ? NETIF_MSG_HW : 0;
+
 	return 0;
 }
 
-- 
1.5.6.1

^ permalink raw reply related

* Re: [PATCH 08/17] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket
From: Mel Gorman @ 2012-05-11 14:12 UTC (permalink / raw)
  To: David Miller
  Cc: akpm, linux-mm, netdev, linux-kernel, neilb, a.p.zijlstra,
	michaelc, emunson
In-Reply-To: <20120511.004949.655300373402132371.davem@davemloft.net>

On Fri, May 11, 2012 at 12:49:49AM -0400, David Miller wrote:
> From: Mel Gorman <mgorman@suse.de>
> Date: Thu, 10 May 2012 14:45:01 +0100
> 
> > Introduce sk_allocation(), this function allows to inject sock specific
> > flags to each sock related allocation. It is only used on allocation
> > paths that may be required for writing pages back to network storage.
> > 
> > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> 
> This is still a little bit more than it needs to be.
> 
> You are trying to propagate a single bit from sk->sk_allocation into
> all of the annotated socket memory allocation sites.
> 
> But many of them use sk->sk_allocation already.  In fact all of them
> that use a variable rather than a constant GFP_* satisfy this
> invariant.
> 
> All of those annotations are therefore spurious, and probably end up
> generating unnecessary |'s in of that special bit in at least some
> cases.
> 

Yes, you're completely correct here.

> What you really, therefore, care about are the GFP_FOO cases.  And in
> fact those are all GFP_ATOMIC.  So make something that says what it
> is that you want, a GFP_ATOMIC with some socket specified bits |'d
> in.
> 
> Something like this:
> 
> static inline gfp_t sk_gfp_atomic(struct sock *sk)
> {
> 	return GFP_ATOMIC | (sk->sk_allocation & __GFP_MEMALLOC);
> }
> 

I went with this.

> You'll also have to make your networking patches conform to the
> networking subsystem coding style.
> 
> For example:
> 
> > -	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1, GFP_ATOMIC);
> > +	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15 + s_data_desired, 1,
> > +					sk_allocation(sk, GFP_ATOMIC));
> 
> The sk_allocation() argument has to line up with the first column
> after the openning parenthesis of the function call.  You can't just
> use all TAB characters.  And this all TABs thing looks extremely ugly
> to boot.
> 

I was not aware of the networking subsystem coding style. I'll fix it
up.

> > -		newnp->pktoptions = skb_clone(treq->pktopts, GFP_ATOMIC);
> > +		newnp->pktoptions = skb_clone(treq->pktopts,
> > +						sk_allocation(sk, GFP_ATOMIC));
> 
> Same here.
> 
> What's really funny to me is that in several cases elsewhere in this
> pach you get it right.

Whether I got it right or not would be effectively random. I tried
myself to see what pattern I was using thinking it would be "always"
tab but nope, no pattern :)

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* [PATCH net-next] fq_codel: Fair Queue Codel AQM
From: Eric Dumazet @ 2012-05-11 13:59 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Dave Taht, Kathleen Nichols, Van Jacobson, Tom Herbert,
	Matt Mathis, Yuchung Cheng, Stephen Hemminger

From: Eric Dumazet <edumazet@google.com>

Fair Queue Codel implementation.

Principles :

- Packets are classified (internal classifier or external) on flows.
- This is a Stochastic model (as we use a hash, several flows might
                              be hashed on same slot)
- Each flow has a CoDel managed queue.
- Flows are linked onto two (Round Robin) lists,
  so that new flows have priority on old ones.

- For a given flow, packets are not reordered (CoDel uses a FIFO)
- head drops only.
- ECN capability is on by default.
- Very low memory footprint (64 bytes per flow)

tc qdisc ... fq_codel [ limit PACKETS ] [ flows number ]
                      [ target TIME ] [ interval TIME ] [ noecn ]

defaults : 1024 flows, 10240 packets limit

Impressive results on load :

# tc -s -d class show dev eth9

class htb 1:1 root leaf 10: prio 0 quantum 1514 rate 200000Kbit ceil 200000Kbit burst 1475b/8 mpu 0b overhead 0b cburst 1475b/8 mpu 0b overhead 0b level 0 
 Sent 1267974946 bytes 837585 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 202298Kbit 16702pps backlog 0b 103p requeues 0 
 lended: 837482 borrowed: 0 giants: 0
 tokens: -912 ctokens: -912

class fq_codel 10:a7 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 18168b 12p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 7.0ms
class fq_codel 10:10b parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:146 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 13626b 9p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 5.2ms
class fq_codel 10:1c0 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 12112b 8p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 2.8ms
class fq_codel 10:2ba parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 13626b 9p requeues 0 
  deficit 926 count 1 lastcount 1 ldelay 5.2ms
class fq_codel 10:31d parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 0 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:32c parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 15140b 10p requeues 0 
  deficit 80 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:342 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 16654b 11p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms
class fq_codel 10:3ab parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 18168b 12p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 7.0ms
class fq_codel 10:3c2 parent 10: 
 (dropped 0, overlimits 0 requeues 0) 
 backlog 15140b 10p requeues 0 
  deficit 1514 count 1 lastcount 1 ldelay 6.4ms

# tc -s -d qdisc show dev eth9

qdisc htb 1: root refcnt 6 r2q 10 default 1 direct_packets_stat 0 ver 3.17
 Sent 1267878050 bytes 837521 pkt (dropped 0, overlimits 1666567 requeues 1) 
 rate 202305Kbit 16703pps backlog 0b 104p requeues 1 
qdisc fq_codel 10: parent 1:1 limit 10240p target 5.0ms interval 100.0ms ecn 
 Sent 1267878050 bytes 837521 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 202305Kbit 16703pps backlog 157456b 104p requeues 0 
  maxpacket 1514 drop_overlimit 0 new_flow_count 87 ecn_mark 4071
  new_flows_len 0 old_flows_len 10

# ping -c 10 172.30.42.18
PING 172.30.42.18 (172.30.42.18) 56(84) bytes of data.
64 bytes from 172.30.42.18: icmp_req=1 ttl=64 time=0.227 ms
64 bytes from 172.30.42.18: icmp_req=2 ttl=64 time=0.165 ms
64 bytes from 172.30.42.18: icmp_req=3 ttl=64 time=0.166 ms
64 bytes from 172.30.42.18: icmp_req=4 ttl=64 time=0.151 ms
64 bytes from 172.30.42.18: icmp_req=5 ttl=64 time=0.164 ms
64 bytes from 172.30.42.18: icmp_req=6 ttl=64 time=0.172 ms
64 bytes from 172.30.42.18: icmp_req=7 ttl=64 time=0.175 ms
64 bytes from 172.30.42.18: icmp_req=8 ttl=64 time=0.183 ms
64 bytes from 172.30.42.18: icmp_req=9 ttl=64 time=0.158 ms
64 bytes from 172.30.42.18: icmp_req=10 ttl=64 time=0.200 ms

--- 172.30.42.18 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 8999ms
rtt min/avg/max/mdev = 0.151/0.176/0.227/0.022 ms

Much better than SFQ because of priority given to new flows

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Kathleen Nichols <nichols@pollere.com>
Cc: Van Jacobson <van@pollere.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
---
 include/linux/pkt_sched.h |   45 ++
 net/sched/Kconfig         |   11 
 net/sched/Makefile        |    1 
 net/sched/sch_fq_codel.c  |  595 ++++++++++++++++++++++++++++++++++++
 4 files changed, 652 insertions(+)

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index cde56c2..3ffdaec 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -681,4 +681,49 @@ struct tc_codel_xstats {
 	__u32	dropping;  /* are we in dropping state ? */
 };
 
+/* FQ_CODEL */
+
+enum {
+	TCA_FQ_CODEL_UNSPEC,
+	TCA_FQ_CODEL_TARGET,
+	TCA_FQ_CODEL_LIMIT,
+	TCA_FQ_CODEL_INTERVAL,
+	TCA_FQ_CODEL_ECN,
+	TCA_FQ_CODEL_FLOWS,
+	__TCA_FQ_CODEL_MAX
+};
+
+#define TCA_FQ_CODEL_MAX	(__TCA_FQ_CODEL_MAX - 1)
+
+enum {
+	TCA_FQ_CODEL_XSTATS_QDISC,
+	TCA_FQ_CODEL_XSTATS_CLASS,
+};
+
+struct tc_fq_codel_xstats {
+	__u32	type;
+	union {
+		struct {
+			__u32	maxpacket; /* largest packet we've seen so far */
+			__u32	drop_overlimit; /* number of time max qdisc packet limit was hit */
+			__u32	ecn_mark;  /* number of packets we ECN marked
+					    * instead of dropped
+					    */
+			__u32	new_flow_count; /* number of time packets created a 'new flow' */
+			__u32	new_flows_len;	/* count of flows in new list */
+			__u32	old_flows_len;	/* count of flows in old list */ 
+		} qdisc_stats;
+		struct {
+			__s32	deficit;
+			__u32	ldelay; /* in-queue delay seen by most recently
+					 * dequeued packet
+					 */
+			__u32	count;
+			__u32	lastcount;
+			__u32	dropping;
+			__s32	drop_next;
+		} class_stats;
+	};
+};
+
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index fadd252..e7a8976 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -261,6 +261,17 @@ config NET_SCH_CODEL
 
 	  If unsure, say N.
 
+config NET_SCH_FQ_CODEL
+	tristate "Fair Queue Controlled Delay AQM (FQ_CODEL)"
+	help
+	  Say Y here if you want to use the FQ Controlled Delay (FQ_CODEL)
+	  packet scheduling algorithm.
+
+	  To compile this driver as a module, choose M here: the module
+	  will be called sch_fq_codel.
+
+	  If unsure, say N.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 30fab03..5940a19 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_NET_SCH_MQPRIO)	+= sch_mqprio.o
 obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_SCH_QFQ)	+= sch_qfq.o
 obj-$(CONFIG_NET_SCH_CODEL)	+= sch_codel.o
+obj-$(CONFIG_NET_SCH_FQ_CODEL)	+= sch_fq_codel.o
 
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
new file mode 100644
index 0000000..8675ff8
--- /dev/null
+++ b/net/sched/sch_fq_codel.c
@@ -0,0 +1,595 @@
+/*
+ * Fair Queue CoDel discipline
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ *
+ *  Copyright (C) 2012 Eric Dumazet <edumazet@google.com>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/jiffies.h>
+#include <linux/string.h>
+#include <linux/in.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/skbuff.h>
+#include <linux/jhash.h>
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <net/flow_keys.h>
+#include <net/codel.h>
+
+/*	Fair Queue CoDel.
+ *
+ * Principles :
+ * Packets are classified (internal classifier or external) on flows.
+ * This is a Stochastic model (as we use a hash, several flows
+ *			       might be hashed on same slot)
+ * Each flow has a CoDel managed queue.
+ * Flows are linked onto two (Round Robin) lists,
+ * so that new flows have priority on old ones.
+ *
+ * For a given flow, packets are not reordered (CoDel uses a FIFO)
+ * head drops only.
+ * ECN capability is on by default.
+ * Low memory footprint (64 bytes per flow)
+ */
+
+struct fq_codel_flow {
+	struct sk_buff	  *head;
+	struct sk_buff	  *tail;
+	struct list_head  flowchain;
+	int		  deficit;
+	struct codel_vars cvars;
+};
+
+struct fq_codel_sched_data {
+	struct tcf_proto *filter_list;	/* external classifier */
+	struct fq_codel_flow *flows;	/* Flows table [flows_cnt] */
+	u32		*backlogs;	/* backlog table [flows_cnt] */
+	u32		flows_cnt;	/* number of flows */
+	u32		perturbation;	/* hash perturbation */
+	u32		quantum;	/* psched_mtu(qdisc_dev(sch)); */
+	struct codel_params cparams;
+	struct codel_stats cstats;
+	u32		drop_overlimit;
+	u32		new_flow_count;
+
+	struct list_head new_flows;	/* list of new flows */
+	struct list_head old_flows;	/* list of old flows */
+};
+
+static unsigned int fq_codel_hash(const struct fq_codel_sched_data *q,
+				  const struct sk_buff *skb)
+{
+	struct flow_keys keys;
+	unsigned int hash;
+
+	skb_flow_dissect(skb, &keys);
+	hash = jhash_3words((__force u32)keys.dst,
+			    (__force u32)keys.src ^ keys.ip_proto,
+			    (__force u32)keys.ports, q->perturbation);
+	return ((u64)hash * q->flows_cnt) >> 32;
+}
+
+static unsigned int fq_codel_classify(struct sk_buff *skb, struct Qdisc *sch,
+				      int *qerr)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0 &&
+	    TC_H_MIN(skb->priority) <= q->flows_cnt)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return fq_codel_hash(q, skb) + 1;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		if (TC_H_MIN(res.classid) <= q->flows_cnt)
+			return TC_H_MIN(res.classid);
+	}
+	return 0;
+}
+
+/* helper functions : might be changed when/if skb use a standard list_head */
+
+/* remove one skb from head of slot queue */
+static inline struct sk_buff *dequeue_head(struct fq_codel_flow *flow)
+{
+	struct sk_buff *skb = flow->head;
+
+	flow->head = skb->next;
+	skb->next = NULL;
+	return skb;
+}
+
+/* add skb to flow queue (tail add) */
+static inline void flow_queue_add(struct fq_codel_flow *flow,
+				  struct sk_buff *skb)
+{
+	if (flow->head == NULL)
+		flow->head = skb;
+	else
+		flow->tail->next = skb;
+	flow->tail = skb;
+	skb->next = NULL;
+}
+
+static unsigned int fq_codel_drop(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	unsigned int maxbacklog = 0, idx = 0, i, len;
+	struct fq_codel_flow *flow;
+
+	/* Queue is full! Find the fat flow and drop packet from it.
+	 * This might sound expensive, but with 1024 flows, we scan
+	 * 4KB of memory, and we dont need to handle a complex tree
+	 * in fast path (packet queue/enqueue) with many cache misses.
+	 */
+	for (i = 0; i < q->flows_cnt; i++) {
+		if (q->backlogs[i] > maxbacklog) {
+			maxbacklog = q->backlogs[i];
+			idx = i;
+		}
+	}
+	flow = &q->flows[idx];
+	skb = dequeue_head(flow);
+	len = qdisc_pkt_len(skb);
+	q->backlogs[idx] -= len;
+	kfree_skb(skb);
+	sch->q.qlen--;
+	sch->qstats.drops++;
+	sch->qstats.backlog -= len;
+	return idx;
+}
+
+static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	unsigned int idx;
+	struct fq_codel_flow *flow;
+	int uninitialized_var(ret);
+
+	idx = fq_codel_classify(skb, sch, &ret);
+	if (idx == 0) {
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+	idx--;
+
+	codel_set_enqueue_time(skb);
+	flow = &q->flows[idx];
+	flow_queue_add(flow, skb);
+	q->backlogs[idx] += qdisc_pkt_len(skb);
+	sch->qstats.backlog += qdisc_pkt_len(skb);
+
+	if (list_empty(&flow->flowchain)) {
+		list_add_tail(&flow->flowchain, &q->new_flows);
+		codel_vars_init(&flow->cvars);
+		q->new_flow_count++;
+		flow->deficit = q->quantum;
+	}
+	if (++sch->q.qlen < sch->limit)
+		return NET_XMIT_SUCCESS;
+
+	q->drop_overlimit++;
+	/* Return Congestion Notification only if we dropped a packet
+	 * from this flow.
+	 */
+	if (fq_codel_drop(sch) == idx)
+		return NET_XMIT_CN;
+
+	/* As we dropped a packet, better let upper stack know this */
+	qdisc_tree_decrease_qlen(sch, 1);
+	return NET_XMIT_SUCCESS;
+}
+
+/* This is the specific function called from codel_dequeue()
+ * to dequeue a packet from queue. Note: backlog is handled in
+ * codel, we dont need to reduce it here.
+ */
+static struct sk_buff *dequeue(struct codel_vars *vars, struct Qdisc *sch)
+{
+	struct fq_codel_flow *flow;
+	struct sk_buff *skb = NULL;
+
+	flow = container_of(vars, struct fq_codel_flow, cvars);
+	if (flow->head) {
+		skb = dequeue_head(flow);
+		sch->qstats.backlog -= qdisc_pkt_len(skb);
+		sch->q.qlen--;
+	}
+	return skb;
+}
+
+static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+	struct fq_codel_flow *flow;
+
+begin:
+	if (!list_empty(&q->new_flows))
+		flow = list_first_entry(&q->new_flows,
+					struct fq_codel_flow,
+					flowchain);
+	else if (!list_empty(&q->old_flows))
+		flow = list_first_entry(&q->old_flows,
+					struct fq_codel_flow,
+					flowchain);
+	else
+		return NULL;
+
+	if (flow->deficit <= 0) {
+		flow->deficit += q->quantum;
+		list_move_tail(&flow->flowchain, &q->old_flows);
+		goto begin;
+	}
+	skb = codel_dequeue(sch, &q->cparams, &flow->cvars, &q->cstats,
+			    dequeue, &q->backlogs[flow - q->flows]);
+	if (!skb) {
+		list_del_init(&flow->flowchain);
+		goto begin;
+	}
+	qdisc_bstats_update(sch, skb);
+	flow->deficit -= qdisc_pkt_len(skb);
+	return skb;
+}
+
+static void fq_codel_reset(struct Qdisc *sch)
+{
+	struct sk_buff *skb;
+
+	while ((skb = fq_codel_dequeue(sch)) != NULL)
+		kfree_skb(skb);
+}
+
+static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
+	[TCA_FQ_CODEL_TARGET]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_LIMIT]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_INTERVAL]	= { .type = NLA_U32 },
+	[TCA_FQ_CODEL_ECN]	= { .type = NLA_U32 },
+};
+
+static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_FQ_CODEL_MAX + 1];
+	int err;
+
+	if (!opt)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_FQ_CODEL_MAX, opt, fq_codel_policy);
+	if (err < 0)
+		return err;
+	if (tb[TCA_FQ_CODEL_FLOWS]) {
+		if (q->flows)
+			return -EINVAL;
+		q->flows_cnt = nla_get_u32(tb[TCA_FQ_CODEL_FLOWS]);
+		if (!q->flows_cnt ||
+		    q->flows_cnt > 65536)
+			return -EINVAL;
+	}
+	sch_tree_lock(sch);
+
+	if (tb[TCA_FQ_CODEL_TARGET]) {
+		u64 target = nla_get_u32(tb[TCA_FQ_CODEL_TARGET]);
+
+		q->cparams.target = (target * NSEC_PER_USEC) >> CODEL_SHIFT;
+	}
+
+	if (tb[TCA_FQ_CODEL_INTERVAL]) {
+		u64 interval = nla_get_u32(tb[TCA_FQ_CODEL_INTERVAL]);
+
+		q->cparams.interval = (interval * NSEC_PER_USEC) >> CODEL_SHIFT;
+	}
+
+	if (tb[TCA_FQ_CODEL_LIMIT])
+		sch->limit = nla_get_u32(tb[TCA_FQ_CODEL_LIMIT]);
+
+	if (tb[TCA_FQ_CODEL_ECN])
+		q->cparams.ecn = !!nla_get_u32(tb[TCA_FQ_CODEL_ECN]);
+
+	while (sch->q.qlen > sch->limit) {
+		struct sk_buff *skb = fq_codel_dequeue(sch);
+
+		kfree_skb(skb);
+		q->cstats.drop_count++;
+	}
+	qdisc_tree_decrease_qlen(sch, q->cstats.drop_count);
+	q->cstats.drop_count = 0;
+
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static void *fq_codel_zalloc(size_t sz)
+{
+	void *ptr = kzalloc(sz, GFP_KERNEL | __GFP_NOWARN);
+
+	if (!ptr)
+		ptr = vzalloc(sz);
+	return ptr;
+}
+
+static void fq_codel_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static void fq_codel_destroy(struct Qdisc *sch)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	fq_codel_free(q->backlogs);
+	fq_codel_free(q->flows);
+}
+
+static int fq_codel_init(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	int i;
+
+	sch->limit = 10*1024;
+	q->flows_cnt = 1024;
+	q->quantum = psched_mtu(qdisc_dev(sch));
+	q->perturbation = net_random();
+	INIT_LIST_HEAD(&q->new_flows);
+	INIT_LIST_HEAD(&q->old_flows);
+	codel_params_init(&q->cparams);
+	codel_stats_init(&q->cstats);
+	q->cparams.ecn = true;
+
+	if (opt) {
+		int err = fq_codel_change(sch, opt);
+		if (err)
+			return err;
+	}
+
+	if (!q->flows) {
+		q->flows = fq_codel_zalloc(q->flows_cnt *
+					   sizeof(struct fq_codel_flow));
+		if (!q->flows)
+			return -ENOMEM;
+		q->backlogs = fq_codel_zalloc(q->flows_cnt * sizeof(u32));
+		if (!q->backlogs) {
+			fq_codel_free(q->flows);
+			return -ENOMEM;
+		}
+		for (i = 0; i < q->flows_cnt; i++) {
+			struct fq_codel_flow *flow = q->flows + i;
+
+			INIT_LIST_HEAD(&flow->flowchain);
+		}
+	}
+	if (sch->limit >= 1)
+		sch->flags |= TCQ_F_CAN_BYPASS;
+	else
+		sch->flags &= ~TCQ_F_CAN_BYPASS;
+	return 0;
+}
+
+static int fq_codel_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts;
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	if (nla_put_u32(skb, TCA_FQ_CODEL_TARGET,
+			codel_time_to_us(q->cparams.target)) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_LIMIT,
+			sch->limit) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_INTERVAL,
+			codel_time_to_us(q->cparams.interval)) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_ECN,
+			q->cparams.ecn) ||
+	    nla_put_u32(skb, TCA_FQ_CODEL_FLOWS,
+			q->flows_cnt))
+		goto nla_put_failure;
+
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -1;
+}
+
+static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	struct tc_fq_codel_xstats st = {
+		.type				= TCA_FQ_CODEL_XSTATS_QDISC,
+		.qdisc_stats.maxpacket		= q->cstats.maxpacket,
+		.qdisc_stats.drop_overlimit	= q->drop_overlimit,
+		.qdisc_stats.ecn_mark		= q->cstats.ecn_mark,
+		.qdisc_stats.new_flow_count	= q->new_flow_count,
+	};
+	struct list_head *pos;
+
+	list_for_each(pos, &q->new_flows)
+		st.qdisc_stats.new_flows_len++;
+
+	list_for_each(pos, &q->old_flows)
+		st.qdisc_stats.old_flows_len++;
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static struct Qdisc *fq_codel_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long fq_codel_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static unsigned long fq_codel_bind(struct Qdisc *sch, unsigned long parent,
+			      u32 classid)
+{
+	/* we cannot bypass queue discipline anymore */
+	sch->flags &= ~TCQ_F_CAN_BYPASS;
+	return 0;
+}
+
+static void fq_codel_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static struct tcf_proto **fq_codel_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int fq_codel_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static int fq_codel_dump_class_stats(struct Qdisc *sch, unsigned long cl,
+				     struct gnet_dump *d)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	u32 idx = cl - 1;
+	struct gnet_stats_queue qs = { 0 };
+	struct tc_fq_codel_xstats xstats;
+
+	if (idx < q->flows_cnt) {
+		const struct fq_codel_flow *flow = &q->flows[idx];
+		const struct sk_buff *skb = flow->head;
+
+		memset(&xstats, 0, sizeof(xstats));
+		xstats.type = TCA_FQ_CODEL_XSTATS_CLASS;
+		xstats.class_stats.deficit = flow->deficit;
+		xstats.class_stats.ldelay =
+			codel_time_to_us(flow->cvars.ldelay);
+		xstats.class_stats.count = flow->cvars.count;
+		xstats.class_stats.lastcount = flow->cvars.lastcount;
+		xstats.class_stats.dropping = flow->cvars.dropping;
+		if (flow->cvars.dropping) {
+			codel_tdiff_t delta = flow->cvars.drop_next -
+					      codel_get_time();
+
+			xstats.class_stats.drop_next = (delta >= 0) ?
+				codel_time_to_us(delta) :
+				-codel_time_to_us(-delta);
+		}
+		while (skb) {
+			qs.qlen++;
+			skb = skb->next;
+		}
+		qs.backlog = q->backlogs[idx];
+	}
+	if (gnet_stats_copy_queue(d, &qs) < 0)
+		return -1;
+	if (idx < q->flows_cnt)
+		return gnet_stats_copy_app(d, &xstats, sizeof(xstats));
+	return 0;
+}
+
+static void fq_codel_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	struct fq_codel_sched_data *q = qdisc_priv(sch);
+	unsigned int i;
+
+	if (arg->stop)
+		return;
+
+	for (i = 0; i < q->flows_cnt; i++) {
+		if (list_empty(&q->flows[i].flowchain) ||
+		    arg->count < arg->skip) {
+			arg->count++;
+			continue;
+		}
+		if (arg->fn(sch, i + 1, arg) < 0) {
+			arg->stop = 1;
+			break;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops fq_codel_class_ops = {
+	.leaf		=	fq_codel_leaf,
+	.get		=	fq_codel_get,
+	.put		=	fq_codel_put,
+	.tcf_chain	=	fq_codel_find_tcf,
+	.bind_tcf	=	fq_codel_bind,
+	.unbind_tcf	=	fq_codel_put,
+	.dump		=	fq_codel_dump_class,
+	.dump_stats	=	fq_codel_dump_class_stats,
+	.walk		=	fq_codel_walk,
+};
+
+static struct Qdisc_ops fq_codel_qdisc_ops __read_mostly = {
+	.cl_ops		=	&fq_codel_class_ops,
+	.id		=	"fq_codel",
+	.priv_size	=	sizeof(struct fq_codel_sched_data),
+	.enqueue	=	fq_codel_enqueue,
+	.dequeue	=	fq_codel_dequeue,
+	.peek		=	qdisc_peek_dequeued,
+	.drop		=	fq_codel_drop,
+	.init		=	fq_codel_init,
+	.reset		=	fq_codel_reset,
+	.destroy	=	fq_codel_destroy,
+	.change		=	NULL,
+	.dump		=	fq_codel_dump,
+	.dump_stats =	fq_codel_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init fq_codel_module_init(void)
+{
+	return register_qdisc(&fq_codel_qdisc_ops);
+}
+
+static void __exit fq_codel_module_exit(void)
+{
+	unregister_qdisc(&fq_codel_qdisc_ops);
+}
+
+module_init(fq_codel_module_init)
+module_exit(fq_codel_module_exit)
+MODULE_AUTHOR("Eric Dumazet");
+MODULE_LICENSE("GPL");

^ permalink raw reply related

* [PATCH v4] bonding: don't increase rx_dropped after processing LACPDUs
From: Jiri Bohac @ 2012-05-11 13:59 UTC (permalink / raw)
  To: David Miller; +Cc: andy, netdev, fubar

Since commit 3aba891d, bonding processes LACP frames (802.3ad
mode) with bond_handle_frame(). Currently a copy of the skb is
made and the original is left to be processed by other
rx_handlers and the rest of the network stack by returning
RX_HANDLER_ANOTHER.  As there is no protocol handler for
PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
increased.

Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
if bonding has processed the LACP frame.

Signed-off-by: Jiri Bohac <jbohac@suse.cz>

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2173,9 +2173,10 @@ re_arm:
  * received frames (loopback). Since only the payload is given to this
  * function, it check for loopback.
  */
-static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u16 length)
+static int bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u16 length)
 {
 	struct port *port;
+	int ret = RX_HANDLER_ANOTHER;
 
 	if (length >= sizeof(struct lacpdu)) {
 
@@ -2184,11 +2185,12 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 		if (!port->slave) {
 			pr_warning("%s: Warning: port of slave %s is uninitialized\n",
 				   slave->dev->name, slave->dev->master->name);
-			return;
+			return ret;
 		}
 
 		switch (lacpdu->subtype) {
 		case AD_TYPE_LACPDU:
+			ret = RX_HANDLER_CONSUMED;
 			pr_debug("Received LACPDU on port %d\n",
 				 port->actor_port_number);
 			/* Protect against concurrent state machines */
@@ -2198,6 +2200,7 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 			break;
 
 		case AD_TYPE_MARKER:
+			ret = RX_HANDLER_CONSUMED;
 			// No need to convert fields to Little Endian since we don't use the marker's fields.
 
 			switch (((struct bond_marker *)lacpdu)->tlv_type) {
@@ -2219,6 +2222,7 @@ static void bond_3ad_rx_indication(struct lacpdu *lacpdu, struct slave *slave, u
 			}
 		}
 	}
+	return ret;
 }
 
 /**
@@ -2456,18 +2460,20 @@ out:
 	return NETDEV_TX_OK;
 }
 
-void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave)
 {
+	int ret = RX_HANDLER_ANOTHER;
 	if (skb->protocol != PKT_TYPE_LACPDU)
-		return;
+		return ret;
 
 	if (!pskb_may_pull(skb, sizeof(struct lacpdu)))
-		return;
+		return ret;
 
 	read_lock(&bond->lock);
-	bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
+	ret = bond_3ad_rx_indication((struct lacpdu *) skb->data, slave, skb->len);
 	read_unlock(&bond->lock);
+	return ret;
 }
 
 /*
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 235b2cc..5ee7e3c 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -274,7 +274,7 @@ void bond_3ad_adapter_duplex_changed(struct slave *slave);
 void bond_3ad_handle_link_change(struct slave *slave, char link);
 int  bond_3ad_get_active_agg_info(struct bonding *bond, struct ad_info *ad_info);
 int bond_3ad_xmit_xor(struct sk_buff *skb, struct net_device *dev);
-void bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
+int bond_3ad_lacpdu_recv(struct sk_buff *skb, struct bonding *bond,
 			  struct slave *slave);
 int bond_3ad_set_carrier(struct bonding *bond);
 void bond_3ad_update_lacp_rate(struct bonding *bond);
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 9abfde4..2e1f806 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -342,26 +342,26 @@ static void rlb_update_entry_from_arp(struct bonding *bond, struct arp_pkt *arp)
 	_unlock_rx_hashtbl_bh(bond);
 }
 
-static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
+static int rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
 			 struct slave *slave)
 {
 	struct arp_pkt *arp;
 
 	if (skb->protocol != cpu_to_be16(ETH_P_ARP))
-		return;
+		goto out;
 
 	arp = (struct arp_pkt *) skb->data;
 	if (!arp) {
 		pr_debug("Packet has no ARP data\n");
-		return;
+		goto out;
 	}
 
 	if (!pskb_may_pull(skb, arp_hdr_len(bond->dev)))
-		return;
+		goto out;
 
 	if (skb->len < sizeof(struct arp_pkt)) {
 		pr_debug("Packet is too small to be an ARP\n");
-		return;
+		goto out;
 	}
 
 	if (arp->op_code == htons(ARPOP_REPLY)) {
@@ -369,6 +369,8 @@ static void rlb_arp_recv(struct sk_buff *skb, struct bonding *bond,
 		rlb_update_entry_from_arp(bond, arp);
 		pr_debug("Server received an ARP Reply from client\n");
 	}
+out:
+	return RX_HANDLER_ANOTHER;
 }
 
 /* Caller must hold bond lock for read */
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 62d2409..bc13b3d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1444,8 +1444,9 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 	struct sk_buff *skb = *pskb;
 	struct slave *slave;
 	struct bonding *bond;
-	void (*recv_probe)(struct sk_buff *, struct bonding *,
+	int (*recv_probe)(struct sk_buff *, struct bonding *,
 				struct slave *);
+	int ret = RX_HANDLER_ANOTHER;
 
 	skb = skb_share_check(skb, GFP_ATOMIC);
 	if (unlikely(!skb))
@@ -1464,8 +1465,12 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
 
 		if (likely(nskb)) {
-			recv_probe(nskb, bond, slave);
+			ret = recv_probe(nskb, bond, slave);
 			dev_kfree_skb(nskb);
+			if (ret == RX_HANDLER_CONSUMED) {
+				consume_skb(skb);
+				return ret;
+			}
 		}
 	}
 
@@ -1487,7 +1492,7 @@ static rx_handler_result_t bond_handle_frame(struct sk_buff **pskb)
 		memcpy(eth_hdr(skb)->h_dest, bond->dev->dev_addr, ETH_ALEN);
 	}
 
-	return RX_HANDLER_ANOTHER;
+	return ret;
 }
 
 /* enslave device <slave> to bond device <master> */
@@ -2723,7 +2728,7 @@ static void bond_validate_arp(struct bonding *bond, struct slave *slave, __be32
 	}
 }
 
-static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
+static int bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 			 struct slave *slave)
 {
 	struct arphdr *arp;
@@ -2731,7 +2736,7 @@ static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 	__be32 sip, tip;
 
 	if (skb->protocol != __cpu_to_be16(ETH_P_ARP))
-		return;
+		return RX_HANDLER_ANOTHER;
 
 	read_lock(&bond->lock);
 
@@ -2776,6 +2781,7 @@ static void bond_arp_rcv(struct sk_buff *skb, struct bonding *bond,
 
 out_unlock:
 	read_unlock(&bond->lock);
+	return RX_HANDLER_ANOTHER;
 }
 
 /*
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 9f2bae66..4581aa5 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -218,7 +218,7 @@ struct bonding {
 	struct   slave *primary_slave;
 	bool     force_primary;
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
-	void     (*recv_probe)(struct sk_buff *, struct bonding *,
+	int     (*recv_probe)(struct sk_buff *, struct bonding *,
 			       struct slave *);
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;


-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

^ permalink raw reply related

* Re: [PATCH v3][resend] bonding: don't increase rx_dropped after processing LACPDUs
From: Jiri Bohac @ 2012-05-11 13:54 UTC (permalink / raw)
  To: David Miller; +Cc: jbohac, andy, netdev, fubar
In-Reply-To: <20120510.233134.1390996845692589691.davem@davemloft.net>

On Thu, May 10, 2012 at 11:31:34PM -0400, David Miller wrote:
> > Since commit 3aba891d, bonding processes LACP frames (802.3ad
> > mode) with bond_handle_frame(). Currently a copy of the skb is
> > made and the original is left to be processed by other
> > rx_handlers and the rest of the network stack by returning
> > RX_HANDLER_ANOTHER.  As there is no protocol handler for
> > PKT_TYPE_LACPDU, the frame is dropped and dev->rx_dropped
> > increased.
> > 
> > Fix this by making bond_handle_frame() return RX_HANDLER_CONSUMED
> > if bonding has processed the LACP frame.
> > 
> > Signed-off-by: Jiri Bohac <jbohac@suse.cz>
> > 
> 
> Don't send me garbage you didn't even check the compile of:
> 
> drivers/net/bonding/bond_main.c: In function ‘bond_handle_frame’:
> drivers/net/bonding/bond_main.c:1463:13: warning: assignment from incompatible pointer type [enabled by default]
> drivers/net/bonding/bond_main.c: In function ‘bond_open’:
> drivers/net/bonding/bond_main.c:3441:21: warning: assignment from incompatible pointer type [enabled by default]
> drivers/net/bonding/bond_main.c:3448:20: warning: assignment from incompatible pointer type [enabled by default]

sorry, I overlooked these warnings. The patch actually broke the
rlb mode which I did not test. 

I'll send a fixed patch (v4).

Thanks!

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox