Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: STMMAC driver: NFS Problem on 2.6.37
From: Armando Visconti @ 2011-01-13 18:28 UTC (permalink / raw)
  To: Chuck Lever, Deepak SIKRI
  Cc: Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Shiraz HASHIM,
	Viresh KUMAR, Giuseppe CAVALLARO
In-Reply-To: <2D04CF75-CA68-4BDC-99A3-FA1DD6113602-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Chuck Lever wrote:
> On Jan 13, 2011, at 4:09 AM, deepaksi wrote:
>
>   
>> Hi
>>
>> I am facing a problem related to nfs boot, while using the stmmac driver
>> ported on 2.6.37 kernel. When we use a JFFS2 file system and mount the kernel,
>> the network driver works fine.
>>
>> I have been following the mailing list and could find some issues with NFS 
>> on 2.6.37 but I am not too sure whether the kernel crash I am getting is 
>> related to that.
>>
>> The driver worked fine on 2.6.32 kernel, but while booting the 2.6.37
>> kernel I get the following log messages:
>>
>> stmmac: Rx Checksum Offload Engine supported
>>        TX Checksum insertion supported
>> IP-Config: Complete:
>>     device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
>>     host=192.168.1.10, domain=, nis-domain=(none),
>>     bootserver=192.168.1.1, rootserver=192.168.1.1, rootpath=
>>     
>
> Why is rootpath left undefined?
>   

Yes, Chuck.
Good catch.

Deepak,
Can you possibly verify  your bootargs?

I  see exactly your same problem with kernel 2.6.32 (rc6.3) on my
board, where the bootargs is defined like this:

bootargs=console=ttyAMA0,115200 mem=128M root=/dev/nfs 
ip=192.168.1.10:192.168.1
.1:192.168.1.1:255.255.255.0 nfsroot=192.168.1.1:/home/guest/armv7/target

In fact, rootpath is undefined also in my case...

But if I get the network info from my DHCP server the system is booting 
correctly.
(i.e. console=ttyAMA0,115200 mem=128M root=/dev/nfs ip=dhcp)

So, why do we have rootpath undefined in our bootargs?
I guess we screwed up something someway...

Let's see it tomorrow.

Ciao,
Arm



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.7)
From: Eric Dumazet @ 2011-01-13 18:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110113092706.154748c2@s6510>

Le jeudi 13 janvier 2011 à 09:27 -0800, Stephen Hemminger a écrit :
> This implements the CHOKe packet scheduler based on the existing
> Linux RED scheduler based on the algorithm described in the paper.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with proability p (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> Patch versions
> 0.3 (Eric) uses table for queue.
> 0.4 allows classification with TC filters
>     fixes crash when peek_random() finds a hole
> 0.5 (Eric) that fixes qlen with holes and peek
> 0.7 change to use separate params / stats than RED
>     account for drops in backlog
> 
> Almost ready, still need to make sure API (netlink) is right
> 
> 
>  net/sched/Kconfig     |   11 +
>  net/sched/Makefile    |    1 
>  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 548 insertions(+)
> 
> --- a/net/sched/Kconfig	2011-01-12 17:44:05.747500044 -0800
> +++ b/net/sched/Kconfig	2011-01-12 17:44:53.167735188 -0800
> @@ -205,6 +205,17 @@ config NET_SCH_DRR
>  
>  	  If unsure, say N.
>  
> +config NET_SCH_CHOKE
> +	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
> +	help
> +	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
> +	  and Keep for responsive flows, CHOose and Kill for unresponsive
> +	  flows). This is a variation of RED which trys to penalize flows
> +	  that monopolize the queue.
> +
> +	  To compile this code as a module, choose M here: the
> +	  module will be called sch_choke.
> +
>  config NET_SCH_INGRESS
>  	tristate "Ingress Qdisc"
>  	depends on NET_CLS_ACT
> --- a/net/sched/Makefile	2011-01-12 17:44:05.767500135 -0800
> +++ b/net/sched/Makefile	2011-01-12 17:44:53.167735188 -0800
> @@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
>  obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
>  obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
>  obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
> +obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
>  obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
>  obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
>  obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ b/net/sched/sch_choke.c	2011-01-12 17:45:07.227806180 -0800
> @@ -0,0 +1,556 @@
> +/*
> + * net/sched/sch_choke.c	CHOKE scheduler
> + *
> + * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
> + * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/skbuff.h>
> +#include <linux/reciprocal_div.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +#include <net/red.h>
> +
> +/*	CHOKe stateless AQM for fair bandwidth allocation
> +        =================================================
> +
> +   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
> +   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
> +   maintains no flow state. The difference from RED is an additional step
> +   during the enqueuing process. If average queue size is over the
> +   low threshold (qmin), a packet is chosen at random from the queue.
> +   If both the new and chosen packet are from the same flow, both
> +   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
> +   needs to access packets in queue randomly. It has a minimal class
> +   interface to allow overriding the builtin flow classifier with
> +   filters.
> +
> +   Source:
> +   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
> +   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
> +   IEEE INFOCOM, 2000.
> +
> +   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
> +   Characteristics", IEEE/ACM Transactions on Networking, 2004
> +
> + */
> +
> +/* Upper bound on size of sk_buff table */
> +#define CHOKE_MAX_QUEUE	(128*1024 - 1)
> +
> +struct choke_sched_data {
> +/* Parameters */
> +	u32		 limit;
> +	unsigned char	 flags;
> +
> +	struct red_parms parms;
> +
> +/* Variables */
> +	struct tcf_proto *filter_list;
> +	struct {
> +		u32	prob_drop;	/* Early probability drops */
> +		u32	prob_mark;	/* Early probability marks */
> +		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
> +		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
> +		u32	pdrop;          /* Drops due to queue limits */
> +		u32	other;          /* Drops due to drop() calls */
> +		u32	matched;	/* Drops to flow match */
> +	} stats;
> +
> +	unsigned int	 head;
> +	unsigned int	 tail;
> +	unsigned int	 holes;
> +	unsigned int	 tab_mask; /* size - 1 */
> +
> +	struct sk_buff **tab;
> +};
> +
> +static inline unsigned int choke_len(const struct choke_sched_data *q)
> +{
> +	return (q->tail - q->head) & q->tab_mask;
> +}
> +
> +/* deliver a random number between 0 and N - 1 */
> +static inline u32 random_N(unsigned int N)
> +{
> +	return reciprocal_divide(random32(), N);
> +}
> +
> +/* Select a packet at random from the queue in O(1) and handle holes */
> +static struct sk_buff *choke_peek_random(struct choke_sched_data *q,
> +					 unsigned int *pidx)
> +{
> +	struct sk_buff *skb;
> +	int retrys = 3;
> +
> +	do {
> +		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
> +		skb = q->tab[*pidx];
> +		if (skb)
> +			return skb;
> +	} while (--retrys > 0);
> +
> +	/* queue is has lots of holes use the head which is known to exist */
> +	return q->tab[*pidx = q->head];
> +}
> +
> +/* Is ECN parameter configured */
> +static inline int use_ecn(const struct choke_sched_data *q)
> +{
> +	return q->flags & TC_RED_ECN;
> +}
> +
> +/* Should packets over max just be dropped (versus marked) */
> +static inline int use_harddrop(const struct choke_sched_data *q)
> +{
> +	return q->flags & TC_RED_HARDDROP;
> +}
> +
> +/* Move head pointer forward to skip over holes */
> +static void choke_zap_head_holes(struct choke_sched_data *q)
> +{
> +	while (q->holes && q->tab[q->head] == NULL) {
> +		q->head = (q->head + 1) & q->tab_mask;
> +		q->holes--;
> +	}
> +}
> +
> +/* Move tail pointer backwards to reuse holes */
> +static void choke_zap_tail_holes(struct choke_sched_data *q)
> +{
> +	while (q->holes && q->tab[q->tail - 1] == NULL) {
> +		q->tail = (q->tail - 1) & q->tab_mask;
> +		q->holes--;
> +	}
> +}
> +
> +/* Drop packet from queue array by creating a "hole" */
> +static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
> +{
> +	q->tab[idx] = NULL;
> +	q->holes++;
> +
> +	if (idx == q->head)
> +		choke_zap_head_holes(q);
> +	if (idx == q->tail)
> +		choke_zap_tail_holes(q);
> +}
> +
> +/* Classify flow using either:
> +   1. pre-existing classification result in skb
> +   2. fast internal classification
> +   3. use TC filter based classification
> +*/
> +static inline unsigned int choke_classify(struct sk_buff *skb,
> +					  struct Qdisc *sch, int *qerr)
> +
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct tcf_result res;
> +	int result;
> +
> +	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> +
> +	if (TC_H_MAJ(skb->priority) == sch->handle &&
> +	    TC_H_MIN(skb->priority) > 0)
> +		return TC_H_MIN(skb->priority);
> +
> +	if (!q->filter_list)
> +		return skb_get_rxhash(skb);
> +
> +	result = tc_classify(skb, q->filter_list, &res);
> +	if (result >= 0) {
> +#ifdef CONFIG_NET_CLS_ACT
> +		switch (result) {
> +		case TC_ACT_STOLEN:
> +		case TC_ACT_QUEUED:
> +			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
> +		case TC_ACT_SHOT:
> +			return 0;
> +		}
> +#endif
> +		return TC_H_MIN(res.classid);
> +	}
> +
> +	return 0;
> +}
> +
> +static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct red_parms *p = &q->parms;
> +	unsigned int hash;
> +	int uninitialized_var(ret);
> +
> +	hash = choke_classify(skb, sch, &ret);
> +	if (!hash) {
> +		/* Packet was eaten by filter */
> +		if (ret & __NET_XMIT_BYPASS)
> +			sch->qstats.drops++;
> +		kfree_skb(skb);
> +		return ret;
> +	}
> +
> +	/* Maybe add hash as field in struct qdisc_skb_cb? */
> +	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
> +
> +	/* Compute average queue usage (see RED) */
> +	p->qavg = red_calc_qavg(p, choke_len(q) - q->holes);
> +	if (red_is_idling(p))
> +		red_end_of_idle_period(p);
> +
> +	/* Is queue small? */
> +	if (p->qavg <= p->qth_min)
> +		p->qcount = -1;
> +	else {
> +		struct sk_buff *oskb;
> +		unsigned int idx;
> +
> +		/* Draw a packet at random from queue */
> +		oskb = choke_peek_random(q, &idx);
> +
> +		/* Both packets from same flow ? */
> +		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
> +			/* Drop both packets */
> +			q->stats.matched++;
> +			choke_drop_by_idx(q, idx);
> +			qdisc_drop(oskb, sch);

I feel we should add : sch->q.qlen--;

> +			goto congestion_drop;
> +		}
> +
> +		/* Queue is large, always mark/drop */
> +		if (p->qavg > p->qth_max) {
> +			p->qcount = -1;
> +
> +			sch->qstats.overlimits++;
> +			if (use_harddrop(q) || !use_ecn(q) ||
> +			    !INET_ECN_set_ce(skb)) {
> +				q->stats.forced_drop++;
> +				goto congestion_drop;
> +			}
> +
> +			q->stats.forced_mark++;
> +		} else if (++p->qcount) {
> +			if (red_mark_probability(p, p->qavg)) {
> +				p->qcount = 0;
> +				p->qR = red_random(p);
> +
> +				sch->qstats.overlimits++;
> +				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
> +					q->stats.prob_drop++;
> +					goto congestion_drop;
> +				}
> +
> +				q->stats.prob_mark++;
> +			}
> +		} else
> +			p->qR = red_random(p);
> +	}
> +
> +	/* Admit new packet */
> +	if (likely(choke_len(q) < q->limit)) {
> +
> +		q->tab[q->tail] = skb;
> +		q->tail = (q->tail + 1) & q->tab_mask;
> +
> +		sch->qstats.backlog += qdisc_pkt_len(skb);
> +		qdisc_update_bstats(sch, skb);
> +		sch->q.qlen = choke_len(q) - q->holes;
	or : sch->q.qlen++;

(If sch->q.qlen is up2date in respect of above comment)

> +		return NET_XMIT_SUCCESS;
> +	}
> +
> +	q->stats.pdrop++;
> +	sch->qstats.drops++;
> +	kfree_skb(skb);
> +	return NET_XMIT_DROP;
> +
> + congestion_drop:
> +	qdisc_drop(skb, sch);
> +	return NET_XMIT_CN;
> +}
> +
> +static struct sk_buff *choke_dequeue(struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct sk_buff *skb;
> +
> +	if (q->head == q->tail) {
> +		if (!red_is_idling(&q->parms))
> +			red_start_of_idle_period(&q->parms);
> +		return NULL;
> +	}
> +	skb = q->tab[q->head];
> +	q->tab[q->head] = NULL; /* not really needed */
> +	q->head = (q->head + 1) & q->tab_mask;
> +	choke_zap_head_holes(q);
> +	sch->qstats.backlog -= qdisc_pkt_len(skb);
> +	sch->q.qlen = choke_len(q) - q->holes;

	sch->q.qlen--;

> +
> +	return skb;
> +}
> +
> +static unsigned int choke_drop(struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	unsigned int len;
> +
> +	len = qdisc_queue_drop(sch);
> +	if (len > 0)
> +		q->stats.other++;
> +	else {
> +		if (!red_is_idling(&q->parms))
> +			red_start_of_idle_period(&q->parms);
> +	}
> +
> +	return len;
> +}
> +
> +static void choke_reset(struct Qdisc* sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +
> +	red_restart(&q->parms);
> +}
> +
> +static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
> +	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
> +	[TCA_CHOKE_STAB]	= { .len = 256 },

RED_STAB_SIZE ?


Thanks !



^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.7)
From: Eric Dumazet @ 2011-01-13 18:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <1294941621.3570.419.camel@edumazet-laptop>

Le jeudi 13 janvier 2011 à 19:00 +0100, Eric Dumazet a écrit :
> Le jeudi 13 janvier 2011 à 09:27 -0800, Stephen Hemminger a écrit :
> > This implements the CHOKe packet scheduler based on the existing
> > Linux RED scheduler based on the algorithm described in the paper.
> > 
> >

Sorry for the long reply, I hit 'Send' button in the wrong window,
before removing hunks.




^ permalink raw reply

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: Eric Dumazet @ 2011-01-13 18:00 UTC (permalink / raw)
  To: William Allen Simpson
  Cc: Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <4D2F3723.9040405@gmail.com>

Le jeudi 13 janvier 2011 à 12:32 -0500, William Allen Simpson a écrit :
> On 1/12/11 1:56 PM, Stephen Hemminger wrote:
> > On Wed, 12 Jan 2011 12:59:38 -0500
> > William Allen Simpson<william.allen.simpson@gmail.com>  wrote:
> >
> >> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> >> index e64f4c6..c8f4017 100644
> >> --- a/include/linux/tcp.h
> >> +++ b/include/linux/tcp.h
> >> @@ -185,22 +185,37 @@ struct tcp_md5sig {
> >>   #define TCP_COOKIE_PAIR_SIZE	(2*TCP_COOKIE_MAX)
> >>
> >>   /* Flags for both getsockopt and setsockopt */
> >> -#define TCP_COOKIE_IN_ALWAYS	(1<<  0)	/* Discard SYN without cookie */
> >> -#define TCP_COOKIE_OUT_NEVER	(1<<  1)	/* Prohibit outgoing cookies,
> >> +#define TCPCT_IN_ALWAYS		(1<<  0)	/* Discard SYN without cookie */
> >> +#define TCPCT_OUT_NEVER		(1<<  1)	/* Prohibit outgoing cookies,
> >
> > You end up changing values in kernel userspace API in a way
> > that is incompatible with older applications. This is not acceptable.
> >
> While I agree in principle and argued strongly against it, other
> members of the research group (particularly the original project
> sponsor) have over-ridden my concerns.  I'm sorry to inform you that
> many/most participants don't care much about Linux.
> 

How leaving TCP_COOKIE_IN_ALWAYS and TCP_COOKIE_OUT_NEVER definitions so
that user space programs compiles can be a problem to "research group" ?

AFAIK, TCPCT_IN_ALWAYS / TCPCT_OUT_NEVER are not mentioned in
http://www.rfc-editor.org/authors/rfc6013.txt

But TCP_COOKIE_IN_ALWAYS and TCP_COOKIE_OUT_NEVER are ...

Isnt it a bit confusing ?

> Note that the *bits* are the same, and previously compiled programs
> (that don't access more advanced features) should continue to run as
> they have in the past.
> 
> Even though I'm not paid to work on Linux, I'm doing my best to give you
> folks a quick heads up and provide code to rectify the very recent changes
> that can be propagated back through the stable tree (to 2.6.33).
> 
> As always, what you actually do with my code is up to you....

Maybe its too early, and we should wait for an official RFC, especially
if you insist breaking API in 6 months.

^ permalink raw reply

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: Arnaud Lacombe @ 2011-01-13 17:53 UTC (permalink / raw)
  To: William Allen Simpson
  Cc: Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <4D2F3723.9040405@gmail.com>

Hi,

On Thu, Jan 13, 2011 at 12:32 PM, William Allen Simpson
<william.allen.simpson@gmail.com> wrote:
> On 1/12/11 1:56 PM, Stephen Hemminger wrote:
>>
>> On Wed, 12 Jan 2011 12:59:38 -0500
>> William Allen Simpson<william.allen.simpson@gmail.com>  wrote:
>>
>>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>>> index e64f4c6..c8f4017 100644
>>> --- a/include/linux/tcp.h
>>> +++ b/include/linux/tcp.h
>>> @@ -185,22 +185,37 @@ struct tcp_md5sig {
>>>  #define TCP_COOKIE_PAIR_SIZE  (2*TCP_COOKIE_MAX)
>>>
>>>  /* Flags for both getsockopt and setsockopt */
>>> -#define TCP_COOKIE_IN_ALWAYS   (1<<  0)        /* Discard SYN without
>>> cookie */
>>> -#define TCP_COOKIE_OUT_NEVER   (1<<  1)        /* Prohibit outgoing
>>> cookies,
>>> +#define TCPCT_IN_ALWAYS                (1<<  0)        /* Discard SYN
>>> without cookie */
>>> +#define TCPCT_OUT_NEVER                (1<<  1)        /* Prohibit
>>> outgoing cookies,
>>
>> You end up changing values in kernel userspace API in a way
>> that is incompatible with older applications. This is not acceptable.
>>
> While I agree in principle and argued strongly against it, other
> members of the research group (particularly the original project
> sponsor) have over-ridden my concerns.  I'm sorry to inform you that
> many/most participants don't care much about Linux.
>
> Note that the *bits* are the same, and previously compiled programs
> (that don't access more advanced features) should continue to run as
> they have in the past.
>
> Even though I'm not paid to work on Linux, I'm doing my best to give you
> folks a quick heads up and provide code to rectify the very recent changes
> that can be propagated back through the stable tree (to 2.6.33).
>
> As always, what you actually do with my code is up to you....
>
FWIW, what is the basis of this hunk ? The RFC text[0] seems to use
the TCP_COOKIE_* naming, not TCPCT_.

Thanks,
 - Arnaud

[0]: http://www.rfc-editor.org/authors/rfc6013.txt

--
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply

* [PATCH 2/2] ks8695net: Use default implementation of ethtool_ops::get_link
From: Ben Hutchings @ 2011-01-13 17:52 UTC (permalink / raw)
  To: Figo.zhang, zeal; +Cc: netdev
In-Reply-To: <1294941014.3946.46.camel@bwh-desktop>

This is completely untested as I don't have an ARM build environment.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
I'm fairly confident that this is right for the WAN mode.  It depends on
the previous patch.

Ben.

 drivers/net/arm/ks8695net.c |   16 +---------------
 1 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/drivers/net/arm/ks8695net.c b/drivers/net/arm/ks8695net.c
index 8820fcd..62d6f88 100644
--- a/drivers/net/arm/ks8695net.c
+++ b/drivers/net/arm/ks8695net.c
@@ -994,20 +994,6 @@ ks8695_wan_nwayreset(struct net_device *ndev)
 }
 
 /**
- *	ks8695_wan_get_link - Retrieve link status of network interface
- *	@ndev: The network interface to retrive the link status of.
- */
-static u32
-ks8695_wan_get_link(struct net_device *ndev)
-{
-	struct ks8695_priv *ksp = netdev_priv(ndev);
-	u32 ctrl;
-
-	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-	return ctrl & WMC_WLS;
-}
-
-/**
  *	ks8695_wan_get_pause - Retrieve network pause/flow-control
advertising
  *	@ndev: The device to retrieve settings from
  *	@param: The structure to fill out with the information
@@ -1058,7 +1044,7 @@ static const struct ethtool_ops
ks8695_wan_ethtool_ops = {
 	.get_settings	= ks8695_wan_get_settings,
 	.set_settings	= ks8695_wan_set_settings,
 	.nway_reset	= ks8695_wan_nwayreset,
-	.get_link	= ks8695_wan_get_link,
+	.get_link	= ethtool_op_get_link,
 	.get_pauseparam = ks8695_wan_get_pause,
 	.get_drvinfo	= ks8695_get_drvinfo,
 };
-- 
1.7.3.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* [PATCH 1/2] ks8695net: Disable non-working ethtool operations
From: Ben Hutchings @ 2011-01-13 17:50 UTC (permalink / raw)
  To: Figo.zhang, zeal; +Cc: netdev

Some ethtool operations can only be implemented for the WAN port, and
not all such operations are allowed to return an error code such as
-EOPNOTSUPP.  Therefore, define two separate ethtool_ops structures
for WAN and non-WAN ports; simplify and rename the WAN-only functions.

This is completely untested as I don't have an ARM build environment.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
This has nothing much to do with my work, but I spotted it while
auditing the various implementations of ethtool_ops::get_link.  While
ks8695net doesn't have a regular maintainer, the commit log suggests
that you are using it so perhaps you could test this change.

Ben.

 drivers/net/arm/ks8695net.c |  282 +++++++++++++++----------------------------
 1 files changed, 99 insertions(+), 183 deletions(-)

diff --git a/drivers/net/arm/ks8695net.c b/drivers/net/arm/ks8695net.c
index 54c6d84..8820fcd 100644
--- a/drivers/net/arm/ks8695net.c
+++ b/drivers/net/arm/ks8695net.c
@@ -854,12 +854,12 @@ ks8695_set_msglevel(struct net_device *ndev, u32 value)
 }
 
 /**
- *	ks8695_get_settings - Get device-specific settings.
+ *	ks8695_wan_get_settings - Get device-specific settings.
  *	@ndev: The network device to read settings from
  *	@cmd: The ethtool structure to read into
  */
 static int
-ks8695_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+ks8695_wan_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
@@ -870,69 +870,50 @@ ks8695_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 			  SUPPORTED_TP | SUPPORTED_MII);
 	cmd->transceiver = XCVR_INTERNAL;
 
-	/* Port specific extras */
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		cmd->phy_address = 0;
-		/* not supported for HPNA */
-		cmd->autoneg = AUTONEG_DISABLE;
+	cmd->advertising = ADVERTISED_TP | ADVERTISED_MII;
+	cmd->port = PORT_MII;
+	cmd->supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause);
+	cmd->phy_address = 0;
 
-		/* BUG: Erm, dtype hpna implies no phy regs */
-		/*
-		ctrl = readl(KS8695_MISC_VA + KS8695_HMC);
-		cmd->speed = (ctrl & HMC_HSS) ? SPEED_100 : SPEED_10;
-		cmd->duplex = (ctrl & HMC_HDS) ? DUPLEX_FULL : DUPLEX_HALF;
-		*/
-		return -EOPNOTSUPP;
-	case KS8695_DTYPE_WAN:
-		cmd->advertising = ADVERTISED_TP | ADVERTISED_MII;
-		cmd->port = PORT_MII;
-		cmd->supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause);
-		cmd->phy_address = 0;
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+	if ((ctrl & WMC_WAND) == 0) {
+		/* auto-negotiation is enabled */
+		cmd->advertising |= ADVERTISED_Autoneg;
+		if (ctrl & WMC_WANA100F)
+			cmd->advertising |= ADVERTISED_100baseT_Full;
+		if (ctrl & WMC_WANA100H)
+			cmd->advertising |= ADVERTISED_100baseT_Half;
+		if (ctrl & WMC_WANA10F)
+			cmd->advertising |= ADVERTISED_10baseT_Full;
+		if (ctrl & WMC_WANA10H)
+			cmd->advertising |= ADVERTISED_10baseT_Half;
+		if (ctrl & WMC_WANAP)
+			cmd->advertising |= ADVERTISED_Pause;
+		cmd->autoneg = AUTONEG_ENABLE;
+
+		cmd->speed = (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10;
+		cmd->duplex = (ctrl & WMC_WDS) ?
+			DUPLEX_FULL : DUPLEX_HALF;
+	} else {
+		/* auto-negotiation is disabled */
+		cmd->autoneg = AUTONEG_DISABLE;
 
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-		if ((ctrl & WMC_WAND) == 0) {
-			/* auto-negotiation is enabled */
-			cmd->advertising |= ADVERTISED_Autoneg;
-			if (ctrl & WMC_WANA100F)
-				cmd->advertising |= ADVERTISED_100baseT_Full;
-			if (ctrl & WMC_WANA100H)
-				cmd->advertising |= ADVERTISED_100baseT_Half;
-			if (ctrl & WMC_WANA10F)
-				cmd->advertising |= ADVERTISED_10baseT_Full;
-			if (ctrl & WMC_WANA10H)
-				cmd->advertising |= ADVERTISED_10baseT_Half;
-			if (ctrl & WMC_WANAP)
-				cmd->advertising |= ADVERTISED_Pause;
-			cmd->autoneg = AUTONEG_ENABLE;
-
-			cmd->speed = (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10;
-			cmd->duplex = (ctrl & WMC_WDS) ?
-				DUPLEX_FULL : DUPLEX_HALF;
-		} else {
-			/* auto-negotiation is disabled */
-			cmd->autoneg = AUTONEG_DISABLE;
-
-			cmd->speed = (ctrl & WMC_WANF100) ?
-				SPEED_100 : SPEED_10;
-			cmd->duplex = (ctrl & WMC_WANFF) ?
-				DUPLEX_FULL : DUPLEX_HALF;
-		}
-		break;
-	case KS8695_DTYPE_LAN:
-		return -EOPNOTSUPP;
+		cmd->speed = (ctrl & WMC_WANF100) ?
+			SPEED_100 : SPEED_10;
+		cmd->duplex = (ctrl & WMC_WANFF) ?
+			DUPLEX_FULL : DUPLEX_HALF;
 	}
 
 	return 0;
 }
 
 /**
- *	ks8695_set_settings - Set device-specific settings.
+ *	ks8695_wan_set_settings - Set device-specific settings.
  *	@ndev: The network device to configure
  *	@cmd: The settings to configure
  */
 static int
-ks8695_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+ks8695_wan_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
@@ -956,171 +937,99 @@ ks8695_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 				ADVERTISED_100baseT_Full)) == 0)
 			return -EINVAL;
 
-		switch (ksp->dtype) {
-		case KS8695_DTYPE_HPNA:
-			/* HPNA does not support auto-negotiation. */
-			return -EINVAL;
-		case KS8695_DTYPE_WAN:
-			ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-
-			ctrl &= ~(WMC_WAND | WMC_WANA100F | WMC_WANA100H |
-				  WMC_WANA10F | WMC_WANA10H);
-			if (cmd->advertising & ADVERTISED_100baseT_Full)
-				ctrl |= WMC_WANA100F;
-			if (cmd->advertising & ADVERTISED_100baseT_Half)
-				ctrl |= WMC_WANA100H;
-			if (cmd->advertising & ADVERTISED_10baseT_Full)
-				ctrl |= WMC_WANA10F;
-			if (cmd->advertising & ADVERTISED_10baseT_Half)
-				ctrl |= WMC_WANA10H;
-
-			/* force a re-negotiation */
-			ctrl |= WMC_WANR;
-			writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
-			break;
-		case KS8695_DTYPE_LAN:
-			return -EOPNOTSUPP;
-		}
+		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
 
+		ctrl &= ~(WMC_WAND | WMC_WANA100F | WMC_WANA100H |
+			  WMC_WANA10F | WMC_WANA10H);
+		if (cmd->advertising & ADVERTISED_100baseT_Full)
+			ctrl |= WMC_WANA100F;
+		if (cmd->advertising & ADVERTISED_100baseT_Half)
+			ctrl |= WMC_WANA100H;
+		if (cmd->advertising & ADVERTISED_10baseT_Full)
+			ctrl |= WMC_WANA10F;
+		if (cmd->advertising & ADVERTISED_10baseT_Half)
+			ctrl |= WMC_WANA10H;
+
+		/* force a re-negotiation */
+		ctrl |= WMC_WANR;
+		writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
 	} else {
-		switch (ksp->dtype) {
-		case KS8695_DTYPE_HPNA:
-			/* BUG: dtype_hpna implies no phy registers */
-			/*
-			ctrl = __raw_readl(KS8695_MISC_VA + KS8695_HMC);
-
-			ctrl &= ~(HMC_HSS | HMC_HDS);
-			if (cmd->speed == SPEED_100)
-				ctrl |= HMC_HSS;
-			if (cmd->duplex == DUPLEX_FULL)
-				ctrl |= HMC_HDS;
-
-			__raw_writel(ctrl, KS8695_MISC_VA + KS8695_HMC);
-			*/
-			return -EOPNOTSUPP;
-		case KS8695_DTYPE_WAN:
-			ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-
-			/* disable auto-negotiation */
-			ctrl |= WMC_WAND;
-			ctrl &= ~(WMC_WANF100 | WMC_WANFF);
-
-			if (cmd->speed == SPEED_100)
-				ctrl |= WMC_WANF100;
-			if (cmd->duplex == DUPLEX_FULL)
-				ctrl |= WMC_WANFF;
-
-			writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
-			break;
-		case KS8695_DTYPE_LAN:
-			return -EOPNOTSUPP;
-		}
+		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+
+		/* disable auto-negotiation */
+		ctrl |= WMC_WAND;
+		ctrl &= ~(WMC_WANF100 | WMC_WANFF);
+
+		if (cmd->speed == SPEED_100)
+			ctrl |= WMC_WANF100;
+		if (cmd->duplex == DUPLEX_FULL)
+			ctrl |= WMC_WANFF;
+
+		writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
 	}
 
 	return 0;
 }
 
 /**
- *	ks8695_nwayreset - Restart the autonegotiation on the port.
+ *	ks8695_wan_nwayreset - Restart the autonegotiation on the port.
  *	@ndev: The network device to restart autoneotiation on
  */
 static int
-ks8695_nwayreset(struct net_device *ndev)
+ks8695_wan_nwayreset(struct net_device *ndev)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
 
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		/* No phy means no autonegotiation on hpna */
-		return -EINVAL;
-	case KS8695_DTYPE_WAN:
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
 
-		if ((ctrl & WMC_WAND) == 0)
-			writel(ctrl | WMC_WANR,
-			       ksp->phyiface_regs + KS8695_WMC);
-		else
-			/* auto-negotiation not enabled */
-			return -EINVAL;
-		break;
-	case KS8695_DTYPE_LAN:
-		return -EOPNOTSUPP;
-	}
+	if ((ctrl & WMC_WAND) == 0)
+		writel(ctrl | WMC_WANR,
+		       ksp->phyiface_regs + KS8695_WMC);
+	else
+		/* auto-negotiation not enabled */
+		return -EINVAL;
 
 	return 0;
 }
 
 /**
- *	ks8695_get_link - Retrieve link status of network interface
+ *	ks8695_wan_get_link - Retrieve link status of network interface
  *	@ndev: The network interface to retrive the link status of.
  */
 static u32
-ks8695_get_link(struct net_device *ndev)
+ks8695_wan_get_link(struct net_device *ndev)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
 
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		/* HPNA always has link */
-		return 1;
-	case KS8695_DTYPE_WAN:
-		/* WAN we can read the PHY for */
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-		return ctrl & WMC_WLS;
-	case KS8695_DTYPE_LAN:
-		return -EOPNOTSUPP;
-	}
-	return 0;
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+	return ctrl & WMC_WLS;
 }
 
 /**
- *	ks8695_get_pause - Retrieve network pause/flow-control advertising
+ *	ks8695_wan_get_pause - Retrieve network pause/flow-control advertising
  *	@ndev: The device to retrieve settings from
  *	@param: The structure to fill out with the information
  */
 static void
-ks8695_get_pause(struct net_device *ndev, struct ethtool_pauseparam *param)
+ks8695_wan_get_pause(struct net_device *ndev, struct ethtool_pauseparam *param)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
 
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		/* No phy link on hpna to configure */
-		return;
-	case KS8695_DTYPE_WAN:
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-
-		/* advertise Pause */
-		param->autoneg = (ctrl & WMC_WANAP);
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
 
-		/* current Rx Flow-control */
-		ctrl = ks8695_readreg(ksp, KS8695_DRXC);
-		param->rx_pause = (ctrl & DRXC_RFCE);
+	/* advertise Pause */
+	param->autoneg = (ctrl & WMC_WANAP);
 
-		/* current Tx Flow-control */
-		ctrl = ks8695_readreg(ksp, KS8695_DTXC);
-		param->tx_pause = (ctrl & DTXC_TFCE);
-		break;
-	case KS8695_DTYPE_LAN:
-		/* The LAN's "phy" is a direct-attached switch */
-		return;
-	}
-}
+	/* current Rx Flow-control */
+	ctrl = ks8695_readreg(ksp, KS8695_DRXC);
+	param->rx_pause = (ctrl & DRXC_RFCE);
 
-/**
- *	ks8695_set_pause - Configure pause/flow-control
- *	@ndev: The device to configure
- *	@param: The pause parameters to set
- *
- *	TODO: Implement this
- */
-static int
-ks8695_set_pause(struct net_device *ndev, struct ethtool_pauseparam *param)
-{
-	return -EOPNOTSUPP;
+	/* current Tx Flow-control */
+	ctrl = ks8695_readreg(ksp, KS8695_DTXC);
+	param->tx_pause = (ctrl & DTXC_TFCE);
 }
 
 /**
@@ -1140,12 +1049,17 @@ ks8695_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info)
 static const struct ethtool_ops ks8695_ethtool_ops = {
 	.get_msglevel	= ks8695_get_msglevel,
 	.set_msglevel	= ks8695_set_msglevel,
-	.get_settings	= ks8695_get_settings,
-	.set_settings	= ks8695_set_settings,
-	.nway_reset	= ks8695_nwayreset,
-	.get_link	= ks8695_get_link,
-	.get_pauseparam = ks8695_get_pause,
-	.set_pauseparam = ks8695_set_pause,
+	.get_drvinfo	= ks8695_get_drvinfo,
+};
+
+static const struct ethtool_ops ks8695_wan_ethtool_ops = {
+	.get_msglevel	= ks8695_get_msglevel,
+	.set_msglevel	= ks8695_set_msglevel,
+	.get_settings	= ks8695_wan_get_settings,
+	.set_settings	= ks8695_wan_set_settings,
+	.nway_reset	= ks8695_wan_nwayreset,
+	.get_link	= ks8695_wan_get_link,
+	.get_pauseparam = ks8695_wan_get_pause,
 	.get_drvinfo	= ks8695_get_drvinfo,
 };
 
@@ -1541,7 +1455,6 @@ ks8695_probe(struct platform_device *pdev)
 
 	/* driver system setup */
 	ndev->netdev_ops = &ks8695_netdev_ops;
-	SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	ndev->watchdog_timeo	 = msecs_to_jiffies(watchdog);
 
 	netif_napi_add(ndev, &ksp->napi, ks8695_poll, NAPI_WEIGHT);
@@ -1608,12 +1521,15 @@ ks8695_probe(struct platform_device *pdev)
 	if (ksp->phyiface_regs && ksp->link_irq == -1) {
 		ks8695_init_switch(ksp);
 		ksp->dtype = KS8695_DTYPE_LAN;
+		SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	} else if (ksp->phyiface_regs && ksp->link_irq != -1) {
 		ks8695_init_wan_phy(ksp);
 		ksp->dtype = KS8695_DTYPE_WAN;
+		SET_ETHTOOL_OPS(ndev, &ks8695_wan_ethtool_ops);
 	} else {
 		/* No initialisation since HPNA does not have a PHY */
 		ksp->dtype = KS8695_DTYPE_HPNA;
+		SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	}
 
 	/* And bring up the net_device with the net core */
-- 
1.7.3.4





^ permalink raw reply related

* [PATCH] CHOKe flow scheduler (iproute)
From: Stephen Hemminger @ 2011-01-13 17:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <20110113092706.154748c2@s6510>

Preliminary interface for CHOKe scheduler in iproute

---
 include/linux/pkt_sched.h |   29 ++++++
 tc/Makefile               |    1 +
 tc/q_choke.c              |  221 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 tc/q_choke.c

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 2cfa4bc..83bac92 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8
diff --git a/tc/Makefile b/tc/Makefile
index 101cc83..2cbd5d5 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -15,6 +15,7 @@ TCMODULES += q_cbq.o
 TCMODULES += q_rr.o
 TCMODULES += q_multiq.o
 TCMODULES += q_netem.o
+TCMODULES += q_choke.o
 TCMODULES += f_rsvp.o
 TCMODULES += f_u32.o
 TCMODULES += f_route.o
diff --git a/tc/q_choke.c b/tc/q_choke.c
new file mode 100644
index 0000000..044ae9a
--- /dev/null
+++ b/tc/q_choke.c
@@ -0,0 +1,221 @@
+/*
+ * q_choke.c		CHOKE.
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Stephen Hemminger <shemminger@vyatta.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+#include "tc_red.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... choke limit PACKETS bandwidth KBPS [ecn]\n");
+	fprintf(stderr, "                 [ min PACKETS ] [ max PACKETS ] [ burst PACKETS ]\n");
+}
+
+static int choke_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			   struct nlmsghdr *n)
+{
+	struct tc_red_qopt opt;
+	unsigned burst = 0;
+	unsigned avpkt = 1000;
+	double probability = 0.02;
+	unsigned rate = 0;
+	int ecn_ok = 0;
+	int wlog;
+	__u8 sbuf[256];
+	struct rtattr *tail;
+
+	memset(&opt, 0, sizeof(opt));
+
+	while (argc > 0) {
+		if (strcmp(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.limit, *argv, 0)) {
+				fprintf(stderr, "Illegal \"limit\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "bandwidth") == 0) {
+			NEXT_ARG();
+			if (get_rate(&rate, *argv)) {
+				fprintf(stderr, "Illegal \"bandwidth\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "ecn") == 0) {
+			ecn_ok = 1;
+		} else if (strcmp(*argv, "min") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.qth_min, *argv, 0)) {
+				fprintf(stderr, "Illegal \"min\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "max") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.qth_max, *argv, 0)) {
+				fprintf(stderr, "Illegal \"max\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "burst") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&burst, *argv, 0)) {
+				fprintf(stderr, "Illegal \"burst\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "avpkt") == 0) {
+			NEXT_ARG();
+			if (get_size(&avpkt, *argv)) {
+				fprintf(stderr, "Illegal \"avpkt\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "probability") == 0) {
+			NEXT_ARG();
+			if (sscanf(*argv, "%lg", &probability) != 1) {
+				fprintf(stderr, "Illegal \"probability\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	if (!rate || !opt.limit) {
+		fprintf(stderr, "Required parameter (bandwidth, limit) is missing\n");
+		return -1;
+	}
+
+	/* Compute default min/max thresholds based on 
+	   Sally Floyd's recommendations:
+	   http://www.icir.org/floyd/REDparameters.txt
+	*/
+	if (!opt.qth_max) 
+		opt.qth_max = opt.limit / 4;
+	if (!opt.qth_min)
+		opt.qth_min = opt.qth_max / 3;
+	if (!burst)
+		burst = (2 * opt.qth_min + opt.qth_max) / 3;
+
+	if (opt.qth_max > opt.limit) {
+		fprintf(stderr, "\"max\" is larger than \"limit\"\n");
+		return -1;
+	}
+
+	if (opt.qth_min > opt.qth_min) {
+		fprintf(stderr, "\"min\" is not smaller than \"max\"\n");
+		return -1;
+	}
+
+	printf("min=%u max=%u burst=%u limit=%u\n",
+	       opt.qth_min, opt.qth_max, burst, opt.limit);
+	wlog = tc_red_eval_ewma(opt.qth_min, burst, 1);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate EWMA constant.\n");
+		return -1;
+	}
+	if (wlog >= 10)
+		fprintf(stderr, "CHOKE: WARNING. Burst %d seems to be to large.\n", burst);
+	opt.Wlog = wlog;
+
+	wlog = tc_red_eval_P(opt.qth_min, opt.qth_max, probability);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate probability.\n");
+		return -1;
+	}
+	opt.Plog = wlog;
+
+	wlog = tc_red_eval_idle_damping(opt.Wlog, avpkt, rate, sbuf);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate idle damping table.\n");
+		return -1;
+	}
+	opt.Scell_log = wlog;
+	if (ecn_ok)
+		opt.flags |= TC_RED_ECN;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	addattr_l(n, 1024, TCA_CHOKE_PARMS, &opt, sizeof(opt));
+	addattr_l(n, 1024, TCA_CHOKE_STAB, sbuf, 256);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int choke_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CHOKE_STAB+1];
+	struct tc_red_qopt *qopt;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+	SPRINT_BUF(b3);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CHOKE_STAB, opt);
+
+	if (tb[TCA_CHOKE_PARMS] == NULL)
+		return -1;
+	qopt = RTA_DATA(tb[TCA_CHOKE_PARMS]);
+	if (RTA_PAYLOAD(tb[TCA_CHOKE_PARMS])  < sizeof(*qopt))
+		return -1;
+	fprintf(f, "limit %s min %s max %s ",
+		sprint_size(qopt->limit, b1),
+		sprint_size(qopt->qth_min, b2),
+		sprint_size(qopt->qth_max, b3));
+
+	if (qopt->flags & TC_RED_ECN)
+		fprintf(f, "ecn ");
+
+	if (show_details) {
+		fprintf(f, "ewma %u Plog %u Scell_log %u",
+			qopt->Wlog, qopt->Plog, qopt->Scell_log);
+	}
+	return 0;
+}
+
+static int choke_print_xstats(struct qdisc_util *qu, FILE *f,
+			      struct rtattr *xstats)
+{
+	struct tc_choke_xstats *st;
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(*st))
+		return -1;
+
+	st = RTA_DATA(xstats);
+	fprintf(f, "  marked %u early %u pdrop %u other %u matched %u",
+		st->marked, st->early, st->pdrop, st->other, st->matched);
+	return 0;
+
+}
+
+struct qdisc_util choke_qdisc_util = {
+	.id		= "choke",
+	.parse_qopt	= choke_parse_opt,
+	.print_qopt	= choke_print_opt,
+	.print_xstats	= choke_print_xstats,
+};
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: William Allen Simpson @ 2011-01-13 17:32 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Linux Kernel Developers, Linux Kernel Network Developers,
	David Miller, Andrew Morton
In-Reply-To: <20110112105608.793787b2@s6510>

On 1/12/11 1:56 PM, Stephen Hemminger wrote:
> On Wed, 12 Jan 2011 12:59:38 -0500
> William Allen Simpson<william.allen.simpson@gmail.com>  wrote:
>
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index e64f4c6..c8f4017 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -185,22 +185,37 @@ struct tcp_md5sig {
>>   #define TCP_COOKIE_PAIR_SIZE	(2*TCP_COOKIE_MAX)
>>
>>   /* Flags for both getsockopt and setsockopt */
>> -#define TCP_COOKIE_IN_ALWAYS	(1<<  0)	/* Discard SYN without cookie */
>> -#define TCP_COOKIE_OUT_NEVER	(1<<  1)	/* Prohibit outgoing cookies,
>> +#define TCPCT_IN_ALWAYS		(1<<  0)	/* Discard SYN without cookie */
>> +#define TCPCT_OUT_NEVER		(1<<  1)	/* Prohibit outgoing cookies,
>
> You end up changing values in kernel userspace API in a way
> that is incompatible with older applications. This is not acceptable.
>
While I agree in principle and argued strongly against it, other
members of the research group (particularly the original project
sponsor) have over-ridden my concerns.  I'm sorry to inform you that
many/most participants don't care much about Linux.

Note that the *bits* are the same, and previously compiled programs
(that don't access more advanced features) should continue to run as
they have in the past.

Even though I'm not paid to work on Linux, I'm doing my best to give you
folks a quick heads up and provide code to rectify the very recent changes
that can be propagated back through the stable tree (to 2.6.33).

As always, what you actually do with my code is up to you....

^ permalink raw reply

* [PATCH] CHOKe flow scheduler (0.7)
From: Stephen Hemminger @ 2011-01-13 17:27 UTC (permalink / raw)
  To: David Miller, Eric Dumazet; +Cc: netdev

This implements the CHOKe packet scheduler based on the existing
Linux RED scheduler based on the algorithm described in the paper.

The core idea is:
  For every packet arrival:
  	Calculate Qave
	if (Qave < minth) 
	     Queue the new packet
	else 
	     Select randomly a packet from the queue 
	     if (both packets from same flow)
	     then Drop both the packets
	     else if (Qave > maxth)
	          Drop packet
	     else
	       	  Admit packet with proability p (same as RED)

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation", 
  Proceeding of INFOCOM'2000, March 2000.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Patch versions
0.3 (Eric) uses table for queue.
0.4 allows classification with TC filters
    fixes crash when peek_random() finds a hole
0.5 (Eric) that fixes qlen with holes and peek
0.7 change to use separate params / stats than RED
    account for drops in backlog

Almost ready, still need to make sure API (netlink) is right


 net/sched/Kconfig     |   11 +
 net/sched/Makefile    |    1 
 net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 548 insertions(+)

--- a/net/sched/Kconfig	2011-01-12 17:44:05.747500044 -0800
+++ b/net/sched/Kconfig	2011-01-12 17:44:53.167735188 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which trys to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-12 17:44:05.767500135 -0800
+++ b/net/sched/Makefile	2011-01-12 17:44:53.167735188 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-12 17:45:07.227806180 -0800
@@ -0,0 +1,556 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+
+/*	CHOKe stateless AQM for fair bandwidth allocation
+        =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+	unsigned int	 holes;
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+static inline unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* deliver a random number between 0 and N - 1 */
+static inline u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* Select a packet at random from the queue in O(1) and handle holes */
+static struct sk_buff *choke_peek_random(struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* queue is has lots of holes use the head which is known to exist */
+	return q->tab[*pidx = q->head];
+}
+
+/* Is ECN parameter configured */
+static inline int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static inline int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	while (q->holes && q->tab[q->head] == NULL) {
+		q->head = (q->head + 1) & q->tab_mask;
+		q->holes--;
+	}
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	while (q->holes && q->tab[q->tail - 1] == NULL) {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		q->holes--;
+	}
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
+{
+	q->tab[idx] = NULL;
+	q->holes++;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+}
+
+/* Classify flow using either:
+   1. pre-existing classification result in skb
+   2. fast internal classification
+   3. use TC filter based classification
+*/
+static inline unsigned int choke_classify(struct sk_buff *skb,
+					  struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return skb_get_rxhash(skb);
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		return TC_H_MIN(res.classid);
+	}
+
+	return 0;
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	unsigned int hash;
+	int uninitialized_var(ret);
+
+	hash = choke_classify(skb, sch, &ret);
+	if (!hash) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Maybe add hash as field in struct qdisc_skb_cb? */
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, choke_len(q) - q->holes);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		struct sk_buff *oskb;
+		unsigned int idx;
+
+		/* Draw a packet at random from queue */
+		oskb = choke_peek_random(q, &idx);
+
+		/* Both packets from same flow ? */
+		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+			/* Drop both packets */
+			q->stats.matched++;
+			choke_drop_by_idx(q, idx);
+			qdisc_drop(oskb, sch);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (likely(choke_len(q) < q->limit)) {
+
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		qdisc_update_bstats(sch, skb);
+		sch->q.qlen = choke_len(q) - q->holes;
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL; /* not really needed */
+	q->head = (q->head + 1) & q->tab_mask;
+	choke_zap_head_holes(q);
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->q.qlen = choke_len(q) - q->holes;
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc* sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = 256 },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int tail = 0;
+
+			while (q->head != q->tail) {
+				ntab[tail++] = q->tab[q->head];
+				q->head = (q->head + 1) & q->tab_mask;
+			}
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+		q->holes = 0;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-12 17:44:05.823500415 -0800
+++ b/include/linux/pkt_sched.h	2011-01-12 17:44:53.175735219 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8

^ permalink raw reply

* [RFC PATCH] ipsec: fix IPv4 AH alignment on 32 bits
From: Nicolas Dichtel @ 2011-01-13 17:20 UTC (permalink / raw)
  To: netdev; +Cc: Christophe Gouault

[-- Attachment #1: Type: text/plain, Size: 298 bytes --]

Hi,

here is a patch to fix alignment of IPv4 AH. Note that this break compatiblity 
for some algorithms (like SHA256) with old kernels ... but upstream cannot use 
SHA256 on IPv4, for example, with a target that is RFC compliant.

I don't know what is the best way to fix this.


Regards,
Nicolas

[-- Attachment #2: 0001-ipsec-fix-IPv4-AH-alignment-on-32-bits.patch --]
[-- Type: text/x-patch, Size: 2682 bytes --]

>From 14bbe173eed25cf59e3e54222eb7de1a5578e54e Mon Sep 17 00:00:00 2001
From: Dang Hongwu <hongwu.dang@6wind.com>
Date: Wed, 22 Dec 2010 11:38:47 -0500
Subject: [PATCH] ipsec: fix IPv4 AH alignment on 32 bits

The Linux IPv4 AH stack aligns the AH header on a 64 bit boundary
(like in IPv6). This is not RFC compliant (see RFC4302, Section
3.3.3.2.1), it should be aligned on 32 bits.

For most of the authentication algorithms, the ICV size is 96 bits.
The AH header alignment on 32 or 64 bits gives the same results.

However for SHA-256-128 for instance, the wrong 64 bit alignment results
in adding useless padding in IPv4 AH, which is forbidden by the RFC.

Signed-off-by: Dang Hongwu <hongwu.dang@6wind.com>
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/xfrm.h |    1 +
 net/ipv4/ah4.c     |    8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bcfb6b2..525d882 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -36,6 +36,7 @@
 #define XFRM_PROTO_ROUTING	IPPROTO_ROUTING
 #define XFRM_PROTO_DSTOPTS	IPPROTO_DSTOPTS
 
+#define XFRM_ALIGN4(len)	(((len) + 3) & ~3)
 #define XFRM_ALIGN8(len)	(((len) + 7) & ~7)
 #define MODULE_ALIAS_XFRM_MODE(family, encap) \
 	MODULE_ALIAS("xfrm-mode-" __stringify(family) "-" __stringify(encap))
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 86961be..95561d6 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -201,7 +201,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	top_iph->ttl = 0;
 	top_iph->check = 0;
 
-	ah->hdrlen  = (XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
+	ah->hdrlen  = (XFRM_ALIGN4(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
 
 	ah->reserved = 0;
 	ah->spi = x->id.spi;
@@ -299,8 +299,8 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 	nexthdr = ah->nexthdr;
 	ah_hlen = (ah->hdrlen + 2) << 2;
 
-	if (ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_full_len) &&
-	    ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len))
+	if (ah_hlen != XFRM_ALIGN4(sizeof(*ah) + ahp->icv_full_len) &&
+	    ah_hlen != XFRM_ALIGN4(sizeof(*ah) + ahp->icv_trunc_len))
 		goto out;
 
 	if (!pskb_may_pull(skb, ah_hlen))
@@ -450,7 +450,7 @@ static int ah_init_state(struct xfrm_state *x)
 
 	BUG_ON(ahp->icv_trunc_len > MAX_AH_AUTH_LEN);
 
-	x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) +
+	x->props.header_len = XFRM_ALIGN4(sizeof(struct ip_auth_hdr) +
 					  ahp->icv_trunc_len);
 	if (x->props.mode == XFRM_MODE_TUNNEL)
 		x->props.header_len += sizeof(struct iphdr);
-- 
1.5.6.5


^ permalink raw reply related

* Re: [PATCH V9 08/13] posix clocks: cleanup the CLOCK_DISPTACH macro
From: Thomas Gleixner @ 2011-01-13 17:03 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	John Stultz, Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <90b2beef441615d01c93fcad029c44af4e505c5f.1294917348.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>

On Thu, 13 Jan 2011, Richard Cochran wrote:
>  int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *ts);
>  int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *ts);
> -int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *ts);
> +int posix_cpu_clock_set(const clockid_t which_clock, struct timespec *ts);

Shouldn't we change the clock_set function to have *ts const in all places ?

>  static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp)
> @@ -279,12 +230,29 @@ static __init int init_posix_timers(void)
>  {
>  	struct k_clock clock_realtime = {
>  		.clock_getres = hrtimer_get_res,
> +		/* defaults: */
> +		.clock_adj	= common_clock_adj,
> +		.clock_get	= common_clock_get,
> +		.clock_set	= common_clock_set,
> +		.nsleep		= common_nsleep,
> +		.nsleep_restart	= common_nsleep_restart,
> +		.timer_create	= common_timer_create,
> +		.timer_del	= common_timer_del,
> +		.timer_get	= common_timer_get,
> +		.timer_set	= common_timer_set,
>  	};
>  	struct k_clock clock_monotonic = {
>  		.clock_getres = hrtimer_get_res,
>  		.clock_get = posix_ktime_get_ts,
>  		.clock_set = do_posix_clock_nosettime,
>  		.clock_adj = do_posix_clock_noadjtime,
> +		/* defaults: */
> +		.nsleep		= common_nsleep,
> +		.nsleep_restart	= common_nsleep_restart,
> +		.timer_create	= common_timer_create,
> +		.timer_del	= common_timer_del,
> +		.timer_get	= common_timer_get,
> +		.timer_set	= common_timer_set,
>  	};
>  	struct k_clock clock_monotonic_raw = {
>  		.clock_getres = hrtimer_get_res,
> @@ -293,6 +261,11 @@ static __init int init_posix_timers(void)
>  		.clock_adj = do_posix_clock_noadjtime,
>  		.timer_create = no_timer_create,
>  		.nsleep = no_nsleep,
> +		/* defaults: */
> +		.nsleep_restart	= common_nsleep_restart,
> +		.timer_del	= common_timer_del,
> +		.timer_get	= common_timer_get,
> +		.timer_set	= common_timer_set,

Hmm, we do not need to set functional entries for clocks which neither
implement timer_create nor nsleep.

Otherwise I really like the outcome. :)

Thanks for your patience !

       tglx

^ permalink raw reply

* Re: [PATCH v4] netfilter: ipt_CLUSTERIP: remove "no conntrack!"
From: Eric Dumazet @ 2011-01-13 16:48 UTC (permalink / raw)
  To: Pablo Neira Ayuso
  Cc: Jan Engelhardt, Netfilter Development Mailinglist, netdev,
	Patrick McHardy
In-Reply-To: <4D2F29D6.3040600@netfilter.org>

Le jeudi 13 janvier 2011 à 17:35 +0100, Pablo Neira Ayuso a écrit :
> On 13/01/11 17:30, Pablo Neira Ayuso wrote:
> > On 13/01/11 15:39, Eric Dumazet wrote:
> >> Then, cluster match can be improved, I am sure you already have a patch
> >> for it.
> > 
> > what scenario could benefit from the destination-based hashing?
> 
> I'm telling this because it doesn't make too sense to me.

Me too ;)

But hash(source_IP, source_PORT) definitely make sense in some
workloads.



--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next-2.6] etherdevice.h: Add is_unicast_ether_addr function
From: Chris Metcalf @ 2011-01-13 16:38 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Joe Perches, Tobias Klauser, David Miller, netdev
In-Reply-To: <1294908916.3570.21.camel@edumazet-laptop>

On 1/13/2011 3:55 AM, Eric Dumazet wrote:
> Le jeudi 13 janvier 2011 à 00:24 -0800, Joe Perches a écrit :
>> On Thu, 2011-01-13 at 09:14 +0100, Tobias Klauser wrote:
>>> >From a check for !is_multicast_ether_addr it is not always obvious that
>>> we're checking for a unicast address. So add this helper function to
>>> make those code paths easier to read.
>>>  include/linux/etherdevice.h |   11 +++++++++++
>> []
>>>  /**
>>> + * is_unicast_ether_addr - Determine if the Ethernet address is unicast
>>> + * @addr: Pointer to a six-byte array containing the Ethernet address
>>> + *
>>> + * Return true if the address is a unicast address.
>>> + */
>>> +static inline int is_unicast_ether_addr(const u8 *addr)
>>> +{
>>> +	return !is_multicast_ether_addr(addr);
>>> +}
>> Can't you simply use is_valid_ether_addr?
>>
>> I can't think of much reason that a new function for
>> !multicast without the !is_zero is needed.
>>
> performance ?
>
> is_valid_ether_addr() is used at device init time, not when receiving
> packets.
>
> I am not sure we _need_ to check for is_zero_ether_addr() each time we
> receive a packet.
>
> Either a MAC is unicast or multicast.
>
> A zero address is not multicast for sure.

I agree - the is_zero_ether_addr() check seems pointless in the context of
the running interface.

Also, I think a static inline is better form than a #define, all things
being equal.

So, I like Tobias' reworked patches.  I can take them into the Tilera tree,
but I'd prefer David Miller take them into the net tree if he is agreeable,
since it now includes changes to generic networking code.  If you take the
latter approach you can include my:

Acked-by: Chris Metcalf <cmetcalf@tilera.com>

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com


^ permalink raw reply

* Re: [PATCH v4] netfilter: ipt_CLUSTERIP: remove "no conntrack!"
From: Pablo Neira Ayuso @ 2011-01-13 16:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Engelhardt, Netfilter Development Mailinglist, netdev,
	Patrick McHardy
In-Reply-To: <4D2F28B9.50407@netfilter.org>

On 13/01/11 17:30, Pablo Neira Ayuso wrote:
> On 13/01/11 15:39, Eric Dumazet wrote:
>> Then, cluster match can be improved, I am sure you already have a patch
>> for it.
> 
> what scenario could benefit from the destination-based hashing?

I'm telling this because it doesn't make too sense to me.

^ permalink raw reply

* Re: [PATCH v4] netfilter: ipt_CLUSTERIP: remove "no conntrack!"
From: Pablo Neira Ayuso @ 2011-01-13 16:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jan Engelhardt, Netfilter Development Mailinglist, netdev,
	Patrick McHardy
In-Reply-To: <1294929579.3570.163.camel@edumazet-laptop>

On 13/01/11 15:39, Eric Dumazet wrote:
> Le jeudi 13 janvier 2011 à 15:02 +0100, Jan Engelhardt a écrit :
>> On Thursday 2011-01-13 14:38, Eric Dumazet wrote:
>>
>>> Le jeudi 13 janvier 2011 à 12:54 +0100, Pablo Neira Ayuso a écrit :
>>>
>>>> But printing this does not provide any useful information. The first
>>>> packet that does not belong to the cluster node that has received the
>>>> packet, or the first invalid packet, will trigger this.
>>>>
>>>> Moreover, this confuses users since they can do nothing if they receive
>>>> this message.
>>>>
>>>> Moreover, this target should be supersedes by the cluster match, which
>>>> has been there for quite some time (it's also more flexible).
>>>
>>> Now you mentioned it, cluster match is not as flexible right now,
>>> its hashing is on source_ip only.
>>
>> I think in that case, xt_cluster should be improved rather
>> than an old module.
> 
> Amen
> 
> We should not improve IPv4 support then, I see.
> 
> My customers use this old module, and upgrading to xt_cluster is not an
> option.
> 
> Should we discuss this forever or fix it ?

hey hey, I'm fine with fixing things. Patch v4 is OK.

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

> In the end, people are forced to add useless iptables rule to DROP
> INVALID packets before entering ipt_CLUSTERIP, after googling or
> eventually asking to experts.
> 
> Last time this was discussed, this went nowhere :
> 
> http://www.spinics.net/lists/netfilter/msg48676.html
> 
> Come on guys, we can do it, dont be afraid.
> 
> A non rate limited printk() in kernel is forbidden, especially in
> network stack.
> 
> Then, cluster match can be improved, I am sure you already have a patch
> for it.

what scenario could benefit from the destination-based hashing?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Andres Salomon @ 2011-01-13 16:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev
In-Reply-To: <1294928559.3570.130.camel@edumazet-laptop>

On Thu, 13 Jan 2011 15:22:39 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Le jeudi 13 janvier 2011 à 19:49 +0800, Po-Yu Chuang a écrit :
> > From: Po-Yu Chuang <ratbert@faraday-tech.com>
> > 
> > FTMAC100 Ethernet Media Access Controller supports 10/100 Mbps and
> > MII.  This driver has been working on some ARM/NDS32 SoC including
> > Faraday A320 and Andes AG101.
> > 
> > Signed-off-by: Po-Yu Chuang <ratbert@faraday-tech.com>
> > ---
> >  drivers/net/Kconfig    |    9 +
> >  drivers/net/Makefile   |    1 +
> >  drivers/net/ftmac100.c | 1341
> > ++++++++++++++++++++++++++++++++++++++++++++++++
> > drivers/net/ftmac100.h |  180 +++++++ 4 files changed, 1531
> > insertions(+), 0 deletions(-) create mode 100644
> > drivers/net/ftmac100.c create mode 100644 drivers/net/ftmac100.h
> 
> Hi
> 
[...]
> 
> 9) Instead of dev_info(&netdev->dev ...) , please consider
> netdev_info()
> 
> 

No one else mentioned it, so I'll add:

Don't explicitly inline functions unless they're in a header, or you
have a really good reason (and that reason should probably be described
in a comment).  Otherwise, just leave off the 'inline' keyword; the
compiler is smart enough to decide whether a function should be inlined
or not.

^ permalink raw reply

* sch_sfb [was: net_sched: mark packet staying on queue too long]
From: Juliusz Chroboczek @ 2011-01-13 16:04 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev, Jesper Dangaard Brouer, David Miller
In-Reply-To: <4D2EE708.8010806@trash.net>

>> Have you looked at the SFB (Stochastic Fair Blue) implementation by
>> Juliusz Chroboczek?

>> http://www.pps.jussieu.fr/~jch/software/sfb/

> I had a closer look at this some time ago and noticed a couple of bugs
> (f.i. double buffering might be enabled or disabled or the buffers
> switched while a packet is queued, so on dequeue the wrong buffer will
> have its queue length decremented) and also found the hashing quite
> inefficient,

And you never found the time to drop me a mail on the subject?

> so I've implemented my own version.

I see.

                                        Juliusz

^ permalink raw reply

* Re: [PATCH] tcp: remove duplicate th = tcp_hdr(skb)
From: Eric Dumazet @ 2011-01-13 15:59 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: davem, netdev
In-Reply-To: <1294933302-28469-1-git-send-email-christoph.paasch@uclouvain.be>

Le jeudi 13 janvier 2011 à 16:41 +0100, Christoph Paasch a écrit :
> th is already set some lines before to be th = tcp_hdr(skb).
> 
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> ---
>  net/ipv4/tcp_ipv4.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
> 
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 856f684..7fe29c6 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1644,7 +1644,6 @@ int tcp_v4_rcv(struct sk_buff *skb)
>  	if (!skb_csum_unnecessary(skb) && tcp_v4_checksum_init(skb))
>  		goto bad_packet;
>  

Well... no please.

> -	th = tcp_hdr(skb);
>  	iph = ip_hdr(skb);
>  	TCP_SKB_CB(skb)->seq = ntohl(th->seq);
>  	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +

Hint : pskb_may_pull() can change tcp_hdr(skb) result.



^ permalink raw reply

* Re: [RFC] net_sched: mark packet staying on queue too long
From: Eric Dumazet @ 2011-01-13 15:54 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Jesper Dangaard Brouer, Stephen Hemminger, hadi, Jarek Poplawski,
	David Miller, netdev
In-Reply-To: <4D2EE708.8010806@trash.net>

Le jeudi 13 janvier 2011 à 12:50 +0100, Patrick McHardy a écrit :
> On 04.01.2011 15:19, Jesper Dangaard Brouer wrote:
> >> ...
> >> You might want to look into CHOKe and ECSFQ which are other AQM models
> >> that have shown up in research.
> > 
> > Have you looked at the SFB (Stochastic Fair Blue) implementation by Juliusz Chroboczek?
> > 
> > http://www.pps.jussieu.fr/~jch/software/sfb/
> 
> I had a closer look at this some time ago and noticed a couple of bugs
> (f.i. double buffering might be enabled or disabled or the buffers
> switched while a packet is queued, so on dequeue the wrong buffer will
> have its queue length decremented) and also found the hashing quite
> inefficient, so I've implemented my own version. There's still a minor
> bug somewhere, but if people are interested I could finish it some
> time soon and post the patches.

I am very interested Patrick, I also found SFB hashing inefficient, so
any new idea is welcomed. 

We should get more schedulers at hand.

CHOKe is about to be released, QFQ is also coming soon, thanks to
Stephen (and respective AQM authors)

Thanks



^ permalink raw reply

* Re: [PATCH] tcp: remove duplicate th = tcp_hdr(skb)
From: Jesse Gross @ 2011-01-13 15:54 UTC (permalink / raw)
  To: Christoph Paasch; +Cc: davem, netdev
In-Reply-To: <1294933302-28469-1-git-send-email-christoph.paasch@uclouvain.be>

On Thu, Jan 13, 2011 at 10:41 AM, Christoph Paasch
<christoph.paasch@uclouvain.be> wrote:
> th is already set some lines before to be th = tcp_hdr(skb).
>
> Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
> ---
>  net/ipv4/tcp_ipv4.c |    1 -
>  1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 856f684..7fe29c6 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1644,7 +1644,6 @@ int tcp_v4_rcv(struct sk_buff *skb)
>        if (!skb_csum_unnecessary(skb) && tcp_v4_checksum_init(skb))
>                goto bad_packet;
>
> -       th = tcp_hdr(skb);

It needs to be reloaded because pskb_may_pull() may reallocate the buffer.

^ permalink raw reply

* [PATCH] tcp: remove duplicate th = tcp_hdr(skb)
From: Christoph Paasch @ 2011-01-13 15:41 UTC (permalink / raw)
  To: davem, netdev; +Cc: Christoph Paasch

th is already set some lines before to be th = tcp_hdr(skb).

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
---
 net/ipv4/tcp_ipv4.c |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 856f684..7fe29c6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1644,7 +1644,6 @@ int tcp_v4_rcv(struct sk_buff *skb)
 	if (!skb_csum_unnecessary(skb) && tcp_v4_checksum_init(skb))
 		goto bad_packet;
 
-	th = tcp_hdr(skb);
 	iph = ip_hdr(skb);
 	TCP_SKB_CB(skb)->seq = ntohl(th->seq);
 	TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin +
-- 
1.7.1


^ permalink raw reply related

* Re: Flow Control and Port Mirroring Revisited
From: Jesse Gross @ 2011-01-13 15:45 UTC (permalink / raw)
  To: Simon Horman
  Cc: Eric Dumazet, Rusty Russell, virtualization, dev, virtualization,
	netdev, kvm, Michael S. Tsirkin
In-Reply-To: <20110113064718.GA17905@verge.net.au>

On Thu, Jan 13, 2011 at 1:47 AM, Simon Horman <horms@verge.net.au> wrote:
> On Mon, Jan 10, 2011 at 06:31:55PM +0900, Simon Horman wrote:
>> On Fri, Jan 07, 2011 at 10:23:58AM +0900, Simon Horman wrote:
>> > On Thu, Jan 06, 2011 at 05:38:01PM -0500, Jesse Gross wrote:
>> >
>> > [ snip ]
>> > >
>> > > I know that everyone likes a nice netperf result but I agree with
>> > > Michael that this probably isn't the right question to be asking.  I
>> > > don't think that socket buffers are a real solution to the flow
>> > > control problem: they happen to provide that functionality but it's
>> > > more of a side effect than anything.  It's just that the amount of
>> > > memory consumed by packets in the queue(s) doesn't really have any
>> > > implicit meaning for flow control (think multiple physical adapters,
>> > > all with the same speed instead of a virtual device and a physical
>> > > device with wildly different speeds).  The analog in the physical
>> > > world that you're looking for would be Ethernet flow control.
>> > > Obviously, if the question is limiting CPU or memory consumption then
>> > > that's a different story.
>> >
>> > Point taken. I will see if I can control CPU (and thus memory) consumption
>> > using cgroups and/or tc.
>>
>> I have found that I can successfully control the throughput using
>> the following techniques
>>
>> 1) Place a tc egress filter on dummy0
>>
>> 2) Use ovs-ofctl to add a flow that sends skbs to dummy0 and then eth1,
>>    this is effectively the same as one of my hacks to the datapath
>>    that I mentioned in an earlier mail. The result is that eth1
>>    "paces" the connection.
>
> Further to this, I wonder if there is any interest in providing
> a method to switch the action order - using ovs-ofctl is a hack imho -
> and/or switching the default action order for mirroring.

I'm not sure that there is a way to do this that is correct in the
generic case.  It's possible that the destination could be a VM while
packets are being mirrored to a physical device or we could be
multicasting or some other arbitrarily complex scenario.  Just think
of what a physical switch would do if it has ports with two different
speeds.

^ permalink raw reply

* Re: [PATCH] net: add Faraday FTMAC100 10/100 Ethernet driver
From: Joe Perches @ 2011-01-13 15:39 UTC (permalink / raw)
  To: Po-Yu Chuang; +Cc: netdev, linux-kernel, Po-Yu Chuang
In-Reply-To: <1294919372-1904-1-git-send-email-ratbert.chuang@gmail.com>

On Thu, 2011-01-13 at 19:49 +0800, Po-Yu Chuang wrote:
> From: Po-Yu Chuang <ratbert@faraday-tech.com>
> 
> FTMAC100 Ethernet Media Access Controller supports 10/100 Mbps and
> MII.  This driver has been working on some ARM/NDS32 SoC including
> Faraday A320 and Andes AG101.

A couple of trivial comments not already mentioned by others.

> diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
[]
> @@ -2014,6 +2014,15 @@ config BCM63XX_ENET
>  	  This driver supports the ethernet MACs in the Broadcom 63xx
>  	  MIPS chipset family (BCM63XX).
>  
> +config FTMAC100
> +	tristate "Faraday FTMAC100 10/100 Ethernet support"
> +	depends on ARM
> +	select MII
> +	help
> +	  This driver supports the FTMAC100 Ethernet controller from
> +	  Faraday. It is used on Faraday A320, Andes AG101, AG101P
> +	  and some other ARM/NDS32 SoC's.
> +

ARM specific net drivers are for now in drivers/net/arm/
Perhaps it's better to locate these files there?

> diff --git a/drivers/net/ftmac100.c b/drivers/net/ftmac100.c
[]
> +static int ftmac100_rx_packet_error(struct ftmac100 *priv,
> +		struct ftmac100_rxdes *rxdes)
> +{
> +	struct device *dev = &priv->netdev->dev;
> +	int error = 0;
> +
> +	if (unlikely(ftmac100_rxdes_rx_error(rxdes))) {
> +		if (printk_ratelimit())
> +			dev_info(dev, "rx err\n");

There are many printk_ratelimit() tests.

It's better to use net_ratelimit() or a local ratelimit_state
so there's less possible suppression of other subsystem
messages.

^ permalink raw reply

* Re: [PATCH v4 08/10] ARM: mxs: add ocotp read function
From: Uwe Kleine-König @ 2011-01-13 15:19 UTC (permalink / raw)
  To: Shawn Guo
  Cc: davem, gerg, baruch, eric, bryan.wu, r64343, B32542, lw, w.sang,
	s.hauer, jamie, jamie, netdev, linux-arm-kernel
In-Reply-To: <1294297998-26930-9-git-send-email-shawn.guo@freescale.com>

On Thu, Jan 06, 2011 at 03:13:16PM +0800, Shawn Guo wrote:
> Signed-off-by: Shawn Guo <shawn.guo@freescale.com>
> ---
> Changes for v4:
>  - Call cpu_relax() during polling
> 
> Changes for v2:
>  - Add mutex locking for mxs_read_ocotp()
>  - Use type size_t for count and i
>  - Add comment for clk_enable/disable skipping
>  - Add ERROR bit clearing and polling step
> 
>  arch/arm/mach-mxs/Makefile              |    2 +-
>  arch/arm/mach-mxs/include/mach/common.h |    1 +
>  arch/arm/mach-mxs/ocotp.c               |   79 +++++++++++++++++++++++++++++++
>  3 files changed, 81 insertions(+), 1 deletions(-)
>  create mode 100644 arch/arm/mach-mxs/ocotp.c
> 
> diff --git a/arch/arm/mach-mxs/Makefile b/arch/arm/mach-mxs/Makefile
> index 39d3f9c..f23ebbd 100644
> --- a/arch/arm/mach-mxs/Makefile
> +++ b/arch/arm/mach-mxs/Makefile
> @@ -1,5 +1,5 @@
>  # Common support
> -obj-y := clock.o devices.o gpio.o icoll.o iomux.o system.o timer.o
> +obj-y := clock.o devices.o gpio.o icoll.o iomux.o ocotp.o system.o timer.o
is it worth to make ocotp optional?  (and let evk select
CONFIG_MXS_OCOTP)

Best regards
Uwe
>  
>  obj-$(CONFIG_SOC_IMX23) += clock-mx23.o mm-mx23.o
>  obj-$(CONFIG_SOC_IMX28) += clock-mx28.o mm-mx28.o
> diff --git a/arch/arm/mach-mxs/include/mach/common.h b/arch/arm/mach-mxs/include/mach/common.h
> index 59133eb..cf02552 100644
> --- a/arch/arm/mach-mxs/include/mach/common.h
> +++ b/arch/arm/mach-mxs/include/mach/common.h
> @@ -13,6 +13,7 @@
>  
>  struct clk;
>  
> +extern int mxs_read_ocotp(int offset, int count, u32 *values);
>  extern int mxs_reset_block(void __iomem *);
>  extern void mxs_timer_init(struct clk *, int);
>  
> diff --git a/arch/arm/mach-mxs/ocotp.c b/arch/arm/mach-mxs/ocotp.c
> new file mode 100644
> index 0000000..e2d39aa
> --- /dev/null
> +++ b/arch/arm/mach-mxs/ocotp.c
> @@ -0,0 +1,79 @@
> +/*
> + * Copyright 2010 Freescale Semiconductor, Inc. All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/delay.h>
> +#include <linux/err.h>
> +#include <linux/mutex.h>
> +
> +#include <mach/mxs.h>
> +
> +#define BM_OCOTP_CTRL_BUSY		(1 << 8)
> +#define BM_OCOTP_CTRL_ERROR		(1 << 9)
> +#define BM_OCOTP_CTRL_RD_BANK_OPEN	(1 << 12)
> +
> +static DEFINE_MUTEX(ocotp_mutex);
> +
> +int mxs_read_ocotp(unsigned offset, size_t count, u32 *values)
> +{
> +	void __iomem *ocotp_base = MXS_IO_ADDRESS(MXS_OCOTP_BASE_ADDR);
> +	int timeout = 0x400;
> +	size_t i;
> +
> +	mutex_lock(&ocotp_mutex);
> +
> +	/*
> +	 * clk_enable(hbus_clk) for ocotp can be skipped
> +	 * as it must be on when system is running.
> +	 */
> +
> +	/* try to clear ERROR bit */
> +	__mxs_clrl(BM_OCOTP_CTRL_ERROR, ocotp_base);
> +
> +	/* check both BUSY and ERROR cleared */
> +	while ((__raw_readl(ocotp_base) &
> +		(BM_OCOTP_CTRL_BUSY | BM_OCOTP_CTRL_ERROR)) && --timeout)
> +		cpu_relax();
> +
> +	if (unlikely(!timeout))
> +		goto error_unlock;
> +
> +	/* open OCOTP banks for read */
> +	__mxs_setl(BM_OCOTP_CTRL_RD_BANK_OPEN, ocotp_base);
> +
> +	/* approximately wait 32 hclk cycles */
> +	udelay(1);
> +
> +	/* poll BUSY bit becoming cleared */
> +	timeout = 0x400;
> +	while ((__raw_readl(ocotp_base) & BM_OCOTP_CTRL_BUSY) && --timeout)
> +		cpu_relax();
> +
> +	if (unlikely(!timeout))
> +		goto error_unlock;
> +
> +	for (i = 0; i < count; i++, offset += 4)
> +		*values++ = __raw_readl(ocotp_base + offset);
> +
> +	/* close banks for power saving */
> +	__mxs_clrl(BM_OCOTP_CTRL_RD_BANK_OPEN, ocotp_base);
> +
> +	mutex_unlock(&ocotp_mutex);
> +
> +	return 0;
> +
> +error_unlock:
> +	mutex_unlock(&ocotp_mutex);
> +	pr_err("%s: timeout in reading OCOTP\n", __func__);
> +	return -ETIMEDOUT;
> +}
> -- 
> 1.7.1
> 
> 
> 

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox