Netdev List


* Re: [PATCH net-next-2.6] etherdevice.h: Add is_unicast_ether_addr function
From: Chris Metcalf @ 2011-01-13 16:38 UTC
  To: Eric Dumazet; +Cc: Joe Perches, Tobias Klauser, David Miller, netdev
In-Reply-To: <1294908916.3570.21.camel@edumazet-laptop>

On 1/13/2011 3:55 AM, Eric Dumazet wrote:
> Le jeudi 13 janvier 2011 à 00:24 -0800, Joe Perches a écrit :
>> On Thu, 2011-01-13 at 09:14 +0100, Tobias Klauser wrote:
>>> From a check for !is_multicast_ether_addr it is not always obvious that
>>> we're checking for a unicast address. So add this helper function to
>>> make those code paths easier to read.
>>>  include/linux/etherdevice.h |   11 +++++++++++
>> []
>>>  /**
>>> + * is_unicast_ether_addr - Determine if the Ethernet address is unicast
>>> + * @addr: Pointer to a six-byte array containing the Ethernet address
>>> + *
>>> + * Return true if the address is a unicast address.
>>> + */
>>> +static inline int is_unicast_ether_addr(const u8 *addr)
>>> +{
>>> +	return !is_multicast_ether_addr(addr);
>>> +}
>> Can't you simply use is_valid_ether_addr?
>>
>> I can't think of much reason that a new function for
>> !multicast without the !is_zero is needed.
>>
> performance ?
>
> is_valid_ether_addr() is used at device init time, not when receiving
> packets.
>
> I am not sure we _need_ to check for is_zero_ether_addr() each time we
> receive a packet.
>
> Either a MAC is unicast or multicast.
>
> A zero address is not multicast for sure.

I agree - the is_zero_ether_addr() check seems pointless in the context of
the running interface.
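
For reference, the distinction discussed above comes down to a single bit. Here is a minimal userspace sketch, assuming the usual definition (the low-order bit of the first octet is the group bit). The sketch_ names are hypothetical and are not the kernel helpers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Multicast (group) addresses have the low bit of the first octet set. */
static bool sketch_is_multicast(const uint8_t *addr)
{
	return (addr[0] & 0x01) != 0;
}

/* Every address is either multicast or unicast, so unicast is the negation. */
static bool sketch_is_unicast(const uint8_t *addr)
{
	return !sketch_is_multicast(addr);
}

/* The extra test a "valid address" check would add: all six bytes zero. */
static bool sketch_is_zero(const uint8_t *addr)
{
	return !(addr[0] | addr[1] | addr[2] | addr[3] | addr[4] | addr[5]);
}
```

Under these definitions the all-zero address counts as unicast and the broadcast address ff:ff:ff:ff:ff:ff counts as multicast, so a per-packet check only needs the single-bit test.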

Also, I think a static inline is better form than a #define, all things
being equal.

So, I like Tobias' reworked patches.  I can take them into the Tilera tree,
but I'd prefer David Miller take them into the net tree if he is agreeable,
since it now includes changes to generic networking code.  If you take the
latter approach you can include my:

Acked-by: Chris Metcalf <cmetcalf@tilera.com>

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



* Re: [PATCH v4] netfilter: ipt_CLUSTERIP: remove "no conntrack!"
From: Eric Dumazet @ 2011-01-13 16:48 UTC
  To: Pablo Neira Ayuso
  Cc: Jan Engelhardt, Netfilter Development Mailinglist, netdev,
	Patrick McHardy
In-Reply-To: <4D2F29D6.3040600@netfilter.org>

Le jeudi 13 janvier 2011 à 17:35 +0100, Pablo Neira Ayuso a écrit :
> On 13/01/11 17:30, Pablo Neira Ayuso wrote:
> > On 13/01/11 15:39, Eric Dumazet wrote:
> >> Then, cluster match can be improved, I am sure you already have a patch
> >> for it.
> > 
> > what scenario could benefit from the destination-based hashing?
> 
> I'm saying this because it doesn't make much sense to me.

Me too ;)

But hash(source_IP, source_PORT) definitely makes sense in some
workloads.
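
A rough sketch of what hashing on (source IP, source port) could look like when picking one of N cluster nodes. The mixing function is a placeholder (FNV-1a over the six key bytes), and pick_node and its parameters are hypothetical; this is not the hash the cluster match actually uses:

```c
#include <stdint.h>

/* FNV-1a over a byte buffer; stands in for whatever hash the kernel uses. */
static uint32_t fnv1a(const uint8_t *p, unsigned int len)
{
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/*
 * Map a flow's (source IP, source port) to a node index in
 * [0, total_nodes). total_nodes must be non-zero.
 */
static unsigned int pick_node(uint32_t saddr, uint16_t sport,
			      unsigned int total_nodes)
{
	uint8_t key[6] = {
		(uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
		(uint8_t)(saddr >> 8), (uint8_t)saddr,
		(uint8_t)(sport >> 8), (uint8_t)sport,
	};

	return fnv1a(key, sizeof(key)) % total_nodes;
}
```

Because the port is part of the key, several connections from one client can land on different nodes, which is the kind of workload where this may help over hashing on the source IP alone.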





* Re: [PATCH V9 08/13] posix clocks: cleanup the CLOCK_DISPTACH macro
From: Thomas Gleixner @ 2011-01-13 17:03 UTC
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	John Stultz, Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti
In-Reply-To: <90b2beef441615d01c93fcad029c44af4e505c5f.1294917348.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>

On Thu, 13 Jan 2011, Richard Cochran wrote:
>  int posix_cpu_clock_getres(const clockid_t which_clock, struct timespec *ts);
>  int posix_cpu_clock_get(const clockid_t which_clock, struct timespec *ts);
> -int posix_cpu_clock_set(const clockid_t which_clock, const struct timespec *ts);
> +int posix_cpu_clock_set(const clockid_t which_clock, struct timespec *ts);

Shouldn't we change the clock_set function to have *ts const in all places ?

>  static int posix_ktime_get_ts(clockid_t which_clock, struct timespec *tp)
> @@ -279,12 +230,29 @@ static __init int init_posix_timers(void)
>  {
>  	struct k_clock clock_realtime = {
>  		.clock_getres = hrtimer_get_res,
> +		/* defaults: */
> +		.clock_adj	= common_clock_adj,
> +		.clock_get	= common_clock_get,
> +		.clock_set	= common_clock_set,
> +		.nsleep		= common_nsleep,
> +		.nsleep_restart	= common_nsleep_restart,
> +		.timer_create	= common_timer_create,
> +		.timer_del	= common_timer_del,
> +		.timer_get	= common_timer_get,
> +		.timer_set	= common_timer_set,
>  	};
>  	struct k_clock clock_monotonic = {
>  		.clock_getres = hrtimer_get_res,
>  		.clock_get = posix_ktime_get_ts,
>  		.clock_set = do_posix_clock_nosettime,
>  		.clock_adj = do_posix_clock_noadjtime,
> +		/* defaults: */
> +		.nsleep		= common_nsleep,
> +		.nsleep_restart	= common_nsleep_restart,
> +		.timer_create	= common_timer_create,
> +		.timer_del	= common_timer_del,
> +		.timer_get	= common_timer_get,
> +		.timer_set	= common_timer_set,
>  	};
>  	struct k_clock clock_monotonic_raw = {
>  		.clock_getres = hrtimer_get_res,
> @@ -293,6 +261,11 @@ static __init int init_posix_timers(void)
>  		.clock_adj = do_posix_clock_noadjtime,
>  		.timer_create = no_timer_create,
>  		.nsleep = no_nsleep,
> +		/* defaults: */
> +		.nsleep_restart	= common_nsleep_restart,
> +		.timer_del	= common_timer_del,
> +		.timer_get	= common_timer_get,
> +		.timer_set	= common_timer_set,

Hmm, we do not need to set functional entries for clocks which neither
implement timer_create nor nsleep.

Otherwise I really like the outcome. :)

Thanks for your patience !

       tglx


* [RFC PATCH] ipsec: fix IPv4 AH alignment on 32 bits
From: Nicolas Dichtel @ 2011-01-13 17:20 UTC
  To: netdev; +Cc: Christophe Gouault

[-- Attachment #1: Type: text/plain, Size: 298 bytes --]

Hi,

Here is a patch to fix the alignment of IPv4 AH. Note that this breaks
compatibility with old kernels for some algorithms (like SHA256), but without
it upstream cannot use SHA256 on IPv4 with, for example, a peer that is RFC
compliant.

I don't know what the best way to fix this is.


Regards,
Nicolas

[-- Attachment #2: 0001-ipsec-fix-IPv4-AH-alignment-on-32-bits.patch --]
[-- Type: text/x-patch, Size: 2682 bytes --]

From 14bbe173eed25cf59e3e54222eb7de1a5578e54e Mon Sep 17 00:00:00 2001
From: Dang Hongwu <hongwu.dang@6wind.com>
Date: Wed, 22 Dec 2010 11:38:47 -0500
Subject: [PATCH] ipsec: fix IPv4 AH alignment on 32 bits

The Linux IPv4 AH stack aligns the AH header on a 64 bit boundary
(like in IPv6). This is not RFC compliant (see RFC4302, Section
3.3.3.2.1), it should be aligned on 32 bits.

For most of the authentication algorithms, the ICV size is 96 bits.
The AH header alignment on 32 or 64 bits gives the same results.

However for SHA-256-128 for instance, the wrong 64 bit alignment results
in adding useless padding in IPv4 AH, which is forbidden by the RFC.
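
The arithmetic behind the two cases above can be checked with a small userspace sketch. It assumes the 12-byte fixed AH header (next header, payload length, reserved, SPI, sequence number). ALIGN4 and ALIGN8 mirror the XFRM_ALIGN4 and XFRM_ALIGN8 macros in the diff; ah_len and ah_hdrlen_field are hypothetical helper names:

```c
#include <stdint.h>

#define ALIGN4(len)	(((len) + 3) & ~3u)
#define ALIGN8(len)	(((len) + 7) & ~7u)

/* Fixed part of the AH header: next header, length, reserved, SPI, sequence. */
#define AH_FIXED_LEN	12u

/* Total AH length in bytes for a given truncated ICV size and alignment. */
static unsigned int ah_len(unsigned int icv_bytes, int align64)
{
	unsigned int raw = AH_FIXED_LEN + icv_bytes;

	return align64 ? ALIGN8(raw) : ALIGN4(raw);
}

/* Payload Length field: header size in 32-bit words, minus 2. */
static uint8_t ah_hdrlen_field(unsigned int total_len)
{
	return (uint8_t)((total_len >> 2) - 2);
}
```

With a 96-bit (12-byte) ICV both alignments give 24 bytes. With a 128-bit (16-byte) ICV, 32-bit alignment gives 28 bytes (length field 5) while 64-bit alignment gives 32 bytes (length field 6), so the two ends disagree on the header size.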

Signed-off-by: Dang Hongwu <hongwu.dang@6wind.com>
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/xfrm.h |    1 +
 net/ipv4/ah4.c     |    8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index bcfb6b2..525d882 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -36,6 +36,7 @@
 #define XFRM_PROTO_ROUTING	IPPROTO_ROUTING
 #define XFRM_PROTO_DSTOPTS	IPPROTO_DSTOPTS
 
+#define XFRM_ALIGN4(len)	(((len) + 3) & ~3)
 #define XFRM_ALIGN8(len)	(((len) + 7) & ~7)
 #define MODULE_ALIAS_XFRM_MODE(family, encap) \
 	MODULE_ALIAS("xfrm-mode-" __stringify(family) "-" __stringify(encap))
diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 86961be..95561d6 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -201,7 +201,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	top_iph->ttl = 0;
 	top_iph->check = 0;
 
-	ah->hdrlen  = (XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
+	ah->hdrlen  = (XFRM_ALIGN4(sizeof(*ah) + ahp->icv_trunc_len) >> 2) - 2;
 
 	ah->reserved = 0;
 	ah->spi = x->id.spi;
@@ -299,8 +299,8 @@ static int ah_input(struct xfrm_state *x, struct sk_buff *skb)
 	nexthdr = ah->nexthdr;
 	ah_hlen = (ah->hdrlen + 2) << 2;
 
-	if (ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_full_len) &&
-	    ah_hlen != XFRM_ALIGN8(sizeof(*ah) + ahp->icv_trunc_len))
+	if (ah_hlen != XFRM_ALIGN4(sizeof(*ah) + ahp->icv_full_len) &&
+	    ah_hlen != XFRM_ALIGN4(sizeof(*ah) + ahp->icv_trunc_len))
 		goto out;
 
 	if (!pskb_may_pull(skb, ah_hlen))
@@ -450,7 +450,7 @@ static int ah_init_state(struct xfrm_state *x)
 
 	BUG_ON(ahp->icv_trunc_len > MAX_AH_AUTH_LEN);
 
-	x->props.header_len = XFRM_ALIGN8(sizeof(struct ip_auth_hdr) +
+	x->props.header_len = XFRM_ALIGN4(sizeof(struct ip_auth_hdr) +
 					  ahp->icv_trunc_len);
 	if (x->props.mode == XFRM_MODE_TUNNEL)
 		x->props.header_len += sizeof(struct iphdr);
-- 
1.5.6.5



* [PATCH] CHOKe flow scheduler (0.7)
From: Stephen Hemminger @ 2011-01-13 17:27 UTC
  To: David Miller, Eric Dumazet; +Cc: netdev

This implements the CHOKe packet scheduler. It is built on the existing
Linux RED scheduler and follows the algorithm described in the paper below.

The core idea is:
  For every packet arrival:
	Calculate Qave
	if (Qave < minth)
	     Queue the new packet
	else
	     Select a packet at random from the queue
	     if (both packets are from the same flow)
	          Drop both packets
	     else if (Qave > maxth)
	          Drop the new packet
	     else
	          Admit the new packet with probability p (same as RED)
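
The steps above can be sketched as a pure decision function. This is a simplified illustration only: the enum and function names are hypothetical, the random sampling and the RED probability test are passed in as booleans, and the thresholds follow the pseudocode rather than the exact comparisons in the patch:

```c
#include <stdbool.h>

enum choke_action {
	CHOKE_ENQUEUE,		/* queue the new packet */
	CHOKE_DROP_BOTH,	/* drop the new packet and the sampled one */
	CHOKE_DROP_NEW,		/* drop only the new packet */
};

/*
 * qavg, minth, maxth: average queue size and thresholds, in the same units.
 * same_flow: the randomly sampled packet belongs to the new packet's flow.
 * red_drop: the RED probability test decided against this packet.
 */
static enum choke_action choke_decide(unsigned int qavg, unsigned int minth,
				      unsigned int maxth, bool same_flow,
				      bool red_drop)
{
	if (qavg < minth)
		return CHOKE_ENQUEUE;
	if (same_flow)
		return CHOKE_DROP_BOTH;
	if (qavg > maxth)
		return CHOKE_DROP_NEW;
	return red_drop ? CHOKE_DROP_NEW : CHOKE_ENQUEUE;
}
```

Keeping the decision free of side effects makes the ordering of the checks easy to see: the flow comparison happens before the maxth test, so a matching flow is penalized even when the queue is above the upper threshold.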

See also:
  Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
   queue management scheme for approximating fair bandwidth allocation",
  Proceedings of IEEE INFOCOM 2000, March 2000.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

---
Patch versions
0.3 (Eric) uses table for queue.
0.4 allows classification with TC filters
    fixes crash when peek_random() finds a hole
0.5 (Eric) fixes qlen with holes and peek
0.7 change to use separate params / stats than RED
    account for drops in backlog

Almost ready; still need to make sure the netlink API is right.
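
To make the "table for queue" and "holes" notes above concrete, here is a small userspace sketch of a power-of-two ring indexed with a mask. The ring_ names and the fixed size are assumptions for illustration and are not code from the patch:

```c
#include <stddef.h>

#define RING_SIZE	8u		/* must be a power of two */
#define RING_MASK	(RING_SIZE - 1)

struct ring {
	void *tab[RING_SIZE];
	unsigned int head;	/* index of the oldest slot */
	unsigned int tail;	/* index of the next free slot */
};

/* Slots between head and tail, holes included; the mask handles wraparound. */
static unsigned int ring_len(const struct ring *r)
{
	return (r->tail - r->head) & RING_MASK;
}

/* Append an item; returns 0 on success, -1 if the ring is full. */
static int ring_push(struct ring *r, void *item)
{
	if (ring_len(r) == RING_MASK)	/* one slot always stays unused */
		return -1;
	r->tab[r->tail] = item;
	r->tail = (r->tail + 1) & RING_MASK;
	return 0;
}

/* Drop the item at idx by leaving a NULL hole, then trim holes at both ends. */
static void ring_drop(struct ring *r, unsigned int idx)
{
	r->tab[idx] = NULL;
	while (r->head != r->tail && r->tab[r->head] == NULL)
		r->head = (r->head + 1) & RING_MASK;
	while (r->head != r->tail &&
	       r->tab[(r->tail - 1) & RING_MASK] == NULL)
		r->tail = (r->tail - 1) & RING_MASK;
}
```

Note that the slot before the tail is computed as (tail - 1) & RING_MASK; without the mask the index would underflow when tail is 0.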


 net/sched/Kconfig     |   11 +
 net/sched/Makefile    |    1 
 net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 548 insertions(+)

--- a/net/sched/Kconfig	2011-01-12 17:44:05.747500044 -0800
+++ b/net/sched/Kconfig	2011-01-12 17:44:53.167735188 -0800
@@ -205,6 +205,17 @@ config NET_SCH_DRR
 
 	  If unsure, say N.
 
+config NET_SCH_CHOKE
+	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
+	help
+	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
+	  and Keep for responsive flows, CHOose and Kill for unresponsive
+	  flows). This is a variation of RED which tries to penalize flows
+	  that monopolize the queue.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_choke.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	depends on NET_CLS_ACT
--- a/net/sched/Makefile	2011-01-12 17:44:05.767500135 -0800
+++ b/net/sched/Makefile	2011-01-12 17:44:53.167735188 -0800
@@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
 obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
+obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ b/net/sched/sch_choke.c	2011-01-12 17:45:07.227806180 -0800
@@ -0,0 +1,556 @@
+/*
+ * net/sched/sch_choke.c	CHOKE scheduler
+ *
+ * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
+ * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+#include <linux/reciprocal_div.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+#include <net/red.h>
+
+/*	CHOKe stateless AQM for fair bandwidth allocation
+        =================================================
+
+   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
+   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
+   maintains no flow state. The difference from RED is an additional step
+   during the enqueuing process. If average queue size is over the
+   low threshold (qmin), a packet is chosen at random from the queue.
+   If both the new and chosen packet are from the same flow, both
+   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
+   needs to access packets in queue randomly. It has a minimal class
+   interface to allow overriding the builtin flow classifier with
+   filters.
+
+   Source:
+   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
+   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
+   IEEE INFOCOM, 2000.
+
+   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
+   Characteristics", IEEE/ACM Transactions on Networking, 2004
+
+ */
+
+/* Upper bound on size of sk_buff table */
+#define CHOKE_MAX_QUEUE	(128*1024 - 1)
+
+struct choke_sched_data {
+/* Parameters */
+	u32		 limit;
+	unsigned char	 flags;
+
+	struct red_parms parms;
+
+/* Variables */
+	struct tcf_proto *filter_list;
+	struct {
+		u32	prob_drop;	/* Early probability drops */
+		u32	prob_mark;	/* Early probability marks */
+		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
+		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
+		u32	pdrop;          /* Drops due to queue limits */
+		u32	other;          /* Drops due to drop() calls */
+		u32	matched;	/* Drops to flow match */
+	} stats;
+
+	unsigned int	 head;
+	unsigned int	 tail;
+	unsigned int	 holes;
+	unsigned int	 tab_mask; /* size - 1 */
+
+	struct sk_buff **tab;
+};
+
+static inline unsigned int choke_len(const struct choke_sched_data *q)
+{
+	return (q->tail - q->head) & q->tab_mask;
+}
+
+/* deliver a random number between 0 and N - 1 */
+static inline u32 random_N(unsigned int N)
+{
+	return reciprocal_divide(random32(), N);
+}
+
+/* Select a packet at random from the queue in O(1) and handle holes */
+static struct sk_buff *choke_peek_random(struct choke_sched_data *q,
+					 unsigned int *pidx)
+{
+	struct sk_buff *skb;
+	int retrys = 3;
+
+	do {
+		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
+		skb = q->tab[*pidx];
+		if (skb)
+			return skb;
+	} while (--retrys > 0);
+
+	/* queue has lots of holes; use the head, which is known to exist */
+	return q->tab[*pidx = q->head];
+}
+
+/* Is ECN parameter configured */
+static inline int use_ecn(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_ECN;
+}
+
+/* Should packets over max just be dropped (versus marked) */
+static inline int use_harddrop(const struct choke_sched_data *q)
+{
+	return q->flags & TC_RED_HARDDROP;
+}
+
+/* Move head pointer forward to skip over holes */
+static void choke_zap_head_holes(struct choke_sched_data *q)
+{
+	while (q->holes && q->tab[q->head] == NULL) {
+		q->head = (q->head + 1) & q->tab_mask;
+		q->holes--;
+	}
+}
+
+/* Move tail pointer backwards to reuse holes */
+static void choke_zap_tail_holes(struct choke_sched_data *q)
+{
+	while (q->holes && q->tab[(q->tail - 1) & q->tab_mask] == NULL) {
+		q->tail = (q->tail - 1) & q->tab_mask;
+		q->holes--;
+	}
+}
+
+/* Drop packet from queue array by creating a "hole" */
+static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
+{
+	q->tab[idx] = NULL;
+	q->holes++;
+
+	if (idx == q->head)
+		choke_zap_head_holes(q);
+	if (idx == q->tail)
+		choke_zap_tail_holes(q);
+}
+
+/* Classify flow using either:
+   1. pre-existing classification result in skb
+   2. fast internal classification
+   3. use TC filter based classification
+*/
+static inline unsigned int choke_classify(struct sk_buff *skb,
+					  struct Qdisc *sch, int *qerr)
+
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tcf_result res;
+	int result;
+
+	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
+
+	if (TC_H_MAJ(skb->priority) == sch->handle &&
+	    TC_H_MIN(skb->priority) > 0)
+		return TC_H_MIN(skb->priority);
+
+	if (!q->filter_list)
+		return skb_get_rxhash(skb);
+
+	result = tc_classify(skb, q->filter_list, &res);
+	if (result >= 0) {
+#ifdef CONFIG_NET_CLS_ACT
+		switch (result) {
+		case TC_ACT_STOLEN:
+		case TC_ACT_QUEUED:
+			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
+		case TC_ACT_SHOT:
+			return 0;
+		}
+#endif
+		return TC_H_MIN(res.classid);
+	}
+
+	return 0;
+}
+
+static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct red_parms *p = &q->parms;
+	unsigned int hash;
+	int uninitialized_var(ret);
+
+	hash = choke_classify(skb, sch, &ret);
+	if (!hash) {
+		/* Packet was eaten by filter */
+		if (ret & __NET_XMIT_BYPASS)
+			sch->qstats.drops++;
+		kfree_skb(skb);
+		return ret;
+	}
+
+	/* Maybe add hash as field in struct qdisc_skb_cb? */
+	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
+
+	/* Compute average queue usage (see RED) */
+	p->qavg = red_calc_qavg(p, choke_len(q) - q->holes);
+	if (red_is_idling(p))
+		red_end_of_idle_period(p);
+
+	/* Is queue small? */
+	if (p->qavg <= p->qth_min)
+		p->qcount = -1;
+	else {
+		struct sk_buff *oskb;
+		unsigned int idx;
+
+		/* Draw a packet at random from queue */
+		oskb = choke_peek_random(q, &idx);
+
+		/* Both packets from same flow ? */
+		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
+			/* Drop both packets */
+			q->stats.matched++;
+			choke_drop_by_idx(q, idx);
+			qdisc_drop(oskb, sch);
+			goto congestion_drop;
+		}
+
+		/* Queue is large, always mark/drop */
+		if (p->qavg > p->qth_max) {
+			p->qcount = -1;
+
+			sch->qstats.overlimits++;
+			if (use_harddrop(q) || !use_ecn(q) ||
+			    !INET_ECN_set_ce(skb)) {
+				q->stats.forced_drop++;
+				goto congestion_drop;
+			}
+
+			q->stats.forced_mark++;
+		} else if (++p->qcount) {
+			if (red_mark_probability(p, p->qavg)) {
+				p->qcount = 0;
+				p->qR = red_random(p);
+
+				sch->qstats.overlimits++;
+				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
+					q->stats.prob_drop++;
+					goto congestion_drop;
+				}
+
+				q->stats.prob_mark++;
+			}
+		} else
+			p->qR = red_random(p);
+	}
+
+	/* Admit new packet */
+	if (likely(choke_len(q) < q->limit)) {
+
+		q->tab[q->tail] = skb;
+		q->tail = (q->tail + 1) & q->tab_mask;
+
+		sch->qstats.backlog += qdisc_pkt_len(skb);
+		qdisc_update_bstats(sch, skb);
+		sch->q.qlen = choke_len(q) - q->holes;
+		return NET_XMIT_SUCCESS;
+	}
+
+	q->stats.pdrop++;
+	sch->qstats.drops++;
+	kfree_skb(skb);
+	return NET_XMIT_DROP;
+
+ congestion_drop:
+	qdisc_drop(skb, sch);
+	return NET_XMIT_CN;
+}
+
+static struct sk_buff *choke_dequeue(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct sk_buff *skb;
+
+	if (q->head == q->tail) {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+		return NULL;
+	}
+	skb = q->tab[q->head];
+	q->tab[q->head] = NULL; /* not really needed */
+	q->head = (q->head + 1) & q->tab_mask;
+	choke_zap_head_holes(q);
+	sch->qstats.backlog -= qdisc_pkt_len(skb);
+	sch->q.qlen = choke_len(q) - q->holes;
+
+	return skb;
+}
+
+static unsigned int choke_drop(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	unsigned int len;
+
+	len = qdisc_queue_drop(sch);
+	if (len > 0)
+		q->stats.other++;
+	else {
+		if (!red_is_idling(&q->parms))
+			red_start_of_idle_period(&q->parms);
+	}
+
+	return len;
+}
+
+static void choke_reset(struct Qdisc* sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	red_restart(&q->parms);
+}
+
+static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
+	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
+	[TCA_CHOKE_STAB]	= { .len = 256 },
+};
+
+
+static void choke_free(void *addr)
+{
+	if (addr) {
+		if (is_vmalloc_addr(addr))
+			vfree(addr);
+		else
+			kfree(addr);
+	}
+}
+
+static int choke_change(struct Qdisc *sch, struct nlattr *opt)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_CHOKE_MAX + 1];
+	struct tc_red_qopt *ctl;
+	int err;
+	struct sk_buff **old = NULL;
+	unsigned int mask;
+
+	if (opt == NULL)
+		return -EINVAL;
+
+	err = nla_parse_nested(tb, TCA_CHOKE_MAX, opt, choke_policy);
+	if (err < 0)
+		return err;
+
+	if (tb[TCA_CHOKE_PARMS] == NULL ||
+	    tb[TCA_CHOKE_STAB] == NULL)
+		return -EINVAL;
+
+	ctl = nla_data(tb[TCA_CHOKE_PARMS]);
+
+	if (ctl->limit > CHOKE_MAX_QUEUE)
+		return -EINVAL;
+
+	mask = roundup_pow_of_two(ctl->limit + 1) - 1;
+	if (mask != q->tab_mask) {
+		struct sk_buff **ntab;
+
+		ntab = kcalloc(mask + 1, sizeof(struct sk_buff *), GFP_KERNEL);
+		if (!ntab)
+			ntab = vzalloc((mask + 1) * sizeof(struct sk_buff *));
+		if (!ntab)
+			return -ENOMEM;
+
+		sch_tree_lock(sch);
+		old = q->tab;
+		if (old) {
+			unsigned int tail = 0;
+
+			while (q->head != q->tail) {
+				ntab[tail++] = q->tab[q->head];
+				q->head = (q->head + 1) & q->tab_mask;
+			}
+			q->head = 0;
+			q->tail = tail;
+		}
+
+		q->tab_mask = mask;
+		q->tab = ntab;
+		q->holes = 0;
+	} else
+		sch_tree_lock(sch);
+
+	q->flags = ctl->flags;
+	q->limit = ctl->limit;
+
+	red_set_parms(&q->parms, ctl->qth_min, ctl->qth_max, ctl->Wlog,
+		      ctl->Plog, ctl->Scell_log,
+		      nla_data(tb[TCA_CHOKE_STAB]));
+
+	if (q->head == q->tail)
+		red_end_of_idle_period(&q->parms);
+
+	sch_tree_unlock(sch);
+	choke_free(old);
+	return 0;
+}
+
+static int choke_init(struct Qdisc* sch, struct nlattr *opt)
+{
+	return choke_change(sch, opt);
+}
+
+static int choke_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct nlattr *opts = NULL;
+	struct tc_red_qopt opt = {
+		.limit		= q->limit,
+		.flags		= q->flags,
+		.qth_min	= q->parms.qth_min >> q->parms.Wlog,
+		.qth_max	= q->parms.qth_max >> q->parms.Wlog,
+		.Wlog		= q->parms.Wlog,
+		.Plog		= q->parms.Plog,
+		.Scell_log	= q->parms.Scell_log,
+	};
+
+	opts = nla_nest_start(skb, TCA_OPTIONS);
+	if (opts == NULL)
+		goto nla_put_failure;
+
+	NLA_PUT(skb, TCA_CHOKE_PARMS, sizeof(opt), &opt);
+	return nla_nest_end(skb, opts);
+
+nla_put_failure:
+	nla_nest_cancel(skb, opts);
+	return -EMSGSIZE;
+}
+
+static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+	struct tc_choke_xstats st = {
+		.early	= q->stats.prob_drop + q->stats.forced_drop,
+		.marked	= q->stats.prob_mark + q->stats.forced_mark,
+		.pdrop	= q->stats.pdrop,
+		.other	= q->stats.other,
+		.matched = q->stats.matched,
+	};
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+
+static void choke_destroy(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	tcf_destroy_chain(&q->filter_list);
+	choke_free(q->tab);
+}
+
+static struct Qdisc *choke_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	return NULL;
+}
+
+static unsigned long choke_get(struct Qdisc *sch, u32 classid)
+{
+	return 0;
+}
+
+static void choke_put(struct Qdisc *q, unsigned long cl)
+{
+}
+
+static unsigned long choke_bind(struct Qdisc *sch, unsigned long parent,
+				u32 classid)
+{
+	return 0;
+}
+
+static struct tcf_proto **choke_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	if (cl)
+		return NULL;
+	return &q->filter_list;
+}
+
+static int choke_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+	tcm->tcm_handle |= TC_H_MIN(cl);
+	return 0;
+}
+
+static void choke_walk(struct Qdisc *sch, struct qdisc_walker *arg)
+{
+	if (!arg->stop) {
+		if (arg->fn(sch, 1, arg) < 0) {
+			arg->stop = 1;
+			return;
+		}
+		arg->count++;
+	}
+}
+
+static const struct Qdisc_class_ops choke_class_ops = {
+	.leaf		=	choke_leaf,
+	.get		=	choke_get,
+	.put		=	choke_put,
+	.tcf_chain	=	choke_find_tcf,
+	.bind_tcf	=	choke_bind,
+	.unbind_tcf	=	choke_put,
+	.dump		=	choke_dump_class,
+	.walk		=	choke_walk,
+};
+
+static struct sk_buff *choke_peek_head(struct Qdisc *sch)
+{
+	struct choke_sched_data *q = qdisc_priv(sch);
+
+	return (q->head != q->tail) ? q->tab[q->head] : NULL;
+}
+
+static struct Qdisc_ops choke_qdisc_ops __read_mostly = {
+	.id		=	"choke",
+	.priv_size	=	sizeof(struct choke_sched_data),
+
+	.enqueue	=	choke_enqueue,
+	.dequeue	=	choke_dequeue,
+	.peek		=	choke_peek_head,
+	.drop		=	choke_drop,
+	.init		=	choke_init,
+	.destroy	=	choke_destroy,
+	.reset		=	choke_reset,
+	.change		=	choke_change,
+	.dump		=	choke_dump,
+	.dump_stats	=	choke_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init choke_module_init(void)
+{
+	return register_qdisc(&choke_qdisc_ops);
+}
+
+static void __exit choke_module_exit(void)
+{
+	unregister_qdisc(&choke_qdisc_ops);
+}
+
+module_init(choke_module_init)
+module_exit(choke_module_exit)
+
+MODULE_LICENSE("GPL");
--- a/include/linux/pkt_sched.h	2011-01-12 17:44:05.823500415 -0800
+++ b/include/linux/pkt_sched.h	2011-01-12 17:44:53.175735219 -0800
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8


* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: William Allen Simpson @ 2011-01-13 17:32 UTC
  To: Stephen Hemminger
  Cc: Linux Kernel Developers, Linux Kernel Network Developers,
	David Miller, Andrew Morton
In-Reply-To: <20110112105608.793787b2@s6510>

On 1/12/11 1:56 PM, Stephen Hemminger wrote:
> On Wed, 12 Jan 2011 12:59:38 -0500
> William Allen Simpson<william.allen.simpson@gmail.com>  wrote:
>
>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>> index e64f4c6..c8f4017 100644
>> --- a/include/linux/tcp.h
>> +++ b/include/linux/tcp.h
>> @@ -185,22 +185,37 @@ struct tcp_md5sig {
>>   #define TCP_COOKIE_PAIR_SIZE	(2*TCP_COOKIE_MAX)
>>
>>   /* Flags for both getsockopt and setsockopt */
>> -#define TCP_COOKIE_IN_ALWAYS	(1<<  0)	/* Discard SYN without cookie */
>> -#define TCP_COOKIE_OUT_NEVER	(1<<  1)	/* Prohibit outgoing cookies,
>> +#define TCPCT_IN_ALWAYS		(1<<  0)	/* Discard SYN without cookie */
>> +#define TCPCT_OUT_NEVER		(1<<  1)	/* Prohibit outgoing cookies,
>
> You end up changing values in kernel userspace API in a way
> that is incompatible with older applications. This is not acceptable.
>
While I agree in principle and argued strongly against it, other
members of the research group (particularly the original project
sponsor) have over-ridden my concerns.  I'm sorry to inform you that
many/most participants don't care much about Linux.

Note that the *bits* are the same, and previously compiled programs
(that don't access more advanced features) should continue to run as
they have in the past.

Even though I'm not paid to work on Linux, I'm doing my best to give you
folks a quick heads up and provide code to rectify the very recent changes
that can be propagated back through the stable tree (to 2.6.33).

As always, what you actually do with my code is up to you....


* [PATCH] CHOKe flow scheduler (iproute)
From: Stephen Hemminger @ 2011-01-13 17:48 UTC
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <20110113092706.154748c2@s6510>

Preliminary interface for CHOKe scheduler in iproute

---
 include/linux/pkt_sched.h |   29 ++++++
 tc/Makefile               |    1 +
 tc/q_choke.c              |  221 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 251 insertions(+), 0 deletions(-)
 create mode 100644 tc/q_choke.c

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 2cfa4bc..83bac92 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -247,6 +247,35 @@ struct tc_gred_sopt {
 	__u16		pad1;
 };
 
+/* CHOKe section */
+
+enum {
+	TCA_CHOKE_UNSPEC,
+	TCA_CHOKE_PARMS,
+	TCA_CHOKE_STAB,
+	__TCA_CHOKE_MAX,
+};
+
+#define TCA_CHOKE_MAX (__TCA_CHOKE_MAX - 1)
+
+struct tc_choke_qopt {
+	__u32		limit;		/* HARD maximal queue length (packets)	*/
+	__u32		qth_min;	/* Min average length threshold (packets) */
+	__u32		qth_max;	/* Max average length threshold (packets) */
+	unsigned char   Wlog;		/* log(W)		*/
+	unsigned char   Plog;		/* log(P_max/(qth_max-qth_min))	*/
+	unsigned char   Scell_log;	/* cell size for idle damping */
+	unsigned char	flags;		/* see RED flags */
+};
+
+struct tc_choke_xstats {
+	__u32           early;          /* Early drops */
+	__u32           pdrop;          /* Drops due to queue limits */
+	__u32           other;          /* Drops due to drop() calls */
+	__u32           marked;         /* Marked packets */
+	__u32		matched;	/* Drops due to flow match */
+};
+
 /* HTB section */
 #define TC_HTB_NUMPRIO		8
 #define TC_HTB_MAXDEPTH		8
diff --git a/tc/Makefile b/tc/Makefile
index 101cc83..2cbd5d5 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -15,6 +15,7 @@ TCMODULES += q_cbq.o
 TCMODULES += q_rr.o
 TCMODULES += q_multiq.o
 TCMODULES += q_netem.o
+TCMODULES += q_choke.o
 TCMODULES += f_rsvp.o
 TCMODULES += f_u32.o
 TCMODULES += f_route.o
diff --git a/tc/q_choke.c b/tc/q_choke.c
new file mode 100644
index 0000000..044ae9a
--- /dev/null
+++ b/tc/q_choke.c
@@ -0,0 +1,221 @@
+/*
+ * q_choke.c		CHOKE.
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Stephen Hemminger <shemminger@vyatta.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+#include "tc_red.h"
+
+static void explain(void)
+{
+	fprintf(stderr, "Usage: ... choke limit PACKETS bandwidth KBPS [ecn]\n");
+	fprintf(stderr, "                 [ min PACKETS ] [ max PACKETS ] [ burst PACKETS ]\n");
+}
+
+static int choke_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+			   struct nlmsghdr *n)
+{
+	struct tc_red_qopt opt;
+	unsigned burst = 0;
+	unsigned avpkt = 1000;
+	double probability = 0.02;
+	unsigned rate = 0;
+	int ecn_ok = 0;
+	int wlog;
+	__u8 sbuf[256];
+	struct rtattr *tail;
+
+	memset(&opt, 0, sizeof(opt));
+
+	while (argc > 0) {
+		if (strcmp(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.limit, *argv, 0)) {
+				fprintf(stderr, "Illegal \"limit\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "bandwidth") == 0) {
+			NEXT_ARG();
+			if (get_rate(&rate, *argv)) {
+				fprintf(stderr, "Illegal \"bandwidth\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "ecn") == 0) {
+			ecn_ok = 1;
+		} else if (strcmp(*argv, "min") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.qth_min, *argv, 0)) {
+				fprintf(stderr, "Illegal \"min\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "max") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&opt.qth_max, *argv, 0)) {
+				fprintf(stderr, "Illegal \"max\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "burst") == 0) {
+			NEXT_ARG();
+			if (get_unsigned(&burst, *argv, 0)) {
+				fprintf(stderr, "Illegal \"burst\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "avpkt") == 0) {
+			NEXT_ARG();
+			if (get_size(&avpkt, *argv)) {
+				fprintf(stderr, "Illegal \"avpkt\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "probability") == 0) {
+			NEXT_ARG();
+			if (sscanf(*argv, "%lg", &probability) != 1) {
+				fprintf(stderr, "Illegal \"probability\"\n");
+				return -1;
+			}
+		} else if (strcmp(*argv, "help") == 0) {
+			explain();
+			return -1;
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+	if (!rate || !opt.limit) {
+		fprintf(stderr, "Required parameter (bandwidth, limit) is missing\n");
+		return -1;
+	}
+
+	/* Compute default min/max thresholds based on 
+	   Sally Floyd's recommendations:
+	   http://www.icir.org/floyd/REDparameters.txt
+	*/
+	if (!opt.qth_max) 
+		opt.qth_max = opt.limit / 4;
+	if (!opt.qth_min)
+		opt.qth_min = opt.qth_max / 3;
+	if (!burst)
+		burst = (2 * opt.qth_min + opt.qth_max) / 3;
+
+	if (opt.qth_max > opt.limit) {
+		fprintf(stderr, "\"max\" is larger than \"limit\"\n");
+		return -1;
+	}
+
+	if (opt.qth_min >= opt.qth_max) {
+		fprintf(stderr, "\"min\" is not smaller than \"max\"\n");
+		return -1;
+	}
+
+	printf("min=%u max=%u burst=%u limit=%u\n",
+	       opt.qth_min, opt.qth_max, burst, opt.limit);
+	wlog = tc_red_eval_ewma(opt.qth_min, burst, 1);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate EWMA constant.\n");
+		return -1;
+	}
+	if (wlog >= 10)
+		fprintf(stderr, "CHOKE: WARNING. Burst %u seems to be too large.\n", burst);
+	opt.Wlog = wlog;
+
+	wlog = tc_red_eval_P(opt.qth_min, opt.qth_max, probability);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate probability.\n");
+		return -1;
+	}
+	opt.Plog = wlog;
+
+	wlog = tc_red_eval_idle_damping(opt.Wlog, avpkt, rate, sbuf);
+	if (wlog < 0) {
+		fprintf(stderr, "CHOKE: failed to calculate idle damping table.\n");
+		return -1;
+	}
+	opt.Scell_log = wlog;
+	if (ecn_ok)
+		opt.flags |= TC_RED_ECN;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	addattr_l(n, 1024, TCA_CHOKE_PARMS, &opt, sizeof(opt));
+	addattr_l(n, 1024, TCA_CHOKE_STAB, sbuf, 256);
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int choke_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+	struct rtattr *tb[TCA_CHOKE_STAB+1];
+	struct tc_red_qopt *qopt;
+	SPRINT_BUF(b1);
+	SPRINT_BUF(b2);
+	SPRINT_BUF(b3);
+
+	if (opt == NULL)
+		return 0;
+
+	parse_rtattr_nested(tb, TCA_CHOKE_STAB, opt);
+
+	if (tb[TCA_CHOKE_PARMS] == NULL)
+		return -1;
+	qopt = RTA_DATA(tb[TCA_CHOKE_PARMS]);
+	if (RTA_PAYLOAD(tb[TCA_CHOKE_PARMS])  < sizeof(*qopt))
+		return -1;
+	fprintf(f, "limit %s min %s max %s ",
+		sprint_size(qopt->limit, b1),
+		sprint_size(qopt->qth_min, b2),
+		sprint_size(qopt->qth_max, b3));
+
+	if (qopt->flags & TC_RED_ECN)
+		fprintf(f, "ecn ");
+
+	if (show_details) {
+		fprintf(f, "ewma %u Plog %u Scell_log %u",
+			qopt->Wlog, qopt->Plog, qopt->Scell_log);
+	}
+	return 0;
+}
+
+static int choke_print_xstats(struct qdisc_util *qu, FILE *f,
+			      struct rtattr *xstats)
+{
+	struct tc_choke_xstats *st;
+
+	if (xstats == NULL)
+		return 0;
+
+	if (RTA_PAYLOAD(xstats) < sizeof(*st))
+		return -1;
+
+	st = RTA_DATA(xstats);
+	fprintf(f, "  marked %u early %u pdrop %u other %u matched %u",
+		st->marked, st->early, st->pdrop, st->other, st->matched);
+	return 0;
+
+}
+
+struct qdisc_util choke_qdisc_util = {
+	.id		= "choke",
+	.parse_qopt	= choke_parse_opt,
+	.print_qopt	= choke_print_opt,
+	.print_xstats	= choke_print_xstats,
+};
-- 
1.7.1
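
The default-threshold logic in choke_parse_opt() can be tried outside of tc. The following is only a sketch: struct choke_defaults and choke_fill_defaults() are hypothetical names that do not exist in iproute2, and the comparison between min and max is written as ">=" to match the error message text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container; the field names only mirror the option names
 * used in the patch. A value of 0 means "not given on the command line". */
struct choke_defaults {
	unsigned limit;   /* hard queue limit, in packets (required) */
	unsigned qth_min; /* lower threshold */
	unsigned qth_max; /* upper threshold */
	unsigned burst;   /* allowed burst, in packets */
};

/* Fill in unset values with the same arithmetic as the patch, then
 * validate them. Returns 0 on success, -1 on inconsistent input. */
static int choke_fill_defaults(struct choke_defaults *d)
{
	assert(d != NULL);

	if (!d->limit)
		return -1;
	if (!d->qth_max)
		d->qth_max = d->limit / 4;
	if (!d->qth_min)
		d->qth_min = d->qth_max / 3;
	if (!d->burst)
		d->burst = (2 * d->qth_min + d->qth_max) / 3;

	if (d->qth_max > d->limit)
		return -1;	/* "max" is larger than "limit" */
	if (d->qth_min >= d->qth_max)
		return -1;	/* "min" is not smaller than "max" */
	return 0;
}
```

With only "limit 1000" given, this yields max 250, min 83 and burst 138, all by integer division.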


^ permalink raw reply related

* [PATCH 1/2] ks8695net: Disable non-working ethtool operations
From: Ben Hutchings @ 2011-01-13 17:50 UTC (permalink / raw)
  To: Figo.zhang, zeal; +Cc: netdev

Some ethtool operations can only be implemented for the WAN port, and
not all such operations are allowed to return an error code such as
-EOPNOTSUPP.  Therefore, define two separate ethtool_ops structures
for WAN and non-WAN ports; simplify and rename the WAN-only functions.

This is completely untested as I don't have an ARM build environment.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
This has nothing much to do with my work, but I spotted it while
auditing the various implementations of ethtool_ops::get_link.  While
ks8695net doesn't have a regular maintainer, the commit log suggests
that you are using it so perhaps you could test this change.

Ben.

 drivers/net/arm/ks8695net.c |  282 +++++++++++++++----------------------------
 1 files changed, 99 insertions(+), 183 deletions(-)

diff --git a/drivers/net/arm/ks8695net.c b/drivers/net/arm/ks8695net.c
index 54c6d84..8820fcd 100644
--- a/drivers/net/arm/ks8695net.c
+++ b/drivers/net/arm/ks8695net.c
@@ -854,12 +854,12 @@ ks8695_set_msglevel(struct net_device *ndev, u32 value)
 }
 
 /**
- *	ks8695_get_settings - Get device-specific settings.
+ *	ks8695_wan_get_settings - Get device-specific settings.
  *	@ndev: The network device to read settings from
  *	@cmd: The ethtool structure to read into
  */
 static int
-ks8695_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+ks8695_wan_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
@@ -870,69 +870,50 @@ ks8695_get_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 			  SUPPORTED_TP | SUPPORTED_MII);
 	cmd->transceiver = XCVR_INTERNAL;
 
-	/* Port specific extras */
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		cmd->phy_address = 0;
-		/* not supported for HPNA */
-		cmd->autoneg = AUTONEG_DISABLE;
+	cmd->advertising = ADVERTISED_TP | ADVERTISED_MII;
+	cmd->port = PORT_MII;
+	cmd->supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause);
+	cmd->phy_address = 0;
 
-		/* BUG: Erm, dtype hpna implies no phy regs */
-		/*
-		ctrl = readl(KS8695_MISC_VA + KS8695_HMC);
-		cmd->speed = (ctrl & HMC_HSS) ? SPEED_100 : SPEED_10;
-		cmd->duplex = (ctrl & HMC_HDS) ? DUPLEX_FULL : DUPLEX_HALF;
-		*/
-		return -EOPNOTSUPP;
-	case KS8695_DTYPE_WAN:
-		cmd->advertising = ADVERTISED_TP | ADVERTISED_MII;
-		cmd->port = PORT_MII;
-		cmd->supported |= (SUPPORTED_Autoneg | SUPPORTED_Pause);
-		cmd->phy_address = 0;
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+	if ((ctrl & WMC_WAND) == 0) {
+		/* auto-negotiation is enabled */
+		cmd->advertising |= ADVERTISED_Autoneg;
+		if (ctrl & WMC_WANA100F)
+			cmd->advertising |= ADVERTISED_100baseT_Full;
+		if (ctrl & WMC_WANA100H)
+			cmd->advertising |= ADVERTISED_100baseT_Half;
+		if (ctrl & WMC_WANA10F)
+			cmd->advertising |= ADVERTISED_10baseT_Full;
+		if (ctrl & WMC_WANA10H)
+			cmd->advertising |= ADVERTISED_10baseT_Half;
+		if (ctrl & WMC_WANAP)
+			cmd->advertising |= ADVERTISED_Pause;
+		cmd->autoneg = AUTONEG_ENABLE;
+
+		cmd->speed = (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10;
+		cmd->duplex = (ctrl & WMC_WDS) ?
+			DUPLEX_FULL : DUPLEX_HALF;
+	} else {
+		/* auto-negotiation is disabled */
+		cmd->autoneg = AUTONEG_DISABLE;
 
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-		if ((ctrl & WMC_WAND) == 0) {
-			/* auto-negotiation is enabled */
-			cmd->advertising |= ADVERTISED_Autoneg;
-			if (ctrl & WMC_WANA100F)
-				cmd->advertising |= ADVERTISED_100baseT_Full;
-			if (ctrl & WMC_WANA100H)
-				cmd->advertising |= ADVERTISED_100baseT_Half;
-			if (ctrl & WMC_WANA10F)
-				cmd->advertising |= ADVERTISED_10baseT_Full;
-			if (ctrl & WMC_WANA10H)
-				cmd->advertising |= ADVERTISED_10baseT_Half;
-			if (ctrl & WMC_WANAP)
-				cmd->advertising |= ADVERTISED_Pause;
-			cmd->autoneg = AUTONEG_ENABLE;
-
-			cmd->speed = (ctrl & WMC_WSS) ? SPEED_100 : SPEED_10;
-			cmd->duplex = (ctrl & WMC_WDS) ?
-				DUPLEX_FULL : DUPLEX_HALF;
-		} else {
-			/* auto-negotiation is disabled */
-			cmd->autoneg = AUTONEG_DISABLE;
-
-			cmd->speed = (ctrl & WMC_WANF100) ?
-				SPEED_100 : SPEED_10;
-			cmd->duplex = (ctrl & WMC_WANFF) ?
-				DUPLEX_FULL : DUPLEX_HALF;
-		}
-		break;
-	case KS8695_DTYPE_LAN:
-		return -EOPNOTSUPP;
+		cmd->speed = (ctrl & WMC_WANF100) ?
+			SPEED_100 : SPEED_10;
+		cmd->duplex = (ctrl & WMC_WANFF) ?
+			DUPLEX_FULL : DUPLEX_HALF;
 	}
 
 	return 0;
 }
 
 /**
- *	ks8695_set_settings - Set device-specific settings.
+ *	ks8695_wan_set_settings - Set device-specific settings.
  *	@ndev: The network device to configure
  *	@cmd: The settings to configure
  */
 static int
-ks8695_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
+ks8695_wan_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
@@ -956,171 +937,99 @@ ks8695_set_settings(struct net_device *ndev, struct ethtool_cmd *cmd)
 				ADVERTISED_100baseT_Full)) == 0)
 			return -EINVAL;
 
-		switch (ksp->dtype) {
-		case KS8695_DTYPE_HPNA:
-			/* HPNA does not support auto-negotiation. */
-			return -EINVAL;
-		case KS8695_DTYPE_WAN:
-			ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-
-			ctrl &= ~(WMC_WAND | WMC_WANA100F | WMC_WANA100H |
-				  WMC_WANA10F | WMC_WANA10H);
-			if (cmd->advertising & ADVERTISED_100baseT_Full)
-				ctrl |= WMC_WANA100F;
-			if (cmd->advertising & ADVERTISED_100baseT_Half)
-				ctrl |= WMC_WANA100H;
-			if (cmd->advertising & ADVERTISED_10baseT_Full)
-				ctrl |= WMC_WANA10F;
-			if (cmd->advertising & ADVERTISED_10baseT_Half)
-				ctrl |= WMC_WANA10H;
-
-			/* force a re-negotiation */
-			ctrl |= WMC_WANR;
-			writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
-			break;
-		case KS8695_DTYPE_LAN:
-			return -EOPNOTSUPP;
-		}
+		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
 
+		ctrl &= ~(WMC_WAND | WMC_WANA100F | WMC_WANA100H |
+			  WMC_WANA10F | WMC_WANA10H);
+		if (cmd->advertising & ADVERTISED_100baseT_Full)
+			ctrl |= WMC_WANA100F;
+		if (cmd->advertising & ADVERTISED_100baseT_Half)
+			ctrl |= WMC_WANA100H;
+		if (cmd->advertising & ADVERTISED_10baseT_Full)
+			ctrl |= WMC_WANA10F;
+		if (cmd->advertising & ADVERTISED_10baseT_Half)
+			ctrl |= WMC_WANA10H;
+
+		/* force a re-negotiation */
+		ctrl |= WMC_WANR;
+		writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
 	} else {
-		switch (ksp->dtype) {
-		case KS8695_DTYPE_HPNA:
-			/* BUG: dtype_hpna implies no phy registers */
-			/*
-			ctrl = __raw_readl(KS8695_MISC_VA + KS8695_HMC);
-
-			ctrl &= ~(HMC_HSS | HMC_HDS);
-			if (cmd->speed == SPEED_100)
-				ctrl |= HMC_HSS;
-			if (cmd->duplex == DUPLEX_FULL)
-				ctrl |= HMC_HDS;
-
-			__raw_writel(ctrl, KS8695_MISC_VA + KS8695_HMC);
-			*/
-			return -EOPNOTSUPP;
-		case KS8695_DTYPE_WAN:
-			ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-
-			/* disable auto-negotiation */
-			ctrl |= WMC_WAND;
-			ctrl &= ~(WMC_WANF100 | WMC_WANFF);
-
-			if (cmd->speed == SPEED_100)
-				ctrl |= WMC_WANF100;
-			if (cmd->duplex == DUPLEX_FULL)
-				ctrl |= WMC_WANFF;
-
-			writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
-			break;
-		case KS8695_DTYPE_LAN:
-			return -EOPNOTSUPP;
-		}
+		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+
+		/* disable auto-negotiation */
+		ctrl |= WMC_WAND;
+		ctrl &= ~(WMC_WANF100 | WMC_WANFF);
+
+		if (cmd->speed == SPEED_100)
+			ctrl |= WMC_WANF100;
+		if (cmd->duplex == DUPLEX_FULL)
+			ctrl |= WMC_WANFF;
+
+		writel(ctrl, ksp->phyiface_regs + KS8695_WMC);
 	}
 
 	return 0;
 }
 
 /**
- *	ks8695_nwayreset - Restart the autonegotiation on the port.
+ *	ks8695_wan_nwayreset - Restart the autonegotiation on the port.
  *	@ndev: The network device to restart autoneotiation on
  */
 static int
-ks8695_nwayreset(struct net_device *ndev)
+ks8695_wan_nwayreset(struct net_device *ndev)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
 
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		/* No phy means no autonegotiation on hpna */
-		return -EINVAL;
-	case KS8695_DTYPE_WAN:
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
 
-		if ((ctrl & WMC_WAND) == 0)
-			writel(ctrl | WMC_WANR,
-			       ksp->phyiface_regs + KS8695_WMC);
-		else
-			/* auto-negotiation not enabled */
-			return -EINVAL;
-		break;
-	case KS8695_DTYPE_LAN:
-		return -EOPNOTSUPP;
-	}
+	if ((ctrl & WMC_WAND) == 0)
+		writel(ctrl | WMC_WANR,
+		       ksp->phyiface_regs + KS8695_WMC);
+	else
+		/* auto-negotiation not enabled */
+		return -EINVAL;
 
 	return 0;
 }
 
 /**
- *	ks8695_get_link - Retrieve link status of network interface
+ *	ks8695_wan_get_link - Retrieve link status of network interface
  *	@ndev: The network interface to retrive the link status of.
  */
 static u32
-ks8695_get_link(struct net_device *ndev)
+ks8695_wan_get_link(struct net_device *ndev)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
 
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		/* HPNA always has link */
-		return 1;
-	case KS8695_DTYPE_WAN:
-		/* WAN we can read the PHY for */
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-		return ctrl & WMC_WLS;
-	case KS8695_DTYPE_LAN:
-		return -EOPNOTSUPP;
-	}
-	return 0;
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
+	return ctrl & WMC_WLS;
 }
 
 /**
- *	ks8695_get_pause - Retrieve network pause/flow-control advertising
+ *	ks8695_wan_get_pause - Retrieve network pause/flow-control advertising
  *	@ndev: The device to retrieve settings from
  *	@param: The structure to fill out with the information
  */
 static void
-ks8695_get_pause(struct net_device *ndev, struct ethtool_pauseparam *param)
+ks8695_wan_get_pause(struct net_device *ndev, struct ethtool_pauseparam *param)
 {
 	struct ks8695_priv *ksp = netdev_priv(ndev);
 	u32 ctrl;
 
-	switch (ksp->dtype) {
-	case KS8695_DTYPE_HPNA:
-		/* No phy link on hpna to configure */
-		return;
-	case KS8695_DTYPE_WAN:
-		ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-
-		/* advertise Pause */
-		param->autoneg = (ctrl & WMC_WANAP);
+	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
 
-		/* current Rx Flow-control */
-		ctrl = ks8695_readreg(ksp, KS8695_DRXC);
-		param->rx_pause = (ctrl & DRXC_RFCE);
+	/* advertise Pause */
+	param->autoneg = (ctrl & WMC_WANAP);
 
-		/* current Tx Flow-control */
-		ctrl = ks8695_readreg(ksp, KS8695_DTXC);
-		param->tx_pause = (ctrl & DTXC_TFCE);
-		break;
-	case KS8695_DTYPE_LAN:
-		/* The LAN's "phy" is a direct-attached switch */
-		return;
-	}
-}
+	/* current Rx Flow-control */
+	ctrl = ks8695_readreg(ksp, KS8695_DRXC);
+	param->rx_pause = (ctrl & DRXC_RFCE);
 
-/**
- *	ks8695_set_pause - Configure pause/flow-control
- *	@ndev: The device to configure
- *	@param: The pause parameters to set
- *
- *	TODO: Implement this
- */
-static int
-ks8695_set_pause(struct net_device *ndev, struct ethtool_pauseparam *param)
-{
-	return -EOPNOTSUPP;
+	/* current Tx Flow-control */
+	ctrl = ks8695_readreg(ksp, KS8695_DTXC);
+	param->tx_pause = (ctrl & DTXC_TFCE);
 }
 
 /**
@@ -1140,12 +1049,17 @@ ks8695_get_drvinfo(struct net_device *ndev, struct ethtool_drvinfo *info)
 static const struct ethtool_ops ks8695_ethtool_ops = {
 	.get_msglevel	= ks8695_get_msglevel,
 	.set_msglevel	= ks8695_set_msglevel,
-	.get_settings	= ks8695_get_settings,
-	.set_settings	= ks8695_set_settings,
-	.nway_reset	= ks8695_nwayreset,
-	.get_link	= ks8695_get_link,
-	.get_pauseparam = ks8695_get_pause,
-	.set_pauseparam = ks8695_set_pause,
+	.get_drvinfo	= ks8695_get_drvinfo,
+};
+
+static const struct ethtool_ops ks8695_wan_ethtool_ops = {
+	.get_msglevel	= ks8695_get_msglevel,
+	.set_msglevel	= ks8695_set_msglevel,
+	.get_settings	= ks8695_wan_get_settings,
+	.set_settings	= ks8695_wan_set_settings,
+	.nway_reset	= ks8695_wan_nwayreset,
+	.get_link	= ks8695_wan_get_link,
+	.get_pauseparam = ks8695_wan_get_pause,
 	.get_drvinfo	= ks8695_get_drvinfo,
 };
 
@@ -1541,7 +1455,6 @@ ks8695_probe(struct platform_device *pdev)
 
 	/* driver system setup */
 	ndev->netdev_ops = &ks8695_netdev_ops;
-	SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	ndev->watchdog_timeo	 = msecs_to_jiffies(watchdog);
 
 	netif_napi_add(ndev, &ksp->napi, ks8695_poll, NAPI_WEIGHT);
@@ -1608,12 +1521,15 @@ ks8695_probe(struct platform_device *pdev)
 	if (ksp->phyiface_regs && ksp->link_irq == -1) {
 		ks8695_init_switch(ksp);
 		ksp->dtype = KS8695_DTYPE_LAN;
+		SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	} else if (ksp->phyiface_regs && ksp->link_irq != -1) {
 		ks8695_init_wan_phy(ksp);
 		ksp->dtype = KS8695_DTYPE_WAN;
+		SET_ETHTOOL_OPS(ndev, &ks8695_wan_ethtool_ops);
 	} else {
 		/* No initialisation since HPNA does not have a PHY */
 		ksp->dtype = KS8695_DTYPE_HPNA;
+		SET_ETHTOOL_OPS(ndev, &ks8695_ethtool_ops);
 	}
 
 	/* And bring up the net_device with the net core */
-- 
1.7.3.4
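
The idea of choosing an operations table once at probe time, instead of switching on the port type inside every handler, can be shown in a small user-space sketch. All names below (struct port, struct port_ops, port_setup, port_nway_reset) are hypothetical stand-ins and not part of the driver or of the ethtool core. The sketch assumes that the caller treats a NULL hook as "not supported".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum port_type { PORT_LAN, PORT_WAN, PORT_HPNA };

struct port;

/* Cut-down stand-in for an ops table: a NULL member means the
 * operation is not implemented for this kind of port. */
struct port_ops {
	unsigned (*get_link)(const struct port *p);
	int (*nway_reset)(struct port *p);
};

struct port {
	enum port_type type;
	unsigned link_up;
	const struct port_ops *ops;
};

static unsigned wan_get_link(const struct port *p)
{
	return p->link_up;
}

static int wan_nway_reset(struct port *p)
{
	(void)p;	/* nothing to restart in this sketch */
	return 0;
}

/* Table for ports without a readable PHY: no link or autoneg hooks. */
static const struct port_ops basic_ops = {
	.get_link = NULL,
	.nway_reset = NULL,
};

/* Table for the WAN port, which has every hook. */
static const struct port_ops wan_ops = {
	.get_link = wan_get_link,
	.nway_reset = wan_nway_reset,
};

/* Pick the table once; handlers no longer need a switch on the type. */
static void port_setup(struct port *p, enum port_type type)
{
	assert(p != NULL);
	p->type = type;
	p->link_up = 0;
	p->ops = (type == PORT_WAN) ? &wan_ops : &basic_ops;
}

/* Caller-side dispatch: a missing hook becomes an error here, so a
 * handler never has to fake an error through its own return type. */
static int port_nway_reset(struct port *p)
{
	if (!p->ops->nway_reset)
		return -EOPNOTSUPP;
	return p->ops->nway_reset(p);
}
```

The WAN handlers then contain only WAN logic, and the "unsupported" answer for the other port types comes from one place.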





^ permalink raw reply related

* [PATCH 2/2] ks8695net: Use default implementation of ethtool_ops::get_link
From: Ben Hutchings @ 2011-01-13 17:52 UTC (permalink / raw)
  To: Figo.zhang, zeal; +Cc: netdev
In-Reply-To: <1294941014.3946.46.camel@bwh-desktop>

This is completely untested as I don't have an ARM build environment.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
---
I'm fairly confident that this is right for the WAN mode.  It depends on
the previous patch.

Ben.

 drivers/net/arm/ks8695net.c |   16 +---------------
 1 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/drivers/net/arm/ks8695net.c b/drivers/net/arm/ks8695net.c
index 8820fcd..62d6f88 100644
--- a/drivers/net/arm/ks8695net.c
+++ b/drivers/net/arm/ks8695net.c
@@ -994,20 +994,6 @@ ks8695_wan_nwayreset(struct net_device *ndev)
 }
 
 /**
- *	ks8695_wan_get_link - Retrieve link status of network interface
- *	@ndev: The network interface to retrive the link status of.
- */
-static u32
-ks8695_wan_get_link(struct net_device *ndev)
-{
-	struct ks8695_priv *ksp = netdev_priv(ndev);
-	u32 ctrl;
-
-	ctrl = readl(ksp->phyiface_regs + KS8695_WMC);
-	return ctrl & WMC_WLS;
-}
-
-/**
  *	ks8695_wan_get_pause - Retrieve network pause/flow-control advertising
  *	@ndev: The device to retrieve settings from
  *	@param: The structure to fill out with the information
@@ -1058,7 +1044,7 @@ static const struct ethtool_ops ks8695_wan_ethtool_ops = {
 	.get_settings	= ks8695_wan_get_settings,
 	.set_settings	= ks8695_wan_set_settings,
 	.nway_reset	= ks8695_wan_nwayreset,
-	.get_link	= ks8695_wan_get_link,
+	.get_link	= ethtool_op_get_link,
 	.get_pauseparam = ks8695_wan_get_pause,
 	.get_drvinfo	= ks8695_get_drvinfo,
 };
-- 
1.7.3.4


-- 
Ben Hutchings, Senior Software Engineer, Solarflare Communications
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


^ permalink raw reply related

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: Arnaud Lacombe @ 2011-01-13 17:53 UTC (permalink / raw)
  To: William Allen Simpson
  Cc: Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <4D2F3723.9040405@gmail.com>

Hi,

On Thu, Jan 13, 2011 at 12:32 PM, William Allen Simpson
<william.allen.simpson@gmail.com> wrote:
> On 1/12/11 1:56 PM, Stephen Hemminger wrote:
>>
>> On Wed, 12 Jan 2011 12:59:38 -0500
>> William Allen Simpson<william.allen.simpson@gmail.com>  wrote:
>>
>>> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
>>> index e64f4c6..c8f4017 100644
>>> --- a/include/linux/tcp.h
>>> +++ b/include/linux/tcp.h
>>> @@ -185,22 +185,37 @@ struct tcp_md5sig {
>>>  #define TCP_COOKIE_PAIR_SIZE  (2*TCP_COOKIE_MAX)
>>>
>>>  /* Flags for both getsockopt and setsockopt */
>>> -#define TCP_COOKIE_IN_ALWAYS   (1<<  0)        /* Discard SYN without cookie */
>>> -#define TCP_COOKIE_OUT_NEVER   (1<<  1)        /* Prohibit outgoing cookies,
>>> +#define TCPCT_IN_ALWAYS                (1<<  0)        /* Discard SYN without cookie */
>>> +#define TCPCT_OUT_NEVER                (1<<  1)        /* Prohibit outgoing cookies,
>>
>> You end up changing values in kernel userspace API in a way
>> that is incompatible with older applications. This is not acceptable.
>>
> While I agree in principle and argued strongly against it, other
> members of the research group (particularly the original project
> sponsor) have over-ridden my concerns.  I'm sorry to inform you that
> many/most participants don't care much about Linux.
>
> Note that the *bits* are the same, and previously compiled programs
> (that don't access more advanced features) should continue to run as
> they have in the past.
>
> Even though I'm not paid to work on Linux, I'm doing my best to give you
> folks a quick heads up and provide code to rectify the very recent changes
> that can be propagated back through the stable tree (to 2.6.33).
>
> As always, what you actually do with my code is up to you....
>
FWIW, what is the basis of this hunk ? The RFC text[0] seems to use
the TCP_COOKIE_* naming, not TCPCT_.

Thanks,
 - Arnaud

[0]: http://www.rfc-editor.org/authors/rfc6013.txt


^ permalink raw reply

* Re: [PATCH v1 2/2] TCPCT API sockopt update to draft -03
From: Eric Dumazet @ 2011-01-13 18:00 UTC (permalink / raw)
  To: William Allen Simpson
  Cc: Stephen Hemminger, Linux Kernel Developers,
	Linux Kernel Network Developers, David Miller, Andrew Morton
In-Reply-To: <4D2F3723.9040405@gmail.com>

Le jeudi 13 janvier 2011 à 12:32 -0500, William Allen Simpson a écrit :
> On 1/12/11 1:56 PM, Stephen Hemminger wrote:
> > On Wed, 12 Jan 2011 12:59:38 -0500
> > William Allen Simpson<william.allen.simpson@gmail.com>  wrote:
> >
> >> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> >> index e64f4c6..c8f4017 100644
> >> --- a/include/linux/tcp.h
> >> +++ b/include/linux/tcp.h
> >> @@ -185,22 +185,37 @@ struct tcp_md5sig {
> >>   #define TCP_COOKIE_PAIR_SIZE	(2*TCP_COOKIE_MAX)
> >>
> >>   /* Flags for both getsockopt and setsockopt */
> >> -#define TCP_COOKIE_IN_ALWAYS	(1<<  0)	/* Discard SYN without cookie */
> >> -#define TCP_COOKIE_OUT_NEVER	(1<<  1)	/* Prohibit outgoing cookies,
> >> +#define TCPCT_IN_ALWAYS		(1<<  0)	/* Discard SYN without cookie */
> >> +#define TCPCT_OUT_NEVER		(1<<  1)	/* Prohibit outgoing cookies,
> >
> > You end up changing values in kernel userspace API in a way
> > that is incompatible with older applications. This is not acceptable.
> >
> While I agree in principle and argued strongly against it, other
> members of the research group (particularly the original project
> sponsor) have over-ridden my concerns.  I'm sorry to inform you that
> many/most participants don't care much about Linux.
> 

How can keeping the TCP_COOKIE_IN_ALWAYS and TCP_COOKIE_OUT_NEVER
definitions, so that existing user-space programs still compile, be a
problem for the "research group"?

AFAIK, TCPCT_IN_ALWAYS / TCPCT_OUT_NEVER are not mentioned in
http://www.rfc-editor.org/authors/rfc6013.txt

But TCP_COOKIE_IN_ALWAYS and TCP_COOKIE_OUT_NEVER are...

Isn't that a bit confusing?

> Note that the *bits* are the same, and previously compiled programs
> (that don't access more advanced features) should continue to run as
> they have in the past.
> 
> Even though I'm not paid to work on Linux, I'm doing my best to give you
> folks a quick heads up and provide code to rectify the very recent changes
> that can be propagated back through the stable tree (to 2.6.33).
> 
> As always, what you actually do with my code is up to you....

Maybe it's too early, and we should wait for an official RFC, especially
if you insist on breaking the API in 6 months.
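
One way to keep old sources compiling while introducing new names is to leave the old macros in place as aliases. The sketch below is only an illustration of that approach: the aliasing itself is an assumption and is not part of the posted patch, and legacy_flags() / new_flags() are hypothetical helpers.

```c
#include <assert.h>

/* Names introduced by the proposed patch. */
#define TCPCT_IN_ALWAYS		(1 << 0)	/* Discard SYN without cookie */
#define TCPCT_OUT_NEVER		(1 << 1)	/* Prohibit outgoing cookies */

/* Older names kept as aliases so existing programs still build
 * (assumption: this aliasing is not in the patch under discussion). */
#define TCP_COOKIE_IN_ALWAYS	TCPCT_IN_ALWAYS
#define TCP_COOKIE_OUT_NEVER	TCPCT_OUT_NEVER

/* Compile-time check that both spellings name the same bits. */
static_assert(TCP_COOKIE_IN_ALWAYS == TCPCT_IN_ALWAYS, "alias mismatch");
static_assert(TCP_COOKIE_OUT_NEVER == TCPCT_OUT_NEVER, "alias mismatch");

/* What an older program would pass. */
static unsigned legacy_flags(void)
{
	return TCP_COOKIE_IN_ALWAYS | TCP_COOKIE_OUT_NEVER;
}

/* What a program written against the new names would pass. */
static unsigned new_flags(void)
{
	return TCPCT_IN_ALWAYS | TCPCT_OUT_NEVER;
}
```

Both helpers produce the same value, so binary and source compatibility are preserved at the cost of two extra lines in the header.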

^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.7)
From: Eric Dumazet @ 2011-01-13 18:02 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <1294941621.3570.419.camel@edumazet-laptop>

Le jeudi 13 janvier 2011 à 19:00 +0100, Eric Dumazet a écrit :
> Le jeudi 13 janvier 2011 à 09:27 -0800, Stephen Hemminger a écrit :
> > This implements the CHOKe packet scheduler based on the existing
> > Linux RED scheduler based on the algorithm described in the paper.
> > 
> >

Sorry for the long reply; I hit the 'Send' button in the wrong window
before removing the quoted hunks.




^ permalink raw reply

* Re: [PATCH] CHOKe flow scheduler (0.7)
From: Eric Dumazet @ 2011-01-13 18:00 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110113092706.154748c2@s6510>

Le jeudi 13 janvier 2011 à 09:27 -0800, Stephen Hemminger a écrit :
> This implements the CHOKe packet scheduler based on the existing
> Linux RED scheduler based on the algorithm described in the paper.
> 
> The core idea is:
>   For every packet arrival:
>   	Calculate Qave
> 	if (Qave < minth) 
> 	     Queue the new packet
> 	else 
> 	     Select randomly a packet from the queue 
> 	     if (both packets from same flow)
> 	     then Drop both the packets
> 	     else if (Qave > maxth)
> 	          Drop packet
> 	     else
> 	       	  Admit packet with probability p (same as RED)
> 
> See also:
>   Rong Pan, Balaji Prabhakar, Konstantinos Psounis, "CHOKe: a stateless active
>    queue management scheme for approximating fair bandwidth allocation", 
>   Proceeding of INFOCOM'2000, March 2000.
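
The decision procedure quoted above can be written as a pure function, which makes the order of the checks easy to test. This is a sketch only: enum choke_verdict and choke_decide() are hypothetical names, flows are reduced to plain integer identifiers, and the random pick of the victim packet is left to the caller.

```c
#include <assert.h>

enum choke_verdict {
	CHOKE_ENQUEUE,		/* below minth: queue the new packet */
	CHOKE_DROP_BOTH,	/* same flow as the random victim: drop both */
	CHOKE_DROP_NEW,		/* above maxth: drop the new packet */
	CHOKE_PROBABILISTIC	/* between thresholds: admit with RED probability p */
};

/* qavg is the average queue length; victim_flow is the flow of a packet
 * the caller picked at random from the queue. */
static enum choke_verdict choke_decide(unsigned qavg, unsigned minth,
				       unsigned maxth, unsigned new_flow,
				       unsigned victim_flow)
{
	assert(minth <= maxth);

	if (qavg < minth)
		return CHOKE_ENQUEUE;
	if (new_flow == victim_flow)
		return CHOKE_DROP_BOTH;
	if (qavg > maxth)
		return CHOKE_DROP_NEW;
	return CHOKE_PROBABILISTIC;
}
```

Note that the flow comparison comes before the maxth check, so a matching flow loses two packets even when the queue is far over the upper threshold.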
> 
> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
> 
> ---
> Patch versions
> 0.3 (Eric) uses table for queue.
> 0.4 allows classification with TC filters
>     fixes crash when peek_random() finds a hole
> 0.5 (Eric) that fixes qlen with holes and peek
> 0.7 change to use separate params / stats than RED
>     account for drops in backlog
> 
> Almost ready, still need to make sure API (netlink) is right
> 
> 
>  net/sched/Kconfig     |   11 +
>  net/sched/Makefile    |    1 
>  net/sched/sch_choke.c |  536 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 548 insertions(+)
> 
> --- a/net/sched/Kconfig	2011-01-12 17:44:05.747500044 -0800
> +++ b/net/sched/Kconfig	2011-01-12 17:44:53.167735188 -0800
> @@ -205,6 +205,17 @@ config NET_SCH_DRR
>  
>  	  If unsure, say N.
>  
> +config NET_SCH_CHOKE
> +	tristate "CHOose and Keep responsive flow scheduler (CHOKE)"
> +	help
> +	  Say Y here if you want to use the CHOKe packet scheduler (CHOose
> +	  and Keep for responsive flows, CHOose and Kill for unresponsive
> +	  flows). This is a variation of RED which tries to penalize flows
> +	  that monopolize the queue.
> +
> +	  To compile this code as a module, choose M here: the
> +	  module will be called sch_choke.
> +
>  config NET_SCH_INGRESS
>  	tristate "Ingress Qdisc"
>  	depends on NET_CLS_ACT
> --- a/net/sched/Makefile	2011-01-12 17:44:05.767500135 -0800
> +++ b/net/sched/Makefile	2011-01-12 17:44:53.167735188 -0800
> @@ -32,6 +32,7 @@ obj-$(CONFIG_NET_SCH_MULTIQ)	+= sch_mult
>  obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
>  obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
>  obj-$(CONFIG_NET_SCH_DRR)	+= sch_drr.o
> +obj-$(CONFIG_NET_SCH_CHOKE)	+= sch_choke.o
>  obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
>  obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
>  obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ b/net/sched/sch_choke.c	2011-01-12 17:45:07.227806180 -0800
> @@ -0,0 +1,556 @@
> +/*
> + * net/sched/sch_choke.c	CHOKE scheduler
> + *
> + * Copyright (c) 2011 Stephen Hemminger <shemminger@vyatta.com>
> + * Copyright (c) 2011 Eric Dumazet <eric.dumazet@gmail.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * version 2 as published by the Free Software Foundation.
> + *
> + */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/skbuff.h>
> +#include <linux/reciprocal_div.h>
> +#include <net/pkt_sched.h>
> +#include <net/inet_ecn.h>
> +#include <net/red.h>
> +
> +/*	CHOKe stateless AQM for fair bandwidth allocation
> +        =================================================
> +
> +   CHOKe (CHOose and Keep for responsive flows, CHOose and Kill for
> +   unresponsive flows) is a variant of RED that penalizes misbehaving flows but
> +   maintains no flow state. The difference from RED is an additional step
> +   during the enqueuing process. If average queue size is over the
> +   low threshold (qmin), a packet is chosen at random from the queue.
> +   If both the new and chosen packet are from the same flow, both
> +   are dropped. Unlike RED, CHOKe is not really a "classful" qdisc because it
> +   needs to access packets in queue randomly. It has a minimal class
> +   interface to allow overriding the builtin flow classifier with
> +   filters.
> +
> +   Source:
> +   R. Pan, B. Prabhakar, and K. Psounis, "CHOKe, A Stateless
> +   Active Queue Management Scheme for Approximating Fair Bandwidth Allocation",
> +   IEEE INFOCOM, 2000.
> +
> +   A. Tang, J. Wang, S. Low, "Understanding CHOKe: Throughput and Spatial
> +   Characteristics", IEEE/ACM Transactions on Networking, 2004
> +
> + */
> +
> +/* Upper bound on size of sk_buff table */
> +#define CHOKE_MAX_QUEUE	(128*1024 - 1)
> +
> +struct choke_sched_data {
> +/* Parameters */
> +	u32		 limit;
> +	unsigned char	 flags;
> +
> +	struct red_parms parms;
> +
> +/* Variables */
> +	struct tcf_proto *filter_list;
> +	struct {
> +		u32	prob_drop;	/* Early probability drops */
> +		u32	prob_mark;	/* Early probability marks */
> +		u32	forced_drop;	/* Forced drops, qavg > max_thresh */
> +		u32	forced_mark;	/* Forced marks, qavg > max_thresh */
> +		u32	pdrop;          /* Drops due to queue limits */
> +		u32	other;          /* Drops due to drop() calls */
> +		u32	matched;	/* Drops to flow match */
> +	} stats;
> +
> +	unsigned int	 head;
> +	unsigned int	 tail;
> +	unsigned int	 holes;
> +	unsigned int	 tab_mask; /* size - 1 */
> +
> +	struct sk_buff **tab;
> +};
> +
> +static inline unsigned int choke_len(const struct choke_sched_data *q)
> +{
> +	return (q->tail - q->head) & q->tab_mask;
> +}
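
The masked subtraction in choke_len() relies on unsigned wraparound and on the table size being a power of two. A stand-alone sketch (ring_len() is a hypothetical name) shows that it stays correct when tail has wrapped past the end of the table:

```c
#include <assert.h>

/* Number of slots between head and tail in a ring whose size is a
 * power of two; mask is size - 1. */
static unsigned ring_len(unsigned head, unsigned tail, unsigned mask)
{
	/* mask + 1 must be a power of two for the AND to act as modulo */
	assert(((mask + 1) & mask) == 0);
	return (tail - head) & mask;
}
```

With an 8-slot ring (mask 7), head 6 and tail 1 give (1 - 6) & 7 = 3. A side effect of this scheme is that head == tail always means "empty", so at most size - 1 slots can be in use.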
> +
> +/* deliver a random number between 0 and N - 1 */
> +static inline u32 random_N(unsigned int N)
> +{
> +	return reciprocal_divide(random32(), N);
> +}
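
To my understanding, reciprocal_divide(A, R) in this kernel generation computes ((u64)A * R) >> 32, so feeding it a 32-bit random value and N scales that value into [0, N) without a division. A user-space sketch of that scaling, with the hypothetical name scale_to_range():

```c
#include <assert.h>
#include <stdint.h>

/* Map a uniformly distributed 32-bit value r onto 0 .. n-1 using a
 * multiply and a shift instead of a modulo. */
static uint32_t scale_to_range(uint32_t r, uint32_t n)
{
	assert(n > 0);
	return (uint32_t)(((uint64_t)r * n) >> 32);
}
```

The largest input, 0xFFFFFFFF, maps to n - 1, and the midpoint 0x80000000 maps to n / 2, so the result never reaches n.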
> +
> +/* Select a packet at random from the queue in O(1) and handle holes */
> +static struct sk_buff *choke_peek_random(struct choke_sched_data *q,
> +					 unsigned int *pidx)
> +{
> +	struct sk_buff *skb;
> +	int retrys = 3;
> +
> +	do {
> +		*pidx = (q->head + random_N(choke_len(q))) & q->tab_mask;
> +		skb = q->tab[*pidx];
> +		if (skb)
> +			return skb;
> +	} while (--retrys > 0);
> +
> +	/* queue has lots of holes; use the head, which is known to exist */
> +	return q->tab[*pidx = q->head];
> +}
> +
> +/* Is ECN parameter configured */
> +static inline int use_ecn(const struct choke_sched_data *q)
> +{
> +	return q->flags & TC_RED_ECN;
> +}
> +
> +/* Should packets over max just be dropped (versus marked) */
> +static inline int use_harddrop(const struct choke_sched_data *q)
> +{
> +	return q->flags & TC_RED_HARDDROP;
> +}
> +
> +/* Move head pointer forward to skip over holes */
> +static void choke_zap_head_holes(struct choke_sched_data *q)
> +{
> +	while (q->holes && q->tab[q->head] == NULL) {
> +		q->head = (q->head + 1) & q->tab_mask;
> +		q->holes--;
> +	}
> +}
> +
> +/* Move tail pointer backwards to reuse holes */
> +static void choke_zap_tail_holes(struct choke_sched_data *q)
> +{
> +	while (q->holes && q->tab[q->tail - 1] == NULL) {
> +		q->tail = (q->tail - 1) & q->tab_mask;
> +		q->holes--;
> +	}
> +}
> +
> +/* Drop packet from queue array by creating a "hole" */
> +static void choke_drop_by_idx(struct choke_sched_data *q, unsigned int idx)
> +{
> +	q->tab[idx] = NULL;
> +	q->holes++;
> +
> +	if (idx == q->head)
> +		choke_zap_head_holes(q);
> +	if (idx == q->tail)
> +		choke_zap_tail_holes(q);
> +}
> +
> +/* Classify flow using either:
> +   1. pre-existing classification result in skb
> +   2. fast internal classification
> +   3. use TC filter based classification
> +*/
> +static inline unsigned int choke_classify(struct sk_buff *skb,
> +					  struct Qdisc *sch, int *qerr)
> +
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct tcf_result res;
> +	int result;
> +
> +	*qerr = NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> +
> +	if (TC_H_MAJ(skb->priority) == sch->handle &&
> +	    TC_H_MIN(skb->priority) > 0)
> +		return TC_H_MIN(skb->priority);
> +
> +	if (!q->filter_list)
> +		return skb_get_rxhash(skb);
> +
> +	result = tc_classify(skb, q->filter_list, &res);
> +	if (result >= 0) {
> +#ifdef CONFIG_NET_CLS_ACT
> +		switch (result) {
> +		case TC_ACT_STOLEN:
> +		case TC_ACT_QUEUED:
> +			*qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
> +		case TC_ACT_SHOT:
> +			return 0;
> +		}
> +#endif
> +		return TC_H_MIN(res.classid);
> +	}
> +
> +	return 0;
> +}
> +
> +static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct red_parms *p = &q->parms;
> +	unsigned int hash;
> +	int uninitialized_var(ret);
> +
> +	hash = choke_classify(skb, sch, &ret);
> +	if (!hash) {
> +		/* Packet was eaten by filter */
> +		if (ret & __NET_XMIT_BYPASS)
> +			sch->qstats.drops++;
> +		kfree_skb(skb);
> +		return ret;
> +	}
> +
> +	/* Maybe add hash as field in struct qdisc_skb_cb? */
> +	*(unsigned int *)(qdisc_skb_cb(skb)->data) = hash;
> +
> +	/* Compute average queue usage (see RED) */
> +	p->qavg = red_calc_qavg(p, choke_len(q) - q->holes);
> +	if (red_is_idling(p))
> +		red_end_of_idle_period(p);
> +
> +	/* Is queue small? */
> +	if (p->qavg <= p->qth_min)
> +		p->qcount = -1;
> +	else {
> +		struct sk_buff *oskb;
> +		unsigned int idx;
> +
> +		/* Draw a packet at random from queue */
> +		oskb = choke_peek_random(q, &idx);
> +
> +		/* Both packets from same flow ? */
> +		if (*(unsigned int *)(qdisc_skb_cb(oskb)->data) == hash) {
> +			/* Drop both packets */
> +			q->stats.matched++;
> +			choke_drop_by_idx(q, idx);
> +			qdisc_drop(oskb, sch);

I feel we should add : sch->q.qlen--;

> +			goto congestion_drop;
> +		}
> +
> +		/* Queue is large, always mark/drop */
> +		if (p->qavg > p->qth_max) {
> +			p->qcount = -1;
> +
> +			sch->qstats.overlimits++;
> +			if (use_harddrop(q) || !use_ecn(q) ||
> +			    !INET_ECN_set_ce(skb)) {
> +				q->stats.forced_drop++;
> +				goto congestion_drop;
> +			}
> +
> +			q->stats.forced_mark++;
> +		} else if (++p->qcount) {
> +			if (red_mark_probability(p, p->qavg)) {
> +				p->qcount = 0;
> +				p->qR = red_random(p);
> +
> +				sch->qstats.overlimits++;
> +				if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
> +					q->stats.prob_drop++;
> +					goto congestion_drop;
> +				}
> +
> +				q->stats.prob_mark++;
> +			}
> +		} else
> +			p->qR = red_random(p);
> +	}
> +
> +	/* Admit new packet */
> +	if (likely(choke_len(q) < q->limit)) {
> +
> +		q->tab[q->tail] = skb;
> +		q->tail = (q->tail + 1) & q->tab_mask;
> +
> +		sch->qstats.backlog += qdisc_pkt_len(skb);
> +		qdisc_update_bstats(sch, skb);
> +		sch->q.qlen = choke_len(q) - q->holes;
	or : sch->q.qlen++;

(if sch->q.qlen is kept up to date as suggested in the comment above)

> +		return NET_XMIT_SUCCESS;
> +	}
> +
> +	q->stats.pdrop++;
> +	sch->qstats.drops++;
> +	kfree_skb(skb);
> +	return NET_XMIT_DROP;
> +
> + congestion_drop:
> +	qdisc_drop(skb, sch);
> +	return NET_XMIT_CN;
> +}
> +
> +static struct sk_buff *choke_dequeue(struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	struct sk_buff *skb;
> +
> +	if (q->head == q->tail) {
> +		if (!red_is_idling(&q->parms))
> +			red_start_of_idle_period(&q->parms);
> +		return NULL;
> +	}
> +	skb = q->tab[q->head];
> +	q->tab[q->head] = NULL; /* not really needed */
> +	q->head = (q->head + 1) & q->tab_mask;
> +	choke_zap_head_holes(q);
> +	sch->qstats.backlog -= qdisc_pkt_len(skb);
> +	sch->q.qlen = choke_len(q) - q->holes;

	sch->q.qlen--;

> +
> +	return skb;
> +}
> +
> +static unsigned int choke_drop(struct Qdisc *sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +	unsigned int len;
> +
> +	len = qdisc_queue_drop(sch);
> +	if (len > 0)
> +		q->stats.other++;
> +	else {
> +		if (!red_is_idling(&q->parms))
> +			red_start_of_idle_period(&q->parms);
> +	}
> +
> +	return len;
> +}
> +
> +static void choke_reset(struct Qdisc* sch)
> +{
> +	struct choke_sched_data *q = qdisc_priv(sch);
> +
> +	red_restart(&q->parms);
> +}
> +
> +static const struct nla_policy choke_policy[TCA_CHOKE_MAX + 1] = {
> +	[TCA_CHOKE_PARMS]	= { .len = sizeof(struct tc_red_qopt) },
> +	[TCA_CHOKE_STAB]	= { .len = 256 },

RED_STAB_SIZE ?


Thanks !
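
To illustrate the accounting question raised above, here is a minimal
userspace sketch, not the patch itself. It models the table as an array of
strings where NULL marks a hole, and keeps an incrementally updated qlen next
to the value recomputed from the ring span minus the holes. All names
(struct ring, ring_push, ring_drop_by_idx, TAB_SIZE) are hypothetical, and
masking the tail index before it is used is an assumption of this sketch
rather than something taken from the posted code.

```c
#include <assert.h>
#include <stddef.h>

#define TAB_SIZE 8u			/* hypothetical; must be a power of two */
#define TAB_MASK (TAB_SIZE - 1u)

struct ring {
	unsigned int head;	/* index of the oldest packet */
	unsigned int tail;	/* index one past the newest packet */
	unsigned int holes;	/* NULL slots between head and tail */
	unsigned int qlen;	/* packet count, maintained incrementally */
	const char *tab[TAB_SIZE];
};

/* Slots spanned by the ring, holes included */
static unsigned int ring_span(const struct ring *r)
{
	return (r->tail - r->head) & TAB_MASK;
}

/* The incremental count must match the recomputed one */
static void ring_check(const struct ring *r)
{
	assert(r->qlen == ring_span(r) - r->holes);
}

static int ring_push(struct ring *r, const char *pkt)
{
	if (ring_span(r) == TAB_MASK)
		return -1;	/* full: one slot always stays unused */
	r->tab[r->tail] = pkt;
	r->tail = (r->tail + 1) & TAB_MASK;
	r->qlen++;
	ring_check(r);
	return 0;
}

/* Drop the live packet at idx by punching a hole, then trim edge holes */
static void ring_drop_by_idx(struct ring *r, unsigned int idx)
{
	r->tab[idx] = NULL;
	r->holes++;
	r->qlen--;

	/* Skip holes at the head */
	while (r->holes && r->tab[r->head] == NULL) {
		r->head = (r->head + 1) & TAB_MASK;
		r->holes--;
	}
	/* Reuse holes at the tail; the index is masked before use */
	while (r->holes && r->tab[(r->tail - 1) & TAB_MASK] == NULL) {
		r->tail = (r->tail - 1) & TAB_MASK;
		r->holes--;
	}
	ring_check(r);
}
```

With this shape, every path that adds or removes a packet adjusts qlen by
one, and ring_check() confirms that the two ways of counting agree.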




* Re: STMMAC driver: NFS Problem on 2.6.37
From: Armando Visconti @ 2011-01-13 18:28 UTC (permalink / raw)
  To: Chuck Lever, Deepak SIKRI
  Cc: Trond.Myklebust-HgOvQuBEEgTQT0dZR+AlfA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Shiraz HASHIM,
	Viresh KUMAR, Giuseppe CAVALLARO
In-Reply-To: <2D04CF75-CA68-4BDC-99A3-FA1DD6113602-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Chuck Lever wrote:
> On Jan 13, 2011, at 4:09 AM, deepaksi wrote:
>
>   
>> Hi
>>
>> I am facing a problem related to nfs boot, while using the stmmac driver
>> ported on 2.6.37 kernel. When we use a JFFS2 file system and mount the kernel,
>> the network driver works fine.
>>
>> I have been following the mailing list and could find some issues with NFS 
>> on 2.6.37 but I am not too sure whether the kernel crash I am getting is 
>> related to that.
>>
>> The driver worked fine on 2.6.32 kernel, but while booting the 2.6.37
>> kernel I get the following log messages:
>>
>> stmmac: Rx Checksum Offload Engine supported
>>        TX Checksum insertion supported
>> IP-Config: Complete:
>>     device=eth0, addr=192.168.1.10, mask=255.255.255.0, gw=255.255.255.255,
>>     host=192.168.1.10, domain=, nis-domain=(none),
>>     bootserver=192.168.1.1, rootserver=192.168.1.1, rootpath=
>>     
>
> Why is rootpath left undefined?
>   

Yes, Chuck.
Good catch.

Deepak,
Can you verify your bootargs?

I see exactly the same problem with kernel 2.6.32 (rc6.3) on my
board, where the bootargs are defined like this:

bootargs=console=ttyAMA0,115200 mem=128M root=/dev/nfs ip=192.168.1.10:192.168.1.1:192.168.1.1:255.255.255.0 nfsroot=192.168.1.1:/home/guest/armv7/target

In fact, rootpath is also undefined in my case...

But if I get the network info from my DHCP server, the system boots
correctly
(i.e. console=ttyAMA0,115200 mem=128M root=/dev/nfs ip=dhcp).

So why is rootpath undefined with our bootargs?
I guess we screwed something up somehow...

Let's look at it tomorrow.

Ciao,
Arm





* Re: sch_sfb [was: net_sched: mark packet staying on queue too long]
From: Patrick McHardy @ 2011-01-13 18:59 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev, Jesper Dangaard Brouer, David Miller
In-Reply-To: <87bp3kg58i.fsf_-_@trurl.pps.jussieu.fr>

On 13.01.2011 17:04, Juliusz Chroboczek wrote:
>>> Have you looked at the SFB (Stochastic Fair Blue) implementation by
>>> Juliusz Chroboczek?
> 
>>> http://www.pps.jussieu.fr/~jch/software/sfb/
> 
>> I had a closer look at this some time ago and noticed a couple of bugs
>> (e.g. double buffering might be enabled or disabled or the buffers
>> switched while a packet is queued, so on dequeue the wrong buffer will
>> have its queue length decremented) and also found the hashing quite
>> inefficient,
> 
> And you never found the time to drop me a mail on the subject?

Well, I just looked at it out of interest after already having started
my own version. Also I was riding the train without possibility of
communication :)

>> so I've implemented my own version.
> 
> I see.

I took a lot of ideas from your version, and this is also mentioned
in my version. It just seemed easier to start from scratch than to
fully analyze your version and base it on the original paper.
Since to my knowledge you've never attempted an upstream merge this
also seemed like the more polite way. No impoliteness intended.


* [PATCH] vxge: Remember to release firmware after upgrading firmware
From: Jesper Juhl @ 2011-01-13 20:25 UTC (permalink / raw)
  To: netdev
  Cc: linux-kernel, Ramkrishna Vepa, Jon Mason, Sivakumar Subramani,
	Sreenivasa Honnur

Regardless of whether the firmware update being performed by 
vxge_fw_upgrade() is a success or not we must still remember to always 
release_firmware() before returning.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 vxge-main.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
index 1ac9b56..c81a651 100644
--- a/drivers/net/vxge/vxge-main.c
+++ b/drivers/net/vxge/vxge-main.c
@@ -4120,6 +4120,7 @@ int vxge_fw_upgrade(struct vxgedev *vdev, char *fw_name, int override)
 	       "hotplug event.\n");
 
 out:
+	release_firmware(fw);
 	return ret;
 }
 


-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.



* Re: [PATCH] ipv4: devconf: start IPV4_DEVCONF_* from 0
From: David Miller @ 2011-01-13 20:28 UTC (permalink / raw)
  To: lucian.grijincu
  Cc: netdev, kuznet, pekkas, jmorris, yoshfuji, kaber, opurdila,
	ddvlad
In-Reply-To: <AANLkTi=jRKjxCJ+LitX5TaqxWakj39DM37hGUw_rJG4H@mail.gmail.com>

From: Lucian Adrian Grijincu <lucian.grijincu@gmail.com>
Date: Thu, 13 Jan 2011 14:23:41 +0200

> On Thu, Jan 13, 2011 at 12:02 PM, Thomas Graf <tgraf@infradead.org> wrote:
>> On Thu, Jan 13, 2011 at 09:50:14AM +0200, Lucian Adrian Grijincu wrote:
>>> Yes it works, but there does not seem to be a good reason why to
>>> complicate things like this (again the sentinel nature of zero is not
>>> used in any place here).
>>
>> I have no objections to changing this at all but we don't gain much either.
> 
> 
> Should I post a new patch in which IFLA_INET_CONF cfgid values start
> from 0 also or must the ABI for libnl be kept as-is (counting from 1)?

I think the conclusion is that we won't apply your change, sorry.

There is almost zero upside, and one known downside in that we have
to deal with this user visible breakage somehow.


* Re: [patch v2] phonet: some signedness bugs
From: David Miller @ 2011-01-13 20:30 UTC (permalink / raw)
  To: remi.denis-courmont; +Cc: error27, dan.j.rosenberg, netdev, kernel-janitors
In-Reply-To: <201101131432.58059.remi.denis-courmont@nokia.com>

From: "Rémi Denis-Courmont" <remi.denis-courmont@nokia.com>
Date: Thu, 13 Jan 2011 14:32:57 +0200

> On Tuesday 11 January 2011 02:06:20 ext David Miller, you wrote:
>> From: Dan Carpenter <error27@gmail.com>
>> Date: Mon, 10 Jan 2011 17:06:58 +0300
>> 
>> > Dan Rosenberg pointed out that there were some signed comparison bugs
>> > in the phonet protocol.
>> > 
>> > http://marc.info/?l=full-disclosure&m=129424528425330&w=2
>> > 
>> > The problem is that we check for array overflows but "protocol" is
>> > signed and we don't check for array underflows.  If you already
>> > have CAP_SYS_ADMIN then you could use the bugs to get root, or someone
>> > could cause an oops by mistake.
>> > 
>> > Signed-off-by: Dan Carpenter <error27@gmail.com>
>> 
>> Applied.
> 
> Shouldn't this be sent to stable trees?

It will be.
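
The class of bug described in the quoted commit message can be sketched in a
few lines of userspace C. This is an illustration under stated assumptions,
not the phonet fix itself: the table, PROTO_MAX, and the function names are
hypothetical. It shows why an upper-bound check alone accepts a negative
signed index, while the same comparison on an unsigned index rejects it.

```c
#include <assert.h>
#include <stddef.h>

#define PROTO_MAX 4	/* hypothetical table size */

static const char *proto_names[PROTO_MAX] = { "a", "b", "c", "d" };

/* Buggy shape: only the upper bound is checked, so -1 passes */
static int index_ok_signed(int protocol)
{
	return protocol < PROTO_MAX;
}

/* Safer shape: an unsigned index cannot be negative */
static int index_ok_unsigned(unsigned int protocol)
{
	return protocol < PROTO_MAX;
}

/* Look up an entry, returning NULL for any out-of-range index */
static const char *proto_lookup(unsigned int protocol)
{
	if (!index_ok_unsigned(protocol))
		return NULL;
	assert(proto_names[protocol] != NULL);	/* table is fully populated */
	return proto_names[protocol];
}
```

A value of -1 converted to unsigned int becomes UINT_MAX, so the single
comparison covers both ends of the range.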


* Re: [PATCH net-next-2.6] etherdevice.h: Add is_unicast_ether_addr function
From: David Miller @ 2011-01-13 20:35 UTC (permalink / raw)
  To: cmetcalf; +Cc: eric.dumazet, joe, tklauser, netdev
In-Reply-To: <4D2F2A92.7020909@tilera.com>

From: Chris Metcalf <cmetcalf@tilera.com>
Date: Thu, 13 Jan 2011 11:38:42 -0500

> So, I like Tobias' reworked patches.  I can take them into the Tilera tree,
> but I'd prefer David Miller take them into the net tree if he is agreeable,
> since it now includes changes to generic networking code.  If you take the
> latter approach you can include my:

I'll take them into my networking tree, which is where all network
device changes ought to go in the first place.


* RE: [PATCH] vxge: Remember to release firmware after upgrading firmware {nodisc}
From: Ramkrishna Vepa @ 2011-01-13 20:35 UTC (permalink / raw)
  To: Jesper Juhl, netdev@vger.kernel.org
  Cc: linux-kernel@vger.kernel.org, Jon Mason, Sivakumar Subramani,
	Sreenivasa Honnur
In-Reply-To: <alpine.LNX.2.00.1101132119080.11347@swampdragon.chaosbits.net>

> Regardless of whether the firmware update being performed by
> vxge_fw_upgrade() is a success or not we must still remember to always
> release_firmware() before returning.
> 
> Signed-off-by: Jesper Juhl <jj@chaosbits.net>
> ---
>  vxge-main.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/net/vxge/vxge-main.c b/drivers/net/vxge/vxge-main.c
> index 1ac9b56..c81a651 100644
> --- a/drivers/net/vxge/vxge-main.c
> +++ b/drivers/net/vxge/vxge-main.c
> @@ -4120,6 +4120,7 @@ int vxge_fw_upgrade(struct vxgedev *vdev, char
> *fw_name, int override)
>  	       "hotplug event.\n");
> 
>  out:
> +	release_firmware(fw);
>  	return ret;
>  }
Thanks!

Acked-by: Ram Vepa <ram.vepa@exar.com>
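
The single-exit cleanup pattern that the patch relies on can be sketched in
userspace as follows. This is a hedged illustration only: blob_request(),
blob_release() and do_upgrade() are hypothetical stand-ins, and the sketch
assumes a release helper that tolerates a NULL argument in the same way
free() does.

```c
#include <assert.h>
#include <stdlib.h>

static int blobs_live;	/* number of blobs requested but not yet released */

static void *blob_request(void)
{
	void *p = malloc(16);

	if (p)
		blobs_live++;
	return p;
}

/* Accepts NULL, so it is safe to call on every exit path */
static void blob_release(void *blob)
{
	if (blob) {
		assert(blobs_live > 0);
		free(blob);
		blobs_live--;
	}
}

/* Every path, success or failure, leaves through the out label */
static int do_upgrade(int fail_midway)
{
	void *blob = NULL;
	int ret = 0;

	blob = blob_request();
	if (!blob) {
		ret = -1;
		goto out;
	}
	if (fail_midway) {
		ret = -2;
		goto out;
	}
	/* ... the blob would be used here ... */
out:
	blob_release(blob);
	return ret;
}
```

Because the release sits at the shared label, adding a new early exit later
cannot reintroduce the leak as long as it jumps to out.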


* Re: [PATCH V9 02/13] ntp: add ADJ_SETOFFSET mode bit
From: Kuwahara,T. @ 2011-01-13 20:39 UTC (permalink / raw)
  To: Richard Cochran
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	John Stultz, Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <60566a54842bcf5974d55ed39f387c32ff9cf5cb.1294917348.git.richard.cochran-3mrvs1K0uXizZXS1Dc/lvw@public.gmane.org>

On Thu, Jan 13, 2011 at 8:32 PM, Richard Cochran
<richardcochran-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> This patch adds a new mode bit into the timex structure. When set, the bit
> instructs the kernel to add the given time value to the current time.

The unix time is a nonlinear timescale and there's no way to predict
how many leap seconds will be inserted/deleted during the given time
interval.  So it is impossible to convert relative time to absolute
time and thus your patch is broken. (More specifically, the
timekeeping_inject_offset function is broken.)

My proposal: Limit the adjustable range of the offset so that leap
seconds will never occur more than once. (2147.5 seconds would be the
best choice. :-)


* Re: [PATCH] CHOKe flow scheduler (0.7)
From: Eric Dumazet @ 2011-01-13 20:37 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20110113092706.154748c2@s6510>

On Thursday, 13 January 2011 at 09:27 -0800, Stephen Hemminger wrote:
> +	/* Admit new packet */
> +	if (likely(choke_len(q) < q->limit)) {
> +
> +		q->tab[q->tail] = skb;
> +		q->tail = (q->tail + 1) & q->tab_mask;
> +
> +		sch->qstats.backlog += qdisc_pkt_len(skb);
> +		qdisc_update_bstats(sch, skb);
> +		sch->q.qlen = choke_len(q) - q->holes;
> +		return NET_XMIT_SUCCESS;
> +	}
> +

You now must use qdisc_bstats_update() ;)





* Re: [PATCH 2.6.36] vlan: Avoid hwaccel vlan packets when vid not used
From: Matt Carlson @ 2011-01-13 20:50 UTC (permalink / raw)
  To: Jesse Gross
  Cc: Matthew Carlson, Michael Leun, Michael Chan, Eric Dumazet,
	David Miller, Ben Greear, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <AANLkTimz3Mw5itChtTgHTeGT4seJauy3+FVrpEJn1iez@mail.gmail.com>

On Thu, Jan 13, 2011 at 07:06:22AM -0800, Jesse Gross wrote:
> On Wed, Jan 12, 2011 at 8:21 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> > On Thu, Jan 06, 2011 at 08:36:27PM -0800, Jesse Gross wrote:
> >> On Thu, Jan 6, 2011 at 10:24 PM, Matt Carlson <mcarlson@broadcom.com> wrote:
> >> > On Sat, Dec 18, 2010 at 07:38:00PM -0800, Jesse Gross wrote:
> >> >> On Tue, Dec 14, 2010 at 11:16 PM, Michael Leun
> >> >> <lkml20101129@newton.leun.net> wrote:
> >> >> > OK - all tests done on that DL320G5:
> >> >> >
> >> >> > For completeness, 2.6.37-rc5 unpatched:
> >> >> >
> >> >> > eth0, no vlan configured: totally broken - see double tagged vlans
> >> >> > without tag, single or untagged packets missing at all
> >> >>
> >> >> Random behavior?  This one is somewhat hard to explain - maybe there
> >> >> are some other factors.  eth0 has ASF on, so it always strips tags.  I
> >> >> would expect it to behave like the vlan configured case.
> >> >>
> >> >> >
> >> >> > eth0, vlan configured: see packets without vlan tag (see double tagged
> >> >> > packets with one vlan tag)
> >> >>
> >> >> Both ASF and vlan group configured cause tag stripping to be enabled.
> >> >> Missing tag.
> >> >>
> >> >> >
> >> >> > eth1 same as originally reported:
> >> >> > without vlan configured see vlan tags (single and double tagged as
> >> >> > expected)
> >> >>
> >> >> No ASF and no vlan group means tag stripping is disabled.  Have tag.
> >> >>
> >> >> > with vlan configured: see packets without vlan tag (see double tagged
> >> >> > packets with one vlan tag)
> >> >>
> >> >> Configuring vlan group causes stripping to be enabled.  Missing tag.
> >> >>
> >> >> >
> >> >> >
> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch:
> >> >> >
> >> >> > eth0, no vlan configured: see packets without vlan tag (see double
> >> >> > tagged packets with one vlan tag)
> >> >>
> >> >> ASF enables tag stripping.  Missing tag.
> >> >>
> >> >> > eth1, no vlan configured: see vlan tags (single and double tagged as
> >> >> > expected)
> >> >>
> >> >> No ASF, no vlan group means no stripping.  Have tag.
> >> >>
> >> >> >
> >> >> >
> >> >> > eth0, vlan configured: as without vlan
> >> >>
> >> >> ASF enables stripping.  Missing tag.
> >> >>
> >> >> > eth1, vlan configured: as without vlan
> >> >>
> >> >> With this patch vlan stripping is only enabled when ASF is on, so no
> >> >> stripping.  Have tag.
> >> >>
> >> >> >
> >> >> > 2.6.37-rc5, your tg3 use new vlan-code patch with test patch ontop
> >> >> >
> >> >> > eth1 no vlan configured: see packets without vlan tag (see double tagged
> >> >> > packets with one vlan tag)
> >> >>
> >> >> With the second patch, vlan stripping is always enabled.  Missing tag.
> >> >>
> >> >> > eth1 with vlan: the same
> >> >>
> >> >> Stripping still always enabled.  Missing tag.
> >> >>
> >> >> The bottom line is whenever vlan stripping is enabled we're missing
> >> >> the outer tag.  It might be worth adding some debugging in the area
> >> >> before napi_gro_receive/vlan_gro_receive (depending on version).  My
> >> >> guess is that (desc->type_flags & RXD_FLAG_VLAN) is false even for
> >> >> vlan packets on this NIC.
> >> >>
> >> >> You said that everything works on the 5752?  Matt, is it possible that
> >> >> the 5714 either has a problem with vlan stripping or a different way
> >> >> of reporting it?
> >> >
> >> > I don't think this is a 5714 specific issue.  I think the problem is
> >> > rooted in the fact that the VLAN tag stripping is enabled.
> >>
> >> It's definitely related to vlan stripping being enabled.  Other cards
> >> using tg3 seem to work fine with stripping though, which is why I
> >> thought it might be specific to the 5714.
> >
> > I just tested this on a 5714S, using a net-next-2.6 snapshot obtained
> > today.  It does the right thing in both cases (2nd tg3 patch omitted /
> > applied).  The tag is always visible in the packet stream as seen from
> > tcpdump.
> >
> >> > Your RXD_FLAG_VLAN idea sounds unlikely to me, but it's worth a check.
> >> >
> >> > The patch here is using __vlan_hwaccel_put_tag(), which informs the
> >> > stack a VLAN tag is present.  If this is indeed a reporting problem, I'm
> >> > not sure what else the driver should be doing.
> >>
> >> The code to hand off the tag to the stack looks OK to me.  Michael was
> >> seeing this on older versions of the kernel as well with this NIC,
> >> which predates both this patch and the larger vlan changes so it
> >> doesn't seem like a problem with passing the tag to the network stack.
> >> It's hard to know exactly what is going on though without seeing what
> >> the hardware is reporting.
> >
> > When RX_MODE_KEEP_VLAN_TAG is set, the RXD_FLAG_VLAN flag will not be set
> > when receiving a packet.  The driver skips the __vlan_hwaccel_put_tag()
> > call.
> >
> > When RX_MODE_KEEP_VLAN_TAG is unset, the RXD_FLAG_VLAN flag is set, and
> > __vlan_hwaccel_put_tag() is called to reinject the packet.
> 
> OK, thanks for testing it out.  I'm not sure that there's anything
> more we can do without hearing from Michael.

In the meantime, I think what we have should go upstream.  Just to be
absolutely clear though, your position is that VLAN tags should always
be stripped?



* [PATCH] Even Batman should not dereference NULL pointers
From: Jesper Juhl @ 2011-01-13 20:53 UTC (permalink / raw)
  To: b.a.t.m.a.n
  Cc: netdev, linux-kernel, Marek Lindner, Simon Wunderlich,
	Sven Eckelmann, David S. Miller

There's a problem in net/batman-adv/unicast.c::frag_send_skb().
dev_alloc_skb() allocates memory and may fail, thus returning NULL. If 
this happens we'll pass a NULL pointer on to skb_split() which in turn 
hands it to skb_split_inside_header() from where it gets passed to 
skb_put() that lets skb_tail_pointer() play with it and that function 
dereferences it. And thus the bat dies.

While I was at it I also moved the call to dev_alloc_skb() above the 
assignment to 'unicast_packet' since there's no reason to do that 
assignment if the memory allocation fails.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
---
 unicast.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/batman-adv/unicast.c b/net/batman-adv/unicast.c
index dc2e28b..ee41fef 100644
--- a/net/batman-adv/unicast.c
+++ b/net/batman-adv/unicast.c
@@ -229,10 +229,12 @@ int frag_send_skb(struct sk_buff *skb, struct bat_priv *bat_priv,
 	if (!bat_priv->primary_if)
 		goto dropped;

-	unicast_packet = (struct unicast_packet *) skb->data;
+	frag_skb = dev_alloc_skb(data_len - (data_len / 2) + ucf_hdr_len);
+	if (!frag_skb)
+		goto dropped;

+	unicast_packet = (struct unicast_packet *) skb->data;
 	memcpy(&tmp_uc, unicast_packet, uc_hdr_len);
-	frag_skb = dev_alloc_skb(data_len - (data_len / 2) + ucf_hdr_len);
 	skb_split(skb, frag_skb, data_len / 2);

 	if (my_skb_head_push(skb, ucf_hdr_len - uc_hdr_len) < 0 ||

-- 
Jesper Juhl <jj@chaosbits.net>            http://www.chaosbits.net/
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please.
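
The ordering used in the patch above (allocate first, check, and only then
touch anything else) can be sketched in userspace. This is an illustration
under stated assumptions: split_payload() and struct frag_pair are
hypothetical, malloc() stands in for dev_alloc_skb(), and the header is just
zero-filled. The second fragment is sized as data_len - (data_len / 2) plus
the header length, as in the diff.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct frag_pair {
	unsigned char *second;	/* header followed by the second half */
	size_t first_len;	/* payload bytes left in the first fragment */
	size_t second_len;	/* payload bytes moved to the second fragment */
};

/* Returns 0 on success, -1 if the allocation fails; caller frees ->second */
static int split_payload(const unsigned char *data, size_t data_len,
			 size_t hdr_len, struct frag_pair *out)
{
	size_t first_len = data_len / 2;
	size_t second_len = data_len - first_len;
	unsigned char *second;

	assert(data != NULL && out != NULL);

	/* Allocate first, so a failure leaves everything else untouched */
	second = malloc(second_len + hdr_len);
	if (!second)
		return -1;

	memset(second, 0, hdr_len);	/* placeholder header */
	memcpy(second + hdr_len, data + first_len, second_len);

	out->second = second;
	out->first_len = first_len;
	out->second_len = second_len;
	return 0;
}
```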


* Re: [PATCH V9 02/13] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2011-01-13 20:57 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTimHP1OsWauj6O566WgnxVTjMbNg2PK644QcT9Lq-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Fri, 2011-01-14 at 05:39 +0900, Kuwahara,T. wrote:
> My proposal: Limit the adjustable range of the offset so that leap
> seconds will never occur more than once. (2147.5 seconds would be the
> best choice. :-)

2147.5? That's ~36 minutes. 

I think a limit could be a sensible compromise here, but leap seconds
are limited to every six months. So surely a limit of 86400 (one day),
or 2592000 (30 days) would be more reasonable.

thanks
-john
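
For scale: 2147.5 seconds is about 35.8 minutes, one day is 86400 seconds,
and 30 days is 2592000 seconds. A range check of the kind being discussed
might look like the sketch below. It is only an illustration:
offset_within_limit() and SECS_PER_DAY are hypothetical names and are not
part of the posted patch.

```c
#include <assert.h>

#define SECS_PER_DAY 86400L

/* Returns 1 when the magnitude of offset_sec does not exceed limit_sec */
static int offset_within_limit(long offset_sec, long limit_sec)
{
	assert(limit_sec >= 0);
	return offset_sec >= -limit_sec && offset_sec <= limit_sec;
}
```

With a one-day limit, offset_within_limit(2147, SECS_PER_DAY) accepts the
smaller proposed range, while an offset of SECS_PER_DAY + 1 is rejected.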

