netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Stochastic Fair Blue queue discipline
@ 2008-04-07 22:37 Juliusz Chroboczek
  2008-04-08  0:04 ` Patrick McHardy
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-07 22:37 UTC (permalink / raw)
  To: netdev

Hello,

The attached is an implementation of the Stochastic Fair Blue queue
discipline for Linux.

I would like to request a review of the following code, and whether it
is worthwile to add the Kconfig bits and submit it for inclusion into
the mainline kernel.

You will find a ready to build tarball on

  http://www.pps.jussieu.fr/~jch/software/files/sch_sfb-20080407.tar.gz
  http://www.pps.jussieu.fr/~jch/software/files/sch_sfb-20080407.tar.gz.asc

and a short description on

  http://www.pps.jussieu.fr/~jch/software/sfb/

Thanks,

                                        Juliusz Chroboczek

diff -urN empty/Makefile sch_sfb/Makefile
--- empty/Makefile	1970-01-01 01:00:00.000000000 +0100
+++ sch_sfb/Makefile	2008-04-08 00:28:43.000000000 +0200
@@ -0,0 +1,11 @@
+KERNEL_SOURCE=/lib/modules/$(shell uname -r)/build
+
+obj-m += sch_sfb.o
+
+all:
+	make -C $(KERNEL_SOURCE) M=$(PWD) modules
+
+clean:
+	make -C $(KERNEL_SOURCE) M=$(PWD) clean
+	-rm -f *~ Module.symvers
+
diff -urN empty/README sch_sfb/README
--- empty/README	1970-01-01 01:00:00.000000000 +0100
+++ sch_sfb/README	2008-04-08 00:28:43.000000000 +0200
@@ -0,0 +1,160 @@
+                    Stochastic Fair Blue for Linux
+                    ******************************
+
+
+Sch_sfb is an implementation of the Stochastic Fair Blue (SFB) queue
+management algorithm for Linux.  SFB is described in
+
+  Stochastic Fair Blue: A Queue Management Algorithm for Enforcing
+  Fairness.  Wu-chang Feng, Dilip D. Kandlur, Debanjan Saha and
+  Kang G. Shin.  Proc. INFOCOM'01. 2001.
+
+SFB aims to keep your queues short while avoiding packet drops,
+maximising link utilisation and preserving fairness between flows.
+SFB will detect flows that do not respond to congestion indications,
+and limit them to a constant share of the link's capacity.
+
+SFB only deals with marking and/or droping packets.  Reordering of
+packets is handled by sfb's child qdisc; by default, this is pfifo,
+meaning that no reordering will happen.  You may want to use something
+like prio or sfq as sfb's child.
+
+Unlike most other fair queueing policies, SFB doesn't actually keep
+per-flow state; it manages flow information using a Bloom filter,
+which means that hash collisions are reduced to a minimum while using
+fairly little memory.
+
+
+Differences from the paper
+**************************
+
+This implementation takes the minimum of the mark probabilities of all
+matching buckets, rather than the maximum, as suggested in the paper.
+Either the paper is wrong, or I don't understand how Bloom filters
+work.
+
+If an aggregate has a very high marking rate, then either it is
+unresponsive, or we are badly congested.  In either case, it is a good
+idea to start dropping packets rather than just marking them.  If pm
+is below 1/2, we mark with probability pm; if pm is larger than 1/2,
+then we drop with probability pm * (2 * pm - 1/2), and mark with
+probability pm * (3/2 - 2 * pm).
+
+
+Quick start:
+************
+
+  tc qdisc add dev eth0 root handle 1: sfb
+
+Sfb only deals with marking/dropping packets, but doesn't reorder
+them.  You may want to put a reordering qdisc below sfb.
+
+  tc qdisc add dev eth0 handle 2: parent 1: prio
+
+Since sfb penalises inelastic flows rather severely, you may want to
+put another qdisc above sfb to bypass sfb for such flows.  And of
+course if you're rate-limiting (e.g. using cbq, tbf or htb) you'll
+want to do that above sfb.
+
+If you have the patched version of tc (see below), you may set
+non-default values and see some of sfb's statistics:
+
+  tc qdisc add dev eth0 root handle 1: sfb target 30 max 50 penalty_rate 100
+  tc -s qdisc show dev eth0
+
+
+Building
+********
+
+  $ cd sch_sfb/
+  $ make KERNEL_SOURCE=/usr/src/linux
+  $ insmod ./sch_sfb.ko
+
+You will also want to patch your tc tool; a patch for current versions
+of iproute2 is in the file iproute-sfb.patch.
+
+
+Parameters
+**********
+
+hash-type: one of ``flow'' (per-flow hashing), ``source'' (hash on the
+source IP address), ``dest'' (hash on the destination address), or
+``source-dest'' (hash on both the source and destination addresses).
+
+hashes HASHES, buckets BUCKETS: the size of the Bloom filter to use.
+The amount of space used is proportional to HASHES x BUCKETS, and the
+per-packet CPU time is roughly proportional to HASHES.
+
+rehash SECS: specifies how often we will switch to a fresh Bloom
+filter and a different set of hash functions.  Since Bloom filters are
+more resistent to hash collisions that hash tables, this may be set to
+a fairly large value.
+
+db SECS: specifies fow how long we will perform double-buffering
+before switching to a new Bloom filter.  This should be long enough to
+make sure that the new filter is ``primed'' before it is used, a few
+tens of seconds should be enough.
+
+limit PACKETS: the hard limit on the number of packets in sfb's queue.
+Set it to some large value, it should never be reached anyway.
+
+target PACKETS: the target per-flow queue size in packets.  sfb will
+try to keep each per-aggregate queue between 0 and target.
+
+max PACKETS: the maximum number of packets queued for a given aggregate.
+It should be very slightly larger than target, in order to absorb
+transient bursts.  Setting this to more than roughly 1.5 times target
+will cause instabilities that Blue is not designed to cope with.
+
+increment FLOAT, decrement FLOAT: the values by which per-flow
+dropping probability is in/decreased on queue under/overflow.  These
+should be a small fraction of a percent, and increment should be a few
+times smaller than decrement.
+
+penalty_rate PPS, penalty_burst PACKETS: when a flow doesn't back off
+in response to congestion indication, sbf will categorise it as
+``non-reactive'' and will rate-limit it.  Penalty-rate specifies the
+total throughput that non-reactive flows are allowed to use; burst is
+the size of the token bucket.  You should set penalty_rate to some
+reasonable fraction of your up-link throughput (the default values are
+probably too small).
+
+
+Statistics
+**********
+
+  $ tc -s qdisc show dev eth1
+  ...
+  qdisc sfb 2: dev eth1 parent 1: hash flow limit 1000 max 30 target 20
+    increment 0.00050 decrement 0.00005 penalty rate 10 burst 20 (600s 60s 8x24)
+   Sent 9317012 bytes 74577 pkt (dropped 3165, overlimits 3235 requeues 58519) 
+   rate 0bit 0pps backlog 0b 0p requeues 58519 
+    earlydrop 2196 ratedrop 0 bucketdrop 969 queuedrop 0 marked 70
+    maxqlen 0 maxprob 0.01494
+
+earlydrop: the number of packets dropped before a per-flow queue was full.
+
+ratedrop: the number of packets dropped because of rate-limiting.
+
+bucketdrop: the number of packets dropped because a per-flow queue was
+full.
+
+queuedrop: the number of packets dropped due to reaching limit.  This
+should normally be 0.
+
+marked: the number of packets marked with ECN.
+
+maxqlen: the length of the longest per-flow (virtual) queue.
+
+maxprob: the maximum per-flow drop probability.  1 means that some
+flows have been detected as non-reactive.
+
+
+If ratedrop is high, you're sending your non-reactive flows through
+sfb; this may not be a good idea.  If bucketdrop is high, you're
+probably sending a lot of aggressive short-lived flows through sfb.
+If queuedrop is not 0, something is wrong.
+
+
+                                        Juliusz Chroboczek
+                                        <jch@pps.jussieu.fr>
diff -urN empty/sch_sfb.c sch_sfb/sch_sfb.c
--- empty/sch_sfb.c	1970-01-01 01:00:00.000000000 +0100
+++ sch_sfb/sch_sfb.c	2008-04-08 00:29:14.000000000 +0200
@@ -0,0 +1,687 @@
+/*
+ * sch_sfb.c    Stochastic Fair Blue
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Juliusz Chroboczek <jch@pps.jussieu.fr>
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/random.h>
+#include <linux/jhash.h>
+#include <net/ip.h>
+#include <net/pkt_sched.h>
+#include <net/inet_ecn.h>
+
+#include "sfb.h"
+
+struct bucket {
+        u16 qlen;
+        u16 pm;
+};
+
+struct sfb_sched_data
+{
+        u16 numhashes, numbuckets;
+        u16 rehash_interval, db_interval;
+        u16 max, target;
+        u16 increment, decrement;
+        u32 limit;
+        u32 penalty_rate, penalty_burst;
+        u32 tokens_avail;
+        psched_time_t rehash_time, token_time;
+
+        u8 hash_type;
+        u8 filter;
+        u8 double_buffering;
+        u32 perturbation[2][MAXHASHES];
+        struct bucket buckets[2][MAXHASHES][MAXBUCKETS];
+	struct Qdisc *qdisc;
+
+        __u32 earlydrop, penaltydrop, bucketdrop, queuedrop, marked;
+};
+
+static unsigned sfb_hash(struct sk_buff *skb, int hash, int filter,
+                         struct sfb_sched_data *q)
+{
+	u32 h, h2;
+        u8 hash_type = q->hash_type;
+
+	switch (skb->protocol) {
+	case __constant_htons(ETH_P_IP):
+	{
+		const struct iphdr *iph = ip_hdr(skb);
+		h = hash_type == SFB_HASH_SOURCE ? 0 : iph->daddr;
+		h2 = hash_type == SFB_HASH_DEST ? 0 : iph->saddr;
+		if (hash_type == SFB_HASH_FLOW) {
+                        h2 ^= iph->protocol;
+                        if(!(iph->frag_off&htons(IP_MF|IP_OFFSET)) &&
+                           (iph->protocol == IPPROTO_TCP ||
+                            iph->protocol == IPPROTO_UDP ||
+                            iph->protocol == IPPROTO_UDPLITE ||
+                            iph->protocol == IPPROTO_SCTP ||
+                            iph->protocol == IPPROTO_DCCP ||
+                            iph->protocol == IPPROTO_ESP)) {
+                                h2 ^= *(((u32*)iph) + iph->ihl);
+                        }
+                }
+		break;
+	}
+	case __constant_htons(ETH_P_IPV6):
+	{
+		struct ipv6hdr *iph = ipv6_hdr(skb);
+		h = hash_type == SFB_HASH_SOURCE ? 0 :
+                        iph->daddr.s6_addr32[1] ^ iph->daddr.s6_addr32[3];
+		h2 = hash_type == SFB_HASH_DEST ? 0 :
+                        iph->saddr.s6_addr32[1] ^ iph->saddr.s6_addr32[3];
+                if (hash_type == SFB_HASH_FLOW) {
+                        h2 ^= iph->nexthdr;
+                        if(iph->nexthdr == IPPROTO_TCP ||
+                           iph->nexthdr == IPPROTO_UDP ||
+                           iph->nexthdr == IPPROTO_UDPLITE ||
+                           iph->nexthdr == IPPROTO_SCTP ||
+                           iph->nexthdr == IPPROTO_DCCP ||
+                           iph->nexthdr == IPPROTO_ESP)
+                                h2 ^= *(u32*)&iph[1];
+                }
+		break;
+	}
+	default:
+		h = skb->protocol;
+                if(hash_type != SFB_HASH_SOURCE)
+                        h ^= (u32)(unsigned long)skb->dst;
+                h2 = hash_type == SFB_HASH_FLOW ?
+                        (u32)(unsigned long)skb->sk : 0;
+	}
+
+        return jhash_2words(h, h2, q->perturbation[filter][hash]) %
+                q->numbuckets;
+
+}
+
+static inline u16 prob_plus(u16 p1, u16 p2)
+{
+        return p1 < SFB_MAX_PROB - p2 ? p1 + p2 : SFB_MAX_PROB;
+}
+
+static inline u16 prob_minus(u16 p1, u16 p2)
+{
+        return p1 > p2 ? p1 - p2 : 0;
+}
+
+static void
+increment_one_qlen(struct sk_buff *skb, int filter, struct sfb_sched_data *q)
+{
+        int i;
+        for(i = 0; i < q->numhashes; i++) {
+                unsigned hash = sfb_hash(skb, i, filter, q);
+                if(q->buckets[filter][i][hash].qlen < 0xFFFF)
+                        q->buckets[filter][i][hash].qlen++;
+        }
+}
+
+static void
+increment_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+        increment_one_qlen(skb, q->filter, q);
+        if(q->double_buffering)
+                increment_one_qlen(skb, !q->filter, q);
+}
+
+static void
+decrement_one_qlen(struct sk_buff *skb, int filter, struct sfb_sched_data *q)
+{
+        int i;
+        for(i = 0; i < q->numhashes; i++) {
+                unsigned hash = sfb_hash(skb, i, filter, q);
+                if(q->buckets[filter][i][hash].qlen > 0)
+                    q->buckets[filter][i][hash].qlen--;
+        }
+}
+
+static void
+decrement_qlen(struct sk_buff *skb, struct sfb_sched_data *q)
+{
+        decrement_one_qlen(skb, q->filter, q);
+        if(q->double_buffering)
+                decrement_one_qlen(skb, !q->filter, q);
+}
+
+static inline void
+decrement_prob(int filter, int bucket, unsigned hash, struct sfb_sched_data *q)
+{
+        q->buckets[filter][bucket][hash].pm =
+                prob_minus(q->buckets[filter][bucket][hash].pm,
+                           q->decrement);
+}
+
+static inline void
+increment_prob(int filter, int bucket, unsigned hash, struct sfb_sched_data *q)
+{
+        q->buckets[filter][bucket][hash].pm =
+                prob_plus(q->buckets[filter][bucket][hash].pm,
+                           q->increment);
+}
+
+static void
+zero_all_buckets(int filter, struct sfb_sched_data *q)
+{
+        int i, j;
+        for(i = 0; i < MAXHASHES; i++) {
+                for(j = 0; j < MAXBUCKETS; j++) {
+                        q->buckets[filter][i][j].pm = 0;
+                        q->buckets[filter][i][j].qlen = 0;
+                }
+        }
+}
+
+static void
+compute_qlen(u16 *qlen_r, u16 *prob_r, struct sfb_sched_data *q)
+{
+        int i, j, filter = q->filter;
+        u16 qlen = 0, prob = 0;
+        for(i = 0; i < q->numhashes; i++) {
+                for(j = 0; j < q->numbuckets; j++) {
+                        if(qlen < q->buckets[filter][i][j].qlen)
+                                qlen = q->buckets[filter][i][j].qlen;
+                        if(qlen < q->buckets[filter][i][j].pm)
+                                prob = q->buckets[filter][i][j].pm;
+                }
+        }
+        *qlen_r = qlen;
+        *prob_r = prob;
+}
+
+
+static void
+init_perturbation(int filter, struct sfb_sched_data *q)
+{
+        get_random_bytes(q->perturbation[filter],
+                         sizeof(q->perturbation[filter]));
+}
+
+static void
+swap_buffers(struct sfb_sched_data *q)
+{
+        q->filter = !q->filter;
+        zero_all_buckets(!q->filter, q);
+        init_perturbation(!q->filter, q);
+        q->double_buffering = 0;
+}
+
+static int rate_limit(struct sk_buff *skb, psched_time_t now,
+                      struct sfb_sched_data* q)
+{
+        if(q->penalty_rate == 0 || q->penalty_burst == 0)
+                return 1;
+
+        if(q->tokens_avail < 1) {
+                psched_tdiff_t age;
+
+                age = psched_tdiff_bounded(now, q->token_time,
+                                           256 * PSCHED_TICKS_PER_SEC);
+                q->tokens_avail =
+                        (age * q->penalty_rate / PSCHED_TICKS_PER_SEC);
+                if(q->tokens_avail > q->penalty_burst)
+                        q->tokens_avail = q->penalty_burst;
+                q->token_time = now;
+                if(q->tokens_avail < 1)
+                        return 1;
+        }
+
+        q->tokens_avail--;
+        return 0;
+}
+
+static int sfb_enqueue(struct sk_buff *skb, struct Qdisc* sch)
+{
+
+	struct sfb_sched_data *q = qdisc_priv(sch);
+        struct Qdisc *child = q->qdisc;
+        psched_time_t now;
+        int filter;
+        u16 minprob = SFB_MAX_PROB;
+        u16 minqlen = (u16)(~0);
+        u32 r;
+        int ret, i;
+
+        now = psched_get_time();
+
+        if(q->rehash_interval > 0) {
+                psched_tdiff_t age;
+                age = psched_tdiff_bounded(now, q->rehash_time,
+                                           q->rehash_interval *
+                                           PSCHED_TICKS_PER_SEC);
+                if(unlikely(age >= q->rehash_interval * PSCHED_TICKS_PER_SEC)) {
+                        swap_buffers(q);
+                        q->rehash_time = now;
+                }
+                if(unlikely(!q->double_buffering && q->db_interval > 0 &&
+                            age >= (q->rehash_interval - q->db_interval) *
+                            PSCHED_TICKS_PER_SEC))
+                        q->double_buffering = 1;
+        }
+
+        filter = q->filter;
+
+        for(i = 0; i < q->numhashes; i++) {
+                unsigned hash = sfb_hash(skb, i, filter, q);
+                if(q->buckets[filter][i][hash].qlen == 0)
+                        decrement_prob(filter, i, hash, q);
+                else if(unlikely(q->buckets[filter][i][hash].qlen >= q->target))
+                        increment_prob(filter, i, hash, q);
+                if(minqlen > q->buckets[filter][i][hash].qlen)
+                        minqlen = q->buckets[filter][i][hash].qlen;
+                if(minprob > q->buckets[filter][i][hash].pm)
+                        minprob = q->buckets[filter][i][hash].pm;
+        }
+
+        if(q->double_buffering) {
+                for(i = 0; i < q->numhashes; i++) {
+                        unsigned hash = sfb_hash(skb, i, !filter, q);
+                        if(q->buckets[!filter][i][hash].qlen == 0)
+                                decrement_prob(!filter, i, hash, q);
+                        else if(unlikely(q->buckets[!filter][i][hash].qlen >=
+                                         q->target))
+                                increment_prob(!filter, i, hash, q);
+                }
+        }
+        
+        if(unlikely(minqlen >= q->max || sch->q.qlen >= q->limit)) {
+                sch->qstats.overlimits++;
+                if(likely(minqlen >= q->max))
+                        q->bucketdrop++;
+                else
+                        q->queuedrop++;
+                goto drop;
+        }
+
+        if(unlikely(minprob >= SFB_MAX_PROB)) {
+                /* Inelastic flow */
+                if(rate_limit(skb, now, q)) {
+                        sch->qstats.overlimits++;
+                        q->penaltydrop++;
+                        goto drop;
+                }
+                goto enqueue;
+        }
+
+        r = net_random() & SFB_MAX_PROB;
+
+        if(unlikely(r < minprob)) {
+                if(unlikely(minprob > SFB_MAX_PROB / 2)) {
+                        /* If we're marking that many packets, then either
+                           this flow is unresponsive, or we're badly congested.
+                           In either case, we want to start dropping packets. */
+                        if(r < (minprob - SFB_MAX_PROB / 2) * 2) {
+                                q->earlydrop++;
+                                goto drop;
+                        }
+                }
+                if(INET_ECN_set_ce(skb)) {
+                        q->marked++;
+                } else {
+                        q->earlydrop++;
+                        goto drop;
+                }
+        }
+
+ enqueue:
+        ret = child->enqueue(skb, child);
+        if(likely(ret == NET_XMIT_SUCCESS)) {
+                sch->q.qlen++;
+                increment_qlen(skb, q);
+		sch->bstats.packets++;
+		sch->bstats.bytes += skb->len;
+                sch->qstats.backlog += skb->len;
+        } else {
+                q->queuedrop++;
+        }
+        return ret;
+
+ drop:
+        qdisc_drop(skb, sch);
+        return NET_XMIT_CN;
+}
+
+static int sfb_requeue(struct sk_buff *skb, struct Qdisc* sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+        struct Qdisc *child = q->qdisc;
+        int ret;
+
+        ret = child->ops->requeue(skb, child);
+        if(unlikely(ret != NET_XMIT_SUCCESS)) {
+                sch->qstats.drops++;
+                return ret;
+        }
+
+        sch->q.qlen++;
+        increment_qlen(skb, q);
+        sch->qstats.backlog += skb->len;
+        sch->qstats.requeues++;
+        return ret;
+}
+
+static struct sk_buff *sfb_dequeue(struct Qdisc* sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+        struct Qdisc *child = q->qdisc;
+	struct sk_buff *skb;
+
+	skb = child->dequeue(q->qdisc);
+
+        if(skb) {
+                sch->q.qlen--;
+		sch->qstats.backlog -= skb->len;
+                decrement_qlen(skb, q);
+        }
+
+        return skb;
+}
+
+static void sfb_reset(struct Qdisc* sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	qdisc_reset(q->qdisc);
+        sch->q.qlen = 0;
+        sch->qstats.backlog = 0;
+        q->filter = 0;
+        q->double_buffering = 0;
+        zero_all_buckets(0, q);
+        zero_all_buckets(1, q);
+        init_perturbation(q->filter, q);
+}
+
+static void sfb_destroy(struct Qdisc *sch)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	qdisc_destroy(q->qdisc);
+        q->qdisc = NULL;
+}
+
+static struct Qdisc *sfb_create_dflt(struct Qdisc *sch, u32 limit)
+{
+	struct Qdisc *q;
+	struct rtattr *rta;
+	int ret;
+
+	q = qdisc_create_dflt(sch->dev, &pfifo_qdisc_ops,
+			      TC_H_MAKE(sch->handle, 1));
+	if (q) {
+		rta = kmalloc(RTA_LENGTH(sizeof(struct tc_fifo_qopt)),
+			      GFP_KERNEL);
+		if (rta) {
+			rta->rta_type = RTM_NEWQDISC;
+			rta->rta_len = RTA_LENGTH(sizeof(struct tc_fifo_qopt));
+			((struct tc_fifo_qopt *)RTA_DATA(rta))->limit = limit;
+
+			ret = q->ops->change(q, rta);
+			kfree(rta);
+
+			if (ret == 0)
+				return q;
+		}
+		qdisc_destroy(q);
+	}
+	return NULL;
+}
+
+static int sfb_change(struct Qdisc *sch, struct rtattr *opt)
+{
+        struct sfb_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *child = NULL;
+	struct rtattr *tb[TCA_SFB_MAX];
+	struct tc_sfb_qopt *ctl;
+        u16 numhashes, numbuckets, rehash_interval, db_interval;
+        u8 hash_type;
+        u32 limit;
+        u16 max, target;
+        u16 increment, decrement;
+        u32 penalty_rate, penalty_burst;
+
+	if (opt == NULL) {
+                numhashes = 6;
+                numbuckets = 32;
+                hash_type = SFB_HASH_FLOW;
+                rehash_interval = 600;
+                db_interval = 60;
+                limit = 0;
+                max = 25;
+                target = 20;
+                increment = (SFB_MAX_PROB + 1000) / 2000;
+                decrement = (SFB_MAX_PROB + 10000) / 20000;
+                penalty_rate = 10;
+                penalty_burst = 20;
+        } else {
+                if(rtattr_parse_nested(tb, TCA_SFB_MAX, opt))
+                        return -EINVAL;
+
+                if (tb[TCA_SFB_PARMS-1] == NULL ||
+                    RTA_PAYLOAD(tb[TCA_SFB_PARMS-1]) < sizeof(*ctl))
+                        return -EINVAL;
+                
+                ctl = RTA_DATA(tb[TCA_SFB_PARMS-1]);
+                numhashes = ctl->numhashes;
+                numbuckets = ctl->numbuckets;
+                rehash_interval = ctl->rehash_interval;
+                db_interval = ctl->db_interval;
+                hash_type = ctl->hash_type;
+                limit = ctl->limit;
+                max = ctl->max;
+                target = ctl->target;
+                increment = ctl->increment;
+                decrement = ctl->decrement;
+                penalty_rate = ctl->penalty_rate;
+                penalty_burst = ctl->penalty_burst;
+        }
+
+        if(numbuckets <= 0 || numbuckets > MAXBUCKETS)
+                numbuckets = MAXBUCKETS;
+        if(numhashes <= 0 || numhashes > MAXHASHES)
+                numhashes = MAXHASHES;
+        if(hash_type >= __SFB_HASH_MAX)
+                hash_type = SFB_HASH_FLOW;
+        if(limit == 0)
+                limit = sch->dev->tx_queue_len ? sch->dev->tx_queue_len : 1;
+
+        child = sfb_create_dflt(sch, limit);
+        if(child == NULL)
+                return -ENOMEM;
+                
+        sch_tree_lock(sch);
+	if(child) {
+		qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
+		qdisc_destroy(xchg(&q->qdisc, child));
+	}
+
+        q->numhashes = numhashes;
+        q->numbuckets = numbuckets;
+        q->rehash_interval = rehash_interval;
+        q->db_interval = db_interval;
+        q->hash_type = hash_type;
+        q->limit = limit;
+        q->increment = increment;
+        q->decrement = decrement;
+        q->max = max;
+        q->target = target;
+        q->penalty_rate = penalty_rate;
+        q->penalty_burst = penalty_burst;
+
+        q->filter = 0;
+        q->double_buffering = 0;
+        zero_all_buckets(0, q);
+        zero_all_buckets(1, q);
+        init_perturbation(q->filter, q);
+
+	sch_tree_unlock(sch);
+
+	return 0;
+}
+
+static int sfb_init(struct Qdisc *sch, struct rtattr *opt)
+{
+      	struct sfb_sched_data *q = qdisc_priv(sch);
+
+        q->qdisc = &noop_qdisc;
+        return sfb_change(sch, opt);
+}
+
+static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct rtattr *opts = NULL;
+	struct tc_sfb_qopt opt = { .numhashes = q->numhashes,
+                                   .numbuckets = q->numbuckets,
+                                   .rehash_interval = q->rehash_interval,
+                                   .db_interval = q->db_interval,
+                                   .hash_type = q->hash_type,
+                                   .limit = q->limit,
+                                   .max = q->max,
+                                   .target = q->target,
+                                   .increment = q->increment,
+                                   .decrement = q->decrement,
+                                   .penalty_rate = q->penalty_rate,
+                                   .penalty_burst = q->penalty_burst,
+        };
+
+	opts = RTA_NEST(skb, TCA_OPTIONS);
+	RTA_PUT(skb, TCA_SFB_PARMS, sizeof(opt), &opt);
+	return RTA_NEST_END(skb, opts);
+
+rtattr_failure:
+	return -1;
+}
+
+static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	struct tc_sfb_xstats st = { .earlydrop = q->earlydrop,
+                                    .penaltydrop = q->penaltydrop,
+                                    .bucketdrop = q->bucketdrop,
+                                    .marked = q->marked};
+
+        compute_qlen(&st.maxqlen, &st.maxprob, q);
+
+	return gnet_stats_copy_app(d, &st, sizeof(st));
+}
+        
+static int sfb_dump_class(struct Qdisc *sch, unsigned long cl,
+			  struct sk_buff *skb, struct tcmsg *tcm)
+{
+        return -ENOSYS;
+}
+
+static int sfb_graft(struct Qdisc *sch, unsigned long arg, struct Qdisc *new,
+		     struct Qdisc **old)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+
+	if (new == NULL)
+		new = &noop_qdisc;
+
+	sch_tree_lock(sch);
+	*old = xchg(&q->qdisc, new);
+	qdisc_tree_decrease_qlen(*old, (*old)->q.qlen);
+	qdisc_reset(*old);
+	sch_tree_unlock(sch);
+	return 0;
+}
+
+static struct Qdisc *sfb_leaf(struct Qdisc *sch, unsigned long arg)
+{
+	struct sfb_sched_data *q = qdisc_priv(sch);
+	return q->qdisc;
+}
+
+static unsigned long sfb_get(struct Qdisc *sch, u32 classid)
+{
+	return 1;
+}
+
+static void sfb_put(struct Qdisc *sch, unsigned long arg)
+{
+	return;
+}
+
+static int sfb_change_class(struct Qdisc *sch, u32 classid, u32 parentid,
+			    struct rtattr **tca, unsigned long *arg)
+{
+	return -ENOSYS;
+}
+
+static int sfb_delete(struct Qdisc *sch, unsigned long cl)
+{
+	return -ENOSYS;
+}
+
+static void sfb_walk(struct Qdisc *sch, struct qdisc_walker *walker)
+{
+	if (!walker->stop) {
+		if (walker->count >= walker->skip)
+			if (walker->fn(sch, 1, walker) < 0) {
+				walker->stop = 1;
+				return;
+			}
+		walker->count++;
+	}
+}
+
+static struct tcf_proto **sfb_find_tcf(struct Qdisc *sch, unsigned long cl)
+{
+	return NULL;
+}
+
+static struct Qdisc_class_ops sfb_class_ops =
+{
+	.graft		=	sfb_graft,
+	.leaf		=	sfb_leaf,
+	.get		=	sfb_get,
+	.put		=	sfb_put,
+	.change		=	sfb_change_class,
+	.delete		=	sfb_delete,
+	.walk		=	sfb_walk,
+	.tcf_chain	=	sfb_find_tcf,
+	.dump		=	sfb_dump_class,
+};
+
+struct Qdisc_ops sfb_qdisc_ops = {
+	.id		=	"sfb",
+	.priv_size	=	sizeof(struct sfb_sched_data),
+	.cl_ops		=	&sfb_class_ops,
+	.enqueue	=	sfb_enqueue,
+	.dequeue	=	sfb_dequeue,
+	.requeue	=	sfb_requeue,
+	.init		=	sfb_init,
+	.reset		=	sfb_reset,
+	.destroy	=	sfb_destroy,
+	.change		=	sfb_change,
+	.dump		=	sfb_dump,
+	.dump_stats	=	sfb_dump_stats,
+	.owner		=	THIS_MODULE,
+};
+
+static int __init sfb_module_init(void)
+{
+	return register_qdisc(&sfb_qdisc_ops);
+}
+
+static void __exit sfb_module_exit(void)
+{
+	unregister_qdisc(&sfb_qdisc_ops);
+}
+
+module_init(sfb_module_init)
+module_exit(sfb_module_exit)
+
+MODULE_DESCRIPTION("Stochastic Fair Blue queue discipline");
+MODULE_AUTHOR("Juliusz Chroboczek");
+MODULE_LICENSE("GPL");
diff -urN empty/sfb.h sch_sfb/sfb.h
--- empty/sfb.h	1970-01-01 01:00:00.000000000 +0100
+++ sch_sfb/sfb.h	2008-04-08 00:28:43.000000000 +0200
@@ -0,0 +1,52 @@
+/*
+ * sfb.h        Stochastic Fair Blue
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Juliusz Chroboczek <jch@pps.jussieu.fr>
+ */
+
+#include <linux/types.h>
+
+#define MAXHASHES 12
+#define MAXBUCKETS 32
+
+enum
+{
+	TCA_SFB_UNSPEC,
+	TCA_SFB_PARMS,
+	__TCA_SFB_MAX,
+};
+
+enum {
+        SFB_HASH_FLOW,
+        SFB_HASH_SOURCE,
+        SFB_HASH_DEST,
+        SFB_HASH_SOURCE_DEST,
+        __SFB_HASH_MAX,
+};
+
+#define TCA_SFB_MAX (__TCA_SFB_MAX - 1)
+
+struct tc_sfb_qopt
+{
+        __u8 hash_type, pad;
+        __u16 numhashes, numbuckets;
+        __u16 rehash_interval, db_interval;
+        __u16 max, target;
+        __u16 increment, decrement;
+        __u16 pad2;
+        __u32 limit;
+        __u32 penalty_rate, penalty_burst;
+};
+
+struct tc_sfb_xstats
+{
+        __u32 earlydrop, penaltydrop, bucketdrop, queuedrop, marked;
+        __u16 maxqlen, maxprob;
+};
+
+#define SFB_MAX_PROB 0xFFFF
diff --git a/tc/Makefile b/tc/Makefile
index 0facc88..2e7cddb 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -13,6 +13,7 @@ TCMODULES += q_tbf.o
 TCMODULES += q_cbq.o
 TCMODULES += q_rr.o
 TCMODULES += q_netem.o
+TCMODULES += q_sfb.o
 TCMODULES += f_rsvp.o
 TCMODULES += f_u32.o
 TCMODULES += f_route.o
diff --git a/tc/q_sfb.c b/tc/q_sfb.c
new file mode 100644
index 0000000..41512db
--- /dev/null
+++ b/tc/q_sfb.c
@@ -0,0 +1,252 @@
+/*
+ * q_red.c	Stochastic Fair Blue.
+ *
+ *		This program is free software; you can redistribute it and/or
+ *		modify it under the terms of the GNU General Public License
+ *		as published by the Free Software Foundation; either version
+ *		2 of the License, or (at your option) any later version.
+ *
+ * Authors:	Juliusz Chroboczek <jch@pps.jussieu.fr>
+ *
+ */
+
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <syslog.h>
+#include <fcntl.h>
+#include <sys/socket.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <string.h>
+
+#include "utils.h"
+#include "tc_util.h"
+
+#include "sfb.h"
+
+static void explain(void)
+{
+	fprintf(stderr,
+                "Usage: ... sfb "
+                "hashes N buckets N hash-type TYPE rehash SECS db SECS\n"
+                "           limit PACKETS max PACKETS target PACKETS\n"
+                "           increment FLOAT decrement FLOAT\n"
+                "           penalty_rate PPS penalty_burst PACKETS\n");
+}
+
+static int get_prob(__u16 *val, const char *arg)
+{
+        double d;
+        char *ptr;
+
+        if(!arg || !*arg)
+                return -1;
+        d = strtod(arg, &ptr);
+        if(!ptr || ptr == arg || d < 0.0 || d > 1.0)
+                return -1;
+        *val = (__u16)(d * SFB_MAX_PROB + 0.5);
+        return 0;
+}
+
+static char *
+hash_type(__u8 val)
+{
+        switch(val) {
+        case SFB_HASH_FLOW: return "flow";
+        case SFB_HASH_SOURCE: return "source";
+        case SFB_HASH_DEST: return "dest";
+        case SFB_HASH_SOURCE_DEST: return "source-dest";
+        default: return "???";
+        }
+}
+
+static int get_hash_type(__u8 *val, const char *arg)
+{
+        if(strcmp(arg, "flow") == 0)
+                *val = SFB_HASH_FLOW;
+        else if(strcmp(arg, "source") == 0)
+                *val = SFB_HASH_SOURCE;
+        else if(strcmp(arg, "dest") == 0)
+                *val = SFB_HASH_DEST;
+        else if(strcmp(arg, "source-dest") == 0)
+                *val = SFB_HASH_SOURCE_DEST;
+        else
+                return -1;
+
+        return 0;
+}
+
+static int sfb_parse_opt(struct qdisc_util *qu, int argc, char **argv,
+                         struct nlmsghdr *n)
+{
+	struct tc_sfb_qopt opt;
+	struct rtattr *tail;
+
+	memset(&opt, 0, sizeof(opt));
+        opt.numhashes = 6;
+        opt.numbuckets = 32;
+        opt.rehash_interval = 600;
+        opt.db_interval = 60;
+        opt.penalty_rate = 10;
+        opt.penalty_burst = 20;
+        opt.increment = (SFB_MAX_PROB + 1000) / 2000;
+        opt.decrement = (SFB_MAX_PROB + 10000) / 20000;
+
+	while (argc > 0) {
+                if (strcmp(*argv, "hashes") == 0) {
+			NEXT_ARG();
+			if (get_u16(&opt.numhashes, *argv, 0)) {
+				fprintf(stderr, "Illegal \"hashes\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "buckets") == 0) {
+			NEXT_ARG();
+			if (get_u16(&opt.numbuckets, *argv, 0)) {
+				fprintf(stderr, "Illegal \"buckets\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "hash-type") == 0) {
+			NEXT_ARG();
+			if (get_hash_type(&opt.hash_type, *argv)) {
+				fprintf(stderr, "Illegal \"hash-type\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "rehash") == 0) {
+			NEXT_ARG();
+			if (get_u16(&opt.rehash_interval, *argv, 0)) {
+				fprintf(stderr, "Illegal \"rehash\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "db") == 0) {
+			NEXT_ARG();
+			if (get_u16(&opt.db_interval, *argv, 0)) {
+				fprintf(stderr, "Illegal \"db\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "limit") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.limit, *argv, 0)) {
+				fprintf(stderr, "Illegal \"limit\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "max") == 0) {
+			NEXT_ARG();
+			if (get_u16(&opt.max, *argv, 0)) {
+				fprintf(stderr, "Illegal \"max\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "target") == 0) {
+			NEXT_ARG();
+			if (get_u16(&opt.target, *argv, 0)) {
+				fprintf(stderr, "Illegal \"target\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "increment") == 0) {
+			NEXT_ARG();
+			if (get_prob(&opt.increment, *argv)) {
+				fprintf(stderr, "Illegal \"increment\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "decrement") == 0) {
+			NEXT_ARG();
+			if (get_prob(&opt.decrement, *argv)) {
+				fprintf(stderr, "Illegal \"decrement\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "penalty_rate") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.penalty_rate, *argv, 0)) {
+				fprintf(stderr, "Illegal \"penalty_rate\"\n");
+				return -1;
+			}
+                } else if (strcmp(*argv, "penalty_burst") == 0) {
+			NEXT_ARG();
+			if (get_u32(&opt.penalty_burst, *argv, 0)) {
+				fprintf(stderr, "Illegal \"penalty_burst\"\n");
+				return -1;
+			}
+		} else {
+			fprintf(stderr, "What is \"%s\"?\n", *argv);
+			explain();
+			return -1;
+		}
+		argc--; argv++;
+	}
+
+        if(opt.max == 0) {
+                if(opt.target >= 1)
+                        opt.max = (opt.target * 5 + 1) / 4;
+                else
+                        opt.max = 25;
+        }
+        if(opt.target == 0)
+                opt.target = (opt.max * 4 + 3) / 5;
+
+	tail = NLMSG_TAIL(n);
+	addattr_l(n, 1024, TCA_OPTIONS, NULL, 0);
+	addattr_l(n, 1024, TCA_SFB_PARMS, &opt, sizeof(opt));
+	tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
+	return 0;
+}
+
+static int sfb_print_opt(struct qdisc_util *qu, FILE *f, struct rtattr *opt)
+{
+    	struct rtattr *tb[__TCA_SFB_MAX];
+	struct tc_sfb_qopt *qopt;
+
+	if(opt == NULL)
+		return 0;
+
+        parse_rtattr_nested(tb, TCA_SFB_MAX, opt);
+	if(tb[TCA_SFB_PARMS] == NULL)
+		return -1;
+	qopt = RTA_DATA(tb[TCA_SFB_PARMS]);
+	if(RTA_PAYLOAD(tb[TCA_SFB_PARMS]) < sizeof(*qopt))
+		return -1;
+
+        fprintf(f,
+                "hash %s limit %d max %d target %d\n"
+                "  increment %.5f decrement %.5f penalty rate %d burst %d "
+                "(%ds %ds %dx%d)",
+                hash_type(qopt->hash_type),
+                qopt->limit, qopt->max, qopt->target,
+                (double)qopt->increment / SFB_MAX_PROB,
+                (double)qopt->decrement / SFB_MAX_PROB,
+                qopt->penalty_rate, qopt->penalty_burst,
+                qopt->rehash_interval, qopt->db_interval,
+                qopt->numhashes, qopt->numbuckets);
+
+        return 0;
+}
+
+static int sfb_print_xstats(struct qdisc_util *qu, FILE *f,
+                            struct rtattr *xstats)
+{
+    struct tc_sfb_xstats *st;
+
+    if(xstats == NULL)
+            return 0;
+
+    if(RTA_PAYLOAD(xstats) < sizeof(*st))
+            return -1;
+
+    st = RTA_DATA(xstats);
+    fprintf(f,
+            "  earlydrop %u penaltydrop %u bucketdrop %u queuedrop %u marked %u\n"
+            "  maxqlen %u maxprob %.5f",
+            st->earlydrop, st->penaltydrop, st->bucketdrop, st->queuedrop,
+            st->marked,
+            st->maxqlen, (double)st->maxprob / SFB_MAX_PROB);
+
+    return 0;
+}
+
+struct qdisc_util sfb_qdisc_util = {
+	.id		= "sfb",
+	.parse_qopt	= sfb_parse_opt,
+	.print_qopt	= sfb_print_opt,
+	.print_xstats	= sfb_print_xstats,
+};

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-07 22:37 [PATCH] Stochastic Fair Blue queue discipline Juliusz Chroboczek
@ 2008-04-08  0:04 ` Patrick McHardy
  2008-04-08  0:56   ` Patrick McHardy
  2008-04-08 15:28   ` Juliusz Chroboczek
  2008-04-08  8:20 ` Andi Kleen
  2008-04-08 16:44 ` Patrick McHardy
  2 siblings, 2 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-08  0:04 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev

Juliusz Chroboczek wrote:
> Hello,
>
> The attached is an implementation of the Stochastic Fair Blue queue
> discipline for Linux.
>
> I would like to request a review of the following code, and whether it
> is worthwile to add the Kconfig bits and submit it for inclusion into
> the mainline kernel.
>
> You will find a ready to build tarball on
>
>   http://www.pps.jussieu.fr/~jch/software/files/sch_sfb-20080407.tar.gz
>   http://www.pps.jussieu.fr/~jch/software/files/sch_sfb-20080407.tar.gz.asc
>
> and a short description on
>
>   http://www.pps.jussieu.fr/~jch/software/sfb/

Do you have some graphs or something showing the effects?
I experienced with SFB myself some time ago and recall
getting huge unfairness towards non-ECN connections.
I can't remember exactly how good it behaved overall.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08  0:04 ` Patrick McHardy
@ 2008-04-08  0:56   ` Patrick McHardy
  2008-04-08 15:28   ` Juliusz Chroboczek
  1 sibling, 0 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-08  0:56 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev

Patrick McHardy wrote:
> Juliusz Chroboczek wrote:
>> Hello,
>>
>> The attached is an implementation of the Stochastic Fair Blue queue
>> discipline for Linux.
>>
>> I would like to request a review of the following code, and whether it
>> is worthwile to add the Kconfig bits and submit it for inclusion into
>> the mainline kernel.
>>
>> You will find a ready to build tarball on
>>
>>   http://www.pps.jussieu.fr/~jch/software/files/sch_sfb-20080407.tar.gz
>>   
>> http://www.pps.jussieu.fr/~jch/software/files/sch_sfb-20080407.tar.gz.asc 
>>
>>
>> and a short description on
>>
>>   http://www.pps.jussieu.fr/~jch/software/sfb/
>
> Do you have some graphs or something showing the effects?
> I experienced with SFB myself some time ago and recall
> getting huge unfairness towards non-ECN connections.
> I can't remember exactly how good it behaved overall.

I just found my old tree and it was just Blue, which might explain
this :) Thanks for posting this, I'll have a closer look at your
patch tommorrow.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-07 22:37 [PATCH] Stochastic Fair Blue queue discipline Juliusz Chroboczek
  2008-04-08  0:04 ` Patrick McHardy
@ 2008-04-08  8:20 ` Andi Kleen
  2008-04-08 12:11   ` Patrick McHardy
  2008-04-08 15:38   ` Juliusz Chroboczek
  2008-04-08 16:44 ` Patrick McHardy
  2 siblings, 2 replies; 23+ messages in thread
From: Andi Kleen @ 2008-04-08  8:20 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev

Juliusz Chroboczek <jch@pps.jussieu.fr> writes:


Have not read everything. General comment you'll likely need to 
do some code style cleanups (read Documentation/CodingStyle) 
Especially indentation should be consistent (one easy way 
to get that would be to run it through Lindent, although you
might need to review it afterwards because Lindent sometimes does
weird things)

> +
> +static unsigned sfb_hash(struct sk_buff *skb, int hash, int filter,
> +                         struct sfb_sched_data *q)
> +{
> +	u32 h, h2;
> +        u8 hash_type = q->hash_type;
> +
> +	switch (skb->protocol) {
> +	case __constant_htons(ETH_P_IP):

I think we have multiple qdiscs now doing very similar such hashes.
Any chance this could be factored out into a common library function 
first?

The nice property of that is that it would make it much easier
to add new protocols later.

> +static int sfb_enqueue(struct sk_buff *skb, struct Qdisc* sch)
> +{
> +
> +	struct sfb_sched_data *q = qdisc_priv(sch);
> +        struct Qdisc *child = q->qdisc;
> +        psched_time_t now;
> +        int filter;
> +        u16 minprob = SFB_MAX_PROB;
> +        u16 minqlen = (u16)(~0);
> +        u32 r;
> +        int ret, i;
> +
> +        now = psched_get_time();

Getting high res time can be actually somewhat expensive on different
platforms. I would recommend to only use it if it's actually needed in
the inelastic flow case below and use jiffies to check for rehashing.

> +
> +        if(q->rehash_interval > 0) {
> +                psched_tdiff_t age;
> +                age = psched_tdiff_bounded(now, q->rehash_time,
> +                                           q->rehash_interval *
> +                                           PSCHED_TICKS_PER_SEC);
> +                if(unlikely(age >= q->rehash_interval * PSCHED_TICKS_PER_SEC)) {
> +                        swap_buffers(q);
> +                        q->rehash_time = now;
> +                }
> +                if(unlikely(!q->double_buffering && q->db_interval > 0 &&
> +                            age >= (q->rehash_interval - q->db_interval) *
> +                            PSCHED_TICKS_PER_SEC))
> +                        q->double_buffering = 1;


/dev/random is a precious resource (some systems have very little true entropy)
and getting that much data from it regularly is not a good idea because
you take it away from other users who really need it (like strong cryptography)

I'm not sure if the rehash is for security or not (some comments on that 
would be good). But in any case it is likely better you only get
the get_random_bytes() entropy once and then use that to init some kind
of RNG (either a secure one if it's for security or some random 
fast one if it's not) and then use that output for rehashing.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08  8:20 ` Andi Kleen
@ 2008-04-08 12:11   ` Patrick McHardy
  2008-04-08 15:39     ` Juliusz Chroboczek
  2008-04-08 15:38   ` Juliusz Chroboczek
  1 sibling, 1 reply; 23+ messages in thread
From: Patrick McHardy @ 2008-04-08 12:11 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Juliusz Chroboczek, netdev

Andi Kleen wrote:
> Juliusz Chroboczek <jch@pps.jussieu.fr> writes:
> 
> 
> Have not read everything. General comment you'll likely need to 
> do some code style cleanups (read Documentation/CodingStyle) 
> Especially indentation should be consistent (one easy way 
> to get that would be to run it through Lindent, although you
> might need to review it afterwards because Lindent sometimes does
> weird things)
> 
>> +
>> +static unsigned sfb_hash(struct sk_buff *skb, int hash, int filter,
>> +                         struct sfb_sched_data *q)
>> +{
>> +	u32 h, h2;
>> +        u8 hash_type = q->hash_type;
>> +
>> +	switch (skb->protocol) {
>> +	case __constant_htons(ETH_P_IP):
> 
> I think we have multiple qdiscs now doing very similar such hashes.
> Any chance this could be factored out into a common library function 
> first?

I'd suggest to use the flow classifier for this. For simplicity
you could attach a default classifier that uses the same keys as
this one.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08  0:04 ` Patrick McHardy
  2008-04-08  0:56   ` Patrick McHardy
@ 2008-04-08 15:28   ` Juliusz Chroboczek
  1 sibling, 0 replies; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-08 15:28 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netdev

> I experienced with SFB myself some time ago and recall
> getting huge unfairness towards non-ECN connections.

I've only done informal checking, but didn't test ECN against non-ECN.
Will do, and I'll let you know the results.

> I just found my old tree and it was just Blue, which might explain this

Blue is supposed to be roughly fair w.r.t. TCP-compliant flows with
similar RTT.  If you see long-term unfairness with Blue at a single
RTT, then something is wrong at the sender.  I'll see if I can repeat
your issues.

Earlier versions of Linux did not reduce their sending rate when they
received ECE and the congestion window was just 1 MSS.  I don't know
if this bug has been fixed.

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08  8:20 ` Andi Kleen
  2008-04-08 12:11   ` Patrick McHardy
@ 2008-04-08 15:38   ` Juliusz Chroboczek
  2008-04-08 16:32     ` Andi Kleen
  1 sibling, 1 reply; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-08 15:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev

> (read Documentation/CodingStyle)

Will do, thanks for the pointer.

>> +static unsigned sfb_hash(struct sk_buff *skb, int hash, int filter,
>> +                         struct sfb_sched_data *q)

> I think we have multiple qdiscs now doing very similar such hashes.
> Any chance this could be factored out into a common library function 
> first?

I'll be glad to do so.  Where shall I put it?

> Getting high res time can be actually somewhat expensive on different
> platforms. I would recommend to only use it if it's actually needed in
> the inelastic flow case below and use jiffies to check for rehashing.

Good point.

> /dev/random is a precious resource (some systems have very little
> true entropy) and getting that much data from it regularly is not
> a good idea because you take it away from other users who really
> need it

In the default configuration, I'm getting 32 bytes every 10 minutes,
which seemed reasonable.  This code was copied from sch_sfq (function
sfq_perturbation), which does the very same thing.

> But in any case it is likely better you only get the
> get_random_bytes() entropy once and then use that to init some kind
> of RNG (either a secure one if it's for security or some random fast
> one if it's not) and then use that output for rehashing.

Could do, but in that case the RNG should be shared between sfq and
sfb.  Shall I simply use net_random (which calls random32) ?

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 12:11   ` Patrick McHardy
@ 2008-04-08 15:39     ` Juliusz Chroboczek
  2008-04-08 15:45       ` Patrick McHardy
  0 siblings, 1 reply; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-08 15:39 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Andi Kleen, netdev

>> I think we have multiple qdiscs now doing very similar such hashes.
>> Any chance this could be factored out into a common library function
>> first?

> I'd suggest to use the flow classifier for this. For simplicity
> you could attach a default classifier that uses the same keys as
> this one.

I'm not sure I know what you're speaking about.  Could you please
point me at code somewhere?

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 15:39     ` Juliusz Chroboczek
@ 2008-04-08 15:45       ` Patrick McHardy
  2008-04-09 15:48         ` Juliusz Chroboczek
  0 siblings, 1 reply; 23+ messages in thread
From: Patrick McHardy @ 2008-04-08 15:45 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Andi Kleen, netdev

Juliusz Chroboczek wrote:
>>> I think we have multiple qdiscs now doing very similar such hashes.
>>> Any chance this could be factored out into a common library function
>>> first?
> 
>> I'd suggest to use the flow classifier for this. For simplicity
>> you could attach a default classifier that uses the same keys as
>> this one.
> 
> I'm not sure I know what you're speaking about.  Could you please
> point me at code somewhere?

Check out a recent tree (either Linus' current tree or ideally
the net-2.6.26.git tree from

git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26.git

and look at net/sched/cls_flow.c and the recent changes to
net/sched/sch_sfq.c for external classifiers. There are also
a number of other incompatible changes that your patch needs
to be adapter to, most importantly the use of the new netlink
infrastructure.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 15:38   ` Juliusz Chroboczek
@ 2008-04-08 16:32     ` Andi Kleen
  2008-04-08 17:35       ` Juliusz Chroboczek
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-04-08 16:32 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Andi Kleen, netdev

On Tue, Apr 08, 2008 at 05:38:47PM +0200, Juliusz Chroboczek wrote:
> > I think we have multiple qdiscs now doing very similar such hashes.
> > Any chance this could be factored out into a common library function 
> > first?
> 
> I'll be glad to do so.  Where shall I put it?

See Patrick's suggestion.

> > But in any case it is likely better you only get the
> > get_random_bytes() entropy once and then use that to init some kind
> > of RNG (either a secure one if it's for security or some random fast
> > one if it's not) and then use that output for rehashing.
> 
> Could do, but in that case the RNG should be shared between sfq and
> sfb.  Shall I simply use net_random (which calls random32) ?

no net_random/srandom32 is per CPU and likely does the wrong thing for you.

There was a new RND being scheduled to be merged with perfmon,
either use that or add a simple own (I assume your needs are not great)

-Andi


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-07 22:37 [PATCH] Stochastic Fair Blue queue discipline Juliusz Chroboczek
  2008-04-08  0:04 ` Patrick McHardy
  2008-04-08  8:20 ` Andi Kleen
@ 2008-04-08 16:44 ` Patrick McHardy
  2 siblings, 0 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-08 16:44 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: netdev

> diff -urN empty/sch_sfb.c sch_sfb/sch_sfb.c
> --- empty/sch_sfb.c	1970-01-01 01:00:00.000000000 +0100
> +++ sch_sfb/sch_sfb.c	2008-04-08 00:29:14.000000000 +0200


I wanted to review this, but the CodingStyle issues are
distracting to me, so I guess I'll wait until you've done
the cleanups suggested by Andi.

If you could make your next patch against a kernel tree
that would be also nice.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 16:32     ` Andi Kleen
@ 2008-04-08 17:35       ` Juliusz Chroboczek
  2008-04-08 17:53         ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-08 17:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev

> no net_random/srandom32 is per CPU and likely does the wrong thing for you.

Why is that?  I'm only using this to initialise a global data
structure, why should it matter that I use per-cpu state?

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 17:35       ` Juliusz Chroboczek
@ 2008-04-08 17:53         ` Andi Kleen
  2008-04-09 15:45           ` Juliusz Chroboczek
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-04-08 17:53 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Andi Kleen, netdev

On Tue, Apr 08, 2008 at 07:35:36PM +0200, Juliusz Chroboczek wrote:
> > no net_random/srandom32 is per CPU and likely does the wrong thing for you.
> 
> Why is that?  I'm only using this to initialise a global data
> structure, why should it matter that I use per-cpu state?

Because they're independent and there is no guarantee you always 
run on the same CPU and sampling them randomly will not necessarily
give you a good random number sequence.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 17:53         ` Andi Kleen
@ 2008-04-09 15:45           ` Juliusz Chroboczek
  2008-04-09 17:49             ` Andi Kleen
  0 siblings, 1 reply; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-09 15:45 UTC (permalink / raw)
  To: Andi Kleen; +Cc: netdev

>> > no net_random/srandom32 is per CPU and likely does the wrong thing for you.

>> Why is that?  I'm only using this to initialise a global data
>> structure, why should it matter that I use per-cpu state?

> Because they're independent and there is no guarantee you always 
> run on the same CPU and sampling them randomly will not necessarily
> give you a good random number sequence.

I'm sorry to be such a pain, but I still don't understand.

Random32 is initialised from get_random_bytes; so the per-cpu
pseudo-random sequences should be uncorrelated.  I fail to see how an
arbitrary interleaving of uncorrelated good pseudo-random sequences
can fail to be good.

Looking at line 448 of sch_sfq.c in Linus' current HEAD, I see that
somebody else thinks the same as I do.  So please let me know if sfq
needs fixed, or whether I can use net_random in sfb.

Thanks,

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-08 15:45       ` Patrick McHardy
@ 2008-04-09 15:48         ` Juliusz Chroboczek
  2008-04-09 16:00           ` Patrick McHardy
  0 siblings, 1 reply; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-09 15:48 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Andi Kleen, netdev

>>> I'd suggest to use the flow classifier for this. For simplicity
>>> you could attach a default classifier that uses the same keys as
>>> this one.

>> I'm not sure I know what you're speaking about.  Could you please
>> point me at code somewhere?

> Check out a recent tree (either Linus' current tree or ideally
> the net-2.6.26.git tree from
>
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26.git
>
> and look at net/sched/cls_flow.c and the recent changes to
> net/sched/sch_sfq.c for external classifiers.

Thanks for the pointer.  Unfortunately, it will need some changes to
work with sfb -- since we do double-buffering, we need some overlap
between the old and new classifiers when we choose to reclassify.  I'm
not quite sure how to do that.

Other than that, it will require some changes to sfb (which currently
assumes we can get an arbitrary amount of hash data for a single
packet, while the classifiers only give 32 bits), but these changes
are a good thing -- 32 bits should be enough.

Would you have an example of how this stuff is configured from
user-space?

> There are also a number of other incompatible changes that your
> patch needs to be adapter to, most importantly the use of the new
> netlink infrastructure.

Done, thanks for the pointer.

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-09 15:48         ` Juliusz Chroboczek
@ 2008-04-09 16:00           ` Patrick McHardy
  0 siblings, 0 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-09 16:00 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Andi Kleen, netdev

Juliusz Chroboczek wrote:
>>>> I'd suggest to use the flow classifier for this. For simplicity
>>>> you could attach a default classifier that uses the same keys as
>>>> this one.
> 
>>> I'm not sure I know what you're speaking about.  Could you please
>>> point me at code somewhere?
> 
>> Check out a recent tree (either Linus' current tree or ideally
>> the net-2.6.26.git tree from
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.26.git
>>
>> and look at net/sched/cls_flow.c and the recent changes to
>> net/sched/sch_sfq.c for external classifiers.
> 
> Thanks for the pointer.  Unfortunately, it will need some changes to
> work with sfb -- since we do double-buffering, we need some overlap
> between the old and new classifiers when we choose to reclassify.  I'm
> not quite sure how to do that.

You mean for perturbation? Yes, thats something not currently
supported. I'd suggest to ignore this for now, I intend
to add support, just need to figure out a way to reliably
kill the timer on cleanup (can't use del_timer_sync since
we're holding a lock). Shouldn't be that hard ...

> Other than that, it will require some changes to sfb (which currently
> assumes we can get an arbitrary amount of hash data for a single
> packet, while the classifiers only give 32 bits), but these changes
> are a good thing -- 32 bits should be enough.

Actually its only (max) 16 bits, the qdisc handle is always
encoded in the upper 16 bits. If you need more you could
combine it with other data. I recall seeing a paper about bloom
filters that analyzed the effectiveness of this compared to
perfect hashing, they concluded that the increase in false
positives is negligible.

I'll see if I can find it again in case you're interested.

> Would you have an example of how this stuff is configured from
> user-space?

I'm using this for per-dst hashing to 1024 classes
(1400:1 - 1400:1025).

tc filter add dev eth0 protocol ip pref 1 parent 1400: handle 1 \
         flow hash keys dst divisor 1024 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-09 15:45           ` Juliusz Chroboczek
@ 2008-04-09 17:49             ` Andi Kleen
  2008-04-10  1:23               ` Patrick McHardy
  0 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2008-04-09 17:49 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Andi Kleen, netdev

> Random32 is initialised from get_random_bytes; so the per-cpu
> pseudo-random sequences should be uncorrelated.  I fail to see how an
> arbitrary interleaving of uncorrelated good pseudo-random sequences
> can fail to be good.

They are not necessarily uncorreleated, especially on platforms
which do have poor entropy support and when your initialization happens
at boot time. Take a look at how the random pool starts in random.c.

> 
> Looking at line 448 of sch_sfq.c in Linus' current HEAD, I see that
> somebody else thinks the same as I do.  So please let me know if sfq
> needs fixed, or whether I can use net_random in sfb.

A lot of people get this wrong, but this doesn't mean that the
problem should be readded in new code again.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-09 17:49             ` Andi Kleen
@ 2008-04-10  1:23               ` Patrick McHardy
  2008-04-10  1:38                 ` Patrick McHardy
  2008-04-10  7:36                 ` Andi Kleen
  0 siblings, 2 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-10  1:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Juliusz Chroboczek, netdev

Andi Kleen wrote:
>> Random32 is initialised from get_random_bytes; so the per-cpu
>> pseudo-random sequences should be uncorrelated.  I fail to see how an
>> arbitrary interleaving of uncorrelated good pseudo-random sequences
>> can fail to be good.
> 
> They are not necessarily uncorreleated, especially on platforms
> which do have poor entropy support and when your initialization happens
> at boot time. Take a look at how the random pool starts in random.c.
> 
>> Looking at line 448 of sch_sfq.c in Linus' current HEAD, I see that
>> somebody else thinks the same as I do.  So please let me know if sfq
>> needs fixed, or whether I can use net_random in sfb.
> 
> A lot of people get this wrong, but this doesn't mean that the
> problem should be readded in new code again.


Well, if I'm not mistaken net_random() used to be a function
(in net/core/utils.c) that didn't have this problem. So these
problems seem to have been introduced by the conversion to
srandom().


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-10  1:23               ` Patrick McHardy
@ 2008-04-10  1:38                 ` Patrick McHardy
  2008-04-10 11:17                   ` Juliusz Chroboczek
  2008-04-10 14:04                   ` Stephen Hemminger
  2008-04-10  7:36                 ` Andi Kleen
  1 sibling, 2 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-10  1:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Juliusz Chroboczek, netdev

Patrick McHardy wrote:
> Andi Kleen wrote:
>>> Random32 is initialised from get_random_bytes; so the per-cpu
>>> pseudo-random sequences should be uncorrelated.  I fail to see how an
>>> arbitrary interleaving of uncorrelated good pseudo-random sequences
>>> can fail to be good.
>>
>> They are not necessarily uncorreleated, especially on platforms
>> which do have poor entropy support and when your initialization happens
>> at boot time. Take a look at how the random pool starts in random.c.
>>
>>> Looking at line 448 of sch_sfq.c in Linus' current HEAD, I see that
>>> somebody else thinks the same as I do.  So please let me know if sfq
>>> needs fixed, or whether I can use net_random in sfb.
>>
>> A lot of people get this wrong, but this doesn't mean that the
>> problem should be readded in new code again.
> 
> 
> Well, if I'm not mistaken net_random() used to be a function
> (in net/core/utils.c) that didn't have this problem. So these
> problems seem to have been introduced by the conversion to
> srandom().

Two more noteworthy things:

- net_random() was intended to provide *mediocre* random,
cheap to compute, not perfect. Good enough for many networking
related things.

- traffic schedulers shouldn't depend on perfect random,
its more about statistical multiplexing.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-10  1:23               ` Patrick McHardy
  2008-04-10  1:38                 ` Patrick McHardy
@ 2008-04-10  7:36                 ` Andi Kleen
  1 sibling, 0 replies; 23+ messages in thread
From: Andi Kleen @ 2008-04-10  7:36 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Andi Kleen, Juliusz Chroboczek, netdev

> Well, if I'm not mistaken net_random() used to be a function
> (in net/core/utils.c) that didn't have this problem. 

Correct it's a regression.

> So these
> problems seem to have been introduced by the conversion to
> srandom().

I think the per CPU ness was generally a mistake, it was a misguided
optimization and hurts

Iff someone really needs per cpu RNDs the better way would have been
to add a rand_r() like interface with explicit state and only do 
that in that particular caller.

Also I think a lot of get_random_bytes() shouldn't really be using it 
because they don't need precious cryptographically strong entropy,
but rather should get their data from another RND which has been
seeded once from the entropy pool only.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-10  1:38                 ` Patrick McHardy
@ 2008-04-10 11:17                   ` Juliusz Chroboczek
  2008-04-11 13:07                     ` Patrick McHardy
  2008-04-10 14:04                   ` Stephen Hemminger
  1 sibling, 1 reply; 23+ messages in thread
From: Juliusz Chroboczek @ 2008-04-10 11:17 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Andi Kleen, netdev

> - traffic schedulers shouldn't depend on perfect random,
> its more about statistical multiplexing.

Okay, I've been thinking about this, and I'm not quite sure that what
sfq and sfb want is a pseudo-random sequence in the first place.

Sfq and sfb only draw random numbers in order to change hashing
functions periodically, so as not to have an innocent flow stuck
within a hash bucket with a non-reactive flow.  For sfb it doesn't
matter much (collisions in a Bloom filter are so rare as to be almost
nonexistent), but for sfq, it is fairly important.

Now assuming jhash is any good, i.e. that a single bit change in the
input changes all output bits with roughly similar probability (and
I don't know whether it is), it shouldn't matter much what sequence of
u32 we use for the perturbation as long as it doesn't repeat values
too often.  Something as simple as a static counter might actually be
good enough.

So I'd argue that the wise thing is to make sure of is that jhash is
good (in the sense above), and not bother too much with the PRNG.

                                        Juliusz

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-10  1:38                 ` Patrick McHardy
  2008-04-10 11:17                   ` Juliusz Chroboczek
@ 2008-04-10 14:04                   ` Stephen Hemminger
  1 sibling, 0 replies; 23+ messages in thread
From: Stephen Hemminger @ 2008-04-10 14:04 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Andi Kleen, Juliusz Chroboczek, netdev

On Thu, 10 Apr 2008 03:38:13 +0200
Patrick McHardy <kaber@trash.net> wrote:

> Patrick McHardy wrote:
> > Andi Kleen wrote:
> >>> Random32 is initialised from get_random_bytes; so the per-cpu
> >>> pseudo-random sequences should be uncorrelated.  I fail to see how an
> >>> arbitrary interleaving of uncorrelated good pseudo-random sequences
> >>> can fail to be good.
> >>
> >> They are not necessarily uncorreleated, especially on platforms
> >> which do have poor entropy support and when your initialization happens
> >> at boot time. Take a look at how the random pool starts in random.c.
> >>
> >>> Looking at line 448 of sch_sfq.c in Linus' current HEAD, I see that
> >>> somebody else thinks the same as I do.  So please let me know if sfq
> >>> needs fixed, or whether I can use net_random in sfb.
> >>
> >> A lot of people get this wrong, but this doesn't mean that the
> >> problem should be readded in new code again.
> > 
> > 
> > Well, if I'm not mistaken net_random() used to be a function
> > (in net/core/utils.c) that didn't have this problem. So these
> > problems seem to have been introduced by the conversion to
> > srandom().
> 
> Two more noteworthy things:
> 
> - net_random() was intended to provide *mediocre* random,
> cheap to compute, not perfect. Good enough for many networking
> related things.
> 
> - traffic schedulers shouldn't depend on perfect random,
> its more about statistical multiplexing.

Anything related to random number seems to bring out the paranoia
in people.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] Stochastic Fair Blue queue discipline
  2008-04-10 11:17                   ` Juliusz Chroboczek
@ 2008-04-11 13:07                     ` Patrick McHardy
  0 siblings, 0 replies; 23+ messages in thread
From: Patrick McHardy @ 2008-04-11 13:07 UTC (permalink / raw)
  To: Juliusz Chroboczek; +Cc: Andi Kleen, netdev

Juliusz Chroboczek wrote:
>> - traffic schedulers shouldn't depend on perfect random,
>> its more about statistical multiplexing.
> 
> Okay, I've been thinking about this, and I'm not quite sure that what
> sfq and sfb want is a pseudo-random sequence in the first place.
> 
> Sfq and sfb only draw random numbers in order to change hashing
> functions periodically, so as not to have an innocent flow stuck
> within a hash bucket with a non-reactive flow.  For sfb it doesn't
> matter much (collisions in a Bloom filter are so rare as to be almost
> nonexistent), but for sfq, it is fairly important.
> 
> Now assuming jhash is any good, i.e. that a single bit change in the
> input changes all output bits with roughly similar probability (and
> I don't know whether it is), it shouldn't matter much what sequence of
> u32 we use for the perturbation as long as it doesn't repeat values
> too often.  Something as simple as a static counter might actually be
> good enough.
> 
> So I'd argue that the wise thing is to make sure of is that jhash is
> good (in the sense above), and not bother too much with the PRNG.


Agreed. jhash gives pretty good distribution, so we should be fine.

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2008-04-11 13:07 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-04-07 22:37 [PATCH] Stochastic Fair Blue queue discipline Juliusz Chroboczek
2008-04-08  0:04 ` Patrick McHardy
2008-04-08  0:56   ` Patrick McHardy
2008-04-08 15:28   ` Juliusz Chroboczek
2008-04-08  8:20 ` Andi Kleen
2008-04-08 12:11   ` Patrick McHardy
2008-04-08 15:39     ` Juliusz Chroboczek
2008-04-08 15:45       ` Patrick McHardy
2008-04-09 15:48         ` Juliusz Chroboczek
2008-04-09 16:00           ` Patrick McHardy
2008-04-08 15:38   ` Juliusz Chroboczek
2008-04-08 16:32     ` Andi Kleen
2008-04-08 17:35       ` Juliusz Chroboczek
2008-04-08 17:53         ` Andi Kleen
2008-04-09 15:45           ` Juliusz Chroboczek
2008-04-09 17:49             ` Andi Kleen
2008-04-10  1:23               ` Patrick McHardy
2008-04-10  1:38                 ` Patrick McHardy
2008-04-10 11:17                   ` Juliusz Chroboczek
2008-04-11 13:07                     ` Patrick McHardy
2008-04-10 14:04                   ` Stephen Hemminger
2008-04-10  7:36                 ` Andi Kleen
2008-04-08 16:44 ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).