Netdev List


* [PATCH 3/6] [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)
From: Jesper Dangaard Brouer @ 2007-09-12 10:14 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Patrick McHardy, David S. Miller, Stephen Hemminger

commit ef065a43b8900fbc0763eac0fa0a9a8a00c8aaa2
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Tue Sep 11 16:17:46 2007 +0200

    [IPROUTE2]: Update pkt_sched.h (to resemble the kernel one)
    
     Extend the tc_ratespec struct with two parameters: 1) "cell_align",
     which allows adjusting the alignment of the rate table; 2) "overhead",
     which allows adding a packet overhead before the lookup in the kernel.
    
     This is done in order to add support for changing the rate table to
     use the upper-boundary L2T (length to time) value. Currently we use the
     lower-boundary, which results in under-estimating the actual bandwidth
     usage.
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 268c515..919af93 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -77,8 +77,8 @@ struct tc_ratespec
 {
 	unsigned char	cell_log;
 	unsigned char	__reserved;
-	unsigned short	feature;
-	short		addend;
+	unsigned short	overhead;
+	short		cell_align;
 	unsigned short	mpu;
 	__u32		rate;
 };


^ permalink raw reply related

* [PATCH 2/6] [NET_SCHED]: Making rate table lookups more flexible
From: Jesper Dangaard Brouer @ 2007-09-12 10:14 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: Patrick McHardy, David S. Miller, Stephen Hemminger

commit 57e993268df114a4270519b1004b8ea8086f671f
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Tue Sep 11 15:44:15 2007 +0200

    [NET_SCHED]: Making rate table lookups more flexible.
    
     This is done in order to add support for changing the rate table to
     use the upper-boundary L2T (length to time) value. Currently we use the
     lower-boundary, which results in under-estimating the actual bandwidth
     usage.
    
     Extend the tc_ratespec struct with two parameters: 1) "cell_align",
     which allows adjusting the alignment of the rate table; 2) "overhead",
     which allows adding a packet overhead before the lookup.
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/include/linux/pkt_sched.h b/include/linux/pkt_sched.h
index 268c515..919af93 100644
--- a/include/linux/pkt_sched.h
+++ b/include/linux/pkt_sched.h
@@ -77,8 +77,8 @@ struct tc_ratespec
 {
 	unsigned char	cell_log;
 	unsigned char	__reserved;
-	unsigned short	feature;
-	short		addend;
+	unsigned short	overhead;
+	short		cell_align;
 	unsigned short	mpu;
 	__u32		rate;
 };
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 4ebd615..a02ec9e 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -307,7 +307,9 @@ drop:
  */
 static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
 {
-	int slot = pktlen;
+	int slot = pktlen + rtab->rate.cell_align + rtab->rate.overhead;
+	if (slot < 0)
+		slot = 0;
 	slot >>= rtab->rate.cell_log;
 	if (slot > 255)
 		return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);


^ permalink raw reply related

* [PATCH 1/6] [NET_SCHED]: Cleanup L2T macros and handle oversized packets
From: Jesper Dangaard Brouer @ 2007-09-12 10:13 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: David S. Miller, Patrick McHardy, Stephen Hemminger

commit a28343c933f6cfc3df1be86e0ebe8d99fa8d5f77
Author: Jesper Dangaard Brouer <hawk@comx.dk>
Date:   Wed Sep 12 10:01:00 2007 +0200

    [NET_SCHED]: Cleanup L2T macros and handle oversized packets
    
    Change the L2T (length to time) macros, in all rate based schedulers, to
    call a common function qdisc_l2t() that does the rate table lookup.
    This function also handles packet sizes that index beyond the end of
    the rate table, which often occurs with TSO enabled.
    
    Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8a67f24..4ebd615 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -302,4 +302,16 @@ drop:
 	return NET_XMIT_DROP;
 }
 
+/* Length to Time (L2T) lookup in a qdisc_rate_table, to determine how
+   long it will take to send a packet given its size.
+ */
+static inline u32 qdisc_l2t(struct qdisc_rate_table* rtab, unsigned int pktlen)
+{
+	int slot = pktlen;
+	slot >>= rtab->rate.cell_log;
+	if (slot > 255)
+		return (rtab->data[255]*(slot >> 8) + rtab->data[slot & 0xFF]);
+	return rtab->data[slot];
+}
+
 #endif
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 6085be5..46deb5f 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -21,8 +21,8 @@
 #include <net/act_api.h>
 #include <net/netlink.h>
 
-#define L2T(p,L)   ((p)->tcfp_R_tab->data[(L)>>(p)->tcfp_R_tab->rate.cell_log])
-#define L2T_P(p,L) ((p)->tcfp_P_tab->data[(L)>>(p)->tcfp_P_tab->rate.cell_log])
+#define L2T(p,L)   qdisc_l2t((p)->tcfp_R_tab, L)
+#define L2T_P(p,L) qdisc_l2t((p)->tcfp_P_tab, L)
 
 #define POL_TAB_MASK     15
 static struct tcf_common *tcf_police_ht[POL_TAB_MASK + 1];
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c
index e38c283..aed2af2 100644
--- a/net/sched/sch_cbq.c
+++ b/net/sched/sch_cbq.c
@@ -175,7 +175,7 @@ struct cbq_sched_data
 };
 
 
-#define L2T(cl,len)	((cl)->R_tab->data[(len)>>(cl)->R_tab->rate.cell_log])
+#define L2T(cl,len)	qdisc_l2t((cl)->R_tab,len)
 
 
 static __inline__ unsigned cbq_hash(u32 h)
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 246a2f9..5e608a6 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -132,10 +132,8 @@ struct htb_class {
 static inline long L2T(struct htb_class *cl, struct qdisc_rate_table *rate,
 			   int size)
 {
-	int slot = size >> rate->rate.cell_log;
-	if (slot > 255)
-		return (rate->data[255]*(slot >> 8) + rate->data[slot & 0xFF]);
-	return rate->data[slot];
+	long result = qdisc_l2t(rate, size);
+	return result;
 }
 
 struct htb_sched {
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 8c2639a..b0d8109 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -115,8 +115,8 @@ struct tbf_sched_data
 	struct qdisc_watchdog watchdog;	/* Watchdog timer */
 };
 
-#define L2T(q,L)   ((q)->R_tab->data[(L)>>(q)->R_tab->rate.cell_log])
-#define L2T_P(q,L) ((q)->P_tab->data[(L)>>(q)->P_tab->rate.cell_log])
+#define L2T(q,L)   qdisc_l2t((q)->R_tab,L)
+#define L2T_P(q,L) qdisc_l2t((q)->P_tab,L)
 
 static int tbf_enqueue(struct sk_buff *skb, struct Qdisc* sch)
 {
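
As a sanity check on the oversized-packet case, the lookup can be
exercised in userspace. This is only a sketch: struct rate_table and
fill_linear() are hypothetical stand-ins and not kernel code, and the
table holds byte counts instead of real transmit times so the result can
be verified by hand.

```c
#include <stdint.h>

/* Userspace stand-in for struct qdisc_rate_table (hypothetical, reduced
 * to the two fields the lookup needs). */
struct rate_table {
	unsigned char cell_log;	/* log2 of the cell size in bytes */
	uint32_t data[256];	/* one entry per cell */
};

/* Same arithmetic as qdisc_l2t() in this patch. */
uint32_t l2t(const struct rate_table *rtab, unsigned int pktlen)
{
	int slot = (int)(pktlen >> rtab->cell_log);

	/* Beyond the table: charge the last entry once per full table
	 * length, plus the entry for the remainder. */
	if (slot > 255)
		return rtab->data[255] * (slot >> 8) + rtab->data[slot & 0xFF];
	return rtab->data[slot];
}

/* Fill the table so that data[i] is the lower cell boundary in bytes
 * (assumption for illustration; tc fills in transmit times). */
void fill_linear(struct rate_table *rtab, unsigned char cell_log)
{
	int i;

	rtab->cell_log = cell_log;
	for (i = 0; i < 256; i++)
		rtab->data[i] = (uint32_t)i << cell_log;
}
```

With cell_log 3 the table covers sizes below 2048 bytes. A 3000-byte
packet lands in slot 375, so it is charged data[255] once plus data[119].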


^ permalink raw reply related

* [PATCH 0/6] NET_SCHED: Rate table fixes
From: Jesper Dangaard Brouer @ 2007-09-12 10:13 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: David S. Miller, Patrick McHardy, Stephen Hemminger

This set of patches aims at fixing an issue with the rate table used
by the rate based schedulers.

Currently we use the lower-boundary value, which results in
under-estimating the actual bandwidth usage.  The patches change
this to use the upper-boundary L2T (length to time) value.

The patches include changes to both the kernel and the iproute2
userspace utility. The kernel changes only add flexibility to allow
userspace to do the rate table alignment. The patches have been split
up into cleanup patches and actual functional changes.

The patches also move the overhead calculation (currently only used
by HTB) into the kernel, which makes it more precise (as it won't
misalign the contents of the rate table).

This should raise some questions:
 1. What does the current/old rate table mapping look like?
 2. What does the new aligned rate table mapping look like?
 3. What happens when only the TC util is changed and used on an old kernel?

Let's look at the layout of the rate tables:

Illustrating the rate table array:
  Legend description
    rtab[x]   : Array index x of rtab[x]
    xmit_sz   : Transmit size contained in rtab[x] (normally transmit time)
    maps[a-b] : Packet sizes from a to b, will map into rtab[x]

(1) Current/old rate table mapping (cell_log:3):
  rtab[0]:=xmit_sz:0  maps[0-7]
  rtab[1]:=xmit_sz:8  maps[8-15]
  rtab[2]:=xmit_sz:16 maps[16-23]
  rtab[3]:=xmit_sz:24 maps[24-31]
  rtab[4]:=xmit_sz:32 maps[32-39]
  rtab[5]:=xmit_sz:40 maps[40-47]
  rtab[6]:=xmit_sz:48 maps[48-55]

The above illustrates that we are using the lower-boundary transmit
size (xmit_sz).

(2) New iproute rate table mapping, with kernel cell_align support.
  rtab[0]:=xmit_sz:8  maps[0-8]
  rtab[1]:=xmit_sz:16 maps[9-16]
  rtab[2]:=xmit_sz:24 maps[17-24]
  rtab[3]:=xmit_sz:32 maps[25-32]
  rtab[4]:=xmit_sz:40 maps[33-40]
  rtab[5]:=xmit_sz:48 maps[41-48]
  rtab[6]:=xmit_sz:56 maps[49-56]

The above illustrates that we are using the upper-boundary transmit
size (xmit_sz) when mapping packet sizes.
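
To make tables (1) and (2) concrete, here is a small userspace sketch of
the slot selection. It is not part of the patches: the function names
are made up, the overhead is taken as 0, and cell_align = -1 is an
assumption inferred from the maps[] ranges in table (2).

```c
/* Slot selection as done by the old lookup: a plain shift. */
int slot_old(unsigned int pktlen, unsigned char cell_log)
{
	return (int)(pktlen >> cell_log);
}

/* Slot selection with cell_align, following the patched qdisc_l2t():
 * apply the alignment, clamp at zero, then shift. */
int slot_aligned(unsigned int pktlen, short cell_align, unsigned char cell_log)
{
	int slot = (int)pktlen + cell_align;

	if (slot < 0)
		slot = 0;
	return slot >> cell_log;
}

/* Transmit size stored in rtab[slot] for table (1). */
unsigned int xmit_sz_lower(int slot, unsigned char cell_log)
{
	return (unsigned int)slot << cell_log;
}

/* Transmit size stored in rtab[slot] for table (2). */
unsigned int xmit_sz_upper(int slot, unsigned char cell_log)
{
	return (unsigned int)(slot + 1) << cell_log;
}
```

For an 8 byte packet with cell_log 3, slot_old() gives 1 while
slot_aligned() with cell_align -1 gives 0, so the packet is charged
xmit_sz 8 in both tables. A 9 byte packet is charged 8 by the old
table and 16 by the new one.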

The interesting question is compatibility.  If an old iproute utility
is used on a new kernel, we simply get the old rate table (lower-bound)
alignment. The more interesting case is a new iproute util on an old
kernel. The table below shows that in this case we use the
upper-bound+1byte.  I believe that this is a good and acceptable
solution.

(3) New TC util on a kernel WITHOUT support for cell_align
  rtab[0]:=xmit_sz:8  maps[0-7]
  rtab[1]:=xmit_sz:16 maps[8-15]
  rtab[2]:=xmit_sz:24 maps[16-23]
  rtab[3]:=xmit_sz:32 maps[24-31]
  rtab[4]:=xmit_sz:40 maps[32-39]
  rtab[5]:=xmit_sz:48 maps[40-47]
  rtab[6]:=xmit_sz:56 maps[48-55]
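
The difference between table (3) and the aligned case can be sketched
as below. This is an illustration only: the helper names are made up,
the overhead is taken as 0, and cell_align = -1 is an assumption, not
something stated above.

```c
/* Bytes charged when a table filled with upper-boundary values is
 * looked up by a kernel that ignores cell_align (plain shift). */
unsigned int charged_without_align(unsigned int pktlen, unsigned char cell_log)
{
	unsigned int slot = pktlen >> cell_log;

	return (slot + 1) << cell_log;
}

/* Same table, looked up by a kernel that honours cell_align = -1. */
unsigned int charged_with_align(unsigned int pktlen, unsigned char cell_log)
{
	int slot = (int)pktlen - 1;

	if (slot < 0)
		slot = 0;
	return ((unsigned int)(slot >> cell_log) + 1) << cell_log;
}
```

With cell_log 3, an 8 byte packet is charged 16 bytes without cell_align
support and 8 bytes with it. Packets that do not sit exactly on a cell
boundary are charged the same in both cases, so the old kernel
over-charges by at most one cell.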

-- 
Med venlig hilsen / Best regards
  Jesper Brouer
  ComX Networks A/S
  Linux Network developer
  Cand. Scient Datalog / MSc.
  Author of http://adsl-optimizer.dk

^ permalink raw reply

* Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Eric Dumazet @ 2007-09-12 10:08 UTC (permalink / raw)
  To: David Miller; +Cc: herbert, netdev
In-Reply-To: <20070912.020525.39165997.davem@davemloft.net>

On Wed, 12 Sep 2007 02:05:25 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Eric Dumazet <dada1@cosmosbay.com>
> Date: Tue, 11 Sep 2007 14:56:13 +0200
> 
> > When the periodic IP route cache flush is done (every 600 seconds on 
> > default configuration), some hosts suffer a lot and eventually trigger
> > the "soft lockup" message.
> > 
> > dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
> > eventually freeing some (less than 1%) of them, while holding the 
> > dst_lock spinlock for the whole scan.
> > 
> > Then it rearms a timer to redo the full thing 1/10 s later...
> > The slowdown can last one minute or so, depending on how active are
> > the tcp sessions.
> > 
> > This second version of the patch converts the processing from a softirq
> > based one to a workqueue.
> > 
> > Even if the list of entries in garbage_list is huge, host is still
> > responsive to softirqs and can make progress.
> > 
> > Instead of resetting gc timer to 0.1 second if one entry was freed in a
> > gc run, we do this if more than 10% of entries were freed.
> 
> I like this patch a lot, some minor fix is needed though:

Thank you

I also spotted a missing static before
DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
 no need to stress Adrian on this :)

> 
> > +		__builtin_prefetch(&next->next, 1, 0);
> 
> Please use prefetch() instead of a direct explicit
> call to a gcc-specific routine :-)

Unfortunately, there is no equivalent for this one.
On my Opterons this gives a nice "prefetchnta"

prefetch(addr) is more like __builtin_prefetch(addr, 0, 3)

I would like to avoid polluting the L2 cache with useless data.

__builtin_prefetch() has been available since gcc 3.1 (2002), so every
platform should support it, as linux-2.6 requires gcc 3.2 at least.
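
To show what the three-argument form looks like in a list walk, here is
a minimal userspace sketch. struct node and count_unreferenced() are
made up for illustration and are not the dst code; it needs gcc or a
compiler that provides the same builtin.

```c
#include <stddef.h>

/* Hypothetical list element, standing in for a garbage list entry. */
struct node {
	struct node *next;
	int refcnt;
};

/* Walk the list and count entries with refcnt == 0, hinting the cache
 * about the following element while the current one is examined. */
size_t count_unreferenced(struct node *head)
{
	size_t n = 0;
	struct node *cur, *next;

	for (cur = head; cur != NULL; cur = next) {
		next = cur->next;
		if (next != NULL) {
			/* rw = 1: expect a write; locality = 0: no need
			 * to keep the line in cache afterwards. */
			__builtin_prefetch(&next->next, 1, 0);
		}
		if (cur->refcnt == 0)
			n++;
	}
	return n;
}
```

The second and third arguments must be compile-time constants. Leaving
them out gives the defaults rw = 0 and locality = 3.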

I guess you are going to tell me to first publish a patch to lkml :)

Thank you

Eric

^ permalink raw reply

* Re: [PATCH 1/2] dgrs: remove from build, config, and maintainer list
From: maximilian attems @ 2007-09-12 10:05 UTC (permalink / raw)
  To: neroden; +Cc: netdev
In-Reply-To: <20070912073920.GA6608@doctormoo.dyndns.org>

On Wed, 12 Sep 2007, Nathanael Nerode wrote:

> From: Nathanael Nerode
> 
> Stop building and configuring driver for Digi RightSwitch, which was 
> never actually sold to anyone, and remove it from MAINTAINERS.
> 
> In response to an investigation into the firmware of the "Digi Rightswitch" 
> driver, Andres Salomon discovered:

search the netdev archive for this month before sending
out duplicate patches.

jgarzik was on the kernel summit, so i'm waiting on his reply
to the complete removal patch.

-- 
maks

^ permalink raw reply

* Re: [PATCH 08/16] net: Make socket creation namespace safe.
From: David Miller @ 2007-09-12 10:04 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1sl5ovolm.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:23:01 -0600

> 
> This patch passes in the namespace a new socket should be created in
> and has the socket code do the appropriate reference counting.  By
> virtue of this all socket create methods are touched.  In addition
> the socket create methods are modified so that they will fail if
> you attempt to create a socket in a non-default network namespace.
> 
> Failing if we attempt to create a socket outside of the default
> network namespace ensures that as we incrementally make the network stack
> network namespace aware we will not export functionality that someone
> has not audited and made certain is network namespace safe.
> This allows us to partially enable network namespaces before all of the
> exotic protocols are supported.
> 
> Any protocol layers I have missed will fail to compile because I now
> pass an extra parameter into the socket creation code.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Patch applied, thanks.

^ permalink raw reply

* Re: [PATCH 07/16] net: Make /proc/net per network namespace
From: David Miller @ 2007-09-12 10:02 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m11wd8x3a3.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:20:36 -0600

> 
> This patch makes /proc/net per network namespace.  It modifies the global
> variables proc_net and proc_net_stat to be per network namespace.
> The proc_net file helpers are modified to take a network namespace argument,
> and all of their callers are fixed to pass &init_net for that argument.
> This ensures that all of the /proc/net files are only visible and
> usable in the initial network namespace until the code behind them
> has been updated to handle multiple network namespaces.
> 
> Making /proc/net per namespace is necessary as at least some files
> in /proc/net depend upon the set of network devices which is per
> network namespace, and even more files in /proc/net have contents
> that are relevant to a single network namespace.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Patch applied, thanks.

^ permalink raw reply

* Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Christoph Hellwig @ 2007-09-12 10:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Herbert Xu, David Miller, netdev@vger.kernel.org
In-Reply-To: <20070911145613.4762c534.dada1@cosmosbay.com>

This looks nice in general, getting things out of softirq context is
always good.

On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
>  #if RT_CACHE_DEBUG >= 2
>  static atomic_t			 dst_total = ATOMIC_INIT(0);
>  #endif
> -static unsigned long dst_gc_timer_expires;
> -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
> -static void dst_run_gc(unsigned long);
> +static struct {
> +	spinlock_t		lock;
> +	struct dst_entry 	*list;
> +	unsigned long		timer_inc;
> +	unsigned long		timer_expires;
> +} dst_garbage = {
> +	.lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
> +	.timer_inc = DST_GC_MAX,
> +};

Can you please get rid of this useless struct?  It just complicates
the code and means we can't use the proper DEFINE_SPINLOCK initializer.

> +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);

This should be static.


^ permalink raw reply

* [PATCH] RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.
From: Steve Wise @ 2007-09-12 10:00 UTC (permalink / raw)
  To: rdreier, sean.hefty; +Cc: netdev, linux-kernel, general

RDMA/CMA: Use neigh_event_send() to initiate neighbour discovery.

Calling arp_send() to initiate neighbour discovery (ND) doesn't do the
full ND protocol.  Namely, it doesn't handle retransmitting the arp
request if it is dropped. The function neigh_event_send() does all this.
Without doing full ND, rdma address resolution fails in the presence of
dropped arp bcast packets.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---

 drivers/infiniband/core/addr.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index c5c33d3..5381c80 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -161,8 +161,7 @@ static void addr_send_arp(struct sockadd
 	if (ip_route_output_key(&rt, &fl))
 		return;

-	arp_send(ARPOP_REQUEST, ETH_P_ARP, rt->rt_gateway, rt->idev->dev,
-		 rt->rt_src, NULL, rt->idev->dev->dev_addr, NULL);
+	neigh_event_send(rt->u.dst.neighbour, NULL);
 	ip_rt_put(rt);
 }

^ permalink raw reply related

* Re: [PATCH 06/16] net: Add a network namespace parameter to struct sock
From: David Miller @ 2007-09-12  9:58 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1wsv0vony.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:21:37 -0600

> 
> Sockets need to get a reference to their network namespace,
> or possibly a simple hold if someone registers on the network
> namespace notifier and will free the sockets when the namespace
> is going to be destroyed.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 05/16] net: Add a network namespace tag to struct net_device
From: David Miller @ 2007-09-12  9:57 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1642kx3e3.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:18:12 -0600

> 
> Please note that network devices do not increase the
> count on the network namespace.  They are inside the network
> namespace and so the network namespace tag is in the nature
> of a back pointer and so getting and putting the network namespace
> is unnecessary.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied to net-2.6.24, thanks.

^ permalink raw reply

* Re: [PATCH 04/16] net: Add a network namespace parameter to tasks
From: David Miller @ 2007-09-12  9:55 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1abrwx3g0.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:17:03 -0600

> 
> This is the network namespace from which all sockets
> and anything else under user control ultimately get their network
> namespace parameters.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied to net-2.6.24

^ permalink raw reply

* Re: [PATCH 03/16] net: Basic network namespace infrastructure.
From: David Miller @ 2007-09-12  9:52 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1ejh8x3ih.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:15:34 -0600

> 
> This is the basic infrastructure needed to support network
> namespaces.  This infrastructure is:
> - Registration functions to support initializing per network
>   namespace data when a network namespace is created or destroyed.
> 
> - struct net.  The network namespace data structure.
>   This structure will grow as variables are made per network
>   namespace but this is the minimal starting point.
> 
> - Functions to grab a reference to the network namespace.
>   I provide both get/put functions that keep a network namespace
>   from being freed.  And hold/release functions serve as weak references
>   and will warn if their count is not zero when the data structure
>   is freed.  Useful for dealing with more complicated data structures
>   like the ipv4 route cache.
> 
> - A list of all of the network namespaces so we can iterate over them.
> 
> - A slab for the network namespace data structure allowing leaks
>   to be spotted.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

I realize there are some discussions about naming and fixing
some races, but I applied this anyways so we can make some
forward progress.

We can make name changes and fixes on top of this initial work.

^ permalink raw reply

* Re: [PATCH 02/16] net: Don't implement dev_ifname32 inline
From: David Miller @ 2007-09-12  9:39 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1ir6kx3mn.fsf_-_@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:13:04 -0600

> 
> The current implementation of dev_ifname makes maintenance difficult
> because updates to the implementation of the ioctl have to be made in two
> places.  So this patch updates dev_ifname32 to do a classic 32/64
> structure conversion and call sys_ioctl like the rest of the
> compat calls do.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied to net-2.6.24, thanks.

^ permalink raw reply

* Re: [PATCH 01/16] appletalk: In notifier handlers convert the void pointer to a netdevice
From: David Miller @ 2007-09-12  9:27 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev, containers
In-Reply-To: <m1myvwx3sf.fsf@ebiederm.dsl.xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Sat, 08 Sep 2007 15:09:36 -0600

> 
> This slightly improves code safety and clarity.
> 
> Later network namespace patches touch this code so this is a
> preliminary cleanup.
> 
> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

Applied to net-2.6.24

^ permalink raw reply

* Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: David Miller @ 2007-09-12  9:05 UTC (permalink / raw)
  To: dada1; +Cc: herbert, netdev
In-Reply-To: <20070911145613.4762c534.dada1@cosmosbay.com>

From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 11 Sep 2007 14:56:13 +0200

> When the periodic IP route cache flush is done (every 600 seconds on 
> default configuration), some hosts suffer a lot and eventually trigger
> the "soft lockup" message.
> 
> dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
> eventually freeing some (less than 1%) of them, while holding the 
> dst_lock spinlock for the whole scan.
> 
> Then it rearms a timer to redo the full thing 1/10 s later...
> The slowdown can last one minute or so, depending on how active are
> the tcp sessions.
> 
> This second version of the patch converts the processing from a softirq
> based one to a workqueue.
> 
> Even if the list of entries in garbage_list is huge, host is still
> responsive to softirqs and can make progress.
> 
> Instead of resetting gc timer to 0.1 second if one entry was freed in a
> gc run, we do this if more than 10% of entries were freed.

I like this patch a lot, some minor fix is needed though:

> +		__builtin_prefetch(&next->next, 1, 0);

Please use prefetch() instead of a direct explicit
call to a gcc-specific routine :-)

^ permalink raw reply

* Re: [PATCH]: xfrm audit calls
From: David Miller @ 2007-09-12  8:57 UTC (permalink / raw)
  To: latten; +Cc: netdev, linux-audit
In-Reply-To: <200709120003.l8C03E4G004949@faith.austin.ibm.com>

From: Joy Latten <latten@austin.ibm.com>
Date: Tue, 11 Sep 2007 19:03:14 -0500

> This patch modifies the current ipsec audit layer
> by breaking it up into purpose driven audit calls.
> 
> So far, the only audit calls made are when adding/deleting
> an SA/policy. It had been discussed to give each
> key manager its own calls to do this, but I found
> there to be much redundancy since they did the exact
> same things, except for how they got auid and sid, so I 
> combined them. The below audit calls can be made by any 
> key manager. Hopefully, this is ok.
> 
> I compiled and tested with CONFIG_AUDITSYSCALLS on and off.
> 
> Signed-off-by: Joy Latten <latten@austin.ibm.com>

Patch applied, thanks!

^ permalink raw reply

* Re: [PATCH 0/2] Clean up owner field in sock_lock_t
From: David Miller @ 2007-09-12  8:45 UTC (permalink / raw)
  To: jheffner; +Cc: netdev
In-Reply-To: <11895336933187-git-send-email-jheffner@psc.edu>

From: John Heffner <jheffner@psc.edu>
Date: Tue, 11 Sep 2007 14:01:31 -0400

> I don't know why the owner field is a (struct sock_iocb *).  I'm assuming
> it's historical.  Can someone check this out?  Did I miss some alternate
> usage?

AIO used it somehow in net/socket.c and I believe there was some
intention to access this sock_iocb deeper in the call chain.

None of that materialized of course :)

> These patches are against net-2.6.24.

Thanks a lot, I'll add these patches.

^ permalink raw reply

* [PATCH 1/2] dgrs: remove from build, config, and maintainer list
From: Nathanael Nerode @ 2007-09-12  7:39 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel

From: Nathanael Nerode

Stop building and configuring driver for Digi RightSwitch, which was 
never actually sold to anyone, and remove it from MAINTAINERS.

In response to an investigation into the firmware of the "Digi Rightswitch" 
driver, Andres Salomon discovered:
>
> Dear Andres:
>
> After further research, we found that this product was killed in place
> and never reached the market.  We would like to request that this not be
> included.  

Since the product never reached market, clearly nobody is using this orphaned 
driver.

Signed-off-by: Nathanael Nerode <neroden@gcc.gnu.org>

---

This is patch 1 of 2 for removing the "Digi Rightswitch" (dgrs).

Patch 2 would be the patch to remove the actual files.  However, that would
be around 400K, which doesn't seem suitable for a mailing list -- and this 
length seems quite unnecessary, given that it would consist solely of full-file 
deletions.  I'm not quite sure what to do about this.  Please advise.

These are the files to be deleted:
./Documentation/networking/dgrs.txt
./drivers/net/dgrs.c
./drivers/net/dgrs.h
./drivers/net/dgrs_asstruct.h
./drivers/net/dgrs_bcomm.h
./drivers/net/dgrs_es4h.h
./drivers/net/dgrs_ether.h
./drivers/net/dgrs_firmware.c (this is the very large one)
./drivers/net/dgrs_i82596.h
./drivers/net/dgrs_plx9060.h

diff -upr linux-2.6.22.6/drivers/net/Kconfig linux-2.6-deleted/drivers/net/Kconfig
--- linux-2.6.22.6/drivers/net/Kconfig	2007-08-31 02:21:01.000000000 -0400
+++ linux-2.6-deleted/drivers/net/Kconfig	2007-09-12 03:28:11.000000000 -0400
@@ -1447,21 +1447,6 @@ config TC35815
 	depends on NET_PCI && PCI && MIPS
 	select MII
 
-config DGRS
-	tristate "Digi Intl. RightSwitch SE-X support"
-	depends on NET_PCI && (PCI || EISA)
-	---help---
-	  This is support for the Digi International RightSwitch series of
-	  PCI/EISA Ethernet switch cards. These include the SE-4 and the SE-6
-	  models.  If you have a network card of this type, say Y and read the
-	  Ethernet-HOWTO, available from
-	  <http://www.tldp.org/docs.html#howto>.  More specific
-	  information is contained in <file:Documentation/networking/dgrs.txt>.
-
-	  To compile this driver as a module, choose M here and read
-	  <file:Documentation/networking/net-modules.txt>.  The module
-	  will be called dgrs.
-
 config EEPRO100
 	tristate "EtherExpressPro/100 support (eepro100, original Becker driver)"
 	depends on NET_PCI && PCI
diff -upr linux-2.6.22.6/drivers/net/Makefile linux-2.6-deleted/drivers/net/Makefile
--- linux-2.6.22.6/drivers/net/Makefile	2007-08-31 02:21:01.000000000 -0400
+++ linux-2.6-deleted/drivers/net/Makefile	2007-09-12 03:28:31.000000000 -0400
@@ -38,7 +38,6 @@ obj-$(CONFIG_CASSINI) += cassini.o
 obj-$(CONFIG_MACE) += mace.o
 obj-$(CONFIG_BMAC) += bmac.o
 
-obj-$(CONFIG_DGRS) += dgrs.o
 obj-$(CONFIG_VORTEX) += 3c59x.o
 obj-$(CONFIG_TYPHOON) += typhoon.o
 obj-$(CONFIG_NE2K_PCI) += ne2k-pci.o 8390.o
diff -upr linux-2.6.22.6/MAINTAINERS linux-2.6-deleted/MAINTAINERS
--- linux-2.6.22.6/MAINTAINERS	2007-08-31 02:21:01.000000000 -0400
+++ linux-2.6-deleted/MAINTAINERS	2007-09-12 03:27:26.000000000 -0400
@@ -1234,12 +1234,6 @@ L:	Eng.Linux@digi.com
 W:	http://www.digi.com
 S:	Orphaned
 
-DIGI RIGHTSWITCH NETWORK DRIVER
-P:	Rick Richardson
-L:	netdev@vger.kernel.org
-W:	http://www.digi.com
-S:	Orphaned
-
 DIRECTORY NOTIFICATION
 P:	Stephen Rothwell
 M:	sfr@canb.auug.org.au

-- 
Nathanael Nerode  <neroden@fastmail.fm>

[Insert famous quote here]

^ permalink raw reply

* Re: [PATCH] Add IP1000A Driver
From: Stephen Hemminger @ 2007-09-12  7:34 UTC (permalink / raw)
  To: 黃建興-Jesse; +Cc: Francois Romieu, jeff, akpm, netdev, jesse
In-Reply-To: <AA68EB0EBA29BA40A06B700C33343EEF0190124E@fileserver.icplus.com.tw>

On Wed, 12 Sep 2007 13:35:43 +0800
黃建興-Jesse <Jesse@icplus.com.tw> wrote:

> 
> > -----Original Message-----
> > From: Stephen Hemminger [mailto:shemminger@linux-foundation.org] 
> > Sent: Tuesday, September 11, 2007 10:42 PM
> > To: Jesse Huang
> > Cc: jeff@garzik.org; akpm@linux-foundation.org; netdev@vger.kernel.org;
> jesse@icplus.com.tw
> > Subject: Re: [PATCH] Add IP1000A Driver
> >
> >
> > Who will be listed as maintainer of this device?
> > A good way to show that is to add an entry to MAINTAINERS file.
> 
> 
> Ok, Should I generate a patch to modify MAINTAINERS file?

Yes, can be included with patch or separate, it doesn't matter.

> 
> > + * Current Maintainer:
> > + *
> > + *   Sorbica Shieh.
> > + *   10F, No.47, Lane 2, Kwang-Fu RD.
> > + *   Sec. 2, Hsin-Chu, Taiwan, R.O.C.
> > + *   http://www.icplus.com.tw
> > + *   sorbica@icplus.com.tw
> > + */
> 
> > Names only, no physical addresses please.
> 
> Should I remove those two lines?
> 10F, No.47, Lane 2, Kwang-Fu RD.
> Sec. 2, Hsin-Chu, Taiwan, R.O.C.

It is your option, but many times people and companies move locations
and this gets out of date.

^ permalink raw reply

* Re: RFC: possible NAPI improvements to reduce interrupt rates for low traffic rates
From: Bill Fink @ 2007-09-12  7:04 UTC (permalink / raw)
  To: hadi
  Cc: James Chapman, netdev, davem, jeff, mandeep.baines, ossthema,
	Stephen Hemminger
In-Reply-To: <1189171370.4234.38.camel@localhost>

On Fri, 07 Sep 2007, jamal wrote:

> On Fri, 2007-07-09 at 10:31 +0100, James Chapman wrote:
> > Not really. I used 3-year-old, single CPU x86 boxes with e100 
> > interfaces. 
> > The idle poll change keeps them in polled mode. Without idle 
> > poll, I get twice as many interrupts as packets, one for txdone and one 
> > for rx. NAPI is continuously scheduled in/out.
> 
> Certainly faster than the machine in the paper (which was about 2 years
> old in 2005).
> I could never get ping -f to do that for me - so things must be getting
> worse with newer machines then.
> 
> > No. Since I did a flood ping from the machine under test, the improved 
> > latency meant that the ping response was handled more quickly, causing 
> > the next packet to be sent sooner. So more packets were transmitted in 
> > the allotted time (10 seconds).
> 
> ok.
> 
> > With current NAPI:
> > rtt min/avg/max/mdev = 0.902/1.843/101.727/4.659 ms, pipe 9, ipg/ewma 
> > 1.611/1.421 ms
> > 
> > With idle poll changes:
> > rtt min/avg/max/mdev = 0.898/1.117/28.371/0.689 ms, pipe 3, ipg/ewma 
> > 1.175/1.236 ms
> 
> Not bad in terms of latency. The deviation certainly looks better.
> 
> > But the CPU has done more work. 
> 
> I am going to be the devil's advocate[1]:

So let me be the angel's advocate.  :-)

> If the problem i am trying to solve is "reduce cpu use at lower rate",
> then this is not the right answer because your cpu use has gone up.
> Your latency numbers have not improved that much (looking at the avg)
> and your throughput is not that much higher. Will i be willing to pay
> more cpu (of an already piggish cpu use by NAPI at that rate with 2
> interrupts per packet)?

I view his results much more favorably.  With current NAPI, the average
RTT is 104% higher than the minimum, the deviation is 4.659 ms, and the
maximum RTT is 101.727 ms.  With his patch, the average RTT is only 24%
higher than the minimum, the deviation is only 0.689 ms, and the maximum
RTT is 28.371 ms.  The average RTT improved by 39%, the deviation was
6.8 times smaller, and the maximum RTT was 3.6 times smaller.  So in
every respect the latency was significantly better.

The throughput increased from 6200 packets to 8510 packets or an increase
of 37%.  The only negative is that the CPU utilization increased from
62% to 100% or an increase of 61%, so the CPU increase was greater than
the increase in the amount of work performed (17.6% greater than what
one would expect purely from the increased amount of work).
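
The ratios in the two paragraphs above can be re-derived from the figures quoted in the thread. The following is only a minimal sketch: the dictionary layout and the helper names (`pct_increase`, `compare`) are made up for illustration, and the input numbers are copied from the ping output and the packet/CPU figures in this message.

```python
# Figures quoted in the thread: ping -f RTT statistics (ms), packets sent
# in the 10 second run, and CPU utilization (%).
current = {"min": 0.902, "avg": 1.843, "max": 101.727, "mdev": 4.659,
           "packets": 6200, "cpu": 62.0}
patched = {"min": 0.898, "avg": 1.117, "max": 28.371, "mdev": 0.689,
           "packets": 8510, "cpu": 100.0}


def pct_increase(old, new):
    """Percentage by which `new` exceeds `old`."""
    return (new - old) / old * 100.0


def compare(cur, new):
    """Derive the comparison metrics discussed above (hypothetical helper)."""
    work = new["packets"] / cur["packets"]   # ratio of packets handled
    cpu = new["cpu"] / cur["cpu"]            # ratio of CPU used
    return {
        "avg_over_min_current_pct": pct_increase(cur["min"], cur["avg"]),
        "avg_over_min_patched_pct": pct_increase(new["min"], new["avg"]),
        "avg_rtt_improvement_pct": -pct_increase(cur["avg"], new["avg"]),
        "mdev_ratio": cur["mdev"] / new["mdev"],
        "max_ratio": cur["max"] / new["max"],
        "throughput_increase_pct": (work - 1.0) * 100.0,
        "cpu_increase_pct": (cpu - 1.0) * 100.0,
        # CPU growth beyond what the extra packets alone would explain.
        "cpu_excess_over_work_pct": (cpu / work - 1.0) * 100.0,
    }


if __name__ == "__main__":
    for name, value in compare(current, patched).items():
        print(f"{name:28s} {value:8.2f}")
```

The last entry divides the CPU ratio by the throughput ratio, which is one plausible reading of "greater than what one would expect purely from the increased amount of work"; it comes out within rounding of the figure given above.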

You can't always improve on all metrics of a workload.  Sometimes there
are tradeoffs, and it is up to the user to decide between them based on
what's most important to that user and his specific workload.  And the
suggested ethtool option (defaulting to current behavior) would enable
the user to make that decision.

						-Bill

P.S.  I agree that some tests run in parallel with some CPU hogs also
      running might be beneficial and enlightening.

^ permalink raw reply

* Re: sh: add support for ax88796 and 93cx6 to highlander boards
From: Magnus Damm @ 2007-09-12  5:21 UTC (permalink / raw)
  To: Paul Mundt; +Cc: netdev, jgarzik, ben-linux
In-Reply-To: <20070910071528.GB13672@linux-sh.org>

On 9/10/07, Paul Mundt <lethal@linux-sh.org> wrote:
> On Mon, Sep 10, 2007 at 03:36:26PM +0900, Magnus Damm wrote:
> > --- 0001/drivers/net/Kconfig
> > +++ work/drivers/net/Kconfig  2007-09-06 15:35:41.000000000 +0900
> > @@ -218,13 +218,20 @@ source "drivers/net/arm/Kconfig"
> >
> >  config AX88796
> >       tristate "ASIX AX88796 NE2000 clone support"
> > -     depends on ARM || MIPS
> > +     depends on ARM || MIPS || SUPERH
> >       select CRC32
> >       select MII
> >       help
> >         AX88796 driver, using platform bus to provide
> >         chip detection and resources
> >
> > +config AX88796_93CX6
> > +     bool "ASIX AX88796 external 93CX6 eeprom support"
> > +     depends on AX88796
> > +     select EEPROM_93CX6
> > +     help
> > +       Select this if your platform comes with an external 93CX6 eeprom.
> > +
> >  config MACE
> >       tristate "MACE (Power Mac ethernet) support"
> >       depends on PPC_PMAC && PPC32
>
> There are two different changes here, these should probably be split up
> and applied independently of each other, given that there's no real
> dependency between them.

Sure. I hope to first get some feedback regarding the AX88796-specific
parts of the patch, then I'll split it up and repost. Getting this one
included in 2.6.24 would be nice if possible.

Thanks,

/ magnus

^ permalink raw reply

* Re: [PATCH] Configurable tap interface MTU
From: Herbert Xu @ 2007-09-12  3:29 UTC (permalink / raw)
  To: Ed Swierk; +Cc: netdev, maxk, linux-kernel
In-Reply-To: <e6c711510709110742v4d2da644ha63b0a6d47c1230b@mail.gmail.com>

Ed Swierk <eswierk@arastra.com> wrote:
> 
> The patch caps the MTU somewhat arbitrarily at 16000 bytes. This is
> slightly lower than the value used by the e1000 driver, so it seems
> like a safe upper limit.

Please make it 65535 without an Ethernet header and 65521
with an Ethernet header.
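
The two limits differ by exactly 14 bytes, the length of an Ethernet header. A minimal sketch of that arithmetic follows, assuming the 65535 ceiling is the largest value of the 16-bit IPv4 total length field; that reading is an assumption and is not stated in the message. The function name `max_mtu` is hypothetical and is not part of the tun/tap driver.

```python
ETH_HLEN = 14            # Ethernet header: 6 (dst) + 6 (src) + 2 (EtherType)
IP_MAX_PACKET = 0xFFFF   # assumed ceiling: 16-bit IPv4 total length field


def max_mtu(has_ethernet_header):
    """Upper MTU bound suggested in the message (hypothetical helper).

    With an Ethernet header in front of each packet, the header's 14 bytes
    are subtracted so that header plus payload still fits in 65535 bytes.
    """
    if has_ethernet_header:
        return IP_MAX_PACKET - ETH_HLEN
    return IP_MAX_PACKET


if __name__ == "__main__":
    print("without Ethernet header:", max_mtu(False))
    print("with Ethernet header:   ", max_mtu(True))
```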

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu 许志壬 <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC 2/3] rfkill: Add support for ultrawideband
From: Inaky Perez-Gonzalez @ 2007-09-12  1:07 UTC (permalink / raw)
  To: Ivo van Doorn; +Cc: Dmitry Torokhov, netdev, linux-wireless
In-Reply-To: <200709081711.50344.IvDoorn@gmail.com>

On Saturday 08 September 2007, Ivo van Doorn wrote:
> This patch will add support for UWB keys to rfkill,
> support for this has been requested by Inaky.
> 
> Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
> CC: Inaky Perez-Gonzalez <inaky@linux.intel.com>

Thanks so much

Acked-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>

^ permalink raw reply
