Netdev List
 help / color / mirror / Atom feed
* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Andi Kleen @ 2008-01-08 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kevin Winchester, J. Bruce Fields, Al Viro, Arjan van de Ven,
	Linux Kernel Mailing List, Andrew Morton, NetDev
In-Reply-To: <alpine.LFD.1.00.0801071851120.3148@woody.linux-foundation.org>

Linus Torvalds <torvalds@linux-foundation.org> writes:
>
> I usually just compile a small program like

Just use scripts/decodecode and cat the Code line into that.

> particularly good way to do it, and the old "ksymoops" program used to do 
> a pretty good job of this, but I'm used to that particular idiotic way 
> myself, since it's how I've basically always done it)
>
> After that, you still need to try to match up the assembly code with the 
> source code and figure out what variables the register contents actually 
> are all about. You can often try to do a
>
> 	make the/affected/file.s


IMHO better is  

make the/file/xyz.lst        

which gives you a listing with binary data in there which can be
grepped for.

But you should install a very recent binutils because older objdump -S
couldn't deal with unit-at-a-time compilers.

-Andi

^ permalink raw reply

* [ANNOUNCE] bridge-utils 1.4
From: Stephen Hemminger @ 2008-01-08 16:56 UTC (permalink / raw)
  To: bridge; +Cc: netdev

Minor update to bridge-utils. Mostly fixing bugs in usage of sysfs.

Release tarball:
  http://downloads.sourceforge.net/bridge/bridge-utils-1.4.tar.gz

Alon Bar-Lev (1):
      Allow bridge-utils to run when no TCP/IP is available

Denys Vlasenko (1):
      fix use of sysfs (affects 32/64 bit compat)

Jeremy Jackson (1):
      Fix parsing of port_id's (hex).

Stephen Hemminger (3):
      Add ignore for generated files.
      Update gitignore
      Use linux/if.h rather than net/if.h


-- 
Stephen Hemminger <stephen.hemminger@vyatta.com>

^ permalink raw reply

* Re: SACK scoreboard
From: John Heffner @ 2008-01-08 16:51 UTC (permalink / raw)
  To: David Miller; +Cc: ilpo.jarvinen, lachlan.andrew, netdev, quetchen
In-Reply-To: <20080107.233617.203640686.davem@davemloft.net>

David Miller wrote:
> Ilpo, just trying to keep an old conversation from dying off.
> 
> Did you happen to read a recent blog posting of mine?
> 
> 	http://vger.kernel.org/~davem/cgi-bin/blog.cgi/2007/12/31#tcp_overhead
> 
> I've been thinking more and more and I think we might be able
> to get away with enforcing that SACKs are always increasing in
> coverage.
> 
> I doubt there are any real systems out there that drop out of order
> packets that are properly formed and are in window, even though the
> SACK specification (foolishly, in my opinion) allows this.
> 
> If we could free packets as SACK blocks cover them, all the problems
> go away.
> 
> For one thing, this will allow the retransmit queue liberation during
> loss recovery to be spread out over the event, instead of batched up
> like crazy to the point where the cumulative ACK finally moves and
> releases an entire window's worth of data.
> 
> Next, it would simplify all of this scanning code trying to figure out
> which holes to fill during recovery.
> 
> And for SACK scoreboard marking, the RB trie would become very nearly
> unecessary as far as I can tell.
> 
> I would not even entertain this kind of crazy idea unless I thought
> the fundamental complexity simplification payback was enormous.  And
> in this case I think it is.
> 
> What we could do is put some experimental hack in there for developers
> to start playing with, which would enforce that SACKs always increase
> in coverage.  If violated the connection reset and a verbose log
> message is logged so we can analyze any cases that occur.
> 
> Sounds crazy, but maybe has potential.  What do you think?


Linux has a code path where this can happen under memory over-commit, in 
tcp_prune_queue().  Also, I think one of the motivations for making SACK 
strictly advisory is there was some concern about buggy SACK 
implementations.  Keeping data in your retransmit queue allows you to 
fall back to timeout and go-back-n if things completely fall apart.  For 
better or worse, we have to deal with the spec the way it is.

Even if you made this assumption of "hard" SACKs, you still have to 
worry about large ACKs if SACK is disabled, though I guess you could say 
people running with large windows without SACK deserve what they get. :)


I haven't thought about this too hard, but can we approximate this by 
moving scaked data into a sacked queue, then if something bad happens 
merge this back into the retransmit queue?  The code will have to deal 
with non-contiguous data in the retransmit queue; I'm not sure offhand 
if that violates any assumptions.  You still have a single expensive ACK 
at the end of recovery, though I wonder how much this really hurts.  If 
you want to ameliorate this, you could save this sacked queue to be 
batch processed later, in application context for instance.

   -John



^ permalink raw reply

* Re: [PATCH] ip[6]_tables.c: remove some inlines
From: Patrick McHardy @ 2008-01-08 16:36 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Harald Welte, netdev, linux-kernel,
	Netfilter Development Mailinglist
In-Reply-To: <200712311755.02578.vda.linux@googlemail.com>

Denys Vlasenko wrote:
> On Monday 31 December 2007 17:00, Patrick McHardy wrote:
>> Denys Vlasenko wrote:
>>> ip[6]_tables.c seem to abuse inline.
>>>
>>> This patch removes inlines except those which are used
>>> by packet matching code and thus are performance-critical.
>>>   
>> Some people also consider the ruleset replacement path performance
>> critical, but overall I agree with your patch. I'm travelling currently
>> though so it will take a few days until I'll apply it.
>>
>> Do you have some numbers that show the actual difference these
>> changes make?
> 
> Before:
> 
> $ size */*/*/ip*tables*.o
>    text    data     bss     dec     hex filename
>    6402     500      16    6918    1b06 net/ipv4/netfilter/ip_tables.o
>    7130     500      16    7646    1dde net/ipv6/netfilter/ip6_tables.o
> 
> After:
> 
> $ size */*/*/ip*tables*.o
>    text    data     bss     dec     hex filename
>    6307     500      16    6823    1aa7 net/ipv4/netfilter/ip_tables.o
>    7010     500      16    7526    1d66 net/ipv6/netfilter/ip6_tables.o


Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-2.6.25 6/6][NETFILTER] Use the ctl paths instead of hand-made analogue
From: Patrick McHardy @ 2008-01-08 16:13 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, devel
In-Reply-To: <4783A0D1.4010202@openvz.org>

Pavel Emelyanov wrote:
> The conntracks subsystem has a similar infrastructure
> to maintain ctl_paths, but since we already have it
> on the generic level, I think it's OK to switch to
> using it.
> 
> So, basically, this patch just replaces the ctl_table-s
> with ctl_path-s, nf_register_sysctl_table with 
> register_sysctl_paths() and removes no longer needed code.

Also looks good, thanks. But please remeber to CC netfilter-devel
on patches affecting netfilter.

> After this the net/netfilter/nf_sysctl.c file contains 
> the paths only.

That sounds like they should be moved to net/netfilter/core.c
instead and the file removed. I can take care of that once
your patches are in Dave's tree.

^ permalink raw reply

* Re: [PATCH net-2.6.25 5/6][NETFILTER] Switch to using ctl_paths in nf_queue and conntrack modules
From: Patrick McHardy @ 2008-01-08 16:10 UTC (permalink / raw)
  To: Pavel Emelyanov; +Cc: David Miller, Linux Netdev List, devel
In-Reply-To: <4783A00D.7000008@openvz.org>

Pavel Emelyanov wrote:
> This includes the most simple cases for netfilter.
> 
> The first part is tne queue modules for ipv4 and ipv6,
> on which the net/ipv4/ and net/ipv6/ paths are reused 
> from the appropriate ipv4 and ipv6 code.
> 
> The conntrack module is also patched, but this hunk is 
> very small and simple.
> 
> Signed-off-by: Pavel Emelyanov <xemul@openvz.org>


Looks good to me.

^ permalink raw reply

* Re: Top 10 kernel oopses for the week ending January 5th, 2008
From: Randy Dunlap @ 2008-01-08 16:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kevin Winchester, J. Bruce Fields, Al Viro, Arjan van de Ven,
	Linux Kernel Mailing List, Andrew Morton, NetDev
In-Reply-To: <alpine.LFD.1.00.0801071851120.3148@woody.linux-foundation.org>

On Mon, 7 Jan 2008 19:26:12 -0800 (PST) Linus Torvalds wrote:

> On Mon, 7 Jan 2008, Kevin Winchester wrote:
> 
> > J. Bruce Fields wrote:
> > > 
> > > Is there any good basic documentation on this to point people at?
> > 
> > I would second this question.  I see people "decode" oops on lkml often 
> > enough, but I've never been entirely sure how its done.  Is it somewhere 
> > in Documentation?
> 
> It's actually not necessarily at all that trivial, unless you have a deep 
> understanding of the code generated for the architecture in question (and 
> even then, some oopses take more time to figure out than others, thanks 
> to inlining and tailcalls etc).
> 
> If the oops happened with a kernel you generated yourself, it's usually 
> rather easy. Especially if you said "y" to the "generate debugging info" 
> question at configuration time. Because, in that case, you really just do 
> a simple
> 
> 	gdb vmlinux
> 
> and then you can do (for example) something like setting a breakpoint at 
> the EIP that was reported for the oops, and it will tell you what line it 
> came from.
> 
> However, if you don't have the exact binary - which is the common case for 
> random oopses reported on lkml - you will generally have to disassemble 
> the hex sequence given in the oops (the "Code:" line), and try to match it 
> up against the source code to try to figure out what is going on.
> 
> Even just the disassembly is not entirely trivial, since the oops will 
> give you the eip that it happened at, but you often want to also 
> disassemble *backwards* in order to get more of a context (the "Code:" 
> line will mark the particular EIP that starts the oopsing instruction by 
> enclosing it in <xx>, but with non-constant instruction lengths, you need 
> to use a bit of trial-and-error to figure it out.
> 
> I usually just compile a small program like
> 
> 	const char array[]="\xnn\xnn\xnn...";
> 
> 	int main(int argc, char **argv)
> 	{
> 		printf("%p\n", array);
> 		*(int *)0=0;
> 	}
> 
> and run it under gdb, and then when it gets the SIGSEGV (due to the 
> obvious NULL pointer dereference), I can just ask gdb to disassemble 
> around the array that contains the code[] stuff. Try a few offsets, to see 
> when the disassembly makes sense (and gives the reported EIP as the 
> beginning of one of the disassembled instructions).
> 
> (You can do it other and smarter ways too, I'm not claiming that's a 
> particularly good way to do it, and the old "ksymoops" program used to do 
> a pretty good job of this, but I'm used to that particular idiotic way 
> myself, since it's how I've basically always done it)

One other way to do it (at least for x86-32/64) is to use
$kerneltree/scripts/decodecode.  It may work on other $arches also,
but I haven't tested it on others.

---
~Randy

^ permalink raw reply

* [PATCH net-2.6.25 6/6][NETFILTER] Use the ctl paths instead of hand-made analogue
From: Pavel Emelyanov @ 2008-01-08 16:12 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Patrick McHardy
In-Reply-To: <47839C30.2040402@openvz.org>

The conntracks subsystem has a similar infrastructure
to maintain ctl_paths, but since we already have it
on the generic level, I think it's OK to switch to
using it.

So, basically, this patch just replaces the ctl_table-s
with ctl_path-s, nf_register_sysctl_table with 
register_sysctl_paths() and removes no longer needed code.

After this the net/netfilter/nf_sysctl.c file contains 
the paths only.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

 include/linux/netfilter.h                    |    8 -
 include/net/netfilter/nf_conntrack_l3proto.h |    2 
 net/netfilter/nf_conntrack_proto.c           |    7 -
 net/netfilter/nf_sysctl.c                    |  127 +--------------------------
 4 files changed, 16 insertions(+), 128 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index d190d56..c41f643 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -120,12 +120,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg);
 
 #ifdef CONFIG_SYSCTL
 /* Sysctl registration */
-struct ctl_table_header *nf_register_sysctl_table(struct ctl_table *path,
-						  struct ctl_table *table);
-void nf_unregister_sysctl_table(struct ctl_table_header *header,
-				struct ctl_table *table);
-extern struct ctl_table nf_net_netfilter_sysctl_path[];
-extern struct ctl_table nf_net_ipv4_netfilter_sysctl_path[];
+extern struct ctl_path nf_net_netfilter_sysctl_path[];
+extern struct ctl_path nf_net_ipv4_netfilter_sysctl_path[];
 #endif /* CONFIG_SYSCTL */
 
 extern struct list_head nf_hooks[NPROTO][NF_MAX_HOOKS];
diff --git a/include/net/netfilter/nf_conntrack_l3proto.h b/include/net/netfilter/nf_conntrack_l3proto.h
index 15888fc..875c6d4 100644
--- a/include/net/netfilter/nf_conntrack_l3proto.h
+++ b/include/net/netfilter/nf_conntrack_l3proto.h
@@ -73,7 +73,7 @@ struct nf_conntrack_l3proto
 
 #ifdef CONFIG_SYSCTL
 	struct ctl_table_header	*ctl_table_header;
-	struct ctl_table	*ctl_table_path;
+	struct ctl_path		*ctl_table_path;
 	struct ctl_table	*ctl_table;
 #endif /* CONFIG_SYSCTL */
 
diff --git a/net/netfilter/nf_conntrack_proto.c b/net/netfilter/nf_conntrack_proto.c
index 6d94706..8595b59 100644
--- a/net/netfilter/nf_conntrack_proto.c
+++ b/net/netfilter/nf_conntrack_proto.c
@@ -36,11 +36,11 @@ static DEFINE_MUTEX(nf_ct_proto_mutex);
 
 #ifdef CONFIG_SYSCTL
 static int
-nf_ct_register_sysctl(struct ctl_table_header **header, struct ctl_table *path,
+nf_ct_register_sysctl(struct ctl_table_header **header, struct ctl_path *path,
 		      struct ctl_table *table, unsigned int *users)
 {
 	if (*header == NULL) {
-		*header = nf_register_sysctl_table(path, table);
+		*header = register_sysctl_paths(path, table);
 		if (*header == NULL)
 			return -ENOMEM;
 	}
@@ -55,7 +55,8 @@ nf_ct_unregister_sysctl(struct ctl_table_header **header,
 {
 	if (users != NULL && --*users > 0)
 		return;
-	nf_unregister_sysctl_table(*header, table);
+
+	unregister_sysctl_table(*header);
 	*header = NULL;
 }
 #endif
diff --git a/net/netfilter/nf_sysctl.c b/net/netfilter/nf_sysctl.c
index ee34589..d9fcc89 100644
--- a/net/netfilter/nf_sysctl.c
+++ b/net/netfilter/nf_sysctl.c
@@ -7,128 +7,19 @@
 #include <linux/string.h>
 #include <linux/slab.h>
 
-static void
-path_free(struct ctl_table *path, struct ctl_table *table)
-{
-	struct ctl_table *t, *next;
-
-	for (t = path; t != NULL && t != table; t = next) {
-		next = t->child;
-		kfree(t);
-	}
-}
-
-static struct ctl_table *
-path_dup(struct ctl_table *path, struct ctl_table *table)
-{
-	struct ctl_table *t, *last = NULL, *tmp;
-
-	for (t = path; t != NULL; t = t->child) {
-		/* twice the size since path elements are terminated by an
-		 * empty element */
-		tmp = kmemdup(t, 2 * sizeof(*t), GFP_KERNEL);
-		if (tmp == NULL) {
-			if (last != NULL)
-				path_free(path, table);
-			return NULL;
-		}
-
-		if (last != NULL)
-			last->child = tmp;
-		else
-			path = tmp;
-		last = tmp;
-	}
-
-	if (last != NULL)
-		last->child = table;
-	else
-		path = table;
-
-	return path;
-}
-
-struct ctl_table_header *
-nf_register_sysctl_table(struct ctl_table *path, struct ctl_table *table)
-{
-	struct ctl_table_header *header;
-
-	path = path_dup(path, table);
-	if (path == NULL)
-		return NULL;
-	header = register_sysctl_table(path);
-	if (header == NULL)
-		path_free(path, table);
-	return header;
-}
-EXPORT_SYMBOL_GPL(nf_register_sysctl_table);
-
-void
-nf_unregister_sysctl_table(struct ctl_table_header *header,
-			   struct ctl_table *table)
-{
-	struct ctl_table *path = header->ctl_table;
-
-	unregister_sysctl_table(header);
-	path_free(path, table);
-}
-EXPORT_SYMBOL_GPL(nf_unregister_sysctl_table);
-
 /* net/netfilter */
-static struct ctl_table nf_net_netfilter_table[] = {
-	{
-		.ctl_name	= NET_NETFILTER,
-		.procname	= "netfilter",
-		.mode		= 0555,
-	},
-	{
-		.ctl_name	= 0
-	}
-};
-struct ctl_table nf_net_netfilter_sysctl_path[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= nf_net_netfilter_table,
-	},
-	{
-		.ctl_name	= 0
-	}
+struct ctl_path nf_net_netfilter_sysctl_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "netfilter", .ctl_name = NET_NETFILTER, },
+	{ }
 };
 EXPORT_SYMBOL_GPL(nf_net_netfilter_sysctl_path);
 
 /* net/ipv4/netfilter */
-static struct ctl_table nf_net_ipv4_netfilter_table[] = {
-	{
-		.ctl_name	= NET_IPV4_NETFILTER,
-		.procname	= "netfilter",
-		.mode		= 0555,
-	},
-	{
-		.ctl_name	= 0
-	}
-};
-static struct ctl_table nf_net_ipv4_table[] = {
-	{
-		.ctl_name	= NET_IPV4,
-		.procname	= "ipv4",
-		.mode		= 0555,
-		.child		= nf_net_ipv4_netfilter_table,
-	},
-	{
-		.ctl_name	= 0
-	}
-};
-struct ctl_table nf_net_ipv4_netfilter_sysctl_path[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= nf_net_ipv4_table,
-	},
-	{
-		.ctl_name	= 0
-	}
+struct ctl_path nf_net_ipv4_netfilter_sysctl_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "ipv4", .ctl_name = NET_IPV4, },
+	{ .procname = "netfilter", .ctl_name = NET_IPV4_NETFILTER, },
+	{ }
 };
 EXPORT_SYMBOL_GPL(nf_net_ipv4_netfilter_sysctl_path);

^ permalink raw reply related

* [PATCH net-2.6.25 5/6][NETFILTER] Switch to using ctl_paths in nf_queue and conntrack modules
From: Pavel Emelyanov @ 2008-01-08 16:08 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel, Patrick McHardy
In-Reply-To: <47839C30.2040402@openvz.org>

This includes the most simple cases for netfilter.

The first part is tne queue modules for ipv4 and ipv6,
on which the net/ipv4/ and net/ipv6/ paths are reused 
from the appropriate ipv4 and ipv6 code.

The conntrack module is also patched, but this hunk is 
very small and simple.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

diff --git a/include/net/ip.h b/include/net/ip.h
index 8be48c8..2ad4d2f 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -177,6 +177,8 @@ extern void inet_get_local_port_range(int *low, int *high);
 extern int sysctl_ip_default_ttl;
 extern int sysctl_ip_nonlocal_bind;
 
+extern struct ctl_path net_ipv4_ctl_path[];
+
 /* From ip_fragment.c */
 struct inet_frags_ctl;
 extern struct inet_frags_ctl ip4_frags_ctl;
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index f2adedf..e371f32 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -112,6 +112,8 @@ struct frag_hdr {
 extern int sysctl_ipv6_bindv6only;
 extern int sysctl_mld_max_msf;
 
+extern struct ctl_path net_ipv6_ctl_path[];
+
 #define _DEVINC(statname, modifier, idev, field)			\
 ({									\
 	struct inet6_dev *_idev = (idev);				\
diff --git a/net/ipv4/netfilter/ip_queue.c b/net/ipv4/netfilter/ip_queue.c
index 68b12ce..7361315 100644
--- a/net/ipv4/netfilter/ip_queue.c
+++ b/net/ipv4/netfilter/ip_queue.c
@@ -29,6 +29,7 @@
 #include <net/sock.h>
 #include <net/route.h>
 #include <net/netfilter/nf_queue.h>
+#include <net/ip.h>
 
 #define IPQ_QMAX_DEFAULT 1024
 #define IPQ_PROC_FS_NAME "ip_queue"
@@ -525,26 +526,6 @@ static ctl_table ipq_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table ipq_dir_table[] = {
-	{
-		.ctl_name	= NET_IPV4,
-		.procname	= "ipv4",
-		.mode		= 0555,
-		.child		= ipq_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table ipq_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ipq_dir_table
-	},
-	{ .ctl_name = 0 }
-};
-
 static int ip_queue_show(struct seq_file *m, void *v)
 {
 	read_lock_bh(&queue_lock);
@@ -610,7 +591,7 @@ static int __init ip_queue_init(void)
 	}
 
 	register_netdevice_notifier(&ipq_dev_notifier);
-	ipq_sysctl_header = register_sysctl_table(ipq_root_table);
+	ipq_sysctl_header = register_sysctl_paths(net_ipv4_ctl_path, ipq_table);
 
 	status = nf_register_queue_handler(PF_INET, &nfqh);
 	if (status < 0) {
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index a5a9f8e..45536a9 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -846,17 +846,18 @@ static struct ctl_table ipv4_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static __initdata struct ctl_path net_ipv4_path[] = {
+struct ctl_path net_ipv4_ctl_path[] = {
 	{ .procname = "net", .ctl_name = CTL_NET, },
 	{ .procname = "ipv4", .ctl_name = NET_IPV4, },
 	{ },
 };
+EXPORT_SYMBOL_GPL(net_ipv4_ctl_path);
 
 static __init int sysctl_ipv4_init(void)
 {
 	struct ctl_table_header *hdr;
 
-	hdr = register_sysctl_paths(net_ipv4_path, ipv4_table);
+	hdr = register_sysctl_paths(net_ipv4_ctl_path, ipv4_table);
 	return hdr == NULL ? -ENOMEM : 0;
 }
 
diff --git a/net/ipv6/netfilter/ip6_queue.c b/net/ipv6/netfilter/ip6_queue.c
index e5b0059..a20db0b 100644
--- a/net/ipv6/netfilter/ip6_queue.c
+++ b/net/ipv6/netfilter/ip6_queue.c
@@ -529,26 +529,6 @@ static ctl_table ipq_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table ipq_dir_table[] = {
-	{
-		.ctl_name	= NET_IPV6,
-		.procname	= "ipv6",
-		.mode		= 0555,
-		.child		= ipq_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table ipq_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ipq_dir_table
-	},
-	{ .ctl_name = 0 }
-};
-
 static int ip6_queue_show(struct seq_file *m, void *v)
 {
 	read_lock_bh(&queue_lock);
@@ -614,7 +594,7 @@ static int __init ip6_queue_init(void)
 	}
 
 	register_netdevice_notifier(&ipq_dev_notifier);
-	ipq_sysctl_header = register_sysctl_table(ipq_root_table);
+	ipq_sysctl_header = register_sysctl_paths(net_ipv6_ctl_path, ipq_table);
 
 	status = nf_register_queue_handler(PF_INET6, &nfqh);
 	if (status < 0) {
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 0b5bec3..4ad8d9d 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -82,17 +82,19 @@ static ctl_table ipv6_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static struct ctl_path ipv6_ctl_path[] = {
+struct ctl_path net_ipv6_ctl_path[] = {
 	{ .procname = "net", .ctl_name = CTL_NET, },
 	{ .procname = "ipv6", .ctl_name = NET_IPV6, },
 	{ },
 };
+EXPORT_SYMBOL_GPL(net_ipv6_ctl_path);
 
 static struct ctl_table_header *ipv6_sysctl_header;
 
 void ipv6_sysctl_register(void)
 {
-	ipv6_sysctl_header = register_sysctl_paths(ipv6_ctl_path, ipv6_table);
+	ipv6_sysctl_header = register_sysctl_paths(net_ipv6_ctl_path,
+			ipv6_table);
 }
 
 void ipv6_sysctl_unregister(void)
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index 9efdd37..2ad4933 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -383,15 +383,11 @@ static ctl_table nf_ct_netfilter_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table nf_ct_net_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= nf_ct_netfilter_table,
-	},
-	{ .ctl_name = 0 }
+struct ctl_path nf_ct_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ }
 };
+
 EXPORT_SYMBOL_GPL(nf_ct_log_invalid);
 #endif /* CONFIG_SYSCTL */
 
@@ -418,7 +414,8 @@ static int __init nf_conntrack_standalone_init(void)
 	proc_stat->owner = THIS_MODULE;
 #endif
 #ifdef CONFIG_SYSCTL
-	nf_ct_sysctl_header = register_sysctl_table(nf_ct_net_table);
+	nf_ct_sysctl_header = register_sysctl_paths(nf_ct_path,
+			nf_ct_netfilter_table);
 	if (nf_ct_sysctl_header == NULL) {
 		printk("nf_conntrack: can't register to sysctl.\n");
 		ret = -ENOMEM;


^ permalink raw reply related

* [PATCH net-2.6.25 4/6][AX25] Switch to using ctl_paths.
From: Pavel Emelyanov @ 2008-01-08 16:05 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel
In-Reply-To: <47839C30.2040402@openvz.org>

This one is almost the same as the hunks in the
first patch, but ax25 tables are created dynamically.

So this patch differs a bit to handle this case.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

 net/ax25/sysctl_net_ax25.c |   27 +++++----------------------
 1 files changed, 5 insertions(+), 22 deletions(-)

diff --git a/net/ax25/sysctl_net_ax25.c b/net/ax25/sysctl_net_ax25.c
index 443a836..f597987 100644
--- a/net/ax25/sysctl_net_ax25.c
+++ b/net/ax25/sysctl_net_ax25.c
@@ -31,25 +31,11 @@ static struct ctl_table_header *ax25_table_header;
 static ctl_table *ax25_table;
 static int ax25_table_size;
 
-static ctl_table ax25_dir_table[] = {
-	{
-		.ctl_name	= NET_AX25,
-		.procname	= "ax25",
-		.mode		= 0555,
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table ax25_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ax25_dir_table
-	},
-	{ .ctl_name = 0 }
+static struct ctl_path ax25_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "ax25", .ctl_name = NET_AX25, },
+	{ }
 };
-
 static const ctl_table ax25_param_table[] = {
 	{
 		.ctl_name	= NET_AX25_IP_DEFAULT_MODE,
@@ -243,9 +229,7 @@ void ax25_register_sysctl(void)
 	}
 	spin_unlock_bh(&ax25_dev_lock);
 
-	ax25_dir_table[0].child = ax25_table;
-
-	ax25_table_header = register_sysctl_table(ax25_root_table);
+	ax25_table_header = register_sysctl_paths(ax25_path, ax25_table);
 }
 
 void ax25_unregister_sysctl(void)
@@ -253,7 +237,6 @@ void ax25_unregister_sysctl(void)
 	ctl_table *p;
 	unregister_sysctl_table(ax25_table_header);
 
-	ax25_dir_table[0].child = NULL;
 	for (p = ax25_table; p->ctl_name; p++)
 		kfree(p->child);
 	kfree(ax25_table);


^ permalink raw reply related

* [PATCH net-2.6.25 3/6][DECNET] Switch to using ctl_paths.
From: Pavel Emelyanov @ 2008-01-08 16:02 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel
In-Reply-To: <47839C30.2040402@openvz.org>

The decnet includes two places to patch. The first one is 
the net/decnet table itself, and it is patched just like 
other subsystems in the first patch in this series.

The second place is a bit more complex - it is the 
net/decnet/conf/xxx entries,. similar to those in 
ipv4/devinet.c and ipv6/addrconf.c. This code is made similar 
to those in ipv[46].

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

 net/decnet/dn_dev.c            |   52 +++++++++++------------------------------
 net/decnet/sysctl_net_decnet.c |   23 +++---------------
 2 files changed, 20 insertions(+), 55 deletions(-)

diff --git a/net/decnet/dn_dev.c b/net/decnet/dn_dev.c
index 39c89c6..fb884e5 100644
--- a/net/decnet/dn_dev.c
+++ b/net/decnet/dn_dev.c
@@ -173,10 +173,6 @@ static int dn_forwarding_sysctl(ctl_table *table, int __user *name, int nlen,
 static struct dn_dev_sysctl_table {
 	struct ctl_table_header *sysctl_header;
 	ctl_table dn_dev_vars[5];
-	ctl_table dn_dev_dev[2];
-	ctl_table dn_dev_conf_dir[2];
-	ctl_table dn_dev_proto_dir[2];
-	ctl_table dn_dev_root_dir[2];
 } dn_dev_sysctl = {
 	NULL,
 	{
@@ -224,30 +220,6 @@ static struct dn_dev_sysctl_table {
 	},
 	{0}
 	},
-	{{
-		.ctl_name = 0,
-		.procname = "",
-		.mode = 0555,
-		.child = dn_dev_sysctl.dn_dev_vars
-	}, {0}},
-	{{
-		.ctl_name = NET_DECNET_CONF,
-		.procname = "conf",
-		.mode = 0555,
-		.child = dn_dev_sysctl.dn_dev_dev
-	}, {0}},
-	{{
-		.ctl_name = NET_DECNET,
-		.procname = "decnet",
-		.mode = 0555,
-		.child = dn_dev_sysctl.dn_dev_conf_dir
-	}, {0}},
-	{{
-		.ctl_name = CTL_NET,
-		.procname = "net",
-		.mode = 0555,
-		.child = dn_dev_sysctl.dn_dev_proto_dir
-	}, {0}}
 };
 
 static void dn_dev_sysctl_register(struct net_device *dev, struct dn_dev_parms *parms)
@@ -255,6 +227,16 @@ static void dn_dev_sysctl_register(struct net_device *dev, struct dn_dev_parms *
 	struct dn_dev_sysctl_table *t;
 	int i;
 
+#define DN_CTL_PATH_DEV	3
+	
+	struct ctl_path dn_ctl_path[] = {
+		{ .procname = "net", .ctl_name = CTL_NET, },
+		{ .procname = "decnet", .ctl_name = NET_DECNET, },
+		{ .procname = "conf", .ctl_name = NET_DECNET_CONF, },
+		{ /* to be set */ },
+		{ },
+	};
+
 	t = kmemdup(&dn_dev_sysctl, sizeof(*t), GFP_KERNEL);
 	if (t == NULL)
 		return;
@@ -265,20 +247,16 @@ static void dn_dev_sysctl_register(struct net_device *dev, struct dn_dev_parms *
 	}
 
 	if (dev) {
-		t->dn_dev_dev[0].procname = dev->name;
-		t->dn_dev_dev[0].ctl_name = dev->ifindex;
+		dn_ctl_path[DN_CTL_PATH_DEV].procname = dev->name;
+		dn_ctl_path[DN_CTL_PATH_DEV].ctl_name = dev->ifindex;
 	} else {
-		t->dn_dev_dev[0].procname = parms->name;
-		t->dn_dev_dev[0].ctl_name = parms->ctl_name;
+		dn_ctl_path[DN_CTL_PATH_DEV].procname = parms->name;
+		dn_ctl_path[DN_CTL_PATH_DEV].ctl_name = parms->ctl_name;
 	}
 
-	t->dn_dev_dev[0].child = t->dn_dev_vars;
-	t->dn_dev_conf_dir[0].child = t->dn_dev_dev;
-	t->dn_dev_proto_dir[0].child = t->dn_dev_conf_dir;
-	t->dn_dev_root_dir[0].child = t->dn_dev_proto_dir;
 	t->dn_dev_vars[0].extra1 = (void *)dev;
 
-	t->sysctl_header = register_sysctl_table(t->dn_dev_root_dir);
+	t->sysctl_header = register_sysctl_paths(dn_ctl_path, t->dn_dev_vars);
 	if (t->sysctl_header == NULL)
 		kfree(t);
 	else
diff --git a/net/decnet/sysctl_net_decnet.c b/net/decnet/sysctl_net_decnet.c
index ae354a4..228067c 100644
--- a/net/decnet/sysctl_net_decnet.c
+++ b/net/decnet/sysctl_net_decnet.c
@@ -470,28 +470,15 @@ static ctl_table dn_table[] = {
 	{0}
 };
 
-static ctl_table dn_dir_table[] = {
-	{
-		.ctl_name = NET_DECNET,
-		.procname = "decnet",
-		.mode = 0555,
-		.child = dn_table},
-	{0}
-};
-
-static ctl_table dn_root_table[] = {
-	{
-		.ctl_name = CTL_NET,
-		.procname = "net",
-		.mode = 0555,
-		.child = dn_dir_table
-	},
-	{0}
+static struct ctl_path dn_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "decnet", .ctl_name = NET_DECNET, },
+	{ }
 };
 
 void dn_register_sysctl(void)
 {
-	dn_table_header = register_sysctl_table(dn_root_table);
+	dn_table_header = register_sysctl_paths(dn_path, dn_table);
 }
 
 void dn_unregister_sysctl(void)


^ permalink raw reply related

* Re: 2.6.24-rc6-mm1
From: Ingo Molnar @ 2008-01-08 15:59 UTC (permalink / raw)
  To: FUJITA Tomonori
  Cc: just.for.lkml, tomof, akpm, jarkao2, herbert, linux-kernel, neilb,
	bfields, netdev, tom
In-Reply-To: <20080107151639P.fujita.tomonori@lab.ntt.co.jp>


* FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote:

> The patches are available at:
> 
> http://www.kernel.org/pub/linux/kernel/people/tomo/iommu/
> 
> Or if you prefer the git tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/tomo/linux-2.6-misc.git 
> iommu-sg-fixes

btw., these improvements to the IOMMU code are in -mm and will go into 
v2.6.25, right? The changes look robust to me.

	Ingo

^ permalink raw reply

* [PATCH net-2.6.25 2/6][IPVS] Switch to using ctl_paths.
From: Pavel Emelyanov @ 2008-01-08 15:58 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel
In-Reply-To: <47839C30.2040402@openvz.org>

The feature of ipvs ctls is that the net/ipv4/vs path
is common for core ipvs ctls and for two schedulers,
so I make it exported and re-use it in modules.

Two other .c files required linux/sysctl.h to make the
extern declaration of this path compile well.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

 include/net/ip_vs.h         |    1 +
 net/ipv4/ipvs/ip_vs_ctl.c   |   35 +++++++----------------------------
 net/ipv4/ipvs/ip_vs_est.c   |    1 +
 net/ipv4/ipvs/ip_vs_lblc.c  |   31 +------------------------------
 net/ipv4/ipvs/ip_vs_lblcr.c |   31 +------------------------------
 net/ipv4/ipvs/ip_vs_sched.c |    1 +
 6 files changed, 12 insertions(+), 88 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 3de6d1e..02ab7ca 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -854,6 +854,7 @@ extern int sysctl_ip_vs_expire_quiescent_template;
 extern int sysctl_ip_vs_sync_threshold[2];
 extern int sysctl_ip_vs_nat_icmp_send;
 extern struct ip_vs_stats ip_vs_stats;
+extern struct ctl_path net_vs_ctl_path[];
 
 extern struct ip_vs_service *
 ip_vs_service_get(__u32 fwmark, __u16 protocol, __be32 vaddr, __be16 vport);
diff --git a/net/ipv4/ipvs/ip_vs_ctl.c b/net/ipv4/ipvs/ip_vs_ctl.c
index 693d924..9fecfe7 100644
--- a/net/ipv4/ipvs/ip_vs_ctl.c
+++ b/net/ipv4/ipvs/ip_vs_ctl.c
@@ -1591,34 +1591,13 @@ static struct ctl_table vs_vars[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table vs_table[] = {
-	{
-		.procname	= "vs",
-		.mode		= 0555,
-		.child		= vs_vars
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table ipvs_ipv4_table[] = {
-	{
-		.ctl_name	= NET_IPV4,
-		.procname	= "ipv4",
-		.mode		= 0555,
-		.child		= vs_table,
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table vs_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ipvs_ipv4_table,
-	},
-	{ .ctl_name = 0 }
+struct ctl_path net_vs_ctl_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "ipv4", .ctl_name = NET_IPV4, },
+	{ .procname = "vs", },
+	{ }
 };
+EXPORT_SYMBOL_GPL(net_vs_ctl_path);
 
 static struct ctl_table_header * sysctl_header;
 
@@ -2345,7 +2324,7 @@ int ip_vs_control_init(void)
 	proc_net_fops_create(&init_net, "ip_vs", 0, &ip_vs_info_fops);
 	proc_net_fops_create(&init_net, "ip_vs_stats",0, &ip_vs_stats_fops);
 
-	sysctl_header = register_sysctl_table(vs_root_table);
+	sysctl_header = register_sysctl_paths(net_vs_ctl_path, vs_vars);
 
 	/* Initialize ip_vs_svc_table, ip_vs_svc_fwm_table, ip_vs_rtable */
 	for(idx = 0; idx < IP_VS_SVC_TAB_SIZE; idx++)  {
diff --git a/net/ipv4/ipvs/ip_vs_est.c b/net/ipv4/ipvs/ip_vs_est.c
index efdd74e..dfa0d71 100644
--- a/net/ipv4/ipvs/ip_vs_est.c
+++ b/net/ipv4/ipvs/ip_vs_est.c
@@ -18,6 +18,7 @@
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/interrupt.h>
+#include <linux/sysctl.h>
 
 #include <net/ip_vs.h>
 
diff --git a/net/ipv4/ipvs/ip_vs_lblc.c b/net/ipv4/ipvs/ip_vs_lblc.c
index bf8c04a..3888642 100644
--- a/net/ipv4/ipvs/ip_vs_lblc.c
+++ b/net/ipv4/ipvs/ip_vs_lblc.c
@@ -123,35 +123,6 @@ static ctl_table vs_vars_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table vs_table[] = {
-	{
-		.procname	= "vs",
-		.mode		= 0555,
-		.child		= vs_vars_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table ipvs_ipv4_table[] = {
-	{
-		.ctl_name	= NET_IPV4,
-		.procname	= "ipv4",
-		.mode		= 0555,
-		.child		= vs_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table lblc_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ipvs_ipv4_table
-	},
-	{ .ctl_name = 0 }
-};
-
 static struct ctl_table_header * sysctl_header;
 
 /*
@@ -582,7 +553,7 @@ static int __init ip_vs_lblc_init(void)
 	int ret;
 
 	INIT_LIST_HEAD(&ip_vs_lblc_scheduler.n_list);
-	sysctl_header = register_sysctl_table(lblc_root_table);
+	sysctl_header = register_sysctl_paths(net_vs_ctl_path, vs_vars_table);
 	ret = register_ip_vs_scheduler(&ip_vs_lblc_scheduler);
 	if (ret)
 		unregister_sysctl_table(sysctl_header);
diff --git a/net/ipv4/ipvs/ip_vs_lblcr.c b/net/ipv4/ipvs/ip_vs_lblcr.c
index f50da64..daa260e 100644
--- a/net/ipv4/ipvs/ip_vs_lblcr.c
+++ b/net/ipv4/ipvs/ip_vs_lblcr.c
@@ -311,35 +311,6 @@ static ctl_table vs_vars_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table vs_table[] = {
-	{
-		.procname	= "vs",
-		.mode		= 0555,
-		.child		= vs_vars_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table ipvs_ipv4_table[] = {
-	{
-		.ctl_name	= NET_IPV4,
-		.procname	= "ipv4",
-		.mode		= 0555,
-		.child		= vs_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table lblcr_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ipvs_ipv4_table
-	},
-	{ .ctl_name = 0 }
-};
-
 static struct ctl_table_header * sysctl_header;
 
 /*
@@ -771,7 +742,7 @@ static int __init ip_vs_lblcr_init(void)
 	int ret;
 
 	INIT_LIST_HEAD(&ip_vs_lblcr_scheduler.n_list);
-	sysctl_header = register_sysctl_table(lblcr_root_table);
+	sysctl_header = register_sysctl_paths(net_vs_ctl_path, vs_vars_table);
 	ret = register_ip_vs_scheduler(&ip_vs_lblcr_scheduler);
 	if (ret)
 		unregister_sysctl_table(sysctl_header);
diff --git a/net/ipv4/ipvs/ip_vs_sched.c b/net/ipv4/ipvs/ip_vs_sched.c
index 4322358..121a32b 100644
--- a/net/ipv4/ipvs/ip_vs_sched.c
+++ b/net/ipv4/ipvs/ip_vs_sched.c
@@ -24,6 +24,7 @@
 #include <linux/interrupt.h>
 #include <asm/string.h>
 #include <linux/kmod.h>
+#include <linux/sysctl.h>
 
 #include <net/ip_vs.h>
 


^ permalink raw reply related

* [PATCH net-2.6.25 1/6][NET] Simple ctl_table to ctl_path conversions.
From: Pavel Emelyanov @ 2008-01-08 15:54 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel
In-Reply-To: <47839C30.2040402@openvz.org>

This patch includes many places, that only required
replacing the ctl_table-s with appropriate ctl_paths
and call register_sysctl_paths().

Nothing special was done with them.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

---

 net/appletalk/sysctl_net_atalk.c |   24 +++++-------------------
 net/bridge/br_netfilter.c        |   24 +++++-------------------
 net/dccp/sysctl.c                |   36 +++++++-----------------------------
 net/ipx/sysctl_net_ipx.c         |   24 +++++-------------------
 net/irda/irsysctl.c              |   28 +++++-----------------------
 net/llc/sysctl_net_llc.c         |   24 +++++-------------------
 net/netrom/sysctl_net_netrom.c   |   24 +++++-------------------
 net/rose/sysctl_net_rose.c       |   24 +++++-------------------
 net/sctp/sysctl.c                |   24 +++++-------------------
 net/x25/sysctl_net_x25.c         |   24 +++++-------------------
 10 files changed, 52 insertions(+), 204 deletions(-)

diff --git a/net/appletalk/sysctl_net_atalk.c b/net/appletalk/sysctl_net_atalk.c
index 7df1778..621805d 100644
--- a/net/appletalk/sysctl_net_atalk.c
+++ b/net/appletalk/sysctl_net_atalk.c
@@ -49,31 +49,17 @@ static struct ctl_table atalk_table[] = {
 	{ 0 },
 };
 
-static struct ctl_table atalk_dir_table[] = {
-	{
-		.ctl_name	= NET_ATALK,
-		.procname	= "appletalk",
-		.mode		= 0555,
-		.child		= atalk_table,
-	},
-	{ 0 },
-};
-
-static struct ctl_table atalk_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= atalk_dir_table,
-	},
-	{ 0 },
+static struct ctl_path atalk_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "appletalk", .ctl_name = NET_ATALK, },
+	{ }
 };
 
 static struct ctl_table_header *atalk_table_header;
 
 void atalk_register_sysctl(void)
 {
-	atalk_table_header = register_sysctl_table(atalk_root_table);
+	atalk_table_header = register_sysctl_paths(atalk_path, atalk_table);
 }
 
 void atalk_unregister_sysctl(void)
diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c
index 32ac035..91a180a 100644
--- a/net/bridge/br_netfilter.c
+++ b/net/bridge/br_netfilter.c
@@ -934,24 +934,10 @@ static ctl_table brnf_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table brnf_bridge_table[] = {
-	{
-		.ctl_name	= NET_BRIDGE,
-		.procname	= "bridge",
-		.mode		= 0555,
-		.child		= brnf_table,
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table brnf_net_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= brnf_bridge_table,
-	},
-	{ .ctl_name = 0 }
+static struct ctl_path brnf_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "bridge", .ctl_name = NET_BRIDGE, },
+	{ }
 };
 #endif
 
@@ -963,7 +949,7 @@ int __init br_netfilter_init(void)
 	if (ret < 0)
 		return ret;
 #ifdef CONFIG_SYSCTL
-	brnf_sysctl_header = register_sysctl_table(brnf_net_table);
+	brnf_sysctl_header = register_sysctl_paths(brnf_path, brnf_table);
 	if (brnf_sysctl_header == NULL) {
 		printk(KERN_WARNING
 		       "br_netfilter: can't register to sysctl.\n");
diff --git a/net/dccp/sysctl.c b/net/dccp/sysctl.c
index c62c050..2129599 100644
--- a/net/dccp/sysctl.c
+++ b/net/dccp/sysctl.c
@@ -100,41 +100,19 @@ static struct ctl_table dccp_default_table[] = {
 	{ .ctl_name = 0, }
 };
 
-static struct ctl_table dccp_table[] = {
-	{
-		.ctl_name	= NET_DCCP_DEFAULT,
-		.procname	= "default",
-		.mode		= 0555,
-		.child		= dccp_default_table,
-	},
-	{ .ctl_name = 0, },
-};
-
-static struct ctl_table dccp_dir_table[] = {
-	{
-		.ctl_name	= NET_DCCP,
-		.procname	= "dccp",
-		.mode		= 0555,
-		.child		= dccp_table,
-	},
-	{ .ctl_name = 0, },
-};
-
-static struct ctl_table dccp_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= dccp_dir_table,
-	},
-	{ .ctl_name = 0, },
+static struct ctl_path dccp_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "dccp", .ctl_name = NET_DCCP, },
+	{ .procname = "default", .ctl_name = NET_DCCP_DEFAULT, },
+	{ }
 };
 
 static struct ctl_table_header *dccp_table_header;
 
 int __init dccp_sysctl_init(void)
 {
-	dccp_table_header = register_sysctl_table(dccp_root_table);
+	dccp_table_header = register_sysctl_paths(dccp_path,
+			dccp_default_table);
 
 	return dccp_table_header != NULL ? 0 : -ENOMEM;
 }
diff --git a/net/ipx/sysctl_net_ipx.c b/net/ipx/sysctl_net_ipx.c
index 0cf5264..92fef86 100644
--- a/net/ipx/sysctl_net_ipx.c
+++ b/net/ipx/sysctl_net_ipx.c
@@ -28,31 +28,17 @@ static struct ctl_table ipx_table[] = {
 	{ 0 },
 };
 
-static struct ctl_table ipx_dir_table[] = {
-	{
-		.ctl_name	= NET_IPX,
-		.procname	= "ipx",
-		.mode		= 0555,
-		.child		= ipx_table,
-	},
-	{ 0 },
-};
-
-static struct ctl_table ipx_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= ipx_dir_table,
-	},
-	{ 0 },
+static struct ctl_path ipx_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "ipx", .ctl_name = NET_IPX, },
+	{ }
 };
 
 static struct ctl_table_header *ipx_table_header;
 
 void ipx_register_sysctl(void)
 {
-	ipx_table_header = register_sysctl_table(ipx_root_table);
+	ipx_table_header = register_sysctl_paths(ipx_path, ipx_table);
 }
 
 void ipx_unregister_sysctl(void)
diff --git a/net/irda/irsysctl.c b/net/irda/irsysctl.c
index 565cbf0..d8aba86 100644
--- a/net/irda/irsysctl.c
+++ b/net/irda/irsysctl.c
@@ -234,28 +234,10 @@ static ctl_table irda_table[] = {
 	{ .ctl_name = 0 }
 };
 
-/* One directory */
-static ctl_table irda_net_table[] = {
-	{
-		.ctl_name	= NET_IRDA,
-		.procname	= "irda",
-		.maxlen		= 0,
-		.mode		= 0555,
-		.child		= irda_table
-	},
-	{ .ctl_name = 0 }
-};
-
-/* The parent directory */
-static ctl_table irda_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.maxlen		= 0,
-		.mode		= 0555,
-		.child		= irda_net_table
-	},
-	{ .ctl_name = 0 }
+static struct ctl_path irda_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "irda", .ctl_name = NET_IRDA, },
+	{ }
 };
 
 static struct ctl_table_header *irda_table_header;
@@ -268,7 +250,7 @@ static struct ctl_table_header *irda_table_header;
  */
 int __init irda_sysctl_register(void)
 {
-	irda_table_header = register_sysctl_table(irda_root_table);
+	irda_table_header = register_sysctl_paths(irda_path, irda_table);
 	if (!irda_table_header)
 		return -ENOMEM;
 
diff --git a/net/llc/sysctl_net_llc.c b/net/llc/sysctl_net_llc.c
index 46992d0..5bef1dc 100644
--- a/net/llc/sysctl_net_llc.c
+++ b/net/llc/sysctl_net_llc.c
@@ -92,31 +92,17 @@ static struct ctl_table llc_table[] = {
 	{ 0 },
 };
 
-static struct ctl_table llc_dir_table[] = {
-	{
-		.ctl_name	= NET_LLC,
-		.procname	= "llc",
-		.mode		= 0555,
-		.child		= llc_table,
-	},
-	{ 0 },
-};
-
-static struct ctl_table llc_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= llc_dir_table,
-	},
-	{ 0 },
+static struct ctl_path llc_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "llc", .ctl_name = NET_LLC, },
+	{ }
 };
 
 static struct ctl_table_header *llc_table_header;
 
 int __init llc_sysctl_init(void)
 {
-	llc_table_header = register_sysctl_table(llc_root_table);
+	llc_table_header = register_sysctl_paths(llc_path, llc_table);
 
 	return llc_table_header ? 0 : -ENOMEM;
 }
diff --git a/net/netrom/sysctl_net_netrom.c b/net/netrom/sysctl_net_netrom.c
index 2ea68da..34c96c9 100644
--- a/net/netrom/sysctl_net_netrom.c
+++ b/net/netrom/sysctl_net_netrom.c
@@ -170,29 +170,15 @@ static ctl_table nr_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table nr_dir_table[] = {
-	{
-		.ctl_name	= NET_NETROM,
-		.procname	= "netrom",
-		.mode		= 0555,
-		.child		= nr_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table nr_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= nr_dir_table
-	},
-	{ .ctl_name = 0 }
+static struct ctl_path nr_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "netrom", .ctl_name = NET_NETROM, },
+	{ }
 };
 
 void __init nr_register_sysctl(void)
 {
-	nr_table_header = register_sysctl_table(nr_root_table);
+	nr_table_header = register_sysctl_paths(nr_path, nr_table);
 }
 
 void nr_unregister_sysctl(void)
diff --git a/net/rose/sysctl_net_rose.c b/net/rose/sysctl_net_rose.c
index 455b055..20be348 100644
--- a/net/rose/sysctl_net_rose.c
+++ b/net/rose/sysctl_net_rose.c
@@ -138,29 +138,15 @@ static ctl_table rose_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table rose_dir_table[] = {
-	{
-		.ctl_name	= NET_ROSE,
-		.procname	= "rose",
-		.mode		= 0555,
-		.child		= rose_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table rose_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= rose_dir_table
-	},
-	{ .ctl_name = 0 }
+static struct ctl_path rose_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "rose", .ctl_name = NET_ROSE, },
+	{ }
 };
 
 void __init rose_register_sysctl(void)
 {
-	rose_table_header = register_sysctl_table(rose_root_table);
+	rose_table_header = register_sysctl_paths(rose_path, rose_table);
 }
 
 void rose_unregister_sysctl(void)
diff --git a/net/sctp/sysctl.c b/net/sctp/sysctl.c
index da4f157..5eb6ea8 100644
--- a/net/sctp/sysctl.c
+++ b/net/sctp/sysctl.c
@@ -275,24 +275,10 @@ static ctl_table sctp_table[] = {
 	{ .ctl_name = 0 }
 };
 
-static ctl_table sctp_net_table[] = {
-	{
-		.ctl_name	= NET_SCTP,
-		.procname	= "sctp",
-		.mode		= 0555,
-		.child		= sctp_table
-	},
-	{ .ctl_name = 0 }
-};
-
-static ctl_table sctp_root_table[] = {
-	{
-		.ctl_name	= CTL_NET,
-		.procname	= "net",
-		.mode		= 0555,
-		.child		= sctp_net_table
-	},
-	{ .ctl_name = 0 }
+static struct ctl_path sctp_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "sctp", .ctl_name = NET_SCTP, },
+	{ }
 };
 
 static struct ctl_table_header * sctp_sysctl_header;
@@ -300,7 +286,7 @@ static struct ctl_table_header * sctp_sysctl_header;
 /* Sysctl registration.  */
 void sctp_sysctl_register(void)
 {
-	sctp_sysctl_header = register_sysctl_table(sctp_root_table);
+	sctp_sysctl_header = register_sysctl_paths(sctp_path, sctp_table);
 }
 
 /* Sysctl deregistration.  */
diff --git a/net/x25/sysctl_net_x25.c b/net/x25/sysctl_net_x25.c
index a59b77f..6ebda25 100644
--- a/net/x25/sysctl_net_x25.c
+++ b/net/x25/sysctl_net_x25.c
@@ -84,29 +84,15 @@ static struct ctl_table x25_table[] = {
 	{ 0, },
 };
 
-static struct ctl_table x25_dir_table[] = {
-	{
-		.ctl_name =	NET_X25,
-		.procname =	"x25",
-		.mode =		0555,
-		.child =	x25_table,
-	},
-	{ 0, },
-};
-
-static struct ctl_table x25_root_table[] = {
-	{
-		.ctl_name =	CTL_NET,
-		.procname =	"net",
-		.mode =		0555,
-		.child =	x25_dir_table,
-	},
-	{ 0, },
+static struct ctl_path x25_path[] = {
+	{ .procname = "net", .ctl_name = CTL_NET, },
+	{ .procname = "x25", .ctl_name = NET_X25, },
+	{ }
 };
 
 void __init x25_register_sysctl(void)
 {
-	x25_table_header = register_sysctl_table(x25_root_table);
+	x25_table_header = register_sysctl_paths(x25_path, x25_table);
 }
 
 void x25_unregister_sysctl(void)


^ permalink raw reply related

* [PATCH net-2.6.25 0/6] Use ctl paths in the networking code.
From: Pavel Emelyanov @ 2008-01-08 15:52 UTC (permalink / raw)
  To: David Miller; +Cc: Linux Netdev List, devel

This set almost completes the ctl paths usage in the 
networking code. The first patches doing this were accepted 
in the last year :), but they tuned only core, ipv4 and ipv6.

I thought, that splitting this into many subsystem would
produce too many patches, so I splitted it so, that most 
subsystems that are patched in a very similar way are 
merged into one patch to make the review easier. Hope this 
is OK.

After this set the vmlinux size becomes almost 4Kb less (when
all patched files are built-in):

add/remove: 16/40 grow/shrink: 18/6 up/down: 486/-4394 (-3908)
function                                     old     new   delta
net_vs_ctl_path                                -      32     +32
dccp_path                                      -      32     +32
x25_path                                       -      24     +24
sctp_path                                      -      24     +24
rose_path                                      -      24     +24
nr_path                                        -      24     +24
net_ipv6_ctl_path                              -      24     +24
net_ipv4_ctl_path                              -      24     +24
llc_path                                       -      24     +24
irda_path                                      -      24     +24
ipx_path                                       -      24     +24
dn_path                                        -      24     +24
brnf_path                                      -      24     +24
ax25_path                                      -      24     +24
atalk_path                                     -      24     +24
nf_ct_path                                     -      16     +16
dn_dev_sysctl_register                       164     173      +9
x25_register_sysctl                           16      21      +5
sctp_sysctl_register                          16      21      +5
rose_register_sysctl                          16      21      +5
nr_register_sysctl                            16      21      +5
nf_conntrack_standalone_init                 166     171      +5
llc_sysctl_init                               24      29      +5
irda_sysctl_register                          24      29      +5
ipx_register_sysctl                           16      21      +5
ip_vs_lblcr_init                              66      71      +5
ip_vs_control_init                           209     214      +5
ip_queue_init                                288     293      +5
ip6_queue_init                               288     293      +5
dn_register_sysctl                            16      21      +5
dccp_sysctl_init                              24      29      +5
br_netfilter_init                             91      96      +5
atalk_register_sysctl                         16      21      +5
ax25_register_sysctl                         294     290      -4
ax25_unregister_sysctl                        56      46     -10
nf_unregister_sysctl_table                    19       -     -19
net_ipv4_path                                 48      24     -24
ipv6_ctl_path                                 24       -     -24
path_free                                     27       -     -27
nf_net_ipv4_netfilter_sysctl_path             88      32     -56
nf_net_netfilter_sysctl_path                  88      24     -64
x25_root_table                                88       -     -88
x25_dir_table                                 88       -     -88
vs_root_table                                 88       -     -88
sctp_root_table                               88       -     -88
sctp_net_table                                88       -     -88
rose_root_table                               88       -     -88
rose_dir_table                                88       -     -88
nr_root_table                                 88       -     -88
nr_dir_table                                  88       -     -88
nf_net_netfilter_table                        88       -     -88
nf_net_ipv4_table                             88       -     -88
nf_net_ipv4_netfilter_table                   88       -     -88
nf_ct_net_table                               88       -     -88
llc_root_table                                88       -     -88
llc_dir_table                                 88       -     -88
lblcr_root_table                              88       -     -88
lblc_root_table                               88       -     -88
irda_root_table                               88       -     -88
irda_net_table                                88       -     -88
ipx_root_table                                88       -     -88
ipx_dir_table                                 88       -     -88
dn_root_table                                 88       -     -88
dn_dir_table                                  88       -     -88
dccp_table                                    88       -     -88
dccp_root_table                               88       -     -88
dccp_dir_table                                88       -     -88
brnf_net_table                                88       -     -88
brnf_bridge_table                             88       -     -88
ax25_root_table                               88       -     -88
ax25_dir_table                                88       -     -88
atalk_root_table                              88       -     -88
atalk_dir_table                               88       -     -88
nf_register_sysctl_table                     118       -    -118
ipq_root_table                               176       -    -176
ipq_dir_table                                176       -    -176
vs_table                                     264       -    -264
ipvs_ipv4_table                              264       -    -264
dn_dev_sysctl                                576     224    -352

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>

^ permalink raw reply

* [PATCH net-2.6.25] [BRIDGE] Remove unused macros from ebt_vlan.c
From: Rami Rosen @ 2008-01-08 14:38 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 166 bytes --]

Hi,

 Remove two unused macros, INV_FLAG and SET_BITMASK
 from net/bridge/netfilter/ebt_vlan.c.

Regards,
Rami Rosen


Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 640 bytes --]

diff --git a/net/bridge/netfilter/ebt_vlan.c b/net/bridge/netfilter/ebt_vlan.c
index a43c697..0ddf749 100644
--- a/net/bridge/netfilter/ebt_vlan.c
+++ b/net/bridge/netfilter/ebt_vlan.c
@@ -37,9 +37,7 @@ MODULE_LICENSE("GPL");
 
 
 #define DEBUG_MSG(args...) if (debug) printk (KERN_DEBUG "ebt_vlan: " args)
-#define INV_FLAG(_inv_flag_) (info->invflags & _inv_flag_) ? "!" : ""
 #define GET_BITMASK(_BIT_MASK_) info->bitmask & _BIT_MASK_
-#define SET_BITMASK(_BIT_MASK_) info->bitmask |= _BIT_MASK_
 #define EXIT_ON_MISMATCH(_MATCH_,_MASK_) {if (!((info->_MATCH_ == _MATCH_)^!!(info->invflags & _MASK_))) return EBT_NOMATCH;}
 
 static int

^ permalink raw reply related

* [PATCH 2/4] [POWERPC][NET] ucc_geth_mii and users: get rid of device_type
From: Anton Vorontsov @ 2008-01-08 14:29 UTC (permalink / raw)
  To: Kumar Gala, Jeff Garzik; +Cc: linuxppc-dev, netdev
In-Reply-To: <20080108142634.GA5989@localhost.localdomain>

device_type property is bogus, thus use proper compatible.

Also change compatible property to "fsl,ucc-mdio".

Per http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048388.html

Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
---
 arch/powerpc/boot/dts/mpc832x_mds.dts |    3 +--
 arch/powerpc/boot/dts/mpc832x_rdb.dts |    3 +--
 arch/powerpc/boot/dts/mpc836x_mds.dts |    3 +--
 arch/powerpc/boot/dts/mpc8568mds.dts  |    2 +-
 drivers/net/ucc_geth_mii.c            |    3 +++
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc832x_mds.dts b/arch/powerpc/boot/dts/mpc832x_mds.dts
index 010c8d9..cf41194 100644
--- a/arch/powerpc/boot/dts/mpc832x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc832x_mds.dts
@@ -255,8 +255,7 @@
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <2320 18>;
-			device_type = "mdio";
-			compatible = "ucc_geth_phy";
+			compatible = "fsl,ucc-mdio";
 
 			phy3: ethernet-phy@03 {
 				interrupt-parent = < &ipic >;
diff --git a/arch/powerpc/boot/dts/mpc832x_rdb.dts b/arch/powerpc/boot/dts/mpc832x_rdb.dts
index 3a73134..09301c1 100644
--- a/arch/powerpc/boot/dts/mpc832x_rdb.dts
+++ b/arch/powerpc/boot/dts/mpc832x_rdb.dts
@@ -236,8 +236,7 @@
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <3120 18>;
-			device_type = "mdio";
-			compatible = "ucc_geth_phy";
+			compatible = "fsl,ucc-mdio";
 
 			phy00:ethernet-phy@00 {
 				interrupt-parent = <&pic>;
diff --git a/arch/powerpc/boot/dts/mpc836x_mds.dts b/arch/powerpc/boot/dts/mpc836x_mds.dts
index 2986860..3eb8f72 100644
--- a/arch/powerpc/boot/dts/mpc836x_mds.dts
+++ b/arch/powerpc/boot/dts/mpc836x_mds.dts
@@ -288,8 +288,7 @@
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <2120 18>;
-			device_type = "mdio";
-			compatible = "ucc_geth_phy";
+			compatible = "fsl,ucc-mdio";
 
 			phy0: ethernet-phy@00 {
 				interrupt-parent = < &ipic >;
diff --git a/arch/powerpc/boot/dts/mpc8568mds.dts b/arch/powerpc/boot/dts/mpc8568mds.dts
index 7440347..2bc147f 100644
--- a/arch/powerpc/boot/dts/mpc8568mds.dts
+++ b/arch/powerpc/boot/dts/mpc8568mds.dts
@@ -356,7 +356,7 @@
 			#address-cells = <1>;
 			#size-cells = <0>;
 			reg = <2120 18>;
-			compatible = "ucc_geth_phy";
+			compatible = "fsl,ucc-mdio";
 
 			/* These are the same PHYs as on
 			 * gianfar's MDIO bus */
diff --git a/drivers/net/ucc_geth_mii.c b/drivers/net/ucc_geth_mii.c
index df884f0..e3ba14a 100644
--- a/drivers/net/ucc_geth_mii.c
+++ b/drivers/net/ucc_geth_mii.c
@@ -256,6 +256,9 @@ static struct of_device_id uec_mdio_match[] = {
 		.type = "mdio",
 		.compatible = "ucc_geth_phy",
 	},
+	{
+		.compatible = "fsl,ucc-mdio",
+	},
 	{},
 };
 
-- 
1.5.2.2


^ permalink raw reply related

* [PATCH 3/3] [CCID2]: Add option-parsing code to process Ack Vectors at the HC-sender
From: Gerrit Renker @ 2008-01-08 14:07 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1199801276-16955-3-git-send-email-gerrit@erg.abdn.ac.uk>

This is patch 2 in the set and uses the routines provided by the previous
patch to implement parsing of received Ack Vectors, replacing duplicate code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ccids/ccid2.c |  132 ++++++++++++++++--------------------------------
 net/dccp/ccids/ccid2.h |    2 +
 2 files changed, 46 insertions(+), 88 deletions(-)

--- a/net/dccp/ccids/ccid2.h
+++ b/net/dccp/ccids/ccid2.h
@@ -47,6 +47,7 @@ struct ccid2_seq {
  * @ccid2hctx_lastrtt -time RTT was last measured
  * @ccid2hctx_rpseq - last consecutive seqno
  * @ccid2hctx_rpdupack - dupacks since rpseq
+ * @ccid2hctx_parsed_ackvecs: list of Ack Vectors received on current skb
 */
 struct ccid2_hc_tx_sock {
 	u32			ccid2hctx_cwnd;
@@ -66,6 +67,7 @@ struct ccid2_hc_tx_sock {
 	int			ccid2hctx_rpdupack;
 	unsigned long		ccid2hctx_last_cong;
 	u64			ccid2hctx_high_ack;
+	struct list_head	ccid2hctx_parsed_ackvecs;
 };
 
 struct ccid2_hc_rx_sock {
--- a/net/dccp/ccids/ccid2.c
+++ b/net/dccp/ccids/ccid2.c
@@ -320,68 +320,6 @@ static void ccid2_hc_tx_packet_sent(struct sock *sk, int more, unsigned int len)
 #endif
 }
 
-/* XXX Lame code duplication!
- * returns -1 if none was found.
- * else returns the next offset to use in the function call.
- */
-static int ccid2_ackvector(struct sock *sk, struct sk_buff *skb, int offset,
-			   unsigned char **vec, unsigned char *veclen)
-{
-	const struct dccp_hdr *dh = dccp_hdr(skb);
-	unsigned char *options = (unsigned char *)dh + dccp_hdr_len(skb);
-	unsigned char *opt_ptr;
-	const unsigned char *opt_end = (unsigned char *)dh +
-					(dh->dccph_doff * 4);
-	unsigned char opt, len;
-	unsigned char *value;
-
-	BUG_ON(offset < 0);
-	options += offset;
-	opt_ptr = options;
-	if (opt_ptr >= opt_end)
-		return -1;
-
-	while (opt_ptr != opt_end) {
-		opt   = *opt_ptr++;
-		len   = 0;
-		value = NULL;
-
-		/* Check if this isn't a single byte option */
-		if (opt > DCCPO_MAX_RESERVED) {
-			if (opt_ptr == opt_end)
-				goto out_invalid_option;
-
-			len = *opt_ptr++;
-			if (len < 3)
-				goto out_invalid_option;
-			/*
-			 * Remove the type and len fields, leaving
-			 * just the value size
-			 */
-			len     -= 2;
-			value   = opt_ptr;
-			opt_ptr += len;
-
-			if (opt_ptr > opt_end)
-				goto out_invalid_option;
-		}
-
-		switch (opt) {
-		case DCCPO_ACK_VECTOR_0:
-		case DCCPO_ACK_VECTOR_1:
-			*vec	= value;
-			*veclen = len;
-			return offset + (opt_ptr - options);
-		}
-	}
-
-	return -1;
-
-out_invalid_option:
-	DCCP_BUG("Invalid option - this should not happen (previous parsing)!");
-	return -1;
-}
-
 static void ccid2_hc_tx_kill_rto_timer(struct sock *sk)
 {
 	struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
@@ -502,15 +440,30 @@ static void ccid2_congestion_event(struct sock *sk, struct ccid2_seq *seqp)
 		ccid2_change_l_ack_ratio(sk, hctx->ccid2hctx_cwnd);
 }
 
+static int ccid2_hc_tx_parse_options(struct sock *sk, unsigned char option,
+				     unsigned char len, u16 idx,
+				     unsigned char *value)
+{
+	struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
+
+	switch (option) {
+	case DCCPO_ACK_VECTOR_0:
+		return dccp_ackvec_parsed_add(&hctx->ccid2hctx_parsed_ackvecs,
+					      value, len, 0);
+	case DCCPO_ACK_VECTOR_1:
+		return dccp_ackvec_parsed_add(&hctx->ccid2hctx_parsed_ackvecs,
+					      value, len, 1);
+	}
+	return 0;
+}
+
 static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct ccid2_hc_tx_sock *hctx = ccid2_hc_tx_sk(sk);
+	struct dccp_ackvec_parsed *avp;
 	u64 ackno, seqno;
 	struct ccid2_seq *seqp;
-	unsigned char *vector;
-	unsigned char veclen;
-	int offset = 0;
 	int done = 0;
 	unsigned int maxincr = 0;
 
@@ -545,13 +498,13 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 	}
 
 	/* check forward path congestion */
-	/* still didn't send out new data packets */
-	if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt)
-		return;
-
 	if (dccp_packet_without_ack(skb))
 		return;
 
+	/* still didn't send out new data packets */
+	if (hctx->ccid2hctx_seqh == hctx->ccid2hctx_seqt)
+		goto done;
+
 	ackno = DCCP_SKB_CB(skb)->dccpd_ack_seq;
 	if (after48(ackno, hctx->ccid2hctx_high_ack))
 		hctx->ccid2hctx_high_ack = ackno;
@@ -574,16 +527,16 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 		maxincr = DIV_ROUND_UP(dp->dccps_l_ack_ratio, 2);
 
 	/* go through all ack vectors */
-	while ((offset = ccid2_ackvector(sk, skb, offset,
-					 &vector, &veclen)) != -1) {
+	list_for_each_entry(avp, &hctx->ccid2hctx_parsed_ackvecs, node) {
 		/* go through this ack vector */
-		while (veclen--) {
+		for (; avp->len--; avp->vec++) {
 			u64 ackno_end_rl = SUB48(ackno,
-						 dccp_ackvec_runlen(vector));
+						 dccp_ackvec_runlen(avp->vec));
 
-			ccid2_pr_debug("ackvec start:%llu end:%llu\n",
+			ccid2_pr_debug("ackvec %llu |%u,%u|\n",
 				       (unsigned long long)ackno,
-				       (unsigned long long)ackno_end_rl);
+				       dccp_ackvec_state(avp->vec) >> 6,
+				       dccp_ackvec_runlen(avp->vec));
 			/* if the seqno we are analyzing is larger than the
 			 * current ackno, then move towards the tail of our
 			 * seqnos.
@@ -602,7 +555,7 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 			 * run length
 			 */
 			while (between48(seqp->ccid2s_seq,ackno_end_rl,ackno)) {
-				const u8 state = dccp_ackvec_state(vector);
+				const u8 state = dccp_ackvec_state(avp->vec);
 
 				/* new packet received or marked */
 				if (state != DCCPAV_NOT_RECEIVED &&
@@ -629,7 +582,6 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 				break;
 
 			ackno = SUB48(ackno_end_rl, 1);
-			vector++;
 		}
 		if (done)
 			break;
@@ -693,6 +645,8 @@ static void ccid2_hc_tx_packet_recv(struct sock *sk, struct sk_buff *skb)
 	}
 
 	ccid2_hc_tx_check_sanity(hctx);
+done:
+	dccp_ackvec_parsed_cleanup(&hctx->ccid2hctx_parsed_ackvecs);
 }
 
 static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
@@ -727,6 +681,7 @@ static int ccid2_hc_tx_init(struct ccid *ccid, struct sock *sk)
 	hctx->ccid2hctx_last_cong = jiffies;
 	setup_timer(&hctx->ccid2hctx_rtotimer, ccid2_hc_tx_rto_expire,
 			(unsigned long)sk);
+	INIT_LIST_HEAD(&hctx->ccid2hctx_parsed_ackvecs);
 
 	ccid2_hc_tx_check_sanity(hctx);
 	return 0;
@@ -762,17 +717,18 @@ static void ccid2_hc_rx_packet_recv(struct sock *sk, struct sk_buff *skb)
 }
 
 static struct ccid_operations ccid2 = {
-	.ccid_id		= DCCPC_CCID2,
-	.ccid_name		= "TCP-like",
-	.ccid_owner		= THIS_MODULE,
-	.ccid_hc_tx_obj_size	= sizeof(struct ccid2_hc_tx_sock),
-	.ccid_hc_tx_init	= ccid2_hc_tx_init,
-	.ccid_hc_tx_exit	= ccid2_hc_tx_exit,
-	.ccid_hc_tx_send_packet	= ccid2_hc_tx_send_packet,
-	.ccid_hc_tx_packet_sent	= ccid2_hc_tx_packet_sent,
-	.ccid_hc_tx_packet_recv	= ccid2_hc_tx_packet_recv,
-	.ccid_hc_rx_obj_size	= sizeof(struct ccid2_hc_rx_sock),
-	.ccid_hc_rx_packet_recv	= ccid2_hc_rx_packet_recv,
+	.ccid_id		  = DCCPC_CCID2,
+	.ccid_name		  = "TCP-like",
+	.ccid_owner		  = THIS_MODULE,
+	.ccid_hc_tx_obj_size	  = sizeof(struct ccid2_hc_tx_sock),
+	.ccid_hc_tx_init	  = ccid2_hc_tx_init,
+	.ccid_hc_tx_exit	  = ccid2_hc_tx_exit,
+	.ccid_hc_tx_send_packet	  = ccid2_hc_tx_send_packet,
+	.ccid_hc_tx_packet_sent	  = ccid2_hc_tx_packet_sent,
+	.ccid_hc_tx_parse_options = ccid2_hc_tx_parse_options,
+	.ccid_hc_tx_packet_recv	  = ccid2_hc_tx_packet_recv,
+	.ccid_hc_rx_obj_size	  = sizeof(struct ccid2_hc_rx_sock),
+	.ccid_hc_rx_packet_recv	  = ccid2_hc_rx_packet_recv,
 };
 
 #ifdef CONFIG_IP_DCCP_CCID2_DEBUG

^ permalink raw reply

* [PATCH 1/3] [ACKVEC]: Schedule SyncAck when running out of space
From: Gerrit Renker @ 2008-01-08 14:07 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1199801276-16955-1-git-send-email-gerrit@erg.abdn.ac.uk>

The problem with Ack Vectors is that

  i) their length is variable and can in principle grow quite large,
 ii) it is hard to predict exactly how large they will be.

 Due to the second point it seems not a good idea to reduce the MPS;
i particular when on average there is enough room for the Ack Vector
and an increase in length is momentarily due to some burst loss, after
which the Ack Vector returns to its normal/average length.

The solution taken by this patch to address the outstanding FIXME is
to schedule a separate Sync when running out of space on the skb, and to
log a warning into the syslog.

The mechanism can also be used for other out-of-band signalling: it does
quicker signalling than scheduling an Ack, since it does not need to wait
for new data.

 Additional Note:
 ----------------
 It is possible to lower MPS according to the average length of Ack Vectors;
 the following argues why this does not seem to be a good idea.

 When determining the average Ack Vector length, a moving-average is more
 useful than a normal average, since sudden peaks (burst losses) are better
 dampened. The Ack Vector buffer would have a field `av_avg_len' which tracks
 this moving average and MPS would be reduced by this value (plus 2 bytes for
 type/value for each full Ack Vector).

 However, this means that the MPS decreases in the middle of an established
 connection. For a user who has tuned his/her application to work with the
 MPS taken at the beginning of the connection this can be very counter-
 intuitive and annoying.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 include/linux/dccp.h |    2 ++
 net/dccp/options.c   |   28 +++++++++++++++++++---------
 net/dccp/output.c    |    8 ++++++++
 3 files changed, 29 insertions(+), 9 deletions(-)

--- a/include/linux/dccp.h
+++ b/include/linux/dccp.h
@@ -475,6 +475,7 @@ struct dccp_ackvec;
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
  * @dccps_server_timewait - server holds timewait state on close (RFC 4340, 8.3)
+ * @dccps_sync_scheduled - flag which signals "send out-of-band message soon"
  * @dccps_xmit_timer - timer for when CCID is not ready to send
  * @dccps_syn_rtt - RTT sample from Request/Response exchange (in usecs)
  */
@@ -515,6 +516,7 @@ struct dccp_sock {
 	__u8				dccps_hc_rx_insert_options:1;
 	__u8				dccps_hc_tx_insert_options:1;
 	__u8				dccps_server_timewait:1;
+	__u8				dccps_sync_scheduled:1;
 	struct timer_list		dccps_xmit_timer;
 };
 
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -426,7 +426,9 @@ static int dccp_insert_option_timestamp_echo(struct dccp_sock *dp,
 
 int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 {
-	struct dccp_ackvec *av = dccp_sk(sk)->dccps_hc_rx_ackvec;
+	struct dccp_sock *dp = dccp_sk(sk);
+	struct dccp_ackvec *av = dp->dccps_hc_rx_ackvec;
+	struct dccp_skb_cb *dcb = DCCP_SKB_CB(skb);
 	const u16 buflen = dccp_ackvec_buflen(av);
 	/* Figure out how many options do we need to represent the ackvec */
 	const u16 nr_opts = DIV_ROUND_UP(buflen, DCCP_SINGLE_OPT_MAXLEN);
@@ -435,16 +437,24 @@ int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 	const unsigned char *tail, *from;
 	unsigned char *to;
 
-	if (DCCP_SKB_CB(skb)->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
-		/*
-		 * FIXME: when running out of option space while piggybacking on
-		 * DataAck, return 0 here and schedule a separate Ack instead.
-		 */
+	if (dcb->dccpd_opt_len + len > DCCP_MAX_OPT_LEN) {
 		DCCP_WARN("Lacking space for %u bytes on %s packet\n", len,
-			  dccp_packet_name(DCCP_SKB_CB(skb)->dccpd_type));
+			  dccp_packet_name(dcb->dccpd_type));
 		return -1;
 	}
-	DCCP_SKB_CB(skb)->dccpd_opt_len += len;
+	/*
+	 * Since Ack Vectors are variable-length, we can not always predict
+	 * their size. To catch exception cases where the space is running out
+	 * on the skb, a separate Sync is scheduled to carry the Ack Vector.
+	 */
+	if (dcb->dccpd_opt_len + skb->len + len > dp->dccps_mss_cache) {
+		DCCP_WARN("No space left for Ack Vector (%u) on skb (%u+%u), "
+			  "MPS=%u ==> reduce payload size?\n", len, skb->len,
+			  dcb->dccpd_opt_len, dp->dccps_mss_cache);
+		dp->dccps_sync_scheduled = 1;
+		return 0;
+	}
+	dcb->dccpd_opt_len += len;
 
 	to   = skb_push(skb, len);
 	len  = buflen;
@@ -485,7 +495,7 @@ int dccp_insert_option_ackvec(struct sock *sk, struct sk_buff *skb)
 	/*
 	 * Each sent Ack Vector is recorded in the list, as per A.2 of RFC 4340.
 	 */
-	if (dccp_ackvec_update_records(av, DCCP_SKB_CB(skb)->dccpd_seq, nonce))
+	if (dccp_ackvec_update_records(av, dcb->dccpd_seq, nonce))
 		return -ENOBUFS;
 	return 0;
 }
--- a/net/dccp/output.c
+++ b/net/dccp/output.c
@@ -282,6 +282,8 @@ void dccp_write_xmit(struct sock *sk, int block)
 			if (err)
 				DCCP_BUG("err=%d after ccid_hc_tx_packet_sent",
 					 err);
+			if (dp->dccps_sync_scheduled)
+				dccp_send_sync(sk, dp->dccps_gsr, DCCP_PKT_SYNC);
 		} else {
 			dccp_pr_debug("packet discarded due to err=%d\n", err);
 			kfree_skb(skb);
@@ -574,6 +576,12 @@ void dccp_send_sync(struct sock *sk, const u64 ackno,
 	DCCP_SKB_CB(skb)->dccpd_type = pkt_type;
 	DCCP_SKB_CB(skb)->dccpd_ack_seq = ackno;
 
+	/*
+	 * Clear the flag in case the Sync was scheduled for out-of-band data,
+	 * such as carrying a long Ack Vector.
+	 */
+	dccp_sk(sk)->dccps_sync_scheduled = 0;
+
 	dccp_transmit_skb(sk, skb);
 }
 

^ permalink raw reply

* [PATCH 2/3] [ACKVEC]: Separate skb option parsing from CCID-specific code
From: Gerrit Renker @ 2008-01-08 14:07 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev, Gerrit Renker
In-Reply-To: <1199801276-16955-2-git-send-email-gerrit@erg.abdn.ac.uk>

This patch replaces an almost identical reduplication of code; large parts of
dccp_parse_options() are repeated as ccid2_ackvector() in ccid2.c.

Two problems are involved, apart from the duplication:
 1. no separation of concerns: CCIDs should not need to be concerned with the
    details of parsing header options;
 2. one can not assume that Ack Vectors appear as a contiguous area within an
    skb, it is legal to insert other options and/or padding in between. The
    current code would throw an error and stop reading in such a case. Not good.

Details:
--------
 * received Ack Vectors are parsed by dccp_parse_options() alone, which passes
   the result on to the CCID-specific ccid_hc_tx_parse_options();
 * CCIDs interested in using/decoding Ack Vector information need to add code
   to receive parsed Ack Vectors via this interface;
 * a data structure, `struct dccp_ackvec_parsed' is provided, which arranges all
   Ack Vectors of a single skb into a list of parsed chunks;
 * a doubly-linked list was used since insertion needs to be at the tail end;
 * the associated list handling and list de-allocation routines.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
---
 net/dccp/ackvec.c  |   28 ++++++++++++++++++++++++++++
 net/dccp/ackvec.h  |   19 +++++++++++++++++++
 net/dccp/options.c |   17 ++++++++++-------
 3 files changed, 57 insertions(+), 7 deletions(-)

--- a/net/dccp/ackvec.h
+++ b/net/dccp/ackvec.h
@@ -112,4 +112,23 @@ static inline bool dccp_ackvec_is_empty(const struct dccp_ackvec *av)
 {
 	return av->av_overflow == 0 && av->av_buf_head == av->av_buf_tail;
 }
+
+/**
+ * struct dccp_ackvec_parsed  -  Record offsets of Ack Vectors in skb
+ * @vec:	start of vector (offset into skb)
+ * @len:	length of @vec
+ * @nonce:	whether @vec had an ECN nonce of 0 or 1
+ * @node:	FIFO - arranged in descending order of ack_ackno
+ * This structure is used by CCIDs to access Ack Vectors in a received skb.
+ */
+struct dccp_ackvec_parsed {
+	u8		 *vec,
+			 len,
+			 nonce:1;
+	struct list_head node;
+};
+
+extern int dccp_ackvec_parsed_add(struct list_head *head,
+				  u8 *vec, u8 len, u8 nonce);
+extern void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks);
 #endif /* _ACKVEC_H */
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -135,13 +135,6 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 			if (rc)
 				goto out_invalid_option;
 			break;
-		case DCCPO_ACK_VECTOR_0:
-		case DCCPO_ACK_VECTOR_1:
-			if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
-				break;
-			dccp_pr_debug("%s Ack Vector (len=%u)\n", dccp_role(sk),
-				      len);
-			break;
 		case DCCPO_TIMESTAMP:
 			if (len != 4)
 				goto out_invalid_option;
@@ -229,6 +222,16 @@ int dccp_parse_options(struct sock *sk, struct dccp_request_sock *dreq,
 						     value) != 0)
 				goto out_invalid_option;
 			break;
+		case DCCPO_ACK_VECTOR_0:
+		case DCCPO_ACK_VECTOR_1:
+			if (dccp_packet_without_ack(skb))   /* RFC 4340, 11.4 */
+				break;
+			/*
+			 * Ack vectors are processed by the TX CCID if it is
+			 * interested. The RX CCID need not parse Ack Vectors,
+			 * since it is only interested in clearing old state.
+			 * Fall through.
+			 */
 		case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
 			if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk,
 						     opt, len, value - options,
--- a/net/dccp/ackvec.c
+++ b/net/dccp/ackvec.c
@@ -345,6 +345,34 @@ free_records:
 	}
 }
 
+/*
+ *	Routines to keep track of Ack Vectors received in an skb
+ */
+int dccp_ackvec_parsed_add(struct list_head *head, u8 *vec, u8 len, u8 nonce)
+{
+	struct dccp_ackvec_parsed *new = kmalloc(sizeof(*new), GFP_ATOMIC);
+
+	if (new == NULL)
+		return -ENOBUFS;
+	new->vec   = vec;
+	new->len   = len;
+	new->nonce = nonce;
+
+	list_add_tail(&new->node, head);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(dccp_ackvec_parsed_add);
+
+void dccp_ackvec_parsed_cleanup(struct list_head *parsed_chunks)
+{
+	struct dccp_ackvec_parsed *cur, *next;
+
+	list_for_each_entry_safe(cur, next, parsed_chunks, node)
+		kfree(cur);
+	INIT_LIST_HEAD(parsed_chunks);
+}
+EXPORT_SYMBOL_GPL(dccp_ackvec_parsed_cleanup);
+
 int __init dccp_ackvec_init(void)
 {
 	dccp_ackvec_slab = kmem_cache_create("dccp_ackvec",

^ permalink raw reply

* [DCCP] [PATCH 0/3]: Finishing Ack Vector patch set
From: Gerrit Renker @ 2008-01-08 14:07 UTC (permalink / raw)
  To: acme; +Cc: dccp, netdev

This is the "new stuff" for Ack Vectors, completing the Ack Vector work.

All other patches are as before, with the single exception of the update sent
yesterday (the recovery strategy for dealing with suddenly large losses).

Arnaldo, can you please indicate whether I should resubmit the older patches.

After some more testing I am positive that these are now ready to be considered.

Patch #1: Addresses the pending FIXME, if Ack Vectors grow too large for a packet,
          their variable-length information is transported via a separate/scheduled
	  Sync. This mechanism can also be used for other out-of-band signalling
	  (e.g. slow-receiver option).

Patch #2: Since there is already a rich option-parsing infrastructure in options.c,
          it is not necessary to replicate the code to parse Ack Vectors. This patch
	  adds utility routines to store parsed Ack Vectors, which are processed by
	  the existing infrastructure via the hc_tx_parse_options() interface.

Patch #3: Integrates patch #3 for use with CCID2.	  

All patches have been uploaded to git://eden-feed.erg.abdn.ac.uk/dccp_exp

^ permalink raw reply

* Re: [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: linux @ 2008-01-08 13:19 UTC (permalink / raw)
  To: netdev, romieu; +Cc: akpm, davem, linux
In-Reply-To: <20080108122814.12005.qmail@science.horizon.com>

I take that back.  This patch does NOT fix the leak, at least if
ping: sendmsg: No buffer space available
is any indication...

I think I was reading slabinfo wrong.
kmalloc-2048       42111  42112   2048    4    2 : tunables    0    0    0 : slabdata  10528  10528      0

Sorry for the false hope.

^ permalink raw reply

* [PATCH 2/3] drivers/net/ipg.c: convert Jumbo.FoundStart to bool
From: linux @ 2008-01-08 12:32 UTC (permalink / raw)
  To: netdev, romieu; +Cc: akpm, davem, linux

This is a fairly basic code cleanup that annoyed me while working
on the first patch.
---
diff --git a/drivers/net/ipg.h b/drivers/net/ipg.h
index d5d092c..5d7cc84 100644
--- a/drivers/net/ipg.h
+++ b/drivers/net/ipg.h
@@ -789,11 +789,6 @@ struct ipg_rx {
 	__le64 frag_info;
 };
 
-struct SJumbo {
-	int FoundStart;
-	int CurrentSize;
-	struct sk_buff *skb;
-};
 /* Structure of IPG NIC specific data. */
 struct ipg_nic_private {
 	void __iomem *ioaddr;
@@ -809,7 +804,11 @@ struct ipg_nic_private {
 	unsigned int rx_dirty;
 // Add by Grace 2005/05/19
 #ifdef JUMBO_FRAME
-	struct SJumbo Jumbo;
+	struct SJumbo {
+		bool FoundStart;
+		int CurrentSize;
+		struct sk_buff *skb;
+	} Jumbo;
 #endif
 	unsigned int rx_buf_sz;
 	struct pci_dev *pdev;
diff --git a/drivers/net/ipg.h b/drivers/net/ipg.h
index d5d092c..5d7cc84 100644
--- a/drivers/net/ipg.h
+++ b/drivers/net/ipg.h
@@ -789,11 +789,6 @@ struct ipg_rx {
 	__le64 frag_info;
 };
 
-struct SJumbo {
-	int FoundStart;
-	int CurrentSize;
-	struct sk_buff *skb;
-};
 /* Structure of IPG NIC specific data. */
 struct ipg_nic_private {
 	void __iomem *ioaddr;
@@ -809,7 +804,11 @@ struct ipg_nic_private {
 	unsigned int rx_dirty;
 // Add by Grace 2005/05/19
 #ifdef JUMBO_FRAME
-	struct SJumbo Jumbo;
+	struct SJumbo {
+		bool FoundStart;
+		int CurrentSize;
+		struct sk_buff *skb;
+	} Jumbo;
 #endif
 	unsigned int rx_buf_sz;
 	struct pci_dev *pdev;
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index a0dfba5..3860fcd 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1206,7 +1206,7 @@ static void ipg_nic_rx_with_start_and_end(struct net_device *dev,
 
 	if (jumbo->FoundStart) {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 	}
@@ -1257,7 +1257,7 @@ static void ipg_nic_rx_with_start(struct net_device *dev,
 
 	skb_put(skb, IPG_RXFRAG_SIZE);
 
-	jumbo->FoundStart = 1;
+	jumbo->FoundStart = true;
 	jumbo->CurrentSize = IPG_RXFRAG_SIZE;
 	jumbo->skb = skb;
 
@@ -1303,14 +1303,14 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
 		}
 
 		dev->last_rx = jiffies;
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 
 		ipg_nic_rx_free_skb(dev, entry);
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 	}
@@ -1340,7 +1340,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device *dev,
 		}
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
-		jumbo->FoundStart = 0;
+		jumbo->FoundStart = false;
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 	}
@@ -1840,7 +1840,7 @@ static int ipg_nic_open(struct net_device *dev)
 
 #ifdef JUMBO_FRAME
 	/* initialize JUMBO Frame control variable */
-	sp->Jumbo.FoundStart = 0;
+	sp->Jumbo.FoundStart = false;
 	sp->Jumbo.CurrentSize = 0;
 	sp->Jumbo.skb = 0;
 	dev->mtu = IPG_TXFRAG_SIZE;

^ permalink raw reply related

* [PATCH 1/3] drivers/net/ipg.c: Fix skbuff leak
From: linux @ 2008-01-08 12:28 UTC (permalink / raw)
  To: netdev, romieu; +Cc: akpm, davem, linux
In-Reply-To: <20080107.231447.08811264.davem@davemloft.net>

Prompted by davem, this attempt at fixing the memory leak
actually appears to work.  At least, leaving ping -f -s1472 -l64
running doesn't drop packets and doesn't show up in /proc/slabinfo.
---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index dbd23bb..a0dfba5 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -1110,10 +1110,9 @@ enum {
 	Frame_WithStart_WithEnd = 11
 };
 
-inline void ipg_nic_rx_free_skb(struct net_device *dev)
+inline void ipg_nic_rx_free_skb(struct net_device *dev, unsigned entry)
 {
 	struct ipg_nic_private *sp = netdev_priv(dev);
-	unsigned int entry = sp->rx_current % IPG_RFDLIST_LENGTH;
 
 	if (sp->RxBuff[entry]) {
 		struct ipg_rx *rxfd = sp->rxd + entry;
@@ -1308,7 +1307,7 @@ static void ipg_nic_rx_with_end(struct net_device *dev,
 		jumbo->CurrentSize = 0;
 		jumbo->skb = NULL;
 
-		ipg_nic_rx_free_skb(dev);
+		ipg_nic_rx_free_skb(dev, entry);
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);
 		jumbo->FoundStart = 0;
@@ -1337,7 +1336,7 @@ static void ipg_nic_rx_no_start_no_end(struct net_device *dev,
 				}
 			}
 			dev->last_rx = jiffies;
-			ipg_nic_rx_free_skb(dev);
+			ipg_nic_rx_free_skb(dev, entry);
 		}
 	} else {
 		IPG_DEV_KFREE_SKB(jumbo->skb);

^ permalink raw reply related

* [PATCH 3/3] drivers/net/ipg.c: fix horrible mdio_read and _write
From: linux @ 2008-01-08 12:40 UTC (permalink / raw)
  To: netdev, romieu; +Cc: akpm, davem, linux

akpm noticed that this code was awful.
 ipg.c |  158 +++++++++++++++++------------------------------------------------- 1 file changed, 43 insertions(+), 115 deletions(-)
should be sufficient justification all by itself.
---
diff --git a/drivers/net/ipg.c b/drivers/net/ipg.c
index 3860fcd..b3d3fc8 100644
--- a/drivers/net/ipg.c
+++ b/drivers/net/ipg.c
@@ -202,12 +202,31 @@ static u16 read_phy_bit(void __iomem * ioaddr, u8 phyctrlpolarity)
 }
 
 /*
+ * Transmit the given bits, MSB-first, through the MgmtData bit (bit 1)
+ * of the PhyCtrl register. 1 <= len <= 32.  "ioaddr" is the register
+ * address, and "otherbits" are the values of the other bits.
+ */
+static void mdio_write_bits(void __iomem *ioaddr, u8 otherbits, u32 data, unsigned len)
+{
+	otherbits |= IPG_PC_MGMTDIR;
+	do {
+		/* IPG_PC_MGMTDATA is a power of 2; compiler knows to shift */
+		u8 d = ((data >> --len) & 1) * IPG_PC_MGMTDATA;
+		/* + rather than | lets compiler microoptimize better */
+		ipg_drive_phy_ctl_low_high(ioaddr, d + otherbits);
+	} while (len);
+}
+
+/*
  * Read a register from the Physical Layer device located
  * on the IPG NIC, using the IPG PHYCTRL register.
  */
 static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
+	unsigned i, data = 0;
 	/*
 	 * The GMII mangement frame structure for a read is as follows:
 	 *
@@ -221,75 +240,30 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 	 * D = bit of read data (MSB first)
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE,	32 },	/* Preamble */
-		{ GMII_ST,		2  },	/* ST */
-		{ GMII_READ,		2  },	/* OP */
-		{ phy_id,		5  },	/* PHYAD */
-		{ phy_reg,		5  },	/* REGAD */
-		{ 0x0000,		2  },	/* TA */
-		{ 0x0000,		16 },	/* DATA */
-		{ 0x0000,		1  }	/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity  = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 5; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 *          First write a '0' to bit 1 of the PhyCtrl
-			 *          register, then write a '1' to bit 1 of the
-			 *          PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
-
-	send_three_state(ioaddr, polarity);
-
-	read_phy_bit(ioaddr, polarity);
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST<<12 | GMII_READ << 10 |
+	                                  phy_id << 5 | phy_reg, 14);
 
 	/*
 	 * For a read cycle, the bits for the next two fields (TA and
 	 * DATA) are driven by the PHY (the IPG reads these bits).
 	 */
-	for (i = 0; i < p[6].len; i++) {
-		p[6].field |=
-		    (read_phy_bit(ioaddr, polarity) << (p[6].len - 1 - i));
-	}
+	send_three_state(ioaddr, polarity);	/* TA first bit */
+	(void)read_phy_bit(ioaddr, polarity);		/* TA second bit */
+
+	for (i = 0; i < 16; i++)
+		data += data + read_phy_bit(ioaddr, polarity);
 
+	/* Trailing idle */
 	send_three_state(ioaddr, polarity);
 	send_three_state(ioaddr, polarity);
 	send_three_state(ioaddr, polarity);
 	send_end(ioaddr, polarity);
 
 	/* Return the value of the DATA field. */
-	return p[6].field;
+	return data;
 }
 
 /*
@@ -299,11 +273,13 @@ static int mdio_read(struct net_device * dev, int phy_id, int phy_reg)
 static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
 {
 	void __iomem *ioaddr = ipg_ioaddr(dev);
+	u8 const polarity = ipg_r8(PHY_CTRL) &
+			(IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
 	/*
-	 * The GMII mangement frame structure for a read is as follows:
+	 * The GMII mangement frame structure for a write is as follows:
 	 *
 	 * |Preamble|st|op|phyad|regad|ta|      data      |idle|
-	 * |< 32 1s>|01|10|AAAAA|RRRRR|z0|DDDDDDDDDDDDDDDD|z   |
+	 * |< 32 1s>|01|01|AAAAA|RRRRR|10|DDDDDDDDDDDDDDDD|z   |
 	 *
 	 * <32 1s> = 32 consecutive logic 1 values
 	 * A = bit of Physical Layer device address (MSB first)
@@ -312,66 +288,18 @@ static void mdio_write(struct net_device *dev, int phy_id, int phy_reg, int val)
 	 * D = bit of write data (MSB first)
 	 *
 	 * Transmission order is 'Preamble' field first, bits transmitted
-	 * left to right (first to last).
+	 * left to right (msbit-first).
 	 */
-	struct {
-		u32 field;
-		unsigned int len;
-	} p[] = {
-		{ GMII_PREAMBLE,	32 },	/* Preamble */
-		{ GMII_ST,		2  },	/* ST */
-		{ GMII_WRITE,		2  },	/* OP */
-		{ phy_id,		5  },	/* PHYAD */
-		{ phy_reg,		5  },	/* REGAD */
-		{ 0x0002,		2  },	/* TA */
-		{ val & 0xffff,		16 },	/* DATA */
-		{ 0x0000,		1  }	/* IDLE */
-	};
-	unsigned int i, j;
-	u8 polarity, data;
-
-	polarity  = ipg_r8(PHY_CTRL);
-	polarity &= (IPG_PC_DUPLEX_POLARITY | IPG_PC_LINK_POLARITY);
-
-	/* Create the Preamble, ST, OP, PHYAD, and REGAD field. */
-	for (j = 0; j < 7; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			/* For each variable length field, the MSB must be
-			 * transmitted first. Rotate through the field bits,
-			 * starting with the MSB, and move each bit into the
-			 * the 1st (2^1) bit position (this is the bit position
-			 * corresponding to the MgmtData bit of the PhyCtrl
-			 * register for the IPG).
-			 *
-			 * Example: ST = 01;
-			 *
-			 *          First write a '0' to bit 1 of the PhyCtrl
-			 *          register, then write a '1' to bit 1 of the
-			 *          PhyCtrl register.
-			 *
-			 * To do this, right shift the MSB of ST by the value:
-			 * [field length - 1 - #ST bits already written]
-			 * then left shift this result by 1.
-			 */
-			data  = (p[j].field >> (p[j].len - 1 - i)) << 1;
-			data &= IPG_PC_MGMTDATA;
-			data |= polarity | IPG_PC_MGMTDIR;
-
-			ipg_drive_phy_ctl_low_high(ioaddr, data);
-		}
-	}
+	mdio_write_bits(ioaddr, polarity, GMII_PREAMBLE, 32);
+	mdio_write_bits(ioaddr, polarity, GMII_ST << 14 | GMII_WRITE << 12 |
+	                                  phy_id << 7 | phy_reg << 2 |
+					  0x2, 16);
+	mdio_write_bits(ioaddr, polarity, val, 16);	/* DATA */
 
 	/* The last cycle is a tri-state, so read from the PHY. */
-	for (j = 7; j < 8; j++) {
-		for (i = 0; i < p[j].len; i++) {
-			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
-
-			p[j].field |= ((ipg_r8(PHY_CTRL) &
-				IPG_PC_MGMTDATA) >> 1) << (p[j].len - 1 - i);
-
-			ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
-		}
-	}
+	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_LO | polarity);
+	(void)ipg_r8(PHY_CTRL);
+	ipg_write_phy_ctl(ioaddr, IPG_PC_MGMTCLK_HI | polarity);
 }
 
 /* Set LED_Mode JES20040127EEPROM */

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox