Netdev List
* Re: Assertions in latest kernels
From: David Miller @ 2008-01-23 11:06 UTC
  To: ilpo.jarvinen; +Cc: krkumar2, netdev
In-Reply-To: <Pine.LNX.4.64.0801231215250.31652@kivilampi-30.cs.helsinki.fi>

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed, 23 Jan 2008 12:49:31 +0200 (EET)

> On Wed, 23 Jan 2008, Ilpo Järvinen wrote:
> 
> Hmm, perhaps it could be something related to this (and some untested 
> path somewhere which is now exposed):
> 
> commit 4a55b553f691abadaa63570dfc714e20913561c1
> Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Date:   Thu Dec 20 20:36:03 2007 -0800
> 
>     [TCP]: Fix TSO deferring
> 
> Dave, what do you think? Wouldn't explain the one -rc only report though 
> from Denys. Another one I'm a bit unsure of is this:

Right, this would be something to consider for the net-2.6.25
and thus -mm cases, but not for 2.6.24* since this patch didn't
go there.

I don't see how this change could matter offhand.  Even with that
incorrect TSO test, the same set of write queue configurations can
still occur, just somewhat less likely than after the patch.

I would expect the usual global test coverage to narrow that gap.

> commit 757c32944b80fd95542bd66f06032ab773034d53
> Author: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
> Date:   Thu Jan 3 20:39:01 2008 -0800
> 
>     [TCP]: Perform setting of common control fields in one place
> 
> ->sacked field is cleared in tcp_retransmit_skb due to a subtle change, 
> which might be buggy.... However, I find it rather unlikely that this 
> would explain Kumar's case. Anyway, here's the one that reverts the 
> problematic part of it. ...and this is net-2.6.25 as well so it won't 
> solve Denys' case either.

I also suspect this one isn't the cause.

I think we'll be better off once we get some more data,
and therefore be able to make more correlations between
the various failures.

Anyway, your patch here is a good start, as it will provably
eliminate this as a possibility.


* Re: pull request: wireless-2.6 'upstream' 2008-01-22
From: Stefano Brivio @ 2008-01-23 11:15 UTC
  To: John W. Linville
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20080123014521.GG3206-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org>

On Tue, 22 Jan 2008 20:45:21 -0500
"John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org> wrote:

>       b43legacy: Remove the PHY spinlock

I hope you tested this. I still haven't been able to (I received the
needed hardware yesterday), and Michael said that the patch has been
compile-tested only.


-- 
Ciao
Stefano


* Re: pull request: wireless-2.6 'upstream' 2008-01-22
From: Michael Buesch @ 2008-01-23 11:30 UTC
  To: Stefano Brivio; +Cc: John W. Linville, davem, netdev, linux-wireless
In-Reply-To: <20080123121551.4586e706@morte>

On Wednesday 23 January 2008 12:15:51 Stefano Brivio wrote:
> On Tue, 22 Jan 2008 20:45:21 -0500
> "John W. Linville" <linville@tuxdriver.com> wrote:
> 
> >       b43legacy: Remove the PHY spinlock
> 
> I hope you tested this. I still haven't been able to (I received the
> needed hardware yesterday), and Michael said that the patch has been
> compile-tested only.

John, if we use subject lines such as:

[PATCH RFT] foobar: bizbaz

That means the patch is _not_ submitted for inclusion, yet.
RFT means Request-For-Testing. Usually, if I send out such
patches, they are completely untested. I usually compile-test them,
but that's it.
In the future I can also put a comment into the mail body that
explains why it should not be applied yet.

For this particular patch, please leave it in now. I'm pretty
sure it is correct. So actual testing will be done upstream now. ;)

-- 
Greetings Michael.


* Re: 2.6.24-rc8 ppp regression
From: Evgeniy Polyakov @ 2008-01-23 11:33 UTC
  To: maximilian attems; +Cc: netdev
In-Reply-To: <20080123093509.GA4718@stro.at>

On Wed, Jan 23, 2008 at 10:35:09AM +0100, maximilian attems (max@stro.at) wrote:
> Jan 22 23:23:13 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
> Jan 22 23:23:44 dual last message repeated 3 times
> Jan 22 23:23:54 dual kernel: unregister_netdevice: waiting for ppp0 to become free. Usage count = 1
> 
> 2.6.24-rc7 works fine, not yet bisected, will do later in the evening.

Fix (revert) is in Dave's tree already.

-- 
	Evgeniy Polyakov


* Re: Assertions in latest kernels
From: Ilpo Järvinen @ 2008-01-23 11:42 UTC
  To: Krishna Kumar2; +Cc: David Miller, Netdev
In-Reply-To: <OFADDD4CF1.583863C7-ON652573D9.003CD405-652573D9.003CA37C@in.ibm.com>

On Wed, 23 Jan 2008, Krishna Kumar2 wrote:

> While running with this patch, I got these errors (pasted at the end
> of this mail).

I have no clue why it didn't reach the checking function (or why it
didn't print anything) and only hit those WARN_ONs... Hopefully this
gives somewhat better input (it applies on top of the other debug patch).

--
 i.

[PATCH] [TCP]: more debug

---
 include/net/tcp.h    |    3 ++-
 net/ipv4/tcp_input.c |    9 ++++++++-
 net/ipv4/tcp_ipv4.c  |   19 ++++++++++++++-----
 3 files changed, 24 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0685035..129c3b1 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -272,6 +272,7 @@ DECLARE_SNMP_STAT(struct tcp_mib, tcp_statistics);
 #define TCP_ADD_STATS_BH(field, val)	SNMP_ADD_STATS_BH(tcp_statistics, field, val)
 #define TCP_ADD_STATS_USER(field, val)	SNMP_ADD_STATS_USER(tcp_statistics, field, val)
 
+extern void tcp_print_queue(struct sock *sk);
 extern void			tcp_verify_wq(struct sock *sk);
 
 extern void			tcp_v4_err(struct sk_buff *skb, u32);
@@ -772,8 +773,8 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 /* Use define here intentionally to get WARN_ON location shown at the caller */
 #define tcp_verify_left_out(tp)	\
 	do { \
-		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
 		tcp_verify_wq((struct sock *)tp); \
+		WARN_ON(tcp_left_out(tp) > tp->packets_out); \
 	} while(0)
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index cdacf70..295490e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2133,12 +2133,15 @@ static void tcp_verify_retransmit_hint(struct tcp_sock *tp, struct sk_buff *skb)
 static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb;
+	struct sk_buff *skb, *prev = NULL;
 	int cnt;
 
+	tcp_verify_left_out(tp);
+
 	BUG_TRAP(packets <= tp->packets_out);
 	if (tp->lost_skb_hint) {
 		skb = tp->lost_skb_hint;
+		prev = skb;
 		cnt = tp->lost_cnt_hint;
 	} else {
 		skb = tcp_write_queue_head(sk);
@@ -2166,6 +2169,10 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
 			tcp_verify_retransmit_hint(tp, skb);
 		}
 	}
+	if (tcp_left_out(tp) > tp->packets_out) {
+		printk(KERN_ERR "Prev hint: %p, exit %p\n", prev, skb);
+		tcp_print_queue(sk);
+	}
 	tcp_verify_left_out(tp);
 }
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c95682e..c2a88c5 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -117,6 +117,15 @@ void tcp_print_queue(struct sock *sk)
 	int idx = 0;
 	int i;
 
+	i = 0;
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			printk(KERN_ERR "head %u %p\n", i, skb);		
+		else
+			printk(KERN_ERR "skb %u %p\n", i, skb);
+		i++;
+	}
+
 	tcp_for_write_queue(skb, sk) {
 		if (skb == tcp_send_head(sk))
 			break;
@@ -195,11 +204,6 @@ void tcp_verify_wq(struct sock *sk)
 		packets += tcp_skb_pcount(skb);
 	}
 
-	WARN_ON(lost != tp->lost_out);
-	WARN_ON(tcp_is_sack(tp) && (sacked != tp->sacked_out));
-	WARN_ON(packets != tp->packets_out);
-	WARN_ON(fackets != tp->fackets_out);
-
 	if ((lost != tp->lost_out) ||
 	    (tcp_is_sack(tp) && (sacked != tp->sacked_out)) ||
 	    (packets != tp->packets_out) ||
@@ -213,6 +217,11 @@ void tcp_verify_wq(struct sock *sk)
 			tp->rx_opt.sack_ok);
 		tcp_print_queue(sk);
 	}
+
+	WARN_ON(lost != tp->lost_out);
+	WARN_ON(tcp_is_sack(tp) && (sacked != tp->sacked_out));
+	WARN_ON(packets != tp->packets_out);
+	WARN_ON(fackets != tp->fackets_out);
 }
 
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
-- 
1.5.2.2
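
For readers following along, the ordering change above (dump the state
first, warn afterwards) can be modeled in userspace roughly as below.
This is only a sketch under stated assumptions: struct wq_counters and
verify_left_out() are hypothetical stand-ins for the tcp_sock counters
and the tcp_verify_left_out() macro, not kernel code.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for the tcp_sock counters being checked. */
struct wq_counters {
	unsigned int packets_out;
	unsigned int sacked_out;
	unsigned int lost_out;
};

/* Mirrors the idea of tcp_left_out(): segments SACKed or marked lost. */
static unsigned int left_out(const struct wq_counters *c)
{
	return c->sacked_out + c->lost_out;
}

/*
 * Returns 1 when the invariant left_out <= packets_out holds.
 * Otherwise prints the counters first and the warning second, so the
 * state dump precedes the warning in the log, then returns 0.
 */
static int verify_left_out(const struct wq_counters *c)
{
	if (left_out(c) <= c->packets_out)
		return 1;

	fprintf(stderr, "packets_out=%u sacked_out=%u lost_out=%u\n",
		c->packets_out, c->sacked_out, c->lost_out);
	fprintf(stderr, "WARNING: left_out %u > packets_out %u\n",
		left_out(c), c->packets_out);
	return 0;
}
```

With the warning emitted last, a log reader sees the counters that led
to it immediately above the warning rather than after it.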



* [PATCH] Introducing socket mark socket option
From: Laszlo Attila Toth @ 2008-01-23 12:40 UTC
  To: Patrick McHardy
  Cc: Netfilter Developer Mailing List, netdev, linux-arch,
	Laszlo Attila Toth

A userspace program may wish to set the mark for each packet it sends
without using the netfilter MARK target. Changing the mark can be used
for mark-based routing without netfilter, or for packet filtering.

Setting the mark requires the CAP_NET_ADMIN capability.

Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
---
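A minimal userspace sketch of how the option might be used, assuming it
is reached through SOL_SOCKET as the sock.c hunks below suggest. The
helper names set_socket_mark() and get_socket_mark() are hypothetical,
and the fallback value 36 only matches the architectures where this
patch assigns 36 (parisc and sparc use other values).

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#ifndef SO_MARK
#define SO_MARK 36	/* assumption: value used on most architectures in this patch */
#endif

/* Hypothetical helper: set the mark; returns 0 or a negative errno value. */
static int set_socket_mark(int fd, uint32_t mark)
{
	if (setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark)) < 0)
		return -errno;	/* -EPERM is expected without CAP_NET_ADMIN */
	return 0;
}

/* Hypothetical helper: read the mark back; returns 0 or a negative errno value. */
static int get_socket_mark(int fd, uint32_t *mark)
{
	socklen_t len = sizeof(*mark);

	if (getsockopt(fd, SOL_SOCKET, SO_MARK, mark, &len) < 0)
		return -errno;
	return 0;
}
```

A caller would create a socket, call set_socket_mark(fd, value) before
sending, and treat -EPERM as "not privileged enough" rather than as a
fatal error.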
 include/asm-alpha/socket.h    |    2 ++
 include/asm-arm/socket.h      |    2 ++
 include/asm-avr32/socket.h    |    2 ++
 include/asm-blackfin/socket.h |    3 +++
 include/asm-cris/socket.h     |    2 ++
 include/asm-frv/socket.h      |    2 ++
 include/asm-h8300/socket.h    |    2 ++
 include/asm-ia64/socket.h     |    2 ++
 include/asm-m32r/socket.h     |    2 ++
 include/asm-m68k/socket.h     |    2 ++
 include/asm-mips/socket.h     |    2 ++
 include/asm-parisc/socket.h   |    2 ++
 include/asm-powerpc/socket.h  |    2 ++
 include/asm-s390/socket.h     |    2 ++
 include/asm-sh/socket.h       |    2 ++
 include/asm-sparc/socket.h    |    2 ++
 include/asm-sparc64/socket.h  |    1 +
 include/asm-v850/socket.h     |    2 ++
 include/asm-x86/socket.h      |    2 ++
 include/asm-xtensa/socket.h   |    2 ++
 include/net/route.h           |    2 ++
 include/net/sock.h            |    2 ++
 net/core/sock.c               |   11 +++++++++++
 net/ipv4/icmp.c               |    1 +
 net/ipv4/ip_output.c          |    3 +++
 net/ipv4/raw.c                |    2 ++
 26 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/include/asm-alpha/socket.h b/include/asm-alpha/socket.h
index 1fede7f..08c9793 100644
--- a/include/asm-alpha/socket.h
+++ b/include/asm-alpha/socket.h
@@ -60,4 +60,6 @@
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	20
 #define SO_SECURITY_ENCRYPTION_NETWORK		21
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-arm/socket.h b/include/asm-arm/socket.h
index 65a1a64..6817be9 100644
--- a/include/asm-arm/socket.h
+++ b/include/asm-arm/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-avr32/socket.h b/include/asm-avr32/socket.h
index a0d0507..35863f2 100644
--- a/include/asm-avr32/socket.h
+++ b/include/asm-avr32/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* __ASM_AVR32_SOCKET_H */
diff --git a/include/asm-blackfin/socket.h b/include/asm-blackfin/socket.h
index 5213c96..2ca702e 100644
--- a/include/asm-blackfin/socket.h
+++ b/include/asm-blackfin/socket.h
@@ -50,4 +50,7 @@
 #define SO_PASSSEC		34
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
+
+#define SO_MARK			36
+
 #endif				/* _ASM_SOCKET_H */
diff --git a/include/asm-cris/socket.h b/include/asm-cris/socket.h
index 5b18dfd..9df0ca8 100644
--- a/include/asm-cris/socket.h
+++ b/include/asm-cris/socket.h
@@ -54,6 +54,8 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
 
 
diff --git a/include/asm-frv/socket.h b/include/asm-frv/socket.h
index a823bef..e51ca67 100644
--- a/include/asm-frv/socket.h
+++ b/include/asm-frv/socket.h
@@ -52,5 +52,7 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/include/asm-h8300/socket.h b/include/asm-h8300/socket.h
index 39911d8..da2520d 100644
--- a/include/asm-h8300/socket.h
+++ b/include/asm-h8300/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-ia64/socket.h b/include/asm-ia64/socket.h
index 9e42ce4..d5ef0aa 100644
--- a/include/asm-ia64/socket.h
+++ b/include/asm-ia64/socket.h
@@ -61,4 +61,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/include/asm-m32r/socket.h b/include/asm-m32r/socket.h
index 793d5d3..9a0e200 100644
--- a/include/asm-m32r/socket.h
+++ b/include/asm-m32r/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/include/asm-m68k/socket.h b/include/asm-m68k/socket.h
index 6d21b90..dbc64e9 100644
--- a/include/asm-m68k/socket.h
+++ b/include/asm-m68k/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-mips/socket.h b/include/asm-mips/socket.h
index 9594568..63f6025 100644
--- a/include/asm-mips/socket.h
+++ b/include/asm-mips/socket.h
@@ -73,6 +73,8 @@ To add: #define SO_REUSEPORT 0x0200	/* Allow local address and port reuse.  */
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #ifdef __KERNEL__
 
 /** sock_type - Socket types
diff --git a/include/asm-parisc/socket.h b/include/asm-parisc/socket.h
index 99e868f..69a7a0d 100644
--- a/include/asm-parisc/socket.h
+++ b/include/asm-parisc/socket.h
@@ -52,4 +52,6 @@
 #define SO_PEERSEC		0x401d
 #define SO_PASSSEC		0x401e
 
+#define SO_MARK			0x401f
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-powerpc/socket.h b/include/asm-powerpc/socket.h
index 403e9fd..f5a4e16 100644
--- a/include/asm-powerpc/socket.h
+++ b/include/asm-powerpc/socket.h
@@ -59,4 +59,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/include/asm-s390/socket.h b/include/asm-s390/socket.h
index 1161ebe..c786ab6 100644
--- a/include/asm-s390/socket.h
+++ b/include/asm-s390/socket.h
@@ -60,4 +60,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-sh/socket.h b/include/asm-sh/socket.h
index c48d6fc..6d4bf65 100644
--- a/include/asm-sh/socket.h
+++ b/include/asm-sh/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* __ASM_SH_SOCKET_H */
diff --git a/include/asm-sparc/socket.h b/include/asm-sparc/socket.h
index 7c14239..2e2bd0b 100644
--- a/include/asm-sparc/socket.h
+++ b/include/asm-sparc/socket.h
@@ -52,6 +52,8 @@
 #define SO_TIMESTAMPNS		0x0021
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			0x0022
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/include/asm-sparc64/socket.h b/include/asm-sparc64/socket.h
index 986441d..44a625a 100644
--- a/include/asm-sparc64/socket.h
+++ b/include/asm-sparc64/socket.h
@@ -57,4 +57,5 @@
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
 #define SO_SECURITY_ENCRYPTION_NETWORK		0x5004
 
+#define SO_MARK			0x0022
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-v850/socket.h b/include/asm-v850/socket.h
index a4c2493..e199a2b 100644
--- a/include/asm-v850/socket.h
+++ b/include/asm-v850/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* __V850_SOCKET_H__ */
diff --git a/include/asm-x86/socket.h b/include/asm-x86/socket.h
index 99ca648..80af9c4 100644
--- a/include/asm-x86/socket.h
+++ b/include/asm-x86/socket.h
@@ -52,4 +52,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif /* _ASM_SOCKET_H */
diff --git a/include/asm-xtensa/socket.h b/include/asm-xtensa/socket.h
index 1f5aeac..6100682 100644
--- a/include/asm-xtensa/socket.h
+++ b/include/asm-xtensa/socket.h
@@ -63,4 +63,6 @@
 #define SO_TIMESTAMPNS		35
 #define SCM_TIMESTAMPNS		SO_TIMESTAMPNS
 
+#define SO_MARK			36
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/net/route.h b/include/net/route.h
index 5847e6f..326c499 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -27,6 +27,7 @@
 #include <net/dst.h>
 #include <net/inetpeer.h>
 #include <net/flow.h>
+#include <net/sock.h>
 #include <linux/in_route.h>
 #include <linux/rtnetlink.h>
 #include <linux/route.h>
@@ -148,6 +149,7 @@ static inline int ip_route_connect(struct rtable **rp, __be32 dst,
 				   int flags)
 {
 	struct flowi fl = { .oif = oif,
+			    .mark = sk->sk_mark,
 			    .nl_u = { .ip4_u = { .daddr = dst,
 						 .saddr = src,
 						 .tos   = tos } },
diff --git a/include/net/sock.h b/include/net/sock.h
index 9023244..e3fb4c0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -262,6 +262,8 @@ struct sock {
 	__u32			sk_sndmsg_off;
 	int			sk_write_pending;
 	void			*sk_security;
+	__u32			sk_mark;
+	/* XXX 4 bytes hole on 64 bit */
 	void			(*sk_state_change)(struct sock *sk);
 	void			(*sk_data_ready)(struct sock *sk, int bytes);
 	void			(*sk_write_space)(struct sock *sk);
diff --git a/net/core/sock.c b/net/core/sock.c
index 1c4b1cd..433715f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -667,6 +667,13 @@ set_rcvbuf:
 		else
 			clear_bit(SOCK_PASSSEC, &sock->flags);
 		break;
+	case SO_MARK:
+		if (!capable(CAP_NET_ADMIN))
+			ret = -EPERM;
+		else {
+			sk->sk_mark = val;
+		}
+		break;
 
 		/* We implement the SO_SNDLOWAT etc to
 		   not be settable (1003.1g 5.3) */
@@ -836,6 +843,10 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 	case SO_PEERSEC:
 		return security_socket_getpeersec_stream(sock, optval, optlen, len);
 
+	case SO_MARK:
+		v.val = sk->sk_mark;
+		break;
+
 	default:
 		return -ENOPROTOOPT;
 	}
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 1dbe89c..d25f66a 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -403,6 +403,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
 					      { .daddr = daddr,
 						.saddr = rt->rt_spec_dst,
 						.tos = RT_TOS(ip_hdr(skb)->tos) } },
+				    .mark = sk->sk_mark,
 				    .proto = IPPROTO_ICMP };
 		security_skb_classify_flow(skb, &fl);
 		if (ip_route_output_key(&rt, &fl))
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e57de0f..299cefa 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -168,6 +168,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
 	}
 
 	skb->priority = sk->sk_priority;
+	skb->mark = sk->sk_mark;
 
 	/* Send it out. */
 	return ip_local_out(skb);
@@ -385,6 +386,7 @@ packet_routed:
 			     (skb_shinfo(skb)->gso_segs ?: 1) - 1);
 
 	skb->priority = sk->sk_priority;
+	skb->mark = sk->sk_mark;
 
 	return ip_local_out(skb);
 
@@ -1282,6 +1284,7 @@ int ip_push_pending_frames(struct sock *sk)
 	iph->daddr = rt->rt_dst;
 
 	skb->priority = sk->sk_priority;
+	skb->mark = sk->sk_mark;
 	skb->dst = dst_clone(&rt->u.dst);
 
 	if (iph->protocol == IPPROTO_ICMP)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 91a5218..a50e657 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -352,6 +352,7 @@ static int raw_send_hdrinc(struct sock *sk, void *from, size_t length,
 	skb_reserve(skb, hh_len);
 
 	skb->priority = sk->sk_priority;
+	skb->mark = sk->sk_mark;
 	skb->dst = dst_clone(&rt->u.dst);
 
 	skb_reset_network_header(skb);
@@ -544,6 +545,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	{
 		struct flowi fl = { .oif = ipc.oif,
+				    .mark = sk->sk_mark,
 				    .nl_u = { .ip4_u =
 					      { .daddr = daddr,
 						.saddr = saddr,
-- 
1.5.2.5



* Re: pull request: wireless-2.6 'upstream' 2008-01-22
From: John W. Linville @ 2008-01-23 12:46 UTC
  To: Michael Buesch
  Cc: Stefano Brivio, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <200801231230.07734.mb-fseUSCV1ubazQB+pC5nmwQ@public.gmane.org>

On Wed, Jan 23, 2008 at 12:30:07PM +0100, Michael Buesch wrote:
> On Wednesday 23 January 2008 12:15:51 Stefano Brivio wrote:
> > On Tue, 22 Jan 2008 20:45:21 -0500
> > "John W. Linville" <linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org> wrote:
> > 
> > >       b43legacy: Remove the PHY spinlock
> > 
> > I hope you tested this. I still haven't been able to (I received the
> > needed hardware yesterday), and Michael said that the patch has been
> > compile-tested only.
> 
> John, if we use subject lines such as:
> 
> [PATCH RFT] foobar: bizbaz
> 
> That means the patch is _not_ submitted for inclusion, yet.
> The RFT means Request-For-Testing. Usually, if I send out such

Thanks, I'm well aware.

Larry said he tested it, two weeks ago.  No one contradicted it.
And time is short for new development for 2.6.25 -- but there is
plenty of time for fixes. :-)

> For this particular patch, please leave it in now. I'm pretty
> sure it is correct. So actual testing will be done upstream now. ;)

Exactly.

John
-- 
John W. Linville
linville-2XuSBdqkA4R54TAoqtyWWQ@public.gmane.org


* Re: pull request: wireless-2.6 'upstream' 2008-01-22
From: John W. Linville @ 2008-01-23 12:41 UTC
  To: Stefano Brivio; +Cc: davem, netdev, linux-wireless
In-Reply-To: <20080123121551.4586e706@morte>

On Wed, Jan 23, 2008 at 12:15:51PM +0100, Stefano Brivio wrote:
> On Tue, 22 Jan 2008 20:45:21 -0500
> "John W. Linville" <linville@tuxdriver.com> wrote:
> 
> >       b43legacy: Remove the PHY spinlock
> 
> I hope you tested this. I still haven't been able to (I received the
> needed hardware yesterday), and Michael said that the patch has been
> compile-tested only.

Larry said it worked for him.

-- 
John W. Linville
linville@tuxdriver.com


* Re: [PATCH 2.6.23+] ingress classify to [nf]mark
From: Dzianis Kahanovich @ 2008-01-23 16:42 UTC
  To: netdev
In-Reply-To: <1200487509.4457.33.camel@localhost>

Too many pixels to smoke. Sorry.

Maybe like this? ;)) (This assumes tc_classify() does not overwrite an
undefined classid with a random value.)
Even "tc" prints "????" for classid=0.

--- 1/net/sched/sch_ingress.c	2008-01-12 17:27:05.000000000 +0200
+++ 2/net/sched/sch_ingress.c	2008-01-22 22:09:32.000000000 +0200
@@ -136,6 +136,9 @@
  	struct ingress_qdisc_data *p = PRIV(sch);
  	struct tcf_result res;
  	int result;
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+	res.classid=0;
+#endif

  	D2PRINTK("ingress_enqueue(skb %p,sch %p,[qdisc %p])\n", skb, sch, p);
  	result = tc_classify(skb, p->filter_list, &res);
@@ -169,6 +172,11 @@
  	sch->bstats.packets++;
  	sch->bstats.bytes += skb->len;
  #endif
+#ifdef CONFIG_NET_SCH_INGRESS_TC2MARK
+	if(res.classid)
+	    skb->mark = (skb->mark&(res.classid>>16))|(skb->tc_index=TC_H_MIN(res.classid));
+//	    skb->mark=res.classid; /* or just so */
+#endif

  	return result;
  }
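
To make the bit arithmetic in the second hunk easier to follow, here is
a small standalone sketch of the same expression. It is an illustration
only: ingress_mark() is a hypothetical name, and TC_H_MIN is redefined
locally (low 16 bits of the handle) so the example builds outside the
kernel tree.

```c
#include <stdint.h>

#ifndef TC_H_MIN
#define TC_H_MIN(h) ((h) & 0x0000FFFFU)	/* local copy: minor part of a handle */
#endif

/*
 * Hypothetical helper modeling the patch's expression:
 * the major part of classid (upper 16 bits) acts as a mask on the
 * existing mark, and the minor part (lower 16 bits) is OR-ed in.
 * A classid of 0 leaves the mark untouched.
 */
static uint32_t ingress_mark(uint32_t mark, uint32_t classid)
{
	if (!classid)
		return mark;
	return (mark & (classid >> 16)) | TC_H_MIN(classid);
}
```

For example, with mark 0xABCD1234 and classid 0x00FF0005 the mask is
0x00FF, so the result is (0x34 | 0x05) = 0x35. Note that, as written,
the upper 16 bits of the existing mark are always cleared whenever
classid is non-zero, because the mask itself is at most 0xFFFF.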



jamal wrote:

[skipped]

-- 
WBR,
Denis Kaganovich,  mahatma@eu.by  http://mahatma.bspu.unibel.by



* Re: [Cbe-oss-dev] [PATCH 1/5] spidernet: add missing initialization
From: Jens Osterkamp @ 2008-01-23 13:53 UTC
  To: cbe-oss-dev; +Cc: Ishizaki Kou, netdev
In-Reply-To: <20080111.153859.-1300526764.kouish@swc.toshiba.co.jp>

On Friday 11 January 2008, Ishizaki Kou wrote:
> This patch fixes initialization of the "aneg_count" and "medium" fields in
> spider_net_card so that the spidernet driver correctly sets "link status".
> 
> Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>

Acked-by: Jens Osterkamp <jens@de.ibm.com>

> ---
> 
> Index: linux-powerpc-git/drivers/net/spider_net.c
> ===================================================================
> --- linux-powerpc-git.orig/drivers/net/spider_net.c
> +++ linux-powerpc-git/drivers/net/spider_net.c
> @@ -1399,6 +1399,8 @@ spider_net_link_reset(struct net_device 
>  	spider_net_write_reg(card, SPIDER_NET_GMACINTEN, 0);
> 
>  	/* reset phy and setup aneg */
> +	card->aneg_count = 0;
> +	card->medium = BCM54XX_COPPER;
>  	spider_net_setup_aneg(card);
>  	mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
> 
> @@ -1982,6 +1984,8 @@ spider_net_open(struct net_device *netde
>  		goto init_firmware_failed;
> 
>  	/* start probing with copper */
> +	card->aneg_count = 0;
> +	card->medium = BCM54XX_COPPER;
>  	spider_net_setup_aneg(card);
>  	if (card->phy.def->phy_id)
>  		mod_timer(&card->aneg_timer, jiffies + SPIDER_NET_ANEG_TIMER);
> _______________________________________________
> cbe-oss-dev mailing list
> cbe-oss-dev@ozlabs.org
> https://ozlabs.org/mailman/listinfo/cbe-oss-dev
>
-- 

Gruß,

Jens

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


* Re: [Cbe-oss-dev] [PATCH 2/5] spidernet: increase auto-negotiation timeout to 5 seconds
From: Jens Osterkamp @ 2008-01-23 13:54 UTC
  To: cbe-oss-dev; +Cc: Ishizaki Kou, netdev
In-Reply-To: <20080111.154125.-1350516935.kouish@swc.toshiba.co.jp>

On Friday 11 January 2008, Ishizaki Kou wrote:
> This patch extends the timeout for spidernet auto-negotiation.
> Auto-negotiation often fails to finish in 2 seconds.
> 
> Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>

Acked-by: Jens Osterkamp <jens@de.ibm.com>

> ---
> 
> Index: linux-powerpc-git/drivers/net/spider_net.h
> ===================================================================
> --- linux-powerpc-git.orig/drivers/net/spider_net.h
> +++ linux-powerpc-git/drivers/net/spider_net.h
> @@ -52,7 +52,7 @@ extern char spider_net_driver_name[];
> 
>  #define SPIDER_NET_TX_TIMER			(HZ/5)
>  #define SPIDER_NET_ANEG_TIMER			(HZ)
> -#define SPIDER_NET_ANEG_TIMEOUT			2
> +#define SPIDER_NET_ANEG_TIMEOUT			5
> 
>  #define SPIDER_NET_RX_CSUM_DEFAULT		1
> 



-- 

Gruß,

Jens

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


* Re: [Cbe-oss-dev] [PATCH 3/5] spidernet: change interrupt masks
From: Jens Osterkamp @ 2008-01-23 13:55 UTC
  To: cbe-oss-dev; +Cc: Ishizaki Kou, netdev
In-Reply-To: <20080111.154343.-1625861724.kouish@swc.toshiba.co.jp>

On Friday 11 January 2008, Ishizaki Kou wrote:
> This patch changes spidernet interrupt masks.
> 
>  - unmask GDAINVAINT. The spidernet interrupt handler has an operation
>    to perform for it.
>  - mask some interrupts. The interrupt handler performs no operations for them.
> 
> Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>

Acked-by: Jens Osterkamp <jens@de.ibm.com>

> ---
> 
> Index: linux-powerpc-git/drivers/net/spider_net.h
> ===================================================================
> --- linux-powerpc-git.orig/drivers/net/spider_net.h
> +++ linux-powerpc-git/drivers/net/spider_net.h
> @@ -159,9 +159,8 @@ extern char spider_net_driver_name[];
> 
>  /** interrupt mask registers */
>  #define SPIDER_NET_INT0_MASK_VALUE	0x3f7fe2c7
> -#define SPIDER_NET_INT1_MASK_VALUE	0xffff7ff7
> -/* no MAC aborts -> auto retransmission */
> -#define SPIDER_NET_INT2_MASK_VALUE	0xffef7ff1
> +#define SPIDER_NET_INT1_MASK_VALUE	0x0000fff2
> +#define SPIDER_NET_INT2_MASK_VALUE	0x000003f1
> 
>  /* we rely on flagged descriptor interrupts */
>  #define SPIDER_NET_FRAMENUM_VALUE	0x00000000



-- 

Gruß,

Jens

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


* Re: [Cbe-oss-dev] [PATCH 4/5] spidernet: fix error interrupt handling
From: Jens Osterkamp @ 2008-01-23 13:55 UTC
  To: cbe-oss-dev; +Cc: Ishizaki Kou, linas, netdev
In-Reply-To: <20080111.154614.-345477188.kouish@swc.toshiba.co.jp>

On Friday 11 January 2008, Ishizaki Kou wrote:
> In addition to the value of GHIINT0STS, spidernet interrupt handler
> should check the values of GHIINT1STS/GHIINT2STS registers at the
> beginning of spider_net_interrupt() so as not to drop error
> interrupts.
> 
> The GHIINT1STS/GHIINT2STS registers indicate some erroneous conditions
> in spidernet, and a few bits of the GHIINT0STS register reflect these
> conditions. But GHIINT0MSK masks these bits, so you should check these
> conditions by reading the GHIINT1STS/GHIINT2STS registers directly.
> 
> Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>

Acked-by: Jens Osterkamp <jens@de.ibm.com>

> 
> Index: linux-powerpc-git/drivers/net/spider_net.c
> ===================================================================
> --- linux-powerpc-git.orig/drivers/net/spider_net.c
> +++ linux-powerpc-git/drivers/net/spider_net.c
> @@ -1415,18 +1415,12 @@ spider_net_link_reset(struct net_device 
>   * found when an interrupt is presented
>   */
>  static void
> -spider_net_handle_error_irq(struct spider_net_card *card, u32 status_reg)
> +spider_net_handle_error_irq(struct spider_net_card *card, u32 status_reg,
> +			    u32 error_reg1, u32 error_reg2)
>  {
> -	u32 error_reg1, error_reg2;
>  	u32 i;
>  	int show_error = 1;
> 
> -	error_reg1 = spider_net_read_reg(card, SPIDER_NET_GHIINT1STS);
> -	error_reg2 = spider_net_read_reg(card, SPIDER_NET_GHIINT2STS);
> -
> -	error_reg1 &= SPIDER_NET_INT1_MASK_VALUE;
> -	error_reg2 &= SPIDER_NET_INT2_MASK_VALUE;
> -
>  	/* check GHIINT0STS ************************************/
>  	if (status_reg)
>  		for (i = 0; i < 32; i++)
> @@ -1656,12 +1650,15 @@ spider_net_interrupt(int irq, void *ptr)
>  {
>  	struct net_device *netdev = ptr;
>  	struct spider_net_card *card = netdev_priv(netdev);
> -	u32 status_reg;
> +	u32 status_reg, error_reg1, error_reg2;
> 
>  	status_reg = spider_net_read_reg(card, SPIDER_NET_GHIINT0STS);
> -	status_reg &= SPIDER_NET_INT0_MASK_VALUE;
> +	error_reg1 = spider_net_read_reg(card, SPIDER_NET_GHIINT1STS);
> +	error_reg2 = spider_net_read_reg(card, SPIDER_NET_GHIINT2STS);
> 
> -	if (!status_reg)
> +	if (!(status_reg & SPIDER_NET_INT0_MASK_VALUE) &&
> +	    !(error_reg1 & SPIDER_NET_INT1_MASK_VALUE) &&
> +	    !(error_reg2 & SPIDER_NET_INT2_MASK_VALUE))
>  		return IRQ_NONE;
> 
>  	if (status_reg & SPIDER_NET_RXINT ) {
> @@ -1676,7 +1673,8 @@ spider_net_interrupt(int irq, void *ptr)
>  		spider_net_link_reset(netdev);
> 
>  	if (status_reg & SPIDER_NET_ERRINT )
> -		spider_net_handle_error_irq(card, status_reg);
> +		spider_net_handle_error_irq(card, status_reg,
> +					    error_reg1, error_reg2);
> 
>  	/* clear interrupt sources */
>  	spider_net_write_reg(card, SPIDER_NET_GHIINT0STS, status_reg);
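
The effect of the changed early-return check in the hunk above can be
sketched in isolation as follows. This is a simplified model, not driver
code: irq_is_ours_old() and irq_is_ours_new() are hypothetical names,
and the mask values are assumed to be the ones introduced by patch 3/5
of this series.

```c
#include <stdint.h>

/* Assumption: mask values as set by patch 3/5 of this series. */
#define INT0_MASK 0x3f7fe2c7U
#define INT1_MASK 0x0000fff2U
#define INT2_MASK 0x000003f1U

/* Old check: only GHIINT0STS decides whether the interrupt is handled. */
static int irq_is_ours_old(uint32_t status_reg)
{
	return (status_reg & INT0_MASK) != 0;
}

/* New check: any unmasked bit in GHIINT0STS/1STS/2STS counts. */
static int irq_is_ours_new(uint32_t status_reg, uint32_t error_reg1,
			   uint32_t error_reg2)
{
	return (status_reg & INT0_MASK) ||
	       (error_reg1 & INT1_MASK) ||
	       (error_reg2 & INT2_MASK);
}
```

With status_reg == 0 and only an unmasked bit set in error_reg1, the old
check would return IRQ_NONE and drop the event, while the new check
keeps it.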



-- 

Gruß,

Jens

IBM Deutschland Entwicklung GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschäftsführung: Herbert Kircher 
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


* Re: [Cbe-oss-dev] [PATCH 5/5] spidernet: revise link status logging
From: Jens Osterkamp @ 2008-01-23 13:55 UTC (permalink / raw)
  To: cbe-oss-dev; +Cc: Ishizaki Kou, netdev
In-Reply-To: <20080111.154847.-494080281.kouish@swc.toshiba.co.jp>

On Friday 11 January 2008, Ishizaki Kou wrote:
> This patch revises the logging of link information in spidernet.
> 
>   - The link down message is too verbose because auto-negotiation timeout
>     occurs periodically while an ethernet cable is not connected. 
>   - We want to see the link result, and we think it should be displayed. 
> 
> Signed-off-by: Kou Ishizaki <kou.ishizaki@toshiba.co.jp>

Acked-by: Jens Osterkamp <jens@de.ibm.com>

> ---
> 
> Index: linux-powerpc-git/drivers/net/spider_net.c
> ===================================================================
> --- linux-powerpc-git.orig/drivers/net/spider_net.c
> +++ linux-powerpc-git/drivers/net/spider_net.c
> @@ -2045,7 +2045,8 @@ static void spider_net_link_phy(unsigned
>  	/* if link didn't come up after SPIDER_NET_ANEG_TIMEOUT tries, setup phy again */
>  	if (card->aneg_count > SPIDER_NET_ANEG_TIMEOUT) {
> 
> -		pr_info("%s: link is down trying to bring it up\n", card->netdev->name);
> +		pr_debug("%s: link is down trying to bring it up\n",
> +			 card->netdev->name);
> 
>  		switch (card->medium) {
>  		case BCM54XX_COPPER:
> @@ -2096,9 +2097,10 @@ static void spider_net_link_phy(unsigned
> 
>  	card->aneg_count = 0;
> 
> -	pr_debug("Found %s with %i Mbps, %s-duplex %sautoneg.\n",
> -		phy->def->name, phy->speed, phy->duplex==1 ? "Full" : "Half",
> -		phy->autoneg==1 ? "" : "no ");
> +	pr_info("%s: link up, %i Mbps, %s-duplex %sautoneg.\n",
> +		card->netdev->name, phy->speed,
> +		phy->duplex == 1 ? "Full" : "Half",
> +		phy->autoneg == 1 ? "" : "no ");
> 
>  	return;
>  }



-- 

Regards,

Jens

^ permalink raw reply

* [IPV4 0/9] TRIE performance patches
From: Robert Olsson @ 2008-01-23 14:06 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, netdev
In-Reply-To: <20080122233733.404145234@linux-foundation.org>


Stephen Hemminger writes:

 > Time to handle a full BGP load (163K of routes).
 > 
 > Before:		Load		Dump		Flush
 >
 > kmem_cache	3.8		13.0		7.2
 > iter		3.9		12.3		6.9
 > unordered	3.1		11.9		4.9
 > find_node	3.1		 0.3		1.2

 I certainly like the speed but what will we break when
 we don't return in longest prefix order?

labb:/# ip r
default via 10.10.10.1 dev eth0 
5.0.0.0/8 via 192.168.2.2 dev eth3 
10.10.10.0/24 dev eth0  proto kernel  scope link  src 10.10.10.2 
10.10.11.0/24 dev eth1  proto kernel  scope link  src 10.10.11.1 
11.0.0.0/8 via 10.10.11.2 dev eth1 
192.168.1.0/24 dev eth2  proto kernel  scope link  src 192.168.1.2 
192.168.2.0/24 dev eth3  proto kernel  scope link  src 192.168.2.1 

labb:/# ip route list match 10.10.10.1
default via 10.10.10.1 dev eth0 
10.10.10.0/24 dev eth0  proto kernel  scope link  src 10.10.10.2 
labb:/# 

Maybe the unordered dump can be ordered cheaply...
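
For illustration only, here is a user-space sketch of what ordering a dump
by longest prefix first could look like. It is not the fib_trie code:
struct route_ent, route_cmp and sort_routes are made-up names, and a flat
array of already-dumped routes is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat view of one dumped route; not a kernel structure. */
struct route_ent {
	uint32_t prefix;	/* network address, host byte order */
	uint8_t  plen;		/* prefix length, 0..32 */
};

/* Longest prefix first; ties are broken by numerical address. */
static int route_cmp(const void *a, const void *b)
{
	const struct route_ent *x = a, *y = b;

	if (x->plen != y->plen)
		return (x->plen < y->plen) ? 1 : -1;
	if (x->prefix != y->prefix)
		return (x->prefix < y->prefix) ? -1 : 1;
	return 0;
}

/* Sort a dumped table in place. */
static void sort_routes(struct route_ent *tab, size_t n)
{
	assert(tab != NULL || n == 0);
	if (n > 1)
		qsort(tab, n, sizeof(*tab), route_cmp);
}
```

Sorting this way costs O(n log n) per dump, so whether it counts as cheap
for 163K routes depends on how often dumps are requested.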

Cheers.
				--ro


^ permalink raw reply

* Re: [PATCH] Introducing socket mark socket option
From: Patrick McHardy @ 2008-01-23 14:19 UTC (permalink / raw)
  To: Laszlo Attila Toth; +Cc: Netfilter Developer Mailing List, netdev, linux-arch
In-Reply-To: <12010920051270-git-send-email-panther@balabit.hu>

Laszlo Attila Toth wrote:
> A userspace program may wish to set the mark for each packet it sends
> without using the netfilter MARK target. Changing the mark can be used
> for mark-based routing without netfilter, or for packet filtering.
> 
> It requires CAP_NET_ADMIN capability.
> 
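
For readers following along, a minimal user-space sketch of setting a mark
on a socket. It assumes the option is exposed as SO_MARK at the SOL_SOCKET
level, as in current Linux headers; the fallback value 36 and the helper
name set_socket_mark are assumptions made for this example.

```c
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_MARK
#define SO_MARK 36	/* assumed fallback; the value differs on some architectures */
#endif

/*
 * Ask the kernel to attach 'mark' to every packet sent on fd.
 * Returns 0 on success, or -1 with errno set; without CAP_NET_ADMIN
 * the call is expected to fail with EPERM.
 */
static int set_socket_mark(int fd, unsigned int mark)
{
	return setsockopt(fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
}
```

A caller would create the socket as usual, call set_socket_mark(fd, 42),
and then match on mark 42 in a routing rule.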

> @@ -403,6 +403,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
>  					      { .daddr = daddr,
>  						.saddr = rt->rt_spec_dst,
>  						.tos = RT_TOS(ip_hdr(skb)->tos) } },
> +				    .mark = sk->sk_mark,

This is useless, the icmp socket is not visible to userspace.

> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
 > ...

What about IPv6?

^ permalink raw reply

* [PATCH 1/3] Cleanup and simplify virtnet header
From: Rusty Russell @ 2008-01-23 14:07 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, virtualization

1) Turn GSO on virtio net into an all-or-nothing (keep checksumming
   separate).  Having multiple bits is a pain: if you can't support something
   you should handle it in software, which is still a performance win.

2) Make VIRTIO_NET_HDR_GSO_ECN a flag in the header, so it can apply to
   IPv6 or v4.

3) Rename VIRTIO_NET_F_NO_CSUM to VIRTIO_NET_F_CSUM (ie. means we do
   checksumming).

4) Add csum and gso params to virtio_net to allow more testing.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/net/virtio_net.c   |   32 ++++++++++++++++----------------
 include/linux/virtio_net.h |   12 ++++--------
 2 files changed, 20 insertions(+), 24 deletions(-)

diff -r 4fb788b18cf8 drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c	Wed Jan 23 13:07:59 2008 +1100
+++ b/drivers/net/virtio_net.c	Wed Jan 23 18:46:05 2008 +1100
@@ -26,6 +26,10 @@
 
 static int napi_weight = 128;
 module_param(napi_weight, int, 0444);
+
+static int csum = 1, gso = 1;
+module_param(csum, int, 0444);
+module_param(gso, int, 0444);
 
 MODULE_LICENSE("GPL");
 
@@ -95,12 +99,9 @@ static void receive_skb(struct net_devic
 
 	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
 		pr_debug("GSO!\n");
-		switch (hdr->gso_type) {
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 			skb_shinfo(skb)->gso_type = SKB_GSO_TCPV4;
-			break;
-		case VIRTIO_NET_HDR_GSO_TCPV4_ECN:
-			skb_shinfo(skb)->gso_type = SKB_GSO_TCP_ECN;
 			break;
 		case VIRTIO_NET_HDR_GSO_UDP:
 			skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
@@ -114,6 +115,9 @@ static void receive_skb(struct net_devic
 				       dev->name, hdr->gso_type);
 			goto frame_err;
 		}
+
+		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
+			skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
 
 		skb_shinfo(skb)->gso_size = hdr->gso_size;
 		if (skb_shinfo(skb)->gso_size == 0) {
@@ -249,9 +253,7 @@ static int start_xmit(struct sk_buff *sk
 	if (skb_is_gso(skb)) {
 		hdr->hdr_len = skb_transport_header(skb) - skb->data;
 		hdr->gso_size = skb_shinfo(skb)->gso_size;
-		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
-			hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4_ECN;
-		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV4)
 			hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
 		else if (skb_shinfo(skb)->gso_type & SKB_GSO_TCPV6)
 			hdr->gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
@@ -259,6 +261,8 @@ static int start_xmit(struct sk_buff *sk
 			hdr->gso_type = VIRTIO_NET_HDR_GSO_UDP;
 		else
 			BUG();
+		if (skb_shinfo(skb)->gso_type & SKB_GSO_TCP_ECN)
+			hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
 	} else {
 		hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 		hdr->gso_size = hdr->hdr_len = 0;
@@ -354,17 +358,13 @@ static int virtnet_probe(struct virtio_d
 	SET_NETDEV_DEV(dev, &vdev->dev);
 
 	/* Do we support "hardware" checksums? */
-	if (vdev->config->feature(vdev, VIRTIO_NET_F_NO_CSUM)) {
+	if (csum && vdev->config->feature(vdev, VIRTIO_NET_F_CSUM)) {
 		/* This opens up the world of extra features. */
 		dev->features |= NETIF_F_HW_CSUM|NETIF_F_SG|NETIF_F_FRAGLIST;
-		if (vdev->config->feature(vdev, VIRTIO_NET_F_TSO4))
-			dev->features |= NETIF_F_TSO;
-		if (vdev->config->feature(vdev, VIRTIO_NET_F_UFO))
-			dev->features |= NETIF_F_UFO;
-		if (vdev->config->feature(vdev, VIRTIO_NET_F_TSO4_ECN))
-			dev->features |= NETIF_F_TSO_ECN;
-		if (vdev->config->feature(vdev, VIRTIO_NET_F_TSO6))
-			dev->features |= NETIF_F_TSO6;
+		if (gso && vdev->config->feature(vdev, VIRTIO_NET_F_GSO)) {
+			dev->features |= NETIF_F_TSO | NETIF_F_UFO
+				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
+		}
 	}
 
 	/* Configuration may specify what MAC to use.  Otherwise random. */
diff -r 4fb788b18cf8 include/linux/virtio_net.h
--- a/include/linux/virtio_net.h	Wed Jan 23 13:07:59 2008 +1100
+++ b/include/linux/virtio_net.h	Wed Jan 23 18:46:05 2008 +1100
@@ -6,12 +6,9 @@
 #define VIRTIO_ID_NET	1
 
 /* The feature bitmap for virtio net */
-#define VIRTIO_NET_F_NO_CSUM	0
-#define VIRTIO_NET_F_TSO4	1
-#define VIRTIO_NET_F_UFO	2
-#define VIRTIO_NET_F_TSO4_ECN	3
-#define VIRTIO_NET_F_TSO6	4
-#define VIRTIO_NET_F_MAC	5
+#define VIRTIO_NET_F_CSUM	0	/* Can handle pkts w/ partial csum */
+#define VIRTIO_NET_F_MAC	5	/* Host has given MAC address. */
+#define VIRTIO_NET_F_GSO	6	/* Can handle pkts w/ any GSO type */
 
 struct virtio_net_config
 {
@@ -27,10 +24,9 @@ struct virtio_net_hdr
 	__u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE		0	// Not a GSO frame
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	// GSO frame, IPv4 TCP (TSO)
-/* FIXME: Do we need this?  If they said they can handle ECN, do they care? */
-#define VIRTIO_NET_HDR_GSO_TCPV4_ECN	2	// GSO frame, IPv4 TCP w/ ECN
 #define VIRTIO_NET_HDR_GSO_UDP		3	// GSO frame, IPv4 UDP (UFO)
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	// GSO frame, IPv6 TCP
+#define VIRTIO_NET_HDR_GSO_ECN		0x80	// TCP has ECN set
 	__u8 gso_type;
 	__u16 hdr_len;		/* Ethernet + IP + tcp/udp hdrs */
 	__u16 gso_size;		/* Bytes to append to gso_hdr_len per frame */

^ permalink raw reply

* [PATCH 2/3] partial checksum and GSO support for tun/tap.
From: Rusty Russell @ 2008-01-23 14:10 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, virtualization
In-Reply-To: <200801240107.38929.rusty@rustcorp.com.au>

(Changes since last time: we now have explicit IFF_RECV_CSUM and 
IFF_RECV_GSO bits, and some renaming of virtio_net hdr)

We use the virtio_net_hdr: it is an ABI already and designed to
encapsulate such metadata as GSO and partial checksums.

IFF_VIRTIO_HDR means you will write and read a 'struct virtio_net_hdr'
at the start of each packet.  You can always write packets with
partial checksum and gso to the tap device using this header.

IFF_RECV_CSUM means you can handle reading packets with partial
checksums.  If IFF_RECV_GSO is also set, it means you can handle
reading (all types of) GSO packets.

Note that there is no easy way to detect if these flags are supported:
see next patch.
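
The way the three flags combine can be summarised in a few lines. The
sketch mirrors the check this patch adds to tun_set_iff, with flag values
copied from the if_tun.h hunk; the CAP_* bits and tap_read_caps are
invented for the example and are not kernel symbols.

```c
/* Flag values as proposed in this patch's if_tun.h hunk. */
#define IFF_VIRTIO_HDR	0x4000
#define IFF_RECV_CSUM	0x8000
#define IFF_RECV_GSO	0x0800

/* Hypothetical summary bits; these are not the kernel's NETIF_F_* values. */
#define CAP_PARTIAL_CSUM	0x1
#define CAP_GSO			0x2

/* What the reader of the tap fd has agreed to receive, given its flags. */
static unsigned int tap_read_caps(unsigned int ifr_flags)
{
	const unsigned int need = IFF_VIRTIO_HDR | IFF_RECV_CSUM;
	unsigned int caps = 0;

	/* Without the header and checksum support nothing is enabled. */
	if ((ifr_flags & need) != need)
		return 0;

	caps |= CAP_PARTIAL_CSUM;

	/* GSO only counts on top of partial checksum support. */
	if (ifr_flags & IFF_RECV_GSO)
		caps |= CAP_GSO;

	return caps;
}
```

In particular, IFF_RECV_GSO on its own has no effect in this scheme.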

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/net/tun.c      |  259 +++++++++++++++++++++++++++++++++++++++++++------
 include/linux/if_tun.h |    6 +
 2 files changed, 238 insertions(+), 27 deletions(-)

diff -r cb85fb035378 drivers/net/tun.c
--- a/drivers/net/tun.c	Wed Jan 23 20:06:56 2008 +1100
+++ b/drivers/net/tun.c	Wed Jan 23 20:12:51 2008 +1100
@@ -62,6 +62,7 @@
 #include <linux/if_ether.h>
 #include <linux/if_tun.h>
 #include <linux/crc32.h>
+#include <linux/virtio_net.h>
 #include <net/net_namespace.h>
 
 #include <asm/system.h>
@@ -238,35 +239,188 @@ static unsigned int tun_chr_poll(struct 
 	return mask;
 }
 
+static struct sk_buff *copy_user_skb(size_t align, struct iovec *iv, size_t len)
+{
+	struct sk_buff *skb;
+
+	if (!(skb = alloc_skb(len + align, GFP_KERNEL)))
+		return ERR_PTR(-ENOMEM);
+
+	if (align)
+		skb_reserve(skb, align);
+
+	if (memcpy_fromiovec(skb_put(skb, len), iv, len)) {
+		kfree_skb(skb);
+		return ERR_PTR(-EFAULT);
+	}
+	return skb;
+}
+
+/* This will fail if they give us a crazy iovec, but that's their own fault. */
+static int get_user_skb_frags(const struct iovec *iv, size_t count,
+			      struct skb_frag_struct *f)
+{
+	unsigned int i, j, num_pg = 0;
+	int err;
+	struct page *pages[MAX_SKB_FRAGS];
+
+	down_read(&current->mm->mmap_sem);
+	for (i = 0; i < count; i++) {
+		int n, npages;
+		unsigned long base, len;
+		base = (unsigned long)iv[i].iov_base;
+		len = (unsigned long)iv[i].iov_len;
+
+		if (len == 0)
+			continue;
+
+		/* How many pages will this take? */
+		npages = 1 + (base + len - 1)/PAGE_SIZE - base/PAGE_SIZE;
+		if (unlikely(num_pg + npages > MAX_SKB_FRAGS)) {
+			err = -ENOSPC;
+			goto fail;
+		}
+		n = get_user_pages(current, current->mm, base, npages,
+				   0, 0, pages, NULL);
+		if (unlikely(n < 0)) {
+			err = n;
+			goto fail;
+		}
+
+		/* Transfer pages to the frag array */
+		for (j = 0; j < n; j++) {
+			f[num_pg].page = pages[j];
+			if (j == 0) {
+				f[num_pg].page_offset = offset_in_page(base);
+				f[num_pg].size = min(len, PAGE_SIZE -
+						     f[num_pg].page_offset);
+			} else {
+				f[num_pg].page_offset = 0;
+				f[num_pg].size = min(len, PAGE_SIZE);
+			}
+			len -= f[num_pg].size;
+			base += f[num_pg].size;
+			num_pg++;
+		}
+
+		if (unlikely(n != npages)) {
+			err = -EFAULT;
+			goto fail;
+		}
+	}
+	up_read(&current->mm->mmap_sem);
+	return num_pg;
+
+fail:
+	for (i = 0; i < num_pg; i++)
+		put_page(f[i].page);
+	up_read(&current->mm->mmap_sem);
+	return err;
+}
+
+
+static struct sk_buff *map_user_skb(const struct virtio_net_hdr *gso,
+				    size_t align, struct iovec *iv,
+				    size_t count, size_t len)
+{
+	struct sk_buff *skb;
+	struct skb_shared_info *sinfo;
+	int err;
+
+	if (!(skb = alloc_skb(gso->hdr_len + align, GFP_KERNEL)))
+		return ERR_PTR(-ENOMEM);
+
+	if (align)
+		skb_reserve(skb, align);
+
+	sinfo = skb_shinfo(skb);
+	sinfo->gso_size = gso->gso_size;
+	sinfo->gso_type = SKB_GSO_DODGY;
+	switch (gso->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+	case VIRTIO_NET_HDR_GSO_TCPV4:
+		sinfo->gso_type |= SKB_GSO_TCPV4;
+		break;
+	case VIRTIO_NET_HDR_GSO_TCPV6:
+		sinfo->gso_type |= SKB_GSO_TCPV6;
+		break;
+	case VIRTIO_NET_HDR_GSO_UDP:
+		sinfo->gso_type |= SKB_GSO_UDP;
+		break;
+	default:
+		err = -EINVAL;
+		goto fail;
+	}
+
+	if (gso->gso_type & VIRTIO_NET_HDR_GSO_ECN)
+		skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
+
+	/* Copy in the header. */
+	if (memcpy_fromiovec(skb_put(skb, gso->hdr_len), iv, gso->hdr_len)) {
+		err = -EFAULT;
+		goto fail;
+	}
+
+	err = get_user_skb_frags(iv, count, sinfo->frags);
+	if (err < 0)
+		goto fail;
+
+	sinfo->nr_frags = err;
+	skb->len += len;
+	skb->data_len += len;
+	
+	return skb;
+
+fail:
+	kfree_skb(skb);
+	return ERR_PTR(err);
+}
+
+static inline size_t iov_total(const struct iovec *iv, unsigned long count)
+{
+	unsigned long i;
+	size_t len;
+
+	for (i = 0, len = 0; i < count; i++)
+		len += iv[i].iov_len;
+
+	return len;
+}
+
 /* Get packet from user space buffer */
-static __inline__ ssize_t tun_get_user(struct tun_struct *tun, struct iovec *iv, size_t count)
+static __inline__ ssize_t tun_get_user(struct tun_struct *tun, struct iovec *iv, size_t num)
 {
 	struct tun_pi pi = { 0, __constant_htons(ETH_P_IP) };
+	struct virtio_net_hdr gso = { 0, VIRTIO_NET_HDR_GSO_NONE };
 	struct sk_buff *skb;
-	size_t len = count, align = 0;
+	size_t tot_len = iov_total(iv, num);
+	size_t len = tot_len, align = 0;
 
 	if (!(tun->flags & TUN_NO_PI)) {
-		if ((len -= sizeof(pi)) > count)
+		if ((len -= sizeof(pi)) > tot_len)
 			return -EINVAL;
 
 		if(memcpy_fromiovec((void *)&pi, iv, sizeof(pi)))
+			return -EFAULT;
+	}
+	if (tun->flags & TUN_VIRTIO_HDR) {
+		if ((len -= sizeof(gso)) > tot_len)
+			return -EINVAL;
+
+		if (memcpy_fromiovec((void *)&gso, iv, sizeof(gso)))
 			return -EFAULT;
 	}
 
 	if ((tun->flags & TUN_TYPE_MASK) == TUN_TAP_DEV)
 		align = NET_IP_ALIGN;
 
-	if (!(skb = alloc_skb(len + align, GFP_KERNEL))) {
+	if (gso.gso_type != VIRTIO_NET_HDR_GSO_NONE)
+		skb = map_user_skb(&gso, align, iv, num, len);
+	else
+		skb = copy_user_skb(align, iv, len);
+
+	if (IS_ERR(skb)) {
 		tun->dev->stats.rx_dropped++;
-		return -ENOMEM;
-	}
-
-	if (align)
-		skb_reserve(skb, align);
-	if (memcpy_fromiovec(skb_put(skb, len), iv, len)) {
-		tun->dev->stats.rx_dropped++;
-		kfree_skb(skb);
-		return -EFAULT;
+		return PTR_ERR(skb);
 	}
 
 	switch (tun->flags & TUN_TYPE_MASK) {
@@ -280,7 +434,13 @@ static __inline__ ssize_t tun_get_user(s
 		break;
 	};
 
-	if (tun->flags & TUN_NOCHECKSUM)
+	if (gso.flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		if (!skb_partial_csum_set(skb,gso.csum_start,gso.csum_offset)) {
+			tun->dev->stats.rx_dropped++;
+			kfree_skb(skb);
+			return -EINVAL;
+		}
+	} else if (tun->flags & TUN_NOCHECKSUM)
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
 
 	netif_rx_ni(skb);
@@ -289,18 +449,7 @@ static __inline__ ssize_t tun_get_user(s
 	tun->dev->stats.rx_packets++;
 	tun->dev->stats.rx_bytes += len;
 
-	return count;
-}
-
-static inline size_t iov_total(const struct iovec *iv, unsigned long count)
-{
-	unsigned long i;
-	size_t len;
-
-	for (i = 0, len = 0; i < count; i++)
-		len += iv[i].iov_len;
-
-	return len;
+	return tot_len;
 }
 
 static ssize_t tun_chr_aio_write(struct kiocb *iocb, const struct iovec *iv,
@@ -313,7 +462,7 @@ static ssize_t tun_chr_aio_write(struct 
 
 	DBG(KERN_INFO "%s: tun_chr_write %ld\n", tun->dev->name, count);
 
-	return tun_get_user(tun, (struct iovec *) iv, iov_total(iv, count));
+	return tun_get_user(tun, (struct iovec *) iv, count);
 }
 
 /* Put packet to the user space buffer */
@@ -336,6 +485,42 @@ static __inline__ ssize_t tun_put_user(s
 		if (memcpy_toiovec(iv, (void *) &pi, sizeof(pi)))
 			return -EFAULT;
 		total += sizeof(pi);
+	}
+	if (tun->flags & TUN_VIRTIO_HDR) {
+		struct virtio_net_hdr gso;
+		struct skb_shared_info *sinfo = skb_shinfo(skb);
+
+		if (skb_is_gso(skb)) {
+			gso.hdr_len = skb_transport_header(skb) - skb->data;
+			gso.gso_size = sinfo->gso_size;
+			if (sinfo->gso_type & SKB_GSO_TCPV4)
+				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+			else if (sinfo->gso_type & SKB_GSO_TCPV6)
+				gso.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+			else if (sinfo->gso_type & SKB_GSO_UDP)
+				gso.gso_type = VIRTIO_NET_HDR_GSO_UDP;
+			else
+				BUG();
+			if (sinfo->gso_type & SKB_GSO_TCP_ECN)
+				gso.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
+		} else
+			gso.gso_type = VIRTIO_NET_HDR_GSO_NONE;
+		
+		if (skb->ip_summed == CHECKSUM_PARTIAL) {
+			gso.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+			gso.csum_start = skb->csum_start - skb_headroom(skb);
+			gso.csum_offset = skb->csum_offset;
+		} else {
+			gso.flags = 0;
+			gso.csum_offset = gso.csum_start = 0;
+		}
+
+		if ((len -= sizeof(gso)) < 0)
+			return -EINVAL;
+
+		if (memcpy_toiovec(iv, (void *)&gso, sizeof(gso)))
+			return -EFAULT;
+		total += sizeof(gso);
 	}
 
 	len = min_t(int, skb->len, len);
@@ -523,6 +708,17 @@ static int tun_set_iff(struct file *file
 
 		tun_net_init(dev);
 
+		/* Virtio header means we can handle csum & gso. */
+		if ((ifr->ifr_flags & (IFF_VIRTIO_HDR|IFF_RECV_CSUM)) ==
+		    (IFF_VIRTIO_HDR|IFF_RECV_CSUM)) {
+			dev->features = NETIF_F_SG | NETIF_F_HW_CSUM |
+					NETIF_F_HIGHDMA | NETIF_F_FRAGLIST;
+
+			if (ifr->ifr_flags & IFF_RECV_GSO)
+				dev->features |= NETIF_F_TSO | NETIF_F_UFO |
+						 NETIF_F_TSO_ECN | NETIF_F_TSO6;
+		}
+
 		if (strchr(dev->name, '%')) {
 			err = dev_alloc_name(dev, dev->name);
 			if (err < 0)
@@ -543,6 +739,15 @@ static int tun_set_iff(struct file *file
 
 	if (ifr->ifr_flags & IFF_ONE_QUEUE)
 		tun->flags |= TUN_ONE_QUEUE;
+
+	if (ifr->ifr_flags & IFF_VIRTIO_HDR)
+		tun->flags |= TUN_VIRTIO_HDR;
+
+	if (ifr->ifr_flags & IFF_RECV_CSUM)
+		tun->flags |= TUN_RECV_CSUM;
+
+	if (ifr->ifr_flags & IFF_RECV_GSO)
+		tun->flags |= TUN_RECV_GSO;
 
 	file->private_data = tun;
 	tun->attached = 1;
diff -r cb85fb035378 include/linux/if_tun.h
--- a/include/linux/if_tun.h	Wed Jan 23 20:06:56 2008 +1100
+++ b/include/linux/if_tun.h	Wed Jan 23 20:12:51 2008 +1100
@@ -70,6 +70,9 @@ struct tun_struct {
 #define TUN_NO_PI	0x0040
 #define TUN_ONE_QUEUE	0x0080
 #define TUN_PERSIST 	0x0100	
+#define TUN_VIRTIO_HDR	0x0200
+#define TUN_RECV_CSUM	0x0400
+#define TUN_RECV_GSO	0x0800
 
 /* Ioctl defines */
 #define TUNSETNOCSUM  _IOW('T', 200, int) 
@@ -85,6 +88,9 @@ struct tun_struct {
 #define IFF_TAP		0x0002
 #define IFF_NO_PI	0x1000
 #define IFF_ONE_QUEUE	0x2000
+#define IFF_VIRTIO_HDR	0x4000
+#define IFF_RECV_CSUM	0x8000
+#define IFF_RECV_GSO	0x0800
 
 struct tun_pi {
 	unsigned short flags;

^ permalink raw reply

* [PATCH 3/3] Interface to query tun/tap features.
From: Rusty Russell @ 2008-01-23 14:14 UTC (permalink / raw)
  To: netdev; +Cc: Herbert Xu, virtualization
In-Reply-To: <200801240110.45178.rusty@rustcorp.com.au>

(No real change, just updated with new bits)

The problem with introducing IFF_RECV_CSUM and IFF_RECV_GSO is that
they need to set dev->features to enable GSO and/or checksumming,
which is supposed to be done before register_netdevice(), ie. as part
of TUNSETIFF.

Unfortunately, TUNSETIFF has always just ignored flags it doesn't understand,
so there's no good way of detecting whether the kernel supports IFF_VIRTIO_HDR.

This patch implements a TUNGETFEATURES ioctl which returns all the valid IFF
flags.  It could be extended later to include other features.

Here's an example program which uses it:

#include <linux/if_tun.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <err.h>
#include <stdio.h>

static struct {
	unsigned int flag;
	const char *name;
} known_flags[] = {
	{ IFF_TUN, "TUN" },
	{ IFF_TAP, "TAP" },
	{ IFF_NO_PI, "NO_PI" },
	{ IFF_ONE_QUEUE, "ONE_QUEUE" },
	{ IFF_VIRTIO_HDR, "VIRTIO_HDR" },
	{ IFF_RECV_CSUM, "RECV_CSUM" },
	{ IFF_RECV_GSO, "RECV_GSO" },
};

int main()
{
	unsigned int features, i;

	int netfd = open("/dev/net/tun", O_RDWR);
	if (netfd < 0)
		err(1, "Opening /dev/net/tun");

	if (ioctl(netfd, TUNGETFEATURES, &features) != 0) {
		printf("Kernel does not support TUNGETFEATURES, guessing\n");
		features = (IFF_TUN|IFF_TAP|IFF_NO_PI|IFF_ONE_QUEUE);
	}
	printf("Available features are: ");
	for (i = 0; i < sizeof(known_flags)/sizeof(known_flags[0]); i++) {
		if (features & known_flags[i].flag) {
			features &= ~known_flags[i].flag;
			printf("%s ", known_flags[i].name);
		}
	}
	if (features)
		printf("(UNKNOWN %#x)", features);
	printf("\n");
	return 0;
}

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/net/tun.c      |    9 +++++++++
 include/linux/if_tun.h |    3 +++
 2 files changed, 12 insertions(+)

diff -r c0e7a8b99325 drivers/net/tun.c
--- a/drivers/net/tun.c	Wed Jan 23 20:12:51 2008 +1100
+++ b/drivers/net/tun.c	Wed Jan 23 20:17:28 2008 +1100
@@ -790,6 +790,15 @@ static int tun_chr_ioctl(struct inode *i
 		return 0;
 	}
 
+	if (cmd == TUNGETFEATURES) {
+		/* Currently this just means: "what IFF flags are valid?".
+		 * This is needed because we never checked for invalid flags on
+		 * TUNSETIFF.  This was introduced with IFF_VIRTIO_HDR, so if a
+		 * kernel doesn't have this ioctl, it doesn't have GSO header
+		 * support. */
+		return put_user(IFF_ALL_FLAGS, (unsigned int __user*)argp);
+	}
+
 	if (!tun)
 		return -EBADFD;
 
diff -r c0e7a8b99325 include/linux/if_tun.h
--- a/include/linux/if_tun.h	Wed Jan 23 20:12:51 2008 +1100
+++ b/include/linux/if_tun.h	Wed Jan 23 20:17:28 2008 +1100
@@ -82,6 +82,7 @@ struct tun_struct {
 #define TUNSETOWNER   _IOW('T', 204, int)
 #define TUNSETLINK    _IOW('T', 205, int)
 #define TUNSETGROUP   _IOW('T', 206, int)
+#define TUNGETFEATURES _IOR('T', 207, unsigned int)
 
 /* TUNSETIFF ifr flags */
 #define IFF_TUN		0x0001
@@ -91,6 +92,8 @@ struct tun_struct {
 #define IFF_VIRTIO_HDR	0x4000
 #define IFF_RECV_CSUM	0x8000
 #define IFF_RECV_GSO	0x0800
+#define IFF_ALL_FLAGS (IFF_TUN | IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE | \
+		       IFF_VIRTIO_HDR | IFF_RECV_CSUM | IFF_RECV_GSO)
 
 struct tun_pi {
 	unsigned short flags;

^ permalink raw reply

* [PATCH net-2.6.25] [PKTGEN] Remove an unused definition in pktgen.c.
From: Rami Rosen @ 2008-01-23 14:38 UTC (permalink / raw)
  To: David Miller, netdev

[-- Attachment #1: Type: text/plain, Size: 404 bytes --]

Hi,
- Remove an unused definition (LAT_BUCKETS_MAX) in net/core/pktgen.c.
- Remove the corresponding comment.
- The LAT_BUCKETS_MAX seems to have to do with a patch from a long
time ago which was not applied (Ben Greear), which dealt with latency
counters.

See, for example : http://oss.sgi.com/archives/netdev/2002-09/msg00184.html

Regards,
Rami Rosen


Signed-off-by: Rami Rosen <ramirose@gmail.com>

[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 448 bytes --]

diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index eebccdb..b7f2de1 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -170,8 +170,6 @@
 
 #define VERSION  "pktgen v2.69: Packet Generator for packet performance testing.\n"
 
-/* The buckets are exponential in 'width' */
-#define LAT_BUCKETS_MAX 32
 #define IP_NAME_SZ 32
 #define MAX_MPLS_LABELS 16 /* This is the max label stack depth */
 #define MPLS_STACK_BOTTOM htonl(0x00000100)

^ permalink raw reply related

* My 802.3ad is my bond
From: Steven Whitehouse @ 2008-01-23 15:45 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: netdev

Hi,

This commit: ece95f7fefe3afae19e641e1b3f5e64b00d5b948 seems to have
caused a problem with parsing bond arguments as now only the numeric
arguments seem to work (in modprobe.conf) and specifying 802.3ad fails.
When I revert that patch in my local tree all seems ok.
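
For reference, a user-space sketch of a parser that accepts either a mode
name or a mode number. It is an illustration, not the driver's code, and it
does not claim to identify the regression; mode_tbl and parse_mode are
hypothetical names. The one point it tries to show is that "802.3ad" begins
with digits, so name matching has to come before any numeric conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mode names and numbers as documented for the bonding driver. */
static const struct {
	const char *name;
	int mode;
} mode_tbl[] = {
	{ "balance-rr", 0 }, { "active-backup", 1 }, { "balance-xor", 2 },
	{ "broadcast", 3 }, { "802.3ad", 4 }, { "balance-tlb", 5 },
	{ "balance-alb", 6 },
};

#define N_MODES (sizeof(mode_tbl) / sizeof(mode_tbl[0]))

/* Returns the mode number, or -1 if the argument matches nothing. */
static int parse_mode(const char *arg)
{
	size_t i;
	char *end;
	long val;

	assert(arg != NULL);

	/* Names first, so "802.3ad" is never read as the number 802. */
	for (i = 0; i < N_MODES; i++)
		if (strcmp(arg, mode_tbl[i].name) == 0)
			return mode_tbl[i].mode;

	/* Accept a number only if the whole string is numeric. */
	val = strtol(arg, &end, 10);
	if (end == arg || *end != '\0')
		return -1;

	for (i = 0; i < N_MODES; i++)
		if (mode_tbl[i].mode == val)
			return (int)val;

	return -1;
}
```

With this ordering both "mode=4" and "mode=802.3ad" resolve to the same
value, while a bare "802" is rejected.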

Also I notice that one of my two NICs now reports this:

bonding: bond0: link status definitely down for interface eth0,
disabling it
bonding: bond0: Interface eth0 is already enslaved!
bond0.5: no IPv6 routers present

which I think is also new with this set of bonding updates, before it
used to use both interfaces ok. I've not worked out which of the other
patches causes this so far, but I can if it's helpful.

Steve.



^ permalink raw reply

* Re: [IPV4 0/9] TRIE performance patches
From: Stephen Hemminger @ 2008-01-23 16:31 UTC (permalink / raw)
  To: Robert Olsson; +Cc: David Miller, netdev
In-Reply-To: <18327.18935.137799.285515@robur.slu.se>

Robert Olsson wrote:
> Stephen Hemminger writes:
>
>  > Time to handle a full BGP load (163K of routes).
>  > 
>  > Before:		Load		Dump		Flush
>  >
>  > kmem_cache	3.8		13.0		7.2
>  > iter		3.9		12.3		6.9
>  > unordered	3.1		11.9		4.9
>  > find_node	3.1		 0.3		1.2
>
>  I certainly like the speed but what will we break when
>  we don't return in longest prefix order?
>
> labb:/# ip r
> default via 10.10.10.1 dev eth0 
> 5.0.0.0/8 via 192.168.2.2 dev eth3 
> 10.10.10.0/24 dev eth0  proto kernel  scope link  src 10.10.10.2 
> 10.10.11.0/24 dev eth1  proto kernel  scope link  src 10.10.11.1 
> 11.0.0.0/8 via 10.10.11.2 dev eth1 
> 192.168.1.0/24 dev eth2  proto kernel  scope link  src 192.168.1.2 
> 192.168.2.0/24 dev eth3  proto kernel  scope link  src 192.168.2.1 
>
> labb:/# ip route list match 10.10.10.1
> default via 10.10.10.1 dev eth0 
> 10.10.10.0/24 dev eth0  proto kernel  scope link  src 10.10.10.2 
> labb:/# 
>
> Maybe the unordered dump can be ordered cheaply...
>
> Cheers.
> 				--ro
>
>   
Hash returned the routes in prefix order (then random).  Returning the
routes in numerical order seems just as logical. I'm going to test on
quagga.


^ permalink raw reply

* [NET_SCHED 00/15]: Make use of new netlink API features
From: Patrick McHardy @ 2008-01-23 16:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, Patrick McHardy

Hi Dave,

these patches change the packet schedulers/classifiers/actions to make use
of the features of the new netlink API, like typeful attribute dumping and
parsing, automatic basic attribute validation etc. They also fix a bug and
a warning introduced by my last set of patches.

Please apply, thanks.


 include/linux/pkt_sched.h |    2 +
 include/net/act_api.h     |    4 +-
 net/sched/act_api.c       |  197 +++++++++++++++++++++++++--------------------
 net/sched/act_gact.c      |   20 +++--
 net/sched/act_ipt.c       |   33 +++++---
 net/sched/act_mirred.c    |   15 +++-
 net/sched/act_nat.c       |   15 +++-
 net/sched/act_pedit.c     |   15 +++-
 net/sched/act_police.c    |   43 +++++-----
 net/sched/act_simple.c    |   15 +++-
 net/sched/cls_api.c       |   30 ++++---
 net/sched/cls_basic.c     |   33 ++++----
 net/sched/cls_fw.c        |   41 +++++-----
 net/sched/cls_route.c     |   47 +++++------
 net/sched/cls_rsvp.h      |   45 +++++------
 net/sched/cls_tcindex.c   |   70 ++++++++--------
 net/sched/cls_u32.c       |   56 +++++++------
 net/sched/em_meta.c       |   18 +++--
 net/sched/ematch.c        |   31 +++++---
 net/sched/sch_api.c       |    7 +-
 net/sched/sch_atm.c       |   45 ++++++----
 net/sched/sch_cbq.c       |   75 +++++++----------
 net/sched/sch_dsmark.c    |   37 +++++----
 net/sched/sch_gred.c      |   28 +++++--
 net/sched/sch_hfsc.c      |   30 ++++---
 net/sched/sch_htb.c       |   64 ++++++++++-----
 net/sched/sch_ingress.c   |   12 ++--
 net/sched/sch_netem.c     |   73 +++++++---------
 net/sched/sch_prio.c      |    9 ++-
 net/sched/sch_red.c       |   16 +++-
 net/sched/sch_tbf.c       |   29 +++++---
 31 files changed, 650 insertions(+), 505 deletions(-)

Patrick McHardy (15):
      [NET_SCHED]: sch_atm: fix format string warning
      [NET_SCHED]: sch_netem: use nla_parse_nested_compat
      [NET_SCHED]: act_api: fix netlink API conversion bug
      [NET_SCHED]: act_api: use nlmsg_parse
      [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get
      [NET_SCHED]: Propagate nla_parse return value
      [NET_SCHED]: Use nla_nest_start/nla_nest_end
      [NET_SCHED]: Use NLA_PUT_STRING for string dumping
      [NET_SCHED]: Use typeful attribute construction helpers
      [NET_SCHED]: Use typeful attribute parsing helpers
      [NET_SCHED]: sch_api: introduce constant for rate table size
      [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers
      [NET_SCHED]: Use nla_policy for attribute validation in classifiers
      [NET_SCHED]: Use nla_policy for attribute validation in actions
      [NET_SCHED]: Use nla_policy for attribute validation in ematches

^ permalink raw reply

* [NET_SCHED 01/15]: sch_atm: fix format string warning
From: Patrick McHardy @ 2008-01-23 16:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, Patrick McHardy
In-Reply-To: <20080123163555.6459.69501.sendpatchset@localhost.localdomain>

[NET_SCHED]: sch_atm: fix format string warning

Fix format string warning introduced by the netlink API conversion:

net/sched/sch_atm.c:250: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'int'.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 3bab4166cf0350552419d7871b4df463c5aed2ea
tree c08015e39f6f1da0255f1325cc21dfda0c738bc2
parent a7f92e3b13a5e3db64383c503f2249dc74b41bd6
author Patrick McHardy <kaber@trash.net> Wed, 23 Jan 2008 16:48:28 +0100
committer Patrick McHardy <kaber@trash.net> Wed, 23 Jan 2008 16:48:28 +0100

 net/sched/sch_atm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c
index eb01aae..e587391 100644
--- a/net/sched/sch_atm.c
+++ b/net/sched/sch_atm.c
@@ -246,7 +246,7 @@ static int atm_tc_change(struct Qdisc *sch, u32 classid, u32 parent,
 		if (!excess)
 			return -ENOENT;
 	}
-	pr_debug("atm_tc_change: type %d, payload %lu, hdr_len %d\n",
+	pr_debug("atm_tc_change: type %d, payload %d, hdr_len %d\n",
 		 opt->nla_type, nla_len(opt), hdr_len);
 	sock = sockfd_lookup(fd, &error);
 	if (!sock)

^ permalink raw reply related

* [NET_SCHED 02/15]: sch_netem: use nla_parse_nested_compat
From: Patrick McHardy @ 2008-01-23 16:36 UTC (permalink / raw)
  To: davem; +Cc: netdev, Patrick McHardy
In-Reply-To: <20080123163555.6459.69501.sendpatchset@localhost.localdomain>

[NET_SCHED]: sch_netem: use nla_parse_nested_compat

Replace open coded equivalent of nla_parse_nested_compat().
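
For context, the "compat" layout is a fixed struct at the start of the
attribute payload, followed by ordinary nested attributes. The stand-alone
sketch below parses such a payload in user space. It is a simplification
under stated assumptions, not the kernel helper: the 4-byte attribute
header (16-bit length, 16-bit type, host byte order) and 4-byte alignment
follow the netlink attribute format, while ATTR_MAX, attr_align and
parse_compat are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN	4u
#define ATTR_MAX	7	/* hypothetical highest attribute type */

/* Round up to the 4-byte alignment used between attributes. */
static size_t attr_align(size_t n)
{
	return (n + 3) & ~(size_t)3;
}

/*
 * Copy the leading fixed struct into 'fixed', then index the attributes
 * that follow it by type.  tb[type] points at the attribute header, or is
 * NULL if that type is absent.  Returns 0 on success, -1 if malformed.
 */
static int parse_compat(const uint8_t *buf, size_t len,
			void *fixed, size_t fixed_len,
			const uint8_t *tb[ATTR_MAX + 1])
{
	size_t off;

	assert(buf != NULL && fixed != NULL && tb != NULL);
	memset(tb, 0, (ATTR_MAX + 1) * sizeof(tb[0]));

	if (len < fixed_len)
		return -1;
	memcpy(fixed, buf, fixed_len);

	off = attr_align(fixed_len);
	while (off + ATTR_HDRLEN <= len) {
		uint16_t alen, atype;

		memcpy(&alen, buf + off, sizeof(alen));
		memcpy(&atype, buf + off + 2, sizeof(atype));

		/* Reject attributes shorter than a header or past the end. */
		if (alen < ATTR_HDRLEN || alen > len - off)
			return -1;
		if (atype <= ATTR_MAX)
			tb[atype] = buf + off;

		off += attr_align(alen);
	}
	return 0;
}
```

A caller then only tests tb[type] for each optional attribute, which is the
shape netem_change takes after this patch.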

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 1af28b79f4f0a67db344938ef6739ad2af1a72a7
tree 52294414fad2e6cd11aa719113f160a47bbe5bd5
parent 3bab4166cf0350552419d7871b4df463c5aed2ea
author Patrick McHardy <kaber@trash.net> Wed, 23 Jan 2008 16:48:46 +0100
committer Patrick McHardy <kaber@trash.net> Wed, 23 Jan 2008 16:48:46 +0100

 net/sched/sch_netem.c |   58 ++++++++++++++++++++++---------------------------
 1 files changed, 26 insertions(+), 32 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index a7b58df..1a75579 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -407,13 +407,18 @@ static int get_corrupt(struct Qdisc *sch, const struct nlattr *attr)
 static int netem_change(struct Qdisc *sch, struct nlattr *opt)
 {
 	struct netem_sched_data *q = qdisc_priv(sch);
+	struct nlattr *tb[TCA_NETEM_MAX + 1];
 	struct tc_netem_qopt *qopt;
 	int ret;
 
-	if (opt == NULL || nla_len(opt) < sizeof(*qopt))
+	if (opt == NULL)
 		return -EINVAL;
 
-	qopt = nla_data(opt);
+	ret = nla_parse_nested_compat(tb, TCA_NETEM_MAX, opt, NULL, qopt,
+				      sizeof(*qopt));
+	if (ret < 0)
+		return ret;
+
 	ret = set_fifo_limit(q->qdisc, qopt->limit);
 	if (ret) {
 		pr_debug("netem: can't set fifo limit\n");
@@ -434,39 +439,28 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt)
 	if (q->gap)
 		q->reorder = ~0;
 
-	/* Handle nested options after initial queue options.
-	 * Should have put all options in nested format but too late now.
-	 */
-	if (nla_len(opt) > sizeof(*qopt)) {
-		struct nlattr *tb[TCA_NETEM_MAX + 1];
-		if (nla_parse(tb, TCA_NETEM_MAX,
-			      nla_data(opt) + sizeof(*qopt),
-			      nla_len(opt) - sizeof(*qopt), NULL))
-			return -EINVAL;
-
-		if (tb[TCA_NETEM_CORR]) {
-			ret = get_correlation(sch, tb[TCA_NETEM_CORR]);
-			if (ret)
-				return ret;
-		}
+	if (tb[TCA_NETEM_CORR]) {
+		ret = get_correlation(sch, tb[TCA_NETEM_CORR]);
+		if (ret)
+			return ret;
+	}
 
-		if (tb[TCA_NETEM_DELAY_DIST]) {
-			ret = get_dist_table(sch, tb[TCA_NETEM_DELAY_DIST]);
-			if (ret)
-				return ret;
-		}
+	if (tb[TCA_NETEM_DELAY_DIST]) {
+		ret = get_dist_table(sch, tb[TCA_NETEM_DELAY_DIST]);
+		if (ret)
+			return ret;
+	}
 
-		if (tb[TCA_NETEM_REORDER]) {
-			ret = get_reorder(sch, tb[TCA_NETEM_REORDER]);
-			if (ret)
-				return ret;
-		}
+	if (tb[TCA_NETEM_REORDER]) {
+		ret = get_reorder(sch, tb[TCA_NETEM_REORDER]);
+		if (ret)
+			return ret;
+	}
 
-		if (tb[TCA_NETEM_CORRUPT]) {
-			ret = get_corrupt(sch, tb[TCA_NETEM_CORRUPT]);
-			if (ret)
-				return ret;
-		}
+	if (tb[TCA_NETEM_CORRUPT]) {
+		ret = get_corrupt(sch, tb[TCA_NETEM_CORRUPT]);
+		if (ret)
+			return ret;
 	}
 
 	return 0;


