Netdev List

* Re: [PATCH RFC] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:40 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Rick Jones, netdev, ori
In-Reply-To: <20091021082109.GE8704@Chamillionaire.breakpoint.cc>

Hi Florian,

Florian Westphal wrote:

> Gilad Ben-Yossef <gilad@codefidence.com> wrote:
>   
>> The point is that even then you are more than likely to wish to turn
>> off these options to specific destination and routes (that go over
>> said exotic link) and keep using them over others - e.g. timestamp
>> OK for local LAN, but for default route that goes over exotic TCP/IP
>> over carrier penguins turn it off.
>>     
>
> If you need a bandaid solution, it is possible to replace tcp
> options with NOOPs using netfilter's TCPOPTSTRIP target.
>
> There is also an ECN target to work around ECN blackholes.
>   
Thanks for the tip. It is appreciated.

The band-aid solution that Ori and company found was simply to patch
their local copy of the kernel, but since sitting on a bunch of
"private" patches seems like a lose-lose situation (there is a term you
don't hear much when talking to MBAs :-), I'm now trying to get it
mainlined for them.

Thanks,
Gilad

-- 
Gilad Ben-Yossef
Chief Coffee Drinker & CTO
Codefidence Ltd.

Web:   http://codefidence.com
Cell:  +972-52-8260388
Skype: gilad_codefidence
Tel:   +972-8-9316883 ext. 201
Fax:   +972-8-9316884
Email: gilad@codefidence.com

Check out our Open Source technology and training blog - http://tuxology.net

	"Sorry cannot parse this, its too long to be true  :)"
	  -- Eric Dumazet on netdev mailing list

* Re: Policy routing + route "via" gives a strange behavior
From: Mallika Gautam @ 2009-10-21  8:47 UTC (permalink / raw)
  To: Guido Trotter; +Cc: Atis Elsts, netdev
In-Reply-To: <20091020172334.GA6404@gg.studio.tixteam.net>

>
>> Anyway, you can achieve what you wish by using the "onlink" option, e.g.:
>>   ip route add table 100 default dev eth1 via 192.168.5.254 onlink
>

Hi,
I am facing exactly the same issue and have to flush the main table
once the routes in the table get added. This is really nasty, as I need
to do this from a kernel module every time an interface is shut down and
brought up again.

"onlink" doesn't seem to work, as it treats the destination IP address
as "onlink" instead of resolving it via the gateway. E.g. if I want to
ping 139.85.111.12 via eth1, it sends an ARP request for
139.85.111.12 and not for 192.168.5.254. So it doesn't work.

I would really like to see this behaviour fixed, as the workarounds are
equally painful.

Regards
Mallika

* [PATCH v2 0/8] Per route TCP options
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef

Turn the global sysctls allowing disabling of TCP SACK, DSACK,
time stamp and window scale into per route entry feature options,
laying the groundwork for future removal of the relevant global
sysctls.

You really only want to disable SACK, DSACK, time stamp or window
scale if you've got a piece of broken networking equipment somewhere,
as a stop-gap until you can bring a big enough hammer to deal with
the broken network equipment. It doesn't make sense to "punish" all
connections going through the machine to destinations not related
to the broken equipment.

This is doubly true when you're dealing with network containers
used to isolate several virtual domains.

The per route options are implemented using free bits in the route
entry features property, some of which were already reserved by name
for these options, so this does not inflate any structure. I expect
that once the appropriate global sysctls are removed, the overall code
base will be smaller.
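To make the decision rule concrete, here is a minimal, self-contained sketch (not kernel code) of how the global sysctl and a per-route feature bit combine. The bit values mirror the RTAX_FEATURE_NO_* definitions added in patches 4 to 7; `struct route_features` and `tcp_opt_enabled()` are hypothetical stand-ins for the kernel's dst_entry and dst_feature().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values as defined by patches 4-7 of this series (rtnetlink.h). */
#define RTAX_FEATURE_NO_SACK   0x00000002u
#define RTAX_FEATURE_NO_TSTAMP 0x00000004u
#define RTAX_FEATURE_NO_WSCALE 0x00000010u
#define RTAX_FEATURE_NO_DSACK  0x00000020u

/* Hypothetical stand-in for the RTAX_FEATURES metric of a route entry. */
struct route_features {
	uint32_t features;
};

/*
 * An option is used only if the global sysctl still allows it AND the
 * route does not carry the matching "NO_*" bit. Clearing the sysctl keeps
 * the old behaviour (off for every destination); setting the bit turns
 * the option off only for destinations reached via this route.
 */
static bool tcp_opt_enabled(int sysctl_value, const struct route_features *rt,
			    uint32_t no_bit)
{
	return sysctl_value != 0 && (rt->features & no_bit) == 0;
}
```

With timestamps enabled globally, a LAN route with no feature bits keeps them, while a default route marked RTAX_FEATURE_NO_TSTAMP (the "carrier penguins" link) drops them, and SACK on that same route is unaffected.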

Tested on x86 using Qemu/KVM.  

Will send the matching patch to iproute2 if/when this is ACKed or
if someone wants to test this.

Patchset based on original work by Ori Finkelman and Yony Amit 
from ComSleep Ltd.

Gilad Ben-Yossef (8):
  Only parse time stamp TCP option in time wait sock
  Allow tcp_parse_options to consult dst entry
  Infrastructure for querying route entry features
  Add the no SACK route option feature
  Allow disabling TCP timestamp options per route
  Allow to turn off TCP window scale opt per route
  Allow disabling of DSACK TCP option per route
  Document future removal of sysctl_tcp_* options

 Documentation/feature-removal-schedule.txt |   12 ++++++++++++
 include/linux/rtnetlink.h                  |    6 ++++--
 include/net/dst.h                          |    8 +++++++-
 include/net/tcp.h                          |    3 ++-
 net/ipv4/syncookies.c                      |   27 ++++++++++++++-------------
 net/ipv4/tcp_input.c                       |   26 ++++++++++++++++++--------
 net/ipv4/tcp_ipv4.c                        |   19 ++++++++++---------
 net/ipv4/tcp_minisocks.c                   |    8 +++++---
 net/ipv4/tcp_output.c                      |   18 +++++++++++++-----
 net/ipv6/syncookies.c                      |   28 +++++++++++++++-------------
 net/ipv6/tcp_ipv6.c                        |    3 ++-
 11 files changed, 102 insertions(+), 56 deletions(-)

* [PATCH v2 3/8] Add dst_feature to query route entry features
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

Add an accessor for the existing dst_entry features field and
refactor the only supported feature (allfrag) to use it.
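A user-space sketch of what the new accessor does, using a simplified mock of struct dst_entry (only the metrics array is modelled; `struct mock_dst` and the `mock_` function names are hypothetical). As in include/net/dst.h, metric N lives at metrics[N - 1], and RTAX_FEATURES is 12 in the RTAX_* enum of rtnetlink.h.

```c
#include <stdint.h>

/* Index of the features metric in the kernel's RTAX_* enum. */
#define RTAX_FEATURES         12
#define RTAX_FEATURE_NO_SACK  0x00000002u
#define RTAX_FEATURE_ALLFRAG  0x00000008u

/* Simplified mock of struct dst_entry: just enough metric slots. */
struct mock_dst {
	uint32_t metrics[RTAX_FEATURES];
};

/* Metric N is stored at metrics[N - 1]; RTAX_UNSPEC (0) has no slot. */
static uint32_t mock_dst_metric(const struct mock_dst *dst, int metric)
{
	return dst->metrics[metric - 1];
}

/*
 * Same shape as the dst_feature() accessor added by this patch: return
 * the masked bit(s), so any non-zero result means "feature set".
 */
static uint32_t mock_dst_feature(const struct mock_dst *dst, uint32_t feature)
{
	return mock_dst_metric(dst, RTAX_FEATURES) & feature;
}
```

Returning the masked value rather than a normalised 0/1 keeps the result identical to the open-coded expression it replaces in dst_allfrag(), so callers that only test for non-zero are unaffected.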


Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 include/net/dst.h |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/include/net/dst.h b/include/net/dst.h
index 5a900dd..b562be3 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -111,6 +111,12 @@ dst_metric(const struct dst_entry *dst, int metric)
 	return dst->metrics[metric-1];
 }
 
+static inline u32
+dst_feature(const struct dst_entry *dst, u32 feature)
+{
+	return dst_metric(dst, RTAX_FEATURES) & feature;
+}
+
 static inline u32 dst_mtu(const struct dst_entry *dst)
 {
 	u32 mtu = dst_metric(dst, RTAX_MTU);
@@ -136,7 +142,7 @@ static inline void set_dst_metric_rtt(struct dst_entry *dst, int metric,
 static inline u32
 dst_allfrag(const struct dst_entry *dst)
 {
-	int ret = dst_metric(dst, RTAX_FEATURES) & RTAX_FEATURE_ALLFRAG;
+	int ret = dst_feature(dst,  RTAX_FEATURE_ALLFRAG);
 	/* Yes, _exactly_. This is paranoia. */
 	barrier();
 	return ret;
-- 
1.5.6.3


* [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
From: Gilad Ben-Yossef @ 2009-10-21  8:57 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

No need for global kill switches if we have per route entry controls.
Wait a year before removing in case someone is using this.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>

---
 Documentation/feature-removal-schedule.txt |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 89a47b5..60db855 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -6,6 +6,18 @@ be removed from this file.
 
 ---------------------------
 
+What:	sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling,
+	sysctl_tcp_dsack
+When:	October 2010
+
+Why:	These options can now be set on a per route basis via the
+	RTAX_FEATURE_NO_SACK, RTAX_FEATURE_NO_TSTAMP, RTAX_FEATURE_NO_WSCALE,
+	and RTAX_FEATURE_NO_DSACK route feature options.
+
+Who:	Gilad Ben-Yossef <gilad@codefidence.com>
+
+---------------------------
+
 What:	PRISM54
 When:	2.6.34
 
-- 
1.5.6.3


* [PATCH v2 6/8] Allow to turn off TCP window scale opt per route
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

Add and use no window scale bit in the features field.

Note that this is not the same as setting a window scale of 0,
as would happen with a window limit on the route.
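The distinction can be seen in what goes on the wire. Below is a hedged sketch (the helper name `put_wscale_option` is hypothetical) that writes the window scale option in the 4-byte NOP-aligned form accounted for by TCPOLEN_WSCALE_ALIGNED in tcp_syn_options(). Per RFC 1323, scaling is used only if both SYNs carry the option, so omitting it disables scaling in both directions, while sending shift 0 still negotiates scaling and lets the peer scale its own advertised window.

```c
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_NOP     1
#define TCPOPT_WINDOW  3
#define TCPOLEN_WINDOW 3

/*
 * Hypothetical helper: append the window scale option to a SYN option
 * buffer (which must have room for 4 bytes) and return the number of
 * bytes written.
 * - no_wscale != 0 models RTAX_FEATURE_NO_WSCALE: nothing is written, so
 *   window scaling is not negotiated at all.
 * - otherwise the option is sent with shift 'wscale', which may be 0,
 *   e.g. when a route window limit keeps our window below 64 KB.
 */
static size_t put_wscale_option(uint8_t *buf, int no_wscale, uint8_t wscale)
{
	if (no_wscale)
		return 0;
	buf[0] = TCPOPT_NOP;		/* pad to a 32-bit boundary */
	buf[1] = TCPOPT_WINDOW;
	buf[2] = TCPOLEN_WINDOW;
	buf[3] = wscale;		/* shift count, 0..14 */
	return 4;
}
```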

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    6 ++++--
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 2ab8c75..6784b34 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -380,6 +380,7 @@ enum
 #define RTAX_FEATURE_NO_SACK	0x00000002
 #define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
+#define RTAX_FEATURE_NO_WSCALE	0x00000010
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d2f9742..4f5e914 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3739,7 +3739,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 				break;
 			case TCPOPT_WINDOW:
 				if (opsize == TCPOLEN_WINDOW && th->syn &&
-				    !estab && sysctl_tcp_window_scaling) {
+				    !estab && sysctl_tcp_window_scaling &&
+				    !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)) {
 					__u8 snd_wscale = *(__u8 *)ptr;
 					opt_rx->wscale_ok = 1;
 					if (snd_wscale > 14) {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8f30c18..ff60a21 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -496,7 +496,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		opts->tsecr = tp->rx_opt.ts_recent;
 		size += TCPOLEN_TSTAMP_ALIGNED;
 	}
-	if (likely(sysctl_tcp_window_scaling)) {
+	if (likely(sysctl_tcp_window_scaling &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_WSCALE))) {
 		opts->ws = tp->rx_opt.rcv_wscale;
 		opts->options |= OPTION_WSCALE;
 		size += TCPOLEN_WSCALE_ALIGNED;
@@ -2347,7 +2348,8 @@ static void tcp_connect_init(struct sock *sk)
 				  tp->advmss - (tp->rx_opt.ts_recent_stamp ? tp->tcp_header_len - sizeof(struct tcphdr) : 0),
 				  &tp->rcv_wnd,
 				  &tp->window_clamp,
-				  sysctl_tcp_window_scaling,
+				  (sysctl_tcp_window_scaling &&
+				   !dst_feature(dst, RTAX_FEATURE_NO_WSCALE)),
 				  &rcv_wscale);
 
 	tp->rx_opt.rcv_wscale = rcv_wscale;
-- 
1.5.6.3


* [PATCH v2 4/8] Add the no SACK route option feature
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

Implement querying and acting upon the no sack bit in the features
field.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 include/linux/rtnetlink.h |    2 +-
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    4 +++-
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index adf2068..9c802a6 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -377,7 +377,7 @@ enum
 #define RTAX_MAX (__RTAX_MAX - 1)
 
 #define RTAX_FEATURE_ECN	0x00000001
-#define RTAX_FEATURE_SACK	0x00000002
+#define RTAX_FEATURE_NO_SACK	0x00000002
 #define RTAX_FEATURE_TIMESTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d502f49..b14f780 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3763,7 +3763,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 				break;
 			case TCPOPT_SACK_PERM:
 				if (opsize == TCPOLEN_SACK_PERM && th->syn &&
-				    !estab && sysctl_tcp_sack) {
+				    !estab && sysctl_tcp_sack &&
+				    !dst_feature(dst, RTAX_FEATURE_NO_SACK)) {
 					opt_rx->sack_ok = 1;
 					tcp_sack_reset(opt_rx);
 				}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index fcd278a..64db8dd 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -464,6 +464,7 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 				struct tcp_md5sig_key **md5) {
 	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned size = 0;
+	struct dst_entry *dst = __sk_dst_get(sk);
 
 #ifdef CONFIG_TCP_MD5SIG
 	*md5 = tp->af_specific->md5_lookup(sk, sk);
@@ -498,7 +499,8 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 		opts->options |= OPTION_WSCALE;
 		size += TCPOLEN_WSCALE_ALIGNED;
 	}
-	if (likely(sysctl_tcp_sack)) {
+	if (likely(sysctl_tcp_sack &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_SACK))) {
 		opts->options |= OPTION_SACK_ADVERTISE;
 		if (unlikely(!(OPTION_TS & opts->options)))
 			size += TCPOLEN_SACKPERM_ALIGNED;
-- 
1.5.6.3


* [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef, Yony Amit
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

A time wait socket is established - we already know if time stamp
option is called for or not.

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 net/ipv4/tcp_minisocks.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 624c3c9..c49a550 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -100,9 +100,8 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	struct tcp_options_received tmp_opt;
 	int paws_reject = 0;
 
-	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
-		tcp_parse_options(skb, &tmp_opt, 0);
+		tcp_parse_options(skb, &tmp_opt, 1);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent	= tcptw->tw_ts_recent;
-- 
1.5.6.3


* [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

We need tcp_parse_options to be aware of the dst_entry in order to
take per dst_entry TCP option settings into account.
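The new invariant can be illustrated with the time stamp check from patch 5. This is a sketch, not kernel code: `struct mock_dst` and `accept_timestamp()` are hypothetical, and assert() stands in for BUG_ON(). When estab is set, only the already negotiated tstamp_ok flag matters and dst may be NULL; when it is clear (SYN processing), the sysctl and the route are consulted, which is why dst must then be valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RTAX_FEATURE_NO_TSTAMP 0x00000004u

/* Hypothetical, trimmed-down stand-in for struct dst_entry. */
struct mock_dst {
	uint32_t features;
};

/*
 * Mirrors the TCPOPT_TIMESTAMP condition in tcp_parse_options() after
 * patches 2 and 5 of this series.
 */
static bool accept_timestamp(int estab, int tstamp_ok, int sysctl_tcp_timestamps,
			     const struct mock_dst *dst)
{
	assert(estab || dst != NULL);	/* user-space analogue of BUG_ON(!estab && !dst) */

	if (estab)
		return tstamp_ok != 0;	/* decided at handshake time */
	return sysctl_tcp_timestamps != 0 &&
	       (dst->features & RTAX_FEATURE_NO_TSTAMP) == 0;
}
```

Because the route is only dereferenced on the !estab branch, callers such as tcp_fast_parse_options() and the time wait path can keep passing NULL.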

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 include/net/tcp.h        |    3 ++-
 net/ipv4/syncookies.c    |   27 ++++++++++++++-------------
 net/ipv4/tcp_input.c     |    9 ++++++---
 net/ipv4/tcp_ipv4.c      |   19 ++++++++++---------
 net/ipv4/tcp_minisocks.c |    7 +++++--
 net/ipv6/syncookies.c    |   28 +++++++++++++++-------------
 net/ipv6/tcp_ipv6.c      |    3 ++-
 7 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 03a49c7..740d09b 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -409,7 +409,8 @@ extern int			tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
 
 extern void			tcp_parse_options(struct sk_buff *skb,
 						  struct tcp_options_received *opt_rx,
-						  int estab);
+						  int estab,
+						  struct dst_entry *dst);
 
 extern u8			*tcp_parse_md5sig_option(struct tcphdr *th);
 
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..4990dd4 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
 
-	/* check for timestamp cookie support */
-	memset(&tcp_opt, 0, sizeof(tcp_opt));
-	tcp_parse_options(skb, &tcp_opt, 0);
-
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
-
 	ret = NULL;
 	req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
 	if (!req)
@@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 	ireq->loc_addr		= ip_hdr(skb)->daddr;
 	ireq->rmt_addr		= ip_hdr(skb)->saddr;
 	ireq->ecn_ok		= 0;
-	ireq->snd_wscale	= tcp_opt.snd_wscale;
-	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
-	ireq->sack_ok		= tcp_opt.sack_ok;
-	ireq->wscale_ok		= tcp_opt.wscale_ok;
-	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
-	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
 
 	/* We throwed the options of the initial SYN away, so we hope
 	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
@@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+	/* check for timestamp cookie support */
+	memset(&tcp_opt, 0, sizeof(tcp_opt));
+	tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst);
+
+	if (tcp_opt.saw_tstamp)
+		cookie_check_timestamp(&tcp_opt);
+
+	ireq->snd_wscale        = tcp_opt.snd_wscale;
+	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
+	ireq->sack_ok           = tcp_opt.sack_ok;
+	ireq->wscale_ok         = tcp_opt.wscale_ok;
+	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
+	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
 	/* Try to redo what tcp_v4_send_synack did. */
 	req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW);
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d86784b..d502f49 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3698,12 +3698,14 @@ old_ack:
  * the fast version below fails.
  */
 void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
-		       int estab)
+		       int estab,  struct dst_entry *dst)
 {
 	unsigned char *ptr;
 	struct tcphdr *th = tcp_hdr(skb);
 	int length = (th->doff * 4) - sizeof(struct tcphdr);
 
+	BUG_ON(!estab && !dst);
+
 	ptr = (unsigned char *)(th + 1);
 	opt_rx->saw_tstamp = 0;
 
@@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th,
 		if (tcp_parse_aligned_timestamp(tp, th))
 			return 1;
 	}
-	tcp_parse_options(skb, &tp->rx_opt, 1);
+	tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
 	return 1;
 }
 
@@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	int saved_clamp = tp->rx_opt.mss_clamp;
+	struct dst_entry *dst = __sk_dst_get(sk);
 
-	tcp_parse_options(skb, &tp->rx_opt, 0);
+	tcp_parse_options(skb, &tp->rx_opt, 0, dst);
 
 	if (th->ack) {
 		/* rfc793:
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7cda24b..1cb0ec4 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
 #endif
 
+	ireq = inet_rsk(req);
+	ireq->loc_addr = daddr;
+	ireq->rmt_addr = saddr;
+	ireq->no_srccheck = inet_sk(sk)->transparent;
+	ireq->opt = tcp_v4_save_options(sk, skb);
+
+	dst = inet_csk_route_req(sk, req);
 	tcp_clear_options(&tmp_opt);
 	tmp_opt.mss_clamp = 536;
 	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
 
-	tcp_parse_options(skb, &tmp_opt, 0);
+	tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 	if (want_cookie && !tmp_opt.saw_tstamp)
 		tcp_clear_options(&tmp_opt);
@@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 
 	tcp_openreq_init(req, &tmp_opt, skb);
 
-	ireq = inet_rsk(req);
-	ireq->loc_addr = daddr;
-	ireq->rmt_addr = saddr;
-	ireq->no_srccheck = inet_sk(sk)->transparent;
-	ireq->opt = tcp_v4_save_options(sk, skb);
-
 	if (security_inet_conn_request(sk, skb, req))
-		goto drop_and_free;
+		goto drop_and_release;
 
 	if (!want_cookie)
 		TCP_ECN_create_request(req, tcp_hdr(skb));
@@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
 		 */
 		if (tmp_opt.saw_tstamp &&
 		    tcp_death_row.sysctl_tw_recycle &&
-		    (dst = inet_csk_route_req(sk, req)) != NULL &&
+		    dst != NULL &&
 		    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
 		    peer->v4daddr == saddr) {
 			if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index c49a550..70ff955 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -101,7 +101,7 @@ tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
 	int paws_reject = 0;
 
 	if (th->doff > (sizeof(*th) >> 2) && tcptw->tw_ts_recent_stamp) {
-		tcp_parse_options(skb, &tmp_opt, 1);
+		tcp_parse_options(skb, &tmp_opt, 1, NULL);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent	= tcptw->tw_ts_recent;
@@ -499,10 +499,11 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 	int paws_reject = 0;
 	struct tcp_options_received tmp_opt;
 	struct sock *child;
+	struct dst_entry *dst = inet_csk_route_req(sk, req);
 
 	tmp_opt.saw_tstamp = 0;
 	if (th->doff > (sizeof(struct tcphdr)>>2)) {
-		tcp_parse_options(skb, &tmp_opt, 0);
+		tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 		if (tmp_opt.saw_tstamp) {
 			tmp_opt.ts_recent = req->ts_recent;
@@ -515,6 +516,8 @@ struct sock *tcp_check_req(struct sock *sk, struct sk_buff *skb,
 		}
 	}
 
+	dst_release(dst);
+
 	/* Check for pure retransmitted SYN. */
 	if (TCP_SKB_CB(skb)->seq == tcp_rsk(req)->rcv_isn &&
 	    flg == TCP_FLAG_SYN &&
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index 6b6ae91..6ece408 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -184,13 +184,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 
 	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
 
-	/* check for timestamp cookie support */
-	memset(&tcp_opt, 0, sizeof(tcp_opt));
-	tcp_parse_options(skb, &tcp_opt, 0);
-
-	if (tcp_opt.saw_tstamp)
-		cookie_check_timestamp(&tcp_opt);
-
 	ret = NULL;
 	req = inet6_reqsk_alloc(&tcp6_request_sock_ops);
 	if (!req)
@@ -224,12 +217,6 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 	req->expires = 0UL;
 	req->retrans = 0;
 	ireq->ecn_ok		= 0;
-	ireq->snd_wscale	= tcp_opt.snd_wscale;
-	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
-	ireq->sack_ok		= tcp_opt.sack_ok;
-	ireq->wscale_ok		= tcp_opt.wscale_ok;
-	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
-	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
 	treq->rcv_isn = ntohl(th->seq) - 1;
 	treq->snt_isn = cookie;
 
@@ -264,6 +251,21 @@ struct sock *cookie_v6_check(struct sock *sk, struct sk_buff *skb)
 			goto out_free;
 	}
 
+	/* check for timestamp cookie support */
+	memset(&tcp_opt, 0, sizeof(tcp_opt));
+	tcp_parse_options(skb, &tcp_opt, 0, dst);
+
+	if (tcp_opt.saw_tstamp)
+		cookie_check_timestamp(&tcp_opt);
+
+	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
+
+	ireq->snd_wscale        = tcp_opt.snd_wscale;
+	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
+	ireq->sack_ok           = tcp_opt.sack_ok;
+	ireq->wscale_ok         = tcp_opt.wscale_ok;
+	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
+
 	req->window_clamp = tp->window_clamp ? :dst_metric(dst, RTAX_WINDOW);
 	tcp_select_initial_window(tcp_full_space(sk), req->mss,
 				  &req->rcv_wnd, &req->window_clamp,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 21d100b..2eebab5 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1165,6 +1165,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct request_sock *req = NULL;
 	__u32 isn = TCP_SKB_CB(skb)->when;
+	struct dst_entry *dst = __sk_dst_get(sk);
 #ifdef CONFIG_SYN_COOKIES
 	int want_cookie = 0;
 #else
@@ -1203,7 +1204,7 @@ static int tcp_v6_conn_request(struct sock *sk, struct sk_buff *skb)
 	tmp_opt.mss_clamp = IPV6_MIN_MTU - sizeof(struct tcphdr) - sizeof(struct ipv6hdr);
 	tmp_opt.user_mss = tp->rx_opt.user_mss;
 
-	tcp_parse_options(skb, &tmp_opt, 0);
+	tcp_parse_options(skb, &tmp_opt, 0, dst);
 
 	if (want_cookie && !tmp_opt.saw_tstamp)
 		tcp_clear_options(&tmp_opt);
-- 
1.5.6.3


* [PATCH v2 7/8] Allow disabling of DSACK TCP option per route
From: Gilad Ben-Yossef @ 2009-10-21  8:57 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

Add and use the no DSACK bit in the features field.
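A small sketch of the condition this patch adds to tcp_dsack_set() and tcp_send_dupack(). The function and parameter names are illustrative, not kernel API: a D-SACK block is reported only if SACK was negotiated on the connection (tcp_is_sack()), the global sysctl_tcp_dsack is on, and the route does not carry RTAX_FEATURE_NO_DSACK.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTAX_FEATURE_NO_DSACK 0x00000020u

/* Hypothetical helper mirroring the three-way check in patch 7. */
static bool should_send_dsack(bool sack_negotiated, int sysctl_tcp_dsack,
			      uint32_t route_features)
{
	return sack_negotiated && sysctl_tcp_dsack != 0 &&
	       (route_features & RTAX_FEATURE_NO_DSACK) == 0;
}
```

Since D-SACK is carried in SACK blocks, a connection set up over a route with RTAX_FEATURE_NO_SACK never negotiates SACK, so D-SACK is already off for it regardless of the NO_DSACK bit.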

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 include/linux/rtnetlink.h |    1 +
 net/ipv4/tcp_input.c      |    8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 6784b34..e78b60c 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -381,6 +381,7 @@ enum
 #define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 #define RTAX_FEATURE_NO_WSCALE	0x00000010
+#define RTAX_FEATURE_NO_DSACK	0x00000020
 
 struct rta_session
 {
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4f5e914..4262da5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4080,8 +4080,10 @@ static inline int tcp_sack_extend(struct tcp_sack_block *sp, u32 seq,
 static void tcp_dsack_set(struct sock *sk, u32 seq, u32 end_seq)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 
-	if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+	if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+	    !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
 		int mib_idx;
 
 		if (before(seq, tp->rcv_nxt))
@@ -4110,13 +4112,15 @@ static void tcp_dsack_extend(struct sock *sk, u32 seq, u32 end_seq)
 static void tcp_send_dupack(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	struct dst_entry *dst = __sk_dst_get(sk);
 
 	if (TCP_SKB_CB(skb)->end_seq != TCP_SKB_CB(skb)->seq &&
 	    before(TCP_SKB_CB(skb)->seq, tp->rcv_nxt)) {
 		NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_DELAYEDACKLOST);
 		tcp_enter_quickack_mode(sk);
 
-		if (tcp_is_sack(tp) && sysctl_tcp_dsack) {
+		if (tcp_is_sack(tp) && sysctl_tcp_dsack &&
+		    !dst_feature(dst, RTAX_FEATURE_NO_DSACK)) {
 			u32 end_seq = TCP_SKB_CB(skb)->end_seq;
 
 			if (after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))
-- 
1.5.6.3


* [PATCH v2 5/8] Allow disabling TCP timestamp options per route
From: Gilad Ben-Yossef @ 2009-10-21  8:56 UTC (permalink / raw)
  To: netdev; +Cc: ori, Gilad Ben-Yossef
In-Reply-To: <1256115421-12714-1-git-send-email-gilad@codefidence.com>

Implement querying and acting upon the no timestamp bit in the features
field.
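A sketch of the tcp_header_len computation done in tcp_connect_init(), useful for checking operator precedence: the whole condition must be parenthesised before the `?:`, because `a && (b ? 12 : 0)` evaluates to 0 or 1 rather than 0 or 12. The constants are the kernel's sizeof(struct tcphdr) (20 bytes) and TCPOLEN_TSTAMP_ALIGNED (12 bytes); the function name is hypothetical.

```c
#include <stdint.h>

#define TCPHDR_LEN              20	/* sizeof(struct tcphdr) */
#define TCPOLEN_TSTAMP_ALIGNED  12	/* NOP, NOP, kind 8, len 10, tsval, tsecr */
#define RTAX_FEATURE_NO_TSTAMP  0x00000004u

/* Hypothetical helper: expected header length of data segments. */
static unsigned int expected_tcp_header_len(int sysctl_tcp_timestamps,
					    uint32_t route_features)
{
	int use_ts = sysctl_tcp_timestamps &&
		     !(route_features & RTAX_FEATURE_NO_TSTAMP);

	/* (a && b) ? 12 : 0, never a && (b ? 12 : 0) */
	return TCPHDR_LEN + (use_ts ? TCPOLEN_TSTAMP_ALIGNED : 0);
}
```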

Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>

---
 include/linux/rtnetlink.h |    2 +-
 net/ipv4/tcp_input.c      |    3 ++-
 net/ipv4/tcp_output.c     |    8 ++++++--
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
index 9c802a6..2ab8c75 100644
--- a/include/linux/rtnetlink.h
+++ b/include/linux/rtnetlink.h
@@ -378,7 +378,7 @@ enum
 
 #define RTAX_FEATURE_ECN	0x00000001
 #define RTAX_FEATURE_NO_SACK	0x00000002
-#define RTAX_FEATURE_TIMESTAMP	0x00000004
+#define RTAX_FEATURE_NO_TSTAMP	0x00000004
 #define RTAX_FEATURE_ALLFRAG	0x00000008
 
 struct rta_session
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b14f780..d2f9742 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3755,7 +3755,8 @@ void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
 			case TCPOPT_TIMESTAMP:
 				if ((opsize == TCPOLEN_TIMESTAMP) &&
 				    ((estab && opt_rx->tstamp_ok) ||
-				     (!estab && sysctl_tcp_timestamps))) {
+				     (!estab && sysctl_tcp_timestamps &&
+				      !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP)))) {
 					opt_rx->saw_tstamp = 1;
 					opt_rx->rcv_tsval = get_unaligned_be32(ptr);
 					opt_rx->rcv_tsecr = get_unaligned_be32(ptr + 4);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 64db8dd..8f30c18 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -488,7 +488,9 @@ static unsigned tcp_syn_options(struct sock *sk, struct sk_buff *skb,
 	opts->mss = tcp_advertise_mss(sk);
 	size += TCPOLEN_MSS_ALIGNED;
 
-	if (likely(sysctl_tcp_timestamps && *md5 == NULL)) {
+	if (likely(sysctl_tcp_timestamps &&
+		   !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP) &&
+		   *md5 == NULL)) {
 		opts->options |= OPTION_TS;
 		opts->tsval = TCP_SKB_CB(skb)->when;
 		opts->tsecr = tp->rx_opt.ts_recent;
@@ -2317,7 +2319,9 @@ static void tcp_connect_init(struct sock *sk)
 	 * See tcp_input.c:tcp_rcv_state_process case TCP_SYN_SENT.
 	 */
 	tp->tcp_header_len = sizeof(struct tcphdr) +
-		(sysctl_tcp_timestamps ? TCPOLEN_TSTAMP_ALIGNED : 0);
+		((sysctl_tcp_timestamps &&
+		  !dst_feature(dst, RTAX_FEATURE_NO_TSTAMP)) ?
+		 TCPOLEN_TSTAMP_ALIGNED : 0);
 
 #ifdef CONFIG_TCP_MD5SIG
 	if (tp->af_specific->md5_lookup(sk, sk) != NULL)
-- 
1.5.6.3


* Re: Enable syn cookies by default
From: William Allen Simpson @ 2009-10-21  9:16 UTC (permalink / raw)
  To: netdev
In-Reply-To: <b2cc26e40910210048y43bdb604pcd356376a93c41e@mail.gmail.com>

Olaf van der Spek wrote:
> On Wed, Oct 21, 2009 at 9:25 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> This is a user selectable setting. What's wrong with /etc/sysctl.conf ?
> 
> It requires user action...
> Often you notice cookies are disabled only after a service becomes unreachable.
> What's wrong with improving defaults?

I've not been a regular contributor here, so I'm not sure that my view has
much weight, but I'm *against* changing the coded default.

Keep in mind that I'm busy trying to replace syncookies with real cookies,
so I'm biased. The syncookies interfere with new options, although in
Linux they interfere less than on other systems.

For Ubuntu, the practice is complicated.  In /etc/sysctl.conf, the text
assumes that the default is off:

# Uncomment the next line to enable TCP/IP SYN cookies
# This disables TCP Window Scaling (http://lkml.org/lkml/2008/2/5/167),
# and is not recommended.
#net.ipv4.tcp_syncookies=1

But in the default installed /etc/sysctl.d/10-network-security.conf, it
is explicitly on in any case:

# Turn on SYN-flood protections.  Starting with 2.6.26, there is no loss
# of TCP functionality/features under normal conditions.  When flood
# protections kick in under high unanswered-SYN load, the system
# should remain more stable, with a trade off of some loss of TCP
# functionality/features (e.g. TCP Window scaling).
net.ipv4.tcp_syncookies=1

As Ubuntu is Debian-based, perhaps Debian can back-port the Ubuntu changes?

> Don't forget the missing log entries.
> 
On this I agree. I'd like the system to syslog that it's under attack,
especially whenever syncookies are off.

* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
From: William Allen Simpson @ 2009-10-21  9:40 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: netdev, ori
In-Reply-To: <1256115421-12714-9-git-send-email-gilad@codefidence.com>

Gilad Ben-Yossef wrote:
> +What:	sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling,
> +	sysctl_tcp_dsack

Opposed to removing, as this is a common configuration for *BSD.  It helps
operators to have similar utility across systems.

net.inet.tcp.sack.enable

net.inet.tcp.timestamps

net.inet.tcp.win_scale

Support removing sysctl_tcp_dsack, as this appears to be Linux-only.

* Re: [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
From: William Allen Simpson @ 2009-10-21  9:49 UTC (permalink / raw)
  To: netdev
In-Reply-To: <1256115421-12714-2-git-send-email-gilad@codefidence.com>

Gilad Ben-Yossef wrote:
> A time wait socket is established - we already know if time stamp
> option is called for or not.
> 
Not so sure about this.  A timewait sock isn't actually established,
and new/changed options could appear.  There's all sorts of edge cases.

There's also some current work to note:

  http://tools.ietf.org/html/draft-ietf-tcpm-1323bis

  http://tools.ietf.org/html/draft-gont-tcpm-tcp-timestamps

^ permalink raw reply

* Re: [PATCH v2 1/8] Only parse time stamp TCP option in time wait sock
From: Gilad Ben-Yossef @ 2009-10-21 10:07 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev
In-Reply-To: <4ADED915.7000107@gmail.com>

William Allen Simpson wrote:

> Gilad Ben-Yossef wrote:
>> A time wait socket is established - we already know if time stamp
>> option is called for or not.
>>
> Not so sure about this.  A timewait sock isn't actually established,
> and new/changed options could appear.  There's all sorts of edge cases.

If you examine the specific context where tcp_parse_options is being
called here, the only TCP option of interest is the time stamp option,
and this code path is only taken when we already know that the original
socket used the time stamp option.

So while I agree that you are right in general, I do believe that in the
specific context of this patch we should call tcp_parse_options with the
established flag on, letting it know that we expect to see a time stamp
option, which is what I was referring to.

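To make the point concrete: once the option set is already negotiated, the only thing this check needs from the header is the timestamp option. Per RFC 1323 that option is kind 8, length 10, carrying a 32-bit TSval followed by a 32-bit TSecr. The sketch below is a simplified, hypothetical scanner, not the kernel's tcp_parse_options(). It skips every other option and stops on malformed input:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCPOPT_EOL        0
#define TCPOPT_NOP        1
#define TCPOPT_TIMESTAMP  8
#define TCPOLEN_TIMESTAMP 10

/* Read a 32-bit big-endian (network order) value. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Scan the option bytes following a TCP header and extract only the
 * timestamp option. Returns 1 if found, 0 otherwise.
 */
int parse_tcp_timestamp(const uint8_t *opts, size_t len,
			uint32_t *tsval, uint32_t *tsecr)
{
	size_t i = 0;

	assert(opts != NULL || len == 0);
	while (i < len) {
		uint8_t kind = opts[i];
		uint8_t olen;

		if (kind == TCPOPT_EOL)
			return 0;
		if (kind == TCPOPT_NOP) {       /* single-byte padding */
			i++;
			continue;
		}
		if (i + 1 >= len)
			return 0;               /* truncated: no length byte */
		olen = opts[i + 1];
		if (olen < 2 || i + olen > len)
			return 0;               /* malformed or past the header */
		if (kind == TCPOPT_TIMESTAMP && olen == TCPOLEN_TIMESTAMP) {
			*tsval = get_be32(&opts[i + 2]);
			*tsecr = get_be32(&opts[i + 6]);
			return 1;
		}
		i += olen;                      /* skip options we ignore */
	}
	return 0;
}
```

A common layout is NOP, NOP, timestamp. That is the aligned form the kernel's tcp_parse_aligned_timestamp() fast path checks for before falling back to the full parser.
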
>
> There's also some current work to note:
>
>  http://tools.ietf.org/html/draft-ietf-tcpm-1323bis
>
>  http://tools.ietf.org/html/draft-gont-tcpm-tcp-timestamps

Very interesting, thank you.

As I noted above, my comment about TIME WAIT sockets being "established"
should really only be considered in the context of the specific call to
tcp_parse_options() and the "established" parameter of that function.

Thanks,
Gilad

^ permalink raw reply

* Re: Enable syn cookies by default
From: Olaf van der Spek @ 2009-10-21 10:10 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev
In-Reply-To: <4ADED186.3040300@gmail.com>

On Wed, Oct 21, 2009 at 11:16 AM, William Allen Simpson
<william.allen.simpson@gmail.com> wrote:
> Keep in mind that I'm busy trying to replace syncookies with real cookies,
> so I'm biased.  The syncookies interfere with new options; although in
> Linux, they interfere less than other systems.

How and when do they interfere?
If syn cookies are enabled and the queue isn't full, they're not used
so they don't interfere.
If the queue is full, they do interfere, but the alternative would be
no connection at all.
So I really don't see the disadvantage of enabling cookies by default.

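Olaf's argument can be condensed into a toy decision function. This is a simplified model with hypothetical names, not the kernel's code, which checks other limits as well. It only shows that enabling cookies affects nothing but the queue-full branch, where a SYN that would otherwise be dropped gets a cookie instead:

```c
#include <assert.h>
#include <stdbool.h>

enum syn_action {
	SYN_QUEUE,   /* normal path: keep state in the SYN queue */
	SYN_COOKIE,  /* queue full: answer statelessly with a cookie */
	SYN_DROP,    /* queue full and cookies disabled: no connection */
};

/* Simplified model: only queue occupancy and the sysctl are considered. */
enum syn_action handle_syn(unsigned int queue_len, unsigned int queue_max,
			   bool syncookies_enabled)
{
	assert(queue_max > 0);
	if (queue_len < queue_max)
		return SYN_QUEUE;
	return syncookies_enabled ? SYN_COOKIE : SYN_DROP;
}
```

The trade-off William raises lives entirely in that branch. A connection set up via a cookie may lose options such as window scaling, as the Ubuntu comment quoted earlier notes.
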
> As Ubuntu is debian based, perhaps they can back-port the Ubuntu changes?

Actually changing the value isn't the problem; the problem is that the Debian
maintainer isn't sure it's a good idea (though he doesn't know why).

Olaf

^ permalink raw reply

* Re: [PATCH v2 8/8] Document future removal of sysctl_tcp_* options
From: Gilad Ben-Yossef @ 2009-10-21 10:23 UTC (permalink / raw)
  To: William Allen Simpson; +Cc: netdev, ori
In-Reply-To: <4ADED6FA.2030502@gmail.com>

William Allen Simpson wrote:

> Gilad Ben-Yossef wrote:
>> +What:    sysctl_tcp_sack, sysctl_tcp_timestamps, sysctl_tcp_window_scaling,
>> +    sysctl_tcp_dsack
>
> Opposed to removing, as this is a common configuration for *BSD.  It helps
> operators to have similar utility across systems.
>
> net.inet.tcp.sack.enable
>
> net.inet.tcp.timestamps
>
> net.inet.tcp.win_scale
>
> Support removing sysctl_tcp_dsack, as this appears to be Linux-only.

I have no issue with leaving those, if everyone thinks we're better off.

BTW, while we're talking about OS envy, I do believe that Windows does let
you specify this on a per-route basis. Not that this is really good grounds
for a technical decision, but still... :-)

Gilad

^ permalink raw reply

* [PATCH] e1000: the power down when running ifdown command
From: Naohiro Ooiwa @ 2009-10-21  9:52 UTC (permalink / raw)
  To: netdev@vger.kernel.org, e1000-devel, svaidy, linux-kernel-owner

Hi all

I'm trying to modify the e1000 driver to save power.

The e1000 driver doesn't power down the device when the ifdown command is run.
So I put the PCI device into the D3hot state at the end of e1000_close().

With this modification, running ifdown on an e1000 interface reduces power
consumption to the same level as in the link-down case.

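The ordering the patch relies on can be modelled with a toy device: save state and disable on close, then power up, restore and re-enable on open. Everything here is hypothetical. The struct and functions only mimic the order of the pci_* calls in the patch. The toy also assumes, pessimistically, that leaving D3hot wipes the config word; on real hardware that depends on the device:

```c
#include <assert.h>
#include <stdint.h>

enum pci_power { PWR_D0, PWR_D3HOT };

/* Toy device: a power state, a "config space" word and a saved copy. */
struct toy_pci_dev {
	enum pci_power state;
	uint32_t config;        /* stands in for PCI config space */
	uint32_t saved_config;
	int enabled;
};

/* Toy assumption: the D3hot -> D0 transition resets config contents. */
static void toy_set_power(struct toy_pci_dev *d, enum pci_power s)
{
	if (d->state == PWR_D3HOT && s == PWR_D0)
		d->config = 0;
	d->state = s;
}

/* Same order as the patch's additions to e1000_close(). */
void toy_close(struct toy_pci_dev *d)
{
	d->saved_config = d->config;   /* like pci_save_state() */
	d->enabled = 0;                /* like pci_disable_device() */
	toy_set_power(d, PWR_D3HOT);   /* like pci_set_power_state(PCI_D3hot) */
}

/* Same order as the patch's additions to e1000_open(). */
void toy_open(struct toy_pci_dev *d)
{
	toy_set_power(d, PWR_D0);      /* like pci_set_power_state(PCI_D0) */
	d->config = d->saved_config;   /* like pci_restore_state() */
	d->enabled = 1;                /* like pci_enable_device*() */
	assert(d->state == PWR_D0);
}
```

Restoring before re-enabling is what lets the toy device come back with its configuration intact after an ifdown/ifup cycle.
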
I presented this at the Collaboration Summit 2009.
For details and the results of the power measurements,
please refer to pages 36-38 of the following document:
http://events.linuxfoundation.org/archive/lfcs09_ooiwa.pdf

Could you please review my patch?

Should I take WOL (Wake-on-LAN) into account?
Is the E1000_PCI_POWER_SAVE define needed at all?


Hi Vaidy

I was glad that you talked to me at the Collaboration Summit.
I'm sorry for sending the patch so late.

I found a bug in the patch I used at the Collaboration Summit:
auto-negotiation sometimes failed after repeated ifup and ifdown.

I fixed it. I'd appreciate it if you could test it.


Thank you.
Naohiro Ooiwa

Signed-off-by: Naohiro Ooiwa <nooiwa@miraclelinux.com>
---
 drivers/net/e1000/e1000_main.c |   32 ++++++++++++++++++++++++++++++++
 1 files changed, 32 insertions(+), 0 deletions(-)

diff --git a/drivers/net/e1000/e1000_main.c b/drivers/net/e1000/e1000_main.c
index bcd192c..12e1a42 100644
--- a/drivers/net/e1000/e1000_main.c
+++ b/drivers/net/e1000/e1000_main.c
@@ -26,6 +26,11 @@

 *******************************************************************************/

+/*
+ * define this if you want pci save power while ifdown.
+ */
+#define E1000_PCI_POWER_SAVE
+
 #include "e1000.h"
 #include <net/ip6_checksum.h>

@@ -1248,6 +1253,23 @@ static int e1000_open(struct net_device *netdev)
 	struct e1000_hw *hw = &adapter->hw;
 	int err;

+#ifdef E1000_PCI_POWER_SAVE
+	struct pci_dev *pdev = adapter->pdev;
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+
+	if (adapter->need_ioport)
+		err = pci_enable_device(pdev);
+	else
+		err = pci_enable_device_mem(pdev);
+	if (err) {
+		printk(KERN_ERR "e1000: Cannot enable PCI device from power-save\n");
+		return err;
+	}
+	pci_set_master(pdev);
+#endif
+
 	/* disallow open during test */
 	if (test_bit(__E1000_TESTING, &adapter->flags))
 		return -EBUSY;
@@ -1265,6 +1287,7 @@ static int e1000_open(struct net_device *netdev)
 		goto err_setup_rx;

 	e1000_power_up_phy(adapter);
+	e1000_reset(adapter);

 	adapter->mng_vlan_id = E1000_MNG_VLAN_NONE;
 	if ((hw->mng_cookie.status &
@@ -1341,6 +1364,15 @@ static int e1000_close(struct net_device *netdev)
 		e1000_vlan_rx_kill_vid(netdev, adapter->mng_vlan_id);
 	}

+#ifdef E1000_PCI_POWER_SAVE
+#ifdef CONFIG_PM
+	pci_save_state(adapter->pdev);
+#endif
+	pci_disable_device(adapter->pdev);
+	pci_wake_from_d3(adapter->pdev, true);
+	pci_set_power_state(adapter->pdev, PCI_D3hot);
+#endif
+
 	return 0;
 }

-- 
1.5.4.1


^ permalink raw reply related

* Re: [patch 0/3] KS8851 updates for -rc5
From: Ben Dooks @ 2009-10-21 10:44 UTC (permalink / raw)
  To: David Miller; +Cc: ben, Ping.Doong, netdev, Charles.Li
In-Reply-To: <20091020.191204.183462320.davem@davemloft.net>

On Tue, Oct 20, 2009 at 07:12:04PM -0700, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Tue, 20 Oct 2009 19:11:42 -0700 (PDT)

> > All applied to net-next-2.6, thanks.
> 
> Sorry, I meant 'net-2.6'

thanks.

^ permalink raw reply

* Re: [PATCH/RFC] make unregister_netdev() delete more than 4 interfaces per second
From: Octavian Purdila @ 2009-10-21 12:39 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Eric Dumazet, netdev, Cosmin Ratiu
In-Reply-To: <20091018182144.GC23395@kvack.org>

On Sunday 18 October 2009 21:21:44 you wrote:
> > The msleep(250) should be tuned first. Then if this is really necessary
> > to dismantle 100.000 netdevices per second, we might have to think a bit
> > more. 
> > Just try msleep(1 or 2), it should work quite well.
> 
> My goal is tearing down 100,000 interfaces in a few seconds, which really is
> necessary.  Right now we're running about 40,000 interfaces on a not yet
> saturated 10Gbps link.  Going to dual 10Gbps links means pushing more than
> 100,000 subscriber interfaces, and it looks like a modern dual socket
> system can handle that.
> 

I would also like to see this patch in, we are running into scalability issues 
with creating/deleting lots of interfaces as well.

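Here is the rough arithmetic behind the subject line, assuming one fixed msleep() per unregistered interface and ignoring all other work. msleep(250) gives 1000 / 250 = 4 interfaces per second, so 100,000 interfaces would take 25,000 s (close to seven hours). msleep(1) would bring that down to about 100 s. As hypothetical C helpers:

```c
#include <assert.h>

/* Interfaces removed per second if each one costs one sleep of sleep_ms. */
double ifaces_per_second(unsigned int sleep_ms)
{
	assert(sleep_ms > 0);
	return 1000.0 / sleep_ms;
}

/* Seconds to tear down n_ifaces under the same one-sleep-each assumption. */
double teardown_seconds(unsigned long n_ifaces, unsigned int sleep_ms)
{
	return (double)n_ifaces * sleep_ms / 1000.0;
}
```
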
Thanks,
tavi

^ permalink raw reply

* Re: [PATCH net-next V2 1/3] iwmc3200top: Add Intel Wireless MultiCom  3200 top driver.
From: Marcel Holtmann @ 2009-10-21 12:52 UTC (permalink / raw)
  To: Tomas Winkler
  Cc: David Miller, linville, netdev, linux-wireless, linux-mmc, yi.zhu,
	inaky.perez-gonzalez, cindy.h.kao, guy.cohen, ron.rindjunsky
In-Reply-To: <1ba2fa240910200453k4638dc6cm8c8911353ea85b60@mail.gmail.com>

Hi Tomas,

> >> This patch adds Intel Wireless MultiCom 3200 top driver.
> >> IWMC3200 is 4Wireless Com CHIP (GPS/BT/WiFi/WiMAX).
> >> Top driver is responsible for device initialization and firmware download.
> >> Firmware handled by top is responsible for top itself and
> >> as well as bluetooth and GPS coms. (Wifi and WiMax provide their own firmware)
> >> In addition top driver is used to retrieve firmware logs
> >> and supports other debugging features
> >>
> >> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
> >
> > Applied to net-next-2.6
> 
> Thanks Dave
> 
> Marcel
> I want to send out now the BT driver, would like the patch against
> bluetooth-next-2.6.git, then I wait till you sync or can you also pick
> it from net-next if Dave is OK with that?

Send it against net-next; since you are working in drivers/bluetooth/,
we will not have any conflicts.

Regards

Marcel



^ permalink raw reply

* [PATCH] bonding: fix a race condition in calls to slave MII ioctls
From: Jiri Bohac @ 2009-10-21 13:03 UTC (permalink / raw)
  To: fubar; +Cc: netdev

Hi,

In mii monitor mode, bond_check_dev_link() calls the ioctl
handler of slave devices. It stores the ndo_do_ioctl function
pointer in a static (!) ioctl variable and later uses it to call the
handler with the IOCTL macro.

If another thread executes bond_check_dev_link() at the same time
(even with a different bond, which none of the locks prevent), a
race condition occurs. If the two racing slaves have different
drivers, this may result in one driver's ioctl handler being
called with a pointer to a net_device controlled by a different
driver, resulting in unpredictable breakage.

Unless I am overlooking something, the "static" must be a
copy'n'paste error (?).

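The race can be reproduced deterministically in userspace by simulating the second thread with a callback that runs between storing and using the pointer. This is a toy model with hypothetical types and functions (`toy_dev`, `check_link_static`, `check_link_auto`), not the bonding code. With the `static` variable, the second call overwrites the shared pointer, so driver B's handler is called with driver A's device. With an automatic variable, each call keeps its own copy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for net_device and ndo_do_ioctl (hypothetical types). */
struct toy_dev;
typedef int (*toy_ioctl_fn)(struct toy_dev *dev);

struct toy_dev {
	const char *driver;     /* driver that owns this device */
	toy_ioctl_fn do_ioctl;  /* that driver's ioctl handler */
};

/* Each handler succeeds (returns 0) only when given one of its own devices. */
static int drv_a_ioctl(struct toy_dev *dev) { return strcmp(dev->driver, "a") ? -1 : 0; }
static int drv_b_ioctl(struct toy_dev *dev) { return strcmp(dev->driver, "b") ? -1 : 0; }

static struct toy_dev dev_a = { "a", drv_a_ioctl };
static struct toy_dev dev_b = { "b", drv_b_ioctl };

/* Simulated "other thread", run once between storing and using the pointer. */
static void (*preempt)(void);

static void run_preempt(void)
{
	void (*fn)(void) = preempt;

	preempt = NULL;         /* fire at most once */
	if (fn)
		fn();
}

/* Buggy pattern: one pointer shared by every caller. */
int check_link_static(struct toy_dev *dev)
{
	static toy_ioctl_fn ioctl;

	assert(dev != NULL);
	ioctl = dev->do_ioctl;
	run_preempt();
	return ioctl(dev);
}

/* Fixed pattern: each call has its own copy on the stack. */
int check_link_auto(struct toy_dev *dev)
{
	toy_ioctl_fn ioctl;

	assert(dev != NULL);
	ioctl = dev->do_ioctl;
	run_preempt();
	return ioctl(dev);
}

static void other_thread_static(void) { (void)check_link_static(&dev_b); }
static void other_thread_auto(void) { (void)check_link_auto(&dev_b); }

/* Check dev_a while a check of dev_b interleaves; 0 means the right handler ran. */
int demo_static(void) { preempt = other_thread_static; return check_link_static(&dev_a); }
int demo_auto(void) { preempt = other_thread_auto; return check_link_auto(&dev_a); }
```

Here `demo_static()` returns -1 because driver B's handler rejects dev_a. `demo_auto()` returns 0.
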
Signed-off-by: Jiri Bohac <jbohac@suse.cz>

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 69c5b15..5bfdd0c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -691,7 +691,7 @@ static int bond_check_dev_link(struct bonding *bond,
 			       struct net_device *slave_dev, int reporting)
 {
 	const struct net_device_ops *slave_ops = slave_dev->netdev_ops;
-	static int (*ioctl)(struct net_device *, struct ifreq *, int);
+	int (*ioctl)(struct net_device *, struct ifreq *, int);
 	struct ifreq ifr;
 	struct mii_ioctl_data *mii;

-- 
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, SUSE CZ

^ permalink raw reply related

* Re: [PATCH v2 2/8] Allow tcp_parse_options to consult dst entry
From: Ilpo Järvinen @ 2009-10-21 13:03 UTC (permalink / raw)
  To: Gilad Ben-Yossef; +Cc: Netdev, ori
In-Reply-To: <1256115421-12714-3-git-send-email-gilad@codefidence.com>

On Wed, 21 Oct 2009, Gilad Ben-Yossef wrote:

> We need tcp_parse_options to be aware of dst_entry to 
> take into account per dst_entry TCP options settings
> 
> Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
> Signed-off-by: Ori Finkelman <ori@comsleep.com>
> Signed-off-by: Yony Amit <yony@comsleep.com>
> 
> ---
>  include/net/tcp.h        |    3 ++-
>  net/ipv4/syncookies.c    |   27 ++++++++++++++-------------
>  net/ipv4/tcp_input.c     |    9 ++++++---
>  net/ipv4/tcp_ipv4.c      |   19 ++++++++++---------
>  net/ipv4/tcp_minisocks.c |    7 +++++--
>  net/ipv6/syncookies.c    |   28 +++++++++++++++-------------
>  net/ipv6/tcp_ipv6.c      |    3 ++-
>  7 files changed, 54 insertions(+), 42 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 03a49c7..740d09b 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -409,7 +409,8 @@ extern int			tcp_recvmsg(struct kiocb *iocb, struct sock *sk,
>  
>  extern void			tcp_parse_options(struct sk_buff *skb,
>  						  struct tcp_options_received *opt_rx,
> -						  int estab);
> +						  int estab,
> +						  struct dst_entry *dst);
>  
>  extern u8			*tcp_parse_md5sig_option(struct tcphdr *th);
>  
> diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
> index a6e0e07..4990dd4 100644
> --- a/net/ipv4/syncookies.c
> +++ b/net/ipv4/syncookies.c
> @@ -276,13 +276,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  
>  	NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_SYNCOOKIESRECV);
>  
> -	/* check for timestamp cookie support */
> -	memset(&tcp_opt, 0, sizeof(tcp_opt));
> -	tcp_parse_options(skb, &tcp_opt, 0);
> -
> -	if (tcp_opt.saw_tstamp)
> -		cookie_check_timestamp(&tcp_opt);
> -
>  	ret = NULL;
>  	req = inet_reqsk_alloc(&tcp_request_sock_ops); /* for safety */
>  	if (!req)
> @@ -298,12 +291,6 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  	ireq->loc_addr		= ip_hdr(skb)->daddr;
>  	ireq->rmt_addr		= ip_hdr(skb)->saddr;
>  	ireq->ecn_ok		= 0;
> -	ireq->snd_wscale	= tcp_opt.snd_wscale;
> -	ireq->rcv_wscale	= tcp_opt.rcv_wscale;
> -	ireq->sack_ok		= tcp_opt.sack_ok;
> -	ireq->wscale_ok		= tcp_opt.wscale_ok;
> -	ireq->tstamp_ok		= tcp_opt.saw_tstamp;
> -	req->ts_recent		= tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
>  
>  	/* We throwed the options of the initial SYN away, so we hope
>  	 * the ACK carries the same options again (see RFC1122 4.2.3.8)
> @@ -351,6 +338,20 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
>  		}
>  	}
>  
> +	/* check for timestamp cookie support */
> +	memset(&tcp_opt, 0, sizeof(tcp_opt));
> +	tcp_parse_options(skb, &tcp_opt, 0, &rt->u.dst);
> +
> +	if (tcp_opt.saw_tstamp)
> +		cookie_check_timestamp(&tcp_opt);
> +
> +	ireq->snd_wscale        = tcp_opt.snd_wscale;
> +	ireq->rcv_wscale        = tcp_opt.rcv_wscale;
> +	ireq->sack_ok           = tcp_opt.sack_ok;
> +	ireq->wscale_ok         = tcp_opt.wscale_ok;
> +	ireq->tstamp_ok         = tcp_opt.saw_tstamp;
> +	req->ts_recent          = tcp_opt.saw_tstamp ? tcp_opt.rcv_tsval : 0;
> +
>  	/* Try to redo what tcp_v4_send_synack did. */
>  	req->window_clamp = tp->window_clamp ? :dst_metric(&rt->u.dst, RTAX_WINDOW);
>  
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index d86784b..d502f49 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -3698,12 +3698,14 @@ old_ack:
>   * the fast version below fails.
>   */
>  void tcp_parse_options(struct sk_buff *skb, struct tcp_options_received *opt_rx,
> -		       int estab)
> +		       int estab,  struct dst_entry *dst)
>  {
>  	unsigned char *ptr;
>  	struct tcphdr *th = tcp_hdr(skb);
>  	int length = (th->doff * 4) - sizeof(struct tcphdr);
>  
> +	BUG_ON(!estab && !dst);
> +
>  	ptr = (unsigned char *)(th + 1);
>  	opt_rx->saw_tstamp = 0;
>  
> @@ -3820,7 +3822,7 @@ static int tcp_fast_parse_options(struct sk_buff *skb, struct tcphdr *th,
>  		if (tcp_parse_aligned_timestamp(tp, th))
>  			return 1;
>  	}
> -	tcp_parse_options(skb, &tp->rx_opt, 1);
> +	tcp_parse_options(skb, &tp->rx_opt, 1, NULL);
>  	return 1;
>  }
>  
> @@ -5364,8 +5366,9 @@ static int tcp_rcv_synsent_state_process(struct sock *sk, struct sk_buff *skb,
>  	struct tcp_sock *tp = tcp_sk(sk);
>  	struct inet_connection_sock *icsk = inet_csk(sk);
>  	int saved_clamp = tp->rx_opt.mss_clamp;
> +	struct dst_entry *dst = __sk_dst_get(sk);
>  
> -	tcp_parse_options(skb, &tp->rx_opt, 0);
> +	tcp_parse_options(skb, &tp->rx_opt, 0, dst);
>  
>  	if (th->ack) {
>  		/* rfc793:
> diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
> index 7cda24b..1cb0ec4 100644
> --- a/net/ipv4/tcp_ipv4.c
> +++ b/net/ipv4/tcp_ipv4.c
> @@ -1256,11 +1256,18 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  	tcp_rsk(req)->af_specific = &tcp_request_sock_ipv4_ops;
>  #endif
>  
> +	ireq = inet_rsk(req);
> +	ireq->loc_addr = daddr;
> +	ireq->rmt_addr = saddr;
> +	ireq->no_srccheck = inet_sk(sk)->transparent;
> +	ireq->opt = tcp_v4_save_options(sk, skb);
> +
> +	dst = inet_csk_route_req(sk, req);
>  	tcp_clear_options(&tmp_opt);
>  	tmp_opt.mss_clamp = 536;
>  	tmp_opt.user_mss  = tcp_sk(sk)->rx_opt.user_mss;
>  
> -	tcp_parse_options(skb, &tmp_opt, 0);
> +	tcp_parse_options(skb, &tmp_opt, 0, dst);
>  
>  	if (want_cookie && !tmp_opt.saw_tstamp)
>  		tcp_clear_options(&tmp_opt);
> @@ -1269,14 +1276,8 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  
>  	tcp_openreq_init(req, &tmp_opt, skb);
>  
> -	ireq = inet_rsk(req);
> -	ireq->loc_addr = daddr;
> -	ireq->rmt_addr = saddr;
> -	ireq->no_srccheck = inet_sk(sk)->transparent;
> -	ireq->opt = tcp_v4_save_options(sk, skb);
> -
>  	if (security_inet_conn_request(sk, skb, req))
> -		goto drop_and_free;
> +		goto drop_and_release;
>  
>  	if (!want_cookie)
>  		TCP_ECN_create_request(req, tcp_hdr(skb));
> @@ -1301,7 +1302,7 @@ int tcp_v4_conn_request(struct sock *sk, struct sk_buff *skb)
>  		 */
>  		if (tmp_opt.saw_tstamp &&
>  		    tcp_death_row.sysctl_tw_recycle &&
> -		    (dst = inet_csk_route_req(sk, req)) != NULL &&
> +		    dst != NULL &&

Why do you need this NULL check here while you trap it with BUG_ON
elsewhere? Does your patch perhaps create a remote DoS opportunity?

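One way to avoid needing both the NULL check and the BUG_ON is to treat a missing route as "no per-route override" and fall back to the global sysctl. A failed route lookup for an incoming SYN may be something a remote peer can provoke, which is the DoS concern raised above. The sketch below is hypothetical; `route_tcp_opts` and `want_timestamps` are not part of the patch:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-route override; not the patch's dst_entry layout. */
struct route_tcp_opts {
	bool has_override;
	bool timestamps;
};

/*
 * A per-route override wins; otherwise use the global setting.
 * A missing route (dst == NULL) falls back instead of crashing.
 */
bool want_timestamps(const struct route_tcp_opts *dst, bool global_default)
{
	if (dst == NULL || !dst->has_override)
		return global_default;
	return dst->timestamps;
}
```
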

-- 
 i.

^ permalink raw reply

* Re: Enable syn cookies by default
From: David Miller @ 2009-10-21 13:04 UTC (permalink / raw)
  To: olafvdspek; +Cc: netdev
In-Reply-To: <b2cc26e40910210017v3885b18dre5021c8a920f30d7@mail.gmail.com>

From: Olaf van der Spek <olafvdspek@gmail.com>
Date: Wed, 21 Oct 2009 09:17:53 +0200

> Anybody?

Would you please be patient?

In case you haven't fucking noticed, all of the major kernel
developers have been in Japan at the annual kernel summit and the Japan
Linux Symposium since late last week.

So nobody has the time to look into anything requiring real,
long thinking like this issue does.

^ permalink raw reply

* Re: [PATCH 1/4] Adds random ect generation to tfrc-sp sender side
From: Ivo Calado @ 2009-10-21 13:15 UTC (permalink / raw)
  To: dccp, Gerrit Renker, Ivo Calado, netdev
In-Reply-To: <425e6efa0910210449r1a10fb2cvf7650ad470d987aa@mail.gmail.com>

On Mon, Oct 19, 2009 at 2:23 AM, Gerrit Renker <gerrit@erg.abdn.ac.uk> wrote:
> | Adds random ect generation to tfrc-sp sender side.
>
> I thought about this and found several reasons why it would be better to
> defer ECN Nonce sums to a later implementation.
>
>  1) At the moment the code always sets ECT(0). Even if it would
>    alternate ECT(0) and ECT(1), this would later be overwritten by ECT(0)
>    in dccp_msghdr_parse(). Ok, this could be fixed, but the real problem
>    is that the underlying machinery does not support ECN nonces, since
>
>    * ECN / DiffServ information is in two separate places of the
>      inet_sock (u8 `tos' field and u8 `tclass' field of ipv6_pinfo);
>
>    * the ECN driver sits in include/net/inet_ecn.h as
>      #define INET_ECN_xmit(sk) do { inet_sk(sk)->tos |= INET_ECN_ECT_0; } while (0)
>
>    * hence this would need to be revised and the best way to make an
>      acceptable suggestion would be a coded proof of concept that
>      changing the underlying implementation does have benefits.
>
>    On the receiver side the situation is the same. The function
>    tfrc_sp_check_ecn_sum(), introduced in Patch 2/4 of the TFRC-SP sender
>    implementation is only referenced in Patch 2/2 of the CCID-4 set, where
>    it ends, without side effect in "TODO: consider ecn sum test fail".
>
>    That is, at the moment both the sender and receiver side of the ECN Nonce
>    sum verification are placeholders which currently have no effect.
>

Okay, then the implementation would be useless now.

>
>  2) As far as I can see the ECN Nonce is an optimisation, an
>
>      "optional addition to Explicit Congestion Notification [RFC3168]
>       improving its robustness against malicious or accidental
>       concealment of marked packets [...]" (from the abstract)
>
>    Hence if at all, we would only have a benefit of adding the ECN Nonce
>    verification on top of an already verified implementation.
>

Yes, it's not a priority at all. And you're right, there's no benefit.

>  3) Starting an implementation throws up further questions that need to
>    be addressed, both the basis and the extension need to be verified.
>
> I would like to suggest to implement the basis, that is CCID-4 with ECN
> (using plain ECT(0)), test with that until it works satisfactorily, and
> then continue adding measures such as the ECN Nonce verification.
>

Okay. But when would be a good time to at least include random ECT
generation? When the DCCP ECN code gets fixed? Is there any work on
this?

> Nothing is lost, once we are at this stage we can return to this set of
> initial patches and revise the situation based on the insights gained
> with ECT(0) experience.
>
> In summary, I would like to suggest to remove the ECN verification for
> the moment and focus on the "basic" issues first.
>
> Would you be ok with that?
>

Yes, we'll keep the ECN verification code in our git tree until the
scenario is ready.

>
>
> Appendix
> --------
> | +int tfrc_sp_get_random_ect(struct tfrc_tx_li_data *li_data, u64 seqn)
> | +{
> | +     int ect;
> | +     struct tfrc_ecn_echo_sum_entry *sum;
> | +
> | +     /* TODO: implement random ect*/
> | +     ect = INET_ECN_ECT_0;
> | +
> | +     sum = kmem_cache_alloc(tfrc_ecn_echo_sum_slab, GFP_ATOMIC);
>
> For a later implementation, there should be protection against NULL, e.g.
>        if (sum == NULL) {
>           DCCP_CRIT("Problem here ...");
>           return 0;
>        }
> | +
> | +     sum->previous = li_data->ecn_sums_head;
> | +     sum->ecn_echo_sum = (sum->previous->ecn_echo_sum) ? !ect : ect;
>

Thanks, I forgot that.

> (Also for later) I wonder how to do the sums, with RFC 3168
> ECT(0) = 0x2 => !0x2 = 0
> ECT(1) = 0x1 => !0x1 = 0
>

I don't understand. Can you explain it, or cite the RFC section
that addresses it?

> From the addition table in RFC 3540, section 2,
> ECT(0) + ECT(0) = 0
> ECT(0) + ECT(1) = 1
> ECT(1) + ECT(1) = 0
>
> One way could be
>        sum->ecn_echo_sum ^= (ect == INET_ECN_ECT_1);

Ok.

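Following the RFC 3540 addition table quoted above, the nonce sum is a one-bit sum modulo 2, so XOR with (ect == ECT(1)) is enough. Below is a minimal C sketch with hypothetical macro and function names. The codepoint values are the RFC 3168 ones: ECT(1) = 01, ECT(0) = 10. The sketch also shows why `!ect` cannot tell the two codepoints apart:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ECN codepoints from RFC 3168 (two-bit field in the IP header). */
#define ECN_NOT_ECT 0x0
#define ECN_ECT_1   0x1
#define ECN_ECT_0   0x2
#define ECN_CE      0x3

/* Add one packet to the sum: ECT(0) counts 0, ECT(1) counts 1, modulo 2. */
unsigned int nonce_sum_add(unsigned int sum, int ect)
{
	return (sum ^ (ect == ECN_ECT_1)) & 1u;
}

/* Fold the sum over a sequence of codepoints. */
unsigned int nonce_sum(const int *ect, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	assert(ect != NULL || n == 0);
	for (i = 0; i < n; i++)
		sum = nonce_sum_add(sum, ect[i]);
	return sum;
}

/* Both ECT codepoints are non-zero, so logical NOT maps both to 0. */
bool logical_not_distinguishes_ect(void)
{
	int not_ect0 = !ECN_ECT_0;   /* !0x2 == 0 */
	int not_ect1 = !ECN_ECT_1;   /* !0x1 == 0 */

	return not_ect0 != not_ect1;
}
```
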



-- 
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Quidquid latine dictum sit, altum viditur.

^ permalink raw reply
