netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH][BUG] tcp: advertise MSS requested by user
@ 2008-08-26 18:11 Tom Quetchenbach
  2008-08-28  9:46 ` David Miller
  2008-09-21  7:22 ` David Miller
  0 siblings, 2 replies; 7+ messages in thread
From: Tom Quetchenbach @ 2008-08-26 18:11 UTC (permalink / raw)
  To: netdev; +Cc: Lachlan Andrew

I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
for both sides of a bidirectional connection.

man tcp says: "If this option is set before connection establishment, it
also changes the MSS value announced to the other end in the initial
packet."

However, the kernel only uses the MTU/route cache to set the advertised
MSS. That means if I set the MSS to, say, 500 before calling connect(),
I will send at most 500-byte packets, but I will still receive 1500-byte
packets in reply.

This is a bug, either in the kernel or the documentation.

This patch (applies to latest net-2.6) reduces the advertised value to
that requested by the user as long as setsockopt() is called before
connect() or accept(). This seems like the behavior that one would expect
as well as that which is documented.

I've tried to make sure that things that depend on the advertised MSS
are set correctly, but please correct me if I've broken something.

Currently there's no way for the user to change the MSS of an established
connection--calling setsockopt() after the connection is established has
no effect. Is this something that should be changed?

Thanks,
-Tom

Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
---

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 44c1e93..8bc8b36 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1364,6 +1364,11 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
 	tcp_mtup_init(newsk);
 	tcp_sync_mss(newsk, dst_mtu(dst));
 	newtp->advmss = dst_metric(dst, RTAX_ADVMSS);
+
+	if (tcp_sk(sk)->rx_opt.user_mss &&
+				tcp_sk(sk)->rx_opt.user_mss < newtp->advmss)
+		newtp->advmss = tcp_sk(sk)->rx_opt.user_mss;
+
 	tcp_initialize_rcv_mss(newsk);
 
 #ifdef CONFIG_TCP_MD5SIG
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a00532d..50536e1 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2239,6 +2239,7 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
 	struct sk_buff *skb;
 	struct tcp_md5sig_key *md5;
 	__u8 *md5_hash_location;
+	int mss;
 
 	skb = sock_wmalloc(sk, MAX_TCP_HEADER + 15, 1, GFP_ATOMIC);
 	if (skb == NULL)
@@ -2249,13 +2250,17 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
 
 	skb->dst = dst_clone(dst);
 
+	mss = dst_metric(dst, RTAX_ADVMSS);
+	if (tp->rx_opt.user_mss && tp->rx_opt.user_mss < mss)
+		mss = tp->rx_opt.user_mss;
+
 	if (req->rcv_wnd == 0) { /* ignored for retransmitted syns */
 		__u8 rcv_wscale;
 		/* Set this up on the first call only */
 		req->window_clamp = tp->window_clamp ? : dst_metric(dst, RTAX_WINDOW);
 		/* tcp_full_space because it is guaranteed to be the first packet */
 		tcp_select_initial_window(tcp_full_space(sk),
-			dst_metric(dst, RTAX_ADVMSS) - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
+			mss - (ireq->tstamp_ok ? TCPOLEN_TSTAMP_ALIGNED : 0),
 			&req->rcv_wnd,
 			&req->window_clamp,
 			ireq->wscale_ok,
@@ -2265,8 +2270,7 @@ struct sk_buff *tcp_make_synack(struct sock *sk, struct dst_entry *dst,
 
 	memset(&opts, 0, sizeof(opts));
 	TCP_SKB_CB(skb)->when = tcp_time_stamp;
-	tcp_header_size = tcp_synack_options(sk, req,
-					     dst_metric(dst, RTAX_ADVMSS),
+	tcp_header_size = tcp_synack_options(sk, req, mss,
 					     skb, &opts, &md5) +
 			  sizeof(struct tcphdr);
 
@@ -2340,6 +2344,11 @@ static void tcp_connect_init(struct sock *sk)
 	if (!tp->window_clamp)
 		tp->window_clamp = dst_metric(dst, RTAX_WINDOW);
 	tp->advmss = dst_metric(dst, RTAX_ADVMSS);
+
+	if (tp->rx_opt.user_mss && tp->rx_opt.user_mss < tp->advmss){
+		tp->advmss = tp->rx_opt.user_mss;
+	}
+
 	tcp_initialize_rcv_mss(sk);
 
 	tcp_select_initial_window(tcp_full_space(sk),

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH][BUG] tcp: advertise MSS requested by user
  2008-08-26 18:11 [PATCH][BUG] tcp: advertise MSS requested by user Tom Quetchenbach
@ 2008-08-28  9:46 ` David Miller
  2008-08-28  9:50   ` Denys Fedoryshchenko
  2008-09-21  7:22 ` David Miller
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2008-08-28  9:46 UTC (permalink / raw)
  To: virtualphtn; +Cc: netdev, lachlan.andrew

From: Tom Quetchenbach <virtualphtn@gmail.com>
Date: Tue, 26 Aug 2008 11:11:52 -0700

> I'm trying to use the TCP_MAXSEG option to setsockopt() to set the MSS
> for both sides of a bidirectional connection.
> 
> man tcp says: "If this option is set before connection establishment, it
> also changes the MSS value announced to the other end in the initial
> packet."
> 
> However, the kernel only uses the MTU/route cache to set the advertised
> MSS. That means if I set the MSS to, say, 500 before calling connect(),
> I will send at most 500-byte packets, but I will still receive 1500-byte
> packets in reply.
> 
> This is a bug, either in the kernel or the documentation.

I think this got broken by accident several years ago, shows how many
people use this feature :)

We used to use ->mss_clamp to determine the MSS option value,
but along the way when all of the code using ->advmss et al.
was added it didn't get translated correctly.

Thanks for your patch, I'll mull over this and put in the fix
once I understand exactly how this got broken.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][BUG] tcp: advertise MSS requested by user
  2008-08-28  9:46 ` David Miller
@ 2008-08-28  9:50   ` Denys Fedoryshchenko
  2008-08-29 16:48     ` Tom Quetchenbach
  0 siblings, 1 reply; 7+ messages in thread
From: Denys Fedoryshchenko @ 2008-08-28  9:50 UTC (permalink / raw)
  To: David Miller; +Cc: virtualphtn, netdev, lachlan.andrew


> I think this got broken by accident several years ago, shows how many
> people use this feature :)
>
> We used to use ->mss_clamp to determine the MSS option value,
> but along the way when all of the code using ->advmss et al.
> was added it didn't get translated correctly.
>
> Thanks for your patch, I'll mull over this and put in the fix
> once I understand exactly how this got broken.
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
I am using, just i workaround this over iptables MSS option. You feel 
discomfort, but most people start googling and see this trick with iptables, 
and they use it.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][BUG] tcp: advertise MSS requested by user
  2008-08-28  9:50   ` Denys Fedoryshchenko
@ 2008-08-29 16:48     ` Tom Quetchenbach
  2008-08-29 16:57       ` Denys Fedoryshchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Tom Quetchenbach @ 2008-08-29 16:48 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: David Miller, netdev, lachlan.andrew

Best I can tell, my mailer ate this reply, so apologies if you get it twice.

>> Thanks for your patch, I'll mull over this and put in the fix
>> once I understand exactly how this got broken.

David, thanks a lot for looking into this. It will be really helpful for
our TCP benchmarking work.

> I am using, just i workaround this over iptables MSS option. You feel 
> discomfort, but most people start googling and see this trick with iptables, 
> and they use it.

In my case I don't think the iptables workaround is the right solution.
I'm working on a traffic generator (tmix, from the University of North
Carolina), that starts many simultaneous flows between two endpoints,
based on a set of "connection vectors" derived from a tcpdump. The MSS
for each flow is specified in the connection vector file on a per-flow
basis. So it's not so simple as clamping the MSS to the path MTU or
setting it based on some simple criteria like destination or port
number. I'm sure I could hack something together using iptables to
approximate what I want to do, but it seems like a bit of a mess to me.

I'm not that familiar with iptables so please correct me if I'm wrong.

Thanks
-Tom


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][BUG] tcp: advertise MSS requested by user
  2008-08-29 16:48     ` Tom Quetchenbach
@ 2008-08-29 16:57       ` Denys Fedoryshchenko
  0 siblings, 0 replies; 7+ messages in thread
From: Denys Fedoryshchenko @ 2008-08-29 16:57 UTC (permalink / raw)
  To: Tom Quetchenbach; +Cc: David Miller, netdev, lachlan.andrew

On Friday 29 August 2008, Tom Quetchenbach wrote:
> In my case I don't think the iptables workaround is the right solution.
> I'm working on a traffic generator (tmix, from the University of North
> Carolina), that starts many simultaneous flows between two endpoints,
> based on a set of "connection vectors" derived from a tcpdump. The MSS
> for each flow is specified in the connection vector file on a per-flow
> basis. So it's not so simple as clamping the MSS to the path MTU or
> setting it based on some simple criteria like destination or port
> number. I'm sure I could hack something together using iptables to
> approximate what I want to do, but it seems like a bit of a mess to me.
>
> I'm not that familiar with iptables so please correct me if I'm wrong.
Sure proper fix always better.
My case MUCH simplier (i know that all endpoints have lower than 1500 MTU, so 
i just set required value for all SYN requests to that direction).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][BUG] tcp: advertise MSS requested by user
  2008-08-26 18:11 [PATCH][BUG] tcp: advertise MSS requested by user Tom Quetchenbach
  2008-08-28  9:46 ` David Miller
@ 2008-09-21  7:22 ` David Miller
  2009-06-13 10:12   ` Willy Tarreau
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2008-09-21  7:22 UTC (permalink / raw)
  To: virtualphtn; +Cc: netdev, lachlan.andrew

From: Tom Quetchenbach <virtualphtn@gmail.com>
Date: Tue, 26 Aug 2008 11:11:52 -0700

> This patch (applies to latest net-2.6) reduces the advertised value to
> that requested by the user as long as setsockopt() is called before
> connect() or accept(). This seems like the behavior that one would expect
> as well as that which is documented.
 ...
> Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>

I've applied this patch to net-next-2.6, thanks Tom.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH][BUG] tcp: advertise MSS requested by user
  2008-09-21  7:22 ` David Miller
@ 2009-06-13 10:12   ` Willy Tarreau
  0 siblings, 0 replies; 7+ messages in thread
From: Willy Tarreau @ 2009-06-13 10:12 UTC (permalink / raw)
  To: David Miller; +Cc: virtualphtn, netdev, lachlan.andrew, stable

Hi Dave,

On Sun, Sep 21, 2008 at 12:22:15AM -0700, David Miller wrote:
> From: Tom Quetchenbach <virtualphtn@gmail.com>
> Date: Tue, 26 Aug 2008 11:11:52 -0700
> 
> > This patch (applies to latest net-2.6) reduces the advertised value to
> > that requested by the user as long as setsockopt() is called before
> > connect() or accept(). This seems like the behavior that one would expect
> > as well as that which is documented.
>  ...
> > Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com>
> 
> I've applied this patch to net-next-2.6, thanks Tom.

Yesterday I worked on a similar patch to address the same issue which
I was experiencing on 2.6.27.25. Before posting my tiny patch, I
preferred to check the list and found this one which addresses the
same issue and which is already merged in mainline !

Would you accept to queue it for -stable please ? The commit ID was
f5fff5dc8a7a3f395b0525c02ba92c95d42b7390. I did not figure how to find
when it was merged, because its version pre-dates 2.6.27 (it matches
your net-2.6 queue I think). git-describe --contains tells me it was
merged in 2.6.28-rc1.

BTW, I found the same behaviour in 2.4 too, and from my quick analysis,
the mss is correctly computed till the moment the SYN-ACK is built,
where the one from the route cache is used instead. I'll check deeper
when I have time, but most likely a similar fix is needed there too.

Thanks!
Willy


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-06-13 10:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-26 18:11 [PATCH][BUG] tcp: advertise MSS requested by user Tom Quetchenbach
2008-08-28  9:46 ` David Miller
2008-08-28  9:50   ` Denys Fedoryshchenko
2008-08-29 16:48     ` Tom Quetchenbach
2008-08-29 16:57       ` Denys Fedoryshchenko
2008-09-21  7:22 ` David Miller
2009-06-13 10:12   ` Willy Tarreau

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).