public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Willy Tarreau <w@1wt.eu>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Eric Dumazet <edumazet@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Ben Hutchings <ben@decadent.org.uk>, Willy Tarreau <w@1wt.eu>
Subject: [ 38/48] tcp: avoid looping in tcp_send_fin()
Date: Fri, 15 May 2015 10:06:08 +0200	[thread overview]
Message-ID: <20150515080531.898779404@1wt.eu> (raw)
In-Reply-To: <9c2783dfae10ef2d1e9b08bcc1e562c5@local>

2.6.32-longterm review patch.  If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 845704a535e9b3c76448f52af1b70e4422ea03fd ]

Presence of an unbound loop in tcp_send_fin() had always been hard
to explain when analyzing crash dumps involving gigantic dying processes
with millions of sockets.

Lets try a different strategy :

In case of memory pressure, try to add the FIN flag to last packet
in write queue, even if packet was already sent. TCP stack will
be able to deliver this FIN after a timeout event. Note that this
FIN being delivered by a retransmit, it also carries a Push flag
given our current implementation.

By checking sk_under_memory_pressure(), we anticipate that cooking
many FIN packets might deplete tcp memory.

In the case we could not allocate a packet, even with __GFP_WAIT
allocation, then not sending a FIN seems quite reasonable if it allows
to get rid of this socket, free memory, and not block the process from
eventually doing other useful work.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[bwh: Backported to 3.2:
 - Drop inapplicable change to sk_forced_wmem_schedule()
 - s/sk_under_memory_pressure(sk)/tcp_memory_pressure/]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
(cherry picked from commit 82241580d7734af2207ad0bb1720904f569dac3a)
[wt: backported to 2.6.32: s/TCPHDR_FIN/TCPCB_FLAG_FIN/]
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 net/ipv4/tcp_output.c | 45 ++++++++++++++++++++++++++-------------------
 1 file changed, 26 insertions(+), 19 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9e7fc38..5339f06 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2121,33 +2121,40 @@ begin_fwd:
 	}
 }
 
-/* Send a fin.  The caller locks the socket for us.  This cannot be
- * allowed to fail queueing a FIN frame under any circumstances.
+/* Send a FIN. The caller locks the socket for us.
+ * We should try to send a FIN packet really hard, but eventually give up.
  */
 void tcp_send_fin(struct sock *sk)
 {
+	struct sk_buff *skb, *tskb = tcp_write_queue_tail(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
-	struct sk_buff *skb = tcp_write_queue_tail(sk);
-	int mss_now;
 
-	/* Optimization, tack on the FIN if we have a queue of
-	 * unsent frames.  But be careful about outgoing SACKS
-	 * and IP options.
+	/* Optimization, tack on the FIN if we have one skb in write queue and
+	 * this skb was not yet sent, or we are under memory pressure.
+	 * Note: in the latter case, FIN packet will be sent after a timeout,
+	 * as TCP stack thinks it has already been transmitted.
 	 */
-	mss_now = tcp_current_mss(sk);
-
-	if (tcp_send_head(sk) != NULL) {
+	if (tskb && (tcp_send_head(sk) || tcp_memory_pressure)) {
+coalesce:
 		TCP_SKB_CB(skb)->flags |= TCPCB_FLAG_FIN;
-		TCP_SKB_CB(skb)->end_seq++;
+		TCP_SKB_CB(tskb)->end_seq++;
 		tp->write_seq++;
+		if (!tcp_send_head(sk)) {
+			/* This means tskb was already sent.
+			 * Pretend we included the FIN on previous transmit.
+			 * We need to set tp->snd_nxt to the value it would have
+			 * if FIN had been sent. This is because retransmit path
+			 * does not change tp->snd_nxt.
+			 */
+			tp->snd_nxt++;
+			return;
+		}
 	} else {
-		/* Socket is locked, keep trying until memory is available. */
-		for (;;) {
-			skb = alloc_skb_fclone(MAX_TCP_HEADER,
-					       sk->sk_allocation);
-			if (skb)
-				break;
-			yield();
+		skb = alloc_skb_fclone(MAX_TCP_HEADER, sk->sk_allocation);
+		if (unlikely(!skb)) {
+			if (tskb)
+				goto coalesce;
+			return;
 		}
 
 		/* Reserve space for headers and prepare control bits. */
@@ -2157,7 +2164,7 @@ void tcp_send_fin(struct sock *sk)
 				     TCPCB_FLAG_ACK | TCPCB_FLAG_FIN);
 		tcp_queue_skb(sk, skb);
 	}
-	__tcp_push_pending_frames(sk, mss_now, TCP_NAGLE_OFF);
+	__tcp_push_pending_frames(sk, tcp_current_mss(sk), TCP_NAGLE_OFF);
 }
 
 /* We get here when a process closes a file descriptor (either due to
-- 
1.7.12.2.21.g234cd45.dirty




  parent reply	other threads:[~2015-05-15  8:22 UTC|newest]

Thread overview: 61+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <9c2783dfae10ef2d1e9b08bcc1e562c5@local>
2015-05-15  8:05 ` [ 00/48] 2.6.32.66-longterm review Willy Tarreau
2015-05-15  8:05 ` [ 01/48] x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and sync_regs Willy Tarreau
2015-05-15  8:05 ` [ 02/48] x86/tls: Validate TLS entries to protect espfix Willy Tarreau
2015-05-15  8:05 ` [ 03/48] x86, tls, ldt: Stop checking lm in LDT_empty Willy Tarreau
2015-05-15  8:05 ` [ 04/48] x86, tls: Interpret an all-zero struct user_desc as "no segment" Willy Tarreau
2015-05-15  8:05 ` [ 05/48] x86_64, switch_to(): Load TLS descriptors before switching DS and ES Willy Tarreau
2015-05-15 12:32   ` Ben Hutchings
2015-05-15 13:38     ` Willy Tarreau
2015-05-15 14:25       ` Ben Hutchings
2015-05-15 14:31         ` Ben Hutchings
2015-05-15 14:37         ` Willy Tarreau
2015-05-15 15:53         ` Andi Kleen
2015-05-15 16:48           ` Willy Tarreau
2015-05-15 20:53           ` Ben Hutchings
2015-05-15 22:15             ` Andi Kleen
2015-05-15  8:05 ` [ 06/48] x86/tls: Disallow unusual TLS segments Willy Tarreau
2015-05-15  8:05 ` [ 07/48] x86/tls: Dont validate lm in set_thread_area() after all Willy Tarreau
2015-05-15  8:05 ` [ 08/48] x86, kvm: Clear paravirt_enabled on KVM guests for espfix32s benefit Willy Tarreau
2015-05-15  8:05 ` [ 09/48] x86_64, vdso: Fix the vdso address randomization algorithm Willy Tarreau
2015-05-15 21:02   ` Ben Hutchings
2015-05-15  8:05 ` [ 10/48] ASLR: fix stack randomization on 64-bit systems Willy Tarreau
2015-05-15  8:05 ` [ 11/48] x86, cpu, amd: Add workaround for family 16h, erratum 793 Willy Tarreau
2015-05-15  8:05 ` [ 12/48] x86/asm/entry/64: Remove a bogus ret_from_fork optimization Willy Tarreau
2015-05-15  8:05 ` [ 13/48] x86: Conditionally update time when ack-ing pending irqs Willy Tarreau
2015-05-15  8:05 ` [ 14/48] serial: samsung: wait for transfer completion before clock disable Willy Tarreau
2015-05-15  8:05 ` [ 15/48] splice: Apply generic position and size checks to each write Willy Tarreau
2015-05-15  8:05 ` [ 16/48] netfilter: conntrack: disable generic tracking for known protocols Willy Tarreau
2015-05-15 21:05   ` Ben Hutchings
2015-05-15  8:05 ` [ 17/48] isofs: Fix infinite looping over CE entries Willy Tarreau
2015-05-15  8:05 ` [ 18/48] isofs: Fix unchecked printing of ER records Willy Tarreau
2015-05-15  8:05 ` [ 19/48] net: sctp: fix memory leak in auth key management Willy Tarreau
2015-05-15  8:05 ` [ 20/48] net: sctp: fix slab corruption from use after free on INIT collisions Willy Tarreau
2015-05-15  8:05 ` [ 21/48] IB/uverbs: Prevent integer overflow in ib_umem_get address arithmetic Willy Tarreau
2015-05-15  8:05 ` [ 22/48] net: llc: use correct size for sysctl timeout entries Willy Tarreau
2015-05-15  8:05 ` [ 23/48] net: rds: use correct size for max unacked packets and bytes Willy Tarreau
2015-05-15  8:05 ` [ 24/48] ipv6: Dont reduce hop limit for an interface Willy Tarreau
2015-05-15  8:05 ` [ 25/48] fs: take i_mutex during prepare_binprm for set[ug]id executables Willy Tarreau
2015-05-15  8:05 ` [ 26/48] net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr struct from userland Willy Tarreau
2015-05-15 21:08   ` Ben Hutchings
2015-05-16  5:31     ` Willy Tarreau
2015-05-15  8:05 ` [ 27/48] ppp: deflate: never return len larger than output buffer Willy Tarreau
2015-05-15  8:05 ` [ 29/48] net: reject creation of netdev names with colons Willy Tarreau
2015-05-15  8:06 ` [ 30/48] ipv4: Dont use ufo handling on later transformed packets Willy Tarreau
2015-05-15  8:06 ` [ 31/48] udp: only allow UFO for packets from SOCK_DGRAM sockets Willy Tarreau
2015-05-15  8:06 ` [ 32/48] net: avoid to hang up on sending due to sysctl configuration overflow Willy Tarreau
2015-05-15  8:06 ` [ 33/48] net: sysctl_net_core: check SNDBUF and RCVBUF for min length Willy Tarreau
2015-05-15  8:06 ` [ 34/48] rds: avoid potential stack overflow Willy Tarreau
2015-05-15  8:06 ` [ 35/48] rxrpc: bogus MSG_PEEK test in rxrpc_recvmsg() Willy Tarreau
2015-05-15  8:06 ` [ 36/48] tcp: make connect() mem charging friendly Willy Tarreau
2015-05-15  8:06 ` [ 37/48] ip_forward: Drop frames with attached skb->sk Willy Tarreau
2015-05-15  8:06 ` Willy Tarreau [this message]
2015-05-15  8:06 ` [ 39/48] spi: spidev: fix possible arithmetic overflow for multi-transfer message Willy Tarreau
2015-05-15  8:06 ` [ 40/48] IB/core: Avoid leakage from kernel to user space Willy Tarreau
2015-05-15  8:06 ` [ 41/48] ipvs: uninitialized data with IP_VS_IPV6 Willy Tarreau
2015-05-15  8:06 ` [ 42/48] ipv4: fix nexthop attlen check in fib_nh_match Willy Tarreau
2015-05-15  8:06 ` [ 43/48] pagemap: do not leak physical addresses to non-privileged userspace Willy Tarreau
2015-05-15  8:06 ` [ 44/48] lockd: Try to reconnect if statd has moved Willy Tarreau
2015-05-15  8:06 ` [ 45/48] scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND Willy Tarreau
2015-05-15  8:06 ` [ 46/48] posix-timers: Fix stack info leak in timer_create() Willy Tarreau
2015-05-15  8:06 ` [ 47/48] hfsplus: fix B-tree corruption after insertion at position 0 Willy Tarreau
2015-05-15  8:06 ` [ 48/48] sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND) Willy Tarreau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150515080531.898779404@1wt.eu \
    --to=w@1wt.eu \
    --cc=ben@decadent.org.uk \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox