From: "Ahmed, Aaron" <aarnahmd@amazon.com>
To: "stable@vger.kernel.org" <stable@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Cc: "ncardwell@google.com" <ncardwell@google.com>,
"edumazet@google.com" <edumazet@google.com>,
"kuniyu@google.com" <kuniyu@google.com>
Subject: [BUG] net: tcp: SO_LINGER with l_linger=0 leaks memory when closing sockets with pending send data
Date: Sat, 18 Apr 2026 00:19:55 +0000
Message-ID: <48BADABE-4DFB-4DAD-8248-E94D8F5238D2@amazon.com>
Hi,
We have identified a TCP memory leak issue on Amazon Linux with kernel versions 5.15.168 through 6.18.20 that occurs when closing sockets with SO_LINGER set to l_onoff=1, l_linger=0, on servers handling many persistent connections with full write buffers.
Overview:
The issue was discovered on a public-facing non-blocking TCP server that maintains many persistent connections and streams data to clients. When a client cannot read fast enough, the TCP write socket buffer on the server side fills up and send() returns EAGAIN. At that point, the server application disconnects the slow client by setting SO_LINGER to l_onoff=1, l_linger=0 and calling close(). This is intended to immediately reset the connection and release all associated kernel resources.

However, while the socket disappears from netstat and sockstat (TCP inuse drops), the write buffer memory is not properly reclaimed. /proc/net/sockstat shows TCP mem pages accumulating with no owning sockets, and the leaked memory eventually grows past the tcp_mem limits.

Setting SO_LINGER to l_onoff=1, l_linger=1 instead does not leak: the connection goes through FIN_WAIT1 → FIN_WAIT2 → CLOSE (confirmed with BPF tcpstates), and all memory is freed properly. With l_linger=0, the connection transitions directly from ESTABLISHED → CLOSE via RST, bypassing the FIN states entirely.
Reproducer:
```
/* tcp_linger_memleak.c - SO_LINGER(0) TCP memory leak reproducer
 *
 * Build: gcc -O2 -o tcp_linger_memleak tcp_linger_memleak.c
 * Run:   sudo sysctl -w net.core.wmem_max=4194304
 *        sudo sysctl -w net.ipv4.tcp_rmem="4096 8192 16384"
 *        ./tcp_linger_memleak
 *
 * The small tcp_rmem shrinks the client's receive buffers so the
 * server-side write queues fill up quickly.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <netinet/in.h>
#define NUM_CONNS 5000
#define PORT 6666
static void print_mem(const char *label)
{
	FILE *f;
	char line[256];

	f = fopen("/proc/meminfo", "r");
	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (strncmp(line, "MemAvailable:", 13) == 0)
			printf("%s: %s", label, line);
	fclose(f);

	f = fopen("/proc/net/sockstat", "r");
	if (!f)
		return;
	while (fgets(line, sizeof(line), f))
		if (strncmp(line, "TCP:", 4) == 0)
			printf("%s: %s", label, line);
	fclose(f);
}
int main(void)
{
	struct sockaddr_in addr = {
		.sin_family = AF_INET,
		.sin_port = htons(PORT),
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK)
	};
	int opt = 1;

	signal(SIGPIPE, SIG_IGN);

	int lsn = socket(AF_INET, SOCK_STREAM, 0);
	setsockopt(lsn, SOL_SOCKET, SO_REUSEADDR, &opt, sizeof(opt));
	bind(lsn, (struct sockaddr *)&addr, sizeof(addr));
	listen(lsn, NUM_CONNS);

	/* Fork client: connect N times, never read */
	pid_t child = fork();
	if (child == 0) {
		int fds[NUM_CONNS];

		for (int i = 0; i < NUM_CONNS; i++) {
			fds[i] = socket(AF_INET, SOCK_STREAM, 0);
			connect(fds[i], (struct sockaddr *)&addr, sizeof(addr));
		}
		pause(); /* sit forever, never read */
		_exit(0);
	}

	/* Accept all connections */
	int clients[NUM_CONNS];

	for (int i = 0; i < NUM_CONNS; i++)
		clients[i] = accept(lsn, NULL, NULL);

	/* Freeze client so it stops reading */
	kill(child, SIGSTOP);
	printf("=== %d connections established, client frozen ===\n", NUM_CONNS);
	print_mem("BEFORE");

	/* Fill buffers and close with SO_LINGER(1,0) */
	char buf[2048];

	memset(buf, 'A', sizeof(buf));
	for (int i = 0; i < NUM_CONNS; i++) {
		int flags = fcntl(clients[i], F_GETFL, 0);

		fcntl(clients[i], F_SETFL, flags | O_NONBLOCK);
		while (send(clients[i], buf, sizeof(buf), MSG_NOSIGNAL) > 0)
			;

		struct linger lg = { .l_onoff = 1, .l_linger = 0 };

		setsockopt(clients[i], SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
		close(clients[i]);
	}

	sleep(2);
	printf("\n=== All sockets closed with SO_LINGER(1,0) ===\n");
	print_mem("AFTER");

	kill(child, SIGKILL);
	waitpid(child, NULL, 0);
	close(lsn);
	return 0;
}
```
Output (Tested on 6.18.20):
```
=== 5000 connections established, client frozen ===
BEFORE: MemAvailable: 95491288 kB
BEFORE: TCP: inuse 10005 orphan 0 tw 5 alloc 10006 mem 0
=== All sockets closed with SO_LINGER(1,0) ===
AFTER: MemAvailable: 95321800 kB
AFTER: TCP: inuse 5 orphan 0 tw 5 alloc 5006 mem 8300
```
Thanks,
Aaron Ahmed
Thread overview: 5+ messages
2026-04-18 0:19 Ahmed, Aaron [this message]
2026-04-18 0:44 ` [BUG] net: tcp: SO_LINGER with l_linger=0 leaks memory when closing sockets with pending send data Kuniyuki Iwashima
2026-04-18 1:06 ` Kuniyuki Iwashima
2026-04-27 22:26 ` Ahmed, Aaron
2026-04-28 0:15 ` Kuniyuki Iwashima