* Kernel panic in inet_twdr_do_twkill_work
@ 2009-05-14 1:22 Eric W. Biederman
2009-05-14 7:53 ` Daniel Lezcano
0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-14 1:22 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev
So far I have only seen this twice, but the backtrace looks
almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857.
The kernels I saw this on were patched versions of 2.6.28 with some
network namespace backports. Commit
d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
Daniel, any ideas?
Eric
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 1:22 Kernel panic in inet_twdr_do_twkill_work Eric W. Biederman
@ 2009-05-14 7:53 ` Daniel Lezcano
2009-05-14 8:18 ` Eric W. Biederman
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14 7:53 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev
Eric W. Biederman wrote:
> So far I have only seen this twice. But the backtrace looks
> almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857
>
> The kernels I saw this on were patched version of 2.6.28 with some
> network namespace backports. commit
> d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
>
> Daniel any ideas?
>
Hi Eric,
I found this one. Maybe it is related to your problem:
commit 2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504
Let me know :)
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 7:53 ` Daniel Lezcano
@ 2009-05-14 8:18 ` Eric W. Biederman
2009-05-14 8:33 ` Daniel Lezcano
0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-14 8:18 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev
Daniel Lezcano <daniel.lezcano@free.fr> writes:
> Eric W. Biederman wrote:
>> So far I have only seen this twice. But the backtrace looks
>> almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857
>>
>> The kernels I saw this on were patched version of 2.6.28 with some
>> network namespace backports. commit
>> d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
>>
>> Daniel any ideas?
>>
> Hi Eric,
>
> I found this one. May be it could be related to your problem:
>
> commit 2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504
>
> Let me know :)
"netns: oops in ip[6]_frag_reasm incrementing stats" does not look like a likely cause.
There is currently no real IPv6 traffic on our network, and the panic
is definitely in inet_twdr_do_twkill_work.
Further, we are getting the net of a timewait socket, so I don't see how
a problem with NULL devs could have anything to do with it.
I really suspect the purge code is not succeeding.
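To make the suspicion concrete, here is a minimal userspace sketch of what a purge has to guarantee: once a namespace goes away, no timewait entry may still point at it, otherwise a later reaper pass dereferences a stale pointer. This is a simplified model and not the kernel's code; `model_net`, `model_tw` and every `model_*` function are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network namespace. */
struct model_net {
	int id;
};

/* Hypothetical stand-in for a timewait entry on a singly linked list. */
struct model_tw {
	struct model_net *net;	/* namespace the entry belongs to */
	struct model_tw *next;
};

/* Push a new entry for 'net' on the front of the list; 0 on success. */
int model_tw_add(struct model_tw **head, struct model_net *net)
{
	struct model_tw *tw = malloc(sizeof(*tw));

	if (!tw)
		return -1;
	tw->net = net;
	tw->next = *head;
	*head = tw;
	return 0;
}

/* Unlink and free every entry owned by 'net'; return how many were removed. */
int model_tw_purge(struct model_tw **head, const struct model_net *net)
{
	int purged = 0;

	while (*head) {
		struct model_tw *tw = *head;

		if (tw->net == net) {
			*head = tw->next;	/* unlink, then free */
			free(tw);
			purged++;
		} else {
			head = &tw->next;	/* keep entries of other namespaces */
		}
	}
	return purged;
}

/* Count the entries that still point at 'net'. */
int model_tw_count(const struct model_tw *head, const struct model_net *net)
{
	int count = 0;

	for (; head; head = head->next)
		if (head->net == net)
			count++;
	return count;
}

/* Tear down a namespace: purge first, then insist that nothing is left. */
void model_net_destroy(struct model_tw **head, const struct model_net *net)
{
	model_tw_purge(head, net);
	assert(model_tw_count(*head, net) == 0);
}
```

In this model the assertion in `model_net_destroy()` is the invariant in question: if any entry survived the purge, the reaper would later follow its `net` pointer into freed memory.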
Eric
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 8:18 ` Eric W. Biederman
@ 2009-05-14 8:33 ` Daniel Lezcano
2009-05-14 9:13 ` Eric W. Biederman
0 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14 8:33 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev
Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>
>> Eric W. Biederman wrote:
>>
>>> So far I have only seen this twice. But the backtrace looks
>>> almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857
>>>
>>> The kernels I saw this on were patched version of 2.6.28 with some
>>> network namespace backports. commit
>>> d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
>>>
>>> Daniel any ideas?
>>>
>>>
>> Hi Eric,
>>
>> I found this one. May be it could be related to your problem:
>>
>> commit 2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504
>>
>> Let me know :)
>>
>
> "netns: oops in ip[6]_frag_reasm incrementing stats" does not look likely.
>
> There is no real ipv6 traffic currently on the our network and the panic
> is definitely in inet_twdr_do_twkill_work.
>
> Further we are getting the net of a timewait socket. So I don't see how
> a problem with NULL devs could have anything to do with it.
>
> I really suspect the purge code is not being successful.
>
Maybe you can enable NETNS_REFCNT_DEBUG to check whether the
timewait sockets
were destroyed when the namespace was destroyed? Unfortunately, it looks
like the option is not in Kconfig :(
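The idea behind such a debug reference count can be sketched in userspace: every object that keeps a pointer to the namespace bumps a counter, and the free path complains if the counter is not back to zero. This is only an illustrative model under that assumption, not the kernel's implementation; `model_net` and the `model_*` functions are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a network namespace. */
struct model_net {
	int use_count;	/* debug-only: objects that still point at this namespace */
};

/* Record that one more object refers to the namespace. */
struct model_net *model_hold_net(struct model_net *net)
{
	if (net)
		net->use_count++;
	return net;
}

/* Drop a reference taken with model_hold_net(). */
void model_release_net(struct model_net *net)
{
	if (net) {
		assert(net->use_count > 0);	/* catch unbalanced releases */
		net->use_count--;
	}
}

/*
 * Called when the namespace is about to be freed.
 * Returns 0 if nothing refers to it any more, -1 (with a warning) otherwise.
 */
int model_net_free_check(const struct model_net *net)
{
	if (net->use_count != 0) {
		fprintf(stderr, "namespace still in use: %d reference(s)\n",
			net->use_count);
		return -1;
	}
	return 0;
}
```

With a check of this kind, a timewait socket that outlives its namespace would show up as a nonzero count at free time rather than as a panic later in the reaper.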
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 8:33 ` Daniel Lezcano
@ 2009-05-14 9:13 ` Eric W. Biederman
2009-05-14 9:21 ` Daniel Lezcano
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-14 9:13 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev
Daniel Lezcano <daniel.lezcano@free.fr> writes:
> May be you can activate the NETNS_REFCNT_DEBUG in order to check if the timewait
> socket
> were destroyed at the namespace destruction ? Unfortunately it looks like the
> option is not in the Kconfig :(
Looks like a good starting place.
I will enable that when I respin my internal kernel.
I don't have a good reproducer at the moment, so I was hoping we could
figure this out with code inspection.
Eric
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 9:13 ` Eric W. Biederman
@ 2009-05-14 9:21 ` Daniel Lezcano
2009-05-14 9:42 ` Daniel Lezcano
2009-05-24 13:26 ` Daniel Lezcano
2 siblings, 0 replies; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14 9:21 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev
Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>
>> May be you can activate the NETNS_REFCNT_DEBUG in order to check if the timewait
>> socket
>> were destroyed at the namespace destruction ? Unfortunately it looks like the
>> option is not in the Kconfig :(
>>
>
> Looks like a good starting place.
>
> I will enable that when I respin my internal kernel.
>
> I don't have a good reproducer at the moment.... So I was hoping we could
> figure this out with code inspection.
>
I remember I wrote a small program to create hundreds of timewait sockets
to test the purge.
I will see if I can find it and try it as a reproducer.
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 9:13 ` Eric W. Biederman
2009-05-14 9:21 ` Daniel Lezcano
@ 2009-05-14 9:42 ` Daniel Lezcano
2009-05-24 13:26 ` Daniel Lezcano
2 siblings, 0 replies; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14 9:42 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev
[-- Attachment #1: Type: text/plain, Size: 690 bytes --]
Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>
>> May be you can activate the NETNS_REFCNT_DEBUG in order to check if the timewait
>> socket
>> were destroyed at the namespace destruction ? Unfortunately it looks like the
>> option is not in the Kconfig :(
>>
>
> Looks like a good starting place.
>
> I will enable that when I respin my internal kernel.
>
> I don't have a good reproducer at the moment.... So I was hoping we could
> figure this out with code inspection.
>
I found this one, which creates a lot of timewait sockets. I tried it on a
2.6.29 kernel and was not able to reproduce the panic. Can you check whether this
program reproduces the bug?
[-- Attachment #2: timewait.c --]
[-- Type: text/x-csrc, Size: 2286 bytes --]
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/poll.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <unistd.h>

/*
 * Each process keeps MAXCONN sockets open at once, so the open-file
 * limit must be raised above MAXCONN before running this.
 */
#define MAXCONN 10000

/* Child: wait for the server to listen, then open MAXCONN connections. */
int client(int *fds)
{
	int i, len;
	struct sockaddr_in6 addr;

	close(fds[1]);

	memset(&addr, 0, sizeof(addr));
	addr.sin6_family = AF_INET6;
	addr.sin6_port = htons(10000);
	addr.sin6_addr = in6addr_loopback;

	/* Block until the server signals through the pipe that it is ready. */
	if (read(fds[0], &i, sizeof(i)) == -1) {
		perror("read");
		return 1;
	}

	for (i = 0; i < MAXCONN; i++) {
		int fd = socket(PF_INET6, SOCK_STREAM, 0);
		if (fd == -1) {
			perror("socket");
			return 1;
		}

		if (connect(fd, (const struct sockaddr *)&addr, sizeof(addr))) {
			perror("connect");
			return 1;
		}

		/* Send a few bytes so the connection is fully established. */
		len = write(fd, &fd, sizeof(fd));
		if (!len) {
			fprintf(stderr, "write wrote 0 bytes\n");
			return 1;
		}

		if (len == -1) {
			perror("write");
			return 1;
		}
	}

	/* The sockets are left open; they are all closed at process exit. */
	return 0;
}

/* Parent: listen on [::1]:10000 and accept MAXCONN connections. */
int server(int *fds)
{
	int i = 0, on = 1, fd, fdpoll[MAXCONN];
	struct sockaddr_in6 addr;
	socklen_t socklen = sizeof(addr);

	close(fds[0]);

	fd = socket(PF_INET6, SOCK_STREAM, 0);
	if (fd == -1) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));
	addr.sin6_family = AF_INET6;
	addr.sin6_port = htons(10000);
	addr.sin6_addr = in6addr_loopback;

	/* Allow rebinding while old connections are still in TIME_WAIT. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on))) {
		perror("setsockopt");
		return 1;
	}

	if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr))) {
		perror("bind");
		return 1;
	}

	if (listen(fd, MAXCONN)) {
		perror("listen");
		return 1;
	}

	/* Tell the client we are listening; the value itself is ignored. */
	if (write(fds[1], &i, sizeof(i)) == -1) {
		perror("write");
		return 1;
	}

	for (i = 0; i < MAXCONN; i++) {
		int len, f = accept(fd, (struct sockaddr *)&addr, &socklen);
		if (f == -1) {
			perror("accept");
			return 1;
		}

		/* Remember the socket; it stays open until the process exits. */
		fdpoll[i] = f;

		len = read(f, &f, sizeof(f));
		if (!len) {
			fprintf(stderr, "read returned 0 bytes\n");
			return 1;
		}

		if (len == -1) {
			perror("read");
			return 1;
		}
	}

	return 0;
}

int main(int argc, char *argv[])
{
	int fds[2];
	int pid;

	/* The pipe is used only to synchronize server startup with the client. */
	if (pipe(fds)) {
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid == -1) {
		perror("fork");
		return 1;
	}

	if (!pid)
		return client(fds);
	else
		return server(fds);
}
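To confirm that a run of the program above really left timewait sockets behind, the socket table in /proc/net/tcp6 can be scanned for rows whose state column is 06 (TIME_WAIT). The helper below is a small sketch of that check and is not part of the original attachment; `parse_tcp_state()` and `count_time_wait()` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

#define TCP_STATE_TIME_WAIT 0x06	/* TIME_WAIT as shown in the "st" column */

/*
 * Parse the "st" column of one /proc/net/tcp or /proc/net/tcp6 row.
 * Returns the state, or -1 for the header line or anything unparsable.
 */
int parse_tcp_state(const char *line)
{
	unsigned int state;

	/* Row layout: "sl: local_addr:port remote_addr:port st ..." */
	if (sscanf(line, " %*d: %*[0-9A-Fa-f]:%*x %*[0-9A-Fa-f]:%*x %x",
		   &state) != 1)
		return -1;
	return (int)state;
}

/* Count the rows in TIME_WAIT state in an already opened table. */
int count_time_wait(FILE *table)
{
	char line[512];
	int count = 0;

	assert(table != NULL);
	while (fgets(line, sizeof(line), table))
		if (parse_tcp_state(line) == TCP_STATE_TIME_WAIT)
			count++;
	return count;
}
```

A caller would open /proc/net/tcp6 with fopen() inside the namespace under test, pass the stream to `count_time_wait()`, and compare the count before and after the namespace is destroyed.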
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-14 9:13 ` Eric W. Biederman
2009-05-14 9:21 ` Daniel Lezcano
2009-05-14 9:42 ` Daniel Lezcano
@ 2009-05-24 13:26 ` Daniel Lezcano
2009-05-24 13:54 ` Eric W. Biederman
2009-06-03 0:40 ` Eric W. Biederman
2 siblings, 2 replies; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-24 13:26 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev
Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>
>> May be you can activate the NETNS_REFCNT_DEBUG in order to check if the timewait
>> socket
>> were destroyed at the namespace destruction ? Unfortunately it looks like the
>> option is not in the Kconfig :(
>>
>
> Looks like a good starting place.
>
> I will enable that when I respin my internal kernel.
>
> I don't have a good reproducer at the moment.... So I was hoping we could
> figure this out with code inspection.
>
Hi Eric,
did you succeed in reproducing the bug with the test program I sent you?
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-24 13:26 ` Daniel Lezcano
@ 2009-05-24 13:54 ` Eric W. Biederman
2009-06-03 0:40 ` Eric W. Biederman
1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-24 13:54 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev
Daniel Lezcano <daniel.lezcano@free.fr> writes:
> did you succeeded to reproduce the bug with the test program I sent you ?
Grr. My apologies. I haven't had a chance to play with that yet.
Thank you for the reminder.
Eric
* Re: Kernel panic in inet_twdr_do_twkill_work
2009-05-24 13:26 ` Daniel Lezcano
2009-05-24 13:54 ` Eric W. Biederman
@ 2009-06-03 0:40 ` Eric W. Biederman
1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2009-06-03 0:40 UTC (permalink / raw)
To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev
Daniel Lezcano <daniel.lezcano@free.fr> writes:
> Eric W. Biederman wrote:
>> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>>
>>
>>> May be you can activate the NETNS_REFCNT_DEBUG in order to check if the timewait
>>> socket
>>> were destroyed at the namespace destruction ? Unfortunately it looks like the
>>> option is not in the Kconfig :(
>>>
>>
>> Looks like a good starting place.
>>
>> I will enable that when I respin my internal kernel.
>>
>> I don't have a good reproducer at the moment.... So I was hoping we could
>> figure this out with code inspection.
>>
> Hi Eric,
>
> did you succeeded to reproduce the bug with the test program I sent you ?
Weird. I finally got around to running your little test app, and it doesn't trigger
the panic here.
At the same time, I am starting to see what I think is this error more often.
Eric