* Kernel panic in inet_twdr_do_twkill_work
@ 2009-05-14  1:22 Eric W. Biederman
  2009-05-14  7:53 ` Daniel Lezcano
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-14  1:22 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev


So far I have only seen this twice.  But the backtrace looks
almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857

The kernels I saw this on were patched versions of 2.6.28 with some
network namespace backports.  Commit
d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.

Daniel, any ideas?

Eric



* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  1:22 Kernel panic in inet_twdr_do_twkill_work Eric W. Biederman
@ 2009-05-14  7:53 ` Daniel Lezcano
  2009-05-14  8:18   ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14  7:53 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev

Eric W. Biederman wrote:
> So far I have only seen this twice.  But the backtrace looks
> almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857
>
> The kernels I saw this on were patched versions of 2.6.28 with some
> network namespace backports.  Commit
> d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
>
> Daniel, any ideas?
>   
Hi Eric,

I found this one. Maybe it is related to your problem:

commit 2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504

Let me know :)





* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  7:53 ` Daniel Lezcano
@ 2009-05-14  8:18   ` Eric W. Biederman
  2009-05-14  8:33     ` Daniel Lezcano
  0 siblings, 1 reply; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-14  8:18 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev

Daniel Lezcano <daniel.lezcano@free.fr> writes:

> Eric W. Biederman wrote:
>> So far I have only seen this twice.  But the backtrace looks
>> almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857
>>
>> The kernels I saw this on were patched versions of 2.6.28 with some
>> network namespace backports.  Commit
>> d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
>>
>> Daniel, any ideas?
>>   
> Hi Eric,
>
> I found this one. Maybe it is related to your problem:
>
> commit 2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504
>
> Let me know :)

"netns: oops in ip[6]_frag_reasm incrementing stats" does not look likely.

There is currently no real IPv6 traffic on our network, and the panic
is definitely in inet_twdr_do_twkill_work.

Further, we are getting the net of a timewait socket.  So I don't see how
a problem with NULL devs could have anything to do with it.

I really suspect the purge code is not succeeding.

Eric
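
To make the suspicion above concrete: a purge at namespace teardown has to
remove every timewait entry that still points at the dying namespace, or
later timer work would follow a stale pointer. Below is a minimal userspace
sketch of that invariant. The names toy_net, toy_tw and the toy_tw_*
helpers are hypothetical and exist only for illustration; this is not the
kernel's data structure or its purge implementation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a namespace and a timewait entry. */
struct toy_net {
	int id;
};

struct toy_tw {
	struct toy_net *net;	/* namespace this entry belongs to */
	struct toy_tw *next;
};

/* Push a new entry for @net; on allocation failure the list is returned unchanged. */
struct toy_tw *toy_tw_add(struct toy_tw *head, struct toy_net *net)
{
	struct toy_tw *tw = malloc(sizeof(*tw));

	if (!tw)
		return head;
	tw->net = net;
	tw->next = head;
	return tw;
}

/* Unlink and free every entry that belongs to @net; returns how many were removed. */
int toy_tw_purge(struct toy_tw **head, const struct toy_net *net)
{
	struct toy_tw **pp = head;
	int removed = 0;

	while (*pp) {
		struct toy_tw *tw = *pp;

		if (tw->net == net) {
			*pp = tw->next;	/* unlink before freeing */
			free(tw);
			removed++;
		} else {
			pp = &tw->next;
		}
	}
	return removed;
}

/* Count entries still pointing at @net. */
int toy_tw_count(const struct toy_tw *head, const struct toy_net *net)
{
	int n = 0;

	for (; head; head = head->next)
		if (head->net == net)
			n++;
	return n;
}
```

In this model, a non-zero toy_tw_count() for a namespace after its purge
would mark exactly the entries that a later kill pass could trip over once
the namespace memory is gone.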



* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  8:18   ` Eric W. Biederman
@ 2009-05-14  8:33     ` Daniel Lezcano
  2009-05-14  9:13       ` Eric W. Biederman
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14  8:33 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev

Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>   
>> Eric W. Biederman wrote:
>>     
>>> So far I have only seen this twice.  But the backtrace looks
>>> almost identical to the one in commit d315492b1a6ba29da0fa2860759505ae1b2db857
>>>
>>> The kernels I saw this on were patched versions of 2.6.28 with some
>>> network namespace backports.  Commit
>>> d315492b1a6ba29da0fa2860759505ae1b2db857 was definitely present.
>>>
>>> Daniel, any ideas?
>>>   
>>>       
>> Hi Eric,
>>
>> I found this one. Maybe it is related to your problem:
>>
>> commit 2bad35b7c9588eb5e65c03bcae54e7eb6b1a6504
>>
>> Let me know :)
>>     
>
> "netns: oops in ip[6]_frag_reasm incrementing stats" does not look likely.
>
> There is currently no real IPv6 traffic on our network, and the panic
> is definitely in inet_twdr_do_twkill_work.
>
> Further, we are getting the net of a timewait socket.  So I don't see how
> a problem with NULL devs could have anything to do with it.
>
> I really suspect the purge code is not succeeding.
>   
Maybe you can activate NETNS_REFCNT_DEBUG in order to check whether the
timewait sockets were destroyed at namespace destruction? Unfortunately,
it looks like the option is not in the Kconfig :(


* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  8:33     ` Daniel Lezcano
@ 2009-05-14  9:13       ` Eric W. Biederman
  2009-05-14  9:21         ` Daniel Lezcano
                           ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-14  9:13 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev

Daniel Lezcano <daniel.lezcano@free.fr> writes:

> Maybe you can activate NETNS_REFCNT_DEBUG in order to check whether the
> timewait sockets were destroyed at namespace destruction? Unfortunately,
> it looks like the option is not in the Kconfig :(

Looks like a good starting place.

I will enable that when I respin my internal kernel.

I don't have a good reproducer at the moment....  So I was hoping we could
figure this out with code inspection.

Eric



* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  9:13       ` Eric W. Biederman
@ 2009-05-14  9:21         ` Daniel Lezcano
  2009-05-14  9:42         ` Daniel Lezcano
  2009-05-24 13:26         ` Daniel Lezcano
  2 siblings, 0 replies; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14  9:21 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev

Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>   
>> Maybe you can activate NETNS_REFCNT_DEBUG in order to check whether the
>> timewait sockets were destroyed at namespace destruction? Unfortunately,
>> it looks like the option is not in the Kconfig :(
>>     
>
> Looks like a good starting place.
>
> I will enable that when I respin my internal kernel.
>
> I don't have a good reproducer at the moment....  So I was hoping we could
> figure this out with code inspection.
>   

I remember I wrote a small program to create hundreds of timewait sockets
to test the purge.
I will see if I can find it and try it as a reproducer.


* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  9:13       ` Eric W. Biederman
  2009-05-14  9:21         ` Daniel Lezcano
@ 2009-05-14  9:42         ` Daniel Lezcano
  2009-05-24 13:26         ` Daniel Lezcano
  2 siblings, 0 replies; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-14  9:42 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev

[-- Attachment #1: Type: text/plain, Size: 690 bytes --]

Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>   
>> Maybe you can activate NETNS_REFCNT_DEBUG in order to check whether the
>> timewait sockets were destroyed at namespace destruction? Unfortunately,
>> it looks like the option is not in the Kconfig :(
>>     
>
> Looks like a good starting place.
>
> I will enable that when I respin my internal kernel.
>
> I don't have a good reproducer at the moment....  So I was hoping we could
> figure this out with code inspection.
>   
I found this one, which creates a lot of timewait sockets. I tried it on a
2.6.29 kernel and was not able to reproduce the panic. Can you check whether
this program reproduces the bug?

[-- Attachment #2: timewait.c --]
[-- Type: text/x-csrc, Size: 2286 bytes --]

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/poll.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>

#include <unistd.h>

#define MAXCONN 10000

int client(int *fds)
{
	int i, len;
	struct sockaddr_in6 addr;

	close(fds[1]);

	memset(&addr, 0, sizeof(addr));

	addr.sin6_family = AF_INET6;
	addr.sin6_port = htons(10000);
	addr.sin6_addr = in6addr_loopback;

	if (read(fds[0], &i, sizeof(i)) == -1) {
		perror("read");
		return 1;
	}

	for (i = 0; i < MAXCONN; i++) {
		int fd = socket(PF_INET6, SOCK_STREAM, 0);
		if (fd == -1) {
			perror("socket");
			return 1;
		}

		if (connect(fd, (const struct sockaddr *)&addr, sizeof(addr))) {
			perror("connect");
			return 1;
		}
		
		len = write(fd, &fd, sizeof(fd));
		if (!len) {
			fprintf(stderr, "write wrote 0 bytes\n");
			return 1;
		}
		if (len == -1) {
			perror("write");
			return 1;
		}
	}

	return 0;
}

int server(int *fds)
{
	int i, fd, fdpoll[MAXCONN];
	struct sockaddr_in6 addr;
	socklen_t socklen = sizeof(addr);

	close(fds[0]);

	fd = socket(PF_INET6, SOCK_STREAM, 0);
	if (fd == -1) {
		perror("socket");
		return 1;
	}

	memset(&addr, 0, sizeof(addr));

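	addr.sin6_family = AF_INET6;
	addr.sin6_port = htons(10000);
	addr.sin6_addr = in6addr_loopback;

	/* SO_REUSEADDR expects a non-zero int flag; pass an explicit 1 rather than the fd value. */
	if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &(int){ 1 }, sizeof(int))) {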
		perror("setsockopt");
		return 1;
	}

	if (bind(fd, (const struct sockaddr *)&addr, sizeof(addr))) {
		perror("bind");
		return 1;
	}

	if (listen(fd, MAXCONN)) {
		perror("listen");
		return 1;
	}
	
	/* Tell the client the listener is ready; the value written is ignored. */
	i = 0;
	if (write(fds[1], &i, sizeof(i)) == -1) {
		perror("write");
		return 1;
	}

	for (i = 0; i < MAXCONN; i++) {
		int len, f = accept(fd, (struct sockaddr *)&addr, &socklen);
		if (f == -1) {
			perror("accept");
			return 1;
		}
		fdpoll[i] = f;

		len = read(f, &f, sizeof(f)); 
		if (!len) {
			fprintf(stderr, "read returned 0 bytes\n");
			return 1;
		}
		if (len == -1) {
			perror("read");
			return 1;
		}
	}

	return 0;
}

int main(int argc, char *argv[])
{
	int fds[2];
	pid_t pid;

	if (pipe(fds)) {
		perror("pipe");
		return 1;
	}

	pid = fork();
	if (pid == -1) {
		perror("fork");
		return 1;
	}

	if (!pid) 
		return client(fds);
	else 
		return server(fds);
}
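
One way to confirm that a run of the program above actually leaves timewait
sockets behind is to count the entries in state 06 (TIME_WAIT) in
/proc/net/tcp6, since the test connects over IPv6 loopback. The sketch below
assumes the usual /proc/net/tcp6 column layout (slot, local address, remote
address, state, ...). count_tcp_state and count_time_wait are hypothetical
helper names and are not part of the attachment.

```c
#include <stdio.h>

/* TIME_WAIT is reported as hexadecimal 06 in the "st" column. */
#define TCP_STATE_TIME_WAIT 0x06

/* Count the lines of an already opened /proc/net/tcp{,6} style stream
 * whose state column equals @state. Lines that do not parse, such as
 * the header, are skipped. */
long count_tcp_state(FILE *f, unsigned int state)
{
	char line[512];
	long count = 0;

	while (fgets(line, sizeof(line), f)) {
		unsigned int st;

		/* "sl: local_addr:port remote_addr:port st ..." */
		if (sscanf(line, " %*d: %*[0-9A-Fa-f]:%*x %*[0-9A-Fa-f]:%*x %x",
			   &st) == 1 && st == state)
			count++;
	}
	return count;
}

/* Open @path and count TIME_WAIT entries; returns -1 if it cannot be opened. */
long count_time_wait(const char *path)
{
	FILE *f = fopen(path, "r");
	long n;

	if (!f)
		return -1;
	n = count_tcp_state(f, TCP_STATE_TIME_WAIT);
	fclose(f);
	return n;
}
```

Calling count_time_wait("/proc/net/tcp6") right after the test program
exits should give a rough idea of how many timewait sockets it left. The
file is read from whatever network namespace the caller is in, so the check
would need to run inside the namespace under test.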


* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-14  9:13       ` Eric W. Biederman
  2009-05-14  9:21         ` Daniel Lezcano
  2009-05-14  9:42         ` Daniel Lezcano
@ 2009-05-24 13:26         ` Daniel Lezcano
  2009-05-24 13:54           ` Eric W. Biederman
  2009-06-03  0:40           ` Eric W. Biederman
  2 siblings, 2 replies; 10+ messages in thread
From: Daniel Lezcano @ 2009-05-24 13:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, Denis V. Lunev

Eric W. Biederman wrote:
> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>
>   
>> Maybe you can activate NETNS_REFCNT_DEBUG in order to check whether the
>> timewait sockets were destroyed at namespace destruction? Unfortunately,
>> it looks like the option is not in the Kconfig :(
>>     
>
> Looks like a good starting place.
>
> I will enable that when I respin my internal kernel.
>
> I don't have a good reproducer at the moment....  So I was hoping we could
> figure this out with code inspection.
>   
Hi Eric,

Did you succeed in reproducing the bug with the test program I sent you?



* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-24 13:26         ` Daniel Lezcano
@ 2009-05-24 13:54           ` Eric W. Biederman
  2009-06-03  0:40           ` Eric W. Biederman
  1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2009-05-24 13:54 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev

Daniel Lezcano <daniel.lezcano@free.fr> writes:

> Did you succeed in reproducing the bug with the test program I sent you?

Grr.  My apologies.  I haven't had a chance to play with that yet.
Thank you for the reminder. 

Eric



* Re: Kernel panic in inet_twdr_do_twkill_work
  2009-05-24 13:26         ` Daniel Lezcano
  2009-05-24 13:54           ` Eric W. Biederman
@ 2009-06-03  0:40           ` Eric W. Biederman
  1 sibling, 0 replies; 10+ messages in thread
From: Eric W. Biederman @ 2009-06-03  0:40 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: netdev, Denis V. Lunev

Daniel Lezcano <daniel.lezcano@free.fr> writes:

> Eric W. Biederman wrote:
>> Daniel Lezcano <daniel.lezcano@free.fr> writes:
>>
>>   
>>> Maybe you can activate NETNS_REFCNT_DEBUG in order to check whether the
>>> timewait sockets were destroyed at namespace destruction? Unfortunately,
>>> it looks like the option is not in the Kconfig :(
>>>     
>>
>> Looks like a good starting place.
>>
>> I will enable that when I respin my internal kernel.
>>
>> I don't have a good reproducer at the moment....  So I was hoping we could
>> figure this out with code inspection.
>>   
> Hi Eric,
>
> Did you succeed in reproducing the bug with the test program I sent you?

Weird.  I finally got around to running your little test app, and I don't trigger
it here.

At the same time I am starting to see what I think is this error more often.

Eric


end of thread, other threads:[~2009-06-03  0:40 UTC | newest]
