public inbox for linux-kernel@vger.kernel.org
* Oops in sock_poll
From: Fabien Ribes @ 2002-01-17 16:51 UTC
  To: linux-kernel@vger.kernel.org

Hi all,

I have a kernel Oops on a ppc 2.4.5 kernel with an application listening
to a high throughput of incoming messages on a netlink socket. The
application runs select() on the netlink socket file descriptor,
followed by a recvmsg() call, in an endless loop.
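
For readers who want to picture the loop, one iteration might look like
the sketch below. It is only an illustration, not the application's
actual code: the function name wait_and_receive and the timeout
parameter are made up for the example, and the descriptor can be any
datagram-style socket (the real program would pass its netlink socket).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: one iteration of the select()/recvmsg() loop.
 * Waits until fd is readable (or the timeout expires), then reads one
 * message. Returns the number of bytes received, 0 on timeout, or -1
 * on error. */
ssize_t wait_and_receive(int fd, void *buf, size_t len, int timeout_sec)
{
    fd_set readfds;
    struct timeval tv;
    struct iovec iov;
    struct msghdr msg;
    int ready;

    assert(buf != NULL && len > 0);

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    /* Step B/C in the description below: select() on the descriptor. */
    ready = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;           /* 0 = timeout, -1 = error */

    /* Step D/E: recvmsg() on the same descriptor. */
    iov.iov_base = buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    return recvmsg(fd, &msg, 0);
}
```

The application would call such a helper forever in a loop; each call
enters the kernel twice, once for select and once for recvmsg.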

The Oops (not saved, and hard to reproduce) showed a crash in the
sock_poll function (kernel/net/socket.c). After investigation, the crash
is due to a NULL pointer in the f_dentry member of the file structure.
This pointer is set to NULL in the fput function
(kernel/fs/file_table.c). The backtraces show that the calling function
is sys_recvmsg (kernel/net/socket.c).

My understanding of the problem is the following:

- When everything goes right:

A/ When the netlink socket is opened, its associated file structure is
initialised with f_count set to 1, and a dentry;

B/ When select is executed, f_count is increased to 2;

C/ When select ends, f_count is decreased to 1;

D/ When recvmsg is executed, f_count is increased to 2;

E/ When recvmsg ends, f_count is decreased to 1;

F/ Loop forever to B/

- When the problem occurs:

A/ When the netlink socket is opened, its associated file structure is
initialised with f_count set to 1, and a dentry;

B/ When select is executed, f_count is increased to 2;

C/ When select ends, f_count is decreased to 1;

D/ When recvmsg is executed, f_count is increased to 2;

????/ SOMETHING decreases f_count to 1;

E/ When recvmsg ends, f_count is decreased to 0, AND THEREFORE the
f_dentry member of the file is set to NULL (since the file is considered
unused);

F/ When select is executed, f_count is incremented to 1, but f_dentry is
NULL and therefore the following code crashes in the sock_poll function:
 sock = socki_lookup(file->f_dentry->d_inode);
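
The two sequences above can be replayed with a small user-space model.
This is a sketch of the reasoning only, not kernel code: toy_file,
toy_dentry, toy_fget and toy_fput are invented stand-ins that model just
the f_count and f_dentry fields, and the "extra put" is the unexplained
event marked ????/ above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures (hypothetical names); only
 * the two fields that matter to the argument are modelled. */
struct toy_dentry { int d_inode; };

struct toy_file {
    int f_count;
    struct toy_dentry *f_dentry;
};

static void toy_fget(struct toy_file *f)
{
    f->f_count++;
}

static void toy_fput(struct toy_file *f)
{
    assert(f->f_count > 0);
    if (--f->f_count == 0)
        f->f_dentry = NULL;     /* file is considered unused */
}

/* Replays steps A/ to F/. A non-zero extra_put injects the unexplained
 * decrement between D/ and E/. Returns 1 if the next select would
 * dereference a NULL f_dentry, 0 otherwise. */
int next_select_would_oops(int extra_put)
{
    struct toy_dentry d = { 42 };
    struct toy_file f = { 1, &d };  /* A/ open: f_count = 1 */

    toy_fget(&f);                   /* B/ select starts:  2 */
    toy_fput(&f);                   /* C/ select ends:    1 */
    toy_fget(&f);                   /* D/ recvmsg starts: 2 */
    if (extra_put)
        toy_fput(&f);               /* ????/ SOMETHING:   1 */
    toy_fput(&f);                   /* E/ recvmsg ends: 1, or 0 */

    toy_fget(&f);                   /* F/ next select takes a reference */
    return f.f_dentry == NULL;
}
```

In this model next_select_would_oops(0) returns 0 and
next_select_would_oops(1) returns 1, which matches the two scenarios: a
single unbalanced put is enough to leave f_dentry NULL for the next
select.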

Do you have any idea what event could have decreased the f_count
member between D/ and E/?
Could you give me some pointers to continue my investigation?

Thanks a lot for your help,
Fabien


* Re: Oops in sock_poll
From: David S. Miller @ 2002-01-17 21:12 UTC
  To: fabien.ribes; +Cc: linux-kernel


Can you reproduce this with a more recent kernel?  Anything
>=2.4.9 (this includes all Red Hat errata kernels therefore)
would be sufficient.

And also please provide a full decoded OOPS log as well, thanks.


* Re: Oops in sock_poll
From: Fabien Ribes @ 2002-01-18  9:01 UTC
  To: David S. Miller; +Cc: linux-kernel

Hi,

"David S. Miller" wrote:
> 
> Can you reproduce this with a more recent kernel?  Anything
> >=2.4.9 (this includes all Red Hat errata kernels therefore)
> would be sufficient.
The kernel used is customized in many ways, it is a long work to upgrade
...

> And also please provide a full decoded OOPS log as well, thanks.
Here it is:
ksymoops 2.3.7 on i686 2.4.3.  Options used
     -v vmlinux (specified)
     -K (specified)
     -L (specified)
     -O (specified)
     -m System.map (specified)
     -t elf_powerpc -a powerpc:common

Oops: kernel access of bad area, sig: 11
NIP: C00A0EB4 XER: 00000000 LR: C0046B20 SP: C1981E60 REGS: c1981db0 TRAP: 0300
MSR: 00009230 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c1980000[148] 'feemond' Last syscall: 142 
last math 00000000 last altivec 00000000
GPR00: C0046B20 C1981E60 C1980000 C1AEFBA0 C1981E78 C1981E78 C1BEB780 00000000
GPR08: 00000000 00000000 00000000 C1BEB800 C1BEB780 1001D8B8 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 C1981EE8 00000005 000000B4 00000000
GPR24: C1981E78 00000004 00000145 C1981EC8 00000000 00000000 00000010 C1AEFBA0
Call backtrace: 
C0046884 C0046B20 C0046FC4 C0007E1C C000266C 10001888 100016F8 
10000B30 0FEF6A6C 00000000 
Warning (Oops_read): Code line not seen, dumping what data is available

>>NIP; c00a0eb4 <sock_poll+14/3c>   <=====
Trace; c0046884 <poll_freewait+54/70>
Trace; c0046b20 <do_select+e4/208>
Trace; c0046fc4 <sys_select+330/470>
Trace; c0007e1c <ppc_select+a0/b0>
Trace; c000266c <ret_from_syscall_1+0/b4>
Trace; 10001888 Before first symbol
Trace; 100016f8 Before first symbol
Trace; 10000b30 Before first symbol
Trace; 0fef6a6c Before first symbol
Trace; 00000000 Before first symbol


1 warning issued.  Results may not be reliable.


* Re: Oops in sock_poll
From: David S. Miller @ 2002-01-18 10:55 UTC
  To: fabien.ribes; +Cc: linux-kernel

   From: Fabien Ribes <fabien.ribes@cgey.com>
   Date: Fri, 18 Jan 2002 09:01:32 +0000

   "David S. Miller" wrote:
   > 
   > Can you reproduce this with a more recent kernel?  Anything
   > >=2.4.9 (this includes all Red Hat errata kernels therefore)
   > would be sufficient.

   The kernel used is customized in many ways, it is a long work to upgrade

Then I can't help you... there have probably been many
networking bugs fixed since 2.4.9


* Re: Oops in sock_poll
From: kuznet @ 2002-01-19 18:41 UTC
  To: David S. Miller; +Cc: linux-kernel

Hello!

>    The kernel used is customized in many ways, it is a long work to upgrade
> 
> Then I can't help you... there have probably been many
> networking bugs fixed since 2.4.9

I do not remember that we _ever_ had problems with leaking f_count.
And it is so far from networking... :-)

The "customized in many ways" bug sounds like a better candidate to be fixed.

Alexey

