netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
       [not found] <bug-10737-10286@http.bugzilla.kernel.org/>
@ 2008-05-17 21:10 ` Andrew Morton
  2008-05-18  1:39   ` Patrick McHardy
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-05-17 21:10 UTC (permalink / raw)
  To: netdev; +Cc: bugme-daemon, devzero, Robert Olsson, Denis V. Lunev

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Sat, 17 May 2008 13:59:25 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=10737
> 
>            Summary: pktgen procfs problem
>            Product: Networking
>            Version: 2.5
>      KernelVersion: 2.6.26-rc2
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>         AssignedTo: acme@ghostprotocols.net
>         ReportedBy: devzero@web.de
> 
> 
> Latest working kernel version: n/a
> Earliest failing kernel version: 2.6.26-rc2
> Distribution: suse 10.1
> Hardware Environment: p4 + gigabyte i915g MoBo
> Software Environment:
> Problem Description:
> pktgen module ocaasional seems to leak procfs entry
> 
> May 18 00:00:34 test kernel: [ 2663.373955] pktgen v2.69: Packet Generator for
> packet performance testing.
> May 18 00:00:34 test kernel: [ 2663.384819] remove_proc_entry: removing
> non-empty directory 'net/pktgen', leaking at least 'kpktgend_1'

^^

Possibly we were always leaking this procfs entry, and newly-added
procfs diagnostics are now detecting it.


> May 18 00:00:34 test kernel: [ 2663.384819] ------------[ cut here
> ]------------
> May 18 00:00:34 test kernel: [ 2663.384819] WARNING: at fs/proc/generic.c:799
> remove_proc_entry+0x110/0x12d()
> May 18 00:00:34 test kernel: [ 2663.384819] Modules linked in: pktgen(-) ip_gre
> lzo lzo_decompress lzo_compress tgr192 cts seed camellia fcrypt anubis cast6
> cast5 ccm gcm ctr lrw gf128mul aes_i586 aes_generic blowfish md4 cbc ecb
> sha1_generic aead 8250_accent 8250_exar_st16c554 8250_fourport 8250_boca
> 8250_hub6 ovcamchip i2c_core guillemot adi a3d cobra gameport parkbd
> deadline_iosched capifs ar7part mtd eql ide_generic siimage aec62xx trm290
> alim15x3 hpt34x hpt366 cmd64x cmd640 piix rz1000 slc90e66 jmicron cs5535 cs5530
> cs5520 sc1200 triflex ide_pci_generic atiixp pdc202xx_old pdc202xx_new tc86c001
> opti621 ns87415 cy82c693 amd74xx sis5513 via82cxxx serverworks it821x it8213
> ide_core eni suni zatm uPD98402 atm non_fatal p4_clockmod speedstep_lib
> freq_table decnet crypto_blkcipher sctp libcrc32c sunrpc psmouse ipv6 vboxdrv
> mousedev e100 mii intel_agp agpgart usbcore parport_pc parport megaraid_mbox
> megaraid_mm sd_mod scsi_mod [last unloaded: ax25]
> May 18 00:00:34 test kernel: [ 2663.384819] Pid: 31183, comm: modprobe Tainted:
> G        W 2.6.26-rc2-git5 #2
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0122aae>]
> warn_on_slowpath+0x41/0x6c
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0135c32>] ? up+0x2b/0x2f
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0122f99>] ?
> release_console_sem+0x184/0x18c
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c02f5f3e>] ?
> schedule_timeout+0x16/0x8b
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0123417>] ? printk+0x15/0x17
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c01a9a5a>]
> remove_proc_entry+0x110/0x12d
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c01ac70b>]
> proc_net_remove+0xf/0x11
> May 18 00:00:34 test kernel: [ 2663.384819]  [<f8a47be7>] pg_cleanup+0x5f/0x90
> [pktgen]
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0140899>]
> sys_delete_module+0x18e/0x1d9
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c016813d>] ?
> remove_vma+0x41/0x47
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0168aa7>] ?
> do_munmap+0x17d/0x197
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c0103811>]
> sysenter_past_esp+0x6a/0x91
> May 18 00:00:34 test kernel: [ 2663.384819]  [<c02f0000>] ?
> pcibios_setup+0xa8/0x2a0
> May 18 00:00:34 test kernel: [ 2663.384819]  =======================
> May 18 00:00:34 test kernel: [ 2663.384819] ---[ end trace 4351c3133409518a
> ]---
> 
> 
> Steps to reproduce: not sure yet, but i saw this while doing some module
> load/unload  regression test.
> 


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-17 21:10 ` [Bugme-new] [Bug 10737] New: pktgen procfs problem Andrew Morton
@ 2008-05-18  1:39   ` Patrick McHardy
  2008-05-18  4:56     ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-05-18  1:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, bugme-daemon, devzero, Robert Olsson, Denis V. Lunev

Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Sat, 17 May 2008 13:59:25 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
>
>   
>> http://bugzilla.kernel.org/show_bug.cgi?id=10737
>>
>>            Summary: pktgen procfs problem
>>            Product: Networking
>>            Version: 2.5
>>      KernelVersion: 2.6.26-rc2
>>           Platform: All
>>         OS/Version: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: normal
>>           Priority: P1
>>          Component: Other
>>         AssignedTo: acme@ghostprotocols.net
>>         ReportedBy: devzero@web.de
>>
>>
>> Latest working kernel version: n/a
>> Earliest failing kernel version: 2.6.26-rc2
>> Distribution: suse 10.1
>> Hardware Environment: p4 + gigabyte i915g MoBo
>> Software Environment:
>> Problem Description:
>> pktgen module ocaasional seems to leak procfs entry
>>
>> May 18 00:00:34 test kernel: [ 2663.373955] pktgen v2.69: Packet Generator for
>> packet performance testing.
>> May 18 00:00:34 test kernel: [ 2663.384819] remove_proc_entry: removing
>> non-empty directory 'net/pktgen', leaking at least 'kpktgend_1'
>>     
>
> ^^
>
> Possibly we were always leaking this procfs entry, and newly-added
> procfs diagnostics are now detecting it.
>   

I've been looking into the same problem, without much success so
far. The problem appears to affect any /proc/net file, but not
files outside of /proc/net, so I'm guessing its net-ns related.
A testcase found by Ben Greear is opening the file multiple times:

# /tmp/open /proc/net/kpktgen_0

=> refcnt goes to 1

^C

=> refcnt goes to 0

Without ^C and opening the file a second time:

# /tmp/open /proc/net/kpktgen_0

=> refcnt goes to 2 (sometimes also 11)

^C

=> refcnt stays at previous value.

The refcnt even leaks if the file can't be successfully opened,
for example because of lacking permissions.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-18  1:39   ` Patrick McHardy
@ 2008-05-18  4:56     ` Andrew Morton
  2008-05-18 12:06       ` Patrick McHardy
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-05-18  4:56 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: netdev, bugme-daemon, devzero, Robert Olsson, Denis V. Lunev

On Sun, 18 May 2008 03:39:42 +0200 Patrick McHardy <kaber@trash.net> wrote:

> Andrew Morton wrote:
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > On Sat, 17 May 2008 13:59:25 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote:
> >
> >   
> >> http://bugzilla.kernel.org/show_bug.cgi?id=10737
> >>
> >>            Summary: pktgen procfs problem
> >>            Product: Networking
> >>            Version: 2.5
> >>      KernelVersion: 2.6.26-rc2
> >>           Platform: All
> >>         OS/Version: Linux
> >>               Tree: Mainline
> >>             Status: NEW
> >>           Severity: normal
> >>           Priority: P1
> >>          Component: Other
> >>         AssignedTo: acme@ghostprotocols.net
> >>         ReportedBy: devzero@web.de
> >>
> >>
> >> Latest working kernel version: n/a
> >> Earliest failing kernel version: 2.6.26-rc2
> >> Distribution: suse 10.1
> >> Hardware Environment: p4 + gigabyte i915g MoBo
> >> Software Environment:
> >> Problem Description:
> >> pktgen module ocaasional seems to leak procfs entry
> >>
> >> May 18 00:00:34 test kernel: [ 2663.373955] pktgen v2.69: Packet Generator for
> >> packet performance testing.
> >> May 18 00:00:34 test kernel: [ 2663.384819] remove_proc_entry: removing
> >> non-empty directory 'net/pktgen', leaking at least 'kpktgend_1'
> >>     
> >
> > ^^
> >
> > Possibly we were always leaking this procfs entry, and newly-added
> > procfs diagnostics are now detecting it.
> >   
> 
> I've been looking into the same problem, without much success so
> far. The problem appears to affect any /proc/net file, but not
> files outside of /proc/net, so I'm guessing its net-ns related.
> A testcase found by Ben Greear is opening the file multiple times:
> 
> # /tmp/open /proc/net/kpktgen_0
> 
> => refcnt goes to 1
> 
> ^C
> 
> => refcnt goes to 0
> 
> Without ^C and opening the file a second time:
> 
> # /tmp/open /proc/net/kpktgen_0
> 
> => refcnt goes to 2 (sometimes also 11)
> 
> ^C
> 
> => refcnt stays at previous value.
> 
> The refcnt even leaks if the file can't be successfully opened,
> for example because of lacking permissions.

urgh.  Is any of this known to be post-2.6.25?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-18  4:56     ` Andrew Morton
@ 2008-05-18 12:06       ` Patrick McHardy
  2008-05-18 13:24         ` Patrick McHardy
  0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-05-18 12:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, bugme-daemon, devzero, Robert Olsson, Denis V. Lunev

Andrew Morton wrote:
> On Sun, 18 May 2008 03:39:42 +0200 Patrick McHardy <kaber@trash.net> wrote:
>
>   
>> I've been looking into the same problem, without much success so
>> far. The problem appears to affect any /proc/net file, but not
>> files outside of /proc/net, so I'm guessing its net-ns related.
>> A testcase found by Ben Greear is opening the file multiple times:
>>
>> # /tmp/open /proc/net/kpktgen_0
>>
>> => refcnt goes to 1
>>
>> ^C
>>
>> => refcnt goes to 0
>>
>> Without ^C and opening the file a second time:
>>
>> # /tmp/open /proc/net/kpktgen_0
>>
>> => refcnt goes to 2 (sometimes also 11)
>>
>> ^C
>>
>> => refcnt stays at previous value.
>>
>> The refcnt even leaks if the file can't be successfully opened,
>> for example because of lacking permissions.
>>     
>
> urgh.  Is any of this known to be post-2.6.25?
>   

2.6.25 is also affected. I don't know about earlier kernels.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-18 12:06       ` Patrick McHardy
@ 2008-05-18 13:24         ` Patrick McHardy
  2008-05-18 15:31           ` Patrick McHardy
  0 siblings, 1 reply; 19+ messages in thread
From: Patrick McHardy @ 2008-05-18 13:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, bugme-daemon, devzero, Robert Olsson, Denis V. Lunev

Patrick McHardy wrote:
> Andrew Morton wrote:
>> On Sun, 18 May 2008 03:39:42 +0200 Patrick McHardy <kaber@trash.net> 
>> wrote:
>>
>>  
>>> I've been looking into the same problem, without much success so
>>> far. The problem appears to affect any /proc/net file, but not
>>> files outside of /proc/net, so I'm guessing its net-ns related.
>>> A testcase found by Ben Greear is opening the file multiple times:
>>>
>>> # /tmp/open /proc/net/kpktgen_0
>>>
>>> => refcnt goes to 1
>>>
>>> ^C
>>>
>>> => refcnt goes to 0
>>>
>>> Without ^C and opening the file a second time:
>>>
>>> # /tmp/open /proc/net/kpktgen_0
>>>
>>> => refcnt goes to 2 (sometimes also 11)
>>>
>>> ^C
>>>
>>> => refcnt stays at previous value.
>>>
>>> The refcnt even leaks if the file can't be successfully opened,
>>> for example because of lacking permissions.
>>>     
>>
>> urgh.  Is any of this known to be post-2.6.25?
>>   
> 
> 2.6.25 is also affected. I don't know about earlier kernels.

Some more information: the problem seems to occur only if
the file is opened by two different processes.

I'm starting a bisection now.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-18 13:24         ` Patrick McHardy
@ 2008-05-18 15:31           ` Patrick McHardy
  2008-05-19  7:54             ` Denis V. Lunev
  2008-05-19 21:34             ` Eric W. Biederman
  0 siblings, 2 replies; 19+ messages in thread
From: Patrick McHardy @ 2008-05-18 15:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: netdev, bugme-daemon, devzero, Robert Olsson, Denis V. Lunev,
	Pavel Emelyanov, Eric W. Biederman, Ben Greear

Patrick McHardy wrote:
>>>> I've been looking into the same problem, without much success so
>>>> far. The problem appears to affect any /proc/net file, but not
>>>> files outside of /proc/net, so I'm guessing its net-ns related.
>>>> A testcase found by Ben Greear is opening the file multiple times:
>>>>
>>>> # /tmp/open /proc/net/kpktgen_0
>>>>
>>>> => refcnt goes to 1
>>>>
>>>> ^C
>>>>
>>>> => refcnt goes to 0
>>>>
>>>> Without ^C and opening the file a second time:
>>>>
>>>> # /tmp/open /proc/net/kpktgen_0
>>>>
>>>> => refcnt goes to 2 (sometimes also 11)
>>>>
>>>> ^C
>>>>
>>>> => refcnt stays at previous value.
>>>>
>>>> The refcnt even leaks if the file can't be successfully opened,
>>>> for example because of lacking permissions.
>>>>     
>>>
>>> urgh.  Is any of this known to be post-2.6.25?
>>>   
>>
>> 2.6.25 is also affected. I don't know about earlier kernels.
> 
> Some more information: the problem seems to occur only if
> the file is opened by two different processes.
> 
> I'm starting a bisection now.


git-bisect identified e9720acd ([NET]: Make /proc/net a symlink
on /proc/self/net (v3)) as the guilty commit. I couldn't find
the problem in that commit, so someone with a better understanding
of how this is supposed to work should look into it.



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-18 15:31           ` Patrick McHardy
@ 2008-05-19  7:54             ` Denis V. Lunev
  2008-05-19 10:34               ` Patrick McHardy
  2008-05-20  0:26               ` Ben Greear
  2008-05-19 21:34             ` Eric W. Biederman
  1 sibling, 2 replies; 19+ messages in thread
From: Denis V. Lunev @ 2008-05-19  7:54 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Andrew Morton, netdev, bugme-daemon, devzero, Robert Olsson,
	Pavel Emelyanov, Eric W. Biederman, Ben Greear

[-- Attachment #1: Type: text/plain, Size: 1728 bytes --]

On Sun, 2008-05-18 at 17:31 +0200, Patrick McHardy wrote:
> Patrick McHardy wrote:
> >>>> I've been looking into the same problem, without much success so
> >>>> far. The problem appears to affect any /proc/net file, but not
> >>>> files outside of /proc/net, so I'm guessing its net-ns related.
> >>>> A testcase found by Ben Greear is opening the file multiple times:
> >>>>
> >>>> # /tmp/open /proc/net/kpktgen_0
> >>>>
> >>>> => refcnt goes to 1
> >>>>
> >>>> ^C
> >>>>
> >>>> => refcnt goes to 0
> >>>>
> >>>> Without ^C and opening the file a second time:
> >>>>
> >>>> # /tmp/open /proc/net/kpktgen_0
> >>>>
> >>>> => refcnt goes to 2 (sometimes also 11)
> >>>>
> >>>> ^C
> >>>>
> >>>> => refcnt stays at previous value.
> >>>>
> >>>> The refcnt even leaks if the file can't be successfully opened,
> >>>> for example because of lacking permissions.
> >>>>     
> >>>
> >>> urgh.  Is any of this known to be post-2.6.25?
> >>>   
> >>
> >> 2.6.25 is also affected. I don't know about earlier kernels.
> > 
> > Some more information: the problem seems to occur only if
> > the file is opened by two different processes.
> > 
> > I'm starting a bisection now.
> 
> 
> git-bisect identified e9720acd ([NET]: Make /proc/net a symlink
> on /proc/self/net (v3)) as the guilty commit. I couldn't find
> the problem in that commit, so someone with a better understanding
> of how this is supposed to work should look into it.

could you consider this preliminary patch? It fixes the problem for me
and Pavel agrees with it.

The problem is that module_get is called for each file opening while
module_put is called only when /proc inode is destroyed. So, lets put
module counter if we are dealing with already initialised inode.

[-- Attachment #2: diff-procinode-count --]
[-- Type: text/x-patch, Size: 324 bytes --]

diff --git a/fs/proc/inode.c b/fs/proc/inode.c
index 6f4e8dc..b08d100 100644
--- a/fs/proc/inode.c
+++ b/fs/proc/inode.c
@@ -425,7 +425,8 @@ struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
 			}
 		}
 		unlock_new_inode(inode);
-	}
+	} else
+	       module_put(de->owner);
 	return inode;
 
 out_ino:

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-19  7:54             ` Denis V. Lunev
@ 2008-05-19 10:34               ` Patrick McHardy
  2008-05-20  0:26               ` Ben Greear
  1 sibling, 0 replies; 19+ messages in thread
From: Patrick McHardy @ 2008-05-19 10:34 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Andrew Morton, netdev, bugme-daemon, devzero, Robert Olsson,
	Pavel Emelyanov, Eric W. Biederman, Ben Greear

Denis V. Lunev wrote:
> On Sun, 2008-05-18 at 17:31 +0200, Patrick McHardy wrote:
>   
>> git-bisect identified e9720acd ([NET]: Make /proc/net a symlink
>> on /proc/self/net (v3)) as the guilty commit. I couldn't find
>> the problem in that commit, so someone with a better understanding
>> of how this is supposed to work should look into it.
>>     
>
> could you consider this preliminary patch? It fixes the problem for me
> and Pavel agrees with it.
>
> The problem is that module_get is called for each file opening while
> module_put is called only when /proc inode is destroyed. So, lets put
> module counter if we are dealing with already initialised inode.
>   


The patch works fine for me, thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
@ 2008-05-19 15:19 Alexey Dobriyan
  0 siblings, 0 replies; 19+ messages in thread
From: Alexey Dobriyan @ 2008-05-19 15:19 UTC (permalink / raw)
  To: bugme-daemon; +Cc: akpm, kaber, den, netdev, devzero

> remove_proc_entry: removing non-empty directory 'net/pktgen', leaking at least 'kpktgend_1'

> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -425,7 +425,8 @@ struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
>                         }
>                 }
>                 unlock_new_inode(inode);
> -       }
> +       } else
> +              module_put(de->owner);
>         return inode;
>  
>  out_ino:

For the record, I fail to see how this can fix original bugreport:
If rmmod started and proc warnings were spewed, module's refcount was 0,
so positive refcount leak can't do that.

Roland, was it just modprobe/rmmod or something else?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
@ 2008-05-19 16:03 Alexey Dobriyan
  0 siblings, 0 replies; 19+ messages in thread
From: Alexey Dobriyan @ 2008-05-19 16:03 UTC (permalink / raw)
  To: bugme-daemon; +Cc: akpm, kaber, den, netdev, devzero

> remove_proc_entry: removing non-empty directory 'net/pktgen', leaking at least 'kpktgend_1'

reproduced, and BTW, proc stuff is least of worries. It can also oops via

list_del corruption. prev->next should be ffff81007e8a5e70, but was 6b6b6b6b6b6b6b6b
kernel BUG at lib/list_debug.c:67!
 :pktgen:pktgen_thread_worker+0x374/0x10b0
 ? autoremove_wake_function+0x0/0x40
 ? _spin_unlock_irqrestore+0x42/0x80
 ? :pktgen:pktgen_thread_worker+0x0/0x10b0
 kthread+0x4d/0x80
 child_rip+0xa/0x12
 ? restore_args+0x0/0x30
 ? kthread+0x0/0x80
 ? child_rip+0x0/0x12
RIP  list_del+0x48/0x70

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-18 15:31           ` Patrick McHardy
  2008-05-19  7:54             ` Denis V. Lunev
@ 2008-05-19 21:34             ` Eric W. Biederman
  2008-05-19 22:19               ` Ben Greear
  1 sibling, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2008-05-19 21:34 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Andrew Morton, netdev, bugme-daemon, devzero, Robert Olsson,
	Denis V. Lunev, Pavel Emelyanov, Ben Greear

Patrick McHardy <kaber@trash.net> writes:

> Patrick McHardy wrote:
>>>>> I've been looking into the same problem, without much success so
>>>>> far. The problem appears to affect any /proc/net file, but not
>>>>> files outside of /proc/net, so I'm guessing its net-ns related.
>>>>> A testcase found by Ben Greear is opening the file multiple times:
>>>>>
>>>>> # /tmp/open /proc/net/kpktgen_0
>>>>>
>>>>> => refcnt goes to 1
>>>>>
>>>>> ^C
>>>>>
>>>>> => refcnt goes to 0
>>>>>
>>>>> Without ^C and opening the file a second time:
>>>>>
>>>>> # /tmp/open /proc/net/kpktgen_0
>>>>>
>>>>> => refcnt goes to 2 (sometimes also 11)
>>>>>
>>>>> ^C
>>>>>
>>>>> => refcnt stays at previous value.
>>>>>
>>>>> The refcnt even leaks if the file can't be successfully opened,
>>>>> for example because of lacking permissions.

How are you reading the refcount on kpktgen_0? Just a printk in the
kernel code?

>> Some more information: the problem seems to occur only if
>> the file is opened by two different processes.
>>
>> I'm starting a bisection now.
>
>
> git-bisect identified e9720acd ([NET]: Make /proc/net a symlink
> on /proc/self/net (v3)) as the guilty commit. I couldn't find
> the problem in that commit, so someone with a better understanding
> of how this is supposed to work should look into it.

To recap:
- The problem is that we get complaints from remove_proc_entry
  on unload of the pktgen module.

- The problem appears to only happen when multiple processes open the file.

- The problem only appears after we moved /proc/net into /proc/<pid>/net


The obvious candidate is that we have multiple dcache entries for the
same proc inode.

It looks like time to reproduce this and see if we can figure out why
kpktgend_1 is still exists in the directory we are removing.



Eric


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-19 21:34             ` Eric W. Biederman
@ 2008-05-19 22:19               ` Ben Greear
  0 siblings, 0 replies; 19+ messages in thread
From: Ben Greear @ 2008-05-19 22:19 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Patrick McHardy, Andrew Morton, netdev, bugme-daemon, devzero,
	Robert Olsson, Denis V. Lunev, Pavel Emelyanov

Eric W. Biederman wrote:
> Patrick McHardy <kaber@trash.net> writes:
> 
>> Patrick McHardy wrote:
>>>>>> I've been looking into the same problem, without much success so
>>>>>> far. The problem appears to affect any /proc/net file, but not
>>>>>> files outside of /proc/net, so I'm guessing its net-ns related.
>>>>>> A testcase found by Ben Greear is opening the file multiple times:
>>>>>>
>>>>>> # /tmp/open /proc/net/kpktgen_0
>>>>>>
>>>>>> => refcnt goes to 1
>>>>>>
>>>>>> ^C
>>>>>>
>>>>>> => refcnt goes to 0
>>>>>>
>>>>>> Without ^C and opening the file a second time:
>>>>>>
>>>>>> # /tmp/open /proc/net/kpktgen_0
>>>>>>
>>>>>> => refcnt goes to 2 (sometimes also 11)
>>>>>>
>>>>>> ^C
>>>>>>
>>>>>> => refcnt stays at previous value.
>>>>>>
>>>>>> The refcnt even leaks if the file can't be successfully opened,
>>>>>> for example because of lacking permissions.
> 
> How are you reading the refcount on kpktgen_0? Just a printk in the
> kernel code?
> 
>>> Some more information: the problem seems to occur only if
>>> the file is opened by two different processes.
>>>
>>> I'm starting a bisection now.
>>
>> git-bisect identified e9720acd ([NET]: Make /proc/net a symlink
>> on /proc/self/net (v3)) as the guilty commit. I couldn't find
>> the problem in that commit, so someone with a better understanding
>> of how this is supposed to work should look into it.
> 
> To recap:
> - The problem is that we get complaints from remove_proc_entry
>   on unload of the pktgen module.
> 
> - The problem appears to only happen when multiple processes open the file.
> 
> - The problem only appears after we moved /proc/net into /proc/<pid>/net

Just to be clear, the problem I saw with too many refs would not even let
me remove a module.  They may be the same root problem, but not necessarily.

I also saw the problem on multiple modules with proc/net/ file systems,
not just pktgen.

The part about multiple processes opening the file is definitely true
with my bug report, but based on email I saw, I'm not sure it is true for
the remove_proc_entry problem reported by someone else.

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
@ 2008-05-19 22:21 Alexey Dobriyan
  2008-05-19 22:45 ` Robert Olsson
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Dobriyan @ 2008-05-19 22:21 UTC (permalink / raw)
  To: bugme-daemon; +Cc: devzero, akpm, kaber, den, adobriyan, netdev

> list_del corruption. prev->next should be ffff81007e8a5e70, but was 6b6b6b6b6b6b6b6b
> kernel BUG at lib/list_debug.c:67!
>	:pktgen:pktgen_thread_worker+0x374/0x10b0
>	? autoremove_wake_function+0x0/0x40
>	? _spin_unlock_irqrestore+0x42/0x80
>	? :pktgen:pktgen_thread_worker+0x0/0x10b0
>	kthread+0x4d/0x80
>	child_rip+0xa/0x12
>	? restore_args+0x0/0x30
>	? kthread+0x0/0x80
>	? child_rip+0x0/0x12
> RIP  list_del+0x48/0x70

OK, trivial debugging and we know that worker thread function
(pktgen_thread_worker) never ran between kthread creation and rmmod.

Consequently, addition to "pktgen_threads" list happened, list_del of
thread 0 never happened, list_del of thread 1 barfs.


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-19 22:21 Alexey Dobriyan
@ 2008-05-19 22:45 ` Robert Olsson
  0 siblings, 0 replies; 19+ messages in thread
From: Robert Olsson @ 2008-05-19 22:45 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: bugme-daemon, devzero, akpm, kaber, den, adobriyan, netdev


Alexey Dobriyan writes:
 > > list_del corruption. prev->next should be ffff81007e8a5e70, but was 6b6b6b6b6b6b6b6b
 > > kernel BUG at lib/list_debug.c:67!
 > >	:pktgen:pktgen_thread_worker+0x374/0x10b0
 > >	? autoremove_wake_function+0x0/0x40
 > >	? _spin_unlock_irqrestore+0x42/0x80
 > >	? :pktgen:pktgen_thread_worker+0x0/0x10b0
 > >	kthread+0x4d/0x80
 > >	child_rip+0xa/0x12
 > >	? restore_args+0x0/0x30
 > >	? kthread+0x0/0x80
 > >	? child_rip+0x0/0x12
 > > RIP  list_del+0x48/0x70
 > 
 > OK, trivial debugging and we know that worker thread function
 > (pktgen_thread_worker) never ran between kthread creation and rmmod.
 > 
 > Consequently, addition to "pktgen_threads" list happened, list_del of
 > thread 0 never happened, list_del of thread 1 barfs.

 Yes a schedule_timeout_interruptible(1) first in pg_cleanup seems to "cure" 
 insmod/rmmod race. You can even put the delay between the two proc removal 
 calls. But it seems we need a better way to sync so both start and stop 
 happen for all threads in pktgen_thread_worker before we remove the last proc 
 entries. And I see this with just one "controlling" process. 

 Cheers.
					--ro

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-19  7:54             ` Denis V. Lunev
  2008-05-19 10:34               ` Patrick McHardy
@ 2008-05-20  0:26               ` Ben Greear
  2008-05-20  1:14                 ` Eric W. Biederman
  1 sibling, 1 reply; 19+ messages in thread
From: Ben Greear @ 2008-05-20  0:26 UTC (permalink / raw)
  To: Denis V. Lunev
  Cc: Patrick McHardy, Andrew Morton, netdev, bugme-daemon, devzero,
	Robert Olsson, Pavel Emelyanov, Eric W. Biederman

Denis V. Lunev wrote:

> could you consider this preliminary patch? It fixes the problem for me
> and Pavel agrees with it.
> 
> The problem is that module_get is called for each file opening while
> module_put is called only when /proc inode is destroyed. So, lets put
> module counter if we are dealing with already initialised inode.
> 
> 
> ------------------------------------------------------------------------
> 
> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
> index 6f4e8dc..b08d100 100644
> --- a/fs/proc/inode.c
> +++ b/fs/proc/inode.c
> @@ -425,7 +425,8 @@ struct inode *proc_get_inode(struct super_block *sb, unsigned int ino,
>  			}
>  		}
>  		unlock_new_inode(inode);
> -	}
> +	} else
> +	       module_put(de->owner);
>  	return inode;
>  
>  out_ino:

I just tested this and it seems to fix my problem (I applied this to 2.6.25 kernel).

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-20  0:26               ` Ben Greear
@ 2008-05-20  1:14                 ` Eric W. Biederman
  2008-05-20  8:25                   ` Robert Olsson
  0 siblings, 1 reply; 19+ messages in thread
From: Eric W. Biederman @ 2008-05-20  1:14 UTC (permalink / raw)
  To: Ben Greear
  Cc: Denis V. Lunev, Patrick McHardy, Andrew Morton, netdev,
	bugme-daemon, devzero, Robert Olsson, Pavel Emelyanov

Ben Greear <greearb@candelatech.com> writes:

> Denis V. Lunev wrote:
>
>> could you consider this preliminary patch? It fixes the problem for me
>> and Pavel agrees with it.
>>
>> The problem is that module_get is called for each file opening while
>> module_put is called only when /proc inode is destroyed. So, lets put
>> module counter if we are dealing with already initialised inode.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>


>> ------------------------------------------------------------------------
>>
>> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
>> index 6f4e8dc..b08d100 100644
>> --- a/fs/proc/inode.c
>> +++ b/fs/proc/inode.c
>> @@ -425,7 +425,8 @@ struct inode *proc_get_inode(struct super_block *sb,
> unsigned int ino,
>>  			}
>>  		}
>>  		unlock_new_inode(inode);
>> -	}
>> +	} else
>> +	       module_put(de->owner);
>>  	return inode;
>>   out_ino:
>
> I just tested this and it seems to fix my problem (I applied this to 2.6.25
> kernel).

Looks good and it seems to follow the characterization.
proc_lookup -> proc_lookup_de -> proc_get_inode 

Will happen each time a dentry is created for a name/inode.
Which is much easier now with multiple instances in
proc.

It looks like this bug an old bug that has existed since

commit e9543659715602e3180f00a227bb6db34141ac41
Author: Kirill Korotaev <dev@sw.ru>
Date:   Sun Oct 30 15:02:26 2005 -0800

    [PATCH] proc: fix of error path in proc_get_inode()
    
    This patch fixes incorrect error path in proc_get_inode(), when module
    can't be get due to being unloaded.  When try_module_get() fails, this
    function puts de(!) and still returns inode with non-getted de.
    
    There are still unresolved known bugs in proc yet to be fixed:
    - proc_dir_entry tree is managed without any serialization
    - create_proc_entry() doesn't setup de->owner anyhow,
       so setting it later manually is inatomic.
    - looks like almost all modules do not care whether
       it's de->owner is set...
    
    Signed-Off-By: Denis Lunev <den@sw.ru>
    Signed-Off-By: Kirill Korotaev <dev@sw.ru>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

Eric

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-20  1:14                 ` Eric W. Biederman
@ 2008-05-20  8:25                   ` Robert Olsson
  0 siblings, 0 replies; 19+ messages in thread
From: Robert Olsson @ 2008-05-20  8:25 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Ben Greear, Denis V. Lunev, Patrick McHardy, Andrew Morton,
	netdev, bugme-daemon, devzero, Robert Olsson, Pavel Emelyanov


Eric W. Biederman writes:
 > > Denis V. Lunev wrote:
 > >
 > >> could you consider this preliminary patch? It fixes the problem for me
 > >> and Pavel agrees with it.
 > >>
 > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

 > >> ------------------------------------------------------------------------
 > >>
 > >> diff --git a/fs/proc/inode.c b/fs/proc/inode.c
 > >> index 6f4e8dc..b08d100 100644
 > >> --- a/fs/proc/inode.c
 > >> +++ b/fs/proc/inode.c
 > >> @@ -425,7 +425,8 @@ struct inode *proc_get_inode(struct super_block *sb,
 > > unsigned int ino,
 > >>  			}
 > >>  		}
 > >>  		unlock_new_inode(inode);
 > >> -	}
 > >> +	} else
 > >> +	       module_put(de->owner);
 > >>  	return inode;
 > >>   out_ino:

 Yes it fixes the oops I got as well.

 Acked-by: Robert Olsson <robert.olsson@its.uu.se>

 Cheers.
					--ro

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
  2008-05-20 21:57 Alexey Dobriyan
@ 2008-05-20 21:17 ` Robert Olsson
  0 siblings, 0 replies; 19+ messages in thread
From: Robert Olsson @ 2008-05-20 21:17 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: robert.olsson, netdev



Alexey Dobriyan writes:
 > > Yes it fixes the oops I got as well.
 > >
 > > Acked-by: Robert Olsson <robert.olsson@its.uu.se>
 > 
 > Robert, which oops do you see?
 
 Sorry I meant the procfs problem.

 Cheers. 
					--ro


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Bugme-new] [Bug 10737] New: pktgen procfs problem
@ 2008-05-20 21:57 Alexey Dobriyan
  2008-05-20 21:17 ` Robert Olsson
  0 siblings, 1 reply; 19+ messages in thread
From: Alexey Dobriyan @ 2008-05-20 21:57 UTC (permalink / raw)
  To: robert.olsson; +Cc: netdev

> > >> +	} else
> > >> +	       module_put(de->owner);
> > >>  	return inode;
> > >>   out_ino:

> Yes it fixes the oops I got as well.
>
> Acked-by: Robert Olsson <robert.olsson@its.uu.se>

Robert, which oops do you see?

Patch you cited hardly can fix an oops (as in real oops, not warning).


^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-05-20 21:17 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <bug-10737-10286@http.bugzilla.kernel.org/>
2008-05-17 21:10 ` [Bugme-new] [Bug 10737] New: pktgen procfs problem Andrew Morton
2008-05-18  1:39   ` Patrick McHardy
2008-05-18  4:56     ` Andrew Morton
2008-05-18 12:06       ` Patrick McHardy
2008-05-18 13:24         ` Patrick McHardy
2008-05-18 15:31           ` Patrick McHardy
2008-05-19  7:54             ` Denis V. Lunev
2008-05-19 10:34               ` Patrick McHardy
2008-05-20  0:26               ` Ben Greear
2008-05-20  1:14                 ` Eric W. Biederman
2008-05-20  8:25                   ` Robert Olsson
2008-05-19 21:34             ` Eric W. Biederman
2008-05-19 22:19               ` Ben Greear
2008-05-19 15:19 Alexey Dobriyan
  -- strict thread matches above, loose matches on Subject: below --
2008-05-19 16:03 Alexey Dobriyan
2008-05-19 22:21 Alexey Dobriyan
2008-05-19 22:45 ` Robert Olsson
2008-05-20 21:57 Alexey Dobriyan
2008-05-20 21:17 ` Robert Olsson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).