public inbox for linux-xfs@vger.kernel.org
* another problem with latest code drops
@ 2008-10-16  2:06 Lachlan McIlroy
  2008-10-16  6:02 ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-16  2:06 UTC (permalink / raw)
  To: xfs-oss

fsstress started reporting these errors

fsstress: check_cwd failure
fsstress: check_cwd failure
fsstress: check_cwd failure
fsstress: check_cwd failure
fsstress: check_cwd failure
...

The filesystem is mounted on /mnt/data but the mount point is now toast.

wipeout:/mnt # mount
...
/dev/mapper/dm0 on /mnt/data type xfs (rw,logdev=/dev/ram0,nobarrier)


wipeout:/mnt # ls -alF
/bin/ls: data: Input/output error
total 4
drwxr-xr-x  6 root root   57 Aug  8 03:09 ./
drwxr-xr-x 21 root root 4096 Oct 15 11:56 ../
?---------  0 root root    0 Dec 31  1969 data
drwxr-xr-x  2 root root    6 Jul 16 08:21 home/

Root inode is 128?

wipeout:/mnt # xfs_db -f /dev/mapper/dm0 
xfs_db> inode 128
xfs_db> print
core.magic = 0x494e
core.mode = 040755
core.version = 2
core.format = 2 (extents)
core.nlinkv2 = 11
core.onlink = 0
core.projid = 0
core.uid = 0
core.gid = 0
core.flushiter = 3
core.atime.sec = Wed Oct 15 12:02:22 2008
core.atime.nsec = 254053000
core.mtime.sec = Wed Oct 15 13:00:56 2008
core.mtime.nsec = 340001031
core.ctime.sec = Wed Oct 15 13:00:56 2008
core.ctime.nsec = 340001031
core.size = 4096
core.nblocks = 1
core.extsize = 0
core.nextents = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 0
next_unlinked = null
u.bmx[0] = [startoff,startblock,blockcount,extentflag] 0:[0,18,1,0]

Looks okay, so it must be corrupt in memory only.


I can't cd into the filesystem, but luckily I still had a shell open inside it.

wipeout:/mnt/data/temp # ls -alF
/bin/ls: reading directory .: Input/output error
total 0
wipeout:/mnt/data/temp # ls -alF file_1
/bin/ls: file_1: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_1
wipeout:/mnt/data/temp # ls -alF file_2
/bin/ls: file_2: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_2
wipeout:/mnt/data/temp # ls -alF file_3
/bin/ls: file_3: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_3
wipeout:/mnt/data/temp # ls -alF file_4
/bin/ls: file_4: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_4
wipeout:/mnt/data/temp # ls -alF file_5
/bin/ls: file_5: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_5
...
/bin/ls: file_710: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_710
wipeout:/mnt/data/temp # ls -alF file_711
/bin/ls: file_711: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_711
wipeout:/mnt/data/temp # ls -alF file_712
/bin/ls: file_712: Input/output error
?--------- 0 root root 0 1969-12-31 18:00 file_712

Lots of files toasted here too.

Here's a hint as to what's gone wrong.

Oct 16 09:54:54 wipeout kernel: [79179.449760] Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8118d422
Oct 16 09:54:54 wipeout kernel: [79179.449773] Pid: 6679, comm: fsstress Not tainted 2.6.27-rc8 #192
Oct 16 09:54:54 wipeout kernel: [79179.449775] 
Oct 16 09:54:54 wipeout kernel: [79179.449775] Call Trace:
Oct 16 09:54:54 wipeout kernel: [79179.449784]  [<ffffffff81176d54>] xfs_error_report+0x3c/0x3e
Oct 16 09:54:54 wipeout kernel: [79179.449789]  [<ffffffff8118d422>] ? xfs_rename+0x703/0x745
Oct 16 09:54:54 wipeout kernel: [79179.449795]  [<ffffffff8118e9cb>] xfs_trans_cancel+0x5f/0xfc
Oct 16 09:54:54 wipeout kernel: [79179.449799]  [<ffffffff8118d422>] xfs_rename+0x703/0x745
Oct 16 09:54:54 wipeout kernel: [79179.449805]  [<ffffffff8119d4b2>] xfs_vn_rename+0x5d/0x61
Oct 16 09:54:54 wipeout kernel: [79179.449810]  [<ffffffff810ab449>] vfs_rename+0x2b2/0x42e
Oct 16 09:54:54 wipeout kernel: [79179.449815]  [<ffffffff810ad0f2>] sys_renameat+0x16d/0x1e3
Oct 16 09:54:54 wipeout kernel: [79179.449821]  [<ffffffff810a66d2>] ? sys_newstat+0x31/0x3c
Oct 16 09:54:54 wipeout kernel: [79179.449826]  [<ffffffff810ad17e>] sys_rename+0x16/0x18
Oct 16 09:54:54 wipeout kernel: [79179.449831]  [<ffffffff8100bf3b>] system_call_fastpath+0x16/0x1b
Oct 16 09:54:54 wipeout kernel: [79179.449835] 
Oct 16 09:54:54 wipeout kernel: [79179.449840] xfs_force_shutdown(dm-0,0x8) called from line 1165 of file fs/xfs/xfs_trans.c.  Return address = 0xffffffff8118e9e4
Oct 16 09:54:54 wipeout kernel: [79179.449851] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Oct 16 09:54:54 wipeout kernel: [79179.449862] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Oct 16 09:54:54 wipeout kernel: [79179.449867] Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down filesystem: dm-0
Oct 16 09:54:54 wipeout kernel: [79179.449873] Pid: 6679, comm: fsstress Not tainted 2.6.27-rc8 #192
Oct 16 09:54:54 wipeout kernel: [79179.449878] 
Oct 16 09:54:54 wipeout kernel: [79179.449878] Call Trace:
Oct 16 09:54:54 wipeout kernel: [79179.449883]  [<ffffffff81195daf>] xfs_do_force_shutdown+0xac/0x138
Oct 16 09:54:54 wipeout kernel: [79179.449887]  [<ffffffff8118e9e4>] ? xfs_trans_cancel+0x78/0xfc
Oct 16 09:54:54 wipeout kernel: [79179.449892]  [<ffffffff8118e9e4>] xfs_trans_cancel+0x78/0xfc
Oct 16 09:54:54 wipeout kernel: [79179.449896]  [<ffffffff8118d422>] xfs_rename+0x703/0x745
Oct 16 09:54:54 wipeout kernel: [79179.449901]  [<ffffffff8119d4b2>] xfs_vn_rename+0x5d/0x61
Oct 16 09:54:54 wipeout kernel: [79179.449906]  [<ffffffff810ab449>] vfs_rename+0x2b2/0x42e
Oct 16 09:54:54 wipeout kernel: [79179.449911]  [<ffffffff810ad0f2>] sys_renameat+0x16d/0x1e3
Oct 16 09:54:54 wipeout kernel: [79179.449916]  [<ffffffff810a66d2>] ? sys_newstat+0x31/0x3c
Oct 16 09:54:54 wipeout kernel: [79179.449920]  [<ffffffff810ad17e>] sys_rename+0x16/0x18
Oct 16 09:54:54 wipeout kernel: [79179.449964]  [<ffffffff8100bf3b>] system_call_fastpath+0x16/0x1b
Oct 16 09:54:54 wipeout kernel: [79179.449966] 
Oct 16 09:54:54 wipeout kernel: [79179.449966] Please umount the filesystem, and rectify the problem(s)
Oct 16 09:54:54 wipeout kernel: [79179.450069] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Oct 16 09:54:54 wipeout kernel: [79179.450119] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Oct 16 09:54:54 wipeout kernel: [79179.450125] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Oct 16 09:54:54 wipeout kernel: [79179.450268] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.
Oct 16 09:54:54 wipeout kernel: [79179.450273] xfs_imap_to_bp: xfs_trans_read_buf()returned an error 5 on dm-0.  Returning error.

But that doesn't explain why the mode of all the files is zeroed out,
and with the root inode busted I can't unmount.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: another problem with latest code drops
  2008-10-16  2:06 another problem with latest code drops Lachlan McIlroy
@ 2008-10-16  6:02 ` Dave Chinner
  2008-10-16  7:38   ` Lachlan McIlroy
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-16  6:02 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Thu, Oct 16, 2008 at 12:06:21PM +1000, Lachlan McIlroy wrote:
> fsstress started reporting these errors
>
> fsstress: check_cwd failure
> fsstress: check_cwd failure
> fsstress: check_cwd failure
> fsstress: check_cwd failure
> fsstress: check_cwd failure
> ...
>
> The filesystem is mounted on /mnt/data but the mount point is now toast.
>
> wipeout:/mnt # mount
> ...
> /dev/mapper/dm0 on /mnt/data type xfs (rw,logdev=/dev/ram0,nobarrier)
>
>
> wipeout:/mnt # ls -alF
> /bin/ls: data: Input/output error
> total 4
> drwxr-xr-x  6 root root   57 Aug  8 03:09 ./
> drwxr-xr-x 21 root root 4096 Oct 15 11:56 ../
> ?---------  0 root root    0 Dec 31  1969 data
> drwxr-xr-x  2 root root    6 Jul 16 08:21 home/

I bet the filesystem has been shut down....

[snip]

> Oct 16 09:54:54 wipeout kernel: [79179.449760] Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8118d422
> Oct 16 09:54:54 wipeout kernel: [79179.449773] Pid: 6679, comm: fsstress Not tainted 2.6.27-rc8 #192
> Oct 16 09:54:54 wipeout kernel: [79179.449775] Call Trace:
> Oct 16 09:54:54 wipeout kernel: [79179.449784]  [<ffffffff81176d54>] xfs_error_report+0x3c/0x3e
> Oct 16 09:54:54 wipeout kernel: [79179.449789]  [<ffffffff8118d422>] ? xfs_rename+0x703/0x745
> Oct 16 09:54:54 wipeout kernel: [79179.449795]  [<ffffffff8118e9cb>] xfs_trans_cancel+0x5f/0xfc
> Oct 16 09:54:54 wipeout kernel: [79179.449799]  [<ffffffff8118d422>] xfs_rename+0x703/0x745
> Oct 16 09:54:54 wipeout kernel: [79179.449805]  [<ffffffff8119d4b2>] xfs_vn_rename+0x5d/0x61
> Oct 16 09:54:54 wipeout kernel: [79179.449810]  [<ffffffff810ab449>] vfs_rename+0x2b2/0x42e
> Oct 16 09:54:54 wipeout kernel: [79179.449815]  [<ffffffff810ad0f2>] sys_renameat+0x16d/0x1e3
> Oct 16 09:54:54 wipeout kernel: [79179.449821]  [<ffffffff810a66d2>] ? sys_newstat+0x31/0x3c
> Oct 16 09:54:54 wipeout kernel: [79179.449826]  [<ffffffff810ad17e>] sys_rename+0x16/0x18
> Oct 16 09:54:54 wipeout kernel: [79179.449831]  [<ffffffff8100bf3b>] system_call_fastpath+0x16/0x1b
> Oct 16 09:54:54 wipeout kernel: [79179.449835]

Ah, yes. A shutdown in a directory transaction. Have you applied the
fix to the directory block allocation transaction accounting that was one
of the last patches I posted?

If so, then there's some other problem in that code that we'll
need a reproducible test case to be able to find....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: another problem with latest code drops
  2008-10-16  7:38   ` Lachlan McIlroy
@ 2008-10-16  7:20     ` Dave Chinner
  2008-10-16  8:35       ` Lachlan McIlroy
  2008-10-19  9:10       ` Christoph Hellwig
  0 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2008-10-16  7:20 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Thu, Oct 16, 2008 at 05:38:39PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Thu, Oct 16, 2008 at 12:06:21PM +1000, Lachlan McIlroy wrote:
>>> fsstress started reporting these errors
>>>
>>> fsstress: check_cwd failure
>>> fsstress: check_cwd failure
>>> fsstress: check_cwd failure
>>> fsstress: check_cwd failure
>>> fsstress: check_cwd failure
>>> ...
....
>> Ah, yes. A shutdown in a directory transaction. Have you applied the
>> fix to the directory block allocation transaction accounting that was one
>> of the last patches I posted?
> Yes, I checked that in yesterday and ran with it overnight.

OK.

>> If so, then there's some other problem in that code that we'll
>> need a reproducible test case to be able to find....
>
> I was running 8 copies of this command:
> fsstress -p 64 -n 10000000 -d /mnt/data/fsstress.$i
>
> I tried it again but this time the system ran out of memory
> and locked up hard.  I couldn't see why though - maybe a memory
> leak.

I just ran up the same load in a UML session. I'd say it's this
slab:

  2482   2481  99%    0.23K    146       17       584K xfs_btree_cur

which is showing a leak. It is slowly growing on my system
and dropping the caches doesn't reduce it's size. At least it's
a place to start looking - somewhere in the new btree code we
seem to be leaking a btree cursor....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: another problem with latest code drops
  2008-10-16  6:02 ` Dave Chinner
@ 2008-10-16  7:38   ` Lachlan McIlroy
  2008-10-16  7:20     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-16  7:38 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Thu, Oct 16, 2008 at 12:06:21PM +1000, Lachlan McIlroy wrote:
>> fsstress started reporting these errors
>>
>> fsstress: check_cwd failure
>> fsstress: check_cwd failure
>> fsstress: check_cwd failure
>> fsstress: check_cwd failure
>> fsstress: check_cwd failure
>> ...
>>
>> The filesystem is mounted on /mnt/data but the mount point is now toast.
>>
>> wipeout:/mnt # mount
>> ...
>> /dev/mapper/dm0 on /mnt/data type xfs (rw,logdev=/dev/ram0,nobarrier)
>>
>>
>> wipeout:/mnt # ls -alF
>> /bin/ls: data: Input/output error
>> total 4
>> drwxr-xr-x  6 root root   57 Aug  8 03:09 ./
>> drwxr-xr-x 21 root root 4096 Oct 15 11:56 ../
>> ?---------  0 root root    0 Dec 31  1969 data
>> drwxr-xr-x  2 root root    6 Jul 16 08:21 home/
> 
> I bet the filesystem has been shut down....
> 
> [snip]
> 
>> Oct 16 09:54:54 wipeout kernel: [79179.449760] Filesystem "dm-0": XFS internal error xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c.  Caller 0xffffffff8118d422
>> Oct 16 09:54:54 wipeout kernel: [79179.449773] Pid: 6679, comm: fsstress Not tainted 2.6.27-rc8 #192
>> Oct 16 09:54:54 wipeout kernel: [79179.449775] Call Trace:
>> Oct 16 09:54:54 wipeout kernel: [79179.449784]  [<ffffffff81176d54>] xfs_error_report+0x3c/0x3e
>> Oct 16 09:54:54 wipeout kernel: [79179.449789]  [<ffffffff8118d422>] ? xfs_rename+0x703/0x745
>> Oct 16 09:54:54 wipeout kernel: [79179.449795]  [<ffffffff8118e9cb>] xfs_trans_cancel+0x5f/0xfc
>> Oct 16 09:54:54 wipeout kernel: [79179.449799]  [<ffffffff8118d422>] xfs_rename+0x703/0x745
>> Oct 16 09:54:54 wipeout kernel: [79179.449805]  [<ffffffff8119d4b2>] xfs_vn_rename+0x5d/0x61
>> Oct 16 09:54:54 wipeout kernel: [79179.449810]  [<ffffffff810ab449>] vfs_rename+0x2b2/0x42e
>> Oct 16 09:54:54 wipeout kernel: [79179.449815]  [<ffffffff810ad0f2>] sys_renameat+0x16d/0x1e3
>> Oct 16 09:54:54 wipeout kernel: [79179.449821]  [<ffffffff810a66d2>] ? sys_newstat+0x31/0x3c
>> Oct 16 09:54:54 wipeout kernel: [79179.449826]  [<ffffffff810ad17e>] sys_rename+0x16/0x18
>> Oct 16 09:54:54 wipeout kernel: [79179.449831]  [<ffffffff8100bf3b>] system_call_fastpath+0x16/0x1b
>> Oct 16 09:54:54 wipeout kernel: [79179.449835]
> 
> Ah, yes. A shutdown in a directory transaction. Have you applied the
> fix to the directory block allocation transaction accounting that was one
> of the last patches I posted?
Yes, I checked that in yesterday and ran with it overnight.

> 
> If so, then there's some other problem in that code that we'll
> need a reproducible test case to be able to find....

I was running 8 copies of this command:
fsstress -p 64 -n 10000000 -d /mnt/data/fsstress.$i

I tried it again but this time the system ran out of memory
and locked up hard.  I couldn't see why though - maybe a memory
leak.


* Re: another problem with latest code drops
  2008-10-16  7:20     ` Dave Chinner
@ 2008-10-16  8:35       ` Lachlan McIlroy
  2008-10-16  9:08         ` Christoph Hellwig
  2008-10-16 22:29         ` Dave Chinner
  2008-10-19  9:10       ` Christoph Hellwig
  1 sibling, 2 replies; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-16  8:35 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Thu, Oct 16, 2008 at 05:38:39PM +1000, Lachlan McIlroy wrote:
>> Dave Chinner wrote:
>>> On Thu, Oct 16, 2008 at 12:06:21PM +1000, Lachlan McIlroy wrote:
>>>> fsstress started reporting these errors
>>>>
>>>> fsstress: check_cwd failure
>>>> fsstress: check_cwd failure
>>>> fsstress: check_cwd failure
>>>> fsstress: check_cwd failure
>>>> fsstress: check_cwd failure
>>>> ...
> ....
>>> Ah, yes. A shutdown in a directory transaction. Have you applied the
>>> fix to the directory block allocation transaction accounting that was one
>>> of the last patches I posted?
>> Yes, I checked that in yesterday and ran with it overnight.
> 
> OK.
> 
>>> If so, then there's some other problem in that code that we'll
>>> need a reproducible test case to be able to find....
>> I was running 8 copies of this command:
>> fsstress -p 64 -n 10000000 -d /mnt/data/fsstress.$i
>>
>> I tried it again but this time the system ran out of memory
>> and locked up hard.  I couldn't see why though - maybe a memory
>> leak.
> 
> I just ran up the same load in a UML session. I'd say it's this
> slab:
> 
>   2482   2481  99%    0.23K    146       17       584K xfs_btree_cur
> 
> which is showing a leak. It is slowly growing on my system
> and dropping the caches doesn't reduce its size. At least it's
> a place to start looking - somewhere in the new btree code we
> seem to be leaking a btree cursor....
> 


I'm not seeing a leak in that slab - actually that slab doesn't even
show up.  I am seeing a lot of memory used here though:

116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security


* Re: another problem with latest code drops
  2008-10-16  8:35       ` Lachlan McIlroy
@ 2008-10-16  9:08         ` Christoph Hellwig
  2008-10-17  1:13           ` Lachlan McIlroy
  2008-10-16 22:29         ` Dave Chinner
  1 sibling, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2008-10-16  9:08 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Thu, Oct 16, 2008 at 06:35:03PM +1000, Lachlan McIlroy wrote:
> I'm not seeing a leak in that slab - actually that slab doesn't even
> show up.  I am seeing a lot of memory used here though:

Are you using slab or slub?  The latter merges caches of equal size, so
it's totally useless for the kind of debug stats Dave looked at.


* Re: another problem with latest code drops
  2008-10-16  8:35       ` Lachlan McIlroy
  2008-10-16  9:08         ` Christoph Hellwig
@ 2008-10-16 22:29         ` Dave Chinner
  2008-10-17  1:17           ` Lachlan McIlroy
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-16 22:29 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Thu, Oct 16, 2008 at 06:35:03PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Thu, Oct 16, 2008 at 05:38:39PM +1000, Lachlan McIlroy wrote:
>>> Dave Chinner wrote:
>>>> On Thu, Oct 16, 2008 at 12:06:21PM +1000, Lachlan McIlroy wrote:
>>>>> fsstress started reporting these errors
>>>>>
>>>>> fsstress: check_cwd failure
>>>>> fsstress: check_cwd failure
>>>>> fsstress: check_cwd failure
>>>>> fsstress: check_cwd failure
>>>>> fsstress: check_cwd failure
>>>>> ...
>> ....
>>>> Ah, yes. A shutdown in a directory transaction. Have you applied the
>>>> fix to the directory block allocation transaction accounting that was one
>>>> of the last patches I posted?
>>> Yes, I checked that in yesterday and ran with it overnight.
>>
>> OK.
>>
>>>> If so, then there's some other problem in that code that we'll
>>>> need a reproducible test case to be able to find....
>>> I was running 8 copies of this command:
>>> fsstress -p 64 -n 10000000 -d /mnt/data/fsstress.$i
>>>
>>> I tried it again but this time the system ran out of memory
>>> and locked up hard.  I couldn't see why though - maybe a memory
>>> leak.
>>
>> I just ran up the same load in a UML session. I'd say it's this
>> slab:
>>
>>   2482   2481  99%    0.23K    146       17       584K xfs_btree_cur
>>
>> which is showing a leak. It is slowly growing on my system
>> and dropping the caches doesn't reduce it's size. At least it's
>> a place to start looking - somewhere in the new btree code we
>> seem to be leaking a btree cursor....
>
> I'm not seeing a leak in that slab - actually that slab doesn't even
> show up.

Overnight the xfs_btree_cur slab made it up to about 7000 in use
entries, so there is definitely a leak there, though it is a slow
one.

> I am seeing a lot of memory used here though:
>
> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security

Ah - I don't run selinux. Sounds like a bug that needs reporting
to lkml...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: another problem with latest code drops
  2008-10-16  9:08         ` Christoph Hellwig
@ 2008-10-17  1:13           ` Lachlan McIlroy
  0 siblings, 0 replies; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-17  1:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs-oss

Christoph Hellwig wrote:
> On Thu, Oct 16, 2008 at 06:35:03PM +1000, Lachlan McIlroy wrote:
>> I'm not seeing a leak in that slab - actually that slab doesn't even
>> show up.  I am seeing a lot of memory used here though:
> 
> Are you using slab or slub?  The latter merges caches of equal size, so
> it's totally useless for the kind of debug stats Dave looked at.
> 
It's slub.


* Re: another problem with latest code drops
  2008-10-16 22:29         ` Dave Chinner
@ 2008-10-17  1:17           ` Lachlan McIlroy
  2008-10-17  1:21             ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-17  1:17 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Thu, Oct 16, 2008 at 06:35:03PM +1000, Lachlan McIlroy wrote:
>> Dave Chinner wrote:
>>> On Thu, Oct 16, 2008 at 05:38:39PM +1000, Lachlan McIlroy wrote:
>>>> Dave Chinner wrote:
>>>>> On Thu, Oct 16, 2008 at 12:06:21PM +1000, Lachlan McIlroy wrote:
>>>>>> fsstress started reporting these errors
>>>>>>
>>>>>> fsstress: check_cwd failure
>>>>>> fsstress: check_cwd failure
>>>>>> fsstress: check_cwd failure
>>>>>> fsstress: check_cwd failure
>>>>>> fsstress: check_cwd failure
>>>>>> ...
>>> ....
>>>>> Ah, yes. A shutdown in a directory transaction. Have you applied the
>>>>> fix to the directory block allocation transaction accounting that was one
>>>>> of the last patches I posted?
>>>> Yes, I checked that in yesterday and ran with it overnight.
>>> OK.
>>>
>>>>> If so, then there's some other problem in that code that we'll
>>>>> need a reproducible test case to be able to find....
>>>> I was running 8 copies of this command:
>>>> fsstress -p 64 -n 10000000 -d /mnt/data/fsstress.$i
>>>>
>>>> I tried it again but this time the system ran out of memory
>>>> and locked up hard.  I couldn't see why though - maybe a memory
>>>> leak.
>>> I just ran up the same load in a UML session. I'd say it's this
>>> slab:
>>>
>>>   2482   2481  99%    0.23K    146       17       584K xfs_btree_cur
>>>
>>> which is showing a leak. It is slowly growing on my system
>>> and dropping the caches doesn't reduce its size. At least it's
>>> a place to start looking - somewhere in the new btree code we
>>> seem to be leaking a btree cursor....
>> I'm not seeing a leak in that slab - actually that slab doesn't even
>> show up.
> 
> Overnight the xfs_btree_cur slab made it up to about 7000 in use
> entries, so there is definitely a leak there, though it is a slow
> one.
> 
>> I am seeing a lot of memory used here though:
>>
>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
> 
> Ah - I don't run selinux. Sounds like a bug that needs reporting
> to lkml...

I'm sure this is caused by your changes that introduced inode_init_always().
It re-initialises an existing inode without destroying it first, so it calls
security_inode_alloc() without calling security_inode_free().


* Re: another problem with latest code drops
  2008-10-17  1:17           ` Lachlan McIlroy
@ 2008-10-17  1:21             ` Dave Chinner
  2008-10-17  2:04               ` [PATCH] " Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-17  1:21 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>>> I am seeing a lot of memory used here though:
>>>
>>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
>>
>> Ah - I don't run selinux. Sounds like a bug that needs reporting
>> to lkml...
>
> I'm sure this is caused by your changes that introduced inode_init_always().
> It re-initialises an existing inode without destroying it first so it calls
> security_inode_alloc() without calling security_inode_free().

I can't think of how. The layers above XFS are symmetric:

	alloc_inode()
	  inode_init_always()
	    security_inode_alloc()
	.....
	security_inode_free()
	->destroy_inode()

So the filesystems should never, ever need to know about the
security context attached to the inode. The changes that introduced
inode_init_always() do not change this symmetry - we do:

	xfs_inode_alloc()
	  inode_init_always()
	    security_inode_alloc()
	.....
	security_inode_free()
	->destroy_inode()

And we should have this symmetry everywhere.

<thinks for a bit>

Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
xfs_idestroy() could leak contexts. We should really call xfs_iput()
because we have an initialised linux inode at this point and so
we need to go through destroy_inode(). I'll have a bit more of
a look, but this doesn't seem to account for the huge number of
leaked contexts you reported....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* [PATCH] Re: another problem with latest code drops
  2008-10-17  1:21             ` Dave Chinner
@ 2008-10-17  2:04               ` Dave Chinner
  2008-10-17  2:07                 ` Dave Chinner
  2008-10-17  3:14                 ` Lachlan McIlroy
  0 siblings, 2 replies; 20+ messages in thread
From: Dave Chinner @ 2008-10-17  2:04 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

On Fri, Oct 17, 2008 at 12:21:41PM +1100, Dave Chinner wrote:
> On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
> > Dave Chinner wrote:
> >>> I am seeing a lot of memory used here though:
> >>>
> >>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
> >>
> >> Ah - I don't run selinux. Sounds like a bug that needs reporting
> >> to lkml...
> >
> > I'm sure this is caused by your changes that introduced inode_init_always().
> > It re-initialises an existing inode without destroying it first so it calls
> > security_inode_alloc() without calling security_inode_free().
> 
> I can't think of how. The layers above XFS are symmetric:
.....
> And we should have this symmetry everywhere.
> 
> <thinks for a bit>
> 
> Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
> xfs_idestroy() could leak contexts. We should really call xfs_iput()
> because we have an initialised linux inode at this point and so
> we need to go through destroy_inode(). I'll have a bit more of
> a look, but this doesn't seem to account for the huge number of
> leaked contexts you reported....

Patch below that replaces xfs_idestroy() with IRELE() to destroy
the inode via the normal iput() path. It also fixes a second issue
that I found by inspection related to security contexts as a result
of hooking up ->destroy_inode.

It's running QA now.

FWIW, I'm not sure if this patch will apply cleanly - I'm still
running off my stack of patches and not what has been checked into
ptools. Any idea of when all the patches in ptools will be pushed
out to the git tree?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH] Re: another problem with latest code drops
  2008-10-17  2:04               ` [PATCH] " Dave Chinner
@ 2008-10-17  2:07                 ` Dave Chinner
  2008-10-20  2:37                   ` Lachlan McIlroy
  2008-10-17  3:14                 ` Lachlan McIlroy
  1 sibling, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-17  2:07 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

On Fri, Oct 17, 2008 at 01:04:34PM +1100, Dave Chinner wrote:
> On Fri, Oct 17, 2008 at 12:21:41PM +1100, Dave Chinner wrote:
> > On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
> > > Dave Chinner wrote:
> > >>> I am seeing a lot of memory used here though:
> > >>>
> > >>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
> > >>
> > >> Ah - I don't run selinux. Sounds like a bug that needs reporting
> > >> to lkml...
> > >
> > > I'm sure this is caused by your changes that introduced inode_init_always().
> > > It re-initialises an existing inode without destroying it first so it calls
> > > security_inode_alloc() without calling security_inode_free().
> > 
> > I can't think of how. The layers above XFS are symmetric:
> .....
> > And we should have this symmetry everywhere.
> > 
> > <thinks for a bit>
> > 
> > Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
> > xfs_idestroy() could leak contexts. We should really call xfs_iput()
> > because we have an initialised linux inode at this point and so
> > we need to go through destroy_inode(). I'll have a bit more of
> > a look, but this doesn't seem to account for the huge number of
> > leaked contexts you reported....
> 
> Patch below that replaces xfs_idestroy() with IRELE() to destroy
> the inode via the normal iput() path. It also fixes a second issue
> that I found by inspection related to security contexts as a result
> of hooking up ->destroy_inode.
> 
> It's running QA now.
> 
> FWIW, I'm not sure if this patch will apply cleanly - I'm still
> running off my stack of patches and not what has been checked into
> ptools. Any idea of when all the patches in ptools will be pushed
> out to the git tree?

And now with the patch.
-- 
Dave Chinner
david@fromorbit.com

XFS: Ensure that we destroy the linux inode after initialisation

Now that XFS initialises the struct inode prior to completing
all checks and insertion into caches, we need to destroy that
state if we fail to instantiate the inode completely. Hence
we cannot just call xfs_idestroy() to clean up state when
such an occurrence happens - we need to go through the normal
reclaim path by dropping the reference count on the linux inode
we now hold. This will prevent leaking security contexts on
failed lookups.

Also, now that we have a ->destroy_inode() method, failing to
allocate a security context for the inode will result in the
xfs_inode being freed via the ->destroy_inode() path internally
to inode_init_always(). Rearrange xfs_inode_alloc() to initialise
the xfs_inode prior to attempting to initialise the VFS inode
so that the reclaim path will work and remove the freeing
of the inode in xfs_inode_alloc() if VFS inode initialisation
fails.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_iget.c  |    9 ++++++++-
 fs/xfs/xfs_inode.c |   19 +++++++++++--------
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index b34b732..29afe4e 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -197,7 +197,14 @@ out_unlock:
 	write_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
 out_destroy:
-	xfs_idestroy(ip);
+	/*
+	 * we've already initialised the linux inode, so we have a valid
+	 * reference count of 1 and so we cannot destroy the inode with
+	 * xfs_idestroy. Kill it by dropping the reference count to push
+	 * it through the normal reclaim path so that state on the linux
+	 * inode is also destroyed.
+	 */
+	IRELE(ip);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 17dbf24..7a2aaae 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -812,14 +812,6 @@ xfs_inode_alloc(
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
 	ASSERT(completion_done(&ip->i_flush));
 
-	/*
-	 * initialise the VFS inode here to get failures
-	 * out of the way early.
-	 */
-	if (!inode_init_always(mp->m_super, VFS_I(ip))) {
-		kmem_zone_free(xfs_inode_zone, ip);
-		return NULL;
-	}
 
 	/* initialise the xfs inode */
 	ip->i_ino = ino;
@@ -859,6 +851,17 @@ xfs_inode_alloc(
 	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
 #endif
 
+	/*
+	 * Now initialise the VFS inode. We do this after the xfs_inode
+	 * initialisation as internal failures will result in ->destroy_inode
+	 * being called and that will pass down through the reclaim path and
+	 * free the XFS inode. This path requires the XFS inode to already be
+	 * initialised. Hence if this call fails, the xfs_inode has already
+	 * been freed and we should not reference it at all in the error
+	 * handling.
+	 */
+	if (!inode_init_always(mp->m_super, VFS_I(ip)))
+		return NULL;
 	return ip;
 }
 

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: another problem with latest code drops
  2008-10-17  2:04               ` [PATCH] " Dave Chinner
  2008-10-17  2:07                 ` Dave Chinner
@ 2008-10-17  3:14                 ` Lachlan McIlroy
  1 sibling, 0 replies; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-17  3:14 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Fri, Oct 17, 2008 at 12:21:41PM +1100, Dave Chinner wrote:
>> On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
>>> Dave Chinner wrote:
>>>>> I am seeing a lot of memory used here though:
>>>>>
>>>>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
>>>> Ah - I don't run selinux. Sounds like a bug that needs reporting
>>>> to lkml...
>>> I'm sure this is caused by your changes that introduced inode_init_always().
>>> It re-initialises an existing inode without destroying it first so it calls
>>> security_inode_alloc() without calling security_inode_free().
>> I can't think of how. The layers above XFS are symmetric:
> .....
>> And we should have this symmetry everywhere.
>>
>> <thinks for a bit>
>>
>> Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
>> xfs_idestroy() could leak contexts. We should really call xfs_iput()
>> because we have an initialised linux inode at this point and so
>> we need to go through destroy_inode(). I'll have a bit more of
>> a look, but this doesn't seem to account for the huge number of
>> leaked contexts you reported....
> 
> Patch below that replaces xfs_idestroy() with IRELE() to destroy
> the inode via the normal iput() path. It also fixes a second issue
> that I found by inspection related to security contexts as a result
> of hooking up ->destroy_inode.
> 
> It's running QA now.
> 
> FWIW, I'm not sure if this patch will apply cleanly - I'm still
> running off my stack of patches and not what has been checked into
> ptools. Any idea of when all the patches in ptools will be pushed
> out to the git tree?
Should be on OSS later today.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: another problem with latest code drops
  2008-10-16  7:20     ` Dave Chinner
  2008-10-16  8:35       ` Lachlan McIlroy
@ 2008-10-19  9:10       ` Christoph Hellwig
  1 sibling, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2008-10-19  9:10 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

On Thu, Oct 16, 2008 at 06:20:19PM +1100, Dave Chinner wrote:
> I just ran up the same load in a UML session. I'd say it's this
> slab:
> 
>   2482   2481  99%    0.23K    146       17       584K xfs_btree_cur
> 
> which is showing a leak. It is slowly growing on my system
> and dropping the caches doesn't reduce it's size. At least it's
> a place to start looking - somewhere in the new btree code we
> seem to be leaking a btree cursor....

I've been running

for i in $(seq 1 8); do
	fsstress -p 64 -n 10000000 -d /mnt/data/fsstress.$i &
done

for about 20 hours now, and I'm only up to a few hundred xfs_btree_cur
entries, not changing much at all over time.  This is xfs-2.6 on a 4-way
PPC machine.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: another problem with latest code drops
  2008-10-17  2:07                 ` Dave Chinner
@ 2008-10-20  2:37                   ` Lachlan McIlroy
  2008-10-20  3:17                     ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-20  2:37 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Fri, Oct 17, 2008 at 01:04:34PM +1100, Dave Chinner wrote:
>> On Fri, Oct 17, 2008 at 12:21:41PM +1100, Dave Chinner wrote:
>>> On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
>>>> Dave Chinner wrote:
>>>>>> I am seeing a lot of memory used here though:
>>>>>>
>>>>>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
>>>>> Ah - I don't run selinux. Sounds like a bug that needs reporting
>>>>> to lkml...
>>>> I'm sure this is caused by your changes that introduced inode_init_always().
>>>> It re-initialises an existing inode without destroying it first so it calls
>>>> security_inode_alloc() without calling security_inode_free().
>>> I can't think of how. The layers above XFS are symmetric:
>> .....
>>> And we should have this symmetry everywhere.
>>>
>>> <thinks for a bit>
>>>
>>> Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
>>> xfs_idestroy() could leak contexts. We should really call xfs_iput()
>>> because we have an initialised linux inode at this point and so
>>> we need to go through destroy_inode(). I'll have a bit more of
>>> a look, but this doesn't seem to account for the huge number of
>>> leaked contexts you reported....
>> Patch below that replaces xfs_idestroy() with IRELE() to destroy
>> the inode via the normal iput() path. It also fixes a second issue
>> that I found by inspection related to security contexts as a result
>> of hooking up ->destroy_inode.
>>
>> It's running QA now.
>>
>> FWIW, I'm not sure if this patch will apply cleanly - I'm still
>> running off my stack of patches and not what has been checked into
>> ptools. Any idea of when all the patches in ptools will be pushed
>> out to the git tree?
> 
> And now with the patch.

Nope, that didn't help.  The system still leaks - and at the same
apparent rate too.

I also hit this panic where we have taken a reference on an inode
that has I_CLEAR set.  I suspect we've made it into xfs_iget_cache_hit()
and found an inode with XFS_IRECLAIMABLE set and since we don't call
igrab() we don't do the I_CLEAR check.  I'm not really convinced that
activating dead inodes is such a good idea.

<5>[  253.457411] XFS mounting filesystem dm-0
<7>[  253.460353] Ending clean XFS mount for filesystem: dm-0
<4>[ 1727.071933] sar used greatest stack depth: 3368 bytes left
<4>[ 6212.214445] pdflush used greatest stack depth: 2888 bytes left
<4>[ 6601.643218] df used greatest stack depth: 2632 bytes left
<0>[ 6601.800534] ------------[ cut here ]------------
<2>[ 6601.801127] kernel BUG at fs/inode.c:1194!
[1]kdb> 
[1]kdb> bt
Stack traceback for pid 6700
0xffff8810048d5dc0     6700     6594  1    1   R  0xffff8810048d6228 *fsstress
sp                ip                Function (args)
0xffff881004945a58 0xffffffff810bc8a4 iput+0x1b (0xffff880857fb02a0)
0xffff881004945aa0 0xffffffff8119734d xfs_iget+0x432 (0xffff88100e780000, 0x0, 0x20000080, invalid, 0x200000008, 0xffff881004945b38, 0xcd18e40)
0xffff881004945b20 0xffffffff811a135e xfs_bulkstat_one_iget+0x3a (0xffff88100e780000, 0x20000080, 0xcd18e40, 0xffff8803d9ed2c60, 0xffff881004945ce4)
0xffff881004945b70 0xffffffff811a15b6 xfs_bulkstat_one+0x9a (0xffff88100e780000, 0x20000080, 0x7fff91556de0, invalid, invalid, 0xcd18e40, 0xffff881004945cd0, 0x0, 0xffff881004945ce4)
0xffff881004945bc0 0xffffffff811a0f7f xfs_bulkstat+0x7fd (0xffff88100e780000, 0xffff881004945dd8, 0xffff881004945d5c, 0xffffffff811a151c, 0x0, 0x88, 0x7fff91556de0, 0x1, 0xffff881004945de0)
0xffff881004945d20 0xffffffff811a16a7 xfs_bulkstat_single+0x93 (0xffff88100e780000, 0xffff881004945dd8, 0x7fff91556de0, 0xffff881004945de0)
0xffff881004945d90 0xffffffff811c1f4a xfs_ioc_bulkstat+0xd5 (0xffff88100e780000, invalid, invalid)
0xffff881004945e10 0xffffffff811c2f99 xfs_ioctl+0x2ea (0xffff88100b3cc140, 0xffff880a8e984900, invalid, invalid, 0x7fff91556e70)
0xffff881004945e80 0xffffffff811c123f xfs_file_ioctl+0x36 (invalid, invalid, invalid)
0xffff881004945eb0 0xffffffff810b5c42 vfs_ioctl+0x2a (0xffff880a8e984900, invalid, 0x7fff91556e70)
0xffff881004945ee0 0xffffffff810b5eee do_vfs_ioctl+0x25f (invalid, invalid, invalid, 0x7fff91556e70)
0xffff881004945f30 0xffffffff810b5f62 sys_ioctl+0x57 (invalid, invalid, 0x7fff91556e70)
  not matched: from 0xffffffff8100bfb2 to 0xffffffff8100c02a drop_through 0 bb_jmp[7]
bb_special_case: Invalid bb_reg_state.memory, missing trailing entries
bb_special_case: on transfer to int_with_check
  Assuming system_call_fastpath is 'pass through' with 6 register parameters
kdb_bb: 0xffffffff8100bf3b [kernel]system_call_fastpath failed at 0xffffffff8100bfcd

Using old style backtrace, unreliable with no arguments
sp                ip                Function (args)
[1]more> 
0xffff881004945a08 0xffffffff811a135e xfs_bulkstat_one_iget+0x3a
0xffff881004945a18 0xffffffff811a135e xfs_bulkstat_one_iget+0x3a
0xffff881004945a58 0xffffffff810bc8a4 iput+0x1b
0xffff881004945aa0 0xffffffff8119734d xfs_iget+0x432
0xffff881004945b00 0xffffffff811a06e2 xfs_bulkstat_one_fmt
0xffff881004945b20 0xffffffff811a135e xfs_bulkstat_one_iget+0x3a
0xffff881004945b30 0xffffffff811bbd4d kmem_alloc+0x67
0xffff881004945b38 0xffffffff811a0cdd xfs_bulkstat+0x55b
0xffff881004945b50 0xffffffff811a06e2 xfs_bulkstat_one_fmt
0xffff881004945b70 0xffffffff811a15b6 xfs_bulkstat_one+0x9a
0xffff881004945bc0 0xffffffff811a0f7f xfs_bulkstat+0x7fd
0xffff881004945bf8 0xffffffff811a151c xfs_bulkstat_one
0xffff881004945ca0 0xffffffff811a06e2 xfs_bulkstat_one_fmt
0xffff881004945d20 0xffffffff811a16a7 xfs_bulkstat_single+0x93
0xffff881004945d90 0xffffffff811c1f4a xfs_ioc_bulkstat+0xd5
0xffff881004945da0 0xffffffff811c900f _xfs_itrace_entry+0x9e
0xffff881004945e10 0xffffffff811c2f99 xfs_ioctl+0x2ea
0xffff881004945e80 0xffffffff811c123f xfs_file_ioctl+0x36
0xffff881004945eb0 0xffffffff810b5c42 vfs_ioctl+0x2a
0xffff881004945ee0 0xffffffff810b5eee do_vfs_ioctl+0x25f
0xffff881004945f30 0xffffffff810b5f62 sys_ioctl+0x57
[1]kdb> 
[1]kdb> xinode 0xffff880857fb02a0
Unknown kdb command: 'xinode 0xffff880857fb02a0
'
[1]kdb> inode 0xffff880857fb02a0
struct inode at  0xffff880857fb02a0
 i_ino = 1745051539 i_count = 1 i_size 0
 i_mode = 040777  i_nlink = 1  i_rdev = 0x0
 i_hash.nxt = 0x0000000000000000 i_hash.pprev = 0x0000000000000000
 i_list.nxt = 0x00000000001000f0 i_list.prv = 0x00000000002001f0
 i_dentry.nxt = 0xffff880857fb0238 i_dentry.prv = 0xffff880857fb0238
 i_sb = 0xffff88100e7818d8 i_op = 0xffffffff81eecb80 i_data = 0xffff880857fb0488 nrpages = 0
 i_fop= 0xffffffff81eecaa0 i_flock = 0x0000000000000000 i_mapping = 0xffff880857fb0488
 i_flags 0x0 i_state 0x40 [I_CLEAR]  fs specific info @ 0xffff880857fb0688
[1]kdb> md8c1 0xffff880857fb0688
0xffff880857fb0688 0000000000000000                    ........
[1]kdb> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: another problem with latest code drops
  2008-10-20  2:37                   ` Lachlan McIlroy
@ 2008-10-20  3:17                     ` Dave Chinner
  2008-10-20  4:37                       ` Lachlan McIlroy
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-20  3:17 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Mon, Oct 20, 2008 at 12:37:13PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Fri, Oct 17, 2008 at 01:04:34PM +1100, Dave Chinner wrote:
>>> On Fri, Oct 17, 2008 at 12:21:41PM +1100, Dave Chinner wrote:
>>>> On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
>>>>> Dave Chinner wrote:
>>>>>>> I am seeing a lot of memory used here though:
>>>>>>>
>>>>>>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
>>>>>> Ah - I don't run selinux. Sounds like a bug that needs reporting
>>>>>> to lkml...
>>>>> I'm sure this is caused by your changes that introduced inode_init_always().
>>>>> It re-initialises an existing inode without destroying it first so it calls
>>>>> security_inode_alloc() without calling security_inode_free().
>>>> I can't think of how. The layers above XFS are symmetric:
>>> .....
>>>> And we should have this symmetry everywhere.
>>>>
>>>> <thinks for a bit>
>>>>
>>>> Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
>>>> xfs_idestroy() could leak contexts. We should really call xfs_iput()
>>>> because we have an initialised linux inode at this point and so
>>>> we need to go through destroy_inode(). I'll have a bit more of
>>>> a look, but this doesn't seem to account for the huge number of
>>>> leaked contexts you reported....
>>> Patch below that replaces xfs_idestroy() with IRELE() to destroy
>>> the inode via the normal iput() path. It also fixes a second issue
>>> that I found by inspection related to security contexts as a result
>>> of hooking up ->destroy_inode.
>>>
>>> It's running QA now.
>>>
>>> FWIW, I'm not sure if this patch will apply cleanly - I'm still
>>> running off my stack of patches and not what has been checked into
>>> ptools. Any idea of when all the patches in ptools will be pushed
>>> out to the git tree?
>>
>> And now with the patch.
>
> Nope, that didn't help.  The system still leaks - and at the same
> apparent rate too.

I didn't fix xfs_iread() properly - it still calls xfs_idestroy()
directly rather than dropping reference counts. Updated patch below
that should fix this.

> I also hit this panic where we have taken a reference on an inode
> that has I_CLEAR set.  I suspect we've made it into xfs_iget_cache_hit()

I don't think there is an iput() in that path.  The only iput() call
should be the IRELE() I added to xfs_iget_cache_miss(). Can you make
sure the compiler is not inlining functions so we can pin-point
where the iput() call is coming from? (i.e. static > STATIC on the
hit/miss functions)

> and found an inode with XFS_IRECLAIMABLE set and since we don't call
> igrab() we don't do the I_CLEAR check.

In that case, we call inode_init_always() instead which sets the
state to I_NEW and the reference count to 1. In the error case,
the inode will have already been freed and we make

> I'm not really convinced that
> activating dead inodes is such a good idea.

By the time the XFS_IRECLAIMABLE flag is set, I_CLEAR has been set
on the VFS inode. It is safe to re-use the inode at this point as
the VFS inode has been "destroyed" and hence all we need to do is
re-initialise it. We've always done this for inodes in reclaim so
we don't have to re-read them off disk...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

XFS: Ensure that we destroy the linux inode after initialisation

Now that XFS initialises the struct inode prior to completing
all checks and insertion into caches, we need to destroy that
state if we fail to instantiate the inode completely. Hence
we cannot just call xfs_idestroy() to clean up state when
such an occurrence happens - we need to go through the normal
reclaim path by dropping the reference count on the linux inode
we now hold. This will prevent leaking security contexts on
failed lookups.

Also, now that we have a ->destroy_inode() method, failing to
allocate a security context for the inode will result in the
xfs_inode being freed via the ->destroy_inode() path internally
to inode_init_always(). Rearrange xfs_inode_alloc() to initialise
the xfs_inode prior to attempting to initialise the VFS inode
so that the reclaim path will work and remove the freeing
of the inode in xfs_inode_alloc() if VFS inode initialisation
fails.

Version 2:
o use reference counted destroys in xfs_iread() instead of
  xfs_idestroy() calls as well.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_iget.c  |    9 ++++++++-
 fs/xfs/xfs_inode.c |   46 ++++++++++++++++++++++++++++------------------
 2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index a1f209b..bf41ae4 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -197,7 +197,14 @@ out_unlock:
 	write_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
 out_destroy:
-	xfs_idestroy(ip);
+	/*
+	 * we've already initialised the linux inode, so we have a valid
+	 * reference count of 1 and so we cannot destroy the inode with
+	 * xfs_idestroy. Kill it by dropping the reference count to push
+	 * it through the normal reclaim path so that state on the linux
+	 * inode is also destroyed.
+	 */
+	IRELE(ip);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c83f699..920a0ae 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -813,14 +813,6 @@ xfs_inode_alloc(
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
 	ASSERT(completion_done(&ip->i_flush));
 
-	/*
-	 * initialise the VFS inode here to get failures
-	 * out of the way early.
-	 */
-	if (!inode_init_always(mp->m_super, VFS_I(ip))) {
-		kmem_zone_free(xfs_inode_zone, ip);
-		return NULL;
-	}
 
 	/* initialise the xfs inode */
 	ip->i_ino = ino;
@@ -860,6 +852,17 @@ xfs_inode_alloc(
 	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
 #endif
 
+	/*
+	 * Now initialise the VFS inode. We do this after the xfs_inode
+	 * initialisation as internal failures will result in ->destroy_inode
+	 * being called and that will pass down through the reclaim path and
+	 * free the XFS inode. This path requires the XFS inode to already be
+	 * initialised. Hence if this call fails, the xfs_inode has already
+	 * been freed and we should not reference it at all in the error
+	 * handling.
+	 */
+	if (!inode_init_always(mp->m_super, VFS_I(ip)))
+		return NULL;
 	return ip;
 }
 
@@ -897,18 +900,16 @@ xfs_iread(
 	 * know that this is a new incore inode.
 	 */
 	error = xfs_itobp(mp, tp, ip, &dip, &bp, bno, imap_flags, XFS_BUF_LOCK);
-	if (error) {
-		xfs_idestroy(ip);
-		return error;
-	}
+	if (error)
+		goto out_error;
+
 
 	/*
 	 * If we got something that isn't an inode it means someone
 	 * (nfs or dmi) has a stale handle.
 	 */
 	if (be16_to_cpu(dip->di_core.di_magic) != XFS_DINODE_MAGIC) {
-		xfs_idestroy(ip);
-		xfs_trans_brelse(tp, bp);
+		error = EINVAL;
 #ifdef DEBUG
 		xfs_fs_cmn_err(CE_ALERT, mp, "xfs_iread: "
 				"dip->di_core.di_magic (0x%x) != "
@@ -916,7 +917,7 @@ xfs_iread(
 				be16_to_cpu(dip->di_core.di_magic),
 				XFS_DINODE_MAGIC);
 #endif /* DEBUG */
-		return XFS_ERROR(EINVAL);
+		goto out_error_relse;
 	}
 
 	/*
@@ -930,14 +931,12 @@ xfs_iread(
 		xfs_dinode_from_disk(&ip->i_d, &dip->di_core);
 		error = xfs_iformat(ip, dip);
 		if (error)  {
-			xfs_idestroy(ip);
-			xfs_trans_brelse(tp, bp);
 #ifdef DEBUG
 			xfs_fs_cmn_err(CE_ALERT, mp, "xfs_iread: "
 					"xfs_iformat() returned error %d",
 					error);
 #endif /* DEBUG */
-			return error;
+			goto out_error_relse;
 		}
 	} else {
 		ip->i_d.di_magic = be16_to_cpu(dip->di_core.di_magic);
@@ -1003,6 +1002,17 @@ xfs_iread(
 	xfs_trans_brelse(tp, bp);
 	*ipp = ip;
 	return 0;
+
+out_error_relse:
+	xfs_trans_brelse(tp, bp);
+out_error:
+	/*
+	 * As per xfs_iget_cache_miss(), we have a valid reference count on
+	 * the inode now so we need to destroy it by dropping the reference
+	 * count.
+	 */
+	IRELE(ip);
+	return XFS_ERROR(error);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: another problem with latest code drops
  2008-10-20  3:17                     ` Dave Chinner
@ 2008-10-20  4:37                       ` Lachlan McIlroy
  2008-10-20  5:29                         ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2008-10-20  4:37 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

Dave Chinner wrote:
> On Mon, Oct 20, 2008 at 12:37:13PM +1000, Lachlan McIlroy wrote:
>> Dave Chinner wrote:
>>> On Fri, Oct 17, 2008 at 01:04:34PM +1100, Dave Chinner wrote:
>>>> On Fri, Oct 17, 2008 at 12:21:41PM +1100, Dave Chinner wrote:
>>>>> On Fri, Oct 17, 2008 at 11:17:46AM +1000, Lachlan McIlroy wrote:
>>>>>> Dave Chinner wrote:
>>>>>>>> I am seeing a lot of memory used here though:
>>>>>>>>
>>>>>>>> 116605669 116605669  26%    0.23K 6859157       17  27436628K selinux_inode_security
>>>>>>> Ah - I don't run selinux. Sounds like a bug that needs reporting
>>>>>>> to lkml...
>>>>>> I'm sure this is caused by your changes that introduced inode_init_always().
>>>>>> It re-initialises an existing inode without destroying it first so it calls
>>>>>> security_inode_alloc() without calling security_inode_free().
>>>>> I can't think of how. The layers above XFS are symmetric:
>>>> .....
>>>>> And we should have this symmetry everywhere.
>>>>>
>>>>> <thinks for a bit>
>>>>>
>>>>> Hmmmm - maybe the xfs_iget_cache_miss failure paths where we call
>>>>> xfs_idestroy() could leak contexts. We should really call xfs_iput()
>>>>> because we have an initialised linux inode at this point and so
>>>>> we need to go through destroy_inode(). I'll have a bit more of
>>>>> a look, but this doesn't seem to account for the huge number of
>>>>> leaked contexts you reported....
>>>> Patch below that replaces xfs_idestroy() with IRELE() to destroy
>>>> the inode via the normal iput() path. It also fixes a second issue
>>>> that I found by inspection related to security contexts as a result
>>>> of hooking up ->destroy_inode.
>>>>
>>>> It's running QA now.
>>>>
>>>> FWIW, I'm not sure if this patch will apply cleanly - I'm still
> >>>> running off my stack of patches and not what has been checked into
>>>> ptools. Any idea of when all the patches in ptools will be pushed
>>>> out to the git tree?
>>> And now with the patch.
>> Nope, that didn't help.  The system still leaks - and at the same
>> apparent rate too.
> 
> I didn't fix xfs_iread() properly - it still calls xfs_idestroy()
> directly rather than dropping reference counts. Updated patch below
> that should fix this.
> 
>> I also hit this panic where we have taken a reference on an inode
>> that has I_CLEAR set.  I suspect we've made it into xfs_iget_cache_hit()
> 
> I don't think there is an iput() in that path.  The only iput() call
> should be the IRELE() I added to xfs_iget_cache_miss(). Can you make
> sure the compiler is not inlining functions so we can pin-point
> where the iput() call is coming from? (i.e. static > STATIC on the
> hit/miss functions)
Just disassembled xfs_iget() and xfs_iget_cache_miss() has been inlined
and we're calling the IRELE() at the end of that function.

> 
>> and found an inode with XFS_IRECLAIMABLE set and since we don't call
>> igrab() we don't do the I_CLEAR check.
> 
> In that case, we call inode_init_always() instead which sets the
> state to I_NEW and the reference count to 1. In the error case,
> the inode will have already been freed and we make
We don't set inode->i_state to I_NEW.  We're stuffing XFS_INEW into
the XFS inode's i_flags field and not removing the I_CLEAR from the
linux inode.  Note that inode_init_always() doesn't touch i_state.

> 
>> I'm not really convinced that
>> activating dead inodes is such a good idea.
> 
> By the time the XFS_IRECLAIMABLE flag is set, I_CLEAR has been set
> on the VFS inode. It is safe to re-use the inode at this point as
> the VFS inode has been "destroyed" and hence all we need to do is
> re-initialise it. We've always done this for inodes in reclaim so
> we don't have to re-read them off disk...
> 
> Cheers,
> 
> Dave.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: another problem with latest code drops
  2008-10-20  4:37                       ` Lachlan McIlroy
@ 2008-10-20  5:29                         ` Dave Chinner
  2008-10-20  6:05                           ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-20  5:29 UTC (permalink / raw)
  To: Lachlan McIlroy; +Cc: xfs-oss

On Mon, Oct 20, 2008 at 02:37:42PM +1000, Lachlan McIlroy wrote:
> Dave Chinner wrote:
>> On Mon, Oct 20, 2008 at 12:37:13PM +1000, Lachlan McIlroy wrote:
>>> I also hit this panic where we have taken a reference on an inode
>>> that has I_CLEAR set.  I suspect we've made it into xfs_iget_cache_hit()
>>
>> I don't think there is an iput() in that path.  The only iput() call
>> should be the IRELE() I added to xfs_iget_cache_miss(). Can you make
>> sure the compiler is not inlining functions so we can pin-point
>> where the iput() call is coming from? (i.e. static > STATIC on the
>> hit/miss functions)
> Just disassembled xfs_iget() and xfs_iget_cache_miss() has been inlined
> and we're calling the IRELE() at the end of that function.

Ok, that makes more sense.

>>> and found an inode with XFS_IRECLAIMABLE set and since we don't call
>>> igrab() we don't do the I_CLEAR check.
>>
>> In that case, we call inode_init_always() instead which sets the
>> state to I_NEW and the reference count to 1. In the error case,
>> the inode will have already been freed and we make
> We don't set inode->i_state to I_NEW.  We're stuffing XFS_INEW into
> the XFS inode's i_flags field and not removing the I_CLEAR from the
> linux inode.  Note that inode_init_always() doesn't touch i_state.

Yeah, xfs_setup_inode() is what is doing:

	inode->i_state = I_NEW|I_LOCK;

which happens after the cache miss has been processed.

Ok, so the initialisation code needs to clear inode->i_state
during allocation so that iput() does not complain. The i_state
field is not accessed by anything until the inode is inserted
into the superblock lists, so it should be safe to zero it
in xfs_inode_alloc(). I missed that new_inode() returns an
inode with a zeroed i_state....

Updated patch below, just starting a QA run on it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

XFS: Ensure that we destroy the linux inode after initialisation

Now that XFS initialises the struct inode prior to completing
all checks and insertion into caches, we need to destroy that
state if we fail to instantiate the inode completely. Hence
we cannot just call xfs_idestroy() to clean up state when
such an occurrence happens - we need to go through the normal
reclaim path by dropping the reference count on the linux inode
we now hold. This will prevent leaking security contexts on
failed lookups.

Also, now that we have a ->destroy_inode() method, failing to
allocate a security context for the inode will result in the
xfs_inode being freed via the ->destroy_inode() path internally
to inode_init_always(). Rearrange xfs_inode_alloc() to initialise
the xfs_inode prior to attempting to initialise the VFS inode
so that the reclaim path will work and remove the freeing
of the inode in xfs_inode_alloc() if VFS inode initialisation
fails.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
Version 3:
o initialise i_state early so that we don't trip a BUG in iput()
  when destroying an inode due to a failed initialisation.

Version 2:
o use reference counted destroys in xfs_iread() instead of
  xfs_idestroy() calls as well.

 fs/xfs/xfs_iget.c  |    9 ++++++++-
 fs/xfs/xfs_inode.c |   50 ++++++++++++++++++++++++++++++++------------------
 2 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index a1f209b..bf41ae4 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -197,7 +197,14 @@ out_unlock:
 	write_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
 out_destroy:
-	xfs_idestroy(ip);
+	/*
+	 * we've already initialised the linux inode, so we have a valid
+	 * reference count of 1 and so we cannot destroy the inode with
+	 * xfs_idestroy. Kill it by dropping the reference count to push
+	 * it through the normal reclaim path so that state on the linux
+	 * inode is also destroyed.
+	 */
+	IRELE(ip);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c83f699..d9ffd55 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -813,14 +813,6 @@ xfs_inode_alloc(
 	ASSERT(!spin_is_locked(&ip->i_flags_lock));
 	ASSERT(completion_done(&ip->i_flush));
 
-	/*
-	 * initialise the VFS inode here to get failures
-	 * out of the way early.
-	 */
-	if (!inode_init_always(mp->m_super, VFS_I(ip))) {
-		kmem_zone_free(xfs_inode_zone, ip);
-		return NULL;
-	}
 
 	/* initialise the xfs inode */
 	ip->i_ino = ino;
@@ -860,6 +852,21 @@ xfs_inode_alloc(
 	ip->i_dir_trace = ktrace_alloc(XFS_DIR2_KTRACE_SIZE, KM_NOFS);
 #endif
 
+	/*
+	 * Now initialise the VFS inode. We do this after the xfs_inode
+	 * initialisation as internal failures will result in ->destroy_inode
+	 * being called and that will pass down through the reclaim path and
+	 * free the XFS inode. This path requires the XFS inode to already be
+	 * initialised. Hence if this call fails, the xfs_inode has already
+	 * been freed and we should not reference it at all in the error
+	 * handling. Note that we need to manually initialise the VFS inode
+	 * state to prevent triggering a BUG in iput() if we fail the inode
+	 * instantiation later on.
+	 */
+	if (!inode_init_always(mp->m_super, VFS_I(ip)))
+		return NULL;
+	VFS_I(ip)->i_state = 0;
+
 	return ip;
 }
 
@@ -897,18 +904,16 @@ xfs_iread(
 	 * know that this is a new incore inode.
 	 */
 	error = xfs_itobp(mp, tp, ip, &dip, &bp, bno, imap_flags, XFS_BUF_LOCK);
-	if (error) {
-		xfs_idestroy(ip);
-		return error;
-	}
+	if (error)
+		goto out_error;
+
 
 	/*
 	 * If we got something that isn't an inode it means someone
 	 * (nfs or dmi) has a stale handle.
 	 */
 	if (be16_to_cpu(dip->di_core.di_magic) != XFS_DINODE_MAGIC) {
-		xfs_idestroy(ip);
-		xfs_trans_brelse(tp, bp);
+		error = EINVAL;
 #ifdef DEBUG
 		xfs_fs_cmn_err(CE_ALERT, mp, "xfs_iread: "
 				"dip->di_core.di_magic (0x%x) != "
@@ -916,7 +921,7 @@ xfs_iread(
 				be16_to_cpu(dip->di_core.di_magic),
 				XFS_DINODE_MAGIC);
 #endif /* DEBUG */
-		return XFS_ERROR(EINVAL);
+		goto out_error_relse;
 	}
 
 	/*
@@ -930,14 +935,12 @@ xfs_iread(
 		xfs_dinode_from_disk(&ip->i_d, &dip->di_core);
 		error = xfs_iformat(ip, dip);
 		if (error)  {
-			xfs_idestroy(ip);
-			xfs_trans_brelse(tp, bp);
 #ifdef DEBUG
 			xfs_fs_cmn_err(CE_ALERT, mp, "xfs_iread: "
 					"xfs_iformat() returned error %d",
 					error);
 #endif /* DEBUG */
-			return error;
+			goto out_error_relse;
 		}
 	} else {
 		ip->i_d.di_magic = be16_to_cpu(dip->di_core.di_magic);
@@ -1003,6 +1006,17 @@ xfs_iread(
 	xfs_trans_brelse(tp, bp);
 	*ipp = ip;
 	return 0;
+
+out_error_relse:
+	xfs_trans_brelse(tp, bp);
+out_error:
+	/*
+	 * As per xfs_iget_cache_miss(), we have a valid reference count on
+	 * the inode now so we need to destroy it by dropping the reference
+	 * count.
+	 */
+	IRELE(ip);
+	return XFS_ERROR(error);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH] Re: another problem with latest code drops
  2008-10-20  5:29                         ` Dave Chinner
@ 2008-10-20  6:05                           ` Dave Chinner
  2008-10-20 21:41                             ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2008-10-20  6:05 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

On Mon, Oct 20, 2008 at 04:29:29PM +1100, Dave Chinner wrote:
> On Mon, Oct 20, 2008 at 02:37:42PM +1000, Lachlan McIlroy wrote:
> > Dave Chinner wrote:
> >> On Mon, Oct 20, 2008 at 12:37:13PM +1000, Lachlan McIlroy wrote:
> >>> I also hit this panic where we have taken a reference on an inode
> >>> that has I_CLEAR set.  I suspect we've made it into xfs_iget_cache_hit()
> >>
> >> I don't think there is an iput() in that path.  The only iput() call
> >> should be the IRELE() I added to xfs_iget_cache_miss(). Can you make
> >> sure the compiler is not inlining functions so we can pin-point
> >> where the iput() call is coming from? (i.e. static > STATIC on the
> >> hit/miss functions)
> > Just disassembled xfs_iget() and xfs_iget_cache_miss() has been inlined
> > and we're calling the IRELE() at the end of that function.
> 
> Ok, that makes more sense.
> 
> >>> and found an inode with XFS_IRECLAIMABLE set and since we don't call
> >>> igrab() we don't do the I_CLEAR check.
> >>
> >> In that case, we call inode_init_always() instead which sets the
> >> state to I_NEW and the reference count to 1. In the error case,
> >> the inode will have already been freed and we make
> > We don't set inode->i_state to i_NEW.  We're stuffing XFS_INEW into
> > the XFS inode's i_flags field and not removing the I_CLEAR from the
> > linux inode.  Note that inode_init_always() doesn't touch i_state.
> 
> Yeah, xfs_setup_inode() is what is doing:
> 
> 	inode->i_state = I_NEW|I_LOCK;
> 
> which happens after the cache miss has been processed.
> 
> Ok, so the initialisation code needs to clear inode->i_state
> during allocation so that iput() does not complain. The i_state
> field is not accessed by anything until the inode is inserted
> into the superblock lists, so it should be safe to zero it
> in xfs_inode_alloc(). I missed that new_inode() returns an
> inode with a zeroed i_state....

On second thoughts, the inode has not been set up properly
by this stage, so we can't really even expect to be able to
call iput() on it. Hmmmm - maybe the only thing we can do
here is call security_inode_free() before xfs_idestroy.

Christoph - any opinions on what the best thing to do is?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
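
[Editorial note: the failure mode discussed above can be modelled in a few lines: a recycled object whose state field is not reset on re-initialisation, so the release path trips over a stale "already cleared" flag. Names here are hypothetical, not the VFS API:]

```c
#define ST_CLEAR 0x1

struct obj { int state; int refcount; };

/*
 * Re-init that skips the state field, mirroring how inode_init_always()
 * does not touch i_state on a recycled inode.
 */
static void obj_reinit_stale(struct obj *o) { o->refcount = 1; }

/*
 * Fixed re-init: zero the state during allocation, as proposed above
 * for xfs_inode_alloc().
 */
static void obj_reinit_fixed(struct obj *o) { o->refcount = 1; o->state = 0; }

/*
 * Release path that, like iput(), complains when asked to drop an
 * object that appears to have been cleared already.
 */
static int obj_release(struct obj *o)
{
	if (o->state & ST_CLEAR)
		return -1;	/* stale flag from the object's previous life */
	if (--o->refcount == 0)
		o->state |= ST_CLEAR;
	return 0;
}
```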


* Re: [PATCH] Re: another problem with latest code drops
  2008-10-20  6:05                           ` Dave Chinner
@ 2008-10-20 21:41                             ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2008-10-20 21:41 UTC (permalink / raw)
  To: Lachlan McIlroy, xfs-oss

On Mon, Oct 20, 2008 at 05:05:22PM +1100, Dave Chinner wrote:
> On second thoughts, the inode has not been set up properly
> by this stage, so we can't really even expect to be able to
> call iput() on it. Hmmmm - maybe the only thing we can do
> here is call security_inode_free() before xfs_idestroy.
> 
> Christoph - any opinions on what the best thing to do is?

Calling destroy_inode after exporting it seems most logical.  If we want
to undo init_inode_once we should have VFS functionality taking care
of that, too.

I've implemented this including some preparatory patches dealing with
the bulkstat issue.  It's half-way through xfsqa so far.


end of thread, other threads:[~2008-10-20 21:39 UTC | newest]

Thread overview: 20+ messages
2008-10-16  2:06 another problem with latest code drops Lachlan McIlroy
2008-10-16  6:02 ` Dave Chinner
2008-10-16  7:38   ` Lachlan McIlroy
2008-10-16  7:20     ` Dave Chinner
2008-10-16  8:35       ` Lachlan McIlroy
2008-10-16  9:08         ` Christoph Hellwig
2008-10-17  1:13           ` Lachlan McIlroy
2008-10-16 22:29         ` Dave Chinner
2008-10-17  1:17           ` Lachlan McIlroy
2008-10-17  1:21             ` Dave Chinner
2008-10-17  2:04               ` [PATCH] " Dave Chinner
2008-10-17  2:07                 ` Dave Chinner
2008-10-20  2:37                   ` Lachlan McIlroy
2008-10-20  3:17                     ` Dave Chinner
2008-10-20  4:37                       ` Lachlan McIlroy
2008-10-20  5:29                         ` Dave Chinner
2008-10-20  6:05                           ` Dave Chinner
2008-10-20 21:41                             ` Christoph Hellwig
2008-10-17  3:14                 ` Lachlan McIlroy
2008-10-19  9:10       ` Christoph Hellwig
