* Help debugging a use after free
@ 2009-04-13 17:42 Stuart MENEFY
2009-04-14 19:10 ` Russell Cattelan
0 siblings, 1 reply; 8+ messages in thread
From: Stuart MENEFY @ 2009-04-13 17:42 UTC (permalink / raw)
To: xfs
Folks
For the last few days I've been trying to debug an apparent use after
free in XFS code. This is in a 2.6.23.17 kernel on an SH4 processor,
so this could easily be an XFS problem which has already been solved,
or an architecture problem. This started because I've been trying to
work around the D-cache aliasing problems which the SH4 suffers from
because of XFS's use of vmap, so it's also possible this is an obscure
aliasing problem. I've also been unable to reproduce it on an x86_64,
albeit with a slightly later kernel.
The problem occurs when running a loop which is essentially:
mount /dev/sda1 /mnt
bonnie++ -u root -f -s0 -n16:1024:1024 -b -d /mnt
umount /mnt
on an otherwise clean partition, and usually takes at least 24 hours
to appear. The problem only occurs with SLAB poisoning enabled, and
results in a misaligned access, because the poison value, when treated
as a pointer, is an illegal misaligned address:
Unaligned kernel access on behalf of "bonnie++" pid=22419 pc=0x841a6c6c ins=0x8433
Pid : 22419, Comm: bonnie++
PC is at xfs_trans_unlocked_item+0xc/0x60
PC : 841a6c6c SP : 87f29d3c SR : 40008000 TEA : 00402ebe Not tainted
R0 : 00000000 R1 : 841a6c60 R2 : 87f28000 R3 : 00000001
R4 : 6b6b6b6b R5 : 6b6b6b6b R6 : 843342ac R7 : 840139c0
R8 : 6b6b6b6b R9 : 6b6b6b6b R10 : 878cb000 R11 : 8452eb8c
R12 : 00003255 R13 : 00000625 R14 : 87f29d4c
MACH: 00000000 MACL: 0df75800 GBR : 297f9658 PR : 84189d20
Call trace:
[<84189d20>] xfs_iunlock+0x60/0xc0
[<841915da>] xfs_inode_item_push+0x1a/0x40
[<841a6ff4>] xfs_trans_push_ail+0x1b4/0x240
[<841971b6>] xlog_grant_push_ail+0xf6/0x180
[<8419961c>] xfs_log_reserve+0x3c/0xc0
[<841a540a>] xfs_trans_reserve+0x8a/0x220
[<8418d6f0>] xfs_itruncate_finish+0x170/0x420
[<841a7de0>] xfs_trans_ihold+0x0/0x20
[<841a7de0>] xfs_trans_ihold+0x0/0x20
[<841a7e60>] xfs_trans_ijoin+0x0/0xa0
[<841b112a>] xfs_inactive+0x3ea/0x500
[<84189da0>] xfs_ilock+0x0/0xe0
[<841bf75c>] xfs_fs_clear_inode+0x3c/0xa0
[<843379a0>] _spin_unlock+0x0/0x60
[<840818da>] clear_inode+0x5a/0x140
[<84081a92>] generic_delete_inode+0xd2/0x120
[<84081bc6>] generic_drop_inode+0xe6/0x1c0
[<84080e2e>] iput+0x4e/0x80
[<840764ba>] do_unlinkat+0xfa/0x1a0
[<8408fdba>] do_fsync+0x7a/0xe0
[<8408fe40>] __do_fsync+0x20/0x60
[<8407656e>] sys_unlink+0xe/0x20
[<84076560>] sys_unlink+0x0/0x20
[<840088f8>] syscall_call+0xa/0xe
(Note that due to the way backtrace is implemented on SH architectures there
are spurious values in this backtrace, so it is probably safe to ignore any
entries of the form FN+0x0.)
The problem is caused by the "xfs_inode_t" which is passed into
xfs_iunlock() having already been freed at the SLAB level. Putting
some additional tracing code into the free path, this structure is
being freed by xfs_inode_item_destroy(), which is called from
xfs_idestroy() which is called from xfs_ireclaim().
In between the free and use there appears to be some disk activity,
as I see an allocate from the SCSI layer, plus a few other kmalloc
and kfree's.
Running with XFS_DEBUG enabled hasn't shown any problems; it still
failed in the same way.
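(Editorial note: for reference, the mount/bonnie++/umount cycle described at the top of this message, written out as the loop actually run; device and mount point as in the original report:)

```shell
#!/bin/sh
# Repeated mount / bonnie++ / umount cycle from the report.
# Needs root, a scratch XFS partition on /dev/sda1, and a kernel with
# SLAB poisoning enabled; typically takes >24 hours to reproduce.
while :; do
    mount /dev/sda1 /mnt || exit 1
    bonnie++ -u root -f -s0 -n16:1024:1024 -b -d /mnt
    umount /mnt || exit 1
done
```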
So, does this sound like a problem which has been seen before?
Alternatively, can anyone suggest what would normally prevent this
happening, and I can look into that (although I know next to nothing
about the guts of the XFS code, so please don't be afraid to state
the obvious!). Or any other suggestions welcome.
Stuart
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: Help debugging a use after free
2009-04-13 17:42 Help debugging a use after free Stuart MENEFY
@ 2009-04-14 19:10 ` Russell Cattelan
2009-04-17 15:30 ` Stuart MENEFY
0 siblings, 1 reply; 8+ messages in thread
From: Russell Cattelan @ 2009-04-14 19:10 UTC (permalink / raw)
To: Stuart MENEFY; +Cc: xfs
Stuart MENEFY wrote:
> Folks
>
> For the last few days I've been trying to debug an apparent use
> after free in XFS code. This is in a 2.6.23.17 kernel on an SH4
> processor, so this could easily be an XFS problem which has already
> been solved, or an architecture problem. This started because I've
> been trying to work around the D-cache aliasing problems which the
> SH4 suffers from because of XFS's use of vmap, so it's also possible
> this is an obscure aliasing problem. I've also been unable to
> reproduce it on an x86_64, albeit with a slightly later kernel.
>
> The problem occurs when running a loop which is essentially: mount
> /dev/sda1 /mnt bonnie++ -u root -f -s0 -n16:1024:1024 -b -d /mnt
> umount /mnt on an otherwise clean partition, and usually takes at
> least 24 hours to appear. The problem only occurs with SLAB
> poisoning enabled, and results in a misaligned access, because the
> poison value, when treated as a pointer, is an illegal misaligned
> address:
>
> [register dump and call trace snipped]
>
> The problem is caused by the "xfs_inode_t" which is passed into
> xfs_iunlock() having already been freed at the SLAB level. Putting
> some additional tracing code into the free path, this structure is
> being freed by xfs_inode_item_destroy(), which is called from
> xfs_idestroy() which is called from xfs_ireclaim().
I guess the first question is which thread has actually freed the
inode? It sounds like a race, which seems unlikely unless you are on a
multi-processor system. I don't know anything about the SH4
architecture, but I assume this is a single-processor system?
If you can instrument the actual free that is causing the problem, it
would be good to print out the pid doing the free and as many return
addresses as you can, so we can get an idea of the actual call chain.
>
> In between the free and use there appears to be some disk activity,
> as I see an allocate from the SCSI layer, plus a few other kmalloc
> and kfree's.
>
> Running with XFS_DEBUG enabled hasn't shown any problems; it still
> failed in the same way.
I wouldn't expect XFS_DEBUG to point out anything particularly
interesting in this area,
but it never hurts to try.
>
> So, does this sound like a problem which has been seen before?
> Alternatively can anyone suggest what would normally prevent this
> happening, and I can look into that (although I know next to
> nothing about the guts of the XFS code, so please don't be afraid
> to state the obvious!). Or any other suggestions welcome.
>
> Stuart
>
> _______________________________________________ xfs mailing list
> xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs
* Re: Help debugging a use after free
2009-04-14 19:10 ` Russell Cattelan
@ 2009-04-17 15:30 ` Stuart MENEFY
2009-04-19 8:17 ` Dave Chinner
0 siblings, 1 reply; 8+ messages in thread
From: Stuart MENEFY @ 2009-04-17 15:30 UTC (permalink / raw)
To: Russell Cattelan; +Cc: xfs
Thanks for getting back to me.
Russell Cattelan wrote:
> Stuart MENEFY wrote:
>> [original report snipped]
>> The problem is caused by the "xfs_inode_t" which is passed into
>> xfs_iunlock() having already been freed at the SLAB level. Putting
>> some additional tracing code into the free path, this structure is
>> being freed by xfs_inode_item_destroy(), which is called from
>> xfs_idestroy() which is called from xfs_ireclaim().
>
> I guess the first question is which thread has actually freed the
> inode? It sounds like a race, which seems unlikely unless you are on
> a multi-processor system. I don't know anything about the SH4
> architecture, but I assume this is a single-processor system?
Correct, it's a simple, in-order, uni-processor system.
> If you can instrument the actual free that is causing the problem,
> it would be good to print out the pid doing the free and as many
> return addresses as you can, so we can get an idea of the actual
> call chain.
The free is coming from the xfssyncd thread; the backtrace looks like:
[<8418d0e2>] xfs_idestroy+0x22/0x100
[<8418a1b0>] xfs_ireclaim+0x50/0x80
[<841ad7f2>] xfs_finish_reclaim+0x32/0x1c0
[<841ada30>] xfs_finish_reclaim_all+0xb0/0x100
[<8418a780>] xfs_ilock_nowait+0x0/0x160
[<841a9df2>] xfs_syncsub+0x52/0x360
[<84335108>] schedule_timeout+0x48/0x100
[<841ab684>] xfs_sync+0x24/0x40
[<841e0ce0>] list_add+0x0/0x20
[<841c041c>] vfs_sync+0x1c/0x40
[<841bf37c>] vfs_sync_worker+0x1c/0x60
[<841bf6b6>] xfssyncd+0xb6/0x140
[<8402f0dc>] kthread+0x3c/0x80
[<84012440>] complete+0x0/0x60
[<841bf600>] xfssyncd+0x0/0x140
[<8402f060>] kthread_should_stop+0x0/0x20
[<84003984>] kernel_thread_helper+0x4/0x20
[<8402f0a0>] kthread+0x0/0x80
[<84003980>] kernel_thread_helper+0x0/0x20
with the same warning that there are probably some spurious entries here.
As before, any suggestions welcome.
Stuart
* Re: Help debugging a use after free
2009-04-17 15:30 ` Stuart MENEFY
@ 2009-04-19 8:17 ` Dave Chinner
2009-04-23 16:09 ` Stuart MENEFY
0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2009-04-19 8:17 UTC (permalink / raw)
To: Stuart MENEFY; +Cc: Russell Cattelan, xfs
On Fri, Apr 17, 2009 at 04:30:09PM +0100, Stuart MENEFY wrote:
> > If you can instrument the actual free that is causing the problem,
> > it would be good to print out the pid doing the free and as many
> > return addresses as you can, so we can get an idea of the actual
> > call chain.
>
> The free is coming from the xfssyncd thread, the back trace looks like:
>
> [<8418d0e2>] xfs_idestroy+0x22/0x100
> [<8418a1b0>] xfs_ireclaim+0x50/0x80
> [<841ad7f2>] xfs_finish_reclaim+0x32/0x1c0
> [<841ada30>] xfs_finish_reclaim_all+0xb0/0x100
> [<8418a780>] xfs_ilock_nowait+0x0/0x160
> [<841a9df2>] xfs_syncsub+0x52/0x360
> [<84335108>] schedule_timeout+0x48/0x100
> [<841ab684>] xfs_sync+0x24/0x40
> [<841e0ce0>] list_add+0x0/0x20
> [<841c041c>] vfs_sync+0x1c/0x40
> [<841bf37c>] vfs_sync_worker+0x1c/0x60
> [<841bf6b6>] xfssyncd+0xb6/0x140
> [<8402f0dc>] kthread+0x3c/0x80
> [<84012440>] complete+0x0/0x60
> [<841bf600>] xfssyncd+0x0/0x140
> [<8402f060>] kthread_should_stop+0x0/0x20
> [<84003984>] kernel_thread_helper+0x4/0x20
> [<8402f0a0>] kthread+0x0/0x80
> [<84003980>] kernel_thread_helper+0x0/0x20
There were some use-after-free fixes in the .27 timeframe to the inode
reclaim code. Can you retest with a more recent kernel?
Cheers,
Dave.
--
Dave Chinner
dgc@evostor.com
* Re: Help debugging a use after free
2009-04-19 8:17 ` Dave Chinner
@ 2009-04-23 16:09 ` Stuart MENEFY
2009-04-23 16:24 ` Russell Cattelan
0 siblings, 1 reply; 8+ messages in thread
From: Stuart MENEFY @ 2009-04-23 16:09 UTC (permalink / raw)
To: Dave Chinner; +Cc: Russell Cattelan, xfs
Dave Chinner wrote:
> On Fri, Apr 17, 2009 at 04:30:09PM +0100, Stuart MENEFY wrote:
>> [instrumentation discussion and backtrace snipped]
>
> There were some use-after-free fixes in the .27 timeframe to the
> inode reclaim code. Can you retest with a more recent kernel?
Possibly. I have a half-finished port of 2.6.27-rc4 to our hardware
which I could probably get going fairly quickly. Do you think that
would be sufficient to pick up the use-after-free fixes?
Thanks
Stuart
* Re: Help debugging a use after free
2009-04-23 16:09 ` Stuart MENEFY
@ 2009-04-23 16:24 ` Russell Cattelan
2009-04-23 18:37 ` Stuart MENEFY
0 siblings, 1 reply; 8+ messages in thread
From: Russell Cattelan @ 2009-04-23 16:24 UTC (permalink / raw)
To: Stuart MENEFY; +Cc: Russell Cattelan, xfs
Stuart MENEFY wrote:
> Dave Chinner wrote:
>
>>> [earlier discussion and backtrace snipped]
>> There were some use-after-free fixes in the .27 timeframe to the
>> inode reclaim code. Can you retest with a more recent kernel?
>>
>
> Possibly. I have a half-finished port of 2.6.27-rc4
> to our hardware
> which I could probably get going fairly quickly. Do you think that
> would be sufficient to pick up the use-after-free fixes?
>
> Thanks
>
> Stuart
>
>
Any reason you are using rc4 rather than 2.6.27?
I would simply run a diff over rc4 fs/xfs and 2.6.27 fs/xfs and see
what is different, if anything.
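(Editorial note: assuming a clone of the mainline git tree with release tags, which the original mail does not specify, the comparison being suggested might look like:)

```shell
# Show what changed in XFS between 2.6.27-rc4 and the 2.6.27 release,
# then list the individual commits responsible.
git diff --stat v2.6.27-rc4 v2.6.27 -- fs/xfs
git log --oneline v2.6.27-rc4..v2.6.27 -- fs/xfs
```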
* Re: Help debugging a use after free
2009-04-23 16:24 ` Russell Cattelan
@ 2009-04-23 18:37 ` Stuart MENEFY
2009-04-23 21:35 ` Russell Cattelan
0 siblings, 1 reply; 8+ messages in thread
From: Stuart MENEFY @ 2009-04-23 18:37 UTC (permalink / raw)
To: Russell Cattelan; +Cc: xfs
Russell Cattelan wrote:
> Stuart MENEFY wrote:
>> Dave Chinner wrote:
>>
>>> [earlier discussion and backtrace snipped]
>>> There were some use-after-free fixes in the .27 timeframe to the
>>> inode reclaim code. Can you retest with a more recent kernel?
>>>
>> Possibly. I have a half-finished port of 2.6.27-rc4
>> to our hardware
>> which I could probably get going fairly quickly. Do you think that
>> would be sufficient to pick up the use-after-free fixes?
>>
>> Thanks
>>
>> Stuart
>>
>>
> Any reason you are using rc4 rather than 2.6.27?
None, it just happened to be the latest at the time I did the work.
> I would simply run a diff over rc4 fs/xfs and 2.6.27 fs/xfs and see
> what is different, if anything.
There are about 500 lines of diff. Tracking them back to the git
commits, it's about 10 commits, most of which are self-contained XFS
specific changes, and a couple are simple changes which affect all
users of a kernel-wide API. So on first glance, bringing the XFS code
up to the 2.6.27 level wouldn't be a problem. One of these changes is a
use-after-free fix (in buffers, not inodes, which appears to be the
problem I'm seeing), so it's probably worth the effort.
OK, I'll give that a try, and see what happens.
Stuart
* Re: Help debugging a use after free
2009-04-23 18:37 ` Stuart MENEFY
@ 2009-04-23 21:35 ` Russell Cattelan
0 siblings, 0 replies; 8+ messages in thread
From: Russell Cattelan @ 2009-04-23 21:35 UTC (permalink / raw)
To: Stuart MENEFY; +Cc: Russell Cattelan, xfs
Stuart MENEFY wrote:
> Russell Cattelan wrote:
>> [earlier discussion snipped]
>> Any reason you are using rc4 rather than 2.6.27?
>>
>
> None, it just happened to be the latest at the time I did the work.
>
>
>> I would simply run a diff over rc4 fs/xfs and 2.6.27 fs/xfs and see
>> what is different, if anything.
>>
>
> There are about 500 lines of diff. Tracking them back to the git
> commits, it's about 10 commits, most of which are self-contained XFS
> specific changes, and a couple are simple changes which affect all
> users of a kernel-wide API. So on first glance, bringing the XFS code
> up to the 2.6.27 level wouldn't be a problem. One of these changes is
> a use-after-free fix (in buffers, not inodes, which appears to be the
> problem I'm seeing), so it's probably worth the effort.
>
> OK, I'll give that a try, and see what happens.
>
> Stuart
>
>
Yeah, I have not looked specifically to see what is there in terms of
inode use-after-free patches. The problem will be to track down why the
ref counting is out of whack, and why it is allowing the inode to be
released. It wouldn't hurt to start with a more current XFS if
possible, to see if it is still a problem.
-Russell
Thread overview: 8+ messages
2009-04-13 17:42 Help debugging a use after free Stuart MENEFY
2009-04-14 19:10 ` Russell Cattelan
2009-04-17 15:30 ` Stuart MENEFY
2009-04-19 8:17 ` Dave Chinner
2009-04-23 16:09 ` Stuart MENEFY
2009-04-23 16:24 ` Russell Cattelan
2009-04-23 18:37 ` Stuart MENEFY
2009-04-23 21:35 ` Russell Cattelan