public inbox for linux-xfs@vger.kernel.org
* xfs leaking?
@ 2008-07-11 17:04 Eric Sandeen
  2008-07-11 23:38 ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2008-07-11 17:04 UTC (permalink / raw)
  To: xfs-oss

After my fill-the-1T-fs-with-20k-files test I tried an xfs_repair, and
it was painfully slow compared to e2fsck of ext4 - I stopped it after
almost 2 hours with it only half complete.

I noticed that during the run I was nearly out of memory (8G) and
swapping badly.

So I unmounted the fs, dropped caches, and was astounded to find
10492540 buffer heads still in the slab caches.
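For the record, the check amounted to something like this (a sketch, not
the exact commands I ran; the slab_active helper is illustrative, and on
2.6-era kernels the second /proc/slabinfo field is active_objs):

```shell
# Print the active object count for a named slab cache, given
# slabinfo-format input (field 1 = name, field 2 = active_objs).
slab_active() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# On a live box, after umount and `echo 3 > /proc/sys/vm/drop_caches`:
#   slab_active buffer_head < /proc/slabinfo
# Demonstration on a made-up sample line:
sample='buffer_head 10492540 10492560 104 37 1 : tunables 120 60 8'
echo "$sample" | slab_active buffer_head   # prints 10492540
```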

This was all on 2.6.26-rc2 (I need to update) with lazy-count=1, a 1T fs,
32 AGs, mounted with inode64, nobarrier, and maximal logbuf count & size.

Rebooted, let the fs_mark test run just a bit, then tried removing the
xfs module (I forgot to load the one with Dave's fix), and:


slab error in kmem_cache_destroy(): cache `xfs_inode': Can't free all
objects
Pid: 3676, comm: rmmod Not tainted 2.6.26-rc2 #3

Call Trace:
 [<ffffffff80287e18>] kmem_cache_destroy+0x7d/0xb9
 [<ffffffffa03e6708>] :xfs:xfs_cleanup+0x5c/0xf9
 [<ffffffffa03e67bf>] :xfs:exit_xfs_fs+0x1a/0x28
 [<ffffffff80250b1f>] sys_delete_module+0x186/0x1de
 [<ffffffff8020bee2>] tracesys+0xd5/0xda

slab error in kmem_cache_destroy(): cache `xfs_buf_item': Can't free all
objects
Pid: 3676, comm: rmmod Not tainted 2.6.26-rc2 #3

Call Trace:
 [<ffffffff80287e18>] kmem_cache_destroy+0x7d/0xb9
 [<ffffffffa03e674c>] :xfs:xfs_cleanup+0xa0/0xf9
 [<ffffffffa03e67bf>] :xfs:exit_xfs_fs+0x1a/0x28
 [<ffffffff80250b1f>] sys_delete_module+0x186/0x1de
 [<ffffffff8020bee2>] tracesys+0xd5/0xda

slab error in kmem_cache_destroy(): cache `xfs_ili': Can't free all objects
Pid: 3676, comm: rmmod Not tainted 2.6.26-rc2 #3

Call Trace:
 [<ffffffff80287e18>] kmem_cache_destroy+0x7d/0xb9
 [<ffffffffa03e6790>] :xfs:xfs_cleanup+0xe4/0xf9
 [<ffffffffa03e67bf>] :xfs:exit_xfs_fs+0x1a/0x28
 [<ffffffff80250b1f>] sys_delete_module+0x186/0x1de
 [<ffffffff8020bee2>] tracesys+0xd5/0xda

slab error in kmem_cache_destroy(): cache `xfs_buf': Can't free all objects
Pid: 3676, comm: rmmod Not tainted 2.6.26-rc2 #3

Call Trace:
 [<ffffffff80287e18>] kmem_cache_destroy+0x7d/0xb9
 [<ffffffffa03e67c4>] :xfs:exit_xfs_fs+0x1f/0x28
 [<ffffffff80250b1f>] sys_delete_module+0x186/0x1de
 [<ffffffff8020bee2>] tracesys+0xd5/0xda

slab error in kmem_cache_destroy(): cache `xfs_vnode': Can't free all
objects
Pid: 3676, comm: rmmod Not tainted 2.6.26-rc2 #3

Call Trace:
 [<ffffffff80287e18>] kmem_cache_destroy+0x7d/0xb9
 [<ffffffffa03e5e5a>] :xfs:xfs_destroy_zones+0x21/0x36
 [<ffffffff80250b1f>] sys_delete_module+0x186/0x1de
 [<ffffffff8020bee2>] tracesys+0xd5/0xda

BUG: unable to handle kernel paging request at ffffffffa03ebabb
IP: [<ffffffff8031a399>] strnlen+0x11/0x1a
PGD 203067 PUD 207063 PMD 21d714067 PTE 0
Oops: 0000 [1] SMP
CPU 2
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6
dm_multipath sbs sbshc battery acpi_memhotplug ac parport_pc lp parport
sg dcdbas ide_cd_mod cdrom tg3 button serio_raw k8temp i2c_piix4 shpchp
pcspkr i2c_core hwmon dm_snapshot dm_zero dm_mirror dm_log dm_mod
qla2xxx scsi_transport_fc sata_svw libata sd_mod scsi_mod ext3 jbd
uhci_hcd ohci_hcd ehci_hcd [last unloaded: xfs]
Pid: 3687, comm: grep Not tainted 2.6.26-rc2 #3
RIP: 0010:[<ffffffff8031a399>]  [<ffffffff8031a399>] strnlen+0x11/0x1a
RSP: 0018:ffff810107163cc0  EFLAGS: 00010297
RAX: ffffffffa03ebabb RBX: ffff810107163d28 RCX: ffffffff8056ae84
RDX: ffff810107163d58 RSI: fffffffffffffffe RDI: ffffffffa03ebabb
RBP: ffff81010718b0cc R08: 00000000ffffffff R09: 0000000000000240
R10: ffffffffffffffff R11: ffff81011fc113c0 R12: ffffffffa03ebabb
R13: 0000000000000011 R14: 0000000000000010 R15: ffff81010718c000
FS:  00007f09e29386e0(0000) GS:ffff81011faa3940(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffa03ebabb CR3: 000000011d4b9000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process grep (pid: 3687, threadinfo ffff810107162000, task ffff81011dd461c0)
Stack:  ffffffff8031b5a2 0000000000000001 000000008029c713 0000000000000f34
 ffff81010718b0cc ffffffff8056ae84 ffff81011dc420c0 0000000000039c1c
 ffff81011ddb6540 0000000000000000 0000000000008404 ffff81011dc420c0
Call Trace:
 [<ffffffff8031b5a2>] ? vsnprintf+0x31a/0x585
 [<ffffffff802a4977>] ? seq_printf+0x67/0x8f
 [<ffffffff80285a64>] ? s_show+0x160/0x28d
 [<ffffffff80285b2c>] ? s_show+0x228/0x28d
 [<ffffffff802a4e30>] ? seq_read+0x109/0x29d
 [<ffffffff802c7dc0>] ? proc_reg_read+0x73/0x8e
 [<ffffffff8028c6c0>] ? vfs_read+0xaa/0x132
 [<ffffffff8028ca5c>] ? sys_read+0x45/0x6e
 [<ffffffff8020bee2>] ? tracesys+0xd5/0xda


Code: f2 ae 48 f7 d1 48 8d 44 11 ff 40 38 30 74 0a 48 ff c8 48 39 d0 73
f3 31 c0 c3 48 89 f8 eb 03 48 ff c0 48 ff ce 48 83 fe ff 74 05 <80> 38
00 75 ef 48 29 f8 c3 31 c0 eb 12 41 38 c8 74 0a 48 ff c2
RIP  [<ffffffff8031a399>] strnlen+0x11/0x1a
 RSP <ffff810107163cc0>
CR2: ffffffffa03ebabb
---[ end trace 6767d9b951178909 ]---

-Eric

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: xfs leaking?
  2008-07-11 17:04 xfs leaking? Eric Sandeen
@ 2008-07-11 23:38 ` Dave Chinner
  2008-07-12  0:35   ` Eric Sandeen
  2008-07-12  1:31   ` Eric Sandeen
  0 siblings, 2 replies; 8+ messages in thread
From: Dave Chinner @ 2008-07-11 23:38 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss

On Fri, Jul 11, 2008 at 12:04:10PM -0500, Eric Sandeen wrote:
> After my fill-the-1T-fs-with-20k-files test I tried an xfs_repair, and
> it was painfully slow compared to e2fsck of ext4 - I stopped it after
> almost 2 hours with it only half complete.
> 
> I noticed that during the run I was nearly out of memory (8G) and
> swapping badly.
> 
> So I unmounted the fs, dropped caches, and was astounded to find
> 10492540 buffer heads still in the slab caches.

Curious - that implies bufferheads aren't being freed. But if
the pages have been freed, then the bufferheads should have been,
too. A generic code or VM bug, perhaps?
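One way to eyeball that relationship (a sketch; the bh_vs_buffers helper
and the sample numbers are made up for illustration): bufferheads live in
the buffer_head slab, while the pages they describe are accounted under
Buffers in /proc/meminfo, so after a drop_caches both should fall together.

```shell
# Rough cross-check: a huge buffer_head slab count alongside a tiny
# Buffers figure would suggest bufferheads outliving their pages.
bh_vs_buffers() {
    awk '$1 == "buffer_head" { print "buffer_head objs:", $2 }' "$1"
    awk '$1 == "Buffers:"    { print "Buffers (kB):", $2 }' "$2"
}

# Live: bh_vs_buffers /proc/slabinfo /proc/meminfo
# Demonstration with sample files:
printf 'buffer_head 10492540 10492560 104 37 1\n' > /tmp/slabinfo.sample
printf 'Buffers:  1204 kB\n' > /tmp/meminfo.sample
bh_vs_buffers /tmp/slabinfo.sample /tmp/meminfo.sample
```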

> This was all on 2.6.26-rc2 (I need to update) with lazy-count=1, a 1T fs,
> 32 AGs, mounted with inode64, nobarrier, and maximal logbuf count & size.
> 
> Rebooted, let the fs_mark test run just a bit, then tried removing the
> xfs module (I forgot to load the one with Dave's fix), and:
> 
> slab error in kmem_cache_destroy(): cache `xfs_inode': Can't free all objects
> Pid: 3676, comm: rmmod Not tainted 2.6.26-rc2 #3
> 
> Call Trace:
>  [<ffffffff80287e18>] kmem_cache_destroy+0x7d/0xb9
>  [<ffffffffa03e6708>] :xfs:xfs_cleanup+0x5c/0xf9
>  [<ffffffffa03e67bf>] :xfs:exit_xfs_fs+0x1a/0x28
>  [<ffffffff80250b1f>] sys_delete_module+0x186/0x1de
>  [<ffffffff8020bee2>] tracesys+0xd5/0xda

....

That's rather unhealthy. I'm running a 2.6.26-rc9-git kernel on
my UML system, and when no filesystems are mounted all the XFS
slab caches have zero objects in them, even after several runs
of xfsqa. So I don't see any obvious leak here. You did unmount
the filesystem(s) first, right?

I'd suggest updating to 2.6.26-rc9 and repeating the test.  After
unmounting all the filesystems and before you rmmod the kernel
module, dump /proc/slabinfo so we can see if there are remaining
objects in the XFS slabs....
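Something like this would capture it (a sketch; the xfs_slab_leaks helper
and the sample counts are illustrative, with field 2 of /proc/slabinfo
being active_objs on 2.6 kernels):

```shell
# List XFS slab caches that still hold active objects; run after
# unmounting everything and before `rmmod xfs`.
xfs_slab_leaks() {
    awk '/^xfs/ && $2 > 0 { printf "%-16s %d active objects\n", $1, $2 }'
}

# Live: xfs_slab_leaks < /proc/slabinfo
# Demonstration on sample data (counts are made up); only the cache
# with a nonzero active count is reported:
printf '%s\n' \
    'xfs_inode 31337 32000 576 7 1' \
    'xfs_buf 0 0 384 10 1' |
    xfs_slab_leaks
```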

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: xfs leaking?
  2008-07-11 23:38 ` Dave Chinner
@ 2008-07-12  0:35   ` Eric Sandeen
  2008-07-12  2:02     ` Eric Sandeen
  2008-07-12  1:31   ` Eric Sandeen
  1 sibling, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2008-07-12  0:35 UTC (permalink / raw)
  To: Eric Sandeen, xfs-oss

Dave Chinner wrote:

> I'd suggest updating to 2.6.26-rc9 and repeating the test.  After
> unmounting all the filesystems and before you rmmod the kernel
> module, dump /proc/slabinfo so we can see if there are remaining
> objects in the XFS slabs....

Yep, I fired up 2.6.26-rc9 this morning after I sent the mail.  And I
wish I'd checked the slabs before the explosion last time ... the fs will
be full again in a while and I'll try again.

The complete lack of memory may well explain the horrendous repair
performance, too.  :)

-Eric

> Cheers,
> 
> Dave.


* Re: xfs leaking?
  2008-07-11 23:38 ` Dave Chinner
  2008-07-12  0:35   ` Eric Sandeen
@ 2008-07-12  1:31   ` Eric Sandeen
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2008-07-12  1:31 UTC (permalink / raw)
  To: Eric Sandeen, xfs-oss

Dave Chinner wrote:
> On Fri, Jul 11, 2008 at 12:04:10PM -0500, Eric Sandeen wrote:
>> After my fill-the-1T-fs-with-20k-files test I tried an xfs_repair, and
>> it was painfully slow compared to e2fsck of ext4 - I stopped it after
>> almost 2 hours with it only half complete.
>>
>> I noticed that during the run I was nearly out of memory (8G) and
>> swapping badly.
>>
>> So I unmounted the fs, dropped caches, and was astounded to find
>> 10492540 buffer heads still in the slab caches.

Hm, reading that back, it sounds like I unmounted after xfs_repair.  That
didn't come out right - no, I did not repair a mounted filesystem ;)

-Eric


* Re: xfs leaking?
  2008-07-12  0:35   ` Eric Sandeen
@ 2008-07-12  2:02     ` Eric Sandeen
  2008-07-14  4:07       ` Mark Goodwin
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Sandeen @ 2008-07-12  2:02 UTC (permalink / raw)
  To: xfs-oss

Eric Sandeen wrote:
> Dave Chinner wrote:
> 
>> I'd suggest updating to 2.6.26-rc9 and repeating the test.  After
>> unmounting all the filesystems and before you rmmod the kernel
>> module, dump /proc/slabinfo so we can see if there are remaining
>> objects in the XFS slabs....
> 
> Yep, I fired up 2.6.26-rc9 this morning after I sent the mail.  And I
> wish I'd checked the slabs before the explosion last time ... the fs will
> be full again in a while and I'll try again.

2.6.26-rc9 passed without incident.

> The complete lack of memory may well explain the horrendous repair
> performance, too.  :)

About 12 minutes - not bad.

(Though still twice that of e2fsck/ext4 for this test!)  :)

-Eric

> -Eric
> 
>> Cheers,
>>
>> Dave.
> 
> 


* Re: xfs leaking?
  2008-07-12  2:02     ` Eric Sandeen
@ 2008-07-14  4:07       ` Mark Goodwin
  2008-07-14  4:13         ` Christoph Hellwig
  2008-07-14  4:15         ` Eric Sandeen
  0 siblings, 2 replies; 8+ messages in thread
From: Mark Goodwin @ 2008-07-14  4:07 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs-oss



Eric Sandeen wrote:
> 
> 2.6.26-rc9 passed without incident.

So what is the conclusion here? Just that you were on a down-rev kernel?
Or do we have an intermittent leak of some kind?

Cheers


-- 

  Mark Goodwin                                  markgw@sgi.com
  Engineering Manager for XFS and PCP    Phone: +61-3-99631937
  SGI Australian Software Group           Cell: +61-4-18969583
-------------------------------------------------------------


* Re: xfs leaking?
  2008-07-14  4:07       ` Mark Goodwin
@ 2008-07-14  4:13         ` Christoph Hellwig
  2008-07-14  4:15         ` Eric Sandeen
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2008-07-14  4:13 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: Eric Sandeen, xfs-oss

On Mon, Jul 14, 2008 at 02:07:30PM +1000, Mark Goodwin wrote:
>
>
> Eric Sandeen wrote:
>>
>> 2.6.26-rc9 passed without incident.
>
> So what is the conclusion here? Just that you were on a down-rev kernel?
> Or do we have an intermittent leak of some kind?

The symptoms in the first post look like an inode leak, which is more
likely to be in common code than in XFS.  I'd expect -rc9 just fixed it,
or that Eric didn't manage to hit it as easily.


* Re: xfs leaking?
  2008-07-14  4:07       ` Mark Goodwin
  2008-07-14  4:13         ` Christoph Hellwig
@ 2008-07-14  4:15         ` Eric Sandeen
  1 sibling, 0 replies; 8+ messages in thread
From: Eric Sandeen @ 2008-07-14  4:15 UTC (permalink / raw)
  To: markgw; +Cc: xfs-oss

Mark Goodwin wrote:
> 
> Eric Sandeen wrote:
>> 2.6.26-rc9 passed without incident.
> 
> So what is the conclusion here? Just that you were on a down-rev kernel?
> Or do we have an intermittent leak of some kind?

Haven't yet re-tested on -rc2, so I'm not 100% sure yet.  So far I'd be
willing to chalk it up to something wrong in -rc2.

-Eric


end of thread, other threads:[~2008-07-14  4:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-07-11 17:04 xfs leaking? Eric Sandeen
2008-07-11 23:38 ` Dave Chinner
2008-07-12  0:35   ` Eric Sandeen
2008-07-12  2:02     ` Eric Sandeen
2008-07-14  4:07       ` Mark Goodwin
2008-07-14  4:13         ` Christoph Hellwig
2008-07-14  4:15         ` Eric Sandeen
2008-07-12  1:31   ` Eric Sandeen
