public inbox for linux-xfs@vger.kernel.org
* XFS Kernel 2.6.27.7 oopses
@ 2009-01-30 22:23 Ralf Liebenow
  2009-02-01  0:37 ` Dave Chinner
  0 siblings, 1 reply; 7+ messages in thread
From: Ralf Liebenow @ 2009-01-30 22:23 UTC (permalink / raw)
  To: xfs

Hello !

I heavily use XFS for an incremental backup server (using rsync's --link-dest
option to hardlink unchanged files), and therefore have about 10 million files
on my 1 TB hard disk. A nightly "rm -rf" removes the oldest snapshot, about a
million hardlinks/files.
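
[Editor's sketch: the hard-link mechanism behind --link-dest, shown with throwaway paths and plain ln rather than a real rsync run, purely for illustration.]

```shell
#!/bin/sh
# Minimal sketch of what rsync --link-dest does for an unchanged file:
# the new snapshot's entry is a hard link to yesterday's copy, so both
# names share one inode and the data is stored only once.
set -e
tmp=$(mktemp -d)
mkdir -p "$tmp/daily.1" "$tmp/daily.0"
echo "unchanged content" > "$tmp/daily.1/file"

# equivalent of rsync -a --link-dest=../daily.1 for an unchanged file
ln "$tmp/daily.1/file" "$tmp/daily.0/file"

# both directory entries now point at the same inode (link count 2)
echo "link count: $(stat -c %h "$tmp/daily.0/file")"
rm -rf "$tmp"
```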

After a while I had regular oopses, so I updated the system to make sure it's
on a current version.

It is now a SuSE 11.1 64Bit with SuSE's Kernel 2.6.27.7-9-default

The Server is a Quad-Core Intel 64Bit with 8 GB RAM running a 64Bit Linux.
(I have vmware server 2 installed, so those modules can be seen in the kmesg,
but the OOPs happens also without them).

Now the "rm -rf" job sometimes oopses the kernel and gets stuck (there is no
other measurable I/O traffic on that system). /proc/kmsg shows:

cat /proc/kmsg 
<0>general protection fault: 0000 [1] SMP 
<0>last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
<4>CPU 3 
<4>Modules linked in: snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device binfmt_misc vmnet(N) vsock(N) vmci(N) vmmon(N) nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs microcode fuse loop dm_mod snd_hda_intel st r8169 snd_pcm snd_timer osst snd_page_alloc ppdev iTCO_wdt mii shpchp button rtc_cmos snd_hwdep pci_hotplug parport_pc rtc_core sky2 ohci1394 intel_agp rtc_lib snd i2c_i801 iTCO_vendor_support ieee1394 parport pcspkr i2c_core sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata dock aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon
<4>Supported: No
<4>Pid: 5176, comm: xfssyncd Tainted: G          2.6.27.7-9-default #1
<4>RIP: 0010:[<ffffffff80230865>]  [<ffffffff80230865>] __wake_up_common+0x29/0x76
<4>RSP: 0018:ffff880114df9d30  EFLAGS: 00010086
<4>RAX: 7fff8800255b8a70 RBX: ffff8800255b8a60 RCX: 0000000000000000
<4>RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8800255b8a68
<4>RBP: ffff880114df9d60 R08: 7fff8800255b8a58 R09: 0000000000000282
<4>R10: 0000000000000002 R11: ffff8800255b87c0 R12: 0000000000000001
<4>R13: 0000000000000282 R14: ffff8800255b8a70 R15: 0000000000000000
<4>FS:  0000000000000000(0000) GS:ffff88012fba0ec0(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
<4>CR2: 00007f28d42a2000 CR3: 0000000124e34000 CR4: 00000000000006e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>Process xfssyncd (pid: 5176, threadinfo ffff880114df8000, task ffff88012bc1e0c0)
<4>Stack:  0000000300000000 ffff8800255b8a60 ffff8800255b8a68 0000000000000282
<4> ffff88012d802000 0000000000000001 ffff880114df9d90 ffffffff8023219a
<4> 0000000000000286 0000000000000000 ffff88006ef1d240 ffff88012aca3800
<4>Call Trace:
<4> [<ffffffff8023219a>] complete+0x38/0x4b
<4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
<4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
<4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
<4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
<4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
<4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
<4> [<ffffffff8025434d>] kthread+0x47/0x73
<4> [<ffffffff8020d7b9>] child_rip+0xa/0x11
<4>
<4>
<0>Code: c9 c3 55 48 89 e5 41 57 4d 89 c7 41 56 4c 8d 77 08 41 55 41 54 41 89 d4 53 48 83 ec 08 89 75 d4 89 4d d0 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 28 4c 89 f9 8b 55 d0 8b 75
<1>RIP  [<ffffffff80230865>] __wake_up_common+0x29/0x76
<4> RSP <ffff880114df9d30>
<4>---[ end trace a069bd11f2b4e6ab ]---

It _always_ gets stuck at the same place, in "complete" called from xfssyncd,
so I don't think it's hardware-related.

I also ran xfs_repair after every oops and reboot, so the filesystem itself
should be consistent.

I initially used the default settings for mkfs.xfs and mount. I now use
different settings but get the same oops again, so it seems to be unrelated.

What do you recommend? Has this bug already been addressed within the
hundreds of fixes I've seen on the mailing list? Shall I try a stock 2.6.28
kernel?

   Thanks in advance !

      Ralf
-- 
theCode AG 
HRB 78053, Amtsgericht Charlottenbg
USt-IdNr.: DE204114808
Vorstand: Ralf Liebenow, Michael Oesterreich, Peter Witzel
Aufsichtsratsvorsitzender: Wolf von Jaduczynski
Oranienstr. 10-11, 10997 Berlin
fon +49 30 617 897-0  fax -10
ralf@theCo.de http://www.theCo.de

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: XFS Kernel 2.6.27.7 oopses
  2009-01-30 22:23 XFS Kernel 2.6.27.7 oopses Ralf Liebenow
@ 2009-02-01  0:37 ` Dave Chinner
  2009-02-05  5:38   ` Ralf Liebenow
  0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2009-02-01  0:37 UTC (permalink / raw)
  To: Ralf Liebenow; +Cc: xfs

On Fri, Jan 30, 2009 at 11:23:59PM +0100, Ralf Liebenow wrote:
> Hello !
> 
> I heavily use XFS for an incremental backup server (using rsync's --link-dest
> option to hardlink unchanged files), and therefore have about 10 million files
> on my 1 TB hard disk. A nightly "rm -rf" removes the oldest snapshot, about a
> million hardlinks/files.
> 
> After a while I had regular oopses, so I updated the system to make sure it's
> on a current version.
> 
> It is now a SuSE 11.1 64Bit with SuSE's Kernel 2.6.27.7-9-default

What kernel did you originally see this problem on?

> <4>Call Trace:
> <4> [<ffffffff8023219a>] complete+0x38/0x4b
> <4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
> <4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
> <4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
> <4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
> <4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
> <4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
> <4> [<ffffffff8025434d>] kthread+0x47/0x73
> <4> [<ffffffff8020d7b9>] child_rip+0xa/0x11

That may be a use-after-free. I know Lachlan fixed a few in this
area, but I'm not sure what release those fixes ended up in....

> What do you recommend? Has this bug already been addressed within the
> hundreds of fixes I've seen on the mailing list? Shall I try a stock 2.6.28
> kernel?

Try the latest 2.6.28.x stable kernel (*not* the plain 2.6.28 release,
as there's a directory traversal bug that is fixed in 2.6.28.1) and
see if the problem persists.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: XFS Kernel 2.6.27.7 oopses
  2009-02-01  0:37 ` Dave Chinner
@ 2009-02-05  5:38   ` Ralf Liebenow
  2009-02-10  9:50     ` Dave Chinner
  2009-02-10  9:56     ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Ralf Liebenow @ 2009-02-05  5:38 UTC (permalink / raw)
  To: xfs

Hello !

Finally I found the time to compile and test the latest stable 2.6.28.3 kernel,
but I can still reproduce it:

Feb  5 03:00:19 up kernel: general protection fault: 0000 [#1] SMP
Feb  5 03:00:19 up kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Feb  5 03:00:19 up kernel: CPU 2
Feb  5 03:00:19 up kernel: Modules linked in: vmnet parport_pc vsock vmci vmmon nfsd lockd nfs_acl auth_rpcgss snd_pcm_oss sunrpc snd_mixer_oss exportfs snd_seq snd_seq_device binfmt_misc microcode fuse loop dm_mod snd_hda_intel osst st snd_pcm snd_timer snd_page_alloc ppdev shpchp rtc_cmos i2c_i801 rtc_core button snd_hwdep r8169 rtc_lib pcspkr ohci1394 intel_agp mii i2c_core parport sky2 pci_hotplug iTCO_wdt ieee1394 iTCO_vendor_support snd sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon [last unloaded: vmnet]
Feb  5 03:00:19 up kernel: Pid: 1462, comm: xfssyncd Not tainted 2.6.28.3-9-default #1
Feb  5 03:00:19 up kernel: RIP: 0010:[<ffffffff802327a1>]  [<ffffffff802327a1>] __wake_up_common+0x29/0x76
Feb  5 03:00:19 up kernel: RSP: 0018:ffff88012e56fcf0  EFLAGS: 00010086
Feb  5 03:00:19 up kernel: RAX: 7fff8800255b8a70 RBX: ffff8800255b8a60 RCX: 0000000000000000
Feb  5 03:00:19 up kernel: RDX: 0000000000000001 RSI: 0000000000000003 RDI: ffff8800255b8a68
Feb  5 03:00:19 up kernel: RBP: ffff88012e56fd20 R08: 7fff8800255b8a58 R09: ffff880129d02e18
Feb  5 03:00:19 up kernel: R10: 0000000000000002 R11: 0000000300000000 R12: 0000000000000001
Feb  5 03:00:19 up kernel: R13: 0000000000000286 R14: ffff8800255b8a70 R15: 0000000000000000
Feb  5 03:00:19 up kernel: FS:  0000000000000000(0000) GS:ffff88012fb2e8c0(0000) knlGS:0000000000000000
Feb  5 03:00:19 up kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb  5 03:00:19 up kernel: CR2: 00007f075ee9ab00 CR3: 0000000000201000 CR4: 00000000000006e0
Feb  5 03:00:19 up kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb  5 03:00:19 up kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb  5 03:00:19 up kernel: Process xfssyncd (pid: 1462, threadinfo ffff88012e56e000, task ffff88012c842640)
Feb  5 03:00:19 up kernel: Stack:
Feb  5 03:00:19 up kernel:  0000000300000000 ffff8800255b8a60 ffff8800255b8a68 0000000000000286
Feb  5 03:00:19 up kernel:  ffff88012b922000 ffff88012a1eb000 ffff88012e56fd50 ffffffff8023410a
Feb  5 03:00:19 up kernel:  ffff8800255b87c0 0000000000000000 ffff8800255b8980 ffff88004dc64140
Feb  5 03:00:19 up kernel: Call Trace:
Feb  5 03:00:20 up kernel:  [<ffffffff8023410a>] complete+0x38/0x4c
Feb  5 03:00:20 up kernel:  [<ffffffffa01a2424>] xfs_iflush+0x7a/0x2b2 [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffff802241cc>] ? default_spin_lock_flags+0x17/0x1b
Feb  5 03:00:20 up kernel:  [<ffffffffa01b7cf9>] xfs_finish_reclaim+0x136/0x175 [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffffa01b7dd0>] xfs_finish_reclaim_all+0x98/0xd4 [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffffa01b694c>] xfs_syncsub+0x55/0x22f [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffffa01b6b68>] xfs_sync+0x42/0x47 [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffffa01c55fd>] xfs_sync_worker+0x1f/0x41 [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffffa01c558f>] xfssyncd+0x15d/0x1ac [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffffa01c5432>] ? xfssyncd+0x0/0x1ac [xfs]
Feb  5 03:00:20 up kernel:  [<ffffffff802563e5>] kthread+0x49/0x76
Feb  5 03:00:20 up kernel:  [<ffffffff8020d659>] child_rip+0xa/0x11
Feb  5 03:00:20 up kernel:  [<ffffffff8025639c>] ? kthread+0x0/0x76
Feb  5 03:00:20 up kernel:  [<ffffffff8020d64f>] ? child_rip+0x0/0x11
Feb  5 03:00:20 up kernel: Code: c9 c3 55 48 89 e5 41 57 4d 89 c7 41 56 4c 8d 77 08 41 55 41 54 41 89 d4 53 48 83 ec 08 89 75 d4 89 4d d0 48 8b 47 08 4c 8d 40 e8 <49> 8b 40 18 48 8d 58 e8 eb 2d 45 8b 28 4c 89 f9 8b 55 d0 8b 75
Feb  5 03:00:20 up kernel: RIP  [<ffffffff802327a1>] __wake_up_common+0x29/0x76
Feb  5 03:00:20 up kernel:  RSP <ffff88012e56fcf0>
Feb  5 03:00:20 up kernel: ---[ end trace a0fbe14899a3ce1c ]---

So it's not SuSE's fault, and it happens on the latest stable kernel from kernel.org ....

Hmmm ... can I do something to help you find the problem? I can
reproduce it by creating some million hardlinks to files and then removing
some million hardlinks with one "rm -rf".

The filesystem is 1 TB.

Settings:
meta-data=/dev/sdd1              isize=256    agcount=32, agsize=7630937 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=244189984, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=32768, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=65536  blocks=0, rtextents=0

[I originally had log version=1 but with the same problem. The problem occurs
 with barriers=on and with barriers=off ]

I have not tried running the system with one CPU core yet; that may be
something I can check tomorrow ...

   Thanks for your help
      Ralf


> On Fri, Jan 30, 2009 at 11:23:59PM +0100, Ralf Liebenow wrote:
> > Hello !
> > 
> > I heavily use XFS for an incremental backup server (using rsync's --link-dest
> > option to hardlink unchanged files), and therefore have about 10 million files
> > on my 1 TB hard disk. A nightly "rm -rf" removes the oldest snapshot, about a
> > million hardlinks/files.
> > 
> > After a while I had regular oopses, so I updated the system to make sure it's
> > on a current version.
> > 
> > It is now a SuSE 11.1 64Bit with SuSE's Kernel 2.6.27.7-9-default
> 
> What kernel did you originally see this problem on?
> 
> > <4>Call Trace:
> > <4> [<ffffffff8023219a>] complete+0x38/0x4b
> > <4> [<ffffffffa00f5316>] xfs_iflush+0x73/0x2ab [xfs]
> > <4> [<ffffffffa010a7a2>] xfs_finish_reclaim+0x12a/0x168 [xfs]
> > <4> [<ffffffffa010a871>] xfs_finish_reclaim_all+0x91/0xcb [xfs]
> > <4> [<ffffffffa010925c>] xfs_syncsub+0x50/0x22b [xfs]
> > <4> [<ffffffffa0118a3a>] xfs_sync_worker+0x17/0x36 [xfs]
> > <4> [<ffffffffa01189d4>] xfssyncd+0x15d/0x1ac [xfs]
> > <4> [<ffffffff8025434d>] kthread+0x47/0x73
> > <4> [<ffffffff8020d7b9>] child_rip+0xa/0x11
> 
> That may be a use-after-free. I know Lachlan fixed a few in this
> area, but I'm not sure what release those fixes ended up in....
> 
> > What do you recommend? Has this bug already been addressed within the
> > hundreds of fixes I've seen on the mailing list? Shall I try a stock 2.6.28
> > kernel?
> 
> Try the latest 2.6.28.x stable kernel (*not* the plain 2.6.28 release,
> as there's a directory traversal bug that is fixed in 2.6.28.1) and
> see if the problem persists.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 




* Re: XFS Kernel 2.6.27.7 oopses
  2009-02-05  5:38   ` Ralf Liebenow
@ 2009-02-10  9:50     ` Dave Chinner
  2009-02-10 19:41       ` Ralf Liebenow
  2009-02-17 12:33       ` Ralf Liebenow
  2009-02-10  9:56     ` Christoph Hellwig
  1 sibling, 2 replies; 7+ messages in thread
From: Dave Chinner @ 2009-02-10  9:50 UTC (permalink / raw)
  To: Ralf Liebenow; +Cc: xfs

On Thu, Feb 05, 2009 at 06:38:47AM +0100, Ralf Liebenow wrote:
> Hello !
> 
> Finally I found the time to compile and test the latest stable 2.6.28.3 kernel
> but I can reproduce it:

OK.

....

> Hmmm ... can I do something to help you find the problem? I can
> reproduce it by creating some million hardlinks to files and then removing
> some million hardlinks with one "rm -rf".

Interesting. Sounds like a race between writing back the inode and
it being freed. How long does it take to reproduce the problem?
Do you have a script that you could share?

Next question - what is the setting of ikeep/noikeep in your mount
options? If you dump /proc/self/mounts on 2.6.28 it will tell us
if inode clusters are being deleted or not....
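
[Editor's sketch of the check Dave suggests; the remount example and mount point are hypothetical.]

```shell
#!/bin/sh
# Dump the effective options for mounted XFS filesystems; on 2.6.28 the
# ikeep/noikeep state shows up in this list (noikeep means emptied inode
# clusters are freed back to the filesystem). Prints a fallback line if
# no XFS filesystem is mounted on this machine.
grep ' xfs ' /proc/self/mounts || echo "no xfs filesystems mounted"

# To flip the option for a test run, something like (mount point assumed):
#   mount -o remount,ikeep /backup
```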

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: XFS Kernel 2.6.27.7 oopses
  2009-02-05  5:38   ` Ralf Liebenow
  2009-02-10  9:50     ` Dave Chinner
@ 2009-02-10  9:56     ` Christoph Hellwig
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2009-02-10  9:56 UTC (permalink / raw)
  To: Ralf Liebenow; +Cc: xfs

On Thu, Feb 05, 2009 at 06:38:47AM +0100, Ralf Liebenow wrote:
> Hmmm ... can I do something to help you find the problem ? I can
> reproduce it by creating some millon of hardlinks to files and then remove some
> million hardlinks with one "rm -rf"

Can you isolate that testcase into a simple shell script?



* Re: XFS Kernel 2.6.27.7 oopses
  2009-02-10  9:50     ` Dave Chinner
@ 2009-02-10 19:41       ` Ralf Liebenow
  2009-02-17 12:33       ` Ralf Liebenow
  1 sibling, 0 replies; 7+ messages in thread
From: Ralf Liebenow @ 2009-02-10 19:41 UTC (permalink / raw)
  To: xfs

Hello !

Here are my mount settings:

cat  /proc/self/mounts

..
/dev/sdc1 /backup xfs rw,nobarrier,logbufs=8,logbsize=256k,noquota 0 0

This is my current setting, but it also happened before I changed the
settings. Before, I had this:

/dev/sdc1 /backup xfs rw,noquota 0 0

The problem occurred independently of the settings I changed.

Shall I try setting ikeep/noikeep (what's the default for that)?

At the moment I have no time to create a minimal script to reproduce it,
but essentially I do the following:
  - I have a tree with about 2 million files in it, called daily.1
  - I create a new tree daily.0 with rsync --link-dest=daily.1,
    so that most of those million files (the unchanged ones)
    just get hardlinked to the ones in daily.1 and only the
    changed ones are created anew.
  - Every day daily.1 gets renamed to daily.2 and daily.0 gets
    renamed to daily.1 (currently I rotate up to daily.14).
    The oldest daily.X folder gets removed with "rm -rf", which
    is where the oops happens, sometimes (not every time, but
    often enough to reproduce).
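
[Editor's sketch: a scaled-down, hypothetical version of the cycle above. Real runs used ~2 million files and rsync --link-dest; the paths, file counts, and use of cp -al here are illustrative assumptions only.]

```shell
#!/bin/sh
# Hypothetical scaled-down reproducer sketch of the nightly rotation.
set -e

BASE=${BASE:-$(mktemp -d)/repro}   # point this at the XFS volume to test
NFILES=${NFILES:-200}              # real runs used ~2 million
mkdir -p "$BASE/daily.1"

# seed a tree of small files
i=0
while [ "$i" -lt "$NFILES" ]; do
    echo "$i" > "$BASE/daily.1/f$i"
    i=$((i + 1))
done

# "snapshot" by hardlinking everything, as rsync --link-dest does for
# unchanged files (cp -al copies the tree as hard links)
cp -al "$BASE/daily.1" "$BASE/daily.0"

# rotate and delete the oldest tree -- the step where the oops occurred
mv "$BASE/daily.1" "$BASE/daily.2"
mv "$BASE/daily.0" "$BASE/daily.1"
rm -rf "$BASE/daily.2"

echo "daily.1 now holds $(ls "$BASE/daily.1" | wc -l) files"
```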

So the setting is: I have about 2 million files, most of them
multiply hardlinked, so I have more than 20 million inodes
on this system. Every night about 2 million of those inodes
get removed, most of them pointing to files which have other
hardlinks and are therefore not really removed.

> How long does it take to reproduce the problem?

On my system I just need to run a new rsync and remove some million
files/hardlinks, but it takes some hours until it happens.
Sometimes it even runs through successfully ...

As I said before, xfs_check/xfs_repair does not detect any
inconsistencies after the problem has happened. (But the rm process
hangs and the filesystem cannot be unmounted any more.)

I need to see whether the problem lies with the massive hardlinking,
or whether it can be reproduced just by creating 20 million files
and removing them in one sweep ... I will check when I have
the time.

  Greets
    Ralf

> On Thu, Feb 05, 2009 at 06:38:47AM +0100, Ralf Liebenow wrote:
> > Hello !
> > 
> > Finally I found the time to compile and test the latest stable 2.6.28.3 kernel
> > but I can reproduce it:
> 
> OK.
> 
> .....
> 
> > Hmmm ... can I do something to help you find the problem? I can
> > reproduce it by creating some million hardlinks to files and then removing
> > some million hardlinks with one "rm -rf".
> 
> Interesting. Sounds like a race between writing back the inode and
> it being freed. How long does it take to reproduce the problem?
> Do you have a script that you could share?
> 
> Next question - what is the setting of ikeep/noikeep in your mount
> options? If you dump /proc/self/mounts on 2.6.28 it will tell us
> if inode clusters are being deleted or not....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 




* Re: XFS Kernel 2.6.27.7 oopses
  2009-02-10  9:50     ` Dave Chinner
  2009-02-10 19:41       ` Ralf Liebenow
@ 2009-02-17 12:33       ` Ralf Liebenow
  1 sibling, 0 replies; 7+ messages in thread
From: Ralf Liebenow @ 2009-02-17 12:33 UTC (permalink / raw)
  To: xfs

Hello !

More testing reveals the same problem with a different oops.
I did the remove again, and that time it worked without an oops, but the
oops happened shortly afterwards, when the machine needed to swap/reorganize
memory and kswapd tried to clean up/reclaim inode space.

It looks like there are invalid (nulled) inodes on a (freed?) inode
list, which generates oopses whenever a process tries to clean up/reclaim them.

Is there a debugging/compile-time option I can use to check that an
inode pointer is valid and usable?

  Thanks !!

     Ralf

Feb 17 12:13:53 up kernel: general protection fault: 0000 [#1] SMP
Feb 17 12:13:53 up kernel: last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
Feb 17 12:13:53 up kernel: CPU 1
Feb 17 12:13:53 up kernel: Modules linked in: vmnet vsock vmci vmmon snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device binfmt_misc nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs microcode fuse loop dm_mod snd_hda_intel osst snd_pcm st snd_timer rtc_cmos ppdev snd_page_alloc shpchp r8169 snd_hwdep parport_pc rtc_core i2c_i801 ohci1394 iTCO_wdt snd parport mii intel_agp button rtc_lib ieee1394 pcspkr pci_hotplug iTCO_vendor_support i2c_core sky2 sg soundcore raid456 async_xor async_memcpy async_tx xor raid0 sd_mod crc_t10dif ehci_hcd uhci_hcd usbcore edd raid1 xfs fan ahci libata aic79xx scsi_transport_spi scsi_mod thermal processor thermal_sys hwmon
Feb 17 12:13:53 up kernel: Pid: 38, comm: kswapd0 Not tainted 2.6.28.3-9-default #1
Feb 17 12:13:53 up kernel: RIP: 0010:[<ffffffffa01a1cf3>]  [<ffffffffa01a1cf3>] xfs_idestroy_fork+0x1f/0xca [xfs]
Feb 17 12:13:53 up kernel: RSP: 0018:ffff88012bb05bd0  EFLAGS: 00010202
Feb 17 12:13:53 up kernel: RAX: ffff8800813dcb80 RBX: 1000000000000000 RCX: ffff8800813dcb00
Feb 17 12:13:53 up kernel: RDX: ffff8800813dcb80 RSI: 0000000000000001 RDI: ffff8800813dcb00
Feb 17 12:13:53 up kernel: RBP: ffff88012bb05bf0 R08: ffff88012bb05d1b R09: a55a5a5a5a5a5a5a
Feb 17 12:13:53 up kernel: R10: ffa5a5a5a5a5a5a5 R11: 0000000300000000 R12: ffff8800813dcb00
Feb 17 12:13:53 up kernel: R13: 0000000000000001 R14: ffff88012bb05d1b R15: ffff88012dc81000
Feb 17 12:13:53 up kernel: FS:  0000000000000000(0000) GS:ffff88012fac22c0(0000) knlGS:0000000000000000
Feb 17 12:13:53 up kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Feb 17 12:13:53 up kernel: CR2: 00007f9e560ef000 CR3: 00000000993b2000 CR4: 00000000000006e0
Feb 17 12:13:53 up kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Feb 17 12:13:53 up kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Feb 17 12:13:53 up kernel: Process kswapd0 (pid: 38, threadinfo ffff88012bb04000, task ffff88012bb02180)
Feb 17 12:13:53 up kernel: Stack:
Feb 17 12:13:53 up kernel:  ffff8800813dcb00 ffff8800813dcb00 ffff8800813dcb00 ffff88009fa38240
Feb 17 12:13:53 up kernel:  ffff88012bb05c20 ffffffffa01a1dec ffff8800813dcb00 ffff8800813dcb00
Feb 17 12:13:53 up kernel:  ffff88009fa38240 ffff88012bb05d1b ffff88012bb05c40 ffffffffa019f4c6
Feb 17 12:13:53 up kernel: Call Trace:
Feb 17 12:13:53 up kernel:  [<ffffffffa01a1dec>] xfs_idestroy+0x4e/0xbc [xfs]
Feb 17 12:13:53 up kernel:  [<ffffffffa019f4c6>] xfs_ireclaim+0x83/0x87 [xfs]
Feb 17 12:13:53 up kernel:  [<ffffffffa01b7d5e>] xfs_finish_reclaim+0x167/0x175 [xfs]
Feb 17 12:13:53 up kernel:  [<ffffffffa01b7eb6>] xfs_reclaim+0x76/0x10e [xfs]
Feb 17 12:13:53 up kernel:  [<ffffffffa01c41db>] xfs_fs_clear_inode+0xf1/0x115 [xfs]
Feb 17 12:13:53 up kernel:  [<ffffffff802d225f>] clear_inode+0x79/0xd2
Feb 17 12:13:53 up kernel:  [<ffffffff802d236f>] dispose_list+0x68/0x138
Feb 17 12:13:53 up kernel:  [<ffffffff802d264a>] shrink_icache_memory+0x20b/0x241
Feb 17 12:13:53 up kernel:  [<ffffffff802961eb>] shrink_slab+0xe3/0x158
Feb 17 12:13:53 up kernel:  [<ffffffff802969b2>] kswapd+0x4b2/0x63d
Feb 17 12:13:53 up kernel:  [<ffffffff80294011>] ? isolate_pages_global+0x0/0x22d
Feb 17 12:13:53 up kernel:  [<ffffffff80256758>] ? autoremove_wake_function+0x0/0x38
Feb 17 12:13:53 up kernel:  [<ffffffff80296500>] ? kswapd+0x0/0x63d
Feb 17 12:13:53 up kernel:  [<ffffffff802563e5>] kthread+0x49/0x76
Feb 17 12:13:53 up kernel:  [<ffffffff8020d659>] child_rip+0xa/0x11
Feb 17 12:13:53 up kernel:  [<ffffffff8025639c>] ? kthread+0x0/0x76
Feb 17 12:13:53 up kernel:  [<ffffffff8020d64f>] ? child_rip+0x0/0x11
Feb 17 12:13:53 up kernel: Code: be 03 00 00 00 e8 9a 24 09 e0 c9 c3 55 48 89 e5 41 55 41 89 f5 41 54 49 89 fc 53 48 8d 5f 60 48 83 ec 08 85 f6 74 04 48 8b 5f 58 <48> 8b 7b 08 48 85 ff 74 0d e8 81 9e 01 00 48 c7 43 08 00 00 00
Feb 17 12:13:53 up kernel: RIP  [<ffffffffa01a1cf3>] xfs_idestroy_fork+0x1f/0xca [xfs]
Feb 17 12:13:53 up kernel:  RSP <ffff88012bb05bd0>
Feb 17 12:13:53 up kernel: ---[ end trace 564bbbd2e5103836 ]---

> On Thu, Feb 05, 2009 at 06:38:47AM +0100, Ralf Liebenow wrote:
> > Hello !
> > 
> > Finally I found the time to compile and test the latest stable 2.6.28.3 kernel
> > but I can reproduce it:
> 
> OK.
> 
> .....
> 
> > Hmmm ... can I do something to help you find the problem? I can
> > reproduce it by creating some million hardlinks to files and then removing
> > some million hardlinks with one "rm -rf".
> 
> Interesting. Sounds like a race between writing back the inode and
> it being freed. How long does it take to reproduce the problem?
> Do you have a script that you could share?
> 
> Next question - what is the setting of ikeep/noikeep in your mount
> options? If you dump /proc/self/mounts on 2.6.28 it will tell us
> if inode clusters are being deleted or not....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 



