xfstests testcase 111: Infinite xfs_bulkstat bad-inode loop case from Roger Willcocks

public inbox for linux-xfs@vger.kernel.org

* xfstests testcase 111: Infinite xfs_bulkstat bad-inode loop case from Roger Willcocks
From: Christoph Hellwig @ 2008-12-22 16:58 UTC (permalink / raw)
  To: Roger Willcocks; +Cc: xfs

Hi Roger,

I believe the xfstests case 111 is based on a report by you.  Do you
remember what was going on there?  From a look at the testcase it
overwrites an inode cluster and then tries to bulkstat them.  This works
fine with a non-debug kernel, but due to debug kernels panicking it fails
there.

Do you remember what the testcase was looking for?  I suspect we should
just not run it for debug kernels, but I'd like to know more about it
so we can add comments describing it.

Cheers,
	Christoph

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: xfstests testcase 111: Infinite xfs_bulkstat bad-inode loop case from Roger Willcocks
From: Roger Willcocks @ 2008-12-22 20:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

> Hi Roger,
>
> I believe the xfstests case 111 is based on a report by you.  Do you
> remember what was going on there?  From a look at the testcase it
> overwrites an inode cluster and then tries to bulkstat them.  This works
> fine with a non-debug kernel, but due to debug kernels panicking it fails
> there.
>
> Do you remember what the testcase was looking for?  I suspect we should
> just not run it for debug kernels, but I'd like to know more about it
> so we can add comments describing it.
>
> Cheers,
> Christoph
>

Hi Christoph,

here are the relevant extracts from our in-house bugzilla (bug 3675). Since 
the problem only occurs when the disk is corrupted, I don't see any problem 
with skipping the test on debug kernels.

** 2006-02-01

xfs_fsr can get into a state where one processor spends 100% of its time
looping in the kernel. The application can't be killed. 'top' shows it using
50% CPU (i.e. all of one of the two processors).

oprofile reveals that one processor spends about 2/3 of its time in xfs.ko. It
looks like the offending syscall is xfs_bulkstat.

** 2006-02-03

Looks like xfs_itobp (map inode number to disk buffer) detects a corrupted
inode (bad magic number). That causes a break out of a loop in xfs_bulkstat,
skipping setting the terminating condition of a containing loop.
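
The control-flow problem described above can be modelled in a few lines of
userspace C. This is only an illustrative sketch, not the kernel code: the
names walk_chunks, DEMO_INODE_MAGIC and MAX_PASSES are made up for this
example, and the pass cap exists only so the demonstration cannot hang the way
the real syscall did. The outer loop ends once the chunk cursor has advanced
past the last chunk; the cursor is only advanced on the normal path through
the inner loop, so a break on a bad magic number leaves it where it was.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INODE_MAGIC 0x494e   /* the bytes "IN"; name is hypothetical */
#define MAX_PASSES       1000     /* safety cap, for this demo only */

/*
 * Walk nchunks groups of per_chunk inode magic numbers.
 * Returns the number of outer-loop passes taken, or -1 if the cap was
 * hit (standing in for "loops forever").
 * skip_bad_chunk=false models the reported bug; true models one possible
 * remedy (advance past the bad chunk before breaking out).
 */
int walk_chunks(const uint16_t *magic, int nchunks, int per_chunk,
                bool skip_bad_chunk)
{
        int chunk = 0;
        int passes = 0;

        while (chunk < nchunks) {
                if (++passes > MAX_PASSES)
                        return -1;
                for (int i = 0; i < per_chunk; i++) {
                        if (magic[chunk * per_chunk + i] != DEMO_INODE_MAGIC) {
                                /* Bad inode: leave the inner loop early. */
                                if (skip_bad_chunk)
                                        chunk++;
                                break;
                        }
                        /* Progress is only recorded after the last inode. */
                        if (i == per_chunk - 1)
                                chunk++;
                }
        }
        return passes;
}
```

With two chunks of two inodes each and one magic number overwritten with "XX"
(0x5858), the buggy variant hits the pass cap, while the variant that advances
the cursor before breaking finishes in two passes.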

I'll file a bug report with SGI.

** 2006-02-03

SGI say 'Ayup, I think you're right'-

http://marc.theaimsgroup.com/?t=113889680200006

** 2006-02-07

A bad inode magic number can cause the xfs_bulkstat syscall to get stuck
looping in the kernel.

To reproduce: (don't try this at home folks!) -

mkfs.xfs /dev/sda
mount filesystem and create 1000 or so files (I copied a handy 313-byte
file).
run this program:

---------
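#include <sys/types.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

char buffer[32768];

/* Overwrite every "IN" (the on-disk inode magic) past offset 2048 with "XX". */
void nuke(void)
{
        int i;
        for (i = 2048; i < 32768-1; i++)
                if (buffer[i] == 'I' && buffer[i+1] == 'N')
                        buffer[i] = buffer[i+1] = 'X';
}

int main(int argc, char* argv[])
{
        /* Read 32 KiB starting 32 KiB into the device, corrupt it, write it back. */
        int f = open("/dev/sda", O_RDWR);
        if (f < 0) { perror("open"); return 1; }
        if (lseek(f, 32768, SEEK_SET) < 0) { perror("lseek"); return 1; }
        if (read(f, buffer, 32768) != 32768) { perror("read"); return 1; }
        nuke();
        if (lseek(f, 32768, SEEK_SET) < 0) { perror("lseek"); return 1; }
        if (write(f, buffer, 32768) != 32768) { perror("write"); return 1; }
        close(f);
        return 0;
}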
---------

mount the disk and run xfs_fsr. It immediately gets stuck in a kernel loop.

** 2006-02-08

SGI have added a corresponding regression test to the xfs_cmds package

http://oss.sgi.com/cgi-bin/cvsweb.cgi/xfs-cmds/xfstests/111?rev=1.1

--
Roger


* Re: xfstests testcase 111: Infinite xfs_bulkstat bad-inode loop case from Roger Willcocks
From: Christoph Hellwig @ 2008-12-22 20:50 UTC (permalink / raw)
  To: Roger Willcocks; +Cc: Christoph Hellwig, xfs

On Mon, Dec 22, 2008 at 08:28:59PM -0000, Roger Willcocks wrote:
> Hi Christoph,
>
> here are the relevant extracts from our in-house bugzilla (bug 3675). 
> Since the problem only occurs when the disk is corrupted, I don't see any 
> problem with skipping the test on debug kernels.

Thanks a lot, those are some very helpful notes.  I'll put a shortened
version of this into the testcase as a comment.
