* raid50 and 9TB volumes
@ 2007-07-16 12:42 Raz
2007-07-16 13:01 ` David Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Raz @ 2007-07-16 12:42 UTC (permalink / raw)
To: linux-xfs
Hello
I found that using xfs over raid50 (two raid5's of 8 disks each, with
raid 0 over them) crashes the file system when the file system is ~9TB.
Crashing is easy: we simply create a few hundred files, then erase them
in bulk. The same test passes on 6.4TB filesystems.
This bug happens in 2.6.22 as well as 2.6.17.7.
Thank you.
[4391322.839000] Filesystem "md3": XFS internal error
xfs_alloc_read_agf at line 2176 of file fs/xfs/xfs_alloc.c. Caller
0xc10d31ea
[4391322.863000] <c10d36e9> xfs_alloc_read_agf+0x199/0x220
<c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
[4391322.882000] <c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
<c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
[4391322.901000] <c1114294> xfs_iext_remove+0x64/0x90 <c10e1020>
xfs_bmap_add_extent_delay_real+0x12a0/0x16d0
[4391322.921000] <c10d0fce> xfs_alloc_ag_vextent+0xae/0x140
<c10d3a77> xfs_alloc_vextent+0x307/0x5b0
[4391322.939000] <c10e49fb> xfs_bmap_btalloc+0x41b/0x980 <c1114a46>
xfs_iext_bno_to_ext+0x126/0x1d0
[4391322.958000] <c1049845> get_page_from_freelist+0x75/0xa0
<c10e9355> xfs_bmapi+0x1495/0x18d0
[4391322.975000] <c1114a46> xfs_iext_bno_to_ext+0x126/0x1d0
<c10e698c> xfs_bmap_search_multi_extents+0xfc/0x110
[4391322.995000] <c1117a07> xfs_iomap_write_allocate+0x327/0x620
<f8871195> release_stripe+0x35/0x60 [raid5]
[4391323.015000] <c11164a0> xfs_iomap+0x440/0x570 <c113979b>
xfs_map_blocks+0x5b/0xa0
[4391323.031000] <c113aa3a> xfs_page_state_convert+0x46a/0x7a0
<c1044d7b> find_get_pages_tag+0x7b/0x90
[4391323.049000] <c113add9> xfs_vm_writepage+0x69/0x100 <c108df58>
mpage_writepages+0x218/0x3f0
[4391323.067000] <c113ad70> xfs_vm_writepage+0x0/0x100 <c104b614>
do_writepages+0x54/0x60
[4391323.083000] <c108be86> __sync_single_inode+0x66/0x1f0
<c108c098> __writeback_single_inode+0x88/0x1b0
[4391323.102000] <c1017fd7> find_busiest_group+0x287/0x2f0
<c108c3a7> sync_sb_inodes+0x1e7/0x300
[4391323.120000] <c104bef0> pdflush+0x0/0x50 <c108c595>
writeback_inodes+0xd5/0xf0
[4391323.135000] <c104b3ac> wb_kupdate+0xbc/0x130 <c1085cc0>
mark_mounts_for_expiry+0x0/0x180
[4391323.152000] <c104be0a> __pdflush+0xca/0x1b0 <c104bf2f> pdflush+0x3f/0x50
[4391323.166000] <c104b2f0> wb_kupdate+0x0/0x130 <c10330f7> kthread+0xb7/0xc0
[4391323.181000] <c1033040> kthread+0x0/0xc0 <c10011ed>
kernel_thread_helper+0x5/0x18
--
Raz
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: raid50 and 9TB volumes
2007-07-16 12:42 raid50 and 9TB volumes Raz
@ 2007-07-16 13:01 ` David Chinner
2007-07-16 13:57 ` Raz
[not found] ` <5d96567b0707160653m5951fac9v5a56bb4c92174d63@mail.gmail.com>
0 siblings, 2 replies; 15+ messages in thread
From: David Chinner @ 2007-07-16 13:01 UTC (permalink / raw)
To: Raz; +Cc: linux-xfs
On Mon, Jul 16, 2007 at 03:42:28PM +0300, Raz wrote:
> Hello
> I found that using xfs over raid50, ( two raid5's 8 disks each and
> raid 0 over them ) crashes the file system when the file system is ~
> 9TB. crashing is easy: we simply create few hundred of files, then
> erase them in bulk. the same test passes in 6.4TB filesystems.
> this bug happens in 2.6.22 as well as 2.6.17.7.
> thank you .
>
> 4391322.839000] Filesystem "md3": XFS internal error
> xfs_alloc_read_agf at line 2176 of file fs/xfs/xfs_alloc.c. Caller
> 0xc10d31ea
> [4391322.863000] <c10d36e9> xfs_alloc_read_agf+0x199/0x220
> <c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
Judging by the kernel addresses (<c10d36e9>) you're running
an i386 kernel, right? Which means there's probably a wrapping
issue at 8TB somewhere in the code which has caused an AGF
header to be trashed somewhere lower down in the filesystem.
What does /proc/partitions say? I.e. does the kernel see
the whole 9TB of space?
What does xfs_repair tell you about the corruption? (assuming
it doesn't OOM, which is a good chance if you really are on
i386).
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: raid50 and 9TB volumes
2007-07-16 13:01 ` David Chinner
@ 2007-07-16 13:57 ` Raz
2007-07-16 15:24 ` Eric Sandeen
[not found] ` <5d96567b0707160653m5951fac9v5a56bb4c92174d63@mail.gmail.com>
1 sibling, 1 reply; 15+ messages in thread
From: Raz @ 2007-07-16 13:57 UTC (permalink / raw)
To: linux-xfs
On 7/16/07, David Chinner <dgc@sgi.com> wrote:
> On Mon, Jul 16, 2007 at 03:42:28PM +0300, Raz wrote:
> > Hello
> > I found that using xfs over raid50, ( two raid5's 8 disks each and
> > raid 0 over them ) crashes the file system when the file system is ~
> > 9TB. crashing is easy: we simply create few hundred of files, then
> > erase them in bulk. the same test passes in 6.4TB filesystems.
> > this bug happens in 2.6.22 as well as 2.6.17.7.
> > thank you .
> >
> > 4391322.839000] Filesystem "md3": XFS internal error
> > xfs_alloc_read_agf at line 2176 of file fs/xfs/xfs_alloc.c. Caller
> > 0xc10d31ea
> > [4391322.863000] <c10d36e9> xfs_alloc_read_agf+0x199/0x220
> > <c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
>
> Judging by the kernel addresses (<c10d36e9>) you're running
> an i386 kernel right? Which means there's probably a wrapping
> issue at 8TB somewhere in the code which has caused an AGF
> header to be trashed somewhere lower down in the filesystem.
> what does /proc/partitions say? I.e. does the kernel see
> the whole 9TB of space?
>
> What does xfs_repair tell you about the corruption? (assuming
> it doesn't OOM, which is a good chance if you really are on
> i386).
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
Well, you are right. /proc/partitions says:
....
8 241 488384001 sdp1
9 1 3404964864 md1
9 2 3418684416 md2
9 3 6823647232 md3
while xfs formats md3 as 9 TB.
If I am using LBD, what is the biggest size I can use on i386?
many thanks
raz
--
Raz
* Re: raid50 and 9TB volumes
2007-07-16 13:57 ` Raz
@ 2007-07-16 15:24 ` Eric Sandeen
0 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2007-07-16 15:24 UTC (permalink / raw)
To: Raz; +Cc: linux-xfs
Raz wrote:
> Well you are right. /proc/partitions says:
> ....
> 8 241 488384001 sdp1
> 9 1 3404964864 md1
> 9 2 3418684416 md2
> 9 3 6823647232 md3
>
> while xfs formats md3 as 9 TB.
> If i am using LBD , what is the biggest size I can use on i386 ?
With LBD on, you *should* be able to get to 16TB (2^32 * 4096) in
general, assuming that everything in your IO path is clean. (The 16TB
limit is due to page cache addressing on x86).
-Eric
* Re: raid50 and 9TB volumes
[not found] ` <5d96567b0707160653m5951fac9v5a56bb4c92174d63@mail.gmail.com>
@ 2007-07-16 22:18 ` David Chinner
2007-07-16 23:56 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-07-16 22:18 UTC (permalink / raw)
To: Raz; +Cc: neilb, dgc, xfs-oss
On Mon, Jul 16, 2007 at 04:53:22PM +0300, Raz wrote:
> On 7/16/07, David Chinner <dgc@sgi.com> wrote:
> >On Mon, Jul 16, 2007 at 03:42:28PM +0300, Raz wrote:
> >> Hello
> >> I found that using xfs over raid50, ( two raid5's 8 disks each and
> >> raid 0 over them ) crashes the file system when the file system is ~
> >> 9TB. crashing is easy: we simply create few hundred of files, then
> >> erase them in bulk. the same test passes in 6.4TB filesystems.
> >> this bug happens in 2.6.22 as well as 2.6.17.7.
> >> thank you .
> >>
> >> 4391322.839000] Filesystem "md3": XFS internal error
> >> xfs_alloc_read_agf at line 2176 of file fs/xfs/xfs_alloc.c. Caller
> >> 0xc10d31ea
> >> [4391322.863000] <c10d36e9> xfs_alloc_read_agf+0x199/0x220
> >> <c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
> >
> >Judging by the kernel addresses (<c10d36e9>) you're running
> >an i386 kernel right? Which means there's probably a wrapping
> >issue at 8TB somewhere in the code which has caused an AGF
>
> what is AGF ?
An AGF is an "Allocation Group Freespace" structure that holds
the free space indexes that the allocator uses. The AGF holds
the root of the btrees used to find space, so if the AGF is
trashed, you're in big trouble. :/
> >header to be trashed somewhere lower down in the filesystem.
> >what does /proc/partitions say? I.e. does the kernel see
> >the whole 9TB of space?
> >
> >What does xfs_repair tell you about the corruption? (assuming
> >it doesn't OOM, which is a good chance if you really are on
> >i386).
>
> Well you are right. /proc/partitions says:
> ....
> 8 241 488384001 sdp1
> 9 1 3404964864 md1
> 9 2 3418684416 md2
> 9 3 6823647232 md3
>
> while xfs formats md3 as 9 TB.
> If i am using LBD , what is the biggest size I can use on i386 ?
Supposedly 16TB. 32bit x 4k page size = 16TB. Given that the size is
not being reported correctly, I'd say that this is probably not an
XFS issue. The next thing to check is how large an MD device you
can create correctly.
Neil, do you know of any problems with > 8TB md devices on i386?
Cheers,
Dave.
>
> many thanks
> raz
>
>
>
>
> --
> Raz
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: raid50 and 9TB volumes
2007-07-16 22:18 ` David Chinner
@ 2007-07-16 23:56 ` Neil Brown
2007-07-17 0:12 ` David Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2007-07-16 23:56 UTC (permalink / raw)
To: David Chinner; +Cc: Raz, xfs-oss
On Tuesday July 17, dgc@sgi.com wrote:
> On Mon, Jul 16, 2007 at 04:53:22PM +0300, Raz wrote:
> >
> > Well you are right. /proc/partitions says:
> > ....
> > 8 241 488384001 sdp1
> > 9 1 3404964864 md1
> > 9 2 3418684416 md2
> > 9 3 6823647232 md3
> >
> > while xfs formats md3 as 9 TB.
> > If i am using LBD , what is the biggest size I can use on i386 ?
>
> Supposedly 16TB. 32bit x 4k page size = 16TB. Given that the size is
> not being reported correctly, I'd say that this is probably not an
> XFS issue. The next thing to check is how large an MD device you
> can create correctly.
>
> Neil, do you know of any problems with > 8TB md devices on i386?
Should work, but the amount of testing has been limited, and bugs
have existed.
Each component of a raid5 is limited to 2^32 K by the metadata, so
that is 4TB. At 490GB, you are well under that.
There should be no problem with a 3TB raid5, providing LBD has been
selected.
raid0 over 3TB devices should also be fine. There was a bug fixed in
May this year that caused problems when md/raid0 was used over
components larger than 4TB on a 32bit host, but that shouldn't affect
you, and it does suggest that someone had success with a very large
raid0 once this bug was fixed.
If XFS is given a 6.8TB device and formats it as 9TB, then I would be
looking at mkfs.xfs(??).
NeilBrown
* Re: raid50 and 9TB volumes
2007-07-16 23:56 ` Neil Brown
@ 2007-07-17 0:12 ` David Chinner
2007-07-17 0:54 ` Neil Brown
0 siblings, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-07-17 0:12 UTC (permalink / raw)
To: Neil Brown; +Cc: David Chinner, Raz, xfs-oss
On Tue, Jul 17, 2007 at 09:56:25AM +1000, Neil Brown wrote:
> On Tuesday July 17, dgc@sgi.com wrote:
> > On Mon, Jul 16, 2007 at 04:53:22PM +0300, Raz wrote:
> > >
> > > Well you are right. /proc/partitions says:
> > > ....
> > > 8 241 488384001 sdp1
> > > 9 1 3404964864 md1
> > > 9 2 3418684416 md2
> > > 9 3 6823647232 md3
> > >
> > > while xfs formats md3 as 9 TB.
> > > If i am using LBD , what is the biggest size I can use on i386 ?
> >
> > Supposedly 16TB. 32bit x 4k page size = 16TB. Given that the size is
> > not being reported correctly, I'd say that this is probably not an
> > XFS issue. The next thing to check is how large an MD device you
> > can create correctly.
> >
> > Neil, do you know of any problems with > 8TB md devices on i386?
>
> Should work, but the amount of testing has been limited, and bugs
> have existed.
>
> Each component of a raid5 is limited to 2^32 K by the metadata, so
> that is 4TB. At 490GB, you are well under that.
> There should be no problem with a 3TB raid5, providing LBD has been
> selected.
>
> raid0 over 3TB devices should also be fine. There was a bug fixed in
> May this year that caused problems when md/raid0 was used over
> components larger than 4TB on a 32bit host, but that shouldn't affect
> you and it does suggest that someone had success with a very large
> raid0 once this bug was fixed.
>
> If XFS is given a 6.8TB devices and formats it as 9TB, then I would be
> looking at mkfs.xfs(??).
mkfs.xfs tries to read the last block of the device that it is given
and proceeds only if that read is successful. IOWs, mkfs.xfs has been
told the size of the device is 9TB, it's successfully read from offset
9TB, so the device must be at least 9TB.
However, internal to the kernel there appears to be some kind of
wrapping bug, and typically that shows up with /proc/partitions
showing an inconsistent size for the partition compared to other
utilities.
We've come across this problem repeatedly over the past few years
with exactly these symptoms (the end of the FS overwriting the front
of the FS), which was why it was the first question I asked. It
has always been a block layer or partition problem and they always
show up on i386 with filesystems larger than 2TB.
FWIW, what partitioning tool (if any) is being used here, and
what does it think the sizes of the partitions are?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: raid50 and 9TB volumes
2007-07-17 0:12 ` David Chinner
@ 2007-07-17 0:54 ` Neil Brown
2007-07-17 0:58 ` David Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Neil Brown @ 2007-07-17 0:54 UTC (permalink / raw)
To: David Chinner; +Cc: Raz, xfs-oss
On Tuesday July 17, dgc@sgi.com wrote:
> On Tue, Jul 17, 2007 at 09:56:25AM +1000, Neil Brown wrote:
> > On Tuesday July 17, dgc@sgi.com wrote:
> > > On Mon, Jul 16, 2007 at 04:53:22PM +0300, Raz wrote:
> > > >
> > > > Well you are right. /proc/partitions says:
> > > > ....
> > > > 8 241 488384001 sdp1
> > > > 9 1 3404964864 md1
> > > > 9 2 3418684416 md2
> > > > 9 3 6823647232 md3
> > > >
> > > > while xfs formats md3 as 9 TB.
..
> >
> > If XFS is given a 6.8TB devices and formats it as 9TB, then I would be
> > looking at mkfs.xfs(??).
>
> mkfs.xfs tries to read the last block of the device that it is given
> and proceeds only if that read is successful. IOWs, mkfs.xfs has been
> told the size of the device is 9TB, it's successfully read from offset
> 9TB, so the device must be at least 9TB.
Odd.
Given that the drives are 490GB, and there are 8 in a raid5 array,
the raid5 arrays are really under 3.5TB. And two of them are less than
7TB. So there definitely are not 9TB worth of bytes.
mkfs.xfs uses the BLKGETSIZE64 ioctl which returns
bdev->bi_inode->i_size, where as /proc/partitions uses get_capacity
which uses disk->capacity, so there is some room for them to return
different values... Except that on open, it calls
bd_set_size(bdev, (loff_t)get_capacity(disk)<<9);
which makes sure the two have the same value.
I cannot see where the size difference comes from.
What does
/sbin/blockdev --getsize64
report for each of the different devices, as compared to what
/proc/partitions reports?
NeilBrown
>
> However, internal to the kernel there appears to be some kind of
> wrapping bug, and typically that shows up with /proc/partitions
> showing an inconsistent size for the partition compared to other
> utilities.
* Re: raid50 and 9TB volumes
2007-07-17 0:54 ` Neil Brown
@ 2007-07-17 0:58 ` David Chinner
2007-07-23 6:09 ` Raz
0 siblings, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-07-17 0:58 UTC (permalink / raw)
To: Neil Brown; +Cc: David Chinner, Raz, xfs-oss
On Tue, Jul 17, 2007 at 10:54:36AM +1000, Neil Brown wrote:
> On Tuesday July 17, dgc@sgi.com wrote:
> > On Tue, Jul 17, 2007 at 09:56:25AM +1000, Neil Brown wrote:
> > > On Tuesday July 17, dgc@sgi.com wrote:
> > > > On Mon, Jul 16, 2007 at 04:53:22PM +0300, Raz wrote:
> > > > >
> > > > > Well you are right. /proc/partitions says:
> > > > > ....
> > > > > 8 241 488384001 sdp1
> > > > > 9 1 3404964864 md1
> > > > > 9 2 3418684416 md2
> > > > > 9 3 6823647232 md3
> > > > >
> > > > > while xfs formats md3 as 9 TB.
> ..
> > >
> > > If XFS is given a 6.8TB devices and formats it as 9TB, then I would be
> > > looking at mkfs.xfs(??).
> >
> > mkfs.xfs tries to read the last block of the device that it is given
> > and proceeds only if that read is successful. IOWs, mkfs.xfs has been
> > told the size of the device is 9TB, it's successfully read from offset
> > 9TB, so the device must be at least 9TB.
>
> Odd.
> Given that the drives are 490GB, and there are 8 in a raid5 array,
> the raid5 arrays are really under 3.5TB. And two of them are less than
> 7TB. So there definitely are not 9TB worth of bytes.
>
> mkfs.xfs uses the BLKGETSIZE64 ioctl which returns
> bdev->bi_inode->i_size, where as /proc/partitions uses get_capacity
> which uses disk->capacity, so there is some room for them to return
> different values... Except that on open, it calls
> bd_set_size(bdev, (loff_t)get_capacity(disk)<<9);
> which makes sure the two have the same value.
>
> I cannot see where the size difference comes from.
> What does
> /sbin/blockdev --getsize64
> report for each of the different devices, as compared to what
> /proc/partitions reports?
And add to that the output of `xfs_growfs -n <mntpt>` so we can
see what XFS really thinks the size of the filesystem is.
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: raid50 and 9TB volumes
2007-07-17 0:58 ` David Chinner
@ 2007-07-23 6:09 ` Raz
2007-07-24 1:01 ` David Chinner
0 siblings, 1 reply; 15+ messages in thread
From: Raz @ 2007-07-23 6:09 UTC (permalink / raw)
To: David Chinner; +Cc: xfs-oss
On 7/17/07, David Chinner <dgc@sgi.com> wrote:
> On Tue, Jul 17, 2007 at 10:54:36AM +1000, Neil Brown wrote:
> > On Tuesday July 17, dgc@sgi.com wrote:
> > > On Tue, Jul 17, 2007 at 09:56:25AM +1000, Neil Brown wrote:
> > > > On Tuesday July 17, dgc@sgi.com wrote:
> > > > > On Mon, Jul 16, 2007 at 04:53:22PM +0300, Raz wrote:
> > > > > >
> > > > > > Well you are right. /proc/partitions says:
> > > > > > ....
> > > > > > 8 241 488384001 sdp1
> > > > > > 9 1 3404964864 md1
> > > > > > 9 2 3418684416 md2
> > > > > > 9 3 6823647232 md3
> > > > > >
> > > > > > while xfs formats md3 as 9 TB.
> > ..
> > > >
> > > > If XFS is given a 6.8TB devices and formats it as 9TB, then I would be
> > > > looking at mkfs.xfs(??).
> > >
> > > mkfs.xfs tries to read the last block of the device that it is given
> > > and proceeds only if that read is successful. IOWs, mkfs.xfs has been
> > > told the size of the device is 9TB, it's successfully read from offset
> > > 9TB, so the device must be at least 9TB.
> >
> > Odd.
> > Given that the drives are 490GB, and there are 8 in a raid5 array,
> > the raid5 arrays are really under 3.5TB. And two of them are less than
> > 7TB. So there definitely are not 9TB worth of bytes.
> >
> > mkfs.xfs uses the BLKGETSIZE64 ioctl which returns
> > bdev->bi_inode->i_size, where as /proc/partitions uses get_capacity
> > which uses disk->capacity, so there is some room for them to return
> > different values... Except that on open, it calls
> > bd_set_size(bdev, (loff_t)get_capacity(disk)<<9);
> > which makes sure the two have the same value.
> >
> > I cannot see where the size difference comes from.
> > What does
> > /sbin/blockdev --getsize64
> > report for each of the different devices, as compared to what
> > /proc/partitions reports?
> And add to that the output of `xfs_growfs -n <mntpt>` so we can
> see what XFS really thinks the size of the filesystem is.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
My QA re-installed the system. Same kernel, different results. Now
/proc/partitions reports:
9 1 5114281984 md1
9 2 5128001536 md2
9 3 10242281472 md3
blockdev --getsize64 /dev/md3
10488096227328
but xfs keeps on crashing. When formatting it to 6.3 TB we're OK; when
letting xfs's mkfs choose the size, it crashes.
/dev/hda1 243M 155M 76M 68% /
/dev/md0 1.9G 35M 1.8G 2% /d0
/dev/md3 6.3T 5.7T 593G 91% /d1
when formatting to 6.4 TB:
xfs_growfs -n (or xfs_info) reports:
meta-data=/dev/md3               isize=256    agcount=33, agsize=52428544 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=1677721600, imaxpct=25
         =                       sunit=256    swidth=512 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=2097152 blocks=0, rtextents=0
when formatting without any size argument, xfs_growfs reports:
meta-data=/dev/md3               isize=256    agcount=33, agsize=80017664 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=2560570368, imaxpct=25
         =                       sunit=256    swidth=512 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=2097152 blocks=0, rtextents=0
In this case, xfs crashes again.
[4613896.794000] <c10d36e9> xfs_alloc_read_agf+0x199/0x220
<c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
[4613896.794000] <c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
<c10d31ea> xfs_alloc_fix_freelist+0x41a/0x4b0
[4613896.794000] <c1006ebc> timer_interrupt+0x6c/0xa0 <c1042e64>
__do_IRQ+0xc4/0x120
[4613896.795000] <c100595e> do_IRQ+0x1e/0x30 <c1003aba>
common_interrupt+0x1a/0x20
[4613896.795000] <c10d3a77> xfs_alloc_vextent+0x307/0x5b0 <c10e49fb>
xfs_bmap_btalloc+0x41b/0x980
[4613896.795000] <c1114a46> xfs_iext_bno_to_ext+0x126/0x1d0
<c10283b6> update_wall_time_one_tick+0x6/0x80
[4613896.795000] <c102846a> update_wall_time+0xa/0x40 <c10e9355>
xfs_bmapi+0x1495/0x18d0
[4613896.795000] <c1114a46> xfs_iext_bno_to_ext+0x126/0x1d0
<c10e698c> xfs_bmap_search_multi_extents+0xfc/0x110
[4613896.795000] <c1117a07> xfs_iomap_write_allocate+0x327/0x620
<c104807c> mempool_free+0x4c/0xa0
[4613896.795000] <c104807c> mempool_free+0x4c/0xa0 <c106b3f8>
bio_fs_destructor+0x18/0x20
[4613896.795000] <c11164a0> xfs_iomap+0x440/0x570 <c113979b>
xfs_map_blocks+0x5b/0xa0
[4613896.795000] <c113aa3a> xfs_page_state_convert+0x46a/0x7a0
<c1044d7b> find_get_pages_tag+0x7b/0x90
[4613896.795000] <c113add9> xfs_vm_writepage+0x69/0x100 <c108df58>
mpage_writepages+0x218/0x3f0
[4613896.795000] <c113ad70> xfs_vm_writepage+0x0/0x100 <c104b614>
do_writepages+0x54/0x60
[4613896.795000] <c108be86> __sync_single_inode+0x66/0x1f0
<c108c098> __writeback_single_inode+0x88/0x1b0
[4613896.795000] <c1028120> del_timer_sync+0x10/0x20 <c1298ee0>
schedule_timeout+0x60/0xb0
[4613896.795000] <c10288a0> process_timeout+0x0/0x10 <c108c3a7>
sync_sb_inodes+0x1e7/0x300
[4613896.795000] <c108c595> writeback_inodes+0xd5/0xf0 <c104afc2>
balance_dirty_pages+0xd2/0x190
[4613896.795000] <c106a07c> generic_commit_write+0x7c/0xa0
<c1046950> generic_file_buffered_write+0x310/0x6b0
[4613896.795000] <c1082b7d> file_update_time+0x5d/0xe0 <c114366a>
xfs_write+0xc0a/0xe00
[4613896.795000] <c1156eef> __bitmap_weight+0x5f/0x80 <c1141f47>
xfs_read+0x1a7/0x370
[4613896.795000] <c113dc9f> xfs_file_aio_write+0x8f/0xa0 <c1065a71>
do_sync_write+0xd1/0x120
[4613896.795000] <c1033650> autoremove_wake_function+0x0/0x60
<c1065b88> vfs_write+0xc8/0x190
[4613896.795000] <c1065d21> sys_write+0x51/0x80 <c10030ef>
syscall_call+0x7/0xb
--
Raz
* Re: raid50 and 9TB volumes
2007-07-23 6:09 ` Raz
@ 2007-07-24 1:01 ` David Chinner
2007-08-07 9:20 ` Raz
0 siblings, 1 reply; 15+ messages in thread
From: David Chinner @ 2007-07-24 1:01 UTC (permalink / raw)
To: Raz; +Cc: David Chinner, xfs-oss
On Mon, Jul 23, 2007 at 09:09:03AM +0300, Raz wrote:
> My QA re-installed the system. Same kernel, different results. Now
> /proc/partitions reports:
> 9 1 5114281984 md1
> 9 2 5128001536 md2
> 9 3 10242281472 md3
>
> blockdev --getsize64 /dev/md3
> 10488096227328
>
> but xfs keeps on crashing. When formatting it to 6.3 TB we're OK; when
> letting xfs's mkfs choose the size, it crashes.
So at 6.3TB everything is ok. At what point does it start having
problems? 6.4TB, 6.8TB, 8TB, 9TB?
I know Neil pointed out that you shouldn't have 10TB but closer to
7TB - is this true?
Cheers,
Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
* Re: raid50 and 9TB volumes
2007-07-24 1:01 ` David Chinner
@ 2007-08-07 9:20 ` Raz
2007-09-03 14:24 ` Raz
0 siblings, 1 reply; 15+ messages in thread
From: Raz @ 2007-08-07 9:20 UTC (permalink / raw)
To: David Chinner; +Cc: linux-xfs
On 7/24/07, David Chinner <dgc@sgi.com> wrote:
> On Mon, Jul 23, 2007 at 09:09:03AM +0300, Raz wrote:
> > My QA re-installed the system. Same kernel, different results. Now
> > /proc/partitions reports:
> > 9 1 5114281984 md1
> > 9 2 5128001536 md2
> > 9 3 10242281472 md3
> >
> > blockdev --getsize64 /dev/md3
> > 10488096227328
> >
> > but xfs keeps on crashing. When formatting it to 6.3 TB we're OK; when
> > letting xfs's mkfs choose the size, it crashes.
>
> So at 6.3TB everything is ok. At what point does it start having
> problems? 6.4TB, 6.8TB, 8TB, 9TB?
over 8 TB. We checked several times; at 8.5 TB it crashes.
> I know Neil pointed out that you shouldn't have 10TB but closer to
> 7TB - is this true?
the drives are 750 GB each.
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group
>
--
Raz
* Re: raid50 and 9TB volumes
2007-08-07 9:20 ` Raz
@ 2007-09-03 14:24 ` Raz
2007-09-03 18:55 ` Christian Kujau
0 siblings, 1 reply; 15+ messages in thread
From: Raz @ 2007-09-03 14:24 UTC (permalink / raw)
To: David Chinner; +Cc: linux-xfs
Dave hello.
What is the current status of this problem? If you recall, xfs in 32bit
over a 10 TB md device (raid50 in this case) sees only 8TB (and no more).
The disks I am using are 750GB Hitachi.
Kernel is 2.6.17.
thank you
raz
On 8/7/07, Raz <raziebe@gmail.com> wrote:
> On 7/24/07, David Chinner <dgc@sgi.com> wrote:
> > On Mon, Jul 23, 2007 at 09:09:03AM +0300, Raz wrote:
> > > My QA re-installed the system. Same kernel, different results. Now
> > > /proc/partitions reports:
> > > 9 1 5114281984 md1
> > > 9 2 5128001536 md2
> > > 9 3 10242281472 md3
> > >
> > > blockdev --getsize64 /dev/md3
> > > 10488096227328
> > >
> > > but xfs keeps on crashing. When formatting it to 6.3 TB we're OK; when
> > > letting xfs's mkfs choose the size, it crashes.
> >
> > So at 6.3TB everything is ok. At what point does it start having
> > problems? 6.4TB, 6.8TB, 8TB, 9TB?
> over 8 TB. we checked several times. in 8.5 it crashes.
> > I know Neil pointed out that you shouldn't have 10TB but closer to
> > 7TB - is this true?
> the drives are of 750 GB each.
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > Principal Engineer
> > SGI Australian Software Group
> >
>
>
> --
> Raz
>
--
Raz
* Re: raid50 and 9TB volumes
2007-09-03 14:24 ` Raz
@ 2007-09-03 18:55 ` Christian Kujau
2007-09-04 2:50 ` Eric Sandeen
0 siblings, 1 reply; 15+ messages in thread
From: Christian Kujau @ 2007-09-03 18:55 UTC (permalink / raw)
To: linux-xfs
On Mon, 3 Sep 2007, Raz wrote:
> What is the current status of this problem? If you recall, xfs in 32bit
> over a 10 TB md device (raid50 in this case) sees only 8TB (and no more).
> The disks I am using are 750GB Hitachi.
> Kernel is 2.6.17.
dunno about this particular issue, but you meant "kernel 2.6.17.7 or
higher", right? If not: http://oss.sgi.com/projects/xfs/faq.html#dir2
--
BOFH excuse #374:
It's the InterNIC's fault.
* Re: raid50 and 9TB volumes
2007-09-03 18:55 ` Christian Kujau
@ 2007-09-04 2:50 ` Eric Sandeen
0 siblings, 0 replies; 15+ messages in thread
From: Eric Sandeen @ 2007-09-04 2:50 UTC (permalink / raw)
To: Christian Kujau; +Cc: linux-xfs
Christian Kujau wrote:
> On Mon, 3 Sep 2007, Raz wrote:
>> What is the current status of this problem? If you recall, xfs in 32bit
>> over a 10 TB md device (raid50 in this case) sees only 8TB (and no more).
>> The disks I am using are 750GB Hitachi.
>> Kernel is 2.6.17.
>
> dunno about this particular issue, but you meant "kernel 2.6.17.7 or
> higher", right? If not: http://oss.sgi.com/projects/xfs/faq.html#dir2
That shouldn't be affected by volume size.
Raz, are you certain that the MD volume is in good shape at this point,
after Neil's questions? If it's something you can test on, I'd suggest
getting lmdd and writing a pattern directly to the block device,
spanning the 8T point, then go read it back & double check that all is
well. XFS certainly has been tested at 8T and above; if I had to bet on
it, I'd bet at a problem in another layer, or the configuration.
-Eric
Thread overview: 15+ messages
2007-07-16 12:42 raid50 and 9TB volumes Raz
2007-07-16 13:01 ` David Chinner
2007-07-16 13:57 ` Raz
2007-07-16 15:24 ` Eric Sandeen
[not found] ` <5d96567b0707160653m5951fac9v5a56bb4c92174d63@mail.gmail.com>
2007-07-16 22:18 ` David Chinner
2007-07-16 23:56 ` Neil Brown
2007-07-17 0:12 ` David Chinner
2007-07-17 0:54 ` Neil Brown
2007-07-17 0:58 ` David Chinner
2007-07-23 6:09 ` Raz
2007-07-24 1:01 ` David Chinner
2007-08-07 9:20 ` Raz
2007-09-03 14:24 ` Raz
2007-09-03 18:55 ` Christian Kujau
2007-09-04 2:50 ` Eric Sandeen