public inbox for linux-xfs@vger.kernel.org
* mkfs.xfs with a 9TB realtime volume hangs
@ 2008-11-14 10:41 Jan Wagner
  2009-01-10 21:13 ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Wagner @ 2008-11-14 10:41 UTC (permalink / raw)
  To: xfs

Hi,

I have a RAID0 with 11x750GB+1x1TB components in the following 
partitionable-md test setup

root@abidal:~# cat /proc/partitions | grep md
  254     0 9035047936 md_d0
  254     1     124983 md_d0p1
  254     2    1828125 md_d0p2
  254     3    1953125 md_d0p3
  254     4 9031141669 md_d0p4

Essentially, four partitions: 128MB, ~1.9GB, 2GB, 9TB. I'd like to use the 
1.9GB partition for XFS and put a realtime subvolume on the same RAID0, on 
the 9TB partition. The partition tables are GPT instead of MBR, to allow 
>=2TB partitions.

When I create xfs with realtime subvolume on the 2GB partition all is 
fine:

root@abidal:~# mkfs.xfs -f -d su=1024k,sw=12 -r rtdev=/dev/md_d0p3 /dev/md_d0p2
log stripe unit (1048576 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md_d0p2           isize=256    agcount=4, agsize=114432 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=457031, imaxpct=25
          =                       sunit=256    swidth=3072 blks
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=2
          =                       sectsz=512   sunit=8 blks, lazy-count=0
realtime =/dev/md_d0p3           extsz=4096   blocks=488281, rtextents=488281

When I try the same but place the realtime subvolume on the 9TB partition 
the mkfs.xfs hangs indefinitely with 100% CPU:

root@abidal:~# mkfs.xfs -f -d su=1024k,sw=12 -r rtdev=/dev/md_d0p4 /dev/md_d0p2
log stripe unit (1048576 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/md_d0p2           isize=256    agcount=4, agsize=114432 blks
          =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=457031, imaxpct=25
          =                       sunit=256    swidth=3072 blks
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=2560, version=2
          =                       sectsz=512   sunit=8 blks, lazy-count=0
realtime =/dev/md_d0p4           extsz=4096   blocks=2257785417, rtextents=2257785417
(hangs...)

When I run strace on the first, it completes with
...
pwrite(4, "IABT\0\0\0\0\377\377\377\377\377\377\377\377\0\0\0\0\0"..., 4096, 468725760) = 4096
pwrite(4, "XAGI\0\0\0\1\0\0\0\1\0\1\277\0\0\0\0\0\0\0\0\3\0\0\0\1"..., 512, 468714496) = 512
pread(4, "XFSB\0\0\20\0\0\0\0\0\0\6\371G\0\0\0\0\0\7sY\0\0\0\0\0"..., 512, 0) = 512
pwrite(4, "XFSB\0\0\20\0\0\0\0\0\0\6\371G\0\0\0\0\0\7sY\0\0\0\0\0"..., 512, 0) = 512
fsync(5)                                = 0
ioctl(5, BLKFLSBUF, 0)                  = 0
close(5)                                = 0
fsync(4)                                = 0
ioctl(4, BLKFLSBUF, 0)                  = 0
close(4)                                = 0
exit_group(0)                           = ?

When I run strace on the latter mkfs.xfs, it just keeps reading for hours:

pread(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 7802880) = 4096
brk(0x1667000)                          = 0x1667000
pread(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 7806976) = 4096
pread(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 7811072) = 4096
....

Any ideas?

  - Jan

--
****************************************************
  Helsinki University of Technology
  Dept. of Metsähovi Radio Observatory
  http://www.metsahovi.fi/~jwagner/
  Work +358-9-428320-36

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: mkfs.xfs with a 9TB realtime volume hangs
  2008-11-14 10:41 mkfs.xfs with a 9TB realtime volume hangs Jan Wagner
@ 2009-01-10 21:13 ` Eric Sandeen
  2009-01-11 10:35   ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2009-01-10 21:13 UTC (permalink / raw)
  To: Jan Wagner; +Cc: xfs

Jan Wagner wrote:
> Hi,
> 
> I have a RAID0 with 11x750GB+1x1TB components in the following 
> partitionable-md test setup
> 
> root@abidal:~# cat /proc/partitions | grep md
>   254     0 9035047936 md_d0
>   254     1     124983 md_d0p1
>   254     2    1828125 md_d0p2
>   254     3    1953125 md_d0p3
>   254     4 9031141669 md_d0p4
> 
> Essentially, four partitions: 128MB, ~1.9GB, 2GB, 9TB. I'd like to use the 
> 1.9GB partition for XFS and put a realtime subvolume on the same RAID0, on 
> the 9TB partition. The partition tables are GPT instead of MBR, to allow 
> >=2TB partitions.

Sorry for the slow/no reply.  It seems to be doing many calculations in
rtinit; I haven't sorted out what yet, but it's not likely hung, it's
working hard.  :)

If you give it a larger extsize it should go faster (if the larger
extsize is acceptable for your use...)
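For example, here is a sketch of the invocation from the original report with a larger rt extent size (the 256k value is just an illustration; pick whatever suits the workload):

```shell
# Suggested invocation (shown as a comment, not executed here):
#   mkfs.xfs -f -d su=1024k,sw=12 \
#            -r rtdev=/dev/md_d0p4,extsize=256k /dev/md_d0p2
# The hanging mkfs above reported 2257785417 rtextents with 4k extents;
# a 256k extent covers 64x as much space, so mkfs has 64x fewer
# extents (and a 64x smaller rt bitmap) to initialise:
echo $(( 2257785417 / (256 / 4) ))   # ~35 million rtextents instead
```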

I tried a 4t realtime volume:

mkfs.xfs -dfile,name=fsfile,size=1g -rfile,name=rtfile,size=4t,extsize=$SIZE

for a few different extent sizes, and got

extsize	  time
-------	  ----
512k	  0.3s
256k	  0.7s
128k	  1.9s
 64k	  8.4s
 32k	 25.4s
 16k	129.4s

With the default 4k extent size this takes forever (the man page claims
default is 64k; maybe this got broken at some point).
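The blow-up tracks the rt bitmap doubling at each halving of the extent size; the per-extsize numbers for the 4t test file above work out as follows (arithmetic only, no mkfs run):

```shell
# One bit per realtime extent in the rt bitmap, so halving the
# extent size doubles both the bitmap and the work to initialise it.
rtsize=$(( 4 * 1024 * 1024 * 1024 * 1024 ))   # the 4t test file, in bytes
for extsize_k in 512 256 128 64 32 16 4; do
    rtextents=$(( rtsize / (extsize_k * 1024) ))
    printf '%4dk -> %10d rtextents, %6d KiB bitmap\n' \
        "$extsize_k" "$rtextents" $(( rtextents / 8 / 1024 ))
done
```

At the 4k default that is a 128 MiB bitmap, versus 1 MiB at 512k.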

Somebody will need to find time to look into what's going on.

I think it's doing lots of work in rtinit, something like

#0  xfs_rtfind_back (mp=0x7fffa69e2f30, tp=0x3afc8b0, start=115245056,
limit=0, rtblock=0x7fffa69e28b8) at xfs_rtalloc.c:83
#1  0x000000000041433c in xfs_rtfree_range (mp=0x7fffa69e2f30,
tp=0x3afc8b0, start=115245056, len=32768, rbpp=0x7fffa69e2908,
rsb=0x7fffa69e2910)
    at xfs_rtalloc.c:448
#2  0x00000000004144e4 in libxfs_rtfree_extent (tp=0x3afc8b0,
bno=115245056, len=32768) at xfs_rtalloc.c:756
#3  0x0000000000403a6b in parseproto (mp=0x7fffa69e2f30, pip=<value
optimized out>, fsxp=0x7fffa69e3240, pp=0x7fffa69e2ef0, name=<value
optimized out>)
    at proto.c:752
...

-Eric



* Re: mkfs.xfs with a 9TB realtime volume hangs
  2009-01-10 21:13 ` Eric Sandeen
@ 2009-01-11 10:35   ` Dave Chinner
  2009-01-11 13:46     ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2009-01-11 10:35 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs, Jan Wagner

On Sat, Jan 10, 2009 at 03:13:23PM -0600, Eric Sandeen wrote:
> Jan Wagner wrote:
> > Hi,
> > 
> > I have a RAID0 with 11x750GB+1x1TB components in the following 
> > partitionable-md test setup
> > 
> > root@abidal:~# cat /proc/partitions | grep md
> >   254     0 9035047936 md_d0
> >   254     1     124983 md_d0p1
> >   254     2    1828125 md_d0p2
> >   254     3    1953125 md_d0p3
> >   254     4 9031141669 md_d0p4
> > 
> > Essentially, four partitions: 128MB, ~1.9GB, 2GB, 9TB. I'd like to use the 
> > 1.9GB partition for XFS and put a realtime subvolume on the same RAID0, on 
> > the 9TB partition. The partition tables are GPT instead of MBR, to allow 
> > >=2TB partitions.
> 
> Sorry for the slow/no reply.  It seems to be doing many calculations in
> rtinit; I haven't sorted out what yet, but it's not likely hung, it's
> working hard.  :)
> 
> If you give it a larger extsize it should go faster (if the larger
> extsize is acceptable for your use...)
> 
> I tried a 4t realtime volume:
> 
> mkfs.xfs -dfile,name=fsfile,size=1g -rfile,name=rtfile,size=4t,extsize=$SIZE
> 
> for a few different extent sizes, and got
> 
> extsize	  time
> -------	  ----
> 512k	  0.3s
> 256k	  0.7s
> 128k	  1.9s
>  64k	  8.4s
>  32k	 25.4s
>  16k	129.4s
> 
> With the default 4k extent size this takes forever (the man page claims
> default is 64k, maybe this got broken at some point).

It got changed a few years back by Nathan, IIRC. I bet the time
being taken is a result of the blow-out in bitmap size caused by reducing
the extent size. Given it is non-linear, it may have something to do
with cache sizes as well, e.g. buftarg hashes not being large enough.
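Putting a rough number on that blow-out for the 9TB report at the top of the thread (back-of-envelope, using the rtextents figure from the hanging mkfs output):

```shell
rtextents=2257785417                  # from the mkfs output for /dev/md_d0p4
bitmap_mib=$(( rtextents / 8 / 1024 / 1024 ))   # one bit per rt extent
echo "rt bitmap: ~${bitmap_mib} MiB"  # prints "rt bitmap: ~269 MiB"
```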

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com



* Re: mkfs.xfs with a 9TB realtime volume hangs
  2009-01-11 10:35   ` Dave Chinner
@ 2009-01-11 13:46     ` Eric Sandeen
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Sandeen @ 2009-01-11 13:46 UTC (permalink / raw)
  To: Eric Sandeen, Jan Wagner, xfs

Dave Chinner wrote:
> On Sat, Jan 10, 2009 at 03:13:23PM -0600, Eric Sandeen wrote:


>> With the default 4k extent size this takes forever (the man page claims
>> default is 64k, maybe this got broken at some point).
> 
> It got changed a few years back by Nathan, IIRC. 

Yep, I found the commit yesterday; it was for buffered IO's benefit on
the rt subvol, to reduce unwritten extent conversion, IIRC.

I was going to follow up w/ the commit etc. yesterday, but the mailing
list was down (again, sigh - this is indicative of sgi's staunch
commitment to xfs, I'm sure) so I had nothing to reply to.

-Eric

> I bet the time
> being taken is a result of the blow-out in bitmap size caused by reducing
> the extent size. Given it is non-linear, it may have something to do
> with cache sizes as well, e.g. buftarg hashes not being large enough.
> 
> Cheers,
> 
> Dave.


