* posix_fallocate
@ 2010-05-07 8:22 Krzysztof Błaszkowski
2010-05-07 9:23 ` posix_fallocate Stan Hoeppner
2010-05-07 16:26 ` posix_fallocate Eric Sandeen
0 siblings, 2 replies; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-07 8:22 UTC (permalink / raw)
To: xfs
Hello,
I use posix_fallocate() to preallocate large amounts of space, but I have found an
issue. It works fine with sizes like 100G, 1T and even 10T on some boxes (on
others it can fail past a threshold of e.g. 7T), but if I try e.g. 16T the user-space
process stays "R"unning forever and is not interruptible.
Furthermore, some unrelated processes such as sshd and bash enter D state.
There is nothing in the kernel log.
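For reference, the pattern is a single posix_fallocate() over the whole file. A
minimal sketch of the kind of call involved (the path and size below are
placeholders, not the real application code):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        const off_t size = (off_t)16 << 40;     /* 16T, for illustration only */
        int fd = open("/mnt/xfs/prealloc.bin", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* on the affected boxes this is where the process stays "R"unning */
        int err = posix_fallocate(fd, 0, size);
        if (err)
                fprintf(stderr, "posix_fallocate: %s\n", strerror(err));

        close(fd);
        return err != 0;
}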
So far I have captured a few logs with the ftrace facility for 1G, 100G, 1T and 10T sizes.
I noticed that for the first three sizes the log is about 1.5M long (2M peak),
while 10T generates a 94M log. I could not retrieve a log for the 17T case
because "cat /sys ... /trace" enters D state.
I would appreciate any help, because I have given up on analysing the ftrace logs.
xfs_vn_fallocate covers about 11k lines in the 1.5M log case, while there are
about 163k lines in the 94M log, and all I could see is possibly some relationship
between the time spent in xfs_vn_fallocate's subfunctions and the requested space.
Box details:
16 Hitachi 2TB drives (backplane connected), dm, one 25T LVM LUN,
kernel 2.6.31.5; more recent kernels and more recent xfs were not tested.
Regards,
Krzysztof Blaszkowski
* Re: posix_fallocate
2010-05-07 8:22 posix_fallocate Krzysztof Błaszkowski
@ 2010-05-07 9:23 ` Stan Hoeppner
2010-05-07 9:48 ` posix_fallocate Krzysztof Błaszkowski
2010-05-07 10:07 ` posix_fallocate Krzysztof Błaszkowski
2010-05-07 16:26 ` posix_fallocate Eric Sandeen
1 sibling, 2 replies; 15+ messages in thread
From: Stan Hoeppner @ 2010-05-07 9:23 UTC (permalink / raw)
To: xfs
Krzysztof Błaszkowski put forth on 5/7/2010 3:22 AM:
> Hello,
>
> I use this to preallocate large space but found an issue. Posix_fallocate
> works right with sizes like 100G, 1T and even 10T on some boxes (on some
> other can fail after e.g. 7T threshold) but if i tried e.g. 16T the user
> space process would be "R"unning forever and it is not interruptible.
> Furthermore some other not related processes like sshd, bash enter D state.
> There is nothing in kernel log.
>
> I made so far a few logs with ftrace facility for 1G, 100G, 1T and 10T sizes.
> I noticed that for 1st three sizes the log is as long as abt 1.5M (2M peak)
> while 10T generates 94M long log. I couldn't retrieve a log for 17T case
> because "cat /sys ... /trace" enters D.
>
> I would appreciate any help because i gave up with ftrace logs analysis. The
> xfs_vn_fallocate is covered in abt 11k lines for a 1.5M log case while there
> are abt 163k lines in 94M log. And all i could see is poss some relationship
> between time spent in xfs_vn_fallocate subfunctions vs requested space.
>
> Box details:
> 16 Hitachi 2TB drives (backplane connected), dm, 1 lvm lun of 25T size,
> kernel 2.6.31.5; more recent kernels and more recent xfs were not tested.
32 or 64 bit kernel? What is the size of the XFS filesystem on the 25TB LVM
LUN against which you're running posix_fallocate? The reason I ask is that
XFS has a 16TB per filesystem limitation on 32 bit kernels. I can only
assume that your XFS filesystem is larger than 16TB since you're attempting
to posix_fallocate 16TB. But, it's best to ask for confirmation rather than
assume, especially given that your problem is appearing near that magical
16TB boundary.
--
Stan
* Re: posix_fallocate
2010-05-07 9:23 ` posix_fallocate Stan Hoeppner
@ 2010-05-07 9:48 ` Krzysztof Błaszkowski
2010-05-07 10:07 ` posix_fallocate Krzysztof Błaszkowski
1 sibling, 0 replies; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-07 9:48 UTC (permalink / raw)
To: xfs, Stan Hoeppner
On Friday 07 May 2010 11:23, Stan Hoeppner wrote:
> Krzysztof Błaszkowski put forth on 5/7/2010 3:22 AM:
> > Hello,
> >
> > I use this to preallocate large space but found an issue. Posix_fallocate
> > works right with sizes like 100G, 1T and even 10T on some boxes (on some
> > other can fail after e.g. 7T threshold) but if i tried e.g. 16T the user
> > space process would be "R"unning forever and it is not interruptible.
> > Furthermore some other not related processes like sshd, bash enter D
> > state. There is nothing in kernel log.
> >
> > I made so far a few logs with ftrace facility for 1G, 100G, 1T and 10T
> > sizes. I noticed that for 1st three sizes the log is as long as abt 1.5M
> > (2M peak) while 10T generates 94M long log. I couldn't retrieve a log for
> > 17T case because "cat /sys ... /trace" enters D.
> >
> > I would appreciate any help because i gave up with ftrace logs analysis.
> > The xfs_vn_fallocate is covered in abt 11k lines for a 1.5M log case
> > while there are abt 163k lines in 94M log. And all i could see is poss
> > some relationship between time spent in xfs_vn_fallocate subfunctions vs
> > requested space.
> >
> > Box details:
> > 16 Hitachi 2TB drives (backplane connected), dm, 1 lvm lun of 25T size,
> > kernel 2.6.31.5; more recent kernels and more recent xfs were not tested.
>
> 32 or 64 bit kernel?
Sorry, I meant 64-bit.
> What is the size of the XFS filesystem on the 25TB
> LVM LUN against which you're running posix_fallocate?
XFS occupies the whole LUN (i.e. 25TB).
> The reason I ask is
> that XFS has a 16TB per filesystem limitation on 32 bit kernels. I can
> only assume that your XFS filesystem is larger than 16TB since you're
> attempting to posix_fallocate 16TB. But, it's best to ask for confirmation
> rather than assume, especially given that your problem is appearing near
> that magical 16TB boundary.
Sure, I see. I use 64-bit by default.
Regards,
Krzysztof
* Re: posix_fallocate
2010-05-07 9:23 ` posix_fallocate Stan Hoeppner
2010-05-07 9:48 ` posix_fallocate Krzysztof Błaszkowski
@ 2010-05-07 10:07 ` Krzysztof Błaszkowski
2010-05-07 10:42 ` posix_fallocate Stan Hoeppner
1 sibling, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-07 10:07 UTC (permalink / raw)
To: xfs; +Cc: Stan Hoeppner
Hello Stan,
It seems that your mail server blocks traffic from ".pl" domains, so don't be
offended if you do not see my reply sent to your address.
(Remote host said: 550 5.7.1 <v007470.home.net.pl[212.85.125.104]>: Client
host rejected: We do not accept mail from .pl domains)
Krzysztof
On Friday 07 May 2010 11:23, Stan Hoeppner wrote:
> Krzysztof Błaszkowski put forth on 5/7/2010 3:22 AM:
> > Hello,
> >
> > I use this to preallocate large space but found an issue. Posix_fallocate
> > works right with sizes like 100G, 1T and even 10T on some boxes (on some
> > other can fail after e.g. 7T threshold) but if i tried e.g. 16T the user
> > space process would be "R"unning forever and it is not interruptible.
> > Furthermore some other not related processes like sshd, bash enter D
> > state. There is nothing in kernel log.
> >
> > I made so far a few logs with ftrace facility for 1G, 100G, 1T and 10T
> > sizes. I noticed that for 1st three sizes the log is as long as abt 1.5M
> > (2M peak) while 10T generates 94M long log. I couldn't retrieve a log for
> > 17T case because "cat /sys ... /trace" enters D.
> >
> > I would appreciate any help because i gave up with ftrace logs analysis.
> > The xfs_vn_fallocate is covered in abt 11k lines for a 1.5M log case
> > while there are abt 163k lines in 94M log. And all i could see is poss
> > some relationship between time spent in xfs_vn_fallocate subfunctions vs
> > requested space.
> >
> > Box details:
> > 16 Hitachi 2TB drives (backplane connected), dm, 1 lvm lun of 25T size,
> > kernel 2.6.31.5; more recent kernels and more recent xfs were not tested.
>
> 32 or 64 bit kernel? What is the size of the XFS filesystem on the 25TB
> LVM LUN against which you're running posix_fallocate? The reason I ask is
> that XFS has a 16TB per filesystem limitation on 32 bit kernels. I can
> only assume that your XFS filesystem is larger than 16TB since you're
> attempting to posix_fallocate 16TB. But, it's best to ask for confirmation
> rather than assume, especially given that your problem is appearing near
> that magical 16TB boundary.
* Re: posix_fallocate
2010-05-07 10:07 ` posix_fallocate Krzysztof Błaszkowski
@ 2010-05-07 10:42 ` Stan Hoeppner
2010-05-07 10:56 ` posix_fallocate Krzysztof Błaszkowski
0 siblings, 1 reply; 15+ messages in thread
From: Stan Hoeppner @ 2010-05-07 10:42 UTC (permalink / raw)
To: xfs
Krzysztof Błaszkowski put forth on 5/7/2010 5:07 AM:
> Hello Stan,
>
> It seems that your mail server blocks traffic from ".pl" domains so don't be
> offended if you will not see my reply sent to your address.
>
> (Remote host said: 550 5.7.1 <v007470.home.net.pl[212.85.125.104]>: Client
> host rejected: We do not accept mail from .pl domains)
Sorry about that. Due to the constant battle against spam I've implemented
some pretty draconian countermeasures over time, including some ccTLD
blocking of SMTP. All my "overseas" contacts are via public mailing lists,
as in this case. Very rarely does a conversation need to go "off list", so
I've not had much of a problem with this setup. If you'd like I can
whitelist your address.
--
Stan
* Re: posix_fallocate
2010-05-07 10:42 ` posix_fallocate Stan Hoeppner
@ 2010-05-07 10:56 ` Krzysztof Błaszkowski
0 siblings, 0 replies; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-07 10:56 UTC (permalink / raw)
To: xfs; +Cc: Stan Hoeppner
On Friday 07 May 2010 12:42, Stan Hoeppner wrote:
> Krzysztof Błaszkowski put forth on 5/7/2010 5:07 AM:
> > Hello Stan,
> >
> > It seems that your mail server blocks traffic from ".pl" domains so don't
> > be offended if you will not see my reply sent to your address.
> >
> > (Remote host said: 550 5.7.1 <v007470.home.net.pl[212.85.125.104]>:
> > Client host rejected: We do not accept mail from .pl domains)
>
> Sorry about that. Due to the constant battle against spam I've implemented
> some pretty draconian countermeasures over time, including some ccTLD
> blocking of SMTP. All my "overseas" contacts are via public mailing lists,
> as in this case. Very rarely does a conversation need to go "off list", so
> I've not had much of a problem with this setup. If you'd like I can
> whitelist your address.
You are welcome to; if it is not a big hassle, then go ahead.
thanks,
Krzysztof
* Re: posix_fallocate
2010-05-07 8:22 posix_fallocate Krzysztof Błaszkowski
2010-05-07 9:23 ` posix_fallocate Stan Hoeppner
@ 2010-05-07 16:26 ` Eric Sandeen
2010-05-07 16:53 ` posix_fallocate Eric Sandeen
1 sibling, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2010-05-07 16:26 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: xfs
Krzysztof Błaszkowski wrote:
> Hello,
>
> I use this to preallocate large space but found an issue. Posix_fallocate
> works right with sizes like 100G, 1T and even 10T on some boxes (on some
> other can fail after e.g. 7T threshold) but if i tried e.g. 16T the user
> space process would be "R"unning forever and it is not interruptible.
> Furthermore some other not related processes like sshd, bash enter D state.
> There is nothing in kernel log.
>
> I made so far a few logs with ftrace facility for 1G, 100G, 1T and 10T sizes.
> I noticed that for 1st three sizes the log is as long as abt 1.5M (2M peak)
> while 10T generates 94M long log. I couldn't retrieve a log for 17T case
> because "cat /sys ... /trace" enters D.
>
> I would appreciate any help because i gave up with ftrace logs analysis. The
> xfs_vn_fallocate is covered in abt 11k lines for a 1.5M log case while there
> are abt 163k lines in 94M log. And all i could see is poss some relationship
> between time spent in xfs_vn_fallocate subfunctions vs requested space.
>
> Box details:
> 16 Hitachi 2TB drives (backplane connected), dm, 1 lvm lun of 25T size,
> kernel 2.6.31.5; more recent kernels and more recent xfs were not tested.
It'd be great if you could test a more recent kernel.
sysrq-t would give us all the backtraces, except I suppose not for the
running process...
I can try to scrape up >16T to test on at some point ...
-Eric
* Re: posix_fallocate
2010-05-07 16:26 ` posix_fallocate Eric Sandeen
@ 2010-05-07 16:53 ` Eric Sandeen
2010-05-07 22:16 ` posix_fallocate Dave Chinner
2010-05-10 7:11 ` posix_fallocate Krzysztof Błaszkowski
0 siblings, 2 replies; 15+ messages in thread
From: Eric Sandeen @ 2010-05-07 16:53 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: xfs
Eric Sandeen wrote:
> Krzysztof Błaszkowski wrote:
>> Hello,
>>
>> I use this to preallocate large space but found an issue. Posix_fallocate
>> works right with sizes like 100G, 1T and even 10T on some boxes (on some
>> other can fail after e.g. 7T threshold) but if i tried e.g. 16T the user
>> space process would be "R"unning forever and it is not interruptible.
>> Furthermore some other not related processes like sshd, bash enter D state.
>> There is nothing in kernel log.
Oh, one thing you should know is that depending on your version of glibc,
posix_fallocate may be writing 0s and not using preallocation calls.
Do you know which yours is using? strace on a small file test should tell you.
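For example, a trivial test program like this (the path is only an example), run
under strace, makes it obvious: a fallocate() call in the trace means real
preallocation, while seeing only write()/pwrite() activity means glibc is
emulating it by writing zeros:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        /* small throwaway file; the path is just an example */
        int fd = open("/mnt/test/smallfile", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }

        int err = posix_fallocate(fd, 0, 1 << 20);   /* 1MB is plenty to see the pattern */
        if (err)
                fprintf(stderr, "posix_fallocate: %s\n", strerror(err));

        close(fd);
        return 0;
}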
Anyway, I am seeing things get stuck around 8T it seems...
# touch /mnt/test/bigfile
# xfs_io -c "resvsp 0 16t" /mnt/test/bigfile
... wait ... in other window ...
# du -hc /mnt/test/bigfile
8.0G /mnt/test/bigfile
8.0G total
# echo t > /proc/sysrq-trigger
# dmesg | grep -A20 xfs_io
xfs_io R running task 3576 29444 29362 0x00000006
ffff8809cfbb4920 ffffffff81478d9f ffffffffa032d3c5 0000000000000246
ffff8809cfbb4920 ffffffff814788bc 0000000000000000 ffffffff81ba3510
ffff8809d3429a68 ffffffffa032b60f ffff8809d3429aa8 000000000000001e
Call Trace:
[<ffffffff81478d9f>] ? __mutex_lock_common+0x36d/0x392
[<ffffffffa032d3c5>] ? xfs_icsb_modify_counters+0x17f/0x1ac [xfs]
[<ffffffffa032b60f>] ? xfs_icsb_unlock_all_counters+0x4d/0x60 [xfs]
[<ffffffffa032b8bf>] ? xfs_icsb_disable_counter+0x8c/0x95 [xfs]
[<ffffffff81478e88>] ? mutex_lock_nested+0x3e/0x43
[<ffffffffa032d3d3>] ? xfs_icsb_modify_counters+0x18d/0x1ac [xfs]
[<ffffffffa032d536>] ? xfs_mod_incore_sb+0x29/0x6e [xfs]
[<ffffffffa033052c>] ? _xfs_trans_alloc+0x27/0x61 [xfs]
[<ffffffffa03303d3>] ? xfs_trans_reserve+0x6c/0x19e [xfs]
[<ffffffff8106fb45>] ? up_write+0x2b/0x32
[<ffffffffa0335e55>] ? xfs_alloc_file_space+0x163/0x306 [xfs]
[<ffffffff8107120a>] ? sched_clock_cpu+0xc3/0xce
[<ffffffffa0336122>] ? xfs_change_file_space+0x12a/0x2b8 [xfs]
[<ffffffff8106f9bf>] ? down_write_nested+0x80/0x8b
[<ffffffffa031b8ce>] ? xfs_ilock+0x30/0xb4 [xfs]
[<ffffffffa033e0e4>] ? xfs_vn_fallocate+0x80/0xf4 [xfs]
--
R xfs_io 29444 86014624.786617 162 120 86014624.786617 137655.161327 408.979977 /
# uname -r
2.6.34-0.4.rc0.git2.fc14.x86_64
I'll look into it.
-Eric
* Re: posix_fallocate
2010-05-07 16:53 ` posix_fallocate Eric Sandeen
@ 2010-05-07 22:16 ` Dave Chinner
2010-05-10 7:11 ` posix_fallocate Krzysztof Błaszkowski
1 sibling, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-05-07 22:16 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs, Krzysztof Błaszkowski
On Fri, May 07, 2010 at 11:53:27AM -0500, Eric Sandeen wrote:
> Eric Sandeen wrote:
> > Krzysztof Błaszkowski wrote:
> >> Hello,
> >>
> >> I use this to preallocate large space but found an issue. Posix_fallocate
> >> works right with sizes like 100G, 1T and even 10T on some boxes (on some
> >> other can fail after e.g. 7T threshold) but if i tried e.g. 16T the user
> >> space process would be "R"unning forever and it is not interruptible.
> >> Furthermore some other not related processes like sshd, bash enter D state.
> >> There is nothing in kernel log.
>
> Oh, one thing you should know is that depending on your version of glibc,
> posix_fallocate may be writing 0s and not using preallocation calls.
>
> Do you know which yours is using? strace should tell you on a small
> file test.
>
> Anyway, I am seeing things get stuck around 8T it seems...
>
> # touch /mnt/test/bigfile
> # xfs_io -c "resvsp 0 16t" /mnt/test/bigfile
>
> ... wait ... in other window ...
>
> # du -hc /mnt/test/bigfile
> 8.0G /mnt/test/bigfile
> 8.0G total
>
> # echo t > /proc/sysrq-trigger
> # dmesg | grep -A20 xfs_io
> xfs_io R running task 3576 29444 29362 0x00000006
> ffff8809cfbb4920 ffffffff81478d9f ffffffffa032d3c5 0000000000000246
> ffff8809cfbb4920 ffffffff814788bc 0000000000000000 ffffffff81ba3510
> ffff8809d3429a68 ffffffffa032b60f ffff8809d3429aa8 000000000000001e
> Call Trace:
> [<ffffffff81478d9f>] ? __mutex_lock_common+0x36d/0x392
> [<ffffffffa032d3c5>] ? xfs_icsb_modify_counters+0x17f/0x1ac [xfs]
> [<ffffffffa032b60f>] ? xfs_icsb_unlock_all_counters+0x4d/0x60 [xfs]
> [<ffffffffa032b8bf>] ? xfs_icsb_disable_counter+0x8c/0x95 [xfs]
> [<ffffffff81478e88>] ? mutex_lock_nested+0x3e/0x43
> [<ffffffffa032d3d3>] ? xfs_icsb_modify_counters+0x18d/0x1ac [xfs]
> [<ffffffffa032d536>] ? xfs_mod_incore_sb+0x29/0x6e [xfs]
> [<ffffffffa033052c>] ? _xfs_trans_alloc+0x27/0x61 [xfs]
> [<ffffffffa03303d3>] ? xfs_trans_reserve+0x6c/0x19e [xfs]
> [<ffffffff8106fb45>] ? up_write+0x2b/0x32
> [<ffffffffa0335e55>] ? xfs_alloc_file_space+0x163/0x306 [xfs]
> [<ffffffff8107120a>] ? sched_clock_cpu+0xc3/0xce
> [<ffffffffa0336122>] ? xfs_change_file_space+0x12a/0x2b8 [xfs]
> [<ffffffff8106f9bf>] ? down_write_nested+0x80/0x8b
> [<ffffffffa031b8ce>] ? xfs_ilock+0x30/0xb4 [xfs]
> [<ffffffffa033e0e4>] ? xfs_vn_fallocate+0x80/0xf4 [xfs]
> --
> R xfs_io 29444 86014624.786617 162 120 86014624.786617 137655.161327 408.979977 /
>
> # uname -r
> 2.6.34-0.4.rc0.git2.fc14.x86_64
>
> I'll look into it.
On my current delayed-logging branch on a 30GB filesystem:
# xfs_io -f -c "resvsp 0 16t" /mnt/scratch/bigfile
And in dmesg:
[60173.119760] Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 475
[60173.121263] ------------[ cut here ]------------
[60173.121771] kernel BUG at fs/xfs/support/debug.c:109!
[60173.121771] invalid opcode: 0000 [#1] SMP
[60173.121771] last sysfs file: /sys/devices/virtio-pci/virtio2/block/vdb/removable
[60173.121771] CPU 7
[60173.121771] Modules linked in: [last unloaded: scsi_wait_scan]
[60173.121771]
[60173.121771] Pid: 3596, comm: xfs_io Not tainted 2.6.34-rc1-dgc #138 /Bochs
[60173.121771] RIP: 0010:[<ffffffff8135db5f>] [<ffffffff8135db5f>] assfail+0x1f/0x30
[60173.121771] RSP: 0018:ffff880112043808 EFLAGS: 00010292
[60173.121771] RAX: 000000000000006d RBX: ffff880105038da0 RCX: 0000000000000000
[60173.121771] RDX: ffff880003600000 RSI: 0000000000000000 RDI: 0000000000000246
[60173.121771] RBP: ffff880112043808 R08: 0000000000000002 R09: 0000000000000000
[60173.121771] R10: ffffffff81a70bb8 R11: 0000000000000000 R12: ffffffffffe20082
[60173.121771] R13: ffff88011cea5000 R14: 0000000000000001 R15: 0000000000000000
[60173.121771] FS: 00007f0311cda6f0(0000) GS:ffff880003600000(0000) knlGS:0000000000000000
[60173.121771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60173.121771] CR2: 00007f031164d750 CR3: 000000011bc59000 CR4: 00000000000006e0
[60173.121771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[60173.121771] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[60173.121771] Process xfs_io (pid: 3596, threadinfo ffff880112042000, task ffff88011629a740)
[60173.121771] Stack:
[60173.121771] ffff880112043838 ffffffff81342415 00000000001dff7e ffff880112043928
[60173.121771] <0> 00000000001dff7e 0000000000000004 ffff880112043858 ffffffff812e82ce
[60173.121771] <0> ffff880112043928 ffff88011cea5000 ffff8801120438c8 ffffffff812e9008
[60173.121771] Call Trace:
[60173.121771] [<ffffffff81342415>] xfs_trans_mod_sb+0x2f5/0x330
[60173.121771] [<ffffffff812e82ce>] xfs_alloc_ag_vextent+0x18e/0x2b0
[60173.121771] [<ffffffff812e9008>] xfs_alloc_vextent+0x598/0x870
[60173.121771] [<ffffffff812f9c0f>] xfs_bmap_btalloc+0x29f/0x7b0
[60173.121771] [<ffffffff812f4af1>] ? xfs_bmap_search_multi_extents+0x71/0x110
[60173.121771] [<ffffffff812fa141>] xfs_bmap_alloc+0x21/0x40
[60173.121771] [<ffffffff812fd43c>] xfs_bmapi+0xf2c/0x1a90
[60173.121771] [<ffffffff81333f55>] ? xlog_grant_log_space+0x35/0x640
[60173.121771] [<ffffffff8132293b>] ? xfs_ilock+0x10b/0x190
[60173.121771] [<ffffffff81349660>] xfs_alloc_file_space+0x190/0x440
[60173.121771] [<ffffffff810b41cd>] ? trace_hardirqs_on+0xd/0x10
[60173.121771] [<ffffffff8134c7a4>] xfs_change_file_space+0x2d4/0x380
[60173.121771] [<ffffffff810a291e>] ? down_write_nested+0x9e/0xb0
[60173.121771] [<ffffffff81322918>] ? xfs_ilock+0xe8/0x190
[60173.121771] [<ffffffff81358a87>] xfs_vn_fallocate+0x87/0x110
[60173.121771] [<ffffffff8112661c>] ? __do_fault+0x12c/0x450
[60173.121771] [<ffffffff81124ebc>] ? might_fault+0x5c/0xb0
[60173.121771] [<ffffffff81126889>] ? __do_fault+0x399/0x450
[60173.121771] [<ffffffff8114cfb3>] do_fallocate+0x103/0x110
[60173.121771] [<ffffffff8115dd8c>] ioctl_preallocate+0x8c/0xb0
[60173.121771] [<ffffffff8115e1c5>] do_vfs_ioctl+0x415/0x5b0
[60173.121771] [<ffffffff810a2a43>] ? up_read+0x23/0x40
[60173.121771] [<ffffffff8115e3e1>] sys_ioctl+0x81/0xa0
[60173.121771] [<ffffffff81036032>] system_call_fastpath+0x16/0x1b
So there's been a transaction overrun, which tends to imply we're
allocating too much in a single transaction. I'd say there's an
overflow happening somewhere in this path.
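As a purely generic illustration (not the actual XFS code), this is the kind of
truncation that can make a 16T request come out as a tiny or zero block count
once it passes through a 32-bit variable:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint64_t len_bytes = (uint64_t)16 << 40;        /* 16T request */
        uint64_t blocks64 = len_bytes >> 12;            /* 4k blocks, kept in 64 bits */
        unsigned int blocks32 = len_bytes >> 12;        /* same value truncated to 32 bits */

        printf("64-bit block count: %llu\n", (unsigned long long)blocks64);
        printf("32-bit block count: %u\n", blocks32);   /* silently wraps to 0 */
        return 0;
}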
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: posix_fallocate
2010-05-07 16:53 ` posix_fallocate Eric Sandeen
2010-05-07 22:16 ` posix_fallocate Dave Chinner
@ 2010-05-10 7:11 ` Krzysztof Błaszkowski
2010-05-10 14:39 ` posix_fallocate Eric Sandeen
1 sibling, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-10 7:11 UTC (permalink / raw)
To: xfs; +Cc: Eric Sandeen
On Friday 07 May 2010 18:53, Eric Sandeen wrote:
> Eric Sandeen wrote:
> > Krzysztof Błaszkowski wrote:
> >> Hello,
> >>
> >> I use this to preallocate large space but found an issue.
> >> Posix_fallocate works right with sizes like 100G, 1T and even 10T on
> >> some boxes (on some other can fail after e.g. 7T threshold) but if i
> >> tried e.g. 16T the user space process would be "R"unning forever and it
> >> is not interruptible. Furthermore some other not related processes like
> >> sshd, bash enter D state. There is nothing in kernel log.
>
> Oh, one thing you should know is that depending on your version of glibc,
> posix_fallocate may be writing 0s and not using preallocation calls.
I am absolutely sure that a recent libc doesn't emulate this syscall.
>
> Do you know which yours is using?
syscall (libc 2.9)
> strace should tell you on a small
> file test.
>
> Anyway, I am seeing things get stuck around 8T it seems...
Yes, I noticed that sometimes the threshold point is higher.
>
> # touch /mnt/test/bigfile
> # xfs_io -c "resvsp 0 16t" /mnt/test/bigfile
>
> ... wait ... in other window ...
>
> # du -hc /mnt/test/bigfile
> 8.0G /mnt/test/bigfile
> 8.0G total
>
> # echo t > /proc/sysrq-trigger
It was a good idea to use sysrq. I didn't think of that, and instead focused on
ftrace and how to analyse those megabytes of data.
> # dmesg | grep -A20 xfs_io
> xfs_io R running task 3576 29444 29362 0x00000006
> ffff8809cfbb4920 ffffffff81478d9f ffffffffa032d3c5 0000000000000246
> ffff8809cfbb4920 ffffffff814788bc 0000000000000000 ffffffff81ba3510
> ffff8809d3429a68 ffffffffa032b60f ffff8809d3429aa8 000000000000001e
> Call Trace:
> [<ffffffff81478d9f>] ? __mutex_lock_common+0x36d/0x392
> [<ffffffffa032d3c5>] ? xfs_icsb_modify_counters+0x17f/0x1ac [xfs]
> [<ffffffffa032b60f>] ? xfs_icsb_unlock_all_counters+0x4d/0x60 [xfs]
> [<ffffffffa032b8bf>] ? xfs_icsb_disable_counter+0x8c/0x95 [xfs]
> [<ffffffff81478e88>] ? mutex_lock_nested+0x3e/0x43
> [<ffffffffa032d3d3>] ? xfs_icsb_modify_counters+0x18d/0x1ac [xfs]
> [<ffffffffa032d536>] ? xfs_mod_incore_sb+0x29/0x6e [xfs]
> [<ffffffffa033052c>] ? _xfs_trans_alloc+0x27/0x61 [xfs]
> [<ffffffffa03303d3>] ? xfs_trans_reserve+0x6c/0x19e [xfs]
> [<ffffffff8106fb45>] ? up_write+0x2b/0x32
> [<ffffffffa0335e55>] ? xfs_alloc_file_space+0x163/0x306 [xfs]
> [<ffffffff8107120a>] ? sched_clock_cpu+0xc3/0xce
> [<ffffffffa0336122>] ? xfs_change_file_space+0x12a/0x2b8 [xfs]
> [<ffffffff8106f9bf>] ? down_write_nested+0x80/0x8b
> [<ffffffffa031b8ce>] ? xfs_ilock+0x30/0xb4 [xfs]
> [<ffffffffa033e0e4>] ? xfs_vn_fallocate+0x80/0xf4 [xfs]
> --
> R xfs_io 29444 86014624.786617 162 120 86014624.786617
> 137655.161327 408.979977 /
>
> # uname -r
> 2.6.34-0.4.rc0.git2.fc14.x86_64
>
> I'll look into it.
We stick with 2.6.31.5, which seems to work well for us. We do not switch kernels
as soon as a newer revision arrives, because that does not make sense from a
stability point of view. We have seen regression bugs too many times, so if we
are confident in a revision there is no point in changing it.
Krzysztof Błaszkowski
>
> -Eric
>
* Re: posix_fallocate
2010-05-10 7:11 ` posix_fallocate Krzysztof Błaszkowski
@ 2010-05-10 14:39 ` Eric Sandeen
2010-05-10 18:17 ` posix_fallocate Krzysztof Błaszkowski
0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2010-05-10 14:39 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: xfs
Krzysztof Błaszkowski wrote:
> On Friday 07 May 2010 18:53, Eric Sandeen wrote:
>> Eric Sandeen wrote:
>>> Krzysztof Błaszkowski wrote:
>>>> Hello,
>>>>
>>>> I use this to preallocate large space but found an issue.
>>>> Posix_fallocate works right with sizes like 100G, 1T and even 10T on
>>>> some boxes (on some other can fail after e.g. 7T threshold) but if i
>>>> tried e.g. 16T the user space process would be "R"unning forever and it
>>>> is not interruptible. Furthermore some other not related processes like
>>>> sshd, bash enter D state. There is nothing in kernel log.
>> Oh, one thing you should know is that depending on your version of glibc,
>> posix_fallocate may be writing 0s and not using preallocation calls.
>
> I am absolutely sure that recent libc doesn't emulate this syscall
right, recent glibc does not (unless the underlying fs doesn't support it)
...
> We stick with 2.6.31.5 which seems to be good for us. We do not change kernels
> easily, as soon as higher revision arrives because it doesn't make sense from
> stability point of view. We have seen too many times regression bugs so if we
> are confident with some revision then there is no point to change this.
It was just a testing suggestion, but I already tested upstream and the problem
persists; now I just need to find the time to dig into it.
-Eric
* Re: posix_fallocate
2010-05-10 14:39 ` posix_fallocate Eric Sandeen
@ 2010-05-10 18:17 ` Krzysztof Błaszkowski
2010-05-10 18:45 ` posix_fallocate Eric Sandeen
0 siblings, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-10 18:17 UTC (permalink / raw)
To: xfs; +Cc: Eric Sandeen
On Monday 10 May 2010 16:39, Eric Sandeen wrote:
> Krzysztof Błaszkowski wrote:
> > On Friday 07 May 2010 18:53, Eric Sandeen wrote:
> >> Eric Sandeen wrote:
> >>> Krzysztof Błaszkowski wrote:
> >>>> Hello,
> >>>>
> >>>> I use this to preallocate large space but found an issue.
> >>>> Posix_fallocate works right with sizes like 100G, 1T and even 10T on
> >>>> some boxes (on some other can fail after e.g. 7T threshold) but if i
> >>>> tried e.g. 16T the user space process would be "R"unning forever and
> >>>> it is not interruptible. Furthermore some other not related processes
> >>>> like sshd, bash enter D state. There is nothing in kernel log.
> >>
> >> Oh, one thing you should know is that depending on your version of
> >> glibc, posix_fallocate may be writing 0s and not using preallocation
> >> calls.
> >
> > I am absolutely sure that recent libc doesn't emulate this syscall
>
> right, recent glibc does not (unless the underlying fs doesn't support it)
>
> ...
>
> > We stick with 2.6.31.5 which seems to be good for us. We do not change
> > kernels easily, as soon as higher revision arrives because it doesn't
> > make sense from stability point of view. We have seen too many times
> > regression bugs so if we are confident with some revision then there is
> > no point to change this.
>
> It was just a testing suggestion, but I already tested upstream and the
> problem persists, now just need to find the time to dig into it.
I see, and I am glad you confirmed this. Do you think that fallocate called many
times with a fixed size at increasing offsets would work better than a single
call with a huge size at offset 0?
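I mean roughly this kind of loop (the chunk size, total size and path below are
only illustrative):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Preallocate in fixed-size chunks at increasing offsets instead of
 * one huge call; stops at the first error. */
static int prealloc_chunked(int fd, off_t total, off_t chunk)
{
        for (off_t off = 0; off < total; off += chunk) {
                off_t len = (total - off < chunk) ? total - off : chunk;
                int err = posix_fallocate(fd, off, len);
                if (err) {
                        fprintf(stderr, "posix_fallocate at %lld: %s\n",
                                (long long)off, strerror(err));
                        return err;
                }
        }
        return 0;
}

int main(void)
{
        int fd = open("/mnt/xfs/bigfile", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }
        /* e.g. 16T total in 1T steps */
        int err = prealloc_chunked(fd, (off_t)16 << 40, (off_t)1 << 40);
        close(fd);
        return err != 0;
}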
Krzysztof
>
> -Eric
>
* Re: posix_fallocate
2010-05-10 18:17 ` posix_fallocate Krzysztof Błaszkowski
@ 2010-05-10 18:45 ` Eric Sandeen
2010-05-11 14:20 ` posix_fallocate Krzysztof Błaszkowski
0 siblings, 1 reply; 15+ messages in thread
From: Eric Sandeen @ 2010-05-10 18:45 UTC (permalink / raw)
To: Krzysztof Błaszkowski; +Cc: xfs
Krzysztof Błaszkowski wrote:
> On Monday 10 May 2010 16:39, Eric Sandeen wrote:
>> Krzysztof Błaszkowski wrote:
...
>>> We stick with 2.6.31.5 which seems to be good for us. We do not change
>>> kernels easily, as soon as higher revision arrives because it doesn't
>>> make sense from stability point of view. We have seen too many times
>>> regression bugs so if we are confident with some revision then there is
>>> no point to change this.
>> It was just a testing suggestion, but I already tested upstream and the
>> problem persists, now just need to find the time to dig into it.
>
> I see and I am glad you confirmed this. Do you think that fallocate called
> many times with fixed size and increasing offset will work better than one
> time call with huge size @ 0 offset ?
I'd expect that to work; it's certainly worth a test, and please send your
results back to the list ;)
thanks,
-Eric
> Krzysztof
>> -Eric
* Re: posix_fallocate
2010-05-10 18:45 ` posix_fallocate Eric Sandeen
@ 2010-05-11 14:20 ` Krzysztof Błaszkowski
2010-05-11 14:54 ` posix_fallocate Eric Sandeen
0 siblings, 1 reply; 15+ messages in thread
From: Krzysztof Błaszkowski @ 2010-05-11 14:20 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On Monday 10 May 2010 20:45, Eric Sandeen wrote:
> Krzysztof Błaszkowski wrote:
> > On Monday 10 May 2010 16:39, Eric Sandeen wrote:
> >> Krzysztof Błaszkowski wrote:
>
> ...
>
> >>> We stick with 2.6.31.5 which seems to be good for us. We do not change
> >>> kernels easily, as soon as higher revision arrives because it doesn't
> >>> make sense from stability point of view. We have seen too many times
> >>> regression bugs so if we are confident with some revision then there is
> >>> no point to change this.
> >>
> >> It was just a testing suggestion, but I already tested upstream and the
> >> problem persists, now just need to find the time to dig into it.
> >
> > I see and I am glad you confirmed this. Do you think that fallocate
> > called many times with fixed size and increasing offset will work better
> > than one time call with huge size @ 0 offset ?
>
> I'd expect that to work; it's certainly worth a test
Agreed.
> , and please send your
> results back to the list ;)
Okay, I will do this tomorrow.
But let's think about the possible results:
- The test fails: nothing to comment on.
- The test passes: this is the interesting case.
Does a passed test actually prove anything? It may, but that is not obvious.
If the fault were caused by some algorithmic mistake (some table or buffer size
derived from the input size), then the test result could be a proof.
But if it fails due to a missing spinlock/mutex somewhere, then we are talking
about a probability of failure that depends on the requested size.
The bad news is that this failure happens at various sizes depending on the
hardware configuration. On some boxes the threshold is about 7T, while other
boxes only fail after e.g. 15T.
I am not sure about any relationship between these boxes in terms of installed
memory, number of logical cores and their frequency (or current workload).
In the latter case the test proves nothing. If I run it 5 times and it passes
5 times, it would only mean that I was lucky.
As long as we don't know the exact nature of this fault, we cannot consider such
a test a reliable fix.
Krzysztof
PS: Of course I will try this tomorrow afternoon (PL time) just to satisfy my
curiosity, as all the high-capacity storage has already been shipped to customers.
>
> thanks,
> -Eric
>
> > Krzysztof
> >
> >> -Eric