* NFS-RDMA hangs: connection closed (-103)
From: Spelic @ 2010-12-01 23:13 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hello all
First of all: I have tried to send this message to the list at least 3
times, but it doesn't seem to get through (and I get no error back).
It was very long, with 2 attachments... is it because of that? What are
the limits of this ML?
This time I will shorten it a bit and remove the attachments.
Here is my problem:
I am trying to use NFS over RDMA. It doesn't work: it hangs very soon.
I tried kernel 2.6.32 from Ubuntu 10.04, and then the most recent
upstream 2.6.37-rc4 compiled from source. They behave basically the
same regarding the NFS mount itself; the only difference is that 2.6.32
hangs the complete operating system when NFS hangs, while 2.6.37-rc4
(after NFS hangs) only hangs processes which run sync or list NFS
directories. Either way the mount is hung forever; it does not recover
by itself.
IPoIB NFS mounts appear to work flawlessly; the problem is with RDMA only.
Hardware: (identical client and server machines)
07:00.0 InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx
HCA] (rev 20)
Subsystem: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]
Flags: bus master, fast devsel, latency 0, IRQ 30
Memory at d8800000 (64-bit, non-prefetchable) [size=1M]
Memory at d8000000 (64-bit, prefetchable) [size=8M]
Capabilities: [40] Power Management version 2
Capabilities: [48] Vital Product Data <?>
Capabilities: [90] Message Signalled Interrupts: Mask- 64bit+
Queue=0/5 Enable-
Capabilities: [84] MSI-X: Enable+ Mask- TabSize=32
Capabilities: [60] Express Endpoint, MSI 00
Kernel driver in use: ib_mthca
Kernel modules: ib_mthca
Mainboard = Supermicro X7DWT with embedded infiniband.
This is my test:
On the server I make a big 14GB ramdisk (exact boot option:
ramdisk_size=14680064), then format it as XFS and mount it like this:
mkfs.xfs -f -l size=128m -d agcount=16 /dev/ram0
mount -o nobarrier,inode64,logbufs=8,logbsize=256k /dev/ram0
/mnt/ram/
On the client I mount like this (fstab):
10.100.0.220:/ /mnt/nfsram nfs4
_netdev,auto,defaults,rdma,port=20049 0 0
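For completeness, the server side also needs the export and the RDMA
listener registered; roughly like this, following the standard nfs-rdma
setup procedure (the export options here are just an example, not copied
from my configuration):

```shell
# Server side (export path/options are illustrative):
#   /etc/exports:  /mnt/ram  *(rw,fsid=0,insecure,no_root_squash)
modprobe svcrdma                             # server-side NFS/RDMA transport
echo "rdma 20049" > /proc/fs/nfsd/portlist   # register the NFS/RDMA listener

# Client side, equivalent to the fstab line above:
modprobe xprtrdma
mount -t nfs4 -o rdma,port=20049 10.100.0.220:/ /mnt/nfsram
```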
Then on the client I perform
echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
syncing now ; sync ; echo finished
It hangs as soon as it reaches the end of the 14GB of space, but never
prints "syncing now". The "disk full" condition seems to trigger the
hangup reliably with NFS over RDMA over XFS over ramdisk; other
combinations (e.g. ext4) do not trigger the bug as reliably.
However, please note that this is not an XFS problem in itself: we had
another hangup on an ext4 filesystem over NFS over RDMA, on real disks
doing real work, after a few hours (and it had not hit the "disk full"
situation); the XFS-on-ramdisk technique is just more reliably
reproducible.
Note that the hangup does not happen on NFS over IPoIB (no RDMA) over
XFS over ramdisk. It's really an RDMA-only bug.
On the other machine (2.6.32) that was doing real work on real disks I
am now mounting over IPoIB without RDMA and in fact that one is still
running reliably.
The dd process hangs like this: (/proc/pid/stack)
[<ffffffff810f8f75>] sync_page+0x45/0x60
[<ffffffff810f9143>] wait_on_page_bit+0x73/0x80
[<ffffffff810f9590>] filemap_fdatawait_range+0x110/0x1a0
[<ffffffff810f9720>] filemap_write_and_wait_range+0x70/0x80
[<ffffffff811766ba>] vfs_fsync_range+0x5a/0xa0
[<ffffffff8117676c>] vfs_fsync+0x1c/0x20
[<ffffffffa02bda1d>] nfs_file_write+0xdd/0x1f0 [nfs]
[<ffffffff8114d4fa>] do_sync_write+0xda/0x120
[<ffffffff8114d808>] vfs_write+0xc8/0x190
[<ffffffff8114e061>] sys_write+0x51/0x90
[<ffffffff8100c042>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
The dd process is not killable with -9; it stays alive and hung.
In the client dmesg you see this line immediately, as soon as the
transfer stops (per iostat -n 1) and dd hangs:
[ 3072.884988] rpcrdma: connection to 10.100.0.220:20049 closed
(-103)
After a while you see this in dmesg:
[ 3242.890030] INFO: task dd:2140 blocked for more than 120 seconds.
[ 3242.890132] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 3242.890239] dd D ffff88040a8f0398 0 2140 2113
0x00000000
[ 3242.890243] ffff88040891fb38 0000000000000082 ffff88040891fa98
ffff88040891fa98
[ 3242.890248] 00000000000139c0 ffff88040a8f0000 ffff88040a8f0398
ffff88040891ffd8
[ 3242.890251] ffff88040a8f03a0 00000000000139c0 ffff88040891e010
00000000000139c0
[ 3242.890255] Call Trace:
[ 3242.890264] [<ffffffff81035509>] ? default_spin_lock_flags+0x9/0x10
[ 3242.890269] [<ffffffff810f8f30>] ? sync_page+0x0/0x60
[ 3242.890273] [<ffffffff8157b824>] io_schedule+0x44/0x60
[ 3242.890276] [<ffffffff810f8f75>] sync_page+0x45/0x60
[ 3242.890279] [<ffffffff8157c0bf>] __wait_on_bit+0x5f/0x90
[ 3242.890281] [<ffffffff810f9143>] wait_on_page_bit+0x73/0x80
[ 3242.890286] [<ffffffff81081bf0>] ? wake_bit_function+0x0/0x40
[ 3242.890290] [<ffffffff81103ce5>] ? pagevec_lookup_tag+0x25/0x40
[ 3242.890293] [<ffffffff810f9590>] filemap_fdatawait_range+0x110/0x1a0
[ 3242.890296] [<ffffffff810f9720>] filemap_write_and_wait_range+0x70/0x80
[ 3242.890301] [<ffffffff811766ba>] vfs_fsync_range+0x5a/0xa0
[ 3242.890303] [<ffffffff8117676c>] vfs_fsync+0x1c/0x20
[ 3242.890319] [<ffffffffa02bda1d>] nfs_file_write+0xdd/0x1f0 [nfs]
[ 3242.890323] [<ffffffff8114d4fa>] do_sync_write+0xda/0x120
[ 3242.890328] [<ffffffff812967c8>] ? apparmor_file_permission+0x18/0x20
[ 3242.890333] [<ffffffff81263323>] ? security_file_permission+0x23/0x90
[ 3242.890335] [<ffffffff8114d808>] vfs_write+0xc8/0x190
[ 3242.890338] [<ffffffff8114e061>] sys_write+0x51/0x90
[ 3242.890343] [<ffffffff8100c001>] ? system_call_after_swapgs+0x41/0x6c
[ 3242.890346] [<ffffffff8100c042>] system_call_fastpath+0x16/0x1b
[ 3253.280020] nfs: server 10.100.0.220 not responding, still trying
If I try an umount -f this happens:
umount2: Device or resource busy
umount.nfs4: /mnt/nfsram: device is busy
umount2: Device or resource busy
umount.nfs4: /mnt/nfsram: device is busy
It stays mounted and the processes are still hung.
However, if I repeat the umount -f, on the third attempt it does
unmount. At that point the dd process is killed. It is still impossible
to remount with RDMA, though, and lsmod shows:
xprtrdma 41048 1
i.e. the module is still in use with 1 reference.
Only a reboot of the client makes the mount possible again; the server
does not need to be rebooted.
The server-side dmesg shows this for each hang on the client side (with
about a 1 minute delay each time):
BEGIN
[ 464.780047] WARNING: at
net/sunrpc/xprtrdma/svc_rdma_transport.c:1162
__svc_rdma_free+0x20d/0x230 [svcrdma]()
[ 464.780050] Hardware name: X7DWT
[ 464.780051] Modules linked in: xfs binfmt_misc ppdev fbcon
tileblit font bitblit softcursor nfs radeon nfsd lockd svcrdma exportfs
ib_srp scsi_transport_srp fscache scsi_tgt rdma_ucm crc32c ib_ipoib
ib_iser nfs_acl auth_rpcgss rdma_cm ttm ib_cm iw_cm ib_sa ib_addr
mlx4_ib drm_kms_helper iscsi_tcp libiscsi_tcp drm libiscsi mlx4_core
scsi_transport_iscsi ib_mthca sunrpc ib_uverbs e1000e ib_umad ib_mad
ioatdma ib_core joydev i5400_edac psmouse edac_core usbhid i2c_algo_bit
lp shpchp dca hid parport i5k_amb serio_raw [last unloaded: xprtrdma]
[ 464.780094] Pid: 11, comm: kworker/0:1 Not tainted
2.6.37-rc4-stserver-mykernel3+ #1
[ 464.780096] Call Trace:
[ 464.780106] [<ffffffff8105feef>] warn_slowpath_common+0x7f/0xc0
[ 464.780110] [<ffffffffa0220ff0>] ? __svc_rdma_free+0x0/0x230
[svcrdma]
[ 464.780112] [<ffffffff8105ff4a>] warn_slowpath_null+0x1a/0x20
[ 464.780116] [<ffffffffa02211fd>] __svc_rdma_free+0x20d/0x230
[svcrdma]
[ 464.780119] [<ffffffffa0220ff0>] ? __svc_rdma_free+0x0/0x230
[svcrdma]
[ 464.780123] [<ffffffff8107aa05>] process_one_work+0x125/0x440
[ 464.780126] [<ffffffff8107d1d0>] worker_thread+0x170/0x410
[ 464.780129] [<ffffffff8107d060>] ? worker_thread+0x0/0x410
[ 464.780132] [<ffffffff81081656>] kthread+0x96/0xa0
[ 464.780137] [<ffffffff8100ce64>] kernel_thread_helper+0x4/0x10
[ 464.780139] [<ffffffff810815c0>] ? kthread+0x0/0xa0
[ 464.780142] [<ffffffff8100ce60>] ? kernel_thread_helper+0x0/0x10
[ 464.780144] ---[ end trace e31416a7a1dc2103 ]---
END
If at this point I restart nfs-kernel-server, this is the server-side
dmesg:
[ 1119.778579] nfsd: last server has exited, flushing export cache
[ 1120.930444] svc: failed to register lockdv1 RPC service (errno 97).
[ 1120.930490] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4
state recovery directory
[ 1120.930512] NFSD: starting 90-second grace period
I'm not sure if the svc line is meaningful...
I have the client and server dmesgs since boot, lsmod output, and other
details if you are interested; I removed them to see if the message
gets through to the list now...
Thanks for your help
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: NFS-RDMA hangs: connection closed (-103)
From: Tom Tucker @ 2010-12-01 23:25 UTC (permalink / raw)
To: Spelic; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Spelic,
Can you reproduce this with an NFSv3 mount?
On 12/1/10 5:13 PM, Spelic wrote:
> [full quote of the original message snipped]
* Re: NFS-RDMA hangs: connection closed (-103)
From: Tom Tucker @ 2010-12-01 23:59 UTC (permalink / raw)
To: Spelic; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Spelic,
I have seen this problem before, but have not been able to reliably
reproduce it. When I saw it, there were no transport errors, and it
appeared as if the I/O had actually completed but the waiter was not
being woken. Since I could not reproduce it reliably, I could not
determine whether the problem was a latent bug in NFS in general or in
the RDMA transport in particular.
I will try your setup here, but I don't have a system like yours, so
I'll have to settle for a smaller ramdisk. However, I have a few
questions:
- Does the FS matter? For example, does it still reproduce with
ext[2-4] on the ramdisk?
- As I mentioned earlier: NFSv3 vs. NFSv4?
- Ramdisk size, i.e. 2G vs. 14G?
Thanks,
Tom
On 12/1/10 5:13 PM, Spelic wrote:
> [full quote of the original message snipped]
* Re: NFS-RDMA hangs: connection closed (-103)
From: Spelic @ 2010-12-02 12:16 UTC (permalink / raw)
To: Tom Tucker; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
On 12/02/2010 12:59 AM, Tom Tucker wrote:
> Spelic,
>
> I have seen this problem before, but have not been able to reliably
> reproduce it. When I saw the problem, there were no transport errors
> and it appeared as if the I/O had actually completed, but that the
> waiter was not being awoken. I was not able to reliably reproduce the
> problem and was not able to determine if the problem was a latent bug
> in NFS in general or a bug in the RDMA transport in particular.
>
> I will try your setup here, but I don't have a system like yours so
> I'll have to settle for a smaller ramdisk, however, I have a few
> questions:
>
> - Does the FS matter? For example, does it still reproduce with
> ext[2-4] on the ramdisk?
> - As I mentioned earlier NFS v3 vs. NFS v4
> - RAMDISK size, i.e. 2G vs. 14G
>
> Thanks,
> Tom
Hello Tom, thanks for replying.
- The FS matters to some extent: as I wrote, with ext4 it's not
possible to reproduce the bug this way, so immediately and reliably;
however, ext4 will also hang eventually if you work on it for hours, so
I had to switch to IPoIB for our real work; see my previous post.
- NFSv3: not tried yet. I have never tried RDMA with NFSv3... do you
have a pointer to instructions?
- Ramdisk size: I am testing it.
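(For reference, an NFSv3-over-RDMA mount would presumably just be the
v4 mount line with the version pinned to 3 — an untested sketch, with
the export path assumed:)

```shell
# Untested sketch: same rdma/port options, nfs type, nfsvers=3.
mount -t nfs -o rdma,port=20049,nfsvers=3 10.100.0.220:/mnt/ram /mnt/nfsram
```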
OK, I confirm it's reproducible with a 1.5GB ramdisk.
boot option: ramdisk_size=1572864
(1.5 * 1024^2 = 1572864 KiB)
confirmed: blockdev --getsize64 /dev/ram0 == 1610612736
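The size arithmetic can be double-checked in plain shell (this only
checks the numbers; /dev/ram0 itself is not touched):

```shell
# ramdisk_size= is given in KiB.
small_kib=$((3 * 1024 * 1024 / 2))   # 1.5 GiB expressed in KiB
echo "$small_kib"                    # -> 1572864
echo "$((small_kib * 1024))"         # in bytes, as blockdev --getsize64 reports
                                     # -> 1610612736
echo "$((14 * 1024 * 1024))"         # 14 GiB in KiB, the original boot option
                                     # -> 14680064
```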
Now on the server side, mkfs and mount with defaults:
mkfs.xfs /dev/ram0
mount /dev/ram0 /mnt/ram
(This is a simplification over my previous email; it's needed with a
smaller ramdisk, or mkfs.xfs will refuse to work. The bug is still
reproducible like this.)
DOH! Another bug:
It's strange: at the end of the test,
ls -lh /mnt/ram
on the server side shows a zerofile larger than 1.5GB; sometimes it's
3GB, sometimes 2.3GB... but always larger than the ramdisk size.
# ll -h /mnt/ram
total 1.5G
drwxr-xr-x 2 root root 21 2010-12-02 12:54 ./
drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
-rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 294G 4.1G 275G 2% /
devtmpfs 7.9G 184K 7.9G 1% /dev
none 7.9G 0 7.9G 0% /dev/shm
none 7.9G 100K 7.9G 1% /var/run
none 7.9G 0 7.9G 0% /var/lock
none 7.9G 0 7.9G 0% /lib/init/rw
/dev/ram0 1.5G 1.5G 20K 100% /mnt/ram
# dd if=/mnt/ram/zerofile | wc -c
4791480+0 records in
4791480+0 records out
2453237760
2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
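One way to check whether such an oversized zerofile really occupies more space than the ramdisk, or is merely sparse, is to compare its apparent size (st_size) against its allocated size (st_blocks, counted in 512-byte units on Linux). This is a generic sketch, not a command from the thread: it demonstrates the idea on a sparse file it creates itself, since /mnt/ram/zerofile only exists in the repro setup.

```python
import os

def apparent_and_allocated(path):
    """Return (apparent_size, allocated_bytes) for a file.
    st_blocks is always counted in 512-byte units on Linux."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

# Demonstrate on a file we deliberately make sparse:
demo = "/tmp/sparse_demo"
with open(demo, "wb") as f:
    f.seek(2 * 1024 * 1024 * 1024)  # seek 2 GiB into the file...
    f.write(b"x")                   # ...and write one byte, leaving a hole

apparent, allocated = apparent_and_allocated(demo)
# On any filesystem that supports holes, allocated stays far below apparent.
print(apparent, allocated)
os.remove(demo)
```

If zerofile showed allocated bytes well under 1.5GB, the "oversize" would just be a sparse file, not data that somehow exceeded the ramdisk.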
It seems there is also an XFS bug here...
This might help trigger the hang, but please note that ext4 (with
nfs-rdma on top) also hung on us, and that was real work on HDDs which
were not full... after switching to IPoIB it didn't hang anymore.
Over IPoIB the size problem also shows up (the final file is 2.3GB
instead of < 1.5GB), but nothing hangs:
# echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
syncing now ; time sync ; echo finished
begin
dd: writing `/mnt/nfsram/zerofile': Input/output error
2497+0 records in
2496+0 records out
2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
syncing now
real 0m0.057s
user 0m0.000s
sys 0m0.000s
finished
I think I noticed the same problem with a 14GB ramdisk (the file ended
up being about 15GB), but at the time I assumed I had made a computation
mistake. With a smaller ramdisk it's more obvious.
Sooner or later someone should notify the XFS developers of this "size"
bug. For now, though, it's useful: the size bug may help us track down
the RDMA bug.
Thanks for your help
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: NFS-RDMA hangs: connection closed (-103)
[not found] ` <4CF78E0E.2040308-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
@ 2010-12-02 18:37 ` Roland Dreier
[not found] ` <ada39qgm36k.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Roland Dreier @ 2010-12-02 18:37 UTC (permalink / raw)
To: Spelic; +Cc: Tom Tucker, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dave Chinner
Adding Dave Chinner to the cc list, since he's both an XFS guru as well
as being very familiar with NFS and RDMA...
Dave, if you read below, it seems there is some strange behavior
exporting XFS with NFS/RDMA.
- R.
> On 12/02/2010 12:59 AM, Tom Tucker wrote:
> > Spelic,
> >
> > I have seen this problem before, but have not been able to reliably
> > reproduce it. When I saw the problem, there were no transport errors
> > and it appeared as if the I/O had actually completed, but that the
> > waiter was not being awoken. I was not able to reliably reproduce
> > the problem and was not able to determine if the problem was a
> > latent bug in NFS in general or a bug in the RDMA transport in
> > particular.
> >
> > I will try your setup here, but I don't have a system like yours so
> > I'll have to settle for a smaller ramdisk, however, I have a few
> > questions:
> >
> > - Does the FS matter? For example, can you use ext[2-4] on the
> > ramdisk and not still reproduce
> > - As I mentioned earlier NFS v3 vs. NFS v4
> > - RAMDISK size, i.e. 2G vs. 14G
> >
> > Thanks,
> > Tom
>
> Hello Tom, thanks for replying
>
> - The FS matters to some extent: as I wrote, with ext4 it's not
> possible to reproduce the bug in this way, so immediately and
> reliably, however ext4 also will hang eventually if you work on it for
> hours so I had to switch to IPoIB for our real work; reread my
> previous post.
>
> - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you have a
> pointer on instructions?
>
>
> - RAMDISK size: I am testing it.
>
> Ok I confirm with 1.5GB ramdisk it's reproducible.
> boot option ramdisk_size=1572864
> (1.5*1024**2=1572864.0)
> confirm: blockdev --getsize64 /dev/ram0 == 1610612736
>
> now at server side mkfs and mount with defaults:
> mkfs.xfs /dev/ram0
> mount /dev/ram0 /mnt/ram
> (this is a simplification over my previous email, and it's needed with
> a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
> reproducible like this)
>
>
> DOH! another bug:
> It's strange how at the end of the test
> ls -lh /mnt/ram
> at server side will show a zerofile larger than 1.5GB at the end of
> the procedure, sometimes it's 3GB, sometimes it's 2.3GB... but it's
> larger than the ramdisk size.
>
> # ll -h /mnt/ram
> total 1.5G
> drwxr-xr-x 2 root root 21 2010-12-02 12:54 ./
> drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
> -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
> # df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda1 294G 4.1G 275G 2% /
> devtmpfs 7.9G 184K 7.9G 1% /dev
> none 7.9G 0 7.9G 0% /dev/shm
> none 7.9G 100K 7.9G 1% /var/run
> none 7.9G 0 7.9G 0% /var/lock
> none 7.9G 0 7.9G 0% /lib/init/rw
> /dev/ram0 1.5G 1.5G 20K 100% /mnt/ram
>
> # dd if=/mnt/ram/zerofile | wc -c
> 4791480+0 records in
> 4791480+0 records out
> 2453237760
> 2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
>
> It seems there is also an XFS bug here...
>
> This might help triggering the bug however please note than ext4
> (nfs-rdma over it) also hanged on us and it was real work on HDD disks
> and they were not full... after switching to IPoIB it didn't hang
> anymore.
>
> On IPoIB the size problem also shows up: final file is 2.3GB instead
> of < 1.5GB, however nothing hangs:
>
> # echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
> syncing now ; time sync ; echo finished
> begin
> dd: writing `/mnt/nfsram/zerofile': Input/output error
> 2497+0 records in
> 2496+0 records out
> 2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
> syncing now
>
> real 0m0.057s
> user 0m0.000s
> sys 0m0.000s
> finished
>
> I think I noticed the same problem with a 14GB ramdisk, the file ended
> up to be about 15GB, but at that time I thought I made some
> computation mistakes. Now with a smaller ramdisk it's more obvious.
>
> Earlier or later someone should notify the XFS developers of the "size" bug.
> However currently it's a good thing: the size bug might help us to fix
> the RDMA bug.
>
> Thanks for your help
* Re: NFS-RDMA hangs: connection closed (-103)
[not found] ` <ada39qgm36k.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
@ 2010-12-02 19:09 ` Spelic
[not found] ` <4CF7EEE0.9030408-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Spelic @ 2010-12-02 19:09 UTC (permalink / raw)
To: Roland Dreier; +Cc: Tom Tucker, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dave Chinner
Hello all,
please be aware that the "file oversize" bug is also reproducible
without InfiniBand, with plain NFS over Ethernet on XFS over a ramdisk
(but in that case nothing hangs, so it's a different bug from the one I
posted here on the RDMA mailing list).
I have posted another thread about the "file oversize" bug, which you
can read in the LVM, XFS, and LKML mailing lists; please have a look:
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
In particular my second post there, a self-reply 30 minutes later,
explains that it's reproducible over Ethernet as well.
Thank you
On 12/02/2010 07:37 PM, Roland Dreier wrote:
> Adding Dave Chinner to the cc list, since he's both an XFS guru as well
> as being very familiar with NFS and RDMA...
>
> Dave, if you read below, it seems there is some strange behavior
> exporting XFS with NFS/RDMA.
>
> - R.
>
> > On 12/02/2010 12:59 AM, Tom Tucker wrote:
> > > Spelic,
> > >
> > > I have seen this problem before, but have not been able to reliably
> > > reproduce it. When I saw the problem, there were no transport errors
> > > and it appeared as if the I/O had actually completed, but that the
> > > waiter was not being awoken. I was not able to reliably reproduce
> > > the problem and was not able to determine if the problem was a
> > > latent bug in NFS in general or a bug in the RDMA transport in
> > > particular.
> > >
> > > I will try your setup here, but I don't have a system like yours so
> > > I'll have to settle for a smaller ramdisk, however, I have a few
> > > questions:
> > >
> > > - Does the FS matter? For example, can you use ext[2-4] on the
> > > ramdisk and not still reproduce
> > > - As I mentioned earlier NFS v3 vs. NFS v4
> > > - RAMDISK size, i.e. 2G vs. 14G
> > >
> > > Thanks,
> > > Tom
> >
> > Hello Tom, thanks for replying
> >
> > - The FS matters to some extent: as I wrote, with ext4 it's not
> > possible to reproduce the bug in this way, so immediately and
> > reliably, however ext4 also will hang eventually if you work on it for
> > hours so I had to switch to IPoIB for our real work; reread my
> > previous post.
> >
> > - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you have a
> > pointer on instructions?
> >
> >
> > - RAMDISK size: I am testing it.
> >
> > Ok I confirm with 1.5GB ramdisk it's reproducible.
> > boot option ramdisk_size=1572864
> > (1.5*1024**2=1572864.0)
> > confirm: blockdev --getsize64 /dev/ram0 == 1610612736
> >
> > now at server side mkfs and mount with defaults:
> > mkfs.xfs /dev/ram0
> > mount /dev/ram0 /mnt/ram
> > (this is a simplification over my previous email, and it's needed with
> > a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
> > reproducible like this)
> >
> >
> > DOH! another bug:
> > It's strange how at the end of the test
> > ls -lh /mnt/ram
> > at server side will show a zerofile larger than 1.5GB at the end of
> > the procedure, sometimes it's 3GB, sometimes it's 2.3GB... but it's
> > larger than the ramdisk size.
> >
> > # ll -h /mnt/ram
> > total 1.5G
> > drwxr-xr-x 2 root root 21 2010-12-02 12:54 ./
> > drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
> > -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
> > # df -h
> > Filesystem Size Used Avail Use% Mounted on
> > /dev/sda1 294G 4.1G 275G 2% /
> > devtmpfs 7.9G 184K 7.9G 1% /dev
> > none 7.9G 0 7.9G 0% /dev/shm
> > none 7.9G 100K 7.9G 1% /var/run
> > none 7.9G 0 7.9G 0% /var/lock
> > none 7.9G 0 7.9G 0% /lib/init/rw
> > /dev/ram0 1.5G 1.5G 20K 100% /mnt/ram
> >
> > # dd if=/mnt/ram/zerofile | wc -c
> > 4791480+0 records in
> > 4791480+0 records out
> > 2453237760
> > 2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
> >
> > It seems there is also an XFS bug here...
> >
> > This might help triggering the bug however please note than ext4
> > (nfs-rdma over it) also hanged on us and it was real work on HDD disks
> > and they were not full... after switching to IPoIB it didn't hang
> > anymore.
> >
> > On IPoIB the size problem also shows up: final file is 2.3GB instead
> > of< 1.5GB, however nothing hangs:
> >
> > # echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
> > syncing now ; time sync ; echo finished
> > begin
> > dd: writing `/mnt/nfsram/zerofile': Input/output error
> > 2497+0 records in
> > 2496+0 records out
> > 2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
> > syncing now
> >
> > real 0m0.057s
> > user 0m0.000s
> > sys 0m0.000s
> > finished
> >
> > I think I noticed the same problem with a 14GB ramdisk, the file ended
> > up to be about 15GB, but at that time I thought I made some
> > computation mistakes. Now with a smaller ramdisk it's more obvious.
> >
> > Earlier or later someone should notify the XFS developers of the "size" bug.
> > However currently it's a good thing: the size bug might help us to fix
> > the RDMA bug.
> >
> > Thanks for your help
>
>
* Re: NFS-RDMA hangs: connection closed (-103)
[not found] ` <4CF7EEE0.9030408-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
@ 2010-12-07 16:12 ` Tom Tucker
[not found] ` <4CFE5CF1.6020806-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Tom Tucker @ 2010-12-07 16:12 UTC (permalink / raw)
To: Spelic; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dave Chinner
Status update...
I have reproduced the bug in a number of different ways. It seems most
easily reproduced by simply writing more data than the filesystem has
space for; I can do this reliably with any FS. I suspect the XFS bug
somehow made it easier to hit.
Tom
On 12/2/10 1:09 PM, Spelic wrote:
> Hello all
> please be aware that the "file oversize" bug is reproducible also
> without infiniband, with just nfs over ethernet over xfs over ramdisk
> (but it doesn't hang, so it's a different bug than the one I posted here
> at the RDMA mailing list)
> I have posted another thread regarding the "file oversize" bug, which
> you can read in the LVM, XFS, and LKML mailing lists, please have a look
> http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
>
> Especially my second post, replying myself at +30 minutes, explains that
> it's reproducible also with ethernet.
>
> Thank you
>
> On 12/02/2010 07:37 PM, Roland Dreier wrote:
>> Adding Dave Chinner to the cc list, since he's both an XFS guru as well
>> as being very familiar with NFS and RDMA...
>>
>> Dave, if you read below, it seems there is some strange behavior
>> exporting XFS with NFS/RDMA.
>>
>> - R.
>>
>> > On 12/02/2010 12:59 AM, Tom Tucker wrote:
>> > > Spelic,
>> > >
>> > > I have seen this problem before, but have not been able to reliably
>> > > reproduce it. When I saw the problem, there were no transport errors
>> > > and it appeared as if the I/O had actually completed, but that the
>> > > waiter was not being awoken. I was not able to reliably reproduce
>> > > the problem and was not able to determine if the problem was a
>> > > latent bug in NFS in general or a bug in the RDMA transport in
>> > > particular.
>> > >
>> > > I will try your setup here, but I don't have a system like yours so
>> > > I'll have to settle for a smaller ramdisk, however, I have a few
>> > > questions:
>> > >
>> > > - Does the FS matter? For example, can you use ext[2-4] on the
>> > > ramdisk and not still reproduce
>> > > - As I mentioned earlier NFS v3 vs. NFS v4
>> > > - RAMDISK size, i.e. 2G vs. 14G
>> > >
>> > > Thanks,
>> > > Tom
>> >
>> > Hello Tom, thanks for replying
>> >
>> > - The FS matters to some extent: as I wrote, with ext4 it's not
>> > possible to reproduce the bug in this way, so immediately and
>> > reliably, however ext4 also will hang eventually if you work on it for
>> > hours so I had to switch to IPoIB for our real work; reread my
>> > previous post.
>> >
>> > - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you have a
>> > pointer on instructions?
>> >
>> >
>> > - RAMDISK size: I am testing it.
>> >
>> > Ok I confirm with 1.5GB ramdisk it's reproducible.
>> > boot option ramdisk_size=1572864
>> > (1.5*1024**2=1572864.0)
>> > confirm: blockdev --getsize64 /dev/ram0 == 1610612736
>> >
>> > now at server side mkfs and mount with defaults:
>> > mkfs.xfs /dev/ram0
>> > mount /dev/ram0 /mnt/ram
>> > (this is a simplification over my previous email, and it's needed with
>> > a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
>> > reproducible like this)
>> >
>> >
>> > DOH! another bug:
>> > It's strange how at the end of the test
>> > ls -lh /mnt/ram
>> > at server side will show a zerofile larger than 1.5GB at the end of
>> > the procedure, sometimes it's 3GB, sometimes it's 2.3GB... but it's
>> > larger than the ramdisk size.
>> >
>> > # ll -h /mnt/ram
>> > total 1.5G
>> > drwxr-xr-x 2 root root 21 2010-12-02 12:54 ./
>> > drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
>> > -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
>> > # df -h
>> > Filesystem Size Used Avail Use% Mounted on
>> > /dev/sda1 294G 4.1G 275G 2% /
>> > devtmpfs 7.9G 184K 7.9G 1% /dev
>> > none 7.9G 0 7.9G 0% /dev/shm
>> > none 7.9G 100K 7.9G 1% /var/run
>> > none 7.9G 0 7.9G 0% /var/lock
>> > none 7.9G 0 7.9G 0% /lib/init/rw
>> > /dev/ram0 1.5G 1.5G 20K 100% /mnt/ram
>> >
>> > # dd if=/mnt/ram/zerofile | wc -c
>> > 4791480+0 records in
>> > 4791480+0 records out
>> > 2453237760
>> > 2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
>> >
>> > It seems there is also an XFS bug here...
>> >
>> > This might help triggering the bug however please note than ext4
>> > (nfs-rdma over it) also hanged on us and it was real work on HDD disks
>> > and they were not full... after switching to IPoIB it didn't hang
>> > anymore.
>> >
>> > On IPoIB the size problem also shows up: final file is 2.3GB instead
>> > of< 1.5GB, however nothing hangs:
>> >
>> > # echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
>> > syncing now ; time sync ; echo finished
>> > begin
>> > dd: writing `/mnt/nfsram/zerofile': Input/output error
>> > 2497+0 records in
>> > 2496+0 records out
>> > 2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
>> > syncing now
>> >
>> > real 0m0.057s
>> > user 0m0.000s
>> > sys 0m0.000s
>> > finished
>> >
>> > I think I noticed the same problem with a 14GB ramdisk, the file ended
>> > up to be about 15GB, but at that time I thought I made some
>> > computation mistakes. Now with a smaller ramdisk it's more obvious.
>> >
>> > Earlier or later someone should notify the XFS developers of the
>> "size" bug.
>> > However currently it's a good thing: the size bug might help us to fix
>> > the RDMA bug.
>> >
>> > Thanks for your help
>>
* Re: NFS-RDMA hangs: connection closed (-103)
[not found] ` <4CFE5CF1.6020806-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
@ 2010-12-08 15:10 ` Spelic
[not found] ` <4CFF9FE4.5010705-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
0 siblings, 1 reply; 9+ messages in thread
From: Spelic @ 2010-12-08 15:10 UTC (permalink / raw)
To: Tom Tucker; +Cc: Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA, Dave Chinner
Tom, have you reproduced the "RDMA hangs - connection closed" bug, or
the "sparse file on the server side when NFS hits ENOSPC" one?
For the latter, people have already given an exhaustive explanation; see
this other thread:
http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
The former bug is still open and very interesting for us.
Thanks for your help
S.
On 12/07/2010 05:12 PM, Tom Tucker wrote:
> Status update...
>
> I have reproduced the bug a number of different ways. It seems to be
> most easily reproduced by simply writing more data than the filesystem
> has space for. I can do this reliably with any FS. I think the XFS bug
> may have tickled this bug somehow.
>
> Tom
>
> On 12/2/10 1:09 PM, Spelic wrote:
>> Hello all
>> please be aware that the "file oversize" bug is reproducible also
>> without infiniband, with just nfs over ethernet over xfs over ramdisk
>> (but it doesn't hang, so it's a different bug than the one I posted
>> here at the RDMA mailing list)
>> I have posted another thread regarding the "file oversize" bug, which
>> you can read in the LVM, XFS, and LKML mailing lists, please have a look
>> http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
>>
>> Especially my second post, replying myself at +30 minutes, explains
>> that it's reproducible also with ethernet.
>>
>> Thank you
>>
>> On 12/02/2010 07:37 PM, Roland Dreier wrote:
>>> Adding Dave Chinner to the cc list, since he's both an XFS guru as well
>>> as being very familiar with NFS and RDMA...
>>>
>>> Dave, if you read below, it seems there is some strange behavior
>>> exporting XFS with NFS/RDMA.
>>>
>>> - R.
>>>
>>> > On 12/02/2010 12:59 AM, Tom Tucker wrote:
>>> > > Spelic,
>>> > >
>>> > > I have seen this problem before, but have not been able to
>>> reliably
>>> > > reproduce it. When I saw the problem, there were no transport
>>> errors
>>> > > and it appeared as if the I/O had actually completed, but that the
>>> > > waiter was not being awoken. I was not able to reliably reproduce
>>> > > the problem and was not able to determine if the problem was a
>>> > > latent bug in NFS in general or a bug in the RDMA transport in
>>> > > particular.
>>> > >
>>> > > I will try your setup here, but I don't have a system like
>>> yours so
>>> > > I'll have to settle for a smaller ramdisk, however, I have a few
>>> > > questions:
>>> > >
>>> > > - Does the FS matter? For example, can you use ext[2-4] on the
>>> > > ramdisk and not still reproduce
>>> > > - As I mentioned earlier NFS v3 vs. NFS v4
>>> > > - RAMDISK size, i.e. 2G vs. 14G
>>> > >
>>> > > Thanks,
>>> > > Tom
>>> >
>>> > Hello Tom, thanks for replying
>>> >
>>> > - The FS matters to some extent: as I wrote, with ext4 it's not
>>> > possible to reproduce the bug in this way, so immediately and
>>> > reliably, however ext4 also will hang eventually if you work on
>>> it for
>>> > hours so I had to switch to IPoIB for our real work; reread my
>>> > previous post.
>>> >
>>> > - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you
>>> have a
>>> > pointer on instructions?
>>> >
>>> >
>>> > - RAMDISK size: I am testing it.
>>> >
>>> > Ok I confirm with 1.5GB ramdisk it's reproducible.
>>> > boot option ramdisk_size=1572864
>>> > (1.5*1024**2=1572864.0)
>>> > confirm: blockdev --getsize64 /dev/ram0 == 1610612736
>>> >
>>> > now at server side mkfs and mount with defaults:
>>> > mkfs.xfs /dev/ram0
>>> > mount /dev/ram0 /mnt/ram
>>> > (this is a simplification over my previous email, and it's needed
>>> with
>>> > a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
>>> > reproducible like this)
>>> >
>>> >
>>> > DOH! another bug:
>>> > It's strange how at the end of the test
>>> > ls -lh /mnt/ram
>>> > at server side will show a zerofile larger than 1.5GB at the end of
>>> > the procedure, sometimes it's 3GB, sometimes it's 2.3GB... but it's
>>> > larger than the ramdisk size.
>>> >
>>> > # ll -h /mnt/ram
>>> > total 1.5G
>>> > drwxr-xr-x 2 root root 21 2010-12-02 12:54 ./
>>> > drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
>>> > -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
>>> > # df -h
>>> > Filesystem Size Used Avail Use% Mounted on
>>> > /dev/sda1 294G 4.1G 275G 2% /
>>> > devtmpfs 7.9G 184K 7.9G 1% /dev
>>> > none 7.9G 0 7.9G 0% /dev/shm
>>> > none 7.9G 100K 7.9G 1% /var/run
>>> > none 7.9G 0 7.9G 0% /var/lock
>>> > none 7.9G 0 7.9G 0% /lib/init/rw
>>> > /dev/ram0 1.5G 1.5G 20K 100% /mnt/ram
>>> >
>>> > # dd if=/mnt/ram/zerofile | wc -c
>>> > 4791480+0 records in
>>> > 4791480+0 records out
>>> > 2453237760
>>> > 2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
>>> >
>>> > It seems there is also an XFS bug here...
>>> >
>>> > This might help triggering the bug however please note than ext4
>>> > (nfs-rdma over it) also hanged on us and it was real work on HDD
>>> disks
>>> > and they were not full... after switching to IPoIB it didn't hang
>>> > anymore.
>>> >
>>> > On IPoIB the size problem also shows up: final file is 2.3GB instead
>>> > of< 1.5GB, however nothing hangs:
>>> >
>>> > # echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
>>> > syncing now ; time sync ; echo finished
>>> > begin
>>> > dd: writing `/mnt/nfsram/zerofile': Input/output error
>>> > 2497+0 records in
>>> > 2496+0 records out
>>> > 2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
>>> > syncing now
>>> >
>>> > real 0m0.057s
>>> > user 0m0.000s
>>> > sys 0m0.000s
>>> > finished
>>> >
>>> > I think I noticed the same problem with a 14GB ramdisk, the file
>>> ended
>>> > up to be about 15GB, but at that time I thought I made some
>>> > computation mistakes. Now with a smaller ramdisk it's more obvious.
>>> >
>>> > Earlier or later someone should notify the XFS developers of the
>>> "size" bug.
>>> > However currently it's a good thing: the size bug might help us
>>> to fix
>>> > the RDMA bug.
>>> >
>>> > Thanks for your help
>>>
>
>
* Re: NFS-RDMA hangs: connection closed (-103)
[not found] ` <4CFF9FE4.5010705-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
@ 2010-12-09 15:25 ` Tom Tucker
0 siblings, 0 replies; 9+ messages in thread
From: Tom Tucker @ 2010-12-09 15:25 UTC (permalink / raw)
To: Spelic
Cc: Tom Tucker, Roland Dreier, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
Dave Chinner
On 12/8/10 9:10 AM, Spelic wrote:
> Tom, have you reproduced the "RDMA hangs - connection closes" bug or
> the "sparse file at server side upon NFS hitting ENOSPC" ?
>
> Because for the latter people have already given exhaustive
> explanation: see this other thread at
> http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
>
>
> While the former bug is still open and very interesting for us.
>
I'm working on the 'former' bug. I believe the bug you've run into
involves how RDMA transport errors are handled and how RPCs are retried
after an error. With hard mounts (which I suspect you are using), an RPC
is retried forever. In this bug, the transport never 'recovers' after
the error, so the RPC never succeeds and the mount is effectively hung.
Bugs were fixed in this area between .34 and the top of the tree, which
is why you now see the less catastrophic, but still broken, behavior.
Unfortunately I can only work on this part-time, but I'll keep you
updated on the progress.
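As an aside, the hard-mount retry behavior described above can be sidestepped for testing with a soft mount, which returns EIO to the application after a bounded number of retries instead of hanging. These option strings are generic NFS/RDMA examples (server name and paths are placeholders; 20049 is the conventional NFS/RDMA port), not the exact mounts used in this thread:

```shell
# Hard mount (the default): RPCs are retried forever, so a transport
# that never recovers leaves the mount hung.
#   mount -o rdma,port=20049,hard server:/mnt/ram /mnt/nfsram
#
# Soft mount: fail after 'retrans' attempts of 'timeo' deciseconds each,
# returning EIO to the caller instead of hanging.
#   mount -o rdma,port=20049,soft,timeo=50,retrans=3 server:/mnt/ram /mnt/nfsram
```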
Thanks for finding this and helping to debug,
Tom
> Thanks for your help
> S.
>
>
> On 12/07/2010 05:12 PM, Tom Tucker wrote:
>> Status update...
>>
>> I have reproduced the bug a number of different ways. It seems to be
>> most easily reproduced by simply writing more data than the
>> filesystem has space for. I can do this reliably with any FS. I think
>> the XFS bug may have tickled this bug somehow.
>>
>> Tom
>>
>> On 12/2/10 1:09 PM, Spelic wrote:
>>> Hello all
>>> please be aware that the "file oversize" bug is reproducible also
>>> without infiniband, with just nfs over ethernet over xfs over
>>> ramdisk (but it doesn't hang, so it's a different bug than the one I
>>> posted here at the RDMA mailing list)
>>> I have posted another thread regarding the "file oversize" bug,
>>> which you can read in the LVM, XFS, and LKML mailing lists, please
>>> have a look
>>> http://fossplanet.com/f13/%5Blinux-lvm%5D-bugs-mkfs-xfs-device-mapper-xfs-dev-ram-81653/
>>>
>>> Especially my second post, replying myself at +30 minutes, explains
>>> that it's reproducible also with ethernet.
>>>
>>> Thank you
>>>
>>> On 12/02/2010 07:37 PM, Roland Dreier wrote:
>>>> Adding Dave Chinner to the cc list, since he's both an XFS guru as
>>>> well
>>>> as being very familiar with NFS and RDMA...
>>>>
>>>> Dave, if you read below, it seems there is some strange behavior
>>>> exporting XFS with NFS/RDMA.
>>>>
>>>> - R.
>>>>
>>>> > On 12/02/2010 12:59 AM, Tom Tucker wrote:
>>>> > > Spelic,
>>>> > >
>>>> > > I have seen this problem before, but have not been able to
>>>> reliably
>>>> > > reproduce it. When I saw the problem, there were no transport
>>>> errors
>>>> > > and it appeared as if the I/O had actually completed, but that
>>>> the
>>>> > > waiter was not being awoken. I was not able to reliably reproduce
>>>> > > the problem and was not able to determine if the problem was a
>>>> > > latent bug in NFS in general or a bug in the RDMA transport in
>>>> > > particular.
>>>> > >
>>>> > > I will try your setup here, but I don't have a system like
>>>> yours so
>>>> > > I'll have to settle for a smaller ramdisk, however, I have a few
>>>> > > questions:
>>>> > >
>>>> > > - Does the FS matter? For example, can you use ext[2-4] on the
>>>> > > ramdisk and not still reproduce
>>>> > > - As I mentioned earlier NFS v3 vs. NFS v4
>>>> > > - RAMDISK size, i.e. 2G vs. 14G
>>>> > >
>>>> > > Thanks,
>>>> > > Tom
>>>> >
>>>> > Hello Tom, thanks for replying
>>>> >
>>>> > - The FS matters to some extent: as I wrote, with ext4 it's not
>>>> > possible to reproduce the bug in this way, so immediately and
>>>> > reliably, however ext4 also will hang eventually if you work on
>>>> it for
>>>> > hours so I had to switch to IPoIB for our real work; reread my
>>>> > previous post.
>>>> >
>>>> > - NFS3 not tried yet. Never tried to do RDMA on NFS3... do you
>>>> have a
>>>> > pointer on instructions?
>>>> >
>>>> >
>>>> > - RAMDISK size: I am testing it.
>>>> >
>>>> > Ok I confirm with 1.5GB ramdisk it's reproducible.
>>>> > boot option ramdisk_size=1572864
>>>> > (1.5*1024**2=1572864.0)
>>>> > confirm: blockdev --getsize64 /dev/ram0 == 1610612736
>>>> >
>>>> > now at server side mkfs and mount with defaults:
>>>> > mkfs.xfs /dev/ram0
>>>> > mount /dev/ram0 /mnt/ram
>>>> > (this is a simplification over my previous email, and it's
>>>> needed with
>>>> > a smaller ramdisk or mkfs.xfs will refuse to work. The bug is still
>>>> > reproducible like this)
>>>> >
>>>> >
>>>> > DOH! another bug:
>>>> > It's strange how at the end of the test
>>>> > ls -lh /mnt/ram
>>>> > at server side will show a zerofile larger than 1.5GB at the end of
>>>> > the procedure, sometimes it's 3GB, sometimes it's 2.3GB... but it's
>>>> > larger than the ramdisk size.
>>>> >
>>>> > # ll -h /mnt/ram
>>>> > total 1.5G
>>>> > drwxr-xr-x 2 root root 21 2010-12-02 12:54 ./
>>>> > drwxr-xr-x 3 root root 4.0K 2010-11-29 23:51 ../
>>>> > -rw-r--r-- 1 root root 2.3G 2010-12-02 12:59 zerofile
>>>> > # df -h
>>>> > Filesystem Size Used Avail Use% Mounted on
>>>> > /dev/sda1 294G 4.1G 275G 2% /
>>>> > devtmpfs 7.9G 184K 7.9G 1% /dev
>>>> > none 7.9G 0 7.9G 0% /dev/shm
>>>> > none 7.9G 100K 7.9G 1% /var/run
>>>> > none 7.9G 0 7.9G 0% /var/lock
>>>> > none 7.9G 0 7.9G 0% /lib/init/rw
>>>> > /dev/ram0 1.5G 1.5G 20K 100% /mnt/ram
>>>> >
>>>> > # dd if=/mnt/ram/zerofile | wc -c
>>>> > 4791480+0 records in
>>>> > 4791480+0 records out
>>>> > 2453237760
>>>> > 2453237760 bytes (2.5 GB) copied, 8.41821 s, 291 MB/s
>>>> >
>>>> > It seems there is also an XFS bug here...
>>>> >
>>>> > This might help triggering the bug however please note than ext4
>>>> > (nfs-rdma over it) also hanged on us and it was real work on HDD
>>>> disks
>>>> > and they were not full... after switching to IPoIB it didn't hang
>>>> > anymore.
>>>> >
>>>> > On IPoIB the size problem also shows up: final file is 2.3GB
>>>> instead
>>>> > of< 1.5GB, however nothing hangs:
>>>> >
>>>> > # echo begin; dd if=/dev/zero of=/mnt/nfsram/zerofile bs=1M ; echo
>>>> > syncing now ; time sync ; echo finished
>>>> > begin
>>>> > dd: writing `/mnt/nfsram/zerofile': Input/output error
>>>> > 2497+0 records in
>>>> > 2496+0 records out
>>>> > 2617245696 bytes (2.6 GB) copied, 10.4 s, 252 MB/s
>>>> > syncing now
>>>> >
>>>> > real 0m0.057s
>>>> > user 0m0.000s
>>>> > sys 0m0.000s
>>>> > finished
>>>> >
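[Editor's note: as a quick arithmetic cross-check of the dd figures quoted above (nothing new, just verifying the byte accounting):]

```shell
# dd on the client wrote 2496 full 1 MiB records before the I/O error:
echo $((2496 * 1024 * 1024))      # 2617245696 bytes, matching dd's report
# the server-side read of zerofile returned 4791480 records of 512 bytes:
echo $((4791480 * 512))           # 2453237760 bytes (~2.3 GB), well over the 1.5 GB ramdisk
```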
>>>> > I think I noticed the same problem with a 14GB ramdisk: the file
>>>> > ended up being about 15GB, but at the time I thought I had made a
>>>> > computation mistake. Now with a smaller ramdisk it's more obvious.
>>>> >
>>>> > Sooner or later someone should notify the XFS developers of the
>>>> > "size" bug. For now, though, it's a good thing: the size bug
>>>> > might help us fix the RDMA bug.
>>>> >
>>>> > Thanks for your help
>>>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
Thread overview: 9+ messages (newest: 2010-12-09 15:25 UTC)
2010-12-01 23:13 NFS-RDMA hangs: connection closed (-103) Spelic
[not found] ` <4CF6D69B.4030501-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
2010-12-01 23:25 ` Tom Tucker
2010-12-01 23:59 ` Tom Tucker
[not found] ` <4CF6E144.1080200-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-12-02 12:16 ` Spelic
[not found] ` <4CF78E0E.2040308-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
2010-12-02 18:37 ` Roland Dreier
[not found] ` <ada39qgm36k.fsf-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
2010-12-02 19:09 ` Spelic
[not found] ` <4CF7EEE0.9030408-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
2010-12-07 16:12 ` Tom Tucker
[not found] ` <4CFE5CF1.6020806-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
2010-12-08 15:10 ` Spelic
[not found] ` <4CFF9FE4.5010705-9AbUPqfR1/2XDw4h08c5KA@public.gmane.org>
2010-12-09 15:25 ` Tom Tucker