linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* NFS stalls when writing - linux 3.6.x
@ 2012-11-03 19:29 Florian Pritz
  2012-11-07 19:07 ` J. Bruce Fields
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Pritz @ 2012-11-03 19:29 UTC (permalink / raw)
  To: linux-nfs

[-- Attachment #1: Type: text/plain, Size: 3269 bytes --]

Hi,

Long text ahead.


Since I have no idea what to look at/for, I tried to summarise all more
or less relevant information. If you need any more, please tell me.

I've been trying to debug this for days now and might have mixed
something up although I double checked as much as possible while writing
this mail.


# Overview

I've been experiencing stalls when trying to write big-ish files on my
nfs mount for some time (few months) now. Rsync is also somewhat slow,
transferring only like 1 file per second even if the files are only a
few kilobytes in size. Sometimes it also stalls for a few seconds
between files. I hardly run rsync over nfs so can't tell if this might
be normal.

Sadly I don't know when this started happening.

Server and client are both running Arch Linux with linux 3.6.5 and
nfs-utils 1.2.6.

The server is running on a striped raid10 array with 4 disks using the
deadline scheduler and connected via Gbit ethernet. The CPU is an Intel
i3-530 and it has 2GB RAM. The raid10 is part of an LVM which contains
the actual XFS file system exported by nfsd.

At first I assumed a problem with file system, but I switched from ext3
to XFS and still experience the issue. Transferring large amounts
(>80GB) of data over samba + cifs didn't cause any problems so I'm
ruling out network and disks.

# Description

dd if=/dev/zero of=test bs=1M count=8000 (writing a 1GB file is also
enough, sometimes)

Watch the network traffic (with "vnstat -l" or conky) and wait until it
drops from 110MB/s to 0-5MB/s (you might need to run dd multiple times,
wait a few minutes/hours or reboot the server)

top on the server now shows lots of nfsd threads in D state. iostat only
shows the 0-5MB/s of network traffic going to the disk.

A local dd job on the server manages to write 160MB/s while nfsd
continues to hang. Reading from the nfs share while nfsd is hanging is
possible, but has a delay of up to ~20-30 seconds.

After some time the client displays "nfs: server levant not responding,
still trying" in dmesg followed by a "nfs: server levant OK" 0 or more
seconds later (yes, zero). Both messages sometimes appear more than once
at the same time.

Apart from those messages dmesg is clean on either system even after
waiting for a few minutes.

# Environment

## Mount options (from /proc/mounts)

rw,nosuid,nodev,noexec,relatime,vers=4.0,rsize=65536,wsize=65536,
namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,
clientaddr=192.168.4.247,local_lock=none,addr=192.168.4.103,user

## /etc/exportsfs -v

/mnt/data/nfs
192.168.4.1/24(rw,wdelay,crossmnt,root_squash,all_squash,no_subtree_check,anonuid=999,anongid=999)

## Programm versions

Those are all the same on both client and server.

acl 2.2.51-2
libgssglue 0.4-1
libevent 2.0.20-1
librpcsecgss 0.19-7
nfs-utils 1.2.6-2
util-linux 2.22.1-2

# Other notes

I tried reproducing the issue with a virtual machine and it somehow
worked, but I'm not really sure if I actually hit the same issue because
the vm sometimes locks up too.

The VM was set up in qemu with one virtio disk which was directly
partioned without the use of mdadm or lvm.


Thank you for reading.

-- 
Florian Pritz


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NFS stalls when writing - linux 3.6.x
  2012-11-03 19:29 NFS stalls when writing - linux 3.6.x Florian Pritz
@ 2012-11-07 19:07 ` J. Bruce Fields
       [not found]   ` <509D2993.4050604@xinu.at>
  0 siblings, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2012-11-07 19:07 UTC (permalink / raw)
  To: Florian Pritz; +Cc: linux-nfs

On Sat, Nov 03, 2012 at 08:29:11PM +0100, Florian Pritz wrote:
> Hi,
> 
> Long text ahead.
> 
> 
> Since I have no idea what to look at/for, I tried to summarise all more
> or less relevant information. If you need any more, please tell me.
> 
> I've been trying to debug this for days now and might have mixed
> something up although I double checked as much as possible while writing
> this mail.
> 
> 
> # Overview
> 
> I've been experiencing stalls when trying to write big-ish files on my
> nfs mount for some time (few months) now. Rsync is also somewhat slow,
> transferring only like 1 file per second even if the files are only a
> few kilobytes in size. Sometimes it also stalls for a few seconds
> between files. I hardly run rsync over nfs so can't tell if this might
> be normal.
> 
> Sadly I don't know when this started happening.

It would be helpful to know that--especially if you find an easy way to
reproduce this, it would be worth booting to older kernels and seeing if
you can figure when the problem started.

> Server and client are both running Arch Linux with linux 3.6.5 and
> nfs-utils 1.2.6.
> 
> The server is running on a striped raid10 array with 4 disks using the
> deadline scheduler and connected via Gbit ethernet. The CPU is an Intel
> i3-530 and it has 2GB RAM. The raid10 is part of an LVM which contains
> the actual XFS file system exported by nfsd.
> 
> At first I assumed a problem with file system, but I switched from ext3
> to XFS and still experience the issue. Transferring large amounts
> (>80GB) of data over samba + cifs didn't cause any problems so I'm
> ruling out network and disks.
> 
> # Description
> 
> dd if=/dev/zero of=test bs=1M count=8000 (writing a 1GB file is also
> enough, sometimes)
> 
> Watch the network traffic (with "vnstat -l" or conky) and wait until it
> drops from 110MB/s to 0-5MB/s (you might need to run dd multiple times,
> wait a few minutes/hours or reboot the server)
> 
> top on the server now shows lots of nfsd threads in D state.

Next time you find in that state, could you try

	echo t >/proc/sysrq-trigger

on the server?  That will dump a bunch of data to the logs which we
might be able to use.

--b.

> iostat only
> shows the 0-5MB/s of network traffic going to the disk.
> 
> A local dd job on the server manages to write 160MB/s while nfsd
> continues to hang. Reading from the nfs share while nfsd is hanging is
> possible, but has a delay of up to ~20-30 seconds.
> 
> After some time the client displays "nfs: server levant not responding,
> still trying" in dmesg followed by a "nfs: server levant OK" 0 or more
> seconds later (yes, zero). Both messages sometimes appear more than once
> at the same time.
> 
> Apart from those messages dmesg is clean on either system even after
> waiting for a few minutes.
> 
> # Environment
> 
> ## Mount options (from /proc/mounts)
> 
> rw,nosuid,nodev,noexec,relatime,vers=4.0,rsize=65536,wsize=65536,
> namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,
> clientaddr=192.168.4.247,local_lock=none,addr=192.168.4.103,user
> 
> ## /etc/exportsfs -v
> 
> /mnt/data/nfs
> 192.168.4.1/24(rw,wdelay,crossmnt,root_squash,all_squash,no_subtree_check,anonuid=999,anongid=999)
> 
> ## Programm versions
> 
> Those are all the same on both client and server.
> 
> acl 2.2.51-2
> libgssglue 0.4-1
> libevent 2.0.20-1
> librpcsecgss 0.19-7
> nfs-utils 1.2.6-2
> util-linux 2.22.1-2
> 
> # Other notes
> 
> I tried reproducing the issue with a virtual machine and it somehow
> worked, but I'm not really sure if I actually hit the same issue because
> the vm sometimes locks up too.
> 
> The VM was set up in qemu with one virtio disk which was directly
> partioned without the use of mdadm or lvm.
> 
> 
> Thank you for reading.
> 
> -- 
> Florian Pritz
> 



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NFS stalls when writing - linux 3.6.x
       [not found]   ` <509D2993.4050604@xinu.at>
@ 2012-11-09 16:36     ` J. Bruce Fields
  2012-11-09 17:20       ` Ben Myers
  0 siblings, 1 reply; 6+ messages in thread
From: J. Bruce Fields @ 2012-11-09 16:36 UTC (permalink / raw)
  To: Florian Pritz; +Cc: linux-nfs, xfs, Ben Myers, Alex Elder

On Fri, Nov 09, 2012 at 05:04:35PM +0100, Florian Pritz wrote:
> On 07.11.2012 20:07, J. Bruce Fields wrote:
> >> Sadly I don't know when this started happening.
> > 
> > It would be helpful to know that--especially if you find an easy way to
> > reproduce this, it would be worth booting to older kernels and seeing if
> > you can figure when the problem started.
> 
> I'll try that unless the sysrq output helps.
> 
> >> top on the server now shows lots of nfsd threads in D state.
> > 
> > Next time you find in that state, could you try
> > 
> > 	echo t >/proc/sysrq-trigger
> > 
> > on the server?  That will dump a bunch of data to the logs which we
> > might be able to use.
> 
> Data should be attached if the ML allows it. Hope it helps. sysrq output
> is after line 883.

The nfsd threads are all stuck in _vfs_lock_force_lsn in xfs_file_fsync;
for example:

> [99390.866452] nfsd            D ffff880075723060     0  1342      2 0x00000000
> [99390.866455]  ffff88007988d8b0 0000000000000046 ffff880075723060 ffff88007988dfd8
> [99390.866460]  ffff88007988dfd8 ffff88007988dfd8 ffff88007bbcd0a0 ffff880075723060
> [99390.866464]  ffff88007553b998 ffff88007553b980 ffff88005cd2d0c0 0000000000000001
> [99390.866468] Call Trace:
> [99390.866484]  [<ffffffffa0af521e>] ? xlog_cil_force_lsn+0xce/0x160 [xfs]
> [99390.866488]  [<ffffffff81491b79>] schedule+0x29/0x70
> [99390.866504]  [<ffffffffa0af3a17>] _xfs_log_force_lsn+0x287/0x2d0 [xfs]
> [99390.866508]  [<ffffffff8108bbd0>] ? try_to_wake_up+0x2f0/0x2f0
> [99390.866519]  [<ffffffffa0a9ecff>] xfs_file_fsync+0x1bf/0x230 [xfs]
> [99390.866524]  [<ffffffff811ae21d>] generic_write_sync+0x4d/0x60
> [99390.866536]  [<ffffffffa0a9fbcf>] xfs_file_aio_write+0x12f/0x160 [xfs]
> [99390.866547]  [<ffffffffa0a9faa0>] ? xfs_file_buffered_aio_write+0x1f0/0x1f0 [xfs]
> [99390.866552]  [<ffffffff8117fb73>] do_sync_readv_writev+0xa3/0xe0
> [99390.866557]  [<ffffffff8117fe54>] do_readv_writev+0xd4/0x1e0
> [99390.866565]  [<ffffffffa0ba3170>] ? _fh_update.isra.9.part.10+0x60/0x60 [nfsd]
> [99390.866572]  [<ffffffffa0ba3170>] ? _fh_update.isra.9.part.10+0x60/0x60 [nfsd]
> [99390.866576]  [<ffffffff811f64cf>] ? exportfs_decode_fh+0xaf/0x310
> [99390.866581]  [<ffffffff8117ff95>] vfs_writev+0x35/0x60
> [99390.866588]  [<ffffffffa0ba44fb>] nfsd_vfs_write.isra.11+0xeb/0x300 [nfsd]
> [99390.866598]  [<ffffffffa0bbb262>] ? find_confirmed_client+0xb2/0x100 [nfsd]
> [99390.866607]  [<ffffffffa0bbc1ba>] ? nfsd4_lookup_stateid+0xea/0x120 [nfsd]
> [99390.866615]  [<ffffffffa0ba6b22>] nfsd_write+0xa2/0x110 [nfsd]
> [99390.866623]  [<ffffffffa0bb1463>] nfsd4_write+0xe3/0x110 [nfsd]
> [99390.866632]  [<ffffffffa0bb20bc>] nfsd4_proc_compound+0x2fc/0x650 [nfsd]
> [99390.866639]  [<ffffffffa0ba0b3e>] nfsd_dispatch+0xbe/0x1c0 [nfsd]
> [99390.866651]  [<ffffffffa027020e>] svc_process+0x48e/0x790 [sunrpc]
> [99390.866658]  [<ffffffffa0ba0115>] nfsd+0xb5/0x1a0 [nfsd]
> [99390.866664]  [<ffffffffa0ba0060>] ? nfsd_get_default_max_blksize+0x60/0x60 [nfsd]
> [99390.866668]  [<ffffffff81079a03>] kthread+0x93/0xa0
> [99390.866673]  [<ffffffff8149b144>] kernel_thread_helper+0x4/0x10
> [99390.866678]  [<ffffffff81079970>] ? kthread_freezable_should_stop+0x70/0x70
> [99390.866682]  [<ffffffff8149b140>] ? gs_change+0x13/0x13

Maybe xfs folks would have an idea?  (More context:

	http://thread.gmane.org/gmane.linux.nfs/53123/focus=53180
)

--b.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NFS stalls when writing - linux 3.6.x
  2012-11-09 16:36     ` J. Bruce Fields
@ 2012-11-09 17:20       ` Ben Myers
  2012-11-09 17:25         ` Florian Pritz
  0 siblings, 1 reply; 6+ messages in thread
From: Ben Myers @ 2012-11-09 17:20 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Florian Pritz, linux-nfs, Alex Elder, xfs

Hey Bruce & Florian,

On Fri, Nov 09, 2012 at 11:36:58AM -0500, J. Bruce Fields wrote:
> On Fri, Nov 09, 2012 at 05:04:35PM +0100, Florian Pritz wrote:
> > On 07.11.2012 20:07, J. Bruce Fields wrote:
> > >> Sadly I don't know when this started happening.
> > > 
> > > It would be helpful to know that--especially if you find an easy way to
> > > reproduce this, it would be worth booting to older kernels and seeing if
> > > you can figure when the problem started.
> > 
> > I'll try that unless the sysrq output helps.
> > 
> > >> top on the server now shows lots of nfsd threads in D state.
> > > 
> > > Next time you find in that state, could you try
> > > 
> > > 	echo t >/proc/sysrq-trigger
> > > 
> > > on the server?  That will dump a bunch of data to the logs which we
> > > might be able to use.
> > 
> > Data should be attached if the ML allows it. Hope it helps. sysrq output
> > is after line 883.
> 
> The nfsd threads are all stuck in _vfs_lock_force_lsn in xfs_file_fsync;
> for example:
> 
> > [99390.866452] nfsd            D ffff880075723060     0  1342      2 0x00000000
> > [99390.866455]  ffff88007988d8b0 0000000000000046 ffff880075723060 ffff88007988dfd8
> > [99390.866460]  ffff88007988dfd8 ffff88007988dfd8 ffff88007bbcd0a0 ffff880075723060
> > [99390.866464]  ffff88007553b998 ffff88007553b980 ffff88005cd2d0c0 0000000000000001
> > [99390.866468] Call Trace:
> > [99390.866484]  [<ffffffffa0af521e>] ? xlog_cil_force_lsn+0xce/0x160 [xfs]
> > [99390.866488]  [<ffffffff81491b79>] schedule+0x29/0x70
> > [99390.866504]  [<ffffffffa0af3a17>] _xfs_log_force_lsn+0x287/0x2d0 [xfs]
> > [99390.866508]  [<ffffffff8108bbd0>] ? try_to_wake_up+0x2f0/0x2f0
> > [99390.866519]  [<ffffffffa0a9ecff>] xfs_file_fsync+0x1bf/0x230 [xfs]
> > [99390.866524]  [<ffffffff811ae21d>] generic_write_sync+0x4d/0x60
> > [99390.866536]  [<ffffffffa0a9fbcf>] xfs_file_aio_write+0x12f/0x160 [xfs]
> > [99390.866547]  [<ffffffffa0a9faa0>] ? xfs_file_buffered_aio_write+0x1f0/0x1f0 [xfs]
> > [99390.866552]  [<ffffffff8117fb73>] do_sync_readv_writev+0xa3/0xe0
> > [99390.866557]  [<ffffffff8117fe54>] do_readv_writev+0xd4/0x1e0
> > [99390.866565]  [<ffffffffa0ba3170>] ? _fh_update.isra.9.part.10+0x60/0x60 [nfsd]
> > [99390.866572]  [<ffffffffa0ba3170>] ? _fh_update.isra.9.part.10+0x60/0x60 [nfsd]
> > [99390.866576]  [<ffffffff811f64cf>] ? exportfs_decode_fh+0xaf/0x310
> > [99390.866581]  [<ffffffff8117ff95>] vfs_writev+0x35/0x60
> > [99390.866588]  [<ffffffffa0ba44fb>] nfsd_vfs_write.isra.11+0xeb/0x300 [nfsd]
> > [99390.866598]  [<ffffffffa0bbb262>] ? find_confirmed_client+0xb2/0x100 [nfsd]
> > [99390.866607]  [<ffffffffa0bbc1ba>] ? nfsd4_lookup_stateid+0xea/0x120 [nfsd]
> > [99390.866615]  [<ffffffffa0ba6b22>] nfsd_write+0xa2/0x110 [nfsd]
> > [99390.866623]  [<ffffffffa0bb1463>] nfsd4_write+0xe3/0x110 [nfsd]
> > [99390.866632]  [<ffffffffa0bb20bc>] nfsd4_proc_compound+0x2fc/0x650 [nfsd]
> > [99390.866639]  [<ffffffffa0ba0b3e>] nfsd_dispatch+0xbe/0x1c0 [nfsd]
> > [99390.866651]  [<ffffffffa027020e>] svc_process+0x48e/0x790 [sunrpc]
> > [99390.866658]  [<ffffffffa0ba0115>] nfsd+0xb5/0x1a0 [nfsd]
> > [99390.866664]  [<ffffffffa0ba0060>] ? nfsd_get_default_max_blksize+0x60/0x60 [nfsd]
> > [99390.866668]  [<ffffffff81079a03>] kthread+0x93/0xa0
> > [99390.866673]  [<ffffffff8149b144>] kernel_thread_helper+0x4/0x10
> > [99390.866678]  [<ffffffff81079970>] ? kthread_freezable_should_stop+0x70/0x70
> > [99390.866682]  [<ffffffff8149b140>] ? gs_change+0x13/0x13
> 
> Maybe xfs folks would have an idea?  (More context:
> 
> 	http://thread.gmane.org/gmane.linux.nfs/53123/focus=53180
> )

I took a brief look but there were no slam dunks.  This seems like it might be
a candidate:

8375f922 - xfs: re-enable xfsaild idle mode and fix associated races

The sysrq-trigger information would definately be helpful.  Meanwhile I'll see
if I can reproduce the issue.

Thanks,
	Ben

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NFS stalls when writing - linux 3.6.x
  2012-11-09 17:20       ` Ben Myers
@ 2012-11-09 17:25         ` Florian Pritz
  2012-11-09 17:53           ` Mark Tinguely
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Pritz @ 2012-11-09 17:25 UTC (permalink / raw)
  To: Ben Myers; +Cc: J. Bruce Fields, linux-nfs, Alex Elder, xfs

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

On 09.11.2012 18:20, Ben Myers wrote:
> I took a brief look but there were no slam dunks.  This seems like it might be
> a candidate:
> 
> 8375f922 - xfs: re-enable xfsaild idle mode and fix associated races
> 
> The sysrq-trigger information would definately be helpful.  Meanwhile I'll see
> if I can reproduce the issue.

No idea why it doesn't show up on gmane yet (the excerpt is from my mail
to the list), but I've uploaded it here[1]. The link will be valid for
10 days.

[1]: https://paste.xinu.at/koXRw/

-- 
Florian Pritz


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: NFS stalls when writing - linux 3.6.x
  2012-11-09 17:25         ` Florian Pritz
@ 2012-11-09 17:53           ` Mark Tinguely
  0 siblings, 0 replies; 6+ messages in thread
From: Mark Tinguely @ 2012-11-09 17:53 UTC (permalink / raw)
  To: Florian Pritz; +Cc: Ben Myers, J. Bruce Fields, linux-nfs, Alex Elder, xfs

On 11/09/12 11:25, Florian Pritz wrote:
> On 09.11.2012 18:20, Ben Myers wrote:
>> I took a brief look but there were no slam dunks.  This seems like it might be
>> a candidate:
>>
>> 8375f922 - xfs: re-enable xfsaild idle mode and fix associated races
>>
>> The sysrq-trigger information would definately be helpful.  Meanwhile I'll see
>> if I can reproduce the issue.
>
> No idea why it doesn't show up on gmane yet (the excerpt is from my mail
> to the list), but I've uploaded it here[1]. The link will be valid for
> 10 days.
>
> [1]: https://paste.xinu.at/koXRw/
>
>
>
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

Thank-you for the task list. Could we get a dump of the hang?

  # echo c >/proc/sysrq-trigger

It requires kdump configured.

--Mark.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-11-09 17:53 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-03 19:29 NFS stalls when writing - linux 3.6.x Florian Pritz
2012-11-07 19:07 ` J. Bruce Fields
     [not found]   ` <509D2993.4050604@xinu.at>
2012-11-09 16:36     ` J. Bruce Fields
2012-11-09 17:20       ` Ben Myers
2012-11-09 17:25         ` Florian Pritz
2012-11-09 17:53           ` Mark Tinguely

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).