From: bfields@fieldses.org (J. Bruce Fields)
To: Peter Thurner <p.thurner@blunix.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS Kernel Panics
Date: Mon, 30 Nov 2015 18:07:05 -0500
Message-ID: <20151130230705.GD31564@fieldses.org>
In-Reply-To: <565C7747.1080703@blunix.org>

On Mon, Nov 30, 2015 at 05:20:23PM +0100, Peter Thurner wrote:
> Hi guys,
>
> I'm running the following Setup on Ubuntu 14.04 for both Server and Clients:
I don't know what kernel version that translates to.
Ideally this would either get reported to Ubuntu, or reproduced with an
upstream kernel before getting reported here.
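For anyone following up on the thread, a minimal sketch of how the running kernel version could be collected for such a report. `platform.release()` returns the same string as `uname -r`. The helper names are illustrative, and reading `/proc/version_signature` assumes an Ubuntu-built kernel; on other kernels the file is normally absent and the helper returns `None`.

```python
import platform
from typing import Optional


def kernel_release() -> str:
    """Return the running kernel's release string (same value as `uname -r`)."""
    return platform.release()


def ubuntu_version_signature(path: str = "/proc/version_signature") -> Optional[str]:
    """Return the contents of the version signature file, or None if unreadable.

    Assumption: on Ubuntu kernels this file names the distro package version
    together with the upstream version it is based on.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read().strip()
    except OSError:
        return None


if __name__ == "__main__":
    print("kernel release:", kernel_release())
    print("version signature:", ubuntu_version_signature())
```

Running this on both the server and one client, and including the output in the report, would let readers map the distribution release to a kernel version.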
>
>
> == NFS Server with /etc/exports:
>
> /var/www/ 172.16.1.254(rw,no_root_squash,sync,no_subtree_check)
> 172.16.1.184(rw,no_root_squash,sync,no_subtree_check)
> 172.16.0.120(rw,no_root_squash,sync,no_subtree_check)
> 172.16.0.193(rw,no_root_squash,sync,no_subtree_check)
>
> Version: 1:1.2.8-6ubuntu1.2
>
>
> == Four NFS Clients with fstab:
>
> alpha:/var/www /var/www nfs4
> nosharecache,fsc=example_web,noatime,tcp,bg,nosuid,rsize=32768,wsize=32768,soft,proto=tcp
> 0 0
>
> On the Clients i'm using cachefilesd:
>
> /var/cache/cachefilesd/loopimage.img
> /var/cache/cachefilesd/srv ext4
> loop,rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered 0
> 0
>
> root@web1:~# cat /etc/cachefilesd.conf
> dir /var/cache/cachefilesd/srv
> tag nfs_filesystem_cache
> brun 20%
> frun 10%
> bcull 10%
> fcull 7%
> bstop 5%
> fstop 3%
>
>
> == Problem
>
> Both server and clients experience random kernel Panics. Of the five
> machines, around one dies per die.
per day?
> They all run on Amazon AWS as
> m4.large instances. When I set
>
> rpcdebug -m nfsd -s all
> rpcdebug -m rpc -s all
>
> The messages before the crash (this time on the NFS server) are:
>
> ```
> Nov 30 13:49:54 nfs-master kernel: [38232.649545] nfsd_dispatch: vers 4
> proc 1
> Nov 30 13:49:54 nfs-master kernel: [38232.649547] nfsv4 compound op
> #1/3: 22 (OP_PUTFH)
> Nov 30 13:49:54 nfs-master kernel: [38232.649548] nfsd: fh_verify(32:
> 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
> Nov 30 13:49:54 nfs-master kernel: [38232.649552] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #1: 22: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649553] nfsv4 compound op
> #2/3: 4 (OP_CLOSE)
> Nov 30 13:49:54 nfs-master kernel: [38232.649554] NFSD: nfsd4_close on
> file objectLinksShadow.png
> Nov 30 13:49:54 nfs-master kernel: [38232.649556] NFSD:
> nfs4_preprocess_seqid_op: seqid=818421 stateid =
> (565bb0a0/00000001/00083f05/00000001)
> Nov 30 13:49:54 nfs-master kernel: [38232.649557] renewing client
> (clientid 565bb0a0/00000001)
> Nov 30 13:49:54 nfs-master kernel: [38232.649558] NFSD:
> move_to_close_lru nfs4_openowner ffff8800373b8000
> Nov 30 13:49:54 nfs-master kernel: [38232.649559] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #2: 4: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649560] nfsv4 compound op
> #3/3: 9 (OP_GETATTR)
> Nov 30 13:49:54 nfs-master kernel: [38232.649562] nfsd: fh_verify(32:
> 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
> Nov 30 13:49:54 nfs-master kernel: [38232.649564] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #3: 9: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649565] nfsv4 compound returned 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649570] svc: socket
> ffff8800e929d000 sendto([ffff8801e07ae000 136... ], 136) = 136 (addr
> 172.16.0.120, port=958)
> Nov 30 13:49:54 nfs-master kernel: [38232.649571] svc: server
> ffff880202142000 waiting for data (to = 900000)
This all looks pretty normal to me.
> Nov 30 13:49:54 nfs-master rsyslogd: [origin software="rsyslogd"
> swVersion="7.4.4" x-pid="939" x-info="http://www.rsyslog.com"] exiting
> on signal 15.
That's SIGTERM. No idea if that means anything.
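A small sketch of how a signal number from a log line can be mapped to its name, using Python's standard `signal` module. The helper name `signal_name` is illustrative, and the mapping shown is the one for Linux.

```python
import signal


def signal_name(number: int) -> str:
    """Translate a numeric signal into its symbolic name on this platform."""
    return signal.Signals(number).name


if __name__ == "__main__":
    # 15 is the number reported in the rsyslogd "exiting on signal" line.
    print(signal_name(15))
```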
Sorry, I don't see anything much to go on here. Is there a console that
might have anything more? I'm not very familiar with AWS.

--b.
> Server is rebooting here
>
> Nov 30 13:50:34 nfs-master rsyslogd: [origin software="rsyslogd"
> swVersion="7.4.4" x-pid="951" x-info="http://www.rsyslog.com"] start
> Nov 30 13:50:34 nfs-master rsyslogd-2307: warning: ~ action is
> deprecated, consider using the 'stop' statement instead [try
> http://www.rsyslog.com/e/2307 ]
> Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's groupid changed to 104
> Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's userid changed to 101
> Nov 30 13:50:34 nfs-master kernel: [ 0.000000] Initializing cgroup
> subsys cpuset
> Nov 30 13:50:34 nfs-master kernel: [ 0.000000] Initializing cgroup
> subsys cpu
> Nov 30 13:50:34 nfs-master kernel: [ 0.000000] Initializing cgroup
> subsys cpuacct
>
> ```
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html