From: bfields@fieldses.org (J. Bruce Fields)
To: Peter Thurner <p.thurner@blunix.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFS Kernel Panics
Date: Mon, 30 Nov 2015 18:07:05 -0500
Message-ID: <20151130230705.GD31564@fieldses.org>
In-Reply-To: <565C7747.1080703@blunix.org>

On Mon, Nov 30, 2015 at 05:20:23PM +0100, Peter Thurner wrote:
> Hi guys,
> 
> I'm running the following Setup on Ubuntu 14.04 for both Server and Clients:

I don't know what kernel version that translates to.

Ideally this would either get reported to Ubuntu, or reproduced with an
upstream kernel before getting reported here.

> 
> 
> == NFS Server with /etc/exports:
> 
> /var/www/ 172.16.1.254(rw,no_root_squash,sync,no_subtree_check)
> 172.16.1.184(rw,no_root_squash,sync,no_subtree_check)
> 172.16.0.120(rw,no_root_squash,sync,no_subtree_check)
> 172.16.0.193(rw,no_root_squash,sync,no_subtree_check)
> 
> Version: 1:1.2.8-6ubuntu1.2
> 
> 
> == Four NFS Clients with fstab:
> 
> alpha:/var/www        /var/www    nfs4   
> nosharecache,fsc=example_web,noatime,tcp,bg,nosuid,rsize=32768,wsize=32768,soft,proto=tcp   
> 0 0
> 
> On the Clients i'm using cachefilesd:
> 
> /var/cache/cachefilesd/loopimage.img       
> /var/cache/cachefilesd/srv    ext4   
> loop,rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered    0
> 0
> 
> root@web1:~# cat /etc/cachefilesd.conf
> dir /var/cache/cachefilesd/srv
> tag nfs_filesystem_cache
> brun 20%
> frun 10%
> bcull 10%
> fcull 7%
> bstop 5%
> fstop 3%
> 
> 
> == Problem
> 
> Both server and clients experience random kernel Panics. Of the five
> machines, around one dies per die.

per day?

> They all run on Amazon AWS as
> m4.large instances. When I set
> 
> rpcdebug -m nfsd -s all
> rpcdebug -m rpc -s all
> 
> The messages before the crash (this time on the NFS server) are:
> 
> ```
> Nov 30 13:49:54 nfs-master kernel: [38232.649545] nfsd_dispatch: vers 4
> proc 1
> Nov 30 13:49:54 nfs-master kernel: [38232.649547] nfsv4 compound op
> #1/3: 22 (OP_PUTFH)
> Nov 30 13:49:54 nfs-master kernel: [38232.649548] nfsd: fh_verify(32:
> 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
> Nov 30 13:49:54 nfs-master kernel: [38232.649552] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #1: 22: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649553] nfsv4 compound op
> #2/3: 4 (OP_CLOSE)
> Nov 30 13:49:54 nfs-master kernel: [38232.649554] NFSD: nfsd4_close on
> file objectLinksShadow.png
> Nov 30 13:49:54 nfs-master kernel: [38232.649556] NFSD:
> nfs4_preprocess_seqid_op: seqid=818421 stateid =
> (565bb0a0/00000001/00083f05/00000001)
> Nov 30 13:49:54 nfs-master kernel: [38232.649557] renewing client
> (clientid 565bb0a0/00000001)
> Nov 30 13:49:54 nfs-master kernel: [38232.649558] NFSD:
> move_to_close_lru nfs4_openowner ffff8800373b8000
> Nov 30 13:49:54 nfs-master kernel: [38232.649559] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #2: 4: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649560] nfsv4 compound op
> #3/3: 9 (OP_GETATTR)
> Nov 30 13:49:54 nfs-master kernel: [38232.649562] nfsd: fh_verify(32:
> 81060001 0c7791ab ab46dd87 663ae28a 6877949f 2802898e)
> Nov 30 13:49:54 nfs-master kernel: [38232.649564] nfsv4 compound op
> ffff8802026c8080 opcnt 3 #3: 9: status 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649565] nfsv4 compound returned 0
> Nov 30 13:49:54 nfs-master kernel: [38232.649570] svc: socket
> ffff8800e929d000 sendto([ffff8801e07ae000 136... ], 136) = 136 (addr
> 172.16.0.120, port=958)
> Nov 30 13:49:54 nfs-master kernel: [38232.649571] svc: server
> ffff880202142000 waiting for data (to = 900000)

This all looks pretty normal to me.
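
For a longer log, the same check (every op in each compound returning
status 0) can be scripted. This is only a rough sketch: it assumes the
quote markers and syslog prefixes have been stripped or are harmless, the
regex is fitted to the lines quoted above rather than to any documented
format, and failed_ops() and OP_NAMES are names made up for illustration.

```python
import re

# Matches the per-op result lines seen in the quoted log, e.g.
# "nfsv4 compound op ffff8802026c8080 opcnt 3 #1: 22: status 0"
RESULT_RE = re.compile(
    r"nfsv4 compound op [0-9a-f]+ opcnt (\d+) #(\d+): (\d+): status (\d+)"
)

# Only the operation numbers that appear in the quoted log.
OP_NAMES = {22: "OP_PUTFH", 4: "OP_CLOSE", 9: "OP_GETATTR"}


def failed_ops(log_text):
    """Return (op index, op name, status) for every op with nonzero status."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    failures = []
    for _opcnt, index, opnum, status in RESULT_RE.findall(flat):
        if int(status) != 0:
            name = OP_NAMES.get(int(opnum), f"op {opnum}")
            failures.append((int(index), name, int(status)))
    return failures


SAMPLE = """\
nfsv4 compound op
ffff8802026c8080 opcnt 3 #1: 22: status 0
nfsv4 compound op
ffff8802026c8080 opcnt 3 #2: 4: status 0
nfsv4 compound op
ffff8802026c8080 opcnt 3 #3: 9: status 0
"""

print(failed_ops(SAMPLE))  # → []
```

An empty list for the excerpt above matches what I see reading it by eye.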

> Nov 30 13:49:54 nfs-master rsyslogd: [origin software="rsyslogd"
> swVersion="7.4.4" x-pid="939" x-info="http://www.rsyslog.com"] exiting
> on signal 15.

That's SIGTERM, so something asked rsyslogd to exit, as you'd expect on
an orderly shutdown or reboot.  No idea if that means anything.
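
(If you want to double-check the signal number on your own machine, a
minimal sketch using Python's standard signal module; signal_name() is
just an illustrative helper, and the mapping shown is the Linux one.)

```python
import signal


def signal_name(number):
    """Map a signal number, as printed by rsyslogd, to its symbolic name."""
    return signal.Signals(number).name


print(signal_name(15))  # → SIGTERM
```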

Sorry, I don't see anything much to go on here.  Is there a console that
might have anything more?  I'm not very familiar with AWS.

--b.

> Server is rebooting here
> 
> Nov 30 13:50:34 nfs-master rsyslogd: [origin software="rsyslogd"
> swVersion="7.4.4" x-pid="951" x-info="http://www.rsyslog.com"] start
> Nov 30 13:50:34 nfs-master rsyslogd-2307: warning: ~ action is
> deprecated, consider using the 'stop' statement instead [try
> http://www.rsyslog.com/e/2307 ]
> Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's groupid changed to 104
> Nov 30 13:50:34 nfs-master rsyslogd: rsyslogd's userid changed to 101
> Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup
> subsys cpuset
> Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup
> subsys cpu
> Nov 30 13:50:34 nfs-master kernel: [    0.000000] Initializing cgroup
> subsys cpuacct
> 
> ```
