All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Reis <kiko@async.com.br>
To: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: linux-kernel@vger.kernel.org, NFS@lists.sourceforge.net
Subject: Re: 2.4.19+trond and diskless locking problems
Date: Wed, 4 Dec 2002 12:20:15 -0200	[thread overview]
Message-ID: <20021204122015.G7742@blackjesus.async.com.br> (raw)
In-Reply-To: <200212021921.24330.trond.myklebust@fys.uio.no>; from trond.myklebust@fys.uio.no on Mon, Dec 02, 2002 at 07:21:24PM +0100


Hey there. Another update on the wierd hangs. After rebooting the server
(and cleaning up /var/lib/nfs/sm consequently) we've had a couple of
days with no hangs. I've been keeping an eye on that directory closely
the past days to see if we have leftover entries there, and today one
showed up:

anthem:~$ date
Wed Dec  4 11:59:31 BRDT 2002
anthem:~$ sudo ls -l /var/lib/nfs/sm
total 0
-rw-------   1 root     root            0 Dec  4 08:47 192.168.99.7

So I have a file there that has been untouched for over 3 hours now.
Having a look at the logfile, I have:

    [... bootup messages culminating in ntpd startup ]

    Dec  4 08:44:16 canario ntpd[317]: kernel time discipline status 0040
    Dec  4 08:44:16 canario ntpd[317]: frequency initialized -97.153
    from /var/lib/ntp/driftfile.canario
    Dec  4 08:47:22 canario rpc.statd[101]: Received erroneous SM_UNMON
    request from canario for 192.168.99.4

    [ same time as file above, see ]

    Dec  4 08:48:47 canario ntpd[317]: kernel time discipline status change 41
    Dec  4 08:50:01 canario /USR/SBIN/CRON[425]: (smmsp) CMD (test -x
    /usr/share/sendmail/sendmail && /usr/share/sendmail/sendmail cron-msp)

    [... more bootup messages]

The reboot log from the night before shows a series of 8 erroneous
SM_UNMON messenges for a period of about an hour before the actual
reboot. The machine was shut down unclean the time before that (well,
as unclean as an NFS-mounted box can be).

And, sure enough, when shutting down this box now, we get the
stalls/pauses again. nlm_debug outputs for both the client and the
server (neat 30s intervals show up :-) - I've selected specific seconds
of activity (i.e. there was a set of entries at 12:06:32 before):

    CLIENT:

    Dec  4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:02 canario kernel: lockd: next rebind in 6000 jiffies
    Dec  4 12:07:02 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:02 canario kernel: lockd: host garbage collection
    Dec  4 12:07:02 canario kernel: lockd: nlmsvc_mark_resources
    Dec  4 12:07:02 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:02 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:02 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:32 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:32 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
    Dec  4 12:07:32 canario kernel: lockd: release host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
    Dec  4 12:07:32 canario kernel: lockd: get host 192.168.99.4
    Dec  4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)

    SERVER:

    Dec  4 12:07:02 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
    Dec  4 12:07:02 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1393
    Dec  4 12:07:02 anthem kernel: lockd: deleting block d12e9000...
    Dec  4 12:07:02 anthem kernel: lockd: unblocking blocked lock
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=2
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:02 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775807, bl=1)
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
    Dec  4 12:07:02 anthem kernel: lockd: blocking on this lock (allocating).
    Dec  4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: created block d12e9000...
    Dec  4 12:07:02 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
    Dec  4 12:07:02 anthem kernel: lockd: blocking on this lock.
    Dec  4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
    Dec  4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
    Dec  4 12:07:32 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: host garbage collection
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_mark_resources
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
    Dec  4 12:07:32 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1396
    Dec  4 12:07:32 anthem kernel: lockd: deleting block d12e9000...
    Dec  4 12:07:32 anthem kernel: lockd: unblocking blocked lock
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775 807 ty=2
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
    Dec  4 12:07:32 anthem kernel: lockd: request from c0a86307
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
    Dec  4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775 807, bl=1)
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775 807 ty=1
    Dec  4 12:07:32 anthem kernel: lockd: blocking on this lock (allocating).
    Dec  4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
    Dec  4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: created block d12e9000...
    Dec  4 12:07:32 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
    Dec  4 12:07:32 anthem kernel: lockd: blocking on this lock.
    Dec  4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
    Dec  4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
    Dec  4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)

HTH. Yes, we're all sick of this subject :-/

Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL

  parent reply	other threads:[~2002-12-04 14:13 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-10-03 21:44 2.4.19+trond and diskless locking problems Christian Reis
2002-10-03 22:47 ` Trond Myklebust
2002-10-03 23:26   ` Christian Reis
2002-10-04  1:13     ` Trond Myklebust
2002-11-20 14:02       ` Christian Reis
2002-11-20 17:02         ` Trond Myklebust
2002-11-27  0:41           ` Christian Reis
2002-11-27  1:14             ` Trond Myklebust
2002-11-27  1:44               ` Christian Reis
2002-11-27 17:08               ` Christian Reis
2002-11-27 20:31                 ` Trond Myklebust
2002-11-29  1:34                   ` Christian Reis
2002-12-02 14:51                     ` Christian Reis
     [not found]                   ` <20021128232911.G18175@blackjesus.async.com.br>
     [not found]                     ` <200212021921.24330.trond.myklebust@fys.uio.no>
2002-12-04 14:20                       ` Christian Reis [this message]
2002-11-27  1:01           ` Christian Reis
2002-10-04  7:07 ` [NFS] " Juergen Hasch
2002-10-04 13:17   ` Christian Reis
2002-10-04 16:00     ` Juergen Hasch
2002-10-04 16:00       ` [NFS] " Juergen Hasch

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20021204122015.G7742@blackjesus.async.com.br \
    --to=kiko@async.com.br \
    --cc=NFS@lists.sourceforge.net \
    --cc=linux-kernel@vger.kernel.org \
    --cc=trond.myklebust@fys.uio.no \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.