All of lore.kernel.org
 help / color / mirror / Atom feed
From: Txema Heredia Genestar <txema.heredia@upf.edu>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: NFSv4 memory allocation bug?
Date: Fri, 14 Jan 2011 13:11:01 +0100	[thread overview]
Message-ID: <4D303D55.1080908@upf.edu> (raw)
In-Reply-To: <20110113180534.GK20946@fieldses.org>

  El 13/01/11 19:05, J. Bruce Fields escribió:
> On Thu, Jan 13, 2011 at 06:25:23PM +0100, Txema Heredia Genestar wrote:
>>   El 13/01/11 17:19, J. Bruce Fields escribió:
>>> On Thu, Jan 13, 2011 at 04:48:26PM +0100, Txema Heredia Genestar wrote:
>>>>   Hi Bruce, thanks for your answer
>>>>
>>>>
>>>> El 12/01/11 19:35, J. Bruce Fields escribió:
>>>>> On Wed, Jan 12, 2011 at 06:14:53PM +0100, Txema Heredia Genestar wrote:
>>>>>> Additionally, I have checked tcpdump and found, when mounting an
>>>>>> NFS4 drive from a working storage-system:
>>>>>> ...
>>>>>> 12:38:06.372303 IP client.907>    storage.nfs: . ack 29 win 46
>>>>>> <nop,nop,timestamp 4063464822 174132214>
>>>>>> 12:38:06.372429 IP client.2364980656>    storage.nfs: 148 getattr [|nfs]
>>>>>> 12:38:06.372792 IP storage.nfs>    client.2364980656: reply ok 248
>>>>>> getattr [|nfs]
>>>>>> 12:38:06.372958 IP client.2381757872>    storage.nfs: 172 getattr [|nfs]
>>>>>> 12:38:06.373132 IP storage.nfs>    client.2381757872: reply ok 88
>>>>>> getattr [|nfs]
>>>>>> 12:38:06.373157 IP client.2398535088>    storage.nfs: 176 getattr [|nfs]
>>>>>> 12:38:06.373316 IP storage.nfs>    client.2398535088: reply ok 100
>>>>>> getattr [|nfs]
>>>>>> 12:38:06.373339 IP client.2415312304>    storage.nfs: 172 getattr [|nfs]
>>>>>>
>>>>>>
>>>>>> But when I mount from the same client, the NFS4 share from my server
>>>>>> gets stuck on the "getattr" call
>>>>>> ...
>>>>>> 12:36:37.051840 IP client.926>    server.nfs: . ack 29 win 140
>>>>>> <nop,nop,timestamp 4063375488 434039929>
>>>>>> 12:36:37.051903 IP client.1734362088>    server.nfs: 148 getattr [|nfs]
>>>>>> 12:36:37.090274 IP server.nfs>    client.926: . ack 192 win 4742
>>>>>> <nop,nop,timestamp 434039939 4063375488>
>>>>>> ---silence---
>>>>> Something like wireshark would give a few more details.
>>>> I have wiresharked it and I don't see any differences between the
>>>> "getattr" packages in both cases. Do you want me to paste them in a
>>>> specific format?
>>> I'm curious which attributes were requested.  In particular, is the
>>> unreplied-to getattr the *first* time that the client requests the owner
>>> or owner_group attributes?
>>>
>> Yes, the "unreplied-to" getattr call is the very first (and only)
>> time it those are requested:
> Yeah, so almost certainly some idmapping problem.
Mi current idmap library is 0.12-16.20, built on 16/1/2009

>>> Might also be worth looking at the nfs4.idtoname cache contents after
>>> the hang:
>>>
>>> 	rpcdebug -m rpc -s cache
>>> 	cat /proc/net/rpc/nfs4.idtoname/content
>>>
>>> I seem to recall c9b6cbe56d3ac471e6cd72a59ec9e324b3417016 or
>>> 0a725fc4d3bfc4734164863d6c50208b109ca5c7 being possible causes of hangs.
>>>
>>> --b.
>> Unfortunately, rpcdebug is not present in this server. So my
>> /proc/net/rpc/nfs4.idtoname/content file is empty.
>>
>> May this command be of any use?
>> "echo "65535">  /proc/sys/sunrpc/rpc_debug"
> Yeah doing that may get you a little more information.
>
> Note that kernel's pretty out of date with respect to mainstream so your
> best bet is your distro support or seeing if you can reproduce the
> problem on a more recent kernel.
>
> --b.

This is the /var/log/messages log after enabling debugging for nfs, nfsd 
and rpc:

Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 TCP (listen) 
state change 10
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock 
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67780 allocated
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: nfsd: connect from unprivileged port: 
172.22.201.203:40892
Jan 14 12:01:49 server kernel: nfsd: connect from 172.22.201.203:9fbc
Jan 14 12:01:49 server kernel: svc: svc_setup_socket ffff810282a67780
Jan 14 12:01:49 server kernel: setting up TCP socket for reading
Jan 14 12:01:49 server kernel: svc: svc_setup_socket created 
ffff810167ed3d80 (inet ffff810125e547c0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by 
daemon ffff81029edad000
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data 
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 40 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810149105028, 4056) = 40
Jan 14 12:01:49 server kernel: svc: TCP complete record (40 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=40
Jan 14 12:01:49 server kernel: svc: svc_authenticate (0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
sendto([ffff8101dfe10000 28... ], 28) = 28 (addr cbc916ac)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000, socket 
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock 
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67d00 allocated
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP 
(connected) state change 8 (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data 
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 1
Jan 14 12:01:49 server kernel: svc: svc_delete_socket(ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: server socket destroy delayed
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: releasing dead socket
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 TCP (listen) 
state change 10
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock 
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67780 allocated
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: nfsd: connect from 172.22.201.203:03d0
Jan 14 12:01:49 server kernel: svc: svc_setup_socket ffff810282a67780
Jan 14 12:01:49 server kernel: setting up TCP socket for reading
Jan 14 12:01:49 server kernel: svc: svc_setup_socket created 
ffff810167ed3d80 (inet ffff810125e547c0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data 
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff81029edad000
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 served by 
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: socket ffff81012024c7c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 40 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff8102451ec028, 4056) = 40
Jan 14 12:01:49 server kernel: svc: TCP complete record (40 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=40
Jan 14 12:01:49 server kernel: svc: svc_authenticate (0)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
sendto([ffff81017bafe000 28... ], 28) = 28 (addr cbc916ac)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff81029edad000 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket 
ffff8101fd09c080, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff8101fd09c080 data 0 
conn 1 close 0
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff8101fd09c080 sock 
ffff81028b747380
Jan 14 12:01:49 server kernel: svc: tcp_accept ffff810282a67d00 allocated
Jan 14 12:01:49 server kernel: svc: got len=0
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 TCP data 
ready (svsk ffff810167ed3d80)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e2c, 0) = 4
Jan 14 12:01:49 server kernel: svc: TCP record, 144 bytes
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810273143090, 3952) = 144
Jan 14 12:01:49 server kernel: svc: TCP complete record (144 bytes)
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: got len=144
Jan 14 12:01:49 server kernel: svc: svc_authenticate (1)
Jan 14 12:01:49 server kernel: Want update, refage=120, age=0
Jan 14 12:01:49 server kernel: svc: svc_process dropit
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 dropped request
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket 
ffff810167ed3d80, inuse=2
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: revisit queued
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff810236f4f000
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000, socket 
ffff810167ed3d80, inuse=1
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 0 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:49 server kernel: svc: got len=144
Jan 14 12:01:49 server kernel: svc: svc_authenticate (1)
Jan 14 12:01:49 server kernel: svc: calling dispatcher
Jan 14 12:01:49 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 14 12:01:49 server kernel: nfsv4 compound op #1: 24
Jan 14 12:01:49 server kernel: Want update, refage=120, age=0
Jan 14 12:01:49 server kernel: nfsd: Dropping request due to malloc failure!
**Jan 14 12:01:49 server kernel: svc: svc_process dropit**
**Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 dropped 
request**
Jan 14 12:01:49 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:49 server kernel: svc: server ffff810236f4f000 waiting for 
data (to = 900000)
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400, socket 
ffff810167ed3d80, inuse=2
Jan 14 12:01:49 server kernel: svc: tcp_recv ffff810167ed3d80 data 0 
conn 0 close 0
Jan 14 12:01:49 server kernel: svc: socket ffff810167ed3d80 
recvfrom(ffff810167ed3e28, 4) = -11
Jan 14 12:01:49 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 14 12:01:49 server kernel: svc: got len=-11
Jan 14 12:01:49 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)


---After I Ctrl+C 'd the mount command in the client:---


Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 TCP 
(connected) state change 8 (svsk ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 served by 
daemon ffff8102a5ede400
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 TCP data 
ready (svsk ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: socket ffff810125e547c0 busy, not 
enqueued
Jan 14 12:01:57 server kernel: svc: server ffff8102a5ede400, socket 
ffff810167ed3d80, inuse=2
Jan 14 12:01:57 server kernel: svc: tcp_recv ffff810167ed3d80 data 1 
conn 0 close 1
Jan 14 12:01:57 server kernel: svc: svc_delete_socket(ffff810167ed3d80)
Jan 14 12:01:57 server kernel: svc: server socket destroy delayed
Jan 14 12:01:57 server kernel: svc: got len=0
Jan 14 12:01:57 server kernel: svc: server ffff8102a5ede400 waiting for 
data (to = 900000)


I also tried to run idmap in verbose mode to see what it says
  # rpc.idmapd -vvv -f
rpc.idmapd: Opened /proc/net/rpc/nfs4.nametoid/channel
rpc.idmapd: Opened /proc/net/rpc/nfs4.idtoname/channel

but it doesn't send anything to messages and, when checking 
/proc/net/rpc/nfs4.idtoname/content, it's still empty
# cat /proc/net/rpc/nfs4.idtoname/content
#domain type id [name]

May this be a hint that my nfsd and idmapd are not communicating? (and 
so the getattr is never answered)


As for the kernel version, unfortunately I am not allowed to upgrade it 
right now, as this is a production host (it's in a High Availability 
cluster but I can do the nfs tests because the other one is serving the 
"main" nfs right now), but I expect there will be an updating stop 
during this year.

Txema

PS: Damn! I have just realized you are the author of idmap, thanks man!

  reply	other threads:[~2011-01-14 12:10 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-12 17:14 NFSv4 memory allocation bug? Txema Heredia Genestar
2011-01-12 18:35 ` J. Bruce Fields
2011-01-13 15:48   ` Txema Heredia Genestar
2011-01-13 16:19     ` J. Bruce Fields
2011-01-13 17:25       ` Txema Heredia Genestar
2011-01-13 18:05         ` J. Bruce Fields
2011-01-14 12:11           ` Txema Heredia Genestar [this message]
2011-02-08 18:07             ` Txema Heredia
2011-02-09  0:09               ` 'J. Bruce Fields'
2011-02-09 13:57                 ` Txema Heredia Genestar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D303D55.1080908@upf.edu \
    --to=txema.heredia@upf.edu \
    --cc=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.