From: Txema Heredia Genestar <txema.heredia@upf.edu>
To: linux-nfs@vger.kernel.org
Subject: NFSv4 memory allocation bug?
Date: Wed, 12 Jan 2011 18:14:53 +0100 [thread overview]
Message-ID: <4D2DE18D.40604@upf.edu> (raw)
Hello everybody,
I have a host serving disk through the network using NFS v3, and it has
been working ok for 3 years. Recently I have been benchmarking its
throughput and I wanted to test how it would work if I used NFS v4
instead, but I couldn't mount a single drive.
This is what I see:
Client:
client# mount -v -t nfs4 serverurl:/ /mnt/NFS_test/
mount: pinging: prog 100003 vers 4 prot tcp port 2049
---no response---
Server:
No error messages. Once I enabled debugging with echo "65535" >
/proc/sys/sunrpc/nfsd_debug :
/var/log/messages:
Jan 12 13:02:35 server kernel: NFSD: laundromat service - starting
Jan 12 13:02:35 server kernel: NFSD: laundromat_main - sleeping for 90
seconds
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 12 13:02:40 server kernel: nfsv4 compound op #1: 24
Jan 12 13:02:40 server kernel: nfsd: Dropping request due to malloc failure!
And if rpc debugging is enabled:
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 busy, not
enqueued
Jan 11 18:50:48 server kernel: svc: calling dispatcher
Jan 11 18:50:48 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
sendto([ffff8102273eb000 28... ], 28) = 28 (addr cbc916ac)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 busy, not
enqueued
Jan 11 18:50:48 server kernel: svc: server ffff81027d33d000 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket
ffff810127c28880, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff810127c28880 data 0
conn 1 close 0
Jan 11 18:50:48 server kernel: svc: tcp_accept ffff810127c28880 sock
ffff81029821b340
Jan 11 18:50:48 server kernel: svc: tcp_accept ffff81027c7f08c0 allocated
Jan 11 18:50:48 server kernel: svc: got len=0
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket
ffff81029c53a580, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 TCP data
ready (svsk ffff81029c53a580)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff810251911800
Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket
ffff81029c53a580, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a62c, 0) = 4
Jan 11 18:50:48 server kernel: svc: TCP record, 144 bytes
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81011f93a090, 3952) = 144
Jan 11 18:50:48 server kernel: svc: TCP complete record (144 bytes)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff81025e481400
Jan 11 18:50:48 server kernel: svc: got len=144
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket
ffff81029c53a580, inuse=2
Jan 11 18:50:48 server kernel: svc: svc_authenticate (1)
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: Want update, refage=120, age=0
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: svc_process dropit
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 dropped request
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: revisit queued
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff810251911800
Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket
ffff81029c53a580, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 0
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff81025e481400
Jan 11 18:50:48 server kernel: svc: got len=144
Jan 11 18:50:48 server kernel: svc: svc_authenticate (1)
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket
ffff81029c53a580, inuse=2
Jan 11 18:50:48 server kernel: svc: calling dispatcher
Jan 11 18:50:48 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 0
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: nfsv4 compound op #1: 24
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: Want update, refage=120, age=0
Jan 11 18:50:48 server kernel: nfsd: Dropping request due to malloc failure!
Jan 11 18:50:48 server kernel: svc: svc_process dropit
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 dropped request
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for
data (to = 900000)
Additionally, I have checked tcpdump and found, when mounting an NFS4
drive from a working storage-system:
...
12:38:06.372303 IP client.907 > storage.nfs: . ack 29 win 46
<nop,nop,timestamp 4063464822 174132214>
12:38:06.372429 IP client.2364980656 > storage.nfs: 148 getattr [|nfs]
12:38:06.372792 IP storage.nfs > client.2364980656: reply ok 248 getattr
[|nfs]
12:38:06.372958 IP client.2381757872 > storage.nfs: 172 getattr [|nfs]
12:38:06.373132 IP storage.nfs > client.2381757872: reply ok 88 getattr
[|nfs]
12:38:06.373157 IP client.2398535088 > storage.nfs: 176 getattr [|nfs]
12:38:06.373316 IP storage.nfs > client.2398535088: reply ok 100 getattr
[|nfs]
12:38:06.373339 IP client.2415312304 > storage.nfs: 172 getattr [|nfs]
But when I mount from the same client, the NFS4 share from my server
gets stuck on the "getattr" call
...
12:36:37.051840 IP client.926 > server.nfs: . ack 29 win 140
<nop,nop,timestamp 4063375488 434039929>
12:36:37.051903 IP client.1734362088 > server.nfs: 148 getattr [|nfs]
12:36:37.090274 IP server.nfs > client.926: . ack 192 win 4742
<nop,nop,timestamp 434039939 4063375488>
---silence---
So I suppose that the "RPC: TCP recvfrom got EAGAIN" on the messages log
corresponds to that "getattr[|nfs]" call.
I have been searching around and I have found several threads about
either the "malloc failure" message or the "EAGAIN" message. But I
haven't found anything concerning them both at the same time. I have
also checked for this kind of problems in NFS4 and found nothing useful.
May this be some kind of (already solved) bug in my nfs implementation?
I'm running a pretty old version (SuSE LES 10.2, nfs-utils 1.0.7-36.2)
Thanks in advance,
Txema
My configuration files:
# cat /etc/sysconfig/nfs
USE_KERNEL_NFSD_NUMBER="4"
MOUNTD_PORT="980"
NFS_SECURITY_GSS="no"
NFS4_SUPPORT="yes"
# cat /etc/exports
#TESTS NFS 4
/NFS4exports
172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,secure_locks,acl,mp,fsid=0,no_wdelay)
/NFS4exports
172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,secure_locks,acl,mp,fsid=0,no_wdelay)
/NFS4exports/NFS_test_params
172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
/NFS4exports/NFS_test_params
172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
#NFS 3 working ones
#/sp/fs/NFS_test_params
172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
# cat /etc/idmapd.conf
[General]
Verbosity=1
Pipefs-Directory=/var/lib/rpc_pipefs
Domain=mydomain
[Mapping]
Nobody-User=nobody
Nobody-Group=nobody
PS: This is my very first post on this list, so I'm not sure if I am
missing something.
next reply other threads:[~2011-01-12 17:14 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-12 17:14 Txema Heredia Genestar [this message]
2011-01-12 18:35 ` NFSv4 memory allocation bug? J. Bruce Fields
2011-01-13 15:48 ` Txema Heredia Genestar
2011-01-13 16:19 ` J. Bruce Fields
2011-01-13 17:25 ` Txema Heredia Genestar
2011-01-13 18:05 ` J. Bruce Fields
2011-01-14 12:11 ` Txema Heredia Genestar
2011-02-08 18:07 ` Txema Heredia
2011-02-09 0:09 ` 'J. Bruce Fields'
2011-02-09 13:57 ` Txema Heredia Genestar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D2DE18D.40604@upf.edu \
--to=txema.heredia@upf.edu \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.