From: Txema Heredia Genestar <txema.heredia@upf.edu>
To: linux-nfs@vger.kernel.org
Subject: NFSv4 memory allocation bug?
Date: Wed, 12 Jan 2011 18:14:53 +0100 [thread overview]
Message-ID: <4D2DE18D.40604@upf.edu> (raw)
Hello everybody,
I have a host serving disk through the network using NFS v3, and it has
been working ok for 3 years. Recently I have been benchmarking its
throughput and I wanted to test how it would work if I used NFS v4
instead, but I couldn't mount a single drive.
This is what I see:
Client:
client# mount -v -t nfs4 serverurl:/ /mnt/NFS_test/
mount: pinging: prog 100003 vers 4 prot tcp port 2049
---no response---
Server:
No error messages. Once I enabled debugging with echo "65535" >
/proc/sys/sunrpc/nfsd_debug :
/var/log/messages:
Jan 12 13:02:35 server kernel: NFSD: laundromat service - starting
Jan 12 13:02:35 server kernel: NFSD: laundromat_main - sleeping for 90
seconds
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 12 13:02:40 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 12 13:02:40 server kernel: nfsv4 compound op #1: 24
Jan 12 13:02:40 server kernel: nfsd: Dropping request due to malloc failure!
And if rpc debugging is enabled:
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 busy, not
enqueued
Jan 11 18:50:48 server kernel: svc: calling dispatcher
Jan 11 18:50:48 server kernel: nfsd_dispatch: vers 4 proc 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
sendto([ffff8102273eb000 28... ], 28) = 28 (addr cbc916ac)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 busy, not
enqueued
Jan 11 18:50:48 server kernel: svc: server ffff81027d33d000 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket
ffff810127c28880, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff810127c28880 data 0
conn 1 close 0
Jan 11 18:50:48 server kernel: svc: tcp_accept ffff810127c28880 sock
ffff81029821b340
Jan 11 18:50:48 server kernel: svc: tcp_accept ffff81027c7f08c0 allocated
Jan 11 18:50:48 server kernel: svc: got len=0
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket
ffff81029c53a580, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 TCP data
ready (svsk ffff81029c53a580)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff810251911800
Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket
ffff81029c53a580, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a62c, 0) = 4
Jan 11 18:50:48 server kernel: svc: TCP record, 144 bytes
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81011f93a090, 3952) = 144
Jan 11 18:50:48 server kernel: svc: TCP complete record (144 bytes)
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff81025e481400
Jan 11 18:50:48 server kernel: svc: got len=144
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket
ffff81029c53a580, inuse=2
Jan 11 18:50:48 server kernel: svc: svc_authenticate (1)
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 1
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: Want update, refage=120, age=0
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: svc: svc_process dropit
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 dropped request
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: revisit queued
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff810251911800
Jan 11 18:50:48 server kernel: svc: server ffff810251911800, socket
ffff81029c53a580, inuse=1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 0
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff810252868680 served by
daemon ffff81025e481400
Jan 11 18:50:48 server kernel: svc: got len=144
Jan 11 18:50:48 server kernel: svc: svc_authenticate (1)
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400, socket
ffff81029c53a580, inuse=2
Jan 11 18:50:48 server kernel: svc: calling dispatcher
Jan 11 18:50:48 server kernel: nfsd_dispatch: vers 4 proc 1
Jan 11 18:50:48 server kernel: svc: tcp_recv ffff81029c53a580 data 0
conn 0 close 0
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580
recvfrom(ffff81029c53a628, 4) = -11
Jan 11 18:50:48 server kernel: nfsv4 compound op #1: 24
Jan 11 18:50:48 server kernel: RPC: TCP recvfrom got EAGAIN
Jan 11 18:50:48 server kernel: svc: got len=-11
Jan 11 18:50:48 server kernel: svc: server ffff81025e481400 waiting for
data (to = 900000)
Jan 11 18:50:48 server kernel: Want update, refage=120, age=0
Jan 11 18:50:48 server kernel: nfsd: Dropping request due to malloc failure!
Jan 11 18:50:48 server kernel: svc: svc_process dropit
Jan 11 18:50:48 server kernel: svc: socket ffff81029c53a580 dropped request
Jan 11 18:50:48 server kernel: svc: server ffff810251911800 waiting for
data (to = 900000)
Additionally, I have checked tcpdump and found, when mounting an NFS4
drive from a working storage-system:
...
12:38:06.372303 IP client.907 > storage.nfs: . ack 29 win 46
<nop,nop,timestamp 4063464822 174132214>
12:38:06.372429 IP client.2364980656 > storage.nfs: 148 getattr [|nfs]
12:38:06.372792 IP storage.nfs > client.2364980656: reply ok 248 getattr
[|nfs]
12:38:06.372958 IP client.2381757872 > storage.nfs: 172 getattr [|nfs]
12:38:06.373132 IP storage.nfs > client.2381757872: reply ok 88 getattr
[|nfs]
12:38:06.373157 IP client.2398535088 > storage.nfs: 176 getattr [|nfs]
12:38:06.373316 IP storage.nfs > client.2398535088: reply ok 100 getattr
[|nfs]
12:38:06.373339 IP client.2415312304 > storage.nfs: 172 getattr [|nfs]
But when I mount from the same client, the NFS4 share from my server
gets stuck on the "getattr" call
...
12:36:37.051840 IP client.926 > server.nfs: . ack 29 win 140
<nop,nop,timestamp 4063375488 434039929>
12:36:37.051903 IP client.1734362088 > server.nfs: 148 getattr [|nfs]
12:36:37.090274 IP server.nfs > client.926: . ack 192 win 4742
<nop,nop,timestamp 434039939 4063375488>
---silence---
So I suppose that the "RPC: TCP recvfrom got EAGAIN" on the messages log
corresponds to that "getattr[|nfs]" call.
I have been searching around and I have found several threads about
either the "malloc failure" message or the "EAGAIN" message. But I
haven't found anything concerning them both at the same time. I have
also checked for this kind of problems in NFS4 and found nothing useful.
May this be some kind of (already solved) bug in my nfs implementation?
I'm running a pretty old version (SuSE LES 10.2, nfs-utils 1.0.7-36.2)
Thanks in advance,
Txema
My configuration files:
# cat /etc/sysconfig/nfs
USE_KERNEL_NFSD_NUMBER="4"
MOUNTD_PORT="980"
NFS_SECURITY_GSS="no"
NFS4_SUPPORT="yes"
# cat /etc/exports
#TESTS NFS 4
/NFS4exports
172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,secure_locks,acl,mp,fsid=0,no_wdelay)
/NFS4exports
172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,secure_locks,acl,mp,fsid=0,no_wdelay)
/NFS4exports/NFS_test_params
172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
/NFS4exports/NFS_test_params
172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
#NFS 3 working ones
#/sp/fs/NFS_test_params
172.22.201.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
172.22.202.0/24(rw,sync,no_root_squash,no_subtree_check,secure,hide,secure_locks,acl,mp,fsid=255,no_wdelay)
# cat /etc/idmapd.conf
[General]
Verbosity=1
Pipefs-Directory=/var/lib/rpc_pipefs
Domain=mydomain
[Mapping]
Nobody-User=nobody
Nobody-Group=nobody
PS: This is my very first post on this list, so I'm not sure if I am
missing something.
next reply other threads:[~2011-01-12 17:14 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-12 17:14 Txema Heredia Genestar [this message]
2011-01-12 18:35 ` NFSv4 memory allocation bug? J. Bruce Fields
2011-01-13 15:48 ` Txema Heredia Genestar
2011-01-13 16:19 ` J. Bruce Fields
2011-01-13 17:25 ` Txema Heredia Genestar
2011-01-13 18:05 ` J. Bruce Fields
2011-01-14 12:11 ` Txema Heredia Genestar
2011-02-08 18:07 ` Txema Heredia
2011-02-09 0:09 ` 'J. Bruce Fields'
2011-02-09 13:57 ` Txema Heredia Genestar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D2DE18D.40604@upf.edu \
--to=txema.heredia@upf.edu \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).