linux-nfs.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.com>
To: Lutz Vieweg <lvml@5t9.de>, linux-nfs@vger.kernel.org
Subject: Re: PROBLEM: nfs I/O errors with sqlite applications
Date: Fri, 09 Jun 2017 08:07:23 +1000
Message-ID: <871squb0bo.fsf@notabene.neil.brown.name>
In-Reply-To: <59399945.2020507@5t9.de>

On Thu, Jun 08 2017, Lutz Vieweg wrote:

> On 06/07/2017 05:08 AM, NeilBrown wrote:
>>>>   fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 EIO (Input/output error)
>>>>   write(2, "Error: disk I/O error\n", 22Error: disk I/O error
>>>
>>> But unlike the original reporter, we use the NFS v3 protocol:
>>>> myserver:/data on /data type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,timeo=600,retrans=2,sec=sys,mountvers=3,mountport=20048,mountproto=udp,local_lock=none)
>>
>> Using "soft" is not a good idea.  It could be the cause, but it isn't very
>> likely if NFS is otherwise working OK.
>
> NFS v3 has been working very well for us for many years.
> When we upgraded those two servers ~3 years ago, we did try NFS v4 first, but
> that had caused frequent occurrences of "un-killable processes in D state",
> so we had to revert to v3 to allow for stable operation.

I queried the use of "soft" - as opposed to "hard".
You defend the use of v3 as opposed to v4.
I think there is some miscommunication happening here.

If v3 works better for you than v4, then certainly use it.
You could try reporting details of the problems with v4, but I cannot
promise a helpful response, so it is totally up to you.

But "soft" is generally a bad idea.  It can lead to data corruption in
various ways, as it reports errors to user-space which user-space is
often not expecting.

These days, the processes in D state are (usually) killable.
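
To see which NFS mounts on a client still use "soft", something like
the following could be used.  This is only a sketch: the function name
soft_nfs_mounts is made up for illustration, and it assumes input in
the /proc/mounts layout (device, mount point, fstype, options, ...).

```shell
# List NFS mount points whose options include "soft".
# Reads /proc/mounts-style lines on stdin:
#   device mountpoint fstype options dump pass
# (soft_nfs_mounts is a hypothetical helper name, not an existing tool.)
soft_nfs_mounts() {
    awk '$3 ~ /^nfs4?$/ {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft") { print $2; break }
    }'
}

# Typical use on the client:
#   soft_nfs_mounts < /proc/mounts
```

For any mount point it prints, replace "soft" with "hard" in the mount
options (for example in /etc/fstab) and mount the filesystem again.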

>
>> It might help to run
>>    rpcdebug -m nfs -s all; rpcdebug -m nlm -s all ;rpcdebug -m rpc -s all
>>    #repeat your test
>>    rpcdebug -m nfs -c all; rpcdebug -m nlm -c all ;rpcdebug -m rpc -c all
>>
>> then collect the kernel logs (possibly just run "dmesg") and post all
>> the messages which happened at that time.
>
> Ok, attaching a log generated like this while running:
>
> sqlite3 x.sqlite "PRAGMA case_sensitive_like=1;PRAGMA synchronous=OFF;PRAGMA recursive_triggers=ON;PRAGMA foreign_keys=OFF;PRAGMA locking_mode = NORMAL;PRAGMA journal_mode = TRUNCATE;"

Thanks. Probably the key line is

[2339904.695240] RPC: 46702 remote rpcbind: RPC program/version unavailable

The client is trying to talk to lockd on the server, and lockd doesn't
seem to be there.


>
>> It might also help to find the port number that lockd is running on
>>     rpcinfo -p $SERVER | grep 'tcp.*nlockmgr'
>
> None of the ports reported this way contains the string "nlockmgr":

This agrees with the line from the log.  If nlockmgr isn't listed, then
locking cannot work.  This is the cause of your problem.

>> rpcinfo -p myserver
>>    program vers proto   port  service
>>     100000    4   tcp    111  portmapper
>>     100000    3   tcp    111  portmapper
>>     100000    2   tcp    111  portmapper
>>     100000    4   udp    111  portmapper
>>     100000    3   udp    111  portmapper
>>     100000    2   udp    111  portmapper

Even "nfs" isn't listed - but clearly the nfs server is running.
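
A quick way to check this kind of thing is to filter the rpcinfo output
for the services a v3 client depends on.  The following is a rough
sketch: missing_nfs_services is a made-up name, and the list of
services it checks (nfs, mountd, nlockmgr) is an assumption that may
need adjusting for a particular setup.

```shell
# Report which expected services are missing from "rpcinfo -p" output
# supplied on stdin.  Prints one missing service name per line and
# prints nothing when all of them are registered.
# (missing_nfs_services is a hypothetical helper name; the list of
# wanted services is an assumption.)
missing_nfs_services() {
    awk '
        NR > 1 { seen[$5] = 1 }          # column 5 is the service name
        END {
            n = split("nfs mountd nlockmgr", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }'
}

# Typical use (SERVER is a placeholder for the NFS server host name):
#   rpcinfo -p "$SERVER" | missing_nfs_services
```

For a server that only lists portmapper, as above, it prints all three
names.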

My guess is that rpcbind was restarted without the "-w" flag, so it lost
all the state that it previously had.
If you stop and restart the NFS service on the server, it might start
working again.  Otherwise just reboot the NFS server.

NeilBrown


>
> Regards,
>
> Lutz Vieweg

