public inbox for linux-nfs@vger.kernel.org
From: Terry Loftin <terry.loftin@hp.com>
To: SteveD@redhat.com, linux-nfs@vger.kernel.org
Subject: [PATCH 0/1] nfs: Panic when commit fails
Date: Tue, 20 Oct 2009 11:44:36 -0600
Message-ID: <4ADDF704.9080109@hp.com>


Steve,

Please consider the following patch.  Thanks in advance!

Summary:
At the end of an nfs write, if the nfs commit fails, all the writes
will be rescheduled.  They are supposed to be rescheduled as NFS_FILE_SYNC
writes, but the rpc_task structure is not completely initialized, so
the option is not passed.  When the rescheduled writes complete, the
reply indicates that they are NFS_UNSTABLE and we try to do another
commit.  This leads to a panic because the commit data structure pointer
was set to NULL in the initial (failed) commit attempt.


Long-Winded Description:
This panic is pretty rare - it requires all writes to complete
successfully, and the commit to fail.

We had an NFS client panic with this stack trace:
    PID: 5981   TASK: ffff8801a51c86c0  CPU: 6   COMMAND: "nfsiod"
     #0 [ffff88019f1ab800] crash_kexec at ffffffff8026eb22
     #1 [ffff88019f1ab8e0] oops_end at ffffffff804c7099
     #2 [ffff88019f1ab910] do_page_fault at ffffffff804c8d2a
     #3 [ffff88019f1abc90] page_fault at ffffffff804c65d5
     #4 [ffff88019f1abd18] nfs_direct_write_complete at ffffffffa0345bf7
     #5 [ffff88019f1abdf8] nfs_direct_write_release at ffffffffa0346030
     #6 [ffff88019f1abe28] rpc_release_calldata at ffffffffa02ec427
     #7 [ffff88019f1abe38] rpc_free_task at ffffffffa02ec4ce
     #8 [ffff88019f1abe68] rpc_async_release at ffffffffa02ec58b
     #9 [ffff88019f1abe78] run_workqueue at ffffffff80252b17
    #10 [ffff88019f1abec8] worker_thread at ffffffff80252c91
    #11 [ffff88019f1abf28] kthread at ffffffff8025623d
    #12 [ffff88019f1abf48] kernel_thread at ffffffff8021241a

Just prior to this, our nfs server crashed due to hardware problems.
The panic happens in nfs_direct_commit_schedule(), at line ~568 in
fs/nfs/direct.c:

        data->inode = dreq->inode;

Here, data is NULL, which causes the panic.  This means that
dreq->commit_data is NULL.  By inspection, this pointer is only set
to NULL in nfs_direct_commit_schedule(), so we must have called this
function twice.

This panic happens when a commit fails.  In nfs_direct_commit_release(),
when a commit fails, we set dreq->flags to NFS_ODIRECT_RESCHED_WRITES
to reschedule all the associated writes.  When these writes complete,
we try to do another commit, but the commit_data pointer has already
been set to NULL during the previous commit attempt.
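
To make the sequence concrete, here is a minimal userspace sketch of
the double-commit path.  It is a model only: struct direct_req,
struct commit_data and commit_schedule() are hypothetical stand-ins,
not the kernel's definitions, and the NULL check exists only so the
model can report the failure instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layouts. */
struct commit_data {
	int inode;
};

struct direct_req {
	int inode;
	struct commit_data *commit_data;	/* consumed by the first commit */
};

/*
 * Model of the commit scheduling step: take the commit data from the
 * request and clear the request's pointer.  Returns false at the point
 * where the kernel code described above would dereference NULL.
 */
static bool commit_schedule(struct direct_req *dreq)
{
	struct commit_data *data;

	assert(dreq != NULL);
	data = dreq->commit_data;
	if (data == NULL)
		return false;		/* model-only check; see lead-in */

	data->inode = dreq->inode;
	dreq->commit_data = NULL;	/* pointer is gone after the first call */
	return true;
}
```

Calling commit_schedule() twice on the same request succeeds the first
time and returns false the second time, which corresponds to the NULL
dereference in the trace.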

The problem is that we shouldn't be doing a second commit attempt.
In nfs_direct_write_reschedule() we request a sync write by setting
the args->stable field to NFS_FILE_SYNC, but the server does not
see this, because the rpc_task structure isn't fully initialized.
Thus the response from the server for these rescheduled writes always
returns NFS_UNSTABLE, which causes us to do a second commit (see
nfs_direct_write_release()) and causes a panic.
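
The effect of the incomplete initialization can be modelled in
userspace as follows.  This is a sketch under loud assumptions:
struct task_setup, its message field, send_write() and the model
server's behaviour are all hypothetical, and this message does not say
which rpc_task field was left out.  The only point illustrated is that
a field omitted from a C designated initializer is zeroed, so the
FILE_SYNC request never reaches the server side of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Values follow the NFSv3 stable_how enumeration. */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/* Hypothetical structures; names and fields are assumptions. */
struct write_args { enum stable_how stable; };
struct message { struct write_args *args; };
struct task_setup {
	const struct message *message;
	int flags;
};

/* Model server: honours the requested mode only if the args reach it. */
static enum stable_how send_write(const struct task_setup *setup)
{
	assert(setup != NULL);
	if (setup->message == NULL || setup->message->args == NULL)
		return UNSTABLE;	/* request carried no arguments */
	return setup->message->args->stable;
}

/* Reschedule one write as FILE_SYNC; complete selects fixed vs buggy setup. */
static enum stable_how reschedule_write(int complete)
{
	struct write_args args = { .stable = FILE_SYNC };
	struct message msg = { .args = &args };
	struct task_setup buggy = { .flags = 1 };	/* .message omitted: NULL */
	struct task_setup fixed = { .message = &msg, .flags = 1 };

	return send_write(complete ? &fixed : &buggy);
}

/* Another commit is attempted unless the write came back FILE_SYNC. */
static int needs_commit(enum stable_how committed)
{
	return committed != FILE_SYNC;
}
```

In this model the buggy setup always produces an UNSTABLE reply and
therefore a second commit attempt, while the fully initialized setup
comes back FILE_SYNC and no further commit is needed.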

I added code to nfs_direct_release() to fail every 8th commit.  The
panic was reproduced on the 8th commit every time.  I then added a
line to initialize the rpc_task structure in nfs_direct_write_reschedule().
I quit testing after logging over 64K commits (with 8K commit failures),
with no panics and no write errors detected.
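
For anyone wanting to repeat the test, fault injection of this kind
can be as small as a counter.  A sketch, assuming a hypothetical
struct fault_injector and helper names (the instrumentation actually
used is not shown in this message, and kernel code would need an
atomic counter or a lock):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fault injector: fails every interval-th call. */
struct fault_injector {
	unsigned long count;	/* commits seen so far */
	unsigned long interval;	/* 8 in the test described above */
};

/* Returns nonzero when this commit should be treated as failed. */
static int should_fail(struct fault_injector *fi)
{
	assert(fi != NULL && fi->interval > 0);
	fi->count++;
	return (fi->count % fi->interval) == 0;
}

/* Run n commits through the injector and count the injected failures. */
static unsigned long run_commits(struct fault_injector *fi, unsigned long n)
{
	unsigned long failures = 0;
	unsigned long i;

	for (i = 0; i < n; i++)
		if (should_fail(fi))
			failures++;
	return failures;
}
```

With an interval of 8, 65536 commits yield 8192 injected failures,
which lines up with the 64K/8K figures above.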

This is pretty much a typo.  Every other rpc_task struct appears to
be fully initialized.

-T
