public inbox for linux-nfs@vger.kernel.org
 help / color / mirror / Atom feed
* nfs_page_async_flush returning 0 for fatal errors on writeback
@ 2021-07-07 18:51 Calum Mackay
  2021-07-07 22:01 ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Calum Mackay @ 2021-07-07 18:51 UTC (permalink / raw)
  To: Trond Myklebust, Linux NFS Mailing List


[-- Attachment #1.1: Type: text/plain, Size: 2914 bytes --]

hi Trond,

I had a question about these two old commits of yours, from v5.0 & v5.2:

14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal errors (2 
years, 2 months ago)

8fc75bed96bb NFS: Fix up return value on fatal errors in 
nfs_page_async_flush() (2 years, 5 months ago)


I am looking at a crash dump, with a kernel based on an older-still 
v4.14 stable which did not have either of the above commits.

        PANIC: "BUG: unable to handle kernel NULL pointer dereference at
0000000000000080"

     [exception RIP: _raw_spin_lock+20]

#10 [ffffb1493d78fcb8] nfs_updatepage at ffffffffc08f1791 [nfs]
#11 [ffffb1493d78fd10] nfs_write_end at ffffffffc08e094e [nfs]
#12 [ffffb1493d78fd58] generic_perform_write at ffffffffa71d458b
#13 [ffffb1493d78fde0] nfs_file_write at ffffffffc08dfdb4 [nfs]
#14 [ffffb1493d78fe18] __vfs_write at ffffffffa72848bc
#15 [ffffb1493d78fea0] vfs_write at ffffffffa7284ad2
#16 [ffffb1493d78fee0] sys_write at ffffffffa7284d35
#17 [ffffb1493d78ff28] do_syscall_64 at ffffffffa7003949

the real sequence, obscured by compiler inlining, is:

    nfs_updatepage
       nfs_writepage_setup
          nfs_setup_write_request
             nfs_inode_add_request
                spin_lock(&mapping->private_lock);

and we crash since the as mapping pointer is NULL.


I thought I was able to construct a possible sequence that would explain 
the above, if we are in (from above):

    nfs_setup_write_request
     nfs_try_to_update_request
      nfs_wb_page
       nfs_writepage_locked
        nfs_do_writepage

and nfs_page_async_flush detects a fatal server error, and calls 
nfs_write_error_remove_page, which results in the page->mapping set to NULL.

In that version of the code, without your commits above, 
nfs_page_async_flush returns 0 in this case, which I thought might 
result in nfs_setup_write_request going ahead and calling 
nfs_inode_add_request with that page, resulting in the crash seen.


I then discovered your v5.0 commit:

8fc75bed96bb NFS: Fix up return value on fatal errors in 
nfs_page_async_flush() (2 years, 5 months ago)

which appeared to correct that, having nfs_page_async_flush return the 
error in this case, so we would not end up in nfs_inode_add_request.


But I then spotted your later v5.2 commit:

14bebe3c90b3 NFS: Don't interrupt file writeout due to fatal errors (2 
years, 2 months ago)

which changes things back, so that nfs_page_async_flush now again 
returns 0, in the "launder" case, and that's how that code remains today.


If so, is there anything to stop the possible crash path that I describe 
above?


Is my theory just wrong? Is there another mechanism that prevents the 
path I suggest above? Or perhaps I'm missing another commit that stops 
it happening, even after your second commit above?


thanks very much,

cheers,
calum.



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-08-06 21:25 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-07-07 18:51 nfs_page_async_flush returning 0 for fatal errors on writeback Calum Mackay
2021-07-07 22:01 ` Trond Myklebust
2021-07-07 23:13   ` Calum Mackay
2021-07-07 23:50     ` Trond Myklebust
2021-07-07 23:57       ` Calum Mackay
2021-08-06 21:25       ` Calum Mackay

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox