linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
@ 2023-10-30  8:39 ChenXiaoSong
  2023-10-30  8:43 ` Greg KH
  0 siblings, 1 reply; 9+ messages in thread
From: ChenXiaoSong @ 2023-10-30  8:39 UTC (permalink / raw)
  To: gregkh, trond.myklebust, chenxiaosong
  Cc: Anna.Schumaker, sashal, liuzhengyuan, huangjinhui, liuyun01,
	huhai, linux-nfs, linux-kernel, stable

Hi Trond and Greg:

LTS 4.19 reported null-ptr-deref BUG as follows:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
Call Trace:
  nfs_inode_add_request+0x1cc/0x5b8
  nfs_setup_write_request+0x1fa/0x1fc
  nfs_writepage_setup+0x2d/0x7d
  nfs_updatepage+0x8b8/0x936
  nfs_write_end+0x61d/0xd45
  generic_perform_write+0x19a/0x3f0
  nfs_file_write+0x2cc/0x6e5
  new_sync_write+0x442/0x560
  __vfs_write+0xda/0xef
  vfs_write+0x176/0x48b
  ksys_write+0x10a/0x1e9
  __se_sys_write+0x24/0x29
  __x64_sys_write+0x79/0x93
  do_syscall_64+0x16d/0x4bb
  entry_SYSCALL_64_after_hwframe+0x5c/0xc1

The reason is: generic_error_remove_page set page->mapping to NULL when 
nfs server have a fatal error:

nfs_updatepage
   nfs_writepage_setup
     nfs_setup_write_request
       nfs_try_to_update_request // return NULL
         nfs_wb_page // return 0
           nfs_writepage_locked // return 0
             nfs_do_writepage // return 0
               nfs_page_async_flush // return 0
                 nfs_error_is_fatal_on_server
                 generic_error_remove_page
                   truncate_inode_page
                     delete_from_page_cache
                       __delete_from_page_cache
                         page_cache_tree_delete
                           page->mapping = NULL // this is point
       nfs_create_request
         req->wb_page    = page // the page is freed
       nfs_inode_add_request
         mapping = page_file_mapping(req->wb_page)
           return page->mapping
         spin_lock(&mapping->private_lock) // mapping is NULL

It is reasonable by reverting the patch "89047634f5ce NFS: Don't 
interrupt file writeout due to fatal errors" to fix this bug?


This patch is one patch of patchset [Fix up soft mounts for 
NFSv4.x](https://lore.kernel.org/all/20190407175912.23528-1-trond.myklebust@hammerspace.com/), 
the patchset replace custom error reporting mechanism. it seams that we 
should merge all the patchset to LTS 4.19, or all patchs should not be 
merged. And the "Fixes:" label is not correct, this patch is a 
refactoring patch, not for fixing bugs.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30  8:39 Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors" ChenXiaoSong
@ 2023-10-30  8:43 ` Greg KH
  2023-10-30  8:54   ` ChenXiaoSong
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2023-10-30  8:43 UTC (permalink / raw)
  To: ChenXiaoSong
  Cc: trond.myklebust, chenxiaosong, Anna.Schumaker, sashal,
	liuzhengyuan, huangjinhui, liuyun01, huhai, linux-nfs,
	linux-kernel, stable

On Mon, Oct 30, 2023 at 04:39:11PM +0800, ChenXiaoSong wrote:
> Hi Trond and Greg:
> 
> LTS 4.19 reported null-ptr-deref BUG as follows:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
> Call Trace:
>  nfs_inode_add_request+0x1cc/0x5b8
>  nfs_setup_write_request+0x1fa/0x1fc
>  nfs_writepage_setup+0x2d/0x7d
>  nfs_updatepage+0x8b8/0x936
>  nfs_write_end+0x61d/0xd45
>  generic_perform_write+0x19a/0x3f0
>  nfs_file_write+0x2cc/0x6e5
>  new_sync_write+0x442/0x560
>  __vfs_write+0xda/0xef
>  vfs_write+0x176/0x48b
>  ksys_write+0x10a/0x1e9
>  __se_sys_write+0x24/0x29
>  __x64_sys_write+0x79/0x93
>  do_syscall_64+0x16d/0x4bb
>  entry_SYSCALL_64_after_hwframe+0x5c/0xc1
> 
> The reason is: generic_error_remove_page set page->mapping to NULL when nfs
> server have a fatal error:
> 
> nfs_updatepage
>   nfs_writepage_setup
>     nfs_setup_write_request
>       nfs_try_to_update_request // return NULL
>         nfs_wb_page // return 0
>           nfs_writepage_locked // return 0
>             nfs_do_writepage // return 0
>               nfs_page_async_flush // return 0
>                 nfs_error_is_fatal_on_server
>                 generic_error_remove_page
>                   truncate_inode_page
>                     delete_from_page_cache
>                       __delete_from_page_cache
>                         page_cache_tree_delete
>                           page->mapping = NULL // this is point
>       nfs_create_request
>         req->wb_page    = page // the page is freed
>       nfs_inode_add_request
>         mapping = page_file_mapping(req->wb_page)
>           return page->mapping
>         spin_lock(&mapping->private_lock) // mapping is NULL
> 
> It is reasonable by reverting the patch "89047634f5ce NFS: Don't interrupt
> file writeout due to fatal errors" to fix this bug?

Try it and see, but note, that came from the 4.19.99 release which was
released years ago, are you sure you are using the most recent 4.19.y
release?

> This patch is one patch of patchset [Fix up soft mounts for NFSv4.x](https://lore.kernel.org/all/20190407175912.23528-1-trond.myklebust@hammerspace.com/),
> the patchset replace custom error reporting mechanism. it seams that we
> should merge all the patchset to LTS 4.19, or all patchs should not be
> merged. And the "Fixes:" label is not correct, this patch is a refactoring
> patch, not for fixing bugs.

If we missed some patches, that should be added on top of the current
tree, please let us know the git commit ids of them after you have
tested them that they work properly, and we will gladly apply them.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30  8:43 ` Greg KH
@ 2023-10-30  8:54   ` ChenXiaoSong
  2023-10-30  8:58     ` Greg KH
  0 siblings, 1 reply; 9+ messages in thread
From: ChenXiaoSong @ 2023-10-30  8:54 UTC (permalink / raw)
  To: Greg KH
  Cc: trond.myklebust, chenxiaosong, Anna.Schumaker, sashal,
	liuzhengyuan, huangjinhui, liuyun01, huhai, linux-nfs,
	linux-kernel, stable

On 2023/10/30 16:43, Greg KH wrote:
> Try it and see, but note, that came from the 4.19.99 release which was
> released years ago, are you sure you are using the most recent 4.19.y
> release?

I use the most recent release 1b540579cf66 Linux 4.19.296.

> If we missed some patches, that should be added on top of the current
> tree, please let us know the git commit ids of them after you have
> tested them that they work properly, and we will gladly apply them.
Merging the entire patchset may not be the best option. Perhaps a better 
choice is to revert this patch. And I would like to see Trond's suggestion.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30  8:54   ` ChenXiaoSong
@ 2023-10-30  8:58     ` Greg KH
  2023-10-30  9:04       ` ChenXiaoSong
  0 siblings, 1 reply; 9+ messages in thread
From: Greg KH @ 2023-10-30  8:58 UTC (permalink / raw)
  To: ChenXiaoSong
  Cc: trond.myklebust, chenxiaosong, Anna.Schumaker, sashal,
	liuzhengyuan, huangjinhui, liuyun01, huhai, linux-nfs,
	linux-kernel, stable

On Mon, Oct 30, 2023 at 04:54:02PM +0800, ChenXiaoSong wrote:
> On 2023/10/30 16:43, Greg KH wrote:
> > Try it and see, but note, that came from the 4.19.99 release which was
> > released years ago, are you sure you are using the most recent 4.19.y
> > release?
> 
> I use the most recent release 1b540579cf66 Linux 4.19.296.
> 
> > If we missed some patches, that should be added on top of the current
> > tree, please let us know the git commit ids of them after you have
> > tested them that they work properly, and we will gladly apply them.
> Merging the entire patchset may not be the best option. Perhaps a better
> choice is to revert this patch. And I would like to see Trond's suggestion.
> 

If you just revert that one patch, is the issue resolved?  And what
about other stable releases that have that change in it?

If this does need to be reverted, please submit a patch that does so and
we can review it that way.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30  8:58     ` Greg KH
@ 2023-10-30  9:04       ` ChenXiaoSong
  2023-10-30  9:19         ` Greg KH
  2023-10-30 14:56         ` Trond Myklebust
  0 siblings, 2 replies; 9+ messages in thread
From: ChenXiaoSong @ 2023-10-30  9:04 UTC (permalink / raw)
  To: Greg KH, trond.myklebust
  Cc: chenxiaosong, Anna.Schumaker, sashal, liuzhengyuan, huangjinhui,
	liuyun01, huhai, linux-nfs, linux-kernel, stable


On 2023/10/30 16:58, Greg KH wrote:
> If you just revert that one patch, is the issue resolved?  And what
> about other stable releases that have that change in it?
>
> If this does need to be reverted, please submit a patch that does so and
> we can review it that way.
>
> thanks,
>
> greg k-h


In my opinion, this patch is not for fixing a bug, but is part of a 
refactoring patchset. The 'Fixes:' tag should not be added.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30  9:04       ` ChenXiaoSong
@ 2023-10-30  9:19         ` Greg KH
  2023-10-30 14:56         ` Trond Myklebust
  1 sibling, 0 replies; 9+ messages in thread
From: Greg KH @ 2023-10-30  9:19 UTC (permalink / raw)
  To: ChenXiaoSong
  Cc: trond.myklebust, chenxiaosong, Anna.Schumaker, sashal,
	liuzhengyuan, huangjinhui, liuyun01, huhai, linux-nfs,
	linux-kernel, stable

On Mon, Oct 30, 2023 at 05:04:35PM +0800, ChenXiaoSong wrote:
> 
> On 2023/10/30 16:58, Greg KH wrote:
> > If you just revert that one patch, is the issue resolved?  And what
> > about other stable releases that have that change in it?
> > 
> > If this does need to be reverted, please submit a patch that does so and
> > we can review it that way.
> > 
> > thanks,
> > 
> > greg k-h
> 
> 
> In my opinion, this patch is not for fixing a bug, but is part of a
> refactoring patchset. The 'Fixes:' tag should not be added.

Nothing we can do about that now, right?  And to try to ask developers
about a change they made in 2019 is a bit rough, don't you think?  If
you think the change is incorrect, please submit a patch to resolve it
after you have tested that it works properly.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30  9:04       ` ChenXiaoSong
  2023-10-30  9:19         ` Greg KH
@ 2023-10-30 14:56         ` Trond Myklebust
  2023-11-17  3:28           ` ChenXiaoSong
  2023-11-17  4:09           ` ChenXiaoSong
  1 sibling, 2 replies; 9+ messages in thread
From: Trond Myklebust @ 2023-10-30 14:56 UTC (permalink / raw)
  To: chenxiaosongemail@foxmail.com, gregkh@linuxfoundation.org
  Cc: linux-nfs@vger.kernel.org, chenxiaosong@kylinos.cn,
	stable@vger.kernel.org, huangjinhui@kylinos.cn,
	liuzhengyuan@kylinos.cn, liuyun01@kylinos.cn, huhai@kylinos.cn,
	sashal@kernel.org, linux-kernel@vger.kernel.org,
	Anna.Schumaker@netapp.com

On Mon, 2023-10-30 at 17:04 +0800, ChenXiaoSong wrote:
> [You don't often get email from chenxiaosongemail@foxmail.com. Learn
> why this is important at
> https://aka.ms/LearnAboutSenderIdentification ]
> 
> On 2023/10/30 16:58, Greg KH wrote:
> > If you just revert that one patch, is the issue resolved?  And what
> > about other stable releases that have that change in it?
> > 
> > If this does need to be reverted, please submit a patch that does
> > so and
> > we can review it that way.
> > 
> > thanks,
> > 
> > greg k-h
> 
> 
> In my opinion, this patch is not for fixing a bug, but is part of a
> refactoring patchset. The 'Fixes:' tag should not be added.
> 
> 

A refactoring is by definition a change that does not affect code
behaviour. It is obvious that this was never intended to be such a
patch.

The reason that the bug is occurring in 4.19.x, and not in the latest
kernels, is because the former is missing another bugfix (one which
actually is missing a "Fixes:" tag).

Can you therefore please check if applying commit 22876f540bdf ("NFS:
Don't call generic_error_remove_page() while holding locks") fixes the
issue.

Note that the latter patch is needed in any case in order to fix a read
deadlock (as indicated on the label).

Thanks,
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30 14:56         ` Trond Myklebust
@ 2023-11-17  3:28           ` ChenXiaoSong
  2023-11-17  4:09           ` ChenXiaoSong
  1 sibling, 0 replies; 9+ messages in thread
From: ChenXiaoSong @ 2023-11-17  3:28 UTC (permalink / raw)
  To: Trond Myklebust, gregkh@linuxfoundation.org
  Cc: linux-nfs@vger.kernel.org, chenxiaosong@kylinos.cn,
	stable@vger.kernel.org, huangjinhui@kylinos.cn,
	liuzhengyuan@kylinos.cn, liuyun01@kylinos.cn, huhai@kylinos.cn,
	sashal@kernel.org, linux-kernel@vger.kernel.org,
	Anna.Schumaker@netapp.com

On 2023/10/30 22:56, Trond Myklebust wrote:
> A refactoring is by definition a change that does not affect code
> behaviour. It is obvious that this was never intended to be such a
> patch.
>
> The reason that the bug is occurring in 4.19.x, and not in the latest
> kernels, is because the former is missing another bugfix (one which
> actually is missing a "Fixes:" tag).
>
> Can you therefore please check if applying commit 22876f540bdf ("NFS:
> Don't call generic_error_remove_page() while holding locks") fixes the
> issue.
>
> Note that the latter patch is needed in any case in order to fix a read
> deadlock (as indicated on the label).
>
> Thanks,
>    Trond
>
After applying commit 22876f540bdf ("NFS: Don't call 
generic_error_remove_page() while holding locks"), I encountered an 
issue of infinite loop:

write ... nfs_updatepage nfs_writepage_setup nfs_setup_write_request 
nfs_try_to_update_request nfs_wb_page if (clear_page_dirty_for_io(page)) 
// true nfs_writepage_locked // return 0 nfs_do_writepage // return 0 
nfs_page_async_flush // return 0 nfs_error_is_fatal_on_server 
nfs_write_error_remove_page SetPageError // instead of 
generic_error_remove_page // loop begin if 
(clear_page_dirty_for_io(page)) // false if (!PagePrivate(page)) // 
false ret = nfs_commit_inode = 0 // loop again, never quit

before applying commit 22876f540bdf ("NFS: Don't call 
generic_error_remove_page() while holding locks"), 
generic_error_remove_page() will clear PG_private, and infinite loop 
will never happen:

generic_error_remove_page truncate_inode_page truncate_cleanup_page 
do_invalidatepage nfs_invalidate_page nfs_wb_page_cancel 
nfs_inode_remove_request ClearPagePrivate(head->wb_page)

If applying this patch, are other patches required? And I cannot 
reproducethe read deadlock bug that the patch want to fix, are there 
specific conditions required to reproduce this read deadlock bug?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors"
  2023-10-30 14:56         ` Trond Myklebust
  2023-11-17  3:28           ` ChenXiaoSong
@ 2023-11-17  4:09           ` ChenXiaoSong
  1 sibling, 0 replies; 9+ messages in thread
From: ChenXiaoSong @ 2023-11-17  4:09 UTC (permalink / raw)
  To: Trond Myklebust, gregkh@linuxfoundation.org
  Cc: linux-nfs@vger.kernel.org, chenxiaosong@kylinos.cn,
	stable@vger.kernel.org, huangjinhui@kylinos.cn,
	liuzhengyuan@kylinos.cn, liuyun01@kylinos.cn, huhai@kylinos.cn,
	sashal@kernel.org, linux-kernel@vger.kernel.org,
	Anna.Schumaker@netapp.com

On 2023/10/30 22:56, Trond Myklebust wrote:
> A refactoring is by definition a change that does not affect code
> behaviour. It is obvious that this was never intended to be such a
> patch.
>
> The reason that the bug is occurring in 4.19.x, and not in the latest
> kernels, is because the former is missing another bugfix (one which
> actually is missing a "Fixes:" tag).
>
> Can you therefore please check if applying commit 22876f540bdf ("NFS:
> Don't call generic_error_remove_page() while holding locks") fixes the
> issue.
>
> Note that the latter patch is needed in any case in order to fix a read
> deadlock (as indicated on the label).
>
> Thanks,
>    Trond

Sorry, the previous email had formatting issues. I'll resend it.


After applying commit 22876f540bdf ("NFS: Don't call 
generic_error_remove_page() while holding locks"), I encountered an 
issue of infinite loop:

write
   ...
   nfs_updatepage
     nfs_writepage_setup
       nfs_setup_write_request
         nfs_try_to_update_request
           nfs_wb_page
             if (clear_page_dirty_for_io(page)) // true
             nfs_writepage_locked // return 0
               nfs_do_writepage // return 0
                 nfs_page_async_flush // return 0
                   nfs_error_is_fatal_on_server
                   nfs_write_error_remove_page
                     SetPageError // instead of generic_error_remove_page
             // loop begin
             if (clear_page_dirty_for_io(page)) // false
             if (!PagePrivate(page)) // false
             ret = nfs_commit_inode = 0
             // loop again, never quit


before applying commit 22876f540bdf ("NFS: Don't call 
generic_error_remove_page() while holding locks"), 
generic_error_remove_page() will clear PG_private, and infinite loop 
will never happen:

generic_error_remove_page
   truncate_inode_page
     truncate_cleanup_page
       do_invalidatepage
         nfs_invalidate_page
           nfs_wb_page_cancel
             nfs_inode_remove_request
               ClearPagePrivate(head->wb_page)


If applying this patch, are other patches required? And I cannot 
reproducethe read deadlock bug that the patch want to fix, are there 
specific conditions required to reproduce this read deadlock bug?



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-11-17  4:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-30  8:39 Question about LTS 4.19 patch "89047634f5ce NFS: Don't interrupt file writeout due to fatal errors" ChenXiaoSong
2023-10-30  8:43 ` Greg KH
2023-10-30  8:54   ` ChenXiaoSong
2023-10-30  8:58     ` Greg KH
2023-10-30  9:04       ` ChenXiaoSong
2023-10-30  9:19         ` Greg KH
2023-10-30 14:56         ` Trond Myklebust
2023-11-17  3:28           ` ChenXiaoSong
2023-11-17  4:09           ` ChenXiaoSong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).