* spurious sillyrename after O_DIRECT writes get ENOSPC
@ 2017-12-08 22:16 J. Bruce Fields
2017-12-13 17:18 ` J. Bruce Fields
0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-08 22:16 UTC (permalink / raw)
To: linux-nfs; +Cc: hch
Last year Christoph noticed a bug that could result in a file being
unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org
It's reproduceable on upstream, over v3 or v4.
I looked into it some more, and it seems to reproduce whenever a write
system call results in multiple WRITE calls, only some of which receive
ENOSPC. I think that's resulting in a leak of the wb_kref on some
nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So
a "rm" on the client (even after the file is closed) results in a
sillyrename.
I'll keep looking at this, but the relevant code is pretty opaque to me
so far. Any ideas welcomed.
--b.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-08 22:16 spurious sillyrename after O_DIRECT writes get ENOSPC J. Bruce Fields @ 2017-12-13 17:18 ` J. Bruce Fields 2017-12-14 13:08 ` Benjamin Coddington 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-13 17:18 UTC (permalink / raw) To: linux-nfs; +Cc: hch On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > Last year Christoph noticed a bug that could result in a file being > unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > > http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > > It's reproduceable on upstream, over v3 or v4. > > I looked into it some more, and it seems to reproduce whenever a write > system call results in multiple WRITE calls, only some of which receive > ENOSPC. I think that's resulting in a leak of the wb_kref on some > nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > a "rm" on the client (even after the file is closed) results in a > sillyrename. > > I'll keep looking at this, but the relevant code is pretty opaque to me > so far. Any ideas welcomed. Actually it looks like a leak of dreq->io_count? That prevents commits from being sent (which I'm also seeing in network traces--the succesfull WRITEs are unstable but never get committed), which means nfs_direct_commit_complete() is never called, and the reference taken on wb_kref in the request_commit case of nfs_direct_write_completion is never put. --b. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-13 17:18 ` J. Bruce Fields @ 2017-12-14 13:08 ` Benjamin Coddington 2017-12-14 16:36 ` J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: Benjamin Coddington @ 2017-12-14 13:08 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, hch On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: >> Last year Christoph noticed a bug that could result in a file being >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: >> >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org >> >> It's reproduceable on upstream, over v3 or v4. >> >> I looked into it some more, and it seems to reproduce whenever a write >> system call results in multiple WRITE calls, only some of which receive >> ENOSPC. I think that's resulting in a leak of the wb_kref on some >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So >> a "rm" on the client (even after the file is closed) results in a >> sillyrename. >> >> I'll keep looking at this, but the relevant code is pretty opaque to me >> so far. Any ideas welcomed. > > Actually it looks like a leak of dreq->io_count? That prevents commits > from being sent (which I'm also seeing in network traces--the succesfull > WRITEs are unstable but never get committed), which means > nfs_direct_commit_complete() is never called, and the reference taken on > wb_kref in the request_commit case of nfs_direct_write_completion is > never put. This sounds to me like the problem Scott's working - he sent a patch yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds". I think the the rule should be that once we call nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. However, there are some paths where that is not the case. The callgraph in between nfs_pgheader_init() and nfs_initiate_pgio() in nfs_generic_pg_pgios() for this case might show where we're bailing out early. Ben ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-14 13:08 ` Benjamin Coddington @ 2017-12-14 16:36 ` J. Bruce Fields 2017-12-14 18:55 ` J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-14 16:36 UTC (permalink / raw) To: Benjamin Coddington; +Cc: linux-nfs, hch On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote: > > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > >> Last year Christoph noticed a bug that could result in a file being > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > >> > >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > >> > >> It's reproduceable on upstream, over v3 or v4. > >> > >> I looked into it some more, and it seems to reproduce whenever a write > >> system call results in multiple WRITE calls, only some of which receive > >> ENOSPC. I think that's resulting in a leak of the wb_kref on some > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > >> a "rm" on the client (even after the file is closed) results in a > >> sillyrename. > >> > >> I'll keep looking at this, but the relevant code is pretty opaque to me > >> so far. Any ideas welcomed. > > > > Actually it looks like a leak of dreq->io_count? That prevents commits > > from being sent (which I'm also seeing in network traces--the succesfull > > WRITEs are unstable but never get committed), which means > > nfs_direct_commit_complete() is never called, and the reference taken on > > wb_kref in the request_commit case of nfs_direct_write_completion is > > never put. > > This sounds to me like the problem Scott's working - he sent a patch > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the > mds". > > I think the the rule should be that once we call > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. > However, there are some paths where that is not the case. Yes, I wondered about that.... Unfortunately after some more tests now I was think I was wrong, the dreq_get()s and put()s are balanced and the bug is somewhere else--in the case of my particular reproducer. Argh. But yes I can easily believe there could be a leak there. --b. > > The callgraph in between nfs_pgheader_init() and nfs_initiate_pgio() in > nfs_generic_pg_pgios() for this case might show where we're bailing out > early. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-14 16:36 ` J. Bruce Fields @ 2017-12-14 18:55 ` J. Bruce Fields 2017-12-19 16:56 ` J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-14 18:55 UTC (permalink / raw) To: Benjamin Coddington; +Cc: linux-nfs, hch On Thu, Dec 14, 2017 at 11:36:54AM -0500, J. Bruce Fields wrote: > On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote: > > > > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > > > > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > > >> Last year Christoph noticed a bug that could result in a file being > > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > > >> > > >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > > >> > > >> It's reproduceable on upstream, over v3 or v4. > > >> > > >> I looked into it some more, and it seems to reproduce whenever a write > > >> system call results in multiple WRITE calls, only some of which receive > > >> ENOSPC. I think that's resulting in a leak of the wb_kref on some > > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > > >> a "rm" on the client (even after the file is closed) results in a > > >> sillyrename. > > >> > > >> I'll keep looking at this, but the relevant code is pretty opaque to me > > >> so far. Any ideas welcomed. > > > > > > Actually it looks like a leak of dreq->io_count? That prevents commits > > > from being sent (which I'm also seeing in network traces--the succesfull > > > WRITEs are unstable but never get committed), which means > > > nfs_direct_commit_complete() is never called, and the reference taken on > > > wb_kref in the request_commit case of nfs_direct_write_completion is > > > never put. > > > > This sounds to me like the problem Scott's working - he sent a patch > > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the > > mds". > > > > I think the the rule should be that once we call > > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. > > However, there are some paths where that is not the case. > > Yes, I wondered about that.... > > Unfortunately after some more tests now I was think I was wrong, the > dreq_get()s and put()s are balanced and the bug is somewhere else--in > the case of my particular reproducer. Argh. But yes I can easily > believe there could be a leak there. So actually what happens is if you do a direct io write where some WRITEs succeed and the one fails, then this: if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { dreq->flags = 0; dreq->error = hdr->error; } clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete never scheduels the commit calls. It looks like that leaves a bunch of nfs_pages on some to-be-committed list, so we end up leaking a bunch of stuff, with the most visible symptom being an unnecessarily sillyrename on close. I can just remove that clear of dreq_flags and that fixes the problem, but I doubt that's correct. --b. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-14 18:55 ` J. Bruce Fields @ 2017-12-19 16:56 ` J. Bruce Fields 2018-01-16 15:08 ` [PATCH] NFS: commit direct writes even if they fail partially J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-19 16:56 UTC (permalink / raw) To: Benjamin Coddington; +Cc: linux-nfs, hch, Trond Myklebust, Anna Schumaker On Thu, Dec 14, 2017 at 01:55:14PM -0500, J. Bruce Fields wrote: > So actually what happens is if you do a direct io write where some > WRITEs succeed and the one fails, then this: > > if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { > dreq->flags = 0; > dreq->error = hdr->error; > } > > clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete > never scheduels the commit calls. It looks like that leaves a bunch of > nfs_pages on some to-be-committed list, so we end up leaking a bunch of > stuff, with the most visible symptom being an unnecessarily sillyrename > on close. > > I can just remove that clear of dreq_flags and that fixes the problem, > but I doubt that's correct. Or, maybe it is. If *any* WRITE(s) involved in this write might need a commit or reschedule, then surely we should run nfs_direct_write_schedule_work and let it sort them out? I'm having trouble seeing why clearing that field partway through could ever be correct. Trond, Anna, does the following look right? --b. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] NFS: commit direct writes even if they fail partially 2017-12-19 16:56 ` J. Bruce Fields @ 2018-01-16 15:08 ` J. Bruce Fields 0 siblings, 0 replies; 7+ messages in thread From: J. Bruce Fields @ 2018-01-16 15:08 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, hch, Benjamin Coddington From: "J. Bruce Fields" <bfields@redhat.com> If some of the WRITE calls making up an O_DIRECT write syscall fail, we neglect to commit, even if some of the WRITEs succeed. We also depend on the commit code to free the reference count on the nfs_page taken in the "if (request_commit)" case at the end of nfs_direct_write_completion(). The problem was originally noticed because ENOSPC's encountered partway through a write would result in a closed file being sillyrenamed when it should have been unlinked. Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/nfs/direct.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index d2972d537469..8c10b0562e75 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -775,10 +775,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr) spin_lock(&dreq->lock); - if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { - dreq->flags = 0; + if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) dreq->error = hdr->error; - } if (dreq->error == 0) { nfs_direct_good_bytes(dreq, hdr); if (nfs_write_need_commit(hdr)) { -- 2.14.3 ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-01-16 15:08 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-12-08 22:16 spurious sillyrename after O_DIRECT writes get ENOSPC J. Bruce Fields 2017-12-13 17:18 ` J. Bruce Fields 2017-12-14 13:08 ` Benjamin Coddington 2017-12-14 16:36 ` J. Bruce Fields 2017-12-14 18:55 ` J. Bruce Fields 2017-12-19 16:56 ` J. Bruce Fields 2018-01-16 15:08 ` [PATCH] NFS: commit direct writes even if they fail partially J. Bruce Fields
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).