* spurious sillyrename after O_DIRECT writes get ENOSPC
@ 2017-12-08 22:16 J. Bruce Fields
2017-12-13 17:18 ` J. Bruce Fields
0 siblings, 1 reply; 7+ messages in thread
From: J. Bruce Fields @ 2017-12-08 22:16 UTC (permalink / raw)
To: linux-nfs; +Cc: hch
Last year Christoph noticed a bug that could result in a file being
unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC:
http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org
It's reproduceable on upstream, over v3 or v4.
I looked into it some more, and it seems to reproduce whenever a write
system call results in multiple WRITE calls, only some of which receive
ENOSPC. I think that's resulting in a leak of the wb_kref on some
nfs_pages (possibly the ones corresponding to the ENOSPC failures?).
Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So
a "rm" on the client (even after the file is closed) results in a
sillyrename.
I'll keep looking at this, but the relevant code is pretty opaque to me
so far. Any ideas welcomed.
--b.
^ permalink raw reply [flat|nested] 7+ messages in thread* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-08 22:16 spurious sillyrename after O_DIRECT writes get ENOSPC J. Bruce Fields @ 2017-12-13 17:18 ` J. Bruce Fields 2017-12-14 13:08 ` Benjamin Coddington 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-13 17:18 UTC (permalink / raw) To: linux-nfs; +Cc: hch On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > Last year Christoph noticed a bug that could result in a file being > unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > > http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > > It's reproduceable on upstream, over v3 or v4. > > I looked into it some more, and it seems to reproduce whenever a write > system call results in multiple WRITE calls, only some of which receive > ENOSPC. I think that's resulting in a leak of the wb_kref on some > nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > a "rm" on the client (even after the file is closed) results in a > sillyrename. > > I'll keep looking at this, but the relevant code is pretty opaque to me > so far. Any ideas welcomed. Actually it looks like a leak of dreq->io_count? That prevents commits from being sent (which I'm also seeing in network traces--the succesfull WRITEs are unstable but never get committed), which means nfs_direct_commit_complete() is never called, and the reference taken on wb_kref in the request_commit case of nfs_direct_write_completion is never put. --b. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-13 17:18 ` J. Bruce Fields @ 2017-12-14 13:08 ` Benjamin Coddington 2017-12-14 16:36 ` J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: Benjamin Coddington @ 2017-12-14 13:08 UTC (permalink / raw) To: J. Bruce Fields; +Cc: linux-nfs, hch On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: >> Last year Christoph noticed a bug that could result in a file being >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: >> >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org >> >> It's reproduceable on upstream, over v3 or v4. >> >> I looked into it some more, and it seems to reproduce whenever a write >> system call results in multiple WRITE calls, only some of which receive >> ENOSPC. I think that's resulting in a leak of the wb_kref on some >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So >> a "rm" on the client (even after the file is closed) results in a >> sillyrename. >> >> I'll keep looking at this, but the relevant code is pretty opaque to me >> so far. Any ideas welcomed. > > Actually it looks like a leak of dreq->io_count? That prevents commits > from being sent (which I'm also seeing in network traces--the succesfull > WRITEs are unstable but never get committed), which means > nfs_direct_commit_complete() is never called, and the reference taken on > wb_kref in the request_commit case of nfs_direct_write_completion is > never put. This sounds to me like the problem Scott's working - he sent a patch yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the mds". I think the the rule should be that once we call nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. However, there are some paths where that is not the case. The callgraph in between nfs_pgheader_init() and nfs_initiate_pgio() in nfs_generic_pg_pgios() for this case might show where we're bailing out early. Ben ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-14 13:08 ` Benjamin Coddington @ 2017-12-14 16:36 ` J. Bruce Fields 2017-12-14 18:55 ` J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-14 16:36 UTC (permalink / raw) To: Benjamin Coddington; +Cc: linux-nfs, hch On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote: > > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > >> Last year Christoph noticed a bug that could result in a file being > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > >> > >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > >> > >> It's reproduceable on upstream, over v3 or v4. > >> > >> I looked into it some more, and it seems to reproduce whenever a write > >> system call results in multiple WRITE calls, only some of which receive > >> ENOSPC. I think that's resulting in a leak of the wb_kref on some > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > >> a "rm" on the client (even after the file is closed) results in a > >> sillyrename. > >> > >> I'll keep looking at this, but the relevant code is pretty opaque to me > >> so far. Any ideas welcomed. > > > > Actually it looks like a leak of dreq->io_count? That prevents commits > > from being sent (which I'm also seeing in network traces--the succesfull > > WRITEs are unstable but never get committed), which means > > nfs_direct_commit_complete() is never called, and the reference taken on > > wb_kref in the request_commit case of nfs_direct_write_completion is > > never put. > > This sounds to me like the problem Scott's working - he sent a patch > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the > mds". > > I think the the rule should be that once we call > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. > However, there are some paths where that is not the case. Yes, I wondered about that.... Unfortunately after some more tests now I was think I was wrong, the dreq_get()s and put()s are balanced and the bug is somewhere else--in the case of my particular reproducer. Argh. But yes I can easily believe there could be a leak there. --b. > > The callgraph in between nfs_pgheader_init() and nfs_initiate_pgio() in > nfs_generic_pg_pgios() for this case might show where we're bailing out > early. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-14 16:36 ` J. Bruce Fields @ 2017-12-14 18:55 ` J. Bruce Fields 2017-12-19 16:56 ` J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-14 18:55 UTC (permalink / raw) To: Benjamin Coddington; +Cc: linux-nfs, hch On Thu, Dec 14, 2017 at 11:36:54AM -0500, J. Bruce Fields wrote: > On Thu, Dec 14, 2017 at 08:08:53AM -0500, Benjamin Coddington wrote: > > > > On 13 Dec 2017, at 12:18, J. Bruce Fields wrote: > > > > > On Fri, Dec 08, 2017 at 05:16:26PM -0500, J. Bruce Fields wrote: > > >> Last year Christoph noticed a bug that could result in a file being > > >> unnecessarily sillyrenamed after O_DIRECT writes get ENOSPC: > > >> > > >> http://lkml.kernel.org/r/20160616150146.GA14015@infradead.org > > >> > > >> It's reproduceable on upstream, over v3 or v4. > > >> > > >> I looked into it some more, and it seems to reproduce whenever a write > > >> system call results in multiple WRITE calls, only some of which receive > > >> ENOSPC. I think that's resulting in a leak of the wb_kref on some > > >> nfs_pages (possibly the ones corresponding to the ENOSPC failures?). > > >> Those nfs_pages in turn hold references on nfs_{lock,open}_contexts. So > > >> a "rm" on the client (even after the file is closed) results in a > > >> sillyrename. > > >> > > >> I'll keep looking at this, but the relevant code is pretty opaque to me > > >> so far. Any ideas welcomed. > > > > > > Actually it looks like a leak of dreq->io_count? That prevents commits > > > from being sent (which I'm also seeing in network traces--the succesfull > > > WRITEs are unstable but never get committed), which means > > > nfs_direct_commit_complete() is never called, and the reference taken on > > > wb_kref in the request_commit case of nfs_direct_write_completion is > > > never put. > > > > This sounds to me like the problem Scott's working - he sent a patch > > yesterday "nfs/pnfs: fix nfs_direct_req ref leak when i/o falls back to the > > mds". > > > > I think the the rule should be that once we call > > nfs_pgio_completion_ops->init_hdr, we have to finish with ->completion. > > However, there are some paths where that is not the case. > > Yes, I wondered about that.... > > Unfortunately after some more tests now I was think I was wrong, the > dreq_get()s and put()s are balanced and the bug is somewhere else--in > the case of my particular reproducer. Argh. But yes I can easily > believe there could be a leak there. So actually what happens is if you do a direct io write where some WRITEs succeed and the one fails, then this: if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { dreq->flags = 0; dreq->error = hdr->error; } clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete never scheduels the commit calls. It looks like that leaves a bunch of nfs_pages on some to-be-committed list, so we end up leaking a bunch of stuff, with the most visible symptom being an unnecessarily sillyrename on close. I can just remove that clear of dreq_flags and that fixes the problem, but I doubt that's correct. --b. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: spurious sillyrename after O_DIRECT writes get ENOSPC 2017-12-14 18:55 ` J. Bruce Fields @ 2017-12-19 16:56 ` J. Bruce Fields 2018-01-16 15:08 ` [PATCH] NFS: commit direct writes even if they fail partially J. Bruce Fields 0 siblings, 1 reply; 7+ messages in thread From: J. Bruce Fields @ 2017-12-19 16:56 UTC (permalink / raw) To: Benjamin Coddington; +Cc: linux-nfs, hch, Trond Myklebust, Anna Schumaker On Thu, Dec 14, 2017 at 01:55:14PM -0500, J. Bruce Fields wrote: > So actually what happens is if you do a direct io write where some > WRITEs succeed and the one fails, then this: > > if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { > dreq->flags = 0; > dreq->error = hdr->error; > } > > clears the NFS_ODIRECT_DO_COMMIT flag, so nfs_direct_write_complete > never scheduels the commit calls. It looks like that leaves a bunch of > nfs_pages on some to-be-committed list, so we end up leaking a bunch of > stuff, with the most visible symptom being an unnecessarily sillyrename > on close. > > I can just remove that clear of dreq_flags and that fixes the problem, > but I doubt that's correct. Or, maybe it is. If *any* WRITE(s) involved in this write might need a commit or reschedule, then surely we should run nfs_direct_write_schedule_work and let it sort them out? I'm having trouble seeing why clearing that field partway through could ever be correct. Trond, Anna, does the following look right? --b. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] NFS: commit direct writes even if they fail partially 2017-12-19 16:56 ` J. Bruce Fields @ 2018-01-16 15:08 ` J. Bruce Fields 0 siblings, 0 replies; 7+ messages in thread From: J. Bruce Fields @ 2018-01-16 15:08 UTC (permalink / raw) To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs, hch, Benjamin Coddington From: "J. Bruce Fields" <bfields@redhat.com> If some of the WRITE calls making up an O_DIRECT write syscall fail, we neglect to commit, even if some of the WRITEs succeed. We also depend on the commit code to free the reference count on the nfs_page taken in the "if (request_commit)" case at the end of nfs_direct_write_completion(). The problem was originally noticed because ENOSPC's encountered partway through a write would result in a closed file being sillyrenamed when it should have been unlinked. Signed-off-by: J. Bruce Fields <bfields@redhat.com> --- fs/nfs/direct.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index d2972d537469..8c10b0562e75 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -775,10 +775,8 @@ static void nfs_direct_write_completion(struct nfs_pgio_header *hdr) spin_lock(&dreq->lock); - if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) { - dreq->flags = 0; + if (test_bit(NFS_IOHDR_ERROR, &hdr->flags)) dreq->error = hdr->error; - } if (dreq->error == 0) { nfs_direct_good_bytes(dreq, hdr); if (nfs_write_need_commit(hdr)) { -- 2.14.3 ^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-01-16 15:08 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2017-12-08 22:16 spurious sillyrename after O_DIRECT writes get ENOSPC J. Bruce Fields 2017-12-13 17:18 ` J. Bruce Fields 2017-12-14 13:08 ` Benjamin Coddington 2017-12-14 16:36 ` J. Bruce Fields 2017-12-14 18:55 ` J. Bruce Fields 2017-12-19 16:56 ` J. Bruce Fields 2018-01-16 15:08 ` [PATCH] NFS: commit direct writes even if they fail partially J. Bruce Fields
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.