* [PATCH 0/2] nfs: fix UAF in direct writes @ 2024-03-01 16:49 Josef Bacik 2024-03-01 16:49 ` [PATCH 1/2] nfs: properly protect nfs_direct_req fields Josef Bacik 2024-03-01 16:49 ` [PATCH 2/2] nfs: fix UAF in direct writes Josef Bacik 0 siblings, 2 replies; 6+ messages in thread From: Josef Bacik @ 2024-03-01 16:49 UTC (permalink / raw) To: trond.myklebust, anna, linux-nfs Hello, Here are two patches to address a UAF we've been seeing in our stress testing with NFS internally. The first one is to clean up the accesses to the nfs_direct_req fields which is more a matter of correctness than an actual observed problem. The second patch is to address the problem itself. I've been running these patches on my stress test for a few hours now, I would appreciate feedback and review to see if this is the correct way to fix the problem. Thanks, Josef Josef Bacik (2): nfs: properly protect nfs_direct_req fields nfs: fix UAF in direct writes fs/nfs/direct.c | 18 +++++++++++++++--- fs/nfs/write.c | 2 +- include/linux/nfs_fs.h | 1 + 3 files changed, 17 insertions(+), 4 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] nfs: properly protect nfs_direct_req fields 2024-03-01 16:49 [PATCH 0/2] nfs: fix UAF in direct writes Josef Bacik @ 2024-03-01 16:49 ` Josef Bacik 2024-03-04 13:14 ` Jeff Layton 2024-03-01 16:49 ` [PATCH 2/2] nfs: fix UAF in direct writes Josef Bacik 1 sibling, 1 reply; 6+ messages in thread From: Josef Bacik @ 2024-03-01 16:49 UTC (permalink / raw) To: trond.myklebust, anna, linux-nfs We protect accesses to the nfs_direct_req fields with the dreq->lock ever where except nfs_direct_commit_complete. This isn't a huge deal, but it does lead to confusion, and we could potentially end up setting NFS_ODIRECT_RESCHED_WRITES in one thread where we've had an error in another. Clean this up to properly protect ->error and ->flags in the commit completion path. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/nfs/direct.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index c03926a1cc73..befcc167e25f 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -606,6 +606,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) trace_nfs_direct_commit_complete(dreq); + spin_lock(&dreq->lock); if (status < 0) { /* Errors in commit are fatal */ dreq->error = status; @@ -613,6 +614,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) } else { status = dreq->error; } + spin_unlock(&dreq->lock); nfs_init_cinfo_from_dreq(&cinfo, dreq); @@ -625,7 +627,10 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) spin_unlock(&dreq->lock); nfs_release_request(req); } else if (!nfs_write_match_verf(verf, req)) { - dreq->flags = NFS_ODIRECT_RESCHED_WRITES; + spin_lock(&dreq->lock); + if (dreq->flags == 0) + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; + spin_unlock(&dreq->lock); /* * Despite the reboot, the write was successful, * so reset wb_nio. -- 2.43.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 1/2] nfs: properly protect nfs_direct_req fields 2024-03-01 16:49 ` [PATCH 1/2] nfs: properly protect nfs_direct_req fields Josef Bacik @ 2024-03-04 13:14 ` Jeff Layton 0 siblings, 0 replies; 6+ messages in thread From: Jeff Layton @ 2024-03-04 13:14 UTC (permalink / raw) To: Josef Bacik, trond.myklebust, anna, linux-nfs On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote: > We protect accesses to the nfs_direct_req fields with the dreq->lock > ever where except nfs_direct_commit_complete. This isn't a huge deal, > but it does lead to confusion, and we could potentially end up setting > NFS_ODIRECT_RESCHED_WRITES in one thread where we've had an error in > another. Clean this up to properly protect ->error and ->flags in the > commit completion path. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/nfs/direct.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c > index c03926a1cc73..befcc167e25f 100644 > --- a/fs/nfs/direct.c > +++ b/fs/nfs/direct.c > @@ -606,6 +606,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) > > trace_nfs_direct_commit_complete(dreq); > > + spin_lock(&dreq->lock); > if (status < 0) { > /* Errors in commit are fatal */ > dreq->error = status; > @@ -613,6 +614,7 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) > } else { > status = dreq->error; > } > + spin_unlock(&dreq->lock); > > nfs_init_cinfo_from_dreq(&cinfo, dreq); > > @@ -625,7 +627,10 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data) > spin_unlock(&dreq->lock); > nfs_release_request(req); > } else if (!nfs_write_match_verf(verf, req)) { > - dreq->flags = NFS_ODIRECT_RESCHED_WRITES; > + spin_lock(&dreq->lock); > + if (dreq->flags == 0) > + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; > + spin_unlock(&dreq->lock); > /* > * Despite the reboot, the write was successful, > * so reset wb_nio. Seems reasonable, even if just for consistency's sake: Reviewed-by: Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 2/2] nfs: fix UAF in direct writes 2024-03-01 16:49 [PATCH 0/2] nfs: fix UAF in direct writes Josef Bacik 2024-03-01 16:49 ` [PATCH 1/2] nfs: properly protect nfs_direct_req fields Josef Bacik @ 2024-03-01 16:49 ` Josef Bacik 2024-03-04 13:30 ` Jeff Layton 2024-03-04 17:57 ` Trond Myklebust 1 sibling, 2 replies; 6+ messages in thread From: Josef Bacik @ 2024-03-01 16:49 UTC (permalink / raw) To: trond.myklebust, anna, linux-nfs In production we have been hitting the following warning consistently ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0 Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] RIP: 0010:refcount_warn_saturate+0x9c/0xe0 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x9f/0x130 ? refcount_warn_saturate+0x9c/0xe0 ? report_bug+0xcc/0x150 ? handle_bug+0x3d/0x70 ? exc_invalid_op+0x16/0x40 ? asm_exc_invalid_op+0x16/0x20 ? refcount_warn_saturate+0x9c/0xe0 nfs_direct_write_schedule_work+0x237/0x250 [nfs] process_one_work+0x12f/0x4a0 worker_thread+0x14e/0x3b0 ? ZSTD_getCParams_internal+0x220/0x220 kthread+0xdc/0x120 ? __btf_name_valid+0xa0/0xa0 ret_from_fork+0x1f/0x30 This is because we're completing the nfs_direct_request twice in a row. The source of this is when we have our commit requests to submit, we process them and send them off, and then in the completion path for the commit requests we have if (nfs_commit_end(cinfo.mds)) nfs_direct_write_complete(dreq); However since we're submitting asynchronous requests we sometimes have one that completes before we submit the next one, so we end up calling complete on the nfs_direct_request twice. The only other place we use nfs_generic_commit_list() is in __nfs_commit_inode, which wraps this call in a nfs_commit_begin(); nfs_commit_end(); Which is a common pattern for this style of completion handling, one that is also repeated in the direct code with get_dreq()/put_dreq() calls around where we process events as well as in the completion paths. Fix this by using the same pattern for the commit requests. Before with my 200 node rocksdb stress running this warning would pop every 10ish minutes. With my patch the stress test has been running for several hours without popping. Signed-off-by: Josef Bacik <josef@toxicpanda.com> --- fs/nfs/direct.c | 11 +++++++++-- fs/nfs/write.c | 2 +- include/linux/nfs_fs.h | 1 + 3 files changed, 11 insertions(+), 3 deletions(-) diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c index befcc167e25f..6b8798d01e3a 100644 --- a/fs/nfs/direct.c +++ b/fs/nfs/direct.c @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq) LIST_HEAD(mds_list); nfs_init_cinfo_from_dreq(&cinfo, dreq); + nfs_commit_begin(cinfo.mds); nfs_scan_commit(dreq->inode, &mds_list, &cinfo); res = nfs_generic_commit_list(dreq->inode, &mds_list, 0, &cinfo); - if (res < 0) /* res == -ENOMEM */ - nfs_direct_write_reschedule(dreq); + if (res < 0) { /* res == -ENOMEM */ + spin_lock(&dreq->lock); + if (dreq->flags == 0) + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; + spin_unlock(&dreq->lock); + } + if (nfs_commit_end(cinfo.mds)) + nfs_direct_write_complete(dreq); } static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index bb79d3a886ae..5d9dc6c05325 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct nfs_mds_commit_info *cinfo) !atomic_read(&cinfo->rpcs_out)); } -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) { atomic_inc(&cinfo->rpcs_out); } diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index f5ce7b101146..d59116ac8209 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio); extern int nfs_commit_inode(struct inode *, int); extern struct nfs_commit_data *nfs_commitdata_alloc(void); extern void nfs_commit_free(struct nfs_commit_data *data); +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo); bool nfs_commit_end(struct nfs_mds_commit_info *cinfo); static inline bool nfs_have_writebacks(const struct inode *inode) -- 2.43.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] nfs: fix UAF in direct writes 2024-03-01 16:49 ` [PATCH 2/2] nfs: fix UAF in direct writes Josef Bacik @ 2024-03-04 13:30 ` Jeff Layton 2024-03-04 17:57 ` Trond Myklebust 1 sibling, 0 replies; 6+ messages in thread From: Jeff Layton @ 2024-03-04 13:30 UTC (permalink / raw) To: Josef Bacik, trond.myklebust, anna, linux-nfs On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote: > In production we have been hitting the following warning consistently > > ------------[ cut here ]------------ > refcount_t: underflow; use-after-free. > WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 refcount_warn_saturate+0x9c/0xe0 > Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] > RIP: 0010:refcount_warn_saturate+0x9c/0xe0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __warn+0x9f/0x130 > ? refcount_warn_saturate+0x9c/0xe0 > ? report_bug+0xcc/0x150 > ? handle_bug+0x3d/0x70 > ? exc_invalid_op+0x16/0x40 > ? asm_exc_invalid_op+0x16/0x20 > ? refcount_warn_saturate+0x9c/0xe0 > nfs_direct_write_schedule_work+0x237/0x250 [nfs] > process_one_work+0x12f/0x4a0 > worker_thread+0x14e/0x3b0 > ? ZSTD_getCParams_internal+0x220/0x220 > kthread+0xdc/0x120 > ? __btf_name_valid+0xa0/0xa0 > ret_from_fork+0x1f/0x30 > > This is because we're completing the nfs_direct_request twice in a row. > > The source of this is when we have our commit requests to submit, we > process them and send them off, and then in the completion path for the > commit requests we have > > if (nfs_commit_end(cinfo.mds)) > nfs_direct_write_complete(dreq); > > However since we're submitting asynchronous requests we sometimes have > one that completes before we submit the next one, so we end up calling > complete on the nfs_direct_request twice. > > The only other place we use nfs_generic_commit_list() is in > __nfs_commit_inode, which wraps this call in a > > nfs_commit_begin(); > nfs_commit_end(); > > Which is a common pattern for this style of completion handling, one > that is also repeated in the direct code with get_dreq()/put_dreq() > calls around where we process events as well as in the completion paths. > > Fix this by using the same pattern for the commit requests. > > Before with my 200 node rocksdb stress running this warning would pop > every 10ish minutes. With my patch the stress test has been running for > several hours without popping. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> > --- > fs/nfs/direct.c | 11 +++++++++-- > fs/nfs/write.c | 2 +- > include/linux/nfs_fs.h | 1 + > 3 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c > index befcc167e25f..6b8798d01e3a 100644 > --- a/fs/nfs/direct.c > +++ b/fs/nfs/direct.c > @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct nfs_direct_req *dreq) > LIST_HEAD(mds_list); > > nfs_init_cinfo_from_dreq(&cinfo, dreq); > + nfs_commit_begin(cinfo.mds); > nfs_scan_commit(dreq->inode, &mds_list, &cinfo); > res = nfs_generic_commit_list(dreq->inode, &mds_list, 0, &cinfo); > - if (res < 0) /* res == -ENOMEM */ > - nfs_direct_write_reschedule(dreq); > + if (res < 0) { /* res == -ENOMEM */ > + spin_lock(&dreq->lock); > + if (dreq->flags == 0) > + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; > + spin_unlock(&dreq->lock); > + } > + if (nfs_commit_end(cinfo.mds)) > + nfs_direct_write_complete(dreq); Before, it was calling nfs_direct_write_reschedule directly, but with the above change it's now just setting NFS_ODIRECT_RESCHED_WRITES and then queueing the completion workqueue job if it's the last reference (which should then go and reschedule the job, I think). That may be a reasonable change to make, but I think it merits some justification and mention in the changelog. Was that change necessary for some reason? If so, why? > } > > static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq) > diff --git a/fs/nfs/write.c b/fs/nfs/write.c > index bb79d3a886ae..5d9dc6c05325 100644 > --- a/fs/nfs/write.c > +++ b/fs/nfs/write.c > @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct nfs_mds_commit_info *cinfo) > !atomic_read(&cinfo->rpcs_out)); > } > > -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) > +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) > { > atomic_inc(&cinfo->rpcs_out); > } > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h > index f5ce7b101146..d59116ac8209 100644 > --- a/include/linux/nfs_fs.h > +++ b/include/linux/nfs_fs.h > @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio); > extern int nfs_commit_inode(struct inode *, int); > extern struct nfs_commit_data *nfs_commitdata_alloc(void); > extern void nfs_commit_free(struct nfs_commit_data *data); > +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo); > bool nfs_commit_end(struct nfs_mds_commit_info *cinfo); > > static inline bool nfs_have_writebacks(const struct inode *inode) -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] nfs: fix UAF in direct writes 2024-03-01 16:49 ` [PATCH 2/2] nfs: fix UAF in direct writes Josef Bacik 2024-03-04 13:30 ` Jeff Layton @ 2024-03-04 17:57 ` Trond Myklebust 1 sibling, 0 replies; 6+ messages in thread From: Trond Myklebust @ 2024-03-04 17:57 UTC (permalink / raw) To: anna@kernel.org, josef@toxicpanda.com, linux-nfs@vger.kernel.org On Fri, 2024-03-01 at 11:49 -0500, Josef Bacik wrote: > In production we have been hitting the following warning consistently > > ------------[ cut here ]------------ > refcount_t: underflow; use-after-free. > WARNING: CPU: 17 PID: 1800359 at lib/refcount.c:28 > refcount_warn_saturate+0x9c/0xe0 > Workqueue: nfsiod nfs_direct_write_schedule_work [nfs] > RIP: 0010:refcount_warn_saturate+0x9c/0xe0 > PKRU: 55555554 > Call Trace: > <TASK> > ? __warn+0x9f/0x130 > ? refcount_warn_saturate+0x9c/0xe0 > ? report_bug+0xcc/0x150 > ? handle_bug+0x3d/0x70 > ? exc_invalid_op+0x16/0x40 > ? asm_exc_invalid_op+0x16/0x20 > ? refcount_warn_saturate+0x9c/0xe0 > nfs_direct_write_schedule_work+0x237/0x250 [nfs] > process_one_work+0x12f/0x4a0 > worker_thread+0x14e/0x3b0 > ? ZSTD_getCParams_internal+0x220/0x220 > kthread+0xdc/0x120 > ? __btf_name_valid+0xa0/0xa0 > ret_from_fork+0x1f/0x30 > > This is because we're completing the nfs_direct_request twice in a > row. > > The source of this is when we have our commit requests to submit, we > process them and send them off, and then in the completion path for > the > commit requests we have > > if (nfs_commit_end(cinfo.mds)) > nfs_direct_write_complete(dreq); > > However since we're submitting asynchronous requests we sometimes > have > one that completes before we submit the next one, so we end up > calling > complete on the nfs_direct_request twice. > > The only other place we use nfs_generic_commit_list() is in > __nfs_commit_inode, which wraps this call in a > > nfs_commit_begin(); > nfs_commit_end(); > > Which is a common pattern for this style of completion handling, one > that is also repeated in the direct code with get_dreq()/put_dreq() > calls around where we process events as well as in the completion > paths. > > Fix this by using the same pattern for the commit requests. > > Before with my 200 node rocksdb stress running this warning would pop > every 10ish minutes. With my patch the stress test has been running > for > several hours without popping. > > Signed-off-by: Josef Bacik <josef@toxicpanda.com> I think this deserves to be a stable patch. Thanks for finding it! > --- > fs/nfs/direct.c | 11 +++++++++-- > fs/nfs/write.c | 2 +- > include/linux/nfs_fs.h | 1 + > 3 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c > index befcc167e25f..6b8798d01e3a 100644 > --- a/fs/nfs/direct.c > +++ b/fs/nfs/direct.c > @@ -672,10 +672,17 @@ static void nfs_direct_commit_schedule(struct > nfs_direct_req *dreq) > LIST_HEAD(mds_list); > > nfs_init_cinfo_from_dreq(&cinfo, dreq); > + nfs_commit_begin(cinfo.mds); > nfs_scan_commit(dreq->inode, &mds_list, &cinfo); > res = nfs_generic_commit_list(dreq->inode, &mds_list, 0, > &cinfo); > - if (res < 0) /* res == -ENOMEM */ > - nfs_direct_write_reschedule(dreq); > + if (res < 0) { /* res == -ENOMEM */ > + spin_lock(&dreq->lock); > + if (dreq->flags == 0) > + dreq->flags = NFS_ODIRECT_RESCHED_WRITES; > + spin_unlock(&dreq->lock); > + } > + if (nfs_commit_end(cinfo.mds)) > + nfs_direct_write_complete(dreq); > } > > static void nfs_direct_write_clear_reqs(struct nfs_direct_req *dreq) > diff --git a/fs/nfs/write.c b/fs/nfs/write.c > index bb79d3a886ae..5d9dc6c05325 100644 > --- a/fs/nfs/write.c > +++ b/fs/nfs/write.c > @@ -1650,7 +1650,7 @@ static int wait_on_commit(struct > nfs_mds_commit_info *cinfo) > !atomic_read(&cinfo- > >rpcs_out)); > } > > -static void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) > +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo) > { > atomic_inc(&cinfo->rpcs_out); > } > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h > index f5ce7b101146..d59116ac8209 100644 > --- a/include/linux/nfs_fs.h > +++ b/include/linux/nfs_fs.h > @@ -611,6 +611,7 @@ int nfs_wb_folio_cancel(struct inode *inode, > struct folio *folio); > extern int nfs_commit_inode(struct inode *, int); > extern struct nfs_commit_data *nfs_commitdata_alloc(void); > extern void nfs_commit_free(struct nfs_commit_data *data); > +void nfs_commit_begin(struct nfs_mds_commit_info *cinfo); > bool nfs_commit_end(struct nfs_mds_commit_info *cinfo); > > static inline bool nfs_have_writebacks(const struct inode *inode) -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-03-04 17:57 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-03-01 16:49 [PATCH 0/2] nfs: fix UAF in direct writes Josef Bacik 2024-03-01 16:49 ` [PATCH 1/2] nfs: properly protect nfs_direct_req fields Josef Bacik 2024-03-04 13:14 ` Jeff Layton 2024-03-01 16:49 ` [PATCH 2/2] nfs: fix UAF in direct writes Josef Bacik 2024-03-04 13:30 ` Jeff Layton 2024-03-04 17:57 ` Trond Myklebust
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).