* [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests
@ 2024-05-15  8:45 libaokun
  2024-05-15  8:45 ` [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests libaokun
                   ` (12 more replies)
  0 siblings, 13 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
Hi all!
This is the second version of this patch series. Thank you, Jia Zhu and
Jingbo Xu, for the feedback in the previous version.
We've been testing ondemand mode for cachefiles since January, and we're
almost done. We hit a lot of issues during the testing period, and this
patch set fixes some of the issues related to ondemand requests.
The patches have passed internal testing without regression.
The following is a brief overview of the patches, see the patches for
more details.
Patch 1-5: Holding reference counts of reqs and objects on read requests
to avoid malicious restore leading to use-after-free.
Patch 6-10: Add some consistency checks to copen/cread/get_fd to avoid
malicious copen/cread/close fd injections causing use-after-free or hung.
Patch 11: When cache is marked as CACHEFILES_DEAD, flush all requests,
otherwise the kernel may be hung. since this state is irreversible, the
daemon can read open requests but cannot copen.
Patch 12: Allow interrupting a read request being processed by killing
the read process as a way of avoiding hung in some special cases.
Comments and questions are, as always, welcome.
Please let me know what you think.
Thanks,
Baokun
Changes since v1:
  * Collect RVB from Jia Zhu and Jingbo Xu.(Thanks for your review!)
  * Pathch 1: Add Fixes tag and enrich the commit message.
  * Pathch 7: Add function graph comments.
  * Pathch 8: Update commit message and comments.
  * Pathch 9: Enriched commit msg.
Baokun Li (11):
  cachefiles: remove request from xarry during flush requests
  cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()
  cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  cachefiles: fix slab-use-after-free in
    cachefiles_ondemand_daemon_read()
  cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
  cachefiles: add consistency check for copen/cread
  cachefiles: add spin_lock for cachefiles_ondemand_info
  cachefiles: never get a new anonymous fd if ondemand_id is valid
  cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  cachefiles: flush all requests after setting CACHEFILES_DEAD
  cachefiles: make on-demand read killable
Zizhi Wo (1):
  cachefiles: Set object to close if ondemand_id < 0 in copen
 fs/cachefiles/daemon.c            |   3 +-
 fs/cachefiles/internal.h          |   5 +
 fs/cachefiles/ondemand.c          | 218 ++++++++++++++++++++++--------
 include/trace/events/cachefiles.h |   8 +-
 4 files changed, 177 insertions(+), 57 deletions(-)
-- 
2.39.2
^ permalink raw reply	[flat|nested] 38+ messages in thread
* [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  2:20   ` Gao Xiang
  2024-05-20  7:09   ` Jingbo Xu
  2024-05-15  8:45 ` [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read() libaokun
                   ` (11 subsequent siblings)
  12 siblings, 2 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
Even with CACHEFILES_DEAD set, we can still read the requests, so in the
following concurrency the request may be used after it has been freed:
     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
                                  // close dev fd
                                  cachefiles_flush_reqs
                                   complete(&REQ_A->done)
   kfree(REQ_A)
              xa_lock(&cache->reqs);
              cachefiles_ondemand_select_req
                req->msg.opcode != CACHEFILES_OP_READ
                // req use-after-free !!!
              xa_unlock(&cache->reqs);
                                   xa_destroy(&cache->reqs)
Hence remove requests from cache->reqs when flushing them to avoid
accessing freed requests.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
---
 fs/cachefiles/daemon.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index 6465e2574230..ccb7b707ea4b 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -159,6 +159,7 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
 	xa_for_each(xa, index, req) {
 		req->error = -EIO;
 		complete(&req->done);
+		__xa_erase(xa, index);
 	}
 	xa_unlock(xa);
 
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
  2024-05-15  8:45 ` [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  2:23   ` Gao Xiang
  2024-05-15  8:45 ` [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() libaokun
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
The err_put_fd tag is only used once, so remove it to make the code more
readable.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
---
 fs/cachefiles/ondemand.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 4ba42f1fa3b4..fd49728d8bae 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -347,7 +347,9 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 
 	if (copy_to_user(_buffer, msg, n) != 0) {
 		ret = -EFAULT;
-		goto err_put_fd;
+		if (msg->opcode == CACHEFILES_OP_OPEN)
+			close_fd(((struct cachefiles_open *)msg->data)->fd);
+		goto error;
 	}
 
 	/* CLOSE request has no reply */
@@ -358,9 +360,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 
 	return n;
 
-err_put_fd:
-	if (msg->opcode == CACHEFILES_OP_OPEN)
-		close_fd(((struct cachefiles_open *)msg->data)->fd);
 error:
 	xa_erase(&cache->reqs, id);
 	req->error = ret;
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
  2024-05-15  8:45 ` [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests libaokun
  2024-05-15  8:45 ` [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read() libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  7:24   ` Jingbo Xu
  2024-05-20  8:06   ` Jingbo Xu
  2024-05-15  8:45 ` [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read() libaokun
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
We got the following issue in a fuzz test of randomly issuing the restore
command:
==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
Call Trace:
 kasan_report+0x94/0xc0
 cachefiles_ondemand_daemon_read+0x609/0xab0
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0
Allocated by task 626:
 __kmalloc+0x1df/0x4b0
 cachefiles_ondemand_send_req+0x24d/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]
Freed by task 626:
 kfree+0xf1/0x2c0
 cachefiles_ondemand_send_req+0x568/0x690
 cachefiles_create_tmpfile+0x249/0xb30
 cachefiles_create_file+0x6f/0x140
 cachefiles_look_up_object+0x29c/0xa60
 cachefiles_lookup_cookie+0x37d/0xca0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]
==================================================================
Following is the process that triggers the issue:
     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              cachefiles_ondemand_get_fd
              copy_to_user(_buffer, msg, n)
            process_open_req(REQ_A)
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                  xas_for_each(&xas, req, ULONG_MAX)
                                   xas_set_mark(&xas, CACHEFILES_REQ_NEW);
                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
             write(devfd, ("copen %u,%llu", msg->msg_id, size));
             cachefiles_ondemand_copen
              xa_erase(&cache->reqs, id)
              complete(&REQ_A->done)
   kfree(REQ_A)
                                    cachefiles_ondemand_get_fd(REQ_A)
                                     fd = get_unused_fd_flags
                                     file = anon_inode_getfile
                                     fd_install(fd, file)
                                     load = (void *)REQ_A->msg.data;
                                     load->fd = fd;
                                     // load UAF !!!
This issue is caused by issuing a restore command when the daemon is still
alive, which results in a request being processed multiple times thus
triggering a UAF. So to avoid this problem, add an additional reference
count to cachefiles_req, which is held while waiting and reading, and then
released when the waiting and reading is over.
Note that since there is only one reference count for waiting, we need to
avoid the same request being completed multiple times, so we can only
complete the request if it is successfully removed from the xarray.
Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
---
 fs/cachefiles/internal.h |  1 +
 fs/cachefiles/ondemand.c | 44 ++++++++++++++++++++++------------------
 2 files changed, 25 insertions(+), 20 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index d33169f0018b..7745b8abc3aa 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
 struct cachefiles_req {
 	struct cachefiles_object *object;
 	struct completion done;
+	refcount_t ref;
 	int error;
 	struct cachefiles_msg msg;
 };
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index fd49728d8bae..56d12fe4bf73 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -4,6 +4,12 @@
 #include <linux/uio.h>
 #include "internal.h"
 
+static inline void cachefiles_req_put(struct cachefiles_req *req)
+{
+	if (refcount_dec_and_test(&req->ref))
+		kfree(req);
+}
+
 static int cachefiles_ondemand_fd_release(struct inode *inode,
 					  struct file *file)
 {
@@ -299,7 +305,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 {
 	struct cachefiles_req *req;
 	struct cachefiles_msg *msg;
-	unsigned long id = 0;
 	size_t n;
 	int ret = 0;
 	XA_STATE(xas, &cache->reqs, cache->req_id_next);
@@ -330,41 +335,39 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 
 	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
 	cache->req_id_next = xas.xa_index + 1;
+	refcount_inc(&req->ref);
 	xa_unlock(&cache->reqs);
 
-	id = xas.xa_index;
-
 	if (msg->opcode == CACHEFILES_OP_OPEN) {
 		ret = cachefiles_ondemand_get_fd(req);
 		if (ret) {
 			cachefiles_ondemand_set_object_close(req->object);
-			goto error;
+			goto out;
 		}
 	}
 
-	msg->msg_id = id;
+	msg->msg_id = xas.xa_index;
 	msg->object_id = req->object->ondemand->ondemand_id;
 
 	if (copy_to_user(_buffer, msg, n) != 0) {
 		ret = -EFAULT;
 		if (msg->opcode == CACHEFILES_OP_OPEN)
 			close_fd(((struct cachefiles_open *)msg->data)->fd);
-		goto error;
 	}
-
-	/* CLOSE request has no reply */
-	if (msg->opcode == CACHEFILES_OP_CLOSE) {
-		xa_erase(&cache->reqs, id);
-		complete(&req->done);
+out:
+	/* Remove error request and CLOSE request has no reply */
+	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
+		xas_reset(&xas);
+		xas_lock(&xas);
+		if (xas_load(&xas) == req) {
+			req->error = ret;
+			complete(&req->done);
+			xas_store(&xas, NULL);
+		}
+		xas_unlock(&xas);
 	}
-
-	return n;
-
-error:
-	xa_erase(&cache->reqs, id);
-	req->error = ret;
-	complete(&req->done);
-	return ret;
+	cachefiles_req_put(req);
+	return ret ? ret : n;
 }
 
 typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
@@ -394,6 +397,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
 		goto out;
 	}
 
+	refcount_set(&req->ref, 1);
 	req->object = object;
 	init_completion(&req->done);
 	req->msg.opcode = opcode;
@@ -455,7 +459,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
 	wake_up_all(&cache->daemon_pollwq);
 	wait_for_completion(&req->done);
 	ret = req->error;
-	kfree(req);
+	cachefiles_req_put(req);
 	return ret;
 out:
 	/* Reset the object to close state in error handling path.
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (2 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  7:36   ` Jingbo Xu
  2024-05-15  8:45 ` [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd libaokun
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
We got the following issue in a fuzz test of randomly issuing the restore
command:
==================================================================
BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963
CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
Call Trace:
 kasan_report+0x93/0xc0
 cachefiles_ondemand_daemon_read+0xb41/0xb60
 vfs_read+0x169/0xb50
 ksys_read+0xf5/0x1e0
Allocated by task 116:
 kmem_cache_alloc+0x140/0x3a0
 cachefiles_lookup_cookie+0x140/0xcd0
 fscache_cookie_state_machine+0x43c/0x1230
 [...]
Freed by task 792:
 kmem_cache_free+0xfe/0x390
 cachefiles_put_object+0x241/0x480
 fscache_cookie_state_machine+0x5c8/0x1230
 [...]
==================================================================
Following is the process that triggers the issue:
     mount  |   daemon_thread1    |    daemon_thread2
------------------------------------------------------------
cachefiles_withdraw_cookie
 cachefiles_ondemand_clean_object(object)
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              msg->object_id = req->object->ondemand->ondemand_id
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                  xas_for_each(&xas, req, ULONG_MAX)
                                   xas_set_mark(&xas, CACHEFILES_REQ_NEW)
                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
              copy_to_user(_buffer, msg, n)
               xa_erase(&cache->reqs, id)
               complete(&REQ_A->done)
              ------ close(fd) ------
              cachefiles_ondemand_fd_release
               cachefiles_put_object
 cachefiles_put_object
  kmem_cache_free(cachefiles_object_jar, object)
                                    REQ_A->object->ondemand->ondemand_id
                                     // object UAF !!!
When we see the request within xa_lock, req->object must not have been
freed yet, so grab the reference count of object before xa_unlock to
avoid the above issue.
Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
---
 fs/cachefiles/ondemand.c          | 2 ++
 include/trace/events/cachefiles.h | 6 +++++-
 2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 56d12fe4bf73..bb94ef6a6f61 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -336,6 +336,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
 	cache->req_id_next = xas.xa_index + 1;
 	refcount_inc(&req->ref);
+	cachefiles_grab_object(req->object, cachefiles_obj_get_read_req);
 	xa_unlock(&cache->reqs);
 
 	if (msg->opcode == CACHEFILES_OP_OPEN) {
@@ -355,6 +356,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 			close_fd(((struct cachefiles_open *)msg->data)->fd);
 	}
 out:
+	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
 	/* Remove error request and CLOSE request has no reply */
 	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
 		xas_reset(&xas);
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index cf4b98b9a9ed..119a823fb5a0 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -33,6 +33,8 @@ enum cachefiles_obj_ref_trace {
 	cachefiles_obj_see_withdrawal,
 	cachefiles_obj_get_ondemand_fd,
 	cachefiles_obj_put_ondemand_fd,
+	cachefiles_obj_get_read_req,
+	cachefiles_obj_put_read_req,
 };
 
 enum fscache_why_object_killed {
@@ -127,7 +129,9 @@ enum cachefiles_error_trace {
 	EM(cachefiles_obj_see_lookup_cookie,	"SEE lookup_cookie")	\
 	EM(cachefiles_obj_see_lookup_failed,	"SEE lookup_failed")	\
 	EM(cachefiles_obj_see_withdraw_cookie,	"SEE withdraw_cookie")	\
-	E_(cachefiles_obj_see_withdrawal,	"SEE withdrawal")
+	EM(cachefiles_obj_see_withdrawal,	"SEE withdrawal")	\
+	EM(cachefiles_obj_get_read_req,		"GET read_req")		\
+	E_(cachefiles_obj_put_read_req,		"PUT read_req")
 
 #define cachefiles_coherency_traces					\
 	EM(cachefiles_coherency_check_aux,	"BAD aux ")		\
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (3 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read() libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  7:40   ` Jingbo Xu
  2024-05-15  8:45 ` [PATCH v2 06/12] cachefiles: add consistency check for copen/cread libaokun
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
This lets us see the correct trace output.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 include/trace/events/cachefiles.h | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
index 119a823fb5a0..bb56e3104b12 100644
--- a/include/trace/events/cachefiles.h
+++ b/include/trace/events/cachefiles.h
@@ -130,6 +130,8 @@ enum cachefiles_error_trace {
 	EM(cachefiles_obj_see_lookup_failed,	"SEE lookup_failed")	\
 	EM(cachefiles_obj_see_withdraw_cookie,	"SEE withdraw_cookie")	\
 	EM(cachefiles_obj_see_withdrawal,	"SEE withdrawal")	\
+	EM(cachefiles_obj_get_ondemand_fd,      "GET ondemand_fd")      \
+	EM(cachefiles_obj_put_ondemand_fd,      "PUT ondemand_fd")      \
 	EM(cachefiles_obj_get_read_req,		"GET read_req")		\
 	E_(cachefiles_obj_put_read_req,		"PUT read_req")
 
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 06/12] cachefiles: add consistency check for copen/cread
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (4 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-15  8:45 ` [PATCH v2 07/12] cachefiles: add spin_lock for cachefiles_ondemand_info libaokun
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
This prevents malicious processes from completing random copen/cread
requests and crashing the system. Added checks are listed below:
  * Generic, copen can only complete open requests, and cread can only
    complete read requests.
  * For copen, ondemand_id must not be 0, because this indicates that the
    request has not been read by the daemon.
  * For cread, the object corresponding to fd and req should be the same.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/ondemand.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index bb94ef6a6f61..898fab68332b 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -82,12 +82,12 @@ static loff_t cachefiles_ondemand_fd_llseek(struct file *filp, loff_t pos,
 }
 
 static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
-					 unsigned long arg)
+					 unsigned long id)
 {
 	struct cachefiles_object *object = filp->private_data;
 	struct cachefiles_cache *cache = object->volume->cache;
 	struct cachefiles_req *req;
-	unsigned long id;
+	XA_STATE(xas, &cache->reqs, id);
 
 	if (ioctl != CACHEFILES_IOC_READ_COMPLETE)
 		return -EINVAL;
@@ -95,10 +95,15 @@ static long cachefiles_ondemand_fd_ioctl(struct file *filp, unsigned int ioctl,
 	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
 		return -EOPNOTSUPP;
 
-	id = arg;
-	req = xa_erase(&cache->reqs, id);
-	if (!req)
+	xa_lock(&cache->reqs);
+	req = xas_load(&xas);
+	if (!req || req->msg.opcode != CACHEFILES_OP_READ ||
+	    req->object != object) {
+		xa_unlock(&cache->reqs);
 		return -EINVAL;
+	}
+	xas_store(&xas, NULL);
+	xa_unlock(&cache->reqs);
 
 	trace_cachefiles_ondemand_cread(object, id);
 	complete(&req->done);
@@ -126,6 +131,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	unsigned long id;
 	long size;
 	int ret;
+	XA_STATE(xas, &cache->reqs, 0);
 
 	if (!test_bit(CACHEFILES_ONDEMAND_MODE, &cache->flags))
 		return -EOPNOTSUPP;
@@ -149,9 +155,16 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	if (ret)
 		return ret;
 
-	req = xa_erase(&cache->reqs, id);
-	if (!req)
+	xa_lock(&cache->reqs);
+	xas.xa_index = id;
+	req = xas_load(&xas);
+	if (!req || req->msg.opcode != CACHEFILES_OP_OPEN ||
+	    !req->object->ondemand->ondemand_id) {
+		xa_unlock(&cache->reqs);
 		return -EINVAL;
+	}
+	xas_store(&xas, NULL);
+	xa_unlock(&cache->reqs);
 
 	/* fail OPEN request if copen format is invalid */
 	ret = kstrtol(psize, 0, &size);
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 07/12] cachefiles: add spin_lock for cachefiles_ondemand_info
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (5 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 06/12] cachefiles: add consistency check for copen/cread libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-15  8:45 ` [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid libaokun
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
The following concurrency may cause a read request to fail to be completed
and result in a hung:
           t1             |             t2
---------------------------------------------------------
                            cachefiles_ondemand_copen
                              req = xa_erase(&cache->reqs, id)
// Anon fd is maliciously closed.
cachefiles_ondemand_fd_release
  xa_lock(&cache->reqs)
  cachefiles_ondemand_set_object_close(object)
  xa_unlock(&cache->reqs)
                              cachefiles_ondemand_set_object_open
                              // No one will ever close it again.
cachefiles_ondemand_daemon_read
  cachefiles_ondemand_select_req
  // Get a read req but its fd is already closed.
  // The daemon can't issue a cread ioctl with an closed fd, then hung.
So add spin_lock for cachefiles_ondemand_info to protect ondemand_id and
state, thus we can avoid the above problem in cachefiles_ondemand_copen()
by using ondemand_id to determine if fd has been closed.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/internal.h |  1 +
 fs/cachefiles/ondemand.c | 35 ++++++++++++++++++++++++++++++++++-
 2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 7745b8abc3aa..45c8bed60538 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -55,6 +55,7 @@ struct cachefiles_ondemand_info {
 	int				ondemand_id;
 	enum cachefiles_object_state	state;
 	struct cachefiles_object	*object;
+	spinlock_t			lock;
 };
 
 /*
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 898fab68332b..d04ddc6576e3 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -16,13 +16,16 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
 	struct cachefiles_object *object = file->private_data;
 	struct cachefiles_cache *cache = object->volume->cache;
 	struct cachefiles_ondemand_info *info = object->ondemand;
-	int object_id = info->ondemand_id;
+	int object_id;
 	struct cachefiles_req *req;
 	XA_STATE(xas, &cache->reqs, 0);
 
 	xa_lock(&cache->reqs);
+	spin_lock(&info->lock);
+	object_id = info->ondemand_id;
 	info->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED;
 	cachefiles_ondemand_set_object_close(object);
+	spin_unlock(&info->lock);
 
 	/* Only flush CACHEFILES_REQ_NEW marked req to avoid race with daemon_read */
 	xas_for_each_marked(&xas, req, ULONG_MAX, CACHEFILES_REQ_NEW) {
@@ -127,6 +130,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 {
 	struct cachefiles_req *req;
 	struct fscache_cookie *cookie;
+	struct cachefiles_ondemand_info *info;
 	char *pid, *psize;
 	unsigned long id;
 	long size;
@@ -185,6 +189,33 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 		goto out;
 	}
 
+	info = req->object->ondemand;
+	spin_lock(&info->lock);
+	/*
+	 * The anonymous fd was closed before copen ? Fail the request.
+	 *
+	 *             t1             |             t2
+	 * ---------------------------------------------------------
+	 *                             cachefiles_ondemand_copen
+	 *                             req = xa_erase(&cache->reqs, id)
+	 * // Anon fd is maliciously closed.
+	 * cachefiles_ondemand_fd_release
+	 * xa_lock(&cache->reqs)
+	 * cachefiles_ondemand_set_object_close(object)
+	 * xa_unlock(&cache->reqs)
+	 *                             cachefiles_ondemand_set_object_open
+	 *                             // No one will ever close it again.
+	 * cachefiles_ondemand_daemon_read
+	 * cachefiles_ondemand_select_req
+	 *
+	 * Get a read req but its fd is already closed. The daemon can't
+	 * issue a cread ioctl with an closed fd, then hung.
+	 */
+	if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED) {
+		spin_unlock(&info->lock);
+		req->error = -EBADFD;
+		goto out;
+	}
 	cookie = req->object->cookie;
 	cookie->object_size = size;
 	if (size)
@@ -194,6 +225,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	trace_cachefiles_ondemand_copen(req->object, id, size);
 
 	cachefiles_ondemand_set_object_open(req->object);
+	spin_unlock(&info->lock);
 	wake_up_all(&cache->daemon_pollwq);
 
 out:
@@ -596,6 +628,7 @@ int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object,
 		return -ENOMEM;
 
 	object->ondemand->object = object;
+	spin_lock_init(&object->ondemand->lock);
 	INIT_WORK(&object->ondemand->ondemand_work, ondemand_object_worker);
 	return 0;
 }
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (6 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 07/12] cachefiles: add spin_lock for cachefiles_ondemand_info libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  8:43   ` Jingbo Xu
  2024-05-15  8:45 ` [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds libaokun
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
Now every time the daemon reads an open request, it gets a new anonymous fd
and ondemand_id. With the introduction of "restore", it is possible to read
the same open request more than once, and therefore an object can have more
than one anonymous fd.
If the anonymous fd is not unique, the following concurrencies will result
in an fd leak:
     t1     |         t2         |          t3
------------------------------------------------------------
 cachefiles_ondemand_init_object
  cachefiles_ondemand_send_req
   REQ_A = kzalloc(sizeof(*req) + data_len)
   wait_for_completion(&REQ_A->done)
            cachefiles_daemon_read
             cachefiles_ondemand_daemon_read
              REQ_A = cachefiles_ondemand_select_req
              cachefiles_ondemand_get_fd
                load->fd = fd0
                ondemand_id = object_id0
                                  ------ restore ------
                                  cachefiles_ondemand_restore
                                   // restore REQ_A
                                  cachefiles_daemon_read
                                   cachefiles_ondemand_daemon_read
                                    REQ_A = cachefiles_ondemand_select_req
                                      cachefiles_ondemand_get_fd
                                        load->fd = fd1
                                        ondemand_id = object_id1
             process_open_req(REQ_A)
             write(devfd, ("copen %u,%llu", msg->msg_id, size))
             cachefiles_ondemand_copen
              xa_erase(&cache->reqs, id)
              complete(&REQ_A->done)
   kfree(REQ_A)
                                  process_open_req(REQ_A)
                                  // copen fails due to no req
                                  // daemon close(fd1)
                                  cachefiles_ondemand_fd_release
                                   // set object closed
 -- umount --
 cachefiles_withdraw_cookie
  cachefiles_ondemand_clean_object
   cachefiles_ondemand_init_close_req
    if (!cachefiles_ondemand_object_is_open(object))
      return -ENOENT;
    // The fd0 is not closed until the daemon exits.
However, the anonymous fd holds the reference count of the object and the
object holds the reference count of the cookie. So even though the cookie
has been relinquished, it will not be unhashed and freed until the daemon
exits.
In fscache_hash_cookie(), when the same cookie is found in the hash list,
if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new
cookie waits for the old cookie to be unhashed, while the old cookie is
waiting for the leaked fd to be closed, if the daemon does not exit in time
it will trigger a hung task.
To avoid this, allocate a new anonymous fd only if no anonymous fd has
been allocated (ondemand_id == 0) or if the previously allocated anonymous
fd has been closed (ondemand_id == -1). Moreover, returns an error if
ondemand_id is valid, letting the daemon know that the current userland
restore logic is abnormal and needs to be checked.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/ondemand.c | 34 ++++++++++++++++++++++++++++------
 1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index d04ddc6576e3..d2d4e27fca6f 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -14,11 +14,18 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
 					  struct file *file)
 {
 	struct cachefiles_object *object = file->private_data;
-	struct cachefiles_cache *cache = object->volume->cache;
-	struct cachefiles_ondemand_info *info = object->ondemand;
+	struct cachefiles_cache *cache;
+	struct cachefiles_ondemand_info *info;
 	int object_id;
 	struct cachefiles_req *req;
-	XA_STATE(xas, &cache->reqs, 0);
+	XA_STATE(xas, NULL, 0);
+
+	if (!object)
+		return 0;
+
+	info = object->ondemand;
+	cache = object->volume->cache;
+	xas.xa = &cache->reqs;
 
 	xa_lock(&cache->reqs);
 	spin_lock(&info->lock);
@@ -288,22 +295,39 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 		goto err_put_fd;
 	}
 
+	spin_lock(&object->ondemand->lock);
+	if (object->ondemand->ondemand_id > 0) {
+		spin_unlock(&object->ondemand->lock);
+		/* Pair with check in cachefiles_ondemand_fd_release(). */
+		file->private_data = NULL;
+		ret = -EEXIST;
+		goto err_put_file;
+	}
+
 	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
 	fd_install(fd, file);
 
 	load = (void *)req->msg.data;
 	load->fd = fd;
 	object->ondemand->ondemand_id = object_id;
+	spin_unlock(&object->ondemand->lock);
 
 	cachefiles_get_unbind_pincount(cache);
 	trace_cachefiles_ondemand_open(object, &req->msg, load);
 	return 0;
 
+err_put_file:
+	fput(file);
 err_put_fd:
 	put_unused_fd(fd);
 err_free_id:
 	xa_erase(&cache->ondemand_ids, object_id);
 err:
+	spin_lock(&object->ondemand->lock);
+	/* Avoid marking an opened object as closed. */
+	if (object->ondemand->ondemand_id <= 0)
+		cachefiles_ondemand_set_object_close(object);
+	spin_unlock(&object->ondemand->lock);
 	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
 	return ret;
 }
@@ -386,10 +410,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 
 	if (msg->opcode == CACHEFILES_OP_OPEN) {
 		ret = cachefiles_ondemand_get_fd(req);
-		if (ret) {
-			cachefiles_ondemand_set_object_close(req->object);
+		if (ret)
 			goto out;
-		}
 	}
 
 	msg->msg_id = xas.xa_index;
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (7 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-20  9:39   ` Jingbo Xu
  2024-05-15  8:45 ` [PATCH v2 10/12] cachefiles: Set object to close if ondemand_id < 0 in copen libaokun
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
After installing the anonymous fd, we can now see it in userland and close
it. However, at this point we may not have gotten the reference count of
the cache, but we will put it during colse fd, so this may cause a cache
UAF.
So grab the cache reference count before fd_install(). In addition, by
kernel convention, fd is taken over by the user land after fd_install(),
and the kernel should not call close_fd() after that, i.e., it should call
fd_install() after everything is ready, thus fd_install() is called after
copy_to_user() succeeds.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/ondemand.c | 53 +++++++++++++++++++++++++---------------
 1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index d2d4e27fca6f..3a36613e00a7 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -4,6 +4,11 @@
 #include <linux/uio.h>
 #include "internal.h"
 
+struct anon_file {
+	struct file *file;
+	int fd;
+};
+
 static inline void cachefiles_req_put(struct cachefiles_req *req)
 {
 	if (refcount_dec_and_test(&req->ref))
@@ -263,14 +268,14 @@ int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args)
 	return 0;
 }
 
-static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
+static int cachefiles_ondemand_get_fd(struct cachefiles_req *req,
+				      struct anon_file *anon_file)
 {
 	struct cachefiles_object *object;
 	struct cachefiles_cache *cache;
 	struct cachefiles_open *load;
-	struct file *file;
 	u32 object_id;
-	int ret, fd;
+	int ret;
 
 	object = cachefiles_grab_object(req->object,
 			cachefiles_obj_get_ondemand_fd);
@@ -282,16 +287,16 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 	if (ret < 0)
 		goto err;
 
-	fd = get_unused_fd_flags(O_WRONLY);
-	if (fd < 0) {
-		ret = fd;
+	anon_file->fd = get_unused_fd_flags(O_WRONLY);
+	if (anon_file->fd < 0) {
+		ret = anon_file->fd;
 		goto err_free_id;
 	}
 
-	file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops,
-				  object, O_WRONLY);
-	if (IS_ERR(file)) {
-		ret = PTR_ERR(file);
+	anon_file->file = anon_inode_getfile("[cachefiles]",
+				&cachefiles_ondemand_fd_fops, object, O_WRONLY);
+	if (IS_ERR(anon_file->file)) {
+		ret = PTR_ERR(anon_file->file);
 		goto err_put_fd;
 	}
 
@@ -299,16 +304,15 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 	if (object->ondemand->ondemand_id > 0) {
 		spin_unlock(&object->ondemand->lock);
 		/* Pair with check in cachefiles_ondemand_fd_release(). */
-		file->private_data = NULL;
+		anon_file->file->private_data = NULL;
 		ret = -EEXIST;
 		goto err_put_file;
 	}
 
-	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
-	fd_install(fd, file);
+	anon_file->file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
 
 	load = (void *)req->msg.data;
-	load->fd = fd;
+	load->fd = anon_file->fd;
 	object->ondemand->ondemand_id = object_id;
 	spin_unlock(&object->ondemand->lock);
 
@@ -317,9 +321,11 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
 	return 0;
 
 err_put_file:
-	fput(file);
+	fput(anon_file->file);
+	anon_file->file = NULL;
 err_put_fd:
-	put_unused_fd(fd);
+	put_unused_fd(anon_file->fd);
+	anon_file->fd = ret;
 err_free_id:
 	xa_erase(&cache->ondemand_ids, object_id);
 err:
@@ -376,6 +382,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 	struct cachefiles_msg *msg;
 	size_t n;
 	int ret = 0;
+	struct anon_file anon_file;
 	XA_STATE(xas, &cache->reqs, cache->req_id_next);
 
 	xa_lock(&cache->reqs);
@@ -409,7 +416,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 	xa_unlock(&cache->reqs);
 
 	if (msg->opcode == CACHEFILES_OP_OPEN) {
-		ret = cachefiles_ondemand_get_fd(req);
+		ret = cachefiles_ondemand_get_fd(req, &anon_file);
 		if (ret)
 			goto out;
 	}
@@ -417,10 +424,16 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
 	msg->msg_id = xas.xa_index;
 	msg->object_id = req->object->ondemand->ondemand_id;
 
-	if (copy_to_user(_buffer, msg, n) != 0) {
+	if (copy_to_user(_buffer, msg, n) != 0)
 		ret = -EFAULT;
-		if (msg->opcode == CACHEFILES_OP_OPEN)
-			close_fd(((struct cachefiles_open *)msg->data)->fd);
+
+	if (msg->opcode == CACHEFILES_OP_OPEN) {
+		if (ret < 0) {
+			fput(anon_file.file);
+			put_unused_fd(anon_file.fd);
+			goto out;
+		}
+		fd_install(anon_file.fd, anon_file.file);
 	}
 out:
 	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 10/12] cachefiles: Set object to close if ondemand_id < 0 in copen
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (8 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds libaokun
@ 2024-05-15  8:45 ` libaokun
  2024-05-15  8:46 ` [PATCH v2 11/12] cachefiles: flush all requests after setting CACHEFILES_DEAD libaokun
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:45 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Zizhi Wo <wozizhi@huawei.com>
If copen is maliciously called in the user mode, it may delete the request
corresponding to the random id. And the request may have not been read yet.
Note that when the object is set to reopen, the open request will be done
with the still reopen state in above case. As a result, the request
corresponding to this object is always skipped in select_req function, so
the read request is never completed and blocks other process.
Fix this issue by simply set object to close if its id < 0 in copen.
Signed-off-by: Zizhi Wo <wozizhi@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
---
 fs/cachefiles/ondemand.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index 3a36613e00a7..a511d0a89109 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -182,6 +182,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	xas_store(&xas, NULL);
 	xa_unlock(&cache->reqs);
 
+	info = req->object->ondemand;
 	/* fail OPEN request if copen format is invalid */
 	ret = kstrtol(psize, 0, &size);
 	if (ret) {
@@ -201,7 +202,6 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 		goto out;
 	}
 
-	info = req->object->ondemand;
 	spin_lock(&info->lock);
 	/*
 	 * The anonymous fd was closed before copen ? Fail the request.
@@ -241,6 +241,11 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args)
 	wake_up_all(&cache->daemon_pollwq);
 
 out:
+	spin_lock(&info->lock);
+	/* Need to set object close to avoid reopen status continuing */
+	if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED)
+		cachefiles_ondemand_set_object_close(req->object);
+	spin_unlock(&info->lock);
 	complete(&req->done);
 	return ret;
 }
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 11/12] cachefiles: flush all requests after setting CACHEFILES_DEAD
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (9 preceding siblings ...)
  2024-05-15  8:45 ` [PATCH v2 10/12] cachefiles: Set object to close if ondemand_id < 0 in copen libaokun
@ 2024-05-15  8:46 ` libaokun
  2024-05-15  8:46 ` [PATCH v2 12/12] cachefiles: make on-demand read killable libaokun
  2024-05-19 10:56 ` [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests Jeff Layton
  12 siblings, 0 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:46 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
In ondemand mode, when the daemon is processing an open request, if the
kernel flags the cache as CACHEFILES_DEAD, the cachefiles_daemon_write()
will always return -EIO, so the daemon can't pass the copen to the kernel.
Then the kernel process that is waiting for the copen triggers a hung_task.
Since the DEAD state is irreversible, it can only be exited by closing
/dev/cachefiles. Therefore, after calling cachefiles_io_error() to mark
the cache as CACHEFILES_DEAD, if in ondemand mode, flush all requests to
avoid the above hungtask. We may still be able to read some of the cached
data before closing the fd of /dev/cachefiles.
Note that this relies on the patch that adds reference counting to the req,
otherwise it may UAF.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/cachefiles/daemon.c   | 2 +-
 fs/cachefiles/internal.h | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
index ccb7b707ea4b..06cdf1a8a16f 100644
--- a/fs/cachefiles/daemon.c
+++ b/fs/cachefiles/daemon.c
@@ -133,7 +133,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
-static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
+void cachefiles_flush_reqs(struct cachefiles_cache *cache)
 {
 	struct xarray *xa = &cache->reqs;
 	struct cachefiles_req *req;
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
index 45c8bed60538..6845a90cdfcc 100644
--- a/fs/cachefiles/internal.h
+++ b/fs/cachefiles/internal.h
@@ -188,6 +188,7 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache,
  * daemon.c
  */
 extern const struct file_operations cachefiles_daemon_fops;
+extern void cachefiles_flush_reqs(struct cachefiles_cache *cache);
 extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache);
 extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache);
 
@@ -426,6 +427,8 @@ do {							\
 	pr_err("I/O Error: " FMT"\n", ##__VA_ARGS__);	\
 	fscache_io_error((___cache)->cache);		\
 	set_bit(CACHEFILES_DEAD, &(___cache)->flags);	\
+	if (cachefiles_in_ondemand_mode(___cache))	\
+		cachefiles_flush_reqs(___cache);	\
 } while (0)
 
 #define cachefiles_io_error_obj(object, FMT, ...)			\
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* [PATCH v2 12/12] cachefiles: make on-demand read killable
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (10 preceding siblings ...)
  2024-05-15  8:46 ` [PATCH v2 11/12] cachefiles: flush all requests after setting CACHEFILES_DEAD libaokun
@ 2024-05-15  8:46 ` libaokun
  2024-05-19 10:56 ` [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests Jeff Layton
  12 siblings, 0 replies; 38+ messages in thread
From: libaokun @ 2024-05-15  8:46 UTC (permalink / raw)
  To: netfs, dhowells, jlayton
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, libaokun, yangerkun, houtao1, yukuai3, wozizhi,
	Baokun Li
From: Baokun Li <libaokun1@huawei.com>
Replacing wait_for_completion() with wait_for_completion_killable() in
cachefiles_ondemand_send_req() allows us to kill processes that might
trigger a hunk_task if the daemon is abnormal.
But now only CACHEFILES_OP_READ is killable, because OP_CLOSE and OP_OPEN
is initiated from kworker context and the signal is prohibited in these
kworker.
Note that when the req in xas changes, i.e. xas_load(&xas) != req, it
means that a process will complete the current request soon, so wait
again for the request to be completed.
Suggested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
---
 fs/cachefiles/ondemand.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
index a511d0a89109..bdc2d6dbadce 100644
--- a/fs/cachefiles/ondemand.c
+++ b/fs/cachefiles/ondemand.c
@@ -544,8 +544,25 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
 		goto out;
 
 	wake_up_all(&cache->daemon_pollwq);
-	wait_for_completion(&req->done);
-	ret = req->error;
+wait:
+	ret = wait_for_completion_killable(&req->done);
+	if (!ret) {
+		ret = req->error;
+	} else {
+		xas_reset(&xas);
+		xas_lock(&xas);
+		if (xas_load(&xas) == req) {
+			xas_store(&xas, NULL);
+			ret = -EINTR;
+		}
+		xas_unlock(&xas);
+
+		/* Someone will complete it soon. */
+		if (ret != -EINTR) {
+			cpu_relax();
+			goto wait;
+		}
+	}
 	cachefiles_req_put(req);
 	return ret;
 out:
-- 
2.39.2
^ permalink raw reply related	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests
  2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
                   ` (11 preceding siblings ...)
  2024-05-15  8:46 ` [PATCH v2 12/12] cachefiles: make on-demand read killable libaokun
@ 2024-05-19 10:56 ` Jeff Layton
  12 siblings, 0 replies; 38+ messages in thread
From: Jeff Layton @ 2024-05-19 10:56 UTC (permalink / raw)
  To: libaokun, netfs, dhowells
  Cc: hsiangkao, jefflexu, zhujia.zj, linux-erofs, linux-fsdevel,
	linux-kernel, yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On Wed, 2024-05-15 at 16:45 +0800, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Hi all!
> 
> This is the second version of this patch series. Thank you, Jia Zhu and
> Jingbo Xu, for the feedback in the previous version.
> 
> We've been testing ondemand mode for cachefiles since January, and we're
> almost done. We hit a lot of issues during the testing period, and this
> patch set fixes some of the issues related to ondemand requests.
> The patches have passed internal testing without regression.
> 
> The following is a brief overview of the patches, see the patches for
> more details.
> 
> Patch 1-5: Holding reference counts of reqs and objects on read requests
> to avoid malicious restore leading to use-after-free.
> 
> Patch 6-10: Add some consistency checks to copen/cread/get_fd to avoid
> malicious copen/cread/close fd injections causing use-after-free or hung.
> 
> Patch 11: When cache is marked as CACHEFILES_DEAD, flush all requests,
> otherwise the kernel may be hung. since this state is irreversible, the
> daemon can read open requests but cannot copen.
> 
> Patch 12: Allow interrupting a read request being processed by killing
> the read process as a way of avoiding hung in some special cases.
> 
> Comments and questions are, as always, welcome.
> Please let me know what you think.
> 
> Thanks,
> Baokun
> 
> Changes since v1:
>   * Collect RVB from Jia Zhu and Jingbo Xu.(Thanks for your review!)
>   * Pathch 1: Add Fixes tag and enrich the commit message.
>   * Pathch 7: Add function graph comments.
>   * Pathch 8: Update commit message and comments.
>   * Pathch 9: Enriched commit msg.
> 
> Baokun Li (11):
>   cachefiles: remove request from xarry during flush requests
>   cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()
>   cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
>   cachefiles: fix slab-use-after-free in
>     cachefiles_ondemand_daemon_read()
>   cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
>   cachefiles: add consistency check for copen/cread
>   cachefiles: add spin_lock for cachefiles_ondemand_info
>   cachefiles: never get a new anonymous fd if ondemand_id is valid
>   cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
>   cachefiles: flush all requests after setting CACHEFILES_DEAD
>   cachefiles: make on-demand read killable
> 
> Zizhi Wo (1):
>   cachefiles: Set object to close if ondemand_id < 0 in copen
> 
>  fs/cachefiles/daemon.c            |   3 +-
>  fs/cachefiles/internal.h          |   5 +
>  fs/cachefiles/ondemand.c          | 218 ++++++++++++++++++++++--------
>  include/trace/events/cachefiles.h |   8 +-
>  4 files changed, 177 insertions(+), 57 deletions(-)
> 
Looks like most of these are fixes inside the ondemand code, which I
don't have the greatest grasp of, so...
Acked-by: Jeff Layton <jlayton@kernel.org>
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests
  2024-05-15  8:45 ` [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests libaokun
@ 2024-05-20  2:20   ` Gao Xiang
  2024-05-20  4:11     ` Baokun Li
  2024-05-20  7:09   ` Jingbo Xu
  1 sibling, 1 reply; 38+ messages in thread
From: Gao Xiang @ 2024-05-20  2:20 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: jefflexu, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 2024/5/15 16:45, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
The subject line can be
"cachefiles: remove requests from xarray during flushing requests"
> 
> Even with CACHEFILES_DEAD set, we can still read the requests, so in the
> following concurrency the request may be used after it has been freed:
> 
>       mount  |   daemon_thread1    |    daemon_thread2
> ------------------------------------------------------------
>   cachefiles_ondemand_init_object
>    cachefiles_ondemand_send_req
>     REQ_A = kzalloc(sizeof(*req) + data_len)
>     wait_for_completion(&REQ_A->done)
>              cachefiles_daemon_read
>               cachefiles_ondemand_daemon_read
>                                    // close dev fd
>                                    cachefiles_flush_reqs
>                                     complete(&REQ_A->done)
>     kfree(REQ_A)
>                xa_lock(&cache->reqs);
>                cachefiles_ondemand_select_req
>                  req->msg.opcode != CACHEFILES_OP_READ
>                  // req use-after-free !!!
>                xa_unlock(&cache->reqs);
>                                     xa_destroy(&cache->reqs)
> 
> Hence remove requests from cache->reqs when flushing them to avoid
> accessing freed requests.
> 
> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Thanks,
Gao Xiang
> ---
>   fs/cachefiles/daemon.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
> index 6465e2574230..ccb7b707ea4b 100644
> --- a/fs/cachefiles/daemon.c
> +++ b/fs/cachefiles/daemon.c
> @@ -159,6 +159,7 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
>   	xa_for_each(xa, index, req) {
>   		req->error = -EIO;
>   		complete(&req->done);
> +		__xa_erase(xa, index);
>   	}
>   	xa_unlock(xa);
>   
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()
  2024-05-15  8:45 ` [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read() libaokun
@ 2024-05-20  2:23   ` Gao Xiang
  2024-05-20  4:15     ` Baokun Li
  0 siblings, 1 reply; 38+ messages in thread
From: Gao Xiang @ 2024-05-20  2:23 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: jefflexu, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 2024/5/15 16:45, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> The err_put_fd tag is only used once, so remove it to make the code more
The err_put_fd label ..
Also the subject line needs to be updated too.  ("C goto label")
> readable.
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Thanks,
Gao Xiang
> ---
>   fs/cachefiles/ondemand.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
> index 4ba42f1fa3b4..fd49728d8bae 100644
> --- a/fs/cachefiles/ondemand.c
> +++ b/fs/cachefiles/ondemand.c
> @@ -347,7 +347,9 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>   
>   	if (copy_to_user(_buffer, msg, n) != 0) {
>   		ret = -EFAULT;
> -		goto err_put_fd;
> +		if (msg->opcode == CACHEFILES_OP_OPEN)
> +			close_fd(((struct cachefiles_open *)msg->data)->fd);
> +		goto error;
>   	}
>   
>   	/* CLOSE request has no reply */
> @@ -358,9 +360,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>   
>   	return n;
>   
> -err_put_fd:
> -	if (msg->opcode == CACHEFILES_OP_OPEN)
> -		close_fd(((struct cachefiles_open *)msg->data)->fd);
>   error:
>   	xa_erase(&cache->reqs, id);
>   	req->error = ret;
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests
  2024-05-20  2:20   ` Gao Xiang
@ 2024-05-20  4:11     ` Baokun Li
  0 siblings, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  4:11 UTC (permalink / raw)
  To: Gao Xiang, netfs, dhowells, jlayton
  Cc: jefflexu, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 10:20, Gao Xiang wrote:
>
>
> On 2024/5/15 16:45, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>
>
> The subject line can be
> "cachefiles: remove requests from xarray during flushing requests"
Thank you very much for the correction! I will update my patch.
>
>>
>> Even with CACHEFILES_DEAD set, we can still read the requests, so in the
>> following concurrency the request may be used after it has been freed:
>>
>>       mount  |   daemon_thread1    |    daemon_thread2
>> ------------------------------------------------------------
>>   cachefiles_ondemand_init_object
>>    cachefiles_ondemand_send_req
>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>     wait_for_completion(&REQ_A->done)
>>              cachefiles_daemon_read
>>               cachefiles_ondemand_daemon_read
>>                                    // close dev fd
>>                                    cachefiles_flush_reqs
>> complete(&REQ_A->done)
>>     kfree(REQ_A)
>>                xa_lock(&cache->reqs);
>>                cachefiles_ondemand_select_req
>>                  req->msg.opcode != CACHEFILES_OP_READ
>>                  // req use-after-free !!!
>>                xa_unlock(&cache->reqs);
>> xa_destroy(&cache->reqs)
>>
>> Hence remove requests from cache->reqs when flushing them to avoid
>> accessing freed requests.
>>
>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking 
>> up cookie")
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
>
> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
>
> Thanks,
> Gao Xiang
Thanks a lot for the review!
Cheers,
Baokun
>
>> ---
>>   fs/cachefiles/daemon.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
>> index 6465e2574230..ccb7b707ea4b 100644
>> --- a/fs/cachefiles/daemon.c
>> +++ b/fs/cachefiles/daemon.c
>> @@ -159,6 +159,7 @@ static void cachefiles_flush_reqs(struct 
>> cachefiles_cache *cache)
>>       xa_for_each(xa, index, req) {
>>           req->error = -EIO;
>>           complete(&req->done);
>> +        __xa_erase(xa, index);
>>       }
>>       xa_unlock(xa);
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read()
  2024-05-20  2:23   ` Gao Xiang
@ 2024-05-20  4:15     ` Baokun Li
  0 siblings, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  4:15 UTC (permalink / raw)
  To: Gao Xiang, netfs, dhowells, jlayton
  Cc: jefflexu, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 10:23, Gao Xiang wrote:
>
>
> On 2024/5/15 16:45, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> The err_put_fd tag is only used once, so remove it to make the code more
>
> The err_put_fd label ..
>
> Also the subject line needs to be updated too.  ("C goto label")
>
Sorry about that, and thank you very much for the correction!
I will correct "tag" to "label" in the next iteration.
>> readable.
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
>
> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
>
> Thanks,
> Gao Xiang
Thank you very much for your review!
Regards,
Baokun
>
>> ---
>>   fs/cachefiles/ondemand.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>> index 4ba42f1fa3b4..fd49728d8bae 100644
>> --- a/fs/cachefiles/ondemand.c
>> +++ b/fs/cachefiles/ondemand.c
>> @@ -347,7 +347,9 @@ ssize_t cachefiles_ondemand_daemon_read(struct 
>> cachefiles_cache *cache,
>>         if (copy_to_user(_buffer, msg, n) != 0) {
>>           ret = -EFAULT;
>> -        goto err_put_fd;
>> +        if (msg->opcode == CACHEFILES_OP_OPEN)
>> +            close_fd(((struct cachefiles_open *)msg->data)->fd);
>> +        goto error;
>>       }
>>         /* CLOSE request has no reply */
>> @@ -358,9 +360,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct 
>> cachefiles_cache *cache,
>>         return n;
>>   -err_put_fd:
>> -    if (msg->opcode == CACHEFILES_OP_OPEN)
>> -        close_fd(((struct cachefiles_open *)msg->data)->fd);
>>   error:
>>       xa_erase(&cache->reqs, id);
>>       req->error = ret;
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests
  2024-05-15  8:45 ` [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests libaokun
  2024-05-20  2:20   ` Gao Xiang
@ 2024-05-20  7:09   ` Jingbo Xu
  1 sibling, 0 replies; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  7:09 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Even with CACHEFILES_DEAD set, we can still read the requests, so in the
> following concurrency the request may be used after it has been freed:
> 
>      mount  |   daemon_thread1    |    daemon_thread2
> ------------------------------------------------------------
>  cachefiles_ondemand_init_object
>   cachefiles_ondemand_send_req
>    REQ_A = kzalloc(sizeof(*req) + data_len)
>    wait_for_completion(&REQ_A->done)
>             cachefiles_daemon_read
>              cachefiles_ondemand_daemon_read
>                                   // close dev fd
>                                   cachefiles_flush_reqs
>                                    complete(&REQ_A->done)
>    kfree(REQ_A)
>               xa_lock(&cache->reqs);
>               cachefiles_ondemand_select_req
>                 req->msg.opcode != CACHEFILES_OP_READ
>                 // req use-after-free !!!
>               xa_unlock(&cache->reqs);
>                                    xa_destroy(&cache->reqs)
> 
> Hence remove requests from cache->reqs when flushing them to avoid
> accessing freed requests.
> 
> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
> ---
>  fs/cachefiles/daemon.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c
> index 6465e2574230..ccb7b707ea4b 100644
> --- a/fs/cachefiles/daemon.c
> +++ b/fs/cachefiles/daemon.c
> @@ -159,6 +159,7 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache)
>  	xa_for_each(xa, index, req) {
>  		req->error = -EIO;
>  		complete(&req->done);
> +		__xa_erase(xa, index);
>  	}
>  	xa_unlock(xa);
>  
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-15  8:45 ` [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() libaokun
@ 2024-05-20  7:24   ` Jingbo Xu
  2024-05-20  8:38     ` Baokun Li
  2024-05-20  8:06   ` Jingbo Xu
  1 sibling, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  7:24 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> We got the following issue in a fuzz test of randomly issuing the restore
> command:
> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
> 
> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
> Call Trace:
>  kasan_report+0x94/0xc0
>  cachefiles_ondemand_daemon_read+0x609/0xab0
>  vfs_read+0x169/0xb50
>  ksys_read+0xf5/0x1e0
> 
> Allocated by task 626:
>  __kmalloc+0x1df/0x4b0
>  cachefiles_ondemand_send_req+0x24d/0x690
>  cachefiles_create_tmpfile+0x249/0xb30
>  cachefiles_create_file+0x6f/0x140
>  cachefiles_look_up_object+0x29c/0xa60
>  cachefiles_lookup_cookie+0x37d/0xca0
>  fscache_cookie_state_machine+0x43c/0x1230
>  [...]
> 
> Freed by task 626:
>  kfree+0xf1/0x2c0
>  cachefiles_ondemand_send_req+0x568/0x690
>  cachefiles_create_tmpfile+0x249/0xb30
>  cachefiles_create_file+0x6f/0x140
>  cachefiles_look_up_object+0x29c/0xa60
>  cachefiles_lookup_cookie+0x37d/0xca0
>  fscache_cookie_state_machine+0x43c/0x1230
>  [...]
> ==================================================================
> 
> Following is the process that triggers the issue:
> 
>      mount  |   daemon_thread1    |    daemon_thread2
> ------------------------------------------------------------
>  cachefiles_ondemand_init_object
>   cachefiles_ondemand_send_req
>    REQ_A = kzalloc(sizeof(*req) + data_len)
>    wait_for_completion(&REQ_A->done)
> 
>             cachefiles_daemon_read
>              cachefiles_ondemand_daemon_read
>               REQ_A = cachefiles_ondemand_select_req
>               cachefiles_ondemand_get_fd
>               copy_to_user(_buffer, msg, n)
>             process_open_req(REQ_A)
>                                   ------ restore ------
>                                   cachefiles_ondemand_restore
>                                   xas_for_each(&xas, req, ULONG_MAX)
>                                    xas_set_mark(&xas, CACHEFILES_REQ_NEW);
> 
>                                   cachefiles_daemon_read
>                                    cachefiles_ondemand_daemon_read
>                                     REQ_A = cachefiles_ondemand_select_req
> 
>              write(devfd, ("copen %u,%llu", msg->msg_id, size));
>              cachefiles_ondemand_copen
>               xa_erase(&cache->reqs, id)
>               complete(&REQ_A->done)
>    kfree(REQ_A)
>                                     cachefiles_ondemand_get_fd(REQ_A)
>                                      fd = get_unused_fd_flags
>                                      file = anon_inode_getfile
>                                      fd_install(fd, file)
>                                      load = (void *)REQ_A->msg.data;
>                                      load->fd = fd;
>                                      // load UAF !!!
> 
> This issue is caused by issuing a restore command when the daemon is still
> alive, which results in a request being processed multiple times thus
> triggering a UAF. So to avoid this problem, add an additional reference
> count to cachefiles_req, which is held while waiting and reading, and then
> released when the waiting and reading is over.
> 
> Note that since there is only one reference count for waiting, we need to
> avoid the same request being completed multiple times, so we can only
> complete the request if it is successfully removed from the xarray.
Sorry the above description makes me confused.  As the same request may
be got by different daemon threads multiple times, the introduced
refcount mechanism can't protect it from being completed multiple times
(which is expected).  The refcount only protects it from being freed
multiple times.
> 
> Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
> Suggested-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
> ---
>  fs/cachefiles/internal.h |  1 +
>  fs/cachefiles/ondemand.c | 44 ++++++++++++++++++++++------------------
>  2 files changed, 25 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
> index d33169f0018b..7745b8abc3aa 100644
> --- a/fs/cachefiles/internal.h
> +++ b/fs/cachefiles/internal.h
> @@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>  struct cachefiles_req {
>  	struct cachefiles_object *object;
>  	struct completion done;
> +	refcount_t ref;
>  	int error;
>  	struct cachefiles_msg msg;
>  };
> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
> index fd49728d8bae..56d12fe4bf73 100644
> --- a/fs/cachefiles/ondemand.c
> +++ b/fs/cachefiles/ondemand.c
> @@ -4,6 +4,12 @@
>  #include <linux/uio.h>
>  #include "internal.h"
>  
> +static inline void cachefiles_req_put(struct cachefiles_req *req)
> +{
> +	if (refcount_dec_and_test(&req->ref))
> +		kfree(req);
> +}
> +
>  static int cachefiles_ondemand_fd_release(struct inode *inode,
>  					  struct file *file)
>  {
> @@ -299,7 +305,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  {
>  	struct cachefiles_req *req;
>  	struct cachefiles_msg *msg;
> -	unsigned long id = 0;
>  	size_t n;
>  	int ret = 0;
>  	XA_STATE(xas, &cache->reqs, cache->req_id_next);
> @@ -330,41 +335,39 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  
>  	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
>  	cache->req_id_next = xas.xa_index + 1;
> +	refcount_inc(&req->ref);
>  	xa_unlock(&cache->reqs);
>  
> -	id = xas.xa_index;
> -
>  	if (msg->opcode == CACHEFILES_OP_OPEN) {
>  		ret = cachefiles_ondemand_get_fd(req);
>  		if (ret) {
>  			cachefiles_ondemand_set_object_close(req->object);
> -			goto error;
> +			goto out;
>  		}
>  	}
>  
> -	msg->msg_id = id;
> +	msg->msg_id = xas.xa_index;
>  	msg->object_id = req->object->ondemand->ondemand_id;
>  
>  	if (copy_to_user(_buffer, msg, n) != 0) {
>  		ret = -EFAULT;
>  		if (msg->opcode == CACHEFILES_OP_OPEN)
>  			close_fd(((struct cachefiles_open *)msg->data)->fd);
> -		goto error;
>  	}
> -
> -	/* CLOSE request has no reply */
> -	if (msg->opcode == CACHEFILES_OP_CLOSE) {
> -		xa_erase(&cache->reqs, id);
> -		complete(&req->done);
> +out:
> +	/* Remove error request and CLOSE request has no reply */
> +	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
> +		xas_reset(&xas);
> +		xas_lock(&xas);
> +		if (xas_load(&xas) == req) {
Just out of curiosity... How could xas_load(&xas) doesn't equal to req?
> +			req->error = ret;
> +			complete(&req->done);
> +			xas_store(&xas, NULL);
> +		}
> +		xas_unlock(&xas);
>  	}
> -
> -	return n;
> -
> -error:
> -	xa_erase(&cache->reqs, id);
> -	req->error = ret;
> -	complete(&req->done);
> -	return ret;
> +	cachefiles_req_put(req);
> +	return ret ? ret : n;
>  }
This is actually a combination of a fix and a cleanup which combines the
logic of removing error request and the CLOSE requests into one place.
Also it relies on the cleanup made in patch 2 ("cachefiles: remove
err_put_fd tag in cachefiles_ondemand_daemon_read()"), making it
difficult to be atomatically back ported to the stable (as patch 2 is
not marked as "Fixes").
Thus could we make the fix first, and then make the cleanup.
>  
>  typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
> @@ -394,6 +397,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>  		goto out;
>  	}
>  
> +	refcount_set(&req->ref, 1);
>  	req->object = object;
>  	init_completion(&req->done);
>  	req->msg.opcode = opcode;
> @@ -455,7 +459,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>  	wake_up_all(&cache->daemon_pollwq);
>  	wait_for_completion(&req->done);
>  	ret = req->error;
> -	kfree(req);
> +	cachefiles_req_put(req);
>  	return ret;
>  out:
>  	/* Reset the object to close state in error handling path.
Don't we need to also convert "kfree(req)" to cachefiles_req_put(req)
for the error path of cachefiles_ondemand_send_req()?
```
out:
	/* Reset the object to close state in error handling path.
	 * If error occurs after creating the anonymous fd,
	 * cachefiles_ondemand_fd_release() will set object to close.
	 */
	if (opcode == CACHEFILES_OP_OPEN)
		cachefiles_ondemand_set_object_close(object);
	kfree(req);
	return ret;
```
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
  2024-05-15  8:45 ` [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read() libaokun
@ 2024-05-20  7:36   ` Jingbo Xu
  2024-05-20  8:56     ` Baokun Li
  0 siblings, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  7:36 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> We got the following issue in a fuzz test of randomly issuing the restore
> command:
> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
> Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963
> 
> CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
> Call Trace:
>  kasan_report+0x93/0xc0
>  cachefiles_ondemand_daemon_read+0xb41/0xb60
>  vfs_read+0x169/0xb50
>  ksys_read+0xf5/0x1e0
> 
> Allocated by task 116:
>  kmem_cache_alloc+0x140/0x3a0
>  cachefiles_lookup_cookie+0x140/0xcd0
>  fscache_cookie_state_machine+0x43c/0x1230
>  [...]
> 
> Freed by task 792:
>  kmem_cache_free+0xfe/0x390
>  cachefiles_put_object+0x241/0x480
>  fscache_cookie_state_machine+0x5c8/0x1230
>  [...]
> ==================================================================
> 
> Following is the process that triggers the issue:
> 
>      mount  |   daemon_thread1    |    daemon_thread2
> ------------------------------------------------------------
> cachefiles_withdraw_cookie
>  cachefiles_ondemand_clean_object(object)
>   cachefiles_ondemand_send_req
>    REQ_A = kzalloc(sizeof(*req) + data_len)
>    wait_for_completion(&REQ_A->done)
> 
>             cachefiles_daemon_read
>              cachefiles_ondemand_daemon_read
>               REQ_A = cachefiles_ondemand_select_req
>               msg->object_id = req->object->ondemand->ondemand_id
>                                   ------ restore ------
>                                   cachefiles_ondemand_restore
>                                   xas_for_each(&xas, req, ULONG_MAX)
>                                    xas_set_mark(&xas, CACHEFILES_REQ_NEW)
> 
>                                   cachefiles_daemon_read
>                                    cachefiles_ondemand_daemon_read
>                                     REQ_A = cachefiles_ondemand_select_req
>               copy_to_user(_buffer, msg, n)
>                xa_erase(&cache->reqs, id)
>                complete(&REQ_A->done)
>               ------ close(fd) ------
>               cachefiles_ondemand_fd_release
>                cachefiles_put_object
>  cachefiles_put_object
>   kmem_cache_free(cachefiles_object_jar, object)
>                                     REQ_A->object->ondemand->ondemand_id
>                                      // object UAF !!!
> 
> When we see the request within xa_lock, req->object must not have been
> freed yet, so grab the reference count of object before xa_unlock to
> avoid the above issue.
> 
> Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
> ---
>  fs/cachefiles/ondemand.c          | 2 ++
>  include/trace/events/cachefiles.h | 6 +++++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
> index 56d12fe4bf73..bb94ef6a6f61 100644
> --- a/fs/cachefiles/ondemand.c
> +++ b/fs/cachefiles/ondemand.c
> @@ -336,6 +336,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
>  	cache->req_id_next = xas.xa_index + 1;
>  	refcount_inc(&req->ref);
> +	cachefiles_grab_object(req->object, cachefiles_obj_get_read_req);
>  	xa_unlock(&cache->reqs);
>  
>  	if (msg->opcode == CACHEFILES_OP_OPEN) {
> @@ -355,6 +356,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  			close_fd(((struct cachefiles_open *)msg->data)->fd);
>  	}
>  out:
> +	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
>  	/* Remove error request and CLOSE request has no reply */
>  	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
>  		xas_reset(&xas);
> diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
> index cf4b98b9a9ed..119a823fb5a0 100644
> --- a/include/trace/events/cachefiles.h
> +++ b/include/trace/events/cachefiles.h
> @@ -33,6 +33,8 @@ enum cachefiles_obj_ref_trace {
>  	cachefiles_obj_see_withdrawal,
>  	cachefiles_obj_get_ondemand_fd,
>  	cachefiles_obj_put_ondemand_fd,
> +	cachefiles_obj_get_read_req,
> +	cachefiles_obj_put_read_req,
How about cachefiles_obj_[get|put]_ondemand_read, so that it could be
easily identified as ondemand mode at the first glance?
>  };
>  
>  enum fscache_why_object_killed {
> @@ -127,7 +129,9 @@ enum cachefiles_error_trace {
>  	EM(cachefiles_obj_see_lookup_cookie,	"SEE lookup_cookie")	\
>  	EM(cachefiles_obj_see_lookup_failed,	"SEE lookup_failed")	\
>  	EM(cachefiles_obj_see_withdraw_cookie,	"SEE withdraw_cookie")	\
> -	E_(cachefiles_obj_see_withdrawal,	"SEE withdrawal")
> +	EM(cachefiles_obj_see_withdrawal,	"SEE withdrawal")	\
> +	EM(cachefiles_obj_get_read_req,		"GET read_req")		\
> +	E_(cachefiles_obj_put_read_req,		"PUT read_req")
Ditto.
Otherwise, LGTM.
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
  2024-05-15  8:45 ` [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd libaokun
@ 2024-05-20  7:40   ` Jingbo Xu
  2024-05-20  9:02     ` Baokun Li
  0 siblings, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  7:40 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> This lets us see the correct trace output.
> 
> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
Could we move this simple fix to the beginning of the patch set?
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
> ---
>  include/trace/events/cachefiles.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
> index 119a823fb5a0..bb56e3104b12 100644
> --- a/include/trace/events/cachefiles.h
> +++ b/include/trace/events/cachefiles.h
> @@ -130,6 +130,8 @@ enum cachefiles_error_trace {
>  	EM(cachefiles_obj_see_lookup_failed,	"SEE lookup_failed")	\
>  	EM(cachefiles_obj_see_withdraw_cookie,	"SEE withdraw_cookie")	\
>  	EM(cachefiles_obj_see_withdrawal,	"SEE withdrawal")	\
> +	EM(cachefiles_obj_get_ondemand_fd,      "GET ondemand_fd")      \
> +	EM(cachefiles_obj_put_ondemand_fd,      "PUT ondemand_fd")      \
>  	EM(cachefiles_obj_get_read_req,		"GET read_req")		\
>  	E_(cachefiles_obj_put_read_req,		"PUT read_req")
>  
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-15  8:45 ` [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() libaokun
  2024-05-20  7:24   ` Jingbo Xu
@ 2024-05-20  8:06   ` Jingbo Xu
  2024-05-20  9:10     ` Baokun Li
  1 sibling, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  8:06 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> We got the following issue in a fuzz test of randomly issuing the restore
> command:
> 
> ==================================================================
> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
> 
> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
> Call Trace:
>  kasan_report+0x94/0xc0
>  cachefiles_ondemand_daemon_read+0x609/0xab0
>  vfs_read+0x169/0xb50
>  ksys_read+0xf5/0x1e0
> 
> Allocated by task 626:
>  __kmalloc+0x1df/0x4b0
>  cachefiles_ondemand_send_req+0x24d/0x690
>  cachefiles_create_tmpfile+0x249/0xb30
>  cachefiles_create_file+0x6f/0x140
>  cachefiles_look_up_object+0x29c/0xa60
>  cachefiles_lookup_cookie+0x37d/0xca0
>  fscache_cookie_state_machine+0x43c/0x1230
>  [...]
> 
> Freed by task 626:
>  kfree+0xf1/0x2c0
>  cachefiles_ondemand_send_req+0x568/0x690
>  cachefiles_create_tmpfile+0x249/0xb30
>  cachefiles_create_file+0x6f/0x140
>  cachefiles_look_up_object+0x29c/0xa60
>  cachefiles_lookup_cookie+0x37d/0xca0
>  fscache_cookie_state_machine+0x43c/0x1230
>  [...]
> ==================================================================
> 
> Following is the process that triggers the issue:
> 
>      mount  |   daemon_thread1    |    daemon_thread2
> ------------------------------------------------------------
>  cachefiles_ondemand_init_object
>   cachefiles_ondemand_send_req
>    REQ_A = kzalloc(sizeof(*req) + data_len)
>    wait_for_completion(&REQ_A->done)
> 
>             cachefiles_daemon_read
>              cachefiles_ondemand_daemon_read
>               REQ_A = cachefiles_ondemand_select_req
>               cachefiles_ondemand_get_fd
>               copy_to_user(_buffer, msg, n)
>             process_open_req(REQ_A)
>                                   ------ restore ------
>                                   cachefiles_ondemand_restore
>                                   xas_for_each(&xas, req, ULONG_MAX)
>                                    xas_set_mark(&xas, CACHEFILES_REQ_NEW);
> 
>                                   cachefiles_daemon_read
>                                    cachefiles_ondemand_daemon_read
>                                     REQ_A = cachefiles_ondemand_select_req
> 
>              write(devfd, ("copen %u,%llu", msg->msg_id, size));
>              cachefiles_ondemand_copen
>               xa_erase(&cache->reqs, id)
>               complete(&REQ_A->done)
>    kfree(REQ_A)
>                                     cachefiles_ondemand_get_fd(REQ_A)
>                                      fd = get_unused_fd_flags
>                                      file = anon_inode_getfile
>                                      fd_install(fd, file)
>                                      load = (void *)REQ_A->msg.data;
>                                      load->fd = fd;
>                                      // load UAF !!!
> 
> This issue is caused by issuing a restore command when the daemon is still
> alive, which results in a request being processed multiple times thus
> triggering a UAF. So to avoid this problem, add an additional reference
> count to cachefiles_req, which is held while waiting and reading, and then
> released when the waiting and reading is over.
> 
> Note that since there is only one reference count for waiting, we need to
> avoid the same request being completed multiple times, so we can only
> complete the request if it is successfully removed from the xarray.
> 
> Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
> Suggested-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
How could we protect it from being erased from the xarray with the same
message id in this case?
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-20  7:24   ` Jingbo Xu
@ 2024-05-20  8:38     ` Baokun Li
  2024-05-20  8:45       ` Gao Xiang
  2024-05-20  9:10       ` Jingbo Xu
  0 siblings, 2 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  8:38 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
Hi Jingbo,
Thanks for your review!
On 2024/5/20 15:24, Jingbo Xu wrote:
>
> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> We got the following issue in a fuzz test of randomly issuing the restore
>> command:
>>
>> ==================================================================
>> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>
>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>> Call Trace:
>>   kasan_report+0x94/0xc0
>>   cachefiles_ondemand_daemon_read+0x609/0xab0
>>   vfs_read+0x169/0xb50
>>   ksys_read+0xf5/0x1e0
>>
>> Allocated by task 626:
>>   __kmalloc+0x1df/0x4b0
>>   cachefiles_ondemand_send_req+0x24d/0x690
>>   cachefiles_create_tmpfile+0x249/0xb30
>>   cachefiles_create_file+0x6f/0x140
>>   cachefiles_look_up_object+0x29c/0xa60
>>   cachefiles_lookup_cookie+0x37d/0xca0
>>   fscache_cookie_state_machine+0x43c/0x1230
>>   [...]
>>
>> Freed by task 626:
>>   kfree+0xf1/0x2c0
>>   cachefiles_ondemand_send_req+0x568/0x690
>>   cachefiles_create_tmpfile+0x249/0xb30
>>   cachefiles_create_file+0x6f/0x140
>>   cachefiles_look_up_object+0x29c/0xa60
>>   cachefiles_lookup_cookie+0x37d/0xca0
>>   fscache_cookie_state_machine+0x43c/0x1230
>>   [...]
>> ==================================================================
>>
>> Following is the process that triggers the issue:
>>
>>       mount  |   daemon_thread1    |    daemon_thread2
>> ------------------------------------------------------------
>>   cachefiles_ondemand_init_object
>>    cachefiles_ondemand_send_req
>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>     wait_for_completion(&REQ_A->done)
>>
>>              cachefiles_daemon_read
>>               cachefiles_ondemand_daemon_read
>>                REQ_A = cachefiles_ondemand_select_req
>>                cachefiles_ondemand_get_fd
>>                copy_to_user(_buffer, msg, n)
>>              process_open_req(REQ_A)
>>                                    ------ restore ------
>>                                    cachefiles_ondemand_restore
>>                                    xas_for_each(&xas, req, ULONG_MAX)
>>                                     xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>
>>                                    cachefiles_daemon_read
>>                                     cachefiles_ondemand_daemon_read
>>                                      REQ_A = cachefiles_ondemand_select_req
>>
>>               write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>               cachefiles_ondemand_copen
>>                xa_erase(&cache->reqs, id)
>>                complete(&REQ_A->done)
>>     kfree(REQ_A)
>>                                      cachefiles_ondemand_get_fd(REQ_A)
>>                                       fd = get_unused_fd_flags
>>                                       file = anon_inode_getfile
>>                                       fd_install(fd, file)
>>                                       load = (void *)REQ_A->msg.data;
>>                                       load->fd = fd;
>>                                       // load UAF !!!
>>
>> This issue is caused by issuing a restore command when the daemon is still
>> alive, which results in a request being processed multiple times thus
>> triggering a UAF. So to avoid this problem, add an additional reference
>> count to cachefiles_req, which is held while waiting and reading, and then
>> released when the waiting and reading is over.
>>
>>
>> Note that since there is only one reference count for waiting, we need to
>> avoid the same request being completed multiple times, so we can only
>> complete the request if it is successfully removed from the xarray.
> Sorry the above description makes me confused.  As the same request may
> be got by different daemon threads multiple times, the introduced
> refcount mechanism can't protect it from being completed multiple times
> (which is expected).  The refcount only protects it from being freed
> multiple times.
The idea here is that because the wait only holds one reference count,
complete(&req->done) can only be called when the req has been
successfully removed from the xarry, otherwise the following UAF may
occur:
    daemon_thread1    |    daemon_thread2
-------------------------------------------
cachefiles_ondemand_daemon_read
  xa_lock(&cache->reqs)
  // select req_A
  xa_unlock(&cache->reqs)
                     // restore req_A and read again
                     cachefiles_ondemand_daemon_read
                     xa_lock(&cache->reqs)
                     // select req_A
                     xa_unlock(&cache->reqs)
// goto error, erase success
xa_erase(&cache->reqs, id)
complete(&req_A->done)
// free req_A
                     // goto error, erase failed
                     complete(&req_A->done)
                     // req_A use-after-free
This is also why error requests and CLOSE requests are handled
together and why xas_load(&xas) == req is checked.
>> Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
>> Suggested-by: Hou Tao <houtao1@huawei.com>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
>> ---
>>   fs/cachefiles/internal.h |  1 +
>>   fs/cachefiles/ondemand.c | 44 ++++++++++++++++++++++------------------
>>   2 files changed, 25 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
>> index d33169f0018b..7745b8abc3aa 100644
>> --- a/fs/cachefiles/internal.h
>> +++ b/fs/cachefiles/internal.h
>> @@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>>   struct cachefiles_req {
>>   	struct cachefiles_object *object;
>>   	struct completion done;
>> +	refcount_t ref;
>>   	int error;
>>   	struct cachefiles_msg msg;
>>   };
>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>> index fd49728d8bae..56d12fe4bf73 100644
>> --- a/fs/cachefiles/ondemand.c
>> +++ b/fs/cachefiles/ondemand.c
>> @@ -4,6 +4,12 @@
>>   #include <linux/uio.h>
>>   #include "internal.h"
>>   
>> +static inline void cachefiles_req_put(struct cachefiles_req *req)
>> +{
>> +	if (refcount_dec_and_test(&req->ref))
>> +		kfree(req);
>> +}
>> +
>>   static int cachefiles_ondemand_fd_release(struct inode *inode,
>>   					  struct file *file)
>>   {
>> @@ -299,7 +305,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   {
>>   	struct cachefiles_req *req;
>>   	struct cachefiles_msg *msg;
>> -	unsigned long id = 0;
>>   	size_t n;
>>   	int ret = 0;
>>   	XA_STATE(xas, &cache->reqs, cache->req_id_next);
>> @@ -330,41 +335,39 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   
>>   	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
>>   	cache->req_id_next = xas.xa_index + 1;
>> +	refcount_inc(&req->ref);
>>   	xa_unlock(&cache->reqs);
>>   
>> -	id = xas.xa_index;
>> -
>>   	if (msg->opcode == CACHEFILES_OP_OPEN) {
>>   		ret = cachefiles_ondemand_get_fd(req);
>>   		if (ret) {
>>   			cachefiles_ondemand_set_object_close(req->object);
>> -			goto error;
>> +			goto out;
>>   		}
>>   	}
>>   
>> -	msg->msg_id = id;
>> +	msg->msg_id = xas.xa_index;
>>   	msg->object_id = req->object->ondemand->ondemand_id;
>>   
>>   	if (copy_to_user(_buffer, msg, n) != 0) {
>>   		ret = -EFAULT;
>>   		if (msg->opcode == CACHEFILES_OP_OPEN)
>>   			close_fd(((struct cachefiles_open *)msg->data)->fd);
>> -		goto error;
>>   	}
>> -
>> -	/* CLOSE request has no reply */
>> -	if (msg->opcode == CACHEFILES_OP_CLOSE) {
>> -		xa_erase(&cache->reqs, id);
>> -		complete(&req->done);
>> +out:
>> +	/* Remove error request and CLOSE request has no reply */
>> +	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
>> +		xas_reset(&xas);
>> +		xas_lock(&xas);
>> +		if (xas_load(&xas) == req) {
> Just out of curiosity... How could xas_load(&xas) doesn't equal to req?
As mentioned above, the req may have been deleted or even the id
may have been reused.
>
>> +			req->error = ret;
>> +			complete(&req->done);
>> +			xas_store(&xas, NULL);
>> +		}
>> +		xas_unlock(&xas);
>>   	}
>> -
>> -	return n;
>> -
>> -error:
>> -	xa_erase(&cache->reqs, id);
>> -	req->error = ret;
>> -	complete(&req->done);
>> -	return ret;
>> +	cachefiles_req_put(req);
>> +	return ret ? ret : n;
>>   }
> This is actually a combination of a fix and a cleanup which combines the
> logic of removing error request and the CLOSE requests into one place.
> Also it relies on the cleanup made in patch 2 ("cachefiles: remove
> err_put_fd tag in cachefiles_ondemand_daemon_read()"), making it
> difficult to be atomatically back ported to the stable (as patch 2 is
> not marked as "Fixes").
>
> Thus could we make the fix first, and then make the cleanup.
I don't think that's necessary, stable automatically backports the
relevant dependency patches in case of backport patch conflicts,
and later patches modify the logic here as well.
Or add Fixes tag for patch 2?
>>   
>>   typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
>> @@ -394,6 +397,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>>   		goto out;
>>   	}
>>   
>> +	refcount_set(&req->ref, 1);
>>   	req->object = object;
>>   	init_completion(&req->done);
>>   	req->msg.opcode = opcode;
>> @@ -455,7 +459,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object,
>>   	wake_up_all(&cache->daemon_pollwq);
>>   	wait_for_completion(&req->done);
>>   	ret = req->error;
>> -	kfree(req);
>> +	cachefiles_req_put(req);
>>   	return ret;
>>   out:
>>   	/* Reset the object to close state in error handling path.
>
> Don't we need to also convert "kfree(req)" to cachefiles_req_put(req)
> for the error path of cachefiles_ondemand_send_req()?
>
> ```
> out:
> 	/* Reset the object to close state in error handling path.
> 	 * If error occurs after creating the anonymous fd,
> 	 * cachefiles_ondemand_fd_release() will set object to close.
> 	 */
> 	if (opcode == CACHEFILES_OP_OPEN)
> 		cachefiles_ondemand_set_object_close(object);
> 	kfree(req);
> 	return ret;
> ```
When "goto out;" is called in cachefiles_ondemand_send_req(),
it means that the req is unallocated/failed to be allocated/failed to
be inserted into the xarry, and therefore the req can only be accessed
by the current function, so there is no need to consider concurrency
and reference counting.
Thanks!
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid
  2024-05-15  8:45 ` [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid libaokun
@ 2024-05-20  8:43   ` Jingbo Xu
  2024-05-20  9:07     ` Baokun Li
  0 siblings, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  8:43 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> Now every time the daemon reads an open request, it gets a new anonymous fd
> and ondemand_id. With the introduction of "restore", it is possible to read
> the same open request more than once, and therefore an object can have more
> than one anonymous fd.
> 
> If the anonymous fd is not unique, the following concurrencies will result
> in an fd leak:
> 
>      t1     |         t2         |          t3
> ------------------------------------------------------------
>  cachefiles_ondemand_init_object
>   cachefiles_ondemand_send_req
>    REQ_A = kzalloc(sizeof(*req) + data_len)
>    wait_for_completion(&REQ_A->done)
>             cachefiles_daemon_read
>              cachefiles_ondemand_daemon_read
>               REQ_A = cachefiles_ondemand_select_req
>               cachefiles_ondemand_get_fd
>                 load->fd = fd0
>                 ondemand_id = object_id0
>                                   ------ restore ------
>                                   cachefiles_ondemand_restore
>                                    // restore REQ_A
>                                   cachefiles_daemon_read
>                                    cachefiles_ondemand_daemon_read
>                                     REQ_A = cachefiles_ondemand_select_req
>                                       cachefiles_ondemand_get_fd
>                                         load->fd = fd1
>                                         ondemand_id = object_id1
>              process_open_req(REQ_A)
>              write(devfd, ("copen %u,%llu", msg->msg_id, size))
>              cachefiles_ondemand_copen
>               xa_erase(&cache->reqs, id)
>               complete(&REQ_A->done)
>    kfree(REQ_A)
>                                   process_open_req(REQ_A)
>                                   // copen fails due to no req
>                                   // daemon close(fd1)
>                                   cachefiles_ondemand_fd_release
>                                    // set object closed
>  -- umount --
>  cachefiles_withdraw_cookie
>   cachefiles_ondemand_clean_object
>    cachefiles_ondemand_init_close_req
>     if (!cachefiles_ondemand_object_is_open(object))
>       return -ENOENT;
>     // The fd0 is not closed until the daemon exits.
> 
> However, the anonymous fd holds the reference count of the object and the
> object holds the reference count of the cookie. So even though the cookie
> has been relinquished, it will not be unhashed and freed until the daemon
> exits.
> 
> In fscache_hash_cookie(), when the same cookie is found in the hash list,
> if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new
> cookie waits for the old cookie to be unhashed, while the old cookie is
> waiting for the leaked fd to be closed, if the daemon does not exit in time
> it will trigger a hung task.
> 
> To avoid this, allocate a new anonymous fd only if no anonymous fd has
> been allocated (ondemand_id == 0) or if the previously allocated anonymous
> fd has been closed (ondemand_id == -1). Moreover, returns an error if
> ondemand_id is valid, letting the daemon know that the current userland
> restore logic is abnormal and needs to be checked.
> 
> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
The LOCs of this fix is quite under control.  But still it seems that
the worst consequence is that the (potential) malicious daemon gets
hung.  No more effect to the system or other processes.  Or does a
non-malicious daemon have any chance having the same issue?
> ---
>  fs/cachefiles/ondemand.c | 34 ++++++++++++++++++++++++++++------
>  1 file changed, 28 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
> index d04ddc6576e3..d2d4e27fca6f 100644
> --- a/fs/cachefiles/ondemand.c
> +++ b/fs/cachefiles/ondemand.c
> @@ -14,11 +14,18 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
>  					  struct file *file)
>  {
>  	struct cachefiles_object *object = file->private_data;
> -	struct cachefiles_cache *cache = object->volume->cache;
> -	struct cachefiles_ondemand_info *info = object->ondemand;
> +	struct cachefiles_cache *cache;
> +	struct cachefiles_ondemand_info *info;
>  	int object_id;
>  	struct cachefiles_req *req;
> -	XA_STATE(xas, &cache->reqs, 0);
> +	XA_STATE(xas, NULL, 0);
> +
> +	if (!object)
> +		return 0;
> +
> +	info = object->ondemand;
> +	cache = object->volume->cache;
> +	xas.xa = &cache->reqs;
>  
>  	xa_lock(&cache->reqs);
>  	spin_lock(&info->lock);
> @@ -288,22 +295,39 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>  		goto err_put_fd;
>  	}
>  
> +	spin_lock(&object->ondemand->lock);
> +	if (object->ondemand->ondemand_id > 0) {
> +		spin_unlock(&object->ondemand->lock);
> +		/* Pair with check in cachefiles_ondemand_fd_release(). */
> +		file->private_data = NULL;
> +		ret = -EEXIST;
> +		goto err_put_file;
> +	}
> +
>  	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
>  	fd_install(fd, file);
>  
>  	load = (void *)req->msg.data;
>  	load->fd = fd;
>  	object->ondemand->ondemand_id = object_id;
> +	spin_unlock(&object->ondemand->lock);
>  
>  	cachefiles_get_unbind_pincount(cache);
>  	trace_cachefiles_ondemand_open(object, &req->msg, load);
>  	return 0;
>  
> +err_put_file:
> +	fput(file);
>  err_put_fd:
>  	put_unused_fd(fd);
>  err_free_id:
>  	xa_erase(&cache->ondemand_ids, object_id);
>  err:
> +	spin_lock(&object->ondemand->lock);
> +	/* Avoid marking an opened object as closed. */
> +	if (object->ondemand->ondemand_id <= 0)
> +		cachefiles_ondemand_set_object_close(object);
> +	spin_unlock(&object->ondemand->lock);
>  	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
>  	return ret;
>  }
> @@ -386,10 +410,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  
>  	if (msg->opcode == CACHEFILES_OP_OPEN) {
>  		ret = cachefiles_ondemand_get_fd(req);
> -		if (ret) {
> -			cachefiles_ondemand_set_object_close(req->object);
> +		if (ret)
>  			goto out;
> -		}
>  	}
>  
>  	msg->msg_id = xas.xa_index;
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-20  8:38     ` Baokun Li
@ 2024-05-20  8:45       ` Gao Xiang
  2024-05-20  9:10       ` Jingbo Xu
  1 sibling, 0 replies; 38+ messages in thread
From: Gao Xiang @ 2024-05-20  8:45 UTC (permalink / raw)
  To: Baokun Li, Jingbo Xu, netfs
  Cc: zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel, yangerkun,
	houtao1, yukuai3, wozizhi, Baokun Li, David Howells, Jeff Layton
On 2024/5/20 16:38, Baokun Li wrote:
> Hi Jingbo,
> 
> Thanks for your review!
> 
> On 2024/5/20 15:24, Jingbo Xu wrote:
>>
>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>> From: Baokun Li <libaokun1@huawei.com>
>>>
>>> We got the following issue in a fuzz test of randomly issuing the restore
>>> command:
>>>
>>> ==================================================================
>>> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
>>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>>
>>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>>> Call Trace:
>>>   kasan_report+0x94/0xc0
>>>   cachefiles_ondemand_daemon_read+0x609/0xab0
>>>   vfs_read+0x169/0xb50
>>>   ksys_read+0xf5/0x1e0
>>>
>>> Allocated by task 626:
>>>   __kmalloc+0x1df/0x4b0
>>>   cachefiles_ondemand_send_req+0x24d/0x690
>>>   cachefiles_create_tmpfile+0x249/0xb30
>>>   cachefiles_create_file+0x6f/0x140
>>>   cachefiles_look_up_object+0x29c/0xa60
>>>   cachefiles_lookup_cookie+0x37d/0xca0
>>>   fscache_cookie_state_machine+0x43c/0x1230
>>>   [...]
>>>
>>> Freed by task 626:
>>>   kfree+0xf1/0x2c0
>>>   cachefiles_ondemand_send_req+0x568/0x690
>>>   cachefiles_create_tmpfile+0x249/0xb30
>>>   cachefiles_create_file+0x6f/0x140
>>>   cachefiles_look_up_object+0x29c/0xa60
>>>   cachefiles_lookup_cookie+0x37d/0xca0
>>>   fscache_cookie_state_machine+0x43c/0x1230
>>>   [...]
>>> ==================================================================
>>>
>>> Following is the process that triggers the issue:
>>>
>>>       mount  |   daemon_thread1    |    daemon_thread2
>>> ------------------------------------------------------------
>>>   cachefiles_ondemand_init_object
>>>    cachefiles_ondemand_send_req
>>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>>     wait_for_completion(&REQ_A->done)
>>>
>>>              cachefiles_daemon_read
>>>               cachefiles_ondemand_daemon_read
>>>                REQ_A = cachefiles_ondemand_select_req
>>>                cachefiles_ondemand_get_fd
>>>                copy_to_user(_buffer, msg, n)
>>>              process_open_req(REQ_A)
>>>                                    ------ restore ------
>>>                                    cachefiles_ondemand_restore
>>>                                    xas_for_each(&xas, req, ULONG_MAX)
>>>                                     xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>>
>>>                                    cachefiles_daemon_read
>>>                                     cachefiles_ondemand_daemon_read
>>>                                      REQ_A = cachefiles_ondemand_select_req
>>>
>>>               write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>>               cachefiles_ondemand_copen
>>>                xa_erase(&cache->reqs, id)
>>>                complete(&REQ_A->done)
>>>     kfree(REQ_A)
>>>                                      cachefiles_ondemand_get_fd(REQ_A)
>>>                                       fd = get_unused_fd_flags
>>>                                       file = anon_inode_getfile
>>>                                       fd_install(fd, file)
>>>                                       load = (void *)REQ_A->msg.data;
>>>                                       load->fd = fd;
>>>                                       // load UAF !!!
>>>
>>> This issue is caused by issuing a restore command when the daemon is still
>>> alive, which results in a request being processed multiple times thus
>>> triggering a UAF. So to avoid this problem, add an additional reference
>>> count to cachefiles_req, which is held while waiting and reading, and then
>>> released when the waiting and reading is over.
>>>
>>>
>>> Note that since there is only one reference count for waiting, we need to
>>> avoid the same request being completed multiple times, so we can only
>>> complete the request if it is successfully removed from the xarray.
>> Sorry the above description makes me confused.  As the same request may
>> be got by different daemon threads multiple times, the introduced
>> refcount mechanism can't protect it from being completed multiple times
>> (which is expected).  The refcount only protects it from being freed
>> multiple times.
> The idea here is that because the wait only holds one reference count,
> complete(&req->done) can only be called when the req has been
> successfully removed from the xarry, otherwise the following UAF may
> occur:
> 
>     daemon_thread1    |    daemon_thread2
> -------------------------------------------
> cachefiles_ondemand_daemon_read
>   xa_lock(&cache->reqs)
>   // select req_A
>   xa_unlock(&cache->reqs)
>                      // restore req_A and read again
>                      cachefiles_ondemand_daemon_read
>                      xa_lock(&cache->reqs)
>                      // select req_A
>                      xa_unlock(&cache->reqs)
> // goto error, erase success
> xa_erase(&cache->reqs, id)
> complete(&req_A->done)
> // free req_A
>                      // goto error, erase failed
>                      complete(&req_A->done)
>                      // req_A use-after-free
> 
> This is also why error requests and CLOSE requests are handled
> together and why xas_load(&xas) == req is checked.
>>> Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
>>> Suggested-by: Hou Tao <houtao1@huawei.com>
>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
>>> ---
>>>   fs/cachefiles/internal.h |  1 +
>>>   fs/cachefiles/ondemand.c | 44 ++++++++++++++++++++++------------------
>>>   2 files changed, 25 insertions(+), 20 deletions(-)
>>>
>>> diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h
>>> index d33169f0018b..7745b8abc3aa 100644
>>> --- a/fs/cachefiles/internal.h
>>> +++ b/fs/cachefiles/internal.h
>>> @@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache)
>>>   struct cachefiles_req {
>>>       struct cachefiles_object *object;
>>>       struct completion done;
>>> +    refcount_t ref;
>>>       int error;
>>>       struct cachefiles_msg msg;
>>>   };
>>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>>> index fd49728d8bae..56d12fe4bf73 100644
>>> --- a/fs/cachefiles/ondemand.c
>>> +++ b/fs/cachefiles/ondemand.c
>>> @@ -4,6 +4,12 @@
>>>   #include <linux/uio.h>
>>>   #include "internal.h"
>>> +static inline void cachefiles_req_put(struct cachefiles_req *req)
>>> +{
>>> +    if (refcount_dec_and_test(&req->ref))
>>> +        kfree(req);
>>> +}
>>> +
>>>   static int cachefiles_ondemand_fd_release(struct inode *inode,
>>>                         struct file *file)
>>>   {
>>> @@ -299,7 +305,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>>   {
>>>       struct cachefiles_req *req;
>>>       struct cachefiles_msg *msg;
>>> -    unsigned long id = 0;
>>>       size_t n;
>>>       int ret = 0;
>>>       XA_STATE(xas, &cache->reqs, cache->req_id_next);
>>> @@ -330,41 +335,39 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>>       xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
>>>       cache->req_id_next = xas.xa_index + 1;
>>> +    refcount_inc(&req->ref);
>>>       xa_unlock(&cache->reqs);
>>> -    id = xas.xa_index;
>>> -
>>>       if (msg->opcode == CACHEFILES_OP_OPEN) {
>>>           ret = cachefiles_ondemand_get_fd(req);
>>>           if (ret) {
>>>               cachefiles_ondemand_set_object_close(req->object);
>>> -            goto error;
>>> +            goto out;
>>>           }
>>>       }
>>> -    msg->msg_id = id;
>>> +    msg->msg_id = xas.xa_index;
>>>       msg->object_id = req->object->ondemand->ondemand_id;
>>>       if (copy_to_user(_buffer, msg, n) != 0) {
>>>           ret = -EFAULT;
>>>           if (msg->opcode == CACHEFILES_OP_OPEN)
>>>               close_fd(((struct cachefiles_open *)msg->data)->fd);
>>> -        goto error;
>>>       }
>>> -
>>> -    /* CLOSE request has no reply */
>>> -    if (msg->opcode == CACHEFILES_OP_CLOSE) {
>>> -        xa_erase(&cache->reqs, id);
>>> -        complete(&req->done);
>>> +out:
>>> +    /* Remove error request and CLOSE request has no reply */
>>> +    if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
>>> +        xas_reset(&xas);
>>> +        xas_lock(&xas);
>>> +        if (xas_load(&xas) == req) {
>> Just out of curiosity... How could xas_load(&xas) doesn't equal to req?
> 
> As mentioned above, the req may have been deleted or even the id
> 
> may have been reused.
> 
>>
>>> +            req->error = ret;
>>> +            complete(&req->done);
>>> +            xas_store(&xas, NULL);
>>> +        }
>>> +        xas_unlock(&xas);
>>>       }
>>> -
>>> -    return n;
>>> -
>>> -error:
>>> -    xa_erase(&cache->reqs, id);
>>> -    req->error = ret;
>>> -    complete(&req->done);
>>> -    return ret;
>>> +    cachefiles_req_put(req);
>>> +    return ret ? ret : n;
>>>   }
>> This is actually a combination of a fix and a cleanup which combines the
>> logic of removing error request and the CLOSE requests into one place.
>> Also it relies on the cleanup made in patch 2 ("cachefiles: remove
>> err_put_fd tag in cachefiles_ondemand_daemon_read()"), making it
>> difficult to be atomatically back ported to the stable (as patch 2 is
>> not marked as "Fixes").
>>
>> Thus could we make the fix first, and then make the cleanup.
> I don't think that's necessary, stable automatically backports the
> relevant dependency patches in case of backport patch conflicts,
> and later patches modify the logic here as well.
> Or add Fixes tag for patch 2?
I think we might better to avoid unnecessary dependencies
since it relies on some "AI" magic and often mis-backportes
real dependencies.
I tend to leave real bugfixes first, and do cleanup next.
But please don't leave cleanup patches with "Fixes:" tags
anyway since it just misleads people.
Thanks,
Gao Xiang
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
  2024-05-20  7:36   ` Jingbo Xu
@ 2024-05-20  8:56     ` Baokun Li
  0 siblings, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  8:56 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 15:36, Jingbo Xu wrote:
>
> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> We got the following issue in a fuzz test of randomly issuing the restore
>> command:
>>
>> ==================================================================
>> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60
>> Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963
>>
>> CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564
>> Call Trace:
>>   kasan_report+0x93/0xc0
>>   cachefiles_ondemand_daemon_read+0xb41/0xb60
>>   vfs_read+0x169/0xb50
>>   ksys_read+0xf5/0x1e0
>>
>> Allocated by task 116:
>>   kmem_cache_alloc+0x140/0x3a0
>>   cachefiles_lookup_cookie+0x140/0xcd0
>>   fscache_cookie_state_machine+0x43c/0x1230
>>   [...]
>>
>> Freed by task 792:
>>   kmem_cache_free+0xfe/0x390
>>   cachefiles_put_object+0x241/0x480
>>   fscache_cookie_state_machine+0x5c8/0x1230
>>   [...]
>> ==================================================================
>>
>> Following is the process that triggers the issue:
>>
>>       mount  |   daemon_thread1    |    daemon_thread2
>> ------------------------------------------------------------
>> cachefiles_withdraw_cookie
>>   cachefiles_ondemand_clean_object(object)
>>    cachefiles_ondemand_send_req
>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>     wait_for_completion(&REQ_A->done)
>>
>>              cachefiles_daemon_read
>>               cachefiles_ondemand_daemon_read
>>                REQ_A = cachefiles_ondemand_select_req
>>                msg->object_id = req->object->ondemand->ondemand_id
>>                                    ------ restore ------
>>                                    cachefiles_ondemand_restore
>>                                    xas_for_each(&xas, req, ULONG_MAX)
>>                                     xas_set_mark(&xas, CACHEFILES_REQ_NEW)
>>
>>                                    cachefiles_daemon_read
>>                                     cachefiles_ondemand_daemon_read
>>                                      REQ_A = cachefiles_ondemand_select_req
>>                copy_to_user(_buffer, msg, n)
>>                 xa_erase(&cache->reqs, id)
>>                 complete(&REQ_A->done)
>>                ------ close(fd) ------
>>                cachefiles_ondemand_fd_release
>>                 cachefiles_put_object
>>   cachefiles_put_object
>>    kmem_cache_free(cachefiles_object_jar, object)
>>                                      REQ_A->object->ondemand->ondemand_id
>>                                       // object UAF !!!
>>
>> When we see the request within xa_lock, req->object must not have been
>> freed yet, so grab the reference count of object before xa_unlock to
>> avoid the above issue.
>>
>> Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed")
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
>> ---
>>   fs/cachefiles/ondemand.c          | 2 ++
>>   include/trace/events/cachefiles.h | 6 +++++-
>>   2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>> index 56d12fe4bf73..bb94ef6a6f61 100644
>> --- a/fs/cachefiles/ondemand.c
>> +++ b/fs/cachefiles/ondemand.c
>> @@ -336,6 +336,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   	xas_clear_mark(&xas, CACHEFILES_REQ_NEW);
>>   	cache->req_id_next = xas.xa_index + 1;
>>   	refcount_inc(&req->ref);
>> +	cachefiles_grab_object(req->object, cachefiles_obj_get_read_req);
>>   	xa_unlock(&cache->reqs);
>>   
>>   	if (msg->opcode == CACHEFILES_OP_OPEN) {
>> @@ -355,6 +356,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   			close_fd(((struct cachefiles_open *)msg->data)->fd);
>>   	}
>>   out:
>> +	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
>>   	/* Remove error request and CLOSE request has no reply */
>>   	if (ret || msg->opcode == CACHEFILES_OP_CLOSE) {
>>   		xas_reset(&xas);
>> diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
>> index cf4b98b9a9ed..119a823fb5a0 100644
>> --- a/include/trace/events/cachefiles.h
>> +++ b/include/trace/events/cachefiles.h
>> @@ -33,6 +33,8 @@ enum cachefiles_obj_ref_trace {
>>   	cachefiles_obj_see_withdrawal,
>>   	cachefiles_obj_get_ondemand_fd,
>>   	cachefiles_obj_put_ondemand_fd,
>> +	cachefiles_obj_get_read_req,
>> +	cachefiles_obj_put_read_req,
> How about cachefiles_obj_[get|put]_ondemand_read, so that it could be
> easily identified as ondemand mode at the first glance?
The ondemand_read tends to confuse whether it's
ondemand_daemon_read or ondemand_data_read. I think it's better
to emphasise the read request, and currently only the ondemand
mode has a cachefiles req.
>>   };
>>   
>>   enum fscache_why_object_killed {
>> @@ -127,7 +129,9 @@ enum cachefiles_error_trace {
>>   	EM(cachefiles_obj_see_lookup_cookie,	"SEE lookup_cookie")	\
>>   	EM(cachefiles_obj_see_lookup_failed,	"SEE lookup_failed")	\
>>   	EM(cachefiles_obj_see_withdraw_cookie,	"SEE withdraw_cookie")	\
>> -	E_(cachefiles_obj_see_withdrawal,	"SEE withdrawal")
>> +	EM(cachefiles_obj_see_withdrawal,	"SEE withdrawal")	\
>> +	EM(cachefiles_obj_get_read_req,		"GET read_req")		\
>> +	E_(cachefiles_obj_put_read_req,		"PUT read_req")
> Ditto.
>
>
> Otherwise, LGTM.
>
> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
>
Thank you very much for your review!
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
  2024-05-20  7:40   ` Jingbo Xu
@ 2024-05-20  9:02     ` Baokun Li
  0 siblings, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  9:02 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 15:40, Jingbo Xu wrote:
>
> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> This lets us see the correct trace output.
>>
>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>
> Could we move this simple fix to the beginning of the patch set?
>
> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
>
Thanks for the review!
This patch is here because when adding trace-related output in the
last patch (path 4), it was found that cachefiles_obj_[get|put]_ondemand_fd
did not have any relevant trace output, so it is added in this patch.
Putting it in the first patch may confuse the reader as to why it was
added, but it is easier to understand with cachefiles_obj_[get|put]_read_req
in front of it.
>> ---
>>   include/trace/events/cachefiles.h | 2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h
>> index 119a823fb5a0..bb56e3104b12 100644
>> --- a/include/trace/events/cachefiles.h
>> +++ b/include/trace/events/cachefiles.h
>> @@ -130,6 +130,8 @@ enum cachefiles_error_trace {
>>   	EM(cachefiles_obj_see_lookup_failed,	"SEE lookup_failed")	\
>>   	EM(cachefiles_obj_see_withdraw_cookie,	"SEE withdraw_cookie")	\
>>   	EM(cachefiles_obj_see_withdrawal,	"SEE withdrawal")	\
>> +	EM(cachefiles_obj_get_ondemand_fd,      "GET ondemand_fd")      \
>> +	EM(cachefiles_obj_put_ondemand_fd,      "PUT ondemand_fd")      \
>>   	EM(cachefiles_obj_get_read_req,		"GET read_req")		\
>>   	E_(cachefiles_obj_put_read_req,		"PUT read_req")
>>   
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid
  2024-05-20  8:43   ` Jingbo Xu
@ 2024-05-20  9:07     ` Baokun Li
  2024-05-20  9:24       ` Jingbo Xu
  0 siblings, 1 reply; 38+ messages in thread
From: Baokun Li @ 2024-05-20  9:07 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 16:43, Jingbo Xu wrote:
>
> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> Now every time the daemon reads an open request, it gets a new anonymous fd
>> and ondemand_id. With the introduction of "restore", it is possible to read
>> the same open request more than once, and therefore an object can have more
>> than one anonymous fd.
>>
>> If the anonymous fd is not unique, the following concurrencies will result
>> in an fd leak:
>>
>>       t1     |         t2         |          t3
>> ------------------------------------------------------------
>>   cachefiles_ondemand_init_object
>>    cachefiles_ondemand_send_req
>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>     wait_for_completion(&REQ_A->done)
>>              cachefiles_daemon_read
>>               cachefiles_ondemand_daemon_read
>>                REQ_A = cachefiles_ondemand_select_req
>>                cachefiles_ondemand_get_fd
>>                  load->fd = fd0
>>                  ondemand_id = object_id0
>>                                    ------ restore ------
>>                                    cachefiles_ondemand_restore
>>                                     // restore REQ_A
>>                                    cachefiles_daemon_read
>>                                     cachefiles_ondemand_daemon_read
>>                                      REQ_A = cachefiles_ondemand_select_req
>>                                        cachefiles_ondemand_get_fd
>>                                          load->fd = fd1
>>                                          ondemand_id = object_id1
>>               process_open_req(REQ_A)
>>               write(devfd, ("copen %u,%llu", msg->msg_id, size))
>>               cachefiles_ondemand_copen
>>                xa_erase(&cache->reqs, id)
>>                complete(&REQ_A->done)
>>     kfree(REQ_A)
>>                                    process_open_req(REQ_A)
>>                                    // copen fails due to no req
>>                                    // daemon close(fd1)
>>                                    cachefiles_ondemand_fd_release
>>                                     // set object closed
>>   -- umount --
>>   cachefiles_withdraw_cookie
>>    cachefiles_ondemand_clean_object
>>     cachefiles_ondemand_init_close_req
>>      if (!cachefiles_ondemand_object_is_open(object))
>>        return -ENOENT;
>>      // The fd0 is not closed until the daemon exits.
>>
>> However, the anonymous fd holds the reference count of the object and the
>> object holds the reference count of the cookie. So even though the cookie
>> has been relinquished, it will not be unhashed and freed until the daemon
>> exits.
>>
>> In fscache_hash_cookie(), when the same cookie is found in the hash list,
>> if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new
>> cookie waits for the old cookie to be unhashed, while the old cookie is
>> waiting for the leaked fd to be closed, if the daemon does not exit in time
>> it will trigger a hung task.
>>
>> To avoid this, allocate a new anonymous fd only if no anonymous fd has
>> been allocated (ondemand_id == 0) or if the previously allocated anonymous
>> fd has been closed (ondemand_id == -1). Moreover, returns an error if
>> ondemand_id is valid, letting the daemon know that the current userland
>> restore logic is abnormal and needs to be checked.
>>
>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> The LOCs of this fix is quite under control.  But still it seems that
> the worst consequence is that the (potential) malicious daemon gets
> hung.  No more effect to the system or other processes.  Or does a
> non-malicious daemon have any chance having the same issue?
If we enable hung_task_panic, it may cause panic to crash the server.
>> ---
>>   fs/cachefiles/ondemand.c | 34 ++++++++++++++++++++++++++++------
>>   1 file changed, 28 insertions(+), 6 deletions(-)
>>
>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>> index d04ddc6576e3..d2d4e27fca6f 100644
>> --- a/fs/cachefiles/ondemand.c
>> +++ b/fs/cachefiles/ondemand.c
>> @@ -14,11 +14,18 @@ static int cachefiles_ondemand_fd_release(struct inode *inode,
>>   					  struct file *file)
>>   {
>>   	struct cachefiles_object *object = file->private_data;
>> -	struct cachefiles_cache *cache = object->volume->cache;
>> -	struct cachefiles_ondemand_info *info = object->ondemand;
>> +	struct cachefiles_cache *cache;
>> +	struct cachefiles_ondemand_info *info;
>>   	int object_id;
>>   	struct cachefiles_req *req;
>> -	XA_STATE(xas, &cache->reqs, 0);
>> +	XA_STATE(xas, NULL, 0);
>> +
>> +	if (!object)
>> +		return 0;
>> +
>> +	info = object->ondemand;
>> +	cache = object->volume->cache;
>> +	xas.xa = &cache->reqs;
>>   
>>   	xa_lock(&cache->reqs);
>>   	spin_lock(&info->lock);
>> @@ -288,22 +295,39 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>>   		goto err_put_fd;
>>   	}
>>   
>> +	spin_lock(&object->ondemand->lock);
>> +	if (object->ondemand->ondemand_id > 0) {
>> +		spin_unlock(&object->ondemand->lock);
>> +		/* Pair with check in cachefiles_ondemand_fd_release(). */
>> +		file->private_data = NULL;
>> +		ret = -EEXIST;
>> +		goto err_put_file;
>> +	}
>> +
>>   	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
>>   	fd_install(fd, file);
>>   
>>   	load = (void *)req->msg.data;
>>   	load->fd = fd;
>>   	object->ondemand->ondemand_id = object_id;
>> +	spin_unlock(&object->ondemand->lock);
>>   
>>   	cachefiles_get_unbind_pincount(cache);
>>   	trace_cachefiles_ondemand_open(object, &req->msg, load);
>>   	return 0;
>>   
>> +err_put_file:
>> +	fput(file);
>>   err_put_fd:
>>   	put_unused_fd(fd);
>>   err_free_id:
>>   	xa_erase(&cache->ondemand_ids, object_id);
>>   err:
>> +	spin_lock(&object->ondemand->lock);
>> +	/* Avoid marking an opened object as closed. */
>> +	if (object->ondemand->ondemand_id <= 0)
>> +		cachefiles_ondemand_set_object_close(object);
>> +	spin_unlock(&object->ondemand->lock);
>>   	cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd);
>>   	return ret;
>>   }
>> @@ -386,10 +410,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   
>>   	if (msg->opcode == CACHEFILES_OP_OPEN) {
>>   		ret = cachefiles_ondemand_get_fd(req);
>> -		if (ret) {
>> -			cachefiles_ondemand_set_object_close(req->object);
>> +		if (ret)
>>   			goto out;
>> -		}
>>   	}
>>   
>>   	msg->msg_id = xas.xa_index;
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-20  8:06   ` Jingbo Xu
@ 2024-05-20  9:10     ` Baokun Li
  0 siblings, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  9:10 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 16:06, Jingbo Xu wrote:
>
> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> We got the following issue in a fuzz test of randomly issuing the restore
>> command:
>>
>> ==================================================================
>> BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0
>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>
>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>> Call Trace:
>>   kasan_report+0x94/0xc0
>>   cachefiles_ondemand_daemon_read+0x609/0xab0
>>   vfs_read+0x169/0xb50
>>   ksys_read+0xf5/0x1e0
>>
>> Allocated by task 626:
>>   __kmalloc+0x1df/0x4b0
>>   cachefiles_ondemand_send_req+0x24d/0x690
>>   cachefiles_create_tmpfile+0x249/0xb30
>>   cachefiles_create_file+0x6f/0x140
>>   cachefiles_look_up_object+0x29c/0xa60
>>   cachefiles_lookup_cookie+0x37d/0xca0
>>   fscache_cookie_state_machine+0x43c/0x1230
>>   [...]
>>
>> Freed by task 626:
>>   kfree+0xf1/0x2c0
>>   cachefiles_ondemand_send_req+0x568/0x690
>>   cachefiles_create_tmpfile+0x249/0xb30
>>   cachefiles_create_file+0x6f/0x140
>>   cachefiles_look_up_object+0x29c/0xa60
>>   cachefiles_lookup_cookie+0x37d/0xca0
>>   fscache_cookie_state_machine+0x43c/0x1230
>>   [...]
>> ==================================================================
>>
>> Following is the process that triggers the issue:
>>
>>       mount  |   daemon_thread1    |    daemon_thread2
>> ------------------------------------------------------------
>>   cachefiles_ondemand_init_object
>>    cachefiles_ondemand_send_req
>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>     wait_for_completion(&REQ_A->done)
>>
>>              cachefiles_daemon_read
>>               cachefiles_ondemand_daemon_read
>>                REQ_A = cachefiles_ondemand_select_req
>>                cachefiles_ondemand_get_fd
>>                copy_to_user(_buffer, msg, n)
>>              process_open_req(REQ_A)
>>                                    ------ restore ------
>>                                    cachefiles_ondemand_restore
>>                                    xas_for_each(&xas, req, ULONG_MAX)
>>                                     xas_set_mark(&xas, CACHEFILES_REQ_NEW);
>>
>>                                    cachefiles_daemon_read
>>                                     cachefiles_ondemand_daemon_read
>>                                      REQ_A = cachefiles_ondemand_select_req
>>
>>               write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>               cachefiles_ondemand_copen
>>                xa_erase(&cache->reqs, id)
>>                complete(&REQ_A->done)
>>     kfree(REQ_A)
>>                                      cachefiles_ondemand_get_fd(REQ_A)
>>                                       fd = get_unused_fd_flags
>>                                       file = anon_inode_getfile
>>                                       fd_install(fd, file)
>>                                       load = (void *)REQ_A->msg.data;
>>                                       load->fd = fd;
>>                                       // load UAF !!!
>>
>> This issue is caused by issuing a restore command when the daemon is still
>> alive, which results in a request being processed multiple times thus
>> triggering a UAF. So to avoid this problem, add an additional reference
>> count to cachefiles_req, which is held while waiting and reading, and then
>> released when the waiting and reading is over.
>>
>> Note that since there is only one reference count for waiting, we need to
>> avoid the same request being completed multiple times, so we can only
>> complete the request if it is successfully removed from the xarray.
>>
>> Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests")
>> Suggested-by: Hou Tao <houtao1@huawei.com>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> Reviewed-by: Jia Zhu <zhujia.zj@bytedance.com>
> How could we protect it from being erased from the xarray with the same
> message id in this case?
We will hold xa_lock while erasing the id to avoid concurrency.
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-20  8:38     ` Baokun Li
  2024-05-20  8:45       ` Gao Xiang
@ 2024-05-20  9:10       ` Jingbo Xu
  2024-05-20  9:19         ` Baokun Li
  2024-05-20 12:22         ` Baokun Li
  1 sibling, 2 replies; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  9:10 UTC (permalink / raw)
  To: Baokun Li, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/20/24 4:38 PM, Baokun Li wrote:
> Hi Jingbo,
> 
> Thanks for your review!
> 
> On 2024/5/20 15:24, Jingbo Xu wrote:
>>
>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>> From: Baokun Li <libaokun1@huawei.com>
>>>
>>> We got the following issue in a fuzz test of randomly issuing the
>>> restore
>>> command:
>>>
>>> ==================================================================
>>> BUG: KASAN: slab-use-after-free in
>>> cachefiles_ondemand_daemon_read+0x609/0xab0
>>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>>
>>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>>> Call Trace:
>>>   kasan_report+0x94/0xc0
>>>   cachefiles_ondemand_daemon_read+0x609/0xab0
>>>   vfs_read+0x169/0xb50
>>>   ksys_read+0xf5/0x1e0
>>>
>>> Allocated by task 626:
>>>   __kmalloc+0x1df/0x4b0
>>>   cachefiles_ondemand_send_req+0x24d/0x690
>>>   cachefiles_create_tmpfile+0x249/0xb30
>>>   cachefiles_create_file+0x6f/0x140
>>>   cachefiles_look_up_object+0x29c/0xa60
>>>   cachefiles_lookup_cookie+0x37d/0xca0
>>>   fscache_cookie_state_machine+0x43c/0x1230
>>>   [...]
>>>
>>> Freed by task 626:
>>>   kfree+0xf1/0x2c0
>>>   cachefiles_ondemand_send_req+0x568/0x690
>>>   cachefiles_create_tmpfile+0x249/0xb30
>>>   cachefiles_create_file+0x6f/0x140
>>>   cachefiles_look_up_object+0x29c/0xa60
>>>   cachefiles_lookup_cookie+0x37d/0xca0
>>>   fscache_cookie_state_machine+0x43c/0x1230
>>>   [...]
>>> ==================================================================
>>>
>>> Following is the process that triggers the issue:
>>>
>>>       mount  |   daemon_thread1    |    daemon_thread2
>>> ------------------------------------------------------------
>>>   cachefiles_ondemand_init_object
>>>    cachefiles_ondemand_send_req
>>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>>     wait_for_completion(&REQ_A->done)
>>>
>>>              cachefiles_daemon_read
>>>               cachefiles_ondemand_daemon_read
>>>                REQ_A = cachefiles_ondemand_select_req
>>>                cachefiles_ondemand_get_fd
>>>                copy_to_user(_buffer, msg, n)
>>>              process_open_req(REQ_A)
>>>                                    ------ restore ------
>>>                                    cachefiles_ondemand_restore
>>>                                    xas_for_each(&xas, req, ULONG_MAX)
>>>                                     xas_set_mark(&xas,
>>> CACHEFILES_REQ_NEW);
>>>
>>>                                    cachefiles_daemon_read
>>>                                     cachefiles_ondemand_daemon_read
>>>                                      REQ_A =
>>> cachefiles_ondemand_select_req
>>>
>>>               write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>>               cachefiles_ondemand_copen
>>>                xa_erase(&cache->reqs, id)
>>>                complete(&REQ_A->done)
>>>     kfree(REQ_A)
>>>                                      cachefiles_ondemand_get_fd(REQ_A)
>>>                                       fd = get_unused_fd_flags
>>>                                       file = anon_inode_getfile
>>>                                       fd_install(fd, file)
>>>                                       load = (void *)REQ_A->msg.data;
>>>                                       load->fd = fd;
>>>                                       // load UAF !!!
>>>
>>> This issue is caused by issuing a restore command when the daemon is
>>> still
>>> alive, which results in a request being processed multiple times thus
>>> triggering a UAF. So to avoid this problem, add an additional reference
>>> count to cachefiles_req, which is held while waiting and reading, and
>>> then
>>> released when the waiting and reading is over.
>>>
>>>
>>> Note that since there is only one reference count for waiting, we
>>> need to
>>> avoid the same request being completed multiple times, so we can only
>>> complete the request if it is successfully removed from the xarray.
>> Sorry the above description makes me confused.  As the same request may
>> be got by different daemon threads multiple times, the introduced
>> refcount mechanism can't protect it from being completed multiple times
>> (which is expected).  The refcount only protects it from being freed
>> multiple times.
> The idea here is that because the wait only holds one reference count,
> complete(&req->done) can only be called when the req has been
> successfully removed from the xarry, otherwise the following UAF may
> occur:
"complete(&req->done) can only be called when the req has been
successfully removed from the xarry ..."
How this is done? since the following xarray_erase() following the first
xarray_erase() will fail as the xarray slot referred by the same id has
already been erased?
>>> @@ -455,7 +459,7 @@ static int cachefiles_ondemand_send_req(struct
>>> cachefiles_object *object,
>>>       wake_up_all(&cache->daemon_pollwq);
>>>       wait_for_completion(&req->done);
>>>       ret = req->error;
>>> -    kfree(req);
>>> +    cachefiles_req_put(req);
>>>       return ret;
>>>   out:
>>>       /* Reset the object to close state in error handling path.
>>
>> Don't we need to also convert "kfree(req)" to cachefiles_req_put(req)
>> for the error path of cachefiles_ondemand_send_req()?
>>
>> ```
>> out:
>>     /* Reset the object to close state in error handling path.
>>      * If error occurs after creating the anonymous fd,
>>      * cachefiles_ondemand_fd_release() will set object to close.
>>      */
>>     if (opcode == CACHEFILES_OP_OPEN)
>>         cachefiles_ondemand_set_object_close(object);
>>     kfree(req);
>>     return ret;
>> ```
> When "goto out;" is called in cachefiles_ondemand_send_req(),
> it means that the req is unallocated/failed to be allocated/failed to
> be inserted into the xarry, and therefore the req can only be accessed
> by the current function, so there is no need to consider concurrency
> and reference counting.
Okay I understand. But this is indeed quite confusing. I see no cost of
also converting to cachefiles_req_put(req).
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-20  9:10       ` Jingbo Xu
@ 2024-05-20  9:19         ` Baokun Li
  2024-05-20 12:22         ` Baokun Li
  1 sibling, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20  9:19 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 17:10, Jingbo Xu wrote:
>
> On 5/20/24 4:38 PM, Baokun Li wrote:
>> Hi Jingbo,
>>
>> Thanks for your review!
>>
>> On 2024/5/20 15:24, Jingbo Xu wrote:
>>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>>> From: Baokun Li <libaokun1@huawei.com>
>>>>
>>>> We got the following issue in a fuzz test of randomly issuing the
>>>> restore
>>>> command:
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: slab-use-after-free in
>>>> cachefiles_ondemand_daemon_read+0x609/0xab0
>>>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>>>
>>>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>>>> Call Trace:
>>>>    kasan_report+0x94/0xc0
>>>>    cachefiles_ondemand_daemon_read+0x609/0xab0
>>>>    vfs_read+0x169/0xb50
>>>>    ksys_read+0xf5/0x1e0
>>>>
>>>> Allocated by task 626:
>>>>    __kmalloc+0x1df/0x4b0
>>>>    cachefiles_ondemand_send_req+0x24d/0x690
>>>>    cachefiles_create_tmpfile+0x249/0xb30
>>>>    cachefiles_create_file+0x6f/0x140
>>>>    cachefiles_look_up_object+0x29c/0xa60
>>>>    cachefiles_lookup_cookie+0x37d/0xca0
>>>>    fscache_cookie_state_machine+0x43c/0x1230
>>>>    [...]
>>>>
>>>> Freed by task 626:
>>>>    kfree+0xf1/0x2c0
>>>>    cachefiles_ondemand_send_req+0x568/0x690
>>>>    cachefiles_create_tmpfile+0x249/0xb30
>>>>    cachefiles_create_file+0x6f/0x140
>>>>    cachefiles_look_up_object+0x29c/0xa60
>>>>    cachefiles_lookup_cookie+0x37d/0xca0
>>>>    fscache_cookie_state_machine+0x43c/0x1230
>>>>    [...]
>>>> ==================================================================
>>>>
>>>> Following is the process that triggers the issue:
>>>>
>>>>        mount  |   daemon_thread1    |    daemon_thread2
>>>> ------------------------------------------------------------
>>>>    cachefiles_ondemand_init_object
>>>>     cachefiles_ondemand_send_req
>>>>      REQ_A = kzalloc(sizeof(*req) + data_len)
>>>>      wait_for_completion(&REQ_A->done)
>>>>
>>>>               cachefiles_daemon_read
>>>>                cachefiles_ondemand_daemon_read
>>>>                 REQ_A = cachefiles_ondemand_select_req
>>>>                 cachefiles_ondemand_get_fd
>>>>                 copy_to_user(_buffer, msg, n)
>>>>               process_open_req(REQ_A)
>>>>                                     ------ restore ------
>>>>                                     cachefiles_ondemand_restore
>>>>                                     xas_for_each(&xas, req, ULONG_MAX)
>>>>                                      xas_set_mark(&xas,
>>>> CACHEFILES_REQ_NEW);
>>>>
>>>>                                     cachefiles_daemon_read
>>>>                                      cachefiles_ondemand_daemon_read
>>>>                                       REQ_A =
>>>> cachefiles_ondemand_select_req
>>>>
>>>>                write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>>>                cachefiles_ondemand_copen
>>>>                 xa_erase(&cache->reqs, id)
>>>>                 complete(&REQ_A->done)
>>>>      kfree(REQ_A)
>>>>                                       cachefiles_ondemand_get_fd(REQ_A)
>>>>                                        fd = get_unused_fd_flags
>>>>                                        file = anon_inode_getfile
>>>>                                        fd_install(fd, file)
>>>>                                        load = (void *)REQ_A->msg.data;
>>>>                                        load->fd = fd;
>>>>                                        // load UAF !!!
>>>>
>>>> This issue is caused by issuing a restore command when the daemon is
>>>> still
>>>> alive, which results in a request being processed multiple times thus
>>>> triggering a UAF. So to avoid this problem, add an additional reference
>>>> count to cachefiles_req, which is held while waiting and reading, and
>>>> then
>>>> released when the waiting and reading is over.
>>>>
>>>>
>>>> Note that since there is only one reference count for waiting, we
>>>> need to
>>>> avoid the same request being completed multiple times, so we can only
>>>> complete the request if it is successfully removed from the xarray.
>>> Sorry the above description makes me confused.  As the same request may
>>> be got by different daemon threads multiple times, the introduced
>>> refcount mechanism can't protect it from being completed multiple times
>>> (which is expected).  The refcount only protects it from being freed
>>> multiple times.
>> The idea here is that because the wait only holds one reference count,
>> complete(&req->done) can only be called when the req has been
>> successfully removed from the xarry, otherwise the following UAF may
>> occur:
>
> "complete(&req->done) can only be called when the req has been
> successfully removed from the xarry ..."
>
> How this is done? since the following xarray_erase() following the first
> xarray_erase() will fail as the xarray slot referred by the same id has
> already been erased?
>
>
>>>> @@ -455,7 +459,7 @@ static int cachefiles_ondemand_send_req(struct
>>>> cachefiles_object *object,
>>>>        wake_up_all(&cache->daemon_pollwq);
>>>>        wait_for_completion(&req->done);
>>>>        ret = req->error;
>>>> -    kfree(req);
>>>> +    cachefiles_req_put(req);
>>>>        return ret;
>>>>    out:
>>>>        /* Reset the object to close state in error handling path.
>>> Don't we need to also convert "kfree(req)" to cachefiles_req_put(req)
>>> for the error path of cachefiles_ondemand_send_req()?
>>>
>>> ```
>>> out:
>>>      /* Reset the object to close state in error handling path.
>>>       * If error occurs after creating the anonymous fd,
>>>       * cachefiles_ondemand_fd_release() will set object to close.
>>>       */
>>>      if (opcode == CACHEFILES_OP_OPEN)
>>>          cachefiles_ondemand_set_object_close(object);
>>>      kfree(req);
>>>      return ret;
>>> ```
>> When "goto out;" is called in cachefiles_ondemand_send_req(),
>> it means that the req is unallocated/failed to be allocated/failed to
>> be inserted into the xarry, and therefore the req can only be accessed
>> by the current function, so there is no need to consider concurrency
>> and reference counting.
> Okay I understand. But this is indeed quite confusing. I see no cost of
> also converting to cachefiles_req_put(req).
>
>
Yes, kfree(req) converts to cachefiles_req_put(req) at no cost,
but may trigger a NULL pointer dereference in cachefiles_req_put(req)
if the req has not been initialised.
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid
  2024-05-20  9:07     ` Baokun Li
@ 2024-05-20  9:24       ` Jingbo Xu
  2024-05-20 11:14         ` Baokun Li
  0 siblings, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  9:24 UTC (permalink / raw)
  To: Baokun Li, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/20/24 5:07 PM, Baokun Li wrote:
> On 2024/5/20 16:43, Jingbo Xu wrote:
>>
>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>> From: Baokun Li <libaokun1@huawei.com>
>>>
>>> Now every time the daemon reads an open request, it gets a new
>>> anonymous fd
>>> and ondemand_id. With the introduction of "restore", it is possible
>>> to read
>>> the same open request more than once, and therefore an object can
>>> have more
>>> than one anonymous fd.
>>>
>>> If the anonymous fd is not unique, the following concurrencies will
>>> result
>>> in an fd leak:
>>>
>>>       t1     |         t2         |          t3
>>> ------------------------------------------------------------
>>>   cachefiles_ondemand_init_object
>>>    cachefiles_ondemand_send_req
>>>     REQ_A = kzalloc(sizeof(*req) + data_len)
>>>     wait_for_completion(&REQ_A->done)
>>>              cachefiles_daemon_read
>>>               cachefiles_ondemand_daemon_read
>>>                REQ_A = cachefiles_ondemand_select_req
>>>                cachefiles_ondemand_get_fd
>>>                  load->fd = fd0
>>>                  ondemand_id = object_id0
>>>                                    ------ restore ------
>>>                                    cachefiles_ondemand_restore
>>>                                     // restore REQ_A
>>>                                    cachefiles_daemon_read
>>>                                     cachefiles_ondemand_daemon_read
>>>                                      REQ_A =
>>> cachefiles_ondemand_select_req
>>>                                        cachefiles_ondemand_get_fd
>>>                                          load->fd = fd1
>>>                                          ondemand_id = object_id1
>>>               process_open_req(REQ_A)
>>>               write(devfd, ("copen %u,%llu", msg->msg_id, size))
>>>               cachefiles_ondemand_copen
>>>                xa_erase(&cache->reqs, id)
>>>                complete(&REQ_A->done)
>>>     kfree(REQ_A)
>>>                                    process_open_req(REQ_A)
>>>                                    // copen fails due to no req
>>>                                    // daemon close(fd1)
>>>                                    cachefiles_ondemand_fd_release
>>>                                     // set object closed
>>>   -- umount --
>>>   cachefiles_withdraw_cookie
>>>    cachefiles_ondemand_clean_object
>>>     cachefiles_ondemand_init_close_req
>>>      if (!cachefiles_ondemand_object_is_open(object))
>>>        return -ENOENT;
>>>      // The fd0 is not closed until the daemon exits.
>>>
>>> However, the anonymous fd holds the reference count of the object and
>>> the
>>> object holds the reference count of the cookie. So even though the
>>> cookie
>>> has been relinquished, it will not be unhashed and freed until the
>>> daemon
>>> exits.
>>>
>>> In fscache_hash_cookie(), when the same cookie is found in the hash
>>> list,
>>> if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then
>>> the new
>>> cookie waits for the old cookie to be unhashed, while the old cookie is
>>> waiting for the leaked fd to be closed, if the daemon does not exit
>>> in time
>>> it will trigger a hung task.
>>>
>>> To avoid this, allocate a new anonymous fd only if no anonymous fd has
>>> been allocated (ondemand_id == 0) or if the previously allocated
>>> anonymous
>>> fd has been closed (ondemand_id == -1). Moreover, returns an error if
>>> ondemand_id is valid, letting the daemon know that the current userland
>>> restore logic is abnormal and needs to be checked.
>>>
>>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking
>>> up cookie")
>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> The LOCs of this fix is quite under control.  But still it seems that
>> the worst consequence is that the (potential) malicious daemon gets
>> hung.  No more effect to the system or other processes.  Or does a
>> non-malicious daemon have any chance having the same issue?
> If we enable hung_task_panic, it may cause panic to crash the server.
Then this issue has nothing to do with this patch?  As long as a
malicious daemon doesn't close the anonymous fd after umounting, then I
guess a following attempt of mounting cookie with the same name will
also wait and hung there?
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  2024-05-15  8:45 ` [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds libaokun
@ 2024-05-20  9:39   ` Jingbo Xu
  2024-05-20 11:36     ` Baokun Li
  0 siblings, 1 reply; 38+ messages in thread
From: Jingbo Xu @ 2024-05-20  9:39 UTC (permalink / raw)
  To: libaokun, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li
On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
> From: Baokun Li <libaokun1@huawei.com>
> 
> After installing the anonymous fd, we can now see it in userland and close
> it. However, at this point we may not have gotten the reference count of
> the cache, but we will put it during colse fd, so this may cause a cache
> UAF.
> 
> So grab the cache reference count before fd_install(). In addition, by
> kernel convention, fd is taken over by the user land after fd_install(),
> and the kernel should not call close_fd() after that, i.e., it should call
> fd_install() after everything is ready, thus fd_install() is called after
> copy_to_user() succeeds.
> 
> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
> Suggested-by: Hou Tao <houtao1@huawei.com>
> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> ---
>  fs/cachefiles/ondemand.c | 53 +++++++++++++++++++++++++---------------
>  1 file changed, 33 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
> index d2d4e27fca6f..3a36613e00a7 100644
> --- a/fs/cachefiles/ondemand.c
> +++ b/fs/cachefiles/ondemand.c
> @@ -4,6 +4,11 @@
>  #include <linux/uio.h>
>  #include "internal.h"
>  
> +struct anon_file {
> +	struct file *file;
> +	int fd;
> +};
> +
>  static inline void cachefiles_req_put(struct cachefiles_req *req)
>  {
>  	if (refcount_dec_and_test(&req->ref))
> @@ -263,14 +268,14 @@ int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args)
>  	return 0;
>  }
>  
> -static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
> +static int cachefiles_ondemand_get_fd(struct cachefiles_req *req,
> +				      struct anon_file *anon_file)
How about:
int cachefiles_ondemand_get_fd(struct cachefiles_req *req, int *fd,
struct file *file) ?
It isn't worth introducing a new structure as it is used only for
parameter passing.
>  {
>  	struct cachefiles_object *object;
>  	struct cachefiles_cache *cache;
>  	struct cachefiles_open *load;
> -	struct file *file;
>  	u32 object_id;
> -	int ret, fd;
> +	int ret;
>  
>  	object = cachefiles_grab_object(req->object,
>  			cachefiles_obj_get_ondemand_fd);
> @@ -282,16 +287,16 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>  	if (ret < 0)
>  		goto err;
>  
> -	fd = get_unused_fd_flags(O_WRONLY);
> -	if (fd < 0) {
> -		ret = fd;
> +	anon_file->fd = get_unused_fd_flags(O_WRONLY);
> +	if (anon_file->fd < 0) {
> +		ret = anon_file->fd;
>  		goto err_free_id;
>  	}
>  
> -	file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops,
> -				  object, O_WRONLY);
> -	if (IS_ERR(file)) {
> -		ret = PTR_ERR(file);
> +	anon_file->file = anon_inode_getfile("[cachefiles]",
> +				&cachefiles_ondemand_fd_fops, object, O_WRONLY);
> +	if (IS_ERR(anon_file->file)) {
> +		ret = PTR_ERR(anon_file->file);
>  		goto err_put_fd;
>  	}
>  
> @@ -299,16 +304,15 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>  	if (object->ondemand->ondemand_id > 0) {
>  		spin_unlock(&object->ondemand->lock);
>  		/* Pair with check in cachefiles_ondemand_fd_release(). */
> -		file->private_data = NULL;
> +		anon_file->file->private_data = NULL;
>  		ret = -EEXIST;
>  		goto err_put_file;
>  	}
>  
> -	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
> -	fd_install(fd, file);
> +	anon_file->file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
>  
>  	load = (void *)req->msg.data;
> -	load->fd = fd;
> +	load->fd = anon_file->fd;
>  	object->ondemand->ondemand_id = object_id;
>  	spin_unlock(&object->ondemand->lock);
>  
> @@ -317,9 +321,11 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>  	return 0;
>  
>  err_put_file:
> -	fput(file);
> +	fput(anon_file->file);
> +	anon_file->file = NULL;
When cachefiles_ondemand_get_fd() returns failure, anon_file->file is
not used, and thus I don't think it is worth resetting anon_file->file
to NULL. Or we could assign fd and struct file at the very end when all
succeed.
>  err_put_fd:
> -	put_unused_fd(fd);
> +	put_unused_fd(anon_file->fd);
> +	anon_file->fd = ret;
Ditto.
>  err_free_id:
>  	xa_erase(&cache->ondemand_ids, object_id);
>  err:
> @@ -376,6 +382,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  	struct cachefiles_msg *msg;
>  	size_t n;
>  	int ret = 0;
> +	struct anon_file anon_file;
>  	XA_STATE(xas, &cache->reqs, cache->req_id_next);
>  
>  	xa_lock(&cache->reqs);
> @@ -409,7 +416,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  	xa_unlock(&cache->reqs);
>  
>  	if (msg->opcode == CACHEFILES_OP_OPEN) {
> -		ret = cachefiles_ondemand_get_fd(req);
> +		ret = cachefiles_ondemand_get_fd(req, &anon_file);
>  		if (ret)
>  			goto out;
>  	}
> @@ -417,10 +424,16 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>  	msg->msg_id = xas.xa_index;
>  	msg->object_id = req->object->ondemand->ondemand_id;
>  
> -	if (copy_to_user(_buffer, msg, n) != 0) {
> +	if (copy_to_user(_buffer, msg, n) != 0)
>  		ret = -EFAULT;
> -		if (msg->opcode == CACHEFILES_OP_OPEN)
> -			close_fd(((struct cachefiles_open *)msg->data)->fd);
> +
> +	if (msg->opcode == CACHEFILES_OP_OPEN) {
> +		if (ret < 0) {
> +			fput(anon_file.file);
> +			put_unused_fd(anon_file.fd);
> +			goto out;
> +		}
> +		fd_install(anon_file.fd, anon_file.file);
>  	}
>  out:
>  	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
-- 
Thanks,
Jingbo
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid
  2024-05-20  9:24       ` Jingbo Xu
@ 2024-05-20 11:14         ` Baokun Li
  2024-05-20 11:24           ` Gao Xiang
  0 siblings, 1 reply; 38+ messages in thread
From: Baokun Li @ 2024-05-20 11:14 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 17:24, Jingbo Xu wrote:
>
> On 5/20/24 5:07 PM, Baokun Li wrote:
>> On 2024/5/20 16:43, Jingbo Xu wrote:
>>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>>> From: Baokun Li <libaokun1@huawei.com>
>>>>
SNIP
>>>>
>>>> To avoid this, allocate a new anonymous fd only if no anonymous fd has
>>>> been allocated (ondemand_id == 0) or if the previously allocated
>>>> anonymous
>>>> fd has been closed (ondemand_id == -1). Moreover, returns an error if
>>>> ondemand_id is valid, letting the daemon know that the current userland
>>>> restore logic is abnormal and needs to be checked.
>>>>
>>>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking
>>>> up cookie")
>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>> The LOCs of this fix is quite under control.  But still it seems that
>>> the worst consequence is that the (potential) malicious daemon gets
>>> hung.  No more effect to the system or other processes.  Or does a
>>> non-malicious daemon have any chance having the same issue?
>> If we enable hung_task_panic, it may cause panic to crash the server.
> Then this issue has nothing to do with this patch?  As long as a
> malicious daemon doesn't close the anonymous fd after umounting, then I
> guess a following attempt of mounting cookie with the same name will
> also wait and hung there?
>
Yes, a daemon that only reads requests but doesn't process them will
cause hung,but the daemon will obey the basic constraints when we
test it.
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid
  2024-05-20 11:14         ` Baokun Li
@ 2024-05-20 11:24           ` Gao Xiang
  0 siblings, 0 replies; 38+ messages in thread
From: Gao Xiang @ 2024-05-20 11:24 UTC (permalink / raw)
  To: Baokun Li, Jingbo Xu, netfs, dhowells, jlayton
  Cc: zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel, yangerkun,
	houtao1, yukuai3, wozizhi, Baokun Li
On 2024/5/20 19:14, Baokun Li wrote:
> On 2024/5/20 17:24, Jingbo Xu wrote:
>>
>> On 5/20/24 5:07 PM, Baokun Li wrote:
>>> On 2024/5/20 16:43, Jingbo Xu wrote:
>>>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>>>> From: Baokun Li <libaokun1@huawei.com>
>>>>>
> SNIP
>>>>>
>>>>> To avoid this, allocate a new anonymous fd only if no anonymous fd has
>>>>> been allocated (ondemand_id == 0) or if the previously allocated
>>>>> anonymous
>>>>> fd has been closed (ondemand_id == -1). Moreover, returns an error if
>>>>> ondemand_id is valid, letting the daemon know that the current userland
>>>>> restore logic is abnormal and needs to be checked.
>>>>>
>>>>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking
>>>>> up cookie")
>>>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>>>> The LOCs of this fix is quite under control.  But still it seems that
>>>> the worst consequence is that the (potential) malicious daemon gets
>>>> hung.  No more effect to the system or other processes.  Or does a
>>>> non-malicious daemon have any chance having the same issue?
>>> If we enable hung_task_panic, it may cause panic to crash the server.
>> Then this issue has nothing to do with this patch?  As long as a
>> malicious daemon doesn't close the anonymous fd after umounting, then I
>> guess a following attempt of mounting cookie with the same name will
>> also wait and hung there?
>>
> Yes, a daemon that only reads requests but doesn't process them will
> cause hung,but the daemon will obey the basic constraints when we
> test it.
If we'd really like to enhanace this ("hung_task_panic"), I think
you'd better to switch wait_for_completion() to
wait_for_completion_killable() at least IMHO anyway.
Thanks,
Gao Xiang
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
  2024-05-20  9:39   ` Jingbo Xu
@ 2024-05-20 11:36     ` Baokun Li
  0 siblings, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20 11:36 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 17:39, Jingbo Xu wrote:
>
> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>> From: Baokun Li <libaokun1@huawei.com>
>>
>> After installing the anonymous fd, we can now see it in userland and close
>> it. However, at this point we may not have gotten the reference count of
>> the cache, but we will put it during colse fd, so this may cause a cache
>> UAF.
>>
>> So grab the cache reference count before fd_install(). In addition, by
>> kernel convention, fd is taken over by the user land after fd_install(),
>> and the kernel should not call close_fd() after that, i.e., it should call
>> fd_install() after everything is ready, thus fd_install() is called after
>> copy_to_user() succeeds.
>>
>> Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie")
>> Suggested-by: Hou Tao <houtao1@huawei.com>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> ---
>>   fs/cachefiles/ondemand.c | 53 +++++++++++++++++++++++++---------------
>>   1 file changed, 33 insertions(+), 20 deletions(-)
>>
>> diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c
>> index d2d4e27fca6f..3a36613e00a7 100644
>> --- a/fs/cachefiles/ondemand.c
>> +++ b/fs/cachefiles/ondemand.c
>> @@ -4,6 +4,11 @@
>>   #include <linux/uio.h>
>>   #include "internal.h"
>>   
>> +struct anon_file {
>> +	struct file *file;
>> +	int fd;
>> +};
>> +
>>   static inline void cachefiles_req_put(struct cachefiles_req *req)
>>   {
>>   	if (refcount_dec_and_test(&req->ref))
>> @@ -263,14 +268,14 @@ int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args)
>>   	return 0;
>>   }
>>   
>
>> -static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>> +static int cachefiles_ondemand_get_fd(struct cachefiles_req *req,
>> +				      struct anon_file *anon_file)
>
> How about:
>
> int cachefiles_ondemand_get_fd(struct cachefiles_req *req, int *fd,
> struct file *file) ?
>
> It isn't worth introducing a new structure as it is used only for
> parameter passing.
>
It's just a different code style preference, and internally we think
it makes the code look clearer when encapsulated this way.
>>   {
>>   	struct cachefiles_object *object;
>>   	struct cachefiles_cache *cache;
>>   	struct cachefiles_open *load;
>> -	struct file *file;
>>   	u32 object_id;
>> -	int ret, fd;
>> +	int ret;
>>   
>>   	object = cachefiles_grab_object(req->object,
>>   			cachefiles_obj_get_ondemand_fd);
>> @@ -282,16 +287,16 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>>   	if (ret < 0)
>>   		goto err;
>>   
>> -	fd = get_unused_fd_flags(O_WRONLY);
>> -	if (fd < 0) {
>> -		ret = fd;
>> +	anon_file->fd = get_unused_fd_flags(O_WRONLY);
>> +	if (anon_file->fd < 0) {
>> +		ret = anon_file->fd;
>>   		goto err_free_id;
>>   	}
>>   
>> -	file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops,
>> -				  object, O_WRONLY);
>> -	if (IS_ERR(file)) {
>> -		ret = PTR_ERR(file);
>> +	anon_file->file = anon_inode_getfile("[cachefiles]",
>> +				&cachefiles_ondemand_fd_fops, object, O_WRONLY);
>> +	if (IS_ERR(anon_file->file)) {
>> +		ret = PTR_ERR(anon_file->file);
>>   		goto err_put_fd;
>>   	}
>>   
>> @@ -299,16 +304,15 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>>   	if (object->ondemand->ondemand_id > 0) {
>>   		spin_unlock(&object->ondemand->lock);
>>   		/* Pair with check in cachefiles_ondemand_fd_release(). */
>> -		file->private_data = NULL;
>> +		anon_file->file->private_data = NULL;
>>   		ret = -EEXIST;
>>   		goto err_put_file;
>>   	}
>>   
>> -	file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
>> -	fd_install(fd, file);
>> +	anon_file->file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
>>   
>>   	load = (void *)req->msg.data;
>> -	load->fd = fd;
>> +	load->fd = anon_file->fd;
>>   	object->ondemand->ondemand_id = object_id;
>>   	spin_unlock(&object->ondemand->lock);
>>   
>> @@ -317,9 +321,11 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req)
>>   	return 0;
>>   
>>   err_put_file:
>> -	fput(file);
>> +	fput(anon_file->file);
>> +	anon_file->file = NULL;
> When cachefiles_ondemand_get_fd() returns failure, anon_file->file is
> not used, and thus I don't think it is worth resetting anon_file->file
> to NULL. Or we could assign fd and struct file at the very end when all
> succeed.
Nulling pointers that are no longer in use is a safer coding convention,
which goes some way to avoiding double free or use-after-free.
Moreover it's in the error branch, so it doesn't cost anything.
>>   err_put_fd:
>> -	put_unused_fd(fd);
>> +	put_unused_fd(anon_file->fd);
>> +	anon_file->fd = ret;
> Ditto.
>
>>   err_free_id:
>>   	xa_erase(&cache->ondemand_ids, object_id);
>>   err:
>> @@ -376,6 +382,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   	struct cachefiles_msg *msg;
>>   	size_t n;
>>   	int ret = 0;
>> +	struct anon_file anon_file;
>>   	XA_STATE(xas, &cache->reqs, cache->req_id_next);
>>   
>>   	xa_lock(&cache->reqs);
>> @@ -409,7 +416,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   	xa_unlock(&cache->reqs);
>>   
>>   	if (msg->opcode == CACHEFILES_OP_OPEN) {
>> -		ret = cachefiles_ondemand_get_fd(req);
>> +		ret = cachefiles_ondemand_get_fd(req, &anon_file);
>>   		if (ret)
>>   			goto out;
>>   	}
>> @@ -417,10 +424,16 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
>>   	msg->msg_id = xas.xa_index;
>>   	msg->object_id = req->object->ondemand->ondemand_id;
>>   
>> -	if (copy_to_user(_buffer, msg, n) != 0) {
>> +	if (copy_to_user(_buffer, msg, n) != 0)
>>   		ret = -EFAULT;
>> -		if (msg->opcode == CACHEFILES_OP_OPEN)
>> -			close_fd(((struct cachefiles_open *)msg->data)->fd);
>> +
>> +	if (msg->opcode == CACHEFILES_OP_OPEN) {
>> +		if (ret < 0) {
>> +			fput(anon_file.file);
>> +			put_unused_fd(anon_file.fd);
>> +			goto out;
>> +		}
>> +		fd_install(anon_file.fd, anon_file.file);
>>   	}
>>   out:
>>   	cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
  2024-05-20  9:10       ` Jingbo Xu
  2024-05-20  9:19         ` Baokun Li
@ 2024-05-20 12:22         ` Baokun Li
  1 sibling, 0 replies; 38+ messages in thread
From: Baokun Li @ 2024-05-20 12:22 UTC (permalink / raw)
  To: Jingbo Xu, netfs, dhowells, jlayton
  Cc: hsiangkao, zhujia.zj, linux-erofs, linux-fsdevel, linux-kernel,
	yangerkun, houtao1, yukuai3, wozizhi, Baokun Li, libaokun
On 2024/5/20 17:10, Jingbo Xu wrote:
>
> On 5/20/24 4:38 PM, Baokun Li wrote:
>> Hi Jingbo,
>>
>> Thanks for your review!
>>
>> On 2024/5/20 15:24, Jingbo Xu wrote:
>>> On 5/15/24 4:45 PM, libaokun@huaweicloud.com wrote:
>>>> From: Baokun Li <libaokun1@huawei.com>
>>>>
>>>> We got the following issue in a fuzz test of randomly issuing the
>>>> restore
>>>> command:
>>>>
>>>> ==================================================================
>>>> BUG: KASAN: slab-use-after-free in
>>>> cachefiles_ondemand_daemon_read+0x609/0xab0
>>>> Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
>>>>
>>>> CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542
>>>> Call Trace:
>>>>    kasan_report+0x94/0xc0
>>>>    cachefiles_ondemand_daemon_read+0x609/0xab0
>>>>    vfs_read+0x169/0xb50
>>>>    ksys_read+0xf5/0x1e0
>>>>
>>>> Allocated by task 626:
>>>>    __kmalloc+0x1df/0x4b0
>>>>    cachefiles_ondemand_send_req+0x24d/0x690
>>>>    cachefiles_create_tmpfile+0x249/0xb30
>>>>    cachefiles_create_file+0x6f/0x140
>>>>    cachefiles_look_up_object+0x29c/0xa60
>>>>    cachefiles_lookup_cookie+0x37d/0xca0
>>>>    fscache_cookie_state_machine+0x43c/0x1230
>>>>    [...]
>>>>
>>>> Freed by task 626:
>>>>    kfree+0xf1/0x2c0
>>>>    cachefiles_ondemand_send_req+0x568/0x690
>>>>    cachefiles_create_tmpfile+0x249/0xb30
>>>>    cachefiles_create_file+0x6f/0x140
>>>>    cachefiles_look_up_object+0x29c/0xa60
>>>>    cachefiles_lookup_cookie+0x37d/0xca0
>>>>    fscache_cookie_state_machine+0x43c/0x1230
>>>>    [...]
>>>> ==================================================================
>>>>
>>>> Following is the process that triggers the issue:
>>>>
>>>>        mount  |   daemon_thread1    |    daemon_thread2
>>>> ------------------------------------------------------------
>>>>    cachefiles_ondemand_init_object
>>>>     cachefiles_ondemand_send_req
>>>>      REQ_A = kzalloc(sizeof(*req) + data_len)
>>>>      wait_for_completion(&REQ_A->done)
>>>>
>>>>               cachefiles_daemon_read
>>>>                cachefiles_ondemand_daemon_read
>>>>                 REQ_A = cachefiles_ondemand_select_req
>>>>                 cachefiles_ondemand_get_fd
>>>>                 copy_to_user(_buffer, msg, n)
>>>>               process_open_req(REQ_A)
>>>>                                     ------ restore ------
>>>>                                     cachefiles_ondemand_restore
>>>>                                     xas_for_each(&xas, req, ULONG_MAX)
>>>>                                      xas_set_mark(&xas,
>>>> CACHEFILES_REQ_NEW);
>>>>
>>>>                                     cachefiles_daemon_read
>>>>                                      cachefiles_ondemand_daemon_read
>>>>                                       REQ_A =
>>>> cachefiles_ondemand_select_req
>>>>
>>>>                write(devfd, ("copen %u,%llu", msg->msg_id, size));
>>>>                cachefiles_ondemand_copen
>>>>                 xa_erase(&cache->reqs, id)
>>>>                 complete(&REQ_A->done)
>>>>      kfree(REQ_A)
>>>>                                       cachefiles_ondemand_get_fd(REQ_A)
>>>>                                        fd = get_unused_fd_flags
>>>>                                        file = anon_inode_getfile
>>>>                                        fd_install(fd, file)
>>>>                                        load = (void *)REQ_A->msg.data;
>>>>                                        load->fd = fd;
>>>>                                        // load UAF !!!
>>>>
>>>> This issue is caused by issuing a restore command when the daemon is
>>>> still
>>>> alive, which results in a request being processed multiple times thus
>>>> triggering a UAF. So to avoid this problem, add an additional reference
>>>> count to cachefiles_req, which is held while waiting and reading, and
>>>> then
>>>> released when the waiting and reading is over.
>>>>
>>>>
>>>> Note that since there is only one reference count for waiting, we
>>>> need to
>>>> avoid the same request being completed multiple times, so we can only
>>>> complete the request if it is successfully removed from the xarray.
>>> Sorry the above description makes me confused.  As the same request may
>>> be got by different daemon threads multiple times, the introduced
>>> refcount mechanism can't protect it from being completed multiple times
>>> (which is expected).  The refcount only protects it from being freed
>>> multiple times.
>> The idea here is that because the wait only holds one reference count,
>> complete(&req->done) can only be called when the req has been
>> successfully removed from the xarry, otherwise the following UAF may
>> occur:
>
> "complete(&req->done) can only be called when the req has been
> successfully removed from the xarry ..."
>
> How this is done? since the following xarray_erase() following the first
> xarray_erase() will fail as the xarray slot referred by the same id has
> already been erased?
Sorry just forgot to reply to this!
Yes, after loading the xas, the entry (aka req) is checked to see if it 
meets
expectations, and only when it does do we null the xas and complete the 
request.
-- 
With Best Regards,
Baokun Li
^ permalink raw reply	[flat|nested] 38+ messages in thread
end of thread, other threads:[~2024-05-20 12:22 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-15  8:45 [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests libaokun
2024-05-15  8:45 ` [PATCH v2 01/12] cachefiles: remove request from xarry during flush requests libaokun
2024-05-20  2:20   ` Gao Xiang
2024-05-20  4:11     ` Baokun Li
2024-05-20  7:09   ` Jingbo Xu
2024-05-15  8:45 ` [PATCH v2 02/12] cachefiles: remove err_put_fd tag in cachefiles_ondemand_daemon_read() libaokun
2024-05-20  2:23   ` Gao Xiang
2024-05-20  4:15     ` Baokun Li
2024-05-15  8:45 ` [PATCH v2 03/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd() libaokun
2024-05-20  7:24   ` Jingbo Xu
2024-05-20  8:38     ` Baokun Li
2024-05-20  8:45       ` Gao Xiang
2024-05-20  9:10       ` Jingbo Xu
2024-05-20  9:19         ` Baokun Li
2024-05-20 12:22         ` Baokun Li
2024-05-20  8:06   ` Jingbo Xu
2024-05-20  9:10     ` Baokun Li
2024-05-15  8:45 ` [PATCH v2 04/12] cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read() libaokun
2024-05-20  7:36   ` Jingbo Xu
2024-05-20  8:56     ` Baokun Li
2024-05-15  8:45 ` [PATCH v2 05/12] cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd libaokun
2024-05-20  7:40   ` Jingbo Xu
2024-05-20  9:02     ` Baokun Li
2024-05-15  8:45 ` [PATCH v2 06/12] cachefiles: add consistency check for copen/cread libaokun
2024-05-15  8:45 ` [PATCH v2 07/12] cachefiles: add spin_lock for cachefiles_ondemand_info libaokun
2024-05-15  8:45 ` [PATCH v2 08/12] cachefiles: never get a new anonymous fd if ondemand_id is valid libaokun
2024-05-20  8:43   ` Jingbo Xu
2024-05-20  9:07     ` Baokun Li
2024-05-20  9:24       ` Jingbo Xu
2024-05-20 11:14         ` Baokun Li
2024-05-20 11:24           ` Gao Xiang
2024-05-15  8:45 ` [PATCH v2 09/12] cachefiles: defer exposing anon_fd until after copy_to_user() succeeds libaokun
2024-05-20  9:39   ` Jingbo Xu
2024-05-20 11:36     ` Baokun Li
2024-05-15  8:45 ` [PATCH v2 10/12] cachefiles: Set object to close if ondemand_id < 0 in copen libaokun
2024-05-15  8:46 ` [PATCH v2 11/12] cachefiles: flush all requests after setting CACHEFILES_DEAD libaokun
2024-05-15  8:46 ` [PATCH v2 12/12] cachefiles: make on-demand read killable libaokun
2024-05-19 10:56 ` [PATCH v2 00/12] cachefiles: some bugfixes and cleanups for ondemand requests Jeff Layton
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).