* [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features
@ 2025-01-27 4:42 Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure Ekansh Gupta
` (4 more replies)
0 siblings, 5 replies; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-27 4:42 UTC (permalink / raw)
To: srinivas.kandagatla, linux-arm-msm
Cc: gregkh, quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
This patch series adds the following features, which are missing from
the upstream fastRPC driver.
- Add changes to support new enhanced invocation ioctl request.
- Add support for CRC check.
- Add polling mode support.
Userspace change: https://github.com/quic/fastrpc/pull/134
Patch [v1]: https://lore.kernel.org/all/20241007084518.3649876-1-quic_ekangupt@quicinc.com/
Changes in v2:
- Moved context specific pointers under the invoke_ctx structure to
avoid code duplication.
- Separated userspace and kernel data structures.
- Changed design to address existing problems.
- Added checks for reserved bits.
- Dropped performance counter changes for now. I will rework and
submit them separately.
Ekansh Gupta (5):
misc: fastrpc: Move fdlist to invoke context structure
misc: fastrpc: Introduce context params structure
misc: fastrpc: Add CRC support using invokeV2 request
misc: fastrpc: Add polling mode support for fastRPC driver
misc: fastrpc: Modify context id mask to support polling mode
drivers/misc/fastrpc.c | 348 ++++++++++++++++++++++++++++++------
include/uapi/misc/fastrpc.h | 8 +
2 files changed, 302 insertions(+), 54 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure
2025-01-27 4:42 [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features Ekansh Gupta
@ 2025-01-27 4:42 ` Ekansh Gupta
2025-01-27 15:37 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 2/5] misc: fastrpc: Introduce context params structure Ekansh Gupta
` (3 subsequent siblings)
4 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-27 4:42 UTC (permalink / raw)
To: srinivas.kandagatla, linux-arm-msm
Cc: gregkh, quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
The FD list is part of the meta buffer and is calculated during put
args. Move fdlist to the invoke context structure for better
maintenance and to avoid duplicating the calculation of critical meta
buffer contents used by the fastrpc driver.
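To illustrate the layout this patch relies on, the sketch below models the metadata area in userspace with stand-in types (the names and field sizes are assumptions; the real definitions live in drivers/misc/fastrpc.c). The fdlist begins immediately after the per-argument page records, so a single computation in get args can cache the pointer for reuse in put args:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct fastrpc_phy_page; illustrative only. */
struct phy_page {
	uint64_t addr;
	uint64_t size;
};

#define MAX_FDLIST 16 /* mirrors FASTRPC_MAX_FDLIST in the driver */

/* The fdlist sits immediately after the nscalars page records in the
 * shared meta buffer, as in: ctx->fdlist = (u64 *)(pages + ctx->nscalars); */
static uint64_t *fdlist_start(struct phy_page *pages, int nscalars)
{
	return (uint64_t *)(pages + nscalars);
}
```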
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
---
drivers/misc/fastrpc.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 42b4d60c224c..1a936d462519 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -236,6 +236,7 @@ struct fastrpc_invoke_ctx {
int pid;
int client_id;
u32 sc;
+ u64 *fdlist;
u32 *crc;
u64 ctxid;
u64 msg_sz;
@@ -965,6 +966,7 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
rpra = ctx->buf->virt;
list = fastrpc_invoke_buf_start(rpra, ctx->nscalars);
pages = fastrpc_phy_page_start(list, ctx->nscalars);
+ ctx->fdlist = (u64 *)(pages + ctx->nscalars);
args = (uintptr_t)ctx->buf->virt + metalen;
rlen = pkt_size - metalen;
ctx->rpra = rpra;
@@ -1067,17 +1069,9 @@ static int fastrpc_put_args(struct fastrpc_invoke_ctx *ctx,
union fastrpc_remote_arg *rpra = ctx->rpra;
struct fastrpc_user *fl = ctx->fl;
struct fastrpc_map *mmap = NULL;
- struct fastrpc_invoke_buf *list;
- struct fastrpc_phy_page *pages;
- u64 *fdlist;
- int i, inbufs, outbufs, handles;
+ int i, inbufs;
inbufs = REMOTE_SCALARS_INBUFS(ctx->sc);
- outbufs = REMOTE_SCALARS_OUTBUFS(ctx->sc);
- handles = REMOTE_SCALARS_INHANDLES(ctx->sc) + REMOTE_SCALARS_OUTHANDLES(ctx->sc);
- list = fastrpc_invoke_buf_start(rpra, ctx->nscalars);
- pages = fastrpc_phy_page_start(list, ctx->nscalars);
- fdlist = (uint64_t *)(pages + inbufs + outbufs + handles);
for (i = inbufs; i < ctx->nbufs; ++i) {
if (!ctx->maps[i]) {
@@ -1096,9 +1090,9 @@ static int fastrpc_put_args(struct fastrpc_invoke_ctx *ctx,
/* Clean up fdlist which is updated by DSP */
for (i = 0; i < FASTRPC_MAX_FDLIST; i++) {
- if (!fdlist[i])
+ if (!ctx->fdlist[i])
break;
- if (!fastrpc_map_lookup(fl, (int)fdlist[i], &mmap, false))
+ if (!fastrpc_map_lookup(fl, (int)ctx->fdlist[i], &mmap, false))
fastrpc_map_put(mmap);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 2/5] misc: fastrpc: Introduce context params structure
2025-01-27 4:42 [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure Ekansh Gupta
@ 2025-01-27 4:42 ` Ekansh Gupta
2025-01-27 15:43 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 3/5] misc: fastrpc: Add CRC support using invokeV2 request Ekansh Gupta
` (2 subsequent siblings)
4 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-27 4:42 UTC (permalink / raw)
To: srinivas.kandagatla, linux-arm-msm
Cc: gregkh, quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
Add a structure for invoke context parameters. This structure is
meant to carry invoke-context-specific data and is passed to the
internal invocation call, which saves the data in the context.
Examples of data intended to be part of this structure are the CRC
user memory address, the poll timeout for the invoke call, the call
priority, etc.
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
---
drivers/misc/fastrpc.c | 138 ++++++++++++++++++++++++++++++++++-------
1 file changed, 117 insertions(+), 21 deletions(-)
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 1a936d462519..c29d5536195e 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -254,6 +254,11 @@ struct fastrpc_invoke_ctx {
struct fastrpc_channel_ctx *cctx;
};
+struct fastrpc_ctx_args {
+ struct fastrpc_invoke_args *args;
+ void __user *crc;
+};
+
struct fastrpc_session_ctx {
struct device *dev;
int sid;
@@ -574,7 +579,7 @@ static void fastrpc_get_buff_overlaps(struct fastrpc_invoke_ctx *ctx)
static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
struct fastrpc_user *user, u32 kernel, u32 sc,
- struct fastrpc_invoke_args *args)
+ struct fastrpc_ctx_args *cargs)
{
struct fastrpc_channel_ctx *cctx = user->cctx;
struct fastrpc_invoke_ctx *ctx = NULL;
@@ -605,7 +610,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
kfree(ctx);
return ERR_PTR(-ENOMEM);
}
- ctx->args = args;
+ ctx->args = cargs->args;
fastrpc_get_buff_overlaps(ctx);
}
@@ -1133,7 +1138,7 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
u32 handle, u32 sc,
- struct fastrpc_invoke_args *args)
+ struct fastrpc_ctx_args *cargs)
{
struct fastrpc_invoke_ctx *ctx = NULL;
struct fastrpc_buf *buf, *b;
@@ -1151,7 +1156,7 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
return -EPERM;
}
- ctx = fastrpc_context_alloc(fl, kernel, sc, args);
+ ctx = fastrpc_context_alloc(fl, kernel, sc, cargs);
if (IS_ERR(ctx))
return PTR_ERR(ctx);
@@ -1233,6 +1238,7 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
{
struct fastrpc_init_create_static init;
struct fastrpc_invoke_args *args;
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_phy_page pages[1];
char *name;
int err;
@@ -1307,15 +1313,25 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
args[2].length = sizeof(*pages);
args[2].fd = -1;
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs) {
+ err = -ENOMEM;
+ goto err_invoke;
+ }
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_STATIC, 3, 0);
err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
- sc, args);
- if (err)
+ sc, cargs);
+ if (err) {
+ kfree(cargs);
goto err_invoke;
+ }
kfree(args);
kfree(name);
+ kfree(cargs);
return 0;
err_invoke:
@@ -1351,6 +1367,7 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
{
struct fastrpc_init_create init;
struct fastrpc_invoke_args *args;
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_phy_page pages[1];
struct fastrpc_map *map = NULL;
struct fastrpc_buf *imem = NULL;
@@ -1438,16 +1455,26 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
args[5].length = sizeof(inbuf.siglen);
args[5].fd = -1;
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs) {
+ err = -ENOMEM;
+ goto err_invoke;
+ }
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE, 4, 0);
if (init.attrs)
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_ATTR, 4, 0);
err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
- sc, args);
- if (err)
+ sc, cargs);
+ if (err) {
+ kfree(cargs);
goto err_invoke;
+ }
kfree(args);
+ kfree(cargs);
return 0;
@@ -1498,17 +1525,27 @@ static void fastrpc_session_free(struct fastrpc_channel_ctx *cctx,
static int fastrpc_release_current_dsp_process(struct fastrpc_user *fl)
{
struct fastrpc_invoke_args args[1];
- int client_id = 0;
+ struct fastrpc_ctx_args *cargs;
+ int client_id = 0, err;
u32 sc;
client_id = fl->client_id;
args[0].ptr = (u64)(uintptr_t) &client_id;
args[0].length = sizeof(client_id);
args[0].fd = -1;
+
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_RELEASE, 1, 0);
- return fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
- sc, &args[0]);
+ err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
+ sc, cargs);
+ kfree(cargs);
+
+ return err;
}
static int fastrpc_device_release(struct inode *inode, struct file *file)
@@ -1643,22 +1680,33 @@ static int fastrpc_dmabuf_alloc(struct fastrpc_user *fl, char __user *argp)
static int fastrpc_init_attach(struct fastrpc_user *fl, int pd)
{
struct fastrpc_invoke_args args[1];
- int client_id = fl->client_id;
+ struct fastrpc_ctx_args *cargs;
+ int client_id = fl->client_id, err;
u32 sc;
args[0].ptr = (u64)(uintptr_t) &client_id;
args[0].length = sizeof(client_id);
args[0].fd = -1;
+
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_ATTACH, 1, 0);
fl->pd = pd;
- return fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
- sc, &args[0]);
+ err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
+ sc, cargs);
+ kfree(cargs);
+
+ return err;
}
static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
{
struct fastrpc_invoke_args *args = NULL;
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_invoke inv;
u32 nscalars;
int err;
@@ -1679,9 +1727,16 @@ static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
return -EFAULT;
}
}
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs) {
+ kfree(args);
+ return -ENOMEM;
+ }
- err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, args);
+ cargs->args = args;
+ err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, cargs);
kfree(args);
+ kfree(cargs);
return err;
}
@@ -1690,6 +1745,8 @@ static int fastrpc_get_info_from_dsp(struct fastrpc_user *fl, uint32_t *dsp_attr
uint32_t dsp_attr_buf_len)
{
struct fastrpc_invoke_args args[2] = { 0 };
+ struct fastrpc_ctx_args *cargs;
+ int err;
/*
* Capability filled in userspace. This carries the information
@@ -1706,8 +1763,15 @@ static int fastrpc_get_info_from_dsp(struct fastrpc_user *fl, uint32_t *dsp_attr
args[1].length = dsp_attr_buf_len * sizeof(u32);
args[1].fd = -1;
- return fastrpc_internal_invoke(fl, true, FASTRPC_DSP_UTILITIES_HANDLE,
- FASTRPC_SCALARS(0, 1, 1), args);
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
+ err = fastrpc_internal_invoke(fl, true, FASTRPC_DSP_UTILITIES_HANDLE,
+ FASTRPC_SCALARS(0, 1, 1), cargs);
+ kfree(cargs);
+ return err;
}
static int fastrpc_get_info_from_kernel(struct fastrpc_ioctl_capability *cap,
@@ -1794,6 +1858,7 @@ static int fastrpc_get_dsp_info(struct fastrpc_user *fl, char __user *argp)
static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *buf)
{
struct fastrpc_invoke_args args[1] = { [0] = { 0 } };
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_munmap_req_msg req_msg;
struct device *dev = fl->sctx->dev;
int err;
@@ -1806,9 +1871,14 @@ static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *
args[0].ptr = (u64) (uintptr_t) &req_msg;
args[0].length = sizeof(req_msg);
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MUNMAP, 1, 0);
err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
- &args[0]);
+ cargs);
if (!err) {
dev_dbg(dev, "unmmap\tpt 0x%09lx OK\n", buf->raddr);
spin_lock(&fl->lock);
@@ -1818,6 +1888,7 @@ static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *
} else {
dev_err(dev, "unmmap\tpt 0x%09lx ERROR\n", buf->raddr);
}
+ kfree(cargs);
return err;
}
@@ -1852,6 +1923,7 @@ static int fastrpc_req_munmap(struct fastrpc_user *fl, char __user *argp)
static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
{
struct fastrpc_invoke_args args[3] = { [0 ... 2] = { 0 } };
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_buf *buf = NULL;
struct fastrpc_mmap_req_msg req_msg;
struct fastrpc_mmap_rsp_msg rsp_msg;
@@ -1902,12 +1974,18 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
args[2].ptr = (u64) (uintptr_t) &rsp_msg;
args[2].length = sizeof(rsp_msg);
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MMAP, 2, 1);
err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
- &args[0]);
+ cargs);
if (err) {
dev_err(dev, "mmap error (len 0x%08llx)\n", buf->size);
fastrpc_buf_free(buf);
+ kfree(cargs);
return err;
}
@@ -1942,17 +2020,20 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
dev_dbg(dev, "mmap\t\tpt 0x%09lx OK [len 0x%08llx]\n",
buf->raddr, buf->size);
+ kfree(cargs);
return 0;
err_assign:
fastrpc_req_munmap_impl(fl, buf);
+ kfree(cargs);
return err;
}
static int fastrpc_req_mem_unmap_impl(struct fastrpc_user *fl, struct fastrpc_mem_unmap *req)
{
struct fastrpc_invoke_args args[1] = { [0] = { 0 } };
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_map *map = NULL, *iter, *m;
struct fastrpc_mem_unmap_req_msg req_msg = { 0 };
int err = 0;
@@ -1982,14 +2063,21 @@ static int fastrpc_req_mem_unmap_impl(struct fastrpc_user *fl, struct fastrpc_me
args[0].ptr = (u64) (uintptr_t) &req_msg;
args[0].length = sizeof(req_msg);
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_UNMAP, 1, 0);
err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
- &args[0]);
+ cargs);
if (err) {
dev_err(dev, "unmmap\tpt fd = %d, 0x%09llx error\n", map->fd, map->raddr);
+ kfree(cargs);
return err;
}
fastrpc_map_put(map);
+ kfree(cargs);
return 0;
}
@@ -2007,6 +2095,7 @@ static int fastrpc_req_mem_unmap(struct fastrpc_user *fl, char __user *argp)
static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
{
struct fastrpc_invoke_args args[4] = { [0 ... 3] = { 0 } };
+ struct fastrpc_ctx_args *cargs;
struct fastrpc_mem_map_req_msg req_msg = { 0 };
struct fastrpc_mmap_rsp_msg rsp_msg = { 0 };
struct fastrpc_mem_unmap req_unmap = { 0 };
@@ -2051,8 +2140,13 @@ static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
args[3].ptr = (u64) (uintptr_t) &rsp_msg;
args[3].length = sizeof(rsp_msg);
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->args = args;
sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_MAP, 3, 1);
- err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc, &args[0]);
+ err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc, cargs);
if (err) {
dev_err(dev, "mem mmap error, fd %d, vaddr %llx, size %lld\n",
req.fd, req.vaddrin, map->size);
@@ -2072,11 +2166,13 @@ static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
fastrpc_req_mem_unmap_impl(fl, &req_unmap);
return -EFAULT;
}
+ kfree(cargs);
return 0;
err_invoke:
fastrpc_map_put(map);
+ kfree(cargs);
return err;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 3/5] misc: fastrpc: Add CRC support using invokeV2 request
2025-01-27 4:42 [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 2/5] misc: fastrpc: Introduce context params structure Ekansh Gupta
@ 2025-01-27 4:42 ` Ekansh Gupta
2025-01-29 10:41 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode Ekansh Gupta
4 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-27 4:42 UTC (permalink / raw)
To: srinivas.kandagatla, linux-arm-msm
Cc: gregkh, quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
The invokeV2 request is intended to support multiple enhanced invoke
requests, such as CRC checks, performance counter enablement and
polling mode for RPC invocations. CRC checking is enabled as part of
this patch. CRC checks on the input and output arguments help ensure
data consistency over a remote call. If the user intends to enable
CRC checking, local CRCs are first calculated at the user end and a
CRC buffer is passed to the DSP to capture the remote CRC values. The
DSP is expected to write to the remote CRC buffer, which is then
compared at the user level against the local CRC values.
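A rough sketch of the userspace side of this flow, under the assumption that the client keeps its locally computed CRCs in an array and reads the DSP-written values back from the buffer it passed via inv2.crc (the helper below is illustrative, not part of the UAPI):

```c
#include <stdint.h>

#define MAX_CRCLIST 64 /* mirrors FASTRPC_MAX_CRCLIST in the driver */

/* After the FASTRPC_IOCTL_INVOKEV2 call returns, compare the CRCs
 * computed locally for each argument against the values the DSP wrote
 * into the user CRC buffer; any mismatch signals data corruption
 * across the remote call. */
static int crc_lists_match(const uint32_t *local, const uint32_t *remote,
			   int nargs)
{
	for (int i = 0; i < nargs && i < MAX_CRCLIST; i++) {
		if (local[i] != remote[i])
			return 0;
	}
	return 1;
}
```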
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
---
drivers/misc/fastrpc.c | 78 ++++++++++++++++++++++++++++++-------
include/uapi/misc/fastrpc.h | 7 ++++
2 files changed, 70 insertions(+), 15 deletions(-)
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index c29d5536195e..cfacee0dded5 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -237,7 +237,8 @@ struct fastrpc_invoke_ctx {
int client_id;
u32 sc;
u64 *fdlist;
- u32 *crc;
+ u32 *crclist;
+ void __user *crc;
u64 ctxid;
u64 msg_sz;
struct kref refcount;
@@ -617,6 +618,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
/* Released in fastrpc_context_put() */
fastrpc_channel_ctx_get(cctx);
+ ctx->crc = cargs->crc;
ctx->sc = sc;
ctx->retval = -1;
ctx->pid = current->pid;
@@ -972,6 +974,7 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
list = fastrpc_invoke_buf_start(rpra, ctx->nscalars);
pages = fastrpc_phy_page_start(list, ctx->nscalars);
ctx->fdlist = (u64 *)(pages + ctx->nscalars);
+ ctx->crclist = (u32 *)(ctx->fdlist + FASTRPC_MAX_FDLIST);
args = (uintptr_t)ctx->buf->virt + metalen;
rlen = pkt_size - metalen;
ctx->rpra = rpra;
@@ -1101,6 +1104,12 @@ static int fastrpc_put_args(struct fastrpc_invoke_ctx *ctx,
fastrpc_map_put(mmap);
}
+ if (ctx->crc && ctx->crclist && rpra) {
+ if (copy_to_user(ctx->crc, ctx->crclist,
+ FASTRPC_MAX_CRCLIST * sizeof(u32)))
+ return -EFAULT;
+ }
+
return 0;
}
@@ -1703,39 +1712,75 @@ static int fastrpc_init_attach(struct fastrpc_user *fl, int pd)
return err;
}
-static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
+static int fastrpc_remote_invoke(struct fastrpc_user *fl, struct fastrpc_invoke *inv,
+ struct fastrpc_ctx_args *cargs)
{
- struct fastrpc_invoke_args *args = NULL;
- struct fastrpc_ctx_args *cargs;
- struct fastrpc_invoke inv;
+ struct fastrpc_invoke_args *args;
u32 nscalars;
int err;
- if (copy_from_user(&inv, argp, sizeof(inv)))
- return -EFAULT;
-
/* nscalars is truncated here to max supported value */
- nscalars = REMOTE_SCALARS_LENGTH(inv.sc);
+ nscalars = REMOTE_SCALARS_LENGTH(inv->sc);
if (nscalars) {
args = kcalloc(nscalars, sizeof(*args), GFP_KERNEL);
if (!args)
return -ENOMEM;
- if (copy_from_user(args, (void __user *)(uintptr_t)inv.args,
+ if (copy_from_user(args, (void __user *)(uintptr_t)inv->args,
nscalars * sizeof(*args))) {
kfree(args);
return -EFAULT;
}
}
+
+ cargs->args = args;
+ err = fastrpc_internal_invoke(fl, false, inv->handle, inv->sc, cargs);
+ kfree(args);
+
+ return err;
+}
+
+static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
+{
+ struct fastrpc_ctx_args *cargs;
+ struct fastrpc_invoke inv;
+ int err;
+
+ if (copy_from_user(&inv, argp, sizeof(inv)))
+ return -EFAULT;
+
cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
- if (!cargs) {
- kfree(args);
+ if (!cargs)
return -ENOMEM;
+
+ err = fastrpc_remote_invoke(fl, &inv, cargs);
+ kfree(cargs);
+
+ return err;
+}
+
+static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
+{
+ struct fastrpc_ctx_args *cargs;
+ struct fastrpc_invoke_v2 inv2;
+ int i, err;
+
+ if (copy_from_user(&inv2, argp, sizeof(inv2)))
+ return -EFAULT;
+
+ /* Check if all reserved fields are zero */
+ for (i = 0; i < 16; i++) {
+ if (inv2.reserved[i] != 0)
+ return -EINVAL;
}
- cargs->args = args;
- err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, cargs);
- kfree(args);
+ cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
+ if (!cargs)
+ return -ENOMEM;
+
+ cargs->crc = (void __user *)(uintptr_t)inv2.crc;
+
+ err = fastrpc_remote_invoke(fl, &inv2.inv, cargs);
kfree(cargs);
return err;
@@ -2188,6 +2233,9 @@ static long fastrpc_device_ioctl(struct file *file, unsigned int cmd,
case FASTRPC_IOCTL_INVOKE:
err = fastrpc_invoke(fl, argp);
break;
+ case FASTRPC_IOCTL_INVOKEV2:
+ err = fastrpc_invokev2(fl, argp);
+ break;
case FASTRPC_IOCTL_INIT_ATTACH:
err = fastrpc_init_attach(fl, ROOT_PD);
break;
diff --git a/include/uapi/misc/fastrpc.h b/include/uapi/misc/fastrpc.h
index f33d914d8f46..406b80555d41 100644
--- a/include/uapi/misc/fastrpc.h
+++ b/include/uapi/misc/fastrpc.h
@@ -17,6 +17,7 @@
#define FASTRPC_IOCTL_MEM_MAP _IOWR('R', 10, struct fastrpc_mem_map)
#define FASTRPC_IOCTL_MEM_UNMAP _IOWR('R', 11, struct fastrpc_mem_unmap)
#define FASTRPC_IOCTL_GET_DSP_INFO _IOWR('R', 13, struct fastrpc_ioctl_capability)
+#define FASTRPC_IOCTL_INVOKEV2 _IOWR('R', 14, struct fastrpc_invoke_v2)
/**
* enum fastrpc_map_flags - control flags for mapping memory on DSP user process
@@ -80,6 +81,12 @@ struct fastrpc_invoke {
__u64 args;
};
+struct fastrpc_invoke_v2 {
+ struct fastrpc_invoke inv;
+ __u64 crc;
+ __u32 reserved[16];
+};
+
struct fastrpc_init_create {
__u32 filelen; /* elf file length */
__s32 filefd; /* fd for the file */
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-01-27 4:42 [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features Ekansh Gupta
` (2 preceding siblings ...)
2025-01-27 4:42 ` [PATCH v2 3/5] misc: fastrpc: Add CRC support using invokeV2 request Ekansh Gupta
@ 2025-01-27 4:42 ` Ekansh Gupta
2025-01-28 23:29 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode Ekansh Gupta
4 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-27 4:42 UTC (permalink / raw)
To: srinivas.kandagatla, linux-arm-msm
Cc: gregkh, quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
For any remote call to the DSP, after sending an invocation message,
the fastRPC driver waits for a glink response, and during this time
the CPU can go into low power modes. Add polling mode support, with
which the fastRPC driver polls continuously on a memory location
after sending a message to the remote subsystem. This eliminates CPU
wakeup and scheduling latencies and reduces fastRPC overhead. With
this change, the DSP always sends a glink response, which is ignored
if polling mode did not time out.
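The core of the new wait path can be modeled as below; this is a simplified, userspace-runnable sketch of the patch's poll_for_remote_response(), with the udelay(1) pacing and the periodic dma_rmb() elided, not the driver code itself:

```c
#include <stdint.h>

#define POLL_RESPONSE 0xdecaf /* FASTRPC_POLL_RESPONSE in the patch */

/* Spin on the poll word for up to `timeout` iterations (1 us each in
 * the driver); return 0 once the DSP writes the completion marker, or
 * -1 on timeout so the caller can fall back to the glink wait. */
static int poll_for_response(volatile uint32_t *poll, uint64_t timeout)
{
	for (uint64_t i = 0; i < timeout; i++) {
		if (*poll == POLL_RESPONSE)
			return 0;
		/* driver: udelay(1), with dma_rmb() every 500 iterations */
	}
	return -1;
}
```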
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
---
drivers/misc/fastrpc.c | 122 +++++++++++++++++++++++++++++++++---
include/uapi/misc/fastrpc.h | 3 +-
2 files changed, 114 insertions(+), 11 deletions(-)
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index cfacee0dded5..257a741af115 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -19,6 +19,7 @@
#include <linux/rpmsg.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
+#include <linux/delay.h>
#include <linux/firmware/qcom/qcom_scm.h>
#include <uapi/misc/fastrpc.h>
#include <linux/of_reserved_mem.h>
@@ -38,6 +39,7 @@
#define FASTRPC_CTX_MAX (256)
#define FASTRPC_INIT_HANDLE 1
#define FASTRPC_DSP_UTILITIES_HANDLE 2
+#define FASTRPC_MAX_STATIC_HANDLE (20)
#define FASTRPC_CTXID_MASK (0xFF0)
#define INIT_FILELEN_MAX (2 * 1024 * 1024)
#define INIT_FILE_NAMELEN_MAX (128)
@@ -106,6 +108,19 @@
#define miscdev_to_fdevice(d) container_of(d, struct fastrpc_device, miscdev)
+/* Poll response number from remote processor for call completion */
+#define FASTRPC_POLL_RESPONSE (0xdecaf)
+/* timeout in us for polling until memory barrier */
+#define FASTRPC_POLL_TIME_MEM_UPDATE (500)
+
+/* Response types supported for RPC calls */
+enum fastrpc_response_flags {
+ /* normal job completion glink response */
+ NORMAL_RESPONSE = 0,
+ /* process updates poll memory instead of glink response */
+ POLL_MODE = 1,
+};
+
static const char *domains[FASTRPC_DEV_MAX] = { "adsp", "mdsp",
"sdsp", "cdsp", "cdsp1" };
struct fastrpc_phy_page {
@@ -238,9 +253,16 @@ struct fastrpc_invoke_ctx {
u32 sc;
u64 *fdlist;
u32 *crclist;
+ u32 *poll;
void __user *crc;
u64 ctxid;
u64 msg_sz;
+ /* Threads poll for specified timeout and fall back to glink wait */
+ u64 poll_timeout;
+ /* work done status flag */
+ bool is_work_done;
+ /* response flags from remote processor */
+ enum fastrpc_response_flags rsp_flags;
struct kref refcount;
struct list_head node; /* list of ctxs */
struct completion work;
@@ -258,6 +280,7 @@ struct fastrpc_invoke_ctx {
struct fastrpc_ctx_args {
struct fastrpc_invoke_args *args;
void __user *crc;
+ u64 poll_timeout;
};
struct fastrpc_session_ctx {
@@ -619,11 +642,14 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
fastrpc_channel_ctx_get(cctx);
ctx->crc = cargs->crc;
+ ctx->poll_timeout = cargs->poll_timeout;
ctx->sc = sc;
ctx->retval = -1;
ctx->pid = current->pid;
ctx->client_id = user->client_id;
ctx->cctx = cctx;
+ ctx->rsp_flags = NORMAL_RESPONSE;
+ ctx->is_work_done = false;
init_completion(&ctx->work);
INIT_WORK(&ctx->put_work, fastrpc_context_put_wq);
@@ -882,7 +908,8 @@ static int fastrpc_get_meta_size(struct fastrpc_invoke_ctx *ctx)
sizeof(struct fastrpc_invoke_buf) +
sizeof(struct fastrpc_phy_page)) * ctx->nscalars +
sizeof(u64) * FASTRPC_MAX_FDLIST +
- sizeof(u32) * FASTRPC_MAX_CRCLIST;
+ sizeof(u32) * FASTRPC_MAX_CRCLIST +
+ sizeof(u32);
return size;
}
@@ -975,6 +1002,8 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
pages = fastrpc_phy_page_start(list, ctx->nscalars);
ctx->fdlist = (u64 *)(pages + ctx->nscalars);
ctx->crclist = (u32 *)(ctx->fdlist + FASTRPC_MAX_FDLIST);
+ ctx->poll = (u32 *)(ctx->crclist + FASTRPC_MAX_CRCLIST);
+
args = (uintptr_t)ctx->buf->virt + metalen;
rlen = pkt_size - metalen;
ctx->rpra = rpra;
@@ -1145,6 +1174,72 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
}
+static int poll_for_remote_response(struct fastrpc_invoke_ctx *ctx, u64 timeout)
+{
+ int err = -EIO, i, j;
+
+ /* poll on memory for DSP response. Return failure on timeout */
+ for (i = 0, j = 0; i < timeout; i++, j++) {
+ if (*ctx->poll == FASTRPC_POLL_RESPONSE) {
+ err = 0;
+ ctx->is_work_done = true;
+ ctx->retval = 0;
+ break;
+ }
+ if (j == FASTRPC_POLL_TIME_MEM_UPDATE) {
+ /* make sure that all poll memory writes by DSP are seen by CPU */
+ dma_rmb();
+ j = 0;
+ }
+ udelay(1);
+ }
+ return err;
+}
+
+static inline int fastrpc_wait_for_response(struct fastrpc_invoke_ctx *ctx,
+ u32 kernel)
+{
+ int err = 0;
+
+ if (kernel) {
+ if (!wait_for_completion_timeout(&ctx->work, 10 * HZ))
+ err = -ETIMEDOUT;
+ } else {
+ err = wait_for_completion_interruptible(&ctx->work);
+ }
+
+ return err;
+}
+
+static int fastrpc_wait_for_completion(struct fastrpc_invoke_ctx *ctx,
+ u32 kernel)
+{
+ int err;
+
+ do {
+ switch (ctx->rsp_flags) {
+ case NORMAL_RESPONSE:
+ err = fastrpc_wait_for_response(ctx, kernel);
+ if (err || ctx->is_work_done)
+ return err;
+ break;
+ case POLL_MODE:
+ err = poll_for_remote_response(ctx, ctx->poll_timeout);
+ /* If polling timed out, move to normal response mode */
+ if (err)
+ ctx->rsp_flags = NORMAL_RESPONSE;
+ break;
+ default:
+ err = -EBADR;
+ dev_dbg(ctx->fl->sctx->dev,
+ "unsupported response type:0x%x\n", ctx->rsp_flags);
+ break;
+ }
+ } while (!ctx->is_work_done);
+
+ return err;
+}
+
static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
u32 handle, u32 sc,
struct fastrpc_ctx_args *cargs)
@@ -1180,16 +1275,20 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
if (err)
goto bail;
- if (kernel) {
- if (!wait_for_completion_timeout(&ctx->work, 10 * HZ))
- err = -ETIMEDOUT;
- } else {
- err = wait_for_completion_interruptible(&ctx->work);
- }
+ if (ctx->poll_timeout != 0 && handle > FASTRPC_MAX_STATIC_HANDLE
+ && fl->pd == USER_PD)
+ ctx->rsp_flags = POLL_MODE;
+ err = fastrpc_wait_for_completion(ctx, kernel);
if (err)
goto bail;
+ if (!ctx->is_work_done) {
+ err = -ETIMEDOUT;
+ dev_dbg(fl->sctx->dev, "Invalid workdone state for handle 0x%x, sc 0x%x\n",
+ handle, sc);
+ goto bail;
+ }
/* make sure that all memory writes by DSP are seen by CPU */
dma_rmb();
/* populate all the output buffers with results */
@@ -1769,7 +1868,7 @@ static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
return -EFAULT;
/* Check if all reserved fields are zero */
- for (i = 0; i < 16; i++) {
+ for (i = 0; i < 14; i++) {
if (inv2.reserved[i] != 0)
return -EINVAL;
}
@@ -1779,6 +1878,7 @@ static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
return -ENOMEM;
cargs->crc = (void __user *)(uintptr_t)inv2.crc;
+ cargs->poll_timeout = inv2.poll_timeout;
err = fastrpc_remote_invoke(fl, &inv2.inv, cargs);
kfree(cargs);
@@ -2581,12 +2681,14 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
ctx = idr_find(&cctx->ctx_idr, ctxid);
spin_unlock_irqrestore(&cctx->lock, flags);
+ /* Ignore this failure as context returned will be NULL for polling mode */
if (!ctx) {
- dev_err(&rpdev->dev, "No context ID matches response\n");
- return -ENOENT;
+ dev_dbg(&rpdev->dev, "No context ID matches response\n");
+ return 0;
}
ctx->retval = rsp->retval;
+ ctx->is_work_done = true;
complete(&ctx->work);
/*
diff --git a/include/uapi/misc/fastrpc.h b/include/uapi/misc/fastrpc.h
index 406b80555d41..1920c537bbbf 100644
--- a/include/uapi/misc/fastrpc.h
+++ b/include/uapi/misc/fastrpc.h
@@ -84,7 +84,8 @@ struct fastrpc_invoke {
struct fastrpc_invoke_v2 {
struct fastrpc_invoke inv;
__u64 crc;
- __u32 reserved[16];
+ __u64 poll_timeout;
+ __u32 reserved[14];
};
struct fastrpc_init_create {
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode
2025-01-27 4:42 [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features Ekansh Gupta
` (3 preceding siblings ...)
2025-01-27 4:42 ` [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver Ekansh Gupta
@ 2025-01-27 4:42 ` Ekansh Gupta
2025-01-28 23:30 ` Dmitry Baryshkov
4 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-27 4:42 UTC (permalink / raw)
To: srinivas.kandagatla, linux-arm-msm
Cc: gregkh, quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
The DSP needs the last 4 bits of the context id to be 0 for polling
mode to be supported: setting the last 8 bits is intended for async
mode (not yet supported in the upstream driver), and setting these
bits restricts the DSP from writing to the poll memory. Modify the
context id mask to ensure polling mode is supported.
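The resulting bit layout can be sketched as a standalone model (not driver code): with the widened shift, the IDR id occupies bits 16-23 of the context id, so the low 16 bits, including the poll and async signalling bits reserved by the DSP, are always zero.

```c
#include <stdint.h>

#define CTXID_MASK 0xFF0000 /* new FASTRPC_CTXID_MASK */

/* Context id construction and recovery with the widened shift. */
static uint64_t make_ctxid(int idr_id)
{
	return (uint64_t)idr_id << 16;
}

static int recover_idr_id(uint64_t ctx)
{
	return (ctx & CTXID_MASK) >> 16;
}
```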
Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
---
drivers/misc/fastrpc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
index 257a741af115..ef56c793c564 100644
--- a/drivers/misc/fastrpc.c
+++ b/drivers/misc/fastrpc.c
@@ -40,7 +40,7 @@
#define FASTRPC_INIT_HANDLE 1
#define FASTRPC_DSP_UTILITIES_HANDLE 2
#define FASTRPC_MAX_STATIC_HANDLE (20)
-#define FASTRPC_CTXID_MASK (0xFF0)
+#define FASTRPC_CTXID_MASK (0xFF0000)
#define INIT_FILELEN_MAX (2 * 1024 * 1024)
#define INIT_FILE_NAMELEN_MAX (128)
#define FASTRPC_DEVICE_NAME "fastrpc"
@@ -524,7 +524,7 @@ static void fastrpc_context_free(struct kref *ref)
fastrpc_buf_free(ctx->buf);
spin_lock_irqsave(&cctx->lock, flags);
- idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
+ idr_remove(&cctx->ctx_idr, ctx->ctxid >> 16);
spin_unlock_irqrestore(&cctx->lock, flags);
kfree(ctx->maps);
@@ -664,7 +664,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
spin_unlock_irqrestore(&cctx->lock, flags);
goto err_idr;
}
- ctx->ctxid = ret << 4;
+ ctx->ctxid = ret << 16;
spin_unlock_irqrestore(&cctx->lock, flags);
kref_init(&ctx->refcount);
@@ -2675,7 +2675,7 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
if (len < sizeof(*rsp))
return -EINVAL;
- ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 4);
+ ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 16);
spin_lock_irqsave(&cctx->lock, flags);
ctx = idr_find(&cctx->ctx_idr, ctxid);
--
2.34.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure
2025-01-27 4:42 ` [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure Ekansh Gupta
@ 2025-01-27 15:37 ` Dmitry Baryshkov
0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-27 15:37 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Mon, Jan 27, 2025 at 10:12:35AM +0530, Ekansh Gupta wrote:
> The FD list is part of the meta buffer, which is calculated during put
> args. Move fdlist to the invoke context structure for better
> maintenance and to avoid code duplication when calculating critical
> meta buffer contents that are used by the fastrpc driver.
>
> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> ---
> drivers/misc/fastrpc.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/5] misc: fastrpc: Introduce context params structure
2025-01-27 4:42 ` [PATCH v2 2/5] misc: fastrpc: Introduce context params structure Ekansh Gupta
@ 2025-01-27 15:43 ` Dmitry Baryshkov
2025-01-29 4:57 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-27 15:43 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Mon, Jan 27, 2025 at 10:12:36AM +0530, Ekansh Gupta wrote:
> Add structure to invoke context parameterms. This structure is meant
Nit: for invoke context parameters.
> to carry invoke-context-specific data. This structure is passed to the
> internal invocation call to save the data in the context. Examples of
> data intended to be part of this structure are: CRC user memory
> address, poll timeout for the invoke call, call priority, etc.
>
> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> ---
> drivers/misc/fastrpc.c | 138 ++++++++++++++++++++++++++++++++++-------
> 1 file changed, 117 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index 1a936d462519..c29d5536195e 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
> @@ -254,6 +254,11 @@ struct fastrpc_invoke_ctx {
> struct fastrpc_channel_ctx *cctx;
> };
>
> +struct fastrpc_ctx_args {
> + struct fastrpc_invoke_args *args;
> + void __user *crc;
> +};
Why do we need a separate struct? Can we use fastrpc_invoke_ctx instead?
> +
> struct fastrpc_session_ctx {
> struct device *dev;
> int sid;
> @@ -574,7 +579,7 @@ static void fastrpc_get_buff_overlaps(struct fastrpc_invoke_ctx *ctx)
>
> static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
> struct fastrpc_user *user, u32 kernel, u32 sc,
> - struct fastrpc_invoke_args *args)
> + struct fastrpc_ctx_args *cargs)
> {
> struct fastrpc_channel_ctx *cctx = user->cctx;
> struct fastrpc_invoke_ctx *ctx = NULL;
> @@ -605,7 +610,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
> kfree(ctx);
> return ERR_PTR(-ENOMEM);
> }
> - ctx->args = args;
> + ctx->args = cargs->args;
> fastrpc_get_buff_overlaps(ctx);
> }
>
> @@ -1133,7 +1138,7 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>
> static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
> u32 handle, u32 sc,
> - struct fastrpc_invoke_args *args)
> + struct fastrpc_ctx_args *cargs)
> {
> struct fastrpc_invoke_ctx *ctx = NULL;
> struct fastrpc_buf *buf, *b;
> @@ -1151,7 +1156,7 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
> return -EPERM;
> }
>
> - ctx = fastrpc_context_alloc(fl, kernel, sc, args);
> + ctx = fastrpc_context_alloc(fl, kernel, sc, cargs);
> if (IS_ERR(ctx))
> return PTR_ERR(ctx);
>
> @@ -1233,6 +1238,7 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
> {
> struct fastrpc_init_create_static init;
> struct fastrpc_invoke_args *args;
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_phy_page pages[1];
> char *name;
> int err;
> @@ -1307,15 +1313,25 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
> args[2].length = sizeof(*pages);
> args[2].fd = -1;
>
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs) {
> + err = -ENOMEM;
> + goto err_invoke;
> + }
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_STATIC, 3, 0);
>
> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
> - sc, args);
> - if (err)
> + sc, cargs);
> + if (err) {
> + kfree(cargs);
No, this should be a part of the error path.
> goto err_invoke;
> + }
>
> kfree(args);
> kfree(name);
> + kfree(cargs);
>
> return 0;
> err_invoke:
> @@ -1351,6 +1367,7 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
> {
> struct fastrpc_init_create init;
> struct fastrpc_invoke_args *args;
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_phy_page pages[1];
> struct fastrpc_map *map = NULL;
> struct fastrpc_buf *imem = NULL;
> @@ -1438,16 +1455,26 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
> args[5].length = sizeof(inbuf.siglen);
> args[5].fd = -1;
>
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs) {
> + err = -ENOMEM;
> + goto err_invoke;
> + }
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE, 4, 0);
> if (init.attrs)
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_ATTR, 4, 0);
>
> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
> - sc, args);
> - if (err)
> + sc, cargs);
> + if (err) {
> + kfree(cargs);
Likewise.
> goto err_invoke;
> + }
>
> kfree(args);
> + kfree(cargs);
>
> return 0;
>
> @@ -1498,17 +1525,27 @@ static void fastrpc_session_free(struct fastrpc_channel_ctx *cctx,
> static int fastrpc_release_current_dsp_process(struct fastrpc_user *fl)
> {
> struct fastrpc_invoke_args args[1];
> - int client_id = 0;
> + struct fastrpc_ctx_args *cargs;
> + int client_id = 0, err;
> u32 sc;
>
> client_id = fl->client_id;
> args[0].ptr = (u64)(uintptr_t) &client_id;
> args[0].length = sizeof(client_id);
> args[0].fd = -1;
> +
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_RELEASE, 1, 0);
>
> - return fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
> - sc, &args[0]);
> + err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
> + sc, cargs);
> + kfree(cargs);
> +
> + return err;
> }
>
> static int fastrpc_device_release(struct inode *inode, struct file *file)
> @@ -1643,22 +1680,33 @@ static int fastrpc_dmabuf_alloc(struct fastrpc_user *fl, char __user *argp)
> static int fastrpc_init_attach(struct fastrpc_user *fl, int pd)
> {
> struct fastrpc_invoke_args args[1];
> - int client_id = fl->client_id;
> + struct fastrpc_ctx_args *cargs;
> + int client_id = fl->client_id, err;
> u32 sc;
>
> args[0].ptr = (u64)(uintptr_t) &client_id;
> args[0].length = sizeof(client_id);
> args[0].fd = -1;
> +
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_ATTACH, 1, 0);
> fl->pd = pd;
>
> - return fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
> - sc, &args[0]);
> + err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
> + sc, cargs);
> + kfree(cargs);
> +
> + return err;
> }
>
> static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
> {
> struct fastrpc_invoke_args *args = NULL;
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_invoke inv;
> u32 nscalars;
> int err;
> @@ -1679,9 +1727,16 @@ static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
> return -EFAULT;
> }
> }
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs) {
> + kfree(args);
> + return -ENOMEM;
> + }
>
> - err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, args);
> + cargs->args = args;
> + err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, cargs);
> kfree(args);
> + kfree(cargs);
>
> return err;
> }
> @@ -1690,6 +1745,8 @@ static int fastrpc_get_info_from_dsp(struct fastrpc_user *fl, uint32_t *dsp_attr
> uint32_t dsp_attr_buf_len)
> {
> struct fastrpc_invoke_args args[2] = { 0 };
> + struct fastrpc_ctx_args *cargs;
> + int err;
>
> /*
> * Capability filled in userspace. This carries the information
> @@ -1706,8 +1763,15 @@ static int fastrpc_get_info_from_dsp(struct fastrpc_user *fl, uint32_t *dsp_attr
> args[1].length = dsp_attr_buf_len * sizeof(u32);
> args[1].fd = -1;
>
> - return fastrpc_internal_invoke(fl, true, FASTRPC_DSP_UTILITIES_HANDLE,
> - FASTRPC_SCALARS(0, 1, 1), args);
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> + err = fastrpc_internal_invoke(fl, true, FASTRPC_DSP_UTILITIES_HANDLE,
> + FASTRPC_SCALARS(0, 1, 1), cargs);
> + kfree(cargs);
> + return err;
> }
>
> static int fastrpc_get_info_from_kernel(struct fastrpc_ioctl_capability *cap,
> @@ -1794,6 +1858,7 @@ static int fastrpc_get_dsp_info(struct fastrpc_user *fl, char __user *argp)
> static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *buf)
> {
> struct fastrpc_invoke_args args[1] = { [0] = { 0 } };
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_munmap_req_msg req_msg;
> struct device *dev = fl->sctx->dev;
> int err;
> @@ -1806,9 +1871,14 @@ static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *
> args[0].ptr = (u64) (uintptr_t) &req_msg;
> args[0].length = sizeof(req_msg);
>
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MUNMAP, 1, 0);
> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
> - &args[0]);
> + cargs);
> if (!err) {
> dev_dbg(dev, "unmmap\tpt 0x%09lx OK\n", buf->raddr);
> spin_lock(&fl->lock);
> @@ -1818,6 +1888,7 @@ static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *
> } else {
> dev_err(dev, "unmmap\tpt 0x%09lx ERROR\n", buf->raddr);
> }
> + kfree(cargs);
>
> return err;
> }
> @@ -1852,6 +1923,7 @@ static int fastrpc_req_munmap(struct fastrpc_user *fl, char __user *argp)
> static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
> {
> struct fastrpc_invoke_args args[3] = { [0 ... 2] = { 0 } };
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_buf *buf = NULL;
> struct fastrpc_mmap_req_msg req_msg;
> struct fastrpc_mmap_rsp_msg rsp_msg;
> @@ -1902,12 +1974,18 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
> args[2].ptr = (u64) (uintptr_t) &rsp_msg;
> args[2].length = sizeof(rsp_msg);
>
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MMAP, 2, 1);
> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
> - &args[0]);
> + cargs);
> if (err) {
> dev_err(dev, "mmap error (len 0x%08llx)\n", buf->size);
> fastrpc_buf_free(buf);
> + kfree(cargs);
> return err;
> }
>
> @@ -1942,17 +2020,20 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
> dev_dbg(dev, "mmap\t\tpt 0x%09lx OK [len 0x%08llx]\n",
> buf->raddr, buf->size);
>
> + kfree(cargs);
> return 0;
>
> err_assign:
> fastrpc_req_munmap_impl(fl, buf);
>
> + kfree(cargs);
> return err;
> }
>
> static int fastrpc_req_mem_unmap_impl(struct fastrpc_user *fl, struct fastrpc_mem_unmap *req)
> {
> struct fastrpc_invoke_args args[1] = { [0] = { 0 } };
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_map *map = NULL, *iter, *m;
> struct fastrpc_mem_unmap_req_msg req_msg = { 0 };
> int err = 0;
> @@ -1982,14 +2063,21 @@ static int fastrpc_req_mem_unmap_impl(struct fastrpc_user *fl, struct fastrpc_me
> args[0].ptr = (u64) (uintptr_t) &req_msg;
> args[0].length = sizeof(req_msg);
>
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_UNMAP, 1, 0);
> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
> - &args[0]);
> + cargs);
> if (err) {
> dev_err(dev, "unmmap\tpt fd = %d, 0x%09llx error\n", map->fd, map->raddr);
> + kfree(cargs);
> return err;
> }
> fastrpc_map_put(map);
> + kfree(cargs);
>
> return 0;
> }
> @@ -2007,6 +2095,7 @@ static int fastrpc_req_mem_unmap(struct fastrpc_user *fl, char __user *argp)
> static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
> {
> struct fastrpc_invoke_args args[4] = { [0 ... 3] = { 0 } };
> + struct fastrpc_ctx_args *cargs;
> struct fastrpc_mem_map_req_msg req_msg = { 0 };
> struct fastrpc_mmap_rsp_msg rsp_msg = { 0 };
> struct fastrpc_mem_unmap req_unmap = { 0 };
> @@ -2051,8 +2140,13 @@ static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
> args[3].ptr = (u64) (uintptr_t) &rsp_msg;
> args[3].length = sizeof(rsp_msg);
>
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->args = args;
> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_MAP, 3, 1);
> - err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc, &args[0]);
> + err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc, cargs);
> if (err) {
> dev_err(dev, "mem mmap error, fd %d, vaddr %llx, size %lld\n",
> req.fd, req.vaddrin, map->size);
> @@ -2072,11 +2166,13 @@ static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
> fastrpc_req_mem_unmap_impl(fl, &req_unmap);
> return -EFAULT;
> }
> + kfree(cargs);
>
> return 0;
>
> err_invoke:
> fastrpc_map_put(map);
> + kfree(cargs);
>
> return err;
> }
> --
> 2.34.1
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-01-27 4:42 ` [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver Ekansh Gupta
@ 2025-01-28 23:29 ` Dmitry Baryshkov
2025-01-29 5:42 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-28 23:29 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
> For any remote call to the DSP, after sending an invocation message,
> the fastRPC driver waits for a glink response, and during this time
> the CPU can go into low-power modes. Add polling mode support, with
> which the fastRPC driver will poll continuously on a memory location
> after sending a message to the remote subsystem; this eliminates CPU
> wakeup and scheduling latencies and reduces fastRPC overhead. With
> this change, the DSP always sends a glink response, which is ignored
> if polling mode did not time out.
Is there a chance to implement an actual async I/O protocol with the
help of the poll() call, instead of hiding the polling / wait inside
invoke2?
>
> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> ---
> drivers/misc/fastrpc.c | 122 +++++++++++++++++++++++++++++++++---
> include/uapi/misc/fastrpc.h | 3 +-
> 2 files changed, 114 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index cfacee0dded5..257a741af115 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
> @@ -19,6 +19,7 @@
> #include <linux/rpmsg.h>
> #include <linux/scatterlist.h>
> #include <linux/slab.h>
> +#include <linux/delay.h>
> #include <linux/firmware/qcom/qcom_scm.h>
> #include <uapi/misc/fastrpc.h>
> #include <linux/of_reserved_mem.h>
> @@ -38,6 +39,7 @@
> #define FASTRPC_CTX_MAX (256)
> #define FASTRPC_INIT_HANDLE 1
> #define FASTRPC_DSP_UTILITIES_HANDLE 2
> +#define FASTRPC_MAX_STATIC_HANDLE (20)
> #define FASTRPC_CTXID_MASK (0xFF0)
> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
> #define INIT_FILE_NAMELEN_MAX (128)
> @@ -106,6 +108,19 @@
>
> #define miscdev_to_fdevice(d) container_of(d, struct fastrpc_device, miscdev)
>
> +/* Poll response number from remote processor for call completion */
> +#define FASTRPC_POLL_RESPONSE (0xdecaf)
> +/* timeout in us for polling until memory barrier */
> +#define FASTRPC_POLL_TIME_MEM_UPDATE (500)
> +
> +/* Response types supported for RPC calls */
> +enum fastrpc_response_flags {
> + /* normal job completion glink response */
> + NORMAL_RESPONSE = 0,
> + /* process updates poll memory instead of glink response */
> + POLL_MODE = 1,
> +};
> +
> static const char *domains[FASTRPC_DEV_MAX] = { "adsp", "mdsp",
> "sdsp", "cdsp", "cdsp1" };
> struct fastrpc_phy_page {
> @@ -238,9 +253,16 @@ struct fastrpc_invoke_ctx {
> u32 sc;
> u64 *fdlist;
> u32 *crclist;
> + u32 *poll;
> void __user *crc;
> u64 ctxid;
> u64 msg_sz;
> + /* Threads poll for specified timeout and fall back to glink wait */
> + u64 poll_timeout;
> + /* work done status flag */
> + bool is_work_done;
> + /* response flags from remote processor */
> + enum fastrpc_response_flags rsp_flags;
> struct kref refcount;
> struct list_head node; /* list of ctxs */
> struct completion work;
> @@ -258,6 +280,7 @@ struct fastrpc_invoke_ctx {
> struct fastrpc_ctx_args {
> struct fastrpc_invoke_args *args;
> void __user *crc;
> + u64 poll_timeout;
> };
>
> struct fastrpc_session_ctx {
> @@ -619,11 +642,14 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
> fastrpc_channel_ctx_get(cctx);
>
> ctx->crc = cargs->crc;
> + ctx->poll_timeout = cargs->poll_timeout;
> ctx->sc = sc;
> ctx->retval = -1;
> ctx->pid = current->pid;
> ctx->client_id = user->client_id;
> ctx->cctx = cctx;
> + ctx->rsp_flags = NORMAL_RESPONSE;
> + ctx->is_work_done = false;
> init_completion(&ctx->work);
> INIT_WORK(&ctx->put_work, fastrpc_context_put_wq);
>
> @@ -882,7 +908,8 @@ static int fastrpc_get_meta_size(struct fastrpc_invoke_ctx *ctx)
> sizeof(struct fastrpc_invoke_buf) +
> sizeof(struct fastrpc_phy_page)) * ctx->nscalars +
> sizeof(u64) * FASTRPC_MAX_FDLIST +
> - sizeof(u32) * FASTRPC_MAX_CRCLIST;
> + sizeof(u32) * FASTRPC_MAX_CRCLIST +
> + sizeof(u32);
>
> return size;
> }
> @@ -975,6 +1002,8 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
> pages = fastrpc_phy_page_start(list, ctx->nscalars);
> ctx->fdlist = (u64 *)(pages + ctx->nscalars);
> ctx->crclist = (u32 *)(ctx->fdlist + FASTRPC_MAX_FDLIST);
> + ctx->poll = (u32 *)(ctx->crclist + FASTRPC_MAX_CRCLIST);
> +
> args = (uintptr_t)ctx->buf->virt + metalen;
> rlen = pkt_size - metalen;
> ctx->rpra = rpra;
> @@ -1145,6 +1174,72 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>
> }
>
> +static int poll_for_remote_response(struct fastrpc_invoke_ctx *ctx, u64 timeout)
> +{
> + int err = -EIO, i, j;
> +
> + /* poll on memory for DSP response. Return failure on timeout */
> + for (i = 0, j = 0; i < timeout; i++, j++) {
> + if (*ctx->poll == FASTRPC_POLL_RESPONSE) {
> + err = 0;
> + ctx->is_work_done = true;
> + ctx->retval = 0;
> + break;
> + }
> + if (j == FASTRPC_POLL_TIME_MEM_UPDATE) {
> + /* make sure that all poll memory writes by DSP are seen by CPU */
> + dma_rmb();
> + j = 0;
> + }
> + udelay(1);
> + }
> + return err;
> +}
> +
> +static inline int fastrpc_wait_for_response(struct fastrpc_invoke_ctx *ctx,
> + u32 kernel)
> +{
> + int err = 0;
> +
> + if (kernel) {
> + if (!wait_for_completion_timeout(&ctx->work, 10 * HZ))
> + err = -ETIMEDOUT;
> + } else {
> + err = wait_for_completion_interruptible(&ctx->work);
> + }
> +
> + return err;
> +}
> +
> +static int fastrpc_wait_for_completion(struct fastrpc_invoke_ctx *ctx,
> + u32 kernel)
> +{
> + int err;
> +
> + do {
> + switch (ctx->rsp_flags) {
> + case NORMAL_RESPONSE:
> + err = fastrpc_wait_for_response(ctx, kernel);
> + if (err || ctx->is_work_done)
> + return err;
> + break;
> + case POLL_MODE:
> + err = poll_for_remote_response(ctx, ctx->poll_timeout);
> + /* If polling timed out, move to normal response mode */
> + if (err)
> + ctx->rsp_flags = NORMAL_RESPONSE;
> + break;
> + default:
> + err = -EBADR;
> + dev_dbg(ctx->fl->sctx->dev,
> + "unsupported response type:0x%x\n", ctx->rsp_flags);
> + break;
> + }
> + } while (!ctx->is_work_done);
> +
> + return err;
> +}
> +
> static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
> u32 handle, u32 sc,
> struct fastrpc_ctx_args *cargs)
> @@ -1180,16 +1275,20 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
> if (err)
> goto bail;
>
> - if (kernel) {
> - if (!wait_for_completion_timeout(&ctx->work, 10 * HZ))
> - err = -ETIMEDOUT;
> - } else {
> - err = wait_for_completion_interruptible(&ctx->work);
> - }
> + if (ctx->poll_timeout != 0 && handle > FASTRPC_MAX_STATIC_HANDLE
> + && fl->pd == USER_PD)
> + ctx->rsp_flags = POLL_MODE;
>
> + err = fastrpc_wait_for_completion(ctx, kernel);
> if (err)
> goto bail;
>
> + if (!ctx->is_work_done) {
> + err = -ETIMEDOUT;
> + dev_dbg(fl->sctx->dev, "Invalid workdone state for handle 0x%x, sc 0x%x\n",
> + handle, sc);
> + goto bail;
> + }
> /* make sure that all memory writes by DSP are seen by CPU */
> dma_rmb();
> /* populate all the output buffers with results */
> @@ -1769,7 +1868,7 @@ static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
> return -EFAULT;
>
> /* Check if all reserved fields are zero */
> - for (i = 0; i < 16; i++) {
> + for (i = 0; i < 14; i++) {
> if (inv2.reserved[i] != 0)
> return -EINVAL;
> }
> @@ -1779,6 +1878,7 @@ static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
> return -ENOMEM;
>
> cargs->crc = (void __user *)(uintptr_t)inv2.crc;
> + cargs->poll_timeout = inv2.poll_timeout;
>
> err = fastrpc_remote_invoke(fl, &inv2.inv, cargs);
> kfree(cargs);
> @@ -2581,12 +2681,14 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
> ctx = idr_find(&cctx->ctx_idr, ctxid);
> spin_unlock_irqrestore(&cctx->lock, flags);
>
> + /* Ignore this failure: the context lookup returns NULL in polling mode */
> if (!ctx) {
> - dev_err(&rpdev->dev, "No context ID matches response\n");
> - return -ENOENT;
> + dev_dbg(&rpdev->dev, "No context ID matches response\n");
> + return 0;
> }
>
> ctx->retval = rsp->retval;
> + ctx->is_work_done = true;
> complete(&ctx->work);
>
> /*
> diff --git a/include/uapi/misc/fastrpc.h b/include/uapi/misc/fastrpc.h
> index 406b80555d41..1920c537bbbf 100644
> --- a/include/uapi/misc/fastrpc.h
> +++ b/include/uapi/misc/fastrpc.h
> @@ -84,7 +84,8 @@ struct fastrpc_invoke {
> struct fastrpc_invoke_v2 {
> struct fastrpc_invoke inv;
> __u64 crc;
> - __u32 reserved[16];
> + __u64 poll_timeout;
> + __u32 reserved[14];
> };
>
> struct fastrpc_init_create {
> --
> 2.34.1
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode
2025-01-27 4:42 ` [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode Ekansh Gupta
@ 2025-01-28 23:30 ` Dmitry Baryshkov
2025-01-29 5:43 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-28 23:30 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Mon, Jan 27, 2025 at 10:12:39AM +0530, Ekansh Gupta wrote:
> The DSP needs the last 4 bits of the context id to be 0 for polling
> mode to be supported: the last 8 bits are reserved for async mode (not
> yet supported in the upstream driver), and setting these bits restricts
> the DSP from writing to the poll memory. Modify the context id mask to
> ensure polling mode is supported.
Shouldn't this commit come before the previous one?
>
> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> ---
> drivers/misc/fastrpc.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> index 257a741af115..ef56c793c564 100644
> --- a/drivers/misc/fastrpc.c
> +++ b/drivers/misc/fastrpc.c
> @@ -40,7 +40,7 @@
> #define FASTRPC_INIT_HANDLE 1
> #define FASTRPC_DSP_UTILITIES_HANDLE 2
> #define FASTRPC_MAX_STATIC_HANDLE (20)
> -#define FASTRPC_CTXID_MASK (0xFF0)
> +#define FASTRPC_CTXID_MASK (0xFF0000)
> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
> #define INIT_FILE_NAMELEN_MAX (128)
> #define FASTRPC_DEVICE_NAME "fastrpc"
> @@ -524,7 +524,7 @@ static void fastrpc_context_free(struct kref *ref)
> fastrpc_buf_free(ctx->buf);
>
> spin_lock_irqsave(&cctx->lock, flags);
> - idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
> + idr_remove(&cctx->ctx_idr, ctx->ctxid >> 16);
> spin_unlock_irqrestore(&cctx->lock, flags);
>
> kfree(ctx->maps);
> @@ -664,7 +664,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
> spin_unlock_irqrestore(&cctx->lock, flags);
> goto err_idr;
> }
> - ctx->ctxid = ret << 4;
> + ctx->ctxid = ret << 16;
> spin_unlock_irqrestore(&cctx->lock, flags);
>
> kref_init(&ctx->refcount);
> @@ -2675,7 +2675,7 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
> if (len < sizeof(*rsp))
> return -EINVAL;
>
> - ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 4);
> + ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 16);
>
> spin_lock_irqsave(&cctx->lock, flags);
> ctx = idr_find(&cctx->ctx_idr, ctxid);
> --
> 2.34.1
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 2/5] misc: fastrpc: Introduce context params structure
2025-01-27 15:43 ` Dmitry Baryshkov
@ 2025-01-29 4:57 ` Ekansh Gupta
2025-01-29 10:29 ` Dmitry Baryshkov
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-29 4:57 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On 1/27/2025 9:13 PM, Dmitry Baryshkov wrote:
> On Mon, Jan 27, 2025 at 10:12:36AM +0530, Ekansh Gupta wrote:
>> Add structure to invoke context parameterms. This structure is meant
> Nit: for invoke context parameters.
Ack.
>
>> to carry invoke-context-specific data. This structure is passed to the
>> internal invocation call to save the data in the context. Examples of
>> data intended to be part of this structure are: CRC user memory
>> address, poll timeout for the invoke call, call priority, etc.
>>
>> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
>> ---
>> drivers/misc/fastrpc.c | 138 ++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 117 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
>> index 1a936d462519..c29d5536195e 100644
>> --- a/drivers/misc/fastrpc.c
>> +++ b/drivers/misc/fastrpc.c
>> @@ -254,6 +254,11 @@ struct fastrpc_invoke_ctx {
>> struct fastrpc_channel_ctx *cctx;
>> };
>>
>> +struct fastrpc_ctx_args {
>> + struct fastrpc_invoke_args *args;
>> + void __user *crc;
>> +};
> Why do we need a separate struct? Can we use fastrpc_invoke_ctx instead?
As the fastrpc_invoke_ctx structure is allocated in fastrpc_context_alloc
and freed in fastrpc_context_free, I was thinking of keeping the
user-passed information in a separate structure.
>
>> +
>> struct fastrpc_session_ctx {
>> struct device *dev;
>> int sid;
>> @@ -574,7 +579,7 @@ static void fastrpc_get_buff_overlaps(struct fastrpc_invoke_ctx *ctx)
>>
>> static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
>> struct fastrpc_user *user, u32 kernel, u32 sc,
>> - struct fastrpc_invoke_args *args)
>> + struct fastrpc_ctx_args *cargs)
>> {
>> struct fastrpc_channel_ctx *cctx = user->cctx;
>> struct fastrpc_invoke_ctx *ctx = NULL;
>> @@ -605,7 +610,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
>> kfree(ctx);
>> return ERR_PTR(-ENOMEM);
>> }
>> - ctx->args = args;
>> + ctx->args = cargs->args;
>> fastrpc_get_buff_overlaps(ctx);
>> }
>>
>> @@ -1133,7 +1138,7 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>>
>> static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
>> u32 handle, u32 sc,
>> - struct fastrpc_invoke_args *args)
>> + struct fastrpc_ctx_args *cargs)
>> {
>> struct fastrpc_invoke_ctx *ctx = NULL;
>> struct fastrpc_buf *buf, *b;
>> @@ -1151,7 +1156,7 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
>> return -EPERM;
>> }
>>
>> - ctx = fastrpc_context_alloc(fl, kernel, sc, args);
>> + ctx = fastrpc_context_alloc(fl, kernel, sc, cargs);
>> if (IS_ERR(ctx))
>> return PTR_ERR(ctx);
>>
>> @@ -1233,6 +1238,7 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
>> {
>> struct fastrpc_init_create_static init;
>> struct fastrpc_invoke_args *args;
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_phy_page pages[1];
>> char *name;
>> int err;
>> @@ -1307,15 +1313,25 @@ static int fastrpc_init_create_static_process(struct fastrpc_user *fl,
>> args[2].length = sizeof(*pages);
>> args[2].fd = -1;
>>
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs) {
>> + err = -ENOMEM;
>> + goto err_invoke;
>> + }
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_STATIC, 3, 0);
>>
>> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
>> - sc, args);
>> - if (err)
>> + sc, cargs);
>> + if (err) {
>> + kfree(cargs);
> No, this should be a part of the error path.
Ack.
>
>> goto err_invoke;
>> + }
>>
>> kfree(args);
>> kfree(name);
>> + kfree(cargs);
>>
>> return 0;
>> err_invoke:
>> @@ -1351,6 +1367,7 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
>> {
>> struct fastrpc_init_create init;
>> struct fastrpc_invoke_args *args;
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_phy_page pages[1];
>> struct fastrpc_map *map = NULL;
>> struct fastrpc_buf *imem = NULL;
>> @@ -1438,16 +1455,26 @@ static int fastrpc_init_create_process(struct fastrpc_user *fl,
>> args[5].length = sizeof(inbuf.siglen);
>> args[5].fd = -1;
>>
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs) {
>> + err = -ENOMEM;
>> + goto err_invoke;
>> + }
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE, 4, 0);
>> if (init.attrs)
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_CREATE_ATTR, 4, 0);
>>
>> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
>> - sc, args);
>> - if (err)
>> + sc, cargs);
>> + if (err) {
>> + kfree(cargs);
> Likewise.
Ack.
--ekansh
>
>> goto err_invoke;
>> + }
>>
>> kfree(args);
>> + kfree(cargs);
>>
>> return 0;
>>
>> @@ -1498,17 +1525,27 @@ static void fastrpc_session_free(struct fastrpc_channel_ctx *cctx,
>> static int fastrpc_release_current_dsp_process(struct fastrpc_user *fl)
>> {
>> struct fastrpc_invoke_args args[1];
>> - int client_id = 0;
>> + struct fastrpc_ctx_args *cargs;
>> + int client_id = 0, err;
>> u32 sc;
>>
>> client_id = fl->client_id;
>> args[0].ptr = (u64)(uintptr_t) &client_id;
>> args[0].length = sizeof(client_id);
>> args[0].fd = -1;
>> +
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_RELEASE, 1, 0);
>>
>> - return fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
>> - sc, &args[0]);
>> + err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
>> + sc, cargs);
>> + kfree(cargs);
>> +
>> + return err;
>> }
>>
>> static int fastrpc_device_release(struct inode *inode, struct file *file)
>> @@ -1643,22 +1680,33 @@ static int fastrpc_dmabuf_alloc(struct fastrpc_user *fl, char __user *argp)
>> static int fastrpc_init_attach(struct fastrpc_user *fl, int pd)
>> {
>> struct fastrpc_invoke_args args[1];
>> - int client_id = fl->client_id;
>> + struct fastrpc_ctx_args *cargs;
>> + int client_id = fl->client_id, err;
>> u32 sc;
>>
>> args[0].ptr = (u64)(uintptr_t) &client_id;
>> args[0].length = sizeof(client_id);
>> args[0].fd = -1;
>> +
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_ATTACH, 1, 0);
>> fl->pd = pd;
>>
>> - return fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
>> - sc, &args[0]);
>> + err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE,
>> + sc, cargs);
>> + kfree(cargs);
>> +
>> + return err;
>> }
>>
>> static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
>> {
>> struct fastrpc_invoke_args *args = NULL;
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_invoke inv;
>> u32 nscalars;
>> int err;
>> @@ -1679,9 +1727,16 @@ static int fastrpc_invoke(struct fastrpc_user *fl, char __user *argp)
>> return -EFAULT;
>> }
>> }
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs) {
>> + kfree(args);
>> + return -ENOMEM;
>> + }
>>
>> - err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, args);
>> + cargs->args = args;
>> + err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, cargs);
>> kfree(args);
>> + kfree(cargs);
>>
>> return err;
>> }
>> @@ -1690,6 +1745,8 @@ static int fastrpc_get_info_from_dsp(struct fastrpc_user *fl, uint32_t *dsp_attr
>> uint32_t dsp_attr_buf_len)
>> {
>> struct fastrpc_invoke_args args[2] = { 0 };
>> + struct fastrpc_ctx_args *cargs;
>> + int err;
>>
>> /*
>> * Capability filled in userspace. This carries the information
>> @@ -1706,8 +1763,15 @@ static int fastrpc_get_info_from_dsp(struct fastrpc_user *fl, uint32_t *dsp_attr
>> args[1].length = dsp_attr_buf_len * sizeof(u32);
>> args[1].fd = -1;
>>
>> - return fastrpc_internal_invoke(fl, true, FASTRPC_DSP_UTILITIES_HANDLE,
>> - FASTRPC_SCALARS(0, 1, 1), args);
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> + err = fastrpc_internal_invoke(fl, true, FASTRPC_DSP_UTILITIES_HANDLE,
>> + FASTRPC_SCALARS(0, 1, 1), cargs);
>> + kfree(cargs);
>> + return err;
>> }
>>
>> static int fastrpc_get_info_from_kernel(struct fastrpc_ioctl_capability *cap,
>> @@ -1794,6 +1858,7 @@ static int fastrpc_get_dsp_info(struct fastrpc_user *fl, char __user *argp)
>> static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *buf)
>> {
>> struct fastrpc_invoke_args args[1] = { [0] = { 0 } };
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_munmap_req_msg req_msg;
>> struct device *dev = fl->sctx->dev;
>> int err;
>> @@ -1806,9 +1871,14 @@ static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *
>> args[0].ptr = (u64) (uintptr_t) &req_msg;
>> args[0].length = sizeof(req_msg);
>>
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MUNMAP, 1, 0);
>> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
>> - &args[0]);
>> + cargs);
>> if (!err) {
>> dev_dbg(dev, "unmmap\tpt 0x%09lx OK\n", buf->raddr);
>> spin_lock(&fl->lock);
>> @@ -1818,6 +1888,7 @@ static int fastrpc_req_munmap_impl(struct fastrpc_user *fl, struct fastrpc_buf *
>> } else {
>> dev_err(dev, "unmmap\tpt 0x%09lx ERROR\n", buf->raddr);
>> }
>> + kfree(cargs);
>>
>> return err;
>> }
>> @@ -1852,6 +1923,7 @@ static int fastrpc_req_munmap(struct fastrpc_user *fl, char __user *argp)
>> static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
>> {
>> struct fastrpc_invoke_args args[3] = { [0 ... 2] = { 0 } };
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_buf *buf = NULL;
>> struct fastrpc_mmap_req_msg req_msg;
>> struct fastrpc_mmap_rsp_msg rsp_msg;
>> @@ -1902,12 +1974,18 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
>> args[2].ptr = (u64) (uintptr_t) &rsp_msg;
>> args[2].length = sizeof(rsp_msg);
>>
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MMAP, 2, 1);
>> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
>> - &args[0]);
>> + cargs);
>> if (err) {
>> dev_err(dev, "mmap error (len 0x%08llx)\n", buf->size);
>> fastrpc_buf_free(buf);
>> + kfree(cargs);
>> return err;
>> }
>>
>> @@ -1942,17 +2020,20 @@ static int fastrpc_req_mmap(struct fastrpc_user *fl, char __user *argp)
>> dev_dbg(dev, "mmap\t\tpt 0x%09lx OK [len 0x%08llx]\n",
>> buf->raddr, buf->size);
>>
>> + kfree(cargs);
>> return 0;
>>
>> err_assign:
>> fastrpc_req_munmap_impl(fl, buf);
>>
>> + kfree(cargs);
>> return err;
>> }
>>
>> static int fastrpc_req_mem_unmap_impl(struct fastrpc_user *fl, struct fastrpc_mem_unmap *req)
>> {
>> struct fastrpc_invoke_args args[1] = { [0] = { 0 } };
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_map *map = NULL, *iter, *m;
>> struct fastrpc_mem_unmap_req_msg req_msg = { 0 };
>> int err = 0;
>> @@ -1982,14 +2063,21 @@ static int fastrpc_req_mem_unmap_impl(struct fastrpc_user *fl, struct fastrpc_me
>> args[0].ptr = (u64) (uintptr_t) &req_msg;
>> args[0].length = sizeof(req_msg);
>>
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_UNMAP, 1, 0);
>> err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc,
>> - &args[0]);
>> + cargs);
>> if (err) {
>> dev_err(dev, "unmmap\tpt fd = %d, 0x%09llx error\n", map->fd, map->raddr);
>> + kfree(cargs);
>> return err;
>> }
>> fastrpc_map_put(map);
>> + kfree(cargs);
>>
>> return 0;
>> }
>> @@ -2007,6 +2095,7 @@ static int fastrpc_req_mem_unmap(struct fastrpc_user *fl, char __user *argp)
>> static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
>> {
>> struct fastrpc_invoke_args args[4] = { [0 ... 3] = { 0 } };
>> + struct fastrpc_ctx_args *cargs;
>> struct fastrpc_mem_map_req_msg req_msg = { 0 };
>> struct fastrpc_mmap_rsp_msg rsp_msg = { 0 };
>> struct fastrpc_mem_unmap req_unmap = { 0 };
>> @@ -2051,8 +2140,13 @@ static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
>> args[3].ptr = (u64) (uintptr_t) &rsp_msg;
>> args[3].length = sizeof(rsp_msg);
>>
>> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
>> + if (!cargs)
>> + return -ENOMEM;
>> +
>> + cargs->args = args;
>> sc = FASTRPC_SCALARS(FASTRPC_RMID_INIT_MEM_MAP, 3, 1);
>> - err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc, &args[0]);
>> + err = fastrpc_internal_invoke(fl, true, FASTRPC_INIT_HANDLE, sc, cargs);
>> if (err) {
>> dev_err(dev, "mem mmap error, fd %d, vaddr %llx, size %lld\n",
>> req.fd, req.vaddrin, map->size);
>> @@ -2072,11 +2166,13 @@ static int fastrpc_req_mem_map(struct fastrpc_user *fl, char __user *argp)
>> fastrpc_req_mem_unmap_impl(fl, &req_unmap);
>> return -EFAULT;
>> }
>> + kfree(cargs);
>>
>> return 0;
>>
>> err_invoke:
>> fastrpc_map_put(map);
>> + kfree(cargs);
>>
>> return err;
>> }
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-01-28 23:29 ` Dmitry Baryshkov
@ 2025-01-29 5:42 ` Ekansh Gupta
2025-01-29 10:40 ` Dmitry Baryshkov
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-29 5:42 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>> For any remote call to the DSP, after sending an invocation message,
>> the fastRPC driver waits for a glink response, and during this time
>> the CPU can go into low power modes. Add polling mode support, with
>> which the fastRPC driver polls continuously on a memory location
>> after sending a message to the remote subsystem, eliminating CPU
>> wakeup and scheduling latencies and reducing fastRPC overhead. With
>> this change, the DSP always sends a glink response, which is ignored
>> if polling mode did not time out.
> Is there a chance to implement actual async I/O protocol with the help
> of the poll() call instead of hiding the polling / wait inside the
> invoke2?
This design is based on the implementation on DSP firmware as of today:
Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
Can you please give some reference to the async I/O protocol that you've
suggested? I can check if it can be implemented here.
--ekansh
>
>> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
>> ---
>> drivers/misc/fastrpc.c | 122 +++++++++++++++++++++++++++++++++---
>> include/uapi/misc/fastrpc.h | 3 +-
>> 2 files changed, 114 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
>> index cfacee0dded5..257a741af115 100644
>> --- a/drivers/misc/fastrpc.c
>> +++ b/drivers/misc/fastrpc.c
>> @@ -19,6 +19,7 @@
>> #include <linux/rpmsg.h>
>> #include <linux/scatterlist.h>
>> #include <linux/slab.h>
>> +#include <linux/delay.h>
>> #include <linux/firmware/qcom/qcom_scm.h>
>> #include <uapi/misc/fastrpc.h>
>> #include <linux/of_reserved_mem.h>
>> @@ -38,6 +39,7 @@
>> #define FASTRPC_CTX_MAX (256)
>> #define FASTRPC_INIT_HANDLE 1
>> #define FASTRPC_DSP_UTILITIES_HANDLE 2
>> +#define FASTRPC_MAX_STATIC_HANDLE (20)
>> #define FASTRPC_CTXID_MASK (0xFF0)
>> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
>> #define INIT_FILE_NAMELEN_MAX (128)
>> @@ -106,6 +108,19 @@
>>
>> #define miscdev_to_fdevice(d) container_of(d, struct fastrpc_device, miscdev)
>>
>> +/* Poll response number from remote processor for call completion */
>> +#define FASTRPC_POLL_RESPONSE (0xdecaf)
>> +/* timeout in us for polling until memory barrier */
>> +#define FASTRPC_POLL_TIME_MEM_UPDATE (500)
>> +
>> +/* Response types supported for RPC calls */
>> +enum fastrpc_response_flags {
>> + /* normal job completion glink response */
>> + NORMAL_RESPONSE = 0,
>> + /* process updates poll memory instead of glink response */
>> + POLL_MODE = 1,
>> +};
>> +
>> static const char *domains[FASTRPC_DEV_MAX] = { "adsp", "mdsp",
>> "sdsp", "cdsp", "cdsp1" };
>> struct fastrpc_phy_page {
>> @@ -238,9 +253,16 @@ struct fastrpc_invoke_ctx {
>> u32 sc;
>> u64 *fdlist;
>> u32 *crclist;
>> + u32 *poll;
>> void __user *crc;
>> u64 ctxid;
>> u64 msg_sz;
>> + /* Threads poll for specified timeout and fall back to glink wait */
>> + u64 poll_timeout;
>> + /* work done status flag */
>> + bool is_work_done;
>> + /* response flags from remote processor */
>> + enum fastrpc_response_flags rsp_flags;
>> struct kref refcount;
>> struct list_head node; /* list of ctxs */
>> struct completion work;
>> @@ -258,6 +280,7 @@ struct fastrpc_invoke_ctx {
>> struct fastrpc_ctx_args {
>> struct fastrpc_invoke_args *args;
>> void __user *crc;
>> + u64 poll_timeout;
>> };
>>
>> struct fastrpc_session_ctx {
>> @@ -619,11 +642,14 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
>> fastrpc_channel_ctx_get(cctx);
>>
>> ctx->crc = cargs->crc;
>> + ctx->poll_timeout = cargs->poll_timeout;
>> ctx->sc = sc;
>> ctx->retval = -1;
>> ctx->pid = current->pid;
>> ctx->client_id = user->client_id;
>> ctx->cctx = cctx;
>> + ctx->rsp_flags = NORMAL_RESPONSE;
>> + ctx->is_work_done = false;
>> init_completion(&ctx->work);
>> INIT_WORK(&ctx->put_work, fastrpc_context_put_wq);
>>
>> @@ -882,7 +908,8 @@ static int fastrpc_get_meta_size(struct fastrpc_invoke_ctx *ctx)
>> sizeof(struct fastrpc_invoke_buf) +
>> sizeof(struct fastrpc_phy_page)) * ctx->nscalars +
>> sizeof(u64) * FASTRPC_MAX_FDLIST +
>> - sizeof(u32) * FASTRPC_MAX_CRCLIST;
>> + sizeof(u32) * FASTRPC_MAX_CRCLIST +
>> + sizeof(u32);
>>
>> return size;
>> }
>> @@ -975,6 +1002,8 @@ static int fastrpc_get_args(u32 kernel, struct fastrpc_invoke_ctx *ctx)
>> pages = fastrpc_phy_page_start(list, ctx->nscalars);
>> ctx->fdlist = (u64 *)(pages + ctx->nscalars);
>> ctx->crclist = (u32 *)(ctx->fdlist + FASTRPC_MAX_FDLIST);
>> + ctx->poll = (u32 *)(ctx->crclist + FASTRPC_MAX_CRCLIST);
>> +
>> args = (uintptr_t)ctx->buf->virt + metalen;
>> rlen = pkt_size - metalen;
>> ctx->rpra = rpra;
>> @@ -1145,6 +1174,72 @@ static int fastrpc_invoke_send(struct fastrpc_session_ctx *sctx,
>>
>> }
>>
>> +static int poll_for_remote_response(struct fastrpc_invoke_ctx *ctx, u64 timeout)
>> +{
>> + int err = -EIO, i, j;
>> +
>> + /* poll on memory for DSP response. Return failure on timeout */
>> + for (i = 0, j = 0; i < timeout; i++, j++) {
>> + if (*ctx->poll == FASTRPC_POLL_RESPONSE) {
>> + err = 0;
>> + ctx->is_work_done = true;
>> + ctx->retval = 0;
>> + break;
>> + }
>> + if (j == FASTRPC_POLL_TIME_MEM_UPDATE) {
>> + /* make sure that all poll memory writes by DSP are seen by CPU */
>> + dma_rmb();
>> + j = 0;
>> + }
>> + udelay(1);
>> + }
>> + return err;
>> +}
>> +
>> +static inline int fastrpc_wait_for_response(struct fastrpc_invoke_ctx *ctx,
>> + u32 kernel)
>> +{
>> + int err = 0;
>> +
>> + if (kernel) {
>> + if (!wait_for_completion_timeout(&ctx->work, 10 * HZ))
>> + err = -ETIMEDOUT;
>> + } else {
>> + err = wait_for_completion_interruptible(&ctx->work);
>> + }
>> +
>> + return err;
>> +}
>> +
>> +static int fastrpc_wait_for_completion(struct fastrpc_invoke_ctx *ctx,
>> + u32 kernel)
>> +{
>> + int err;
>> +
>> + do {
>> + switch (ctx->rsp_flags) {
>> + case NORMAL_RESPONSE:
>> + err = fastrpc_wait_for_response(ctx, kernel);
>> + if (err || ctx->is_work_done)
>> + return err;
>> + break;
>> + case POLL_MODE:
>> + err = poll_for_remote_response(ctx, ctx->poll_timeout);
>> + /* If polling timed out, move to normal response mode */
>> + if (err)
>> + ctx->rsp_flags = NORMAL_RESPONSE;
>> + break;
>> + default:
>> + err = -EBADR;
>> + dev_dbg(ctx->fl->sctx->dev,
>> + "unsupported response type:0x%x\n", ctx->rsp_flags);
>> + break;
>> + }
>> + } while (!ctx->is_work_done);
>> +
>> + return err;
>> +}
>> +
>> static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
>> u32 handle, u32 sc,
>> struct fastrpc_ctx_args *cargs)
>> @@ -1180,16 +1275,20 @@ static int fastrpc_internal_invoke(struct fastrpc_user *fl, u32 kernel,
>> if (err)
>> goto bail;
>>
>> - if (kernel) {
>> - if (!wait_for_completion_timeout(&ctx->work, 10 * HZ))
>> - err = -ETIMEDOUT;
>> - } else {
>> - err = wait_for_completion_interruptible(&ctx->work);
>> - }
>> + if (ctx->poll_timeout != 0 && handle > FASTRPC_MAX_STATIC_HANDLE
>> + && fl->pd == USER_PD)
>> + ctx->rsp_flags = POLL_MODE;
>>
>> + err = fastrpc_wait_for_completion(ctx, kernel);
>> if (err)
>> goto bail;
>>
>> + if (!ctx->is_work_done) {
>> + err = -ETIMEDOUT;
>> + dev_dbg(fl->sctx->dev, "Invalid workdone state for handle 0x%x, sc 0x%x\n",
>> + handle, sc);
>> + goto bail;
>> + }
>> /* make sure that all memory writes by DSP are seen by CPU */
>> dma_rmb();
>> /* populate all the output buffers with results */
>> @@ -1769,7 +1868,7 @@ static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
>> return -EFAULT;
>>
>> /* Check if all reserved fields are zero */
>> - for (i = 0; i < 16; i++) {
>> + for (i = 0; i < 14; i++) {
>> if (inv2.reserved[i] != 0)
>> return -EINVAL;
>> }
>> @@ -1779,6 +1878,7 @@ static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
>> return -ENOMEM;
>>
>> cargs->crc = (void __user *)(uintptr_t)inv2.crc;
>> + cargs->poll_timeout = inv2.poll_timeout;
>>
>> err = fastrpc_remote_invoke(fl, &inv2.inv, cargs);
>> kfree(cargs);
>> @@ -2581,12 +2681,14 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
>> ctx = idr_find(&cctx->ctx_idr, ctxid);
>> spin_unlock_irqrestore(&cctx->lock, flags);
>>
>> + /* Ignore this failure as context returned will be NULL for polling mode */
>> if (!ctx) {
>> - dev_err(&rpdev->dev, "No context ID matches response\n");
>> - return -ENOENT;
>> + dev_dbg(&rpdev->dev, "No context ID matches response\n");
>> + return 0;
>> }
>>
>> ctx->retval = rsp->retval;
>> + ctx->is_work_done = true;
>> complete(&ctx->work);
>>
>> /*
>> diff --git a/include/uapi/misc/fastrpc.h b/include/uapi/misc/fastrpc.h
>> index 406b80555d41..1920c537bbbf 100644
>> --- a/include/uapi/misc/fastrpc.h
>> +++ b/include/uapi/misc/fastrpc.h
>> @@ -84,7 +84,8 @@ struct fastrpc_invoke {
>> struct fastrpc_invoke_v2 {
>> struct fastrpc_invoke inv;
>> __u64 crc;
>> - __u32 reserved[16];
>> + __u64 poll_timeout;
>> + __u32 reserved[14];
>> };
>>
>> struct fastrpc_init_create {
>> --
>> 2.34.1
>>
* Re: [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode
2025-01-28 23:30 ` Dmitry Baryshkov
@ 2025-01-29 5:43 ` Ekansh Gupta
2025-01-29 11:23 ` Dmitry Baryshkov
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-01-29 5:43 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On 1/29/2025 5:00 AM, Dmitry Baryshkov wrote:
> On Mon, Jan 27, 2025 at 10:12:39AM +0530, Ekansh Gupta wrote:
>> The DSP needs the last 4 bits of the context id to be 0 for polling
>> mode to be supported: setting the last 8 bits is intended for async
>> mode (not yet supported in the upstream driver), and setting these
>> bits restricts writing to the poll memory from the DSP. Modify the
>> context id mask to ensure polling mode is supported.
> Shouldn't this commit come before the previous one?
Yes, I'll change the order in next series.
Thanks for reviewing the changes.
--ekansh
>
>> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
>> ---
>> drivers/misc/fastrpc.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
>> index 257a741af115..ef56c793c564 100644
>> --- a/drivers/misc/fastrpc.c
>> +++ b/drivers/misc/fastrpc.c
>> @@ -40,7 +40,7 @@
>> #define FASTRPC_INIT_HANDLE 1
>> #define FASTRPC_DSP_UTILITIES_HANDLE 2
>> #define FASTRPC_MAX_STATIC_HANDLE (20)
>> -#define FASTRPC_CTXID_MASK (0xFF0)
>> +#define FASTRPC_CTXID_MASK (0xFF0000)
>> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
>> #define INIT_FILE_NAMELEN_MAX (128)
>> #define FASTRPC_DEVICE_NAME "fastrpc"
>> @@ -524,7 +524,7 @@ static void fastrpc_context_free(struct kref *ref)
>> fastrpc_buf_free(ctx->buf);
>>
>> spin_lock_irqsave(&cctx->lock, flags);
>> - idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
>> + idr_remove(&cctx->ctx_idr, ctx->ctxid >> 16);
>> spin_unlock_irqrestore(&cctx->lock, flags);
>>
>> kfree(ctx->maps);
>> @@ -664,7 +664,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
>> spin_unlock_irqrestore(&cctx->lock, flags);
>> goto err_idr;
>> }
>> - ctx->ctxid = ret << 4;
>> + ctx->ctxid = ret << 16;
>> spin_unlock_irqrestore(&cctx->lock, flags);
>>
>> kref_init(&ctx->refcount);
>> @@ -2675,7 +2675,7 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
>> if (len < sizeof(*rsp))
>> return -EINVAL;
>>
>> - ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 4);
>> + ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 16);
>>
>> spin_lock_irqsave(&cctx->lock, flags);
>> ctx = idr_find(&cctx->ctx_idr, ctxid);
>> --
>> 2.34.1
>>
* Re: [PATCH v2 2/5] misc: fastrpc: Introduce context params structure
2025-01-29 4:57 ` Ekansh Gupta
@ 2025-01-29 10:29 ` Dmitry Baryshkov
0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-29 10:29 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Wed, Jan 29, 2025 at 10:27:45AM +0530, Ekansh Gupta wrote:
>
>
>
> On 1/27/2025 9:13 PM, Dmitry Baryshkov wrote:
> > On Mon, Jan 27, 2025 at 10:12:36AM +0530, Ekansh Gupta wrote:
> >> Add structure to invoke context parameterms. This structure is meant
> > Nit: for invoke context parameters.
>
> Ack.
>
> >
> >> to carry invoke-context-specific data. This structure is passed to
> >> the internal invocation call to save the data in the context.
> >> Examples of data intended to be part of this structure are: CRC user
> >> memory address, poll timeout for the invoke call, call priority, etc.
> >>
> >> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> >> ---
> >> drivers/misc/fastrpc.c | 138 ++++++++++++++++++++++++++++++++++-------
> >> 1 file changed, 117 insertions(+), 21 deletions(-)
> >>
> >> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> >> index 1a936d462519..c29d5536195e 100644
> >> --- a/drivers/misc/fastrpc.c
> >> +++ b/drivers/misc/fastrpc.c
> >> @@ -254,6 +254,11 @@ struct fastrpc_invoke_ctx {
> >> struct fastrpc_channel_ctx *cctx;
> >> };
> >>
> >> +struct fastrpc_ctx_args {
> >> + struct fastrpc_invoke_args *args;
> >> + void __user *crc;
> >> +};
> > Why do we need a separate struct? Can we use fastrpc_invoke_ctx instead?
>
> As the fastrpc_invoke_ctx structure is allocated and returned from
> fastrpc_context_alloc and freed in fastrpc_context_free, I was thinking
> of keeping the user-passed information in a different structure.
I think we've discussed that previously: don't store user-passed
information. Parse it as necessary and then drop the user data.
This way both old and new invocation types will use the same data struct
internally.
--
With best wishes
Dmitry
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-01-29 5:42 ` Ekansh Gupta
@ 2025-01-29 10:40 ` Dmitry Baryshkov
2025-03-20 13:49 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-29 10:40 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>
>
>
> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
> > On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
> >> For any remote call to the DSP, after sending an invocation message,
> >> the fastRPC driver waits for a glink response, and during this time
> >> the CPU can go into low power modes. Add polling mode support, with
> >> which the fastRPC driver polls continuously on a memory location
> >> after sending a message to the remote subsystem, eliminating CPU
> >> wakeup and scheduling latencies and reducing fastRPC overhead. With
> >> this change, the DSP always sends a glink response, which is ignored
> >> if polling mode did not time out.
> > Is there a chance to implement actual async I/O protocol with the help
> > of the poll() call instead of hiding the polling / wait inside the
> > invoke2?
>
> This design is based on the implementation on DSP firmware as of today:
> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>
> Can you please give some reference to the async I/O protocol that you've
> suggested? I can check if it can be implemented here.
As with the typical poll() call implementation:
- write some data using ioctl
- call poll() / select() to wait for the data to be processed
- read data using another ioctl
Getting back to your patch: from your commit message it is not clear
which SoCs support this feature. A reminder that we are supporting
all kinds of platforms, including the ones that have been EoLed by Qualcomm.
Next, you wrote that in-driver polling eliminates CPU wakeup and
scheduling latencies. However, this should also increase power
consumption. Is there any measurable difference in latency, given that
you already use the ioctl() syscall and will therefore incur two context
switches anyway? What is the actual impact?
--
With best wishes
Dmitry
* Re: [PATCH v2 3/5] misc: fastrpc: Add CRC support using invokeV2 request
2025-01-27 4:42 ` [PATCH v2 3/5] misc: fastrpc: Add CRC support using invokeV2 request Ekansh Gupta
@ 2025-01-29 10:41 ` Dmitry Baryshkov
0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-29 10:41 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Mon, Jan 27, 2025 at 10:12:37AM +0530, Ekansh Gupta wrote:
> The invokeV2 request is intended to support multiple enhanced invoke
> requests such as CRC checks, performance counter enablement and
> polling mode for RPC invocations. CRC checking is enabled as part of
> this patch. A CRC check on the input and output arguments helps
> ensure data consistency over a remote call. If the user intends to
> enable the CRC check, a local CRC is first calculated at the user end
> and a CRC buffer is passed to the DSP to capture the remote CRC
> values. The DSP is expected to write to the remote CRC buffer, which
> is then compared at the user level with the local CRC values.
>
> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> ---
> drivers/misc/fastrpc.c | 78 ++++++++++++++++++++++++++++++-------
> include/uapi/misc/fastrpc.h | 7 ++++
> 2 files changed, 70 insertions(+), 15 deletions(-)
>
> +
> +static int fastrpc_invokev2(struct fastrpc_user *fl, char __user *argp)
> +{
> + struct fastrpc_ctx_args *cargs;
> + struct fastrpc_invoke_v2 inv2;
> + int i, err;
> +
> + if (copy_from_user(&inv2, argp, sizeof(inv2)))
> + return -EFAULT;
> +
> + /* Check if all reserved fields are zero */
> + for (i = 0; i < 16; i++) {
Noticed this while reviewing the next patch. No. Nak. Never. Who makes
sure that this magic 16 stays in sync with the actual size of the array?
Please always use ARRAY_SIZE() in such cases.
> + if (inv2.reserved[i] != 0)
> + return -EINVAL;
> }
>
> - cargs->args = args;
> - err = fastrpc_internal_invoke(fl, false, inv.handle, inv.sc, cargs);
> - kfree(args);
> + cargs = kzalloc(sizeof(*cargs), GFP_KERNEL);
> + if (!cargs)
> + return -ENOMEM;
> +
> + cargs->crc = (void __user *)(uintptr_t)inv2.crc;
> +
> + err = fastrpc_remote_invoke(fl, &inv2.inv, cargs);
> kfree(cargs);
>
> return err;
--
With best wishes
Dmitry
* Re: [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode
2025-01-29 5:43 ` Ekansh Gupta
@ 2025-01-29 11:23 ` Dmitry Baryshkov
2025-02-10 9:05 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-01-29 11:23 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Wed, Jan 29, 2025 at 11:13:05AM +0530, Ekansh Gupta wrote:
>
>
>
> On 1/29/2025 5:00 AM, Dmitry Baryshkov wrote:
> > On Mon, Jan 27, 2025 at 10:12:39AM +0530, Ekansh Gupta wrote:
> >> The DSP needs the last 4 bits of the context id to be 0 for polling
> >> mode to be supported: setting the last 8 bits is intended for async
> >> mode (not yet supported in the upstream driver), and setting these
> >> bits restricts writing to the poll memory from the DSP. Modify the
> >> context id mask to ensure polling mode is supported.
> > Shouldn't this commit come before the previous one?
>
> Yes, I'll change the order in next series.
>
> Thanks for reviewing the changes.
Please consider asking somebody for an internal review before sending
patches. This should help you catch such mistakes.
Next, I keep asking for a sensible userspace and testing suite. No, the
existing fastrpc does not pass that criterion. We have discussed that, but
I see no changes in the development. The PR that you've linked in the
cover letter contains a single commit, covering documentation, new
IOCTL, CRC support, poll mode support, etc. What if the discussion ends
up accepting the CRC support but declining the polling mode? Or vice
versa, accepting poll mode but declining the CRC support? There is no
easy way for us to review userspace impact, corresponding changes, etc.
Last, but not least: I want to bring up Sima's response in one of the
earlier discussions ([1]): "Yeah, if fastrpc just keeps growing the
story will completely different."
You are adding a new IOCTL and a new invocation API. That totally sounds
like "keeps growing", which brings us back to the uAPI question,
drm_accel.h and the rest of the questions on the userspace, compiler,
etc.
[1] https://lore.kernel.org/dri-devel/Znk87-xCx8f3fIUL@phenom.ffwll.local/
>
> --ekansh
>
> >
> >> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> >> ---
> >> drivers/misc/fastrpc.c | 8 ++++----
> >> 1 file changed, 4 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> >> index 257a741af115..ef56c793c564 100644
> >> --- a/drivers/misc/fastrpc.c
> >> +++ b/drivers/misc/fastrpc.c
> >> @@ -40,7 +40,7 @@
> >> #define FASTRPC_INIT_HANDLE 1
> >> #define FASTRPC_DSP_UTILITIES_HANDLE 2
> >> #define FASTRPC_MAX_STATIC_HANDLE (20)
> >> -#define FASTRPC_CTXID_MASK (0xFF0)
> >> +#define FASTRPC_CTXID_MASK (0xFF0000)
> >> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
> >> #define INIT_FILE_NAMELEN_MAX (128)
> >> #define FASTRPC_DEVICE_NAME "fastrpc"
> >> @@ -524,7 +524,7 @@ static void fastrpc_context_free(struct kref *ref)
> >> fastrpc_buf_free(ctx->buf);
> >>
> >> spin_lock_irqsave(&cctx->lock, flags);
> >> - idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
> >> + idr_remove(&cctx->ctx_idr, ctx->ctxid >> 16);
> >> spin_unlock_irqrestore(&cctx->lock, flags);
> >>
> >> kfree(ctx->maps);
> >> @@ -664,7 +664,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
> >> spin_unlock_irqrestore(&cctx->lock, flags);
> >> goto err_idr;
> >> }
> >> - ctx->ctxid = ret << 4;
> >> + ctx->ctxid = ret << 16;
> >> spin_unlock_irqrestore(&cctx->lock, flags);
> >>
> >> kref_init(&ctx->refcount);
> >> @@ -2675,7 +2675,7 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
> >> if (len < sizeof(*rsp))
> >> return -EINVAL;
> >>
> >> - ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 4);
> >> + ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 16);
> >>
> >> spin_lock_irqsave(&cctx->lock, flags);
> >> ctx = idr_find(&cctx->ctx_idr, ctxid);
> >> --
> >> 2.34.1
> >>
>
--
With best wishes
Dmitry
* Re: [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode
2025-01-29 11:23 ` Dmitry Baryshkov
@ 2025-02-10 9:05 ` Ekansh Gupta
2025-02-10 10:35 ` Dmitry Baryshkov
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-02-10 9:05 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On 1/29/2025 4:53 PM, Dmitry Baryshkov wrote:
> On Wed, Jan 29, 2025 at 11:13:05AM +0530, Ekansh Gupta wrote:
>>
>>
>> On 1/29/2025 5:00 AM, Dmitry Baryshkov wrote:
>>> On Mon, Jan 27, 2025 at 10:12:39AM +0530, Ekansh Gupta wrote:
>>>> DSP needs the last 4 bits of the context id to be 0 for polling mode to be
>>>> supported, as setting the last 8 bits is intended for async mode (not yet
>>>> supported in the upstream driver) and setting these bits restricts
>>>> writing to the poll memory from the DSP. Modify the context id mask to
>>>> ensure polling mode is supported.
>>> Shouldn't this commit come before the previous one?
>> Yes, I'll change the order in next series.
>>
>> Thanks for reviewing the changes.
> Please consider asking somebody for an internal review before sending
> patches. This should help you catch such mistakes.
>
> Next, I keep on asking for a sensible userspace and testing suite. No,
> existing fastrpc doesn't pass that criteria. We have discussed that, but
> I see no changes in the development. The PR that you've linked in the
> cover letter contains a single commit, covering documentation, new
> IOCTL, CRC support, poll mode support, etc. What if the discussion ends
> up accepting the CRC support but declining the polling mode? Or vice
> versa, accepting poll mode but declining the CRC support? There is no
> easy way for us to review userspace impact, corresponding changes, etc.
We are working with our Legal team to publish the HexagonSDK, which will
have sample apps for all features supported by the upstream driver and can be
used for testing.
I'll break the PR down into multiple meaningful commits based on the features
that are being added.
>
> Last, but not least: I want to bring up Sima's response in one of the
> earlier discussions ([1]): "Yeah, if fastrpc just keeps growing the
> story will completely different."
>
> You are adding a new IOCTL and a new invocation API. That totally sounds
> like "keeps growing", which brings us back to the uAPI question,
> drm_accel.h and the rest of the questions on the userspace, compiler,
> etc.
>
> [1] https://lore.kernel.org/dri-devel/Znk87-xCx8f3fIUL@phenom.ffwll.local/
Currently, we are upstreaming the features supported on the DSP for publicly
available platforms. No features for future platforms are planned for the
fastRPC driver.
We are also looking into moving the driver under drivers/accel for any new
features that are planned on future platforms.
--ekansh
>
>
>> --ekansh
>>
>>>> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
>>>> ---
>>>> drivers/misc/fastrpc.c | 8 ++++----
>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
>>>> index 257a741af115..ef56c793c564 100644
>>>> --- a/drivers/misc/fastrpc.c
>>>> +++ b/drivers/misc/fastrpc.c
>>>> @@ -40,7 +40,7 @@
>>>> #define FASTRPC_INIT_HANDLE 1
>>>> #define FASTRPC_DSP_UTILITIES_HANDLE 2
>>>> #define FASTRPC_MAX_STATIC_HANDLE (20)
>>>> -#define FASTRPC_CTXID_MASK (0xFF0)
>>>> +#define FASTRPC_CTXID_MASK (0xFF0000)
>>>> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
>>>> #define INIT_FILE_NAMELEN_MAX (128)
>>>> #define FASTRPC_DEVICE_NAME "fastrpc"
>>>> @@ -524,7 +524,7 @@ static void fastrpc_context_free(struct kref *ref)
>>>> fastrpc_buf_free(ctx->buf);
>>>>
>>>> spin_lock_irqsave(&cctx->lock, flags);
>>>> - idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
>>>> + idr_remove(&cctx->ctx_idr, ctx->ctxid >> 16);
>>>> spin_unlock_irqrestore(&cctx->lock, flags);
>>>>
>>>> kfree(ctx->maps);
>>>> @@ -664,7 +664,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
>>>> spin_unlock_irqrestore(&cctx->lock, flags);
>>>> goto err_idr;
>>>> }
>>>> - ctx->ctxid = ret << 4;
>>>> + ctx->ctxid = ret << 16;
>>>> spin_unlock_irqrestore(&cctx->lock, flags);
>>>>
>>>> kref_init(&ctx->refcount);
>>>> @@ -2675,7 +2675,7 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
>>>> if (len < sizeof(*rsp))
>>>> return -EINVAL;
>>>>
>>>> - ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 4);
>>>> + ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 16);
>>>>
>>>> spin_lock_irqsave(&cctx->lock, flags);
>>>> ctx = idr_find(&cctx->ctx_idr, ctxid);
>>>> --
>>>> 2.34.1
>>>>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode
2025-02-10 9:05 ` Ekansh Gupta
@ 2025-02-10 10:35 ` Dmitry Baryshkov
0 siblings, 0 replies; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-02-10 10:35 UTC (permalink / raw)
To: Ekansh Gupta
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd
On Mon, 10 Feb 2025 at 11:05, Ekansh Gupta <quic_ekangupt@quicinc.com> wrote:
>
>
>
>
> On 1/29/2025 4:53 PM, Dmitry Baryshkov wrote:
> > On Wed, Jan 29, 2025 at 11:13:05AM +0530, Ekansh Gupta wrote:
> >>
> >>
> >> On 1/29/2025 5:00 AM, Dmitry Baryshkov wrote:
> >>> On Mon, Jan 27, 2025 at 10:12:39AM +0530, Ekansh Gupta wrote:
> >>>> DSP needs the last 4 bits of the context id to be 0 for polling mode to be
> >>>> supported, as setting the last 8 bits is intended for async mode (not yet
> >>>> supported in the upstream driver) and setting these bits restricts
> >>>> writing to the poll memory from the DSP. Modify the context id mask to
> >>>> ensure polling mode is supported.
> >>> Shouldn't this commit come before the previous one?
> >> Yes, I'll change the order in next series.
> >>
> >> Thanks for reviewing the changes.
> > Please consider asking somebody for the internal review before sending
> > patches. This should help you to catch such mistakes.
> >
> > Next, I keep on asking for a sensible userspace and testing suite. No,
> > existing fastrpc doesn't pass that criteria. We have discussed that, but
> > I see no changes in the development. The PR that you've linked in the
> > cover letter contains a single commit, covering documentation, new
> > IOCTL, CRC support, poll mode support, etc. What if the discussion ends
> > up accepting the CRC support but declining the polling mode? Or vice
> > versa, accepting poll mode but declining the CRC support? There is no
> > easy way for us to review userspace impact, corresponding changes, etc.
>
> We are working with our Legal team to push HexagonSDK publicly , that will
> have sample apps for all features supported by upstream driver and can be used
> for testing.
>
> I'll break down the PR to multiple meaningful commits based on the features
> that are getting added.
>
> >
> > Last, but not least: I want to bring up Sima's response in one of the
> > earlier discussions ([1]): "Yeah, if fastrpc just keeps growing the
> > story will completely different."
> >
> > You are adding a new IOCTL and a new invocation API. That totally sounds
> > like "keeps growing", which returns us back to the uAPI question,
> > drm_accel.h and the rest of the questions on the userspace, compiler,
> > etc.
> >
> > [1] https://lore.kernel.org/dri-devel/Znk87-xCx8f3fIUL@phenom.ffwll.local/
>
> Currently, we are upstreaming the features supported on DSP for publicly
> available platforms. No features for future platforms are planned for FastRPC
> driver.
>
> We are also looking in having the driver under drivers/accel for any new features
> that are planned in future platforms.
Given that there is a single driver supporting all platforms, I
don't think the distinction between new and old platforms makes any sense.
The message from Sima was about growing the driver / uAPI. From my
point of view, adding a new IOCTL points out active driver development
(from the upstream kernel point of view).
Please talk to your only upstream users - libssc / hexagonrpcd
developers. It should be them reviewing your uAPI changes, not just
me.
And yes, from my side, the question would be the same: if you are
adding a new uAPI, why is it not following drm_accel.h? "It has been
done this way ages ago" isn't an answer for _new_ IOCTLs.
>
> --ekansh
>
> >
> >
> >> --ekansh
> >>
> >>>> Signed-off-by: Ekansh Gupta <quic_ekangupt@quicinc.com>
> >>>> ---
> >>>> drivers/misc/fastrpc.c | 8 ++++----
> >>>> 1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/misc/fastrpc.c b/drivers/misc/fastrpc.c
> >>>> index 257a741af115..ef56c793c564 100644
> >>>> --- a/drivers/misc/fastrpc.c
> >>>> +++ b/drivers/misc/fastrpc.c
> >>>> @@ -40,7 +40,7 @@
> >>>> #define FASTRPC_INIT_HANDLE 1
> >>>> #define FASTRPC_DSP_UTILITIES_HANDLE 2
> >>>> #define FASTRPC_MAX_STATIC_HANDLE (20)
> >>>> -#define FASTRPC_CTXID_MASK (0xFF0)
> >>>> +#define FASTRPC_CTXID_MASK (0xFF0000)
> >>>> #define INIT_FILELEN_MAX (2 * 1024 * 1024)
> >>>> #define INIT_FILE_NAMELEN_MAX (128)
> >>>> #define FASTRPC_DEVICE_NAME "fastrpc"
> >>>> @@ -524,7 +524,7 @@ static void fastrpc_context_free(struct kref *ref)
> >>>> fastrpc_buf_free(ctx->buf);
> >>>>
> >>>> spin_lock_irqsave(&cctx->lock, flags);
> >>>> - idr_remove(&cctx->ctx_idr, ctx->ctxid >> 4);
> >>>> + idr_remove(&cctx->ctx_idr, ctx->ctxid >> 16);
> >>>> spin_unlock_irqrestore(&cctx->lock, flags);
> >>>>
> >>>> kfree(ctx->maps);
> >>>> @@ -664,7 +664,7 @@ static struct fastrpc_invoke_ctx *fastrpc_context_alloc(
> >>>> spin_unlock_irqrestore(&cctx->lock, flags);
> >>>> goto err_idr;
> >>>> }
> >>>> - ctx->ctxid = ret << 4;
> >>>> + ctx->ctxid = ret << 16;
> >>>> spin_unlock_irqrestore(&cctx->lock, flags);
> >>>>
> >>>> kref_init(&ctx->refcount);
> >>>> @@ -2675,7 +2675,7 @@ static int fastrpc_rpmsg_callback(struct rpmsg_device *rpdev, void *data,
> >>>> if (len < sizeof(*rsp))
> >>>> return -EINVAL;
> >>>>
> >>>> - ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 4);
> >>>> + ctxid = ((rsp->ctx & FASTRPC_CTXID_MASK) >> 16);
> >>>>
> >>>> spin_lock_irqsave(&cctx->lock, flags);
> >>>> ctx = idr_find(&cctx->ctx_idr, ctxid);
> >>>> --
> >>>> 2.34.1
> >>>>
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-01-29 10:40 ` Dmitry Baryshkov
@ 2025-03-20 13:49 ` Ekansh Gupta
2025-03-20 14:15 ` Dmitry Baryshkov
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-03-20 13:49 UTC (permalink / raw)
To: Dmitry Baryshkov, Dmitry Baryshkov
Cc: srinivas.kandagatla, linux-arm-msm, gregkh, quic_bkumar,
linux-kernel, quic_chennak, dri-devel, arnd, ekansh.gupta
On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>
>>
>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>> For any remote call to the DSP, after sending an invocation message,
>>>> the fastRPC driver waits for a glink response, and during this time the
>>>> CPU can go into low power modes. Add polling mode support,
>>>> with which the fastRPC driver will poll continuously on a memory
>>>> location after sending a message to the remote subsystem, eliminating
>>>> CPU wakeup and scheduling latencies and reducing fastRPC overhead.
>>>> With this change, the DSP always sends a glink response, which is
>>>> ignored if polling mode did not time out.
>>> Is there a chance to implement actual async I/O protocol with the help
>>> of the poll() call instead of hiding the polling / wait inside the
>>> invoke2?
>> This design is based on the implementation on DSP firmware as of today:
>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>
>> Can you please give some reference to the async I/O protocol that you've
>> suggested? I can check if it can be implemented here.
> As with the typical poll() call implementation:
> - write some data using ioctl
> - call poll() / select() to wait for the data to be processed
> - read data using another ioctl
>
> > Getting back to your patch: from your commit message it is not clear,
> which SoCs support this feature. Reminding you that we are supporting
> all kinds of platforms, including the ones that are EoLed by Qualcomm.
>
> Next, you wrote that in-driver polling eliminates CPU wakeup and
> scheduling. However this should also increase power consumption. Is
> there any measurable difference in the latencies, granted that you
> already use ioctl() syscall, as such there will be two context switches.
> What is the actual impact?
Hi Dmitry,
Thank you for your feedback.
I'm currently reworking this change and adding testing details. Regarding the SoC
support, I'll add all the necessary information. For now, with in-driver
polling, we are seeing significant performance improvements for calls
with different-sized buffers. On a platform that supports polling, I've observed an
~80us improvement in latency. You can find more details in the test
results here:
https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
Regarding your concerns about power consumption, while in-driver polling
eliminates CPU wakeup and scheduling, it does increase power consumption.
However, the performance gains seem to outweigh this increase.
Do you think the poll implementation that you suggested above could provide similar
improvements?
Thanks,
Ekansh
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-03-20 13:49 ` Ekansh Gupta
@ 2025-03-20 14:15 ` Dmitry Baryshkov
2025-03-20 15:57 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-03-20 14:15 UTC (permalink / raw)
To: Ekansh Gupta
Cc: Dmitry Baryshkov, srinivas.kandagatla, linux-arm-msm, gregkh,
quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>
>
> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
> > On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
> >>
> >>
> >> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
> >>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
> >>>> For any remote call to DSP, after sending an invocation message,
> >>>> fastRPC driver waits for glink response and during this time the
> >>>> CPU can go into low power modes. Adding a polling mode support
> >>>> with which fastRPC driver will poll continuously on a memory
> >>>> after sending a message to remote subsystem which will eliminate
> >>>> CPU wakeup and scheduling latencies and reduce fastRPC overhead.
> >>>> With this change, DSP always sends a glink response which will
> >>>> get ignored if polling mode didn't time out.
> >>> Is there a chance to implement actual async I/O protocol with the help
> >>> of the poll() call instead of hiding the polling / wait inside the
> >>> invoke2?
> >> This design is based on the implementation on DSP firmware as of today:
> >> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
> >>
> >> Can you please give some reference to the async I/O protocol that you've
> >> suggested? I can check if it can be implemented here.
> > As with the typical poll() call implementation:
> > - write some data using ioctl
> > - call poll() / select() to wait for the data to be processed
> > - read data using another ioctl
> >
> >>>> Getting back to your patch: from your commit message it is not clear,
> > which SoCs support this feature. Reminding you that we are supporting
> > all kinds of platforms, including the ones that are EoLed by Qualcomm.
> >
> > Next, you wrote that in-driver polling eliminates CPU wakeup and
> > scheduling. However this should also increase power consumption. Is
> > there any measurable difference in the latencies, granted that you
> > already use ioctl() syscall, as such there will be two context switches.
> > What is the actual impact?
>
> Hi Dmitry,
>
> Thank you for your feedback.
>
> I'm currently reworking this change and adding testing details. Regarding the SoC
> support, I'll add all the necessary information.
Please make sure that both the kernel and the userspace can handle the
'non-supported' case properly.
> For now, with in-driver
> polling, we are seeing significant performance improvements for calls
> with different sized buffers. On polling supporting platform, I've observed an
> ~80us improvement in latency. You can find more details in the test
> results here:
> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
Does the improvement come from the CPU not going into idle, or from the
glink response processing?
> Regarding your concerns about power consumption, while in-driver polling
> eliminates CPU wakeup and scheduling, it does increase power consumption.
> However, the performance gains seem to outweigh this increase.
>
> Do you think the poll implementation that you suggested above could provide similar
> improvements?
No, I agree here. I was more focused on userspace polling rather
than hw polling.
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-03-20 14:15 ` Dmitry Baryshkov
@ 2025-03-20 15:57 ` Ekansh Gupta
2025-03-21 10:18 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-03-20 15:57 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Dmitry Baryshkov, srinivas.kandagatla, linux-arm-msm, gregkh,
quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>
>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>
>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>> For any remote call to DSP, after sending an invocation message,
>>>>>> fastRPC driver waits for glink response and during this time the
>>>>>> CPU can go into low power modes. Adding a polling mode support
>>>>>> with which fastRPC driver will poll continuously on a memory
>>>>>> after sending a message to remote subsystem which will eliminate
>>>>>> CPU wakeup and scheduling latencies and reduce fastRPC overhead.
>>>>>> With this change, DSP always sends a glink response which will
>>>>>> get ignored if polling mode didn't time out.
>>>>> Is there a chance to implement actual async I/O protocol with the help
>>>>> of the poll() call instead of hiding the polling / wait inside the
>>>>> invoke2?
>>>> This design is based on the implementation on DSP firmware as of today:
>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>
>>>> Can you please give some reference to the async I/O protocol that you've
>>>> suggested? I can check if it can be implemented here.
>>> As with the typical poll() call implementation:
>>> - write some data using ioctl
>>> - call poll() / select() to wait for the data to be processed
>>> - read data using another ioctl
>>>
>>> Getting back to your patch: from your commit message it is not clear,
>>> which SoCs support this feature. Reminding you that we are supporting
>>> all kinds of platforms, including the ones that are EoLed by Qualcomm.
>>>
>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
>>> scheduling. However this should also increase power consumption. Is
>>> there any measurable difference in the latencies, granted that you
>>> already use ioctl() syscall, as such there will be two context switches.
>>> What is the actual impact?
>> Hi Dmitry,
>>
>> Thank you for your feedback.
>>
>> I'm currently reworking this change and adding testing details. Regarding the SoC
>> support, I'll add all the necessary information.
> Please make sure that both the kernel and the userspace can handle the
> 'non-supported' case properly.
Yes, I will include changes to handle this in both userspace and the kernel.
>
>> For now, with in-driver
>> polling, we are seeing significant performance improvements for calls
>> with different sized buffers. On polling supporting platform, I've observed an
>> ~80us improvement in latency. You can find more details in the test
>> results here:
>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
> Does the improvement come from the CPU not going into idle, or from the
> glink response processing?
Although both contribute to the performance improvement, the major
improvement comes from the CPU not going into the idle state.
Thanks,
Ekansh
>
>> Regarding your concerns about power consumption, while in-driver polling
>> eliminates CPU wakeup and scheduling, it does increase power consumption.
>> However, the performance gains seem to outweigh this increase.
>>
>> Do you think the poll implementation that you suggested above could provide similar
>> improvements?
> No, I agree here. I was more concentrated on userspace polling rather
> than hw polling.
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-03-20 15:57 ` Ekansh Gupta
@ 2025-03-21 10:18 ` Ekansh Gupta
2025-03-21 11:17 ` Dmitry Baryshkov
0 siblings, 1 reply; 25+ messages in thread
From: Ekansh Gupta @ 2025-03-21 10:18 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Dmitry Baryshkov, srinivas.kandagatla, linux-arm-msm, gregkh,
quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
On 3/20/2025 9:27 PM, Ekansh Gupta wrote:
>
> On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
>> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>>> For any remote call to DSP, after sending an invocation message,
>>>>>>> fastRPC driver waits for glink response and during this time the
>>>>>>> CPU can go into low power modes. Adding a polling mode support
>>>>>>> with which fastRPC driver will poll continuously on a memory
>>>>>>> after sending a message to remote subsystem which will eliminate
>>>>>>> CPU wakeup and scheduling latencies and reduce fastRPC overhead.
>>>>>>> With this change, DSP always sends a glink response which will
>>>>>>> get ignored if polling mode didn't time out.
>>>>>> Is there a chance to implement actual async I/O protocol with the help
>>>>>> of the poll() call instead of hiding the polling / wait inside the
>>>>>> invoke2?
>>>>> This design is based on the implementation on DSP firmware as of today:
>>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>>
>>>>> Can you please give some reference to the async I/O protocol that you've
>>>>> suggested? I can check if it can be implemented here.
>>>> As with the typical poll() call implementation:
>>>> - write some data using ioctl
>>>> - call poll() / select() to wait for the data to be processed
>>>> - read data using another ioctl
>>>>
> >>>> Getting back to your patch: from your commit message it is not clear,
>>>> which SoCs support this feature. Reminding you that we are supporting
>>>> all kinds of platforms, including the ones that are EoLed by Qualcomm.
>>>>
>>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
>>>> scheduling. However this should also increase power consumption. Is
>>>> there any measurable difference in the latencies, granted that you
>>>> already use ioctl() syscall, as such there will be two context switches.
>>>> What is the actual impact?
>>> Hi Dmitry,
>>>
>>> Thank you for your feedback.
>>>
>>> I'm currently reworking this change and adding testing details. Regarding the SoC
>>> support, I'll add all the necessary information.
>> Please make sure that both the kernel and the userspace can handle the
>> 'non-supported' case properly.
> Yes, I will include changes to handle in both userspace and kernel.
I am seeking additional suggestions on handling "non-supported" cases before making the
changes.
Userspace: To enable DSP side polling, a remote call is made as defined in the DSP image.
If this call fails, polling mode will not be enabled from userspace.
Kernel: Since this is a DSP-specific feature, I plan to add a devicetree property, such
as "qcom,polling-supported," under the fastrpc node if the DSP supports polling mode.
Does this approach seem appropriate, or is there a better way to handle this?
Thanks,
Ekansh
>
>>> For now, with in-driver
>>> polling, we are seeing significant performance improvements for calls
>>> with different sized buffers. On polling supporting platform, I've observed an
>>> ~80us improvement in latency. You can find more details in the test
>>> results here:
>>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
>> Does the improvement come from the CPU not going into idle, or from the
>> glink response processing?
> Although both are contributing to performance improvement, the major
> improvement is coming from CPU not going to idle state.
>
> Thanks,
> Ekansh
>
>>> Regarding your concerns about power consumption, while in-driver polling
>>> eliminates CPU wakeup and scheduling, it does increase power consumption.
>>> However, the performance gains seem to outweigh this increase.
>>>
>>> Do you think the poll implementation that you suggested above could provide similar
>>> improvements?
>> No, I agree here. I was more concentrated on userspace polling rather
>> than hw polling.
>>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-03-21 10:18 ` Ekansh Gupta
@ 2025-03-21 11:17 ` Dmitry Baryshkov
2025-03-21 13:13 ` Ekansh Gupta
0 siblings, 1 reply; 25+ messages in thread
From: Dmitry Baryshkov @ 2025-03-21 11:17 UTC (permalink / raw)
To: Ekansh Gupta
Cc: Dmitry Baryshkov, srinivas.kandagatla, linux-arm-msm, gregkh,
quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
On Fri, 21 Mar 2025 at 12:18, Ekansh Gupta
<ekansh.gupta@oss.qualcomm.com> wrote:
>
>
>
> On 3/20/2025 9:27 PM, Ekansh Gupta wrote:
> >
> > On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
> >> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
> >>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
> >>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
> >>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
> >>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
> >>>>>>> For any remote call to DSP, after sending an invocation message,
> >>>>>>> fastRPC driver waits for glink response and during this time the
> >>>>>>> CPU can go into low power modes. Adding a polling mode support
> >>>>>>> with which fastRPC driver will poll continuously on a memory
> >>>>>>> after sending a message to remote subsystem which will eliminate
> >>>>>>> CPU wakeup and scheduling latencies and reduce fastRPC overhead.
> >>>>>>> With this change, DSP always sends a glink response which will
> >>>>>>> get ignored if polling mode didn't time out.
> >>>>>> Is there a chance to implement actual async I/O protocol with the help
> >>>>>> of the poll() call instead of hiding the polling / wait inside the
> >>>>>> invoke2?
> >>>>> This design is based on the implementation on DSP firmware as of today:
> >>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
> >>>>>
> >>>>> Can you please give some reference to the async I/O protocol that you've
> >>>>> suggested? I can check if it can be implemented here.
> >>>> As with the typical poll() call implementation:
> >>>> - write some data using ioctl
> >>>> - call poll() / select() to wait for the data to be processed
> >>>> - read data using another ioctl
> >>>>
> >>>> Getting back to your patch: from your commit message it is not clear,
> >>>> which SoCs support this feature. Reminding you that we are supporting
> >>>> all kinds of platforms, including the ones that are EoLed by Qualcomm.
> >>>>
> >>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
> >>>> scheduling. However this should also increase power consumption. Is
> >>>> there any measurable difference in the latencies, granted that you
> >>>> already use ioctl() syscall, as such there will be two context switches.
> >>>> What is the actual impact?
> >>> Hi Dmitry,
> >>>
> >>> Thank you for your feedback.
> >>>
> >>> I'm currently reworking this change and adding testing details. Regarding the SoC
> >>> support, I'll add all the necessary information.
> >> Please make sure that both the kernel and the userspace can handle the
> >> 'non-supported' case properly.
> > Yes, I will include changes to handle in both userspace and kernel.
>
> I am seeking additional suggestions on handling "non-supported" cases before making the
> changes.
>
> Userspace: To enable DSP side polling, a remote call is made as defined in the DSP image.
> If this call fails, polling mode will not be enabled from userspace.
No. Instead, userspace should check with the kernel which capabilities
are supported. Don't perform API calls which are known to fail.
>
> Kernel: Since this is a DSP-specific feature, I plan to add a devicetree property, such
> as "qcom,polling-supported," under the fastrpc node if the DSP supports polling mode.
This doesn't sound like a logical solution. The kernel already knows
the hardware that it is running on. As such, there should be no need
to further describe the hardware in DT. If the DSP firmware can report
its capabilities, use that. If not, extend the schema to add an
SoC-specific compatibility string. As a last resort we can use
of_machine_is_compatible().
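A minimal sketch of that ordering of fallbacks (trust the firmware's self-report first, SoC knowledge second, and expose the result so userspace can query it instead of issuing a call that is known to fail). The capability bit and all names here are made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define FASTRPC_CAP_POLL_MODE (1u << 0)  /* hypothetical capability bit */

/* Decide once, kernel-side, whether polling is usable: prefer what the
 * DSP firmware reports about itself, and only fall back to knowledge
 * about the SoC (e.g. a compatible-string match) when it cannot. */
static uint32_t fastrpc_poll_caps(bool fw_reports_poll, bool soc_known_good)
{
	if (fw_reports_poll || soc_known_good)
		return FASTRPC_CAP_POLL_MODE;
	return 0;  /* userspace queries this and skips enabling poll mode */
}
```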
>
> Does this approach seem appropriate, or is there a better way to handle this?
>
> Thanks,
> Ekansh
>
> >
> >>> For now, with in-driver
> >>> polling, we are seeing significant performance improvements for calls
> >>> with different sized buffers. On polling supporting platform, I've observed an
> >>> ~80us improvement in latency. You can find more details in the test
> >>> results here:
> >>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
> >> Does the improvement come from the CPU not going into idle, or from the
> >> glink response processing?
> > Although both are contributing to performance improvement, the major
> > improvement is coming from CPU not going to idle state.
> >
> > Thanks,
> > Ekansh
> >
> >>> Regarding your concerns about power consumption, while in-driver polling
> >>> eliminates CPU wakeup and scheduling, it does increase power consumption.
> >>> However, the performance gains seem to outweigh this increase.
> >>>
> >>> Do you think the poll implementation that you suggested above could provide similar
> >>> improvements?
> >> No, I agree here. I was more concentrated on userspace polling rather
> >> than hw polling.
> >>
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver
2025-03-21 11:17 ` Dmitry Baryshkov
@ 2025-03-21 13:13 ` Ekansh Gupta
0 siblings, 0 replies; 25+ messages in thread
From: Ekansh Gupta @ 2025-03-21 13:13 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Dmitry Baryshkov, srinivas.kandagatla, linux-arm-msm, gregkh,
quic_bkumar, linux-kernel, quic_chennak, dri-devel, arnd
On 3/21/2025 4:47 PM, Dmitry Baryshkov wrote:
> On Fri, 21 Mar 2025 at 12:18, Ekansh Gupta
> <ekansh.gupta@oss.qualcomm.com> wrote:
>>
>>
>> On 3/20/2025 9:27 PM, Ekansh Gupta wrote:
>>> On 3/20/2025 7:45 PM, Dmitry Baryshkov wrote:
>>>> On Thu, Mar 20, 2025 at 07:19:31PM +0530, Ekansh Gupta wrote:
>>>>> On 1/29/2025 4:10 PM, Dmitry Baryshkov wrote:
>>>>>> On Wed, Jan 29, 2025 at 11:12:16AM +0530, Ekansh Gupta wrote:
>>>>>>> On 1/29/2025 4:59 AM, Dmitry Baryshkov wrote:
>>>>>>>> On Mon, Jan 27, 2025 at 10:12:38AM +0530, Ekansh Gupta wrote:
>>>>>>>>> For any remote call to the DSP, after sending an invocation
>>>>>>>>> message, the fastRPC driver waits for a glink response, and
>>>>>>>>> during this time the CPU can go into low power modes. Add
>>>>>>>>> polling mode support, with which the fastRPC driver polls
>>>>>>>>> continuously on a memory location after sending a message to
>>>>>>>>> the remote subsystem, eliminating CPU wakeup and scheduling
>>>>>>>>> latencies and reducing fastRPC overhead. With this change, the
>>>>>>>>> DSP always sends a glink response, which gets ignored if
>>>>>>>>> polling mode did not time out.
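[The in-driver polling described in the commit message boils down to spinning on a completion word that the DSP writes when the call finishes. The sketch below is an illustrative userspace model of that loop, not the driver code: the completion value and the poll budget are assumptions, and the real driver would also insert a cpu_relax()-style pause in the loop.]

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical value the DSP writes into the shared poll word on completion. */
#define FASTRPC_POLL_DONE 1u

/*
 * Busy-poll on a shared memory word until the remote side writes the
 * completion value or the spin budget expires. Returns 0 when the call
 * completed within the budget, -1 on timeout (the driver would then
 * fall back to waiting for the glink response).
 */
static int poll_for_completion(_Atomic uint32_t *poll_word,
			       unsigned int max_spins)
{
	unsigned int i;

	for (i = 0; i < max_spins; i++)
		if (atomic_load_explicit(poll_word, memory_order_acquire) ==
		    FASTRPC_POLL_DONE)
			return 0;
	return -1;
}
```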
>>>>>>>> Is there a chance to implement actual async I/O protocol with the help
>>>>>>>> of the poll() call instead of hiding the polling / wait inside the
>>>>>>>> invoke2?
>>>>>>> This design is based on the current DSP firmware implementation:
>>>>>>> Call flow: https://github.com/quic-ekangupt/fastrpc/blob/invokev2/Docs/invoke_v2.md#5-polling-mode
>>>>>>>
>>>>>>> Can you please give some reference to the async I/O protocol that you've
>>>>>>> suggested? I can check if it can be implemented here.
>>>>>> As with the typical poll() call implementation:
>>>>>> - write some data using ioctl
>>>>>> - call poll() / select() to wait for the data to be processed
>>>>>> - read data using another ioctl
>>>>>>
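[The submit/poll/fetch pattern outlined above can be sketched with a plain poll() loop. Since no such fastrpc ioctls exist yet, the snippet below is a self-contained illustration that uses a pipe in place of the fastrpc device node; the write() stands in for the "submit" ioctl and the read() for the "fetch result" ioctl.]

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait until fd becomes readable, i.e. the previously submitted request
 * has a result ready to be fetched. Returns 1 when ready, 0 on timeout,
 * -1 on error.
 */
static int wait_ready(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/*
 * Demo of the submit -> poll -> read sequence on a pipe. Returns the
 * number of result bytes read, or -1 on failure.
 */
static int demo_submit_poll_read(void)
{
	int fds[2];
	char buf[4];
	int n = -1;

	if (pipe(fds))
		return -1;
	/* "submit": make data available on the other end */
	if (write(fds[1], "ok", 2) == 2 && wait_ready(fds[0], 1000) == 1)
		n = (int)read(fds[0], buf, sizeof(buf)); /* "fetch result" */
	close(fds[0]);
	close(fds[1]);
	return n;
}
```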
>>>>>> Getting back to your patch. from you commit message it is not clear,
>>>>>> which SoCs support this feature. Reminding you that we are supporting
>>>>>> all kinds of platforms, including the ones that are EoLed by Qualcomm.
>>>>>>
>>>>>> Next, you wrote that in-driver polling eliminates CPU wakeup and
>>>>>> scheduling. However this should also increase power consumption. Is
>>>>>> there any measurable difference in the latencies, granted that you
>>>>>> already use ioctl() syscall, as such there will be two context switches.
>>>>>> What is the actual impact?
>>>>> Hi Dmitry,
>>>>>
>>>>> Thank you for your feedback.
>>>>>
>>>>> I'm currently reworking this change and adding testing details. Regarding the SoC
>>>>> support, I'll add all the necessary information.
>>>> Please make sure that both the kernel and the userspace can handle the
>>>> 'non-supported' case properly.
>>> Yes, I will include changes to handle in both userspace and kernel.
>> I am seeking additional suggestions on handling "non-supported" cases before making the
>> changes.
>>
>> Userspace: To enable DSP side polling, a remote call is made as defined in the DSP image.
>> If this call fails, polling mode will not be enabled from userspace.
> No. Instead userspace should check with the kernel, which capabilities
> are supported. Don't perform API calls which knowingly can fail.
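[The mainline fastrpc UAPI already exposes a capability query via FASTRPC_IOCTL_GET_DSP_INFO, so a polling capability could be reported through a new attribute id. The sketch below paraphrases the struct layout from include/uapi/misc/fastrpc.h; the FASTRPC_POLLING_CAP_ID value is a hypothetical placeholder, not an allocated attribute.]

```c
#include <stdint.h>

/*
 * Paraphrase of struct fastrpc_ioctl_capability from
 * include/uapi/misc/fastrpc.h; the kernel fills 'capability' with a
 * non-zero value when the queried attribute is supported.
 */
struct fastrpc_cap_req {
	uint32_t domain;
	uint32_t attribute_id;
	uint32_t capability;
	uint32_t reserved[4];
};

/* Hypothetical attribute id a polling capability could use. */
#define FASTRPC_POLLING_CAP_ID 2u

/* Pure check: did the kernel report the attribute as supported? */
static int cap_supported(const struct fastrpc_cap_req *req)
{
	return req->capability != 0;
}
```

In userspace this would be paired with an ioctl(fd, FASTRPC_IOCTL_GET_DSP_INFO, &req) call before attempting to enable polling, so no remote call that is known to fail ever gets issued.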
>
>> Kernel: Since this is a DSP-specific feature, I plan to add a devicetree property, such
>> as "qcom,polling-supported," under the fastrpc node if the DSP supports polling mode.
> This doesn't sound like a logical solution. The kernel already knows
> the hardware that it is running on. As such, there should be no need
> to further describe the hardware in DT. If the DSP firmware can report
> its capabilities, use that. If not, extend the schema to add an
> SoC-specific compatibility string. As a last resort we can use
> of_machine_is_compatible().
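[The last-resort of_machine_is_compatible() approach amounts to an in-driver allowlist of SoC compatible strings. The helper below models that match on a plain string so the logic can be exercised outside the kernel; the compatible strings are placeholders, since the thread does not say which SoCs actually support polling.]

```c
#include <string.h>

/*
 * Hypothetical allowlist of SoCs whose DSP firmware supports polling
 * mode; placeholder strings, not a statement about real hardware.
 */
static const char *const polling_socs[] = {
	"qcom,example-soc-a",
	"qcom,example-soc-b",
};

/*
 * In the driver this would be a loop calling
 * of_machine_is_compatible(polling_socs[i]); here the allowlist match
 * is done on a caller-supplied compatible string instead.
 */
static int soc_supports_polling(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(polling_socs) / sizeof(polling_socs[0]); i++)
		if (!strcmp(compatible, polling_socs[i]))
			return 1;
	return 0;
}
```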
Thanks for your suggestions. I'll try these out.
--Ekansh
>
>> Does this approach seem appropriate, or is there a better way to handle this?
>>
>> Thanks,
>> Ekansh
>>
>>>>> For now, with in-driver
>>>>> polling, we are seeing significant performance improvements for calls
>>>>> with different-sized buffers. On a platform that supports polling,
>>>>> I've observed a ~80us improvement in latency. You can find more
>>>>> details in the test
>>>>> results here:
>>>>> https://github.com/quic/fastrpc/pull/134/files#diff-7dbc6537cd3ade7fea5766229cf585db585704e02730efd72e7afc9b148e28ed
>>>> Does the improvement come from the CPU not going to idle or from the
>>>> glink response processing?
>>> Although both are contributing to performance improvement, the major
>>> improvement is coming from CPU not going to idle state.
>>>
>>> Thanks,
>>> Ekansh
>>>
>>>>> Regarding your concerns about power consumption, while in-driver polling
>>>>> eliminates CPU wakeup and scheduling, it does increase power consumption.
>>>>> However, the performance gains seem to outweigh this increase.
>>>>>
>>>>> Do you think the poll implementation that you suggested above could provide similar
>>>>> improvements?
>>>> No, I agree here. I was more concentrated on userspace polling rather
>>>> than hw polling.
>>>>
>
end of thread, other threads:[~2025-03-21 13:13 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-27 4:42 [PATCH v2 0/5] misc: fastrpc: Add invokeV2 to support new features Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 1/5] misc: fastrpc: Move fdlist to invoke context structure Ekansh Gupta
2025-01-27 15:37 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 2/5] misc: fastrpc: Introduce context params structure Ekansh Gupta
2025-01-27 15:43 ` Dmitry Baryshkov
2025-01-29 4:57 ` Ekansh Gupta
2025-01-29 10:29 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 3/5] misc: fastrpc: Add CRC support using invokeV2 request Ekansh Gupta
2025-01-29 10:41 ` Dmitry Baryshkov
2025-01-27 4:42 ` [PATCH v2 4/5] misc: fastrpc: Add polling mode support for fastRPC driver Ekansh Gupta
2025-01-28 23:29 ` Dmitry Baryshkov
2025-01-29 5:42 ` Ekansh Gupta
2025-01-29 10:40 ` Dmitry Baryshkov
2025-03-20 13:49 ` Ekansh Gupta
2025-03-20 14:15 ` Dmitry Baryshkov
2025-03-20 15:57 ` Ekansh Gupta
2025-03-21 10:18 ` Ekansh Gupta
2025-03-21 11:17 ` Dmitry Baryshkov
2025-03-21 13:13 ` Ekansh Gupta
2025-01-27 4:42 ` [PATCH v2 5/5] misc: fastrpc: Modify context id mask to support polling mode Ekansh Gupta
2025-01-28 23:30 ` Dmitry Baryshkov
2025-01-29 5:43 ` Ekansh Gupta
2025-01-29 11:23 ` Dmitry Baryshkov
2025-02-10 9:05 ` Ekansh Gupta
2025-02-10 10:35 ` Dmitry Baryshkov