* shared variables between requester and completer threads - a concern
From: Bob Pearson @ 2023-06-15 16:01 UTC
To: Jason Gunthorpe, Zhu Yanjun, linux-rdma@vger.kernel.org
Cc: Hack, Jenny (Ft. Collins)
I am still on a campaign to tighten the screws in the rxe driver. There are a lot of variables that are shared
between the requester task and the completer task (now on work queues) that control resources and error recovery.
There is almost no effort to make sure changes in one thread are visible in the other. The following is a summary:
                               In requester task   In completer task
qp->req.psn                    RW                  R
qp->req.rd_atomic (A)          RW                  W
qp->req.wait_fence             W                   RW
qp->req.need_rd_atomic         W                   RW
qp->req.wait_psn               W                   RW
qp->req.need_retry             RW                  RW
qp->req.wait_for_rnr_timer     RW                  W
These are all ints, except for rd_atomic which is an atomic_t, and all are properly aligned.
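For reference, trimmed down to just the fields in the table (types and layout are approximate; the real struct in rxe_verbs.h has more members), qp->req looks roughly like:

	struct rxe_req_info {
		u32		psn;			/* next PSN the requester will use */
		atomic_t	rd_atomic;		/* credits for outstanding read/atomic requests */
		int		wait_fence;		/* requester paused at a fenced WQE */
		int		need_rd_atomic;		/* requester paused waiting for rd_atomic credits */
		int		wait_psn;		/* requester paused waiting on a PSN */
		int		need_retry;		/* requester should rewind and retry */
		int		wait_for_rnr_timer;	/* requester paused until the RNR timer fires */
		/* ... */
		struct rxe_task	task;			/* the requester's work queue task */
	};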
Several of these are similar to wait_fence:
	// in the requester
	if (rxe_wqe_is_fenced(qp, wqe)) {
		qp->req.wait_fence = 1;
		goto exit;		// exits the requester task
	}
	...
	// in the completer, after completing something
	if (qp->req.wait_fence) {
		qp->req.wait_fence = 0;
		rxe_sched_task(&qp->req.task);
		// requester will run at least once
		// after this
	}
As long as the write and the read actually get executed, this will work eventually because the caches are
coherent. But what if they don't? rxe_sched_task() implies a memory barrier before the requester task
runs again, but the requester doesn't read wait_fence at that point, so the barrier doesn't seem to matter.
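At a minimum, annotating the flag accesses would guarantee that the compiler actually emits them and does not tear, fuse or elide them. Something like this (just a sketch, not a tested patch):

	// requester
	if (rxe_wqe_is_fenced(qp, wqe)) {
		WRITE_ONCE(qp->req.wait_fence, 1);
		goto exit;
	}

	// completer, after completing something
	if (READ_ONCE(qp->req.wait_fence)) {
		WRITE_ONCE(qp->req.wait_fence, 0);
		rxe_sched_task(&qp->req.task);
	}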
There may also be a race between a second execution of the requester re-setting the flag and the completer
clearing it, since someone else (e.g. the verbs API) could also schedule the requester. I think the worst
that can happen here is an extra rescheduling, which is safe.
We could add an explicit memory barrier in the requester, or matched smp_store_release()/smp_load_acquire(),
or a spinlock, or WRITE_ONCE()/READ_ONCE(). I am not sure what, if anything, should be done in this case.
It currently works fine AFAIK on x86/x86-64, but there are other architectures.
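For the wait_fence case, the matched release/acquire pairing would look roughly like this (again only a sketch of the idea, not a proposal):

	// requester: the release orders the flag after any state the
	// completer needs to see (e.g. the fenced WQE itself)
	if (rxe_wqe_is_fenced(qp, wqe)) {
		smp_store_release(&qp->req.wait_fence, 1);
		goto exit;
	}

	// completer: the acquire pairs with the release above, so writes
	// made by the requester before setting the flag are visible here
	// once the flag is observed
	if (smp_load_acquire(&qp->req.wait_fence)) {
		qp->req.wait_fence = 0;
		rxe_sched_task(&qp->req.task);
	}

Whether the extra ordering actually buys anything here, given that rxe_sched_task() ends up going through the work queue code anyway, is exactly what I am unsure about.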
Thanks for your advice.
Bob
* Re: shared variables between requester and completer threads - a concern
From: Zhu Yanjun @ 2023-06-16 5:25 UTC
To: Bob Pearson, Jason Gunthorpe, Zhu Yanjun,
linux-rdma@vger.kernel.org
Cc: Hack, Jenny (Ft. Collins)
On 2023/6/16 0:01, Bob Pearson wrote:
> I am still on a campaign to tighten the screws in the rxe driver. There are a lot of variables that are shared
> between the requester task and the completer task (now on work queues) that control resources and error recovery.
> There is almost no effort to make sure changes in one thread are visible in the other. The following is a summary:
>
>                                In requester task   In completer task
> qp->req.psn                    RW                  R
> qp->req.rd_atomic (A)          RW                  W
> qp->req.wait_fence             W                   RW
> qp->req.need_rd_atomic         W                   RW
> qp->req.wait_psn               W                   RW
> qp->req.need_retry             RW                  RW
> qp->req.wait_for_rnr_timer     RW                  W
>
> These are all int's except for rd_atomic which is an atomic_t and all properly aligned.
> Several of these are similar to wait_fence:
>
> 	// in the requester
> 	if (rxe_wqe_is_fenced(qp, wqe)) {
> 		qp->req.wait_fence = 1;
> 		goto exit;		// exits the requester task
> 	}
> 	...
> 	// in the completer, after completing something
> 	if (qp->req.wait_fence) {
> 		qp->req.wait_fence = 0;
> 		rxe_sched_task(&qp->req.task);
> 		// requester will run at least once
> 		// after this
> 	}
>
> As long as the write and read actually get executed this will work eventually because the caches are
> coherent. But what if they don't? The sched_task implies a memory barrier before the requester task
> runs again but it doesn't read wait_fence so it doesn't seem to matter.
>
> There also may be a race between a second execution of the requester re-setting the flag and the completer
> clearing it since someone else (e.g. verbs API could also schedule the requester.) I think the worst
> that can happen here is an extra rescheduling which is safe.
>
> Could add an explicit memory barrier in the requester or matched smp_store_release/smp_load_acquire,
> or a spinlock, or WRITE_ONCE/READ_ONCE. I am not sure what, if anything, should be done in this case.
> It currently works fine AFAIK on x86/x64 but there are others.
Thanks a lot for your great work, Bob.
I confirm that RXE works well on aarch64 hosts.
On the following host:
"
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: ARM
Model: 0
Model name: Cortex-A57
Stepping: r1p0
BogoMIPS: 125.00
NUMA node0 CPU(s): 0-7
"
I built RXE, loaded/unloaded the rxe module, and ran a simple rping between rxe clients.
It works well.
I am not sure whether rxe works well on other architectures.
If someone can run tests on other architectures, please let us know.
Best Regards,
Zhu Yanjun
>
> Thanks for your advice.
>
> Bob
* Re: shared variables between requester and completer threads - a concern
From: Jason Gunthorpe @ 2023-06-19 12:42 UTC
To: Bob Pearson
Cc: Zhu Yanjun, linux-rdma@vger.kernel.org, Hack, Jenny (Ft. Collins)
On Thu, Jun 15, 2023 at 11:01:38AM -0500, Bob Pearson wrote:
> I am still on a campaign to tighten the screws in the rxe driver. There are a lot of variables that are shared
> between the requester task and the completer task (now on work queues) that control resources and error recovery.
> There is almost no effort to make sure changes in one thread are visible in the other. The following is a summary:
>
>                                In requester task   In completer task
> qp->req.psn                    RW                  R
> qp->req.rd_atomic (A)          RW                  W
> qp->req.wait_fence             W                   RW
> qp->req.need_rd_atomic         W                   RW
> qp->req.wait_psn               W                   RW
> qp->req.need_retry             RW                  RW
> qp->req.wait_for_rnr_timer     RW                  W
>
> These are all int's except for rd_atomic which is an atomic_t and all properly aligned.
> Several of these are similar to wait_fence:
>
> 	// in the requester
> 	if (rxe_wqe_is_fenced(qp, wqe)) {
> 		qp->req.wait_fence = 1;
> 		goto exit;		// exits the requester task
> 	}
> 	...
> 	// in the completer, after completing something
> 	if (qp->req.wait_fence) {
> 		qp->req.wait_fence = 0;
> 		rxe_sched_task(&qp->req.task);
> 		// requester will run at least once
> 		// after this
> 	}
>
> As long as the write and read actually get executed this will work eventually because the caches are
> coherent. But what if they don't? The sched_task implies a memory barrier before the requester task
> runs again but it doesn't read wait_fence so it doesn't seem to matter.
>
> There also may be a race between a second execution of the requester re-setting the flag and the completer
> clearing it since someone else (e.g. verbs API could also schedule the requester.) I think the worst
> that can happen here is an extra rescheduling which is safe.
>
> Could add an explicit memory barrier in the requester or matched smp_store_release/smp_load_acquire,
> or a spinlock, or WRITE_ONCE/READ_ONCE. I am not sure what, if anything, should be done in this case.
> It currently works fine AFAIK on x86/x64 but there are others.
It looks really sketchy.
This is the requestor hitting a fence opcode and needing to pause
processing until the completor reaches the matching barrier? How is
this just not completely racy? Forget about caches and barriers.
Jason