* [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
@ 2022-10-20 15:35 Uros Bizjak
2022-10-23 8:22 ` Sagi Grimberg
` (2 more replies)
0 siblings, 3 replies; 5+ messages in thread
From: Uros Bizjak @ 2022-10-20 15:35 UTC (permalink / raw)
To: linux-nvme, linux-kernel
Cc: Uros Bizjak, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni
Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
nvmet_update_sq_head. x86 CMPXCHG instruction returns success in ZF flag, so
this change saves a compare after cmpxchg (and related move instruction in
front of cmpxchg).
Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg
fails. There is no need to re-read the value in the loop.
Note that the value from *ptr should be read using READ_ONCE to prevent
the compiler from merging, refetching or reordering the read.
No functional change intended.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
---
drivers/nvme/target/core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 14677145bbba..a384a0927aed 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -695,11 +695,10 @@ static void nvmet_update_sq_head(struct nvmet_req *req)
if (req->sq->size) {
u32 old_sqhd, new_sqhd;
+ old_sqhd = READ_ONCE(req->sq->sqhd);
do {
- old_sqhd = req->sq->sqhd;
new_sqhd = (old_sqhd + 1) % req->sq->size;
- } while (cmpxchg(&req->sq->sqhd, old_sqhd, new_sqhd) !=
- old_sqhd);
+ } while (!try_cmpxchg(&req->sq->sqhd, &old_sqhd, new_sqhd));
}
req->cqe->sq_head = cpu_to_le16(req->sq->sqhd & 0x0000FFFF);
}
--
2.37.3
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head 2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak @ 2022-10-23 8:22 ` Sagi Grimberg 2022-10-25 0:47 ` Chaitanya Kulkarni 2022-11-01 9:45 ` Christoph Hellwig 2 siblings, 0 replies; 5+ messages in thread From: Sagi Grimberg @ 2022-10-23 8:22 UTC (permalink / raw) To: Uros Bizjak, linux-nvme, linux-kernel Cc: Christoph Hellwig, Chaitanya Kulkarni Reviewed-by: Sagi Grimberg <sagi@grimberg.me> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head 2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak 2022-10-23 8:22 ` Sagi Grimberg @ 2022-10-25 0:47 ` Chaitanya Kulkarni 2022-10-25 6:48 ` Uros Bizjak 2022-11-01 9:45 ` Christoph Hellwig 2 siblings, 1 reply; 5+ messages in thread From: Chaitanya Kulkarni @ 2022-10-25 0:47 UTC (permalink / raw) To: Uros Bizjak, linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni On 10/20/22 08:35, Uros Bizjak wrote: > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in > nvmet_update_sq_head. x86 CMPXCHG instruction returns success in ZF flag, so > this change saves a compare after cmpxchg (and related move instruction in > front of cmpxchg). > Is it worth a share delts of assembly instructions of the changes above? as developers on block mailing list are sharing the delta between before and after patch including the assembly. I also hope that you have tested this change with blktests nvme. Either way:- Reviewed-by: ChaItanya Kulkarni <kch@nvidia.com> -ck ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head 2022-10-25 0:47 ` Chaitanya Kulkarni @ 2022-10-25 6:48 ` Uros Bizjak 0 siblings, 0 replies; 5+ messages in thread From: Uros Bizjak @ 2022-10-25 6:48 UTC (permalink / raw) To: Chaitanya Kulkarni Cc: linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org, Christoph Hellwig, Sagi Grimberg On Tue, Oct 25, 2022 at 2:47 AM Chaitanya Kulkarni <chaitanyak@nvidia.com> wrote: > > On 10/20/22 08:35, Uros Bizjak wrote: > > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in > > nvmet_update_sq_head. x86 CMPXCHG instruction returns success in ZF flag, so > > this change saves a compare after cmpxchg (and related move instruction in > > front of cmpxchg). > > > > Is it worth a share delts of assembly instructions of the changes above? > as developers on block mailing list are sharing the delta between before > and after patch including the assembly. The difference in the assembly of nvmet_update_sq_head function is: before: 0000000000001d30 <nvmet_update_sq_head>: 1d30: 48 8b 4f 10 mov 0x10(%rdi),%rcx 1d34: 49 89 f8 mov %rdi,%r8 1d37: 0f b7 71 1a movzwl 0x1a(%rcx),%esi 1d3b: 66 85 f6 test %si,%si 1d3e: 75 14 jne 1d54 <nvmet_update_sq_head+0x24> 1d40: 49 8b 40 08 mov 0x8(%r8),%rax 1d44: 8b 51 1c mov 0x1c(%rcx),%edx 1d47: 66 89 50 08 mov %dx,0x8(%rax) 1d4b: e9 00 00 00 00 jmpq 1d50 <nvmet_update_sq_head+0x20> 1d4c: R_X86_64_PLT32 __x86_return_thunk-0x4 1d50: 0f b7 71 1a movzwl 0x1a(%rcx),%esi 1d54: 8b 79 1c mov 0x1c(%rcx),%edi 1d57: 31 d2 xor %edx,%edx 1d59: 8d 47 01 lea 0x1(%rdi),%eax 1d5c: f7 f6 div %esi 1d5e: 89 f8 mov %edi,%eax 1d60: f0 0f b1 51 1c lock cmpxchg %edx,0x1c(%rcx) 1d65: 49 8b 48 10 mov 0x10(%r8),%rcx 1d69: 39 c7 cmp %eax,%edi 1d6b: 75 e3 jne 1d50 <nvmet_update_sq_head+0x20> 1d6d: 49 8b 40 08 mov 0x8(%r8),%rax 1d71: 8b 51 1c mov 0x1c(%rcx),%edx 1d74: 66 89 50 08 mov %dx,0x8(%rax) 1d78: e9 00 00 00 00 jmpq 1d7d <nvmet_update_sq_head+0x4d> 1d79: R_X86_64_PLT32 __x86_return_thunk-0x4 1d7d: after: 0000000000001d30 <nvmet_update_sq_head>: 1d30: 48 8b 4f 10 mov 0x10(%rdi),%rcx 1d34: 0f b7 51 1a movzwl 0x1a(%rcx),%edx 1d38: 66 85 d2 test %dx,%dx 1d3b: 74 1e je 1d5b <nvmet_update_sq_head+0x2b> 1d3d: 8b 71 1c mov 0x1c(%rcx),%esi 1d40: 44 0f b7 c2 movzwl %dx,%r8d 1d44: 8d 46 01 lea 0x1(%rsi),%eax 1d47: 31 d2 xor %edx,%edx 1d49: 41 f7 f0 div %r8d 1d4c: 89 f0 mov %esi,%eax 1d4e: f0 0f b1 51 1c lock cmpxchg %edx,0x1c(%rcx) 1d53: 48 8b 4f 10 mov 0x10(%rdi),%rcx 1d57: 89 c6 mov %eax,%esi 1d59: 75 10 jne 1d6b <nvmet_update_sq_head+0x3b> 1d5b: 48 8b 47 08 mov 0x8(%rdi),%rax 1d5f: 8b 51 1c mov 0x1c(%rcx),%edx 1d62: 66 89 50 08 mov %dx,0x8(%rax) 1d66: e9 00 00 00 00 jmpq 1d6b <nvmet_update_sq_head+0x3b> 1d67: R_X86_64_PLT32 __x86_return_thunk-0x4 1d6b: 0f b7 51 1a movzwl 0x1a(%rcx),%edx 1d6f: eb cf jmp 1d40 <nvmet_update_sq_head+0x10> 1d71: You can see that in addition to the smaller size of the function, the load of req->sq->size at 1d6b got moved to a cold path. As the main benefit, the load at 1d3d is now out of the loop, and the value in %esi is now provided by cmpxchg insn itself at 1d4e (plus move at 1d57). Unfortunately, the division clobbers %eax, so some reg-reg moves are necessary. Note also that the compare at 1d69 is now gone. > I also hope that you have tested this change with blktests nvme. No, I didn't test the patch that thoroughly, but the change is the same as some similar recent changes in the generic code, so I confirmed the patch by inspecting asm code. OTOH, the kernel booted from a NVME SSD. > Either way:- > > Reviewed-by: ChaItanya Kulkarni <kch@nvidia.com> Thanks! Uros. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head 2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak 2022-10-23 8:22 ` Sagi Grimberg 2022-10-25 0:47 ` Chaitanya Kulkarni @ 2022-11-01 9:45 ` Christoph Hellwig 2 siblings, 0 replies; 5+ messages in thread From: Christoph Hellwig @ 2022-11-01 9:45 UTC (permalink / raw) To: Uros Bizjak Cc: linux-nvme, linux-kernel, Christoph Hellwig, Sagi Grimberg, Chaitanya Kulkarni Thanks, applied to nvme-6.2. ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-11-01 9:45 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2022-10-20 15:35 [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head Uros Bizjak 2022-10-23 8:22 ` Sagi Grimberg 2022-10-25 0:47 ` Chaitanya Kulkarni 2022-10-25 6:48 ` Uros Bizjak 2022-11-01 9:45 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox