public inbox for linux-kernel@vger.kernel.org
* [RFC PATCH 1/2] blk-mq: Convert request->csd to call_single_data_t and reposition it
@ 2023-05-11  8:58 Leonardo Bras
  2023-05-11  8:58 ` [RFC PATCH 2/2] smp: Change signatures to use call_single_data_t Leonardo Bras
  2023-05-12 15:01 ` [RFC PATCH 1/2] blk-mq: Convert request->csd to call_single_data_t and reposition it Jens Axboe
  0 siblings, 2 replies; 4+ messages in thread
From: Leonardo Bras @ 2023-05-11  8:58 UTC
  To: Jens Axboe, Peter Zijlstra, Paul E. McKenney, Valentin Schneider,
	Juergen Gross, Yury Norov, Leonardo Bras
  Cc: linux-block, linux-kernel

Currently, request->csd has type struct __call_single_data.

call_single_data_t is defined in include/linux/smp.h :

/* Use __aligned() to avoid to use 2 cache lines for 1 csd */
typedef struct __call_single_data call_single_data_t
	__aligned(sizeof(struct __call_single_data));

As the comment above the typedef suggests, a csd split across 2 cachelines
forces the cpu that receives the request to fetch / invalidate / bounce 2
cachelines instead of 1 when it runs the requested function. This is
usually bad for performance, since it costs one extra memory access and
occupies one extra cacheline.
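
As a rough userspace illustration of the typedef above (not kernel code),
the sketch below applies the same alignment pattern to a mock struct.
struct csd_mock, csd_mock_t and crosses_line() are hypothetical names, the
32-byte size is an assumption matching a 64-bit build, and
__attribute__((aligned())) is the GCC/Clang spelling of __aligned().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct __call_single_data: 32 bytes. */
struct csd_mock {
	uint64_t node[2];	/* stands in for the 16-byte csd->node */
	uint64_t func;		/* stands in for csd->func */
	uint64_t info;		/* stands in for csd->info */
};

static_assert(sizeof(struct csd_mock) == 32, "mock csd should be 32 bytes");

/* Same pattern as the kernel typedef: alignment equals the struct size. */
typedef struct csd_mock csd_mock_t
	__attribute__((aligned(sizeof(struct csd_mock))));

/*
 * Return 1 if an object of 'size' bytes at byte offset 'off' crosses a
 * boundary between cachelines of 'cl' bytes, 0 otherwise. Offsets are
 * counted from a cacheline-aligned base.
 */
static int crosses_line(size_t off, size_t size, size_t cl)
{
	return off / cl != (off + size - 1) / cl;
}
```

With this alignment, every valid offset of a csd_mock_t is a multiple of
32, so for cachelines >= 32 bytes crosses_line(off, 32, cl) is always 0.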

Changing request->csd was previously attempted in commit
966a967116e6 ("smp: Avoid using two cache lines for struct call_single_data")
but at the time the union that contains csd was positioned near the top of
struct request, only below a struct list_head, and this created holes
adding up to 24 extra bytes in struct request.

The struct size was restored back to normal by
commit 4ccafe032005 ("block: unalign call_single_data in struct request")
but it allowed the csd to be split across 2 cachelines again.

As an example, on a 64-bit machine with
CONFIG_BLK_RQ_ALLOC_TIME=y
CONFIG_BLK_WBT=y
CONFIG_BLK_DEV_INTEGRITY=y
CONFIG_BLK_INLINE_ENCRYPTION=y

pahole outputs:
struct request {
[...]
	union {
		struct __call_single_data csd;           /*   240    32 */
		u64                fifo_time;            /*   240     8 */
	};                                               /*   240    32 */
[...]
}

With this config and any cacheline size between 32 and 256 bytes, csd is
split between 2 cachelines: csd->node (16 bytes) in the first
cacheline, and csd->func (8 bytes) & csd->info (8 bytes) in the second.
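
The arithmetic behind this can be checked with a small helper. This is only
a sketch: lines_spanned() is a hypothetical name, and it assumes struct
request itself starts on a cacheline boundary, so that the pahole offset
(240) can be used directly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of cachelines of 'cl' bytes touched by an object of 'size' bytes
 * placed at byte offset 'off' from a cacheline-aligned base.
 */
static size_t lines_spanned(size_t off, size_t size, size_t cl)
{
	assert(size > 0 && cl > 0);
	return (off + size - 1) / cl - off / cl + 1;
}
```

For csd at offset 240 with size 32, the object covers bytes 240..271, which
straddles byte 256. Since 256 is a cacheline boundary for every cacheline
size of 32, 64, 128 and 256 bytes, lines_spanned(240, 32, cl) is 2 in all
of those cases.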

During blk_mq_complete_send_ipi(), csd->func and csd->info are written,
and when it calls __smp_call_single_queue(), csd->node is written as
well.

On the cpu which got the request, csd->func and csd->info are read by
__flush_smp_call_function_queue() and csd->node is modified by
csd_unlock(), meaning both cachelines containing csd get accessed.

To avoid this, it would be necessary to change request->csd back to
call_single_data_t, which may end up increasing the struct size.
(In the above example, it grew from 288 to 320 bytes, a 32-byte increase.)

In order to keep the call_single_data_t type and avoid increasing the
struct's size, move request->csd to the end of the struct.
The rationale for this strategy is that for cachelines >= 32 bytes,
struct request will never use an extra cacheline:

- If request->csd is 32-byte aligned, there is no change in the object.
- If request->csd is not 32-byte aligned, and part of it is in a different
  cacheline, the whole csd is moved to that cacheline.
- If request->csd is not 32-byte aligned, but it's all contained in the
  same cacheline (> 32 bytes), aligning it to 32 will just put it a few
  bytes forward in this cacheline.

(In the above example, the change kept the struct's size at 288 bytes.)
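
The size effect can be reproduced in userspace with a mock layout. This is
a sketch under stated assumptions, not the real struct request: head[30] is
a hypothetical placeholder for the 240 bytes that precede csd in the pahole
output above, and struct csd_mock, csd_mock_t and the rq_* structs are
made-up names. It relies on the GCC/Clang aligned attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte stand-in for struct __call_single_data. */
struct csd_mock {
	uint64_t node[2];
	uint64_t func;
	uint64_t info;
};

/* Aligned variant, mirroring the call_single_data_t pattern. */
typedef struct csd_mock csd_mock_t
	__attribute__((aligned(sizeof(struct csd_mock))));

/* Current layout: unaligned csd at offset 240, followed by two pointers. */
struct rq_unaligned {
	uint64_t head[30];	/* placeholder for the first 240 bytes */
	struct csd_mock csd;
	uint64_t end_io;
	uint64_t end_io_data;
};

/* Aligned csd left in place: padded up to offset 256, tail padded too. */
struct rq_mid {
	uint64_t head[30];
	csd_mock_t csd;
	uint64_t end_io;
	uint64_t end_io_data;
};

/* Aligned csd moved to the end: the two pointers fill the former hole. */
struct rq_end {
	uint64_t head[30];
	uint64_t end_io;
	uint64_t end_io_data;
	csd_mock_t csd;
};

static_assert(sizeof(struct rq_unaligned) == 288, "unaligned layout");
static_assert(sizeof(struct rq_mid) == 320, "aligned csd in the middle");
static_assert(sizeof(struct rq_end) == 288, "aligned csd at the end");
```

In this mock, the aligned csd sits at offset 256 in both rq_mid and rq_end;
only rq_mid pays for it, with 16 bytes of padding before csd and 16 bytes
of tail padding.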

Convert request->csd to call_single_data_t and move it to the end of
struct request, so csd is never split between cachelines and does not use
any extra cachelines.

Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/linux/blk-mq.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 06caacd77ed6..50ef86172621 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -189,16 +189,16 @@ struct request {
 		} flush;
 	};
 
-	union {
-		struct __call_single_data csd;
-		u64 fifo_time;
-	};
-
 	/*
 	 * completion callback.
 	 */
 	rq_end_io_fn *end_io;
 	void *end_io_data;
+
+	union {
+		call_single_data_t csd;
+		u64 fifo_time;
+	};
 };
 
 static inline enum req_op req_op(const struct request *req)
-- 
2.40.1


Thread overview: 4+ messages
2023-05-11  8:58 [RFC PATCH 1/2] blk-mq: Convert request->csd to call_single_data_t and reposition it Leonardo Bras
2023-05-11  8:58 ` [RFC PATCH 2/2] smp: Change signatures to use call_single_data_t Leonardo Bras
2023-05-12 15:01 ` [RFC PATCH 1/2] blk-mq: Convert request->csd to call_single_data_t and reposition it Jens Axboe
2023-05-15 20:15   ` Leonardo Brás