* [PATCH 0/7] Set/get accessor speedups
@ 2025-07-01 17:23 David Sterba
2025-07-01 17:23 ` [PATCH 1/7] btrfs: accessors: simplify folio bounds checks David Sterba
` (7 more replies)
0 siblings, 8 replies; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
Followup to [1] bringing refactoring that allows compiler to do more
optimizations and stack usage reduction. As the set/get helpers are
heavily used in many functions this improves performance. I will provide
some numbers later as I've done mostly functional tests. During the
development I've identified a few more possible optimizations to improve
performance, but this series is OK as is.
Overall effects of this series:
Stack:
btrfs_set_16 -72 (88 -> 16)
btrfs_get_32 -56 (80 -> 24)
btrfs_set_8 -72 (88 -> 16)
btrfs_set_64 -64 (88 -> 24)
btrfs_get_8 -72 (80 -> 8)
btrfs_get_16 -64 (80 -> 16)
btrfs_set_32 -64 (88 -> 24)
btrfs_get_64 -56 (80 -> 24)
NEW (48):
report_setget_bounds 48
LOST/NEW DELTA: +48
PRE/POST DELTA: -472
Code:
text data bss dec hex filename
1456601 115665 16088 1588354 183c82 pre/btrfs.ko
1454229 115665 16088 1585982 18333e post/btrfs.ko
DELTA: -2372
[1] https://lore.kernel.org/linux-btrfs/cover.1751032655.git.dsterba@suse.com/
David Sterba (7):
btrfs: accessors: simplify folio bounds checks
btrfs: accessors: use type sizeof constants directly
btrfs: accessors: inline eb bounds check and factor out the error
report
btrfs: accessors: compile-time fast path for u8
btrfs: accessors: compile-time fast path for u16
btrfs: accessors: set target address at initialization
btrfs: accessors: factor out split memcpy with two sources
fs/btrfs/accessors.c | 84 +++++++++++++++++++++++++++-----------------
1 file changed, 51 insertions(+), 33 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/7] btrfs: accessors: simplify folio bounds checks
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-01 17:23 ` [PATCH 2/7] btrfs: accessors: use type sizeof constants directly David Sterba
` (6 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
As we can have non-contiguous range in the eb->folios, any item can be
straddling two folios and we need to check if it can be read in one go
or in two parts. For that there's a check which is not implemented in
the simplest way:
offset in folio + size <= folio size
With a simple expression transformation:
oil + size <= unit_size
size <= unit_size - oil
sizeof() <= part
this can be simplified and reusing existing run-time or compile-time
constants.
Add likely() annotation for this expression as this is the fast path and
compiler sometimes reorders that after the followup block with the
memcpy (observed in practice with other simplifications).
Overall effect on stack consumption:
btrfs_get_8 -8 (80 -> 72)
btrfs_set_8 -8 (88 -> 80)
And .ko size (due to optimizations making use of the direct constants):
text data bss dec hex filename
1456601 115665 16088 1588354 183c82 pre/btrfs.ko
1456093 115665 16088 1587846 183a86 post/btrfs.ko
DELTA: -508
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index 5cfb0801700e6c..b54c8abe467a06 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -59,7 +59,7 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
u8 lebytes[sizeof(u##bits)]; \
\
ASSERT(check_setget_bounds(eb, ptr, off, size)); \
- if (INLINE_EXTENT_BUFFER_PAGES == 1 || oil + size <= unit_size) \
+ if (INLINE_EXTENT_BUFFER_PAGES == 1 || likely(sizeof(u##bits) <= part)) \
return get_unaligned_le##bits(kaddr + oil); \
\
memcpy(lebytes, kaddr + oil, part); \
@@ -82,7 +82,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
\
ASSERT(check_setget_bounds(eb, ptr, off, size)); \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || \
- oil + size <= unit_size) { \
+ likely(sizeof(u##bits) <= part)) { \
put_unaligned_le##bits(val, kaddr + oil); \
return; \
} \
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/7] btrfs: accessors: use type sizeof constants directly
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
2025-07-01 17:23 ` [PATCH 1/7] btrfs: accessors: simplify folio bounds checks David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-02 17:58 ` Boris Burkov
2025-07-01 17:23 ` [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report David Sterba
` (5 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
Now unit_size is used only once, so use it directly in 'part'
calculation. Don't cache sizeof(type) in a variable. While this is a
compile-time constant, forcing the type 'int' generates worse code as it
leads to additional conversion from 32 to 64 bit type on x86_64.
The sizeof() is used only a few times and it does not make the code that
harder to read, so use it directly and let the compiler utilize the
immediate constants in the context it needs. The .ko code size slightly
increases (+50) but further patches will reduce that again.
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index b54c8abe467a06..2e90b9b14e73f4 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -52,19 +52,17 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
const unsigned long idx = get_eb_folio_index(eb, member_offset);\
const unsigned long oil = get_eb_offset_in_folio(eb, \
member_offset);\
- const int unit_size = eb->folio_size; \
char *kaddr = folio_address(eb->folios[idx]); \
- const int size = sizeof(u##bits); \
- const int part = unit_size - oil; \
+ const int part = eb->folio_size - oil; \
u8 lebytes[sizeof(u##bits)]; \
\
- ASSERT(check_setget_bounds(eb, ptr, off, size)); \
+ ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || likely(sizeof(u##bits) <= part)) \
return get_unaligned_le##bits(kaddr + oil); \
\
memcpy(lebytes, kaddr + oil, part); \
kaddr = folio_address(eb->folios[idx + 1]); \
- memcpy(lebytes + part, kaddr, size - part); \
+ memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
return get_unaligned_le##bits(lebytes); \
} \
void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
@@ -74,13 +72,11 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
const unsigned long idx = get_eb_folio_index(eb, member_offset);\
const unsigned long oil = get_eb_offset_in_folio(eb, \
member_offset);\
- const int unit_size = eb->folio_size; \
char *kaddr = folio_address(eb->folios[idx]); \
- const int size = sizeof(u##bits); \
- const int part = unit_size - oil; \
+ const int part = eb->folio_size - oil; \
u8 lebytes[sizeof(u##bits)]; \
\
- ASSERT(check_setget_bounds(eb, ptr, off, size)); \
+ ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || \
likely(sizeof(u##bits) <= part)) { \
put_unaligned_le##bits(val, kaddr + oil); \
@@ -90,7 +86,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
put_unaligned_le##bits(val, lebytes); \
memcpy(kaddr + oil, lebytes, part); \
kaddr = folio_address(eb->folios[idx + 1]); \
- memcpy(kaddr, lebytes + part, size - part); \
+ memcpy(kaddr, lebytes + part, sizeof(u##bits) - part); \
}
DEFINE_BTRFS_SETGET_BITS(8)
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
2025-07-01 17:23 ` [PATCH 1/7] btrfs: accessors: simplify folio bounds checks David Sterba
2025-07-01 17:23 ` [PATCH 2/7] btrfs: accessors: use type sizeof constants directly David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-02 18:00 ` Boris Burkov
2025-07-01 17:23 ` [PATCH 4/7] btrfs: accessors: compile-time fast path for u8 David Sterba
` (4 subsequent siblings)
7 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
There's a check in each set/get helper if the requested range is within
extent buffer bounds, and if it's not then report it. This was in an
ASSERT statement so with CONFIG_BTRFS_ASSERT this crashes right away, on
other configs this is only reported but reading out of the bounds is
done anyway. There are currently no known reports of this particular
condition failing.
There are some drawbacks though. The behaviour dependence on the
assertions being compiled in or not and a less visible effect of
inlining report_setget_bounds() into each helper.
As the bounds check is expected to succeed almost always it's ok to
inline it but make the report a function and move it out of the helper
completely (__cold puts it to a different section). This also skips
reading/writing the requested range in case it fails.
This improves stack usage significantly:
btrfs_get_16 -48 (80 -> 32)
btrfs_get_32 -48 (80 -> 32)
btrfs_get_64 -48 (80 -> 32)
btrfs_get_8 -48 (72 -> 24)
btrfs_set_16 -56 (88 -> 32)
btrfs_set_32 -56 (88 -> 32)
btrfs_set_64 -56 (88 -> 32)
btrfs_set_8 -48 (80 -> 32)
NEW (48):
report_setget_bounds 48
LOST/NEW DELTA: +48
PRE/POST DELTA: -360
Same as .ko size:
text data bss dec hex filename
1456079 115665 16088 1587832 183a78 pre/btrfs.ko
1454951 115665 16088 1586704 183610 post/btrfs.ko
DELTA: -1128
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index 2e90b9b14e73f4..a7b6b2d7bde224 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -9,20 +9,15 @@
#include "fs.h"
#include "accessors.h"
-static bool check_setget_bounds(const struct extent_buffer *eb,
- const void *ptr, unsigned off, int size)
+static void __cold report_setget_bounds(const struct extent_buffer *eb,
+ const void *ptr, unsigned off, int size)
{
- const unsigned long member_offset = (unsigned long)ptr + off;
+ unsigned long member_offset = (unsigned long)ptr + off;
- if (unlikely(member_offset + size > eb->len)) {
- btrfs_warn(eb->fs_info,
- "bad eb member %s: ptr 0x%lx start %llu member offset %lu size %d",
- (member_offset > eb->len ? "start" : "end"),
- (unsigned long)ptr, eb->start, member_offset, size);
- return false;
- }
-
- return true;
+ btrfs_warn(eb->fs_info,
+ "bad eb member %s: ptr 0x%lx start %llu member offset %lu size %d",
+ (member_offset > eb->len ? "start" : "end"),
+ (unsigned long)ptr, eb->start, member_offset, size);
}
/*
@@ -56,7 +51,10 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
const int part = eb->folio_size - oil; \
u8 lebytes[sizeof(u##bits)]; \
\
- ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
+ if (unlikely(member_offset + sizeof(u##bits) > eb->len)) { \
+ report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
+ return 0; \
+ } \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || likely(sizeof(u##bits) <= part)) \
return get_unaligned_le##bits(kaddr + oil); \
\
@@ -76,7 +74,10 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
const int part = eb->folio_size - oil; \
u8 lebytes[sizeof(u##bits)]; \
\
- ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
+ if (unlikely(member_offset + sizeof(u##bits) > eb->len)) { \
+ report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
+ return; \
+ } \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || \
likely(sizeof(u##bits) <= part)) { \
put_unaligned_le##bits(val, kaddr + oil); \
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 4/7] btrfs: accessors: compile-time fast path for u8
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
` (2 preceding siblings ...)
2025-07-01 17:23 ` [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-01 17:23 ` [PATCH 5/7] btrfs: accessors: compile-time fast path for u16 David Sterba
` (3 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
Reading/writing 1 byte (u8) is a special case compared to the others as
it's always contained in the folio we find, so the split memcpy will
never be needed. Turn it to a compile-time check that the memcpy part
can be optimized out.
The stack usage is reduced:
btrfs_set_8 -16 (32 -> 16)
btrfs_get_8 -16 (24 -> 8)
Code size reduction:
text data bss dec hex filename
1454951 115665 16088 1586704 183610 pre/btrfs.ko
1454691 115665 16088 1586444 18350c post/btrfs.ko
DELTA: -260
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index a7b6b2d7bde224..547e9f8fb87d61 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -55,7 +55,8 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
return 0; \
} \
- if (INLINE_EXTENT_BUFFER_PAGES == 1 || likely(sizeof(u##bits) <= part)) \
+ if (INLINE_EXTENT_BUFFER_PAGES == 1 || sizeof(u##bits) == 1 || \
+ likely(sizeof(u##bits) <= part)) \
return get_unaligned_le##bits(kaddr + oil); \
\
memcpy(lebytes, kaddr + oil, part); \
@@ -78,7 +79,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
return; \
} \
- if (INLINE_EXTENT_BUFFER_PAGES == 1 || \
+ if (INLINE_EXTENT_BUFFER_PAGES == 1 || sizeof(u##bits) == 1 || \
likely(sizeof(u##bits) <= part)) { \
put_unaligned_le##bits(val, kaddr + oil); \
return; \
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 5/7] btrfs: accessors: compile-time fast path for u16
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
` (3 preceding siblings ...)
2025-07-01 17:23 ` [PATCH 4/7] btrfs: accessors: compile-time fast path for u8 David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-01 17:23 ` [PATCH 6/7] btrfs: accessors: set target address at initialization David Sterba
` (2 subsequent siblings)
7 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
Reading/writing 2 bytes (u16) may need 2 folios to be written to, each
time it's just one byte so using memcpy for that is an overkill.
Add a branch for the split case so that memcpy is now used for u32 and
u64. Another side effect is that the u16 types now don't need additional
stack space, everything fits to registers.
Stack usage is reduced:
btrfs_get_16 -8 (32 -> 24)
btrfs_set_16 -16 (32 -> 16)
Code size reduction:
text data bss dec hex filename
1454691 115665 16088 1586444 18350c pre/btrfs.ko
1454459 115665 16088 1586212 183424 post/btrfs.ko
DELTA: -232
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index 547e9f8fb87d61..8df404b5f6a334 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -59,9 +59,15 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
likely(sizeof(u##bits) <= part)) \
return get_unaligned_le##bits(kaddr + oil); \
\
- memcpy(lebytes, kaddr + oil, part); \
- kaddr = folio_address(eb->folios[idx + 1]); \
- memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
+ if (sizeof(u##bits) == 2) { \
+ lebytes[0] = *(kaddr + oil); \
+ kaddr = folio_address(eb->folios[idx + 1]); \
+ lebytes[1] = *kaddr; \
+ } else { \
+ memcpy(lebytes, kaddr + oil, part); \
+ kaddr = folio_address(eb->folios[idx + 1]); \
+ memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
+ } \
return get_unaligned_le##bits(lebytes); \
} \
void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
@@ -84,11 +90,16 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
put_unaligned_le##bits(val, kaddr + oil); \
return; \
} \
- \
put_unaligned_le##bits(val, lebytes); \
- memcpy(kaddr + oil, lebytes, part); \
- kaddr = folio_address(eb->folios[idx + 1]); \
- memcpy(kaddr, lebytes + part, sizeof(u##bits) - part); \
+ if (sizeof(u##bits) == 2) { \
+ *(kaddr + oil) = lebytes[0]; \
+ kaddr = folio_address(eb->folios[idx + 1]); \
+ *kaddr = lebytes[1]; \
+ } else { \
+ memcpy(kaddr + oil, lebytes, part); \
+ kaddr = folio_address(eb->folios[idx + 1]); \
+ memcpy(kaddr, lebytes + part, sizeof(u##bits) - part); \
+ } \
}
DEFINE_BTRFS_SETGET_BITS(8)
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 6/7] btrfs: accessors: set target address at initialization
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
` (4 preceding siblings ...)
2025-07-01 17:23 ` [PATCH 5/7] btrfs: accessors: compile-time fast path for u16 David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-01 17:23 ` [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources David Sterba
2025-07-02 18:54 ` [PATCH 0/7] Set/get accessor speedups Boris Burkov
7 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
The target address for the read/write can be simplified as it's the same
expression for the first folio. This improves the generated code as the
folio address does not have to be cached on stack.
Stack usage reduction:
btrfs_set_32 -8 (32 -> 24)
btrfs_set_64 -8 (32 -> 24)
btrfs_get_16 -8 (24 -> 16)
Code size reduction:
text data bss dec hex filename
1454459 115665 16088 1586212 183424 pre/btrfs.ko
1454279 115665 16088 1586032 183370 post/btrfs.ko
DELTA: -180
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index 8df404b5f6a334..af11f547371815 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -47,7 +47,7 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
const unsigned long idx = get_eb_folio_index(eb, member_offset);\
const unsigned long oil = get_eb_offset_in_folio(eb, \
member_offset);\
- char *kaddr = folio_address(eb->folios[idx]); \
+ char *kaddr = folio_address(eb->folios[idx]) + oil; \
const int part = eb->folio_size - oil; \
u8 lebytes[sizeof(u##bits)]; \
\
@@ -57,14 +57,14 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
} \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || sizeof(u##bits) == 1 || \
likely(sizeof(u##bits) <= part)) \
- return get_unaligned_le##bits(kaddr + oil); \
+ return get_unaligned_le##bits(kaddr); \
\
if (sizeof(u##bits) == 2) { \
- lebytes[0] = *(kaddr + oil); \
+ lebytes[0] = *kaddr; \
kaddr = folio_address(eb->folios[idx + 1]); \
lebytes[1] = *kaddr; \
} else { \
- memcpy(lebytes, kaddr + oil, part); \
+ memcpy(lebytes, kaddr, part); \
kaddr = folio_address(eb->folios[idx + 1]); \
memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
} \
@@ -77,7 +77,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
const unsigned long idx = get_eb_folio_index(eb, member_offset);\
const unsigned long oil = get_eb_offset_in_folio(eb, \
member_offset);\
- char *kaddr = folio_address(eb->folios[idx]); \
+ char *kaddr = folio_address(eb->folios[idx]) + oil; \
const int part = eb->folio_size - oil; \
u8 lebytes[sizeof(u##bits)]; \
\
@@ -87,16 +87,16 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
} \
if (INLINE_EXTENT_BUFFER_PAGES == 1 || sizeof(u##bits) == 1 || \
likely(sizeof(u##bits) <= part)) { \
- put_unaligned_le##bits(val, kaddr + oil); \
+ put_unaligned_le##bits(val, kaddr); \
return; \
} \
put_unaligned_le##bits(val, lebytes); \
if (sizeof(u##bits) == 2) { \
- *(kaddr + oil) = lebytes[0]; \
+ *kaddr = lebytes[0]; \
kaddr = folio_address(eb->folios[idx + 1]); \
*kaddr = lebytes[1]; \
} else { \
- memcpy(kaddr + oil, lebytes, part); \
+ memcpy(kaddr, lebytes, part); \
kaddr = folio_address(eb->folios[idx + 1]); \
memcpy(kaddr, lebytes + part, sizeof(u##bits) - part); \
} \
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
` (5 preceding siblings ...)
2025-07-01 17:23 ` [PATCH 6/7] btrfs: accessors: set target address at initialization David Sterba
@ 2025-07-01 17:23 ` David Sterba
2025-07-02 18:53 ` Boris Burkov
2025-07-02 18:54 ` [PATCH 0/7] Set/get accessor speedups Boris Burkov
7 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2025-07-01 17:23 UTC (permalink / raw)
To: linux-btrfs; +Cc: David Sterba
The case of a reading the bytes from 2 folios needs two memcpy()s, the
compiler does not emit calls but two inline loops.
Factoring out the code makes some improvement (stack, code) and in the
future will provide an optimized implementation as well. (The analogical
version with two destinations is not done as it increases stack usage
but can be done if needed.)
The address of the second folio is reordered before the first memcpy,
which leads to an optimization reusing the vmemmap_base and
page_offset_base (implementing folio_address()).
Stack usage reduction:
btrfs_get_32 -8 (32 -> 24)
btrfs_get_64 -8 (32 -> 24)
Code size reduction:
text data bss dec hex filename
1454279 115665 16088 1586032 183370 pre/btrfs.ko
1454229 115665 16088 1585982 18333e post/btrfs.ko
DELTA: -50
As this is the last patch in this series, here's the overall diff
starting and including commit "btrfs: accessors: simplify folio bounds
checks":
Stack:
btrfs_set_16 -72 (88 -> 16)
btrfs_get_32 -56 (80 -> 24)
btrfs_set_8 -72 (88 -> 16)
btrfs_set_64 -64 (88 -> 24)
btrfs_get_8 -72 (80 -> 8)
btrfs_get_16 -64 (80 -> 16)
btrfs_set_32 -64 (88 -> 24)
btrfs_get_64 -56 (80 -> 24)
NEW (48):
report_setget_bounds 48
LOST/NEW DELTA: +48
PRE/POST DELTA: -472
Code:
text data bss dec hex filename
1456601 115665 16088 1588354 183c82 pre/btrfs.ko
1454229 115665 16088 1585982 18333e post/btrfs.ko
DELTA: -2372
Signed-off-by: David Sterba <dsterba@suse.com>
---
fs/btrfs/accessors.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
index af11f547371815..f554c4f723617f 100644
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -20,6 +20,15 @@ static void __cold report_setget_bounds(const struct extent_buffer *eb,
(unsigned long)ptr, eb->start, member_offset, size);
}
+/* Copy bytes from @src1 and @src2 to @dest. */
+static __always_inline void memcpy_split_src(char *dest, const char *src1,
+ const char *src2, const size_t len1,
+ const size_t total)
+{
+ memcpy(dest, src1, len1);
+ memcpy(dest + len1, src2, total - len1);
+}
+
/*
* Macro templates that define helpers to read/write extent buffer data of a
* given size, that are also used via ctree.h for access to item members by
@@ -64,9 +73,9 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
kaddr = folio_address(eb->folios[idx + 1]); \
lebytes[1] = *kaddr; \
} else { \
- memcpy(lebytes, kaddr, part); \
- kaddr = folio_address(eb->folios[idx + 1]); \
- memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
+ memcpy_split_src(lebytes, kaddr, \
+ folio_address(eb->folios[idx + 1]), \
+ part, sizeof(u##bits)); \
} \
return get_unaligned_le##bits(lebytes); \
} \
--
2.49.0
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 2/7] btrfs: accessors: use type sizeof constants directly
2025-07-01 17:23 ` [PATCH 2/7] btrfs: accessors: use type sizeof constants directly David Sterba
@ 2025-07-02 17:58 ` Boris Burkov
2025-07-03 21:20 ` David Sterba
0 siblings, 1 reply; 16+ messages in thread
From: Boris Burkov @ 2025-07-02 17:58 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs
On Tue, Jul 01, 2025 at 07:23:49PM +0200, David Sterba wrote:
> Now unit_size is used only once, so use it directly in 'part'
> calculation. Don't cache sizeof(type) in a variable. While this is a
> compile-time constant, forcing the type 'int' generates worse code as it
> leads to additional conversion from 32 to 64 bit type on x86_64.
>
> The sizeof() is used only a few times and it does not make the code that
> harder to read, so use it directly and let the compiler utilize the
> immediate constants in the context it needs. The .ko code size slightly
> increases (+50) but further patches will reduce that again.
>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
> fs/btrfs/accessors.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
>
> diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
> index b54c8abe467a06..2e90b9b14e73f4 100644
> --- a/fs/btrfs/accessors.c
> +++ b/fs/btrfs/accessors.c
> @@ -52,19 +52,17 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
> const unsigned long idx = get_eb_folio_index(eb, member_offset);\
> const unsigned long oil = get_eb_offset_in_folio(eb, \
> member_offset);\
> - const int unit_size = eb->folio_size; \
> char *kaddr = folio_address(eb->folios[idx]); \
> - const int size = sizeof(u##bits); \
> - const int part = unit_size - oil; \
> + const int part = eb->folio_size - oil; \
nit: the names oil and part are pretty non-sensical to me. Oil used to
be oip for Offset In Page. Is it Offset In foLio?
I can't figure out what part should mean.
So while I see why you're doing all the changes, I can' help but notice
that you removed the two named variables with logical names and left the
confusing ones. :)
> u8 lebytes[sizeof(u##bits)]; \
> \
> - ASSERT(check_setget_bounds(eb, ptr, off, size)); \
> + ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
> if (INLINE_EXTENT_BUFFER_PAGES == 1 || likely(sizeof(u##bits) <= part)) \
> return get_unaligned_le##bits(kaddr + oil); \
> \
> memcpy(lebytes, kaddr + oil, part); \
> kaddr = folio_address(eb->folios[idx + 1]); \
> - memcpy(lebytes + part, kaddr, size - part); \
> + memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
> return get_unaligned_le##bits(lebytes); \
> } \
> void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
> @@ -74,13 +72,11 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
> const unsigned long idx = get_eb_folio_index(eb, member_offset);\
> const unsigned long oil = get_eb_offset_in_folio(eb, \
> member_offset);\
> - const int unit_size = eb->folio_size; \
> char *kaddr = folio_address(eb->folios[idx]); \
> - const int size = sizeof(u##bits); \
> - const int part = unit_size - oil; \
> + const int part = eb->folio_size - oil; \
> u8 lebytes[sizeof(u##bits)]; \
> \
> - ASSERT(check_setget_bounds(eb, ptr, off, size)); \
> + ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
> if (INLINE_EXTENT_BUFFER_PAGES == 1 || \
> likely(sizeof(u##bits) <= part)) { \
> put_unaligned_le##bits(val, kaddr + oil); \
> @@ -90,7 +86,7 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
> put_unaligned_le##bits(val, lebytes); \
> memcpy(kaddr + oil, lebytes, part); \
> kaddr = folio_address(eb->folios[idx + 1]); \
> - memcpy(kaddr, lebytes + part, size - part); \
> + memcpy(kaddr, lebytes + part, sizeof(u##bits) - part); \
> }
>
> DEFINE_BTRFS_SETGET_BITS(8)
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report
2025-07-01 17:23 ` [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report David Sterba
@ 2025-07-02 18:00 ` Boris Burkov
2025-07-03 21:16 ` David Sterba
0 siblings, 1 reply; 16+ messages in thread
From: Boris Burkov @ 2025-07-02 18:00 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs
On Tue, Jul 01, 2025 at 07:23:50PM +0200, David Sterba wrote:
> There's a check in each set/get helper if the requested range is within
> extent buffer bounds, and if it's not then report it. This was in an
> ASSERT statement so with CONFIG_BTRFS_ASSERT this crashes right away, on
> other configs this is only reported but reading out of the bounds is
> done anyway. There are currently no known reports of this particular
> condition failing.
>
> There are some drawbacks though. The behaviour dependence on the
> assertions being compiled in or not and a less visible effect of
> inlining report_setget_bounds() into each helper.
>
> As the bounds check is expected to succeed almost always it's ok to
> inline it but make the report a function and move it out of the helper
> completely (__cold puts it to a different section). This also skips
> reading/writing the requested range in case it fails.
>
> This improves stack usage significantly:
>
> btrfs_get_16 -48 (80 -> 32)
> btrfs_get_32 -48 (80 -> 32)
> btrfs_get_64 -48 (80 -> 32)
> btrfs_get_8 -48 (72 -> 24)
> btrfs_set_16 -56 (88 -> 32)
> btrfs_set_32 -56 (88 -> 32)
> btrfs_set_64 -56 (88 -> 32)
> btrfs_set_8 -48 (80 -> 32)
>
> NEW (48):
> report_setget_bounds 48
> LOST/NEW DELTA: +48
> PRE/POST DELTA: -360
>
> Same as .ko size:
>
> text data bss dec hex filename
> 1456079 115665 16088 1587832 183a78 pre/btrfs.ko
> 1454951 115665 16088 1586704 183610 post/btrfs.ko
>
> DELTA: -1128
>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
> fs/btrfs/accessors.c | 29 +++++++++++++++--------------
> 1 file changed, 15 insertions(+), 14 deletions(-)
>
> diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
> index 2e90b9b14e73f4..a7b6b2d7bde224 100644
> --- a/fs/btrfs/accessors.c
> +++ b/fs/btrfs/accessors.c
> @@ -9,20 +9,15 @@
> #include "fs.h"
> #include "accessors.h"
>
> -static bool check_setget_bounds(const struct extent_buffer *eb,
> - const void *ptr, unsigned off, int size)
> +static void __cold report_setget_bounds(const struct extent_buffer *eb,
> + const void *ptr, unsigned off, int size)
> {
> - const unsigned long member_offset = (unsigned long)ptr + off;
> + unsigned long member_offset = (unsigned long)ptr + off;
>
> - if (unlikely(member_offset + size > eb->len)) {
> - btrfs_warn(eb->fs_info,
> - "bad eb member %s: ptr 0x%lx start %llu member offset %lu size %d",
> - (member_offset > eb->len ? "start" : "end"),
> - (unsigned long)ptr, eb->start, member_offset, size);
> - return false;
> - }
> -
> - return true;
> + btrfs_warn(eb->fs_info,
> + "bad eb member %s: ptr 0x%lx start %llu member offset %lu size %d",
> + (member_offset > eb->len ? "start" : "end"),
> + (unsigned long)ptr, eb->start, member_offset, size);
> }
>
> /*
> @@ -56,7 +51,10 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
> const int part = eb->folio_size - oil; \
> u8 lebytes[sizeof(u##bits)]; \
> \
> - ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
> + if (unlikely(member_offset + sizeof(u##bits) > eb->len)) { \
> + report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
For full no-change compatibility would it make sense to also ASSERT
here? (or stuff it in report, these are the only two users)
> + return 0; \
> + } \
> if (INLINE_EXTENT_BUFFER_PAGES == 1 || likely(sizeof(u##bits) <= part)) \
> return get_unaligned_le##bits(kaddr + oil); \
> \
> @@ -76,7 +74,10 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
> const int part = eb->folio_size - oil; \
> u8 lebytes[sizeof(u##bits)]; \
> \
> - ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
> + if (unlikely(member_offset + sizeof(u##bits) > eb->len)) { \
> + report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
> + return; \
> + } \
Would a helper macro to reduce this duplication be useful? Seems
borderline but worth mentioning. Next time you improve it you won't
have to hit two spots.
> if (INLINE_EXTENT_BUFFER_PAGES == 1 || \
> likely(sizeof(u##bits) <= part)) { \
> put_unaligned_le##bits(val, kaddr + oil); \
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources
2025-07-01 17:23 ` [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources David Sterba
@ 2025-07-02 18:53 ` Boris Burkov
2025-07-03 20:57 ` David Sterba
0 siblings, 1 reply; 16+ messages in thread
From: Boris Burkov @ 2025-07-02 18:53 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs
On Tue, Jul 01, 2025 at 07:23:54PM +0200, David Sterba wrote:
> The case of a reading the bytes from 2 folios needs two memcpy()s, the
> compiler does not emit calls but two inline loops.
>
> Factoring out the code makes some improvement (stack, code) and in the
> future will provide an optimized implementation as well. (The analogical
> version with two destinations is not done as it increases stack usage
> but can be done if needed.)
Is there some fundamental reason for this, or does it just happen to be
so? Sort of interesting either way. Does it make you worry that this
stuff will regress randomly in the future?
>
> The address of the second folio is reordered before the first memcpy,
> which leads to an optimization reusing the vmemmap_base and
> page_offset_base (implementing folio_address()).
>
> Stack usage reduction:
>
> btrfs_get_32 -8 (32 -> 24)
> btrfs_get_64 -8 (32 -> 24)
>
> Code size reduction:
>
> text data bss dec hex filename
> 1454279 115665 16088 1586032 183370 pre/btrfs.ko
> 1454229 115665 16088 1585982 18333e post/btrfs.ko
>
> DELTA: -50
>
> As this is the last patch in this series, here's the overall diff
> starting and including commit "btrfs: accessors: simplify folio bounds
> checks":
>
> Stack:
>
> btrfs_set_16 -72 (88 -> 16)
> btrfs_get_32 -56 (80 -> 24)
> btrfs_set_8 -72 (88 -> 16)
> btrfs_set_64 -64 (88 -> 24)
> btrfs_get_8 -72 (80 -> 8)
> btrfs_get_16 -64 (80 -> 16)
> btrfs_set_32 -64 (88 -> 24)
> btrfs_get_64 -56 (80 -> 24)
>
> NEW (48):
> report_setget_bounds 48
> LOST/NEW DELTA: +48
> PRE/POST DELTA: -472
>
> Code:
>
> text data bss dec hex filename
> 1456601 115665 16088 1588354 183c82 pre/btrfs.ko
> 1454229 115665 16088 1585982 18333e post/btrfs.ko
>
> DELTA: -2372
Sweet!
>
> Signed-off-by: David Sterba <dsterba@suse.com>
> ---
> fs/btrfs/accessors.c | 15 ++++++++++++---
> 1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
> index af11f547371815..f554c4f723617f 100644
> --- a/fs/btrfs/accessors.c
> +++ b/fs/btrfs/accessors.c
> @@ -20,6 +20,15 @@ static void __cold report_setget_bounds(const struct extent_buffer *eb,
> (unsigned long)ptr, eb->start, member_offset, size);
> }
>
> +/* Copy bytes from @src1 and @src2 to @dest. */
> +static __always_inline void memcpy_split_src(char *dest, const char *src1,
> + const char *src2, const size_t len1,
> + const size_t total)
> +{
> + memcpy(dest, src1, len1);
> + memcpy(dest + len1, src2, total - len1);
> +}
> +
> /*
> * Macro templates that define helpers to read/write extent buffer data of a
> * given size, that are also used via ctree.h for access to item members by
> @@ -64,9 +73,9 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
> kaddr = folio_address(eb->folios[idx + 1]); \
> lebytes[1] = *kaddr; \
> } else { \
> - memcpy(lebytes, kaddr, part); \
> - kaddr = folio_address(eb->folios[idx + 1]); \
> - memcpy(lebytes + part, kaddr, sizeof(u##bits) - part); \
> + memcpy_split_src(lebytes, kaddr, \
> + folio_address(eb->folios[idx + 1]), \
> + part, sizeof(u##bits)); \
> } \
> return get_unaligned_le##bits(lebytes); \
> } \
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/7] Set/get accessor speedups
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
` (6 preceding siblings ...)
2025-07-01 17:23 ` [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources David Sterba
@ 2025-07-02 18:54 ` Boris Burkov
7 siblings, 0 replies; 16+ messages in thread
From: Boris Burkov @ 2025-07-02 18:54 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs
On Tue, Jul 01, 2025 at 07:23:47PM +0200, David Sterba wrote:
> Followup to [1] bringing refactoring that allows compiler to do more
> optimizations and stack usage reduction. As the set/get helpers are
> heavily used in many functions this improves performance. I will provide
> some numbers later as I've done mostly functional tests. During the
> development I've identified a few more possible optimizations to improve
> performance, but this series is OK as is.
>
> Overall effects of this series:
>
> Stack:
>
> btrfs_set_16 -72 (88 -> 16)
> btrfs_get_32 -56 (80 -> 24)
> btrfs_set_8 -72 (88 -> 16)
> btrfs_set_64 -64 (88 -> 24)
> btrfs_get_8 -72 (80 -> 8)
> btrfs_get_16 -64 (80 -> 16)
> btrfs_set_32 -64 (88 -> 24)
> btrfs_get_64 -56 (80 -> 24)
>
> NEW (48):
> report_setget_bounds 48
> LOST/NEW DELTA: +48
> PRE/POST DELTA: -472
>
> Code:
>
> text data bss dec hex filename
> 1456601 115665 16088 1588354 183c82 pre/btrfs.ko
> 1454229 115665 16088 1585982 18333e post/btrfs.ko
>
> DELTA: -2372
I sent some minor inline questions / suggestions, but overall this looks
great and each patch makes a good deal of sense on its own.
Reviewed-by: Boris Burkov <boris@bur.io>
>
> [1] https://lore.kernel.org/linux-btrfs/cover.1751032655.git.dsterba@suse.com/
>
> David Sterba (7):
> btrfs: accessors: simplify folio bounds checks
> btrfs: accessors: use type sizeof constants directly
> btrfs: accessors: inline eb bounds check and factor out the error
> report
> btrfs: accessors: compile-time fast path for u8
> btrfs: accessors: compile-time fast path for u16
> btrfs: accessors: set target address at initialization
> btrfs: accessors: factor out split memcpy with two sources
>
> fs/btrfs/accessors.c | 84 +++++++++++++++++++++++++++-----------------
> 1 file changed, 51 insertions(+), 33 deletions(-)
>
> --
> 2.49.0
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources
2025-07-02 18:53 ` Boris Burkov
@ 2025-07-03 20:57 ` David Sterba
0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-03 20:57 UTC (permalink / raw)
To: Boris Burkov; +Cc: David Sterba, linux-btrfs
On Wed, Jul 02, 2025 at 11:53:37AM -0700, Boris Burkov wrote:
> On Tue, Jul 01, 2025 at 07:23:54PM +0200, David Sterba wrote:
> > The case of a reading the bytes from 2 folios needs two memcpy()s, the
> > compiler does not emit calls but two inline loops.
> >
> > Factoring out the code makes some improvement (stack, code) and in the
> > future will provide an optimized implementation as well. (The analogical
> > version with two destinations is not done as it increases stack usage
> > but can be done if needed.)
>
> Is there some fundamental reason for this, or does it just happen to be
> so? Sort of interesting either way. Does it make you worry that this
> stuff will regress randomly in the future?
My explanation for that is that it's in the compiler optimization black
box, the function is inline and when evaluated in the context of the
caller it just came out worse.
This is the patch:
--- a/fs/btrfs/accessors.c
+++ b/fs/btrfs/accessors.c
@@ -29,6 +29,14 @@ static __always_inline void memcpy_split_src(char *dest, const char *src1,
memcpy(dest + len1, src2, total - len1);
}
+static __always_inline void memcpy_split_dest(char *dest1, const char *src,
+ char *dest2, const size_t len1,
+ const size_t total)
+{
+ memcpy(dest1, src, len1);
+ memcpy(dest2, src + len1, total - len1);
+}
+
/*
* Macro templates that define helpers to read/write extent buffer data of a
* given size, that are also used via ctree.h for access to item members by
@@ -105,9 +113,9 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
kaddr = folio_address(eb->folios[idx + 1]); \
*kaddr = lebytes[1]; \
} else { \
- memcpy(kaddr, lebytes, part); \
- kaddr = folio_address(eb->folios[idx + 1]); \
- memcpy(kaddr, lebytes + part, sizeof(u##bits) - part); \
+ memcpy_split_dest(kaddr, lebytes, \
+ folio_address(eb->folios[idx + 1]), \
+ part, sizeof(u##bits)); \
} \
}
---
And the evaluation:
btrfs_set_64 +8 (24 -> 32)
btrfs_set_32 +8 (24 -> 32)
text data bss dec hex filename
1454157 115665 16088 1585910 1832f6 pre/btrfs.ko
1454173 115665 16088 1585926 183306 post/btrfs.ko
DELTA: +16
I excluded the patch because of the worse stack and code diff, because
the other patches were all an improvement. This one was preparatory for
the fancy optimization I have so it can be placed to a helper instead of
the macro. Given the remaining time before code freeze it would not
leave enough time for testing and it might also be a generic helper in
the end.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report
2025-07-02 18:00 ` Boris Burkov
@ 2025-07-03 21:16 ` David Sterba
0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-03 21:16 UTC (permalink / raw)
To: Boris Burkov; +Cc: David Sterba, linux-btrfs
On Wed, Jul 02, 2025 at 11:00:55AM -0700, Boris Burkov wrote:
> > @@ -56,7 +51,10 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
> > const int part = eb->folio_size - oil; \
> > u8 lebytes[sizeof(u##bits)]; \
> > \
> > - ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
> > + if (unlikely(member_offset + sizeof(u##bits) > eb->len)) { \
> > + report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
>
> For full no-change compatibility would it make sense to also ASSERT
> here? (or stuff it in report, these are the only two users)
I'm not claiming no-change here and it's implied by the changelog that
the behaviour and outcome depend on CONFIG_BTRFS_ASSEERT which I wanted
to unify.
I see the point that it should keep the assert here when things go
wrong, during development. However, the type of errors it would catch is
quite rare. Most btree items are accessed by the set/get helpers with
fixed sizes, or the read_extent_buffer/write_extent_buffer with variable
length (and they have their own bounds checks and reports).
I can add an assert or rather a debug warning to the report function so
we get a noisy report. The reason I'd like use DEBUG_WARN for that is
that we get a noisy report but don't crash so we can also observe what
happens if the function returns a this is what would happen with
non-assert configs.
For future plans, this error case will also somehow mark the filesystem
as "we have a serious problem" and let it shut down. Qu has some work
for that pending so I'll connect the two once it's done.
My idea, before the shutdown works, was to set a filesystem bit and
detect it at transaction commit to abort early. The reason is that the
set/get helpers are used everywhere without error checking, so it would
be more sensible to catch that in some selected points.
> > @@ -76,7 +74,10 @@ void btrfs_set_##bits(const struct extent_buffer *eb, void *ptr, \
> > const int part = eb->folio_size - oil; \
> > u8 lebytes[sizeof(u##bits)]; \
> > \
> > - ASSERT(check_setget_bounds(eb, ptr, off, sizeof(u##bits))); \
> > + if (unlikely(member_offset + sizeof(u##bits) > eb->len)) { \
> > + report_setget_bounds(eb, ptr, off, sizeof(u##bits)); \
> > + return; \
> > + } \
>
> Would a helper macro to reduce this duplication be useful? Seems
> borderline but worth mentioning. Next time you improve it you won't
> have to hit two spots.
I don't see much reason for that, the definitions are next to each
other and once the token versions got removed the code it's basically on
one screen, hard to miss the other location.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/7] btrfs: accessors: use type sizeof constants directly
2025-07-02 17:58 ` Boris Burkov
@ 2025-07-03 21:20 ` David Sterba
2025-07-07 14:25 ` David Sterba
0 siblings, 1 reply; 16+ messages in thread
From: David Sterba @ 2025-07-03 21:20 UTC (permalink / raw)
To: Boris Burkov; +Cc: David Sterba, linux-btrfs
On Wed, Jul 02, 2025 at 10:58:54AM -0700, Boris Burkov wrote:
> On Tue, Jul 01, 2025 at 07:23:49PM +0200, David Sterba wrote:
> > Now unit_size is used only once, so use it directly in 'part'
> > calculation. Don't cache sizeof(type) in a variable. While this is a
> > compile-time constant, forcing the type 'int' generates worse code as it
> > leads to additional conversion from 32 to 64 bit type on x86_64.
> >
> > The sizeof() is used only a few times and it does not make the code that
> > harder to read, so use it directly and let the compiler utilize the
> > immediate constants in the context it needs. The .ko code size slightly
> > increases (+50) but further patches will reduce that again.
> >
> > Signed-off-by: David Sterba <dsterba@suse.com>
> > ---
> > fs/btrfs/accessors.c | 16 ++++++----------
> > 1 file changed, 6 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/btrfs/accessors.c b/fs/btrfs/accessors.c
> > index b54c8abe467a06..2e90b9b14e73f4 100644
> > --- a/fs/btrfs/accessors.c
> > +++ b/fs/btrfs/accessors.c
> > @@ -52,19 +52,17 @@ u##bits btrfs_get_##bits(const struct extent_buffer *eb, \
> > const unsigned long idx = get_eb_folio_index(eb, member_offset);\
> > const unsigned long oil = get_eb_offset_in_folio(eb, \
> > member_offset);\
> > - const int unit_size = eb->folio_size; \
> > char *kaddr = folio_address(eb->folios[idx]); \
> > - const int size = sizeof(u##bits); \
> > - const int part = unit_size - oil; \
> > + const int part = eb->folio_size - oil; \
>
> nit: the names oil and part are pretty non-sensical to me. Oil used to
> be oip for Offset In Page. Is it Offset In foLio?
>
> I can't figure out what part should mean.
It confused me the whole time I was looking at the code, it snuck in
with the folio changes, it should have been 'oif' follwing the previous
naming pattern.
> So while I see why you're doing all the changes, I can' help but notice
> that you removed the two named variables with logical names and left the
> confusing ones. :)
So I can sneak in a patch renaming it at the beginning or at the end,
naming it 'oif' or 'foff' (there's the eb related offset as parameter so
we need some kind of distinction).
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 2/7] btrfs: accessors: use type sizeof constants directly
2025-07-03 21:20 ` David Sterba
@ 2025-07-07 14:25 ` David Sterba
0 siblings, 0 replies; 16+ messages in thread
From: David Sterba @ 2025-07-07 14:25 UTC (permalink / raw)
To: David Sterba; +Cc: Boris Burkov, David Sterba, linux-btrfs
On Thu, Jul 03, 2025 at 11:20:15PM +0200, David Sterba wrote:
> > nit: the names oil and part are pretty non-sensical to me. Oil used to
> > be oip for Offset In Page. Is it Offset In foLio?
> >
> > I can't figure out what part should mean.
>
> It confused me the whole time I was looking at the code, it snuck in
> with the folio changes, it should have been 'oif' follwing the previous
> naming pattern.
>
> > So while I see why you're doing all the changes, I can' help but notice
> > that you removed the two named variables with logical names and left the
> > confusing ones. :)
>
> So I can sneak in a patch renaming it at the beginning or at the end,
> naming it 'oif' or 'foff' (there's the eb related offset as parameter so
> we need some kind of distinction).
Appended a patch renaming it to 'oif'.
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2025-07-07 14:25 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-01 17:23 [PATCH 0/7] Set/get accessor speedups David Sterba
2025-07-01 17:23 ` [PATCH 1/7] btrfs: accessors: simplify folio bounds checks David Sterba
2025-07-01 17:23 ` [PATCH 2/7] btrfs: accessors: use type sizeof constants directly David Sterba
2025-07-02 17:58 ` Boris Burkov
2025-07-03 21:20 ` David Sterba
2025-07-07 14:25 ` David Sterba
2025-07-01 17:23 ` [PATCH 3/7] btrfs: accessors: inline eb bounds check and factor out the error report David Sterba
2025-07-02 18:00 ` Boris Burkov
2025-07-03 21:16 ` David Sterba
2025-07-01 17:23 ` [PATCH 4/7] btrfs: accessors: compile-time fast path for u8 David Sterba
2025-07-01 17:23 ` [PATCH 5/7] btrfs: accessors: compile-time fast path for u16 David Sterba
2025-07-01 17:23 ` [PATCH 6/7] btrfs: accessors: set target address at initialization David Sterba
2025-07-01 17:23 ` [PATCH 7/7] btrfs: accessors: factor out split memcpy with two sources David Sterba
2025-07-02 18:53 ` Boris Burkov
2025-07-03 20:57 ` David Sterba
2025-07-02 18:54 ` [PATCH 0/7] Set/get accessor speedups Boris Burkov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox