* [PATCH 0/3] xfs: 64kb directory block verification hurts
@ 2021-02-23 5:47 Dave Chinner
2021-02-23 5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23 5:47 UTC (permalink / raw)
To: linux-xfs
Hi folks,
Quite simple: the debug code we have for internal directory block
verification does not scale to large directory blocks. What is a
minor nuisance at 4kB block size turns into a major problem at 64kB
block size.
These three patches work to reduce the worst of this overhead
without completely gutting the debug checks that are being done.
The type verification code that is generated is horrible - the
compiler does not inline all the nested three-line functions, and so
the function call overhead is significant. Adding a few inline
keywords so that the internal nesting is inlined cuts the overhead
by 30%.
The overhead is still huge. Some of the verification testing is
unnecessary - testing for error injection for a bad inode number
millions of times a second is hugely expensive, and getting rid of
that cuts overhead of directory inode number validation in half.
But the overhead is still huge. The biggest offender is the
directory leaf hash ordering checks. On every debug check call, of
which there are several for every directory modification, we walk the
entire hash entry table in the buffer (8k entries!) to check for
order. This is largely unnecessary, so only do this full order check
when the check function is called from the IO verifiers. If a kernel
dev needs more expensive checks to be re-instated, they only need to
change a single parameter from false to true to do so.
These changes make scalability testing with 64kB directory blocks on
debug kernels possible.
Cheers,
Dave.
^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] xfs: type verification is expensive
  2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
@ 2021-02-23  5:47 ` Dave Chinner
  2021-02-24 21:46   ` Darrick J. Wong
  2021-02-25  9:03   ` Christoph Hellwig
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

From a concurrent rm -rf workload:

  41.04%  [kernel]  [k] xfs_dir3_leaf_check_int
   9.85%  [kernel]  [k] __xfs_dir3_data_check
   5.60%  [kernel]  [k] xfs_verify_ino
   5.32%  [kernel]  [k] xfs_agino_range
   4.21%  [kernel]  [k] memcpy
   3.06%  [kernel]  [k] xfs_errortag_test
   2.57%  [kernel]  [k] xfs_dir_ino_validate
   1.66%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.17%  [kernel]  [k] do_raw_spin_lock
   1.11%  [kernel]  [k] xfs_verify_dir_ino
   0.84%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
   0.83%  [kernel]  [k] xfs_buf_find
   0.64%  [kernel]  [k] xfs_log_commit_cil

There's an awful lot of overhead in just range checking inode
numbers in that profile, yet each inode number check is not a lot of
code. In total, a bit over 14.5% of the CPU time is spent validating
inode numbers.

The problem is that these are deeply nested global-scope functions,
so the overhead here is all in function call marshalling.

   text	   data	    bss	    dec	    hex	filename
   2077	      0	      0	   2077	    81d	fs/xfs/libxfs/xfs_types.o.orig
   2197	      0	      0	   2197	    895	fs/xfs/libxfs/xfs_types.o

There's a small increase in binary size from inlining all the local
nested calls in the verifier functions, but the same workload now
profiles as:

  40.69%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.52%  [kernel]  [k] __xfs_dir3_data_check
   6.68%  [kernel]  [k] xfs_verify_dir_ino
   4.22%  [kernel]  [k] xfs_errortag_test
   4.15%  [kernel]  [k] memcpy
   3.53%  [kernel]  [k] xfs_dir_ino_validate
   1.87%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.37%  [kernel]  [k] do_raw_spin_lock
   0.98%  [kernel]  [k] xfs_buf_find
   0.94%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
   0.73%  [kernel]  [k] xfs_log_commit_cil

Now we only spend just over 10% of the time validating inode numbers
for the same workload. Hence a few "inline" keywords are good enough
to reduce the validation overhead by 30%...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_types.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
index b254fbeaaa50..04801362e1a7 100644
--- a/fs/xfs/libxfs/xfs_types.c
+++ b/fs/xfs/libxfs/xfs_types.c
@@ -13,7 +13,7 @@
 #include "xfs_mount.h"
 
 /* Find the size of the AG, in blocks. */
-xfs_agblock_t
+inline xfs_agblock_t
 xfs_ag_block_count(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno)
@@ -29,7 +29,7 @@ xfs_ag_block_count(
  * Verify that an AG block number pointer neither points outside the AG
  * nor points at static metadata.
  */
-bool
+inline bool
 xfs_verify_agbno(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
@@ -49,7 +49,7 @@ xfs_verify_agbno(
  * Verify that an FS block number pointer neither points outside the
  * filesystem nor points at static AG metadata.
  */
-bool
+inline bool
 xfs_verify_fsbno(
 	struct xfs_mount	*mp,
 	xfs_fsblock_t		fsbno)
@@ -85,7 +85,7 @@ xfs_verify_fsbext(
 }
 
 /* Calculate the first and last possible inode number in an AG. */
-void
+inline void
 xfs_agino_range(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
@@ -116,7 +116,7 @@ xfs_agino_range(
  * Verify that an AG inode number pointer neither points outside the AG
  * nor points at static metadata.
  */
-bool
+inline bool
 xfs_verify_agino(
 	struct xfs_mount	*mp,
 	xfs_agnumber_t		agno,
@@ -146,7 +146,7 @@ xfs_verify_agino_or_null(
  * Verify that an FS inode number pointer neither points outside the
  * filesystem nor points at static AG metadata.
  */
-bool
+inline bool
 xfs_verify_ino(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino)
@@ -162,7 +162,7 @@ xfs_verify_ino(
 }
 
 /* Is this an internal inode number? */
-bool
+inline bool
 xfs_internal_inum(
 	struct xfs_mount	*mp,
 	xfs_ino_t		ino)
@@ -190,7 +190,7 @@ xfs_verify_dir_ino(
  * Verify that an realtime block number pointer doesn't point off the
  * end of the realtime device.
  */
-bool
+inline bool
 xfs_verify_rtbno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
@@ -215,7 +215,7 @@ xfs_verify_rtext(
 }
 
 /* Calculate the range of valid icount values. */
-void
+inline void
 xfs_icount_range(
 	struct xfs_mount	*mp,
 	unsigned long long	*min,
-- 
2.28.0
* Re: [PATCH 1/3] xfs: type verification is expensive
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
@ 2021-02-24 21:46   ` Darrick J. Wong
  2021-02-25  9:03   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-02-24 21:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 23, 2021 at 04:47:46PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> From a concurrent rm -rf workload:
> 
>   41.04%  [kernel]  [k] xfs_dir3_leaf_check_int
>    9.85%  [kernel]  [k] __xfs_dir3_data_check
>    5.60%  [kernel]  [k] xfs_verify_ino
>    5.32%  [kernel]  [k] xfs_agino_range
>    4.21%  [kernel]  [k] memcpy
>    3.06%  [kernel]  [k] xfs_errortag_test
>    2.57%  [kernel]  [k] xfs_dir_ino_validate
>    1.66%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.17%  [kernel]  [k] do_raw_spin_lock
>    1.11%  [kernel]  [k] xfs_verify_dir_ino
>    0.84%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
>    0.83%  [kernel]  [k] xfs_buf_find
>    0.64%  [kernel]  [k] xfs_log_commit_cil
> 
> There's an awful lot of overhead in just range checking inode
> numbers in that profile, yet each inode number check is not a lot
> of code. In total, a bit over 14.5% of the CPU time is spent
> validating inode numbers.
> 
> The problem is that these are deeply nested global-scope functions,
> so the overhead here is all in function call marshalling.
> 
>    text	   data	    bss	    dec	    hex	filename
>    2077	      0	      0	   2077	    81d	fs/xfs/libxfs/xfs_types.o.orig
>    2197	      0	      0	   2197	    895	fs/xfs/libxfs/xfs_types.o
> 
> There's a small increase in binary size from inlining all the local
> nested calls in the verifier functions, but the same workload now
> profiles as:
> 
>   40.69%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.52%  [kernel]  [k] __xfs_dir3_data_check
>    6.68%  [kernel]  [k] xfs_verify_dir_ino
>    4.22%  [kernel]  [k] xfs_errortag_test
>    4.15%  [kernel]  [k] memcpy
>    3.53%  [kernel]  [k] xfs_dir_ino_validate
>    1.87%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.37%  [kernel]  [k] do_raw_spin_lock
>    0.98%  [kernel]  [k] xfs_buf_find
>    0.94%  [kernel]  [k] __raw_callee_save___pv_queued_spin_unlock
>    0.73%  [kernel]  [k] xfs_log_commit_cil
> 
> Now we only spend just over 10% of the time validating inode
> numbers for the same workload. Hence a few "inline" keywords are
> good enough to reduce the validation overhead by 30%...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks fine I guess,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
* Re: [PATCH 1/3] xfs: type verification is expensive
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
  2021-02-24 21:46   ` Darrick J. Wong
@ 2021-02-25  9:03   ` Christoph Hellwig
  2021-02-25 22:04     ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2021-02-25  9:03 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Any reason to not just mark them static inline and move them to
xfs_types.h?
* Re: [PATCH 1/3] xfs: type verification is expensive
  2021-02-25  9:03   ` Christoph Hellwig
@ 2021-02-25 22:04     ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-25 22:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs

On Thu, Feb 25, 2021 at 10:03:37AM +0100, Christoph Hellwig wrote:
> Any reason to not just mark them static inline and move them to
> xfs_types.h?

Circular header file dependencies. xfs_mount.h needs xfs_types.h,
and moving these to xfs_types.h means xfs_types.h now depends on
xfs_mount.h and a bunch of other header files...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check
  2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
@ 2021-02-23  5:47 ` Dave Chinner
  2021-02-24 21:47   ` Darrick J. Wong
  2021-02-25  9:06   ` Christoph Hellwig
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

We call xfs_dir_ino_validate() for every dir entry in a directory
when doing validity checking of the directory. It calls
xfs_verify_dir_ino(), then emits a corruption report if the inode
number is bad, or does error injection if it is good. It is
extremely costly:

  43.27%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.28%  [kernel]  [k] __xfs_dir3_data_check
   6.61%  [kernel]  [k] xfs_verify_dir_ino
   4.16%  [kernel]  [k] xfs_errortag_test
   4.00%  [kernel]  [k] memcpy
   3.48%  [kernel]  [k] xfs_dir_ino_validate

7% of the CPU usage in this directory traversal workload is
xfs_dir_ino_validate() doing absolutely nothing.

We don't need error injection to simulate bad inode numbers in the
directory structure, because we can do that by fuzzing the structure
on disk.

And we don't need a corruption report, because
__xfs_dir3_data_check() will emit one if the inode number is bad.

So just call xfs_verify_dir_ino() directly here, and get rid of all
this unnecessary overhead:

  40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.98%  [kernel]  [k] __xfs_dir3_data_check
   8.10%  [kernel]  [k] xfs_verify_dir_ino
   4.42%  [kernel]  [k] memcpy
   2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.52%  [kernel]  [k] do_raw_spin_lock

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c
index 375b3edb2ad2..e67fa086f2c1 100644
--- a/fs/xfs/libxfs/xfs_dir2_data.c
+++ b/fs/xfs/libxfs/xfs_dir2_data.c
@@ -218,7 +218,7 @@ __xfs_dir3_data_check(
 		 */
 		if (dep->namelen == 0)
 			return __this_address;
-		if (xfs_dir_ino_validate(mp, be64_to_cpu(dep->inumber)))
+		if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber)))
 			return __this_address;
 		if (offset + xfs_dir2_data_entsize(mp, dep->namelen) > end)
 			return __this_address;
-- 
2.28.0
* Re: [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
@ 2021-02-24 21:47   ` Darrick J. Wong
  2021-02-25  9:06   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-02-24 21:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 23, 2021 at 04:47:47PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We call xfs_dir_ino_validate() for every dir entry in a directory
> when doing validity checking of the directory. It calls
> xfs_verify_dir_ino(), then emits a corruption report if the inode
> number is bad, or does error injection if it is good. It is
> extremely costly:
> 
>   43.27%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.28%  [kernel]  [k] __xfs_dir3_data_check
>    6.61%  [kernel]  [k] xfs_verify_dir_ino
>    4.16%  [kernel]  [k] xfs_errortag_test
>    4.00%  [kernel]  [k] memcpy
>    3.48%  [kernel]  [k] xfs_dir_ino_validate
> 
> 7% of the CPU usage in this directory traversal workload is
> xfs_dir_ino_validate() doing absolutely nothing.
> 
> We don't need error injection to simulate bad inode numbers in the
> directory structure, because we can do that by fuzzing the
> structure on disk.
> 
> And we don't need a corruption report, because
> __xfs_dir3_data_check() will emit one if the inode number is bad.
> 
> So just call xfs_verify_dir_ino() directly here, and get rid of all
> this unnecessary overhead:
> 
>   40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.98%  [kernel]  [k] __xfs_dir3_data_check
>    8.10%  [kernel]  [k] xfs_verify_dir_ino
>    4.42%  [kernel]  [k] memcpy
>    2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.52%  [kernel]  [k] do_raw_spin_lock
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Heh, yeah, this one has filled up the fuzzer test results for a
while...

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
* Re: [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
  2021-02-24 21:47   ` Darrick J. Wong
@ 2021-02-25  9:06   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-02-25  9:06 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
* [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks
  2021-02-23  5:47 [PATCH 0/3] xfs: 64kb directory block verification hurts Dave Chinner
  2021-02-23  5:47 ` [PATCH 1/3] xfs: type verification is expensive Dave Chinner
  2021-02-23  5:47 ` [PATCH 2/3] xfs: No need for inode number error injection in __xfs_dir3_data_check Dave Chinner
@ 2021-02-23  5:47 ` Dave Chinner
  2021-02-24 21:50   ` Darrick J. Wong
  2021-02-25  9:09   ` Christoph Hellwig
  2 siblings, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2021-02-23  5:47 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

On debug kernels, we call xfs_dir3_leaf_check_int() multiple times
on every directory modification. The robust hash ordering checks it
does on every entry in the leaf on every call result in a massive
CPU overhead which slows down debug kernels by a large amount.

We use xfs_dir3_leaf_check_int() for the verifiers as well, so we
can't just gut the function to reduce overhead. What we can do,
however, is reduce the work it does when it is called from the
debug interfaces, just leaving the high level checks in place and
leaving the robust validation to the verifiers. This means the debug
checks will catch gross errors, but subtle bugs might not be caught
until a verifier is run.

It is easy enough to restore the existing debug behaviour if the
developer needs it (just change a call parameter in the debug code),
but otherwise the overhead makes testing large directory block sizes
on debug kernels very slow.

Profile at an unlink rate of ~80k files/s on a 64k block size
filesystem before the patch:

  40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
  10.98%  [kernel]  [k] __xfs_dir3_data_check
   8.10%  [kernel]  [k] xfs_verify_dir_ino
   4.42%  [kernel]  [k] memcpy
   2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
   1.52%  [kernel]  [k] do_raw_spin_lock

Profile after, at an unlink rate of ~125k files/s (+50% improvement),
has largely dropped the leaf verification debug overhead out of the
profile:

  16.53%  [kernel]  [k] __xfs_dir3_data_check
  12.53%  [kernel]  [k] xfs_verify_dir_ino
   7.97%  [kernel]  [k] memcpy
   3.36%  [kernel]  [k] xfs_dir2_data_get_ftype
   2.86%  [kernel]  [k] __pv_queued_spin_lock_slowpath

Create shows a similar change in profile and a +25% improvement in
performance.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/libxfs/xfs_dir2_leaf.c | 10 +++++++---
 fs/xfs/libxfs/xfs_dir2_node.c |  2 +-
 fs/xfs/libxfs/xfs_dir2_priv.h |  3 ++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index 95d2a3f92d75..ccd8d0aa62b8 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -113,7 +113,7 @@ xfs_dir3_leaf1_check(
 	} else if (leafhdr.magic != XFS_DIR2_LEAF1_MAGIC)
 		return __this_address;
 
-	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf);
+	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf, false);
 }
 
 static inline void
@@ -139,7 +139,8 @@ xfs_failaddr_t
 xfs_dir3_leaf_check_int(
 	struct xfs_mount	*mp,
 	struct xfs_dir3_icleaf_hdr *hdr,
-	struct xfs_dir2_leaf	*leaf)
+	struct xfs_dir2_leaf	*leaf,
+	bool			expensive_checking)
 {
 	struct xfs_da_geometry	*geo = mp->m_dir_geo;
 	xfs_dir2_leaf_tail_t	*ltp;
@@ -162,6 +163,9 @@ xfs_dir3_leaf_check_int(
 	    (char *)&hdr->ents[hdr->count] > (char *)xfs_dir2_leaf_bests_p(ltp))
 		return __this_address;
 
+	if (!expensive_checking)
+		return NULL;
+
 	/* Check hash value order, count stale entries.  */
 	for (i = stale = 0; i < hdr->count; i++) {
 		if (i + 1 < hdr->count) {
@@ -195,7 +199,7 @@ xfs_dir3_leaf_verify(
 		return fa;
 
 	xfs_dir2_leaf_hdr_from_disk(mp, &leafhdr, bp->b_addr);
-	return xfs_dir3_leaf_check_int(mp, &leafhdr, bp->b_addr);
+	return xfs_dir3_leaf_check_int(mp, &leafhdr, bp->b_addr, true);
 }
 
 static void
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 5d51265d29d6..80a64117b460 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -73,7 +73,7 @@ xfs_dir3_leafn_check(
 	} else if (leafhdr.magic != XFS_DIR2_LEAFN_MAGIC)
 		return __this_address;
 
-	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf);
+	return xfs_dir3_leaf_check_int(dp->i_mount, &leafhdr, leaf, false);
 }
 
 static inline void
diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h
index 44c6a77cba05..94943ce49cab 100644
--- a/fs/xfs/libxfs/xfs_dir2_priv.h
+++ b/fs/xfs/libxfs/xfs_dir2_priv.h
@@ -127,7 +127,8 @@ xfs_dir3_leaf_find_entry(struct xfs_dir3_icleaf_hdr *leafhdr,
 extern int xfs_dir2_node_to_leaf(struct xfs_da_state *state);
 
 extern xfs_failaddr_t xfs_dir3_leaf_check_int(struct xfs_mount *mp,
-		struct xfs_dir3_icleaf_hdr *hdr, struct xfs_dir2_leaf *leaf);
+		struct xfs_dir3_icleaf_hdr *hdr, struct xfs_dir2_leaf *leaf,
+		bool expensive_checks);
 
 /* xfs_dir2_node.c */
 void xfs_dir2_free_hdr_from_disk(struct xfs_mount *mp,
-- 
2.28.0
* Re: [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
@ 2021-02-24 21:50   ` Darrick J. Wong
  2021-02-25  9:09   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Darrick J. Wong @ 2021-02-24 21:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Feb 23, 2021 at 04:47:48PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> On debug kernels, we call xfs_dir3_leaf_check_int() multiple times
> on every directory modification. The robust hash ordering checks it
> does on every entry in the leaf on every call result in a massive
> CPU overhead which slows down debug kernels by a large amount.
> 
> We use xfs_dir3_leaf_check_int() for the verifiers as well, so we
> can't just gut the function to reduce overhead. What we can do,
> however, is reduce the work it does when it is called from the
> debug interfaces, just leaving the high level checks in place and
> leaving the robust validation to the verifiers. This means the
> debug checks will catch gross errors, but subtle bugs might not be
> caught until a verifier is run.
> 
> It is easy enough to restore the existing debug behaviour if the
> developer needs it (just change a call parameter in the debug
> code), but otherwise the overhead makes testing large directory
> block sizes on debug kernels very slow.
> 
> Profile at an unlink rate of ~80k files/s on a 64k block size
> filesystem before the patch:
> 
>   40.30%  [kernel]  [k] xfs_dir3_leaf_check_int
>   10.98%  [kernel]  [k] __xfs_dir3_data_check
>    8.10%  [kernel]  [k] xfs_verify_dir_ino
>    4.42%  [kernel]  [k] memcpy
>    2.22%  [kernel]  [k] xfs_dir2_data_get_ftype
>    1.52%  [kernel]  [k] do_raw_spin_lock
> 
> Profile after, at an unlink rate of ~125k files/s (+50%
> improvement), has largely dropped the leaf verification debug
> overhead out of the profile:
> 
>   16.53%  [kernel]  [k] __xfs_dir3_data_check
>   12.53%  [kernel]  [k] xfs_verify_dir_ino
>    7.97%  [kernel]  [k] memcpy
>    3.36%  [kernel]  [k] xfs_dir2_data_get_ftype
>    2.86%  [kernel]  [k] __pv_queued_spin_lock_slowpath
> 
> Create shows a similar change in profile and a +25% improvement in
> performance.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D
* Re: [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks
  2021-02-23  5:47 ` [PATCH 3/3] xfs: reduce debug overhead of dir leaf/node checks Dave Chinner
  2021-02-24 21:50   ` Darrick J. Wong
@ 2021-02-25  9:09   ` Christoph Hellwig
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2021-02-25  9:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>