* [PATCH v2 1/3] xfs: add lock protection when remove perag from radix tree
@ 2023-12-09 12:21 Long Li
2023-12-09 12:21 ` [PATCH v2 2/3] xfs: don't assert perag when free perag Long Li
2023-12-09 12:21 ` [PATCH v2 3/3] xfs: fix perag leak when growfs fails Long Li
0 siblings, 2 replies; 8+ messages in thread
From: Long Li @ 2023-12-09 12:21 UTC (permalink / raw)
To: djwong, chandanbabu; +Cc: linux-xfs, yi.zhang, houtao1, leo.lilong, yangerkun
Take mp->m_perag_lock for deletions from the perag radix tree in
xfs_initialize_perag to prevent racing with tagging operations.
Lookups are fine - they are RCU protected so already deal with the
tree changing shape underneath the lookup - but tagging operations
require the tree to be stable while the tags are propagated back up
to the root.
Right now there's nothing stopping radix tree tagging from operating
while a growfs operation is progress and adding/removing new entries
into the radix tree.
Hence we can have traversals that require a stable tree occurring at
the same time we are removing unused entries from the radix tree which
causes the shape of the tree to change.
Likely this hasn't caused a problem in the past because we are only
doing append addition and removal so the active AG part of the tree
is not changing shape, but that doesn't mean it is safe. Just making
the radix tree modifications serialise against each other is obviously
correct.
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
v2:
- Refer to Dave's explanation and rewrite the commit message.
fs/xfs/libxfs/xfs_ag.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index f9f4d694640d..cc10a3ca052f 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -424,13 +424,17 @@ xfs_initialize_perag(
out_remove_pag:
xfs_defer_drain_free(&pag->pag_intents_drain);
+ spin_lock(&mp->m_perag_lock);
radix_tree_delete(&mp->m_perag_tree, index);
+ spin_unlock(&mp->m_perag_lock);
out_free_pag:
kmem_free(pag);
out_unwind_new_pags:
/* unwind any prior newly initialized pags */
for (index = first_initialised; index < agcount; index++) {
+ spin_lock(&mp->m_perag_lock);
pag = radix_tree_delete(&mp->m_perag_tree, index);
+ spin_unlock(&mp->m_perag_lock);
if (!pag)
break;
xfs_buf_hash_destroy(pag);
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 2/3] xfs: don't assert perag when free perag
2023-12-09 12:21 [PATCH v2 1/3] xfs: add lock protection when remove perag from radix tree Long Li
@ 2023-12-09 12:21 ` Long Li
2023-12-11 21:42 ` Darrick J. Wong
2023-12-11 22:00 ` Dave Chinner
2023-12-09 12:21 ` [PATCH v2 3/3] xfs: fix perag leak when growfs fails Long Li
1 sibling, 2 replies; 8+ messages in thread
From: Long Li @ 2023-12-09 12:21 UTC (permalink / raw)
To: djwong, chandanbabu; +Cc: linux-xfs, yi.zhang, houtao1, leo.lilong, yangerkun
When releasing the perag in xfs_free_perag(), the assertion that the
perag in readix tree is correct in most cases. However, there is one
corner case where the assertion is not true. During log recovery, the
AGs become visible(that is included in mp->m_sb.sb_agcount) first, and
then the perag is initialized. If the initialization of the perag fails,
the assertion will be triggered. Worse yet, null pointer dereferencing
can occur.
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
fs/xfs/libxfs/xfs_ag.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index cc10a3ca052f..11ed048c350c 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -258,7 +258,8 @@ xfs_free_perag(
spin_lock(&mp->m_perag_lock);
pag = radix_tree_delete(&mp->m_perag_tree, agno);
spin_unlock(&mp->m_perag_lock);
- ASSERT(pag);
+ if (!pag)
+ break;
XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
xfs_defer_drain_free(&pag->pag_intents_drain);
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v2 3/3] xfs: fix perag leak when growfs fails
2023-12-09 12:21 [PATCH v2 1/3] xfs: add lock protection when remove perag from radix tree Long Li
2023-12-09 12:21 ` [PATCH v2 2/3] xfs: don't assert perag when free perag Long Li
@ 2023-12-09 12:21 ` Long Li
2023-12-11 22:03 ` Darrick J. Wong
1 sibling, 1 reply; 8+ messages in thread
From: Long Li @ 2023-12-09 12:21 UTC (permalink / raw)
To: djwong, chandanbabu; +Cc: linux-xfs, yi.zhang, houtao1, leo.lilong, yangerkun
During growfs, if new ag in memory has been initialized, however
sb_agcount has not been updated, if an error occurs at this time it
will cause perag leaks as follows, these new AGs will not been freed
during umount , because of these new AGs are not visible(that is
included in mp->m_sb.sb_agcount).
unreferenced object 0xffff88810be40200 (size 512):
comm "xfs_growfs", pid 857, jiffies 4294909093
hex dump (first 32 bytes):
00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00 ................
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc 381741e2):
[<ffffffff8191aef6>] __kmalloc+0x386/0x4f0
[<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0
[<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810
[<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
[<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
[<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
[<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
[<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
unreferenced object 0xffff88810be40800 (size 512):
comm "xfs_growfs", pid 857, jiffies 4294909093
hex dump (first 32 bytes):
20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00 .......W.......
10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff ................
backtrace (crc bde50e2d):
[<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540
[<ffffffff81814489>] kvmalloc_node+0x99/0x160
[<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400
[<ffffffff8286bdc5>] rhashtable_init+0x405/0x760
[<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810
[<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
[<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
[<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
[<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
[<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
Now, the logic for freeing perag in xfs_free_perag() and
xfs_initialize_perag() error handling is essentially the same. Factor
out xfs_free_perag_range() from xfs_free_perag(), used for freeing
unused perag within a specified range, inclued when growfs fails.
Signed-off-by: Long Li <leo.lilong@huawei.com>
---
fs/xfs/libxfs/xfs_ag.c | 35 ++++++++++++++++++++---------------
fs/xfs/libxfs/xfs_ag.h | 2 ++
fs/xfs/xfs_fsops.c | 5 ++++-
3 files changed, 26 insertions(+), 16 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
index 11ed048c350c..edec03ab09aa 100644
--- a/fs/xfs/libxfs/xfs_ag.c
+++ b/fs/xfs/libxfs/xfs_ag.c
@@ -245,16 +245,20 @@ __xfs_free_perag(
}
/*
- * Free up the per-ag resources associated with the mount structure.
+ * Free per-ag within the specified range, if agno is not found in the
+ * radix tree, then it means that agno and subsequent AGs have not been
+ * initialized.
*/
void
-xfs_free_perag(
- struct xfs_mount *mp)
+xfs_free_perag_range(
+ xfs_mount_t *mp,
+ xfs_agnumber_t agstart,
+ xfs_agnumber_t agend)
{
- struct xfs_perag *pag;
xfs_agnumber_t agno;
+ struct xfs_perag *pag;
- for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+ for (agno = agstart; agno < agend; agno++) {
spin_lock(&mp->m_perag_lock);
pag = radix_tree_delete(&mp->m_perag_tree, agno);
spin_unlock(&mp->m_perag_lock);
@@ -274,6 +278,16 @@ xfs_free_perag(
}
}
+/*
+ * Free up the per-ag resources associated with the mount structure.
+ */
+void
+xfs_free_perag(
+ struct xfs_mount *mp)
+{
+ xfs_free_perag_range(mp, 0, mp->m_sb.sb_agcount);
+}
+
/* Find the size of the AG, in blocks. */
static xfs_agblock_t
__xfs_ag_block_count(
@@ -432,16 +446,7 @@ xfs_initialize_perag(
kmem_free(pag);
out_unwind_new_pags:
/* unwind any prior newly initialized pags */
- for (index = first_initialised; index < agcount; index++) {
- spin_lock(&mp->m_perag_lock);
- pag = radix_tree_delete(&mp->m_perag_tree, index);
- spin_unlock(&mp->m_perag_lock);
- if (!pag)
- break;
- xfs_buf_hash_destroy(pag);
- xfs_defer_drain_free(&pag->pag_intents_drain);
- kmem_free(pag);
- }
+ xfs_free_perag_range(mp, first_initialised, agcount);
return error;
}
diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
index 2e0aef87d633..927f737f84ec 100644
--- a/fs/xfs/libxfs/xfs_ag.h
+++ b/fs/xfs/libxfs/xfs_ag.h
@@ -136,6 +136,8 @@ __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi);
int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
+void xfs_free_perag_range(xfs_mount_t *mp, xfs_agnumber_t agstart,
+ xfs_agnumber_t agend);
void xfs_free_perag(struct xfs_mount *mp);
/* Passive AG references */
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 57076a25f17d..144fea4374af 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -153,7 +153,7 @@ xfs_growfs_data_private(
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, -delta, 0,
0, &tp);
if (error)
- return error;
+ goto out_free_unused_perag;
last_pag = xfs_perag_get(mp, oagcount - 1);
if (delta > 0) {
@@ -227,6 +227,9 @@ xfs_growfs_data_private(
out_trans_cancel:
xfs_trans_cancel(tp);
+out_free_unused_perag:
+ if (nagcount > oagcount)
+ xfs_free_perag_range(mp, oagcount, nagcount);
return error;
}
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/3] xfs: don't assert perag when free perag
2023-12-09 12:21 ` [PATCH v2 2/3] xfs: don't assert perag when free perag Long Li
@ 2023-12-11 21:42 ` Darrick J. Wong
2023-12-11 22:00 ` Dave Chinner
1 sibling, 0 replies; 8+ messages in thread
From: Darrick J. Wong @ 2023-12-11 21:42 UTC (permalink / raw)
To: Long Li; +Cc: chandanbabu, linux-xfs, yi.zhang, houtao1, yangerkun
On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> When releasing the perag in xfs_free_perag(), the assertion that the
> perag in readix tree is correct in most cases. However, there is one
> corner case where the assertion is not true. During log recovery, the
> AGs become visible(that is included in mp->m_sb.sb_agcount) first, and
> then the perag is initialized. If the initialization of the perag fails,
> the assertion will be triggered. Worse yet, null pointer dereferencing
> can occur.
>
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
> fs/xfs/libxfs/xfs_ag.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index cc10a3ca052f..11ed048c350c 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -258,7 +258,8 @@ xfs_free_perag(
> spin_lock(&mp->m_perag_lock);
> pag = radix_tree_delete(&mp->m_perag_tree, agno);
> spin_unlock(&mp->m_perag_lock);
> - ASSERT(pag);
> + if (!pag)
> + break;
Why wouldn't you continue to the next agnumber?
--D
> XFS_IS_CORRUPT(pag->pag_mount, atomic_read(&pag->pag_ref) != 0);
> xfs_defer_drain_free(&pag->pag_intents_drain);
>
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/3] xfs: don't assert perag when free perag
2023-12-09 12:21 ` [PATCH v2 2/3] xfs: don't assert perag when free perag Long Li
2023-12-11 21:42 ` Darrick J. Wong
@ 2023-12-11 22:00 ` Dave Chinner
2023-12-12 13:28 ` Long Li
1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2023-12-11 22:00 UTC (permalink / raw)
To: Long Li; +Cc: djwong, chandanbabu, linux-xfs, yi.zhang, houtao1, yangerkun
On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> When releasing the perag in xfs_free_perag(), the assertion that the
> perag in readix tree is correct in most cases. However, there is one
> corner case where the assertion is not true. During log recovery, the
> AGs become visible(that is included in mp->m_sb.sb_agcount) first, and
> then the perag is initialized. If the initialization of the perag fails,
> the assertion will be triggered. Worse yet, null pointer dereferencing
> can occur.
I'm going to assume that you are talking about xlog_do_recover()
because the commit message doesn't actually tell us how this
situation occurs.
That code re-reads the superblock, then copies it to mp->m_sb,
then calls xfs_initialize_perag() with the values from mp->m_sb.
If log recovery replayed a growfs transaction, the mp->m_sb has a
larger sb_agcount and so then xfs_initialize_perag() is called
and if that fails we end up back in xfs_mountfs and the error
stack calls xfs_free_perag().
Is that correct?
If so, then the fix is to change how xlog_do_recover() works. It
needs to initialise the new perags before it updates the in-memory
superblock. If xfs_initialize_perag() fails, it undoes all the
changes it has made, so if we haven't updated the in-memory
superblock when the init of the new perags fails then the error
unwinding code works exactly as it should right now.
i.e. the bug is that xlog_do_recover() is leaving the in-memory
state inconsistent on init failure, and we need to fix that rather
than remove the assert that is telling us that in-memory state is
inconsistent....
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] xfs: fix perag leak when growfs fails
2023-12-09 12:21 ` [PATCH v2 3/3] xfs: fix perag leak when growfs fails Long Li
@ 2023-12-11 22:03 ` Darrick J. Wong
2023-12-12 13:40 ` Long Li
0 siblings, 1 reply; 8+ messages in thread
From: Darrick J. Wong @ 2023-12-11 22:03 UTC (permalink / raw)
To: Long Li; +Cc: chandanbabu, linux-xfs, yi.zhang, houtao1, yangerkun
On Sat, Dec 09, 2023 at 08:21:07PM +0800, Long Li wrote:
> During growfs, if new ag in memory has been initialized, however
> sb_agcount has not been updated, if an error occurs at this time it
> will cause perag leaks as follows, these new AGs will not been freed
> during umount , because of these new AGs are not visible(that is
> included in mp->m_sb.sb_agcount).
>
> unreferenced object 0xffff88810be40200 (size 512):
> comm "xfs_growfs", pid 857, jiffies 4294909093
> hex dump (first 32 bytes):
> 00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00 ................
> 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace (crc 381741e2):
> [<ffffffff8191aef6>] __kmalloc+0x386/0x4f0
> [<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0
> [<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810
> [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
> [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
> [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
> [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
> [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
> unreferenced object 0xffff88810be40800 (size 512):
> comm "xfs_growfs", pid 857, jiffies 4294909093
> hex dump (first 32 bytes):
> 20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00 .......W.......
> 10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff ................
> backtrace (crc bde50e2d):
> [<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540
> [<ffffffff81814489>] kvmalloc_node+0x99/0x160
> [<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400
> [<ffffffff8286bdc5>] rhashtable_init+0x405/0x760
> [<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810
> [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
> [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
> [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
> [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
> [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
>
> Now, the logic for freeing perag in xfs_free_perag() and
> xfs_initialize_perag() error handling is essentially the same. Factor
> out xfs_free_perag_range() from xfs_free_perag(), used for freeing
> unused perag within a specified range, inclued when growfs fails.
>
> Signed-off-by: Long Li <leo.lilong@huawei.com>
> ---
> fs/xfs/libxfs/xfs_ag.c | 35 ++++++++++++++++++++---------------
> fs/xfs/libxfs/xfs_ag.h | 2 ++
> fs/xfs/xfs_fsops.c | 5 ++++-
> 3 files changed, 26 insertions(+), 16 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> index 11ed048c350c..edec03ab09aa 100644
> --- a/fs/xfs/libxfs/xfs_ag.c
> +++ b/fs/xfs/libxfs/xfs_ag.c
> @@ -245,16 +245,20 @@ __xfs_free_perag(
> }
>
> /*
> - * Free up the per-ag resources associated with the mount structure.
> + * Free per-ag within the specified range, if agno is not found in the
> + * radix tree, then it means that agno and subsequent AGs have not been
> + * initialized.
> */
> void
> -xfs_free_perag(
> - struct xfs_mount *mp)
> +xfs_free_perag_range(
> + xfs_mount_t *mp,
Please stop reverting the codebase's use of typedefs for struct
pointers.
> + xfs_agnumber_t agstart,
> + xfs_agnumber_t agend)
This is also ^^ unnecessary indentation.
> {
> - struct xfs_perag *pag;
> xfs_agnumber_t agno;
> + struct xfs_perag *pag;
...and unnecessary rearranging of variables...
>
> - for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
> + for (agno = agstart; agno < agend; agno++) {
> spin_lock(&mp->m_perag_lock);
> pag = radix_tree_delete(&mp->m_perag_tree, agno);
> spin_unlock(&mp->m_perag_lock);
> @@ -274,6 +278,16 @@ xfs_free_perag(
> }
> }
>
> +/*
> + * Free up the per-ag resources associated with the mount structure.
> + */
> +void
> +xfs_free_perag(
> + struct xfs_mount *mp)
> +{
> + xfs_free_perag_range(mp, 0, mp->m_sb.sb_agcount);
> +}
> +
> /* Find the size of the AG, in blocks. */
> static xfs_agblock_t
> __xfs_ag_block_count(
> @@ -432,16 +446,7 @@ xfs_initialize_perag(
> kmem_free(pag);
> out_unwind_new_pags:
> /* unwind any prior newly initialized pags */
> - for (index = first_initialised; index < agcount; index++) {
> - spin_lock(&mp->m_perag_lock);
> - pag = radix_tree_delete(&mp->m_perag_tree, index);
> - spin_unlock(&mp->m_perag_lock);
> - if (!pag)
> - break;
> - xfs_buf_hash_destroy(pag);
> - xfs_defer_drain_free(&pag->pag_intents_drain);
> - kmem_free(pag);
> - }
> + xfs_free_perag_range(mp, first_initialised, agcount);
> return error;
> }
>
> diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h
> index 2e0aef87d633..927f737f84ec 100644
> --- a/fs/xfs/libxfs/xfs_ag.h
> +++ b/fs/xfs/libxfs/xfs_ag.h
> @@ -136,6 +136,8 @@ __XFS_AG_OPSTATE(agfl_needs_reset, AGFL_NEEDS_RESET)
> int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount,
> xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi);
> int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno);
> +void xfs_free_perag_range(xfs_mount_t *mp, xfs_agnumber_t agstart,
> + xfs_agnumber_t agend);
> void xfs_free_perag(struct xfs_mount *mp);
>
> /* Passive AG references */
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 57076a25f17d..144fea4374af 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -153,7 +153,7 @@ xfs_growfs_data_private(
> error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, -delta, 0,
> 0, &tp);
> if (error)
> - return error;
> + goto out_free_unused_perag;
>
> last_pag = xfs_perag_get(mp, oagcount - 1);
> if (delta > 0) {
> @@ -227,6 +227,9 @@ xfs_growfs_data_private(
>
> out_trans_cancel:
> xfs_trans_cancel(tp);
> +out_free_unused_perag:
> + if (nagcount > oagcount)
> + xfs_free_perag_range(mp, oagcount, nagcount);
Complaints aside, this addition looks correct.
--D
> return error;
> }
>
> --
> 2.31.1
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 2/3] xfs: don't assert perag when free perag
2023-12-11 22:00 ` Dave Chinner
@ 2023-12-12 13:28 ` Long Li
0 siblings, 0 replies; 8+ messages in thread
From: Long Li @ 2023-12-12 13:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: djwong, chandanbabu, linux-xfs, yi.zhang, houtao1, yangerkun
On Tue, Dec 12, 2023 at 09:00:50AM +1100, Dave Chinner wrote:
> On Sat, Dec 09, 2023 at 08:21:06PM +0800, Long Li wrote:
> > When releasing the perag in xfs_free_perag(), the assertion that the
> > perag in readix tree is correct in most cases. However, there is one
> > corner case where the assertion is not true. During log recovery, the
> > AGs become visible(that is included in mp->m_sb.sb_agcount) first, and
> > then the perag is initialized. If the initialization of the perag fails,
> > the assertion will be triggered. Worse yet, null pointer dereferencing
> > can occur.
>
> I'm going to assume that you are talking about xlog_do_recover()
> because the commit message doesn't actually tell us how this
> situation occurs.
>
> That code re-reads the superblock, then copies it to mp->m_sb,
> then calls xfs_initialize_perag() with the values from mp->m_sb.
>
> If log recovery replayed a growfs transaction, the mp->m_sb has a
> larger sb_agcount and so then xfs_initialize_perag() is called
> and if that fails we end up back in xfs_mountfs and the error
> stack calls xfs_free_perag().
>
> Is that correct?
Yes, you are right. When I tried to fix the perag leak issue in patch 3,
I found this problem.
>
> If so, then the fix is to change how xlog_do_recover() works. It
> needs to initialise the new perags before it updates the in-memory
> superblock. If xfs_initialize_perag() fails, it undoes all the
> changes it has made, so if we haven't updated the in-memory
> superblock when the init of the new perags fails then the error
> unwinding code works exactly as it should right now.
>
> i.e. the bug is that xlog_do_recover() is leaving the in-memory
> state inconsistent on init failure, and we need to fix that rather
> than remove the assert that is telling us that in-memory state is
> inconsistent....
>
Yes, agree with you, I used to think that removing the assertion
would solve the problem, but now it seems a bit lazy, the problem
should be solved at the source. Right now, I haven't figured out
how to fix this problem comprehensively, so I'll fix perag leak
issue first.
Thanks,
Long Li
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v2 3/3] xfs: fix perag leak when growfs fails
2023-12-11 22:03 ` Darrick J. Wong
@ 2023-12-12 13:40 ` Long Li
0 siblings, 0 replies; 8+ messages in thread
From: Long Li @ 2023-12-12 13:40 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: chandanbabu, linux-xfs, yi.zhang, houtao1, yangerkun
On Mon, Dec 11, 2023 at 02:03:05PM -0800, Darrick J. Wong wrote:
> On Sat, Dec 09, 2023 at 08:21:07PM +0800, Long Li wrote:
> > During growfs, if new ag in memory has been initialized, however
> > sb_agcount has not been updated, if an error occurs at this time it
> > will cause perag leaks as follows, these new AGs will not been freed
> > during umount , because of these new AGs are not visible(that is
> > included in mp->m_sb.sb_agcount).
> >
> > unreferenced object 0xffff88810be40200 (size 512):
> > comm "xfs_growfs", pid 857, jiffies 4294909093
> > hex dump (first 32 bytes):
> > 00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00 ................
> > 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> > backtrace (crc 381741e2):
> > [<ffffffff8191aef6>] __kmalloc+0x386/0x4f0
> > [<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0
> > [<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810
> > [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
> > [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
> > [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
> > [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
> > [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
> > unreferenced object 0xffff88810be40800 (size 512):
> > comm "xfs_growfs", pid 857, jiffies 4294909093
> > hex dump (first 32 bytes):
> > 20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00 .......W.......
> > 10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff ................
> > backtrace (crc bde50e2d):
> > [<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540
> > [<ffffffff81814489>] kvmalloc_node+0x99/0x160
> > [<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400
> > [<ffffffff8286bdc5>] rhashtable_init+0x405/0x760
> > [<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810
> > [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0
> > [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0
> > [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0
> > [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0
> > [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
> >
> > Now, the logic for freeing perag in xfs_free_perag() and
> > xfs_initialize_perag() error handling is essentially the same. Factor
> > out xfs_free_perag_range() from xfs_free_perag(), used for freeing
> > unused perag within a specified range, inclued when growfs fails.
> >
> > Signed-off-by: Long Li <leo.lilong@huawei.com>
> > ---
> > fs/xfs/libxfs/xfs_ag.c | 35 ++++++++++++++++++++---------------
> > fs/xfs/libxfs/xfs_ag.h | 2 ++
> > fs/xfs/xfs_fsops.c | 5 ++++-
> > 3 files changed, 26 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c
> > index 11ed048c350c..edec03ab09aa 100644
> > --- a/fs/xfs/libxfs/xfs_ag.c
> > +++ b/fs/xfs/libxfs/xfs_ag.c
> > @@ -245,16 +245,20 @@ __xfs_free_perag(
> > }
> >
> > /*
> > - * Free up the per-ag resources associated with the mount structure.
> > + * Free per-ag within the specified range, if agno is not found in the
> > + * radix tree, then it means that agno and subsequent AGs have not been
> > + * initialized.
> > */
> > void
> > -xfs_free_perag(
> > - struct xfs_mount *mp)
> > +xfs_free_perag_range(
> > + xfs_mount_t *mp,
>
> Please stop reverting the codebase's use of typedefs for struct
> pointers.
>
> > + xfs_agnumber_t agstart,
> > + xfs_agnumber_t agend)
>
> This is also ^^ unnecessary indentation.
>
> > {
> > - struct xfs_perag *pag;
> > xfs_agnumber_t agno;
> > + struct xfs_perag *pag;
>
> ...and unnecessary rearranging of variables...
I ignored these minor issues, thanks for pointing them out, and it will be changed
in the next version.
Thanks,
Long Li
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-12-12 13:36 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-09 12:21 [PATCH v2 1/3] xfs: add lock protection when remove perag from radix tree Long Li
2023-12-09 12:21 ` [PATCH v2 2/3] xfs: don't assert perag when free perag Long Li
2023-12-11 21:42 ` Darrick J. Wong
2023-12-11 22:00 ` Dave Chinner
2023-12-12 13:28 ` Long Li
2023-12-09 12:21 ` [PATCH v2 3/3] xfs: fix perag leak when growfs fails Long Li
2023-12-11 22:03 ` Darrick J. Wong
2023-12-12 13:40 ` Long Li
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox