* [PATCH] xfs_db: make sure agblocks is valid to prevent corruption
@ 2024-08-21 10:44 liuhuan01
2024-08-23 0:49 ` Darrick J. Wong
0 siblings, 1 reply; 10+ messages in thread
From: liuhuan01 @ 2024-08-21 10:44 UTC (permalink / raw)
To: linux-xfs; +Cc: djwong, cmaiolino, liuh
From: liuh <liuhuan01@kylinos.cn>
Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
System will generate signal SIGFPE corrupt the process. And the stack as follow:
corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
#0 libxfs_getbuf_flags
#1 libxfs_getbuf_flags
#2 libxfs_buf_read_map
#3 libxfs_buf_read
#4 libxfs_mount
#5 init
#6 main
The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
In this case, user cannot run in expert mode also.
Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
Signed-off-by: liuh <liuhuan01@kylinos.cn>
---
db/init.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/db/init.c b/db/init.c
index cea25ae5..2d3295ba 100644
--- a/db/init.c
+++ b/db/init.c
@@ -129,6 +129,13 @@ init(
}
}
+ if (unlikely(sbp->sb_agblocks == 0)) {
+ fprintf(stderr,
+ _("%s: device %s agblocks unexpected\n"),
+ progname, x.data.name);
+ exit(1);
+ }
+
agcount = sbp->sb_agcount;
mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
if (!mp) {
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption
2024-08-21 10:44 [PATCH] xfs_db: make sure agblocks is valid to prevent corruption liuhuan01
@ 2024-08-23 0:49 ` Darrick J. Wong
2024-08-27 3:23 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: Darrick J. Wong @ 2024-08-23 0:49 UTC (permalink / raw)
To: liuhuan01; +Cc: linux-xfs, cmaiolino
On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
> From: liuh <liuhuan01@kylinos.cn>
>
> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
>
> System will generate signal SIGFPE corrupt the process. And the stack as follow:
> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> #0 libxfs_getbuf_flags
> #1 libxfs_getbuf_flags
> #2 libxfs_buf_read_map
> #3 libxfs_buf_read
> #4 libxfs_mount
> #5 init
> #6 main
>
> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> In this case, user cannot run in expert mode also.
>
> Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
> Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
> db/init.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/db/init.c b/db/init.c
> index cea25ae5..2d3295ba 100644
> --- a/db/init.c
> +++ b/db/init.c
> @@ -129,6 +129,13 @@ init(
> }
> }
>
> + if (unlikely(sbp->sb_agblocks == 0)) {
> + fprintf(stderr,
> + _("%s: device %s agblocks unexpected\n"),
> + progname, x.data.name);
> + exit(1);
What if we set sb_agblocks to 1 and let the debugger continue?
--D
> + }
> +
> agcount = sbp->sb_agcount;
> mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
> if (!mp) {
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption
2024-08-23 0:49 ` Darrick J. Wong
@ 2024-08-27 3:23 ` Dave Chinner
2024-08-27 10:24 ` liuh
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2024-08-27 3:23 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: liuhuan01, linux-xfs, cmaiolino
On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote:
> On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
> > From: liuh <liuhuan01@kylinos.cn>
> >
> > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
> >
> > System will generate signal SIGFPE corrupt the process. And the stack as follow:
> > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> > #0 libxfs_getbuf_flags
> > #1 libxfs_getbuf_flags
> > #2 libxfs_buf_read_map
> > #3 libxfs_buf_read
> > #4 libxfs_mount
> > #5 init
> > #6 main
> >
> > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> > In this case, user cannot run in expert mode also.
> >
> > Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
> > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
> >
> > Signed-off-by: liuh <liuhuan01@kylinos.cn>
> > ---
> > db/init.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/db/init.c b/db/init.c
> > index cea25ae5..2d3295ba 100644
> > --- a/db/init.c
> > +++ b/db/init.c
> > @@ -129,6 +129,13 @@ init(
> > }
> > }
> >
> > + if (unlikely(sbp->sb_agblocks == 0)) {
> > + fprintf(stderr,
> > + _("%s: device %s agblocks unexpected\n"),
> > + progname, x.data.name);
> > + exit(1);
>
> What if we set sb_agblocks to 1 and let the debugger continue?
Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and
maybe crash unexpectedly than to refuse to allow any diagnosis of
the corrupt filesystem.
xfs_db is a debug and forensic analysis tool. Having it crash
because it didn't handle some corruption entirely corectly isn't
something that we should be particularly worried about...
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption
2024-08-27 3:23 ` Dave Chinner
@ 2024-08-27 10:24 ` liuh
2024-08-27 23:37 ` Dave Chinner
0 siblings, 1 reply; 10+ messages in thread
From: liuh @ 2024-08-27 10:24 UTC (permalink / raw)
To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs, cmaiolino
在 2024/8/27 11:23, Dave Chinner 写道:
> On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote:
>> On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
>>> From: liuh <liuhuan01@kylinos.cn>
>>>
>>> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
>>> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
>>>
>>> System will generate signal SIGFPE corrupt the process. And the stack as follow:
>>> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
>>> #0 libxfs_getbuf_flags
>>> #1 libxfs_getbuf_flags
>>> #2 libxfs_buf_read_map
>>> #3 libxfs_buf_read
>>> #4 libxfs_mount
>>> #5 init
>>> #6 main
>>>
>>> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
>>> In this case, user cannot run in expert mode also.
>>>
>>> Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
>>> Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
>>>
>>> Signed-off-by: liuh <liuhuan01@kylinos.cn>
>>> ---
>>> db/init.c | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/db/init.c b/db/init.c
>>> index cea25ae5..2d3295ba 100644
>>> --- a/db/init.c
>>> +++ b/db/init.c
>>> @@ -129,6 +129,13 @@ init(
>>> }
>>> }
>>>
>>> + if (unlikely(sbp->sb_agblocks == 0)) {
>>> + fprintf(stderr,
>>> + _("%s: device %s agblocks unexpected\n"),
>>> + progname, x.data.name);
>>> + exit(1);
>> What if we set sb_agblocks to 1 and let the debugger continue?
> Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and
> maybe crash unexpectedly than to refuse to allow any diagnosis of
> the corrupt filesystem.
>
> xfs_db is a debug and forensic analysis tool. Having it crash
> because it didn't handle some corruption entirely corectly isn't
> something that we should be particularly worried about...
>
> -Dave.
I agree with both of you, xfs_db is just a debugger tool.
But for the above case, xfs_db can do nothing, even do a simple view on
primary superblock.
The user all knowns is that xfs_db goto corrupt, but don't know what's
cause the problem.
If set sb_agblocks to 1, xfs_db can going to work to view on primary
superblock,
but can't relay on it to view more information.
Maybe left a warning message and set sb_agblocks to 1 and let debugger
continue is better.
Thanks.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption
2024-08-27 10:24 ` liuh
@ 2024-08-27 23:37 ` Dave Chinner
2024-09-02 10:12 ` [PATCH v2] " liuhuan01
0 siblings, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2024-08-27 23:37 UTC (permalink / raw)
To: liuh; +Cc: Darrick J. Wong, linux-xfs, cmaiolino
On Tue, Aug 27, 2024 at 06:24:25PM +0800, liuh wrote:
> 在 2024/8/27 11:23, Dave Chinner 写道:
> > On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote:
> > > On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote:
> > > > From: liuh <liuhuan01@kylinos.cn>
> > > >
> > > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> > > > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
> > > >
> > > > System will generate signal SIGFPE corrupt the process. And the stack as follow:
> > > > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> > > > #0 libxfs_getbuf_flags
> > > > #1 libxfs_getbuf_flags
> > > > #2 libxfs_buf_read_map
> > > > #3 libxfs_buf_read
> > > > #4 libxfs_mount
> > > > #5 init
> > > > #6 main
> > > >
> > > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> > > > In this case, user cannot run in expert mode also.
> > > >
> > > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue.
> > > > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message.
> > > >
> > > > Signed-off-by: liuh <liuhuan01@kylinos.cn>
> > > > ---
> > > > db/init.c | 7 +++++++
> > > > 1 file changed, 7 insertions(+)
> > > >
> > > > diff --git a/db/init.c b/db/init.c
> > > > index cea25ae5..2d3295ba 100644
> > > > --- a/db/init.c
> > > > +++ b/db/init.c
> > > > @@ -129,6 +129,13 @@ init(
> > > > }
> > > > }
> > > > + if (unlikely(sbp->sb_agblocks == 0)) {
> > > > + fprintf(stderr,
> > > > + _("%s: device %s agblocks unexpected\n"),
> > > > + progname, x.data.name);
> > > > + exit(1);
> > > What if we set sb_agblocks to 1 and let the debugger continue?
> > Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and
> > maybe crash unexpectedly than to refuse to allow any diagnosis of
> > the corrupt filesystem.
> >
> > xfs_db is a debug and forensic analysis tool. Having it crash
> > because it didn't handle some corruption entirely corectly isn't
> > something that we should be particularly worried about...
> >
> > -Dave.
>
> I agree with both of you, xfs_db is just a debugger tool.
> But for the above case, xfs_db can do nothing, even do a simple view on
> primary superblock.
> The user all knowns is that xfs_db goto corrupt, but don't know what's cause
> the problem.
>
> If set sb_agblocks to 1, xfs_db can going to work to view on primary
> superblock,
> but can't relay on it to view more information.
Yes, a value of "1" will avoid the crash.
However, it is obviously not an ideal choice because a value of 1 is
not a valid value for sb_agblocks.
IOWs, Darricks suggestion really was not to literally use a value of
1, but to substitute a non-zero value that allows xfs_db to largely
function correctly on a filesystem corrupted in this way.
> Maybe left a warning message and set sb_agblocks to 1 and let debugger
> continue is better.
When two senior developers both say "do it this way", it is worth
trying to understand why they made that suggestion, not take the
suggestion as the exact solution to the problem. We've provided
the -framework- for the desired solution, but you still need to do
some work to make it work.
So what is a valid value for sb_agblocks? Is that value more
functional that a value of 1? If it is more functional, can we
calculate a better approximation of the correct value?
Go an look at all the geometry information we have in
the superblock that we can calculate an approximate AG size from.
sb_agcount tells us how many AGs there are.
sb_dblocks tells us the size of the filesystem.
sb_agblklog tells us the maximum size the AG could possibly be.
Go look for redundant metadata that we can get the exact value of
agblocks from without relying on a specific sb_agblocks value.
agf->agf_length should always equal sb_agblocks
agi->agi_length should always equal sb_agblocks
Look at the sb_agblocks verification/bounds checking code to
determine absolute min/max valid values.
XFS_MIN_AG_BYTES tells us the smallest valid AG size.
XFS_MIN_AG_BLOCKS tells us the smallest valid AG block count.
So, how would you go about calculating an approximate, if not
exactly correct value for the missing sb_agblocks value?
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] xfs_db: make sure agblocks is valid to prevent corruption
2024-08-27 23:37 ` Dave Chinner
@ 2024-09-02 10:12 ` liuhuan01
2024-09-02 18:56 ` Darrick J. Wong
2024-09-03 2:28 ` Dave Chinner
0 siblings, 2 replies; 10+ messages in thread
From: liuhuan01 @ 2024-09-02 10:12 UTC (permalink / raw)
To: david; +Cc: cmaiolino, djwong, linux-xfs, liuh
From: liuh <liuhuan01@kylinos.cn>
Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
System will generate signal SIGFPE corrupt the process. And the stack as follow:
corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
#0 libxfs_getbuf_flags
#1 libxfs_getbuf_flags
#2 libxfs_buf_read_map
#3 libxfs_buf_read
#4 libxfs_mount
#5 init
#6 main
The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
In this case, user cannot run in expert mode also.
So, try to get agblocks from agf/agi 0, if failed use the default geometry to calc agblocks.
The worst thing is cannot get agblocks accroding above method, then set it to 1.
Signed-off-by: liuh <liuhuan01@kylinos.cn>
---
db/Makefile | 2 +-
db/init.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 129 insertions(+), 1 deletion(-)
diff --git a/db/Makefile b/db/Makefile
index 83389376..322d5617 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \
LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \
- $(LIBPTHREAD)
+ $(LIBPTHREAD) $(LIBBLKID)
LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG)
LLDFLAGS += -static-libtool-libs
diff --git a/db/init.c b/db/init.c
index cea25ae5..15124ee2 100644
--- a/db/init.c
+++ b/db/init.c
@@ -38,6 +38,121 @@ usage(void)
exit(1);
}
+static void
+xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x)
+{
+ struct fs_topology ft;
+ int blocklog;
+ uint64_t dblocks;
+ int multidisk;
+
+ memset(&ft, 0, sizeof(ft));
+ get_topology(x, &ft, 1);
+
+ /*
+ * get geometry from get_topology result.
+ * Use default block size (2^12)
+ */
+ blocklog = 12;
+ multidisk = ft.data.swidth | ft.data.sunit;
+ dblocks = x->data.size >> (blocklog - BBSHIFT);
+ calc_default_ag_geometry(blocklog, dblocks, multidisk,
+ agsize, agcount);
+}
+
+static xfs_agblock_t
+xfs_get_agblock_from_agf(struct xfs_mount *mp)
+{
+ xfs_agblock_t agblocks = 0;
+ int error;
+ struct xfs_buf *bp;
+ struct xfs_agf *agf;
+
+ error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
+ XFS_AG_DADDR(mp, 0, XFS_AGF_DADDR(mp)),
+ XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
+ if (error) {
+ fprintf(stderr, "xfs_get_agblock from agf-0 error %d\n", error);
+ return agblocks;
+ }
+
+ if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGF_CRC_OFF)) {
+ fprintf(stderr, "xfs_get_agblock from agf-0 badcrc\n");
+ return agblocks;
+ }
+
+ agf = bp->b_addr;
+ agblocks = be32_to_cpu(agf->agf_length);
+
+ libxfs_buf_relse(bp);
+
+ return agblocks;
+}
+
+static xfs_agblock_t
+xfs_get_agblock_from_agi(struct xfs_mount *mp)
+{
+ xfs_agblock_t agblocks = 0;
+ int error;
+ struct xfs_buf *bp;
+ struct xfs_agi *agi;
+
+ error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
+ XFS_AG_DADDR(mp, 0, XFS_AGI_DADDR(mp)),
+ XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
+ if (error) {
+ fprintf(stderr, "xfs_get_agblock from agi-0 error %d\n", error);
+ return agblocks;
+ }
+
+ if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGI_CRC_OFF)) {
+ fprintf(stderr, "xfs_get_agblock from agi-0 badcrc\n");
+ return agblocks;
+ }
+
+ agi = bp->b_addr;
+ agblocks = be32_to_cpu(agi->agi_length);
+
+ libxfs_buf_relse(bp);
+
+ return agblocks;
+}
+
+/*
+ * If sb_agblocks was damaged, try to read it from agf/agi 0.
+ * With read agf/agi fails use default geometry to calc agblocks/agcount.
+ * The worst thing is cannot get agblocks according above method, then set to 1.
+ */
+static xfs_agblock_t
+xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x)
+{
+ xfs_agblock_t agblocks;
+ uint64_t agsize, agcount;
+
+ /* firset try to get agblocks from agf-0 */
+ agblocks = xfs_get_agblock_from_agf(mp);
+ if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES &&
+ XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES)
+ return agblocks;
+
+ /* second try to get agblocks from agi-0 */
+ agblocks = xfs_get_agblock_from_agi(mp);
+ if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES &&
+ XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES)
+ return agblocks;
+
+ /* third use default geometry to calc agblocks/agcount */
+ xfs_guess_default_ag_geometry(&agsize, &agcount, x);
+
+ if (XFS_FSB_TO_B(mp, agsize) < XFS_MIN_AG_BYTES ||
+ XFS_FSB_TO_B(mp, agsize) > XFS_MAX_AG_BYTES)
+ agblocks = 1; /* the worst is set to 1 */
+ else
+ agblocks = agsize;
+
+ return agblocks;
+}
+
static void
init(
int argc,
@@ -129,6 +244,19 @@ init(
}
}
+ /* If sb_agblocks was damaged, try to get agblocks */
+ if (XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) < XFS_MIN_AG_BYTES ||
+ XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) > XFS_MAX_AG_BYTES) {
+ xfs_agblock_t agblocks;
+ xfs_agblock_t bad_agblocks = sbp->sb_agblocks;
+
+ agblocks = xfs_try_get_agblocks(&xmount, &x);
+ sbp->sb_agblocks = agblocks;
+
+ fprintf(stderr, "wrong agblocks, try to get from agblocks %u -> %u\n",
+ bad_agblocks, sbp->sb_agblocks);
+ }
+
agcount = sbp->sb_agcount;
mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
if (!mp) {
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] xfs_db: make sure agblocks is valid to prevent corruption
2024-09-02 10:12 ` [PATCH v2] " liuhuan01
@ 2024-09-02 18:56 ` Darrick J. Wong
2024-09-03 2:28 ` Dave Chinner
1 sibling, 0 replies; 10+ messages in thread
From: Darrick J. Wong @ 2024-09-02 18:56 UTC (permalink / raw)
To: liuhuan01; +Cc: david, cmaiolino, linux-xfs
On Mon, Sep 02, 2024 at 06:12:39PM +0800, liuhuan01@kylinos.cn wrote:
> From: liuh <liuhuan01@kylinos.cn>
>
> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
>
> System will generate signal SIGFPE corrupt the process. And the stack as follow:
> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> #0 libxfs_getbuf_flags
> #1 libxfs_getbuf_flags
> #2 libxfs_buf_read_map
> #3 libxfs_buf_read
> #4 libxfs_mount
> #5 init
> #6 main
>
> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> In this case, user cannot run in expert mode also.
>
> So, try to get agblocks from agf/agi 0, if failed use the default geometry to calc agblocks.
> The worst thing is cannot get agblocks accroding above method, then set it to 1.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
> db/Makefile | 2 +-
> db/init.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 129 insertions(+), 1 deletion(-)
>
> diff --git a/db/Makefile b/db/Makefile
> index 83389376..322d5617 100644
> --- a/db/Makefile
> +++ b/db/Makefile
> @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \
> LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
>
> LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \
> - $(LIBPTHREAD)
> + $(LIBPTHREAD) $(LIBBLKID)
> LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG)
> LLDFLAGS += -static-libtool-libs
>
> diff --git a/db/init.c b/db/init.c
> index cea25ae5..15124ee2 100644
> --- a/db/init.c
> +++ b/db/init.c
> @@ -38,6 +38,121 @@ usage(void)
> exit(1);
> }
>
> +static void
> +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x)
> +{
> + struct fs_topology ft;
> + int blocklog;
> + uint64_t dblocks;
> + int multidisk;
> +
> + memset(&ft, 0, sizeof(ft));
> + get_topology(x, &ft, 1);
> +
> + /*
> + * get geometry from get_topology result.
> + * Use default block size (2^12)
> + */
> + blocklog = 12;
> + multidisk = ft.data.swidth | ft.data.sunit;
> + dblocks = x->data.size >> (blocklog - BBSHIFT);
> + calc_default_ag_geometry(blocklog, dblocks, multidisk,
> + agsize, agcount);
> +}
> +
> +static xfs_agblock_t
> +xfs_get_agblock_from_agf(struct xfs_mount *mp)
> +{
> + xfs_agblock_t agblocks = 0;
> + int error;
> + struct xfs_buf *bp;
> + struct xfs_agf *agf;
> +
> + error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
> + XFS_AG_DADDR(mp, 0, XFS_AGF_DADDR(mp)),
> + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
Why not call the full verifier on the buffer read?
> + if (error) {
> + fprintf(stderr, "xfs_get_agblock from agf-0 error %d\n", error);
> + return agblocks;
> + }
> +
> + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGF_CRC_OFF)) {
> + fprintf(stderr, "xfs_get_agblock from agf-0 badcrc\n");
> + return agblocks;
> + }
Then you don't have to do this ^^^ stuff.
--D
> + agf = bp->b_addr;
> + agblocks = be32_to_cpu(agf->agf_length);
> +
> + libxfs_buf_relse(bp);
> +
> + return agblocks;
> +}
> +
> +static xfs_agblock_t
> +xfs_get_agblock_from_agi(struct xfs_mount *mp)
> +{
> + xfs_agblock_t agblocks = 0;
> + int error;
> + struct xfs_buf *bp;
> + struct xfs_agi *agi;
> +
> + error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
> + XFS_AG_DADDR(mp, 0, XFS_AGI_DADDR(mp)),
> + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> + if (error) {
> + fprintf(stderr, "xfs_get_agblock from agi-0 error %d\n", error);
> + return agblocks;
> + }
> +
> + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGI_CRC_OFF)) {
> + fprintf(stderr, "xfs_get_agblock from agi-0 badcrc\n");
> + return agblocks;
> + }
> +
> + agi = bp->b_addr;
> + agblocks = be32_to_cpu(agi->agi_length);
> +
> + libxfs_buf_relse(bp);
> +
> + return agblocks;
> +}
> +
> +/*
> + * If sb_agblocks was damaged, try to read it from agf/agi 0.
> + * With read agf/agi fails use default geometry to calc agblocks/agcount.
> + * The worst thing is cannot get agblocks according above method, then set to 1.
> + */
> +static xfs_agblock_t
> +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x)
> +{
> + xfs_agblock_t agblocks;
> + uint64_t agsize, agcount;
> +
> + /* firset try to get agblocks from agf-0 */
> + agblocks = xfs_get_agblock_from_agf(mp);
> + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES &&
> + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES)
> + return agblocks;
> +
> + /* second try to get agblocks from agi-0 */
> + agblocks = xfs_get_agblock_from_agi(mp);
> + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES &&
> + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES)
> + return agblocks;
> +
> + /* third use default geometry to calc agblocks/agcount */
> + xfs_guess_default_ag_geometry(&agsize, &agcount, x);
> +
> + if (XFS_FSB_TO_B(mp, agsize) < XFS_MIN_AG_BYTES ||
> + XFS_FSB_TO_B(mp, agsize) > XFS_MAX_AG_BYTES)
> + agblocks = 1; /* the worst is set to 1 */
> + else
> + agblocks = agsize;
> +
> + return agblocks;
> +}
> +
> static void
> init(
> int argc,
> @@ -129,6 +244,19 @@ init(
> }
> }
>
> + /* If sb_agblocks was damaged, try to get agblocks */
> + if (XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) < XFS_MIN_AG_BYTES ||
> + XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) > XFS_MAX_AG_BYTES) {
> + xfs_agblock_t agblocks;
> + xfs_agblock_t bad_agblocks = sbp->sb_agblocks;
> +
> + agblocks = xfs_try_get_agblocks(&xmount, &x);
> + sbp->sb_agblocks = agblocks;
> +
> + fprintf(stderr, "wrong agblocks, try to get from agblocks %u -> %u\n",
> + bad_agblocks, sbp->sb_agblocks);
> + }
> +
> agcount = sbp->sb_agcount;
> mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
> if (!mp) {
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] xfs_db: make sure agblocks is valid to prevent corruption
2024-09-02 10:12 ` [PATCH v2] " liuhuan01
2024-09-02 18:56 ` Darrick J. Wong
@ 2024-09-03 2:28 ` Dave Chinner
2024-09-03 10:24 ` [PATCH v3] " liuhuan01
1 sibling, 1 reply; 10+ messages in thread
From: Dave Chinner @ 2024-09-03 2:28 UTC (permalink / raw)
To: liuhuan01; +Cc: cmaiolino, djwong, linux-xfs
On Mon, Sep 02, 2024 at 06:12:39PM +0800, liuhuan01@kylinos.cn wrote:
> From: liuh <liuhuan01@kylinos.cn>
>
> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
>
> System will generate signal SIGFPE corrupt the process. And the stack as follow:
> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> #0 libxfs_getbuf_flags
> #1 libxfs_getbuf_flags
> #2 libxfs_buf_read_map
> #3 libxfs_buf_read
> #4 libxfs_mount
> #5 init
> #6 main
>
> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> In this case, user cannot run in expert mode also.
>
> So, try to get agblocks from agf/agi 0, if failed use the default geometry to calc agblocks.
> The worst thing is cannot get agblocks accroding above method, then set it to 1.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
> db/Makefile | 2 +-
> db/init.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 129 insertions(+), 1 deletion(-)
>
> diff --git a/db/Makefile b/db/Makefile
> index 83389376..322d5617 100644
> --- a/db/Makefile
> +++ b/db/Makefile
> @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \
> LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
>
> LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \
> - $(LIBPTHREAD)
> + $(LIBPTHREAD) $(LIBBLKID)
> LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG)
> LLDFLAGS += -static-libtool-libs
>
> diff --git a/db/init.c b/db/init.c
> index cea25ae5..15124ee2 100644
> --- a/db/init.c
> +++ b/db/init.c
> @@ -38,6 +38,121 @@ usage(void)
> exit(1);
> }
>
> +static void
> +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x)
> +{
> + struct fs_topology ft;
> + int blocklog;
> + uint64_t dblocks;
> + int multidisk;
> +
> + memset(&ft, 0, sizeof(ft));
> + get_topology(x, &ft, 1);
> +
> + /*
> + * get geometry from get_topology result.
> + * Use default block size (2^12)
> + */
> + blocklog = 12;
> + multidisk = ft.data.swidth | ft.data.sunit;
> + dblocks = x->data.size >> (blocklog - BBSHIFT);
> + calc_default_ag_geometry(blocklog, dblocks, multidisk,
> + agsize, agcount);
> +}
> +
> +static xfs_agblock_t
> +xfs_get_agblock_from_agf(struct xfs_mount *mp)
> +{
> + xfs_agblock_t agblocks = 0;
> + int error;
> + struct xfs_buf *bp;
> + struct xfs_agf *agf;
> +
> + error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
> + XFS_AG_DADDR(mp, 0, XFS_AGF_DADDR(mp)),
> + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
You're not checking what the code you wrote actually does.
#define XFS_AGB_TO_DADDR(mp,agno,agbno) \
((xfs_daddr_t)XFS_FSB_TO_BB(mp, \
(xfs_fsblock_t)(agno) * (mp)->m_sb.sb_agblocks + (agbno)))
#define XFS_AG_DADDR(mp,agno,d) (XFS_AGB_TO_DADDR(mp, agno, 0) + (d))
Yup, XFS_AG_DADDR() uses the very field we've already decided is
invalid....
In these cases where we specifically want to read from the AG 0
headers and we can't trust anything in the superblock, we need to
open code the daddrs we need to read from.
In this case the daddr of AGF 0 is simply XFS_AGF_DADDR(mp). And we
don't need XFS_FSS_TO_BB(), either - we can simply read a single
sector at that location as the AGF and AGI headers always fit in a
single sector.
Hence this should be:
error = -libxfs_buf_read_uncached(mp->m_ddev_targp, XFS_AGF_DADDR(mp),
1, 0, &bp, NULL);
> + if (error) {
> + fprintf(stderr, "xfs_get_agblock from agf-0 error %d\n", error);
> + return agblocks;
Return NULLAGBLOCK on failure.
> + }
> +
> + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGF_CRC_OFF)) {
> + fprintf(stderr, "xfs_get_agblock from agf-0 badcrc\n");
> + return agblocks;
> + }
We can't trust the CRC to discover corruption here - syzbot corrupts
structures and then recalculates the CRC. Hence a good CRC is not an
indication of a valid, uncorrupted structure. And ignoring the CRC
allows a single sector read....
> + agf = bp->b_addr;
> + agblocks = be32_to_cpu(agf->agf_length);
This is missing magic number checks to determine if the metadata
returned is actually an AGF block.
> +
> + libxfs_buf_relse(bp);
> +
> + return agblocks;
> +}
> +
> +static xfs_agblock_t
> +xfs_get_agblock_from_agi(struct xfs_mount *mp)
> +{
> + xfs_agblock_t agblocks = 0;
> + int error;
> + struct xfs_buf *bp;
> + struct xfs_agi *agi;
> +
> + error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
> + XFS_AG_DADDR(mp, 0, XFS_AGI_DADDR(mp)),
> + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL);
> + if (error) {
> + fprintf(stderr, "xfs_get_agblock from agi-0 error %d\n", error);
> + return agblocks;
> + }
> +
> + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGI_CRC_OFF)) {
> + fprintf(stderr, "xfs_get_agblock from agi-0 badcrc\n");
> + return agblocks;
> + }
> +
> + agi = bp->b_addr;
> + agblocks = be32_to_cpu(agi->agi_length);
> +
> + libxfs_buf_relse(bp);
> +
> + return agblocks;
> +}
> +
> +/*
> + * If sb_agblocks was damaged, try to read it from agf/agi 0.
> + * With read agf/agi fails use default geometry to calc agblocks/agcount.
> + * The worst thing is cannot get agblocks according above method, then set to 1.
> + */
> +static xfs_agblock_t
> +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x)
> +{
> + xfs_agblock_t agblocks;
> + uint64_t agsize, agcount;
> +
> + /* firset try to get agblocks from agf-0 */
> + agblocks = xfs_get_agblock_from_agf(mp);
> + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES &&
Ah, did you look at what XFS_FSB_TO_B() requires to be set?
#define XFS_FSB_TO_B(mp,fsbno) ((xfs_fsize_t)(fsbno) << (mp)->m_sb.sb_blocklog)
Essentially, detected that the superblock is damaged by using an
adjacent field in the superblock (which also may be damaged) to do
the validity check isn't reliable here.
We have XFS_MIN_AG_BLOCKS and XFS_MAX_AG_BLOCKS constants defined,
so that is what the validity checks here should use...
> + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES)
> + return agblocks;
We shouldn't return agblocks here - we need to validate that it is
the same as the AGI length because we can't trust a single value to
be correct if we have corruption in the headers.
i.e. if both AGI and AGF magic numbers are valid, and the lengths
match, it's a pretty good indication that this is the actual length
of the AG. It just doesn't matter if the AGF/AGI are
otherwise damaged and fail CRCs or pass the CRCs because of
malicious corruption, but if they match we can probably trust it to
be correct for the purposes of diagnosis.
> + /* second try to get agblocks from agi-0 */
> + agblocks = xfs_get_agblock_from_agi(mp);
> + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES &&
> + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES)
> + return agblocks;
> +
> + /* third use default geometry to calc agblocks/agcount */
> + xfs_guess_default_ag_geometry(&agsize, &agcount, x);
> +
> + if (XFS_FSB_TO_B(mp, agsize) < XFS_MIN_AG_BYTES ||
> + XFS_FSB_TO_B(mp, agsize) > XFS_MAX_AG_BYTES)
> + agblocks = 1; /* the worst is set to 1 */
*no*.
agblocks = 1 is *not a valid size* - why set an out-of-bounds value
for something we are already know is out-of-bounds?
Further, xfs_guess_default_ag_geometry() should never return an
invalid AG size unless the block device is under the minimum size
of an XFS filesystem. In which case, there's really nothing we can
do and should actually abort this saying "device too small to hold a
valid XFS filesystem".
> + else
> + agblocks = agsize;
The resultant should be probably be compared against the lengths
found from the AGI/AGF checks. If it matches one of them, we have
more confidence that this is the correct length.
> +
> + return agblocks;
> +}
> +
> static void
> init(
> int argc,
> @@ -129,6 +244,19 @@ init(
> }
> }
>
> + /* If sb_agblocks was damaged, try to get agblocks */
> + if (XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) < XFS_MIN_AG_BYTES ||
> + XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) > XFS_MAX_AG_BYTES) {
> + xfs_agblock_t agblocks;
> + xfs_agblock_t bad_agblocks = sbp->sb_agblocks;
> +
> + agblocks = xfs_try_get_agblocks(&xmount, &x);
> + sbp->sb_agblocks = agblocks;
> +
> + fprintf(stderr, "wrong agblocks, try to get from agblocks %u -> %u\n",
> + bad_agblocks, sbp->sb_agblocks);
What does this error message mean? If I saw that, I wouldn't have a
clue what it means.
I'd expect output something like this on a AGf/AGI recovery:
Out of bounds superblock agblocks (%u) found.
Attempting recovery from AGF/AGI 0 metadata....
AGF length %u blocks found.
AGI length %u blocks found.
AGF/AGI length matches.
Using %u blocks for superblock agblocks.
Or if AGF or AGI recovery fails:
Out of bounds superblock agblocks (%u) found.
Attempting recovery from AGF/AGI 0 metadata....
AGF length recovery failed.
AGI length %u blocks found.
Attempting to guess AG length from device geometry. This may not work.
Guessed AG length is %u blocks.
Guessed length matches AGI length.
Using %u blocks for superblock agblocks.
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] xfs_db: make sure agblocks is valid to prevent corruption
2024-09-03 2:28 ` Dave Chinner
@ 2024-09-03 10:24 ` liuhuan01
2024-09-16 6:55 ` Carlos Maiolino
0 siblings, 1 reply; 10+ messages in thread
From: liuhuan01 @ 2024-09-03 10:24 UTC (permalink / raw)
To: david; +Cc: cmaiolino, djwong, linux-xfs, liuh
From: liuh <liuhuan01@kylinos.cn>
Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
System will generate signal SIGFPE corrupt the process. And the stack as follow:
corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
#0 libxfs_getbuf_flags
#1 libxfs_getbuf_flags
#2 libxfs_buf_read_map
#3 libxfs_buf_read
#4 libxfs_mount
#5 init
#6 main
The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
In this case, user cannot run in expert mode also.
So, try to get agblocks from agf/agi 0, if agf/agi 0 length match, use it as agblocks.
If failed use the default geometry to calc agblocks.
Signed-off-by: liuh <liuhuan01@kylinos.cn>
---
db/Makefile | 2 +-
db/init.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 143 insertions(+), 1 deletion(-)
diff --git a/db/Makefile b/db/Makefile
index 83389376..322d5617 100644
--- a/db/Makefile
+++ b/db/Makefile
@@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \
LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \
- $(LIBPTHREAD)
+ $(LIBPTHREAD) $(LIBBLKID)
LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG)
LLDFLAGS += -static-libtool-libs
diff --git a/db/init.c b/db/init.c
index cea25ae5..167bc777 100644
--- a/db/init.c
+++ b/db/init.c
@@ -38,6 +38,138 @@ usage(void)
exit(1);
}
+static void
+xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x)
+{
+ struct fs_topology ft;
+ int blocklog;
+ uint64_t dblocks;
+ int multidisk;
+
+ fprintf(stderr, "Attempting to guess AG length from device geometry. This may not work.\n");
+
+ memset(&ft, 0, sizeof(ft));
+ get_topology(x, &ft, 1);
+
+ /*
+ * get geometry from get_topology result.
+ * Use default block size (2^12)
+ */
+ blocklog = 12;
+ multidisk = ft.data.swidth | ft.data.sunit;
+ dblocks = x->data.size >> (blocklog - BBSHIFT);
+ calc_default_ag_geometry(blocklog, dblocks, multidisk,
+ agsize, agcount);
+
+ if (*agsize >= XFS_MIN_AG_BLOCKS && *agsize <= XFS_MAX_AG_BLOCKS)
+ fprintf(stderr, "Guessed AG length is %lu blocks.\n", *agsize);
+}
+
+static xfs_agblock_t
+xfs_get_agblock_from_agf(struct xfs_mount *mp)
+{
+ xfs_agblock_t agblocks = NULLAGBLOCK;
+ int error;
+ struct xfs_buf *bp;
+ struct xfs_agf *agf;
+
+ error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
+ XFS_AGF_DADDR(mp), 1, 0, &bp, NULL);
+ if (error) {
+ fprintf(stderr, "AGF 0 length recovery failed\n");
+ return NULLAGBLOCK;
+ }
+
+ agf = bp->b_addr;
+ if (be32_to_cpu(agf->agf_magicnum) == XFS_AGF_MAGIC)
+ agblocks = be32_to_cpu(agf->agf_length);
+
+ libxfs_buf_relse(bp);
+
+ if (agblocks != NULLAGBLOCK)
+ fprintf(stderr, "AGF 0 length %u blocks found.\n", agblocks);
+ else
+ fprintf(stderr, "AGF 0 length recovery failed.\n");
+
+ return agblocks;
+}
+
+static xfs_agblock_t
+xfs_get_agblock_from_agi(struct xfs_mount *mp)
+{
+ xfs_agblock_t agblocks = NULLAGBLOCK;
+ int error;
+ struct xfs_buf *bp;
+ struct xfs_agi *agi;
+
+ error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
+ XFS_AGI_DADDR(mp), 1, 0, &bp, NULL);
+ if (error) {
+ fprintf(stderr, "AGI 0 length recovery failed\n");
+ return NULLAGBLOCK;
+ }
+
+
+ agi = bp->b_addr;
+ if (be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC)
+ agblocks = be32_to_cpu(agi->agi_length);
+
+ libxfs_buf_relse(bp);
+
+ if (agblocks != NULLAGBLOCK)
+ fprintf(stderr, "AGI 0 length %u blocks found.\n", agblocks);
+ else
+ fprintf(stderr, "AGI 0 length recovery failed.\n");
+
+ return agblocks;
+}
+
+/*
+ * Try to get it from agf/agi length when primary superblock agblocks damaged.
+ * If agf matchs agi length, use it as agblocks, otherwise use the default geometry
+ * to calc agblocks
+ */
+static xfs_agblock_t
+xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x)
+{
+ xfs_agblock_t agblocks = NULLAGBLOCK;
+ xfs_agblock_t agblocks_agf, agblocks_agi;
+ uint64_t agsize, agcount;
+
+ fprintf(stderr, "Attempting recovery from AGF/AGI 0 metadata...\n");
+
+ agblocks_agf = xfs_get_agblock_from_agf(mp);
+ agblocks_agi = xfs_get_agblock_from_agi(mp);
+
+ if (agblocks_agf == agblocks_agi && agblocks_agf >= XFS_MIN_AG_BLOCKS && agblocks_agf <= XFS_MAX_AG_BLOCKS) {
+ fprintf(stderr, "AGF/AGI 0 length matches.\n");
+ fprintf(stderr, "Using %u blocks for superblock agblocks\n", agblocks_agf);
+ return agblocks_agf;
+ }
+
+ /* use default geometry to calc agblocks/agcount */
+ xfs_guess_default_ag_geometry(&agsize, &agcount, x);
+
+ /* choose the agblocks among agf/agi length and agsize */
+ if (agblocks_agf == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) {
+ fprintf(stderr, "Guessed AG matchs AGF length\n");
+ agblocks = agsize;
+ } else if (agblocks_agi == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) {
+ fprintf(stderr, "Guessed AG matchs AGI length\n");
+ agblocks = agsize;
+ } else if (agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) {
+ fprintf(stderr, "Guessed AG does not match AGF/AGI 0 length.\n");
+ agblocks = agsize;
+ } else {
+ fprintf(stderr, "_(%s: device too small to hold a valid XFS filesystem)", progname);
+ exit(1);
+ }
+
+ fprintf(stderr, "Using %u blocks for superblock agblocks.\n", agblocks);
+
+ return agblocks;
+}
+
static void
init(
int argc,
@@ -129,6 +261,16 @@ init(
}
}
+ /* If sb_agblocks was damaged, try to get agblocks */
+ if (sbp->sb_agblocks < XFS_MIN_AG_BLOCKS || sbp->sb_agblocks > XFS_MAX_AG_BLOCKS) {
+ xfs_agblock_t agblocks;
+
+ fprintf(stderr, "Out of bounds superblock agblocks (%u) found.\n", sbp->sb_agblocks);
+
+ agblocks = xfs_try_get_agblocks(&xmount, &x);
+ sbp->sb_agblocks = agblocks;
+ }
+
agcount = sbp->sb_agcount;
mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
if (!mp) {
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3] xfs_db: make sure agblocks is valid to prevent corruption
2024-09-03 10:24 ` [PATCH v3] " liuhuan01
@ 2024-09-16 6:55 ` Carlos Maiolino
0 siblings, 0 replies; 10+ messages in thread
From: Carlos Maiolino @ 2024-09-16 6:55 UTC (permalink / raw)
To: liuhuan01; +Cc: david, cmaiolino, djwong, linux-xfs
On Tue, Sep 03, 2024 at 06:24:02PM GMT, liuhuan01@kylinos.cn wrote:
> From: liuh <liuhuan01@kylinos.cn>
>
> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process.
> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1
Please, don't send new versions in reply to old versions.
Carlos
>
> System will generate signal SIGFPE corrupt the process. And the stack as follow:
> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags
> #0 libxfs_getbuf_flags
> #1 libxfs_getbuf_flags
> #2 libxfs_buf_read_map
> #3 libxfs_buf_read
> #4 libxfs_mount
> #5 init
> #6 main
>
> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0.
> In this case, user cannot run in expert mode also.
>
> So, try to get agblocks from agf/agi 0, if agf/agi 0 length match, use it as agblocks.
> If failed use the default geometry to calc agblocks.
>
> Signed-off-by: liuh <liuhuan01@kylinos.cn>
> ---
> db/Makefile | 2 +-
> db/init.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 143 insertions(+), 1 deletion(-)
>
> diff --git a/db/Makefile b/db/Makefile
> index 83389376..322d5617 100644
> --- a/db/Makefile
> +++ b/db/Makefile
> @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \
> LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh
>
> LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \
> - $(LIBPTHREAD)
> + $(LIBPTHREAD) $(LIBBLKID)
> LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG)
> LLDFLAGS += -static-libtool-libs
>
> diff --git a/db/init.c b/db/init.c
> index cea25ae5..167bc777 100644
> --- a/db/init.c
> +++ b/db/init.c
> @@ -38,6 +38,138 @@ usage(void)
> exit(1);
> }
>
> +static void
> +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x)
> +{
> + struct fs_topology ft;
> + int blocklog;
> + uint64_t dblocks;
> + int multidisk;
> +
> + fprintf(stderr, "Attempting to guess AG length from device geometry. This may not work.\n");
> +
> + memset(&ft, 0, sizeof(ft));
> + get_topology(x, &ft, 1);
> +
> + /*
> + * get geometry from get_topology result.
> + * Use default block size (2^12)
> + */
> + blocklog = 12;
> + multidisk = ft.data.swidth | ft.data.sunit;
> + dblocks = x->data.size >> (blocklog - BBSHIFT);
> + calc_default_ag_geometry(blocklog, dblocks, multidisk,
> + agsize, agcount);
> +
> + if (*agsize >= XFS_MIN_AG_BLOCKS && *agsize <= XFS_MAX_AG_BLOCKS)
> + fprintf(stderr, "Guessed AG length is %lu blocks.\n", *agsize);
> +}
> +
> +static xfs_agblock_t
> +xfs_get_agblock_from_agf(struct xfs_mount *mp)
> +{
> + xfs_agblock_t agblocks = NULLAGBLOCK;
> + int error;
> + struct xfs_buf *bp;
> + struct xfs_agf *agf;
> +
> + error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
> + XFS_AGF_DADDR(mp), 1, 0, &bp, NULL);
> + if (error) {
> + fprintf(stderr, "AGF 0 length recovery failed\n");
> + return NULLAGBLOCK;
> + }
> +
> + agf = bp->b_addr;
> + if (be32_to_cpu(agf->agf_magicnum) == XFS_AGF_MAGIC)
> + agblocks = be32_to_cpu(agf->agf_length);
> +
> + libxfs_buf_relse(bp);
> +
> + if (agblocks != NULLAGBLOCK)
> + fprintf(stderr, "AGF 0 length %u blocks found.\n", agblocks);
> + else
> + fprintf(stderr, "AGF 0 length recovery failed.\n");
> +
> + return agblocks;
> +}
> +
> +static xfs_agblock_t
> +xfs_get_agblock_from_agi(struct xfs_mount *mp)
> +{
> + xfs_agblock_t agblocks = NULLAGBLOCK;
> + int error;
> + struct xfs_buf *bp;
> + struct xfs_agi *agi;
> +
> + error = -libxfs_buf_read_uncached(mp->m_ddev_targp,
> + XFS_AGI_DADDR(mp), 1, 0, &bp, NULL);
> + if (error) {
> + fprintf(stderr, "AGI 0 length recovery failed\n");
> + return NULLAGBLOCK;
> + }
> +
> +
> + agi = bp->b_addr;
> + if (be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC)
> + agblocks = be32_to_cpu(agi->agi_length);
> +
> + libxfs_buf_relse(bp);
> +
> + if (agblocks != NULLAGBLOCK)
> + fprintf(stderr, "AGI 0 length %u blocks found.\n", agblocks);
> + else
> + fprintf(stderr, "AGI 0 length recovery failed.\n");
> +
> + return agblocks;
> +}
> +
> +/*
> + * Try to get it from agf/agi length when primary superblock agblocks damaged.
> + * If agf matchs agi length, use it as agblocks, otherwise use the default geometry
> + * to calc agblocks
> + */
> +static xfs_agblock_t
> +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x)
> +{
> + xfs_agblock_t agblocks = NULLAGBLOCK;
> + xfs_agblock_t agblocks_agf, agblocks_agi;
> + uint64_t agsize, agcount;
> +
> + fprintf(stderr, "Attempting recovery from AGF/AGI 0 metadata...\n");
> +
> + agblocks_agf = xfs_get_agblock_from_agf(mp);
> + agblocks_agi = xfs_get_agblock_from_agi(mp);
> +
> + if (agblocks_agf == agblocks_agi && agblocks_agf >= XFS_MIN_AG_BLOCKS && agblocks_agf <= XFS_MAX_AG_BLOCKS) {
> + fprintf(stderr, "AGF/AGI 0 length matches.\n");
> + fprintf(stderr, "Using %u blocks for superblock agblocks\n", agblocks_agf);
> + return agblocks_agf;
> + }
> +
> + /* use default geometry to calc agblocks/agcount */
> + xfs_guess_default_ag_geometry(&agsize, &agcount, x);
> +
> + /* choose the agblocks among agf/agi length and agsize */
> + if (agblocks_agf == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) {
> + fprintf(stderr, "Guessed AG matchs AGF length\n");
> + agblocks = agsize;
> + } else if (agblocks_agi == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) {
> + fprintf(stderr, "Guessed AG matchs AGI length\n");
> + agblocks = agsize;
> + } else if (agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) {
> + fprintf(stderr, "Guessed AG does not match AGF/AGI 0 length.\n");
> + agblocks = agsize;
> + } else {
> + fprintf(stderr, "_(%s: device too small to hold a valid XFS filesystem)", progname);
> + exit(1);
> + }
> +
> + fprintf(stderr, "Using %u blocks for superblock agblocks.\n", agblocks);
> +
> + return agblocks;
> +}
> +
> static void
> init(
> int argc,
> @@ -129,6 +261,16 @@ init(
> }
> }
>
> + /* If sb_agblocks was damaged, try to get agblocks */
> + if (sbp->sb_agblocks < XFS_MIN_AG_BLOCKS || sbp->sb_agblocks > XFS_MAX_AG_BLOCKS) {
> + xfs_agblock_t agblocks;
> +
> + fprintf(stderr, "Out of bounds superblock agblocks (%u) found.\n", sbp->sb_agblocks);
> +
> + agblocks = xfs_try_get_agblocks(&xmount, &x);
> + sbp->sb_agblocks = agblocks;
> + }
> +
> agcount = sbp->sb_agcount;
> mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER);
> if (!mp) {
> --
> 2.43.0
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-09-16 6:55 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-21 10:44 [PATCH] xfs_db: make sure agblocks is valid to prevent corruption liuhuan01
2024-08-23 0:49 ` Darrick J. Wong
2024-08-27 3:23 ` Dave Chinner
2024-08-27 10:24 ` liuh
2024-08-27 23:37 ` Dave Chinner
2024-09-02 10:12 ` [PATCH v2] " liuhuan01
2024-09-02 18:56 ` Darrick J. Wong
2024-09-03 2:28 ` Dave Chinner
2024-09-03 10:24 ` [PATCH v3] " liuhuan01
2024-09-16 6:55 ` Carlos Maiolino
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox