* [PATCH] xfs_db: make sure agblocks is valid to prevent corruption @ 2024-08-21 10:44 liuhuan01 2024-08-23 0:49 ` Darrick J. Wong 0 siblings, 1 reply; 10+ messages in thread From: liuhuan01 @ 2024-08-21 10:44 UTC (permalink / raw) To: linux-xfs; +Cc: djwong, cmaiolino, liuh From: liuh <liuhuan01@kylinos.cn> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 System will generate signal SIGFPE corrupt the process. And the stack as follow: corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags #0 libxfs_getbuf_flags #1 libxfs_getbuf_flags #2 libxfs_buf_read_map #3 libxfs_buf_read #4 libxfs_mount #5 init #6 main The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. In this case, user cannot run in expert mode also. Never check (mp)->m_sb.sb_agblocks before use it cause this issue. Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. Signed-off-by: liuh <liuhuan01@kylinos.cn> --- db/init.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/db/init.c b/db/init.c index cea25ae5..2d3295ba 100644 --- a/db/init.c +++ b/db/init.c @@ -129,6 +129,13 @@ init( } } + if (unlikely(sbp->sb_agblocks == 0)) { + fprintf(stderr, + _("%s: device %s agblocks unexpected\n"), + progname, x.data.name); + exit(1); + } + agcount = sbp->sb_agcount; mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); if (!mp) { -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption 2024-08-21 10:44 [PATCH] xfs_db: make sure agblocks is valid to prevent corruption liuhuan01 @ 2024-08-23 0:49 ` Darrick J. Wong 2024-08-27 3:23 ` Dave Chinner 0 siblings, 1 reply; 10+ messages in thread From: Darrick J. Wong @ 2024-08-23 0:49 UTC (permalink / raw) To: liuhuan01; +Cc: linux-xfs, cmaiolino On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: > From: liuh <liuhuan01@kylinos.cn> > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > #0 libxfs_getbuf_flags > #1 libxfs_getbuf_flags > #2 libxfs_buf_read_map > #3 libxfs_buf_read > #4 libxfs_mount > #5 init > #6 main > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > In this case, user cannot run in expert mode also. > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue. > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > --- > db/init.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/db/init.c b/db/init.c > index cea25ae5..2d3295ba 100644 > --- a/db/init.c > +++ b/db/init.c > @@ -129,6 +129,13 @@ init( > } > } > > + if (unlikely(sbp->sb_agblocks == 0)) { > + fprintf(stderr, > + _("%s: device %s agblocks unexpected\n"), > + progname, x.data.name); > + exit(1); What if we set sb_agblocks to 1 and let the debugger continue? --D > + } > + > agcount = sbp->sb_agcount; > mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); > if (!mp) { > -- > 2.43.0 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption 2024-08-23 0:49 ` Darrick J. Wong @ 2024-08-27 3:23 ` Dave Chinner 2024-08-27 10:24 ` liuh 0 siblings, 1 reply; 10+ messages in thread From: Dave Chinner @ 2024-08-27 3:23 UTC (permalink / raw) To: Darrick J. Wong; +Cc: liuhuan01, linux-xfs, cmaiolino On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote: > On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: > > From: liuh <liuhuan01@kylinos.cn> > > > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > > #0 libxfs_getbuf_flags > > #1 libxfs_getbuf_flags > > #2 libxfs_buf_read_map > > #3 libxfs_buf_read > > #4 libxfs_mount > > #5 init > > #6 main > > > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > > In this case, user cannot run in expert mode also. > > > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue. > > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. > > > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > > --- > > db/init.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/db/init.c b/db/init.c > > index cea25ae5..2d3295ba 100644 > > --- a/db/init.c > > +++ b/db/init.c > > @@ -129,6 +129,13 @@ init( > > } > > } > > > > + if (unlikely(sbp->sb_agblocks == 0)) { > > + fprintf(stderr, > > + _("%s: device %s agblocks unexpected\n"), > > + progname, x.data.name); > > + exit(1); > > What if we set sb_agblocks to 1 and let the debugger continue? Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and maybe crash unexpectedly than to refuse to allow any diagnosis of the corrupt filesystem. xfs_db is a debug and forensic analysis tool. Having it crash because it didn't handle some corruption entirely corectly isn't something that we should be particularly worried about... -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption 2024-08-27 3:23 ` Dave Chinner @ 2024-08-27 10:24 ` liuh 2024-08-27 23:37 ` Dave Chinner 0 siblings, 1 reply; 10+ messages in thread From: liuh @ 2024-08-27 10:24 UTC (permalink / raw) To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs, cmaiolino 在 2024/8/27 11:23, Dave Chinner 写道: > On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote: >> On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: >>> From: liuh <liuhuan01@kylinos.cn> >>> >>> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. >>> xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 >>> >>> System will generate signal SIGFPE corrupt the process. And the stack as follow: >>> corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags >>> #0 libxfs_getbuf_flags >>> #1 libxfs_getbuf_flags >>> #2 libxfs_buf_read_map >>> #3 libxfs_buf_read >>> #4 libxfs_mount >>> #5 init >>> #6 main >>> >>> The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. >>> In this case, user cannot run in expert mode also. >>> >>> Never check (mp)->m_sb.sb_agblocks before use it cause this issue. >>> Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. >>> >>> Signed-off-by: liuh <liuhuan01@kylinos.cn> >>> --- >>> db/init.c | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/db/init.c b/db/init.c >>> index cea25ae5..2d3295ba 100644 >>> --- a/db/init.c >>> +++ b/db/init.c >>> @@ -129,6 +129,13 @@ init( >>> } >>> } >>> >>> + if (unlikely(sbp->sb_agblocks == 0)) { >>> + fprintf(stderr, >>> + _("%s: device %s agblocks unexpected\n"), >>> + progname, x.data.name); >>> + exit(1); >> What if we set sb_agblocks to 1 and let the debugger continue? > Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and > maybe crash unexpectedly than to refuse to allow any diagnosis of > the corrupt filesystem. > > xfs_db is a debug and forensic analysis tool. Having it crash > because it didn't handle some corruption entirely corectly isn't > something that we should be particularly worried about... > > -Dave. I agree with both of you, xfs_db is just a debugger tool. But for the above case, xfs_db can do nothing, even do a simple view on primary superblock. The user all knowns is that xfs_db goto corrupt, but don't know what's cause the problem. If set sb_agblocks to 1, xfs_db can going to work to view on primary superblock, but can't relay on it to view more information. Maybe left a warning message and set sb_agblocks to 1 and let debugger continue is better. Thanks. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] xfs_db: make sure agblocks is valid to prevent corruption 2024-08-27 10:24 ` liuh @ 2024-08-27 23:37 ` Dave Chinner 2024-09-02 10:12 ` [PATCH v2] " liuhuan01 0 siblings, 1 reply; 10+ messages in thread From: Dave Chinner @ 2024-08-27 23:37 UTC (permalink / raw) To: liuh; +Cc: Darrick J. Wong, linux-xfs, cmaiolino On Tue, Aug 27, 2024 at 06:24:25PM +0800, liuh wrote: > 在 2024/8/27 11:23, Dave Chinner 写道: > > On Thu, Aug 22, 2024 at 05:49:12PM -0700, Darrick J. Wong wrote: > > > On Wed, Aug 21, 2024 at 06:44:12PM +0800, liuhuan01@kylinos.cn wrote: > > > > From: liuh <liuhuan01@kylinos.cn> > > > > > > > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > > > > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > > > > > > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > > > > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > > > > #0 libxfs_getbuf_flags > > > > #1 libxfs_getbuf_flags > > > > #2 libxfs_buf_read_map > > > > #3 libxfs_buf_read > > > > #4 libxfs_mount > > > > #5 init > > > > #6 main > > > > > > > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > > > > In this case, user cannot run in expert mode also. > > > > > > > > Never check (mp)->m_sb.sb_agblocks before use it cause this issue. > > > > Make sure (mp)->m_sb.sb_agblocks > 0 before libxfs_mount to prevent corruption and leave a message. > > > > > > > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > > > > --- > > > > db/init.c | 7 +++++++ > > > > 1 file changed, 7 insertions(+) > > > > > > > > diff --git a/db/init.c b/db/init.c > > > > index cea25ae5..2d3295ba 100644 > > > > --- a/db/init.c > > > > +++ b/db/init.c > > > > @@ -129,6 +129,13 @@ init( > > > > } > > > > } > > > > + if (unlikely(sbp->sb_agblocks == 0)) { > > > > + fprintf(stderr, > > > > + _("%s: device %s agblocks unexpected\n"), > > > > + progname, x.data.name); > > > > + exit(1); > > > What if we set sb_agblocks to 1 and let the debugger continue? > > Yeah, I'd prefer that xfs_db will operate on a corrupt filesystem and > > maybe crash unexpectedly than to refuse to allow any diagnosis of > > the corrupt filesystem. > > > > xfs_db is a debug and forensic analysis tool. Having it crash > > because it didn't handle some corruption entirely corectly isn't > > something that we should be particularly worried about... > > > > -Dave. > > I agree with both of you, xfs_db is just a debugger tool. > But for the above case, xfs_db can do nothing, even do a simple view on > primary superblock. > The user all knowns is that xfs_db goto corrupt, but don't know what's cause > the problem. > > If set sb_agblocks to 1, xfs_db can going to work to view on primary > superblock, > but can't relay on it to view more information. Yes, a value of "1" will avoid the crash. However, it is obviously not an ideal choice because a value of 1 is not a valid value for sb_agblocks. IOWs, Darricks suggestion really was not to literally use a value of 1, but to substitute a non-zero value that allows xfs_db to largely function correctly on a filesystem corrupted in this way. > Maybe left a warning message and set sb_agblocks to 1 and let debugger > continue is better. When two senior developers both say "do it this way", it is worth trying to understand why they made that suggestion, not take the suggestion as the exact solution to the problem. We've provided the -framework- for the desired solution, but you still need to do some work to make it work. So what is a valid value for sb_agblocks? Is that value more functional that a value of 1? If it is more functional, can we calculate a better approximation of the correct value? Go an look at all the geometry information we have in the superblock that we can calculate an approximate AG size from. sb_agcount tells us how many AGs there are. sb_dblocks tells us the size of the filesystem. sb_agblklog tells us the maximum size the AG could possibly be. Go look for redundant metadata that we can get the exact value of agblocks from without relying on a specific sb_agblocks value. agf->agf_length should always equal sb_agblocks agi->agi_length should always equal sb_agblocks Look at the sb_agblocks verification/bounds checking code to determine absolute min/max valid values. XFS_MIN_AG_BYTES tells us the smallest valid AG size. XFS_MIN_AG_BLOCKS tells us the smallest valid AG block count. So, how would you go about calculating an approximate, if not exactly correct value for the missing sb_agblocks value? -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] xfs_db: make sure agblocks is valid to prevent corruption 2024-08-27 23:37 ` Dave Chinner @ 2024-09-02 10:12 ` liuhuan01 2024-09-02 18:56 ` Darrick J. Wong 2024-09-03 2:28 ` Dave Chinner 0 siblings, 2 replies; 10+ messages in thread From: liuhuan01 @ 2024-09-02 10:12 UTC (permalink / raw) To: david; +Cc: cmaiolino, djwong, linux-xfs, liuh From: liuh <liuhuan01@kylinos.cn> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 System will generate signal SIGFPE corrupt the process. And the stack as follow: corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags #0 libxfs_getbuf_flags #1 libxfs_getbuf_flags #2 libxfs_buf_read_map #3 libxfs_buf_read #4 libxfs_mount #5 init #6 main The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. In this case, user cannot run in expert mode also. So, try to get agblocks from agf/agi 0, if failed use the default geometry to calc agblocks. The worst thing is cannot get agblocks accroding above method, then set it to 1. Signed-off-by: liuh <liuhuan01@kylinos.cn> --- db/Makefile | 2 +- db/init.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 129 insertions(+), 1 deletion(-) diff --git a/db/Makefile b/db/Makefile index 83389376..322d5617 100644 --- a/db/Makefile +++ b/db/Makefile @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \ LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \ - $(LIBPTHREAD) + $(LIBPTHREAD) $(LIBBLKID) LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG) LLDFLAGS += -static-libtool-libs diff --git a/db/init.c b/db/init.c index cea25ae5..15124ee2 100644 --- a/db/init.c +++ b/db/init.c @@ -38,6 +38,121 @@ usage(void) exit(1); } +static void +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x) +{ + struct fs_topology ft; + int blocklog; + uint64_t dblocks; + int multidisk; + + memset(&ft, 0, sizeof(ft)); + get_topology(x, &ft, 1); + + /* + * get geometry from get_topology result. + * Use default block size (2^12) + */ + blocklog = 12; + multidisk = ft.data.swidth | ft.data.sunit; + dblocks = x->data.size >> (blocklog - BBSHIFT); + calc_default_ag_geometry(blocklog, dblocks, multidisk, + agsize, agcount); +} + +static xfs_agblock_t +xfs_get_agblock_from_agf(struct xfs_mount *mp) +{ + xfs_agblock_t agblocks = 0; + int error; + struct xfs_buf *bp; + struct xfs_agf *agf; + + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, + XFS_AG_DADDR(mp, 0, XFS_AGF_DADDR(mp)), + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL); + if (error) { + fprintf(stderr, "xfs_get_agblock from agf-0 error %d\n", error); + return agblocks; + } + + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGF_CRC_OFF)) { + fprintf(stderr, "xfs_get_agblock from agf-0 badcrc\n"); + return agblocks; + } + + agf = bp->b_addr; + agblocks = be32_to_cpu(agf->agf_length); + + libxfs_buf_relse(bp); + + return agblocks; +} + +static xfs_agblock_t +xfs_get_agblock_from_agi(struct xfs_mount *mp) +{ + xfs_agblock_t agblocks = 0; + int error; + struct xfs_buf *bp; + struct xfs_agi *agi; + + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, + XFS_AG_DADDR(mp, 0, XFS_AGI_DADDR(mp)), + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL); + if (error) { + fprintf(stderr, "xfs_get_agblock from agi-0 error %d\n", error); + return agblocks; + } + + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGI_CRC_OFF)) { + fprintf(stderr, "xfs_get_agblock from agi-0 badcrc\n"); + return agblocks; + } + + agi = bp->b_addr; + agblocks = be32_to_cpu(agi->agi_length); + + libxfs_buf_relse(bp); + + return agblocks; +} + +/* + * If sb_agblocks was damaged, try to read it from agf/agi 0. + * With read agf/agi fails use default geometry to calc agblocks/agcount. + * The worst thing is cannot get agblocks according above method, then set to 1. + */ +static xfs_agblock_t +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x) +{ + xfs_agblock_t agblocks; + uint64_t agsize, agcount; + + /* firset try to get agblocks from agf-0 */ + agblocks = xfs_get_agblock_from_agf(mp); + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES && + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES) + return agblocks; + + /* second try to get agblocks from agi-0 */ + agblocks = xfs_get_agblock_from_agi(mp); + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES && + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES) + return agblocks; + + /* third use default geometry to calc agblocks/agcount */ + xfs_guess_default_ag_geometry(&agsize, &agcount, x); + + if (XFS_FSB_TO_B(mp, agsize) < XFS_MIN_AG_BYTES || + XFS_FSB_TO_B(mp, agsize) > XFS_MAX_AG_BYTES) + agblocks = 1; /* the worst is set to 1 */ + else + agblocks = agsize; + + return agblocks; +} + static void init( int argc, @@ -129,6 +244,19 @@ init( } } + /* If sb_agblocks was damaged, try to get agblocks */ + if (XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) < XFS_MIN_AG_BYTES || + XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) > XFS_MAX_AG_BYTES) { + xfs_agblock_t agblocks; + xfs_agblock_t bad_agblocks = sbp->sb_agblocks; + + agblocks = xfs_try_get_agblocks(&xmount, &x); + sbp->sb_agblocks = agblocks; + + fprintf(stderr, "wrong agblocks, try to get from agblocks %u -> %u\n", + bad_agblocks, sbp->sb_agblocks); + } + agcount = sbp->sb_agcount; mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); if (!mp) { -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] xfs_db: make sure agblocks is valid to prevent corruption 2024-09-02 10:12 ` [PATCH v2] " liuhuan01 @ 2024-09-02 18:56 ` Darrick J. Wong 2024-09-03 2:28 ` Dave Chinner 1 sibling, 0 replies; 10+ messages in thread From: Darrick J. Wong @ 2024-09-02 18:56 UTC (permalink / raw) To: liuhuan01; +Cc: david, cmaiolino, linux-xfs On Mon, Sep 02, 2024 at 06:12:39PM +0800, liuhuan01@kylinos.cn wrote: > From: liuh <liuhuan01@kylinos.cn> > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > #0 libxfs_getbuf_flags > #1 libxfs_getbuf_flags > #2 libxfs_buf_read_map > #3 libxfs_buf_read > #4 libxfs_mount > #5 init > #6 main > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > In this case, user cannot run in expert mode also. > > So, try to get agblocks from agf/agi 0, if failed use the default geometry to calc agblocks. > The worst thing is cannot get agblocks accroding above method, then set it to 1. > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > --- > db/Makefile | 2 +- > db/init.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 129 insertions(+), 1 deletion(-) > > diff --git a/db/Makefile b/db/Makefile > index 83389376..322d5617 100644 > --- a/db/Makefile > +++ b/db/Makefile > @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \ > LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh > > LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \ > - $(LIBPTHREAD) > + $(LIBPTHREAD) $(LIBBLKID) > LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG) > LLDFLAGS += -static-libtool-libs > > diff --git a/db/init.c b/db/init.c > index cea25ae5..15124ee2 100644 > --- a/db/init.c > +++ b/db/init.c > @@ -38,6 +38,121 @@ usage(void) > exit(1); > } > > +static void > +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x) > +{ > + struct fs_topology ft; > + int blocklog; > + uint64_t dblocks; > + int multidisk; > + > + memset(&ft, 0, sizeof(ft)); > + get_topology(x, &ft, 1); > + > + /* > + * get geometry from get_topology result. > + * Use default block size (2^12) > + */ > + blocklog = 12; > + multidisk = ft.data.swidth | ft.data.sunit; > + dblocks = x->data.size >> (blocklog - BBSHIFT); > + calc_default_ag_geometry(blocklog, dblocks, multidisk, > + agsize, agcount); > +} > + > +static xfs_agblock_t > +xfs_get_agblock_from_agf(struct xfs_mount *mp) > +{ > + xfs_agblock_t agblocks = 0; > + int error; > + struct xfs_buf *bp; > + struct xfs_agf *agf; > + > + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, > + XFS_AG_DADDR(mp, 0, XFS_AGF_DADDR(mp)), > + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL); Why not call the full verifier on the buffer read? > + if (error) { > + fprintf(stderr, "xfs_get_agblock from agf-0 error %d\n", error); > + return agblocks; > + } > + > + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGF_CRC_OFF)) { > + fprintf(stderr, "xfs_get_agblock from agf-0 badcrc\n"); > + return agblocks; > + } Then you don't have to do this ^^^ stuff. --D > + agf = bp->b_addr; > + agblocks = be32_to_cpu(agf->agf_length); > + > + libxfs_buf_relse(bp); > + > + return agblocks; > +} > + > +static xfs_agblock_t > +xfs_get_agblock_from_agi(struct xfs_mount *mp) > +{ > + xfs_agblock_t agblocks = 0; > + int error; > + struct xfs_buf *bp; > + struct xfs_agi *agi; > + > + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, > + XFS_AG_DADDR(mp, 0, XFS_AGI_DADDR(mp)), > + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL); > + if (error) { > + fprintf(stderr, "xfs_get_agblock from agi-0 error %d\n", error); > + return agblocks; > + } > + > + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGI_CRC_OFF)) { > + fprintf(stderr, "xfs_get_agblock from agi-0 badcrc\n"); > + return agblocks; > + } > + > + agi = bp->b_addr; > + agblocks = be32_to_cpu(agi->agi_length); > + > + libxfs_buf_relse(bp); > + > + return agblocks; > +} > + > +/* > + * If sb_agblocks was damaged, try to read it from agf/agi 0. > + * With read agf/agi fails use default geometry to calc agblocks/agcount. > + * The worst thing is cannot get agblocks according above method, then set to 1. > + */ > +static xfs_agblock_t > +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x) > +{ > + xfs_agblock_t agblocks; > + uint64_t agsize, agcount; > + > + /* firset try to get agblocks from agf-0 */ > + agblocks = xfs_get_agblock_from_agf(mp); > + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES && > + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES) > + return agblocks; > + > + /* second try to get agblocks from agi-0 */ > + agblocks = xfs_get_agblock_from_agi(mp); > + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES && > + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES) > + return agblocks; > + > + /* third use default geometry to calc agblocks/agcount */ > + xfs_guess_default_ag_geometry(&agsize, &agcount, x); > + > + if (XFS_FSB_TO_B(mp, agsize) < XFS_MIN_AG_BYTES || > + XFS_FSB_TO_B(mp, agsize) > XFS_MAX_AG_BYTES) > + agblocks = 1; /* the worst is set to 1 */ > + else > + agblocks = agsize; > + > + return agblocks; > +} > + > static void > init( > int argc, > @@ -129,6 +244,19 @@ init( > } > } > > + /* If sb_agblocks was damaged, try to get agblocks */ > + if (XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) < XFS_MIN_AG_BYTES || > + XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) > XFS_MAX_AG_BYTES) { > + xfs_agblock_t agblocks; > + xfs_agblock_t bad_agblocks = sbp->sb_agblocks; > + > + agblocks = xfs_try_get_agblocks(&xmount, &x); > + sbp->sb_agblocks = agblocks; > + > + fprintf(stderr, "wrong agblocks, try to get from agblocks %u -> %u\n", > + bad_agblocks, sbp->sb_agblocks); > + } > + > agcount = sbp->sb_agcount; > mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); > if (!mp) { > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] xfs_db: make sure agblocks is valid to prevent corruption 2024-09-02 10:12 ` [PATCH v2] " liuhuan01 2024-09-02 18:56 ` Darrick J. Wong @ 2024-09-03 2:28 ` Dave Chinner 2024-09-03 10:24 ` [PATCH v3] " liuhuan01 1 sibling, 1 reply; 10+ messages in thread From: Dave Chinner @ 2024-09-03 2:28 UTC (permalink / raw) To: liuhuan01; +Cc: cmaiolino, djwong, linux-xfs On Mon, Sep 02, 2024 at 06:12:39PM +0800, liuhuan01@kylinos.cn wrote: > From: liuh <liuhuan01@kylinos.cn> > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > #0 libxfs_getbuf_flags > #1 libxfs_getbuf_flags > #2 libxfs_buf_read_map > #3 libxfs_buf_read > #4 libxfs_mount > #5 init > #6 main > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > In this case, user cannot run in expert mode also. > > So, try to get agblocks from agf/agi 0, if failed use the default geometry to calc agblocks. > The worst thing is cannot get agblocks accroding above method, then set it to 1. > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > --- > db/Makefile | 2 +- > db/init.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 129 insertions(+), 1 deletion(-) > > diff --git a/db/Makefile b/db/Makefile > index 83389376..322d5617 100644 > --- a/db/Makefile > +++ b/db/Makefile > @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \ > LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh > > LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \ > - $(LIBPTHREAD) > + $(LIBPTHREAD) $(LIBBLKID) > LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG) > LLDFLAGS += -static-libtool-libs > > diff --git a/db/init.c b/db/init.c > index cea25ae5..15124ee2 100644 > --- a/db/init.c > +++ b/db/init.c > @@ -38,6 +38,121 @@ usage(void) > exit(1); > } > > +static void > +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x) > +{ > + struct fs_topology ft; > + int blocklog; > + uint64_t dblocks; > + int multidisk; > + > + memset(&ft, 0, sizeof(ft)); > + get_topology(x, &ft, 1); > + > + /* > + * get geometry from get_topology result. > + * Use default block size (2^12) > + */ > + blocklog = 12; > + multidisk = ft.data.swidth | ft.data.sunit; > + dblocks = x->data.size >> (blocklog - BBSHIFT); > + calc_default_ag_geometry(blocklog, dblocks, multidisk, > + agsize, agcount); > +} > + > +static xfs_agblock_t > +xfs_get_agblock_from_agf(struct xfs_mount *mp) > +{ > + xfs_agblock_t agblocks = 0; > + int error; > + struct xfs_buf *bp; > + struct xfs_agf *agf; > + > + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, > + XFS_AG_DADDR(mp, 0, XFS_AGF_DADDR(mp)), > + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL); You're not checking what the code you wrote actually does. #define XFS_AGB_TO_DADDR(mp,agno,agbno) \ ((xfs_daddr_t)XFS_FSB_TO_BB(mp, \ (xfs_fsblock_t)(agno) * (mp)->m_sb.sb_agblocks + (agbno))) #define XFS_AG_DADDR(mp,agno,d) (XFS_AGB_TO_DADDR(mp, agno, 0) + (d)) Yup, XFS_AG_DADDR() uses the very field we've already decided is invalid.... In these cases where we specifically want to read from the AG 0 headers and we can't trust anything in the superblock, we need to open code the daddrs we need to read from. In this case the daddr of AGF 0 is simply XFS_AGF_DADDR(mp). And we don't need XFS_FSS_TO_BB(), either - we can simply read a single sector at that location as the AGF and AGI headers always fit in a single sector. Hence this should be: error = -libxfs_buf_read_uncached(mp->m_ddev_targp, XFS_AGF_DADDR(mp), 1, 0, &bp, NULL); > + if (error) { > + fprintf(stderr, "xfs_get_agblock from agf-0 error %d\n", error); > + return agblocks; Return NULLAGBLOCK on failure. > + } > + > + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGF_CRC_OFF)) { > + fprintf(stderr, "xfs_get_agblock from agf-0 badcrc\n"); > + return agblocks; > + } We can't trust the CRC to discover corruption here - syzbot corrupts structures and then recalculates the CRC. Hence a good CRC is not an indication of a valid, uncorrupted structure. And ignoring the CRC allows a single sector read.... > + agf = bp->b_addr; > + agblocks = be32_to_cpu(agf->agf_length); This is missing magic number checks to determine if the metadata returned is actually an AGF block. > + > + libxfs_buf_relse(bp); > + > + return agblocks; > +} > + > +static xfs_agblock_t > +xfs_get_agblock_from_agi(struct xfs_mount *mp) > +{ > + xfs_agblock_t agblocks = 0; > + int error; > + struct xfs_buf *bp; > + struct xfs_agi *agi; > + > + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, > + XFS_AG_DADDR(mp, 0, XFS_AGI_DADDR(mp)), > + XFS_FSS_TO_BB(mp, 1), 0, &bp, NULL); > + if (error) { > + fprintf(stderr, "xfs_get_agblock from agi-0 error %d\n", error); > + return agblocks; > + } > + > + if (xfs_has_crc(mp) && !xfs_buf_verify_cksum(bp, XFS_AGI_CRC_OFF)) { > + fprintf(stderr, "xfs_get_agblock from agi-0 badcrc\n"); > + return agblocks; > + } > + > + agi = bp->b_addr; > + agblocks = be32_to_cpu(agi->agi_length); > + > + libxfs_buf_relse(bp); > + > + return agblocks; > +} > + > +/* > + * If sb_agblocks was damaged, try to read it from agf/agi 0. > + * With read agf/agi fails use default geometry to calc agblocks/agcount. > + * The worst thing is cannot get agblocks according above method, then set to 1. > + */ > +static xfs_agblock_t > +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x) > +{ > + xfs_agblock_t agblocks; > + uint64_t agsize, agcount; > + > + /* firset try to get agblocks from agf-0 */ > + agblocks = xfs_get_agblock_from_agf(mp); > + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES && Ah, did you look at what XFS_FSB_TO_B() requires to be set? #define XFS_FSB_TO_B(mp,fsbno) ((xfs_fsize_t)(fsbno) << (mp)->m_sb.sb_blocklog) Essentially, detected that the superblock is damaged by using an adjacent field in the superblock (which also may be damaged) to do the validity check isn't reliable here. We have XFS_MIN_AG_BLOCKS and XFS_MAX_AG_BLOCKS constants defined, so that is what the validity checks here should use... > + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES) > + return agblocks; We shouldn't return agblocks here - we need to validate that it is the same as the AGI length because we can't trust a single value to be correct if we have corruption in the headers. i.e. if both AGI and AGF magic numbers are valid, and the lengths match, it's a pretty good indication that this is the actual length of the AG. It just doesn't matter if the AGF/AGI are otherwise damaged and fail CRCs or pass the CRCs because of malicious corruption, but if they match we can probably trust it to be correct for the purposes of diagnosis. > + /* second try to get agblocks from agi-0 */ > + agblocks = xfs_get_agblock_from_agi(mp); > + if (XFS_FSB_TO_B(mp, agblocks) >= XFS_MIN_AG_BYTES && > + XFS_FSB_TO_B(mp, agblocks) <= XFS_MAX_AG_BYTES) > + return agblocks; > + > + /* third use default geometry to calc agblocks/agcount */ > + xfs_guess_default_ag_geometry(&agsize, &agcount, x); > + > + if (XFS_FSB_TO_B(mp, agsize) < XFS_MIN_AG_BYTES || > + XFS_FSB_TO_B(mp, agsize) > XFS_MAX_AG_BYTES) > + agblocks = 1; /* the worst is set to 1 */ *no*. agblocks = 1 is *not a valid size* - why set an out-of-bounds value for something we are already know is out-of-bounds? Further, xfs_guess_default_ag_geometry() should never return an invalid AG size unless the block device is under the minimum size of an XFS filesystem. In which case, there's really nothing we can do and should actually abort this saying "device too small to hold a valid XFS filesystem". > + else > + agblocks = agsize; The resultant should be probably be compared against the lengths found from the AGI/AGF checks. If it matches one of them, we have more confidence that this is the correct length. > + > + return agblocks; > +} > + > static void > init( > int argc, > @@ -129,6 +244,19 @@ init( > } > } > > + /* If sb_agblocks was damaged, try to get agblocks */ > + if (XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) < XFS_MIN_AG_BYTES || > + XFS_FSB_TO_B(&xmount, sbp->sb_agblocks) > XFS_MAX_AG_BYTES) { > + xfs_agblock_t agblocks; > + xfs_agblock_t bad_agblocks = sbp->sb_agblocks; > + > + agblocks = xfs_try_get_agblocks(&xmount, &x); > + sbp->sb_agblocks = agblocks; > + > + fprintf(stderr, "wrong agblocks, try to get from agblocks %u -> %u\n", > + bad_agblocks, sbp->sb_agblocks); What does this error message mean? If I saw that, I wouldn't have a clue what it means. I'd expect output something like this on a AGf/AGI recovery: Out of bounds superblock agblocks (%u) found. Attempting recovery from AGF/AGI 0 metadata.... AGF length %u blocks found. AGI length %u blocks found. AGF/AGI length matches. Using %u blocks for superblock agblocks. Or if AGF or AGI recovery fails: Out of bounds superblock agblocks (%u) found. Attempting recovery from AGF/AGI 0 metadata.... AGF length recovery failed. AGI length %u blocks found. Attempting to guess AG length from device geometry. This may not work. Guessed AG length is %u blocks. Guessed length matches AGI length. Using %u blocks for superblock agblocks. -Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] xfs_db: make sure agblocks is valid to prevent corruption 2024-09-03 2:28 ` Dave Chinner @ 2024-09-03 10:24 ` liuhuan01 2024-09-16 6:55 ` Carlos Maiolino 0 siblings, 1 reply; 10+ messages in thread From: liuhuan01 @ 2024-09-03 10:24 UTC (permalink / raw) To: david; +Cc: cmaiolino, djwong, linux-xfs, liuh From: liuh <liuhuan01@kylinos.cn> Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 System will generate signal SIGFPE corrupt the process. And the stack as follow: corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags #0 libxfs_getbuf_flags #1 libxfs_getbuf_flags #2 libxfs_buf_read_map #3 libxfs_buf_read #4 libxfs_mount #5 init #6 main The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. In this case, user cannot run in expert mode also. So, try to get agblocks from agf/agi 0, if agf/agi 0 length match, use it as agblocks. If failed use the default geometry to calc agblocks. Signed-off-by: liuh <liuhuan01@kylinos.cn> --- db/Makefile | 2 +- db/init.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 143 insertions(+), 1 deletion(-) diff --git a/db/Makefile b/db/Makefile index 83389376..322d5617 100644 --- a/db/Makefile +++ b/db/Makefile @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \ LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \ - $(LIBPTHREAD) + $(LIBPTHREAD) $(LIBBLKID) LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG) LLDFLAGS += -static-libtool-libs diff --git a/db/init.c b/db/init.c index cea25ae5..167bc777 100644 --- a/db/init.c +++ b/db/init.c @@ -38,6 +38,138 @@ usage(void) exit(1); } +static void +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x) +{ + struct fs_topology ft; + int blocklog; + uint64_t dblocks; + int multidisk; + + fprintf(stderr, "Attempting to guess AG length from device geometry. This may not work.\n"); + + memset(&ft, 0, sizeof(ft)); + get_topology(x, &ft, 1); + + /* + * get geometry from get_topology result. + * Use default block size (2^12) + */ + blocklog = 12; + multidisk = ft.data.swidth | ft.data.sunit; + dblocks = x->data.size >> (blocklog - BBSHIFT); + calc_default_ag_geometry(blocklog, dblocks, multidisk, + agsize, agcount); + + if (*agsize >= XFS_MIN_AG_BLOCKS && *agsize <= XFS_MAX_AG_BLOCKS) + fprintf(stderr, "Guessed AG length is %lu blocks.\n", *agsize); +} + +static xfs_agblock_t +xfs_get_agblock_from_agf(struct xfs_mount *mp) +{ + xfs_agblock_t agblocks = NULLAGBLOCK; + int error; + struct xfs_buf *bp; + struct xfs_agf *agf; + + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, + XFS_AGF_DADDR(mp), 1, 0, &bp, NULL); + if (error) { + fprintf(stderr, "AGF 0 length recovery failed\n"); + return NULLAGBLOCK; + } + + agf = bp->b_addr; + if (be32_to_cpu(agf->agf_magicnum) == XFS_AGF_MAGIC) + agblocks = be32_to_cpu(agf->agf_length); + + libxfs_buf_relse(bp); + + if (agblocks != NULLAGBLOCK) + fprintf(stderr, "AGF 0 length %u blocks found.\n", agblocks); + else + fprintf(stderr, "AGF 0 length recovery failed.\n"); + + return agblocks; +} + +static xfs_agblock_t +xfs_get_agblock_from_agi(struct xfs_mount *mp) +{ + xfs_agblock_t agblocks = NULLAGBLOCK; + int error; + struct xfs_buf *bp; + struct xfs_agi *agi; + + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, + XFS_AGI_DADDR(mp), 1, 0, &bp, NULL); + if (error) { + fprintf(stderr, "AGI 0 length recovery failed\n"); + return NULLAGBLOCK; + } + + + agi = bp->b_addr; + if (be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC) + agblocks = be32_to_cpu(agi->agi_length); + + libxfs_buf_relse(bp); + + if (agblocks != NULLAGBLOCK) + fprintf(stderr, "AGI 0 length %u blocks found.\n", agblocks); + else + fprintf(stderr, "AGI 0 length recovery failed.\n"); + + return agblocks; +} + +/* + * Try to get it from agf/agi length when primary superblock agblocks damaged. + * If agf matchs agi length, use it as agblocks, otherwise use the default geometry + * to calc agblocks + */ +static xfs_agblock_t +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x) +{ + xfs_agblock_t agblocks = NULLAGBLOCK; + xfs_agblock_t agblocks_agf, agblocks_agi; + uint64_t agsize, agcount; + + fprintf(stderr, "Attempting recovery from AGF/AGI 0 metadata...\n"); + + agblocks_agf = xfs_get_agblock_from_agf(mp); + agblocks_agi = xfs_get_agblock_from_agi(mp); + + if (agblocks_agf == agblocks_agi && agblocks_agf >= XFS_MIN_AG_BLOCKS && agblocks_agf <= XFS_MAX_AG_BLOCKS) { + fprintf(stderr, "AGF/AGI 0 length matches.\n"); + fprintf(stderr, "Using %u blocks for superblock agblocks\n", agblocks_agf); + return agblocks_agf; + } + + /* use default geometry to calc agblocks/agcount */ + xfs_guess_default_ag_geometry(&agsize, &agcount, x); + + /* choose the agblocks among agf/agi length and agsize */ + if (agblocks_agf == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) { + fprintf(stderr, "Guessed AG matchs AGF length\n"); + agblocks = agsize; + } else if (agblocks_agi == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) { + fprintf(stderr, "Guessed AG matchs AGI length\n"); + agblocks = agsize; + } else if (agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) { + fprintf(stderr, "Guessed AG does not match AGF/AGI 0 length.\n"); + agblocks = agsize; + } else { + fprintf(stderr, "_(%s: device too small to hold a valid XFS filesystem)", progname); + exit(1); + } + + fprintf(stderr, "Using %u blocks for superblock agblocks.\n", agblocks); + + return agblocks; +} + static void init( int argc, @@ -129,6 +261,16 @@ init( } } + /* If sb_agblocks was damaged, try to get agblocks */ + if (sbp->sb_agblocks < XFS_MIN_AG_BLOCKS || sbp->sb_agblocks > XFS_MAX_AG_BLOCKS) { + xfs_agblock_t agblocks; + + fprintf(stderr, "Out of bounds superblock agblocks (%u) found.\n", sbp->sb_agblocks); + + agblocks = xfs_try_get_agblocks(&xmount, &x); + sbp->sb_agblocks = agblocks; + } + agcount = sbp->sb_agcount; mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); if (!mp) { -- 2.43.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3] xfs_db: make sure agblocks is valid to prevent corruption 2024-09-03 10:24 ` [PATCH v3] " liuhuan01 @ 2024-09-16 6:55 ` Carlos Maiolino 0 siblings, 0 replies; 10+ messages in thread From: Carlos Maiolino @ 2024-09-16 6:55 UTC (permalink / raw) To: liuhuan01; +Cc: david, cmaiolino, djwong, linux-xfs On Tue, Sep 03, 2024 at 06:24:02PM GMT, liuhuan01@kylinos.cn wrote: > From: liuh <liuhuan01@kylinos.cn> > > Recently, I was testing xfstests. When I run xfs/350 case, it always generate coredump during the process. > xfs_db -c "sb 0" -c "p agblocks" /dev/loop1 Please, don't send new versions in reply to old versions. Carlos > > System will generate signal SIGFPE corrupt the process. And the stack as follow: > corrupt at: (*bpp)->b_pag = xfs_perag_get(btp->bt_mount, xfs_daddr_to_agno(btp->bt_mount, blkno)); in function libxfs_getbuf_flags > #0 libxfs_getbuf_flags > #1 libxfs_getbuf_flags > #2 libxfs_buf_read_map > #3 libxfs_buf_read > #4 libxfs_mount > #5 init > #6 main > > The coredump was caused by the corrupt superblock metadata: (mp)->m_sb.sb_agblocks, it was 0. > In this case, user cannot run in expert mode also. > > So, try to get agblocks from agf/agi 0, if agf/agi 0 length match, use it as agblocks. > If failed use the default geometry to calc agblocks. > > Signed-off-by: liuh <liuhuan01@kylinos.cn> > --- > db/Makefile | 2 +- > db/init.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 143 insertions(+), 1 deletion(-) > > diff --git a/db/Makefile b/db/Makefile > index 83389376..322d5617 100644 > --- a/db/Makefile > +++ b/db/Makefile > @@ -68,7 +68,7 @@ CFILES = $(HFILES:.h=.c) \ > LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh > > LLDLIBS = $(LIBXFS) $(LIBXLOG) $(LIBFROG) $(LIBUUID) $(LIBRT) $(LIBURCU) \ > - $(LIBPTHREAD) > + $(LIBPTHREAD) $(LIBBLKID) > LTDEPENDENCIES = $(LIBXFS) $(LIBXLOG) $(LIBFROG) > LLDFLAGS += -static-libtool-libs > > diff --git a/db/init.c b/db/init.c > index cea25ae5..167bc777 100644 > --- a/db/init.c > +++ b/db/init.c > @@ -38,6 +38,138 @@ usage(void) > exit(1); > } > > +static void > +xfs_guess_default_ag_geometry(uint64_t *agsize, uint64_t *agcount, struct libxfs_init *x) > +{ > + struct fs_topology ft; > + int blocklog; > + uint64_t dblocks; > + int multidisk; > + > + fprintf(stderr, "Attempting to guess AG length from device geometry. This may not work.\n"); > + > + memset(&ft, 0, sizeof(ft)); > + get_topology(x, &ft, 1); > + > + /* > + * get geometry from get_topology result. > + * Use default block size (2^12) > + */ > + blocklog = 12; > + multidisk = ft.data.swidth | ft.data.sunit; > + dblocks = x->data.size >> (blocklog - BBSHIFT); > + calc_default_ag_geometry(blocklog, dblocks, multidisk, > + agsize, agcount); > + > + if (*agsize >= XFS_MIN_AG_BLOCKS && *agsize <= XFS_MAX_AG_BLOCKS) > + fprintf(stderr, "Guessed AG length is %lu blocks.\n", *agsize); > +} > + > +static xfs_agblock_t > +xfs_get_agblock_from_agf(struct xfs_mount *mp) > +{ > + xfs_agblock_t agblocks = NULLAGBLOCK; > + int error; > + struct xfs_buf *bp; > + struct xfs_agf *agf; > + > + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, > + XFS_AGF_DADDR(mp), 1, 0, &bp, NULL); > + if (error) { > + fprintf(stderr, "AGF 0 length recovery failed\n"); > + return NULLAGBLOCK; > + } > + > + agf = bp->b_addr; > + if (be32_to_cpu(agf->agf_magicnum) == XFS_AGF_MAGIC) > + agblocks = be32_to_cpu(agf->agf_length); > + > + libxfs_buf_relse(bp); > + > + if (agblocks != NULLAGBLOCK) > + fprintf(stderr, "AGF 0 length %u blocks found.\n", agblocks); > + else > + fprintf(stderr, "AGF 0 length recovery failed.\n"); > + > + return agblocks; > +} > + > +static xfs_agblock_t > +xfs_get_agblock_from_agi(struct xfs_mount *mp) > +{ > + xfs_agblock_t agblocks = NULLAGBLOCK; > + int error; > + struct xfs_buf *bp; > + struct xfs_agi *agi; > + > + error = -libxfs_buf_read_uncached(mp->m_ddev_targp, > + XFS_AGI_DADDR(mp), 1, 0, &bp, NULL); > + if (error) { > + fprintf(stderr, "AGI 0 length recovery failed\n"); > + return NULLAGBLOCK; > + } > + > + > + agi = bp->b_addr; > + if (be32_to_cpu(agi->agi_magicnum) == XFS_AGI_MAGIC) > + agblocks = be32_to_cpu(agi->agi_length); > + > + libxfs_buf_relse(bp); > + > + if (agblocks != NULLAGBLOCK) > + fprintf(stderr, "AGI 0 length %u blocks found.\n", agblocks); > + else > + fprintf(stderr, "AGI 0 length recovery failed.\n"); > + > + return agblocks; > +} > + > +/* > + * Try to get it from agf/agi length when primary superblock agblocks damaged. > + * If agf matchs agi length, use it as agblocks, otherwise use the default geometry > + * to calc agblocks > + */ > +static xfs_agblock_t > +xfs_try_get_agblocks(struct xfs_mount *mp, struct libxfs_init *x) > +{ > + xfs_agblock_t agblocks = NULLAGBLOCK; > + xfs_agblock_t agblocks_agf, agblocks_agi; > + uint64_t agsize, agcount; > + > + fprintf(stderr, "Attempting recovery from AGF/AGI 0 metadata...\n"); > + > + agblocks_agf = xfs_get_agblock_from_agf(mp); > + agblocks_agi = xfs_get_agblock_from_agi(mp); > + > + if (agblocks_agf == agblocks_agi && agblocks_agf >= XFS_MIN_AG_BLOCKS && agblocks_agf <= XFS_MAX_AG_BLOCKS) { > + fprintf(stderr, "AGF/AGI 0 length matches.\n"); > + fprintf(stderr, "Using %u blocks for superblock agblocks\n", agblocks_agf); > + return agblocks_agf; > + } > + > + /* use default geometry to calc agblocks/agcount */ > + xfs_guess_default_ag_geometry(&agsize, &agcount, x); > + > + /* choose the agblocks among agf/agi length and agsize */ > + if (agblocks_agf == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) { > + fprintf(stderr, "Guessed AG matchs AGF length\n"); > + agblocks = agsize; > + } else if (agblocks_agi == agsize && agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) { > + fprintf(stderr, "Guessed AG matchs AGI length\n"); > + agblocks = agsize; > + } else if (agsize >= XFS_MIN_AG_BLOCKS && agsize <= XFS_MAX_AG_BLOCKS) { > + fprintf(stderr, "Guessed AG does not match AGF/AGI 0 length.\n"); > + agblocks = agsize; > + } else { > + fprintf(stderr, "_(%s: device too small to hold a valid XFS filesystem)", progname); > + exit(1); > + } > + > + fprintf(stderr, "Using %u blocks for superblock agblocks.\n", agblocks); > + > + return agblocks; > +} > + > static void > init( > int argc, > @@ -129,6 +261,16 @@ init( > } > } > > + /* If sb_agblocks was damaged, try to get agblocks */ > + if (sbp->sb_agblocks < XFS_MIN_AG_BLOCKS || sbp->sb_agblocks > XFS_MAX_AG_BLOCKS) { > + xfs_agblock_t agblocks; > + > + fprintf(stderr, "Out of bounds superblock agblocks (%u) found.\n", sbp->sb_agblocks); > + > + agblocks = xfs_try_get_agblocks(&xmount, &x); > + sbp->sb_agblocks = agblocks; > + } > + > agcount = sbp->sb_agcount; > mp = libxfs_mount(&xmount, sbp, &x, LIBXFS_MOUNT_DEBUGGER); > if (!mp) { > -- > 2.43.0 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-09-16 6:55 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-08-21 10:44 [PATCH] xfs_db: make sure agblocks is valid to prevent corruption liuhuan01 2024-08-23 0:49 ` Darrick J. Wong 2024-08-27 3:23 ` Dave Chinner 2024-08-27 10:24 ` liuh 2024-08-27 23:37 ` Dave Chinner 2024-09-02 10:12 ` [PATCH v2] " liuhuan01 2024-09-02 18:56 ` Darrick J. Wong 2024-09-03 2:28 ` Dave Chinner 2024-09-03 10:24 ` [PATCH v3] " liuhuan01 2024-09-16 6:55 ` Carlos Maiolino
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox