* [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap
@ 2020-09-23 18:24 Darrick J. Wong
2020-09-24 5:40 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2020-09-23 18:24 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
From: Darrick J. Wong <darrick.wong@oracle.com>
Actually take the rt lock before updating the bitmap from multiple
threads. This fixes an infrequent corruption problem when running
generic/013 and rtinherit=1 is set on the root dir.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
repair/dinode.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/repair/dinode.c b/repair/dinode.c
index 57013bf149b2..07f3f83aef8c 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -184,6 +184,7 @@ process_rt_rec(
xfs_rfsblock_t *tot,
int check_dups)
{
+ struct aglock *lock = &ag_locks[(signed)NULLAGNUMBER];
xfs_fsblock_t b, lastb;
xfs_rtblock_t ext;
int state;
@@ -245,6 +246,7 @@ _("data fork in rt ino %" PRIu64 " claims dup rt extent,"
continue;
}
+ pthread_mutex_lock(&lock->lock);
state = get_rtbmap(ext);
switch (state) {
case XR_E_FREE:
@@ -270,6 +272,7 @@ _("data fork in rt inode %" PRIu64 " found metadata block %" PRIu64 " in rt bmap
do_warn(
_("data fork in rt inode %" PRIu64 " claims used rt block %" PRIu64 "\n"),
ino, ext);
+ pthread_mutex_unlock(&lock->lock);
return 1;
case XR_E_FREE1:
default:
@@ -277,6 +280,7 @@ _("data fork in rt inode %" PRIu64 " claims used rt block %" PRIu64 "\n"),
_("illegal state %d in rt block map %" PRIu64 "\n"),
state, b);
}
+ pthread_mutex_unlock(&lock->lock);
}
/*
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap
2020-09-23 18:24 [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap Darrick J. Wong
@ 2020-09-24 5:40 ` Christoph Hellwig
2020-09-24 6:00 ` Darrick J. Wong
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-09-24 5:40 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Eric Sandeen, xfs
On Wed, Sep 23, 2020 at 11:24:37AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
>
> Actually take the rt lock before updating the bitmap from multiple
> threads. This fixes an infrequent corruption problem when running
> generic/013 and rtinherit=1 is set on the root dir.
>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
> repair/dinode.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 57013bf149b2..07f3f83aef8c 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -184,6 +184,7 @@ process_rt_rec(
> xfs_rfsblock_t *tot,
> int check_dups)
> {
> + struct aglock *lock = &ag_locks[(signed)NULLAGNUMBER];
Err, what is this weird cast doing here?
The rest looks sane.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap
2020-09-24 5:40 ` Christoph Hellwig
@ 2020-09-24 6:00 ` Darrick J. Wong
2020-09-24 6:19 ` Christoph Hellwig
0 siblings, 1 reply; 5+ messages in thread
From: Darrick J. Wong @ 2020-09-24 6:00 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Eric Sandeen, xfs
On Thu, Sep 24, 2020 at 06:40:41AM +0100, Christoph Hellwig wrote:
> On Wed, Sep 23, 2020 at 11:24:37AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> >
> > Actually take the rt lock before updating the bitmap from multiple
> > threads. This fixes an infrequent corruption problem when running
> > generic/013 and rtinherit=1 is set on the root dir.
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> > repair/dinode.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/repair/dinode.c b/repair/dinode.c
> > index 57013bf149b2..07f3f83aef8c 100644
> > --- a/repair/dinode.c
> > +++ b/repair/dinode.c
> > @@ -184,6 +184,7 @@ process_rt_rec(
> > xfs_rfsblock_t *tot,
> > int check_dups)
> > {
> > + struct aglock *lock = &ag_locks[(signed)NULLAGNUMBER];
>
> Err, what is this weird cast doing here?
Well.... ag_locks is allocated with length ag_locks[agcount + 1], and
then the pointer is incremented so that ag_locks[-1] is the rt lock.
NULLAGNUMBER is cast to xfs_agnumber_t, which is uint32_t, so we have to
cast it back to signed so that we actually do the pointer subtraction(!)
Yeah, I know, it's nuts...
--D
> The rest looks sane.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap
2020-09-24 6:00 ` Darrick J. Wong
@ 2020-09-24 6:19 ` Christoph Hellwig
2020-09-24 15:06 ` Darrick J. Wong
0 siblings, 1 reply; 5+ messages in thread
From: Christoph Hellwig @ 2020-09-24 6:19 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: Christoph Hellwig, Eric Sandeen, xfs
On Wed, Sep 23, 2020 at 11:00:01PM -0700, Darrick J. Wong wrote:
> > > + struct aglock *lock = &ag_locks[(signed)NULLAGNUMBER];
> >
> > Err, what is this weird cast doing here?
>
> Well.... ag_locks is allocated with length ag_locks[agcount + 1], and
> then the pointer is incremented so that ag_locks[-1] is the rt lock.
At least in the for-next branch it isn't:
ag_locks = calloc(mp->m_sb.sb_agcount, sizeof(struct aglock));
More importantly, I can't even find other uses of ag_locks for the
RT subvolume. Is this hidden in one of your series?
Either way I think a separate lock for the RT subvolume would make a
whole lot more sense.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap
2020-09-24 6:19 ` Christoph Hellwig
@ 2020-09-24 15:06 ` Darrick J. Wong
0 siblings, 0 replies; 5+ messages in thread
From: Darrick J. Wong @ 2020-09-24 15:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Eric Sandeen, xfs
On Thu, Sep 24, 2020 at 07:19:11AM +0100, Christoph Hellwig wrote:
> On Wed, Sep 23, 2020 at 11:00:01PM -0700, Darrick J. Wong wrote:
> > > > + struct aglock *lock = &ag_locks[(signed)NULLAGNUMBER];
> > >
> > > Err, what is this weird cast doing here?
> >
> > Well.... ag_locks is allocated with length ag_locks[agcount + 1], and
> > then the pointer is incremented so that ag_locks[-1] is the rt lock.
>
> At least in the for-next branch it isn't:
>
> ag_locks = calloc(mp->m_sb.sb_agcount, sizeof(struct aglock));
>
> More importantly, I can't even find other uses of ag_locks for the
> RT subvolume. Is this hidden in one of your series?
Doh. Yes, it is, in the realtime rmap series. :( :(
> Either way I think a separate lock for the RT subvolume would make a
> whole lot more sense.
Yes, let's do it that way.
--D
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-09-24 15:09 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-09-23 18:24 [PATCH] xfs_repair: coordinate parallel updates to the rt bitmap Darrick J. Wong
2020-09-24 5:40 ` Christoph Hellwig
2020-09-24 6:00 ` Darrick J. Wong
2020-09-24 6:19 ` Christoph Hellwig
2020-09-24 15:06 ` Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox