* xfs_repair deleting realtime files.
@ 2012-09-21 0:40 Anand Tiwari
2012-09-21 5:00 ` Eric Sandeen
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-21 0:40 UTC (permalink / raw)
To: xfs
[-- Attachment #1.1: Type: text/plain, Size: 3701 bytes --]
Hi All,
I have been looking into an issue with xfs_repair on a filesystem with a realtime subvolume.
Sometimes while running xfs_repair I see the following errors:
----------------------------
data fork in rt inode 134 claims used rt block 19607
bad data fork in inode 134
would have cleared inode 134
data fork in rt inode 135 claims used rt block 29607
bad data fork in inode 135
would have cleared inode 135
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
entry "test-011" in shortform directory 128 references free inode 134
would have junked entry "test-011" in directory inode 128
entry "test-0" in shortform directory 128 references free inode 135
would have junked entry "test-0" in directory inode 128
data fork in rt ino 134 claims dup rt extent,off - 0, start - 7942144,
count 2097000
bad data fork in inode 134
would have cleared inode 134
data fork in rt ino 135 claims dup rt extent,off - 0, start - 13062144,
count 2097000
bad data fork in inode 135
would have cleared inode 135
No modify flag set, skipping phase 5
------------------------
Here is the bmap for both inodes.
xfs_db> inode 135
xfs_db> bmap
data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag 0
data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag 0
data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag 0
data offset 10485000 startblock 23547144 (22/478472) count 2097000 flag 0
data offset 12582000 startblock 25644144 (24/478320) count 2097000 flag 0
data offset 14679000 startblock 27741144 (26/478168) count 2097000 flag 0
data offset 16776000 startblock 29838144 (28/478016) count 2097000 flag 0
data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
xfs_db> inode 134
xfs_db> bmap
data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
By looking into the xfs_repair code, it appears that repair does not handle the
case where more than one data extent falls within a single realtime extent.
The following is code from repair/dinode.c:process_rt_rec:
-----
	for (b = irec->br_startblock; b < irec->br_startblock +
			irec->br_blockcount; b += mp->m_sb.sb_rextsize)  {
		ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
		pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
			irec->br_state == XFS_EXT_UNWRITTEN &&
			(b % mp->m_sb.sb_rextsize != 0);
-----
In my case rextsize is 512 blocks (512 * 4096 bytes = 2 MB). So we have multiple
extents (written extents, to be precise; thanks dchinner for that), the value of
"ext" will be the same for all of them, and xfs_repair does not like it; hence
the error message "data fork in rt inode XX claims used rt block XX".
If I ignore this failure condition, xfs_repair seems to be happy. (FYI: this
filesystem was cleanly unmounted.) But in my opinion that is not good, as these
multiple extents can genuinely overlap too.
Should we be using XR_E_MULT to flag and keep track of duplicated realtime
extents (maybe using the present API for adding/detecting duplicate extents)?
I am open to suggestions or comments on how to fix this.
xfs_repair version is 3.1.8 and kernel 2.6.37.
thanks,
Anand
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: xfs_repair deleting realtime files.
2012-09-21 0:40 xfs_repair deleting realtime files Anand Tiwari
@ 2012-09-21 5:00 ` Eric Sandeen
2012-09-21 15:51 ` Anand Tiwari
2012-09-24 7:55 ` Dave Chinner
0 siblings, 2 replies; 16+ messages in thread
From: Eric Sandeen @ 2012-09-21 5:00 UTC (permalink / raw)
To: Anand Tiwari; +Cc: xfs
On 9/20/12 7:40 PM, Anand Tiwari wrote:
> Hi All,
>
> I have been looking into an issue with xfs_repair with realtime sub volume. some times while running xfs_repair I see following errors
>
> ----------------------------
> data fork in rt inode 134 claims used rt block 19607
> bad data fork in inode 134
> would have cleared inode 134
> data fork in rt inode 135 claims used rt block 29607
> bad data fork in inode 135
> would have cleared inode 135
> - agno = 1
> - agno = 2
> - agno = 3
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> - agno = 1
> - agno = 2
> - agno = 3
> entry "test-011" in shortform directory 128 references free inode 134
> would have junked entry "test-011" in directory inode 128
> entry "test-0" in shortform directory 128 references free inode 135
> would have junked entry "test-0" in directory inode 128
> data fork in rt ino 134 claims dup rt extent,off - 0, start - 7942144, count 2097000
> bad data fork in inode 134
> would have cleared inode 134
> data fork in rt ino 135 claims dup rt extent,off - 0, start - 13062144, count 2097000
> bad data fork in inode 135
> would have cleared inode 135
> No modify flag set, skipping phase 5
> ------------------------
>
> Here is the bmap for both inodes.
>
> xfs_db> inode 135
> xfs_db> bmap
> data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
> data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag 0
> data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag 0
> data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag 0
> data offset 10485000 startblock 23547144 (22/478472) count 2097000 flag 0
> data offset 12582000 startblock 25644144 (24/478320) count 2097000 flag 0
> data offset 14679000 startblock 27741144 (26/478168) count 2097000 flag 0
> data offset 16776000 startblock 29838144 (28/478016) count 2097000 flag 0
> data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
> xfs_db> inode 134
> xfs_db> bmap
> data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
> data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
It's been a while since I thought about realtime, but -
That all seems fine, I don't see anything overlapping there, they are
all perfectly adjacent, though of interesting size.
>
> by looking into xfs_repair code, it looks like repair does not handle
> a case where we have more than one extent in a real-time extent.
> following is code from repair/dinode.c: process_rt_rec
"more than one extent in a real-time extent?" I'm not sure what that means.
Every extent above is length 2097000 blocks, and they are adjacent.
But you say your realtime extent size is 512 blocks ... which doesn't go
into 2097000 evenly. So that's odd, at least.
Can you provide your xfs_info output for this fs?
Or maybe better yet an xfs_metadump image.
> -----
> for (b = irec->br_startblock; b < irec->br_startblock +
> irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
> ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
> pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
> irec->br_state == XFS_EXT_UNWRITTEN &&
> (b % mp->m_sb.sb_rextsize != 0);
> -----
>
> In my case rextsize is 512 (512 * 4096 = 2mb). So we have multiple
> extents (written extents to be precise, thanks dchinner for that),
> value of "ext" will be same for all of them and xfs_repair does not
> like it. thus the error message ""data fork in rt inode XX claims
> used rt block XX".
"ext" should not be the same for all of them; ext is the realtime extent
number in the fs, based on the physical start, br_startblock,
divided by the rt extent size. There shouldn't be duplicate values
of "ext" based on the bmaps above.
The error comes from search_rt_dup_extent() which looks for overlaps
elsewhere in the fs...
If you can provide a metadump of the fs it might be easier to see what's going on.
-Eric
> If I ignore this failure condition, xfs_repairs seems to be happy.
> (FYI: this file-system is cleanly umounted) But in my opinion, its
> not good as these multiple extents can overlap too.
> Should we be using XR_E_MULT to flag and keep track of duplicated
> real-time extents. (maybe use the present API for adding/detecting
> duplicate extents)
>
> I am open of suggestion or comments on how to fix this.
>
> xfs_repair version is 3.1.8 and kernel 2.6.37.
>
> thanks,
> Anand
>
>
>
* Re: xfs_repair deleting realtime files.
2012-09-21 5:00 ` Eric Sandeen
@ 2012-09-21 15:51 ` Anand Tiwari
2012-09-21 16:07 ` Eric Sandeen
2012-09-24 7:55 ` Dave Chinner
1 sibling, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-21 15:51 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On Thu, Sep 20, 2012 at 11:00 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 9/20/12 7:40 PM, Anand Tiwari wrote:
> > Hi All,
> >
> > I have been looking into an issue with xfs_repair with realtime sub
> volume. some times while running xfs_repair I see following errors
> >
> > ----------------------------
> > data fork in rt inode 134 claims used rt block 19607
> > bad data fork in inode 134
> > would have cleared inode 134
> > data fork in rt inode 135 claims used rt block 29607
> > bad data fork in inode 135
> > would have cleared inode 135
> > - agno = 1
> > - agno = 2
> > - agno = 3
> > - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> > - setting up duplicate extent list...
> > - check for inodes claiming duplicate blocks...
> > - agno = 0
> > - agno = 1
> > - agno = 2
> > - agno = 3
> > entry "test-011" in shortform directory 128 references free inode 134
> > would have junked entry "test-011" in directory inode 128
> > entry "test-0" in shortform directory 128 references free inode 135
> > would have junked entry "test-0" in directory inode 128
> > data fork in rt ino 134 claims dup rt extent,off - 0, start - 7942144,
> count 2097000
> > bad data fork in inode 134
> > would have cleared inode 134
> > data fork in rt ino 135 claims dup rt extent,off - 0, start - 13062144,
> count 2097000
> > bad data fork in inode 135
> > would have cleared inode 135
> > No modify flag set, skipping phase 5
> > ------------------------
> >
> > Here is the bmap for both inodes.
> >
> > xfs_db> inode 135
> > xfs_db> bmap
> > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> > data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
> > data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag 0
> > data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag 0
> > data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag 0
> > data offset 10485000 startblock 23547144 (22/478472) count 2097000 flag 0
> > data offset 12582000 startblock 25644144 (24/478320) count 2097000 flag 0
> > data offset 14679000 startblock 27741144 (26/478168) count 2097000 flag 0
> > data offset 16776000 startblock 29838144 (28/478016) count 2097000 flag 0
> > data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
> > xfs_db> inode 134
> > xfs_db> bmap
> > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
> > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
>
> It's been a while since I thought about realtime, but -
>
> That all seems fine, I don't see anything overlapping there, they are
> all perfectly adjacent, though of interesting size.
>
> >
> > by looking into xfs_repair code, it looks like repair does not handle
> > a case where we have more than one extent in a real-time extent.
> > following is code from repair/dinode.c: process_rt_rec
>
> "more than one extent in a real-time extent?" I'm not sure what that
> means.
>
> Every extent above is length 2097000 blocks, and they are adjacent.
> But you say your realtime extent size is 512 blocks ... which doesn't go
> into 2097000 evenly. So that's odd, at least.
>
>
Well, let's look at the first extent:
> data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
The startblock is aligned and its rtext is 25512. Since the blockcount is not a
multiple of 512, the last realtime extent (25512 + 4095 = 29607) is only
partially used, by 360 blocks.
The second extent starts in realtime extent 29607 (i.e. 25512 + 4095). So, yes,
the extents do not overlap, but realtime extent 29607 is shared by two extents.
Now, once xfs_repair detects this case in phase 2, it bails out and clears that
inode. I think the search for duplicate extents is done in phase 4, but the
inode is already marked by then.
> Can you provide your xfs_info output for this fs?
> Or maybe better yet an xfs_metadump image.
>
>
I will do it soon. The other thing I noticed is that if I try to delete this
file (inode 135), I get the following assertion failure:
[75669.291000] Assertion failed: prev.br_state == XFS_EXT_NORM, file: fs/xfs/xfs_bmap.c, line: 5187
[75669.300000] [<8024dc44>] assfail+0x28/0x2c
[75669.300000] [<801d8864>] xfs_bunmapi+0x1288/0x14e4
[75669.300000] [<8020aff4>] xfs_itruncate_finish+0x344/0x738
[75669.300000] [<802363b0>] xfs_inactive+0x4a8/0x51c
[75669.300000] [<80109124>] evict+0x28/0xd0
[75669.300000] [<80109c14>] iput+0x19c/0x2d8
[75669.300000] [<800fe58c>] do_unlinkat+0x10c/0x19c
[75669.300000] [<80011b7c>] stack_done+0x20/0x40
> > -----
> > 	for (b = irec->br_startblock; b < irec->br_startblock +
> > 			irec->br_blockcount; b += mp->m_sb.sb_rextsize)  {
> > 		ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
> > 		pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
> > 			irec->br_state == XFS_EXT_UNWRITTEN &&
> > 			(b % mp->m_sb.sb_rextsize != 0);
> > -----
> >
> > In my case rextsize is 512 (512 * 4096 = 2mb). So we have multiple
> > extents (written extents to be precise, thanks dchinner for that),
> > value of "ext" will be same for all of them and xfs_repair does not
> > like it. thus the error message ""data fork in rt inode XX claims
> > used rt block XX".
>
> "ext" should not be the same for all of them; ext is the realtime extent
> number in the fs, based on the physical start, br_startblock,
> divided by the rt extent size. There shouldn't be duplicate values
> of "ext" based on the bmaps above.
>
> The error comes from search_rt_dup_extent() which looks for overlaps
> elsewhere in the fs...
>
> If you can provide a metadump of the fs it might be easier to see what's
> going on.
>
> -Eric
>
> > If I ignore this failure condition, xfs_repairs seems to be happy.
> > (FYI: this file-system is cleanly umounted) But in my opinion, its
> > not good as these multiple extents can overlap too.
>
> > Should we be using XR_E_MULT to flag and keep track of duplicated
> > real-time extents. (maybe use the present API for adding/detecting
> > duplicate extents)
> >
> > I am open of suggestion or comments on how to fix this.
> >
> > xfs_repair version is 3.1.8 and kernel 2.6.37.
> >
> > thanks,
> > Anand
> >
> >
> >
* Re: xfs_repair deleting realtime files.
2012-09-21 15:51 ` Anand Tiwari
@ 2012-09-21 16:07 ` Eric Sandeen
2012-09-21 16:40 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Eric Sandeen @ 2012-09-21 16:07 UTC (permalink / raw)
To: Anand Tiwari; +Cc: xfs
On 9/21/12 10:51 AM, Anand Tiwari wrote:
>
>
> On Thu, Sep 20, 2012 at 11:00 PM, Eric Sandeen <sandeen@sandeen.net
> <mailto:sandeen@sandeen.net>> wrote:
>
> On 9/20/12 7:40 PM, Anand Tiwari wrote:
>> Hi All,
>>
>> I have been looking into an issue with xfs_repair with realtime sub
>> volume. some times while running xfs_repair I see following errors
>>
>> ---------------------------- data fork in rt inode 134 claims used
>> rt block 19607 bad data fork in inode 134 would have cleared inode
>> 134 data fork in rt inode 135 claims used rt block 29607 bad data
>> fork in inode 135 would have cleared inode 135 - agno = 1 - agno =
>> 2 - agno = 3 - process newly discovered inodes... Phase 4 - check
>> for duplicate blocks... - setting up duplicate extent list... -
>> check for inodes claiming duplicate blocks... - agno = 0 - agno =
>> 1 - agno = 2 - agno = 3 entry "test-011" in shortform directory 128
>> references free inode 134 would have junked entry "test-011" in
>> directory inode 128 entry "test-0" in shortform directory 128
>> references free inode 135 would have junked entry "test-0" in
>> directory inode 128 data fork in rt ino 134 claims dup rt
>> extent,off - 0, start - 7942144, count 2097000 bad data fork in
>> inode 134 would have cleared inode 134 data fork in rt ino 135
>> claims dup rt extent,off - 0, start - 13062144, count 2097000 bad
>> data fork in inode 135 would have cleared inode 135 No modify flag
>> set, skipping phase 5 ------------------------
>>
>> Here is the bmap for both inodes.
>>
>> xfs_db> inode 135 xfs_db> bmap data offset 0 startblock 13062144
>> (12/479232) count 2097000 flag 0 data offset 2097000 startblock
>> 15159144 (14/479080) count 2097000 flag 0 data offset 4194000
>> startblock 17256144 (16/478928) count 2097000 flag 0 data offset
>> 6291000 startblock 19353144 (18/478776) count 2097000 flag 0 data
>> offset 8388000 startblock 21450144 (20/478624) count 2097000 flag
>> 0 data offset 10485000 startblock 23547144 (22/478472) count
>> 2097000 flag 0 data offset 12582000 startblock 25644144 (24/478320)
>> count 2097000 flag 0 data offset 14679000 startblock 27741144
>> (26/478168) count 2097000 flag 0 data offset 16776000 startblock
>> 29838144 (28/478016) count 2097000 flag 0 data offset 18873000
>> startblock 31935144 (30/477864) count 1607000 flag 0 xfs_db> inode
>> 134 xfs_db> bmap data offset 0 startblock 7942144 (7/602112) count
>> 2097000 flag 0 data offset 2097000 startblock 10039144 (9/601960)
>> count 2097000 flag 0 data offset 4194000 startblock 12136144
>> (11/601808) count 926000 flag 0
>
> It's been a while since I thought about realtime, but -
>
> That all seems fine, I don't see anything overlapping there, they
> are all perfectly adjacent, though of interesting size.
>
>>
>> by looking into xfs_repair code, it looks like repair does not
>> handle a case where we have more than one extent in a real-time
>> extent. following is code from repair/dinode.c: process_rt_rec
>
> "more than one extent in a real-time extent?" I'm not sure what that
> means.
>
> Every extent above is length 2097000 blocks, and they are adjacent.
> But you say your realtime extent size is 512 blocks ... which doesn't
> go into 2097000 evenly. So that's odd, at least.
>
>
> well, lets look at first extent
>> data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
>> data offset 2097000 startblock 15159144 (14/479080) count 2097000
>> flag 0
> startblock is aligned and rtext is 25512, since the blockcount is
> not multiple of 512, last realtime extent ( 25512 + 4095) is
> partially used, 360 blks second extent start from realtime extent
> 29607 (ie 25512 + 4095). so, yes, extents are not overlapping, but
> 29607 realtime extent is shared by two extents. Now once xfs_repair
> detects this case in phase 2, it bails out and clears that inode. I
> think search for duplicate extent is done in phase 4, but inode is
> marked already.
... ok I realize I was misunderstanding some things about the realtime
volume. (It's been a very long time since I thought about it). Still,
I'd like to look at the metadump image if possible.
Thanks,
-Eric
* Re: xfs_repair deleting realtime files.
2012-09-21 16:07 ` Eric Sandeen
@ 2012-09-21 16:40 ` Anand Tiwari
0 siblings, 0 replies; 16+ messages in thread
From: Anand Tiwari @ 2012-09-21 16:40 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs
On Fri, Sep 21, 2012 at 10:07 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
> On 9/21/12 10:51 AM, Anand Tiwari wrote:
> >
> >
> > On Thu, Sep 20, 2012 at 11:00 PM, Eric Sandeen <sandeen@sandeen.net
> > <mailto:sandeen@sandeen.net>> wrote:
> >
> > On 9/20/12 7:40 PM, Anand Tiwari wrote:
> >> Hi All,
> >>
> >> I have been looking into an issue with xfs_repair with realtime sub
> >> volume. some times while running xfs_repair I see following errors
> >>
> >> ---------------------------- data fork in rt inode 134 claims used
> >> rt block 19607 bad data fork in inode 134 would have cleared inode
> >> 134 data fork in rt inode 135 claims used rt block 29607 bad data
> >> fork in inode 135 would have cleared inode 135 - agno = 1 - agno =
> >> 2 - agno = 3 - process newly discovered inodes... Phase 4 - check
> >> for duplicate blocks... - setting up duplicate extent list... -
> >> check for inodes claiming duplicate blocks... - agno = 0 - agno =
> >> 1 - agno = 2 - agno = 3 entry "test-011" in shortform directory 128
> >> references free inode 134 would have junked entry "test-011" in
> >> directory inode 128 entry "test-0" in shortform directory 128
> >> references free inode 135 would have junked entry "test-0" in
> >> directory inode 128 data fork in rt ino 134 claims dup rt
> >> extent,off - 0, start - 7942144, count 2097000 bad data fork in
> >> inode 134 would have cleared inode 134 data fork in rt ino 135
> >> claims dup rt extent,off - 0, start - 13062144, count 2097000 bad
> >> data fork in inode 135 would have cleared inode 135 No modify flag
> >> set, skipping phase 5 ------------------------
> >>
> >> Here is the bmap for both inodes.
> >>
> >> xfs_db> inode 135 xfs_db> bmap data offset 0 startblock 13062144
> >> (12/479232) count 2097000 flag 0 data offset 2097000 startblock
> >> 15159144 (14/479080) count 2097000 flag 0 data offset 4194000
> >> startblock 17256144 (16/478928) count 2097000 flag 0 data offset
> >> 6291000 startblock 19353144 (18/478776) count 2097000 flag 0 data
> >> offset 8388000 startblock 21450144 (20/478624) count 2097000 flag
> >> 0 data offset 10485000 startblock 23547144 (22/478472) count
> >> 2097000 flag 0 data offset 12582000 startblock 25644144 (24/478320)
> >> count 2097000 flag 0 data offset 14679000 startblock 27741144
> >> (26/478168) count 2097000 flag 0 data offset 16776000 startblock
> >> 29838144 (28/478016) count 2097000 flag 0 data offset 18873000
> >> startblock 31935144 (30/477864) count 1607000 flag 0 xfs_db> inode
> >> 134 xfs_db> bmap data offset 0 startblock 7942144 (7/602112) count
> >> 2097000 flag 0 data offset 2097000 startblock 10039144 (9/601960)
> >> count 2097000 flag 0 data offset 4194000 startblock 12136144
> >> (11/601808) count 926000 flag 0
> >
> > It's been a while since I thought about realtime, but -
> >
> > That all seems fine, I don't see anything overlapping there, they
> > are all perfectly adjacent, though of interesting size.
> >
> >>
> >> by looking into xfs_repair code, it looks like repair does not
> >> handle a case where we have more than one extent in a real-time
> >> extent. following is code from repair/dinode.c: process_rt_rec
> >
> > "more than one extent in a real-time extent?" I'm not sure what that
> > means.
> >
> > Every extent above is length 2097000 blocks, and they are adjacent.
> > But you say your realtime extent size is 512 blocks ... which doesn't
> > go into 2097000 evenly. So that's odd, at least.
> >
> >
> > well, lets look at first extent
> >> data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> >> data offset 2097000 startblock 15159144 (14/479080) count 2097000
> >> flag 0
> > startblock is aligned and rtext is 25512, since the blockcount is
> > not multiple of 512, last realtime extent ( 25512 + 4095) is
> > partially used, 360 blks second extent start from realtime extent
> > 29607 (ie 25512 + 4095). so, yes, extents are not overlapping, but
> > 29607 realtime extent is shared by two extents. Now once xfs_repair
> > detects this case in phase 2, it bails out and clears that inode. I
> > think search for duplicate extent is done in phase 4, but inode is
> > marked already.
>
> ... ok I realize I was misunderstanding some things about the realtime
> volume. (It's been a very long time since I thought about it). Still,
> I'd like to look at the metadump image if possible.
>
> Thanks,
> -Eric
>
metadump attached
[-- Attachment #2: dumpo.xfs.bz2 --]
[-- Type: application/x-bzip2, Size: 24562 bytes --]
* Re: xfs_repair deleting realtime files.
2012-09-21 5:00 ` Eric Sandeen
2012-09-21 15:51 ` Anand Tiwari
@ 2012-09-24 7:55 ` Dave Chinner
2012-09-24 12:51 ` Anand Tiwari
1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-09-24 7:55 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Anand Tiwari, xfs
On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
> On 9/20/12 7:40 PM, Anand Tiwari wrote:
> > Hi All,
> >
> > I have been looking into an issue with xfs_repair with realtime sub volume. some times while running xfs_repair I see following errors
> >
> > ----------------------------
> > data fork in rt inode 134 claims used rt block 19607
> > bad data fork in inode 134
> > would have cleared inode 134
> > data fork in rt inode 135 claims used rt block 29607
> > bad data fork in inode 135
> > would have cleared inode 135
.....
> > xfs_db> inode 135
> > xfs_db> bmap
> > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> > data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
> > data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag 0
> > data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag 0
> > data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag 0
> > data offset 10485000 startblock 23547144 (22/478472) count 2097000 flag 0
> > data offset 12582000 startblock 25644144 (24/478320) count 2097000 flag 0
> > data offset 14679000 startblock 27741144 (26/478168) count 2097000 flag 0
> > data offset 16776000 startblock 29838144 (28/478016) count 2097000 flag 0
> > data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
> > xfs_db> inode 134
> > xfs_db> bmap
> > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
> > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
>
> It's been a while since I thought about realtime, but -
>
> That all seems fine, I don't see anything overlapping there, they are
> all perfectly adjacent, though of interesting size.
Yeah, the size is the problem.
....
> Every extent above is length 2097000 blocks, and they are adjacent.
> But you say your realtime extent size is 512 blocks ... which doesn't go
> into 2097000 evenly. So that's odd, at least.
Once you realise that the bmapbt is recording multiples of FSB (4k)
rather than rtextsz (2MB), it becomes more obvious what the problem
is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
blocks short of 2^21 (2097152).
I haven't looked at the kernel code yet to work out why it is
rounding to a non-rtextsz multiple, but that is the source of the
problem.
The repair code is detecting that the extents are not of the
correct granularity, but the error message indicates that this was
only ever expected to occur for duplicate blocks rather than a
kernel bug. So "fixing repair" is not what is needed here - finding
and fixing the kernel bug is what you should be looking at.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs_repair deleting realtime files.
2012-09-24 7:55 ` Dave Chinner
@ 2012-09-24 12:51 ` Anand Tiwari
2012-09-26 1:26 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-24 12:51 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs
On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
> > On 9/20/12 7:40 PM, Anand Tiwari wrote:
> > > Hi All,
> > >
> > > I have been looking into an issue with xfs_repair with realtime sub
> volume. some times while running xfs_repair I see following errors
> > >
> > > ----------------------------
> > > data fork in rt inode 134 claims used rt block 19607
> > > bad data fork in inode 134
> > > would have cleared inode 134
> > > data fork in rt inode 135 claims used rt block 29607
> > > bad data fork in inode 135
> > > would have cleared inode 135
> .....
> > > xfs_db> inode 135
> > > xfs_db> bmap
> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag
> 0
> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag
> 0
> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag
> 0
> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag
> 0
> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000
> flag 0
> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000
> flag 0
> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000
> flag 0
> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000
> flag 0
> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000
> flag 0
> > > xfs_db> inode 134
> > > xfs_db> bmap
> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
> > > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
> >
> > It's been a while since I thought about realtime, but -
> >
> > That all seems fine, I don't see anything overlapping there, they are
> > all perfectly adjacent, though of interesting size.
>
> Yeah, the size is the problem.
>
> ....
> > Every extent above is length 2097000 blocks, and they are adjacent.
> > But you say your realtime extent size is 512 blocks ... which doesn't go
> > into 2097000 evenly. So that's odd, at least.
>
> Once you realise that the bmapbt is recording multiples of FSB (4k)
> rather than rtextsz (2MB), it becomes more obvious what the problem
> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
> blocks short of 2^21 (2097152).
>
> I haven't looked at the kernel code yet to work out why it is
> rounding to a non-rtextsz multiple, but that is the source of the
> problem.
>
> The repair code is detecting that extents are not of the
> correct granularity, but the error message indicates that this was
> only ever expected for duplicate blocks occurring rather than a
> kernel bug. So "fixing repair" is not what is needed here - finding
> and fixing the kernel bug is what you should be looking at.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
thanks, I started looking at the allocator code and will report if I see
something
[-- Attachment #1.2: Type: text/html, Size: 3933 bytes --]
[-- Attachment #2: Type: text/plain, Size: 121 bytes --]
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-24 12:51 ` Anand Tiwari
@ 2012-09-26 1:26 ` Anand Tiwari
2012-09-26 2:44 ` Dave Chinner
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-26 1:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs
On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@gmail.com>wrote:
>
>
> On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> wrote:
>
>> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
>> > On 9/20/12 7:40 PM, Anand Tiwari wrote:
>> > > Hi All,
>> > >
>> > > I have been looking into an issue with xfs_repair with realtime sub
>> volume. some times while running xfs_repair I see following errors
>> > >
>> > > ----------------------------
>> > > data fork in rt inode 134 claims used rt block 19607
>> > > bad data fork in inode 134
>> > > would have cleared inode 134
>> > > data fork in rt inode 135 claims used rt block 29607
>> > > bad data fork in inode 135
>> > > would have cleared inode 135
>> .....
>> > > xfs_db> inode 135
>> > > xfs_db> bmap
>> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
>> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000
>> flag 0
>> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000
>> flag 0
>> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000
>> flag 0
>> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000
>> flag 0
>> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000
>> flag 0
>> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000
>> flag 0
>> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000
>> flag 0
>> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000
>> flag 0
>> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000
>> flag 0
>> > > xfs_db> inode 134
>> > > xfs_db> bmap
>> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
>> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag
>> 0
>> > > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag
>> 0
>> >
>> > It's been a while since I thought about realtime, but -
>> >
>> > That all seems fine, I don't see anything overlapping there, they are
>> > all perfectly adjacent, though of interesting size.
>>
>> Yeah, the size is the problem.
>>
>> ....
>> > Every extent above is length 2097000 blocks, and they are adjacent.
>> > But you say your realtime extent size is 512 blocks ... which doesn't go
>> > into 2097000 evenly. So that's odd, at least.
>>
>> Once you realise that the bmapbt is recording multiples of FSB (4k)
>> rather than rtextsz (2MB), it becomes more obvious what the problem
>> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
>> blocks short of 2^21 (2097152).
>>
>> I haven't looked at the kernel code yet to work out why it is
>> rounding to a non-rtextsz multiple, but that is the source of the
>> problem.
>>
>> The repair code is detecting that extents are not of the
>> correct granularity, but the error message indicates that this was
>> only ever expected for duplicate blocks occurring rather than a
>> kernel bug. So "fixing repair" is not what is needed here - finding
>> and fixing the kernel bug is what you should be looking at.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@fromorbit.com
>>
>
>
> thanks, I started looking at the allocator code and will report if I see
> something
>
I think this is what is happening, if we have the following conditions:
1) we have more than 8GB of contiguous space available to allocate (i.e.
more than 2^21 4k blocks)
2) only one file is open for writing in the real-time volume.
To satisfy the first condition, I just took an empty file-system.
Now let's start allocating, say in chunks of 25000 blocks. The realtime
allocator will have no problem allocating the "exact" block while searching
forward in xfs_rtfind_forw(). It will allocate 49 "real-time extents", where
the 49th "real-time extent" is partially full (25000 = 48*512 + 424).
Everything is fine for the first 83 allocations, as we were able to grow the
extent. Now we have 2075000 (25000*83) blocks in the first extent, i.e. 4053
"real-time extents" (where the last "real-time extent" is partially full).
For the 84th allocation, the real-time allocator will allocate another 49
"real-time extents", as it does not know about the maximum extent size, but
we cannot grow the extent in xfs_bmap_add_extent_unwritten_real(), so we
insert a new extent (case BMAP_LEFT_FILLING). Now the new extent starts at
block 2075000, which is not aligned to rextsize (512 in this case).
To fix this, I see two options:
1) fix the real-time allocator and teach it about the maximum extent size;
2) for real-time files, align the new extent before inserting it.
In my opinion, we should not worry about either of the above, as this looks
like a good method for allocation. I can fix the xfs_repair tool and make it
aware of these conditions ("real-time extents" shared by two or more
extents).
Let me know what you guys think,
anand
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-26 1:26 ` Anand Tiwari
@ 2012-09-26 2:44 ` Dave Chinner
2012-09-26 3:45 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-09-26 2:44 UTC (permalink / raw)
To: Anand Tiwari; +Cc: Eric Sandeen, xfs
On Tue, Sep 25, 2012 at 07:26:32PM -0600, Anand Tiwari wrote:
> On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@gmail.com>wrote:
>
> >
> >
> > On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> wrote:
> >
> >> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
> >> > On 9/20/12 7:40 PM, Anand Tiwari wrote:
> >> > > Hi All,
> >> > >
> >> > > I have been looking into an issue with xfs_repair with realtime sub
> >> volume. some times while running xfs_repair I see following errors
> >> > >
> >> > > ----------------------------
> >> > > data fork in rt inode 134 claims used rt block 19607
> >> > > bad data fork in inode 134
> >> > > would have cleared inode 134
> >> > > data fork in rt inode 135 claims used rt block 29607
> >> > > bad data fork in inode 135
> >> > > would have cleared inode 135
> >> .....
> >> > > xfs_db> inode 135
> >> > > xfs_db> bmap
> >> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> >> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000
> >> flag 0
> >> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000
> >> flag 0
> >> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000
> >> flag 0
> >> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000
> >> flag 0
> >> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000
> >> flag 0
> >> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000
> >> flag 0
> >> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000
> >> flag 0
> >> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000
> >> flag 0
> >> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000
> >> flag 0
> >> > > xfs_db> inode 134
> >> > > xfs_db> bmap
> >> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> >> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag
> >> 0
> >> > > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag
> >> 0
> >> >
> >> > It's been a while since I thought about realtime, but -
> >> >
> >> > That all seems fine, I don't see anything overlapping there, they are
> >> > all perfectly adjacent, though of interesting size.
> >>
> >> Yeah, the size is the problem.
> >>
> >> ....
> >> > Every extent above is length 2097000 blocks, and they are adjacent.
> >> > But you say your realtime extent size is 512 blocks ... which doesn't go
> >> > into 2097000 evenly. So that's odd, at least.
> >>
> >> Once you realise that the bmapbt is recording multiples of FSB (4k)
> >> rather than rtextsz (2MB), it becomes more obvious what the problem
> >> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
> >> blocks short of 2^21 (2097152).
> >>
> >> I haven't looked at the kernel code yet to work out why it is
> >> rounding to a non-rtextsz multiple, but that is the source of the
> >> problem.
> >>
> >> The repair code is detecting that extents are not of the
> >> correct granularity, but the error message indicates that this was
> >> only ever expected for duplicate blocks occurring rather than a
> >> kernel bug. So "fixing repair" is not what is needed here - finding
> >> and fixing the kernel bug is what you should be looking at.
> >>
> >> Cheers,
> >>
> >> Dave.
> >> --
> >> Dave Chinner
> >> david@fromorbit.com
> >>
> >
> >
> > thanks, I started looking at the allocator code and will report if I see
> > something
> >
>
>
> I think this is what is happening, if we have the following conditions:
> 1) we have more than 8gb contiguous space available to allocate. ( i.e.
> more than 2^21 4k blocks)
> 2) only one file is open for writing in real-time volume.
>
> To satisfy first condition, I just took empty file-system.
>
> Now lets start allocating, lets say in chunks of 25000, realtime allocator
> will have no problem allocating "exact" block while searching forward.
> xfs_rtfind_forw(). It will allocate 49 "real-time extents", where the 49th
> "real-time extent" is partially full. (25000/512 = 48)
>
> everything is fine for first 83 allocations, as we were able to grow the
> extent. Now we have 2075000 (25000*83) blocks in first extent ie 4053
> "real-time extents" (where last "real-time extent" is partially full).
>
> for 84th allocation, real-time allocator will allocate another 49
> "real-time extents" as it does not know about maximum extent size, but we
> can not grow the extent in xfs_bmap_add_extent_unwritten_real(). so we
> insert a new extent (case BMAP_LEFT_FILLING). now the new extent starts
> from 2075000, which is not aligned with rextsize (512 in this case).
Ok, so it's a problem with using unwritten extents and converting
them. That is, the issue probably has nothing to do with the
realtime allocator at all.
Basically, when the unwritten extent occurs, we end up with a map
like this:
ext 0: offset 0, length 2075000 state written
ext 1: offset 2075000 length 25000 state unwritten
This will occur because you can't mix written/unwritten state in a
single extent.
What xfs_bmap_add_extent_unwritten_real() is attempting to do is
convert the unwritten extent to written state and merge it with its
siblings. In this case, 2075000 + 25000 > MAXEXTLEN, so it does not
merge them because of this check:
if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) &&
LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock &&
LEFT.br_state == newext &&
>>>>>> LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
state |= BMAP_LEFT_CONTIG;
Which means that BMAP_LEFT_CONTIG is not set, indicating that no
merging with the adjacent left extent should occur. Hence we end
up with this:
ext 0: offset 0, length 2075000 state written
ext 1: offset 2075000 length 25000 state written
That's fine for normal operation, but it means that large contiguous
regions written via direct IO with non-rtextsz aligned/sized IO will
have this problem.
What technically should happen for these real time files is that the
LEFT extent should be shortened to be aligned, and the new extent be
lengthened and have its startblock adjusted accordingly.
i.e. we should end up with this:
ext 0: offset 0, length 2074624 state written
ext 1: offset 2074624 length 25376 state written
> To fix this, I see two options,
> 1) fix real-time allocator and teach it about maximum extent size.
> 2) for real-time files, align the new extent before inserting it.
3) Fix the {BMAP_LEFT_CONTIG,MAXEXTLEN,rtextsz} handling in
xfs_bmap_add_extent_unwritten_real().
It's possible that the BMAP_RIGHT_CONTIG case also needs similar
fixing...
> In my opinion, we should not worry about either of the above, as this looks
> like a good method for allocation. I can fix the xfs_repair tool and make it aware of
> these conditions ("real-time extents" shared by two or more extents).
Personally, I'd prefer that 3) is done because RT extents should
always be rtextsz aligned and sized, and the bmapbt should respect
that requirement in all cases.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-26 2:44 ` Dave Chinner
@ 2012-09-26 3:45 ` Anand Tiwari
2012-09-26 6:17 ` Dave Chinner
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-26 3:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs
On Tue, Sep 25, 2012 at 8:44 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Sep 25, 2012 at 07:26:32PM -0600, Anand Tiwari wrote:
> > On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@gmail.com
> >wrote:
> >
> > >
> > >
> > > On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com>
> wrote:
> > >
> > >> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
> > >> > On 9/20/12 7:40 PM, Anand Tiwari wrote:
> > >> > > Hi All,
> > >> > >
> > >> > > I have been looking into an issue with xfs_repair with realtime
> sub
> > >> volume. some times while running xfs_repair I see following errors
> > >> > >
> > >> > > ----------------------------
> > >> > > data fork in rt inode 134 claims used rt block 19607
> > >> > > bad data fork in inode 134
> > >> > > would have cleared inode 134
> > >> > > data fork in rt inode 135 claims used rt block 29607
> > >> > > bad data fork in inode 135
> > >> > > would have cleared inode 135
> > >> .....
> > >> > > xfs_db> inode 135
> > >> > > xfs_db> bmap
> > >> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> > >> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000
> > >> flag 0
> > >> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000
> > >> flag 0
> > >> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000
> > >> flag 0
> > >> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000
> > >> flag 0
> > >> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000
> > >> flag 0
> > >> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000
> > >> flag 0
> > >> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000
> > >> flag 0
> > >> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000
> > >> flag 0
> > >> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000
> > >> flag 0
> > >> > > xfs_db> inode 134
> > >> > > xfs_db> bmap
> > >> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> > >> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000
> flag
> > >> 0
> > >> > > data offset 4194000 startblock 12136144 (11/601808) count 926000
> flag
> > >> 0
> > >> >
> > >> > It's been a while since I thought about realtime, but -
> > >> >
> > >> > That all seems fine, I don't see anything overlapping there, they
> are
> > >> > all perfectly adjacent, though of interesting size.
> > >>
> > >> Yeah, the size is the problem.
> > >>
> > >> ....
> > >> > Every extent above is length 2097000 blocks, and they are adjacent.
> > >> > But you say your realtime extent size is 512 blocks ... which
> doesn't go
> > >> > into 2097000 evenly. So that's odd, at least.
> > >>
> > >> Once you realise that the bmapbt is recording multiples of FSB (4k)
> > >> rather than rtextsz (2MB), it becomes more obvious what the problem
> > >> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
> > >> blocks short of 2^21 (2097152).
> > >>
> > >> I haven't looked at the kernel code yet to work out why it is
> > >> rounding to a non-rtextsz multiple, but that is the source of the
> > >> problem.
> > >>
> > >> The repair code is detecting that extents are not of the
> > >> correct granularity, but the error message indicates that this was
> > >> only ever expected for duplicate blocks occurring rather than a
> > >> kernel bug. So "fixing repair" is not what is needed here - finding
> > >> and fixing the kernel bug is what you should be looking at.
> > >>
> > >> Cheers,
> > >>
> > >> Dave.
> > >> --
> > >> Dave Chinner
> > >> david@fromorbit.com
> > >>
> > >
> > >
> > > thanks, I started looking at the allocator code and will report if I see
> > > something
> > >
> >
> >
> > I think this is what is happening, if we have the following conditions:
> > 1) we have more than 8gb contiguous space available to allocate. ( i.e.
> > more than 2^21 4k blocks)
> > 2) only one file is open for writing in real-time volume.
> >
> > To satisfy first condition, I just took empty file-system.
> >
> > Now lets start allocating, lets say in chunks of 25000, realtime
> allocator
> > will have no problem allocating "exact" block while searching forward.
> > xfs_rtfind_forw(). It will allocate 49 "real-time extents", where the
> 49th
> > "real-time extent" is partially full. (25000/512 = 48)
> >
> > everything is fine for first 83 allocations, as we were able to grow the
> > extent. Now we have 2075000 (25000*83) blocks in first extent ie 4053
> > "real-time extents" (where last "real-time extent" is partially full).
> >
> > for 84th allocation, real-time allocator will allocate another 49
> > "real-time extents" as it does not know about maximum extent size, but we
> > can not grow the extent in xfs_bmap_add_extent_unwritten_real(). so we
> > insert a new extent (case BMAP_LEFT_FILLING). now the new extent starts
> > from 2075000, which is not aligned with rextsize (512 in this case).
>
> Ok, so it's a problem with using unwritten extents and converting
> them. That is, the issue probably has nothing to do with the
> realtime allocator at all.
>
> Basically, when the unwritten extent occurs, we end up with a map
> like this:
>
> ext 0: offset 0, length 2075000 state written
> ext 1: offset 2075000 length 25000 state unwritten
>
> This will occur because you can't mix written/unwritten state in a
> single extent.
>
> What xfs_bmap_add_extent_unwritten_real() is attempting to do is
> convert the unwritten extent to written state and merge it with its
> siblings. In this case, 2075000 + 25000 > MAXEXTLEN, so it does not
> merge them because of this check:
>
> if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) &&
> LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff &&
> LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock
> &&
> LEFT.br_state == newext &&
> >>>>>> LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN)
> state |= BMAP_LEFT_CONTIG;
>
> Which means that BMAP_LEFT_CONTIG is not set, indicating that no
> merging with the adjacent left extent should occur. Hence we end
> up with this:
>
> ext 0: offset 0, length 2075000 state written
> ext 1: offset 2075000 length 25000 state written
>
> That's fine for normal operation, but it means that large contiguous
> regions written via direct IO with non-rtextsz aligned/sized IO will
> have this problem.
>
> What technically should happen for these real time files is that the
> LEFT extent should be shortened to be aligned, and the new extent be
> lengthened and have its startblock adjusted accordingly.
>
> i.e. we should end up with this:
>
> ext 0: offset 0, length 2074624 state written
> ext 1: offset 2074624 length 25376 state written
>
> > To fix this, I see two options,
> > 1) fix real-time allocator and teach it about maximum extent size.
> > 2) for real-time files, align the new extent before inserting it.
>
> 3) Fix the {BMAP_LEFT_CONTIG,MAXEXTLEN,rtextsz} handling in
> xfs_bmap_add_extent_unwritten_real().
>
> It's possible that the BMAP_RIGHT_CONTIG case also needs similar
> fixing...
>
> > In my opinion, we should not worry about either of the above, as this looks
> > like a good method for allocation. I can fix the xfs_repair tool and make it
> > aware of
> > these conditions ("real-time extents" shared by two or more extents).
>
> Personally, I'd prefer that 3) is done because RT extents should
> always be rtextsz aligned and sized, and the bmapbt should respect
> that requirement in all cases.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
thanks Dave for the prompt reply, I meant to implement option 2 as you
explained it (option 3). I will start working on it tomorrow. In the mean
time, I also had to put something in xfs_repair for the files which
already exist on disk. Would you guys be willing to review/comment on
that?
anand
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-26 3:45 ` Anand Tiwari
@ 2012-09-26 6:17 ` Dave Chinner
2012-09-28 1:27 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-09-26 6:17 UTC (permalink / raw)
To: Anand Tiwari; +Cc: Eric Sandeen, xfs
On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote:
> thanks Dave for prompt reply, I meant to implement option 2 as you
> explained (option 3). I will start working on it tomorrow. In the mean
> time, I also had to put something in xfs_repair for the files which
> already exist on disk. Would you guys be willing to review/comment on
> that?
Sure.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-26 6:17 ` Dave Chinner
@ 2012-09-28 1:27 ` Anand Tiwari
2012-09-28 6:47 ` Dave Chinner
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-28 1:27 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs
On Wed, Sep 26, 2012 at 12:17 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote:
> > thanks Dave for prompt reply, I meant to implement option 2 as you
> > explained (option 3). I will start working on it tomorrow. In the mean
> > time, I also had to put something in xfs_repair for the files which
> > already exist on disk. Would you guys be willing to review/comment on
> > that?
>
> Sure.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
following are my changes for xfs_repair. My goal is to keep the changes
to a minimum, as they may not be included upstream. I had to make these
changes because we already have files on disk whose extent maps are not
properly aligned.
As we know, this happens only when we are growing a file in a realtime
volume. Keeping this in mind, if the start of a record in the extent map
is not aligned, I check the previous record, and if they are contiguous
we can skip that part of the record.
Let me know if you see any issues with this, or if someone has a better
approach. I would have liked to use pointers for prev and irec, but
again, I wanted to keep the changes to a minimum.
thanks
anand
XFS can currently have records in the extent map which start from an
unaligned block w.r.t. rextsize.
--------------------------------------
xfs_repair considers this a bug (multiple claims for one real-time
extent) and deletes the file.
This patch addresses the issue by comparing the current and previous
records and making sure they are contiguous and do not overlap.
diff --git a/repair/dinode.c b/repair/dinode.c
index 5a2da39..5537f1c 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
static int
process_rt_rec(
xfs_mount_t *mp,
+ xfs_bmbt_irec_t *prev,
xfs_bmbt_irec_t *irec,
xfs_ino_t ino,
xfs_drfsbno_t *tot,
@@ -413,8 +414,11 @@ process_rt_rec(
{
xfs_dfsbno_t b;
xfs_drtbno_t ext;
+ xfs_drtbno_t start_block;
+ xfs_filblks_t block_count;
int state;
int pwe; /* partially-written extent */
+ int rtext_remainder; /* start block is not aligned w.r.t rextsize */
/*
* check numeric validity of the extent
@@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64
"] (fs rtext size = %u)\n"),
return 1;
}
+ /* If we have start of record unaligned w.r.t to rextsize, see
+ * if we are sharing this realtime extent with previous record. sharing is
only
+ * allowed with previous extent. fail otherwise.
+ * Also, if above condition is true, align start block and block count
+ */
+ rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
+ if (rtext_remainder) {
+ do_warn(
+_("data fork in rt ino %" PRIu64 " has unaligned start block %"PRIu64
"\n"),
+ ino,
+ irec->br_startblock);
+ if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
+ (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
+ start_block = irec->br_startblock + (mp->m_sb.sb_rextsize -
rtext_remainder);
+ block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize -
rtext_remainder);
+ }
+ } else {
+ start_block = irec->br_startblock;
+ block_count = irec->br_blockcount;
+ }
+
/*
* set the appropriate number of extents
* this iterates block by block, this can be optimised using extents
*/
- for (b = irec->br_startblock; b < irec->br_startblock +
- irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
+ for (b = start_block; b < start_block + block_count; b +=
mp->m_sb.sb_rextsize) {
ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
irec->br_state == XFS_EXT_UNWRITTEN &&
@@ -548,6 +572,7 @@ process_bmbt_reclist_int(
int check_dups,
int whichfork)
{
+ xfs_bmbt_irec_t prev;
xfs_bmbt_irec_t irec;
xfs_dfilblks_t cp = 0; /* prev count */
xfs_dfsbno_t sp = 0; /* prev start */
@@ -574,6 +599,11 @@ process_bmbt_reclist_int(
else
ftype = _("regular");
+ prev.br_startoff = 0;
+ prev.br_blockcount = 0;
+ prev.br_startblock = 0;
+ prev.br_state = 0;
+
for (i = 0; i < *numrecs; i++) {
libxfs_bmbt_disk_get_all(rp + i, &irec);
if (i == 0)
@@ -610,12 +640,13 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %"
PRIu64 ") in ino %" PRIu64
* realtime bitmaps don't use AG locks, so returning
* immediately is fine for this code path.
*/
- if (process_rt_rec(mp, &irec, ino, tot, check_dups))
+ if (process_rt_rec(mp, &prev, &irec, ino, tot, check_dups))
return 1;
/*
* skip rest of loop processing since that's
* all for regular file forks and attr forks
*/
+ memcpy(&prev, &irec, sizeof(xfs_bmbt_irec_t));
continue;
}
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-28 1:27 ` Anand Tiwari
@ 2012-09-28 6:47 ` Dave Chinner
2012-09-29 22:49 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-09-28 6:47 UTC (permalink / raw)
To: Anand Tiwari; +Cc: Eric Sandeen, xfs
On Thu, Sep 27, 2012 at 07:27:16PM -0600, Anand Tiwari wrote:
> On Wed, Sep 26, 2012 at 12:17 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> > On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote:
> > > thanks Dave for prompt reply, I meant to implement option 2 as you
> > > explained (option 3). I will start working on it tomorrow. In the mean
> > > time, I also had to put something in xfs_repair for the files which
> > > already exist on disk. Would you guys be willing to review/comment on
> > > that?
> >
> > Sure.
> >
> > Cheers,
> >
> > Dave.
> > --
> > Dave Chinner
> > david@fromorbit.com
> >
>
> following are my changes for xfs_repair. My goal is to keep the changes
> to a minimum, as they may not be included upstream. I had to make these
> changes because we already have files on disk whose extent maps are not
> properly aligned.
> As we know, this happens only when we are growing a file in a realtime
> volume. Keeping this in mind, if the start of a record in the extent map
> is not aligned, I check the previous record, and if they are contiguous
> we can skip that part of the record.
>
> Let me know if you see any issues with this, or if someone has a better
> approach. I would have liked to use pointers for prev and irec, but
> again, I wanted to keep the changes to a minimum.
>
> thanks
> anand
>
>
> XFS can currently have records in the extent map which start from an
> unaligned block w.r.t. rextsize.
>
> --------------------------------------
>
> xfs_repair considers this a bug (multiple claims for one real-time
> extent) and deletes the file.
> This patch addresses the issue by comparing the current and previous
> records and making sure they are contiguous and do not overlap.
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 5a2da39..5537f1c 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
> static int
> process_rt_rec(
> xfs_mount_t *mp,
> + xfs_bmbt_irec_t *prev,
> xfs_bmbt_irec_t *irec,
> xfs_ino_t ino,
> xfs_drfsbno_t *tot,
Your mailer has removed all the whitespace from the patch. See the files
Documentation/SubmittingPatches and Documentation/email-clients.txt
for help with how to send patches sanely via email. ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-28 6:47 ` Dave Chinner
@ 2012-09-29 22:49 ` Anand Tiwari
2012-10-02 0:59 ` Dave Chinner
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-29 22:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Anand Tiwari, xfs
On Fri, 28 Sep 2012, Dave Chinner wrote:
> On Thu, Sep 27, 2012 at 07:27:16PM -0600, Anand Tiwari wrote:
> > On Wed, Sep 26, 2012 at 12:17 AM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > > On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote:
> > > > Thanks, Dave, for the prompt reply. I meant to implement option 2 as you
> > > > explained (option 3). I will start working on it tomorrow. In the
> > > > meantime, I also had to put something in xfs_repair for the files which
> > > > already exist on disk. Would you be willing to review/comment on
> > > > that?
> > >
> > > Sure.
> > >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > david@fromorbit.com
> > >
> >
> > Following are my changes for xfs_repair. My goal is to keep the changes
> > minimal, as they may not be included upstream. I had to make these
> > changes because we already have files whose extent maps are not properly
> > aligned. As we know, this happens only when growing a file on a realtime
> > volume. With this in mind, if the start of a record in the extent map is
> > not aligned, I check the previous record; if the two are contiguous, we
> > can skip that part of the record.
> > Let me know if you have any issues with this, or if someone has a better
> > approach. I would like to use pointers for prev and irec but, again, I
> > wanted to keep the changes minimal.
> >
> > thanks
> > anand
> >
> >
>
> Your mailer has removed all the whitespace from the patch. See the files
> Documentation/SubmittingPatches and Documentation/email-clients.txt
> for help with how to send patches sanely via email. ;)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
Okay, here is my second attempt; hope this time it works.
From 77f8dc66fa5ce6b004f95dbb794322be35d4190a Mon Sep 17 00:00:00 2001
From: Anand Tiwari <anand.tiwari@linux.com>
Date: Fri, 28 Sep 2012 15:45:55 -0600
Subject: [PATCH] xfs_repair: detect realtime extent shared by multiple
records in extent map
XFS can currently have records in the extent map that start from a block
unaligned w.r.t. rextsize. xfs_repair considers this a bug (multiple claims
on one real-time extent) and deletes the file. This patch addresses the
issue by comparing the current and previous records and making sure they
are contiguous and not overlapping.
Signed-off-by: Anand Tiwari <anand.tiwari@linux.com>
---
repair/dinode.c | 37 ++++++++++++++++++++++++++++++++++---
1 file changed, 34 insertions(+), 3 deletions(-)
diff --git a/repair/dinode.c b/repair/dinode.c
index 5a2da39..5537f1c 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
static int
process_rt_rec(
xfs_mount_t *mp,
+ xfs_bmbt_irec_t *prev,
xfs_bmbt_irec_t *irec,
xfs_ino_t ino,
xfs_drfsbno_t *tot,
@@ -413,8 +414,11 @@ process_rt_rec(
{
xfs_dfsbno_t b;
xfs_drtbno_t ext;
+ xfs_drtbno_t start_block;
+ xfs_filblks_t block_count;
int state;
int pwe; /* partially-written extent */
+ int rtext_remainder; /* start block is not aligned w.r.t rextsize */
/*
* check numeric validity of the extent
@@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"),
return 1;
}
+ /* If we have start of record unaligned w.r.t to rextsize, see
+ * if we are sharing this realtime extent with previous record. sharing is only
+ * allowed with previous extent. fail otherwise.
+ * Also, we above condition is true, align start block and block count
+ */
+ rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
+ if (rtext_remainder) {
+ do_warn(
+_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"),
+ ino,
+ irec->br_startblock);
+ if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
+ (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
+ start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder);
+ block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder);
+ }
+ } else {
+ start_block = irec->br_startblock;
+ block_count = irec->br_blockcount;
+ }
+
/*
* set the appropriate number of extents
* this iterates block by block, this can be optimised using extents
*/
- for (b = irec->br_startblock; b < irec->br_startblock +
- irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
+ for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) {
ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
irec->br_state == XFS_EXT_UNWRITTEN &&
@@ -548,6 +572,7 @@ process_bmbt_reclist_int(
int check_dups,
int whichfork)
{
+ xfs_bmbt_irec_t prev;
xfs_bmbt_irec_t irec;
xfs_dfilblks_t cp = 0; /* prev count */
xfs_dfsbno_t sp = 0; /* prev start */
@@ -574,6 +599,11 @@ process_bmbt_reclist_int(
else
ftype = _("regular");
+ prev.br_startoff = 0;
+ prev.br_blockcount = 0;
+ prev.br_startblock = 0;
+ prev.br_state = 0;
+
for (i = 0; i < *numrecs; i++) {
libxfs_bmbt_disk_get_all(rp + i, &irec);
if (i == 0)
@@ -610,12 +640,13 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64
* realtime bitmaps don't use AG locks, so returning
* immediately is fine for this code path.
*/
- if (process_rt_rec(mp, &irec, ino, tot, check_dups))
+ if (process_rt_rec(mp, &prev, &irec, ino, tot, check_dups))
return 1;
/*
* skip rest of loop processing since that's
* all for regular file forks and attr forks
*/
+ memcpy(&prev, &irec, sizeof(xfs_bmbt_irec_t));
continue;
}
--
1.7.9.5
* Re: xfs_repair deleting realtime files.
2012-09-29 22:49 ` Anand Tiwari
@ 2012-10-02 0:59 ` Dave Chinner
2012-10-02 16:47 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-10-02 0:59 UTC (permalink / raw)
To: Anand Tiwari; +Cc: Eric Sandeen, xfs
On Sat, Sep 29, 2012 at 04:49:03PM -0600, Anand Tiwari wrote:
> Subject: [PATCH] xfs_repair: detect realtime extent shared by multiple
> records in extent map
>
> XFS can currently have records in the extent map that start from a block
> unaligned w.r.t. rextsize. xfs_repair considers this a bug (multiple claims
> on one real-time extent) and deletes the file. This patch addresses the
> issue by comparing the current and previous records and making sure they
> are contiguous and not overlapping.
>
> Signed-off-by: Anand Tiwari <anand.tiwari@linux.com>
> ---
> repair/dinode.c | 37 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 5a2da39..5537f1c 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
> static int
> process_rt_rec(
> xfs_mount_t *mp,
> + xfs_bmbt_irec_t *prev,
> xfs_bmbt_irec_t *irec,
> xfs_ino_t ino,
> xfs_drfsbno_t *tot,
> @@ -413,8 +414,11 @@ process_rt_rec(
> {
> xfs_dfsbno_t b;
> xfs_drtbno_t ext;
> + xfs_drtbno_t start_block;
> + xfs_filblks_t block_count;
> int state;
> int pwe; /* partially-written extent */
> + int rtext_remainder; /* start block is not aligned w.r.t rextsize */
>
> /*
> * check numeric validity of the extent
> @@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"),
> return 1;
> }
>
> + /* If we have start of record unaligned w.r.t to rextsize, see
> + * if we are sharing this realtime extent with previous record. sharing is only
> + * allowed with previous extent. fail otherwise.
> + * Also, we above condition is true, align start block and block count
> + */
Comment format is:
/*
* this is a
* multiline comment
*/
> + rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
> + if (rtext_remainder) {
> + do_warn(
> +_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"),
unaligned
> + ino,
> + irec->br_startblock);
indent these a bit so the fact they are part of the do_warn function
call is obvious. i.e.
do_warn(
<format string>
ino, irec->br_startblock);
> + if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
> + (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
> + start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder);
> + block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder);
> + }
> + } else {
> + start_block = irec->br_startblock;
> + block_count = irec->br_blockcount;
> + }
So this just changes the loop below not to check the first part of
the new extent and therefore avoid the duplicate extent usage
detection?
Any plans to correct the bmapbt extents so that repeated repair runs
don't keep warning about the same problem?
> +
> /*
> * set the appropriate number of extents
> * this iterates block by block, this can be optimised using extents
> */
> - for (b = irec->br_startblock; b < irec->br_startblock +
> - irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
> + for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) {
> ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
> pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
> irec->br_state == XFS_EXT_UNWRITTEN &&
> @@ -548,6 +572,7 @@ process_bmbt_reclist_int(
> int check_dups,
> int whichfork)
> {
> + xfs_bmbt_irec_t prev;
> xfs_bmbt_irec_t irec;
> xfs_dfilblks_t cp = 0; /* prev count */
> xfs_dfsbno_t sp = 0; /* prev start */
> @@ -574,6 +599,11 @@ process_bmbt_reclist_int(
> else
> ftype = _("regular");
>
> + prev.br_startoff = 0;
> + prev.br_blockcount = 0;
> + prev.br_startblock = 0;
> + prev.br_state = 0;
memset(&prev, 0, sizeof(prev));
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: xfs_repair deleting realtime files.
2012-10-02 0:59 ` Dave Chinner
@ 2012-10-02 16:47 ` Anand Tiwari
0 siblings, 0 replies; 16+ messages in thread
From: Anand Tiwari @ 2012-10-02 16:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Anand Tiwari, xfs
On Tue, 2 Oct 2012, Dave Chinner wrote:
> On Sat, Sep 29, 2012 at 04:49:03PM -0600, Anand Tiwari wrote:
> > Subject: [PATCH] xfs_repair: detect realtime extent shared by multiple
> > records in extent map
> >
> > XFS can currently have records in the extent map that start from a block
> > unaligned w.r.t. rextsize. xfs_repair considers this a bug (multiple claims
> > on one real-time extent) and deletes the file. This patch addresses the
> > issue by comparing the current and previous records and making sure they
> > are contiguous and not overlapping.
> >
> > Signed-off-by: Anand Tiwari <anand.tiwari@linux.com>
> > ---
> > repair/dinode.c | 37 ++++++++++++++++++++++++++++++++++---
> > 1 file changed, 34 insertions(+), 3 deletions(-)
> >
> > diff --git a/repair/dinode.c b/repair/dinode.c
> > index 5a2da39..5537f1c 100644
> > --- a/repair/dinode.c
> > +++ b/repair/dinode.c
> > @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
> > static int
> > process_rt_rec(
> > xfs_mount_t *mp,
> > + xfs_bmbt_irec_t *prev,
> > xfs_bmbt_irec_t *irec,
> > xfs_ino_t ino,
> > xfs_drfsbno_t *tot,
> > @@ -413,8 +414,11 @@ process_rt_rec(
> > {
> > xfs_dfsbno_t b;
> > xfs_drtbno_t ext;
> > + xfs_drtbno_t start_block;
> > + xfs_filblks_t block_count;
> > int state;
> > int pwe; /* partially-written extent */
> > + int rtext_remainder; /* start block is not aligned w.r.t rextsize */
> >
> > /*
> > * check numeric validity of the extent
> > @@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"),
> > return 1;
> > }
> >
> > + /* If we have start of record unaligned w.r.t to rextsize, see
> > + * if we are sharing this realtime extent with previous record. sharing is only
> > + * allowed with previous extent. fail otherwise.
> > + * Also, we above condition is true, align start block and block count
> > + */
>
> Comment format is:
>
> /*
> * this is a
> * multiline comment
> */
>
> > + rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
> > + if (rtext_remainder) {
> > + do_warn(
> > +_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"),
> unaligned
>
> > + ino,
> > + irec->br_startblock);
>
> indent these a bit so the fact they are part of the do_warn function
> call is obvious. i.e.
>
> do_warn(
> <format string>
> ino, irec->br_startblock);
>
> > + if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
> > + (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
> > + start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder);
> > + block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder);
> > + }
> > + } else {
> > + start_block = irec->br_startblock;
> > + block_count = irec->br_blockcount;
> > + }
>
> So this just changes the loop below not to check the first part of
> the new extent and therefore avoid the duplicate extent usage
> detection?
>
> Any plans to correct the bmapbt extents so that repeated repair runs
> don't keep warning about the same problem?
>
I was planning on fixing the kernel code first and tackling this later.
Let me know if you think otherwise.
> > +
> > /*
> > * set the appropriate number of extents
> > * this iterates block by block, this can be optimised using extents
> > */
> > - for (b = irec->br_startblock; b < irec->br_startblock +
> > - irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
> > + for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) {
> > ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
> > pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
> > irec->br_state == XFS_EXT_UNWRITTEN &&
> > @@ -548,6 +572,7 @@ process_bmbt_reclist_int(
> > int check_dups,
> > int whichfork)
> > {
> > + xfs_bmbt_irec_t prev;
> > xfs_bmbt_irec_t irec;
> > xfs_dfilblks_t cp = 0; /* prev count */
> > xfs_dfsbno_t sp = 0; /* prev start */
> > @@ -574,6 +599,11 @@ process_bmbt_reclist_int(
> > else
> > ftype = _("regular");
> >
> > + prev.br_startoff = 0;
> > + prev.br_blockcount = 0;
> > + prev.br_startblock = 0;
> > + prev.br_state = 0;
>
> memset(&prev, 0, sizeof(prev));
>
Which one is preferred in XFS land:
a) xfs_bmbt_irec_t prev = {0};
b) memset(&prev, 0, sizeof(prev));
My vote is "a". I have already addressed the other suggestions.
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
thanks a bunch,
anand
end of thread, other threads:[~2012-10-02 18:35 UTC | newest]
Thread overview: 16+ messages
2012-09-21 0:40 xfs_repair deleting realtime files Anand Tiwari
2012-09-21 5:00 ` Eric Sandeen
2012-09-21 15:51 ` Anand Tiwari
2012-09-21 16:07 ` Eric Sandeen
2012-09-21 16:40 ` Anand Tiwari
2012-09-24 7:55 ` Dave Chinner
2012-09-24 12:51 ` Anand Tiwari
2012-09-26 1:26 ` Anand Tiwari
2012-09-26 2:44 ` Dave Chinner
2012-09-26 3:45 ` Anand Tiwari
2012-09-26 6:17 ` Dave Chinner
2012-09-28 1:27 ` Anand Tiwari
2012-09-28 6:47 ` Dave Chinner
2012-09-29 22:49 ` Anand Tiwari
2012-10-02 0:59 ` Dave Chinner
2012-10-02 16:47 ` Anand Tiwari