* xfs_repair deleting realtime files.
@ 2012-09-21 0:40 Anand Tiwari
2012-09-21 5:00 ` Eric Sandeen
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-21 0:40 UTC (permalink / raw)
To: xfs
Hi All,
I have been looking into an issue with xfs_repair on a realtime subvolume.
Sometimes while running xfs_repair I see the following errors:
----------------------------
data fork in rt inode 134 claims used rt block 19607
bad data fork in inode 134
would have cleared inode 134
data fork in rt inode 135 claims used rt block 29607
bad data fork in inode 135
would have cleared inode 135
- agno = 1
- agno = 2
- agno = 3
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
entry "test-011" in shortform directory 128 references free inode 134
would have junked entry "test-011" in directory inode 128
entry "test-0" in shortform directory 128 references free inode 135
would have junked entry "test-0" in directory inode 128
data fork in rt ino 134 claims dup rt extent,off - 0, start - 7942144,
count 2097000
bad data fork in inode 134
would have cleared inode 134
data fork in rt ino 135 claims dup rt extent,off - 0, start - 13062144,
count 2097000
bad data fork in inode 135
would have cleared inode 135
No modify flag set, skipping phase 5
------------------------
Here is the bmap for both inodes.
xfs_db> inode 135
xfs_db> bmap
data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag 0
data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag 0
data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag 0
data offset 10485000 startblock 23547144 (22/478472) count 2097000 flag 0
data offset 12582000 startblock 25644144 (24/478320) count 2097000 flag 0
data offset 14679000 startblock 27741144 (26/478168) count 2097000 flag 0
data offset 16776000 startblock 29838144 (28/478016) count 2097000 flag 0
data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
xfs_db> inode 134
xfs_db> bmap
data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
By looking into the xfs_repair code, it looks like repair does not
handle a case where we have more than one extent in a real-time extent.
The following is code from repair/dinode.c:process_rt_rec():
-----
for (b = irec->br_startblock;
     b < irec->br_startblock + irec->br_blockcount;
     b += mp->m_sb.sb_rextsize) {
	ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
	pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
	      irec->br_state == XFS_EXT_UNWRITTEN &&
	      (b % mp->m_sb.sb_rextsize != 0);
-----
In my case rextsize is 512 blocks (512 * 4096 = 2 MB). So when we have
multiple extents (written extents, to be precise - thanks dchinner for
that) touching the same realtime extent, the value of "ext" will be the
same for all of them, and xfs_repair does not like it; thus the error
message "data fork in rt inode XX claims used rt block XX".
If I ignore this failure condition, xfs_repair seems to be happy. (FYI:
this file system was cleanly unmounted.)
But in my opinion that is not good, as these multiple extents can
genuinely overlap too. Should we be using XR_E_MULT to flag and keep
track of duplicated real-time extents (maybe using the present API for
adding/detecting duplicate extents)?
I am open to suggestions or comments on how to fix this.
The xfs_repair version is 3.1.8 and the kernel is 2.6.37.
thanks,
Anand
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 16+ messages in thread

* Re: xfs_repair deleting realtime files.
2012-09-21 0:40 xfs_repair deleting realtime files Anand Tiwari
@ 2012-09-21 5:00 ` Eric Sandeen
2012-09-21 15:51 ` Anand Tiwari
2012-09-24 7:55 ` Dave Chinner
0 siblings, 2 replies; 16+ messages in thread
From: Eric Sandeen @ 2012-09-21 5:00 UTC (permalink / raw)
To: Anand Tiwari; +Cc: xfs

On 9/20/12 7:40 PM, Anand Tiwari wrote:
> Hi All,
>
> I have been looking into an issue with xfs_repair on a realtime
> subvolume. Sometimes while running xfs_repair I see the following
> errors:
.....
> Here is the bmap for both inodes.
>
> xfs_db> inode 135
> xfs_db> bmap
> data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
.....
> data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
> xfs_db> inode 134
> xfs_db> bmap
> data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
> data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0

It's been a while since I thought about realtime, but -

That all seems fine, I don't see anything overlapping there, they are
all perfectly adjacent, though of interesting size.

> by looking into xfs_repair code, it looks like repair does not handle
> a case where we have more than one extent in a real-time extent.
> following is code from repair/dinode.c: process_rt_rec

"more than one extent in a real-time extent?"  I'm not sure what that
means.

Every extent above is length 2097000 blocks, and they are adjacent.
But you say your realtime extent size is 512 blocks ... which doesn't go
into 2097000 evenly.  So that's odd, at least.

Can you provide your xfs_info output for this fs?
Or maybe better yet an xfs_metadump image.

> In my case rextsize is 512 (512 * 4096 = 2mb). So we have multiple
> extents (written extents to be precise, thanks dchinner for that),
> value of "ext" will be same for all of them and xfs_repair does not
> like it. thus the error message "data fork in rt inode XX claims
> used rt block XX".

"ext" should not be the same for all of them; ext is the realtime extent
number in the fs, based on the physical start, br_startblock,
divided by the rt extent size.  There shouldn't be duplicate values
of "ext" based on the bmaps above.

The error comes from search_rt_dup_extent() which looks for overlaps
elsewhere in the fs...

If you can provide a metadump of the fs it might be easier to see what's
going on.

-Eric

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-21 5:00 ` Eric Sandeen
@ 2012-09-21 15:51 ` Anand Tiwari
2012-09-21 16:07 ` Eric Sandeen
2012-09-24 7:55 ` Dave Chinner
1 sibling, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-21 15:51 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs

On Thu, Sep 20, 2012 at 11:00 PM, Eric Sandeen <sandeen@sandeen.net> wrote:
.....
> Every extent above is length 2097000 blocks, and they are adjacent.
> But you say your realtime extent size is 512 blocks ... which doesn't go
> into 2097000 evenly.  So that's odd, at least.

Well, let's look at the first extent:

> data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0

The startblock is aligned and its rt extent is 25512. Since the block
count is not a multiple of 512, the last realtime extent (25512 + 4095)
is only partially used, 360 blks. The second extent starts in realtime
extent 29607 (i.e. 25512 + 4095). So, yes, the extents are not
overlapping, but realtime extent 29607 is shared by two extents. Now,
once xfs_repair detects this case in phase 2, it bails out and clears
that inode. I think the search for duplicate extents is done in phase 4,
but the inode is already marked by then.

> Can you provide your xfs_info output for this fs?
> Or maybe better yet an xfs_metadump image.

I will do it soon.

The other thing I noticed is that if I try to delete this file (inode
135), I get the following assertion failure:

[75669.291000] Assertion failed: prev.br_state == XFS_EXT_NORM, file: fs/xfs/xfs_bmap.c, line: 5187
[75669.300000] [<8024dc44>] assfail+0x28/0x2c
[75669.300000] [<801d8864>] xfs_bunmapi+0x1288/0x14e4
[75669.300000] [<8020aff4>] xfs_itruncate_finish+0x344/0x738
[75669.300000] [<802363b0>] xfs_inactive+0x4a8/0x51c
[75669.300000] [<80109124>] evict+0x28/0xd0
[75669.300000] [<80109c14>] iput+0x19c/0x2d8
[75669.300000] [<800fe58c>] do_unlinkat+0x10c/0x19c
[75669.300000] [<80011b7c>] stack_done+0x20/0x40

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-21 15:51 ` Anand Tiwari
@ 2012-09-21 16:07 ` Eric Sandeen
2012-09-21 16:40 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Eric Sandeen @ 2012-09-21 16:07 UTC (permalink / raw)
To: Anand Tiwari; +Cc: xfs

On 9/21/12 10:51 AM, Anand Tiwari wrote:
.....
> startblock is aligned and rtext is 25512, since the blockcount is
> not multiple of 512, last realtime extent ( 25512 + 4095) is
> partially used, 360 blks second extent start from realtime extent
> 29607 (ie 25512 + 4095). so, yes, extents are not overlapping, but
> 29607 realtime extent is shared by two extents. Now once xfs_repair
> detects this case in phase 2, it bails out and clears that inode. I
> think search for duplicate extent is done in phase 4, but inode is
> marked already.

... ok I realize I was misunderstanding some things about the realtime
volume.  (It's been a very long time since I thought about it.)  Still,
I'd like to look at the metadump image if possible.

Thanks,
-Eric

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-21 16:07 ` Eric Sandeen
@ 2012-09-21 16:40 ` Anand Tiwari
0 siblings, 0 replies; 16+ messages in thread
From: Anand Tiwari @ 2012-09-21 16:40 UTC (permalink / raw)
To: Eric Sandeen; +Cc: xfs

On Fri, Sep 21, 2012 at 10:07 AM, Eric Sandeen <sandeen@sandeen.net> wrote:
.....
> ... ok I realize I was misunderstanding some things about the realtime
> volume.  (It's been a very long time since I thought about it.)  Still,
> I'd like to look at the metadump image if possible.

metadump attached

[-- Attachment #2: dumpo.xfs.bz2 --]
[-- Type: application/x-bzip2, Size: 24562 bytes --]

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-21 5:00 ` Eric Sandeen
2012-09-21 15:51 ` Anand Tiwari
@ 2012-09-24 7:55 ` Dave Chinner
2012-09-24 12:51 ` Anand Tiwari
1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-09-24 7:55 UTC (permalink / raw)
To: Eric Sandeen; +Cc: Anand Tiwari, xfs

On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
> On 9/20/12 7:40 PM, Anand Tiwari wrote:
> > I have been looking into an issue with xfs_repair on a realtime
> > subvolume. Sometimes while running xfs_repair I see the following
> > errors:
.....
> It's been a while since I thought about realtime, but -
>
> That all seems fine, I don't see anything overlapping there, they are
> all perfectly adjacent, though of interesting size.

Yeah, the size is the problem.

.....
> Every extent above is length 2097000 blocks, and they are adjacent.
> But you say your realtime extent size is 512 blocks ... which doesn't go
> into 2097000 evenly.  So that's odd, at least.

Once you realise that the bmapbt is recording multiples of FSB (4k)
rather than rtextsz (2MB), it becomes more obvious what the problem
is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
blocks short of 2^21 (2097152).

I haven't looked at the kernel code yet to work out why it is
rounding to a non-rtextsz multiple, but that is the source of the
problem.

The repair code is detecting that extents are not of the correct
granularity, but the error message indicates that this was only ever
expected when duplicate blocks occur, not as a result of a kernel
bug. So "fixing repair" is not what is needed here - finding and
fixing the kernel bug is what you should be looking at.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
2012-09-24 7:55 ` Dave Chinner
@ 2012-09-24 12:51 ` Anand Tiwari
2012-09-26 1:26 ` Anand Tiwari
0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-24 12:51 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, xfs

On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> wrote:
.....
> Once you realise that the bmapbt is recording multiples of FSB (4k)
> rather than rtextsz (2MB), it becomes more obvious what the problem
> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
> blocks short of 2^21 (2097152).
>
> I haven't looked at the kernel code yet to work out why it is
> rounding to a non-rtextsz multiple, but that is the source of the
> problem.
.....

Thanks. I started looking at the allocator code and will report if I
see something.

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files. 2012-09-24 12:51 ` Anand Tiwari @ 2012-09-26 1:26 ` Anand Tiwari 2012-09-26 2:44 ` Dave Chinner 0 siblings, 1 reply; 16+ messages in thread From: Anand Tiwari @ 2012-09-26 1:26 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, xfs [-- Attachment #1.1: Type: text/plain, Size: 4810 bytes --] On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@gmail.com>wrote: > > > On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> wrote: > >> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote: >> > On 9/20/12 7:40 PM, Anand Tiwari wrote: >> > > Hi All, >> > > >> > > I have been looking into an issue with xfs_repair with realtime sub >> volume. some times while running xfs_repair I see following errors >> > > >> > > ---------------------------- >> > > data fork in rt inode 134 claims used rt block 19607 >> > > bad data fork in inode 134 >> > > would have cleared inode 134 >> > > data fork in rt inode 135 claims used rt block 29607 >> > > bad data fork in inode 135 >> > > would have cleared inode 135 >> ..... 
>> > > xfs_db> inode 135 >> > > xfs_db> bmap >> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0 >> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000 >> flag 0 >> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000 >> flag 0 >> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000 >> flag 0 >> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000 >> flag 0 >> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000 >> flag 0 >> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000 >> flag 0 >> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000 >> flag 0 >> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000 >> flag 0 >> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000 >> flag 0 >> > > xfs_db> inode 134 >> > > xfs_db> bmap >> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0 >> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag >> 0 >> > > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag >> 0 >> > >> > It's been a while since I thought about realtime, but - >> > >> > That all seems fine, I don't see anything overlapping there, they are >> > all perfectly adjacent, though of interesting size. >> >> Yeah, the size is the problem. >> >> .... >> > Every extent above is length 2097000 blocks, and they are adjacent. >> > But you say your realtime extent size is 512 blocks ... which doesn't go >> > into 2097000 evenly. So that's odd, at least. >> >> Once you realise that the bmapbt is recording multiples of FSB (4k) >> rather than rtextsz (2MB), it becomes more obvious what the problem >> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152 >> blocks short of 2^21 (2097152). 
>> >> I haven't looked at the kernel code yet to work out why it is >> rounding to a non-rtextsz multiple, but that is the source of the >> problem. >> >> The repair code is detecting that extents are not of the >> correct granularity, but the error message indicates that this was >> only ever expected for duplicate blocks occurring rather than a >> kernel bug. So "fixing repair" is not what is needed here - finding >> and fixing the kernel bug is what you should be looking at. >> >> Cheers, >> >> Dave. >> -- >> Dave Chinner >> david@fromorbit.com >> > > > thanks, I started looking at the allocator code and will report if I see > something > I think this is what is happening. If we have the following conditions: 1) we have more than 8GB of contiguous space available to allocate (i.e. more than 2^21 4k blocks); 2) only one file is open for writing in the real-time volume. To satisfy the first condition, I just took an empty file-system. Now let's start allocating, say in chunks of 25000 blocks. The realtime allocator will have no problem allocating the "exact" block while searching forward in xfs_rtfind_forw(). It will allocate 49 "real-time extents", where the 49th "real-time extent" is partially full (25000/512 = 48, remainder 424). Everything is fine for the first 83 allocations, as we were able to grow the extent. Now we have 2075000 (25000*83) blocks in the first extent, i.e. 4053 "real-time extents" (where the last "real-time extent" is partially full). For the 84th allocation, the real-time allocator will allocate another 49 "real-time extents", as it does not know about the maximum extent size, but we can not grow the extent in xfs_bmap_add_extent_unwritten_real(), so we insert a new extent (case BMAP_LEFT_FILLING). Now the new extent starts at block 2075000, which is not aligned with rextsize (512 in this case). To fix this, I see two options: 1) fix the real-time allocator and teach it about the maximum extent size; 2) for real-time files, align the new extent before inserting. 
In my opinion, we should not worry about either of the above, as this looks like a good method for allocation. I can fix the xfs_repair tool and make it aware of these conditions ("real-time extents" shared by two or more extents). Let me know what you guys think, anand [-- Attachment #1.2: Type: text/html, Size: 5994 bytes --] [-- Attachment #2: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
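Anand's allocation arithmetic above can be reproduced with a short standalone sketch. This is illustrative only — the constants come from the thread, and the merge rule is a deliberate simplification of what xfs_bmap_add_extent_unwritten_real() does; none of this is actual XFS code:

```python
# Illustrative sketch only -- not XFS code. Constants from the thread:
# 4k filesystem blocks, realtime extent size 512 blocks, MAXEXTLEN = 2^21.
MAXEXTLEN = 1 << 21   # largest extent length a bmbt record can hold (blocks)
REXTSIZE = 512        # realtime extent size in filesystem blocks
CHUNK = 25000         # blocks per allocation in Anand's test

extents = []          # list of (start_offset, length) in filesystem blocks
for i in range(84):   # 84 sequential allocations, as in the analysis
    off = i * CHUNK
    last = extents[-1] if extents else None
    if (last and last[0] + last[1] == off
            and last[1] + CHUNK <= MAXEXTLEN):
        extents[-1] = (last[0], last[1] + CHUNK)  # grow the previous extent
    else:
        extents.append((off, CHUNK))              # start a new extent

print(extents)                   # [(0, 2075000), (2075000, 25000)]
print(extents[1][0] % REXTSIZE)  # 376 -- the second extent starts mid-rt-extent
```

The second record starting 376 blocks into a realtime extent is exactly the misalignment Dave's later realignment example corrects (2074624 + 25376 = 2100000).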
* Re: xfs_repair deleting realtime files. 2012-09-26 1:26 ` Anand Tiwari @ 2012-09-26 2:44 ` Dave Chinner 2012-09-26 3:45 ` Anand Tiwari 0 siblings, 1 reply; 16+ messages in thread From: Dave Chinner @ 2012-09-26 2:44 UTC (permalink / raw) To: Anand Tiwari; +Cc: Eric Sandeen, xfs On Tue, Sep 25, 2012 at 07:26:32PM -0600, Anand Tiwari wrote: > On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@gmail.com>wrote: > > > > > > > On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> wrote: > > > >> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote: > >> > On 9/20/12 7:40 PM, Anand Tiwari wrote: > >> > > Hi All, > >> > > > >> > > I have been looking into an issue with xfs_repair with realtime sub > >> volume. some times while running xfs_repair I see following errors > >> > > > >> > > ---------------------------- > >> > > data fork in rt inode 134 claims used rt block 19607 > >> > > bad data fork in inode 134 > >> > > would have cleared inode 134 > >> > > data fork in rt inode 135 claims used rt block 29607 > >> > > bad data fork in inode 135 > >> > > would have cleared inode 135 > >> ..... 
> >> > > xfs_db> inode 135 > >> > > xfs_db> bmap > >> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0 > >> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000 > >> flag 0 > >> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000 > >> flag 0 > >> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000 > >> flag 0 > >> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000 > >> flag 0 > >> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000 > >> flag 0 > >> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000 > >> flag 0 > >> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000 > >> flag 0 > >> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000 > >> flag 0 > >> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000 > >> flag 0 > >> > > xfs_db> inode 134 > >> > > xfs_db> bmap > >> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0 > >> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag > >> 0 > >> > > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag > >> 0 > >> > > >> > It's been a while since I thought about realtime, but - > >> > > >> > That all seems fine, I don't see anything overlapping there, they are > >> > all perfectly adjacent, though of interesting size. > >> > >> Yeah, the size is the problem. > >> > >> .... > >> > Every extent above is length 2097000 blocks, and they are adjacent. > >> > But you say your realtime extent size is 512 blocks ... which doesn't go > >> > into 2097000 evenly. So that's odd, at least. > >> > >> Once you realise that the bmapbt is recording multiples of FSB (4k) > >> rather than rtextsz (2MB), it becomes more obvious what the problem > >> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152 > >> blocks short of 2^21 (2097152). 
> >> > >> I haven't looked at the kernel code yet to work out why it is > >> rounding to a non-rtextsz multiple, but that is the source of the > >> problem. > >> > >> The repair code is detecting that extents are not of the > >> correct granularity, but the error message indicates that this was > >> only ever expected for duplicate blocks occurring rather than a > >> kernel bug. So "fixing repair" is not what is needd here - finding > >> and fixing the kernel bug is what you shoul be looking at. > >> > >> Cheers, > >> > >> Dave. > >> -- > >> Dave Chinner > >> david@fromorbit.com > >> > > > > > > thanks, I started looking at allocator code and and will report if see > > something > > > > > I think this is what happening. If we have following conditions, > 1) we have more than 8gb contiguous space available to allocate. ( i.e. > more than 2^21 4k blocks) > 2) only one file is open for writing in real-time volume. > > To satisfy first condition, I just took empty file-system. > > Now lets start allocating, lets say in chucks of 25000, realtime allocator > will have no problem allocating "exact" block while searching forward. > xfs_rtfind_forw(). It will allocate 49 "real-time extents", where the 49th > "real-time extent" is partially full. (25000/512 = 48) > > everything is fine for first 83 allocations, as we were able to grow the > extent. Now we have 2075000 (25000*83) blocks in first extent ie 4053 > "real-time extents" (where last "real-time extent" is partially full). > > for 84th allocation, real-time allocator will allocate another 49 > "real-time extents" as it does not know about maximum extent size, but we > can not grow the extent in xfs_bmap_add_extent_unwritten_real(). so we > insert a new extent (case BMAP_LEFT_FILLING). now the new extent starts > from 2075000, which is not aligned with rextsize (512 in this case). Ok, so it's a problem with using unwritten extents and converting them. 
That is, the issue probably has nothing to do with the realtime allocator at all. Basically, when the unwritten extent occurs, we end up with a map like this: ext 0: offset 0, length 2075000 state written ext 1: offset 2075000 length 25000 state unwritten This will occur because you can't mix written/unwritten state in a single extent. What xfs_bmap_add_extent_unwritten_real() is attempting to do is convert the unwritten extent to written state and merge it with its siblings. In this case, 2075000 + 25000 > MAXEXTLEN, so it does not merge them because of this check: if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) && LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff && LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock && LEFT.br_state == newext && >>>>>> LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN) state |= BMAP_LEFT_CONTIG; Which means that BMAP_LEFT_CONTIG is not set, indicating that no merging with the adjacent left extent should occur. Hence we end up with this: ext 0: offset 0, length 2075000 state written ext 1: offset 2075000 length 25000 state written That's fine for normal operation, but it means that large contiguous regions written via direct IO with non-rtextsz aligned/sized IO will have this problem. What technically should happen for these real time files is that the LEFT extent should be shortened to be aligned, and the new extent be lengthened and have its startblock adjusted accordingly. i.e. we should end up with this: ext 0: offset 0, length 2074624 state written ext 1: offset 2074624 length 25376 state written > To fix this, I see two options, > 1) fix real-time allocator and teach it about maximum extent size. > 2) for real-time files, aligned new extent before inserting. 3) Fix the {BMAP_LEFT_CONTIG,MAXEXTLEN,rtextsz} handling in xfs_bmap_add_extent_unwritten_real(). It's possible that the BMAP_RIGHT_CONTIG case also needs similar fixing... 
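Dave's realignment example boils down to rounding the split point between the two extents down to an rtextsz boundary. A minimal sketch of that arithmetic, using a hypothetical helper name (this is not the kernel implementation):

```python
def realign_split(left_len, new_len, rextsize):
    """Round the split between two adjacent extents down to a realtime
    extent boundary: shorten LEFT, lengthen the new extent to match.
    Hypothetical helper, not the kernel code."""
    rem = left_len % rextsize  # blocks LEFT reaches into a partial rt extent
    return left_len - rem, new_len + rem

# Dave's numbers: LEFT 2075000 blocks, new extent 25000 blocks, rextsize 512.
print(realign_split(2075000, 25000, 512))  # (2074624, 25376)
```

The total length is preserved (2074624 + 25376 = 2075000 + 25000), but both extents now start on a realtime extent boundary.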
> In my opinion, we should not worry about either of above, as this looks > good method for allocation. I can fix xfs_repair tool and make it aware of > these conditions ("real-time extents" shared by two or more extents). Personally, I'd prefer that 3) is done because RT extents should always be rtextsz aligned and sized, and the bmapbt should respect that requirement in all cases. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files. 2012-09-26 2:44 ` Dave Chinner @ 2012-09-26 3:45 ` Anand Tiwari 2012-09-26 6:17 ` Dave Chinner 0 siblings, 1 reply; 16+ messages in thread From: Anand Tiwari @ 2012-09-26 3:45 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, xfs [-- Attachment #1.1: Type: text/plain, Size: 7963 bytes --] On Tue, Sep 25, 2012 at 8:44 PM, Dave Chinner <david@fromorbit.com> wrote: > On Tue, Sep 25, 2012 at 07:26:32PM -0600, Anand Tiwari wrote: > > On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@gmail.com > >wrote: > > > > > > > > > > > On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@fromorbit.com> > wrote: > > > > > >> On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote: > > >> > On 9/20/12 7:40 PM, Anand Tiwari wrote: > > >> > > Hi All, > > >> > > > > >> > > I have been looking into an issue with xfs_repair with realtime > sub > > >> volume. some times while running xfs_repair I see following errors > > >> > > > > >> > > ---------------------------- > > >> > > data fork in rt inode 134 claims used rt block 19607 > > >> > > bad data fork in inode 134 > > >> > > would have cleared inode 134 > > >> > > data fork in rt inode 135 claims used rt block 29607 > > >> > > bad data fork in inode 135 > > >> > > would have cleared inode 135 > > >> ..... 
> > >> > > xfs_db> inode 135 > > >> > > xfs_db> bmap > > >> > > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0 > > >> > > data offset 2097000 startblock 15159144 (14/479080) count 2097000 > > >> flag 0 > > >> > > data offset 4194000 startblock 17256144 (16/478928) count 2097000 > > >> flag 0 > > >> > > data offset 6291000 startblock 19353144 (18/478776) count 2097000 > > >> flag 0 > > >> > > data offset 8388000 startblock 21450144 (20/478624) count 2097000 > > >> flag 0 > > >> > > data offset 10485000 startblock 23547144 (22/478472) count 2097000 > > >> flag 0 > > >> > > data offset 12582000 startblock 25644144 (24/478320) count 2097000 > > >> flag 0 > > >> > > data offset 14679000 startblock 27741144 (26/478168) count 2097000 > > >> flag 0 > > >> > > data offset 16776000 startblock 29838144 (28/478016) count 2097000 > > >> flag 0 > > >> > > data offset 18873000 startblock 31935144 (30/477864) count 1607000 > > >> flag 0 > > >> > > xfs_db> inode 134 > > >> > > xfs_db> bmap > > >> > > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0 > > >> > > data offset 2097000 startblock 10039144 (9/601960) count 2097000 > flag > > >> 0 > > >> > > data offset 4194000 startblock 12136144 (11/601808) count 926000 > flag > > >> 0 > > >> > > > >> > It's been a while since I thought about realtime, but - > > >> > > > >> > That all seems fine, I don't see anything overlapping there, they > are > > >> > all perfectly adjacent, though of interesting size. > > >> > > >> Yeah, the size is the problem. > > >> > > >> .... > > >> > Every extent above is length 2097000 blocks, and they are adjacent. > > >> > But you say your realtime extent size is 512 blocks ... which > doesn't go > > >> > into 2097000 evenly. So that's odd, at least. 
> > >> > > >> Once you realise that the bmapbt is recording multiples of FSB (4k) > > >> rather than rtextsz (2MB), it becomes more obvious what the problem > > >> is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152 > > >> blocks short of 2^21 (2097152). > > >> > > >> I haven't looked at the kernel code yet to work out why it is > > >> rounding to a non-rtextsz multiple, but that is the source of the > > >> problem. > > >> > > >> The repair code is detecting that extents are not of the > > >> correct granularity, but the error message indicates that this was > > >> only ever expected for duplicate blocks occurring rather than a > > >> kernel bug. So "fixing repair" is not what is needd here - finding > > >> and fixing the kernel bug is what you shoul be looking at. > > >> > > >> Cheers, > > >> > > >> Dave. > > >> -- > > >> Dave Chinner > > >> david@fromorbit.com > > >> > > > > > > > > > thanks, I started looking at allocator code and and will report if see > > > something > > > > > > > > > I think this is what happening. If we have following conditions, > > 1) we have more than 8gb contiguous space available to allocate. ( i.e. > > more than 2^21 4k blocks) > > 2) only one file is open for writing in real-time volume. > > > > To satisfy first condition, I just took empty file-system. > > > > Now lets start allocating, lets say in chucks of 25000, realtime > allocator > > will have no problem allocating "exact" block while searching forward. > > xfs_rtfind_forw(). It will allocate 49 "real-time extents", where the > 49th > > "real-time extent" is partially full. (25000/512 = 48) > > > > everything is fine for first 83 allocations, as we were able to grow the > > extent. Now we have 2075000 (25000*83) blocks in first extent ie 4053 > > "real-time extents" (where last "real-time extent" is partially full). 
> > > > for 84th allocation, real-time allocator will allocate another 49 > > "real-time extents" as it does not know about maximum extent size, but we > > can not grow the extent in xfs_bmap_add_extent_unwritten_real(). so we > > insert a new extent (case BMAP_LEFT_FILLING). now the new extent starts > > from 2075000, which is not aligned with rextsize (512 in this case). > > Ok, so it's a problem with using unwritten extents and converting > them. That is, the issue probably has nothing to do with the > realtime allocator at all. > > Basically, when the unwritten extent occurs, we end up with a map > like this: > > ext 0: offset 0, length 2075000 state written > ext 1: offset 2075000 length 25000 state unwritten > > This will occur because you can't mix written/unwritten state in a > single extent. > > What xfs_bmap_add_extent_unwritten_real() is attempting to do is > convert the unwritten extent to written state and merge it with it's > siblings. In this case, 2075000 + 25000 > MAXEXTLEN, so it does not > merge them because of this check: > > if ((state & BMAP_LEFT_VALID) && !(state & BMAP_LEFT_DELAY) && > LEFT.br_startoff + LEFT.br_blockcount == new->br_startoff && > LEFT.br_startblock + LEFT.br_blockcount == new->br_startblock > && > LEFT.br_state == newext && > >>>>>> LEFT.br_blockcount + new->br_blockcount <= MAXEXTLEN) > state |= BMAP_LEFT_CONTIG; > > Which means that BMAP_LEFT_CONTIG is not set, indicating that the no > merging with the an adjacent left extent should occur. Hence we end > up wwith this: > > ext 0: offset 0, length 2075000 state written > ext 1: offset 2075000 length 25000 state written > > That's fine for normal operation, but it means that large contiguous > regions written via direct IO with non-rtextsz aligned/sized IO will > have problem this problem. 
> > What technically should happen for these real time files is that the > LEFT extent should be shortened to be aligned, and the new extent be > lengthened and have its startblock adjusted accordingly. > > i.e. we should end up with this: > > ext 0: offset 0, length 2074624 state written > ext 1: offset 2074624 length 25376 state written > > > To fix this, I see two options, > > 1) fix real-time allocator and teach it about maximum extent size. > > 2) for real-time files, aligned new extent before inserting. > > 3) Fix the {BMAP_LEFT_CONTIG,MAXEXTLEN,rtextsz} handling in > xfs_bmap_add_extent_unwritten_real(). > > It's possible that the BMAP_RIGHT_CONTIG case also needs similar > fixing... > > > In my opinion, we should not worry about either of above, as this looks > > good method for allocation. I can fix xfs_repair tool and make it aware > of > > these conditions ("real-time extents" shared by two or more extents). > > Personally, I'd prefer that 3) is done because RT extents should > always be rtextsz aligned and sized, and the bmapbt should respect > that requirement in all cases. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > Thanks, Dave, for the prompt reply. By option 2 I meant what you explained as option 3. I will start working on it tomorrow. In the meantime, I also had to put something in xfs_repair for the files which already exist on disk. Would you guys be willing to review/comment on that? anand [-- Attachment #1.2: Type: text/html, Size: 10376 bytes --] [-- Attachment #2: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files. 2012-09-26 3:45 ` Anand Tiwari @ 2012-09-26 6:17 ` Dave Chinner 2012-09-28 1:27 ` Anand Tiwari 0 siblings, 1 reply; 16+ messages in thread From: Dave Chinner @ 2012-09-26 6:17 UTC (permalink / raw) To: Anand Tiwari; +Cc: Eric Sandeen, xfs On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote: > thanks Dave for prompt reply, I meant to implement option 2 as you > explained (option 3). I will start working on it tomorrow. In the mean > time, I also had to put something in xfs_repair for the files which > already exists on the disk. Would you guys willing to review/comment on > that ? Sure. Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files. 2012-09-26 6:17 ` Dave Chinner @ 2012-09-28 1:27 ` Anand Tiwari 2012-09-28 6:47 ` Dave Chinner 0 siblings, 1 reply; 16+ messages in thread From: Anand Tiwari @ 2012-09-28 1:27 UTC (permalink / raw) To: Dave Chinner; +Cc: Eric Sandeen, xfs [-- Attachment #1.1: Type: text/plain, Size: 4612 bytes --] On Wed, Sep 26, 2012 at 12:17 AM, Dave Chinner <david@fromorbit.com> wrote: > On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote: > > thanks Dave for prompt reply, I meant to implement option 2 as you > > explained (option 3). I will start working on it tomorrow. In the mean > > time, I also had to put something in xfs_repair for the files which > > already exists on the disk. Would you guys willing to review/comment on > > that ? > > Sure. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > Following are my changes for xfs_repair. My goal is to keep the changes minimal, as they may not be included upstream. I had to make these changes because we already have files on disk whose extent maps are not properly aligned. As we know, this happens only when we are growing a file in a realtime volume. Keeping this in mind, I check whether the start of a record in the extent map is unaligned; if so, I check the previous record, and if the two are contiguous, we can skip that part of the record. Let me know if you have any issues with this or if someone has a better approach. I would like to use pointers for prev and irec, but again, I wanted to keep the changes minimal. thanks anand XFS can currently have records in the extent map which start from a block that is unaligned w.r.t. rextsize. -------------------------------------- xfs_repair considers this a bug (multiple claims for one real-time extent) and deletes the file. This patch addresses the issue by comparing current and previous records and making sure they are contiguous and not overlapping. 
diff --git a/repair/dinode.c b/repair/dinode.c index 5a2da39..5537f1c 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp, static int process_rt_rec( xfs_mount_t *mp, + xfs_bmbt_irec_t *prev, xfs_bmbt_irec_t *irec, xfs_ino_t ino, xfs_drfsbno_t *tot, @@ -413,8 +414,11 @@ process_rt_rec( { xfs_dfsbno_t b; xfs_drtbno_t ext; + xfs_drtbno_t start_block; + xfs_filblks_t block_count; int state; int pwe; /* partially-written extent */ + int rtext_remainder; /* start block is not aligned w.r.t rextsize */ /* * check numeric validity of the extent @@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"), return 1; } + /* If we have start of record unaligned w.r.t to rextsize, see + * if we are sharing this realtime extent with previous record. sharing is only + * allowed with previous extent. fail otherwise. + * Also, if above condition is true, align start block and block count + */ + rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize; + if (rtext_remainder) { + do_warn( +_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"), + ino, + irec->br_startblock); + if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) && + (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) { + start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder); + block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder); + } + } else { + start_block = irec->br_startblock; + block_count = irec->br_blockcount; + } + /* * set the appropriate number of extents * this iterates block by block, this can be optimised using extents */ - for (b = irec->br_startblock; b < irec->br_startblock + - irec->br_blockcount; b += mp->m_sb.sb_rextsize) { + for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) { ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize; pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) && 
irec->br_state == XFS_EXT_UNWRITTEN && @@ -548,6 +572,7 @@ process_bmbt_reclist_int( int check_dups, int whichfork) { + xfs_bmbt_irec_t prev; xfs_bmbt_irec_t irec; xfs_dfilblks_t cp = 0; /* prev count */ xfs_dfsbno_t sp = 0; /* prev start */ @@ -574,6 +599,11 @@ process_bmbt_reclist_int( else ftype = _("regular"); + prev.br_startoff = 0; + prev.br_blockcount = 0; + prev.br_startblock = 0; + prev.br_state = 0; + for (i = 0; i < *numrecs; i++) { libxfs_bmbt_disk_get_all(rp + i, &irec); if (i == 0) @@ -610,12 +640,13 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64 * realtime bitmaps don't use AG locks, so returning * immediately is fine for this code path. */ - if (process_rt_rec(mp, &irec, ino, tot, check_dups)) + if (process_rt_rec(mp, &prev, &irec, ino, tot, check_dups)) return 1; /* * skip rest of loop processing since that'irec.br_startblock * all for regular file forks and attr forks */ + memcpy(&prev, &irec, sizeof(xfs_bmbt_irec_t)); continue; } [-- Attachment #1.2: Type: text/html, Size: 12372 bytes --] [-- Attachment #2: Type: text/plain, Size: 121 bytes --] _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply related [flat|nested] 16+ messages in thread
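The core check in Anand's patch — an unaligned record start is tolerated only when the record is exactly contiguous with the previous one, in which case the shared (partial) realtime extent is skipped before the per-extent bitmap walk — can be sketched standalone. Names here are hypothetical and records are plain integer tuples rather than xfs_bmbt_irec_t:

```python
def rt_rec_adjust(prev, cur, rextsize):
    """Sketch of the adjustment in Anand's xfs_repair patch (illustrative,
    not the real code). prev and cur are (startoff, startblock, blockcount)
    tuples in filesystem blocks. If cur's start block is not rextsize-
    aligned, it may share its first realtime extent with prev, but only
    if the two records are exactly contiguous; otherwise it is an error."""
    rem = cur[1] % rextsize
    if rem == 0:
        return cur[1], cur[2]          # already aligned, nothing to skip
    contiguous = (prev[0] + prev[2] == cur[0] and
                  prev[1] + prev[2] == cur[1])
    if not contiguous:
        raise ValueError("unaligned extent not shared with previous record")
    skip = rextsize - rem              # rest of the shared realtime extent
    return cur[1] + skip, cur[2] - skip

# Inode 134 from the thread: the second record starts at block 10039144,
# which is 360 blocks into rt extent 19607 (10039144 // 512 == 19607) --
# the very extent repair complained about ("claims used rt block 19607").
prev = (0, 7942144, 2097000)
cur = (2097000, 10039144, 2097000)
print(rt_rec_adjust(prev, cur, 512))   # (10039296, 2096848)
```

With the adjustment, the bitmap walk for the second record starts at the next realtime extent boundary, so the shared extent is only counted once.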
* Re: xfs_repair deleting realtime files. 2012-09-28 1:27 ` Anand Tiwari @ 2012-09-28 6:47 ` Dave Chinner 2012-09-29 22:49 ` Anand Tiwari 0 siblings, 1 reply; 16+ messages in thread From: Dave Chinner @ 2012-09-28 6:47 UTC (permalink / raw) To: Anand Tiwari; +Cc: Eric Sandeen, xfs On Thu, Sep 27, 2012 at 07:27:16PM -0600, Anand Tiwari wrote: > On Wed, Sep 26, 2012 at 12:17 AM, Dave Chinner <david@fromorbit.com> wrote: > > > On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote: > > > thanks Dave for prompt reply, I meant to implement option 2 as you > > > explained (option 3). I will start working on it tomorrow. In the mean > > > time, I also had to put something in xfs_repair for the files which > > > already exists on the disk. Would you guys willing to review/comment on > > > that ? > > > > Sure. > > > > Cheers, > > > > Dave. > > -- > > Dave Chinner > > david@fromorbit.com > > > > following are my changes for xfs_repair. my goal is to keep changes > minimum as they may not included in upstream. I had to do these changes as > we already have files with extent map not properly aligned. > As we know, this happens only when we are growing a file in a realtime > volume. by keeping this in mind, I am checking if start of a record in > extent map is not aligned, check previous record and if they are > contiguous, we can skip that part of record. > > let me know if you any issues with this or if someone has better approach. > I would like to use pointers for prev and irec but again, I wanted to keep > changes minimum > > thanks > anand > > > XFS currently can have records in extent map, which starts from unaligned > block w.r.t rextsize. > > -------------------------------------- > > xfs_repair considers this as a bug (multiple claims for a one real time > extent) and deletes the file. > This patch addresses the issue, by comparing current and previous records > and make sure they are > contiguous and not overlapped. 
> > diff --git a/repair/dinode.c > index 5a2da39..5537f1c 100644 > --- a/repair/dinode.c > +++ b/repair/dinode.c > @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp, > static int > process_rt_rec( > xfs_mount_t *mp, > + xfs_bmbt_irec_t *prev, > xfs_bmbt_irec_t *irec, > xfs_ino_t ino, > xfs_drfsbno_t *tot, Your mailer has removed all the whitespace from the patch. See the files Documentation/SubmittingPatches and Documentation/email-clients.txt for help with how to send patches sanely via email. ;) Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: xfs_repair deleting realtime files.
  2012-09-28  6:47           ` Dave Chinner
@ 2012-09-29 22:49             ` Anand Tiwari
  2012-10-02  0:59               ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Anand Tiwari @ 2012-09-29 22:49 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Anand Tiwari, xfs

On Fri, 28 Sep 2012, Dave Chinner wrote:

> On Thu, Sep 27, 2012 at 07:27:16PM -0600, Anand Tiwari wrote:
> > On Wed, Sep 26, 2012 at 12:17 AM, Dave Chinner <david@fromorbit.com> wrote:
> > > On Tue, Sep 25, 2012 at 09:45:07PM -0600, Anand Tiwari wrote:
> > > > thanks Dave for the prompt reply, I meant to implement option 2 as you
> > > > explained (option 3). I will start working on it tomorrow. In the mean
> > > > time, I also had to put something in xfs_repair for the files which
> > > > already exist on the disk. Would you guys be willing to review/comment
> > > > on that?
> > >
> > > Sure.
> > >
> > > Cheers,
> > >
> > > Dave.
> > > --
> > > Dave Chinner
> > > david@fromorbit.com
> >
> > following are my changes for xfs_repair. my goal is to keep the changes
> > minimal as they may not be included upstream. I had to make these changes
> > as we already have files whose extent maps are not properly aligned.
> > As we know, this happens only when we are growing a file on a realtime
> > volume. Keeping this in mind, I check whether the start of a record in
> > the extent map is unaligned; if so, I check the previous record, and if
> > the two are contiguous, we can skip that part of the record.
> >
> > let me know if you have any issues with this or if someone has a better
> > approach. I would like to use pointers for prev and irec but again, I
> > wanted to keep the changes minimal.
> >
> > thanks
> > anand
>
> Your mailer has removed all the whitespace from the patch. See the files
> Documentation/SubmittingPatches and Documentation/email-clients.txt
> for help with how to send patches sanely via email. ;)
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

okay, here is my second attempt, hope this time it works.

From 77f8dc66fa5ce6b004f95dbb794322be35d4190a Mon Sep 17 00:00:00 2001
From: Anand Tiwari <anand.tiwari@linux.com>
Date: Fri, 28 Sep 2012 15:45:55 -0600
Subject: [PATCH] xfs_repair: detect realtime extent shared by multiple
 records in extent map

XFS currently can have records in extent map, which starts from unaligned
start block w.r.t rextsize. xfs_repair considers this as a bug (multiple
claims for a real-time extent) and deletes the file. This patch addresses
the issue, by comparing current and previous records and make sure they
are contiguous and not overlapped.

Signed-off-by: Anand Tiwari <anand.tiwari@linux.com>
---
 repair/dinode.c | 37 ++++++++++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/repair/dinode.c b/repair/dinode.c
index 5a2da39..5537f1c 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
 static int
 process_rt_rec(
 	xfs_mount_t	*mp,
+	xfs_bmbt_irec_t	*prev,
 	xfs_bmbt_irec_t	*irec,
 	xfs_ino_t	ino,
 	xfs_drfsbno_t	*tot,
@@ -413,8 +414,11 @@ process_rt_rec(
 {
 	xfs_dfsbno_t	b;
 	xfs_drtbno_t	ext;
+	xfs_drtbno_t	start_block;
+	xfs_filblks_t	block_count;
 	int		state;
 	int		pwe;		/* partially-written extent */
+	int		rtext_remainder; /* start block is not aligned w.r.t rextsize */
 
 	/*
 	 * check numeric validity of the extent
@@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"),
 		return 1;
 	}
 
+	/* If we have start of record unaligned w.r.t to rextsize, see
+	 * if we are sharing this realtime extent with previous record. sharing is only
+	 * allowed with previous extent. fail otherwise.
+	 * Also, we above condition is true, align start block and block count
+	 */
+	rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
+	if (rtext_remainder) {
+		do_warn(
+_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"),
+		ino,
+		irec->br_startblock);
+		if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
+		    (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
+			start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder);
+			block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder);
+		}
+	} else {
+		start_block = irec->br_startblock;
+		block_count = irec->br_blockcount;
+	}
+
 	/*
 	 * set the appropriate number of extents
 	 * this iterates block by block, this can be optimised using extents
 	 */
-	for (b = irec->br_startblock; b < irec->br_startblock +
-			irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
+	for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) {
 		ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
 		pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
 			irec->br_state == XFS_EXT_UNWRITTEN &&
@@ -548,6 +572,7 @@ process_bmbt_reclist_int(
 	int			check_dups,
 	int			whichfork)
 {
+	xfs_bmbt_irec_t		prev;
 	xfs_bmbt_irec_t		irec;
 	xfs_dfilblks_t		cp = 0;	/* prev count */
 	xfs_dfsbno_t		sp = 0;	/* prev start */
@@ -574,6 +599,11 @@ process_bmbt_reclist_int(
 	else
 		ftype = _("regular");
 
+	prev.br_startoff = 0;
+	prev.br_blockcount = 0;
+	prev.br_startblock = 0;
+	prev.br_state = 0;
+
 	for (i = 0; i < *numrecs; i++) {
 		libxfs_bmbt_disk_get_all(rp + i, &irec);
 		if (i == 0)
@@ -610,12 +640,13 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64
 		 * realtime bitmaps don't use AG locks, so returning
 		 * immediately is fine for this code path.
 		 */
-		if (process_rt_rec(mp, &irec, ino, tot, check_dups))
+		if (process_rt_rec(mp, &prev, &irec, ino, tot, check_dups))
 			return 1;
 		/*
 		 * skip rest of loop processing since that'irec.br_startblock
 		 * all for regular file forks and attr forks
 		 */
+		memcpy(&prev, &irec, sizeof(xfs_bmbt_irec_t));
 		continue;
 	}
-- 
1.7.9.5
* Re: xfs_repair deleting realtime files.
  2012-09-29 22:49             ` Anand Tiwari
@ 2012-10-02  0:59               ` Dave Chinner
  2012-10-02 16:47                 ` Anand Tiwari
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2012-10-02 0:59 UTC (permalink / raw)
To: Anand Tiwari; +Cc: Eric Sandeen, xfs

On Sat, Sep 29, 2012 at 04:49:03PM -0600, Anand Tiwari wrote:
> Subject: [PATCH] xfs_repair: detect realtime extent shared by multiple
>  records in extent map
>
> XFS currently can have records in extent map, which starts from unaligned
> start block w.r.t rextsize. xfs_repair considers this as a bug (multiple
> claims for a real-time extent) and deletes the file. This patch addresses
> the issue, by comparing current and previous records and make sure they
> are contiguous and not overlapped.
>
> Signed-off-by: Anand Tiwari <anand.tiwari@linux.com>
> ---
>  repair/dinode.c | 37 ++++++++++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/repair/dinode.c b/repair/dinode.c
> index 5a2da39..5537f1c 100644
> --- a/repair/dinode.c
> +++ b/repair/dinode.c
> @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
>  static int
>  process_rt_rec(
>  	xfs_mount_t	*mp,
> +	xfs_bmbt_irec_t	*prev,
>  	xfs_bmbt_irec_t	*irec,
>  	xfs_ino_t	ino,
>  	xfs_drfsbno_t	*tot,
> @@ -413,8 +414,11 @@ process_rt_rec(
>  {
>  	xfs_dfsbno_t	b;
>  	xfs_drtbno_t	ext;
> +	xfs_drtbno_t	start_block;
> +	xfs_filblks_t	block_count;
>  	int		state;
>  	int		pwe;		/* partially-written extent */
> +	int		rtext_remainder; /* start block is not aligned w.r.t rextsize */
>
>  	/*
>  	 * check numeric validity of the extent
> @@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"),
>  		return 1;
>  	}
>
> +	/* If we have start of record unaligned w.r.t to rextsize, see
> +	 * if we are sharing this realtime extent with previous record. sharing is only
> +	 * allowed with previous extent. fail otherwise.
> +	 * Also, we above condition is true, align start block and block count
> +	 */

Comment format is:

	/*
	 * this is a
	 * multiline comment
	 */

> +	rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
> +	if (rtext_remainder) {
> +		do_warn(
> +_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"),

unaligned

> +		ino,
> +		irec->br_startblock);

indent these a bit so the fact they are part of the do_warn function
call is obvious. i.e.

	do_warn(
<format string>
			ino, irec->br_startblock);

> +		if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
> +		    (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
> +			start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder);
> +			block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder);
> +		}
> +	} else {
> +		start_block = irec->br_startblock;
> +		block_count = irec->br_blockcount;
> +	}

So this just changes the loop below not to check the first part of
the new extent and therefore avoid the duplicate extent usage
detection?

Any plans to correct the bmapbt extents so that repeated repair runs
don't keep warning about the same problem?

> +
>  	/*
>  	 * set the appropriate number of extents
>  	 * this iterates block by block, this can be optimised using extents
>  	 */
> -	for (b = irec->br_startblock; b < irec->br_startblock +
> -			irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
> +	for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) {
>  		ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
>  		pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
>  			irec->br_state == XFS_EXT_UNWRITTEN &&
> @@ -548,6 +572,7 @@ process_bmbt_reclist_int(
>  	int			check_dups,
>  	int			whichfork)
>  {
> +	xfs_bmbt_irec_t		prev;
>  	xfs_bmbt_irec_t		irec;
>  	xfs_dfilblks_t		cp = 0;	/* prev count */
>  	xfs_dfsbno_t		sp = 0;	/* prev start */
> @@ -574,6 +599,11 @@ process_bmbt_reclist_int(
>  	else
>  		ftype = _("regular");
>
> +	prev.br_startoff = 0;
> +	prev.br_blockcount = 0;
> +	prev.br_startblock = 0;
> +	prev.br_state = 0;

	memset(&prev, 0, sizeof(prev));

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
* Re: xfs_repair deleting realtime files.
  2012-10-02  0:59               ` Dave Chinner
@ 2012-10-02 16:47                 ` Anand Tiwari
  0 siblings, 0 replies; 16+ messages in thread
From: Anand Tiwari @ 2012-10-02 16:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: Eric Sandeen, Anand Tiwari, xfs

On Tue, 2 Oct 2012, Dave Chinner wrote:

> On Sat, Sep 29, 2012 at 04:49:03PM -0600, Anand Tiwari wrote:
> > Subject: [PATCH] xfs_repair: detect realtime extent shared by multiple
> >  records in extent map
> >
> > XFS currently can have records in extent map, which starts from unaligned
> > start block w.r.t rextsize. xfs_repair considers this as a bug (multiple
> > claims for a real-time extent) and deletes the file. This patch addresses
> > the issue, by comparing current and previous records and make sure they
> > are contiguous and not overlapped.
> >
> > Signed-off-by: Anand Tiwari <anand.tiwari@linux.com>
> > ---
> >  repair/dinode.c | 37 ++++++++++++++++++++++++++++++++++---
> >  1 file changed, 34 insertions(+), 3 deletions(-)
> >
> > diff --git a/repair/dinode.c b/repair/dinode.c
> > index 5a2da39..5537f1c 100644
> > --- a/repair/dinode.c
> > +++ b/repair/dinode.c
> > @@ -406,6 +406,7 @@ verify_agbno(xfs_mount_t *mp,
> >  static int
> >  process_rt_rec(
> >  	xfs_mount_t	*mp,
> > +	xfs_bmbt_irec_t	*prev,
> >  	xfs_bmbt_irec_t	*irec,
> >  	xfs_ino_t	ino,
> >  	xfs_drfsbno_t	*tot,
> > @@ -413,8 +414,11 @@ process_rt_rec(
> >  {
> >  	xfs_dfsbno_t	b;
> >  	xfs_drtbno_t	ext;
> > +	xfs_drtbno_t	start_block;
> > +	xfs_filblks_t	block_count;
> >  	int		state;
> >  	int		pwe;		/* partially-written extent */
> > +	int		rtext_remainder; /* start block is not aligned w.r.t rextsize */
> >
> >  	/*
> >  	 * check numeric validity of the extent
> > @@ -461,12 +465,32 @@ _("malformed rt inode extent [%" PRIu64 " %" PRIu64 "] (fs rtext size = %u)\n"),
> >  		return 1;
> >  	}
> >
> > +	/* If we have start of record unaligned w.r.t to rextsize, see
> > +	 * if we are sharing this realtime extent with previous record. sharing is only
> > +	 * allowed with previous extent. fail otherwise.
> > +	 * Also, we above condition is true, align start block and block count
> > +	 */
>
> Comment format is:
>
> 	/*
> 	 * this is a
> 	 * multiline comment
> 	 */
>
> > +	rtext_remainder = irec->br_startblock % mp->m_sb.sb_rextsize;
> > +	if (rtext_remainder) {
> > +		do_warn(
> > +_("data fork in rt ino %" PRIu64 " has unalinged start block %"PRIu64 "\n"),
>
> unaligned
>
> > +		ino,
> > +		irec->br_startblock);
>
> indent these a bit so the fact they are part of the do_warn function
> call is obvious. i.e.
>
> 	do_warn(
> <format string>
> 			ino, irec->br_startblock);
>
> > +		if ((prev->br_startoff + prev->br_blockcount == irec->br_startoff) &&
> > +		    (prev->br_startblock + prev->br_blockcount == irec->br_startblock)) {
> > +			start_block = irec->br_startblock + (mp->m_sb.sb_rextsize - rtext_remainder);
> > +			block_count = irec->br_blockcount - (mp->m_sb.sb_rextsize - rtext_remainder);
> > +		}
> > +	} else {
> > +		start_block = irec->br_startblock;
> > +		block_count = irec->br_blockcount;
> > +	}
>
> So this just changes the loop below not to check the first part of
> the new extent and therefore avoid the duplicate extent usage
> detection?
>
> Any plans to correct the bmapbt extents so that repeated repair runs
> don't keep warning about the same problem?

I was planning on fixing the kernel code first and tackling this later.
Let me know if you guys think otherwise.

> > +
> >  	/*
> >  	 * set the appropriate number of extents
> >  	 * this iterates block by block, this can be optimised using extents
> >  	 */
> > -	for (b = irec->br_startblock; b < irec->br_startblock +
> > -			irec->br_blockcount; b += mp->m_sb.sb_rextsize) {
> > +	for (b = start_block; b < start_block + block_count; b += mp->m_sb.sb_rextsize) {
> >  		ext = (xfs_drtbno_t) b / mp->m_sb.sb_rextsize;
> >  		pwe = xfs_sb_version_hasextflgbit(&mp->m_sb) &&
> >  			irec->br_state == XFS_EXT_UNWRITTEN &&
> > @@ -548,6 +572,7 @@ process_bmbt_reclist_int(
> >  	int			check_dups,
> >  	int			whichfork)
> >  {
> > +	xfs_bmbt_irec_t		prev;
> >  	xfs_bmbt_irec_t		irec;
> >  	xfs_dfilblks_t		cp = 0;	/* prev count */
> >  	xfs_dfsbno_t		sp = 0;	/* prev start */
> > @@ -574,6 +599,11 @@ process_bmbt_reclist_int(
> >  	else
> >  		ftype = _("regular");
> >
> > +	prev.br_startoff = 0;
> > +	prev.br_blockcount = 0;
> > +	prev.br_startblock = 0;
> > +	prev.br_state = 0;
>
> 	memset(&prev, 0, sizeof(prev));

Which one is preferred in XFS land:

a) xfs_bmbt_irec_t prev = {0};
b) memset(&prev, 0, sizeof(prev));

my vote is "a". I have already addressed the other suggestions.

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com

thanks a bunch,
anand
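On the `= {0}` versus `memset()` question raised above: both zero every member of the struct; the aggregate initializer only works at the point of definition, while `memset()` can also re-zero the variable later. A minimal sketch, using a hypothetical stand-in struct rather than the real `xfs_bmbt_irec_t`:

```c
#include <assert.h>
#include <string.h>

/* Illustrative stand-in for xfs_bmbt_irec_t; not the real definition. */
typedef struct {
	unsigned long br_startoff;
	unsigned long br_startblock;
	unsigned long br_blockcount;
	int           br_state;
} irec_t;

/* Option (a): aggregate initializer zeroes every member at definition. */
irec_t zero_by_initializer(void)
{
	irec_t prev = { 0 };
	return prev;
}

/* Option (b): memset gives the same all-zero members and can also be
 * used to re-zero the struct in the middle of a function. */
irec_t zero_by_memset(void)
{
	irec_t prev;
	memset(&prev, 0, sizeof(prev));
	return prev;
}
```

Either form is correct here; which spelling a given codebase prefers is a style decision for its maintainers.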
end of thread, other threads: [~2012-10-02 18:35 UTC | newest]

Thread overview: 16+ messages
2012-09-21  0:40 xfs_repair deleting realtime files Anand Tiwari
2012-09-21  5:00 ` Eric Sandeen
2012-09-21 15:51   ` Anand Tiwari
2012-09-21 16:07     ` Eric Sandeen
2012-09-21 16:40       ` Anand Tiwari
2012-09-24  7:55         ` Dave Chinner
2012-09-24 12:51           ` Anand Tiwari
2012-09-26  1:26           ` Anand Tiwari
2012-09-26  2:44             ` Dave Chinner
2012-09-26  3:45               ` Anand Tiwari
2012-09-26  6:17                 ` Dave Chinner
2012-09-28  1:27                   ` Anand Tiwari
2012-09-28  6:47                     ` Dave Chinner
2012-09-29 22:49                       ` Anand Tiwari
2012-10-02  0:59                         ` Dave Chinner
2012-10-02 16:47                           ` Anand Tiwari