Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
       [not found] <mailman.0.1255458988.141519.xfs@oss.sgi.com>
@ 2009-10-13 19:26 ` Andy Poling
  2009-10-13 23:33   ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Poling @ 2009-10-13 19:26 UTC (permalink / raw)
  To: xfs; +Cc: John Quigley

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1713 bytes --]

At his request, I followed up on John Quigley's problem, and I believe I found
a bug in xlog_do_recovery_pass() in handling of a wrapped journal record.  A
patch against 2.6.28.10 is attached.  It looks like the patch will apply
cleanly against any recent version.

Summary of problem:

If a journal record wraps at the physical end of the journal, it has to be
read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
read at the physical beginning.  If xlog_bread() has to re-align the first
read, the second read request does not take that re-alignment into account.
If the first read was re-aligned, the second read over-writes the end of the
data from the first read, effectively corrupting it.  This can happen either
when reading the record header or reading the record data.

The first sanity check in xlog_recover_process_data() is to check for a valid
clientid, so that is the error reported.

Summary of fix:

If there was a first read at the physical end, XFS_BUF_PTR() returns where the
data was requested to begin.  Conversely, because it is the result of
xlog_align(), offset indicates where the requested data for the first read
actually begins - whether or not xlog_bread() has re-aligned it.

Using offset as the base for the calculation of where to place the second read
data ensures that it will be correctly placed immediately following the data
from the first read instead of sometimes over-writing the end of it.

The attached patch has resolved the reported problem of occasional inability
to recover the journal (reporting "bad clientid").

-Andy

It ain't what you don't know that gets you into trouble.
It's what you know for sure that just ain't so. - Mark Twain

[-- Attachment #2: xfs_log_recover.pat --]
[-- Type: IMAGE/X-CORELDRAWPATTERN, Size: 809 bytes --]

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
  2009-10-13 19:26 ` Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover) Andy Poling
@ 2009-10-13 23:33   ` Christoph Hellwig
  2009-10-14  0:29     ` Andy Poling
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2009-10-13 23:33 UTC (permalink / raw)
  To: Andy Poling; +Cc: John Quigley, xfs

Thanks Andy,

after starting over the code for over half an hour I think both
your analysis and the patch are correct.

-				bufaddr = XFS_BUF_PTR(hbp);
+				if (offset)
+					bufaddr = offset;
+				else
+					bufaddr = XFS_BUF_PTR(hbp);

However the way this is written is a bit confusing.  Probably less
confusing than some of the surrounding code,  but it adds up.

We calculate the offset for one case above, and then after we're done
with the reading that never touches offset we calculate it if we don't
have it.

So this at very least needs a big comment describing it, or even better
moving up the xlog_align call so that we can simply always use
offset directly, which could also get rid of the now superflous bufaddr
variable by always using offset.

Exactly the same also applies for the second use.

Anyway, thanks a lot again and if you can fix this up I'd welcome it,
otherwise I'll do it once I get some time.

I suspect we might even be able to come up with a small enough testcase
for this for xfsqa, we just need to make sure logs are aligned and then
find a good enough heuristic to reproduce log wrapping.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
  2009-10-13 23:33   ` Christoph Hellwig
@ 2009-10-14  0:29     ` Andy Poling
  2009-10-14 13:20       ` Christoph Hellwig
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Poling @ 2009-10-14  0:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: John Quigley, xfs

On Tue, 13 Oct 2009, Christoph Hellwig wrote:
> However the way this is written is a bit confusing.  Probably less
> confusing than some of the surrounding code,  but it adds up.
>
> We calculate the offset for one case above, and then after we're done
> with the reading that never touches offset we calculate it if we don't
> have it.
>
> So this at very least needs a big comment describing it, or even better
> moving up the xlog_align call so that we can simply always use
> offset directly, which could also get rid of the now superflous bufaddr
> variable by always using offset.
>
> Exactly the same also applies for the second use.
>
> Anyway, thanks a lot again and if you can fix this up I'd welcome it,
> otherwise I'll do it once I get some time.

Agreed.  I aimed for the smallest possible patch, thinking it might increase
the likelihood of acceptance.  :-)

I think the complexity here stems from an uncertainty (as we prepare for the
"second" read) whether there was a first read or not.  As the code reads
today, if there is data before the end, the first read is done and offset has
been set.  If not, offset is NULL.

It seems like the more elegant approach would be to set offset before the
first read, and then update it if the first read takes place (in case it was
unaligned).  That also gets rid of bufaddr, and seems like it might read
better.

I'll try that and see how it goes.

> I suspect we might even be able to come up with a small enough testcase
> for this for xfsqa, we just need to make sure logs are aligned and then
> find a good enough heuristic to reproduce log wrapping.

I don't know if it would be appropriate in that context, but maybe a tool to
process an unwrapped log and wrap it would be easier?

-Andy

It ain't what you don't know that gets you into trouble.
It's what you know for sure that just ain't so. - Mark Twain

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
  2009-10-14  0:29     ` Andy Poling
@ 2009-10-14 13:20       ` Christoph Hellwig
  2009-10-14 16:43         ` Andy Poling
  0 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2009-10-14 13:20 UTC (permalink / raw)
  To: Andy Poling; +Cc: Christoph Hellwig, John Quigley, xfs

On Tue, Oct 13, 2009 at 07:29:00PM -0500, Andy Poling wrote:
> Agreed.  I aimed for the smallest possible patch, thinking it might increase
> the likelihood of acceptance.  :-)

Heh.  In the Linux world and especially for this project we don't have
any problem with large fixes as long as they are well-explained and
self-contained, epecially if they are as well-documented as yours.

> I think the complexity here stems from an uncertainty (as we prepare for the
> "second" read) whether there was a first read or not.  As the code reads
> today, if there is data before the end, the first read is done and offset has
> been set.  If not, offset is NULL.
>
> It seems like the more elegant approach would be to set offset before the
> first read, and then update it if the first read takes place (in case it was
> unaligned).  That also gets rid of bufaddr, and seems like it might read
> better.

Yeah.  Note that in Linux 2.6.29 I did some changes in that are to
take the read and offset calculation into a common helper, so if the
changes become larger we might see some cosmetic difference between
older and newer kernels.

>> I suspect we might even be able to come up with a small enough testcase
>> for this for xfsqa, we just need to make sure logs are aligned and then
>> find a good enough heuristic to reproduce log wrapping.
>
> I don't know if it would be appropriate in that context, but maybe a tool to
> process an unwrapped log and wrap it would be easier?

We have a tool called loggen to produce log traffic as part of the QA
test suite.  We could try to use it to reproduce this case.  The most
important bit is that we work on a filesystem that actually requires
the alignment bits, that is one using a larger log sector size.  Just
curious, what is the value of sb_logsectlog for your test fs?

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
  2009-10-14 13:20       ` Christoph Hellwig
@ 2009-10-14 16:43         ` Andy Poling
  2009-10-16  0:36           ` Andy Poling
  0 siblings, 1 reply; 11+ messages in thread
From: Andy Poling @ 2009-10-14 16:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: John Quigley, xfs

On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>> I think the complexity here stems from an uncertainty (as we prepare for the
>> "second" read) whether there was a first read or not.  As the code reads
>> today, if there is data before the end, the first read is done and offset has
>> been set.  If not, offset is NULL.
>>
>> It seems like the more elegant approach would be to set offset before the
>> first read, and then update it if the first read takes place (in case it was
>> unaligned).  That also gets rid of bufaddr, and seems like it might read
>> better.
>
> Yeah.  Note that in Linux 2.6.29 I did some changes in that are to
> take the read and offset calculation into a common helper, so if the
> changes become larger we might see some cosmetic difference between
> older and newer kernels.

I'll definitely take a look at that.  When I looked into doing what I had
previously described above, I realized it doesn't end up looking any clearer
than the original code.  If there is no first read, the second read could
potentially be re-aligned by xlog_bread(), which means we then need to use
xlog_align() to determine the real beginning of the data.

Given that, I'm beginning to think the existing code that only sets offset
when a read is done is pretty good.  I have one more idea to clean it up a bit
that I'll try.

Additionally, I have not been able to conclusively convince myself that with a
wrapped record, after a first read, the second read will never need to be
re-aligned... which would be problematic.  Is the beginning of the log always
correctly aligned?  If so, I will stop worrying about it.

> We have a tool called loggen to produce log traffic as part of the QA
> test suite.  We could try to use it to reproduce this case.  The most
> important bit is that we work on a filesystem that actually requires
> the alignment bits, that is one using a larger log sector size.  Just
> curious, what is the value of sb_logsectlog for your test fs?

If I'm interpreting this correctly, xfs_info says it is 4KiB:

meta-data=/dev/loop0             isize=256    agcount=8, agsize=65536 blks
          =                       sectsz=4096  attr=1
data     =                       bsize=4096   blocks=524288, imaxpct=25
          =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=2560, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

-Andy

It ain't what you don't know that gets you into trouble.
It's what you know for sure that just ain't so. - Mark Twain

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
  2009-10-14 16:43         ` Andy Poling
@ 2009-10-16  0:36           ` Andy Poling
  2009-10-16  6:00             ` Christoph Hellwig
  2009-11-03 17:26             ` [PATCH] xfs: Wrapped journal record corruption on read at recovery Alex Elder
  0 siblings, 2 replies; 11+ messages in thread
From: Andy Poling @ 2009-10-16  0:36 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: John Quigley, xfs

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1220 bytes --]

On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>> It seems like the more elegant approach would be to set offset before the
>> first read, and then update it if the first read takes place (in case it 
>> was
>> unaligned).  That also gets rid of bufaddr, and seems like it might read
>> better.
> 
> Yeah.  Note that in Linux 2.6.29 I did some changes in that are to
> take the read and offset calculation into a common helper

Looking at the improved code since 2.6.29, and assuming that the beginning of
the log will always be aligned, I have come up with the attached patch against
2.6.31.4.  I think it's cleaner, and it passes my tests.

The bufaddr variable is no longer needed.  I removed the xlog_align() after
the second read for both headers and data since the second read will always be
aligned (even if there was no first read) since it is always at the beginning
of the log.

It sets offset to the beginning of the buffer with XFS_BUF_PTR() before the
first read, and then your improved xlog_bread() will update the offset if the
first read had to be re-aligned.

What do you think?

-Andy

It ain't what you don't know that gets you into trouble.
It's what you know for sure that just ain't so. - Mark Twain

[-- Attachment #2: xfs_log_recover.pat --]
[-- Type: IMAGE/x-coreldrawpattern, Size: 2297 bytes --]

[-- Attachment #3: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover)
  2009-10-16  0:36           ` Andy Poling
@ 2009-10-16  6:00             ` Christoph Hellwig
  2009-11-03 17:26             ` [PATCH] xfs: Wrapped journal record corruption on read at recovery Alex Elder
  1 sibling, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2009-10-16  6:00 UTC (permalink / raw)
  To: Andy Poling; +Cc: Christoph Hellwig, John Quigley, xfs

On Thu, Oct 15, 2009 at 07:36:33PM -0500, Andy Poling wrote:
> Looking at the improved code since 2.6.29, and assuming that the beginning of
> the log will always be aligned, I have come up with the attached patch against
> 2.6.31.4.  I think it's cleaner, and it passes my tests.
>
> The bufaddr variable is no longer needed.  I removed the xlog_align() after
> the second read for both headers and data since the second read will always be
> aligned (even if there was no first read) since it is always at the beginning
> of the log.
>
> It sets offset to the beginning of the buffer with XFS_BUF_PTR() before the
> first read, and then your improved xlog_bread() will update the offset if the
> first read had to be re-aligned.
>
> What do you think?

The patch looks good to me from a quick glance.  I still need to read
through the code in that area a bit more, and I really want to write a QA
testcase to reproduce it.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] xfs: Wrapped journal record corruption on read at recovery
  2009-10-16  0:36           ` Andy Poling
  2009-10-16  6:00             ` Christoph Hellwig
@ 2009-11-03 17:26             ` Alex Elder
  2009-11-03 19:15               ` Alex Elder
  2009-11-09 18:06               ` Alex Elder
  1 sibling, 2 replies; 11+ messages in thread
From: Alex Elder @ 2009-11-03 17:26 UTC (permalink / raw)
  To: xfs; +Cc: John Quigley, Andy Poling

Andy Poling wrote:
> On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>>> It seems like the more elegant approach would be to set offset before the
>>> first read, and then update it if the first read takes place (in case it was
>>> unaligned).  That also gets rid of bufaddr, and seems like it might read
>>> better.
>> 
>> Yeah.  Note that in Linux 2.6.29 I did some changes in that are to
>> take the read and offset calculation into a common helper
> 
> Looking at the improved code since 2.6.29, and assuming that the beginning of
> the log will always be aligned, I have come up with the attached patch against
> 2.6.31.4.  I think it's cleaner, and it passes my tests.
> 
> The bufaddr variable is no longer needed.  I removed the xlog_align() after
> the second read for both headers and data since the second read will always be
> aligned (even if there was no first read) since it is always at the beginning
> of the log.
> 
> It sets offset to the beginning of the buffer with XFS_BUF_PTR() before the
> first read, and then your improved xlog_bread() will update the offset if the
> first read had to be re-aligned.
> 
> What do you think?
> 
> -Andy

I am re-submitting Andy's patch using the conventions that
(hopefully) Patchwork understands to facilitate tracking.

					-Alex

Author: Andy Poling <andy@realbig.com>

Summary of problem:

If a journal record wraps at the physical end of the journal, it has to be
read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
read at the physical beginning.  If xlog_bread() has to re-align the first
read, the second read request does not take that re-alignment into account.
If the first read was re-aligned, the second read over-writes the end of the
data from the first read, effectively corrupting it.  This can happen either
when reading the record header or reading the record data.

The first sanity check in xlog_recover_process_data() is to check for a valid
clientid, so that is the error reported.


Summary of fix:

If there was a first read at the physical end, XFS_BUF_PTR() returns where the
data was requested to begin.  Conversely, because it is the result of
xlog_align(), offset indicates where the requested data for the first read
actually begins - whether or not xlog_bread() has re-aligned it.

Using offset as the base for the calculation of where to place the second read
data ensures that it will be correctly placed immediately following the data
from the first read instead of sometimes over-writing the end of it.

The attached patch has resolved the reported problem of occasional inability
to recover the journal (reporting "bad clientid").


--- fs/xfs/xfs_log_recover.c.orig	2009-10-14 23:39:49.753494876 -0500
+++ fs/xfs/xfs_log_recover.c	2009-10-15 18:26:23.790887700 -0500
@@ -3517,7 +3517,7 @@
 {
 	xlog_rec_header_t	*rhead;
 	xfs_daddr_t		blk_no;
-	xfs_caddr_t		bufaddr, offset;
+	xfs_caddr_t		offset;
 	xfs_buf_t		*hbp, *dbp;
 	int			error = 0, h_size;
 	int			bblks, split_bblks;
@@ -3610,7 +3610,7 @@
 			/*
 			 * Check for header wrapping around physical end-of-log
 			 */
-			offset = NULL;
+			offset = XFS_BUF_PTR(hbp);
 			split_hblks = 0;
 			wrapped_hblks = 0;
 			if (blk_no + hblks <= log->l_logBBsize) {
@@ -3646,9 +3646,8 @@
 				 *   - order is important.
 				 */
 				wrapped_hblks = hblks - split_hblks;
-				bufaddr = XFS_BUF_PTR(hbp);
 				error = XFS_BUF_SET_PTR(hbp,
-						bufaddr + BBTOB(split_hblks),
+						offset + BBTOB(split_hblks),
 						BBTOB(hblks - split_hblks));
 				if (error)
 					goto bread_err2;
@@ -3658,14 +3657,10 @@
 				if (error)
 					goto bread_err2;
 
-				error = XFS_BUF_SET_PTR(hbp, bufaddr,
+				error = XFS_BUF_SET_PTR(hbp, offset,
 							BBTOB(hblks));
 				if (error)
 					goto bread_err2;
-
-				if (!offset)
-					offset = xlog_align(log, 0,
-							wrapped_hblks, hbp);
 			}
 			rhead = (xlog_rec_header_t *)offset;
 			error = xlog_valid_rec_header(log, rhead,
@@ -3685,7 +3680,7 @@
 			} else {
 				/* This log record is split across the
 				 * physical end of log */
-				offset = NULL;
+				offset = XFS_BUF_PTR(dbp);
 				split_bblks = 0;
 				if (blk_no != log->l_logBBsize) {
 					/* some data is before the physical
@@ -3714,9 +3709,8 @@
 				 *   _first_, then the log start (LR header end)
 				 *   - order is important.
 				 */
-				bufaddr = XFS_BUF_PTR(dbp);
 				error = XFS_BUF_SET_PTR(dbp,
-						bufaddr + BBTOB(split_bblks),
+						offset + BBTOB(split_bblks),
 						BBTOB(bblks - split_bblks));
 				if (error)
 					goto bread_err2;
@@ -3727,13 +3721,9 @@
 				if (error)
 					goto bread_err2;
 
-				error = XFS_BUF_SET_PTR(dbp, bufaddr, h_size);
+				error = XFS_BUF_SET_PTR(dbp, offset, h_size);
 				if (error)
 					goto bread_err2;
-
-				if (!offset)
-					offset = xlog_align(log, wrapped_hblks,
-						bblks - split_bblks, dbp);
 			}
 			xlog_unpack_data(rhead, offset, log);
 			if ((error = xlog_recover_process_data(log, rhash,

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] xfs: Wrapped journal record corruption on read at recovery
  2009-11-03 17:26             ` [PATCH] xfs: Wrapped journal record corruption on read at recovery Alex Elder
@ 2009-11-03 19:15               ` Alex Elder
  2009-11-09 18:04                 ` Alex Elder
  2009-11-09 18:06               ` Alex Elder
  1 sibling, 1 reply; 11+ messages in thread
From: Alex Elder @ 2009-11-03 19:15 UTC (permalink / raw)
  To: Andy Poling; +Cc: John Quigley, xfs

Alex Elder wrote:
> Andy Poling wrote:
>> On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>>>> It seems like the more elegant approach would be to set offset before the
>>>> first read, and then update it if the first read takes place (in case it was
>>>> unaligned).  That also gets rid of bufaddr, and seems like it might read
...


Andy, can you tell me the log sector
size of a file system that exhibits
this failure condition?

You can just send the output of:
    xfs_info /dev/<whatever>
if you like.  I have a suspicion
that there may still be a problem,
even with your proposed fix, and
would like to rule that possibility
out.  Basically, if log sectsz is
is > 1024 your fix is probably OK.

Thanks.

					-Alex

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] xfs: Wrapped journal record corruption on read at recovery
  2009-11-03 19:15               ` Alex Elder
@ 2009-11-09 18:04                 ` Alex Elder
  0 siblings, 0 replies; 11+ messages in thread
From: Alex Elder @ 2009-11-09 18:04 UTC (permalink / raw)
  To: Andy Poling; +Cc: John Quigley, xfs

Alex Elder wrote:
> Alex Elder wrote:
>> Andy Poling wrote:
>>> On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>>>>> It seems like the more elegant approach would be to set offset before the
>>>>> first read, and then update it if the first read takes place (in case it was
>>>>> unaligned).  That also gets rid of bufaddr, and seems like it might read ...
> 
> 
> Andy, can you tell me the log sector
> size of a file system that exhibits
> this failure condition?
> 
> You can just send the output of:
>     xfs_info /dev/<whatever>
> if you like.  I have a suspicion
> that there may still be a problem,
> even with your proposed fix, and
> would like to rule that possibility
> out.  Basically, if log sectsz is
> is > 1024 your fix is probably OK.


Nevermind, I think all is well with your patch.  I'll respond to
your other posting with my positive review shortly.

					-Alex

> Thanks.
> 
> 					-Alex
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH] xfs: Wrapped journal record corruption on read at recovery
  2009-11-03 17:26             ` [PATCH] xfs: Wrapped journal record corruption on read at recovery Alex Elder
  2009-11-03 19:15               ` Alex Elder
@ 2009-11-09 18:06               ` Alex Elder
  1 sibling, 0 replies; 11+ messages in thread
From: Alex Elder @ 2009-11-09 18:06 UTC (permalink / raw)
  To: Andy Poling; +Cc: John Quigley, xfs

Alex Elder wrote:
> Andy Poling wrote:
>> On Wed, 14 Oct 2009, Christoph Hellwig wrote:
>>>> It seems like the more elegant approach would be to set offset before the
>>>> first read, and then update it if the first read takes place (in case it was
>>>> unaligned).  That also gets rid of bufaddr, and seems like it might read
>>>> better.
>>> 
>>> Yeah.  Note that in Linux 2.6.29 I did some changes in that are to
>>> take the read and offset calculation into a common helper
>> 
>> Looking at the improved code since 2.6.29, and assuming that the beginning of
>> the log will always be aligned, I have come up with the attached patch against
>> 2.6.31.4.  I think it's cleaner, and it passes my tests.
>> 
>> The bufaddr variable is no longer needed.  I removed the xlog_align() after
>> the second read for both headers and data since the second read will always be
>> aligned (even if there was no first read) since it is always at the beginning of the log.
>> 
>> It sets offset to the beginning of the buffer with XFS_BUF_PTR() before the
>> first read, and then your improved xlog_bread() will update the offset if the
>> first read had to be re-aligned.
>> 
>> What do you think?

The patch looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>


>> -Andy
> 
> I am re-submitting Andy's patch using the conventions that
> (hopefully) Patchwork understands to facilitate tracking.
> 
> 					-Alex
> 
> Author: Andy Poling <andy@realbig.com>
> 
> Summary of problem:
> 
> If a journal record wraps at the physical end of the journal, it has to be
> read in two parts in xlog_do_recovery_pass(): a read at the physical end and a
> read at the physical beginning.  If xlog_bread() has to re-align the first
> read, the second read request does not take that re-alignment into account.
> If the first read was re-aligned, the second read over-writes the end of the
> data from the first read, effectively corrupting it.  This can happen either
> when reading the record header or reading the record data.
> 
> The first sanity check in xlog_recover_process_data() is to check for a valid
> clientid, so that is the error reported.
> 
> 
> Summary of fix:
> 
> If there was a first read at the physical end, XFS_BUF_PTR() returns where the
> data was requested to begin.  Conversely, because it is the result of
> xlog_align(), offset indicates where the requested data for the first read
> actually begins - whether or not xlog_bread() has re-aligned it.
> 
> Using offset as the base for the calculation of where to place the second read
> data ensures that it will be correctly placed immediately following the data
> from the first read instead of sometimes over-writing the end of it.
> 
> The attached patch has resolved the reported problem of occasional inability
> to recover the journal (reporting "bad clientid").
> 
> 
> --- fs/xfs/xfs_log_recover.c.orig	2009-10-14 23:39:49.753494876 -0500
> +++ fs/xfs/xfs_log_recover.c	2009-10-15 18:26:23.790887700 -0500
> @@ -3517,7 +3517,7 @@
>  {
>  	xlog_rec_header_t	*rhead;
>  	xfs_daddr_t		blk_no;
> -	xfs_caddr_t		bufaddr, offset;
> +	xfs_caddr_t		offset;
>  	xfs_buf_t		*hbp, *dbp;
>  	int			error = 0, h_size;
>  	int			bblks, split_bblks;
> @@ -3610,7 +3610,7 @@
>  			/*
>  			 * Check for header wrapping around physical end-of-log
>  			 */
> -			offset = NULL;
> +			offset = XFS_BUF_PTR(hbp);
>  			split_hblks = 0;
>  			wrapped_hblks = 0;
>  			if (blk_no + hblks <= log->l_logBBsize) {
> @@ -3646,9 +3646,8 @@
>  				 *   - order is important.
>  				 */
>  				wrapped_hblks = hblks - split_hblks;
> -				bufaddr = XFS_BUF_PTR(hbp);
>  				error = XFS_BUF_SET_PTR(hbp,
> -						bufaddr + BBTOB(split_hblks),
> +						offset + BBTOB(split_hblks),
>  						BBTOB(hblks - split_hblks));
>  				if (error)
>  					goto bread_err2;
> @@ -3658,14 +3657,10 @@
>  				if (error)
>  					goto bread_err2;
> 
> -				error = XFS_BUF_SET_PTR(hbp, bufaddr,
> +				error = XFS_BUF_SET_PTR(hbp, offset,
>  							BBTOB(hblks));
>  				if (error)
>  					goto bread_err2;
> -
> -				if (!offset)
> -					offset = xlog_align(log, 0,
> -							wrapped_hblks, hbp);
>  			}
>  			rhead = (xlog_rec_header_t *)offset;
>  			error = xlog_valid_rec_header(log, rhead,
> @@ -3685,7 +3680,7 @@
>  			} else {
>  				/* This log record is split across the
>  				 * physical end of log */
> -				offset = NULL;
> +				offset = XFS_BUF_PTR(dbp);
>  				split_bblks = 0;
>  				if (blk_no != log->l_logBBsize) {
>  					/* some data is before the physical
> @@ -3714,9 +3709,8 @@
>  				 *   _first_, then the log start (LR header end)
>  				 *   - order is important.
>  				 */
> -				bufaddr = XFS_BUF_PTR(dbp);
>  				error = XFS_BUF_SET_PTR(dbp,
> -						bufaddr + BBTOB(split_bblks),
> +						offset + BBTOB(split_bblks),
>  						BBTOB(bblks - split_bblks));
>  				if (error)
>  					goto bread_err2;
> @@ -3727,13 +3721,9 @@
>  				if (error)
>  					goto bread_err2;
> 
> -				error = XFS_BUF_SET_PTR(dbp, bufaddr, h_size);
> +				error = XFS_BUF_SET_PTR(dbp, offset, h_size);
>  				if (error)
>  					goto bread_err2;
> -
> -				if (!offset)
> -					offset = xlog_align(log, wrapped_hblks,
> -						bblks - split_bblks, dbp);
>  			}
>  			xlog_unpack_data(rhead, offset, log);
>  			if ((error = xlog_recover_process_data(log, rhash,
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-11-09 18:05 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <mailman.0.1255458988.141519.xfs@oss.sgi.com>
2009-10-13 19:26 ` Wrapped journal record corruption on read at recovery - patch attached (was Re: XFS corruption with failover) Andy Poling
2009-10-13 23:33   ` Christoph Hellwig
2009-10-14  0:29     ` Andy Poling
2009-10-14 13:20       ` Christoph Hellwig
2009-10-14 16:43         ` Andy Poling
2009-10-16  0:36           ` Andy Poling
2009-10-16  6:00             ` Christoph Hellwig
2009-11-03 17:26             ` [PATCH] xfs: Wrapped journal record corruption on read at recovery Alex Elder
2009-11-03 19:15               ` Alex Elder
2009-11-09 18:04                 ` Alex Elder
2009-11-09 18:06               ` Alex Elder

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox