public inbox for linux-xfs@vger.kernel.org
* Re: XFS corruption with failover
       [not found] <990461759.2142271250648177725.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-08-19  2:18 ` Lachlan McIlroy
  2009-08-19 15:46   ` John Quigley
  0 siblings, 1 reply; 8+ messages in thread
From: Lachlan McIlroy @ 2009-08-19  2:18 UTC
  To: John Quigley; +Cc: XFS Development

----- "John Quigley" <jquigley@jquigley.com> wrote:

> Lachlan McIlroy wrote:
> > If that fails too can you run xfs_logprint on /dev/sde and
> > post any errors it reports?
> 
> My apologies for the delayed response; output of logprint can be
> downloaded as a ~4MB bzip:
> 
> http://www.jquigley.com/files/tmp/xfs-failover-logprint.bz2

xfs_logprint doesn't find any problems with this log, but that doesn't mean
the kernel doesn't - they use different implementations to read the log.  I
noticed that the active part of the log wraps around the physical end/start
of the log, which reminds me of this fix:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1afb678ce77b930334a8a640a05b8e68178a377

I remember that without this fix we were seeing ASSERTs in the log recovery
code - unfortunately I don't remember exactly where, but it could be from
the same location where you are getting the "bad clientid" error.  When a log
record wraps the end/start of the physical log we need to do two I/Os to
read the log record in.  This bug caused the second read to go to an
incorrect location in the buffer, which overwrote part of the first I/O and
corrupted the log record.  I think the fix made it into 2.6.24.
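
To illustrate the wrap-around read described above, here is a minimal
Python sketch.  It is not the kernel code: the function name, its
parameters and the `buggy` flag are hypothetical and exist only to show
how a wrong destination offset for the second read clobbers data from
the first.

```python
def read_wrapped_record(log: bytes, start: int, length: int,
                        buggy: bool = False) -> bytes:
    """Read `length` bytes at offset `start` from a circular log.

    Illustrative model only; all names here are assumptions, not
    identifiers from the XFS log recovery code.
    """
    size = len(log)
    buf = bytearray(length)

    # First I/O: from `start` up to the physical end of the log.
    first = min(length, size - start)
    buf[0:first] = log[start:start + first]

    if first < length:
        # Second I/O: the remainder, read from the physical start.
        rest = length - first
        # Correct: place it directly after the first part.
        # Buggy: place it at offset 0, overwriting the first part.
        dst = 0 if buggy else first
        buf[dst:dst + rest] = log[0:rest]

    return bytes(buf)


log = bytes(range(10))
good = read_wrapped_record(log, start=7, length=5)
bad = read_wrapped_record(log, start=7, length=5, buggy=True)
```

With a 10-byte log and a 5-byte record starting at offset 7, the correct
read returns bytes 7, 8, 9, 0, 1, while the buggy variant loses the
start of the record because the second read lands on top of it.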

> 
> Thanks very much for your consideration.
> 
> - John Quigley
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

* Re: XFS corruption with failover
  2009-08-19  2:18 ` XFS corruption with failover Lachlan McIlroy
@ 2009-08-19 15:46   ` John Quigley
  2009-08-23 18:17     ` XFS corruption with power failure John Quigley
  0 siblings, 1 reply; 8+ messages in thread
From: John Quigley @ 2009-08-19 15:46 UTC
  To: XFS Development

Lachlan McIlroy wrote:
> xfs_logprint doesn't find any problems with this log, but that doesn't mean
> the kernel doesn't - they use different implementations to read the log.  I
> noticed that the active part of the log wraps around the physical end/start
> of the log, which reminds me of this fix:

Very interesting indeed, thank you /very/ much for looking at this.

> I think the fix made it into 2.6.24.

We're currently using the very latest 2.6.30, unfortunately.  We've distilled this into a reproducible environment with a stack of NFS + XFS to a local disk + automated sysrq 'b' reboots.  We're working on getting this bundled up into a nice little package as a VirtualBox vm for your consumption.  Please tell me if this is not desirable.

Thanks very much again.

John Quigley
jquigley.com

* Re: XFS corruption with power failure
  2009-08-19 15:46   ` John Quigley
@ 2009-08-23 18:17     ` John Quigley
  2009-08-26 17:54       ` John Quigley
  2009-09-01 19:17       ` John Quigley
  0 siblings, 2 replies; 8+ messages in thread
From: John Quigley @ 2009-08-23 18:17 UTC
  To: XFS Development

John Quigley wrote:
> We've distilled this into a reproducible environment with a stack of NFS + XFS 
> to a local disk + automated sysrq 'b' reboots.  We're working on getting 
> this bundled up into a nice little package as a VirtualBox vm for your 
> consumption.  Please tell me if this is not desirable.

The self-contained and reproducible environment can be downloaded from the following location:

  http://www.jquigley.com/tmp/xfsVM.tar.bz2

That's a ~550 MB (compressed) image that can be 'imported' directly into the latest VirtualBox.  Instructions for setting up the environment (trivial, should take you a mere couple of minutes):

  http://www.jquigley.com/tmp/README.txt

The basic concept here is to use the VM as the file server, accessing the XFS file system thereon through the guest OS which acts as the NFS client.  Automated reboots are effected with cron and sysrq 'b', so you can just set this up and run it until failure.
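
For readers without the image, the reboot trigger can be sketched in a few lines of Python.  This is an assumption about the general technique, not a copy of what the README sets up: writing 'b' to /proc/sysrq-trigger is the standard Linux interface for an immediate reboot without syncing or unmounting, while the function name and the `trigger` parameter are hypothetical.

```python
from pathlib import Path

# Standard Linux sysrq interface; writing to it requires root.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def sysrq_reboot(trigger: Path = SYSRQ_TRIGGER) -> None:
    """Write 'b' to the sysrq trigger file.

    On a real system this reboots immediately, without syncing or
    unmounting file systems, which approximates a power failure.
    The `trigger` parameter is a hypothetical hook so the function
    can be pointed at an ordinary file for a dry run.
    """
    with open(trigger, "w") as f:
        f.write("b")
```

A cron entry on the file server could then call sysrq_reboot() at whatever interval is wanted; the schedule used in the actual image is not reproduced here.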

Hope this helps lend insight - if there are any questions, please ask.  Thanks for your time.

- John Quigley

* Re: XFS corruption with power failure
  2009-08-23 18:17     ` XFS corruption with power failure John Quigley
@ 2009-08-26 17:54       ` John Quigley
  2009-09-01 18:13         ` Christoph Hellwig
  2009-09-01 19:17       ` John Quigley
  1 sibling, 1 reply; 8+ messages in thread
From: John Quigley @ 2009-08-26 17:54 UTC
  To: XFS Development

John Quigley wrote:
> John Quigley wrote:
>> We've distilled this into a reproducible environment with a stack of 
>> NFS + XFS to a local disk + automated sysrq 'b' reboots.  We're 
>> working on getting this bundled up into a nice little package as a 
>> VirtualBox vm for your consumption.  Please tell me if this is not 
>> desirable.
> 
> The self-contained and reproducible environment can be downloaded from 
> the following location:
> 
>  http://www.jquigley.com/tmp/xfsVM.tar.bz2

Has anyone by chance had an opportunity to utilize this?  Any corruption reports?

- John Quigley

* Re: XFS corruption with power failure
  2009-08-26 17:54       ` John Quigley
@ 2009-09-01 18:13         ` Christoph Hellwig
  2009-09-15 15:50           ` John Quigley
  0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2009-09-01 18:13 UTC
  To: John Quigley; +Cc: XFS Development

On Wed, Aug 26, 2009 at 12:54:56PM -0500, John Quigley wrote:
> John Quigley wrote:
>> John Quigley wrote:
>>> We've distilled this into a reproducible environment with a stack of  
>>> NFS + XFS to a local disk + automated sysrq 'b' reboots.  We're  
>>> working on getting this bundled up into a nice little package as a  
>>> VirtualBox vm for your consumption.  Please tell me if this is not  
>>> desirable.
>>
>> The self-contained and reproducible environment can be downloaded from  
>> the following location:
>>
>>  http://www.jquigley.com/tmp/xfsVM.tar.bz2
>
> Has anyone by chance had an opportunity to utilize this?  Any corruption reports?

Looked at it, but it turns out virtualbox is a real big pile of junk
including its own huge kernel module.  Qemu/kvm now has support for the
virtualbox disk images and I will give it a try next.

* Re: XFS corruption with power failure
  2009-08-23 18:17     ` XFS corruption with power failure John Quigley
  2009-08-26 17:54       ` John Quigley
@ 2009-09-01 19:17       ` John Quigley
  1 sibling, 0 replies; 8+ messages in thread
From: John Quigley @ 2009-09-01 19:17 UTC
  To: XFS Development

John Quigley wrote:
> John Quigley wrote:
>> We've distilled this into a reproducible environment with a stack of 
>> NFS + XFS to a local disk + automated sysrq 'b' reboots.  We're 
>> working on getting this bundled up into a nice little package as a 
>> VirtualBox vm for your consumption.  Please tell me if this is not 
>> desirable.

By way of an update, the corruption is definitely specific to Linux nfsd access to XFS at the time of power failure.  We've been unable to reproduce the problem in any context other than running IO through NFS to the underlying XFS mount.

- John Quigley

* Re: XFS corruption with power failure
  2009-09-01 18:13         ` Christoph Hellwig
@ 2009-09-15 15:50           ` John Quigley
  2009-09-17 18:35             ` Christoph Hellwig
  0 siblings, 1 reply; 8+ messages in thread
From: John Quigley @ 2009-09-15 15:50 UTC
  To: Christoph Hellwig; +Cc: XFS Development

Christoph Hellwig wrote:
> On Wed, Aug 26, 2009 at 12:54:56PM -0500, John Quigley wrote:
>> John Quigley wrote:
>>> John Quigley wrote:
>>>> We've distilled this into a reproducible environment with a stack of  
>>>> NFS + XFS to a local disk + automated sysrq 'b' reboots.  We're  
>>>> working on getting this bundled up into a nice little package as a  
>>>> VirtualBox vm for your consumption.  Please tell me if this is not  
>>>> desirable.
>>> The self-contained and reproducible environment can be downloaded from  
>>> the following location:
>>>
>>>  http://www.jquigley.com/tmp/xfsVM.tar.bz2
>> Has anyone by chance had an opportunity to utilize this?  Any corruption reports?
> 
> Looked at it, but it turns out virtualbox is a real big pile of junk
> including its own huge kernel module.  Qemu/kvm now has support for the
> virtualbox disk images and I will give it a try next.

Ping ... sorry to be a bother.  I've finally gotten some time allocated 
to look at this at the code level, but I'm unfamiliar with the XFS 
implementation, and I believe this problem arises as an interaction 
between XFS and NFS, which causes me heartburn.  I'll report here if I 
discover anything meaningful.

- John Quigley

* Re: XFS corruption with power failure
  2009-09-15 15:50           ` John Quigley
@ 2009-09-17 18:35             ` Christoph Hellwig
  0 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2009-09-17 18:35 UTC
  To: John Quigley; +Cc: Christoph Hellwig, XFS Development

On Tue, Sep 15, 2009 at 10:50:58AM -0500, John Quigley wrote:
>> including its own huge kernel module.  Qemu/kvm now has support for the
>> virtualbox disk images and I will give it a try next.
>
> Ping ... sorry to be a bother.  I've finally gotten some time allocated  
> to look at this at the code level, but I'm unfamiliar with the XFS  
> implementation, and I believe this problem arises as an interaction  
> between XFS and NFS, which causes me heartburn.  I'll report here if I  
> discover anything meaningful.

Sorry, busy time here.  I managed to get your image files converted for
use with qemu, but I can't actually get the image to output anything on
a serial console.  Given the content of the README, I guess you need some
graphics bit for it?  That's unfortunately not easily doable for me
sitting behind a slow linux with the testbox far away.
