public inbox for linux-xfs@vger.kernel.org
* xfs file system corruption
@ 2008-10-07 23:18 Allan Haywood
  2008-10-07 23:34 ` Dave Chinner
  0 siblings, 1 reply; 6+ messages in thread
From: Allan Haywood @ 2008-10-07 23:18 UTC (permalink / raw)
  To: xfs@oss.sgi.com

To All:

On a production system we are running into an interesting situation where we hit a corrupt file system. Let me outline the timeline as best I know it:

We have a failover process where two servers are connected to fiber storage. If the active server goes into failover (for numerous reasons), an automatic process kicks in that makes it inactive and then makes the backup server active. Here are the details:


1. On the failed server, the database and other processes are shut down (attempted)

2. The fiber-attached file systems are unmounted (attempted)

3. Fiber ports are turned off for that server

4. On the backup server, fiber ports are turned on

5. The fiber-attached file systems are mounted (the same filesystems that were on the previous server)

6. The database and other processes are started

7. The backup server is now active and processing queries
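
For concreteness, the steps above can be sketched as a dry-run script. The service name, mount point, and fence commands below are placeholders, not our real configuration:

```shell
#!/bin/sh
# Dry-run sketch of the failover sequence; nothing here touches real
# storage. MNT, the init script, and fence_fc_ports are hypothetical.
MNT=/data
run() { echo "WOULD RUN: $*"; }      # print each step instead of executing it

run /etc/init.d/database stop        # 1. stop database and other processes (attempted)
run umount "$MNT"                    # 2. unmount the fiber-attached filesystem (attempted)
run fence_fc_ports --off failed-node # 3. turn off the failed server's fiber ports
run fence_fc_ports --on backup-node  # 4. turn on the backup server's fiber ports
run mount "$MNT"                     # 5. mount the same filesystem on the backup
run /etc/init.d/database start       # 6. start database and other processes
```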

Here is where it got interesting: when recovering from the backup server back to the main server, we pretty much just reverse the steps above. The file systems unmounted cleanly on the backup server; however, when we went to mount them on the main server, it detected file system corruption (xfs_check indicated a repair was needed, so xfs_repair was then run on the filesystem). It proceeded to "fix" the filesystem, at which point we lost files that the database needed for one of the tables.

What I am curious about is the following message in the system log:

Oct  2 08:15:09 arch-node4 kernel: Device dm-31, XFS metadata write error block 0x40 in dm-31

This is from when the main node was fenced (fiber ports turned off). I am wondering whether any pending XFS metadata still exists, and whether, later on when the fiber is unfenced, that metadata flushes to disk. I could see this being an issue: if there are pending metadata writes to a filesystem, and through the failover that filesystem is mounted on another server, used as normal, then unmounted normally, then when the ports are re-activated on the server that still has pending metadata, is it possible the metadata does get flushed to disk? Since the disk has been in use on another server, the metadata would no longer match the filesystem properly and could potentially write over or change the filesystem in a way that causes corruption.

Any thoughts would be great.

If there is any more info I can provide, let me know.

Thanks.


--

Allan Haywood, Systems Engineering Program Manager II
SQL Server Data Warehousing Product Unit
allan.haywood@microsoft.com



[[HTML alternate version deleted]]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xfs file system corruption
  2008-10-07 23:18 xfs file system corruption Allan Haywood
@ 2008-10-07 23:34 ` Dave Chinner
  2008-10-07 23:49   ` Allan Haywood
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2008-10-07 23:34 UTC (permalink / raw)
  To: Allan Haywood; +Cc: xfs@oss.sgi.com

[Allan, please wrap text at 72 columns. Thx]

On Tue, Oct 07, 2008 at 04:18:57PM -0700, Allan Haywood wrote:
> We have a failover process where there are two servers connected
> to fiber storage, if the active server goes into failover (for
> numerous reasons) an automatic process kicks in that makes it
> inactive, and then makes the backup server active, here are the
> details:
> 
> 1.       On failed server database and other processes are shut
> down (attempts)
> 
> 2.       Fiber attached file system is unmounted (attempts)
> 
> 3.       Fiber ports are turned off for that server
> 
> 4.       On backup server fiber ports are turned on
> 
> 5.       Fiber attached file system is mounted (same filesystems
> that were on the previous server)
> 
> 6.       Database and other processes are started
> 
> 7.       The backup server is now active and processing queries
> 
> Here is where it got interesting, when recovering from the backup
> server back to the main server we pretty much just reverse the
> steps above. We had the file systems unmount cleanly on the backup
> server, however when we went to mount it on the main server it
> detected a file system corruption (using xfs_check it indicated a
> repair was needed so xfs_repair was then run on the filesystem),
> it proceded to "fix" the filesystem, at which point we lost files
> that the database needed for one of the tables.
> 
> What I am curious about is the following message in the system
> log:
> 
> Oct  2 08:15:09 arch-node4 kernel: Device dm-31, XFS metadata
> write error block 0x40 in dm-31
> 
> This is when the main node was fenced (fiber ports turned off), I
> am wondering if any pending XFS metadata still exists, later on
> when the fiber is unfenced that the metadata flushes to disk.

According to the detail above, you attempt to unmount the filesystem
before you fence the fibre ports. If you don't unmount the
filesystem before fencing, this is what you'll see - XFS trying
to write back async metadata and failing.

> I could see this as an issue, if there are pending metadata writes
> to a filesystem, that filesystem through failure is mounted on
> another server and used as normal, then unmounted normally, then
> when the ports are re-activated on the server that has pending
> metadata, is it possible this does get flushed to the disk, but
> since the disk has been in use on another server the metadata no
> longer matches the filesystem properly and potentially writes over
> or changes the filesystem in a way that causes corruption.

Right.

Once you've fenced the server, you really, really need to make
sure that it has no further pending writes that could be issued
when the fence is removed. I'd suggest that if you failed to
unmount the filesystem before fencing, you need to reboot that
server to remove any possibility of it issuing stale I/O
once it is unfenced. i.e. step 3b = STONITH.
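
As a sketch of that step 3b (the mount point is a placeholder, and the actual sysrq reboot is commented out since it is immediate and unclean):

```shell
#!/bin/sh
# Step 3b sketch: if the unmount attempt fails, self-fence the node
# before its fibre ports can ever be re-enabled. /data is a placeholder.
MNT=/data
if ! umount "$MNT" 2>/dev/null; then
    echo "unmount of $MNT failed: node may hold dirty XFS metadata, self-fencing"
    # The actual STONITH action (root only, immediate unclean reboot):
    # echo b > /proc/sysrq-trigger
fi
```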

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: xfs file system corruption
  2008-10-07 23:34 ` Dave Chinner
@ 2008-10-07 23:49   ` Allan Haywood
  2008-10-07 23:58     ` Allan Haywood
  2008-10-08  0:15     ` Andi Kleen
  0 siblings, 2 replies; 6+ messages in thread
From: Allan Haywood @ 2008-10-07 23:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs@oss.sgi.com



-----Original Message-----
From: Dave Chinner [mailto:david@fromorbit.com]
Sent: Tuesday, October 07, 2008 4:34 PM
To: Allan Haywood
Cc: xfs@oss.sgi.com
Subject: Re: xfs file system corruption

[Allan, please wrap text at 72 lines. Thx]

On Tue, Oct 07, 2008 at 04:18:57PM -0700, Allan Haywood wrote:
> We have a failover process where there are two servers connected
> to fiber storage, if the active server goes into failover (for
> numerous reasons) an automatic process kicks in that makes it
> inactive, and then makes the backup server active, here are the
> details:
>
> 1.       On failed server database and other processes are shut
> down (attempts)
>
> 2.       Fiber attached file system is unmounted (attempts)
>
> 3.       Fiber ports are turned off for that server
>
> 4.       On backup server fiber ports are turned on
>
> 5.       Fiber attached file system is mounted (same filesystems
> that were on the previous server)
>
> 6.       Database and other processes are started
>
> 7.       The backup server is now active and processing queries
>
> Here is where it got interesting, when recovering from the backup
> server back to the main server we pretty much just reverse the
> steps above. We had the file systems unmount cleanly on the backup
> server, however when we went to mount it on the main server it
> detected a file system corruption (using xfs_check it indicated a
> repair was needed so xfs_repair was then run on the filesystem),
> it proceded to "fix" the filesystem, at which point we lost files
> that the database needed for one of the tables.
>
> What I am curious about is the following message in the system
> log:
>
> Oct  2 08:15:09 arch-node4 kernel: Device dm-31, XFS metadata
> write error block 0x40 in dm-31
>
> This is when the main node was fenced (fiber ports turned off), I
> am wondering if any pending XFS metadata still exists, later on
> when the fiber is unfenced that the metadata flushes to disk.

According to your above detail, you attempt to unmount the filesystem
before you fence the fibre ports. If you don't unmount the
filesystem before fencing, this is what you'll see - XFS trying
to writeback async metadata and failing.

> I could see this as an issue, if there are pending metadata writes
> to a filesystem, that filesystem through failure is mounted on
> another server and used as normal, then unmounted normally, then
> when the ports are re-activated on the server that has pending
> metadata, is it possible this does get flushed to the disk, but
> since the disk has been in use on another server the metadata no
> longer matches the filesystem properly and potentially writes over
> or changes the filesystem in a way that causes corruption.

Right.

Once you've fenced the server, you really, really need to make
sure that it has no further pending writes that could be issued
when the fence is removed. I'd suggest that if you failed to
unmount the filesystem before fencing, you need to reboot that
server to remove any possibility of it issuing stale I/O
once it is unfenced. i.e. step 3b = STONITH.

        Would reloading the xfs module also work to clear any pending
        writes (if I could get it to a point where modprobe -r xfs
        would work)? Though I doubt that, with pending writes
        outstanding, it would be easy to get xfs to unload.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: xfs file system corruption
  2008-10-07 23:49   ` Allan Haywood
@ 2008-10-07 23:58     ` Allan Haywood
  2008-10-08  0:12       ` Dave Chinner
  2008-10-08  0:15     ` Andi Kleen
  1 sibling, 1 reply; 6+ messages in thread
From: Allan Haywood @ 2008-10-07 23:58 UTC (permalink / raw)
  To: Allan Haywood, Dave Chinner; +Cc: xfs@oss.sgi.com

[Allan, please wrap text at 72 lines. Thx]

On Tue, Oct 07, 2008 at 04:18:57PM -0700, Allan Haywood wrote:
> We have a failover process where there are two servers connected
> to fiber storage, if the active server goes into failover (for
> numerous reasons) an automatic process kicks in that makes it
> inactive, and then makes the backup server active, here are the
> details:
>
> 1.       On failed server database and other processes are shut
> down (attempts)
>
> 2.       Fiber attached file system is unmounted (attempts)
>
> 3.       Fiber ports are turned off for that server
>
> 4.       On backup server fiber ports are turned on
>
> 5.       Fiber attached file system is mounted (same filesystems
> that were on the previous server)
>
> 6.       Database and other processes are started
>
> 7.       The backup server is now active and processing queries
>
> Here is where it got interesting, when recovering from the backup
> server back to the main server we pretty much just reverse the
> steps above. We had the file systems unmount cleanly on the backup
> server, however when we went to mount it on the main server it
> detected a file system corruption (using xfs_check it indicated a
> repair was needed so xfs_repair was then run on the filesystem),
> it proceded to "fix" the filesystem, at which point we lost files
> that the database needed for one of the tables.
>
> What I am curious about is the following message in the system
> log:
>
> Oct  2 08:15:09 arch-node4 kernel: Device dm-31, XFS metadata
> write error block 0x40 in dm-31
>
> This is when the main node was fenced (fiber ports turned off), I
> am wondering if any pending XFS metadata still exists, later on
> when the fiber is unfenced that the metadata flushes to disk.

According to your above detail, you attempt to unmount the filesystem
before you fence the fibre ports. If you don't unmount the
filesystem before fencing, this is what you'll see - XFS trying
to writeback async metadata and failing.

> I could see this as an issue, if there are pending metadata writes
> to a filesystem, that filesystem through failure is mounted on
> another server and used as normal, then unmounted normally, then
> when the ports are re-activated on the server that has pending
> metadata, is it possible this does get flushed to the disk, but
> since the disk has been in use on another server the metadata no
> longer matches the filesystem properly and potentially writes over
> or changes the filesystem in a way that causes corruption.

Right.

Once you've fenced the server, you really, really need to make
sure that it has no further pending writes that could be issued
when the fence is removed. I'd suggest that if you failed to
unmount the filesystem before fencing, you need to reboot that
server to remove any possibility of it issuing stale I/O
once it is unfenced. i.e. step 3b = STONITH.

        Would reloading the xfs module work also, to clear any pending
        writes (if I could get it to a point where modprobe -r xfs
        would work)? Although I am doubting that if there are pending
        writes that it would be easy to get xfs to unload.

                Another possibility: is there a command that will tell xfs
                to clear any pending writes?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xfs file system corruption
  2008-10-07 23:58     ` Allan Haywood
@ 2008-10-08  0:12       ` Dave Chinner
  0 siblings, 0 replies; 6+ messages in thread
From: Dave Chinner @ 2008-10-08  0:12 UTC (permalink / raw)
  To: Allan Haywood; +Cc: xfs@oss.sgi.com

On Tue, Oct 07, 2008 at 04:58:24PM -0700, Allan Haywood wrote:
> > I could see this as an issue, if there are pending metadata writes
> > to a filesystem, that filesystem through failure is mounted on
> > another server and used as normal, then unmounted normally, then
> > when the ports are re-activated on the server that has pending
> > metadata, is it possible this does get flushed to the disk, but
> > since the disk has been in use on another server the metadata no
> > longer matches the filesystem properly and potentially writes over
> > or changes the filesystem in a way that causes corruption.
> 
> Right.
> 
> Once you've fenced the server, you really, really need to make
> sure that it has no further pending writes that could be issued
> when the fence is removed. I'd suggest that if you failed to
> unmount the filesystem before fencing, you need to reboot that
> server to remove any possibility of it issuing stale I/O
> once it is unfenced. i.e. step 3b = STONITH.
> 
> > Would reloading the xfs module work also, to clear any pending
> > writes (if I could get it to a point where modprobe -r xfs
> > would work)? Although I am doubting that if there are pending
> > writes that it would be easy to get xfs to unload.

Correct. While a filesystem is mounted, you can't unload the XFS
module.

> > Another possibility, is there a command that will tell xfs
> > To clear any pending writes?

You can force-shutdown the filesystem then unmount it. That
is:

# xfs_io -x -c "shutdown" <mtpt>
# umount <mtpt>

See the man page for xfs_io - you want to shut down the filesystem
without forcing the log (the fenced node can't do I/O).
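
Spelled out as a dry-run (commands are echoed rather than executed,
since a real shutdown is destructive, and the mount point is a
placeholder): plain "shutdown" skips the log force, while "shutdown -f"
flushes the log first and so needs a device that can still complete I/O.

```shell
#!/bin/sh
# Dry-run: print the commands rather than execute them. MNT is a
# placeholder mount point.
MNT=/data
echo xfs_io -x -c "shutdown" "$MNT"  # shut down WITHOUT forcing the log
                                     # (safe on a fenced node that cannot do I/O)
# xfs_io -x -c "shutdown -f" "$MNT"  # would flush the log to disk first --
                                     # only possible while the device is writable
echo umount "$MNT"                   # the unmount then proceeds without new I/O
```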

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: xfs file system corruption
  2008-10-07 23:49   ` Allan Haywood
  2008-10-07 23:58     ` Allan Haywood
@ 2008-10-08  0:15     ` Andi Kleen
  1 sibling, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2008-10-08  0:15 UTC (permalink / raw)
  To: Allan Haywood; +Cc: Dave Chinner, xfs@oss.sgi.com

Allan Haywood <Allan.Haywood@microsoft.com> writes:
>
>         Would reloading the xfs module work also, to clear any pending
>         writes (if I could get it to a point where modprobe -r xfs
>         would work)? Although I am doubting that if there are pending
>         writes that it would be easy to get xfs to unload.

Linux doesn't have I/O cancellation and it's hard to stop everything
in the I/O stack, so the usual way is to fence it at a high level
and then wait for all pending I/O. But that might take a long time.

That is why most HA setups use hard stonith, as in power switch.

-Andi
-- 
ak@linux.intel.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-10-08  0:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-10-07 23:18 xfs file system corruption Allan Haywood
2008-10-07 23:34 ` Dave Chinner
2008-10-07 23:49   ` Allan Haywood
2008-10-07 23:58     ` Allan Haywood
2008-10-08  0:12       ` Dave Chinner
2008-10-08  0:15     ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox