* XFS corruption with failover
@ 2009-08-13 20:17 John Quigley
2009-08-13 21:17 ` Emmanuel Florac
2009-08-13 21:44 ` Felix Blyakher
0 siblings, 2 replies; 20+ messages in thread
From: John Quigley @ 2009-08-13 20:17 UTC (permalink / raw)
To: XFS Development
Folks:
We're deploying XFS in a configuration where the file system is being exported with NFS. XFS is being mounted on Linux, with default options; an iSCSI volume is the formatted media. We're working out a failover solution for this deployment utilizing Linux HA. Things appear to work correctly in the general case, but in continuous testing we're getting XFS superblock corruption on a very reproducible basis.
The sequence of events in our test scenario:
1. NFS server #1 online
2. Run IO to NFS server #1 from NFS client
3. NFS server #1 offline (by writing 'b' to /proc/sysrq-trigger)
4. NFS server #2 online
5. XFS mounted as part of failover mechanism, mount fails
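The hard-failover step (step 3) can be scripted for repeatable testing. A minimal sketch; the sysrq write is the only part taken from the report above, and the dry-run guard and function name are editor's additions so the snippet is safe to run:

```shell
#!/bin/sh
# Sketch of step 3: writing 'b' to /proc/sysrq-trigger reboots the
# node immediately -- no sync, no unmount -- which is what makes it a
# realistic crash test. Destructive, so it only acts when explicitly
# asked; by default it prints what it would do.
crash_node() {
    if [ "$1" = "--really-reboot" ]; then
        echo b > /proc/sysrq-trigger
    else
        echo "dry run: would write 'b' to /proc/sysrq-trigger"
    fi
}
crash_node "$@"
```

Running it with no arguments is a harmless dry run; only `--really-reboot` performs the crash.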
The mount fails with the following:
<snip>
kernel: XFS mounting filesystem sde
kernel: Starting XFS recovery on filesystem: sde (logdev: internal)
kernel: XFS: xlog_recover_process_data: bad clientid
kernel: XFS: log mount/recovery failed: error 5
kernel: XFS: log mount failed
</snip>
When running xfs_repair:
<snip>
[root@machine ~]# xfs_repair /dev/sde
xfs_repair: warning - cannot set blocksize on block device /dev/sde: Invalid argument
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ERROR: The filesystem has valuable metadata changes in a log which needs ...
</snip>
Any advice or insight into what we're doing wrong would be very much appreciated. My apologies in advance for the somewhat off-topic question.
- John Quigley
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS corruption with failover
2009-08-13 20:17 XFS corruption with failover John Quigley
@ 2009-08-13 21:17 ` Emmanuel Florac
2009-08-13 22:42 ` Felix Blyakher
2009-08-14 0:50 ` John Quigley
2009-08-13 21:44 ` Felix Blyakher
1 sibling, 2 replies; 20+ messages in thread
From: Emmanuel Florac @ 2009-08-13 21:17 UTC (permalink / raw)
To: John Quigley; +Cc: XFS Development
On Thu, 13 Aug 2009 15:17:22 -0500, you wrote:
> Any advice or insight into what we're doing wrong would be very much
> appreciated. My apologies in advance for the somewhat off-topic
> question.
By abruptly killing the primary server while doing IO, you're probably
pushing the envelope... You may have somewhat better luck with a
cluster fs; OCFS2 usually works very well for me (GFS is a complete
PITA to set up).
A better option would be to completely disable write
caching on the client side (because that is probably where it's going
wrong), though I don't know how. You can get it to flush extremely
often by tuning /proc/sys/vm/dirty_expire_centisecs
and /proc/sys/vm/dirty_writeback_centisecs, though safer settings
generally imply terrible performance; you've been warned.
Ah, another thing to check may be some cache option in the iSCSI target. What
target are you using?
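The writeback knobs mentioned above can be inspected safely before touching them. A sketch; note the real sysctl names end in `centisecs` (hundredths of a second, so 500 = 5 s), and the `500` target plus the report path are illustrative editor's choices, not recommendations from this thread:

```shell
#!/bin/sh
# Print the current writeback knobs and the commands an admin would
# run to tighten them. This only prints; actually writing the values
# requires root. 500 centiseconds (5 s) is an arbitrary example.
report=/tmp/writeback_tuning.txt
for knob in dirty_expire_centisecs dirty_writeback_centisecs; do
    f=/proc/sys/vm/$knob
    if [ -r "$f" ]; then
        echo "$knob = $(cat "$f")"
    fi
    echo "to tighten: echo 500 > $f"
done > "$report"
cat "$report"
```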
--
--------------------------------------------------
Emmanuel Florac www.intellique.com
--------------------------------------------------
* Re: XFS corruption with failover
2009-08-13 20:17 XFS corruption with failover John Quigley
2009-08-13 21:17 ` Emmanuel Florac
@ 2009-08-13 21:44 ` Felix Blyakher
2009-08-14 0:31 ` Eric Sandeen
2009-08-14 0:56 ` John Quigley
1 sibling, 2 replies; 20+ messages in thread
From: Felix Blyakher @ 2009-08-13 21:44 UTC (permalink / raw)
To: John Quigley; +Cc: XFS Development
On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
> Folks:
>
> We're deploying XFS in a configuration where the file system is
> being exported with NFS. XFS is being mounted on Linux, with
> default options; an iSCSI volume is the formatted media. We're
> working out a failover solution for this deployment utilizing Linux
> HA. Things appear to work correctly in the general case, but in
> continuous testing we're getting XFS superblock corruption on a very
> reproducible basis.
> The sequence of events in our test scenario:
>
> 1. NFS server #1 online
> 2. Run IO to NFS server #1 from NFS client
> 3. NFS server #1 offline, (via passing 'b' to /proc/sysrq-trigger)
> 4. NFS server #2 online
> 5. XFS mounted as part of failover mechanism, mount fails
>
> The mount fails with the following:
>
> <snip>
> kernel: XFS mounting filesystem sde
> kernel: Starting XFS recovery on filesystem: sde (logdev: internal)
> kernel: XFS: xlog_recover_process_data: bad clientid
> kernel: XFS: log mount/recovery failed: error 5
This is an IO error. Is the block device (/dev/sde) accessible
from server #2? Can you dd from that device?
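Felix's dd check can be wrapped in a small pre-mount script. A sketch; `/dev/sde` is the device from the report, while the scratch-file default is an editor's assumption so the snippet is runnable outside the cluster:

```shell
#!/bin/sh
# Before mounting on the standby node, prove the shared LUN is
# readable at all. On the real cluster, run with DEV=/dev/sde;
# here DEV defaults to a scratch file created on the fly.
DEV="${DEV:-/tmp/fake_lun.img}"
[ -e "$DEV" ] || dd if=/dev/zero of="$DEV" bs=1M count=4 2>/dev/null
if dd if="$DEV" of=/dev/null bs=1M count=4 2>/dev/null; then
    echo "read check passed: $DEV"
else
    echo "read check FAILED: $DEV" >&2
    exit 1
fi
```

If this fails on the standby, the problem is below the filesystem and no amount of XFS recovery will help.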
>
> kernel: XFS: log mount failed
> </snip>
>
> When running xfs_repair:
That's not a good time to run xfs_repair. There was no
indication that the filesystem is corrupted.
Let's take "NFS server #2" out of the picture for a sec.
Can you mount the filesystem from the original server
after it reboots?
Felix
>
>
> <snip>
> [root@machine ~]# xfs_repair /dev/sde
> xfs_repair: warning - cannot set blocksize on block device /dev/sde: Invalid argument
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> ERROR: The filesystem has valuable metadata changes in a log which
> needs ...
> </snip>
>
> Any advice or insight into what we're doing wrong would be very much
> appreciated. My apologies in advance for the somewhat off-topic
> question.
>
> - John Quigley
>
* Re: XFS corruption with failover
2009-08-13 21:17 ` Emmanuel Florac
@ 2009-08-13 22:42 ` Felix Blyakher
2009-08-14 0:52 ` John Quigley
2009-08-14 0:50 ` John Quigley
1 sibling, 1 reply; 20+ messages in thread
From: Felix Blyakher @ 2009-08-13 22:42 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: John Quigley, XFS Development
On Aug 13, 2009, at 4:17 PM, Emmanuel Florac wrote:
> On Thu, 13 Aug 2009 15:17:22 -0500, you wrote:
>
>> Any advice or insight into what we're doing wrong would be very much
>> appreciated. My apologies in advance for the somewhat off-topic
>> question.
>
> By killing abruptly the primary server while doing IO, you're probably
> pushing the envelope...
I don't think it's pushing too much. XFS was designed to
survive such events.
> You may have a somewhat better luck with a
> cluster fs, OCFS2 works very well for me usually (GFS is a complete
> PITA to setup).
>
>
> The better option would be to disallow completely write
> caching on the client side (because this is probably where it's going
> wrong) however I don't know how.
The client's caching can't affect metadata operations on the server,
and the log in particular. The client may indeed lose some data, but
that's a completely different issue.
Felix
> You can get it to flush extremely
> often by playing with /proc/sys/vm/dirty_expire_centisecs
> and /proc/sys/vm/dirty_writeback_centisecs, though. Safer settings
> generally imply terrible performance, though, you've been warned.
>
> Ah another thing may be some cache option in the iSCSI target. what
> target are you using?
>
> --
> --------------------------------------------------
> Emmanuel Florac www.intellique.com
> --------------------------------------------------
>
* Re: XFS corruption with failover
2009-08-13 21:44 ` Felix Blyakher
@ 2009-08-14 0:31 ` Eric Sandeen
2009-08-14 0:58 ` Lachlan McIlroy
` (2 more replies)
2009-08-14 0:56 ` John Quigley
1 sibling, 3 replies; 20+ messages in thread
From: Eric Sandeen @ 2009-08-14 0:31 UTC (permalink / raw)
To: Felix Blyakher; +Cc: John Quigley, XFS Development
Felix Blyakher wrote:
> On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
>
>> Folks:
>>
>> We're deploying XFS in a configuration where the file system is
>> being exported with NFS. XFS is being mounted on Linux, with
>> default options; an iSCSI volume is the formatted media. We're
>> working out a failover solution for this deployment utilizing Linux
>> HA. Things appear to work correctly in the general case, but in
>> continuous testing we're getting XFS superblock corruption on a very
>> reproducible basis.
>> The sequence of events in our test scenario:
>>
>> 1. NFS server #1 online
>> 2. Run IO to NFS server #1 from NFS client
>> 3. NFS server #1 offline, (via passing 'b' to /proc/sysrq-trigger)
>> 4. NFS server #2 online
>> 5. XFS mounted as part of failover mechanism, mount fails
>>
>> The mount fails with the following:
>>
>> <snip>
>> kernel: XFS mounting filesystem sde
>> kernel: Starting XFS recovery on filesystem: sde (logdev: internal)
>> kernel: XFS: xlog_recover_process_data: bad clientid
>> kernel: XFS: log mount/recovery failed: error 5
>
> This is an IO error. Is the block device (/dev/sde) accessible
> from the server #2 OK? Can you dd from that device?
Are you sure?
if (ohead->oh_clientid != XFS_TRANSACTION &&
ohead->oh_clientid != XFS_LOG) {
xlog_warn(
"XFS: xlog_recover_process_data: bad clientid");
ASSERT(0);
return (XFS_ERROR(EIO));
}
so it does say EIO, but that seems to me to be the wrong error; looks more
like a bad log to me.
It does make me wonder if there's any sort of per-initiator caching on
the iSCSI target or something. </handwave>
-Eric
* Re: XFS corruption with failover
[not found] <424153067.1934481250210293891.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-08-14 0:38 ` Lachlan McIlroy
2009-08-14 1:14 ` John Quigley
2009-08-17 18:04 ` John Quigley
0 siblings, 2 replies; 20+ messages in thread
From: Lachlan McIlroy @ 2009-08-14 0:38 UTC (permalink / raw)
To: Felix Blyakher; +Cc: John Quigley, XFS Development
----- "Felix Blyakher" <felixb@sgi.com> wrote:
> On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
>
> > Folks:
> >
> > We're deploying XFS in a configuration where the file system is
> > being exported with NFS. XFS is being mounted on Linux, with
> > default options; an iSCSI volume is the formatted media. We're
> > working out a failover solution for this deployment utilizing Linux
>
> > HA. Things appear to work correctly in the general case, but in
> > continuous testing we're getting XFS superblock corruption on a very
>
> > reproducible basis.
> > The sequence of events in our test scenario:
> >
> > 1. NFS server #1 online
> > 2. Run IO to NFS server #1 from NFS client
> > 3. NFS server #1 offline, (via passing 'b' to /proc/sysrq-trigger)
> > 4. NFS server #2 online
> > 5. XFS mounted as part of failover mechanism, mount fails
> >
> > The mount fails with the following:
> >
> > <snip>
> > kernel: XFS mounting filesystem sde
> > kernel: Starting XFS recovery on filesystem: sde (logdev: internal)
> > kernel: XFS: xlog_recover_process_data: bad clientid
> > kernel: XFS: log mount/recovery failed: error 5
>
> This is an IO error. Is the block device (/dev/sde) accessible
> from the server #2 OK? Can you dd from that device?
>
> >
> > kernel: XFS: log mount failed
> > </snip>
> >
> > When running xfs_repair:
>
> That's not a good time to run xfs_repair. There was no
> indication that the filesystem is corrupted.
>
> Let's take for a sec "NFS server #2" out of the picture.
> Can you mount the filesystem from the original server
> after it reboots?
If that fails too can you run xfs_logprint on /dev/sde and
post any errors it reports?
>
> Felix
>
> >
> >
> > <snip>
> > [root@machine ~]# xfs_repair /dev/sde
> > xfs_repair: warning - cannot set blocksize on block device /dev/sde: Invalid argument
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> > - zero log...
> > ERROR: The filesystem has valuable metadata changes in a log which
>
> > needs ...
> > </snip>
> >
> > Any advice or insight into what we're doing wrong would be very much
>
> > appreciated. My apologies in advance for the somewhat off-topic
> > question.
> >
> > - John Quigley
> >
* Re: XFS corruption with failover
2009-08-13 21:17 ` Emmanuel Florac
2009-08-13 22:42 ` Felix Blyakher
@ 2009-08-14 0:50 ` John Quigley
1 sibling, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-14 0:50 UTC (permalink / raw)
To: Emmanuel Florac; +Cc: XFS Development
Emmanuel Florac wrote:
> By killing abruptly the primary server while doing IO, you're probably
> pushing the envelope... You may have a somewhat better luck with a
> cluster fs, OCFS2 works very well for me usually (GFS is a complete
> PITA to setup).
Acknowledged; we've looked at GFS, and I've been meaning to read up on OCFS2. For various reasons, particularly performance, ease of deployment, and flexible growth, XFS has been the clear winner in our particular case (and our case is fairly unusual, as our volume is backed by a distributed storage device).
> You can get it to flush extremely
> often by playing with /proc/sys/vm/dirty_expire_centisecs
> and /proc/sys/vm/dirty_writeback_centisecs, though. Safer settings
> generally imply terrible performance, though, you've been warned.
Okay, interesting, I wasn't aware of these and will look into it.
> Ah another thing may be some cache option in the iSCSI target. what
> target are you using?
No caching target side - I can speak definitively on that because I wrote it (it integrates with our data dispersal stack [1]). Also, we're utilizing the same target when failing over; it's just the iSCSI initiator (aka the NFS server) that is changing.
Thank you kindly for the quick response.
- John Quigley
* Re: XFS corruption with failover
2009-08-13 22:42 ` Felix Blyakher
@ 2009-08-14 0:52 ` John Quigley
0 siblings, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-14 0:52 UTC (permalink / raw)
Cc: XFS Development
Felix Blyakher wrote:
> I don't think it's pushing too much. XFS was designed to
> survive such events.
And that was my understanding, based on all I've read about the design and intended usage. XFS has been remarkably resilient in the face of various poor operating conditions, and this is the only environment under which failure has been observed. It's for this reason that I assumed it's something we're doing wrong, and not an inherent issue with the file system.
- John Quigley
* Re: XFS corruption with failover
2009-08-13 21:44 ` Felix Blyakher
2009-08-14 0:31 ` Eric Sandeen
@ 2009-08-14 0:56 ` John Quigley
1 sibling, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-14 0:56 UTC (permalink / raw)
Cc: XFS Development
Felix Blyakher wrote:
> This is an IO error. Is the block device (/dev/sde) accessible
> from the server #2 OK? Can you dd from that device?
Interesting suggestion; I don't recall having seen any indication of IO errors, but I'm testing again this evening, and will report back on what I find with dd.
> That's not a good time to run xfs_repair. There was no
> indication that the filesystem is corrupted.
Interesting again. I ran the tool merely because I had no other recourse to remedy this.
> Let's take for a sec "NFS server #2" out of the picture.
> Can you mount the filesystem from the original server
> after it reboots?
This has been tested and fails, so it does appear to be a corruption issue on the stable media. Based on your first comment, I'm re-running tests to verify that the iSCSI volumes are fully online and ready to accept IO before mounting the file system.
Thank you very much.
- John Quigley
* Re: XFS corruption with failover
2009-08-14 0:31 ` Eric Sandeen
@ 2009-08-14 0:58 ` Lachlan McIlroy
2009-08-14 1:35 ` Eric Sandeen
2009-08-14 1:06 ` John Quigley
2009-08-14 13:21 ` Felix Blyakher
2 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2009-08-14 0:58 UTC (permalink / raw)
To: Eric Sandeen; +Cc: John Quigley, XFS Development
----- "Eric Sandeen" <sandeen@sandeen.net> wrote:
> Felix Blyakher wrote:
> > On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
> >
> >> Folks:
> >>
> >> We're deploying XFS in a configuration where the file system is
> >> being exported with NFS. XFS is being mounted on Linux, with
> >> default options; an iSCSI volume is the formatted media. We're
> >> working out a failover solution for this deployment utilizing Linux
>
> >> HA. Things appear to work correctly in the general case, but in
> >> continuous testing we're getting XFS superblock corruption on a
> very
> >> reproducible basis.
> >> The sequence of events in our test scenario:
> >>
> >> 1. NFS server #1 online
> >> 2. Run IO to NFS server #1 from NFS client
> >> 3. NFS server #1 offline, (via passing 'b' to /proc/sysrq-trigger)
> >> 4. NFS server #2 online
> >> 5. XFS mounted as part of failover mechanism, mount fails
> >>
> >> The mount fails with the following:
> >>
> >> <snip>
> >> kernel: XFS mounting filesystem sde
> >> kernel: Starting XFS recovery on filesystem: sde (logdev:
> internal)
> >> kernel: XFS: xlog_recover_process_data: bad clientid
> >> kernel: XFS: log mount/recovery failed: error 5
> >
> > This is an IO error. Is the block device (/dev/sde) accessible
> > from the server #2 OK? Can you dd from that device?
>
> Are you sure?
>
> if (ohead->oh_clientid != XFS_TRANSACTION &&
> ohead->oh_clientid != XFS_LOG) {
> xlog_warn(
> "XFS: xlog_recover_process_data: bad clientid");
> ASSERT(0);
> return (XFS_ERROR(EIO));
> }
>
> so it does say EIO but that seems to me to be the wrong error; looks
> more like a bad log to me.
>
> It does make me wonder if there's any sort of per-initiator caching
> on
> the iscsi target or something. </handwave>
Should barriers be enabled in XFS then?
>
> -Eric
>
* Re: XFS corruption with failover
2009-08-14 0:31 ` Eric Sandeen
2009-08-14 0:58 ` Lachlan McIlroy
@ 2009-08-14 1:06 ` John Quigley
2009-08-14 13:21 ` Felix Blyakher
2 siblings, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-14 1:06 UTC (permalink / raw)
To: XFS Development
Eric Sandeen wrote:
> Are you sure?
>
> if (ohead->oh_clientid != XFS_TRANSACTION &&
> ohead->oh_clientid != XFS_LOG) {
> xlog_warn(
> "XFS: xlog_recover_process_data: bad clientid");
> ASSERT(0);
> return (XFS_ERROR(EIO));
> }
>
> so it does say EIO but that seems to me to be the wrong error; looks more
> like a bad log to me.
Hey Eric:
That would certainly be consistent with our experience, as the only way we're able to bring the file system back online is by zeroing the log.
> It does make me wonder if there's any sort of per-initiator caching on
> the iscsi target or something. </handwave>
There isn't, as mentioned above, though we have several intermediate layers between the file system and the iSCSI initiator, including multipath and LVM, both of which I was initially suspicious of. In testing a similar scenario in a more isolated fashion, without those two intermediates, the behavior was still present. Also, just to clarify the topology:
/-----[Failover Secondary]------\
/ \
NFS Client ----/ \-----[ISCSI Target]----[Distributed Storage]
\ /
\ /
\-----[Failover Primary]--------/
Those two failover machines, Primary and Secondary, act as the NFS server, XFS mountpoint, and iSCSI initiator. Only one failover machine at a time is logged into the iSCSI target and has XFS mounted.
Thanks very much for your cycles on this, guys.
- John Quigley
* Re: XFS corruption with failover
2009-08-14 0:38 ` Lachlan McIlroy
@ 2009-08-14 1:14 ` John Quigley
2009-08-17 18:04 ` John Quigley
1 sibling, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-14 1:14 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: XFS Development
Lachlan McIlroy wrote:
> If that fails too can you run xfs_logprint on /dev/sde and
> post any errors it reports?
I'll definitely do so, thanks.
- John Quigley
* Re: XFS corruption with failover
2009-08-14 0:58 ` Lachlan McIlroy
@ 2009-08-14 1:35 ` Eric Sandeen
2009-08-14 1:44 ` John Quigley
0 siblings, 1 reply; 20+ messages in thread
From: Eric Sandeen @ 2009-08-14 1:35 UTC (permalink / raw)
To: Lachlan McIlroy; +Cc: John Quigley, XFS Development
Lachlan McIlroy wrote:
> ----- "Eric Sandeen" <sandeen@sandeen.net> wrote:
>
>> Felix Blyakher wrote:
>>> On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
>>>
>>>> Folks:
>>>>
>>>> We're deploying XFS in a configuration where the file system is
>>>> being exported with NFS. XFS is being mounted on Linux, with
>>>> default options; an iSCSI volume is the formatted media. We're
>>>> working out a failover solution for this deployment utilizing Linux
>>
>>>> HA. Things appear to work correctly in the general case, but in
>>>> continuous testing we're getting XFS superblock corruption on a
>> very
>>>> reproducible basis.
>>>> The sequence of events in our test scenario:
>>>>
>>>> 1. NFS server #1 online
>>>> 2. Run IO to NFS server #1 from NFS client
>>>> 3. NFS server #1 offline, (via passing 'b' to /proc/sysrq-trigger)
>>>> 4. NFS server #2 online
>>>> 5. XFS mounted as part of failover mechanism, mount fails
>>>>
>>>> The mount fails with the following:
>>>>
>>>> <snip>
>>>> kernel: XFS mounting filesystem sde
>>>> kernel: Starting XFS recovery on filesystem: sde (logdev:
>> internal)
>>>> kernel: XFS: xlog_recover_process_data: bad clientid
>>>> kernel: XFS: log mount/recovery failed: error 5
>>> This is an IO error. Is the block device (/dev/sde) accessible
>>> from the server #2 OK? Can you dd from that device?
>> Are you sure?
>>
>> if (ohead->oh_clientid != XFS_TRANSACTION &&
>> ohead->oh_clientid != XFS_LOG) {
>> xlog_warn(
>> "XFS: xlog_recover_process_data: bad clientid");
>> ASSERT(0);
>> return (XFS_ERROR(EIO));
>> }
>>
>> so it does say EIO but that seems to me to be the wrong error; looks
>> more like a bad log to me.
>>
>> It does make me wonder if there's any sort of per-initiator caching
>> on
>> the iscsi target or something. </handwave>
> Should barriers be enabled in XFS then?
Could try it, but I bet the iSCSI target doesn't claim to support them...
-eric
>> -Eric
>>
* Re: XFS corruption with failover
[not found] <835473717.1935811250214078456.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-08-14 1:43 ` Lachlan McIlroy
0 siblings, 0 replies; 20+ messages in thread
From: Lachlan McIlroy @ 2009-08-14 1:43 UTC (permalink / raw)
To: Eric Sandeen; +Cc: John Quigley, XFS Development
----- "Eric Sandeen" <sandeen@sandeen.net> wrote:
> Lachlan McIlroy wrote:
> > ----- "Eric Sandeen" <sandeen@sandeen.net> wrote:
> >
> >> Felix Blyakher wrote:
> >>> On Aug 13, 2009, at 3:17 PM, John Quigley wrote:
> >>>
> >>>> Folks:
> >>>>
> >>>> We're deploying XFS in a configuration where the file system is
>
> >>>> being exported with NFS. XFS is being mounted on Linux, with
> >>>> default options; an iSCSI volume is the formatted media. We're
>
> >>>> working out a failover solution for this deployment utilizing
> Linux
> >>
> >>>> HA. Things appear to work correctly in the general case, but in
>
> >>>> continuous testing we're getting XFS superblock corruption on a
> >> very
> >>>> reproducible basis.
> >>>> The sequence of events in our test scenario:
> >>>>
> >>>> 1. NFS server #1 online
> >>>> 2. Run IO to NFS server #1 from NFS client
> >>>> 3. NFS server #1 offline, (via passing 'b' to
> /proc/sysrq-trigger)
> >>>> 4. NFS server #2 online
> >>>> 5. XFS mounted as part of failover mechanism, mount fails
> >>>>
> >>>> The mount fails with the following:
> >>>>
> >>>> <snip>
> >>>> kernel: XFS mounting filesystem sde
> >>>> kernel: Starting XFS recovery on filesystem: sde (logdev:
> >> internal)
> >>>> kernel: XFS: xlog_recover_process_data: bad clientid
> >>>> kernel: XFS: log mount/recovery failed: error 5
> >>> This is an IO error. Is the block device (/dev/sde) accessible
> >>> from the server #2 OK? Can you dd from that device?
> >> Are you sure?
> >>
> >> if (ohead->oh_clientid != XFS_TRANSACTION &&
> >> ohead->oh_clientid != XFS_LOG) {
> >> xlog_warn(
> >> "XFS: xlog_recover_process_data: bad clientid");
> >> ASSERT(0);
> >> return (XFS_ERROR(EIO));
> >> }
> >>
> >> so it does say EIO but that seems to me to be the wrong error; looks
> >> more like a bad log to me.
> >>
> >> It does make me wonder if there's any sort of per-initiator
> caching
> >> on
> >> the iscsi target or something. </handwave>
> > Should barriers be enabled in XFS then?
>
> Could try it, but I bet the iSCSI target doesn't claim to support
> them...
You're probably right.
Is it possible for a transaction record to span two log buffers with
only one making it to disk, so that the rest of the transaction record
appears corrupt?
>
> -eric
>
> >> -Eric
> >>
* Re: XFS corruption with failover
2009-08-14 1:35 ` Eric Sandeen
@ 2009-08-14 1:44 ` John Quigley
0 siblings, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-14 1:44 UTC (permalink / raw)
To: XFS Development
Eric Sandeen wrote:
>> Should barriers be enabled in XFS then?
>
> Could try it, but I bet the iSCSI target doesn't claim to support them...
The target implementation, being new, is fairly naive and does not support this (or have any caching facilities, for that matter) at this time.
- John Quigley
* Re: XFS corruption with failover
2009-08-14 0:31 ` Eric Sandeen
2009-08-14 0:58 ` Lachlan McIlroy
2009-08-14 1:06 ` John Quigley
@ 2009-08-14 13:21 ` Felix Blyakher
2 siblings, 0 replies; 20+ messages in thread
From: Felix Blyakher @ 2009-08-14 13:21 UTC (permalink / raw)
To: Eric Sandeen; +Cc: John Quigley, XFS Development
On Aug 13, 2009, at 7:31 PM, Eric Sandeen wrote:
>> This is an IO error. Is the block device (/dev/sde) accessible
>> from the server #2 OK? Can you dd from that device?
>
> Are you sure?
No, I'm not. Replied first without looking at the code ^)
>
>
> if (ohead->oh_clientid != XFS_TRANSACTION &&
> ohead->oh_clientid != XFS_LOG) {
> xlog_warn(
> "XFS: xlog_recover_process_data: bad clientid");
> ASSERT(0);
> return (XFS_ERROR(EIO));
> }
>
> so it does say EIO but that seems to me to be the wrong error; looks
> more like a bad log to me.
Agree. It does look like a corrupted (incomplete) log.
>
>
> It does make me wonder if there's any sort of per-initiator caching on
> the iscsi target or something. </handwave>
Yep, somewhere a piece of the log was left in a cache and wasn't
flushed to disk.
Felix
* Re: XFS corruption with failover
2009-08-14 0:38 ` Lachlan McIlroy
2009-08-14 1:14 ` John Quigley
@ 2009-08-17 18:04 ` John Quigley
1 sibling, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-17 18:04 UTC (permalink / raw)
To: XFS Development
Lachlan McIlroy wrote:
> If that fails too can you run xfs_logprint on /dev/sde and
> post any errors it reports?
My apologies for the delayed response; output of logprint can be downloaded as a ~4MB bzip:
http://www.jquigley.com/files/tmp/xfs-failover-logprint.bz2
Thanks very much for your consideration.
- John Quigley
* Re: XFS corruption with failover
[not found] <990461759.2142271250648177725.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-08-19 2:18 ` Lachlan McIlroy
2009-08-19 15:46 ` John Quigley
0 siblings, 1 reply; 20+ messages in thread
From: Lachlan McIlroy @ 2009-08-19 2:18 UTC (permalink / raw)
To: John Quigley; +Cc: XFS Development
----- "John Quigley" <jquigley@jquigley.com> wrote:
> Lachlan McIlroy wrote:
> > If that fails too can you run xfs_logprint on /dev/sde and
> > post any errors it reports?
>
> My apologies for the delayed response; output of logprint can be
> downloaded as a ~4MB bzip:
>
> http://www.jquigley.com/files/tmp/xfs-failover-logprint.bz2
xfs_logprint doesn't find any problems with this log, but that doesn't mean
the kernel doesn't - they use different implementations to read the log. I
noticed that the active part of the log wraps around the physical end/start
of the log, which reminds me of this fix:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d1afb678ce77b930334a8a640a05b8e68178a377
I remember that without this fix we were seeing ASSERTs in the log recovery
code - unfortunately I don't remember exactly where but it could be from
the same location you are getting the "bad clientid" error. When a log
record wraps the end/start of the physical log we need to do two I/Os to
read the log record in. This bug caused the second read to go to an
incorrect location in the buffer which overwrote part of the first I/O and
corrupted the log record. I think the fix made it into 2.6.24.
>
> Thanks very much for your consideration.
>
> - John Quigley
>
* Re: XFS corruption with failover
2009-08-19 2:18 ` Lachlan McIlroy
@ 2009-08-19 15:46 ` John Quigley
0 siblings, 0 replies; 20+ messages in thread
From: John Quigley @ 2009-08-19 15:46 UTC (permalink / raw)
To: XFS Development
Lachlan McIlroy wrote:
> xfs_logprint doesn't find any problems with this log but that doesn't mean
> the kernel doesn't - they use different implementations to read the log. I
> noticed that the active part of the log wraps around the physical end/start
> of the log which reminds of this fix:
Very interesting indeed, thank you /very/ much for looking at this.
> I think the fix made it into 2.6.24.
We're currently using the very latest kernel, 2.6.30, unfortunately. We've distilled this into a reproducible environment: NFS on XFS on a local disk, with automated sysrq 'b' reboots. We're working on bundling this up into a nice little package as a VirtualBox VM for your consumption. Please tell me if this is not desirable.
Thanks very much again.
John Quigley
jquigley.com
* Re: XFS corruption with failover
[not found] <1194138654.75921250838215929.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
@ 2009-08-21 7:11 ` Lachlan McIlroy
0 siblings, 0 replies; 20+ messages in thread
From: Lachlan McIlroy @ 2009-08-21 7:11 UTC (permalink / raw)
To: John Quigley; +Cc: XFS Development
----- "John Quigley" <jquigley@jquigley.com> wrote:
> Lachlan McIlroy wrote:
> > xfs_logprint doesn't find any problems with this log but that doesn't
> > mean the kernel doesn't - they use different implementations to read
> > the log. I noticed that the active part of the log wraps around the
> > physical end/start of the log, which reminds me of this fix:
Hang on, I made a mistake there. The xfs_logprint transactional view
of the log didn't find any errors, but dumping the raw contents of the
log tells a different story.
$ xfs_logprint -f xfs-failover-logprint
xfs_logprint:
data device: 0xffffffffffffffff
log device: 0xffffffffffffffff daddr: 0 length: 262144
Header 0xb wanted 0xfeedbabe
**********************************************************************
* ERROR: header cycle=11 block=6168 *
**********************************************************************
Bad log record header
$ xfs_logprint -d -f xfs-failover-logprint
xfs_logprint:
data device: 0xffffffffffffffff
log device: 0xffffffffffffffff daddr: 0 length: 262144
[00000 - 00000] Cycle 0xffffffff New Cycle 0x0000000c
32 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
96 HEADER Cycle 12 tail 11:257848 len 24064 ops 456
144 HEADER Cycle 12 tail 11:257848 len 3584 ops 25
152 HEADER Cycle 12 tail 11:257848 len 32256 ops 708
216 HEADER Cycle 12 tail 11:257848 len 32256 ops 706
280 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
344 HEADER Cycle 12 tail 11:257848 len 3584 ops 18
352 HEADER Cycle 12 tail 11:257848 len 32256 ops 708
416 HEADER Cycle 12 tail 11:257848 len 32256 ops 706
480 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
544 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
608 HEADER Cycle 12 tail 11:257848 len 32256 ops 710
672 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
736 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
800 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
864 HEADER Cycle 12 tail 11:257848 len 32256 ops 706
928 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
992 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
1056 HEADER Cycle 12 tail 11:257848 len 32256 ops 710
1120 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
1184 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
1248 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
1312 HEADER Cycle 12 tail 11:257848 len 32256 ops 706
1376 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
1440 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
1504 HEADER Cycle 12 tail 11:257848 len 32256 ops 710
1568 HEADER Cycle 12 tail 11:257848 len 24064 ops 437
1616 HEADER Cycle 12 tail 11:257848 len 3584 ops 25
1624 HEADER Cycle 12 tail 11:257848 len 32256 ops 708
1688 HEADER Cycle 12 tail 11:257848 len 32256 ops 706
1752 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
1816 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
1880 HEADER Cycle 12 tail 11:257848 len 32256 ops 710
1944 HEADER Cycle 12 tail 11:257848 len 32256 ops 707
2008 HEADER Cycle 12 tail 11:257848 len 32256 ops 709
2072 HEADER Cycle 11 tail 11:257848 len 0 ops 0
[00000 - 02072] Cycle 0x0000000c New Cycle 0x0000000b
2073 HEADER Cycle 11 tail 11:257848 len 0 ops 0
2074 HEADER Cycle 11 tail 11:257848 len 0 ops 0
2075 HEADER Cycle 11 tail 11:257848 len 0 ops 0
.........
6165 HEADER Cycle 11 tail 11:257848 len 0 ops 0
6166 HEADER Cycle 11 tail 11:257848 len 0 ops 0
6167 HEADER Cycle 11 tail 11:257848 len 0 ops 0
6184 HEADER Cycle 11 tail 10:260744 len 32256 ops 707
6248 HEADER Cycle 11 tail 10:260744 len 32256 ops 710
6312 HEADER Cycle 11 tail 10:260744 len 32256 ops 707
..........
So we get to block 6168 and there's an unexpected state change: instead
of the magic number we have the cycle number.
BLKNO: 6167
0 bebaedfe b000000 2000000 0 b000000 17180000 b000000 38ef0300
8 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0
20 0 0 0 0 0 0 0 0
28 0 0 0 0 0 0 0 0
30 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0
40 0 0 0 0 0 0 0 0
48 0 0 0 1000000 af447af9 4a44d930 5a9b0fa0 20d7ba86
50 0 0 0 0 0 0 0 0
58 0 0 0 0 0 0 0 0
60 0 0 0 0 0 0 0 0
68 0 0 0 0 0 0 0 0
70 0 0 0 0 0 0 0 0
78 0 0 0 0 0 0 0 0
BLKNO: 6168
0 b000000 69 81a4494e 10201 63 63 1 0
8 0 20000 4a857400 207b4682 4a859784 21b73460 4a859784 21b73460
10 5f60000 0 0 0 0 0 2000000 0
18 0 0 b00428a0 0 269 780528a0 0 169
20 780528a0 10000000 69 5452414e 3 0 1 780528a0
28 38000000 69 2123b 1 0 0 12c126 0
30 0 0 0 0 96090 0 10 600
38 780528a0 60000000 69 81a4494e 10201 63 63 1
40 0 0 20000 4a857400 207b4682 4a859784 21b73460 4a859784
48 21b73460 5f60000 0 0 0 0 0 2000000
50 0 0 0 780528a0 0 269 400628a0 0
58 169 400628a0 10000000 69 5452414e 3 0 1
60 400628a0 38000000 69 2123b 1 0 0 12c126
68 0 0 0 0 0 96090 0 10
70 600 400628a0 60000000 69 81a4494e 10201 63 63
78 1 0 0 20000 4a857400 207b4682 4a859784 21c676a0
I don't know what's happened here. It may not even be related to the log
recovery failure.
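For readers unfamiliar with the on-disk format: every log record header begins with the magic number 0xfeedbabe (XLOG_HEADER_MAGIC_NUM); data blocks instead have their first word overwritten with the current cycle number. That is why block 6167's dump starts with "bebaedfe" (the magic, printed byte-for-byte) while block 6168 starts with 0xb, the cycle number - matching the "Header 0xb wanted 0xfeedbabe" error above. A toy sketch of that check (a hypothetical helper, not the kernel code):

```python
import struct

XLOG_HEADER_MAGIC = 0xFEEDBABE  # magic at the start of every log record header

def classify_block(block):
    """Return 'header' if the block starts with the log record magic,
    otherwise report the raw first word (which, for a cycle-stamped
    data block, is the cycle number)."""
    (word,) = struct.unpack(">I", block[:4])  # on-disk values are big-endian
    if word == XLOG_HEADER_MAGIC:
        return "header"
    return "cycle %d" % word

# Block 6167 begins with the header magic; block 6168 begins with
# 0x0000000b (cycle 11), which is not a valid header.
hdr = bytes.fromhex("feedbabe") + bytes(508)
data = struct.pack(">I", 0xB) + bytes(508)
```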
>
> Very interesting indeed, thank you /very/ much for looking at this.
>
> > I think the fix made it into 2.6.24.
>
> We're currently using the very latest 2.6.30, unfortunately. We've
> distilled this into a reproducible environment with a stack of NFS +
> XFS to a local disk + automated sysrq 'b' reboots. We're working on
> getting this bundled up into a nice little package as a VirtualBox vm
> for your consumption. Please tell me if this is not desirable.
>
> Thanks very much again.
>
> John Quigley
> jquigley.com
>
end of thread, other threads:[~2009-08-21 7:11 UTC | newest]
Thread overview: 20+ messages
2009-08-13 20:17 XFS corruption with failover John Quigley
2009-08-13 21:17 ` Emmanuel Florac
2009-08-13 22:42 ` Felix Blyakher
2009-08-14 0:52 ` John Quigley
2009-08-14 0:50 ` John Quigley
2009-08-13 21:44 ` Felix Blyakher
2009-08-14 0:31 ` Eric Sandeen
2009-08-14 0:58 ` Lachlan McIlroy
2009-08-14 1:35 ` Eric Sandeen
2009-08-14 1:44 ` John Quigley
2009-08-14 1:06 ` John Quigley
2009-08-14 13:21 ` Felix Blyakher
2009-08-14 0:56 ` John Quigley
[not found] <424153067.1934481250210293891.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-08-14 0:38 ` Lachlan McIlroy
2009-08-14 1:14 ` John Quigley
2009-08-17 18:04 ` John Quigley
[not found] <835473717.1935811250214078456.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-08-14 1:43 ` Lachlan McIlroy
[not found] <990461759.2142271250648177725.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-08-19 2:18 ` Lachlan McIlroy
2009-08-19 15:46 ` John Quigley
[not found] <1194138654.75921250838215929.JavaMail.root@zmail05.collab.prod.int.phx2.redhat.com>
2009-08-21 7:11 ` Lachlan McIlroy