public inbox for linux-kernel@vger.kernel.org
* OCFS2 Filesystem inconsistency across nodes
@ 2006-02-10  5:36 Claudio Martins
  2006-02-10  6:46 ` Mark Fasheh
       [not found] ` <20060210054958.GG4755@ca-server1.us.oracle.com>
  0 siblings, 2 replies; 14+ messages in thread
From: Claudio Martins @ 2006-02-10  5:36 UTC (permalink / raw)
  To: linux-kernel; +Cc: Mark Fasheh, ocfs2-devel


 (I'm posting this to lkml since ocfs2-users@oss.oracle.com doesn't seem to be 
accepting new subscription requests)

 Hi all

 I'm testing OCFS2 on a 3 node cluster (2 nodes are Dual Xeon 512MB of RAM, 
the third one is a Dual AMD Athlon with 1GB of RAM) with gigabit ethernet 
interconnect and using an iSCSI target box for shared storage. 
 I'm using kernel version 2.6.16-rc2-git3 (gcc 4.0.3 from Debian) and compiled 
the iSCSI modules from the latest open-iscsi tree.
 Ocfs2-tools is version 1.1.5 from Debian distro.
 I followed the procedures for cluster configuration from the "OCFS2 User's 
Guide", brought the cluster online, and formatted a 2.3TB shared volume with

 mkfs.ocfs2 -b 4K -C 64K -N 4 -L OCFSTest1 /dev/sda

 So after mounting this shared volume on all three nodes I played with it by 
creating/deleting/writing to files on different nodes. And this is the part 
where the fun begins:

On node1 I can do:

#mkdir dir1
#ls -l
total 4
drwxr-xr-x  2 ctpm ctpm 4096 2006-02-10 04:30 dir1

On node2 I do:

#ls -l
total 4
drwxr-xr-x 2 ctpm ctpm 4096 Feb 10 04:30 dir1

On node3 I do:
#ls -l
total 0

 Whooops!... now that directory should have appeared on all three nodes. It 
doesn't; not even if I wait half an hour. 
 I can reproduce the above behavior by touching or writing to files instead of 
directories. It seems to be random: sometimes it works and the file 
appears on all three nodes, and sometimes only on 1 or 2 of them. 
 Node order doesn't seem to matter, so I don't think it's a problem w/ the 
configuration on one of the nodes. 
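The random visibility above can be sketched as a small poll loop. This is a hypothetical reproduction script, not part of the setup described here: the node names, the mount point, and the use of passwordless ssh are all my assumptions.

```shell
#!/bin/sh
# Hypothetical reproduction sketch: create a file on one node, then poll
# the other nodes to see whether (and when) it becomes visible.
# Node names, MNT, and passwordless ssh are assumptions.

MNT=${MNT:-/mnt/ocfs2}

# wait_for_file "RUNNER" FILE [TRIES]: poll once per second via RUNNER
# (a command prefix such as "ssh node2") until FILE exists or TRIES runs out.
wait_for_file() {
    runner=$1; file=$2; tries=${3:-30}
    while [ "$tries" -gt 0 ]; do
        if $runner test -e "$file"; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

main() {
    probe="$MNT/probe.$$"
    ssh node1 "touch $probe"
    for n in node2 node3; do
        if wait_for_file "ssh $n" "$probe"; then
            echo "$n: visible"
        else
            echo "$n: still missing after 30s"
        fi
    done
}

# Only touch the cluster when invoked with an explicit "run" argument.
[ "${1-}" = "run" ] && main || true
```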
 Another example is to create a file, write something to it and then try to 
read it from another node (when it happens to be visible at all) with cat 
which results in

cat: file1: Input/output error

and lots of this on dmesg output:

(1925,0):ocfs2_extent_map_lookup_read:362 ERROR: status = -3
(1925,0):ocfs2_extent_map_get_blocks:818 ERROR: status = -3
(1925,0):ocfs2_get_block:166 ERROR: Error -3 from get_blocks(0xd6a7d4bc, 
436976, 1, 0, NULL)
(1925,0):ocfs2_extent_map_lookup_read:362 ERROR: status = -3
(1925,0):ocfs2_extent_map_get_blocks:818 ERROR: status = -3
(1925,0):ocfs2_get_block:166 ERROR: Error -3 from get_blocks(0xd6a7d4bc, 
436977, 1, 0, NULL)
(1925,0):ocfs2_extent_map_lookup_read:362 ERROR: status = -3
(1925,0):ocfs2_extent_map_get_blocks:818 ERROR: status = -3


 At first I thought this might be caused by the metadata info being propagated 
to the other nodes while the caches were not being flushed to disk on the node 
that wrote to a file. So I tested this by copying ~2GB sized files to try to 
cause some memory pressure, yet with the same kind of disappointing results.
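That flush test can be sketched roughly like this; DIR defaults to a scratch directory here, but on the cluster it would be the OCFS2 mount point, and the checksum would be compared from another node (the size and paths are placeholders of mine):

```shell
#!/bin/sh
# Sketch of the writeback test: write a file, force a flush with sync,
# and record a checksum to compare against what the other nodes read.
# DIR is a placeholder; point it at the OCFS2 mount on the writing node.
DIR=${DIR:-$(mktemp -d)}

dd if=/dev/zero of="$DIR/big.bin" bs=1M count=4 2>/dev/null
sync                        # flush dirty pages and metadata on this node
md5sum "$DIR/big.bin"       # compare this output with md5sum on another node
```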

 Yet another interesting thing happens when I create one big file. The file is 
shown in the directory listing on the node that wrote it. The other ones 
don't see it. I unmount the filesystem from the nodes one by one, starting with 
the one which wrote this file, and then the others. I then remount the 
filesystem on all of them and the file I created is gone. So not even the flush 
caused by unmounting ensures that a file gets to persistent storage.

 Note that I'm using regular stuff to write the files, like copy, cat, echo, 
and shell redirects. No fancy mmap stuff or test programs.

 I also tested OCFS2 with a smaller volume over NBD from a remote machine with 
exactly the same kind of behavior, so I don't think this is related to the 
iSCSI target volume, the open-iscsi modules, or the disk box (on a side 
note, this box from Promise worked flawlessly with XFS and the same open-iscsi 
modules on any machine).

 I'd like to know if anyone on the list has had the opportunity of testing 
OCFS2 or had similar problems. OTOH, if I'm wrongly assuming something about 
OCFS2 which I shouldn't be, please tell me and I'll apologise for wasting 
your time ;-)
 I googled for this subject and found surprisingly little info about real world 
OCFS2 testing. Only lots of happy guys because of it being merged for 
2.6.16 ;-)

 I'm willing to make any tests or apply any patches you want. I'll be trying 
to keep the machines and the disk box for as many days as possible, so please 
try to bug me if you think these are real bugs and you want me to test fixes 
before 2.6.16 comes out.
 If you need kernel .config or any other info please ask.

Thanks

Best regards

Claudio Martins


* Re: OCFS2 Filesystem inconsistency across nodes
@ 2006-02-13 15:08 Nohez
  0 siblings, 0 replies; 14+ messages in thread
From: Nohez @ 2006-02-13 15:08 UTC (permalink / raw)
  To: linux-kernel


Hello,

We are testing OCFS2 on a 2 node cluster.  Both nodes are Sun v40z
machines with OpenSuSE 10.1 Beta3 (x86_64) installed (kernel-smp-2.6.16_rc2_git5-3).
The OCFS2 version in the OpenSuSE kernel is 1.1.7-SLES.  OCFS2 was fencing
the nodes with the default heartbeat threshold, so the threshold is now
configured to 31 on both nodes (O2CB_HEARTBEAT_THRESHOLD=31).
Using a Sun StorEdge 6920 for shared storage, with multipath also set up
on both nodes. Formatted a 200GB shared volume with the following
command:

    mkfs.ocfs2 -b 4K -C 32K -N 4 -L Raid5 /dev/mapper/3600015d00004b200000000000000050e-part1
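For reference, the O2CB_HEARTBEAT_THRESHOLD mentioned above lives in the o2cb sysconfig file; the path below is typical for SUSE-based installs and is an assumption on my part, as are the other values shown besides the threshold:

```shell
# /etc/sysconfig/o2cb (path typical for SUSE; an assumption here)
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
# Number of missed disk heartbeats before a node is considered dead and
# fenced; raised from the default to 31 as described above.
O2CB_HEARTBEAT_THRESHOLD=31
```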

Performed the following 2 tests:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Test1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mounted the formatted partition on both the nodes.
    sun5:~ # mount /dev/mapper/3600015d00004b200000000000000050e-part1 /raid5 -t ocfs2
    sun4:~ # mount /dev/mapper/3600015d00004b200000000000000050e-part1 /raid5 -t ocfs2

Created a tar file on Sun5.
    sun5:~ # tar cvf /raid5/test.tar /etc

File viewable on both the nodes.
    sun5:~ # ls -l /raid5/test.tar
    -rw-r--r-- 1 root root 21749760 Feb 13 19:57 /raid5/test.tar
    sun4:~ # ls -l /raid5/test.tar
    -rw-r--r-- 1 root root 21749760 Feb 13 19:57 /raid5/test.tar

Created a file on sun4 using touch. 
    sun4:~ # touch /raid5/abcd
    touch: cannot touch `/raid5/abcd': Input/output error

OCFS2 marks the filesystem read only on sun4. Relevant messages from
/var/log/messages of sun4:

<----------------------------------------------------------->
<Sun4 /var/log/messages>
<----------------------------------------------------------->
Feb 13 19:56:18 sun4 klogd: (4200,0):o2net_set_nn_state:418 connected to node sun5 (num 0) at 196.1.1.240:7777
Feb 13 19:56:22 sun4 klogd: OCFS2 1.1.7-SLES Mon Jan 16 11:58:10 PST 2006 (build sles)
Feb 13 19:56:22 sun4 klogd: (4784,3):ocfs2_initialize_super:1379 max_slots for this device: 4
Feb 13 19:56:22 sun4 klogd: (4784,3):ocfs2_fill_local_node_info:1044 I am node 2
Feb 13 19:56:22 sun4 klogd: (4784,3):__dlm_print_nodes:384 Nodes in my domain ("2FC845E5792045B198E42561E9A0C405"):
Feb 13 19:56:22 sun4 klogd: (4784,3):__dlm_print_nodes:388  node 0
Feb 13 19:56:22 sun4 klogd: (4784,3):__dlm_print_nodes:388  node 2
Feb 13 19:56:22 sun4 klogd: (4784,3):ocfs2_find_slot:267 taking node slot 1
Feb 13 19:56:22 sun4 klogd: kjournald starting.  Commit interval 5 seconds
Feb 13 19:56:22 sun4 klogd: ocfs2: Mounting device (253,4) on (node 2, slot 1)
Feb 13 19:58:30 sun4 klogd: OCFS2: ERROR (device dm-4): ocfs2_search_chain: Group Descriptor # 4485090715960753726 has bad signature >>>>>>>
Feb 13 19:58:30 sun4 klogd: File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted.
Feb 13 19:58:30 sun4 klogd: (4890,0):ocfs2_claim_suballoc_bits:1157 ERROR: status = -5
Feb 13 19:58:30 sun4 klogd: (4890,0):ocfs2_claim_new_inode:1255 ERROR: status = -5
Feb 13 19:58:30 sun4 klogd: (4890,0):ocfs2_mknod_locked:479 ERROR: status = -5
Feb 13 19:58:30 sun4 klogd: (4890,0):ocfs2_mknod:384 ERROR: status = -5
<----------------------------------------------------------->

<----------------------------------------------------------->
<Sun5 /var/log/messages>
<----------------------------------------------------------->
Feb 13 19:56:03 sun5 klogd: OCFS2 1.1.7-SLES Mon Jan 16 11:58:10 PST 2006 (build sles)
Feb 13 19:56:03 sun5 klogd: (4914,2):ocfs2_initialize_super:1379 max_slots for this device: 4
Feb 13 19:56:03 sun5 klogd: (4914,2):ocfs2_fill_local_node_info:1044 I am node 0
Feb 13 19:56:03 sun5 klogd: (4914,2):__dlm_print_nodes:384 Nodes in my domain ("2FC845E5792045B198E42561E9A0C405"):
Feb 13 19:56:03 sun5 klogd: (4914,2):__dlm_print_nodes:388  node 0
Feb 13 19:56:03 sun5 klogd: (4914,2):ocfs2_find_slot:267 taking node slot 0
Feb 13 19:56:03 sun5 klogd: kjournald starting.  Commit interval 5 seconds
Feb 13 19:56:03 sun5 klogd: ocfs2: Mounting device (253,5) on (node 0, slot 0)
Feb 13 19:56:18 sun5 klogd: (4198,0):o2net_set_nn_state:418 accepted connection from node sun4 (num 2) at 196.1.1.229:7777
Feb 13 19:56:22 sun5 klogd: (4198,0):__dlm_print_nodes:384 Nodes in my domain ("2FC845E5792045B198E42561E9A0C405"):
Feb 13 19:56:22 sun5 klogd: (4198,0):__dlm_print_nodes:388  node 0
Feb 13 19:56:22 sun5 klogd: (4198,0):__dlm_print_nodes:388  node 2
<----------------------------------------------------------->
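The kernel message above asks for fsck.ocfs2 once the filesystem is unmounted; a sketch of what that would look like, guarded so it only acts when the device is actually present (the mount point and device path are the ones from this report):

```shell
#!/bin/sh
# Sketch: after unmounting the volume on *every* node, run fsck.ocfs2 as
# the kernel message requests. DEV is this cluster's multipath device.
DEV=${DEV:-/dev/mapper/3600015d00004b200000000000000050e-part1}

if [ -b "$DEV" ]; then
    umount /raid5 2>/dev/null
    fsck.ocfs2 -f "$DEV"    # -f forces a full check even if marked clean
else
    echo "device $DEV not present; run this on a cluster node"
fi
```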




~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Test 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Mounted the formatted partition on both the nodes.
This time mounted it first on Sun4 and then on Sun5
    sun4:~ # mount /dev/mapper/3600015d00004b200000000000000050e-part1 /raid5 -t ocfs2
    sun5:~ # mount /dev/mapper/3600015d00004b200000000000000050e-part1 /raid5 -t ocfs2

Created 2 files:
    sun4:~ # touch /raid5/abcd
    sun4:~ # touch /raid5/abcd1

Files viewable on both the nodes.
    sun4:~ # ls -l /raid5/abcd*
    -rw-r--r-- 1 root root 0 Feb 13 20:20 /raid5/abcd
    -rw-r--r-- 1 root root 0 Feb 13 20:20 /raid5/abcd1

    sun5:~ # ls -l /raid5/abcd*
    -rw-r--r-- 1 root root 0 Feb 13 20:20 /raid5/abcd
    -rw-r--r-- 1 root root 0 Feb 13 20:20 /raid5/abcd1

Issued "rm" command on Sun5.
    sun5:~ # rm /raid5/abcd
    sun5:~ # rm /raid5/abcd1

    sun5:~ # ls -l /raid5/abcd*
    /bin/ls: /raid5/abcd*: No such file or directory

This time there is no error message on Sun5. However, the files still exist on Sun4:

    sun4:~ # ls -l /raid5/abcd*
    -rw-r--r-- 1 root root 0 Feb 13 20:20 /raid5/abcd
    -rw-r--r-- 1 root root 0 Feb 13 20:20 /raid5/abcd1

Sun4 kernel reports the following:

<----------------------------------------------------------->
<Sun4 /var/log/messages>
<----------------------------------------------------------->
Feb 13 20:20:18 sun4 klogd: OCFS2 1.1.7-SLES Mon Jan 16 11:58:10 PST 2006 (build sles)
Feb 13 20:20:18 sun4 klogd: (4774,0):ocfs2_initialize_super:1379 max_slots for this device: 4
Feb 13 20:20:18 sun4 klogd: (4774,0):ocfs2_fill_local_node_info:1044 I am node 2
Feb 13 20:20:18 sun4 klogd: (4774,0):__dlm_print_nodes:384 Nodes in my domain ("2FC845E5792045B198E42561E9A0C405"):
Feb 13 20:20:18 sun4 klogd: (4774,0):__dlm_print_nodes:388  node 2
Feb 13 20:20:18 sun4 klogd: (4774,0):ocfs2_find_slot:267 taking node slot 0
Feb 13 20:20:18 sun4 klogd: kjournald starting.  Commit interval 5 seconds
Feb 13 20:20:18 sun4 klogd: ocfs2: Mounting device (253,3) on (node 2, slot 0)
Feb 13 20:20:28 sun4 klogd: (4235,0):o2net_set_nn_state:418 connected to node sun5 (num 0) at 196.1.1.240:7777
Feb 13 20:20:32 sun4 klogd: (4235,0):__dlm_print_nodes:384 Nodes in my domain ("2FC845E5792045B198E42561E9A0C405"):
Feb 13 20:20:32 sun4 klogd: (4235,0):__dlm_print_nodes:388  node 0
Feb 13 20:20:32 sun4 klogd: (4235,0):__dlm_print_nodes:388  node 2
<----------------------------------------------------------->

<----------------------------------------------------------->
<Sun5 /var/log/messages>
<----------------------------------------------------------->
Feb 13 20:20:28 sun5 klogd: (4310,0):o2net_set_nn_state:418 accepted connection from node sun4 (num 2) at 196.1.1.229:7777
Feb 13 20:20:32 sun5 klogd: OCFS2 1.1.7-SLES Mon Jan 16 11:58:10 PST 2006 (build sles)
Feb 13 20:20:32 sun5 klogd: (5284,0):ocfs2_initialize_super:1379 max_slots for this device: 4
Feb 13 20:20:32 sun5 klogd: (5284,2):ocfs2_fill_local_node_info:1044 I am node 0
Feb 13 20:20:32 sun5 klogd: (5284,2):__dlm_print_nodes:384 Nodes in my domain ("2FC845E5792045B198E42561E9A0C405"):
Feb 13 20:20:32 sun5 klogd: (5284,2):__dlm_print_nodes:388  node 0
Feb 13 20:20:32 sun5 klogd: (5284,2):__dlm_print_nodes:388  node 2
Feb 13 20:20:32 sun5 klogd: (5284,2):ocfs2_find_slot:267 taking node slot 1
Feb 13 20:20:32 sun5 klogd: kjournald starting.  Commit interval 5 seconds
Feb 13 20:20:32 sun5 klogd: ocfs2: Mounting device (253,5) on (node 0, slot 1)
<----------------------------------------------------------->

Filesystem is in read-write mode on both the nodes.

Unmounted filesystem on Sun4 and remounted. 
    sun4:~ # umount /raid5
    sun4:~ # mount /dev/mapper/3600015d00004b200000000000000050e-part1 /raid5 -t ocfs2
    sun4:~ # ls -l /raid5/abcd*
    /bin/ls: /raid5/abcd*: No such file or directory

Now the filesystems are in sync on both the nodes. 

~~~~~~~~~~~~~~~~~~~~~~~
/etc/cluster/ocfs2.conf (conf file same on both the servers,
                         sun1 mentioned in conf file is not powered on)
~~~~~~~~~~~~~~~~~~~~~~~
node:
        ip_port = 7777
        ip_address = 196.1.1.240
        number = 0
        name = sun5
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 196.1.1.118
        number = 1
        name = sun1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 196.1.1.229
        number = 2
        name = sun4
        cluster = ocfs2

cluster:
        node_count = 3
        name = ocfs2
~~~~~~~~~~~~~~~~~~~~~~~

Regards

Nohez


Thread overview: 14+ messages
-- links below jump to the message on this page --
2006-02-10  5:36 OCFS2 Filesystem inconsistency across nodes Claudio Martins
2006-02-10  6:46 ` Mark Fasheh
2006-02-10  7:20   ` Claudio Martins
2006-02-11  5:40   ` Claudio Martins
2006-02-13 22:26     ` Mark Fasheh
2006-02-14  0:00       ` Claudio Martins
2006-02-14  6:16       ` Claudio Martins
2006-02-14 20:19         ` Mark Fasheh
2006-02-15  3:17           ` Claudio Martins
2006-02-15 17:50           ` Nohez
2006-02-15 21:42             ` Claudio Martins
2006-02-14 12:17       ` Jan Kara
     [not found] ` <20060210054958.GG4755@ca-server1.us.oracle.com>
2006-02-10  7:11   ` Claudio Martins
  -- strict thread matches above, loose matches on Subject: below --
2006-02-13 15:08 Nohez
