From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Message-ID: <42766EF6.4090707@steeleye.com>
Date: Mon, 02 May 2005 14:18:30 -0400
From: Paul Clements
MIME-Version: 1.0
To: drbd-dev@linbit.com, drbd-user@linbit.com
References: <4F6A5F5D1AA48F4584FF34E6F57D0518018A85@steelpo1.steeleye.com> <42765DEB.6040109@steeleye.com>
In-Reply-To: <42765DEB.6040109@steeleye.com>
Content-Type: multipart/mixed; boundary="------------020001010600000706030604"
Subject: [Drbd-dev] BUG: DRBD on Power PC 64-bit with RedHat EL 3 (2.4.21 kernel) does not work
List-Id: Coordination of development
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,

This is a multi-part message in MIME format.
--------------020001010600000706030604
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

We've recently been trying to certify DRBD (we've tried both 0.7.5 and
0.7.10 with the same results) on ppc64 with RHEL3. Unfortunately, we
have run into two fairly serious issues:

1) we had to hack up the source just to get it to build:

The basic problem is that the adjust_drbd_config_h.sh script is not
doing the right thing for RHEL3 on ppc64. RHEL3 has a find_next_bit()
function, and on most architectures it's an inline function. However,
on ppc64 it's not inline and it's not exported, which means drbd (being
a module) can't use it. So we have to actually disable the
HAVE_FIND_NEXT_BIT setting in drbd_config.h. Also, there is no
arch-specific find_next_bit function for ppc64 in drbd_compat_types.h,
so we have to use the generic find_next_bit function that's in that
file (by defining USE_GENERIC_FIND_NEXT_BIT in drbd_config.h). Of
course, when this function is defined, it conflicts with the previous
find_next_bit function declaration from the kernel headers
(asm-ppc64/bitops.h).
So, we had to rename the generic function to generic_find_next_bit and
change all calls in the drbd source (just one in drbd_bitmap.c) to use
generic_find_next_bit instead of find_next_bit.

2) additionally, the driver appears to start up fine on both machines,
but when the resync begins, it quickly stalls and never makes any
progress

Included are the DRBD config and other system information. Please let
me know if you need any further information.

Thanks,
Paul

--------

After starting drbd on the source system (already running on the
target), this happens:

[root@trumpkin root]# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@tumnus, 2005-04-29 10:13:28
 0: cs:SyncSource st:Secondary/Secondary ld:Consistent
    ns:128 nr:0 dw:0 dr:128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
    [>...................] sync'ed: 0.9% (978816/978944)K
    finish: 2:43:08 speed: 32 (32) K/sec
[root@trumpkin root]# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@tumnus, 2005-04-29 10:13:28
 0: cs:SyncSource st:Secondary/Secondary ld:Consistent
    ns:128 nr:0 dw:0 dr:128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
    [>...................] sync'ed: 0.9% (978816/978944)K
    finish: 7:28:37 speed: 8 (8) K/sec
[root@trumpkin root]# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@tumnus, 2005-04-29 10:13:28
 0: cs:SyncSource st:Secondary/Secondary ld:Consistent
    ns:128 nr:0 dw:0 dr:128 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
    [>...................] sync'ed: 0.9% (978816/978944)K
    finish: 8:09:24 speed: 8 (8) K/sec

[root@trumpkin root]# uname -a
Linux trumpkin 2.4.21-27.EL #1 SMP Wed Dec 1 21:53:20 EST 2004 ppc64 ppc64 ppc64 GNU/Linux

--------

On the target, this is reported:

[root@tumnus root]# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@tumnus, 2005-04-29 10:13:28
 0: cs:SyncTarget st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:128 dw:128 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
    [>...................]
    sync'ed: 0.9% (978816/978944)K
    finish: 106:02:18 speed: 0 (0) K/sec
[root@tumnus root]# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@tumnus, 2005-04-29 10:13:28
 0: cs:SyncTarget st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:128 dw:128 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
    [>...................] sync'ed: 0.9% (978816/978944)K
    finish: 117:35:37 speed: 0 (0) K/sec
[root@tumnus root]# cat /proc/drbd
version: 0.7.5 (api:76/proto:74)
SVN Revision: 1578 build by root@tumnus, 2005-04-29 10:13:28
 0: cs:SyncTarget st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:128 dw:128 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
    [>...................] sync'ed: 0.9% (978816/978944)K
    finish: 118:16:24 speed: 0 (0) K/sec

[root@tumnus root]# uname -a
Linux tumnus 2.4.21-27.EL #1 SMP Wed Dec 1 21:53:20 EST 2004 ppc64 ppc64 ppc64 GNU/Linux

--------------020001010600000706030604
Content-Type: text/plain; name="drbd.conf"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="drbd.conf"

#
# drbd.conf example
#

resource RHEL0 {

  # transfer protocol to use.
  # C: write IO is reported as completed, if we know it has
  #    reached _both_ local and remote DISK.
  #    * for critical transactional data.
  # B: write IO is reported as completed, if it has reached
  #    local DISK and remote buffer cache.
  #    * for most cases.
  # A: write IO is reported as completed, if it has reached
  #    local DISK and local tcp send buffer. (see also sndbuf-size)
  #    * for high latency networks
  #
  #**********
  # uhm, benchmarks have shown that C is actually better than B.
  # this note shall disappear, when we are convinced that B is
  # the right choice "for most cases".
  # Until then, always use C unless you have a reason not to.
  # --lge
  #**********
  #
  protocol C;

  # what should be done in case the cluster starts up in
  # degraded mode, but knows it has inconsistent data.
  #
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";

  startup {
    # Wait for connection timeout.
    # The init script blocks the boot process until the resources
    # are connected. This is so when the cluster manager starts later,
    # it does not see a resource with internal split-brain.
    # In case you want to limit the wait time, do it here.
    # Default is 0, which means unlimited. Unit is seconds.
    #
    wfc-timeout 30;

    # Wait for connection timeout if this node was a degraded cluster.
    # In case a degraded cluster (= cluster with only one node left)
    # is rebooted, this timeout value is used.
    #
    degr-wfc-timeout 120;    # 2 minutes.
  }

  syncer {
    # Limit the bandwidth used by the resynchronisation process.
    # default unit is KB/sec; optional suffixes K,M,G are allowed
    #
    rate 100M;
  }

  on trumpkin {
    device     /dev/drbd0;
    disk       /dev/sdb2;
    address    172.17.100.211:60003;
    meta-disk  /dev/sdb7[0];

    # meta-disk is either 'internal' or '/dev/ice/name [idx]'
    #
    # You can use a single block device to store meta-data
    # of multiple DRBD's.
    # E.g. use meta-disk /dev/hde6[0]; and meta-disk /dev/hde6[1];
    # for two different resources. In this case the meta-disk
    # would need to be at least 256 MB in size.
    #
    # 'internal' means, that the last 128 MB of the lower device
    # are used to store the meta-data.
    # You must not give an index with 'internal'.
  }

  on tumnus {
    device     /dev/drbd0;
    disk       /dev/sdb2;
    address    172.17.100.210:60003;
    meta-disk  /dev/sdb7[0];
  }
}

--------------020001010600000706030604--