From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Philipp Marek <philipp.marek@linbit.com>
Cc: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] [bug] drbd 9: Receiver error
Date: Fri, 20 Feb 2015 03:12:36 -0600 [thread overview]
Message-ID: <54E6FA84.2070306@suse.de> (raw)
In-Reply-To: <20150220074152.GC12926@cacao.linbit>
Hi Philipp,
On 02/20/2015 01:41 AM, Philipp Marek wrote:
> Hi Goldwyn,
>
>> I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
>> I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
>> get it to compile against the latest (factory) kernels and should be the
>> same for the tumbleweed kernel (3.18.3 based) I am using to test.
>>
>> I am getting the following receiver errors in the kernel log when I try to
>> start the drbd service:
> ...
>> Please let me know if you need any more information to debug this or if I am
>> doing something wrong.
> how many nodes did you connect, which DRBD versions were they running?
Two nodes, traditional way with one local device and no other clustering
software.
> Can you show the configuration, and some more log lines - from all nodes,
> and starting quite a bit earlier?
Here is the one which errs:
[ 175.644532] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[ 175.644535] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12
build by abuild@cloud109, 2015-02-20 08:39:07
[ 175.644536] drbd: registered as block device major 147
[ 175.657174] drbd r0: Starting worker thread (from drbdsetup [1162])
[ 175.797012] sda: unknown partition table
[ 175.798367] block drbd0: disk( Diskless -> Attaching )
[ 175.798404] block drbd0: Maximum number of peer devices = 1
[ 175.798634] drbd r0: Method to ensure write ordering: flush
[ 175.798644] block drbd0: drbd_bm_resize called with capacity == 62912568
[ 175.799879] block drbd0: resync bitmap: bits=7864071 words=122877
pages=240
[ 175.799886] block drbd0: size = 30 GB (31456284 KB)
[ 175.830618] block drbd0: recounting of set bits took additional 0ms
[ 175.830633] block drbd0: Suspended AL updates
[ 175.830642] block drbd0: disk( Attaching -> Inconsistent )
[ 175.830645] block drbd0: attached to current UUID: 0000000000000004
[ 175.843022] drbd r0 tumbleweed1: conn( StandAlone -> Unconnected )
[ 175.843069] drbd r0 tumbleweed1: Starting sender thread (from
drbdsetup [1175])
[ 175.880303] drbd r0 tumbleweed1: Starting receiver thread (from
drbd_w_r0 [1163])
[ 175.880486] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 176.380200] drbd r0 tumbleweed1: Handshake successful: Agreed network
protocol version 110
[ 176.380205] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 176.380218] drbd r0 tumbleweed1: Starting asender thread (from
drbd_r_r0 [1183])
[ 176.392180] drbd r0 tumbleweed1: Preparing remote state change
1900523070 (primary_nodes=0, weak_nodes=0)
[ 176.392697] drbd r0 tumbleweed1: Committing remote state change
1900523070
[ 176.392718] drbd r0 tumbleweed1: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 176.400177] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 176.400183] block drbd0: tumbleweed1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:0
[ 176.400188] block drbd0: tumbleweed1: peer
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:20
[ 176.400191] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 176.400195] block drbd0: tumbleweed1: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 176.421355] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate )
repl( Off -> WFBitMapT )
[ 176.421366] block drbd0: Resumed AL updates
[ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342)
while decoding bm RLE packet
[ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP,
e: -5 l: 7!
[ 176.429763] drbd r0 tumbleweed1: conn( Connected -> ProtocolError )
peer( Secondary -> Unknown )
[ 176.429768] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown )
repl( WFBitMapT -> Off )
[ 176.429805] drbd r0 tumbleweed1: asender terminated
[ 176.429807] drbd r0 tumbleweed1: Terminating asender thread
[ 176.456361] drbd r0 tumbleweed1: Connection closed
[ 176.456382] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 176.456404] drbd r0 tumbleweed1: Restarting receiver thread
[ 176.456413] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 176.972179] drbd r0 tumbleweed1: Handshake successful: Agreed network
protocol version 110
[ 176.972185] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 176.972205] drbd r0 tumbleweed1: Starting asender thread (from
drbd_r_r0 [1183])
[ 176.984192] drbd r0 tumbleweed1: Preparing remote state change
2872485783 (primary_nodes=0, weak_nodes=0)
[ 176.984540] drbd r0 tumbleweed1: Committing remote state change
2872485783
[ 176.984560] drbd r0 tumbleweed1: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 176.992109] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 176.992115] block drbd0: tumbleweed1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:0
[ 176.992120] block drbd0: tumbleweed1: peer
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:20
[ 176.992124] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 176.992127] block drbd0: tumbleweed1: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 177.003945] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate )
repl( Off -> WFBitMapT )
[ 177.012276] block drbd0: tumbleweed1: bitmap overflow (e:8388342)
while decoding bm RLE packet
[ 177.012300] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP,
e: -5 l: 7!
[ 177.012324] drbd r0 tumbleweed1: conn( Connected -> ProtocolError )
peer( Secondary -> Unknown )
[ 177.012328] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown )
repl( WFBitMapT -> Off )
[ 177.012365] drbd r0 tumbleweed1: asender terminated
[ 177.012369] drbd r0 tumbleweed1: Terminating asender thread
[ 177.036176] drbd r0 tumbleweed1: Connection closed
[ 177.036197] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 177.036219] drbd r0 tumbleweed1: Restarting receiver thread
[ 177.036237] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 177.536161] drbd r0 tumbleweed1: Handshake successful: Agreed network
protocol version 110
[ 177.536165] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 177.536177] drbd r0 tumbleweed1: Starting asender thread (from
drbd_r_r0 [1183])
[ 177.544168] drbd r0 tumbleweed1: Preparing remote state change
1844200584 (primary_nodes=0, weak_nodes=0)
[ 177.544569] drbd r0 tumbleweed1: Committing remote state change
1844200584
[ 177.544588] drbd r0 tumbleweed1: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 177.548098] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 177.548106] block drbd0: tumbleweed1: self
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:0
[ 177.548113] block drbd0: tumbleweed1: peer
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:20
[ 177.548118] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 177.548122] block drbd0: tumbleweed1: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 177.561761] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate )
repl( Off -> WFBitMapT )
[ 177.570088] block drbd0: tumbleweed1: bitmap overflow (e:8388342)
while decoding bm RLE packet
[ 177.570109] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP,
e: -5 l: 7!
[the last messages start looping]
And the one which attempts to be primary
[ 148.615292] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[ 148.615297] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12
build by abuild@cloud109, 2015-02-20 08:39:07
[ 148.615299] drbd: registered as block device major 147
[ 148.629013] drbd r0: Starting worker thread (from drbdsetup [1138])
[ 148.746514] sda: unknown partition table
[ 148.747150] block drbd0: disk( Diskless -> Attaching )
[ 148.747161] block drbd0: Maximum number of peer devices = 1
[ 148.747251] drbd r0: Method to ensure write ordering: flush
[ 148.747255] block drbd0: drbd_bm_resize called with capacity == 67106744
[ 148.748075] block drbd0: resync bitmap: bits=8388343 words=131068
pages=256
[ 148.748078] block drbd0: size = 32 GB (33553372 KB)
[ 148.774226] block drbd0: recounting of set bits took additional 0ms
[ 148.774247] block drbd0: Suspended AL updates
[ 148.774262] block drbd0: disk( Attaching -> UpToDate )
[ 148.774270] block drbd0: attached to current UUID: 2077E6DCE2ED8D5E
[ 148.826678] drbd r0 tumbleweed3: conn( StandAlone -> Unconnected )
[ 148.826719] drbd r0 tumbleweed3: Starting sender thread (from
drbdsetup [1151])
[ 148.880835] drbd r0 tumbleweed3: Starting receiver thread (from
drbd_w_r0 [1139])
[ 148.881016] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 156.921255] drbd r0 tumbleweed3: Handshake successful: Agreed network
protocol version 110
[ 156.921260] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 156.921275] drbd r0 tumbleweed3: Starting asender thread (from
drbd_r_r0 [1159])
[ 156.928136] drbd r0: Preparing cluster-wide state change 1900523070
(0->1 499/146)
[ 156.933242] drbd r0: State change 1900523070: primary_nodes=0,
weak_nodes=0
[ 156.933246] drbd r0: Committing cluster-wide state change 1900523070
(4ms)
[ 156.933263] drbd r0 tumbleweed3: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 156.936119] block drbd0: tumbleweed3: drbd_sync_handshake:
[ 156.936125] block drbd0: tumbleweed3: self
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:0
[ 156.936130] block drbd0: tumbleweed3: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:24
[ 156.936133] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[ 156.936136] block drbd0: tumbleweed3: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 156.963632] block drbd0: tumbleweed3: pdsk( DUnknown -> Inconsistent
) repl( Off -> WFBitMapS )
[ 156.963644] block drbd0: Resumed AL updates
[ 156.966625] block drbd0: tumbleweed3: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 156.985166] drbd r0 tumbleweed3: sock was shut down by peer
[ 156.985186] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe )
peer( Secondary -> Unknown )
[ 156.985191] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[ 156.985215] drbd r0 tumbleweed3: short read (expected size 16)
[ 156.985238] drbd r0 tumbleweed3: asender terminated
[ 156.985241] drbd r0 tumbleweed3: Terminating asender thread
[ 157.012359] drbd r0 tumbleweed3: Connection closed
[ 157.012380] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[ 157.012402] drbd r0 tumbleweed3: Restarting receiver thread
[ 157.012420] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 157.513134] drbd r0 tumbleweed3: Handshake successful: Agreed network
protocol version 110
[ 157.513138] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 157.513153] drbd r0 tumbleweed3: Starting asender thread (from
drbd_r_r0 [1159])
[ 157.520104] drbd r0: Preparing cluster-wide state change 2872485783
(0->1 499/146)
[ 157.525159] drbd r0: State change 2872485783: primary_nodes=0,
weak_nodes=0
[ 157.525162] drbd r0: Committing cluster-wide state change 2872485783
(4ms)
[ 157.525179] drbd r0 tumbleweed3: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
[ 157.528106] block drbd0: tumbleweed3: drbd_sync_handshake:
[ 157.528111] block drbd0: tumbleweed3: self
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000
bits:8388343 flags:0
[ 157.528115] block drbd0: tumbleweed3: peer
0000000000000004:0000000000000000:0000000000000000:0000000000000000
bits:7864071 flags:24
[ 157.528118] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[ 157.528121] block drbd0: tumbleweed3: Writing the whole bitmap, full
sync required after drbd_sync_handshake.
[ 157.546946] block drbd0: tumbleweed3: repl( Off -> WFBitMapS )
[ 157.548668] block drbd0: tumbleweed3: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 157.561288] drbd r0 tumbleweed3: sock was shut down by peer
[ 157.561306] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe )
peer( Secondary -> Unknown )
[ 157.561311] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[ 157.561335] drbd r0 tumbleweed3: short read (expected size 16)
[ 157.561358] drbd r0 tumbleweed3: asender terminated
[ 157.561360] drbd r0 tumbleweed3: Terminating asender thread
[ 157.576230] drbd r0 tumbleweed3: Connection closed
[ 157.576251] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[ 157.576273] drbd r0 tumbleweed3: Restarting receiver thread
[ 157.576290] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 158.077114] drbd r0 tumbleweed3: Handshake successful: Agreed network
protocol version 110
[ 158.077119] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 158.077152] drbd r0 tumbleweed3: Starting asender thread (from
drbd_r_r0 [1159])
[ 158.084157] drbd r0: Preparing cluster-wide state change 1844200584
(0->1 499/146)
[ 158.085191] drbd r0: State change 1844200584: primary_nodes=0,
weak_nodes=0
[ 158.085196] drbd r0: Committing cluster-wide state change 1844200584
(0ms)
[ 158.085214] drbd r0 tumbleweed3: conn( Connecting -> Connected )
peer( Unknown -> Secondary )
And the configs:
tumbleweed1:~ # cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";
I haven't turned on any optins in global_common.conf
tumbleweed1:~ # cat /etc/drbd.d/r0.res
resource r0 {
device /dev/drbd_r0 minor 0;
disk /dev/sda;
meta-disk internal;
on tumbleweed1 {
address 192.168.1.111:7788;
}
on tumbleweed3 {
address 192.168.1.113:7788;
}
syncer {
rate 7M;
}
}
> And then the DRBD 9 git HEAD has already moved again...
>
Yes, I rebuilt against the latest one and the error still occurs.
BTW, the latest git still needs to fix the f_dentry in
drbd/drbd_debugfs.c not covered in 7b00306c929c389bf030463466cb62e65bc415ec
IOW, there is still reference to f_dentry is drbd/drbd_debugfs.c
--
Goldwyn
next prev parent reply other threads:[~2015-02-20 9:14 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-20 5:40 [Drbd-dev] [bug] drbd 9: Receiver error Goldwyn Rodrigues
2015-02-20 7:41 ` Philipp Marek
2015-02-20 9:12 ` Goldwyn Rodrigues [this message]
2015-02-20 9:26 ` Philipp Marek
2015-02-20 14:18 ` Lars Ellenberg
2015-02-23 17:18 ` Goldwyn Rodrigues
2015-02-26 20:12 ` Lars Ellenberg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=54E6FA84.2070306@suse.de \
--to=rgoldwyn@suse.de \
--cc=drbd-dev@lists.linbit.com \
--cc=philipp.marek@linbit.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.