All of lore.kernel.org
 help / color / mirror / Atom feed
From: Goldwyn Rodrigues <rgoldwyn@suse.de>
To: Philipp Marek <philipp.marek@linbit.com>
Cc: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] [bug] drbd 9: Receiver error
Date: Fri, 20 Feb 2015 03:12:36 -0600	[thread overview]
Message-ID: <54E6FA84.2070306@suse.de> (raw)
In-Reply-To: <20150220074152.GC12926@cacao.linbit>

Hi Philipp,


On 02/20/2015 01:41 AM, Philipp Marek wrote:
> Hi Goldwyn,
>
>> I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
>> I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
>> get it to compile against the latest (factory) kernels and should be the
>> same for the tumbleweed kernel (3.18.3 based) I am using to test.
>>
>> I am getting the following receiver errors in the kernel log when I try to
>> start the drbd service:
> ...
>> Please let me know if you need any more information to debug this or if I am
>> doing something wrong.
> how many nodes did you connect, which DRBD versions were they running?

Two nodes, traditional way with one local device and no other clustering 
software.

> Can you show the configuration, and some more log lines - from all nodes,
> and starting quite a bit earlier?

Here is the one which errs:

[  175.644532] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[  175.644535] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12 
build by abuild@cloud109, 2015-02-20 08:39:07
[  175.644536] drbd: registered as block device major 147
[  175.657174] drbd r0: Starting worker thread (from drbdsetup [1162])
[  175.797012]  sda: unknown partition table
[  175.798367] block drbd0: disk( Diskless -> Attaching )
[  175.798404] block drbd0: Maximum number of peer devices = 1
[  175.798634] drbd r0: Method to ensure write ordering: flush
[  175.798644] block drbd0: drbd_bm_resize called with capacity == 62912568
[  175.799879] block drbd0: resync bitmap: bits=7864071 words=122877 
pages=240
[  175.799886] block drbd0: size = 30 GB (31456284 KB)
[  175.830618] block drbd0: recounting of set bits took additional 0ms
[  175.830633] block drbd0: Suspended AL updates
[  175.830642] block drbd0: disk( Attaching -> Inconsistent )
[  175.830645] block drbd0: attached to current UUID: 0000000000000004
[  175.843022] drbd r0 tumbleweed1: conn( StandAlone -> Unconnected )
[  175.843069] drbd r0 tumbleweed1: Starting sender thread (from 
drbdsetup [1175])
[  175.880303] drbd r0 tumbleweed1: Starting receiver thread (from 
drbd_w_r0 [1163])
[  175.880486] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[  176.380200] drbd r0 tumbleweed1: Handshake successful: Agreed network 
protocol version 110
[  176.380205] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[  176.380218] drbd r0 tumbleweed1: Starting asender thread (from 
drbd_r_r0 [1183])
[  176.392180] drbd r0 tumbleweed1: Preparing remote state change 
1900523070 (primary_nodes=0, weak_nodes=0)
[  176.392697] drbd r0 tumbleweed1: Committing remote state change 
1900523070
[  176.392718] drbd r0 tumbleweed1: conn( Connecting -> Connected ) 
peer( Unknown -> Secondary )
[  176.400177] block drbd0: tumbleweed1: drbd_sync_handshake:
[  176.400183] block drbd0: tumbleweed1: self 
0000000000000004:0000000000000000:0000000000000000:0000000000000000 
bits:7864071 flags:0
[  176.400188] block drbd0: tumbleweed1: peer 
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 
bits:8388343 flags:20
[  176.400191] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[  176.400195] block drbd0: tumbleweed1: Writing the whole bitmap, full 
sync required after drbd_sync_handshake.
[  176.421355] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) 
repl( Off -> WFBitMapT )
[  176.421366] block drbd0: Resumed AL updates
[  176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342) 
while decoding bm RLE packet
[  176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, 
e: -5 l: 7!
[  176.429763] drbd r0 tumbleweed1: conn( Connected -> ProtocolError ) 
peer( Secondary -> Unknown )
[  176.429768] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown ) 
repl( WFBitMapT -> Off )
[  176.429805] drbd r0 tumbleweed1: asender terminated
[  176.429807] drbd r0 tumbleweed1: Terminating asender thread
[  176.456361] drbd r0 tumbleweed1: Connection closed
[  176.456382] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[  176.456404] drbd r0 tumbleweed1: Restarting receiver thread
[  176.456413] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[  176.972179] drbd r0 tumbleweed1: Handshake successful: Agreed network 
protocol version 110
[  176.972185] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[  176.972205] drbd r0 tumbleweed1: Starting asender thread (from 
drbd_r_r0 [1183])
[  176.984192] drbd r0 tumbleweed1: Preparing remote state change 
2872485783 (primary_nodes=0, weak_nodes=0)
[  176.984540] drbd r0 tumbleweed1: Committing remote state change 
2872485783
[  176.984560] drbd r0 tumbleweed1: conn( Connecting -> Connected ) 
peer( Unknown -> Secondary )
[  176.992109] block drbd0: tumbleweed1: drbd_sync_handshake:
[  176.992115] block drbd0: tumbleweed1: self 
0000000000000004:0000000000000000:0000000000000000:0000000000000000 
bits:7864071 flags:0
[  176.992120] block drbd0: tumbleweed1: peer 
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 
bits:8388343 flags:20
[  176.992124] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[  176.992127] block drbd0: tumbleweed1: Writing the whole bitmap, full 
sync required after drbd_sync_handshake.
[  177.003945] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) 
repl( Off -> WFBitMapT )
[  177.012276] block drbd0: tumbleweed1: bitmap overflow (e:8388342) 
while decoding bm RLE packet
[  177.012300] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, 
e: -5 l: 7!
[  177.012324] drbd r0 tumbleweed1: conn( Connected -> ProtocolError ) 
peer( Secondary -> Unknown )
[  177.012328] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown ) 
repl( WFBitMapT -> Off )
[  177.012365] drbd r0 tumbleweed1: asender terminated
[  177.012369] drbd r0 tumbleweed1: Terminating asender thread
[  177.036176] drbd r0 tumbleweed1: Connection closed
[  177.036197] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[  177.036219] drbd r0 tumbleweed1: Restarting receiver thread
[  177.036237] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[  177.536161] drbd r0 tumbleweed1: Handshake successful: Agreed network 
protocol version 110
[  177.536165] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[  177.536177] drbd r0 tumbleweed1: Starting asender thread (from 
drbd_r_r0 [1183])
[  177.544168] drbd r0 tumbleweed1: Preparing remote state change 
1844200584 (primary_nodes=0, weak_nodes=0)
[  177.544569] drbd r0 tumbleweed1: Committing remote state change 
1844200584
[  177.544588] drbd r0 tumbleweed1: conn( Connecting -> Connected ) 
peer( Unknown -> Secondary )
[  177.548098] block drbd0: tumbleweed1: drbd_sync_handshake:
[  177.548106] block drbd0: tumbleweed1: self 
0000000000000004:0000000000000000:0000000000000000:0000000000000000 
bits:7864071 flags:0
[  177.548113] block drbd0: tumbleweed1: peer 
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 
bits:8388343 flags:20
[  177.548118] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[  177.548122] block drbd0: tumbleweed1: Writing the whole bitmap, full 
sync required after drbd_sync_handshake.
[  177.561761] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) 
repl( Off -> WFBitMapT )
[  177.570088] block drbd0: tumbleweed1: bitmap overflow (e:8388342) 
while decoding bm RLE packet
[  177.570109] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, 
e: -5 l: 7!

[the last messages start looping]

And the one which attempts to be primary
[  148.615292] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[  148.615297] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12 
build by abuild@cloud109, 2015-02-20 08:39:07
[  148.615299] drbd: registered as block device major 147
[  148.629013] drbd r0: Starting worker thread (from drbdsetup [1138])
[  148.746514]  sda: unknown partition table
[  148.747150] block drbd0: disk( Diskless -> Attaching )
[  148.747161] block drbd0: Maximum number of peer devices = 1
[  148.747251] drbd r0: Method to ensure write ordering: flush
[  148.747255] block drbd0: drbd_bm_resize called with capacity == 67106744
[  148.748075] block drbd0: resync bitmap: bits=8388343 words=131068 
pages=256
[  148.748078] block drbd0: size = 32 GB (33553372 KB)
[  148.774226] block drbd0: recounting of set bits took additional 0ms
[  148.774247] block drbd0: Suspended AL updates
[  148.774262] block drbd0: disk( Attaching -> UpToDate )
[  148.774270] block drbd0: attached to current UUID: 2077E6DCE2ED8D5E
[  148.826678] drbd r0 tumbleweed3: conn( StandAlone -> Unconnected )
[  148.826719] drbd r0 tumbleweed3: Starting sender thread (from 
drbdsetup [1151])
[  148.880835] drbd r0 tumbleweed3: Starting receiver thread (from 
drbd_w_r0 [1139])
[  148.881016] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[  156.921255] drbd r0 tumbleweed3: Handshake successful: Agreed network 
protocol version 110
[  156.921260] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[  156.921275] drbd r0 tumbleweed3: Starting asender thread (from 
drbd_r_r0 [1159])
[  156.928136] drbd r0: Preparing cluster-wide state change 1900523070 
(0->1 499/146)
[  156.933242] drbd r0: State change 1900523070: primary_nodes=0, 
weak_nodes=0
[  156.933246] drbd r0: Committing cluster-wide state change 1900523070 
(4ms)
[  156.933263] drbd r0 tumbleweed3: conn( Connecting -> Connected ) 
peer( Unknown -> Secondary )
[  156.936119] block drbd0: tumbleweed3: drbd_sync_handshake:
[  156.936125] block drbd0: tumbleweed3: self 
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 
bits:8388343 flags:0
[  156.936130] block drbd0: tumbleweed3: peer 
0000000000000004:0000000000000000:0000000000000000:0000000000000000 
bits:7864071 flags:24
[  156.936133] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[  156.936136] block drbd0: tumbleweed3: Writing the whole bitmap, full 
sync required after drbd_sync_handshake.
[  156.963632] block drbd0: tumbleweed3: pdsk( DUnknown -> Inconsistent 
) repl( Off -> WFBitMapS )
[  156.963644] block drbd0: Resumed AL updates
[  156.966625] block drbd0: tumbleweed3: send bitmap stats 
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[  156.985166] drbd r0 tumbleweed3: sock was shut down by peer
[  156.985186] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe ) 
peer( Secondary -> Unknown )
[  156.985191] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[  156.985215] drbd r0 tumbleweed3: short read (expected size 16)
[  156.985238] drbd r0 tumbleweed3: asender terminated
[  156.985241] drbd r0 tumbleweed3: Terminating asender thread
[  157.012359] drbd r0 tumbleweed3: Connection closed
[  157.012380] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[  157.012402] drbd r0 tumbleweed3: Restarting receiver thread
[  157.012420] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[  157.513134] drbd r0 tumbleweed3: Handshake successful: Agreed network 
protocol version 110
[  157.513138] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[  157.513153] drbd r0 tumbleweed3: Starting asender thread (from 
drbd_r_r0 [1159])
[  157.520104] drbd r0: Preparing cluster-wide state change 2872485783 
(0->1 499/146)
[  157.525159] drbd r0: State change 2872485783: primary_nodes=0, 
weak_nodes=0
[  157.525162] drbd r0: Committing cluster-wide state change 2872485783 
(4ms)
[  157.525179] drbd r0 tumbleweed3: conn( Connecting -> Connected ) 
peer( Unknown -> Secondary )
[  157.528106] block drbd0: tumbleweed3: drbd_sync_handshake:
[  157.528111] block drbd0: tumbleweed3: self 
2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 
bits:8388343 flags:0
[  157.528115] block drbd0: tumbleweed3: peer 
0000000000000004:0000000000000000:0000000000000000:0000000000000000 
bits:7864071 flags:24
[  157.528118] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[  157.528121] block drbd0: tumbleweed3: Writing the whole bitmap, full 
sync required after drbd_sync_handshake.
[  157.546946] block drbd0: tumbleweed3: repl( Off -> WFBitMapS )
[  157.548668] block drbd0: tumbleweed3: send bitmap stats 
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[  157.561288] drbd r0 tumbleweed3: sock was shut down by peer
[  157.561306] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe ) 
peer( Secondary -> Unknown )
[  157.561311] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[  157.561335] drbd r0 tumbleweed3: short read (expected size 16)
[  157.561358] drbd r0 tumbleweed3: asender terminated
[  157.561360] drbd r0 tumbleweed3: Terminating asender thread
[  157.576230] drbd r0 tumbleweed3: Connection closed
[  157.576251] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[  157.576273] drbd r0 tumbleweed3: Restarting receiver thread
[  157.576290] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[  158.077114] drbd r0 tumbleweed3: Handshake successful: Agreed network 
protocol version 110
[  158.077119] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[  158.077152] drbd r0 tumbleweed3: Starting asender thread (from 
drbd_r_r0 [1159])
[  158.084157] drbd r0: Preparing cluster-wide state change 1844200584 
(0->1 499/146)
[  158.085191] drbd r0: State change 1844200584: primary_nodes=0, 
weak_nodes=0
[  158.085196] drbd r0: Committing cluster-wide state change 1844200584 
(0ms)
[  158.085214] drbd r0 tumbleweed3: conn( Connecting -> Connected ) 
peer( Unknown -> Secondary )


And the configs:

tumbleweed1:~ # cat /etc/drbd.conf
# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";


I haven't turned on any optins in global_common.conf

tumbleweed1:~ # cat /etc/drbd.d/r0.res
resource r0 {
   device /dev/drbd_r0 minor 0;
   disk /dev/sda;
   meta-disk internal;
   on tumbleweed1 {
     address  192.168.1.111:7788;
   }
   on tumbleweed3 {
     address 192.168.1.113:7788;
   }
   syncer {
     rate  7M;
   }
}


> And then the DRBD 9 git HEAD has already moved again...
>

Yes, I rebuilt against the latest one and the error still occurs.

BTW, the latest git still needs to fix the f_dentry in 
drbd/drbd_debugfs.c not covered in 7b00306c929c389bf030463466cb62e65bc415ec

IOW, there is still reference to f_dentry is drbd/drbd_debugfs.c



-- 
Goldwyn

  reply	other threads:[~2015-02-20  9:14 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-02-20  5:40 [Drbd-dev] [bug] drbd 9: Receiver error Goldwyn Rodrigues
2015-02-20  7:41 ` Philipp Marek
2015-02-20  9:12   ` Goldwyn Rodrigues [this message]
2015-02-20  9:26     ` Philipp Marek
2015-02-20 14:18     ` Lars Ellenberg
2015-02-23 17:18       ` Goldwyn Rodrigues
2015-02-26 20:12         ` Lars Ellenberg

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=54E6FA84.2070306@suse.de \
    --to=rgoldwyn@suse.de \
    --cc=drbd-dev@lists.linbit.com \
    --cc=philipp.marek@linbit.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.