* [Drbd-dev] [bug] drbd 9: Receiver error
@ 2015-02-20 5:40 Goldwyn Rodrigues
  2015-02-20 7:41 ` Philipp Marek
  0 siblings, 1 reply; 7+ messages in thread
From: Goldwyn Rodrigues @ 2015-02-20 5:40 UTC (permalink / raw)
To: drbd-dev

Hi,

I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
get it to compile against the latest (factory) kernels and should be the
same for the tumbleweed kernel (3.18.3 based) I am using to test.

I am getting the following receiver errors in the kernel log when I try to
start the drbd service:

[ 862.129755] drbd r0 tumbleweed1: conn( Connected -> ProtocolError ) peer( Secondary -> Unknown )
[ 862.129760] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[ 862.129797] drbd r0 tumbleweed1: asender terminated
[ 862.129800] drbd r0 tumbleweed1: Terminating asender thread
[ 862.156233] drbd r0 tumbleweed1: Connection closed
[ 862.156254] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 862.156276] drbd r0 tumbleweed1: Restarting receiver thread
[ 862.156294] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 862.656286] drbd r0 tumbleweed1: Handshake successful: Agreed network protocol version 110
[ 862.656290] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 862.656304] drbd r0 tumbleweed1: Starting asender thread (from drbd_r_r0 [13897])
[ 862.732281] drbd r0 tumbleweed1: Preparing remote state change 528121841 (primary_nodes=0, weak_nodes=0)
[ 862.732990] drbd r0 tumbleweed1: Committing remote state change 528121841
[ 862.733010] drbd r0 tumbleweed1: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 862.736117] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 862.736127] block drbd0: tumbleweed1: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:7864071 flags:0
[ 862.736131] block drbd0: tumbleweed1: peer 2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 bits:8388343 flags:20
[ 862.736135] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 862.736138] block drbd0: tumbleweed1: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 862.766801] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[ 862.775107] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
[ 862.775127] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!

[1] Opensuse Packages builds:
https://build.opensuse.org/package/show/home:goldwynr:branches:network:ha-clustering:Factory/drbd9

Please let me know if you need any more information to debug this or if I am
doing something wrong.

-- 
Goldwyn

^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [Drbd-dev] [bug] drbd 9: Receiver error
  2015-02-20 5:40 [Drbd-dev] [bug] drbd 9: Receiver error Goldwyn Rodrigues
@ 2015-02-20 7:41 ` Philipp Marek
  2015-02-20 9:12   ` Goldwyn Rodrigues
  0 siblings, 1 reply; 7+ messages in thread
From: Philipp Marek @ 2015-02-20 7:41 UTC
To: Goldwyn Rodrigues; +Cc: drbd-dev

Hi Goldwyn,

> I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
> I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
> get it to compile against the latest (factory) kernels and should be the
> same for the tumbleweed kernel (3.18.3 based) I am using to test.
>
> I am getting the following receiver errors in the kernel log when I try to
> start the drbd service:
...
> Please let me know if you need any more information to debug this or if I am
> doing something wrong.
how many nodes did you connect, which DRBD versions were they running?

Can you show the configuration, and some more log lines - from all nodes,
and starting quite a bit earlier?

And then the DRBD 9 git HEAD has already moved again...

Regards,

Phil

-- 
: Ing. Philipp Marek
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com :

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
* Re: [Drbd-dev] [bug] drbd 9: Receiver error
  2015-02-20 7:41 ` Philipp Marek
@ 2015-02-20 9:12 ` Goldwyn Rodrigues
  2015-02-20 9:26   ` Philipp Marek
  2015-02-20 14:18   ` Lars Ellenberg
  0 siblings, 2 replies; 7+ messages in thread
From: Goldwyn Rodrigues @ 2015-02-20 9:12 UTC
To: Philipp Marek; +Cc: drbd-dev

Hi Philipp,

On 02/20/2015 01:41 AM, Philipp Marek wrote:
> Hi Goldwyn,
>
>> I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
>> I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
>> get it to compile against the latest (factory) kernels and should be the
>> same for the tumbleweed kernel (3.18.3 based) I am using to test.
>>
>> I am getting the following receiver errors in the kernel log when I try to
>> start the drbd service:
> ...
>> Please let me know if you need any more information to debug this or if I am
>> doing something wrong.
> how many nodes did you connect, which DRBD versions were they running?

Two nodes, traditional way with one local device and no other
clustering software.

> Can you show the configuration, and some more log lines - from all nodes,
> and starting quite a bit earlier?

Here is the one which errs:

[ 175.644532] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[ 175.644535] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12 build by abuild@cloud109, 2015-02-20 08:39:07
[ 175.644536] drbd: registered as block device major 147
[ 175.657174] drbd r0: Starting worker thread (from drbdsetup [1162])
[ 175.797012] sda: unknown partition table
[ 175.798367] block drbd0: disk( Diskless -> Attaching )
[ 175.798404] block drbd0: Maximum number of peer devices = 1
[ 175.798634] drbd r0: Method to ensure write ordering: flush
[ 175.798644] block drbd0: drbd_bm_resize called with capacity == 62912568
[ 175.799879] block drbd0: resync bitmap: bits=7864071 words=122877 pages=240
[ 175.799886] block drbd0: size = 30 GB (31456284 KB)
[ 175.830618] block drbd0: recounting of set bits took additional 0ms
[ 175.830633] block drbd0: Suspended AL updates
[ 175.830642] block drbd0: disk( Attaching -> Inconsistent )
[ 175.830645] block drbd0: attached to current UUID: 0000000000000004
[ 175.843022] drbd r0 tumbleweed1: conn( StandAlone -> Unconnected )
[ 175.843069] drbd r0 tumbleweed1: Starting sender thread (from drbdsetup [1175])
[ 175.880303] drbd r0 tumbleweed1: Starting receiver thread (from drbd_w_r0 [1163])
[ 175.880486] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 176.380200] drbd r0 tumbleweed1: Handshake successful: Agreed network protocol version 110
[ 176.380205] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 176.380218] drbd r0 tumbleweed1: Starting asender thread (from drbd_r_r0 [1183])
[ 176.392180] drbd r0 tumbleweed1: Preparing remote state change 1900523070 (primary_nodes=0, weak_nodes=0)
[ 176.392697] drbd r0 tumbleweed1: Committing remote state change 1900523070
[ 176.392718] drbd r0 tumbleweed1: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 176.400177] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 176.400183] block drbd0: tumbleweed1: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:7864071 flags:0
[ 176.400188] block drbd0: tumbleweed1: peer 2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 bits:8388343 flags:20
[ 176.400191] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 176.400195] block drbd0: tumbleweed1: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 176.421355] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[ 176.421366] block drbd0: Resumed AL updates
[ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
[ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!
[ 176.429763] drbd r0 tumbleweed1: conn( Connected -> ProtocolError ) peer( Secondary -> Unknown )
[ 176.429768] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[ 176.429805] drbd r0 tumbleweed1: asender terminated
[ 176.429807] drbd r0 tumbleweed1: Terminating asender thread
[ 176.456361] drbd r0 tumbleweed1: Connection closed
[ 176.456382] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 176.456404] drbd r0 tumbleweed1: Restarting receiver thread
[ 176.456413] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 176.972179] drbd r0 tumbleweed1: Handshake successful: Agreed network protocol version 110
[ 176.972185] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 176.972205] drbd r0 tumbleweed1: Starting asender thread (from drbd_r_r0 [1183])
[ 176.984192] drbd r0 tumbleweed1: Preparing remote state change 2872485783 (primary_nodes=0, weak_nodes=0)
[ 176.984540] drbd r0 tumbleweed1: Committing remote state change 2872485783
[ 176.984560] drbd r0 tumbleweed1: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 176.992109] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 176.992115] block drbd0: tumbleweed1: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:7864071 flags:0
[ 176.992120] block drbd0: tumbleweed1: peer 2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 bits:8388343 flags:20
[ 176.992124] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 176.992127] block drbd0: tumbleweed1: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 177.003945] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[ 177.012276] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
[ 177.012300] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!
[ 177.012324] drbd r0 tumbleweed1: conn( Connected -> ProtocolError ) peer( Secondary -> Unknown )
[ 177.012328] block drbd0: tumbleweed1: pdsk( UpToDate -> DUnknown ) repl( WFBitMapT -> Off )
[ 177.012365] drbd r0 tumbleweed1: asender terminated
[ 177.012369] drbd r0 tumbleweed1: Terminating asender thread
[ 177.036176] drbd r0 tumbleweed1: Connection closed
[ 177.036197] drbd r0 tumbleweed1: conn( ProtocolError -> Unconnected )
[ 177.036219] drbd r0 tumbleweed1: Restarting receiver thread
[ 177.036237] drbd r0 tumbleweed1: conn( Unconnected -> Connecting )
[ 177.536161] drbd r0 tumbleweed1: Handshake successful: Agreed network protocol version 110
[ 177.536165] drbd r0 tumbleweed1: Agreed to support TRIM on protocol level
[ 177.536177] drbd r0 tumbleweed1: Starting asender thread (from drbd_r_r0 [1183])
[ 177.544168] drbd r0 tumbleweed1: Preparing remote state change 1844200584 (primary_nodes=0, weak_nodes=0)
[ 177.544569] drbd r0 tumbleweed1: Committing remote state change 1844200584
[ 177.544588] drbd r0 tumbleweed1: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 177.548098] block drbd0: tumbleweed1: drbd_sync_handshake:
[ 177.548106] block drbd0: tumbleweed1: self 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:7864071 flags:0
[ 177.548113] block drbd0: tumbleweed1: peer 2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 bits:8388343 flags:20
[ 177.548118] block drbd0: tumbleweed1: uuid_compare()=-2 by rule 20
[ 177.548122] block drbd0: tumbleweed1: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 177.561761] block drbd0: tumbleweed1: pdsk( DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
[ 177.570088] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
[ 177.570109] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!
[the last messages start looping]

And the one which attempts to be primary

[ 148.615292] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
[ 148.615297] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12 build by abuild@cloud109, 2015-02-20 08:39:07
[ 148.615299] drbd: registered as block device major 147
[ 148.629013] drbd r0: Starting worker thread (from drbdsetup [1138])
[ 148.746514] sda: unknown partition table
[ 148.747150] block drbd0: disk( Diskless -> Attaching )
[ 148.747161] block drbd0: Maximum number of peer devices = 1
[ 148.747251] drbd r0: Method to ensure write ordering: flush
[ 148.747255] block drbd0: drbd_bm_resize called with capacity == 67106744
[ 148.748075] block drbd0: resync bitmap: bits=8388343 words=131068 pages=256
[ 148.748078] block drbd0: size = 32 GB (33553372 KB)
[ 148.774226] block drbd0: recounting of set bits took additional 0ms
[ 148.774247] block drbd0: Suspended AL updates
[ 148.774262] block drbd0: disk( Attaching -> UpToDate )
[ 148.774270] block drbd0: attached to current UUID: 2077E6DCE2ED8D5E
[ 148.826678] drbd r0 tumbleweed3: conn( StandAlone -> Unconnected )
[ 148.826719] drbd r0 tumbleweed3: Starting sender thread (from drbdsetup [1151])
[ 148.880835] drbd r0 tumbleweed3: Starting receiver thread (from drbd_w_r0 [1139])
[ 148.881016] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 156.921255] drbd r0 tumbleweed3: Handshake successful: Agreed network protocol version 110
[ 156.921260] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 156.921275] drbd r0 tumbleweed3: Starting asender thread (from drbd_r_r0 [1159])
[ 156.928136] drbd r0: Preparing cluster-wide state change 1900523070 (0->1 499/146)
[ 156.933242] drbd r0: State change 1900523070: primary_nodes=0, weak_nodes=0
[ 156.933246] drbd r0: Committing cluster-wide state change 1900523070 (4ms)
[ 156.933263] drbd r0 tumbleweed3: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 156.936119] block drbd0: tumbleweed3: drbd_sync_handshake:
[ 156.936125] block drbd0: tumbleweed3: self 2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 bits:8388343 flags:0
[ 156.936130] block drbd0: tumbleweed3: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:7864071 flags:24
[ 156.936133] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[ 156.936136] block drbd0: tumbleweed3: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 156.963632] block drbd0: tumbleweed3: pdsk( DUnknown -> Inconsistent ) repl( Off -> WFBitMapS )
[ 156.963644] block drbd0: Resumed AL updates
[ 156.966625] block drbd0: tumbleweed3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 156.985166] drbd r0 tumbleweed3: sock was shut down by peer
[ 156.985186] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
[ 156.985191] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[ 156.985215] drbd r0 tumbleweed3: short read (expected size 16)
[ 156.985238] drbd r0 tumbleweed3: asender terminated
[ 156.985241] drbd r0 tumbleweed3: Terminating asender thread
[ 157.012359] drbd r0 tumbleweed3: Connection closed
[ 157.012380] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[ 157.012402] drbd r0 tumbleweed3: Restarting receiver thread
[ 157.012420] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 157.513134] drbd r0 tumbleweed3: Handshake successful: Agreed network protocol version 110
[ 157.513138] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 157.513153] drbd r0 tumbleweed3: Starting asender thread (from drbd_r_r0 [1159])
[ 157.520104] drbd r0: Preparing cluster-wide state change 2872485783 (0->1 499/146)
[ 157.525159] drbd r0: State change 2872485783: primary_nodes=0, weak_nodes=0
[ 157.525162] drbd r0: Committing cluster-wide state change 2872485783 (4ms)
[ 157.525179] drbd r0 tumbleweed3: conn( Connecting -> Connected ) peer( Unknown -> Secondary )
[ 157.528106] block drbd0: tumbleweed3: drbd_sync_handshake:
[ 157.528111] block drbd0: tumbleweed3: self 2077E6DCE2ED8D5E:0000000000000000:0000000000000000:0000000000000000 bits:8388343 flags:0
[ 157.528115] block drbd0: tumbleweed3: peer 0000000000000004:0000000000000000:0000000000000000:0000000000000000 bits:7864071 flags:24
[ 157.528118] block drbd0: tumbleweed3: uuid_compare()=2 by rule 30
[ 157.528121] block drbd0: tumbleweed3: Writing the whole bitmap, full sync required after drbd_sync_handshake.
[ 157.546946] block drbd0: tumbleweed3: repl( Off -> WFBitMapS )
[ 157.548668] block drbd0: tumbleweed3: send bitmap stats [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 157.561288] drbd r0 tumbleweed3: sock was shut down by peer
[ 157.561306] drbd r0 tumbleweed3: conn( Connected -> BrokenPipe ) peer( Secondary -> Unknown )
[ 157.561311] block drbd0: tumbleweed3: repl( WFBitMapS -> Off )
[ 157.561335] drbd r0 tumbleweed3: short read (expected size 16)
[ 157.561358] drbd r0 tumbleweed3: asender terminated
[ 157.561360] drbd r0 tumbleweed3: Terminating asender thread
[ 157.576230] drbd r0 tumbleweed3: Connection closed
[ 157.576251] drbd r0 tumbleweed3: conn( BrokenPipe -> Unconnected )
[ 157.576273] drbd r0 tumbleweed3: Restarting receiver thread
[ 157.576290] drbd r0 tumbleweed3: conn( Unconnected -> Connecting )
[ 158.077114] drbd r0 tumbleweed3: Handshake successful: Agreed network protocol version 110
[ 158.077119] drbd r0 tumbleweed3: Agreed to support TRIM on protocol level
[ 158.077152] drbd r0 tumbleweed3: Starting asender thread (from drbd_r_r0 [1159])
[ 158.084157] drbd r0: Preparing cluster-wide state change 1844200584 (0->1 499/146)
[ 158.085191] drbd r0: State change 1844200584: primary_nodes=0, weak_nodes=0
[ 158.085196] drbd r0: Committing cluster-wide state change 1844200584 (0ms)
[ 158.085214] drbd r0 tumbleweed3: conn( Connecting -> Connected ) peer( Unknown -> Secondary )

And the configs:

tumbleweed1:~ # cat /etc/drbd.conf
# You can find an example in /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

I haven't turned on any options in global_common.conf.

tumbleweed1:~ # cat /etc/drbd.d/r0.res
resource r0 {
	device /dev/drbd_r0 minor 0;
	disk /dev/sda;
	meta-disk internal;
	on tumbleweed1 {
		address 192.168.1.111:7788;
	}
	on tumbleweed3 {
		address 192.168.1.113:7788;
	}
	syncer {
		rate 7M;
	}
}

> And then the DRBD 9 git HEAD has already moved again...
>

Yes, I rebuilt against the latest one and the error still occurs.

BTW, the latest git still needs to fix the f_dentry in
drbd/drbd_debugfs.c not covered in
7b00306c929c389bf030463466cb62e65bc415ec

IOW, there is still a reference to f_dentry in drbd/drbd_debugfs.c

-- 
Goldwyn
* Re: [Drbd-dev] [bug] drbd 9: Receiver error
  2015-02-20 9:12 ` Goldwyn Rodrigues
@ 2015-02-20 9:26 ` Philipp Marek
  2015-02-20 14:18 ` Lars Ellenberg
  1 sibling, 0 replies; 7+ messages in thread
From: Philipp Marek @ 2015-02-20 9:26 UTC
To: Goldwyn Rodrigues; +Cc: drbd-dev

Hello Goldwyn,

> [ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
> [ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!
> [ 176.429763] drbd r0 tumbleweed1: conn( Connected -> ProtocolError ) peer( Secondary -> Unknown )
Hmmm, we've seen similar things with _old_ kernels, not with 3.18 yet.

Please try switching bitmap RLE compression off.

Regards,

Phil

-- 
: Ing. Philipp Marek
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com :

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
* Re: [Drbd-dev] [bug] drbd 9: Receiver error
  2015-02-20 9:12 ` Goldwyn Rodrigues
  2015-02-20 9:26 ` Philipp Marek
@ 2015-02-20 14:18 ` Lars Ellenberg
  2015-02-23 17:18   ` Goldwyn Rodrigues
  1 sibling, 1 reply; 7+ messages in thread
From: Lars Ellenberg @ 2015-02-20 14:18 UTC
To: drbd-dev

On Fri, Feb 20, 2015 at 03:12:36AM -0600, Goldwyn Rodrigues wrote:
> Hi Philipp,
>
>
> On 02/20/2015 01:41 AM, Philipp Marek wrote:
> >Hi Goldwyn,
> >
> >>I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
> >>I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
> >>get it to compile against the latest (factory) kernels and should be the
> >>same for the tumbleweed kernel (3.18.3 based) I am using to test.
> >>
> >>I am getting the following receiver errors in the kernel log when I try to
> >>start the drbd service:
> >...
> >>Please let me know if you need any more information to debug this or if I am
> >>doing something wrong.
> >how many nodes did you connect, which DRBD versions were they running?
>
> Two nodes, traditional way with one local device and no other
> clustering software.
>
> >Can you show the configuration, and some more log lines - from all nodes,
> >and starting quite a bit earlier?
>
> Here is the one which errs:
>
> [ 175.644532] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
> [ 175.644535] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12 build by abuild@cloud109, 2015-02-20 08:39:07
> [ 175.644536] drbd: registered as block device major 147
> [ 175.657174] drbd r0: Starting worker thread (from drbdsetup [1162])
> [ 175.797012] sda: unknown partition table
> [ 175.798367] block drbd0: disk( Diskless -> Attaching )
> [ 175.798404] block drbd0: Maximum number of peer devices = 1
> [ 175.798634] drbd r0: Method to ensure write ordering: flush
> [ 175.798644] block drbd0: drbd_bm_resize called with capacity == 62912568
> [ 175.799879] block drbd0: resync bitmap: bits=7864071 words=122877 pages=240

> [ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
> [ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!

If you can reproduce this (with RLE enabled),
can you please down drbd on both nodes,
then "dump-md"?

I'm interested in how exactly your bitmaps look like,
so I could "unit test" the bitmap compression/decompression for it.

Thanks,

	Lars

-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
* Re: [Drbd-dev] [bug] drbd 9: Receiver error
  2015-02-20 14:18 ` Lars Ellenberg
@ 2015-02-23 17:18 ` Goldwyn Rodrigues
  2015-02-26 20:12   ` Lars Ellenberg
  0 siblings, 1 reply; 7+ messages in thread
From: Goldwyn Rodrigues @ 2015-02-23 17:18 UTC
To: drbd-dev

On 02/20/2015 08:18 AM, Lars Ellenberg wrote:
> On Fri, Feb 20, 2015 at 03:12:36AM -0600, Goldwyn Rodrigues wrote:
>> Hi Philipp,
>>
>>
>> On 02/20/2015 01:41 AM, Philipp Marek wrote:
>>> Hi Goldwyn,
>>>
>>>> I compiled drbd-9.0 against Opensuse Tumbleweed and installed it [1].
>>>> I applied Linux kernel commit f730c848affc05fb7262574b06e0cd7e1fa96096 to
>>>> get it to compile against the latest (factory) kernels and should be the
>>>> same for the tumbleweed kernel (3.18.3 based) I am using to test.
>>>>
>>>> I am getting the following receiver errors in the kernel log when I try to
>>>> start the drbd service:
>>> ...
>>>> Please let me know if you need any more information to debug this or if I am
>>>> doing something wrong.
>>> how many nodes did you connect, which DRBD versions were they running?
>>
>> Two nodes, traditional way with one local device and no other
>> clustering software.
>>
>>> Can you show the configuration, and some more log lines - from all nodes,
>>> and starting quite a bit earlier?
>>
>> Here is the one which errs:
>>
>> [ 175.644532] drbd: initialized. Version: 9.0.0rc1 (api:1/proto:86-110)
>> [ 175.644535] drbd: GIT-hash: 9804ed9b1eedab65cb137380f8066518a9521c12 build by abuild@cloud109, 2015-02-20 08:39:07
>> [ 175.644536] drbd: registered as block device major 147
>> [ 175.657174] drbd r0: Starting worker thread (from drbdsetup [1162])
>> [ 175.797012] sda: unknown partition table
>> [ 175.798367] block drbd0: disk( Diskless -> Attaching )
>> [ 175.798404] block drbd0: Maximum number of peer devices = 1
>> [ 175.798634] drbd r0: Method to ensure write ordering: flush
>> [ 175.798644] block drbd0: drbd_bm_resize called with capacity == 62912568
>> [ 175.799879] block drbd0: resync bitmap: bits=7864071 words=122877 pages=240
>
>
>> [ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
>> [ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!
>
>
> If you can reproduce this (with RLE enabled),
> can you please down drbd on both nodes,
> then "dump-md"?
>
> I'm interested in how exactly your bitmaps look like,
> so I could "unit test" the bitmap compression/decompression for it.
>

The problem occurs when the devices are of unequal sizes. I am
unable to get the dump-md _after_ the error, because of the
following:

tumbleweed3:~ # drbdadm dump-md r0
Found meta data is "unclean", please apply-al first
Command 'drbdmeta 0 v09 /dev/sda internal dump-md' terminated with
exit code 255

I am able to recreate the problem every time.
Here is the dump before starting the service: tumbleweed1:~ # drbdadm dump-md r0 # DRBD meta data dump # 2015-02-23 10:31:50 -0600 [1424709110] # tumbleweed1> drbdmeta 0 v09 /dev/sda internal dump-md # version "v09"; max-peers 1; # md_size_sect 2120 # md_offset 34359734272 # al_offset 34359701504 # bm_offset 34358652928 node-id -1; current-uuid 0x0000000000000004; flags 0x00000080; peer[0] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[1] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[2] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[3] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[4] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[5] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[6] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[7] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[8] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[9] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[10] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[11] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[12] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[13] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[14] { bitmap-index -1; 
bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[15] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[16] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[17] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[18] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[19] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[20] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[21] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[22] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[23] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[24] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[25] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[26] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[27] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[28] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[29] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[30] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[31] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 
0x0000000000000000; flags 0x00000000; } history-uuids { 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; } # al-extents 257; la-size-sect 0; bm-byte-per-bit 4096; device-uuid 0xA4C9657C56B6D358; la-peer-max-bio-size 0; al-stripes 1; al-stripe-size-4k 8; # bm-bytes 0; bitmap[0] { } # bits-set 0; tumbleweed3:~ # drbdadm dump-md r0 # DRBD meta data dump # 2015-02-23 10:32:12 -0600 [1424709132] # tumbleweed3> drbdmeta 0 v09 /dev/sda internal dump-md # version "v09"; max-peers 1; # md_size_sect 1992 # md_offset 32212250624 # al_offset 32212217856 # bm_offset 32211234816 node-id -1; current-uuid 0x0000000000000004; flags 0x00000080; peer[0] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[1] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[2] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[3] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[4] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[5] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[6] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; 
flags 0x00000000; } peer[7] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[8] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[9] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[10] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[11] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[12] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[13] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[14] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[15] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[16] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[17] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[18] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[19] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[20] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[21] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[22] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[23] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[24] { bitmap-index -1; bitmap-uuid 
0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[25] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[26] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[27] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[28] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[29] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[30] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } peer[31] { bitmap-index -1; bitmap-uuid 0x0000000000000000; bitmap-dagtag 0x0000000000000000; flags 0x00000000; } history-uuids { 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; 0x0000000000000000; } # al-extents 257; la-size-sect 0; bm-byte-per-bit 4096; device-uuid 0xC789EF231ECD0071; la-peer-max-bio-size 0; al-stripes 1; al-stripe-size-4k 8; # bm-bytes 0; bitmap[0] { } # bits-set 0; This happens when I try to assign tumbleweed1 (bigger device) the primary using the command: # drbdadm -- --overwrite-data-of-peer primary r0 -- Goldwyn ^ permalink raw reply [flat|nested] 7+ messages in thread
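The two dumps above differ in their metadata offsets, which suggests how far apart the bitmaps are. The following is a minimal sketch, not DRBD code: it divides each node's `bm_offset` by `bm-byte-per-bit`. Treating `bm_offset` as the end of the area covered by the bitmap is an assumption; it is inferred only from the fact that the results match the `bits:` values printed by drbd_sync_handshake in the kernel log of the first message (7864071 and 8388343). The helper name `bitmap_bits` is hypothetical.

```python
# Values copied from the two `drbdadm dump-md r0` outputs in this thread.
BM_BYTE_PER_BIT = 4096

BM_OFFSET = {
    "tumbleweed1": 34358652928,  # bigger device
    "tumbleweed3": 32211234816,  # smaller device
}


def bitmap_bits(bm_offset, bytes_per_bit=BM_BYTE_PER_BIT):
    """Number of bitmap bits needed to cover bm_offset bytes (rounded up).

    Assumption: the bitmap covers everything below bm_offset.
    """
    return -(-bm_offset // bytes_per_bit)


bits = {host: bitmap_bits(offset) for host, offset in BM_OFFSET.items()}

for host, count in bits.items():
    print(f"{host}: {count} bits, last bit index {count - 1}")

print("difference:", bits["tumbleweed1"] - bits["tumbleweed3"], "bits")
```

Under that assumption, the last bit index of the bigger device equals the `e:8388342` value in the "bitmap overflow" message, which would be consistent with the smaller node receiving a run that ends beyond its own bitmap.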
* Re: [Drbd-dev] [bug] drbd 9: Receiver error
  2015-02-23 17:18 ` Goldwyn Rodrigues
@ 2015-02-26 20:12 ` Lars Ellenberg
  0 siblings, 0 replies; 7+ messages in thread
From: Lars Ellenberg @ 2015-02-26 20:12 UTC (permalink / raw)
  To: drbd-dev

On Mon, Feb 23, 2015 at 11:18:31AM -0600, Goldwyn Rodrigues wrote:
> >>[ 176.429715] block drbd0: tumbleweed1: bitmap overflow (e:8388342) while decoding bm RLE packet
> >>[ 176.429739] drbd r0 tumbleweed1: error receiving P_COMPRESSED_BITMAP, e: -5 l: 7!
> >
> >
> >If you can reproduce this (with RLE enabled),
> >can you please down drbd on both nodes,
> >then "dump-md"?
> >
> >I'm interested in how exactly your bitmaps look like,
> >so I could "unit test" the bitmap compression/decompression for it.
> >
>
> The problem occurs when the devices are of unequal sizes.

Oh.
Well, they should refuse to talk to each other in the first place.
Or have agreed to the minimum of all involved sizes
before even trying to exchange bitmap information.

However, you should not connect different size DRBD, anyways.
If that does not work yet, well, then don't do it ;-)

> I am
> unable to get the dump-md _after_ the error, because of the
> following:
>
> tumbleweed3:~ # drbdadm dump-md r0
> Found meta data is "unclean", please apply-al first
> Command 'drbdmeta 0 v09 /dev/sda internal dump-md' terminated with
> exit code 255

you can add "--force".
or, well, down, then "apply-al",
as in "drbdadm apply-al", resp.
drbdmeta 0 v09 /dev/sda internal apply-al
(where "al" is "activity log").

> I am able to recreate the problem everytime.
> Here is the dump before
> starting the service:
>
> tumbleweed1:~ # drbdadm dump-md r0
> # DRBD meta data dump
> # 2015-02-23 10:31:50 -0600 [1424709110]
> # tumbleweed1> drbdmeta 0 v09 /dev/sda internal dump-md
> #
>
> version "v09";
>
> max-peers 1;
> # md_size_sect 2120
> # md_offset 34359734272
> # al_offset 34359701504
> # bm_offset 34358652928
>
> node-id -1;
> current-uuid 0x0000000000000004;
> flags 0x00000080;
> peer[0] {
>     bitmap-index -1;
>     bitmap-uuid 0x0000000000000000;
>     bitmap-dagtag 0x0000000000000000;
>     flags 0x00000000;
> }
> peer[1] {
>     bitmap-index -1;
>     bitmap-uuid 0x0000000000000000;
>     bitmap-dagtag 0x0000000000000000;
>     flags 0x00000000;
> }
> history-uuids {
>     0x0000000000000000; 0x0000000000000000; 0x0000000000000000;
> # al-extents 257;
> la-size-sect 0;
> bm-byte-per-bit 4096;
> device-uuid 0xA4C9657C56B6D358;
> la-peer-max-bio-size 0;
> al-stripes 1;
> al-stripe-size-4k 8;
> # bm-bytes 0;
> bitmap[0] {
> }
> # bits-set 0;
> tumbleweed3:~ # drbdadm dump-md r0
> # DRBD meta data dump
> # 2015-02-23 10:32:12 -0600 [1424709132]
> # tumbleweed3> drbdmeta 0 v09 /dev/sda internal dump-md
> #
>
> version "v09";
>
> max-peers 1;
> # md_size_sect 1992
> # md_offset 32212250624
> # al_offset 32212217856
> # bm_offset 32211234816
>
> node-id -1;
> current-uuid 0x0000000000000004;
> flags 0x00000080;
> peer[0] {
>     bitmap-index -1;
>     bitmap-uuid 0x0000000000000000;
>     bitmap-dagtag 0x0000000000000000;
>     flags 0x00000000;
> }
> peer[1] {
>     bitmap-index -1;
>     bitmap-uuid 0x0000000000000000;
>     bitmap-dagtag 0x0000000000000000;
>     flags 0x00000000;
> }
> history-uuids {
>     0x0000000000000000; 0x0000000000000000; 0x0000000000000000;
>     0x0000000000000000;
> # al-extents 257;
> la-size-sect 0;
> bm-byte-per-bit 4096;
> device-uuid 0xC789EF231ECD0071;
> la-peer-max-bio-size 0;
> al-stripes 1;
> al-stripe-size-4k 8;
> # bm-bytes 0;
> bitmap[0] {
> }
> # bits-set 0;
> This happens when I try to assign tumbleweed1 (bigger device) the
> primary using the command:
>
> # drbdadm -- --overwrite-data-of-peer primary r0

should be enough info for us to reproduce.

For now: just don't do that.
Use devices of the same size everywhere.

Thanks,

    Lars

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

^ permalink raw reply [flat|nested] 7+ messages in thread
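The suggestion above, to agree on the minimum of all involved sizes before exchanging bitmap information, can be illustrated with a toy run-length decoder. This is a sketch under stated assumptions, not DRBD's implementation: the real P_COMPRESSED_BITMAP wire encoding is not reproduced, `decode_runs` and `BitmapOverflow` are hypothetical names, and the bit counts are the `bits:` values from the kernel log in the first message, read here as the bitmap sizes of the two nodes.

```python
class BitmapOverflow(Exception):
    """Raised when a run would end beyond the local bitmap."""


def decode_runs(run_lengths, total_bits, first_run_set=False):
    """Decode alternating clear/set run lengths into (start, end) set ranges.

    Toy format (assumption): run_lengths alternate between clear and set
    runs, starting with a set run if first_run_set is True.
    """
    set_ranges = []
    pos = 0
    is_set = first_run_set
    for length in run_lengths:
        end = pos + length  # exclusive end of this run
        if end > total_bits:
            raise BitmapOverflow(
                f"run ends at bit {end - 1}, bitmap has only {total_bits} bits"
            )
        if is_set and length:
            set_ranges.append((pos, end - 1))
        pos = end
        is_set = not is_set
    return set_ranges


LOCAL_BITS = 7864071  # smaller node, from the log ("self ... bits:")
PEER_BITS = 8388343   # bigger node, from the log ("peer ... bits:")

# A full sync described by the bigger node: one run with every bit set.
try:
    decode_runs([PEER_BITS], LOCAL_BITS, first_run_set=True)
except BitmapOverflow as exc:
    print("rejected:", exc)

# If both sides first agree on the smaller size, the same run fits.
agreed_bits = min(LOCAL_BITS, PEER_BITS)
print(decode_runs([agreed_bits], LOCAL_BITS, first_run_set=True))
```

The bounds check is the part that corresponds to the "bitmap overflow" message; clamping to `min()` before sending is one possible reading of "agreed to the minimum of all involved sizes", and refusing the connection outright would be the other.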
end of thread, other threads:[~2015-02-26 20:12 UTC | newest]

Thread overview: 7+ messages
2015-02-20  5:40 [Drbd-dev] [bug] drbd 9: Receiver error Goldwyn Rodrigues
2015-02-20  7:41 ` Philipp Marek
2015-02-20  9:12 ` Goldwyn Rodrigues
2015-02-20  9:26 ` Philipp Marek
2015-02-20 14:18 ` Lars Ellenberg
2015-02-23 17:18 ` Goldwyn Rodrigues
2015-02-26 20:12 ` Lars Ellenberg