* [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
@ 2023-09-02 20:31 Serguei Ivantsov
2023-09-05 15:23 ` [Drbd-dev] " Jakub Kicinski
0 siblings, 1 reply; 9+ messages in thread
From: Serguei Ivantsov @ 2023-09-02 20:31 UTC (permalink / raw)
To: stable; +Cc: kuba, regressions, drbd-dev
[-- Attachment #1: Type: text/plain, Size: 8275 bytes --]
Hello,
After upgrading the kernel to 6.5 the system can't connect to the peer
(6.4.11) anymore.
I checked 6.5.1 - same issue.
All previous kernels including 6.4.14 are working just fine.
Checking the 6.5 changelog, I found commit
9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes to
DRBD.
On the 6.5.X system I have the following in the kernel log (drbd_send_block()
failed):
[ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
[ 2.475394] drbd: built-in
[ 2.477254] drbd: registered as block device major 147
[ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
[ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
[ 7.421552] drbd drbd0: Method to ensure write ordering: flush
[ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
[ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity ==
1845173184
[ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
words=3603854 pages=7039
[ 7.467370] drbd0: detected capacity change from 0 to 1845173184
[ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
[ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took additional 0
jiffies
[ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by on
disk bit-map.
[ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
[ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
[ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
[ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0
[3847])
[ 7.486918] drbd drbd0: receiver (re)started
[ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
[ 12.340212] drbd drbd0: initial packet S crossed
[ 22.310856] drbd drbd0: Handshake successful: Agreed network protocol
version 101
[ 22.311087] drbd drbd0: Feature flags enabled on protocol level: 0xf
TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
[ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
[4071])
[ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
[ 22.400869] drbd drbd0/0 drbd0: self
32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD bits:0
flags:0
[ 22.401205] drbd drbd0/0 drbd0: peer
32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
flags:0
[ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
[ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
[ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
sent=-5
[ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn( Connected ->
NetworkFailure ) pdsk( UpToDate -> DUnknown )
[ 22.507109] drbd drbd0/0 drbd0: new current UUID
7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
[ 22.507451] drbd drbd0: ack_receiver terminated
[ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
[ 22.600693] drbd drbd0: Connection closed
[ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
[ 22.601115] drbd drbd0: receiver terminated
[ 22.601238] drbd drbd0: Restarting receiver thread
[ 22.601378] drbd drbd0: receiver (re)started
[ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
[ 23.260624] drbd drbd0: Handshake successful: Agreed network protocol
version 101
[ 23.260859] drbd drbd0: Feature flags enabled on protocol level: 0xf
TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
[ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
[4071])
[ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
[ 23.340771] drbd drbd0/0 drbd0: self
7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD bits:1
flags:0
[ 23.341192] drbd drbd0/0 drbd0: peer
32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
flags:0
[ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
[ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
[ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0
[ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0 exit code 0 (0x0)
[ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource ) pdsk(
Consistent -> Inconsistent )
[ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will sync 4
KB [1 bits set]).
[ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
[ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
sent=-5
[ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
[ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn( SyncSource ->
NetworkFailure )
[ 23.517597] drbd drbd0: ack_receiver terminated
[ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
[ 23.690598] drbd drbd0: Connection closed
[ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
[ 23.712017] drbd drbd0: receiver terminated
[ 23.721597] drbd drbd0: Restarting receiver thread
On the peer:
[349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
[349071.558245] drbd drbd0: Handshake successful: Agreed network protocol
version 101
[349071.562105] drbd drbd0: Feature flags enabled on protocol level: 0xf
TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
[349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
[2660])
[349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
[349071.692323] drbd drbd0/0 drbd0: self
3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:1
flags:0
[349071.699871] drbd drbd0/0 drbd0: peer
7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B bits:1
flags:0
[349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
[349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
[349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[349071.723039] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[349071.732489] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
[349071.787113] drbd drbd0/0 drbd0: updated sync uuid
3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
[349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0
[349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0 exit code 0 (0x0)
[349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
[349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will sync 4
KB [1 bits set]).
[349071.916117] drbd drbd0: sock was shut down by peer
[349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn( SyncTarget ->
BrokenPipe ) pdsk( UpToDate -> DUnknown )
[349071.927796] drbd drbd0: short read (expected size 4096)
[349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
[349071.935864] drbd drbd0: ack_receiver terminated
[349071.939906] drbd drbd0: Terminating drbd_a_drbd0
[349072.088385] drbd drbd0: Connection closed
[349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
[349072.096436] drbd drbd0: receiver terminated
[349072.100469] drbd drbd0: Restarting receiver thread
[349072.104454] drbd drbd0: receiver (re)started
[349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
--
Best Regards,
Serguei
[-- Attachment #2: Type: text/html, Size: 35095 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DRBD broken in kernel 6.5 and 6.5.1
2023-09-02 20:31 [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1 Serguei Ivantsov
@ 2023-09-05 15:23 ` Jakub Kicinski
0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2023-09-05 15:23 UTC (permalink / raw)
To: Serguei Ivantsov; +Cc: stable, regressions, drbd-dev, David Howells
CC: David
On Sat, 2 Sep 2023 22:31:06 +0200 Serguei Ivantsov wrote:
> Hello,
>
> After upgrading the kernel to 6.5 the system can't connect to the peer
> (6.4.11) anymore.
> I checked 6.5.1 - same issue.
> All previous kernels including 6.4.14 are working just fine.
> Checking the 6.5 changelog, I found commit
> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes to
> DRBD.
I don't see anything obviously wrong here, maybe David has better
ideas. Can you try this?
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 79ab532aabaf..c607d4304608 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1539,8 +1539,6 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
int offset, size_t size, unsigned msg_flags)
{
struct socket *socket = peer_device->connection->data.socket;
- struct msghdr msg = { .msg_flags = msg_flags, };
- struct bio_vec bvec;
int len = size;
int err = -EIO;
@@ -1551,10 +1549,13 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
* __page_cache_release a page that would actually still be referenced
* by someone, leading to some obscure delayed Oops somewhere else. */
if (!drbd_disable_sendpage && sendpage_ok(page))
- msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
+ msg_flags |= MSG_SPLICE_PAGES;
+ msg_flags |= MSG_NOSIGNAL;
drbd_update_congested(peer_device->connection);
do {
+ struct msghdr msg = { .msg_flags = msg_flags, };
+ struct bio_vec bvec;
int sent;
bvec_set_page(&bvec, page, offset, len);
> On the 6.5.X system I have the following in the kernel log (drbd_send_block()
> failed):
>
> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>
> [ 2.475394] drbd: built-in
>
> [ 2.477254] drbd: registered as block device major 147
>
> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>
> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>
> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>
> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>
> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity ==
> 1845173184
>
> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
> words=3603854 pages=7039
>
> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>
> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>
> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took additional 0
> jiffies
>
> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by on
> disk bit-map.
>
> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>
> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>
> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>
> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0
> [3847])
>
> [ 7.486918] drbd drbd0: receiver (re)started
>
> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 12.340212] drbd drbd0: initial packet S crossed
>
> [ 22.310856] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [4071])
>
> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 22.400869] drbd drbd0/0 drbd0: self
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD bits:0
> flags:0
>
> [ 22.401205] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
> flags:0
>
> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>
> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>
> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>
> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
> sent=-5
>
> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn( Connected ->
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>
> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>
> [ 22.507451] drbd drbd0: ack_receiver terminated
>
> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 22.600693] drbd drbd0: Connection closed
>
> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 22.601115] drbd drbd0: receiver terminated
>
> [ 22.601238] drbd drbd0: Restarting receiver thread
>
> [ 22.601378] drbd drbd0: receiver (re)started
>
> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 23.260624] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [4071])
>
> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 23.340771] drbd drbd0/0 drbd0: self
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD bits:1
> flags:0
>
> [ 23.341192] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
> flags:0
>
> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>
> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>
> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0
>
> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0 exit code 0 (0x0)
>
> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource ) pdsk(
> Consistent -> Inconsistent )
>
> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will sync 4
> KB [1 bits set]).
>
> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>
> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
> sent=-5
>
> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>
> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn( SyncSource ->
> NetworkFailure )
>
> [ 23.517597] drbd drbd0: ack_receiver terminated
>
> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 23.690598] drbd drbd0: Connection closed
>
> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 23.712017] drbd drbd0: receiver terminated
>
> [ 23.721597] drbd drbd0: Restarting receiver thread
>
>
>
> On the peer:
>
>
> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [349071.558245] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [349071.562105] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [2660])
>
> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [349071.692323] drbd drbd0/0 drbd0: self
> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:1
> flags:0
>
> [349071.699871] drbd drbd0/0 drbd0: peer
> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B bits:1
> flags:0
>
> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>
> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>
> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>
> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>
> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>
> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
>
> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0 exit code 0 (0x0)
>
> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>
> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will sync 4
> KB [1 bits set]).
>
> [349071.916117] drbd drbd0: sock was shut down by peer
>
> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn( SyncTarget ->
> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>
> [349071.927796] drbd drbd0: short read (expected size 4096)
>
> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>
> [349071.935864] drbd drbd0: ack_receiver terminated
>
> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>
> [349072.088385] drbd drbd0: Connection closed
>
> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>
> [349072.096436] drbd drbd0: receiver terminated
>
> [349072.100469] drbd drbd0: Restarting receiver thread
>
> [349072.104454] drbd drbd0: receiver (re)started
>
> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>
>
> --
>
> Best Regards,
>
> Serguei
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
@ 2023-09-05 15:23 ` Jakub Kicinski
0 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2023-09-05 15:23 UTC (permalink / raw)
To: Serguei Ivantsov; +Cc: David Howells, regressions, stable, drbd-dev
CC: David
On Sat, 2 Sep 2023 22:31:06 +0200 Serguei Ivantsov wrote:
> Hello,
>
> After upgrading the kernel to 6.5 the system can't connect to the peer
> (6.4.11) anymore.
> I checked 6.5.1 - same issue.
> All previous kernels including 6.4.14 are working just fine.
> Checking the 6.5 changelog, I found commit
> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes to
> DRBD.
I don't see anything obviously wrong here, maybe David has better
ideas. Can you try this?
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 79ab532aabaf..c607d4304608 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1539,8 +1539,6 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
int offset, size_t size, unsigned msg_flags)
{
struct socket *socket = peer_device->connection->data.socket;
- struct msghdr msg = { .msg_flags = msg_flags, };
- struct bio_vec bvec;
int len = size;
int err = -EIO;
@@ -1551,10 +1549,13 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
* __page_cache_release a page that would actually still be referenced
* by someone, leading to some obscure delayed Oops somewhere else. */
if (!drbd_disable_sendpage && sendpage_ok(page))
- msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
+ msg_flags |= MSG_SPLICE_PAGES;
+ msg_flags |= MSG_NOSIGNAL;
drbd_update_congested(peer_device->connection);
do {
+ struct msghdr msg = { .msg_flags = msg_flags, };
+ struct bio_vec bvec;
int sent;
bvec_set_page(&bvec, page, offset, len);
> On the 6.5.X system I have the following in the kernel log (drbd_send_block()
> failed):
>
> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>
> [ 2.475394] drbd: built-in
>
> [ 2.477254] drbd: registered as block device major 147
>
> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>
> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>
> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>
> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>
> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity ==
> 1845173184
>
> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
> words=3603854 pages=7039
>
> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>
> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>
> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took additional 0
> jiffies
>
> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by on
> disk bit-map.
>
> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>
> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>
> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>
> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0
> [3847])
>
> [ 7.486918] drbd drbd0: receiver (re)started
>
> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 12.340212] drbd drbd0: initial packet S crossed
>
> [ 22.310856] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [4071])
>
> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 22.400869] drbd drbd0/0 drbd0: self
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD bits:0
> flags:0
>
> [ 22.401205] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
> flags:0
>
> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>
> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>
> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>
> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
> sent=-5
>
> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn( Connected ->
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>
> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>
> [ 22.507451] drbd drbd0: ack_receiver terminated
>
> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 22.600693] drbd drbd0: Connection closed
>
> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 22.601115] drbd drbd0: receiver terminated
>
> [ 22.601238] drbd drbd0: Restarting receiver thread
>
> [ 22.601378] drbd drbd0: receiver (re)started
>
> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 23.260624] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [4071])
>
> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 23.340771] drbd drbd0/0 drbd0: self
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD bits:1
> flags:0
>
> [ 23.341192] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
> flags:0
>
> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>
> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>
> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0
>
> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0 exit code 0 (0x0)
>
> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource ) pdsk(
> Consistent -> Inconsistent )
>
> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will sync 4
> KB [1 bits set]).
>
> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>
> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
> sent=-5
>
> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>
> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn( SyncSource ->
> NetworkFailure )
>
> [ 23.517597] drbd drbd0: ack_receiver terminated
>
> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 23.690598] drbd drbd0: Connection closed
>
> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 23.712017] drbd drbd0: receiver terminated
>
> [ 23.721597] drbd drbd0: Restarting receiver thread
>
>
>
> On the peer:
>
>
> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [349071.558245] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [349071.562105] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [2660])
>
> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [349071.692323] drbd drbd0/0 drbd0: self
> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:1
> flags:0
>
> [349071.699871] drbd drbd0/0 drbd0: peer
> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B bits:1
> flags:0
>
> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>
> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>
> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>
> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>
> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>
> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
>
> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0 exit code 0 (0x0)
>
> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>
> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will sync 4
> KB [1 bits set]).
>
> [349071.916117] drbd drbd0: sock was shut down by peer
>
> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn( SyncTarget ->
> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>
> [349071.927796] drbd drbd0: short read (expected size 4096)
>
> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>
> [349071.935864] drbd drbd0: ack_receiver terminated
>
> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>
> [349072.088385] drbd drbd0: Connection closed
>
> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>
> [349072.096436] drbd drbd0: receiver terminated
>
> [349072.100469] drbd drbd0: Restarting receiver thread
>
> [349072.104454] drbd drbd0: receiver (re)started
>
> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>
>
> --
>
> Best Regards,
>
> Serguei
^ permalink raw reply related [flat|nested] 9+ messages in thread
* DRBD broken in kernel 6.5 and 6.5.1
@ 2023-09-02 20:37 Serguei Ivantsov
2023-09-02 21:09 ` Greg KH
2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 2 replies; 9+ messages in thread
From: Serguei Ivantsov @ 2023-09-02 20:37 UTC (permalink / raw)
To: regressions
Hello,
After upgrading the kernel to 6.5 the system can't connect to the peer
(6.4.11) anymore.
I checked 6.5.1 - same issue.
All previous kernels including 6.4.14 are working just fine.
Checking the 6.5 changelog, I found commit
9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
to DRBD.
On the 6.5.X system I have the following in the kernel log
(drbd_send_block() failed):
[ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
[ 2.475394] drbd: built-in
[ 2.477254] drbd: registered as block device major 147
[ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
[ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
[ 7.421552] drbd drbd0: Method to ensure write ordering: flush
[ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
[ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
== 1845173184
[ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
words=3603854 pages=7039
[ 7.467370] drbd0: detected capacity change from 0 to 1845173184
[ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
[ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
additional 0 jiffies
[ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
on disk bit-map.
[ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
[ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
[ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
[ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
[ 7.486918] drbd drbd0: receiver (re)started
[ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
[ 12.340212] drbd drbd0: initial packet S crossed
[ 22.310856] drbd drbd0: Handshake successful: Agreed network
protocol version 101
[ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
[ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
[ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
[ 22.400869] drbd drbd0/0 drbd0: self
32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
bits:0 flags:0
[ 22.401205] drbd drbd0/0 drbd0: peer
32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
bits:0 flags:0
[ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
[ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
[ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
[ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
[ 22.507109] drbd drbd0/0 drbd0: new current UUID
7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
[ 22.507451] drbd drbd0: ack_receiver terminated
[ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
[ 22.600693] drbd drbd0: Connection closed
[ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
[ 22.601115] drbd drbd0: receiver terminated
[ 22.601238] drbd drbd0: Restarting receiver thread
[ 22.601378] drbd drbd0: receiver (re)started
[ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
[ 23.260624] drbd drbd0: Handshake successful: Agreed network
protocol version 101
[ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
[ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
[ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
[ 23.340771] drbd drbd0/0 drbd0: self
7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
bits:1 flags:0
[ 23.341192] drbd drbd0/0 drbd0: peer
32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
bits:0 flags:0
[ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
[ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
[ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0
[ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0 exit code 0 (0x0)
[ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
pdsk( Consistent -> Inconsistent )
[ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
sync 4 KB [1 bits set]).
[ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
[ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
[ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
[ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
SyncSource -> NetworkFailure )
[ 23.517597] drbd drbd0: ack_receiver terminated
[ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
[ 23.690598] drbd drbd0: Connection closed
[ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
[ 23.712017] drbd drbd0: receiver terminated
[ 23.721597] drbd drbd0: Restarting receiver thread
On the peer:
[349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
[349071.558245] drbd drbd0: Handshake successful: Agreed network
protocol version 101
[349071.562105] drbd drbd0: Feature flags enabled on protocol level:
0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
[349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
[349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
[349071.692323] drbd drbd0/0 drbd0: self
3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
bits:1 flags:0
[349071.699871] drbd drbd0/0 drbd0: peer
7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
bits:1 flags:0
[349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
[349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
[349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[349071.732489] drbd drbd0/0 drbd0: send bitmap stats
[Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
[349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
[349071.787113] drbd drbd0/0 drbd0: updated sync uuid
3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
[349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0
[349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0 exit code 0 (0x0)
[349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
[349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
sync 4 KB [1 bits set]).
[349071.916117] drbd drbd0: sock was shut down by peer
[349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
[349071.927796] drbd drbd0: short read (expected size 4096)
[349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
[349071.935864] drbd drbd0: ack_receiver terminated
[349071.939906] drbd drbd0: Terminating drbd_a_drbd0
[349072.088385] drbd drbd0: Connection closed
[349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
[349072.096436] drbd drbd0: receiver terminated
[349072.100469] drbd drbd0: Restarting receiver thread
[349072.104454] drbd drbd0: receiver (re)started
[349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
--
Best Regards,
Serguei
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DRBD broken in kernel 6.5 and 6.5.1
2023-09-02 20:37 Serguei Ivantsov
@ 2023-09-02 21:09 ` Greg KH
2023-09-02 21:18 ` Serguei Ivantsov
2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
1 sibling, 1 reply; 9+ messages in thread
From: Greg KH @ 2023-09-02 21:09 UTC (permalink / raw)
To: Serguei Ivantsov; +Cc: regressions
On Sat, Sep 02, 2023 at 10:37:04PM +0200, Serguei Ivantsov wrote:
> Hello,
>
> After upgrading the kernel to 6.5 the system can't connect to the peer
> (6.4.11) anymore.
> I checked 6.5.1 - same issue.
> All previous kernels including 6.4.14 are working just fine.
> Checking the 6.5 changelog, I found commit
> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
> to DRBD.
If you revert that change, does it work properly?
If not, can you run 'git bisect' to try to find the offending change?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DRBD broken in kernel 6.5 and 6.5.1
2023-09-02 21:09 ` Greg KH
@ 2023-09-02 21:18 ` Serguei Ivantsov
0 siblings, 0 replies; 9+ messages in thread
From: Serguei Ivantsov @ 2023-09-02 21:18 UTC (permalink / raw)
To: Greg KH; +Cc: regressions
Well, my bad, it appears there were more commits related to DRBD in 6.5
I'll try to find which is causing the issue.
--
Serguei
On Sat, Sep 2, 2023 at 11:09 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Sat, Sep 02, 2023 at 10:37:04PM +0200, Serguei Ivantsov wrote:
> > Hello,
> >
> > After upgrading the kernel to 6.5 the system can't connect to the peer
> > (6.4.11) anymore.
> > I checked 6.5.1 - same issue.
> > All previous kernels including 6.4.14 are working just fine.
> > Checking the 6.5 changelog, I found commit
> > 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
> > to DRBD.
>
> If you revert that change, does it work properly?
>
> If not, can you run 'git bisect' to try to find the offending change?
>
> thanks,
>
> greg k-h
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DRBD broken in kernel 6.5 and 6.5.1
2023-09-02 20:37 Serguei Ivantsov
2023-09-02 21:09 ` Greg KH
@ 2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-05 15:31 ` Thomas Voegtle
2023-09-05 17:24 ` Christoph Böhmwalder
1 sibling, 2 replies; 9+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-03 5:45 UTC (permalink / raw)
To: Serguei Ivantsov, regressions
Cc: Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
drbd-dev, LKML
CCing the DRBD maintainers and the appropriate lists, as they should
know about this -- or actually might know what is causing this already
or be able to guess the cause. For the rest of this mail:
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.]
On 02.09.23 22:37, Serguei Ivantsov wrote:
> Hello,
>
> After upgrading the kernel to 6.5 the system can't connect to the peer
> (6.4.11) anymore.
> I checked 6.5.1 - same issue.
> All previous kernels including 6.4.14 are working just fine.
> Checking the 6.5 changelog, I found commit
> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
> to DRBD.
>
> On the 6.5.X system I have the following in the kernel log
> (drbd_send_block() failed):
>
> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>
> [ 2.475394] drbd: built-in
>
> [ 2.477254] drbd: registered as block device major 147
>
> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>
> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>
> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>
> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>
> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
> == 1845173184
>
> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
> words=3603854 pages=7039
>
> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>
> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>
> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
> additional 0 jiffies
>
> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
> on disk bit-map.
>
> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>
> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>
> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>
> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
>
> [ 7.486918] drbd drbd0: receiver (re)started
>
> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 12.340212] drbd drbd0: initial packet S crossed
>
> [ 22.310856] drbd drbd0: Handshake successful: Agreed network
> protocol version 101
>
> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>
> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 22.400869] drbd drbd0/0 drbd0: self
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
> bits:0 flags:0
>
> [ 22.401205] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
> bits:0 flags:0
>
> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>
> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>
> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>
> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>
> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>
> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>
> [ 22.507451] drbd drbd0: ack_receiver terminated
>
> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 22.600693] drbd drbd0: Connection closed
>
> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 22.601115] drbd drbd0: receiver terminated
>
> [ 22.601238] drbd drbd0: Restarting receiver thread
>
> [ 22.601378] drbd drbd0: receiver (re)started
>
> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 23.260624] drbd drbd0: Handshake successful: Agreed network
> protocol version 101
>
> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>
> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 23.340771] drbd drbd0/0 drbd0: self
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
> bits:1 flags:0
>
> [ 23.341192] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
> bits:0 flags:0
>
> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>
> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>
> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0
>
> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0 exit code 0 (0x0)
>
> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
> pdsk( Consistent -> Inconsistent )
>
> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
> sync 4 KB [1 bits set]).
>
> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>
> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>
> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>
> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
> SyncSource -> NetworkFailure )
>
> [ 23.517597] drbd drbd0: ack_receiver terminated
>
> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 23.690598] drbd drbd0: Connection closed
>
> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 23.712017] drbd drbd0: receiver terminated
>
> [ 23.721597] drbd drbd0: Restarting receiver thread
>
>
>
> On the peer:
>
>
> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [349071.558245] drbd drbd0: Handshake successful: Agreed network
> protocol version 101
>
> [349071.562105] drbd drbd0: Feature flags enabled on protocol level:
> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
>
> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [349071.692323] drbd drbd0/0 drbd0: self
> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
> bits:1 flags:0
>
> [349071.699871] drbd drbd0/0 drbd0: peer
> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
> bits:1 flags:0
>
> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>
> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>
> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>
> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>
> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>
> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
>
> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0 exit code 0 (0x0)
>
> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>
> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
> sync 4 KB [1 bits set]).
>
> [349071.916117] drbd drbd0: sock was shut down by peer
>
> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
> SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>
> [349071.927796] drbd drbd0: short read (expected size 4096)
>
> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>
> [349071.935864] drbd drbd0: ack_receiver terminated
>
> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>
> [349072.088385] drbd drbd0: Connection closed
>
> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>
> [349072.096436] drbd drbd0: receiver terminated
>
> [349072.100469] drbd drbd0: Restarting receiver thread
>
> [349072.104454] drbd drbd0: receiver (re)started
>
> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>
>
> --
>
> Best Regards,
>
> Serguei
Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:
#regzbot ^introduced v6.4..v6.5
#regzbot title drbd: drbd_send_block() failed
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: DRBD broken in kernel 6.5 and 6.5.1
2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-05 15:31 ` Thomas Voegtle
2023-09-05 17:24 ` Christoph Böhmwalder
1 sibling, 0 replies; 9+ messages in thread
From: Thomas Voegtle @ 2023-09-05 15:31 UTC (permalink / raw)
To: Linux regressions mailing list
Cc: Serguei Ivantsov, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, drbd-dev, LKML, David Howells
On Sun, 3 Sep 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> CCing the DRBD maintainers and the appropriate lists, as they should
> know about this -- or actually might know what is causing this already
> or be able to guess the cause. For the rest of this mail:
>
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.]
>
> On 02.09.23 22:37, Serguei Ivantsov wrote:
>> Hello,
>>
>> After upgrading the kernel to 6.5 the system can't connect to the peer
>> (6.4.11) anymore.
>> I checked 6.5.1 - same issue.
>> All previous kernels including 6.4.14 are working just fine.
>> Checking the 6.5 changelog, I found commit
>> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
>> to DRBD.
>>
>> On the 6.5.X system I have the following in the kernel log
>> (drbd_send_block() failed):
>>
>> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>>
>> [ 2.475394] drbd: built-in
>>
>> [ 2.477254] drbd: registered as block device major 147
>>
>> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>>
>> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>>
>> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>>
>> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>>
>> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
>> == 1845173184
>>
>> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
>> words=3603854 pages=7039
>>
>> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>>
>> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>>
>> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
>> additional 0 jiffies
>>
>> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
>> on disk bit-map.
>>
>> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>>
>> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>>
>> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>>
>> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
>>
>> [ 7.486918] drbd drbd0: receiver (re)started
>>
>> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 12.340212] drbd drbd0: initial packet S crossed
>>
>> [ 22.310856] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 22.400869] drbd drbd0/0 drbd0: self
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401205] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>>
>> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>>
>> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>>
>> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>>
>> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>>
>> [ 22.507451] drbd drbd0: ack_receiver terminated
>>
>> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 22.600693] drbd drbd0: Connection closed
>>
>> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 22.601115] drbd drbd0: receiver terminated
>>
>> [ 22.601238] drbd drbd0: Restarting receiver thread
>>
>> [ 22.601378] drbd drbd0: receiver (re)started
>>
>> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 23.260624] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 23.340771] drbd drbd0/0 drbd0: self
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>> bits:1 flags:0
>>
>> [ 23.341192] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>>
>> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>>
>> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
>> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0
>>
>> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0 exit code 0 (0x0)
>>
>> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
>> pdsk( Consistent -> Inconsistent )
>>
>> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
>> sync 4 KB [1 bits set]).
>>
>> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
>> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>>
>> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>>
>> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> SyncSource -> NetworkFailure )
>>
>> [ 23.517597] drbd drbd0: ack_receiver terminated
>>
>> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 23.690598] drbd drbd0: Connection closed
>>
>> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 23.712017] drbd drbd0: receiver terminated
>>
>> [ 23.721597] drbd drbd0: Restarting receiver thread
>>
>>
>>
>> On the peer:
>>
>>
>> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [349071.558245] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [349071.562105] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
>>
>> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [349071.692323] drbd drbd0/0 drbd0: self
>> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:1 flags:0
>>
>> [349071.699871] drbd drbd0/0 drbd0: peer
>> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
>> bits:1 flags:0
>>
>> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>>
>> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>>
>> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
>> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>>
>> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>>
>> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
>> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>>
>> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0
>>
>> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0 exit code 0 (0x0)
>>
>> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>>
>> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
>> sync 4 KB [1 bits set]).
>>
>> [349071.916117] drbd drbd0: sock was shut down by peer
>>
>> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
>> SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>>
>> [349071.927796] drbd drbd0: short read (expected size 4096)
>>
>> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>>
>> [349071.935864] drbd drbd0: ack_receiver terminated
>>
>> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [349072.088385] drbd drbd0: Connection closed
>>
>> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>>
>> [349072.096436] drbd drbd0: receiver terminated
>>
>> [349072.100469] drbd drbd0: Restarting receiver thread
>>
>> [349072.104454] drbd drbd0: receiver (re)started
>>
>> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>>
>> --
>>
>> Best Regards,
>>
>> Serguei
>
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced v6.4..v6.5
> #regzbot title drbd: drbd_send_block() failed
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>
>
Saw the same problem today.
Didn't bisect, just started reverting commits.
I found out that a revert of
commit eeac7405c735acde8ec78869489a5aa25a141c13
drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
fixed drbd for me on Linux 6.5.0.
You will need to return these as well, otherwise you will not be able to
build:
commit dc97391e661009eab46783030d2404c9b6e6f2e7
sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
commit b848b26c6672c9b977890ba85f5a155e5eb221f0
net: Kill MSG_SENDPAGE_NOTLAST
Hope this helps.
Thomas
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: DRBD broken in kernel 6.5 and 6.5.1
2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-05 15:31 ` Thomas Voegtle
@ 2023-09-05 17:24 ` Christoph Böhmwalder
1 sibling, 0 replies; 9+ messages in thread
From: Christoph Böhmwalder @ 2023-09-05 17:24 UTC (permalink / raw)
To: Linux regressions mailing list, Serguei Ivantsov
Cc: Philipp Reisner, Lars Ellenberg, drbd-dev, LKML, David Howells
Am 03.09.23 um 07:45 schrieb Linux regression tracking (Thorsten Leemhuis):
> CCing the DRBD maintainers and the appropriate lists, as they should
> know about this -- or actually might know what is causing this already
> or be able to guess the cause. For the rest of this mail:
>
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.]
>
> On 02.09.23 22:37, Serguei Ivantsov wrote:
>> Hello,
>>
>> After upgrading the kernel to 6.5 the system can't connect to the peer
>> (6.4.11) anymore.
>> I checked 6.5.1 - same issue.
>> All previous kernels including 6.4.14 are working just fine.
>> Checking the 6.5 changelog, I found commit
>> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
>> to DRBD.
>>
>> On the 6.5.X system I have the following in the kernel log
>> (drbd_send_block() failed):
>>
>> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>>
>> [ 2.475394] drbd: built-in
>>
>> [ 2.477254] drbd: registered as block device major 147
>>
>> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>>
>> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>>
>> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>>
>> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>>
>> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
>> == 1845173184
>>
>> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
>> words=3603854 pages=7039
>>
>> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>>
>> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>>
>> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
>> additional 0 jiffies
>>
>> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
>> on disk bit-map.
>>
>> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>>
>> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>>
>> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>>
>> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
>>
>> [ 7.486918] drbd drbd0: receiver (re)started
>>
>> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 12.340212] drbd drbd0: initial packet S crossed
>>
>> [ 22.310856] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 22.400869] drbd drbd0/0 drbd0: self
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401205] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>>
>> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>>
>> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>>
>> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>>
>> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>>
>> [ 22.507451] drbd drbd0: ack_receiver terminated
>>
>> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 22.600693] drbd drbd0: Connection closed
>>
>> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 22.601115] drbd drbd0: receiver terminated
>>
>> [ 22.601238] drbd drbd0: Restarting receiver thread
>>
>> [ 22.601378] drbd drbd0: receiver (re)started
>>
>> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 23.260624] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 23.340771] drbd drbd0/0 drbd0: self
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>> bits:1 flags:0
>>
>> [ 23.341192] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>>
>> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>>
>> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
>> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0
>>
>> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0 exit code 0 (0x0)
>>
>> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
>> pdsk( Consistent -> Inconsistent )
>>
>> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
>> sync 4 KB [1 bits set]).
>>
>> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
>> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>>
>> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>>
>> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> SyncSource -> NetworkFailure )
>>
>> [ 23.517597] drbd drbd0: ack_receiver terminated
>>
>> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 23.690598] drbd drbd0: Connection closed
>>
>> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 23.712017] drbd drbd0: receiver terminated
>>
>> [ 23.721597] drbd drbd0: Restarting receiver thread
>>
>>
>>
>> On the peer:
>>
>>
>> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [349071.558245] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [349071.562105] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
>>
>> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [349071.692323] drbd drbd0/0 drbd0: self
>> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:1 flags:0
>>
>> [349071.699871] drbd drbd0/0 drbd0: peer
>> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
>> bits:1 flags:0
>>
>> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>>
>> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>>
>> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
>> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>>
>> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>>
>> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
>> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>>
>> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0
>>
>> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0 exit code 0 (0x0)
>>
>> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>>
>> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
>> sync 4 KB [1 bits set]).
>>
>> [349071.916117] drbd drbd0: sock was shut down by peer
>>
>> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
>> SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>>
>> [349071.927796] drbd drbd0: short read (expected size 4096)
>>
>> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>>
>> [349071.935864] drbd drbd0: ack_receiver terminated
>>
>> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [349072.088385] drbd drbd0: Connection closed
>>
>> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>>
>> [349072.096436] drbd drbd0: receiver terminated
>>
>> [349072.100469] drbd drbd0: Restarting receiver thread
>>
>> [349072.104454] drbd drbd0: receiver (re)started
>>
>> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>>
>> --
>>
>> Best Regards,
>>
>> Serguei
>
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced v6.4..v6.5
> #regzbot title drbd: drbd_send_block() failed
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
Hello,
thank you for the report. We were able to reproduce the issue and are
now investigating.
Based off of context we think that eeac7405c735 ("drbd: Use
sendmsg(MSG_SPLICE_PAGES) rather than sendpage()") is the culprit.
(See also the other reply by Thomas Voegtle, which confirms this.)
I'm also adding David Howells to CC, maybe he has an idea what could be
happening here.
--
Christoph Böhmwalder
LINBIT | Keeping the Digital World Running
DRBD HA — Disaster Recovery — Software defined Storage
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2023-09-05 17:24 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-02 20:31 [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1 Serguei Ivantsov
2023-09-05 15:23 ` Jakub Kicinski
2023-09-05 15:23 ` [Drbd-dev] " Jakub Kicinski
-- strict thread matches above, loose matches on Subject: below --
2023-09-02 20:37 Serguei Ivantsov
2023-09-02 21:09 ` Greg KH
2023-09-02 21:18 ` Serguei Ivantsov
2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-05 15:31 ` Thomas Voegtle
2023-09-05 17:24 ` Christoph Böhmwalder
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.