* [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
@ 2023-09-02 20:31 Serguei Ivantsov
2023-09-05 15:23 ` Jakub Kicinski
0 siblings, 1 reply; 5+ messages in thread
From: Serguei Ivantsov @ 2023-09-02 20:31 UTC (permalink / raw)
To: stable; +Cc: kuba, regressions, drbd-dev
[-- Attachment #1: Type: text/plain, Size: 8275 bytes --]
Hello,
After upgrading the kernel to 6.5 the system can't connect to the peer
(6.4.11) anymore.
I checked 6.5.1 - same issue.
All previous kernels including 6.4.14 are working just fine.
Checking the 6.5 changelog, I found commit
9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes to
DRBD.
On the 6.5.X system I have the following in the kernel log (drbd_send_block()
failed):
[ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
[ 2.475394] drbd: built-in
[ 2.477254] drbd: registered as block device major 147
[ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
[ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
[ 7.421552] drbd drbd0: Method to ensure write ordering: flush
[ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
[ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity ==
1845173184
[ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
words=3603854 pages=7039
[ 7.467370] drbd0: detected capacity change from 0 to 1845173184
[ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
[ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took additional 0
jiffies
[ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by on
disk bit-map.
[ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
[ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
[ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
[ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0
[3847])
[ 7.486918] drbd drbd0: receiver (re)started
[ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
[ 12.340212] drbd drbd0: initial packet S crossed
[ 22.310856] drbd drbd0: Handshake successful: Agreed network protocol
version 101
[ 22.311087] drbd drbd0: Feature flags enabled on protocol level: 0xf
TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
[ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
[4071])
[ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
[ 22.400869] drbd drbd0/0 drbd0: self
32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD bits:0
flags:0
[ 22.401205] drbd drbd0/0 drbd0: peer
32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
flags:0
[ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
[ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
[ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
[ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
sent=-5
[ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn( Connected ->
NetworkFailure ) pdsk( UpToDate -> DUnknown )
[ 22.507109] drbd drbd0/0 drbd0: new current UUID
7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
[ 22.507451] drbd drbd0: ack_receiver terminated
[ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
[ 22.600693] drbd drbd0: Connection closed
[ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
[ 22.601115] drbd drbd0: receiver terminated
[ 22.601238] drbd drbd0: Restarting receiver thread
[ 22.601378] drbd drbd0: receiver (re)started
[ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
[ 23.260624] drbd drbd0: Handshake successful: Agreed network protocol
version 101
[ 23.260859] drbd drbd0: Feature flags enabled on protocol level: 0xf
TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
[ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
[4071])
[ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
[ 23.340771] drbd drbd0/0 drbd0: self
7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD bits:1
flags:0
[ 23.341192] drbd drbd0/0 drbd0: peer
32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
flags:0
[ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
[ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
[ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0
[ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-source minor-0 exit code 0 (0x0)
[ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource ) pdsk(
Consistent -> Inconsistent )
[ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will sync 4
KB [1 bits set]).
[ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
[ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
sent=-5
[ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
[ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn( SyncSource ->
NetworkFailure )
[ 23.517597] drbd drbd0: ack_receiver terminated
[ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
[ 23.690598] drbd drbd0: Connection closed
[ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
[ 23.712017] drbd drbd0: receiver terminated
[ 23.721597] drbd drbd0: Restarting receiver thread
On the peer:
[349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
[349071.558245] drbd drbd0: Handshake successful: Agreed network protocol
version 101
[349071.562105] drbd drbd0: Feature flags enabled on protocol level: 0xf
TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
[349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
[349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
[2660])
[349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
[349071.692323] drbd drbd0/0 drbd0: self
3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:1
flags:0
[349071.699871] drbd drbd0/0 drbd0: peer
7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B bits:1
flags:0
[349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
[349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
[349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
[349071.723039] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[349071.732489] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
plain 0(0), RLE 23(1), total 23; compression: 100.0%
[349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
[349071.787113] drbd drbd0/0 drbd0: updated sync uuid
3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
[349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0
[349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
before-resync-target minor-0 exit code 0 (0x0)
[349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
[349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will sync 4
KB [1 bits set]).
[349071.916117] drbd drbd0: sock was shut down by peer
[349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn( SyncTarget ->
BrokenPipe ) pdsk( UpToDate -> DUnknown )
[349071.927796] drbd drbd0: short read (expected size 4096)
[349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
[349071.935864] drbd drbd0: ack_receiver terminated
[349071.939906] drbd drbd0: Terminating drbd_a_drbd0
[349072.088385] drbd drbd0: Connection closed
[349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
[349072.096436] drbd drbd0: receiver terminated
[349072.100469] drbd drbd0: Restarting receiver thread
[349072.104454] drbd drbd0: receiver (re)started
[349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
--
Best Regards,
Serguei
[-- Attachment #2: Type: text/html, Size: 35095 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
[not found] <CAKH+VT3YLmAn0Y8=q37UTDShqxDLsqPcQ4hBMzY7HPn7zNx+RQ@mail.gmail.com>
@ 2023-09-03 5:45 ` Linux regression tracking (Thorsten Leemhuis)
2023-09-05 15:31 ` Thomas Voegtle
2023-09-05 17:24 ` Christoph Böhmwalder
0 siblings, 2 replies; 5+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-09-03 5:45 UTC (permalink / raw)
To: Serguei Ivantsov, regressions
Cc: Lars Ellenberg, Philipp Reisner, LKML, drbd-dev
CCing the DRBD maintainers and the appropriate lists, as they should
know about this -- or actually might know what is causing this already
or be able to guess the cause. For the rest of this mail:
[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few templates
paragraphs you might have encountered already in similar form.]
On 02.09.23 22:37, Serguei Ivantsov wrote:
> Hello,
>
> After upgrading the kernel to 6.5 the system can't connect to the peer
> (6.4.11) anymore.
> I checked 6.5.1 - same issue.
> All previous kernels including 6.4.14 are working just fine.
> Checking the 6.5 changelog, I found commit
> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
> to DRBD.
>
> On the 6.5.X system I have the following in the kernel log
> (drbd_send_block() failed):
>
> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>
> [ 2.475394] drbd: built-in
>
> [ 2.477254] drbd: registered as block device major 147
>
> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>
> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>
> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>
> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>
> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
> == 1845173184
>
> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
> words=3603854 pages=7039
>
> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>
> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>
> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
> additional 0 jiffies
>
> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
> on disk bit-map.
>
> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>
> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>
> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>
> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
>
> [ 7.486918] drbd drbd0: receiver (re)started
>
> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 12.340212] drbd drbd0: initial packet S crossed
>
> [ 22.310856] drbd drbd0: Handshake successful: Agreed network
> protocol version 101
>
> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>
> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 22.400869] drbd drbd0/0 drbd0: self
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
> bits:0 flags:0
>
> [ 22.401205] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
> bits:0 flags:0
>
> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>
> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>
> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>
> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>
> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>
> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>
> [ 22.507451] drbd drbd0: ack_receiver terminated
>
> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 22.600693] drbd drbd0: Connection closed
>
> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 22.601115] drbd drbd0: receiver terminated
>
> [ 22.601238] drbd drbd0: Restarting receiver thread
>
> [ 22.601378] drbd drbd0: receiver (re)started
>
> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 23.260624] drbd drbd0: Handshake successful: Agreed network
> protocol version 101
>
> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>
> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 23.340771] drbd drbd0/0 drbd0: self
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
> bits:1 flags:0
>
> [ 23.341192] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
> bits:0 flags:0
>
> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>
> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>
> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0
>
> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0 exit code 0 (0x0)
>
> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
> pdsk( Consistent -> Inconsistent )
>
> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
> sync 4 KB [1 bits set]).
>
> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>
> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>
> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>
> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
> SyncSource -> NetworkFailure )
>
> [ 23.517597] drbd drbd0: ack_receiver terminated
>
> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 23.690598] drbd drbd0: Connection closed
>
> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 23.712017] drbd drbd0: receiver terminated
>
> [ 23.721597] drbd drbd0: Restarting receiver thread
>
>
>
> On the peer:
>
>
> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [349071.558245] drbd drbd0: Handshake successful: Agreed network
> protocol version 101
>
> [349071.562105] drbd drbd0: Feature flags enabled on protocol level:
> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
>
> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [349071.692323] drbd drbd0/0 drbd0: self
> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
> bits:1 flags:0
>
> [349071.699871] drbd drbd0/0 drbd0: peer
> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
> bits:1 flags:0
>
> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>
> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>
> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>
> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats
> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>
> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>
> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
>
> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0 exit code 0 (0x0)
>
> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>
> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
> sync 4 KB [1 bits set]).
>
> [349071.916117] drbd drbd0: sock was shut down by peer
>
> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
> SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>
> [349071.927796] drbd drbd0: short read (expected size 4096)
>
> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>
> [349071.935864] drbd drbd0: ack_receiver terminated
>
> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>
> [349072.088385] drbd drbd0: Connection closed
>
> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>
> [349072.096436] drbd drbd0: receiver terminated
>
> [349072.100469] drbd drbd0: Restarting receiver thread
>
> [349072.104454] drbd drbd0: receiver (re)started
>
> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>
>
> --
>
> Best Regards,
>
> Serguei
Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:
#regzbot ^introduced v6.4..v6.5
#regzbot title drbd: drbd_send_block() failed
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
2023-09-02 20:31 Serguei Ivantsov
@ 2023-09-05 15:23 ` Jakub Kicinski
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2023-09-05 15:23 UTC (permalink / raw)
To: Serguei Ivantsov; +Cc: David Howells, regressions, stable, drbd-dev
CC: David
On Sat, 2 Sep 2023 22:31:06 +0200 Serguei Ivantsov wrote:
> Hello,
>
> After upgrading the kernel to 6.5 the system can't connect to the peer
> (6.4.11) anymore.
> I checked 6.5.1 - same issue.
> All previous kernels including 6.4.14 are working just fine.
> Checking the 6.5 changelog, I found commit
> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes to
> DRBD.
I don't see anything obviously wrong here, maybe David has better
ideas. Can you try this?
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 79ab532aabaf..c607d4304608 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1539,8 +1539,6 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
int offset, size_t size, unsigned msg_flags)
{
struct socket *socket = peer_device->connection->data.socket;
- struct msghdr msg = { .msg_flags = msg_flags, };
- struct bio_vec bvec;
int len = size;
int err = -EIO;
@@ -1551,10 +1549,13 @@ static int _drbd_send_page(struct drbd_peer_device *peer_device, struct page *pa
* __page_cache_release a page that would actually still be referenced
* by someone, leading to some obscure delayed Oops somewhere else. */
if (!drbd_disable_sendpage && sendpage_ok(page))
- msg.msg_flags |= MSG_NOSIGNAL | MSG_SPLICE_PAGES;
+ msg_flags |= MSG_SPLICE_PAGES;
+ msg_flags |= MSG_NOSIGNAL;
drbd_update_congested(peer_device->connection);
do {
+ struct msghdr msg = { .msg_flags = msg_flags, };
+ struct bio_vec bvec;
int sent;
bvec_set_page(&bvec, page, offset, len);
> On the 6.5.X system I have the following in the kernel log (drbd_send_block()
> failed):
>
> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>
> [ 2.475394] drbd: built-in
>
> [ 2.477254] drbd: registered as block device major 147
>
> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>
> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>
> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>
> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>
> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity ==
> 1845173184
>
> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
> words=3603854 pages=7039
>
> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>
> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>
> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took additional 0
> jiffies
>
> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by on
> disk bit-map.
>
> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>
> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>
> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>
> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0
> [3847])
>
> [ 7.486918] drbd drbd0: receiver (re)started
>
> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 12.340212] drbd drbd0: initial packet S crossed
>
> [ 22.310856] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [4071])
>
> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 22.400869] drbd drbd0/0 drbd0: self
> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD bits:0
> flags:0
>
> [ 22.401205] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
> flags:0
>
> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>
> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>
> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>
> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
> sent=-5
>
> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn( Connected ->
> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>
> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>
> [ 22.507451] drbd drbd0: ack_receiver terminated
>
> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 22.600693] drbd drbd0: Connection closed
>
> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 22.601115] drbd drbd0: receiver terminated
>
> [ 22.601238] drbd drbd0: Restarting receiver thread
>
> [ 22.601378] drbd drbd0: receiver (re)started
>
> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [ 23.260624] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [4071])
>
> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [ 23.340771] drbd drbd0/0 drbd0: self
> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD bits:1
> flags:0
>
> [ 23.341192] drbd drbd0/0 drbd0: peer
> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:0
> flags:0
>
> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>
> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>
> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0
>
> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-source minor-0 exit code 0 (0x0)
>
> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource ) pdsk(
> Consistent -> Inconsistent )
>
> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will sync 4
> KB [1 bits set]).
>
> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>
> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096
> sent=-5
>
> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>
> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn( SyncSource ->
> NetworkFailure )
>
> [ 23.517597] drbd drbd0: ack_receiver terminated
>
> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>
> [ 23.690598] drbd drbd0: Connection closed
>
> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>
> [ 23.712017] drbd drbd0: receiver terminated
>
> [ 23.721597] drbd drbd0: Restarting receiver thread
>
>
>
> On the peer:
>
>
> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>
> [349071.558245] drbd drbd0: Handshake successful: Agreed network protocol
> version 101
>
> [349071.562105] drbd drbd0: Feature flags enabled on protocol level: 0xf
> TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>
> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>
> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0
> [2660])
>
> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>
> [349071.692323] drbd drbd0/0 drbd0: self
> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD bits:1
> flags:0
>
> [349071.699871] drbd drbd0/0 drbd0: peer
> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B bits:1
> flags:0
>
> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>
> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>
> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>
> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>
> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>
> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>
> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
>
> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0 exit code 0 (0x0)
>
> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>
> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will sync 4
> KB [1 bits set]).
>
> [349071.916117] drbd drbd0: sock was shut down by peer
>
> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn( SyncTarget ->
> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>
> [349071.927796] drbd drbd0: short read (expected size 4096)
>
> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>
> [349071.935864] drbd drbd0: ack_receiver terminated
>
> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>
> [349072.088385] drbd drbd0: Connection closed
>
> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>
> [349072.096436] drbd drbd0: receiver terminated
>
> [349072.100469] drbd drbd0: Restarting receiver thread
>
> [349072.104454] drbd drbd0: receiver (re)started
>
> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>
>
> --
>
> Best Regards,
>
> Serguei
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
2023-09-03 5:45 ` [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1 Linux regression tracking (Thorsten Leemhuis)
@ 2023-09-05 15:31 ` Thomas Voegtle
2023-09-05 17:24 ` Christoph Böhmwalder
1 sibling, 0 replies; 5+ messages in thread
From: Thomas Voegtle @ 2023-09-05 15:31 UTC (permalink / raw)
To: Linux regressions mailing list
Cc: Serguei Ivantsov, Philipp Reisner, LKML, David Howells,
Lars Ellenberg, drbd-dev
On Sun, 3 Sep 2023, Linux regression tracking (Thorsten Leemhuis) wrote:
> CCing the DRBD maintainers and the appropriate lists, as they should
> know about this -- or actually might know what is causing this already
> or be able to guess the cause. For the rest of this mail:
>
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.]
>
> On 02.09.23 22:37, Serguei Ivantsov wrote:
>> Hello,
>>
>> After upgrading the kernel to 6.5 the system can't connect to the peer
>> (6.4.11) anymore.
>> I checked 6.5.1 - same issue.
>> All previous kernels including 6.4.14 are working just fine.
>> Checking the 6.5 changelog, I found commit
>> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
>> to DRBD.
>>
>> On the 6.5.X system I have the following in the kernel log
>> (drbd_send_block() failed):
>>
>> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>>
>> [ 2.475394] drbd: built-in
>>
>> [ 2.477254] drbd: registered as block device major 147
>>
>> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>>
>> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>>
>> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>>
>> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>>
>> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
>> == 1845173184
>>
>> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
>> words=3603854 pages=7039
>>
>> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>>
>> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>>
>> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
>> additional 0 jiffies
>>
>> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
>> on disk bit-map.
>>
>> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>>
>> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>>
>> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>>
>> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
>>
>> [ 7.486918] drbd drbd0: receiver (re)started
>>
>> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 12.340212] drbd drbd0: initial packet S crossed
>>
>> [ 22.310856] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 22.400869] drbd drbd0/0 drbd0: self
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401205] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>>
>> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>>
>> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>>
>> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>>
>> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>>
>> [ 22.507451] drbd drbd0: ack_receiver terminated
>>
>> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 22.600693] drbd drbd0: Connection closed
>>
>> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 22.601115] drbd drbd0: receiver terminated
>>
>> [ 22.601238] drbd drbd0: Restarting receiver thread
>>
>> [ 22.601378] drbd drbd0: receiver (re)started
>>
>> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 23.260624] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 23.340771] drbd drbd0/0 drbd0: self
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>> bits:1 flags:0
>>
>> [ 23.341192] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>>
>> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>>
>> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
>> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0
>>
>> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0 exit code 0 (0x0)
>>
>> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
>> pdsk( Consistent -> Inconsistent )
>>
>> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
>> sync 4 KB [1 bits set]).
>>
>> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
>> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>>
>> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>>
>> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> SyncSource -> NetworkFailure )
>>
>> [ 23.517597] drbd drbd0: ack_receiver terminated
>>
>> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 23.690598] drbd drbd0: Connection closed
>>
>> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 23.712017] drbd drbd0: receiver terminated
>>
>> [ 23.721597] drbd drbd0: Restarting receiver thread
>>
>>
>>
>> On the peer:
>>
>>
>> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [349071.558245] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [349071.562105] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
>>
>> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [349071.692323] drbd drbd0/0 drbd0: self
>> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:1 flags:0
>>
>> [349071.699871] drbd drbd0/0 drbd0: peer
>> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
>> bits:1 flags:0
>>
>> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>>
>> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>>
>> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
>> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>>
>> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>>
>> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
>> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>>
>> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0
>>
>> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0 exit code 0 (0x0)
>>
>> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>>
>> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
>> sync 4 KB [1 bits set]).
>>
>> [349071.916117] drbd drbd0: sock was shut down by peer
>>
>> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
>> SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>>
>> [349071.927796] drbd drbd0: short read (expected size 4096)
>>
>> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>>
>> [349071.935864] drbd drbd0: ack_receiver terminated
>>
>> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [349072.088385] drbd drbd0: Connection closed
>>
>> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>>
>> [349072.096436] drbd drbd0: receiver terminated
>>
>> [349072.100469] drbd drbd0: Restarting receiver thread
>>
>> [349072.104454] drbd drbd0: receiver (re)started
>>
>> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>>
>> --
>>
>> Best Regards,
>>
>> Serguei
>
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced v6.4..v6.5
> #regzbot title drbd: drbd_send_block() failed
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
>
>
Saw the same problem today.
Didn't bisect, just started reverting commits.
I found out that a revert of
commit eeac7405c735acde8ec78869489a5aa25a141c13
drbd: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage()
fixed drbd for me on Linux 6.5.0.
You will need to return these as well, otherwise you will not be able to
build:
commit dc97391e661009eab46783030d2404c9b6e6f2e7
sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)
commit b848b26c6672c9b977890ba85f5a155e5eb221f0
net: Kill MSG_SENDPAGE_NOTLAST
Hope this helps.
Thomas
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1
2023-09-03 5:45 ` [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1 Linux regression tracking (Thorsten Leemhuis)
2023-09-05 15:31 ` Thomas Voegtle
@ 2023-09-05 17:24 ` Christoph Böhmwalder
1 sibling, 0 replies; 5+ messages in thread
From: Christoph Böhmwalder @ 2023-09-05 17:24 UTC (permalink / raw)
To: Linux regressions mailing list, Serguei Ivantsov
Cc: David Howells, LKML, Lars Ellenberg, Philipp Reisner, drbd-dev
Am 03.09.23 um 07:45 schrieb Linux regression tracking (Thorsten Leemhuis):
> CCing the DRBD maintainers and the appropriate lists, as they should
> know about this -- or actually might know what is causing this already
> or be able to guess the cause. For the rest of this mail:
>
> [TLDR: I'm adding this report to the list of tracked Linux kernel
> regressions; the text you find below is based on a few templates
> paragraphs you might have encountered already in similar form.]
>
> On 02.09.23 22:37, Serguei Ivantsov wrote:
>> Hello,
>>
>> After upgrading the kernel to 6.5 the system can't connect to the peer
>> (6.4.11) anymore.
>> I checked 6.5.1 - same issue.
>> All previous kernels including 6.4.14 are working just fine.
>> Checking the 6.5 changelog, I found commit
>> 9ae440b8fdd6772b6c007fa3d3766530a09c9045 which mentioned some changes
>> to DRBD.
>>
>> On the 6.5.X system I have the following in the kernel log
>> (drbd_send_block() failed):
>>
>> [ 2.473497] drbd: initialized. Version: 8.4.11 (api:1/proto:86-101)
>>
>> [ 2.475394] drbd: built-in
>>
>> [ 2.477254] drbd: registered as block device major 147
>>
>> [ 7.421400] drbd drbd0: Starting worker thread (from drbdsetup-84 [3844])
>>
>> [ 7.421509] drbd drbd0/0 drbd0: disk( Diskless -> Attaching )
>>
>> [ 7.421552] drbd drbd0: Method to ensure write ordering: flush
>>
>> [ 7.421554] drbd drbd0/0 drbd0: max BIO size = 131072
>>
>> [ 7.421557] drbd drbd0/0 drbd0: drbd_bm_resize called with capacity
>> == 1845173184
>>
>> [ 7.428017] drbd drbd0/0 drbd0: resync bitmap: bits=230646648
>> words=3603854 pages=7039
>>
>> [ 7.467370] drbd0: detected capacity change from 0 to 1845173184
>>
>> [ 7.467372] drbd drbd0/0 drbd0: size = 880 GB (922586592 KB)
>>
>> [ 7.486005] drbd drbd0/0 drbd0: recounting of set bits took
>> additional 0 jiffies
>>
>> [ 7.486010] drbd drbd0/0 drbd0: 0 KB (0 bits) marked out-of-sync by
>> on disk bit-map.
>>
>> [ 7.486017] drbd drbd0/0 drbd0: disk( Attaching -> UpToDate )
>>
>> [ 7.486021] drbd drbd0/0 drbd0: attached to UUIDs
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>>
>> [ 7.486863] drbd drbd0: conn( StandAlone -> Unconnected )
>>
>> [ 7.486871] drbd drbd0: Starting receiver thread (from drbd_w_drbd0 [3847])
>>
>> [ 7.486918] drbd drbd0: receiver (re)started
>>
>> [ 7.486929] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 12.340212] drbd drbd0: initial packet S crossed
>>
>> [ 22.310856] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 22.311087] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 22.311425] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 22.311621] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 22.400702] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 22.400869] drbd drbd0/0 drbd0: self
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DD:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401205] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 22.401538] drbd drbd0/0 drbd0: uuid_compare()=0 by rule 40
>>
>> [ 22.401709] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> Connected ) pdsk( DUnknown -> UpToDate )
>>
>> [ 22.415394] drbd drbd0/0 drbd0: role( Secondary -> Primary )
>>
>> [ 22.506540] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 22.506773] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
>>
>> [ 22.507109] drbd drbd0/0 drbd0: new current UUID
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>>
>> [ 22.507451] drbd drbd0: ack_receiver terminated
>>
>> [ 22.507588] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 22.600693] drbd drbd0: Connection closed
>>
>> [ 22.600937] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 22.601115] drbd drbd0: receiver terminated
>>
>> [ 22.601238] drbd drbd0: Restarting receiver thread
>>
>> [ 22.601378] drbd drbd0: receiver (re)started
>>
>> [ 22.601508] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [ 23.260624] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [ 23.260859] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [ 23.261187] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [ 23.261367] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [4071])
>>
>> [ 23.340593] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [ 23.340771] drbd drbd0/0 drbd0: self
>> 7F8B15C04AF49C4D:32DDB2019708F68B:7D97648599B446DD:7D96648599B446DD
>> bits:1 flags:0
>>
>> [ 23.341192] drbd drbd0/0 drbd0: peer
>> 32DDB2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:0 flags:0
>>
>> [ 23.341649] drbd drbd0/0 drbd0: uuid_compare()=1 by rule 70
>>
>> [ 23.341824] drbd drbd0/0 drbd0: peer( Unknown -> Secondary ) conn(
>> WFReportParams -> WFBitMapS ) pdsk( DUnknown -> Consistent )
>>
>> [ 23.344911] drbd drbd0/0 drbd0: send bitmap stats [Bytes(packets)]:
>> plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.396792] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [ 23.397210] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0
>>
>> [ 23.407965] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-source minor-0 exit code 0 (0x0)
>>
>> [ 23.417547] drbd drbd0/0 drbd0: conn( WFBitMapS -> SyncSource )
>> pdsk( Consistent -> Inconsistent )
>>
>> [ 23.426697] drbd drbd0/0 drbd0: Began resync as SyncSource (will
>> sync 4 KB [1 bits set]).
>>
>> [ 23.435638] drbd drbd0/0 drbd0: updated sync UUID
>> 7F8B15C04AF49C4D:32DEB2019708F68B:32DDB2019708F68B:7D97648599B446DD
>>
>> [ 23.488608] drbd drbd0/0 drbd0: _drbd_send_page: size=4096 len=4096 sent=-5
>>
>> [ 23.498182] drbd drbd0/0 drbd0: drbd_send_block() failed
>>
>> [ 23.508498] drbd drbd0: peer( Secondary -> Unknown ) conn(
>> SyncSource -> NetworkFailure )
>>
>> [ 23.517597] drbd drbd0: ack_receiver terminated
>>
>> [ 23.527513] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [ 23.690598] drbd drbd0: Connection closed
>>
>> [ 23.701857] drbd drbd0: conn( NetworkFailure -> Unconnected )
>>
>> [ 23.712017] drbd drbd0: receiver terminated
>>
>> [ 23.721597] drbd drbd0: Restarting receiver thread
>>
>>
>>
>> On the peer:
>>
>>
>> [349071.038278] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>> [349071.558245] drbd drbd0: Handshake successful: Agreed network
>> protocol version 101
>>
>> [349071.562105] drbd drbd0: Feature flags enabled on protocol level:
>> 0xf TRIM THIN_RESYNC WRITE_SAME WRITE_ZEROES.
>>
>> [349071.569889] drbd drbd0: conn( WFConnection -> WFReportParams )
>>
>> [349071.573802] drbd drbd0: Starting ack_recv thread (from drbd_r_drbd0 [2660])
>>
>> [349071.688547] drbd drbd0/0 drbd0: drbd_sync_handshake:
>>
>> [349071.692323] drbd drbd0/0 drbd0: self
>> 3375B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>> bits:1 flags:0
>>
>> [349071.699871] drbd drbd0/0 drbd0: peer
>> 7F8B15C04AF49C4D:3375B2019708F68B:3374B2019708F68B:3373B2019708F68B
>> bits:1 flags:0
>>
>> [349071.707687] drbd drbd0/0 drbd0: uuid_compare()=-1 by rule 50
>>
>> [349071.711563] drbd drbd0/0 drbd0: Becoming sync target due to disk states.
>>
>> [349071.715381] drbd drbd0/0 drbd0: peer( Unknown -> Primary ) conn(
>> WFReportParams -> WFBitMapT ) pdsk( DUnknown -> UpToDate )
>>
>> [349071.723039] drbd drbd0/0 drbd0: receive bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.732489] drbd drbd0/0 drbd0: send bitmap stats
>> [Bytes(packets)]: plain 0(0), RLE 23(1), total 23; compression: 100.0%
>>
>> [349071.740178] drbd drbd0/0 drbd0: conn( WFBitMapT -> WFSyncUUID )
>>
>> [349071.787113] drbd drbd0/0 drbd0: updated sync uuid
>> 3376B2019708F68A:0000000000000000:7D97648599B446DC:7D96648599B446DD
>>
>> [349071.794907] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0
>>
>> [349071.800006] drbd drbd0/0 drbd0: helper command: /sbin/drbdadm
>> before-resync-target minor-0 exit code 0 (0x0)
>>
>> [349071.807737] drbd drbd0/0 drbd0: conn( WFSyncUUID -> SyncTarget )
>>
>> [349071.811639] drbd drbd0/0 drbd0: Began resync as SyncTarget (will
>> sync 4 KB [1 bits set]).
>>
>> [349071.916117] drbd drbd0: sock was shut down by peer
>>
>> [349071.919955] drbd drbd0: peer( Primary -> Unknown ) conn(
>> SyncTarget -> BrokenPipe ) pdsk( UpToDate -> DUnknown )
>>
>> [349071.927796] drbd drbd0: short read (expected size 4096)
>>
>> [349071.931812] drbd drbd0: error receiving RSDataReply, e: -5 l: 4096!
>>
>> [349071.935864] drbd drbd0: ack_receiver terminated
>>
>> [349071.939906] drbd drbd0: Terminating drbd_a_drbd0
>>
>> [349072.088385] drbd drbd0: Connection closed
>>
>> [349072.092398] drbd drbd0: conn( BrokenPipe -> Unconnected )
>>
>> [349072.096436] drbd drbd0: receiver terminated
>>
>> [349072.100469] drbd drbd0: Restarting receiver thread
>>
>> [349072.104454] drbd drbd0: receiver (re)started
>>
>> [349072.108373] drbd drbd0: conn( Unconnected -> WFConnection )
>>
>>
>> --
>>
>> Best Regards,
>>
>> Serguei
>
>
> Thanks for the report. To be sure the issue doesn't fall through the
> cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
> tracking bot:
>
> #regzbot ^introduced v6.4..v6.5
> #regzbot title drbd: drbd_send_block() failed
> #regzbot ignore-activity
>
> This isn't a regression? This issue or a fix for it are already
> discussed somewhere else? It was fixed already? You want to clarify when
> the regression started to happen? Or point out I got the title or
> something else totally wrong? Then just reply and tell me -- ideally
> while also telling regzbot about it, as explained by the page listed in
> the footer of this mail.
>
> Developers: When fixing the issue, remember to add 'Link:' tags pointing
> to the report (the parent of this mail). See page linked in footer for
> details.
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> That page also explains what to do if mails like this annoy you.
Hello,
thank you for the report. We were able to reproduce the issue and are
now investigating.
Based off of context we think that eeac7405c735 ("drbd: Use
sendmsg(MSG_SPLICE_PAGES) rather than sendpage()") is the culprit.
(See also the other reply by Thomas Voegtle, which confirms this.)
I'm also adding David Howells to CC, maybe he has an idea what could be
happening here.
--
Christoph Böhmwalder
LINBIT | Keeping the Digital World Running
DRBD HA — Disaster Recovery — Software defined Storage
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2023-09-05 17:24 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <CAKH+VT3YLmAn0Y8=q37UTDShqxDLsqPcQ4hBMzY7HPn7zNx+RQ@mail.gmail.com>
2023-09-03 5:45 ` [Drbd-dev] DRBD broken in kernel 6.5 and 6.5.1 Linux regression tracking (Thorsten Leemhuis)
2023-09-05 15:31 ` Thomas Voegtle
2023-09-05 17:24 ` Christoph Böhmwalder
2023-09-02 20:31 Serguei Ivantsov
2023-09-05 15:23 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox