* [Drbd-dev] drbd 8.4.3: refcounter overflow on re-sync
@ 2014-09-19  9:49 Marc Schiffbauer
  2014-09-19 14:48 ` Lars Ellenberg
  0 siblings, 1 reply; 16+ messages in thread
From: Marc Schiffbauer @ 2014-09-19  9:49 UTC
  To: drbd-dev

Hi,

About a year ago I encountered a problem with drbd: on long-running
re-syncs, a refcounter overflow happens in the drbd module, resulting
in loss of the network connection (followed by a reconnect).

I am running a Linux kernel that is hardened with grsecurity and
PaX, which has a feature to detect such refcounter overflows
(CONFIG_PAX_REFCOUNT).

Now I have encountered the same problem again with a much newer kernel.

There are two possible causes for such a report:
1) a real bug in a part of the kernel (drbd in this case)
2) a false positive in PaX
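
To illustrate the distinction between the two cases: an overflow check of this kind cannot tell by itself whether the counter is a true reference count (where wrapping would be a bug) or a counter that is allowed to wrap (where the report would be a false positive). The following is a minimal Python sketch of the general idea on a signed 32-bit value. It is not the PaX or drbd code, and all names in it are hypothetical.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


class RefcountOverflow(Exception):
    """Raised when a checked add would leave the signed 32-bit range."""


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range (two's complement wrap)."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN


def unchecked_add(counter, delta):
    """Wrapping add: silently wraps around on overflow."""
    return wrap_int32(counter + delta)


def checked_add(counter, delta):
    """Checked add: refuse the update and report instead of wrapping."""
    result = counter + delta
    if result > INT32_MAX or result < INT32_MIN:
        raise RefcountOverflow(f"overflow adding {delta} to {counter}")
    return result
```

With a counter already at INT32_MAX, `unchecked_add(INT32_MAX, 1)` wraps to INT32_MIN, while `checked_add(INT32_MAX, 1)` raises. Whether the wrap is harmful depends on what the counter is used for, which is the open question here.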

The developer of PaX had a look at this issue and assumes a real bug
in drbd, but asked me to check with the drbd developers for details.

Please see [1].

Today, with a newer kernel, the issue looks like this:

[63999.116870] PAX: refcount overflow detected in: drbd_r_ms03:6378, uid/euid: 0/0
[63999.116875] CPU: 0 PID: 6378 Comm: drbd_r_ms03 Not tainted 3.14.18-hardened-r2 #1
[63999.116876] Hardware name: Supermicro X9DRi-LN4+/X9DR3-LN4+/X9DRi-LN4+/X9DR3-LN4+, BIOS 3.0b 05/27/2014
[63999.116878] task: ffff882f8b599010 ti: ffff882f8b599730 task.ti: ffff882f8b599730
[63999.116879] RIP: 0010:[<ffffffffa00663ca>]  [<ffffffffa00663ca>] ffffffffa00663ca
[63999.116882] RSP: 0000:ffffc90016483dd8  EFLAGS: 00000a02
[63999.116883] RAX: 0000000000000000 RBX: 000000027fd7cb00 RCX: ffff88306c17e6a0
[63999.116884] RDX: 0000000000000100 RSI: ffffffff818d2101 RDI: ffff882fb6c9d650
[63999.116884] RBP: ffff882f9c577010 R08: ffff88306c17e6a0 R09: ffff882fb6e76cc0
[63999.116885] R10: ffff882fb6e76cc0 R11: ffff882fb6e76cc0 R12: ffffc90016483e50
[63999.116886] R13: ffff882fb617f228 R14: ffff882f35028200 R15: ffff882fb617f000
[63999.116888] FS:  0000000000000000(0000) GS:ffff88307f200000(0000) knlGS:0000000000000000
[63999.116889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63999.116889] CR2: 00000320a5073008 CR3: 000000000154d000 CR4: 00000000001607f0
[63999.116890] Stack:
[63999.116891]  ffff882f8b85d800 0000000000000018 0000000000000018 0000010000000018
[63999.116893]  0000000000000000 ffff882f8b85d800 0000000000000009 00000000000000d8
[63999.116895]  0000000000000018 0000000000000018 0000000000000000 ffffffffa0068134
[63999.116896] Call Trace:
[63999.116907]  [<ffffffffa0068134>] ? drbdd_init+0x147/0x1d7 [drbd]
[63999.116913]  [<ffffffffa006f617>] ? drbd_thread_setup+0x4e/0x117 [drbd]
[63999.116917]  [<ffffffffa006f5c9>] ? conn_destroy+0x86/0x86 [drbd]
[63999.116922]  [<ffffffff8107fbfc>] ? kthread+0xd5/0xdd
[63999.116924]  [<ffffffff8107fb27>] ? kthread_worker_fn+0xf9/0xf9
[63999.116929]  [<ffffffff81535f74>] ? ret_from_fork+0x74/0xa0
[63999.116930]  [<ffffffff8107fb27>] ? kthread_worker_fn+0xf9/0xf9
[63999.116931] Code: 48 89 de 4c 89 ff e8 c3 80 00 00 85 c0 0f 85 b2 00 00 00 8b 54 24 1c f0 41 01 97 d0 04 00 00 71 0a f0 41 29 97 d0 04 00 00 cd 04 <bb> 03 00 00 00 f0 41 ff 87 24 02 00 00 71 0a f0 41 ff 8f 24 02
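
As a reading aid for the "Code:" line above: the byte in angle brackets marks the address in RIP. Decoded by hand, the bytes just before it appear to be `lock add %edx,0x4d0(%r15)`, `jno` (skip if no overflow), `lock sub %edx,0x4d0(%r15)` (undo the add) and `int $4` (raise the x86 overflow exception). The Python sketch below only checks the two simplest parts of that reading, the `jno` and `int $4` opcodes. The helper name is hypothetical and this is not a disassembler.

```python
# Byte dump copied from the "Code:" line of the oops above.
CODE = (
    "48 89 de 4c 89 ff e8 c3 80 00 00 85 c0 0f 85 b2 00 00 00 8b 54 24 1c "
    "f0 41 01 97 d0 04 00 00 71 0a f0 41 29 97 d0 04 00 00 cd 04 <bb> "
    "03 00 00 00 f0 41 ff 87 24 02 00 00 71 0a f0 41 ff 8f 24 02"
)


def split_at_fault(code):
    """Return (bytes before the marked address, bytes from it onward)."""
    before, after = [], []
    seen_marker = False
    for token in code.split():
        if token.startswith("<") and token.endswith(">"):
            seen_marker = True
            token = token[1:-1]
        (after if seen_marker else before).append(int(token, 16))
    return bytes(before), bytes(after)


before, after = split_at_fault(CODE)

# 0xCD 0x04 encodes "int $4"; after a software interrupt, RIP points
# at the next instruction, so these are the last two bytes before the marker.
trapped_by_int4 = before[-2:] == b"\xcd\x04"

# 0x71 0x0a encodes "jno +10": it jumps over the 8-byte undo and the
# 2-byte int $4 when the preceding add did not overflow.
guarded_by_jno = before[-12:-10] == b"\x71\x0a"
```

Both checks hold for this dump, which is consistent with the trap having been raised by the overflow check rather than by an ordinary fault.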


and drbd itself says:

[63999.116965] block drbd0: drbd_alloc_pages interrupted!
[63999.116968] d-con ms03: error receiving RSDataRequest, e: -12 l: 0!
[63999.116986] d-con ms03: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )
[63999.117021] d-con ms03: asender terminated
[63999.117025] d-con ms03: Terminating drbd_a_ms03
[63999.130575] d-con ms03: Connection closed
[63999.130599] d-con ms03: conn( ProtocolError -> Unconnected )
[63999.130601] d-con ms03: receiver terminated
[63999.130602] d-con ms03: Restarting receiver thread
[63999.130603] d-con ms03: receiver (re)started
[63999.130614] d-con ms03: conn( Unconnected -> WFConnection )
[64000.116691] d-con ms03: initial packet S crossed
[64009.195530] d-con ms03: Handshake successful: Agreed network protocol version 101
[64009.195807] d-con ms03: Peer authenticated using 64 bytes HMAC
[64009.195834] d-con ms03: conn( WFConnection -> WFReportParams )
[64009.195843] d-con ms03: Starting asender thread (from drbd_r_ms03 [6378])
[ ... and continues to sync ... ]
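
The connection state sequence in the drbd messages above can be pulled out mechanically. This is a small Python sketch that assumes the `name( Old -> New )` layout seen in these lines. The function name and regular expression are illustrative and not part of any drbd tooling.

```python
import re

# Matches fragments such as "conn( SyncSource -> ProtocolError )".
TRANSITION = re.compile(r"(\w+)\( (\w+) -> (\w+) \)")

# State-change lines copied from the log above.
LOG = [
    "[63999.116986] d-con ms03: peer( Secondary -> Unknown ) conn( SyncSource -> ProtocolError )",
    "[63999.130599] d-con ms03: conn( ProtocolError -> Unconnected )",
    "[63999.130614] d-con ms03: conn( Unconnected -> WFConnection )",
    "[64009.195834] d-con ms03: conn( WFConnection -> WFReportParams )",
]


def transitions(lines, field="conn"):
    """Collect (old, new) state pairs for the given field, in log order."""
    result = []
    for line in lines:
        for name, old, new in TRANSITION.findall(line):
            if name == field:
                result.append((old, new))
    return result


conn_states = transitions(LOG)
```

For the lines above, `conn_states` starts at SyncSource and ends at WFReportParams, passing through ProtocolError, Unconnected and WFConnection.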


Is this a real bug in drbd?

Thanks
-Marc



[1] https://forums.grsecurity.net/viewtopic.php?f=3&t=3786&p=13558&hilit=REFCOUNT#p13558
-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Registered office: München, District Court München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein

Thread overview: 16+ messages
2014-09-19  9:49 [Drbd-dev] drbd 8.4.3: refcounter overflow on re-sync Marc Schiffbauer
2014-09-19 14:48 ` Lars Ellenberg
2014-09-19 15:16   ` Marc Schiffbauer
2014-09-23 11:03     ` Lars Ellenberg
2014-09-23 17:08       ` Marc Schiffbauer
2014-09-24 10:04         ` Lars Ellenberg
2014-09-23 18:14       ` Marc Schiffbauer
2014-09-24 10:14         ` Lars Ellenberg
2014-09-24 12:50           ` Lars Ellenberg
2014-09-24 15:57             ` PaX Team
2014-09-24 16:31               ` Lars Ellenberg
2014-09-24 18:07                 ` PaX Team
2014-09-24 21:50                   ` Lars Ellenberg
2014-09-24 23:25                     ` PaX Team
2014-09-25  0:07                       ` Lars Ellenberg
2014-09-27  0:45                         ` PaX Team
