From: Marc Schiffbauer <m@sys4.de>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] drbd 8.4.3: refcounter overflow on re-sync
Date: Fri, 19 Sep 2014 17:16:53 +0200
Message-ID: <20140919151653.GH21578@schiffbauer.net>
In-Reply-To: <20140919144805.GS13125@soda.linbit>

* Lars Ellenberg wrote on 19.09.14 at 16:48:
>On Fri, Sep 19, 2014 at 11:49:09AM +0200, Marc Schiffbauer wrote:
>> Hi,
>>
>
>If you resolve that to a code line,
>I may be able to figure out what PAX is talking about.
>
>But from this stack trace alone, I have absolutely no idea what PAX
>is trying to say, which refcount could possibly be meant there,
>let alone why it could possibly overflow.
>
>Ah, ok. Looking at [1], "PaX Team" says:
>.---
>| after having looked at the drbd code a bit i think this could be a
>| real bug in drbd but only upstream can tell for sure so you'll have to
>| contact them. you can show them the following that i figured out so far:
>|
>| the refcount overflow was detected in
>| drivers/block/drbd/drbd_bitmap.c:bm_page_io_async at the
>|
>| atomic_add(len >> 9, &mdev->rs_sect_ev)
>
>Well, yes, why would it not overflow.
>It is *not* a refcount.
>It is an atomic counter.
>It is meant to overflow.

Ok, then I can report this back and it should be fixed in PaX as a
false positive. Thanks for clarifying this.


>
>| statement. rs_sect_ev is an atomic_t in struct drbd_conf declared in
>| drivers/block/drbd/drbd_int.h (i'll note here that i think the
>| rs_sect_in field is similarly affected by this problem).
>|
>| based on the code, these two fields don't look like refcounts, nor are
>| they free-running counters or statistics either (the usual cases for
>| false positives). instead they're some sector counts that get reset on
>| certain events (the details of which i can't tell as i don't know the
>| drbd code). therefore my feeling is that these counts are not supposed
>| to overflow as they'd otherwise lead to incorrect calculations in
>| drbd_rs_should_slow_down and drbd_rs_controller (the latter reads
>| rs_sect_in into an unsigned int btw, this is mixing up signed/unsigned
>| integers, that can't be good...).
>
>But yes, it *is* a "free running counter".
>For IO that has to be accounted to the resyncer.
>They are reset whenever a new resync starts.
>
>Apparently you are syncing more than 2**41 bytes.
>That's ok.  Others do too.
>Having a lot of storage is no reason to drop the connection.
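
For scale, here is a rough sketch of the arithmetic behind these figures.
It assumes 512-byte sectors (as the `len >> 9` in the quoted statement
suggests) and a 32-bit counter; the helper name is made up for illustration
and is not taken from drbd. The positive range of a signed 32-bit value is
used up after 2^31 sectors (2^40 bytes, about 1 TiB), and the full 32-bit
range wraps after 2^32 sectors (2^41 bytes).

```c
#include <stdint.h>

/* Assumption: 512-byte sectors, matching the `len >> 9` in the quoted code. */
#define SECTOR_SHIFT_ASSUMED 9

/*
 * Hypothetical helper, not drbd code: bytes of accounted I/O after which a
 * sector counter of `bits` width, starting at zero, runs out of range.
 * bits = 31 models the positive range of a signed 32-bit value,
 * bits = 32 models the full unsigned 32-bit range.
 */
static uint64_t bytes_until_wrap(unsigned int bits)
{
    return (UINT64_C(1) << bits) << SECTOR_SHIFT_ASSUMED;
}
```

So `bytes_until_wrap(31)` is 2^40 bytes and `bytes_until_wrap(32)` is 2^41
bytes, which a resync of a few terabytes reaches easily.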

ack


>
>| so what happened to you is that somehow rs_sect_ev reached 2G (that
>| corresponds to about 1TB of traffic between two counter resets or
>| 'events') and the signed overflow detection triggered on it (if that's
>| too unrealistic traffic for drbd then there was some other problem
>| calculating the sector counts that resulted in some big enough value to
>| trigger a signed overflow, though at the moment of the overflow 'len'
>| had a value of 8 only). in any case it looks that an atomic_t is not
>| enough to store real life sector counts and will have to be enlarged
>| probably (or the counters will have to be reset more frequently).
>`---
>
>It is perfectly ok for rs_sect_ev to overflow, it is meant to overflow.
>It is used in some signed modulo 32bit calculation only.
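
To illustrate why a wrapping counter can still give correct results, here is
a minimal sketch of a difference taken modulo 2^32. This is not the drbd
code; the function names are hypothetical, and it only shows the general
technique of subtracting two samples of a free-running 32-bit counter.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not drbd code: sectors accounted between two samples
 * of a free-running 32-bit counter. The subtraction is done on unsigned
 * values, where wraparound is well defined in C, so the result is correct
 * modulo 2^32 even if the counter wrapped between the two samples.
 */
static uint32_t sectors_since(uint32_t now, uint32_t before)
{
    return now - before;
}

/*
 * The same difference read as a signed value. It is meaningful as long as
 * the two samples are less than 2^31 sectors apart. The conversion to
 * int32_t is implementation-defined in ISO C for out-of-range values;
 * this sketch assumes the usual two's complement behaviour.
 */
static int32_t sectors_since_signed(uint32_t now, uint32_t before)
{
    return (int32_t)(now - before);
}
```

For example, a counter that went from 0xFFFFFFFE to 5 has advanced by 7
sectors, and `sectors_since(5, 0xFFFFFFFE)` returns exactly that, although
the raw value wrapped in between.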
>
>> [63999.116913]  [<ffffffffa006f617>] ? drbd_thread_setup+0x4e/0x117 [drbd]
>> [63999.116917]  [<ffffffffa006f5c9>] ? conn_destroy+0x86/0x86 [drbd]
>> [63999.116922]  [<ffffffff8107fbfc>] ? kthread+0xd5/0xdd
>> [63999.116924]  [<ffffffff8107fb27>] ? kthread_worker_fn+0xf9/0xf9
>> [63999.116929]  [<ffffffff81535f74>] ? ret_from_fork+0x74/0xa0
>> [63999.116930]  [<ffffffff8107fb27>] ? kthread_worker_fn+0xf9/0xf9
>> [63999.116931] Code: 48 89 de 4c 89 ff e8 c3 80 00 00 85 c0 0f 85 b2 00 00 00 8b 54 24 1c f0 41 01 97 d0 04 00 00 71 0a f0 41 29 97 d0 04 00 00 cd 04 <bb> 03 00 00 00 f0 41 ff 87 24 02 00 00 71 0a f0 41 ff 8f 24 02
>>
>>
>> and drbd itself says:
>>
>> [63999.116965] block drbd0: drbd_alloc_pages interrupted!
>
>Um, I'd guess it does say some things before that, too.
>Also, the other node will likely have something to say.
>
>Can you give some more context?
>
>Interrupted by whom?

by the PaX system in the kernel, to protect the system because this
*might* be an evil overflow.


>Is PAX delivering some signal?
>Which?

I don't know, sorry :-/

>Why would it do that?

see above

>If so, stop it :-)

OK ;)

If you are still interested in more context because you think there
might be something wrong in the drbd code, then I can boot the
previous kernel with PAX_REFCOUNT enabled to reproduce the problem
again.

But it would be some effort to do this again, so I want to be sure
it is of any use to you now that it is clear that this must be a
false positive.


Thanks
-Marc

-- 
[*] sys4 AG

http://sys4.de, +49 (89) 30 90 46 64
Franziskanerstraße 15, 81669 München

Registered office: München, Amtsgericht München: HRB 199263
Executive board: Patrick Ben Koetter, Marc Schiffbauer
Chairman of the supervisory board: Florian Kirstein

