From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Subject: Re: [Drbd-dev] drbd 8.4.3: refcounter overflow on re-sync
Date: Wed, 24 Sep 2014 14:50:22 +0200
Message-ID: <20140924125022.GE7118@soda.linbit>
In-Reply-To: <20140924101451.GC7118@soda.linbit>
On Wed, Sep 24, 2014 at 12:14:51PM +0200, Lars Ellenberg wrote:
> On Tue, Sep 23, 2014 at 08:14:21PM +0200, Marc Schiffbauer wrote:
> > * Lars Ellenberg wrote on 23.09.14 at 13:03:
> > >On Fri, Sep 19, 2014 at 05:16:53PM +0200, Marc Schiffbauer wrote:
> > >>* Lars Ellenberg wrote on 19.09.14 at 16:48:
> > >>>On Fri, Sep 19, 2014 at 11:49:09AM +0200, Marc Schiffbauer wrote:
> > >>>>Hi,
> > >>>>
> > >>>
> > >>>If you resolve that to a code line,
> > >>>I may be able to figure out what PAX is talking about.
> > >>>
> > >>>But from this stack trace alone, I have absolutely no idea what PAX
> > >>>is trying to say, which refcount could possibly be meant there,
> > >>>let alone why it could possibly overflow.
> > >>>
> > >>>Ah, ok. Looking at [1], "PaX Team" says:
> > >>>.---
> > >>>| after having looked at the drbd code a bit i think this could be a
> > >>>| real bug in drbd but only upstream can tell for sure so you'll have to
> > >>>| contact them. you can show them the following that i figured out so far:
> > >>>|
> > >>>| the refcount overflow was detected in
> > >>>| drivers/block/drbd/drbd_bitmap.c:bm_page_io_async at the
> > >>>|
> > >>>| atomic_add(len >> 9, &mdev->rs_sect_ev)
> > >>>
> > >>>Well, yes, why would it not overflow.
> > >>>It is *not* a refcount.
> > >>>It is an atomic counter.
> > >>>It is meant to overflow.
> >
> >
> > Another question PaX-Team is asking:
> >
> > what about rs_sect_in?
>
> That usually should not overflow, as it is typically regularly (several
> times per second) reset to zero (and for other reasons).
>
> If you manage to transfer more than 2 TiByte in subseconds via a single TCP
> connection, more power to you.
>
> Still, if it should overflow (for whatever reason), no real harm done.
> Arbitrarily sending a signal or terminating processes in that case would
> be the only actually disturbing thing.
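
For scale, the 2 TiByte figure quoted above follows from the counter
width. A minimal sketch of the arithmetic, assuming the counter is a
32-bit int that holds 512-byte sectors (as the "len >> 9" in the quoted
atomic_add suggests). The helper names are made up for illustration and
are not DRBD code:

```c
#include <limits.h>
#include <stdint.h>

#define SECTOR_SHIFT 9  /* 512-byte sectors, matching "len >> 9" */

/* Hypothetical helper: bytes represented by a given number of sectors. */
static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors << SECTOR_SHIFT;
}

/* Bytes accounted once a 32-bit counter has gone all the way around
 * (2^32 sectors). Assumes int is 32 bits wide. */
static uint64_t bytes_per_full_wrap(void)
{
    return sectors_to_bytes(UINT64_C(1) << 32);
}

/* Bytes accounted when a signed int would first exceed INT_MAX
 * (2^31 sectors on a 32-bit int). */
static uint64_t bytes_to_signed_overflow(void)
{
    return sectors_to_bytes((uint64_t)INT_MAX + 1);
}
```

Under these assumptions a full wrap corresponds to 2 TiB, and the signed
limit is crossed at 1 TiB, all of which would have to arrive between two
resets of the counter.
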

Ok.

So what PAX is really doing is redefining "atomic_add" and similar to
basically become a no-op if it would overflow.

    typedef struct { int counter; } atomic_t;

    void atomic_add(int i, atomic_t *v)
    {
        v->counter += i;
        if (that_caused_a_counter_wrap_in_any_direction) {
            /* oops, overflow: complain, then undo the add */
            SCREAM("help me, overflow...");
            v->counter -= i;
        }
    }

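The pseudocode above can be made compilable in userspace. What follows is
only a sketch of the behavior as described here, not PaX's actual
implementation: it is not atomic, the name checked_add is invented, and it
relies on the GCC/Clang builtin __builtin_add_overflow. It checks before
adding rather than adding and undoing, because signed overflow is undefined
behavior in plain C.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct { int counter; } atomic_t;

/*
 * Overflow-refusing add (hypothetical name): if the addition would wrap,
 * complain and leave the counter untouched. NOT atomic; this only models
 * the arithmetic. Returns true if the add was applied.
 */
static bool checked_add(int i, atomic_t *v)
{
    int sum;

    /* GCC/Clang builtin: true if v->counter + i does not fit in an int */
    if (__builtin_add_overflow(v->counter, i, &sum)) {
        fprintf(stderr, "help me, overflow...\n");
        return false;   /* counter unchanged: (x + y) became (x + 0) */
    }
    v->counter = sum;
    return true;
}
```

Starting from INT_MAX - 1, one checked_add(1, &v) succeeds and brings the
counter to INT_MAX; every further checked_add(1, &v) is refused and the
counter stays where it is.
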
If that *is* really an object refcount,
and somewhere there would be
    if (atomic_dec_and_test(that_count))
        free(some_object);
then ok, you have replaced one bug
with an error message and a different bug.
Might help with debugging. Not with much else.

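To make the "different bug" concrete, here is a small userspace sketch,
assuming the simplified model described above (an overflowing increment is
simply dropped). The struct and the obj_get/obj_put names are invented for
illustration, the count is a plain int rather than an atomic, and "freeing"
only sets a flag so the outcome can be inspected. Real PaX behavior may
differ from this model.

```c
#include <limits.h>
#include <stdbool.h>

struct object {
    int refs;    /* reference count (plain int, not atomic, in this sketch) */
    bool freed;  /* set instead of calling free() */
};

/* Take a reference; the increment is silently dropped on overflow. */
static void obj_get(struct object *o)
{
    int sum;

    if (!__builtin_add_overflow(o->refs, 1, &sum))
        o->refs = sum;
}

/* Drop a reference; "free" the object on the transition to zero. */
static void obj_put(struct object *o)
{
    if (--o->refs == 0)
        o->freed = true;
}
```

With refs starting at INT_MAX - 1, three calls to obj_get() leave the count
at INT_MAX (two increments were dropped), and the three matching calls to
obj_put() leave it at INT_MAX - 3. The count is now two lower than the
number of holders, so it would reach zero while two of them still use the
object.
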
But really.
Precautionary changing (x + y) to be silently identical to (x + 0),
"just in case", will surely generally improve program flow... D'oh.

Anyways, now that I know PAX is really just keeping that counter
at a fixed value of INT_MAX in this case, and nothing else,
what would have caused DRBD to disconnect/reconnect?
Could that have been you?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.