* [Drbd-dev] ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset )
@ 2016-09-19 19:09 Eric Wheeler
  2016-09-20  7:57 ` Lars Ellenberg
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Wheeler @ 2016-09-19 19:09 UTC
  To: drbd-dev

Hello all,

We noticed after resizing one backing device and not the other, that the 
side with the larger device issued the following assertion:

ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in drbd/obj/default/drbd_main.c:3257

We were issuing a drbdadm resize --assume-clean, but for "reasons", the 
far-end did not resize. Below you can see the trace of the side issuing 
assertions. Is this a bug that should be handled in a more intelligent way?

What does the failed assertion indicate? It appears to assert shortly 
after role change to-or-from Primary-or-Secondary.

We are using DRBD 8.4.7-2 from git.

[576057.341024] block drbd8033: logical block size of local backend does not match (drbd:512, backend:4096); was this a late attach?
[576057.343299] block drbd8033: drbd_bm_resize called with capacity == 21854808
[576057.344366] block drbd8033: resync bitmap: bits=2731851 words=42686 pages=84
[576057.345395] block drbd8033: size = 10 GB (10927404 KB)
[576057.381838] block drbd8033: Writing the whole bitmap, size changed
[576057.382865] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[576057.384064] block drbd8033: Resync of new storage suppressed with --assume-clean
[576061.509044] block drbd8033: logical block size of local backend does not match (drbd:512, backend:4096); was this a late attach?
[576911.559399] block drbd8033: role( Primary -> Secondary ) 
[576911.560677] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[662435.411830] block drbd8033: role( Secondary -> Primary ) 
[663204.542351] block drbd8033: role( Primary -> Secondary ) 
[663204.543515] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[748834.185982] block drbd8033: role( Secondary -> Primary ) 
[750155.916752] block drbd8033: role( Primary -> Secondary ) 
[750155.918007] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[750155.918768] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[835228.859747] block drbd8033: role( Secondary -> Primary ) 
[835228.860996] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[836287.928694] block drbd8033: role( Primary -> Secondary ) 
[836287.929886] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[836287.930695] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[921626.879416] block drbd8033: role( Secondary -> Primary ) 
[921626.880644] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[922755.643770] block drbd8033: role( Primary -> Secondary ) 
[922755.645472] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[922755.646186] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[1008022.668036] block drbd8033: role( Secondary -> Primary ) 
[1008022.669397] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[1009802.518711] block drbd8033: role( Primary -> Secondary ) 
[1009802.519884] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[1009802.520536] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[1024556.860980] block drbd8033: role( Secondary -> Primary ) 
[1024556.863208] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
[1024614.269377] block drbd8033: role( Primary -> Secondary ) 
[1024614.271469] block drbd8033: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
[1024614.272483] block drbd8033: ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in /root/rpmbuild/BUILD/drbd-8.4.7-2-e4242d818e66301920ef28733f533053e924717f/obj/default/drbd_main.c:3257
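
As a sanity check on the resize lines at the top of the trace, the reported
bitmap numbers can be reproduced from the capacity alone. This is a minimal
sketch, not DRBD code. It assumes 512-byte sectors, one bitmap bit per 4 KiB
of storage, 64-bit bitmap words and 4 KiB pages; the function name is made up
for illustration.

```python
SECTOR_BYTES = 512      # assumption: capacity is counted in 512-byte sectors
BYTES_PER_BIT = 4096    # assumption: one bitmap bit covers 4 KiB
BITS_PER_WORD = 64      # assumption: 64-bit bitmap words
PAGE_BYTES = 4096       # assumption: 4 KiB pages


def ceil_div(a, b):
    """Integer division rounding up."""
    return -(-a // b)


def bitmap_geometry(capacity_sectors):
    """Hypothetical helper: derive the logged size/bits/words/pages."""
    total_bytes = capacity_sectors * SECTOR_BYTES
    bits = ceil_div(total_bytes, BYTES_PER_BIT)
    words = ceil_div(bits, BITS_PER_WORD)
    pages = ceil_div(words * (BITS_PER_WORD // 8), PAGE_BYTES)
    return {
        "size_kb": total_bytes // 1024,
        "bits": bits,
        "words": words,
        "pages": pages,
    }


# Capacity taken from the "drbd_bm_resize called with capacity" line.
print(bitmap_geometry(21854808))
```

With the capacity from the log, this yields the same size, bits, words and
pages as the "size =" and "resync bitmap:" lines, so the resize itself was
applied consistently on this node.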


--
Eric Wheeler

* Re: [Drbd-dev] ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset )
  2016-09-19 19:09 [Drbd-dev] ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) Eric Wheeler
@ 2016-09-20  7:57 ` Lars Ellenberg
  2016-09-23 22:01   ` Eric Wheeler
  0 siblings, 1 reply; 4+ messages in thread
From: Lars Ellenberg @ 2016-09-20  7:57 UTC
  To: drbd-dev

On Mon, Sep 19, 2016 at 12:09:12PM -0700, Eric Wheeler wrote:
> Hello all,
> 
> We noticed after resizing one backing device and not the other, that the 
> side with the larger device issued the following assertion:
> 
> ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in drbd/obj/default/drbd_main.c:3257
>
> We were issuing a drbdadm resize --assume-clean, but for "reasons", the 
> far-end did not resize. Below you can see the trace of the side issuing 
> assertions. Is this a bug that should be handled in a more intelligent way?
> 
> What does the failed assertion indicate? It appears to assert shortly 
> after role change to-or-from Primary-or-Secondary.

DRBD is configured for "internal" meta data,
for some reason does some meta data IO, and realizes that someone
resized the backing device under it, without telling it to.

Should not do further harm.
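
To illustrate why the check fires, here is a rough sketch of the comparison
the assertion makes. It is not the driver code. It assumes that with internal
meta data the superblock lives in the last 4 KiB-aligned 4 KiB block of the
backing device, which is how I understand the 8.4 layout, and the function
names are invented for this example.

```python
SECTORS_PER_4K = 8  # 4 KiB expressed in 512-byte sectors


def internal_md_superblock_sector(capacity_sectors):
    """Assumed layout: last 4 KiB-aligned 4 KiB block of the backing device."""
    aligned_end = capacity_sectors & ~(SECTORS_PER_4K - 1)
    return aligned_end - SECTORS_PER_4K


def md_offset_is_stale(stored_md_offset, capacity_sectors):
    """True when the offset remembered at attach time no longer matches
    the position derived from the current backing device size."""
    return internal_md_superblock_sector(capacity_sectors) != stored_md_offset


# Hypothetical numbers: the offset was computed for a 1000-sector device,
# then the backing device was grown to 2005 sectors behind DRBD's back.
stored = internal_md_superblock_sector(1000)
print(md_offset_is_stale(stored, 1000))  # device unchanged
print(md_offset_is_stale(stored, 2005))  # device grown underneath DRBD
```

The stored offset only becomes stale when the backing device size changes
after attach, which matches the scenario described in this thread.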

On the box that is logging these asserts,
do a "drbdadm check-resize".

Preferably, you should follow a backend resize immediately
with a drbdadm check-resize (or resize).

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT

* Re: [Drbd-dev] ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset )
  2016-09-20  7:57 ` Lars Ellenberg
@ 2016-09-23 22:01   ` Eric Wheeler
  2016-09-26  9:46     ` Lars Ellenberg
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Wheeler @ 2016-09-23 22:01 UTC
  To: Lars Ellenberg; +Cc: drbd-dev

On Tue, 20 Sep 2016, Lars Ellenberg wrote:

> On Mon, Sep 19, 2016 at 12:09:12PM -0700, Eric Wheeler wrote:
> > Hello all,
> > 
> > We noticed after resizing one backing device and not the other, that the 
> > side with the larger device issued the following assertion:
> > 
> > ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in drbd/obj/default/drbd_main.c:3257
> >
> > We were issuing a drbdadm resize --assume-clean, but for "reasons", the 
> > far-end did not resize. Below you can see the trace of the side issuing 
> > assertions. Is this a bug that should be handled in a more intelligent way?
> > 
> > What does the failed assertion indicate? It appears to assert shortly 
> > after role change to-or-from Primary-or-Secondary.
> 
> DRBD is configured for "internal" meta data,
> for some reason does some meta data IO, and realizes that someone
> resized the backing device under it, without telling it to.
> 
> Should not do further harm.

"Further" ?  Was there any harm (corruption) in the first place?

> On the box that is logging these asserts,
> do a "drbdadm check-resize".
> 
> Preferably, you should follow a backend resize immediately
> with a drbdadm check-resize (or resize).

Is `drbdadm check-resize` a node-local operation?  Does it move metadata 
to the end on both sides, or only on the side invoking check-resize?

Thanks for your help!

--
Eric Wheeler



* Re: [Drbd-dev] ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset )
  2016-09-23 22:01   ` Eric Wheeler
@ 2016-09-26  9:46     ` Lars Ellenberg
  0 siblings, 0 replies; 4+ messages in thread
From: Lars Ellenberg @ 2016-09-26  9:46 UTC
  To: drbd-dev

On Fri, Sep 23, 2016 at 03:01:11PM -0700, Eric Wheeler wrote:
> On Tue, 20 Sep 2016, Lars Ellenberg wrote:
> 
> > On Mon, Sep 19, 2016 at 12:09:12PM -0700, Eric Wheeler wrote:
> > > Hello all,
> > > 
> > > We noticed after resizing one backing device and not the other, that the 
> > > side with the larger device issued the following assertion:
> > > 
> > > ASSERT( drbd_md_ss(device->ldev) == device->ldev->md.md_offset ) in drbd/obj/default/drbd_main.c:3257
> > >
> > > We were issuing a drbdadm resize --assume-clean, but for "reasons", the 
> > > far-end did not resize. Below you can see the trace of the side issuing 
> > > assertions. Is this a bug that should be handled in a more intelligent way?
> > > 
> > > What does the failed assertion indicate? It appears to assert shortly 
> > > after role change to-or-from Primary-or-Secondary.
> > 
> > DRBD is configured for "internal" meta data,
> > for some reason does some meta data IO, and realizes that someone
> > resized the backing device under it, without telling it to.
> > 
> > Should not do further harm.
> 
> "Further" ?  Was there any harm (corruption) in the first place?

Spamming your logs ;)

> > On the box that is logging these asserts,
> > do a "drbdadm check-resize".
> > 
> > Preferably, you should follow a backend resize immediately
> > with a drbdadm check-resize (or resize).
> 
> Is `drbdadm check-resize` a node-local operation?  Does it move metadata 
> to the end on both sides, or only on the side invoking check-resize?

Depends on whether or not you have a user-set size limit configured.

"check-resize" tells DRBD to check for possible backend resize,
and to move its meta data to where it would expect it
if it would attach now.

After the meta data has been moved,
we send our new backend size, current exposed size,
and user capped size, if any, to the peer.
The peer will take the hint and itself double-check
whether its backend has been resized and,
if so, do the same (see above).

Also, whenever we receive such size tuples, we check if it would allow
us to agree on a new size, and if so, do it right there.
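
The exchange of size tuples can be pictured roughly as follows. This is a
simplified sketch and not the driver logic: it assumes the agreed size is the
smaller of the two usable backend sizes, further limited by any non-zero user
cap. The type and function names are hypothetical, and the backend sizes are
taken as already net of meta data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeTuple:
    """Hypothetical stand-in for the sizes a node sends to its peer."""
    backend_sectors: int        # usable size of the local backing device
    exposed_sectors: int        # size currently exposed by the DRBD device
    user_cap_sectors: int = 0   # 0 means no user-set size limit


def negotiate_size(local, peer):
    """Assumed rule: smaller backend wins, then apply any user cap."""
    candidate = min(local.backend_sectors, peer.backend_sectors)
    caps = [c for c in (local.user_cap_sectors, peer.user_cap_sectors) if c]
    if caps:
        candidate = min(candidate, min(caps))
    return candidate


def can_grow(local, peer):
    """True if both sides could agree on a size larger than the current one."""
    return negotiate_size(local, peer) > local.exposed_sectors


# Situation in this thread: only one backend was resized.
grown = SizeTuple(backend_sectors=2000, exposed_sectors=1000)
not_grown = SizeTuple(backend_sectors=1000, exposed_sectors=1000)
print(can_grow(grown, not_grown))  # peer's backend still limits the size
print(can_grow(grown, grown))      # both backends resized
```

Under these assumptions the exposed size only grows once both backends have
been resized, so a resize on one side alone leaves the device size unchanged.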

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBD® and LINBIT® are registered trademarks of LINBIT
