From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <lars.ellenberg@linbit.com>
Received: from mail-wm0-f68.google.com (mail-wm0-f68.google.com [74.125.82.68])
	by mail09.linbit.com (LINBIT Mail Daemon) with ESMTP id 7F8961058011
	for <drbd-dev@lists.linbit.com>; Thu,  2 Nov 2017 14:19:47 +0100 (CET)
Received: by mail-wm0-f68.google.com with SMTP id y83so11283119wmc.4
	for <drbd-dev@lists.linbit.com>; Thu, 02 Nov 2017 06:19:47 -0700 (PDT)
Received: from soda.linbit ([86.59.100.100])
	by smtp.gmail.com with ESMTPSA id c7sm3236510edc.26.2017.11.02.06.19.44
	for <drbd-dev@lists.linbit.com>
	(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
	Thu, 02 Nov 2017 06:19:45 -0700 (PDT)
Date: Thu, 2 Nov 2017 14:19:44 +0100
From: Lars Ellenberg <lars.ellenberg@linbit.com>
To: drbd-dev@lists.linbit.com
Message-ID: <20171102131944.GE18213@soda.linbit>
References: <20171031035818.GA6109@eros>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20171031035818.GA6109@eros>
Subject: Re: [Drbd-dev] verbose log output from drbd kernel module
List-Id: "*Coordination* of development, patches,
	contributions -- *Questions* \(even to developers\) go to drbd-user,
	please." <drbd-dev.lists.linbit.com>
List-Unsubscribe: <http://lists.linbit.com/mailman/options/drbd-dev>,
	<mailto:drbd-dev-request@lists.linbit.com?subject=unsubscribe>
List-Archive: <http://lists.linbit.com/pipermail/drbd-dev>
List-Post: <mailto:drbd-dev@lists.linbit.com>
List-Help: <mailto:drbd-dev-request@lists.linbit.com?subject=help>
List-Subscribe: <http://lists.linbit.com/mailman/listinfo/drbd-dev>,
	<mailto:drbd-dev-request@lists.linbit.com?subject=subscribe>

On Tue, Oct 31, 2017 at 02:58:18PM +1100, Tobin C. Harding wrote:
> Currently the drdb kernel module is a little bit verbose. While there
> are numerous repeated log lines, a number of them are repeated in
> different sequences. It is not immediately clear which lines
> are _really_ necessary to understand what is going on. A good
> understanding of drdb is probably required to reduce the duplicates
> without loss of information. There are however a number of duplicate log
> entries that appear consecutively, these could be reduced to a
> single instance without loss of information (assuming the number of
> times the error message appears consecutively is not useful
> information).
> 
> So far I have found the following duplicate consecutive log entries
> 
> #
> 
> [ 1274.952627] F2FS-fs (drbd0): Magic Mismatch, valid(0xf2f52010) - read(0x29363930)
> [ 1274.952646] F2FS-fs (drbd0): Can't find valid F2FS filesystem in 1th superblock
> [ 1274.953695] F2FS-fs (drbd0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
> [ 1274.953710] F2FS-fs (drbd0): Can't find valid F2FS filesystem in 2th superblock
> [ 1274.953760] F2FS-fs (drbd0): Magic Mismatch, valid(0xf2f52010) - read(0x29363930)
> [ 1274.953770] F2FS-fs (drbd0): Can't find valid F2FS filesystem in 1th superblock
> [ 1274.953779] F2FS-fs (drbd0): Magic Mismatch, valid(0xf2f52010) - read(0x0)
> [ 1274.953786] F2FS-fs (drbd0): Can't find valid F2FS filesystem in 2th superblock

Those are not DRBD messages, but generated by f2fs.

> # After adding/deploying a resource 
> 
> [1368149.470125] drbd backups: Preparing cluster-wide state change 3280213538 (1->-1 3/1)
> [1368149.470127] drbd backups: Committing cluster-wide state change 3280213538 (0ms)
> [1368149.470136] drbd backups: role( Secondary -> Primary )
> [1368149.474569] drbd backups: role( Primary -> Secondary )
> [1368149.479674] drbd backups: Preparing cluster-wide state change 1520483517 (1->-1 3/1)
> [1368149.479676] drbd backups: Committing cluster-wide state change 1520483517 (0ms)
> [1368149.479685] drbd backups: role( Secondary -> Primary )
> [1368149.484642] drbd backups: role( Primary -> Secondary )
> [1368164.080148] drbd backups: Preparing cluster-wide state change 3014496965 (1->-1 3/1)
> [1368164.080150] drbd backups: Committing cluster-wide state change 3014496965 (0ms)
> [1368164.080161] drbd backups: role( Secondary -> Primary )
> ... [repeats many times]

Auto promote, every time someone opens DRBD for write, this does
Secondary -> Primary, when the last opener closes again,
it does an "auto-demotion" again Primary -> Secondary.

If you do stuff like that: "while : > /dev/drbd0 ; do :; done",
you get it logged, over and over again.

Usually, you want to use DRBD for longer than a few milli seconds,
so these should only be logged once per mount respectively umount.


> # If block device is busy
> 
> [1369106.294749] drbd backups: Failed: role( Primary -> Secondary )
> [1369106.297290] drbd backups: State change failed: Device is held open by someone
> ... [repeats 6 times]
> 
> 
> Would it be useful to reduce these log lines to a single instance?

Useful, maybe.
Not so easy, though, if it is retried from userland.
And maybe not that easy even for in-kernel retries,
because we want to log that failure at least once,
and may not always know beforehand which retry will be
our last try in that particular sequence of retries.

-- 
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker
: R&D, Integration, Ops, Consulting, Support

DRBDŽ and LINBITŽ are registered trademarks of LINBIT