From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f52.google.com ([74.125.82.52]:38213 "EHLO mail-wm0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750922AbcGLNr6 (ORCPT ); Tue, 12 Jul 2016 09:47:58 -0400 Received: by mail-wm0-f52.google.com with SMTP id o80so26139368wme.1 for ; Tue, 12 Jul 2016 06:47:56 -0700 (PDT) Subject: Re: [PATCH] btrfs: Add ratelimiting to printing facility To: dsterba@suse.cz, jbacik@fb.com, clm@fb.com, operations@siteground.com, linux-btrfs@vger.kernel.org References: <1468326012-15910-1-git-send-email-kernel@kyup.com> <20160712132044.GC10595@suse.cz> From: Nikolay Borisov Message-ID: <5784F509.8040200@kyup.com> Date: Tue, 12 Jul 2016 16:47:53 +0300 MIME-Version: 1.0 In-Reply-To: <20160712132044.GC10595@suse.cz> Content-Type: text/plain; charset=windows-1252 Sender: linux-btrfs-owner@vger.kernel.org List-ID: On 07/12/2016 04:20 PM, David Sterba wrote: > On Tue, Jul 12, 2016 at 03:20:12PM +0300, Nikolay Borisov wrote: >> Currently the btrfs printk infrastructure doesn't implement any >> kind of ratelimiting. > > If you count the whole infrastructure, it does. See ctree.h and macros > ending with _rl (btrfs_err_rl), and should be used where the messages > are likely to flood. Otherwise I think "more is better" regarding > messages as this is helpful when debugging issues. So I definitely didn't look at those. But now I have and it seems they implement more or less the same thing. Also, if I'm reading the code correctly, as it stands now using the _rl versions seem to be more flexible as the limits is going to be per-message rather than per-message class as it is in my proposal. So I'd rather move that particular csum related message to the _rl infrastructure. > >> Recently I came accross a case where due to >> FS corruption an excessive amount of printk caused the softlockup >> detector to trigger and reset the server. This patch aims to avoid >> two types of issue: >> * I want to avoid different levels of messages interefere with the >> ratelimiting of oneanother so as to avoid a situation where a >> flood of INFO messages causes the ratelimit to trigger, >> potentially leading to supression of more important messages. > > Yeah, that's my concern as well. What if there's a burst of several > error messages that do not fit to the limit and some of them get > dropped. > >> * Avoid a flood of any type of messages rendering the machine >> unusable > > While I'd rather set a per-message ratelimiting, it's possible that an > unexpected error will start flooding. So some sort of per-level limiting > could be implemented, as you propose, but I'd suggest to set the numbers > higher. That way it would still flood up to certain level but should > avoid the lockups. Sure, I'm happy to set the limits higher and have it act as a safety net. But then again, what would make a sensible limit - 100 messages in 5 seconds seems reasonable but this is completely arbitrary. Any thoughts?