From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
        aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-17.2 required=3.0 tests=BAYES_00,DKIM_SIGNED,
        DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
        INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,
        USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
        by smtp.lore.kernel.org (Postfix) with ESMTP id F3011C433EF
        for ; Wed, 15 Sep 2021 16:33:36 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
        by mail.kernel.org (Postfix) with ESMTP id D2081610D1
        for ; Wed, 15 Sep 2021 16:33:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229554AbhIOQez (ORCPT ); Wed, 15 Sep 2021 12:34:55 -0400
Received: from smtp-out1.suse.de ([195.135.220.28]:48542 "EHLO
        smtp-out1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229491AbhIOQey (ORCPT );
        Wed, 15 Sep 2021 12:34:54 -0400
Received: from relay2.suse.de (relay2.suse.de [149.44.160.134])
        by smtp-out1.suse.de (Postfix) with ESMTP id E985E221AF;
        Wed, 15 Sep 2021 16:33:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.cz; s=susede2_rsa;
        t=1631723614;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         in-reply-to:in-reply-to:references:references;
        bh=9CAesP9l05/qiFjD3jOUqKHzQbHjYyhw5mmZ/xisxyo=;
        b=wxG0JaJ4HyJe01JVeJpRj5HubgVLTEYvMZNIg41cKSiP+8xM/CoEekmD484WPrLElyALM/
         wsDghf1Va8Q45xMou/VJG+VUkRSl2HtYehmmFq05m58NYDbw16oyIBS7shZ0GK1wobEFek
         OQkfZQsgH3gi0npptOwp69VZASQq1b4=
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.cz;
        s=susede2_ed25519; t=1631723614;
        h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc:
         mime-version:mime-version:content-type:content-type:
         in-reply-to:in-reply-to:references:references;
        bh=9CAesP9l05/qiFjD3jOUqKHzQbHjYyhw5mmZ/xisxyo=;
        b=J01JZoYriYTWzdkscQrD9lUzVduqjzxlc815Ool3U63p6XnOFLt6Q1A1w9RFbodF6iI/iA
         4ksPZlh7pEfkRKBw==
Received: from quack2.suse.cz (unknown [10.100.224.230])
        by relay2.suse.de (Postfix) with ESMTP id DDF53A3B91;
        Wed, 15 Sep 2021 16:33:34 +0000 (UTC)
Received: by quack2.suse.cz (Postfix, from userid 1000)
        id B8E981E0BEA; Wed, 15 Sep 2021 18:33:34 +0200 (CEST)
Date: Wed, 15 Sep 2021 18:33:34 +0200
From: Jan Kara
To: Amir Goldstein
Cc: Jan Kara, linux-fsdevel
Subject: Re: [PATCH 5/7] fanotify: limit number of event merge attempts
Message-ID: <20210915163334.GD6166@quack2.suse.cz>
References: <20210202162010.305971-1-amir73il@gmail.com>
        <20210202162010.305971-6-amir73il@gmail.com>
        <20210301130818.GE25026@quack2.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.10.1 (2018-07-13)
Precedence: bulk
List-ID:
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Wed 15-09-21 15:39:13, Amir Goldstein wrote:
> On Mon, Mar 1, 2021 at 3:08 PM Jan Kara wrote:
> >
> > On Sat 27-02-21 10:31:52, Amir Goldstein wrote:
> > > On Tue, Feb 2, 2021 at 6:20 PM Amir Goldstein wrote:
> > > >
> > > > Event merges are expensive when event queue size is large.
> > > > Limit the linear search to 128 merge tests.
> > > > In combination with 128 hash lists, there is a potential to
> > > > merge with up to 16K events in the hashed queue.
> > > >
> > > > Signed-off-by: Amir Goldstein
> > > > ---
> > > >  fs/notify/fanotify/fanotify.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c
> > > > index 12df6957e4d8..6d3807012851 100644
> > > > --- a/fs/notify/fanotify/fanotify.c
> > > > +++ b/fs/notify/fanotify/fanotify.c
> > > > @@ -129,11 +129,15 @@ static bool fanotify_should_merge(struct fsnotify_event *old_fsn,
> > > >  	return false;
> > > >  }
> > > >
> > > > +/* Limit event merges to limit CPU overhead per event */
> > > > +#define FANOTIFY_MAX_MERGE_EVENTS 128
> > > > +
> > > >  /* and the list better be locked by something too! */
> > > >  static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> > > >  {
> > > >  	struct fsnotify_event *test_event;
> > > >  	struct fanotify_event *new;
> > > > +	int i = 0;
> > > >
> > > >  	pr_debug("%s: list=%p event=%p\n", __func__, list, event);
> > > >  	new = FANOTIFY_E(event);
> > > > @@ -147,6 +151,8 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
> > > >  		return 0;
> > > >
> > > >  	list_for_each_entry_reverse(test_event, list, list) {
> > > > +		if (++i > FANOTIFY_MAX_MERGE_EVENTS)
> > > > +			break;
> > > >  		if (fanotify_should_merge(test_event, event)) {
> > > >  			FANOTIFY_E(test_event)->mask |= new->mask;
> > > >  			return 1;
> > > > --
> > > > 2.25.1
> > > >
> > >
> > > Jan,
> > >
> > > I was thinking that this patch or a variant thereof should be applied to stable
> > > kernels, but not the entire series.
> > >
> > > OTOH, I am concerned about regressing existing workloads that depend on
> > > merging events on more than 128 inodes.
> >
> > Honestly, I don't think pushing anything to stable for this is really worth
> > it.
> >
> > 1) fanotify() is limited to CAP_SYS_ADMIN (in init namespace) so this is
> > hardly a security issue.
> >
> > 2) We have cond_resched() in the merge code now so the kernel doesn't
> > lock up anymore. So this is only about fanotify becoming slow if you have
> > lots of events.
> >
> > 3) I haven't heard any complaints since we've added the cond_resched()
> > patch so the performance issue seems to be really rare.
> >
> > If I get complaints from real users about this, we can easily reconsider, it
> > is not a big deal. But I just don't think preemptive action is warranted...
> >
>
> Hi Jan,
>
> I know you have some catching up to do, but applying this patch to stable
> has become a priority for me.
> It was a mistake on my part not to push harder 6 months ago, so I am trying
> to rectify this mistake now as soon as possible.
>
> To answer your arguments against preemptive action:
> 1) My application has CAP_SYS_ADMIN, it is not malicious, but it cannot
>     do its job without taking up more CPU than it needs to, because bursts of
>     events will cause the event queue to grow to thousands of events and
>     fanotify_merge() will become a high CPU consumer
> 2) It's not only about fanotify becoming slow, it's about fanotify making the
>     entire system slow and as a result it takes a long time for the system
>     to recover from this condition
> 3) You haven't heard any complaints because nobody was using sb mark
>     We have been using sb mark in production for a few years and carry
>     this patch in our kernel, so I can say for certain that sb mark on a fs with
>     heavy workload is disturbing the entire system without this patch.
>
> I don't think that "regressing" the number of merged events is a big issue,
> as we never guaranteed any specific merge behavior and the behavior
> was hard enough to predict, so I don't think applications could have
> relied on it.
>
> So, are you ok with me sending this patch to stable as is?

Sure, go ahead.
I was not strongly against pushing this to stable, I just didn't see a good
reason to do that, but your arguments make sense - you count as the user report
I was waiting for ;).

								Honza
-- 
Jan Kara
SUSE Labs, CR
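
For readers following the thread, the bounded reverse scan from the quoted patch can be sketched in user-space C. This is only an illustration under stated assumptions: `struct sketch_event`, `sketch_should_merge()`, `sketch_merge()` and `sketch_max_candidates()` are hypothetical names, a plain array stands in for the kernel's linked list, and the merge test (same object id) is a simplification of whatever `fanotify_should_merge()` actually checks. Only the limit of 128 merge tests and the 128 hash lists come from the patch description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as FANOTIFY_MAX_MERGE_EVENTS in the quoted patch. */
#define MAX_MERGE_EVENTS 128
/* Number of hash lists mentioned in the commit message. */
#define HASH_LISTS 128

/* Hypothetical stand-in for a queued event; not the kernel's struct. */
struct sketch_event {
	uint64_t object_id;	/* hypothetical key identifying the object */
	uint32_t mask;		/* event bits, OR-ed together on merge */
};

/* Hypothetical merge test: events on the same object may be merged. */
static bool sketch_should_merge(const struct sketch_event *old,
				const struct sketch_event *new_ev)
{
	return old->object_id == new_ev->object_id;
}

/*
 * Scan the queue from newest (last) to oldest (first), testing at most
 * MAX_MERGE_EVENTS entries. Returns 1 if new_ev was merged into a queued
 * event, 0 if the caller should queue it as a separate event.
 */
static int sketch_merge(struct sketch_event *queue, size_t len,
			const struct sketch_event *new_ev)
{
	int tested = 0;

	assert(queue != NULL || len == 0);

	for (size_t i = len; i-- > 0; ) {
		if (++tested > MAX_MERGE_EVENTS)
			break;		/* give up: bound the CPU cost per event */
		if (sketch_should_merge(&queue[i], new_ev)) {
			queue[i].mask |= new_ev->mask;
			return 1;
		}
	}
	return 0;
}

/* Upper bound on merge candidates across all hash lists: 128 * 128. */
static size_t sketch_max_candidates(void)
{
	return (size_t)HASH_LISTS * MAX_MERGE_EVENTS;
}
```

With a 200-entry queue, an event matching one of the newest 128 entries is merged, while an event matching only an older entry is reported as not merged and would be queued separately. 128 lists times 128 tests per list gives the 16K (16384) bound from the commit message.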