From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 77143C433F5 for ; Mon, 24 Jan 2022 15:53:45 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6E53E6B0087; Mon, 24 Jan 2022 10:53:44 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6940F6B008C; Mon, 24 Jan 2022 10:53:44 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 55C2D6B00B4; Mon, 24 Jan 2022 10:53:44 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0211.hostedemail.com [216.40.44.211]) by kanga.kvack.org (Postfix) with ESMTP id 441526B0087 for ; Mon, 24 Jan 2022 10:53:44 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 034B194FB4 for ; Mon, 24 Jan 2022 15:53:44 +0000 (UTC) X-FDA: 79065625968.26.5835C8F Received: from mail-qk1-f175.google.com (mail-qk1-f175.google.com [209.85.222.175]) by imf09.hostedemail.com (Postfix) with ESMTP id E0EB4140005 for ; Mon, 24 Jan 2022 15:53:25 +0000 (UTC) Received: by mail-qk1-f175.google.com with SMTP id a12so5735105qkk.9 for ; Mon, 24 Jan 2022 07:53:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cmpxchg-org.20210112.gappssmtp.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=5kW/9SXiovCXkW3uTKPS+tUBIUOqvG4jszJhTXEHiHs=; b=e8OGTjrTE+gIW7oKMoEUAmJef11dU2wUDdCIAaqgypBhHISzFrBaRKol3kYHikbvYo EzHVTnnXGZNh2qbzSH6vWY5xlXsttvHcRNQ2pnxkujOR7x51vUra6VCYNFpY0Pn/L7LC jqK/8G/mBtas/iZ3FH0iCqd5IO6rQuY040KQ1oSD3cLWdcdYD1eRuAQnrmvXBvW1C+Nx IsC+SbRy9UPZZTOOmTl4SiXJESYhaD7NgAOCp9M67eb0wK3Rc3kRZlpIyRzG623HJlH9 Rwu0Zd2Q+Wvb9QnQDtGgDHpa2iCrCebFaZfVA9MCCpUaWN6b3A4t1UcIf3SvoTX/K+rT ICNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=5kW/9SXiovCXkW3uTKPS+tUBIUOqvG4jszJhTXEHiHs=; b=CqqUhnXeAchzCXPKKEm7aVl4UW3mH6gmJx+ZxcD6Vq2fw9SsIckY0jLINwNdTJgRwZ tuI02R7q9NFBsdYs9+wQqVKxQL1L4crMLh63Ner72mYiTu1ECbOG0unYgCB0QlvVllxj d9PvJY65t2Hz4Bf1MkRhZNFeOKImlMtFZLV0IdUp4rUYIlYU4O7/NejIGYpjbOjGROxZ kK43ebbZbo4IxRnTRNCnLqAeB4jUcKwCQiIRUoUa49hfivMSx8wXfw8Vgol1j/n94Iwo TFHgPcUZffayG5l/open3he6VHB22WoMP0vznL//hg/G7YHXd4/tp4KL78ZvK1lRvH+u jSFg== X-Gm-Message-State: AOAM531PxtGzZGRyoZJz3IVP2AxV13hs4Gpfad6BXrShEZ2p/hYv6GPw B6DpClhohPmrmvZegp/TT9iesA== X-Google-Smtp-Source: ABdhPJxji/utSHe1RitivbiagK3oDItR8C7yOgTCJNmg8n1G0uP3RluZUYEGbg/l01s3xcX/E8f+xQ== X-Received: by 2002:a05:620a:1474:: with SMTP id j20mr3388902qkl.77.1643039605159; Mon, 24 Jan 2022 07:53:25 -0800 (PST) Received: from localhost (cpe-98-15-154-102.hvc.res.rr.com. [98.15.154.102]) by smtp.gmail.com with ESMTPSA id f9sm6038831qkp.94.2022.01.24.07.53.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 24 Jan 2022 07:53:24 -0800 (PST) Date: Mon, 24 Jan 2022 10:53:23 -0500 From: Johannes Weiner To: Huangzhaoyang Cc: Suren Baghdasaryan , Peter Zijlstra , Zhaoyang Huang , Ingo Molnar , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [Resend PATCH v3] psi: fix possible trigger missing in the window Message-ID: References: <1642649516-15076-1-git-send-email-huangzhaoyang@gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1642649516-15076-1-git-send-email-huangzhaoyang@gmail.com> X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: E0EB4140005 X-Stat-Signature: 6g37xfbgnsentdifmoaso6gfap14km54 Authentication-Results: imf09.hostedemail.com; dkim=temperror ("DNS error when getting key") header.d=cmpxchg-org.20210112.gappssmtp.com header.s=20210112 header.b=e8OGTjrT; spf=temperror (imf09.hostedemail.com: error in processing during lookup of hannes@cmpxchg.org: DNS error) smtp.mailfrom=hannes@cmpxchg.org X-Rspam-User: nil X-HE-Tag: 1643039605-604014 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Jan 20, 2022 at 11:31:56AM +0800, Huangzhaoyang wrote: > From: Zhaoyang Huang > > When a new threshold breaching stall happens after a psi event was > generated and within the window duration, the new event is not > generated because the events are rate-limited to one per window. If > after that no new stall is recorded then the event will not be > generated even after rate-limiting duration has passed. This is > happening because with no new stall, window_update will not be called > even though threshold was previously breached. To fix this, record > threshold breaching occurrence and generate the event once window > duration is passed. > > Suggested-by: Suren Baghdasaryan > Signed-off-by: Zhaoyang Huang Good catch. The change makes sense to me. However, I had to re-read the discussion to understand *why* triggering once per window can be a practical problem. Could you please include the lkmd scenario you mentioned? Suren, even though it's your suggested code, can you please also add ack/review tags? Thanks! Some minor inline comments: > diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h > index 0a23300..87b694a 100644 > --- a/include/linux/psi_types.h > +++ b/include/linux/psi_types.h > @@ -132,6 +132,8 @@ struct psi_trigger { > > /* Refcounting to prevent premature destruction */ > struct kref refcount; > + > + bool threshold_breach; Something like bool pending_event would be more descriptive, IMO. Also please remember to add a short comment like we have for the other struct members. For example: /* Deferred event(s) from previous ratelimit window */ > @@ -524,24 +524,29 @@ static u64 update_triggers(struct psi_group *group, u64 now) > */ > list_for_each_entry(t, &group->triggers, node) { > u64 growth; > + bool trigger_stalled = > + group->polling_total[t->state] != total[t->state]; Triggers don't stall, they trigger on stalls. How about this: bool new_stall; u64 growth; new_stall = group->polling_total[t->state] != total[t->state]; (order local declarations by length, avoid line wraps where possible) > - /* Check for stall activity */ > - if (group->polling_total[t->state] == total[t->state]) > - continue; > - > - /* > - * Multiple triggers might be looking at the same state, > - * remember to update group->polling_total[] once we've > - * been through all of them. Also remember to extend the > - * polling time if we see new stall activity. > - */ > - new_stall = true; > - > - /* Calculate growth since last update */ > - growth = window_update(&t->win, now, total[t->state]); > - if (growth < t->threshold) > + /* Check for stall activity or a previous threshold breach */ > + if (!trigger_stalled && !t->threshold_breach) > continue; This could use a bit more explanation imo: /* * Check for new stall activity, as well as deferred * events that occurred in the last window after the * trigger had already fired (we want to ratelimit * events without dropping any). */ if (!new_stall && !t->pending_event) continue; > + if (trigger_stalled) { > + /* > + * Multiple triggers might be looking at the same state, > + * remember to update group->polling_total[] once we've > + * been through all of them. Also remember to extend the > + * polling time if we see new stall activity. > + */ > + new_stall = true; and then rename this flag `update_total'. > + /* Calculate growth since last update */ > + growth = window_update(&t->win, now, total[t->state]); > + if (growth < t->threshold) > + continue; > + > + t->threshold_breach = true; > + } > /* Limit event signaling to once per window */ > if (now < t->last_event_time + t->win.size) > continue; > @@ -550,6 +555,8 @@ static u64 update_triggers(struct psi_group *group, u64 now) > if (cmpxchg(&t->event, 0, 1) == 0) > wake_up_interruptible(&t->event_wait); > t->last_event_time = now; > + /* Reset threshold breach flag once event got generated */ > + t->threshold_breach = false; > } > > if (new_stall) > @@ -1152,6 +1159,7 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group, > t->last_event_time = 0; > init_waitqueue_head(&t->event_wait); > kref_init(&t->refcount); > + t->threshold_breach = false; > > mutex_lock(&group->trigger_lock); Thanks!