From: Suren Baghdasaryan
To: gregkh@linuxfoundation.org
Cc: tj@kernel.org, lizefan@huawei.com, hannes@cmpxchg.org, axboe@kernel.dk,
	dennis@kernel.org, dennisszhou@gmail.com, mingo@redhat.com,
	peterz@infradead.org, akpm@linux-foundation.org, corbet@lwn.net,
	cgroups@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com,
	Suren Baghdasaryan
Subject: [PATCH v4 1/1] psi: introduce psi monitor
Date: Tue, 5 Feb 2019 18:34:46 -0800
Message-Id: <20190206023446.177362-1-surenb@google.com>

Psi monitor aims to provide a low-latency short-term pressure detection
mechanism configurable by users.
It allows users to monitor psi metrics growth and trigger events whenever
a metric rises above a user-defined threshold within a user-defined time
window.

Time window and threshold are both expressed in usecs. Multiple psi
resources with different thresholds and window sizes can be monitored
concurrently.

Psi monitors activate when the system enters a stall state for the
monitored psi metric and deactivate upon exit from the stall state.
While the system is in the stall state, psi signal growth is monitored
at a rate of 10 times per tracking window. Min window size is 500ms,
therefore the min monitoring interval is 50ms. Max window size is 10s
with a monitoring interval of 1s.

When activated, a psi monitor stays active for at least the duration of
one tracking window to avoid repeated activations/deactivations when
the psi signal is bouncing.

Notifications to the users are rate-limited to one per tracking window.

Signed-off-by: Suren Baghdasaryan
Signed-off-by: Johannes Weiner
---
This is a respin of:
  https://lwn.net/ml/linux-kernel/20190124211518.244221-1-surenb%40google.com/

The first 4 patches in the series are in linux-next:

1. fs: kernfs: add poll file operation
   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=6a78cef7ad8a1734477a1352dd04a97f1dc58a70
2. kernel: cgroup: add poll file operation
   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c88177361203be291a49956b6c9d5ec164ea24b2
3. psi: introduce state_mask to represent stalled psi states
   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=9d8a0c4a7f1c197de9c12bd53ef45fb6d273374e
4. psi: rename psi fields in preparation for psi trigger addition
   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=0ef9bb049a4db519152a8664088f7ce34bbee5ac

This patch applies cleanly either on top of the linux-next tree (tag:
next-20190201) or on top of linux-stable v5.0-rc4 after applying the
4 patches listed above.
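
For readers who want to check the window arithmetic from the commit
message (10 updates per tracking window, windows between 500ms and 10s),
here is a minimal standalone sketch. It is only an illustration and is
not code from the patch: the EX_* macros and ex_* function names are
hypothetical, and returning 0 for an out-of-range window is an
assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical names for illustration only. The numeric limits are the
 * ones quoted in the commit message, expressed in usecs.
 */
#define EX_WINDOW_MIN_US       500000ULL	/* min window: 500ms */
#define EX_WINDOW_MAX_US       10000000ULL	/* max window: 10s */
#define EX_UPDATES_PER_WINDOW  10ULL		/* checks per tracking window */

/* Return true if the requested tracking window is within the stated range. */
bool ex_window_valid(uint64_t window_us)
{
	return window_us >= EX_WINDOW_MIN_US && window_us <= EX_WINDOW_MAX_US;
}

/*
 * Return the monitoring interval in usecs for a given tracking window,
 * or 0 if the window is out of range (an assumption of this sketch).
 */
uint64_t ex_monitor_interval_us(uint64_t window_us)
{
	uint64_t interval_us;

	if (!ex_window_valid(window_us))
		return 0;

	interval_us = window_us / EX_UPDATES_PER_WINDOW;

	/* 500ms..10s windows must yield 50ms..1s intervals. */
	assert(interval_us >= 50000 && interval_us <= 1000000);
	return interval_us;
}
```

With these definitions, a 500ms window gives a 50ms interval and a 10s
window gives a 1s interval, matching the figures quoted above.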

Changes in v4:
- Resolved conflict with "psi: fix aggregation idle shut-off" patch, as
  per Andrew Morton
- Replaced smp_mb__after_atomic() with smp_mb() for proper ordering, as
  per Peter
- Moved now=sched_clock() in psi_update_work() after mutex acquisition,
  as per Peter
- Expanded comments to explain why smp_mb() is needed in psi_update_work,
  as per Peter
- Fixed g->polling operation order in the diagram above psi_update_work(),
  as per Johannes
- Merged psi_trigger_parse() into psi_trigger_create(), as per Johannes
- Replaced list_del_init with list_del in psi_trigger_destroy(), as per
  Minchan
- Replaced return value in get_recent_times and collect_percpu_times to
  return-by-parameter, as per Minchan
- Renamed window_init into window_reset and reused it, as per Minchan
- Replaced kzalloc with kmalloc, as per Minchan
- Added explanation in psi.txt for min/max window size choices, as per
  Minchan
- Misc variable name cleanups, as per Minchan and Johannes

 Documentation/accounting/psi.txt | 107 ++++++
 include/linux/psi.h              |   8 +
 include/linux/psi_types.h        |  59 ++++
 kernel/cgroup/cgroup.c           |  95 +++++-
 kernel/sched/psi.c               | 559 +++++++++++++++++++++++++++++--
 5 files changed, 794 insertions(+), 34 deletions(-)

diff --git a/Documentation/accounting/psi.txt b/Documentation/accounting/psi.txt
index b8ca28b60215..4fb40fe94828 100644
--- a/Documentation/accounting/psi.txt
+++ b/Documentation/accounting/psi.txt
@@ -63,6 +63,110 @@ tracked and exported as well, to allow detection of latency spikes
 which wouldn't necessarily make a dent in the time averages, or to
 average trends over custom time frames.
 
+Monitoring for pressure thresholds
+==================================
+
+Users can register triggers and use poll() to be woken up when resource
+pressure exceeds certain thresholds.
+
+A trigger describes the maximum cumulative stall time over a specific
+time window, e.g. 100ms of total stall time within any 500ms window to
+generate a wakeup event.
+
+To register a trigger user has to open psi interface file under
+/proc/pressure/ representing the resource to be monitored and write the
+desired threshold and time window. The open file descriptor should be
+used to wait for trigger events using select(), poll() or epoll().
+The following format is used:
+
+