From: Tao Cui <cuitao@kylinos.cn>
To: tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com,
cgroups@vger.kernel.org
Cc: Tao Cui <cuitao@kylinos.cn>
Subject: [PATCH v3 0/4] cgroup/rdma: add rdma.peak and rdma.events[.local]
Date: Thu, 14 May 2026 14:50:30 +0800
Message-ID: <20260514065034.387197-1-cuitao@kylinos.cn>
Hi,
This is v3 of the RDMA cgroup observability series. Thanks to the
reviewers for the detailed feedback on v1 and v2.
This series adds new cgroup interface files to the RDMA controller
to improve observability of resource usage and limit enforcement:
- rdma.peak: per-device high watermark of resource usage
- rdma.events: hierarchical max and alloc_fail event counters
- rdma.events.local: per-cgroup local max and alloc_fail counters
rdma.peak tracks the historical high watermark so administrators can
determine a sensible rdma.max based on actual peak demand rather than
guesswork. This is directly analogous to memory.peak.
rdma.events and rdma.events.local provide per-device counters that
track how often resource limits block allocations, and can be monitored
via poll/epoll for real-time alerting. Both files expose the same
keys (max and alloc_fail); rdma.events aggregates hierarchically while
rdma.events.local shows per-cgroup values. This follows the
pids.events / pids.events.local design.
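
As a userspace sketch of the monitoring side (not part of this series;
the cgroup path below is only an example, and it assumes the usual
cgroup behaviour where a change to an event file wakes poll() with
POLLPRI, after which the file is re-read from offset 0):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = open("/sys/fs/cgroup/mygroup/rdma.events", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }

    for (;;) {
        struct pollfd pfd = { .fd = fd, .events = POLLPRI };

        if (poll(&pfd, 1, -1) < 0)
            break;

        /* re-read the whole file after each notification */
        lseek(fd, 0, SEEK_SET);
        ssize_t n = read(fd, buf, sizeof(buf) - 1);
        if (n < 0)
            break;
        buf[n] = '\0';

        /* per-device "max=.. alloc_fail=.." lines, see below */
        printf("rdma.events changed:\n%s", buf);
    }

    close(fd);
    return 0;
}
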
Patch overview:
Patch 1 introduces rdma.peak. It adds a per-resource peak field that
tracks the high watermark of usage, updates it only after the full
hierarchical charge has succeeded, and extends rpool lifetime so that
non-zero peak values are preserved.
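
In rough pseudo-C, the ordering Patch 1 aims for looks like the model
below (illustrative only, with made-up types and names; the real code
works on rpools under rdmacg_mutex):

struct rpool_model {
    struct rpool_model *parent;
    unsigned long usage;
    unsigned long max;
    unsigned long peak;
};

/* Illustrative model: charge the whole ancestor chain first, and raise
 * each level's peak only once every charge has succeeded, so a failed
 * (and rolled back) charge never inflates the watermark. */
static int charge_model(struct rpool_model *leaf, unsigned long amount)
{
    struct rpool_model *p;

    for (p = leaf; p; p = p->parent) {
        if (p->usage + amount > p->max)
            goto undo;              /* over limit somewhere: whole charge fails */
        p->usage += amount;
    }

    for (p = leaf; p; p = p->parent)
        if (p->usage > p->peak)
            p->peak = p->usage;     /* peak moves only after full success */
    return 0;

undo:
    for (struct rpool_model *q = leaf; q != p; q = q->parent)
        q->usage -= amount;         /* roll back the levels already charged */
    return -1;
}
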
Patch 2 adds rdma.events. It introduces rdmacg_event_locked(), which
propagates the hierarchical max counter upward from the over-limit
cgroup; it uses get_cg_rpool_locked() so that even ancestors without a
prior rpool are covered, and signals poll/epoll waiters via
cgroup_file_notify().
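
The propagation itself is conceptually a walk up the parent chain from
the cgroup whose limit was hit, roughly (again an illustrative model,
not the patch; the notification comment refers to the real
cgroup_file_notify() primitive):

struct cg_model {
    struct cg_model *parent;
    unsigned long long events_max;  /* hierarchical "max" event count */
};

static void event_max_model(struct cg_model *over_limit)
{
    struct cg_model *p;

    /* bump the hierarchical counter on the over-limit cgroup and on
     * every ancestor, waking poll/epoll watchers at each level */
    for (p = over_limit; p; p = p->parent) {
        p->events_max++;
        /* kernel side: cgroup_file_notify() on this level's rdma.events */
    }
}
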
Patch 3 adds rdma.events.local and a hierarchical alloc_fail counter.
It extends the event framework with per-cgroup local counters
(local_max for the over-limit cgroup, local_alloc_fail for the
requesting cgroup) and propagates the hierarchical alloc_fail counter
from the requestor upward. It also extracts the duplicated rpool-keep
predicate into an rpool_has_persistent_state() helper and replaces the
non-error goto dev_err in rdmacg_resource_set_max() with an if-guard.
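
To make the counter attribution of patches 2 and 3 concrete, consider a
two-level hierarchy where the limit sits on the parent (device name and
values illustrative only):

  parent/        carries the rdma.max limit that gets hit
  parent/child   performs the RDMA allocation that is blocked

  After one blocked allocation:

    parent/rdma.events.local        mlx4_0 max=1 alloc_fail=0  (limit hit here)
    parent/child/rdma.events.local  mlx4_0 max=0 alloc_fail=1  (requesting cgroup)
    parent/rdma.events              mlx4_0 max=1 alloc_fail=1  (hierarchical)
    parent/child/rdma.events        mlx4_0 max=0 alloc_fail=1
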
Patch 4 documents all three new interface files in cgroup-v2.rst.
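
For reference, the new files follow the per-device key=value layout of
rdma.current and rdma.max, listing every device and showing zeroes for
devices without an rpool; a hypothetical reading (device names and
numbers invented) could look like:

  # cat rdma.peak
  mlx4_0 hca_handle=24 hca_object=512
  ocrdma1 hca_handle=0 hca_object=0

  # cat rdma.events
  mlx4_0 max=3 alloc_fail=3
  ocrdma1 max=0 alloc_fail=0
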
Tao Cui (4):
cgroup/rdma: add rdma.peak for per-device peak usage tracking
cgroup/rdma: add rdma.events to track resource limit exhaustion
cgroup/rdma: add rdma.events.local for per-cgroup allocation failure
attribution
cgroup/rdma: document rdma.peak, rdma.events and rdma.events.local
Documentation/admin-guide/cgroup-v2.rst | 54 +++++++
include/linux/cgroup_rdma.h | 4 +
kernel/cgroup/rdma.c | 199 ++++++++++++++++++++++--
3 files changed, 247 insertions(+), 10 deletions(-)
---
Changes in v3:
- Switch rdmacg_event_locked() from find_ to get_cg_rpool_locked()
in hierarchical propagation loops (events_max and events_alloc_fail)
to ensure full hierarchical coverage; the rpool-keep check now
covers event counters, so spurious-rpool concern from v1 no longer
applies.
- Extract the duplicated rpool-keep predicate (peak + 4 event
counters) into an rpool_has_persistent_state() helper.
- Replace the non-error goto dev_err in rdmacg_resource_set_max()
with an if-guard so dev_err is only used for real error paths.
- Fix commit message of rdma.events.local patch to mention the
rdma.events hierarchical alloc_fail extension.
- Use %llu and drop (s64) cast in rdmacg_events_show() and
rdmacg_events_local_show() to match u64 counter type.
Changes in v2:
- Fix peak updated before full hierarchical charge succeeds.
- Use find_cg_rpool_locked() to avoid creating spurious rpools.
- Replace atomic64_t with u64 + READ_ONCE (all under rdmacg_mutex).
- Use key=value output format, remove trailing spaces.
- Always list all devices, show zero for devices without an rpool.
- Extend rpool-free condition to preserve non-zero event counters.
- Rename "failcnt" to "alloc_fail" (cgroup v2 naming convention).
- Fix alloc_fail semantics: local to the requesting cgroup only.
- Add hierarchical alloc_fail to rdma.events for key consistency.
- Add documentation in Documentation/admin-guide/cgroup-v2.rst.
v1:
https://lore.kernel.org/all/20260512031719.273507-1-cuitao@kylinos.cn/
v2:
https://lore.kernel.org/all/20260513104956.373216-1-cuitao@kylinos.cn/
--
2.43.0