Linux cgroups development
* [PATCH v2 0/4] cgroup/rdma: add rdma.peak and rdma.events[.local]
@ 2026-05-13 10:49 Tao Cui
  2026-05-13 10:49 ` [PATCH v2 1/4] cgroup/rdma: add rdma.peak for per-device peak usage tracking Tao Cui
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Tao Cui @ 2026-05-13 10:49 UTC (permalink / raw)
  To: tj, hannes, mkoutny, cgroups; +Cc: Tao Cui

Hi,

This is v2 of the RDMA cgroup observability series.  Thanks to the
reviewers for the detailed feedback on v1.

This series adds new cgroup interface files to the RDMA controller
to improve observability of resource usage and limit enforcement:

  - rdma.peak:         per-device high watermark of resource usage
  - rdma.events:       hierarchical max and alloc_fail event counters
  - rdma.events.local: per-cgroup local max and alloc_fail counters

rdma.peak tracks the historical high watermark so administrators can
determine a sensible rdma.max based on actual peak demand rather than
guesswork.  This is directly analogous to memory.peak.
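
To illustrate the peak semantics, here is a minimal userspace sketch of
a hierarchical charge that moves peak only once every level has accepted
the charge.  struct pool, try_charge() and uncharge() are hypothetical
names chosen for illustration, a charge is assumed to be one unit, and
none of this is the kernel's rpool code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one cgroup's pool for a single device resource. */
struct pool {
	struct pool *parent;	/* NULL at the root */
	uint64_t usage;
	uint64_t max;
	uint64_t peak;
};

/*
 * Charge one unit to cg and every ancestor.  Returns 0 on success, or -1
 * if any level would exceed its limit; in that case nothing stays charged
 * and no peak value moves.
 */
int try_charge(struct pool *cg)
{
	struct pool *p, *q;

	/* Pass 1: charge upward, stopping at the first level over its limit. */
	for (p = cg; p; p = p->parent) {
		if (p->usage + 1 > p->max)
			goto rollback;
		p->usage++;
	}

	/* Pass 2: every level accepted the charge, so record the peaks. */
	for (p = cg; p; p = p->parent)
		if (p->usage > p->peak)
			p->peak = p->usage;
	return 0;

rollback:
	/* Undo only the levels charged before the failing one (p). */
	for (q = cg; q != p; q = q->parent)
		q->usage--;
	return -1;
}

/* Release one unit from cg and every ancestor; peak is left untouched. */
void uncharge(struct pool *cg)
{
	for (; cg; cg = cg->parent) {
		assert(cg->usage > 0);
		cg->usage--;
	}
}
```

With a parent limited to one unit, a second charge from the child fails
at the parent, and the child's peak stays at 1 even though its usage was
briefly raised to 2 during the attempt.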

rdma.events and rdma.events.local provide per-device counters that
track how often resource limits block allocations, and can be monitored
via poll/epoll for real-time alerting.  Both files expose the same
keys (max and alloc_fail); rdma.events aggregates hierarchically while
rdma.events.local shows per-cgroup values.  This follows the
pids.events / pids.events.local design.
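
For the monitoring side, a minimal userspace sketch of waiting on an
events file with poll() might look like the following.  It assumes the
new files behave like other cgroup v2 events files, which report a
change as POLLPRI; wait_for_event() is a hypothetical helper, and the
path is left to the caller.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Wait until the kernel signals a change on an events file.
 * Returns 1 on a notification, 0 on timeout, -1 on error.
 */
int wait_for_event(const char *path, int timeout_ms)
{
	struct pollfd pfd;
	char buf[4096];
	int ret;

	assert(path != NULL);

	pfd.fd = open(path, O_RDONLY);
	if (pfd.fd < 0)
		return -1;

	/* Read once first so that only later changes are reported. */
	if (read(pfd.fd, buf, sizeof(buf)) < 0) {
		close(pfd.fd);
		return -1;
	}

	/* Assumption: a notification shows up as POLLPRI on the file. */
	pfd.events = POLLPRI;
	pfd.revents = 0;
	ret = poll(&pfd, 1, timeout_ms);
	close(pfd.fd);

	if (ret < 0)
		return -1;
	return (ret > 0 && (pfd.revents & POLLPRI)) ? 1 : 0;
}
```

A caller would pass the path of a cgroup's rdma.events or
rdma.events.local file and re-read the file after a return value of 1 to
see which counter moved.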

Patch overview:

  Patch 1 introduces rdma.peak.  It adds a per-resource peak field that
  records the high watermark of usage and is updated only after a full
  hierarchical charge succeeds.  It also extends rpool lifetime so that
  non-zero peak values are preserved.

  Patch 2 adds rdma.events.  It introduces rdmacg_event_locked(), which
  propagates hierarchical max counters upward from the over-limit
  cgroup and notifies poll/epoll waiters via cgroup_file_notify().

  Patch 3 adds rdma.events.local and the hierarchical alloc_fail
  counter.  It extends the event framework with per-cgroup local
  counters (local_max for the over-limit cgroup, local_alloc_fail for
  the requesting cgroup) and propagates alloc_fail from the requestor
  upward.

  Patch 4 documents all three new interface files in cgroup-v2.rst.
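
The counter attribution described for patches 2 and 3 can be sketched in
userspace as follows.  struct ev and record_failure() are hypothetical
names for illustration, not the code in kernel/cgroup/rdma.c, and the
sketch assumes that the hierarchical counters include the originating
cgroup itself, as pids.events does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-cgroup counters for a single device. */
struct ev {
	struct ev *parent;			/* NULL at the root */
	uint64_t max, alloc_fail;		/* hierarchical: rdma.events */
	uint64_t local_max, local_alloc_fail;	/* local: rdma.events.local */
};

/*
 * Record a failed allocation requested by "requestor" that was rejected
 * because "over_limit" (the requestor or one of its ancestors) hit max.
 */
void record_failure(struct ev *requestor, struct ev *over_limit)
{
	struct ev *p;

	assert(requestor != NULL && over_limit != NULL);

	/* max: local to the over-limit cgroup, hierarchical from it upward. */
	over_limit->local_max++;
	for (p = over_limit; p; p = p->parent)
		p->max++;

	/* alloc_fail: local to the requestor, hierarchical from it upward. */
	requestor->local_alloc_fail++;
	for (p = requestor; p; p = p->parent)
		p->alloc_fail++;
}
```

In a root/mid/leaf chain where leaf requests and mid is over its limit,
leaf sees alloc_fail but no max, while mid and root see both
hierarchical counters and only mid records local_max.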

Tao Cui (4):
  cgroup/rdma: add rdma.peak for per-device peak usage tracking
  cgroup/rdma: add rdma.events to track resource limit exhaustion
  cgroup/rdma: add rdma.events.local for per-cgroup allocation failure
    attribution
  cgroup/rdma: document rdma.peak, rdma.events and rdma.events.local

 Documentation/admin-guide/cgroup-v2.rst |  54 +++++++
 include/linux/cgroup_rdma.h             |   4 +
 kernel/cgroup/rdma.c                    | 180 ++++++++++++++++++++++++
 3 files changed, 238 insertions(+)

---
Changes in v2:
  - Fix peak being updated before the full hierarchical charge succeeds.
  - Use find_cg_rpool_locked() to avoid creating spurious rpools.
  - Replace atomic64_t with u64 + READ_ONCE (all under rdmacg_mutex).
  - Use key=value output format, remove trailing spaces.
  - Always list all devices, show zero for devices without an rpool.
  - Extend rpool-free condition to preserve non-zero event counters.
  - Rename "failcnt" to "alloc_fail" (cgroup v2 naming convention).
  - Fix alloc_fail semantics: local to the requesting cgroup only.
  - Add hierarchical alloc_fail to rdma.events for key consistency.
  - Add documentation in Documentation/admin-guide/cgroup-v2.rst.

v1:
  https://lore.kernel.org/all/20260512031719.273507-1-cuitao@kylinos.cn/
-- 
2.43.0




Thread overview: 6+ messages
2026-05-13 10:49 [PATCH v2 0/4] cgroup/rdma: add rdma.peak and rdma.events[.local] Tao Cui
2026-05-13 10:49 ` [PATCH v2 1/4] cgroup/rdma: add rdma.peak for per-device peak usage tracking Tao Cui
2026-05-13 10:49 ` [PATCH v2 2/4] cgroup/rdma: add rdma.events to track resource limit exhaustion Tao Cui
2026-05-13 10:49 ` [PATCH v2 3/4] cgroup/rdma: add rdma.events.local for per-cgroup allocation failure attribution Tao Cui
2026-05-13 10:49 ` [PATCH v2 4/4] cgroup/rdma: document rdma.peak, rdma.events and rdma.events.local Tao Cui
2026-05-13 20:27 ` [PATCH v2 0/4] cgroup/rdma: add rdma.peak and rdma.events[.local] Tejun Heo
