From: Jason Gunthorpe
Subject: Re: [RFC PATCH 00/19] mm: Introduce a cgroup to limit the amount of locked and pinned memory
Date: Tue, 24 Jan 2023 16:12:00 -0400
To: Alistair Popple
Cc: linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org, cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org, tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org, surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org, mkoutny-IBi9RG/b67k@public.gmane.org, daniel-/w4YWyX8dFk@public.gmane.org

On Tue, Jan 24, 2023 at 04:42:29PM +1100, Alistair Popple wrote:
> Having large amounts of unmovable or unreclaimable memory in a system
> can lead to system instability due to increasing the likelihood of
> encountering out-of-memory conditions. Therefore it is desirable to
> limit the amount of memory users can lock or pin.
>
> From userspace such limits can be enforced by setting
> RLIMIT_MEMLOCK. However there is no standard method that drivers and
> other in-kernel users can use to check and enforce this limit.
>
> This has led to a large number of inconsistencies in how limits are
> enforced.
> For example, some drivers use mm->locked_vm while others
> use mm->pinned_vm or user->locked_vm. It is therefore possible to
> have up to three times RLIMIT_MEMLOCK pinned.
>
> Having pinned memory limited per-task also makes it easy for users to
> exceed the limit. For example, memory that drivers pin with
> pin_user_pages() tends to remain pinned after fork. To deal with
> this and other issues this series introduces a cgroup for tracking and
> limiting the number of pages pinned or locked by tasks in the group.
>
> However the existing behaviour with regard to the rlimit needs to be
> maintained. Therefore the lesser of the two limits is
> enforced. Furthermore, having CAP_IPC_LOCK usually bypasses the rlimit,
> but this bypass is not allowed for the cgroup.
>
> The first part of this series converts existing drivers which
> open-code the use of locked_vm/pinned_vm over to a common interface
> which manages the refcounts of the associated task/mm/user
> structs. This ensures accounting of pages is consistent and makes it
> easier to add charging of the cgroup.
>
> The second part of the series adds the cgroup and converts core mm
> code such as mlock over to charging the cgroup before finally
> introducing some selftests.
>
> As I don't have access to systems with all the various devices I
> haven't been able to test all driver changes. Any help there would be
> appreciated.

I'm excited by this series, thanks for making it. The pin accounting
has been a long-standing problem and cgroups will really help!

Jason
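The limit policy described in the quoted cover letter (enforce the lesser of RLIMIT_MEMLOCK and the cgroup limit, and let CAP_IPC_LOCK bypass only the rlimit, never the cgroup) can be sketched in plain C. This is a userspace illustration under stated assumptions, not code from the series or the kernel: struct pin_state, fits() and try_charge() are hypothetical names, all quantities are in pages, and it is assumed that the capability skips the rlimit check while the pages are still counted.

```c
#include <stdbool.h>

/* Hypothetical per-task accounting state; names are illustrative only. */
struct pin_state {
    unsigned long rlim_pinned; /* pages counted against RLIMIT_MEMLOCK */
    unsigned long rlim_max;    /* RLIMIT_MEMLOCK converted to pages */
    unsigned long cg_pinned;   /* pages charged to the task's cgroup */
    unsigned long cg_max;      /* cgroup limit in pages */
};

/*
 * True if n more pages fit under max. Checking used <= max first keeps
 * max - used from wrapping when a counter is already over its limit.
 */
static bool fits(unsigned long used, unsigned long max, unsigned long n)
{
    return used <= max && n <= max - used;
}

/*
 * Charge npages if both limits allow it. The cgroup limit is always
 * enforced; the rlimit check is skipped for CAP_IPC_LOCK holders.
 * Without the capability both checks apply, so the tighter limit wins.
 */
static bool try_charge(struct pin_state *s, unsigned long npages,
                       bool cap_ipc_lock)
{
    if (!fits(s->cg_pinned, s->cg_max, npages))
        return false;
    if (!cap_ipc_lock && !fits(s->rlim_pinned, s->rlim_max, npages))
        return false;

    /* Assumption: the capability skips the check, not the accounting. */
    s->cg_pinned += npages;
    s->rlim_pinned += npages;
    return true;
}
```

With rlim_max = 4 and cg_max = 8, a task without CAP_IPC_LOCK is stopped after 4 pages by the rlimit, while a task holding the capability can go on to 8 pages but is then refused by the cgroup limit.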