From: Jason Gunthorpe <jgg-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
To: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
Yosry Ahmed <yosryahmed-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Alistair Popple <apopple-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>,
linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org,
tjmercier-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org,
surenb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
mkoutny-IBi9RG/b67k@public.gmane.org,
daniel-/w4YWyX8dFk@public.gmane.org,
"Daniel P . Berrange"
<berrange-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Alex Williamson
<alex.williamson-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Zefan Li <lizefan.x-EC8Uxl6Npydl57MIdRCFDg@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Subject: Re: [PATCH 14/19] mm: Introduce a cgroup for pinned memory
Date: Tue, 21 Feb 2023 13:51:12 -0400
Message-ID: <Y/UEkNn0O65Pfi4e@nvidia.com>
In-Reply-To: <Y/T/bkcYc9Krw4rE-NiLfg/pYEd1N0TnZuCh8vA@public.gmane.org>
On Tue, Feb 21, 2023 at 07:29:18AM -1000, Tejun Heo wrote:
> On Tue, Feb 21, 2023 at 01:25:59PM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 21, 2023 at 06:51:48AM -1000, Tejun Heo wrote:
> > > cgroup, right? It makes little sense to me to separate the owner of the
> > > memory page and the pinner of it. They should be one and the same.
> >
> > The owner and pinner are not always the same entity or we could just
> > use the page's cgroup.
>
> Yeah, so, what I'm trying to say is that that might be the source of the
> problem. Is the current page ownership attribution correct
It should be correct.

This mechanism is driven by pin_user_pages() (as it is the only API
that can actually create a pin), so the cgroup owner of the page is
broadly related to the "owner" of the VMA's inode.

The owner of the pin is the caller of pin_user_pages(), which is
initiated by some FD/process that is not necessarily related to the
VMA's inode.

E.g. concretely, io_uring will do something like:

  buffer = mmap()                       <- Charge memcg for the pages
  fd = io_uring_setup(..)
  io_uring_register(fd,xx,buffer,..);   <- Charge the pincg for the pin
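
Spelled out with the raw syscalls, that sequence might look roughly like
the sketch below. The helper name register_pinned_buffer() is made up
for this example, and it assumes a private anonymous mapping and kernel
headers that ship <linux/io_uring.h>. The comments only restate the
charging intent described above; they are not a statement of what any
given kernel does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/io_uring.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: map a buffer, create an
 * io_uring instance and register the buffer with it.
 * Returns 0 on success or a negative errno value.
 */
static int register_pinned_buffer(size_t len)
{
	struct io_uring_params params;
	struct iovec iov;
	void *buffer;
	int fd, ret = 0;

	/* Pages are allocated (and charged to a memcg) as they are faulted in. */
	buffer = mmap(NULL, len, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buffer == MAP_FAILED)
		return -errno;

	memset(&params, 0, sizeof(params));
	fd = (int)syscall(__NR_io_uring_setup, 4, &params);
	if (fd < 0) {
		ret = -errno;
		goto out_unmap;
	}

	/* Registering the buffer is the step that pins the pages. */
	iov.iov_base = buffer;
	iov.iov_len = len;
	if (syscall(__NR_io_uring_register, fd, IORING_REGISTER_BUFFERS,
		    &iov, 1) < 0)
		ret = -errno;

	close(fd);	/* tearing down the ring also drops the registration */
out_unmap:
	munmap(buffer, len);
	return ret;
}
```

The registration can fail for unrelated reasons (RLIMIT_MEMLOCK, or
io_uring being disabled), so a caller should treat any negative return
as "no pin was established".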
If mmap is a private anonymous VMA created by the same process then it
is likely the pages will have the same cgroup as io_uring_register and
the FD.

Otherwise the page cgroup is unconstrained. MAP_SHARED mappings will
have the page cgroup point at whatever cgroup was first to allocate
the page for the VMA's inode.

AFAIK there are few real use cases to establish a pin on MAP_SHARED
mappings outside your cgroup. However, it is possible, the APIs allow
it, and for security sandbox purposes we can't allow a process inside
a cgroup to trigger a charge on a different cgroup. That breaks the
sandbox goal.
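
To make the owner/pinner split concrete, here is a toy model in plain
C. It is not kernel code: struct toy_cgroup, struct toy_page and both
functions are invented purely for illustration. It assumes the
"first allocator owns the page" rule described above and charges each
pin to the caller's cgroup only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_cgroup {
	size_t mem_pages;	/* pages charged for allocation (memcg-like) */
	size_t pinned_pages;	/* pages charged for pins (pins-cgroup-like) */
	size_t pin_limit;	/* maximum pins this cgroup may hold */
};

struct toy_page {
	struct toy_cgroup *owner;	/* whoever allocated the page first */
};

/* First allocation decides the page owner; later users do not change it. */
static void toy_alloc_page(struct toy_page *page, struct toy_cgroup *allocator)
{
	assert(page && allocator);
	if (page->owner)
		return;
	page->owner = allocator;
	allocator->mem_pages++;
}

/*
 * The pin is charged to the caller's cgroup and checked against the
 * caller's limit. page->owner is deliberately never touched, so a task
 * in one cgroup cannot trigger a charge in another.
 */
static bool toy_pin_page(struct toy_page *page, struct toy_cgroup *pinner)
{
	assert(page && page->owner && pinner);
	if (pinner->pinned_pages >= pinner->pin_limit)
		return false;
	pinner->pinned_pages++;
	return true;
}
```

With cgroup A allocating a shared page and cgroup B pinning it, A only
ever sees the memory charge and B only ever sees the pin charge, and B
hitting its pin limit has no effect on A.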
If memcg could support multiple owners then it would be logical that
the pinner would be one of the memcg owners.
> for whatever reason is determining the pinning ownership or should the page
> ownership be attributed the same way too? If they indeed need to differ,
> that probably would need pretty strong justifications.
It is inherent to how pin_user_pages() works. It is an API that
establishes pins on existing pages. There is nothing about it that says
who the page's memcg owner is.

I don't think we can do anything about this without breaking things.

Jason