From: Mike Rapoport <rppt@kernel.org>
To: Roman Gushchin <guroan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andy Lutomirski <luto@kernel.org>, Arnd Bergmann <arnd@arndb.de>,
Borislav Petkov <bp@alien8.de>,
Catalin Marinas <catalin.marinas@arm.com>,
Christopher Lameter <cl@linux.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
David Hildenbrand <david@redhat.com>,
Elena Reshetova <elena.reshetova@intel.com>,
"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
James Bottomley <jejb@linux.ibm.com>,
"Kirill A. Shutemov" <kirill@shutemov.name>,
Matthew Wilcox <willy@infradead.org>,
Mark Rutland <mark.rutland@arm.com>,
Mike Rapoport <rppt@linux.ibm.com>,
Michael Kerrisk <mtk.manpages@gmail.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Peter Zijlstra <peterz@infradead.org>,
Rick Edgecombe <rick.p.edgecombe@intel.com>,
Shuah Khan <shuah@kernel.org>,
Thomas Gleixner <tglx@linutronix.de>,
Tycho Andersen <tycho@tycho.ws>, Will Deacon <will@kernel.org>,
linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-nvdimm@lists.01.org, linux-riscv@lists.infradead.org,
x86@kernel.org
Subject: Re: [PATCH v8 6/9] secretmem: add memcg accounting
Date: Sun, 15 Nov 2020 11:17:00 +0200
Message-ID: <20201115091700.GY4758@kernel.org>
In-Reply-To: <CALo0P13aq3GsONnZrksZNU9RtfhMsZXGWhK1n=xYJWQizCd4Zw@mail.gmail.com>
On Fri, Nov 13, 2020 at 03:42:25PM -0800, Roman Gushchin wrote:
> On Tue, Nov 10, 2020 at 07:16, Mike Rapoport <rppt@kernel.org> wrote:
> >
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > Account memory consumed by secretmem to memcg. The accounting is updated
> > when the memory is actually allocated and freed.
> >
> > Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
> > ---
> > mm/filemap.c | 2 +-
> > mm/secretmem.c | 42 +++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 42 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 249cf489f5df..11387a077373 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -844,7 +844,7 @@ static noinline int __add_to_page_cache_locked(struct page *page,
> >  	page->mapping = mapping;
> >  	page->index = offset;
> >
> > -	if (!huge) {
> > +	if (!huge && !page->memcg_data) {
> >  		error = mem_cgroup_charge(page, current->mm, gfp);
> >  		if (error)
> >  			goto error;
> > diff --git a/mm/secretmem.c b/mm/secretmem.c
> > index 1aa2b7cffe0d..1eb7667016fa 100644
> > --- a/mm/secretmem.c
> > +++ b/mm/secretmem.c
> > @@ -17,6 +17,7 @@
> > #include <linux/syscalls.h>
> > #include <linux/memblock.h>
> > #include <linux/pseudo_fs.h>
> > +#include <linux/memcontrol.h>
> > #include <linux/set_memory.h>
> > #include <linux/sched/signal.h>
> >
> > @@ -49,6 +50,38 @@ struct secretmem_ctx {
> >
> > static struct cma *secretmem_cma;
> >
>
> Hi Mike!
>
> > +static int secretmem_memcg_charge(struct page *page, gfp_t gfp, int order)
> > +{
> > +	unsigned long nr_pages = (1 << order);
> > +	int i, err;
> > +
> > +	err = memcg_kmem_charge_page(page, gfp, order);
> > +	if (err)
> > +		return err;
> > +
> > +	for (i = 1; i < nr_pages; i++) {
> > +		struct page *p = page + i;
> > +
> > +		p->memcg_data = page->memcg_data;
> > +	}
>
> Hm, it looks very strange to me. Why do we need to copy memcg_data?
> What about css reference counting?

I need to copy memcg_data to mark each sub-page as already accounted, so
it won't be charged again when it is added to the page cache.

What happens here is that I allocate a large page and then use it as a
local cache for allocations in secretmem_fault(). I charge the large
page as kmem.

During secretmem_fault(), a small sub-page from that large page goes into
the page cache, and there I skip its memcg accounting.

In the end, when the large page is freed, the memcg_data for all its
sub-pages is cleared and I uncharge the memcg with the order of the large
page.

An alternative would be to uncharge a small page from kmem in
secretmem_fault() and have it charged in add_to_page_cache(), but
that would complicate the release path: I would need to re-charge the
small page back to kmem in secretmem_freepage() and track all the
participating memcgs until the large page is freed.

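To make the flow above concrete, here is a minimal userspace sketch of the idea. It is not kernel code: the toy_* types and functions, and the small TOY_ORDER used in place of PMD_PAGE_ORDER, are hypothetical stand-ins, and the counter only models "pages charged to a memcg". It shows the chunk being charged once, every sub-page being tagged, the page cache path skipping tagged pages, and a single uncharge for the whole chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ORDER    3                    /* hypothetical stand-in for PMD_PAGE_ORDER */
#define TOY_NR_PAGES (1UL << TOY_ORDER)

/* Toy memcg: just a counter of charged pages. */
struct toy_memcg {
	long charged_pages;
};

/* Toy page: memcg_data is non-NULL once the page counts as accounted. */
struct toy_page {
	struct toy_memcg *memcg_data;
};

/* Charge the whole chunk once and tag every sub-page as accounted. */
static void toy_chunk_charge(struct toy_page *chunk, struct toy_memcg *cg)
{
	cg->charged_pages += TOY_NR_PAGES;
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		chunk[i].memcg_data = cg;
}

/*
 * Models the `!page->memcg_data` test in the page cache path:
 * a page that is already tagged is not charged a second time.
 * Returns true only if a new charge was made.
 */
static bool toy_add_to_page_cache(struct toy_page *page, struct toy_memcg *cg)
{
	if (page->memcg_data)
		return false;
	page->memcg_data = cg;
	cg->charged_pages += 1;
	return true;
}

/* Clear the tags on all sub-pages and uncharge the chunk in one go. */
static void toy_chunk_uncharge(struct toy_page *chunk)
{
	struct toy_memcg *cg = chunk[0].memcg_data;

	assert(cg != NULL);
	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		chunk[i].memcg_data = NULL;
	cg->charged_pages -= TOY_NR_PAGES;
}
```

In this model, charging a chunk and then passing any of its sub-pages to toy_add_to_page_cache() leaves the counter unchanged, and toy_chunk_uncharge() brings it back to its starting value.
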
> And what about statistics?
Hmm, those probably won't be accurate :-/
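One way to picture the skew, as a toy userspace model only: the toy_stats fields below are hypothetical and do not correspond to real memcg statistics items. It assumes the whole chunk is counted as kmem when the pool grows, and that pre-charged pages never bump a page cache counter when they are faulted in.

```c
#include <stdbool.h>

/* Hypothetical per-memcg counters, in pages. */
struct toy_stats {
	long kmem_pages;   /* charged as kernel memory at pool growth */
	long file_pages;   /* accounted as page cache at fault time */
	long in_use_pages; /* ground truth: pages actually handed out */
};

/* Pool growth charges a whole 2^order chunk as kmem up front. */
static void toy_pool_grow(struct toy_stats *s, unsigned int order)
{
	s->kmem_pages += 1L << order;
}

/* A fault hands out one page; pre-charged pages skip file accounting. */
static void toy_fault_in(struct toy_stats *s, bool precharged)
{
	s->in_use_pages += 1;
	if (!precharged)
		s->file_pages += 1;
}
```

With an order-9 chunk and two faulted pages, this model reports 512 kmem pages and 0 page cache pages while only 2 pages are actually in use.
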
> I'm sorry for being late.
>
> Thank you!
>
> > +
> > +	return 0;
> > +}
> > +
> > +static void secretmem_memcg_uncharge(struct page *page, int order)
> > +{
> > +	unsigned long nr_pages = (1 << order);
> > +	int i;
> > +
> > +	for (i = 1; i < nr_pages; i++) {
> > +		struct page *p = page + i;
> > +
> > +		p->memcg_data = 0;
> > +	}
> > +
> > +	memcg_kmem_uncharge_page(page, PMD_PAGE_ORDER);
> > +}
> > +
> >  static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
> >  {
> >  	unsigned long nr_pages = (1 << PMD_PAGE_ORDER);
> > @@ -61,10 +94,14 @@ static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
> >  	if (!page)
> >  		return -ENOMEM;
> >
> > -	err = set_direct_map_invalid_noflush(page, nr_pages);
> > +	err = secretmem_memcg_charge(page, gfp, PMD_PAGE_ORDER);
> >  	if (err)
> >  		goto err_cma_release;
> >
> > +	err = set_direct_map_invalid_noflush(page, nr_pages);
> > +	if (err)
> > +		goto err_memcg_uncharge;
> > +
> >  	addr = (unsigned long)page_address(page);
> >  	err = gen_pool_add(pool, addr, PMD_SIZE, NUMA_NO_NODE);
> >  	if (err)
> > @@ -81,6 +118,8 @@ static int secretmem_pool_increase(struct secretmem_ctx *ctx, gfp_t gfp)
> >  	 * won't fail
> >  	 */
> >  	set_direct_map_default_noflush(page, nr_pages);
> > +err_memcg_uncharge:
> > +	secretmem_memcg_uncharge(page, PMD_PAGE_ORDER);
> >  err_cma_release:
> >  	cma_release(secretmem_cma, page, nr_pages);
> >  	return err;
> > @@ -310,6 +349,7 @@ static void secretmem_cleanup_chunk(struct gen_pool *pool,
> >  	int i;
> >
> >  	set_direct_map_default_noflush(page, nr_pages);
> > +	secretmem_memcg_uncharge(page, PMD_PAGE_ORDER);
> >
> >  	for (i = 0; i < nr_pages; i++)
> >  		clear_highpage(page + i);
> > --
> > 2.28.0
> >
> >
--
Sincerely yours,
Mike.