From: Martynas <m@lambda.lt>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
Y Song <ys114321@gmail.com>
Subject: Re: [PATCH bpf-next] bpf: add optional memory accounting for maps
Date: Fri, 01 Feb 2019 13:03:45 +0200 [thread overview]
Message-ID: <1549019025.1438919.1648518112.6139DD49@webmail.messagingengine.com> (raw)
In-Reply-To: <20190131183531.d466egde46lywzwa@ast-mbp.dhcp.thefacebook.com>
On Thu, Jan 31, 2019, at 8:35 PM, Alexei Starovoitov wrote:
> On Wed, Jan 30, 2019 at 03:02:51PM +0100, Martynas Pumputis wrote:
> > Previously, memory allocated for a map was not accounted. Therefore,
> > this memory could not be taken into consideration by the cgroups
> > memory controller.
> >
> > This patch introduces the "BPF_F_ACCOUNT_MEM" flag which enables
> > the memory accounting for a map, and it can be set during
> > the map creation ("BPF_MAP_CREATE") in "map_flags".
> >
> > When enabled, we account only that amount of memory which is charged
> > against the "RLIMIT_MEMLOCK" limit.
> >
> > To validate the change, first we create the memory cgroup "test-map":
> >
> > # mkdir /sys/fs/cgroup/memory/test-map
> >
> > And then we run the following program against the cgroup:
> >
> > $ cat test_map.c
> > <..>
> > int main() {
> > usleep(3 * 1000000);
> > assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, 0) > 0);
> > usleep(3 * 1000000);
> > }
> > # cgexec -g memory:test-map ./test_map &
> > # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> > 397312
> > 258048
> >
> > <after 3 sec the map has been created>
> >
> > # bpftool map list
> > 19: hash flags 0x0
> > key 8B value 16B max_entries 65536 memlock 5771264B
> > # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> > 401408
> > 262144
> >
> > As we can see, the memory allocated for map is not accounted, as
> > 397312B + 5771264B > 401408B.
> >
> > Next, we enabled the accounting and re-run the test:
> >
> > $ cat test_map.c
> > <..>
> > int main() {
> > usleep(3 * 1000000);
> > assert(bpf_create_map(BPF_MAP_TYPE_HASH, 8, 16, 65536, BPF_F_ACCOUNT_MEM) > 0);
> > usleep(3 * 1000000);
> > }
> > # cgexec -g memory:test-map ./test_map &
> > # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> > 450560
> > 307200
> >
> > <after 3 sec the map has been created>
> >
> > # bpftool map list
> > 20: hash flags 0x80
> > key 8B value 16B max_entries 65536 memlock 5771264B
> > # cat /sys/fs/cgroup/memory/test-map/memory{,.kmem}.usage_in_bytes
> > 6221824
> > 6078464
> >
> > This time, the memory (including kmem) is accounted, as
> > 450560B + 5771264B <= 6221824B
> >
> > Signed-off-by: Martynas Pumputis <m@lambda.lt>
> ...
> > @@ -49,7 +51,9 @@ static struct bpf_map *xsk_map_alloc(union bpf_attr *attr)
> >
> > err = -ENOMEM;
> >
> > - m->flush_list = alloc_percpu(struct list_head);
> > + if (account_mem)
> > + gfp |= __GFP_ACCOUNT;
> > + m->flush_list = alloc_percpu_gfp(struct list_head, gfp);
>
> I think it's better to account this memory by default.
> Extra flag during map creation is not needed.
The main reason I made the accounting optional is that otherwise it could break existing BPF map users who have set cgroup memory limits.
> There are nokmem and nosocket memcg boot options.
> We can add one more to turn off accounting of bpf map memory.
>
Considering the suggested boot option, I'm OK with enabling it by default. Do you want me to add the option as part of this PR?
next prev parent reply other threads:[~2019-02-01 11:03 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-30 14:02 [PATCH bpf-next] bpf: add optional memory accounting for maps Martynas Pumputis
2019-01-31 7:15 ` Y Song
2019-01-31 9:27 ` Martynas
2019-01-31 18:35 ` Alexei Starovoitov
2019-02-01 11:03 ` Martynas [this message]
2019-02-01 22:58 ` Alexei Starovoitov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1549019025.1438919.1648518112.6139DD49@webmail.messagingengine.com \
--to=m@lambda.lt \
--cc=alexei.starovoitov@gmail.com \
--cc=ast@kernel.org \
--cc=daniel@iogearbox.net \
--cc=netdev@vger.kernel.org \
--cc=ys114321@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).