Re: [PATCH v5 0/8] per-cgroup tcp buffer pressure settings


From: Glauber Costa <glommer@parallels.com>
To: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: linux-kernel@vger.kernel.org, paul@paulmenage.org,
	lizf@cn.fujitsu.com, ebiederm@xmission.com, davem@davemloft.net,
	gthelen@google.com, netdev@vger.kernel.org, linux-mm@kvack.org,
	kirill@shutemov.name, avagin@parallels.com, devel@openvz.org
Subject: Re: [PATCH v5 0/8] per-cgroup tcp buffer pressure settings
Date: Fri, 7 Oct 2011 12:20:04 +0400	[thread overview]
Message-ID: <4E8EB634.9090208@parallels.com> (raw)
In-Reply-To: <20111007170522.624fab3d.kamezawa.hiroyu@jp.fujitsu.com>

On 10/07/2011 12:05 PM, KAMEZAWA Hiroyuki wrote:
>
>
> Sorry for lazy answer.
Hi Kame,

No matter how hard you try, you'll never be as lazy as I am. So that's
okay.

>
> On Wed, 5 Oct 2011 11:25:50 +0400
> Glauber Costa<glommer@parallels.com>  wrote:
>
>> On 10/05/2011 04:29 AM, KAMEZAWA Hiroyuki wrote:
>>> On Tue,  4 Oct 2011 16:17:52 +0400
>>> Glauber Costa<glommer@parallels.com>   wrote:
>>>
>
>>> At this stage, my concerns are the interfaces, the documentation, and future plans.
>>
>> Okay. I will try to address them as well as I can.
>>
>>> * memory.independent_kmem_limit
>>>    If 1, kmem_limit_in_bytes/kmem_usage_in_bytes work.
>>>    If 0, kmem_limit_in_bytes/kmem_usage_in_bytes don't work and all kmem
>>>       usage is controlled by memory.limit_in_bytes.
>>
>> Correct. For the questions below, I won't even look at the code, so as not
>> to be misguided. Let's settle on the desired behavior; anything that
>> deviates from it is a bug.
>>
>>> Question:
>>>    - What happens when parent/children cgroups have different independent_kmem_limit values?
>> I think it should be forbidden. Kirill raised this before, and
>> IIRC, he specifically requested that it be forbidden. (Okay: saying it now
>> makes me realize that the child could have set it to 1 while the parent was 1,
>> but then the parent sets it to 0... I don't think I am handling this case.)
>>
>
> ok, please put it on the TODO list ;)

Done.
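One way the "forbid it" rule could be enforced is sketched below. This is a user-space C model, not code from this patchset: `struct mcg`, `may_set_independent_kmem()` and `set_independent_kmem()` are hypothetical names, and the policy (a cgroup may change the flag only while it has no children, and a non-root cgroup may only hold its parent's value) is an assumption chosen because it also rejects the parent-flips-to-0 case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a memory cgroup. */
struct mcg {
    struct mcg *parent;        /* NULL for the root cgroup */
    size_t nr_children;
    bool independent_kmem;     /* memory.independent_kmem_limit */
};

/*
 * Assumed policy: a cgroup may change the flag only while it has no
 * children, and a non-root cgroup may only hold its parent's value.
 * Together these keep parent and children in agreement, including the
 * "parent flips 1 -> 0 while a child holds 1" case.
 */
static bool may_set_independent_kmem(const struct mcg *cg, bool val)
{
    if (val == cg->independent_kmem)
        return true;                 /* no change requested */
    if (cg->nr_children > 0)
        return false;                /* would diverge from the children */
    if (cg->parent && cg->parent->independent_kmem != val)
        return false;                /* would diverge from the parent */
    return true;
}

/* Returns 0 on success, -1 if the write would break the rule. */
static int set_independent_kmem(struct mcg *cg, bool val)
{
    if (!may_set_independent_kmem(cg, val))
        return -1;   /* a kernel version would return an errno instead */
    cg->independent_kmem = val;
    return 0;
}
```

With this rule, flipping the flag means first removing the children, which is the simplest way to guarantee a whole subtree always agrees.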

>
>
>>>    In the future, kmem.usage_in_bytes should include tcp.kmem_usage_in_bytes.
>>>    And kmem.limit_in_bytes should be the limit on the sum of all kmem.xxxx.limit_in_bytes.
>>
>> I am not sure there will be other xxx.limit_in_bytes. (see below)
>>
>
> ok.
>
>
>>>
>>> Question:
>>>    - Why is this integration difficult?
>> It is not that it is difficult.
>> What happens is that there are two things taking place here:
>> one of them is allocation,
>> the other is tcp-specific pressure thresholds. Bear with me through the
>> following example code (from sk_stream_alloc_skb, net/ipv4/tcp.c):
>>
>> 1:      skb = alloc_skb_fclone(size + sk->sk_prot->max_header, gfp);
>>           if (skb) {
>> 3:              if (sk_wmem_schedule(sk, skb->truesize)) {
>>                           /*
>>                            * Make sure that we have exactly size bytes
>>                            * available to the caller, no more, no less.
>>                            */
>>                           skb_reserve(skb, skb_tailroom(skb) - size);
>>                           return skb;
>>                   }
>>                   __kfree_skb(skb);
>>           } else {
>>                   sk->sk_prot->enter_memory_pressure(sk);
>>                   sk_stream_moderate_sndbuf(sk);
>>           }
>>
>> In line 1, an allocation takes place. This allocates memory from the skbuff
>> slab cache.
>> But then, pressure thresholds are applied in line 3. If that check fails, we
>> drop the memory buffer even though the allocation succeeded.
>>
>
> Sure.
>
>
>> So this patchset, as I've stated already, cares about pressure
>> conditions only. That is enough to guarantee that no more memory will be
>> pinned than we specified, because we free the allocation once
>> pressure is reached.
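A minimal user-space sketch of this allocate-then-check pattern, assuming a plain usage/limit pair per cgroup. `struct tcp_acct`, `acct_schedule()` and `alloc_buf()` are hypothetical stand-ins for the socket accounting and sk_wmem_schedule(), and malloc() stands in for the skbuff slab allocation; it only illustrates why pinned memory stays at or below the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-cgroup TCP buffer accounting state. */
struct tcp_acct {
    size_t usage;   /* bytes currently pinned by socket buffers */
    size_t limit;   /* pressure threshold for this cgroup */
};

/* Stand-in for sk_wmem_schedule(): charge only if it fits under the limit. */
static bool acct_schedule(struct tcp_acct *a, size_t size)
{
    if (size > a->limit - a->usage)   /* relies on usage <= limit */
        return false;
    a->usage += size;
    return true;
}

/*
 * Line 1 of the excerpt allocates; line 3 applies the threshold.  If the
 * charge is refused, the freshly allocated buffer is released at once, so
 * usage never exceeds limit even though the allocation itself succeeded.
 */
static void *alloc_buf(struct tcp_acct *a, size_t size)
{
    void *buf = malloc(size);        /* "line 1": the allocation */

    if (!buf)
        return NULL;
    if (acct_schedule(a, size))      /* "line 3": the pressure check */
        return buf;
    free(buf);                       /* over the threshold: drop it */
    return NULL;
}

static void free_buf(struct tcp_acct *a, void *buf, size_t size)
{
    free(buf);
    a->usage -= size;
}
```

With a 100-byte limit, a first 60-byte buffer is kept, while a second one is allocated, refused by the check, and freed again, leaving usage at 60.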
>>
>> There is work in progress from folks at Google (and I have my very own
>> PoCs as well) to include all slab allocations in kmem.usage_in_bytes.
>>
>
> ok.
>
>
>> So what I really mean here with "will integrate later", is that I think
>> that we'd be better off tracking the allocations themselves at the slab
>> level.
>>
>>>      Can't the tcp limit code borrow some amount of charge in batches from
>>>      kmem_limit and use it?
>> Sorry, I don't know exactly what you mean. Can you clarify?
>>
> Now, tcp-usage is independent of kmem-usage.
>
> My idea is
>
>    1. when you account tcp usage, charge kmem, too.

Absolutely.
>    Now, your work is
>       a) tcp uses new xxxx bytes.
>       b) account it to tcp.usage and check tcp limit
>
>    To integrate kmem,
>       a) tcp uses new xxxx bytes.
>       b) account it to tcp.usage and check tcp limit
>       c) account it to kmem.usage
>
> Might 2 counters be slow?

Well, the way I see it, 1 counter is slow already =)
I honestly think we need some optimizations here. But
that is a side issue.

To begin with: The new patchset that I intend to spin
today or Monday, depending on my progress, uses res_counters,
as you and Kirill requested.

So what makes res_counters slow, IMHO, are two things:

1) Interrupts are always disabled.
2) Everything is done under a lock.

Now, we are starting to have resources that are billed to multiple
counters. One simple way to handle this is to have child counters
that must also be charged every time a resource is counted.

Like this:

1) tcp has kmem as child. When we bill to tcp, we bill to kmem as well.
    For protocols that do memory pressure, we then don't bill kmem from
    the slab.
2) When kmem_independent_account is set to 0, kmem has mem as child.
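The chained-counter idea can be sketched as follows. This is a hypothetical user-space model loosely based on res_counter, not the patchset's code: the `struct counter` fields and the `counter_*()` helpers are illustrative names, and a C11 `atomic_flag` spinlock stands in for the kernel's IRQ-disabling spinlock. It shows that each extra level in the chain costs one more lock round-trip, and that a failed charge must roll back the levels already billed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical counter loosely modelled on res_counter.  @child is the
 * counter that must also be billed whenever this one is (tcp -> kmem,
 * and kmem -> mem when kmem is not accounted independently).
 */
struct counter {
    atomic_flag lock;        /* stand-in for a spinlock taken with IRQs off */
    unsigned long usage;
    unsigned long limit;
    struct counter *child;
};

static void counter_init(struct counter *c, unsigned long limit,
                         struct counter *child)
{
    atomic_flag_clear(&c->lock);
    c->usage = 0;
    c->limit = limit;
    c->child = child;
}

static void counter_lock(struct counter *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void counter_unlock(struct counter *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Charge a single level; assumes usage <= limit always holds. */
static bool counter_try_charge(struct counter *c, unsigned long val)
{
    bool ok;

    counter_lock(c);
    ok = val <= c->limit - c->usage;
    if (ok)
        c->usage += val;
    counter_unlock(c);
    return ok;
}

static void counter_uncharge_one(struct counter *c, unsigned long val)
{
    counter_lock(c);
    c->usage -= val;
    counter_unlock(c);
}

/*
 * Bill @val to @c and to every counter down its child chain, taking one
 * lock per level.  If any level is over its limit, undo the levels that
 * were already charged and report which counter refused.
 */
static int counter_charge(struct counter *c, unsigned long val,
                          struct counter **fail_at)
{
    struct counter *it, *undo;

    for (it = c; it != NULL; it = it->child) {
        if (!counter_try_charge(it, val)) {
            for (undo = c; undo != it; undo = undo->child)
                counter_uncharge_one(undo, val);
            if (fail_at)
                *fail_at = it;
            return -1;
        }
    }
    return 0;
}
```

For example, with tcp -> kmem -> mem limits of 300/500/1000, a tcp charge that still fits under the tcp limit but would push kmem over 500 is refused, and the tcp level is rolled back.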

>
>
>>>    - Don't you need a stat file to indicate "tcp memory pressure works!"?
>>>      Can it be obtained already?
>>
>> Not 100% clear either. We can query the amount of buffer used, and the
>> amount of buffer allowed. What else do we need?
>>
>
> IIUC, we can see that tcp.usage is near tcp.limit, but we can never see
> whether it hit memory pressure or how many failures happened.
> I'm sorry if I'm not reading the code correctly.

IIUC, with res_counters being used, we get at least failcnt for free, right?
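To illustrate what failcnt would expose, here is a hypothetical single-level counter modelled on the fields res_counter provides (usage, max_usage, limit, failcnt). Locking is omitted, and `struct tcp_counter` and `tcp_counter_charge()` are illustrative names rather than the patchset's API. Every refused charge bumps failcnt, which is the "how many failures happened" number asked about above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-level counter; locking omitted for brevity. */
struct tcp_counter {
    unsigned long usage;
    unsigned long limit;
    unsigned long max_usage;  /* high-water mark of usage */
    unsigned long failcnt;    /* number of charges refused at the limit */
};

/* Charge @val bytes; assumes usage <= limit always holds. */
static bool tcp_counter_charge(struct tcp_counter *c, unsigned long val)
{
    if (val > c->limit - c->usage) {
        c->failcnt++;            /* count every time the limit bites */
        return false;
    }
    c->usage += val;
    if (c->usage > c->max_usage)
        c->max_usage = c->usage;
    return true;
}
```

Reading failcnt together with max_usage would distinguish "usage sits near the limit" from "charges are actually being refused".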


