* [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-18 15:15 Glauber Costa
From: Glauber Costa @ 2012-01-18 15:15 UTC (permalink / raw)
To: cgroups-u79uwXL29TY76Z2rM5mHXA
Cc: Glauber Costa, KAMEZAWA Hiroyuki, Johannes Weiner, Michal Hocko
There is a case in __sk_mem_schedule() where an allocation
is beyond the maximum, yet we are allowed to proceed.
It happens under the following condition:

	sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case,
meaning that at some point later it will try to revert it. Since
this is never communicated to the underlying res_counter
code, there is an imbalance in the res_counter uncharge operation.
I see two ways of fixing this:
1) storing the information about those allocations somewhere
in memcg, and then deducting from that first, before
we start draining the res_counter,
2) providing a slightly different allocation function for
the res_counter, that matches the original behavior of
the network code more closely.
I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.
I will eventually submit it through Dave for the -net tree,
but I wanted to query you guys first, to see if this approach
is acceptable or if you'd prefer me to try something else.
Thanks
Signed-off-by: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu-+CUm20s59erQFUHtdCDX3A@public.gmane.org>
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
Cc: Michal Hocko <mhocko-AlSwsSmVLrQ@public.gmane.org>
---
include/linux/res_counter.h | 6 ++++++
include/net/sock.h | 10 ++++------
kernel/res_counter.c | 25 +++++++++++++++++++++++++
net/core/sock.c | 4 ++--
4 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..32a7b02 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -109,12 +109,18 @@ void res_counter_init(struct res_counter *counter, struct res_counter *parent);
*
* returns 0 on success and <0 if the counter->usage will exceed the
* counter->limit _locked call expects the counter->lock to be taken
+ *
+ * charge_nofail works the same, except that it charges the resource
+ * counter unconditionally, and returns < 0 if, after the current
+ * charge, we are over the limit.
*/
int __must_check res_counter_charge_locked(struct res_counter *counter,
unsigned long val);
int __must_check res_counter_charge(struct res_counter *counter,
unsigned long val, struct res_counter **limit_fail_at);
+int __must_check res_counter_charge_nofail(struct res_counter *counter,
+ unsigned long val, struct res_counter **limit_fail_at);
/*
* uncharge - tell that some portion of the resource is released
diff --git a/include/net/sock.h b/include/net/sock.h
index bb972d2..3269085 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1007,9 +1007,8 @@ static inline void memcg_memory_allocated_add(struct cg_proto *prot,
struct res_counter *fail;
int ret;
- ret = res_counter_charge(prot->memory_allocated,
- amt << PAGE_SHIFT, &fail);
-
+ ret = res_counter_charge_nofail(prot->memory_allocated,
+ amt << PAGE_SHIFT, &fail);
if (ret < 0)
*parent_status = OVER_LIMIT;
}
@@ -1053,12 +1052,11 @@ sk_memory_allocated_add(struct sock *sk, int amt, int *parent_status)
}
static inline void
-sk_memory_allocated_sub(struct sock *sk, int amt, int parent_status)
+sk_memory_allocated_sub(struct sock *sk, int amt)
{
struct proto *prot = sk->sk_prot;
- if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
- parent_status != OVER_LIMIT) /* Otherwise was uncharged already */
+ if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
memcg_memory_allocated_sub(sk->sk_cgrp, amt);
atomic_long_sub(amt, prot->memory_allocated);
diff --git a/kernel/res_counter.c b/kernel/res_counter.c
index 6d269cc..d508363 100644
--- a/kernel/res_counter.c
+++ b/kernel/res_counter.c
@@ -66,6 +66,31 @@ done:
return ret;
}
+int res_counter_charge_nofail(struct res_counter *counter, unsigned long val,
+ struct res_counter **limit_fail_at)
+{
+ int ret, r;
+ unsigned long flags;
+ struct res_counter *c;
+
+ r = ret = 0;
+ *limit_fail_at = NULL;
+ local_irq_save(flags);
+ for (c = counter; c != NULL; c = c->parent) {
+ spin_lock(&c->lock);
+ r = res_counter_charge_locked(c, val);
+ if (r)
+ c->usage += val;
+ spin_unlock(&c->lock);
+ if (r < 0 && ret == 0) {
+ *limit_fail_at = c;
+ ret = r;
+ }
+ }
+ local_irq_restore(flags);
+
+ return ret;
+}
void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
{
if (WARN_ON(counter->usage < val))
diff --git a/net/core/sock.c b/net/core/sock.c
index 5c5af998..3e81fd2 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1827,7 +1827,7 @@ suppress_allocation:
/* Alas. Undo changes. */
sk->sk_forward_alloc -= amt * SK_MEM_QUANTUM;
- sk_memory_allocated_sub(sk, amt, parent_status);
+ sk_memory_allocated_sub(sk, amt);
return 0;
}
@@ -1840,7 +1840,7 @@ EXPORT_SYMBOL(__sk_mem_schedule);
void __sk_mem_reclaim(struct sock *sk)
{
sk_memory_allocated_sub(sk,
- sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT, 0);
+ sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT);
sk->sk_forward_alloc &= SK_MEM_QUANTUM - 1;
if (sk_under_memory_pressure(sk) &&
--
1.7.7.4
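The semantics the patch is after can be illustrated with a small user-space sketch. This is a toy model, not kernel code: `counter_t`, `charge()`, `charge_nofail()` and `uncharge()` are hypothetical stand-ins for struct res_counter and its helpers. A conditional charge that fails records nothing, so the unconditional uncharge the network code issues later throws the books off; the nofail variant always records the charge and merely reports that the limit was exceeded.

```c
/*
 * Toy user-space model of the accounting problem this patch fixes.
 * NOT kernel code: counter_t, charge(), charge_nofail() and uncharge()
 * are hypothetical stand-ins for struct res_counter and its helpers.
 */
#include <assert.h>

typedef struct {
	long usage;
	long limit;
} counter_t;

/* Conditional charge: refuses, and records nothing, when over limit. */
static int charge(counter_t *c, long val)
{
	if (c->usage + val > c->limit)
		return -1;
	c->usage += val;
	return 0;
}

/* Nofail charge: always records the charge, but reports the excess. */
static int charge_nofail(counter_t *c, long val)
{
	c->usage += val;
	return c->usage > c->limit ? -1 : 0;
}

static void uncharge(counter_t *c, long val)
{
	c->usage -= val;
}
```

With the conditional charge, a caller that proceeds past the limit anyway and uncharges later drives usage below its true value; with the nofail variant the books stay balanced and the caller still learns it went over the limit, which is the behavior __sk_mem_schedule() needs.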
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 12:48 ` Johannes Weiner
From: Johannes Weiner @ 2012-01-19 12:48 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Michal Hocko

On Wed, Jan 18, 2012 at 07:15:58PM +0400, Glauber Costa wrote:
> There is a case in __sk_mem_schedule(), where an allocation
> is beyond the maximum, yet we are allowed to proceed.
> It happens under the following condition:
>
> 	sk->sk_wmem_queued + size >= sk->sk_sndbuf
>
> [...]
>
> + *
> + * charge_nofail works the same, except that it charges the resource
> + * counter unconditionally, and returns < 0 if, after the current
> + * charge, we are over the limit.
> + */

res_counter_margin() assumes usage <= limit is always true. Just make
sure you return 0 if that is not the case, or the charge path can get
confused, thinking there is enough room and retry needlessly.

Otherwise, looks good.
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 12:51 ` Glauber Costa
From: Glauber Costa @ 2012-01-19 12:51 UTC (permalink / raw)
To: Johannes Weiner
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Michal Hocko

On 01/19/2012 04:48 PM, Johannes Weiner wrote:
> On Wed, Jan 18, 2012 at 07:15:58PM +0400, Glauber Costa wrote:
>> [...]
>
> res_counter_margin() assumes usage <= limit is always true. Just make
> sure you return 0 if that is not the case, or the charge path can get
> confused, thinking there is enough room and retry needlessly.
>
> Otherwise, looks good.

You mean return zero in res_counter_charge_nofail() if we exceeded the
limit?

I do that, since one needs to know the allocation was supposed to fail.
Or are you talking about something else?
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 13:12 ` Johannes Weiner
From: Johannes Weiner @ 2012-01-19 13:12 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Michal Hocko

On Thu, Jan 19, 2012 at 04:51:16PM +0400, Glauber Costa wrote:
> On 01/19/2012 04:48 PM, Johannes Weiner wrote:
>> res_counter_margin() assumes usage <= limit is always true. Just make
>> sure you return 0 if that is not the case, or the charge path can get
>> confused, thinking there is enough room and retry needlessly.
>>
>> Otherwise, looks good.
>
> You mean return zero in res_counter_charge_nofail() if we exceeded the limit?
>
> I do that, since one needs to know the allocation was supposed to fail.
> Or are you talking about something else?

Yes, I mean the calculation in res_counter_margin(), which is supposed
to tell the margin for new charges. Its current code will underflow
and return something large when the usage exceeds the limit, which is
not possible before your patch, so I think you need to include this:

diff --git a/include/linux/res_counter.h b/include/linux/res_counter.h
index c9d625c..d06d014 100644
--- a/include/linux/res_counter.h
+++ b/include/linux/res_counter.h
@@ -142,7 +142,10 @@ static inline unsigned long long res_counter_margin(struct res_counter *cnt)
 	unsigned long flags;
 
 	spin_lock_irqsave(&cnt->lock, flags);
-	margin = cnt->limit - cnt->usage;
+	if (cnt->limit > cnt->usage)
+		margin = cnt->limit - cnt->usage;
+	else
+		margin = 0;
 	spin_unlock_irqrestore(&cnt->lock, flags);
 	return margin;
 }
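The underflow Johannes describes is easy to reproduce outside the kernel. Below is a minimal sketch in plain C with hypothetical helper names (not the real res_counter interface); the real fields are unsigned long long, so `limit - usage` wraps around once a nofail charge has pushed usage past the limit.

```c
/*
 * Minimal sketch of the margin underflow -- hypothetical helpers,
 * not the real res_counter interface.
 */
#include <assert.h>

/* Unclamped arithmetic, as in the current res_counter_margin():
 * wraps to a huge value when usage > limit. */
static unsigned long long margin_unclamped(unsigned long long limit,
					   unsigned long long usage)
{
	return limit - usage;
}

/* Clamped version, as in the fix suggested above: reports no room
 * once usage has overshot the limit. */
static unsigned long long margin_clamped(unsigned long long limit,
					 unsigned long long usage)
{
	return limit > usage ? limit - usage : 0;
}
```

A charge path that consults the unclamped margin after a nofail overshoot would see an enormous "margin" and keep retrying; the clamped version correctly reports zero room.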
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 13:13 ` Glauber Costa
From: Glauber Costa @ 2012-01-19 13:13 UTC (permalink / raw)
To: Johannes Weiner
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Michal Hocko

On 01/19/2012 05:12 PM, Johannes Weiner wrote:
> Yes, I mean the calculation in res_counter_margin(), which is supposed
> to tell the margin for new charges. Its current code will underflow
> and return something large when the usage exceeds the limit, which is
> not possible before your patch, so I think you need to include this:
>
> [...]

Ah, I see.

Okay, I will update my patch to include this. (Or would you think it
would be better to add this as a preparation patch?)
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 13:38 ` Johannes Weiner
From: Johannes Weiner @ 2012-01-19 13:38 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Michal Hocko

On Thu, Jan 19, 2012 at 05:13:05PM +0400, Glauber Costa wrote:
> Okay, I will update my patch to include this. (Or would you think it
> would be better to add this as a preparation patch?)

Probably better to just include it, as we wouldn't need this check
stand-alone.
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 14:36 ` Michal Hocko
From: Michal Hocko @ 2012-01-19 14:36 UTC (permalink / raw)
To: Glauber Costa
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Johannes Weiner, Tejun Heo, Li Zefan

I guess you should have CCed the cgroups maintainers as well.
Let's do that now.

On Wed 18-01-12 19:15:58, Glauber Costa wrote:
> There is a case in __sk_mem_schedule(), where an allocation
> is beyond the maximum, yet we are allowed to proceed.
> [...]
> I will eventually submit it through Dave for the -net tree,
> but I wanted to query you guys first, to see if this approach
> is acceptable or if you'd prefer me to try something else.
>
> [...]

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
* Re: [PATCH] introduce res_counter_charge_nofail() for socket allocations
@ 2012-01-19 20:57 ` James Bottomley
From: James Bottomley @ 2012-01-19 20:57 UTC (permalink / raw)
To: Michal Hocko
Cc: Glauber Costa, cgroups-u79uwXL29TY76Z2rM5mHXA, KAMEZAWA Hiroyuki, Johannes Weiner, Tejun Heo, Li Zefan

On Thu, 2012-01-19 at 15:36 +0100, Michal Hocko wrote:
> I guess you should have CCed the cgroups maintainers as well.
> Let's do that now.

So this is just my preference in SCSI, and the cgroup maintainers get
to specify their own. As a maintainer, I subscribe to the mailing
lists, so there's no need to cc me directly. In fact, I prefer that
you don't cc me directly, since it would just clutter up my inbox and
wouldn't really be helpful, because all my scripts operate on my
linux-scsi folder. (However, I also have a procmail rule that trashes
any email sent directly to me that also went to linux-scsi, so cc'ing
me directly is usually harmless.)

James
Thread overview: 8+ messages
2012-01-18 15:15 [PATCH] introduce res_counter_charge_nofail() for socket allocations Glauber Costa
[not found] ` <1326899758-9013-1-git-send-email-glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-01-19 12:48 ` Johannes Weiner
[not found] ` <20120119124841.GL24386-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-19 12:51 ` Glauber Costa
[not found] ` <4F1811C4.50204-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-01-19 13:12 ` Johannes Weiner
[not found] ` <20120119131217.GN24386-druUgvl0LCNAfugRpC6u6w@public.gmane.org>
2012-01-19 13:13 ` Glauber Costa
[not found] ` <4F1816E1.2050506-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
2012-01-19 13:38 ` Johannes Weiner
2012-01-19 14:36 ` Michal Hocko
[not found] ` <20120119143607.GF13932-VqjxzfR4DlwKmadIfiO5sKVXKuFTiq87@public.gmane.org>
2012-01-19 20:57 ` James Bottomley