linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] memcg: convert threshold to bytes
@ 2015-10-05 21:44 Shaohua Li
  2015-10-05 22:03 ` Andrew Morton
  2015-10-06 17:01 ` Michal Hocko
  0 siblings, 2 replies; 9+ messages in thread
From: Shaohua Li @ 2015-10-05 21:44 UTC (permalink / raw)
  To: linux-mm; +Cc: akpm, Johannes Weiner, Michal Hocko

The page_counter_memparse() returns pages for the threshold, while
mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
to bytes.

Looks a regression introduced by 3e32cb2e0a12b69150

Signed-off-by: Shaohua Li <shli@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
---
 mm/memcontrol.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1fedbde..d9b5c81 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3387,6 +3387,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
 	ret = page_counter_memparse(args, "-1", &threshold);
 	if (ret)
 		return ret;
+	threshold <<= PAGE_SHIFT;
 
 	mutex_lock(&memcg->thresholds_lock);
 
-- 
2.4.6

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-05 21:44 [PATCH] memcg: convert threshold to bytes Shaohua Li
@ 2015-10-05 22:03 ` Andrew Morton
  2015-10-06 17:01 ` Michal Hocko
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2015-10-05 22:03 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, Johannes Weiner, Michal Hocko

On Mon, 5 Oct 2015 14:44:22 -0700 Shaohua Li <shli@fb.com> wrote:

> The page_counter_memparse() returns pages for the threshold, while
> mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> to bytes.
> 
> Looks a regression introduced by 3e32cb2e0a12b69150

That was two years ago.  Why hasn't anyone noticed before now?

> ...
>
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3387,6 +3387,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	ret = page_counter_memparse(args, "-1", &threshold);
>  	if (ret)
>  		return ret;
> +	threshold <<= PAGE_SHIFT;
>  
>  	mutex_lock(&memcg->thresholds_lock);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-05 21:44 [PATCH] memcg: convert threshold to bytes Shaohua Li
  2015-10-05 22:03 ` Andrew Morton
@ 2015-10-06 17:01 ` Michal Hocko
  2015-10-06 19:22   ` Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2015-10-06 17:01 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-mm, akpm, Johannes Weiner

On Mon 05-10-15 14:44:22, Shaohua Li wrote:
> The page_counter_memparse() returns pages for the threshold, while
> mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> to bytes.
> 
> Looks a regression introduced by 3e32cb2e0a12b69150

Yes. This suggests
Cc: stable # 3.19+

> Signed-off-by: Shaohua Li <shli@fb.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/memcontrol.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1fedbde..d9b5c81 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3387,6 +3387,7 @@ static int __mem_cgroup_usage_register_event(struct mem_cgroup *memcg,
>  	ret = page_counter_memparse(args, "-1", &threshold);
>  	if (ret)
>  		return ret;
> +	threshold <<= PAGE_SHIFT;
>  
>  	mutex_lock(&memcg->thresholds_lock);
>  
> -- 
> 2.4.6
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-06 17:01 ` Michal Hocko
@ 2015-10-06 19:22   ` Andrew Morton
  2015-10-07  7:30     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2015-10-06 19:22 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Shaohua Li, linux-mm, Johannes Weiner

On Tue, 6 Oct 2015 19:01:23 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Mon 05-10-15 14:44:22, Shaohua Li wrote:
> > The page_counter_memparse() returns pages for the threshold, while
> > mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> > to bytes.
> > 
> > Looks a regression introduced by 3e32cb2e0a12b69150
> 
> Yes. This suggests
> Cc: stable # 3.19+

But it's been this way for 2 years and nobody noticed it.  How come?

Or at least, nobody reported it.  Maybe people *have* noticed it, and
adjusted their userspace appropriately.  In which case this patch will
cause breakage.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-06 19:22   ` Andrew Morton
@ 2015-10-07  7:30     ` Michal Hocko
  2015-10-07  7:58       ` Andrew Morton
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2015-10-07  7:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shaohua Li, linux-mm, Johannes Weiner

On Tue 06-10-15 12:22:25, Andrew Morton wrote:
> On Tue, 6 Oct 2015 19:01:23 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Mon 05-10-15 14:44:22, Shaohua Li wrote:
> > > The page_counter_memparse() returns pages for the threshold, while
> > > mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> > > to bytes.
> > > 
> > > Looks a regression introduced by 3e32cb2e0a12b69150
> > 
> > Yes. This suggests
> > Cc: stable # 3.19+
> 
> But it's been this way for 2 years and nobody noticed it.  How come?

Maybe we do not have that many users of this API with newer kernels.

> Or at least, nobody reported it.  Maybe people *have* noticed it, and
> adjusted their userspace appropriately.  In which case this patch will
> cause breakage.

I dunno, I would rather have it fixed than keep bug to bug compatibility
because they would eventually move to a newer kernel one day when they
see the "breakage" anyway.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-07  7:30     ` Michal Hocko
@ 2015-10-07  7:58       ` Andrew Morton
  2015-10-07  8:30         ` Michal Hocko
  2015-10-07 15:56         ` Greg Thelen
  0 siblings, 2 replies; 9+ messages in thread
From: Andrew Morton @ 2015-10-07  7:58 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Shaohua Li, linux-mm, Johannes Weiner

On Wed, 7 Oct 2015 09:30:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> On Tue 06-10-15 12:22:25, Andrew Morton wrote:
> > On Tue, 6 Oct 2015 19:01:23 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > > On Mon 05-10-15 14:44:22, Shaohua Li wrote:
> > > > The page_counter_memparse() returns pages for the threshold, while
> > > > mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> > > > to bytes.
> > > > 
> > > > Looks a regression introduced by 3e32cb2e0a12b69150
> > > 
> > > Yes. This suggests
> > > Cc: stable # 3.19+
> > 
> > But it's been this way for 2 years and nobody noticed it.  How come?
> 
> Maybe we do not have that many users of this API with newer kernels.

Either it's zero or all the users have worked around this bug.

> > Or at least, nobody reported it.  Maybe people *have* noticed it, and
> > adjusted their userspace appropriately.  In which case this patch will
> > cause breakage.
> 
> I dunno, I would rather have it fixed than keep bug to bug compatibility
> because they would eventually move to a newer kernel one day when they
> see the "breakage" anyway.

They'd only see breakage if we fixed this in the newer kernel.

We could just change the docs and leave it as-is.  That it is called
"usage_in_bytes" makes that a bit awkward.

A bit of googling indicates that people are using usage_in_bytes.  A
few.  All the discussions I found clearly predate this bug.

So did people just stop using this?  Is there some alternative way of
getting the same info?

Why does memcg_write_event_control() says "DO NOT USE IN NEW FILES"
and "DO NOT ADD NEW FILES"?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-07  7:58       ` Andrew Morton
@ 2015-10-07  8:30         ` Michal Hocko
  2015-10-07 15:25           ` Shaohua Li
  2015-10-07 15:56         ` Greg Thelen
  1 sibling, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2015-10-07  8:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Shaohua Li, linux-mm, Johannes Weiner

On Wed 07-10-15 00:58:20, Andrew Morton wrote:
> On Wed, 7 Oct 2015 09:30:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > On Tue 06-10-15 12:22:25, Andrew Morton wrote:
> > > On Tue, 6 Oct 2015 19:01:23 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > > 
> > > > On Mon 05-10-15 14:44:22, Shaohua Li wrote:
> > > > > The page_counter_memparse() returns pages for the threshold, while
> > > > > mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> > > > > to bytes.
> > > > > 
> > > > > Looks a regression introduced by 3e32cb2e0a12b69150
> > > > 
> > > > Yes. This suggests
> > > > Cc: stable # 3.19+
> > > 
> > > But it's been this way for 2 years and nobody noticed it.  How come?
> > 
> > Maybe we do not have that many users of this API with newer kernels.
> 
> Either it's zero or all the users have worked around this bug.
> 
> > > Or at least, nobody reported it.  Maybe people *have* noticed it, and
> > > adjusted their userspace appropriately.  In which case this patch will
> > > cause breakage.
> > 
> > I dunno, I would rather have it fixed than keep bug to bug compatibility
> > because they would eventually move to a newer kernel one day when they
> > see the "breakage" anyway.
> 
> They'd only see breakage if we fixed this in the newer kernel.
> 
> We could just change the docs and leave it as-is.  That it is called
> "usage_in_bytes" makes that a bit awkward.

The whole API is bytes based. Having one which is silently page size
based is definitely wrong.
 
> A bit of googling indicates that people are using usage_in_bytes.  A
> few.  All the discussions I found clearly predate this bug.
>
> 
> So did people just stop using this?

To be honest I haven't seen any real users from my enterprise
distribution experience and I also consider the API quite unusable
because most loads simply fill up their limit with the page case so
something like a vmpressure is a much better indicator of the memory
usage.

This has been introduced before my time. Kirill has introduced it back
in 2009.

> Is there some alternative way of getting the same info?

I am not aware of any alternative nor am I aware of any strong usecases
for such a small granularity API. The consensus so far has been that any
new controller knob for the new cgrroup API has to be backed by a strong
usecase.

I am pretty sure that we want some form of memory pressure notification,
though.

> Why does memcg_write_event_control() says "DO NOT USE IN NEW FILES"
> and "DO NOT ADD NEW FILES"?

Because eventfd notification mechanism is considered a wrong API to
convey notifications. New cgroup API is supposed to use a new mechanism.
The current one will have to live with the old/legacy cgroup API though.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-07  8:30         ` Michal Hocko
@ 2015-10-07 15:25           ` Shaohua Li
  0 siblings, 0 replies; 9+ messages in thread
From: Shaohua Li @ 2015-10-07 15:25 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, Johannes Weiner

On Wed, Oct 07, 2015 at 10:30:04AM +0200, Michal Hocko wrote:
> On Wed 07-10-15 00:58:20, Andrew Morton wrote:
> > On Wed, 7 Oct 2015 09:30:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > 
> > > On Tue 06-10-15 12:22:25, Andrew Morton wrote:
> > > > On Tue, 6 Oct 2015 19:01:23 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> > > > 
> > > > > On Mon 05-10-15 14:44:22, Shaohua Li wrote:
> > > > > > The page_counter_memparse() returns pages for the threshold, while
> > > > > > mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
> > > > > > to bytes.
> > > > > > 
> > > > > > Looks a regression introduced by 3e32cb2e0a12b69150
> > > > > 
> > > > > Yes. This suggests
> > > > > Cc: stable # 3.19+
> > > > 
> > > > But it's been this way for 2 years and nobody noticed it.  How come?
> > > 
> > > Maybe we do not have that many users of this API with newer kernels.
> > 
> > Either it's zero or all the users have worked around this bug.
> > 
> > > > Or at least, nobody reported it.  Maybe people *have* noticed it, and
> > > > adjusted their userspace appropriately.  In which case this patch will
> > > > cause breakage.
> > > 
> > > I dunno, I would rather have it fixed than keep bug to bug compatibility
> > > because they would eventually move to a newer kernel one day when they
> > > see the "breakage" anyway.
> > 
> > They'd only see breakage if we fixed this in the newer kernel.
> > 
> > We could just change the docs and leave it as-is.  That it is called
> > "usage_in_bytes" makes that a bit awkward.
> 
> The whole API is bytes based. Having one which is silently page size
> based is definitely wrong.
>  
> > A bit of googling indicates that people are using usage_in_bytes.  A
> > few.  All the discussions I found clearly predate this bug.
> >
> > 
> > So did people just stop using this?
> 
> To be honest I haven't seen any real users from my enterprise
> distribution experience and I also consider the API quite unusable
> because most loads simply fill up their limit with the page case so
> something like a vmpressure is a much better indicator of the memory
> usage.
> 
> This has been introduced before my time. Kirill has introduced it back
> in 2009.
> 
> > Is there some alternative way of getting the same info?
> 
> I am not aware of any alternative nor am I aware of any strong usecases
> for such a small granularity API. The consensus so far has been that any
> new controller knob for the new cgrroup API has to be backed by a strong
> usecase.
> 
> I am pretty sure that we want some form of memory pressure notification,
> though.
> 
> > Why does memcg_write_event_control() says "DO NOT USE IN NEW FILES"
> > and "DO NOT ADD NEW FILES"?
> 
> Because eventfd notification mechanism is considered a wrong API to
> convey notifications. New cgroup API is supposed to use a new mechanism.
> The current one will have to live with the old/legacy cgroup API though.

The regression introduced at "Wed Dec 10 15:42:31 2014". It's not that
old. We are evaluating this API, that's why I notice it. We have several
applications poll /proc/meminfo to get free memory size and take
preventive action. When I say preventive action, the action is expected
to take before we hit memory pressure and page reclaim runs. The memory
pressure notification is great, but it runs in page reclaim, so a little
bit later. This usage_in_bytes notification looks suitable for our usage
case. Of course I'd like to hear if there is better option.

Thanks,
Shaohua

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] memcg: convert threshold to bytes
  2015-10-07  7:58       ` Andrew Morton
  2015-10-07  8:30         ` Michal Hocko
@ 2015-10-07 15:56         ` Greg Thelen
  1 sibling, 0 replies; 9+ messages in thread
From: Greg Thelen @ 2015-10-07 15:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michal Hocko, Shaohua Li, linux-mm, Johannes Weiner


Andrew Morton wrote:

> On Wed, 7 Oct 2015 09:30:02 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>
>> On Tue 06-10-15 12:22:25, Andrew Morton wrote:
>> > On Tue, 6 Oct 2015 19:01:23 +0200 Michal Hocko <mhocko@kernel.org> wrote:
>> > 
>> > > On Mon 05-10-15 14:44:22, Shaohua Li wrote:
>> > > > The page_counter_memparse() returns pages for the threshold, while
>> > > > mem_cgroup_usage() returns bytes for memory usage. Convert the threshold
>> > > > to bytes.
>> > > > 
>> > > > Looks a regression introduced by 3e32cb2e0a12b69150
>> > > 
>> > > Yes. This suggests
>> > > Cc: stable # 3.19+
>> > 
>> > But it's been this way for 2 years and nobody noticed it.  How come?
>> 
>> Maybe we do not have that many users of this API with newer kernels.
>
> Either it's zero or all the users have worked around this bug.
>
>> > Or at least, nobody reported it.  Maybe people *have* noticed it, and
>> > adjusted their userspace appropriately.  In which case this patch will
>> > cause breakage.
>> 
>> I dunno, I would rather have it fixed than keep bug to bug compatibility
>> because they would eventually move to a newer kernel one day when they
>> see the "breakage" anyway.
>
> They'd only see breakage if we fixed this in the newer kernel.
>
> We could just change the docs and leave it as-is.  That it is called
> "usage_in_bytes" makes that a bit awkward.
>
> A bit of googling indicates that people are using usage_in_bytes.  A
> few.  All the discussions I found clearly predate this bug.
>
> So did people just stop using this?  Is there some alternative way of
> getting the same info?

We (Google) are using byte based notifications on memory.limit_in_bytes
on a pre 3e32cb2e0a12b69150 kernel.  So we'd notice the regression when
running newer kernels.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-10-07 15:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-05 21:44 [PATCH] memcg: convert threshold to bytes Shaohua Li
2015-10-05 22:03 ` Andrew Morton
2015-10-06 17:01 ` Michal Hocko
2015-10-06 19:22   ` Andrew Morton
2015-10-07  7:30     ` Michal Hocko
2015-10-07  7:58       ` Andrew Morton
2015-10-07  8:30         ` Michal Hocko
2015-10-07 15:25           ` Shaohua Li
2015-10-07 15:56         ` Greg Thelen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).