[LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest

All of lore.kernel.org
 help / color / mirror / Atom feed

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
@ 2020-11-16 13:09 Joerg Vehlow
  2020-11-16 13:09 ` [LTP] [PATCH] " Joerg Vehlow
  2020-11-17  9:28 ` [LTP] [PATCH 0/1] " Richard Palethorpe
  0 siblings, 2 replies; 8+ messages in thread
From: Joerg Vehlow @ 2020-11-16 13:09 UTC (permalink / raw)
  To: ltp

Hi,

this is something like an RFC. (I think I mixed my thoughts
between this and the patch description. Maybe read the patch
description first).
I found the overcommit_memory test, that tries to allocate
all alocatable memory for overcommit policy never, failed 
a lot and a lot more often, if the system has more memory.
When looking at the kernel source I found the reason: 
The percpu counter that counts the used memory uses a 
counter for every cpu, if the allocation or deallocations 
are very small. The more memory the system has, 
the bigger "small" is defined. See mm_compute_batch.

I started seeing this issue a lot after upgrading to 20200930
comming from 20190115. Some changes in the framework may have
led to this.

I don't think this is a kernel bug, but a result from switching
between overcommit modes. In overcommit mode never, the batch
size is a lot smaller than in the other modes
(ram_pages/cpus/256 instead of ra,_pages/cpus/4).
This leads to allocations done before switching the mode to be
accounted in the per cpu counters, and deallocated after in the
global counter, making the global counter negative. If the
overcommit mode was the same all the time, it should all have
been accounted in the same counters and the global counter
wouldn't be negative.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [PATCH] overcommit_memory: Remove unstable subtest
  2020-11-16 13:09 [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest Joerg Vehlow
@ 2020-11-16 13:09 ` Joerg Vehlow
  2020-11-17  9:28 ` [LTP] [PATCH 0/1] " Richard Palethorpe
  1 sibling, 0 replies; 8+ messages in thread
From: Joerg Vehlow @ 2020-11-16 13:09 UTC (permalink / raw)
  To: ltp

From: Joerg Vehlow <joerg.vehlow@aox-tech.de>

The test sets overcommit policy to never overcommit and then tries
to allocate all possible memory. This should fail, since there is already
some memory allocated for running the test programm, but due to inaccurate
memory accounting in mm/util.c __vm_enough_memory(), the allocation can still
succeed.
The commited memory is stored in a percpu counter, that counts in 1 + ncpu
variables. For small allocations and deallocations, the memory is counted
in a counter per cpu, without locking. For larger (de-)allocations, it is
counted in a unified counter under lock. This can lead to this szenario:
The counters for each cpu account for all used memory, while the unified
counter might even be negative (because it only registered deallocations).
The function __vm_enough_memory() takes only the unified counter into
account and truncates negative values. This leads to the successfull
allocation of commit_limit sometimes.
This may be a design decision to prevent lock contention, since the more
accurate percpu_counter_sum (in contrast to percpu_counter_read_positive)
needs to lock the spinlock in order to get correct values.

Signed-off-by: Joerg Vehlow <joerg.vehlow@aox-tech.de>
---
 testcases/kernel/mem/tunable/overcommit_memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/testcases/kernel/mem/tunable/overcommit_memory.c b/testcases/kernel/mem/tunable/overcommit_memory.c
index f77939908..3ce0f05c8 100644
--- a/testcases/kernel/mem/tunable/overcommit_memory.c
+++ b/testcases/kernel/mem/tunable/overcommit_memory.c
@@ -154,7 +154,6 @@ static void overcommit_memory_test(void)

 	update_mem_commit();
 	alloc_and_check(commit_left * 2, EXPECT_FAIL);
-	alloc_and_check(commit_limit, EXPECT_FAIL);
 	update_mem_commit();
 	alloc_and_check(commit_left / 2, EXPECT_PASS);

-- 
2.25.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
  2020-11-16 13:09 [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest Joerg Vehlow
  2020-11-16 13:09 ` [LTP] [PATCH] " Joerg Vehlow
@ 2020-11-17  9:28 ` Richard Palethorpe
  2020-11-17 10:06   ` Joerg Vehlow
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Palethorpe @ 2020-11-17  9:28 UTC (permalink / raw)
  To: ltp

Hello,

Joerg Vehlow <lkml@jv-coder.de> writes:

> Hi,
>
> this is something like an RFC. (I think I mixed my thoughts
> between this and the patch description. Maybe read the patch
> description first).
> I found the overcommit_memory test, that tries to allocate
> all alocatable memory for overcommit policy never, failed 
> a lot and a lot more often, if the system has more memory.
> When looking at the kernel source I found the reason: 
> The percpu counter that counts the used memory uses a 
> counter for every cpu, if the allocation or deallocations 
> are very small. The more memory the system has, 
> the bigger "small" is defined. See mm_compute_batch.
>
> I started seeing this issue a lot after upgrading to 20200930
> comming from 20190115. Some changes in the framework may have
> led to this.
>
> I don't think this is a kernel bug, but a result from switching
> between overcommit modes. In overcommit mode never, the batch
> size is a lot smaller than in the other modes
> (ram_pages/cpus/256 instead of ra,_pages/cpus/4).
> This leads to allocations done before switching the mode to be
> accounted in the per cpu counters, and deallocated after in the
> global counter, making the global counter negative. If the
> overcommit mode was the same all the time, it should all have
> been accounted in the same counters and the global counter
> wouldn't be negative.
>
> J?rg

Possibly /proc/sys/vm/stat_refresh can be used to flush these counters
after changing the overcommit policy?

For reference see Documentation/admin-guide/sysctl/vm.rst.

I guess that if these counters turning negative is considered a bug then
a warning will be printed otherwise the test needs to be smarter.

-- 
Thank you,
Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
  2020-11-17  9:28 ` [LTP] [PATCH 0/1] " Richard Palethorpe
@ 2020-11-17 10:06   ` Joerg Vehlow
  2020-11-17 11:03     ` Richard Palethorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Joerg Vehlow @ 2020-11-17 10:06 UTC (permalink / raw)
  To: ltp

Hi Richard,
> Possibly /proc/sys/vm/stat_refresh can be used to flush these counters
> after changing the overcommit policy?
This sounds interesting. From the code I cannot see that it works with 
vm_committed_as.
If this is not done with some "invisible" magic, I do not think it will 
help.
> I guess that if these counters turning negative is considered a bug then
> a warning will be printed otherwise the test needs to be smarter.
No I do not think so. For correct memory accounting, used for e.g. 
Commited_AS in /proc /meminfo, the memory is summed up correctly using 
global_counter + sum(cpu_counters). If global_counter is negative, than 
the cpu_counters are positive by at least the same amount and the 
overall memory is reported correctly again.

The problem here is, that __vm_enough_memory in mm/util.c uses only the 
global_counter truncated at 0, through percpu_counter_read_positive. I 
guess this is intentional, because it prevents locking for memory 
allocations. The summation function used above requires locking.

The test cannot know what value the 1+ncpu counters have, so it cannot 
print a warning or anything.

While writing this an checking the kernel code I realized that this 
issue shouldn't exist anymore with linux 5.9.
Commit 56f3547bfa4d361148aa748ccb86073bc57f5e6c syncs the counters (i.e. 
summing up the total value in the gobal_counter), when the overcommit 
policy is changed to NEVER.
Yet this subtest cannot be considered a test f or this change, because 
for that allocations and deallocations have to be carefully constructed, 
to trigger the behavior I described.

I am not sure what ltp's policy is for supporting older kernel versions. 
Probably not supporting them, then this patch should be rejected, but 
maybe the commit id should be mentioned there.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
  2020-11-17 10:06   ` Joerg Vehlow
@ 2020-11-17 11:03     ` Richard Palethorpe
  2020-11-17 11:48       ` Joerg Vehlow
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Palethorpe @ 2020-11-17 11:03 UTC (permalink / raw)
  To: ltp

Hello Joerg,

Joerg Vehlow <lkml@jv-coder.de> writes:

> Hi Richard,
>> Possibly /proc/sys/vm/stat_refresh can be used to flush these counters
>> after changing the overcommit policy?
> This sounds interesting. From the code I cannot see that it works with
> vm_committed_as.
> If this is not done with some "invisible" magic, I do not think it
> will help.
>> I guess that if these counters turning negative is considered a bug then
>> a warning will be printed otherwise the test needs to be smarter.
> No I do not think so. For correct memory accounting, used for
> e.g. Commited_AS in /proc /meminfo, the memory is summed up correctly
> using global_counter + sum(cpu_counters). If global_counter is
> negative, than the cpu_counters are positive by at least the same
> amount and the overall memory is reported correctly again.

Right, and it seems that the percpu counters are not flushed to the
global counter when performing the sum, so it would not be possible to
use /meminfo to flush the counters or anything else AFAICT.

>
> The problem here is, that __vm_enough_memory in mm/util.c uses only
> the global_counter truncated at 0, through
> percpu_counter_read_positive. I guess this is intentional, because it
> prevents locking for memory allocations. The summation function used
> above requires locking.
>
> The test cannot know what value the 1+ncpu counters have, so it cannot
> print a warning or anything.

To clarify, I meant that the kernel will (probably) print a warning if a
counter value turns negative unexpectedly. Which is the case with the
page counter used in cgroups.

>
> While writing this an checking the kernel code I realized that this
> issue shouldn't exist anymore with linux 5.9.
> Commit 56f3547bfa4d361148aa748ccb86073bc57f5e6c syncs the counters
> (i.e. summing up the total value in the gobal_counter), when the
> overcommit policy is changed to NEVER.
> Yet this subtest cannot be considered a test f or this change, because
> for that allocations and deallocations have to be carefully
> constructed, to trigger the behavior I described.
>
> I am not sure what ltp's policy is for supporting older kernel
> versions. Probably not supporting them, then this patch should be
> rejected, but maybe the commit id should be mentioned there.
>
> J?rg

I think in general older versions are supported, at least back to 2.6
(although you may need to compile in a newer user land). However it
depends on the test, so we maybe could disable the test on older
kernels, but changes like the above are often backported to older
kernels...

Possibly the test should be converted to check for regressions to the
above commit? Which will probably also test whether setting overcommit
works as a byproduct.


-- 
Thank you,
Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
  2020-11-17 11:03     ` Richard Palethorpe
@ 2020-11-17 11:48       ` Joerg Vehlow
  2020-11-17 12:45         ` Richard Palethorpe
  0 siblings, 1 reply; 8+ messages in thread
From: Joerg Vehlow @ 2020-11-17 11:48 UTC (permalink / raw)
  To: ltp

Hi
> I think in general older versions are supported, at least back to 2.6
> (although you may need to compile in a newer user land). However it
> depends on the test, so we maybe could disable the test on older
> kernels, but changes like the above are often backported to older
> kernels...
>
> Possibly the test should be converted to check for regressions to the
> above commit? Which will probably also test whether setting overcommit
> works as a byproduct.
>
In that case, I would vote to either remove the subtest, or make it more 
permissive, by using something like 1.5x commit_limit. That might also 
fail, but is very less likely.

For the new change I would rather create a new test, that tests exactly 
this change, although the more accurate accounting is more or less a 
by-product of the change, that is not even documented there... This is 
all about changing the batch size. They just added the synchronization 
of the counters, because they enlarge the batch size for policies, but 
NEVER and that could lead to the situation I described even more frequently.
Now that code of mm_compute_batch before the change, I wonder how this 
was even possible... The batch size was constant, if no memory hotplug 
occurred. So normally allocations and deallocation should be accounted 
for in the same counter type (although maybe the cpu that does the 
accounting may differ; allocated on core 0 deallocated on 1).
But never allocated on a cpu counter and deallocated on the global counter.

Ohh this could be mremap:
If a memory region is allocated with mmap and then grown with mremap, 
this may lead to these small allocations being added to the per cpu 
counters and the deallocation of the bigger range subtracted from the 
global counter. This could be e.g. a stack, that had to grow.
I wonder if this could overflow the global counter, if done often enough.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
  2020-11-17 11:48       ` Joerg Vehlow
@ 2020-11-17 12:45         ` Richard Palethorpe
  2020-11-18  7:54           ` Joerg Vehlow
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Palethorpe @ 2020-11-17 12:45 UTC (permalink / raw)
  To: ltp

Hello,

Joerg Vehlow <lkml@jv-coder.de> writes:

> Hi
>> I think in general older versions are supported, at least back to 2.6
>> (although you may need to compile in a newer user land). However it
>> depends on the test, so we maybe could disable the test on older
>> kernels, but changes like the above are often backported to older
>> kernels...
>>
>> Possibly the test should be converted to check for regressions to the
>> above commit? Which will probably also test whether setting overcommit
>> works as a byproduct.
>>
> In that case, I would vote to either remove the subtest, or make it
> more permissive, by using something like 1.5x commit_limit. That might
> also fail, but is very less likely.

More permissive sounds good to me, as often these tests trigger some
kernel error not related to the original intent of the test.

If we continue to see failures then it might be possible to scale the
commit_limit dynamically to avoid them, but for now this seems like a
good solution.

>
> For the new change I would rather create a new test, that tests
> exactly this change, although the more accurate accounting is more or
> less a by-product of the change, that is not even documented
> there... This is all about changing the batch size. They just added
> the synchronization of the counters, because they enlarge the batch
> size for policies, but NEVER and that could lead to the situation I
> described even more frequently.
> Now that code of mm_compute_batch before the change, I wonder how this
> was even possible... The batch size was constant, if no memory hotplug 
> occurred. So normally allocations and deallocation should be accounted
> for in the same counter type (although maybe the cpu that does the 
> accounting may differ; allocated on core 0 deallocated on 1).
> But never allocated on a cpu counter and deallocated on the global counter.
>
> Ohh this could be mremap:
> If a memory region is allocated with mmap and then grown with mremap,
> this may lead to these small allocations being added to the per cpu 
> counters and the deallocation of the bigger range subtracted from the
> global counter. This could be e.g. a stack, that had to grow.
> I wonder if this could overflow the global counter, if done often enough.
>
> J?rg

That is an interesting possibility!


-- 
Thank you,
Richard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest
  2020-11-17 12:45         ` Richard Palethorpe
@ 2020-11-18  7:54           ` Joerg Vehlow
  0 siblings, 0 replies; 8+ messages in thread
From: Joerg Vehlow @ 2020-11-18  7:54 UTC (permalink / raw)
  To: ltp

Hi,

I did some experiments and made the per cpu counters visible. Then I 
realized kernel developers are not that stupid ;)
The per cpu counters can only grow up to batch size and down to negative 
batch size, then they are flushed back to the global counter.

That means the counter seen by __vm_enough_memory can only be off in the 
interval [-numcpu * batch_size, numcpu * batch_size]. So the test can be 
fixed by trying to allocate commit_left + numcpu * batch_size.
Now the only problem is, that the batch size is not visible to 
userspace, so has to be calculated. It depends on the number of cpus and 
total memory pages:
max(ncpus*2, 32, min(int32_max, ram_pages / ncpus / 256) = batch_size 
(in pages)

I guess we can determine these two values from userspace and add the 
batch size to commit_left.
I will prepare a new patch with this approach.

J?rg

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-11-18  7:54 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-11-16 13:09 [LTP] [PATCH 0/1] overcommit_memory: Remove unstable subtest Joerg Vehlow
2020-11-16 13:09 ` [LTP] [PATCH] " Joerg Vehlow
2020-11-17  9:28 ` [LTP] [PATCH 0/1] " Richard Palethorpe
2020-11-17 10:06   ` Joerg Vehlow
2020-11-17 11:03     ` Richard Palethorpe
2020-11-17 11:48       ` Joerg Vehlow
2020-11-17 12:45         ` Richard Palethorpe
2020-11-18  7:54           ` Joerg Vehlow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.