linux-mm.kvack.org archive mirror
* Possible issue: userspace stalls during memory reclaim and swap
@ 2025-07-15  4:14 Yeongjin Kwon
  0 siblings, 0 replies; only message in thread
From: Yeongjin Kwon @ 2025-07-15  4:14 UTC (permalink / raw)
  To: linux-mm

Hi,

I've observed that under heavy memory pressure, the Linux kernel
appears to stall userspace entirely while it reclaims memory and swaps
it out to disk, that is, while the system is thrashing. This results in
repeated freezes of the desktop environment, each lasting from a
fraction of a second to several seconds, particularly on
low-performance systems with limited RAM and slow storage. When the
system is thrashing, it is expected that applications become less
responsive, but it is particularly detrimental to the user experience
when the desktop environment itself stops responding. My efforts to
prevent the desktop environment from stalling have led me to believe
that the issue originates in the kernel. I wanted to share my
observations and see whether others have noticed similar behavior or
have suggestions. I am not very experienced with the development and
internals of the Linux kernel, so I would appreciate any help I can
get.

I've been using Linux as my daily driver for 3-4 years on a
first-generation Microsoft Surface Pro with 4 GB of RAM and an SSD.
The issue I describe has persisted across several distributions
(Ubuntu, Kubuntu, Arch Linux) and kernel variants (stock, zen, LTS,
hardened) for the whole time that I have used Linux. Here are my
observations:

- System freezes occur when memory usage is high and the system
reclaims memory and swaps data out.
- Swap usage alone (when free memory is available) does not tend to
cause freezes.
- Freezing tends to happen when switching to an application that
hasn't been used recently.
- The problem worsens significantly when using a USB stick as the root
filesystem and swap device, suggesting that slow I/O exacerbates the
issue.
- The stock Arch Linux kernel has zswap enabled by default. Disabling
zswap improved responsiveness significantly, likely because
compression overhead was reduced.
- I have also observed this problem on Chromebooks used for education.
When a lot of applications were in use, the computer would display the
same freezing issues. The problem might have been more severe on those
computers since, if I am correct, they use zswap/zram.

I have tried the following mitigations, none of which resolved the
issue:
- Assigned memory protections to desktop environment processes using
cgroups, specifically the memory.min value (slightly helped).
- Tried the realtime kernel and made DE processes realtime.
- Increased vm.watermark_scale_factor, which made the problem worse
(likely due to more frequent reclaim activity).
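
For readers who want to repeat the memory.min mitigation, here is a
minimal sketch of how the value can be set through the cgroup v2
filesystem. It is not the exact procedure used in this report: the
function name and the cgroup path in the comment are hypothetical, and
the sketch assumes cgroup v2 is mounted with the memory controller
enabled for the target group.

```python
from pathlib import Path


def set_memory_min(cgroup_dir: str, size_bytes: int) -> str:
    """Write a memory.min value (in bytes) into a cgroup v2 directory.

    cgroup_dir is assumed to be a cgroup v2 directory, for example a
    hypothetical /sys/fs/cgroup/desktop.slice. Returns the value read
    back from the control file.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    control_file = Path(cgroup_dir) / "memory.min"
    # The kernel accepts a plain byte count written to this file.
    control_file.write_text(f"{size_bytes}\n")
    return control_file.read_text().strip()
```

Writing to a real cgroup normally requires root. The effective
protection is also bounded by the memory.min values of the ancestor
cgroups, so setting it only on a leaf group may have less effect than
expected.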

I switched to a computer with much better specs around a month ago, so
this issue might have been fixed during that month when I was not
using my original computer. I am doubtful though since the issue has
not been addressed in all the years I have used Linux. The last kernel
version I used on the Surface Pro might have been 6.14, but I am not
sure.

The issue should be reproducible on any low-memory, slow-storage
system by doing the following:
1. Fill memory and swap by launching multiple applications.
2. Keep one app in the foreground for a while.
3. Switch rapidly between apps or desktop widgets.

Here is a possible experiment to demonstrate the issue:
1. Create a block device using device-mapper and enable it as a swap
device.
2. Suspend I/O through device-mapper to simulate very slow swap I/O,
or use the dm-delay target to delay writes to swap.
3. Observe if the system stalls globally.
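
The dm-delay variant of the experiment above could be set up roughly
as follows. This is an untested sketch, in line with the experiment
itself not having been run: the helper name, the device path and the
mapping name "slowswap" are hypothetical. The helper only builds the
one-line table that the device-mapper "delay" target expects (start
sector, length in 512-byte sectors, the target name, then device,
offset and delay in milliseconds for reads, optionally followed by the
same triple for writes).

```python
from typing import Optional


def dm_delay_table(device: str, sectors: int, read_delay_ms: int = 0,
                   write_delay_ms: Optional[int] = None) -> str:
    """Build a device-mapper table line for the 'delay' target.

    sectors is the device size in 512-byte sectors, as reported by
    `blockdev --getsz`. If write_delay_ms is given, writes go to the
    same device with their own delay; otherwise read_delay_ms applies
    to all I/O.
    """
    table = f"0 {sectors} delay {device} 0 {read_delay_ms}"
    if write_delay_ms is not None:
        table += f" {device} 0 {write_delay_ms}"
    return table


# Hypothetical 1 GiB partition: no read delay, 500 ms write delay.
example = dm_delay_table("/dev/sdX2", 2097152, 0, 500)
```

The resulting string can be passed to `dmsetup create slowswap --table
"<table>"`, after which /dev/mapper/slowswap can be prepared with
mkswap and enabled with swapon. For the suspend variant, `dmsetup
suspend slowswap` and `dmsetup resume slowswap` stop and restart I/O on
the mapping.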

If the kernel is functioning properly, only processes dependent on
swap or requesting more memory should block, while others should
remain responsive (e.g., those with mlocked memory or not requesting
new pages). If all userspace stalls, it may indicate an opportunity
for improving reclaim behavior. Additionally, certain processes could
also be granted memory protections and observed to see if they stall.
The kernel could also be traced/probed to see which part of the kernel
is stalling, if any. I have not done this experiment though so I do
not know what the outcome would be. I believe processes that are
stalled because of swap or reclaim should be placed on a queue so that
the kernel can service them one by one while letting the rest of the
system continue running.
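
One way to tell whether stalls are global or limited to some tasks
might be the kernel's pressure stall information (PSI), which is not
something I used for the observations above. The sketch below assumes
a kernel built with PSI support that exposes /proc/pressure/memory;
the function names are hypothetical. In that file, the "some" line is
the share of time in which at least one task was stalled on memory,
and the "full" line is the share of time in which all non-idle tasks
were stalled at once.

```python
def parse_pressure(text: str) -> dict:
    """Parse the contents of a /proc/pressure/* file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns {"some": {"avg10": ..., ...}, "full": {...}} with float
    values.
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def read_memory_pressure(path: str = "/proc/pressure/memory") -> dict:
    """Read and parse the memory pressure file (requires PSI support)."""
    with open(path) as f:
        return parse_pressure(f.read())
```

Sampling read_memory_pressure() once per second during the experiment
and watching the "full" avg10 value would show whether all runnable
tasks stall together. It would not show which part of the kernel is
responsible, so tracing would still be needed for that.
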

When the system is thrashing, it is normal for applications to stall
and/or stop working. However, the desktop environment should keep
running responsively, which is impossible to accomplish if the kernel
indeed stalls all userspace for prolonged periods of time. A
responsive desktop environment is the most important factor of a good
experience using a computer, and is powerful enough to at least
partially make up for any adverse experiences caused by unresponsive
applications. The kernel could also heuristically control which
processes have their memory swapped out based on how responsive they
should be and how recently the processes were active, so that the
system provides a controlled user experience even while thrashing.

This problem disproportionately affects lower-end systems (e.g.,
educational Chromebooks), but can also impact high-end systems under
extreme memory pressure. It could hinder adoption of Linux on many
systems.

I plan to continue testing on the Surface Pro and would appreciate any
insights or guidance from those more familiar with memory management
internals. I'd also like to know if others have observed similar
stalls.

Thank you,
Yeongjin Kwon

