From: "Joakim Tjernlund (Nokia)" <joakim.tjernlund@nokia.com>
To: Phillip Lougher <phillip@squashfs.org.uk>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Cc: "phillip.lougher@gmail.com" <phillip.lougher@gmail.com>
Subject: Re: squashfs can starve/block apps
Date: Sun, 6 Jul 2025 09:46:09 +0000
Message-ID: <a215768cad23860c2e185e909472697a5bd4708f.camel@nokia.com>
In-Reply-To: <443821641.1977435.1751658706057@eu1.myprofessionalmail.com>
On Fri, 2025-07-04 at 20:51 +0100, Phillip Lougher wrote:
>
>
> > On 26/06/2025 15:27 BST Joakim Tjernlund (Nokia) <joakim.tjernlund@nokia.com> wrote:
> >
> >
> > On Thu, 2025-06-26 at 10:09 +0200, Joakim Tjernlund wrote:
> > > We have an app running on a squashfs RFS (XZ compressed) and an appfs also on squashfs.
> > > Whenever we validate a SW update image (stream an image.xz, uncompress it and send it on to /dev/null),
> > > the apps are starved/blocked and make almost no progress, system time in top goes up to 99+%
> > > and the console also becomes unresponsive.
> > >
> > > This feels like the kernel is stuck/busy in a loop and does not let apps execute.
> > >
>
> I have been away at the Glastonbury festival, hence the delay in replying. But
> this isn't really anything to do with Squashfs per se, and basic computer
> science theory explains what is going on here. So I'm surprised no one else has
> responded.
>
> > > Kernel 5.15.185
> > >
> > > Any ideas/pointers ?
>
> Yes,
>
> > >
> > > Jocke
> >
> > This will reproduce the stuck behaviour we see:
> > > cd /tmp (/tmp is a tmpfs)
> > > wget
> > https://fullimage.xz/
>
> You've identified the cause here.
>
> >
> > So just downloading it to tmpfs will confuse squashfs, seems to
> > me that squashfs somehow sees the xz compressed pages in page cache/VFS and
> > tries to do something with them.
>
> But this is the completely wrong conclusion. Squashfs doesn't "magically"
> see files downloaded into a different filesystem and try to do something
> with them.
>
> What is happening is the system is thrashing, because the page cache doesn't
> have enough remaining space to contain the working set of the running
> application(s).
>
> See Wikipedia article
> https://en.wikipedia.org/wiki/Thrashing_(computer_science)
>
> Tmpfs filesystems (/tmp here) are not backed by physical media, and their
> content is stored in the page cache. So in effect if fullImage.xz takes
> most of the page cache (system RAM), then there is not much space left to store
> the pages of the applications that are running, and they constantly replace
> each other's pages.
>
> To make it easy, imagine we have two processes A and B, and the page cache
> doesn't have enough space to store both the pages for processes A and B.
>
> Now:
>
> 1. Process A starts and demand-pages pages into the page cache from the
> Squashfs root filesystem. This takes CPU resources to decompress the pages.
> Process A runs for a while and then gets descheduled.
>
> 2. Process B starts and demand-pages pages into the page cache, replacing
> Process A's pages. It runs for a while and then gets descheduled.
>
> 3. Process A restarts and finds all its pages have gone from page cache, and so
> it has to re-demand-page the pages back. This replaces Process B's pages.
>
> 4. Process B restarts and finds all its pages have gone from the page cache ...
>
> In effect the system spends all its time reading pages from the
> Squashfs root filesystem, and doesn't do anything else, and hence it looks
> like it has hung.
>
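The A/B ping-pong described above can be illustrated with a toy model. This
is only a sketch under loud assumptions: the page cache is modelled as a
strict LRU list (the real kernel reclaim is more elaborate), each process
touches its whole working set once per time slice, and the function name
count_page_faults and all sizes are made up for illustration.

```python
from collections import OrderedDict


def count_page_faults(cache_pages, working_set_pages, rounds):
    """Count faults for two processes sharing one strict-LRU page cache.

    Hypothetical model: processes "A" and "B" alternate, and each touches
    every page of its working set once per time slice.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    faults = 0
    for _ in range(rounds):
        for process in ("A", "B"):
            for page in range(working_set_pages):
                key = (process, page)
                if key in cache:
                    # Hit: mark the page as most recently used.
                    cache.move_to_end(key)
                else:
                    # Miss: on Squashfs this is where a block would have
                    # to be read and decompressed again.
                    faults += 1
                    cache[key] = True
                    if len(cache) > cache_pages:
                        cache.popitem(last=False)  # evict the LRU page
    return faults


# Both working sets fit: only the initial cold faults occur.
fits = count_page_faults(cache_pages=200, working_set_pages=100, rounds=10)

# Cache is 25% too small: every single access faults.
thrashes = count_page_faults(cache_pages=150, working_set_pages=100, rounds=10)

print(fits, thrashes)
```

In this model a cache only slightly smaller than the combined working set
already turns every access into a fault, which matches the "looks like it
has hung" symptom.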
> This is not a fault with Squashfs, and it will happen with any filesystem
> (ext4 etc) when system memory is too small to contain the working set of
> pages.
>
> Now, to repeat, what has caused this is the download of that fullImage.xz
> which has filled most of the page cache (system RAM). To prevent that
> from happening, there are two obvious solutions:
>
> 1. Split fullImage.xz into pieces and only download one piece at a time. This
> will avoid filling up the page cache and the system thrashing.
>
> 2. Kill all unnecessary applications and processes before downloading
> fullImage.xz. In doing that you reduce the working set to fit the RAM available,
> which will again prevent thrashing.
>
> Hope that helps.
>
> Phillip
You are absolutely right, the above was low RAM caused by filling the tmpfs.
But what threw me off was that I observed the same when streaming XZ to /dev/null.
After some digging I found why: some XZ options do not respect "-0" presets
w.r.t. dict size and reset it back to the default. Once I changed from
"-0 --check=crc32 --arm --lzma2=lp=2,lc=2"
to
"-0 --check=crc32 --lzma2=dict=128KiB"
I got a stable system.
Perhaps xz -l could be improved to include dict size to make this more obvious?
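Until then, the dictionary size can be read back from the file itself. The
following is a minimal sketch, assuming the layout in the .xz file format
specification (a 12-byte stream header followed by the first block header)
and looking only at the first block. It uses Python's lzma module as a
stand-in for the xz command line, on the assumption that a filter spec
without "preset" falls back to liblzma's default preset much like the
"--lzma2=lp=2,lc=2" case above. The names read_vli and lzma2_dict_size are
hypothetical helpers, not part of any library.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"
LZMA2_FILTER_ID = 0x21


def read_vli(buf, pos):
    """Decode one variable-length integer; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def lzma2_dict_size(xz_bytes):
    """Return the LZMA2 dictionary size (bytes) of the first block."""
    if xz_bytes[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    pos = 12                      # skip the stream header
    flags = xz_bytes[pos + 1]     # byte 0 is the block header size
    pos += 2
    if flags & 0x40:              # optional compressed size
        _, pos = read_vli(xz_bytes, pos)
    if flags & 0x80:              # optional uncompressed size
        _, pos = read_vli(xz_bytes, pos)
    for _ in range((flags & 0x03) + 1):
        filter_id, pos = read_vli(xz_bytes, pos)
        props_size, pos = read_vli(xz_bytes, pos)
        props = xz_bytes[pos:pos + props_size]
        pos += props_size
        if filter_id == LZMA2_FILTER_ID:
            bits = props[0] & 0x3F
            if bits == 40:
                return 0xFFFFFFFF
            return (2 | (bits & 1)) << (bits // 2 + 11)
    raise ValueError("no LZMA2 filter in first block")


def compress(filters):
    return lzma.compress(bytes(1024), format=lzma.FORMAT_XZ,
                         check=lzma.CHECK_CRC32, filters=filters)


# lc/lp given without a preset: the remaining options come from the default.
no_preset = compress([{"id": lzma.FILTER_ARM},
                      {"id": lzma.FILTER_LZMA2, "lc": 2, "lp": 2}])
# Preset 0 named inside the filter spec.
preset_0 = compress([{"id": lzma.FILTER_LZMA2, "preset": 0}])
# Dictionary size pinned explicitly.
pinned = compress([{"id": lzma.FILTER_LZMA2, "preset": 0,
                    "dict_size": 128 * 1024}])

for name, blob in (("no preset", no_preset), ("preset 0", preset_0),
                   ("dict=128KiB", pinned)):
    print(name, lzma2_dict_size(blob) // 1024, "KiB")
```

Since the decompressor needs roughly the dictionary size in RAM, the value
printed here is the number to check on a memory-constrained target.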
Jocke