Re: squashfs can starve/block apps - Joakim Tjernlund (Nokia)

From: "Joakim Tjernlund (Nokia)" <joakim.tjernlund@nokia.com>
To: Phillip Lougher <phillip@squashfs.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Cc: "phillip.lougher@gmail.com" <phillip.lougher@gmail.com>
Subject: Re: squashfs can starve/block apps
Date: Sun, 6 Jul 2025 09:46:09 +0000
Message-ID: <a215768cad23860c2e185e909472697a5bd4708f.camel@nokia.com>
In-Reply-To: <443821641.1977435.1751658706057@eu1.myprofessionalmail.com>

On Fri, 2025-07-04 at 20:51 +0100, Phillip Lougher wrote:
>
>
> > On 26/06/2025 15:27 BST Joakim Tjernlund (Nokia) <joakim.tjernlund@nokia.com> wrote:
> >
> >
> > On Thu, 2025-06-26 at 10:09 +0200, Joakim Tjernlund wrote:
> > > We have an app running on a squashfs RFS (XZ compressed) and an appfs also on squashfs.
> > > Whenever we validate an SW update image (stream an image.xz, uncompress it and send it to /dev/null),
> > > the apps are starved/blocked and make almost no progress, system time in top goes up to 99+%
> > > and the console also becomes unresponsive.
> > >
> > > This feels like the kernel is stuck/busy in a loop and does not let apps execute.
> > >
>
> I have been away at the Glastonbury festival, hence the delay in replying. But
> this isn't really anything to do with Squashfs per se, and basic computer
> science theory explains what is going on here.  So I'm surprised no one else has
> responded.
>
> > > Kernel 5.15.185
> > >
> > > Any ideas/pointers ?
>
> Yes,
>
> > >
> > >  Jocke
> >
> > This will reproduce the stuck behaviour we see:
> >  > cd /tmp (/tmp is a tmpfs)
> >  > wget
> > https://fullimage.xz/
>
> You've identified the cause here.
>
> >
> > So just downloading it to tmpfs will confuse squashfs, seems to
> > me that squashfs somehow sees the xz compressed pages in page cache/VFS and
> > tries to do something with them.
>
> But this is the completely wrong conclusion.  Squashfs doesn't "magically"
> see files downloaded into a different filesystem and try to do something
> with them.
>
> What is happening is the system is thrashing, because the page cache doesn't
> have enough remaining space to contain the working set of the running
> application(s).
>
> See the Wikipedia article:
> https://en.wikipedia.org/wiki/Thrashing_(computer_science)
>
> Tmpfs filesystems (/tmp here) are not backed by physical media, and their
> contents are stored in the page cache.  So in effect if fullImage.xz takes
> most of the page cache (system RAM), then there is not much space left to
> store the pages of the applications that are running, and they constantly
> replace each other's pages.
>
> To make it easy, imagine we have two processes A and B, and the page cache
> doesn't have enough space to store both the pages for processes A and B.
>
> Now:
>
> 1. Process A starts and demand-pages pages into the page cache from the
>    Squashfs root filesystem.  This takes CPU resources to decompress the pages.
>    Process A runs for a while and then gets descheduled.
>
> 2. Process B starts and demand-pages pages into the page cache, replacing
>    Process A's pages.  It runs for a while and then gets descheduled.
>
> 3. Process A restarts and finds all its pages have gone from the page cache,
>    and so it has to re-demand-page the pages back.  This replaces Process B's
>    pages.
>
> 4. Process B restarts and finds all its pages have gone from the page cache ...
>
> In effect the system spends all its time reading pages from the
> Squashfs root filesystem, and doesn't do anything else, and hence it looks
> like it has hung.
>
> This is not a fault with Squashfs, and it will happen with any filesystem
> (ext4 etc) when system memory is too small to contain the working set of
> pages.
>
> Now, to repeat, what has caused this is the download of that fullImage.xz,
> which has filled most of the page cache (system RAM).  To prevent that
> from happening, there are two obvious solutions:
>
> 1. Split fullImage.xz into pieces and only download one piece at a time.  This
>    will avoid filling up the page cache and the system thrashing.
>
> 2. Kill all unnecessary applications and processes before downloading
>    fullImage.xz.  In doing that you reduce the working set to fit the RAM
>    available, which will again prevent thrashing.
>
> Hope that helps.
>
> Phillip
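
For anyone reading along, the A/B sequence in the quoted message can be
sketched with a toy model. This is only a minimal illustration, assuming a
strict LRU cache and two hypothetical processes with 60-page working sets;
the kernel's real page reclaim is more elaborate, and the function name and
numbers below are made up for the example.

```python
from collections import OrderedDict


def count_faults(cache_pages, working_sets, rounds):
    """Run each process in turn against one shared LRU cache; count page faults."""
    cache = OrderedDict()  # (process, page) -> None, least recently used first
    faults = 0
    for _ in range(rounds):
        for name, npages in working_sets.items():
            for page in range(npages):
                key = (name, page)
                if key in cache:
                    cache.move_to_end(key)  # hit: mark as most recently used
                else:
                    faults += 1  # miss: page must be read and decompressed again
                    cache[key] = None
                    if len(cache) > cache_pages:
                        cache.popitem(last=False)  # evict least recently used page
    return faults


# Hypothetical sizes: two processes that each touch 60 pages per time slice.
working_sets = {"A": 60, "B": 60}

# Cache large enough for both working sets (120 pages needed, 128 available).
roomy = count_faults(cache_pages=128, working_sets=working_sets, rounds=10)

# Cache slightly too small, e.g. because a tmpfs file occupies the rest.
tight = count_faults(cache_pages=100, working_sets=working_sets, rounds=10)

print(f"faults with 128-page cache: {roomy}")
print(f"faults with 100-page cache: {tight}")
```

In this model the roomy case only faults while the cache warms up, while the
tight case faults on every single access, even though the cache is short by
just 20 pages. On squashfs each of those faults also costs a decompression,
which would fit the high system time reported above.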

You are absolutely right, the case above was low RAM due to filling the tmpfs.
But what threw me off was that I observed the same when streaming XZ to /dev/null.

After some digging I found out why: some xz options do not respect the "-0"
preset w.r.t. dict size and reset it back to the default. Once I changed from
  "-0 --check=crc32 --arm --lzma2=lp=2,lc=2"
to
  "-0 --check=crc32 --lzma2=dict=128KiB"
I got a stable system.
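
A rough way to see the same effect outside the xz command line is Python's
lzma module, which also wraps liblzma. The sketch below is an illustration
under assumptions, not a statement about how xz parses its options: it relies
on the underscore-prefixed helpers lzma._encode_filter_properties and
lzma._decode_filter_properties, which are internal to CPython and may change,
and lzma2_dict_size is a made-up helper name.

```python
import lzma


def lzma2_dict_size(**options):
    """Return the LZMA2 dictionary size (bytes) liblzma picks for these options.

    Uses CPython's private filter-properties helpers (an assumption: they are
    internal and undocumented) to round-trip the filter spec.
    """
    spec = {"id": lzma.FILTER_LZMA2, **options}
    props = lzma._encode_filter_properties(spec)
    decoded = lzma._decode_filter_properties(lzma.FILTER_LZMA2, props)
    return decoded["dict_size"]


# Only a preset, comparable to plain "-0".
preset0 = lzma2_dict_size(preset=0)

# Only lc/lp tuned and no preset named: the remaining options, including the
# dictionary size, come from the library's default preset instead of preset 0.
tuned_only = lzma2_dict_size(lp=2, lc=2)

# Dictionary size stated explicitly, comparable to "--lzma2=dict=128KiB".
explicit = lzma2_dict_size(preset=0, dict_size=128 * 1024)

for label, size in [("preset 0", preset0),
                    ("lc/lp only", tuned_only),
                    ("explicit dict", explicit)]:
    print(f"{label:14s} {size // 1024} KiB")
```

With the liblzma presets I am aware of, preset 0 uses a 256 KiB dictionary
and the default preset uses 8 MiB, so the "lc/lp only" case ends up with a
dictionary 32 times larger than "-0" alone would suggest.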

Perhaps xz -l could be improved to include dict size to make this more obvious?

 Jocke
