All of lore.kernel.org
 help / color / mirror / Atom feed
From: bugzilla-daemon@kernel.org
To: linux-ext4@vger.kernel.org
Subject: [Bug 216322] Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim  ext4_trim_fs - Dell XPS 13 9310
Date: Thu, 04 Aug 2022 11:47:47 +0000	[thread overview]
Message-ID: <bug-216322-13602-2MvUDlAfJU@https.bugzilla.kernel.org/> (raw)
In-Reply-To: <bug-216322-13602@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=216322

--- Comment #4 from Lukas Czerner (lczerner@redhat.com) ---
On Thu, Aug 04, 2022 at 12:44:45AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=216322
> 
> Theodore Tso (tytso@mit.edu) changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |tytso@mit.edu
> 
> --- Comment #2 from Theodore Tso (tytso@mit.edu) ---
> So the problem is that the FITRIM ioctl does not check if a signal is
> pending,
> and so if the fstrim program requests that the entire SSD (len=ULLONG_MAX),
> like the broomstick set off by Mickey Mouse in Fantasia's "Sorcerer's
> Apprentive", it will mindlessly send discard requests for any blocks not in
> use
> by the file system until it is done.   Or to put it another way, "Neither
> rain,
> nor snow, or a request to freeze the OS, shall stop the FITRIM ioctl from its
> appointed task."  :-)
> 
> The question is how to fix things.   The problem is that the FITRIM ioctl
> interface is pretty horrible.   The fstrim_range.len variable is an IN/OUT
> field where on the input it is the number of bytes that should be trimmed
> (from
> start to start+len) and when the ioctl returns fstrm_range.len is the number
> of
> bytes that were actually trimmed.   So this is not really amenable for
> -ERESTARTSYS.
> 
> Worse, the fstrim program in util-linux doesn't handle an EAGAIN error return
> code, so if it gets the EAGAIN after try_to_freeze_tasks send the fake signal
> to the process, fstrim will print to stderr "fstrim: FITRIM ioctl failed" and
> the rest of the file system trim operation will be aborted.
> 
> It might be that the only way we can fix this is to have FITRIM return
> EAGAIN,
> which will stop the fstrim in its tracks.  This is... not great, but
> typically
> fstrim is run out of crontab or a systemd timer once a month, so if the user
> tries to suspend right as the fstrim is running, hopefully we'll get lucky
> next
> month.    We can then try teach fstrim to do the right thing, and so this
> lossage mode would only happen in the combination of a new kernel and an
> older
> version of util-linux.
> 
> I'm not happy with that solution, but the alternative of creating a new
> FITRIM2
> ioctl that has a sane interface means that you need an new kernel and a new
> util-linux package, and if you don't, the user will have to deal with a hot
> laptop bag and a drained battery.   And not changing FITRIM's behaviour will
> have the same potential end result, if the user gets unlucky and tries to
> suspend the laptop when there is more than 60 seconds left before FITRIM to
> complete.   :-/
> 
> The other thing I'll note is that every file system has its own FITRIM
> implementation, and I suspect they all have this issue, because the FITRIM
> interface is fundamentally flawed.

I agree that the FITRIM interface is flawed in this way. But
ext4_try_to_trim_range() actually does have fatal_signal_pending() and
will return -ERESTARTSYS if that's true. Or did you have something else in
mind?

Also in that case, I see no reason why we would not be able to adjust
the fstrim_range to make it easier to re-start where we left off if
we're going to return -ERESTARTSYS. I am missing something?

I have not had time to look deeply into the traces, but are you actually
sure that we're not stuck in blkdev_issue_discard() instead?

-Lukas

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

  parent reply	other threads:[~2022-08-04 11:47 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-03 20:18 [Bug 216322] New: Freezing of tasks failed after 60.004 seconds (1 tasks refusing to freeze... task:fstrim ext4_trim_fs - Dell XPS 13 9310 bugzilla-daemon
2022-08-03 20:26 ` [Bug 216322] " bugzilla-daemon
2022-08-04  0:44 ` bugzilla-daemon
2022-08-04 11:47   ` Lukas Czerner
2022-08-04  1:01 ` bugzilla-daemon
2022-08-04 11:47 ` bugzilla-daemon [this message]
2022-08-04 14:45   ` Theodore Ts'o
2022-08-10  0:29   ` Dave Chinner
2022-08-04 14:45 ` bugzilla-daemon
2022-08-09 13:40 ` bugzilla-daemon
2022-08-09 17:48 ` bugzilla-daemon
2022-08-10  0:29 ` bugzilla-daemon
2022-09-09  6:02 ` bugzilla-daemon
2022-09-26 15:59 ` bugzilla-daemon
2022-09-27 13:24 ` bugzilla-daemon
2023-04-19 14:58 ` bugzilla-daemon
2023-04-21 23:46 ` bugzilla-daemon
2023-09-06 16:28 ` bugzilla-daemon
2023-09-13 15:02 ` bugzilla-daemon
2023-09-13 15:03 ` bugzilla-daemon
2023-09-25 12:04 ` bugzilla-daemon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-216322-13602-2MvUDlAfJU@https.bugzilla.kernel.org/ \
    --to=bugzilla-daemon@kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.