From: Petr Mladek <pmladek@suse.cz>
To: Tejun Heo <tj@kernel.org>
Cc: linux-nfs@vger.kernel.org, Borislav Petkov <bp@suse.de>,
Thomas Gleixner <tglx@linutronix.de>,
Jiri Kosina <jkosina@suse.cz>,
Peter Zijlstra <peterz@infradead.org>,
Richard Weinberger <richard@nod.at>,
Trond Myklebust <trond.myklebust@primarydata.com>,
Oleg Nesterov <oleg@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
linux-kernel@vger.kernel.org, Michal Hocko <mhocko@suse.cz>,
Chris Mason <clm@fb.com>, Ingo Molnar <mingo@redhat.com>,
linux-mtd@lists.infradead.org, linux-api@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
live-patching@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
David Woodhouse <dwmw2@infradead.org>,
Anna Schumaker <anna.schumaker@netapp.com>
Subject: Re: [RFC PATCH 07/18] kthread: Make iterant kthreads freezable by default
Date: Fri, 12 Jun 2015 15:24:40 +0200 [thread overview]
Message-ID: <20150612132440.GH9409@pathway.suse.cz> (raw)
In-Reply-To: <20150610043154.GG11955@mtj.duckdns.org>
On Wed 2015-06-10 13:31:54, Tejun Heo wrote:
> Hello, Petr.
>
> On Tue, Jun 09, 2015 at 05:53:13PM +0200, Petr Mladek wrote:
> > I think that the interaction with the hardware should be the reason to
> > make them properly freezable. In the current state they are stopped at
> > some random place, they might either miss some event from the hardware
> > or the hardware might get resumed into another state and the kthread
> > might wait forever.
>
> Yes, IIUC, there are two classes of use cases where freezing kthreads
> makes sense.
First, thanks a lot for the detailed explanation.
> * While hibernating, freezing writeback workers and whoever else which
> might issue IOs. This is because we have to thaw devices to issue
> IOs to write out the prepared hibernation image. If new IOs are
>   issued from page cache or wherever when the devices are thawed for
> writing out the hibernation image, the hibernation image and the
> data on the disk deviate.
Just an idea, and I do not say that it is a good solution: if we
already thaw the devices needed to write the data, it should be
possible to also thaw the kthreads needed to write the data.
> Note that this only works because currently none of the block
> drivers which may be used to write out hibernation images depend on
> freezable kthreads and hopefully nobody issues IOs from unfreezable
> kthreads or bh or softirq, which, I think, can happen with
> preallocated requests or bio based devices.
So some kthreads must be stopped, some must stay available, and we do
not care about the rest. I wonder which group is bigger; that might
help to decide which default is safer. It is not easy for me to
judge :-/
> This is a very brittle approach. Instead of controlling things
> where it actually can be controlled - the IO ingress point - we're
> trying to control all its users with a pretty blunt tool while at
> the same time depending on the fact that some of the low level
> subsystems don't happen to make use of the said blunt tool.
>
> I think what we should do is simply blocking all non-image IOs from
> the block layer while writing out hibernation image. It'd be
> simpler and a lot more reliable.
This sounds like an interesting idea to me. Do you already have some
more precise plan in mind?

Unfortunately, it is not easy for me to judge. I wonder how many
interfaces would need to be modified to make this work. I also wonder
whether the affected parties would consider it a hack or a clean
solution. In other words, I wonder if it is realistic.
> * Device drivers which interact with their devices in a fully
> synchronous manner so that blocking the kthread always reliably
> quiesces the device. For this to be true, the device shouldn't
> carry over any state across the freezing points. There are drivers
> which are actually like this and it works for them.
I am trying to sort this out in my head. In each case, we need to
stop these kthreads in some well-defined state. Also, the kthread (or
device) must be able to block the freezer until it has reached that
state.

The stop points are defined by try_to_freeze() now, and freezable
kthreads block the freezer until they reach these points.
> However, my impression is that people don't really scrutinize why
> freezer works for a specific case and tend to spray them all over
> and develop a fuzzy sense that freezing will somehow magically make
> the driver ready for PM operations. It doesn't help that freezer is
> such a blunt tool and our historical usage has always been fuzzy -
> in the earlier days, we depended on freezer even when we knew it
> wasn't sufficient probably because updating all subsystems and
> drivers were such a huge undertaking and freezer kinda sorta works
> in many cases.
I am trying to better understand why the freezer is considered a
blunt tool. Is it because it is a generic API, with try_to_freeze()
put at "random" locations, so that it does not define the safe points
precisely enough?
> IMHO we'd be a lot better served by blocking the kthread from PM
> callbacks explicitly for these drivers to unify control points for
> PM operations and make what's going on a lot more explicit. This
> will surely involve a bit more boilerplate code but with the right
> helpers it should be mostly trivial and I believe that it's likely
> to encourage higher quality PM operations while getting rid of this
> fuzzy feeling around the freezer.
I agree. Some synchronization is needed between the device drivers
and the kthreads to make sure that they are on the same page.

If I get this correctly, it works the following way now: PM notifies
both kthreads (via the freezer) and device drivers (via callbacks)
that a suspend is in progress. They are supposed to go into a sane
state. But there is no explicit communication between the kthread
and the driver.
What exactly do you mean by the callbacks? Should they set some
flags that would steer the kthread into some state?
> I'm strongly against this. We really don't want to make it even
> fuzzier. There are finite things which need to be controlled for PM
> operations and I believe what we need to do is identifying them and
> implementing explicit and clear control mechanisms for them, not
> spreading this feel-good mechanism even more, further obscuring where
> those points are.
I see. This explains why you are so strongly against changing the
default. I see the following counts in the kernel sources:

   61  set_freezable()
   97  kthread_create()
  259  kthread_run()

It means that only about 17% (61 / (97 + 259)) of kthreads are
freezable these days.
> This becomes the game of "is it frozen ENOUGH yet?" and that "enough"
> is extremely difficult to determine as we're not looking at the choke
> spots at all and freezable kthreads only cover part of kernel
> activity. The fuzzy enoughness also actually inhibits development of
> proper mechanisms - "I believe this is frozen ENOUGH at this point so
> it is probably safe to assume that X, Y and Z aren't happening
> anymore" and it usually is except when it's not.
I am not sure what you mean by "frozen enough". I think that a task
is either frozen or not. Am I missing something, please?
Best Regards,
Petr