From: Vojtech Pavlik <vojtech@suse.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>,
Jiri Kosina <jkosina@suse.cz>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Seth Jennings <sjenning@redhat.com>,
linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Borislav Petkov <bp@alien8.de>,
live-patching@vger.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
Date: Tue, 24 Feb 2015 13:36:01 +0100 [thread overview]
Message-ID: <20150224123601.GC3081@suse.cz> (raw)
In-Reply-To: <20150224102328.GC19976@gmail.com>
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:
> > Your upgrade proposal is an *enormous* disruption to the
> > system:
> >
> > - a latency of "well below 10" seconds is completely
> > unacceptable to most users who want to patch the kernel
> > of a production system _while_ it's in production.
>
> I think this statement is false for the following reasons.
The statement is very true.
> - I'd say the majority of system operators of production
> systems can live with a couple of seconds of delay at a
> well defined moment of the day or week - with gradual,
> pretty much open ended improvements in that latency
> down the line.
In the most usual corporate setting any noticeable outage, even out of
business hours, requires an ahead notice, and an agreement of all
stakeholders - teams that depend on the system.
If a live patching technology introduces an outage, it's not "live" and
because of these bureaucratic reasons, it will not be used and a regular
reboot will be scheduled instead.
> - I think your argument ignores the fact that live
> upgrades would extend the scope of 'users willing to
> patch the kernel of a production system' _enormously_.
>
> For example, I have a production system with this much
> uptime:
>
> 10:50:09 up 153 days, 3:58, 34 users, load average: 0.00, 0.02, 0.05
>
> While currently I'm reluctant to reboot the system to
> upgrade the kernel (due to a reboot's intrusiveness),
> and that is why it has achieved a relatively high
> uptime, but I'd definitely allow the kernel to upgrade
> at 0:00am just fine. (I'd even give it up to a few
> minutes, as long as TCP connections don't time out.)
>
> And I don't think my usecase is special.
I agree that this is useful. But it is a different problem that only
partially overlaps with what we're trying to achieve with live patching.
If you can make full kernel upgrades to work this way, which I doubt is
achievable in the next 10 years due to all the research and
infrastructure needed, then you certainly gain an additional group of
users. And a great tool. A large portion of those that ask for live
patching won't use it, though.
But honestly, I prefer a solution that works for small patches now, than
a solution for unlimited patches sometime in next decade.
> What gradual improvements in live upgrade latency am I
> talking about?
>
> - For example the majority of pure user-space process
> pages in RAM could be saved from the old kernel over
> into the new kernel - i.e. they'd stay in place in RAM,
> but they'd be re-hashed for the new data structures.
> This avoids a big chunk of checkpointing overhead.
I'd have hoped this would be a given. If you can't preserve memory
contents and have to re-load from disk, you can just as well reboot
entirely, the time needed will not be much more..
> - Likewise, most of the page cache could be saved from an
> old kernel to a new kernel as well - further reducing
> checkpointing overhead.
>
> - The PROT_NONE mechanism of the current NUMA balancing
> code could be used to transparently mark user-space
> pages as 'checkpointed'. This would reduce system
> interruption as only 'newly modified' pages would have
> to be checkpointed when the upgrade happens.
>
> - Hardware devices could be marked as 'already in well
> defined state', skipping the more expensive steps of
> driver initialization.
>
> - Possibly full user-space page tables could be preserved
> over an upgrade: this way user-space execution would be
> unaffected even in the micro level: cache layout, TLB
> patterns, etc.
>
> There's lots of gradual speedups possible with such a model
> IMO.
Yes, as I say above, guaranteeing decades of employment. ;)
> With live kernel patching we run into a brick wall of
> complexity straight away: we have to analyze the nature of
> the kernel modification, in the context of live patching,
> and that only works for the simplest of kernel
> modifications.
But you're able to _use_ it.
> With live kernel upgrades no such brick wall exists, just
> about any transition between kernel versions is possible.
The brick wall you run to is "I need to implement full kernel state
serialization before I can do anything at all." That's something that
isn't even clear _how_ to do. Particularly with Linux kernel's
development model where internal ABI and structures are always in flux
it may not even be realistic.
> Granted, with live kernel upgrades it's much more complex
> to get the 'simple' case into an even rudimentarily working
> fashion (full userspace state has to be enumerated, saved
> and restored), but once we are there, it's a whole new
> category of goodness and it probably covers 90%+ of the
> live kernel patching usecases on day 1 already ...
Feel free to start working on it. I'll stick with live patching.
--
Vojtech Pavlik
Director SUSE Labs
next prev parent reply other threads:[~2015-02-24 12:36 UTC|newest]
Thread overview: 86+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-02-16 18:52 [PATCH 0/3] prevent /proc/<pid>/stack garbage for running tasks Josh Poimboeuf
2015-02-16 18:52 ` [PATCH 1/3] sched: add sched_task_call() Josh Poimboeuf
2015-02-16 20:44 ` Peter Zijlstra
2015-02-16 22:05 ` Josh Poimboeuf
2015-02-17 9:24 ` Peter Zijlstra
2015-02-17 14:12 ` Josh Poimboeuf
2015-02-17 18:15 ` Peter Zijlstra
2015-02-17 21:25 ` Josh Poimboeuf
2015-02-18 15:21 ` Peter Zijlstra
2015-02-18 17:12 ` Josh Poimboeuf
2015-02-19 0:20 ` Peter Zijlstra
2015-02-19 4:17 ` Josh Poimboeuf
2015-02-19 10:16 ` Peter Zijlstra
2015-02-19 16:24 ` Josh Poimboeuf
2015-02-19 16:33 ` Vojtech Pavlik
2015-02-19 17:03 ` Josh Poimboeuf
2015-02-19 17:08 ` Jiri Kosina
2015-02-19 17:19 ` Vojtech Pavlik
2015-02-19 17:32 ` Josh Poimboeuf
2015-02-19 17:48 ` Vojtech Pavlik
2015-02-19 20:40 ` Vojtech Pavlik
2015-02-19 21:42 ` Josh Poimboeuf
2015-02-20 7:46 ` Jiri Kosina
2015-02-20 8:49 ` Jiri Kosina
2015-02-20 9:50 ` Ingo Molnar
2015-02-20 10:02 ` Jiri Kosina
2015-02-20 10:44 ` live patching design (was: Re: [PATCH 1/3] sched: add sched_task_call()) Ingo Molnar
2015-02-20 10:58 ` Jiri Kosina
2015-02-20 19:49 ` Ingo Molnar
2015-02-20 21:46 ` Vojtech Pavlik
2015-02-20 22:08 ` Josh Poimboeuf
2015-02-21 18:30 ` Ingo Molnar
2015-02-22 8:52 ` Jiri Kosina
2015-02-22 10:17 ` Ingo Molnar
2015-02-22 19:18 ` Jiri Kosina
2015-02-23 12:43 ` Jiri Kosina
2015-02-24 10:37 ` Ingo Molnar
2015-02-21 18:18 ` Ingo Molnar
2015-02-21 18:57 ` Jiri Kosina
2015-02-21 19:16 ` Ingo Molnar
2015-02-21 19:31 ` Jiri Kosina
2015-02-21 19:48 ` Ingo Molnar
2015-02-21 20:10 ` Jiri Kosina
2015-02-21 20:53 ` Jiri Kosina
2015-02-22 8:46 ` Ingo Molnar
2015-02-22 9:08 ` Jiri Kosina
2015-02-22 9:46 ` live kernel upgrades (was: live kernel patching design) Ingo Molnar
2015-02-22 10:34 ` Ingo Molnar
2015-02-22 10:48 ` Ingo Molnar
2015-02-22 19:13 ` Jiri Kosina
2015-02-22 23:01 ` Andrew Morton
2015-02-23 0:18 ` Dave Airlie
2015-02-23 0:44 ` Arjan van de Ven
2015-02-23 8:17 ` Jiri Kosina
2015-02-23 10:42 ` Richard Weinberger
2015-02-23 11:08 ` Vojtech Pavlik
2015-02-23 11:50 ` Pavel Machek
2015-02-24 9:16 ` Ingo Molnar
2015-02-24 12:28 ` Jiri Slaby
2015-03-05 0:51 ` Ingo Molnar
2015-02-23 6:35 ` Vojtech Pavlik
2015-02-24 9:44 ` Ingo Molnar
2015-02-24 12:12 ` Vojtech Pavlik
2015-02-24 10:53 ` Ingo Molnar
2015-02-24 12:19 ` Vojtech Pavlik
2015-02-22 14:37 ` Josh Poimboeuf
2015-02-22 16:40 ` Josh Poimboeuf
2015-02-22 19:03 ` Jiri Kosina
2015-02-24 10:23 ` Ingo Molnar
2015-02-24 11:10 ` Petr Mladek
2015-02-24 12:36 ` Vojtech Pavlik [this message]
2015-02-23 11:39 ` Pavel Machek
2015-02-24 10:25 ` Ingo Molnar
2015-02-24 12:11 ` Jiri Slaby
2015-02-24 13:18 ` live kernel upgrades Pavel Emelyanov
2015-02-20 16:12 ` [PATCH 1/3] sched: add sched_task_call() Josh Poimboeuf
2015-02-20 20:08 ` Ingo Molnar
2015-02-20 21:22 ` Josh Poimboeuf
2015-02-20 17:05 ` Josh Poimboeuf
2015-02-19 21:26 ` Jiri Kosina
2015-02-19 21:38 ` Jiri Kosina
2015-02-19 23:11 ` Josh Poimboeuf
2015-02-16 18:52 ` [PATCH 2/3] stacktrace: add save_stack_trace_tsk_safe() Josh Poimboeuf
2015-02-18 0:13 ` Andrew Morton
2015-02-20 9:32 ` Jiri Kosina
2015-02-16 18:52 ` [PATCH 3/3] proc: fix /proc/<pid>/stack for running tasks Josh Poimboeuf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150224123601.GC3081@suse.cz \
--to=vojtech@suse.com \
--cc=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=arjan@infradead.org \
--cc=bp@alien8.de \
--cc=jkosina@suse.cz \
--cc=jpoimboe@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=live-patching@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=sjenning@redhat.com \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox