From: Vojtech Pavlik <vojtech@suse.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>,
Jiri Kosina <jkosina@suse.cz>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>,
Seth Jennings <sjenning@redhat.com>,
linux-kernel@vger.kernel.org,
Linus Torvalds <torvalds@linux-foundation.org>,
Arjan van de Ven <arjan@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
Borislav Petkov <bp@alien8.de>,
live-patching@vger.kernel.org
Subject: Re: live kernel upgrades (was: live kernel patching design)
Date: Tue, 24 Feb 2015 13:36:01 +0100
Message-ID: <20150224123601.GC3081@suse.cz>
In-Reply-To: <20150224102328.GC19976@gmail.com>
On Tue, Feb 24, 2015 at 11:23:29AM +0100, Ingo Molnar wrote:
> > Your upgrade proposal is an *enormous* disruption to the
> > system:
> >
> > - a latency of "well below 10" seconds is completely
> > unacceptable to most users who want to patch the kernel
> > of a production system _while_ it's in production.
>
> I think this statement is false for the following reasons.
The statement is very true.
> - I'd say the majority of system operators of production
> systems can live with a couple of seconds of delay at a
> well defined moment of the day or week - with gradual,
> pretty much open ended improvements in that latency
> down the line.
In the most common corporate setting, any noticeable outage, even outside
business hours, requires advance notice and the agreement of all
stakeholders, i.e. the teams that depend on the system.
If a live patching technology introduces an outage, it's not "live", and
for these bureaucratic reasons it will not be used; a regular
reboot will be scheduled instead.
> - I think your argument ignores the fact that live
> upgrades would extend the scope of 'users willing to
> patch the kernel of a production system' _enormously_.
>
> For example, I have a production system with this much
> uptime:
>
> 10:50:09 up 153 days, 3:58, 34 users, load average: 0.00, 0.02, 0.05
>
> While currently I'm reluctant to reboot the system to
> upgrade the kernel (due to a reboot's intrusiveness),
> and that is why it has achieved a relatively high
> uptime, but I'd definitely allow the kernel to upgrade
> at 0:00am just fine. (I'd even give it up to a few
> minutes, as long as TCP connections don't time out.)
>
> And I don't think my usecase is special.
I agree that this is useful. But it is a different problem that only
partially overlaps with what we're trying to achieve with live patching.
If you can make full kernel upgrades work this way, which I doubt is
achievable in the next 10 years due to all the research and
infrastructure needed, then you certainly gain an additional group of
users. And a great tool. A large portion of those who ask for live
patching won't use it, though.
But honestly, I prefer a solution that works for small patches now to
a solution for unlimited patches sometime in the next decade.
> What gradual improvements in live upgrade latency am I
> talking about?
>
> - For example the majority of pure user-space process
> pages in RAM could be saved from the old kernel over
> into the new kernel - i.e. they'd stay in place in RAM,
> but they'd be re-hashed for the new data structures.
> This avoids a big chunk of checkpointing overhead.
I'd have hoped this would be a given. If you can't preserve memory
contents and have to reload from disk, you can just as well reboot
entirely; the time needed will not be much more.
> - Likewise, most of the page cache could be saved from an
> old kernel to a new kernel as well - further reducing
> checkpointing overhead.
>
> - The PROT_NONE mechanism of the current NUMA balancing
> code could be used to transparently mark user-space
> pages as 'checkpointed'. This would reduce system
> interruption as only 'newly modified' pages would have
> to be checkpointed when the upgrade happens.
>
> - Hardware devices could be marked as 'already in well
> defined state', skipping the more expensive steps of
> driver initialization.
>
> - Possibly full user-space page tables could be preserved
> over an upgrade: this way user-space execution would be
> unaffected even in the micro level: cache layout, TLB
> patterns, etc.
>
> There's lots of gradual speedups possible with such a model
> IMO.
Yes, as I say above, guaranteeing decades of employment. ;)
> With live kernel patching we run into a brick wall of
> complexity straight away: we have to analyze the nature of
> the kernel modification, in the context of live patching,
> and that only works for the simplest of kernel
> modifications.
But you're able to _use_ it.
> With live kernel upgrades no such brick wall exists, just
> about any transition between kernel versions is possible.
The brick wall you run into is "I need to implement full kernel state
serialization before I can do anything at all." It isn't even clear
_how_ to do that. Particularly with the Linux kernel's
development model, where internal ABI and structures are always in flux,
it may not even be realistic.
> Granted, with live kernel upgrades it's much more complex
> to get the 'simple' case into an even rudimentarily working
> fashion (full userspace state has to be enumerated, saved
> and restored), but once we are there, it's a whole new
> category of goodness and it probably covers 90%+ of the
> live kernel patching usecases on day 1 already ...
Feel free to start working on it. I'll stick with live patching.
--
Vojtech Pavlik
Director SUSE Labs