multikernel.lists.linux.dev archive mirror
* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
       [not found] ` <20250921014721.7323-1-hdanton@sina.com>
@ 2025-09-22 21:55   ` Cong Wang
  2025-09-24  1:12     ` Hillf Danton
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2025-09-22 21:55 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel, linux-mm, multikernel

On Sat, Sep 20, 2025 at 6:47 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 18 Sep 2025 15:25:59 -0700 Cong Wang wrote:
> > This patch series introduces multikernel architecture support, enabling
> > multiple independent kernel instances to coexist and communicate on a
> > single physical machine. Each kernel instance can run on dedicated CPU
> > cores while sharing the underlying hardware resources.
> >
> > The multikernel architecture provides several key benefits:
> > - Improved fault isolation between different workloads
> > - Enhanced security through kernel-level separation
> > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
> >
> Could you illustrate a couple of use cases to help understand your idea?

Sure, below are a few use cases on my mind:

1) With sufficient hardware resources: each kernel gets isolated resources
with real bare-metal performance. This applies to all VM/container use cases
today, just with strictly better performance: no virtualization, no noisy neighbor.

More importantly, they can co-exist. In theory, you can run a multikernel with
a VM inside and a container inside the VM.

2) Active-backup kernel for mission-critical tasks: after the primary kernel
crashes, a backup kernel running in parallel immediately takes over without
interrupting the user-space tasks.

Dual-kernel systems are very common in automotive systems today.

3) Getting rid of the OS to reduce the attack surface. We could pack everything
properly in an initramfs and run it directly without bothering with a full OS.
This is similar to what unikernels or macro VMs do today.

4) Machine learning in the kernel. Machine learning is highly workload-specific;
for instance, mixing real-time and non-RT scheduling makes it challenging for
ML to tune the CPU scheduler, which is essentially a multi-goal learning problem.

5) Per-application specialized kernels: for example, running an RT kernel
and a non-RT kernel in parallel. The memory footprint can also be reduced by
dropping 5-level page tables when they are not needed.

I hope this helps.

Regards,
Cong


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
       [not found]     ` <20250922142831.GA351870@fedora>
@ 2025-09-22 22:41       ` Cong Wang
  2025-09-23 17:05         ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2025-09-22 22:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

On Mon, Sep 22, 2025 at 7:28 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Sat, Sep 20, 2025 at 02:40:18PM -0700, Cong Wang wrote:
> > On Fri, Sep 19, 2025 at 2:27 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > > > This patch series introduces multikernel architecture support, enabling
> > > > multiple independent kernel instances to coexist and communicate on a
> > > > single physical machine. Each kernel instance can run on dedicated CPU
> > > > cores while sharing the underlying hardware resources.
> > > >
> > > > The multikernel architecture provides several key benefits:
> > > > - Improved fault isolation between different workloads
> > > > - Enhanced security through kernel-level separation
> > >
> > > What level of isolation does this patch series provide? What stops
> > > kernel A from accessing kernel B's memory pages, sending interrupts to
> > > its CPUs, etc?
> >
> > It is kernel-enforced isolation, therefore, the trust model here is still
> > based on kernel. Hence, a malicious kernel would be able to disrupt,
> > as you described. With memory encryption and IPI filtering, I think
> > that is solvable.
>
> I think solving this is key to the architecture, at least if fault
> isolation and security are goals. A cooperative architecture where
> nothing prevents kernels from interfering with each other simply doesn't
> offer fault isolation or security.

Kernels and kernel modules can be signed today; kexec also supports
kernel signature verification via kexec_file_load(). That at least mitigates
untrusted kernels, although a signed kernel can still be exploited via a 0-day.
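
For reference, signature-verified loading from user space via kexec_file_load(2)
looks roughly like this (a minimal sketch, not from this series; the paths are
placeholders and error handling is minimal):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
        /* Placeholder paths; with CONFIG_KEXEC_SIG the kernel verifies
         * the image's signature before accepting it (needs CAP_SYS_BOOT). */
        int kernel_fd = open("/boot/vmlinuz", O_RDONLY);
        int initrd_fd = open("/boot/initrd.img", O_RDONLY);
        const char cmdline[] = "console=ttyS0";

        if (kernel_fd < 0 || initrd_fd < 0)
                return 1;

        /* No glibc wrapper exists, so invoke the syscall directly.
         * cmdline_len must include the trailing NUL. */
        if (syscall(SYS_kexec_file_load, kernel_fd, initrd_fd,
                    sizeof(cmdline), cmdline, 0UL) < 0) {
                perror("kexec_file_load");
                return 1;
        }
        return 0;
}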

>
> On CPU architectures that offer additional privilege modes it may be
> possible to run a supervisor on every CPU to restrict access to
> resources in the spawned kernel. Kernels would need to be modified to
> call into the supervisor instead of accessing certain resources
> directly.
>
> IOMMU and interrupt remapping control would need to be performed by the
> supervisor to prevent spawned kernels from affecting each other.

That's right: security vs. performance. A lot of the time we have to balance
the two. This is why Kata Containers today runs containers inside a VM.

This largely depends on what a compromise would cost the users; there is no
single right answer here.

For example, in a fully-controlled private cloud, security exploits are
probably not even a concern. Sacrificing performance for a non-concern
is not reasonable.

>
> This seems to be the price of fault isolation and security. It ends up
> looking similar to a hypervisor, but maybe it wouldn't need to use
> virtualization extensions, depending on the capabilities of the CPU
> architecture.

Two more points:

1) Security lockdown. Security lockdown transforms multikernel from
"0-day means total compromise" to "0-day means single workload
compromise with rapid recovery." This is still a significant improvement
over containers where a single kernel 0-day compromises everything
simultaneously.

2) Rapid kernel updates: a more practical way to eliminate 0-day
exploits is to update the kernel more frequently; today the major blocker
is the downtime required by a kernel reboot, which is what multikernel
aims to resolve.

I hope this helps.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-22 22:41       ` Cong Wang
@ 2025-09-23 17:05         ` Stefan Hajnoczi
  2025-09-24 11:38           ` David Hildenbrand
  2025-09-24 17:18           ` Cong Wang
  0 siblings, 2 replies; 22+ messages in thread
From: Stefan Hajnoczi @ 2025-09-23 17:05 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel


On Mon, Sep 22, 2025 at 03:41:18PM -0700, Cong Wang wrote:
> On Mon, Sep 22, 2025 at 7:28 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Sat, Sep 20, 2025 at 02:40:18PM -0700, Cong Wang wrote:
> > > On Fri, Sep 19, 2025 at 2:27 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >
> > > > On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > > > > This patch series introduces multikernel architecture support, enabling
> > > > > multiple independent kernel instances to coexist and communicate on a
> > > > > single physical machine. Each kernel instance can run on dedicated CPU
> > > > > cores while sharing the underlying hardware resources.
> > > > >
> > > > > The multikernel architecture provides several key benefits:
> > > > > - Improved fault isolation between different workloads
> > > > > - Enhanced security through kernel-level separation
> > > >
> > > > What level of isolation does this patch series provide? What stops
> > > > kernel A from accessing kernel B's memory pages, sending interrupts to
> > > > its CPUs, etc?
> > >
> > > It is kernel-enforced isolation, therefore, the trust model here is still
> > > based on kernel. Hence, a malicious kernel would be able to disrupt,
> > > as you described. With memory encryption and IPI filtering, I think
> > > that is solvable.
> >
> > I think solving this is key to the architecture, at least if fault
> > isolation and security are goals. A cooperative architecture where
> > nothing prevents kernels from interfering with each other simply doesn't
> > offer fault isolation or security.
> 
> Kernel and kernel modules can be signed today, kexec also supports
> kernel signing via kexec_file_load(). It migrates at least untrusted
> kernels, although kernels can be still exploited via 0-day.

Kernel signing also doesn't protect against bugs in one kernel
interfering with another kernel.

> >
> > On CPU architectures that offer additional privilege modes it may be
> > possible to run a supervisor on every CPU to restrict access to
> > resources in the spawned kernel. Kernels would need to be modified to
> > call into the supervisor instead of accessing certain resources
> > directly.
> >
> > IOMMU and interrupt remapping control would need to be performed by the
> > supervisor to prevent spawned kernels from affecting each other.
> 
> That's right, security vs performance. A lot of times we have to balance
> between these two. This is why Kata Container today runs a container
> inside a VM.
> 
> This largely depends on what users could compromise, there is no single
> right answer here.
> 
> For example, in a fully-controlled private cloud, security exploits are
> probably not even a concern. Sacrificing performance for a non-concern
> is not reasonable.
> 
> >
> > This seems to be the price of fault isolation and security. It ends up
> > looking similar to a hypervisor, but maybe it wouldn't need to use
> > virtualization extensions, depending on the capabilities of the CPU
> > architecture.
> 
> Two more points:
> 
> 1) Security lockdown. Security lockdown transforms multikernel from
> "0-day means total compromise" to "0-day means single workload
> compromise with rapid recovery." This is still a significant improvement
> over containers where a single kernel 0-day compromises everything
> simultaneously.

I don't follow. My understanding is that multikernel currently does not
prevent spawned kernels from affecting each other, so a kernel 0-day in
multikernel still compromises everything?

> 
> 2) Rapid kernel updates: A more practical way to eliminate 0-day
> exploits is to update kernel more frequently, today the major blocker
> is the downtime required by kernel reboot, which is what multikernel
> aims to resolve.

If kernel upgrades are the main use case for multikernel, then I guess
isolation is not necessary. Two kernels would only run side-by-side for
a limited period of time and they would have access to the same
workloads.

Stefan

> 
> I hope this helps.
> 
> Regards,
> Cong Wang
> 



* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-22 21:55   ` [RFC Patch 0/7] kernel: Introduce multikernel architecture support Cong Wang
@ 2025-09-24  1:12     ` Hillf Danton
  2025-09-24 17:30       ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Hillf Danton @ 2025-09-24  1:12 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-kernel, linux-mm, multikernel

On Mon, 22 Sep 2025 14:55:41 -0700 Cong Wang wrote:
> On Sat, Sep 20, 2025 at 6:47 PM Hillf Danton <hdanton@sina.com> wrote:
> > On Thu, 18 Sep 2025 15:25:59 -0700 Cong Wang wrote:
> > > This patch series introduces multikernel architecture support, enabling
> > > multiple independent kernel instances to coexist and communicate on a
> > > single physical machine. Each kernel instance can run on dedicated CPU
> > > cores while sharing the underlying hardware resources.
> > >
> > > The multikernel architecture provides several key benefits:
> > > - Improved fault isolation between different workloads
> > > - Enhanced security through kernel-level separation
> > > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
> > >
> > Could you illustrate a couple of use cases to help understand your idea?
> 
> Sure, below are a few use cases on my mind:
> 
> 1) With sufficient hardware resources: each kernel gets isolated resources
> with real bare metal performance. This applies to all VM/container use cases
> today, just with pure better performance: no virtualization, no noisy neighbor.
> 
> More importantly, they can co-exist. In theory, you can run a multikernel with
> a VM inside and with a container inside the VM.
> 
If the 6.17 EEVDF performs better than the 6.15 one could, having them co-exist
wastes bare-metal CPU cycles.

> 2) Active-backup kernel for mission-critical tasks: after the primary kernel
> crashes, a backup kernel in parallel immediately takes over without interrupting
> the user-space task.
> 
> Dual-kernel systems are very common for automotives today.
> 
If 6.17 is more stable than 6.14, running the latter sounds like a square skull
in a production environment.

> 3) Getting rid of the OS to reduce the attack surface. We could pack everything
> properly in an initramfs and run it directly without bothering a full
> OS. This is similar to what unikernels or macro VM's do today.
> 
Dunno.

> 4) Machine learning in the kernel. Machine learning is too specific to
> workloads, for instance, mixing real-time scheduling and non-RT can be challenging for
> ML to tune the CPU scheduler, which is an essential multi-goal learning.
> 
No room for CUDA in the kernel in 2025, I think.

> 5) Per-application specialized kernel: For example, running a RT kernel
> and non-RT kernel in parallel. Memory footprint can also be reduced by
> reducing the 5-level paging tables when necessary.

If RT makes your product earn more money in fewer weeks, why is EEVDF even
an option, given that RT means no scheduling in the first place?


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-23 17:05         ` Stefan Hajnoczi
@ 2025-09-24 11:38           ` David Hildenbrand
  2025-09-24 12:51             ` Stefan Hajnoczi
  2025-09-24 17:18           ` Cong Wang
  1 sibling, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2025-09-24 11:38 UTC (permalink / raw)
  To: Stefan Hajnoczi, Cong Wang
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

>>
>> Two more points:
>>
>> 1) Security lockdown. Security lockdown transforms multikernel from
>> "0-day means total compromise" to "0-day means single workload
>> compromise with rapid recovery." This is still a significant improvement
>> over containers where a single kernel 0-day compromises everything
>> simultaneously.
> 
> I don't follow. My understanding is that multikernel currently does not
> prevent spawned kernels from affecting each other, so a kernel 0-day in
> multikernel still compromises everything?

I would assume that if there is no isolation enforced by the hardware
(e.g., virtualization, including partitioning hypervisors like
jailhouse, pkvm etc.) nothing would stop a kernel A from accessing memory
assigned to kernel B.

And of course, memory is just one of the resources that would not be
properly isolated.

I am not sure that encrypting memory per kernel would really prevent
other kernels from still damaging such kernels.

Also, what stops a kernel from just rebooting the whole machine? Happy to
learn how that will be handled such that there is proper isolation.

-- 
Cheers

David / dhildenb



* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-24 11:38           ` David Hildenbrand
@ 2025-09-24 12:51             ` Stefan Hajnoczi
  2025-09-24 18:28               ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2025-09-24 12:51 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Cong Wang, linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel


On Wed, Sep 24, 2025 at 01:38:31PM +0200, David Hildenbrand wrote:
> > > 
> > > Two more points:
> > > 
> > > 1) Security lockdown. Security lockdown transforms multikernel from
> > > "0-day means total compromise" to "0-day means single workload
> > > compromise with rapid recovery." This is still a significant improvement
> > > over containers where a single kernel 0-day compromises everything
> > > simultaneously.
> > 
> > I don't follow. My understanding is that multikernel currently does not
> > prevent spawned kernels from affecting each other, so a kernel 0-day in
> > multikernel still compromises everything?
> 
> I would assume that if there is no enforced isolation by the hardware (e.g.,
> virtualization, including partitioning hypervisors like jailhouse, pkvm etc)
> nothing would stop a kernel A to access memory assigned to kernel B.
> 
> And of course, memory is just one of the resources that would not be
> properly isolated.
> 
> Not sure if encrypting memory per kernel would really allow to not let other
> kernels still damage such kernels.
> 
> Also, what stops a kernel to just reboot the whole machine? Happy to learn
> how that will be handled such that there is proper isolation.

The reason I've been asking about the fault isolation and security
statements in the cover letter is because it's unclear:
1. What is implemented today in multikernel.
2. What is on the roadmap for multikernel.
3. What is out of scope for multikernel.

Cong: Can you clarify this? If the answer is that fault isolation and
security are out of scope, then this discussion can be skipped.

Thanks,
Stefan



* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-23 17:05         ` Stefan Hajnoczi
  2025-09-24 11:38           ` David Hildenbrand
@ 2025-09-24 17:18           ` Cong Wang
  1 sibling, 0 replies; 22+ messages in thread
From: Cong Wang @ 2025-09-24 17:18 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

On Tue, Sep 23, 2025 at 10:05 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Mon, Sep 22, 2025 at 03:41:18PM -0700, Cong Wang wrote:
> > On Mon, Sep 22, 2025 at 7:28 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Sat, Sep 20, 2025 at 02:40:18PM -0700, Cong Wang wrote:
> > > > On Fri, Sep 19, 2025 at 2:27 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > > >
> > > > > On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > > > > > This patch series introduces multikernel architecture support, enabling
> > > > > > multiple independent kernel instances to coexist and communicate on a
> > > > > > single physical machine. Each kernel instance can run on dedicated CPU
> > > > > > cores while sharing the underlying hardware resources.
> > > > > >
> > > > > > The multikernel architecture provides several key benefits:
> > > > > > - Improved fault isolation between different workloads
> > > > > > - Enhanced security through kernel-level separation
> > > > >
> > > > > What level of isolation does this patch series provide? What stops
> > > > > kernel A from accessing kernel B's memory pages, sending interrupts to
> > > > > its CPUs, etc?
> > > >
> > > > It is kernel-enforced isolation, therefore, the trust model here is still
> > > > based on kernel. Hence, a malicious kernel would be able to disrupt,
> > > > as you described. With memory encryption and IPI filtering, I think
> > > > that is solvable.
> > >
> > > I think solving this is key to the architecture, at least if fault
> > > isolation and security are goals. A cooperative architecture where
> > > nothing prevents kernels from interfering with each other simply doesn't
> > > offer fault isolation or security.
> >
> > Kernel and kernel modules can be signed today, kexec also supports
> > kernel signing via kexec_file_load(). It migrates at least untrusted
> > kernels, although kernels can be still exploited via 0-day.
>
> Kernel signing also doesn't protect against bugs in one kernel
> interfering with another kernel.

This is also true; this is why memory encryption and authentication
could help. Hardware vendors can catch up with software, which
is how virtualization evolved (e.g. vDPA didn't exist when KVM was
invented).

>
> > >
> > > On CPU architectures that offer additional privilege modes it may be
> > > possible to run a supervisor on every CPU to restrict access to
> > > resources in the spawned kernel. Kernels would need to be modified to
> > > call into the supervisor instead of accessing certain resources
> > > directly.
> > >
> > > IOMMU and interrupt remapping control would need to be performed by the
> > > supervisor to prevent spawned kernels from affecting each other.
> >
> > That's right, security vs performance. A lot of times we have to balance
> > between these two. This is why Kata Container today runs a container
> > inside a VM.
> >
> > This largely depends on what users could compromise, there is no single
> > right answer here.
> >
> > For example, in a fully-controlled private cloud, security exploits are
> > probably not even a concern. Sacrificing performance for a non-concern
> > is not reasonable.
> >
> > >
> > > This seems to be the price of fault isolation and security. It ends up
> > > looking similar to a hypervisor, but maybe it wouldn't need to use
> > > virtualization extensions, depending on the capabilities of the CPU
> > > architecture.
> >
> > Two more points:
> >
> > 1) Security lockdown. Security lockdown transforms multikernel from
> > "0-day means total compromise" to "0-day means single workload
> > compromise with rapid recovery." This is still a significant improvement
> > over containers where a single kernel 0-day compromises everything
> > simultaneously.
>
> I don't follow. My understanding is that multikernel currently does not
> prevent spawned kernels from affecting each other, so a kernel 0-day in
> multikernel still compromises everything?

Linux kernel lockdown does reduce the blast radius of a 0-day exploit,
but it doesn’t eliminate it. I hope this is clearer.

>
> >
> > 2) Rapid kernel updates: A more practical way to eliminate 0-day
> > exploits is to update kernel more frequently, today the major blocker
> > is the downtime required by kernel reboot, which is what multikernel
> > aims to resolve.
>
> If kernel upgrades are the main use case for multikernel, then I guess
> isolation is not necessary. Two kernels would only run side-by-side for
> a limited period of time and they would have access to the same
> workloads.

Zero-downtime upgrade is probably the last thing we could achieve
with multikernel, as true zero downtime requires significant effort
on kernel-to-kernel coordination, so we would essentially need to
establish a protocol (via KHO, I hope) here.

On the other hand, isolation is relatively easy and more useful.
I understand you don't like kernel isolation; however, we need to
recognize the success of containers today, regardless of whether we
like it or not.

By the way, although just in theory, I hope multikernel does not
prevent users from using virtualization inside, just as a VM does not
prevent running containers inside. The choice should always be on the
users' side, not ours.

I hope this helps.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-24  1:12     ` Hillf Danton
@ 2025-09-24 17:30       ` Cong Wang
  2025-09-24 22:42         ` Hillf Danton
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2025-09-24 17:30 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel, linux-mm, multikernel

On Tue, Sep 23, 2025 at 6:12 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 22 Sep 2025 14:55:41 -0700 Cong Wang wrote:
> > On Sat, Sep 20, 2025 at 6:47 PM Hillf Danton <hdanton@sina.com> wrote:
> > > On Thu, 18 Sep 2025 15:25:59 -0700 Cong Wang wrote:
> > > > This patch series introduces multikernel architecture support, enabling
> > > > multiple independent kernel instances to coexist and communicate on a
> > > > single physical machine. Each kernel instance can run on dedicated CPU
> > > > cores while sharing the underlying hardware resources.
> > > >
> > > > The multikernel architecture provides several key benefits:
> > > > - Improved fault isolation between different workloads
> > > > - Enhanced security through kernel-level separation
> > > > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > > > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
> > > >
> > > Could you illustrate a couple of use cases to help understand your idea?
> >
> > Sure, below are a few use cases on my mind:
> >
> > 1) With sufficient hardware resources: each kernel gets isolated resources
> > with real bare metal performance. This applies to all VM/container use cases
> > today, just with pure better performance: no virtualization, no noisy neighbor.
> >
> > More importantly, they can co-exist. In theory, you can run a multikernel with
> > a VM inside and with a container inside the VM.
> >
> If the 6.17 eevdf perfs better than the 6.15 one could, their co-exist wastes
> bare metal cpu cycles.

I think we should never eliminate the ability to not use multikernel; users
should have a choice. Apologies if I didn't make this clear.

And even if you only want one kernel, you might still want to use
zero-downtime upgrade via multikernel. ;-)

>
> > 2) Active-backup kernel for mission-critical tasks: after the primary kernel
> > crashes, a backup kernel in parallel immediately takes over without interrupting
> > the user-space task.
> >
> > Dual-kernel systems are very common for automotives today.
> >
> If 6.17 is more stable than 6.14, running the latter sounds like square skull
> in the product environment.

I don't think anyone here wants to take away your freedom to do so.
You also have the choice of not using multikernel or kexec, or even
CONFIG_KEXEC=n. :)

On the other hand, let's also respect the fact that many automotive
systems today use dual-kernel setups (one for interaction, one for
autonomous driving).

>
> > 3) Getting rid of the OS to reduce the attack surface. We could pack everything
> > properly in an initramfs and run it directly without bothering a full
> > OS. This is similar to what unikernels or macro VM's do today.
> >
> Duno

Same; the choice is always on the table, as it must be.

>
> > 4) Machine learning in the kernel. Machine learning is too specific to
> > workloads, for instance, mixing real-time scheduling and non-RT can be challenging for
> > ML to tune the CPU scheduler, which is an essential multi-goal learning.
> >
> No room for CUDA in kernel I think in 2025.

Maybe yes. LAKE is a framework for using GPU-accelerated ML
in the kernel:
https://utns.cs.utexas.edu/assets/papers/lake_camera_ready.pdf

If you are interested in this area, there are tons of existing papers.

>
> > 5) Per-application specialized kernel: For example, running a RT kernel
> > and non-RT kernel in parallel. Memory footprint can also be reduced by
> > reducing the 5-level paging tables when necessary.
>
> If RT makes your product earn more money in fewer weeks, why is eevdf
> another option, given RT means no schedule at the first place?

I wish there were one single perfect solution for everyone; unfortunately,
the reality seems to be the opposite.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-24 12:51             ` Stefan Hajnoczi
@ 2025-09-24 18:28               ` Cong Wang
  2025-09-24 19:03                 ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2025-09-24 18:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: David Hildenbrand, linux-kernel, pasha.tatashin, Cong Wang,
	Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport,
	Changyuan Lyu, kexec, linux-mm, multikernel

On Wed, Sep 24, 2025 at 5:51 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Wed, Sep 24, 2025 at 01:38:31PM +0200, David Hildenbrand wrote:
> > > >
> > > > Two more points:
> > > >
> > > > 1) Security lockdown. Security lockdown transforms multikernel from
> > > > "0-day means total compromise" to "0-day means single workload
> > > > compromise with rapid recovery." This is still a significant improvement
> > > > over containers where a single kernel 0-day compromises everything
> > > > simultaneously.
> > >
> > > I don't follow. My understanding is that multikernel currently does not
> > > prevent spawned kernels from affecting each other, so a kernel 0-day in
> > > multikernel still compromises everything?
> >
> > I would assume that if there is no enforced isolation by the hardware (e.g.,
> > virtualization, including partitioning hypervisors like jailhouse, pkvm etc)
> > nothing would stop a kernel A to access memory assigned to kernel B.
> >
> > And of course, memory is just one of the resources that would not be
> > properly isolated.
> >
> > Not sure if encrypting memory per kernel would really allow to not let other
> > kernels still damage such kernels.
> >
> > Also, what stops a kernel to just reboot the whole machine? Happy to learn
> > how that will be handled such that there is proper isolation.
>
> The reason I've been asking about the fault isolation and security
> statements in the cover letter is because it's unclear:
> 1. What is implemented today in multikernel.
> 2. What is on the roadmap for multikernel.
> 3. What is out of scope for multikernel.
>
> Cong: Can you clarify this? If the answer is that fault isolation and
> security are out of scope, then this discussion can be skipped.

It is my pleasure. Email is too narrow for this, so I wrote a
complete document for you:
https://docs.google.com/document/d/1yneO6O6C_z0Lh3A2QyT8XsH7ZrQ7-naGQT-rpdjWa_g/edit?usp=sharing

I hope it answers all of the above questions and provides a clear
big picture. If not, please let me know.

(If you need edit permission for the above document, please just
request, I will approve.)

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-24 18:28               ` Cong Wang
@ 2025-09-24 19:03                 ` Stefan Hajnoczi
  2025-09-27 19:42                   ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2025-09-24 19:03 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Hildenbrand, linux-kernel, pasha.tatashin, Cong Wang,
	Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport,
	Changyuan Lyu, kexec, linux-mm, multikernel


On Wed, Sep 24, 2025 at 11:28:04AM -0700, Cong Wang wrote:
> On Wed, Sep 24, 2025 at 5:51 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Wed, Sep 24, 2025 at 01:38:31PM +0200, David Hildenbrand wrote:
> > > > >
> > > > > Two more points:
> > > > >
> > > > > 1) Security lockdown. Security lockdown transforms multikernel from
> > > > > "0-day means total compromise" to "0-day means single workload
> > > > > compromise with rapid recovery." This is still a significant improvement
> > > > > over containers where a single kernel 0-day compromises everything
> > > > > simultaneously.
> > > >
> > > > I don't follow. My understanding is that multikernel currently does not
> > > > prevent spawned kernels from affecting each other, so a kernel 0-day in
> > > > multikernel still compromises everything?
> > >
> > > I would assume that if there is no enforced isolation by the hardware (e.g.,
> > > virtualization, including partitioning hypervisors like jailhouse, pkvm etc)
> > > nothing would stop a kernel A to access memory assigned to kernel B.
> > >
> > > And of course, memory is just one of the resources that would not be
> > > properly isolated.
> > >
> > > Not sure if encrypting memory per kernel would really allow to not let other
> > > kernels still damage such kernels.
> > >
> > > Also, what stops a kernel to just reboot the whole machine? Happy to learn
> > > how that will be handled such that there is proper isolation.
> >
> > The reason I've been asking about the fault isolation and security
> > statements in the cover letter is because it's unclear:
> > 1. What is implemented today in multikernel.
> > 2. What is on the roadmap for multikernel.
> > 3. What is out of scope for multikernel.
> >
> > Cong: Can you clarify this? If the answer is that fault isolation and
> > security are out of scope, then this discussion can be skipped.
> 
> It is my pleasure. The email is too narrow, therefore I wrote a
> complete document for you:
> https://docs.google.com/document/d/1yneO6O6C_z0Lh3A2QyT8XsH7ZrQ7-naGQT-rpdjWa_g/edit?usp=sharing
> 
> I hope it answers all of the above questions and provides a clear
> big picture. If not, please let me know.
> 
> (If you need edit permission for the above document, please just
> request, I will approve.)

Thanks, that gives a nice overview!

The I/O Resource Allocation part will be interesting. Restructuring existing
device drivers to allow spawned kernels to use specific hardware queues
could be a lot of work and very device-specific. I guess a small set of
devices can be supported initially and then it can grow over time.

This also reminds me of VFIO/mdev devices, which would be another
solution to the same problem, but equally device-specific and also a lot
of work to implement the devices that spawned kernels see.

Anyway, I look forward to seeing how this develops.

Stefan



* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-24 17:30       ` Cong Wang
@ 2025-09-24 22:42         ` Hillf Danton
  0 siblings, 0 replies; 22+ messages in thread
From: Hillf Danton @ 2025-09-24 22:42 UTC (permalink / raw)
  To: Cong Wang; +Cc: linux-kernel, linux-mm, multikernel

On Wed, 24 Sep 2025 10:30:28 -0700 Cong Wang wrote:
>On Tue, Sep 23, 2025 at 6:12 PM Hillf Danton <hdanton@sina.com> wrote:
>> On Mon, 22 Sep 2025 14:55:41 -0700 Cong Wang wrote:
>> > On Sat, Sep 20, 2025 at 6:47 PM Hillf Danton <hdanton@sina.com> wrote:
>> > > On Thu, 18 Sep 2025 15:25:59 -0700 Cong Wang wrote:
>> > > > This patch series introduces multikernel architecture support, enabling
>> > > > multiple independent kernel instances to coexist and communicate on a
>> > > > single physical machine. Each kernel instance can run on dedicated CPU
>> > > > cores while sharing the underlying hardware resources.
>> > > >
>> > > > The multikernel architecture provides several key benefits:
>> > > > - Improved fault isolation between different workloads
>> > > > - Enhanced security through kernel-level separation
>> > > > - Better resource utilization than traditional VM (KVM, Xen etc.)
>> > > > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
>> > > >
>> > > Could you illustrate a couple of use cases to help understand your idea?
>> >
>> > Sure, below are a few use cases on my mind:
>> >
>> > 1) With sufficient hardware resources: each kernel gets isolated resources
>> > with real bare metal performance. This applies to all VM/container use cases
>> > today, just with pure better performance: no virtualization, no noisy neighbor.
>> >
>> > More importantly, they can co-exist. In theory, you can run a multikernel with
>> > a VM inside and with a container inside the VM.
>> >
>> If the 6.17 eevdf perfs better than the 6.15 one could, their co-exist wastes
>> bare metal cpu cycles.
>
> I think we should never eliminate the ability of not using multikernel, users
> should have a choice. Apologize if I didn't make this clear.
> 
If multikernel were one of the features the Thompson and Ritchie Unix offered,
all would be fine, simply because the Linux kernel has never been the pill
expected to cure all pains, particularly in user space.

> And even if you only want one kernel, you might still want to use
> zero-downtime upgrade via multikernel. ;-)
> 
FYI, what I see in Shenzhen in 2025 in the car-cockpit product environment WRT
multikernel is: a hypervisor like QNX supports multiple virtual machines,
including Android, !Android, Linux and !Linux, RT and !RT.

Hillf


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-24 19:03                 ` Stefan Hajnoczi
@ 2025-09-27 19:42                   ` Cong Wang
  2025-09-29 15:11                     ` Stefan Hajnoczi
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2025-09-27 19:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: David Hildenbrand, linux-kernel, pasha.tatashin, Cong Wang,
	Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport,
	Changyuan Lyu, kexec, linux-mm, multikernel

On Wed, Sep 24, 2025 at 12:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> Thanks, that gives a nice overview!
>
> I/O Resource Allocation part will be interesting. Restructuring existing
> device drivers to allow spawned kernels to use specific hardware queues
> could be a lot of work and very device-specific. I guess a small set of
> devices can be supported initially and then it can grow over time.

My idea is to leverage existing technologies like XDP, which
offers huge benefits here:

1) It is based on shared memory (although the memory is virtual).

2) Its APIs are user-space APIs, which is an even stronger fit for
kernel-to-kernel sharing; this possibly avoids re-inventing
another protocol.

3) It provides eBPF programmability.

4) The spawned kernel does not require any hardware knowledge,
just pure XDP-ringbuffer-based software logic.

But it also has limitations:

1) xdp_md is too networking-specific; extending it to storage
could be very challenging. But we could introduce an SDP for
storage that just mimics XDP.

2) Regardless, we need a doorbell anyway. IPI is handy, but
I hope we could have an even lighter one, or, more ideally,
redirect the hardware queue IRQ into each target CPU.
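
To make the XDP idea concrete, here is a minimal XDP program using today's
upstream API (a sketch only, nothing multikernel-specific; the speculative
part is reusing this ring/metadata model between kernels):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int mk_xdp_prog(struct xdp_md *ctx)
{
        void *data     = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* Drop frames too short to carry an Ethernet header; pass
         * everything else up the stack (or, in the idea above, hand it
         * to another kernel's queue). */
        if (data + sizeof(struct ethhdr) > data_end)
                return XDP_DROP;
        return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";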

>
> This also reminds me of VFIO/mdev devices, which would be another
> solution to the same problem, but equally device-specific and also a lot
> of work to implement the devices that spawned kernels see.

Right.

I prototyped VFIO on my side with AI, but failed due to its complex PCI
interface. And the spawned kernel still requires hardware knowledge
to interpret PCI BARs, etc.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
       [not found] ` <BC25CE95-75A1-48E7-86E7-4E5E933761B8@flygoat.com>
@ 2025-09-27 20:06   ` Cong Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Cong Wang @ 2025-09-27 20:06 UTC (permalink / raw)
  To: Jiaxun Yang
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

Hi Jiaxun,

On Thu, Sep 25, 2025 at 8:48 AM Jiaxun Yang <jiaxun.yang@flygoat.com> wrote:
>
>
>
> > On Sep 19, 2025, at 06:25, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > This patch series introduces multikernel architecture support, enabling
> > multiple independent kernel instances to coexist and communicate on a
> > single physical machine. Each kernel instance can run on dedicated CPU
> > cores while sharing the underlying hardware resources.
>
> Hi Cong,
>
> Sorry for chiming in here, and thanks for bringing replicated-kernel back to life.

I have to clarify: in my design, the kernel is not replicated. It is the opposite:
I intend to have diversified kernels, highly customized for each
application.

>
> I have some experience with the original Popcorn Linux [1] [2], which seems to be the
> root of most of the code in this series; please see my comments below.
>
> >
> > The multikernel architecture provides several key benefits:
> > - Improved fault isolation between different workloads
> > - Enhanced security through kernel-level separation
>
> I’d agree with Stefan’s comments [3]: an “isolation” solution is critical for adoption
> of a multikernel OS, given that multi-tenant systems are almost everywhere.
>
> Also, allowing another kernel to inject IPIs without any restriction can pose a DoS attack
> risk.

This is true. Like I mentioned, this is also a good opportunity to invite
hardware (CPU) vendors to catch up with software; for example, they
could provide hardware filtering for IPIs via an MSR.
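
For context, a cross-CPU kick within one kernel today is just an APIC IPI
underneath; below is a sketch using the standard in-kernel API (not the IPI
framework from this series). The hardware itself does not know whether the
target CPU has been handed to another kernel, which is the gap MSR-based
filtering would close.

#include <linux/smp.h>

/* Runs on the target CPU, in IPI context. */
static void mk_doorbell(void *info)
{
}

/* A sketch of today's standard cross-CPU call within one kernel; the
 * mechanism underneath is an APIC IPI, and the hardware itself does not
 * check whether the target CPU was given to another kernel instance. */
static void mk_ring_cpu(int target_cpu)
{
        smp_call_function_single(target_cpu, mk_doorbell, NULL, 0);
}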

If we look at how virtualization evolved, it is the hardware that follows the
software: VMCS came after Xen and KVM, and vDPA came after virtio.

>
> > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
> >
> > Architecture Overview:
> > The implementation leverages kexec infrastructure to load and manage
> > multiple kernel images, with each kernel instance assigned to specific
> > CPU cores. Inter-kernel communication is facilitated through a dedicated
> > IPI framework that allows kernels to coordinate and share information
> > when necessary.
> >
> > Key Components:
> > 1. Enhanced kexec subsystem with dynamic kimage tracking
> > 2. Generic IPI communication framework for inter-kernel messaging
>
> I actually have concerns over inter-kernel communication. The original Popcorn
> IPI protocol, which seems to be inherited here, was designed as a prototype,
> without much consideration for the ecosystem. It would be nice if we could reuse
> an existing infra design for inter-kernel communication.

Popcorn does the opposite: it still stays with a single system image, which
essentially works against isolation. In fact, I also read its latest paper this
year, and I don't see any essential change in this overall direction:
https://www.ssrg.ece.vt.edu/papers/asplos25.pdf

This is why Popcorn is fundamentally not suitable for isolation. Please
don't get me wrong: I am not questioning its usefulness; these are simply
two opposite directions. I wish people the best of luck on the heterogeneous-ISA
design, and I hope major CPU vendors will catch up with you too. :)

>
> I would suggest looking into OpenAMP [4] and the remoteproc subsystem in the kernel. They
> already have mature solutions for communication between different kernels over coherent
> memory and mailboxes (rpmsg [5] and co.). They also define ELF extensions to pass side-band
> information for other kernel images.

Thanks for the pointers. Jim Huang also shared his idea on remoteproc
at LinuxCon this year. After evaluation, I found remoteproc may not be
as good as IPI: remoteproc is designed for heterogeneous systems with
different architectures, which adds unnecessary abstraction layers.

>
> Linaro folks are also working on a new VirtIO transport called virtio-msg [6], [7], which is designed
> with the Linux-to-Linux hardware partitioning scenario in mind.

I think there is still a fundamental difference between static partitioning
and elastic resource allocation.

Static partitioning can be achieved as a default case of dynamic allocation
when resources remain unchanged, but the reverse is not possible.

Hope this makes sense to you.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
       [not found] ` <aNZWTB_AbK1qtacy@kernel.org>
@ 2025-09-27 20:27   ` Cong Wang
  2025-09-27 20:39     ` Pasha Tatashin
  2025-09-28 14:08     ` Jarkko Sakkinen
  0 siblings, 2 replies; 22+ messages in thread
From: Cong Wang @ 2025-09-27 20:27 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

On Fri, Sep 26, 2025 at 2:01 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > This patch series introduces multikernel architecture support, enabling
> > multiple independent kernel instances to coexist and communicate on a
> > single physical machine. Each kernel instance can run on dedicated CPU
> > cores while sharing the underlying hardware resources.
> >
> > The multikernel architecture provides several key benefits:
> > - Improved fault isolation between different workloads
> > - Enhanced security through kernel-level separation
> > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
>
> This list is like asking AI to list benefits, or like the whole cover
> letter has that type of feel.

Sorry for giving you that feeling. Please let me know how I can
improve it for you.

>
> I'd probably work on benchmarks and other types of tests that can
> deliver comparative figures, and show data that addresses workloads
> with KVM, namespaces/cgroups and this, reflecting these qualities.

Sure, I think performance comes after usability, not vice versa.


>
> E.g. consider "Enhanced security through kernel-level separation".
> It's a pre-existing feature probably since dawn of time. Any new layer
> makes obviously more complex version "kernel-level separation". You'd
> had to prove that this even more complex version is more secure than
> pre-existing science.

Apologies for this. Do you mind explaining why this is more complex
than the KVM/QEMU/vhost/virtio/vDPA stack?

>
> kexec and its various corner cases and how this patch set addresses
> them is the part where I'm most lost.

Sorry for that. I will post YouTube videos explaining kexec in detail;
please follow our YouTube channel if you are interested. (I don't
want to post a link here in case people think I am promoting my
own interests; please email me privately.)

>
> If I look at one of multikernel distros (I don't know any other
> tbh) that I know it's really VT-d and that type of hardware
> enforcement that make Qubes shine:
>
> https://www.qubes-os.org/
>
> That said, I did not look how/if this is using CPU virtualization
> features as part of the solution, so correct me if I'm wrong.

Qubes OS is based on Xen:
https://en.wikipedia.org/wiki/Qubes_OS

>
> I'm not entirely sure whether this is aimed to be alternative to
> namespaces/cgroups or vms but more in the direction of Solaris Zones
> would be imho better alternative at least for containers because
> it saves the overhead of an extra kernel. There's also a patch set
> for this:
>
> https://lwn.net/Articles/780364/?ref=alian.info

Solaris Zones also share a single kernel. Or maybe you meant
Kernel Zones? Isn't that a justification for our multikernel
approach on Linux? :-)

BTW, it is less flexible, since it completely isolates kernels
without inter-kernel communication. With our design, you can
still choose not to use inter-kernel IPIs, which turns the dynamic
case into the static one.

>
> VM barrier combined with IOMMU is pretty strong and hardware
> enforced, and with polished configuration it can be fairly
> performant (e.g. via page cache bypass and stuff like that)
> so really the overhead that this is fighting against is
> context switch overhead.
>
> In security I don't believe this has any realistic chances to
> win over VMs and IOMMU...

I appreciate you sharing your opinions. I hope my information
helps.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-27 20:27   ` Cong Wang
@ 2025-09-27 20:39     ` Pasha Tatashin
  2025-09-28 14:08     ` Jarkko Sakkinen
  1 sibling, 0 replies; 22+ messages in thread
From: Pasha Tatashin @ 2025-09-27 20:39 UTC (permalink / raw)
  To: Cong Wang
  Cc: Jarkko Sakkinen, linux-kernel, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

On Sat, Sep 27, 2025 at 4:27 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Fri, Sep 26, 2025 at 2:01 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >
> > On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > > This patch series introduces multikernel architecture support, enabling
> > > multiple independent kernel instances to coexist and communicate on a
> > > single physical machine. Each kernel instance can run on dedicated CPU
> > > cores while sharing the underlying hardware resources.
> > >
> > > The multikernel architecture provides several key benefits:
> > > - Improved fault isolation between different workloads
> > > - Enhanced security through kernel-level separation
> > > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
> >
> > This list is like asking AI to list benefits, or like the whole cover
> > letter has that type of feel.
>
> Sorry for giving you that feeling. Please let me know how I can
> improve it for you.
>
> >
> > I'd probably work on benchmarks and other types of tests that can
> > deliver comparative figures, and show data that addresses workloads
> > with KVM, namespaces/cgroups and this, reflecting these qualities.
>
> Sure, I think performance comes after usability, not vice versa.
>
>
> >
> > E.g. consider "Enhanced security through kernel-level separation".
> > It's a pre-existing feature probably since dawn of time. Any new layer
> > makes obviously more complex version "kernel-level separation". You'd
> > had to prove that this even more complex version is more secure than
> > pre-existing science.
>
> Apologize for this. Do you mind explaining why this is more complex
> than the KVM/Qemu/vhost/virtio/VDPA stack?
>
> >
> > kexec and its various corner cases and how this patch set addresses
> > them is the part where I'm most lost.
>
> Sorry for that. I will post Youtube videos to explain kexec in detail,
> please follow our Youtube channel if you are interested. (I don't
> want to post a link here in case people think I am promoting my
> own interest, please email me privately.)
>
> >
> > If I look at one of multikernel distros (I don't know any other
> > tbh) that I know it's really VT-d and that type of hardware
> > enforcement that make Qubes shine:
> >
> > https://www.qubes-os.org/
> >
> > That said, I did not look how/if this is using CPU virtualization
> > features as part of the solution, so correct me if I'm wrong.
>
> Qubes OS is based on Xen:
> https://en.wikipedia.org/wiki/Qubes_OS
>
> >
> > I'm not entirely sure whether this is aimed to be alternative to
> > namespaces/cgroups or vms but more in the direction of Solaris Zones
> > would be imho better alternative at least for containers because
> > it saves the overhead of an extra kernel. There's also a patch set
> > for this:
> >
> > https://lwn.net/Articles/780364/?ref=alian.info
>
> Solaris Zones also share a single kernel. Or maybe I guess
> you meant Kernel Zones? Isn't it a justification for our multikernel
> approach for Linux? :-)

Solaris kernel zones use the sun4v hypervisor to enforce isolation. There
is no such thing on x86 and arm.

Pasha


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
       [not found]     ` <aNZh3uDdORZ5mfSD@kernel.org>
@ 2025-09-27 20:43       ` Cong Wang
  2025-09-28 14:22         ` Jarkko Sakkinen
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2025-09-27 20:43 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: Christoph Lameter (Ampere), linux-kernel, pasha.tatashin,
	Cong Wang, Andrew Morton, Baoquan He, Alexander Graf,
	Mike Rapoport, Changyuan Lyu, kexec, linux-mm, multikernel

On Fri, Sep 26, 2025 at 2:50 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
>
> On Wed, Sep 24, 2025 at 11:39:44AM -0700, Cong Wang wrote:
> > On Wed, Sep 24, 2025 at 10:51 AM Christoph Lameter (Ampere)
> > <cl@gentwo.org> wrote:
> > > AFAICT various contemporary Android deployments do the multiple kernel
> > > approach in one way or another already for security purposes and for
> > > specialized controllers. However, the multi kernel approaches are often
> > > depending on specialized and dedicated hardware. It may be difficult to
> > > support with a generic approach developed here.
> >
> > You are right, the multikernel concept is indeed pretty old, the BarrelFish
> > OS was invented in around 2009. Jailhouse was released 12 years ago.
> > There are tons of papers in this area too.
>
> Jailhouse is quite nice actually. Perhaps you should pick that up
> instead, and start refining and improving it? I'd be interested to test
> refined jailhouse patches. It's also easy build test images having the
> feature both with BuildRoot and Yocto.

Static partitioning is not a bad choice, except that it is less flexible. We can't
get dynamic resource allocation with just static partitioning, but we can
easily get static partitioning with dynamic allocation; in fact, it should be
the default case.

In my own opinion, the reason containers are more popular than VMs
today is not just performance; it is elasticity too. Static partitioning
essentially works against elasticity.

More fundamentally, it is based on VMCS, which essentially requires
a hypervisor:
https://github.com/siemens/jailhouse/blob/master/hypervisor/control.c

>
> It would take me like half'ish day to create build target for it.
>
> > Dual-kernel systems, whether using virtualization or firmware, are indeed
> > common at least for automotives today. This is a solid justification of its
> > usefulness and real-world practice.
>
> OK so neither virtualization nor firmware are well defined here.
> Firmware e.g. can mean anything fro pre-bootloader to full operating
> system depending on context or who you ask.
>
> It's also pretty hard to project why VMs are bad for cars, and
> despite lacking experience with building operating systems for
> cars, I'd like to believe that the hardware enforcement that VT-x
> and VT-d type of technologies bring is actually great for cars.
>
> It's like every other infosec con where someone is hacking a car,
> and I seen even people who've participated to hackatons by car
> manufacturers. That industry is improving gradually and the
> challenge would be to create hard evidence that this brings
> better isolation than VM based solutions..

In case it is still not clear: no one wants to stop you from using a
VM. In fact, at least in theory, you could run a VM inside a multikernel,
just like today we can still run a container inside a VM (Kata Containers).

Your choice is always on the table.

I hope this helps.

Regards,
Cong Wang


* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-27 20:27   ` Cong Wang
  2025-09-27 20:39     ` Pasha Tatashin
@ 2025-09-28 14:08     ` Jarkko Sakkinen
  1 sibling, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2025-09-28 14:08 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-kernel, pasha.tatashin, Cong Wang, Andrew Morton,
	Baoquan He, Alexander Graf, Mike Rapoport, Changyuan Lyu, kexec,
	linux-mm, multikernel

On Sat, Sep 27, 2025 at 01:27:04PM -0700, Cong Wang wrote:
> On Fri, Sep 26, 2025 at 2:01 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >
> > On Thu, Sep 18, 2025 at 03:25:59PM -0700, Cong Wang wrote:
> > > This patch series introduces multikernel architecture support, enabling
> > > multiple independent kernel instances to coexist and communicate on a
> > > single physical machine. Each kernel instance can run on dedicated CPU
> > > cores while sharing the underlying hardware resources.
> > >
> > > The multikernel architecture provides several key benefits:
> > > - Improved fault isolation between different workloads
> > > - Enhanced security through kernel-level separation
> > > - Better resource utilization than traditional VM (KVM, Xen etc.)
> > > - Potential zero-downtime kernel update with KHO (Kernel Hand Over)
> >
> > This list is like asking AI to list benefits, or like the whole cover
> > letter has that type of feel.
> 
> Sorry for giving you that feeling. Please let me know how I can
> improve it for you.

There is no evidence for any of these benefits. That's the central
issue. You pretty much must give quantitative proof for each of these
claims, or the benefit is imaginary.

> 
> >
> > I'd probably work on benchmarks and other types of tests that can
> > deliver comparative figures, and show data that addresses workloads
> > with KVM, namespaces/cgroups and this, reflecting these qualities.
> 
> Sure, I think performance comes after usability, not vice versa.
> 
> 
> >
> > E.g. consider "Enhanced security through kernel-level separation".
> > It's a pre-existing feature probably since dawn of time. Any new layer
> > makes obviously more complex version "kernel-level separation". You'd
> > had to prove that this even more complex version is more secure than
> > pre-existing science.
> 
> Apologize for this. Do you mind explaining why this is more complex
> than the KVM/Qemu/vhost/virtio/VDPA stack?

KVM does not complicate kernel-level separation or access control per
kernel instance at all. A guest, at the end of the day, is just a fancy
executable.

This feature, on the other hand, intervenes in various easily broken code
paths.

> 
> >
> > kexec and its various corner cases and how this patch set addresses
> > them is the part where I'm most lost.
> 
> Sorry for that. I will post Youtube videos to explain kexec in detail,
> please follow our Youtube channel if you are interested. (I don't
> want to post a link here in case people think I am promoting my
> own interest, please email me privately.)

Here I have to say that posting a YouTube link to LKML in your own
interest is not unacceptable as far as I'm concerned :-)

That said, I don't promise that I will watch any of the YouTube
videos posted either here or privately. All the quantitative
proof should be embedded in the patches.

> 
> >
> > If I look at one of multikernel distros (I don't know any other
> > tbh) that I know it's really VT-d and that type of hardware
> > enforcement that make Qubes shine:
> >
> > https://www.qubes-os.org/
> >
> > That said, I did not look how/if this is using CPU virtualization
> > features as part of the solution, so correct me if I'm wrong.
> 
> Qubes OS is based on Xen:
> https://en.wikipedia.org/wiki/Qubes_OS


Yes, and it works great, and has much stronger security metrics than
this could ever reach, and that is a quantitative fact, thanks to great
technologies such as VT-d :-)

This is why I'm repeating the requirement for quantitative proof. We
already have great solutions for most of what this can do, so building
evidence of usefulness is the big stretch this patch set needs to
make.

Nothing personal, but with what are currently basically just claims, I
don't believe in this. That said, by saying this I don't mean I have
picked my team for good. If there is enough evidence, I'm always ready
to turn my opinion 180 degrees.

> 
> >
> > I'm not entirely sure whether this is aimed to be alternative to
> > namespaces/cgroups or vms but more in the direction of Solaris Zones
> > would be imho better alternative at least for containers because
> > it saves the overhead of an extra kernel. There's also a patch set
> > for this:
> >
> > https://lwn.net/Articles/780364/?ref=alian.info
> 
> Solaris Zones also share a single kernel. Or maybe I guess
> you meant Kernel Zones? Isn't it a justification for our multikernel
> approach for Linux? :-)
> 
> BTW, it is less flexible since it completely isolates kernels
> without inter-kernel communication. With our design, you can
> still choose not to use inter-kernel IPI's, which turns dynamic
> into static.
> 
> >
> > VM barrier combined with IOMMU is pretty strong and hardware
> > enforced, and with polished configuration it can be fairly
> > performant (e.g. via page cache bypass and stuff like that)
> > so really the overhead that this is fighting against is
> > context switch overhead.
> >
> > In security I don't believe this has any realistic chances to
> > win over VMs and IOMMU...
> 
> I appreciate you sharing your opinions. I hope my information
> helps.

I'd put a strong focus on getting the figures alongside the claims :-)

> 
> Regards,
> Cong Wang

BR, Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-27 20:43       ` Cong Wang
@ 2025-09-28 14:22         ` Jarkko Sakkinen
  2025-09-28 14:36           ` Jarkko Sakkinen
  0 siblings, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2025-09-28 14:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: Christoph Lameter (Ampere), linux-kernel, pasha.tatashin,
	Cong Wang, Andrew Morton, Baoquan He, Alexander Graf,
	Mike Rapoport, Changyuan Lyu, kexec, linux-mm, multikernel

On Sat, Sep 27, 2025 at 01:43:23PM -0700, Cong Wang wrote:
> On Fri, Sep 26, 2025 at 2:50 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> >
> > On Wed, Sep 24, 2025 at 11:39:44AM -0700, Cong Wang wrote:
> > > On Wed, Sep 24, 2025 at 10:51 AM Christoph Lameter (Ampere)
> > > <cl@gentwo.org> wrote:
> > > > AFAICT various contemporary Android deployments do the multiple kernel
> > > > approach in one way or another already for security purposes and for
> > > > specialized controllers. However, the multi kernel approaches are often
> > > > depending on specialized and dedicated hardware. It may be difficult to
> > > > support with a generic approach developed here.
> > >
> > > You are right, the multikernel concept is indeed pretty old, the BarrelFish
> > > OS was invented in around 2009. Jailhouse was released 12 years ago.
> > > There are tons of papers in this area too.
> >
> > Jailhouse is quite nice actually. Perhaps you should pick that up
> > instead, and start refining and improving it? I'd be interested to test
> > refined jailhouse patches. It's also easy build test images having the
> > feature both with BuildRoot and Yocto.
> 
> Static partitioning is not a bad choice, except it is less flexible. We can't
> get dynamic resource allocation with just static partitioning, but we can
> easily get static partitioning with dynamic allocation, in fact, it should be
> the default case.
> 
> In my own opinion, the reason why containers today are more popular
> than VM's is not just performance, it is elasticity too. Static partitioning
> is essentially against elasticity.

How do you make a popularity comparison between VMs and containers, and
what does the word "popularity" mean in this context? The whole world
basically runs on guest VMs (just go and check AWS, Azure, Oracle
Cloud and whatnot).

The problem in that argument is that there is no problem.

BR, Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-28 14:22         ` Jarkko Sakkinen
@ 2025-09-28 14:36           ` Jarkko Sakkinen
  2025-09-28 14:41             ` Jarkko Sakkinen
  0 siblings, 1 reply; 22+ messages in thread
From: Jarkko Sakkinen @ 2025-09-28 14:36 UTC (permalink / raw)
  To: Cong Wang
  Cc: Christoph Lameter (Ampere), linux-kernel, pasha.tatashin,
	Cong Wang, Andrew Morton, Baoquan He, Alexander Graf,
	Mike Rapoport, Changyuan Lyu, kexec, linux-mm, multikernel

On Sun, Sep 28, 2025 at 05:22:43PM +0300, Jarkko Sakkinen wrote:
> On Sat, Sep 27, 2025 at 01:43:23PM -0700, Cong Wang wrote:
> > On Fri, Sep 26, 2025 at 2:50 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > >
> > > On Wed, Sep 24, 2025 at 11:39:44AM -0700, Cong Wang wrote:
> > > > On Wed, Sep 24, 2025 at 10:51 AM Christoph Lameter (Ampere)
> > > > <cl@gentwo.org> wrote:
> > > > > AFAICT various contemporary Android deployments do the multiple kernel
> > > > > approach in one way or another already for security purposes and for
> > > > > specialized controllers. However, the multi kernel approaches are often
> > > > > depending on specialized and dedicated hardware. It may be difficult to
> > > > > support with a generic approach developed here.
> > > >
> > > > You are right, the multikernel concept is indeed pretty old, the BarrelFish
> > > > OS was invented in around 2009. Jailhouse was released 12 years ago.
> > > > There are tons of papers in this area too.
> > >
> > > Jailhouse is quite nice actually. Perhaps you should pick that up
> > > instead, and start refining and improving it? I'd be interested to test
> > > refined jailhouse patches. It's also easy build test images having the
> > > feature both with BuildRoot and Yocto.
> > 
> > Static partitioning is not a bad choice, except it is less flexible. We can't
> > get dynamic resource allocation with just static partitioning, but we can
> > easily get static partitioning with dynamic allocation, in fact, it should be
> > the default case.
> > 
> > In my own opinion, the reason why containers today are more popular
> > than VM's is not just performance, it is elasticity too. Static partitioning
> > is essentially against elasticity.
> 
> How do you make a popularity comparison between VMs and containers, and
> what does the word "popularity" means in the context? The whole world
> runs basically runs with guest VMs (just go to check AWS, Azure, Oracle
> Cloud and what not).
> 
> The problem in that argument is that there is no problem.

If I were working on such a feature, I would probably package it for e.g.
BuildRoot with a BR2_EXTERNAL-style Git tree and create a user space that
can run some tests and benchmarks that actually highlight the benefits.

Then, I would replace the existing cover letter with something with a clear
problem statement and motivation instead of whitepaper-like claims.

We can argue into eternity about the qualitative aspects of any feature,
but it is the quantitative proof that actually drives things forward.

BR, Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-28 14:36           ` Jarkko Sakkinen
@ 2025-09-28 14:41             ` Jarkko Sakkinen
  0 siblings, 0 replies; 22+ messages in thread
From: Jarkko Sakkinen @ 2025-09-28 14:41 UTC (permalink / raw)
  To: Cong Wang
  Cc: Christoph Lameter (Ampere), linux-kernel, pasha.tatashin,
	Cong Wang, Andrew Morton, Baoquan He, Alexander Graf,
	Mike Rapoport, Changyuan Lyu, kexec, linux-mm, multikernel

On Sun, Sep 28, 2025 at 05:36:32PM +0300, Jarkko Sakkinen wrote:
> On Sun, Sep 28, 2025 at 05:22:43PM +0300, Jarkko Sakkinen wrote:
> > On Sat, Sep 27, 2025 at 01:43:23PM -0700, Cong Wang wrote:
> > > On Fri, Sep 26, 2025 at 2:50 AM Jarkko Sakkinen <jarkko@kernel.org> wrote:
> > > >
> > > > On Wed, Sep 24, 2025 at 11:39:44AM -0700, Cong Wang wrote:
> > > > > On Wed, Sep 24, 2025 at 10:51 AM Christoph Lameter (Ampere)
> > > > > <cl@gentwo.org> wrote:
> > > > > > AFAICT various contemporary Android deployments do the multiple kernel
> > > > > > approach in one way or another already for security purposes and for
> > > > > > specialized controllers. However, the multi kernel approaches are often
> > > > > > depending on specialized and dedicated hardware. It may be difficult to
> > > > > > support with a generic approach developed here.
> > > > >
> > > > > You are right, the multikernel concept is indeed pretty old, the BarrelFish
> > > > > OS was invented in around 2009. Jailhouse was released 12 years ago.
> > > > > There are tons of papers in this area too.
> > > >
> > > > Jailhouse is quite nice actually. Perhaps you should pick that up
> > > > instead, and start refining and improving it? I'd be interested to test
> > > > refined jailhouse patches. It's also easy build test images having the
> > > > feature both with BuildRoot and Yocto.
> > > 
> > > Static partitioning is not a bad choice, except it is less flexible. We can't
> > > get dynamic resource allocation with just static partitioning, but we can
> > > easily get static partitioning with dynamic allocation, in fact, it should be
> > > the default case.
> > > 
> > > In my own opinion, the reason why containers today are more popular
> > > than VM's is not just performance, it is elasticity too. Static partitioning
> > > is essentially against elasticity.
> > 
> > How do you make a popularity comparison between VMs and containers, and
> > what does the word "popularity" means in the context? The whole world
> > runs basically runs with guest VMs (just go to check AWS, Azure, Oracle
> > Cloud and what not).
> > 
> > The problem in that argument is that there is no problem.
> 
> If I was working on such a feature I would probably package it for e.g,
> BuildRoot with BR2_EXTERNAL type of Git and create a user space that
> can run some test and benchmarks that actually highlight the benefits.
> 
> Then, I would trash the existing cover letter with something with clear
> problem statement and motivation instead of whitepaper alike claims.
> 
> We can argue to the eterenity with qualitative aspects of any feature
> but it is the quantitative proof that actually drives things forward.

I'd also carefully check, when modifying kexec, that more complex use
cases such as IMA stay compatible. I don't know if there is an issue with
secure boot, but I'd make sure that there is no friction with it either.

There are also shared security-related hardware resources such as the
TPM; in this context two kernel instances end up sharing it, e.g. for
measurements, and that type of cross-communication could have
unpredictable consequences (this would need to be checked).

BR, Jarkko

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-27 19:42                   ` Cong Wang
@ 2025-09-29 15:11                     ` Stefan Hajnoczi
  2025-10-02  4:17                       ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Stefan Hajnoczi @ 2025-09-29 15:11 UTC (permalink / raw)
  To: Cong Wang
  Cc: David Hildenbrand, linux-kernel, pasha.tatashin, Cong Wang,
	Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport,
	Changyuan Lyu, kexec, linux-mm, multikernel, jasowang

On Sat, Sep 27, 2025 at 12:42:23PM -0700, Cong Wang wrote:
> On Wed, Sep 24, 2025 at 12:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > Thanks, that gives a nice overview!
> >
> > I/O Resource Allocation part will be interesting. Restructuring existing
> > device drivers to allow spawned kernels to use specific hardware queues
> > could be a lot of work and very device-specific. I guess a small set of
> > devices can be supported initially and then it can grow over time.
> 
> My idea is to leverage existing technologies like XDP, which
> offers huge benefits here:
> 
> 1) It is based on shared memory (although it is virtual)
> 
> 2) Its API's are user-space API's, which is even stronger for
> kernel-to-kernel sharing, this possibly avoids re-inventing
> another protocol.
> 
> 3) It provides eBPF.
> 
> 4) The spawned kernel does not require any hardware knowledge,
> just pure XDP-ringbuffer-based software logic.
> 
> But it also has limitations:
> 
> 1) xdp_md is too specific for networking, extending it to storage
> could be very challenging. But we could introduce a SDP for
> storage to just mimic XDP.
> 
> 2) Regardless, we need a doorbell anyway. IPI is handy, but
> I hope we could have an even lighter one. Or more ideally,
> redirecting the hardware queue IRQ into each target CPU.

I see. I was thinking that spawned kernels would talk directly to the
hardware. Your idea of using a software interface is less invasive but
has an overhead similar to paravirtualized devices.
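
To make that overhead concrete, here is a minimal sketch of the kind of
shared-memory ring plus doorbell being discussed. Everything here is
hypothetical and only illustrative: the reserved physical window, the
mk_ring layout and mk_send_doorbell() are not interfaces from this patch
set, and the real primitives (memremap, READ_ONCE/WRITE_ONCE, smp_wmb)
are just used to show where the copy and the kick happen.

/*
 * Illustrative only: a single-producer/single-consumer ring living in a
 * physical memory window shared between two kernel instances.  The window
 * address and the doorbell primitive (mk_send_doorbell) are hypothetical.
 */
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/string.h>
#include <linux/types.h>

#define MK_RING_ENTRIES	256			/* power of two */
#define MK_RING_MASK	(MK_RING_ENTRIES - 1)

struct mk_ring {
	u32 head;				/* written by producer */
	u32 tail;				/* written by consumer */
	struct {
		u32 len;
		u8  data[252];
	} slot[MK_RING_ENTRIES];
};

/* Hypothetical doorbell, e.g. a directed IPI to a CPU owned by the peer. */
extern void mk_send_doorbell(int peer_id);

static struct mk_ring *ring;			/* mapped reserved window */

static int mk_ring_map(phys_addr_t win)
{
	ring = memremap(win, sizeof(*ring), MEMREMAP_WB);
	return ring ? 0 : -ENOMEM;
}

/* Producer side: copy a message in, publish it, then ring the doorbell. */
static int mk_ring_send(int peer_id, const void *buf, u32 len)
{
	u32 head = READ_ONCE(ring->head);

	if (len > sizeof(ring->slot[0].data))
		return -EMSGSIZE;
	if (head - READ_ONCE(ring->tail) >= MK_RING_ENTRIES)
		return -ENOSPC;			/* ring full */

	memcpy(ring->slot[head & MK_RING_MASK].data, buf, len);
	ring->slot[head & MK_RING_MASK].len = len;

	smp_wmb();				/* data before index update */
	WRITE_ONCE(ring->head, head + 1);

	mk_send_doorbell(peer_id);		/* the "kick", an IPI today */
	return 0;
}

A consumer on the peer kernel would pop entries and advance tail when the
doorbell arrives; the extra copy plus the kick is exactly where this
resembles a paravirtualized device.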

A software approach that supports a wider range of devices is
virtio_vdpa (drivers/vdpa/). The current virtio_vdpa implementation
assumes that the device is located in the same kernel. A
kernel-to-kernel bridge would be needed so that the spawned kernel
forwards the vDPA operations to the other kernel. The other kernel
provides the virtio-net, virtio-blk, etc device functionality by passing
requests to a netdev, blkdev, etc.

There are in-kernel simulator devices for virtio-net and virtio-blk in
drivers/vdpa/vdpa_sim/ which can be used as a starting point. These
devices are just for testing and would need to be fleshed out to become
useful for real workloads.

I have CCed Jason Wang, who maintains vDPA, in case you want to discuss
it more.

> 
> >
> > This also reminds me of VFIO/mdev devices, which would be another
> > solution to the same problem, but equally device-specific and also a lot
> > of work to implement the devices that spawned kernels see.
> 
> Right.
> 
> I prototyped VFIO on my side with AI, but failed with its complex PCI
> interface. And the spawn kernel still requires hardware knowledge
> to interpret PCI BAR etc..

Yeah, it's complex and invasive. :/

Stefan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC Patch 0/7] kernel: Introduce multikernel architecture support
  2025-09-29 15:11                     ` Stefan Hajnoczi
@ 2025-10-02  4:17                       ` Cong Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Cong Wang @ 2025-10-02  4:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: David Hildenbrand, linux-kernel, pasha.tatashin, Cong Wang,
	Andrew Morton, Baoquan He, Alexander Graf, Mike Rapoport,
	Changyuan Lyu, kexec, linux-mm, multikernel, jasowang

On Mon, Sep 29, 2025 at 8:12 AM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>
> On Sat, Sep 27, 2025 at 12:42:23PM -0700, Cong Wang wrote:
> > On Wed, Sep 24, 2025 at 12:03 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > Thanks, that gives a nice overview!
> > >
> > > I/O Resource Allocation part will be interesting. Restructuring existing
> > > device drivers to allow spawned kernels to use specific hardware queues
> > > could be a lot of work and very device-specific. I guess a small set of
> > > devices can be supported initially and then it can grow over time.
> >
> > My idea is to leverage existing technologies like XDP, which
> > offers huge benefits here:
> >
> > 1) It is based on shared memory (although it is virtual)
> >
> > 2) Its API's are user-space API's, which is even stronger for
> > kernel-to-kernel sharing, this possibly avoids re-inventing
> > another protocol.
> >
> > 3) It provides eBPF.
> >
> > 4) The spawned kernel does not require any hardware knowledge,
> > just pure XDP-ringbuffer-based software logic.
> >
> > But it also has limitations:
> >
> > 1) xdp_md is too specific for networking, extending it to storage
> > could be very challenging. But we could introduce a SDP for
> > storage to just mimic XDP.
> >
> > 2) Regardless, we need a doorbell anyway. IPI is handy, but
> > I hope we could have an even lighter one. Or more ideally,
> > redirecting the hardware queue IRQ into each target CPU.
>
> I see. I was thinking that spawned kernels would talk directly to the
> hardware. Your idea of using a software interface is less invasive but
> has an overhead similar to paravirtualized devices.

When we have sufficient hardware resources, or prefer to use
SR-IOV, a multikernel instance could indeed access hardware directly.
Queues are an alternative choice for elasticity.
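
For what it's worth, the queue-based option mostly boils down to handing
the spawned kernel a small, self-contained description of the queue it
owns. A purely hypothetical sketch (none of these names come from this
series) of what such a per-queue handover record could carry:

/*
 * Hypothetical example of what the owning kernel might hand over per
 * hardware queue (e.g. via boot parameters or a KHO-style blob) so that
 * a spawned kernel can drive one queue without owning the whole device.
 */
#include <linux/types.h>

struct mk_hw_queue_desc {
	u64 doorbell_phys;	/* MMIO address of this queue's doorbell */
	u64 ring_phys;		/* reserved DMA-able descriptor memory */
	u32 ring_entries;	/* queue depth set up by the owning kernel */
	u32 msix_vector;	/* IRQ vector routed to the spawned CPU */
	u16 queue_id;		/* hardware queue index within the function */
	u16 pci_devfn;		/* device/function, for SR-IOV VF assignment */
};

Everything else (device probing, firmware, global resets) would stay with
the kernel that owns the device.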

>
> A software approach that supports a wider range of devices is
> virtio_vdpa (drivers/vdpa/). The current virtio_vdpa implementation
> assumes that the device is located in the same kernel. A
> kernel-to-kernel bridge would be needed so that the spawned kernel
> forwards the vDPA operations to the other kernel. The other kernel
> provides the virtio-net, virtio-blk, etc device functionality by passing
> requests to a netdev, blkdev, etc.

I think that is the major blocker. vDPA looks more complex than
queue-based solutions (including the Soft Functions provided by mlx),
at least from my naive understanding, but I will take a deep look at vDPA.

>
> There are in-kernel simulator devices for virtio-net and virtio-blk in
> drivers/vdpa/vdpa_sim/ which can be used as a starting point. These
> devices are just for testing and would need to be fleshed out to become
> useful for real workloads.
>
> I have CCed Jason Wang, who maintains vDPA, in case you want to discuss
> it more.

Appreciate it.

Regards,
Cong Wang

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-10-02  4:17 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20250918222607.186488-1-xiyou.wangcong@gmail.com>
     [not found] ` <20250921014721.7323-1-hdanton@sina.com>
2025-09-22 21:55   ` [RFC Patch 0/7] kernel: Introduce multikernel architecture support Cong Wang
2025-09-24  1:12     ` Hillf Danton
2025-09-24 17:30       ` Cong Wang
2025-09-24 22:42         ` Hillf Danton
     [not found] ` <20250919212650.GA275426@fedora>
     [not found]   ` <CAM_iQpXnHr7WC6VN3WB-+=CZGF5pyfo9y9D4MCc_Wwgp29hBrw@mail.gmail.com>
     [not found]     ` <20250922142831.GA351870@fedora>
2025-09-22 22:41       ` Cong Wang
2025-09-23 17:05         ` Stefan Hajnoczi
2025-09-24 11:38           ` David Hildenbrand
2025-09-24 12:51             ` Stefan Hajnoczi
2025-09-24 18:28               ` Cong Wang
2025-09-24 19:03                 ` Stefan Hajnoczi
2025-09-27 19:42                   ` Cong Wang
2025-09-29 15:11                     ` Stefan Hajnoczi
2025-10-02  4:17                       ` Cong Wang
2025-09-24 17:18           ` Cong Wang
     [not found] ` <BC25CE95-75A1-48E7-86E7-4E5E933761B8@flygoat.com>
2025-09-27 20:06   ` Cong Wang
     [not found] ` <aNZWTB_AbK1qtacy@kernel.org>
2025-09-27 20:27   ` Cong Wang
2025-09-27 20:39     ` Pasha Tatashin
2025-09-28 14:08     ` Jarkko Sakkinen
     [not found] ` <78127855-104f-46e2-e5d2-52c622243b08@gentwo.org>
     [not found]   ` <CAM_iQpU2QucTR7+6TwE9yKb+QZg5u_=r9O_tMfsn7Ss7kJbd9A@mail.gmail.com>
     [not found]     ` <aNZh3uDdORZ5mfSD@kernel.org>
2025-09-27 20:43       ` Cong Wang
2025-09-28 14:22         ` Jarkko Sakkinen
2025-09-28 14:36           ` Jarkko Sakkinen
2025-09-28 14:41             ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).