public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Rusty Russell <rusty@rustcorp.com.au>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: lkml - Kernel Mailing List <linux-kernel@vger.kernel.org>,
	virtualization <virtualization@lists.linux-foundation.org>
Subject: [PATCH 7/7] lguest: documentation pt VII: FIXMEs
Date: Sat, 21 Jul 2007 11:24:09 +1000	[thread overview]
Message-ID: <1184981049.6344.15.camel@localhost.localdomain> (raw)
In-Reply-To: <1184980905.6344.11.camel@localhost.localdomain>

Documentation: The FIXMEs

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

---
 Documentation/lguest/lguest.c         |   12 ++++++++++++
 drivers/char/hvc_lguest.c             |    3 +++
 drivers/lguest/interrupts_and_traps.c |   14 ++++++++++++++
 drivers/lguest/io.c                   |   10 ++++++++++
 drivers/lguest/lguest.c               |    8 ++++++++
 drivers/lguest/lguest_asm.S           |   14 ++++++++++++++
 drivers/lguest/page_tables.c          |    5 +++++
 drivers/lguest/segments.c             |    4 ++++
 drivers/net/lguest_net.c              |   19 +++++++++++++++++++
 9 files changed, 89 insertions(+)

===================================================================
--- a/Documentation/lguest/lguest.c
+++ b/Documentation/lguest/lguest.c
@@ -1536,3 +1536,15 @@ int main(int argc, char *argv[])
 	/* Finally, run the Guest.  This doesn't return. */
 	run_guest(lguest_fd, &device_list);
 }
+/*:*/
+
+/*M:999
+ * Mastery is done: you now know everything I do.
+ *
+ * But surely you have seen code, features and bugs in your wanderings which
+ * you now yearn to attack?  That is the real game, and I look forward to you
+ * patching and forking lguest into the Your-Name-Here-visor.
+ *
+ * Farewell, and good coding!
+ * Rusty Russell.
+ */
===================================================================
--- a/drivers/char/hvc_lguest.c
+++ b/drivers/char/hvc_lguest.c
@@ -13,6 +13,9 @@
  * functions.
  :*/
 
+/*M:002 The console can be flooded: while the Guest is processing input the
+ * Host can send more.  Buffering in the Host could alleviate this, but it is a
+ * difficult problem in general. :*/
 /* Copyright (C) 2006 Rusty Russell, IBM Corporation
  *
  * This program is free software; you can redistribute it and/or modify
===================================================================
--- a/drivers/lguest/interrupts_and_traps.c
+++ b/drivers/lguest/interrupts_and_traps.c
@@ -231,6 +231,20 @@ static int direct_trap(const struct lgue
 	 * go direct, of course 8) */
 	return idt_type(trap->a, trap->b) == 0xF;
 }
+/*:*/
+
+/*M:005 The Guest has the ability to turn its interrupt gates into trap gates,
+ * if it is careful.  The Host will let trap gates can go directly to the
+ * Guest, but the Guest needs the interrupts atomically disabled for an
+ * interrupt gate.  It can do this by pointing the trap gate at instructions
+ * within noirq_start and noirq_end, where it can safely disable interrupts. */
+
+/*M:006 The Guests do not use the sysenter (fast system call) instruction,
+ * because it's hardcoded to enter privilege level 0 and so can't go direct.
+ * It's about twice as fast as the older "int 0x80" system call, so it might
+ * still be worthwhile to handle it in the Switcher and lcall down to the
+ * Guest.  The sysenter semantics are hairy tho: search for that keyword in
+ * entry.S :*/
 
 /*H:260 When we make traps go directly into the Guest, we need to make sure
  * the kernel stack is valid (ie. mapped in the page tables).  Otherwise, the
===================================================================
--- a/drivers/lguest/io.c
+++ b/drivers/lguest/io.c
@@ -553,6 +553,16 @@ void release_all_dma(struct lguest *lg)
 	up_read(&lg->mm->mmap_sem);
 }
 
+/*M:007 We only return a single DMA buffer to the Launcher, but it would be
+ * more efficient to return a pointer to the entire array of DMA buffers, which
+ * it can cache and choose one whenever it wants.
+ *
+ * Currently the Launcher uses a write to /dev/lguest, and the return value is
+ * the address of the DMA structure with the interrupt number placed in
+ * dma->used_len.  If we wanted to return the entire array, we need to return
+ * the address, array size and interrupt number: this seems to require an
+ * ioctl(). :*/
+
 /*L:320 This routine looks for a DMA buffer registered by the Guest on the
  * given key (using the BIND_DMA hypercall). */
 unsigned long get_dma_buffer(struct lguest *lg,
===================================================================
--- a/drivers/lguest/lguest.c
+++ b/drivers/lguest/lguest.c
@@ -251,6 +251,14 @@ static void irq_enable(void)
 {
 	lguest_data.irq_enabled = X86_EFLAGS_IF;
 }
+/*:*/
+/*M:003 Note that we don't check for outstanding interrupts when we re-enable
+ * them (or when we unmask an interrupt).  This seems to work for the moment,
+ * since interrupts are rare and we'll just get the interrupt on the next timer
+ * tick, but now we have CONFIG_NO_HZ, we should revisit this.  One way
+ * would be to put the "irq_enabled" field in a page by itself, and have the
+ * Host write-protect it when an interrupt comes in when irqs are disabled.
+ * There will then be a page fault as soon as interrupts are re-enabled. :*/
 
 /*G:034
  * The Interrupt Descriptor Table (IDT).
===================================================================
--- a/drivers/lguest/lguest_asm.S
+++ b/drivers/lguest/lguest_asm.S
@@ -41,6 +41,20 @@ LGUEST_PATCH(pushf, movl lguest_data+LGU
 .global lguest_noirq_start
 .global lguest_noirq_end
 
+/*M:004 When the Host reflects a trap or injects an interrupt into the Guest,
+ * it sets the eflags interrupt bit on the stack based on
+ * lguest_data.irq_enabled, so the Guest iret logic does the right thing when
+ * restoring it.  However, when the Host sets the Guest up for direct traps,
+ * such as system calls, the processor is the one to push eflags onto the
+ * stack, and the interrupt bit will be 1 (in reality, interrupts are always
+ * enabled in the Guest).
+ *
+ * This turns out to be harmless: the only trap which should happen under Linux
+ * with interrupts disabled is Page Fault (due to our lazy mapping of vmalloc
+ * regions), which has to be reflected through the Host anyway.  If another
+ * trap *does* go off when interrupts are disabled, the Guest will panic, and
+ * we'll never get to this iret! :*/
+
 /*G:045 There is one final paravirt_op that the Guest implements, and glancing
  * at it you can see why I left it to last.  It's *cool*!  It's in *assembler*!
  *
===================================================================
--- a/drivers/lguest/page_tables.c
+++ b/drivers/lguest/page_tables.c
@@ -14,6 +14,11 @@
 #include <linux/percpu.h>
 #include <asm/tlbflush.h>
 #include "lg.h"
+
+/*M:008 We hold reference to pages, which prevents them from being swapped.
+ * It'd be nice to have a callback in the "struct mm_struct" when Linux wants
+ * to swap out.  If we had this, and a shrinker callback to trim PTE pages, we
+ * could probably consider launching Guests as non-root. :*/
 
 /*H:300
  * The Page Table Code
===================================================================
--- a/drivers/lguest/segments.c
+++ b/drivers/lguest/segments.c
@@ -94,6 +94,10 @@ static void check_segment_use(struct lgu
 	    || lg->regs->ss / 8 == desc)
 		kill_guest(lg, "Removed live GDT entry %u", desc);
 }
+/*:*/
+/*M:009 We wouldn't need to check for removal of in-use segments if we handled
+ * faults in the Switcher.  However, it's probably not a worthwhile
+ * optimization. :*/
 
 /*H:610 Once the GDT has been changed, we look through the changed entries and
  * see if they're OK.  If not, we'll call kill_guest() and the Guest will never
===================================================================
--- a/drivers/net/lguest_net.c
+++ b/drivers/net/lguest_net.c
@@ -34,6 +34,25 @@
 #define SHARED_SIZE		PAGE_SIZE
 #define MAX_LANS		4
 #define NUM_SKBS		8
+
+/*M:011 Network code master Jeff Garzik points out numerous shortcomings in
+ * this driver if it aspires to greatness.
+ *
+ * Firstly, it doesn't use "NAPI": the networking's New API, and is poorer for
+ * it.  As he says "NAPI means system-wide load leveling, across multiple
+ * network interfaces.  Lack of NAPI can mean competition at higher loads."
+ *
+ * He also points out that we don't implement set_mac_address, so users cannot
+ * change the devices hardware address.  When I asked why one would want to:
+ * "Bonding, and situations where you /do/ want the MAC address to "leak" out
+ * of the host onto the wider net."
+ *
+ * Finally, he would like module unloading: "It is not unrealistic to think of
+ * [un|re|]loading the net support module in an lguest guest.  And, adding
+ * module support makes the programmer more responsible, because they now have
+ * to learn to clean up after themselves.  Any driver that cannot clean up
+ * after itself is an incomplete driver in my book."
+ :*/
 
 /*D:530 The "struct lguestnet_info" contains all the information we need to
  * know about the network device. */



  reply	other threads:[~2007-07-21  1:24 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-21  1:17 [PATCH 1/7] lguest: documentation pt I: Preparation Rusty Russell
2007-07-21  1:18 ` [PATCH 2/7] lguest: documentation pt II: Guest Rusty Russell
2007-07-21  1:19   ` [PATCH 3/7] lguest: documentation pt III: Drivers Rusty Russell
2007-07-21  1:20     ` [PATCH 4/7] lguest: documentation pt IV: Launcher Rusty Russell
2007-07-21  1:21       ` [PATCH 5/7] lguest: documentation pt V: Host Rusty Russell
2007-07-21  1:21         ` [PATCH 6/7] lguest: documentation pt VI: Switcher Rusty Russell
2007-07-21  1:24           ` Rusty Russell [this message]
2007-07-24  0:12 ` [PATCH 1/7] lguest: documentation pt I: Preparation Andrew Morton
2007-07-24  1:01   ` Rusty Russell
2007-07-24  1:18     ` Linus Torvalds
2007-07-24  1:51       ` Rusty Russell
2007-07-24  9:52         ` Alan Cox
2007-07-24 10:28           ` Rusty Russell
2007-07-24 12:04             ` Alan Cox
2007-07-24 22:35               ` Rusty Russell
2007-07-24  2:28       ` Rene Herman
2007-07-24  9:33       ` Alan Cox
2007-07-24  1:20     ` Andrew Morton
2007-07-24  1:39       ` Rusty Russell
2007-07-25 22:22     ` Rob Landley
2007-07-26  3:35       ` Rusty Russell
2007-07-27 18:32         ` Rob Landley
2007-07-24  2:21   ` Randy Dunlap
2007-07-24  3:06     ` Randy Dunlap
2007-07-24  3:27       ` Rusty Russell
2007-07-25 19:30     ` Rob Landley
2007-07-24 15:13   ` Jonathan Corbet
2007-07-24 16:00     ` Alan Cox
2007-07-24 16:57       ` Randy Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1184981049.6344.15.camel@localhost.localdomain \
    --to=rusty@rustcorp.com.au \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox